Fugu-MT: Translations of arXiv Papers

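This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.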

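The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).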

The papers listed here have a publication date of 2024-06-05.

Title	Authors	Abstract	Publication date / translation date
# Cluster Model for parsimonious selection of variables and enhancing Students Employability Prediction ( http://arxiv.org/abs/2407.16884v1 ) License: see linked page	Pooja Thakar, Anil Mehta, Manisha	Educational Data Mining (EDM) is a promising field in which data mining is widely used for predicting students' performance. One of the most prevalent and recent challenges that higher education faces today is making students skillfully employable. Institutions possess large volumes of data, yet they are unable to extract knowledge from it and guide their students. Data in education is generally very large, multidimensional and unbalanced in nature. The process of extracting knowledge from such data has its own set of problems and is a very complicated task. In this paper, data on Engineering and MCA (Masters in Computer Applications) students is collected from various universities and institutes across India. The dataset is large, unbalanced and multidimensional in nature. A cluster-based model is presented in this paper which, when applied at the preprocessing stage, helps in the parsimonious selection of variables and improves the performance of predictive algorithms. It thereby facilitates better prediction of student employability.	Translated: 2024-08-05 01:45:45 Published: 2024-06-05
# GPT-4's One-Dimensional Mapping of Morality: How the Accuracy of Country-Estimates Depends on Moral Domain ( http://arxiv.org/abs/2407.16886v1 ) License: see linked page	Pontus Strimling, Joel Krueger, Simon Karlsson	Prior research demonstrates that OpenAI's GPT models can predict variations in moral opinions between countries but that the accuracy tends to be substantially higher among high-income countries compared to low-income ones. This study aims to replicate previous findings and advance the research by examining how accuracy varies with different types of moral questions. Using responses from the World Values Survey and the European Values Study, covering 18 moral issues across 63 countries, we calculated country-level mean scores for each moral issue and compared them with GPT-4's predictions. Confirming previous findings, our results show that GPT-4 has greater predictive success in high-income than in low-income countries. However, our factor analysis reveals that GPT-4 bases its predictions primarily on a single dimension, presumably reflecting countries' degree of conservatism/liberalism. Conversely, the real-world moral landscape appears to be two-dimensional, differentiating between personal-sexual and violent-dishonest issues. When moral issues are categorized based on their moral domain, GPT-4's predictions are found to be remarkably accurate in the personal-sexual domain, across both high-income (r = .77) and low-income (r = .58) countries. Yet the predictive accuracy significantly drops in the violent-dishonest domain for both high-income (r = .30) and low-income (r = -.16) countries, indicating that GPT-4's one-dimensional world-view does not fully capture the complexity of the moral landscape. In sum, this study underscores the importance of not only considering country-specific characteristics to understand GPT-4's moral understanding, but also the characteristics of the moral issues at hand.	Translated: 2024-08-05 01:45:45 Published: 2024-06-05
# Unified Prediction Model for Employability in Indian Higher Education System ( http://arxiv.org/abs/2407.17591v1 ) License: see linked page	Pooja Thakar, Anil Mehta, Manisha	Educational Data Mining has become extremely popular among researchers in the last decade. Prior effort in this area was directed only towards predicting the academic performance of a student. Very few studies are directed towards predicting the employability of a student, i.e. predicting students' performance in campus placements at an early stage of enrollment. Furthermore, existing research on student employability prediction is not universal in approach and is based upon only one type of course or one University/Institute. Hence, it is not scalable from one context to another. Given the need for unification, data on students of professional technical courses, namely Bachelor in Engineering/Technology and Masters in Computer Applications, has been collected from 17 states of India. To deal with such data, a unified predictive model has been developed and applied to the datasets of the 17 states. The research done in this paper proves that the model has universal application and can be applied to various states and institutes across India with different cultural backgrounds and course structures. This paper also explores and proves statistically that there is no significant difference in the Indian Education System with respect to states as far as prediction of the employability of students is concerned. The model provides a generalized solution for student employability prediction in the Indian scenario.	Translated: 2024-08-05 01:35:56 Published: 2024-06-05
# SlicerChat: Building a Local Chatbot for 3D Slicer ( http://arxiv.org/abs/2407.11987v1 ) License: see linked page	Colton Barr	3D Slicer is a powerful platform for 3D data visualization and analysis, but has a significant learning curve for new users. Generative AI applications, such as ChatGPT, have emerged as a potential method of bridging the gap between various sources of documentation using natural language. The limited exposure of LLM services to 3D Slicer documentation, however, means that ChatGPT and related services tend to suffer from significant hallucination. The objective of this project is to build a chatbot architecture, called SlicerChat, that is optimized to answer 3D Slicer-related questions and able to run locally using an open-source model. The core research questions explored in this work revolve around the answer quality and speed differences due to fine-tuning, model size, and the type of domain knowledge included in the prompt. A prototype SlicerChat system was built as a custom extension in 3D Slicer based on the Code-Llama Instruct architecture. Models of size 1.1B, 7B and 13B were fine-tuned using Low-Rank Adaptation, and various sources of 3D Slicer documentation were compiled for use in a Retrieval Augmented Generation paradigm. Testing combinations of fine-tuning and model sizes on a benchmark dataset of five 3D Slicer questions revealed that fine-tuning had no impact on model performance or speed compared to the base architecture, and that larger models performed better with a significant speed decrease. Experiments with adding 3D Slicer documentation to the prompt showed that Python sample code and Markdown documentation were the most useful information to include, but that adding 3D Slicer scene data and questions taken from Discourse also improved model performance. In conclusion, this project shows the potential for integrating a high-quality, local chatbot directly into 3D Slicer to help new users and experienced developers alike use the software more efficiently.	Translated: 2024-07-22 11:50:18 Published: 2024-06-05
# Generating Harder Cross-document Event Coreference Resolution Datasets using Metaphoric Paraphrasing ( http://arxiv.org/abs/2407.11988v1 ) License: see linked page	Shafiuddin Rehan Ahmed, Zhiyong Eric Wang, George Arthur Baker, Kevin Stowe, James H. Martin	The most popular Cross-Document Event Coreference Resolution (CDEC) datasets fail to convey the true difficulty of the task, due to the lack of lexical diversity between coreferring event triggers (words or phrases that refer to an event). Furthermore, there is a dearth of event datasets for figurative language, limiting a crucial avenue of research in event comprehension. We address these two issues by introducing ECB+META, a lexically rich variant of Event Coref Bank Plus (ECB+) for CDEC on symbolic and metaphoric language. We use ChatGPT as a tool for the metaphoric transformation of sentences in the documents of ECB+, then tag the original event triggers in the transformed sentences in a semi-automated manner. In this way, we avoid the re-annotation of expensive coreference links. We present results showing that existing methods that work well on ECB+ struggle with ECB+META, thereby paving the way for CDEC research on a much more challenging dataset. Code/data: https://github.com/ahmeshaf/llms_coref	Translated: 2024-07-22 11:30:12 Published: 2024-06-05
# One Queue Is All You Need: Resolving Head-of-Line Blocking in Large Language Model Serving ( http://arxiv.org/abs/2407.00047v1 ) License: see linked page	Archit Patke, Dhemath Reddy, Saurabh Jha, Haoran Qiu, Christian Pinto, Shengkun Cui, Chandra Narayanaswami, Zbigniew Kalbarczyk, Ravishankar Iyer	Large language models (LLMs) have become an increasingly important workload for cloud providers catering to both enterprise and consumer applications. LLM inference requests from these applications have end-to-end latency SLOs that must be adhered to in production settings. However, existing LLM serving systems focus on optimization objectives such as request serving throughput or request execution latency rather than the end-to-end latency SLOs. Achieving end-to-end SLOs for latency-sensitive requests is challenging due to head-of-line (HOL) blocking in the request queue, which results from bursty arrival rates and insufficient resources. To address the above challenge, we propose QLM, a multi-model queue management framework for LLM serving. QLM uses stochastic programming to orchestrate the actions of multiple LLM Serving Operations (LSOs) to reduce HOL blocking and maximize SLO attainment. Specifically, QLM uses the following LSOs: model swapping, request eviction, GPU-CPU state swapping, load balancing, and warm model start. Evaluation on heterogeneous GPU devices and models with a real-world LLM serving dataset shows that QLM improves SLO attainment by 40-90% and throughput by 20-400% while maintaining or improving device utilization compared to other state-of-the-art LLM serving systems.	Translated: 2024-07-07 13:43:41 Published: 2024-06-05
# Enhancing Computational Efficiency of Motor Imagery BCI Classification with Block-Toeplitz Augmented Covariance Matrices and Siegel Metric ( http://arxiv.org/abs/2406.16909v1 ) License: see linked page	Igor Carrara, Theodore Papadopoulo	Electroencephalographic signals are represented as multidimensional datasets. We introduce an enhancement to the augmented covariance method (ACM), exploiting more thoroughly its mathematical properties, in order to improve motor imagery classification. Standard ACM emerges as a combination of phase space reconstruction of dynamical systems and of Riemannian geometry. Indeed, it is based on the construction of a Symmetric Positive Definite (SPD) matrix to improve classification. But this matrix also has a Block-Toeplitz structure that was previously ignored. This work treats such matrices in the real manifold to which they belong: the set of Block-Toeplitz SPD matrices. After some manipulation, this set can be seen as the product of an SPD manifold and a Siegel Disk Space. The proposed methodology was tested using the MOABB framework with a within-session evaluation procedure. It achieves a similar classification performance to ACM, which is typically better than, or at worst comparable to, state-of-the-art methods. It also improves computational efficiency over ACM, making it even more suitable for real-time experiments.	Translated: 2024-07-01 06:41:31 Published: 2024-06-05
# Mind's Eye: Image Recognition by EEG via Multimodal Similarity-Keeping Contrastive Learning ( http://arxiv.org/abs/2406.16910v1 ) License: see linked page	Chi-Sheng Chen, Chun-Shu Wei	Decoding images from non-invasive electroencephalographic (EEG) signals has been a grand challenge in understanding how the human brain processes visual information in real-world scenarios. To cope with the issues of signal-to-noise ratio and nonstationarity, this paper introduces a MUltimodal Similarity-keeping contrastivE learning (MUSE) framework for zero-shot EEG-based image classification. We develop a series of multivariate time-series encoders tailored for EEG signals and assess the efficacy of regularized contrastive EEG-Image pretraining using an extensive visual EEG dataset. Our method achieves state-of-the-art performance, with a top-1 accuracy of 19.3% and a top-5 accuracy of 48.8% in 200-way zero-shot image classification. Furthermore, we visualize neural patterns via model interpretation, shedding light on the visual processing dynamics in the human brain. The code repository for this work is available at: https://github.com/ChiShengChen/MUSE_EEG.	Translated: 2024-07-01 06:31:46 Published: 2024-06-05
# Measurement of dynamic nonlocal deformation using nanodiamond sensors ( http://arxiv.org/abs/2406.18577v1 ) License: see linked page	Yue Cui, Weng-Hang Leong, Guoli Zhu, Ren-Bao Liu, Quan Li	Nonlocal deformation sensing achieved by integrating atomic force microscopy indentation with nanodiamond-based orientation tracking features high precision and high spatial resolution, providing a useful technique for studying the mechanical properties of soft biological systems. However, this technique is currently limited to lifeless systems because it cannot differentiate the indentation-induced deformation from that associated with live activities or other external perturbations. Here we develop a dynamic nonlocal deformation sensing method using oscillatory nanoindentation and spectroscopic analysis to overcome this limitation. The method realizes both temporally and spatially resolved mechanical analysis, with tens of microsecond time-lag precision, nanometer vertical deformation precision, and sub-hundred nanometer lateral spatial resolution, leading to the disclosure of surface/interface effects in the mechanical response of viscoelastic materials and live cells. Neglecting surface tension would underestimate the liquid-like characteristics of the materials. This work demonstrates nanodiamond sensors as a useful tool for spatial-temporal mechanical analysis of soft, complex bio-relevant materials.	Translated: 2024-07-01 05:50:36 Published: 2024-06-05
# Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching ( http://arxiv.org/abs/2406.18579v1 ) License: see linked page	Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Jie Wang, Joemon M. Jose	Image-text matching (ITM) is a fundamental problem in computer vision. The key issue lies in jointly learning the visual and textual representation to estimate their similarity accurately. Most existing methods focus on feature enhancement within modality or feature interaction across modalities, which, however, neglects the contextual information of the object representation based on the inter-object relationships that match the corresponding sentences with rich contextual semantics. In this paper, we propose a Hybrid-modal Interaction with multiple Relational Enhancements (termed Hire) for image-text matching, which correlates the intra- and inter-modal semantics between objects and words with implicit and explicit relationship modelling. In particular, the explicit intra-modal spatial-semantic graph-based reasoning network is designed to improve the contextual representation of visual objects with salient spatial and semantic relational connectivities, guided by the explicit relationships of the objects' spatial positions and their scene graph. We use implicit relationship modelling for potential relationship interactions before explicit modelling to improve the fault tolerance of explicit relationship detection. Then the visual and textual semantic representations are refined jointly via inter-modal interactive attention and cross-modal alignment. To correlate the context of objects with the textual context, we further refine the visual semantic representation via cross-level object-sentence and word-image-based interactive attention. Extensive experiments validate that the proposed hybrid-modal interaction with implicit and explicit modelling is more beneficial for image-text matching. The proposed Hire obtains new state-of-the-art results on MS-COCO and Flickr30K benchmarks.	Translated: 2024-07-01 05:50:36 Published: 2024-06-05
# Shedding Light on Large Generative Networks: Estimating Epistemic Uncertainty in Diffusion Models ( http://arxiv.org/abs/2406.18580v1 ) License: see linked page	Lucas Berry, Axel Brando, David Meger	Generative diffusion models, notable for their large parameter count (exceeding 100 million) and operation within high-dimensional image spaces, pose significant challenges for traditional uncertainty estimation methods due to computational demands. In this work, we introduce an innovative framework, Diffusion Ensembles for Capturing Uncertainty (DECU), designed for estimating epistemic uncertainty for diffusion models. The DECU framework introduces a novel method that efficiently trains ensembles of conditional diffusion models by incorporating a static set of pre-trained parameters, drastically reducing the computational burden and the number of parameters that require training. Additionally, DECU employs Pairwise-Distance Estimators (PaiDEs) to accurately measure epistemic uncertainty by evaluating the mutual information between model outputs and weights in high-dimensional spaces. The effectiveness of this framework is demonstrated through experiments on the ImageNet dataset, highlighting its capability to capture epistemic uncertainty, specifically in under-sampled image classes.	Translated: 2024-07-01 05:50:36 Published: 2024-06-05
# Dream-in-Style: Text-to-3D Generation using Stylized Score Distillation ( http://arxiv.org/abs/2406.18581v1 ) License: see linked page	Hubert Kompanowski, Binh-Son Hua	We present a method to generate 3D objects in styles. Our method takes a text prompt and a style reference image as input and reconstructs a neural radiance field to synthesize a 3D model with the content aligning with the text prompt and the style following the reference image. To simultaneously generate the 3D object and perform style transfer in one go, we propose a stylized score distillation loss to guide a text-to-3D optimization process to output visually plausible geometry and appearance. Our stylized score distillation is based on a combination of an original pretrained text-to-image model and its modified sibling with the key and value features of self-attention layers manipulated to inject styles from the reference image. Comparisons with state-of-the-art methods demonstrate the strong visual performance of our method, further supported by the quantitative results from our user study.	Translated: 2024-07-01 05:50:36 Published: 2024-06-05
# Canonical Consolidation Fields: Reconstructing Dynamic Shapes from Point Clouds ( http://arxiv.org/abs/2406.18582v1 ) License: see linked page	Miaowei Wang, Changjian Li, Amir Vaxman	We present Canonical Consolidation Fields (CanFields): a method for reconstructing a time series of independently-sampled point clouds into a single deforming coherent shape. Such input often comes from motion capture. Existing methods either couple the geometry and the deformation, thereby smoothing fine details and losing the ability to track moving points, or they track the deformation explicitly, but introduce topological and geometric artifacts. Our novelty lies in the consolidation of the point clouds into a single canonical shape in a way that reduces the effect of noise and outliers, and enables us to overcome missing regions. We simultaneously reconstruct the velocity fields that guide the deformation. This consolidation allows us to retain the high-frequency details of the geometry, while faithfully reproducing the low-frequency deformation. Our architecture comprises simple components, and fits any single input shape without using datasets. We demonstrate the robustness and accuracy of our methods on a diverse benchmark of dynamic point clouds, including missing regions, sparse frames, and noise.	Translated: 2024-07-01 05:40:31 Published: 2024-06-05
# Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT ( http://arxiv.org/abs/2406.18583v1 ) License: see linked page	Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao	Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lumina-Next, an improved version of Lumina-T2X, showcasing stronger generation performance with increased training and inference efficiency. We begin with a comprehensive analysis of the Flag-DiT architecture and identify several suboptimal components, which we address by introducing the Next-DiT architecture with 3D RoPE and sandwich normalizations. To enable better resolution extrapolation, we thoroughly compare different context extrapolation methods applied to text-to-image generation with 3D RoPE, and propose Frequency- and Time-Aware Scaled RoPE tailored for diffusion transformers. Additionally, we introduce a sigmoid time discretization schedule to reduce sampling steps in solving the Flow ODE and the Context Drop method to merge redundant visual tokens for faster network evaluation, effectively boosting the overall sampling speed. Thanks to these improvements, Lumina-Next not only improves the quality and efficiency of basic text-to-image generation but also demonstrates superior resolution extrapolation capabilities and multilingual generation using decoder-based LLMs as the text encoder, all in a zero-shot manner. To further validate Lumina-Next as a versatile generative framework, we instantiate it on diverse tasks including visual recognition, multi-view, audio, music, and point cloud generation, showcasing strong performance across these domains. By releasing all code and model weights, we aim to advance the development of next-generation generative AI capable of universal modeling.	Translated: 2024-07-01 05:40:31 Published: 2024-06-05
# ロボットマニピュレーションのための不変マッチングを用いたワンショット模倣学習 One-Shot Imitation Learning with Invariance Matching for Robotic Manipulation ( http://arxiv.org/abs/2405.13178v2 ) ライセンス: Link先を確認	Xinyu Zhang, Abdeslam Boularias,	(参考訳) 多様な操作タスクを実行できる単一の普遍的なポリシーを学ぶことは、ロボティクスにおける有望な新しい方向性である。しかし、既存のテクニックは、トレーニング中に遭遇したタスクのみを実行することができ、新しいタスクを学ぶために多数のデモを必要とする学習ポリシーに限られている。一方、人間は1つの無意味なデモンストレーションから新しいタスクを学ぶことができる。そこで本研究では,IMOP(Invariance-Matching One-shot Policy Learning)アルゴリズムを提案する。エンドエフェクタのポーズを直接学習する標準的なプラクティスとは対照的に、IMOPはまず与えられたタスクの状態空間の不変領域を学習し、次にデモとテストシーン間の不変領域をマッチングしてエンドエフェクタのポーズを計算する。 IMOPは18のRLBenchタスクで訓練され、18のタスクで平均4.5%、最先端のタスクを継続的に上回る成功率を達成した。さらに重要なことは、IMOPは1つの未発表のデモから新しいタスクを学習でき、微調整なしで、9つのカテゴリで選択された22の新規タスクに対して、最先端のタスクよりも11.5\%の平均的な成功率の向上を達成することができる。 IMOPはまた、新しい形状に一般化し、デモと異なるオブジェクトを操作することを学べる。さらに、IMOPは1つの実ロボットデモを用いて、ワンショットのsim-to-real転送を行うことができる。 Learning a single universal policy that can perform a diverse set of manipulation tasks is a promising new direction in robotics. However, existing techniques are limited to learning policies that can only perform tasks that are encountered during training, and require a large number of demonstrations to learn new tasks. Humans, on the other hand, often can learn a new task from a single unannotated demonstration. In this work, we propose the Invariance-Matching One-shot Policy Learning (IMOP) algorithm. In contrast to the standard practice of learning the end-effector's pose directly, IMOP first learns invariant regions of the state space for a given task, and then computes the end-effector's pose through matching the invariant regions between demonstrations and test scenes. Trained on the 18 RLBench tasks, IMOP achieves a success rate that outperforms the state-of-the-art consistently, by 4.5% on average over the 18 tasks. 
More importantly, IMOP can learn a novel task from a single unannotated demonstration without any fine-tuning, and achieves an average success rate improvement of 11.5% over the state-of-the-art on 22 novel tasks selected across nine categories. IMOP can also generalize to new shapes and learn to manipulate objects that are different from those in the demonstration. Further, IMOP can perform one-shot sim-to-real transfer using a single real-robot demonstration.	翻訳日:2024-06-23 14:05:12 公開日:2024-06-05
# 乱流中の遊泳のためのアクター・クリティック強化学習における物理情報に基づくクリティック Physics-Informed Critic in an Actor-Critic Reinforcement Learning for Swimming in Turbulence ( http://arxiv.org/abs/2406.10242v1 ) ライセンス: Link先を確認	Christopher Koh, Laurent Pagnier, Michael Chertkov,	(参考訳) 乱流拡散は、近接して置かれた粒子同士を引き離す。本研究では、受動的に移流される相手の粒子の近くに粒子を維持するために必要な遊泳の労力について検討する。新しい物理情報強化学習(PIRL)戦略を開発し、所定の制御(PC)戦略および標準的な物理非依存の強化学習戦略と比較することにより、これらの労力と意図した目標との最適なバランスを探る。我々のPIRLスキームはActor-Physicistと呼ばれ、Actor-Criticアルゴリズムの適応であり、ニューラルネットワークでパラメータ化されたCriticを解析的に導出された物理的ヒューリスティック関数(physicist)に置き換える。この戦略は、確率的最適制御の定式化から解析的に計算された最適PCポリシー、および標準的な物理非依存のアクター・クリティック型アルゴリズムと比較される。 Turbulent diffusion causes particles placed in proximity to separate. We investigate the required swimming efforts to maintain a particle close to its passively advected counterpart. We explore optimally balancing these efforts with the intended goal by developing and comparing a novel Physics-Informed Reinforcement Learning (PIRL) strategy with prescribed control (PC) and standard physics-agnostic Reinforcement Learning strategies. Our PIRL scheme, coined the Actor-Physicist, is an adaptation of the Actor-Critic algorithm in which the Neural Network parameterized Critic is replaced with an analytically derived physical heuristic function (the physicist). This strategy is then compared with an analytically computed optimal PC policy derived from a stochastic optimal control formulation and standard physics-agnostic Actor-Critic type algorithms.	翻訳日:2024-06-23 13:35:51 公開日:2024-06-05
# フェイクニュースの検出における大規模言語モデルの有効性の評価:比較分析 Evaluating the Efficacy of Large Language Models in Detecting Fake News: A Comparative Analysis ( http://arxiv.org/abs/2406.06584v1 ) ライセンス: Link先を確認	Sahas Koka, Anthony Vuong, Anish Kataria,	(参考訳) 人工知能の影響がますます高まる時代において、偽ニュースの検出は特に、誤報が社会に重大な影響を及ぼす選挙シーズンのような文脈において重要である。本研究では,偽ニュースコンテンツの識別・フィルタリングにおける各種LLMの有効性について検討した。比較分析アプローチを用いて、GPT-4、Claude 3 Sonnet、Gemini Pro 1.0、Mistral Largeの4つの大きなLLMと、Gemma 7BとMistral 7Bの2つの小さなLLMをテストした。 Kaggleのフェイクニュースデータセットのサンプルを使用することで、この研究はフェイクニュース検出におけるLLMの現在の能力と限界に光を当てるだけでなく、AI駆動の情報整合性向上における開発者や政策立案者の影響についても議論する。 In an era increasingly influenced by artificial intelligence, the detection of fake news is crucial, especially in contexts like election seasons where misinformation can have significant societal impacts. This study evaluates the effectiveness of various LLMs in identifying and filtering fake news content. Utilizing a comparative analysis approach, we tested four large LLMs -- GPT-4, Claude 3 Sonnet, Gemini Pro 1.0, and Mistral Large -- and two smaller LLMs -- Gemma 7B and Mistral 7B. By using fake news dataset samples from Kaggle, this research not only sheds light on the current capabilities and limitations of LLMs in fake news detection but also discusses the implications for developers and policymakers in enhancing AI-driven informational integrity.	翻訳日:2024-06-12 21:24:05 公開日:2024-06-05
# 離散時間力学系の解釈可能なモデルに対する表現的記号回帰 Expressive Symbolic Regression for Interpretable Models of Discrete-Time Dynamical Systems ( http://arxiv.org/abs/2406.06585v1 ) ライセンス: Link先を確認	Adarsh Iyer, Nibodh Boddupalli, Jeff Moehlis,	(参考訳) 離散時間力学系(反復写像)を定義する解釈可能な数学的表現は、科学的に関心のある多くの現象をモデル化することができ、システムの振る舞いをより深く理解することができる。第一原理から支配方程式を定式化するのは難しい場合があるため,データストリームのみが与えられた反復写像の表現を識別することが特に重要である。本研究では,この課題に対して,文献中の他のアーキテクチャよりも表現力の高い,修正版のSymbolic Artificial Neural Network-Trained Expressions(SymANNTEx)アーキテクチャを検討する。回帰を最適化するためにモデルパイプラインを修正し、いくつかの古典的なカオス写像を識別する際の調整後モデルの挙動を特徴付ける。簡潔性を目的として、スパース性を誘導する重み正則化と情報理論に基づく簡略化を実装する。修正したSymANNTExモデルは,単一状態の写像を適切に識別し,二状態のアトラクタの近似に中程度の成功を収めることを示す。これらの性能は、データ駆動の科学的発見と解釈に大きな期待をもたらす。 Interpretable mathematical expressions defining discrete-time dynamical systems (iterated maps) can model many phenomena of scientific interest, enabling a deeper understanding of system behaviors. Since formulating governing expressions from first principles can be difficult, it is of particular interest to identify expressions for iterated maps given only their data streams. In this work, we consider a modified Symbolic Artificial Neural Network-Trained Expressions (SymANNTEx) architecture for this task, an architecture more expressive than others in the literature. We make a modification to the model pipeline to optimize the regression, then characterize the behavior of the adjusted model in identifying several classical chaotic maps. With the goal of parsimony, sparsity-inducing weight regularization and information theory-informed simplification are implemented. We show that our modified SymANNTEx model properly identifies single-state maps and achieves moderate success in approximating a dual-state attractor. These performances offer significant promise for data-driven scientific discovery and interpretation.	翻訳日:2024-06-12 21:24:05 公開日:2024-06-05
# Bi-Chainer: 双方向チェインで推論する大規模言語モデルを自動化する Bi-Chainer: Automated Large Language Models Reasoning with Bidirectional Chaining ( http://arxiv.org/abs/2406.06586v1 ) ライセンス: Link先を確認	Shuqi Liu, Bowei He, Linqi Song,	(参考訳) 大規模言語モデル(LLM)は人間のような推論能力を示しているが、複雑な論理問題を解く上ではまだ課題に直面している。前方連鎖や後方連鎖のような既存の一方向連鎖法は、予測精度や効率の低さといった問題に悩まされる。そこで本研究では,双方向連鎖手法であるBi-Chainerを提案する。Bi-Chainerは、現在の方向で複数の分岐選択肢に遭遇した際に、反対の推論方向での深さ優先推論に動的に切り替える。これにより、中間推論結果をガイダンスとして利用して推論プロセスを容易にすることができる。 Bi-Chainerは,4つの挑戦的な論理推論データセットにおいて,一方向連鎖フレームワークに対して大幅な精度向上を達成することを示す。さらに、Bi-Chainerは中間証明ステップの精度を高め、推論呼び出しの平均回数を減らし、より効率的で正確な推論を実現する。 Large Language Models (LLMs) have shown human-like reasoning abilities but still face challenges in solving complex logical problems. Existing unidirectional chaining methods, such as forward chaining and backward chaining, suffer from issues like low prediction accuracy and efficiency. To address these, we propose a bidirectional chaining method, Bi-Chainer, which dynamically switches to depth-first reasoning in the opposite reasoning direction when it encounters multiple branching options within the current direction. Thus, the intermediate reasoning results can be utilized as guidance to facilitate the reasoning process. We show that Bi-Chainer achieves sizable accuracy boosts over unidirectional chaining frameworks on four challenging logical reasoning datasets. Moreover, Bi-Chainer enhances the accuracy of intermediate proof steps and reduces the average number of inference calls, resulting in more efficient and accurate reasoning.	翻訳日:2024-06-12 21:24:05 公開日:2024-06-05
# 感覚体験における人間とAIの知覚アライメントの探索:LLMは繊維の風合いを理解するか? Exploring Human-AI Perception Alignment in Sensory Experiences: Do LLMs Understand Textile Hand? ( http://arxiv.org/abs/2406.06587v1 ) ライセンス: Link先を確認	Shu Zhong, Elia Gatti, Youngjun Cho, Marianna Obrist,	(参考訳) 大規模言語モデル(LLM)の振る舞いを人間の意図に合わせることは、将来のAIにとって重要である。このアライメントの重要かつしばしば見落とされがちな側面は知覚アライメントである。触覚のような知覚モダリティは、視覚のような他の感覚モダリティよりも多面的で微妙である。本研究は,「textile hand(繊維の風合い)」タスクを用いて,LLMが人間の触覚体験とどの程度整合するかを検討する。私たちは"Guess What Textile"インタラクションを作成し、参加者には2つの繊維サンプル(ターゲットと参照)を手渡した。参加者はそれらを見ることなく、両者の違いをLLMに説明した。これらの記述を用いて、LLMは、その高次元埋め込み空間内での類似性を評価することによって、ターゲット繊維の同定を試みた。以上の結果から, ある程度の知覚的アライメントは存在するが, 繊維サンプルによって大きく異なることが示唆された。例えば、LLMの予測は絹のサテンではよく整合するが、綿のデニムではそうではない。さらに, 参加者は、自身の繊維体験がLLMの予測とよく一致しているとは感じなかった。これは触覚に関する知覚アライメントの最初の探索にすぎず、textile handを例として示したものである。このアライメントのばらつきの考えられる要因と、人間とAIの知覚的アライメントの向上が将来の日常業務にどのように役立つかについて議論する。 Aligning the behaviour of large language models (LLMs) with human intent is critical for future AI. An important yet often overlooked aspect of this alignment is the perceptual alignment. Perceptual modalities like touch are more multifaceted and nuanced compared to other sensory modalities such as vision. This work investigates how well LLMs align with human touch experiences using the "textile hand" task. We created a "Guess What Textile" interaction in which participants were given two textile samples -- a target and a reference -- to handle. Without seeing them, participants described the differences between them to the LLM. Using these descriptions, the LLM attempted to identify the target textile by assessing similarity within its high-dimensional embedding space. Our results suggest that a degree of perceptual alignment exists, but it varies significantly among different textile samples. For example, LLM predictions are well aligned for silk satin, but not for cotton denim. Moreover, participants didn't perceive their textile experiences closely matched by the LLM predictions. 
This is only the first exploration into perceptual alignment around touch, exemplified through textile hand. We discuss possible sources of this alignment variance, and how better human-AI perceptual alignment can benefit future everyday tasks.	翻訳日:2024-06-12 21:14:20 公開日:2024-06-05
# Llama大言語モデルの創発的シンボリック推論能力の評価 Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models ( http://arxiv.org/abs/2406.06588v1 ) ライセンス: Link先を確認	Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti,	(参考訳) 大規模言語モデル (LLM) は,ユーザと流暢にチャットすることだけを目的に訓練されることが多いにもかかわらず,幅広いタスクにおいて優れたパフォーマンスを実現している。その他のスキルの中で、LLMは数学的推論ベンチマークにおいて創発的な能力を示し、これは適切なプロンプト法によって引き出すことができる。本研究では,様々なシンボリック推論タスクにおいて,人気のあるオープンソースLLMの能力と限界を体系的に検討する。 Llama 2 ファミリーの3つのモデルについて,難易度の異なる数式を解く必要がある2つのデータセットで評価した。汎用LLM(Llama 2 Chat)と、数学的問題に対処するために特別に設計されたLlama 2の2つの微調整版(MAmmoTHとMetaMath)をテストした。モデルの規模を拡大することと、関連するタスクで微調整することの両方が、パフォーマンスの大幅な向上につながることを観察した。さらに, 細粒度の評価尺度を用いることで, こうした性能向上は主に複雑度の低い数式で観察されることがわかった。ただし、それらの数式でさえ、最大の微調整モデルにとって依然として難しい場合が多い。 Large Language Models (LLMs) achieve impressive performance in a wide range of tasks, even if they are often trained with the only objective of chatting fluently with users. Among other skills, LLMs show emergent abilities in mathematical reasoning benchmarks, which can be elicited with appropriate prompting methods. In this work, we systematically investigate the capabilities and limitations of popular open-source LLMs on different symbolic reasoning tasks. We evaluate three models of the Llama 2 family on two datasets that require solving mathematical formulas of varying degrees of difficulty. We test a generalist LLM (Llama 2 Chat) as well as two fine-tuned versions of Llama 2 (MAmmoTH and MetaMath) specifically designed to tackle mathematical problems. We observe that both increasing the scale of the model and fine-tuning it on relevant tasks lead to significant performance gains. Furthermore, using fine-grained evaluation measures, we find that such performance gains are mostly observed with mathematical formulas of low complexity, which nevertheless often remain challenging even for the largest fine-tuned models.	翻訳日:2024-06-12 21:14:20 公開日:2024-06-05
# PatentEval: 特許生成におけるエラーを理解する PatentEval: Understanding Errors in Patent Generation ( http://arxiv.org/abs/2406.06589v1 ) ライセンス: Link先を確認	You Zuo, Kim Gerdes, Eric Villemonte de La Clergerie, Benoît Sagot,	(参考訳) 本研究では,機械が生成する特許文書における2つの異なるタスク,すなわちクレームからアブストラクトへの生成と,先行するクレームが与えられた場合の次のクレームの生成を評価するために特別に設計された,包括的なエラー類型(タイポロジー)を提案する。我々はまた,この文脈で言語モデルを体系的に評価するためのベンチマークであるPatentEvalを開発した。我々の研究は、人間がアノテーションを付けた、様々なモデルの比較分析を含む。対象となるモデルは、特許ドメイン内のタスク向けに訓練時に特別に適応されたものから、最新の汎用大規模言語モデル(LLM)まで様々である。さらに,特許文書評価における人間の判断を近似するいくつかの指標を調査・評価し,これらの指標が専門家の評価とどの程度一致しているかを分析した。これらのアプローチは、特許テキスト生成という専門分野における現在の言語モデルの能力と限界に関する貴重な洞察を提供する。 In this work, we introduce a comprehensive error typology specifically designed for evaluating two distinct tasks in machine-generated patent texts: claims-to-abstract generation, and the generation of the next claim given previous ones. We have also developed a benchmark, PatentEval, for systematically assessing language models in this context. Our study includes a comparative analysis, annotated by humans, of various models. These range from those specifically adapted during training for tasks within the patent domain to the latest general-purpose large language models (LLMs). Furthermore, we explored and evaluated some metrics to approximate human judgments in patent text evaluation, analyzing the extent to which these metrics align with expert assessments. These approaches provide valuable insights into the capabilities and limitations of current language models in the specialized field of patent text generation.	翻訳日:2024-06-12 21:14:20 公開日:2024-06-05
# LLMは古典的か非単調的か?ジェネリクスから学ぶ Are LLMs classical or nonmonotonic reasoners? Lessons from generics ( http://arxiv.org/abs/2406.06590v1 ) ライセンス: Link先を確認	Alina Leidinger, Robert van Rooij, Ekaterina Shutova,	(参考訳) LLMにおける推論に関する最近の研究は、印象的な性能と、機械または人間のフィードバックに対する柔軟な適応の証拠を提供している。現実世界をナビゲートするために人間の認知に不可欠な非単調な推論は、難しいが未調査の課題である。本研究では,7つの最先端LLMの非単調な推論能力について,「Birds fly」のようなジェネリクスと「Penguins don't fly」のような例外を扱う,1つの抽象的推論タスクと1つの常識的推論タスクで検討する(図1参照)。 LLMは人間の非単調な推論能力に沿った推論パターンを示すが、支持する例("Owls fly")や無関係な情報("Lions have manes")が追加されると、ジェネリクスの真理条件に対する安定した信念を維持できない。我々の研究は、人間の推論行動をLLMに帰属させることや、一般的な能力を評価することの落とし穴を浮き彫りにしており、一貫した推論は依然として実現されていない。 Recent scholarship on reasoning in LLMs has supplied evidence of impressive performance and flexible adaptation to machine generated or human feedback. Nonmonotonic reasoning, crucial to human cognition for navigating the real world, remains a challenging, yet understudied task. In this work, we study nonmonotonic reasoning capabilities of seven state-of-the-art LLMs in one abstract and one commonsense reasoning task featuring generics, such as 'Birds fly', and exceptions, 'Penguins don't fly' (see Fig. 1). While LLMs exhibit reasoning patterns in accordance with human nonmonotonic reasoning abilities, they fail to maintain stable beliefs on truth conditions of generics at the addition of supporting examples ('Owls fly') or unrelated information ('Lions have manes'). Our findings highlight pitfalls in attributing human reasoning behaviours to LLMs, as well as assessing general capabilities, while consistent reasoning remains elusive.	翻訳日:2024-06-12 21:14:20 公開日:2024-06-05
# 肺癌病期分類における放射線学レポートのTNM分類の高度化のための多言語大言語モデルの検討 Exploring Multilingual Large Language Models for Enhanced TNM classification of Radiology Report in lung cancer staging ( http://arxiv.org/abs/2406.06591v1 ) ライセンス: Link先を確認	Hidetoshi Matsuo, Mizuho Nishio, Takaaki Matsunaga, Koji Fujimoto, Takamichi Murakami,	(参考訳) 背景: 構造化に労力がかかることと物語形式の報告が主流であることから, 構造化された放射線学レポートは未発達のままである。ディープラーニング、特にGPT-3.5のような大規模言語モデル(LLM)は、自然言語による放射線学レポートの構造化を自動化することを約束している。しかし、LLMは英語以外の言語では効果が低いことが報告されているものの、その放射線学における性能は広く研究されていない。目的: 本研究は, GPT3.5-turbo (GPT3.5) を用いた放射線学レポートに基づくTNM分類の精度と、日本語と英語の双方における多言語LLMの有用性について検討することを目的とした。対象と方法:GPT3.5を用いて肺がんの胸部CTレポートからTNM分類を自動的に生成するシステムを開発し,その性能を評価した。一般化線形混合モデルを用いて,両言語で完全あるいは部分的なTNM定義を提供することによる影響を統計的に分析した。結果: TNM の完全定義と英語の放射線学レポートにより、最も高い精度が得られた(M = 94%, N = 80%, T = 47%, ALL = 36%)。 T, N, M の各因子の定義を与えると、それぞれの精度が統計的に改善した(T: odds ratio (OR) = 2.35, p < 0.001; N: OR = 1.94, p < 0.01; M: OR = 2.50, p < 0.001)。日本語のレポートでは、NとMの精度が低下した(Nの精度:OR = 0.74、Mの精度:OR = 0.21)。結論:本研究は,放射線学レポートにおけるTNM自動分類に対する多言語LLMの可能性を示している。追加のモデルトレーニングがなくても、提供されたTNM定義により性能が向上し、放射線学の文脈におけるLLMの関連性が示唆された。 Background: Structured radiology reporting remains underdeveloped due to labor-intensive structuring and narrative-style reporting. Deep learning, particularly large language models (LLMs) like GPT-3.5, offers promise in automating the structuring of radiology reports in natural languages. However, although it has been reported that LLMs are less effective in languages other than English, their radiological performance has not been extensively studied. Purpose: This study aimed to investigate the accuracy of TNM classification based on radiology reports using GPT3.5-turbo (GPT3.5) and the utility of multilingual LLMs in both Japanese and English. Material and Methods: Utilizing GPT3.5, we developed a system to automatically generate TNM classifications from chest CT reports for lung cancer and evaluated its performance. 
We statistically analyzed the impact of providing full or partial TNM definitions in both languages using a Generalized Linear Mixed Model. Results: Highest accuracy was attained with full TNM definitions and radiology reports in English (M = 94%, N = 80%, T = 47%, and ALL = 36%). Providing definitions for each of the T, N, and M factors statistically improved their respective accuracies (T: odds ratio (OR) = 2.35, p < 0.001; N: OR = 1.94, p < 0.01; M: OR = 2.50, p < 0.001). Japanese reports exhibited decreased N and M accuracies (N accuracy: OR = 0.74 and M accuracy: OR = 0.21). Conclusion: This study underscores the potential of multilingual LLMs for automatic TNM classification in radiology reports. Even without additional model training, performance improvements were evident with the provided TNM definitions, indicating LLMs' relevance in radiology contexts.	翻訳日:2024-06-12 21:14:20 公開日:2024-06-05
# 自動プロセススーパービジョンによる言語モデルの数学的推論の改善 Improve Mathematical Reasoning in Language Models by Automated Process Supervision ( http://arxiv.org/abs/2406.06592v1 ) ライセンス: Link先を確認	Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi,	(参考訳) 数学的問題の解決やコード生成といった複雑な多段階推論タスクは、最も先進的な大規模言語モデル(LLM)でさえも大きなハードルとなる。 LLMの出力をORM(Outcome Reward Model)で検証することは、LLMの推論性能を向上させるための標準的な推論時技術である。しかし、これは、中間結果に対して適切に報酬も罰則も与えられない、長い、またはマルチホップの推論チェーンを持つ推論タスクには依然として不十分である。プロセス監督は、推論プロセス中に中間報酬を割り当てることで、この制限に対処する。これまで、プロセス監督データの収集に使われた手法は、人間のアノテーションやステップごとのモンテカルロ推定に頼っており、いずれもスケールさせるには非常に高価であるため、この手法の幅広い適用を妨げていた。この課題に対応して,高品質なプロセス監督データの効率的な収集を目的とした,新しい分割統治型のモンテカルロ木探索(MCTS)アルゴリズムであるOmegaPRMを提案する。このアルゴリズムは、二分探索によってChain of Thought(CoT)の最初のエラーを迅速に識別し、正例と負例のバランスをとることで、効率と品質の両立を保証する。その結果、プロセスリワードモデル(Process Reward Model:PRM)をトレーニングするために、150万以上のプロセス監督アノテーションを収集できた。この完全自動化されたプロセス監督と重み付き自己整合性アルゴリズムを併用して、命令チューニングされたGemini Proモデルの数学推論性能を改良し、MATHベンチマークで69.4%の成功率を達成した。これはベースモデル性能の51%から36%の相対的改善である。さらに、プロセス全体が人間の介入なしに動作するため、既存の方法と比較して、我々の手法は金銭的にも計算的にもコスト効率が高い。 Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a lengthy or multi-hop reasoning chain, where the intermediate outcomes are neither properly rewarded nor penalized. Process supervision addresses this limitation by assigning intermediate rewards during the reasoning process. To date, the methods used to collect process supervision data have relied on either human annotation or per-step Monte Carlo estimation, both prohibitively expensive to scale, thus hindering the broad application of this technique. 
In response to this challenge, we propose a novel divide-and-conquer style Monte Carlo Tree Search (MCTS) algorithm named OmegaPRM for the efficient collection of high-quality process supervision data. This algorithm swiftly identifies the first error in the Chain of Thought (CoT) with binary search and balances the positive and negative examples, thereby ensuring both efficiency and quality. As a result, we are able to collect over 1.5 million process supervision annotations to train a Process Reward Model (PRM). Utilizing this fully automated process supervision alongside the weighted self-consistency algorithm, we have enhanced the instruction tuned Gemini Pro model's math reasoning performance, achieving a 69.4% success rate on the MATH benchmark, a 36% relative improvement from the 51% base model performance. Additionally, the entire process operates without any human intervention, making our method both financially and computationally cost-effective compared to existing methods.	翻訳日:2024-06-12 21:14:20 公開日:2024-06-05
# ESBMCによるArm(R) Confidential Computing Architectureのコンポーネント検証 Verifying components of Arm(R) Confidential Computing Architecture with ESBMC ( http://arxiv.org/abs/2406.04375v1 ) ライセンス: Link先を確認	Tong Wu, Shale Xiong, Edoardo Manino, Gareth Stockwell, Lucas C. Cordeiro,	(参考訳) Realm Management Monitor(RMM)は、Arm Confidential Computing Architecture(Arm CCA)において重要なファームウェアコンポーネントである。これまでの研究は、RMMの仕様とプロトタイプ参照実装の検証に形式的手法を適用していた。しかし、単一の検証ツールにのみ依存することは、特定のバグや脆弱性の見落としにつながる可能性がある。本稿では,RMMの検証をさらに強化するために、最先端のSMT(Satisfiability Modulo Theories)ベースのソフトウェアモデルチェッカーであるESBMCを適用することについて述べる。 ESBMCがソースコードを正確に解析し、妥当な時間内で仕様違反を特定できることを示す。さらに,産業技術者にとっての効率を高めるため,ESBMCの潜在的な改善を提案する。この研究は、実世界のシナリオにおける形式的検証技術の能力の探求に寄与し、産業的検証のニーズを満たすためのさらなる改善の道筋を提案する。 Realm Management Monitor (RMM) is an essential firmware component within the recent Arm Confidential Computing Architecture (Arm CCA). Previous work applies formal techniques to verify the specification and prototype reference implementation of RMM. However, relying solely on a single verification tool may lead to the oversight of certain bugs or vulnerabilities. This paper discusses the application of ESBMC, a state-of-the-art Satisfiability Modulo Theories (SMT)-based software model checker, to further enhance RMM verification. We demonstrate ESBMC's ability to precisely parse the source code and identify specification failures within a reasonable time frame. Moreover, we propose potential improvements for ESBMC to enhance its efficiency for industry engineers. This work contributes to exploring the capabilities of formal verification techniques in real-world scenarios and suggests avenues for further improvements to better meet industrial verification needs.	翻訳日:2024-06-10 18:49:00 公開日:2024-06-05
# グラフニューラルネットワークとマンバの併用による全スライド画像の局所的・大域的組織空間的関係の把握 Combining Graph Neural Network and Mamba to Capture Local and Global Tissue Spatial Relationships in Whole Slide Images ( http://arxiv.org/abs/2406.04377v1 ) ライセンス: Link先を確認	Ruiwen Ding, Kha-Dinh Luong, Erika Rodriguez, Ana Cristina Araujo Lemos da Silva, William Hsu,	(参考訳) 計算病理学では、ギガピクセルの全スライド画像(WSI)から空間的特徴を抽出することが基本的な課題であるが、サイズが大きいため、WSIは通常より小さなタイルに分割される。この分析の重要な側面は、これらのタイルから情報を集約し、WSIレベルで予測することである。本稿では,メッセージパッシンググラフニューラルネットワーク(GNN)と状態空間モデル(Mamba)を組み合わせて,WSIにおけるタイル間の局所的および大域的な空間的関係を捉えるモデルを提案する。早期肺腺癌(LUAD)患者の無増悪生存の予測において、このモデルの有効性が示された。タイルレベルの要約統計に基づくアグリゲーション、マルチインスタンス学習(MIL)ベースのアグリゲーション、GNNベースのアグリゲーション、GNN-Transformerベースのアグリゲーションなど、WSIにおけるタイルレベルの情報アグリゲーションの他の最先端手法と比較した。追加実験では、異なるタイプのノード特徴と異なるタイルサンプリング戦略がモデル性能に与える影響を示した。この手法は、あらゆるWSIベースの分析に容易に拡張できる。コード: https://github.com/rina-ding/gat-mamba In computational pathology, extracting spatial features from gigapixel whole slide images (WSIs) is a fundamental task, but due to their large size, WSIs are typically segmented into smaller tiles. A critical aspect of this analysis is aggregating information from these tiles to make predictions at the WSI level. We introduce a model that combines a message-passing graph neural network (GNN) with a state space model (Mamba) to capture both local and global spatial relationships among the tiles in WSIs. The model's effectiveness was demonstrated in predicting progression-free survival among patients with early-stage lung adenocarcinomas (LUAD). We compared the model with other state-of-the-art methods for tile-level information aggregation in WSIs, including tile-level information summary statistics-based aggregation, multiple instance learning (MIL)-based aggregation, GNN-based aggregation, and GNN-transformer-based aggregation. Additional experiments showed the impact of different types of node features and different tile sampling strategies on the model performance. This work can be easily extended to any WSI-based analysis. 
Code: https://github.com/rina-ding/gat-mamba.	翻訳日:2024-06-10 18:49:00 公開日:2024-06-05
# TIDMAD:AIによる暗黒物質発見のための時系列データセット TIDMAD: Time Series Dataset for Discovering Dark Matter with AI Denoising ( http://arxiv.org/abs/2406.04378v1 ) ライセンス: Link先を確認	J. T. Fry, Aobo Li, Lindley Winslow, Xinyi Hope Fu, Zhenghao Fu, Kaliroe M. W. Pappas,	(参考訳) ダークマターは宇宙の物質の約85%を占めていますが、地球上の実験室では直接観測されていません。ダークマターの起源は、現代物理学において最も重要な問題の一つであり、ダークマターを確実に検出することは、基礎科学におけるノーベル賞レベルのブレークスルーとなるだろう。 ABRACADABRA実験は暗黒物質を探すために特別に設計された。発見はまだされていないが、ABRACADABRAは物理学界で広く支持されている暗黒物質探索の結果を多数生成している。実験では、超長い時系列データを毎秒1000万サンプルの速度で生成し、そこでダークマター信号は超長い時系列の中で正弦波振動モードとして現れる。本稿では、ABRACADABRA実験から得られた包括的なデータリリースであるTIDMADについて、トレーニング、検証、科学サブセットに分割した超長期時系列データセット、直接モデルベンチマークのための慎重に設計されたデノナイズスコア、および物理論文として出版に適したコミュニティ標準ダークマター検索結果を生成する完全な分析フレームワークについて述べる。このデータリリースにより、コアAIアルゴリズムが信号を抽出し、実際の物理結果を生成することにより、基礎科学が前進する。データダウンロードと関連する解析スクリプトはhttps://github.com/jessicafry/TIDMADで公開されている。 Dark matter makes up approximately 85% of total matter in our universe, yet it has never been directly observed in any laboratory on Earth. The origin of dark matter is one of the most important questions in contemporary physics, and a convincing detection of dark matter would be a Nobel-Prize-level breakthrough in fundamental science. The ABRACADABRA experiment was specifically designed to search for dark matter. Although it has not yet made a discovery, ABRACADABRA has produced several dark matter search results widely endorsed by the physics community. The experiment generates ultra-long time-series data at a rate of 10 million samples per second, where the dark matter signal would manifest itself as a sinusoidal oscillation mode within the ultra-long time series. 
In this paper, we present TIDMAD -- a comprehensive data release from the ABRACADABRA experiment including three key components: an ultra-long time series dataset divided into training, validation, and science subsets; a carefully-designed denoising score for direct model benchmarking; and a complete analysis framework which produces a community-standard dark matter search result suitable for publication as a physics paper. This data release enables core AI algorithms to extract the signal and produce real physics results, thereby advancing fundamental science. The data downloading and associated analysis scripts are available at https://github.com/jessicafry/TIDMAD	翻訳日:2024-06-10 18:49:00 公開日:2024-06-05
# 近接量子コンピュータにおけるオープン量子システムの長時間誤差緩和シミュレーション Long-Time Error-Mitigating Simulation of Open Quantum Systems on Near Term Quantum Computers ( http://arxiv.org/abs/2108.01183v2 ) ライセンス: Link先を確認	Brian Rost, Lorenzo Del Re, Nathan Earnest, Alexander F. Kemper, Barbara Jones, James K. Freericks,	(参考訳) 本研究では,最大2千個のエンタングゲートを含むディープ回路においても,ハードウェアエラーに対する堅牢性を示す量子ハードウェア上でのオープン量子システムシミュレーションについて検討する。無限の熱浴に結合した2つの電子系をシミュレートする。 1) 駆動電界における散逸性自由電子の系,及び 2) 磁場中の単一軌道における2つの相互作用する電子の熱化(ハバード原子)。これらの問題はIBMの量子コンピュータを用いて解決され、長い目で見れば忠実度が低下する兆しはない。この結果から, 開放量子系シミュレーションアルゴリズムは, ノイズの多いハードウェア上で, 同様に複雑な非散逸性アルゴリズムをはるかに上回ることができることを示した。我々の2つの例は、駆動散逸型量子多体問題は最終的に量子コンピュータで解決できることを約束している。 We study an open quantum system simulation on quantum hardware, which demonstrates robustness to hardware errors even with deep circuits containing up to two thousand entangling gates. We simulate two systems of electrons coupled to an infinite thermal bath: 1) a system of dissipative free electrons in a driving electric field; and 2) the thermalization of two interacting electrons in a single orbital in a magnetic field -- the Hubbard atom. These problems are solved using IBM quantum computers, showing no signs of decreasing fidelity at long times. Our results demonstrate that algorithms for simulating open quantum systems are able to far outperform similarly complex non-dissipative algorithms on noisy hardware. Our two examples show promise that the driven-dissipative quantum many-body problem can eventually be solved on quantum computers.	翻訳日:2024-06-08 01:27:18 公開日:2024-06-05
# 摂動理論と二乗和 Perturbation Theory and the Sum of Squares ( http://arxiv.org/abs/2205.12325v3 ) ライセンス: Link先を確認	Matthew B. Hastings,	(参考訳) sum-of-squares (SoS) 階層は半定値プログラミングに基づく強力な手法であり、古典的および量子的な最適化問題の両方に利用できる。この階層はいくつかの名前で呼ばれ、特に量子化学では還元密度行列 (reduced density matrix, RDM) 法と呼ばれる。我々は、スピン系(または量子ビット系)、ボゾン系(非調和振動子)、および4次相互作用を持つフェルミオン系の3種類の系について、弱結合摂動理論を再現するこの階層の能力を考える。このようなフェルミオン系に対しては、次数-$4$ SoS(量子化学において$2$-RDMと呼ばれる)は二階摂動理論を再現しないが、次数-$6$ SoS($3$-RDM)は再現することを示す(そして三階摂動理論も再現すると予想する)。実際、これを実現できる次数-$6$ SoSの断片を特定する。この断片は完全な次数-$6$ SoSよりも低コストで実装できる可能性があり、実用的な量子化学計算に有用かもしれない。注目すべきことに、この断片は、Sachdev-Ye-Kitaev(SYK)モデルのためにHastingsとO'Donnellによって研究されたものと非常に似ている。 The sum-of-squares (SoS) hierarchy is a powerful technique based on semi-definite programming that can be used for both classical and quantum optimization problems. This hierarchy goes under several names; in particular, in quantum chemistry it is called the reduced density matrix (RDM) method. We consider the ability of this hierarchy to reproduce weak coupling perturbation theory for three different kinds of systems: spin (or qubit) systems, bosonic systems (the anharmonic oscillator), and fermionic systems with quartic interactions. For such fermionic systems, we show that degree-$4$ SoS (called $2$-RDM in quantum chemistry) does not reproduce second order perturbation theory but degree-$6$ SoS ($3$-RDM) does (and we conjecture that it reproduces third order perturbation theory). Indeed, we identify a fragment of degree-$6$ SoS which can do this, which may be useful for practical quantum chemical calculations as it may be possible to implement this fragment with less cost than the full degree-$6$ SoS. Remarkably, this fragment is very similar to one studied by Hastings and O'Donnell for the Sachdev-Ye-Kitaev (SYK) model.	翻訳日:2024-06-08 01:27:18 公開日:2024-06-05
# 2部ネットワークのための次数補正潜在ブロックモデルの変分推定量 Variational Estimators of the Degree-corrected Latent Block Model for Bipartite Networks ( http://arxiv.org/abs/2206.08465v2 ) ライセンス: Link先を確認	Yunpeng Zhao, Ning Hao, Ji Zhu,	(参考訳) 二部グラフは様々な科学・工学分野に広く見られる。二部グラフ内の2種類のノードをバイクラスタリングによって同時にグループ化することは、そのようなグラフのネットワーク解析における基本的な課題である。潜在ブロックモデル(latent block model、LBM)は、バイクラスタリングのために一般的に用いられるモデルベースのツールである。しかし、LBMの有効性は、データ行列における行と列の和の影響によって制限されることが多い。この制限に対処するために、行クラスタと列クラスタにおける次数のばらつきを考慮した次数補正潜在ブロックモデル(DC-LBM)を導入し、実世界のデータセットとシミュレーションデータでの性能を大幅に向上させる。我々は,Mステップにおけるパラメータ推定の閉形式解を導出することにより,効率的な変分EM(期待値最大化)アルゴリズムを開発した。さらに、DC-LBMの下での変分推定量のラベル一致性と収束率を証明する。この結果は、グラフのサイズが大きくなるにつれて行と列の平均期待次数が無限大に近づく限り、期待グラフ密度がゼロに近づくことを許容する。 Bipartite graphs are ubiquitous across various scientific and engineering fields. Simultaneously grouping the two types of nodes in a bipartite graph via biclustering represents a fundamental challenge in network analysis for such graphs. The latent block model (LBM) is a commonly used model-based tool for biclustering. However, the effectiveness of the LBM is often limited by the influence of row and column sums in the data matrix. To address this limitation, we introduce the degree-corrected latent block model (DC-LBM), which accounts for the varying degrees in row and column clusters, significantly enhancing performance on real-world data sets and simulated data. We develop an efficient variational expectation-maximization algorithm by creating closed-form solutions for parameter estimates in the M steps. Furthermore, we prove the label consistency and the rate of convergence of the variational estimator under the DC-LBM, allowing the expected graph density to approach zero as long as the average expected degrees of rows and columns approach infinity when the size of the graph increases.	翻訳日:2024-06-08 01:19:21 公開日:2024-06-05
# FedCC: モデルポイズニング攻撃に対する頑健な連合学習 FedCC: Robust Federated Learning against Model Poisoning Attacks ( http://arxiv.org/abs/2212.01976v2 ) ライセンス: Link先を確認	Hyejun Jeong, Hamin Son, Seohu Lee, Jayun Hyun, Tai-Myoung Chung,	(参考訳) モデル学習におけるプライバシの懸念に対処するために設計された連合学習(Federated Learning)は、データプライバシを保護する新たな分散パラダイムを導入するが、サーバがローカルデータセットにアクセスできないことと、保護対象がパラメータの完全性へと変わることによって、攻撃面が異なるものになる。堅牢なアグリゲーションアルゴリズムを含む既存のアプローチは、悪意のあるクライアント、特に非IID(独立同分布でない)データを持つクライアントを効果的にフィルタリングできない。さらに、これらのアプローチは非IIDデータとポイズニング攻撃を別々に扱うことが多い。両課題を同時に解決するため,単純だが斬新なアルゴリズムであるFedCCを提案する。FedCCは、最終層の1つ前の層の表現(Penultimate Layer Representations)のCentered Kernel Alignment類似度をクラスタリングに活用し、選択したパラメータを選択的に平均化することにより、非IIDデータ設定でも悪意のあるクライアントを識別およびフィルタリングできる。広範な実験により、非標的型モデルポイズニング攻撃とバックドア攻撃の緩和におけるFedCCの有効性を示す。 FedCCは、既存の外れ値検出ベースおよび1次統計ベースの方法と比較して、攻撃の信頼度を一貫してゼロにまで低減する。具体的には、グローバル性能の平均劣化を65.5%低減する。学習モデルを評価するというこの新たな視点は、FLモデルのセキュリティとプライバシの分野に価値ある貢献をもたらすと信じている。コードは、論文の受理時に公開される予定である。 Federated Learning, designed to address privacy concerns in learning models, introduces a new distributed paradigm that safeguards data privacy but differentiates the attack surface due to the server's inaccessibility to local datasets and the change in protection objective--parameters' integrity. Existing approaches, including robust aggregation algorithms, fail to effectively filter out malicious clients, especially those with non-Independently and Identically Distributed data. Furthermore, these approaches often tackle non-IID data and poisoning attacks separately. To address both challenges simultaneously, we present FedCC, a simple yet novel algorithm. It leverages the Centered Kernel Alignment similarity of Penultimate Layer Representations for clustering, allowing it to identify and filter out malicious clients by selectively averaging chosen parameters, even in non-IID data settings. Our extensive experiments demonstrate the effectiveness of FedCC in mitigating untargeted model poisoning and backdoor attacks. 
FedCC reduces the attack confidence to a consistent zero compared to existing outlier detection-based and first-order statistics-based methods. Specifically, it significantly minimizes the average degradation of global performance by 65.5\%. We believe that this new perspective of assessing learning models makes it a valuable contribution to the field of FL model security and privacy. The code will be made available upon paper acceptance.	翻訳日:2024-06-08 01:19:21 公開日:2024-06-05
# 近所で何が起きているのか? ローカルニュース検出のための弱教師付きアプローチ What's happening in your neighborhood? A Weakly Supervised Approach to Detect Local News ( http://arxiv.org/abs/2301.08146v3 ) ライセンス: Link先を確認	Deven Santosh Shah, Shiying He, Gosuddin Kamaruddin Siddiqi, Radhika Bansal,	(参考訳) ローカルニュース記事は、都市、郡、州などの地理的領域のユーザーに影響を与えるニュースのサブセットである。ローカルニュースの検出(ステップ1)と、その地理的位置および影響半径の決定(ステップ2)は、正確なローカルニュース推薦に向けた2つの重要なステップである。ニュースタイトルから都市名を検出するような単純なルールベースの手法は、ニュース内容を理解していないため、誤った結果を出しやすい。自然言語処理の最新の進展を活用し、ローカルニュースの自動検出とコンテンツに基づくローカルニュース推薦を可能にする統合パイプラインを開発した。本稿ではパイプラインのステップ1に焦点を当て、(1)ドメイン知識と自動データ処理を組み込んだ弱教師付きフレームワーク、(2)多言語設定への拡張性を示す。Stanford CoreNLPのNERモデルと比較して、我々のパイプラインは実世界の人手ラベル付きデータセットにおいて適合率と再現率の両方が高い。このパイプラインは、より正確なローカルニュースをユーザーに届け、地元企業の露出を高め、近隣の安全に関する情報をより多く提供できる可能性がある。 Local news articles are a subset of news that impact users in a geographical area, such as a city, county, or state. Detecting local news (Step 1) and subsequently deciding its geographical location as well as radius of impact (Step 2) are two important steps towards accurate local news recommendation. Naive rule-based methods, such as detecting city names from the news title, tend to give erroneous results due to lack of understanding of the news content. Empowered by the latest development in natural language processing, we develop an integrated pipeline that enables automatic local news detection and content-based local news recommendations. In this paper, we focus on Step 1 of the pipeline, which highlights: (1) a weakly supervised framework incorporated with domain knowledge and auto data processing, and (2) scalability to multi-lingual settings. Compared with Stanford CoreNLP NER model, our pipeline has higher precision and recall evaluated on a real-world and human-labeled dataset. This pipeline has the potential to deliver more precise local news to users, help local businesses get more exposure, and give people more information about their neighborhood safety.	翻訳日:2024-06-08 01:19:21 公開日:2024-06-05
# 2つの遠方励起原子からの遅延誘起自発的暗状態生成 Delay-induced spontaneous dark state generation from two distant excited atoms ( http://arxiv.org/abs/2303.06559v2 ) ライセンス: Link先を確認	William Alvarez-Giron, Pablo Solano, Kanu Sinha, Pablo Barberis-Blostein,	(参考訳) 1次元導波路に結合した、完全に励起された2つの2準位原子の集団的な非マルコフ動力学を、遅延が存在する場合について検討する。反転した原子アンサンブルが同期して放射を増強する、よく知られた超蛍光現象と同様に、原子間距離に応じて原子を量子もつれした暗状態へと同期させる「サブ蛍光」効果が存在することを示す。我々の結果は長距離量子ネットワークに関係しており、遠方の量子エミッタ間で自発的に量子もつれを生成するメカニズムを提示する。 We investigate the collective non-Markovian dynamics of two fully excited two-level atoms coupled to a one-dimensional waveguide in the presence of delay. We demonstrate that analogous to the well-known superfluorescence phenomenon, where an inverted atomic ensemble synchronizes to enhance its emission, there is a 'subfluorescence' effect that synchronizes the atoms into an entangled dark state depending on the interatomic separation. Our results are pertinent to long-distance quantum networks, presenting a mechanism for spontaneous entanglement generation between distant quantum emitters.	翻訳日:2024-06-08 01:09:36 公開日:2024-06-05
# 線形回帰としての増大バランスウェイト Augmented balancing weights as linear regression ( http://arxiv.org/abs/2304.14545v3 ) ライセンス: Link先を確認	David Bruns-Smith, Oliver Dukes, Avi Feller, Elizabeth L. Ogburn,	(参考訳) 本稿では,自動脱バイアス機械学習(AutoDML)としても知られる拡張バランスウェイトの特徴について述べる。これらの人気の高い2倍の堅牢または非バイアスの機械学習推定器は、結果モデリングと重みのバランスをとることで、確率スコアを推定し、反転させる代わりに、共変量バランスを直接達成する重みを結合する。結果モデルと重み付けモデルの両方が、ある(おそらく無限)基底で線型である場合、拡張推定器は、元の結果モデルからの係数と不注意な通常の最小二乗(OLS)からの係数を同じデータに結合する係数を持つ単一の線形モデルと等価であることを示す。正規化パラメータの特定の選択の下では、拡張推定器はOLS推定器のみに崩壊することが多く、例えば1986年のラロンデデータセットの再解析で発生する。次に、これらの結果を結果と重み付けモデルの特定の選択に拡張します。まず、結果モデルと重み付けモデルの両方に(カーネル)リッジ回帰を用いた拡張推定器は、1つの(カーネル)リッジ回帰と等価であることを示す。これは有限サンプルで数値的に保持され、アンダースムーシングと漸近的な収束率の新しい解析の基礎となる。重み付けモデルがラッソペナル化回帰である場合、特殊ケースに対して閉形式表現を与え、 ``double selection' 特性を示す。我々のフレームワークは、この人気の高い推定器のクラスにブラックボックスを開き、アンダースムースとダブルロバストな推定器の半パラメトリック効率に関する既存の結果のギャップを埋め、拡張バランスウェイトの性能に関する新たな洞察を提供する。 We provide a novel characterization of augmented balancing weights, also known as automatic debiased machine learning (AutoDML). These popular doubly robust or de-biased machine learning estimators combine outcome modeling with balancing weights - weights that achieve covariate balance directly in lieu of estimating and inverting the propensity score. When the outcome and weighting models are both linear in some (possibly infinite) basis, we show that the augmented estimator is equivalent to a single linear model with coefficients that combine the coefficients from the original outcome model and coefficients from an unpenalized ordinary least squares (OLS) fit on the same data. We see that, under certain choices of regularization parameters, the augmented estimator often collapses to the OLS estimator alone; this occurs for example in a re-analysis of the Lalonde 1986 dataset. We then extend these results to specific choices of outcome and weighting models. 
We first show that the augmented estimator that uses (kernel) ridge regression for both outcome and weighting models is equivalent to a single, undersmoothed (kernel) ridge regression. This holds numerically in finite samples and lays the groundwork for a novel analysis of undersmoothing and asymptotic rates of convergence. When the weighting model is instead lasso-penalized regression, we give closed-form expressions for special cases and demonstrate a ``double selection'' property. Our framework opens the black box on this increasingly popular class of estimators, bridges the gap between existing results on the semiparametric efficiency of undersmoothed and doubly robust estimators, and provides new insights into the performance of augmented balancing weights.	翻訳日:2024-06-08 01:09:36 公開日:2024-06-05
# $\mathbb{R}$-smooth Banach空間における非線形方程式のPINN誤差推定 PINNs error estimates for nonlinear equations in $\mathbb{R}$-smooth Banach spaces ( http://arxiv.org/abs/2305.11915v3 ) ライセンス: Link先を確認	Jiexing Gao, Yurii Zakharian,	(参考訳) 本稿では,PINNの誤差推定を許容するPDEの演算型クラスについて述べる。また、$L^p$空間に対して、PINNの残差境界のツールであるブランブル・ヒルベルト型補題を得る。 In the paper, we describe in operator form classes of PDEs that admit PINN's error estimation. Also, for $L^p$ spaces, we obtain a Bramble-Hilbert type lemma that is a tool for PINN's residuals bounding.	翻訳日:2024-06-08 01:09:36 公開日:2024-06-05
# C-MCTS:Monte Carlo Tree Searchによる安全な計画 C-MCTS: Safe Planning with Monte Carlo Tree Search ( http://arxiv.org/abs/2305.16209v3 ) ライセンス: Link先を確認	Dinesh Parthasarathy, Georgios Kontes, Axel Plinge, Christopher Mutschler,	(参考訳) CMDP(Constrained Markov Decision Process)の定式化は、制約を受ける安全クリティカルな意思決定タスクの解決を可能にする。 CMDPはReinforcement Learningの文献で広く研究されているが、MCTSのようなサンプリングベースの計画アルゴリズムにはほとんど注目されていない。従来のアプローチは、モンテカルロのコスト見積を用いて、高い分散に苦しむ制約違反を避けるため、コストに関して保守的に機能する。エージェント展開前のオフラインフェーズで時間差学習を訓練した安全評論家を用いてコストを見積もるConstrained MCTS(C-MCTS)を提案する。批評家は、展開中にMCTS内の安全でない軌道をプルーニングすることで探索を制限する。 C-MCTSはコスト制約を満たすが、制約境界に近づき、以前の作業よりも高い報酬を達成する。良い副産物として、プランナーはより効率的なw.r.t.プランニングステップである。最も重要なことは、プランナーと現実世界のモデルミスマッチの下では、C-MCTSは以前の作業よりもコスト違反の影響を受けにくいことである。 The Constrained Markov Decision Process (CMDP) formulation allows to solve safety-critical decision making tasks that are subject to constraints. While CMDPs have been extensively studied in the Reinforcement Learning literature, little attention has been given to sampling-based planning algorithms such as MCTS for solving them. Previous approaches perform conservatively with respect to costs as they avoid constraint violations by using Monte Carlo cost estimates that suffer from high variance. We propose Constrained MCTS (C-MCTS), which estimates cost using a safety critic that is trained with Temporal Difference learning in an offline phase prior to agent deployment. The critic limits exploration by pruning unsafe trajectories within MCTS during deployment. C-MCTS satisfies cost constraints but operates closer to the constraint boundary, achieving higher rewards than previous work. As a nice byproduct, the planner is more efficient w.r.t. planning steps. Most importantly, under model mismatch between the planner and the real world, C-MCTS is less susceptible to cost violations than previous work.	翻訳日:2024-06-08 01:09:36 公開日:2024-06-05
# ArtWhisperer:芸術創造における人間とAIのインタラクションを特徴付けるデータセット ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations ( http://arxiv.org/abs/2306.08141v3 ) ライセンス: Link先を確認	Kailas Vodrahalli, James Zou,	(参考訳) 生成的AIがより普及するにつれて、人間のユーザがそのようなモデルとどのように相互作用するかを研究することが重要である。本研究では,対象画像の生成にテキスト・ツー・イメージ・モデルをどのように利用するかを検討する。このインタラクションを研究するために、私たちはArtWhispererというオンラインゲームを作成しました。このゲームを通して、5万以上の人間とAIのインタラクションを記録し、各インタラクションは、ユーザが生成した1つのテキストプロンプトと、それに対応する生成された画像に対応する。その多くは、ユーザがターゲットイメージの最良のプロンプトを見つけるために反復的なインタラクションであり、これは人間とAIのコラボレーションを研究するためのユニークなシーケンシャルデータセットである。本データセットの初期分析では,迅速なインタラクションとユーザ戦略のいくつかの特徴を同定する。人々は多様なプロンプトを提出し、類似した画像を生成するさまざまなテキスト記述を発見できる。興味深いことに、ユーザがより良いプロンプトを見つけるため、迅速な多様性は低下しない。さらに,我々のデータセットを用いたAIの聴取可能性の定量化のための新しい指標を提案する。我々は、タスクを適切に完了させるために必要な相互作用の期待数として、ステアビリティを定義する。この値は、各目標タスクにマルコフ連鎖を適合させ、マルコフ連鎖の適切なスコアに到達するための期待時間を計算することで推定する。我々は、異なるタイプのターゲットイメージと2つの異なるモデルでAIのステアビリティを定量化し比較し、都市と自然世界のイメージが芸術的、幻想的なイメージよりもステアビリティが高いことを発見した。これらの知見は、AIとAIの相互作用に関する洞察を与え、AIのステアビリティを評価する具体的な方法を示し、ArtWhispererデータセットの汎用性を実証する。 As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image as the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. 
People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to quantify the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.	翻訳日:2024-06-08 00:59:06 公開日:2024-06-05
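The steerability estimate in the ArtWhisperer abstract (the expected number of interactions needed to reach an adequate score in a fitted Markov chain) is an expected hitting time. The following is a minimal sketch of that calculation only. The three score states, the transition probabilities, and the function name `expected_hitting_times` are assumptions made up for illustration; they are not taken from the ArtWhisperer dataset or code.

```python
def expected_hitting_times(P, target):
    """Expected number of steps to first reach `target` from each state.

    P is a row-stochastic transition matrix (list of lists). Solves
    h[i] = 1 + sum_{j != target} P[i][j] * h[j], with h[target] = 0.
    """
    states = [i for i in range(len(P)) if i != target]
    n = len(states)
    # Augmented system (I - Q) h = 1, where Q is P restricted to non-target states.
    A = [
        [(1.0 if r == c else 0.0) - P[states[r]][states[c]] for c in range(n)] + [1.0]
        for r in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_val = A[col][col]
        A[col] = [v / pivot_val for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    h = [0.0] * len(P)
    for r, s in enumerate(states):
        h[s] = A[r][n]
    return h


# Hypothetical chain over score bins: 0 = low, 1 = medium, 2 = adequate (absorbing).
P = [
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
    [0.0, 0.0, 1.0],
]
steps = expected_hitting_times(P, target=2)
```

Under this toy chain, a lower value of `steps[0]` would correspond to a more steerable target in the sense the abstract describes.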
# 高次ネットワークにおけるDegree Heterogeneity: Inference in the Hypergraph $\boldsymbolβ$-Model Degree Heterogeneity in Higher-Order Networks: Inference in the Hypergraph $\boldsymbolβ$-Model ( http://arxiv.org/abs/2307.02818v4 ) ライセンス: Link先を確認	Sagnik Nandy, Bhaswar B. Bhattacharya,	(参考訳) ランダムグラフの$\boldsymbol{\beta}$-modelは、次数不均一なネットワーク内の対相互作用を表現するために一般的に用いられる。対の相互作用を超えて、Stasi et al (2014) は高次(複数方向)相互作用を持つネットワークにおける次不均一性を捉えるためのハイパーグラフ $\boldsymbol{\beta}$-model を導入した。本稿では,複数の層を持つハイパーグラフ $\boldsymbol{\beta}$-model の厳密な研究を開始する。まず、最大極大推定値(ML)の収束率を導出し、その最小値の最適性を確立する。また,ML推定の限界分布を導出し,モデルパラメータに対する漸近的に有効な信頼区間を構築する。次に、ハイパーグラフ $\boldsymbol{\beta}$-model における適合性の問題を考える。具体的には、Null仮説の下でのLRテストの漸近正規性を確立し、その検出閾値を導出し、しきい値における制限パワーを導出する。興味深いことに、LRテストの検出しきい値はこのしきい値以下で漸近的に無力である、最小限の最適値であることが判明した。理論的結果は数値実験でさらに検証される。ハイパーグラフ $\boldsymbol{\beta}$-models の推定と推論のための理論的枠組みの開発に加えて、上記の結果はグラフ $\boldsymbol{\beta}$-model の多くのギャップを埋める。 The $\boldsymbol{\beta}$-model for random graphs is commonly used for representing pairwise interactions in a network with degree heterogeneity. Going beyond pairwise interactions, Stasi et al. (2014) introduced the hypergraph $\boldsymbol{\beta}$-model for capturing degree heterogeneity in networks with higher-order (multi-way) interactions. In this paper we initiate the rigorous study of the hypergraph $\boldsymbol{\beta}$-model with multiple layers, which allows for hyperedges of different sizes across the layers. To begin with, we derive the rates of convergence of the maximum likelihood (ML) estimate and establish their minimax rate optimality. We also derive the limiting distribution of the ML estimate and construct asymptotically valid confidence intervals for the model parameters. Next, we consider the goodness-of-fit problem in the hypergraph $\boldsymbol{\beta}$-model. Specifically, we establish the asymptotic normality of the likelihood ratio (LR) test under the null hypothesis, derive its detection threshold, and also its limiting power at the threshold. 
Interestingly, the detection threshold of the LR test turns out to be minimax optimal, that is, all tests are asymptotically powerless below this threshold. The theoretical results are further validated in numerical experiments. In addition to developing the theoretical framework for estimation and inference for hypergraph $\boldsymbol{\beta}$-models, the above results fill a number of gaps in the graph $\boldsymbol{\beta}$-model literature, such as the minimax optimality of the ML estimates and the non-null properties of the LR test, which, to the best of our knowledge, have not been studied before.	翻訳日:2024-06-08 00:59:06 公開日:2024-06-05
# 1つの論理量子ビットを符号化した量子極符号のファクトリベースフォールトトレラント生成 Factory-based Fault-tolerant Preparation of Quantum Polar Codes Encoding One logical Qubit ( http://arxiv.org/abs/2307.15226v2 ) ライセンス: Link先を確認	Ashutosh Goswami, Mehdi Mhalla, Valentin Savin,	(参考訳) Q1符号の論理的符号状態、すなわち1量子ビットを符号化する量子極性符号を作成するためのフォールトトレラントな方法が最近提案されている。その耐故障性は、エラー検出装置によって保証され、準備中にエラーが検出された場合には、完全に破棄される。誤り検出のため、準備は確率的であり、その成功率である準備率は、コード長とともに急速に減少し、大きなコード長のコード状態の準備が妨げられる。そこで本研究では,Q1コードステートの複製を並列に数回作成しようとする,Q1コードステートの工場準備について考察する。余分なスケジューリングステップを用いることで、エラーが検出されるたびに準備が完全に破棄されるのを回避できるので、順に準備率が向上する。さらに, モンテカルロシミュレーションに基づく数値結果の厳密な整合性を示す工場準備法を用いて構築したQ1符号の合成と論理誤差率を推定する理論的手法を提案する。したがって,モンテカルロシミュレーションが現実的に実現不可能な大符号長の推定値を提供するには,理論的手法が有用である。例えば、N = 256 の場合、p = 10^{-3} の実際に興味深い物理誤差率に対して 0.02\% から 27\% に増加する。 N = 256 の Q1 符号は、それぞれ p = 10^{-3} と p = 3 x 10^{-4} に対して 10^{-11} と 10^{-15} の論理誤差率を達成する。これは、類似の符号長と最小距離を持つ曲面符号と比較して約3桁の改善に対応しており、大規模なフォールトトレラント量子コンピューティングのための提案されたスキームの可能性を示唆している。 A fault-tolerant way to prepare logical code-states of Q1 codes, i.e., quantum polar codes encoding one qubit, has been recently proposed. The fault tolerance therein is guaranteed by an error detection gadget, where if an error is detected during the preparation, one discards entirely the preparation. Due to error detection, the preparation is probabilistic, and its success rate, referred to as the preparation rate, decreases rapidly with the code-length, preventing the preparation of code-states of large code-lengths. In this paper, to improve the preparation rate, we consider a factory preparation of Q1 code-states, where one attempts to prepare several copies of Q1 code-states in parallel. Using an extra scheduling step, we can avoid discarding the preparation entirely, every time an error is detected, hence, achieving an increased preparation rate in turn. 
We further provide a theoretical method to estimate preparation and logical error rates of Q1 codes, prepared using factory preparation, which is shown to tightly fit the Monte-Carlo simulation based numerical results. Therefore, our theoretical method is useful for providing estimates for large code-lengths, where Monte-Carlo simulations are practically not feasible. Our numerical results, for a circuit-level depolarizing noise model, indicate that the preparation rate increases significantly, especially for large code-length N. For example, for N = 256, it increases from 0.02\% to 27\% for a practically interesting physical error rate of p = 10^{-3}. Remarkably, a Q1 code with N = 256 achieves logical error rates around 10^{-11} and 10^{-15} for p = 10^{-3} and p = 3 x 10^{-4}, respectively. This corresponds to an improvement of about three orders of magnitude compared to a surface code with similar code-length and minimum distance, thus showing the promise of the proposed scheme for large-scale fault-tolerant quantum computing.	翻訳日:2024-06-08 00:59:06 公開日:2024-06-05
# ゼロサムマルコフゲームにおけるモデルフリーアルゴリズムのサンプル効率の改善 Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games ( http://arxiv.org/abs/2308.08858v2 ) ライセンス: Link先を確認	Songtao Feng, Ming Yin, Yu-Xiang Wang, Jing Yang, Yingbin Liang,	(参考訳) 近年、マルチエージェント強化学習(RL)の理論研究において、2プレイヤーゼロサムマルコフゲームの問題への関心が高まっている。特に有限ホライズンのエピソード型マルコフ決定過程(MDP)では、モデルベースのアルゴリズムが$O(H^3SAB/\epsilon^2)$のサンプル複雑度で$\epsilon$最適なナッシュ均衡(NE)を見つけられることが示されており、これはホライズン$H$と状態数$S$への依存性に関して最適である($A$と$B$はそれぞれ2人のプレイヤーの行動数を表す)。しかし、既存のモデルフリーアルゴリズムはいずれもそのような最適性を達成できていない。本研究では、モデルフリーのステージベースQ学習アルゴリズムを提案し、それが最良のモデルベースアルゴリズムと同じサンプル複雑度を達成することを示す。これにより、モデルフリーアルゴリズムが$H$への依存性に関してモデルベースアルゴリズムと同じ最適性を持ち得ることを初めて示す。$H$への依存性の主な改善は、これまで単一エージェントRLでのみ用いられてきた、参照アドバンテージ分解に基づく一般的な分散低減技術を活用することで得られる。しかし、この技術は価値関数の重要な単調性に依存しており、マルコフゲームでは粗相関均衡(CCE)オラクルを介した方策の更新のために、この性質は成り立たない。そこで、この技術をマルコフゲームに拡張するために、提案アルゴリズムは、サンプル効率の所望の改善を達成するべく、参照価値関数を、値の差がこれまでの履歴の中で最小となる楽観的価値関数と悲観的価値関数のペアとして更新するという、鍵となる新しい設計を備えている。 The problem of two-player zero-sum Markov games has recently attracted increasing interest in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an $\epsilon$-optimal Nash Equilibrium (NE) with the sample complexity of $O(H^3SAB/\epsilon^2)$, which is optimal in the dependence of the horizon $H$ and the number of states $S$ (where $A$ and $B$ denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms can achieve such optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the $H$ dependence as model-based algorithms. 
The main improvement of the dependency on $H$ arises by leveraging the popular variance reduction technique based on the reference-advantage decomposition previously used only for single-agent RL. However, such a technique relies on a critical monotonicity property of the value function, which does not hold in Markov games due to the update of the policy via the coarse correlated equilibrium (CCE) oracle. Thus, to extend such a technique to Markov games, our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions whose value difference is the smallest in the history in order to achieve the desired improvement in the sample efficiency.	翻訳日:2024-06-08 00:49:21 公開日:2024-06-05
# 単一光子量子ランキング:シークエンシャルデコーディングが高次元エンタングルメントに遭遇する Single Photon Quantum Ranging: When Sequential Decoding Meets High Dimensional Entanglement ( http://arxiv.org/abs/2308.13045v2 ) ライセンス: Link先を確認	Armanpreet Pannu, Han Liu, Amr S. Helmy, Hesham El Gamal,	(参考訳) モード毎の低雑音レベルと低反射率(高損失)状態における量子レンジ問題について考察する。本稿では, 単一光子伝送戦略に焦点をあて, 送信機における高次元時間ビン絡み合わせと検出器における逐次決定ルールを慎重に構成した新しい手法を提案する。解析結果から, 単一光子古典法, 従来提案されていた2モード圧縮真空レンジリング法, ブロックベースの古典的スキームなどと比較して, この手法から, 様々な操作パラメータで活用できる重要な性能向上が得られた。このパフォーマンス向上は、 1)高次元時間ビン絡み合わされた信号が単一の光子と非常に微細な範囲分解能を提供する能力 2) 逐次決定規則は, 誤差の確率に制約のある送信光子の平均個数を最小化する。分析は低エネルギー/低騒音に限られるが、提案手法の優れた性能はより広い範囲のシナリオにまで拡張され、さらなる解析的および実験的研究の動機となるだろうと推測する。 We consider the quantum ranging problem in the low noise level per mode and low reflectivity (high loss) regime. We focus on single photon transmission strategies and propose a novel approach that combines high dimensional time-bin entanglement at the transmitter with a carefully constructed sequential decision rule at the detector. Our analytical results establish the significant performance gains that can be leveraged from this approach in a range of operating parameters, as compared to the single photon classical approach, the two-mode squeezed vacuum ranging scheme proposed earlier, and even the block-based classical scheme. One can attribute this performance gain to 1) the ability of the high dimensional time-bin entangled signaling to offer a very fine range resolution with a single transmitted photon and 2) the ability of the sequential decision rule to minimize the average number of transmitted photon subject to a constraint on the probability of error. While our analysis is limited to the low energy/low noise regime, we conjecture that the proposed approach's superior performance extends to a wider range of scenarios which should motivate further analytical and experimental investigations.	翻訳日:2024-06-08 00:49:21 公開日:2024-06-05
# シャープネスを考慮した最小化と安定性の端 Sharpness-Aware Minimization and the Edge of Stability ( http://arxiv.org/abs/2309.12488v6 ) ライセンス: Link先を確認	Philip M. Long, Peter L. Bartlett,	(参考訳) 最近の実験では、ニューラルネットワークを勾配降下法(GD)でステップサイズ$\eta$を用いて訓練すると、多くの場合、損失のヘッセ行列の作用素ノルムは約$2/\eta$に達するまで増加し、その後この値の周りで変動することが示されている。量$2/\eta$は、損失の局所的な二次近似の考察に基づいて「安定性の端」と呼ばれている。我々は同様の計算を行い、汎化性能を改善することが示されているGDの変種であるSAM(Sharpness-Aware Minimization)に対する「安定性の端」を導出する。GDの場合とは異なり、得られるSAMの端は勾配のノルムに依存する。3つのディープラーニング訓練タスクを用いて、SAMがこの分析によって特定された安定性の端で動作することを実験的に確認する。 Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.	翻訳日:2024-06-08 00:39:36 公開日:2024-06-05
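The $2/\eta$ threshold mentioned for plain gradient descent can be checked on a one-dimensional quadratic, which is the local approximation the abstract refers to. The sketch below covers GD only; it does not implement SAM or the SAM-edge derived in the paper. The function name `gd_on_quadratic` and all numeric values are assumptions chosen for illustration.

```python
def gd_on_quadratic(curvature, step_size, x0=1.0, num_steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2 and return |x| at the end."""
    x = x0
    for _ in range(num_steps):
        grad = curvature * x
        # Each step multiplies x by (1 - step_size * curvature).
        x = x - step_size * grad
    return abs(x)


eta = 0.1
threshold = 2.0 / eta  # curvature at which |1 - eta * curvature| equals 1

# Curvature slightly below the threshold: the iterates shrink.
below = gd_on_quadratic(curvature=0.9 * threshold, step_size=eta)
# Curvature slightly above the threshold: the iterates blow up.
above = gd_on_quadratic(curvature=1.1 * threshold, step_size=eta)
```

The per-step multiplier is `1 - eta * curvature`, so the iterates contract exactly when the curvature is below `2 / eta`, which is why that value marks the boundary of stability for GD.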
# 自己専門化: 大規模言語モデルに潜む専門性の発見 Self-Specialization: Uncovering Latent Expertise within Large Language Models ( http://arxiv.org/abs/2310.00160v2 ) ライセンス: Link先を確認	Junmo Kang, Hongyin Luo, Yada Zhu, Jacob Hansen, James Glass, David Cox, Alan Ritter, Rogerio Feris, Leonid Karlinsky,	(参考訳) 近年の研究では、人手で書かれた少数のシードを出発点としてモデル自身が生成した指示データを用い、大規模言語モデルを一般的な指示に従うようにアライメントする自己アライメントの有効性が実証されている。本研究では、一般的なアライメントではなく、専門領域(バイオメディシン、金融など)への特化のための自己アライメントに焦点を当てる。予備的検討として、汎用的な指示追従訓練が下流の専門領域での性能に与える効果が限定的であることを定量的に示す。これを改善するために、少数のラベル付きシードのみを活用して、タスク横断の汎化を実現しつつ効果的なモデル特化を可能にする自己専門化(self-specialization)を提案する。自己専門化は、事前訓練された汎用LLMから専門家モデルを「切り出す」ための、データ効率とパラメータ効率に優れた方法を提供する。広く使われている様々なオープンな大規模モデルを特化のベースとして検討した結果、バイオメディカル領域と金融領域の両方における実験結果は、自己専門化モデルがベースモデルを大きく上回り、さらに、汎用的に指示チューニングされた、あるいは他の手段で対象領域に適応された、より大規模なモデルをも上回ることを示している。 Recent works have demonstrated the effectiveness of self-alignment in which a large language model is aligned to follow general instructions using instructional data generated from the model itself starting from a handful of human-written seeds. Instead of general alignment, in this work, we focus on self-alignment for expert domain specialization (e.g., biomedicine, finance). As a preliminary, we quantitatively show the marginal effect that generic instruction-following training has on downstream expert domains' performance. To remedy this, we propose self-specialization - allowing for effective model specialization while achieving cross-task generalization by leveraging only a few labeled seeds. Self-specialization offers a data- and parameter-efficient way of "carving out" an expert model out of a generalist pre-trained LLM. Exploring a variety of popular open large models as a base for specialization, our experimental results in both biomedical and financial domains show that our self-specialized models outperform their base models by a large margin, and even larger models that are generally instruction-tuned or that have been adapted to the target domain by other means.	翻訳日:2024-06-08 00:39:36 公開日:2024-06-05
# マルチタイリングNeRF(Neural Radiance Field): 大規模航空データセットでの幾何学的評価 Multi-tiling Neural Radiance Field (NeRF) -- Geometric Assessment on Large-scale Aerial Datasets ( http://arxiv.org/abs/2310.00530v4 ) ライセンス: Link先を確認	Ningli Xu, Rongjun Qin, Debao Huang, Fabio Remondino,	(参考訳) Neural Radiance Fields(NeRF)は、航空写真測量を含む3次元再構成タスクに役立つ可能性がある。しかし、大規模な航空データセットは通常、非常に大きなメモリ消費と遅い収束をもたらすため、推定される幾何形状のスケーラビリティと精度は、大規模な航空データについては十分に検証されていない。本稿では、大規模航空データセットへのNeRFのスケールアップを目指し、NeRFの詳細な幾何学的評価を行う。具体的には、位置に応じたサンプリング手法とマルチカメラタイリング(MCT)戦略を導入し、画像読み込み時のRAM消費と表現学習時のGPUメモリ消費を削減するとともに、タイル内の収束速度を向上させる。MCTは、大きなフレームの画像を異なるカメラモデルを持つ複数のタイル画像に分解し、精度を損なうことなく、これらの小さなフレームの画像を特定の位置に応じて必要なときに学習プロセスへ投入できるようにする。提案手法を代表的手法であるMip-NeRFに実装し、2つの典型的な航空データセット上で、LiDAR参照データを基準として、3つの写真測量MVSパイプラインと幾何学的性能を比較する。定性的および定量的な結果は、提案するNeRFアプローチが従来の手法よりも優れた完全性とオブジェクトの詳細を生成することを示唆しているが、現時点では精度の面でまだ及ばない。 Neural Radiance Fields (NeRF) offer the potential to benefit 3D reconstruction tasks, including aerial photogrammetry. However, the scalability and accuracy of the inferred geometry are not well-documented for large-scale aerial assets, since such datasets usually result in very high memory consumption and slow convergence. In this paper, we aim to scale the NeRF on large-scale aerial datasets and provide a thorough geometry assessment of NeRF. Specifically, we introduce a location-specific sampling technique as well as a multi-camera tiling (MCT) strategy to reduce memory consumption during image loading for RAM, representation training for GPU memory, and increase the convergence rate within tiles. MCT decomposes a large-frame image into multiple tiled images with different camera models, allowing these small-frame images to be fed into the training process as needed for specific locations without a loss of accuracy. We implement our method on a representative approach, Mip-NeRF, and compare its geometry performance with three photogrammetric MVS pipelines on two typical aerial datasets against LiDAR reference data. 
Both qualitative and quantitative results suggest that the proposed NeRF approach produces better completeness and object details than traditional approaches, although as of now, it still falls short in terms of accuracy.	翻訳日:2024-06-08 00:39:36 公開日:2024-06-05
# POTLOC:Pseudo-Label Oriented Transformer for Point-Supervised Temporal Action Localization POTLoc: Pseudo-Label Oriented Transformer for Point-Supervised Temporal Action Localization ( http://arxiv.org/abs/2310.13585v2 ) ライセンス: Link先を確認	Elahe Vahdani, Yingli Tian,	(参考訳) 本稿では,1フレームのみをトレーニングセットの各アクションインスタンスにアノテートする点教師付き時間的動作検出の課題に対処する。現在のメソッドのほとんどは、アノテーション付きポイントのスパースな性質によって妨げられ、アクションの継続的な構造やアクションインスタンス内の固有の時間的およびセマンティックな依存関係を効果的に表現するのに苦労しています。その結果、これらの手法は単に最も独特なアクションセグメントだけを学習し、不完全なアクションプロポーザルの作成につながった。本稿では,Pseudo-label Oriented Transformer(POTLOC)を提案する。 POTLocは、自己学習戦略を通じて、継続的なアクション構造を特定し、追跡するように設計されている。ベースモデルは、ポイントレベルの監督のみでアクションプロポーザルを生成することから始まります。これらの提案は、推定された行動境界の精度を高めるために、改良と回帰を行い、その後、補助的な監視信号として「擬似ラベル」を生産する結果となった。モデルのアーキテクチャは、トランスフォーマーと時間的特徴ピラミッドを統合して、ビデオスニペットの依存関係と様々な期間のモデルアクションをキャプチャする。粗い位置と行動の境界に関する情報を提供する擬似ラベルは、行動力学の学習を促進するためのトランスフォーマーの指導を支援する。 POTLOCはTHUMOS'14とActivityNet-v1.2データセットの最先端のポイント管理手法より優れている。 This paper tackles the challenge of point-supervised temporal action detection, wherein only a single frame is annotated for each action instance in the training set. Most of the current methods, hindered by the sparse nature of annotated points, struggle to effectively represent the continuous structure of actions or the inherent temporal and semantic dependencies within action instances. Consequently, these methods frequently learn merely the most distinctive segments of actions, leading to the creation of incomplete action proposals. This paper proposes POTLoc, a Pseudo-label Oriented Transformer for weakly-supervised Action Localization utilizing only point-level annotation. POTLoc is designed to identify and track continuous action structures via a self-training strategy. The base model begins by generating action proposals solely with point-level supervision. 
These proposals undergo refinement and regression to enhance the precision of the estimated action boundaries, which subsequently results in the production of `pseudo-labels' to serve as supplementary supervisory signals. The architecture of the model integrates a transformer with a temporal feature pyramid to capture video snippet dependencies and model actions of varying duration. The pseudo-labels, providing information about the coarse locations and boundaries of actions, assist in guiding the transformer for enhanced learning of action dynamics. POTLoc outperforms the state-of-the-art point-supervised methods on THUMOS'14 and ActivityNet-v1.2 datasets.	翻訳日:2024-06-08 00:29:50 公開日:2024-06-05
# AGIへの道の歩みを運用するためのAGIのレベル Levels of AGI for Operationalizing Progress on the Path to AGI ( http://arxiv.org/abs/2311.02462v4 ) ライセンス: Link先を確認	Meredith Ringel Morris, Jascha Sohl-dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg,	(参考訳) 本稿では,人工知能(AGI)モデルとその前駆体の性能と動作を分類する枠組みを提案する。このフレームワークは、AGIのパフォーマンス、一般性、自律性のレベルを導入し、モデルを比較し、リスクを評価し、AGIへの道筋に沿って進捗を測定する共通の言語を提供する。フレームワークを開発するために、既存のAGIの定義を分析し、AGIにとって有用なオントロジーが満たすべき6つの原則を抽出する。これらの原則を念頭において、我々は「AGIのレベル」の深さ(性能)と広さ(一般性)の能力に基づいて提案し、現在のシステムがこのオントロジーにどのように適合するかを反映する。これらのレベルに対してAGIモデルの振る舞いと能力を定量化する将来のベンチマークの課題について論じる。最後に、これらのAGIのレベルが自律性やリスクといったデプロイメント上の考慮事項とどのように相互作用するかについて議論し、高機能なAIシステムの責任と安全なデプロイメントにおいて、ヒューマン・AIインタラクションパラダイムを慎重に選択することの重要性を強調します。 We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy, providing a common language to compare models, assess risks, and measure progress along the path to AGI. To develop our framework, we analyze existing definitions of AGI, and distill six principles that a useful ontology for AGI should satisfy. With these principles in mind, we propose "Levels of AGI" based on depth (performance) and breadth (generality) of capabilities, and reflect on how current systems fit into this ontology. We discuss the challenging requirements for future benchmarks that quantify the behavior and capabilities of AGI models against these levels. Finally, we discuss how these levels of AGI interact with deployment considerations such as autonomy and risk, and emphasize the importance of carefully selecting Human-AI Interaction paradigms for responsible and safe deployment of highly capable AI systems.	翻訳日:2024-06-08 00:29:50 公開日:2024-06-05
# 非凸分散学習のための圧縮・スパースモデル Compressed and Sparse Models for Non-Convex Decentralized Learning ( http://arxiv.org/abs/2311.05760v2 ) ライセンス: Link先を確認	Andrew Campbell, Hang Liu, Leah Woldemariam, Anna Scaglione,	(参考訳) 最近の研究は、特に大規模で過剰パラメータ化されたニューラルネットワーク(NN)において、頻繁なモデル通信が分散型機械学習(ML)の効率に対する重大なボトルネックであることを強調している。そこで本研究では、勾配圧縮手法とモデルのスパース化を組み合わせた新しい分散型MLアルゴリズムであるMalcom-PSGDを提案する。目的関数に$\ell_1$正則化を加えてモデルのスパース性を促進し、学習のための分散型近接SGD法を提示する。提案手法は、スパース化されたモデルの圧縮勾配通信に、ベクトル情報源符号化とディザリングに基づく量子化を用いる。我々の解析は、一定のコンセンサス率と学習率を仮定した場合、Malcom-PSGDが反復回数$t$に対して$\mathcal{O}(1/\sqrt{t})$の収束率を達成することを示す。この結果は、非凸な圧縮近接SGD法の収束性に関する我々の証明によって裏付けられる。さらにビット解析を行い、Malcom-PSGDの通信コストに対する閉形式の式を与える。数値実験の結果は理論的知見を裏付け、提案手法が最先端手法と比較して通信コストを約$75\%$削減することを示す。 Recent research highlights frequent model communication as a significant bottleneck to the efficiency of decentralized machine learning (ML), especially for large-scale and over-parameterized neural networks (NNs). To address this, we present Malcom-PSGD, a novel decentralized ML algorithm that combines gradient compression techniques with model sparsification. We promote model sparsity by adding $\ell_1$ regularization to the objective and present a decentralized proximal SGD method for training. Our approach employs vector source coding and dithering-based quantization for the compressed gradient communication of sparsified models. Our analysis demonstrates that Malcom-PSGD achieves a convergence rate of $\mathcal{O}(1/\sqrt{t})$ with respect to the iterations $t$, assuming a constant consensus and learning rate. This result is supported by our proof for the convergence of non-convex compressed Proximal SGD methods. Additionally, we conduct a bit analysis, providing a closed-form expression for the communication costs associated with Malcom-PSGD. Numerical results verify our theoretical findings and demonstrate that our method reduces communication costs by approximately $75\%$ when compared to the state-of-the-art.	翻訳日:2024-06-08 00:29:50 公開日:2024-06-05
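The abstract combines $\ell_1$ regularization with a proximal SGD update. The proximal operator of the $\ell_1$ norm is soft-thresholding, which is what drives small weights to exactly zero. The sketch below shows a single-node proximal gradient step only; it does not include the consensus averaging, quantization, or source coding that Malcom-PSGD uses, and the names `soft_threshold` and `proximal_step` are hypothetical.

```python
def soft_threshold(value, tau):
    """Proximal operator of tau * |x|: shrink toward zero, clipping at zero."""
    if value > tau:
        return value - tau
    if value < -tau:
        return value + tau
    return 0.0


def proximal_step(weights, grads, lr, l1_strength):
    """One proximal gradient step for loss(w) + l1_strength * ||w||_1.

    First take a plain gradient step on the smooth loss, then apply
    soft-thresholding with threshold lr * l1_strength.
    """
    stepped = [w - lr * g for w, g in zip(weights, grads)]
    return [soft_threshold(v, lr * l1_strength) for v in stepped]


# Illustrative values only.
new_weights = proximal_step(
    weights=[0.5, -0.02, 0.3],
    grads=[1.0, 0.0, -1.0],
    lr=0.1,
    l1_strength=0.5,
)
```

In this toy call the second weight falls inside the threshold and becomes exactly zero, which is the kind of sparsity that makes the communicated model cheaper to encode.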
# 量子セキュアデジタル署名のための準同型多項式公開鍵暗号 Homomorphic Polynomial Public Key Cryptography for Quantum-secure Digital Signature ( http://arxiv.org/abs/2311.08967v3 ) ライセンス: Link先を確認	Randy Kuang, Maria Perepechaenko, Mahmoud Sayed, Dafu Lou,	(参考訳) 2022年の研究でKuangらは、乗算と除算の逆関係を利用した量子セーフな公開鍵システムとして、多変数多項式公開鍵(MPPK)暗号を導入した。彼らはMPPKを準同型多項式公開鍵(HPPK)に拡張し、大きな隠れ環の演算に準同型暗号を適用した。当初、鍵カプセル化(KEM)のために設計されたHPPKのセキュリティは、公開多項式の準同型暗号化に依存している。本稿では,HPPK KEMをデジタル署名方式に拡張するが、検証は復号とは性質が異なるため課題が生じる。 HPPK KEMをデジタル署名に適応させるために、Barrett還元アルゴリズムの拡張を導入し、素体上の検証方程式におけるモジュラ乗算を除算に変換する。拡張アルゴリズムは、署名を公開多項式係数に非線形に埋め込み、初期のMPPK DSスキームの脆弱性に対処する。セキュリティ分析は、環のビット長を素体サイズの2倍とした場合に、秘密鍵の復元と署名偽造攻撃が指数関数的な複雑性を持つことを示す。 In their 2022 study, Kuang et al. introduced Multivariable Polynomial Public Key (MPPK) cryptography, leveraging the inversion relationship between multiplication and division for quantum-safe public key systems. They extended MPPK into Homomorphic Polynomial Public Key (HPPK), employing homomorphic encryption for large hidden ring operations. Originally designed for key encapsulation (KEM), HPPK's security relies on homomorphic encryption of public polynomials. This paper expands HPPK KEM to a digital signature scheme, facing challenges due to the distinct nature of verification compared to decryption. To adapt HPPK KEM to digital signatures, the authors introduce an extension of the Barrett reduction algorithm, transforming modular multiplications into divisions in the verification equation over a prime field. The extended algorithm non-linearly embeds the signature into public polynomial coefficients, addressing vulnerabilities in earlier MPPK DS schemes. Security analysis demonstrates exponential complexity for private key recovery and forged signature attacks, considering ring bit length twice that of the prime field size.	翻訳日:2024-06-08 00:29:50 公開日:2024-06-05
# GENEVA:LLMを用いた分岐物語の生成と可視化 GENEVA: GENErating and Visualizing branching narratives using LLMs ( http://arxiv.org/abs/2311.09213v3 ) ライセンス: Link先を確認	Jorge Leandro, Sudha Rao, Michael Xu, Weijia Xu, Nebosja Jojic, Chris Brockett, Bill Dolan,	(参考訳) 対話ベースのロールプレイングゲーム(RPG)は強力なストーリーテリングを必要とする。これらの物語は執筆に何年もかかることがあり、通常は大規模なクリエイティブチームが関与する。本研究では,このプロセスを支援する大規模生成テキストモデルの可能性を示す。プロトタイプツールである \textbf{GENEVA} は、デザイナによって提供される高レベルな物語記述と制約に合致する、ストーリーラインの分岐と再収束を伴うリッチな物語グラフを生成する。大規模言語モデル(LLM)であるGPT-4は、分岐した物語を生成し、2段階のプロセスでグラフ形式でレンダリングするために使用される。本稿では,異なる文脈制約下での4つの有名な物語に対する新たな分岐物語の生成におけるGENEVAの利用について述べる。このツールはゲーム開発、シミュレーション、その他のゲームライクな特性を持つアプリケーションを支援する可能性がある。 Dialogue-based Role Playing Games (RPGs) require powerful storytelling. The narratives of these may take years to write and typically involve a large creative team. In this work, we demonstrate the potential of large generative text models to assist this process. \textbf{GENEVA}, a prototype tool, generates a rich narrative graph with branching and reconverging storylines that match a high-level narrative description and constraints provided by the designer. A large language model (LLM), GPT-4, is used to generate the branching narrative and to render it in a graph format in a two-step process. We illustrate the use of GENEVA in generating new branching narratives for four well-known stories under different contextual constraints. This tool has the potential to assist in game development, simulations, and other applications with game-like properties.	翻訳日:2024-06-08 00:20:02 公開日:2024-06-05
# NFTウォッシュ取引:直接対間接推定 NFT Wash Trading: Direct vs. Indirect Estimation ( http://arxiv.org/abs/2311.18717v2 ) ライセンス: Link先を確認	Brett Hemenway Falk, Gerry Tsoukalas, Niuniu Zhang,	(参考訳) 最近の研究では、Binanceのようなオフチェーン暗号取引所における取引価値の約70%がウォッシュ取引であると推定されている。本論文はNFT市場に目を向ける。そこでは、Web3イノベーションの重要な信条であるトランザクションのオンチェーン性により、より直接的な推定方法を適用できる。最大の3つのNFT市場に焦点を当てると、NFTボリュームの30-40%、取引価値の25-95%がウォッシュ取引に関わることがわかった。我々はこの直接的なアプローチを利用して、文献で提案されている最近の間接推定手法を批判的に評価し、有効性の大きな違いを明らかにした。一部の手法は完全に失敗する。 Cong et al (2023) で示唆されているように、トレードラウンドネスフィルタは最も正確な間接推定法として浮上する。実際,ハイパーパラメータの微調整により直接的アプローチと間接的アプローチを密接に整合させられることを示す。本研究は,デジタルファイナンスにおける金融不正の検出・規制における技術革新の重要性を明らかにするものである。 Recent studies estimate around 70% of traded value on off-chain crypto exchanges like Binance is wash trading. This paper turns to NFT markets, where the on-chain nature of transactions-a key tenet of Web3 innovation-enables more direct estimation methods to be applied. Focusing on three of the largest NFT marketplaces, we find 30-40% of NFT volume and 25-95% of traded value involve wash trading. We leverage this direct approach to critically evaluate recent indirect estimation methods suggested in the literature, revealing major differences in effectiveness, with some failing altogether. Trade-roundedness filters, as suggested in Cong et al. (2023), emerge as the most accurate indirect estimation method. In fact, we show how direct and indirect approaches can be closely aligned via hyper-parameter fine-tuning. Our findings underscore the crucial role of technological innovation in detecting and regulating financial misconduct in digital finance.	翻訳日:2024-06-08 00:20:02 公開日:2024-06-05
# 人間のように反応する:人間に固有の振る舞いをNAOに組み込む Reacting like Humans: Incorporating Intrinsic Human Behaviors into NAO through Sound-Based Reactions to Fearful and Shocking Events for Enhanced Sociability ( http://arxiv.org/abs/2312.07671v2 ) ライセンス: Link先を確認	Ali Ghadami, Mohammadreza Taghimohammadi, Mohammad Mohammadzadeh, Mohammad Hosseinipour, Alireza Taheri,	(参考訳) ロボットの人間に対する受容性と社会性は、人間のような反応を取り入れることで著しく向上することができる。人間は考えずに、環境イベントに素早く反応できる。人間が自然反応を示す例は、突然大きな音に遭遇し、彼らを驚かせたり、怖がらせたりする時である。このような瞬間において、個人は直感的に手を動かし、音の起源に向かって向きを変え、出来事の原因を判断しようとする。この固有の行動は、この研究の少ない社会ロボティクスを探求する動機となった。本研究では, 動作発生器, 音響分類器, YOLOオブジェクト検出器から構成されるマルチモーダルシステムを用いて, 環境を感知し, 突然の音の存在下, 自然の人間の恐怖反応を示し, そして, 環境中の恐怖を感知する音源を特定する。これらの有効な動きと推論は、本質的な人間の反応を模倣し、ロボットの社会性を高めることができる。動作生成のために,LSTMとMDNネットワークに基づくモデルを提案し,様々な動作を合成した。また、音検出の場合、音信号のスペクトログラムを入力として使用する転写学習モデルが好ましい。音響検出、モーション生成、画像認識の個別モデルを開発した後、NAOロボットに実装された総合的な「フィーア」モジュールに統合された。最後に、恐怖モジュールを実用的にテストし、2つの専門家グループと非専門家グループ(ロボティクス分野)がロボットの性能を評価するためのアンケートを作成した。提案モジュールは,ロボットの周囲環境において,突発的かつ大音量の音が鳴り響く場合に,ロボットが人間のように振る舞うことを参加者に納得させ,また,非専門家が社会ロボットとその性能に対して高い期待を抱いていることを示す。 Robots' acceptability among humans and their sociability can be significantly enhanced by incorporating human-like reactions. Humans can react to environmental events very quickly and without thinking. An instance where humans show natural reactions is when they encounter a sudden and loud sound that startles or frightens them. During such moments, individuals may instinctively move their hands, turn toward the origin of the sound, and try to determine the event's cause. This inherent behavior motivated us to explore this less-studied part of social robotics. In this work, a multi-modal system composed of an action generator, sound classifier, and YOLO object detector was designed to sense the environment and, in the presence of sudden loud sounds, show natural human fear reactions; and finally, locate the fear-causing sound source in the environment. 
These valid generated motions and inferences could imitate intrinsic human reactions and enhance the sociability of robots. For motion generation, a model based on LSTM and MDN networks was proposed to synthesize various motions. Also, in the case of sound detection, a transfer learning model was preferred that used the spectrogram of the sound signals as its input. After developing individual models for sound detection, motion generation, and image recognition, they were integrated into a comprehensive "fear" module implemented on the NAO robot. Finally, the fear module was tested in practical application and two groups of experts and non-experts (in the robotics area) filled out a questionnaire to evaluate the performance of the robot. We indicated that the proposed module could convince the participants that the Nao robot acts and reasons like a human when a sudden and loud sound is in the robot's peripheral environment, and additionally showed that non-experts have higher expectations about social robots and their performance.	翻訳日:2024-06-08 00:20:02 公開日:2024-06-05
# Webの驚くべき量が機械翻訳されている:マルチウェイ並列性からの洞察 A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism ( http://arxiv.org/abs/2401.05749v2 ) ライセンス: Link先を確認	Brian Thompson, Mehak Preet Dhaliwal, Peter Frisch, Tobias Domhan, Marcello Federico,	(参考訳) ウェブ上のコンテンツは、しばしば多くの言語に翻訳されており、これらのマルチウェイ翻訳の品質の低さは、機械翻訳(MT)を用いて作成された可能性が高いことを示している。マルチウェイ並列で機械生成されたコンテンツは、低リソース言語における翻訳を支配しているだけでなく、それらの言語における全ウェブコンテンツの大部分を構成している。また、多くの言語に翻訳されるコンテンツの種類に選択バイアスがある証拠も見つかり、これは低品質の英語コンテンツがMTを通じて多くの低リソース言語に一括翻訳されていることと整合する。本研究は、ウェブから収集したモノリンガルデータとバイリンガルデータの両方で多言語大規模言語モデルなどのモデルを訓練することに、深刻な懸念を提起する。 We show that content on the web is often translated into many languages, and the low quality of these multi-way translations indicates they were likely created using Machine Translation (MT). Multi-way parallel, machine generated content not only dominates the translations in lower resource languages; it also constitutes a large fraction of the total web content in those languages. We also find evidence of a selection bias in the type of content which is translated into many languages, consistent with low quality English content being translated en masse into many lower resource languages, via MT. Our work raises serious concerns about training models such as multilingual large language models on both monolingual and bilingual data scraped from the web.	翻訳日:2024-06-08 00:10:18 公開日:2024-06-05
# Medusa: 複数のデコードヘッドを備えたシンプルなLLM推論高速化フレームワーク Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads ( http://arxiv.org/abs/2401.10774v2 ) ライセンス: Link先を確認	Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao,	(参考訳) 大規模言語モデル(LLM)は、逐次計算を必要とする自己回帰デコーディングを採用し、各ステップは前のステップの出力に依存する。これにより、各ステップで完全なモデルパラメータをHigh-Bandwidth Memory (HBM)からアクセラレータのキャッシュに移動する必要があるため、ボトルネックが発生する。投機的復号法のような手法がこの問題に対処するために提案されているが、それらの実装は独立したドラフトモデルの取得と維持に関わる課題によって妨げられている。本稿では,後続の複数のトークンを並列に予測する追加のデコードヘッドを加えることで,LLM推論を拡張する効率的な手法Medusaを提案する。ツリーベースのアテンションメカニズムを使用して、Medusaは複数の候補継続を構築し、各デコードステップでそれらを同時に検証する。並列処理を活用することで、Medusaは必要なデコードステップの数を大幅に削減する。本稿では、異なるユースケースのニーズを満たすため、Medusaの2段階の微調整手順を提示する。 Medusa-1: Medusa は凍結したバックボーンLLM上に直接微調整され,ロスレスな推論の高速化を可能にする。 Medusa-2: MedusaはバックボーンLLMと共に微調整され、Medusaヘッドの予測精度が向上し、スピードアップが向上するが、バックボーンモデルの能力を保持する特別なトレーニングレシピが必要である。また、トレーニングデータがない状況に対処する自己蒸留や、生成品質を維持しつつ受け入れ率を高める典型的受理(typical acceptance)方式など、Medusaの有用性を向上または拡張するいくつかの拡張を提案する。様々な大きさのモデルと訓練手順を用いてMedusaを評価する。実験により,Medusa-1は生成品質を損なうことなく2.2倍以上の高速化が可能であり,Medusa-2はさらに2.3～3.6倍の高速化を実現することが示された。 Large Language Models (LLMs) employ auto-regressive decoding that requires sequential computation, with each step reliant on the previous one's output. This creates a bottleneck as each step necessitates moving the full model parameters from High-Bandwidth Memory (HBM) to the accelerator's cache. While methods such as speculative decoding have been suggested to address this issue, their implementation is impeded by the challenges associated with acquiring and maintaining a separate draft model. In this paper, we present Medusa, an efficient method that augments LLM inference by adding extra decoding heads to predict multiple subsequent tokens in parallel. Using a tree-based attention mechanism, Medusa constructs multiple candidate continuations and verifies them simultaneously in each decoding step. By leveraging parallel processing, Medusa substantially reduces the number of decoding steps required. 
We present two levels of fine-tuning procedures for Medusa to meet the needs of different use cases: Medusa-1: Medusa is directly fine-tuned on top of a frozen backbone LLM, enabling lossless inference acceleration. Medusa-2: Medusa is fine-tuned together with the backbone LLM, enabling better prediction accuracy of Medusa heads and higher speedup but needing a special training recipe that preserves the backbone model's capabilities. Moreover, we propose several extensions that improve or expand the utility of Medusa, including a self-distillation to handle situations where no training data is available and a typical acceptance scheme to boost the acceptance rate while maintaining generation quality. We evaluate Medusa on models of various sizes and training procedures. Our experiments demonstrate that Medusa-1 can achieve over 2.2x speedup without compromising generation quality, while Medusa-2 further improves the speedup to 2.3-3.6x.	翻訳日:2024-06-08 00:00:12 公開日:2024-06-05
# 脱獄攻撃から言語モデルを防御するためのロバストプロンプト最適化 Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks ( http://arxiv.org/abs/2401.17263v3 ) ライセンス: Link先を確認	Andy Zhou, Bo Li, Haohan Wang,	(参考訳) AIアライメントの進歩にもかかわらず、大規模言語モデル(LLM)は敵対的攻撃や脱獄に弱いままであり、敵は望ましくない行動を誘発するためにプロンプトを修正することができる。いくつかの防御策が提案されているが、新たに提案された攻撃やより挑戦的な脅威モデルには適応していない。そこで本稿では,LLMを脱獄攻撃から守るための最適化ベースの目的関数と,堅牢なシステムレベルの防御を生成するアルゴリズムであるロバストプロンプト最適化(RPO)を提案する。本手法では, 敵を防御目的に直接組み込み, 軽量かつ転移可能な接尾辞を最適化することにより, RPOが最悪の場合の適応攻撃に適応できるようにする。理論的および実験的結果は,最適化時に見られた脱獄と未知の脱獄の両方に対して堅牢性が向上することを示し,JailbreakBenchにおける攻撃成功率(ASR)をGPT-4で6%,Llama-2で0%に低減し,最先端を達成した。コードはhttps://github.com/lapisrocks/rpoにある。 Despite advances in AI alignment, large language models (LLMs) remain vulnerable to adversarial attacks or jailbreaking, in which adversaries can modify prompts to induce unwanted behavior. While some defenses have been proposed, they have not been adapted to newly proposed attacks and more challenging threat models. To address this, we propose an optimization-based objective for defending LLMs against jailbreaking attacks and an algorithm, Robust Prompt Optimization (RPO) to create robust system-level defenses. Our approach directly incorporates the adversary into the defensive objective and optimizes a lightweight and transferable suffix, enabling RPO to adapt to worst-case adaptive attacks. Our theoretical and experimental results show improved robustness to both jailbreaks seen during optimization and unknown jailbreaks, reducing the attack success rate (ASR) on GPT-4 to 6% and Llama-2 to 0% on JailbreakBench, setting the state-of-the-art. Code can be found at https://github.com/lapisrocks/rpo	翻訳日:2024-06-08 00:00:12 公開日:2024-06-05
# 絡み合いと測定の相補的関係 Complementary Relationships between Entanglement and Measurement ( http://arxiv.org/abs/2401.17537v2 ) ライセンス: Link先を確認	Michael Steiner, Ronald Rendell,	(参考訳) パターン可視性、予測可能性、識別可能性などの粒子の干渉特性に関する補完的な関係が存在する。さらに、情報ゲイン$G$と、絡み合ったスピン対に対する測定障害$F$の関係が知られている。ここでは、同様の絡み合いと測定の相補関係が生じるかどうかを考察する。量子ビット系では、単一系における測定と二部系における測定の両方が絡み合いに関して考慮される。 $\overline{E}+D\le 1$は、測定後の平均絡み合いが$\overline{E}$であり、1つの測定の計測乱れが$D$であることを示す。 Alice と Bob が共有する二部系の測定について、$\overline{E}+G\le 1$ ここで$G$は、Bob が得るアリスの結果に関する最大情報ゲインである。これらの結果は任意の初期混合状態と非エルミート作用素に対して一般化される。最大絡み合った初期状態の場合、$D\le E_{L}$と$G\le E_{L}$はアリスによる測定による絡み合い損失である。得られた乱れ量や情報取得量は、絡み合いによって厳密に制限されていると結論付けている。 Complementary relationships exist regarding interference properties of particles such as pattern visibility, predictability and distinguishability. Additionally, relationships are known between information gain $G$ and measurement disturbance $F$ for entangled spin pairs. The question of whether a similar complementary relationship between entanglement and measurement occurs is examined herein. For qubit systems, both measurement on a single system and measurements on a bipartite system are considered in regards to the entanglement. It is proven that $\overline{E}+D\le 1$ holds where $\overline{E}$ is the average entanglement after a measurement is made and for which $D$ is a measure of the measurement disturbance of a single measurement. For measurements on a bipartite system shared by Alice and Bob, it is shown that $\overline{E}+G\le 1$ where $G$ is the maximum information gain regarding Alice's result that can be obtained by Bob. These results are generalized for arbitrary initial mixed states and as well to non-Hermitian operators. In the case of maximally entangled initial states, it is found that $D\le E_{L}$ and $G\le E_{L}$ where $E_{L}$ is the entanglement loss due to measurement by Alice. We conclude that the amount of disturbance and information gain that one can gain are strictly limited by entanglement.	翻訳日:2024-06-08 00:00:12 公開日:2024-06-05
# 単調・バイリプシッツ・Polyak-Lojasiewiczネットワーク Monotone, Bi-Lipschitz, and Polyak-Lojasiewicz Networks ( http://arxiv.org/abs/2402.01344v4 ) ライセンス: Link先を確認	Ruigang Wang, Krishnamurthy Dvijotham, Ian R. Manchester,	(参考訳) 本稿では, リプシッツ性(入力摂動に対する出力感度)と逆リプシッツ性(異なる出力からの入力の識別可能性)の両方をスムーズに制御できる、新しいバイリプシッツ可逆ニューラルネットワークBiLipNetを提案する。 2つ目の主な貢献は、新しいスカラー出力ネットワークPLNetであり、これはBiLipNetと二次ポテンシャルの合成である。我々はPLNetがPolyak-Lojasiewicz条件を満たすことを示し、一意かつ効率的に計算可能な大域的最小値を持つ非凸サロゲート損失の学習に適用可能であることを示す。これらのネットワークの中心となる技術的要素は、強単調性とリプシッツ性が保証された新しい可逆残差層であり、これを直交層と合成してBiLipNetを構築する。これらの性質の保証は増分二次制約に基づいており、スペクトル正規化で達成できるよりもはるかに厳密な境界となる。さらに、BiLipNetの逆写像、ひいてはPLNetの最小値の計算を、高速アルゴリズムを適用可能な一連の3作用素分割問題として定式化する。 This paper presents a new bi-Lipschitz invertible neural network, the BiLipNet, which has the ability to smoothly control both its Lipschitzness (output sensitivity to input perturbations) and inverse Lipschitzness (input distinguishability from different outputs). The second main contribution is a new scalar-output network, the PLNet, which is a composition of a BiLipNet and a quadratic potential. We show that PLNet satisfies the Polyak-Lojasiewicz condition and can be applied to learn non-convex surrogate losses with a unique and efficiently-computable global minimum. The central technical element in these networks is a novel invertible residual layer with certified strong monotonicity and Lipschitzness, which we compose with orthogonal layers to build the BiLipNet. The certification of these properties is based on incremental quadratic constraints, resulting in much tighter bounds than can be achieved with spectral normalization. Moreover, we formulate the calculation of the inverse of a BiLipNet -- and hence the minimum of a PLNet -- as a series of three-operator splitting problems, for which fast algorithms can be applied.	翻訳日:2024-06-07 23:50:27 公開日:2024-06-05
# 補助的短遅延による強遅延フィードバックによる強化学習の強化 Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays ( http://arxiv.org/abs/2402.03141v2 ) ライセンス: Link先を確認	Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang,	(参考訳) 強化学習(Reinforcement Learning, RL)は、事象と知覚知覚の間の遅延の一般的な場合において困難である。最先端のSOTA(State-of-the-art State Augmentation)技術は、確率的環境における状態空間の爆発または性能劣化に悩まされる。これらの課題に対処するために, 確率環境における性能を損なうことなく, 短時間の遅延を含む補助的タスクを利用して, 長時間の遅延でRLを加速する, 補助的強化学習(AD-RL)手法を提案する。具体的には、AD-RLは短い遅延に対する値関数を学習し、ブートストラップとポリシー改善技術を用いて長い遅延に調整する。理論的には、これはサンプルの複雑さを大幅に減少させる可能性がある。決定論的および確率的ベンチマークでは,本手法はサンプル効率と政策性能の両方においてSOTAよりも有意に優れていた。コードはhttps://github.com/QingyuanWuNothing/AD-RLで入手できる。 Reinforcement learning (RL) is challenging in the common case of delays between events and their sensory perceptions. State-of-the-art (SOTA) state augmentation techniques either suffer from state space explosion or performance degeneration in stochastic environments. To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary tasks involving short delays to accelerate RL with long delays, without compromising performance in stochastic environments. Specifically, AD-RL learns a value function for short delays and uses bootstrapping and policy improvement techniques to adjust it for long delays. We theoretically show that this can greatly reduce the sample complexity. On deterministic and stochastic benchmarks, our method significantly outperforms the SOTAs in both sample efficiency and policy performance. Code is available at https://github.com/QingyuanWuNothing/AD-RL.	翻訳日:2024-06-07 23:50:27 公開日:2024-06-05
# ReLUニューラルネットワークの凸緩和は多項式時間で大域的最適解を近似する Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time ( http://arxiv.org/abs/2402.03625v2 ) ライセンス: Link先を確認	Sungyoon Kim, Mert Pilanci,	(参考訳) 本稿では,重み減衰で正則化された2層ReLUネットワークとその凸緩和との間の最適性ギャップについて検討する。トレーニングデータがランダムである場合,元の問題と緩和の間の相対的最適性ギャップは,n をトレーニングサンプル数として $O(\sqrt{\log n})$ の係数で有界であることを示す。単純な応用として、元の非凸問題を対数係数まで解くことが保証される、扱いやすい多項式時間アルゴリズムが得られる。さらに, 緩やかな仮定の下では, 局所勾配法が高い確率で訓練損失の低い点に収束することを示す。本結果は既存の結果と比較して指数関数的な改善であり,局所勾配法が有効である理由の理解に新たな光を当てる。 In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations. We show that when the training data is random, the relative optimality gap between the original problem and its relaxation can be bounded by a factor of $O(\sqrt{\log n})$, where n is the number of training samples. A simple application leads to a tractable polynomial-time algorithm that is guaranteed to solve the original non-convex problem up to a logarithmic factor. Moreover, under mild assumptions, we show that local gradient methods converge to a point with low training loss with high probability. Our result is an exponential improvement compared to existing results and sheds new light on understanding why local gradient methods work well.	翻訳日:2024-06-07 23:50:27 公開日:2024-06-05
# DySLIM:カオスシステムのための不変測度による動的安定学習 DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems ( http://arxiv.org/abs/2402.04467v2 ) ライセンス: Link先を確認	Yair Schiff, Zhong Yi Wan, Jeffrey B. Parker, Stephan Hoyer, Volodymyr Kuleshov, Fei Sha, Leonardo Zepeda-Núñez,	(参考訳) 散逸的なカオス系から力学を学ぶことは、正のリャプノフ指数によって形式化される固有の不安定性が学習された力学の誤差を指数関数的に増幅するため、非常に難しいことが知られている。しかし、これらの系の多くはエルゴード性とアトラクタを示す。アトラクタはコンパクトで非常に複雑な多様体であり、軌道は有限時間でそこに収束する。また、アトラクタは不変測度、すなわち力学の作用の下で不変な確率分布をサポートし、これがシステムの長期的な統計的挙動を規定する。本研究では、この構造を利用して、軌道間の不適合のみを対象とし軌道長の増加とともにしばしば発散する典型的な手法とは対照的に、力学に加えて不変測度の学習も対象とする新しい枠組みを提案する。このフレームワークを用いて、既存の任意の学習目的と併用できる、扱いやすくサンプル効率の良い目的関数を提案する。我々のDynamics Stable Learning by Invariant Measure (DySLIM) の目的関数は、他の学習目的と比較して、より優れたポイントワイズ追跡と長期統計精度を実現するモデルトレーニングを可能にする。スケーラブルな正則化項で分布をターゲットとすることで、気象や気候モデルのようなゆっくりと変化する分布を示すより複雑なシステムにこのアプローチを拡張できることを期待する。 Learning dynamics from dissipative chaotic systems is notoriously difficult due to their inherent instability, as formalized by their positive Lyapunov exponents, which exponentially amplify errors in the learned dynamics. However, many of these systems exhibit ergodicity and an attractor: a compact and highly complex manifold, to which trajectories converge in finite-time, that supports an invariant measure, i.e., a probability distribution that is invariant under the action of the dynamics, which dictates the long-term statistical behavior of the system. In this work, we leverage this structure to propose a new framework that targets learning the invariant measure as well as the dynamics, in contrast with typical methods that only target the misfit between trajectories, which often leads to divergence as the trajectories' length increases. We use our framework to propose a tractable and sample efficient objective that can be used with any existing learning objectives. 
Our Dynamics Stable Learning by Invariant Measure (DySLIM) objective enables model training that achieves better point-wise tracking and long-term statistical accuracy relative to other learning objectives. By targeting the distribution with a scalable regularization term, we hope that this approach can be extended to more complex systems exhibiting slowly-variant distributions, such as weather and climate models.	翻訳日:2024-06-07 23:50:27 公開日:2024-06-05
# Caduceus: 双方向等価長鎖DNA配列モデリング Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling ( http://arxiv.org/abs/2403.03234v2 ) ライセンス: Link先を確認	Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov,	(参考訳) 大規模シーケンスモデリングが急速に進歩し、生物学やゲノム工学に発展した。しかし、ゲノム配列のモデリングは、長距離トークン相互作用のモデル化の必要性、ゲノムの上流領域と下流領域の影響、DNAの逆相補性(RC)といった課題をもたらす。本稿では、長距離マンバブロックから構築したこれらの課題に動機づけられたアーキテクチャを提案し、それを双方向性をサポートするBiMambaコンポーネントに拡張し、さらにRC等分散をサポートするMambaDNAブロックに拡張する。 RC同種二方向長鎖DNA言語モデルの最初のファミリーであるCaduceusの基盤としてMambaDNAを用い,CaduceusのDNA基盤モデルを生成する事前学習および微調整戦略を導入する。 Caduceusは、ダウンストリームベンチマークで以前の長距離モデルよりも優れており、挑戦的な長距離変動効果予測タスクでは、双方向性や等分散を生かさない10倍の大きなモデルの性能を上回っている。 Large-scale sequence modeling has sparked rapid advances that now extend into biology and genomics. However, modeling genomic sequences introduces challenges such as the need to model long-range token interactions, the effects of upstream and downstream regions of the genome, and the reverse complementarity (RC) of DNA. Here, we propose an architecture motivated by these challenges that builds off the long-range Mamba block, and extends it to a BiMamba component that supports bi-directionality, and to a MambaDNA block that additionally supports RC equivariance. We use MambaDNA as the basis of Caduceus, the first family of RC equivariant bi-directional long-range DNA language models, and we introduce pre-training and fine-tuning strategies that yield Caduceus DNA foundation models. Caduceus outperforms previous long-range models on downstream benchmarks; on a challenging long-range variant effect prediction task, Caduceus exceeds the performance of 10x larger models that do not leverage bi-directionality or equivariance.	翻訳日:2024-06-07 23:50:27 公開日:2024-06-05
# SU(3)離散部分群に対する原始量子ゲート:$Σ(36\times3)$ Primitive Quantum Gates for an SU(3) Discrete Subgroup: $Σ(36\times3)$ ( http://arxiv.org/abs/2405.05973v2 ) ライセンス: Link先を確認	Erik J. Gustafson, Yao Ji, Henry Lamm, Edison M. Murairi, Shuchen Zhu,	(参考訳) 我々は、108要素の$\Sigma(36\times3)$群のデジタル量子シミュレーションのための原始ゲートセットを構築する。量子シミュレーションのために$SU(3)$の非アーベル結晶のような部分群が構築されたのはこれが初めてである。ゲージリンクレジスタと必要なプリミティブ -- 反転ゲート、グループ乗算ゲート、トレースゲート、および$\Sigma(36\times3)$ Fourier変換 -- は、8量子符号化と不均一3量子レジスタと2量子レジスタの両方に対して提示される。後者では、任意のユニタリをこのアーキテクチャに分解する特別なコンパイラが開発された。 We construct the primitive gate set for the digital quantum simulation of the 108-element $\Sigma(36\times3)$ group. This is the first time a nonabelian crystal-like subgroup of $SU(3)$ has been constructed for quantum simulation. The gauge link registers and necessary primitives -- the inversion gate, the group multiplication gate, the trace gate, and the $\Sigma(36\times3)$ Fourier transform -- are presented for both an eight-qubit encoding and a heterogeneous three-qutrit plus two-qubit register. For the latter, a specialized compiler was developed for decomposing arbitrary unitaries onto this architecture.	翻訳日:2024-06-07 23:50:27 公開日:2024-06-05
# $n$-stepリターンの平均化は強化学習における分散を低減する Averaging $n$-step Returns Reduces Variance in Reinforcement Learning ( http://arxiv.org/abs/2402.03903v2 ) ライセンス: Link先を確認	Brett Daley, Martha White, Marlos C. Machado,	(参考訳) $n$-stepリターンや$\lambda$-リターンといったマルチステップリターンは、強化学習(RL)手法のサンプル効率を改善するために一般的に使用される。マルチステップリターンの分散がその長さの制限要因となり、未来を遠くまで見すぎると分散が増加し、マルチステップ学習の利点が逆転する。本研究では、複合リターン、すなわち$n$-stepリターンの重み付き平均が分散を低減できることを示す。与えられた$n$-stepリターンと同じ縮小係数を持つ任意の複合リターンが、厳密に低い分散を持つことを初めて証明する。さらに,この分散低減特性が線形関数近似の下での時間差学習の有限サンプル複雑性を向上させることを証明した。一般の複合リターンは実装コストが高いため,ミニバッチ経験再生を用いた場合であっても,効率を保ちながら分散を低減できる2ブートストラップリターンを導入する。複合リターンがDQNやPPOのような$n$-step深層RLエージェントのサンプル効率をしばしば向上させることを示す実験を行った。 Multistep returns, such as $n$-step returns and $\lambda$-returns, are commonly used to improve the sample efficiency of reinforcement learning (RL) methods. The variance of the multistep returns becomes the limiting factor in their length; looking too far into the future increases variance and reverses the benefits of multistep learning. In our work, we demonstrate the ability of compound returns -- weighted averages of $n$-step returns -- to reduce variance. We prove for the first time that any compound return with the same contraction modulus as a given $n$-step return has strictly lower variance. We additionally prove that this variance-reduction property improves the finite-sample complexity of temporal-difference learning under linear function approximation. Because general compound returns can be expensive to implement, we introduce two-bootstrap returns which reduce variance while remaining efficient, even when using minibatched experience replay. We conduct experiments showing that compound returns often increase the sample efficiency of $n$-step deep RL agents like DQN and PPO.	翻訳日:2024-06-07 23:40:31 公開日:2024-06-05
# 離散状態空間上の生成フロー:タンパク質共設計への応用によるマルチモーダルフローの実現 Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design ( http://arxiv.org/abs/2402.04997v2 ) ライセンス: Link先を確認	Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, Tommi Jaakkola,	(参考訳) 離散データと連続データを組み合わせることは、生成モデルにとって重要な能力である。本稿では、離散データの新しいフローベースモデルである離散フローモデル(DFM)について述べる。私たちの重要な洞察は、連続時間マルコフ連鎖を用いて連続空間フローマッチングの離散的等価性を実現できるということです。 DFMは、離散拡散モデルを特定のインスタンスとして含む単純な導出の恩恵を受けつつ、既存の拡散に基づくアプローチよりも優れた性能を実現している。我々はDFM法を用いてマルチモーダルフローに基づくモデリングフレームワークを構築した。この能力をタンパク質共設計のタスクに適用し、タンパク質の構造と配列を共同生成するモデルを学ぶ。提案手法は,同じマルチモーダルモデルを用いてシーケンスや構造を柔軟に生成しながら,最先端の協調設計性能を実現する。 Combining discrete and continuous data is an important capability for generative models. We present Discrete Flow Models (DFMs), a new flow-based model of discrete data that provides the missing link in enabling flow-based generative models to be applied to multimodal continuous and discrete data problems. Our key insight is that the discrete equivalent of continuous space flow matching can be realized using Continuous Time Markov Chains. DFMs benefit from a simple derivation that includes discrete diffusion models as a specific instance while allowing improved performance over existing diffusion-based approaches. We utilize our DFMs method to build a multimodal flow-based modeling framework. We apply this capability to the task of protein co-design, wherein we learn a model for jointly generating protein structure and sequence. Our approach achieves state-of-the-art co-design performance while allowing the same multimodal model to be used for flexible generation of the sequence or structure.	翻訳日:2024-06-07 23:40:31 公開日:2024-06-05
# ゼロショットエンドツーエンド音声翻訳の限界を押し上げる Pushing the Limits of Zero-shot End-to-End Speech Translation ( http://arxiv.org/abs/2402.10422v2 ) ライセンス: Link先を確認	Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà,	(参考訳) データ不足とテキストモダリティ間のモダリティギャップは、エンドツーエンド音声翻訳(ST)システムの2つの大きな障害であり、その性能を損なう。従来の研究は、外部MTデータを活用することによるこれらの課題の軽減と、音声テキスト表現を近づける距離メトリクスの最適化を試みてきた。しかし、競争結果を達成するには、通常いくつかのSTデータが必要である。このため、ゼロショットSTの手法であるZeroSwotを導入し、ペアのSTデータを使わずにモダリティギャップをブリッジする。新たなCTC圧縮と最適トランスポートを利用して、ASRデータのみを用いて音声エンコーダを訓練し、多言語MTモデルの表現空間と整合する。音声エンコーダは、推論時にMTモデルとシームレスに統合され、MTモデルによってサポートされている全ての言語間で、音声からテキストへの直接変換を可能にする。実験の結果,STデータを使わずに効率よくモダリティギャップを塞ぐことができることがわかったが,MuST-CとCoVoSTは従来のゼロショットモデルだけでなく,教師付きモデルよりも手法の優位性を実証し,最先端の結果を得ることができた。 Data scarcity and the modality gap between the speech and text modalities are two major obstacles of end-to-end Speech Translation (ST) systems, thus hindering their performance. Prior work has attempted to mitigate these challenges by leveraging external MT data and optimizing distance metrics that bring closer the speech-text representations. However, achieving competitive results typically requires some ST data. For this reason, we introduce ZeroSwot, a method for zero-shot ST that bridges the modality gap without any paired ST data. Leveraging a novel CTC compression and Optimal Transport, we train a speech encoder using only ASR data, to align with the representation space of a massively multilingual MT model. The speech encoder seamlessly integrates with the MT model at inference, enabling direct translation from speech to text, across all languages supported by the MT model. Our experiments show that we can effectively close the modality gap without ST data, while our results on MuST-C and CoVoST demonstrate our method's superiority over not only previous zero-shot models, but also supervised ones, achieving state-of-the-art results.	翻訳日:2024-06-07 23:30:46 公開日:2024-06-05
# Llamasは英語で働くか?多言語トランスフォーマーの潜在言語について Do Llamas Work in English? On the Latent Language of Multilingual Transformers ( http://arxiv.org/abs/2402.10588v3 ) ライセンス: Link先を確認	Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West,	(参考訳) 我々は、言語モデルがどのように機能するか、言語バイアスの起源を理解する上で重要な問題である、英語を内部的なピボット言語として使用する、バランスの取れない英語支配のコーパスで訓練された多言語言語モデルかどうかを問う。変換器モデルのLlama-2ファミリに着目し,一意に正しい単発連続性を持つ英語でないプロンプトを慎重に構築する。層から層へ変換器は、最終プロンプトトークンの入力埋め込みを次の確率が計算される出力埋め込みに徐々にマッピングする。中間埋め込みを高次元空間で追跡すると、(1)中間埋め込みは出力トークンの埋め込みから遠く離れたところから始まり、(2)既に中間層で意味論的に正しい次のトークンを復号できるが、そのバージョンが英語で入力言語よりも高い確率を与える。これらの結果を「入力空間」と「概念空間」と「出力空間」の3つの相がそれぞれ動作する概念モデルにキャストした。重要な証拠としては、抽象的な「概念空間」は他の言語よりも英語に近いことが示唆されており、多言語言語モデルが持つバイアスに関して重要な結果をもたらす可能性がある。 We ask whether multilingual language models trained on unbalanced, English-dominated corpora use English as an internal pivot language -- a question of key importance for understanding how language models function and the origins of linguistic bias. Focusing on the Llama-2 family of transformer models, our study uses carefully constructed non-English prompts with a unique correct single-token continuation. From layer to layer, transformers gradually map an input embedding of the final prompt token to an output embedding from which next-token probabilities are computed. Tracking intermediate embeddings through their high-dimensional space reveals three distinct phases, whereby intermediate embeddings (1) start far away from output token embeddings; (2) already allow for decoding a semantically correct next token in the middle layers, but give higher probability to its version in English than in the input language; (3) finally move into an input-language-specific region of the embedding space. We cast these results into a conceptual model where the three phases operate in "input space", "concept space", and "output space", respectively. 
Crucially, our evidence suggests that the abstract "concept space" lies closer to English than to other languages, which may have important consequences regarding the biases held by multilingual language models.	翻訳日:2024-06-07 23:30:46 公開日:2024-06-05
# ソフトな自己整合性により言語モデルエージェントが改善 Soft Self-Consistency Improves Language Model Agents ( http://arxiv.org/abs/2402.13212v2 ) ライセンス: Link先を確認	Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal,	(参考訳) 大規模言語モデル(LLM)の生成は、最終的な答えを選択するために複数のソリューションのサンプリングとスコアリングによって改善される。自己整合性(SC)のような現在の「サンプルと選択」手法は、回答を得るために多数決に頼っている。しかし、タスクが多くの明瞭で有効な答えを持っている場合、投票による選択には多数のサンプルが必要である。これにより、SCは複数のアクション(回答)を逐次生成する対話的なタスクに対して、極めて高価になる。このようなタスクに対して多数決が一貫した利得を得られないことを確立した後、スコアリング基準を軟化して成功率を高める方法を示す。我々は,SCの不連続スコアをモデル確率から計算した連続スコアに置き換えるソフト自己整合性(SOFT-SC)を導入する。 SOFT-SCは長期の対話的タスクの性能と効率を向上し、SCと同等またはより良いパフォーマンスのために半分のサンプルを必要とする。一定の数のサンプルに対して、SOFT-SCは、bashプログラムの絶対的な成功率でSCを1.3%上回り、オンラインショッピング(WebShop)では6.6%増、インタラクティブホームゲーム(ALFWorld)では4.7%増となる。最後に,オープンソースモデルとブラックボックスモデルの両方に適用可能であることを示す。 Generations from large language models (LLMs) can be improved by sampling and scoring multiple solutions to select a final answer. Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers. However, when tasks have many distinct and valid answers, selection by voting requires a large number of samples. This makes SC prohibitively expensive for interactive tasks that involve generating multiple actions (answers) sequentially. After establishing that majority voting fails to provide consistent gains on such tasks, we demonstrate how to increase success rates by softening the scoring criterion. We introduce Soft Self-Consistency (SOFT-SC), which replaces SC's discontinuous scoring with a continuous score computed from model likelihoods, allowing for selection even when actions are sparsely distributed. SOFT-SC improves both performance and efficiency on long-horizon interactive tasks, requiring half as many samples as SC for comparable or better performance. 
For a fixed number of samples, SOFT-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping (WebShop), and a 4.7% increase for an interactive household game (ALFWorld). Finally, we show that SOFT-SC can be applied to both open-source and black-box models.	翻訳日:2024-06-07 21:22:40 公開日:2024-06-05
# 因果推論問題に対する言語モデルの最適化 Optimizing Language Models for Human Preferences is a Causal Inference Problem ( http://arxiv.org/abs/2402.14979v2 ) ライセンス: Link先を確認	Victoria Lin, Eli Ben-Michael, Louis-Philippe Morency,	(参考訳) 大規模言語モデル(LLM)が学術的・商業的に広く使われるようになるにつれて、言語モデルが人間の好みに沿ったテキストを生成する方法への関心が高まっている。本稿では,テキストと関連する数値結果からなる直接結果データセットから人選好の言語モデル最適化について検討する。まず,言語モデルの最適化を因果問題と見なして,モデルがテキストと結果の関係を正しく学習することを保証する。本稿では,この因果的言語最適化問題を形式化し,その問題に対する非バイアスな代用目的を解決する手法-因果的選好最適化(CPO)を開発した。さらにCPOを2倍に頑健なCPO(DR-CPO)で拡張し,サロゲート目標のばらつきを低減し,バイアスに対する強い保証を維持した。最後に, DR-CPOの有効性を実証的に実証し, 困難条件下でのDR-CPOのロバスト性を検証した。 As large language models (LLMs) see greater use in academic and commercial settings, there is increasing interest in methods that allow language models to generate texts aligned with human preferences. In this paper, we present an initial exploration of language model optimization for human preferences from direct outcome datasets, where each sample consists of a text and an associated numerical outcome measuring the reader's response. We first propose that language model optimization should be viewed as a causal problem to ensure that the model correctly learns the relationship between the text and the outcome. We formalize this causal language optimization problem, and we develop a method--causal preference optimization (CPO)--that solves an unbiased surrogate objective for the problem. We further extend CPO with doubly robust CPO (DR-CPO), which reduces the variance of the surrogate objective while retaining provably strong guarantees on bias. Finally, we empirically demonstrate the effectiveness of (DR-)CPO in optimizing state-of-the-art LLMs for human preferences on direct outcome data, and we validate the robustness of DR-CPO under difficult confounding conditions.	翻訳日:2024-06-07 21:12:20 公開日:2024-06-05
# SoK:フェデレーション・アンラーニングにおける課題と機会 SoK: Challenges and Opportunities in Federated Unlearning ( http://arxiv.org/abs/2403.02437v2 ) ライセンス: Link先を確認	Hyejun Jeong, Shiqing Ma, Amir Houmansadr,	(参考訳) 2017年に導入されたフェデレートラーニング(FL)は、当事者間でデータを明示的に共有することなく、互いに信頼していない当事者間の協調学習を可能にする。これにより、GDPRやCPRAといったプライバシー規制を尊重しながら、ユーザデータでモデルを訓練できる。しかし、新たなプライバシ要件は、データ所有者や法執行機関から要求された場合などに、学習済みデータの一部を「忘れる」(forget)能力をモデル所有者に義務付ける可能性がある。これにより、「機械アンラーニング」(machine unlearning)と呼ばれる活発な研究分野が生まれた。 FLの文脈では、集中型の環境でのアンラーニングのために開発された多くのテクニックは、簡単には適用できない。これは、集中学習と分散学習の間の固有の違い、特にFLにおける相互作用性、確率性、不均一性、限定的なアクセシビリティによるものである。これに対し、最近の研究はFLに適したアンラーニングメカニズムの開発に重点を置いている。本論文は、この新興分野の研究動向と課題を特定することを目的として、フェデレーテッド・アンラーニング(federated unlearning)の文献を深く調査する。 FLアンラーニングについて(2020年以降に)発表された論文を慎重に分類することで、フェデレートされたアンラーニングのユニークな複雑さを特定し、集中型アンラーニングメソッドを直接適用する際の制限を強調することを目指している。我々は、影響の除去と性能回復に関する既存のフェデレーテッド・アンラーニング手法を比較し、脅威モデルと仮定を比較し、その意味と限界について議論する。例えば、データの不均一性とそのシミュレーション、デモに使われるデータセット、評価指標など、さまざまな観点からFLアンラーニング研究の実験的なセットアップを分析する。我々の研究は、将来のフェデレーション・アンラーニング研究のための洞察と提案を提供することを目的としている。 Federated learning (FL), introduced in 2017, facilitates collaborative learning between non-trusting parties with no need for the parties to explicitly share their data among themselves. This allows training models on user data while respecting privacy regulations such as GDPR and CPRA. However, emerging privacy requirements may mandate model owners to be able to forget some learned data, e.g., when requested by data owners or law enforcement. This has given birth to an active field of research called machine unlearning. In the context of FL, many techniques developed for unlearning in centralized settings are not trivially applicable! This is due to the unique differences between centralized and distributed learning, in particular, interactivity, stochasticity, heterogeneity, and limited accessibility in FL. In response, a recent line of work has focused on developing unlearning mechanisms tailored to FL. 
This SoK paper aims to take a deep look at the federated unlearning literature, with the goal of identifying research trends and challenges in this emerging field. By carefully categorizing papers published on FL unlearning (since 2020), we aim to pinpoint the unique complexities of federated unlearning, highlighting limitations on directly applying centralized unlearning methods. We compare existing federated unlearning methods regarding influence removal and performance recovery, compare their threat models and assumptions, and discuss their implications and limitations. For instance, we analyze the experimental setup of FL unlearning studies from various perspectives, including data heterogeneity and its simulation, the datasets used for demonstration, and evaluation metrics. Our work aims to offer insights and suggestions for future research on federated unlearning.	翻訳日:2024-06-07 21:02:35 公開日:2024-06-05
# DRAGIN:大規模言語モデルの情報要求に基づく動的検索拡張生成 DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models ( http://arxiv.org/abs/2403.10081v2 ) ライセンス: Link先を確認	Weihang Su, Yichen Tang, Qingyao Ai, Zhijing Wu, Yiqun Liu,	(参考訳) 動的検索拡張生成(RAG)パラダイムは,Large Language Models(LLMs)のテキスト生成プロセスにおいて,いつ,何を検索するかを積極的に決定する。このパラダイムには2つの重要な要素がある: 検索モジュールをアクティベートする最適なモーメントを識別する(検索するタイミングを決定する)ことと、検索が起動したら適切なクエリを作成する(検索する項目を決定する)ことである。しかし、現在の動的RAGメソッドはどちらの面においても不足している。まず、いつ取得するかを決める戦略は、しばしば静的なルールに依存します。さらに、何を取得するかを決める戦略は、通常、LLMの最新の文や最後のいくつかのトークンに制限されるが、LLMのリアルタイム情報要求は、コンテキスト全体にまたがる可能性がある。これらの制約を克服するために,LLMのリアルタイム情報要求に基づく動的検索拡張生成(DRAGIN)という新しいフレームワークを導入する。本フレームワークは,テキスト生成プロセスにおいて,LLMのリアルタイム情報要求に基づいて,いつ,何を取得するかを決定するように設計されている。 DRAGINと既存の4つの知識集約型生成データセットを包括的に比較した。実験の結果,DRAGINは全タスクにおいて優れた性能を示し,本手法の有効性を実証した。 https://github.com/oneal2000/DRAGIN/tree/main Dynamic retrieval augmented generation (RAG) paradigm actively decides when and what to retrieve during the text generation process of Large Language Models (LLMs). There are two key elements of this paradigm: identifying the optimal moment to activate the retrieval module (deciding when to retrieve) and crafting the appropriate query once retrieval is triggered (determining what to retrieve). However, current dynamic RAG methods fall short in both aspects. Firstly, the strategies for deciding when to retrieve often rely on static rules. Moreover, the strategies for deciding what to retrieve typically limit themselves to the LLM's most recent sentence or the last few tokens, while the LLM's real-time information needs may span across the entire context. To overcome these limitations, we introduce a new framework, DRAGIN, i.e., Dynamic Retrieval Augmented Generation based on the real-time Information Needs of LLMs. Our framework is specifically designed to make decisions on when and what to retrieve based on the LLM's real-time information needs during the text generation process. 
We evaluate DRAGIN along with existing methods comprehensively over 4 knowledge-intensive generation datasets. Experimental results show that DRAGIN achieves superior performance on all tasks, demonstrating the effectiveness of our method. We have open-sourced all the code, data, and models in GitHub: https://github.com/oneal2000/DRAGIN/tree/main	翻訳日:2024-06-07 20:52:38 公開日:2024-06-05
# VORTEX:リアルタイムオフチェーン支払いと暗号通貨のクロスチェーンスワップ VORTEX: Real-Time Off-Chain Payments and Cross-Chain Swaps for Cryptocurrencies ( http://arxiv.org/abs/2403.15191v3 ) ライセンス: Link先を確認	Di Wu, Jian Liu, Zhengwei Hou, Wu Wen, Kui Ren,	(参考訳) 本稿では、暗号通貨の分野におけるオフチェーン決済とクロスチェーンスワップという2つの重要な課題に対処する、TEEベースのレイヤ2ソリューションであるVERTEXを提案する。VERTEXには3つの注目すべき特徴がある。 - チャネル不要のオフチェーン支払い: オンチェーンの関係や仲介チャネルを必要とせずに、支払者が誰にでも直接支払いができる。 - リアルタイムかつ分散型のクロスチェーンスワップ: 中央サーバに頼ることなくリアルタイムのクロスチェーンスワップを可能にする、知られている限り最初のソリューションである。この新機能は、画期的な公正交換プロトコルによって実現されている。 - TEEクラッシュ耐性: TEEのクラッシュに対処する2つのソリューションを提供し、そのうちの1つはタイムロックパズルの革新的な応用を含む。我々は1000ノードからなるネットワーク上でECHOを評価し,その評価結果から,ECHOが7000TPSを達成できることを示す。 In this paper, we present VERTEX, a TEE-based layer-2 solution that tackles two crucial challenges in the realm of cryptocurrencies: off-chain payments and cross-chain swaps. It offers three notable features: - Channel-free off-chain payments: it allows a payer to make direct payments to anyone without requiring any on-chain relationship or intermediary channels. - Real-time yet decentralized cross-chain swaps: it is the first known solution that enables real-time cross-chain swaps without relying on a central server. This novel feature is made possible through a ground-breaking fair exchange protocol. - TEE crash-tolerance: it offers two solutions to handle TEE crashes, one of which involves an innovative application of time-lock puzzles in this context. We evaluate ECHO on a network consisting of 1000 nodes, and the evaluation results show that ECHO can achieve 7000 TPS.	翻訳日:2024-06-07 20:52:38 公開日:2024-06-05
# 動的システムの高精度かつ効率的な予測のためのハイブリッド化と次世代貯留層計算 Hybridizing Traditional and Next-Generation Reservoir Computing to Accurately and Efficiently Forecast Dynamical Systems ( http://arxiv.org/abs/2403.18953v2 ) ライセンス: Link先を確認	Ravi Chepuri, Dael Amzalag, Thomas Antonsen Jr., Michelle Girvan,	(参考訳) Reservoir Computer (RC) は時系列予測のための強力な機械学習アーキテクチャである。近年,次世代貯水池コンピュータ (NGRC) が登場し,計算コストの削減やトレーニングデータ要求の低減など,RCに対して明確な優位性を提供している。しかし、NGRCはデータのサンプリング時間や非線形性のタイプに敏感であるなど、実際的な困難がある。本稿では,動的システムの時系列予測のためのハイブリッドRC-NGRC手法を提案する。計算資源の制限,準最適ハイパーパラメータ,疎サンプリングされたトレーニングデータなどの制約により,我々のハイブリッドアプローチは,カオス力学系の長期統計を正確に予測し,RCとNGRCのみが不足している状況において捉えることができることを示す。これらの条件下では, 小型貯水池を用いたハイブリッドRC-NGRC法は, 従来のRCよりもはるかに大きな貯水池に近づき, 従来のRCよりも計算効率が大きく向上し, 同時にNGRCの限界にも対処できることを示す。計算効率が高く,NGRC単独では不十分な場合に,ハイブリッドRC-NGRCアプローチが特に有用である可能性が示唆された。 Reservoir computers (RCs) are powerful machine learning architectures for time series prediction. Recently, next generation reservoir computers (NGRCs) have been introduced, offering distinct advantages over RCs, such as reduced computational expense and lower training data requirements. However, NGRCs have their own practical difficulties, including sensitivity to sampling time and type of nonlinearities in the data. Here, we introduce a hybrid RC-NGRC approach for time series forecasting of dynamical systems. We show that our hybrid approach can produce accurate short term predictions and capture the long term statistics of chaotic dynamical systems in situations where the RC and NGRC components alone are insufficient, e.g., due to constraints from limited computational resources, sub-optimal hyperparameters, sparsely-sampled training data, etc. 
Under these conditions, we show for multiple model chaotic systems that the hybrid RC-NGRC method with a small reservoir can achieve prediction performance approaching that of a traditional RC with a much larger reservoir, illustrating that the hybrid approach can offer significant gains in computational efficiency over traditional RCs while simultaneously addressing some of the limitations of NGRCs. Our results suggest that hybrid RC-NGRC approach may be particularly beneficial in cases when computational efficiency is a high priority and an NGRC alone is not adequate.	翻訳日:2024-06-07 20:42:53 公開日:2024-06-05
# TOD3Cap:屋外シーンでの3D映像撮影を目指す TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes ( http://arxiv.org/abs/2403.19589v2 ) ライセンス: Link先を確認	Bu Jin, Yupeng Zheng, Pengfei Li, Weize Li, Yuhang Zheng, Sujie Hu, Xinyu Liu, Jinwei Zhu, Zhijie Yan, Haiyang Sun, Kun Zhan, Peng Jia, Xiaoxiao Long, Yilun Chen, Hao Zhao,	(参考訳) 3D高密度キャプションは、自然言語による3Dシーンの包括的理解を実現するための基盤となる。最近、特に屋内で顕著な成果をみせている。しかし、屋外シーンにおける3次元高密度キャプションの探索は、2つの大きな課題によって妨げられている。 1) ダイナミックスや疎視的入力などの屋内と屋外のシーン間の領域ギャップは,既存の屋内手法を直接適用することが困難である。 2) アウトドアシーンに適した包括的ボックスキャプションペアアノテーションによるデータ不足。そこで本研究では,屋外3次元高密度キャプションの新たな課題について紹介する。入力として,パノラマカメラリグで撮影したLiDAR点雲とRGB画像のセットを仮定する。期待される出力は、キャプション付きのオブジェクトボックスのセットです。この課題に対処するために,BEV表現を利用してオブジェクトボックスの提案を生成し,リレーショナルQ-FormerとLLaMA-Adapterを統合するTOD3Capネットワークを提案する。また、850シーンから64.3Kの屋外オブジェクトを2.3M記述したTOD3Capデータセットも導入した。特に,私たちのTOD3Capネットワークは,屋外シーンにおける3Dオブジェクトのローカライズとキャプションを効果的に行うことができ,ベースライン手法の精度を著しく向上させる(+9.6 CiDEr@0.5IoU)。コード、データ、モデルはhttps://github.com/jxbbb/TOD3Capで公開されている。 3D dense captioning stands as a cornerstone in achieving a comprehensive understanding of 3D scenes through natural language. It has recently witnessed remarkable achievements, particularly in indoor settings. However, the exploration of 3D dense captioning in outdoor scenes is hindered by two major challenges: 1) the domain gap between indoor and outdoor scenes, such as dynamics and sparse visual inputs, makes it difficult to directly adapt existing indoor methods; 2) the lack of data with comprehensive box-caption pair annotations specifically tailored for outdoor scenes. To this end, we introduce the new task of outdoor 3D dense captioning. As input, we assume a LiDAR point cloud and a set of RGB images captured by the panoramic camera rig. The expected output is a set of object boxes with captions. To tackle this task, we propose the TOD3Cap network, which leverages the BEV representation to generate object box proposals and integrates Relation Q-Former with LLaMA-Adapter to generate rich captions for these objects. 
We also introduce the TOD3Cap dataset, the largest one to our knowledge for 3D dense captioning in outdoor scenes, which contains 2.3M descriptions of 64.3K outdoor objects from 850 scenes. Notably, our TOD3Cap network can effectively localize and caption 3D objects in outdoor scenes, which outperforms baseline methods by a significant margin (+9.6 CiDEr@0.5IoU). Code, data, and models are publicly available at https://github.com/jxbbb/TOD3Cap.	翻訳日:2024-06-07 20:42:53 公開日:2024-06-05
# ベンチマークの分布仮定に対するLLM評価のロバスト性の検証 Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks ( http://arxiv.org/abs/2404.16966v2 ) ライセンス: Link先を確認	Melissa Ailem, Katerina Marazopoulou, Charlotte Siska, James Bono,	(参考訳) ベンチマークは、LLM(Large Language Models)を評価するための中心的なアプローチとして登場した。研究コミュニティは、モデルの性能を評価するために、ベンチマークのテストプロンプト全体にわたるモデルの平均性能に頼ることが多い。これは、ベンチマーク内のテストプロンプトが実世界の関心の分布からのランダムなサンプルであるという仮定と一致している。我々は、これは一般には成り立たず、関心の分布は特定のユースケースによって異なると考える。我々は、(1) テストプロンプト間のモデル性能の相関は非ランダムであること、(2) テストプロンプト間の相関を考慮すると主要なベンチマーク上のモデルランキングが変わりうること、(3) これらの相関の説明要因として意味的類似性とLLMに共通する失敗点が挙げられることを見出した。 Benchmarks have emerged as the central approach for evaluating Large Language Models (LLMs). The research community often relies on a model's average performance across the test prompts of a benchmark to evaluate the model's performance. This is consistent with the assumption that the test prompts within a benchmark represent a random sample from a real-world distribution of interest. We note that this is generally not the case; instead, we hold that the distribution of interest varies according to the specific use case. We find that (1) the correlation in model performance across test prompts is non-random, (2) accounting for correlations across test prompts can change model rankings on major benchmarks, (3) explanatory factors for these correlations include semantic similarity and common LLM failure points.	翻訳日:2024-06-07 20:33:09 公開日:2024-06-05
# 人間と大言語モデルにおける創造的プロセスの特徴付け Characterising the Creative Process in Humans and Large Language Models ( http://arxiv.org/abs/2405.00899v2 ) ライセンス: Link先を確認	Surabhi S. Nath, Peter Dayan, Claire Stevenson,	(参考訳) 大規模言語モデルは非常に創造的に見え、創造的なタスクにおいて平均的な人間と同等の成績を示すことが多い。しかし、LLMの創造性に関する研究は成果物(products)のみに焦点を当てており、創造の過程(process)にはほとんど注意が払われていない。人間の創造性のプロセス分析は、しばしば手作業でコード化したカテゴリを必要としたり応答時間を利用したりするが、これらはLLMには適用できない。本稿では、人間とLLMが代替用途課題(Alternate Uses Task)において意味空間をどのように探索するかを特徴付ける自動化手法を提供し、言語流暢性課題(Verbal Fluency Task)における行動と対比する。文埋め込みを用いて応答カテゴリを識別し、意味的類似性を計算して、ジャンププロファイルの生成に用いる。我々の結果は、持続的(少数の意味空間における深い探索)と柔軟(複数の意味空間にわたる広い探索)の両方の創造性への経路を報告した人間に関する先行研究を裏付けており、両経路は同程度の創造性スコアにつながる。 LLMは、タスクによって異なるが、持続的または柔軟な経路のいずれかに偏ることがわかった。集団としてのLLMは人間のプロファイルと一致するが、創造性との関係は異なり、より柔軟なモデルほど創造性のスコアが高い。我々のデータセットとスクリプトはGitHub(https://github.com/surabhisnath/Creative_Process)で入手できる。 Large language models appear quite creative, often performing on par with the average human on creative tasks. However, research on LLM creativity has focused solely on products, with little attention on the creative process. Process analyses of human creativity often require hand-coded categories or exploit response times, which do not apply to LLMs. We provide an automated method to characterise how humans and LLMs explore semantic spaces on the Alternate Uses Task, and contrast with behaviour in a Verbal Fluency Task. We use sentence embeddings to identify response categories and compute semantic similarities, which we use to generate jump profiles. Our results corroborate earlier work in humans reporting both persistent (deep search in few semantic spaces) and flexible (broad search across multiple semantic spaces) pathways to creativity, where both pathways lead to similar creativity scores. LLMs were found to be biased towards either persistent or flexible paths, that varied across tasks. Though LLMs as a population match human profiles, their relationship with creativity is different, where the more flexible models score higher on creativity. 
Our dataset and scripts are available on GitHub (https://github.com/surabhisnath/Creative_Process).	翻訳日:2024-06-07 20:33:09 公開日:2024-06-05
# グラフニューラルネットワークの条件シフト・ロバスト整形予測 Conditional Shift-Robust Conformal Prediction for Graph Neural Network ( http://arxiv.org/abs/2405.11968v2 ) ライセンス: Link先を確認	S. Akansha,	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造化データの結果を予測する強力なツールとして登場した。有効性にもかかわらず、GNNの重大な欠点は、堅牢な不確実性推定を提供する能力が限られていることであり、エラーが重大な結果をもたらす状況において、信頼性に課題が生じる。さらに、GNNは通常、トレーニングデータとテストデータが同一の分布に従うと仮定する分布内設定で優れた性能を示すが、この条件は実世界のグラフデータでは満たされないことが多い。本稿では,予測モデル出力を予測集合に変換することで不確かさを定量化するための,広く知られている統計手法である共形予測(conformal prediction)を利用して,グラフベースの半教師あり学習(SSL)における条件シフト(ソースドメインからターゲットドメインへの条件付き確率分布 P(label | input) の変化)の下でのGNN予測の不確実性定量化に取り組む。さらに,潜在段階における条件シフトを最小限に抑えて,モデル予測の精細化を目的とした新たな損失関数を提案する。条件シフトロバスト (CondSR) によるGNNの共形予測は, モデルに依存しない, 様々な分類モデルに適用可能なアプローチである。提案手法の有効性を標準グラフベンチマークデータセットで検証し,ノード分類タスクにおける最先端のGNNと統合する。包括的評価により,提案手法は事前に定めた任意の目標周辺カバレッジを一貫して達成し,条件シフト下での最先端GNNモデルの精度を最大12%向上させ,予測セットサイズを最大48%削減することを示す。コードの実装は、さらなる探索と実験のために公開されています。 Graph Neural Networks (GNNs) have emerged as potent tools for predicting outcomes in graph-structured data. Despite their efficacy, a significant drawback of GNNs lies in their limited ability to provide robust uncertainty estimates, posing challenges to their reliability in contexts where errors carry significant consequences. Moreover, GNNs typically excel in in-distribution settings, assuming that training and test data follow identical distributions, a condition often unmet in real-world graph data scenarios. In this article, we leverage conformal prediction, a widely recognized statistical technique for quantifying uncertainty by transforming predictive model outputs into prediction sets, to address uncertainty quantification in GNN predictions amidst conditional shift (the change in the conditional probability distribution P(label | input) from source domain to target domain) in graph-based semi-supervised learning (SSL). Additionally, we propose a novel loss function aimed at refining model predictions by minimizing conditional shift in latent stages. 
Termed Conditional Shift Robust (CondSR) conformal prediction for GNNs, our approach CondSR is model-agnostic and adaptable to various classification models. We validate the effectiveness of our method on standard graph benchmark datasets, integrating it with state-of-the-art GNNs in node classification tasks. Comprehensive evaluations demonstrate that our approach consistently achieves any predefined target marginal coverage, enhances the accuracy of state of the art GNN models by up to 12% under conditional shift, and reduces the prediction set size by up to 48%. The code implementation is publicly available for further exploration and experimentation.	翻訳日:2024-06-07 20:23:24 公開日:2024-06-05
# 複合現実感に向けたマルチモーダルファイングラインドトレーニングアシスタントのための自律ワークフロー Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality ( http://arxiv.org/abs/2405.13034v2 ) ライセンス: Link先を確認	Jiahuan Pei, Irene Viola, Haochen Huang, Junxiao Wang, Moonisa Ahsan, Fanghua Ye, Jiang Yiming, Yao Sai, Di Wang, Zhumin Chen, Pengjie Ren, Pablo Cesar,	(参考訳) 自律人工知能(AI)エージェントは、言語ベースの環境を自動的に理解するための有望なプロトコルとして、特に大規模言語モデル(LLM)の指数関数的開発とともに登場した。しかし、マルチモーダル環境の詳細な包括的理解はいまだ未解明のままである。この作業は、AIエージェントを詳細にトレーニングするための拡張現実(XR)アプリケーションにシームレスに統合するための自律ワークフローを設計する。パイロットXR環境におけるLEGOブロック組立のためのマルチモーダルきめ細粒度トレーニングアシスタントのデモンストレーションを行う。具体的には、記憶、計画、XRツールとの相互作用をLLMと統合した脳言語エージェントと視覚言語エージェントを設計し、エージェントが過去の経験に基づいて行動を決定することを可能にする。さらに,商業LLMによって提供されるワークフローで自動的に合成される多モーダルなアセンブリ・ダイアログ・データセットLEGO-MRTAを紹介する。このデータセットは、マルチモーダルな指示マニュアル、会話、XR応答、視覚質問応答を含む。最後に,提案したデータセットを微調整することなく,その性能を評価するため,複数のオープンソース LLM をベンチマークとして提示する。我々は、このワークフローのより広範な影響が、XR環境におけるシームレスなユーザインタラクションのためのスマートアシスタントの開発を促進し、AIとHCIコミュニティの両方の研究を促進することを期待する。 Autonomous artificial intelligence (AI) agents have emerged as promising protocols for automatically understanding the language-based environment, particularly with the exponential development of large language models (LLMs). However, a fine-grained, comprehensive understanding of multimodal environments remains under-explored. This work designs an autonomous workflow tailored for integrating AI agents seamlessly into extended reality (XR) applications for fine-grained training. We present a demonstration of a multimodal fine-grained training assistant for LEGO brick assembly in a pilot XR environment. Specifically, we design a cerebral language agent that integrates LLM with memory, planning, and interaction with XR tools and a vision-language agent, enabling agents to decide their actions based on past experiences. Furthermore, we introduce LEGO-MRTA, a multimodal fine-grained assembly dialogue dataset synthesized automatically in the workflow served by a commercial LLM. 
This dataset comprises multimodal instruction manuals, conversations, XR responses, and vision question answering. Last, we present several prevailing open-resource LLMs as benchmarks, assessing their performance with and without fine-tuning on the proposed dataset. We anticipate that the broader impact of this workflow will advance the development of smarter assistants for seamless user interaction in XR environments, fostering research in both AI and HCI communities.	翻訳日:2024-06-07 20:23:24 公開日:2024-06-05
# 説明可能な音声感情認識のための反復的特徴増強 Iterative Feature Boosting for Explainable Speech Emotion Recognition ( http://arxiv.org/abs/2405.20172v3 ) ライセンス: Link先を確認	Alaa Nfissi, Wassim Bouachir, Nizar Bouguila, Brian Mishara,	(参考訳) 音声感情認識(SER)では、その実用的重要性を考慮せずに事前定義された特徴を用いることで、冗長で無関係な情報を含む高次元データセットが生成される可能性がある。その結果、高次元学習はしばしば計算複雑性を増大させながらモデルの精度を低下させる。本研究は,効率的なSERシステムを構築するために,特徴を慎重に検討し,分析することの重要性を浮き彫りにしている。本稿では,効率的な特徴工学手法に基づく新しい教師付きSER手法を提案する。特徴の関連性を評価し,特徴セットを洗練させるために,結果の説明可能性に特に注意を払っている。これは機能評価ループを通じて反復的に実行され、Shapley値を使用して機能選択を強化し、フレームワーク全体のパフォーマンスを改善する。このアプローチによって、モデルパフォーマンスと透明性のメリットのバランスが取れます。提案手法は,TESSデータセット上での感情認識において,ヒトレベルのパフォーマンス(HLP)および最先端の機械学習手法より優れる。本論文のソースコードはhttps://github.com/alaaNfissi/Iterative-Feature-Boosting-for-Explainable-Speech-Emotion-Recognitionで公開されている。 In speech emotion recognition (SER), using predefined features without considering their practical importance may lead to high dimensional datasets, including redundant and irrelevant information. Consequently, high-dimensional learning often results in decreasing model accuracy while increasing computational complexity. Our work underlines the importance of carefully considering and analyzing features in order to build efficient SER systems. We present a new supervised SER method based on an efficient feature engineering approach. We pay particular attention to the explainability of results to evaluate feature relevance and refine feature sets. This is performed iteratively through feature evaluation loop, using Shapley values to boost feature selection and improve overall framework performance. Our approach allows thus to balance the benefits between model performance and transparency. The proposed method outperforms human-level performance (HLP) and state-of-the-art machine learning methods in emotion recognition on the TESS dataset. The source code of this paper is publicly available at https://github.com/alaaNfissi/Iterative-Feature-Boosting-for-Explainable-Speech-Emotion-Recognition.	
翻訳日:2024-06-07 20:03:47 公開日:2024-06-05
# 合理性を考慮したマルチモーダル・マルチエージェントシステム:サーベイ Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey ( http://arxiv.org/abs/2406.00252v2 ) ライセンス: Link先を確認	Bowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Weijie J. Su, Camillo J. Taylor, Tanwi Mallick,	(参考訳) 合理性(Rationality)とは、理性によって導かれる性質であり、証拠や論理規則に沿った論理的思考と意思決定によって特徴づけられる。この性質は、解決策が十分な根拠を持ち体系的に導出されることを保証するため、効果的な問題解決に不可欠である。大規模言語モデル(LLM)は顕著な精度で人間に似たテキストを生成できるまでに進歩したが、トレーニングデータから継承されたバイアス、異なるコンテキスト間での不整合、複数のコンテキスト層を含む複雑なシナリオの理解の困難さといった問題を抱えている。したがって、近年の研究は、一貫性と信頼性を高めるために、様々な種類のデータやツールと協調して働く複数のエージェントの強みを活用しようとしている。そこで本稿は,マルチモーダルシステムとマルチエージェントシステムが合理性に向かって進んでいるかを理解することを目的として,最先端の研究を調査し,合理性の観点から単一エージェント・単一モーダルシステムに対する進歩を特定し,未解決の問題と今後の方向性について議論する。 我々は https://github.com/bowen-upenn/MMMA_Rationality でオープンリポジトリを維持している。 Rationality is the quality of being guided by reason, characterized by logical thinking and decision-making that align with evidence and logical rules. This quality is essential for effective problem-solving, as it ensures that solutions are well-founded and systematically derived. Despite the advancements of large language models (LLMs) in generating human-like text with remarkable accuracy, they present biases inherited from the training data, inconsistency across different contexts, and difficulty understanding complex scenarios involving multiple layers of context. Therefore, recent research attempts to leverage the strength of multiple agents working collaboratively with various types of data and tools for enhanced consistency and reliability. To that end, this paper aims to understand whether multi-modal and multi-agent systems are advancing toward rationality by surveying the state-of-the-art works, identifying advancements over single-agent and single-modal systems in terms of rationality, and discussing open problems and future directions. We maintain an open repository at https://github.com/bowen-upenn/MMMA_Rationality.	翻訳日:2024-06-07 20:03:47 公開日:2024-06-05
# メル周波数ケプストラム係数を用いた心音分類の高度化 : 単一分類器とアンサンブル分類器の戦略の比較検討 Enhanced Classification of Heart Sounds Using Mel Frequency Cepstral Coefficients: A Comparative Study of Single and Ensemble Classifier Strategies ( http://arxiv.org/abs/2406.00702v2 ) ライセンス: Link先を確認	Amir Masoud Rahmani, Amir Haider, Parisa Khoshvaght, Mohammad Adeli, Entesar Gemeay, Yazeed Alkhrijah, Mokhtar Mohammadi, Mehdi Hosseinzadeh,	(参考訳) 本稿では,単一分類器とアンサンブル分類器の2つの分類戦略を用いて,異常心音図の検出におけるメル周波数ケプストラム係数(MFCC)の有効性を検討する。心音図をS1,収縮期,S2,拡張期の各区間に分割し,各区間から13個のMFCCを推定して,1拍あたり52個のMFCCを得た。単一分類器戦略では,連続する9拍のMFCCを平均して心音図を分類した。一方,アンサンブル分類器戦略では,9つの分類器を用いて各拍を正常または異常として個別に評価し,全体の分類は多数決に基づいて行った。どちらの方法も一般に公開されている心音図データベース上でテストされた。その結果,アンサンブル分類器戦略は単一分類器戦略よりも高い精度を達成し,MFCCが,同様の研究で評価された時間特徴,時間周波数特徴,統計的特徴などの他の特徴よりも有効であることが示された。 This paper explores the efficacy of Mel Frequency Cepstral Coefficients (MFCCs) in detecting abnormal phonocardiograms using two classification strategies: a single-classifier and an ensemble-classifier approach. Phonocardiograms were segmented into S1, systole, S2, and diastole intervals, with thirteen MFCCs estimated from each segment, yielding 52 MFCCs per beat. In the single-classifier strategy, the MFCCs from nine consecutive beats were averaged to classify phonocardiograms. Conversely, the ensemble-classifier strategy employed nine classifiers to individually assess beats as normal or abnormal, with the overall classification based on the majority vote. Both methods were tested on a publicly available phonocardiogram database. Results demonstrated that the ensemble-classifier strategy achieved higher accuracy compared to the single-classifier approach, establishing MFCCs as more effective than other features, including time, time-frequency, and statistical features, evaluated in similar studies.	翻訳日:2024-06-07 19:54:03 公開日:2024-06-05
# ロバストセグメンテーションのための感度インフォームメント Sensitivity-Informed Augmentation for Robust Segmentation ( http://arxiv.org/abs/2406.01425v3 ) ライセンス: Link先を確認	Laura Zheng, Wenjie Wei, Tony Wu, Jacob Clements, Shreelekha Revankar, Andre Harrison, Yu Shen, Ming C. Lin,	(参考訳) セグメンテーションは、仮想トライオン、医療画像、自律運転、農業自動化など、多くのビジュアルコンピューティングアプリケーションにおいて不可欠なモジュールである。これらのアプリケーションは、一般的な携帯電話や高価な衛星画像カメラからでも、視覚センサーのデータの品質を劣化させることのできる、広範な消費者利用または高度に変動した環境を含むことが多い。ユーザ差や天候条件などの外部ノイズに加えて、カメラ品質の変動やレンズ歪みなどの内部ノイズは、開発と展開の両方においてセグメンテーションモデルの性能に影響を与える可能性がある。本研究では,学習ベースセグメンテーションモデルの堅牢性を高めるための,効率的で適応性が高く,勾配のない手法を提案する。まず,Kernel Inception Distance (KID) を用いた新しい適応感度解析手法を提案する。次に、適応SAとサンプル摂動ハイパーパラメータ値を用いて感度曲線をモデル化する。最後に、選択した摂動値を用いて対人訓練を行い、オンライントレーニング中のロバスト性を動的に再評価する。我々の手法は最小限の微調整でエンドツーエンドに実装され、セグメンテーションのための最先端データ拡張技術より一貫して優れている。これは、ビジュアルコンピューティングやコンピュータグラフィックスアプリケーションで使用される様々なセグメンテーションデータセットに対して、クリーンなデータ評価と現実の悪質なシナリオ評価の両方において、大幅な改善を示す。 Segmentation is an integral module in many visual computing applications such as virtual try-on, medical imaging, autonomous driving, and agricultural automation. These applications often involve either widespread consumer use or highly variable environments, both of which can degrade the quality of visual sensor data, whether from a common mobile phone or an expensive satellite imaging camera. In addition to external noises like user difference or weather conditions, internal noises such as variations in camera quality or lens distortion can affect the performance of segmentation models during both development and deployment. In this work, we present an efficient, adaptable, and gradient-free method to enhance the robustness of learning-based segmentation models across training. First, we introduce a novel adaptive sensitivity analysis (ASA) using Kernel Inception Distance (KID) on basis perturbations to benchmark perturbation sensitivity of pre-trained segmentation models. 
Then, we model the sensitivity curve using the adaptive SA and sample perturbation hyperparameter values accordingly. Finally, we conduct adversarial training with the selected perturbation values and dynamically re-evaluate robustness during online training. Our method, implemented end-to-end with minimal fine-tuning required, consistently outperforms state-of-the-art data augmentation techniques for segmentation. It shows significant improvement in both clean data evaluation and real-world adverse scenario evaluation across various segmentation datasets used in visual computing and computer graphics applications.	翻訳日:2024-06-07 19:54:03 公開日:2024-06-05
# Qラーニングにおける連続状態行動空間の識別方法--シンボリック・コントロール・アプローチ How to discretize continuous state-action spaces in Q-learning: A symbolic control approach ( http://arxiv.org/abs/2406.01548v3 ) ライセンス: Link先を確認	Sadek Belamfedel Alaoui, Adnane Saoud,	(参考訳) Q-ラーニングは、特定の目標を達成するためにコントローラを合成する効果的なアプローチとして広く認識されている。しかし、継続的な状態-作用空間によって引き起こされる課題への対処は現在も研究の焦点となっている。本稿では,空間離散化法における大きな欠点を浮き彫りにした系統解析について述べる。この課題に対処するため,本論文では,抽象から制御システムへのシミュレーションの交互化など,行動関係を表現するシンボリックモデルを提案する。この関係により、オリジナルのシステムへの抽象化に基づいて、合成されたコントローラをシームレスに適用することができる。シンボリックモデルのための新しいQ-ラーニング手法を導入し、最適なポリシーを符号化する2つのQ-テーブルを生成する。理論解析により、これらのQ-テーブルは、連続空間を持つ元の系のQ-値の上界と下界の両方として機能することを示した。さらに,空間抽象のパラメータとQ値の損失との相関について検討した。このアルゴリズムは任意の精度で最適性を達成し、精度と計算複雑性の間のトレードオフを制御する。得られた結果は、適切な学習パラメータを選択し、コントローラを洗練するための貴重な洞察を提供する。提案したQ-ラーニングに基づく記号モデルの工学的妥当性を2つのケーススタディで示す。 Q-learning is widely recognized as an effective approach for synthesizing controllers to achieve specific goals. However, handling challenges posed by continuous state-action spaces remains an ongoing research focus. This paper presents a systematic analysis that highlights a major drawback in space discretization methods. To address this challenge, the paper proposes a symbolic model that represents behavioral relations, such as alternating simulation from abstraction to the controlled system. This relation allows for seamless application of the synthesized controller based on abstraction to the original system. Introducing a novel Q-learning technique for symbolic models, the algorithm yields two Q-tables encoding optimal policies. Theoretical analysis demonstrates that these Q-tables serve as both upper and lower bounds on the Q-values of the original system with continuous spaces. Additionally, the paper explores the correlation between the parameters of the space abstraction and the loss in Q-values. The resulting algorithm facilitates achieving optimality within an arbitrary accuracy, providing control over the trade-off between accuracy and computational complexity. 
The obtained results provide valuable insights for selecting appropriate learning parameters and refining the controller. The engineering relevance of the proposed Q-learning based symbolic model is illustrated through two case studies.	翻訳日:2024-06-07 19:54:03 公開日:2024-06-05
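For orientation, the baseline that the abstract above criticizes, plain uniform-grid discretization followed by tabular Q-learning, can be sketched as below. This is not the paper's symbolic-model method (which yields two Q-tables bounding the true Q-values); the function names, the grid size, and the learning parameters are assumptions chosen only for illustration.

```python
def discretize(x, low, high, n_cells):
    """Map a continuous value x in [low, high] to a grid cell index 0..n_cells-1."""
    if not low <= x <= high:
        raise ValueError("x lies outside the discretized interval")
    width = (high - low) / n_cells
    # The upper boundary itself is assigned to the last cell.
    return min(int((x - low) / width), n_cells - 1)


def q_update(q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step on discrete state s and action a."""
    best_next = max(q[s_next])
    q[s][a] += alpha * (reward + gamma * best_next - q[s][a])
    return q[s][a]
```

A coarser grid (smaller `n_cells`) makes the table cheaper but merges states that may need different actions, which is the accuracy versus complexity trade-off the abstract refers to.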
# 隠れた要因を明らかにする: 音声感情認識における特徴増強のための説明可能なAI Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition ( http://arxiv.org/abs/2406.01624v2 ) ライセンス: Link先を確認	Alaa Nfissi, Wassim Bouachir, Nizar Bouguila, Brian Mishara,	(参考訳) 音声感情認識(SER)は、メンタルヘルス、教育、人間とコンピュータの相互作用など、いくつかの応用分野から注目されている。しかし、SERシステムの精度は、無関係かつ冗長な情報を含む可能性のある高次元特徴集合によって妨げられる。そこで本研究では,機械学習モデルの性能向上のための機能関連性や説明可能性を重視した,SERの反復的機能強化手法を提案する。我々のアプローチは、効率的なSERシステムを構築するための細心の注意を要する特徴の選択と分析である。モデル説明可能性による主要な問題に対処するために、Shapley値を持つ機能評価ループを用いて、反復的に機能セットを洗練します。このプロセスはモデルの性能と透明性のバランスをとっており、モデルの予測を包括的に理解することができる。提案手法は、無関係で冗長な特徴の識別や削除など、いくつかの利点を提供し、より効果的なモデルをもたらす。さらに、説明可能性を促進し、モデルの予測の理解を促進し、感情決定の重要な特徴を識別する。提案手法の有効性はトロントの感情音声セット(TESS)、ベルリンの感情音声データベース(EMO-DB)、Ryerson Audio-Visual Database of Emotional Speech and Song(RAVDESS)、およびSurrey Audio-Visual Expressed Emotion(SAVEE)データセットのSERベンチマークで検証され、最先端の手法よりも優れている。私たちの知る限りでは、SERフレームワークにモデル説明可能性を導入するのはこれが初めてです。本論文のソースコードは、https://github.com/alaaNfissi/Unveiling-Hidden-Factors-Explainable-AI-for-Feature-Boosting-in-Speech-Emotion-Recognition で公開されている。 Speech emotion recognition (SER) has gained significant attention due to its several application fields, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature sets that may contain irrelevant and redundant information. To overcome this challenge, this study proposes an iterative feature boosting approach for SER that emphasizes feature relevance and explainability to enhance machine learning model performance. Our approach involves meticulous feature selection and analysis to build efficient SER systems. In addressing our main problem through model explainability, we employ a feature evaluation loop with Shapley values to iteratively refine feature sets. This process strikes a balance between model performance and transparency, which enables a comprehensive understanding of the model's predictions. 
The proposed approach offers several advantages, including the identification and removal of irrelevant and redundant features, leading to a more effective model. Additionally, it promotes explainability, facilitating comprehension of the model's predictions and the identification of crucial features for emotion determination. The effectiveness of the proposed method is validated on the SER benchmarks of the Toronto emotional speech set (TESS), Berlin Database of Emotional Speech (EMO-DB), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and Surrey Audio-Visual Expressed Emotion (SAVEE) datasets, outperforming state-of-the-art methods. To the best of our knowledge, this is the first work to incorporate model explainability into an SER framework. The source code of this paper is publicly available at https://github.com/alaaNfissi/Unveiling-Hidden-Factors-Explainable-AI-for-Feature-Boosting-in-Speech-Emotion-Recognition.	翻訳日:2024-06-07 19:54:03 公開日:2024-06-05
# ECHOで高速でタイムリーに暗号化されたトラフィック分類 Non-uniformity is All You Need: Efficient and Timely Encrypted Traffic Classification With ECHO ( http://arxiv.org/abs/2406.01852v2 ) ライセンス: Link先を確認	Shilo Daum, Tal Shapira, Anat Bremler-Barr, David Hay,	(参考訳) インターネットトラフィックの95%が暗号化されているため、このトラフィックを分類するための効果的なアプローチは、ネットワークのセキュリティと管理にとって不可欠である。本稿では,ML/DLベースの暗号化トラフィック分類のための新しい最適化プロセスであるECHOを紹介する。 ECHOは、分類時間とメモリ利用の両方を目標とし、2つの革新的なテクニックを取り入れている。最初のコンポーネントであるHO(Hyperparameter Optimization of binnings)は、効率的なトラフィック表現を作ることを目的としている。従来の研究では,パケットサイズやパケット到着時刻を固定サイズのビンにマッピングする表現を用いていた。これらの不均一な双対は、トレーニング段階でハイパーパラメータ最適化アルゴリズムを用いて導出される。 HOは必要な表現サイズに応じて精度を著しく向上させるか、または同等に、より小さな表現を用いて同等の精度を達成する。次に,EC(Early Classification of traffic)を導入し,信頼度に基づいて,異なる終了時間に適応した分類器のカスケードを用いて,より高速な分類を可能にする。 ECは、平均分類遅延を最大90%削減する。注目すべきは、この手法が分類精度を維持するだけでなく、場合によってはその精度を向上させることである。 3つの公開データセットを用いて、組み合わせた手法であるEarly Classification with Hyperparameter Optimization (ECHO)が、分類効率を大幅に向上させることを示した。 With 95% of Internet traffic now encrypted, an effective approach to classifying this traffic is crucial for network security and management. This paper introduces ECHO -- a novel optimization process for ML/DL-based encrypted traffic classification. ECHO targets both classification time and memory utilization and incorporates two innovative techniques. The first component, HO (Hyperparameter Optimization of binnings), aims at creating efficient traffic representations. While previous research often uses representations that map packet sizes and packet arrival times to fixed-sized bins, we show that non-uniform binnings are significantly more efficient. These non-uniform binnings are derived by employing a hyperparameter optimization algorithm in the training stage. HO significantly improves accuracy given a required representation size, or, equivalently, achieves comparable accuracy using smaller representations. 
Then, we introduce EC (Early Classification of traffic), which enables faster classification using a cascade of classifiers adapted for different exit times, where classification is based on the level of confidence. EC reduces the average classification latency by up to 90%. Remarkably, this method not only maintains classification accuracy but also, in certain cases, improves it. Using three publicly available datasets, we demonstrate that the combined method, Early Classification with Hyperparameter Optimization (ECHO), leads to a significant improvement in classification efficiency.	翻訳日:2024-06-07 19:54:03 公開日:2024-06-05
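The two ideas named in the ECHO abstract, non-uniform binning and a confidence-gated cascade, can be illustrated with the short sketch below. It is only a sketch under assumptions: the bin edges are supplied by hand here (the paper derives them by hyperparameter optimization), the per-stage classifiers are represented by their already computed `(label, confidence)` outputs, and all names are hypothetical.

```python
from bisect import bisect_right


def bin_counts(values, edges):
    """Histogram of values over bins split at the sorted interior edges.

    The edges need not be evenly spaced, so dense value ranges can get
    narrow bins and sparse ranges wide ones.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


def early_classify(stages, threshold):
    """Return (label, stage_index) from the first sufficiently confident stage.

    `stages` lists (label, confidence) pairs in order of increasing exit
    time; if no stage reaches the threshold, the last stage decides.
    """
    result = None
    for i, (label, confidence) in enumerate(stages):
        result = (label, i)
        if confidence >= threshold:
            break
    return result
```

Raising `threshold` trades latency for caution: more flows wait for a later, better informed classifier.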
# コア毎のクリッピングによる低メモリ化と性能向上を効果的に訓練するASRモデル Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping ( http://arxiv.org/abs/2406.02004v2 ) ライセンス: Link先を確認	Lun Wang, Om Thakkar, Zhong Meng, Nicole Rafidi, Rohit Prabhavalkar, Arun Narayanan,	(参考訳) グラディエント・クリッピングは、大規模自動音声認識(ASR)モデルの訓練において重要な役割を果たす。一般的には、勾配の爆発を防ぐためのミニバッチ勾配や、意図しない暗記を緩和するために個々のサンプル勾配に適用される。この研究は、幅広いASRモデルのトレーニングにおいて、勾配クリッピングの特定の粒度、すなわちコアごとのクリッピング(PCC)の影響を体系的に調査する。我々は,PCCがASRモデルにおける意図しない記憶を効果的に緩和できることを実証的に実証した。驚くべきことに、PCCはASRのパフォーマンス指標に肯定的な影響を与え、収束率の改善と単語誤り率の低減につながっている。さらに,PCCが導入したハイパーパラメータの調整を避けるため,並列化最適化のための新しい変種アダプティブ・パー・コア・クリッピング(APCC)を提案する。本研究は,PCCの多面的メリットを,堅牢でプライバシ・フォワードなASRモデルトレーニングの戦略として強調した。 Gradient clipping plays a vital role in training large-scale automatic speech recognition (ASR) models. It is typically applied to minibatch gradients to prevent gradient explosion, and to the individual sample gradients to mitigate unintended memorization. This work systematically investigates the impact of a specific granularity of gradient clipping, namely per-core clipping (PCC), across training a wide range of ASR models. We empirically demonstrate that PCC can effectively mitigate unintended memorization in ASR models. Surprisingly, we find that PCC positively influences ASR performance metrics, leading to improved convergence rates and reduced word error rates. To avoid tuning the additional hyperparameter introduced by PCC, we further propose a novel variant, adaptive per-core clipping (APCC), for streamlined optimization. Our findings highlight the multifaceted benefits of PCC as a strategy for robust, privacy-forward ASR model training.	翻訳日:2024-06-07 19:44:18 公開日:2024-06-05
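The abstract above places per-core clipping between minibatch-level and per-example clipping. One reading of that granularity, assumed here rather than taken from the paper, is that each compute core averages the gradients of its own shard of examples, clips that average, and the clipped averages are then averaged across cores. The sketch below follows that assumption with plain Python lists; all names are hypothetical.

```python
import math


def clip_by_norm(vec, max_norm):
    """Scale vec down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]


def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def per_core_clipped_gradient(per_core_example_grads, max_norm):
    """Average per-core gradients after clipping each core's mean gradient.

    per_core_example_grads[c] holds the example gradients computed on core c
    (assumed layout for this sketch).
    """
    clipped = [clip_by_norm(mean(grads), max_norm) for grads in per_core_example_grads]
    return mean(clipped)
```

Under this reading the cost is one norm computation per core rather than one per example, which would explain why the granularity is attractive at scale.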
# Alice in Wonderland: State-Of-the-Art Large Language Modelにおける完全推論のブレークダウンを示す単純なタスク Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models ( http://arxiv.org/abs/2406.02061v2 ) ライセンス: Link先を確認	Marianna Nezhurina, Lucia Cipolina-Kun, Mehdi Cherti, Jenia Jitsev,	(参考訳) 大規模言語モデル(LLM)は、しばしば基礎モデルの例として記述される。すなわち、様々なタスクや状況に対して、少数ショットやゼロショットの方法で強く移行するモデルであると同時に、事前トレーニングスケールを拡大する際の関数改善を予測するスケーリング法則を示す。これらの異なる機能やタスクが優れているという主張は、そのようなモデルに対して高いスコアを示す標準化されたベンチマークの様々なセットにまたがる測定に依存する。ここでは,人間によって容易に解ける簡潔で簡潔な自然言語で定式化された従来の共通感覚問題を用いて,強機能を主張する最大規模で訓練された最先端モデルの機能と推論能力の劇的な分解を実演する。モデルは間違った解に強い自信を表現し、しばしば無意味な「推論」のような説明は、明らかに失敗した応答の妥当性を正当化し、バックアップすることに似ている。正しいソリューションを得るための様々な標準的な介入、例えば、様々な種類の強化プロンプト、あるいは、複数のステップの再評価によって間違ったソリューションを再考するようモデルに促す、といったことは失敗します。これらの最初の観察は、科学・技術界に、現在のLLMの主張する能力の緊急な再評価を刺激するものであり、このような再評価は、現在の最先端の評価手順やベンチマークによって明らかに発見されないような基本的な理由づけ欠陥を適切に検出できるような、標準化されたベンチマークを作成するための共通の行動も必要である。論文における実験の再現コードと生の実験データはhttps://github.com/LAION-AI/AIWで見ることができる。 Large Language Models (LLMs) are often described as being instances of foundation models - that is, models that transfer strongly across various tasks and conditions in few-shot or zero-shot manner, while exhibiting scaling laws that predict function improvement when increasing the pre-training scale. These claims of excelling in different functions and tasks rely on measurements taken across various sets of standardized benchmarks showing high scores for such models. We demonstrate here a dramatic breakdown of function and reasoning capabilities of state-of-the-art models trained at the largest available scales which claim strong function, using a simple, short, conventional common sense problem formulated in concise natural language, easily solvable by humans. 
The breakdown is dramatic, as models also express strong overconfidence in their wrong solutions, while providing often non-sensical "reasoning"-like explanations akin to confabulations to justify and back up the validity of their clearly failed responses, making them sound plausible. Various standard interventions in an attempt to get the right solution, like various types of enhanced prompting, or urging the models to reconsider the wrong solutions again by multi-step re-evaluation, fail. We take these initial observations to the scientific and technological community to stimulate urgent re-assessment of the claimed capabilities of the current generation of LLMs. Such re-assessment also requires common action to create standardized benchmarks that would allow proper detection of such basic reasoning deficits that obviously manage to remain undiscovered by current state-of-the-art evaluation procedures and benchmarks. Code for reproducing experiments in the paper and raw experiments data can be found at https://github.com/LAION-AI/AIW	翻訳日:2024-06-07 19:44:18 公開日:2024-06-05
# ハウサ語、ヨルバ語、イグボ語に対する攻撃言語とヘイトスピーチ検出のための多言語データセット A multilingual dataset for offensive language and hate speech detection for hausa, yoruba and igbo languages ( http://arxiv.org/abs/2406.02169v2 ) ライセンス: Link先を確認	Saminu Mohammad Aliyu, Gregory Maksha Wajiga, Muhammad Murtala,	(参考訳) オンライン攻撃言語の普及は、特に多言語文脈において、効果的な検出メカニズムの開発を必要とする。本研究は,ナイジェリアの主要言語であるHausa,Yoruba,Igboの3言語において,攻撃的言語検出のための新しいデータセットの開発と導入の課題に対処する。私たちはTwitterからデータを収集し、それを手動でアノテートして、ネイティブスピーカーを使用して、3つの言語毎にデータセットを作成しました。トレーニング済みの言語モデルを用いて、データセットにおける攻撃言語の検出の有効性を評価した。最高の性能モデルは90%の精度を達成した。攻撃的言語検出の研究をさらに支援するため、データセットとモデルを一般公開する計画である。 The proliferation of online offensive language necessitates the development of effective detection mechanisms, especially in multilingual contexts. This study addresses the challenge by developing and introducing novel datasets for offensive language detection in three major Nigerian languages: Hausa, Yoruba, and Igbo. We collected data from Twitter and manually annotated it to create datasets for each of the three languages, using native speakers. We used pre-trained language models to evaluate their efficacy in detecting offensive language in our datasets. The best-performing model achieved an accuracy of 90%. To further support research in offensive language detection, we plan to make the dataset and our models publicly available.	翻訳日:2024-06-07 19:44:18 公開日:2024-06-05
# Flash拡散: 画像生成のための条件付き拡散モデルを高速化する Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation ( http://arxiv.org/abs/2406.02347v2 ) ライセンス: Link先を確認	Clement Chadebec, Onur Tasar, Eyal Benaroche, Benjamin Aubin,	(参考訳) 本稿では,事前学習済み拡散モデルの生成を高速化する,効率的で高速で多用途な蒸留法であるFlash Diffusionを提案する。この手法は、COCO2014とCOCO2017データセット上での少数ステップの画像生成において、FIDとCLIP-Scoreの面で最先端のパフォーマンスに達し、必要な訓練は数GPU時間のみで、訓練可能なパラメータも既存手法より少ない。その効率性に加えて、この手法の汎用性は、テキスト・トゥ・イメージ、インペイント、フェイス・スワッピング、スーパーレゾリューション、UNetベースのデノイザ(SD1.5, SDXL)やDiT(Pixart-$\alpha$)、アダプタなどの異なるバックボーンの使用など、いくつかのタスクにまたがる。いずれの場合も、非常に高品質な画像生成を維持しながら、サンプリングステップの数を劇的に削減することができる。公式実装は https://github.com/gojasper/flash-diffusion で公開されている。 In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models: Flash Diffusion. The method reaches state-of-the-art performances in terms of FID and CLIP-Score for few steps image generation on the COCO2014 and COCO2017 datasets, while requiring only several GPU hours of training and fewer trainable parameters than existing methods. In addition to its efficiency, the versatility of the method is also exposed across several tasks such as text-to-image, inpainting, face-swapping, super-resolution and using different backbones such as UNet-based denoisers (SD1.5, SDXL) or DiT (Pixart-$\alpha$), as well as adapters. In all cases, the method drastically reduced the number of sampling steps while maintaining very high-quality image generation. The official implementation is available at https://github.com/gojasper/flash-diffusion.	翻訳日:2024-06-07 19:44:18 公開日:2024-06-05
# Llumnix: 大規模言語モデルの実行のための動的スケジューリング Llumnix: Dynamic Scheduling for Large Language Model Serving ( http://arxiv.org/abs/2406.03243v1 ) ライセンス: Link先を確認	Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin,	(参考訳) 大規模言語モデル(LLM)に対する推論は、人々の日常生活における潜在能力を解放する鍵となる。しかし、リソース要件やレイテンシ要件の点で要求が本質的に不均一で予測できないため、多様なアプリケーションとLLMの動的実行特性の結果として、効率的なLLM提供は依然として困難である。既存のシステムは、これらの特性を扱うのに基本的に制限されており、厳しいキューの遅延、尾の遅延の低さ、SLO違反などの問題を引き起こす。 Llumnixは、複数のモデルインスタンスにまたがる実行時再スケジューリングによって、不均一で予測不能な要求に応答するLLMサービスシステムである。現代のオペレーティングシステムのCPUコア間のコンテキストスイッチと同様に、Llumnixはリクエストを再スケジュールし、ロードバランシングとアイソレーションを改善し、リソースのフラグメンテーションを緩和し、リクエスト優先順位とSLOを区別する。 Llumnixは、リクエストとそのインメモリ状態に対する効率的でスケーラブルなライブマイグレーションメカニズムでリスケジュールを実装し、複数のリスケジュールシナリオをエレガントに統一する動的スケジューリングポリシでそれを活用している。評価の結果,Llumnixはテールレイテンシを桁違いに改善し,高優先度要求を最大1.5倍高速化し,類似のテールレイテンシを実現しつつ36%のコスト削減を実現した。 Llumnixはhttps://github.com/AlibabaPAI/llumnixで公開されている。 Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are inherently heterogeneous and unpredictable in terms of resource and latency requirements, as a result of the diverse applications and the dynamic execution nature of LLMs. Existing systems are fundamentally limited in handling these characteristics and cause problems such as severe queuing delays, poor tail latencies, and SLO violations. We introduce Llumnix, an LLM serving system that reacts to such heterogeneous and unpredictable requests by runtime rescheduling across multiple model instances. Similar to context switching across CPU cores in modern operating systems, Llumnix reschedules requests to improve load balancing and isolation, mitigate resource fragmentation, and differentiate request priorities and SLOs. 
Llumnix implements the rescheduling with an efficient and scalable live migration mechanism for requests and their in-memory states, and exploits it in a dynamic scheduling policy that unifies the multiple rescheduling scenarios elegantly. Our evaluations show that Llumnix improves tail latencies by an order of magnitude, accelerates high-priority requests by up to 1.5x, and delivers up to 36% cost savings while achieving similar tail latencies, compared against state-of-the-art LLM serving systems. Llumnix is publicly available at https://github.com/AlibabaPAI/llumnix.	翻訳日:2024-06-07 19:34:24 公開日:2024-06-05
# スワップゲートによる大域フェルミオンモード最適化 Global fermionic mode optimization via swap gates ( http://arxiv.org/abs/2406.03449v1 ) ライセンス: Link先を確認	Gero Friesecke, Miklós Antal Werner, Kornél Kapás, Andor Menczer, Örs Legeza,	(参考訳) 本稿では,大域フェルミオンモード最適化を用いて,与えられた誤差マージンに対する量子多体波関数の最適表現を求めるための一般的な手法を提案する。固定階数行列積状態多様体上の定常点は、グラスマン多様体 [Phys. Rev. Lett. 117, 210402] 上の合同最適化とスワップゲート制御置換によって得られる。大域量の最小化、ブロックエントロピー領域は、この方法が偏微分に関して全ての基準を満たすことを保証している。強相関分子系の大規模密度行列再正規化群シミュレーションと二次元フェルミオン格子モデルによる数値計算結果について述べる。 We propose a general approach to find an optimal representation of a quantum many body wave function for a given error margin via global fermionic mode optimization. The stationary point on a fixed rank matrix product state manifold is obtained via a joint optimization on the Grassman manifold [Phys. Rev. Lett. 117, 210402] together with swap gates controlled permutations. The minimization of the global quantity, the block entropy area, guarantees that the method fulfills all criteria with respect to partial derivatives. Numerical results via large scale density matrix renormalization group simulations on strongly correlated molecular systems and two-dimensional fermionic lattice models are discussed.	翻訳日:2024-06-07 19:34:24 公開日:2024-06-05
# 多次元・不均衡データセットに対するロバスト予測モデル Robust Prediction Model for Multidimensional and Unbalanced Datasets ( http://arxiv.org/abs/2406.03507v1 ) ライセンス: Link先を確認	Pooja Thakar, Anil Mehta, Manisha,	(参考訳) データマイニングは有望な分野であり、予測能力のために複数のドメインに適用されている。実世界のデータは、多次元性、不均衡、欠落した値の問題に悩まされるため、データマイニングに簡単には利用できない。初心者による予測能力の使用は困難である。初心者は、利用可能な大量のデータから関連する属性のセットを見つけることは困難である。本稿では,ロバスト予測モデルを用いて属性の集合を見つけ,不均衡な実生活データセットと多次元実生活データセットの問題を解き,情報的意思決定のためのパターンの発見を支援する。モデルは、健康分野、教育、ビジネス、詐欺検出の5つの異なるデータセットでテストされる。その結果、モデルが頑健に動作し、様々な領域で適用可能であることが示された。 Data Mining is a promising field and is applied in multiple domains for its predictive capabilities. Data in the real world cannot be readily used for data mining as it suffers from the problems of multidimensionality, unbalance and missing values. It is difficult to use its predictive capabilities by novice users. It is difficult for a beginner to find the relevant set of attributes from a large pool of data available. The paper presents a Robust Prediction Model that finds a relevant set of attributes; resolves the problems of unbalanced and multidimensional real-life datasets and helps in finding patterns for informed decision making. Model is tested upon five different datasets in the domain of Health Sector, Education, Business and Fraud Detection. The results showcase the robust behaviour of the model and its applicability in various domains.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# 事前訓練エンコーダのバックドア緩和に関する相互情報案内 Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders ( http://arxiv.org/abs/2406.03508v1 ) ライセンス: Link先を確認	Tingxu Han, Weisong Sun, Ziqi Ding, Chunrong Fang, Hanwei Qian, Jiaxun Li, Zhenyu Chen, Xiangyu Zhang,	(参考訳) ラベル付きデータを必要としないエンコーダの事前トレーニングには,自己教師付き学習(SSL)がますます魅力的なものになっている。これらのトレーニング済みエンコーダ上に構築された下流タスクは、ほぼ最先端のパフォーマンスを達成することができる。しかし、SSLによる事前訓練されたエンコーダは、既存の研究で示されているように、バックドア攻撃に対して脆弱である。下流タスクモデルのために多くのバックドア緩和技術が設計されている。しかし,事前学習時のラベル情報の欠如により,事前学習エンコーダに適用した場合,その有効性は損なわれ,制限される。本稿では,事前訓練したエンコーダに対するバックドア攻撃に対処するため,MIMICという相互誘導型バックドア緩和手法を提案する。 MIMICは、潜在的なバックドアエンコーダを教師ネットとして扱い、知識蒸留を用いて教師ネットからクリーンな学生エンコーダを蒸留する。既存の知識蒸留のアプローチとは異なり、MIMICは学生を無作為な体重で初期化し、教師のネットからバックドアを継承しない。そして、MIMICは各層間の相互情報と抽出した特徴を利用して、教師ネット内の良識の所在を特定する。蒸留損失は, クローン損失と注意損失の2つの側面から発生し, バックドアを緩和し, エンコーダ性能を同時に維持することを目的としている。 SSLにおける2つのバックドア攻撃による評価の結果,MIMIC はクリーンデータの 5% しか利用せず,最先端のバックドア緩和技術7 を超越して攻撃成功率を大幅に低減できることが示された。 Self-supervised learning (SSL) is increasingly attractive for pre-training encoders without requiring labeled data. Downstream tasks built on top of those pre-trained encoders can achieve nearly state-of-the-art performance. The pre-trained encoders by SSL, however, are vulnerable to backdoor attacks as demonstrated by existing studies. Numerous backdoor mitigation techniques are designed for downstream task models. However, their effectiveness is impaired and limited when adapted to pre-trained encoders, due to the lack of label information when pre-training. To address backdoor attacks against pre-trained encoders, in this paper, we innovatively propose a mutual information guided backdoor mitigation technique, named MIMIC. MIMIC treats the potentially backdoored encoder as the teacher net and employs knowledge distillation to distill a clean student encoder from the teacher net. Different from existing knowledge distillation approaches, MIMIC initializes the student with random weights, inheriting no backdoors from teacher nets. 
Then MIMIC leverages mutual information between each layer and extracted features to locate where benign knowledge lies in the teacher net, with which distillation is deployed to clone clean features from teacher to student. We craft the distillation loss with two aspects, including clone loss and attention loss, aiming to mitigate backdoors and maintain encoder performance at the same time. Our evaluation conducted on two backdoor attacks in SSL demonstrates that MIMIC can significantly reduce the attack success rate by only utilizing <5% of clean data, surpassing seven state-of-the-art backdoor mitigation techniques.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# 非対称調和振動子のコヒーレント状態 Coherent states of the asymmetric harmonic oscillator ( http://arxiv.org/abs/2406.03509v1 ) ライセンス: Link先を確認	G. Chadzitaskos,	(参考訳) 非対称高調波発振器に対して, 非対称性パラメータがばね定数比の平方根となる形式的コヒーレント状態を構築した。これらの状態はグラウバーのアプローチとペレロモフのアプローチに基づいているが、一般にコヒーレントな状態に必要な全ての性質を満たすわけではない。時間が経つにつれ、このような方法で導入されたコヒーレントな状態は一般に非コヒーレントになる。しかし、スプリング定数の平方根比に対して、$\frac{4k+1}{4l+1}$または$\frac{4k+3}{4l+3}$の特定のパラメータが存在する。これらのパラメータに対して、固有状態のヒルベルト空間の部分空間上のコヒーレント状態を構築することができる。これらのコヒーレントな状態は、進化の過程でコヒーレンスを維持する。この事例も分析される。 We constructed formal coherent states for an asymmetric harmonic oscillator, where the asymmetry parameter is the square root of the ratio of spring constants. Although these states are constructed based on both Glauber's and Perelomov's approaches, in general they do not satisfy all the properties required for coherent states. Over time, the coherent states introduced in this way generally become incoherent. However, there are some specific parameters for the square root ratios of the spring constants $\frac{4k+1}{4l+1}$ or $\frac{4k+3}{4l+3}$. For these parameters it is possible to construct coherent states on the subspace of the Hilbert space of eigenstates. These coherent states keep their coherence during the time evolution. This case is also analyzed.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# 音声による臨床うつ病スクリーニング : 実証的研究 Speech-based Clinical Depression Screening: An Empirical Study ( http://arxiv.org/abs/2406.03510v1 ) ライセンス: Link先を確認	Yangbin Chen, Chenyang Xu, Chunfeng Liang, Yanbao Tao, Chuan Shi,	(参考訳) 本研究では, 精神科面接, チャットボット会話, テキスト読解など, さまざまな相互作用シナリオを対象としたAIによる抑うつスクリーニングにおける音声信号の有用性について検討した。参加者には、北京大学第6病院の外来から募集されたうつ病患者や、地域社会のコントロールグループメンバーが含まれており、すべて標準化された診断プロトコルに従って精神科医によって診断されている。各参加者の分節録音から音響特徴と深層音声特徴を抽出した。分類はニューラルネットワークまたはSVMを使用して行われ、最終的な評価はクリップごとの結果を集約して決定された。対話シナリオ, 音声処理技術, 特徴型による分析により, 抑うつスクリーニングの重要な指標として音声が確認される。具体的には、人間とコンピュータの相互作用が臨床面接の有効性と一致し、読解タスクを超越する。セグメントの長さと量はモデル性能に大きく影響し、深層音声特徴は従来の音響特徴よりもかなり優れていた。 This study investigates the utility of speech signals for AI-based depression screening across varied interaction scenarios, including psychiatric interviews, chatbot conversations, and text readings. Participants include depressed patients recruited from the outpatient clinics of Peking University Sixth Hospital and control group members from the community, all diagnosed by psychiatrists following standardized diagnostic protocols. We extracted acoustic and deep speech features from each participant's segmented recordings. Classifications were made using neural networks or SVMs, with aggregated clip outcomes determining final assessments. Our analysis across interaction scenarios, speech processing techniques, and feature types confirms speech as a crucial marker for depression screening. Specifically, human-computer interaction matches clinical interview efficacy, surpassing reading tasks. Segment duration and quantity significantly affect model performance, with deep speech features substantially outperforming traditional acoustic features.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# MagiNet:不完全なトラフィックデータのためのマスク対応グラフインプットネットワーク MagiNet: Mask-Aware Graph Imputation Network for Incomplete Traffic Data ( http://arxiv.org/abs/2406.03511v1 ) ライセンス: Link先を確認	Jianping Zhou, Bin Lu, Zhanyu Liu, Siyu Pan, Xuejun Feng, Hua Wei, Guanjie Zheng, Xinbing Wang, Chenghu Zhou,	(参考訳) 検出器の故障と通信障害のため、交通データの収集中に欠落したデータがどこにでもある。したがって、インテリジェントトランスポートシステム(ITS)のデータ分析と意思決定を容易にするために、欠落した値をインプットすることが極めて重要である。しかし、既存の計算手法は一般に、欠落した値を初期化し、避けられないノイズを発生させるため、0のプリフィル技術を実行する。さらに,不完全な交通データに内在する時空間相関を明らかにするために,過度に平滑な補間を観測する。そこで我々はMask-Aware Graph imputation Network: MagiNetを提案する。適応マスク時空間エンコーダを設計し、不完全データの潜在表現を学習し、不足した値への依存を解消する。さらに、複数のブロックを積み重ねた時空間デコーダを考案し、不完全なトラフィックデータ中の空間的および時間的依存関係を捕捉し、過度に平滑な計算を緩和する。その結果, RMSEでは平均4.31%, MAPEでは3.72%向上した。 Due to detector malfunctions and communication failures, missing data is ubiquitous during the collection of traffic data. Therefore, it is of vital importance to impute the missing values to facilitate data analysis and decision-making for Intelligent Transportation System (ITS). However, existing imputation methods generally perform zero pre-filling techniques to initialize missing values, introducing inevitable noises. Moreover, we observe prevalent over-smoothing interpolations, falling short in revealing the intrinsic spatio-temporal correlations of incomplete traffic data. To this end, we propose Mask-Aware Graph imputation Network: MagiNet. Our method designs an adaptive mask spatio-temporal encoder to learn the latent representations of incomplete data, eliminating the reliance on pre-filling missing values. Furthermore, we devise a spatio-temporal decoder that stacks multiple blocks to capture the inherent spatial and temporal dependencies within incomplete traffic data, alleviating over-smoothing imputation. Extensive experiments demonstrate that our method outperforms state-of-the-art imputation methods on five real-world traffic datasets, yielding an average improvement of 4.31% in RMSE and 3.72% in MAPE.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# 困難か違いか?オーディオディープフェイク検出の一般化を理解する Harder or Different? Understanding Generalization of Audio Deepfake Detection ( http://arxiv.org/abs/2406.03512v1 ) ライセンス: Link先を確認	Nicolas M. Müller, Nicholas Evans, Hemlata Tak, Philip Sperl, Konstantin Böttinger,	(参考訳) 最近の研究は、音声のディープフェイク検出における重要な課題を強調している。これは、テキスト音声(TTS)モデルの品質が継続的に向上していること、すなわち、より新しいDeepFakesは単に'ハード'で検出できるのか? あるいは、あるモデルで生成されたディープフェイクが、別のモデルで生成されたディープフェイクと根本的に異なるからだろうか? ドメイン内テストデータとドメイン外テストデータのパフォーマンスギャップを'ハードネス'と'ディファレンス'コンポーネントに分解することで、この問題に答える。 ASVspoofデータベースを用いて行った実験は、硬さ成分が事実上無視可能であることを示している。これは現実世界のディープフェイク検出に直接的な意味を持ち、現在支配的な研究トレンドであるモデル容量の増加だけでは、一般化の課題に効果的に対処できないことを強調している。 Recent research has highlighted a key issue in speech deepfake detection: models trained on one set of deepfakes perform poorly on others. The question arises: is this due to the continuously improving quality of Text-to-Speech (TTS) models, i.e., are newer DeepFakes just 'harder' to detect? Or, is it because deepfakes generated with one model are fundamentally different to those generated using another model? We answer this question by decomposing the performance gap between in-domain and out-of-domain test data into 'hardness' and 'difference' components. Experiments performed using ASVspoof databases indicate that the hardness component is practically negligible, with the performance gap being attributed primarily to the difference component. This has direct implications for real-world deepfake detection, highlighting that merely increasing model capacity, the currently-dominant research trend, may not effectively address the generalization challenge.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# デバイス間フェデレーション学習のためのバッファ付き非同期セキュアアグリゲーション Buffered Asynchronous Secure Aggregation for Cross-Device Federated Learning ( http://arxiv.org/abs/2406.03516v1 ) ライセンス: Link先を確認	Kun Wang, Yi-Rui Yang, Wu-Jun Li,	(参考訳) 非同期フェデレーション学習(AFL)は、デバイス間フェデレーション学習におけるデバイス不均一性の課題に対処する有効な方法である。しかしながら、AFLは通常、既存のセキュアアグリゲーションプロトコルは同期アグリゲーションに基づいているため、フェデレートラーニングにおけるユーザのプライバシを保護するために使用される既存のセキュアアグリゲーションプロトコルと互換性がない。本稿では,バッファ型非同期セキュアアグリゲーション(BASA)と呼ばれる新しいセキュアアグリゲーションプロトコルを提案する。既存のプロトコルと比較して、BASAはAFLと完全に互換性があり、各ユーザがユーザ間の同期通信に頼ることなく、サーバとの1ラウンドの通信しか必要としないという条件の下でセキュアなアグリゲーションを提供する。 BASAに基づいてハードウェアに余分な要求を伴わずにセキュアなアグリゲーションを実現する最初のAFL法を提案する。我々は、BASAが、トレーニング効率とスケーラビリティの観点から、クロスデバイス・フェデレーション・ラーニングのための既存のセキュア・アグリゲーション・プロトコルより優れていることを実証的に実証した。 Asynchronous federated learning (AFL) is an effective method to address the challenge of device heterogeneity in cross-device federated learning. However, AFL is usually incompatible with existing secure aggregation protocols used to protect user privacy in federated learning because most existing secure aggregation protocols are based on synchronous aggregation. To address this problem, we propose a novel secure aggregation protocol named buffered asynchronous secure aggregation (BASA) in this paper. Compared with existing protocols, BASA is fully compatible with AFL and provides secure aggregation under the condition that each user only needs one round of communication with the server without relying on any synchronous interaction among users. Based on BASA, we propose the first AFL method which achieves secure aggregation without extra requirements on hardware. We empirically demonstrate that BASA outperforms existing secure aggregation protocols for cross-device federated learning in terms of training efficiency and scalability.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# 不均一な個人差分学習のための雑音認識アルゴリズム Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning ( http://arxiv.org/abs/2406.03519v1 ) ライセンス: Link先を確認	Saber Malekmohammadi, Yaoliang Yu, Yang Cao,	(参考訳) 高いユーティリティと厳密なデータプライバシは、いくつかのクライアント間で分散したデータからモデルを学ぶ、フェデレートラーニング(FL)システムの主要な目標のひとつです。後者はFL(DPFL)で差分プライバシーを利用することで実現されている。クライアントのプライバシ要件には不均一性があることが多く、既存のDPFLは、クライアントの統一的なプライバシ要件を前提とするか、あるいはサーバが完全に信頼されていない場合(設定)には適用できない。さらに、クライアントのバッチサイズやデータセットサイズには不均一性がしばしば存在し、示すように、クライアントモデルの更新間でDPノイズレベルが余分に変化する。このような異種性の源では、クライアントのアグリゲーションの重み付けをプライバシパラメータに比例して割り当てるなど、直接的なアグリゲーション戦略によって、実用性が低下する。本稿では,クライアントモデル更新における真のノイズレベルを効率的に推定し,集約モデル更新におけるノイズレベルを大幅に低減するRobust-HDPを提案する。 Robust-HDPはユーティリティと収束速度を改善し、不正なプライバシパラメータをサーバに送信する可能性のあるクライアントに対して安全である。複数のデータセットに対する大規模な実験結果と理論的解析により,Robust-HDPの有効性が確認された。私たちのコードはここにある。 High utility and rigorous data privacy are of the main goals of a federated learning (FL) system, which learns a model from the data distributed among some clients. The latter has been tried to achieve by using differential privacy in FL (DPFL). There is often heterogeneity in clients privacy requirements, and existing DPFL works either assume uniform privacy requirements for clients or are not applicable when server is not fully trusted (our setting). Furthermore, there is often heterogeneity in batch and/or dataset size of clients, which as shown, results in extra variation in the DP noise level across clients model updates. With these sources of heterogeneity, straightforward aggregation strategies, e.g., assigning clients aggregation weights proportional to their privacy parameters will lead to lower utility. We propose Robust-HDP, which efficiently estimates the true noise level in clients model updates and reduces the noise-level in the aggregated model updates considerably. Robust-HDP improves utility and convergence speed, while being safe to the clients that may maliciously send falsified privacy parameter to server. 
Extensive experimental results on multiple datasets and our theoretical analysis confirm the effectiveness of Robust-HDP. Our code can be found here.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# VideoPhy:ビデオ生成のための物理コモンセンスの評価 VideoPhy: Evaluating Physical Commonsense for Video Generation ( http://arxiv.org/abs/2406.03520v1 ) ライセンス: Link先を確認	Hritik Bansal, Zongyu Lin, Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, Aditya Grover,	(参考訳) インターネット規模のビデオデータの事前トレーニングの最近の進歩は、様々な視覚概念やスタイルで高品質な動画を作成できるテキスト・ビデオ生成モデルの開発につながっている。現実的な動きを合成し、複雑な物体をレンダリングする能力により、これらの生成モデルは物理世界の汎用シミュレータになる可能性がある。しかし、既存のテキスト・ビデオ生成モデルでは、この目標からどこまで離れているのかは不明だ。この目的のために、生成したビデオが現実世界のアクティビティの物理的なコモンセンスに従うかどうかを評価するために設計されたベンチマークであるVideoPhyを紹介する(例えば、大理石は傾斜した表面に置かれたときにロールダウンする)。具体的には、物理世界における様々な物質種間の相互作用を含む688のキャプションのリスト(例えば、固形固形流体、固形流体、流体流体)をキュレートする。次に、オープンモデル(例: VideoCrafter2)やクローズドモデル(例: Google, Pika)など、さまざまな最先端のテキスト・ビデオ生成モデルから、これらのキャプションに条件付けされたビデオを生成します。さらに,人間による評価の結果,既存のモデルではテキストプロンプトに忠実な動画を生成する能力が乏しく,物理的コモンセンスも欠如していることが判明した。具体的には、最高のパフォーマンスモデルであるピカは、19.7%のインスタンスでキャプションと物理法に準拠するビデオを生成する。 VideoPhyは、ビデオ生成モデルは物理的な世界を正確にシミュレートするものではないと強調する。最後に、データセットを自動評価器であるVideoCon-Physicsで補足し、意味的定着と物理的常識を大規模に評価する。 Recent advances in internet-scale video data pretraining have led to the development of text-to-video generative models that can create high-quality videos across a broad range of visual concepts and styles. Due to their ability to synthesize realistic motions and render complex objects, these generative models have the potential to become general-purpose simulators of the physical world. However, it is unclear how far we are from this goal with the existing text-to-video generative models. To this end, we present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities (e.g. marbles will roll down when placed on a slanted surface). Specifically, we curate a list of 688 captions that involve interactions between various material types in the physical world (e.g., solid-solid, solid-fluid, fluid-fluid). 
We then generate videos conditioned on these captions from diverse state-of-the-art text-to-video generative models, including open models (e.g., VideoCrafter2) and closed models (e.g., Lumiere from Google, Pika). Further, our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts, while also lacking physical commonsense. Specifically, the best-performing model, Pika, generates videos that adhere to the caption and physical laws for only 19.7% of the instances. VideoPhy thus highlights that video generative models are far from accurately simulating the physical world. Finally, we also supplement the dataset with an auto-evaluator, VideoCon-Physics, to assess semantic adherence and physical commonsense at scale.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# 開非平衡量子系におけるMpemba効果 Mpemba effects in open nonequilibrium quantum systems ( http://arxiv.org/abs/2406.03521v1 ) ライセンス: Link先を確認	Andrea Nava, Reinhold Egger,	(参考訳) いくつかの貯水池に結合した量子系を開放するために、古典的な熱的メンバ効果(初期のホット系は、冷たいものよりも最終平衡状態に速く緩和する)を一般化する。一般に、2つの異なる種類の量子Mpemba効果が可能であることを示す。それらは量子状態トモグラフィーによって区別される。しかし、(型を決定することなしに)量子ムペンバ効果の存在は、電流やエネルギーのような単純な観測可能量を測定することで既に確立できる。 2つの金属鉛に結合した相互作用する2サイト北エフ模型の実験可能な場合の一般的な結果について述べる。 We generalize the classical thermal Mpemba effect (where an initially hot system relaxes faster to the final equilibrium state than a cold one) to open quantum systems coupled to several reservoirs. We show that, in general, two different types of quantum Mpemba effects are possible. They may be distinguished by quantum state tomography. However, the existence of a quantum Mpemba effect (without determining the type) can already be established by measuring simpler observables such as currents or energies. We illustrate our general results for the experimentally feasible case of an interacting two-site Kitaev model coupled to two metallic leads.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# $\mathcal{PT}$-symmetric系における拡散複雑性と局在 Spread complexity and localization in $\mathcal{PT}$-symmetric systems ( http://arxiv.org/abs/2406.03524v1 ) ライセンス: Link先を確認	Aranya Bhattacharya, Rathindra Nath Das, Bidyut Dey, Johanna Erdmenger,	(参考訳) 本稿では,拡散複雑性と拡散エントロピーを用いた$\mathcal{PT}$-対称量子系における波動関数の拡散に関する研究フレームワークを提案する。境界点に複雑なオンサイトポテンシャルを持つ強結合鎖を考える。 $\mathcal{PT}$-unbroken 相では、波動関数は非局在化される。我々は、$\mathcal{PT}$-breakken 相において、強結合格子の片端に局在する。この局在は非エルミート皮膚効果の実現である。 $\mathcal{PT}$-breakken 相の局在は格子鎖基底とクリロフ基底の両方で観察される。スプレッドエントロピー、エントロピー複雑性、およびクリロフ逆参加比(英語版)と呼ばれるさらなる尺度は、波動関数のダイナミクスを探索し、クリロフ基底で探索された局所化の強さを定量化する。状態の情報を保存するために必要なクリロフ基底ベクトルの数は、局所化の強さによって減少する。以上の結果から,Krylov空間の測度を非エルミート皮膚効果とその局在相転移の特徴づけに利用できることが示唆された。 We present a framework for investigating wave function spreading in $\mathcal{PT}$-symmetric quantum systems using spread complexity and spread entropy. We consider a tight-binding chain with complex on-site potentials at the boundary sites. In the $\mathcal{PT}$-unbroken phase, the wave function is delocalized. We find that in the $\mathcal{PT}$-broken phase, it becomes localized on one edge of the tight-binding lattice. This localization is a realization of the non-Hermitian skin effect. Localization in the $\mathcal{PT}$-broken phase is observed both in the lattice chain basis and the Krylov basis. Spread entropy, entropic complexity, and a further measure that we term the Krylov inverse participation ratio probe the dynamics of wave function spreading and quantify the strength of localization probed in the Krylov basis. The number of Krylov basis vectors required to store the information of the state reduces with the strength of localization. Our results demonstrate how measures in Krylov space can be used to characterize the non-hermitian skin effect and its localization phase transition.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# エッジ重み決定図を用いた混合次元量子状態生成 Mixed-Dimensional Qudit State Preparation Using Edge-Weighted Decision Diagrams ( http://arxiv.org/abs/2406.03531v1 ) ライセンス: Link先を確認	Kevin Mato, Stefan Hillmich, Robert Wille,	(参考訳) 量子コンピュータは、古典的なコンピュータでは基本的に難解な重要な問題を解く可能性がある。量子コンピューティングプラットフォームの基盤となる物理は、多値論理(multi-valued logic)の使用をサポートする。このポテンシャルを利用するための重要な要素の1つは、多値系(qudit)のために量子状態を効率的に準備する能力である。量子コンピュータの時間感度のため、必要な状態に備える回路は可能な限り短くする必要がある。本稿では,混合次元系に着目した量子状態生成法について検討する。提案手法は, 対応する混合次元量子状態を構成する量子回路を自動生成する。この目的のために、決定図は、実現される量子状態のコンパクトな表現として使用される。さらに、量子状態を近似して、精度、メモリの複雑さ、回路内の演算数の間の微調整されたトレードオフを可能にする能力も取り入れている。実験的な評価は、高速でスケーラブルな量子状態の準備を容易にするための提案手法の有効性を示し、性能は決定図のサイズに直接関連している。この実装は MQT Qudits at github.com/cda-tum/mqt-qudits のフレームワーク MQT Qudits の一部として、ミュンヘン量子ツールキット(MQT)の一部として無料で利用可能である。 Quantum computers have the potential to solve important problems which are fundamentally intractable on a classical computer. The underlying physics of quantum computing platforms supports using multi-valued logic, which promises a boost in performance over the prevailing two-level logic. One key element to exploiting this potential is the capability to efficiently prepare quantum states for multi-valued, or qudit, systems. Due to the time sensitivity of quantum computers, the circuits to prepare the required states have to be as short as possible. In this paper, we investigate quantum state preparation with a focus on mixed-dimensional systems, where the individual qudits may have different dimensionalities. The proposed approach automatically realizes quantum circuits constructing a corresponding mixed-dimensional quantum state. To this end, decision diagrams are used as a compact representation of the quantum state to be realized. We further incorporate the ability to approximate the quantum state to enable a finely controlled trade-off between accuracy, memory complexity, and number of operations in the circuit. 
Empirical evaluations demonstrate the effectiveness of the proposed approach in facilitating fast and scalable quantum state preparation, with performance directly linked to the size of the decision diagram. The implementation is freely available as part of the Munich Quantum Toolkit (MQT), under the framework MQT Qudits at github.com/cda-tum/mqt-qudits.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# 量子コンピューティングにおける時間的ハドロン真空分極と光による散乱:シュウィンガーモデル実験 Towards Quantum Computing Timelike Hadronic Vacuum Polarization and Light-by-Light Scattering: Schwinger Model Tests ( http://arxiv.org/abs/2406.03536v1 ) ライセンス: Link先を確認	João Barata, Kazuki Ikeda, Swagato Mukherjee, Jonathan Raghoonanan,	(参考訳) ハドロン真空分極(HVP)と光バイライト散乱(HLBL)は、ミューオンの異常な磁気モーメントに関する標準モデル予測を評価する上で重要である。しかし、これらの観測可能な時間的領域の直接的な第一原理格子ゲージ理論に基づく計算は、依然として困難である。空間的領域における格子量子色力学(QCD)計算と、時間的領域からの実験データパラメトリゼーションに依存する分散的アプローチとの相違が持続する。本稿では、1+1次元量子電磁力学(QED)、すなわちシュウィンガーモデルを用いてHVPとHLBLを解析する手法を紹介する。そのために、テンソルネットワーク技術、特に行列積状態とデジタル量子コンピュータの古典的エミュレータの両方を使用します。単純化されたモデルで実現可能性を示すため、我々の手法はデジタル量子コンピュータを活用した将来の取り組みの舞台となる。 Hadronic vacuum polarization (HVP) and light-by-light scattering (HLBL) are crucial for evaluating the Standard Model predictions concerning the muon's anomalous magnetic moment. However, direct first-principle lattice gauge theory-based calculations of these observables in the timelike region remain challenging. Discrepancies persist between lattice quantum chromodynamics (QCD) calculations in the spacelike region and dispersive approaches relying on experimental data parametrization from the timelike region. Here, we introduce a methodology employing 1+1-dimensional quantum electrodynamics (QED), i.e. the Schwinger Model, to investigate the HVP and HLBL. To that end, we use both tensor network techniques, specifically matrix product states, and classical emulators of digital quantum computers. Demonstrating feasibility in a simplified model, our approach sets the stage for future endeavors leveraging digital quantum computers.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# データ複雑度の幾何学的視点:拡散モデルを用いた効率的な局所固有次元推定 A Geometric View of Data Complexity: Efficient Local Intrinsic Dimension Estimation with Diffusion Models ( http://arxiv.org/abs/2406.03537v1 ) ライセンス: Link先を確認	Hamidreza Kamkari, Brendan Leigh Ross, Rasa Hosseinzadeh, Jesse C. Cresswell, Gabriel Loaiza-Ganem,	(参考訳) 高次元データは一般に低次元部分多様体の上にあり、ダトゥムの局所内在次元(LID)を推定する(つまり、それが属する部分多様体の次元)ことは長年の問題である。 LIDは、変化の局所的な要因の数として理解することができる: ダタムの変動の要因が多ければ多いほど、それがより複雑になる傾向がある。この量の推定は、ニューラルネットワークの一般化からアウト・オブ・ディストリビューションデータの検出、敵例、AI生成テキストに至るまで、コンテキストにおいて有用であることが証明されている。近年の深層生成モデルの成功は、それらをLID推定に活用する機会を与えるが、生成モデルに基づく現在の手法は、不正確な見積もりを生成し、単一の事前学習モデル以上のものを必要とし、計算集約的であり、あるいは最良の深部生成モデル、すなわち拡散モデル(DM)を利用できない。本研究では, DMに付随するFokker-Planck方程式が, 上記すべての欠陥に対処するLID推定器を提供することを示す。我々の推定器はFLIPDと呼ばれ、すべての一般的なDMと互換性があり、LID推定ベンチマークで既存のベースラインを上回っている。また,実LIDが不明な自然画像にもFLIPDを適用した。競合推定器と比較して、FLIPDは複雑性の非LID測度と高い相関を示し、複雑性の質的な評価とよく一致し、安定拡散のスケールで高解像度の画像を抽出可能な唯一の推定器である。 High-dimensional data commonly lies on low-dimensional submanifolds, and estimating the local intrinsic dimension (LID) of a datum -- i.e. the dimension of the submanifold it belongs to -- is a longstanding problem. LID can be understood as the number of local factors of variation: the more factors of variation a datum has, the more complex it tends to be. Estimating this quantity has proven useful in contexts ranging from generalization in neural networks to detection of out-of-distribution data, adversarial examples, and AI-generated text. The recent successes of deep generative models present an opportunity to leverage them for LID estimation, but current methods based on generative models produce inaccurate estimates, require more than a single pre-trained model, are computationally intensive, or do not exploit the best available deep generative models, i.e. diffusion models (DMs). In this work, we show that the Fokker-Planck equation associated with a DM can provide a LID estimator which addresses all the aforementioned deficiencies. 
Our estimator, called FLIPD, is compatible with all popular DMs and outperforms existing baselines on LID estimation benchmarks. We also apply FLIPD to natural images where the true LID is unknown. Compared to competing estimators, FLIPD exhibits a higher correlation with non-LID measures of complexity, better matches a qualitative assessment of complexity, and is the only estimator to remain tractable with high-resolution images at the scale of Stable Diffusion.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# 非アベリアンアゾニック系における絡み合い非対称性 Entanglement Asymmetry in non-Abelian Anyonic Systems ( http://arxiv.org/abs/2406.03546v1 ) ライセンス: Link先を確認	Nicetu Tibau Vidal, Ved Kunte, Lucia Vilchez-Estevez, Mohit Lal Bera, Manabendra Nath Bera,	(参考訳) フォールトトレラントなトポロジカル量子計算のための有望なプラットフォームであるNon-Abelian anyonsは、物理的に許容される状態と演算に制限を課すチャージスーパーセレクションルール(cSSR)に準拠している。しかし、任意の量子情報理論におけるcSSRと融合規則の分岐はほとんど未解明のままである。本研究では, クイディット, ボソン, フェルミオンなどの非アノニック系と, 情報理論特性が根本的に異なることを明らかにし, 複雑な構造を提示する。バイパルタイト系では、純粋な状態は異なる境界スペクトルを持ち、混合状態は純粋な境界状態を含む。さらに注目すべきは、純粋な絡み合った状態において、当事者は絡み合った状態への平等なアクセスを欠いている可能性があることだ。この絡み合った非対称性は、アリスとボブの間に共有される絡み合った正準状態を用いて量子テレポーテーションにおいて現れ、アリスは未知の量子情報をボブに完全にテレポーティングできるが、ボブはこの能力に欠ける。これらの特徴は従来の理解に挑戦し、量子情報や相関を常に特徴付ける新しいアプローチを必要とする。これらの特徴は非アベリア格子ゲージ場理論にも現れることを期待する。本研究は, 量子通信と暗号プロトコルの実現に繋がる可能性があり, 一方が他方に傾いている場合の知識理論的側面の理解を著しく促進する。 Non-Abelian anyons, a promising platform for fault-tolerant topological quantum computation, adhere to the charge super-selection rule (cSSR), which imposes restrictions on physically allowed states and operations. However, the ramifications of cSSR and fusion rules in anyonic quantum information theory remain largely unexplored. In this study, we unveil that the information-theoretic characteristics of anyons diverge fundamentally from those of non-anyonic systems such as qudits, bosons, and fermions and display intricate structures. In bipartite anyonic systems, pure states may have different marginal spectra, and mixed states may contain pure marginal states. More striking is that in a pure entangled state, parties may lack equal access to entanglement. This entanglement asymmetry is manifested in quantum teleportation employing an entangled anyonic state shared between Alice and Bob, where Alice can perfectly teleport unknown quantum information to Bob, but Bob lacks this capability. These traits challenge conventional understanding, necessitating new approaches to characterize quantum information and correlations in anyons. 
We expect that these distinctive features will also be present in non-Abelian lattice gauge field theories. Our findings significantly advance the understanding of the information-theoretic aspects of anyons and may lead to realizations of quantum communication and cryptographic protocols where one party holds sway over the other.	翻訳日:2024-06-07 19:24:39 公開日:2024-06-05
# 統合不確実性注入による深層学習によるロバスト通信と計算 Robust Communication and Computation using Deep Learning via Joint Uncertainty Injection ( http://arxiv.org/abs/2406.03548v1 ) ライセンス: Link先を確認	Robert-Jeron Reifert, Hayssam Dahrouj, Alaa Alameer Ahmad, Haris Gacanin, Aydin Sezgin,	(参考訳) コミュニケーションと計算の収束は、機械学習と人工知能の統合とともに、第6世代の通信システム(6G)の鍵となる力となる。本稿では,空間多重化を用いた複数のデバイスを同時に運用する1つの基地局のネットワークについて考察する。そこで本稿では,チャネル情報と計算状態情報の両面での不確実性の中で,計算割り当てとともに送信と計算の能力を同時に管理する,革新的なディープラーニングベースのアプローチを提案する。より具体的には、計算と電力制約の対象となるサービス機器間の最悪の遅延を最小限に抑える、堅牢なソリューションを提案することを目的としている。この論文は、推定チャネルと計算要求を最適化されたリソース割り当てにマッピングするディープニューラルネットワーク(DNN)ベースのソリューションを使用する。トレーニング中、DNN出力後に不確実性サンプルを注入し、通信および計算推定誤差の両方を共同で考慮する。 DNNは、堅牢なユーティリティを使用してバックプロパゲーションを通じてトレーニングされ、したがって、不確実性分布を暗黙的に学習する。本研究は, 従来のDNN法と比較して, 高チャネル, 計算不確実性系において, 堅牢な遅延性能が向上していることを検証するものである。 The convergence of communication and computation, along with the integration of machine learning and artificial intelligence, stand as key empowering pillars for the sixth-generation of communication systems (6G). This paper considers a network of one base station serving a number of devices simultaneously using spatial multiplexing. The paper then presents an innovative deep learning-based approach to simultaneously manage the transmit and computing powers, alongside computation allocation, amidst uncertainties in both channel and computing states information. More specifically, the paper aims at proposing a robust solution that minimizes the worst-case delay across the served devices subject to computation and power constraints. The paper uses a deep neural network (DNN)-based solution that maps estimated channels and computation requirements to optimized resource allocations. During training, uncertainty samples are injected after the DNN output to jointly account for both communication and computation estimation errors. The DNN is then trained via backpropagation using the robust utility, thus implicitly learning the uncertainty distributions. 
Our results validate the enhanced robust delay performance of the joint uncertainty injection versus the classical DNN approach, especially in high channel and computational uncertainty regimes.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# Npix2Cpix: 歴史的文書画像からの透かし検索のための検索分類統合を備えたGANベースの画像変換ネットワーク Npix2Cpix: A GAN-based Image-to-Image Translation Network with Retrieval-Classification Integration for Watermark Retrieval from Historical Document Images ( http://arxiv.org/abs/2406.03556v1 ) ライセンス: Link先を確認	Utsab Saha, Sawradip Saha, Shaikh Anowarul Fattah, Mohammad Saquib,	(参考訳) 古代の透かしの識別と復元は、長い間、コーディコロジーと歴史の主要なトピックであった。透かしに基づく歴史文書の分類は、透かしの多様性、混み合った、騒々しいサンプル、複数の表現のモード、クラスとクラス内の変化の微妙な区別のために困難である。本稿では,U-net をベースとした条件付き逆数生成ネットワーク (GAN) を提案する。劣化した(ノイズの多い)ピクセルからクリーンなピクセルへの画像変換を行う能力を考えると、提案するネットワークはNpix2Cpixと呼ばれる。提案ネットワークでは,直接劣化した透かし画像を利用する代わりに,逆算学習を用いて画像から画像への変換を用いて,透かしの復元と分類を行う。入力ノイズ画像からクリーンな画像を出力するマッピングを学習するために、提案したU-netベースのGANのジェネレータと判別器を、画像間の距離に基づいて2つの別々の損失関数を用いて訓練する。提案したGANをノイズの多い透かし画像の前処理に使用した後、シームズをベースとしたワンショット学習を用いて透かしを分類する。大規模な歴史的透かしデータセットの実験結果によると、汚染画像から透かしを抽出すると、高いワンショット分類精度が得られる。得られた透かしの質的,定量的評価は,提案手法の有効性を示すものである。 The identification and restoration of ancient watermarks have long been a major topic in codicology and history. Classifying historical documents based on watermarks can be difficult due to the diversity of watermarks, crowded and noisy samples, multiple modes of representation, and minor distinctions between classes and intra-class changes. This paper proposes a U-net-based conditional generative adversarial network (GAN) to translate noisy raw historical watermarked images into clean, handwriting-free images with just watermarks. Considering its ability to perform image translation from degraded (noisy) pixels to clean pixels, the proposed network is termed as Npix2Cpix. Instead of employing directly degraded watermarked images, the proposed network uses image-to-image translation using adversarial learning to create clutter and handwriting-free images for restoring and categorizing the watermarks for the first time. 
In order to learn the mapping from input noisy image to output clean image, the generator and discriminator of the proposed U-net-based GAN are trained using two separate loss functions, each of which is based on the distance between images. After using the proposed GAN to pre-process noisy watermarked images, Siamese-based one-shot learning is used to classify watermarks. According to experimental results on a large-scale historical watermark dataset, extracting watermarks from tainted images can result in high one-shot classification accuracy. The qualitative and quantitative evaluation of the retrieved watermarks illustrates the effectiveness of the proposed approach.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# サブトラクティブホモモルフィズムによる外部データベースのステートレスおよび非インタラクティブ順序保存暗号化 Stateless and Non-Interactive Order-Preserving Encryption for Outsourced Databases through Subtractive Homomorphism ( http://arxiv.org/abs/2406.03559v1 ) ライセンス: Link先を確認	Dongfang Zhao,	(参考訳) OPEは、アウトソースされたデータベースサーバが、インデックスや完全な範囲クエリを構築するために、暗号化されたタプルをソートできる重要な技術であるため、アウトソースされたデータベースの文脈で、20年以上にわたって広く研究されてきた。最先端のOPEスキームの必要性 (i)ステートフルなクライアント -- クライアントが平文と暗号文の間のマッピングのローカルストレージを管理していることを意味する。 (ii)クエリ中のクライアントとサーバ間のインタラクション。第一のケースでは、ストレージ要件がクライアントの能力を超える可能性がある;第二のケースでは、サーバがソートや比較を含むクエリを実行すると、クライアントはアクセスできないかもしれない。本稿では、ステートレスクライアントに適した新しいOPEスキームを提案し、クエリ中にクライアントとサーバのインタラクションを必要としない。提案プロトコルの鍵となる考え方は,2つの平文の違いの符号が評価鍵を持つ代数演算によって明らかにされるように,同型暗号スキームの基盤となる付加性を活用することである。本論文では,提案プロトコルの正当性と安全性を実証し,その実装と実験結果を拡張レポートに示す。 Order-preserving encryption (OPE) has been extensively studied for more than two decades in the context of outsourced databases because OPE is a key enabling technique to allow the outsourced database servers to sort encrypted tuples in order to build indexes, complete range queries, and so forth. The state-of-the-art OPE schemes require (i) a stateful client -- implying that the client manages the local storage of some mapping between plaintexts and ciphertexts, and/or (ii) the interaction between the client and the server during the query. In production systems, however, the above assumptions do not always hold (not to mention performance overhead): In the first case, the storage requirement could exceed the capability of the client; In the second case, the clients may not be accessible when the server executes a query involving sort or comparison. This paper proposes a new OPE scheme that works for stateless clients and requires no client-server interaction during the queries. 
The key idea of our proposed protocol is to leverage the underlying additive property of a homomorphic encryption scheme such that the sign of the difference between two plaintexts can be revealed by some algebraic operations with an evaluation key. We will demonstrate the correctness and security of the proposed protocol in this short paper; the implementation and experimental results will be presented in an extended report.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# 非線形モデル縮小のためのニューラルな経験的補間法 Neural empirical interpolation method for nonlinear model reduction ( http://arxiv.org/abs/2406.03562v1 ) ライセンス: Link先を確認	Max Hirsch, Federico Pichi, Jan S. Hesthaven,	(参考訳) 本稿では,離散的経験的補間法に代わるニューラルネットワークを用いたニューラル・経験的補間法(NEIM)を導入し,パラメータ化された非線形偏微分方程式に対するリミットオーダーモデル(ROM)において非線形項の計算の時間的複雑さを低減する。 NEIMは、ROMの非線形項のアフィン分解を近似することにより、この還元を達成し、拡張のベクトル項はROM溶液によってニューラルネットワークによって与えられ、係数はいくつかの「最適」係数の補間によって与えられる。 NEIMは強欲な戦略に基づいており,その性能を調査するための基本的な誤り解析を行うことができる。 NEIMは、自動微分モデルにおいて実装が容易で、ROM非線形性の非線形射影であり、非局所非線形性と局所非線形性の両方に効率的であり、ROM非線形性の明示的な形式ではなく、データのみに依存するという利点がある。本稿では, 解依存および解非依存の非線形性, 非線形楕円問題, および液晶の非線形パラボリックモデルに対する方法論の有効性を示す。 In this paper, we introduce the neural empirical interpolation method (NEIM), a neural network-based alternative to the discrete empirical interpolation method for reducing the time complexity of computing the nonlinear term in a reduced order model (ROM) for a parameterized nonlinear partial differential equation. NEIM is a greedy algorithm which accomplishes this reduction by approximating an affine decomposition of the nonlinear term of the ROM, where the vector terms of the expansion are given by neural networks depending on the ROM solution, and the coefficients are given by an interpolation of some "optimal" coefficients. Because NEIM is based on a greedy strategy, we are able to provide a basic error analysis to investigate its performance. NEIM has the advantages of being easy to implement in models with automatic differentiation, of being a nonlinear projection of the ROM nonlinearity, of being efficient for both nonlocal and local nonlinearities, and of relying solely on data and not the explicit form of the ROM nonlinearity. We demonstrate the effectiveness of the methodology on solution-dependent and solution-independent nonlinearities, a nonlinear elliptic problem, and a nonlinear parabolic model of liquid crystals.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# GFN:多元性応用における分解能不変化演算子学習のためのグラフフィードフォワードネットワーク GFN: A graph feedforward network for resolution-invariant reduced operator learning in multifidelity applications ( http://arxiv.org/abs/2406.03569v1 ) ライセンス: Link先を確認	Oisín M. Morrison, Federico Pichi, Jan S. Hesthaven,	(参考訳) 本研究は,多忠実度アプリケーションのための新しい分解能不変モデルオーダー削減戦略を提案する。この研究で開発された新しいニューラルネットワーク層であるグラフフィードフォワードネットワークは、ニューラルネットワークの重みとメッシュのノードとを直接リンクすることで、フィードフォワードネットワークの概念をグラフ構造化データに拡張し、ネットワークの解釈可能性を高める。パラメトリックな偏微分方程式に対する自己エンコーダに基づく還元戦略において,異なるメッシュサイズでのトレーニングとテストの能力を利用する。この拡張は、エラーバウンダリによるパフォーマンス保証が保証されていることを示している。提案手法の能力は, 対流支配現象や高次元パラメータ空間の問題を含む3つの挑戦的ベンチマークで検証される。この手法は, 最先端モデルと比較して軽量で柔軟な手法であり, 単一忠実度と多忠実度の両方のシナリオにおいて優れた一般化性能を示す。 This work presents a novel resolution-invariant model order reduction strategy for multifidelity applications. We base our architecture on a novel neural network layer developed in this work, the graph feedforward network, which extends the concept of feedforward networks to graph-structured data by creating a direct link between the weights of a neural network and the nodes of a mesh, enhancing the interpretability of the network. We exploit the method's capability of training and testing on different mesh sizes in an autoencoder-based reduction strategy for parametrised partial differential equations. We show that this extension comes with provable guarantees on the performance via error bounds. The capabilities of the proposed methodology are tested on three challenging benchmarks, including advection-dominated phenomena and problems with a high-dimensional parameter space. The method results in a more lightweight and highly flexible strategy when compared to state-of-the-art models, while showing excellent generalisation performance in both single fidelity and multifidelity scenarios.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# Concave Objectivesを用いたオンラインパッケージングのための簡易学習支援アルゴリズム A Simple Learning-Augmented Algorithm for Online Packing with Concave Objectives ( http://arxiv.org/abs/2406.03574v1 ) ライセンス: Link先を確認	Elena Grigorescu, Young-San Lin, Maoyuan Song,	(参考訳) 学習強化アルゴリズムは、アルゴリズムの性能を向上させるために機械学習予測を使用する可能性があるため、近年、コンピュータサイエンスコミュニティで広く研究されている。予測は、将来を知ることなく、取り消せない決定をするオンラインアルゴリズムにとって特に有用である。このような学習強化されたアルゴリズムは、予測が正確である場合の古典的なオンラインアルゴリズムの限界を克服し、予測が不正確である場合の相容れない実行を目標としている。一般的なアプローチは、既存のオンラインアルゴリズムを特定のアドバイス概念に適応させることである。しかし、理想的には、従来のオンラインソリューションをブラックボックス方式で単純に使うだけで、近似の保証に大きな損失を被ることはない。ブラックボックスを開くのを避けるようなクリーンなソリューションは、しばしばまれであり、初めて見逃されることもある。例えば、Grigorescu et al (NeurIPS 22) は線形プログラムを網羅するオンライン学習アルゴリズムを提案したが、後に彼らの論文で述べられているように、彼らの結果はアドバイスとブラックボックスとして与えられるオンラインアルゴリズムを切り替える自然なアプローチによって仮定できることが判明した。本研究では,オンラインパッキング問題に対して,線形制約とコンケーブ目的を用いた単純な学習拡張アルゴリズムを導入,解析する。オンラインパッキングリニアプログラミング、knapsack、リソース管理のメリット、スループットの最大化、ネットワークユーティリティの最大化など、当社のフレームワークの直接的な応用例をいくつか紹介する。さらに、このような単純なブラックボックス解が最適である場合に必要かつ十分な条件を理解するという問題を提起する。これは、文献から多くのアドホックなアプローチを統合する研究の重要な方向であると考えています。 Learning-augmented algorithms has been extensively studied recently in the computer-science community, due to the potential of using machine learning predictions in order to improve the performance of algorithms. Predictions are especially useful for online algorithms making irrevocable decisions without knowledge of the future. Such learning-augmented algorithms aim to overcome the limitations of classical online algorithms when the predictions are accurate, and still perform comparably when the predictions are inaccurate. A common approach is to adapt existing online algorithms to the particular advice notion employed, which often involves understanding previous sophisticated algorithms and their analyses. However, ideally, one would simply use previous online solutions in a black-box fashion, without much loss in the approximation guarantees. 
Such clean solutions that avoid opening up black-boxes are often rare, and may even be missed the first time around. For example, Grigorescu et al. (NeurIPS 22) proposed a learning-augmented algorithm for online covering linear programs, but it later turned out that their results can be subsumed by a natural approach that switches between the advice and an online algorithm given as a black-box, as noted in their paper. In this work, we introduce and analyze a simple learning-augmented algorithm for online packing problems with linear constraints and concave objectives. We exhibit several direct applications of our framework including online packing linear programming, knapsack, resource management benefit, throughput maximization, and network utility maximization. We further raise the problem of understanding necessary and sufficient conditions for when such simple black-box solutions may be optimal. We believe this is an important direction of research that would unify many ad-hoc approaches from the literature.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# 因果推論における不均一効果の再検討 Reconciling Heterogeneous Effects in Causal Inference ( http://arxiv.org/abs/2406.03575v1 ) ライセンス: Link先を確認	Audrey Chang, Emily Diana, Alexander Williams Tolbert,	(参考訳) 本稿では,因果推論における参照クラス問題に対する解法を提案する。本稿では、機械学習におけるモデル乗法にReconcileアルゴリズムを適用し、因果推論における異種効果を再現する。不均一効果の条件平均処理効果(CATE)推定器の相違は参照クラス問題を引き起こす。確率を解釈するために個人からグループ・フレームワークを採用することで、科学哲学や因果推論などの分野にまたがる参照クラス問題は、コンピュータ科学におけるモデル乗法問題と同等であることがわかる。次に、CATE推定器の個々の確率の差分を分解するためにReconcile Algorithmを適用した。基準クラス問題は,グループベースエビデンスを用いた個人確率予測の文脈に現れるため,医療,保険,住宅などの高所得者,特に疎外化社会において,公正な結果の確保に有意な意味を持つ。予測モデリングにおける格差緩和の重要性を強調することで、技術的厳密さと社会的含意の意識を融合した学際戦略のさらなる探究が求められます。最終的に、我々の発見はアルゴリズムの公正性に対する全体論的アプローチを提唱し、株式とアクセスの幅広い目標を達成する上で、思慮深い、十分に取り巻かれたソリューションの重要な役割をあらわすものである。 In this position and problem pitch paper, we offer a solution to the reference class problem in causal inference. We apply the Reconcile algorithm for model multiplicity in machine learning to reconcile heterogeneous effects in causal inference. Discrepancy between conditional average treatment effect (CATE) estimators of heterogeneous effects poses the reference class problem, where estimates for individual predictions differ by choice of reference class. By adopting the individual to group framework for interpreting probability, we can recognize that the reference class problem -- which appears across fields such as philosophy of science and causal inference -- is equivalent to the model multiplicity problem in computer science. We then apply the Reconcile Algorithm to reconcile differences in estimates of individual probability among CATE estimators. Because the reference class problem manifests in contexts of individual probability prediction using group-based evidence, our results have tangible implications for ensuring fair outcomes in high-stakes such as healthcare, insurance, and housing, especially for marginalized communities. 
By highlighting the importance of mitigating disparities in predictive modeling, our work invites further exploration into interdisciplinary strategies that combine technical rigor with a keen awareness of social implications. Ultimately, our findings advocate for a holistic approach to algorithmic fairness, underscoring the critical role of thoughtful, well-rounded solutions in achieving the broader goals of equity and access.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# 調整されたデータ拡張による交通標識認識の強化--クラス不均衡とインスタンス不足への対応 Enhancing Traffic Sign Recognition with Tailored Data Augmentation: Addressing Class Imbalance and Instance Scarcity ( http://arxiv.org/abs/2406.03576v1 ) ライセンス: Link先を確認	Ulan Alsiyeu, Zhasdauren Duisebekov,	(参考訳) 本稿では、道路安全に不可欠な交通標識認識(TSR)における重要な課題、特にデータセットにおけるクラス不均衡とインスタンス不足に対処する。本稿では,合成画像生成,幾何変換,およびモデルの堅牢性と精度を向上させるためにデータセットの品質を高める新しい障害物ベースの拡張手法など,目的に合わせたデータ拡張技術を紹介する。本手法は,実世界の条件を正確にシミュレートするための多種多様な拡張プロセスを導入し,トレーニングデータの多様性と代表性を拡大する。この結果,TSRモデルの性能は大幅に向上し,交通標識認識システムに大きな影響を及ぼすことがわかった。この研究は、TSRのデータセット制限に対処するだけでなく、異なる地域やアプリケーションにまたがる同様の課題のモデルも提案しており、コンピュータビジョンと交通標識認識システムの分野における一歩前進を示すものである。 This paper tackles critical challenges in traffic sign recognition (TSR), which is essential for road safety -- specifically, class imbalance and instance scarcity in datasets. We introduce tailored data augmentation techniques, including synthetic image generation, geometric transformations, and a novel obstacle-based augmentation method to enhance dataset quality for improved model robustness and accuracy. Our methodology incorporates diverse augmentation processes to accurately simulate real-world conditions, thereby expanding the training data's variety and representativeness. Our findings demonstrate substantial improvements in TSR model performance, offering significant implications for traffic sign recognition systems. This research not only addresses dataset limitations in TSR but also proposes a model for similar challenges across different regions and applications, marking a step forward in the field of computer vision and traffic sign recognition systems.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# 機械学習における脆弱性検出のための貢献要因の説明 Explaining the Contributing Factors for Vulnerability Detection in Machine Learning ( http://arxiv.org/abs/2406.03577v1 ) ライセンス: Link先を確認	Esma Mouine, Yan Liu, Lu Xiao, Rick Kazman, Xiao Wang,	(参考訳) ソフトウェアリポジトリから脆弱性をマイニングし、機械学習技術を使ってソフトウェア脆弱性を自動的に検出する傾向が増えている。基本的だが未解決の研究課題は、マイニングと学習プロセスの異なる要因が、様々な特性を持つソフトウェアプロジェクトの脆弱性を特定する精度にどのように影響するかである。ソースコードの静的解析、ソフトウェアリポジトリマイニング、NLPベースの機械学習など、この分野では多くの研究が行われてきた。しかし、実践者は最先端のベースラインモデルを構築する上で重要な要素についての経験を欠いている。さらに、プロジェクトからプロジェクトへの脆弱性シグネチャの転送可能性に関する経験が不足している。本研究では、異なる脆弱性特徴と3つの代表的な機械学習モデルの組み合わせが、実際の17のプロジェクトにおいて、脆弱性検出の精度にどのように影響するかを検討する。脆弱性表現には2つの種類がある。 1) 異なるトークン化戦略と3つの異なる埋め込み技術(bag-of-words, word2vec, fastText)を用いてNLPで抽出されたコード特徴。 2) ソフトウェアシステムの抽象的な設計を捉える8つのアーキテクチャメトリクスのセット。 3つの機械学習アルゴリズムには、ランダムフォレストモデル、サポートベクターマシンモデル、残差ニューラルネットワークモデルが含まれる。解析の結果,bag-of-words埋め込みから抽出したシグネチャをランダムフォレストと組み合わせた推奨ベースラインモデルは,17プロジェクトすべてにおいて,他の組み合わせと比較して検出精度を一貫して約4%向上させることがわかった。さらに,本実験により,脆弱性シグネチャのドメイン間での転送には限界があることも観察された。 There is an increasing trend to mine vulnerabilities from software repositories and use machine learning techniques to automatically detect software vulnerabilities. A fundamental but unresolved research question is: how do different factors in the mining and learning process impact the accuracy of identifying vulnerabilities in software projects of varying characteristics? Substantial research has been dedicated to this area, including source code static analysis, software repository mining, and NLP-based machine learning. However, practitioners lack experience regarding the key factors for building a baseline model of the state-of-the-art. In addition, there is a lack of experience regarding the transferability of the vulnerability signatures from project to project. This study investigates how the combination of different vulnerability features and three representative machine learning models impacts the accuracy of vulnerability detection in 17 real-world projects. 
We examine two types of vulnerability representations: 1) code features extracted through NLP with varying tokenization strategies and three different embedding techniques (bag-of-words, word2vec, and fastText) and 2) a set of eight architectural metrics that capture the abstract design of the software systems. The three machine learning algorithms are a random forest model, a support vector machine model, and a residual neural network model. The analysis shows that a recommended baseline model, with signatures extracted through bag-of-words embedding and combined with the random forest, consistently increases the detection accuracy by about 4% compared to other combinations in all 17 projects. Furthermore, we observe the limitation of transferring vulnerability signatures across domains based on our experiments.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# 食品による拡散概念代数の限界を理解する Understanding the Limitations of Diffusion Concept Algebra Through Food ( http://arxiv.org/abs/2406.03582v1 ) ライセンス: Link先を確認	E. Zhixuan Zeng, Yuhao Chen, Alexander Wong,	(参考訳) 近年,画像生成技術,特に潜伏拡散モデルが急速に普及している。これらの大規模モデルが学習する意味概念を操作および明確化するために多くの技術が開発され、バイアスと概念関係に関する重要な洞察を提供する。しかしながら、これらの技法は、人間や動物の顔の伝統的な領域と芸術的スタイルの変遷においてのみ検証されることが多い。食品分野は、複雑な構成と地域バイアスを通じて固有の課題を提供しており、既存の方法の限界と機会に光を当てることができる。食品画像のレンズを通して,概念横断技術における定性的パターンと定量的パターンを解析する。我々は、モデルが料理の多様性のニュアンスを捉え、表現する能力に関する測定可能な洞察を明らかにし、モデルのバイアスと制限が出現する領域を特定する。 Image generation techniques, particularly latent diffusion models, have exploded in popularity in recent years. Many techniques have been developed to manipulate and clarify the semantic concepts these large-scale models learn, offering crucial insights into biases and concept relationships. However, these techniques are often only validated in conventional realms of human or animal faces and artistic style transitions. The food domain offers unique challenges through complex compositions and regional biases, which can shed light on the limitations and opportunities within existing methods. Through the lens of food imagery, we analyze both qualitative and quantitative patterns within a concept traversal technique. We reveal measurable insights into the model's ability to capture and represent the nuances of culinary diversity, while also identifying areas where the model's biases and limitations emerge.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# 記号回帰のための最近のアルゴリズムと遺伝的プログラミングの比較 A Comparison of Recent Algorithms for Symbolic Regression to Genetic Programming ( http://arxiv.org/abs/2406.03585v1 ) ライセンス: Link先を確認	Yousef A. Radwan, Gabriel Kronberger, Stephan Winkler,	(参考訳) 記号回帰は、解釈可能な結果を生成することを目標とする機械学習手法である。例えばランダムフォレストやニューラルネットワークのような、不透明な他の機械学習手法とは異なり、記号回帰は、科学者が理解可能な方法でデータをモデル化し、マッピングすることを目的としている。近年の進歩は、この2つの分野のギャップを埋めようとしており、新しい手法は、ニューラルネットワークと深層学習技術のマッピング能力を、記号回帰の説明力と融合させようとしている。本稿では,これらの新しいシステムについて検討し,長年にわたって記号回帰を先導してきた遺伝的プログラミングに基づく従来の手法と比較して,エンド・ツー・エンドのトランスフォーマーモデルの性能を検証する。我々は、これらのシステムを新しいデータセット上で比較し、よく知られたベンチマークデータセットで改善されてきた古い手法に有利なバイアスを避ける。 Operon などで実装された従来の GP 法は,最近発表された2つの記号回帰法よりも依然として優れていることを示す。 Symbolic regression is a machine learning method with the goal of producing interpretable results. Unlike other machine learning methods such as random forests or neural networks, which are opaque, symbolic regression aims to model and map data in a way that can be understood by scientists. Recent advancements have attempted to bridge the gap between these two fields; new methodologies attempt to fuse the mapping power of neural networks and deep learning techniques with the explanatory power of symbolic regression. In this paper, we examine these new emerging systems and test the performance of an end-to-end transformer model for symbolic regression versus the reigning traditional methods based on genetic programming that have spearheaded symbolic regression throughout the years. We compare these systems on novel datasets to avoid bias toward older methods that were improved on well-known benchmark datasets. Our results show that traditional GP methods, as implemented, e.g., by Operon, remain superior to two recently published symbolic regression methods.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# CountCLIP -- [Re]CLIPに10までのカウントを教える CountCLIP -- [Re] Teaching CLIP to Count to Ten ( http://arxiv.org/abs/2406.03586v1 ) ライセンス: Link先を確認	Harshvardhan Mestha, Tejas Agarwal, Karan Bania, Shreyas V, Yash Bhisikar,	(参考訳) 大規模視覚言語モデル(VLM)は、下流タスクにおける高いパフォーマンスを実現するために、リッチな共同画像テキスト表現を学習する。しかし、それらはオブジェクトの定量的な理解を示すことができず、カウント・アウェアの表現が不十分である。本稿では,「Teaching CLIP to Count to Ten」(Paiss et al., 2023)の再現性研究を行う。同研究は,カウント対照損失項を導入することで,ゼロショット分類の性能を維持しつつ,画像中のゼロショットカウント精度を向上させるようにCLIPモデル(Radford et al., 2021)を微調整する方法を提示している。より少ない計算資源で,彼らのトレーニングデータの小さなサブセットを用いてモデルの性能を向上させる。私たちは、自分たちのコードで研究を再現することで、これらの主張を検証する。実装はhttps://github.com/SforAiDl/CountCLIPで確認できる。 Large vision-language models (VLMs) are shown to learn rich joint image-text representations enabling high performance in relevant downstream tasks. However, they fail to showcase their quantitative understanding of objects, and they lack good counting-aware representation. This paper conducts a reproducibility study of 'Teaching CLIP to Count to Ten' (Paiss et al., 2023), which presents a method to finetune a CLIP model (Radford et al., 2021) to improve zero-shot counting accuracy in an image while maintaining the performance for zero-shot classification by introducing a counting-contrastive loss term. We improve the model's performance on a smaller subset of their training data with lower computational resources. We verify these claims by reproducing their study with our own code. The implementation can be found at https://github.com/SforAiDl/CountCLIP.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# 対話型検索エンジンにおけるランキング操作 Ranking Manipulation for Conversational Search Engines ( http://arxiv.org/abs/2406.03589v1 ) ライセンス: Link先を確認	Samuel Pfrommer, Yatong Bai, Tanmay Gautam, Somayeh Sojoudi,	(参考訳) 主要な検索エンジンプロバイダは、ユーザクエリに応答して、Large Language Model (LLM)生成コンテンツを急速に取り入れている。これらの対話型検索エンジンは、検索したWebサイトテキストをLLMコンテキストにロードして、要約と解釈を行う。近年の研究では、LLMはジェイルブレイクやプロンプトインジェクション攻撃に対して非常に脆弱であることが示されており、これらの攻撃は敵対的文字列を用いてLLMの安全性と品質の目標を阻害する。本研究では,対話型検索エンジンが参照するソースのランク付け順序に対するプロンプトインジェクションの影響について検討する。そこで本研究では,現実の消費者製品Webサイトに焦点を絞ったデータセットを導入し,対話型検索ランキングを敵対的問題として定式化する。実験により, 敵対的インジェクションのない状態での対話型検索ランキングを解析し, 製品名, 文書内容, コンテキスト位置の優先順位付けにおいて, 異なるLLMが著しく異なることを示す。次に、低ランク製品を確実に上位に押し上げるtree-of-attacksベースのジェイルブレイク手法を提案する。重要なことに、これらの攻撃はperplexity.aiのような最先端の対話型検索エンジンに効果的に転移する。ウェブサイト所有者が検索ランクを上げるための強力な金銭的インセンティブを考えると、我々の問題定式化は将来の堅牢性研究にとって極めて重要であると論じる。 Major search engine providers are rapidly incorporating Large Language Model (LLM)-generated content in response to user queries. These conversational search engines operate by loading retrieved website text into the LLM context for summarization and interpretation. Recent research demonstrates that LLMs are highly vulnerable to jailbreaking and prompt injection attacks, which disrupt the safety and quality goals of LLMs using adversarial strings. This work investigates the impact of prompt injections on the ranking order of sources referenced by conversational search engines. To this end, we introduce a focused dataset of real-world consumer product websites and formalize conversational search ranking as an adversarial problem. Experimentally, we analyze conversational search rankings in the absence of adversarial injections and show that different LLMs vary significantly in prioritizing product name, document content, and context position. We then present a tree-of-attacks-based jailbreaking technique which reliably promotes low-ranked products. Importantly, these attacks transfer effectively to state-of-the-art conversational search engines such as perplexity.ai. 
Given the strong financial incentive for website owners to boost their search ranking, we argue that our problem formulation is of critical importance for future robustness work.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# BVE + EKF:拡張カルマンフィルタを用いた3次元タスク空間における物体位置推定のための視点推定器 BVE + EKF: A viewpoint estimator for the estimation of the object's position in the 3D task space using Extended Kalman Filters ( http://arxiv.org/abs/2406.03591v1 ) ライセンス: Link先を確認	Sandro Costa Magalhães, António Paulo Moreira, Filipe Neves dos Santos, Jorge Dias,	(参考訳) RGB-Dセンサーは、放射線や雨などの外部の摂動に敏感であるため、オープンフィールド環境で動作している複数の課題に直面している。複数の作品がモノクロカメラを用いて物体の3D位置を認識するという課題に近づいている。しかし、これらの研究の大部分は、複雑なデータ駆動型で予測が難しいディープラーニングベースのソリューションに重点を置いている。そこで本稿では,拡張カルマンフィルタ (EKF) を用いたガウス視点推定器 (BVE) を用いて3次元物体の位置を予測する問題にアプローチする。このアルゴリズムはタスクの効率を証明し、最大平均ユークリッド誤差は約32mmに達した。実験は人工ガウス雑音を用いてMATLABに展開・評価された。今後の研究は、ロボットシステムにシステムを実装することを目指している。 RGB-D sensors face multiple challenges operating under open-field environments because of their sensitivity to external perturbations such as radiation or rain. Multiple works are approaching the challenge of perceiving the 3D position of objects using monocular cameras. However, most of these works focus mainly on deep learning-based solutions, which are complex, data-driven, and difficult to predict. So, we aim to approach the problem of predicting the 3D objects' position using a Gaussian viewpoint estimator named best viewpoint estimator (BVE) powered by an extended Kalman filter (EKF). The algorithm proved efficient on the tasks and reached a maximum average Euclidean error of about 32 mm. The experiments were deployed and evaluated in MATLAB using artificial Gaussian noise. Future work aims to implement the system in a robotic system.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# 質問応答システムにおける検索複雑性の測定 Measuring Retrieval Complexity in Question Answering Systems ( http://arxiv.org/abs/2406.03592v1 ) ライセンス: Link先を確認	Matteo Gabburo, Nicolaas Paul Jedema, Siddhant Garg, Leonardo F. R. Ribeiro, Alessandro Moschitti,	(参考訳) 本稿では,検索に基づく質問回答(QA)においてどの質問が困難なのかを検討する。我々は (i) 検索複雑性(RC)という,検索された文書の完全性に条件付けられ,質問に答えることの難しさを測る新しい指標を提案し, (ii) 任意の検索システムに対してRCを測定するための教師なしパイプラインを提案する。提案するパイプラインは,6つの挑戦的なQAベンチマークにおいて,LLMを含む代替推定器よりもRCを正確に測定する。さらに、RCスコアは6つのベンチマークのうち5つでQA性能と専門家の判断の両方と強く相関しており、RCが質問の難易度を効果的に測定していることを示している。その後の高RC質問の分類は、マルチホップ、構成的、時間的QAを含む幅広い質問形態にまたがっており、RCスコアが複雑な質問の新たなサブセットを分類できることを示している。我々のシステムは、既存のデータセットにおけるより困難な質問の特定を支援することで、検索ベースのシステムにも大きな影響を与えうる。 In this paper, we investigate which questions are challenging for retrieval-based Question Answering (QA). We (i) propose retrieval complexity (RC), a novel metric conditioned on the completeness of retrieved documents, which measures the difficulty of answering questions, and (ii) propose an unsupervised pipeline to measure RC given an arbitrary retrieval system. Our proposed pipeline measures RC more accurately than alternative estimators, including LLMs, on six challenging QA benchmarks. Further investigation reveals that RC scores strongly correlate with both QA performance and expert judgment across five of the six studied benchmarks, indicating that RC is an effective measure of question difficulty. Subsequent categorization of high-RC questions shows that they span a broad set of question shapes, including multi-hop, compositional, and temporal QA, indicating that RC scores can categorize a new subset of complex questions. Our system can also have a major impact on retrieval-based systems by helping to identify more challenging questions on existing datasets.	翻訳日:2024-06-07 19:14:47 公開日:2024-06-05
# なぜ「プロブレム」が肯定的感情を予測するのか : 感情分類における非直感的特徴の説明を事例として Why is "Problems" Predictive of Positive Sentiment? A Case Study of Explaining Unintuitive Features in Sentiment Classification ( http://arxiv.org/abs/2406.03594v1 ) ライセンス: Link先を確認	Jiaming Qu, Jaime Arguello, Yue Wang,	(参考訳) 説明可能なAI(XAI)アルゴリズムは、機械学習モデルがどのように予測を行うかをユーザが理解できるようにすることを目的としている。この目的のために、多くのアプローチが、どの入力特徴がターゲットラベルを最もよく予測するかを説明している。しかし、そのような説明は依然としてユーザを困惑させる可能性がある(例えば、製品レビューでは、"problems"という言葉は肯定的な感情を予測している)。説明されないままだと、不可解な説明は悪影響を及ぼす可能性がある。入力特徴と対象ラベルの非直感的関連を説明することは,XAI研究における未探索領域である。本研究は、感情分類器によって学習された非直感的関連を事例として、この方向の最初の取り組みを行う。本研究では,(1)ユーザに対して非直感的に見えうる関連を自動的に検出する手法と,(2)非直感的特徴が予測的である理由をユーザが理解できるような説明を生成する手法を提案する。クラウドソースによる調査(N=300)の結果,提案手法は感情分類における予測的だが非直感的な特徴を効果的に検出・説明できることがわかった。 Explainable AI (XAI) algorithms aim to help users understand how a machine learning model makes predictions. To this end, many approaches explain which input features are most predictive of a target label. However, such explanations can still be puzzling to users (e.g., in product reviews, the word "problems" is predictive of positive sentiment). If left unexplained, puzzling explanations can have negative impacts. Explaining unintuitive associations between an input feature and a target label is an underexplored area in XAI research. We take an initial effort in this direction using unintuitive associations learned by sentiment classifiers as a case study. We propose approaches for (1) automatically detecting associations that can appear unintuitive to users and (2) generating explanations to help users understand why an unintuitive feature is predictive. Results from a crowdsourced study (N=300) found that our proposed approaches can effectively detect and explain predictive but unintuitive features in sentiment classification.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# Hi5: ゼロヒューマンアノテーションによる2次元ハンドポース推定 Hi5: 2D Hand Pose Estimation with Zero Human Annotation ( http://arxiv.org/abs/2406.03599v1 ) ライセンス: Link先を確認	Masum Hasan, Cengiz Ozel, Nina Long, Alexander Martin, Samuel Potter, Tariq Adnan, Sangwu Lee, Amir Zadeh, Ehsan Hoque,	(参考訳) 本研究では,ヒトのアノテーションやバリデーションを必要としない高品質な合成データを集めるための,新しい大規模合成手ポーズ推定データセット,Hi5を提案する。コンピュータグラフィックスの最近の進歩、多様な性別と肌色を持つ高忠実な3Dハンドモデル、ダイナミック環境とカメラの動きを活用して、データ合成パイプラインはデータの多様性と表現を正確に制御し、堅牢で公正なモデルのトレーニングを確実にします。我々は,実世界の変動性を忠実に表現した単一のコンシューマPCを用いて,583,000の画像と正確なポーズアノテーションを用いたデータセットを生成する。 Hi5でトレーニングされたポース推定モデルは、実際のベンチマークで競合的に動作し、オクルージョンと摂動でテストされた実際のデータでトレーニングされたモデルを上回ります。本実験は,実データセットにおけるデータ表現問題に対する有効な解決策として,合成データに対する有望な結果を示す。本論文は, コスト削減と手ポーズ推定のためのデータの多様性, 品質向上を実現するため, 合成データ作成とアノテーションに対する有望な新しいアプローチを提供する。 We propose a new large synthetic hand pose estimation dataset, Hi5, and a novel inexpensive method for collecting high-quality synthetic data that requires no human annotation or validation. Leveraging recent advancements in computer graphics, high-fidelity 3D hand models with diverse genders and skin colors, and dynamic environments and camera movements, our data synthesis pipeline allows precise control over data diversity and representation, ensuring robust and fair model training. We generate a dataset with 583,000 images with accurate pose annotation using a single consumer PC that closely represents real-world variability. Pose estimation models trained with Hi5 perform competitively on real-hand benchmarks while surpassing models trained with real data when tested on occlusions and perturbations. Our experiments show promising results for synthetic data as a viable solution for data representation problems in real datasets. Overall, this paper provides a promising new approach to synthetic data creation and annotation that can reduce costs and increase the diversity and quality of data for hand pose estimation.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# 知識を融合した法的な知恵:診断レンズによるLLMの相談と正の非ラベル強化学習 Knowledge-Infused Legal Wisdom: Navigating LLM Consultation through the Lens of Diagnostics and Positive-Unlabeled Reinforcement Learning ( http://arxiv.org/abs/2406.03600v1 ) ライセンス: Link先を確認	Yang Wu, Chenghao Wang, Ece Gumusel, Xiaozhong Liu,	(参考訳) 法域を含む様々なアプリケーションへの生成型大規模言語モデル(LLM)の統合は、その拡張性と汎用性によって加速されている。しかし、法的背景のないユーザは、しばしば専門的なクエリを定式化するのに苦労し、LLMにケースの物語を提示する際、意図せず重要な法的要因を見落としてしまうことがある。この問題に対処するために,適応型の弁護士のような診断質問を利用して追加のケース情報を収集し,高品質なフィードバックを提供する診断法大規模言語モデル(D3LM)を提案する。 D3LMは、革新的なグラフベースのPositive-Unlabeled Reinforcement Learning (PURL)アルゴリズムを導入し、重要な質問の生成とユーザ-LLMインタラクションの強化を可能にしている。さらに、LLMベースの停止基準の統合により、正確なCourt Views Generation(CVG)が容易になる。また、米国事例法データベースに基づく新たな英語CVGデータセットを導入し、LLM研究と展開の領域を重要な次元で強化した。 D3LMは、法域における卓越したパフォーマンスと優れたユーザエクスペリエンスを提供することによって、古典的なLLMを超える。 The integration of generative Large Language Models (LLMs) into various applications, including the legal domain, has been accelerated by their expansive and versatile nature. However, when facing a legal case, users without a legal background often struggle to formulate professional queries and may inadvertently overlook critical legal factors when presenting their case narrative to LLMs. To address this issue, we propose the Diagnostic Legal Large Language Model (D3LM), which utilizes adaptive lawyer-like diagnostic questions to collect additional case information and then provides high-quality feedback. D3LM incorporates an innovative graph-based Positive-Unlabeled Reinforcement Learning (PURL) algorithm, enabling the generation of critical questions and enhancing user-LLM interactions. Moreover, an integrated LLM-based stopping criterion facilitates precise Court Views Generation (CVG). Our research also introduces a new English-language CVG dataset based on the US case law database, enriching the realm of LLM research and deployment with a vital dimension. 
D3LM surpasses classical LLMs by delivering outstanding performance and a remarkable user experience in the legal domain.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# アライメント校正: 監査下での対照学習のためのマシンアンラーニング Alignment Calibration: Machine Unlearning for Contrastive Learning under Auditing ( http://arxiv.org/abs/2406.03603v1 ) ライセンス: Link先を確認	Yihan Wang, Yiwei Lu, Guojun Zhang, Franziska Boenisch, Adam Dziedzic, Yaoliang Yu, Xiao-Shan Gao,	(参考訳) マシンアンラーニングは、トレーニング済みのモデルパラメータに対する特定のトレーニングデータの影響を取り消すための実行可能なソリューションを提供する。既存のアプローチは、分類モデルと生成モデルのためのアンラーニングのレシピを提供する。しかし、重要な機械学習モデルの一分野である対照学習(CL)手法は見過ごされている。本稿では、まず、Machine Unlearning for Contrastive Learning(MUC)の枠組みを提案し、既存の手法を適応させることで、このギャップを埋める。さらに,いくつかの手法は凡庸なアンラーナーであり,既存の監査ツールではデータ所有者が対照学習におけるアンラーニング効果を検証するのに十分でない可能性があることを観察した。そこで本稿では,対照学習の特性を明示的に考慮し,アンラーニングを容易に検証できる新たな監査指標に向けて最適化することで,アライメント校正(Alignment Calibration, AC)と呼ばれる新しい手法を提案する。我々は、SimCLR、MoCo、CLIP上でACとベースライン手法を経験的に比較した。ACは既存の手法の欠点に対処し,(1)最先端の性能を達成して厳密なアンラーニング(再トレーニング)を近似し,(2)データ所有者がブラックボックス監査によってアンラーニングの効果を明確に可視化できるようにすることが確認された。 Machine unlearning provides viable solutions to revoke the effect of certain training data on pre-trained model parameters. Existing approaches provide unlearning recipes for classification and generative models. However, a category of important machine learning models, i.e., contrastive learning (CL) methods, is overlooked. In this paper, we fill this gap by first proposing the framework of Machine Unlearning for Contrastive learning (MUC) and adapting existing methods. Furthermore, we observe that several methods are mediocre unlearners and existing auditing tools may not be sufficient for data owners to validate the unlearning effects in contrastive learning. We thus propose a novel method called Alignment Calibration (AC) by explicitly considering the properties of contrastive learning and optimizing towards novel auditing metrics to easily verify unlearning. We empirically compare AC with baseline methods on SimCLR, MoCo and CLIP. 
We observe that AC addresses drawbacks of existing methods: (1) achieving state-of-the-art performance and approximating exact unlearning (retraining); (2) allowing data owners to clearly visualize the effect caused by unlearning through black-box auditing.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# 非平衡多体コロイド系に対するニューラル力汎関数 Neural force functional for non-equilibrium many-body colloidal systems ( http://arxiv.org/abs/2406.03606v1 ) ライセンス: Link先を確認	Toni Zimmerman, Florian Sammüller, Sophie Hermann, Matthias Schmidt, Daniel de las Heras,	(参考訳) パワー汎関数理論と機械学習を組み合わせて、コロイド粒子の非平衡過減衰多体系を一体場のレベルで研究する。まず、ランダムに生成された外部場の影響下にあるブラウン粒子のコンピュータシミュレーションから、動力学に関連する一体場を定常状態でサンプリングする。次に、このデータを用いてニューラルネットワークを訓練し、一体密度と速度プロファイルから一体の内部力場への形式的に厳密な汎関数写像を空間的に局所的に表現する。トレーニングされたネットワークは、非平衡超断熱力場と、せん断粘性やバルク粘性などの輸送係数を分析するために使用される。局所的な学習手法により、ネットワークは一体場をサンプリングした元のシミュレーションボックスよりもはるかに大きなシステムに適用できる。ネットワークは、厳密な非平衡一体力バランス方程式と連続の方程式を補うことで、時間に依存した状況下での動力学について有効な予測を与える。トレーニングは定常状態のみに基づいているが、予測された動力学はシミュレーション結果とよく一致している。ニューラル動的密度汎関数理論は、内部力場が平衡系のそれである極限の場合として簡単に実装できる。このフレームワークは一般的なものであり、ブラウン動力学に従って相互作用する粒子の他の多体系にも直接適用できる。 We combine power functional theory and machine learning to study non-equilibrium overdamped many-body systems of colloidal particles at the level of one-body fields. We first sample in steady state the one-body fields relevant for the dynamics from computer simulations of Brownian particles under the influence of randomly generated external fields. A neural network is then trained with this data to represent locally in space the formally exact functional mapping from the one-body density and velocity profiles to the one-body internal force field. The trained network is used to analyse the non-equilibrium superadiabatic force field and the transport coefficients such as shear and bulk viscosities. Due to the local learning approach, the network can be applied to systems much larger than the original simulation box in which the one-body fields are sampled. Complemented with the exact non-equilibrium one-body force balance equation and a continuity equation, the network yields viable predictions of the dynamics in time-dependent situations. Even though training is based on steady states only, the predicted dynamics is in good agreement with simulation results. 
A neural dynamical density functional theory can be straightforwardly implemented as a limiting case in which the internal force field is that of an equilibrium system. The framework is general and directly applicable to other many-body systems of interacting particles following Brownian dynamics.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# Fantastyc: ブロックチェーンベースのフェデレーションラーニングをセキュアかつ実用的に Fantastyc: Blockchain-based Federated Learning Made Secure and Practical ( http://arxiv.org/abs/2406.03608v1 ) ライセンス: Link先を確認	William Boitier, Antonella Del Pozzo, Álvaro García-Pérez, Stephane Gazut, Pierre Jobic, Alexis Lemaire, Erwan Mahe, Aurelien Mayoue, Maxence Perion, Deepika Singh, Tuanir Franca Rezende, Sara Tucci-Piergiovanni,	(参考訳) フェデレートラーニング(Federated Learning)は、複数のクライアントが、ローカルデータを共有せずに、中央サーバのオーケストレーションの下で機械学習モデルを協調的にトレーニングすることを可能にする分散フレームワークである。このフレームワークの中央集権性は障害点となっており、文献ではブロックチェーンベースのフェデレーション学習アプローチによって対処されている。トレーサビリティを備えた完全な分散ソリューションを保証する一方で、そのようなアプローチは、実用的にデプロイするには完全性、機密性、スケーラビリティに関するいくつかの課題に依然として直面している。本稿では,これまでの最先端技術では同時に満たされたことのないこれらの課題に対処するために設計されたソリューションであるFantastycを提案する。 Federated Learning is a decentralized framework that enables multiple clients to collaboratively train a machine learning model under the orchestration of a central server without sharing their local data. The centrality of this framework represents a point of failure which is addressed in literature by blockchain-based federated learning approaches. While ensuring a fully-decentralized solution with traceability, such approaches still face several challenges about integrity, confidentiality and scalability to be practically deployed. In this paper, we propose Fantastyc, a solution designed to address these challenges that have never been met together in the state of the art.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# 次世代高複雑性トレースガスセンシングのための変調リングダウンコム干渉計 Modulated Ringdown Comb Interferometry for next-generation high complexity trace gas sensing ( http://arxiv.org/abs/2406.03609v1 ) ライセンス: Link先を確認	Qizhong Liang, Apoorva Bisht, Andrew Scheck, Peter G. Schunemann, Jun Ye,	(参考訳) 健康と環境に関連するガスサンプルは、典型的には膨大な濃度のダイナミックレンジにまたがる多数の分子種を含む。高濃度分子は強い吸収背景を課し、低濃度種の堅牢な同定を妨げる。高フィネスキャビティ増強を伴う中赤外周波数コム分光法は、これまで最も感度の高い多種のトレースガス検出法の多くを実現しているが、その頑健な性能は、キャビティ共鳴がコム線周波数からずれるのを避けるために、ガスサンプルに弱い吸収特性しか含まないことを要求する。そこで本研究では, この制約から解放された新しい手法を導入し, 複雑で動的な分子組成に広い適用性を有する次世代多種トレースガスセンシングの開発を可能にする。変調リングダウンコム干渉法(Modulated Ringdown Comb Interferometry)の原理は、長さ変調キャビティを透過した大規模並列のコム線によって運ばれるリングダウンダイナミクスを分解することである。この方法は、フィールドダイナミクスの周期性と、マイケルソン干渉計によって導入されるドップラー周波数シフトの両方を利用する。分散の影響を受けない高効率なデータ収集により、スペクトルカバレッジとキャビティフィネスの両方をスケーラブルに向上できる。このプラットフォーム上で、これまでのすべての実験よりも桁違いに優れたフィネスとスペクトルカバレッジの積を中赤外において実現した。スペクトルカバレッジ1,010 cm-1, キャビティフィネス23,000という大幅に拡張された範囲で, 分散性の高いヒトの呼気サンプルを計測し, 本手法の有効性を実証した。これにより、濃度が7桁にわたって変化する20個の異なる分子種を、1兆分の1(ppt)を超える感度で初めて同時定量することができる。 Gas samples relevant to health and environment typically contain a plethora of molecular species that span a huge concentration dynamic range. High-concentration molecules impose a strong absorption background that hinders robust identification of low-concentration species. While mid-infrared frequency comb spectroscopy with high-finesse cavity enhancement has realized many of the most sensitive multi-species trace gas detection to date, its robust performance requires gas samples to contain only weak absorption features to avoid dispersing cavity resonances from the comb line frequencies. Here we introduce a new technique that is free from this restriction, thus enabling the development of next-generation multi-species trace gas sensing with broad applicability to complex and dynamic molecular compositions. The principle of Modulated Ringdown Comb Interferometry is to resolve ringdown dynamics carried by massively parallel comb lines transmitted through a length-modulated cavity. 
This method leverages both the periodicity of the field dynamics and Doppler frequency shifts introduced from a Michelson interferometer. Scalable enhancement of both spectral coverage and cavity finesse is enabled with dispersion-immune and high-efficiency data collection. Built upon this platform, we realize in the mid-infrared a product of finesse and spectral coverage that is orders of magnitude better than all prior experiments. We demonstrate the power of this technique by measuring highly dispersive exhaled human breath samples over a vastly expanded spectral coverage of 1,010 cm-1 and with cavity finesse of 23,000. This allows for the first time simultaneous quantification of 20 distinct molecular species at > 1 part-per-trillion sensitivity with their concentrations varying by 7 orders of magnitude.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# FedPylot: Internet of Vehiclesにおけるリアルタイム物体検出のためのフェデレーション学習 FedPylot: Navigating Federated Learning for Real-Time Object Detection in Internet of Vehicles ( http://arxiv.org/abs/2406.03611v1 ) ライセンス: Link先を確認	Cyprien Quéméneur, Soumaya Cherkaoui,	(参考訳) Internet of Vehicles (IoV)は、自動車、インフラ、歩行者、クラウドで構成される密接なネットワークにおいて、低レイテンシのビッグデータ処理を可能にすることで、自動運転およびインテリジェントトランスポートシステム(ITS)の重要なコンポーネントとして出現する。自動運転車は機械学習(ML)に大きく依存しており、エッジで生成される豊富なセンサデータから大きな恩恵を受けることができるが、そのためにはモデルのトレーニングと機密性の高いユーザデータのプライバシ保護を両立させる手段が必要となる。フェデレートラーニング(FL)は、道路ユーザのプライバシを保護し、通信オーバーヘッドを軽減しながら、車載ネットワークにおいて洗練されたMLモデルをトレーニングするための有望なソリューションである。本稿では,不均衡性,コンセプトドリフト,ラベル分布スキューを含むデータの不均一性のもとでリアルタイム物体検出に取り組むために,最先端のYOLOv7モデルのフェデレーション最適化について検討する。この目的のために我々は,ハイパフォーマンスコンピューティング(HPC)システム上でフェデレーションオブジェクト検出実験をシミュレートする,軽量なMPIベースのプロトタイプであるFedPylotを紹介し,ハイブリッド暗号を用いてサーバ・クライアント間の通信を保護する。本研究は, 精度, 通信コスト, 推論速度を考慮し, 自動運転車が直面する課題に対するバランスのとれたアプローチを示す。我々は、IoVにおけるFLの適用性に関する有望な結果を実証し、FedPylotが今後のフェデレーション型リアルタイム物体検出研究の基礎となることを期待する。ソースコードはhttps://github.com/cyprienquemeneur/fedpylotで公開されている。 The Internet of Vehicles (IoV) emerges as a pivotal component for autonomous driving and intelligent transportation systems (ITS), by enabling low-latency big data processing in a dense interconnected network that comprises vehicles, infrastructures, pedestrians and the cloud. Autonomous vehicles are heavily reliant on machine learning (ML) and can strongly benefit from the wealth of sensory data generated at the edge, which calls for measures to reconcile model training with preserving the privacy of sensitive user data. Federated learning (FL) stands out as a promising solution to train sophisticated ML models in vehicular networks while protecting the privacy of road users and mitigating communication overhead. This paper examines the federated optimization of the cutting-edge YOLOv7 model to tackle real-time object detection amid data heterogeneity, encompassing unbalancedness, concept drift, and label distribution skews. 
To this end, we introduce FedPylot, a lightweight MPI-based prototype to simulate federated object detection experiments on high-performance computing (HPC) systems, where we safeguard server-client communications using hybrid encryption. Our study factors in accuracy, communication cost, and inference speed, thereby presenting a balanced approach to the challenges faced by autonomous vehicles. We demonstrate promising results for the applicability of FL in IoV and hope that FedPylot will provide a basis for future research into federated real-time object detection. The source code is available at https://github.com/cyprienquemeneur/fedpylot.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# 異常検出の進展:LLMを用いた非意味的財務データ符号化 Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs ( http://arxiv.org/abs/2406.03614v1 ) ライセンス: Link先を確認	Alexander Bakumenko, Kateřina Hlaváčková-Schindler, Claudia Plant, Nina C. Hubig,	(参考訳) 総勘定元帳データの異常を検出することは、財務記録の信頼性を確保する上で最も重要である。財務監査は、不規則または潜在的に不正な仕訳(ジャーナルエントリ)を特定するために、機械学習(ML)アルゴリズムにますます依存しており、各仕訳に含まれる取引数は一定ではない。機械学習では、特徴次元の不均一性はデータ解析にかなりの複雑さをもたらす。本稿では,Large Language Models (LLMs) の埋め込みを用いた金融データの異常検出手法を提案する。実世界の財務記録からの非意味的カテゴリデータを符号化するために,3つの事前学習された汎用sentence-transformerモデルを検証した。下流分類タスクでは,ロジスティック回帰,ランダムフォレスト,グラディエントブースティングマシン,サポートベクトルマシン,ニューラルネットワークを含む5つの最適化MLモデルを実装,評価した。実験により,我々のモデルがベースラインを上回り(一部の設定では大差で),LLMが異常検出に有用な情報を提供することを示す。この結果は,財務仕訳における異常検出,特に特徴のスパース性への対処において,LLMの有効性をさらに裏付けるものである。財務分野およびそれ以外における非意味的データに対するLLM埋め込みの利用について,有望な展望を論じる。 Detecting anomalies in general ledger data is of utmost importance to ensure trustworthiness of financial records. Financial audits increasingly rely on machine learning (ML) algorithms to identify irregular or potentially fraudulent journal entries, each characterized by a varying number of transactions. In machine learning, heterogeneity in feature dimensions adds significant complexity to data analysis. In this paper, we introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings. To encode non-semantic categorical data from real-world financial records, we tested 3 pre-trained general purpose sentence-transformer models. For the downstream classification task, we implemented and evaluated 5 optimized ML models including Logistic Regression, Random Forest, Gradient Boosting Machines, Support Vector Machines, and Neural Networks. Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines, in selected settings even by a large margin. The findings further underscore the effectiveness of LLMs in enhancing anomaly detection in financial journal entries, particularly by tackling feature sparsity.
We discuss a promising perspective on using LLM embeddings for non-semantic data in the financial context and beyond.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# BEACON: 高価なブラックボックスシステムにおけるノベルティ探索のためのベイズ最適化戦略 BEACON: A Bayesian Optimization Strategy for Novelty Search in Expensive Black-Box Systems ( http://arxiv.org/abs/2406.03616v1 ) ライセンス: Link先を確認	Wei-Ting Tang, Ankush Chakrabarty, Joel A. Paulson,	(参考訳) ノベルティ・サーチ (NS) は、シミュレーションや実験を通じて多様なシステムの振る舞いを自動的に発見する探索アルゴリズムのクラスである。多様な成果を体系的に得ることは、材料探索や創薬、ニューラルアーキテクチャ探索、強化学習、ロボットナビゲーションなど、多くの現実世界の設計問題において重要な要素である。これらの複雑なシステムの入力と出力(つまり振る舞い)の関係は通常閉形式では利用できないので、NSはブラックボックスの視点を必要とする。その結果、一般的なNSアルゴリズムは、入力空間の集中的なサンプリングを必要とする進化的最適化やその他のメタヒューリスティックに依存しており、システム評価にコストがかかる場合には実用的でない。このような高価なブラックボックスシステムに特化して設計された、サンプル効率のよいNSのためのベイズ最適化に着想を得たアルゴリズムを提案する。提案手法は,多出力ガウス過程 (MOGP) を用いて入力から振る舞いへの写像をモデル化し,MOGPから得られた事後サンプルに依存し探索と活用の両方を促進する新規性指標を最大化することで,次の評価点を選択する。効率的な事後サンプリングと高次元ガウス過程モデリングの進歩を活用することで、我々のアプローチをデータの量と入力数の両方に関してスケーラブルにする方法について議論する。クリーンエネルギー技術に使用する多様な金属有機構造体の発見などの新しい応用を含む,10の合成ベンチマーク問題と8つの実世界の問題(最大2133個の入力を含む)に対して,我々のアプローチを検証した。提案手法は,限られたサンプル予算の下で,より大規模な多様な振る舞いの集合を見出すことにより,既存のNSアルゴリズムよりも大幅に優れていることを示す。 Novelty search (NS) refers to a class of exploration algorithms that automatically uncover diverse system behaviors through simulations or experiments. Systematically obtaining diverse outcomes is a key component in many real-world design problems such as material and drug discovery, neural architecture search, reinforcement learning, and robot navigation. Since the relationship between the inputs and outputs (i.e., behaviors) of these complex systems is typically not available in closed form, NS requires a black-box perspective. Consequently, popular NS algorithms rely on evolutionary optimization and other meta-heuristics that require intensive sampling of the input space, which is impractical when the system is expensive to evaluate. We propose a Bayesian optimization inspired algorithm for sample-efficient NS that is specifically designed for such expensive black-box systems.
Our approach models the input-to-behavior mapping with multi-output Gaussian processes (MOGP) and selects the next point to evaluate by maximizing a novelty metric that depends on a posterior sample drawn from the MOGP that promotes both exploration and exploitation. By leveraging advances in efficient posterior sampling and high-dimensional Gaussian process modeling, we discuss how our approach can be made scalable with respect to both amount of data and number of inputs. We test our approach on ten synthetic benchmark problems and eight real-world problems (with up to 2133 inputs) including new applications such as discovery of diverse metal organic frameworks for use in clean energy technology. We show that our approach greatly outperforms existing NS algorithms by finding substantially larger sets of diverse behaviors under limited sample budgets.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# TACT:情報抽出ツールによる複合集約推論の促進 TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools ( http://arxiv.org/abs/2406.03618v1 ) ライセンス: Link先を確認	Avi Caciularu, Alon Jacovi, Eyal Ben-David, Sasha Goldshtein, Tal Schuster, Jonathan Herzig, Gal Elidan, Amir Globerson,	(参考訳) 大規模言語モデル(LLM)は、テキスト間の情報の集約を必要とするクエリではよく機能しないことが多い。この設定をよりよく評価し、モデリング作業を容易にするために、複雑な命令を用いてLLMの推論と計算能力を評価するデータセットであるTACT - Text And Calculations through Tablesを紹介する。 TACTには、1つ以上のテキストに散在する情報をつなぎ合わせ、その情報に対して複雑な統合を行って回答を生成することを要求する、困難な命令が含まれている。既存のテキストと関連するテーブルのデータセットを活用することで、このデータセットを構築する。それぞれのテーブルに対して、新しいクエリを定式化し、それぞれの回答を収集する。このデータセットでは, 現代のLLMはいずれも性能が低く, 精度が38%未満であることを示す。困難の所在を特定し問題を詳細に分析するために,テーブル生成,Pandasコマンド生成,実行という3つのコンポーネントにわたってモデル性能を分析した。予想に反して、各コンポーネントが現在のLLMに対して重大な課題を提起していることが判明した。これらの知見は、「IE as a tool」と呼ぶ、焦点を絞ったモデリングフレームワークの提案につながる。具体的には、上記の各ステップに「ツール」を追加し、それぞれのツールをfew-shotプロンプティングで実装することを提案する。このアプローチは既存のプロンプト技術よりも改善され、これらのタスクにおけるモデル機能を強化するための有望な方向性を提供する。 Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts. To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables, a dataset crafted to evaluate LLMs' reasoning and computational abilities using complex instructions. TACT contains challenging instructions that demand stitching information scattered across one or more texts, and performing complex integration on this information to generate the answer. We construct this dataset by leveraging an existing dataset of texts and their associated tables. For each such table, we formulate new queries, and gather their respective answers. We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%. To pinpoint the difficulties and thoroughly dissect the problem, we analyze model performance across three components: table-generation, Pandas command-generation, and execution.
Unexpectedly, we discover that each component presents substantial challenges for current LLMs. These insights lead us to propose a focused modeling framework, which we refer to as IE as a tool. Specifically, we propose to add "tools" for each of the above steps, and implement each such tool with few-shot prompting. This approach shows an improvement over existing prompting techniques, offering a promising direction for enhancing model capabilities in these tasks.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# アフィン変換を超えた対称性の発見 Symmetry Discovery Beyond Affine Transformations ( http://arxiv.org/abs/2406.03619v1 ) ライセンス: Link先を確認	Ben Shaw, Abram Magner, Kevin R. Moon,	(参考訳) 対称性検出は様々な機械学習タスクを改善することが示されている。連続対称性検出の文脈では、現在の最先端の実験はアフィン変換の検出に限られる。多様体の仮定の下で、アフィン変換群を超えたデータの連続対称性を発見するための枠組みを概説する。また、離散対称性を発見するための同様の枠組みも提供する。提案手法をLieGANと呼ばれる既存手法と実験的に比較した結果, サンプルサイズが大きい場合にはアフィン対称性の検出で同等の性能を示し, サンプルサイズが小さい場合にはLieGANよりも優れていることがわかった。また,本手法はアフィン群以外の連続対称性の検出が可能であり,一般にLieGANよりも計算効率が高いことを示す。 Symmetry detection has been shown to improve various machine learning tasks. In the context of continuous symmetry detection, current state of the art experiments are limited to the detection of affine transformations. Under the manifold assumption, we outline a framework for discovering continuous symmetry in data beyond the affine transformation group. We also provide a similar framework for discovering discrete symmetry. We experimentally compare our method to an existing method known as LieGAN and show that our method is competitive at detecting affine symmetries for large sample sizes and superior to LieGAN for small sample sizes. We also show our method is able to detect continuous symmetries beyond the affine group and is generally more computationally efficient than LieGAN.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# 遅延アルゴリズムによるプライベートオンライン学習 Private Online Learning via Lazy Algorithms ( http://arxiv.org/abs/2406.03620v1 ) ライセンス: Link先を確認	Hilal Asi, Tomer Koren, Daogao Liu, Kunal Talwar,	(参考訳) 本稿では,プライベートなオンライン学習の問題,特に専門家によるオンライン予測(OPE)とオンライン凸最適化(OCO)について検討する。遅延オンライン学習アルゴリズムをプライベートアルゴリズムに変換する新しい変換を提案する。これらの問題に対する既存の遅延アルゴリズムを用いて,差分プライベートなOPEとOCOに本変換を適用した。最終的なアルゴリズムは、高プライバシー領域 $\varepsilon \ll 1$ においてリグレットを大幅に改善し、DP-OPEでは $\sqrt{T \log d} + T^{1/3} \log(d)/\varepsilon^{2/3}$、DP-OCOでは $\sqrt{T} + T^{1/3} \sqrt{d}/\varepsilon^{2/3}$ を達成する。また、DP-OPE の下界によって結果を補完し、これらのレートが、低スイッチングのプライベートアルゴリズムからなる自然なファミリーに対して最適であることを示す。 We study the problem of private online learning, specifically, online prediction from experts (OPE) and online convex optimization (OCO). We propose a new transformation that transforms lazy online learning algorithms into private algorithms. We apply our transformation for differentially private OPE and OCO using existing lazy algorithms for these problems. Our final algorithms obtain regret, which significantly improves the regret in the high privacy regime $\varepsilon \ll 1$, obtaining $\sqrt{T \log d} + T^{1/3} \log(d)/\varepsilon^{2/3}$ for DP-OPE and $\sqrt{T} + T^{1/3} \sqrt{d}/\varepsilon^{2/3}$ for DP-OCO. We also complement our results with a lower bound for DP-OPE, showing that these rates are optimal for a natural family of low-switching private algorithms.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
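As a small worked reading of the rates quoted in the abstract above, the following derivation shows when the privacy-dependent term stops dominating the regret. It treats the stated expressions as order-level bounds and ignores constants, which is an assumption; the abstract gives only the rates.

```latex
% Rates as quoted in the abstract (constants suppressed; O(.) is assumed):
\[
  R_{\mathrm{OPE}}(T) = O\!\left(\sqrt{T\log d} + \frac{T^{1/3}\log d}{\varepsilon^{2/3}}\right),
  \qquad
  R_{\mathrm{OCO}}(T) = O\!\left(\sqrt{T} + \frac{T^{1/3}\sqrt{d}}{\varepsilon^{2/3}}\right).
\]
% DP-OPE: the privacy term is at most the non-private term when
\[
  \frac{T^{1/3}\log d}{\varepsilon^{2/3}} \le \sqrt{T\log d}
  \iff \varepsilon^{2/3} \ge \frac{(\log d)^{1/2}}{T^{1/6}}
  \iff \varepsilon \ge \frac{(\log d)^{3/4}}{T^{1/4}} .
\]
% DP-OCO: the same comparison gives
\[
  \frac{T^{1/3}\sqrt{d}}{\varepsilon^{2/3}} \le \sqrt{T}
  \iff \varepsilon \ge \frac{d^{3/4}}{T^{1/4}} .
\]
```

Under this reading, privacy costs nothing at the level of rates once $\varepsilon$ exceeds a threshold that shrinks as $T^{-1/4}$. Below that threshold the $\varepsilon^{-2/3}$ term governs the regret.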
# マイクロキャビティ内の真空揺らぎによる座屈ハニカム格子のエンタングルメント・ハーベスティング Entanglement harvesting in buckled honeycomb lattices by vacuum fluctuations in a microcavity ( http://arxiv.org/abs/2406.03624v1 ) ライセンス: Link先を確認	Facundo Arreyes, Federico Escudero, Juan Sebastián Ardenghi, Alfredo Juan,	(参考訳) 平面マイクロキャビティ内に設置した2つの同一の座屈ハニカム格子間のエンタングルメント・ハーベスティングについて検討した。時間依存摂動理論を適用することにより、空洞場によって誘導される両層間の量子相関を求める。空洞場の初期状態として真空状態を考慮し, 時間発展した自由度をトレースアウトすることで, コンカレンス測度を用いてエンタングルメントの形成を解析した。コンカレンスは、層間光子プロパゲータを介して、交換される仮想光子と層の位置に依存することを示す。さらに、等エネルギー電子間のエンタングルメントの形成は、電子が互いに垂直な方向に運動する場合に増強される傾向にある。以上の結果から,座屈ハニカム構造と大きなスピン軌道相互作用がエンタングルメント・ハーベスティングに有利であることが示唆された。 We study the entanglement harvesting between two identical buckled honeycomb lattices placed inside a planar microcavity. By applying time dependent perturbation theory, we obtain quantum correlations between both layers induced by the cavity field. Considering the vacuum state as the initial state of the cavity field and tracing out the time-evolved degrees of freedom, we analyze the entanglement formation using the concurrence measure. We show that the concurrence depends on the virtual photon exchanged and the positions of the layer through the interlayer photon propagator. Furthermore, we find that the formation of entanglement between equal energy electrons tends to be enhanced when they move in perpendicular directions. Our results indicate that a buckled honeycomb structure and a large spin-orbit interaction favor the entanglement harvesting.	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# 自由度のデグリー:点軌道からのダイナミクスの推測 Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories ( http://arxiv.org/abs/2406.03625v1 ) ライセンス: Link先を確認	Yan Zhang, Sergey Prokudin, Marko Mihajlovic, Qianli Ma, Siyu Tang,	(参考訳) ジェネリック3Dシーンのダイナミクスを理解することは、コンピュータビジョンにおいて基本的に困難であり、シーン再構成、モーショントラッキング、アバター作成に関連する応用の強化に不可欠である。本研究では,3次元点の高密度な長距離運動を推定する問題として,この課題に対処する。点軌跡の集合を観察することにより、ニューラルネットワークによってパラメータ化された暗黙の運動場を学習し、データ駆動やシーン固有の先行情報に頼ることなく、同一領域内の新規点の動きを予測することを目指す。そこで本研究では, 標準フレームと個々の観測フレーム間の滑らかな変形場を学習する動的点場モデルを構築した。しかし、連続するフレーム間の時間的一貫性は無視され、フレーム単位のモデリングによって要求されるパラメータの数は、シーケンス長とともに線形に増加する。これらの欠点に対処するために、SIRENが提供する本質的な正規化を活用し、入力層を変更して時空間的に滑らかな運動場を生成する。さらに、運動場ヤコビ行列を分析し、点周辺の無限小領域における自由度(DOF)とネットワーク隠れ変数がモデルの表現力に影響を与える振る舞いが異なることを明らかにする。これにより、モデルコンパクト性を保ちながら、モデル表現能力を向上させることができる。さらに, 過度に適合するリスクを低減するために, 片方向の運動の滑らかさを仮定した正規化項を導入する。本実験は, 未知点軌道の予測におけるモデルの性能評価と, 誘導による時間メッシュアライメントへの応用について検討した。結果は、その優位性と有効性を示している。プロジェクトのコードとデータが公開されている。 \url{https://yz-cnsdqz.github.io/eigenmotion/DOMA/} Understanding the dynamics of generic 3D scenes is fundamentally challenging in computer vision, essential in enhancing applications related to scene reconstruction, motion tracking, and avatar creation. In this work, we address the task as the problem of inferring dense, long-range motion of 3D points. By observing a set of point trajectories, we aim to learn an implicit motion field parameterized by a neural network to predict the movement of novel points within the same domain, without relying on any data-driven or scene-specific priors. To achieve this, our approach builds upon the recently introduced dynamic point field model that learns smooth deformation fields between the canonical frame and individual observation frames. However, temporal consistency between consecutive frames is neglected, and the number of required parameters increases linearly with the sequence length due to per-frame modeling. 
To address these shortcomings, we exploit the intrinsic regularization provided by SIREN, and modify the input layer to produce a spatiotemporally smooth motion field. Additionally, we analyze the motion field Jacobian matrix, and discover that the motion degrees of freedom (DOFs) in an infinitesimal area around a point and the network hidden variables have different behaviors to affect the model's representational power. This enables us to improve the model representation capability while retaining the model compactness. Furthermore, to reduce the risk of overfitting, we introduce a regularization term based on the assumption of piece-wise motion smoothness. Our experiments assess the model's performance in predicting unseen point trajectories and its application in temporal mesh alignment with guidance. The results demonstrate its superiority and effectiveness. The code and data for the project are publicly available: \url{https://yz-cnsdqz.github.io/eigenmotion/DOMA/}	翻訳日:2024-06-07 19:04:59 公開日:2024-06-05
# 量子センシングにおける最適制御とガラス性 Optimal Control and Glassiness in Quantum Sensing ( http://arxiv.org/abs/2406.03627v1 ) ライセンス: Link先を確認	Christopher I. Timms, Michael H. Kolodrubetz,	(参考訳) 量子システムは、材料の走査型プローブ顕微鏡からバイオメディカルイメージングまで幅広い用途を持つ強力な検出器である。例えば、ダイヤモンド中の窒素空孔(NV)中心は、磁場、温度、または関連する信号を検知するための量子ビットとして操作することができる。パルスシーケンスを適切に設計することで、実験は環境ノイズからこの信号をフィルタリングし、単一のNV中心で非常に敏感な測定を可能にする。近年、パルスシーケンスの修正により感度を向上させるために最適な制御が用いられており、特に$\pi$パルスの配置が最適である。ここでは、$\pi$パルスを超えて、連続時間依存の制御フィールドの最適化について検討する。これらのプロトコルを最適化することの難しさは、古典的なフラストレーションスピン系における最小自由エネルギーを見つけるのが困難であることを示す。ほとんどの最適化は、Isingのスピングラスと同様、パワー法則として成長するセンシングプロトコルの自己相関を示すが、連続制御は対数成長が遅いことを示唆しており、より硬いハイゼンベルクのようなガラスの風景を示唆している。 Quantum systems are powerful detectors with wide-ranging applications from scanning probe microscopy of materials to biomedical imaging. Nitrogen vacancy (NV) centers in diamond, for instance, can be operated as qubits for sensing of magnetic field, temperature, or related signals. By well-designed application of pulse sequences, experiments can filter this signal from environmental noise, allowing extremely sensitive measurements with single NV centers. Recently, optimal control has been used to further improve sensitivity by modification of the pulse sequence, most notably by optimal placement of $\pi$ pulses. Here we consider extending beyond $\pi$ pulses, exploring optimization of a continuous, time-dependent control field. We show that the difficulty of optimizing these protocols can be mapped to the difficulty of finding minimum free energy in a classical frustrated spin system. While most optimizations we consider show autocorrelations of the sensing protocol that grow as a power law -- similar to an Ising spin glass -- the continuous control shows slower logarithmic growth, suggestive of a harder Heisenberg-like glassy landscape.	翻訳日:2024-06-07 18:55:13 公開日:2024-06-05
# 合成オーバーサンプリング: LLMによるデータ不均衡対策の理論と実践的アプローチ Synthetic Oversampling: Theory and A Practical Approach Using LLMs to Address Data Imbalance ( http://arxiv.org/abs/2406.03628v1 ) ライセンス: Link先を確認	Ryumei Nakada, Yichen Xu, Lexin Li, Linjun Zhang,	(参考訳) 不均衡なデータと疑似相関は、機械学習とデータサイエンスにおける一般的な課題である。過小表現されているクラスのインスタンス数を人工的に増加させるオーバーサンプリングは、これらの課題に対処するために広く採用されている。本稿では,OPAL (OversamPling with Artificial LLM-generated data) を導入する。これは、大規模言語モデル(LLM)の能力を活用して少数派グループ向けの高品質な合成データを生成する、体系的なオーバーサンプリング手法である。深層生成モデルを用いた合成データ生成に関する最近の研究は、主に予測タスクを対象としている。我々の提案は、不均衡なデータと疑似相関を扱うことに重点を置いているという点で異なる。より重要なことは、合成データを使用することの利点を厳密に特徴づけ、ラベルと共変量の両方で高品質な合成データを生成するトランスフォーマーの能力を示す新しい理論を開発することである。さらに,いくつかの代表的な代替手法と比較して提案手法の有効性を示すために,集中的な数値実験を行った。 Imbalanced data and spurious correlations are common challenges in machine learning and data science. Oversampling, which artificially increases the number of instances in the underrepresented classes, has been widely adopted to tackle these challenges. In this article, we introduce OPAL (OversamPling with Artificial LLM-generated data), a systematic oversampling approach that leverages the capabilities of large language models (LLMs) to generate high-quality synthetic data for minority groups. Recent studies on synthetic data generation using deep generative models mostly target prediction tasks. Our proposal differs in that we focus on handling imbalanced data and spurious correlations. More importantly, we develop a novel theory that rigorously characterizes the benefits of using the synthetic data, and shows the capacity of transformers in generating high-quality synthetic data for both labels and covariates. We further conduct intensive numerical experiments to demonstrate the efficacy of our proposed approach compared to some representative alternative solutions.	翻訳日:2024-06-07 18:55:13 公開日:2024-06-05
# 6GのためのアクティブML:効率的なデータ生成、取得、アノテーションを目指して Active ML for 6G: Towards Efficient Data Generation, Acquisition, and Annotation ( http://arxiv.org/abs/2406.03630v1 ) ライセンス: Link先を確認	Omar Alhussein, Ning Zhang, Sami Muhaidat, Weihua Zhuang,	(参考訳) 本稿では6Gネットワークにおけるアクティブ機械学習(ML)の統合について検討する。受動的MLシステムとは異なり、アクティブMLはネットワーク環境と相互作用する。これにより、学習過程を加速しながら、必要なデータ量を減らし、情報や代表データポイントを積極的に選択する。アクティブラーニング研究は主にデータアノテーションに焦点を当てているが、我々は、アノテーション(ラベルとは何か)とデータ取得(収集するサンプルの数)の両方を考慮する、ネットワーク中心のアクティブラーニングフレームワークを求めている。さらに,生成型人工知能(AI)とアクティブラーニングの相乗効果について検討し,アクティブラーニングと生成型AIの両方の既存の限界を克服する。また、6Gネットワークにおけるアクティブラーニングの実践的メリットと性能向上を示すために、mmWaveスループット予測問題に関するケーススタディを取り上げている。さらに,アクティブラーニングの意義を,多数の6Gネットワーク利用事例に拡張する方法について論じる。我々は,能動学習に基づく6Gネットワークが,計算効率,データアノテーション,取得効率,適応性,ネットワークインテリジェンス全般を向上させる可能性を強調した。 6Gネットワークにおけるアクティブラーニングの課題と今後の研究方向性について,新たなクエリ戦略の開発,分散ラーニング統合,ヒューマン・イン・ザ・ループラーニングの導入などについて論じる。 This paper explores the integration of active machine learning (ML) for 6G networks, an area that remains under-explored yet holds potential. Unlike passive ML systems, active ML can be made to interact with the network environment. It actively selects informative and representative data points for training, thereby reducing the volume of data needed while accelerating the learning process. While active learning research mainly focuses on data annotation, we call for a network-centric active learning framework that considers both annotation (i.e., what is the label) and data acquisition (i.e., which and how many samples to collect). Moreover, we explore the synergy between generative artificial intelligence (AI) and active learning to overcome existing limitations in both active learning and generative AI. This paper also features a case study on a mmWave throughput prediction problem to demonstrate the practical benefits and improved performance of active learning for 6G networks. Furthermore, we discuss how the implications of active learning extend to numerous 6G network use cases. 
We highlight the potential of active learning based 6G networks to enhance computational efficiency, data annotation and acquisition efficiency, adaptability, and overall network intelligence. We conclude with a discussion on challenges and future research directions for active learning in 6G networks, including development of novel query strategies, distributed learning integration, and inclusion of human- and machine-in-the-loop learning.	翻訳日:2024-06-07 18:55:13 公開日:2024-06-05
# 潜在空間におけるバイアスの発見:教師なしデバイアス手法 Discovering Bias in Latent Space: An Unsupervised Debiasing Approach ( http://arxiv.org/abs/2406.03631v1 ) ライセンス: Link先を確認	Dyah Adila, Shuai Zhang, Boran Han, Yuyang Wang,	(参考訳) 基礎モデルの質問応答(QA)能力は、プロンプトの変化に非常に敏感であり、その性能は表面的で意味を変えない変更に影響を受けやすい。この脆弱性は、選択肢の位置やマルチモーダル設定における表面的な画像特徴など、特定の入力特性に対するモデルの選好やバイアスから生じることが多い。モデルの内部表現において、このバイアスを直接修正することを提案する。我々のアプローチであるSteerFairは、モデルの表現空間におけるバイアス方向を見つけ、推論中にアクティベーション値をその方向から遠ざける。具体的には、バイアスが、最初の選択肢と正解である可能性との疑似的な関連のような、単純な関連規則に従うことが多いという観察を利用する。次に、ラベルのないサンプルからこれらの規則のデモンストレーションを構築し、それを用いてバイアス方向を特定する。我々は,SteerFairが3つのベンチマークタスクにおいて,プロンプトの変更に対する命令調整済みモデルの性能のばらつきを著しく低減できることを実証的に示す。注目すべきは、100個のラベルを用いた教師付きベースラインを平均10.86%の精度ポイントと12.95のスコアポイントで上回り、500個のラベルを用いた場合の性能に匹敵することである。 The question-answering (QA) capabilities of foundation models are highly sensitive to prompt variations, rendering their performance susceptible to superficial, non-meaning-altering changes. This vulnerability often stems from the model's preference or bias towards specific input characteristics, such as option position or superficial image features in multi-modal settings. We propose to rectify this bias directly in the model's internal representation. Our approach, SteerFair, finds the bias direction in the model's representation space and steers activation values away from it during inference. Specifically, we exploit the observation that bias often adheres to simple association rules, such as the spurious association between the first option and correctness likelihood. Next, we construct demonstrations of these rules from unlabeled samples and use them to identify the bias directions. We empirically show that SteerFair significantly reduces instruction-tuned model performance variance across prompt modifications on three benchmark tasks. Remarkably, our approach surpasses a supervised baseline with 100 labels by an average of 10.86% accuracy points and 12.95 score points and matches the performance with 500 labels.	翻訳日:2024-06-07 18:55:13 公開日:2024-06-05
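To make the idea of "finding a bias direction and steering activations away from it" concrete, here is a minimal sketch in plain Python. It is not SteerFair's actual procedure. The mean-difference estimate of the direction, the function names, and the toy activation vectors are all illustrative assumptions, and the abstract does not say how the direction is computed.

```python
import math


def unit(vector):
    """Return the vector scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return [x / norm for x in vector]


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def bias_direction(biased_acts, neutral_acts):
    """Hypothetical estimate: normalized difference between the mean activation
    of demonstrations that follow a bias rule and of those that do not."""
    mean_biased = mean_vector(biased_acts)
    mean_neutral = mean_vector(neutral_acts)
    return unit([b - n for b, n in zip(mean_biased, mean_neutral)])


def steer_away(activation, direction, strength=1.0):
    """Subtract (a fraction of) the activation's component along the direction."""
    projection = sum(a * d for a, d in zip(activation, direction))
    return [a - strength * projection * d for a, d in zip(activation, direction)]


# Toy activations (made up): the first coordinate carries the "bias".
biased = [[2.0, 1.0, 0.0], [2.2, 0.9, 0.1]]
neutral = [[0.0, 1.0, 0.0], [0.2, 0.9, 0.1]]
direction = bias_direction(biased, neutral)
steered = steer_away([1.5, 1.0, 0.5], direction)
```

With `strength=1.0` the steered activation has no remaining component along the estimated direction. A smaller value removes only part of it, which is one possible way to trade debiasing against preserving the original representation.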
# 超低リソースプログラミング言語におけるText-to-Codeのための合成プログラミング誘出と修復 Synthetic Programming Elicitation and Repair for Text-to-Code in Very Low-Resource Programming Languages ( http://arxiv.org/abs/2406.03636v1 ) ライセンス: Link先を確認	Federico Mora, Justin Wong, Haley Lepe, Sahil Bhatia, Karim Elmaaroufi, George Varghese, Joseph E. Gonzalez, Elizabeth Polgreen, Sanjit A. Seshia,	(参考訳) コード向けの大規模言語モデル(LLM)の最近の進歩は、テストケース生成から自己修復まで、難しいコード関連タスクにおいて、優れたゼロショットの流暢さと指示追従能力を示している。しかし、当然のことながら、モデルは、事前学習で表現されていないプログラミング言語、すなわち超低リソースプログラミング言語 (VLPL) において、構文的に有効なプログラムを構成するのに苦労している。 VLPLは、内部ツールやツールチェーン向けのドメイン固有言語、レガシ言語など、重要な場面に現れる。自然なプログラム誘出(natural program elicitation)と呼ばれるHCIの手法に着想を得て,LLMが「自然に」使い方を知っており,対象のVLPLに自動的にコンパイル可能な中間言語を設計することを提案する。具体的には,LLMがVLPLに対しても構文的に有効なコードを生成できるようにする手法である synthetic programming elicitation and compilation (SPEAK) を導入する。ケーススタディにおいてSPEAKの性能を実証的に評価し,既存の検索ベースおよび微調整ベースのベースラインと比較して,意味的正しさを犠牲にすることなく,構文的に正しいプログラムをより頻繁に生成することを発見した。 Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code related tasks ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource Programming Languages (VLPLs). VLPLs appear in crucial settings including domain-specific languages for internal to tools and tool-chains and legacy languages. Inspired by an HCI technique called natural program elicitation, we propose designing an intermediate language that LLMs ``naturally'' know how to use and which can be automatically compiled to the target VLPL. Specifically, we introduce synthetic programming elicitation and compilation (SPEAK), an approach that enables LLMs to generate syntactically valid code even for VLPLs.
We empirically evaluate the performance of SPEAK in a case study and find that, compared to existing retrieval and fine-tuning baselines, SPEAK produces syntactically correct programs more frequently without sacrificing semantic correctness.	翻訳日:2024-06-07 18:55:13 公開日:2024-06-05
# 表現力豊かなテキスト音声合成のためのスタイルMixture of Experts Style Mixture of Experts for Expressive Text-To-Speech Synthesis ( http://arxiv.org/abs/2406.03637v1 ) ライセンス: Link先を確認	Ahad Jawaid, Shreeram Suresh Chandra, Junchen Lu, Berrak Sisman,	(参考訳) 近年,スタイル転送テキスト音声合成(TTS)の進歩により,合成音声の表現性が向上した。これらの進歩にもかかわらず、多様で未知の参照音声からスタイル情報を符号化することは依然として困難である。本稿では、スタイルエンコーダによってモデル化された埋め込み空間を、スタイルエキスパートによって処理される扱いやすいサブセットに分割するアプローチであるStyleMoEを紹介する。提案手法は,TTSシステムのスタイルエンコーダをMixture of Experts (MoE)層に置き換える。ゲーティングネットワークを利用して、異なるスタイルエキスパートに参照音声をルーティングすることで、各エキスパートは最適化中にスタイル空間の特定の側面に特化する。実験では,多様かつ未知のスタイルに対するスタイル空間のカバー範囲を拡大するうえでの提案手法の有効性を,客観的かつ主観的に実証する。このアプローチは、既存の最先端スタイル転送TTSモデルの性能を向上させることが可能であり、我々の知る限り、スタイル転送TTSにおけるMoEの最初の研究である。 Recent advances in style transfer text-to-speech (TTS) have improved the expressiveness of synthesized speech. Despite these advancements, encoding stylistic information from diverse and unseen reference speech remains challenging. This paper introduces StyleMoE, an approach that divides the embedding space, modeled by the style encoder, into tractable subsets handled by style experts. The proposed method replaces the style encoder in a TTS system with a Mixture of Experts (MoE) layer. By utilizing a gating network to route reference speeches to different style experts, each expert specializes in aspects of the style space during optimization. Our experiments objectively and subjectively demonstrate the effectiveness of our proposed method in increasing the coverage of the style space for diverse and unseen styles. This approach can enhance the performance of existing state-of-the-art style transfer TTS models, marking the first study of MoE in style transfer TTS to our knowledge.	翻訳日:2024-06-07 18:55:13 公開日:2024-06-05
# 実環境での実行のためのタスク・アンド・モーション・プランニング Task and Motion Planning for Execution in the Real ( http://arxiv.org/abs/2406.03641v1 ) ライセンス: Link先を確認	Tianyang Pan, Rahul Shome, Lydia E. Kavraki,	(参考訳) タスク・アンド・モーション・プランニングは、離散的なタスク・ドメインに対する推論と連続的なモーション生成を組み合わせた強力なハイブリッド・プランニング手法である。従来の推論では、タスクドメインモデルと、アクションを動作計画クエリにグラウンディングするのに十分な情報が必要である。この知識のギャップは、オクルージョンや不正確なモデリングなどから生じることが多い。本研究は、計画時には完全にグラウンディングできないアクションを含むタスク・動作計画を生成する。実行中、そのようなアクションは、提供された人間設計または学習されたクローズドループの振る舞いによって処理される。実行は、タスク目標に到達するまでオフラインで計画された動きとオンラインの振る舞いを組み合わせる。振る舞いの失敗は、新しい計画を見つけるための制約としてフィードバックされる。提案したフレームワークを評価し,最先端技術と比較するために,40回の実ロボット試験と動機付けとなる実演を実施した。その結果、さまざまなギャップが生じる問題において、実行時間が短縮され、アクション数が減り、成功率が向上した。実験データは、研究者がこれらの設定をシミュレートできるよう共有される。この研究は、ロボットが対処できる、現実的で部分的にグラウンディングされた問題のクラスを拡大できる見込みを示している。 Task and motion planning represents a powerful set of hybrid planning methods that combine reasoning over discrete task domains and continuous motion generation. Traditional reasoning necessitates task domain models and enough information to ground actions to motion planning queries. Gaps in this knowledge often arise from sources like occlusion or imprecise modeling. This work generates task and motion plans that include actions that cannot be fully grounded at planning time. During execution, such an action is handled by a provided human-designed or learned closed-loop behavior. Execution combines offline planned motions and online behaviors till reaching the task goal. Failures of behaviors are fed back as constraints to find new plans. Forty real-robot trials and motivating demonstrations are performed to evaluate the proposed framework and compare against state-of-the-art. Results show faster execution time, fewer actions, and more success in problems where diverse gaps arise. The experiment data is shared for researchers to simulate these settings. The work shows promise in expanding the applicable class of realistic partially grounded problems that robots can address.	翻訳日:2024-06-07 18:55:13 公開日:2024-06-05
# コストのかからない自己アライメントは可能か? Is Free Self-Alignment Possible? ( http://arxiv.org/abs/2406.03642v1 ) ライセンス: Link先を確認	Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala,	(参考訳) 事前訓練された言語モデル (LM) のアライメントは複雑で資源集約的なプロセスであり、多くの場合、大量の正解(ground-truth)選好データと相当量の計算資源を必要とする。これらのコストは必要だろうか? つまり、モデル固有の知識のみを使用して、追加のトレーニングなしでアライメントすることは可能だろうか? 我々は、(1)自己生成の選好データと(2)表現の編集を利用して、ほぼコストのかからないアライメントを提供する新しい手法であるAlignEZでこの課題に取り組む。推論中、AlignEZは、自己生成された選好ペアによって特定される部分空間を用いてLM表現を変更し、望ましくない成分を減らし、望ましい成分を強める。実験の結果、このほぼ無償の手順は、ベースの事前学習済みモデルとチューニング済みモデルの間のギャップを、6つのデータセットと3つのモデルアーキテクチャにわたって平均31.6%縮小することがわかった。さらに、より高価なアライメント手順を高速化する手段としてAlignEZを使用する可能性についても検討する。実験の結果、AlignEZ は、正解選好データの小さなサブセットのみを用いて調整された DPO モデルを改善することがわかった。最後に,AlignEZによる改善が実現可能な条件について検討し,その有効性について貴重な知見を提供する。 Aligning pretrained language models (LMs) is a complex and resource-intensive process, often requiring access to large amounts of ground-truth preference data and substantial compute. Are these costs necessary? That is, is it possible to align using only inherent model knowledge and without additional training? We tackle this challenge with AlignEZ, a novel approach that uses (1) self-generated preference data and (2) representation editing to provide nearly cost-free alignment. During inference, AlignEZ modifies LM representations to reduce undesirable and boost desirable components using subspaces identified via self-generated preference pairs. Our experiments reveal that this nearly cost-free procedure significantly narrows the gap between base pretrained and tuned models by an average of 31.6%, observed across six datasets and three model architectures. Additionally, we explore the potential of using AlignEZ as a means of expediting more expensive alignment procedures. Our experiments show that AlignEZ improves DPO models tuned only using a small subset of ground-truth preference data. Lastly, we study the conditions under which improvement using AlignEZ is feasible, providing valuable insights into its effectiveness.	翻訳日:2024-06-07 18:55:13 公開日:2024-06-05
# 氷チャートに基づく海氷分類のための焦点損失(Focal Loss)を用いた部分ラベル学習 Partial Label Learning with Focal Loss for Sea Ice Classification Based on Ice Charts ( http://arxiv.org/abs/2406.03645v1 ) ライセンス: Link先を確認	Behzad Vahedi, Benjamin Lucas, Farnoush Banaei-Kashani, Andrew P. Barrett, Walter N. Meier, Siri Jodha Khalsa, Morteza Karimzadeh,	(参考訳) 北極と地球の気候にとって重要な海氷は、一貫した監視と高解像度のマッピングを必要とする。しかし、手動の海氷マッピングは時間がかかり主観的であるため、深層学習に基づく自動分類アプローチが求められている。しかし、訓練データとして一般的に使用される専門家作成の氷チャートは、単一の氷種ではなく複数の氷種を含むポリゴンをマッピングしているため、これらのアルゴリズムの訓練は困難である。さらに、これらのチャートにおける様々な氷種の分布はしばしば不均衡であり、支配的なクラスに偏った性能をもたらす。本稿では,複数ラベルとクラス不均衡に対処するために,明示的な信頼度スコアを伴う部分ラベル学習タスクとして定式化することで,海氷分類を訓練する新しいGeoAI手法を提案する。我々は、ポリゴンレベルのラベルを候補部分ラベルとして扱い、対応する氷濃度を各候補ラベルの信頼度スコアとして割り当て、焦点損失と統合して畳み込みニューラルネットワーク(CNN)を訓練する。提案手法により, one-hotエンコードされたラベルとカテゴリカル交差エントロピー損失を用いる従来の訓練手法と比較して,Sentinel-1の二重偏波SAR画像における海氷分類性能が向上し, 分類精度が87%から92%に, 加重平均F-1スコアが90%から93%に向上した。また6つの海氷クラスのうち4つでF-1スコアが改善されている。 Sea ice, crucial to the Arctic and Earth's climate, requires consistent monitoring and high-resolution mapping. Manual sea ice mapping, however, is time-consuming and subjective, prompting the need for automated deep learning-based classification approaches. However, training these algorithms is challenging because expert-generated ice charts, commonly used as training data, do not map single ice types but instead map polygons with multiple ice types. Moreover, the distribution of various ice types in these charts is frequently imbalanced, resulting in a performance bias towards the dominant class. In this paper, we present a novel GeoAI approach to training sea ice classification by formalizing it as a partial label learning task with explicit confidence scores to address multiple labels and class imbalance. We treat the polygon-level labels as candidate partial labels, assign the corresponding ice concentrations as confidence scores to each candidate label, and integrate them with focal loss to train a Convolutional Neural Network (CNN).
Our proposed approach leads to enhanced performance for sea ice classification in Sentinel-1 dual-polarized SAR images, improving classification accuracy (from 87% to 92%) and weighted average F-1 score (from 90% to 93%) compared to the conventional training approach of using one-hot encoded labels and Categorical Cross-Entropy loss. It also improves the F-1 score in 4 out of the 6 sea ice classes.	翻訳日:2024-06-07 18:55:13 公開日:2024-06-05
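The combination of candidate labels, confidence scores, and focal loss described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the per-pixel formulation, and the choice to sum the focal term over candidates weighted by confidence are all assumptions made for illustration.

```python
import math

def confidence_weighted_focal_loss(probs, confidences, gamma=2.0):
    """Focal loss summed over candidate labels, weighted by confidence.

    probs       -- predicted class probabilities for one pixel (sum to 1)
    confidences -- per-class confidence scores, e.g. ice concentrations
                   (0 for classes that are not candidates); hypothetical input
    gamma       -- focusing parameter; gamma = 0 gives weighted cross-entropy
    """
    eps = 1e-12  # guards against log(0)
    loss = 0.0
    for p, w in zip(probs, confidences):
        if w > 0.0:
            # (1 - p)^gamma down-weights classes the model already predicts well
            loss += -w * (1.0 - p) ** gamma * math.log(max(p, eps))
    return loss
```

With a one-hot confidence vector and `gamma=0`, this reduces to ordinary categorical cross-entropy, which is the conventional baseline the abstract compares against.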
# Decision-focused Graph Neural Networks for Combinatorial Optimization ( http://arxiv.org/abs/2406.03647v1 ) License: see linked page	Yang Liu, Chuan Zhou, Peng Zhang, Shirui Pan, Zhao Li, Hongyang Chen,	In recent years, there has been notable interest in investigating combinatorial optimization (CO) problems with neural-based frameworks. An emerging strategy to tackle these challenging problems involves the adoption of graph neural networks (GNNs) as an alternative to traditional algorithms, a subject that has attracted considerable attention. Despite the growing popularity of GNNs and traditional algorithm solvers in the realm of CO, there is limited research on their integrated use and the correlation between them within an end-to-end framework. The primary focus of our work is to formulate a more efficient and precise framework for CO by employing decision-focused learning on graphs. Additionally, we introduce a decision-focused framework that utilizes GNNs to address CO problems with auxiliary support. To realize an end-to-end approach, we have designed two cascaded modules: (a) an unsupervised trained graph predictive model, and (b) a solver for quadratic binary unconstrained optimization. Empirical evaluations are conducted on various classical tasks, including maximum cut, maximum independent set, and minimum vertex cover. The experimental results on classical CO problems (i.e., MaxCut, MIS, and MVC) demonstrate the superiority of our method over both the standalone GNN approach and classical methods.	Translated: 2024-06-07 18:55:13 Published: 2024-06-05
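To make the quadratic binary formulation mentioned in the abstract above concrete, here is a small sketch of MaxCut written as a quadratic function of binary variables and solved by brute force. It only illustrates the objective; it is not the paper's solver, and the helper names and example graphs are hypothetical.

```python
from itertools import product

def cut_value(edges, x):
    """Quadratic binary objective for MaxCut.

    For binary x_i, x_j the term x_i + x_j - 2*x_i*x_j equals 1 exactly
    when edge (i, j) crosses the partition, and 0 otherwise.
    """
    return sum(x[i] + x[j] - 2 * x[i] * x[j] for i, j in edges)

def brute_force_maxcut(n, edges):
    """Enumerate all 2^n assignments; only feasible for very small graphs."""
    best = max(product((0, 1), repeat=n), key=lambda x: cut_value(edges, x))
    return cut_value(edges, best), best

# Hypothetical toy graphs: a triangle and a 4-cycle.
triangle = [(0, 1), (1, 2), (0, 2)]
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

A learned model would replace the enumeration with predicted (relaxed) assignments; the exhaustive search here only serves as a ground-truth check on tiny instances.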
# Inductive Generalization in Reinforcement Learning from Specifications ( http://arxiv.org/abs/2406.03651v1 ) License: see linked page	Vignesh Subramanian, Rohit Kushwah, Subhajit Roy, Suguman Bansal,	We present a novel inductive generalization framework for RL from logical specifications. Many interesting tasks in RL environments have a natural inductive structure. These inductive tasks have similar overarching goals but they differ inductively in low-level predicates and distributions. We present a generalization procedure that leverages this inductive relationship to learn a higher-order function, a policy generator, that generates appropriately adapted policies for instances of an inductive task in a zero-shot manner. An evaluation of the proposed approach on a set of challenging control benchmarks demonstrates the promise of our framework in generalizing to unseen policies for long-horizon tasks.	Translated: 2024-06-07 18:55:13 Published: 2024-06-05
# Ensembling Portfolio Strategies for Long-Term Investments: A Distribution-Free Preference Framework for Decision-Making and Algorithms ( http://arxiv.org/abs/2406.03652v1 ) License: see linked page	Duy Khanh Lam,	This paper investigates the problem of ensembling multiple strategies for sequential portfolios to outperform individual strategies in terms of long-term wealth. Due to the uncertainty of strategies' performances in the future market, which are often based on specific models and statistical assumptions, investors often mitigate risk and enhance robustness by combining multiple strategies, akin to common approaches in collective learning prediction. However, the absence of a distribution-free and consistent preference framework complicates decisions of combination due to the ambiguous objective. To address this gap, we introduce a novel framework for decision-making in combining strategies, irrespective of market conditions, by establishing the investor's preference between decisions and then forming a clear objective. Through this framework, we propose a combinatorial strategy construction, free from statistical assumptions, for any scale of component strategies, even infinite, such that it meets the determined criterion. Finally, we test the proposed strategy along with its accelerated variant and some other multi-strategies. The numerical experiments show results in favor of the proposed strategies, albeit with small tradeoffs in their Sharpe ratios: their cumulative wealths eventually exceed those of the best component strategies, while the accelerated strategy significantly improves performance.	Translated: 2024-06-07 18:55:13 Published: 2024-06-05
# Equivalence Set Restricted Latent Class Models (ESRLCM) ( http://arxiv.org/abs/2406.03653v1 ) License: see linked page	Jesse Bowers, Steve Culpepper,	Latent Class Models (LCMs) are used to cluster multivariate categorical data and are commonly used to interpret survey responses. We propose a novel Bayesian model called the Equivalence Set Restricted Latent Class Model (ESRLCM). This model identifies clusters that have common item response probabilities, and does so more generically than traditional restricted latent attribute models. We verify the identifiability of ESRLCMs and demonstrate their effectiveness in both simulations and real-world applications.	Translated: 2024-06-07 18:55:13 Published: 2024-06-05
# Contextual Hourglass Network for Semantic Segmentation of High Resolution Aerial Imagery ( http://arxiv.org/abs/1810.12813v4 ) License: see linked page	Panfeng Li, Youzuo Lin, Emily Schultz-Fellenz,	Semantic segmentation for aerial imagery is a challenging and important problem in remotely sensed imagery analysis. In recent years, with the success of deep learning, various convolutional neural network (CNN) based models have been developed. However, due to the varying sizes of the objects and imbalanced class labels, it can be challenging to obtain accurate pixel-wise semantic segmentation results. To address those challenges, we develop a novel semantic segmentation method and call it Contextual Hourglass Network. In our method, in order to improve the robustness of the prediction, we design a new contextual hourglass module which incorporates an attention mechanism on processed low-resolution feature maps to exploit the contextual semantics. We further exploit the stacked encoder-decoder structure by connecting multiple contextual hourglass modules from end to end. This architecture can effectively extract rich multi-scale features and add more feedback loops for better learning contextual semantics through intermediate supervision. To demonstrate the efficacy of our semantic segmentation method, we test it on the Potsdam and Vaihingen datasets. In comparisons with other baseline methods, our method yields the best results on overall performance.	Translated: 2024-06-07 05:08:04 Published: 2024-06-05
# Neural Topic Models with Survival Supervision: Jointly Predicting Time-to-Event Outcomes and Learning How Clinical Features Relate ( http://arxiv.org/abs/2007.07796v2 ) License: see linked page	George H. Chen, Linhong Li, Ren Zuo, Amanda Coston, Jeremy C. Weiss,	We present a neural network framework for learning a survival model to predict a time-to-event outcome while simultaneously learning a topic model that reveals feature relationships. In particular, we model each subject as a distribution over "topics", where a topic could, for instance, correspond to an age group, a disorder, or a disease. The presence of a topic in a subject means that specific clinical features are more likely to appear for the subject. Topics encode information about related features and are learned in a supervised manner to predict a time-to-event outcome. Our framework supports combining many different topic and survival models; training the resulting joint survival-topic model readily scales to large datasets using standard neural net optimizers with minibatch gradient descent. For example, a special case is to combine LDA with a Cox model, in which case a subject's distribution over topics serves as the input feature vector to the Cox model. We explain how to address practical implementation issues that arise when applying these neural survival-supervised topic models to clinical data, including how to visualize results to assist clinical interpretation. We study the effectiveness of our proposed framework on seven clinical datasets on predicting time until death as well as hospital ICU length of stay, where we find that neural survival-supervised topic models achieve competitive accuracy with existing approaches while yielding interpretable clinical topics that explain feature relationships. Our code is available at: https://github.com/georgehc/survival-topics	Translated: 2024-06-07 05:08:03 Published: 2024-06-05
# Almost exact recovery in noisy semi-supervised learning ( http://arxiv.org/abs/2007.14717v4 ) License: see linked page	Konstantin Avrachenkov, Maximilien Dreveton,	Graph-based semi-supervised learning methods combine the graph structure and labeled data to classify unlabeled data. In this work, we study the effect of a noisy oracle on classification. In particular, we derive the Maximum A Posteriori (MAP) estimator for clustering a Degree Corrected Stochastic Block Model (DC-SBM) when a noisy oracle reveals a fraction of the labels. We then propose an algorithm derived from a continuous relaxation of the MAP, and we establish its consistency. Numerical experiments show that our approach achieves promising performance on synthetic and real data sets, even in the case of very noisy labeled data.	Translated: 2024-06-07 05:08:03 Published: 2024-06-05
# Solution Concepts in Hierarchical Games under Bounded Rationality with Applications to Autonomous Driving ( http://arxiv.org/abs/2009.10033v5 ) License: see linked page	Atrisha Sarkar, Krzysztof Czarnecki,	With autonomous vehicles (AV) set to integrate further into regular human traffic, there is an increasing consensus on treating AV motion planning as a multi-agent problem. However, the traditional game-theoretic assumption of complete rationality is too strong for human driving, and there is a need for understanding human driving as a bounded rational activity through a behavioural game-theoretic lens. To that end, we adapt four metamodels of bounded rational behaviour: three based on Quantal level-k and one based on Nash equilibrium with quantal errors. We formalize the different solution concepts that can be applied in the context of hierarchical games, a framework used in multi-agent motion planning, for the purpose of creating game theoretic models of driving behaviour. Furthermore, based on a contributed dataset of human driving at a busy urban intersection with a total of approximately 4k agents and 44k decision points, we evaluate the behaviour models on the basis of model fit to naturalistic data, as well as their predictive capacity. Our results suggest that among the behaviour models evaluated, at the level of maneuvers, modeling driving behaviour as an adaptation of the Quantal level-k model with level-0 behaviour modelled as pure rule-following provides the best fit to naturalistic driving behaviour. At the level of trajectories, bounds sampling of actions together with a maxmax non-strategic model is the most accurate within the set of models compared. We also find a significant impact of situational factors on the performance of behaviour models.	Translated: 2024-06-07 05:08:03 Published: 2024-06-05
# DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R ( http://arxiv.org/abs/2103.09603v6 ) License: see linked page	Philipp Bach, Victor Chernozhukov, Malte S. Kurz, Martin Spindler, Sven Klaassen,	The R package DoubleML implements the double/debiased machine learning framework of Chernozhukov et al. (2018). It provides functionalities to estimate parameters in causal models based on machine learning methods. The double machine learning framework consists of three key ingredients: Neyman orthogonality, high-quality machine learning estimation and sample splitting. Estimation of nuisance components can be performed by various state-of-the-art machine learning methods that are available in the mlr3 ecosystem. DoubleML makes it possible to perform inference in a variety of causal models, including partially linear and interactive regression models and their extensions to instrumental variable estimation. The object-oriented implementation of DoubleML enables a high flexibility for the model specification and makes it easily extendable. This paper serves as an introduction to the double machine learning framework and the R package DoubleML. In reproducible code examples with simulated and real data sets, we demonstrate how DoubleML users can perform valid inference based on machine learning methods.	Translated: 2024-06-07 05:08:03 Published: 2024-06-05
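As a brief illustration of the Neyman orthogonality ingredient named in the abstract above, the partially linear regression model and its commonly used partialling-out score can be written as follows. This is the standard textbook form from the double machine learning literature, stated here as background; the abstract itself does not spell it out, and the symbols are the usual conventions rather than notation taken from this listing.

```latex
% Partially linear regression model (outcome Y, treatment D, controls X)
Y = D\,\theta_0 + g_0(X) + \zeta, \qquad \mathbb{E}[\zeta \mid D, X] = 0,
\\
D = m_0(X) + V, \qquad \mathbb{E}[V \mid X] = 0.

% Partialling-out score with nuisance functions \eta = (\ell, m),
% where \ell_0(X) = \mathbb{E}[Y \mid X] and m_0(X) = \mathbb{E}[D \mid X]
\psi(W; \theta, \eta) = \bigl(Y - \ell(X) - \theta\,(D - m(X))\bigr)\,\bigl(D - m(X)\bigr).
```

The nuisance functions are estimated with machine learning methods on held-out folds (sample splitting), and the target parameter is obtained by solving the empirical moment condition that sets the average score to zero.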
# Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning ( http://arxiv.org/abs/2106.00273v4 ) License: see linked page	Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-yi Lee,	Previous works have shown that automatic speaker verification (ASV) is seriously vulnerable to malicious spoofing attacks, such as replay, synthetic speech, and recently emerged adversarial attacks. Great efforts have been dedicated to defending ASV against replay and synthetic speech; however, only a few approaches have been explored to deal with adversarial attacks. All the existing approaches to tackle adversarial attacks for ASV require knowledge of adversarial sample generation, but it is impractical for defenders to know the exact attack algorithms that are applied by in-the-wild attackers. This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms. Inspired by self-supervised learning models (SSLMs) that possess the merits of alleviating the superficial noise in the inputs and reconstructing clean samples from the interrupted ones, this work regards adversarial perturbations as one kind of noise and conducts adversarial defense for ASV by SSLMs. Specifically, we propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection. Experimental results show that our detection module effectively shields the ASV by detecting adversarial samples with an accuracy of around 80%. Moreover, since there is no common metric for evaluating the adversarial defense performance for ASV, this work also formalizes evaluation metrics for adversarial defense, taking both purification-based and detection-based approaches into account. We sincerely encourage future works to benchmark their approaches based on the proposed evaluation framework.	Translated: 2024-06-07 05:08:03 Published: 2024-06-05
# Generalized "Square roots of Not" matrices, their application to the unveiling of hidden logical operators and to the definition of fully matrix circular Euler functions ( http://arxiv.org/abs/2107.06067v4 ) License: see linked page	Eduardo Mizraji,	The square root of Not is a logical operator of importance in quantum computing theory and of interest as a mathematical object in its own right. In physics, it is a square complex matrix of dimension 2. In the present work it is a complex square matrix of arbitrary dimension. The introduction of linear algebra into logical theory has been enhanced in recent decades by research in the fields of neural networks and quantum computing. Here we briefly describe the representation of logical operations through matrices and show how general expressions for the two square roots of the Not operator are obtained. Then, we explore two topics. First, we study an extension to a non-quantum domain of a short form of Deutsch's algorithm. Then, we assume that a root of Not is a matrix extension of the imaginary unit i, and under this idea we obtain fully matrix versions for the Euler expansions and for the representations of circular functions by complex exponentials.	Translated: 2024-06-07 04:58:43 Published: 2024-06-05
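For the familiar dimension-2 case mentioned in the abstract above, the square-root property can be checked directly. The sketch below uses the two commonly cited 2x2 square roots of the NOT matrix (a complex-conjugate pair); it does not reproduce the paper's general construction for arbitrary dimension, and the helper names are hypothetical.

```python
def matmul2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two 2x2 matrices up to a tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))

# The NOT operator swaps the two basis vectors.
NOT = [[0, 1], [1, 0]]

# One standard square root of NOT and its complex conjugate.
ROOT_A = [[(1 + 1j) / 2, (1 - 1j) / 2],
          [(1 - 1j) / 2, (1 + 1j) / 2]]
ROOT_B = [[z.conjugate() for z in row] for row in ROOT_A]

# Both matrices square to NOT.
assert close(matmul2(ROOT_A, ROOT_A), NOT)
assert close(matmul2(ROOT_B, ROOT_B), NOT)
```

The diagonal entries cancel because ((1+i)/2)^2 + ((1-i)/2)^2 = i/2 - i/2 = 0, while each off-diagonal entry is 2(1+i)(1-i)/4 = 1.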
# A Short and General Duality Proof for Wasserstein Distributionally Robust Optimization ( http://arxiv.org/abs/2205.00362v4 ) License: see linked page	Luhao Zhang, Jincheng Yang, Rui Gao,	We present a general duality result for Wasserstein distributionally robust optimization that holds for any Kantorovich transport cost, measurable loss function, and nominal probability distribution. Assuming an interchangeability principle inherent in existing duality results, our proof only uses one-dimensional convex analysis. Furthermore, we demonstrate that the interchangeability principle holds if and only if certain measurable projection and weak measurable selection conditions are satisfied. To illustrate the broader applicability of our approach, we provide a rigorous treatment of duality results in distributionally robust Markov decision processes and distributionally robust multistage stochastic programming. Additionally, we extend our analysis to other problems such as infinity-Wasserstein distributionally robust optimization, risk-averse optimization, and globalized distributionally robust counterpart.	Translated: 2024-06-07 04:58:43 Published: 2024-06-05
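For orientation, the kind of duality statement the abstract above refers to is usually written in the following form in the Wasserstein DRO literature. The notation is an assumption made here for illustration (nominal distribution \hat{P}, transport cost c, loss f, radius \rho > 0, optimal transport cost K_c); the exact conditions and statement are those of the paper, not this sketch.

```latex
\sup_{P \,:\, K_c(\hat{P}, P) \le \rho} \mathbb{E}_{Z \sim P}\bigl[f(Z)\bigr]
\;=\;
\min_{\lambda \ge 0} \Bigl\{ \lambda \rho
  + \mathbb{E}_{\hat{Z} \sim \hat{P}}\Bigl[\, \sup_{z} \bigl( f(z) - \lambda\, c(\hat{Z}, z) \bigr) \Bigr] \Bigr\}
```

The right-hand side is a one-dimensional problem in the multiplier \lambda, which is consistent with the abstract's remark that the proof only needs one-dimensional convex analysis.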
# Dynamic Ranking and Translation Synchronization ( http://arxiv.org/abs/2207.01455v4 ) License: see linked page	Ernesto Araya, Eglantine Karlé, Hemant Tyagi,	In many applications, such as sport tournaments or recommendation systems, we have at our disposal data consisting of pairwise comparisons between a set of $n$ items (or players). The objective is to use this data to infer the latent strength of each item and/or their ranking. Existing results for this problem predominantly focus on the setting consisting of a single comparison graph $G$. However, there exist scenarios (e.g., sports tournaments) where the pairwise comparison data evolves with time. Theoretical results for this dynamic setting are relatively limited, and this setting is the focus of this paper. We study an extension of the translation synchronization problem to the dynamic setting. In this setup, we are given a sequence of comparison graphs $(G_t)_{t\in \mathcal{T}}$, where $\mathcal{T} \subset [0,1]$ is a grid representing the time domain, and for each item $i$ and time $t\in \mathcal{T}$ there is an associated unknown strength parameter $z^*_{t,i}\in \mathbb{R}$. We aim to recover, for $t\in\mathcal{T}$, the strength vector $z^*_t=(z^*_{t,1},\dots,z^*_{t,n})$ from noisy measurements of $z^*_{t,i}-z^*_{t,j}$, where $\{i,j\}$ is an edge in $G_t$. Assuming that $z^*_t$ evolves smoothly in $t$, we propose two estimators: one based on a smoothness-penalized least squares approach and the other based on projection onto the low frequency eigenspace of a suitable smoothness operator. For both estimators, we provide finite sample bounds for the $\ell_2$ estimation error under the assumption that $G_t$ is connected for all $t\in \mathcal{T}$, thus proving the consistency of the proposed methods in terms of the grid size $|\mathcal{T}|$. We complement our theoretical findings with experiments on synthetic and real data.	Translated: 2024-06-07 04:58:43 Published: 2024-06-05
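A smoothness-penalized least squares estimator of the kind described in the abstract above could take the following generic shape. This is a sketch under assumed notation, not the paper's exact formulation: $y_{t,ij}$ denotes the noisy measurement of $z^*_{t,i}-z^*_{t,j}$, $E_t$ the edge set of $G_t$, $t_1 < \dots < t_{|\mathcal{T}|}$ the grid points, and $\lambda \ge 0$ a tuning parameter; the specific penalty and the centering constraint are illustrative choices.

```latex
\hat{z} \in \operatorname*{arg\,min}_{\substack{z_t \in \mathbb{R}^n,\ t \in \mathcal{T} \\ \sum_{i} z_{t,i} = 0}}
\;\sum_{t \in \mathcal{T}} \sum_{\{i,j\} \in E_t}
  \bigl( y_{t,ij} - (z_{t,i} - z_{t,j}) \bigr)^2
\;+\; \lambda \sum_{k=1}^{|\mathcal{T}|-1}
  \bigl\| z_{t_{k+1}} - z_{t_k} \bigr\|_2^2
```

The first term fits the pairwise differences on each graph; the second ties neighbouring time points together, which is what lets information be shared across time. The centering constraint removes the ambiguity that adding a constant to every strength leaves all differences unchanged.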
# Fiducial Tag Localization on a 3D LiDAR Prior Map ( http://arxiv.org/abs/2209.01072v3 ) License: see linked page	Yibo Liu, Jinjun Shan, Hunter Schofield,	The LiDAR fiducial tag, akin to the well-known AprilTag used in camera applications, serves as a convenient resource to impart artificial features to the LiDAR sensor, facilitating robotics applications. Unfortunately, the existing LiDAR fiducial tag localization methods do not apply to 3D LiDAR maps, although resolving this problem is beneficial to LiDAR-based relocalization and navigation. In this paper, we develop a novel approach to directly localize fiducial tags on a 3D LiDAR prior map, returning the tag poses (labeled by ID number) and vertex locations (labeled by index) w.r.t. the global coordinate system of the map. In particular, considering that fiducial tags are thin sheet objects indistinguishable from the attached planes, we design a new pipeline that gradually analyzes the 3D point cloud of the map from the intensity and geometry perspectives, extracting potential tag-containing point clusters. Then, we introduce an intermediate-plane-based method to further check if each potential cluster has a tag and compute the vertex locations and tag pose if found. We conduct both qualitative and quantitative experiments to demonstrate that our approach is the first method applicable to localize tags on a 3D LiDAR map while achieving better accuracy compared to previous methods. The open-source implementation of this work is available at: https://github.com/York-SDCNLab/Marker-Detection-General.	Translated: 2024-06-07 04:58:43 Published: 2024-06-05
# CoopHash: Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing ( http://arxiv.org/abs/2210.04288v2 ) License: see linked page	Khoa D. Doan, Jianwen Xie, Yaxuan Zhu, Yang Zhao, Ping Li,	Leveraging supervised information can lead to superior retrieval performance in the image hashing domain, but the performance degrades significantly without enough labeled data. One effective solution to boost performance is to employ generative models, such as Generative Adversarial Networks (GANs), to generate synthetic data in an image hashing model. However, GAN-based methods are difficult to train, which prevents the hashing approaches from jointly training the generative models and the hash functions. This limitation results in sub-optimal retrieval performance. To overcome this limitation, we propose a novel framework, the generative cooperative hashing network, which is based on energy-based cooperative learning. This framework jointly learns a powerful generative representation of the data and a robust hash function via two components: a top-down contrastive pair generator that synthesizes contrastive images and a bottom-up multipurpose descriptor that simultaneously represents the images from multiple perspectives, including probability density, hash code, latent code, and category. The two components are jointly learned via a novel likelihood-based cooperative learning scheme. We conduct experiments on several real-world datasets and show that the proposed method outperforms the competing supervised hashing methods, achieving up to 10% relative improvement over the current state-of-the-art supervised hashing methods, and exhibits a significantly better performance in out-of-distribution retrieval.	Translated: 2024-06-07 04:58:43 Published: 2024-06-05
# Analysis of Utterance Embeddings and Clustering Methods Related to Intent Induction for Task-Oriented Dialogue ( http://arxiv.org/abs/2212.02021v5 ) License: see linked page	Jeiyoon Park, Yoonna Jang, Chanhee Lee, Heuiseok Lim,	The focus of this work is to investigate unsupervised approaches to overcome quintessential challenges in designing task-oriented dialog schema: assigning intent labels to each dialog turn (intent clustering) and generating a set of intents based on the intent clustering methods (intent induction). We postulate there are two salient factors for automatic induction of intents: (1) clustering algorithm for intent labeling and (2) user utterance embedding space. We compare existing off-the-shelf clustering models and embeddings based on DSTC11 evaluation. Our extensive experiments demonstrate that the combined selection of utterance embedding and clustering method in the intent induction task should be carefully considered. We also show that pretrained MiniLM with Agglomerative clustering yields significant improvement in NMI, ARI, F1, accuracy and example coverage in intent induction tasks. The source codes are available at https://github.com/Jeiyoon/dstc11-track2.	Translated: 2024-06-07 04:58:43 Published: 2024-06-05
# EIT: インタラクティブトランスの強化 EIT: Enhanced Interactive Transformer ( http://arxiv.org/abs/2212.10197v2 ) ライセンス: Link先を確認	Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu,	(参考訳) 補完原理とコンセンサス原理の2つの原則は、多視点学習の文献で広く認識されている。しかし、現在の多視点学習の例である多視点自己意識の設計は、コンセンサスを無視しながら相補性を優先している。この問題に対処するために,拡張型マルチヘッド自己注意(EMHA)を提案する。まず、補間原理を満たすために、EMHAは複数のサブスペース内のクエリとキー間の1対1のマッピング制約を取り除き、各クエリが複数のキーに参加することを可能にする。そこで我々は,2つの相互作用モデル,すなわち,内部空間相互作用と部分空間間相互作用を導入することにより,頭部間のコンセンサスを完全に促進する手法を開発した。幅広い言語タスク(例えば機械翻訳、抽象的な要約と文法の補正、言語モデリング)に対する広範な実験は、その優位性を示し、モデルサイズは非常に緩やかな増加を示している。私たちのコードは、https://github.com/zhengkid/EIT-Enhanced-Interactive-Transformerで利用可能です。 Two principles: the complementary principle and the consensus principle are widely acknowledged in the literature of multi-view learning. However, the current design of multi-head self-attention, an instance of multi-view learning, prioritizes the complementarity while ignoring the consensus. To address this problem, we propose an enhanced multi-head self-attention (EMHA). First, to satisfy the complementary principle, EMHA removes the one-to-one mapping constraint among queries and keys in multiple subspaces and allows each query to attend to multiple keys. On top of that, we develop a method to fully encourage consensus among heads by introducing two interaction models, namely inner-subspace interaction and cross-subspace interaction. Extensive experiments on a wide range of language tasks (e.g., machine translation, abstractive summarization and grammar correction, language modeling), show its superiority, with a very modest increase in model size. Our code would be available at: https://github.com/zhengkid/EIT-Enhanced-Interactive-Transformer.	翻訳日:2024-06-07 04:58:43 公開日:2024-06-05
# SSR-2D:2次元画像からのセマンティック3次元シーン再構成 SSR-2D: Semantic 3D Scene Reconstruction from 2D Images ( http://arxiv.org/abs/2302.03640v4 ) ライセンス: Link先を確認	Junwen Huang, Alexey Artemov, Yujin Chen, Shuaifeng Zhi, Kai Xu, Matthias Nießner,	(参考訳) 3次元屋内空間の包括的セマンティックモデリングへの深層学習アプローチは、3次元領域における高コストなアノテーションを必要とする。本研究では,3Dアノテーションを使わずにセマンティックなシーン再構成を行う中心的な3Dシーンモデリングタスクについて検討する。提案手法の鍵となる考え方は,不完全な3次元再構成とそれに対応するRGB-D画像の両方を利用するトレーニング可能なモデルを設計し,クロスドメインな特徴を体積埋め込みに融合させて,手動または機械で生成できる2次元ラベリングのみを用いて,完全な3次元形状,色,セマンティックスを予測することである。我々の重要な技術的革新は、2Dの観察と未知の3D空間を、それぞれ観察されたRGB画像と2Dのセマンティクスを監督するために、色とセマンティクスの異なるレンダリングを活用することである。さらに,学習パイプラインとそれに対応する手法を開発して,予測された2次元ラベルから学習を可能にする。これは,元の実際のキャプチャを補完する仮想トレーニングビューを合成することにより,セマンティクスのより効率的な自己スーパービジョンループを可能にする。その結果、我々のエンドツーエンドのトレーニング可能なソリューションは、限られたRGB-D画像からの幾何学的完備化、色化、意味マッピングを、3Dの地下構造情報に頼らずに、共同で扱うことができた。提案手法は,2つの大規模ベンチマークデータセットであるMatterPort3DとScanNetのセマンティックシーン補完の最先端性能を実現する。我々の知る限り,本手法は実世界の3Dスキャンの完成とセマンティックセグメンテーションを同時に行う最初の2D駆動方式である。 Most deep learning approaches to comprehensive semantic modeling of 3D indoor spaces require costly dense annotations in the 3D domain. In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations. The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images, fusing cross-domain features into volumetric embeddings to predict complete 3D geometry, color, and semantics with only 2D labeling which can be either manual or machine-generated. Our key technical innovation is to leverage differentiable rendering of color and semantics to bridge 2D observations and unknown 3D space, using the observed RGB images and 2D semantics as supervision, respectively. 
We additionally develop a learning pipeline and corresponding method to enable learning from imperfect predicted 2D labels, which can additionally be acquired by synthesizing an augmented set of virtual training views complementing the original real captures, enabling a more efficient self-supervision loop for semantics. As a result, our end-to-end trainable solution jointly addresses geometry completion, colorization, and semantic mapping from limited RGB-D images, without relying on any 3D ground-truth information. Our method achieves state-of-the-art performance in semantic scene completion on two large-scale benchmark datasets, MatterPort3D and ScanNet, and even surpasses baselines that use costly 3D annotations in predicting both geometry and semantics. To our knowledge, our method is also the first 2D-driven method addressing completion and semantic segmentation of real-world 3D scans simultaneously.	翻訳日:2024-06-07 04:58:43 公開日:2024-06-05
# フロー誘導密度比学習を用いた生成モデル Generative Modeling with Flow-Guided Density Ratio Learning ( http://arxiv.org/abs/2303.03714v3 ) ライセンス: Link先を確認	Alvin Heng, Abdul Fatir Ansari, Harold Soh,	(参考訳) 本稿では,最近の研究で導入されたエントロピー規則化f-ディバージェンスの勾配流の静的(時間に依存しない)近似に基づく,簡易かつスケーラブルな生成モデリング手法であるフローガイド密度比学習(FDRL)を提案する。具体的には、GAN判別器によって与えられるstale推定器により、扱いにくい時間依存密度比を近似する。これは、サンプル精製の場合、フローのソースとターゲットの分布が互いに近接している場合に十分である。しかし、この仮定は生成には無効であり、二つの分布の間に大きな隔たりがあるため、stale推定器のナイーブな応用は失敗する。 FDRLは、トレーニングプロセス中にサンプルを段階的に改善することから学ぶように密度比推定器を訓練することを提案する。本手法では,FDRLが$128\times128$の次元の画像を生成できるとともに,既存の勾配流ベースラインを定量的なベンチマークで上回り,密度の隔たり(density chasm)問題を緩和する。また2つのユースケースでFDRLの柔軟性を示す。第一に、非条件FDRLは外部の分類器で容易に構成でき、クラス条件生成を行うことができる。第2に、FDRLはフレームワークに変更を加えることなく、非ペアの画像から画像への変換に直接適用することができる。私たちのコードは https://github.com/clear-nus/fdrl で公開されています。 We present Flow-Guided Density Ratio Learning (FDRL), a simple and scalable approach to generative modeling which builds on the stale (time-independent) approximation of the gradient flow of entropy-regularized f-divergences introduced in recent work. Specifically, the intractable time-dependent density ratio is approximated by a stale estimator given by a GAN discriminator. This is sufficient in the case of sample refinement, where the source and target distributions of the flow are close to each other. However, this assumption is invalid for generation and a naive application of the stale estimator fails due to the large chasm between the two distributions. FDRL proposes to train a density ratio estimator such that it learns from progressively improving samples during the training process. We show that this simple method alleviates the density chasm problem, allowing FDRL to generate images of dimensions as high as $128\times128$, as well as outperform existing gradient flow baselines on quantitative benchmarks. We also show the flexibility of FDRL with two use cases. First, unconditional FDRL can be easily composed with external classifiers to perform class-conditional generation. 
Second, FDRL can be directly applied to unpaired image-to-image translation with no modifications needed to the framework. Our code is publicly available at https://github.com/clear-nus/fdrl.	翻訳日:2024-06-07 04:46:49 公開日:2024-06-05
# 機械学習ベンチマーク性能における多重性を考慮した会計 Accounting for multiplicity in machine learning benchmark performance ( http://arxiv.org/abs/2303.07272v4 ) ライセンス: Link先を確認	Kajsa Møllersen, Einar Holsbø,	(参考訳) 機械学習の手法は一般に評価され、公開リポジトリのデータセットのパフォーマンスによって比較される。これにより、複数のメソッド、しばしば数千のメソッドが、同じ条件下で、時間にわたって評価される。問題における最上位の成績は「最先端(SOTA)パフォーマンス」と呼ばれ、新しい手法を公表するための基準点として用いられる。 SOTAの最大性能を推定として用いることは偏りのある推定器であり、過度に楽観的な結果を与える。マルチプリシティ(multiplicity)は、複数の比較と複数のテストの文脈でよく研究されているトピックであるが、著者たちが認識している限り、SOTAの推定に関する議論からほとんど欠落している。新しい手法を評価するための基準として,楽観的な最先端推定法が用いられ,その結果が著しく劣る手法が容易に見過ごされてしまう。本稿では、複数の分類器の場合の確率分布について、既知の解析手法を適用できるようにし、より優れたSOTA推定値を提供する。独立分類器を用いた模擬例による乗法の影響を実演する。分類器依存性が分散にどのように影響するかを示すとともに、精度が高い場合には影響が制限されることを示す。最後に,実世界の3つの実例について論じる。 Machine learning methods are commonly evaluated and compared by their performance on data sets from public repositories. This allows for multiple methods, oftentimes several thousands, to be evaluated under identical conditions and across time. The highest ranked performance on a problem is referred to as state-of-the-art (SOTA) performance, and is used, among other things, as a reference point for publication of new methods. Using the highest-ranked performance as an estimate for SOTA is a biased estimator, giving overly optimistic results. The mechanisms at play are those of multiplicity, a topic that is well-studied in the context of multiple comparisons and multiple testing, but has, as far as the authors are aware of, been nearly absent from the discussion regarding SOTA estimates. The optimistic state-of-the-art estimate is used as a standard for evaluating new methods, and methods with substantial inferior results are easily overlooked. In this article, we provide a probability distribution for the case of multiple classifiers so that known analyses methods can be engaged and a better SOTA estimate can be provided. We demonstrate the impact of multiplicity through a simulated example with independent classifiers. 
We show how classifier dependency impacts the variance, but also that the impact is limited when the accuracy is high. Finally, we discuss three real-world examples: Kaggle competitions that demonstrate various aspects.	翻訳日:2024-06-07 04:46:49 公開日:2024-06-05
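The multiplicity effect described in the abstract above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: every classifier is assumed to have the same true accuracy and to be evaluated on its own independent test draws, and the function name `simulate_max_accuracy` and all parameter values are hypothetical.

```python
import random


def simulate_max_accuracy(true_acc, n_test, n_classifiers, n_trials, seed=0):
    """Average, over trials, of the best observed test accuracy among
    n_classifiers independent classifiers that all share the same true accuracy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        best = 0.0
        for _ in range(n_classifiers):
            # Each test example is classified correctly with probability true_acc.
            correct = sum(1 for _ in range(n_test) if rng.random() < true_acc)
            best = max(best, correct / n_test)
        total += best
    return total / n_trials


# Assumed illustrative values: true accuracy 0.90, 200 test examples.
single = simulate_max_accuracy(0.90, 200, 1, 100)
best_of_many = simulate_max_accuracy(0.90, 200, 50, 100)
print(f"mean observed accuracy, 1 classifier:    {single:.3f}")
print(f"mean best accuracy among 50 classifiers: {best_of_many:.3f}")
```

With a single classifier the observed accuracy averages out near the true value, whereas the maximum over many classifiers sits noticeably above it even though no classifier is actually better. That gap is the optimistic bias of taking the top-ranked result as the SOTA estimate.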
# 大規模言語モデルにおけるヒューマンライクな翻訳評価を可能にする誤り解析 Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models ( http://arxiv.org/abs/2303.13809v4 ) ライセンス: Link先を確認	Qingyu Lu, Baopu Qiu, Liang Ding, Kanjian Zhang, Tom Kocmi, Dacheng Tao,	(参考訳) 生成型大規模言語モデル(LLM)、例えばChatGPTは、機械翻訳、テキスト要約など、いくつかのNLPタスクにおいて顕著な習熟性を示している。最近の研究 (Kocmi and Federmann, 2023) では、機械翻訳(MT)の品質評価にLLMを用いることで、システムレベルでは最先端のパフォーマンスが得られるが、セグメントレベルでは性能が低いことが示されている。 MTの品質評価におけるLLMの性能をさらに向上するため,いくつかのプロンプト設計について検討し,Chain-of-Thoughts (Wei et al., 2022) とError Analysis (Lu et al., 2023) を組み合わせた新しいプロンプト法である Error Analysis Prompting (EAPrompt) を提案する。この手法は,多次元品質指標 (MQM, Freitag et al. (2021)) をエミュレートし,システムレベルとセグメントレベルの両方で説明可能かつ信頼性の高いMT評価を生成する。 WMT22のメトリクス共有タスクによる実験結果は、異なる構造を持つ各種LLMにおけるEAPromptの有効性を検証した。さらに分析した結果、EAPromptは大規模なエラーとマイナーエラーを効果的に区別し、MQMと類似したエラー数の分布を共有していることがわかった。これらの結果から,人為的評価手法としてのEAPromptの可能性が示唆された。 Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable proficiency across several NLP tasks, such as machine translation, text summarization. Recent research (Kocmi and Federmann, 2023) has shown that utilizing LLMs for assessing the quality of machine translation (MT) achieves state-of-the-art performance at the system level but performs poorly at the segment level. To further improve the performance of LLMs on MT quality assessment, we investigate several prompting designs, and propose a new prompting method called Error Analysis Prompting (EAPrompt) by combining Chain-of-Thoughts (Wei et al., 2022) and Error Analysis (Lu et al., 2023). This technique emulates the commonly accepted human evaluation framework - Multidimensional Quality Metrics (MQM, Freitag et al. (2021)) and produces explainable and reliable MT evaluations at both the system and segment level. 
Experimental results from the WMT22 metrics shared task validate the effectiveness of EAPrompt on various LLMs with different structures. Further analysis confirms that EAPrompt effectively distinguishes major errors from minor ones, while also sharing a similar distribution of the number of errors with MQM. These findings highlight the potential of EAPrompt as a human-like evaluator prompting technique for MT evaluation.	翻訳日:2024-06-07 04:46:49 公開日:2024-06-05
# 量子チャネルと量子状態のいくつかの絶対的性質 Quantum channels and some absolute properties of quantum states ( http://arxiv.org/abs/2304.00711v2 ) ライセンス: Link先を確認	Tapaswini Patro, Kaushiki Mukherjee, Nirman Ganguly,	(参考訳) 環境相互作用は、量子情報処理プロトコルの実際の応用においてユビキタスである。このような相互作用は量子資源の枯渇をもたらす。量子情報の文脈における2つの重要なメリットは、完全に絡み合った分数(FEF)と複合量子系の条件エントロピーである。 FEFはテレポーテーションのようなタスクで重要な役割を担います。一方、条件エントロピーは特定の量子状態に対して負となりうるので、負性は密度の高い符号化や状態の融合といったタスクの資源として残っている。 FEF $ > 1/d $ for a $ d \otimes d $ 量子系は重要なしきい値であるが、いくつかの量子状態では、大域的なユニタリ演算でさえ閾値以下であり、したがって絶対完全絡み合い(AFEF)を持つ状態として知られている。条件付きフォン・ノイマンエントロピーを含む状態は、大域的ユニタリ作用の下で条件付きエントロピーの非負性を保持する状態があり、絶対的条件付きフォン・ノイマンエントロピー非負性状態 (ACVENN) と呼ばれる。本論文では、2つの量子ビットと2つの量子ビットの量子チャネルの作用を探索し、いくつかの量子状態が非絶対状態からその作用の下で絶対状態へ移動することを示す。グローバルなユニタリ操作では絶対的でない状態に戻すことができないため、絡み合いスワッピングネットワークを用いた検索のための処方料を提供する。さらに、絶対性の概念を条件R'enyiエントロピーに拡張し、絶対条件R'enyiエントロピー非負性(ACRENN)を持つ状態に必要な条件を求める。次に、三部構造系の限界部分を含むようにその作業を拡張し、上記の絶対性に関してそれらの特徴を与える。 Environmental interactions are ubiquitous in any real-world application of a quantum information processing protocol. Such interactions result in depletion of quantum resources. Two important figure of merits in the context of quantum information are the fully entangled fraction (FEF) and conditional entropy of a composite quantum system. FEF has a key role to play in tasks like teleportation. Conditional entropy on the other hand can be negative for certain quantum states and thus the negativity remains a resource for tasks like dense coding and state merging. FEF $ > 1/d $ for a $ d \otimes d $ quantum system is a significant threshold, however for some quantum states it remains less than the threshold even with global unitary operations, consequently being known as states having absolute fully entangled fraction (AFEF). 
Pertaining to conditional von Neumann entropy, there are some states which retain the nonnegativity of the conditional entropy under global unitary action, called states with the absolute conditional von Neumann entropy nonnegative (ACVENN) property. In the present submission, we probe the action of some quantum channels in two qubits and two qudits and find that some quantum states move from the non-absolute regime to the absolute regime under the action. Since global unitary operations are unable to return them to the non-absolute regime, we provide a prescription for the retrieval using an entanglement swapping network. Furthermore, we extend the notion of absoluteness to conditional Rényi entropies and find the required condition for a state to have the absolute conditional Rényi entropy nonnegative (ACRENN) property. We then extend the work to include the marginals of a tripartite system and characterize them with respect to the aforementioned absolute properties.	翻訳日:2024-06-07 04:46:49 公開日:2024-06-05
# No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning ( http://arxiv.org/abs/2304.05366v2 ) ライセンス: Link先を確認	Micah Goldblum, Marc Finzi, Keefer Rowan, Andrew Gordon Wilson,	(参考訳) 教師付き学習のための無料ランチ定理は、学習者が全ての問題を解くことができず、学習者が学習上の一様分布に対して平均的に全く同じ精度を達成できないことを述べています。したがって、これらの定理は、個々の問題は特別に調整された帰納的バイアスを必要とするという概念を支持するためにしばしば言及される。事実上、全ての一様サンプルデータセットは複雑さが高いが、現実の問題は不均等に低複雑さのデータを生成し、ニューラルネットワークモデルがコルモゴロフ複雑性を用いて形式化された同じ好みを共有していると論じる。特に、コンピュータビジョンのような特定のドメイン用に設計されたアーキテクチャは、さまざまな無関係な領域でデータセットを圧縮できることを示す。実験の結果,事前学習およびランダムに初期化される言語モデルでは,低複雑さのシーケンスを生成することが好ましいことがわかった。フリーランチの定理は個々の問題に特別な学習者が要ることを示すものではないが、ラベル付きデータが乏しい場合や豊富でない場合など、人間の介入を必要とするタスクを1つの学習アルゴリズムに自動化する方法を説明する。これらの観察は、ますます小さな機械学習モデルで異なるように見える問題を統一する深層学習の傾向を正当化する。 No free lunch theorems for supervised learning state that no learner can solve all problems or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often referenced in support of the notion that individual problems require specially tailored inductive biases. While virtually all uniformly sampled datasets have high complexity, real-world problems disproportionately generate low-complexity data, and we argue that neural network models share this same preference, formalized using Kolmogorov complexity. Notably, we show that architectures designed for a particular domain, such as computer vision, can compress datasets on a variety of seemingly unrelated domains. Our experiments show that pre-trained and even randomly initialized language models prefer to generate low-complexity sequences. 
Whereas no free lunch theorems seemingly indicate that individual problems require specialized learners, we explain how tasks that often require human intervention, such as picking an appropriately sized model when labeled data is scarce or plentiful, can be automated into a single learning algorithm. These observations justify the trend in deep learning of unifying seemingly disparate problems with an increasingly small set of machine learning models.	翻訳日:2024-06-07 04:46:49 公開日:2024-06-05
# 有限体上の多項式系の多段階解法とストリーム暗号トリビウムに対する新しい代数的攻撃 A multistep strategy for polynomial system solving over finite fields and a new algebraic attack on the stream cipher Trivium ( http://arxiv.org/abs/2304.07820v2 ) ライセンス: Link先を確認	Roberto La Scala, Federico Pintore, Sharwan K. Tiwari, Andrea Visconti,	(参考訳) 本稿では,有限体上の多変量多項式方程式系を解くための推測・決定・ハイブリッド戦略の多段階一般化を提案する。特に,変数のサブセットの抜本的な評価を段階的に行うこと,すなわち,評価が解決不可能な多項式系に導かれる度に,そのようなサブセットのサイズを増大させることを提案する。どの評価を拡張するかの決定は、現在の評価の後、不完全グロブナー基底を演算する前処理に基づいており、さらなる変数を排除するために使用される線形多項式を生成する可能性がある。システム内の残りの変数数がまだ高すぎると判断された場合、評価は拡張され、前処理が反復される。そうでなければ、完全なGrobner基底計算によってシステムを解く。暗号解析の応用を念頭に置いて,少なくとも1つの解を持つ多項式系を設計したMultiSolveというアルゴリズムで,この戦略を実装した。変数の異なる部分集合に対する評価テストセットで提案した前処理を実行することにより,確率分布に基づく複雑性の公式が容易に推定できることを示す。マルチソルブの最適複雑性は、最大ステップ数で全マルチステップ戦略を用いて達成され、その結果、単一のステップからなる戦略である標準的な推測・決定戦略が最悪の選択であることを示す。最後に、よく知られたストリーム暗号 Trivium に対する代数的攻撃を行う際に、MultiSolve の挙動を広範囲に研究する。 In this paper we introduce a multistep generalization of the guess-and-determine or hybrid strategy for solving a system of multivariate polynomial equations over a finite field. In particular, we propose performing the exhaustive evaluation of a subset of variables stepwise, that is, by incrementing the size of such subset each time that an evaluation leads to a polynomial system which is possibly unfeasible to solve. The decision about which evaluation to extend is based on a preprocessing consisting in computing an incomplete Grobner basis after the current evaluation, which possibly generates linear polynomials that are used to eliminate further variables. If the number of remaining variables in the system is deemed still too high, the evaluation is extended and the preprocessing is iterated. Otherwise, we solve the system by a complete Grobner basis computation. Having in mind cryptanalytic applications, we present an implementation of this strategy in an algorithm called MultiSolve which is designed for polynomial systems having at most one solution. 
We prove explicit formulas for its complexity which are based on probability distributions that can be easily estimated by performing the proposed preprocessing on a test set of evaluations for different subsets of variables. We prove that an optimal complexity of MultiSolve is achieved by using a full multistep strategy with a maximum number of steps and that, in turn, the standard guess-and-determine strategy, which is essentially a strategy consisting of a single step, is the worst choice. Finally, we extensively study the behaviour of MultiSolve when performing an algebraic attack on the well-known stream cipher Trivium.	翻訳日:2024-06-07 04:46:49 公開日:2024-06-05
# LaMP: 大きな言語モデルがパーソナライゼーションに出会ったとき LaMP: When Large Language Models Meet Personalization ( http://arxiv.org/abs/2304.11406v4 ) ライセンス: Link先を確認	Alireza Salemi, Sheshera Mysore, Michael Bendersky, Hamed Zamani,	(参考訳) 本稿では、大規模言語モデルにおけるパーソナライズの重要性を強調し、パーソナライズされた出力を生成するための言語モデルのトレーニングと評価のための新しいベンチマークであるLaMPベンチマークを紹介する。 LaMPは、さまざまな言語タスクと、各ユーザプロファイルに対する複数のエントリを備えた総合的な評価フレームワークを提供する。パーソナライズされた7つのタスクで構成され、3つのテキスト分類と4つのテキスト生成タスクで構成されている。また、言語モデル出力をパーソナライズするために、各ユーザプロファイルから個人項目を検索する2つの検索拡張アプローチを提案する。そこで本研究では,用語マッチング,意味マッチング,時間認識など,さまざまな検索モデルについて検討する。ゼロショットおよび微調整言語モデルに対するLaMPの大規模な実験は、提案手法の有効性を示し、様々な自然言語タスクにおけるパーソナライズの影響を強調している。 This paper highlights the importance of personalization in large language models and introduces the LaMP benchmark -- a novel benchmark for training and evaluating language models for producing personalized outputs. LaMP offers a comprehensive evaluation framework with diverse language tasks and multiple entries for each user profile. It consists of seven personalized tasks, spanning three text classification and four text generation tasks. We additionally propose two retrieval augmentation approaches that retrieve personal items from each user profile for personalizing language model outputs. To this aim, we study various retrieval models, including term matching, semantic matching, and time-aware methods. Extensive experiments on LaMP for zero-shot and fine-tuned language models demonstrate the efficacy of the proposed retrieval augmentation approach and highlight the impact of personalization in various natural language tasks.	翻訳日:2024-06-07 04:46:49 公開日:2024-06-05
# テキストと画像のパーソナライズのためのキーロック付きランク1編集 Key-Locked Rank One Editing for Text-to-Image Personalization ( http://arxiv.org/abs/2305.01644v2 ) ライセンス: Link先を確認	Yoad Tewel, Rinon Gal, Gal Chechik, Yuval Atzmon,	(参考訳) テキスト・ツー・イメージ・モデル(T2I)は、ユーザーが自然言語を通じて創造的なプロセスをガイドできるようにすることで、新しいレベルの柔軟性を提供する。しかし、これらのモデルをユーザが提供する視覚概念に合わせてパーソナライズすることは、依然として難しい問題である。 T2Iのパーソナライゼーションのタスクは、高い視覚的忠実さを維持しながら創造的な制御を可能にし、複数のパーソナライズされた概念を単一のイメージに組み合わせ、小さなモデルサイズを維持するなど、複数の困難を伴っている。本稿では,これらの課題に対処するT2Iパーソナライズ手法であるPerfusionを提案する。 Perfusionは、新しい概念のクロスアテンションキーをそれらのスーパーオーディネートカテゴリに"ロックする"新しいメカニズムを導入することで、過度な適合を避ける。さらに,推論時間における学習概念の影響を制御し,複数の概念を組み合わせることを可能とするゲートランク1アプローチを開発した。これにより、100KBのトレーニングモデルで視覚的忠実度とテキストアライメントのランタイム効率のバランスが、現在の最先端モデルよりも5桁小さい。さらに、トレーニングを追加することなく、Paretoフロントのさまざまな操作ポイントにまたがることができる。最後に,Perfusionが質的,定量的両面で高いベースラインを達成していることを示す。重要なことに、キーロックは従来のアプローチと比較して新しい結果をもたらし、一発設定でも前例のない方法でパーソナライズされたオブジェクトインタラクションを表現できる。 Text-to-image models (T2I) offer a new level of flexibility by allowing users to guide the creative process through natural language. However, personalizing these models to align with user-provided visual concepts remains a challenging problem. The task of T2I personalization poses multiple hard challenges, such as maintaining high visual fidelity while allowing creative control, combining multiple personalized concepts in a single image, and keeping a small model size. We present Perfusion, a T2I personalization method that addresses these challenges using dynamic rank-1 updates to the underlying T2I model. Perfusion avoids overfitting by introducing a new mechanism that "locks" new concepts' cross-attention Keys to their superordinate category. Additionally, we develop a gated rank-1 approach that enables us to control the influence of a learned concept during inference time and to combine multiple concepts. 
This allows runtime-efficient balancing of visual fidelity and textual alignment with a single 100KB trained model, which is five orders of magnitude smaller than the current state of the art. Moreover, it can span different operating points across the Pareto front without additional training. Finally, we show that Perfusion outperforms strong baselines in both qualitative and quantitative terms. Importantly, key-locking leads to novel results compared to traditional approaches, making it possible to portray personalized object interactions in unprecedented ways, even in one-shot settings.	翻訳日:2024-06-07 04:46:49 公開日:2024-06-05
# MoMo:適応学習率のためのモーメントモデル MoMo: Momentum Models for Adaptive Learning Rates ( http://arxiv.org/abs/2305.07583v3 ) ライセンス: Link先を確認	Fabian Schaipp, Ruben Ohana, Michael Eickenberg, Aaron Defazio, Robert M. Gower,	(参考訳) 最新の機械学習アーキテクチャを新しいタスクでトレーニングするには、大規模な学習速度チューニングが必要であり、計算コストが高い。そこで我々は,任意の運動量法上で使用可能な新しいPolyak型適応学習率を開発し,チューニングを少なくして性能を向上する。まず,モメンタムモデルに基づくSGD-Mの適応学習速度であるMoMoを開発した。 MoMoは、各イテレーションでサンプリングされた損失と勾配の運動量推定を使用して、損失関数のモデルを構築する。我々のモデルは、トランケーションを用いて、損失関数の既知の下限を任意の下限で利用し、例えば、ほとんどの損失はゼロで下限となる。次に、モデルは各イテレーションでほぼ最小化され、次のステップを計算します。我々は、モーメントベースの手法と組み合わせてMoMoをどのように使用できるかを示し、新しいモデルベースの適応学習率のAdamであるMoMo-Adamを開発することでこれを実証する。補間を伴う凸問題に対して、MoMoが$\mathcal{O}(1/\sqrt{K})$収束率に達し、最適値以外の問題固有量の知識を必要としないことを示す。さらに、未知の下界を持つ損失に対して、我々のモデルに組み込まれた下界のオンザフライ推定を開発する。我々は,MNIST,CIFAR,Imagenet上の画像分類器のトレーニング,Criteo上のレコメンデータシステム,翻訳タスクIWSLT14上のトランスフォーマーモデル,拡散モデルに対して,SGD-MとAdamよりもMoMoとMoMo-Adamが頑健であることを示す。 Training a modern machine learning architecture on a new task requires extensive learning-rate tuning, which comes at a high computational cost. Here we develop new Polyak-type adaptive learning rates that can be used on top of any momentum method, and require less tuning to perform well. We first develop MoMo, a Momentum Model based adaptive learning rate for SGD-M (stochastic gradient descent with momentum). MoMo uses momentum estimates of the losses and gradients sampled at each iteration to build a model of the loss function. Our model makes use of any known lower bound of the loss function by using truncation, e.g. most losses are lower-bounded by zero. The model is then approximately minimized at each iteration to compute the next step. We show how MoMo can be used in combination with any momentum-based method, and showcase this by developing MoMo-Adam, which is Adam with our new model-based adaptive learning rate. 
We show that MoMo attains a $\mathcal{O}(1/\sqrt{K})$ convergence rate for convex problems with interpolation, needing knowledge of no problem-specific quantities other than the optimal value. Additionally, for losses with unknown lower bounds, we develop on-the-fly estimates of a lower bound, that are incorporated in our model. We show that MoMo and MoMo-Adam improve over SGD-M and Adam in terms of robustness to hyperparameter tuning for training image classifiers on MNIST, CIFAR, and Imagenet, for recommender systems on Criteo, for a transformer model on the translation task IWSLT14, and for a diffusion model.	翻訳日:2024-06-07 04:46:49 公開日:2024-06-05
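To make the idea of a Polyak-type step size with a truncated lower bound concrete, here is a minimal sketch of the classical truncated Polyak step on a toy quadratic. It is not MoMo itself: the abstract describes MoMo as additionally using momentum estimates of the sampled losses and gradients, which this sketch omits. The names `polyak_step` and `minimize`, the cap `max_lr`, and the toy objective are all assumptions for illustration.

```python
def polyak_step(loss, grad, lower_bound=0.0, max_lr=1.0):
    """Truncated Polyak step size: min(max_lr, (loss - lower_bound) / ||grad||^2)."""
    grad_sq = sum(g * g for g in grad)
    if grad_sq == 0.0:
        # Zero gradient: we are at a stationary point, so take no step.
        return 0.0
    # Truncation: never use a negative gap, even if the lower bound is loose.
    gap = max(loss - lower_bound, 0.0)
    return min(max_lr, gap / grad_sq)


def minimize(x, steps=50):
    """Gradient descent on f(x) = 0.5 * ||x||^2, whose known lower bound is 0."""
    for _ in range(steps):
        loss = 0.5 * sum(v * v for v in x)
        grad = list(x)  # the gradient of 0.5 * ||x||^2 is x itself
        lr = polyak_step(loss, grad)
        x = [v - lr * g for v, g in zip(x, grad)]
    return x


x_final = minimize([3.0, -4.0])
print("final loss:", 0.5 * sum(v * v for v in x_final))
```

On this toy problem the step size works out to 0.5 at every iteration without any manual tuning, so the iterate halves each step. The only problem-specific input is the lower bound of the loss, which is zero here, as for most losses.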
# 若干の例による再構成誤差に基づく異常検出 Reconstruction Error-based Anomaly Detection with Few Outlying Examples ( http://arxiv.org/abs/2305.10464v2 ) ライセンス: Link先を確認	Fabrizio Angiulli, Fabio Fassetti, Luca Ferragina,	(参考訳) 再構成エラーに基づくニューラルアーキテクチャは、異常検出に対する古典的なディープラーニングアプローチを構成しており、優れた性能を示している。オートエンコーダをトレーニングすることで、正常さを表すと思われる一連の例を再構築し、十分な大規模な再構成エラーを示すこれらのデータに異常を指摘します。残念なことに、これらのアーキテクチャはデータ内の異常も適切に再構築できるようになっている。この現象は、トレーニングセットに異常がある場合により明らかである。特に、これらの異常がラベル付けされている場合、半教師付きと呼ばれる設定は、オートエンコーダを訓練する最良の方法は、異常を無視し、通常のデータに対する再構成エラーを最小限にすることである。本研究の目的は,正規データのドメイン記述の外部に既知の異常を配置するようにモデルに指示する,再構成エラーに基づくアーキテクチャのアプローチを検討することである。具体的には,通常例と未知例の両方に関連付けられた再構成誤差のコントラストを高め,異常検出性能を向上させるために,限られた数の異常例を利用する。実験の結果,本手法は,標準的なオートエンコーダ手法や,半教師付き異常検出のためのディープラーニング技術よりも優れた性能を実現することがわかった。 Reconstruction error-based neural architectures constitute a classical deep learning approach to anomaly detection which has shown great performances. It consists in training an Autoencoder to reconstruct a set of examples deemed to represent the normality and then to point out as anomalies those data that show a sufficiently large reconstruction error. Unfortunately, these architectures often become able to well reconstruct also the anomalies in the data. This phenomenon is more evident when there are anomalies in the training set. In particular when these anomalies are labeled, a setting called semi-supervised, the best way to train Autoencoders is to ignore anomalies and minimize the reconstruction error on normal data. The goal of this work is to investigate approaches to allow reconstruction error-based architectures to instruct the model to put known anomalies outside of the domain description of the normal data. Specifically, our strategy exploits a limited number of anomalous examples to increase the contrast between the reconstruction error associated with normal examples and those associated with both known and unknown anomalies, thus enhancing anomaly detection performances. 
The experiments show that this new procedure achieves better performance than the standard Autoencoder approach and the main deep learning techniques for semi-supervised anomaly detection.	翻訳日:2024-06-07 04:46:49 公開日:2024-06-05
# Hint of Thought prompting:LLMによる推論タスクへの説明可能なゼロショットアプローチ Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs ( http://arxiv.org/abs/2305.11461v6 ) ライセンス: Link先を確認	Ioktong Lei, Zhidong Deng,	(参考訳) GPT や PaLM2 などの LLM と通信する手段としては、LCM をよりよく活用するための重要な研究トピックとなっている。単純なプロンプトは単段階の質問ではうまく機能するが、多段階推論タスクの正しい知識経路を永久に活性化することはできない。思考の連鎖(CoT)は、しばしばゼロショットCoTと少数ショットCoTを含むが、最近開発されたプロンプト法であり、LLMに推論プロセスを説明し、算術、記号、コモンセンス推論を含む3つの挑戦的推論タスクにおいて単純なプロンプトよりも優れている。本稿では、説明可能性とゼロショットの一般化を促進する新しい思考ヒント(HoT)を提案する。まず、説明可能なサブクエスト、論理的推論、解答抽出の3つのステップに分解される。第二に、これらの3つのステップはステップバイステップのヒントの形式で順番に順序付けされる。最後に,実験結果から,HoTプロンプトは既存のゼロショットCoTと比較してゼロショット推論タスクに有意なアドバンテージを持つことが示された。 GSM8K, ADDSUB, AQUA, SVAMPなどの数学タスクとStrategyQAのような常識タスクについてゼロショット実験を行った。特に提案されたHoTプロンプトの精度は、GSM8Kが40.50%から67.80%に、AQUAが31.9%から46.4%に、SVAMPが63.7%から76.9%に、ADDSUBが74.7%から87.34%に改善され、GSM8k、AQUA、SVAMPが競合するPoTアプローチを破る結果となった。 As a way of communicating with users and any LLMs like GPT or PaLM2, prompting becomes an increasingly important research topic for better utilization of LLMs. Although simple prompting performs well on single-step questions, it cannot permanently activate the correct knowledge path for multi-step reasoning tasks. The chain of thought (CoT), which often contains zero-shot CoT and few-shot CoT, is a recently developed prompting method that can explain the reasoning process to the LLM and outperforms simple prompting in three challenging reasoning tasks, including arithmetic, symbolic, and commonsense reasoning. In this paper, we propose a novel hint of thought (HoT) prompting with explainability and zero-shot generalization. First, it is decomposed into the following three steps: explainable sub-questions, logical reasoning, and answer extraction. Second, such three steps are sequentially ordered in the format of step-by-step hints, which can be easily adjusted and explained to different tasks. 
Finally, experimental results demonstrate that our HoT prompting has a significant advantage on the zero-shot reasoning task compared to existing zero-shot CoT. We ran zero-shot experiments on math tasks like GSM8K, ADDSUB, AQUA, SVAMP and commonsense tasks such as StrategyQA. In particular, the accuracy of the proposed HoT prompting improves on GSM8K from 40.50% to 67.80%, on AQUA from 31.9% to 46.4%, on SVAMP from 63.7% to 76.9%, and on ADDSUB from 74.7% to 87.34%, which even defeats the competitive PoT approach on GSM8K, AQUA, and SVAMP.	翻訳日:2024-06-07 04:46:49 公開日:2024-06-05
# 推薦説明可能性の可視化:調査と新たな展望 Visualization for Recommendation Explainability: A Survey and New Perspectives ( http://arxiv.org/abs/2305.11755v3 ) ライセンス: Link先を確認	Mohamed Amine Chatti, Mouadh Guesmi, Arham Muslim,	(参考訳) システム生成によるレコメンデーションの説明を提供することは、透明で信頼できるレコメンデーションシステムへの重要なステップである。説明可能なレコメンデータシステムは、アウトプットに対して人間の理解可能な理論的根拠を提供する。過去20年間、説明可能なレコメンデーションは、レコメンデーションシステム研究コミュニティで多くの注目を集めてきた。本稿では,レコメンデーションシステムにおける視覚的説明に関する研究成果の総合的なレビューを行うことを目的とする。より具体的には,4次元の「説明目標」,「説明範囲」,「説明スタイル」,「説明形式」の4次元に基づくレコメンデータシステムにおける説明に関する文献を体系的にレビューする。ビジュアライゼーションの重要性を認識し,説明的ビジュアライゼーションの角度からレコメンダシステム文献にアプローチする。その結果,レコメンデーションシステムにおける説明的視覚化を設計し,今後の研究の視点を明らかにするための一連のガイドラインが導出された。このレビューの目的は、研究者や実践者が視覚的に説明可能なレコメンデーション研究の可能性をよりよく理解し、現在および将来のレコメンデーションシステムにおける視覚的説明の体系設計を支援することである。 Providing system-generated explanations for recommendations represents an important step towards transparent and trustworthy recommender systems. Explainable recommender systems provide a human-understandable rationale for their outputs. Over the last two decades, explainable recommendation has attracted much attention in the recommender systems research community. This paper aims to provide a comprehensive review of research efforts on visual explanation in recommender systems. More concretely, we systematically review the literature on explanations in recommender systems based on four dimensions, namely explanation goal, explanation scope, explanation style, and explanation format. Recognizing the importance of visualization, we approach the recommender system literature from the angle of explanatory visualizations, that is using visualizations as a display style of explanation. As a result, we derive a set of guidelines that might be constructive for designing explanatory visualizations in recommender systems and identify perspectives for future work in this field. 
The aim of this review is to help recommendation researchers and practitioners better understand the potential of visually explainable recommendation research and to support them in the systematic design of visual explanations in current and future recommender systems.	翻訳日:2024-06-07 04:36:49 公開日:2024-06-05
# 論理推論のための抽象的意味表現に基づく論理駆動型データ拡張 Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning ( http://arxiv.org/abs/2305.12599v5 ) ライセンス: Link先を確認	Qiming Bao, Alex Yuxuan Peng, Zhenyun Deng, Wanjun Zhong, Gael Gendron, Timothy Pistotti, Neset Tan, Nathan Young, Yang Chen, Yonghua Zhu, Paul Denny, Michael Witbrock, Jiamou Liu,	(参考訳) 大規模言語モデルと論理的推論を組み合わせることで、堅牢で信頼性の高い方法で問題に対処する能力が向上する。それでも、論理的推論の複雑な性質は、Webから信頼できるデータを収集して包括的なトレーニングデータセットを構築する際に問題を引き起こし、その結果、下流タスクの性能に影響を及ぼす。そこで我々はAMR-LDAという新しい論理駆動型データ拡張手法を提案する。 AMR-LDAは、元のテキストを抽象的意味表現(AMR)グラフに変換する。AMRグラフは文の論理構造を捉えた構造化された意味表現であり、これに操作を加えることで論理的に修正されたAMRグラフを生成する。修正されたAMRグラフは、拡張データを生成するために再びテキストに変換される。特に、本手法はアーキテクチャに依存せず、GPT-3.5やGPT-4などの生成的大規模言語モデルをプロンプト拡張によって、識別的大規模言語モデルを論理駆動型データ拡張を用いた対照学習によって、それぞれ強化する。実験結果は、論理的推論を要する読解、テキスト含意、自然言語推論など7つの下流タスクで性能が向上することを示し、提案手法の有効性を裏付けている。さらに、本手法はReClorリーダーボード\footnote{\url{https://eval.ai/web/challenges/challenge-page/503/leaderboard/1347}}で首位に立っている。ソースコードとデータは公開されている\footnote{\href{https://github.com/Strong-AI-Lab/Logical-Equivalence-driven-AMR-Data-Augmentation-for-Representation-Learning}{AMR-LDA GitHub Repository}}。 Combining large language models with logical reasoning enhances their capacity to address problems in a robust and reliable manner. Nevertheless, the intricate nature of logical reasoning poses challenges when gathering reliable data from the web to build comprehensive training datasets, subsequently affecting performance on downstream tasks. To address this, we introduce a novel logic-driven data augmentation approach, AMR-LDA. AMR-LDA converts the original text into an Abstract Meaning Representation (AMR) graph, a structured semantic representation that encapsulates the logical structure of the sentence, upon which operations are performed to generate logically modified AMR graphs. The modified AMR graphs are subsequently converted back into text to create augmented data. 
Notably, our methodology is architecture-agnostic and enhances both generative large language models, such as GPT-3.5 and GPT-4, through prompt augmentation, and discriminative large language models through contrastive learning with logic-driven data augmentation. Empirical evidence underscores the efficacy of our proposed method with improvement in performance across seven downstream tasks, such as reading comprehension requiring logical reasoning, textual entailment, and natural language inference. Furthermore, our method leads on the ReClor leaderboard\footnote{\url{https://eval.ai/web/challenges/challenge-page/503/leaderboard/1347}}. The source code and data are publicly available\footnote{\href{https://github.com/Strong-AI-Lab/Logical-Equivalence-driven-AMR-Data-Augmentation-for-Representation-Learning}{AMR-LDA GitHub Repository}}.	翻訳日:2024-06-07 04:36:49 公開日:2024-06-05
# 両レベル最適化を用いたロバストアンテホックグラフ記述器 Robust Ante-hoc Graph Explainer using Bilevel Optimization ( http://arxiv.org/abs/2305.15745v2 ) ライセンス: Link先を確認	Kha-Dinh Luong, Mert Kosan, Arlei Lopes Da Silva, Ambuj Singh,	(参考訳) 高度なアプリケーションのための機械学習モデルによる決定を説明することは、透明性を高め、これらの決定を導く上で重要である。これはグラフのモデルにおいて特に当てはまり、決定はしばしばリッチな構造データと属性データを組み合わせた複雑なパターンに依存する。最近の研究は、いわゆるポストホックな説明器の設計に重点を置いているが、何が良い説明を構成するのかというより広範な疑問は、まだ未解決のままである。直感的な特性の1つは、データによって予測を再現するのに十分な情報的説明が必要であることである。言い換えれば、優れた説明器は予測器として再利用することができる。ポストホックの説明者は、その説明が固定モデルパラメータ(例えば、学習されたGNN重み)に大きく依存しているため、この目標を達成することができない。この課題に対処するために,両レベル最適化を用いたグラフニューラルネットワークの説明を化学領域に焦点をあてて発見するために設計された,新規で柔軟なアンテホック説明器であるRAGE(Robust Ante-hoc Graph Explainer)を提案する。 RAGEは、ユーザーが関連性の観点からこれらの説明をランク付けしながら、予測に必要な完全な情報を含む分子サブ構造を効果的に識別することができる。種々の分子分類タスクに関する実験により、RAGEの説明は既存のポストホック法やアンテホック法よりも優れていることが示された。 Explaining the decisions made by machine learning models for high-stakes applications is critical for increasing transparency and guiding improvements to these decisions. This is particularly true in the case of models for graphs, where decisions often depend on complex patterns combining rich structural and attribute data. While recent work has focused on designing so-called post-hoc explainers, the broader question of what constitutes a good explanation remains open. One intuitive property is that explanations should be sufficiently informative to reproduce the predictions given the data. In other words, a good explainer can be repurposed as a predictor. Post-hoc explainers do not achieve this goal as their explanations are highly dependent on fixed model parameters (e.g., learned GNN weights). To address this challenge, we propose RAGE (Robust Ante-hoc Graph Explainer), a novel and flexible ante-hoc explainer designed to discover explanations for graph neural networks using bilevel optimization, with a focus on the chemical domain. 
RAGE can effectively identify molecular substructures that contain the full information needed for prediction while enabling users to rank these explanations in terms of relevance. Our experiments on various molecular classification tasks show that RAGE explanations are better than existing post-hoc and ante-hoc approaches.	翻訳日:2024-06-07 04:36:49 公開日:2024-06-05
# ゲームにおける学習のための適応的摂動ミラー降下法 Adaptively Perturbed Mirror Descent for Learning in Games ( http://arxiv.org/abs/2305.16610v3 ) ライセンス: Link先を確認	Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Atsushi Iwasaki,	(参考訳) 本稿では、利得関数の勾配が戦略プロファイル空間上で単調であり、加法的ノイズを含みうるゲームにおいて、ミラー降下法(MD)アルゴリズムに対する利得摂動手法を提案する。楽観的MDに代表される楽観的な学習アルゴリズム群は、ノイズのないシナリオにおいて {\it last-iterate} 収束を達成し、ダイナミクスをナッシュ均衡へと導く。近年再び注目されている動向は、アンカーとなる戦略、すなわち {\it slingshot} 戦略からの距離に基づいて利得関数を摂動させる摂動アプローチの有望さを示している。そこで本研究では、スリングショット戦略を予め定義された間隔で繰り返し更新することにより摂動の大きさを調整する {\it Adaptively Perturbed MD} (APMD) を提案する。この工夫により、基礎となるゲームのナッシュ均衡を、保証された収束レートで求めることができる。実証実験により、本アルゴリズムの収束が大幅に加速されることが確認された。 This paper proposes a payoff perturbation technique for the Mirror Descent (MD) algorithm in games where the gradient of the payoff functions is monotone in the strategy profile space, potentially containing additive noise. The optimistic family of learning algorithms, exemplified by optimistic MD, successfully achieves {\it last-iterate} convergence in scenarios devoid of noise, leading the dynamics to a Nash equilibrium. A recent re-emerging trend underscores the promise of the perturbation approach, where payoff functions are perturbed based on the distance from an anchoring, or {\it slingshot}, strategy. In response, we propose {\it Adaptively Perturbed MD} (APMD), which adjusts the magnitude of the perturbation by repeatedly updating the slingshot strategy at a predefined interval. This innovation empowers us to find a Nash equilibrium of the underlying game with guaranteed rates. Empirical demonstrations affirm that our algorithm exhibits significantly accelerated convergence.	翻訳日:2024-06-07 04:36:49 公開日:2024-06-05
# CGELBank アノテーションマニュアル v1.1 CGELBank Annotation Manual v1.1 ( http://arxiv.org/abs/2305.17347v2 ) ライセンス: Link先を確認	Brett Reynolds, Nathan Schneider, Aryaman Arora,	(参考訳) CGELBankは、ケンブリッジ・グラマー・オブ・イングリッシュ(Cambridge Grammar of the English)から派生した英語の構文形式に基づくツリーバンクおよび関連ツールである。この文書はCGELBankアノテーションスキームの特異性を概説している。 CGELBank is a treebank and associated tools based on a syntactic formalism for English derived from the Cambridge Grammar of the English Language. This document lays out the particularities of the CGELBank annotation scheme.	翻訳日:2024-06-07 04:36:49 公開日:2024-06-05
# アフリカ訛り音声認識の前進:一般化可能なASRモデルのための認識論的不確実性駆動型データ選択 Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models ( http://arxiv.org/abs/2306.02105v6 ) ライセンス: Link先を確認	Bonaventure F. P. Dossou,	(参考訳) アクセントは人間のコミュニケーションを形作る上で重要な役割を担い、明確さと文化的ニュアンスをもってメッセージを伝え、理解する能力を高める。自動音声認識(ASR)の進歩は著しいが、アフリカ訛りの英語のASRは、作成に費用と膨大な人手を要する訓練データセットが不足しているため、十分に研究されていない。いくつかの能動学習パラダイムとコアセットのアプローチを組み合わせることで、認識論的不確実性を利用してアノテーションプロセスを自動化するマルチラウンド適応プロセスを提案し、関連するコストと人的労力を大幅に削減する。本手法は、データアノテーションを合理化し、モデルの不確実性に最も寄与するデータサンプルを戦略的に選択することで、訓練効率を向上させる。我々は、難しいアクセントへのモデル適応を追跡するために、新しいU-WERメトリックを定義する。提案手法を、複数の領域、データセット、高性能音声モデルにわたって評価する。その結果、提案手法はWERを相対的に平均27%改善しつつ、既存のベースラインより平均45%少ないデータしか必要としないことが示された。また、非常に低リソースのアクセントに対する分布外一般化も改善し、アフリカ訛りのASRにおいて一般化可能なASRモデルを構築できる可能性を示した。コードは https://github.com/bonaventuredossou/active_learning_african_asr で公開している。 Accents play a pivotal role in shaping human communication, enhancing our ability to convey and comprehend messages with clarity and cultural nuance. While there has been significant progress in Automatic Speech Recognition (ASR), African-accented English ASR has been understudied due to a lack of training datasets, which are often expensive to create and demand colossal human labor. Combining several active learning paradigms and the core-set approach, we propose a new multi-rounds adaptation process that uses epistemic uncertainty to automate the annotation process, significantly reducing the associated costs and human labor. This novel method streamlines data annotation and strategically selects data samples contributing most to model uncertainty, enhancing training efficiency. We define a new U-WER metric to track model adaptation to hard accents. We evaluate our approach across several domains, datasets, and high-performing speech models. Our results show that our approach leads to a 27\% WER relative average improvement while requiring on average 45\% less data than established baselines. 
Our approach also improves out-of-distribution generalization for very low-resource accents, demonstrating its viability for building generalizable ASR models in the context of accented African ASR. We open-source the code here: https://github.com/bonaventuredossou/active_learning_african_asr.	翻訳日:2024-06-07 04:36:49 公開日:2024-06-05
# MCTS: マルチリファレンス中国語テキスト簡易化データセット MCTS: A Multi-Reference Chinese Text Simplification Dataset ( http://arxiv.org/abs/2306.02796v3 ) ライセンス: Link先を確認	Ruining Chong, Luming Lu, Liner Yang, Jinran Nie, Zhenghao Liu, Shuo Wang, Shuhan Zhou, Yaoxin Li, Erhong Yang,	(参考訳) テキストの単純化は、書き直し変換を適用することで、テキストの理解を容易にすることを目的としている。漢文の簡体化に関する研究は、古くからほとんど行われていない。一般的な評価データがないことが、この現象の重要な理由である。本稿では,マルチ参照中国語テキスト単純化データセットであるMCTSを紹介する。本稿では,データセットのアノテーションプロセスについて記述し,詳細な分析を行う。さらに,教師なし手法と高度な大規模言語モデルの性能評価を行った。また、機械翻訳と英語テキストの簡易化を利用して、学習に使用できる中国語テキストの簡易化データも提供する。基礎研究を通じて漢文の簡易化に関する基本的な理解を構築し,今後の研究への参考資料の提供を期待する。すべてのコードとデータはhttps://github.com/blcuicall/mcts/で公開される。 Text simplification aims to make the text easier to understand by applying rewriting transformations. There has been very little research on Chinese text simplification for a long time. The lack of generic evaluation data is an essential reason for this phenomenon. In this paper, we introduce MCTS, a multi-reference Chinese text simplification dataset. We describe the annotation process of the dataset and provide a detailed analysis. Furthermore, we evaluate the performance of several unsupervised methods and advanced large language models. We additionally provide Chinese text simplification parallel data that can be used for training, acquired by utilizing machine translation and English text simplification. We hope to build a basic understanding of Chinese text simplification through the foundational work and provide references for future research. All of the code and data are released at https://github.com/blcuicall/mcts/.	翻訳日:2024-06-07 04:36:49 公開日:2024-06-05
# マルチタスクオフライン事前学習によるモデルベース強化学習 Model-Based Reinforcement Learning with Multi-Task Offline Pretraining ( http://arxiv.org/abs/2306.03360v3 ) ライセンス: Link先を確認	Minting Pan, Yitao Zheng, Yunbo Wang, Xiaokang Yang,	(参考訳) オフラインデータセット上で強化学習(RL)モデルを事前学習することは、オンラインタスクにおける学習効率を改善する上で有望な方法だが、さまざまなタスクにまたがるダイナミクスや振る舞いに固有のミスマッチがあるため、難しい。本稿では、オフラインデータから新しいタスクへ、潜在的に有用なダイナミクスや行動デモンストレーションを転移することを学習するモデルベースRL法を提案する。主なアイデアは、世界モデルを行動学習のシミュレータとしてだけでなく、ダイナミクス表現の転移と方策の転移の両方についてタスクの関連性を測るツールとしても使うことである。我々は、オフラインからオンラインへの類似度重みの集合を生成するために、時間変化するドメイン選択的な蒸留損失を構築する。これらの重みは2つの目的を果たす。(i)物理的ダイナミクスに関するタスク非依存の知識を適応的に転移して世界モデルの学習を促進すること、(ii)関連するソースアクションを再生することを学習して目標方策を導くこと、である。Meta-WorldとDeepMind Control Suiteにおいて、最先端手法と比較した本手法の利点を実証する。 Pretraining reinforcement learning (RL) models on offline datasets is a promising way to improve their training efficiency in online tasks, but challenging due to the inherent mismatch in dynamics and behaviors across various tasks. We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task. The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance for both dynamics representation transfer and policy transfer. We build a time-varying, domain-selective distillation loss to generate a set of offline-to-online similarity weights. These weights serve two purposes: (i) adaptively transferring the task-agnostic knowledge of physical dynamics to facilitate world model training, and (ii) learning to replay relevant source actions to guide the target policy. We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.	翻訳日:2024-06-07 04:36:49 公開日:2024-06-05
# フィルタの重み分布による精度とロバストさのトレードオフの再検討 Revisiting the Trade-off between Accuracy and Robustness via Weight Distribution of Filters ( http://arxiv.org/abs/2306.03430v4 ) ライセンス: Link先を確認	Xingxing Wei, Shiji Zhao, Bo li,	(参考訳) 敵の攻撃はディープニューラルネットワーク(DNN)の潜在的な脅威であることが証明されており、敵の攻撃に対して多くの方法が提案されている。しかし、ロバスト性を高める一方で、クリーンな精度はある程度低下し、精度とロバスト性の間にトレードオフがあったことを意味する。本稿では, トレードオフ問題に対処するため, 標準学習モデルとロバスト学習モデルとのフィルタの重み分布の差について理論的に検討し, 静的ニューラルネットワークの本質的特性であると主張し, 精度と対向ロバスト性を同時に根本的に改善することが困難である。そこで本研究では,AW-Net(Adversarial Weight-Varied Network)と呼ばれる動的ネットワークアーキテクチャを提案する。 AW-Netは、対向ルータが生成する制御信号に基づいて、ネットワークの重みを適応的に調整する。動的ネットワークアーキテクチャの利点として、クリーンで逆の例は異なるネットワーク重みで処理できるため、精度と逆の堅牢性の両方を高める可能性がある。一連の実験により、我々のAW-Netはクリーンな例と敵対的な例の両方を扱うのにアーキテクチャに優しいことが示され、最先端のロバストモデルよりも優れたトレードオフ性能が得られる。 Adversarial attacks have been proven to be potential threats to Deep Neural Networks (DNNs), and many methods are proposed to defend against adversarial attacks. However, while enhancing the robustness, the clean accuracy will decline to a certain extent, implying a trade-off existed between the accuracy and robustness. In this paper, to meet the trade-off problem, we theoretically explore the underlying reason for the difference of the filters' weight distribution between standard-trained and robust-trained models and then argue that this is an intrinsic property for static neural networks, thus they are difficult to fundamentally improve the accuracy and adversarial robustness at the same time. Based on this analysis, we propose a sample-wise dynamic network architecture named Adversarial Weight-Varied Network (AW-Net), which focuses on dealing with clean and adversarial examples with a "divide and rule" weight strategy. The AW-Net adaptively adjusts the network's weights based on regulation signals generated by an adversarial router, which is directly influenced by the input sample. 
Benefiting from the dynamic network architecture, clean and adversarial examples can be processed with different network weights, which provides the potential to enhance both accuracy and adversarial robustness. A series of experiments demonstrate that our AW-Net is architecture-friendly to handle both clean and adversarial examples and can achieve better trade-off performance than state-of-the-art robust models.	翻訳日:2024-06-07 04:36:49 公開日:2024-06-05
# PEARL:ロボットマニピュレーションのためのゼロショットのクロスタスク選好アライメントとロバスト報酬学習 PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation ( http://arxiv.org/abs/2306.03615v2 ) ライセンス: Link先を確認	Runze Liu, Yali Du, Fengshuo Bai, Jiafei Lyu, Xiu Li,	(参考訳) 選好に基づく強化学習(Reinforcement Learning, RL)では、大量の選好ラベルを取得するのに時間と費用がかかる。また、問い合わせて得た人間の選好は、新しいタスクには利用できない。本稿では、対象タスクの人間によるラベルを一切使わずに、タスク間の選好転移から方策を学習するZero-shot Cross-task Preference Alignment and Robust Reward Learning(PEARL)を提案する。我々の貢献には、転移と学習のプロセスを促進する2つの新しいコンポーネントが含まれる。 1つ目はCPA(Cross-task Preference Alignment)で、最適輸送によってタスク間で選好を転移する。 CPAの鍵となるアイデアは、Gromov-Wasserstein距離を使ってタスク間の軌道を整列させることであり、解として得られる最適輸送行列が軌道間の対応として機能する。対象タスクの選好は、この対応を重みとして、ソースタスクの選好ラベルの重み付き和として計算される。さらに、転移されたラベルからの頑健な学習を確保するために、報酬をガウス分布としてモデル化することで報酬の平均と不確実性の両方を考慮するロバスト報酬学習(RRL)を導入する。 Meta-WorldとRobomimicのロボット操作タスクに関する実験結果から、提案手法はタスク間で選好ラベルを正確に転移し、適切に振る舞う方策を学習できることが示された。特に、人間の選好がわずかしかない場合、本手法は既存の手法を大きく上回る。本手法のコードとビデオは https://sites.google.com/view/pearl-preference で公開されている。 In preference-based Reinforcement Learning (RL), obtaining a large number of preference labels is both time-consuming and costly. Furthermore, the queried human preferences cannot be utilized for the new tasks. In this paper, we propose Zero-shot Cross-task Preference Alignment and Robust Reward Learning (PEARL), which learns policies from cross-task preference transfer without any human labels of the target task. Our contributions include two novel components that facilitate the transfer and learning process. The first is Cross-task Preference Alignment (CPA), which transfers the preferences between tasks via optimal transport. The key idea of CPA is to use Gromov-Wasserstein distance to align the trajectories between tasks, and the solved optimal transport matrix serves as the correspondence between trajectories. The target task preferences are computed as the weighted sum of source task preference labels with the correspondence as weights. 
Moreover, to ensure robust learning from these transferred labels, we introduce Robust Reward Learning (RRL), which considers both reward mean and uncertainty by modeling rewards as Gaussian distributions. Empirical results on robotic manipulation tasks from Meta-World and Robomimic demonstrate that our method is capable of transferring preference labels across tasks accurately and then learns well-behaved policies. Notably, our approach significantly exceeds existing methods when there are few human preferences. The code and videos of our method are available at: https://sites.google.com/view/pearl-preference.	翻訳日:2024-06-07 04:36:49 公開日:2024-06-05
# エラーフィードバックはプリコンディショナーを正確に圧縮できる Error Feedback Can Accurately Compress Preconditioners ( http://arxiv.org/abs/2306.06098v5 ) ライセンス: Link先を確認	Ionut-Vlad Modoranu, Aleksei Kalinov, Eldar Kurtic, Elias Frantar, Dan Alistarh,	(参考訳) ディープネットワークの規模で損失に関する2次情報を活用することは、ディープラーニングのための現在の最適化器の性能を改善するための主要なアプローチの1つである。しかし、Full-Matrix Adagrad (GGT) やMatrix-Free Approximate Curvature (M-FAC) のような正確な完全行列プリコンディショニングのための既存のアプローチは、勾配のスライディングウィンドウを保持しなければならず、そのメモリ要求がモデル次元に対して乗算的に増えるため、小規模モデルに適用する場合でさえ膨大なストレージコストに悩まされる。本稿では、収束性を損なうことなく、実用上プリコンディショナーを最大2桁圧縮できる、新しく効率的なエラーフィードバック手法により、この問題に対処する。具体的には、勾配情報をプリコンディショナーに入力する\emph{前に}、スパース化または低ランク圧縮によって圧縮し、圧縮誤差を以降の反復にフィードバックする。ディープニューラルネットワークの実験により、このアプローチは完全行列プリコンディショナーを精度損失なく最大99\%のスパース性まで圧縮でき、GGTやM-FACのような完全行列プリコンディショナーのメモリオーバーヘッドを事実上取り除けることが示されている。コードは \url{https://github.com/IST-DASLab/EFCP} で利用可能である。 Leveraging second-order information about the loss at the scale of deep networks is one of the main lines of approach for improving the performance of current optimizers for deep learning. Yet, existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC) suffer from massive storage costs when applied even to small-scale models, as they must store a sliding window of gradients, whose memory requirements are multiplicative in the model dimension. In this paper, we address this issue via a novel and efficient error-feedback technique that can be applied to compress preconditioners by up to two orders of magnitude in practice, without loss of convergence. Specifically, our approach compresses the gradient information via sparsification or low-rank compression \emph{before} it is fed into the preconditioner, feeding the compression error back into future iterations. 
Experiments on deep neural networks show that this approach can compress full-matrix preconditioners to up to 99\% sparsity without accuracy loss, effectively removing the memory overhead of full-matrix preconditioners such as GGT and M-FAC. Our code is available at \url{https://github.com/IST-DASLab/EFCP}.	翻訳日:2024-06-07 04:36:49 公開日:2024-06-05
# SqueezeLLM: Dense-and-Sparse量子化 SqueezeLLM: Dense-and-Sparse Quantization ( http://arxiv.org/abs/2306.07629v4 ) ライセンス: Link先を確認	Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer,	(参考訳) 生成型大規模言語モデル(LLM)は、幅広いタスクに対して顕著な結果を示した。しかしながら、これらのモデルを推論用にデプロイすることは、前例のないリソース要件のため、重大な課題となっている。これにより、既存のデプロイメントフレームワークでは、複雑でコストがかかるマルチGPU推論パイプラインの使用や、より小型で性能の低いモデルの使用を余儀なくされている。本研究では、LLMを用いた生成推論の主なボトルネックは、特に単一バッチ推論において、計算ではなくメモリ帯域幅であることを示す。量子化は、精度を下げて重みを表現することで有望な解決策として現れてきたが、これまでの試みは、しばしば顕著な性能劣化をもたらした。そこで我々は、学習後量子化フレームワークであるSqueezeLLMを導入し、最大3ビットの超低精度でのロスレス圧縮を可能にするとともに、同じメモリ制約下でより高い量子化性能を実現する。本フレームワークには2つの新しいアイデアが組み込まれている。 (i) 2次情報に基づいて最適なビット精度割り当てを探索する、感度に基づく非一様量子化。 (ii) 外れ値と感度の高い重み値を効率的なスパース形式で保持するDense-and-Sparse分解。 LLaMAモデルに適用した場合、我々の3ビット量子化は、同じメモリ要件の最先端手法と比較して、FP16ベースラインからのパープレキシティギャップを最大2.1倍削減する。さらに、A6000 GPUにデプロイすると、我々の量子化モデルはベースラインと比較して最大2.3倍のスピードアップを達成する。コードは https://github.com/SqueezeAILab/SqueezeLLM で利用可能である。 Generative Large Language Models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented resource requirements. This has forced existing deployment frameworks to use multi-GPU inference pipelines, which are often complex and costly, or to use smaller and less performant models. In this work, we demonstrate that the main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, specifically for single batch inference. While quantization has emerged as a promising solution by representing weights with reduced precision, previous efforts have often resulted in notable performance degradation. To address this, we introduce SqueezeLLM, a post-training quantization framework that not only enables lossless compression to ultra-low precisions of up to 3-bit, but also achieves higher quantization performance under the same memory constraint. 
Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format. When applied to the LLaMA models, our 3-bit quantization significantly reduces the perplexity gap from the FP16 baseline by up to 2.1x as compared to the state-of-the-art methods with the same memory requirement. Furthermore, when deployed on an A6000 GPU, our quantized models achieve up to 2.3x speedup compared to the baseline. Our code is available at https://github.com/SqueezeAILab/SqueezeLLM.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
# 分布外一般化のためのグラフ構造と特徴補間 Graph Structure and Feature Extrapolation for Out-of-Distribution Generalization ( http://arxiv.org/abs/2306.08076v2 ) ライセンス: Link先を確認	Xiner Li, Shurui Gui, Youzhi Luo, Shuiwang Ji,	(参考訳) アウト・オブ・ディストリビューション(OOD)の一般化は、テスト分布がトレーニング分布からシフトする一般的な学習シナリオを扱う。アプリケーション要求の増大と固有の複雑さにより、グラフOOD問題は特殊なソリューションを必要とします。データ中心の手法は、多くの汎用機械学習タスクのパフォーマンス向上を示すが、グラフOODの一般化に適したデータ拡張手法が特に存在しない。本研究では,非ユークリッド空間線型補間の新しい設計法により,グラフOOD一般化を実現することを提案する。提案手法は,OODグラフデータを生成するために,構造空間と特徴空間の両方を外挿する。我々の設計は、根底にある因果機構を損なうことなく、OODサンプルを特定のシフトのために調整する。理論的解析と実験結果から,目標シフトの解法における本手法の有効性が証明された。 Out-of-distribution (OOD) generalization deals with the prevalent learning scenario where test distribution shifts from training distribution. With rising application demands and inherent complexity, graph OOD problems call for specialized solutions. While data-centric methods exhibit performance enhancements on many generic machine learning tasks, there is a notable absence of data augmentation methods tailored for graph OOD generalization. In this work, we propose to achieve graph OOD generalization with the novel design of non-Euclidean-space linear extrapolation. The proposed augmentation strategy extrapolates both structure and feature spaces to generate OOD graph data. Our design tailors OOD samples for specific shifts without corrupting underlying causal mechanisms. Theoretical analysis and empirical results evidence the effectiveness of our method in solving target shifts, showing substantial and constant improvements across various graph OOD tasks.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
# 実世界のRAW画像からの効率的なHDR再構成 Efficient HDR Reconstruction from Real-World Raw Images ( http://arxiv.org/abs/2306.10311v5 ) ライセンス: Link先を確認	Qirui Yang, Yihao Liu, Qihua Chen, Huanjing Yue, Kun Li, Jingyu Yang,	(参考訳) エッジデバイスでの高解像度スクリーンの普及は、効率的な高ダイナミックレンジ(HDR)アルゴリズムへの強い需要を刺激する。しかし、既存の多くのHDR手法は不満足な結果をもたらすか、計算やメモリ資源を消費しすぎるかのいずれかであり、実際には高解像度の画像(通常12メガピクセル以上)への応用を妨げる。加えて、既存のHDRデータセット収集手法は労働集約的であることが多い。本研究では,HDRを生画像から直接再構成し,モバイルデバイスの展開に寄与する新しいニューラルネットワーク構造を探索する優れた機会を見出した。我々は,(1)高速かつ堅牢なHDRを実現するために構造的再パラメータ化手法RepUNetを開発し,(2)新しい計算生HDRデータ生成パイプラインを設計し,リアルな生HDRデータセットRealRaw-HDRを構築し,(3)限られた帯域幅条件下での動作ゴーストを緩和するためのプラグアンドプレイ動作アライメントロスを提案する。我々のモデルは830K未満のパラメータを含み、RTX 3090 GPUを用いて4K解像度の画像を処理するのに3ms未満である。このモデルでは,PSNR,SSIM,色差測定において,最先端HDR法よりも高い性能を示した。 The widespread usage of high-definition screens on edge devices stimulates a strong demand for efficient high dynamic range (HDR) algorithms. However, many existing HDR methods either deliver unsatisfactory results or consume too much computational and memory resources, hindering their application to high-resolution images (usually with more than 12 megapixels) in practice. In addition, existing HDR dataset collection methods often are labor-intensive. In this work, in a new aspect, we discover an excellent opportunity for HDR reconstructing directly from raw images and investigating novel neural network structures that benefit the deployment of mobile devices. Our key insights are threefold: (1) we develop a lightweight-efficient HDR model, RepUNet, using the structural re-parameterization technique to achieve fast and robust HDR; (2) we design a new computational raw HDR data formation pipeline and construct a real-world raw HDR dataset, RealRaw-HDR; (3) we propose a plug-and-play motion alignment loss to mitigate motion ghosting under limited bandwidth conditions. Our model contains less than 830K parameters and takes less than 3 ms to process an image of 4K resolution using one RTX 3090 GPU. 
While being highly efficient, our model also outperforms the state-of-the-art HDR methods in terms of PSNR, SSIM, and a color difference metric.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
# 部分空間に制限された最適ミキサーと安定化形式 Optimal mixers restricted to subspaces and the stabilizer formalism ( http://arxiv.org/abs/2306.17083v4 ) ライセンス: Link先を確認	Franz G. Fuchs, Ruben Pariente Bassa,	(参考訳) 与えられた部分空間を保存するミキサーを理解し、かつ構築するための新しい形式を提示する。この方法は、誤り訂正符号で使われる安定化器形式と結び付け、それを利用する。これは、組合せ最適化問題を解くための一般的なメタヒューリスティックである量子近似最適化アルゴリズム(QAOA)を、問題の制約によって大きいが容易に指定できる実行可能部分空間が生じる設定に適用する場合に有用である。提案手法は、制御NOTゲートの数の点で資源効率のよいミキサーを構築する体系的な方法を提供し、よく知られたXミキサーとXYミキサーの一般化、およびGroverミキサーの緩和として理解できる。すなわち、任意の部分空間の基底が与えられれば、その部分空間を保存する資源効率のよいミキサーを構築できる。示した数値例では、従来の結果と比較してCXゲートが劇的に減少している。この手法は、部分空間を安定化器Sの符号空間に分割し、これらの符号空間に関連する論理回転Xゲートを順次適用するものとして理解できるため、我々のアプローチを論理X-Mixerあるいは論理X QAOA($\textbf{LX-QAOA}$)と呼ぶ。全体として、この新しい視点が量子アルゴリズムの発展に関するさらなる洞察に繋がることを願っている。 We present a novel formalism to both understand and construct mixers that preserve a given subspace. The method connects and utilizes the stabilizer formalism that is used in error correcting codes. This can be useful in the setting when the quantum approximate optimization algorithm (QAOA), a popular meta-heuristic for solving combinatorial optimization problems, is applied in the setting where the constraints of the problem lead to a feasible subspace that is large but easy to specify. The proposed method gives a systematic way to construct mixers that are resource efficient in the number of controlled not gates and can be understood as a generalization of the well-known X and XY mixers and a relaxation of the Grover mixer: Given a basis of any subspace, a resource efficient mixer can be constructed that preserves the subspace. The numerical examples provided show a dramatic reduction of CX gates when compared to previous results. We call our approach logical X-Mixer or logical X QAOA ($\textbf{LX-QAOA}$), since it can be understood as dividing the subspace into code spaces of stabilizers S and consecutively applying logical rotational X gates associated with these code spaces. Overall, we hope that this new perspective can lead to further insight into the development of quantum algorithms.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
# 積分ゆらぎ定理とトレース保存写像 Integral fluctuation theorems and trace-preserving map ( http://arxiv.org/abs/2307.02705v3 ) ライセンス: Link先を確認	Zhiqiang Huang,	(参考訳) 詳細なゆらぎ定理はエントロピー生成確率の生成関数における対称性を意味する。積分ゆらぎ定理は、この対称性と確率の正規化から直接従う。本稿では,構築されたマッピングに計測と進化を統合することで,生成関数を書き換える。この写像は完全に正であり、元の積分FTはこれらの構築された写像のトレース保存性によって決定される。両浴間の固有状態変動定理と熱交換を議論し,本手法の利便性について述べる。この手法は準確率の生成関数にも適用でき、ここではこのアプローチから自然に生じるペッツの回復写像を観察する。 The detailed fluctuation theorem implies symmetry in the generating function of entropy production probability. The integral fluctuation theorem directly follows from this symmetry and the normalization of the probability. In this paper, we rewrite the generating function by integrating measurements and evolution into a constructed mapping. This mapping is completely positive, and the original integral FT is determined by the trace-preserving property of these constructed maps. We illustrate the convenience of this method by discussing the eigenstate fluctuation theorem and heat exchange between two baths. This set of methods is also applicable to the generating functions of quasi-probability, where we observe the Petz recovery map arising naturally from this approach.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
# Push Past Green: 植物の葉を動かしてその背後を見ることを学ぶ Push Past Green: Learning to Look Behind Plant Foliage by Moving It ( http://arxiv.org/abs/2307.03175v2 ) ライセンス: Link先を確認	Xiaoyu Zhang, Saurabh Gupta,	(参考訳) 自律農業の応用(例えば、検査、表現型解析、果物の摘み取りなど)は、葉や枝の後ろを見るために植物の葉を操作する必要がある。部分的な可視性、極端な乱雑さ、細い構造、そして植物の未知の形状とダイナミクスが、そのような操作を困難にしている。我々はデータ駆動の手法でこれらの課題に取り組む。自己教師あり学習によって訓練するSRPNetは、与えられた植物に対して候補アクションを実行したときに、どの空間が露出するかを予測するニューラルネットワークである。我々はSRPNetとクロスエントロピー法を用いて、植物の葉の下の空間を露出させるのに有効なアクションを予測する。さらに、SRPNetは、どれだけの空間が露出するかだけでなく、どこが露出するかも予測するため、植物の葉の下の空間を段階的により多く露出させる一連のアクションを実行することができる。人工植物(つる植物)と実植物(Dracaena)を対象に、物理的なテストベッド上で、新しい植物構成への一般化をテストする2つの設定を含む5つの設定で実験を行った。実験の結果、手法全体であるPPGが競争力のある手作りの探索法より有効であること、またSRPNetが手作りのダイナミクスモデルおよび関連するアブレーションより有効であることが示された。 Autonomous agriculture applications (e.g., inspection, phenotyping, plucking fruits) require manipulating the plant foliage to look behind the leaves and the branches. Partial visibility, extreme clutter, thin structures, and unknown geometry and dynamics for plants make such manipulation challenging. We tackle these challenges through data-driven methods. We use self-supervision to train SRPNet, a neural network that predicts what space is revealed on execution of a candidate action on a given plant. We use SRPNet with the cross-entropy method to predict actions that are effective at revealing space beneath plant foliage. Furthermore, as SRPNet does not just predict how much space is revealed but also where it is revealed, we can execute a sequence of actions that incrementally reveal more and more space beneath the plant foliage. We experiment with a synthetic (vines) and a real plant (Dracaena) on a physical test-bed across 5 settings including 2 settings that test generalization to novel plant configurations. Our experiments reveal the effectiveness of our overall method, PPG, over a competitive hand-crafted exploration method, and the effectiveness of SRPNet over a hand-crafted dynamics model and relevant ablations.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
# バス工学による光格子中のフロケットトポロジカル絶縁体の散逸性 Dissipative preparation of a Floquet topological insulator in an optical lattice via bath engineering ( http://arxiv.org/abs/2307.03739v3 ) ライセンス: Link先を確認	Alexander Schnell, Christof Weitenberg, André Eckardt,	(参考訳) フロケット工学は、光学格子中の電荷ニュートラル原子のトポロジカルに非自明なバンド構造を実現するための重要なツールである。しかし, 非自明な準エネルギー帯を完全充填したフェルミオンのトポロジカルバンド絶縁体型状態の調製は, 駆動加熱と不完全な断熱状態(トポロジカル遷移が通過する際の不可避ギャップ閉鎖によって引き起こされる)により困難である。提案された別の手順は、そのような状態、すなわちシステムと貯水池を結合する際に生じる定常状態として散逸的に準備することである。ここでは、熱浴として働く第2の原子種によって与えられる弱相互作用するボース凝縮物にシステムを結合する具体的なスキームについて論じる。我々の戦略は、浴室粒子のポテンシャルのエンジニアリングに依存しており、2次元系に垂直な弱い結合管を占有する。 Floquet-Born-Markov理論を用いて、駆動散逸系の結果として生じる非平衡定常状態がトポロジカル絶縁体に近似することを示す。異常なフロケ位相絶縁体の近似安定化の兆候も見いだすが、これは平衡で実現不可能な状態である。 Floquet engineering is an important tool for realizing topologically nontrivial band structures for charge-neutral atoms in optical lattices. However, the preparation of a topological-band-insulator-type state of fermions, with one nontrivial quasi-energy band filled completely and the others empty, is challenging as a result of both driving induced heating as well as imperfect adiabatic state preparation (with the latter induced by the unavoidable gap closing when passing the topological transition). An alternative procedure that has been proposed is to prepare such states dissipatively, i.e. as a steady state that emerges when coupling the system to reservoirs. Here we discuss a concrete scheme that couples the system to a weakly interacting Bose condensate given by second atomic species acting as a heat bath. Our strategy relies on the engineering of the potential for the bath particles, so that they occupy weakly coupled tubes perpendicular to the two-dimensional system. Using Floquet-Born-Markov theory, we show that the resulting nonequilibrium steady state of the driven-dissipative system approximates a topological insulator. 
We even find indications for the approximate stabilization of an anomalous Floquet topological insulator, a state that is impossible to realize in equilibrium.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
# 大規模言語モデルの時代に忘れられる権利:含意、課題、解決策 Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions ( http://arxiv.org/abs/2307.03941v4 ) ライセンス: Link先を確認	Dawen Zhang, Pamela Finckenberg-Broman, Thong Hoang, Shidong Pan, Zhenchang Xing, Mark Staples, Xiwei Xu,	(参考訳) Google Spain SL、Google Inc. v AEPD、Mario Costeja Gonz\'alezの裁定により最初に制定されたRTBFは、後に欧州連合の一般データ保護規則(GDPR)の下で、個人が個人データを削除する権利を廃止する権利として含まれた。具体的には、検索結果から情報を除外するために、個人が組織にリクエストを送ることができる。それは技術の進化の結果、重要な創発的な権利であった。近年,Large Language Models (LLM) が開発され,チャットボットでの利用により,LLM対応ソフトウェアシステムが普及している。しかし、RTBFから除外されることはない。検索エンジンが使用するインデックス化手法と比較して、LLMは情報を全く異なる方法で保存し、処理する。これにより、RTBFに準拠する上で新たな課題が生じる。本稿では、これらの課題を探求し、差分プライバシー、機械学習、モデル編集、ガードレールの使用など、RTBFの技術的ソリューションの実装方法に関する洞察を提供する。 AIの急速な進歩と、この強力な技術を規制する必要性の高まりにより、RTBFのケースから学んだことは、技術実践者、法律専門家、組織、当局に貴重な教訓を提供することができる。 The Right to be Forgotten (RTBF) was first established as the result of the ruling of Google Spain SL, Google Inc. v AEPD, Mario Costeja Gonz\'alez, and was later included as the Right to Erasure under the General Data Protection Regulation (GDPR) of European Union to allow individuals the right to request personal data be deleted by organizations. Specifically for search engines, individuals can send requests to organizations to exclude their information from the query results. It was a significant emergent right as the result of the evolution of technology. With the recent development of Large Language Models (LLMs) and their use in chatbots, LLM-enabled software systems have become popular. But they are not excluded from the RTBF. Compared with the indexing approach used by search engines, LLMs store, and process information in a completely different way. This poses new challenges for compliance with the RTBF. In this paper, we explore these challenges and provide our insights on how to implement technical solutions for the RTBF, including the use of differential privacy, machine unlearning, model editing, and guardrails. 
With the rapid advancement of AI and the increasing need to regulate this powerful technology, learning from the case of the RTBF can provide valuable lessons for technical practitioners, legal experts, organizations, and authorities.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
# ChatDev: ソフトウェア開発のためのコミュニケーションエージェント ChatDev: Communicative Agents for Software Development ( http://arxiv.org/abs/2307.07924v5 ) ライセンス: Link先を確認	Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun,	(参考訳) ソフトウェア開発は、多様なスキルを持つ複数のメンバ間の協力を必要とする複雑なタスクです。多くの研究が、デザイン、コーディング、テストなど、ウォーターフォールモデルの特定のフェーズを改善するためにディープラーニングを使用していた。しかし、各フェーズのディープラーニングモデルにはユニークな設計が必要であり、様々なフェーズにわたる技術的不整合が生じ、断片化され、非効率な開発プロセスがもたらされる。本稿では,大規模言語モデル(LLM)によって駆動される特殊なエージェントを(チャットチェーンを介して)コミュニケーションする方法と(コミュニケーション脱ハロシン化を介して)コミュニケーションする方法でガイドするチャット駆動ソフトウェア開発フレームワークChatDevを紹介する。これらのエージェントは、言語ベースの統一コミュニケーションを通じて設計、コーディング、テストフェーズに積極的に貢献する。自然言語の利用はシステム設計に有利であり、プログラミング言語でのコミュニケーションはデバッグに役立ちます。このパラダイムは,LLMエージェント間の自律的タスク解決のための統合ブリッジとして,言語コミュニケーションが多エージェント協調を促進することを示す。コードとデータはhttps://github.com/OpenBMB/ChatDevで公開されている。 Software development is a complex task that necessitates cooperation among multiple members with diverse skills. Numerous studies used deep learning to improve specific phases in a waterfall model, such as design, coding, and testing. However, the deep learning model in each phase requires unique designs, leading to technical inconsistencies across various phases, which results in a fragmented and ineffective development process. In this paper, we introduce ChatDev, a chat-powered software development framework in which specialized agents driven by large language models (LLMs) are guided in what to communicate (via chat chain) and how to communicate (via communicative dehallucination). These agents actively contribute to the design, coding, and testing phases through unified language-based communication, with solutions derived from their multi-turn dialogues. We found their utilization of natural language is advantageous for system design, and communicating in programming language proves helpful in debugging. 
This paradigm demonstrates how linguistic communication facilitates multi-agent collaboration, establishing language as a unifying bridge for autonomous task-solving among LLM agents. The code and data are available at https://github.com/OpenBMB/ChatDev.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
# 非線型射影による線形再帰の普遍性:有限幅保証と複素固有値の利点 Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues ( http://arxiv.org/abs/2307.11888v3 ) ライセンス: Link先を確認	Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith,	(参考訳) 線形RNNに基づくディープニューラルネットワークは、シーケンスモデリングの競争的アプローチとして、位置対応型MLPにインターリーブされた。そのようなアーキテクチャの例として、S4、LRU、Mambaのような状態空間モデル(SSM)がある。これらのアーキテクチャの有効性と計算効率を実証した実験的な証拠にもかかわらず、それらの表現力は、特に実際に重要な特定の選択(例えば、慎重に設計された初期化分布と複素数の潜在的使用)に関して、比較的未解明のままである。本稿では,MLPと実あるいは複素線形対角線再帰を組み合わせることで,正規因果列列列列の任意に正確な近似が導かれることを示す。線形RNNは入力シーケンスのロスレスエンコーディングを提供し、MPPはこのエンコーディングに対して非線形処理を行う。実対角線リカレンス(英語版)は、このアーキテクチャにおいて普遍性を達成するのに十分であることを示す一方で、単位円板近傍の複雑な固有値(つまり、S4で最も成功した戦略)を用いることは、情報保存においてRNNに大いに役立つことを証明している。我々はこの発見を、消滅する勾配問題と結びつけ、我々の主張を支持する実験を提供する。 Deep neural networks based on linear RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches for sequence modeling. Examples of such architectures include state-space models (SSMs) like S4, LRU, and Mamba: recently proposed models that achieve promising performance on text, genetics, and other data that require long-range reasoning. Despite experimental evidence highlighting these architectures' effectiveness and computational efficiency, their expressive power remains relatively unexplored, especially in connection to specific choices crucial in practice - e.g., carefully designed initialization distribution and potential use of complex numbers. In this paper, we show that combining MLPs with both real or complex linear diagonal recurrences leads to arbitrarily precise approximation of regular causal sequence-to-sequence maps. At the heart of our proof, we rely on a separation of concerns: the linear RNN provides a lossless encoding of the input sequence, and the MLP performs non-linear processing on this encoding. 
While we show that real diagonal linear recurrences are enough to achieve universality in this architecture, we prove that employing complex eigenvalues near the unit disk - i.e., empirically the most successful strategy in S4 - greatly helps the RNN in storing information. We connect this finding with the vanishing gradient issue and provide experiments supporting our claims.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
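To make the point about eigenvalue magnitude and information storage concrete, the sketch below runs a diagonal linear recurrence on an impulse and compares how much of the first input survives in two channels. The recurrence form `x_k = lambda * x_{k-1} + b * u_k`, the eigenvalue values, and the sequence length are assumptions chosen for illustration; the position-wise MLP that the paper places after the recurrence is omitted.

```python
import cmath


def diagonal_linear_rnn(inputs, eigenvalues, input_weights):
    """Run x_k = lambda * x_{k-1} + b * u_k independently for each diagonal channel."""
    state = [0j] * len(eigenvalues)
    for u in inputs:
        state = [lam * x + b * u for lam, x, b in zip(eigenvalues, state, input_weights)]
    return state


# An impulse at step 0 followed by 99 zeros: the final state shows how much of
# the first input is still present after the remaining 99 updates.
impulse = [1.0] + [0.0] * 99

# Two channels with the same phase but different magnitudes (assumed values).
near_unit = cmath.rect(0.99, 0.3)   # |lambda| = 0.99, close to the unit circle
far_inside = cmath.rect(0.50, 0.3)  # |lambda| = 0.50, well inside the disk

final = diagonal_linear_rnn(impulse, [near_unit, far_inside], [1.0, 1.0])
memory_near, memory_far = abs(final[0]), abs(final[1])
```

Here `memory_near` equals `0.99 ** 99` (roughly a third of the original signal), while `memory_far` equals `0.5 ** 99`, which is numerically negligible. This is the same decay that drives vanishing gradients through the recurrence.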
# ゼロショットモデル属性のモデル合成 Model Synthesis for Zero-Shot Model Attribution ( http://arxiv.org/abs/2307.15977v2 ) ライセンス: Link先を確認	Tianyun Yang, Juan Cao, Danding Wang, Chang Xu,	(参考訳) 現在、生成モデルは、芸術、デザイン、人間とコンピュータの相互作用といった様々な分野を形作っているが、著作権侵害やコンテンツ管理に関する課題も伴っている。既存の研究では、生成した画像のユニークな指紋を識別し、生成した画像をソースモデルに属性付けすることができる。しかし、既存の手法は、分類器訓練に含まれる静的セット内のモデルを特定することに制約されており、新しく出現した未確認モデルに動的に適応できない。このギャップを埋めるために,ゼロショット属性を生かした汎用型指紋抽出装置を開発し,トレーニング中に露出することなく効果的に未知のモデルを特徴付けることを目的とする。本手法の中心は,実世界の生成モデルの指紋パターンを模倣した多数の合成モデルを生成するモデル合成技術である。合成手法の設計は, 基本生成モデルのアーキテクチャ構築ブロックとパラメータが指紋パターンにどのように影響するかの観察によって動機付けられ, 合成モデルの忠実度と多様性を検証した2つの設計指標によって検証される。本実験は, 合成モデルのみに特化して訓練された指紋抽出装置において, 様々な実世界の生成モデルに対して, 印象的なゼロショット一般化を実現し, 既存手法と比較して, 未知モデルにおけるモデル同定と検証精度を40%以上向上することを示した。 Nowadays, generative models are shaping various fields such as art, design, and human-computer interaction, yet accompanied by challenges related to copyright infringement and content management. In response, existing research seeks to identify the unique fingerprints on the images they generate, which can be leveraged to attribute the generated images to their source models. Existing methods, however, are constrained to identifying models within a static set included in the classifier training, failing to adapt to newly emerged unseen models dynamically. To bridge this gap, we aim to develop a generalized model fingerprint extractor capable of zero-shot attribution, effectively attributes unseen models without exposure during training. Central to our method is a model synthesis technique, which generates numerous synthetic models mimicking the fingerprint patterns of real-world generative models. The design of the synthesis technique is motivated by observations on how the basic generative model's architecture building blocks and parameters influence fingerprint patterns, and it is validated through two designed metrics that examine synthetic models' fidelity and diversity. 
Our experiments demonstrate that this fingerprint extractor, trained solely on synthetic models, achieves impressive zero-shot generalization on a wide range of real-world generative models, improving model identification and verification accuracy on unseen models by over 40% and 15%, respectively, compared to existing approaches.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
# UniAP: 混合整数擬似プログラミングによる層間および層内自動並列化 UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming ( http://arxiv.org/abs/2307.16375v3 ) ライセンス: Link先を確認	Hao Lin, Ke Wu, Jie Li, Jun Li, Wu-Jun Li,	(参考訳) 分散学習は、ディープラーニングモデル、特に大規模モデルのトレーニングに一般的に使用される。分散学習において、手動並列性(英語版)(MP)法はかなりの人的努力を必要とし、柔軟性に制限がある。したがって、並列戦略最適化プロセスを自動化するために、最近自動並列化法(AP)が提案されている。既存のAP法は、並列戦略の2つのカテゴリ(すなわち層間並列性と層間並列性)を共同で最適化しないため、準最適解に苦しむ。本論文では、混合整数二次計画法により層間および層間自動並列性を統一するUniAPと呼ばれる新しいAP手法を提案する。我々の知る限りでは、UniAPは並列戦略の2つのカテゴリを共同で最適化し、最適な解を見つけるための最初の並列手法である。実験の結果、UniAPは最先端のメソッドをスループット3.80$\times$で上回り、ストラテジー最適化時間を最大107$\times$で5つのTransformerベースのモデルで削減している。 Distributed learning is commonly used for training deep learning models, especially large models. In distributed learning, manual parallelism (MP) methods demand considerable human effort and have limited flexibility. Hence, automatic parallelism (AP) methods have recently been proposed for automating the parallel strategy optimization process. Existing AP methods suffer from sub-optimal solutions because they do not jointly optimize the two categories of parallel strategies (i.e., inter-layer parallelism and intra-layer parallelism). In this paper, we propose a novel AP method called UniAP, which unifies inter- and intra-layer automatic parallelism by mixed integer quadratic programming. To the best of our knowledge, UniAP is the first parallel method that can jointly optimize the two categories of parallel strategies to find an optimal solution. Experimental results show that UniAP outperforms state-of-the-art methods by up to 3.80$\times$ in throughput and reduces strategy optimization time by up to 107$\times$ across five Transformer-based models.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
# TempFuser: 長短期時間融合トランスフォーマーを用いた、機敏で戦術的かつアクロバティックな飛行機動の学習 TempFuser: Learning Agile, Tactical, and Acrobatic Flight Maneuvers Using a Long Short-Term Temporal Fusion Transformer ( http://arxiv.org/abs/2308.03257v3 ) ライセンス: Link先を確認	Hyunki Seong, David Hyunchul Shim,	(参考訳) ドッグファイトは、戦略的機動と機敏な航空機の空気力学の両方を包括的に理解する必要がある、航空アプリケーションにおける難しいシナリオである。航空エージェントは、長期的視点から戦闘機の戦術的に進化する機動を理解するだけでなく、短期的な視点から急速に変化する航空機の空気力学に反応することも必要である。本稿では, 複雑なドッグファイト問題における機敏で戦術的かつアクロバティックな飛行機動を学習できる, 新しい長短期時間融合トランスフォーマーアーキテクチャである TempFuser を紹介する。我々のアプローチでは、2つの異なる時間的遷移の埋め込みをトランスフォーマーベースのネットワークに統合し、航空エージェントの長期的戦術と短期的機敏性の両方を包括的に捉える。これらの視点を取り入れることで、我々のポリシーネットワークは、長期にわたって支配的な位置を確保し、機敏な敵機を効果的に上回る、エンドツーエンドの飛行コマンドを生成する。高忠実度飛行シミュレーターで訓練した後、我々のモデルは戦略的機動の実行をうまく学習し、様々な種類の敵機に対してベースラインのポリシーモデルより優れた性能を発揮する。特に,本モデルは,明示的な事前知識に頼ることなく,優れた仕様を持つ敵に直面しても,人間のようなアクロバティックな機動を示す。さらに,超音速・低高度の困難な状況において,頑健な追尾性能を示す。デモビデオは https://sites.google.com/view/tempfuser で公開されている。 Dogfighting is a challenging scenario in aerial applications that requires a comprehensive understanding of both strategic maneuvers and the aerodynamics of agile aircraft. The aerial agent needs to not only understand tactically evolving maneuvers of fighter jets from a long-term perspective but also react to rapidly changing aerodynamics of aircraft from a short-term viewpoint. In this paper, we introduce TempFuser, a novel long short-term temporal fusion transformer architecture that can learn agile, tactical, and acrobatic flight maneuvers in complex dogfight problems. Our approach integrates two distinct temporal transition embeddings into a transformer-based network to comprehensively capture both the long-term tactics and short-term agility of aerial agents. By incorporating these perspectives, our policy network generates end-to-end flight commands that secure dominant positions over the long term and effectively outmaneuver agile opponents. 
After training in a high-fidelity flight simulator, our model successfully learns to execute strategic maneuvers, outperforming baseline policy models against various types of opponent aircraft. Notably, our model exhibits human-like acrobatic maneuvers even when facing adversaries with superior specifications, all without relying on explicit prior knowledge. Moreover, it demonstrates robust pursuit performance in challenging supersonic and low-altitude situations. Demo videos are available at https://sites.google.com/view/tempfuser.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
# Copycatのパーセプトロン:集団学習でバリアを壊す The Copycat Perceptron: Smashing Barriers Through Collective Learning ( http://arxiv.org/abs/2308.03743v3 ) ライセンス: Link先を確認	Giovanni Catania, Aurélien Decelle, Beatriz Seoane,	(参考訳) 教師-学生のシナリオにおいて, 学生の重み間のハミング距離に比例した強磁性結合を, 適切なコスト関数を条件として, $y$結合二元パーセプトロンのモデルの平衡特性を特徴づける。最近の研究とは対照的に、各学生の一般化性能に影響を与える熱ノイズが存在するというより一般的な設定を解析する。非ゼロ温度条件では、レプリカのカップリングが$\alpha$の小さな値への位相図形の曲げにつながることが分かる: これは、自由エントロピーのランドスケープが、完全な一般化(すなわち教師)で解の周囲をより滑らかにし、シミュレートされたアナーリングのような標準的な熱更新アルゴリズムが教師の解にたどり着きやすくなり、非複製の場合、たとえ推論位相図の計算的 \textit{easy} 状態であってもメタスタブル状態に閉じ込められるのを避けることができることを示唆する。これらの結果は、最近推測されたReplicated Simulated Annealing (RSA) のベイズ最適性について、十分な数のレプリカに対して解析的および数値的な証拠を与える。学習の観点から、これらの結果は、複数の学生(この場合、同じデータをレビューする)が、協力的および連合的学習の文脈で活用できる特性として、同じルールを著しく高速かつ少ない例で学習できることを示唆している。 We characterize the equilibrium properties of a model of $y$ coupled binary perceptrons in the teacher-student scenario, subject to a suitable cost function, with an explicit ferromagnetic coupling proportional to the Hamming distance between the students' weights. In contrast to recent works, we analyze a more general setting in which thermal noise is present that affects each student's generalization performance. In the nonzero temperature regime, we find that the coupling of replicas leads to a bend of the phase diagram towards smaller values of $\alpha$: This suggests that the free entropy landscape gets smoother around the solution with perfect generalization (i.e., the teacher) at a fixed fraction of examples, allowing standard thermal updating algorithms such as Simulated Annealing to easily reach the teacher solution and avoid getting trapped in metastable states as it happens in the unreplicated case, even in the computationally \textit{easy} regime of the inference phase diagram. These results provide additional analytic and numerical evidence for the recently conjectured Bayes-optimal property of Replicated Simulated Annealing (RSA) for a sufficient number of replicas. 
From a learning perspective, these results also suggest that multiple students working together (in this case reviewing the same data) are able to learn the same rule both significantly faster and with fewer examples, a property that could be exploited in the context of cooperative and federated learning.	翻訳日:2024-06-07 04:26:20 公開日:2024-06-05
# 数学的検証のための大規模言語モデルにおける前方・後方推論 Forward-Backward Reasoning in Large Language Models for Mathematical Verification ( http://arxiv.org/abs/2308.07758v6 ) ライセンス: Link先を確認	Weisen Jiang, Han Shi, Longhui Yu, Zhengying Liu, Yu Zhang, Zhenguo Li, James T. Kwok,	(参考訳) 自己整合性(Self-Consistency)は、回答を伴う多様な推論チェーンをサンプリングし、多数決によって最終回答を選択する。これは前方推論に基づいており、飽和すると、より多くの推論チェーンをサンプリングしてもさらなる性能向上は得られない。性能をさらに向上するため、候補回答の検証に後方推論を導入する。具体的には、数学的なタスクに対して、質問中の数値をマスクし、単純なテンプレートで作成した後方質問に答えるよう、すなわち候補回答が与えられたときにマスクされた数値を予測するようLLMに求める。前方推論または後方推論を単独で用いる代わりに、検証のために FOrward と BAckward Reasoning を組み合わせる FOBAR を提案する。 6つの標準的な数学的データセットと3つのLLMに関する大規模な実験は、FOBARが最先端のパフォーマンスを達成することを示す。特に、FOBARは前方推論のみを用いるSelf-Consistencyを上回り、前方推論と後方推論の組み合わせがより優れていることを示す。さらに、FOBARは既存の検証手法よりも優れた性能を示し、後方推論に使用される単純なテンプレートと提案した組み合わせの有効性を示した。非数学的問題への拡張も議論され、実証的に検証される。 Self-Consistency samples diverse reasoning chains with answers and chooses the final answer by majority voting. It is based on forward reasoning and cannot further improve performance by sampling more reasoning chains when saturated. To further boost performance, we introduce backward reasoning to verify candidate answers. Specifically, for mathematical tasks, we mask a number in the question and ask the LLM to answer a backward question created by a simple template, i.e., to predict the masked number when a candidate answer is provided. Instead of using forward or backward reasoning alone, we propose FOBAR to combine FOrward and BAckward Reasoning for verification. Extensive experiments on six standard mathematical data sets and three LLMs show that FOBAR achieves state-of-the-art performance. In particular, FOBAR outperforms Self-Consistency, which uses forward reasoning alone, demonstrating that combining forward and backward reasoning is better. In addition, FOBAR performs better than existing verification methods, showing the effectiveness of the simple template used in backward reasoning and the proposed combination. 
Extensions to non-mathematical problems are also discussed and validated empirically.	翻訳日:2024-06-07 04:16:10 公開日:2024-06-05
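The abstract describes scoring candidate answers with both a forward majority vote and a backward check that tries to recover a masked number. One plausible way to combine the two signals is sketched below. The weighted geometric mean, the `alpha` weight, the `eps` smoothing, and all sample data are assumptions made for illustration; the abstract does not state the exact combination rule.

```python
from collections import Counter


def combine_forward_backward(forward_answers, backward_hits, backward_trials, alpha=0.5, eps=1e-8):
    """Score candidates by a weighted geometric mean of forward votes and backward checks."""
    votes = Counter(forward_answers)
    total_votes = sum(votes.values())
    # Forward score: share of sampled reasoning chains that ended in this answer.
    p_forward = {a: c / total_votes for a, c in votes.items()}
    # Backward score: share of backward chains that recovered the masked number,
    # normalized across candidates (eps avoids zeroing out a candidate entirely).
    raw = {a: backward_hits.get(a, 0) / backward_trials + eps for a in votes}
    norm = sum(raw.values())
    p_backward = {a: r / norm for a, r in raw.items()}
    scores = {a: (p_forward[a] ** alpha) * (p_backward[a] ** (1 - alpha)) for a in votes}
    return max(scores, key=scores.get), scores


# Hypothetical data: "18" wins the forward vote, but "20" passes backward checks far more often.
forward_answers = ["18", "18", "18", "20", "20"]
backward_hits = {"18": 1, "20": 9}
best, scores = combine_forward_backward(forward_answers, backward_hits, backward_trials=10)
```

With these made-up counts, plain majority voting would pick "18", while the combined score prefers "20" because it is far more consistent with the backward questions.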
# MultiPA:オープンレスポンスシナリオのためのマルチタスク音声発音評価モデル MultiPA: A Multi-task Speech Pronunciation Assessment Model for Open Response Scenarios ( http://arxiv.org/abs/2308.12490v2 ) ライセンス: Link先を確認	Yu-Wen Chen, Zhou Yu, Julia Hirschberg,	(参考訳) オープンレスポンスシナリオ用に設計された発音アセスメントモデルにより、ユーザーは実生活におけるコミュニケーションと同様の方法で言語スキルを実践することができる。しかし、従来のオープンレスポンスの発音評価モデルは、様々な面で総合的な評価を提供するのではなく、文レベルの精度などの単一の発音タスクに主に焦点を当てている。オープン応答に対する文レベルの精度, 流布度, 韻律, 単語レベルの精度評価を提供するマルチタスク発音評価モデルであるMultiPAを提案する。異なる発音課題間の相関について検討し,マルチタスク学習の利点を示した。我々のモデルは、既存のドメイン内データセットの最先端のパフォーマンスに達し、新たに収集したドメイン外データセットに効果的に一般化した。実世界の応用において,本モデルの実用性を示す実験結果が得られた。 Pronunciation assessment models designed for open response scenarios enable users to practice language skills in a manner similar to real-life communication. However, previous open-response pronunciation assessment models have predominantly focused on a single pronunciation task, such as sentence-level accuracy, rather than offering a comprehensive assessment in various aspects. We propose MultiPA, a Multitask Pronunciation Assessment model that provides sentence-level accuracy, fluency, prosody, and word-level accuracy assessment for open responses. We examined the correlation between different pronunciation tasks and showed the benefits of multi-task learning. Our model reached the state-of-the-art performance on existing in-domain data sets and effectively generalized to an out-of-domain dataset that we newly collected. The experimental results demonstrate the practical utility of our model in real-world applications.	翻訳日:2024-06-07 04:16:10 公開日:2024-06-05
# トポロジーによる解離学習 Disentanglement Learning via Topology ( http://arxiv.org/abs/2308.12696v4 ) ライセンス: Link先を確認	Nikita Balabin, Daria Voronkova, Ilya Trofimov, Evgeny Burnaev, Serguei Barannikov,	(参考訳) マルチスケールなトポロジ的損失項を付加することにより,不整合表現を学習するTopDis(トポロジカル・ディアンタングルメント)を提案する。ディスタングルメントは、ディープラーニングモデルの説明可能性と堅牢性、およびハイレベル認知へのステップにとって重要なデータ表現の重要な特性である。最先端の手法はVAEに基づいており、潜在変数の共分散を分解することを奨励する。データ多様体のトポロジ的性質を解析することにより、解離について異なる視点を採る。特に,データ多様体のトポロジ的類似性を最適化する。我々の知識を最大限に活用するために,本論文は,解離学習のための微分可能な位相損失を提案する最初の論文である。提案したTopDis損失は,再建品質を保ちながら,MIG,FacterVAEスコア,SAPスコア,DCIアンタングルメントスコアなどのアンタングルメントスコアを改善した。我々の手法は教師なしの方法で動作し、変動要因をラベル付けせずに問題に適用することができる。 TopDisの損失は、変動の要因が相関している場合でも機能する。さらに, 提案した位相損失を用いて, 訓練されたGANにおいて, 絡み合った方向を求める方法を示す。 We propose TopDis (Topological Disentanglement), a method for learning disentangled representations via adding a multi-scale topological loss term. Disentanglement is a crucial property of data representations substantial for the explainability and robustness of deep learning models and a step towards high-level cognition. The state-of-the-art methods are based on VAE and encourage the joint distribution of latent variables to be factorized. We take a different perspective on disentanglement by analyzing topological properties of data manifolds. In particular, we optimize the topological similarity for data manifolds traversals. To the best of our knowledge, our paper is the first one to propose a differentiable topological loss for disentanglement learning. Our experiments have shown that the proposed TopDis loss improves disentanglement scores such as MIG, FactorVAE score, SAP score, and DCI disentanglement score with respect to state-of-the-art results while preserving the reconstruction quality. Our method works in an unsupervised manner, permitting us to apply it to problems without labeled factors of variation. The TopDis loss works even when factors of variation are correlated. 
Additionally, we show how to use the proposed topological loss to find disentangled directions in a trained GAN.	翻訳日:2024-06-07 04:16:10 公開日:2024-06-05
# クーパー対スプリッターを用いたフェルミオン量子計算 Fermionic quantum computation with Cooper pair splitters ( http://arxiv.org/abs/2309.00447v4 ) ライセンス: Link先を確認	Kostas Vilkelis, Antonio Manesco, Juan Daniel Torres Luna, Sebastian Miles, Michael Wimmer, Anton Akhmerov,	(参考訳) 量子ビットではなく局所フェルミオンモード(LFM)を用いる普遍量子コンピュータの実践的実装を提案する。デバイスレイアウトは、ハイブリッド超伝導島で結合された量子ドットトンネルと、ドット間の可変容量結合からなる。クーパー対分割, 弾性コツネリング, クーロン相互作用のコヒーレント制御により, ブラヴィイとキタエフによって定義された量子ゲートの普遍的な集合を実現できることを示す。電荷量子ビットとの類似性のため、電荷ノイズがデコヒーレンスの主な原因になると期待する。このため、量子ドットが超伝導体に調整可能な結合を持つような代替設計も検討する。この第2のデバイス設計では、局所フェルミオンモードが電荷中立であるスイートスポットが存在し、ノイズ効果に敏感であることを示す。最後に、設計と実験的制約を比較し、それらを克服するための今後の取り組みを提案する。 We propose a practical implementation of a universal quantum computer that uses local fermionic modes (LFM) rather than qubits. The device layout consists of quantum dots tunnel coupled by a hybrid superconducting island and a tunable capacitive coupling between the dots. We show that coherent control of Cooper pair splitting, elastic cotunneling, and Coulomb interactions allows us to implement the universal set of quantum gates defined by Bravyi and Kitaev. Due to the similarity with charge qubits, we expect charge noise to be the main source of decoherence. For this reason, we also consider an alternative design where the quantum dots have tunable coupling to the superconductor. In this second device design, we show that there is a sweetspot for which the local fermionic modes are charge neutral, making the device insensitive to charge noise effects. Finally, we compare both designs and their experimental limitations and suggest future efforts to overcome them.	翻訳日:2024-06-07 04:16:10 公開日:2024-06-05
# CONFIDERAI: 説明可能で信頼性の高い人工知能のための新しいコンフォーマル・インタプリタブル・バイ・デザインスコア関数 CONFIDERAI: a novel CONFormal Interpretable-by-Design score function for Explainable and Reliable Artificial Intelligence ( http://arxiv.org/abs/2309.01778v3 ) ライセンス: Link先を確認	Sara Narteni, Alberto Carlevaro, Fabrizio Dabbene, Marco Muselli, Maurizio Mongelli,	(参考訳) 日々の生活は人工知能の影響をますます受けており、機械学習アルゴリズムが誰にとっても信頼性と信頼性を持つように設計されていることに疑いの余地はない。具体的には、コンピュータ科学者は、人工知能システムが説明可能性、堅牢性、透明性、公正性、プライバシーの5つの柱を満たす場合、安全で信頼性の高いシステムだと考えている。これら5つに加えて,機械学習者が期待するようにシステムが振る舞う確率的保証という6つの基本的側面を提案する。本稿では,ルールの予測能力,ルール境界内の点の幾何学的位置,および規則間の重なり合いを利用したルールベース分類器の新しいスコア関数を,幾何学的規則類似項の定義により定義することにより,共形予測と説明可能な機械学習を関連付ける手法を提案する。さらに, 整合性保証を満たす特徴空間内の領域定義の問題に対処し, 整合性臨界集合の定義を利用して, 対象クラスの性能を改善した新しいルールを実現する方法を示す。全体的な方法論は、ドメイン名サーバのトンネリング検出や心臓血管疾患の予測など、現実の関心のあるいくつかのデータセットで有望な結果でテストされている。 Everyday life is increasingly influenced by artificial intelligence, and there is no question that machine learning algorithms must be designed to be reliable and trustworthy for everyone. Specifically, computer scientists consider an artificial intelligence system safe and trustworthy if it fulfills five pillars: explainability, robustness, transparency, fairness, and privacy. In addition to these five, we propose a sixth fundamental aspect: conformity, that is, the probabilistic assurance that the system will behave as the machine learner expects. In this paper, we present a methodology to link conformal prediction with explainable machine learning by defining a new score function for rule-based classifiers that leverages rules predictive ability, the geometrical position of points within rules boundaries and the overlaps among rules as well, thanks to the definition of a geometrical rule similarity term. 
Furthermore, we address the problem of defining regions in the feature space where conformal guarantees are satisfied, by exploiting the definition of conformal critical set and showing how this set can be used to achieve new rules with improved performance on the target class. The overall methodology is tested with promising results on several datasets of real-world interest, such as domain name server tunneling detection or cardiovascular disease prediction.	翻訳日:2024-06-07 04:16:10 公開日:2024-06-05
# オンライン連続学習におけるモメンタム知識蒸留の再考 Rethinking Momentum Knowledge Distillation in Online Continual Learning ( http://arxiv.org/abs/2309.02870v2 ) ライセンス: Link先を確認	Nicolas Michel, Maorong Wang, Ling Xiao, Toshihiko Yamasaki,	(参考訳) オンライン連続学習(OCL)は、複数の分類タスクが順番に現れる連続データストリーム上で、ニューラルネットワークをトレーニングする問題に対処する。オフラインの連続学習とは対照的に、データはOCLで一度しか見ることができない。この文脈では、リプレイベースの戦略は印象的な成果を上げており、ほとんどの最先端のアプローチはそれらに大きく依存している。知識蒸留(KD)はオフラインの連続学習で広く使われているが、OCLでは高い可能性にもかかわらず未公開のままである。本稿では、OCLにKDを適用する際の課題を分析し、実証的な正当化を与える。我々は,多くの旗艦OCL法にMKD(Momentum Knowledge Distillation)を適用するための直接的かつ効果的な手法を導入し,既存のアプローチを強化する能力を実証する。 ImageNet100の既存の最先端の精度を10\%以上向上することに加えて、私たちは、OCLでのトレーニング中にMKDの内部力学と影響に光を当てました。リプレイと同様、MKDはOCLの中心的なコンポーネントであるべきだと我々は主張する。コードは \url{https://github.com/Nicolas1203/mkd_ocl} で公開されている。 Online Continual Learning (OCL) addresses the problem of training neural networks on a continuous data stream where multiple classification tasks emerge in sequence. In contrast to offline Continual Learning, data can be seen only once in OCL, which is a very severe constraint. In this context, replay-based strategies have achieved impressive results and most state-of-the-art approaches heavily depend on them. While Knowledge Distillation (KD) has been extensively used in offline Continual Learning, it remains under-exploited in OCL, despite its high potential. In this paper, we analyze the challenges in applying KD to OCL and give empirical justifications. We introduce a direct yet effective methodology for applying Momentum Knowledge Distillation (MKD) to many flagship OCL methods and demonstrate its capabilities to enhance existing approaches. In addition to improving existing state-of-the-art accuracy by more than $10\%$ points on ImageNet100, we shed light on MKD internal mechanics and impacts during training in OCL. We argue that similar to replay, MKD should be considered a central component of OCL. The code is available at \url{https://github.com/Nicolas1203/mkd_ocl}.	翻訳日:2024-06-07 04:16:10 公開日:2024-06-05
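As a reading aid for the term "momentum" in the abstract above, the sketch below shows one common form of momentum distillation: a teacher whose weights are an exponential moving average of the student's, plus a soft-target loss. Both the EMA rule and the cross-entropy loss are assumptions based on common practice; the abstract does not give the paper's exact formulation, and the weights here are plain lists rather than real network parameters.

```python
import math


def ema_update(teacher, student, momentum=0.99):
    """Move each teacher weight a small step toward the matching student weight."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def distillation_loss(student_probs, teacher_probs):
    """Cross-entropy of the student's prediction against the teacher's soft targets."""
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs) if t > 0)


# Hypothetical two-weight "networks": the teacher starts at zero and trails the student.
teacher = [0.0, 0.0]
student = [1.0, 2.0]
teacher = ema_update(teacher, student)  # one step: 1% of the way to the student
uniform_loss = distillation_loss([0.5, 0.5], [0.5, 0.5])  # equals log(2)
```

Because the teacher changes slowly, it acts as a smoothed snapshot of earlier student states, which is one intuition for why such a teacher could help when each sample in the stream is seen only once.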
# 大規模言語モデルはソーシャルメディア利用者の心理的配置を推測できる Large Language Models Can Infer Psychological Dispositions of Social Media Users ( http://arxiv.org/abs/2309.08631v2 ) ライセンス: Link先を確認	Heinrich Peters, Sandra Matz,	(参考訳) 大規模言語モデル(LLM)は、多種多様なタスクにまたがって、ますます人間のような能力を示す。本稿では,ChatGPT のような LLM がソーシャルメディア利用者の心理的配置を正確に推測できるかどうか,その能力が社会デミノグラフィーグループによって異なるかを検討する。具体的には、GPT-3.5とGPT-4は、ゼロショット学習シナリオにおいて、ユーザのFacebookステータス更新からビッグファイブの性格特性を導出できるかどうかを検証する。その結果, LLM-inferred と self-reported trait scores の r = .29 (range = [.22, .33]) の平均相関は, 人格を推定するために特別に訓練された教師付き機械学習モデルと類似した精度であることがわかった。また,年齢の異なるグループや性別のカテゴリーで人格推定の精度が不均一であることも明らかにした。女性や若年者に対して,いくつかの特徴についてより正確であることから,基礎となるトレーニングデータやオンライン自己表現の相違から生じる潜在的なバイアスが示唆された。 LLMがユーザ生成テキストから心理的配置を推測する能力は、研究者と実践者の両方にとって安価でスケーラブルな心理測定アセスメントへのアクセスを民主化する可能性がある。一方で、この民主化は、個人化されたサービスにおいて、生態的妥当性の高い大規模研究を促進し、イノベーションを喚起する可能性がある。一方で、ユーザープライバシと自己決定に関する倫理的懸念を提起し、厳格な倫理的枠組みと規制の必要性を強調している。 Large Language Models (LLMs) demonstrate increasingly human-like abilities across a wide variety of tasks. In this paper, we investigate whether LLMs like ChatGPT can accurately infer the psychological dispositions of social media users and whether their ability to do so varies across socio-demographic groups. Specifically, we test whether GPT-3.5 and GPT-4 can derive the Big Five personality traits from users' Facebook status updates in a zero-shot learning scenario. Our results show an average correlation of r = .29 (range = [.22, .33]) between LLM-inferred and self-reported trait scores - a level of accuracy that is similar to that of supervised machine learning models specifically trained to infer personality. Our findings also highlight heterogeneity in the accuracy of personality inferences across different age groups and gender categories: predictions were found to be more accurate for women and younger individuals on several traits, suggesting a potential bias stemming from the underlying training data or differences in online self-expression. 
The ability of LLMs to infer psychological dispositions from user-generated text has the potential to democratize access to cheap and scalable psychometric assessments for both researchers and practitioners. On the one hand, this democratization might facilitate large-scale research of high ecological validity and spark innovation in personalized services. On the other hand, it also raises ethical concerns regarding user privacy and self-determination, highlighting the need for stringent ethical frameworks and regulation.	翻訳日:2024-06-07 04:16:10 公開日:2024-06-05
# 分岐境界におけるノード選択のための強化学習 Reinforcement Learning for Node Selection in Branch-and-Bound ( http://arxiv.org/abs/2310.00112v2 ) ライセンス: Link先を確認	Alexander Mattick, Christopher Mutschler,	(参考訳) ブランチとバウンドにおける大きな課題は、検索ツリー内の最適なノードを特定することにある。現在の最先端セレクタは手作りのアンサンブルを使用して、ナイーブなサブノードセレクタと、個々のノードデータに依存する学習ノードセレクタを自動的に切り替える。孤立ノードではなく木の状態全体を考慮しながら強化学習(RL)を用いる新しいシミュレーション手法を提案する。これを実現するために、モデル根から「選択すべき」葉への経路に基づいて確率分布を生成するグラフニューラルネットワークを訓練する。ノード選択を確率分布としてモデル化することで、本質的なノード品質とノード評価コストの両方をキャプチャする最先端のRL技術を用いてモデルを訓練することができる。提案手法は,TSP(Synthetic Travelling Salesmen problem)インスタンスでのみ訓練されているにもかかわらず,多種多様な複雑な問題集合に対して高品質なノード選択ポリシーを誘導する。このような固定事前訓練ポリシーを用いることで、厳しい時間制約下での最適性ギャップ削減とノード単位の効率において、いくつかのベンチマークにおいて顕著な改善が示される。 A big challenge in branch and bound lies in identifying the optimal node within the search tree from which to proceed. Current state-of-the-art selectors utilize either hand-crafted ensembles that automatically switch between naive sub-node selectors, or learned node selectors that rely on individual node data. We propose a novel simulation technique that uses reinforcement learning (RL) while considering the entire tree state, rather than just isolated nodes. To achieve this, we train a graph neural network that produces a probability distribution based on the path from the model's root to its "to-be-selected" leaves. Modelling node-selection as a probability distribution allows us to train the model using state-of-the-art RL techniques that capture both intrinsic node-quality and node-evaluation costs. Our method induces a high quality node selection policy on a set of varied and complex problem sets, despite only being trained on specially designed, synthetic travelling salesmen problem (TSP) instances. Using such a fixed pretrained policy shows significant improvements on several benchmarks in optimality gap reductions and per-node efficiency under strict time constraints.	翻訳日:2024-06-07 04:16:10 公開日:2024-06-05
# HarmonyDream: 世界モデル内でのタスクハーモニゼーション HarmonyDream: Task Harmonization Inside World Models ( http://arxiv.org/abs/2310.00344v3 ) ライセンス: Link先を確認	Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long,	(参考訳) モデルベース強化学習(MBRL)は、環境がどのように機能するかをモデル化し、典型的には2つのタスク、すなわち観察モデリングと報酬モデリングのためのコンポーネントを包含する世界モデルを活用することで、サンプル効率の高い学習を約束する。本稿では,専用の実証研究を通じて、世界モデルにおいて各タスクが果たす役割をより深く理解し、観察モデリングまたは報酬モデリングのいずれかによる支配を緩和することで、見落とされていたサンプル効率の高いMBRLの可能性を明らかにする。我々の重要な洞察は、明示的MBRLの一般的なアプローチは、観測モデルを通して環境の豊富な詳細を復元しようとするが、環境の複雑さと限られたモデル容量のために困難であるということである。一方、報酬モデルは暗黙的MBRLで支配的であり、コンパクトなタスク中心のダイナミクスの学習に長けているが、より豊かな学習信号がなければサンプル効率の高い学習には不十分である。これらの知見と発見に触発されて,世界モデル学習における2つのタスク間の動的平衡、すなわちタスクハーモニゼーションを維持するために,損失係数を自動的に調整する,シンプルで効果的なアプローチであるHarmonyDreamを提案する。実験の結果,HarmonyDreamを備えたベースMBRL法は,視覚ロボットタスクにおいて絶対性能が10%-69%向上し,Atari 100Kベンチマークで新たな最先端結果を達成した。コードは https://github.com/thuml/HarmonyDream で入手できる。 Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling. In this paper, through a dedicated empirical investigation, we gain a deeper understanding of the role each task plays in world models and uncover the overlooked potential of sample-efficient MBRL by mitigating the domination of either observation or reward modeling. Our key insight is that while prevalent approaches of explicit MBRL attempt to restore abundant details of the environment via observation models, it is difficult due to the environment's complexity and limited model capacity. On the other hand, reward models, while dominating implicit MBRL and adept at learning compact task-centric dynamics, are inadequate for sample-efficient learning without richer learning signals. Motivated by these insights and discoveries, we propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization, i.e. 
a dynamic equilibrium between the two tasks in world model learning. Our experiments show that the base MBRL method equipped with HarmonyDream gains 10%-69% absolute performance boosts on visual robotic tasks and sets a new state-of-the-art result on the Atari 100K benchmark. Code is available at https://github.com/thuml/HarmonyDream.	翻訳日:2024-06-07 04:16:10 公開日:2024-06-05
# 直接メトリクス最適化としての言語モデルデコーディング Language Model Decoding as Direct Metrics Optimization ( http://arxiv.org/abs/2310.01041v2 ) ライセンス: Link先を確認	Haozhe Ji, Pei Ke, Hongning Wang, Minlie Huang,	(参考訳) 言語モデリングの顕著な進歩にもかかわらず、現在の主流の復号法は、異なる側面にわたる人間のテキストと整合するテキストを生成するのに依然として苦労している。特に、サンプリングベースの手法は、しばしば言論において不規則である少ない反復テキストを生成するが、検索ベースの手法は繰り返しの増大を犠牲にしてトピックコヒーレンスを維持する。全体として、これらの手法は幅広い側面にわたる全体的アライメントを達成するには不十分である。本研究では,言語モデルからの復号化を最適化問題として,所望のアスペクトの複数のメトリクスで測定された人文と期待性能を厳密にマッチングすることを目的としている。結果として得られる復号化分布は、これらの指標によって定義されたシーケンスレベルのエネルギー関数を介して入力言語モデルの分布をスケールする分析解を享受する。そして、最も重要なことは、この誘導された分布が人間のテキストの難易度を向上させることが保証されていることを示し、人間のテキストの基盤となる分布に対するより良い近似が示唆される。グローバルな正規化分布から抽出可能なサンプリングを容易にするため,サンプリング・インポータンス・サンプリング手法を採用した。各種領域実験とモデルスケール実験により,本手法がヒトのテキストに適合する指標や,強いベースラインに対する人的評価において優位性を示した。 Despite the remarkable advances in language modeling, current mainstream decoding methods still struggle to generate texts that align with human texts across different aspects. In particular, sampling-based methods produce less-repetitive texts which are often disjunctive in discourse, while search-based methods maintain topic coherence at the cost of increased repetition. Overall, these methods fall short in achieving holistic alignment across a broad range of aspects. In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts measured by multiple metrics of desired aspects simultaneously. The resulting decoding distribution enjoys an analytical solution that scales the input language model distribution via a sequence-level energy function defined by these metrics. And most importantly, we prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts. To facilitate tractable sampling from this globally normalized distribution, we adopt the Sampling-Importance-Resampling technique. 
Experiments on various domains and model scales demonstrate the superiority of our method in metrics alignment with human texts and human evaluation over strong baselines.	翻訳日:2024-06-07 04:16:10 公開日:2024-06-05
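The Sampling-Importance-Resampling step named in the abstract above can be illustrated with a minimal sketch. It assumes a target of the form p(x) ∝ q(x)·exp(−E(x)), where q is the base language model and E is a sequence-level energy. The names `sir_sample`, `propose`, and `energy`, as well as the toy model and energy, are hypothetical and are not taken from the paper's code.

```python
import math
import random


def sir_sample(propose, energy, num_proposals, rng):
    """Return one sample approximately distributed as p(x) ∝ q(x) * exp(-energy(x)).

    propose: zero-argument callable that draws one sample from the base distribution q.
    energy:  callable mapping a sample to a finite real-valued energy (assumed form).
    """
    # Step 1 (sampling): draw candidates from the proposal q.
    candidates = [propose() for _ in range(num_proposals)]
    # Step 2 (importance): p(x) / q(x) is proportional to exp(-energy(x)), since q cancels.
    log_weights = [-energy(x) for x in candidates]
    max_log_weight = max(log_weights)  # subtracted for numerical stability
    weights = [math.exp(lw - max_log_weight) for lw in log_weights]
    # Step 3 (resampling): pick one candidate with probability proportional to its weight.
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy usage: a uniform "base model" over four tokens and an energy that
# penalises immediate token repetition (both are illustrative assumptions).
toy_rng = random.Random(0)
vocab = ["the", "cat", "sat"]
toy_propose = lambda: tuple(toy_rng.choice(vocab) for _ in range(4))
toy_energy = lambda seq: 2.0 * sum(a == b for a, b in zip(seq, seq[1:]))
print(sir_sample(toy_propose, toy_energy, num_proposals=32, rng=toy_rng))
```

With more proposals the resampled output follows the target more closely, at the cost of more base-model samples per decoded sequence.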
# 知識グラフにおけるグラフパターンクエリの解答のためのニューロシンボリックフレームワーク A Neuro-Symbolic Framework for Answering Graph Pattern Queries in Knowledge Graphs ( http://arxiv.org/abs/2310.04598v2 ) ライセンス: Link先を確認	Tamara Cucumides, Daniel Daza, Pablo Barceló, Michael Cochez, Floris Geerts, Juan L Reutter, Miguel Romero,	(参考訳) 不完全な知識グラフに対してグラフクエリに答えることの課題は、機械学習コミュニティで大きな注目を集めている。ニューロシンボリックモデルは、優れた性能と高い解釈可能性を組み合わせた、有望なアプローチとして現れている。これらのモデルは、訓練されたアーキテクチャを使用して、アトミッククエリを実行し、シンボリッククエリ演算子を模倣するモジュールを統合する。しかし、ほとんどのニューロシンボリッククエリプロセッサは木のようなグラフパターンクエリに制約されている。これらのクエリは、一定値のボトムアップ実行や、葉のアンカー、ルートのターゲット変数を許容する。表現力のある木のようなクエリは、エンティティ間の複数エッジの存在や三角形の存在など、知識グラフにおける重要な特性を捉えることができない。非完全知識グラフ上で任意のグラフパターンクエリに応答するフレームワークを導入する。これらのクエリのクラスは実用的な応用には不可欠であるが、現在のほとんどのニューロシンボリックモデルの範囲を超えている。提案手法では,循環パターンの非循環的トラバーサルを容易にする近似手法を用いて,クエリ実行プロセスに新たなシンボルバイアスを埋め込む。実験により,本フレームワークは3つのデータセット上で競合的に動作し,周期的クエリを近似戦略により効果的に処理できることが確認された。さらに、アンカー木のようなクエリ上での既存のニューロシンボリックモデルの性能を維持し、その能力を存在量化変数を持つクエリに拡張する。 The challenge of answering graph queries over incomplete knowledge graphs is gaining significant attention in the machine learning community. Neuro-symbolic models have emerged as a promising approach, combining good performance with high interpretability. These models utilize trained architectures to execute atomic queries and integrate modules that mimic symbolic query operators. However, most neuro-symbolic query processors are constrained to tree-like graph pattern queries. These queries admit a bottom-up execution with constant values or anchors at the leaves and the target variable at the root. While expressive, tree-like queries fail to capture critical properties in knowledge graphs, such as the existence of multiple edges between entities or the presence of triangles. We introduce a framework for answering arbitrary graph pattern queries over incomplete knowledge graphs, encompassing both cyclic queries and tree-like queries with existentially quantified leaves. 
These classes of queries are vital for practical applications but are beyond the scope of most current neuro-symbolic models. Our approach employs an approximation scheme that facilitates acyclic traversals for cyclic patterns, thereby embedding additional symbolic bias into the query execution process. Our experimental evaluation demonstrates that our framework performs competitively on three datasets, effectively handling cyclic queries through our approximation strategy. Additionally, it maintains the performance of existing neuro-symbolic models on anchored tree-like queries and extends their capabilities to queries with existentially quantified variables.	翻訳日:2024-06-07 04:16:10 公開日:2024-06-05
# 繰り返し拘束された部分観測可能なマルコフ決定過程 Recursively-Constrained Partially Observable Markov Decision Processes ( http://arxiv.org/abs/2310.09688v3 ) ライセンス: Link先を確認	Qi Heng Ho, Tyler Becker, Benjamin Kraske, Zakariya Laouar, Martin S. Feather, Federico Rossi, Morteza Lahijanian, Zachary N. Sunberg,	(参考訳) 多くのシーケンシャルな決定問題は、1つの目的関数を最適化し、他の目的に制約を課す。制約付き部分可観測マルコフ決定過程(C-POMDP)は、遷移の不確実性と部分可観測性をモデル化する。本研究は,C-POMDPが連続的な決定ステップに対して最適部分構造特性に反し,いくつかの(安全クリティカルな)アプリケーションでは望ましくない動作を示すことを最初に示す。さらに、C-POMDPのオンライン再計画は、この違反による不整合のため、しばしば効果がない。これらの欠点に対処するために、C-POMDPに履歴依存のコスト制約を加えるRecursively-Constrained POMDP(RC-POMDP)を導入する。 C-POMDPとは異なり、RC-POMDPは常に決定論的最適ポリシーを持ち、最適ポリシーはベルマンの最適性原理に従うことを示す。また,RC-POMDPに対するポイントベース動的プログラミングアルゴリズムを提案する。ベンチマーク問題の評価は,提案アルゴリズムの有効性を示し,C-POMDPのポリシーよりもRC-POMDPのポリシーの方が望ましい行動をもたらすことを示した。 Many sequential decision problems involve optimizing one objective function while imposing constraints on other objectives. Constrained Partially Observable Markov Decision Processes (C-POMDP) model this case with transition uncertainty and partial observability. In this work, we first show that C-POMDPs violate the optimal substructure property over successive decision steps and thus may exhibit behaviors that are undesirable for some (e.g., safety critical) applications. Additionally, online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation. To address these drawbacks, we introduce the Recursively-Constrained POMDP (RC-POMDP), which imposes additional history-dependent cost constraints on the C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic optimal policies and that optimal policies obey Bellman's principle of optimality. We also present a point-based dynamic programming algorithm for RC-POMDPs. Evaluations on benchmark problems demonstrate the efficacy of our algorithm and show that policies for RC-POMDPs produce more desirable behaviors than policies for C-POMDPs.	翻訳日:2024-06-07 04:16:10 公開日:2024-06-05
# 大規模言語モデルを用いたエンティティマッチング Entity Matching using Large Language Models ( http://arxiv.org/abs/2310.11244v3 ) ライセンス: Link先を確認	Ralph Peeters, Christian Bizer,	(参考訳) エンティティマッチングは、2つのエンティティ記述が同じ現実世界のエンティティを指すかどうかを決定するタスクであり、ほとんどのデータ統合パイプラインにおいて中心的なステップである。多くの最先端エンティティマッチング方法は、BERTやRoBERTaのような事前訓練された言語モデル(PLM)に依存している。エンティティマッチングにおけるこれらのモデルの2つの大きな欠点は、それらである。一相当量のタスク特化訓練データを必要とするモデル (ii) 細調整されたモデルは分布外エンティティに関して堅牢ではない。本稿では, PLM ベースのマーカに代わる, タスク依存のトレーニングモデルとして, ジェネレーティブな大規模言語モデル (LLM) を用いて検討する。我々の研究は、ローカルで実行できるLLMをホストおよびオープンソースでカバーしています。我々は、これらのモデルをゼロショットシナリオとタスク固有のトレーニングデータが利用できるシナリオで評価する。異なるプロンプト設計とモデルの迅速な感度を比較し、最高のプロンプトはひとつもないが、各モデル/データセットの組み合わせに合わせて調整する必要があることを示す。我々はさらに調査する i) 文脈内デモンストレーションの選択 (二)一致規則の生成及び一致規則三同じトレーニングデータのプールを用いてホストLDMを微調整すること。実験の結果, 数千の例を用いて微調整したPLMと同じような動作を行うには, 最高のLCMは, ほとんど, あるいはわずかの訓練例を必要としないことがわかった。 LLMベースのマーカはさらに、目に見えないエンティティに対して高いロバスト性を示す。 GPT4は一致判定のための構造化された説明を生成することができることを示す。モデルは、間違った判断の説明を分析することによって、一致したエラーの潜在的な原因を自動的に特定することができる。モデルが識別されたエラークラスの意味のあるテキスト記述を生成することを実証し、データエンジニアがエンティティマッチングパイプラインを改善するのに役立つことを実証した。 Entity Matching is the task of deciding whether two entity descriptions refer to the same real-world entity and is a central step in most data integration pipelines. Many state-of-the-art entity matching methods rely on pre-trained language models (PLMs) such as BERT or RoBERTa. Two major drawbacks of these models for entity matching are that (i) the models require significant amounts of task-specific training data and (ii) the fine-tuned models are not robust concerning out-of-distribution entities. This paper investigates using generative large language models (LLMs) as a less task-specific training data-dependent and more robust alternative to PLM-based matchers. Our study covers hosted and open-source LLMs, which can be run locally. We evaluate these models in a zero-shot scenario and a scenario where task-specific training data is available. 
We compare different prompt designs and the prompt sensitivity of the models and show that there is no single best prompt; instead, the prompt needs to be tuned for each model/dataset combination. We further investigate (i) the selection of in-context demonstrations, (ii) the generation of matching rules, as well as (iii) fine-tuning a hosted LLM using the same pool of training data. Our experiments show that the best LLMs require no or only a few training examples to perform similarly to PLMs that were fine-tuned using thousands of examples. LLM-based matchers further exhibit higher robustness to unseen entities. We show that GPT4 can generate structured explanations for matching decisions. The model can automatically identify potential causes of matching errors by analyzing explanations of wrong decisions. We demonstrate that the model can generate meaningful textual descriptions of the identified error classes, which can help data engineers improve entity matching pipelines.	翻訳日:2024-06-07 04:05:59 公開日:2024-06-05
# PartialFormer: 機械翻訳のための全体ではなく部分のモデリング PartialFormer: Modeling Part Instead of Whole for Machine Translation ( http://arxiv.org/abs/2310.14921v2 ) ライセンス: Link先を確認	Tong Zheng, Bei Li, Huiwen Bao, Jiale Wang, Weiqiao Shan, Tong Xiao, Jingbo Zhu,	(参考訳) Transformerフィードフォワードニューラルネットワークの設計選択により、計算とパラメータのオーバーヘッドが大きくなった。本稿では,従来のアーキテクチャでは見過ごされがちな軽量FFNの設計において,隠れ次元の重要性を強調する。この原理に基づき,複数の小さなFFNを用いてパラメータと計算量を削減しつつ,本質的な隠れ次元を維持するパラメータ効率の高いTransformerアーキテクチャであるPartialFormerを導入する。これらの小さなFFNは、効果的なコラボレーションのためにマルチヘッドアテンションメカニズムに統合される。また、PartialFormerの機能を強化するために、カスタマイズされたヘッドスケーリング戦略を提案する。さらに,PartialFormer内での深度スケーリングを改善するために,残差型アテンション計算を提案する。 9つの翻訳タスクと1つの抽象的な要約タスクに関する広範囲な実験により、機械翻訳および要約タスクにおけるPartialFormerアプローチの有効性が検証された。私たちのコードは、https://github.com/zhengkid/PartialFormer で利用可能です。 The design choices in Transformer feed-forward neural networks have resulted in significant computational and parameter overhead. In this work, we emphasize the importance of hidden dimensions in designing lightweight FFNs, a factor often overlooked in previous architectures. Guided by this principle, we introduce PartialFormer, a parameter-efficient Transformer architecture utilizing multiple smaller FFNs to reduce parameters and computation while maintaining essential hidden dimensions. These smaller FFNs are integrated into a multi-head attention mechanism for effective collaboration. We also propose a tailored head scaling strategy to enhance PartialFormer's capabilities. Furthermore, we present a residual-like attention calculation to improve depth scaling within PartialFormer. Extensive experiments on 9 translation tasks and 1 abstractive summarization task validate the effectiveness of our PartialFormer approach on machine translation and summarization tasks. Our code would be available at: https://github.com/zhengkid/PartialFormer.	翻訳日:2024-06-07 04:05:59 公開日:2024-06-05
# FollowBench: 大規模言語モデルのための多段階・細粒度制約追従ベンチマーク FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models ( http://arxiv.org/abs/2310.20410v3 ) ライセンス: Link先を確認	Yuxin Jiang, Yufei Wang, Xingshan Zeng, Wanjun Zhong, Liangyou Li, Fei Mi, Lifeng Shang, Xin Jiang, Qun Liu, Wei Wang,	(参考訳) 命令に従う能力は、LLM(Large Language Models)が様々な現実世界のアプリケーションを扱うために不可欠である。既存のベンチマークは主に、応答が命令に記載された制約に従っているかどうかを評価するのではなく、純粋な応答品質を評価することに焦点を当てている。この研究ギャップを埋めるために,LLM向けの多段階・細粒度制約追従ベンチマークであるFollowBenchを提案する。 FollowBenchは、きめ細かい制約の5つの異なるタイプ(コンテンツ、状況、スタイル、フォーマット、例)を包括的に含んでいる。多様な難易度で制約追従を正確に評価するために,レベルが上がるごとに初期命令に1つの制約を漸進的に付加するマルチレベル機構を導入する。 LLMの出力が個々の制約をすべて満たしたかどうかを評価するため,難しいオープンエンド命令に対処できるよう,制約進化経路を与えて強力なLLMにプロンプトすることを提案する。 FollowBench上で13のクローズドソースおよびオープンソースの代表的なLLMを評価することにより,命令追従におけるLLMの弱点を浮き彫りにし,今後の研究の方向性を示す。データとコードは https://github.com/YJiangcm/FollowBench で公開されている。 The ability to follow instructions is crucial for Large Language Models (LLMs) to handle various real-world applications. Existing benchmarks primarily focus on evaluating pure response quality, rather than assessing whether the response follows constraints stated in the instruction. To fill this research gap, in this paper, we propose FollowBench, a Multi-level Fine-grained Constraints Following Benchmark for LLMs. FollowBench comprehensively includes five different types (i.e., Content, Situation, Style, Format, and Example) of fine-grained constraints. To enable a precise constraint following estimation on diverse difficulties, we introduce a Multi-level mechanism that incrementally adds a single constraint to the initial instruction at each increased level. To assess whether LLMs' outputs have satisfied every individual constraint, we propose to prompt strong LLMs with constraint-evolution paths to handle challenging open-ended instructions. 
By evaluating 13 closed-source and open-source popular LLMs on FollowBench, we highlight the weaknesses of LLMs in instruction following and point towards potential avenues for future work. The data and code are publicly available at https://github.com/YJiangcm/FollowBench.	翻訳日:2024-06-07 04:05:59 公開日:2024-06-05
# ベイズ状態推定のためのハールランダム測定とかなり良い測定 Haar-random and pretty good measurements for Bayesian state estimation ( http://arxiv.org/abs/2310.20565v2 ) ライセンス: Link先を確認	Maria Quadeer,	(参考訳) 本研究では,ベイズ状態推定のためのHaarランダム基底とpretty good measurement(かなり良い測定)について検討する。 $N$ 個のHaarランダム基底が与えられたとき、純粋状態の一様アンサンブルに対して、そのようなランダム測定のIID列にわたって平均した忠実度の境界を導出する。混合量子ビット状態のアンサンブルに対しては、ユニタリ2-デザインによって定義される測定がHaarランダムユニタリによって定義される測定をよく近似する一方、パウリ群は弱い下界しか与えないことがわかった。単発更新については、pretty good measurementに対するPetzリカバリマップを用いて、かなり良いベイズ平均推定値が得られることを示す。 We study Haar-random bases and pretty good measurement for Bayesian state estimation. Given $N$ Haar-random bases we derive a bound on fidelity averaged over IID sequences of such random measurements for a uniform ensemble of pure states. For ensembles of mixed qubit states, we find that measurements defined through unitary 2-designs closely approximate those defined via Haar random unitaries while the Pauli group only gives a weak lower bound. For a single-shot-update, we show using the Petz recovery map for pretty good measurement that it can give pretty good Bayesian mean estimates.	翻訳日:2024-06-07 04:05:59 公開日:2024-06-05
# 任意のナッシュ均衡と値を達成するためのマルコフゲームの最小限の修正 Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value ( http://arxiv.org/abs/2311.00582v3 ) ライセンス: Link先を確認	Young Wu, Jeremy McMahan, Yiding Chen, Yudong Chen, Xiaojin Zhu, Qiaomin Xie,	(参考訳) 本稿では,ゲーム修正問題について検討する。この問題では,善意のゲーム設計者または悪意のある敵対者が,ゼロサムマルコフゲームの報酬関数を,目標とする決定的または確率的なポリシープロファイルが唯一のマルコフ完全ナッシュ均衡となり,かつその値が目標範囲内に収まるように,修正コストを最小化しながら変更する。何らかのゲームの唯一の均衡としてインストール可能なポリシープロファイルの集合を特徴付け,インストールが成功するための必要十分条件を確立する。線形制約付きの凸最適化問題を解き,その後ランダムな摂動を行うことで,ほぼ最適なコストの修正計画を得る効率的なアルゴリズムを提案する。 We study the game modification problem, where a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium and has a value within a target range, in a way that minimizes the modification cost. We characterize the set of policy profiles that can be installed as the unique equilibrium of some game, and establish sufficient and necessary conditions for successful installation. We propose an efficient algorithm, which solves a convex optimization problem with linear constraints and then performs random perturbation, to obtain a modification plan with a near-optimal cost.	翻訳日:2024-06-07 04:05:59 公開日:2024-06-05
# S-LoRA: 数千の同時LoRAアダプタ S-LoRA: Serving Thousands of Concurrent LoRA Adapters ( http://arxiv.org/abs/2311.03285v3 ) ライセンス: Link先を確認	Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica,	(参考訳) Pretrain-then-finetune"パラダイムは、大きな言語モデルのデプロイに一般的に採用されている。パラメータ効率のよい微調整法であるLoRA(Lo-Rank Adaptation)は、ベースモデルを複数のタスクに適応させるのによく使われ、結果として1つのベースモデルから派生したLoRAアダプタのかなりのコレクションとなる。我々は,このパラダイムが提供中のバッチ推論に重要な機会をもたらすことを観察した。これらの機会を生かして,多くのLoRAアダプタのスケーラブルな提供を目的としたシステムであるS-LoRAを提案する。 S-LoRAは、すべてのアダプタをメインメモリに格納し、現在実行中のクエリが使用するアダプタをGPUメモリにフェッチする。 GPUメモリを効率的に使用し、フラグメンテーションを低減するため、S-LoRAはUnified Pagingを提案する。 Unified Pagingは統一メモリプールを使用して、異なるランクの動的アダプタウェイトと異なるシーケンス長のKVキャッシュテンソルを管理する。さらに、S-LoRAは、新しいテンソル並列化戦略と高度に最適化されたカスタムCUDAカーネルを用いて、LoRA計算の不均一なバッチ処理を行う。これらの機能により、S-LoRAは単一のGPU上で、あるいは小さなオーバーヘッドで複数のGPU上で数千のLoRAアダプタを提供することができる。 HuggingFace PEFTやvLLMのような最先端のライブラリと比較すると、S-LoRAはスループットを最大4倍改善し、サービスアダプタの数を桁違いに増やすことができる。その結果、S-LoRAは多くのタスク固有の細調整されたモデルのスケーラブルな提供を可能にし、大規模にカスタマイズされた細調整サービスの可能性を秘めている。コードはhttps://github.com/S-LoRA/S-LoRAで公開されている。 The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks, resulting in a substantial collection of LoRA adapters derived from one base model. We observe that this paradigm presents significant opportunities for batched inference during serving. To capitalize on these opportunities, we present S-LoRA, a system designed for the scalable serving of many LoRA adapters. S-LoRA stores all adapters in the main memory and fetches the adapters used by the currently running queries to the GPU memory. To efficiently use the GPU memory and reduce fragmentation, S-LoRA proposes Unified Paging. 
Unified Paging uses a unified memory pool to manage dynamic adapter weights with different ranks and KV cache tensors with varying sequence lengths. Additionally, S-LoRA employs a novel tensor parallelism strategy and highly optimized custom CUDA kernels for heterogeneous batching of LoRA computation. Collectively, these features enable S-LoRA to serve thousands of LoRA adapters on a single GPU or across multiple GPUs with a small overhead. Compared to state-of-the-art libraries such as HuggingFace PEFT and vLLM (with naive support of LoRA serving), S-LoRA can improve the throughput by up to 4 times and increase the number of served adapters by several orders of magnitude. As a result, S-LoRA enables scalable serving of many task-specific fine-tuned models and offers the potential for large-scale customized fine-tuning services. The code is available at https://github.com/S-LoRA/S-LoRA	翻訳日:2024-06-07 04:05:59 公開日:2024-06-05
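To make the serving setting above concrete, here is a minimal sketch of the arithmetic behind a batch whose requests use different LoRA adapters: every request shares the base weight W, and each adds its own low-rank term, y = xW + (xA)B, where the rank of (A, B) may differ between adapters. This only illustrates the standard LoRA formulation. It is not S-LoRA's Unified Paging, tensor parallelism, or CUDA kernels, and the names `serve_batch` and `vecmat` are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def vecmat(x: Vector, m: Matrix) -> Vector:
    """Multiply a row vector x (length n) by a matrix m (n rows, k columns)."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def serve_batch(
    base_weight: Matrix,
    adapters: Dict[str, Tuple[Matrix, Matrix]],
    requests: List[Tuple[str, Vector]],
) -> List[Vector]:
    """Compute y = x W + (x A) B for each (adapter_id, x) request in the batch."""
    outputs = []
    for adapter_id, x in requests:
        a, b = adapters[adapter_id]        # a: d_in x r, b: r x d_out; r may differ per adapter
        base_out = vecmat(x, base_weight)  # computation shared by all requests
        delta = vecmat(vecmat(x, a), b)    # adapter-specific low-rank update
        outputs.append([u + v for u, v in zip(base_out, delta)])
    return outputs


identity = [[1.0, 0.0], [0.0, 1.0]]
toy_adapters = {
    "rank1": ([[1.0], [0.0]], [[0.0, 1.0]]),  # rank-1 adapter
    "rank2": (identity, identity),            # rank-2 adapter
}
print(serve_batch(identity, toy_adapters, [("rank1", [2.0, 3.0]), ("rank2", [1.0, 1.0])]))
# prints [[2.0, 5.0], [2.0, 2.0]]
```

The loop makes the heterogeneity visible: the base product has the same shape for every request, while the adapter products vary with the rank, which is the part a serving system has to batch carefully.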
# 量子回路の非最適化 Quantum Circuit Unoptimization ( http://arxiv.org/abs/2311.03805v2 ) ライセンス: Link先を確認	Yusei Mori, Hideaki Hakoshima, Kyohei Sudo, Toshio Mori, Kosuke Mitarai, Keisuke Fujii,	(参考訳) 回路の最適化は、量子コンピュータと古典コンピュータの両方にとって、その効率を改善するために欠かせない課題である。一方、古典論理の最適化は困難であることが知られており、これまで多くのヒューリスティックなアプローチが開発されてきた。本研究では,回路の等価性を保ちながらいくつかの冗長性を導入することで与えられた量子回路を複雑にする,すなわち回路最適化の逆操作である,量子回路非最適化と呼ばれる量子アルゴリズムプリミティブを定義し,構築する。量子回路非最適化を用いて、NPクラスとBQPクラスの両方に含まれるが、Pクラスには自明には含まれない決定問題である量子回路等価性テストを提案する。さらに,実用的な応用として,コンパイラベンチマークを生成するための具体的な非最適化レシピを構築し,QiskitとPytketを用いて回路最適化性能を評価する。数値シミュレーションにより,量子回路非最適化器がコンパイラにとって最適化の困難な冗長回路を系統的に生成し,異なるコンパイラの性能の比較と改善に利用できることを示す。また、量子優位性を持つ機械学習データセットや量子コンピュータの忠実度ベンチマークの生成など、量子回路非最適化の潜在的な応用も提示する。 Optimization of circuits is an essential task for both quantum and classical computers to improve their efficiency. In contrast, classical logic optimization is known to be difficult, and a lot of heuristic approaches have been developed so far. In this study, we define and construct a quantum algorithmic primitive called quantum circuit unoptimization, which makes a given quantum circuit complex by introducing some redundancies while preserving circuit equivalence, i.e., the inverse operation of circuit optimization. Using quantum circuit unoptimization, we propose the quantum circuit equivalence test, a decision problem contained both in the NP and BQP classes but is not trivially included in the P class. Furthermore, as a practical application, we construct concrete unoptimization recipes to generate compiler benchmarks and evaluate circuit optimization performance using Qiskit and Pytket. Our numerical simulations demonstrate that quantum circuit unoptimizer systematically generates redundant circuits that are challenging for compilers to optimize, which can be used to compare the performance of different compilers and improve them. 
We also offer potential applications of quantum circuit unoptimization, such as generating quantum advantageous machine learning datasets and quantum computer fidelity benchmarks.	翻訳日:2024-06-07 04:05:59 公開日:2024-06-05
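The core idea of unoptimization described above, adding redundancy while preserving circuit equivalence, can be sketched on a single qubit. The toy below inserts a random gate followed by its inverse and checks that the overall unitary is unchanged. It is a simplified illustration under that assumption only: the paper's actual recipes are not reproduced here, and all function names are hypothetical.

```python
import cmath
import math
import random


def matmul(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(u):
    """Conjugate transpose of a 2x2 matrix."""
    return [[u[j][i].conjugate() for j in range(2)] for i in range(2)]


def circuit_unitary(gates):
    """Overall unitary of a single-qubit circuit; gates[0] is applied first."""
    total = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for gate in gates:
        total = matmul(gate, total)
    return total


def random_unitary(rng):
    """A random single-qubit unitary built as Rz(alpha) @ Ry(beta)."""
    alpha, beta = rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi)
    rz = [[cmath.exp(-0.5j * alpha), 0j], [0j, cmath.exp(0.5j * alpha)]]
    ry = [[complex(math.cos(beta / 2)), complex(-math.sin(beta / 2))],
          [complex(math.sin(beta / 2)), complex(math.cos(beta / 2))]]
    return matmul(rz, ry)


def unoptimize(gates, rng):
    """Insert a redundant pair (U, U^dagger) at a random position; U is applied first."""
    position = rng.randrange(len(gates) + 1)
    u = random_unitary(rng)
    return gates[:position] + [u, dagger(u)] + gates[position:]


s = 1 / math.sqrt(2)
hadamard = [[complex(s), complex(s)], [complex(s), complex(-s)]]
t_gate = [[1 + 0j, 0j], [0j, cmath.exp(0.25j * math.pi)]]
original = [hadamard, t_gate, hadamard]
redundant = unoptimize(original, random.Random(0))
print(len(original), len(redundant))  # the gate count grows from 3 to 5
```

A compiler that merges adjacent single-qubit gates would undo this particular redundancy easily, which is why a useful benchmark needs harder-to-spot constructions than this toy.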
# オープンワールドにおけるクロスドメイン逐次推薦に向けて: モデル非依存の対照的デノイジングアプローチ Towards Open-world Cross-Domain Sequential Recommendation: A Model-Agnostic Contrastive Denoising Approach ( http://arxiv.org/abs/2311.04760v3 ) ライセンス: Link先を確認	Wujiang Xu, Xuying Ning, Wenfang Lin, Mingming Ha, Qiongxu Ma, Qianqiao Liang, Xuewen Tao, Linxun Chen, Bing Han, Minnan Luo,	(参考訳) クロスドメインシーケンシャルレコメンデーション(CDSR)は、従来のシーケンシャルレコメンデーション(SR)システムに存在するデータスパース性の問題に対処することを目的としている。既存手法は,豊富な行動を持つ重複ユーザに依存して,複数のドメインにまたがって情報を伝達・伝播する特定のクロスドメインユニットを設計することを目的としている。しかし、現実のレコメンデーションシステムでは、CDSRシナリオは通常、行動が疎なロングテールユーザーの大多数と、一つのドメインにしか存在しないコールドスタートユーザーから構成される。これにより、現実世界の業界プラットフォームにおける既存のCDSRメソッドのパフォーマンスが低下する。したがって、オープンワールドCDSRシナリオにおけるモデルの一貫性と有効性を改善することは、CDSRモデルを構築する上で重要である(1st CH)。近年,SR手法のいくつかは,ロングテールユーザーの情報を補完するために補助行動を利用している。しかし、これらのマルチビヘイビアSR法は、ターゲット行動と補助行動のセマンティックなギャップや、ドメイン間のユーザ関心のずれを見落としているため、CDSRにおいて有望な性能をもたらすことはできない(2nd CH)。 Cross-domain sequential recommendation (CDSR) aims to address the data sparsity problems that exist in traditional sequential recommendation (SR) systems. The existing approaches aim to design a specific cross-domain unit that can transfer and propagate information across multiple domains by relying on overlapping users with abundant behaviors. However, in real-world recommender systems, CDSR scenarios usually consist of a majority of long-tailed users with sparse behaviors and cold-start users who only exist in one domain. This leads to a drop in the performance of existing CDSR methods in the real-world industry platform. Therefore, improving the consistency and effectiveness of models in open-world CDSR scenarios is crucial for constructing CDSR models (1st CH). Recently, some SR approaches have utilized auxiliary behaviors to complement the information for long-tailed users. However, these multi-behavior SR methods cannot deliver promising performance in CDSR, as they overlook the semantic gap between target and auxiliary behaviors, as well as user interest deviation across domains (2nd CH).	翻訳日:2024-06-07 04:05:59 公開日:2024-06-05
# 大規模言語モデルにおけるパーソナリティテストの有効性の検証 Challenging the Validity of Personality Tests for Large Language Models ( http://arxiv.org/abs/2311.05297v2 ) ライセンス: Link先を確認	Tom Sühr, Florian E. Dorner, Samira Samadi, Augustin Kelava,	(参考訳) GPT-4のような大きな言語モデル(LLM)は、テキストベースのインタラクションにおいて、ますます人間らしく振る舞うように見え、もともと人間のために開発されたアンケートを用いて、LLMの性格特性を評価する試みが盛んに行われている。再利用対策はLLMを評価するための資源効率のよい方法であるが、人間のサブポピュレーション全体にわたって評価結果が有効であることを確実にするためには、注意深い適応が必要である。本研究では,人格検査に対するLSMの反応が人間の反応から体系的に逸脱していることを示す。具体的には、逆コードされたアイテム("I am introverted" 対 "I am extraverted" )はどちらも肯定的に答えられることが多い。さらに、特定のパーソナリティタイプをシミュレートするためにLLMを「操る」ために設計されたプロンプト間のバリエーションは、人間のサンプルから5つの独立したパーソナリティ要素を明確な分離に従わない。これらの結果を踏まえ、LLMの「個性」のような潜在的に不明確な概念について強い結論を出す前に、LSMに対する検査の妥当性を検討することが重要であると信じている。 With large language models (LLMs) like GPT-4 appearing to behave increasingly human-like in text-based interactions, it has become popular to attempt to evaluate personality traits of LLMs using questionnaires originally developed for humans. While reusing measures is a resource-efficient way to evaluate LLMs, careful adaptations are usually required to ensure that assessment results are valid even across human subpopulations. In this work, we provide evidence that LLMs' responses to personality tests systematically deviate from human responses, implying that the results of these tests cannot be interpreted in the same way. Concretely, reverse-coded items ("I am introverted" vs. "I am extraverted") are often both answered affirmatively. Furthermore, variation across prompts designed to "steer" LLMs to simulate particular personality types does not follow the clear separation into five independent personality factors from human samples. In light of these results, we believe that it is important to investigate tests' validity for LLMs before drawing strong conclusions about potentially ill-defined concepts like LLMs' "personality".	翻訳日:2024-06-07 04:05:59 公開日:2024-06-05
# 不確かさ伝達による逐次ラベリングの不確かさ推定 Uncertainty Estimation on Sequential Labeling via Uncertainty Transmission ( http://arxiv.org/abs/2311.08726v2 ) ライセンス: Link先を確認	Jianfeng He, Linlin Yu, Shuo Lei, Chang-Tien Lu, Feng Chen,	(参考訳) シーケンシャルラベリング(Sequential labeling)は、名前付きエンティティ認識(NER)のようなシーケンス内の各トークンのラベルを予測するタスクである。 NERタスクは、与えられたテキストからエンティティを抽出してそのラベルを予測することを目的としており、情報抽出において重要である。これまでの研究はNERの性能向上に大きな進歩を示してきたが,NERに対する不確実性推定(UE-NER)はいまだに未検討だが必須である。本研究は,NER予測の不確実性スコアを推定することを目的としたUE-NERに焦点を当てる。従来の不確実性推定モデルは、エンティティ間の接続(すなわち、あるエンティティの埋め込みが他のエンティティに基づいて学習されること)とエンティティ抽出サブタスクにおける誤ったスパンのケースという、NERの2つのユニークな特徴を見落としていることが多い。そこで我々は,他のトークンから伝達された不確実性を考慮して,抽出されたエンティティに対する不確実性スコアを推定する逐次ラベリング事後ネットワーク(SLPN)を提案する。さらに,誤ったスパンのケースの特殊性に対処するための評価戦略を定義した。私たちのSLPNは、MIT-Restaurantデータセット上のAUPRの5.54ポイント改善など、3つのデータセットで大幅な改善を達成した。我々のコードは https://github.com/he159ok/UncSeqLabeling_SLPN で利用可能です。 Sequential labeling is a task predicting labels for each token in a sequence, such as Named Entity Recognition (NER). NER tasks aim to extract entities and predict their labels given a text, which is important in information extraction. Although previous works have shown great progress in improving NER performance, uncertainty estimation on NER (UE-NER) is still underexplored but essential. This work focuses on UE-NER, which aims to estimate uncertainty scores for the NER predictions. Previous uncertainty estimation models often overlook two unique characteristics of NER: the connection between entities (i.e., one entity embedding is learned based on the other ones) and wrong span cases in the entity extraction subtask. Therefore, we propose a Sequential Labeling Posterior Network (SLPN) to estimate uncertainty scores for the extracted entities, considering uncertainty transmitted from other tokens. Moreover, we have defined an evaluation strategy to address the specificity of wrong-span cases. Our SLPN has achieved significant improvements on three datasets, such as a 5.54-point improvement in AUPR on the MIT-Restaurant dataset. Our code is available at https://github.com/he159ok/UncSeqLabeling_SLPN.	翻訳日:2024-06-07 04:05:59 公開日:2024-06-05
# バイアスニューロン除去による指示追従言語モデルの緩和バイアス Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination ( http://arxiv.org/abs/2311.09627v2 ) ライセンス: Link先を確認	Nakyeong Yang, Taegwan Kang, Jungkyu Choi, Honglak Lee, Kyomin Jung,	(参考訳) 命令追従言語モデルは、しばしば望ましくないバイアスを示す。これらの望ましくないバイアスは、ゼロショット例のプロンプトを通じて幅広い命令が使用される言語モデルの実際の使用において加速される可能性がある。この問題を解決するために、まずバイアス出力に大きく影響するバイアスニューロンを定義し、その存在を経験的に証明する。さらに,命令追従設定における言語モデルのバイアスニューロンを除去するための,新しい実用的なバイアス緩和手法であるCRISPRを提案する。 CRISPRは自動的にバイアス出力を決定し、バイアス出力に影響を与えるニューロンを説明可能性法を用いてバイアスニューロンに分類する。実験により,モデルのタスク性能と既存知識を損なうことなく,ゼロショット命令追従条件下でのバイアス軽減効果が示された。実験の結果,様々な命令やデータセットの下で頑健性を示すため,本手法の一般化可能性を明らかにした。驚いたことに、我々の手法は、少数のニューロン(少なくとも3つ)を除去することで、言語モデルのバイアスを軽減することができる。 Instruction-following language models often show undesirable biases. These undesirable biases may be accelerated in the real-world usage of language models, where a wide range of instructions is used through zero-shot example prompting. To solve this problem, we first define the bias neuron, which significantly affects biased outputs, and prove its existence empirically. Furthermore, we propose a novel and practical bias mitigation method, CRISPR, to eliminate bias neurons of language models in instruction-following settings. CRISPR automatically determines biased outputs and categorizes neurons that affect the biased outputs as bias neurons using an explainability method. Experimental results demonstrate the effectiveness of our method in mitigating biases under zero-shot instruction-following settings without losing the model's task performance and existing knowledge. The experimental results reveal the generalizability of our method as it shows robustness under various instructions and datasets. Surprisingly, our method can mitigate the bias in language models by eliminating only a few neurons (at least three).	翻訳日:2024-06-07 04:05:59 公開日:2024-06-05
# The Chosen One: テキスト・画像拡散モデルにおける一貫したキャラクター The Chosen One: Consistent Characters in Text-to-Image Diffusion Models ( http://arxiv.org/abs/2311.10093v4 ) ライセンス: Link先を確認	Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski,	(参考訳) テキスト・ツー・イメージ生成モデルの最近の進歩は、視覚的創造性に対する大きな可能性を解き放っている。しかし、これらのモデルを使用するユーザは、ストーリービジュアライゼーション、ゲーム開発、アセットデザイン、広告など、多くの現実世界アプリケーションにとって重要な側面である、一貫したキャラクターの生成に苦労している。現在の手法は、通常、ターゲットキャラクターの複数の既存のイメージに依存するか、労働集約的な手作業を伴う。そこで本研究では,テキストプロンプトを唯一の入力とする,一貫したキャラクター生成のための完全自動解を提案する。それぞれの段階において、類似したアイデンティティを共有する画像のまとまりのある集合を識別し、この集合からより一貫したアイデンティティを抽出する反復手順を導入する。定量的解析により,本手法はベースライン法と比較して,プロンプトへの整合性とアイデンティティの一貫性のバランスが良好であることを示し,これらの知見はユーザ研究によって裏付けられている。結論として,本手法の実用化例をいくつか紹介する。 Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, the users that use these models struggle with the generation of consistent characters, a crucial aspect for numerous real-world applications such as story visualization, game development, asset design, advertising, and more. Current methods typically rely on multiple pre-existing images of the target character or involve labor-intensive manual processes. In this work, we propose a fully automated solution for consistent character generation, with the sole input being a text prompt. We introduce an iterative procedure that, at each stage, identifies a coherent set of images sharing a similar identity and extracts a more consistent identity from this set. Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods, and these findings are reinforced by a user study. To conclude, we showcase several practical applications of our approach.	翻訳日:2024-06-07 04:05:59 公開日:2024-06-05
# 確率回路からのプルーニングに基づく記述の抽出 Pruning-Based Extraction of Descriptions from Probabilistic Circuits ( http://arxiv.org/abs/2311.13379v2 ) ライセンス: Link先を確認	Sieben Bocklandt, Vincent Derkinderen, Koen Vanderstraeten, Wouter Pijpops, Kurt Jaspers, Wannes Meert,	(参考訳) 概念学習は、様々な分野の応用における一般的なタスクである。モチベーションの例として、音楽プレイリスト生成の応用について考察する。そこでは、プレイリストは曲の固定コレクションとしてではなく、概念として表現される(例:「ラグジュアリング・ミュージック」)。本研究では確率回路を用いて、正にラベル付けされた実例から概念を学習する。これらの回路は、このタスクに魅力的なトラクタブルモデルを形成するが、ドメインの専門家がそれらを検査し分析することは困難であり、特定のアプリケーションでの使用を妨げる。本稿では,学習した確率回路を高密度領域をカバーする論理に基づく判別モデルに変換することにより,この問題を解決することを提案する。すなわち、回路が確実に学習概念の一部として分類する領域である。このアプローチの一環として、我々は、F1スコアと集約エントロピーと呼ばれる新たに提案された記述長の両方を考慮して、確率回路から低密度領域を抽出するアルゴリズムであるPUTPUTを提案する。本実験は,音楽プレイリスト生成タスクや類似データセットにおいて,差別的モデルを提供することによる,競争力に優れるアプローチの有効性を実証するものである。 Concept learning is a general task with applications in various domains. As a motivating example we consider the application of music playlist generation, where a playlist is represented as a concept (e.g., `relaxing music') rather than as a fixed collection of songs. In this work we use a probabilistic circuit to learn a concept from positively labelled and unlabelled examples. While these circuits form an attractive tractable model for this task, it is challenging for a domain expert to inspect and analyse them, which impedes their use within certain applications. We propose to resolve this by converting a learned probabilistic circuit into a logic-based discriminative model that covers the high density regions of the circuit. That is, those regions the circuit classifies as certainly being part of the learned concept. As part of this approach we present two contributions: PUTPUT, an algorithm to prune low density regions from a probabilistic circuit while considering both the F1-score and a newly proposed description length that we call aggregated entropy. 
Our experiments demonstrate the effectiveness of our approach in providing discriminative models, outperforming competitors on the music playlist generation task and similar datasets.	翻訳日:2024-06-07 04:05:59 公開日:2024-06-05
# 高等教育におけるチャットGPTの倫理的意味:スコーピング・レビュー Ethical Implications of ChatGPT in Higher Education: A Scoping Review ( http://arxiv.org/abs/2311.14378v3 ) ライセンス: Link先を確認	Ming Li, Ariunaa Enkhtur, Fei Cheng, Beverley Anne Yamamoto,	(参考訳) 本稿では,ChatGPTを高等教育に活用する上での倫理的課題について考察する。英語,中国語,日本語の最近の学術論文をレビューすることで,本論文の深層的な検討とギャップの特定をめざした。 Arksey and O'Malley's scoping review framework(2005)を参考に、検索用語を定義し、3つの対象言語の4つのデータベースから関連する出版物を同定した。研究の結果、論文の大半は議論論文であることがわかったが、初期の経験的な研究がいくつかあった。これらの研究で強調された倫理的問題は、主に学術的完全性、評価問題、データ保護に関するものである。生成人工知能の迅速な展開を考えると、教育者がより経験的な研究を行い、その利用のための健全な倫理政策を開発することが不可欠である。 This scoping review explores the ethical challenges of using ChatGPT in higher education. By reviewing recent academic articles in English, Chinese, and Japanese, we aimed to provide a deep dive review and identify gaps in the literature. Drawing on Arksey and O'Malley's (2005) scoping review framework, we defined search terms and identified relevant publications from four databases in the three target languages. The research results showed that the majority of the papers were discussion papers, but there was some early empirical work. The ethical issues highlighted in these works mainly concern academic integrity, assessment issues, and data protection. Given the rapid deployment of generative artificial intelligence, it is imperative for educators to conduct more empirical studies to develop sound ethical policies for its use.	翻訳日:2024-06-07 03:55:26 公開日:2024-06-05
# StableSSM: 安定再パラメータ化による状態空間モデルの記憶の呪いの緩和 StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization ( http://arxiv.org/abs/2311.14495v4 ) ライセンス: Link先を確認	Shida Wang, Qianxiao Li,	(参考訳) 本稿では,パラメータ化の観点から,状態空間モデル(SSM)の長期記憶学習能力について検討する。再パラメータ化を行わない状態空間モデルは、従来のRNNと同様の記憶の制約を示すことを証明する。すなわち、状態空間モデルによって安定に近似できる対象関係は、指数的に減衰するメモリを持たなければならない。本分析では, この「記憶の呪い」がリカレント重みの安定境界への収束に起因することを明らかにし, 再パラメータ化技術が有効となりうることを示唆する。そこで本稿では,SSMのメモリ制限を効果的に解消する再パラメータ化手法のクラスを導入する。近似能力の向上に加えて,再パラメータ化方式の原理的選択により最適化安定性も向上することを示す。これらの知見を,合成データセット,言語モデル,画像分類を用いて検証する。 In this paper, we investigate the long-term memory learning capabilities of state-space models (SSMs) from the perspective of parameterization. We prove that state-space models without any reparameterization exhibit a memory limitation similar to that of traditional RNNs: the target relationships that can be stably approximated by state-space models must have an exponential decaying memory. Our analysis identifies this "curse of memory" as a result of the recurrent weights converging to a stability boundary, suggesting that a reparameterization technique can be effective. To this end, we introduce a class of reparameterization techniques for SSMs that effectively lift its memory limitations. Besides improving approximation capabilities, we further illustrate that a principled choice of reparameterization scheme can also enhance optimization stability. We validate our findings using synthetic datasets, language models and image classifications.	翻訳日:2024-06-07 03:55:26 公開日:2024-06-05
# SPIN:テキスト分類のための大規模言語モデルにおける内部ニューロンのスパース化と統合 SPIN: Sparsifying and Integrating Internal Neurons in Large Language Models for Text Classification ( http://arxiv.org/abs/2311.15983v2 ) ライセンス: Link先を確認	Difan Jiao, Yilun Liu, Zhenwei Tang, Daniel Matter, Jürgen Pfeffer, Ashton Anderson,	(参考訳) 大規模言語モデル(LLM)が革新した多くのタスクの1つは、テキスト分類である。しかし、現在のテキスト分類のパラダイムは、LLMの最終層の出力のみに依存しており、内部のニューロンに含まれる豊富な情報がほとんど使われていない。本研究では,テキスト分類のために LLM 中間層の内部ニューロンをスパース化・統合する,モデルに依存しないフレームワーク SPIN を提案する。具体的には、SPINは、線形プロービングに基づく顕著(サリエント)ニューロンの選択を層ごとに行うことで内部ニューロンをスパース化し、無関係なニューロンからのノイズを回避し、効率性を確保する。その後、層をまたいだ顕著ニューロンが統合され、分類ヘッドの多層的特徴として機能する。大規模な実験結果から,提案したSPINはテキスト分類の精度,効率,解釈可能性を大幅に向上することがわかった。 Among the many tasks that Large Language Models (LLMs) have revolutionized is text classification. Current text classification paradigms, however, rely solely on the output of the final layer in the LLM, with the rich information contained in internal neurons largely untapped. In this study, we present SPIN: a model-agnostic framework that sparsifies and integrates internal neurons of intermediate layers of LLMs for text classification. Specifically, SPIN sparsifies internal neurons by linear probing-based salient neuron selection layer by layer, avoiding noise from unrelated neurons and ensuring efficiency. The cross-layer salient neurons are then integrated to serve as multi-layered features for the classification head. Extensive experimental results show our proposed SPIN significantly improves text classification accuracy, efficiency, and interpretability.	翻訳日:2024-06-07 03:55:26 公開日:2024-06-05
# アダプティブ・プロンプト学習による統一モーダル・サリエント物体検出 Unified-modal Salient Object Detection via Adaptive Prompt Learning ( http://arxiv.org/abs/2311.16835v5 ) ライセンス: Link先を確認	Kunpeng Wang, Chenglong Li, Zhengzheng Tu, Zhengyi Liu, Bin Luo,	(参考訳) 既存のシングルモーダルおよびマルチモーダルサルトオブジェクト検出(SOD)手法は、それぞれのタスクに適した特定のアーキテクチャの設計に重点を置いている。しかし、異なるタスクに対する全く異なるモデルの開発は、高い計算と実践的なデプロイメントコストだけでなく、労働と時間の消費につながる。本稿では,UniSODと呼ばれる統合フレームワークにおいて,タスク間の事前知識の重複を完全に活用する単一モーダルSODとマルチモーダルSODの両方に対処する。それでも、モダリティ変数入力に適切な戦略を割り当てることは困難である。この目的のために、UniSODは適応的なプロンプト学習を通じてタスク固有のヒントを学習し、提案したトレーニング済みベースラインSODモデルに接続して対応するタスクを処理する。切り替え可能なプロンプト生成ブロックから各モダリティ対応プロンプトを生成し、人間の介入なしにシングルモーダルおよびマルチモーダル入力に基づいて構造切替を適応的に行う。エンドツーエンドのジョイントトレーニングを通じて、RGB、RGB-D、RGB-T SODの14のベンチマークデータセットに対する全体的なパフォーマンス改善を実現し、本手法がシングルモーダルおよびマルチモーダルのSODタスクを効果的かつ効率的に統一することを示し、コードと結果はhttps://github.com/Angknpng/UniSODで利用可能である。 Existing single-modal and multi-modal salient object detection (SOD) methods focus on designing specific architectures tailored for their respective tasks. However, developing completely different models for different tasks leads to labor and time consumption, as well as high computational and practical deployment costs. In this paper, we attempt to address both single-modal and multi-modal SOD in a unified framework called UniSOD, which fully exploits the overlapping prior knowledge between different tasks. Nevertheless, assigning appropriate strategies to modality variable inputs is challenging. To this end, UniSOD learns modality-aware prompts with task-specific hints through adaptive prompt learning, which are plugged into the proposed pre-trained baseline SOD model to handle corresponding tasks, while only requiring few learnable parameters compared to training the entire model. Each modality-aware prompt is generated from a switchable prompt generation block, which adaptively performs structural switching based on single-modal and multi-modal inputs without human intervention. 
Through end-to-end joint training, UniSOD achieves overall performance improvement on 14 benchmark datasets for RGB, RGB-D, and RGB-T SOD, which demonstrates that our method effectively and efficiently unifies single-modal and multi-modal SOD tasks. The code and results are available at https://github.com/Angknpng/UniSOD.	翻訳日:2024-06-07 03:55:26 公開日:2024-06-05
# ガウス分布型プロトタイプと生成モデルとの混合による解釈可能・信頼可能な画像認識 Mixture of Gaussian-distributed Prototypes with Generative Modelling for Interpretable and Trustworthy Image Recognition ( http://arxiv.org/abs/2312.00092v2 ) ライセンス: Link先を確認	Chong Wang, Yuanhong Chen, Fengbei Liu, Yuyuan Liu, Davis James McCarthy, Helen Frazer, Gustavo Carneiro,	(参考訳) ProtoPNetなどのプロトタイプ部品(prototypical-part)手法は、予測を訓練プロトタイプに結び付けることで、画像認識における解釈可能性を高め、意思決定に関する直感的な洞察を提供する。プロトタイプのポイントベース学習に依存する既存の手法は、通常は2つの重要な問題に直面している。 1)学習したプロトタイプは表現力が限られており、アウト・オブ・ディストリビューション(OoD)入力の検出に適さないため、決定の信頼性が低下する。 2) 学習したプロトタイプを訓練画像空間へ射影し直す必要があり, これが予測性能の大幅な低下を引き起こす。さらに、現在のプロトタイプ学習は、訓練中に最も活性の高い対象部分のみを考慮する積極的なアプローチを採用しており、重要な分類情報を保持するサブサリエントな対象領域を見落としている。本稿では,Mixture of Gaussian-distributed Prototypes (MGProto) と呼ぶ,プロトタイプ分布を学習するための新しい生成パラダイムを提案する。 MGProtoのプロトタイプ分布により、解釈可能な画像分類とOoD入力の信頼性の高い認識の両方が可能になる。 MGProtoの最適化は、学習したプロトタイプ分布を訓練画像空間へ自然に射影するため、プロトタイプ射影による性能劣化に対処できる。さらに,最も活性の高い対象部分だけでなく,サブサリエントな対象部分も考慮する,新規かつ効果的なプロトタイプマイニング戦略を開発した。モデルのコンパクト化を促進するため,重要度事前確率の低いプロトタイプを除去してMGProtoを刈り込むことも提案する。 CUB-200-2011、Stanford Cars、Stanford Dogs、およびOxford-IIIT Petsデータセットに関する実験は、MGProtoが最先端の画像認識とOoD検出性能を達成し、解釈可能性についても有望な結果を示すことを示している。 Prototypical-part methods, e.g., ProtoPNet, enhance interpretability in image recognition by linking predictions to training prototypes, thereby offering intuitive insights into their decision-making. Existing methods, which rely on a point-based learning of prototypes, typically face two critical issues: 1) the learned prototypes have limited representation power and are not suitable to detect Out-of-Distribution (OoD) inputs, reducing their decision trustworthiness; and 2) the necessary projection of the learned prototypes back into the space of training images causes a drastic degradation in the predictive performance. Furthermore, current prototype learning adopts an aggressive approach that considers only the most active object parts during training, while overlooking sub-salient object regions which still hold crucial classification information. 
In this paper, we present a new generative paradigm to learn prototype distributions, termed as Mixture of Gaussian-distributed Prototypes (MGProto). The distribution of prototypes from MGProto enables both interpretable image classification and trustworthy recognition of OoD inputs. The optimisation of MGProto naturally projects the learned prototype distributions back into the training image space, thereby addressing the performance degradation caused by prototype projection. Additionally, we develop a novel and effective prototype mining strategy that considers not only the most active but also sub-salient object parts. To promote model compactness, we further propose to prune MGProto by removing prototypes with low importance priors. Experiments on CUB-200-2011, Stanford Cars, Stanford Dogs, and Oxford-IIIT Pets datasets show that MGProto achieves state-of-the-art image recognition and OoD detection performances, while providing encouraging interpretability results.	翻訳日:2024-06-07 03:55:26 公開日:2024-06-05
# グラフ表現学習のためのリカレント距離フィルタリング Recurrent Distance Filtering for Graph Representation Learning ( http://arxiv.org/abs/2312.01538v3 ) ライセンス: Link先を確認	Yuhui Ding, Antonio Orvieto, Bobby He, Thomas Hofmann,	(参考訳) 反復的なワンホップメッセージパッシングに基づくグラフニューラルネットワークは、遠方のノードからの情報を効果的に活用するのに苦労することが示されている。一方、グラフ変換器は、各ノードが他のすべてのノードに直接注意を向けることを可能にするが、グラフの帰納バイアスを欠き、アドホックな位置符号化に頼らざるを得ない。本稿では,これらの課題を解決するための新しいアーキテクチャを提案する。提案手法は、深層状態空間モデルがもたらした長距離モデリングにおける最近のブレークスルーに着想を得ている。すなわち、与えられた対象ノードに対して、他のノードを対象ノードまでの最短距離ごとに集約し、線形RNNを用いてホップ表現の系列を符号化する。線形RNNは、安定な長距離信号伝搬のために特定の対角形でパラメータ化され、理論的には近傍階層を符号化するのに十分な表現力を持つ。位置符号化を必要とせず、我々のモデルの性能が様々なベンチマークにおいて最先端のグラフ変換器と同等以上であり、計算コストも大幅に削減されることを実証的に示す。私たちのコードはhttps://github.com/skeletondyh/GREDでオープンソースとして公開されている。 Graph neural networks based on iterative one-hop message passing have been shown to struggle in harnessing the information from distant nodes effectively. Conversely, graph transformers allow each node to attend to all other nodes directly, but lack graph inductive bias and have to rely on ad-hoc positional encoding. In this paper, we propose a new architecture to reconcile these challenges. Our approach stems from the recent breakthroughs in long-range modeling provided by deep state-space models: for a given target node, our model aggregates other nodes by their shortest distances to the target and uses a linear RNN to encode the sequence of hop representations. The linear RNN is parameterized in a particular diagonal form for stable long-range signal propagation and is theoretically expressive enough to encode the neighborhood hierarchy. With no need for positional encoding, we empirically show that the performance of our model is comparable to or better than that of state-of-the-art graph transformers on various benchmarks, with a significantly reduced computational cost. Our code is open-source at https://github.com/skeletondyh/GRED.	翻訳日:2024-06-07 03:55:26 公開日:2024-06-05
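The aggregation described in the abstract above (group nodes by shortest distance to a target, then run a linear recurrence over the hop sequence) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `hop_sets` and `encode`, the scalar node features, the single scalar `decay` standing in for a diagonal recurrent weight, and the far-to-near processing order are all assumptions made for this sketch.

```python
from collections import deque


def hop_sets(adj, target):
    """Group reachable nodes by shortest-path distance to `target` (BFS)."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    hops = [[] for _ in range(max(dist.values()) + 1)]
    for node, d in dist.items():
        hops[d].append(node)
    return hops


def encode(adj, feats, target, decay):
    """Sum features per hop, then apply h <- decay * h + x over the hop sequence.

    Hypothetical simplification: features are scalars and the recurrence has a
    single scalar weight; hops are fed from farthest to nearest (an assumption).
    """
    h = 0.0
    for hop in reversed(hop_sets(adj, target)):
        x = sum(feats[n] for n in hop)  # permutation-invariant aggregation per hop
        h = decay * h + x               # linear recurrence, no nonlinearity
    return h
```

For a path graph `0-1-2-3` with features `[1.0, 2.0, 3.0, 4.0]`, target node 0 and `decay=0.5`, the recurrence visits hop sums 4, 3, 2, 1 in that order, so distant hops are attenuated more strongly than near ones.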
# オープンセット画像復元のためのテスト時間劣化適応 Test-Time Degradation Adaptation for Open-Set Image Restoration ( http://arxiv.org/abs/2312.02197v4 ) ライセンス: Link先を確認	Yuanbiao Gou, Haiyu Zhao, Boyun Li, Xinyan Xiao, Xi Peng,	(参考訳) 事前定義された劣化から画像を復元するクローズセットのシナリオとは対照的に、オープンセット画像復元は、事前学習段階では予期されなかった未知の劣化に対処することを目的としており、我々の知る限りあまり研究されていない。本研究は,この困難な問題を考察し,その本質がテストデータとトレーニングデータ間の未特定の分布シフトであることを明らかにする。近年、テスト時間適応は、この本質的な格差に対処するための基本的な手法として現れている。これに着想を得て,3つのコンポーネントから構成されるオープンセット画像復元のためのテスト時間劣化適応フレームワークを提案する。すなわち、 i) クリーンな画像を生成するための,事前学習済みで劣化に依存しない拡散モデル、 ii) テスト段階で入力画像に基づいて未知の劣化に適応するテスト時間劣化アダプタ、 iii) アダプタを介してモデルを誘導し,対応するクリーン画像を生成するアダプタ誘導画像復元である。複数の劣化に関する実験により,本手法はタスク固有の手法と同等か、それ以上の性能を達成できることが示された。コードはhttps://github.com/XLearning-SCU/2024-ICML-TAOで公開されている。 In contrast to close-set scenarios that restore images from a predefined set of degradations, open-set image restoration aims to handle the unknown degradations that were unforeseen during the pretraining phase, which is less-touched as far as we know. This work studies this challenging problem and reveals its essence as unidentified distribution shifts between the test and training data. Recently, test-time adaptation has emerged as a fundamental method to address this inherent disparity. Inspired by it, we propose a test-time degradation adaptation framework for open-set image restoration, which consists of three components, i.e., i) a pre-trained and degradation-agnostic diffusion model for generating clean images, ii) a test-time degradation adapter that adapts to the unknown degradations based on the input image during the testing phase, and iii) adapter-guided image restoration that guides the model through the adapter to produce the corresponding clean image. Through experiments on multiple degradations, we show that our method achieves comparable or even better performance than task-specific methods. The code is available at https://github.com/XLearning-SCU/2024-ICML-TAO.	翻訳日:2024-06-07 03:55:26 公開日:2024-06-05
# PixelLM: 大規模マルチモーダルモデルによるピクセル推論 PixelLM: Pixel Reasoning with Large Multimodal Model ( http://arxiv.org/abs/2312.02228v2 ) ライセンス: Link先を確認	Zhongwei Ren, Zhicheng Huang, Yunchao Wei, Yao Zhao, Dongmei Fu, Jiashi Feng, Xiaojie Jin,	(参考訳) 大規模マルチモーダルモデル(LMM)は目覚ましい進歩を遂げているが、複数のオープンワールドターゲットを含む画像推論タスクのためのピクセルレベルのマスクを生成することは依然として課題である。このギャップを埋めるために、ピクセルレベルの推論と理解のための効果的かつ効率的なLMMであるPixelLMを導入する。 PixelLMの中核は、新規で軽量なピクセルデコーダと包括的なセグメンテーションコードブックである。デコーダは、詳細なターゲット関連情報を符号化するコードブックトークンの隠れ埋め込みから、マスクを効率よく生成する。この設計により、PixelLMは一般的なLMMの構造と調和し、コストのかかる追加のセグメンテーションモデルを必要としない。さらに,モデルが複数のターゲットを区別する能力を高め,マスク品質を大幅に向上させるターゲット改良損失(target refinement loss)を提案する。この分野での研究を進めるために、我々は高品質なマルチターゲット推論セグメンテーションベンチマークであるMUSEを構築した。 PixelLMは、さまざまなピクセルレベルの画像推論・理解タスクで優れた性能を示し、MUSEやシングルおよびマルチ参照セグメンテーションなど、複数のベンチマークで確立された手法を上回る。包括的なアブレーションにより, 提案した各構成要素の有効性が確認された。すべてのコード、モデル、データセットは公開される予定である。 While large multimodal models (LMMs) have achieved remarkable progress, generating pixel-level masks for image reasoning tasks involving multiple open-world targets remains a challenge. To bridge this gap, we introduce PixelLM, an effective and efficient LMM for pixel-level reasoning and understanding. Central to PixelLM is a novel, lightweight pixel decoder and a comprehensive segmentation codebook. The decoder efficiently produces masks from the hidden embeddings of the codebook tokens, which encode detailed target-relevant information. With this design, PixelLM harmonizes with the structure of popular LMMs and avoids the need for additional costly segmentation models. Furthermore, we propose a target refinement loss to enhance the model's ability to differentiate between multiple targets, leading to substantially improved mask quality. To advance research in this area, we construct MUSE, a high-quality multi-target reasoning segmentation benchmark. 
PixelLM excels across various pixel-level image reasoning and understanding tasks, outperforming well-established methods in multiple benchmarks, including MUSE, single- and multi-referring segmentation. Comprehensive ablations confirm the efficacy of each proposed component. All code, models, and datasets will be publicly available.	翻訳日:2024-06-07 03:55:26 公開日:2024-06-05
# 単純化されたモデルの一般化における解釈可能性イリュージョン Interpretability Illusions in the Generalization of Simplified Models ( http://arxiv.org/abs/2312.03656v2 ) ライセンス: Link先を確認	Dan Friedman, Andrew Lampinen, Lucas Dixon, Danqi Chen, Asma Ghandeharioun,	(参考訳) ディープラーニングシステムを研究する一般的な方法は、単純化されたモデル表現を使うことである。例えば、特異値分解を用いて、モデルの隠れ状態を低次元空間で可視化する。このアプローチは、これらの単純化の結果が元のモデルに忠実であると仮定する。ここでは、この仮定に対する重要な注意点を示す。単純化された表現がトレーニングセット上で完全なモデルを正確に近似できるとしても、分布外でのモデルの振る舞いを正確に捉えられない可能性がある。我々は、Dyck括弧対応言語やコード補完タスクを含む、体系的な一般化分割を持つ制御されたデータセット上でTransformerモデルを訓練することで、これを例示する。次元削減やクラスタリングといったツールを使ってこれらのモデルを単純化し、単純化されたプロキシが元のモデルの振る舞いとどの程度一致するかを明示的にテストする。その結果、一貫した一般化ギャップが見られた。すなわち、単純化されたプロキシが分布内評価では元のモデルにより忠実であり、体系的一般化の様々なテストではあまり忠実でない場合である。これには、元のモデルは体系的に一般化するが単純化されたプロキシは失敗する場合や、単純化されたプロキシの方がよく一般化する場合が含まれる。これらの結果は、SVD などのツールを用いて導かれた機械論的解釈が、新しい状況下でのモデルの振る舞いをどこまで確実に予測できるのか、という疑問を提起する。 A common method to study deep learning systems is to use simplified model representations--for example, using singular value decomposition to visualize the model's hidden states in a lower dimensional space. This approach assumes that the results of these simplifications are faithful to the original model. Here, we illustrate an important caveat to this assumption: even if the simplified representations can accurately approximate the full model on the training set, they may fail to accurately capture the model's behavior out of distribution. We illustrate this by training Transformer models on controlled datasets with systematic generalization splits, including the Dyck balanced-parenthesis languages and a code completion task. We simplify these models using tools like dimensionality reduction and clustering, and then explicitly test how these simplified proxies match the behavior of the original model. We find consistent generalization gaps: cases in which the simplified proxies are more faithful to the original model on the in-distribution evaluations and less faithful on various tests of systematic generalization. 
This includes cases where the original model generalizes systematically but the simplified proxies fail, and cases where the simplified proxies generalize better. Together, our results raise questions about the extent to which mechanistic interpretations derived using tools like SVD can reliably predict what a model will do in novel situations.	翻訳日:2024-06-07 03:55:26 公開日:2024-06-05
# 並列関数呼び出しのためのLLMコンパイラ An LLM Compiler for Parallel Function Calling ( http://arxiv.org/abs/2312.04511v3 ) ライセンス: Link先を確認	Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami,	(参考訳) 最近のLLMの推論能力により、外部関数呼び出しを実行して、知識のカットオフ、算術能力の不足、プライベートデータへのアクセスの欠如といった固有の制限を克服できるようになった。この進展により、LLMはコンテキストに基づいて複数の関数を選択・調整し、より複雑な問題に対処できるようになった。しかしながら、関数呼び出しの現在の手法は、多くの場合、関数ごとに逐次的な推論と行動を必要とし、高いレイテンシやコスト、時には不正確な振る舞いをもたらす。これに対処するため,関数を並列に実行して複数の関数呼び出しを効率的にオーケストレーションするLLMCompilerを導入する。古典的なコンパイラの原理から着想を得たLLMCompilerは、3つのコンポーネントで並列関数呼び出しを可能にする。 (i) 関数呼び出しの実行計画を定式化する Function Calling Planner、 (ii) 関数呼び出しタスクをディスパッチする Task Fetching Unit、 (iii) これらのタスクを並列に実行する Executor である。 LLMCompilerは関数呼び出しに最適化されたオーケストレーションを自動的に生成し、オープンソースモデルとクローズドソースモデルの両方で使用できる。我々はLLMCompilerを、関数呼び出しのパターンが異なる様々なタスクでベンチマークした。ReActと比較して、最大3.7倍のレイテンシ高速化、最大6.7倍のコスト削減、最大約9%の精度向上を一貫して観察した。私たちのコードはhttps://github.com/SqueezeAILab/LLMCompilerから入手可能である。 The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LLMs to select and coordinate multiple functions based on the context to tackle more complex problems. However, current methods for function calling often require sequential reasoning and acting for each function which can result in high latency, cost, and sometimes inaccurate behavior. To address this, we introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calls. Drawing inspiration from the principles of classical compilers, LLMCompiler enables parallel function calling with three components: (i) a Function Calling Planner, formulating execution plans for function calling; (ii) a Task Fetching Unit, dispatching function calling tasks; and (iii) an Executor, executing these tasks in parallel. 
LLMCompiler automatically generates an optimized orchestration for the function calls and can be used with both open-source and closed-source models. We have benchmarked LLMCompiler on a range of tasks with different patterns of function calling. We observe consistent latency speedup of up to 3.7x, cost savings of up to 6.7x, and accuracy improvement of up to ~9% compared to ReAct. Our code is available at https://github.com/SqueezeAILab/LLMCompiler.	翻訳日:2024-06-07 03:55:26 公開日:2024-06-05
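The three-part structure named in the abstract above (a plan with dependencies, a unit that dispatches ready tasks, and a parallel executor) can be illustrated with a small dependency-aware scheduler. This is only a sketch under stated assumptions and not LLMCompiler's actual API: `run_plan`, the plan format, and the `"$task_id"` placeholder convention are hypothetical, the plan is written by hand instead of being produced by an LLM planner, and tasks are dispatched in waves, which is simpler than a streaming fetch unit.

```python
from concurrent.futures import ThreadPoolExecutor


def run_plan(plan, tools, max_workers=4):
    """Execute a hand-written plan, running independent tasks in parallel.

    plan:  {task_id: (tool_name, args, deps)}; an argument of the form
           "$task_id" is replaced by that task's result (hypothetical convention).
    tools: {tool_name: callable}
    """
    results = {}
    pending = dict(plan)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # "Fetch" every task whose dependencies have all finished.
            ready = [t for t, (_, _, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cyclic or unsatisfiable dependencies")
            futures = {}
            for t in ready:
                name, args, _ = pending.pop(t)
                resolved = [results[a[1:]] if isinstance(a, str) and a.startswith("$")
                            else a for a in args]
                futures[t] = pool.submit(tools[name], *resolved)
            # Wait for the whole wave before looking for newly ready tasks.
            for t, fut in futures.items():
                results[t] = fut.result()
    return results
```

As a usage example, with `tools = {"length": len, "add": lambda a, b: a + b}` and a plan in which `t1` and `t2` each call `length` on a string and `t3` calls `add` on `"$t1"` and `"$t2"`, the two `length` calls run in the same wave and `t3` runs once both have finished.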
# Simul-LLM:大規模言語モデルを用いた高品質同時翻訳のためのフレームワーク Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models ( http://arxiv.org/abs/2312.04691v3 ) ライセンス: Link先を確認	Victor Agostinelli, Max Wild, Matthew Raffel, Kazi Ahmed Asif Fuad, Lizhong Chen,	(参考訳) 数十億のパラメータを持ち、大量のデータで事前訓練された大規模言語モデル(LLM)は、さまざまな下流の自然言語処理タスクにおいて、最先端に近い、あるいはそれ以上の性能を発揮できるようになった。ニューラルマシン翻訳(NMT)は、LLMが大きな成功を収めたタスクの一つである。しかし、翻訳がソース文脈全体の入手前に始まる、NMTのより難しいサブセットである同時翻訳(SimulMT)にLLMを適用することに注力した研究はほとんどない。本稿では,SimulMT向けに微調整されたLLMが直面する主要な課題に取り組み、古典的なSimulMTの概念と実践をLLMの文脈で検証し,NMT向けに微調整されたLLMをSimulMTのタスクに適応させる方法を探る。さらに、SimulMTに焦点を当てたLLMのための、最初のオープンソースの微調整・評価パイプライン開発フレームワークであるSimul-LLMを紹介する。 Large language models (LLMs) with billions of parameters and pretrained on massive amounts of data are now capable of near or better than state-of-the-art performance in a variety of downstream natural language processing tasks. Neural machine translation (NMT) is one such task that LLMs have been applied to with great success. However, little research has focused on applying LLMs to the more difficult subset of NMT called simultaneous translation (SimulMT), where translation begins before the entire source context is available to the model. In this paper, we address key challenges facing LLMs fine-tuned for SimulMT, validate classical SimulMT concepts and practices in the context of LLMs, explore adapting LLMs that are fine-tuned for NMT to the task of SimulMT, and introduce Simul-LLM, the first open-source fine-tuning and evaluation pipeline development framework for LLMs focused on SimulMT.	翻訳日:2024-06-07 03:55:26 公開日:2024-06-05
# ハードウェア効率の良い訓練を備えたゲート型線形アテンショントランスフォーマー Gated Linear Attention Transformers with Hardware-Efficient Training ( http://arxiv.org/abs/2312.06635v5 ) ライセンス: Link先を確認	Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, Yoon Kim,	(参考訳) 線形アテンションを持つトランスフォーマーは、効率的な並列トレーニングを可能にすると同時に、2次元(行列値)の隠れ状態を持つRNNとして定式化でき、線形時間の推論複雑性を享受できる。しかし、線形アテンションは一般に通常のソフトマックスアテンションより性能が劣る。さらに, 線形アテンションの現在の実装はI/Oを考慮しておらず, 高度に最適化されたソフトマックスアテンションの実装よりも遅い。本研究は、メモリ移動と並列化可能性をトレードオフする、線形アテンションのためのハードウェア効率の良いアルゴリズムについて述べる。その結果得られた実装はFLASHLINEARATTENTIONと呼ばれ、短いシーケンス長(例: 1K)であっても、スタンドアロン層としてFLASHATTENTION-2 (Dao, 2023) より高速である。次に、このアルゴリズムを、データ依存ゲートを用いたより表現力の高い線形アテンションの変種に一般化する。トランスフォーマーの標準アテンション層の代わりに使用すると、結果として得られるゲート型線形アテンション(GLA)トランスフォーマーは、中規模の言語モデリング実験において、LLaMAアーキテクチャのTransformer (Touvron et al., 2023) や、RetNet (Sun et al., 2023a)、Mamba (Gu & Dao, 2023) といった最近の線形時間推論ベースラインと競合する性能を示す。 GLA Transformerは特に長さの一般化に有効であり、2Kで訓練されたモデルが、パープレキシティの大きな劣化なしに20Kを超えるシーケンスへ一般化できる。トレーニング速度では、GLA Transformerは同程度のサイズのMambaモデルよりもスループットが高い。 Transformers with linear attention allow for efficient parallel training but can simultaneously be formulated as an RNN with 2D (matrix-valued) hidden states, thus enjoying linear-time inference complexity. However, linear attention generally underperforms ordinary softmax attention. Moreover, current implementations of linear attention lack I/O-awareness and are thus slower than highly optimized implementations of softmax attention. This work describes a hardware-efficient algorithm for linear attention that trades off memory movement against parallelizability. The resulting implementation, dubbed FLASHLINEARATTENTION, is faster than FLASHATTENTION-2 (Dao, 2023) as a standalone layer even on short sequence lengths (e.g., 1K). We then generalize this algorithm to a more expressive variant of linear attention with data-dependent gates. 
When used as a replacement for the standard attention layer in Transformers, the resulting gated linear attention (GLA) Transformer is found to perform competitively against the LLaMA-architecture Transformer (Touvron et al., 2023) as well as recent linear-time-inference baselines such as RetNet (Sun et al., 2023a) and Mamba (Gu & Dao, 2023) on moderate-scale language modeling experiments. GLA Transformer is especially effective at length generalization, enabling a model trained on 2K to generalize to sequences longer than 20K without significant perplexity degradations. For training speed, the GLA Transformer has higher throughput than a similarly-sized Mamba model.	翻訳日:2024-06-07 03:55:26 公開日:2024-06-05
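The abstract above states that linear attention can be run either as a parallel attention-style computation or as an RNN with a matrix-valued hidden state. The following pure-Python sketch checks that equivalence on small inputs. It is an illustration under simplifying assumptions, not the paper's algorithm: the gate is a single scalar per time step chosen for brevity (the abstract only says the gates are data-dependent), there is no normalization, and nothing here reflects the hardware-efficient chunked implementation.

```python
def recurrent_form(q, k, v, g):
    """RNN view: S_t = g_t * S_{t-1} + k_t^T v_t and o_t = q_t S_t.

    S is a d_k x d_v matrix-valued hidden state; g holds one scalar gate per
    step (a simplifying assumption for this sketch).
    """
    d_k, d_v = len(k[0]), len(v[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_t, k_t, v_t, g_t in zip(q, k, v, g):
        S = [[g_t * S[i][j] + k_t[i] * v_t[j] for j in range(d_v)]
             for i in range(d_k)]
        out.append([sum(q_t[i] * S[i][j] for i in range(d_k))
                    for j in range(d_v)])
    return out


def parallel_form(q, k, v, g):
    """Attention view: o_t = sum_{s<=t} (prod_{r=s+1..t} g_r) (q_t . k_s) v_s."""
    d_v = len(v[0])
    out = []
    for t in range(len(q)):
        o_t = [0.0] * d_v
        for s in range(t + 1):
            decay = 1.0
            for r in range(s + 1, t + 1):
                decay *= g[r]  # accumulated gate between step s and step t
            score = decay * sum(a * b for a, b in zip(q[t], k[s]))
            for j in range(d_v):
                o_t[j] += score * v[s][j]
        out.append(o_t)
    return out
```

Setting every gate to 1.0 recovers plain (ungated, unnormalized) causal linear attention. The recurrent form needs only the current state `S` at inference time, which is where the linear-time inference mentioned in the abstract comes from.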
# Silent Guardian: 大規模言語モデルによる悪意ある利用からテキストを保護する Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models ( http://arxiv.org/abs/2312.09669v4 ) ライセンス: Link先を確認	Jiawei Zhao, Kejiang Chen, Xiaojian Yuan, Yuang Qi, Weiming Zhang, Nenghai Yu,	(参考訳) 大規模言語モデル(LLM)の急速な発展は、様々な下流タスクにおいて顕著な成功を収めてきた。しかし、LLMの膨大な可能性と目覚ましい能力は、そのオープンエンド性ゆえに悪用された場合、新たなセキュリティとプライバシの懸念も引き起こす。例えば、LLMは、文章を盗用または模倣してオリジナルコンテンツの著作権を侵害したり、特定のソーステキストに基づいて偽情報を無差別に生成したりするために使われうる。場合によっては、LLMはインターネット上のテキストを分析して個人のプライバシーを推測することさえできる。残念なことに、従来のテキスト保護研究は強力なLLMの出現を予見できなかったため、この新しい文脈ではもはや有効ではない。このギャップを埋めるために,LLMに対するテキスト保護機構であるSilent Guardian(SG)を導入する。SGは、保護されたテキストを受け取ったLLMに応答の生成を拒否させ、そのテキストの悪用を元から防ぐ。具体的には,まず,Truncation Protection Examples(TPE)の概念を提案する。保護対象のテキストを慎重に修正することで、TPEはLLMに最初に終了トークンをサンプリングさせ、対話を直接終了させることができる。さらに,テキストデータの離散空間においてTPEを効率的に構築するために,高効率であるだけでなく,最適化プロセス中にテキストの意味的一貫性も維持する,Super Tailored Protection (STP)と呼ばれる新しい最適化アルゴリズムを提案する。総合的な実験評価により、SGは様々な構成下でターゲットテキストを効果的に保護でき、場合によってはほぼ100%の保護成功率を達成できることが示された。特に、SGは比較的優れた転移性とロバスト性も示しており、現実的なシナリオでの適用が可能である。 The rapid development of large language models (LLMs) has yielded impressive success in various downstream tasks. However, the vast potential and remarkable capabilities of LLMs also raise new security and privacy concerns if they are exploited for nefarious purposes due to their open-endedness. For example, LLMs may be used to plagiarize or imitate writing, thereby infringing the copyright of the original content, or to create indiscriminate fake information based on a certain source text. In some cases, LLMs can even analyze text from the Internet to infer personal privacy. Unfortunately, previous text protection research could not foresee the emergence of powerful LLMs, rendering it no longer effective in this new context. To bridge this gap, we introduce Silent Guardian (SG), a text protection mechanism against LLMs, which allows LLMs to refuse to generate response when receiving protected text, preventing the malicious use of text from the source. 
Specifically, we first propose the concept of Truncation Protection Examples (TPE). By carefully modifying the text to be protected, TPE can induce LLMs to first sample the end token, thus directly terminating the interaction. In addition, to efficiently construct TPE in the discrete space of text data, we propose a novel optimization algorithm called Super Tailored Protection (STP), which is not only highly efficient but also maintains the semantic consistency of the text during the optimization process. The comprehensive experimental evaluation demonstrates that SG can effectively protect the target text under various configurations and achieve almost 100% protection success rate in some cases. Notably, SG also exhibits relatively good transferability and robustness, making its application in practical scenarios possible.	翻訳日:2024-06-07 03:55:26 公開日:2024-06-05
# ニューラルフィールド表現のトレーニング方法: 総合的研究とベンチマーク How to Train Neural Field Representations: A Comprehensive Study and Benchmark ( http://arxiv.org/abs/2312.10531v2 ) ライセンス: Link先を確認	Samuele Papa, Riccardo Valperga, David Knigge, Miltiadis Kofinas, Phillip Lippe, Jan-Jakob Sonke, Efstratios Gavves,	(参考訳) ニューラルフィールド(NeF)は、画像、形状、シーンを含む様々なモダリティの信号をモデル化する汎用的な手法として最近登場した。その後、NeFを下流タスクの表現として使うことを探る研究がいくつか現れた。例えば、画像に適合させたNeFのパラメータに基づいてその画像を分類する、といったものである。しかし、NeFのハイパーパラメータが下流表現としての品質に与える影響はほとんど理解されておらず、大部分が未解明のままである。これは部分的には、ニューラルフィールドのデータセットを適合させるのに必要な時間の大きさに起因する。本研究では,並列化を利用して大規模なNeFデータセットの高速な最適化を可能にし、大幅な高速化を実現するJAXベースのライブラリを提案する。このライブラリを用いて、下流タスク向けのNeFの適合に対する様々なハイパーパラメータの影響を総合的に調査する。特に,共有初期化の利用,過剰訓練の影響,使用するネットワークアーキテクチャの表現力について検討する。我々の研究は、NeFのトレーニング方法に関する貴重な洞察を提供し、下流アプリケーションでの有効性を最適化するための指針を与える。最後に、提案したライブラリと分析に基づいて、MNIST、CIFAR、ImageNetの変種、ShapeNetv2を含む一般的な視覚データセットのニューラルフィールド版からなるベンチマークであるNeural Field Arenaを提案する。我々のライブラリとNeural Field Arenaはオープンソースとして公開され、標準化されたベンチマークを導入し、ニューラルフィールドに関するさらなる研究を促進する。 Neural fields (NeFs) have recently emerged as a versatile method for modeling signals of various modalities, including images, shapes, and scenes. Subsequently, a number of works have explored the use of NeFs as representations for downstream tasks, e.g. classifying an image based on the parameters of a NeF that has been fit to it. However, the impact of the NeF hyperparameters on their quality as downstream representation is scarcely understood and remains largely unexplored. This is in part caused by the large amount of time required to fit datasets of neural fields. In this work, we propose a JAX-based library that leverages parallelization to enable fast optimization of large-scale NeF datasets, resulting in a significant speed-up. With this library, we perform a comprehensive study that investigates the effects of different hyperparameters on fitting NeFs for downstream tasks. 
In particular, we explore the use of a shared initialization, the effects of overtraining, and the expressiveness of the network architectures used. Our study provides valuable insights on how to train NeFs and offers guidance for optimizing their effectiveness in downstream applications. Finally, based on the proposed library and our analysis, we propose Neural Field Arena, a benchmark consisting of neural field variants of popular vision datasets, including MNIST, CIFAR, variants of ImageNet, and ShapeNetv2. Our library and the Neural Field Arena will be open-sourced to introduce standardized benchmarking and promote further research on neural fields.	翻訳日:2024-06-07 03:45:21 公開日:2024-06-05
# WaveCoder: インストラクションチューニングによるコード大規模言語モデルの広範かつ汎用的な強化 WaveCoder: Widespread And Versatile Enhancement For Code Large Language Models By Instruction Tuning ( http://arxiv.org/abs/2312.14187v4 ) ライセンス: Link先を確認	Zhaojian Yu, Xin Zhang, Ning Shang, Yangyu Huang, Can Xu, Yishujie Zhao, Wenxiang Hu, Qiufeng Yin,	(参考訳) 最近の研究は、高品質な命令データセットで微調整することで、得られたモデルが広範囲のタスクに対処する印象的な能力を獲得できることを実証している。しかし、既存の命令データ生成手法はしばしば重複データを生成し、データ品質を十分に制御できない。本稿では、命令データを4つのコード関連タスクに分類することで命令チューニングの一般化を拡張し、オープンソースコードから多様で高品質な命令データを生成する、LLMベースのGenerator-Discriminatorデータ処理フレームワークを提案する。これに基づき,4つの普遍的なコード関連タスクにまたがる20,000の命令インスタンスからなるデータセットであるCodeOceanを紹介する。このデータセットは、命令チューニングの有効性を高め、微調整モデルの一般化能力を向上させることを目的としている。次に、Widespread And Versatile Enhanced命令チューニングで微調整したコードLLMであるWaveCoderを紹介する。このモデルは、コード言語モデル(LLM)の命令チューニングを強化するために特別に設計されている。我々の実験では、WaveCoderモデルは、同じレベルの微調整スケールにおいて、異なるコード関連タスクにわたる一般化能力で他のオープンソースモデルを上回ることを示した。さらに、WaveCoderは、従来のコード生成タスクにおいても高い効率を示す。このように本研究は,命令データ生成と微調整モデルの分野に大きく貢献し,コード関連タスクの性能向上のための新たな洞察とツールを提供する。 Recent work demonstrates that, after being fine-tuned on a high-quality instruction dataset, the resulting model can obtain impressive capabilities to address a wide range of tasks. However, existing methods for instruction data generation often produce duplicate data and are not controllable enough on data quality. In this paper, we extend the generalization of instruction tuning by classifying the instruction data to 4 code-related tasks and propose a LLM-based Generator-Discriminator data process framework to generate diverse, high-quality instruction data from open source code. Hence, we introduce CodeOcean, a dataset comprising 20,000 instruction instances across 4 universal code-related tasks, which is aimed at augmenting the effectiveness of instruction tuning and improving the generalization ability of fine-tuned model. Subsequently, we present WaveCoder, a fine-tuned Code LLM with Widespread And Versatile Enhanced instruction tuning. This model is specifically designed for enhancing instruction tuning of Code Language Models (LLMs). 
Our experiments demonstrate that WaveCoder models outperform other open-source models in terms of generalization ability across different code-related tasks at the same level of fine-tuning scale. Moreover, WaveCoder exhibits high efficiency in previous code generation tasks. This paper thus offers a significant contribution to the field of instruction data generation and fine-tuning models, providing new insights and tools for enhancing performance in code-related tasks.	翻訳日:2024-06-07 03:45:21 公開日:2024-06-05
# シミュレーションに基づく推論による孤立パルサーの集団合成 Isolated pulsar population synthesis with simulation-based inference ( http://arxiv.org/abs/2312.14848v3 ) ライセンス: Link先を確認	Vanessa Graber, Michele Ronchi, Celsa Pardo-Araujo, Nanda Rea,	(参考訳) 我々は、パルサー集団合成とシミュレーションベース推論(SBI)を組み合わせることで、孤立した銀河系電波パルサーの磁気回転特性を制約する。まず、中性子星の誕生特性とその力学的・磁気回転的進化をモデル化する枠組みを開発する。具体的には、対数正規分布から初期磁場強度$B$とスピン周期$P$をサンプリングし、べき乗則により後期の磁場減衰を捉える。各対数正規分布は平均$\mu_{\log B}, \mu_{\log P}$と標準偏差$\sigma_{\log B}, \sigma_{\log P}$で記述され、一方べき乗則は指数$a_{\rm late}$で特徴づけられる。その後、星の電波放射と観測バイアスをモデル化して3つの電波サーベイによる検出を模倣し、5つの磁気回転入力パラメータを変化させることで、合成$P$--$\dot{P}$ダイアグラムの大規模なデータベースを作成する。次に、ニューラル事後推定に焦点を当てたSBIアプローチに従い、パラメータの事後分布を推定するために深層ニューラルネットワークを訓練する。シミュレーションデータを用いてこれらの個々のニューラル密度推定器の検証に成功した後、観測されたパルサー集団の事後分布を推定するために、ネットワークのアンサンブルを用いた。我々は、対数正規分布に対して$\mu_{\log B} = 13.10^{+0.08}_{-0.10}$, $\sigma_{\log B} = 0.45^{+0.05}_{-0.05}$, $\mu_{\log P} = -1.00^{+0.26}_{-0.21}$, $\sigma_{\log P} = 0.38^{+0.33}_{-0.18}$, べき乗則に対して$a_{\rm late} = -1.80^{+0.65}_{-0.61}$を、$95\%$信用区間で得る。これまでの研究と対比し、推定された$a_{\rm late}$値の不確かさを強調する。本手法は, 複雑な集団合成フレームワークに対する頑健な統計的推論に向けた重要なステップであり, 今後の銀河系パルサーの多波長解析の基礎となる。 We combine pulsar population synthesis with simulation-based inference (SBI) to constrain the magnetorotational properties of isolated Galactic radio pulsars. We first develop a framework to model neutron star birth properties and their dynamical and magnetorotational evolution. We specifically sample initial magnetic field strengths, $B$, and spin periods, $P$, from lognormal distributions and capture the late-time magnetic field decay with a power law. Each lognormal is described by a mean, $\mu_{\log B}, \mu_{\log P}$, and standard deviation, $\sigma_{\log B}, \sigma_{\log P}$, while the power law is characterized by the index, $a_{\rm late}$. 
We subsequently model the stars' radio emission and observational biases to mimic detections with three radio surveys, and we produce a large database of synthetic $P$--$\dot{P}$ diagrams by varying our five magnetorotational input parameters. We then follow an SBI approach that focuses on neural posterior estimation and train deep neural networks to infer the parameters' posterior distributions. After successfully validating these individual neural density estimators on simulated data, we use an ensemble of networks to infer the posterior distributions for the observed pulsar population. We obtain $\mu_{\log B} = 13.10^{+0.08}_{-0.10}$, $\sigma_{\log B} = 0.45^{+0.05}_{-0.05}$ and $\mu_{\log P} = -1.00^{+0.26}_{-0.21}$, $\sigma_{\log P} = 0.38^{+0.33}_{-0.18}$ for the lognormal distributions and $a_{\rm late} = -1.80^{+0.65}_{-0.61}$ for the power law at the $95\%$ credible interval. We contrast our results with previous studies and highlight uncertainties of the inferred $a_{\rm late}$ value. Our approach represents a crucial step toward robust statistical inference for complex population synthesis frameworks and forms the basis for future multiwavelength analyses of Galactic pulsars.	翻訳日:2024-06-07 03:45:21 公開日:2024-06-05
# アロマケミカル対に対する嗅覚ラベル予測 Olfactory Label Prediction on Aroma-Chemical Pairs ( http://arxiv.org/abs/2312.16124v2 ) ライセンス: Link先を確認	Laura Sisson, Aryan Amit Barsainyan, Mrityunjay Sharma, Ritesh Kumar,	(参考訳) 深層学習技術のアロマケミカルへの応用により、嗅覚の質の予測において人間の専門家より正確なモデルが得られている。しかし、この領域で公開されている研究は単一分子の品質を予測することに限られており、産業用途では、調香師や食品科学者が多くの分子のブレンドに関心を持つことが多い。本稿では、我々が収集したラベル付き分子対からなるデータセットに対して、既存のアプローチと新しいアプローチの両方を適用する。本稿では,アロマケミカルのブレンドから生じる臭気特性を正確に予測できるグラフニューラルネットワークモデルを,アーキテクチャの違いが予測性能に大きな差をもたらしうることの分析とともに提示する。 The application of deep learning techniques on aroma-chemicals has resulted in models more accurate than human experts at predicting olfactory qualities. However, public research in this domain has been limited to predicting the qualities of single molecules, whereas in industry applications, perfumers and food scientists are often concerned with blends of many molecules. In this paper, we apply both existing and novel approaches to a dataset we gathered consisting of labeled pairs of molecules. We present graph neural network models capable of accurately predicting the odor qualities arising from blends of aroma-chemicals, with an analysis of how variations in architecture can lead to significant differences in predictive power.	翻訳日:2024-06-07 03:45:21 公開日:2024-06-05
# ソフトウェア開発エージェントの実験的共同学習 Experiential Co-Learning of Software-Developing Agents ( http://arxiv.org/abs/2312.17025v3 ) ライセンス: Link先を確認	Chen Qian, Yufan Dang, Jiahao Li, Wei Liu, Zihao Xie, Yifei Wang, Weize Chen, Cheng Yang, Xin Cong, Xiaoyin Che, Zhiyuan Liu, Maosong Sun,	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、特にLLM駆動の自律エージェントを通じて、様々な領域に大きな変化をもたらした。 LLMエージェントは効率的なコラボレーション、タスク分割、ソフトウェア品質の保証を示し、手動による関与の必要性を著しく減らします。しかし、これらのエージェントは過去の経験から恩恵を受けずに、しばしば様々なタスクを独立に実行する。この目的のために,教師とアシスタントエージェントが過去の軌跡からショートカット指向の体験を収集し,これらの過去の経験を将来のタスク実行に活用する,新しいLLMエージェント学習フレームワークであるExperiential Co-Learningを紹介した。広範な実験により、このフレームワークは、未確認のソフトウェア開発タスクをより効果的に対処することを可能にする。我々は、LLMエージェントを自律性向上に導くとともに、協調学習における進化的成長に寄与することを期待している。コードとデータはhttps://github.com/OpenBMB/ChatDevで公開されている。 Recent advancements in large language models (LLMs) have brought significant changes to various domains, especially through LLM-driven autonomous agents. A representative scenario is in software development, where LLM agents demonstrate efficient collaboration, task division, and assurance of software quality, markedly reducing the need for manual involvement. However, these agents frequently perform a variety of tasks independently, without benefiting from past experiences, which leads to repeated mistakes and inefficient attempts in multi-step task execution. To this end, we introduce Experiential Co-Learning, a novel LLM-agent learning framework in which instructor and assistant agents gather shortcut-oriented experiences from their historical trajectories and use these past experiences for future task execution. The extensive experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively. We anticipate that our insights will guide LLM agents towards enhanced autonomy and contribute to their evolutionary growth in cooperative learning. The code and data are available at https://github.com/OpenBMB/ChatDev.	翻訳日:2024-06-07 03:45:21 公開日:2024-06-05
# MR-GSM8K:大規模言語モデル評価のためのメタ推論ベンチマーク MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation ( http://arxiv.org/abs/2312.17080v4 ) ライセンス: Link先を確認	Zhongshen Zeng, Pengguang Chen, Shu Liu, Haiyun Jiang, Jiaya Jia,	(参考訳) 本研究では,大規模言語モデル(LLM)の新たな評価パラダイムを導入し,モデルを生徒に類似した従来の質問応答の役割から,教師に類似した解答採点の役割へと移行させる。「推論についての推論」に焦点を当て,それゆえ「メタ推論」と呼ばれるこのパラダイムは、推論過程を無視しがちな結果指向の評価から、異なるモデルの認知能力を効果的に区別するより包括的な評価へと重点を移す。 GSM8Kデータセットにこのパラダイムを適用し,MR-GSM8Kベンチマークを開発した。我々の広範な分析には、オープンソースドメインと商用ドメインの両方の最先端モデルが含まれており、そのトレーニングおよび評価手法における根本的な欠陥を明らかにしている。特に、Deepseek-v2やClaude3-SonnetといったモデルはGSM8KではGPT-4と僅差で競合するが、MR-GSM8Kでは性能差が劇的に拡大して20絶対ポイント以上に達し,メタ推論アプローチがもたらす課題の大きさを浮き彫りにした。 In this work, we introduce a novel evaluation paradigm for Large Language Models (LLMs) that compels them to transition from a traditional question-answering role, akin to a student, to a solution-scoring role, akin to a teacher. This paradigm, focusing on "reasoning about reasoning," hence termed meta-reasoning, shifts the emphasis from result-oriented assessments, which often neglect the reasoning process, to a more comprehensive evaluation that effectively distinguishes between the cognitive capabilities of different models. By applying this paradigm to the GSM8K dataset, we have developed the MR-GSM8K benchmark. Our extensive analysis includes several state-of-the-art models from both open-source and commercial domains, uncovering fundamental deficiencies in their training and evaluation methodologies. Notably, while models like Deepseek-v2 and Claude3-Sonnet closely competed with GPT-4 in GSM8K, their performance disparities expanded dramatically in MR-GSM8K, with differences widening to over 20 absolute points, underscoring the significant challenge posed by our meta-reasoning approach.	翻訳日:2024-06-07 03:45:21 公開日:2024-06-05
# 大規模言語モデルの因果推論に必要なのは知識だけか Is Knowledge All Large Language Models Needed for Causal Reasoning? ( http://arxiv.org/abs/2401.00139v2 ) ライセンス: Link先を確認	Hengrui Cai, Shengjie Liu, Rui Song,	(参考訳) 本稿では,大規模言語モデル(LLM)の因果推論について,人工知能の進化における解釈可能性と信頼性を高めるために検討する。様々なタスクにおけるLLMの習熟度にもかかわらず、因果関係を理解する能力についてはさらなる探索が必要である。本稿では,「do-operators」を用いて反事実シナリオを構築する新しい因果帰属モデルを提案し,入力数値データとLLMの既存知識が因果推論過程に与える影響を体系的に定量化する。筆者らが新たに開発した実験設定は,様々な領域にわたって,LLMがコンテキスト情報や固有知識にどの程度依存しているかを評価する。評価の結果、LLMの因果推論能力は、主に提供されたコンテキストやドメイン固有の知識に依存していることが明らかとなった。このような知識がなければ、LLMは計算に制限があるにもかかわらず、利用可能な数値データを用いてある程度の因果推論を維持することができる。このことは、知識と数値情報の両方を効果的に活用する、ペアワイズ因果発見のための微調整LLMの提案を動機付ける。 This paper explores the causal reasoning of large language models (LLMs) to enhance their interpretability and reliability in advancing artificial intelligence. Despite the proficiency of LLMs in a range of tasks, their potential for understanding causality requires further exploration. We propose a novel causal attribution model that utilizes "do-operators" for constructing counterfactual scenarios, allowing us to systematically quantify the influence of input numerical data and LLMs' pre-existing knowledge on their causal reasoning processes. Our newly developed experimental setup assesses LLMs' reliance on contextual information and inherent knowledge across various domains. Our evaluation reveals that LLMs' causal reasoning ability mainly depends on the context and domain-specific knowledge provided. In the absence of such knowledge, LLMs can still maintain a degree of causal reasoning using the available numerical data, albeit with limitations in the calculations. This motivates the proposed fine-tuned LLM for pairwise causal discovery, effectively leveraging both knowledge and numerical information.	翻訳日:2024-06-07 03:45:21 公開日:2024-06-05
# 大規模言語モデルのサービングにおける公平性 Fairness in Serving Large Language Models ( http://arxiv.org/abs/2401.00588v2 ) ライセンス: Link先を確認	Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica,	(参考訳) 需要の高いLLM推論サービス(例:ChatGPT、BARD)は、短いチャット会話から長いドキュメントの読み込みまで、幅広いリクエストをサポートする。すべてのクライアントリクエストが公平に処理されることを保証するため、ほとんどの主要なLLM推論サービスはリクエストレート制限を設け、クライアントがリクエストキューを支配できないようにしている。しかし、この初歩的な公平性の概念は、余剰キャパシティがある場合、リソースの低利用とクライアントエクスペリエンスの低下をもたらす。フェアスケジューリングには豊富な文献があるが、LLMのサービングは、予測不可能な要求長と並列アクセラレータ上での独自のバッチ特性のために、新たな課題を提示している。本稿では,処理された入力および出力トークンの数を考慮に入れたコスト関数に基づいて,LLMサービングの公平性の定義を導入する。サービングにおける公平性を達成するために,連続バッチ機構に基づく公平なスケジューラであるVirtual Token Counter (VTC)という新しいスケジューリングアルゴリズムを提案する。ワークコンサービングの要件を満たしつつ、2つのバックログ状態のクライアント間のサービス差に対して2倍のタイトな上限を証明する。広範な実験を通じて、様々な条件下で欠点を示す他のベースライン法と対照的に, 公平性の確保におけるVTCの優れた性能を示す。再現可能なコードはhttps://github.com/Ying1123/VTC-artifact で入手できる。 High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading. To ensure that all client requests are processed fairly, most major LLM inference services have request rate limits, to ensure that no client can dominate the request queue. However, this rudimentary notion of fairness also results in under-utilization of the resources and poor client experience when there is spare capacity. While there is a rich literature on fair scheduling, serving LLMs presents new challenges due to their unpredictable request lengths and their unique batching characteristics on parallel accelerators. This paper introduces the definition of LLM serving fairness based on a cost function that accounts for the number of input and output tokens processed. To achieve fairness in serving, we propose a novel scheduling algorithm, the Virtual Token Counter (VTC), a fair scheduler based on the continuous batching mechanism. We prove a 2x tight upper bound on the service difference between two backlogged clients, adhering to the requirement of work-conserving. 
Through extensive experiments, we demonstrate the superior performance of VTC in ensuring fairness, especially in contrast to other baseline methods, which exhibit shortcomings under various conditions. The reproducible code is available at https://github.com/Ying1123/VTC-artifact.	翻訳日:2024-06-07 03:45:21 公開日:2024-06-05
# HAAQI-Net: 聴覚障害者のための非侵襲的ニューラル音楽品質評価モデル HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids ( http://arxiv.org/abs/2401.01145v4 ) ライセンス: Link先を確認	Dyah A. M. G. Wisnu, Stefano Rini, Ryandhimas E. Zezario, Hsin-Min Wang, Yu Tsao,	(参考訳) 本稿では、補聴器使用者に適した音質評価のための非侵襲的深層学習モデルであるHAAQI-Netを紹介する。 HAAQI-Netは、参照信号に対する侵入的比較に依存する聴覚支援オーディオ品質指標(HAAQI)のような従来の手法とは異なり、よりアクセシブルで効率的な代替手段を提供する。 HAAQI-Netは、双方向長短期記憶(BLSTM)アーキテクチャを用いて、事前訓練されたBEATsモデルから、音楽オーディオクリップや聴覚障害パターンから直接HAAQIスコアを予測する。その結果,線形相関係数(LCC)0.9368,スピアマンランク相関係数(SRCC)0.9486,平均正方形誤差(MSE)0.0064,推定時間62.52秒から2.54秒が得られた。有効ではあるが、大きなBEATモデルによる特徴抽出は計算オーバーヘッドを発生させる。これを解決するため、知識蒸留戦略は学生蒸留BEATsモデルを作成し、HAAQI-Netトレーニング中に教師BEATsモデルから情報を蒸留し、必要なパラメータを減らす。蒸留されたHAAQI-Netは、LCCが0.9071、SRCCが0.9307、MSEが0.0091、パラメータが75.85%、推測時間が96.46%の強い性能を維持している。この削減により、HAAQI-Netの効率性とスケーラビリティが向上し、補聴器設定における実環境の音楽品質評価が可能となる。この研究は、特定のアプリケーションに対するディープラーニングモデルの最適化に関するさらなる研究の道を開き、補聴器技術における実践的応用のための効率的で正確なモデルの開発に関する洞察を提供することで、音声信号処理と品質評価に寄与する。 This paper introduces HAAQI-Net, a non-intrusive deep learning model for music audio quality assessment tailored for hearing aid users. Unlike traditional methods like the Hearing Aid Audio Quality Index (HAAQI), which rely on intrusive comparisons to a reference signal, HAAQI-Net offers a more accessible and efficient alternative. Using a bidirectional Long Short-Term Memory (BLSTM) architecture with attention mechanisms and features from the pre-trained BEATs model, HAAQI-Net predicts HAAQI scores directly from music audio clips and hearing loss patterns. Results show HAAQI-Net's effectiveness, with predicted scores achieving a Linear Correlation Coefficient (LCC) of 0.9368, a Spearman's Rank Correlation Coefficient (SRCC) of 0.9486, and a Mean Squared Error (MSE) of 0.0064, reducing inference time from 62.52 seconds to 2.54 seconds. Although effective, feature extraction via the large BEATs model incurs computational overhead. 
To address this, a knowledge distillation strategy creates a student distillBEATs model, distilling information from the teacher BEATs model during HAAQI-Net training, reducing required parameters. The distilled HAAQI-Net maintains strong performance with an LCC of 0.9071, an SRCC of 0.9307, and an MSE of 0.0091, while reducing parameters by 75.85% and inference time by 96.46%. This reduction enhances HAAQI-Net's efficiency and scalability, making it viable for real-world music audio quality assessment in hearing aid settings. This work also opens avenues for further research into optimizing deep learning models for specific applications, contributing to audio signal processing and quality assessment by providing insights into developing efficient and accurate models for practical applications in hearing aid technology.	翻訳日:2024-06-07 03:45:21 公開日:2024-06-05
# 低リソース言語のための効率的かつ効果的なOpenQAシステムの構築 Building Efficient and Effective OpenQA Systems for Low-Resource Languages ( http://arxiv.org/abs/2401.03590v2 ) ライセンス: Link先を確認	Emrah Budur, Rıza Özçelik, Dilara Soylu, Omar Khattab, Tunga Güngör, Christopher Potts,	(参考訳) 質問応答(QA)とは、自然言語で提示された質問に対して、与えられたパッセージから抽出された自由形式の自然言語の回答で答えるタスクである。 OpenQAの変種では、質問文のみが与えられ、システムは構造化されていない知識ソースから関連するパッセージを検索し、それを使って回答を提供する必要がある。これはWeb上の主流のQAシステムに当てはまる。 QAシステムは現在、英語以外の言語に大規模なラベル付きQAデータセットがないため、ほとんどが英語に限られている。本稿では,効果的で低コストな OpenQA システムを低リソース環境向けに開発できることを示す。主な要素は,(1) 機械翻訳されたラベル付きデータセットを用いた弱い教師あり学習,(2) 対象言語の文脈における関連する非構造的知識源である。さらに,これらのシステムを確実に評価するためには,数百程度のゴールド評価例だけで十分であることを示す。英語とトルコ語は類型的に非常に異なっており、トルコ語にはQAのためのリソースが限られているため、我々の手法をトルコ語に適用することは難しいケーススタディである。我々は、SQuAD2.0の機械翻訳であるSQuAD-TRを紹介し、ColBERT-QAを適応させ、2年間にまたがる2つのバージョンのWikipediaダンプを用いてトルコ語のリソースとSQuAD-TRで再トレーニングすることで、OpenQAシステムを構築する。 BM25ベースおよびDPRベースのベースラインQAリーダモデルと比較して,Exact Match(EM)スコアで24～32%,F1スコアで22～29%の性能向上が得られた。以上の結果から,SQuAD-TRにより,トルコ語でOpenQAが実現可能となり,研究者が他の低リソース言語でOpenQAシステムを構築することが期待できる。すべてのコード、モデル、データセットをhttps://github.com/boun-tabi/SQuAD-TR で公開しています。 Question answering (QA) is the task of answering questions posed in natural language with free-form natural language answers extracted from a given passage. In the OpenQA variant, only a question text is given, and the system must retrieve relevant passages from an unstructured knowledge source and use them to provide answers, which is the case in the mainstream QA systems on the Web. QA systems currently are mostly limited to the English language due to the lack of large-scale labeled QA datasets in non-English languages. In this paper, we show that effective, low-cost OpenQA systems can be developed for low-resource contexts. The key ingredients are (1) weak supervision using machine-translated labeled datasets and (2) a relevant unstructured knowledge source in the target language context. Furthermore, we show that only a few hundred gold assessment examples are needed to reliably evaluate these systems. 
We apply our method to Turkish as a challenging case study, since English and Turkish are typologically very distinct and Turkish has limited resources for QA. We present SQuAD-TR, a machine translation of SQuAD2.0, and we build our OpenQA system by adapting ColBERT-QA and retraining it over Turkish resources and SQuAD-TR using two versions of Wikipedia dumps spanning two years. We obtain a performance improvement of 24-32% in the Exact Match (EM) score and 22-29% in the F1 score compared to the BM25-based and DPR-based baseline QA reader models. Our results show that SQuAD-TR makes OpenQA feasible for Turkish, which we hope encourages researchers to build OpenQA systems in other low-resource languages. We make all the code, models, and the dataset publicly available at https://github.com/boun-tabi/SQuAD-TR.	翻訳日:2024-06-07 03:45:21 公開日:2024-06-05
# カーネル・フィッシャー・ラオ流による単位時間サンプリング Sampling in Unit Time with Kernel Fisher-Rao Flow ( http://arxiv.org/abs/2401.03892v3 ) ライセンス: Link先を確認	Aimee Maurais, Youssef Marzouk,	(参考訳) 非正規化対象密度からサンプリングするための新しい平均場ODEと対応する相互作用粒子系(IPS)を導入する。 IPSは勾配を必要とせず、閉形式で得られ、参照密度からサンプリングし、(正規化されていない)ターゲット-参照密度比を計算する能力のみを必要とする。平均場ODEは、特定のフィッシャー-ラオ勾配流の経路である2つの密度の幾何学的混合に沿ってサンプルを輸送する速度場に対するポアソン方程式を解くことで得られる。速度場にRKHSアンザッツを用いることで、ポアソン方程式を扱いやすくし、有限標本上での平均場ODEの離散化を可能にする。平均場ODEは、サンプル駆動最適輸送として知られるフレームワーク内でのモンジュ・アンペール方程式の逐次線形化の極限として、離散時間の観点からも導出することができる。我々は,我々のアプローチの確率的変種を導入し,我々のIPSが様々な対象分布から高品質なサンプルを生成可能であり,同等の勾配不要な粒子系を上回り,勾配に基づく代替手法にも匹敵することを実験的に示す。 We introduce a new mean-field ODE and corresponding interacting particle systems (IPS) for sampling from an unnormalized target density. The IPS are gradient-free, available in closed form, and only require the ability to sample from a reference density and compute the (unnormalized) target-to-reference density ratio. The mean-field ODE is obtained by solving a Poisson equation for a velocity field that transports samples along the geometric mixture of the two densities, which is the path of a particular Fisher-Rao gradient flow. We employ an RKHS ansatz for the velocity field, which makes the Poisson equation tractable and enables discretization of the resulting mean-field ODE over finite samples. The mean-field ODE can additionally be derived from a discrete-time perspective as the limit of successive linearizations of the Monge-Ampère equations within a framework known as sample-driven optimal transport. We introduce a stochastic variant of our approach and demonstrate empirically that our IPS can produce high-quality samples from varied target distributions, outperforming comparable gradient-free particle systems and competitive with gradient-based alternatives.	翻訳日:2024-06-07 03:45:21 公開日:2024-06-05
# ニューラルマーク付き時間点過程の分布自由等角関節予測領域 Distribution-Free Conformal Joint Prediction Regions for Neural Marked Temporal Point Processes ( http://arxiv.org/abs/2401.04612v2 ) ライセンス: Link先を確認	Victor Dheur, Tanguy Bosser, Rafael Izbicki, Souhaib Ben Taieb,	(参考訳) 連続的に不規則な間隔で観測されるラベル付き事象の系列は、様々な分野に分布する。 TPP(Temporal Point Processs)は、これらのシーケンスをモデル化するための数学的フレームワークを提供する。しかし、モデル上の不特定性やトレーニングデータの欠如により、これらの確率モデルは真で未知の基盤過程の貧弱な近似を与える可能性があり、それらから抽出された予測領域は、基礎となる不確実性の信頼できない推定値である。本稿では、共形予測の枠組みを用いて、ニューラルTPPモデルにおける不確実性定量化のためのより信頼性の高い手法を開発する。主な目的は、イベントの到着時刻とマークに対する分布自由な共同予測領域を生成し、有限サンプルの限界カバレッジを保証することである。重要な課題は、分布的な仮定なしで、厳密な正、連続的な応答とカテゴリー的な応答の両方を扱うことである。まず、イベントの到着時刻とマークの個々の予測領域を組み合わせた、単純だが保守的なアプローチを検討します。そこで本研究では,到達時刻と標章の合同予測密度から得られた2変量高密度領域に基づく,より効果的な手法を提案する。この2つの変数間の依存関係を利用することで、この手法は2つの不可能な組み合わせを除外し、よりシャープな予測領域を発生させながら、未指定のカバレッジレベルを達成できる。また、イベントの到着時刻とマークの個別の単変量予測領域の生成について、共形回帰と分類手法を用いて検討する。さらに,条件付きカバレッジという概念を強く評価する。最後に、シミュレーションと実世界の両方のデータセットに関する広範な実験を通じて、これらの手法の有効性と効率を評価する。 Sequences of labeled events observed at irregular intervals in continuous time are ubiquitous across various fields. Temporal Point Processes (TPPs) provide a mathematical framework for modeling these sequences, enabling inferences such as predicting the arrival time of future events and their associated label, called mark. However, due to model misspecification or lack of training data, these probabilistic models may provide a poor approximation of the true, unknown underlying process, with prediction regions extracted from them being unreliable estimates of the underlying uncertainty. This paper develops more reliable methods for uncertainty quantification in neural TPP models via the framework of conformal prediction. A primary objective is to generate a distribution-free joint prediction region for an event's arrival time and mark, with a finite-sample marginal coverage guarantee. 
A key challenge is to handle both a strictly positive, continuous response and a categorical response, without distributional assumptions. We first consider a simple but conservative approach that combines individual prediction regions for the event's arrival time and mark. Then, we introduce a more effective method based on bivariate highest density regions derived from the joint predictive density of arrival times and marks. By leveraging the dependencies between these two variables, this method excludes unlikely combinations of the two, resulting in sharper prediction regions while still attaining the pre-specified coverage level. We also explore the generation of individual univariate prediction regions for events' arrival times and marks through conformal regression and classification techniques. Moreover, we evaluate the stronger notion of conditional coverage. Finally, through extensive experimentation on both simulated and real-world datasets, we assess the validity and efficiency of these methods.	翻訳日:2024-06-07 03:45:21 公開日:2024-06-05
# Batch-ICL:効果的、効率的、秩序に依存しないインコンテキスト学習 Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning ( http://arxiv.org/abs/2401.06469v3 ) ライセンス: Link先を確認	Kaiyi Zhang, Ang Lv, Yuhan Chen, Hansen Ha, Tao Xu, Rui Yan,	(参考訳) 本稿では,テキスト内学習(ICL)をメタ最適化プロセスとして扱うことにより,LCMがICLの順序に敏感である理由を説明する。この理解は、ICLの効率的、効率的、秩序に依存しない推論アルゴリズムであるBatch-ICLの開発に繋がる。標準的なNショット学習アプローチとは違い、Batch-ICLは$N$の1ショットフォワード計算を採用し、その結果のメタ勾配を集約する。これらの集約されたメタグラディエントをゼロショットクエリの前方計算に適用し、最終的な予測を生成する。このバッチ処理アプローチでは、LCMはICLの例の順に非依存である。広範な実験と解析により、Batch-ICLはICLの例のほとんどの置換よりも一貫して優れていることを示した。一部のケースでは、必要な計算資源を削減しつつ、標準ICLのベストオーダーの性能を超越している。さらに,メタ最適化の「エポック」を複数備えた新しいBatch-ICLを開発した。この変種は暗黙的にICLの例の置換を探求し、ICLのパフォーマンスをさらに向上させる。 In this paper, by treating in-context learning (ICL) as a meta-optimization process, we explain why LLMs are sensitive to the order of ICL examples. This understanding leads us to the development of Batch-ICL, an effective, efficient, and order-agnostic inference algorithm for ICL. Differing from the standard N-shot learning approach, Batch-ICL employs $N$ separate 1-shot forward computations and aggregates the resulting meta-gradients. These aggregated meta-gradients are then applied to the forward computation of a zero-shot query to generate the final prediction. This batch processing approach renders the LLM agnostic to the order of ICL examples. Through extensive experiments and analysis, we demonstrate that Batch-ICL consistently outperforms most permutations of ICL examples. In some cases, it even exceeds the performance of the best order for standard ICL, all while reducing the computational resources required. Furthermore, we develop a novel variant of Batch-ICL featuring multiple "epochs" of meta-optimization. This variant implicitly explores permutations of ICL examples, further enhancing ICL performance.	翻訳日:2024-06-07 03:45:21 公開日:2024-06-05
# ハードタスクのための簡易トレーニングデータの不合理な有効性 The Unreasonable Effectiveness of Easy Training Data for Hard Tasks ( http://arxiv.org/abs/2401.06751v2 ) ライセンス: Link先を確認	Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe,	(参考訳) 難しい訓練データは定義上正確なラベル付けが困難であるとき、どうすれば難しいテストデータでうまく機能するようにモデルを訓練できるのか? この問題はスケーラブルな監視問題と呼ばれ、言語モデルが継続的に改善されるにつれて注目を集めている。本稿では,現在の事前学習済み言語モデルが,易しいデータから難しいデータへとしばしば比較的よく一般化し,難しいデータで微調整されたオラクルモデルと同等の性能さえ示すという驚くべき結論を提示する。本研究では,インコンテキスト学習,線形分類器ヘッド,QLoRAといった簡易な微調整手法を用いて,データポイントの難しさに関する7つの異なる尺度(経験的に多様な6つの人間による難易度尺度(学年レベルなど)と1つのモデルベース尺度(損失ベース))について,この種の易から難への一般化を実証する。さらに, 難しいデータでのモデル性能を最も重視する場合でも, 難しいデータは一般にノイズが多く収集コストも高いため,微調整用には難しいデータよりも易しいデータを集める方がよい場合があることを示す。実験では,最大70bのオープンモデルと,小学3年生の理科の質問から大学レベルのSTEM質問,一般知識トリビアまで難易度の幅がある4つの公開質問応答データセットを用いた。本研究は, LMにおける易から難への一般化が, 調査したタスクに対して驚くほど強いと結論づける。私たちのコードは、https://github.com/allenai/easy-to-hard-generalization で利用可能です。 How can we train models to perform well on hard test data when hard training data is by definition difficult to label correctly? This question has been termed the scalable oversight problem and has drawn increasing attention as language models have continually improved. In this paper, we present the surprising conclusion that current pretrained language models often generalize relatively well from easy to hard data, even performing as well as oracle models finetuned on hard data. We demonstrate this kind of easy-to-hard generalization using simple finetuning methods like in-context learning, linear classifier heads, and QLoRA for seven different measures of datapoint hardness, including six empirically diverse human hardness measures (like grade level) and one model-based measure (loss-based). Furthermore, we show that even if one cares most about model performance on hard data, it can be better to collect easy data rather than hard data for finetuning, since hard data is generally noisier and costlier to collect. 
Our experiments use open models up to 70b in size and four publicly available question-answering datasets with questions ranging in difficulty from 3rd grade science questions to college level STEM questions and general-knowledge trivia. We conclude that easy-to-hard generalization in LMs is surprisingly strong for the tasks studied. Our code is available at https://github.com/allenai/easy-to-hard-generalization.	翻訳日:2024-06-07 03:35:00 公開日:2024-06-05
# 航空機の予測保守のためのサロゲートニューラルネットワークの局所安定性 Surrogate Neural Networks Local Stability for Aircraft Predictive Maintenance ( http://arxiv.org/abs/2401.06821v2 ) ライセンス: Link先を確認	Mélanie Ducoffe, Guillaume Povéda, Audrey Galametz, Ryma Boumazouza, Marion-Cécile Martin, Julien Baris, Derk Daverschot, Eugene O'Higgins,	(参考訳) サロゲートニューラルネットワークは、今日では、計算負荷の高いエンジニアリングシミュレーション(例:構造解析)の代用として、産業で日常的に使われている。製品設計、テスト、監視フェーズなどにおいて、より高速な予測を生成できるため、産業アプリケーションでの分析を高速化できる。性能と時間効率のため、これらのサロゲートモデルは安全クリティカルなアプリケーションでの使用に向けて開発されている。ニューラルネットワークの検証、特にその堅牢性(例えば摂動に対する)の評価は、現実のアプリケーションや認証に組み込むための次の重要なステップである。外部荷重から航空機部品が受ける応力を予測するために設計されたサロゲートニューラルネットワークに対して、航空機の予測保守の文脈における経験的および形式的手法の適用性とスケーラビリティを評価する。ケーススタディは高次元の入出力空間をカバーするため、検証プロセスは多目的制約に対応する。本稿では,そのようなサロゲートモデルの入力雑音に対する局所安定性特性を評価する際の検証手法の相補性について検討する。 1つの検証「パイプライン」において手法を逐次的に組み合わせることの有効性を示すとともに、対象プロパティの評価に必要な実行時間の短縮を示す。 Surrogate Neural Networks are nowadays routinely used in industry as substitutes for computationally demanding engineering simulations (e.g., in structural analysis). They enable faster predictions, and thus faster analyses, in industrial applications, e.g., during product design, testing, or monitoring phases. Due to their performance and time-efficiency, these surrogate models are now being developed for use in safety-critical applications. Neural network verification and in particular the assessment of their robustness (e.g., to perturbations) is the next critical step to allow their inclusion in real-life applications and certification. We assess the applicability and scalability of empirical and formal methods in the context of aircraft predictive maintenance for surrogate neural networks designed to predict the stress sustained by an aircraft part from external loads. The case study covers a high-dimensional input and output space and the verification process thus accommodates multi-objective constraints. 
We explore the complementarity of verification methods in assessing the local stability property of such surrogate models to input noise. We showcase the effectiveness of sequentially combining methods in one verification 'pipeline' and demonstrate the resulting gain in the runtime required to assess the targeted property.	翻訳日:2024-06-07 03:35:00 公開日:2024-06-05
# Selene: ソフトウェア検証における自動証明のパイオニア化 Selene: Pioneering Automated Proof in Software Verification ( http://arxiv.org/abs/2401.07663v2 ) ライセンス: Link先を確認	Lichen Zhang, Shuai Lu, Nan Duan,	(参考訳) 正しさを保証することは、ソフトウェア工学の重要な側面である。利用可能なさまざまな戦略の中で、ソフトウェア検証は正確性の確定的な保証を提供する。それでも、検証証明を書くことはリソース集約的で人的消費であり、このプロセスを自動化する必要がある。本稿では,実世界の産業レベルのマイクロカーネルであるseL4に基づいて構築された,プロジェクトレベルの自動検証ベンチマークであるSeleneを紹介する。 Seleneは、エンドツーエンドの証明生成のための包括的なフレームワークと、軽量な検証環境を提供する。 GPT-3.5-turbo や GPT-4 のような先進的な大規模言語モデル (LLM) による実験結果から, 自動証明生成領域における LLM の機能を強調した。さらに,セレンによる課題が今後の研究で緩和される可能性が示唆された。 Ensuring correctness is a pivotal aspect of software engineering. Among the various strategies available, software verification offers a definitive assurance of correctness. Nevertheless, writing verification proofs is resource-intensive and manpower-consuming, and there is a great need to automate this process. We introduce Selene in this paper, which is the first project-level automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4. Selene provides a comprehensive framework for end-to-end proof generation and a lightweight verification environment. Our experimental results with advanced large language models (LLMs), such as GPT-3.5-turbo and GPT-4, highlight the capabilities of LLMs in the domain of automated proof generation. Additionally, our further proposed augmentations indicate that the challenges presented by Selene can be mitigated in future research endeavors.	翻訳日:2024-06-07 03:35:00 公開日:2024-06-05
# JumpCoder: オンライン修正による自己回帰コーダを超えて JumpCoder: Go Beyond Autoregressive Coder via Online Modification ( http://arxiv.org/abs/2401.07870v2 ) ライセンス: Link先を確認	Mouxiang Chen, Hao Tian, Zhongxin Liu, Xiaoxue Ren, Jianling Sun,	(参考訳) 既存のコード大規模言語モデル(コードLLM)は、コード生成において印象的な能力を示すが、自己回帰的な逐次生成は本質的に可逆性に欠ける。この制限は、人間がしているようにコーディング中に書き漏らした文をタイムリーに修正することを妨げ,しばしばエラーの伝播と性能低下を招く。 JumpCoderは、人間に似たオンライン修正と非逐次生成を可能にしてコードLLMを増強する、モデルに依存しない新しいフレームワークである。 JumpCoderの背景にある重要なアイデアは、生成時に必要に応じて、現在生成されたコードに新しいコードを挿入することであり,これはコードLLMと連携して動作する補助的なインフィリングモデルによって実現される。最適な挿入位置を事前に特定することは困難であるため,我々は「まず挿入し,後で判定する(infill-first, judge-later)」戦略を採用する。この戦略は,各行の生成後に最も重要な$k$個の位置への挿入を試み,抽象構文木(AST)パーサと生成モデルスコアリングを用いて,各挿入候補の妥当性を効果的に判定する。複数のベンチマークおよび多言語ベンチマークにまたがる6つの最先端コードLLMを用いた大規模な実験は、すべてのベースラインに対する大幅な改善を一貫して示している。私たちのコードはhttps://github.com/Keytoyze/JumpCoder で公開されています。 While existing code large language models (code LLMs) exhibit impressive capabilities in code generation, their autoregressive sequential generation inherently lacks reversibility. This limitation hinders them from promptly correcting previously missed statements during coding as humans do, often leading to error propagation and suboptimal performance. We introduce JumpCoder, a novel model-agnostic framework that enables human-like online modification and non-sequential generation to augment code LLMs. The key idea behind JumpCoder is to insert new code into the currently generated code when necessary during generation, which is achieved through an auxiliary infilling model that works in tandem with the code LLM. Since identifying the best infill position beforehand is intractable, we adopt an infill-first, judge-later strategy, which experiments with filling at the $k$ most critical positions following the generation of each line, and uses an Abstract Syntax Tree (AST) parser alongside the Generation Model Scoring to effectively judge the validity of each potential infill. Extensive experiments using six state-of-the-art code LLMs across multiple and multilingual benchmarks consistently indicate significant improvements over all baselines. 
Our code is public at https://github.com/Keytoyze/JumpCoder.	翻訳日:2024-06-07 03:35:00 公開日:2024-06-05
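The AST-based validity check mentioned in the JumpCoder abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `is_valid_python` and `try_infill` are hypothetical, the choice of infill positions is left out, and the model-scoring half of the judging step is omitted. It only shows how Python's standard `ast` module can be used to reject an infill that breaks the syntax of the code generated so far.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python (syntax check only)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def try_infill(lines: list[str], position: int, new_line: str) -> list[str]:
    """Insert `new_line` before index `position`; keep it only if the result parses.

    Hypothetical helper: a real system would also score the infill with a model.
    """
    candidate = lines[:position] + [new_line] + lines[position:]
    return candidate if is_valid_python("\n".join(candidate)) else lines


generated = ["def area(r):", "    return math.pi * r * r"]
# A missing import is inserted at the top after the fact.
fixed = try_infill(generated, 0, "import math")
# A syntactically broken infill is rejected and the code is left unchanged.
unchanged = try_infill(generated, 1, "    x = (1 +")
```

A syntax check of this kind can only filter out malformed candidates; it says nothing about whether an accepted infill is semantically useful, which is presumably why the abstract pairs the parser with a scoring step.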
# 次世代ネットワークにおける弾性フェデレーションとマルチエージェント深部強化学習に基づく協調エッジキャッシング Cooperative Edge Caching Based on Elastic Federated and Multi-Agent Deep Reinforcement Learning in Next-Generation Network ( http://arxiv.org/abs/2401.09886v2 ) ライセンス: Link先を確認	Qiong Wu, Wenhua Wang, Pingyi Fan, Qiang Fan, Huiling Zhu, Khaled B. Letaief,	(参考訳) エッジキャッシュは、小型セルベースステーション(SBS)のキャッシュユニットを有効活用することで、次世代ネットワークにとって有望なソリューションである。 SBSは,ユーザの個人情報を保護しながら,学習を通じて正確な人気コンテンツを予測することが重要である。従来のフェデレーション学習(FL)はユーザのプライバシを保護することができるが、UE間のデータ格差はモデル品質の低下につながる。そのため、各UE毎に個別のローカルモデルをトレーニングし、人気コンテンツの正確な予測を行う必要がある。さらに、次世代ネットワークにおいて、キャッシュされたコンテンツを隣接するSBS間で共有することができるため、予測された人気コンテンツを異なるSBSでキャッシュすることで、コンテンツを取得するコストに影響を与える可能性がある。したがって、人気のあるコンテンツがどこで共同でキャッシュされているかを判断することが重要である。これらの問題に対処するために、ネットワークのコストを最適化するために、弾性フェデレーションとマルチエージェント深部強化学習(CEFMR)に基づく協調エッジキャッシュ方式を提案する。まず,各UEのパーソナライズされたモデルをトレーニングするための弾力的FLアルゴリズムを提案する。そこでは,予測精度を向上させるために,対向オートエンコーダ(AAE)モデルを採用し,トレーニングされたAAEモデルに基づいて,SBS毎に人気コンテンツを予測するために,人気コンテンツ予測アルゴリズムを提案する。最後に,マルチエージェント・ディープ・強化学習(MADRL)に基づくアルゴリズムを提案する。提案手法が既存のベースラインキャッシュ方式よりも優れていることを示す実験結果を得た。 Edge caching is a promising solution for next-generation networks by empowering caching units in small-cell base stations (SBSs), which allows user equipments (UEs) to fetch users' requested contents that have been pre-cached in SBSs. It is crucial for SBSs to predict accurate popular contents through learning while protecting users' personal information. Traditional federated learning (FL) can protect users' privacy but the data discrepancies among UEs can lead to a degradation in model quality. Therefore, it is necessary to train personalized local models for each UE to predict popular contents accurately. In addition, the cached contents can be shared among adjacent SBSs in next-generation networks, thus caching predicted popular contents in different SBSs may affect the cost to fetch contents. Hence, it is critical to determine where the popular contents are cached cooperatively. 
To address these issues, we propose a cooperative edge caching scheme based on elastic federated and multi-agent deep reinforcement learning (CEFMR) to optimize the cost in the network. We first propose an elastic FL algorithm to train the personalized model for each UE, where an adversarial autoencoder (AAE) model is adopted for training to improve the prediction accuracy, then a popular content prediction algorithm is proposed to predict the popular contents for each SBS based on the trained AAE model. Finally, we propose a multi-agent deep reinforcement learning (MADRL) based algorithm to decide where the predicted popular contents are collaboratively cached among SBSs. Our experimental results demonstrate the superiority of our proposed scheme to existing baseline caching schemes.	翻訳日:2024-06-07 03:35:00 公開日:2024-06-05
# 最大エントロピー原理からのエントロピー生成:統一的アプローチ Entropy Production from Maximum Entropy Principle: a Unifying Approach ( http://arxiv.org/abs/2401.09936v2 ) ライセンス: Link先を確認	Adalberto D. Varizi, Pedro S. Correia,	(参考訳) エントロピー生成は、不可逆現象と熱力学の第2法則を特徴づける重要な量である。しかし、ユビキタスな定義はコンセンサスを損なう。エントロピー生産が情報への不完全なアクセスから生じることを考えれば、このレターでは、ジェインズの最大エントロピー原理を用いて、顕著で矛盾する定義をまとめる枠組みを確立する。より一般的に、エントロピー生成の定義は、トモグラフィ的に不完全な量子測定やシステム上の量子チャネルの作用に対処する。 Entropy production is the crucial quantity characterizing irreversible phenomena and the second law of thermodynamics. Yet, a ubiquitous definition eludes consensus. Given that entropy production arises from incomplete access to information, in this Letter we use Jaynes' maximum entropy principle to establish a framework that brings together prominent and apparently conflicting definitions. More generally our definition of entropy production addresses any tomographically incomplete quantum measurement and/or the action of a quantum channel on a system.	翻訳日:2024-06-07 03:35:00 公開日:2024-06-05
# 完全なパルスインライン・ツインビーム・スクイーザ Perfect pulsed inline twin-beam squeezers ( http://arxiv.org/abs/2401.10197v2 ) ライセンス: Link先を確認	Martin Houde, Nicolás Quesada,	(参考訳) 完全なインラインスクイーザはスペクトル的に純粋で、入力と出力の時間モードが同一であり、デバイスが作用する唯一の入力モードにおいて任意の入力量子状態をスクイーズでき、他のモードの量子状態には影響を与えない。本研究では、ツインビーム系において完全なパルスインラインスクイーザを得る方法を、一般的に用いられる3つの構成、すなわち分極反転なしのシングルパス、分極反転ありのシングルパス、分極反転ありのダブルパスを考慮して理論的に検討する。離散化されたハイゼンベルク描像の伝搬関数のBloch-Messiah分解から入力時間モードと出力時間モードの間の解析的関係を得ることにより、周波数縮退かつ対称群速度整合のタイプII構成で動作させると、ダブルパス構造が完全なパルスインラインスクイーザを生成することがわかった。 Perfect inline squeezers are both spectrally pure and have identical input and output temporal modes, allowing one to squeeze an arbitrary input quantum state in the sole input mode on which the device acts, while the quantum states of any other modes are unaffected. We study theoretically how to obtain a perfect pulsed inline squeezer in twin-beam systems by considering three commonly used configurations: unpoled single pass, poled single pass, and poled double pass. By obtaining analytical relations between the input and output temporal modes from the Bloch-Messiah decomposition of the discretized Heisenberg-picture propagator, we find that a double pass structure produces a perfect pulsed inline squeezer when operated in a frequency degenerate, symmetric group-velocity matched type-II configuration.	翻訳日:2024-06-07 03:35:00 公開日:2024-06-05
# 有限部分群のゲージ化によるギャップレス対称性保護トポロジカル相と一般化された脱閉じ込め臨界点 Gapless symmetry-protected topological phases and generalized deconfined critical points from gauging a finite subgroup ( http://arxiv.org/abs/2401.11702v2 ) ライセンス: Link先を確認	Lei Su, Meng Zeng,	(参考訳) 大域対称性の有限部分群をゲージ化することで、従来型の相と相転移を非従来型のものへ写像することができる。本研究では、具体例として、大域対称性$U(1)$をもつ創発的な$\mathbb{Z}_2$ゲージ化系、すなわち$\mathbb{Z}_2$ゲージ化されたBose-Hubbardモデルを1次元と2次元の両方で調べる。ある極限において、商対称性$\tilde{U}(1)$と双対対称性$\hat{\mathbb{Z}}_2$の間に創発的な混合 't Hooft アノマリーが存在する。 1次元では、密度行列繰り込み群(DMRG)計算によって支持されるように、超流動相は本質的にギャップレスな対称性保護トポロジカル(SPT)相に写像される。 2次元では、元の超流動・絶縁体転移は、SPT秩序がゴールドストーンモードと共存するギャップレスSPT相と、$\tilde{U}(1)$対称性に富むトポロジカル(SET)相との間の一般化された脱閉じ込め量子臨界点(DQCP)となる。また、小さな摂動に対するこれらの相と臨界点の安定性、および実験的実現の可能性についても論じる。我々の研究は、部分的なゲージ化が新しい相と量子臨界性を構築するための単純かつ強力なアプローチであることを示している。 Gauging a finite subgroup of a global symmetry can map conventional phases and phase transitions to unconventional ones. In this work, we study, as a concrete example, an emergent $\mathbb{Z}_2$-gauged system with global symmetry $U(1)$, namely, the $\mathbb{Z}_2$-gauged Bose-Hubbard model both in 1-D and in 2-D. In certain limits, there is an emergent mixed 't Hooft anomaly between the quotient $\tilde{U}(1)$ symmetry and the dual $\hat{\mathbb{Z}}_2$ symmetry. In 1-D, the superfluid phase is mapped to an intrinsically gapless symmetry-protected topological (SPT) phase, as supported by density-matrix renormalization group (DMRG) calculations. In 2-D, the original superfluid-insulator transition becomes a generalized deconfined quantum critical point (DQCP) between a gapless SPT phase, where a SPT order coexists with Goldstone modes, and a $\tilde{U}(1)$-symmetry-enriched topological (SET) phase. We also discuss the stability of these phases and the critical points to small perturbations and their potential experimental realizations.
Our work demonstrates that partial gauging is a simple and yet powerful approach in constructing novel phases and quantum criticalities.	翻訳日:2024-06-07 03:35:00 公開日:2024-06-05
# 生成コンテキストに惑わされる: 知識が衝突するとき、言語モデルは生成コンテキストと検索コンテキストをどのように統合するか? Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts? ( http://arxiv.org/abs/2401.11911v5 ) ライセンス: Link先を確認	Hexiang Tan, Fei Sun, Wanli Yang, Yuanzhuo Wang, Qi Cao, Xueqi Cheng,	(参考訳) 補助情報は大規模言語モデル(LLM)を強化する鍵となっているが、LLMがこれらのコンテキスト、特にLLM自身が生成したコンテキストと外部ソースから検索したコンテキストをどのように統合するかについては、あまり知られていない。そこで本研究では、LLMの応答が生成コンテキストと検索コンテキストのどちらに起因するかを特定するための体系的な枠組みを定式化する。応答の由来を容易に追跡できるよう、矛盾するコンテキストをもつデータセットを構築する。すなわち、各質問は生成コンテキストと検索コンテキストの両方と対にされ、そのうち一方だけが正解を含む。実験の結果、複数のLLM(GPT-4/3.5およびLlama2)において、誤った情報を含む場合でも生成コンテキストを優先する有意なバイアスが認められた。さらに、このバイアスに寄与する2つの重要な要因を特定した。 i) LLMが生成したコンテキストは通常、質問との類似度がより高く、選択される可能性が高まる。 ii) 検索コンテキストに用いられるセグメンテーション処理はその完全性を損ない、LLMによる十分な活用を妨げる。我々の分析は、LLMが多様なコンテキストを統合する方法の理解を深め、現在のLLM拡張手法を発展させる上で有益な洞察を与え、検索拡張LLMにおいて生成された誤情報がもたらすリスクを浮き彫りにする。 While auxiliary information has become a key to enhancing Large Language Models (LLMs), relatively little is known about how LLMs merge these contexts, specifically contexts generated by LLMs and those retrieved from external sources. To investigate this, we formulate a systematic framework to identify whether LLMs' responses are attributed to either generated or retrieved contexts. To easily trace the origin of the response, we construct datasets with conflicting contexts, i.e., each question is paired with both generated and retrieved contexts, yet only one of them contains the correct answer. Our experiments reveal a significant bias in several LLMs (GPT-4/3.5 and Llama2) to favor generated contexts, even when they provide incorrect information. We further identify two key factors contributing to this bias: i) contexts generated by LLMs typically show greater similarity to the questions, increasing their likelihood of being selected; ii) the segmentation process used in retrieved contexts disrupts their completeness, thereby hindering their full utilization in LLMs.
Our analysis enhances the understanding of how LLMs merge diverse contexts, offers valuable insights for advancing current LLM augmentation methods, and highlights the risk of generated misinformation for retrieval-augmented LLMs.	翻訳日:2024-06-07 03:35:00 公開日:2024-06-05
# 多言語言語モデルのためのテキスト埋め込みインバージョンのセキュリティ Text Embedding Inversion Security for Multilingual Language Models ( http://arxiv.org/abs/2401.12192v4 ) ライセンス: Link先を確認	Yiyi Chen, Heather Lent, Johannes Bjerva,	(参考訳) テキストデータは、特に大規模言語モデル(LLM)やEmbeddings as a Service(EaaS)の普及に伴い、NLPにおいて実数値の埋め込みとして表現されることが多い。しかし、基盤となるモデルの知識がなくても埋め込みからテキストを再構成できることが研究で示されており、センシティブな情報を埋め込みとして保存することはセキュリティ侵害を受けやすい。防御機構は検討されているが、それらは英語のみに焦点を当てており、他の言語は攻撃に晒される可能性がある。本研究は、多言語埋め込みインバージョンを通してLLMのセキュリティを検討する。ブラックボックスの多言語およびクロスリンガルなインバージョン攻撃の問題を定義し、その潜在的な影響を探る。その結果、英語ベースの防御が有効でない可能性があることも一因となって、多言語LLMはインバージョン攻撃に対してより脆弱である可能性が示唆された。これを軽減するために、単言語モデルと多言語モデルの両方に有効な単純なマスキング防御を提案する。本研究は多言語インバージョン攻撃を初めて調査したものであり、単言語設定と多言語設定における攻撃と防御の違いを明らかにする。 Textual data is often represented as real-numbered embeddings in NLP, particularly with the popularity of large language models (LLMs) and Embeddings as a Service (EaaS). However, storing sensitive information as embeddings can be susceptible to security breaches, as research shows that text can be reconstructed from embeddings, even without knowledge of the underlying model. While defence mechanisms have been explored, these are exclusively focused on English, leaving other languages potentially exposed to attacks. This work explores LLM security through multilingual embedding inversion. We define the problem of black-box multilingual and cross-lingual inversion attacks, and explore their potential implications. Our findings suggest that multilingual LLMs may be more vulnerable to inversion attacks, in part because English-based defences may be ineffective. To alleviate this, we propose a simple masking defense effective for both monolingual and multilingual models. This study is the first to investigate multilingual inversion attacks, shedding light on the differences in attacks and defenses across monolingual and multilingual settings.	翻訳日:2024-06-07 03:35:00 公開日:2024-06-05
# 甲骨文字の認識・解読のためのオープンデータセット An open dataset for oracle bone script recognition and decipherment ( http://arxiv.org/abs/2401.15365v3 ) ライセンス: Link先を確認	Pengjie Wang, Kaile Zhang, Xinyu Wang, Shengwei Han, Yongge Liu, Jinpeng Wan, Haisu Guan, Zhebin Kuang, Lianwen Jin, Xiang Bai, Yuliang Liu,	(参考訳) 甲骨文字(Oracle Bone Script, OBS)は、知られている中で最も古い古代中国の文字の一つであり、3000年前にさかのぼる殷(商)王朝の人文と地理に関する貴重な知見を含んでいる。これらの文字資料の歴史的・文化的意義は、いくら強調してもしすぎることはない。しかし、時間の経過によってその意味の多くが不明瞭になっており、これらの古代のテキストの解読は大きな課題となっている。人工知能(AI)の出現により、AIを用いてOBSの解釈を支援することが現実的な選択肢となった。しかし、この分野の進歩は、高品質なデータセットの欠如によって妨げられている。この問題に対処するため、本稿ではHUST-OBSデータセットの作成について詳述する。このデータセットは、解読済みの1,588文字の画像77,064枚と、未解読の9,411文字の画像62,989枚からなり、様々なソースから収集した合計140,053枚の画像を含む。さらに、すべての画像とラベルは、甲骨文字研究の専門家によってレビューされ、修正されている。このデータセットが、未解読のOBSを解読する将来の研究を刺激し、支援することを期待している。すべてのコードとデータセットは https://github.com/Pengjie-W/HUST-OBC で公開されている。 Oracle Bone Script (OBS), one of the earliest known forms of ancient Chinese writing, holds invaluable insights into the humanities and geography of the Shang Dynasty, dating back 3,000 years. The immense historical and cultural significance of these writings cannot be overstated. However, the passage of time has obscured much of their meaning, presenting a significant challenge in deciphering these ancient texts. With the advent of Artificial Intelligence (AI), employing AI to assist in interpreting OBS has become a feasible option. Yet, progress in this area has been hindered by a lack of high-quality datasets. To address this issue, this paper details the creation of the HUST-OBS dataset. This dataset encompasses 77,064 images of 1,588 individual deciphered scripts and 62,989 images of 9,411 undeciphered characters, with a total of 140,053 images, compiled from diverse sources. Additionally, all images and labels have been reviewed and corrected by experts in oracle bone studies. The hope is that this dataset could inspire and assist future research in deciphering those unknown OBS.
All the codes and datasets are available at https://github.com/Pengjie-W/HUST-OBC.	翻訳日:2024-06-07 03:35:00 公開日:2024-06-05
# cDVGAN:マルチクラス重力波信号とグリッチ生成のためのフレキシブルモデル cDVGAN: One Flexible Model for Multi-class Gravitational Wave Signal and Glitch Generation ( http://arxiv.org/abs/2401.16356v4 ) ライセンス: Link先を確認	Tom Dooney, Lyana Curier, Daniel Tan, Melissa Lopez, Chris Van Den Broeck, Stefano Bromuri,	(参考訳) 重力波(GW)とGW検出器グリッチの現実的な時間領域観測のシミュレーションは、GWデータ解析を前進させるのに役立つ。シミュレーションされたデータは、信号探索のためのデータセットの拡張、機械学習のためのデータセットのバランス調整、検出スキームの検証など、下流タスクで使用することができる。本研究では、重力波(GW)と検出器グリッチを表す複数クラスの時間領域観測をシミュレートする、敵対的生成ネットワーク(GAN)の枠組みにおける新しい条件付きモデルであるConditional Derivative GAN(cDVGAN)を提案する。 cDVGANはまた、条件付けに用いるクラスベクトルの補間によって、クラス間の変動にまたがる一般化されたハイブリッドサンプルを生成することもできる。 cDVGANは、GANの典型的な2プレイヤーの敵対的ゲームに追加のプレイヤーを導入し、補助判別器が1次微分の時系列を解析する。結果は、これにより元のデータの特徴をよりよく捉えた合成データが得られることを示している。 cDVGANは3つのクラスで条件付けされる。2つはLIGOの第3次観測運転(O3)におけるblipおよびtomteグリッチ事象をノイズ除去したもので、3つ目は連星ブラックホール(BBH)合体を表す。提案するcDVGANは、3つのクラスの特徴の再現において4種類のベースラインGANモデルを上回る。具体的には、cDVGANが生成したデータで畳み込みニューラルネットワーク(CNN)を訓練すると、検出器ノイズに埋め込まれたサンプルの検出が、他の最先端GANモデルの合成データを用いた場合を上回って改善することが実験により示された。我々の最良の合成データセットは、ベースラインGANの合成データセットと比較して、AUC(area-under-the-curve)性能を最大4.2%向上させる。さらに、cDVGANのハイブリッドサンプルで訓練したCNNは、LIGO検出器のバックグラウンドに埋め込まれた実サンプルの識別において、標準クラスのみで訓練したCNNを上回る(cDVGANでAUCが4%改善)。 Simulating realistic time-domain observations of gravitational waves (GWs) and GW detector glitches can help in advancing GW data analysis. Simulated data can be used in downstream tasks by augmenting datasets for signal searches, balancing data sets for machine learning, and validating detection schemes. In this work, we present Conditional Derivative GAN (cDVGAN), a novel conditional model in the Generative Adversarial Network framework for simulating multiple classes of time-domain observations that represent gravitational waves (GWs) and detector glitches. cDVGAN can also generate generalized hybrid samples that span the variation between classes through interpolation in the conditioned class vector.
cDVGAN introduces an additional player into the typical 2-player adversarial game of GANs, where an auxiliary discriminator analyzes the first-order derivative time-series. Our results show that this provides synthetic data that better captures the features of the original data. cDVGAN conditions on three classes, two denoised from LIGO blip and tomte glitch events from its 3rd observing run (O3), and the third representing binary black hole (BBH) mergers. Our proposed cDVGAN outperforms 4 different baseline GAN models in replicating the features of the three classes. Specifically, our experiments show that training convolutional neural networks (CNNs) with our cDVGAN-generated data improves the detection of samples embedded in detector noise beyond the synthetic data from other state-of-the-art GAN models. Our best synthetic dataset yields as much as a 4.2% increase in area-under-the-curve (AUC) performance compared to synthetic datasets from baseline GANs. Moreover, training the CNN with hybrid samples from our cDVGAN outperforms CNNs trained only on the standard classes, when identifying real samples embedded in LIGO detector background (4% AUC improvement for cDVGAN).	翻訳日:2024-06-07 03:35:00 公開日:2024-06-05
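Two ingredients named in the cDVGAN abstract, the first-order derivative time series seen by the auxiliary discriminator and the interpolation between class vectors used for hybrid samples, can be sketched in a few lines. This is only an illustration under assumptions: the function names, the finite-difference approximation, the linear blend, and the class ordering are all hypothetical and are not taken from the paper.

```python
def first_derivative(series):
    """Finite-difference approximation of the first-order derivative."""
    return [b - a for a, b in zip(series, series[1:])]


def interpolate_classes(a, b, alpha):
    """Blend two class vectors; alpha=0 returns `a`, alpha=1 returns `b`."""
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


signal = [0.0, 1.0, 4.0, 9.0]
derivative = first_derivative(signal)  # one element shorter than `signal`

blip = [1.0, 0.0, 0.0]  # assumed class ordering: blip, tomte, BBH
bbh = [0.0, 0.0, 1.0]
hybrid = interpolate_classes(blip, bbh, 0.25)  # mostly "blip", partly "BBH"
```

In a conditional GAN, a blended vector such as `hybrid` would be passed to the generator in place of a one-hot label; how the actual model consumes it is not specified in the abstract.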
# 言語モデルアライメントの効率的な厳密最適化に向けて Towards Efficient Exact Optimization of Language Model Alignment ( http://arxiv.org/abs/2402.00856v4 ) ライセンス: Link先を確認	Haozhe Ji, Cheng Lu, Yilin Niu, Pei Ke, Hongning Wang, Jun Zhu, Jie Tang, Minlie Huang,	(参考訳) 言語モデルを人間の嗜好に整合させることは、現実世界のタスクへの応用に不可欠である。この問題は、初期ポリシーからの逸脱を最小限に抑えつつ、人間の嗜好を反映した期待報酬を最大化するようにモデルのポリシーを最適化することとして定式化される。強化学習(RL)は素直な解法と考えられているが、ポリシー更新の分散が大きく、効率的なポリシー改善が妨げられる。近年、嗜好データからポリシーを直接最適化する直接選好最適化(DPO)が提案された。しかし、この問題の最適解に基づいて導出されたDPOは、実際には最適解に対する平均探索的(mean-seeking)な妥協した近似に帰着することを示す。本稿では、アライメント目的関数の効率的な厳密最適化(EXO)を提案する。 EXOは、任意のポリシーのパラメータ化に対して、漸近的にRLアルゴリズムと同じ方向に最適化することが保証されている。これにより同じモード探索的(mode-seeking)な解が得られ、かつRLの複雑さを回避することで効率的な最適化が可能になる。また、理論と実験の両面から提案手法をDPOと比較し、現実的な人間の嗜好データにおいて既存手法に対する優位性を示す。コードは https://github.com/haozheji/exact-optimization で入手できる。 The alignment of language models with human preferences is vital for their application in real-world tasks. The problem is formulated as optimizing the model's policy to maximize the expected reward that reflects human preferences with minimal deviation from the initial policy. While considered as a straightforward solution, reinforcement learning (RL) suffers from high variance in policy updates, which impedes efficient policy improvement. Recently, direct preference optimization (DPO) was proposed to directly optimize the policy from preference data. However, we show that DPO derived based on the optimal solution of the problem leads to a compromised mean-seeking approximation of the optimal solution in practice. In this paper, we propose efficient exact optimization (EXO) of the alignment objective. EXO is guaranteed to optimize in the same direction as RL algorithms asymptotically for arbitrary policy parametrization. This leads to the same mode-seeking solution, while enabling efficient optimization by circumventing the complexities of RL.
We also compare our method to DPO with both theoretical and empirical analyses, and further demonstrate the advantages of our method over existing approaches on realistic human preference data. Code is available at https://github.com/haozheji/exact-optimization.	翻訳日:2024-06-07 03:25:10 公開日:2024-06-05
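The alignment problem described in this abstract (maximize expected reward with minimal deviation from the initial policy) is commonly written as a KL-regularized objective. The form below is a standard textbook formulation given for orientation; the notation ($\pi_\theta$, $\pi_{\mathrm{init}}$, $r$, $\beta$, $\mathcal{D}$) is assumed here and may differ from the paper's.

```latex
\max_{\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\big[ r(x, y) \big]
\;-\; \beta\, \mathbb{E}_{x \sim \mathcal{D}}\Big[
  \mathrm{KL}\big( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{init}}(\cdot \mid x) \big)
\Big],
\qquad
\pi^{*}(y \mid x) \;\propto\; \pi_{\mathrm{init}}(y \mid x)\,
\exp\!\big( r(x, y) / \beta \big).
```

The second expression is the closed-form optimum of the objective. Maximizing the objective is equivalent to minimizing the reverse divergence $\mathrm{KL}(\pi_\theta \,\|\, \pi^{*})$, which is the usual sense in which such a solution is called mode-seeking.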
# FindingEmo:野生における感情認識のための画像データセット FindingEmo: An Image Dataset for Emotion Recognition in the Wild ( http://arxiv.org/abs/2402.01355v2 ) ライセンス: Link先を確認	Laurent Mertens, Elahe' Yargholi, Hans Op de Beeck, Jan Van den Stock, Joost Vennekens,	(参考訳) 我々は25k画像のアノテーションを含む新しい画像データセットであるFindingEmoを紹介した。既存のデータセットとは対照的に、さまざまな自然主義的、社会的な設定で複数の人を描写する複雑なシーンに焦点を合わせており、画像は全体として注釈付けされている。注釈付きディメンションには、Valence、Arousal、Emotionのラベルがあり、Prolificを使ってアノテーションを収集する。アノテーションとともに、元のイメージを示すURLのリストと、関連するすべてのソースコードをリリースします。 We introduce FindingEmo, a new image dataset containing annotations for 25k images, specifically tailored to Emotion Recognition. Contrary to existing datasets, it focuses on complex scenes depicting multiple people in various naturalistic, social settings, with images being annotated as a whole, thereby going beyond the traditional focus on faces or single individuals. Annotated dimensions include Valence, Arousal and Emotion label, with annotations gathered using Prolific. Together with the annotations, we release the list of URLs pointing to the original images, as well as all associated source code.	翻訳日:2024-06-07 03:25:10 公開日:2024-06-05
# マシンアンラーニングへの情報理論的アプローチ An Information Theoretic Approach to Machine Unlearning ( http://arxiv.org/abs/2402.01401v3 ) ライセンス: Link先を確認	Jack Foster, Kyle Fogarty, Stefan Schoepf, Cengiz Öztireli, Alexandra Brintrup,	(参考訳) AIやデータに関する規制を遵守するために、訓練済みの機械学習モデルからプライベートな情報や著作権のある情報を忘れさせる必要性がますます高まっている。アンラーニングにおける重要な課題は、モデルの性能を保ちながら、必要なデータをタイムリーに忘れさせることである。本研究では、訓練済みモデルと忘却対象データのみが与えられた状態でデータを除去しなければならない、ゼロショットアンラーニングのシナリオを扱う。我々は情報理論の観点からアンラーニングを検討し、サンプルの影響を、モデルがそのサンプルを観測することで得る情報利得と結びつける。これに基づき、モデルの幾何学に基づく単純だが原理的なゼロショットアンラーニング手法を導出する。提案手法は、忘却対象点の周囲の小さな近傍に関して、学習された関数の勾配を最小化するという形をとる。これにより平滑化効果が生じ、分類器の境界が移動することで忘却が起こる。一連の低次元実験を通じて、この手法がモデルの一般的な性能を保ちながら忘却対象サンプルをアンラーニングできる理由の直観を探る。さらに、現代的な一連のベンチマークで広範な実証評価を行い、ゼロショットアンラーニングの厳しい制約の下でも、提案手法が最先端の性能と競合することを検証した。 To comply with AI and data regulations, the need to forget private or copyrighted information from trained machine learning models is increasingly important. The key challenge in unlearning is forgetting the necessary data in a timely manner, while preserving model performance. In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten. We explore unlearning from an information theoretic perspective, connecting the influence of a sample to the information gain a model receives by observing it. From this, we derive a simple but principled zero-shot unlearning method based on the geometry of the model. Our approach takes the form of minimising the gradient of a learned function with respect to a small neighbourhood around a target forget point. This induces a smoothing effect, causing forgetting by moving the boundary of the classifier. We explore the intuition behind why this approach can jointly unlearn forget samples while preserving general model performance through a series of low-dimensional experiments.
We perform extensive empirical evaluation of our method over a range of contemporary benchmarks, verifying that our method is competitive with state-of-the-art performance under the strict constraints of zero-shot unlearning.	翻訳日:2024-06-07 03:25:10 公開日:2024-06-05
# 弱視から学ぶための一般的なフレームワーク A General Framework for Learning from Weak Supervision ( http://arxiv.org/abs/2402.01922v3 ) ライセンス: Link先を確認	Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj,	(参考訳) 弱い教師付き学習は、様々なシナリオに適用可能な課題に直面している。本稿では、新しいアルゴリズムを用いて、弱監督(GLWS)から学習するための一般的な枠組みを紹介する。 GLWSの中心は期待最大化(EM)の定式化であり、サンプル部分ラベル、集約統計、ペアワイズ観測、ラベルなしデータなど、様々な弱い監督ソースを順調に収容している。さらに,非決定論的有限オートマトン (NFA) とフォワードバックワードアルゴリズムを用いて,EM計算要求を大幅に単純化するアルゴリズムを提案する。したがって、任意の弱監督から学習する問題は、それらのNFAモデリングに変換される。 GLWSは機械学習モデルのスケーラビリティを向上するだけでなく、11の弱い監視シナリオで優れたパフォーマンスと汎用性を示す。この分野でのさらなる進歩と実践的な展開の道を開くことを願っています。 Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering the practical deployment. This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm. Central to GLWS is an Expectation-Maximization (EM) formulation, adeptly accommodating various weak supervision sources, including instance partial labels, aggregate statistics, pairwise observations, and unlabeled data. We further present an advanced algorithm that significantly simplifies the EM computational demands using a Non-deterministic Finite Automaton (NFA) along with a forward-backward algorithm, which effectively reduces time complexity from quadratic or factorial often required in existing solutions to linear scale. The problem of learning from arbitrary weak supervision is therefore converted to the NFA modeling of them. GLWS not only enhances the scalability of machine learning models but also demonstrates superior performance and versatility across 11 weak supervision scenarios. We hope our work paves the way for further advancements and practical deployment in this field.	翻訳日:2024-06-07 03:25:10 公開日:2024-06-05
# ファインチューニング基礎モデルのためのリーマン事前条件付きLORA Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models ( http://arxiv.org/abs/2402.02347v3 ) ライセンス: Link先を確認	Fangzhao Zhang, Mert Pilanci,	(参考訳) Low-Rank Adaptation (LoRA) は、事前学習したモデルの重みを凍結し、付加的な低ランクトレーニング可能な行列を更新することを提案するPEFT法として人気がある。本稿では,LoRA トレーニングの強化について,各勾配ステップに $r \times r$ preconditioner を導入することで検討する。提案したプリコンディショナは,無限幅NN設定下でのLoRAによる特徴学習を安定化する。経験的に、この新しいプリコンディショナーの実装は、既存のオプティマイザコードに小さな変更を必要とし、事実上最小のストレージとランタイムオーバーヘッドを生成する。大規模言語モデルとテキスト・ツー・イメージ拡散モデルによる実験結果から,この新しいプレコンディショナーにより,SGDとAdamWの収束性と信頼性が著しく向上できることが示唆された。さらに、トレーニングプロセスは、学習率などのハイパーパラメータ選択に対して、より堅牢になる。新しいプレコンディショナーは、ローランク行列場における新しいリーマン計量から導出することができる。コードはhttps://github.com/pilancilab/Riemannian_Preconditioned_LoRAでアクセスすることができる。 Low-Rank Adaptation (LoRA) emerges as a popular parameter-efficient fine-tuning (PEFT) method, which proposes to freeze pretrained model weights and update an additive low-rank trainable matrix. In this work, we study the enhancement of LoRA training by introducing an $r \times r$ preconditioner in each gradient step where $r$ is the LoRA rank. We theoretically verify that the proposed preconditioner stabilizes feature learning with LoRA under infinite-width NN setting. Empirically, the implementation of this new preconditioner requires a small change to existing optimizer code and creates virtually minuscule storage and runtime overhead. Our experimental results with both large language models and text-to-image diffusion models show that with this new preconditioner, the convergence and reliability of SGD and AdamW can be significantly enhanced. Moreover, the training process becomes much more robust to hyperparameter choices such as learning rate. The new preconditioner can be derived from a novel Riemannian metric in low-rank matrix field. Code can be accessed at https://github.com/pilancilab/Riemannian_Preconditioned_LoRA.	翻訳日:2024-06-07 03:25:10 公開日:2024-06-05
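To make the role of an $r \times r$ preconditioner concrete, here is one plausible form of a preconditioned LoRA step. It is written under assumptions the abstract does not state: the adapted weight is $W = W_0 + BA$ with $B \in \mathbb{R}^{m \times r}$ and $A \in \mathbb{R}^{r \times n}$, $\eta$ is a step size, and $\delta \ge 0$ is a small damping term added for invertibility. The exact preconditioner should be taken from the paper or the linked repository.

```latex
\begin{aligned}
A &\leftarrow A - \eta \,\big(B^{\top} B + \delta I_r\big)^{-1}\, \nabla_A \mathcal{L}, \\
B &\leftarrow B - \eta \, \nabla_B \mathcal{L}\,\big(A A^{\top} + \delta I_r\big)^{-1}.
\end{aligned}
```

Both inverted matrices are only $r \times r$, which fits the abstract's claim of small storage and runtime overhead, since the LoRA rank $r$ is typically much smaller than $m$ and $n$.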
# DeepLag: 直観的流体予測のためのディープラグランジアンダイナミクスの発見 DeepLag: Discovering Deep Lagrangian Dynamics for Intuitive Fluid Prediction ( http://arxiv.org/abs/2402.02425v3 ) ライセンス: Link先を確認	Qilong Ma, Haixu Wu, Lanxiang Xing, Shangchen Miao, Mingsheng Long,	(参考訳) 将来の流体を正確に予測することは、気象学、海洋学、空気力学など幅広い分野において不可欠である。しかし、流体は通常オイラー的視点から観測されるため、その移動する複雑なダイナミクスは静的な格子の中で著しく不明瞭になり、混ざり合ってしまい、予測に厄介な課題をもたらす。本稿では、この錯綜した流体力学に取り組むための、ラグランジュとオイラーを組み合わせた新しいパラダイムを提案する。オイラー的観測のみに基づいて未来を予測するのではなく、適応的にサンプリングされた鍵粒子の動きを追跡することで流体中に隠れたラグランジュ的ダイナミクスを発見するDeepLagを提案する。さらにDeepLagは流体予測の新しいパラダイムを提示する。そこでは、追跡された粒子のラグランジュ的運動がオイラー的観測から推定され、蓄積されたラグランジュ的ダイナミクスの情報が、大域的に発展するオイラー的特徴に組み込まれて将来の予測を導く。鍵粒子の追跡は、流体力学に対する透明で解釈可能な手がかりを与えるだけでなく、膨大な格子間の複雑な相関をモデル化する必要をなくし、効率を高める。実験では、DeepLagは2Dと3D、シミュレーションと実世界の流体を含む3つの困難な流体予測タスクで優れた性能を示す。 Accurately predicting the future fluid is vital to extensive areas such as meteorology, oceanology, and aerodynamics. However, since the fluid is usually observed from the Eulerian perspective, its moving and intricate dynamics are seriously obscured and confounded in static grids, bringing thorny challenges to the prediction. This paper introduces a new Lagrangian-Eulerian combined paradigm to tackle the tanglesome fluid dynamics. Instead of solely predicting the future based on Eulerian observations, we propose DeepLag to discover hidden Lagrangian dynamics within the fluid by tracking the movements of adaptively sampled key particles. Further, DeepLag presents a new paradigm for fluid prediction, where the Lagrangian movement of the tracked particles is inferred from Eulerian observations, and their accumulated Lagrangian dynamics information is incorporated into global Eulerian evolving features to guide future prediction respectively. Tracking key particles not only provides a transparent and interpretable clue for fluid dynamics but also makes our model free from modeling complex correlations among massive grids for better efficiency.
Experimentally, DeepLag excels in three challenging fluid prediction tasks covering 2D and 3D, simulated and real-world fluids.	翻訳日:2024-06-07 03:25:10 公開日:2024-06-05
# DRED:データ正則化環境設計による強化学習におけるゼロショット転送 DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design ( http://arxiv.org/abs/2402.03479v3 ) ライセンス: Link先を確認	Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht,	(参考訳) 深層強化学習(RL)で訓練された自律エージェントは、訓練中に遭遇した環境と特性を共有している場合でも、新しい環境にうまく一般化できないことが多い。本研究では、個々の環境インスタンス(レベル)のサンプリングが、RLエージェントのゼロショット一般化(ZSG)能力にどのように影響するかを検討する。基底層を共有する深層アクター・クリティックアーキテクチャでは、価値損失に応じてレベルを優先することで、生成された訓練データにおいてエージェントの内部表現と訓練レベル集合との間の相互情報量が最小化されることを見出した。これは、特定の適応サンプリング戦略によって達成される正則化に対する新しい理論的根拠を与える。次に、レベル生成を制御できることを前提とする教師なし環境設計(UED)手法に注目する。既存のUED手法は訓練分布を大きくシフトさせうることがわかり、これはZSG性能の低下につながる。過学習と分布シフトの両方を防ぐために、データ正則化環境設計(DRED)を導入する。 DREDは、初期レベルパラメータ集合の真の分布を近似するよう訓練された生成モデルを用いてレベルを生成する。この接地により、DREDは適応レベルサンプリング戦略やUED手法に比べてZSGを大幅に改善する。コードと実験データは https://github.com/uoe-agents/dred で公開されている。 Autonomous agents trained using deep reinforcement learning (RL) often lack the ability to successfully generalise to new environments, even when these environments share characteristics with the ones they have encountered during training. In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents. We discover that, for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data. This provides a novel theoretical justification for the regularisation achieved by certain adaptive sampling strategies. We then turn our attention to unsupervised environment design (UED) methods, which assume control over level generation. We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance.
To prevent both overfitting and distributional shift, we introduce data-regularised environment design (DRED). DRED generates levels using a generative model trained to approximate the ground truth distribution of an initial set of level parameters. Through its grounding, DRED achieves significant improvements in ZSG over adaptive level sampling strategies and UED methods. Our code and experimental data are available at https://github.com/uoe-agents/dred.	翻訳日:2024-06-07 03:25:10 公開日:2024-06-05
# ニューラルネットワーク初期化におけるゴールディロックゾーンの分解 Deconstructing the Goldilocks Zone of Neural Network Initialization ( http://arxiv.org/abs/2402.03579v2 ) ライセンス: Link先を確認	Artem Vysogorets, Anna Dawid, Julia Kempe,	(参考訳) トレーニング損失の2次特性は、ディープラーニングモデルの最適化力学に大きな影響を与える。 Fort & Scherlis (2019) は、損失 Hessian の多数の正の曲率と局所凸性は、"Goldilocks zone" と呼ばれる領域にある高度に訓練可能な初期点と関連していることを示した。その後もこの関係に触発された研究はごくわずかであり、ほとんど説明がつかないままである。本稿では,同種ニューラルネットワークにおけるGoldilocksゾーンの厳密かつ包括的解析について述べる。特に、損失の正の曲率を超越した基本条件を導出し、従来受け入れられていた初期化ノルムへの接続を説明する。さらに, 正曲率の過大さをモデル信頼度, 初期損失の低さ, 以前は知られていなかったクロスエントロピー損失勾配に関連付ける。深層ネットワークのトレーニング性に対する過剰な正曲率の重要性を理解するため,Goldilocksゾーン外の完全連結・畳み込みアーキテクチャを最適化し,創発的挙動を解析した。私たちは、強力なモデルパフォーマンスがGoldilocksゾーンと完全に一致していないことに気付き、この関係についてさらなる研究を要求します。 The second-order properties of the training loss have a massive impact on the optimization dynamics of deep learning models. Fort & Scherlis (2019) discovered that a large excess of positive curvature and local convexity of the loss Hessian is associated with highly trainable initial points located in a region coined the "Goldilocks zone". Only a handful of subsequent studies touched upon this relationship, so it remains largely unexplained. In this paper, we present a rigorous and comprehensive analysis of the Goldilocks zone for homogeneous neural networks. In particular, we derive the fundamental condition resulting in excess of positive curvature of the loss, explaining and refining its conventionally accepted connection to the initialization norm. Further, we relate the excess of positive curvature to model confidence, low initial loss, and a previously unknown type of vanishing cross-entropy loss gradient. To understand the importance of excessive positive curvature for trainability of deep networks, we optimize fully-connected and convolutional architectures outside the Goldilocks zone and analyze the emergent behaviors. 
We find that strong model performance is not perfectly aligned with the Goldilocks zone, calling for further research into this relationship.	翻訳日:2024-06-07 03:25:10 公開日:2024-06-05
# Read to Play (R2-Play):マルチモーダルゲーム指導による決定変換器 Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction ( http://arxiv.org/abs/2402.04154v6 ) ライセンス: Link先を確認	Yonggang Jin, Ge Zhang, Hao Zhao, Tianyu Zheng, Jarvi Guo, Liuyu Xiang, Shawn Yue, Stephen W. Huang, Zhaofeng He, Jie Fu,	(参考訳) 汎用エージェントの開発は、人工知能の長年の目標である。さまざまなタスクから広範なオフラインデータセットを活用するこれまでの取り組みは、強化学習内のマルチタスクシナリオにおいて、顕著なパフォーマンスを示している。しかしながら、これらの作業は、新しいタスクに機能を拡張する際の課題に直面します。近年,テキスト指導や視覚的軌跡を意思決定ネットワークに統合し,タスク固有の文脈情報を提供し,有望な方向を示す手法が提案されている。しかし,タスクの文脈情報を正確に伝達するには,テキスト指導や視覚的軌跡のみに頼るだけでは不十分であることが観察された。本稿では,エージェントに対するタスクガイダンスの強化について検討し,ゲームプレイの指示を理解することによって,「読み上げ」機能を実現する。視覚タスクにおけるマルチモーダル・インストラクション・チューニングの成功からインスピレーションを得て、視覚ベースのRLタスクを長期視覚タスクとして扱い、インストラクション・チューニングを決定変換器に組み込むためのマルチモーダル・ゲーム・インストラクションのセットを構築する。実験により,マルチモーダルゲーム命令を組み込むことで,決定変換器のマルチタスクと一般化能力を大幅に向上することが示された。 Developing a generalist agent is a longstanding objective in artificial intelligence. Previous efforts utilizing extensive offline datasets from various tasks demonstrate remarkable performance in multitasking scenarios within Reinforcement Learning. However, these works encounter challenges in extending their capabilities to new tasks. Recent approaches integrate textual guidance or visual trajectory into decision networks to provide task-specific contextual cues, representing a promising direction. However, it is observed that relying solely on textual guidance or visual trajectory is insufficient for accurately conveying the contextual information of tasks. This paper explores enhanced forms of task guidance for agents, enabling them to comprehend gameplay instructions, thereby facilitating a "read-to-play" capability. Drawing inspiration from the success of multimodal instruction tuning in visual tasks, we treat the visual-based RL task as a long-horizon vision task and construct a set of multimodal game instructions to incorporate instruction tuning into a decision transformer. 
Experimental results demonstrate that incorporating multimodal game instructions significantly enhances the decision transformer's multitasking and generalization capabilities.	翻訳日:2024-06-07 03:25:10 公開日:2024-06-05
# ダンス生成のための双方向自己回帰拡散モデル Bidirectional Autoregressive Diffusion Model for Dance Generation ( http://arxiv.org/abs/2402.04356v2 ) ライセンス: Link先を確認	Canyu Zhang, Youbao Tang, Ning Zhang, Ruei-Sung Lin, Mei Han, Jing Xiao, Song Wang,	(参考訳) ダンスは人間の感情を表現するための強力な媒体として機能するが、人生のようなダンスの生成は依然としてかなりの課題である。近年、拡散モデルは様々な領域で顕著な生成能力を示した。彼らは、適応可能な多対多の性質のために、人間のモーションジェネレーションを約束します。それにもかかわらず、現在の拡散に基づく運動生成モデルは、局所的および双方向的な拡張による動きに焦点を絞らず、直接かつ一方向の運動列を直接生成することが多い。高品質な舞踊の動きを振る舞う際には、音楽的文脈だけでなく、近隣の音楽的な舞踊の動きも考慮する必要がある。本研究では,音楽間距離生成のための双方向自己回帰拡散モデル (BADM) を提案する。生成したダンス動作をよりスムーズにするため、局所運動強調のための局所情報デコーダを構築する。提案フレームワークは入力条件と近傍の動作に基づいて新しい動きを生成することができ、個々の動きスライスを反復的に予測し、全ての予測を統合する。生成されたダンスとビートとの同期性を更に向上させるため、ビート情報を入力として組み込んで、より優れた音楽整列ダンス動作を生成する。実験結果から,提案モデルが既存の一方向アプローチと比較して最先端性能を達成できることが示唆された。 Dance serves as a powerful medium for expressing human emotions, but the lifelike generation of dance is still a considerable challenge. Recently, diffusion models have showcased remarkable generative abilities across various domains. They hold promise for human motion generation due to their adaptable many-to-many nature. Nonetheless, current diffusion-based motion generation models often create entire motion sequences directly and unidirectionally, lacking focus on the motion with local and bidirectional enhancement. When choreographing high-quality dance movements, people need to take into account not only the musical context but also the nearby music-aligned dance motions. To authentically capture human behavior, we propose a Bidirectional Autoregressive Diffusion Model (BADM) for music-to-dance generation, where a bidirectional encoder is built to enforce that the generated dance is harmonious in both the forward and backward directions. To make the generated dance motion smoother, a local information decoder is built for local motion enhancement. 
The proposed framework is able to generate new motions based on the input conditions and nearby motions, which foresees individual motion slices iteratively and consolidates all predictions. To further refine the synchronicity between the generated dance and the beat, the beat information is incorporated as an input to generate better music-aligned dance movements. Experimental results demonstrate that the proposed model achieves state-of-the-art performance compared to existing unidirectional approaches on the prominent benchmark for music-to-dance generation.	翻訳日:2024-06-07 03:25:10 公開日:2024-06-05
# プロキシ再暗号化, IPFS, ブロックチェーンの統合による電子カルテの商用化, 分散化, ストアリング A Solution for Commercializing, Decentralizing and Storing Electronic Medical Records by Integrating Proxy Re-Encryption, IPFS, and Blockchain ( http://arxiv.org/abs/2402.05498v2 ) ライセンス: Link先を確認	Phong Tran, Thong Nguyen, Long Chu, Nhi Tran, Hang Ta,	(参考訳) グローバルシステム全体でのユーザ医療記録の急速な拡大は、機会だけでなく、ユーザのプライバシ、コントロール可能性、患者の医療記録を商業化する能力を保証する効果的なアプリケーションモデルを維持する上での新たな課題も示している。さらに、医療機関におけるデータ分析モデルの普及は、医療記録データの分散化と復元性を必要とする。これらのシステムから収集されたユーザ医療データは、収集後数年も簡単に分析・活用でき、多くの要因によるデータ損失のリスクを伴わないことが重要である。さらに、医療情報はデータ所有者によって認可され、患者に医療研究機関からのデータ使用要求を受け入れ、拒否する権利を与える必要がある。そこで本研究では,EVM互換のブロックチェーンとIPFSを用いた分散ストレージを実現するための革新的なソリューションを提案する。プライバシとコントロールを確保するため,医療データマーケットプレースでは,PRE(Proxy Re-Encryption)という暗号認証方式を採用しています。提案アーキテクチャは,記録記録の暗号化と復号化を最小化することにより,医療研究機関への読み取りアクセスを許可するコストを大幅に削減する。さらに、ブロックチェーンのスマートコントラクトとIPFSを通じて、医療データのコントロールを強化し、医療記録の完全性とプライバシを保護します。 The rapid expansion of user medical records across global systems presents not only opportunities but also new challenges in maintaining effective application models that ensure user privacy, controllability, and the ability to commercialize patient medical records. Moreover, the proliferation of data analysis models in healthcare institutions necessitates the decentralization and restorability of medical record data. It is imperative that user medical data collected from these systems can be easily analyzed and utilized even years after collection, without the risk of data loss due to numerous factors. Additionally, medical information must be authorized by the data owner, granting patients the right to accept or decline data usage requests from medical research agencies. In response, we propose an innovative solution for implementing a decentralized system utilizing an EVM-compatible blockchain and IPFS for decentralized storage. To ensure privacy and control, we employ Proxy Re-Encryption (PRE), a cryptographic authorized method, within the medical data marketplace. 
Our proposed architecture significantly reduces costs associated with granting read access to healthcare research agencies by minimizing the encryption and decryption time of stored records. Furthermore, it empowers users with enhanced control over their health data through tamperproof blockchain smart contracts and IPFS, safeguarding the integrity and privacy of their medical records.	翻訳日:2024-06-07 03:25:10 公開日:2024-06-05
# トランスフォーマーはどのようにして文脈内自己回帰学習を行うのか? How do Transformers perform In-Context Autoregressive Learning? ( http://arxiv.org/abs/2402.05787v2 ) ライセンス: Link先を確認	Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré,	(参考訳) トランスフォーマーは言語モデリングタスクで最先端のパフォーマンスを達成した。しかし、その大成功の背景にはいまだ不明な点がある。本稿では,より理解を深めるために,第1次自己回帰プロセス $s_{t+1} = W s_t$ としてシーケンスが生成される,単純な次のトークン予測タスク上でTransformerモデルをトレーニングする。トレーニングされたTransformerが、まず$W$ in-contextを学習し、次に予測マッピングを適用することで、次のトークンを予測する方法を示す。結果の手順を文脈内自己回帰学習と呼ぶ。より正確には、直交行列の可換化に$W$に着目して、トレーニングされた一層線形変換器が、拡張トークンを考える際に、内的目的関数の最小化のために勾配勾配の1ステップを実装できることを最初に示す。トークンが拡張されない場合、一層対角線マルチヘッド変換器のグローバルミニマを特徴付ける。重要なことは、頭部間の直交性を示し、位置符号化がデータの三角関係を捉えることを示すことである。実験面では,非可換直交行列の一般事例を考察し,理論的な知見を一般化する。 Transformers have achieved state-of-the-art performance in language modeling tasks. However, the reasons behind their tremendous success are still unclear. In this paper, towards a better understanding, we train a Transformer model on a simple next token prediction task, where sequences are generated as a first-order autoregressive process $s_{t+1} = W s_t$. We show how a trained Transformer predicts the next token by first learning $W$ in-context, then applying a prediction mapping. We call the resulting procedure in-context autoregressive learning. More precisely, focusing on commuting orthogonal matrices $W$, we first show that a trained one-layer linear Transformer implements one step of gradient descent for the minimization of an inner objective function, when considering augmented tokens. When the tokens are not augmented, we characterize the global minima of a one-layer diagonal linear multi-head Transformer. Importantly, we exhibit orthogonality between heads and show that positional encoding captures trigonometric relations in the data. On the experimental side, we consider the general case of non-commuting orthogonal matrices and generalize our theoretical findings.	翻訳日:2024-06-07 03:25:10 公開日:2024-06-05
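The inner gradient step mentioned in the entry above can be illustrated with a minimal sketch. This is not the paper's Transformer: it only performs one gradient-descent step, starting from $W_0 = 0$, on the assumed inner objective $\frac{1}{2}\sum_t \|s_{t+1} - W s_t\|^2$ for a sequence generated by $s_{t+1} = W s_t$. The function names, the 90-degree rotation used as $W$, and the learning rate are all hypothetical choices made so that the example is exact.

```python
def matvec(W, s):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(W[i][j] * s[j] for j in range(len(s))) for i in range(len(W))]


def generate_sequence(W, s0, length):
    """Generate s_0, ..., s_{length-1} with s_{t+1} = W s_t."""
    seq = [s0]
    for _ in range(length - 1):
        seq.append(matvec(W, seq[-1]))
    return seq


def one_gradient_step(seq, lr):
    """One gradient-descent step from W = 0 on 0.5 * sum_t ||s_{t+1} - W s_t||^2.

    At W = 0 the gradient is -sum_t s_{t+1} s_t^T, so the update is
    W_1 = lr * sum_t s_{t+1} s_t^T.
    """
    d = len(seq[0])
    W1 = [[0.0] * d for _ in range(d)]
    for s_t, s_next in zip(seq, seq[1:]):
        for i in range(d):
            for j in range(d):
                W1[i][j] += lr * s_next[i] * s_t[j]
    return W1


# Hypothetical example: W is a 90-degree rotation (an orthogonal matrix).
W = [[0.0, -1.0], [1.0, 0.0]]
context = generate_sequence(W, [1.0, 0.0], 5)
W_est = one_gradient_step(context, lr=0.5)
next_token = matvec(W_est, context[-1])
```

In this toy case the context tokens cover both axes equally, so a single step with this learning rate recovers $W$; in general one step gives only an approximation.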
# 知識蒸留におけるグラフニューラルネットワークを用いた大規模言語モデル Large Language Model Meets Graph Neural Network in Knowledge Distillation ( http://arxiv.org/abs/2402.05894v3 ) ライセンス: Link先を確認	Shengxiang Hu, Guobing Zou, Song Yang, Yanglan Gan, Bofeng Zhang, Yixin Chen,	(参考訳) サービス指向アーキテクチャでは、信頼性を維持し、ユーザの満足度を高めるために、QoS(Quality of Service)を正確に予測することが重要です。しかし、ユーザとサービス間の高次の協調関係を常に見落とし、正確な機能を学ぶ上で重要な特定のユーザサービス呼び出し毎に機能学習を動的に調整できないため、大きな課題が残っている。さらに、QoS進化を捉えるためのRNNに依存しているため、長距離依存関係の管理が難しく、長期的なトレンドを検出する能力が損なわれる。これらの課題に対処するために、時間対応QoS予測のための Target-Prompt Online Graph Collaborative Learning (TOGCL) フレームワークを提案する。 TOGCLは、動的なユーザサービス呼び出しグラフを利用して、歴史的なインタラクションをモデル化し、ユーザサービス間の関係を包括的に表現する。このグラフに基づいて、ターゲットユーザ/サービスとその隣人間の暗黙的な協調関係と関連する歴史的QoS値とを同時に考慮しながら、ユーザとサービスのオンラインの深い潜伏した特徴を各時間スライス時に抽出するターゲットプロンプトグラフアテンションネットワークを開発する。さらに、ユーザやサービスの時間的特徴進化パターンを明らかにするために、多層トランスフォーマーエンコーダが使用され、時間的認識のQoS予測につながった。 WS-DREAMデータセットで実施された大規模な実験により、提案したTOGCLフレームワークは、複数のメトリクスにわたって最先端のメソッドを著しく上回り、最大38.80%の改善が達成された。これらの結果は、TOGCLフレームワークの正確な時間的QoS予測の有効性を裏付けるものである。 In service-oriented architectures, accurately predicting the Quality of Service (QoS) is crucial for maintaining reliability and enhancing user satisfaction. However, significant challenges remain because existing methods overlook high-order latent collaborative relationships between users and services and fail to dynamically adjust feature learning for each specific user-service invocation, both of which are critical for learning accurate features. Additionally, reliance on RNNs for capturing QoS evolution hampers models' ability to detect long-term trends due to difficulties in managing long-range dependencies. To address these challenges, we propose the Target-Prompt Online Graph Collaborative Learning (TOGCL) framework for temporal-aware QoS prediction. 
TOGCL leverages a dynamic user-service invocation graph to model historical interactions, providing a comprehensive representation of user-service relationships. Building on this graph, it develops a target-prompt graph attention network to extract online deep latent features of users and services at each time slice, simultaneously considering implicit collaborative relationships between target users/services and their neighbors, as well as relevant historical QoS values. Additionally, a multi-layer Transformer encoder is employed to uncover temporal feature evolution patterns of users and services, leading to temporal-aware QoS prediction. Extensive experiments conducted on the WS-DREAM dataset demonstrate that our proposed TOGCL framework significantly outperforms state-of-the-art methods across multiple metrics, achieving improvements of up to 38.80%. These results underscore the effectiveness of the TOGCL framework for precise temporal QoS prediction.	翻訳日:2024-06-07 03:25:10 公開日:2024-06-05
# 結合型正規化流れの普遍性について On the Universality of Coupling-based Normalizing Flows ( http://arxiv.org/abs/2402.06578v2 ) ライセンス: Link先を確認	Felix Draxler, Stefan Wahl, Christoph Schnörr, Ullrich Köthe,	(参考訳) 正規化フローの表現力を理解するための新しい理論的枠組みを提案する。科学的な応用が盛んであるにもかかわらず、流れの包括的な理解は、その制限されたアーキテクチャのため、いまだに解明されていない。既存の定理は、任意に不条件のニューラルネットワークを使用する必要があるため、実用性を制限するため、不足している。本稿では,RealNVP などの疎結合型正規化フローに対する分布普遍性定理を提案する。さらに,体積保存型正規化フローは普遍的ではなく,どの分布を学習するか,どのように表現性を修正するかを示す。この結果は,アフィンと関連する結合が表現的であり,一般に容積保存フローに優れており,経験的結果と理論的理解のギャップを埋めるものである,という一般的な知恵を裏付けるものである。 We present a novel theoretical framework for understanding the expressive power of normalizing flows. Despite their prevalence in scientific applications, a comprehensive understanding of flows remains elusive due to their restricted architectures. Existing theorems fall short as they require the use of arbitrarily ill-conditioned neural networks, limiting practical applicability. We propose a distributional universality theorem for well-conditioned coupling-based normalizing flows such as RealNVP. In addition, we show that volume-preserving normalizing flows are not universal, what distribution they learn instead, and how to fix their expressivity. Our results support the general wisdom that affine and related couplings are expressive and in general outperform volume-preserving flows, bridging a gap between empirical results and theoretical understanding.	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
# 大規模言語モデル:Webshellのエスケープサンプルを生成するハイブリッドプロンプトアルゴリズムの提案 Large Language Models are Few-shot Generators: Proposing Hybrid Prompt Algorithm To Generate Webshell Escape Samples ( http://arxiv.org/abs/2402.07408v2 ) ライセンス: Link先を確認	Mingrui Ma, Lansheng Han, Chunjie Zhou,	(参考訳) サイバー攻撃の頻発により、ウェブシェル攻撃と防衛は次第にネットワークセキュリティの分野で研究ホットスポットとなっている。しかし、公開されているベンチマークデータセットの欠如と、webshellエスケープサンプル生成のための手動で定義されたルールへの過度な依存は、webshellエスケープサンプル生成と人工知能(AI)ベースのWebshell検出に関する研究の進捗を遅らせている。弱いウェブシェルサンプルエスケープ機能の欠点や複雑な悪意のある特徴を持つウェブシェルデータセットの欠如に対処し、ウェブシェル検出の開発を促進するために、大規模言語モデルの助けを借りてウェブシェルサンプル生成のためのハイブリッド・プロンプトアルゴリズムを提案する。ウェブシェルサンプル生成用に特別に開発されたプロンプトアルゴリズムとして、Hybrid Promptアルゴリズムは、思考のチェーン、思考のツリーなど様々な素早いアイデアを結合するだけでなく、ウェブシェル階層モジュールや少数ショット例などの様々なコンポーネントを組み込んで、ウェブシェルエスケープ戦略の学習と推論を容易にする。実験の結果、Hybrid Promptアルゴリズムは、高いエスケープレート(GPT-4モデルでは88.61%)と(GPT-4モデルでは54.98%)で高品質なウェブシェルサンプルを生成する優れたコード推論能力を持つ複数のLLMで動作可能であることが示された。 The frequent occurrence of cyber-attacks has made webshell attacks and defense gradually become a research hotspot in the field of network security. However, the lack of publicly available benchmark datasets and the over-reliance on manually defined rules for webshell escape sample generation have slowed down the progress of research related to webshell escape sample generation and artificial intelligence (AI)-based webshell detection. To address the drawbacks of weak webshell sample escape capabilities, the lack of webshell datasets with complex malicious features, and to promote the development of webshell detection, we propose the Hybrid Prompt algorithm for webshell escape sample generation with the help of large language models. As a prompt algorithm specifically developed for webshell sample generation, the Hybrid Prompt algorithm not only combines various prompt ideas including Chain of Thought, Tree of Thought, but also incorporates various components such as webshell hierarchical module and few-shot example to facilitate the LLM in learning and reasoning webshell escape strategies. 
Experimental results show that the Hybrid Prompt algorithm can work with multiple LLMs that have excellent code reasoning ability to generate high-quality webshell samples with a high Escape Rate (88.61% with the GPT-4 model on the VirusTotal detection engine) and Survival Rate (54.98% with the GPT-4 model).	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
# テキスト生成のためのラベル効率の良いモデル選択 Label-Efficient Model Selection for Text Generation ( http://arxiv.org/abs/2402.07891v2 ) ライセンス: Link先を確認	Shir Ashury-Tahan, Ariel Gera, Benjamin Sznajder, Leshem Choshen, Liat Ein-Dor, Eyal Shnarch,	(参考訳) 与えられた対象タスクに対するモデル選択は、異なるモデルの出力の品質に関する広範なアノテーションを必要とするため、コストがかかる可能性がある。 DiffUseは、選好アノテーションに基づく候補テキスト生成モデル間の情報決定を効果的に行う方法である。 DiffUseは必要なアノテーション量を削減し、評価を行う上で貴重な時間とリソースを節約します。 DiffUseは、モデル出力間のセマンティックな差異を表す埋め込みをクラスタリングすることで、インテリジェントにインスタンスを選択する。したがって、選好決定に対してより有益な例のサブセットを特定できる。提案手法はモデルに依存しず,任意のテキスト生成モデルに適用し,モデル,プロンプト,構成を選択する。さらに,アノテートするインスタンス数を動的に決定する実用的な反復手法を提案する。何百ものモデルペアに対する一連の実験では、高い評価信頼性を維持しながら、DiffUseが要求されるアノテーションの数を最大75%削減できることを示した。 Model selection for a given target task can be costly, as it may entail extensive annotation of the quality of outputs of different models. We introduce DiffUse, an efficient method to make an informed decision between candidate text generation models based on preference annotations. DiffUse reduces the required amount of annotations, thus saving valuable time and resources in performing evaluation. DiffUse intelligently selects instances by clustering embeddings that represent the semantic differences between model outputs. Thus, it is able to identify a subset of examples that are more informative for preference decisions. Our method is model-agnostic, and can be applied to any text generation model for selecting between models, prompts and configurations. Moreover, we propose a practical iterative approach for dynamically determining how many instances to annotate. In a series of experiments over hundreds of model pairs, we demonstrate that DiffUse can dramatically reduce the required number of annotations -- by up to 75% -- while maintaining high evaluation reliability.	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
# PreFLMR: 微細粒遅延反応型マルチモーダルリトリーバーのスケールアップ PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers ( http://arxiv.org/abs/2402.08327v2 ) ライセンス: Link先を確認	Weizhe Lin, Jingbiao Mei, Jinghong Chen, Bill Byrne,	(参考訳) LMM(Large Multimodal Models)は、自然言語や視覚的理解に優れるが、知識に基づく視覚質問回答(KB-VQA)のような、質問に対する回答を形作るために文書コレクションから関連する情報を検索するタスクによって、課題が解決される。 KB-VQAのための広範囲なトレーニングおよび評価フレームワークM2KRを提案する。 M2KRにはビジョンと言語タスクの集合が含まれており、汎用マルチモーダルレトリバーのトレーニングと評価のために、単一のベンチマークタスクに組み込まれています。我々はM2KRを用いて、KB-VQAに対する最近開発された細粒度ラテン・アクション・マルチモーダル・レトリバー(FLMR)アプローチの事前訓練版であるPreFLMRを開発した。また, 汎用マルチモーダルレトリバーの開発に有用なPreFLMRのスケーリング挙動について検討した。 Large Multimodal Models (LMMs) excel in natural language and visual understanding but are challenged by exacting tasks such as Knowledge-based Visual Question Answering (KB-VQA) which involve the retrieval of relevant information from document collections to use in shaping answers to questions. We present an extensive training and evaluation framework, M2KR, for KB-VQA. M2KR contains a collection of vision and language tasks which we have incorporated into a single suite of benchmark tasks for training and evaluating general-purpose multi-modal retrievers. We use M2KR to develop PreFLMR, a pre-trained version of the recently developed Fine-grained Late-interaction Multi-modal Retriever (FLMR) approach to KB-VQA, and we report new state-of-the-art results across a range of tasks. We also present investigations into the scaling behaviors of PreFLMR intended to be useful in future developments in general-purpose multi-modal retrievers.	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
# 拡散モデルにおける報酬過最適化への対処:帰納バイアスとプライマシーバイアスの観点から Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases ( http://arxiv.org/abs/2402.08552v2 ) ライセンス: Link先を確認	Ziyi Zhang, Sen Zhang, Yibing Zhan, Yong Luo, Yonggang Wen, Dacheng Tao,	(参考訳) 拡散モデルと人間の嗜好のギャップを埋めることは、実際の生成ワークフローに統合するために重要である。下流の報酬モデルの最適化は有望なアライメント戦略として現れてきたが、学習された報酬モデルによる過度な最適化のリスクが懸念され、それによって根底的なパフォーマンスが損なわれる可能性がある。本研究では,帰納バイアスとプライマシーバイアスの両方のレンズによる拡散モデルアライメントにおける報酬過最適化問題に直面する。まず,拡散モデルの多段階デノイジング過程に固有の時間的帰納バイアスと現在の手法のミスマッチを,報酬過最適化の潜在的源として同定する。そして、我々の批評家モデルにおける休眠ニューロンが報酬過最適化に対する正則化として機能し、アクティブニューロンはプライマシーバイアスを反映していることが驚くほどわかりました。これらの観測から得られた時間拡散政策最適化(TDPO-R)を提案する。これは、拡散モデルの時間的帰納バイアスを利用して、活動ニューロンから生じるプライマシーバイアスを緩和するポリシー勾配アルゴリズムである。実験の結果,報酬過最適化を緩和する手法が有効であることが示された。コードはhttps://github.com/ZiyiZhang27/tdpoで利用可能である。 Bridging the gap between diffusion models and human preferences is crucial for their integration into practical generative workflows. While optimizing downstream reward models has emerged as a promising alignment strategy, concerns arise regarding the risk of excessive optimization with learned reward models, which potentially compromises ground-truth performance. In this work, we confront the reward overoptimization problem in diffusion model alignment through the lenses of both inductive and primacy biases. We first identify a mismatch between current methods and the temporal inductive bias inherent in the multi-step denoising process of diffusion models, as a potential source of reward overoptimization. Then, we surprisingly discover that dormant neurons in our critic model act as a regularization against reward overoptimization while active neurons reflect primacy bias. Motivated by these observations, we propose Temporal Diffusion Policy Optimization with critic active neuron Reset (TDPO-R), a policy gradient algorithm that exploits the temporal inductive bias of diffusion models and mitigates the primacy bias stemming from active neurons. 
Empirical results demonstrate the superior efficacy of our methods in mitigating reward overoptimization. Code is available at https://github.com/ZiyiZhang27/tdpo.	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
# ハイブリッド逆強化学習 Hybrid Inverse Reinforcement Learning ( http://arxiv.org/abs/2402.08848v2 ) ライセンス: Link先を確認	Juntao Ren, Gokul Swamy, Zhiwei Steven Wu, J. Andrew Bagnell, Sanjiban Choudhury,	(参考訳) 逆強化学習による模倣学習は、二重刃の剣である。一方、少数の専門家によるデモから学ぶことは、行動的クローニングアプローチよりも、エラーの複雑化に対して堅牢性が高い。一方,学習者は計算コストのかかる強化学習(RL)問題を繰り返し解く必要がある。多くの場合、この計算の多くは専門家と非常に異なるポリシーを検索するのに費やされている。本研究では,オンラインデータとエキスパートデータの混在をトレーニングするハイブリッドRLを用いて,不要な探索を抑えることを提案する。直感的には、専門家データは学習者がトレーニング中に良い状態に焦点を合わせ、強力なポリシーを計算するのに必要な探索の量を削減します。特に、そのようなアプローチでは学習者を環境内の任意の状態にリセットする必要がない。より正式には、逆RLから専門家競合RL(グローバル最適RLではなく)への還元により、IRLアプローチの利点を維持しつつ、内部ポリシー探索ループ間の相互作用を劇的に低減できる。これにより、強力なポリシー性能を保証するモデルフリーとモデルベースハイブリッド逆RLアルゴリズムの両方を導出できる。実験によって、我々のアプローチは、標準的な逆RLや連続制御タスクのスイート上のいくつかのベースラインよりもはるかにサンプル効率が高いことが判明した。 The inverse reinforcement learning approach to imitation learning is a double-edged sword. On the one hand, it can enable learning from a smaller number of expert demonstrations with more robustness to error compounding than behavioral cloning approaches. On the other hand, it requires that the learner repeatedly solve a computationally expensive reinforcement learning (RL) problem. Often, much of this computation is wasted searching over policies very dissimilar to the expert's. In this work, we propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration. Intuitively, the expert data focuses the learner on good states during training, which reduces the amount of exploration required to compute a strong policy. Notably, such an approach doesn't need the ability to reset the learner to arbitrary states in the environment, a requirement of prior work in efficient inverse RL. More formally, we derive a reduction from inverse RL to expert-competitive RL (rather than globally optimal RL) that allows us to dramatically reduce interaction during the inner policy search loop while maintaining the benefits of the IRL approach. 
This allows us to derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees. Empirically, we find that our approaches are significantly more sample efficient than standard inverse RL and several other baselines on a suite of continuous control tasks.	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
# DeepPolar: ディープラーニングによる非線形大カーネル極性コードの作成 DeepPolar: Inventing Nonlinear Large-Kernel Polar Codes via Deep Learning ( http://arxiv.org/abs/2402.08864v2 ) ライセンス: Link先を確認	S Ashwin Hebbar, Sravan Kumar Ankireddy, Hyeji Kim, Sewoong Oh, Pramod Viswanath,	(参考訳) チャネル符号の設計の進歩は人間の創造性によって推進され、適切には散発的である。極符号は、アリカンの分極カーネルの基盤として開発され、符号理論の最新のブレークスルーであり、短距離から中距離のブロック長系のための最先端の誤り訂正符号として登場した。優れたチャネル符号の発明を自動化するため、特にこの体制において、我々は、DeepPolar符号と呼ばれる極性符号の新しい非線形一般化を探求する。 DeepPolarコードは、カーネルサイズを大きくし、これらのカーネルをパラメータ化し、ニューラルネットワークを介してデコーダにマッチさせることで、従来のPolarコーディングフレームワークを拡張している。以上の結果から,これらのデータ駆動型コードは,既存のニューラルコードと従来のポラコードの両方と比較して,カーネルサイズが大きくなるというメリットを効果的に活用できることが示唆された。 Progress in designing channel codes has been driven by human ingenuity and, fittingly, has been sporadic. Polar codes, developed on the foundation of Arikan's polarization kernel, represent the latest breakthrough in coding theory and have emerged as the state-of-the-art error-correction code for short-to-medium block length regimes. In an effort to automate the invention of good channel codes, especially in this regime, we explore a novel, non-linear generalization of Polar codes, which we call DeepPolar codes. DeepPolar codes extend the conventional Polar coding framework by utilizing a larger kernel size and parameterizing these kernels and matched decoders through neural networks. Our results demonstrate that these data-driven codes effectively leverage the benefits of a larger kernel size, resulting in enhanced reliability when compared to both existing neural codes and conventional Polar codes.	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
# モデル編集による蝶効果:大言語モデルの崩壊をトリガーできる編集は少ない The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse ( http://arxiv.org/abs/2402.09656v4 ) ライセンス: Link先を確認	Wanli Yang, Fei Sun, Xinyu Ma, Xun Liu, Dawei Yin, Xueqi Cheng,	(参考訳) モデル編集は、Large Language Models (LLMs) における知識の改訂において有望であるが、LLMの本質的な能力への影響はしばしば見過ごされている。一つの編集でもモデル崩壊を引き起こし、様々なベンチマークタスクで大幅なパフォーマンス低下を示す。しかし、このような崩壊を防ぐために各編集後のLLMのベンチマークは、過激な時間とリソース集約に費やされる。これを軽減するために, 編集モデルのパープレキシティの変化が下流タスクのパフォーマンスと強く相関していることを示す広範な実験により, 代理指標としてパープレキシティを用いる方法を提案する。さらに,従来の単一編集研究の難題に焦点をあて,様々な編集手法やLLMをまたいだ実世界のシナリオの実践的設定であるシーケンシャル編集について,詳細な研究を行う。その結果, ほぼすべての編集手法が, ごくわずかの編集後, モデル崩壊を招いたことが示唆された。さらなる研究を容易にするため,我々はGPT-3.5を用いて,これらのハードケースに基づいた新しいデータセットであるHardEditを開発した。このデータセットは、信頼性のあるモデル編集の研究の先駆的な基盤と、編集によるモデル崩壊の基礎となるメカニズムを確立することを目的としている。この作業が、モデル編集プラクティスに固有の潜在的なリスクに、コミュニティの注意を引き付けることを願っています。 Although model editing has shown promise in revising knowledge in Large Language Models (LLMs), its impact on the inherent capabilities of LLMs is often overlooked. In this work, we reveal a critical phenomenon: even a single edit can trigger model collapse, manifesting as significant performance degradation in various benchmark tasks. However, benchmarking LLMs after each edit, while necessary to prevent such collapses, is impractically time-consuming and resource-intensive. To mitigate this, we propose using perplexity as a surrogate metric, validated by extensive experiments demonstrating that changes in an edited model's perplexity are strongly correlated with its downstream task performance. We further conduct an in-depth study on sequential editing, a practical setting for real-world scenarios, across various editing methods and LLMs, focusing on hard cases from our previous single edit studies. The results indicate that nearly all examined editing methods result in model collapse after only a few edits. 
To facilitate further research, we have utilized GPT-3.5 to develop a new dataset, HardEdit, based on those hard cases. This dataset aims to establish the foundation for pioneering research in reliable model editing and the mechanisms underlying editing-induced model collapse. We hope this work can draw the community's attention to the potential risks inherent in model editing practices.	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
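As a rough illustration of the surrogate metric described in the entry above, the sketch below computes perplexity from per-token natural-log probabilities and compares it before and after an edit. The function names and the idea of reporting a simple ratio are assumptions made for illustration; the entry does not specify how the authors compute or threshold perplexity.

```python
import math


def perplexity(token_log_probs):
    """Perplexity: exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def perplexity_ratio(log_probs_before, log_probs_after):
    """How many times larger perplexity is after an edit than before it."""
    return perplexity(log_probs_after) / perplexity(log_probs_before)
```

A ratio far above 1 on held-out text would be a cheap signal that an edit may have damaged the model and that a full benchmark run is worth the cost.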
# LLMにおける少数ショットのモビリティ生成を引き出すChain-of-Planned-Behaviourワークフロー Chain-of-Planned-Behaviour Workflow Elicits Few-Shot Mobility Generation in LLMs ( http://arxiv.org/abs/2402.09836v2 ) ライセンス: Link先を確認	Chenyang Shao, Fengli Xu, Bingbing Fan, Jingtao Ding, Yuan Yuan, Meng Wang, Yong Li,	(参考訳) 大規模言語モデル(LLM)の強力な推論能力は多くの分野に革命的変化をもたらしたが、人間の行動生成におけるその性能はまだ広く研究されていない。行動意図を管理する内部プロセスは抽象的推論によってのみ説明できないため、このギャップが生じる可能性が高い。代わりに、社会的規範や個人の嗜好など、さまざまな要因の影響も受けている。 The Theory of Planned Behaviour (TPB)にインスパイアされた我々は、人間の活動の重要な時空間的ダイナミクスを反映した移動行動生成のためのLLMワークフローであるChain-of-Planned Behaviour (CoPB)を開発した。姿勢,主観的規範,認知行動制御の認知的構造を活用することで,CoPBは次の動きの意図を推論するLLMの能力を大幅に向上させた。特に、CoPBは移動意図発生の誤り率を57.8%から19.4%に大幅に下げている。提案する CoPB ワークフローのスケーラビリティを向上させるため,LLM と力学モデルの相乗効果について検討する。重力モデルのようなメカニスティックモビリティモデルは、運動意図を物理的モビリティの振る舞いに効果的にマッピングできる。 CoPBと重力モデルを統合する戦略はトークンのコストを97.7%削減し、同時に性能を向上させる。さらに,提案した CoPB ワークフローは GPT-4-turbo を容易にして,移動行動推論のための高品質なラベルを自動的に生成することができる。これらのラベルは、小規模でオープンソースのLLaMA 3-8Bの微調整に利用でき、生成した振る舞いの品質を犠牲にすることなく、使用コストを大幅に削減できることを示す。 The powerful reasoning capabilities of large language models (LLMs) have brought revolutionary changes to many fields, but their performance in human behaviour generation has not yet been extensively explored. This gap likely emerges because the internal processes governing behavioral intentions cannot be solely explained by abstract reasoning. Instead, they are also influenced by a multitude of factors, including social norms and personal preference. Inspired by the Theory of Planned Behaviour (TPB), we develop an LLM workflow named Chain-of-Planned Behaviour (CoPB) for mobility behaviour generation, which reflects the important spatio-temporal dynamics of human activities. Through exploiting the cognitive structures of attitude, subjective norms, and perceived behaviour control in TPB, CoPB significantly enhances the ability of LLMs to reason about the intention behind the next movement. 
Specifically, CoPB substantially reduces the error rate of mobility intention generation from 57.8% to 19.4%. To improve the scalability of the proposed CoPB workflow, we further explore the synergy between LLMs and mechanistic models. We find mechanistic mobility models, such as the gravity model, can effectively map mobility intentions to physical mobility behaviours. The strategy of integrating CoPB with the gravity model can reduce the token cost by 97.7% and achieve better performance simultaneously. Besides, the proposed CoPB workflow can facilitate GPT-4-turbo to automatically generate high-quality labels for mobility behavior reasoning. We show such labels can be leveraged to fine-tune the smaller-scale, open-source LLaMA 3-8B, which significantly reduces usage costs without sacrificing the quality of the generated behaviours.	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
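To make the role of a gravity model in the entry above concrete, here is a minimal sketch of the classic form, in which the chance of travelling to a destination grows with its "mass" (for example, popularity) and shrinks with distance raised to a power. The function name, the example venues, and the exponent `beta` are hypothetical; the entry does not state which variant or parameters the authors use.

```python
def gravity_choice_probabilities(masses, distances, beta=2.0):
    """Probability of each destination, proportional to mass / distance**beta.

    masses:    dict mapping destination name to a positive attractiveness value
    distances: dict mapping destination name to its distance from the origin
    """
    weights = {}
    for place, mass in masses.items():
        dist = distances[place]
        if dist <= 0:
            raise ValueError("distances must be positive")
        weights[place] = mass / dist ** beta
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one destination needs positive mass")
    return {place: w / total for place, w in weights.items()}


# Hypothetical use: an intention such as "eat out" has already been chosen,
# and the gravity model picks among candidate venues for that intention.
probs = gravity_choice_probabilities(
    {"cafe": 100.0, "diner": 100.0}, {"cafe": 1.0, "diner": 2.0}
)
```

Separating the intention step from the location step in this way is one plausible reading of how a cheap mechanistic model could replace many LLM calls.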
# 教師なし翻訳のための自己強化型インコンテキスト学習 Self-Augmented In-Context Learning for Unsupervised Word Translation ( http://arxiv.org/abs/2402.10024v2 ) ライセンス: Link先を確認	Yaoyiran Li, Anna Korhonen, Ivan Vulić,	(参考訳) 最近の研究によると、大規模言語モデル(LLM)は、強力な単語翻訳やバイリンガル語彙誘導(BLI)機能を数ショットで示すが、特に低リソース言語では、シード翻訳ペアが利用できないような教師なしシナリオにおいて、従来のマッピングベースのアプローチのパフォーマンスと一致しない。この課題に LLM で対処するため,非教師付き BLI のための自己拡張型インコンテキスト学習 (SAIL) を提案する。ゼロショットプロンプトから始まる SAIL は LLM から高信頼語訳ペアを反復的に誘導し,ICL 方式で同じ LLM に再適用する。提案手法は,広範囲の言語ペアにまたがる2つの確立されたBLIベンチマークにおいて,LLMのゼロショットプロンプトよりも大幅に向上し,また,ボード全体のマッピングベースラインよりも優れていた。最先端の非教師付きBLIの性能を達成することに加えて,SAILに関する包括的な分析を行い,その限界について議論する。 Recent work has shown that, while large language models (LLMs) demonstrate strong word translation or bilingual lexicon induction (BLI) capabilities in few-shot setups, they still cannot match the performance of 'traditional' mapping-based approaches in the unsupervised scenario where no seed translation pairs are available, especially for lower-resource languages. To address this challenge with LLMs, we propose self-augmented in-context learning (SAIL) for unsupervised BLI: starting from a zero-shot prompt, SAIL iteratively induces a set of high-confidence word translation pairs for in-context learning (ICL) from an LLM, which it then reapplies to the same LLM in the ICL fashion. Our method shows substantial gains over zero-shot prompting of LLMs on two established BLI benchmarks spanning a wide range of language pairs, also outperforming mapping-based baselines across the board. In addition to achieving state-of-the-art unsupervised BLI performance, we also conduct comprehensive analyses on SAIL and discuss its limitations.	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
# 機械学習回帰タスクの校正統計の信頼性に対する裾の重い不確実性分布と誤差分布の負の影響 Negative impact of heavy-tailed uncertainty and error distributions on the reliability of calibration statistics for machine learning regression tasks ( http://arxiv.org/abs/2402.10043v4 ) ライセンス: Link先を確認	Pascal Pernot,	(参考訳) 機械学習回帰タスクにおける(分散に基づく)予測不確実性の平均キャリブレーションは、2つの方法で検証できる。1つは平均二乗誤差(MSE)と平均分散(MV)の差としてキャリブレーション誤差(CE)を推定することであり、もう1つは平均二乗zスコア(ZMS)を1と比較することである。問題は、最近の機械学習不確実性定量化(ML-UQ)文献のデータセット群について本研究で示すように、両アプローチが異なる結論につながり得ることである。 ML-UQデータセットに頻繁に見られる裾の重い不確実性分布と誤差分布に対しては、MV、MSE、およびそれらの信頼区間の推定が信頼性に欠けることが示されている。対照的に、ZMS統計は感度が低く、この文脈でもっとも信頼性の高いアプローチを提供する。残念なことに、同じ問題は、広く使われているENCEのような条件付きキャリブレーション統計や、同様の統計に基づくポストホックキャリブレーション手法にも影響すると予想される。概説された問題を回避するためのいくつかの解決策が提案されている。 Average calibration of the (variance-based) prediction uncertainties of machine learning regression tasks can be tested in two ways: one is to estimate the calibration error (CE) as the difference between the mean squared error (MSE) and the mean variance (MV); the alternative is to compare the mean squared z-scores (ZMS) to 1. The problem is that both approaches might lead to different conclusions, as illustrated in this study for an ensemble of datasets from the recent machine learning uncertainty quantification (ML-UQ) literature. It is shown that the estimation of MV, MSE and their confidence intervals becomes unreliable for heavy-tailed uncertainty and error distributions, which seems to be a frequent feature of ML-UQ datasets. By contrast, the ZMS statistic is less sensitive and offers the most reliable approach in this context. Unfortunately, the same problem is expected to also affect conditional calibration statistics, such as the popular ENCE, and very likely post-hoc calibration methods based on similar statistics. Several solutions to circumvent the outlined problems are proposed.	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
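The two average-calibration tests named in the abstract above can be written out directly. This is a minimal sketch assuming the uncertainties are predicted standard deviations, so that the mean variance is the mean of their squares. The function name is hypothetical, and the sketch does not cover the confidence intervals the paper discusses.

```python
from statistics import fmean

def calibration_stats(errors, uncertainties):
    """Return (CE, ZMS) for prediction errors and predicted standard deviations.

    CE  = MSE - MV, which should be close to 0 for a calibrated model.
    ZMS = mean of squared z-scores, which should be close to 1.
    """
    mse = fmean([e * e for e in errors])            # mean squared error
    mv = fmean([u * u for u in uncertainties])      # mean predicted variance
    zms = fmean([(e / u) ** 2 for e, u in zip(errors, uncertainties)])
    return mse - mv, zms

# Toy data in which every error has exactly the predicted magnitude.
ce, zms = calibration_stats([1.0, -1.0, 2.0, -2.0], [1.0, 1.0, 2.0, 2.0])
```

Because CE averages raw squared quantities, a few very large errors or uncertainties can dominate it, whereas ZMS divides each error by its own uncertainty first. This is consistent with the abstract's remark that ZMS is the less sensitive statistic for heavy-tailed data.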
# 機械アンラーニングによるより安全な大規模言語モデルに向けて Towards Safer Large Language Models through Machine Unlearning ( http://arxiv.org/abs/2402.10058v2 ) ライセンス: Link先を確認	Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, Meng Jiang,	(参考訳) LLM(Large Language Models)の急速な進歩は、その膨大な事前学習知識と例外的な一般化性によって、様々な領域にまたがる大きな可能性を実証している。しかし、LLMは問題のあるプロンプトに直面すると有害なコンテンツを生成してしまうことが多い。この問題に対処するため、既存の研究はLLMが有害な出力を生成しないように勾配上昇に基づくアプローチを導入しようとした。これらの手法は有効でありうるが、通常のプロンプトに応答する際のモデルの有用性をしばしば損なう。このギャップに対処するために、我々は、通常のプロンプトでの有用性を維持しながら有害な知識を排除するように設計された、LLMのための新しいアンラーニングフレームワークである選択的知識否定アンラーニング(Selective Knowledge negation Unlearning, SKU)を紹介する。具体的には、SKUは有害な知識獲得段階と知識否定段階の2段階からなる。第1段階は、モデル内の有害な知識を特定し、取得することを目的としており、第2段階は、この知識を取り除くことを目的としている。 SKUはモデルパラメータの有害な知識を選択的に分離し除去し、モデルの性能が通常のプロンプトに対して堅牢であることを保証する。各種LLMアーキテクチャを用いて実施した実験により、有害情報の除去と有用性の維持との間の良好なバランス点をSKUが見出すことが示された。 The rapid advancement of Large Language Models (LLMs) has demonstrated their vast potential across various domains, attributed to their extensive pretraining knowledge and exceptional generalizability. However, LLMs often generate harmful content when faced with problematic prompts. To address this problem, existing work attempted to implement a gradient ascent based approach to prevent LLMs from producing harmful output. While these methods can be effective, they frequently impact the model utility in responding to normal prompts. To address this gap, we introduce Selective Knowledge negation Unlearning (SKU), a novel unlearning framework for LLMs, designed to eliminate harmful knowledge while preserving utility on normal prompts. Specifically, SKU consists of two stages: a harmful knowledge acquisition stage and a knowledge negation stage. The first stage aims to identify and acquire harmful knowledge within the model, whereas the second is dedicated to removing this knowledge. SKU selectively isolates and removes harmful knowledge in model parameters, ensuring the model's performance remains robust on normal prompts. Our experiments conducted across various LLM architectures demonstrate that SKU identifies a good balance point between removing harmful information and preserving utility.	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
# Rewards-in-Context:動的優先度調整による基礎モデルの多目的アライメント Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment ( http://arxiv.org/abs/2402.10207v5 ) ライセンス: Link先を確認	Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen,	(参考訳) 我々は,人選好による基礎モデルの多目的アライメントの問題を考える。しかし、一般に、強化学習(RL)を用いた大規模基礎モデルの構築にはコストがかかり不安定であり、多次元性、不均一性、そして人間の嗜好の相反する性質は、アライメントプロセスをさらに複雑にする。本稿では,リワード・イン・コンテキスト(Rewards-in-Context,RiC)について紹介する。 RiCの優れた特徴は単純さと適応性であり、単一のファンデーションモデルの教師付き微調整しか必要とせず、推論時間中にユーザの好みを動的に調整できる。抽象凸最適化問題の解析解にインスパイアされた我々の動的推論時間調整法は、複数の目的に対してパレート最適解にアプローチする。実験的な証拠は,多目的RLベースラインと比較して,多言語モデル (LLM) と拡散モデルの両方が,約10%のGPU時間で報奨に適合することを示す。 We consider the problem of multi-objective alignment of foundation models with human preferences, which is a critical step towards helpful and harmless AI systems. However, it is generally costly and unstable to fine-tune large foundation models using reinforcement learning (RL), and the multi-dimensionality, heterogeneity, and conflicting nature of human preferences further complicate the alignment process. In this paper, we introduce Rewards-in-Context (RiC), which conditions the response of a foundation model on multiple rewards in its prompt context and applies supervised fine-tuning for alignment. The salient features of RiC are simplicity and adaptivity, as it only requires supervised fine-tuning of a single foundation model and supports dynamic adjustment for user preferences during inference time. Inspired by the analytical solution of an abstracted convex optimization problem, our dynamic inference-time adjustment method approaches the Pareto-optimal solution for multiple objectives. Empirical evidence demonstrates the efficacy of our method in aligning both Large Language Models (LLMs) and diffusion models to accommodate diverse rewards with only around 10% GPU hours compared with multi-objective RL baseline.	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
# サンプル効率の良いRLHFの能動選好最適化 Active Preference Optimization for Sample Efficient RLHF ( http://arxiv.org/abs/2402.10500v2 ) ライセンス: Link先を確認	Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury,	(参考訳) RLHF(Reinforcement Learning from Human Feedback)は、大規模言語モデル(LLM)を人間の選好に整合させる上で重要である。アライメントされた生成モデルは様々なタスクにおいて顕著な能力を示してきたが、高品質な人間の選好データへの依存は、RLHFの実践的応用においてコストのかかるボトルネックを生み出している。主な理由の1つは、現在の手法が、人間のフィードバックを集めるために、プロンプトと生成のデータセットからプロンプト・生成ペアを一様に選ぶことに依存しており、その結果、限られた予算の下ではアライメントが準最適となることである。これは、効率的なアライメントにおける適応的戦略の重要性を浮き彫りにする。最近の研究[Mehta et al., 2023, Muldrew et al., 2024]は、生成の不確実性に基づく様々なヒューリスティックを設計することによってこの問題に対処しようとしている。しかし、[Mehta et al., 2023]の仮定は制限的であり、[Muldrew et al., 2024]は厳密な理論的保証を提供していない。これらの問題に対処するために、RLHFを文脈付き選好バンディットの枠組みで再定式化し、プロンプトを文脈として扱い、最も重要なサンプルから選好データをクエリすることでモデルアライメントを向上させ、少ないサンプル予算で優れた性能を達成するアクティブラーニングアルゴリズム $\textit{Active Preference Optimization}$ ($\texttt{APO}$) を開発する。我々は、BTL選好モデルの下で$\texttt{APO}$の理論的性能保証を分析し、予算$T$に対して、$\texttt{APO}$で学習したポリシーの準最適性ギャップが$O(1/\sqrt{T})$でスケールすることを示す。また、プロンプトをランダムに選んで選好データを収集すると、一定の準最適性を被るポリシーにつながることを示す。我々は、実用的な選好データセットに関する詳細な実験的評価を行い、既存の手法に対する$\texttt{APO}$の有効性を検証し、コスト効率が高くスケーラブルな方法でアライメントを行うための、サンプル効率の良い実用的なソリューションとして確立する。 Reinforcement Learning from Human Feedback (RLHF) is pivotal in aligning Large Language Models (LLMs) with human preferences. Although aligned generative models have shown remarkable abilities in various tasks, their reliance on high-quality human preference data creates a costly bottleneck in the practical application of RLHF. One primary reason is that current methods rely on uniformly picking prompt-generation pairs from a dataset of prompt-generations, to collect human feedback, resulting in sub-optimal alignment under a constrained budget, which highlights the criticality of adaptive strategies in efficient alignment. Recent works [Mehta et al., 2023, Muldrew et al., 2024] have tried to address this problem by designing various heuristics based on generation uncertainty. 
However, either the assumptions in [Mehta et al., 2023] are restrictive, or [Muldrew et al., 2024] do not provide any rigorous theoretical guarantee. To address these, we reformulate RLHF within contextual preference bandit framework, treating prompts as contexts, and develop an active-learning algorithm, $\textit{Active Preference Optimization}$ ($\texttt{APO}$), which enhances model alignment by querying preference data from the most important samples, achieving superior performance for small sample budget. We analyze the theoretical performance guarantees of $\texttt{APO}$ under the BTL preference model showing that the suboptimality gap of the policy learned via $\texttt{APO}$ scales as $O(1/\sqrt{T})$ for a budget of $T$. We also show that collecting preference data by choosing prompts randomly leads to a policy that suffers a constant sub-optimality. We perform detailed experimental evaluations on practical preference datasets to validate $\texttt{APO}$'s efficacy over the existing methods, establishing it as a sample-efficient and practical solution of alignment in a cost-effective and scalable manner.	翻訳日:2024-06-07 01:21:50 公開日:2024-06-05
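Two quantities in the abstract above can be made concrete: the BTL preference model and the $O(1/\sqrt{T})$ scaling of the suboptimality gap. The sketch below assumes the standard Bradley-Terry-Luce form, in which the preference probability is a logistic function of a reward difference. The function names and the `constant` factor are hypothetical, and this is not the $\texttt{APO}$ sample-selection rule, which the abstract does not specify.

```python
import math

def btl_preference_prob(reward_a, reward_b):
    """P(response A is preferred over B) under the BTL model."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))

def gap_bound(budget, constant=1.0):
    """Illustrative O(1/sqrt(T)) bound; `constant` is an unknown problem-dependent factor."""
    return constant / math.sqrt(budget)

# Equal rewards give an even preference; a larger reward gap gives a more certain label.
even = btl_preference_prob(1.0, 1.0)
confident = btl_preference_prob(3.0, 0.0)
```

Under this scaling, quadrupling the labelling budget only halves the bound, which is why the choice of which pairs to label matters when the budget is small.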
# 学習可能なカーネル関数を持つ線形変換器は文脈内モデルより優れている Linear Transformers with Learnable Kernel Functions are Better In-Context Models ( http://arxiv.org/abs/2402.10644v2 ) ライセンス: Link先を確認	Yaroslav Aksenov, Nikita Balagansky, Sofia Maria Lo Cicero Vaina, Boris Shaposhnikov, Alexey Gorbatovski, Daniil Gavrilov,	(参考訳) 言語モデル(LM)のサブクワッドアーキテクチャのフロンティアの整備は、自然言語処理の急速に発展する分野において不可欠である。 State Space Modelsを含む現在のイノベーションは、言語モデリングタスクにおけるTransformerのパフォーマンスを上回るものとして、当初は祝われていた。しかし、これらのモデルは、トランスフォーマーが伝統的に輝く領域である、本質的なインコンテキスト学習能力の欠如を明らかにしている。ベースモデルはハイブリッドソリューションとして登場し、畳み込みネットワークによって強化された指数関数のテイラー展開にインスパイアされたリニアトランスフォーマーとカーネルを融合した。トランスフォーマーの文脈内適応性を反映して、この分野では強力な競争相手となった。本研究では,Pileデータセットに示すように,マルチクエリ・アソシエイト・リコールタスクと言語モデリングプロセスを用いて評価されたインコンテキスト学習能力を増幅する,独特でエレガントな変更をベースカーネルに提示する。 Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing. Current innovations, including State Space Models, were initially celebrated for surpassing Transformer performance on language modeling tasks. However, these models have revealed deficiencies in essential In-Context Learning capabilities - a domain where the Transformer traditionally shines. The Based model emerged as a hybrid solution, blending a Linear Transformer with a kernel inspired by the Taylor expansion of exponential functions, augmented by convolutional networks. Mirroring the Transformer's in-context adeptness, it became a strong contender in the field. In our work, we present a singular, elegant alteration to the Based kernel that amplifies its In-Context Learning abilities evaluated with the Multi-Query Associative Recall task and overall language modeling process, as demonstrated on the Pile dataset.	翻訳日:2024-06-07 01:11:46 公開日:2024-06-05
# WilKE: 生涯の知識編集のためのWise-Layerナレッジエディタ WilKE: Wise-Layer Knowledge Editor for Lifelong Knowledge Editing ( http://arxiv.org/abs/2402.10987v2 ) ライセンス: Link先を確認	Chenhui Hu, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao,	(参考訳) 知識編集は、時代遅れの知識や誤った知識に対して、コストのかかる再学習を行うことなく、大規模言語モデル(LLM)の不正確さを修正することを目的としている。しかし、現在の知識編集法は主に単一編集に重点を置いており、生涯編集の要件を満たしていない。本研究は, 毒性蓄積と毒性フラッシュを特徴とする生涯編集において, 知識編集によって生じる性能劣化について明らかにし, 主な原因をパターンアンマッチと同定した。 Wise-Layer Knowledge Editor (WilKE) と呼ばれる知識編集手法を導入し,言語モデルの各層における編集知識のパターンマッチング度に基づいて編集層を選択する。実験結果は、生涯編集において、GPT2-XLとGPT-Jの編集で、最先端の知識編集法と比較して平均46.2%と67.8%の改善を示している。 Knowledge editing aims to rectify inaccuracies in large language models (LLMs) without costly retraining for outdated or erroneous knowledge. However, current knowledge editing methods primarily focus on single editing, failing to meet the requirements for lifelong editing. This study reveals a performance degradation encountered by knowledge editing in lifelong editing, characterized by toxicity buildup and toxicity flash, with the primary cause identified as pattern unmatch. We introduce a knowledge editing approach named Wise-Layer Knowledge Editor (WilKE), which selects the editing layer based on the pattern matching degree of editing knowledge across different layers in language models. Experimental results demonstrate that, in lifelong editing, WilKE exhibits an average improvement of 46.2% and 67.8% on editing GPT2-XL and GPT-J relative to state-of-the-art knowledge editing methods.	翻訳日:2024-06-07 01:11:46 公開日:2024-06-05
# Black-Box Probabilistic Certification による説明のための信頼領域 Trust Regions for Explanations via Black-Box Probabilistic Certification ( http://arxiv.org/abs/2402.11168v3 ) ライセンス: Link先を確認	Amit Dhurandhar, Swagatam Haldar, Dennis Wei, Karthikeyan Natesan Ramamurthy,	(参考訳) 機械学習モデルのブラックボックスの性質を考えると、個々の決定の背後にある要因を解読するために、多くの説明可能性法が開発されている。本稿では,ブラックボックス(確率的)説明証明の新たな問題を紹介する。クエリアクセスのみを持つブラックボックスモデル、ある例に対する説明、および品質指標(忠実度や安定性など)が与えられたとき、その例を中心とする最大のハイパーキューブ(すなわち $\ell_{\infty}$ ball)であって、ハイパーキューブ内のすべての例に説明を適用した場合に(高い確率で)品質基準が満たされる(例えば忠実度がある値よりも大きい)ものを見つけることができるか? そのような\emph{trust region}(信頼領域)を効率的に見つけることができれば、複数の利点がある。 i) \emph{領域}におけるモデルの振る舞いに関する\emph{保証}付きの洞察、 ii) 説明の\emph{安定性}の確認、 iii) すべての例について説明を求める必要がなくなり、時間、エネルギー、費用を節約できる\emph{説明の再利用}、 iv) 説明手法を比較するための\emph{メタ指標}としての可能性。私たちの貢献には、この問題の形式化、ソリューションの提案、計算可能なこれらのソリューションに対する理論的保証の提供、合成および実データに対するそれらの有効性を実験的に示すことが含まれる。 Given the black box nature of machine learning models, a plethora of explainability methods have been developed to decipher the factors behind individual decisions. In this paper, we introduce a novel problem of black box (probabilistic) explanation certification. We ask the question: Given a black box model with only query access, an explanation for an example and a quality metric (viz. fidelity, stability), can we find the largest hypercube (i.e., $\ell_{\infty}$ ball) centered at the example such that when the explanation is applied to all examples within the hypercube, (with high probability) a quality criterion is met (viz. fidelity greater than some value)? Being able to efficiently find such a \emph{trust region} has multiple benefits: i) insight into model behavior in a \emph{region}, with a \emph{guarantee}; ii) ascertained \emph{stability} of the explanation; iii) \emph{explanation reuse}, which can save time, energy and money by not having to find explanations for every example; and iv) a possible \emph{meta-metric} to compare explanation methods. 
Our contributions include formalizing this problem, proposing solutions, providing theoretical guarantees for these solutions that are computable, and experimentally showing their efficacy on synthetic and real data.	翻訳日:2024-06-07 01:11:46 公開日:2024-06-05
# FactPICO:医学的エビデンスの平易な言語要約のための事実性評価 FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence ( http://arxiv.org/abs/2402.11456v2 ) ライセンス: Link先を確認	Sebastian Antony Joseph, Lily Chen, Jan Trienes, Hannah Louisa Göke, Monika Coers, Wei Xu, Byron C Wallace, Junyi Jessy Li,	(参考訳) LLMを用いた平易な言語要約は、技術的コンテンツのテキストアクセシビリティを向上させるのに有用である。しかし、医学のようなハイステークスな領域において、これらの要約はどの程度事実に即しているのだろうか? 本稿では、ランダム化比較試験(RCT)を記述した医学テキストの平易な言語要約のための事実性ベンチマークであるFactPICOを提示する。RCTはエビデンスに基づく医療の基盤であり、患者の治療に直接情報を与えうる。 FactPICOは、3つのLLM(GPT-4、Llama-2、Alpaca)から生成されたRCT抄録の345件の平易な言語要約と、専門家によるきめ細かい評価および自然言語による根拠からなる。これらの要約におけるRCTの重要要素、すなわち集団(Populations)、介入(Interventions)、比較対照(Comparators)、アウトカム(Outcomes)(PICO)と、それらに関して報告された知見の事実性を評価する。また、LLMが付加した追加情報(例:説明)の正確性も評価する。 FactPICOを用いて、LLMに基づいて新たに考案されたものを含む、既存の様々な事実性指標をベンチマークする。医学的エビデンスの平易な言語要約は、特に簡潔さと事実性のバランスをとる場合に依然として困難であり、既存の指標はインスタンスレベルでの専門家の判断との相関が低いことがわかった。 Plain language summarization with LLMs can be useful for improving textual accessibility of technical content. But how factual are these summaries in a high-stakes domain like medicine? This paper presents FactPICO, a factuality benchmark for plain language summarization of medical texts describing randomized controlled trials (RCTs), which are the basis of evidence-based medicine and can directly inform patient treatment. FactPICO consists of 345 plain language summaries of RCT abstracts generated from three LLMs (i.e., GPT-4, Llama-2, and Alpaca), with fine-grained evaluation and natural language rationales from experts. We assess the factuality of critical elements of RCTs in those summaries: Populations, Interventions, Comparators, Outcomes (PICO), as well as the reported findings concerning these. We also evaluate the correctness of the extra information (e.g., explanations) added by LLMs. Using FactPICO, we benchmark a range of existing factuality metrics, including the newly devised ones based on LLMs. We find that plain language summarization of medical evidence is still challenging, especially when balancing between simplicity and factuality, and that existing metrics correlate poorly with expert judgments on the instance level.	翻訳日:2024-06-07 01:11:46 公開日:2024-06-05
# 編集の学習:知識編集によるLLMの調整 Learning to Edit: Aligning LLMs with Knowledge Editing ( http://arxiv.org/abs/2402.11905v2 ) ライセンス: Link先を確認	Yuxin Jiang, Yufei Wang, Chuhan Wu, Wanjun Zhong, Xingshan Zeng, Jiahui Gao, Liangyou Li, Xin Jiang, Lifeng Shang, Ruiming Tang, Qun Liu, Wei Wang,	(参考訳) 大規模言語モデル(LLM)における知識のごく一部を、他の入力に悪影響を及ぼすことなく効率的に修正することを目的とした知識編集技術は、広く注目を集めている。しかし、既存の手法は主に更新された知識を記憶することに依存しており、LLMは質問に答える際に、新しい知識と固有の知識を効果的に組み合わせることを妨げる。そこで本研究では,LLMに「人間に魚を教える」という哲学に触発されて,知識を入力質問に適用する学習(LTE)フレームワークを提案する。 LTEには2段階のプロセスがあります。一顕微鏡外情報及び言語能力を維持しつつ、信頼性のある顕微鏡内編集を行うための微調整並列データセット上に微調整した調整段階 (II)リアルタイム・マス知識編集に検索に基づくメカニズムを用いた推論フェーズ。 4つの一般的な知識編集ベンチマークと2つのLLMアーキテクチャにまたがって、我々のアプローチを7つの高度なベースラインと比較することにより、LTEの知識編集性能、バッチおよびシーケンシャルな編集の堅牢性、一般的なタスクへの干渉の最小化、高速な編集速度を実証する。データとコードはhttps://github.com/YJiangcm/LTEで入手できる。 Knowledge editing techniques, aiming to efficiently modify a minor proportion of knowledge in large language models (LLMs) without negatively impacting performance across other inputs, have garnered widespread attention. However, existing methods predominantly rely on memorizing the updated knowledge, impeding LLMs from effectively combining the new knowledge with their inherent knowledge when answering questions. To this end, we propose a Learning to Edit (LTE) framework, focusing on teaching LLMs to apply updated knowledge into input questions, inspired by the philosophy of "Teach a man to fish." LTE features a two-phase process: (i) the Alignment Phase, which fine-tunes LLMs on a meticulously curated parallel dataset to make reliable, in-scope edits while preserving out-of-scope information and linguistic proficiency; and (ii) the Inference Phase, which employs a retrieval-based mechanism for real-time and mass knowledge editing. 
By comparing our approach with seven advanced baselines across four popular knowledge editing benchmarks and two LLM architectures, we demonstrate LTE's superiority in knowledge editing performance, robustness in both batch and sequential editing, minimal interference on general tasks, and rapid editing speeds. The data and code are available at https://github.com/YJiangcm/LTE.	翻訳日:2024-06-07 01:11:46 公開日:2024-06-05
# すべての言語モデルが大小 All Language Models Large and Small ( http://arxiv.org/abs/2402.12061v2 ) ライセンス: Link先を確認	Zhixun Chen, Yali Du, David Mguni,	(参考訳) 多くの主要な言語モデル(LM)は、訓練と実行の両方で高強度の計算資源を使用する。これは、デプロイメントのリソースコストを削減し、意思決定タスクの実行を高速化するという課題を引き起こします。本稿では,Language Optimising Network Distribution (LONDI) フレームワークという新しいLMフレームワークを紹介する。 LONDIは、低リソースのLMを使用する場合、複雑な意思決定と推論を必要とする場合にのみ、大きなLMを選択的に採用することを学ぶ。 LONDIは、2つの(オフ・オフ)ポリシーネットワーク、LM、大きなLM(LLM)と、スイッチング制御を使った強化学習モジュールで構成される。次に LLM コールの予算制約とリソース使用量を維持する LONDI の変種を導入する。理論的には、LONDIはシステム状態のサブセットを学習し、その課題を解決するのに必要なLLMを活性化する。次に、LONDIが最適解に収束すると同時に、LLMコールの予算制約をほぼ確実に保ちながら、計算コストを大幅に削減しつつ、様々なタスクを解決できることを証明した。我々は、ScienceWorldとBabyAI-TextのタスクでLONDIのパフォーマンスをテストし、LONDIはリソース集約型LLMでのみ解決可能なタスクを解き、GPU使用率を最大30%削減できることを示した。 Many leading language models (LMs) use high-intensity computational resources both during training and execution. This poses the challenge of lowering resource costs for deployment and faster execution of decision-making tasks among others. We introduce a novel plug-and-play LM framework named Language Optimising Network Distribution (LONDI) framework. LONDI learns to selectively employ large LMs only where complex decision-making and reasoning are required while using low-resource LMs (i.e. LMs require less GPU usage, but may not be able to solve the problem alone) everywhere else. LONDI consists of a system of two (off-)policy networks, an LM, a large LM (LLM), and a reinforcement learning module that uses switching controls to quickly learn which system states to call the LLM. We then introduce a variant of LONDI that maintains budget constraints on LLM calls and hence its resource usage. Theoretically, we prove LONDI learns the subset of system states to activate the LLM required to solve the task. We then prove that LONDI converges to optimal solutions while also preserving budgetary constraints on LLM calls almost surely enabling it to solve various tasks while significantly lowering computational costs. 
We test LONDI's performance in a range of tasks in ScienceWorld and BabyAI-Text and demonstrate that LONDI can solve tasks only solvable by resource-intensive LLMs while reducing GPU usage by up to 30%.	翻訳日:2024-06-07 01:11:46 公開日:2024-06-05
# NEO-BENCH: ネオロジズムを用いた大規模言語モデルのロバスト性評価 NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms ( http://arxiv.org/abs/2402.12261v3 ) ライセンス: Link先を確認	Jonathan Zheng, Alan Ritter, Wei Xu,	(参考訳) 大規模言語モデル(LLM)の性能は、モデルトレーニングに使用されるデータと推論中に見られる新しいテキストの間の時間的ドリフトによって低下する。データドリフトを引き起こす言語変化のうち、十分に研究されていないものの1つは、新しい語形であるネオロジズム(新語)の出現である。我々は、いくつかの一般的な収集手法を用いて、近年の英語のネオロジズムの多様なリソースを作成する。新語を含む文と、新語を既存の代替語に置き換えたほぼ同一の文とを比較することで、ネオロジズムを用いた時間的ドリフトの分析を行う。機械翻訳では、文に新語が1つ導入されるだけで、モデル性能はほぼ半減する。これらの結果を踏まえ、様々な自然言語理解タスクとモデルのパープレキシティを用いて、LLMが新語に一般化する能力を評価するベンチマークを構築する。知識カットオフの日付が新しいモデルほど、パープレキシティが低く、下流タスクでの性能も高い。 LLMは単語の言語的起源によっても異なる影響を受けており、静的なLLMにとってネオロジズムへの対処が複雑であることを示している。実験を再現するためのベンチマークとコードを公開する予定である。 The performance of Large Language Models (LLMs) degrades from the temporal drift between data used for model training and newer text seen during inference. One understudied avenue of language change causing data drift is the emergence of neologisms -- new word forms -- over time. We create a diverse resource of recent English neologisms by using several popular collection methods. We analyze temporal drift using neologisms by comparing sentences containing new words with near-identical sentences that replace neologisms with existing substitute words. Model performance is nearly halved in machine translation when a single neologism is introduced in a sentence. Motivated by these results, we construct a benchmark to evaluate LLMs' ability to generalize to neologisms with various natural language understanding tasks and model perplexity. Models with later knowledge cutoff dates yield lower perplexities and perform better in downstream tasks. LLMs are also affected differently based on the linguistic origins of words, indicating that neologisms are complex for static LLMs to address. We will release our benchmark and code for reproducing our experiments.	翻訳日:2024-06-07 01:11:46 公開日:2024-06-05
# ロバストCLIP:ロバスト大視野モデルのための教師なし視覚埋め込みの微調整 Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models ( http://arxiv.org/abs/2402.12336v2 ) ライセンス: Link先を確認	Christian Schlarmann, Naman Deep Singh, Francesco Croce, Matthias Hein,	(参考訳) OpenFlamingo、LLaVA、GPT-4といったマルチモーダル基盤モデルは、様々な現実世界のタスクにますます使われている。以前の研究では、これらのモデルは視覚のモダリティに対する敵の攻撃に対して非常に脆弱であることが示されている。これらの攻撃は偽の情報を広めたり、ユーザーを欺いたりするために利用でき、大きなマルチモーダル基盤モデルの堅牢性に重大なリスクをもたらす。 CLIPモデルまたはその派生機種の1つは、多くの大きな視覚言語モデル(LVLM)、例えばLLaVAやOpenFlamingoの凍結視覚エンコーダとして使用される。本稿では,CLIPに依存した全視覚ダウンストリームタスク(LVLM,ゼロショット分類)に対してロバストなCLIPビジョンエンコーダを実現するための,教師なし逆調整方式を提案する。特に,元のCLIPモデルをロバストなものに置き換えれば,悪質な第三者によるLVLMのユーザに対する盗難攻撃はもはや不可能であることを示す。下流のLVLMの再訓練や微調整は不要である。コードとロバストモデルはhttps://github.com/chs20/RobustVLMで公開されている。 Multi-modal foundation models like OpenFlamingo, LLaVA, and GPT-4 are increasingly used for various real-world tasks. Prior work has shown that these models are highly vulnerable to adversarial attacks on the vision modality. These attacks can be leveraged to spread fake information or defraud users, and thus pose a significant risk, which makes the robustness of large multi-modal foundation models a pressing problem. The CLIP model, or one of its variants, is used as a frozen vision encoder in many large vision-language models (LVLMs), e.g. LLaVA and OpenFlamingo. We propose an unsupervised adversarial fine-tuning scheme to obtain a robust CLIP vision encoder, which yields robustness on all vision down-stream tasks (LVLMs, zero-shot classification) that rely on CLIP. In particular, we show that stealth-attacks on users of LVLMs by a malicious third party providing manipulated images are no longer possible once one replaces the original CLIP model with our robust one. No retraining or fine-tuning of the down-stream LVLMs is required. The code and robust models are available at https://github.com/chs20/RobustVLM	翻訳日:2024-06-07 01:11:46 公開日:2024-06-05
# 高エネルギー物理応用のための局所感性ハッシュを用いた高効率点変圧器 Locality-Sensitive Hashing-Based Efficient Point Transformer with Applications in High-Energy Physics ( http://arxiv.org/abs/2402.12535v2 ) ライセンス: Link先を確認	Siqi Miao, Zhiyuan Lu, Mia Liu, Javier Duarte, Pan Li,	(参考訳) 本研究では,高エネルギー物理(HEP)や天体物理学などの科学領域における大規模クラウド処理に最適化された新しい変圧器モデルを提案する。グラフニューラルネットワークと標準トランスフォーマーの限界に対処するため、我々のモデルは局所帰納バイアスを統合し、ハードウェアフレンドリーな正規演算とほぼ直線的な複雑性を実現する。この研究の1つの貢献は、効率的な変圧器を構築するための様々なスパーシフィケーション手法の誤差・複雑さトレードオフの定量的解析である。局所誘導バイアスを伴う大規模クラウドデータに対するカーネル近似において,LSH(Locality-sensitive hashing),特にOR & AND-construction LSH(OR & AND-Construction LSH)の優位性が示された。そこで本研究では,E$^2$LSH と OR & AND の構成を組み合わせた LSH ベースの高効率点変換器 (HEPT) を提案する。 HEPTは2つの重要な時間を要するHEPタスクにおいて顕著な性能を示し、既存のGNNやトランスフォーマーを精度と計算速度で大幅に上回り、幾何学的深層学習と大規模科学データ処理の大きな進歩を示している。私たちのコードはhttps://github.com/Graph-COM/HEPTで公開されています。 This study introduces a novel transformer model optimized for large-scale point cloud processing in scientific domains such as high-energy physics (HEP) and astrophysics. Addressing the limitations of graph neural networks and standard transformers, our model integrates local inductive bias and achieves near-linear complexity with hardware-friendly regular operations. One contribution of this work is the quantitative analysis of the error-complexity tradeoff of various sparsification techniques for building efficient transformers. Our findings highlight the superiority of using locality-sensitive hashing (LSH), especially OR & AND-construction LSH, in kernel approximation for large-scale point cloud data with local inductive bias. Based on this finding, we propose LSH-based Efficient Point Transformer (HEPT), which combines E$^2$LSH with OR & AND constructions and is built upon regular computations. 
HEPT demonstrates remarkable performance on two critical yet time-consuming HEP tasks, significantly outperforming existing GNNs and transformers in accuracy and computational speed, marking a significant advancement in geometric deep learning and large-scale scientific data processing. Our code is available at https://github.com/Graph-COM/HEPT.	翻訳日:2024-06-07 01:11:46 公開日:2024-06-05
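The abstract above names E$^2$LSH combined with OR & AND constructions but gives no formulas. The sketch below uses the textbook definitions: one E$^2$LSH hash is $h(x) = \lfloor (a \cdot x + b)/w \rfloor$ with Gaussian $a$ and uniform $b$, an AND-construction concatenates `k` hashes into one code, and an OR-construction treats two points as candidates if they share a code in any of `n_tables` tables. All names and parameter values are hypothetical, and HEPT's actual attention kernel is not reproduced here.

```python
import math
import random

def make_e2lsh(dim, width, rng):
    """One E2LSH hash: floor((a . x + b) / width), a ~ N(0, I), b ~ U[0, width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)
    def hash_point(x):
        projection = sum(ai * xi for ai, xi in zip(a, x))
        return math.floor((projection + b) / width)
    return hash_point

def make_and_or_lsh(dim, k, n_tables, width, seed=0):
    """Build n_tables codes (OR), each a tuple of k E2LSH hashes (AND)."""
    rng = random.Random(seed)
    tables = [[make_e2lsh(dim, width, rng) for _ in range(k)]
              for _ in range(n_tables)]
    def codes(x):
        return [tuple(h(x) for h in table) for table in tables]
    return codes

def collide(codes_x, codes_y):
    """OR over tables: the points are candidates if any table's code matches."""
    return any(cx == cy for cx, cy in zip(codes_x, codes_y))

codes = make_and_or_lsh(dim=3, k=2, n_tables=4, width=1.0)
same = collide(codes([0.1, 0.2, 0.3]), codes([0.1, 0.2, 0.3]))
```

Raising `k` makes each code stricter, so fewer distant points collide, while raising `n_tables` gives nearby points more chances to collide. Restricting attention to colliding points is one way a transformer could approach near-linear cost on point clouds with local structure.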
# 大規模言語モデルは感情的支援者になれるか?感情的支援会話における選好バイアスの緩和 Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation ( http://arxiv.org/abs/2402.13211v2 ) ライセンス: Link先を確認	Dongjin Kang, Sunghwan Kim, Taeyoon Kwon, Seungjun Moon, Hyunsouk Cho, Youngjae Yu, Dongha Lee, Jinyoung Yeo,	(参考訳) 感情支援会話(Emotional Support Conversation、ESC)は、日々の会話を通じて個人の感情的苦痛を軽減することを目的としたタスクである。 ESConvデータセットには、その固有の複雑さと非直感的な性質から、適切なレスポンスの生成を容易にするためのサポート戦略が組み込まれている。近年、大きな言語モデル(LLM)の顕著な会話能力にもかかわらず、以前の研究は、しばしば有用な感情的支援の提供に苦慮していることを示唆している。したがって、この研究はまずESConv上でのLCMの結果を分析し、正しい戦略を選択する際の課題と、特定の戦略に対する顕著な選好を明らかにする。これらの結果から, LLMにおける本質的な嗜好が感情的支援に及ぼす影響を考察し, 特定の戦略に対する高い嗜好を示すと, 効果的な情緒的支援が妨げられ, 適切な戦略を予測する上での頑健さが増すことが明らかとなった。さらに,LLMが有能な感情的サポーターとして機能するために必要なアプローチについて,方法論的な考察を行った。その結果,(1) 特定の戦略に対する嗜好の低さは情緒的支援の進行を妨げること,(2) 外部援助は嗜好バイアスの低減に役立つこと,(3) 既存のLCMだけでは感情的な支持者にはならないこと,などが強調された。これらの知見は,LLMの感情的知性を高めるための今後の研究への道のりを示唆している。 Emotional Support Conversation (ESC) is a task aimed at alleviating individuals' emotional distress through daily conversation. Given its inherent complexity and non-intuitive nature, ESConv dataset incorporates support strategies to facilitate the generation of appropriate responses. Recently, despite the remarkable conversational ability of large language models (LLMs), previous studies have suggested that they often struggle with providing useful emotional support. Hence, this work initially analyzes the results of LLMs on ESConv, revealing challenges in selecting the correct strategy and a notable preference for a specific strategy. Motivated by these, we explore the impact of the inherent preference in LLMs on providing emotional support, and consequently, we observe that exhibiting high preference for specific strategies hinders effective emotional support, aggravating its robustness in predicting the appropriate strategy. 
Moreover, we conduct a methodological study to offer insights into the necessary approaches for LLMs to serve as proficient emotional supporters. Our findings emphasize that (1) low preference for specific strategies hinders the progress of emotional support, (2) external assistance helps reduce preference bias, and (3) existing LLMs alone cannot become good emotional supporters. These insights suggest promising avenues for future research to enhance the emotional intelligence of LLMs.	翻訳日:2024-06-07 01:11:46 公開日:2024-06-05
# 言語間転移におけるマルチソース言語学習の分析 Analysis of Multi-Source Language Training in Cross-Lingual Transfer ( http://arxiv.org/abs/2402.13562v2 ) ライセンス: Link先を確認	Seong Hoon Lim, Taejun Yun, Jinhyeon Kim, Jihun Choi, Taeuk Kim,	(参考訳) 多言語言語モデル(LM)の特定の言語とタスクのペアへの適応は、その条件に合わせたデータの可用性に大きく依存する。言語間転移(XLT)法はこのデータ不足問題への対処に寄与しているが、その有効性の背後にあるメカニズムについては現在も議論が続いている。本稿では、XLTは多言語LMに言語に依存しない特徴やタスク固有の特徴をより重視させる、という、XLTの内部動作に関する有望な仮説の1つに焦点をあてる。我々は、プロセスに関与するソース言語の数を変えたときにXLTのパターンがどのように変化するかを調べることで、この仮説を検証する。実験の結果、XLTにおける複数のソース言語の使用(我々はこれをマルチソース言語学習(Multi-Source Language Training, MSLT)と呼ぶ)は、異なる言語の埋め込み空間の混ざり合いを強めることが示され、XLTは言語に依存しない情報の利用から恩恵を受けるという主張を支持する。一方、任意の組み合わせのソース言語を使用しても、性能が常に向上するとは限らないことが判明した。MSLTに有効な言語の組み合わせを特定するための単純なヒューリスティックスを提案し、その有効性を実証的に示す。 The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition. While cross-lingual transfer (XLT) methods have contributed to addressing this data scarcity problem, there is still ongoing debate about the mechanisms behind their effectiveness. In this work, we focus on one of the promising assumptions about the inner workings of XLT, that it encourages multilingual LMs to place greater emphasis on language-agnostic or task-specific features. We test this hypothesis by examining how the patterns of XLT change with a varying number of source languages involved in the process. Our experimental findings show that the use of multiple source languages in XLT, a technique we term Multi-Source Language Training (MSLT), leads to increased mingling of embedding spaces for different languages, supporting the claim that XLT benefits from making use of language-independent information. On the other hand, we discover that using an arbitrary combination of source languages does not always guarantee better performance. We suggest simple heuristics for identifying effective language combinations for MSLT and empirically demonstrate their effectiveness.	翻訳日:2024-06-07 01:11:46 公開日:2024-06-05
# CODIS:マルチモーダル大規模言語モデルのためのコンテキスト依存ビジュアル理解のベンチマーク CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models ( http://arxiv.org/abs/2402.13607v3 ) ライセンス: Link先を確認	Fuwen Luo, Chi Chen, Zihao Wan, Zhaolu Kang, Qidong Yan, Yingjie Li, Xiaolong Wang, Siyu Wang, Ziyue Wang, Xiaoyue Mi, Peng Li, Ning Ma, Maosong Sun, Yang Liu,	(参考訳) マルチモーダル大規模言語モデル(MLLM)は、視覚と言語を組み合わせた様々なタスクにおいて有望な結果を示してきた。これらのモデルが研究や応用にとってより不可欠なものになるにつれて、それらの能力の包括的な評価がますます重要になっている。しかし、既存のベンチマークのほとんどは、ある状況においては画像をより広い文脈の中で解釈する必要があることを考慮していない。本研究では、自由形式のテキストで提供される文脈を用いて視覚的理解を高めるモデルの能力を評価するために、CODISと呼ばれる新しいベンチマークを導入する。我々の結果は、MLLMがこのベンチマークにおいて一貫して人間の性能に及ばないことを示している。さらなる分析により、これらのモデルが、画像の理解を改善するために文脈情報を効果的に抽出し、利用するのに苦労していることが確認される。このことは、MLLMが視覚情報を文脈依存的に理解する能力を高める必要性を浮き彫りにする。プロジェクトのWebサイトは https://thunlp-mt.github.io/CODIS を参照。 Multimodal large language models (MLLMs) have demonstrated promising results in a variety of tasks that combine vision and language. As these models become more integral to research and applications, conducting comprehensive evaluations of their capabilities has grown increasingly important. However, most existing benchmarks fail to consider that, in certain situations, images need to be interpreted within a broader context. In this work, we introduce a new benchmark, named CODIS, designed to assess the ability of models to use context provided in free-form text to enhance visual comprehension. Our findings indicate that MLLMs consistently fall short of human performance on this benchmark. Further analysis confirms that these models struggle to effectively extract and utilize contextual information to improve their understanding of images. This underscores the pressing need to enhance the ability of MLLMs to comprehend visuals in a context-dependent manner. View our project website at https://thunlp-mt.github.io/CODIS.	翻訳日:2024-06-07 01:11:46 公開日:2024-06-05
# 大規模言語モデルに基づくレコメンデーションのステルス攻撃 Stealthy Attack on Large Language Model based Recommendation ( http://arxiv.org/abs/2402.14836v2 ) ライセンス: Link先を確認	Jinghao Zhang, Yuting Liu, Qiang Liu, Shu Wu, Guibing Guo, Liang Wang,	(参考訳) 近年,強力な大規模言語モデル (LLM) がレコメンダシステム (RS) の進展に寄与している。しかし、これらのシステムは繁栄しているが、セキュリティの脅威に対する感受性はほとんど見過ごされてしまっている。本研究では,レコメンデーションモデルにLLMを導入することで,項目のテキスト内容に重点を置いているため,新たなセキュリティ脆弱性が生じることを明らかにした。攻撃者は、モデルのトレーニングプロセスに直接干渉することなく、テストフェーズ中にテキストの内容を変更するだけで、アイテムの露出を著しく向上させることができることを実証する。さらにこの攻撃は、全体的なレコメンデーションパフォーマンスに影響を与えず、テキストの変更は微妙であり、ユーザやプラットフォームが検出しにくくなるため、特にステルス性が高い。 4つの主要なLLMベースレコメンデーションモデルに対する総合的な実験は、我々のアプローチの優れた有効性とステルス性を示している。我々の研究は、LLMベースのレコメンデーションシステムにおいて重大なセキュリティギャップを明らかにし、これらのシステムを保護するための将来の研究の道を開く。 Recently, the powerful large language models (LLMs) have been instrumental in propelling the progress of recommender systems (RS). However, while these systems have flourished, their susceptibility to security threats has been largely overlooked. In this work, we reveal that the introduction of LLMs into recommendation models presents new security vulnerabilities due to their emphasis on the textual content of items. We demonstrate that attackers can significantly boost an item's exposure by merely altering its textual content during the testing phase, without requiring direct interference with the model's training process. Additionally, the attack is notably stealthy, as it does not affect the overall recommendation performance and the modifications to the text are subtle, making it difficult for users and platforms to detect. Our comprehensive experiments across four mainstream LLM-based recommendation models demonstrate the superior efficacy and stealthiness of our approach. Our work unveils a significant security gap in LLM-based recommendation systems and paves the way for future research on protecting these systems.	翻訳日:2024-06-07 01:11:46 公開日:2024-06-05
# グラウンドトゥルースなしの大規模言語モデルのランク付け Ranking Large Language Models without Ground Truth ( http://arxiv.org/abs/2402.14860v3 ) ライセンス: Link先を確認	Amit Dhurandhar, Rahul Nair, Moninder Singh, Elizabeth Daly, Karthikeyan Natesan Ramamurthy,	(参考訳) 大規模言語モデル(LLM)の評価とランキングは,これらのモデルの普及とその影響において重要な問題となっている。評価手法は、取得に費用がかかる人間の応答を必要とするか、LLMのペアを用いて互いを評価させるが信頼性に欠けるかのいずれかである。本稿では,プロンプト(質問、指示など)のデータセットとLLMのセットが与えられた場合に,根拠となる真実や参照応答にアクセスすることなく,それらをランク付けする,新しい視点を提供する。専門家と知識のある人の両方が初心者を識別できるという現実の生活に触発された私たちの主要なアイデアは、モデルの三つ組を考えることであり、それぞれが他の2つを評価し、三つ組の中で最悪のモデルを高い確率で正しく識別する。また、私たちの考えを分析し、成功するための十分な条件を提供します。この考え方を繰り返し適用し、LLMをランク付けする2つの方法を提案する。異なる生成タスク(要約、複数選択、ダイアログ)の実験では、参照データなしで真のランキングに近い順位を確実に回復する。これは、実用のために実行可能な低リソースメカニズムを示している。 Evaluation and ranking of large language models (LLMs) has become an important problem with the proliferation of these models and their impact. Evaluation methods either require human responses which are expensive to acquire or use pairs of LLMs to evaluate each other which can be unreliable. In this paper, we provide a novel perspective where, given a dataset of prompts (viz. questions, instructions, etc.) and a set of LLMs, we rank them without access to any ground truth or reference responses. Inspired by real life where both an expert and a knowledgeable person can identify a novice, our main idea is to consider triplets of models, where each one of them evaluates the other two, correctly identifying the worst model in the triplet with high probability. We also analyze our idea and provide sufficient conditions for it to succeed. Applying this idea repeatedly, we propose two methods to rank LLMs. In experiments on different generative tasks (summarization, multiple-choice, and dialog), our methods reliably recover close to true rankings without reference data. This points to a viable low-resource mechanism for practical use.	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# ダブルIウォーターマーク : LLMファインチューニングのためのモデル著作権保護 Double-I Watermark: Protecting Model Copyright for LLM Fine-tuning ( http://arxiv.org/abs/2402.14883v3 ) ライセンス: Link先を確認	Shen Li, Liuyi Yao, Jinyang Gao, Lan Zhang, Yaliang Li,	(参考訳) さまざまなアプリケーションをサポートするために、ビジネスオーナーにとって一般的で効率的なアプローチは、LLMオーナやクラウドサーバが提供するAPIを通じて、貴重なデータセットを活用してトレーニング済みのLLMを微調整することである。しかし、このプロセスはモデル誤用のかなりのリスクを伴い、ビジネスオーナーに深刻な経済的影響をもたらす可能性がある。したがって、LLM微調整中にこれらのカスタマイズされたモデルの著作権を保護することは、緊急の現実的な要件となっているが、そのような保護を提供するための既存のソリューションは限られている。この差し迫った問題に対処するため、「ダブルI透かし」と呼ばれる新しい透かし手法を提案する。具体的には、インストラクションチューニングデータに基づいて、2種類のバックドアデータパラダイムを導入し、それぞれインストラクションと入力にトリガーを持たせる。 LLMの学習機能を活用して、データセットにカスタマイズされたバックドアサンプルを組み込むことにより、微調整中に特定の透かし情報をカスタマイズされたモデルに効果的に注入することで、商業シナリオにおける透かしの注入と検証が容易になる。提案手法を各種微調整法で評価し, その無害性, 頑健性, 独特性, 不可知性, 妥当性を定量的および定性的な分析により検証した。 To support various applications, a prevalent and efficient approach for business owners is leveraging their valuable datasets to fine-tune a pre-trained LLM through the API provided by LLM owners or cloud servers. However, this process carries a substantial risk of model misuse, potentially resulting in severe economic consequences for business owners. Thus, safeguarding the copyright of these customized models during LLM fine-tuning has become an urgent practical requirement, but there are limited existing solutions to provide such protection. To tackle this pressing issue, we propose a novel watermarking approach named "Double-I watermark". Specifically, based on the instruct-tuning data, two types of backdoor data paradigms are introduced with trigger in the instruction and the input, respectively. By leveraging LLM's learning capability to incorporate customized backdoor samples into the dataset, the proposed approach effectively injects specific watermarking information into the customized model during fine-tuning, which makes it easy to inject and verify watermarks in commercial scenarios. 
We evaluate the proposed "Double-I watermark" under various fine-tuning methods, demonstrating its harmlessness, robustness, uniqueness, imperceptibility, and validity through both quantitative and qualitative analyses.	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# シャープネスを意識した最小化と対人訓練の両立について On the Duality Between Sharpness-Aware Minimization and Adversarial Training ( http://arxiv.org/abs/2402.15152v2 ) ライセンス: Link先を確認	Yihao Zhang, Hangzhou He, Jingyu Zhu, Huanran Chen, Yifei Wang, Zeming Wei,	(参考訳) 逆行訓練(AT)は、訓練中に入力サンプルを逆行的に摂動させ、敵の攻撃に対する最も効果的な防御の1つとして認識されているが、必然的にクリーンな精度が低下している。サンプルを摂動する代わりに、Sharpness-Aware Minimization (SAM) はトレーニング中にモデルの重みを摂動させ、より平坦な損失ランドスケープを見つけ、一般化を改善する。しかし、SAMはクリーンな精度の向上を目的に設計されているため、敵対的堅牢性を高める効果は未解明のままである。本研究では,SAM と AT の双対性を考慮し,SAM から得られる敵対的堅牢性について検討する。興味深いことに、SAMのみを使用することで、敵対的堅牢性を向上させることができる。このSAMの予期せぬ性質を理解するために、まずSAMがより頑健な特徴を暗黙的に学習する方法に関する経験的および理論的知見を提供し、SAMがクリーンな精度を犠牲にすることなく敵対的堅牢性を顕著に向上できることを示す包括的な実験を行い、精度がより優先される場合にATの代替となるSAMの可能性に光を当てる。コードは https://github.com/weizeming/SAM_AT で入手できる。 Adversarial Training (AT), which adversarially perturbs the input samples during training, has been acknowledged as one of the most effective defenses against adversarial attacks, yet suffers from inevitably decreased clean accuracy. Instead of perturbing the samples, Sharpness-Aware Minimization (SAM) perturbs the model weights during training to find a flatter loss landscape and improve generalization. However, as SAM is designed for better clean accuracy, its effectiveness in enhancing adversarial robustness remains unexplored. In this work, considering the duality between SAM and AT, we investigate the adversarial robustness derived from SAM. Intriguingly, we find that using SAM alone can improve adversarial robustness. To understand this unexpected property of SAM, we first provide empirical and theoretical insights into how SAM can implicitly learn more robust features, and conduct comprehensive experiments to show that SAM can improve adversarial robustness notably without sacrificing any clean accuracy, shedding light on the potential of SAM to be a substitute for AT when accuracy comes at a higher priority. Code is available at https://github.com/weizeming/SAM_AT.	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# LLMを用いた概念空間次元のランク付け:微調整戦略の解析 Ranking Entities along Conceptual Space Dimensions with LLMs: An Analysis of Fine-Tuning Strategies ( http://arxiv.org/abs/2402.15337v2 ) ライセンス: Link先を確認	Nitesh Kumar, Usashi Chatterjee, Steven Schockaert,	(参考訳) 概念空間は、それらの原始的な意味的特徴の観点でエンティティを表現する。このような表現は非常に貴重であるが、特に知覚的特徴や主観的特徴をモデル化する場合には、学習が困難であることが知られている。概念空間をLLM(Large Language Models)から拡張することは,近年,有望な戦略として浮上しているが,既存の作業は,比較的単純なゼロショット戦略を用いて,事前学習されたLLMの探索に限られている。我々は特に、与えられた概念空間次元に応じてエンティティをランク付けするタスクに焦点をあてる。残念なことに、概念空間次元の基底真理ランキングは稀であるため、このタスクでは直接微調整はできない。したがって、より容易に利用できる機能をトレーニングデータとして使用し、結果のモデルのランキング能力が知覚的および主観的特徴に移行するかどうかを分析する。しかし、トレーニングデータに少なくともいくつかの知覚的、主観的特徴を持つことは、最高の結果を達成するのに不可欠である。 Conceptual spaces represent entities in terms of their primitive semantic features. Such representations are highly valuable but they are notoriously difficult to learn, especially when it comes to modelling perceptual and subjective features. Distilling conceptual spaces from Large Language Models (LLMs) has recently emerged as a promising strategy, but existing work has been limited to probing pre-trained LLMs using relatively simple zero-shot strategies. We focus in particular on the task of ranking entities according to a given conceptual space dimension. Unfortunately, we cannot directly fine-tune LLMs on this task, because ground truth rankings for conceptual space dimensions are rare. We therefore use more readily available features as training data and analyse whether the ranking capabilities of the resulting models transfer to perceptual and subjective features. We find that this is indeed the case, to some extent, but having at least some perceptual and subjective features in the training data seems essential for achieving the best results.	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# 非線形変換器は文脈内学習においてどのように学習し、一般化するか? How Do Nonlinear Transformers Learn and Generalize in In-Context Learning? ( http://arxiv.org/abs/2402.15607v2 ) ライセンス: Link先を確認	Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen,	(参考訳) トランスフォーマーベースの大規模言語モデルは、トレーニング済みのモデルがそのタスクの入出力例をクエリに付加するだけで、微調整なしで新しいタスクを処理できるという、目覚ましいコンテキスト内学習(ICL)能力を示している。実証的な成功にもかかわらず、トランスフォーマーをトレーニングしてICLを達成するメカニズムとそれに対応するICL能力は、トランスフォーマーの非線形自己注意と非線形活性化に起因する非凸トレーニング問題を解析する技術的な課題により、ほとんど解明されていない。我々の知る限り、本稿は,非線形自己アテンションと非線形MLPを用いたトランスフォーマーのトレーニング力学と,結果として得られるモデルのICL一般化能力に関する初めての理論的解析を提供する。バイナリ分類タスクのグループに着目し,これらのタスクのサブセットからのデータを用いてトランスフォーマーを訓練し,データ分散シフトの有無それぞれについて,残りの未確認タスクに対するICL一般化性能に様々な要因が与える影響を定量化する。また、学習したトランスフォーマーの異なるコンポーネントがICLのパフォーマンスにどのように貢献するかを分析する。さらに、モデルプルーニングがICL性能にどのように影響するかを初めて理論的に分析し、適切な等級ベースのプルーニングが推論コストを低減しつつ、ICLに最小限の影響を与えることを証明した。これらの理論的発見は数値実験によって正当化される。 Transformer-based large language models have displayed impressive in-context learning capabilities, where a pre-trained model can handle new tasks without fine-tuning by simply augmenting the query with some input-output examples from that task. Despite the empirical success, the mechanics of how to train a Transformer to achieve ICL and the corresponding ICL capacity are mostly elusive due to the technical challenges of analyzing the nonconvex training problems resulting from the nonlinear self-attention and nonlinear activation in Transformers. To the best of our knowledge, this paper provides the first theoretical analysis of the training dynamics of Transformers with nonlinear self-attention and nonlinear MLP, together with the ICL generalization capability of the resulting model. Focusing on a group of binary classification tasks, we train Transformers using data from a subset of these tasks and quantify the impact of various factors on the ICL generalization performance on the remaining unseen tasks with and without data distribution shifts. We also analyze how different components in the learned Transformers contribute to the ICL performance. 
Furthermore, we provide the first theoretical analysis of how model pruning affects ICL performance and prove that proper magnitude-based pruning can have a minimal impact on ICL while reducing inference costs. These theoretical findings are justified through numerical experiments.	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# 熱力学的可逆量子計測と関連する作業コスト Thermodynamically reversible quantum measurements and related work costs ( http://arxiv.org/abs/2402.16037v2 ) ライセンス: Link先を確認	Camille L Latune, Cyril Elouard,	(参考訳) 熱浴に結合した測定装置を含む量子測定の一般的な顕微鏡モデルを考えると、システムと装置の結合のオンオフ過程、統計混合物への移行、古典的な読み出し、装置リセットなど、量子測定の実現に必要なエネルギー資源を解析する。一般的な熱力学の議論を通して、必要最小限の作業は、測定されるシステムのエネルギー変動と、測定の性能を特徴づける情報理論量、すなわち効率と完全性に依存することを示した。さらに、明示的なプロトコルを提供することで、熱力学的に可逆な測定が可能であり、最小限の作業費に到達できることを示す。最後に、有限時間測定プロトコルについて、有限時間熱力学過程に固有のエントロピー生成の増大による作業コストの増加について説明する。これは、測定の効率と作業コストの間のトレードオフに加えて、測定の速度と作業コストの間のトレードオフが増大していることを強調します。 Considering a general microscopic model for quantum measurement comprising a measurement apparatus coupled to a thermal bath, we analyze the energetic resources necessary for the realisation of quantum measurements, including the process of switching on and off the coupling between the system and the apparatus, the transition to a statistical mixture, the classical readout, and the apparatus resetting. We show via general thermodynamic arguments that the minimal required work depends on the energy variation of the system being measured plus information-theoretic quantities characterizing the performance of the measurement -- efficiency and completeness. Additionally, providing an explicit protocol, we show that it is possible to perform thermodynamically reversible measurement, thus reaching the minimal work expenditure. Finally, for finite-time measurement protocols, we illustrate the increasing work cost induced by rising entropy production inherent of finite-time thermodynamic processes. This highlights an emerging trade-off between velocity of the measurement and work cost, on top of a trade-off between efficiency of the measurement and work cost.	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# RetrievalQA: 短文形式のオープンドメイン質問応答に対する適応型検索拡張生成の評価 RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering ( http://arxiv.org/abs/2402.16457v2 ) ライセンス: Link先を確認	Zihan Zhang, Meng Fang, Ling Chen,	(参考訳) 適応型検索拡張生成(Adaptive Retrieval-Augmented Generation; ARAG)は、情報源の効率性と関連性を高めるために、無差別に検索する代わりに、クエリに対する検索の必要性を動的に決定することを目的としている。しかし、従来の研究はARAGアプローチの評価を概ね見落としており、その有効性は十分に研究されていない。この研究は、新しい世界知識とロングテール知識をカバーする1,271の短い形式の質問を含む、RetrievalQAというベンチマークを提示する。質問に答えるために必要な知識は LLM から欠落しているため、正しく答えるには外部情報を検索しなければならない。これにより、RetrievalQAは既存のARAGメソッドを評価するのに適したテストベッドとなる。キャリブレーションに基づく手法はしきい値調整に大きく依存しているのに対し,バニラプロンプトはLLMを誘導して信頼性の高い検索決定を行うには不十分である。これらの知見に基づき,LLMが校正や追加訓練を伴わずに検索の必要性を評価するのに役立つ,シンプルかつ効果的な方法であるTA-ARE(Time-Aware Adaptive Retrieval)を提案する。データセットとコードはhttps://github.com/hyintell/RetrievalQAで公開される。 Adaptive retrieval-augmented generation (ARAG) aims to dynamically determine the necessity of retrieval for queries instead of retrieving indiscriminately to enhance the efficiency and relevance of the sourced information. However, previous works largely overlook the evaluation of ARAG approaches, leading to their effectiveness being understudied. This work presents a benchmark, RetrievalQA, comprising 1,271 short-form questions covering new world and long-tail knowledge. The knowledge necessary to answer the questions is absent from LLMs; therefore, external information must be retrieved to answer correctly. This makes RetrievalQA a suitable testbed to evaluate existing ARAG methods. We observe that calibration-based methods heavily rely on threshold tuning, while vanilla prompting is inadequate for guiding LLMs to make reliable retrieval decisions. Based on our findings, we propose Time-Aware Adaptive Retrieval (TA-ARE), a simple yet effective method that helps LLMs assess the necessity of retrieval without calibration or additional training. The dataset and code will be available at https://github.com/hyintell/RetrievalQA	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# 「政治コンパス」か「スピニング・アロー」か? 大規模言語モデルにおける価値と意見のより意味のある評価に向けて Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models ( http://arxiv.org/abs/2402.16786v2 ) ライセンス: Link先を確認	Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy,	(参考訳) 近年の研究の多くは, 大規模言語モデル (LLM) の価値と意見を, 多肢選択式の調査やアンケートを用いて評価することを目指している。この研究の多くは、現実世界のLLMアプリケーションに関する懸念から動機づけられている。例えば、政治的に偏ったLLMは、何百万人もの人々が使っているときに社会に微妙に影響を及ぼす可能性がある。しかし、このような現実的な懸念は、現在の評価の人工性とは対照的である。実際のユーザは通常、LLMに調査の質問をしない。この乖離を動機として、本研究は,LLMにおける価値観と意見に関する主流の制約付き評価パラダイムに異議を唱え,より現実的な非制約評価を探求する。ケーススタディでは、人気のある政治コンパステスト(PCT)に焦点を当てる。体系的なレビューでは、PCTを用いた先行研究のほとんどが、モデルにPCTの多肢選択形式に従うことを強制していることがわかった。強制されない場合にモデルが実質的に異なる回答を与えること、強制の仕方によって回答が変わること、そして言い換えに対する頑健性が欠如していることを示す。そして、より現実的なオープンエンドの回答設定において、モデルがさらに異なる回答を与えることを示す。我々はこれらの知見をLLMの価値と意見を評価するための推奨とオープンな課題に抽出する。 Much recent work seeks to evaluate values and opinions in large language models (LLMs) using multiple-choice surveys and questionnaires. Most of this work is motivated by concerns around real-world LLM applications. For example, politically-biased LLMs may subtly influence society when they are used by millions of people. Such real-world concerns, however, stand in stark contrast to the artificiality of current evaluations: real users do not typically ask LLMs survey questions. Motivated by this discrepancy, we challenge the prevailing constrained evaluation paradigm for values and opinions in LLMs and explore more realistic unconstrained evaluations. As a case study, we focus on the popular Political Compass Test (PCT). In a systematic review, we find that most prior work using the PCT forces models to comply with the PCT's multiple-choice format. We show that models give substantively different answers when not forced; that answers change depending on how models are forced; and that answers lack paraphrase robustness. 
Then, we demonstrate that models give different answers yet again in a more realistic open-ended answer setting. We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# ジョブショップスケジューリング問題の解決のための双方向グラフ注意ネットワークを用いたトポロジ表現の学習 Learning Topological Representations with Bidirectional Graph Attention Network for Solving Job Shop Scheduling Problem ( http://arxiv.org/abs/2402.17606v3 ) ライセンス: Link先を確認	Cong Zhang, Zhiguang Cao, Yaoxin Wu, Wen Song, Jing Sun,	(参考訳) 既存の学習に基づくジョブショップスケジューリング問題(JSSP)の解法は、通常、無向グラフに適した既製のGNNモデルを使用し、解離グラフ(DG)のリッチで有意義なトポロジ構造を無視する。本稿では,注意機構に基づく新しいGNNアーキテクチャである,トポロジ対応双方向グラフアテンションネットワーク(TBGAT)を提案し,局所探索フレームワークでJSSPを解くためにDGを埋め込む。具体的には、TBGATは、それぞれ前方と後方のビューからDGを埋め込み、ビューの異なるトポロジに従ってメッセージが伝播し、グラフの注意を通して集約される。そこで本稿では,DGの前方および後方トポロジ的ソートを計算するためのメッセージパス機構に基づく新しい演算子を提案する。さらに,TBGATはジョブ数とマシン数に対して線形の計算複雑性を持つことを理論的および実験的に示し,本手法の実用的価値を高めた。さらに、5つの合成データセットと7つの古典的なベンチマークに関する広範な実験により、TBGATは幅広いニューラル手法を大きなマージンで上回り、新しいSOTA結果を達成することが示された。すべてのコードとデータは https://github.com/zcaicaros/TBGAT で公開されている。 Existing learning-based methods for solving job shop scheduling problems (JSSP) usually use off-the-shelf GNN models tailored to undirected graphs and neglect the rich and meaningful topological structures of disjunctive graphs (DGs). This paper proposes the topology-aware bidirectional graph attention network (TBGAT), a novel GNN architecture based on the attention mechanism, to embed the DG for solving JSSP in a local search framework. Specifically, TBGAT embeds the DG from a forward and a backward view, respectively, where the messages are propagated by following the different topologies of the views and aggregated via graph attention. Then, we propose a novel operator based on the message-passing mechanism to calculate the forward and backward topological sorts of the DG, which are the features for characterizing the topological structures and exploited by our model. In addition, we theoretically and experimentally show that TBGAT has linear computational complexity in the number of jobs and machines, respectively, strengthening our method's practical value. 
Besides, extensive experiments on five synthetic datasets and seven classic benchmarks show that TBGAT achieves new SOTA results by outperforming a wide range of neural methods by a large margin. All the code and data are publicly available online at https://github.com/zcaicaros/TBGAT.	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# DiffusionがDAggerと出会う: Eye-in-hand模倣学習の強化 Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning ( http://arxiv.org/abs/2402.17768v2 ) ライセンス: Link先を確認	Xiaoyu Zhang, Matthew Chang, Pranav Kumar, Saurabh Gupta,	(参考訳) 模倣で訓練されたポリシーの一般的な失敗モードは、テスト時に実行エラーが累積することである。学習されたポリシーが専門家のデモに存在しない状態に遭遇すると、ポリシーは失敗し、振る舞いが退化する。データ集合(Dataset Aggregation)あるいはDAggerアプローチは、これらの障害状態をカバーするために、単により多くのデータを収集する。しかし、実際には非常に高コストであることが多い。本研究では,eye-in-hand模倣学習の問題に対して、そのコストを伴わずにDAggerの利点を享受するDiffusion Meets DAgger (DMD)を提案する。分散外の状態をカバーするために新しいサンプルを集める代わりに、DMDは最近の拡散モデルを用いてこれらのサンプルを合成する。これは、少数のデモから堅牢なパフォーマンスをもたらす。 DMDと行動クローニング(BC)のベースラインを,プッシュ,積み重ね,注ぐ,シャツハングという4つのタスクで比較した。プッシュでは、DMDはわずか8つの専門家によるデモンストレーションで80%の成功率を達成し、単純な行動クローニングでは20%にとどまる。積み重ねでは、DMDは5つのカップで平均92%の成功率であり、BCでは40%である。コーヒー豆を注ぐタスクでは、DMDは80%の確率で別のカップへの移し替えに成功する。最後に、DMDは洋服ラックにシャツを掛けるタスクで90%の成功率を達成した。 A common failure mode for policies trained with imitation is compounding execution errors at test time. When the learned policy encounters states that are not present in the expert demonstrations, the policy fails, leading to degenerate behavior. The Dataset Aggregation, or DAgger approach to this problem simply collects more data to cover these failure states. However, in practice, this is often prohibitively expensive. In this work, we propose Diffusion Meets DAgger (DMD), a method to reap the benefits of DAgger without the cost for eye-in-hand imitation learning problems. Instead of collecting new samples to cover out-of-distribution states, DMD uses recent advances in diffusion models to synthesize these samples. This leads to robust performance from few demonstrations. We compare DMD against a behavior cloning baseline across four tasks: pushing, stacking, pouring, and shirt hanging. In pushing, DMD achieves 80% success rate with as few as 8 expert demonstrations, where naive behavior cloning reaches only 20%. In stacking, DMD succeeds on average 92% of the time across 5 cups, versus 40% for BC. When pouring coffee beans, DMD transfers to another cup successfully 80% of the time. 
Finally, DMD attains a 90% success rate for hanging a shirt on a clothing rack.	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# TruthX: 真実性空間における大規模言語モデルの編集による幻覚の軽減 TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space ( http://arxiv.org/abs/2402.17811v2 ) ライセンス: Link先を確認	Shaolei Zhang, Tian Yu, Yang Feng,	(参考訳) 大規模言語モデル (LLM) は幻覚を生じさせることがあり、特にLLMは正しい知識を知っていながら、真実でない応答を生成することがある。 LLM内での真実性の活性化は、LLMの知識ポテンシャルを完全に解き放つ鍵である。本稿では, LLMの内部表現において真実性を支配する特徴を識別し, 編集することにより, LLMの真実性を活性化する推論時介入手法であるTruthXを提案する。 TruthXはオートエンコーダを使用して、LLMの表現を意味的な潜在空間と真実性の潜在空間にそれぞれマッピングし、真実性空間内の真実性の編集方向を特定するために対照学習を適用する。推論時には、LLMの内部表現を真実性空間で編集することで、TruthXはLLMの真実性を効果的に強化する。 TruthfulQAベンチマークでは,TruthXは13の高度なLLMの真実性を平均20%改善することを示した。さらなる分析により、TruthXはLLMの内部表現の1つのベクトルのみを編集することで、真実の応答または幻覚の応答を生成するようにLLMを制御できることが示唆された。 Large Language Models (LLMs) sometimes suffer from producing hallucinations, especially LLMs may generate untruthful responses despite knowing the correct knowledge. Activating the truthfulness within LLM is the key to fully unlocking LLM's knowledge potential. In this paper, we propose TruthX, an inference-time intervention method to activate the truthfulness of LLM by identifying and editing the features within LLM's internal representations that govern the truthfulness. TruthX employs an auto-encoder to map LLM's representations into semantic and truthful latent spaces respectively, and applies contrastive learning to identify a truthful editing direction within the truthful space. During inference, by editing LLM's internal representations in truthful space, TruthX effectively enhances the truthfulness of LLM. Experiments show that TruthX improves the truthfulness of 13 advanced LLMs by an average of 20% on TruthfulQA benchmark. Further analyses suggest that TruthX can control LLM to produce truthful or hallucinatory responses via editing only one vector in LLM's internal representations.	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# 逆最適化から実現可能性、そしてERMへ From Inverse Optimization to Feasibility to ERM ( http://arxiv.org/abs/2402.17890v2 ) ライセンス: Link先を確認	Saurabh Mishra, Anant Raj, Sharan Vaswani,	(参考訳) 逆最適化は、既知の解から最適化問題の未知のパラメータを推定するもので、輸送、電力システム、医療などの分野で広く使われている。本研究では,未知の問題パラメータをより正確に予測するために,追加の文脈情報を利用する文脈逆最適化設定について検討する。我々は、文脈逆線形プログラミング(CILP)に注目し、LPの微分不可能な性質によって引き起こされる課題に対処する。線形予測モデルでは、CILPを凸実現可能性問題に帰着させ、交互射影のような標準アルゴリズムの使用を可能にする。得られるCILPのアルゴリズムは、縮退や補間といった追加の仮定なしで理論的な収束保証を備える。次に、ポリアック・ロジャシエヴィチ条件を満たす滑らかな凸損失に対する経験的リスク最小化(ERM)にCILPを帰着させる。この帰着により、拡張性のある一階最適化手法を用いることで、凸設定における理論的保証を維持しながら、大規模な非凸問題の解決が可能になる。次に,ERMへの帰着を用いて,未知のインスタンスに対する提案アルゴリズムの一般化性能を定量化する。最後に, 合成問題および実世界の問題に対して我々のアプローチを実験的に検証し, 既存手法と比較して性能が向上したことを示す。 Inverse optimization involves inferring unknown parameters of an optimization problem from known solutions and is widely used in fields such as transportation, power systems, and healthcare. We study the contextual inverse optimization setting that utilizes additional contextual information to better predict the unknown problem parameters. We focus on contextual inverse linear programming (CILP), addressing the challenges posed by the non-differentiable nature of LPs. For a linear prediction model, we reduce CILP to a convex feasibility problem allowing the use of standard algorithms such as alternating projections. The resulting algorithm for CILP is equipped with theoretical convergence guarantees without additional assumptions such as degeneracy or interpolation. Next, we reduce CILP to empirical risk minimization (ERM) on a smooth, convex loss that satisfies the Polyak-Lojasiewicz condition. This reduction enables the use of scalable first-order optimization methods to solve large non-convex problems while maintaining theoretical guarantees in the convex setting. Subsequently, we use the reduction to ERM to quantify the generalization performance of the proposed algorithm on previously unseen instances. 
Finally, we experimentally validate our approach on synthetic and real-world problems and demonstrate improved performance compared to existing methods.	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# コントラスト文表現学習のより良い理解に向けて--勾配のための統一パラダイム Towards Better Understanding of Contrastive Sentence Representation Learning: A Unified Paradigm for Gradient ( http://arxiv.org/abs/2402.18281v2 ) ライセンス: Link先を確認	Mingxin Li, Richong Zhang, Zhijie Nie,	(参考訳) 文表現学習(SRL)は自然言語処理(NLP)において重要な課題であり、対照的な自己監督学習(SSL)は現在主流のアプローチである。しかし、その顕著な効果の背景にある理由は不明である。具体的には、多くの研究が対照的なSSLと非対照的なSSLの類似性を理論的観点から調べている。このような類似性は、2つのアプローチが同等のパフォーマンスを達成する分類タスクで検証することができる。しかし、ランキングタスク(すなわち、SRLのセマンティックテキスト類似性(STS))では、対照的なSSLは非コントラストSSLを大きく上回っている。そこで2つの疑問が生じる。第一に、さまざまな対照損失がSTSで優れた性能を達成できる共通点は何か。第二に、非コントラストSSLをSTSでも有効にするにはどうすればよいか。これらの疑問に答えるため、我々は勾配の観点から出発し、4つの効果的な対照損失が、Gradient Dissipation、Weight、Ratioの3つのコンポーネントに依存する統一パラダイムに統合できることを見出した。次に、これらのコンポーネントが最適化において果たす役割を詳細に分析し、モデル性能におけるそれらの意義を実験的に示す。最後に、これらのコンポーネントを調整することで、STSにおいて非コントラストSSLが優れたパフォーマンスを達成することができる。 Sentence Representation Learning (SRL) is a crucial task in Natural Language Processing (NLP), where contrastive Self-Supervised Learning (SSL) is currently a mainstream approach. However, the reasons behind its remarkable effectiveness remain unclear. Specifically, many studies have investigated the similarities between contrastive and non-contrastive SSL from a theoretical perspective. Such similarities can be verified in classification tasks, where the two approaches achieve comparable performance. But in ranking tasks (i.e., Semantic Textual Similarity (STS) in SRL), contrastive SSL significantly outperforms non-contrastive SSL. Therefore, two questions arise: First, what commonalities enable various contrastive losses to achieve superior performance in STS? Second, how can we make non-contrastive SSL also effective in STS? To address these questions, we start from the perspective of gradients and discover that four effective contrastive losses can be integrated into a unified paradigm, which depends on three components: the Gradient Dissipation, the Weight, and the Ratio. 
Then, we conduct an in-depth analysis of the roles these components play in optimization and experimentally demonstrate their significance for model performance. Finally, by adjusting these components, we enable non-contrastive SSL to achieve outstanding performance in STS.	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# 統一生成, 再構成, 表現: 適応的潜在符号化-復号による一般化拡散 Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding ( http://arxiv.org/abs/2402.19009v2 ) ライセンス: Link先を確認	Guangyi Liu, Yu Wang, Zeyu Feng, Qiyu Wu, Liping Tang, Yuan Gao, Zhen Li, Shuguang Cui, Julian McAuley, Zichao Yang, Eric P. Xing, Zhiting Hu,	(参考訳) 深層生成モデルの膨大な応用は、離散的なテキストやタンパク質配列、連続的な画像といった様々なデータ型にわたる3つのコア機能(新しいインスタンスの生成、入力の再構成、コンパクトな表現の学習)に支えられている。変分オートエンコーダ(VAE)、敵対的生成ネットワーク(GAN)、自己回帰モデル、(潜在)拡散モデルといった既存のモデルファミリは、一般的に特定の機能やデータ型には優れているが、他では不足している。我々は、これらのコア機能を統合し、幅広い適用性と性能向上を実現する一般化符号化-復号拡散確率モデル(EDDPM)を導入する。 EDDPMはパラメタライズされた符号化-復号を導入することで、標準拡散におけるガウスノイズ付加とノイズ除去を一般化する。重要なことは、EDDPMは、確立された拡散モデルの目的関数およびトレーニングレシピと互換性があり、エンコーダ-デコーダのパラメータを拡散とともに効果的に学習することができる。適切なエンコーダ/デコーダ(例えば、大規模言語モデル)を選択することで、EDDPMは自然に異なるデータ型に適用できる。テキスト、タンパク質、画像に関する大規模な実験は、多様なデータやタスクを扱う柔軟性と、既存のモデルに対する強力な改善を実証している。 The vast applications of deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations -- across various data types, such as discrete text/protein sequences and continuous images. Existing model families, like variational autoencoders (VAEs), generative adversarial networks (GANs), autoregressive models, and (latent) diffusion models, generally excel in specific capabilities and data types but fall short in others. We introduce Generalized Encoding-Decoding Diffusion Probabilistic Models (EDDPMs) which integrate the core capabilities for broad applicability and enhanced performance. EDDPMs generalize the Gaussian noising-denoising in standard diffusion by introducing parameterized encoding-decoding. Crucially, EDDPMs are compatible with the well-established diffusion model objective and training recipes, allowing effective learning of the encoder-decoder parameters jointly with diffusion. By choosing appropriate encoder/decoder (e.g., large language models), EDDPMs naturally apply to different data types. 
Extensive experiments on text, proteins, and images demonstrate the flexibility to handle diverse data and tasks and the strong improvement over various existing models.	翻訳日:2024-06-07 01:01:43 公開日:2024-06-05
# 合成テキスト生成による差分プライベート知識蒸留 Differentially Private Knowledge Distillation via Synthetic Text Generation ( http://arxiv.org/abs/2403.00932v2 ) ライセンス: Link先を確認	James Flemings, Murali Annavaram,	(参考訳) 大規模言語モデル(LLM)は多くの下流タスクで最先端のパフォーマンスを実現している。しかし、データプライバシの緊急性が高まっているため、実践者はプライベートデータ上で差分プライバシー(DP)でLLMをトレーニングする必要がある。同時に、LLMのパラメータサイズが指数関数的に大きくなることは、リソース制約のあるデバイスや遅延に敏感なアプリケーションにLLMをデプロイする前にモデル圧縮を必要とする。差分プライバシとモデル圧縮は、一般的に、目的を達成するために実用性の損失をトレードオフする必要がある。さらに、両方のスキームを同時に適用すると、実用性の劣化が増幅されうる。そこで本研究では,差分プライベートな教師LLMが生成した合成データを活用する、新しい差分プライベート知識蒸留アルゴリズムであるDistilDPを提案する。教師LLMの知識は、2つの方法で生徒に伝達される。1つは合成データそのもの(ハードラベル)、もう1つは合成データ上で評価された教師の出力分布(ソフトラベル)である。さらに,教師と生徒が類似のアーキテクチャ構造を共有している場合,両者の隠れ表現を整列させることで,知識をさらに蒸留することができる。我々の実験結果は、強力なプライバシパラメータ $\epsilon=2$ のもとで、DistilDPがBig Patentデータセット上で既存のベースラインよりも実用性を大幅に(少なくとも$9.0$ PPL)改善できることを示している。これらの有望な結果は自己回帰LLMのプライバシー保護圧縮を前進させる。コードは https://github.com/james-flemings/dp_compress で入手できる。 Large Language models (LLMs) are achieving state-of-the-art performance in many different downstream tasks. However, the increasing urgency of data privacy puts pressure on practitioners to train LLMs with Differential Privacy (DP) on private data. Concurrently, the exponential growth in parameter size of LLMs necessitates model compression before deployment of LLMs on resource-constrained devices or latency-sensitive applications. Differential privacy and model compression generally must trade off utility loss to achieve their objectives. Moreover, simultaneously applying both schemes can compound the utility degradation. To this end, we propose DistilDP: a novel differentially private knowledge distillation algorithm that exploits synthetic data generated by a differentially private teacher LLM. The knowledge of a teacher LLM is transferred onto the student in two ways: one way from the synthetic data itself -- the hard labels, and the other way by the output distribution of the teacher evaluated on the synthetic data -- the soft labels. 
Furthermore, if the teacher and student share a similar architectural structure, we can further distill knowledge by aligning the hidden representations between both. Our experimental results demonstrate that DistilDP can substantially improve the utility over existing baselines, at least $9.0$ PPL on the Big Patent dataset, with strong privacy parameters, $\epsilon=2$. These promising results advance privacy-preserving compression of autoregressive LLMs. Our code can be accessed here: https://github.com/james-flemings/dp_compress.	Translated: 2024-06-07 00:51:07 Published: 2024-06-05
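Illustrative sketch (not taken from the paper): the two transfer routes described in the abstract above, hard labels from the synthetic text and soft labels from the teacher's output distribution, can be combined into a single per-token loss. The function name `distillation_loss` and the mixing weight `alpha` are assumptions made for illustration only; the DistilDP repository linked above remains the authoritative implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, hard_label, alpha=0.5):
    """Hypothetical per-token loss mixing hard and soft labels.

    hard_label     -- index of the synthetic token (the hard label)
    teacher_logits -- teacher outputs on the same synthetic text (soft labels)
    alpha          -- assumed mixing weight, not a value from the paper
    """
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    # Hard-label term: cross-entropy of the student on the synthetic token.
    hard = -math.log(p_student[hard_label])
    # Soft-label term: KL(teacher || student) over the vocabulary.
    soft = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return alpha * hard + (1.0 - alpha) * soft
```

With `alpha=0.0` only the soft-label term remains, and it vanishes when the student reproduces the teacher's distribution exactly.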
# Pairwise Alignment Improves Graph Domain Adaptation ( http://arxiv.org/abs/2403.01092v2 ) License: see link	Shikun Liu, Deyu Zou, Han Zhao, Pan Li	Graph-based methods, pivotal for label inference over interconnected objects in many real-world applications, often encounter generalization challenges if the graph used for model training differs significantly from the graph used for testing. This work delves into Graph Domain Adaptation (GDA) to address the unique complexities of distribution shifts over graph data, where interconnected data points experience shifts in features, labels, and in particular, connecting patterns. We propose a novel, theoretically principled method, Pairwise Alignment (Pair-Align) to counter graph structure shift by mitigating conditional structure shift (CSS) and label shift (LS). Pair-Align uses edge weights to recalibrate the influence among neighboring nodes to handle CSS and adjusts the classification loss with label weights to handle LS. Our method demonstrates superior performance in real-world applications, including node classification with region shift in social networks, and the pileup mitigation task in particle colliding experiments. For the first application, we also curate the largest dataset by far for GDA studies. Our method shows strong performance in synthetic and other existing benchmark datasets.	Translated: 2024-06-07 00:51:07 Published: 2024-06-05
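Illustrative sketch (not taken from the paper): adjusting a classification loss with label weights, as the abstract above describes for label shift, is commonly done by importance-weighting each class with the ratio of target to source label frequency. The names `label_weights` and `weighted_cross_entropy` are hypothetical, and the target label distribution is assumed to be given; how Pair-Align estimates its weights is not reproduced here.

```python
import math


def label_weights(source_counts, target_probs):
    """Per-class weight w_k = p_target(k) / p_source(k).

    source_counts -- label counts observed in the source (training) graph
    target_probs  -- assumed (or separately estimated) target label distribution
    """
    total = sum(source_counts)
    return [
        target_probs[k] / (source_counts[k] / total)
        for k in range(len(source_counts))
    ]


def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy where each example is scaled by its class weight."""
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

A class that is rare in the source graph but common in the target graph receives a weight above 1, so its errors count for more during training.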
# Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences ( http://arxiv.org/abs/2403.01857v2 ) License: see link	Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla	In this paper, we take a step towards a deeper understanding of learning from human preferences by systematically comparing the paradigm of reinforcement learning from human feedback (RLHF) with the recently proposed paradigm of direct preference optimization (DPO). We focus our attention on the class of loglinear policy parametrization and linear reward functions. In order to compare the two paradigms, we first derive minimax statistical bounds on the suboptimality gap induced by both RLHF and DPO, assuming access to an oracle that exactly solves the optimization problems. We provide a detailed discussion on the relative comparison between the two paradigms, simultaneously taking into account the sample size, policy and reward class dimensions, and the regularization temperature. Moreover, we extend our analysis to the approximate optimization setting and derive exponentially decaying convergence rates for both RLHF and DPO. 
Next, we analyze the setting where the ground-truth reward is not realizable and find that, while RLHF incurs a constant additional error, DPO retains its asymptotically decaying gap by just tuning the temperature accordingly. Finally, we extend our comparison to the Markov decision process setting, where we generalize our results with exact optimization. To the best of our knowledge, we are the first to provide such a comparative analysis for RLHF and DPO.	Translated: 2024-06-07 00:51:07 Published: 2024-06-05
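Illustrative sketch (not taken from the paper): for readers unfamiliar with DPO, the widely used per-pair DPO objective is the negative log-sigmoid of a scaled log-ratio margin between the preferred and the rejected response. The function name `dpo_loss` and the default `beta` are assumptions for illustration; `beta` plays the role of the regularization temperature mentioned in the abstract above, and the paper's loglinear setting is not reproduced here.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_w, logp_l         -- policy log-probabilities of the preferred and
                              rejected responses
    ref_logp_w, ref_logp_l -- the same quantities under the reference policy
    beta                   -- regularization temperature (assumed default)
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), evaluated in a numerically stable way.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference policy the margin is zero and the loss is log 2; raising the preferred response's log-probability relative to the reference lowers the loss.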
# Mitigating Label Noise on Graph via Topological Sample Selection ( http://arxiv.org/abs/2403.01942v2 ) License: see link	Yuhao Wu, Jiangchao Yao, Xiaobo Xia, Jun Yu, Ruxin Wang, Bo Han, Tongliang Liu	Despite the success of the carefully-annotated benchmarks, the effectiveness of existing graph neural networks (GNNs) can be considerably impaired in practice when the real-world graph data is noisily labeled. Previous explorations in sample selection have been demonstrated as an effective way for robust learning with noisy labels; however, the conventional studies focus on i.i.d. data, and when moving to non-i.i.d. graph data and GNNs, two notable challenges remain: (1) nodes located near topological class boundaries are very informative for classification but cannot be successfully distinguished by the heuristic sample selection; (2) there is no available measure that considers the graph topological information to promote sample selection in a graph. To address this dilemma, we propose a $\textit{Topological Sample Selection}$ (TSS) method that boosts the informative sample selection process in a graph by utilising topological information. We theoretically prove that our procedure minimizes an upper bound of the expected risk under target clean distribution, and experimentally show the superiority of our method compared with state-of-the-art baselines.	Translated: 2024-06-07 00:51:07 Published: 2024-06-05
# How Transformers Learn Diverse Attention Correlations in Masked Vision Pretraining ( http://arxiv.org/abs/2403.02233v2 ) License: see link	Yu Huang, Zixin Wen, Yuejie Chi, Yingbin Liang	Masked reconstruction, which predicts randomly masked patches from unmasked ones, has emerged as an important approach in self-supervised pretraining. However, the theoretical understanding of masked pretraining is rather limited, especially for the foundational architecture of transformers. In this paper, to the best of our knowledge, we provide the first end-to-end theoretical guarantee of learning one-layer transformers in masked reconstruction self-supervised pretraining. On the conceptual side, we posit a mechanism of how transformers trained with masked vision pretraining objectives produce empirically observed local and diverse attention patterns, on data distributions with spatial structures that highlight feature-position correlations. On the technical side, our end-to-end characterization of training dynamics in softmax-attention models simultaneously accounts for input and position embeddings, which is developed based on a careful analysis tracking the interplay between feature-wise and position-wise attention correlations.	Translated: 2024-06-07 00:51:07 Published: 2024-06-05
# Quantum Computing: Vision and Challenges ( http://arxiv.org/abs/2403.02240v3 ) License: see link	Sukhpal Singh Gill, Oktay Cetinkaya, Stefano Marrone, Daniel Claudino, David Haunschild, Leon Schlote, Huaming Wu, Carlo Ottaviani, Xiaoyuan Liu, Sree Pragna Machupalli, Kamalpreet Kaur, Priyansh Arora, Ji Liu, Ahmed Farouk, Houbing Herbert Song, Steve Uhlig, Kotagiri Ramamohanarao	The recent development of quantum computing, which uses entanglement, superposition, and other quantum fundamental concepts, can provide substantial processing advantages over traditional computing. These quantum features help solve many complex problems that cannot be solved with conventional computing methods. These problems include modeling quantum mechanics, logistics, chemical-based advances, drug design, statistical science, sustainable energy, banking, reliable communication, and quantum chemical engineering. The last few years have witnessed remarkable advancements in quantum software and algorithm creation and quantum hardware research, which has significantly advanced the prospect of realizing quantum computers. It would be helpful to have comprehensive literature research on this area to grasp the current status and find outstanding problems that require considerable attention from the research community working in the quantum computing industry. 
To better understand quantum computing, this paper examines the foundations and vision based on current research in this area. We discuss cutting-edge developments in quantum computer hardware advancement and subsequent advances in quantum cryptography, quantum software, and high-scalability quantum computers. Many potential challenges and exciting new trends for quantum technology research and development are highlighted in this paper for a broader debate.	Translated: 2024-06-07 00:51:07 Published: 2024-06-05
# Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad ( http://arxiv.org/abs/2403.02648v2 ) License: see link	Sayantan Choudhury, Nazarii Tupitsa, Nicolas Loizou, Samuel Horvath, Martin Takac, Eduard Gorbunov	Adaptive methods are extremely popular in machine learning as they make learning rate tuning less expensive. This paper introduces a novel optimization algorithm named KATE, which presents a scale-invariant adaptation of the well-known AdaGrad algorithm. We prove the scale-invariance of KATE for the case of Generalized Linear Models. Moreover, for general smooth non-convex problems, we establish a convergence rate of $O \left(\frac{\log T}{\sqrt{T}} \right)$ for KATE, matching the best-known ones for AdaGrad and Adam. We also compare KATE to other state-of-the-art adaptive algorithms Adam and AdaGrad in numerical experiments with different problems, including complex machine learning tasks like image classification and text classification on real data. The results indicate that KATE consistently outperforms AdaGrad and matches/surpasses the performance of Adam in all considered scenarios.	Translated: 2024-06-07 00:51:07 Published: 2024-06-05
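Illustrative sketch (not taken from the paper): for context, the baseline that the abstract above builds on, standard per-coordinate AdaGrad, divides each gradient component by the square root of its accumulated squared gradients. This is plain AdaGrad, not KATE; the abstract does not state KATE's update rule, so it is not reproduced. The helper names, the toy objective, and all hyperparameters are assumptions for illustration.

```python
import math


def adagrad_minimize(grad, x0, lr=0.5, steps=200, eps=1e-8):
    """Run standard diagonal AdaGrad and return the final iterate.

    grad -- function mapping a list of floats to its gradient (list of floats)
    x0   -- starting point
    """
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared gradients per coordinate
    for _ in range(steps):
        g = grad(x)
        for i in range(len(x)):
            accum[i] += g[i] * g[i]
            # The square root below is the term the paper's title refers to.
            x[i] -= lr * g[i] / (math.sqrt(accum[i]) + eps)
    return x


def toy_grad(x):
    """Gradient of the toy objective f(x) = 0.5 * (x0^2 + 100 * x1^2)."""
    return [x[0], 100.0 * x[1]]


solution = adagrad_minimize(toy_grad, [1.0, 1.0])
```

On this toy quadratic both coordinates shrink toward zero at nearly the same rate despite the 100x difference in curvature, which is the per-coordinate adaptivity that makes AdaGrad-style methods attractive.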
# Demonstrating Mutual Reinforcement Effect through Information Flow ( http://arxiv.org/abs/2403.02902v2 ) License: see link	Chengguang Gan, Xuzheng He, Qinghao Zhang, Tatsunori Mori	The Mutual Reinforcement Effect (MRE) investigates the synergistic relationship between word-level and text-level classifications in text classification tasks. It posits that the performance of both classification levels can be mutually enhanced. However, this mechanism has not been adequately demonstrated or explained in prior research. To address this gap, we employ information flow analysis to observe and substantiate the MRE theory. Our experiments on six MRE hybrid datasets revealed the presence of MRE in the model and its impact. Additionally, we conducted fine-tuning experiments, whose results were consistent with those of the information flow experiments. The convergence of findings from both experiments corroborates the existence of MRE. Furthermore, we extended the application of MRE to prompt learning, utilizing word-level information as a verbalizer to bolster the model's prediction of text-level classification labels. In our final experiment, the F1-score significantly surpassed the baseline in five out of six datasets, further validating the notion that word-level information enhances the language model's comprehension of the text as a whole.	Translated: 2024-06-07 00:51:07 Published: 2024-06-05
# The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models ( http://arxiv.org/abs/2403.03942v2 ) License: see link	Adithya Bhaskar, Dan Friedman, Danqi Chen	Prior work has found that pretrained language models (LMs) fine-tuned with different random seeds can achieve similar in-domain performance but generalize differently on tests of syntactic generalization. In this work, we show that, even within a single model, we can find multiple subnetworks that perform similarly in-domain, but generalize vastly differently. To better understand these phenomena, we investigate if they can be understood in terms of "competing subnetworks": the model initially represents a variety of distinct algorithms, corresponding to different subnetworks, and generalization occurs when it ultimately converges to one. This explanation has been used to account for generalization in simple algorithmic tasks ("grokking"). Instead of finding competing subnetworks, we find that all subnetworks -- whether they generalize or not -- share a set of attention heads, which we refer to as the heuristic core. Further analysis suggests that these attention heads emerge early in training and compute shallow, non-generalizing features. 
The model learns to generalize by incorporating additional attention heads, which depend on the outputs of the "heuristic" heads to compute higher-level features. Overall, our results offer a more detailed picture of the mechanisms for syntactic generalization in pretrained LMs.	Translated: 2024-06-07 00:51:07 Published: 2024-06-05
# Observation of Nonlinear Response and Onsager Regression in a Photon Bose-Einstein Condensate ( http://arxiv.org/abs/2403.04705v3 ) License: see link	Alexander Sazhin, Vladimir N. Gladilin, Andris Erglis, Göran Hellmann, Frank Vewinger, Martin Weitz, Michiel Wouters, Julian Schmitt	The quantum regression theorem states that the correlations of a system at two different times are governed by the same equations of motion as the temporal response of the average values. Such a relation provides a powerful framework for the investigation of physical systems by establishing a formal connection between intrinsic microscopic behaviour and a macroscopic 'effect' due to an external 'cause'. Measuring the response to a controlled perturbation in this way allows one to determine, for example, structure factors in condensed matter systems as well as other correlation functions of material systems. Here we experimentally demonstrate that the two-time particle number correlations in a photon Bose-Einstein condensate inside a dye-filled microcavity exhibit the same dynamics as the response of the condensate to a sudden perturbation of the dye molecule bath. This confirms the regression theorem for a quantum gas and, moreover, establishes a test of this relation in an unconventional form where the perturbation acts on the bath and only the condensate response is monitored. 
For strong perturbations, we observe nonlinear relaxation dynamics which our microscopic theory relates to the equilibrium fluctuations, thereby extending the regression theorem beyond the regime of linear response. The demonstrated nonlinearity of the condensate-bath system paves the way for studies of novel elementary excitations in lattices of driven-dissipative photon condensates.	Translated: 2024-06-07 00:51:07 Published: 2024-06-05
# Quantum-critical properties of the one- and two-dimensional random transverse-field Ising model from large-scale quantum Monte Carlo simulations ( http://arxiv.org/abs/2403.05223v2 ) License: see link	C. Krämer, J. A. Koziol, A. Langheld, M. Hörmann, K. P. Schmidt	We study the ferromagnetic transverse-field Ising model with quenched disorder at $T = 0$ in one and two dimensions by means of stochastic series expansion quantum Monte Carlo simulations using a rigorous zero-temperature scheme. Using a sample-replication method and averaged Binder ratios, we determine the critical shift and width exponents $\nu_\mathrm{s}$ and $\nu_\mathrm{w}$ as well as unbiased critical points by finite-size scaling. Further, scaling of the disorder-averaged magnetisation at the critical point is used to determine the order-parameter critical exponent $\beta$ and the critical exponent $\nu_{\mathrm{av}}$ of the average correlation length. The dynamic scaling in the Griffiths phase is investigated by measuring the local susceptibility in the disordered phase and the dynamic exponent $z'$ is extracted. By applying various finite-size scaling protocols, we provide an extensive and comprehensive comparison between the different approaches on equal footing. The emphasis on effective zero-temperature simulations resolves several inconsistencies in existing literature.	Translated: 2024-06-07 00:51:07 Published: 2024-06-05
# Prepared for the Worst: A Learning-Based Adversarial Attack for Resilience Analysis of the ICP Algorithm ( http://arxiv.org/abs/2403.05666v2 ) License: see link	Ziyu Zhang, Johann Laconte, Daniil Lisus, Timothy D. Barfoot	This paper presents a novel method to assess the resilience of the Iterative Closest Point (ICP) algorithm via deep-learning-based attacks on lidar point clouds. For safety-critical applications such as autonomous navigation, ensuring the resilience of algorithms prior to deployment is of utmost importance. The ICP algorithm has become the standard for lidar-based localization. However, the pose estimate it produces can be greatly affected by corruption in the measurements. Corruption can arise from a variety of scenarios such as occlusions, adverse weather, or mechanical issues in the sensor. Unfortunately, the complex and iterative nature of ICP makes assessing its resilience to corruption challenging. While there have been efforts to create challenging datasets and develop simulations to evaluate the resilience of ICP empirically, our method focuses on finding the maximum possible ICP pose error using perturbation-based adversarial attacks. The proposed attack induces significant pose errors on ICP and outperforms baselines more than 88% of the time across a wide range of scenarios. 
As an example application, we demonstrate that our attack can be used to identify areas on a map where ICP is particularly vulnerable to corruption in the measurements.	Translated: 2024-06-07 00:51:07 Published: 2024-06-05
# Distribution-Aware Data Expansion with Diffusion Models ( http://arxiv.org/abs/2403.06741v2 ) License: see link	Haowei Zhu, Ling Yang, Jun-Hai Yong, Hongzhi Yin, Jiawei Jiang, Meng Xiao, Wentao Zhang, Bin Wang	The scale and quality of a dataset significantly impact the performance of deep models. However, acquiring large-scale annotated datasets is both a costly and time-consuming endeavor. To address this challenge, dataset expansion technologies aim to automatically augment datasets, unlocking the full potential of deep models. Current data expansion techniques include image transformation and image synthesis methods. Transformation-based methods introduce only local variations, leading to limited diversity. In contrast, synthesis-based methods generate entirely new content, greatly enhancing informativeness. However, existing synthesis methods carry the risk of distribution deviations, potentially degrading model performance with out-of-distribution samples. In this paper, we propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model. 
DistDiff constructs hierarchical prototypes to approximate the real data distribution, optimizing latent data points within diffusion models with hierarchical energy guidance. We demonstrate its capability to generate distribution-consistent samples, significantly improving data expansion tasks. DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data. Furthermore, our approach consistently outperforms existing synthesis-based techniques and demonstrates compatibility with widely adopted transformation-based augmentation methods. Additionally, the expanded dataset exhibits robustness across various architectural frameworks. Our code is available at https://github.com/haoweiz23/DistDiff	Translated: 2024-06-07 00:40:47 Published: 2024-06-05
# Convergence of Some Convex Message Passing Algorithms to a Fixed Point ( http://arxiv.org/abs/2403.07004v2 ) License: see link	Vaclav Voracek, Tomas Werner	A popular approach to the MAP inference problem in graphical models is to minimize an upper bound obtained from a dual linear programming or Lagrangian relaxation by (block-)coordinate descent. This is also known as convex/convergent message passing; examples are max-sum diffusion and sequential tree-reweighted message passing (TRW-S). Convergence properties of these methods are currently not fully understood. They have been proved to converge to the set characterized by local consistency of active constraints, with unknown convergence rate; however, it was not clear if the iterates converge at all (to any point). We prove a stronger result (conjectured before but never proved): the iterates converge to a fixed point of the method. Moreover, we show that the algorithm terminates within $\mathcal{O}(1/\varepsilon)$ iterations. We first prove this for a version of coordinate descent applied to a general piecewise-affine convex objective. Then we show that several convex message passing methods are special cases of this method. Finally, we show that a slightly different version of coordinate descent can cycle.	Translated: 2024-06-07 00:40:47 Published: 2024-06-05
# Randomized Principal Component Analysis for Hyperspectral Image Classification ( http://arxiv.org/abs/2403.09117v2 ) License: see link	Mustafa Ustuner	The high-dimensional feature space of the hyperspectral imagery poses major challenges to the processing and analysis of the hyperspectral data sets. In such a case, dimensionality reduction is necessary to decrease the computational complexity. The random projections open up new ways of dimensionality reduction, especially for large data sets. In this paper, the principal component analysis (PCA) and randomized principal component analysis (R-PCA) for the classification of hyperspectral images using support vector machines (SVM) and light gradient boosting machines (LightGBM) have been investigated. In this experimental research, the number of features was reduced to 20 and 30 for classification of two hyperspectral datasets (Indian Pines and Pavia University). The experimental results demonstrated that PCA outperformed R-PCA with SVM on both datasets, whereas the two achieved similar accuracy with LightGBM. The highest classification accuracies were obtained as 0.9925 and 0.9639 by LightGBM with original features for the Pavia University and Indian Pines, respectively.	Translated: 2024-06-07 00:40:47 Published: 2024-06-05
# Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction ( http://arxiv.org/abs/2403.09560v2 ) License: see link	He Zhang, Chang Liu, Zun Wang, Xinran Wei, Siyuan Liu, Nanning Zheng, Bin Shao, Tie-Yan Liu	Predicting the mean-field Hamiltonian matrix in density functional theory is a fundamental formulation to leverage machine learning for solving molecular science problems. Yet, its applicability is limited by insufficient labeled data for training. In this work, we highlight that Hamiltonian prediction possesses a self-consistency principle, based on which we propose self-consistency training, an exact training method that does not require labeled data. It distinguishes the task from predicting other molecular properties by the following benefits: (1) it enables the model to be trained on a large amount of unlabeled data, hence addresses the data scarcity challenge and enhances generalization; (2) it is more efficient than running DFT to generate labels for supervised training, since it amortizes DFT calculation over a set of queries. We empirically demonstrate the better generalization in data-scarce and out-of-distribution scenarios, and the better efficiency over DFT labeling. These benefits push forward the applicability of Hamiltonian prediction to an ever-larger scale.	Translated: 2024-06-07 00:40:47 Published: 2024-06-05
# AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors ( http://arxiv.org/abs/2403.09976v2 ) License: see link	Yucen Wang, Shenghua Wan, Le Gan, Shuai Feng, De-Chuan Zhan	Model-based methods have significantly contributed to distinguishing task-irrelevant distractors for visual control. However, prior research has primarily focused on heterogeneous distractors like noisy background videos, leaving homogeneous distractors that closely resemble controllable agents largely unexplored, which poses significant challenges to existing methods. To tackle this problem, we propose Implicit Action Generator (IAG) to learn the implicit actions of visual distractors, and present a new algorithm named implicit Action-informed Diverse visual Distractors Distinguisher (AD3), that leverages the action inferred by IAG to train separated world models. Implicit actions effectively capture the behavior of background distractors, aiding in distinguishing the task-irrelevant components, and the agent can optimize the policy within the task-relevant state space. Our method achieves superior performance on various visual control tasks featuring both heterogeneous and homogeneous distractors. The indispensable role of implicit actions learned by IAG is also empirically validated.	Translated: 2024-06-07 00:40:47 Published: 2024-06-05
# RAFT:言語モデルをドメイン固有RAGに適応させる RAFT: Adapting Language Model to Domain Specific RAG ( http://arxiv.org/abs/2403.10131v2 ) ライセンス: Link先を確認	Tianjun Zhang, Shishir G. Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez,	(参考訳) 大規模なテキストデータのコーパス上でのLLM(Large Language Models)の事前学習は、現在では標準パラダイムとなっている。下流の多くのアプリケーションでこれらのLLMを使用する場合、RAGベースのプロンプティングや微調整によって、事前訓練されたモデルに新しい知識(例えば、時間クリティカルなニュースやプライベートなドメイン知識)を追加で組み込むことが一般的である。しかし、モデルがそのような新しい知識を得るための最適な方法論は、未解決の問題である。本稿では、ドメイン内の「オープンブック」設定において、モデルが質問に答える能力を改善するためのトレーニングレシピであるRetrieval Augmented FineTuning(RAFT)を提案する。 RAFTでは、質問と検索された文書の集合が与えられたとき、質問に答えるのに役に立たない文書(ディストラクタ文書と呼ぶ)を無視するようにモデルを訓練する。 RAFTは、質問に答える助けとなる関連文書から正しいシーケンスを逐語的に引用することで、これを達成する。これとRAFTの思考連鎖(chain-of-thought)型の応答を組み合わせることで、モデルの推論能力が向上する。ドメイン固有のRAGにおいて、RAFTはPubMed、HotpotQA、Gorillaデータセット全体にわたってモデルのパフォーマンスを一貫して改善し、事前トレーニングされたLLMをドメイン内RAG向けに改善するためのポストトレーニングのレシピを提供する。 RAFTのコードとデモはgithub.com/ShishirPatil/gorillaでオープンソース化されている。 Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based-prompting, or fine-tuning. However, the optimal methodology for the model to gain such new knowledge remains an open question. In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in "open-book" in-domain settings. In RAFT, given a question and a set of retrieved documents, we train the model to ignore those documents that don't help in answering the question, which we call distractor documents. RAFT accomplishes this by citing verbatim the right sequence from the relevant document that would help answer the question. This, coupled with RAFT's chain-of-thought-style response, helps improve the model's ability to reason. 
In domain-specific RAG, RAFT consistently improves the model's performance across PubMed, HotpotQA, and Gorilla datasets, presenting a post-training recipe to improve pre-trained LLMs to in-domain RAG. RAFT's code and demo are open-sourced at github.com/ShishirPatil/gorilla.	翻訳日:2024-06-07 00:40:47 公開日:2024-06-05
# EffiVED:テキスト指示拡散モデルによる効率的なビデオ編集 EffiVED:Efficient Video Editing via Text-instruction Diffusion Models ( http://arxiv.org/abs/2403.11568v2 ) ライセンス: Link先を確認	Zhenghao Zhang, Zuozhuo Dai, Long Qin, Weizhi Wang,	(参考訳) 大規模なテキスト・ビデオ・モデルは目覚ましい能力を示しているが、ビデオ編集における直接の応用は、利用可能なデータセットが限られているため、依然として困難である。現在のビデオ編集法では、拡散モデルの微調整や、高忠実度な編集を保証するための特定の反転最適化が一般的である。本稿では,命令誘導ビデオ編集を直接サポートする効率的な拡散ベースモデルであるEffiVEDを紹介する。これを実現するために,拡張と基本的視覚言語技術を利用して,ビデオ編集ペアを収集する2つの効率的なワークフローを提案する。これらのワークフローは、膨大な画像編集データセットとオープンワールドビデオを、EffiVEDをトレーニングするための高品質なデータセットに変換する。実験結果から,EffiVEDは高品質な編集ビデオを生成するだけでなく,高速に実行可能であることがわかった。最後に,データ収集手法が編集性能を大幅に向上し,ビデオ編集データの不足に対処できることを実証する。コードはhttps://github.com/alibaba/EffiVEDにある。 Large-scale text-to-video models have shown remarkable abilities, but their direct application in video editing remains challenging due to limited available datasets. Current video editing methods commonly require per-video fine-tuning of diffusion models or specific inversion optimization to ensure high-fidelity edits. In this paper, we introduce EffiVED, an efficient diffusion-based model that directly supports instruction-guided video editing. To achieve this, we present two efficient workflows to gather video editing pairs, utilizing augmentation and fundamental vision-language techniques. These workflows transform vast image editing datasets and open-world videos into a high-quality dataset for training EffiVED. Experimental results reveal that EffiVED not only generates high-quality editing videos but also executes rapidly. Finally, we demonstrate that our data collection method significantly improves editing performance and can potentially tackle the scarcity of video editing data. Code can be found at https://github.com/alibaba/EffiVED.	翻訳日:2024-06-07 00:40:47 公開日:2024-06-05
# オブジェクトローカライゼーション Few-shot Object Localization ( http://arxiv.org/abs/2403.12466v3 ) ライセンス: Link先を確認	Yunhan Ren, Bo Li, Chengyang Zhang, Yong Zhang, Baocai Yin,	(参考訳) 既存のオブジェクトローカライゼーション手法は、モデル最適化のために大量のラベル付きデータに依存するため、特定のオブジェクトのクラスを特定するように調整されている。しかし、多くの実世界のシナリオにおいて大量のラベル付きデータを取得することは困難であり、ローカライゼーションモデルの広範な適用を著しく制限する。そこで本研究では,Few-Shot Object Localization (FSOL, Few-Shot Object Localization) という,限られたサンプルを用いて高精度なローカライゼーションを実現する新しいタスクを定義した。本課題は、少数のラベル付きサポートサンプルを利用して、対応する画像内のオブジェクトの位置情報をクエリすることで、一般化されたオブジェクトのローカライゼーションを実現する。この分野を推し進めるために,我々は革新的な高性能ベースラインモデルを設計する。このモデルは、デュアルパス機能拡張モジュールを統合して、サポートイメージとクエリイメージ間の形状関連と勾配差を強化するとともに、セルフクエリモジュールを使用して、特徴マップとクエリイメージの関係を探索する。実験の結果,FSOLタスクにおけるアプローチの大幅な性能向上が示され,さらなる研究のための効率的なベンチマークが確立された。すべてのコードとデータはhttps://github.com/Ryh1218/FSOLで公開されている。 Existing object localization methods are tailored to locate specific classes of objects, relying heavily on abundant labeled data for model optimization. However, acquiring large amounts of labeled data is challenging in many real-world scenarios, significantly limiting the broader application of localization models. To bridge this research gap, this paper defines a novel task named Few-Shot Object Localization (FSOL), which aims to achieve precise localization with limited samples. This task achieves generalized object localization by leveraging a small number of labeled support samples to query the positional information of objects within corresponding images. To advance this field, we design an innovative high-performance baseline model. This model integrates a dual-path feature augmentation module to enhance shape association and gradient differences between supports and query images, alongside a self query module to explore the association between feature maps and query images. Experimental results demonstrate a significant performance improvement of our approach in the FSOL task, establishing an efficient benchmark for further research. 
All codes and data are available at https://github.com/Ryh1218/FSOL.	翻訳日:2024-06-07 00:40:47 公開日:2024-06-05
# 音声分類のための可聴マップ Listenable Maps for Audio Classifiers ( http://arxiv.org/abs/2403.13086v2 ) ライセンス: Link先を確認	Francesco Paissan, Mirco Ravanelli, Cem Subakan,	(参考訳) さまざまなタスクにわたるディープラーニングモデルの素晴らしいパフォーマンスにもかかわらず、その複雑さは解釈に挑戦する。この課題は、音声信号の伝達が本質的に困難になる場合に特に顕著である。この問題に対処するために,音声分類のためのリスナブルマップ (L-MAC) を導入し,忠実で聞きやすい解釈を生成するポストホック解釈法を提案する。 L-MACは、事前訓練された分類器の上のデコーダを使用して、入力オーディオの関連部分をハイライトするバイナリマスクを生成する。我々は、マスクアウト部分のモデル出力の確率を最小化しつつ、音声のマスクイン部分における分類器決定の信頼性を最大化する損失関数でデコーダを訓練する。領域内および領域外データの定量的評価は、L-MACが複数の勾配およびマスキングに基づく手法よりも一貫して忠実な解釈を生成することを示す。さらに,ユーザスタディでは,提案手法が生成した解釈を平均的に好んでいることを確認した。 Despite the impressive performance of deep learning models across diverse tasks, their complexity poses challenges for interpretation. This challenge is particularly evident for audio signals, where conveying interpretations becomes inherently difficult. To address this issue, we introduce Listenable Maps for Audio Classifiers (L-MAC), a posthoc interpretation method that generates faithful and listenable interpretations. L-MAC utilizes a decoder on top of a pretrained classifier to generate binary masks that highlight relevant portions of the input audio. We train the decoder with a loss function that maximizes the confidence of the classifier decision on the masked-in portion of the audio while minimizing the probability of model output for the masked-out portion. Quantitative evaluations on both in-domain and out-of-domain data demonstrate that L-MAC consistently produces more faithful interpretations than several gradient and masking-based methodologies. Furthermore, a user study confirms that, on average, users prefer the interpretations generated by the proposed technique.	翻訳日:2024-06-07 00:40:47 公開日:2024-06-05
# Cobra: 効率的な推論のためのマルチモーダル大言語モデルへのMambaの拡張 Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference ( http://arxiv.org/abs/2403.14520v3 ) ライセンス: Link先を確認	Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang,	(参考訳) 近年、マルチモーダル大規模言語モデル (MLLM) の様々な分野への応用が目覚ましい成功を収めている。しかし、多くの下流タスクの基礎モデルとして、現在のMLLMは、効率の劣る2次の計算複雑性を持つ、よく知られたトランスフォーマーネットワークで構成されている。このような基本モデルの効率を改善するために、線形計算複雑性のMLLMであるCobraを提案する。具体的には、Cobraは効率的なMamba言語モデルを視覚モダリティに統合する。さらに、効果的なマルチモーダルMambaを作成するための様々なモーダル融合スキームを探索し、検討する。大規模な実験により、(1) Cobraは、LLaVA-Phi、TinyLLaVA、MobileVLM v2など、計算効率の高い現在の最先端手法と比べて極めて競争力のある性能を達成し、Cobraの線形逐次モデリングにより速度も高速であることが示された。 (2) 興味深いことに、クローズドセットの難しい予測ベンチマークの結果から、Cobraは視覚錯覚の克服や空間的関係の判断において良好に機能することが示された。 (3) 特に、Cobraはパラメータ数の約43%でLLaVAに匹敵するパフォーマンスを実現している。我々は、Cobraのすべてのコードをオープンソースにし、提案手法がMLLMにおける複雑性問題の今後の研究を促進することを期待する。プロジェクトページは、https://sites.google.com/view/cobravlm で公開されている。 In recent years, the application of multimodal large language models (MLLM) in various fields has achieved remarkable success. However, as the foundation model for many downstream tasks, current MLLMs are composed of the well-known Transformer network, which has a less efficient quadratic computation complexity. To improve the efficiency of such basic models, we propose Cobra, a linear computational complexity MLLM. Specifically, Cobra integrates the efficient Mamba language model into the visual modality. Moreover, we explore and study various modal fusion schemes to create an effective multi-modal Mamba. Extensive experiments demonstrate that (1) Cobra achieves extremely competitive performance with current computationally efficient state-of-the-art methods, e.g., LLaVA-Phi, TinyLLaVA, and MobileVLM v2, and has faster speed due to Cobra's linear sequential modeling. (2) Interestingly, the results of closed-set challenging prediction benchmarks show that Cobra performs well in overcoming visual illusions and spatial relationship judgments. 
(3) Notably, Cobra even achieves comparable performance to LLaVA with about 43% of the number of parameters. We will make all codes of Cobra open-source and hope that the proposed method can facilitate future research on complexity problems in MLLM. Our project page is available at: https://sites.google.com/view/cobravlm.	翻訳日:2024-06-07 00:40:47 公開日:2024-06-05
# 単一温度計による2つの温度測定 Measuring two temperatures using a single thermometer ( http://arxiv.org/abs/2403.15186v2 ) ライセンス: Link先を確認	Harshit Verma, Fabio Costa,	(参考訳) 一つの温度計で2つの温度を同時に測定することは可能か? 一般的な状況では、温度計が一度に1つの浴のみと相互作用し、相互作用によって完全な熱化がもたらされるが、温度計の最終状態が最初の浴の温度から独立しているため、これは明らかに不可能である。本研究では,この課題が量子制御の助けを借りて実現可能であることを示す。特に、複数の量子自由度(DoF)を持つ複合粒子を温度センサとみなし、内部のDoFと呼ばれるDoFの1つが局所的な温度に影響を受け、温度計として機能する一方、外部のDoFと呼ばれる別のDoFは量子制御される。複合粒子中の上記DoF間の絡み合いを2温度温度測定に利用し、外部のDoFを量子的重ね合わせで調製し、内部のDoFを2つの局所温度に曝露した。我々は、マッハ・ツェンダー型干渉計や量子チャネルの適用順序を量子的に制御できる量子スイッチで用いられる粒子を同時に2つの温度を推定できることを示す。これらの設定のそれぞれについて,マルチパラメータCramér-Rao境界による推定温度の分散を求め,推定した2つの温度の総分散の範囲に基づいてそれらの性能を比較した。推定温度の総分散に基づいて全ての設定をベンチマークすると、quditプローブを用いた量子スイッチが他の設定より優れていることが分かる。プローブを量子ビットに制限すると、量子スイッチはマッハ・ツェンダー型干渉計と同等に機能する。 We consider the question: Is it possible to measure two temperatures simultaneously using a single thermometer? Under common circumstances, where the thermometer can interact with only one bath at a time and the interaction leads to complete thermalization, this is clearly impossible because the final state of the thermometer would be independent of the temperature of the first bath. In this work, we show that this task can indeed be accomplished with the assistance of quantum control. In particular, we consider a composite particle with multiple quantum degrees of freedom (DoF) as a temperature sensor, where one of the DoF -- termed the internal DoF -- is susceptible to the local temperature, thereby functioning as a thermometer, whereas another DoF -- termed the external DoF -- is quantum-controlled. We leverage the entanglement between the aforementioned DoF in a composite particle for two-temperature thermometry by preparing the external DoF in a quantum superposition, exposing the internal DoF to two local temperatures. 
We show that such a particle used in a Mach-Zehnder type interferometer, or a quantum switch -- which allows quantum control over the order of application of quantum channels -- can be used to estimate two temperatures simultaneously, thus affirming our main proposition. For each of these setups, we obtain the variance in the estimated temperatures through the multi-parameter Cramér-Rao bound, and compare their performances based on the range of total variance of the two temperatures estimated. On benchmarking all the setups based on the total variance of the estimated temperatures, we find that a quantum switch with a qudit probe outperforms other setups. On restricting our probe to be a qubit, we find that the quantum switch performs as well as a Mach-Zehnder type interferometer.	翻訳日:2024-06-07 00:40:47 公開日:2024-06-05
# すべての注意が必要なわけではない:マルチモーダル大言語モデルのためのパラメータおよび計算効率の高い転移学習 Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models ( http://arxiv.org/abs/2403.15226v2 ) ライセンス: Link先を確認	Qiong Wu, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji,	(参考訳) 本稿では,マルチモーダル大規模言語モデル(MLLM)のための新しいパラメータと計算効率のチューニング手法を提案し,その手法をEAS(Efficient Attention Skipping)と呼ぶ。具体的には、MLLMの主な計算オーバーヘッドであるマルチヘッドアテンション(MHA)が、ダウンストリームタスクに対してしばしば冗長であることを明らかにする。この観測に基づいて、EASは注意冗長性を評価し、重要でないMHAをスキップして推論を高速化する。また,EASの注意スキップを支え、パラメータ効率を維持するために、新しい情報伝搬アダプタ (PIA) を提案する。PIAはフィードフォワードネットワーク (FFN) に再パラメータ化でき、追加の遅延をゼロにできる。 EASを検証するために、最近提案されたLaVINと呼ばれるMLLMと、METERと呼ばれる古典的なVL事前学習モデルに適用し、一連のベンチマークで広範な実験を行う。実験により、EASは高い性能とパラメータ効率を維持するだけでなく、推論を大幅に高速化することが示された。例えば、LaVIN-EASはScienceQA上で89.98%の精度を得ることができ、LaVINに比べて推論を2.2倍高速化できる。 In this paper, we propose a novel parameter and computation efficient tuning method for Multi-modal Large Language Models (MLLMs), termed Efficient Attention Skipping (EAS). Concretely, we first reveal that multi-head attentions (MHAs), the main computational overhead of MLLMs, are often redundant to downstream tasks. Based on this observation, EAS evaluates the attention redundancy and skips the less important MHAs to speed up inference. Besides, we also propose a novel propagation-of-information adapter (PIA) to serve the attention skipping of EAS and keep parameter efficiency, which can be further re-parameterized into feed-forward networks (FFNs) for zero-extra latency. To validate EAS, we apply it to a recently proposed MLLM called LaVIN and a classic VL pre-trained model called METER, and conduct extensive experiments on a set of benchmarks. The experiments show that EAS not only retains high performance and parameter efficiency, but also greatly speeds up inference. For instance, LaVIN-EAS can obtain 89.98% accuracy on ScienceQA while speeding up inference by 2.2 times relative to LaVIN.	翻訳日:2024-06-07 00:40:47 公開日:2024-06-05
# EgoExoLearn: 実世界の手続き活動の非同期的エゴとエクソ中心の視点をブリッジするデータセット EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World ( http://arxiv.org/abs/2403.16182v2 ) ライセンス: Link先を確認	Yifei Huang, Guo Chen, Jilan Xu, Mingfang Zhang, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao,	(参考訳) 他人の活動を自分の視点にマッピングできることは、非常に若い頃からの基本的な人間のスキルである。 EgoExoLearnは、デモビデオによってガイドされたタスクを実行する際に、個人がエゴセントリックなビデオを記録するプロセスに続く人間のデモをエミュレートする大規模なデータセットである。 EgoExoLearnは、日常生活のシナリオや専門的な研究室で捉えた120時間にわたる、エゴセントリックでデモ的なビデオデータを含んでいる。ビデオとともに、高品質な視線データを記録し、より詳細なマルチモーダルアノテーションを提供し、異なる視点から非同期手続きアクションをブリッジする人間の能力をモデル化するための遊び場を定式化します。この目的のために、クロスビューアソシエーション、クロスビューアクションプランニング、クロスビュー参照スキルアセスメントなどのベンチマークを詳細な分析とともに提示する。 EgoExoLearnは、ビューをまたいでアクションをブリッジするための重要なリソースとして機能し、現実世界で人間を観察してシームレスに学習できるAIエージェントを作るための道を開くことができると期待している。コードとデータは、https://github.com/OpenGVLab/EgoExoLearnで参照できる。 Being able to map the activities of others into one's own point of view is one fundamental human skill even from a very early age. Taking a step toward understanding this human ability, we introduce EgoExoLearn, a large-scale dataset that emulates the human demonstration following process, in which individuals record egocentric videos as they execute tasks guided by demonstration videos. Focusing on the potential applications in daily assistance and professional support, EgoExoLearn contains egocentric and demonstration video data spanning 120 hours captured in daily life scenarios and specialized laboratories. Along with the videos we record high-quality gaze data and provide detailed multimodal annotations, formulating a playground for modeling the human ability to bridge asynchronous procedural actions from different viewpoints. To this end, we present benchmarks such as cross-view association, cross-view action planning, and cross-view referenced skill assessment, along with detailed analysis. 
We expect EgoExoLearn can serve as an important resource for bridging the actions across views, thus paving the way for creating AI agents capable of seamlessly learning by observing humans in the real world. Code and data can be found at: https://github.com/OpenGVLab/EgoExoLearn	翻訳日:2024-06-07 00:40:47 公開日:2024-06-05
# 条件付きワッサーシュタイン距離とベイジアンOTフローマッチングへの応用 Conditional Wasserstein Distances with Applications in Bayesian OT Flow Matching ( http://arxiv.org/abs/2403.18705v2 ) ライセンス: Link先を確認	Jannis Chemseddine, Paul Hagemann, Gabriele Steidl, Christian Wald,	(参考訳) 逆問題において、多くの条件生成モデルは、合同測度と学習近似との距離を最小化することにより、後続測度を近似する。このアプローチは、クルバック-リーブラー発散の場合には後方測度間の距離も制御するが、ワッサーシュタイン距離については一般には成り立たない。本稿では,事後分布間の期待ワッサーシュタイン距離と等しくなる、制限された結合の集合を通じて,条件付きワッサーシュタイン距離を導入する。興味深いことに、条件付きワッサーシュタイン-1フローの双対定式化は、条件付きワッサーシュタインGANの文献における損失に非常に自然な形で類似している。我々は条件付きワッサーシュタイン距離の理論的性質を導出し、対応する測地線と速度場、およびフローのODEを特徴づける。その後、条件付きワッサーシュタイン距離を緩和することにより速度場を近似する。これに基づいて,ベイズ逆問題の解法としてOTフローマッチングの拡張を提案し,逆問題とクラス条件画像生成における数値的優位性を示す。 In inverse problems, many conditional generative models approximate the posterior measure by minimizing a distance between the joint measure and its learned approximation. While this approach also controls the distance between the posterior measures in the case of the Kullback--Leibler divergence, this does not hold in general for the Wasserstein distance. In this paper, we introduce a conditional Wasserstein distance via a set of restricted couplings that equals the expected Wasserstein distance of the posteriors. Interestingly, the dual formulation of the conditional Wasserstein-1 flow resembles losses in the conditional Wasserstein GAN literature in a quite natural way. We derive theoretical properties of the conditional Wasserstein distance, characterize the corresponding geodesics and velocity fields as well as the flow ODEs. Subsequently, we propose to approximate the velocity fields by relaxing the conditional Wasserstein distance. Based on this, we propose an extension of OT Flow Matching for solving Bayesian inverse problems and demonstrate its numerical advantages on an inverse problem and class-conditional image generation.	翻訳日:2024-06-07 00:30:45 公開日:2024-06-05
# インストラクティブ・コントラスト・デコーディングを用いた大規模視覚言語モデルにおける幻覚の緩和 Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding ( http://arxiv.org/abs/2403.18715v2 ) ライセンス: Link先を確認	Xintong Wang, Jingheng Pan, Liang Ding, Chris Biemann,	(参考訳) LVLM(Large Vision-Language Models)は、視覚入力からコンテキスト的に詳細で一貫性のある応答を生成するのに、ますます適している。しかし,マルチモーダルな意思決定やオープンエンドジェネレーションにおけるそれらの応用は,生成したテキストが視覚内容の不正確な表現をする幻覚の顕著な頻度によって妨げられる。そこで本研究では,LVLM推論における幻覚の低減を目的とした,命令コントラスト復号法(ICD)を提案する。本手法は,マルチモーダル核融合モジュールにおいて,外乱指示が幻覚を著しく悪化させるという観察に着想を得たものである。 ICDは、標準および命令障害からの分布を対比し、アライメントの不確実性を増大させ、元の分布から幻覚概念を効果的に抽出する。識別ベンチマーク (POPE, MME) と生成ベンチマーク (LLaVa-Bench) の総合的な実験を通じて, ICDは対象レベルの幻覚と属性レベルの幻覚の両方を著しく緩和することを示した。さらに,本手法は幻覚だけでなく,LVLMの認識能力や認識能力を著しく向上させる。 Large Vision-Language Models (LVLMs) are increasingly adept at generating contextually detailed and coherent responses from visual inputs. However, their application in multimodal decision-making and open-ended generation is hindered by a notable rate of hallucinations, where generated text inaccurately represents the visual contents. To address this issue, this paper introduces the Instruction Contrastive Decoding (ICD) method, a novel approach designed to reduce hallucinations during LVLM inference. Our method is inspired by our observation that what we call disturbance instructions significantly exacerbate hallucinations in multimodal fusion modules. ICD contrasts distributions from standard and instruction disturbance, thereby increasing alignment uncertainty and effectively subtracting hallucinated concepts from the original distribution. Through comprehensive experiments on discriminative benchmarks (POPE and MME) and a generative benchmark (LLaVa-Bench), we demonstrate that ICD significantly mitigates both object-level and attribute-level hallucinations. Moreover, our method not only addresses hallucinations but also significantly enhances the general perception and recognition capabilities of LVLMs.	
翻訳日:2024-06-07 00:30:45 公開日:2024-06-05
# 潜在機能モジュール性による自己教師付き解釈可能なエンドツーエンド学習 Self-Supervised Interpretable End-to-End Learning via Latent Functional Modularity ( http://arxiv.org/abs/2403.18947v2 ) ライセンス: Link先を確認	Hyunki Seong, David Hyunchul Shim,	(参考訳) 我々は,自己教師型かつ解釈可能なエンドツーエンド学習のための,新しい機能的モジュールネットワークであるMoNetを紹介する。 MoNetは、機能的モジュール性を潜在誘導型コントラスト損失関数とともに活用することにより、タスクレベルの監督を必要とせずに、潜在空間におけるタスク固有の意思決定プロセスを効率的に学習する。さらに,本手法は,センサモータ制御性能を損なうことなく,エンドツーエンド推論の解釈可能性を高めるオンラインのポストホックな説明可能性アプローチを取り入れている。現実世界の屋内環境では、MoNetは効果的な視覚自律ナビゲーションを示し、タスク特異性分析においてベースラインモデルを7%から28%上回っている。さらに,知覚の顕著性マップと潜在決定ベクトルのポストホック解析により,ネットワークの解釈可能性について検討する。このことは、ロボット学習への説明可能な人工知能の取り入れに関する貴重な洞察を与え、知覚的視点と行動的視点の両方を包含する。追加資料はhttps://sites.google.com/view/monet-lgc で入手できる。 We introduce MoNet, a novel functionally modular network for self-supervised and interpretable end-to-end learning. By leveraging its functional modularity with a latent-guided contrastive loss function, MoNet efficiently learns task-specific decision-making processes in latent space without requiring task-level supervision. Moreover, our method incorporates an online, post-hoc explainability approach that enhances the interpretability of end-to-end inferences without compromising sensorimotor control performance. In real-world indoor environments, MoNet demonstrates effective visual autonomous navigation, outperforming baseline models by 7% to 28% in task specificity analysis. We further explore the interpretability of our network through post-hoc analysis of perceptual saliency maps and latent decision vectors. This provides valuable insights into the incorporation of explainable artificial intelligence into robotic learning, encompassing both perceptual and behavioral perspectives. Supplementary materials are available at https://sites.google.com/view/monet-lgc.	翻訳日:2024-06-07 00:30:45 公開日:2024-06-05
# コード大言語モデルのコード比較チューニング Code Comparison Tuning for Code Large Language Models ( http://arxiv.org/abs/2403.19121v2 ) ライセンス: Link先を確認	Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu,	(参考訳) コード比較チューニング(Code Comparison Tuning, CCT)は,コード大言語モデル(Code LLM)の簡易かつ効果的なチューニング手法である。具体的には、トークンレベルとシーケンスレベルの両方において、比較の概念をインストラクションチューニングに統合し、コード内のわずかなずれでもモデルを識別できるようにする。元のコードと手動で追加したコードエラーを含む誤ったバージョンを比較するために、トークンレベルの詳細な比較にトークンレベルの優先度損失を用いる。さらに、コードセグメントを組み合わせて、シーケンスレベルの比較のための新しいインストラクションチューニングサンプルを作成し、モデルのバグ修正機能を強化します。 HumanEvalFix ベンチマークによる実験結果から,CCT はパス@1 スコアの命令チューニングを,多種多様なコード LLM で最大 4 ポイント超えた結果が得られた。 We present Code Comparison Tuning (CCT), a simple and effective tuning method for code large language models (Code LLMs) to better handle subtle code errors. Specifically, we integrate the concept of comparison into instruction tuning, both at the token and sequence levels, enabling the model to discern even the slightest deviations in code. To compare the original code with an erroneous version containing manually added code errors, we use token-level preference loss for detailed token-level comparisons. Additionally, we combine code segments to create a new instruction tuning sample for sequence-level comparisons, enhancing the model's bug-fixing capability. Experimental results on the HumanEvalFix benchmark show that CCT surpasses instruction tuning in pass@1 scores by up to 4 points across diverse code LLMs, and extensive analysis demonstrates the effectiveness of our method.	翻訳日:2024-06-07 00:30:45 公開日:2024-06-05
# NeuroPrune: 大規模言語モデルのためのニューロインスパイアされたトポロジカルスパーストレーニングアルゴリズム NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models ( http://arxiv.org/abs/2404.01306v3 ) ライセンス: Link先を確認	Amit Dhurandhar, Tejaswini Pedapati, Ronny Luss, Soham Dan, Aurelie Lozano, Payel Das, Georgios Kollias,	(参考訳) トランスフォーマーベースの言語モデルは、様々なタスクにおける印象的なパフォーマンスのため、自然言語処理(NLP)においてユビキタスになっている。しかし、高価なトレーニングや推論は、その幅広い適用性にとって依然として重大な障害である。モデルアーキテクチャのさまざまなレベルでスパース性を課すことは、スケーリングと効率の問題に対処する上で有望であるが、スパース性がネットワークトポロジにどのように影響するかについては、依然として理解が結びついていない。脳の神経ネットワークに着想を得て、我々はネットワークトポロジーのレンズを通してスパース化のアプローチを探索する。具体的には、優先的アタッチメントや冗長なシナプスの刈り込みなど、生物学的ネットワークで見られるメカニズムを活用し、原理に基づくモデル非依存のスパース化アプローチが、性能の最適化を唯一の目的としていないにもかかわらず、分類(自然言語推論など)と生成(要約、機械翻訳)の両方にまたがる多様なNLPタスクにおいて、高性能かつ効率的であることを示す。 NeuroPruneは、性能面でベースラインと競合し(あるいは、時として優れ)、所定のスパース性レベルに対するトレーニング時間において最大10倍高速になり、同時に多くのケースにおいて推論時間の測定可能な改善を示す。 Transformer-based Language Models have become ubiquitous in Natural Language Processing (NLP) due to their impressive performance on various tasks. However, expensive training as well as inference remains a significant impediment to their widespread applicability. While enforcing sparsity at various levels of the model architecture has found promise in addressing scaling and efficiency issues, there remains a disconnect between how sparsity affects network topology. Inspired by brain neuronal networks, we explore sparsity approaches through the lens of network topology. Specifically, we exploit mechanisms seen in biological networks, such as preferential attachment and redundant synapse pruning, and show that principled, model-agnostic sparsity approaches are performant and efficient across diverse NLP tasks, spanning both classification (such as natural language inference) and generation (summarization, machine translation), despite our sole objective not being optimizing performance. 
NeuroPrune is competitive with (or sometimes superior to) baselines on performance and can be up to 10x faster in terms of training time for a given level of sparsity, simultaneously exhibiting measurable improvements in inference time in many cases.	翻訳日:2024-06-07 00:30:45 公開日:2024-06-05
# 2レベルフィードバック制御によるネットワークシステムの侵入耐性 Intrusion Tolerance for Networked Systems through Two-Level Feedback Control ( http://arxiv.org/abs/2404.01741v5 ) ライセンス: Link先を確認	Kim Hammar, Rolf Stadler,	(参考訳) サービスレプリカを2段階最適制御問題とするシステムの侵入耐性を定式化する。ローカルレベルではノードコントローラが侵入回復を行い、グローバルレベルではシステムコントローラが複製係数を管理する。局所的およびグローバルな制御問題は、操作研究における古典的な問題、すなわち機械交換問題と在庫補充問題として定式化することができる。この定式化に基づいて、侵入耐性システムのための新しい制御アーキテクチャであるTOLERANCEを設計する。両レベルにおける最適制御戦略がしきい値構造を持ち、それらの計算に効率的なアルゴリズムを設計することを証明する。 10種類のネットワーク侵入を行うエミュレーション環境でのTOLERANCEの実装と評価を行う。その結果、TOLERANCEは、最先端の侵入耐性システムと比較して、サービスの可用性を向上し、運用コストを低減できることがわかった。 We formulate intrusion tolerance for a system with service replicas as a two-level optimal control problem. On the local level node controllers perform intrusion recovery, and on the global level a system controller manages the replication factor. The local and global control problems can be formulated as classical problems in operations research, namely, the machine replacement problem and the inventory replenishment problem. Based on this formulation, we design TOLERANCE, a novel control architecture for intrusion-tolerant systems. We prove that the optimal control strategies on both levels have threshold structure and design efficient algorithms for computing them. We implement and evaluate TOLERANCE in an emulation environment where we run 10 types of network intrusions. The results show that TOLERANCE can improve service availability and reduce operational cost compared with state-of-the-art intrusion-tolerant systems.	翻訳日:2024-06-07 00:30:45 公開日:2024-06-05
# M2SA:ツイートの感情分析のためのマルチモーダルおよび多言語モデル M2SA: Multimodal and Multilingual Model for Sentiment Analysis of Tweets ( http://arxiv.org/abs/2404.01753v2 ) ライセンス: Link先を確認	Gaurish Thakkar, Sherzod Hakimov, Marko Tadić,	(参考訳) 近年,多様なデータ型から学習することを目的としたマルチモーダル自然言語処理が注目されている。しかし、多言語コンテキストにおけるマルチモーダルタスクの分析に関しては、より明確にする必要がある。ツイートの感情分析に関する先行研究は、主に英語に重点を置いているが、本稿では、既存のテキストTwitter感情データセットを、簡単なキュレーションプロセスを通じてマルチモーダルフォーマットに変換することで、このギャップに対処する。本研究は,研究コミュニティにおける感情関連研究の新たな道を開くものである。さらに、この拡張データセットを利用してベースライン実験を行い、その結果を報告する。特に,ユニモーダル構成とマルチモーダル構成の比較において,テキストエンコーダとして感情調整型大言語モデルを用いることで,優れた性能が得られた。 In recent years, multimodal natural language processing, aimed at learning from diverse data types, has garnered significant attention. However, there needs to be more clarity when it comes to analysing multimodal tasks in multi-lingual contexts. While prior studies on sentiment analysis of tweets have predominantly focused on the English language, this paper addresses this gap by transforming an existing textual Twitter sentiment dataset into a multimodal format through a straightforward curation process. Our work opens up new avenues for sentiment-related research within the research community. Additionally, we conduct baseline experiments utilising this augmented dataset and report the findings. Notably, our evaluations reveal that when comparing unimodal and multimodal configurations, using a sentiment-tuned large language model as a text encoder performs exceptionally well.	翻訳日:2024-06-07 00:30:45 公開日:2024-06-05
# BanglaAutoKG:意味的ニューラルグラフフィルタリングによるバングラ知識グラフの自動構築 BanglaAutoKG: Automatic Bangla Knowledge Graph Construction with Semantic Neural Graph Filtering ( http://arxiv.org/abs/2404.03528v3 ) ライセンス: Link先を確認	Azmine Toushik Wasi, Taki Hasan Rafi, Raima Islam, Dong-Kyu Chae,	(参考訳) 知識グラフ(KG)は、関連エンティティをリンクし、コンテキストに富んだ情報を提供し、効率的な情報検索と知識発見をサポートし、情報フローを極めて効果的な方法で提示するため、情報処理や推論アプリケーションにおいて必須であることが証明されている。世界中で広く使われているにもかかわらず、バングラは包括的データセット、エンコーダ、NER(エンティティ認識)モデル、POS(part-of-speech)タグガー、レムマタイザの欠如、言語における効率的な情報処理と推論を妨げているため、KGでは比較的不足している。ベンガルにおけるKG不足に対処し、バングラテキストからベンガルKGを自動構築できる先駆的なフレームワークであるBanglaAutoKGを提案する。我々は多言語LLMを用いて様々な言語を理解し、エンティティと関係を普遍的に関連付ける。翻訳辞書を用いて、英語の等価部分を識別し、事前学習されたBERTモデルから単語の特徴を抽出することにより、基礎的なKGを構築する。雑音を低減し、単語の埋め込みをゴールに合わせるために、グラフベースの多項式フィルタを用いる。最後に、文脈的理解を高め、不要なエッジをトリムするGNNベースのセマンティックフィルタを実装し、決定的なKGを形成する。実験的な結果とケーススタディにより,任意のテキストから意味豊かなKGを自律的に構築できるモデルの有効性が実証された。 Knowledge Graphs (KGs) have proven essential in information processing and reasoning applications because they link related entities and give context-rich information, supporting efficient information retrieval and knowledge discovery; presenting information flow in a very effective manner. Despite being widely used globally, Bangla is relatively underrepresented in KGs due to a lack of comprehensive datasets, encoders, NER (named entity recognition) models, POS (part-of-speech) taggers, and lemmatizers, hindering efficient information processing and reasoning applications in the language. Addressing the KG scarcity in Bengali, we propose BanglaAutoKG, a pioneering framework that is able to automatically construct Bengali KGs from any Bangla text. We utilize multilingual LLMs to understand various languages and correlate entities and relations universally. By employing a translation dictionary to identify English equivalents and extracting word features from pre-trained BERT models, we construct the foundational KG. 
To reduce noise and align word embeddings with our goal, we employ graph-based polynomial filters. Lastly, we implement a GNN-based semantic filter, which elevates contextual understanding and trims unnecessary edges, culminating in the formation of the definitive KG. Empirical findings and case studies demonstrate the universal effectiveness of our model, capable of autonomously constructing semantically enriched KGs from any text.	翻訳日:2024-06-07 00:30:45 公開日:2024-06-05
# 言語モデルにおける文脈と事前知識 Context versus Prior Knowledge in Language Models ( http://arxiv.org/abs/2404.04633v2 ) ライセンス: Link先を確認	Kevin Du, Vésteinn Snæbjarnarson, Niklas Stoehr, Jennifer C. White, Aaron Schein, Ryan Cotterell,	(参考訳) 質問に答えるために、言語モデルはしばしば、事前学習中に学んだ事前知識と、文脈で提示された新しい情報を統合する必要がある。モデルは、トレーニングコーパスの露出が大きいため、より親しみやすいエンティティ(例えば、人、場所など)に関する質問に対する事前の知識に頼り、いくつかのコンテキストによってより容易に説得される、という仮説を立てています。この問題を定式化するために、あるコンテキストに対するモデルの依存性と、そのエンティティに関する先行性を測定するための2つの相互情報ベースのメトリクスを提案する。メトリクスの妥当性と信頼性を実証的にテストします。最後に、スコアとモデルが期待するエンティティとの親和性の関係を調べ、その利点を説明するための2つのユースケースを提供します。 To answer a question, language models often need to integrate prior knowledge learned during pretraining and new information presented in context. We hypothesize that models perform this integration in a predictable way across different questions and contexts: models will rely more on prior knowledge for questions about entities (e.g., persons, places, etc.) that they are more familiar with due to higher exposure in the training corpus, and be more easily persuaded by some contexts than others. To formalize this problem, we propose two mutual information-based metrics to measure a model's dependency on a context and on its prior about an entity: first, the persuasion score of a given context represents how much a model depends on the context in its decision, and second, the susceptibility score of a given entity represents how much the model can be swayed away from its original answer distribution about an entity. We empirically test our metrics for their validity and reliability. Finally, we explore and find a relationship between the scores and the model's expected familiarity with an entity, and provide two use cases to illustrate their benefits.	翻訳日:2024-06-07 00:30:45 公開日:2024-06-05
# コロンビアの地熱勾配予測 : 機械学習によるアプローチ Predicting the Geothermal Gradient in Colombia: a Machine Learning Approach ( http://arxiv.org/abs/2404.05184v7 ) ライセンス: Link先を確認	Juan Camilo Mejía-Fragoso, Manuel A. Florez, Rocío Bernal-Olaya,	(参考訳) 地熱勾配の正確な決定は、所定の地域の地熱エネルギーポテンシャルを評価するために重要である。特に興味深いのは、豊富な地熱資源を持つコロンビアである。活発な石油とガスの探査と生産の歴史は、掘削されたボーアホールを異なる地質環境に残し、地熱勾配を直接測定した。残念なことに、地熱資源が存在する可能性のある国内の広い地域では、そのような測定値が欠如している。間接的な地球物理測定は、地域規模で行うのに費用がかかり、困難である。計算熱モデルを構築することもできるが、基礎となる地質について非常に詳細な知識と地下温度の均一なサンプリングが必要である。我々は,地球規模の地球物理データセットと粗い地質知識しか利用できない地域での地熱勾配を予測するために,教師付き機械学習と直接測定の最近の進歩を活用するアプローチを提案する。グラディエントブースト回帰木アルゴリズムは最適な予測を行い、トレーニングされたモデルを広範囲に検証する。我々は,本モデルの予測精度が12%以内であり,他の著者による独立測定値が本モデルとよく一致していることを示す。最後に,コロンビアの地熱勾配図で,さらなる探査とデータ収集を行うべき地域に焦点を当てた。 Accurate determination of the geothermal gradient is critical for assessing the geothermal energy potential of a given region. Of particular interest is the case of Colombia, a country with abundant geothermal resources. A history of active oil and gas exploration and production has left drilled boreholes in different geological settings, providing direct measurements of the geothermal gradient. Unfortunately, large regions of the country where geothermal resources might exist lack such measurements. Indirect geophysical measurements are costly and difficult to perform at regional scales. Computational thermal models could be constructed, but they require very detailed knowledge of the underlying geology and uniform sampling of subsurface temperatures to be well-constrained. We present an alternative approach that leverages recent advances in supervised machine learning and available direct measurements to predict the geothermal gradient in regions where only global-scale geophysical datasets and coarse geological knowledge are available. We find that a Gradient Boosted Regression Tree algorithm yields optimal predictions and extensively validate the trained model. 
We show that predictions of our model are accurate to within 12% and that independent measurements performed by other authors agree well with our model. Finally, we present a geothermal gradient map for Colombia that highlights regions where further exploration and data collection should be performed.	翻訳日:2024-06-07 00:30:45 公開日:2024-06-05
# ファウンデーションモデルのための顔特徴ガイド適応によるより一般的なビデオベースディープフェイク検出に向けて Towards More General Video-based Deepfake Detection through Facial Feature Guided Adaptation for Foundation Model ( http://arxiv.org/abs/2404.05583v2 ) ライセンス: Link先を確認	Yue-Hua Han, Tai-Ming Huang, Shu-Tzu Lo, Po-Han Huang, Kai-Lung Hua, Jun-Cheng Chen,	(参考訳) ディープラーニングの台頭により、生成モデルは高度に現実的な合成画像の作成を可能にし、その潜在的な誤用による課題を提示している。ディープフェイク検出の研究は、反応が急速に進んでいるが、多くの検出手法は、新しい合成技術によって生成された未知のディープフェイクと競合している。この一般化の課題に対処するため,下流タスクに強力なゼロショット機能を示すCLIPの画像エンコーダを用いて,内部にリッチな情報をエンコードしたファンデーションモデルを適用することにより,新しいディープフェイク検出手法を提案する。近年のパラメータ効率のよい微調整の進歩に触発されて,ビデオクリップから空間的および時間的手がかりを抽出する,サイドネットワークベースのデコーダを提案するとともに,より堅牢で汎用的なディープフェイク検出のための重要な顔部品の特徴を含む空間的特徴を促進すべく,FCG(Facial Component Guidance)の促進を図った。大規模なクロスデータセット評価を通じて,本手法は未知のDeepfakeサンプルを同定し,限られたトレーニングサンプルや操作タイプでも顕著な性能向上を実現している。本モデルでは,最先端手法と比較して,AUROCの平均性能向上率が0.9\%であること,特にDFDCデータセットの4.4\%向上に寄与することが重要である。 With the rise of deep learning, generative models have enabled the creation of highly realistic synthetic images, presenting challenges due to their potential misuse. While research in Deepfake detection has grown rapidly in response, many detection methods struggle with unseen Deepfakes generated by new synthesis techniques. To address this generalisation challenge, we propose a novel Deepfake detection approach by adapting the Foundation Models with rich information encoded inside, specifically using the image encoder from CLIP which has demonstrated strong zero-shot capability for downstream tasks. Inspired by the recent advances of parameter efficient fine-tuning, we propose a novel side-network-based decoder to extract spatial and temporal cues from the given video clip, with the promotion of the Facial Component Guidance (FCG) to encourage the spatial feature to include features of key facial parts for more robust and general Deepfake detection. 
Through extensive cross-dataset evaluations, our approach exhibits superior effectiveness in identifying unseen Deepfake samples, achieving notable performance improvement even with limited training samples and manipulation types. Our model achieves an average improvement of 0.9% AUROC in cross-dataset assessments compared with state-of-the-art methods, including a 4.4% improvement on the challenging DFDC dataset.	翻訳日:2024-06-07 00:30:45 公開日:2024-06-05
# Lyapunov-stable Neural Control for State and Output Feedback: a novel formulation Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation ( http://arxiv.org/abs/2404.07956v2 ) ライセンス: Link先を確認	Lujie Yang, Hongkai Dai, Zhouxing Shi, Cho-Jui Hsieh, Russ Tedrake, Huan Zhang,	(参考訳) 学習ベースのニューラルネットワーク(NN)制御ポリシは、ロボット工学と制御の幅広いタスクにおいて、印象的な経験的パフォーマンスを示している。しかし、非線形力学系を持つNNコントローラの領域トラクション(ROA)に対する形式的(リアプノフ)安定性の保証は困難であり、既存のアプローチの多くは、sums-of-squares(SOS)、mixed-integer Programming(MIP)、SMT(Satisfiability modulo theory)といった高価な解法に依存している。本稿では、高速な経験的ファルシフィケーションと戦略的正規化を用いて、Lyapunov証明書とともにNNコントローラを学習するための新しいフレームワークを実証する。そこで本論文では,文献で示されるよりも大きなアトラクション領域(ROA)を定義し,リアプノフ誘導体に対する従来の制限制約を洗練し,証明可能なROAのみに焦点をあてる新しい定式化を提案する。 Lyapunov条件は、拡張性のある線形有界伝搬に基づくNN検証技術を用いて、分岐とバウンドで厳密に検証されている。このアプローチは効率的で柔軟性があり、SOS、MIP、SMTの高価なソルバに頼ることなく、GPU上で完全なトレーニングと検証の手順が加速される。筆者らのフレームワークの柔軟性と効率性により,合成NNベースのコントローラと形式的安定性保証を備えたNNベースのオブザーバによるリアプノフ安定出力フィードバック制御を文献で初めて実証することができる。ソースコードはhttps://github.com/Verified-Intelligence/Lyapunov_Stable_NN_Controllersにある。 Learning-based neural network (NN) control policies have shown impressive empirical performance in a wide range of tasks in robotics and control. However, formal (Lyapunov) stability guarantees over the region-of-attraction (ROA) for NN controllers with nonlinear dynamical systems are challenging to obtain, and most existing approaches rely on expensive solvers such as sums-of-squares (SOS), mixed-integer programming (MIP), or satisfiability modulo theories (SMT). In this paper, we demonstrate a new framework for learning NN controllers together with Lyapunov certificates using fast empirical falsification and strategic regularizations. We propose a novel formulation that defines a larger verifiable region-of-attraction (ROA) than shown in the literature, and refines the conventional restrictive constraints on Lyapunov derivatives to focus only on certifiable ROAs. 
The Lyapunov condition is rigorously verified post-hoc using branch-and-bound with scalable linear bound propagation-based NN verification techniques. The approach is efficient and flexible, and the full training and verification procedure is accelerated on GPUs without relying on expensive solvers for SOS, MIP, or SMT. The flexibility and efficiency of our framework allow us to demonstrate Lyapunov-stable output feedback control with synthesized NN-based controllers and NN-based observers with formal stability guarantees, for the first time in the literature. Source code is available at https://github.com/Verified-Intelligence/Lyapunov_Stable_NN_Controllers	翻訳日:2024-06-07 00:20:37 公開日:2024-06-05
# ブロックワイド並列デコーディングにおけるドラフトの探索と改善 Exploring and Improving Drafts in Blockwise Parallel Decoding ( http://arxiv.org/abs/2404.09221v2 ) ライセンス: Link先を確認	Taehyeon Kim, Ananda Theertha Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton,	(参考訳) 自己回帰言語モデルによる顕著な進歩にもかかわらず、そのポテンシャルはシーケンシャルトークン生成に固有の遅い推論速度によって妨げられることが多い。ブロックワイド並列復号法(BPD)は,複数の将来のトークンを同時に予測することで,言語モデルの推論速度を向上させる手法として,Sternらによって提案された。本稿では,ブロックドラフトの理解と改善に2つの方法で貢献する。まず,複数の予測ヘッドが生成するトークン分布を解析する。第二に、この分析を利用して、n-gramモデルとニューラル言語モデルを用いてブロックドラフトを精製することにより、BPD推論速度を改善するアルゴリズムを開発する。実験では、改良されたブロックドラフトがブロック効率(ブロックドラフトから受け入れられたトークンの数)を、多様なデータセットで+5-21%増加させることを示した。 Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation. Blockwise parallel decoding (BPD) was proposed by Stern et al. as a method to improve inference speed of language models by simultaneously predicting multiple future tokens, termed block drafts, which are subsequently verified and conditionally accepted by the autoregressive model. This paper contributes to the understanding and improvement of block drafts in two ways. First, we analyze the token distributions produced by multiple prediction heads. Secondly, we leverage this analysis to develop algorithms to improve BPD inference speed by refining the block drafts using n-gram and neural language models. Experiments demonstrate that refined block drafts yield a +5-21% increase in block efficiency (i.e., the number of accepted tokens from the block draft) across diverse datasets.	翻訳日:2024-06-07 00:20:37 公開日:2024-06-05
# 生成モデルを用いた圧縮強化学習 Compressed Federated Reinforcement Learning with a Generative Model ( http://arxiv.org/abs/2404.10635v2 ) ライセンス: Link先を確認	Ali Beikmohammadi, Sarit Khirirat, Sindri Magnússon,	(参考訳) 強化学習は近年、前例のない人気を得たが、それでもサンプルの非効率さに悩まされている。この課題に対処するため、フェデレーション強化学習(FedRL)が出現し、エージェントは局所的な推定を集約することで単一のポリシーを協調的に学習する。しかし、この集約ステップは、かなりの通信コストを発生させる。本稿では,通信効率のよいFedRL手法であるCompFedRLを提案する。具体的には、中央サーバがローカルエージェントから圧縮された$Q$-estimatesを定期的に集約することにより、最適な$Q$-functionを学習する生成モデルセットアップを用いて、圧縮された$Q$-learningを検討する。提案アルゴリズムの有限時間解析により, 直接圧縮と誤りフィードバック圧縮のどちらを用いても強い収束挙動を示すことにより, この2つのメカニズムの影響を初めて特徴づけた。我々の限界は、通信コストを同時に低減しつつ、エージェント数やその他の連合ハイパーパラメータに関する解の精度の向上を示している。我々の理論を裏付けるために、我々は、Top-K$およびSparsified-K$スペーシフィケーション作用素を考慮し、詳細な数値実験も行います。 Reinforcement learning has recently gained unprecedented popularity, yet it still grapples with sample inefficiency. Addressing this challenge, federated reinforcement learning (FedRL) has emerged, wherein agents collaboratively learn a single policy by aggregating local estimations. However, this aggregation step incurs significant communication costs. In this paper, we propose CompFedRL, a communication-efficient FedRL approach incorporating both \textit{periodic aggregation} and (direct/error-feedback) compression mechanisms. Specifically, we consider compressed federated $Q$-learning with a generative model setup, where a central server learns an optimal $Q$-function by periodically aggregating compressed $Q$-estimates from local agents. For the first time, we characterize the impact of these two mechanisms (which have remained elusive) by providing a finite-time analysis of our algorithm, demonstrating strong convergence behaviors when utilizing either direct or error-feedback compression. Our bounds indicate improved solution accuracy concerning the number of agents and other federated hyperparameters while simultaneously reducing communication costs. 
To corroborate our theory, we also conduct in-depth numerical experiments to verify our findings, considering Top-$K$ and Sparsified-$K$ sparsification operators.	翻訳日:2024-06-07 00:20:37 公開日:2024-06-05
# CKGConv: 継続的カーネルによる一般的なグラフの畳み込み CKGConv: General Graph Convolution with Continuous Kernels ( http://arxiv.org/abs/2404.13604v2 ) ライセンス: Link先を確認	Liheng Ma, Soumyasundar Pal, Yitian Zhang, Jiaming Zhou, Yingxue Zhang, Mark Coates,	(参考訳) 既存のグラフ畳み込みの定義は、空間的あるいはスペクトル的な観点からも、柔軟性がなく、統一されていない。グラフ領域における一般畳み込み作用素の定義は、標準座標の欠如、不規則構造の存在、およびグラフ対称性の性質により困難である。本研究では,グラフ位置符号化によって導出される疑似座標の連続関数としてカーネルをパラメータ化する,新しい一般グラフ畳み込みフレームワークを提案する。このContinuous Kernel Graph Convolution(CKGConv)と名付けます。理論的には、CKGConvは柔軟で表現力がある。 CKGConvは多くの既存のグラフ畳み込みを包含し、非同型グラフを区別する点においてグラフ変換器と同じくらい強力な表現性を示す。経験的に、CKGConvベースのネットワークは、既存のグラフ畳み込みネットワークより優れており、様々なグラフデータセットで最高のグラフ変換器と同等の性能を示す。コードとモデルはhttps://github.com/networkslab/CKGConv で公開されている。 The existing definitions of graph convolution, either from spatial or spectral perspectives, are inflexible and not unified. Defining a general convolution operator in the graph domain is challenging due to the lack of canonical coordinates, the presence of irregular structures, and the properties of graph symmetries. In this work, we propose a novel and general graph convolution framework by parameterizing the kernels as continuous functions of pseudo-coordinates derived via graph positional encoding. We name this Continuous Kernel Graph Convolution (CKGConv). Theoretically, we demonstrate that CKGConv is flexible and expressive. CKGConv encompasses many existing graph convolutions, and exhibits a stronger expressiveness, as powerful as graph transformers in terms of distinguishing non-isomorphic graphs. Empirically, we show that CKGConv-based Networks outperform existing graph convolutional networks and perform comparably to the best graph transformers across a variety of graph datasets. The code and models are publicly available at https://github.com/networkslab/CKGConv.	翻訳日:2024-06-07 00:20:37 公開日:2024-06-05
# ゼロショット高忠実度とポス制御可能なキャラクタアニメーション Zero-shot High-fidelity and Pose-controllable Character Animation ( http://arxiv.org/abs/2404.13680v3 ) ライセンス: Link先を確認	Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Guo-Jun Qi, Yu-Gang Jiang,	(参考訳) 画像対ビデオ生成(I2V)は、高時間的コヒーレンスと視覚的忠実度を必要とする単一の画像からビデオシーケンスを作成することを目的としている。しかし、既存のアプローチはキャラクターの外観の不整合と細部保存の貧弱さに悩まされている。さらに、トレーニングには大量のビデオデータが必要です。これらの制約に対処するため,文字アニメーションのための新しいゼロショットI2VフレームワークであるPoseAnimateを提案する。 PoseAnimateには3つの重要なコンポーネントが含まれている。 1)多彩なポーズ信号をテキスト埋め込みに組み込んで、文字に依存しないコンテンツを保存し、アクションの正確なアライメントを維持するPose-Aware Control Module(PACM)。 2)DCAM(Dual Consistency Attention Module)は,時間的一貫性を高め,文字識別と複雑な背景情報を保持するモジュールである。 3) Mask-Guided Decoupling Module (MGDM) は特徴認識能力を洗練させ,文字と背景を分離することでアニメーションの忠実度を向上させる。また、スムーズな動作遷移を保証するために、PATA(Pose Alignment Transition Algorithm)を提案する。実験結果から,本手法は,文字の一貫性と細部忠実度の観点から,最先端のトレーニングベース手法よりも優れていることが示された。さらに、生成されたアニメーション全体を通して、高レベルの時間的コヒーレンスを維持している。 Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity. However, existing approaches suffer from inconsistency of character appearances and poor preservation of fine details. Moreover, they require a large amount of video data for training, which can be computationally demanding. To address these limitations, we propose PoseAnimate, a novel zero-shot I2V framework for character animation. PoseAnimate contains three key components: 1) a Pose-Aware Control Module (PACM) that incorporates diverse pose signals into text embeddings, to preserve character-independent content and maintain precise alignment of actions. 2) a Dual Consistency Attention Module (DCAM) that enhances temporal consistency and retains character identity and intricate background details. 3) a Mask-Guided Decoupling Module (MGDM) that refines distinct feature perception abilities, improving animation fidelity by decoupling the character and background. 
We also propose a Pose Alignment Transition Algorithm (PATA) to ensure smooth action transition. Extensive experimental results demonstrate that our approach outperforms state-of-the-art training-based methods in terms of character consistency and detail fidelity. Moreover, it maintains a high level of temporal coherence throughout the generated animations.	翻訳日:2024-06-07 00:20:37 公開日:2024-06-05
# 品質多様性のためのインコンテキストAIジェネレータとしての大規模言語モデル Large Language Models as In-context AI Generators for Quality-Diversity ( http://arxiv.org/abs/2404.15794v2 ) ライセンス: Link先を確認	Bryan Lim, Manon Flageat, Antoine Cully,	(参考訳) QD(Quality-Diversity)アプローチは、様々なニッチにまたがる高品質なソリューションのアーカイブを見つけることができるため、オープンなプロセスを開発する上で有望な方向である。既に多くのアプリケーションで成功したが、QDアプローチは通常、新しい候補ソリューションを生成するために1つまたは2つのソリューションの組み合わせに頼っている。技術進化のようなオープンなプロセスで観察されるように、これらのソリューションの大きな多様性を賢明に組み合わせることで、より革新的なソリューションが生まれ、QD検索の生産性が向上する可能性がある。本研究では、生成モデルのパターンマッチング機能を利用して、そのような効率的な解の組み合わせを実現することを提案する。 In-context QDは、事前訓練された大規模言語モデル(LLM)のコンテキスト内能力を引き出すためのテクニックのフレームワークであり、QDアーカイブから品質の異なる例をコンテキストとして、少ないショットと多ショットのプロンプトを使って興味深いソリューションを生成する。一連の共通QDドメインに適用すると、In-context QDは、単目的最適化のために開発されたQDベースラインと類似の戦略の両方と比較して有望な結果を示す。さらに、この結果は、パラメータサイズとアーカイブ人口サイズの複数の値にまたがるだけでなく、BBO関数と異なる特徴を持つ領域やポリシー探索の領域にも及んでいる。最後に、QDのための有望なソリューションの創出を促進する重要なプロンプト設計の考察を強調した広範囲なアブレーションを行う。 Quality-Diversity (QD) approaches are a promising direction to develop open-ended processes as they can discover archives of high-quality solutions across diverse niches. While already successful in many applications, QD approaches usually rely on combining only one or two solutions to generate new candidate solutions. As observed in open-ended processes such as technological evolution, wisely combining large diversity of these solutions could lead to more innovative solutions and potentially boost the productivity of QD search. In this work, we propose to exploit the pattern-matching capabilities of generative models to enable such efficient solution combinations. We introduce In-context QD, a framework of techniques that aim to elicit the in-context capabilities of pre-trained Large Language Models (LLMs) to generate interesting solutions using few-shot and many-shot prompting with quality-diverse examples from the QD archive as context. 
Applied to a series of common QD domains, In-context QD displays promising results compared to both QD baselines and similar strategies developed for single-objective optimization. Additionally, this result holds across multiple values of parameter sizes and archive population sizes, as well as across domains with distinct characteristics from BBO functions to policy search. Finally, we perform an extensive ablation that highlights the key prompt design considerations that encourage the generation of promising solutions for QD.	翻訳日:2024-06-07 00:20:37 公開日:2024-06-05
# PatentGPT:知的財産のための大規模言語モデル PatentGPT: A Large Language Model for Intellectual Property ( http://arxiv.org/abs/2404.18255v5 ) ライセンス: Link先を確認	Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, Weilei Wang, Changyang Tu,	(参考訳) 近年,大規模言語モデル (LLM) は,様々な自然言語処理タスクにまたがる例外的な性能から注目され,様々な分野に広く応用されている。しかし、知的財産権(IP)分野における大規模言語モデルの応用は、専門知識、プライバシー保護、この分野における極端に長いテキストの処理の必要性が強いため、困難である。本技術報告では,IP ドメインのユニークな要件を満たす,IP 指向 LLM をトレーニングするための,低コストで標準化された手順を初めて提示する。この標準プロセスを用いて,オープンソース事前学習モデルに基づく特許GPTシリーズモデルを訓練した。オープンソースIP指向ベンチマークMOZIPで評価することにより,提案したトレーニング手順の有効性とIPドメインにおける特許GPTモデルの専門性を示す,ドメイン固有のLCMがGPT-4を上回った。注目すべきは、2019年の中国特許代理人資格試験において、当社のモデルはGPT-4を上回り、65のスコアと人間の専門家レベルが一致したことです。さらに、SMoE アーキテクチャを利用する PatentGPT モデルは、IP ドメインの GPT-4 に匹敵する性能を達成し、IP ドメイン内の GPT-4 の代替として機能し、長文タスクのコストパフォーマンスを向上する。 In recent years, large language models(LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language process tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, processing of extremely long text in this field. In this technical report, we present for the first time a low-cost, standardized procedure for training IP-oriented LLMs, meeting the unique requirements of the IP domain. Using this standard process, we have trained the PatentGPT series models based on open-source pretrained models. By evaluating them on the open-source IP-oriented benchmark MOZIP, our domain-specific LLMs outperforms GPT-4, indicating the effectiveness of the proposed training procedure and the expertise of the PatentGPT models in the IP domain. 
Remarkably, our model surpassed GPT-4 on the 2019 China Patent Agent Qualification Examination, scoring 65 and matching human expert levels. Additionally, the PatentGPT model, which utilizes the SMoE architecture, achieves performance comparable to that of GPT-4 in the IP domain and demonstrates a better cost-performance ratio on long-text tasks, potentially serving as an alternative to GPT-4 within the IP domain.	翻訳日:2024-06-07 00:20:37 公開日:2024-06-05
# ニューラルネットワークの深さを減らすためのエントロピーに基づく重要度基準 The Simpler The Better: An Entropy-Based Importance Metric To Reduce Neural Networks' Depth ( http://arxiv.org/abs/2404.18949v2 ) ライセンス: Link先を確認	Victor Quétu, Zhu Liao, Enzo Tartaglione,	(参考訳) ディープニューラルネットワークは複雑なタスクを解くのに非常に効果的であるが、大きめの事前訓練されたモデルは、大きめのモデルの複雑さを必ずしも必要としない、一貫した単純化された下流タスクを解くためにも一般的に使用される。成長を続けるAI環境の影響を意識して、我々は、大規模モデルによって伝達される事前知識を活用する効率戦略を提案する。本稿では,過度にパラメータ化された深層ニューラルネットワークの深さを低減し,その計算負担を軽減するために,エントロピーをベースとした重要度mEtRic(EASIER)を利用する手法を提案する。従来の画像分類設定における手法の有効性を評価する。私たちのコードはhttps://github.com/VGCQ/EASIER から入手可能です。 While deep neural networks are highly effective at solving complex tasks, large pre-trained models are commonly employed even to solve consistently simpler downstream tasks, which do not necessarily require a large model's complexity. Motivated by the awareness of the ever-growing AI environmental impact, we propose an efficiency strategy that leverages prior knowledge transferred by large models. We propose a simple but effective method relying on an Entropy-bASed Importance mEtRic (EASIER) to reduce the depth of over-parametrized deep neural networks, which alleviates their computational burden. We assess the effectiveness of our method on traditional image classification setups. Our code is available at https://github.com/VGCQ/EASIER.	翻訳日:2024-06-07 00:20:37 公開日:2024-06-05
# 動的データセットの近似近傍探索に関する研究 Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation ( http://arxiv.org/abs/2404.19284v3 ) ライセンス: Link先を確認	Ben Harwood, Amir Dezfouli, Iadine Chades, Conrad Sanderson,	(参考訳) 近似k-Nearest Neighbour (ANN) 法は情報マイニングや大規模高次元データセットでの機械学習支援によく用いられる。 ANN法は通常、検索の高速化に使用されるインデックス構造が異なるため、様々なリコール/実行時のトレードオフ点が生じる。静的なデータセットを持つアプリケーションでは、ランタイム制約とデータセットプロパティを使用して、適切な操作特性を持つANNメソッドを経験的に選択することができる。しかし、オンラインの頻繁な変更(新しいサンプルの追加など)の対象となる動的データセットを持つアプリケーションでは、どのANNメソッドが最も適しているかについては、現時点では合意が得られていない。従来の評価手法は、インデックス構造を更新する際の計算コストや、インデックス更新の率とサイズを考慮していない。これを解決するために、これらの考慮を考慮しつつ、2つの主要なアプリケーション(オンラインデータ収集とオンライン特徴学習)で5つの人気のあるANN手法を実証的に評価する。 100万のサンプルを持つSIFT1Mデータセットと10億のサンプルを持つDEEP1Bデータセットから派生した2つの動的データセットが使用されている。その結果,k-d木法は,単純なベースライン探索法よりも遅いため,動的データセットには適さないことがわかった。オンラインデータ収集において、階層ナビゲート可能な小型世界グラフ法は、幅広いリコールレートでベースラインを一貫したスピードアップを達成する。オンライン機能学習において、スケーラブルなNearest Neighboursメソッドは75%未満のリコール率のベースラインよりも高速である。 Approximate k-Nearest Neighbour (ANN) methods are often used for mining information and aiding machine learning on large scale high-dimensional datasets. ANN methods typically differ in the index structure used for accelerating searches, resulting in various recall/runtime trade-off points. For applications with static datasets, runtime constraints and dataset properties can be used to empirically select an ANN method with suitable operating characteristics. However, for applications with dynamic datasets, which are subject to frequent online changes (like addition of new samples), there is currently no consensus as to which ANN methods are most suitable. Traditional evaluation approaches do not consider the computational costs of updating the index structure, as well as the rate and size of index updates. To address this, we empirically evaluate 5 popular ANN methods on two main applications (online data collection and online feature learning) while taking into account these considerations. 
Two dynamic datasets are used, derived from the SIFT1M dataset with 1 million samples and the DEEP1B dataset with 1 billion samples. The results indicate that the often used k-d trees method is not suitable on dynamic datasets as it is slower than a straightforward baseline exhaustive search method. For online data collection, the Hierarchical Navigable Small World Graphs method achieves a consistent speedup over baseline across a wide range of recall rates. For online feature learning, the Scalable Nearest Neighbours method is faster than baseline for recall rates below 75%.	翻訳日:2024-06-07 00:20:37 公開日:2024-06-05
# Cognate Synonym Selectionにおける主観性統合のための計算的アプローチ Computational Approaches for Integrating out Subjectivity in Cognate Synonym Selection ( http://arxiv.org/abs/2404.19328v2 ) ライセンス: Link先を確認	Luise Häuser, Gerhard Jäger, Alexandros Stamatakis,	(参考訳) コグネートデータを扱うには、同義語、つまり言語で同じ概念を記述する複数の単語を扱う必要がある。言語系統学の初期において、一つの同義語のみを選択することが推奨された。しかし、ここで示すように、計算手法の入力として使用されるバイナリ文字行列は、すべての同義語を含むデータセット全体を表現することができる。ここでは、どのようにしてすべての同義語を含めるべきか、あるいは前科を選択すべきかどうかという疑問に対処する。この目的のために、広く使われているRAxML-NGツールを用いて最大木推定を行い、すべての同義語を入力として使用する場合に可塑性木を生成することを示す。さらに, 前代同義語選択は, トポロジカルに大きく異なる木を産出できることを示す。すべての同義語を含む同義語データを表現するために、確率的二元数行列と確率的多値文字行列という、標準的な二元数行列以外の2種類の文字行列を導入する。さらに, 推定されたRAxML-NG木がゴールド標準に最も近いキャラクタリマトリクスは, データセット依存であることを示す。また、CLDFフォーマットで提供されるコグネートデータに対して、上記のすべてのキャラクタマトリックスタイプを生成するためのPythonインターフェースも提供しています。 Working with cognate data involves handling synonyms, that is, multiple words that describe the same concept in a language. In the early days of language phylogenetics it was recommended to select one synonym only. However, as we show here, binary character matrices, which are used as input for computational methods, do allow for representing the entire dataset including all synonyms. Here we address the question how one can and if one should include all synonyms or whether it is preferable to select synonyms a priori. To this end, we perform maximum likelihood tree inferences with the widely used RAxML-NG tool and show that it yields plausible trees when all synonyms are used as input. Furthermore, we show that a priori synonym selection can yield topologically substantially different trees and we therefore advise against doing so. To represent cognate data including all synonyms, we introduce two types of character matrices beyond the standard binary ones: probabilistic binary and probabilistic multi-valued character matrices. We further show that it is dataset-dependent for which character matrix type the inferred RAxML-NG tree is topologically closest to the gold standard. 
We also make available a Python interface for generating all of the above character matrix types for cognate data provided in CLDF format.	翻訳日:2024-06-07 00:20:37 公開日:2024-06-05
# オンライン強化学習による費用効果・エキスパートレベル臨床ノート作成のためのオープンソース大規模言語モデルの適用 Adapting Open-Source Large Language Models for Cost-Effective, Expert-Level Clinical Note Generation with On-Policy Reinforcement Learning ( http://arxiv.org/abs/2405.00715v2 ) ライセンス: Link先を確認	Hanyin Wang, Chufan Gao, Bolun Liu, Qiping Xu, Guleid Hussein, Mohamad El Labban, Kingsley Iheasirim, Hariprasad Korsapati, Chuck Outcalt, Jimeng Sun,	(参考訳) GPT-4やGeminiのようなプロプライエタリな大規模言語モデル(LLM)は、臨床テキスト要約タスクにおいて有望な能力を示している。しかしながら、患者のデータのプライバシに関する懸念と計算コストのため、多くの医療提供者は、外部ジェネリックLLMよりも、小さなローカルホストモデルを使うことを好む。本研究は、オープンソースのLLaMA-213億パラメーターモデルに対する包括的ドメインおよびタスク固有の適応プロセスを示し、外来患者と医師の対話から高品質な臨床ノートを生成する。私たちのプロセスには、継続的な事前トレーニング、教師付き微調整、AIと人間のフィードバックからの強化学習が含まれています。我々は、教師モデルとしてGemini 1.0 Proを用いて、政治強化学習を行うための新しいアプローチであるDistillDirectを導入した。得られたLLaMA-Clinicは,医師が作成したものと同等の精度で臨床記録を作成できる。盲目医学読者の研究では、個々の評価の90.4%がLLaMA-Clinicが生み出したノートを「許容可能」以上の3つの基準(現実の読みやすさ、完全性、正確性)で評価している。より挑戦的な「評価と計画」のセクションでは、LLaMA-クリニックは医師が発行したノート(4.1/5)よりも現実の即応性が高い(4.2/5)。我々のLLaMA-Clinicモデルでは,外部ジェネリックLLMサービスに比べて4.375倍のコスト削減を実現している。さらに, 臨床実践において, LLM に頼らず, ベストプラクティスのノートフォーマットを事前に定義することの重要性を強調し, 今後の臨床ノート生成課題の重要点を強調した。我々は,新たに作成した総合診療録データセットと医師のフィードバックデータセットを公開し,今後の研究を奨励した。 Proprietary Large Language Models (LLMs) such as GPT-4 and Gemini have demonstrated promising capabilities in clinical text summarization tasks. However, due to patient data privacy concerns and computational costs, many healthcare providers prefer using small, locally-hosted models over external generic LLMs. This study presents a comprehensive domain- and task-specific adaptation process for the open-source LLaMA-2 13 billion parameter model, enabling it to generate high-quality clinical notes from outpatient patient-doctor dialogues. Our process incorporates continued pre-training, supervised fine-tuning, and reinforcement learning from both AI and human feedback. We introduced a new approach, DistillDirect, for performing on-policy reinforcement learning with Gemini 1.0 Pro as the teacher model. 
Our resulting model, LLaMA-Clinic, can generate clinical notes comparable in quality to those authored by physicians. In a blinded physician reader study, the majority (90.4%) of individual evaluations rated the notes generated by LLaMA-Clinic as "acceptable" or higher across all three criteria: real-world readiness, completeness, and accuracy. In the more challenging "Assessment and Plan" section, LLaMA-Clinic scored higher (4.2/5) in real-world readiness than physician-authored notes (4.1/5). Our cost analysis for inference shows that our LLaMA-Clinic model achieves a 4.375-fold cost reduction compared to an external generic LLM service. Additionally, we highlight key considerations for future clinical note-generation tasks, emphasizing the importance of pre-defining a best-practice note format, rather than relying on LLMs to determine this for clinical practice. We have made our newly created synthetic clinic dialogue-note dataset and the physician feedback dataset publicly available to foster future research.	翻訳日:2024-06-07 00:20:37 公開日:2024-06-05
# リポジトリ上の反復的ツール強化推論を用いた自然言語からのクラスレベルコード生成 Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository ( http://arxiv.org/abs/2405.01573v2 ) ライセンス: Link先を確認	Ajinkya Deshpande, Anmol Agarwal, Shashank Shet, Arun Iyer, Aditya Kanade, Ramakrishna Bairi, Suresh Parthasarathy,	(参考訳) LLMはコード生成タスクにおいて大きな可能性を示しており、様々なベンチマークで関数やステートメントレベルで有望な結果を達成している。しかし、クラスのようなコードアーティファクトを作成することに関連する複雑さ、特に現実世界のソフトウェアリポジトリのコンテキスト内では、まだ解明されていないままです。それまでの研究は、クラスレベルの生成を独立したタスクとして扱い、現実世界のソフトウェア環境を特徴付ける複雑な依存関係と相互作用を無視していた。このギャップに対処するために、現実のリポジトリ内で複雑なクラスレベルのコードを生成する際に、LLMを厳格に評価するために設計された包括的なベンチマークであるRepoClassBenchを紹介します。 RepoClassBenchには、リポジトリの選択からJava、Python、C#にまたがる"Natural Language to Class Generation"タスクが含まれている。データセットの各クラスがリポジトリ内でクロスファイルの依存関係を持つだけでなく、その機能を検証するための対応するテストケースも含んでいることを保証します。現在のモデルでは,関連するリポジトリコンテキストへの露出が限られているため,ベンチマークによって引き起こされる現実的な課題に対処しています。 Retrieve-Repotools-Reflect(RRR)は、エージェントベースのフレームワークでリポジトリレベルのコンテキストを反復的にナビゲートし、推論する静的解析ツールを備えた新しいアプローチである。我々の実験は、RRRが既存のRepoClassBenchのベースラインを大幅に上回ることを示した。私たちの発見は、ソフトウェア開発の複雑さをより正確に反映するために、リポジトリレベルの依存関係を組み込むコード生成ベンチマークが不可欠であることを強調します。我々の研究は、レポジトリコンテキストに対するLLMの理解を高めるために、特殊なツールを活用する利点を示している。データセットと評価を一般公開する予定です。 LLMs have demonstrated significant potential in code generation tasks, achieving promising results at the function or statement level across various benchmarks. However, the complexities associated with creating code artifacts like classes, particularly within the context of real-world software repositories, remain underexplored. Prior research treats class-level generation as an isolated task, neglecting the intricate dependencies & interactions that characterize real-world software environments. To address this gap, we introduce RepoClassBench, a comprehensive benchmark designed to rigorously evaluate LLMs in generating complex, class-level code within real-world repositories. 
RepoClassBench includes "Natural Language to Class generation" tasks across Java, Python & C# from a selection of repositories. We ensure that each class in our dataset not only has cross-file dependencies within the repository but also includes corresponding test cases to verify its functionality. We find that current models struggle with the realistic challenges posed by our benchmark, primarily due to their limited exposure to relevant repository contexts. To address this shortcoming, we introduce Retrieve-Repotools-Reflect (RRR), a novel approach that equips LLMs with static analysis tools to iteratively navigate & reason about repository-level context in an agent-based framework. Our experiments demonstrate that RRR significantly outperforms existing baselines on RepoClassBench, showcasing its effectiveness across programming languages & under various settings. Our findings emphasize the critical need for code-generation benchmarks to incorporate repo-level dependencies to more accurately reflect the complexities of software development. Our work shows the benefits of leveraging specialized tools to enhance LLMs' understanding of repository context. We plan to make our dataset & evaluation harness public.	翻訳日:2024-06-07 00:20:37 公開日:2024-06-05
# ランダム一般化スティーフェル多様体上のリトラクションなし最適化 Optimization without Retraction on the Random Generalized Stiefel Manifold ( http://arxiv.org/abs/2405.01702v2 ) ライセンス: Link先を確認	Simon Vary, Pierre Ablin, Bin Gao, P.-A. Absil,	(参考訳) $X^\top B X = I_p$ を満たす行列 $X$ の集合(一般化スティーフェル多様体と呼ばれる)上の最適化は、正準相関解析(CCA)、独立成分解析(ICA)、一般化固有値問題(GEVP)など、サンプル共分散行列を含む多くの応用に現れる。これらの問題は通常、完全に構成された $B$ を必要とする反復法によって解かれる。本稿では、$B$ のランダムな推定値にのみアクセスしながら最適化問題を解く、安価な確率的反復法を提案する。我々の方法はすべての反復において制約を強制するのではなく、期待値で定義される一般化スティーフェル多様体上の臨界点に収束する反復列を生成する。この手法は反復あたりのコストが低く、行列乗算のみを必要とし、完全な行列 $B$ を必要とするリーマン最適化手法と同じ収束率を持つ。実験は、CCA、ICA、GEVPを含む一般化直交制約を含む様々な機械学習アプリケーションでその効果を示す。 Optimization over the set of matrices $X$ that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices such as the canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP). Solving these problems is typically done by iterative methods that require a fully formed $B$. We propose a cheap stochastic iterative method that solves the optimization problem while having access only to random estimates of $B$. Our method does not enforce the constraint in every iteration; instead, it produces iterations that converge to critical points on the generalized Stiefel manifold defined in expectation. The method has lower per-iteration cost, requires only matrix multiplications, and has the same convergence rates as its Riemannian optimization counterparts that require the full matrix $B$. Experiments demonstrate its effectiveness in various machine learning applications involving generalized orthogonality constraints, including CCA, ICA, and the GEVP.	翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# 位置:Quo Vadis, Unsupervised Time Series Anomaly Detection? Position: Quo Vadis, Unsupervised Time Series Anomaly Detection? ( http://arxiv.org/abs/2405.02678v3 ) ライセンス: Link先を確認	M. Saquib Sarfraz, Mei-Yen Chen, Lukas Layer, Kunyu Peng, Marios Koulakis,	(参考訳) Timeseries Anomaly Detection (TAD)における機械学習奨学金の現在の状況は、欠陥のある評価指標の使用、一貫性のないベンチマークプラクティス、新しいディープラーニングベースのモデル設計における選択に対する適切な正当化の欠如に悩まされている。本稿は,TADにおける現状を批判的に分析し,現在の研究の誤解を招き,問題となる方法や評価の実践を明らかにする。我々の立場は、単に新しいモデル設計を追求することから、ベンチマークプラクティスの改善、非自明なデータセットの作成、より単純なベースラインに対して複雑なメソッドの有用性を批判的に評価することへと焦点を移すことを提唱している。その結果,厳密な評価プロトコルの必要性,単純なベースラインの作成,および最先端の深部異常検出モデルが線形写像を効果的に学習できることが示唆された。これらの結果から, 簡便かつ解釈可能なTAD法のさらなる探索と開発の必要性が示唆された。最先端のディープラーニングベースのモデルにおけるモデルの複雑さの増加は、残念ながら、ほとんど改善しない。この分野を前進させるための洞察と提案を提供する。コード:https://github.com/ssarfraz/QuoVadisTAD The current state of machine learning scholarship in Timeseries Anomaly Detection (TAD) is plagued by the persistent use of flawed evaluation metrics, inconsistent benchmarking practices, and a lack of proper justification for the choices made in novel deep learning-based model designs. Our paper presents a critical analysis of the status quo in TAD, revealing the misleading track of current research and highlighting problematic methods, and evaluation practices. Our position advocates for a shift in focus from solely pursuing novel model designs to improving benchmarking practices, creating non-trivial datasets, and critically evaluating the utility of complex methods against simpler baselines. Our findings demonstrate the need for rigorous evaluation protocols, the creation of simple baselines, and the revelation that state-of-the-art deep anomaly detection models effectively learn linear mappings. These findings suggest the need for more exploration and development of simple and interpretable TAD methods. The increment of model complexity in the state-of-the-art deep-learning based models unfortunately offers very little improvement. 
We offer insights and suggestions for the field to move forward. Code: https://github.com/ssarfraz/QuoVadisTAD	翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# 一般化されたアインシュタイン-ポドルスキー-ローゼンステアリングパラドックス Generalized Einstein-Podolsky-Rosen Steering Paradox ( http://arxiv.org/abs/2405.03100v2 ) ライセンス: Link先を確認	Zhi-Jie Liu, Xing-Yan Fan, Jie Zhou, Mi Xie, Jing-Ling Chen,	(参考訳) 量子パラドックスは、量子論と古典論の非互換性を明らかにするための重要な手段であり、その中でもアインシュタイン=ポドルスキー=ローゼン(EPR)ステアリングパラドックスは、通常の不等式に基づく方法よりも、局所隠れ状態モデルと量子力学との矛盾に対するより鋭い基準を与える。本研究では、量子($Q$)論と古典($C$)論によって与えられる矛盾した等式 $2_{Q}=\left( 1+\delta\right)_{C}$ ($0\leq\delta<1$) を予測する、一般化されたEPRステアリングパラドックスを示す。ステアリングされる側の条件付き状態が純粋である任意の $N$ 量子ビット状態に対して、2設定のステアリングプロトコルを用いてパラドックスを検証し、特定の測定条件が満たされれば、その状態がステアリング可能であることを見出す。さらに、我々の構成はEPRステアリング不等式の構築にも示唆を与え、これは典型的な量子テレポーテーションや量子鍵配送のいくつかの方式に寄与する可能性がある。 Quantum paradoxes are essential means to reveal the incompatibility between quantum and classical theories, among which the Einstein-Podolsky-Rosen (EPR) steering paradox offers a sharper criterion for the contradiction between the local-hidden-state model and quantum mechanics than the usual inequality-based method. In this work, we present a generalized EPR steering paradox, which predicts a contradictory equality $2_{Q}=\left( 1+\delta\right)_{C}$ ($0\leq\delta<1$) given by the quantum ($Q$) and classical ($C$) theories. For any $N$-qubit state in which the conditional state of the steered party is pure, we test the paradox through a two-setting steering protocol, and find that the state is steerable if some specific measurement requirements are satisfied. Moreover, our construction also sheds light on the construction of EPR steering inequalities, which may contribute to some schemes for typical quantum teleportation and quantum key distribution.	翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# 対UASシステムのための商用DTIソリューションの比較性能評価のための目標駆動型テスト手法の設計 Designing an Objective-Driven Test Method for the Comparative Performance Evaluation of Commercial DTI Solutions for Counter UAS systems ( http://arxiv.org/abs/2405.04477v2 ) ライセンス: Link先を確認	Ali Mohamoud, Johan van de Pol, Hanno Hildmann, Rob van Heijster, Beatrice Masini, Martijn van den Heuvel, Amber van Keeken,	(参考訳) 無人航空システム(UAS)やドローンは、市販品としてますます入手しやすく、安価になっている。検出・追跡・識別(DTI)ソリューションを備えた対UASシステム(C-UAS)の開発と展開に重点が置かれている。しかし、これらのシステムの能力はベンチマークが難しい。これらのシステムの性能主張は、現在証拠によって支持されていない。さらに、これらのDTIシステムでは標準的なテスト方法論が利用できず、異なるテスト方法論がこれらのシステムの比較を困難または不可能にしている。本稿では,C-UASを対象とした商用DTIソリューションにおける目標駆動型テスト手法の定義,開発,検証,およびそれに対応する比較性能評価について報告する。開発された方法論は、運用上関係のあるエンドユーザーシナリオに基づいている。テスト手法は汎用DTIシステムレイアウトに基づいており、コンテキスト情報とエンドユーザー入力を考慮して、検出、追跡、識別について詳細化されている。 DTIシステムの性能に影響を及ぼす可能性のある潜在的な環境面を考慮し、関連する環境における方法論の使用を可能にするために、比較性能評価法を開発した。関連する環境での作業の検証は、3つの運用試験で行われている。運用試験の結果、本手法は、コンポーネントレベル(検出、追跡、識別コンポーネント)とシステムレベル(これらのコンポーネントの組み合わせ、および統合DTIシステム・オブ・システムズのソリューション)のパフォーマンス評価を可能にすることが示された。 Unmanned Aerial Systems (UASs) or drones are becoming increasingly commercially available and cheap. There has been much emphasis on developing and deploying Counter-UAS systems (C-UASs) with Detection, Tracking and Identification (DTI) solutions. However, the capabilities of these systems are hard to benchmark. Performance claims of these systems are currently not supported by evidence. In addition, no standard test methodologies are available for these DTI systems and different test methodologies make comparison of these systems hard or impossible. We report on the definition, development and verification of an objective-driven test method and corresponding comparative performance evaluation for commercial DTI solutions for C-UASs. The developed methodology is based on end-user scenarios that are operationally relevant. 
The test methodology is based on a generic DTI system lay-out and is detailed towards detection, tracking and identification, taking into account contextual information and end-user input. The comparative performance evaluation is developed to enable the use of the methodology in a relevant environment, thereby taking into account any potential environmental aspect that might influence DTI system performance. Validation of the work in a relevant environment has been done in three operational trials. The operational trial results show that the method allows for performance evaluation at component level (i.e., detection, tracking or identification component) and at system level (combinations of these components and integrated DTI system of system solutions).	翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# グラフニューラルネットに基づくクエリプラン表現の新しい手法 A Novel Technique for Query Plan Representation Based on Graph Neural Nets ( http://arxiv.org/abs/2405.04814v2 ) ライセンス: Link先を確認	Baoming Chang, Amin Kamali, Verena Kantere,	(参考訳) クエリプランの表現学習は、データベース管理システムの機械学習ベースのクエリオプティマイザにおいて重要な役割を果たす。この目的のために、木構造クエリプランを下流機械学習モデルで学習可能なフォーマットの表現に変換するために、特定のモデルアーキテクチャが文献で提案されている。しかし、既存の研究では、これらのツリーモデルのクエリプラン表現能力と、全体的なオプティマイザの性能に対する直接的な影響を比較し、分析することはめったにない。この問題に対処するために、我々は、比較的複雑なワークロードにおいて、異なる最先端ツリーモデルの使用がオプティマイザのコスト推定と計画選択の性能に与える効果を比較検討する。さらに、クエリ計画表現タスクでグラフニューラルネットワーク(GNN)を使用する可能性についても検討する。本稿では, Gated Recurrent Unit (GRU) で集約された双方向GNNを用いる新しいツリーモデルBiGGを提案し、BiGGが最先端のツリーモデルと比較して、コスト推定タスクを大幅に改善し、計画選択においても比較的優れた性能を示すことを実験的に示す。 Learning representations for query plans plays a pivotal role in machine learning-based query optimizers of database management systems. To this end, particular model architectures are proposed in the literature to transform the tree-structured query plans into representations with formats learnable by downstream machine learning models. However, existing research rarely compares and analyzes the query plan representation capabilities of these tree models and their direct impact on the performance of the overall optimizer. To address this problem, we perform a comparative study to explore the effect of using different state-of-the-art tree models on the optimizer's cost estimation and plan selection performance in relatively complex workloads. Additionally, we explore the possibility of using graph neural networks (GNNs) in the query plan representation task. We propose a novel tree model BiGG employing Bidirectional GNN aggregated by Gated recurrent units (GRUs) and demonstrate experimentally that BiGG provides significant improvements to cost estimation tasks and comparatively strong plan selection performance compared to the state-of-the-art tree models.	翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# 高次元観測から低次元潜在ダイナミクスを学習する:非漸近と下界 Learning Low-dimensional Latent Dynamics from High-dimensional Observations: Non-asymptotics and Lower Bounds ( http://arxiv.org/abs/2405.06089v2 ) ライセンス: Link先を確認	Yuyang Zhang, Shahriar Talebi, Na Li,	(参考訳) 本稿では,低次元潜在変数を持つ線形時間不変モデル(LTI)の学習に焦点をあてる。我々は,観測者の列空間のような高次元の特徴を復元し,データを低次元に埋め込み,低次元モデルパラメータを学習するアルゴリズムを提案する。我々のアルゴリズムは、次数$\tilde{\mathcal{O}}(n/\epsilon^2)$のサンプル複雑性を保証する。さらに、この複雑性境界が対数係数と次元非依存定数に最適であることを示す基本的な下界を確立する。この避けられない$n$の線形係数は、高次元ノイズの存在下で観測者の列空間の学習誤差に起因する。結果を拡張して,複数のLTIシステムのデータセットからオブザーバ列空間を総合的に学習する,様々な実世界のアプリケーションから着想を得たメタラーニング問題を考える。その後、サンプルの複雑性を低下させるメタデータセットからLTIシステムの学習を容易にするエンド・ツー・エンドのアルゴリズムが提案される。 In this paper, we focus on learning a linear time-invariant (LTI) model with low-dimensional latent variables but high-dimensional observations. We provide an algorithm that recovers the high-dimensional features, i.e. column space of the observer, embeds the data into low dimensions and learns the low-dimensional model parameters. Our algorithm enjoys a sample complexity guarantee of order $\tilde{\mathcal{O}}(n/\epsilon^2)$, where $n$ is the observation dimension. We further establish a fundamental lower bound indicating this complexity bound is optimal up to logarithmic factors and dimension-independent constants. We show that this inevitable linear factor of $n$ is due to the learning error of the observer's column space in the presence of high-dimensional noises. Extending our results, we consider a meta-learning problem inspired by various real-world applications, where the observer column space can be collectively learned from datasets of multiple LTI systems. An end-to-end algorithm is then proposed, facilitating learning LTI systems from a meta-dataset which breaks the sample complexity lower bound in certain scenarios.	翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# ランダム行列理論により改良された対称正定値行列のフレシェ平均 Random matrix theory improved Fréchet mean of symmetric positive definite matrices ( http://arxiv.org/abs/2405.06558v2 ) ライセンス: Link先を確認	Florent Bouchard, Ammar Mian, Malik Tiomoko, Guillaume Ginolhac, Frédéric Pascal,	(参考訳) 本研究では、機械学習における共分散行列の領域について考察し、特に対称正定値行列の多様体上のフレシェ平均(一般にカーチャー(Karcher)平均あるいは幾何平均と呼ばれる)の計算に焦点をあてる。このような平均は、多くの機械学習タスクで活用される。高度な統計的手法に基づき、フレシェ平均を推定するランダム行列理論に基づく手法を導入する。この手法は、サンプル数が少なく、平均化する行列の数が多い場合に特に有益である。合成および実世界の脳波(EEG)・ハイパースペクトルデータセットを用いた実験評価により、我々の手法が最先端の手法を大きく上回ることを示す。 In this study, we consider the realm of covariance matrices in machine learning, particularly focusing on computing Fréchet means on the manifold of symmetric positive definite matrices, commonly referred to as Karcher or geometric means. Such means are leveraged in numerous machine-learning tasks. Relying on advanced statistical tools, we introduce a random matrix theory-based method that estimates Fréchet means, which is particularly beneficial when dealing with low sample support and a high number of matrices to average. Our experimental evaluation, involving both synthetic and real-world EEG and hyperspectral datasets, shows that we largely outperform state-of-the-art methods.	翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# あらゆるデータ分布に対するコンフォーマルな妥当性保証(そしてその見つけ方) Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them) ( http://arxiv.org/abs/2405.06627v3 ) ライセンス: Link先を確認	Drew Prinster, Samuel Stanton, Anqi Liu, Suchi Saria,	(参考訳) 人工知能(AI)/機械学習(ML)が広く普及するにつれて、実践者はこれらのシステムがもたらすリスクを定量化し、制御する方法を模索している。このようなシステムが、ブラックボックス最適化やアクティブラーニングなど、独自のデータを収集する自律性を持ち、その行動がデータ分布に逐次的なフィードバックループ型のシフトを引き起こす場合には、この課題は特に顕著である。コンフォーマル予測は、不確実性とリスクの定量化に対する有望なアプローチであるが、従来の変種による妥当性保証は、データ分布に何らかの形の「準交換可能性」を仮定しており、多くの種類の逐次的シフトを排除している。本稿では,共形予測が,交換可能データや準交換可能データだけでなく,理論的には任意の結合データ分布に拡張可能であることを証明する。最も一般的なケースは計算上きわめて非実用的であるが、具体的な実用的応用に向けて、任意のデータ分布に対して特定の共形アルゴリズムを導出するための手順を概説し、この手順を用いて、AI/MLエージェントが引き起こす一連の共変量シフトに対して、扱いやすいアルゴリズムを導出する。提案アルゴリズムを、合成ブラックボックス最適化とアクティブ学習タスクで実証的に評価する。 As artificial intelligence (AI) / machine learning (ML) gain widespread adoption, practitioners are increasingly seeking means to quantify and control the risk these systems incur. This challenge is especially salient when such systems have autonomy to collect their own data, such as in black-box optimization and active learning, where their actions induce sequential feedback-loop shifts in the data distribution. Conformal prediction is a promising approach to uncertainty and risk quantification, but prior variants' validity guarantees have assumed some form of "quasi-exchangeability" on the data distribution, thereby excluding many types of sequential shifts. In this paper we prove that conformal prediction can theoretically be extended to any joint data distribution, not just exchangeable or quasi-exchangeable ones. Although the most general case is exceedingly impractical to compute, for concrete practical applications we outline a procedure for deriving specific conformal algorithms for any data distribution, and we use this procedure to derive tractable algorithms for a series of AI/ML-agent-induced covariate shifts. We evaluate the proposed algorithms empirically on synthetic black-box optimization and active learning tasks.	
翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# TKAN: 時間的コルモゴロフ・アーノルドネットワーク TKAN: Temporal Kolmogorov-Arnold Networks ( http://arxiv.org/abs/2405.07344v2 ) ライセンス: Link先を確認	Remi Genet, Hugo Inzirillo,	(参考訳) リカレントニューラルネットワーク(RNN)は、特に自然言語やデータシーケンス処理において、機械学習の多くの領域に革命をもたらした。 LSTM(Long Short-Term Memory)は、シーケンシャルデータにおける長期的な依存関係を捉える能力を示している。 MLP(Multi-Layer Perceptrons)に代わる有望な選択肢であるKolmogorov-Arnold Networks(KAN)に触発され、我々はKANとLSTMに着想を得た新しいニューラルネットワークアーキテクチャであるTKAN(Temporal Kolmogorov-Arnold Networks)を提案する。 TKANは両方のネットワークの強みを組み合わせたもので、メモリ管理を組み込んだRecurring Kolmogorov-Arnold Network (RKAN) 層で構成されている。この革新により、精度と効率を向上したマルチステップ時系列予測が可能となる。複雑なシーケンシャルパターンを扱う場合の従来のモデルの限界に対処することにより、TKANアーキテクチャは、1ステップより先の予測を必要とする分野において、大きな可能性をもたらす。 Recurrent Neural Networks (RNNs) have revolutionized many areas of machine learning, particularly in natural language and data sequence processing. Long Short-Term Memory (LSTM) has demonstrated its ability to capture long-term dependencies in sequential data. Inspired by Kolmogorov-Arnold Networks (KANs), a promising alternative to Multi-Layer Perceptrons (MLPs), we propose a new neural network architecture inspired by KANs and the LSTM: Temporal Kolmogorov-Arnold Networks (TKANs). TKANs combine the strengths of both networks; they are composed of Recurring Kolmogorov-Arnold Network (RKAN) layers that embed memory management. This innovation enables us to perform multi-step time series forecasting with enhanced accuracy and efficiency. By addressing the limitations of traditional models in handling complex sequential patterns, the TKAN architecture offers significant potential for advancements in fields requiring more than one step ahead forecasting.	翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# AnoVox: 自動運転におけるマルチモーダル異常検出ベンチマーク AnoVox: A Benchmark for Multimodal Anomaly Detection in Autonomous Driving ( http://arxiv.org/abs/2405.07865v3 ) ライセンス: Link先を確認	Daniel Bogdoll, Iramm Hamdard, Lukas Namgyu Rößler, Felix Geisler, Muhammed Bayram, Felix Wang, Jan Imhof, Miguel de Campos, Anushervon Tabarov, Yitian Yang, Hanno Gottschalk, J. Marius Zöllner,	(参考訳) 自動運転車のスケールアップは、道路上のまれな物体のような異常に対処する能力に大きく依存している。このような状況に対処するためには、そもそも異常を検出する必要がある。自動走行の異常検出はここ数年で大きな進歩を遂げてきたが、カメラデータに強く焦点を絞った設計の悪いベンチマークに悩まされている。本研究では,自動運転におけるANOmaly検出のための最大のベンチマークであるAnoVoxを提案する。 AnoVoxは、大規模なマルチモーダルセンサーデータと空間的VOXel地上真実を組み込んでおり、使用済みセンサとは無関係な方法の比較を可能にしている。正規性の形式的定義を提案し,従順なトレーニングデータセットを提供する。 AnoVoxは、コンテンツと時間的異常の両方を含む最初のベンチマークである。 The scale-up of autonomous vehicles depends heavily on their ability to deal with anomalies, such as rare objects on the road. In order to handle such situations, it is necessary to detect anomalies in the first place. Anomaly detection for autonomous driving has made great progress in the past years but suffers from poorly designed benchmarks with a strong focus on camera data. In this work, we propose AnoVox, the largest benchmark for ANOmaly detection in autonomous driving to date. AnoVox incorporates large-scale multimodal sensor data and spatial VOXel ground truth, allowing for the comparison of methods independent of their used sensor. We propose a formal definition of normality and provide a compliant training dataset. AnoVox is the first benchmark to contain both content and temporal anomalies.	翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# 代表プレイヤーを用いたグラフォン平均場ゲーム:解析と学習アルゴリズム Graphon Mean Field Games with a Representative Player: Analysis and Learning Algorithm ( http://arxiv.org/abs/2405.08005v2 ) ライセンス: Link先を確認	Fuzhong Zhou, Chenyu Zhang, Xu Chen, Xuan Di,	(参考訳) 本稿では,エージェント間の不均一な相互作用を伴う確率ゲームを研究するために,代表プレイヤーを用いた、連続状態・行動空間上の離散時間グラフォンゲームの定式化を提案する。この定式化は、プレイヤーの連続体を用いる広く採用されている定式化と比較して、哲学的および数学的な利点の両方を持つ。緩やかな仮定の下でグラフォン均衡の存在と一意性を証明し、この均衡を用いて、次元の呪いのために解析と求解が困難であるネットワーク上の有限プレイヤーゲームに対する近似解を構築できることを示す。均衡を数値的に解くためにオンラインのオラクルフリー学習アルゴリズムを開発し、その収束についてサンプル複雑性解析を与える。 We propose a discrete time graphon game formulation on continuous state and action spaces using a representative player to study stochastic games with heterogeneous interaction among agents. This formulation admits both philosophical and mathematical advantages, compared to a widely adopted formulation using a continuum of players. We prove the existence and uniqueness of the graphon equilibrium under mild assumptions, and show that this equilibrium can be used to construct an approximate solution for finite-player games on networks, which are challenging to analyze and solve due to the curse of dimensionality. An online oracle-free learning algorithm is developed to solve the equilibrium numerically, and sample complexity analysis is provided for its convergence.	翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# 非ユークリッド計量によるヒルベルト空間における部分系の定義 Defining subsystems in Hilbert spaces with non-Euclidean metric ( http://arxiv.org/abs/2405.08095v2 ) ライセンス: Link先を確認	Himanshu Badhani, Sibasish Ghosh,	(参考訳) この研究は、下層の内積構造とは独立に有限次元ヒルベルト空間内の部分系を同定する一貫した方法の概要を述べる。いわゆる計量作用素によって定義される修正内積を持つヒルベルト空間が、例えば、擬エルミート・ハミルトニアンをもたらす均衡した利得と損失を含むような特定の現象を表現する最も自然な方法であることは、よく確立されている。擬エルミート進化を経る合成系では、部分系を定義することは一般に、部分的トレース演算を適切に定義できるように計量作用素がテンソル積形式を持つように選択された場合にのみ実現可能であると考えられている。本研究では、計量がテンソル積形式であるか否かに関わらず、すべての計量空間において部分系が適切に定義可能であることを示すために、代数的量子力学からの議論を用いる。これは、基底となる $C^*$ 代数の互いに可換な部分代数への分解と部分系を同一視することによって行われる。異なる部分系分解は、GNS表現の異なる同値類を選択することに対応することを示す。さらに、擬エルミート・ハミルトニアンの形式が与えられた場合、ハミルトニアンと両立する計量の選択は部分系分解を特徴づけ、結果として系のエンタングルメント構造を特徴づける。このように定義された各部分系がトモグラフィ的に構築可能であること、およびこれらの部分系がノーシグナリング原理を満たすことを明らかにする。これらの結果により、計量作用素のすべての選択を対等に扱う。 This work outlines a consistent method of identifying subsystems in finite-dimensional Hilbert spaces, independent of the underlying inner-product structure. It has been well established that Hilbert spaces with modified inner-product, defined through the so-called metric operator, turn out to be the most natural ways to represent certain phenomena such as those involving balanced gain and loss resulting in pseudo-Hermitian Hamiltonians. For composite systems undergoing pseudo-Hermitian evolution, defining the subsystems is generally considered feasible only when the metric operator is chosen to have a tensor product form so that a partial trace operation can be well defined. In this work, we use arguments from algebraic quantum mechanics to show that the subsystems can be well-defined in every metric space -- irrespective of whether or not the metric is of tensor product form. This is done by identifying subsystems with a decomposition of the underlying $C^*$-algebra into commuting sub-algebras. We show that different subsystem decompositions correspond to choosing different equivalence classes of the GNS representation. 
Furthermore, given a form of pseudo-Hermitian Hamiltonian, the choice of the Hamiltonian compatible metric characterizes the subsystem decomposition and as a consequence, the entanglement structure in the system. We clarify how each of the subsystems, defined this way, can be tomographically constructed and that these subsystems satisfy the no-signaling principle. With these results, we put all the choices of the metric operator on an equal footing.	翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# クリロフ空間における量子ダイナミクス:方法と応用 Quantum Dynamics in Krylov Space: Methods and Applications ( http://arxiv.org/abs/2405.09628v2 ) ライセンス: Link先を確認	Pratik Nandy, Apollonas S. Matsoukas-Roubeas, Pablo Martínez-Azcona, Anatoly Dymarsky, Adolfo del Campo,	(参考訳) 量子系の力学は状態空間や作用素空間(クリロフ空間)の部分空間内で展開する。このレビューでは、クリロフ部分空間法を用いて、大きなヒルベルト空間を持つ多体系の非平衡現象に重点を置いて、量子進化のコンパクトで効率的な記述を提供する。これは、ハイゼンベルク図における作用素の量子進化と純粋かつ混合状態に焦点を当てた最近の発展の包括的更新を提供する。さらに、作用素成長を定量化するためのツールとして、Krylov複雑性と関連するメトリクスの概念、一般化された量子速度制限による境界、普遍的な作用素成長仮説、量子カオス、スクランブル、一般化されたコヒーレント状態との関係について考察する。開量子系に対するクリロフ構成のいくつかの一般化の比較を示す。クリャロフ部分空間法の量子場理論、ホログラフィー、可積分性、量子制御、量子コンピューティングへの応用と、現在のオープンな問題に対処する。 The dynamics of quantum systems unfolds within a subspace of the state space or operator space, known as the Krylov space. This review presents the use of Krylov subspace methods to provide a compact and computationally efficient description of quantum evolution, with emphasis on nonequilibrium phenomena of many-body systems with a large Hilbert space. It provides a comprehensive update of recent developments, focused on the quantum evolution of operators in the Heisenberg picture as well as pure and mixed states. It further explores the notion of Krylov complexity and associated metrics as tools for quantifying operator growth, their bounds by generalized quantum speed limits, the universal operator growth hypothesis, and its relation to quantum chaos, scrambling, and generalized coherent states. A comparison of several generalizations of the Krylov construction for open quantum systems is presented. A closing discussion addresses the application of Krylov subspace methods in quantum field theory, holography, integrability, quantum control, and quantum computing, as well as current open problems.	翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# 大規模言語モデルによる科学的仮説生成:乳癌治療における検査的検証 Scientific Hypothesis Generation by a Large Language Model: Laboratory Validation in Breast Cancer Treatment ( http://arxiv.org/abs/2405.12258v2 ) ライセンス: Link先を確認	Abbi Abdel-Rehim, Hector Zenil, Oghenejokpeme Orhobor, Marie Fisher, Ross J. Collins, Elizabeth Bourne, Gareth W. Fearnley, Emma Tate, Holly X. Smith, Larisa N. Soldatova, Ross D. King,	(参考訳) 大規模言語モデル(LLM)はAIを変革し、人間の知性を必要とする幅広いタスクにおいて画期的なパフォーマンスを達成した。科学において、LLMの最も興味深い応用は仮説形成である。 LLMの特徴は、その確率的構造から生じるものであり、出力テキストが必ずしもトレーニングテキストからの有効な推論であるとは限らないことである。これらは「幻覚」であり、多くのアプリケーションにおいて深刻な問題である。しかし、科学では幻覚は有用であり、実験室で検証できる新しい仮説である。ここでは乳がん治療の分野での科学的仮説の根拠としてLLMの使用を実験的に検証する。 LLM GPT4を用いて,MCF7乳がん細胞株を標的とした新しいFDA承認非癌薬の仮説を立証した。実験の第1ラウンドで、GPT4は、正の制御以上のシナジースコアを持つ3つの薬物の組み合わせ(テストされた12のうち)を発見することに成功した。これらの組み合わせはイトラコナゾール+アテノール、ジスルフィラム+シムバスタチン、ジピリダモール+メベンダゾールである。その後、GPT4は最初の結果を考慮して新しい組み合わせを生成するよう求められた。その後、さらに3つの正のシナジースコア(4つの試験のうち)が発見され、これらはジスルフィラム+フヴェストラント、メベンダゾール+キナクリン、ジスルフィラム+キナクリンであった。仮説の生成元としてのGPT4の限界は、それらの説明が定式化され、説得力がないことである。 LLMは科学的仮説のエキサイティングな新しい源であると結論付けている。 Large language models (LLMs) have transformed AI and achieved breakthrough performance on a wide range of tasks that require human intelligence. In science, perhaps the most interesting application of LLMs is for hypothesis formation. A feature of LLMs, which results from their probabilistic structure, is that the output text is not necessarily a valid inference from the training text. These are 'hallucinations', and are a serious problem in many applications. However, in science, hallucinations may be useful: they are novel hypotheses whose validity may be tested by laboratory experiments. Here we experimentally test the use of LLMs as a source of scientific hypotheses using the domain of breast cancer treatment. We applied the LLM GPT4 to hypothesize novel pairs of FDA-approved non-cancer drugs that target the MCF7 breast cancer cell line relative to the non-tumorigenic breast cell line MCF10A. 
In the first round of laboratory experiments, GPT4 succeeded in discovering three drug combinations (out of 12 tested) with synergy scores above the positive controls. These combinations were itraconazole + atenolol, disulfiram + simvastatin and dipyridamole + mebendazole. GPT4 was then asked to generate new combinations after considering its initial results. It then discovered three more combinations with positive synergy scores (out of four tested): disulfiram + fulvestrant, mebendazole + quinacrine and disulfiram + quinacrine. A limitation of GPT4 as a generator of hypotheses was that its explanations for them were formulaic and unconvincing. We conclude that LLMs are an exciting novel source of scientific hypotheses.	翻訳日:2024-06-07 00:09:48 公開日:2024-06-05
# PyramidInfer:高スループットLLM推論のためのピラミッドKVキャッシュ圧縮 PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference ( http://arxiv.org/abs/2405.12532v2 ) ライセンス: Link先を確認	Dongjie Yang, XiaoDong Han, Yan Gao, Yao Hu, Shilin Zhang, Hai Zhao,	(参考訳) 大規模言語モデル(LLM)は、目覚ましい理解能力を示しているが、推論中のGPUメモリ使用の課題に直面しており、チャットボットのようなリアルタイムアプリケーションに対するスケーラビリティを妨げている。推論を高速化するために、計算されたキーと値(KVキャッシュ)をGPUメモリに格納する。既存の手法は、事前計算されたKVキャッシュをプルーニングすることでメモリを削減するKVキャッシュ圧縮を研究している。しかし、これらの手法は層間の依存関係と、事前計算における巨大なメモリ消費を無視している。これらの欠陥を調べた結果、将来の生成に影響を与える重要なキーと値の数は層ごとに減少し、注意重みの一貫性によってそれらを抽出できることがわかった。そこで本研究では,重要なコンテキストを層ごとに保持することでKVキャッシュを圧縮するPyramidInferを提案する。 PyramidInferは、パフォーマンスを犠牲にすることなく、より少ないキーと値を計算することで、メモリを大幅に節約する。実験の結果、PyramidInferはAccelerateと比較してスループットを2.2倍に向上させ、KVキャッシュのGPUメモリを54%以上削減した。 Large Language Models (LLMs) have shown remarkable comprehension abilities but face challenges in GPU memory usage during inference, hindering their scalability for real-time applications like chatbots. To accelerate inference, we store computed keys and values (KV cache) in the GPU memory. Existing methods study the KV cache compression to reduce memory by pruning the pre-computed KV cache. However, they neglect the inter-layer dependency and the huge memory consumption in pre-computation. To explore these deficiencies, we find that the number of crucial keys and values that influence future generations decreases layer by layer and we can extract them by the consistency in attention weights. Based on the findings, we propose PyramidInfer, a method that compresses the KV cache by layer-wise retaining crucial context. PyramidInfer saves significant memory by computing fewer keys and values without sacrificing performance. Experimental results show PyramidInfer improves throughput by 2.2x compared to Accelerate, with over 54% GPU memory reduction in KV cache.	翻訳日:2024-06-06 23:59:22 公開日:2024-06-05
# 大規模言語モデルにおける政治的バイアスの評価 Assessing Political Bias in Large Language Models ( http://arxiv.org/abs/2405.13041v3 ) ライセンス: Link先を確認	Luca Rettenberger, Markus Reischl, Mark Schutera,	(参考訳) 大規模言語モデル(LLMs)におけるバイアスの評価は、社会的ダイナミクスに対する潜在的な影響の文脈において、人工知能(AI)を取り巻く現代の議論において重要な関心事となっている。 LLMアプリケーション内での政治的偏見の認識と考慮は、特に、パフォーマンス予測に向けてチップポイントを閉じる際に重要である。そして、潜在的効果と社会的行動について教育を受けることで、LLMは人間のオペレーターとの相互作用により、大規模に運転することができる。このようにして、欧州議会の次の選挙は LLM の影響を受けないままである。我々は、欧州連合(EU)内の政治問題に関して、現在最も人気のあるオープンソースLLM(インストラクションまたはアシスタントモデル)の政治的バイアスを、ドイツの有権者の視点から評価する。そのために、ドイツで使われている投票アドバイスアプリケーション"Wahl-O-Mat"を使用します。ウォール=オ=マト」の投票助言から、ドイツ政党とのLLMの整合度を定量化する。 Llama3-70Bのような大型モデルは、左派政党とより緊密に結びつく傾向にある一方で、小さなモデルは、特に英語で促された場合、中立であることが多い。中心的な発見は、LLMも同様に偏りがあり、特定のパーティに関するアライメントのばらつきが低いことである。本研究は,性能予測能力と機械学習予測および言語生成の目に見えない手を用いたアプリケーションの完全性と信頼性を守るため,LLMにおける偏見の透明化を厳格に評価することの重要性を明らかにした。 The assessment of bias within Large Language Models (LLMs) has emerged as a critical concern in the contemporary discourse surrounding Artificial Intelligence (AI) in the context of their potential impact on societal dynamics. Recognizing and considering political bias within LLM applications is especially important when closing in on the tipping point toward performative prediction. Then, being educated about potential effects and the societal behavior LLMs can drive at scale due to their interplay with human operators. In this way, the upcoming elections of the European Parliament will not remain unaffected by LLMs. We evaluate the political bias of the currently most popular open-source LLMs (instruct or assistant models) concerning political issues within the European Union (EU) from a German voter's perspective. To do so, we use the "Wahl-O-Mat," a voting advice application used in Germany. From the voting advice of the "Wahl-O-Mat" we quantize the degree of alignment of LLMs with German political parties. 
We show that larger models, such as Llama3-70B, tend to align more closely with left-leaning political parties, while smaller models often remain neutral, particularly when prompted in English. The central finding is that LLMs are similarly biased, with low variances in the alignment concerning a specific party. Our findings underline the importance of rigorously assessing and making bias transparent in LLMs to safeguard the integrity and trustworthiness of applications that employ the capabilities of performative prediction and the invisible hand of machine learning prediction and language generation.	翻訳日:2024-06-06 23:59:22 公開日:2024-06-05
# 未来を創るAIコミュニティ : ハグする顔ハブの開発活動の定量的分析 The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub ( http://arxiv.org/abs/2405.13058v2 ) ライセンス: Link先を確認	Cailean Osborne, Jennifer Ding, Hannah Rose Kirk,	(参考訳) オープンモデル開発者は、人工知能(AI)の政治経済において重要な役割を担っている。本稿では,Huging Face (HF) Hubにおける開発活動の定量的分析を3段階に分けて行うことで,このギャップに対処する。まず、348,181モデル、65,761データセット、および156,642スペースリポジトリのさまざまな種類のアクティビティが右スクリュー分布を示している。例えば、70%以上のモデルが0回ダウンロードされており、1%が99%のダウンロードを占めている。さらに、ライセンスは重要です: パーミッシブで制限的で、ライセンスのないモデルリポジトリでは、コラボレーションパターンに統計的に有意な違いがあります。第2に、モデルリポジトリにおけるコラボレーションのソーシャルネットワーク構造のスナップショットを分析し、コミュニティがコア周辺構造を持ち、多彩な開発者のコアと分離された開発者の大多数(89%)が参加していることを発見した。分離された開発者をネットワークから排除すると、コラボレーションは開発者のネットワーク位置に関係なく高い相互性によって特徴づけられる。第三に、空間におけるモデル利用のレンズを通してモデルの採用を検討し、少数の企業が開発している少数のモデルがHF Hubで広く使われていることを発見した。全体として、HF Hub上のアクティビティはParetoディストリビューションによって特徴づけられ、GitHubのようなプラットフォーム上のOSS開発パターンと一致している。我々は、オープンAI開発の理解を深めるための研究者、企業、政策立案者への勧告で締めくくります。 Open model developers have emerged as key actors in the political economy of artificial intelligence (AI), but we still have a limited understanding of collaborative practices in the open AI ecosystem. This paper responds to this gap with a three-part quantitative analysis of development activity on the Hugging Face (HF) Hub, a popular platform for building, sharing, and demonstrating models. First, various types of activity across 348,181 model, 65,761 dataset, and 156,642 space repositories exhibit right-skewed distributions. Activity is extremely imbalanced between repositories; for example, over 70% of models have 0 downloads, while 1% account for 99% of downloads. Furthermore, licenses matter: there are statistically significant differences in collaboration patterns in model repositories with permissive, restrictive, and no licenses. 
Second, we analyse a snapshot of the social network structure of collaboration in model repositories, finding that the community has a core-periphery structure, with a core of prolific developers and a majority of isolate developers (89%). Upon removing the isolate developers from the network, collaboration is characterised by high reciprocity regardless of developers' network positions. Third, we examine model adoption through the lens of model usage in spaces, finding that a minority of models, developed by a handful of companies, are widely used on the HF Hub. Overall, activity on the HF Hub is characterised by Pareto distributions, congruent with OSS development patterns on platforms like GitHub. We conclude with recommendations for researchers, companies, and policymakers to advance our understanding of open AI development.	翻訳日:2024-06-06 23:59:22 公開日:2024-06-05
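The concentration figures quoted above (the share of repositories with zero downloads, and the share of all downloads held by the top 1%) can be illustrated with a minimal sketch. The function names and the toy download counts below are hypothetical and are not taken from the paper or from any Hugging Face API; the sketch only shows how such shares could be computed from a list of per-repository download counts.

```python
def zero_share(downloads):
    """Fraction of repositories with exactly zero downloads."""
    return sum(1 for d in downloads if d == 0) / len(downloads)


def top_share(downloads, fraction):
    """Share of all downloads held by the top `fraction` of repositories."""
    ranked = sorted(downloads, reverse=True)
    k = max(1, int(len(ranked) * fraction))  # always count at least one repository
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical counts for 100 repositories (illustrative only, not paper data).
toy_downloads = [0] * 70 + [1] * 29 + [9971]
print(zero_share(toy_downloads))       # fraction of repositories with no downloads
print(top_share(toy_downloads, 0.01))  # share of downloads held by the top 1%
```

On real data the same two numbers would give a quick check of how right-skewed the activity distribution is.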
# SLIFER: マルウェア検出パイプラインの性能とロバスト性の調査 SLIFER: Investigating Performance and Robustness of Malware Detection Pipelines ( http://arxiv.org/abs/2405.14478v2 ) ライセンス: Link先を確認	Andrea Ponte, Dmitrijs Trizna, Luca Demetrio, Battista Biggio, Ivan Tesfai Ogbu, Fabio Roli,	(参考訳) 何十年にもわたっての研究の結果、Windowsのマルウェア検出は数多くの技術を通してアプローチされている。しかしながら、検出率と低い誤報の観点から最適なパフォーマンスを追求するアカデミックと、現実のシナリオの要件との間には、継続的なミスマッチがある。特にアカデミックは、単一のモデルまたはアンサンブル内で静的解析と動的解析を組み合わせることに集中し、いくつかの落とし穴に陥る。一必要な計算負担を考慮せずに、動的解析を行うこと。二分析不可能なサンプルを廃棄すること、及び三敵攻撃に対する頑健さを、マルウェア検知器がより非機械的学習部品で補完されていることを考慮せずに分析すること。そこで本稿では,静的解析と動的解析の両方を逐次的に活用し,ひとつのモジュールがアラームを起動するとすぐに計算を中断し,必要な時にのみ動的解析を必要とする,新しいWindowsマルウェア検出パイプラインであるSLIFERを提案する。現状とは対照的に、分析に対するサンプル抵抗の扱い方について検討し、それらがパフォーマンスにどの程度影響するかを示し、誤報を劇的に増やさないよう正当であるとフラグを立てた方がよいと結論付けた。最後に、コンテンツインジェクション攻撃を利用したSLIFERの堅牢性評価を行い、対戦戦略を最適化しながら生成したバイトアーティファクトによる動的解析よりも、YARAルールにより攻撃がブロックされることを示す。 As a result of decades of research, Windows malware detection is approached through a plethora of techniques. However, there is an ongoing mismatch between academia -- which pursues an optimal performances in terms of detection rate and low false alarms -- and the requirements of real-world scenarios. In particular, academia focuses on combining static and dynamic analysis within a single or ensemble of models, falling into several pitfalls like (i) firing dynamic analysis without considering the computational burden it requires; (ii) discarding impossible-to-analyse samples; and (iii) analysing robustness against adversarial attacks without considering that malware detectors are complemented with more non-machine-learning components. Thus, in this paper we propose SLIFER, a novel Windows malware detection pipeline sequentially leveraging both static and dynamic analysis, interrupting computations as soon as one module triggers an alarm, requiring dynamic analysis only when needed. 
Contrary to the state of the art, we investigate how to deal with samples that resist analysis, showing how much they impact performance and concluding that it is better to flag them as legitimate so as not to drastically increase false alarms. Lastly, we perform a robustness evaluation of SLIFER leveraging content-injection attacks, and we show that, counter-intuitively, attacks are blocked more by YARA rules than by dynamic analysis, due to byte artifacts created while optimizing the adversarial strategy.	翻訳日:2024-06-06 23:59:22 公開日:2024-06-05
# フォールトトレラントML:効率的なメタアグリゲーションと同期トレーニング Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training ( http://arxiv.org/abs/2405.14759v2 ) ライセンス: Link先を確認	Tehila Dahan, Kfir Y. Levy,	(参考訳) 本稿では,分散機械学習(ML)システムにおけるビザンチン・ロバスト学習の挑戦的枠組みについて検討し,効率性と実用性の両方に焦点をあてる。分散MLシステムは複雑なMLタスクに不可欠なものとなり、ビザンチンの障害に対するレジリエンスを確保する。最初のコントリビューションは、CTMA(Centered Trimmed Meta Aggregator)の導入です。これは、低計算要求を必要としながら、ベースラインアグリゲータを最適なパフォーマンスレベルにアップグレードする効率的なメタアグリゲータです。さらに,ビザンチン文脈における2重モーメント戦略に基づいて,最近開発された勾配推定手法を提案する。本稿では,ビザンチン・ロバスト訓練の理論的・実践的優位性,特にチューニングプロセスの簡素化と多数のハイパーパラメータへの依存軽減について述べる。この手法の有効性は確率凸最適化(SCO)フレームワークの理論的な洞察に支えられ、実証的な証拠によって裏付けられる。 In this paper, we investigate the challenging framework of Byzantine-robust training in distributed machine learning (ML) systems, focusing on enhancing both efficiency and practicality. As distributed ML systems become integral for complex ML tasks, ensuring resilience against Byzantine failures-where workers may contribute incorrect updates due to malice or error-gains paramount importance. Our first contribution is the introduction of the Centered Trimmed Meta Aggregator (CTMA), an efficient meta-aggregator that upgrades baseline aggregators to optimal performance levels, while requiring low computational demands. Additionally, we propose harnessing a recently developed gradient estimation technique based on a double-momentum strategy within the Byzantine context. Our paper highlights its theoretical and practical advantages for Byzantine-robust training, especially in simplifying the tuning process and reducing the reliance on numerous hyperparameters. The effectiveness of this technique is supported by theoretical insights within the stochastic convex optimization (SCO) framework and corroborated by empirical evidence.	翻訳日:2024-06-06 23:59:22 公開日:2024-06-05
# 制御可能なメモリを用いたパイプライン並列処理 Pipeline Parallelism with Controllable Memory ( http://arxiv.org/abs/2405.15362v2 ) ライセンス: Link先を確認	Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin,	(参考訳) パイプライン並列性は広く研究されてきたが、既存のスケジュールには体系的な方法論がない。本稿では,パイプラインスケジュールをビルディングブロックの繰り返しとして分解するフレームワークを提案し,ビルディングブロックの寿命がパイプラインスケジュールのピークアクティベーションメモリを決定することを示す。観察によってガイドされた結果,既存のパイプラインスケジュールのほとんどすべてが,私たちの知る限りでは,メモリ非効率であることが分かりました。これを解決するために、制御可能なアクティベーションメモリを備えたメモリ効率の良いビルディングブロック群を導入し、1F1Bのピークアクティベーションメモリを、効率を犠牲にすることなく1/2に削減し、最大スループットで1/3にまで削減する。また、1F1Bと同じアクティベーションメモリを維持しながら、ほぼゼロのパイプラインバブルを実現できる。我々の評価は、純粋なパイプライン並列化設定では、スループットの点で1F1Bを7%から55%上回っていることを示している。提案手法は,大規模言語モデルの1F1Bベースラインよりも16%のスループット向上を示す。 Pipeline parallelism has been widely explored, but most existing schedules lack a systematic methodology. In this paper, we propose a framework to decompose pipeline schedules as repeating a building block and we show that the lifespan of the building block decides the peak activation memory of the pipeline schedule. Guided by the observations, we find that almost all existing pipeline schedules, to the best of our knowledge, are memory inefficient. To address this, we introduce a family of memory efficient building blocks with controllable activation memory, which can reduce the peak activation memory to 1/2 of 1F1B without sacrificing efficiency, and even to 1/3 with comparable throughput. We can also achieve almost zero pipeline bubbles while maintaining the same activation memory as 1F1B. Our evaluations demonstrate that in pure pipeline parallelism settings, our methods outperform 1F1B by from 7% to 55% in terms of throughput. When employing a grid search over hybrid parallelism hyperparameters in practical scenarios, our proposed methods demonstrate a 16% throughput improvement over the 1F1B baseline for large language models.	翻訳日:2024-06-06 23:59:22 公開日:2024-06-05
# インプシットバイアスは逆行性ロバスト性を引き起こすか? Can Implicit Bias Imply Adversarial Robustness? ( http://arxiv.org/abs/2405.15942v2 ) ライセンス: Link先を確認	Hancheng Min, René Vidal,	(参考訳) 勾配に基づくトレーニングアルゴリズムの暗黙のバイアスは、しばしばよく一般化されるトレーニングネットワークにつながるため、主に有益であると考えられている。しかし、Frei et al (2023) はそのような暗黙の偏見が敵の頑健さを損なうことを示した。具体的には、クラスタ間相関が小さいクラスタからなる場合、勾配流によって訓練された浅層(二層)のReLUネットワークはよく一般化するが、小さな半径の敵攻撃に対して堅牢ではないことを示す。さらに、この現象は浅いネットワークから明示的に構築できるより堅牢な分類器が存在するにもかかわらず起こる。本稿では,近年のニューロンアライメント解析を拡張し,勾配流によってトレーニングされた多項式ReLU活性化(pReLU)の浅いネットワークが一般化するだけでなく,敵の攻撃に対して堅牢であることを示す。本結果は,学習ネットワークの暗黙的バイアスとロバスト性において,データ構造とアーキテクチャ設計の相互作用の重要性を強調した。 The implicit bias of gradient-based training algorithms has been considered mostly beneficial as it leads to trained networks that often generalize well. However, Frei et al. (2023) show that such implicit bias can harm adversarial robustness. Specifically, they show that if the data consists of clusters with small inter-cluster correlation, a shallow (two-layer) ReLU network trained by gradient flow generalizes well, but it is not robust to adversarial attacks of small radius. Moreover, this phenomenon occurs despite the existence of a much more robust classifier that can be explicitly constructed from a shallow network. In this paper, we extend recent analyses of neuron alignment to show that a shallow network with a polynomial ReLU activation (pReLU) trained by gradient flow not only generalizes well but is also robust to adversarial attacks. Our results highlight the importance of the interplay between data structure and architecture design in the implicit bias and robustness of trained networks.	翻訳日:2024-06-06 23:59:22 公開日:2024-06-05
# ContrastAlign:マルチモーダル3次元物体検出のためのコントラスト学習によるロバストなBEV特徴アライメントを目指して ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection ( http://arxiv.org/abs/2405.16873v2 ) ライセンス: Link先を確認	Ziying Song, Feiyang Jia, Hongyu Pan, Yadan Luo, Caiyan Jia, Guoxin Zhang, Lin Liu, Yang Ji, Lei Yang, Li Wang,	(参考訳) 3Dオブジェクト検出タスクの分野では、LiDARとカメラセンサーの不均一な特徴を統一されたBird's Eye View(BEV)表現に融合することが広く採用されているパラダイムである。しかし、既存の手法は、しばしば不正確なセンサーキャリブレーションによって妥協され、LiDARカメラのBEV融合における特徴的不一致をもたらす。さらに、このような不正確さは、カメラブランチの深さ推定の誤差をもたらし、最終的にLiDARとカメラBEVの特徴の不一致を引き起こす。本研究では,異種モードのアライメントを向上し,融合プロセスの堅牢性を向上させるために,コントラストアライメントを用いた新しいコントラストアライメント手法を提案する。具体的には、LiDAR BEV機能内で直接LiDARインスタンス機能を出力するL-Instanceモジュールを含む。次に,カメラBEV機能上でのRoI(Region of Interest)プールによるカメラインスタンス機能の予測を行うC-Instanceモジュールを紹介する。異種多様度にまたがる類似のインスタンス機能を生成するために,コントラスト学習を利用するインスタンスフュージョンモジュールを提案する。次に、グラフマッチングを使用して、隣接するカメラインスタンス機能と類似度インスタンス機能との類似度を計算し、インスタンス機能のアライメントを完了します。 MAPは70.3%であり, nuScenes 検証セットでは BEVFusion を 1.8% 上回っている。 BEVFusionを7.3%改善し,騒音の悪さを解消した。 In the field of 3D object detection tasks, fusing heterogeneous features from LiDAR and camera sensors into a unified Bird's Eye View (BEV) representation is a widely adopted paradigm. However, existing methods are often compromised by imprecise sensor calibration, resulting in feature misalignment in LiDAR-camera BEV fusion. Moreover, such inaccuracies result in errors in depth estimation for the camera branch, ultimately causing misalignment between LiDAR and camera BEV features. In this work, we propose a novel ContrastAlign approach that utilizes contrastive learning to enhance the alignment of heterogeneous modalities, thereby improving the robustness of the fusion process. Specifically, our approach includes the L-Instance module, which directly outputs LiDAR instance features within LiDAR BEV features. 
Then, we introduce the C-Instance module, which predicts camera instance features through RoI (Region of Interest) pooling on the camera BEV features. We propose the InstanceFusion module, which utilizes contrastive learning to generate similar instance features across heterogeneous modalities. We then use graph matching to calculate the similarity between the neighboring camera instance features and the similarity instance features to complete the alignment of instance features. Our method achieves state-of-the-art performance, with an mAP of 70.3%, surpassing BEVFusion by 1.8% on the nuScenes validation set. Importantly, our method outperforms BEVFusion by 7.3% under conditions with misalignment noise.	翻訳日:2024-06-06 23:59:22 公開日:2024-06-05
# I-LLM:完全量子化低ビット大言語モデルのための効率的な整数オンリー推論 I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models ( http://arxiv.org/abs/2405.17849v2 ) ライセンス: Link先を確認	Xing Hu, Yuan Cheng, Dawei Yang, Zhihang Yuan, Jiangyong Yu, Chen Xu, Sifan Zhou,	(参考訳) 後学習量子化(PTQ)は、大規模言語モデル(LLM)の推論を加速する強力な手法である。それでも、既存の作業は、RMSNormやSoftmaxのような非線形演算子と同様に、さらなる量子化や非量子化を含む、推論中にかなりの数の浮動小数点演算を必要とする。この制限は、エッジとクラウドデバイスへのLSMのデプロイを妨げる。本稿では,LLMにおける整数のみの量子化の主な障害は,線形演算と非線形演算の両方において,チャネルとトークン間のアクティベーションが大きく変動することにある。この問題に対処するために,LLMに適した整数のみの完全量子化PTQフレームワークであるI-LLMを提案する。具体的には,(1)全てのアクティベーションと重みのチャネル間変動を積極的にスムースに行うために,FSBR(Fully-Smooth Block-Reconstruction)を開発した。 2) トキン間変異による劣化を軽減するため, 動的整数のみのMatMul (DI-MatMul) と呼ばれる新しいアプローチを導入する。この方法は整数のみの演算で入力と出力を動的に量子化することにより、全整数行列乗法における動的量子化を可能にする。 (3) ビットシフトを利用したDI-ClippedSoftmax, DI-Exp, DI-Normalizationを設計し, 精度を維持しつつ, 非線形演算子を効率的に実行する。実験の結果,我々のI-LLMはFPベースラインに匹敵する精度を達成し,非整数量子化法より優れていた。例えば、I-LLMはW4A4で動作でき、精度は無視できる。我々の知る限り、我々は整数のみの量子化と LLM のギャップを埋める最初の人物である。我々は、この分野の進歩に貢献することを目的として、匿名の.4open.scienceに関するコードを公開しました。 Post-training quantization (PTQ) serves as a potent technique to accelerate the inference of large language models (LLMs). Nonetheless, existing works still necessitate a considerable number of floating-point (FP) operations during inference, including additional quantization and de-quantization, as well as non-linear operators such as RMSNorm and Softmax. This limitation hinders the deployment of LLMs on the edge and cloud devices. In this paper, we identify the primary obstacle to integer-only quantization for LLMs lies in the large fluctuation of activations across channels and tokens in both linear and non-linear operations. To address this issue, we propose I-LLM, a novel integer-only fully-quantized PTQ framework tailored for LLMs. Specifically, (1) we develop Fully-Smooth Block-Reconstruction (FSBR) to aggressively smooth inter-channel variations of all activations and weights. 
(2) To alleviate degradation caused by inter-token variations, we introduce a novel approach called Dynamic Integer-only MatMul (DI-MatMul). This method enables dynamic quantization in full-integer matrix multiplication by dynamically quantizing the inputs and outputs with integer-only operations. (3) We design DI-ClippedSoftmax, DI-Exp, and DI-Normalization, which use bit shifts to execute non-linear operators efficiently while maintaining accuracy. Experiments show that I-LLM achieves accuracy comparable to the FP baseline and outperforms non-integer quantization methods. For example, I-LLM can operate at W4A4 with negligible loss of accuracy. To our knowledge, we are the first to bridge the gap between integer-only quantization and LLMs. We have published our code on anonymous.4open.science, aiming to contribute to the advancement of this field.	翻訳日:2024-06-06 23:59:22 公開日:2024-06-05
# ロジスティック回帰に対する次元に依存しない一様集中不等式 Dimension-free uniform concentration bound for logistic regression ( http://arxiv.org/abs/2405.18055v2 ) ライセンス: Link先を確認	Shogo Nakakita,	(参考訳) 制約付きロジスティック回帰の経験的リスク関数に対して、次元に依存しない新しい一様集中不等式を与える。我々の不等式は、ラデマッハ複雑度による議論とMcDiarmidの不等式から導かれる条件よりも緩やかな、一様大数の法則の十分条件を与える。この導出は、2次展開を用いたPAC-Bayesアプローチと、展開の剰余項に対するラデマッハ複雑度に基づく上界に基づく。 We provide a novel dimension-free uniform concentration bound for the empirical risk function of constrained logistic regression. Our bound yields a milder sufficient condition for a uniform law of large numbers than conditions derived by the Rademacher complexity argument and McDiarmid's inequality. The derivation is based on the PAC-Bayes approach with second-order expansion and Rademacher-complexity-based bounds for the residual term of the expansion.	翻訳日:2024-06-06 23:59:22 公開日:2024-06-05
# 視覚言語ナビゲーションのための大規模モデルによる修正可能なランドマーク発見 Correctable Landmark Discovery via Large Models for Vision-Language Navigation ( http://arxiv.org/abs/2405.18721v2 ) ライセンス: Link先を確認	Bingqian Lin, Yunshuang Nie, Ziming Wei, Yi Zhu, Hang Xu, Shikui Ma, Jianzhuang Liu, Xiaodan Liang,	(参考訳) Vision-Language Navigation (VLN) は、ターゲット位置に到達するために、エージェントが言語命令に従う必要がある。ナビゲーションを成功させる重要な要因は、指導で暗示されるランドマークを様々な視覚的観察と整合させることである。しかしながら、以前のVLNエージェントは、限られたナビゲーションデータから学習し、十分なオープンワールドアライメント知識がないため、特に探索されていないシーンでは正確なモダリティアライメントを実行できない。本研究では,Currectable LaNdmark DiScOvery と呼ばれる新しい VLN パラダイムをLarge ModEls (CONSOLE) 経由で提案する。 CONSOLEでは、2つの大きなモデルChatGPTとCLIPに基づく新しい修正可能なランドマーク発見スキームを導入することで、VLNをオープンワールドシーケンシャルなランドマーク発見問題として捉えた。具体的には、ChatGPTを使用して、豊かなオープンワールドのランドマークコモンセンスを提供し、これらのコモンセンスに基づいてCLIP駆動のランドマーク発見を行う。視覚的制約の欠如による前者の騒音を軽減するため,学習可能な共起スコアリングモジュールを導入し,実際の観測結果に基づいて各共起の重要度を補正し,正確なランドマーク発見を行う。我々はさらに、異なるVLNエージェントとエレガントな組み合わせのための観察強化戦略を設計し、修正されたランドマーク特徴を用いて行動決定のための観察機能を得る。複数の人気のあるVLNベンチマーク(R2R、REVERIE、R4R、RxR)の大規模な実験結果から、強力なベースラインよりもCONSOLEの顕著な優位性が確認された。特に,我々のCONSOLEは,目に見えないシナリオにおいて,R2RとR4Rの最先端結果を確立している。コードはhttps://github.com/expectorlin/CONSOLEで入手できる。 Vision-Language Navigation (VLN) requires the agent to follow language instructions to reach a target position. A key factor for successful navigation is to align the landmarks implied in the instruction with diverse visual observations. However, previous VLN agents fail to perform accurate modality alignment especially in unexplored scenes, since they learn from limited navigation data and lack sufficient open-world alignment knowledge. In this work, we propose a new VLN paradigm, called COrrectable LaNdmark DiScOvery via Large ModEls (CONSOLE). In CONSOLE, we cast VLN as an open-world sequential landmark discovery problem, by introducing a novel correctable landmark discovery scheme based on two large models ChatGPT and CLIP. 
Specifically, we use ChatGPT to provide rich open-world landmark cooccurrence commonsense, and conduct CLIP-driven landmark discovery based on these commonsense priors. To mitigate the noise in the priors due to the lack of visual constraints, we introduce a learnable cooccurrence scoring module, which corrects the importance of each cooccurrence according to actual observations for accurate landmark discovery. We further design an observation enhancement strategy for an elegant combination of our framework with different VLN agents, where we utilize the corrected landmark features to obtain enhanced observation features for action decision. Extensive experimental results on multiple popular VLN benchmarks (R2R, REVERIE, R4R, RxR) show the significant superiority of CONSOLE over strong baselines. Especially, our CONSOLE establishes the new state-of-the-art results on R2R and R4R in unseen scenarios. Code is available at https://github.com/expectorlin/CONSOLE.	翻訳日:2024-06-06 23:59:22 公開日:2024-06-05
# FUSU:きめ細かい都市セマンティック理解のための多時期・多ソース土地利用変化セグメンテーションデータセット FUSU: A Multi-temporal-source Land Use Change Segmentation Dataset for Fine-grained Urban Semantic Understanding ( http://arxiv.org/abs/2405.19055v2 ) ライセンス: Link先を確認	Shuai Yuan, Guancong Lin, Lixian Zhang, Runmin Dong, Jinxiao Zhang, Shuang Chen, Juepeng Zheng, Jie Wang, Haohuan Fu,	(参考訳) 都市部における人間と環境の相互作用を理解するためには,多時期のリモートセンシング画像を用いた精細な都市変化セグメンテーションが不可欠である。都市モニタリングのためのリモートセンシングデータの進歩にもかかわらず、粗い分類体系と連続的な時系列観測の欠如は、深層学習の都市変化解析への応用を妨げている。そこで本稿では,きめ細かい都市セマンティック理解(Fine-grained Urban Semantic Understanding)のためのマルチソース・多時期の変化セグメンテーションデータセットであるFUSUを紹介する。 FUSUは、17のクラスと300億ピクセルのアノテーションを備えた、これまでで最も詳細な土地利用分類体系を特徴とする。地上サンプル距離20-50cmの二時期高解像度衛星画像と、月次の光学・レーダー衛星時系列を含み、中国の5つの都市域にわたる847 km2をカバーする。精細なピクセル単位のアノテーションと高い時空間分解能のデータは、深層学習モデルが都市化と土地利用の変化を理解するための堅牢な基盤を提供する。 FUSUをフル活用するために,変化検出とセグメンテーションの両方に対応する統一的な時系列アーキテクチャを提案し,複数のタスクについて様々な手法でFUSUのベンチマークを行う。データセットとコードは https://github.com/yuanshuai0914/FUSU で公開予定である。 Fine urban change segmentation using multi-temporal remote sensing images is essential for understanding human-environment interactions in urban areas. Despite advances in remote sensing data for urban monitoring, coarse-grained classification systems and the lack of continuous temporal observations hinder the application of deep learning to urban change analysis. To address this, we introduce FUSU, a multi-source, multi-temporal change segmentation dataset for Fine-grained Urban Semantic Understanding. FUSU features the most detailed land use classification system to date, with 17 classes and 30 billion pixels of annotations. It includes bi-temporal high-resolution satellite images with 20-50 cm ground sample distance and monthly optical and radar satellite time series, covering 847 km2 across five urban areas in China. The fine-grained pixel-wise annotations and high spatial-temporal resolution data provide a robust foundation for deep learning models to understand urbanization and land use changes. 
To fully leverage FUSU, we propose a unified time-series architecture for both change detection and segmentation and then benchmark FUSU on various methods for several tasks. Dataset and code will be available at: https://github.com/yuanshuai0914/FUSU.	翻訳日:2024-06-06 23:59:22 公開日:2024-06-05
# Grokfast: 遅い勾配を増幅することでグロッキングを加速する Grokfast: Accelerated Grokking by Amplifying Slow Gradients ( http://arxiv.org/abs/2405.20233v2 ) ライセンス: Link先を確認	Jaerin Lee, Bong Gyun Kang, Kihoon Kim, Kyoung Mu Lee,	(参考訳) グロッキング(grokking)と呼ばれる機械学習の不可解な現象のひとつは、トレーニングデータにほぼ完全に過適合した後、その10倍もの反復を経て遅れて一般化が達成されることである。機械学習の実践者の立場から長い遅延そのものに着目し、グロッキング現象下でのモデルの一般化を加速させることを目標とする。訓練の反復にわたるパラメータの勾配の系列を時間上のランダムな信号とみなすことで、勾配降下の下でのパラメータの軌道を、過適合をもたらす高速に変化する成分と、一般化を促す低速に変化する成分の2つにスペクトル分解できる。この分析により、勾配の低速に変化する成分を増幅する数行のコードだけで、グロッキング現象を$\times 50$以上加速できる。実験により,本アルゴリズムは画像,言語,グラフを含む多種多様なタスクに適用でき,突発的な一般化というこの特異な現象を実用的に利用可能にすることを示した。私たちのコードは https://github.com/ironjr/grokfast から入手可能です。 One puzzling artifact in machine learning dubbed grokking is where delayed generalization is achieved tenfolds of iterations after near perfect overfitting to the training data. Focusing on the long delay itself on behalf of machine learning practitioners, our goal is to accelerate generalization of a model under grokking phenomenon. By regarding a series of gradients of a parameter over training iterations as a random signal over time, we can spectrally decompose the parameter trajectories under gradient descent into two components: the fast-varying, overfitting-yielding component and the slow-varying, generalization-inducing component. This analysis allows us to accelerate the grokking phenomenon more than $\times 50$ with only a few lines of code that amplifies the slow-varying components of gradients. The experiments show that our algorithm applies to diverse tasks involving images, languages, and graphs, enabling practical availability of this peculiar artifact of sudden generalization. Our code is available at https://github.com/ironjr/grokfast.	翻訳日:2024-06-06 23:49:24 公開日:2024-06-05
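The idea of amplifying the slow-varying component of the gradients can be sketched with a simple low-pass filter. The sketch below assumes an exponential moving average (EMA) as the filter; the function name `amplify_slow_gradients` and the defaults `alpha` and `lamb` are illustrative assumptions, not values confirmed by the abstract, and gradients are plain Python lists rather than framework tensors.

```python
def amplify_slow_gradients(grad, ema, alpha=0.98, lamb=2.0):
    """Add an amplified low-pass (EMA) copy of the gradient to the raw gradient.

    grad : list of floats, the current gradient
    ema  : list of floats or None, the running EMA from the previous step
    Returns (filtered_grad, new_ema).
    """
    if ema is None:
        # First step: start the running average at the current gradient.
        new_ema = list(grad)
    else:
        # Low-pass filter: keeps the slow-varying part of the gradient signal.
        new_ema = [alpha * h + (1.0 - alpha) * g for h, g in zip(ema, grad)]
    # Amplify the slow component and add it back to the raw gradient.
    filtered = [g + lamb * h for g, h in zip(grad, new_ema)]
    return filtered, new_ema


# Toy usage: the first coordinate is steady, the second flips sign.
ema = None
for step_grad in ([1.0, -2.0], [1.0, 2.0]):
    filtered, ema = amplify_slow_gradients(step_grad, ema)
    print(filtered)
```

In a training loop, the filtered gradient would be passed to the optimizer in place of the raw one; a steady gradient direction is boosted, while a rapidly oscillating one is partly cancelled by its own average.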
# 医師と医師の対話要約におけるロバスト性を探る:SOAPの外部ノートの分析 Exploring Robustness in Doctor-Patient Conversation Summarization: An Analysis of Out-of-Domain SOAP Notes ( http://arxiv.org/abs/2406.02826v1 ) ライセンス: Link先を確認	Yu-Wen Chen, Julia Hirschberg,	(参考訳) 医学的会話の要約は、専門領域と、ドメイン内のトレーニングデータを集めることの難しさにより、ユニークな課題を生んでいる。本研究では,現在最先端の医師と患者との会話生成モデルの性能について,ドメイン外データを用いて検討した。 1)主観的(S)、目的的(O)、評価的(A)、計画的(P)ノートを指定せずに、一般的なモデル、(2)SOAPセクションの要約を生成するSOAP指向モデルである。両構成における細調整型言語モデルとGPTの限界と強みを解析した。また、異なるデータセットのSOAPノートを比較するために、Lingguistic InquiryとWord Count分析を実施しました。結果は、異なるデータセット間での参照ノートに対する強い相関を示し、フォーマットミスマッチ(すなわち、単語分布の相違)がドメイン外のデータのパフォーマンス低下の主な原因ではないことを示す。最後に、SOAPノートの詳細な分析は、モデルが導入した不足情報や幻覚に関する洞察を提供するために含まれます。 Summarizing medical conversations poses unique challenges due to the specialized domain and the difficulty of collecting in-domain training data. In this study, we investigate the performance of state-of-the-art doctor-patient conversation generative summarization models on the out-of-domain data. We divide the summarization model of doctor-patient conversation into two configurations: (1) a general model, without specifying subjective (S), objective (O), and assessment (A) and plan (P) notes; (2) a SOAP-oriented model that generates a summary with SOAP sections. We analyzed the limitations and strengths of the fine-tuning language model-based methods and GPTs on both configurations. We also conducted a Linguistic Inquiry and Word Count analysis to compare the SOAP notes from different datasets. The results exhibit a strong correlation for reference notes across different datasets, indicating that format mismatch (i.e., discrepancies in word distribution) is not the main cause of performance decline on out-of-domain data. Lastly, a detailed analysis of SOAP notes is included to provide insights into missing information and hallucinations introduced by the models.	翻訳日:2024-06-06 22:37:23 公開日:2024-06-05
# 確率拡散:確率時系列予測のための拡散確率モデル Stochastic Diffusion: A Diffusion Probabilistic Model for Stochastic Time Series Forecasting ( http://arxiv.org/abs/2406.02827v1 ) ライセンス: Link先を確認	Yuansan Liu, Sudanthi Wijewickrema, Dongting Hu, Christofer Bester, Stephen O'Leary, James Bailey,	(参考訳) 拡散確率モデルにおける最近の革新は、画像、テキスト、音声生成の大幅な進歩の道を開いた。しかし、そのような能力を活用して高度に確率的な時系列データをモデル化することは依然として困難である。本稿では,多変量時系列データの可変性をモデル化するために,確率潜在空間の表現力を利用して,各時点におけるデータ駆動事前知識を学習する新しい確率拡散(StochDiff)モデルを提案する。学習された事前知識は、複雑な時間的ダイナミクスとデータ固有の不確実性を捉えるのに役立つ。これにより、高度に確率的な時系列データをモデル化する能力が向上する。実世界のデータセットに関する広範な実験を通じて,提案モデルが確率的時系列予測に与える影響を実証する。さらに,本モデルを用いた実世界の外科的指導について紹介し,医療コミュニティに利益をもたらす可能性を強調した。 Recent innovations in diffusion probabilistic models have paved the way for significant progress in image, text and audio generation, leading to their applications in generative time series forecasting. However, leveraging such abilities to model highly stochastic time series data remains a challenge. In this paper, we propose a novel Stochastic Diffusion (StochDiff) model which learns data-driven prior knowledge at each time step by utilizing the representational power of the stochastic latent spaces to model the variability of the multivariate time series data. The learnt prior knowledge helps the model to capture complex temporal dynamics and the inherent uncertainty of the data. This improves its ability to model highly stochastic time series data. Through extensive experiments on real-world datasets, we demonstrate the effectiveness of our proposed model on stochastic time series forecasting. Additionally, we showcase an application of our model for real-world surgical guidance, highlighting its potential to benefit the medical community.	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
# 大規模言語モデルは認知症関連言語異常の誘発に不均衡に耐性がある Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies ( http://arxiv.org/abs/2406.02830v1 ) ライセンス: Link先を確認	Changye Li, Zhecheng Sheng, Trevor Cohen, Serguei Pakhomov,	(参考訳) 人工ニューラルネットワークが複雑化するにつれて、その内部動作を理解することはますます難しくなり、医療応用において特に重要である。自己回帰型ニューラルネットワークモデル(NLM)、パープレキシティ(PPL)の本質的な評価基準は、NLMモデルがいかに新しい入力であるかを反映することができる。 PPLはNLMの挙動を理解するために広く用いられている。以上の結果より, アルツハイマー病認知症に伴う言語異常を反映し, 注意層をマスキングする場合のPPLの変化が示唆された。そこで我々は,脳により多くのニューロンを持ち,より効率的な処理を行う人が神経変性に対してより耐性を持つことを仮定した,認知と脳保護の概念に起因した特性を示す,新しい双方向注意頭アブレーション法を提案する。以上の結果から,より大型のGPT-2モデルでは,より小型のモデルではマスキングに類似した大きさの劣化を示すために,マスキング/アタッチメントの差が大きいことが示唆された。これらの結果は、トランスフォーマーモデルにおける注意機構が認知と脳保護の概念に類似している可能性を示し、神経変性疾患や老化の進行の特定の側面をモデル化する可能性があることを示唆している。 As artificial neural networks grow in complexity, understanding their inner workings becomes increasingly challenging, which is particularly important in healthcare applications. The intrinsic evaluation metrics of autoregressive neural language models (NLMs), perplexity (PPL), can reflect how "surprised" an NLM model is at novel input. PPL has been widely used to understand the behavior of NLMs. Previous findings show that changes in PPL when masking attention layers in pre-trained transformer-based NLMs reflect linguistic anomalies associated with Alzheimer's disease dementia. Building upon this, we explore a novel bidirectional attention head ablation method that exhibits properties attributed to the concepts of cognitive and brain reserve in human brain studies, which postulate that people with more neurons in the brain and more efficient processing are more resilient to neurodegeneration. Our results show that larger GPT-2 models require a disproportionately larger share of attention heads to be masked/ablated to display degradation of similar magnitude to masking in smaller models. 
These results suggest that the attention mechanism in transformer models may present an analogue to the notions of cognitive and brain reserve and could potentially be used to model certain aspects of the progression of neurodegenerative disorders and aging.	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
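For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood a model assigns to the observed tokens. The sketch below is a minimal illustration of that standard definition; the function name and the toy probabilities are assumptions for illustration and are not taken from the paper.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probability the model assigned to each observed token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Mean negative log-likelihood per token.
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# A model that is more "surprised" (lower probabilities) has higher perplexity.
print(perplexity([0.5, 0.5, 0.5]))
print(perplexity([0.1, 0.1, 0.1]))
```

Comparing perplexity on the same text before and after masking attention heads is one way to quantify how much an ablation degrades a model.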
# 弱教師付きビデオ異常検出のための蒸留集約知識 Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection ( http://arxiv.org/abs/2406.02831v1 ) ライセンス: Link先を確認	Jash Dalvi, Ali Dabouei, Gunjan Dhanuka, Min Xu,	(参考訳) ビデオ異常検出は、監視ビデオにおける異常事象を識別できる自動モデルを開発することを目的としている。このタスクのベンチマーク設定は非常に難しい。一訓練セットの限られた大きさ二ビデオレベルラベルで定める監督の弱さ三異常事象の欠如により生ずる内因性階級不均衡本研究では,複数のバックボーンの集合的表現から比較的単純なモデルに知識を蒸留することで,最先端の性能が得られることを示す。特に,二段階蒸留法と新規な非絡み合い型特徴集約ネットワークを開発した。提案手法であるDAKD(Distilling Aggregated Knowledge with Disentangled Attention)は,複数のベンチマークデータセットにまたがる既存手法と比較して,優れた性能を示す。特に、UCF-Crime、ShanghaiTech、XD-Violenceデータセットでそれぞれ1.36%、0.78%、および7.02%の大幅な改善を実現しています。 Video anomaly detection aims to develop automated models capable of identifying abnormal events in surveillance videos. The benchmark setup for this task is extremely challenging due to: i) the limited size of the training sets, ii) weak supervision provided in terms of video-level labels, and iii) intrinsic class imbalance induced by the scarcity of abnormal events. In this work, we show that distilling knowledge from aggregated representations of multiple backbones into a relatively simple model achieves state-of-the-art performance. In particular, we develop a bi-level distillation approach along with a novel disentangled cross-attention-based feature aggregation network. Our proposed approach, DAKD (Distilling Aggregated Knowledge with Disentangled Attention), demonstrates superior performance compared to existing methods across multiple benchmark datasets. Notably, we achieve significant improvements of 1.36%, 0.78%, and 7.02% on the UCF-Crime, ShanghaiTech, and XD-Violence datasets, respectively.	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
# 低ランク行列補完アルゴリズムを用いた効率よい最小ベイズリスク復号法 Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms ( http://arxiv.org/abs/2406.02832v1 ) ライセンス: Link先を確認	Firas Trabelsi, David Vilar, Mara Finkelstein, Markus Freitag,	(参考訳) 最小ベイズリスク(MBR)復号法は、テキスト生成タスクに広く用いられている強力な復号法であるが、その2次計算複雑性は実用的応用を制限している。本稿では,機械翻訳のタスクに着目し,行列補完手法を用いてMBRデコーディングを近似する手法を提案する。 MBR復号を行列完備問題として定式化し、候補仮説と擬似参照変換の間の有効度スコアを低ランク行列とする。まず、スコア行列が実際に低ランク構造を持っていることを実証的に示す。そこで我々は,この手法を,スコアのランダムな部分集合のみを計算し,Alternating Least Squares (ALS) アルゴリズムを適用して,行列内の欠落成分を効率よく回収することにより,MBR復号プロセスの高速な近似を可能にする。 WMT22データセット(en<>de, en<>ru)上でCOMET22が測定した等価翻訳品質を実現しつつ, 機械翻訳タスクにおいて, 提案手法はバニラMBR復号よりも1/16効用メトリック計算を必要とすることを示した。また,本手法を他の近似法と比較し,それと比較した場合の品質向上を示す。 Minimum Bayes Risk (MBR) decoding is a powerful decoding strategy widely used for text generation tasks, but its quadratic computational complexity limits its practical application. This paper presents a novel approach for approximating MBR decoding using matrix completion techniques, focusing on the task of machine translation. We formulate MBR decoding as a matrix completion problem, where the utility metric scores between candidate hypotheses and pseudo-reference translations form a low-rank matrix. First, we empirically show that the scores matrices indeed have a low-rank structure. Then, we exploit this by only computing a random subset of the scores and efficiently recover the missing entries in the matrix by applying the Alternating Least Squares (ALS) algorithm, thereby enabling a fast approximation of the MBR decoding process. Our experimental results on machine translation tasks demonstrate that the proposed method requires 1/16 utility metric computations compared to vanilla MBR decoding while achieving equal translation quality measured by COMET22 on the WMT22 dataset (en<>de and en<>ru). We also benchmark our method against other approximation methods and we show gains in quality when comparing to them.	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
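The approach described above (score only a subset of hypothesis/pseudo-reference pairs, fill in the rest by low-rank completion, then pick the hypothesis with the highest expected utility) can be sketched as follows. This is a toy illustration under strong assumptions: the utility matrix is exactly rank 1, the completion is a rank-1 Alternating Least Squares with a tiny ridge term, and all names (`als_rank1`, `mbr_pick`) and numbers are hypothetical rather than taken from the paper.

```python
def als_rank1(observed, n_rows, n_cols, iters=100, reg=1e-9):
    """Fit observed[(i, j)] ~= u[i] * v[j] by alternating least squares."""
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iters):
        # Fix v, solve the one-dimensional least-squares problem for each u[i].
        for i in range(n_rows):
            num = sum(s * v[j] for (r, j), s in observed.items() if r == i)
            den = sum(v[j] ** 2 for (r, j) in observed if r == i) + reg
            u[i] = num / den
        # Fix u, solve for each v[j].
        for j in range(n_cols):
            num = sum(s * u[i] for (i, c), s in observed.items() if c == j)
            den = sum(u[i] ** 2 for (i, c) in observed if c == j) + reg
            v[j] = num / den
    return u, v


def mbr_pick(u, v):
    """Index of the hypothesis with the highest mean estimated utility."""
    mean_v = sum(v) / len(v)  # mean over pseudo-references of u[i] * v[j]
    return max(range(len(u)), key=lambda i: u[i] * mean_v)


# Toy utility matrix (rows: hypotheses, columns: pseudo-references).
true_u = [1.0, 2.0, 3.0, 4.0]
true_v = [0.5, 1.0, 1.5, 2.0]
hidden = {(0, 3), (1, 2), (2, 0), (3, 1)}  # pairs we pretend were never scored
observed = {(i, j): true_u[i] * true_v[j]
            for i in range(4) for j in range(4) if (i, j) not in hidden}

u, v = als_rank1(observed, 4, 4)
print(u[0] * v[3])     # estimate of a hidden utility score
print(mbr_pick(u, v))  # selected hypothesis index
```

A realistic setting would use a higher rank, noisy metric scores, and a far smaller observed fraction; the rank-1 case is chosen here only because each ALS update has a closed form.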
# DenoDet: SAR画像におけるターゲット検出のための変形可能なマルチサブスペース機能としての注意 DenoDet: Attention as Deformable Multi-Subspace Feature Denoising for Target Detection in SAR Images ( http://arxiv.org/abs/2406.02833v1 ) ライセンス: Link先を確認	Yimian Dai, Minrui Zou, Yuxuan Li, Xiang Li, Kang Ni, Jian Yang,	(参考訳) SAR(Synthetic Aperture Radar)のターゲット検出は、固有のスペックルノイズや、小型であいまいなターゲットの出現によって長い間妨げられてきた。ディープニューラルネットワークはSARターゲット検出を先進的に進めているが、本質的な低周波バイアスと静的な後トレーニングの重みはコヒーレントノイズに悩まされ、不均一な地形にわたって微妙な詳細を保存している。従来のSAR画像デノベーションにより、畳み込みバイアスを校正し、高周波数に注意を払い、マルチサブスペースデノベーションの観点からターゲットを検出する自然なマルチスケールサブスペース表現を形成するために、明示的な周波数領域変換によって支援されるネットワークであるDenoDetを提案する。我々はトランスデノ(TransDeno)を設計する。トランスデノ(TransDeno)は変換領域のソフトしきい値処理として動作し、サルエントターゲット信号の保存とノイズの減衰によりサブスペースを動的にデノイングする。また、サブスペース処理の粒度を適応的に調整するために、入力特徴に条件付けられた群を動的に変化させる変形可能なグループ完全連結層(DeGroFC)を提案する。ベルとホイッスルがなければ、プラグ&プレイのTransDenoは複数のSARターゲット検出データセットに対して最先端のスコアを設定する。コードはhttps://github.com/GrokCV/GrokSARで入手できる。 Synthetic Aperture Radar (SAR) target detection has long been impeded by inherent speckle noise and the prevalence of diminutive, ambiguous targets. While deep neural networks have advanced SAR target detection, their intrinsic low-frequency bias and static post-training weights falter with coherent noise and preserving subtle details across heterogeneous terrains. Motivated by traditional SAR image denoising, we propose DenoDet, a network aided by explicit frequency domain transform to calibrate convolutional biases and pay more attention to high-frequencies, forming a natural multi-scale subspace representation to detect targets from the perspective of multi-subspace denoising. We design TransDeno, a dynamic frequency domain attention module that performs as a transform domain soft thresholding operation, dynamically denoising across subspaces by preserving salient target signals and attenuating noise. 
To adaptively adjust the granularity of subspace processing, we also propose a deformable group fully-connected layer (DeGroFC) that dynamically varies the group conditioned on the input features. Without bells and whistles, our plug-and-play TransDeno sets state-of-the-art scores on multiple SAR target detection datasets. The code is available at https://github.com/GrokCV/GrokSAR.	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
# DREW : エラー制御型透かしの活用によるロバストデータ保護に向けて DREW : Towards Robust Data Provenance by Leveraging Error-Controlled Watermarking ( http://arxiv.org/abs/2406.02836v1 ) ライセンス: Link先を確認	Mehrdad Saberi, Vinu Sankar Sadasivan, Arman Zarei, Hessam Mahdavifar, Soheil Feizi,	(参考訳) データオーナシップ保護、メディアの法医学、AI生成コンテンツの検出など、データの起源の特定はデータの証明に不可欠である。標準的なアプローチは、クエリデータと参照データセットのエントリをマッチングする埋め込みベースの検索技術である。しかし、この方法は良心や悪意のある編集に対して堅牢ではない。そこで我々は,誤り訂正符号とウォーターマーキング(DREW)を用いたデータ検索手法を提案する。 DREWは、参照データセットをランダムにクラスタ化し、各クラスタに独自のエラー制御された透かしキーを注入し、クエリ時にこれらのキーを使用して、所定のサンプルに対して適切なクラスタを特定する。関連するクラスタを特定した後、最も正確な一致を見つけるために、クラスタ内に埋め込みベクトル類似性検索を行う。エラー制御符号(ECC)の統合により、信頼性の高いクラスタ割り当てが保証され、ECCアルゴリズムが正しいクラスタを高い信頼性で検出できない場合に、データセット全体の検索が可能になる。これにより、DREWはベースラインのパフォーマンスを維持しつつ、データセットの小さなサブセットで検索を行う際に、クエリをその起源と正しく一致させる可能性が高くなるため、パフォーマンス改善の機会を提供する。使用した透かし技術によって、DREWは複数のデータセットと最先端の埋め込みモデル(例えば、DinoV2、CLIP)にわたる検索精度(いくつかのデータセットや修正タイプで最大40%)を大幅に改善し、セキュアで信頼性の高いソース識別のための有望なソリューションとなる。コードはhttps://github.com/mehrdadsaberi/DREWで公開されている。 Identifying the origin of data is crucial for data provenance, with applications including data ownership protection, media forensics, and detecting AI-generated content. A standard approach involves embedding-based retrieval techniques that match query data with entries in a reference dataset. However, this method is not robust against benign and malicious edits. To address this, we propose Data Retrieval with Error-corrected codes and Watermarking (DREW). DREW randomly clusters the reference dataset, injects unique error-controlled watermark keys into each cluster, and uses these keys at query time to identify the appropriate cluster for a given sample. After locating the relevant cluster, embedding vector similarity retrieval is performed within the cluster to find the most accurate matches. 
The integration of error control codes (ECC) ensures reliable cluster assignments, enabling the method to perform retrieval on the entire dataset in case the ECC algorithm cannot detect the correct cluster with high confidence. This makes DREW maintain baseline performance, while also providing opportunities for performance improvements due to the increased likelihood of correctly matching queries to their origin when performing retrieval on a smaller subset of the dataset. Depending on the watermark technique used, DREW can provide substantial improvements in retrieval accuracy (up to 40% for some datasets and modification types) across multiple datasets and state-of-the-art embedding models (e.g., DinoV2, CLIP), making our method a promising solution for secure and reliable source identification. The code is available at https://github.com/mehrdadsaberi/DREW	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
# サンプルを一度だけ受け入れる:高速で自己修正可能な確率的変分推論 You Only Accept Samples Once: Fast, Self-Correcting Stochastic Variational Inference ( http://arxiv.org/abs/2406.02838v1 ) ライセンス: Link先を確認	Dominic B. Dayta,	(参考訳) 大規模な階層ベイズモデル上での変分推論(VI)に対する高速で自己修正可能な確率的最適化を行うアルゴリズムである YOASOVI を紹介する。これを実現するために、各繰り返しにおける確率的 VI の目的関数に関する情報を利用して、通常のモンテカルロサンプリングを受容サンプリングに置き換える。勾配のために大きなサンプルを抽出・評価することに計算資源を費やすのではなく、1つのサンプルのみを抽出し、目的関数の期待改善量に比例した確率で受け入れる。本論文では, 素朴な直観に基づくアルゴリズムと, メトロポリス型スキームとして構築したアルゴリズムの2つのバージョンについて述べる。多変量ガウス混合モデルのためのシミュレーションとベンチマークデータセットに基づく実験結果から、YOASOVIは正則化モンテカルロと準モンテカルロVIのアルゴリズムよりも、一貫して(時計時間で)より速く、より良い最適解の近傍に収束することが示された。 We introduce YOASOVI, an algorithm for performing fast, self-correcting stochastic optimization for Variational Inference (VI) on large Bayesian hierarchical models. To accomplish this, we take advantage of available information on the objective function used for stochastic VI at each iteration and replace regular Monte Carlo sampling with acceptance sampling. Rather than spend computational resources drawing and evaluating over a large sample for the gradient, we draw only one sample and accept it with probability proportional to the expected improvement in the objective. The following paper develops two versions of the algorithm: the first one based on a naive intuition, and another building up the algorithm as a Metropolis-type scheme. Empirical results based on simulations and benchmark datasets for multivariate Gaussian mixture models show that YOASOVI consistently converges faster (in clock time) and within better optimal neighborhoods than both regularized Monte Carlo and Quasi-Monte Carlo VI algorithms.	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
# 条件等等化生成ネットワーク Conditional Idempotent Generative Networks ( http://arxiv.org/abs/2406.02841v1 ) ライセンス: Link先を確認	Niccolò Ronchetti,	(参考訳) 本稿では,条件付き生成ネットワーク(CIGN, Conditional Idempotent Generative Networks)を提案する。 IGNは効率的なシングルパス生成を提供するが、生成されたデータの内容を制御する能力は欠如している。 CIGNは条件付け機構を組み込むことでこの制限に対処し、ユーザーは特定のタイプのデータに対して生成プロセスを制御できる。我々は,CIGNの理論的基盤を確立し,その範囲,損失関数設計,評価指標について概説する。次に、チャネル条件付けとフィルタ条件付けという、CIGNを実装するための2つの潜在的アーキテクチャを提案する。最後に,MNISTデータセットの実験結果について考察し,両手法の有効性を実証する。我々の発見は、より大規模なデータセットとより強力な計算資源でCIGNを探索し、最適な実装戦略を決定するための道を開いた。 We propose Conditional Idempotent Generative Networks (CIGN), a novel approach that expands upon Idempotent Generative Networks (IGN) to enable conditional generation. While IGNs offer efficient single-pass generation, they lack the ability to control the content of the generated data. CIGNs address this limitation by incorporating conditioning mechanisms, allowing users to steer the generation process towards specific types of data. We establish the theoretical foundations for CIGNs, outlining their scope, loss function design, and evaluation metrics. We then present two potential architectures for implementing CIGNs: channel conditioning and filter conditioning. Finally, we discuss experimental results on the MNIST dataset, demonstrating the effectiveness of both approaches. Our findings pave the way for further exploration of CIGNs on larger datasets and with more powerful computing resources to determine the optimal implementation strategy.	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
# 拡散特性に対する再帰的正規化カットによるゼロショット画像分割 Zero-Shot Image Segmentation via Recursive Normalized Cut on Diffusion Features ( http://arxiv.org/abs/2406.02842v1 ) ライセンス: Link先を確認	Paul Couairon, Mustafa Shukor, Jean-Emmanuel Haugeard, Matthieu Cord, Nicolas Thome,	(参考訳) ファンデーションモデルは、言語、ビジョン、マルチモーダルタスクなど、さまざまな領域にまたがる強力なツールとして登場した。以前の研究は教師なしのイメージセグメンテーションに対処してきたが、教師付きモデルにはかなり遅れている。本稿では,拡散UNetエンコーダを基礎ビジョンエンコーダとして使用し,最終的な自己注意ブロックからの出力特徴のみを利用する教師なしゼロショットセグメンテーション手法であるDiffCutを紹介する。広汎な実験により,グラフベースセグメンテーションアルゴリズムにおける拡散特性の利用が,ゼロショットセグメンテーションにおける従来の最先端手法を著しく上回ることを示した。具体的には、検出対象の粒度をソフトに制御する再帰的正規化カットアルゴリズムを活用し、複雑な画像の詳細を正確にキャプチャする明確に定義されたセグメンテーションマップを生成する。我々の研究は、拡散UNetエンコーダに埋め込まれた極めて正確なセマンティック知識を強調し、下流タスクの基盤ビジョンエンコーダとして機能する。 Project page at https://diffcut-segmentation.github.io Foundation models have emerged as powerful tools across various domains including language, vision, and multimodal tasks. While prior works have addressed unsupervised image segmentation, they significantly lag behind supervised models. In this paper, we use a diffusion UNet encoder as a foundation vision encoder and introduce DiffCut, an unsupervised zero-shot segmentation method that solely harnesses the output features from the final self-attention block. Through extensive experimentation, we demonstrate that the utilization of these diffusion features in a graph based segmentation algorithm, significantly outperforms previous state-of-the-art methods on zero-shot segmentation. Specifically, we leverage a recursive Normalized Cut algorithm that softly regulates the granularity of detected objects and produces well-defined segmentation maps that precisely capture intricate image details. Our work highlights the remarkably accurate semantic knowledge embedded within diffusion UNet encoders that could then serve as foundation vision encoders for downstream tasks. Project page at https://diffcut-segmentation.github.io	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
# 対話レコメンデーションのための項目言語モデル Item-Language Model for Conversational Recommendation ( http://arxiv.org/abs/2406.02844v1 ) ライセンス: Link先を確認	Li Yang, Anushya Subbiah, Hardik Patel, Judith Yue Li, Yanwei Song, Reza Mirghaderi, Vikram Aggarwal,	(参考訳) 大規模言語モデル(LLM)は、複雑な対話理解、推論、コーディングといったタスクにおいて、その創発的な能力によって非常に成功した。これらの創発的能力は、画像、オーディオ、ビデオ機能を含むマルチモードで拡張されている。一方、レコメンダシステムは、情報検索やアイテム発見のニーズに対して重要な役割を担っている。近年,レコメンデーションにLLMを適用しようとする試みがある。現在の試みの難しさの1つは、LLMが通常、ユーザーインタラクション信号を含むレコメンデータシステムデータでトレーニングされていないことであり、一般には利用できないことが多いことである。もう1つの困難は、ユーザインタラクション信号が自然言語のテキストと異なるパターンを持っていることであり、LLMトレーニング設定が従来のレコメンデータシステム手法と比較して、インタラクション信号からより簡単な知識を学べるかは、現時点では不明である。最後に、複数のLDMを異なるユースケースで訓練することは困難であり、レコメンデーションシステムデータから学習する際、元の言語と推論能力を維持することは困難である。これら3つの制約に対処するために,ユーザインタラクション信号をエンコードするテキスト整列アイテム表現を生成するアイテムエンコーダと,保存済みの知識でこれらのアイテム表現を理解可能な凍結LDMからなるアイテムランゲージモデル(ILM)を提案する。項目エンコーダにおける言語アライメントの重要性とユーザインタラクション知識の両立を実証する広範な実験を行う。 Large-language Models (LLMs) have been extremely successful at tasks like complex dialogue understanding, reasoning and coding due to their emergent abilities. These emergent abilities have been extended with multi-modality to include image, audio, and video capabilities. Recommender systems, on the other hand, have been critical for information seeking and item discovery needs. Recently, there have been attempts to apply LLMs for recommendations. One difficulty of current attempts is that the underlying LLM is usually not trained on the recommender system data, which largely contains user interaction signals and is often not publicly available. Another difficulty is user interaction signals often have a different pattern from natural language text, and it is currently unclear if the LLM training setup can learn more non-trivial knowledge from interaction signals compared with traditional recommender system methods. 
Finally, it is difficult to train multiple LLMs for different use-cases, and to retain the original language and reasoning abilities when learning from recommender system data. To address these three limitations, we propose an Item-Language Model (ILM), which is composed of an item encoder to produce text-aligned item representations that encode user interaction signals, and a frozen LLM that can understand those item representations with preserved pretrained knowledge. We conduct extensive experiments which demonstrate both the importance of the language-alignment and of user interaction knowledge in the item encoder.	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
# モデルウェイトへのインテクスト学習の厳密な変換 Exact Conversion of In-Context Learning to Model Weights ( http://arxiv.org/abs/2406.02847v1 ) ライセンス: Link先を確認	Brian K Chen, Tianyang Hu, Hui Jin, Hwee Kuan Lee, Kenji Kawaguchi,	(参考訳) In-Context Learning (ICL)は、近年注目を集めている大規模言語モデルの強力な創発的特性である。正規勾配に基づく学習とは対照的に、ICLは高度に解釈可能であり、パラメータ更新を必要としない。本稿では,線形化変圧器ネットワークにおいて,バイアス項を含めることで,ICLを明示的かつ永続的にすることができることを示す。我々は、ICLデモプロンプトを持つモデルと、追加のバイアス項を持つモデルとの等価性を数学的に示す。我々のアルゴリズム(ICLCA)は、正確な変換を安価に行うことができる。既存のメソッドは正確ではなく、高価なパラメータ更新を必要とする。 ICLトークンを線形変換器に正確に組み込む実験により,本手法の有効性を実証する。さらに,線形化されていない正規変圧器ネットワークにおいても,ICLトークンの高精度な近似変換を実現する方法を提案する。 GPT-2の実験では、変換が近似的であるにもかかわらず、モデルが包含されたバイアス項から価値ある文脈を得ることを示した。 In-Context Learning (ICL) has been a powerful emergent property of large language models that has attracted increasing attention in recent years. In contrast to regular gradient-based learning, ICL is highly interpretable and does not require parameter updates. In this paper, we show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias terms. We mathematically demonstrate the equivalence between a model with ICL demonstration prompts and the same model with the additional bias terms. Our algorithm (ICLCA) allows for exact conversion in an inexpensive manner. Existing methods are not exact and require expensive parameter updates. We demonstrate the efficacy of our approach through experiments that show the exact incorporation of ICL tokens into a linear transformer. We further suggest how our method can be adapted to achieve cheap approximate conversion of ICL tokens, even in regular transformer networks that are not linearized. Our experiments on GPT-2 show that, even though the conversion is only approximate, the model still gains valuable context from the included bias terms.	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
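One way to see why an exact conversion is plausible for linearized attention is sketched below. This is an illustrative assumption, not the ICLCA algorithm from the paper: it uses unnormalized causal linear attention, where the running state is the sum of key-value outer products, so the contribution of the in-context demonstration tokens is a constant additive term that can be stored once and reused. All names (`linear_attention`, `accumulate`, `read`) are hypothetical.

```python
def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def accumulate(state, k, v):
    """Return state + k v^T (outer product of key and value)."""
    return [[state[a][b] + k[a] * v[b] for b in range(len(v))]
            for a in range(len(k))]


def read(state, q):
    """Return q^T state, the attention output for query q."""
    return [sum(q[a] * state[a][b] for a in range(len(q)))
            for b in range(len(state[0]))]


def linear_attention(tokens, init_state):
    """Unnormalized causal linear attention over (q, k, v) triples."""
    state = init_state
    outputs = []
    for q, k, v in tokens:
        state = accumulate(state, k, v)
        outputs.append(read(state, q))
    return outputs


# Two in-context demonstration tokens followed by one query token.
icl_tokens = [([1.0, 0.0], [0.5, 1.0], [2.0, -1.0]),
              ([0.0, 1.0], [1.0, 0.5], [0.5, 0.5])]
query_tokens = [([1.0, 1.0], [0.25, 0.5], [1.0, 0.0])]

# (a) Run with the demonstrations kept in the prompt.
with_prompt = linear_attention(icl_tokens + query_tokens, zeros(2, 2))[len(icl_tokens):]

# (b) Fold the demonstrations into a fixed additive term, then drop them.
folded = zeros(2, 2)
for _, k, v in icl_tokens:
    folded = accumulate(folded, k, v)
without_prompt = linear_attention(query_tokens, folded)
```

Both runs produce the same outputs for the query token, because the prompt affects later tokens only through the accumulated state. For softmax attention no such exact folding exists, which is consistent with the abstract describing the GPT-2 conversion as approximate.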
# Xmodel-LM技術報告 Xmodel-LM Technical Report ( http://arxiv.org/abs/2406.02856v1 ) ライセンス: Link先を確認	Yichuan Wang, Yang Liu, Yu Yan, Xucheng Huang, Ling Jiang,	(参考訳) 2兆以上のトークンで事前訓練されたコンパクトで効率的な1.1B言語モデルであるXmodel-LMを紹介する。ダウンストリームタスク最適化に基づいて、中国語と英語のコーパスのバランスをとる自己構築データセット(Xdata)に基づいて、Xmodel-LMは、そのサイズが小さいにもかかわらず、顕著なパフォーマンスを示す。特に、同様の規模の既存のオープンソース言語モデルを上回っている。私たちのモデルチェックポイントとコードはGitHubでhttps://github.com/XiaoduoAILab/XmodelLMで公開されています。 We introduce Xmodel-LM, a compact and efficient 1.1B language model pre-trained on over 2 trillion tokens. Trained on our self-built dataset (Xdata), which balances Chinese and English corpora based on downstream task optimization, Xmodel-LM exhibits remarkable performance despite its smaller size. It notably surpasses existing open-source language models of similar scale. Our model checkpoints and code are publicly accessible on GitHub at https://github.com/XiaoduoAILab/XmodelLM.	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
# TSPDiffuser:トラベリングセールスパーソンパス計画問題のための学習サンプリングとしての拡散モデル TSPDiffuser: Diffusion Models as Learned Samplers for Traveling Salesperson Path Planning Problems ( http://arxiv.org/abs/2406.02858v1 ) ライセンス: Link先を確認	Ryo Yonetani,	(参考訳) 本稿では,トラベリングセールスパーソンパス計画問題(TSPPP)を,障害に富んだ環境下で行う新しいデータ駆動型パスプランナーTSPDiffuserを提案する。障害物マップ内の目的地の集合を考慮に入れれば、最も短い衝突のない経路を効率的に見つけることが目的である。 TSPDiffuser では,大量の TSPPP インスタンスとその各ソリューション上で拡散モデルを訓練し,未知の問題インスタンスに対する可塑性経路を生成する。このモデルは学習したサンプルとして利用でき、少数のノードとエッジを持つ潜在的なソリューションを含むロードマップを構築することができる。このアプローチにより、目的地間の移動コストを効率よく正確に推定することができ、TSPPPの解法における主要な計算課題に効果的に対処できる。各種合成・実世界の屋内・屋外環境を用いた実験評価は,ソリューションの品質と計算時間とのトレードオフの観点から,既存の手法よりもTSPDiffuserの有効性を示す。 This paper presents TSPDiffuser, a novel data-driven path planner for traveling salesperson path planning problems (TSPPPs) in environments rich with obstacles. Given a set of destinations within obstacle maps, our objective is to efficiently find the shortest possible collision-free path that visits all the destinations. In TSPDiffuser, we train a diffusion model on a large collection of TSPPP instances and their respective solutions to generate plausible paths for unseen problem instances. The model can then be employed as a learned sampler to construct a roadmap that contains potential solutions with a small number of nodes and edges. This approach enables efficient and accurate estimation of traveling costs between destinations, effectively addressing the primary computational challenge in solving TSPPPs. Experimental evaluations with diverse synthetic and real-world indoor/outdoor environments demonstrate the effectiveness of TSPDiffuser over existing methods in terms of the trade-off between solution quality and computational time requirements.	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
# ラベルなしサンプルを活用したガイダンス情報の再考:ラベルエンコーディングの視点から Rethinking Guidance Information to Utilize Unlabeled Samples:A Label Encoding Perspective ( http://arxiv.org/abs/2406.02862v1 ) ライセンス: Link先を確認	Yulong Zhang, Yuan Yao, Shuhao Chen, Pengrong Jin, Yu Zhang, Jian Jin, Jiangang Lu,	(参考訳) 経験的リスク最小化(ERM)は、ラベル付きサンプルが不十分なシナリオでは脆弱である。 ERMから未ラベルのサンプルへのバニラ拡張としてエントロピー最小化(EntMin)があり、未ラベルのサンプルのソフトラベルを使って学習をガイドしている。しかしEntMinは、予測の多様性を無視しながら、予測の差別性を強調している。この問題を軽減するため,本稿では,未ラベルサンプルを利用するためのガイダンス情報を再考する。 ERMの学習目標を解析することにより、特定のカテゴリにおけるラベル付きサンプルのガイダンス情報が対応するラベルエンコーディングであることが分かる。この発見に触発されて,ラベルエンコードリスク最小化(LERM)を提案する。まず、ラベル付きサンプルの予測手段を通じてラベルエンコーディングを推定し、対応する接地トラスラベルエンコーディングと整合させる。その結果、LERMは予測の差別性と多様性の両方を保証し、プラグインとして既存のメソッドに統合することができる。理論的には、LERMとERMとEntMinの関係を解析する。実験により,複数のラベルが不十分なシナリオにおいて,LERMの優位性を検証した。コードはhttps://github.com/zhangyl660/LERMで公開されている。 Empirical Risk Minimization (ERM) is fragile in scenarios with insufficient labeled samples. A vanilla extension of ERM to unlabeled samples is Entropy Minimization (EntMin), which employs the soft-labels of unlabeled samples to guide their learning. However, EntMin emphasizes prediction discriminability while neglecting prediction diversity. To alleviate this issue, in this paper, we rethink the guidance information to utilize unlabeled samples. By analyzing the learning objective of ERM, we find that the guidance information for labeled samples in a specific category is the corresponding label encoding. Inspired by this finding, we propose a Label-Encoding Risk Minimization (LERM). It first estimates the label encodings through prediction means of unlabeled samples and then aligns them with their corresponding ground-truth label encodings. As a result, the LERM ensures both prediction discriminability and diversity, and it can be integrated into existing methods as a plugin. Theoretically, we analyze the relationships between LERM and ERM as well as EntMin. 
Empirically, we verify the superiority of the LERM under several label insufficient scenarios. The codes are available at https://github.com/zhangyl660/LERM.	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
# スコーラとしてのLLM:対話評価における出力順序の影響 LLM as a Scorer: The Impact of Output Order on Dialogue Evaluation ( http://arxiv.org/abs/2406.02863v1 ) ライセンス: Link先を確認	Yi-Pei Chen, KuanChao Chu, Hideki Nakayama,	(参考訳) 本研究では,大規模言語モデル(LLM)を用いた対話評価における即時設計の効果について検討する。 LLMは様々な入力のスコアリングにますます利用されているが、対話評価におけるモデル感度と主観性のため、効果的な対話評価のプロンプトを作成することは依然として困難である。本研究は、異なるプロンプト構造を用いて、出力命令の順序を変更し、説明的理由を含む実験を行った。理由と得点の順序はLLMのスコアに大きく影響し,「理性優先」アプローチによりより包括的評価が得られた。この知見はLLMに基づく評価の精度と一貫性を高めるために重要である。 This research investigates the effect of prompt design on dialogue evaluation using large language models (LLMs). While LLMs are increasingly used for scoring various inputs, creating effective prompts for dialogue evaluation remains challenging due to model sensitivity and subjectivity in dialogue assessments. Our study experimented with different prompt structures, altering the sequence of output instructions and including explanatory reasons. We found that the order of presenting reasons and scores significantly influences LLMs' scoring, with a "reason-first" approach yielding more comprehensive evaluations. This insight is crucial for enhancing the accuracy and consistency of LLM-based evaluations.	翻訳日:2024-06-06 22:26:58 公開日:2024-06-05
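The two output orders compared in the abstract above can be made concrete with a small prompt builder. This is a hypothetical sketch: the wording of the prompt, the 1-5 scale, and the function name `build_eval_prompt` are assumptions for illustration, not the prompts used in the study.

```python
def build_eval_prompt(dialogue: str, order: str = "reason_first") -> str:
    """Build a dialogue-scoring prompt whose output fields follow the given order."""
    fields = {
        "reason": "reason: <one or two sentences justifying the rating>",
        "score": "score: <integer from 1 to 5>",
    }
    if order == "reason_first":
        sequence = ["reason", "score"]
    elif order == "score_first":
        sequence = ["score", "reason"]
    else:
        raise ValueError(f"unknown order: {order!r}")

    lines = [
        "Rate the overall quality of the dialogue below on a 1-5 scale.",
        "",
        "Dialogue:",
        dialogue,
        "",
        "Answer in exactly this format:",
    ]
    # The only difference between the two variants is the field order here.
    lines += [fields[name] for name in sequence]
    return "\n".join(lines)


example = build_eval_prompt("A: Hi\nB: Hello, how can I help?", order="reason_first")
```

With an autoregressive model, asking for the reason first means the score is generated after the explanation and conditioned on it, which is one plausible reading of why the order matters.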
# NUMCoT:大規模言語モデルを用いたChain-of-Thought Reasoningにおける数量と単位 NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models ( http://arxiv.org/abs/2406.02864v1 ) ライセンス: Link先を確認	Ancheng Xu, Minghuan Tan, Lei Wang, Min Yang, Ruifeng Xu,	(参考訳) 多数のシステムと測定単位は、人間の活動において2つの共通する話題であり、それらを表現する言語と相互に影響を及ぼす。現在、LLM(Large Language Models)の評価は、しばしば数学的推論を伴っているが、数や単位の微妙な変化が問題の複雑さやLLMの性能を劇的に変える可能性についてはほとんど注目されていない。本稿では、摂動を伴うデータセットの構築により、数値と測定単位の処理に関する既存のLCMを精査する。まず,算術語問題を言語から数への数値変換や単位に基づく測度変換など,様々なサブプロデューサにアナライズする。さらに,数量や単位に挑戦する古代中国の算術作品から,数学用語の問題に注釈を付ける。摂動データセットの実験は、LLMが数値と測定の変換を扱うのに依然として困難に直面することを示した。 Numeral systems and units of measurement are two conjoined topics in activities of human beings and have mutual effects with the languages expressing them. Currently, the evaluation of Large Language Models (LLMs) often involves mathematical reasoning, yet little attention is given to how minor changes in numbers or units can drastically alter the complexity of problems and the performance of LLMs. In this paper, we scrutinize existing LLMs on processing of numerals and units of measurement by constructing datasets with perturbations. We first anatomize the reasoning of math word problems to different sub-procedures like numeral conversions from language to numbers and measurement conversions based on units. Then we further annotate math word problems from ancient Chinese arithmetic works which are challenging in numerals and units of measurement. Experiments on perturbed datasets demonstrate that LLMs still encounter difficulties in handling numeral and measurement conversions.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
# 振動はフィードバックによる貯水池計算における時系列予測を促進する Oscillations enhance time-series prediction in reservoir computing with feedback ( http://arxiv.org/abs/2406.02867v1 ) ライセンス: Link先を確認	Yuji Kawai, Takashi Morita, Jihoon Park, Minoru Asada,	(参考訳) 脳のモデリングに使用される機械学習フレームワークであるReservoir Computingは、観測の少ない時間データを最小限の計算リソースで予測することができる。しかし, 貯水池系が不安定になるため, 長期目標時系列を正確に再現することは困難である。この予測能力は、モータタイミングの予測やカオス力学系の予測など、様々な時系列処理に必要である。本研究は, 振動駆動型貯水池計算(ODRC)のフィードバックにより, 振動信号を貯水池ネットワークに供給し, ネットワーク活動を安定化し, 複雑な貯水池力学を誘導する手法を提案する。 ODRCは、モータタイミングおよびカオス時系列予測タスクにおいて、従来の貯水池計算方法よりも、より正確な長期目標時系列を再現することができる。さらに、未経験期間における対象と類似した時系列を生成する。つまり、限られた観測から抽象的な生成規則を学習することができる。このような単純で計算コストのかかる実装による大幅な改善を考えると、ODRCは様々な時系列データの実用的なモデルとして機能する。さらに、神経振動とその小脳プロセッサのモデルとして、ODRCの生物学的意義について論じる。 Reservoir computing, a machine learning framework used for modeling the brain, can predict temporal data with little observations and minimal computational resources. However, it is difficult to accurately reproduce the long-term target time series because the reservoir system becomes unstable. This predictive capability is required for a wide variety of time-series processing, including predictions of motor timing and chaotic dynamical systems. This study proposes oscillation-driven reservoir computing (ODRC) with feedback, where oscillatory signals are fed into a reservoir network to stabilize the network activity and induce complex reservoir dynamics. The ODRC can reproduce long-term target time series more accurately than conventional reservoir computing methods in a motor timing and chaotic time-series prediction tasks. Furthermore, it generates a time series similar to the target in the unexperienced period, that is, it can learn the abstract generative rules from limited observations. Given these significant improvements made by the simple and computationally inexpensive implementation, the ODRC would serve as a practical model of various time series data. 
Moreover, we will discuss biological implications of the ODRC, considering it as a model of neural oscillations and their cerebellar processors.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
# 到達度を目標とした非カウントPOMDPの音響ヒューリスティック探索値反復法 Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives ( http://arxiv.org/abs/2406.02871v1 ) ライセンス: Link先を確認	Qi Heng Ho, Martin S. Feather, Federico Rossi, Zachary N. Sunberg, Morteza Lahijanian,	(参考訳) 部分的に観測可能なマルコフ決定プロセス(POMDP)は、遷移および観測の不確実性の下での逐次決定のための強力なモデルである。本稿では,最大到達確率問題(MRPP)として知られるPMDPにおいて,目標状態に到達する確率を最大化することを目的とした課題について検討する。これはまた、論理的仕様によるモデルチェックにおける中核的な問題であり、自然に非カウントされている(因子は1つ)。割引問題に対するポイントベース手法の成功に触発され,MRPPへの拡張について検討した。具体的には、試行ベースのヒューリスティックな探索値反復手法に着目し、これらの手法の強みを利用して、不確定水平問題に対するループ処理の欠点に対処しながら、信念空間(値境界による情報探索)を効率的に探索する新しいアルゴリズムを提案する。このアルゴリズムは、最適到達可能性確率の両側境界を持つポリシーを生成する。一定の条件下では、最適政策への収束を下から証明する。提案手法は,確率保証と計算時間の両方において,ほぼすべての場合において既存手法よりも優れていることを示す。 Partially Observable Markov Decision Processes (POMDPs) are powerful models for sequential decision making under transition and observation uncertainties. This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem (MRPP), where the goal is to maximize the probability of reaching some target states. This is also a core problem in model checking with logical specifications and is naturally undiscounted (discount factor is one). Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP. Specifically, we focus on trial-based heuristic search value iteration techniques and present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space (informed search via value bounds) while addressing their drawbacks in handling loops for indefinite-horizon problems. The algorithm produces policies with two-sided bounds on optimal reachability probabilities. We prove convergence to an optimal policy from below under certain conditions. 
Experimental evaluations on a suite of benchmarks show that our algorithm outperforms existing methods in almost all cases in both probability guarantees and computation time.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
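Point-based and heuristic-search methods such as the one above operate on beliefs, i.e. probability distributions over hidden states. The standard Bayes-filter belief update is sketched below as background; it is textbook POMDP material rather than the paper's algorithm, and the table layout `T[a][s][s2]` and `O[a][s2][o]` plus the toy numbers are assumptions for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes filter: b'(s2) is proportional to O[a][s2][o] * sum_s T[a][s][s2] * b(s)."""
    n = len(belief)
    unnormalized = [
        O[action][s2][observation]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [x / total for x in unnormalized]


# Toy model: state 1 is an absorbing target state, state 0 reaches it w.p. 0.5.
T = [[[0.5, 0.5],
      [0.0, 1.0]]]          # T[action][state][next_state]
O = [[[0.8, 0.2],
      [0.2, 0.8]]]          # O[action][next_state][observation]

new_belief = belief_update([1.0, 0.0], action=0, observation=1, T=T, O=O)
```

Starting from certainty in state 0, the predicted distribution is (0.5, 0.5); weighting by the observation likelihoods (0.2, 0.8) and normalizing gives a belief of roughly (0.2, 0.8).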
# 自動グラフニューラルネットワークによる組合せ最適化 Combinatorial Optimization with Automated Graph Neural Networks ( http://arxiv.org/abs/2406.02872v1 ) ライセンス: Link先を確認	Yang Liu, Peng Zhang, Yang Gao, Chuan Zhou, Zhao Li, Hongyang Chen,	(参考訳) 近年、グラフニューラルネットワーク(GNN)は、最大カットや最大独立集合といったNP困難な組合せ最適化(CO)問題を解決するために人気が高まっている。これらの手法の背後にある中核的な考え方は、CO問題をグラフとして表現し、GNNを使用して、組合せ情報を含むノード/グラフの埋め込みを学ぶことである。これらの手法は有望な結果を得ているが、特定のCO問題が与えられたとき、GNNアーキテクチャの設計には依然としてドメイン知識に基づく多大な手作業が必要である。既存の自動GNNは主に従来のグラフ学習問題に焦点をあてており、NP困難なCO問題の解決には適用できない。この目的のために、我々は、NP困難問題を解くための新しいクラスの自動GNN(AUTOmated GNNs)、すなわちAutoGNPを提案する。我々は、CO問題をGNNによって表現し、2つの特定の問題、すなわち混合整数線形計画法と2次非制約バイナリ最適化に焦点をあてる。 AutoGNPの考え方は、グラフニューラルアーキテクチャ検索アルゴリズムを使用して、与えられたNP困難な組合せ最適化問題に対して最適なGNNを自動的に見つけることである。既存のグラフニューラルアーキテクチャ検索アルゴリズムと比較して、AutoGNPはアーキテクチャ検索空間で2ホップ演算子を利用する。さらに、AutoGNPはシミュレーテッドアニーリングと厳密な早期停止ポリシーを利用して局所最適解を回避する。ベンチマーク組合せ問題に対する実験結果から,提案モデルの優位性が示された。 In recent years, graph neural networks (GNNs) have become increasingly popular for solving NP-hard combinatorial optimization (CO) problems, such as maximum cut and maximum independent set. The core idea behind these methods is to represent a CO problem as a graph and then use GNNs to learn the node/graph embedding with combinatorial information. Although these methods have achieved promising results, given a specific CO problem, the design of GNN architectures still requires heavy manual work with domain knowledge. Existing automated GNNs are mostly focused on traditional graph learning problems and are inapplicable to solving NP-hard CO problems. To this end, we present a new class of AUTOmated GNNs for solving NP-hard problems, namely AutoGNP. We represent CO problems by GNNs and focus on two specific problems, i.e., mixed integer linear programming and quadratic unconstrained binary optimization. 
The idea of AutoGNP is to use graph neural architecture search algorithms to automatically find the best GNNs for a given NP-hard combinatorial optimization problem. Compared with existing graph neural architecture search algorithms, AutoGNP utilizes two-hop operators in the architecture search space. Moreover, AutoGNP utilizes simulated annealing and a strict early stopping policy to avoid local optimal solutions. Empirical results on benchmark combinatorial problems demonstrate the superiority of our proposed model.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
# 因果推論の予測による一般化 Prediction-powered Generalization of Causal Inferences ( http://arxiv.org/abs/2406.02873v1 ) ライセンス: Link先を確認	Ilker Demirel, Ahmed Alaa, Anthony Philippakis, David Sontag,	(参考訳) ランダム化比較試験(RCT)から得られる因果推論は、一部の効果修飾因子の分布が異なる対象集団には当てはまらない可能性がある。先行研究は、アウトカムは得られないが共変量データは利用可能な対象集団に対して、試験の結果を一般化することを研究している。我々は、複雑な局外関数(nuisance function)を推定する必要があるため、試験の規模が限られていることによって一般化が統計的に実現困難な課題となることを示す。我々は,OSについて何の仮定も置かずに、追加の観察研究(OS)から学習した予測モデルを用いて試験データを補う一般化アルゴリズムを開発した。理論的かつ実証的に、我々の手法は、OSが高品質な場合にはより良い一般化を促進し、そうでない場合(例えば未測定の交絡がある場合)にも頑健であることを示す。 Causal inferences from a randomized controlled trial (RCT) may not pertain to a target population where some effect modifiers have a different distribution. Prior work studies generalizing the results of a trial to a target population with no outcome but covariate data available. We show how the limited size of trials makes generalization a statistically infeasible task, as it requires estimating complex nuisance functions. We develop generalization algorithms that supplement the trial data with a prediction model learned from an additional observational study (OS), without making any assumptions on the OS. We theoretically and empirically show that our methods facilitate better generalization when the OS is high-quality, and remain robust when it is not, e.g., when it has unmeasured confounding.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
# ディープ・クープマン・オペレーター発見のためのKANの活用 Leveraging KANs For Enhanced Deep Koopman Operator Discovery ( http://arxiv.org/abs/2406.02875v1 ) ライセンス: Link先を確認	George Nehma, Madhur Tiwari,	(参考訳) 多層パーセプトロン(MLP)は、非線形力学を線形化するディープ・クープマン作用素の発見に広く利用されている。MLPニューラルネットワークのより効率的かつ正確な代替としてKAN(Kolmogorov-Arnold Networks)が登場したことを受け、本稿では、制御付きクープマン作用素(Koopman operator)の学習における各ネットワークタイプの性能を比較する。本研究では、軌道二体問題(2BP)と振り子に適用し、線形系ダイナミクスをデータ駆動で発見するKANベースのディープ・クープマン・フレームワークを提案する。KANはトレーニングのほぼ全ての面で優れており、2BPの場合、MLPディープニューラルネットワーク(DNN)と比べて学習速度は31倍、パラメータ効率は15倍、予測精度は1.25倍である。このように、KANはディープ・クープマン理論の発展において効率的なツールとなる可能性を示している。 Multi-layer perceptrons (MLPs) have been extensively utilized in discovering Deep Koopman operators for linearizing nonlinear dynamics. With the emergence of Kolmogorov-Arnold Networks (KANs) as a more efficient and accurate alternative to the MLP Neural Network, we propose a comparison of the performance of each network type in the context of learning Koopman operators with control. In this work, we propose a KANs-based deep Koopman framework with applications to an orbital Two-Body Problem (2BP) and the pendulum for data-driven discovery of linear system dynamics. KANs were found to be superior in nearly all aspects of training; learning 31 times faster, being 15 times more parameter efficient, and predicting 1.25 times more accurately as compared to the MLP Deep Neural Networks (DNNs) in the case of the 2BP. Thus, KANs show potential for being an efficient tool in the development of Deep Koopman Theory.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
# LCS:ゼロショットニューラルネットワーク翻訳のための言語コンバータ戦略 LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation ( http://arxiv.org/abs/2406.02876v1 ) ライセンス: Link先を確認	Zengkui Sun, Yijin Liu, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou,	(参考訳) 多言語ニューラルマシン翻訳モデルは、典型的には、ソースまたはターゲット文の前にある言語タグ(LT)によって翻訳方向を区別する。しかし、現在のLT戦略は、ゼロショット翻訳において望ましいターゲット言語を期待どおりに指示できない(オフターゲット問題)。我々の分析により、ターゲット言語の指示はターゲットLTの配置に敏感であることが明らかになった。例えば、ターゲットLTをデコーダ側に置くと、デコードステップが進むにつれて指示が急速に弱まり、エンコーダ側に置くと、ソース入力のコピーやパラフレーズ化につながる。上記の課題に対処するため,Language Converter Strategy (LCS) という,シンプルながら効果的な戦略を提案する。ターゲット言語埋め込みを上位エンコーダ層に導入することで、LCSはエンコーダの混乱を緩和し、デコーダに対する安定した言語指示を保証する。 MultiUN、TED、OPUS-100データセットの実験結果は、LCSがオフターゲット問題を大幅に軽減し、言語精度はそれぞれ最大95.28%、96.21%、85.35%に達し、ゼロショット翻訳においてバニラLT戦略をそれぞれ3.07、3.3、7.93 BLEU上回ることを示している。 Multilingual neural machine translation models generally distinguish translation directions by the language tag (LT) in front of the source or target sentences. However, current LT strategies cannot indicate the desired target language as expected on zero-shot translation, i.e., the off-target issue. Our analysis reveals that the indication of the target language is sensitive to the placement of the target LT. For example, when placing the target LT on the decoder side, the indication would rapidly degrade along with decoding steps, while placing the target LT on the encoder side would lead to copying or paraphrasing the source input. To address the above issues, we propose a simple yet effective strategy named Language Converter Strategy (LCS). By introducing the target language embedding into the top encoder layers, LCS mitigates confusion in the encoder and ensures stable language indication for the decoder. Experimental results on MultiUN, TED, and OPUS-100 datasets demonstrate that LCS could significantly mitigate the off-target issue, with language accuracy up to 95.28%, 96.21%, and 85.35% meanwhile outperforming the vanilla LT strategy by 3.07, 3.3, and 7.93 BLEU scores on zero-shot translation, respectively.	翻訳日:2024-06-06 22:16:58 publication日:2024-06-05
# FedStaleWeight: バッファリングされた非同期フェデレーションラーニングと、静的リヘアリングによる公正な集約 FedStaleWeight: Buffered Asynchronous Federated Learning with Fair Aggregation via Staleness Reweighting ( http://arxiv.org/abs/2406.02877v1 ) ライセンス: Link先を確認	Jeffrey Ma, Alan Tu, Yiling Chen, Vijay Janapa Reddi,	(参考訳) フェデレートラーニング(FL)は、プライバシを保持しながら分散データを活用し、パフォーマンスやスケーラビリティ、コラボレーションといった課題に直面している。非同期フェデレートラーニング(AFL)メソッドは、最も遅いエージェントによってバウンドされた同期の代替手段として期待されているが、コンバージェンス保証、計算の不均一性に対する公正性、集約された更新における不安定性の導入など、さらなる課題が加えられている。具体的には、AFLは、更新を高速に生成できるエージェントに対して、モデルトレーニングを重んじ、遅いエージェントを置き去りにし、グローバルモデルで学ばない異なる分散データを持つことが多い。 Naively upweightingはインセンティブの問題を導入し、真の高速更新エージェントは、モデルトレーニングへの貢献を増やすために、更新を遅い速度で報告する可能性がある。我々はFedStaleWeightを紹介した。これは非同期クライアント更新を集約する際の公平性に対処するアルゴリズムである。 FedStaleWeightは非同期フェデレートされた学習アグリゲーションをメカニズム設計の問題として再設計し、安定度に基づいたエージェント更新をアップウェイトすることで、より高速な更新生成エージェントを好まずに、真に計算速度のレポートをインセンティブ化する重み付け戦略を考案した。 FedStaleWeightは、観察されたエージェント更新の安定性のみを活用することで、エージェントごとのアグリゲーションをより公平にする。我々はどちらも、スムーズで非凸な設定における理論的収束保証を提供し、FedStaleWeightと一般的に使用される非同期FedBuffの勾配平均化を実証的に比較し、より強い公正性を実現し、よりグローバルなモデルの精度に収束を早める方法を示した。最後に、バッファリングされたAFLアグリゲーション戦略の探索を容易にするためのオープンソースのテストベンチを提供し、非同期フェデレーション学習パラダイムにおけるさらなる研究を促進する。 Federated Learning (FL) endeavors to harness decentralized data while preserving privacy, facing challenges of performance, scalability, and collaboration. Asynchronous Federated Learning (AFL) methods have emerged as promising alternatives to their synchronous counterparts bounded by the slowest agent, yet they add additional challenges in convergence guarantees, fairness with respect to compute heterogeneity, and incorporation of staleness in aggregated updates. Specifically, AFL biases model training heavily towards agents who can produce updates faster, leaving slower agents behind, who often also have differently distributed data which is not learned by the global model. 
Naively upweighting introduces incentive issues, where true fast updating agents may falsely report updates at a slower speed to increase their contribution to model training. We introduce FedStaleWeight, an algorithm addressing fairness in aggregating asynchronous client updates by employing average staleness to compute fair re-weightings. FedStaleWeight reframes asynchronous federated learning aggregation as a mechanism design problem, devising a weighting strategy that incentivizes truthful compute speed reporting without favoring faster update-producing agents by upweighting agent updates based on staleness. Leveraging only observed agent update staleness, FedStaleWeight results in more equitable aggregation on a per-agent basis. We both provide theoretical convergence guarantees in the smooth, non-convex setting and empirically compare FedStaleWeight against the commonly used asynchronous FedBuff with gradient averaging, demonstrating how it achieves stronger fairness, expediting convergence to a higher global model accuracy. Finally, we provide an open-source test bench to facilitate exploration of buffered AFL aggregation strategies, fostering further research in asynchronous federated learning paradigms.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
# 埋め込み多様体上の二階微分作用素、確率微分方程式およびブラウン運動 Second-order differential operators, stochastic differential equations and Brownian motions on embedded manifolds ( http://arxiv.org/abs/2406.02879v1 ) ライセンス: Link先を確認	Du Nguyen, Stefan Sommer,	(参考訳) 我々は、内積空間 E に埋め込まれた多様体 M が E 上の確率微分方程式(SDE)の不変多様体であるとき、それを M 上の二階微分作用素の概念と結び付けるとき、M がリーマン計量(英語版)を与えられるとき、E 上の勾配の項でラプラス・ベルトラミ作用素の簡単な公式を導出し、E 上のヘッセン(英語版)により、M 上のリーマン・ブラウン運動を保守的ストラトノビッチ(英語版)およびイオ(英語版) SDE の解として構成する。数値的に,多様体上のSDEを解くための3つのシミュレーションスキームを提案する。リーマン・ブラウン運動をシミュレートする確率射影法に加えて、与えられたE-tubular retractionを用いて、Levi-Civita接続の2階接トラクションを構築する。また, タンジェントリトラクションの2次項を考慮し, SDE を解くための抽出型オイラー・丸山法を提案する。議論された多様体のブラウン運動を含む手法を論文に実装するソフトウェアを提供する。いくつかのコンパクトリーマン多様体において、ブラウンシミュレーションの長期極限が一様分布に収束することを数値的に検証し、リーマン一様分布をサンプリングする方法を提案する。 We specify the conditions when a manifold M embedded in an inner product space E is an invariant manifold of a stochastic differential equation (SDE) on E, linking it with the notion of second-order differential operators on M. When M is given a Riemannian metric, we derive a simple formula for the Laplace-Beltrami operator in terms of the gradient and Hessian on E and construct the Riemannian Brownian motions on M as solutions of conservative Stratonovich and Ito SDEs on E. We derive explicitly the SDE for Brownian motions on several important manifolds in applications, including left-invariant matrix Lie groups using embedded coordinates. Numerically, we propose three simulation schemes to solve SDEs on manifolds. In addition to the stochastic projection method, to simulate Riemannian Brownian motions, we construct a second-order tangent retraction of the Levi-Civita connection using a given E-tubular retraction. We also propose the retractive Euler-Maruyama method to solve a SDE, taking into account the second-order term of a tangent retraction. We provide software to implement the methods in the paper, including Brownian motions of the manifolds discussed. 
We verify numerically that on several compact Riemannian manifolds, the long-term limit of Brownian simulation converges to the uniform distributions, suggesting a method to sample Riemannian uniform distributions.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
# 意図しない顔のキーポイント編集による対話型顔生成 Controllable Talking Face Generation by Implicit Facial Keypoints Editing ( http://arxiv.org/abs/2406.02880v1 ) ライセンス: Link先を確認	Dong Zhao, Jiaying Shi, Wenjun Li, Shudong Wang, Shenghui Xu, Zhaoming Pan,	(参考訳) 音声による会話顔生成は、デジタル人間の研究分野において大きな関心を集めている。既存の手法では、複雑なモデルアーキテクチャが互いに複雑に依存しており、画像やビデオの入力を再編集するプロセスが複雑になる。そこで本研究では,音声による顔表情の変形を制御するための音声音声生成手法であるControlTalkを提案し,単一画像と連続映像の両方に対する唇の動きを含む頭部ポーズと表情を統一的に構築する。予め訓練されたビデオ合成レンダラーを利用し、軽量な適応を提案することにより、口の開口形状を定量的に制御しつつ、正確で自然主義的な唇同期を実現する。提案手法は,HDTFやMEADなど,広く使用されているベンチマークにおいて,最先端の性能よりも優れていることを示す。パラメータ化適応は、言語に関係なく、同IDおよびクロスIDシナリオ間の表現変形を効果的に処理し、その実用性を領域外ポートレートに拡張する、顕著な一般化能力を示す。 Audio-driven talking face generation has garnered significant interest within the domain of digital human research. Existing methods are encumbered by intricate model architectures that are intricately dependent on each other, complicating the process of re-editing image or video inputs. In this work, we present ControlTalk, a talking face generation method to control face expression deformation based on driven audio, which can construct the head pose and facial expression including lip motion for both single image or sequential video inputs in a unified manner. By utilizing a pre-trained video synthesis renderer and proposing the lightweight adaptation, ControlTalk achieves precise and naturalistic lip synchronization while enabling quantitative control over mouth opening shape. Our experiments show that our method is superior to state-of-the-art performance on widely used benchmarks, including HDTF and MEAD. The parameterized adaptation demonstrates remarkable generalization capabilities, effectively handling expression deformation across same-ID and cross-ID scenarios, and extending its utility to out-of-domain portraits, regardless of languages.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
# Inv-Adapter:画像インバージョンと軽量アダプタによるIDカスタマイズ生成 Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter ( http://arxiv.org/abs/2406.02881v1 ) ライセンス: Link先を確認	Peng Xing, Ning Wang, Jianbo Ouyang, Zechao Li,	(参考訳) テキスト・画像生成モデルの顕著な進歩は、IDカスタマイズ生成の研究を著しく加速させる。しかし、既存のパーソナライズ手法は、高い忠実度と高効率要件を同時に満たすことはできない。その主なボトルネックはプロンプト画像エンコーダであり、テキスト・ツー・イメージモデルと弱いアライメント信号を生成し、モデルサイズを大幅に増大させる。そこで本研究では,ID画像の拡散領域表現をDDIM画像の逆変換により抽出する軽量なInv-Adapterを提案する。抽出したIDの高アライメントとテキスト・ツー・イメージ・モデルの中間的特徴から恩恵を受け、軽量アテンション・アダプタを慎重に設計し、それらをベース・テキスト・ツー・イメージ・モデルに効率的に組み込む。提案したInv-Adapterは,IDのカスタマイズ生成とモデルスケールにおいて高い競争力を持つことを示す。 The remarkable advancement in text-to-image generation models significantly boosts the research in ID customization generation. However, existing personalization methods cannot simultaneously satisfy high fidelity and high-efficiency requirements. Their main bottleneck lies in the prompt image encoder, which produces weak alignment signals with the text-to-image model and significantly increased model size. Towards this end, we propose a lightweight Inv-Adapter, which first extracts diffusion-domain representations of ID images utilizing a pre-trained text-to-image model via DDIM image inversion, without additional image encoder. Benefiting from the high alignment of the extracted ID prompt features and the intermediate features of the text-to-image model, we then embed them efficiently into the base text-to-image model by carefully designing a lightweight attention adapter. We conduct extensive experiments to assess ID fidelity, generation loyalty, speed, and training parameters, all of which show that the proposed Inv-Adapter is highly competitive in ID customization generation and model scale.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
# ファクチュアル知識編集のための復号化を意識した古い問題 Outdated Issue Aware Decoding for Factual Knowledge Editing ( http://arxiv.org/abs/2406.02882v1 ) ライセンス: Link先を確認	Zengkui Sun, Yijin Liu, Jiaan Wang, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou,	(参考訳) 近年、知識編集は、事前訓練されたモデルにおける時代遅れのものからの特定の知識を、再訓練せずに更新できるため、注目を集めている。しかし、近年の研究で指摘されているように、既存の関連手法は、真の学習や吸収ではなく、単に編集された知識の表層的な単語構成を記憶するだけである。その結果,既存の手法では,新たな解答を推論するために編集された知識を利用するのに苦労しており,本来の知識を生かしたオリジナルのモデルによって生成される時代遅れの応答を保ちがちであることがわかった。それでも、古い回答は、我々が古い問題と名づけた推論問題に対する正しい答えとして予期せぬものである。この問題を軽減するため,本論文では,編集モデルの性能向上を目的とした,簡易かつ効果的な復号化戦略であるDISCO(Outdated ISsue aware decodeding)を提案する。具体的には、オリジナルのモデルと編集されたモデルとの確率分布の差を捉える。さらに、編集されたモデルにおけるトークン予測の違いを増幅し、古い問題を緩和し、編集された知識でモデル性能を向上させる。実験結果から,disCOを適用することで,従来のSOTA法を12.99F1スコアで上回り,古い問題の割合をzsREデータセットの5.78%に下げることが可能であることが示唆された。 Recently, Knowledge Editing has received increasing attention, since it could update the specific knowledge from outdated ones in pretrained models without re-training. However, as pointed out by recent studies, existing related methods tend to merely memorize the superficial word composition of the edited knowledge, rather than truly learning and absorbing it. Consequently, on the reasoning questions, we discover that existing methods struggle to utilize the edited knowledge to reason the new answer, and tend to retain outdated responses, which are generated by the original models utilizing original knowledge. Nevertheless, the outdated responses are unexpected for the correct answers to reasoning questions, which we named as the outdated issue. To alleviate this issue, in this paper, we propose a simple yet effective decoding strategy, i.e., outDated ISsue aware deCOding (DISCO), to enhance the performance of edited models on reasoning questions. Specifically, we capture the difference in the probability distribution between the original and edited models. 
Further, we amplify the difference of the token prediction in the edited model to alleviate the outdated issue, and thus enhance the model performance w.r.t. the edited knowledge. Experimental results suggest that applying DISCO could enhance the reasoning ability of edited models; e.g., on reasoning questions, DISCO outperforms the prior SOTA method by 12.99 F1 scores, and reduces the ratio of the outdated issue to 5.78% on the zsRE dataset.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
# 学習不能データセットに対する非線形変換 Nonlinear Transformations Against Unlearnable Datasets ( http://arxiv.org/abs/2406.02883v1 ) ライセンス: Link先を確認	Thushari Hapuarachchi, Jing Lin, Kaiqi Xiong, Mohamed Rahouti, Gitte Ost,	(参考訳) 自動スクレイピングは、データ所有者の許可なしにディープラーニングモデルのデータを収集する一般的な方法として際立っている。近年の研究は、このデータ収集手法に関するプライバシー問題に取り組み始めている。注目すべきアプローチとしては、Deepconfuse、エラー最小化、エラー最大化(敵対的ポイズニングとも呼ばれる)、Neural Tangent Generalization Attack、Synthetic、Autoregressive、One-Pixel Shortcut、Self-Ensemble Protection、Entangled Features、Robust Error-Minimizing、Hypocritical、TensorClogなどがある。これらのアプローチによって生成されたデータは学習不能(unlearnable)な例と呼ばれ、ディープラーニングモデルによって"学習"されることを防ぐ。本研究では,従来は学習不能と考えられてきた、上記12のアプローチによって生成されたデータ/サンプルから,ディープニューラルネットワークが効果的に学習できることを実証するために,有効な非線形変換フレームワークを研究開発し,広範な実験を行う。結果として得られたアプローチは、最近研究者によって提案された線形分離可能な手法と比較して、学習不能データを破る能力を改善する。具体的には、この改善は、One-Pixel Shortcutを除いて、これら12のデータ保護アプローチによって生成される学習不能なCIFAR10データセットに対して0.34%から249.59%の範囲に及んでいることを示す。さらに, 提案フレームワークは、AutoregressiveとREMのアプローチに対して、線形分離可能な手法と比較してテスト精度を100%以上向上させる。その結果,これらの手法は,機械学習モデルにおける不正なデータの使用を防止するには不十分であることが示唆された。攻撃者が所有者の適切な許可なしにデータにアクセスするのを効果的に阻止する、より堅牢な保護メカニズムを開発する必要がある。 Automated scraping stands out as a common method for collecting data in deep learning models without the authorization of data owners. Recent studies have begun to tackle the privacy concerns associated with this data collection method. Notable approaches include Deepconfuse, error-minimizing, error-maximizing (also known as adversarial poisoning), Neural Tangent Generalization Attack, synthetic, autoregressive, One-Pixel Shortcut, Self-Ensemble Protection, Entangled Features, Robust Error-Minimizing, Hypocritical, and TensorClog. The data generated by those approaches, called "unlearnable" examples, are prevented from being "learned" by deep learning models. In this research, we investigate and devise an effective nonlinear transformation framework and conduct extensive experiments to demonstrate that a deep neural network can effectively learn from the data/examples traditionally considered unlearnable produced by the above twelve approaches. 
The resulting approach improves the ability to break unlearnable data compared to the linear separable technique recently proposed by researchers. Specifically, our extensive experiments show that the improvement ranges from 0.34% to 249.59% for the unlearnable CIFAR10 datasets generated by those twelve data protection approaches, except for One-Pixel Shortcut. Moreover, the proposed framework achieves over 100% improvement of test accuracy for Autoregressive and REM approaches compared to the linear separable technique. Our findings suggest that these approaches are inadequate in preventing unauthorized uses of data in machine learning models. There is an urgent need to develop more robust protection mechanisms that effectively thwart an attacker from accessing data without proper authorization from the owners.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
# PosterLLaVa:LLMによる統一マルチモーダルレイアウトジェネレータの構築 PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM ( http://arxiv.org/abs/2406.02884v1 ) ライセンス: Link先を確認	Tao Yang, Yingmin Luo, Zhongang Qi, Yang Wu, Ying Shan, Chang Wen Chen,	(参考訳) レイアウト生成は自動グラフィックデザインを実現する上で鍵となる要素であり、視覚的に快く制約に富んだ方法で様々なマルチモーダルデザイン要素の位置とサイズをアレンジする必要がある。これまでのアプローチは、大規模アプリケーションでは非効率だったり、さまざまな設計要件に対する柔軟性に欠けていたりします。本研究は,多モード大言語モデル(MLLM)を活用し,多様な設計課題に対応するため,グラフィックレイアウトの自動生成のための統一的なフレームワークを提案する。対照的に、データ駆動方式では、構造化テキスト(JSONフォーマット)とビジュアルインストラクションチューニングを使用して、ユーザ定義の自然言語仕様を含む、特定の視覚的およびテキスト的制約の下でレイアウトを生成する。提案手法の有効性を実証し,多モードレイアウト生成ベンチマークを用いて実験を行い,SOTA(State-of-the-art)性能を実現した。さらに、実世界のグラフィックデザインの複雑さを捉える際の既存のデータセットの制限を認識し、より困難なタスク(ユーザ制約付き世代と複雑なポスター)のための2つの新しいデータセットを提案し、さらに、我々のモデルの有用性を現実の環境で検証する。より優れたアクセシビリティと適応性によって、このアプローチはさらに大規模なグラフィックデザインタスクを自動化する。コードとデータセットはhttps://github.com/posterllava/PosterLLaVAで公開されている。 Layout generation is the keystone in achieving automated graphic design, requiring arranging the position and size of various multi-modal design elements in a visually pleasing and constraint-following manner. Previous approaches are either inefficient for large-scale applications or lack flexibility for varying design requirements. Our research introduces a unified framework for automated graphic layout generation, leveraging the multi-modal large language model (MLLM) to accommodate diverse design tasks. In contrast, our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts under specific visual and textual constraints, including user-defined natural language specifications. We conducted extensive experiments and achieved state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks, demonstrating the effectiveness of our method. 
Moreover, recognizing existing datasets' limitations in capturing the complexity of real-world graphic designs, we propose two new datasets for much more challenging tasks (user-constrained generation and complicated posters), further validating our model's utility in real-life settings. Marked by its superior accessibility and adaptability, this approach further automates large-scale graphic design tasks. The code and datasets will be publicly available on https://github.com/posterllava/PosterLLaVA.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
# PLaD:擬似参照ペアを用いた優先型大規模言語モデル蒸留 PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs ( http://arxiv.org/abs/2406.02886v1 ) ライセンス: Link先を確認	Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Haorui Wang, Zhen Qin, Feng Han, Jialu Liu, Simon Baumgartner, Michael Bendersky, Chao Zhang,	(参考訳) 大きな言語モデル(LLM)は、様々なタスクにおいて印象的な機能を示しているが、その膨大なパラメータサイズは、リソース制約のある設定での適用性を制限している。知識蒸留(KD)は、大規模な教師モデルからコンパクトな学生モデルに専門知識を移すことによって、実行可能なソリューションを提供する。しかしながら、従来のKD技術は、LLM出力の制限、教師と学生の容量格差、継承された誤校正問題など、LLMに適用する際の特定の課題に直面している。本研究は,新規な選好型LLM蒸留フレームワークであるPLaDについて述べる。 PLaDは教師と学生の能力の相違を利用して、学生の出力よりも教師の出力が優先される擬似参照ペアを生成する。そして、PLaDはランキングの損失を利用して、生徒が教師を模倣するのではなく、出力の相対的品質を理解することに焦点を当てたシーケンス可能性の推定を再検討する。 PLaDは、教師のLLMの内部状態へのアクセスの必要性を回避し、生徒の表現力制限に対処し、生徒の誤校正問題を緩和する。 2つのシーケンス生成タスクと各種LLMの広範な実験を通じて,提案手法の有効性を実証した。 Large Language Models (LLMs) have exhibited impressive capabilities in various tasks, yet their vast parameter sizes restrict their applicability in resource-constrained settings. Knowledge distillation (KD) offers a viable solution by transferring expertise from large teacher models to compact student models. However, traditional KD techniques face specific challenges when applied to LLMs, including restricted access to LLM outputs, significant teacher-student capacity gaps, and the inherited mis-calibration issue. In this work, we present PLaD, a novel preference-based LLM distillation framework. PLaD exploits the teacher-student capacity discrepancy to generate pseudo-preference pairs where teacher outputs are preferred over student outputs. Then, PLaD leverages a ranking loss to re-calibrate student's estimation of sequence likelihood, which steers the student's focus towards understanding the relative quality of outputs instead of simply imitating the teacher. PLaD bypasses the need for access to teacher LLM's internal states, tackles the student's expressivity limitations, and mitigates the student mis-calibration issue. 
Through extensive experiments on two sequence generation tasks and with various LLMs, we demonstrate the effectiveness of our proposed PLaD framework.	翻訳日:2024-06-06 22:16:58 公開日:2024-06-05
# HYDRA:Black-Box LLMパーソナライゼーションのためのモデル因子化フレームワーク HYDRA: Model Factorization Framework for Black-Box LLM Personalization ( http://arxiv.org/abs/2406.02888v1 ) ライセンス: Link先を確認	Yuchen Zhuang, Haotian Sun, Yue Yu, Qifan Wang, Chao Zhang, Bo Dai,	(参考訳) パーソナライゼーションは、ユーザの行動履歴をマイニングし、カスタマイズされた体験を提供するための好みに適応することに焦点を当てた、現代のインテリジェントシステムにおける重要な研究領域として現れてきた。ブラックボックスの大規模言語モデル(LLM)が示した驚くべき数ショットの能力にもかかわらず、それらのモデルパラメータの本質的な不透明さは、生成された出力を個々の期待と整合させる上で大きな課題である。既存のソリューションは主に、ユーザ固有のプロファイルや振る舞いを組み込むための設計に重点を置いているが、そのようなアプローチは、すべてのユーザ間で共有知識をキャプチャできないため、効果的に一般化するのに苦労することが多い。これらの課題に対処するために,歴史的データからユーザ固有の行動パターンを抽出し,パーソナライズされた世代を提供するための一般知識を共有するモデル分解フレームワークHYDRAを提案する。ユーザ固有の行動パターンをキャプチャするために、まず、リランカをトレーニングし、検索履歴から最も有用な情報を優先する。優先度付き履歴と対応するクエリを組み合わせることで,個々のユーザの好みに合わせて出力を調整できるようにアダプタを訓練し,ブラックボックスLLMの固有モデルパラメータへの依存を解消する。リランカとアダプタの両方を、ヒドラに似た複数のユーザ固有のヘッドを持つベースモデルに分解することができる。ベースモデルは、ユーザ間の共有知識を維持し、複数のパーソナルヘッドは、ユーザ固有の嗜好をキャプチャする。実験の結果、HYDRAは、LaMPベンチマークの5つの異なるパーソナライズタスクに対して、平均9.01%の相対的な改善により、既存の最先端のプロンプトベースの手法よりも優れていることが示された。実装はhttps://github.com/night-chen/HYDRAで公開しています。 Personalization has emerged as a critical research area in modern intelligent systems, focusing on mining users' behavioral history and adapting to their preferences for delivering tailored experiences. Despite the remarkable few-shot capabilities exhibited by black-box large language models (LLMs), the inherent opacity of their model parameters presents significant challenges in aligning the generated output with individual expectations. Existing solutions have primarily focused on prompt design to incorporate user-specific profiles and behaviors; however, such approaches often struggle to generalize effectively due to their inability to capture shared knowledge among all users. To address these challenges, we propose HYDRA, a model factorization framework that captures both user-specific behavior patterns from historical data and shared general knowledge among all users to deliver personalized generation. 
In order to capture user-specific behavior patterns, we first train a reranker to prioritize the most useful information from top-retrieved relevant historical records. By combining the prioritized history with the corresponding query, we train an adapter to align the output with individual user-specific preferences, eliminating the reliance on access to inherent model parameters of black-box LLMs. Both the reranker and the adapter can be decomposed into a base model with multiple user-specific heads, resembling a hydra. The base model maintains shared knowledge across users, while the multiple personal heads capture user-specific preferences. Experimental results demonstrate that HYDRA outperforms existing state-of-the-art prompt-based methods by an average relative improvement of 9.01% across five diverse personalization tasks in the LaMP benchmark. Our implementation is available at https://github.com/night-chen/HYDRA.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
# 未知データセットバイアスの言語誘導検出と緩和 Language-guided Detection and Mitigation of Unknown Dataset Bias ( http://arxiv.org/abs/2406.02889v1 ) ライセンス: Link先を確認	Zaiying Zhao, Soichiro Kumano, Toshihiko Yamasaki,	(参考訳) データセットバイアスは、公平な分類器を訓練する上で重要な問題である。分類と無関係な属性が特定のクラスに対して強いバイアスを示す場合、そのようなデータセットで訓練された分類器はこれらのバイアス属性に過度に適合し、少数群の精度を著しく低下させる。緩和技術はバイアス情報(つまり事前知識)の可用性に応じて分類することができる。未知のバイアスのあるシナリオは現実世界の設定に適しているが、この分野での以前の作業は、バイアスに関する解釈可能性の欠如とパフォーマンスの低下に悩まされることが多い。本研究では,キャプションにおける部分的な出現に基づいて、事前知識なしに潜在的なバイアスをキーワードとして識別する枠組みを提案する。さらに2つのデバイアス法を提案する。 (a) 擬似ラベルを割り当てることで、事前知識を必要とする既存のデバイアス手法に引き渡す方法、 (b) 取得したバイアスキーワードをプロンプトとして,テキストから画像への生成モデルによるデータ拡張を利用する方法である。その単純さにもかかわらず、実験結果から、我々のフレームワークは、事前知識なしで既存のメソッドよりも優れているだけでなく、事前知識を前提としたメソッドにさえ匹敵することを示した。 Dataset bias is a significant problem in training fair classifiers. When attributes unrelated to classification exhibit strong biases towards certain classes, classifiers trained on such a dataset may overfit to these bias attributes, substantially reducing the accuracy for minority groups. Mitigation techniques can be categorized according to the availability of bias information (i.e., prior knowledge). Although scenarios with unknown biases are better suited for real-world settings, previous work in this field often suffers from a lack of interpretability regarding biases and lower performance. In this study, we propose a framework to identify potential biases as keywords without prior knowledge based on the partial occurrence in the captions. We further propose two debiasing methods: (a) handing over to an existing debiasing approach which requires prior knowledge by assigning pseudo-labels, and (b) employing data augmentation via text-to-image generative models, using acquired bias keywords as prompts. Despite its simplicity, experimental results show that our framework not only outperforms existing methods without prior knowledge, but is even comparable with a method that assumes prior knowledge.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
# 効率的な多エージェント強化学習のための表現学習 Representation Learning For Efficient Deep Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2406.02890v1 ) ライセンス: Link先を確認	Dom Huh, Prasant Mohapatra,	(参考訳) サンプル効率はマルチエージェント強化学習(MARL)において依然として重要な課題である。有望なアプローチは、MARLの目的に沿った補助的な学習目標を通じて有意義な潜在表現空間を学習し、制御ポリシーの学習を支援することである。本稿では,MAPO-LSO(Multi-Agent Policy Optimization with Latent Space Optimization)を提案する。特に、MAPO-LSOは遷移力学再構成と自己予測学習のマルチエージェント拡張を提案し、現在の最先端MARLアルゴリズムに自明に拡張できる潜在状態最適化スキームを構築している。実験の結果,MAPO-LSOは,多種多様なMARLタスクに対して,追加のMARLハイパーパラメータチューニングを伴わないバニラMARLと比較して,サンプル効率と学習性能の顕著な向上を示した。 Sample efficiency remains a key challenge in multi-agent reinforcement learning (MARL). A promising approach is to learn a meaningful latent representation space through auxiliary learning objectives alongside the MARL objective to aid in learning a successful control policy. In our work, we present MAPO-LSO (Multi-Agent Policy Optimization with Latent Space Optimization) which applies a form of comprehensive representation learning devised to supplement MARL training. Specifically, MAPO-LSO proposes a multi-agent extension of transition dynamics reconstruction and self-predictive learning that constructs a latent state optimization scheme that can be trivially extended to current state-of-the-art MARL algorithms. Empirical results demonstrate MAPO-LSO to show notable improvements in sample efficiency and learning performance compared to its vanilla MARL counterpart without any additional MARL hyperparameter tuning on a diverse suite of MARL tasks.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
# 高速類似検索のためのバイメトリックフレームワーク A Bi-metric Framework for Fast Similarity Search ( http://arxiv.org/abs/2406.02891v1 ) ライセンス: Link先を確認	Haike Xu, Sandeep Silwal, Piotr Indyk,	(参考訳) 近接するデータ構造を設計するための新しい「バイメトリック」フレームワークを提案する。本フレームワークでは, 高精度で計算に費用がかかる基底トラストメトリックと, 安価だが精度の低いプロキシメトリックの2つの相似性関数を仮定する。理論と実践の両方において,クエリ手順が高価なメトリックの精度を達成するように,プロキシメトリックのみを使用してデータ構造を構築する方法を示す。我々の理論的結果は、このフレームワークを2つの一般的な近接探索アルゴリズムであるDiskANNとCover Treeのインスタンス化する。いずれの場合も、データ構造を構成するために使用されるプロキシメトリックが、境界要素まで基底トラスの計量を近似する限り、データ構造は、基底トラスの計量に関して任意に良好な近似を保証する。実験的な面では、計算コストが大幅に異なるMLモデルにより評価された2つの相似関数を持つテキスト検索問題に対して、このフレームワークを適用した。 MTEBベンチマークのほぼ全てのデータセットに対して、我々の手法は、再ランク付けのような代替手法よりも精度と効率のトレードオフがかなり優れていることを観察する。 We propose a new "bi-metric" framework for designing nearest neighbor data structures. Our framework assumes two dissimilarity functions: a ground-truth metric that is accurate but expensive to compute, and a proxy metric that is cheaper but less accurate. In both theory and practice, we show how to construct data structures using only the proxy metric such that the query procedure achieves the accuracy of the expensive metric, while only using a limited number of calls to both metrics. Our theoretical results instantiate this framework for two popular nearest neighbor search algorithms: DiskANN and Cover Tree. In both cases we show that, as long as the proxy metric used to construct the data structure approximates the ground-truth metric up to a bounded factor, our data structure achieves arbitrarily good approximation guarantees with respect to the ground-truth metric. On the empirical side, we apply the framework to the text retrieval problem with two dissimilarity functions evaluated by ML models with vastly different computational costs. We observe that for almost all data sets in the MTEB benchmark, our approach achieves a considerably better accuracy-efficiency tradeoff than the alternatives, such as re-ranking.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
# 言語モデルでは知識追跡が可能:言語モデルと知識追跡タスクを統合するシンプルだが効果的な方法 Language Model Can Do Knowledge Tracing: Simple but Effective Method to Integrate Language Model and Knowledge Tracing Task ( http://arxiv.org/abs/2406.02893v1 ) ライセンス: Link先を確認	Unggi Lee, Jiyeong Bae, Dohee Kim, Sookbun Lee, Jaekwon Park, Taekyung Ahn, Gunho Lee, Damji Stratton, Hyeoncheol Kim,	(参考訳) KT(Knowledge Tracing)は、学生の知識を時間とともにモデリングするオンライン学習において重要なタスクである。数列をデータとして依存するディープラーニングベースのKTモデルの成功にもかかわらず、既存のアプローチのほとんどは、質問や概念のテキストのリッチなセマンティック情報を活用することができない。本稿では、事前学習された言語モデル(PLM)とKTメソッドを統合する新しいフレームワークである言語モデルに基づく知識追跡(LKT)を提案する。セマンティック表現をキャプチャするために言語モデルのパワーを活用することで、LKTはテキスト情報を効果的に取り入れ、大規模なベンチマークデータセットで以前のKTモデルよりも大幅に優れている。さらに,PLMが獲得した意味的知識を活用することで,LKTがKTのコールドスタート問題に効果的に対処できることを実証した。 LKTの解釈性は、テキストリッチなデータを使用するため、従来のKTモデルと比較して向上している。そこで我々は,局所的解釈可能なモデルに依存しない説明手法と注意点の分析を行い,モデル性能をさらに解釈した。我々の研究は、PLMとKTの統合の可能性を強調し、KTドメインにおける今後の研究の道を開くものである。 Knowledge Tracing (KT) is a critical task in online learning for modeling student knowledge over time. Despite the success of deep learning-based KT models, which rely on sequences of numbers as data, most existing approaches fail to leverage the rich semantic information in the text of questions and concepts. This paper proposes Language model-based Knowledge Tracing (LKT), a novel framework that integrates pre-trained language models (PLMs) with KT methods. By leveraging the power of language models to capture semantic representations, LKT effectively incorporates textual information and significantly outperforms previous KT models on large benchmark datasets. Moreover, we demonstrate that LKT can effectively address the cold-start problem in KT by leveraging the semantic knowledge captured by PLMs. Interpretability of LKT is enhanced compared to traditional KT models due to its use of text-rich data. We conducted the local interpretable model-agnostic explanation technique and analysis of attention scores to interpret the model performance further. 
Our work highlights the potential of integrating PLMs with KT and paves the way for future research in the KT domain.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
# 直接アライメントアルゴリズムにおける報酬モデル過最適化のスケーリング則 Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms ( http://arxiv.org/abs/2406.02900v1 ) ライセンス: Link先を確認	Rafael Rafailov, Yaswanth Chittepu, Ryan Park, Harshit Sikchi, Joey Hejna, Bradley Knox, Chelsea Finn, Scott Niekum,	(参考訳) Reinforcement Learning from Human Feedback (RLHF)は、最近のLarge Language Models (LLMs)の成功に不可欠であるが、しばしば複雑で不安定なプロセスである。古典的なRLHFフレームワークでは、まず人間の好みを表現する報酬モデルが訓練され、次にそれがオンライン強化学習(RL)アルゴリズムによるLLMの最適化に使用される。このような方法の顕著な問題は、学習されたプロキシ報酬モデルで測定した性能は向上するが、真の品質は頭打ちになるか、むしろ低下する、報酬の過最適化(reward over-optimization)または報酬ハッキング(reward hacking)である。Direct Preference Optimizationのようなダイレクトアライメントアルゴリズム(DAA)は、報酬モデリングフェーズを回避することで、古典的なRLHFパイプラインに代わるものとして登場した。しかしながら、DAAは別個のプロキシ報酬モデルを使用しないにもかかわらず、依然として過最適化によって劣化することが多い。いわゆる報酬ハッキング現象は、DAAにとってよく定義されていないが、同じような傾向がまだ明らかである:高いKL予算では、DAAアルゴリズムは従来のRLHFと同じような劣化パターンを示す。特に,DAA法は,広範囲のKL予算だけでなく,データセットの1つのエポックが完成する前にも劣化することがわかった。広範な実証実験を通じて、この研究はDAAに対する報酬の過最適化(ハッキング)問題を定式化し、その影響を目的関数、訓練体制、モデルスケールにわたって探求する。 Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs); however, it is often a complex and brittle process. In the classical RLHF framework, a reward model is first trained to represent human preferences, which is in turn used by an online reinforcement learning (RL) algorithm to optimize the LLM. A prominent issue with such methods is reward over-optimization or reward hacking, where performance as measured by the learned proxy reward model increases, but true quality plateaus or even deteriorates. Direct Alignment Algorithms (DAAs) like Direct Preference Optimization have emerged as alternatives to the classical RLHF pipeline by circumventing the reward modeling phase. However, although DAAs do not use a separate proxy reward model, they still commonly deteriorate from over-optimization. 
While the so-called reward hacking phenomenon is not well-defined for DAAs, we still uncover similar trends: at higher KL budgets, DAA algorithms exhibit similar degradation patterns to their classic RLHF counterparts. In particular, we find that DAA methods deteriorate not only across a wide range of KL budgets but also often before even a single epoch of the dataset is completed. Through extensive empirical experimentation, this work formulates and formalizes the reward over-optimization or hacking problem for DAAs and explores its consequences across objectives, training regimes, and model scales.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
# S$^2$GSL:Aspect-based Sentiment Analysisのための構文強化グラフ構造学習へのセグメントの導入 S$^2$GSL: Incorporating Segment to Syntactic Enhanced Graph Structure Learning for Aspect-based Sentiment Analysis ( http://arxiv.org/abs/2406.02902v1 ) ライセンス: Link先を確認	Bingfeng Chen, Qihan Ouyang, Yongqi Luo, Boyan Xu, Ruichu Cai, Zhifeng Hao,	(参考訳) Aspect based Sentiment Analysis(ABSA)における従来のグラフベースのアプローチは、静的依存木や動的潜伏木の構造を学習するためにグラフニューラルネットワークとアテンション機構を活用することで、優れたパフォーマンスを示している。しかし、複雑なグローバル構造にセマンティック情報と構文情報を同時に組み込むことは、グラフ構造学習の過程で無関係な文脈や構文依存を導入し、不正確な予測をもたらす可能性がある。上記の問題に対処するために,Segment と Syntactic enhanced Graph Structure Learning for ABSA を取り入れた S$^2$GSL を提案する。具体的には、S$^2$GSLにはセグメンテーションを意識したセマンティックグラフ学習と、無関係なコンテキストと依存関係の削除を可能にする構文ベースの潜在グラフ学習が特徴である。さらに,2つのグラフ学習分野の融合を容易にし,多様な構造をまたいだ相補性を実現する自己適応型集約ネットワークを提案する。 4つのベンチマークによる実験結果から,本フレームワークの有効性が示された。 Previous graph-based approaches in Aspect based Sentiment Analysis(ABSA) have demonstrated impressive performance by utilizing graph neural networks and attention mechanisms to learn structures of static dependency trees and dynamic latent trees. However, incorporating both semantic and syntactic information simultaneously within complex global structures can introduce irrelevant contexts and syntactic dependencies during the process of graph structure learning, potentially resulting in inaccurate predictions. In order to address the issues above, we propose S$^2$GSL, incorporating Segment to Syntactic enhanced Graph Structure Learning for ABSA. Specifically,S$^2$GSL is featured with a segment-aware semantic graph learning and a syntax-based latent graph learning enabling the removal of irrelevant contexts and dependencies, respectively. We further propose a self-adaptive aggregation network that facilitates the fusion of two graph learning branches, thereby achieving complementarity across diverse structures. Experimental results on four benchmarks demonstrate the effectiveness of our framework.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
# オープングランドプランニング - 課題とベンチマーク構築 Open Grounded Planning: Challenges and Benchmark Construction ( http://arxiv.org/abs/2406.02903v1 ) ライセンス: Link先を確認	Shiguang Guo, Ziliang Deng, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun,	(参考訳) 大規模言語モデル(LLM)の出現により、人間のような計画にLLMを使うことに注目が集まっている。 LLMベースの計画に関する既存の研究は、LLMの言語生成能力を活用してフリースタイルの計画を作成するか、あるいは制限された環境内での限られた行動に対する意思決定を学習するために強化学習アプローチを採用するかに焦点を当てている。しかし、どちらの手法も、実世界の計画において、オープンかつ実行可能な要件とはかなりの相違が見られる。本稿では,新しい計画課題であるオープングランドプランニングを提案する。オープングランドプランニングの主な目的は、モデルに可変アクションセットに基づいて実行可能なプランを生成するように要求することであり、それによって生成されたプランの実行可能性を確保することである。この目的のために、幅広い領域にまたがるオープングランドプランニングのベンチマークを確立する。そして、現在最先端のLLMを5つの計画手法とともにテストし、既存のLLMとメソッドが、オープンドメインの基盤となる計画によってもたらされる課題を解決するのに依然として苦労していることを明らかにした。本研究の結果は,オープングランドプランニングの基盤となるデータセットを定義し,LLMプランニングの潜在的な課題と今後の方向性を明らかにした。 The emergence of large language models (LLMs) has increasingly drawn attention to the use of LLMs for human-like planning. Existing work on LLM-based planning either focuses on leveraging the inherent language generation capabilities of LLMs to produce free-style plans, or employs reinforcement learning approaches to learn decision-making for a limited set of actions within restricted environments. However, both approaches exhibit significant discrepancies from the open and executable requirements in real-world planning. In this paper, we propose a new planning task: open grounded planning. The primary objective of open grounded planning is to ask the model to generate an executable plan based on a variable action set, thereby ensuring the executability of the produced plan. To this end, we establish a benchmark for open grounded planning spanning a wide range of domains. Then we test current state-of-the-art LLMs along with five planning approaches, revealing that existing LLMs and methods still struggle to address the challenges posed by grounded planning in open domains. 
The outcomes of this paper define and establish a foundational dataset for open grounded planning, and shed light on the potential challenges and future directions of LLM-based planning.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
# Focked-up ZX計算:連続可変量子計算 The Focked-up ZX Calculus: Picturing Continuous-Variable Quantum Computation ( http://arxiv.org/abs/2406.02905v1 ) ライセンス: Link先を確認	Razin A. Shaikh, Lia Yeh, Stefano Gogioso,	(参考訳) ZX と ZW の計算は有限次元量子計算のグラフィカル推論ツールとして有効であるが、無限次元ヒルベルト空間における連続変数量子計算(CVQC)の可能性は探求され始めている。本研究では,CVQCのグラフィカル言語を定式化する。各図は2種類のクモからなる無向グラフで、実数上で定義されたZXのZクモと、自然数で定義された新しく導入されたフォッククモである。 Z と X のクモはそれぞれ位置空間と運動量空間の関数を表し、フォッククモは離散フォック基底の関数を表す。 Z と X の間のフーリエ変換と Z と Fock の間のエルミート変換に加えて、ヘフティアCVQC 相互作用をキャプチャするエキサイティングな新しいグラフィカルルールを提案する。この計算が無限次元ヒルベルト空間で解釈されたガウス CVQC のすべてに対して完備であることを保証するため、ブース、カレット、コンフォートによるアフィンラグランジアン関係の完全性を変換する。量子誤り訂正法を応用して、ゴッテマン・キタエフ・プレスキル(GKP)符号エンコーダ、シンドローム測定、およびアダマール固有状態のマジック状態蒸留のグラフィカル表現を導出する。最後に,ガウスボソンサンプリングについて,その回路がハフニアンのサブマトリクスをサンプリングすることの完全なグラフィカルな証明を提供することによって解明する。 While the ZX and ZW calculi have been effective as graphical reasoning tools for finite-dimensional quantum computation, the possibilities for continuous-variable quantum computation (CVQC) in infinite-dimensional Hilbert space are only beginning to be explored. In this work, we formulate a graphical language for CVQC. Each diagram is an undirected graph made of two types of spiders: the Z spider from the ZX calculus defined on the reals, and the newly introduced Fock spider defined on the natural numbers. The Z and X spiders represent functions in position and momentum space respectively, while the Fock spider represents functions in the discrete Fock basis. In addition to the Fourier transform between Z and X, and the Hermite transform between Z and Fock, we present exciting new graphical rules capturing heftier CVQC interactions. We ensure this calculus is complete for all of Gaussian CVQC interpreted in infinite-dimensional Hilbert space, by translating the completeness in affine Lagrangian relations by Booth, Carette, and Comfort. 
Applying our calculus for quantum error correction, we derive graphical representations of the Gottesman-Kitaev-Preskill (GKP) code encoder, syndrome measurement, and magic state distillation of Hadamard eigenstates. Finally, we elucidate Gaussian boson sampling by providing a fully graphical proof that its circuit samples submatrix hafnians.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
# 感性分析のための予測フィードバックによるインコンテキスト学習の改善 Improving In-Context Learning with Prediction Feedback for Sentiment Analysis ( http://arxiv.org/abs/2406.02911v1 ) ライセンス: Link先を確認	Hongling Xu, Qianlong Wang, Yice Zhang, Min Yang, Xi Zeng, Bing Qin, Ruifeng Xu,	(参考訳) 大規模言語モデル(LLM)は、文脈内学習(ICL)パラダイムを通じて感情分析において有望な結果を得た。しかし、微妙な感情を区別する能力は依然として課題である。人間のフィードバックによる理解の調整能力に触発されて,従来の予測とフィードバックを取り入れたICLを強化し,LLMの感情的誤解釈の是正を目指す。具体的には,(1)LLMの事前予測の取得,(2)正確性に基づく予測フィードバックの考案,(3)感情理解を洗練させるためにフィードバック駆動のプロンプトを活用する3つのステップから構成される。 9つの感情分析データセットによる実験結果から,従来のICL法よりもフレームワークが優れていることが示され,平均F1改善率は5.95%となった。 Large language models (LLMs) have achieved promising results in sentiment analysis through the in-context learning (ICL) paradigm. However, their ability to distinguish subtle sentiments still remains a challenge. Inspired by the human ability to adjust understanding via feedback, this paper enhances ICL by incorporating prior predictions and feedback, aiming to rectify sentiment misinterpretation of LLMs. Specifically, the proposed framework consists of three steps: (1) acquiring prior predictions of LLMs, (2) devising predictive feedback based on correctness, and (3) leveraging a feedback-driven prompt to refine sentiment understanding. Experimental results across nine sentiment analysis datasets demonstrate the superiority of our framework over conventional ICL methods, with an average F1 improvement of 5.95%.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
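Illustrative note (not taken from the paper): the three steps listed in the abstract above can be sketched roughly as follows. The function name `build_feedback_prompt`, the prompt wording, and the demonstration data are all hypothetical, since the abstract does not give the actual templates. Step 1 (collecting the model's prior predictions) is assumed to have been done already and is passed in as data.

```python
from typing import List, Tuple


def build_feedback_prompt(
    demos: List[Tuple[str, str, str]],  # (text, gold_label, prior_prediction)
    query: str,
) -> str:
    """Assemble an ICL prompt in which each demonstration carries the
    model's prior prediction plus feedback on whether it was correct."""
    blocks = []
    for text, gold, prior in demos:
        # Step 2: feedback derived from the correctness of the prior prediction.
        if prior == gold:
            feedback = "The prediction is correct."
        else:
            feedback = f"The prediction is wrong. The correct sentiment is {gold}."
        blocks.append(f"Review: {text}\nPrediction: {prior}\nFeedback: {feedback}")
    # Step 3: the query follows the feedback-annotated demonstrations.
    blocks.append(f"Review: {query}\nPrediction:")
    return "\n\n".join(blocks)


# Hypothetical demonstrations: the first prior prediction is wrong.
demos = [
    ("The plot was thin but the acting saved it.", "positive", "negative"),
    ("Service was slow and the food was cold.", "negative", "negative"),
]
prompt = build_feedback_prompt(demos, "Not bad at all.")
print(prompt)
```

The resulting string would then be sent to an LLM of choice; that call is deliberately left out because it depends on the provider.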
# 極端間隔を有するLDMのゼロ次微調整 Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity ( http://arxiv.org/abs/2406.02913v1 ) ライセンス: Link先を確認	Wentao Guo, Jikai Long, Yimeng Zeng, Zirui Liu, Xinyu Yang, Yide Ran, Jacob R. Gardner, Osbert Bastani, Christopher De Sa, Xiaodong Yu, Beidi Chen, Zhaozhuo Xu,	(参考訳) ゼロ階最適化(ゼロ階最適化、ZO)は、フォワードパスのみを用いた大規模言語モデルの微調整のためのメモリ効率の最適化手法である。しかし、携帯電話やラップトップなどのメモリ制限された設定におけるZO微調整の適用は、完全精度のフォワードパスが実現不可能であるため、依然として困難である。本研究では,LLMのZO微調整に空間性と量子化を組み込むことにより,この制限に対処する。具体的には,ZO を用いた LLM パラメータの極めて小さなサブセットの微調整の実現可能性について検討する。このアプローチにより、未チューニングパラメータの大部分を量子化し、限られたデバイスメモリの制約を満たすことができる。以上の結果から, 学習前プロセスは, 下流タスクにおけるZO微調整を導出する「感度パラメータ」のセットを特定できることがわかった。以上の結果から,ZO を用いた LLM の微調整パラメータは,壁面時間速度を向上しつつ,ZO の微調整性能に優れることが示された。さらに、これらの0.1%の感度パラメータをターゲットとしたZO微調整と4ビット量子化を組み合わせ、メモリ8ギバイト未満のGPUデバイス上でのLlama2-7Bモデルの効率的なZO微調整を可能にし、遅延を顕著に低減できることを示す。 Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models using only forward passes. However, the application of ZO fine-tuning in memory-constrained settings such as mobile phones and laptops is still challenging since full precision forward passes are infeasible. In this study, we address this limitation by integrating sparsity and quantization into ZO fine-tuning of LLMs. Specifically, we investigate the feasibility of fine-tuning an extremely small subset of LLM parameters using ZO. This approach allows the majority of un-tuned parameters to be quantized to accommodate the constraint of limited device memory. Our findings reveal that the pre-training process can identify a set of "sensitive parameters" that can guide the ZO fine-tuning of LLMs on downstream tasks. Our results demonstrate that fine-tuning 0.1% sensitive parameters in the LLM with ZO can outperform the full ZO fine-tuning performance, while offering wall-clock time speedup. 
Additionally, we show that ZO fine-tuning targeting these 0.1% sensitive parameters, combined with 4-bit quantization, enables efficient ZO fine-tuning of a Llama2-7B model on a GPU device with less than 8 GiB of memory and notably reduced latency.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
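Illustrative note (not taken from the paper): the idea of zeroth-order fine-tuning on a small parameter subset can be sketched with a generic two-point finite-difference estimator. This is a minimal sketch under stated assumptions: the mask here is picked by hand, whereas the abstract says the sensitive parameters are identified from pre-training, and the quantization of the frozen parameters is not modeled. `sparse_zo_step` and the toy quadratic loss are hypothetical.

```python
import numpy as np


def sparse_zo_step(theta, loss_fn, mask, lr=0.05, eps=1e-3, rng=None):
    """One two-point zeroth-order update restricted to entries where mask == 1."""
    rng = np.random.default_rng() if rng is None else rng
    # Random perturbation direction, zeroed outside the chosen subset.
    z = rng.standard_normal(theta.shape) * mask
    # Two forward passes estimate the directional derivative along z;
    # no backward pass is required.
    slope = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2.0 * eps)
    return theta - lr * slope * z


# Toy problem (hypothetical): quadratic loss, 2 of 20 parameters tunable.
target = np.ones(20)
loss_fn = lambda t: float(np.sum((t - target) ** 2))
theta0 = np.zeros(20)
mask = np.zeros(20)
mask[[3, 7]] = 1.0  # hand-picked stand-in for the "sensitive parameters"

rng = np.random.default_rng(0)
theta = theta0.copy()
for _ in range(200):
    theta = sparse_zo_step(theta, loss_fn, mask, rng=rng)

print(loss_fn(theta0), loss_fn(theta))
```

Because the perturbation is zero outside the mask, the untouched parameters never change, which is what would allow them to be stored in a compressed form.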
# 水中音響カメラ画像の自己監督型デノナイズ戦略 A Self-Supervised Denoising Strategy for Underwater Acoustic Camera Imageries ( http://arxiv.org/abs/2406.02914v1 ) ライセンス: Link先を確認	Xiaoteng Zhou, Katsunori Mizuno, Yilong Zhang,	(参考訳) 濁度と暗さを特徴とする低視認性海洋環境では、音響カメラは高解像度の2Dソナー画像を生成することができる視覚センサーとして機能する。しかし、音響カメラ画像は複雑なノイズによって干渉され、下流の視覚アルゴリズムによって直接摂取することは困難である。本稿では,自己監督型デノナイジングフレームワークと細かな特徴誘導ブロックの2つの主要構成要素からなる深層学習技術を用いて,音響カメラ画像のデノナイジング手法を提案する。さらに,画像の認知レベルと特徴マッチング性能の改善との関係について検討した。実験結果から,提案手法はノイズモデルに事前の知識を必要とせず,効果的に音響カメラ画像のフィルタリングを行うことができることがわかった。 denoisingプロセスは、複雑なパラメータチューニングと後処理なしで、ほぼエンドツーエンドである。微細な特徴を保存しながらノイズを除去し、局所的な特徴マッチングの性能を向上させる。 In low-visibility marine environments characterized by turbidity and darkness, acoustic cameras serve as visual sensors capable of generating high-resolution 2D sonar images. However, acoustic camera images are interfered with by complex noise and are difficult to be directly ingested by downstream visual algorithms. This paper introduces a novel strategy for denoising acoustic camera images using deep learning techniques, which comprises two principal components: a self-supervised denoising framework and a fine feature-guided block. Additionally, the study explores the relationship between the level of image denoising and the improvement in feature-matching performance. Experimental results show that the proposed denoising strategy can effectively filter acoustic camera images without prior knowledge of the noise model. The denoising process is nearly end-to-end without complex parameter tuning and post-processing. It successfully removes noise while preserving fine feature details, thereby enhancing the performance of local feature matching.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
# 視覚テキストのクロスアライメント:視覚言語モデルにおける類似点の精査 Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models ( http://arxiv.org/abs/2406.02915v1 ) ライセンス: Link先を確認	Jinhao Li, Haopeng Li, Sarah Erfani, Lei Feng, James Bailey, Feng Liu,	(参考訳) 近年,例えばCLIPなど,事前学習された視覚言語モデルを用いて,大規模言語モデルによって生成されたより微細なテキスト記述とクエリイメージ全体を整合させることで,ゼロショット性能を大幅に向上させることが判明している。しかし,本論文では,画像全体よりもクエリ画像の局所的な領域に,より詳細な記述がより効果的に適合する傾向があることを実証的に確認し,理論的に検証する。そこで本研究では,重み付きビジュアルテキスト・クロスアライメント(WCA)という手法を提案する。この方法は、クエリ画像内の局所的な視覚領域を特定するために設計された、局所的な視覚的プロンプト技術から始まる。局所的な視覚領域は、事前訓練されたVLMを用いて類似度行列を作成することにより、より微細な記述と交差する。問合せ画像が各カテゴリとどの程度よく一致しているかを判断するために,この行列の重み付き類似度に基づいてスコア関数を開発する。大規模な実験により,本手法は各種データセット間のゼロショット性能を著しく向上し,少数ショット学習手法に匹敵する結果が得られることが示された。 It has recently been discovered that using a pre-trained vision-language model (VLM), e.g., CLIP, to align a whole query image with several finer text descriptions generated by a large language model can significantly enhance zero-shot performance. However, in this paper, we empirically find that the finer descriptions tend to align more effectively with local areas of the query image rather than the whole image, and then we theoretically validate this finding. Thus, we present a method called weighted visual-text cross alignment (WCA). This method begins with a localized visual prompting technique, designed to identify local visual areas within the query image. The local visual areas are then cross-aligned with the finer descriptions by creating a similarity matrix using the pre-trained VLM. To determine how well a query image aligns with each category, we develop a score function based on the weighted similarities in this matrix. Extensive experiments demonstrate that our method significantly improves zero-shot performance across various datasets, achieving results that are even comparable to few-shot learning methods.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
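Illustrative note (not taken from the paper): a score built from weighted similarities between local image regions and text descriptions might look like the sketch below. The weights, the toy embeddings, and the function name `wca_score` are assumptions; in practice the embeddings would come from a pre-trained VLM such as CLIP, and the abstract does not specify how the weights are chosen.

```python
import numpy as np


def wca_score(region_emb, text_emb, region_w, text_w):
    """Weighted sum over the region-by-description cosine similarity matrix."""
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = r @ t.T  # shape: (num_regions, num_descriptions)
    return float(region_w @ sim @ text_w)


# Toy embeddings standing in for VLM outputs (hypothetical values).
regions = np.array([[1.0, 0.0], [0.0, 1.0]])  # two local crops of the query image
descriptions = {
    "cat": np.array([[1.0, 0.0], [0.0, 1.0]]),    # two descriptions per class
    "dog": np.array([[-1.0, 0.0], [0.0, -1.0]]),
}
region_w = np.array([0.5, 0.5])  # assumed uniform weights
text_w = np.array([0.5, 0.5])

scores = {c: wca_score(regions, e, region_w, text_w) for c, e in descriptions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```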
# 微分方程式と演算子ネットワークに対するMLPとKAN表現の包括的およびFAIR比較 A comprehensive and FAIR comparison between MLP and KAN representations for differential equations and operator networks ( http://arxiv.org/abs/2406.02917v1 ) ライセンス: Link先を確認	Khemraj Shukla, Juan Diego Toscano, Zhicheng Wang, Zongren Zou, George Em Karniadakis,	(参考訳) Kolmogorov-Arnold Networks (KAN) はMLPの代替表現モデルとして最近導入された。本稿では, 物理インフォームド機械学習モデル (PIKAN) とディープ演算子モデル (DeepOKAN) を構築し, 前方および逆問題に対する微分方程式を解く。特に,物理インフォームドニューラルネットワーク (PINN) とディープオペレータネットワーク (DeepONets) を比較する。 B-splinesパラメタライゼーションに基づく元のKANは精度と効率に欠けるが、低次直交多項式に基づく修正版はPINNやDeepONetと同等の性能を持つが、異なるランダムシードや高次直交多項式に分岐する可能性があるため、ロバスト性に欠ける。我々は,それらの損失景観を可視化し,情報ボトルネック理論を用いて学習動態を解析する。我々の研究は、FAIRの原則に従って、他の研究者が我々のベンチマークを使って、この新たなトピックをさらに前進させることができるようにしている。 Kolmogorov-Arnold Networks (KANs) were recently introduced as an alternative representation model to MLP. Herein, we employ KANs to construct physics-informed machine learning models (PIKANs) and deep operator models (DeepOKANs) for solving differential equations for forward and inverse problems. In particular, we compare them with physics-informed neural networks (PINNs) and deep operator networks (DeepONets), which are based on the standard MLP representation. We find that although the original KANs based on the B-splines parameterization lack accuracy and efficiency, modified versions based on low-order orthogonal polynomials have comparable performance to PINNs and DeepONet, although they still lack robustness, as they may diverge for different random seeds or higher order orthogonal polynomials. We visualize their corresponding loss landscapes and analyze their learning dynamics using information bottleneck theory. Our study follows the FAIR principles so that other researchers can use our benchmarks to further advance this emerging topic.	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
# 医用画像の分離・生成に強力なバックボーンを作るU-KAN U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation ( http://arxiv.org/abs/2406.02918v1 ) ライセンス: Link先を確認	Chenxin Li, Xinyu Liu, Wuyang Li, Cheng Wang, Hengyu Liu, Yixuan Yuan,	(参考訳) U-Netは画像分割や拡散確率モデルといった様々な視覚的応用の基盤となっている。変圧器やMLPを導入して多くの革新的な設計や改良がなされてきたが、ネットワークは依然として線形モデリングパターンと不十分な解釈可能性に制限されている。これらの課題に対処するため、我々の直感は、コルモゴロフ・アルノルドネットワーク(KAN)の精度と解釈可能性の観点から印象的な結果に触発され、コルモゴロフ・アルノルド表現定理から導かれる非線形可学習活性化関数のスタックを介してニューラルネットワーク学習を再構築した。具体的には,視覚タスクのバックボーン改善におけるKANの未解決の可能性について検討する。トークン化中間表現であるU-KAN上に専用KAN層を統合することにより,確立したU-Netパイプラインを検証,修正,再設計する。厳密な医用画像セグメンテーションのベンチマークでは、計算コストが低い場合でも高い精度でU-KANの優位性を検証している。さらに、拡散モデルにおける代替U-Netノイズ予測器としてのU-KANの可能性を探り、タスク指向モデルアーキテクチャの生成にその適用性を実証した。これらの取り組みは貴重な洞察を示し、U-KANでは医用画像のセグメンテーションと生成のための強力なバックボーンを作ることができるという可能性に光を当てている。プロジェクトページ: https://yes-ukan.github.io/ U-Net has become a cornerstone in various visual applications such as image segmentation and diffusion probability models. While numerous innovative designs and improvements have been introduced by incorporating transformers or MLPs, the networks are still limited to linear modeling patterns and deficient interpretability. To address these challenges, our intuition is inspired by the impressive results of the Kolmogorov-Arnold Networks (KANs) in terms of accuracy and interpretability, which reshape the neural network learning via the stack of non-linear learnable activation functions derived from the Kolmogorov-Arnold representation theorem. Specifically, in this paper, we explore the untapped potential of KANs in improving backbones for vision tasks. We investigate, modify and re-design the established U-Net pipeline by integrating the dedicated KAN layers on the tokenized intermediate representation, termed U-KAN. Rigorous medical image segmentation benchmarks verify the superiority of U-KAN by higher accuracy even with less computation cost. 
We further delve into the potential of U-KAN as an alternative U-Net noise predictor in diffusion models, demonstrating its applicability in generating task-oriented model architectures. These endeavours unveil valuable insights and shed light on the prospect that U-KAN can serve as a strong backbone for medical image segmentation and generation. Project page: https://yes-ukan.github.io/	翻訳日:2024-06-06 22:05:49 公開日:2024-06-05
# MultifacetEval: 医学知識習得におけるLLMの多面的評価 MultifacetEval: Multifaceted Evaluation to Probe LLMs in Mastering Medical Knowledge ( http://arxiv.org/abs/2406.02919v1 ) ライセンス: Link先を確認	Yuxuan Zhou, Xien Liu, Chen Ning, Ji Wu,	(参考訳) 大規模言語モデル(LLM)はドメイン間で優れており、MedQAのような医療評価ベンチマークでも顕著なパフォーマンスを提供している。しかし、実際の医療シナリオにおける報告されたパフォーマンスと実践的効果の間には、依然として大きなギャップがある。本稿では,このギャップの原因を多面的検査スキーマを用いて検討し,現在のLLMによる医学知識の実態を体系的に探究することを目的とする。具体的には,複数の面(比較,修正,識別,検証)における医療知識のエンコーディングと習得におけるLLMの程度と範囲を同時に検討するための,新しい評価フレームワークであるMultifacetEvalを開発した。 MultifacetEval フレームワークをベースとして,MultiDiseK (臨床疾患知識ベースからの質問) とMultiMedQA (医療ベンチマーク MedQA からの質問を多面的質問に書き換える) という2つの多面的評価データセットを構築した。これらの多面的データセットの実験結果は、医学知識を習得する際の現在のLLMの程度が、既存の医学ベンチマークよりもはるかに低いことを示し、医学知識を習得する際の深さ、精度、包括性を欠いていることを示唆している。結果として、現在のLLMは現実世界の医療タスクにはまだ対応できていない。コードとデータセットは https://github.com/THUMLP/MultifacetEval で公開されている。 Large language models (LLMs) have excelled across domains, also delivering notable performance on the medical evaluation benchmarks, such as MedQA. However, there still exists a significant gap between the reported performance and the practical effectiveness in real-world medical scenarios. In this paper, we aim to explore the causes of this gap by employing a multifaceted examination schema to systematically probe the actual mastery of medical knowledge by current LLMs. Specifically, we develop a novel evaluation framework MultifacetEval to examine the degree and coverage of LLMs in encoding and mastering medical knowledge at multiple facets (comparison, rectification, discrimination, and verification) concurrently. Based on the MultifacetEval framework, we construct two multifaceted evaluation datasets: MultiDiseK (by producing questions from a clinical disease knowledge base) and MultiMedQA (by rephrasing each question from a medical benchmark MedQA into multifaceted questions). 
The experimental results on these multifaceted datasets demonstrate that the extent of current LLMs in mastering medical knowledge is far below their performance on existing medical benchmarks, suggesting that they lack depth, precision, and comprehensiveness in mastering medical knowledge. Consequently, current LLMs are not yet ready for application in real-world medical tasks. The code and datasets are available at https://github.com/THUMLP/MultifacetEval.	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
# ニューラルネットワークバイザリングのためのテキストインジェクション Text Injection for Neural Contextual Biasing ( http://arxiv.org/abs/2406.02921v1 ) ライセンス: Link先を確認	Zhong Meng, Zelin Wu, Rohit Prabhavalkar, Cal Peyser, Weiran Wang, Nanxin Chen, Tara N. Sainath, Bhuvana Ramabhadran,	(参考訳) ニューラルコンテキストバイアスは、話者の文脈内で重要なフレーズ、特に訓練データに稀なフレーズに対する自動音声認識(ASR)を効果的に改善する。本研究では文脈テキストインジェクション(CTI)を提案する。 CTIは、ペア化された音声テキストデータだけでなく、ASRモデルとそのバイアス成分を最適化するために、より大規模な未ペアテキストコーパスも活用している。未ペアテキストは、音声のような表現に変換され、モデルの注意を関連するバイアスフレーズへと導くために使用される。さらに、文脈テキスト注入(CTI)最小単語誤り率(MWER)トレーニングを導入する。実験により、1000億の文を持つCTIは、強い神経バイアスモデルから43.3%の相対的なWER削減を達成できることが示された。 CTI-MWERはさらに23.5%の改善を提供している。 Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data. This work proposes contextual text injection (CTI) to enhance contextual ASR. CTI leverages not only the paired speech-text data, but also a much larger corpus of unpaired text to optimize the ASR model and its biasing component. Unpaired text is converted into speech-like representations and used to guide the model's attention towards relevant bias phrases. Moreover, we introduce a contextual text-injected (CTI) minimum word error rate (MWER) training, which minimizes the expected WER caused by contextual biasing when unpaired text is injected into the model. Experiments show that CTI with 100 billion text sentences can achieve up to 43.3% relative WER reduction from a strong neural biasing model. CTI-MWER provides a further relative improvement of 23.5%.	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
# スパイクニューラルネットワークを状態空間モデルとして再考 Rethinking Spiking Neural Networks as State Space Models ( http://arxiv.org/abs/2406.02923v1 ) ライセンス: Link先を確認	Malyaban Bal, Abhronil Sengupta,	(参考訳) スパイキングニューラルネットワーク(SNN)は、従来のニューラルアーキテクチャに代わる生物学的に妥当な代替品として提案されており、そのコアとなる計算フレームワークは、広範囲に研究されているリーキー積分発火(LIF)ニューロンの設計に依存している。 LIFニューロンのステートフルな性質は、リカレントニューラルネットワーク(RNN)と同様に、SNNがシーケンシャルなデータを処理する能力について、現在進行中の議論を引き起こしている。それにもかかわらず、長距離依存タスクの領域において、現在のSNNの探索には大きなギャップが残っている。本研究では, 単純なLIF機構を超えて神経力学の解析を拡張するために, 状態空間モデルに基づく新しい確率的スパイキング神経モデルを提案する。我々は従来の膜電位のみを含むLIFニューロンのスカラー隠れ状態表現を超えて、n次元隠れ状態を提案する。さらに、LIFニューロンの固定ダイナミクスとは対照的に、学習可能なパラメータを導入することにより、各層にまたがるニューロンダイナミクスの微調整が可能となる。また,これらのニューラルネットワークモデルを深部SNNアーキテクチャに拡張し,効率的な並列トレーニングを実現するとともに,後方フェーズにおける確率的スパイク操作の非微分可能性の課題にも着目する。我々のモデルは、Long Range Arenaベンチマーク、順列MNIST、音声コマンドデータセットを含む、様々な長距離依存タスクにわたるSNNモデル間の最先端性能を実現する。さらに、このスパイキングモデルに固有のスパース活動パターンを強調し、エネルギー効率の利点を分析する。 Spiking neural networks (SNNs) are posited as a biologically plausible alternative to conventional neural architectures, with their core computational framework resting on the extensively studied leaky integrate-and-fire (LIF) neuron design. The stateful nature of LIF neurons has spurred ongoing discussions about the ability of SNNs to process sequential data, akin to recurrent neural networks (RNNs). Despite this, there remains a significant gap in the exploration of current SNNs within the realm of long-range dependency tasks. In this study, to extend the analysis of neuronal dynamics beyond the simplistic LIF mechanism, we present a novel class of stochastic spiking neuronal models grounded in state space models. We expand beyond the scalar hidden state representation of LIF neurons, which traditionally comprises only the membrane potential, by proposing an n-dimensional hidden state. Additionally, we enable fine-tuned formulation of neuronal dynamics across each layer by introducing learnable parameters, as opposed to the fixed dynamics in LIF neurons. 
We also develop a robust framework for scaling these neuronal models to deep SNN-based architectures, ensuring efficient parallel training while also adeptly addressing the challenge of non-differentiability of stochastic spiking operation during the backward phase. Our models attain state-of-the-art performance among SNN models across diverse long-range dependency tasks, encompassing the Long Range Arena benchmark, permuted sequential MNIST, and the Speech Command dataset. Moreover, we provide an analysis of the energy efficiency advantages, emphasizing the sparse activity pattern intrinsic to this spiking model.	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
# Pruner-Zero:大規模言語モデルのスクラッチからシンボリック・プルーニング・メトリックを進化させる Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models ( http://arxiv.org/abs/2406.02924v1 ) ライセンス: Link先を確認	Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu,	(参考訳) 目覚ましい機能にもかかわらず、LLM(Large Language Models)はその大きなサイズのため、デプロイメントの課題に直面している。プルーニング法は重量のサブセットを減らして加速させるが、その多くは再訓練を必要とする。近年,再学習を伴わずにLLMを刈り取る手法が提案されている。しかし、これらのメトリクスは人間の専門家の関与と退屈な試行錯誤を必要とします。優れたプルーニング指標を効率よく同定するために,遺伝的プログラミングを用いたシンボルプルーニング指標の自動検索フレームワークを開発した。特に、既存のプルーニング指標を含む精巧な探索空間を考案し、潜在的な記号的プルーニング指標を発見する。本稿では,人口の多様性を高めるための運用の簡易化戦略を提案する。このようにして、Pruner-Zeroはシンボリックプルーニングメトリクスの自動生成を可能にする。検索結果に基づいて, 刈り込み後の刈り出し指標と性能の相関について検討し, いくつかの原理を要約する。言語モデリングとゼロショットタスクにおけるLLaMAとLLaMA-2の広範囲な実験により,我々のPruner-Zeroは,SOTAポストトレーニングプルーニング法よりも優れた性能が得られることが示された。コードは https://github.com/pprp/Pruner-Zero で公開されている。 Despite the remarkable capabilities, Large Language Models (LLMs) face deployment challenges due to their extensive size. Pruning methods drop a subset of weights to accelerate, but many of them require retraining, which is prohibitively expensive and computationally demanding. Recently, post-training pruning approaches introduced novel metrics, enabling the pruning of LLMs without retraining. However, these metrics require the involvement of human experts and tedious trial and error. To efficiently identify superior pruning metrics, we develop an automatic framework for searching symbolic pruning metrics using genetic programming. In particular, we devise an elaborate search space encompassing the existing pruning metrics to discover the potential symbolic pruning metric. We propose an opposing operation simplification strategy to increase the diversity of the population. In this way, Pruner-Zero allows auto-generation of symbolic pruning metrics. Based on the searched results, we explore the correlation between pruning metrics and performance after pruning and summarize some principles. 
Extensive experiments on LLaMA and LLaMA-2 on language modeling and zero-shot tasks demonstrate that Pruner-Zero outperforms SOTA post-training pruning methods. Code: https://github.com/pprp/Pruner-Zero	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
# Syn2REAL: ASRドメイン適応における相違点の緩和のためのタスク算術の活用 SYN2REAL: Leveraging Task Arithmetic for Mitigating Synthetic-Real Discrepancies in ASR Domain Adaptation ( http://arxiv.org/abs/2406.02925v1 ) ライセンス: Link先を確認	Hsuan Su, Hua Farn, Shang-Tse Chen, Hung-yi Lee,	(参考訳) 大規模言語モデル(LLM)の最近の進歩は「タスクベクトル」の概念を導入している。本稿では,テキストのみを対象とした自動音声認識(ASR)における領域適応のための新しいタスクベクトル「SYN2REAL」を提案する。従来の合成音声の微調整は、しばしば音響ミスマッチによる性能劣化をもたらす。この問題に対処するために、実音声と合成音声で微調整されたモデル間のパラメータ差を減じて「SYN2REAL」ベクトルを作成することを提案する。このベクトルは2つの領域間のギャップを効果的に埋める。 SLURPデータセットを用いた実験により,提案手法は未確認対象領域に対する単語誤り率を平均11.15%向上させ,音声領域適応性向上におけるタスクベクトルの可能性を強調した。 Recent advancements in large language models (LLMs) have introduced the 'task vector' concept, which has significantly impacted various domains but remains underexplored in speech recognition. This paper presents a novel 'SYN2REAL' task vector for domain adaptation in automatic speech recognition (ASR), specifically targeting text-only domains. Traditional fine-tuning on synthetic speech often results in performance degradation due to acoustic mismatches. To address this issue, we propose creating a 'SYN2REAL' vector by subtracting the parameter differences between models fine-tuned on real and synthetic speech. This vector effectively bridges the gap between the two domains. Experiments on the SLURP dataset demonstrate that our approach yields an average improvement of 11.15% in word error rate for unseen target domains, highlighting the potential of task vectors in enhancing speech domain adaptation.	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
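Illustrative note (not taken from the paper): the abstract above describes forming a vector from the parameter difference between models fine-tuned on real and on synthetic speech. A minimal sketch of that arithmetic on toy parameter dictionaries follows. Adding the vector to a model fine-tuned on synthetic target-domain speech, and the `scale` factor, are assumptions drawn from how task arithmetic is generally used; the function names and numbers are hypothetical.

```python
import numpy as np


def syn2real_vector(real_ft, synth_ft):
    """Per-parameter difference: (fine-tuned on real) minus (fine-tuned on synthetic)."""
    return {name: real_ft[name] - synth_ft[name] for name in real_ft}


def apply_vector(params, vector, scale=1.0):
    """Add a scaled task vector to a set of parameters."""
    return {name: params[name] + scale * vector[name] for name in params}


# Toy parameters (hypothetical): a source domain with both real and synthetic
# speech, and a target domain where only synthetic speech is available.
source_real = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
source_synth = {"w": np.array([0.5, 2.5]), "b": np.array([0.0])}
target_synth = {"w": np.array([2.0, 2.0]), "b": np.array([1.0])}

tau = syn2real_vector(source_real, source_synth)
adapted = apply_vector(target_synth, tau)
print(tau, adapted)
```

With real checkpoints the same dictionary arithmetic would run over the models' weight tensors, provided all models share one architecture and initialization.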
# 多変量物理インフォームド・コンボリューション・オートエンコーダによる多変量制御 Multivariate Physics-Informed Convolutional Autoencoder for Anomaly Detection in Power Distribution Systems with High Penetration of DERs ( http://arxiv.org/abs/2406.02927v1 ) ライセンス: Link先を確認	Mehdi Jabbari Zideh, Sarika Khushalani Solanki,	(参考訳) サイバー物理事象下でのシステム状態の解析におけるディープラーニングモデルの絶え間ない進歩にもかかわらず、それらの能力は、データ可用性の問題、データ取得のコスト、およびトレーニングウィンドウ以外のデータの解釈と外挿の欠如により、電力システム領域において制限されている。さらに、風力や太陽光発電のような分散エネルギー資源(DER)の統合は、電力システムの複雑さと非線形性を高める。したがって、電力系統運用者の信頼性を高め、信頼性のある意思決定を行うための状況意識を高める必要がある。これにより、物理インフォームドニューラルネットワーク(PINN)モデルがより解釈可能で信頼性が高く、堅牢なモデルとして開発され、基礎となる原則法則がニューラルネットワークモデルのトレーニングプロセスに統合されて、パフォーマンスの向上を実現している。本稿では,多変量物理インフォームド畳み込みオートエンコーダ(PIConvAE)モデルを提案する。物理法則は、基礎となるキルヒホフの回路法則をオートエンコーダのトレーニングプロセスに組み込むカスタマイズされた損失関数によって統合される。多変量PIConvAEモデルの性能を,カリフォルニア州リバーサイドのIEEE 123バスシステムと実世界の給電網で評価した。その結果,両システムにおける各種サイバー異常の検出において,提案手法の異例な性能を示した。さらに、トレーニングデータ比率の異なるデータ不足シナリオにおいて、モデルの有効性を評価する。最後に、PIConvAEモデルは検出基準がかなり高い他のモデルを上回る既存の機械学習モデルと比較される。 Despite the relentless progress of deep learning models in analyzing the system conditions under cyber-physical events, their abilities are limited in the power system domain due to data availability issues, cost of data acquisition, and lack of interpretation and extrapolation for the data beyond the training windows. In addition, the integration of distributed energy resources (DERs) such as wind and solar generations increases the complexities and nonlinear nature of power systems. Therefore, an interpretable and reliable methodology is of utmost need to increase the confidence of power system operators and their situational awareness for making reliable decisions. This has led to the development of physics-informed neural network (PINN) models as more interpretable, trustworthy, and robust models where the underlying principled laws are integrated into the training process of neural network models to achieve improved performance. 
This paper proposes a multivariate physics-informed convolutional autoencoder (PIConvAE) model to detect cyber anomalies in power distribution systems with unbalanced configurations and high penetration of DERs. The physical laws are integrated through a customized loss function that embeds the underlying Kirchhoff's circuit laws into the training process of the autoencoder. The performance of the multivariate PIConvAE model is evaluated on two unbalanced power distribution grids, IEEE 123-bus system and a real-world feeder in Riverside, CA. The results show the exceptional performance of the proposed method in detecting various cyber anomalies in both systems. In addition, the model's effectiveness is evaluated in data scarcity scenarios with different training data ratios. Finally, the model's performance is compared with existing machine learning models where the PIConvAE model surpasses other models with considerably higher detection metrics.	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
# 拡散モデルを用いたゼロショット学習におけるデータ効率の探索 Exploring Data Efficiency in Zero-Shot Learning with Diffusion Models ( http://arxiv.org/abs/2406.02929v1 ) ライセンス: Link先を確認	Zihan Ye, Shreyank N. Gowda, Xiaobo Jin, Xiaowei Huang, Haotian Xu, Yaochu Jin, Kaizhu Huang,	(参考訳) Zero-Shot Learning (ZSL) は、クラスレベルでのデータ効率を向上させることで、分類器が見えないクラスを識別できるようにすることを目的としている。これは、未確認クラスの事前に定義されたセマンティクスから画像特徴を生成することで実現される。しかし、現在のほとんどのアプローチは、見たクラスのサンプルの数に大きく依存している。本稿では,限られた例が一般的に生成モデルの性能低下をもたらすことを示す。これらの課題を克服するために,拡散型ZSLモデルであるZeroDiffを提案する。この統合されたフレームワークは拡散モデルを導入し、クラスレベルとインスタンスレベルのデータ効率を改善する。具体的には、例えば、ZeroDiffはフォワード拡散チェーンを使用して、制限されたデータを拡張されたノイズ付きデータに変換する。クラスレベルの有効性を得るために,拡散型特徴発生器(DFG)と拡散型表現発生器(DRG)からなる2分岐生成構造を設計する。 DFGはクロスエントロピーに基づく特徴分布の学習とサンプリングに重点を置いており、DRGは教師付きコントラストベース表現を学習し、DFGのゼロショット能力を高める。さらに,様々な側面から生成された特徴を評価するために3つの識別器を使用し,識別器間の知識の伝達にワッサーシュタイン距離に基づく相互学習損失を導入し,生成指導を強化する。一般的な3つのZSLベンチマークに関する広範な実験を通じて実証されたZeroDiffは、既存のZSLメソッドよりも大幅に改善されているだけでなく、トレーニングデータが少ない場合でも堅牢なパフォーマンスを維持している。コードは受理時にリリースされる。 Zero-Shot Learning (ZSL) aims to enable classifiers to identify unseen classes by enhancing data efficiency at the class level. This is achieved by generating image features from pre-defined semantics of unseen classes. However, most current approaches heavily depend on the number of samples from seen classes, i.e. they do not consider instance-level effectiveness. In this paper, we demonstrate that limited seen examples generally result in deteriorated performance of generative models. To overcome these challenges, we propose ZeroDiff, a Diffusion-based Generative ZSL model. This unified framework incorporates diffusion models to improve data efficiency at both the class and instance levels. Specifically, for instance-level effectiveness, ZeroDiff utilizes a forward diffusion chain to transform limited data into an expanded set of noised data. 
For class-level effectiveness, we design a two-branch generation structure that consists of a Diffusion-based Feature Generator (DFG) and a Diffusion-based Representation Generator (DRG). DFG focuses on learning and sampling the distribution of cross-entropy-based features, whilst DRG learns the supervised contrastive-based representation to boost the zero-shot capabilities of DFG. Additionally, we employ three discriminators to evaluate generated features from various aspects and introduce a Wasserstein-distance-based mutual learning loss to transfer knowledge among discriminators, thereby enhancing guidance for generation. Demonstrated through extensive experiments on three popular ZSL benchmarks, our ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data. Code will be released upon acceptance.	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
# P2PFormer:リモートセンシング画像から正規建物輪郭抽出のためのプリミティブ・ツー・ポリゴン法 P2PFormer: A Primitive-to-polygon Method for Regular Building Contour Extraction from Remote Sensing Images ( http://arxiv.org/abs/2406.02930v1 ) ライセンス: Link先を確認	Tao Zhang, Shiqing Wei, Yikang Zhou, Muying Luo, Wenling You, Shunping Ji,	(参考訳) リモートセンシング画像から建物輪郭を抽出することは、複雑で多様な形状、閉塞、騒音のために重要な課題である。既存の方法は、しばしば不規則な輪郭、丸い角、冗長点に悩まされ、通常の多角形建築輪郭を生成するために広範囲な後処理を必要とする。これらの課題に対処するため,我々は,ポストプロセッシングを伴わずに通常の建物輪郭を生成する,新しい合理化パイプラインを導入する。我々のアプローチは、一般的な幾何学的プリミティブ(頂点、線、角を含むことができる)のセグメンテーションから始まり、次にそれらの列の予測を行う。これにより、セグメント化されたプリミティブを順次接続することで、通常の建物の輪郭を直接構築することができる。このパイプライン上に構築したP2PFormerは,変圧器をベースとしたアーキテクチャを用いて幾何学的プリミティブを分割し,その順序を予測する。プリミティブのセグメンテーションを強化するために,グループクエリと呼ばれるユニークな表現を導入する。この表現は、一連のクエリと特異なクエリ位置から構成され、プリミティブの複数のミドルポイントとその効率的なリンクに焦点を当てる。さらに,クエリ位置の埋め込みにおいて,クエリの焦点を適切な位置に絞ることを目的とした革新的な暗黙的な更新戦略を提案し,その結果,プリミティブセグメンテーションの質を高める。我々の実験は、P2PFormerがWHU、CrowdAI、WHU-Mixデータセットで新しい最先端のパフォーマンスを実現し、最大のCrowdAIデータセットでは2.7 APと6.5 AP75のマージンで以前のSOTA PolyWorldを上回ったことを示している。コードとトレーニングされた重量を公開して、それらの使用を促進し、さらなる研究を促進するつもりです。 Extracting building contours from remote sensing imagery is a significant challenge due to buildings' complex and diverse shapes, occlusions, and noise. Existing methods often struggle with irregular contours, rounded corners, and redundancy points, necessitating extensive post-processing to produce regular polygonal building contours. To address these challenges, we introduce a novel, streamlined pipeline that generates regular building contours without post-processing. Our approach begins with the segmentation of generic geometric primitives (which can include vertices, lines, and corners), followed by the prediction of their sequence. This allows for the direct construction of regular building contours by sequentially connecting the segmented primitives. Building on this pipeline, we developed P2PFormer, which utilizes a transformer-based architecture to segment geometric primitives and predict their order. 
To enhance the segmentation of primitives, we introduce a unique representation called group queries. This representation comprises a set of queries and a singular query position, which improve the focus on multiple midpoints of primitives and their efficient linkage. Furthermore, we propose an innovative implicit update strategy for the query position embedding aimed at sharpening the focus of queries on the correct positions and, consequently, enhancing the quality of primitive segmentation. Our experiments demonstrate that P2PFormer achieves new state-of-the-art performance on the WHU, CrowdAI, and WHU-Mix datasets, surpassing the previous SOTA PolyWorld by a margin of 2.7 AP and 6.5 AP75 on the largest CrowdAI dataset. We intend to make the code and trained weights publicly available to promote their use and facilitate further research.	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
# 放射線誘導型マルチモーダルセルフアテンションネットワークによる乳腺MRIの病的完全反応の予測 Radiomics-guided Multimodal Self-attention Network for Predicting Pathological Complete Response in Breast MRI ( http://arxiv.org/abs/2406.02936v1 ) ライセンス: Link先を確認	Jonghun Kim, Hyunjin Park,	(参考訳) 乳癌は女性の間で最も多いがんであり、抗がん療法が患者の予後と治療のカスタマイズに不可欠である後、病理学的完全反応(pCR)を予測する。深層学習は、医用画像診断において、特に精度を高めるために複数の画像モダリティを利用する場合に、有望であることを示している。本研究では,ダイナミックコントラスト強調画像(DCE)とADCマップを用いた乳癌患者のpCR予測モデルを提案する。放射線学的特徴は腫瘍領域の手作りの特徴として確立されており、医用画像解析に有用である。本手法は, 腫瘍関連領域からの特徴抽出を誘導するために放射線を利用した自己注意機構を備えたエンコーダを用いて, DCE MRI と ADC から特徴抽出を行う。実験の結果,他のベースライン法と比較して,pCR予測におけるモデルの性能が優れていることが示された。 Breast cancer is the most prevalent cancer among women and predicting pathologic complete response (pCR) after anti-cancer treatment is crucial for patient prognosis and treatment customization. Deep learning has shown promise in medical imaging diagnosis, particularly when utilizing multiple imaging modalities to enhance accuracy. This study presents a model that predicts pCR in breast cancer patients using dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) and apparent diffusion coefficient (ADC) maps. Radiomics features are established hand-crafted features of the tumor region and thus could be useful in medical image analysis. Our approach extracts features from both DCE MRI and ADC using an encoder with a self-attention mechanism, leveraging radiomics to guide feature extraction from tumor-related regions. Our experimental results demonstrate the superior performance of our model in predicting pCR compared to other baseline methods.	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
# Adaptive Stepsizes を用いた分散ミニマックス最適化のためのニア最適収束の実現 Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes ( http://arxiv.org/abs/2406.02939v1 ) ライセンス: Link先を確認	Yan Huang, Xiang Li, Yipeng Shen, Niao He, Jinming Xu,	(参考訳) 本稿では,分散ミニマックス問題に適応的手法を直接適用することにより,局所的に計算された適応段階における不整合による非収束が生じることを示す。そこで我々はD-AdaSTを提案する。D-AdaSTはStepsize Trackingを用いた分散適応ミニマックス法である。鍵となる戦略は、2つの余分な(スカラー)変数の送信を含む適応的なステップサイズ追跡プロトコルを使用することである。このプロトコルは、ノードの段差間の整合性を保証し、バニラ分散適応法に存在するノード間の段差の調整の欠如による定常誤差を排除し、正確な収束を保証する。非凸-強凸分散ミニマックス問題に対しては、ステップサイズの時間スケールの分離とネットワークの準独立性を保証し、ほぼ最適収束率$\tilde{\mathcal{O}} \left( \epsilon ^{-\left(4+\delta \right)} \right)$を任意の小さな$\delta > 0$に対して設定する。我々の知る限り、D-AdaSTは非凸ミニマックス問題に対する問題依存パラメータを知らずにほぼ最適収束を達成する最初の分散適応手法である。我々の理論結果を検証するために大規模な実験を行った。 In this paper, we show that applying adaptive methods directly to distributed minimax problems can result in non-convergence due to inconsistency in locally computed adaptive stepsizes. To address this challenge, we propose D-AdaST, a Distributed Adaptive minimax method with Stepsize Tracking. The key strategy is to employ an adaptive stepsize tracking protocol involving the transmission of two extra (scalar) variables. This protocol ensures the consistency among stepsizes of nodes, eliminating the steady-state error due to the lack of coordination of stepsizes among nodes that commonly exists in vanilla distributed adaptive methods, and thus guarantees exact convergence. For nonconvex-strongly-concave distributed minimax problems, we characterize the specific transient times that ensure time-scale separation of stepsizes and quasi-independence of networks, leading to a near-optimal convergence rate of $\tilde{\mathcal{O}} \left( \epsilon ^{-\left( 4+\delta \right)} \right)$ for any small $\delta > 0$, matching that of the centralized counterpart. 
To the best of our knowledge, D-AdaST is the first distributed adaptive method achieving near-optimal convergence without knowing any problem-dependent parameters for nonconvex minimax problems. Extensive experiments are conducted to validate our theoretical results.	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
# タスク指向クエリベンチマーク(ToQB) The Task-oriented Queries Benchmark (ToQB) ( http://arxiv.org/abs/2406.02943v1 ) ライセンス: Link先を確認	Keun Soo Yim,	(参考訳) タスク指向クエリ(ビデオ再生、注文食品、タクシー呼び出しなど)は、仮想アシスタントやチャットボット、その他のLLMベースのサービスの品質を評価する上で不可欠である。しかし、関連するNLP(Natural Language Processing)分野の既存のベンチマークは主にタスク指向の対話に焦点を当てているため、タスク指向クエリの標準ベンチマークはまだ利用できない。そこで本研究では,既存のタスク指向対話データセットとLLMサービスを用いて,タスク指向クエリベンチマーク(ToQB)を効率的に生成する手法を提案する。提案手法は,各対話における話者の本来の意図を要約するために基礎となるNLPタスクを定式化し,LLMサービスを用いて考案されたNLPタスクを実行するための重要なステップを詳述し,ベンチマーク生成プロセスの大部分を自動化するためのフレームワークの概要を述べる。 2つの単一タスクドメインと1つのマルチタスクドメインを含むケーススタディを通じて、これらの3つのドメインに対してLLMプロンプト(例えば、システム発話や話者ラベルを省略する)をカスタマイズし、生成されたタスク指向クエリを特徴付ける方法を示す。生成されたToQBデータセットが一般公開されている。さらに、コミュニティコントリビュータによるToQBに追加可能な新しいドメインとその実践的応用について論じる。 Task-oriented queries (e.g., one-shot queries to play videos, order food, or call a taxi) are crucial for assessing the quality of virtual assistants, chatbots, and other large language model (LLM)-based services. However, a standard benchmark for task-oriented queries is not yet available, as existing benchmarks in the relevant NLP (Natural Language Processing) fields have primarily focused on task-oriented dialogues. Thus, we present a new methodology for efficiently generating the Task-oriented Queries Benchmark (ToQB) using existing task-oriented dialogue datasets and an LLM service. Our methodology involves formulating the underlying NLP task to summarize the original intent of a speaker in each dialogue, detailing the key steps to perform the devised NLP task using an LLM service, and outlining a framework for automating a major part of the benchmark generation process. Through a case study encompassing three domains (i.e., two single-task domains and one multi-task domain), we demonstrate how to customize the LLM prompts (e.g., omitting system utterances or speaker labels) for those three domains and characterize the generated task-oriented queries. The generated ToQB dataset is made available to the public. 
We further discuss new domains that can be added to ToQB by community contributors and its practical applications.	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
# 時間反転対称性の破れによるツイスト二層グラフェンのトポロジー形成 Shaping the topology of twisted bilayer graphene via time-reversal symmetry breaking ( http://arxiv.org/abs/2406.02947v1 ) ライセンス: Link先を確認	Cunyuan Jiang, Matteo Baggioli, Qing-Dong Jiang,	(参考訳) 対称性破砕は2次元層状材料の輸送特性とトポロジー特性を調整するための有効なツールである。これらの材料のうち、ツイスト二層グラフェン(TBG)は、トポロジカルな特徴と強く相関する電子的挙動の豊富な相互作用を特徴とする、新しい物理学のための有望なプラットフォームとして登場した。本研究では, 時間反転対称性の破れ(TRSB)を用いて, TBGの位相特性を制御した。 TRSBの強度を変動させることにより、反対のチャーン数を持つ一対の平坦なバンドを示す位相絶縁相と、平坦なバンドのチャーン数ではなくベリー曲率が消える新しい絶縁状態との間の位相相転移が発見された。このトポロジ的遷移は、$\Gamma$ポイントでのギャップ閉包によって媒介されることを示すとともに、ねじれ角、対称性破壊パラメータ、AとABの積層領域間のミスマッチ結合の関数として3次元位相図を構築する。最後に、この新しい電子相は、最低平坦帯のベリー双極子密度によって誘導される非量子化異常ホール導電率であるフェルミエネルギーの関数として測定することで、実験室で同定できることを示す。 Symmetry breaking is an effective tool for tuning the transport and topological properties of 2D layered materials. Among these materials, twisted bilayer graphene (TBG) has emerged as a promising platform for new physics, characterized by a rich interplay between topological features and strongly correlated electronic behavior. In this study, we utilize time-reversal symmetry breaking (TRSB) to manipulate the topological properties of TBG. By varying the strength of TRSB, we discover a topological phase transition between a topological insulating phase, which exhibits a pair of flat bands with opposite Chern numbers, and a novel insulating state where the Chern number, but not the Berry curvature, of the flat bands vanishes. We demonstrate that this topological transition is mediated by a gap closing at the $\Gamma$ point, and we construct a three-dimensional phase diagram as a function of the twisting angle, the symmetry-breaking parameter, and the mismatch coupling between AA and AB stacking regions. Finally, we show that this novel electronic phase can be identified in the lab by measuring, as a function of the Fermi energy, its non-quantized anomalous Hall conductivity that is induced by the Berry dipole density of the lowest flat bands.	
翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
# 4D ASR: CTC、アテンション、トランスデューサ、マスク予測デコーダを統合した共同ビームサーチ 4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders ( http://arxiv.org/abs/2406.02950v1 ) ライセンス: Link先を確認	Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe,	(参考訳) エンドツーエンドの自動音声認識(E2E-ASR)は、コネクショニスト時間分類(CTC)、リカレントニューラルネットワークトランスデューサ(RNN-T)、アテンションベースのエンコーダデコーダ、マスク予測モデルなど、いくつかのネットワークアーキテクチャに分類される。それぞれのネットワークアーキテクチャにはアドバンテージとデメリットがあり、実践者はアプリケーション要求に応じてこれらの異なるモデルを切り替えることができます。異なるモデルを構築する代わりに、4つのデコーダ(CTC、RNN-T、アテンション、マスク予測)が同じエンコーダを共有するジョイントモデリングスキームを提案し、これを4Dモデリングと呼ぶ。 4Dモデルはマルチタスク学習を用いて訓練され、モデル正則化とモデルロバストネスの最大化を実現している。 4Dモデルを効率的に訓練するために,マルチタスク学習を安定化させる2段階のトレーニング戦略を導入する。さらに,3つのデコーダ(CTC,RNN-T,アテンション)を組み合わせることで,より高性能な1パスビーム探索アルゴリズムを提案する。これら3つのビームサーチアルゴリズムは、デコーダをプライマリデコーダとして使用する点で異なる。各アルゴリズムの性能と計算上のトレードオフを慎重に評価する。実験の結果, 共同で訓練した4Dモデルは, 1個のデコーダで訓練したE2E-ASRモデルよりも優れていた。さらに,提案した1パスビーム探索アルゴリズムは,提案したCTC/アテンションデコーディングよりも優れていることを示す。 End-to-end automatic speech recognition (E2E-ASR) can be classified into several network architectures, such as connectionist temporal classification (CTC), recurrent neural network transducer (RNN-T), attention-based encoder-decoder, and mask-predict models. Each network architecture has advantages and disadvantages, leading practitioners to switch between these different models depending on application requirements. Instead of building separate models, we propose a joint modeling scheme where four decoders (CTC, RNN-T, attention, and mask-predict) share the same encoder -- we refer to this as 4D modeling. The 4D model is trained using multitask learning, which will bring model regularization and maximize the model robustness thanks to their complementary properties. To efficiently train the 4D model, we introduce a two-stage training strategy that stabilizes multitask learning. 
In addition, we propose three novel one-pass beam search algorithms by combining three decoders (CTC, RNN-T, and attention) to further improve performance. These three beam search algorithms differ in which decoder is used as the primary decoder. We carefully evaluate the performance and computational tradeoffs associated with each algorithm. Experimental results demonstrate that the jointly trained 4D model outperforms the E2E-ASR models trained with only one individual decoder. Furthermore, we demonstrate that the proposed one-pass beam search algorithm outperforms the previously proposed CTC/attention decoding.	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
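The joint decoding idea in the 4D ASR abstract above can be illustrated with a minimal sketch in which each hypothesis is scored by a weighted sum of per-decoder log-probabilities. This is not the authors' one-pass beam search; `joint_score`, the weights, and the toy log-probabilities are hypothetical and only show how complementary decoders can be combined when ranking hypotheses.

```python
def joint_score(log_probs, weights):
    """Weighted sum of per-decoder log-probabilities for one hypothesis."""
    return sum(weights[name] * log_probs[name] for name in weights)

# Toy hypotheses with made-up log-probabilities from three decoders.
hyps = {
    "hello world": {"ctc": -2.0, "rnnt": -1.5, "attention": -1.0},
    "hollow word": {"ctc": -1.8, "rnnt": -3.0, "attention": -2.5},
}
# Hypothetical interpolation weights; in practice these would be tuned.
weights = {"ctc": 0.3, "rnnt": 0.3, "attention": 0.4}

# Pick the hypothesis with the highest combined score.
best = max(hyps, key=lambda h: joint_score(hyps[h], weights))
```

Here CTC alone would prefer the second hypothesis, while the combined score prefers the first, which is the kind of complementarity the abstract refers to.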
# AVFF:ビデオディープフェイク検出のためのオーディオ・ビジュアル機能融合 AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection ( http://arxiv.org/abs/2406.02951v1 ) ライセンス: Link先を確認	Trevine Oorloff, Surya Koppisetti, Nicolò Bonettini, Divyaraj Solanki, Ben Colman, Yaser Yacoob, Ali Shahriyari, Gaurav Bharaj,	(参考訳) ディープフェイクビデオコンテンツが急速に成長するにつれて、我々はそれらを検出するための改善された一般化可能な方法を必要としている。既存のほとんどの検出方法は、ユニモーダル・キューを使用するか、オーディオと視覚のモダリティの間の不協和を捉えるために教師付きトレーニングに依存している。前者は音声と視覚の対応を完全に無視しているが、後者はトレーニングコーパス内の音声と視覚の手がかりを識別することに重点を置いている。本稿では,2段階のクロスモーダル学習手法であるAudio-Visual Feature Fusion(AVFF)について述べる。第1段階では、実ビデオの自己監督による表現学習を追求し、本質的な音声と視覚の対応を捉えている。マルチモーダルな表現を抽出するために、コントラスト学習と自動符号化の目的を使い、新しい音声-視覚補間マスキングと特徴融合戦略を導入する。学習された表現は第2段階で調整され、実際のビデオと偽ビデオの両方で教師付き学習によってディープフェイク分類が追求される。大規模な実験と分析により,我々の新しい表現学習パラダイムは自然界において極めて差別的であることが示唆された。我々は、FakeAVCelebデータセットの98.6%の精度と99.1%のAUCを報告し、現在のオーディオ・ビジュアル・オブ・ザ・アートをそれぞれ14.9%、9.9%上回った。 With the rapid growth in deepfake video content, we require improved and generalizable methods to detect them. Most existing detection methods either use uni-modal cues or rely on supervised training to capture the dissonance between the audio and visual modalities. While the former disregards the audio-visual correspondences entirely, the latter predominantly focuses on discerning audio-visual cues within the training corpus, thereby potentially overlooking correspondences that can help detect unseen deepfakes. We present Audio-Visual Feature Fusion (AVFF), a two-stage cross-modal learning method that explicitly captures the correspondence between the audio and visual modalities for improved deepfake detection. The first stage pursues representation learning via self-supervision on real videos to capture the intrinsic audio-visual correspondences. To extract rich cross-modal representations, we use contrastive learning and autoencoding objectives, and introduce a novel audio-visual complementary masking and feature fusion strategy. 
The learned representations are tuned in the second stage, where deepfake classification is pursued via supervised learning on both real and fake videos. Extensive experiments and analysis suggest that our novel representation learning paradigm is highly discriminative in nature. We report 98.6% accuracy and 99.1% AUC on the FakeAVCeleb dataset, outperforming the current audio-visual state-of-the-art by 14.9% and 9.9%, respectively.	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
# GraphAlign: 機能アライメントによる複数グラフ上の1つのグラフニューラルネットワークの事前トレーニング GraphAlign: Pretraining One Graph Neural Network on Multiple Graphs via Feature Alignment ( http://arxiv.org/abs/2406.02953v1 ) ライセンス: Link先を確認	Zhenyu Hou, Haozhan Li, Yukuo Cen, Jie Tang, Yuxiao Dong,	(参考訳) グラフ自己教師型学習(SSL)は、グラフ構造化データによるマイニングと学習をかなり約束する。しかし、グラフSSLにおける重要な課題は、異なるドメインにまたがるグラフ間の機能差にある。本研究では,豊富なノード特徴を持つグラフのコレクションに1つのグラフニューラルネットワーク(GNN)を事前学習し,事前学習したGNNを未知のグラフに適用することを目的とする。本稿では,既存のグラフSSLフレームワークにシームレスに統合可能な汎用GraphAlign法を提案する。異なるグラフにまたがる特徴分布を調整するために、GraphAlignは、機能エンコーディング、正規化のアライメント戦略を、機能レベルの混合モジュールとともに設計する。大規模な実験によると、GraphAlignは既存のグラフSSLフレームワークを使用して、複数のグラフにまたがる統一的で強力なGNNを事前トレーニングし、ドメイン内グラフとドメイン外グラフの両方でパフォーマンス上の優位性を示す。 Graph self-supervised learning (SSL) holds considerable promise for mining and learning with graph-structured data. Yet, a significant challenge in graph SSL lies in the feature discrepancy among graphs across different domains. In this work, we aim to pretrain one graph neural network (GNN) on a varied collection of graphs endowed with rich node features and subsequently apply the pretrained GNN to unseen graphs. We present a general GraphAlign method that can be seamlessly integrated into the existing graph SSL framework. To align feature distributions across disparate graphs, GraphAlign designs alignment strategies of feature encoding, normalization, alongside a mixture-of-feature-expert module. Extensive experiments show that GraphAlign empowers existing graph SSL frameworks to pretrain a unified and powerful GNN across multiple graphs, showcasing performance superiority on both in-domain and out-of-domain graphs.	翻訳日:2024-06-06 19:59:32 公開日:2024-06-05
# PrE-Text:LLM時代の私的フェデレーションデータに基づく言語モデル PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs ( http://arxiv.org/abs/2406.02958v1 ) ライセンス: Link先を確認	Charlie Hou, Akshat Shrivastava, Hongyuan Zhan, Rylan Conway, Trang Le, Adithya Sagar, Giulia Fanti, Daniel Lazar,	(参考訳) オンデバイストレーニングは、現在、プライベートな分散ユーザデータ上で機械学習(ML)モデルをトレーニングするための最も一般的なアプローチである。それにもかかわらず、デバイス上でのトレーニングにはいくつかの欠点がある: (1) 多くのユーザデバイスはデバイス上で大きなモデルをトレーニングするには小さすぎる、(2)デバイス上でのトレーニングは通信と計算集約であり、(3)デバイス上でのトレーニングはデバッグとデプロイが困難である。これらの問題に対処するために、差分プライベート(DP)合成テキストデータを生成するPrE-Text(PrE-Text)を提案する。まず、複数のデータセットにまたがって、PrE-Text合成データによる小さなモデル(ユーザデバイスに適合するモデル)のトレーニングが、実際のプライバシー体制下でトレーニングされた小さなモデル(\epsilon=1.29$, $\epsilon=7.58$)よりも優れていることを示す。 9$\times$より少ないラウンド、6$\times$より少ないラウンドで、100$\times$より少ない通信で、これらの結果を達成する。第二に、PrE-TextのDP合成データに大規模なモデルを微調整することで、同じ種類のプライバシー予算でプライベートデータ上での大きな言語モデル(LLM)のパフォーマンスが向上する。これらの結果は、DP合成データのトレーニングが、プライベートな分散データ上でデバイス上でモデルをトレーニングするよりも、よりよい選択肢となることを示唆している。コードはhttps://github.com/houcharlie/PrE-Textで入手できる。 On-device training is currently the most common approach for training machine learning (ML) models on private, distributed user data. Despite this, on-device training has several drawbacks: (1) most user devices are too small to train large models on-device, (2) on-device training is communication- and computation-intensive, and (3) on-device training can be difficult to debug and deploy. To address these problems, we propose Private Evolution-Text (PrE-Text), a method for generating differentially private (DP) synthetic textual data. First, we show that across multiple datasets, training small models (models that fit on user devices) with PrE-Text synthetic data outperforms small models trained on-device under practical privacy regimes ($\epsilon=1.29$, $\epsilon=7.58$). We achieve these results while using 9$\times$ fewer rounds, 6$\times$ less client computation per round, and 100$\times$ less communication per round. 
Second, finetuning large models on PrE-Text's DP synthetic data improves large language model (LLM) performance on private data across the same range of privacy budgets. Altogether, these results suggest that training on DP synthetic data can be a better option than training a model on-device on private distributed data. Code is available at https://github.com/houcharlie/PrE-Text.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
# 大規模言語モデルの逆モーメントマッチング蒸留 Adversarial Moment-Matching Distillation of Large Language Models ( http://arxiv.org/abs/2406.02959v1 ) ライセンス: Link先を確認	Chen Jia,	(参考訳) 知識蒸留(KD)は、より大きな教師モデルで学生モデルを指導し、大規模言語モデル(LLM)の計算と記憶効率を改善する実践的な利点を享受する上で、非常に効果的であることが示されている。 LLMの最先端KD法は、主に教師と学生の確率予測の間の明示的な分布距離の最小化に頼っている。本研究では,これらの強制行動のクローン化目的を最適化する代わりに,LLMのKDの模倣学習戦略を検討する。特に,教師の行動の行動価値モーメントをオン・アンド・オフ・ポリティクスの観点から一致させることにより,模倣ギャップを最小化する。このアクション値のモーメントマッチング目標を達成するために,モーメントマッチング距離を推定し,学生のポリシーを最適化して最小化するための逆トレーニングアルゴリズムを提案する。タスクに依存しない命令追従実験とタスク固有の実験の両方の結果は,本手法の有効性を実証し,新しい最先端性能を実現する。 Knowledge distillation (KD) has been shown to be highly effective in guiding a student model with a larger teacher model and achieving practical benefits in improving the computational and memory efficiency for large language models (LLMs). State-of-the-art KD methods for LLMs mostly rely on minimizing explicit distribution distance between teacher and student probability predictions. Instead of optimizing these mandatory behaviour cloning objectives, we explore an imitation learning strategy for KD of LLMs. In particular, we minimize the imitation gap by matching the action-value moments of the teacher's behavior from both on- and off-policy perspectives. To achieve this action-value moment-matching goal, we propose an adversarial training algorithm to jointly estimate the moment-matching distance and optimize the student policy to minimize it. Results from both task-agnostic instruction-following experiments and task-specific experiments demonstrate the effectiveness of our method and achieve new state-of-the-art performance.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
# Docs2KG: 大規模言語モデルによる異種文書からの統一知識グラフ構築 Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models ( http://arxiv.org/abs/2406.02962v1 ) ライセンス: Link先を確認	Qiang Sun, Yuanyi Luo, Wenxiao Zhang, Sirui Li, Jichunyang Li, Kai Niu, Xiangrui Kong, Wei Liu,	(参考訳) 控えめに見積もっても、エンタープライズデータの80%は非構造化ファイルにあり、異種フォーマットに対応するデータレイクに格納されている。古典的な検索エンジンは、特に洞察を得るために閲覧と探索を行うタスクにおいて、もはや情報探索のニーズを満たすことができない。言い換えれば、明確な検索キーワードは存在しない。知識グラフは、人間の認知負荷を減らす自然な視覚的魅力のため、異種データ統合と知識表現の有力候補となる。本稿では,メール,Webページ,PDFファイル,Excelファイルなど,多種多様な非構造化文書からマルチモーダル情報を抽出するための新しいフレームワークであるDocs2KGを紹介する。抽出されたキー情報を表す統一知識グラフを動的に生成することで、Docs2KGは文書データレイクの効率的なクエリと探索を可能にする。ドメイン固有のデータソースや事前設計されたスキーマにフォーカスする既存のアプローチとは異なり、Docs2KGは様々なドキュメント構造やコンテンツタイプに適応できる柔軟性と拡張性を備えたソリューションを提供する。提案フレームワークは、複数の下流タスクをサポートするデータ処理を統一し、ドメインの解釈性を改善する。 Docs2KGはhttps://docs2kg.ai4wa.comで公開されており、デモビデオはhttps://docs2kg.ai4wa.com/Videoで公開されている。 Even by a conservative estimate, 80% of enterprise data reside in unstructured files, stored in data lakes that accommodate heterogeneous formats. Classical search engines can no longer meet information-seeking needs, especially when the task is to browse and explore for insight formulation. In other words, there are no obvious search keywords to use. Knowledge graphs, due to their natural visual appeal that reduces the human cognitive load, become the winning candidate for heterogeneous data integration and knowledge representation. In this paper, we introduce Docs2KG, a novel framework designed to extract multimodal information from diverse and heterogeneous unstructured documents, including emails, web pages, PDF files, and Excel files. By dynamically generating a unified knowledge graph that represents the extracted key information, Docs2KG enables efficient querying and exploration of document data lakes. 
Unlike existing approaches that focus on domain-specific data sources or pre-designed schemas, Docs2KG offers a flexible and extensible solution that can adapt to various document structures and content types. The proposed framework unifies data processing supporting a multitude of downstream tasks with improved domain interpretability. Docs2KG is publicly accessible at https://docs2kg.ai4wa.com, and a demonstration video is available at https://docs2kg.ai4wa.com/Video.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
# 負のプロンプトの影響を理解する:いつ、どのように影響をもたらすか? Understanding the Impact of Negative Prompts: When and How Do They Take Effect? ( http://arxiv.org/abs/2406.02965v1 ) ライセンス: Link先を確認	Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Boqing Gong, Cho-Jui Hsieh,	(参考訳) 負のプロンプトの概念は、Stable Diffusionのような条件付き生成モデルから生まれ、ユーザーは生成された画像から何を除外すべきかを指定できる。負のプロンプトが広く使われているにもかかわらず、その固有のメカニズムはほとんど解明されていない。本稿では, 負のプロンプトがいつ, どのように作用するかを明らかにするための, 初めての総合的研究について述べる。我々の広範な経験的分析は、負のプロンプトの2つの主要な挙動を識別する。遅延効果: 正のプロンプトが対応するコンテンツをレンダリングした後、負のプロンプトの影響が観察される。中和による削除: 負のプロンプトは、正のプロンプトとの潜在空間における相互キャンセル効果を通じて、生成された画像から概念を削除する。これらの知見は実世界での応用の可能性を示しており、例えば、負のプロンプトは、単純な適応アルゴリズムによって、背景に最小限の変更を加えるだけでオブジェクトのインペインティングを促進できることを示す。私たちの発見は、負のプロンプトの可能性を活用する上で、コミュニティに貴重な洞察をもたらすだろうと考えています。 The concept of negative prompts, emerging from conditional generation models like Stable Diffusion, allows users to specify what to exclude from the generated images. Despite the widespread use of negative prompts, their intrinsic mechanisms remain largely unexplored. This paper presents the first comprehensive study to uncover how and when negative prompts take effect. Our extensive empirical analysis identifies two primary behaviors of negative prompts. Delayed Effect: The impact of negative prompts is observed after positive prompts render corresponding content. Deletion Through Neutralization: Negative prompts delete concepts from the generated image through a mutual cancellation effect in latent space with positive prompts. These insights reveal significant potential real-world applications; for example, we demonstrate that negative prompts can facilitate object inpainting with minimal alterations to the background via a simple adaptive algorithm. We believe our findings will offer valuable insights for the community in capitalizing on the potential of negative prompts.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
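As background for the abstract above, negative prompts in Stable Diffusion-style samplers are commonly implemented by substituting the negative-prompt noise prediction for the unconditional branch of classifier-free guidance. The sketch below is an illustration under that assumption, not the authors' adaptive algorithm; `guided_noise`, `start_step`, and the toy arrays are hypothetical, and switching the negative prompt on only from `start_step` is just one simple way to mimic a delayed application.

```python
import numpy as np

def guided_noise(eps_pos, eps_neg, eps_uncond, step, guidance_scale=7.5, start_step=0):
    """Combine noise predictions for one denoising step.

    Before `start_step` the negative prompt is ignored and guidance is
    computed against the unconditional prediction. From `start_step` on,
    the negative-prompt prediction replaces the unconditional branch, so
    the result is pushed away from the negative prompt.
    """
    baseline = eps_neg if step >= start_step else eps_uncond
    return baseline + guidance_scale * (eps_pos - baseline)

# Toy 2-D "noise predictions" (hypothetical values, not model outputs).
eps_pos = np.array([1.0, 0.0])
eps_neg = np.array([0.0, 1.0])
eps_uncond = np.zeros(2)

early = guided_noise(eps_pos, eps_neg, eps_uncond, step=2, guidance_scale=2.0, start_step=5)
late = guided_noise(eps_pos, eps_neg, eps_uncond, step=8, guidance_scale=2.0, start_step=5)
```

In the toy example, the late-step result has a negative component along the negative-prompt direction, a crude picture of the cancellation effect the abstract describes.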
# グローバル教育におけるジェネレーティブAIとデジタルネオコロニアリズム : 平等なフレームワークを目指して Generative AI and Digital Neocolonialism in Global Education: Towards an Equitable Framework ( http://arxiv.org/abs/2406.02966v1 ) ライセンス: Link先を確認	Matthew Nyaaba, Alyson Leigh Wright, Gyu Lim Choi,	(参考訳) 本稿では、ジェネレーティブ・人工知能(GenAI)が西洋以外の社会に西洋のイデオロギーを課すのかを批判的に論じ、その固有のバイアスを通じて教育におけるデジタル新植民地主義を永続させ、これらの効果を緩和する戦略を提案する。我々の議論は、玄AIが西洋の学生に関係のある文化資料や事例を主に取り入れたコンテンツを作成し、西洋以外の背景から学生を遠ざけることによって、文化帝国主義を育むことができることを示した。また、GenAIによる西洋語の主な使用は、非支配的な言語を疎外し、教育コンテンツが先住民語話者に近づきにくくし、彼らの最初の言語で学ぶ能力に影響を及ぼす可能性がある。また、GenAIは、技術的に支配的な国家観を反映した内容やカリキュラムを多く生み出し、極端に専門化された土着の知識や実践を誇張している。さらに、GenAIへのアクセスコストは教育の不平等を増し、GenAIデータのコントロールは、地元の学生やコミュニティに利益をもたらすことなく商業的搾取につながる可能性がある。我々は、GenAI開発における文化的多様性と平等を優先する人間中心の改革、GenAIアプリケーション内の抑圧的構造を特定し解体する教育者や学生に権限を与える自由デザイン、将来の教育ニーズを満たすための調整可能なGenAIシステムを構築するための設計の展望、そして最後に、ネオコロニアルアウトプットの検索を効果的に促す技術について提案する。 This paper critically discusses how Generative Artificial Intelligence (GenAI) might impose Western ideologies on non-Western societies, perpetuating digital neocolonialism in education through its inherent biases and further suggests strategies to mitigate these effects. Our discussions demonstrated that GenAI can foster cultural imperialism by generating content that primarily incorporates cultural references and examples relevant to Western students, thereby alienating students from non-Western backgrounds. Also, the predominant use of Western languages by GenAI can marginalize non-dominant languages, making educational content less accessible to speakers of indigenous languages and potentially impacting their ability to learn in their first language. Additionally, GenAI often generates content and curricula that reflect the perspectives of technologically dominant countries, overshadowing marginalized indigenous knowledge and practices. 
Moreover, the cost of access to GenAI intensifies educational inequality, and the control of GenAI data could lead to commercial exploitation without benefiting local students and their communities. We propose human-centric reforms to prioritize cultural diversity and equity in GenAI development; a liberatory design to empower educators and students to identify and dismantle the oppressive structures within GenAI applications; foresight by design to create adjustable GenAI systems that meet future educational needs; and, finally, effective prompting skills to reduce the retrieval of neocolonial outputs.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
# 3次元生成モデルのための階層型ガウスの敵対的生成 Adversarial Generation of Hierarchical Gaussians for 3D Generative Model ( http://arxiv.org/abs/2406.02968v1 ) ライセンス: Link先を確認	Sangeek Hyun, Jae-Pil Heo,	(参考訳) 3D敵対的生成ネットワーク(3D GAN)のほとんどの進歩はレイキャストベースのボリュームレンダリングに大きく依存しており、高いレンダリングコストを伴う。 1つの有望な代替手段は、ラスタライズベースの3Dガウススプラッティング(3D-GS)であり、より高速なレンダリング速度と明示的な3D表現を提供する。本稿では,ガウスを3D GANの3次元表現として利用し,その効率的かつ明示的な特徴を活用する。しかし, 敵対的な枠組みでは, ナイーブなジェネレータアーキテクチャは訓練の不安定さに悩まされ, ガウスのスケールを調節する能力が欠如している。これは、ガウスの初期位置に対する適切なガイダンスと、スケールを適応的に管理する密度化(densification)が欠けているためであり、モデルの発散と視覚的アーティファクトをもたらす。これらの問題に対処するために、生成したガウスの位置とスケールを効果的に正規化する階層的マルチスケールガウス表現を持つジェネレータアーキテクチャを導入する。具体的には,より細かいレベルのガウスが粗いレベルのガウスによってパラメータ化される階層を設計する。細かいレベルのガウスの位置は対応する粗いレベルのガウスの近くに置かれ,スケールはレベルが細かくなるにつれて単調に減少し,3次元シーンの粗い構造と細部の両方をモデル化する。実験結果から,最先端の3D整合GANと比較して,同等の3D生成能力を保ちながらレンダリング速度が大幅に(100倍)向上することが示された。プロジェクトページ: https://hse1032.github.io/gsgan Most advances in 3D Generative Adversarial Networks (3D GANs) largely depend on ray casting-based volume rendering, which incurs demanding rendering costs. One promising alternative is rasterization-based 3D Gaussian Splatting (3D-GS), providing a much faster rendering speed and explicit 3D representation. In this paper, we exploit Gaussians as a 3D representation for 3D GANs by leveraging their efficient and explicit characteristics. However, in an adversarial framework, we observe that a naive generator architecture suffers from training instability and lacks the capability to adjust the scale of Gaussians. This leads to model divergence and visual artifacts due to the absence of proper guidance for initialized positions of Gaussians and densification to manage their scales adaptively. To address these issues, we introduce a generator architecture with a hierarchical multi-scale Gaussian representation that effectively regularizes the position and scale of generated Gaussians. 
Specifically, we design a hierarchy of Gaussians where finer-level Gaussians are parameterized by their coarser-level counterparts; the position of finer-level Gaussians would be located near their coarser-level counterparts, and the scale would monotonically decrease as the level becomes finer, modeling both coarse and fine details of the 3D scene. Experimental results demonstrate that ours achieves a significantly faster rendering speed (x100) compared to state-of-the-art 3D consistent GANs with comparable 3D generation capability. Project page: https://hse1032.github.io/gsgan.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
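The hierarchical parameterization described in the abstract above (finer Gaussians placed near their coarser parents, with scales that shrink at each level) can be sketched as follows. This is only one way to satisfy those two constraints, not the authors' generator; `refine_level`, the `tanh` and sigmoid choices, and the random inputs standing in for network outputs are all assumptions.

```python
import numpy as np

def refine_level(parent_pos, parent_scale, raw_offset, raw_shrink, children_per_parent=2):
    """Derive finer-level Gaussians from coarser-level ones.

    parent_pos:   (P, 3) centres of the coarse Gaussians
    parent_scale: (P, 3) positive per-axis scales of the coarse Gaussians
    raw_offset:   (P * C, 3) unconstrained outputs controlling child positions
    raw_shrink:   (P * C, 3) unconstrained outputs controlling child scales
    """
    pos = np.repeat(parent_pos, children_per_parent, axis=0)
    scale = np.repeat(parent_scale, children_per_parent, axis=0)
    # tanh keeps each child centre within one parent scale of its parent.
    child_pos = pos + scale * np.tanh(raw_offset)
    # A sigmoid factor below 1 makes every child smaller than its parent.
    child_scale = scale / (1.0 + np.exp(-raw_shrink))
    return child_pos, child_scale

# Build a three-level hierarchy; random numbers stand in for generator outputs.
rng = np.random.default_rng(0)
pos = rng.normal(size=(4, 3))
scale = np.full((4, 3), 0.5)
levels = [(pos, scale)]
for _ in range(2):
    p, s = levels[-1]
    n = p.shape[0] * 2
    levels.append(refine_level(p, s, rng.normal(size=(n, 3)), rng.normal(size=(n, 3))))
```

Because both constraints are built into the parameterization, they hold for any generator output, which is the sense in which such a hierarchy regularizes position and scale.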
# 混合ではなくフィルタリング:大規模言語モデルの混合のための確率的フィルタリングに基づくオンラインゲーティング Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models ( http://arxiv.org/abs/2406.02969v1 ) ライセンス: Link先を確認	Raeid Saqur, Anastasis Kratsios, Florian Krach, Yannick Limmer, Jacob-Junqi Tian, John Willes, Blanka Horvath, Frank Rudzicz,	(参考訳) 我々は、オンライン時系列予測タスクにおいて、LLM予測の最良の重み付けを各ステップで適応的に予測することで、$N$個の事前訓練されたエキスパート大規模言語モデル(LLM)を組み合わせるための形式化されたメカニズムであるMoE-Fを提案する。我々のメカニズムは,各専門家のランニング性能の条件情報を利用して,次のステップで時系列を予測するためのLLMの最適な組み合わせを予測する。静的な(学習された)Mixture of Experts(MoE)手法とは異なり、MoE-Fは専門家を組み合わせるために時間適応確率的フィルタリング技術を採用している。専門家選択問題を有限状態空間、連続時間の隠れマルコフモデル (HMM) として定式化することにより、Wonham-Shiryaevフィルタを利用することができる。提案手法はまず,$N$個のLLMそれぞれに対応する$N$個の並列フィルタを構築する。各フィルタは、アクセス可能な情報に基づいて、LLMの最良の組み合わせを提案する。その後、$N$個のフィルタ出力を集約して、集約されたLLMの損失に対する下限を最適化する。この下限はクローズドフォームで最適化でき、アンサンブル予測器が得られる。本稿の貢献は、(I) プラグアンドプレイのフィルタリングハーネスとしてデプロイ可能なMoE-Fアルゴリズム、(II) 提案するフィルタリングベースのゲーティングアルゴリズムの理論的最適性保証、(III) 実世界の金融市場変動(Financial Market Movement)タスクにおける、最先端の基盤LLMおよびMoE LLMを用いた経験的評価とアブレーション結果であり、MoE-Fは次に優れた個別のLLMエキスパートに対してF1値で絶対17%、相対48.5%の改善を達成する。 We propose MoE-F -- a formalised mechanism for combining $N$ pre-trained expert Large Language Models (LLMs) in online time-series prediction tasks by adaptively forecasting the best weighting of LLM predictions at every time step. Our mechanism leverages the conditional information in each expert's running performance to forecast the best combination of LLMs for predicting the time series in its next step. Diverging from static (learned) Mixture of Experts (MoE) methods, MoE-F employs time-adaptive stochastic filtering techniques to combine experts. By framing the expert selection problem as a finite state-space, continuous-time Hidden Markov model (HMM), we can leverage the Wonham-Shiryaev filter. Our approach first constructs $N$ parallel filters corresponding to each of the $N$ individual LLMs. Each filter proposes its best combination of LLMs, given the information that it has access to. 
Subsequently, the $N$ filter outputs are aggregated to optimize a lower bound for the loss of the aggregated LLMs, which can be optimized in closed-form, thus generating our ensemble predictor. Our contributions here are: (I) the MoE-F algorithm -- deployable as a plug-and-play filtering harness, (II) theoretical optimality guarantees of the proposed filtering-based gating algorithm, and (III) empirical evaluation and ablative results using state of the art foundational and MoE LLMs on a real-world Financial Market Movement task where MoE-F attains a remarkable 17% absolute and 48.5% relative F1 measure improvement over the next best performing individual LLM expert.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
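To give a feel for filtering-based gating as described in the abstract above, the sketch below runs a single discrete-time hidden Markov model filter over "which expert is currently best". It is a deliberately simplified stand-in: the paper uses a continuous-time Wonham-Shiryaev filter with $N$ parallel filters and a closed-form aggregation step, none of which is reproduced here. `filter_step`, `switch_prob`, `temperature`, and the toy losses are hypothetical.

```python
import numpy as np

def filter_step(weights, losses, switch_prob=0.05, temperature=1.0):
    """One predict/update step of a discrete-time HMM filter over experts.

    weights: current belief that each expert is the best one (sums to 1)
    losses:  loss each expert incurred on the newest observation
    """
    n = len(weights)
    # Predict: with probability `switch_prob` the hidden state is redrawn uniformly.
    transition = (1.0 - switch_prob) * np.eye(n) + switch_prob / n
    predicted = weights @ transition
    # Update: experts with a smaller recent loss are treated as more likely.
    likelihood = np.exp(-np.asarray(losses) / temperature)
    posterior = predicted * likelihood
    return posterior / posterior.sum()

# Three experts; the second one consistently has the lowest loss.
weights = np.full(3, 1.0 / 3.0)
for losses in ([0.9, 0.2, 0.8], [1.0, 0.1, 0.7], [0.8, 0.3, 0.9]):
    weights = filter_step(weights, losses)
```

The resulting `weights` could then be used to mix the experts' next predictions; the nonzero `switch_prob` keeps every weight strictly positive so the gate can recover if a different expert becomes best later.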
# ガウス点雲のどの例外的な低次元射影が多項式時間で見つかるか。 Which exceptional low-dimensional projections of a Gaussian point cloud can be found in polynomial time? ( http://arxiv.org/abs/2406.02970v1 ) ライセンス: Link先を確認	Andrea Montanari, Kangjie Zhou,	(参考訳) d$-次元標準ガウスベクトル $\boldsymbol{x}_1,\dots, \boldsymbol{x}_n$ が与えられたとき、その$m$-次元射影のすべての経験的分布の集合を考える。 Diaconis and Freedman (1984) は、$n/d\to \infty$ ならば、そのような分布は標準ガウス分布に収束することを示した。対照的に、比例漸近について研究し、$n,d\to \infty$を$n/d\to \alpha \in (0, \infty)$とする。この場合、典型的なランダム部分空間に沿ったデータポイントの射影は再びガウス的であるが、集合 $\mathscr{F}_{m,\alpha}$ は例外部分空間に対応する非ガウス分布を含む。統計物理学の非厳密な手法は、一般化されたパリの公式の言葉で$\mathscr{F}_{m,\alpha}$の間接的な特徴づけを与える。この式を厳密な基準で配置し、これらの射影が効率的に発見できるかどうかを理解するために、部分集合 $\mathscr{F}^{\rm alg}_{m,\alpha}\subseteq \mathscr{F}_{m,\alpha}$ を反復アルゴリズムのクラスで実現できる分布について研究する。この集合は確率的最適制御問題によって特徴づけられることを証明し、パリの公式を拡張する変分原理の観点からこの問題の双対的特徴付けを得る。副産物として、「一般化球面パーセプトロン」モデルを含むランダム最適化問題のクラスに対して計算的に達成可能な値を得る。 Given $d$-dimensional standard Gaussian vectors $\boldsymbol{x}_1,\dots, \boldsymbol{x}_n$, we consider the set of all empirical distributions of its $m$-dimensional projections, for $m$ a fixed constant. Diaconis and Freedman (1984) proved that, if $n/d\to \infty$, all such distributions converge to the standard Gaussian distribution. In contrast, we study the proportional asymptotics, whereby $n,d\to \infty$ with $n/d\to \alpha \in (0, \infty)$. In this case, the projection of the data points along a typical random subspace is again Gaussian, but the set $\mathscr{F}_{m,\alpha}$ of all probability distributions that are asymptotically feasible as $m$-dimensional projections contains non-Gaussian distributions corresponding to exceptional subspaces. Non-rigorous methods from statistical physics yield an indirect characterization of $\mathscr{F}_{m,\alpha}$ in terms of a generalized Parisi formula. 
Motivated by the goal of putting this formula on a rigorous basis, and to understand whether these projections can be found efficiently, we study the subset $\mathscr{F}^{\rm alg}_{m,\alpha}\subseteq \mathscr{F}_{m,\alpha}$ of distributions that can be realized by a class of iterative algorithms. We prove that this set is characterized by a certain stochastic optimal control problem, and obtain a dual characterization of this problem in terms of a variational principle that extends Parisi's formula. As a byproduct, we obtain computationally achievable values for a class of random optimization problems including `generalized spherical perceptron' models.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
# Event3DGS: 高速エゴモーションのためのイベントベースの3Dガウススプラッティング Event3DGS: Event-based 3D Gaussian Splatting for Fast Egomotion ( http://arxiv.org/abs/2406.02972v1 ) ライセンス: Link先を確認	Tianyi Xiong, Jiayi Wu, Botao He, Cornelia Fermuller, Yiannis Aloimonos, Heng Huang, Christopher A. Metzler,	(参考訳) 最近の3Dガウススプラッティング(3DGS)の出現は、明示的な点ベース表現の利点を生かし、新規ビュー合成のレンダリング速度と品質を大幅に向上させる。しかし, 実世界のロボット作業では, 高ダイナミックな動きや難解な照明条件の環境下での3次元放射場レンダリングが問題視されている。その理由は、高速なエゴモーションは現実のロボット作業では一般的であり、それが動きのぼやけを引き起こし、再建された構造における不正確さとアーティファクトをもたらすからである。この問題を軽減するために,生イベントストリームからのみガウススプラッティングを学習する最初の方法であるEvent3DGSを提案する。イベントカメラの高時間分解能と明示的なポイントベース表現を利用することで、Event3DGSはイベントストリームのみから高速なエゴモーションの下で高忠実度3D構造を再構築することができる。スパーシリティを意識したサンプリングとプログレッシブトレーニングのアプローチにより、再構築の品質と一貫性が向上します。外観の忠実度をさらに高めるため, 微分可能なラスタライザに運動ぼけ形成過程を明示的に組み込んで, 限られたRGB画像と組み合わせて外観を洗練させる。複数のデータセットに対する大規模な実験は、既存のアプローチと比較してEvent3DGSのレンダリング品質が優れていることを検証する。 The recent emergence of 3D Gaussian splatting (3DGS) leverages the advantage of explicit point-based representations, which significantly improves the rendering speed and quality of novel-view synthesis. However, 3D radiance field rendering in environments with high-dynamic motion or challenging illumination conditions remains problematic in real-world robotic tasks. The reason is that fast egomotion is prevalent in real-world robotic tasks, which induces motion blur, leading to inaccuracies and artifacts in the reconstructed structure. To alleviate this problem, we propose Event3DGS, the first method that learns Gaussian Splatting solely from raw event streams. By exploiting the high temporal resolution of event cameras and explicit point-based representation, Event3DGS can reconstruct high-fidelity 3D structures solely from the event streams under fast egomotion. Our sparsity-aware sampling and progressive training approaches allow for better reconstruction quality and consistency. 
To further enhance the fidelity of appearance, we explicitly incorporate the motion blur formation process into a differentiable rasterizer, which is used with a limited set of blurred RGB images to refine the appearance. Extensive experiments on multiple datasets validate the superior rendering quality of Event3DGS compared with existing approaches, with over 95% lower training time and faster rendering speed in orders of magnitude.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
# 中国語における読みやすさ誘導Idiom-Aware Simplification (RISS) Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese ( http://arxiv.org/abs/2406.02974v1 ) ライセンス: Link先を確認	Jingshen Zhang, Xinglu Chen, Xinying Qiu, Zhimin Wang, Wenhe Feng,	(参考訳) 中国語の文の単純化は、大規模にラベル付けされたパラレルコーパスの欠如とイディオムの流行によって困難に直面している。これらの課題に対処するために、データ拡張技術と語彙単純化を組み合わせた新しいフレームワークである、可読性を考慮したIdiom-aware Simplification (RISS)を提案する。 RISSは,(1)高品質な文ペアをマイニングするRPS(Readability-Guided Paraphrase Selection)と,(2)慣用的表現の理解と単純化を促進するモデルであるIAS(Idiom-aware Simplification)の2つの重要なコンポーネントを導入している。マルチステージとマルチタスクの学習戦略を用いてRPSとIASを統合することで、RISSは2つの中国語文単純化データセットにおいて、従来の最先端の手法よりも優れています。さらに、RISSは小さなラベル付きデータセットを微調整することで、さらなる改善を実現している。我々のアプローチは、より効果的でアクセスしやすい中国語のテキストの単純化の可能性を示している。 Chinese sentence simplification faces challenges due to the lack of large-scale labeled parallel corpora and the prevalence of idioms. To address these challenges, we propose Readability-guided Idiom-aware Sentence Simplification (RISS), a novel framework that combines data augmentation techniques with lexical simplification. RISS introduces two key components: (1) Readability-guided Paraphrase Selection (RPS), a method for mining high-quality sentence pairs, and (2) Idiom-aware Simplification (IAS), a model that enhances the comprehension and simplification of idiomatic expressions. By integrating RPS and IAS using multi-stage and multi-task learning strategies, RISS outperforms previous state-of-the-art methods on two Chinese sentence simplification datasets. Furthermore, RISS achieves additional improvements when fine-tuned on a small labeled dataset. Our approach demonstrates the potential for more effective and accessible Chinese text simplification.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
# DA-Flow:デュアルアテンション正規化フローによる骨格型ビデオ異常検出 DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection ( http://arxiv.org/abs/2406.02976v1 ) ライセンス: Link先を確認	Ruituo Wu, Yang Chen, Jian Xiao, Bing Li, Jicong Fan, Frédéric Dufaux, Ce Zhu, Yipeng Liu,	(参考訳) 時間的畳み込みネットワーク(TCN)とグラフ畳み込みネットワーク(GCN)の処理モジュールとしての連携は,骨格型ビデオ異常検出(SVAD)において有望な結果を示した。しかし,計算と記憶の複雑さが低い軽量モデルを維持するために,浅いGCNブロックとTCNブロックは,小さな受容場とクロス次元相互作用キャプチャの欠如によって制約される。この制限に対処するため,時空間データにおけるクロス次元相互作用関係を捉えるためのDAM (Dual Attention Module) という軽量モジュールを提案する。フレームアテンション機構を使用して、最も重要なフレームを識別し、スケルトンアテンション機構を使用して、最小パラメータとフロップで固定されたパーティション間の広範な関係をキャプチャする。さらに、DA-Flow(Dual Attention Normalizing Flow)は、GCNの後処理ユニットとして、正規化フローフレームワーク内でDAMを統合している。シミュレーションにより,提案手法は雑音や負のサンプルに対して頑健であることが示された。実験の結果, DA-Flowは, パラメータ数が最も少ないマイクロAUC測定値において, 既存の最先端(SOTA)法よりも競争力や性能に優れていた。さらに, トレーニングなしでも, スケルトンデータの次元的減少を伴わないランダムプロジェクションを用いることで, かなりの異常検出が可能であることが判明した。 Cooperation between temporal convolutional networks (TCN) and graph convolutional networks (GCN) as a processing module has shown promising results in skeleton-based video anomaly detection (SVAD). However, to maintain a lightweight model with low computational and storage complexity, shallow GCN and TCN blocks are constrained by small receptive fields and a lack of cross-dimension interaction capture. To tackle this limitation, we propose a lightweight module called the Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data. It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and flops. Furthermore, the proposed Dual Attention Normalizing Flow (DA-Flow) integrates the DAM as a post-processing unit after GCN within the normalizing flow framework. Simulations show that the proposed model is robust against noise and negative samples. 
Experimental results show that DA-Flow reaches competitive or better performance than the existing state-of-the-art (SOTA) methods in terms of the micro AUC metric with the fewest number of parameters. Moreover, we found that even without training, simply using random projection without dimensionality reduction on skeleton data enables substantial anomaly detection capabilities.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
# Sparse Color-Code Net:エッジデバイス上でのリアルタイムRGBベースの6次元オブジェクトポーズ推定 Sparse Color-Code Net: Real-Time RGB-Based 6D Object Pose Estimation on Edge Devices ( http://arxiv.org/abs/2406.02977v1 ) ライセンス: Link先を確認	Xingjian Yang, Zhitao Yu, Ashis G. Banerjee,	(参考訳) ロボット工学や拡張現実のアプリケーションは、正確で効率的な6Dオブジェクトのポーズ推定にますます依存しているため、よりインタラクティブでレスポンシブなシステムでは、エッジデバイス上でのリアルタイムのパフォーマンスが要求される。提案するスパースカラーコードネット(SCCN)は,この要求に効果的に対応するために,明確かつ簡潔なパイプライン設計を具現化する。 SCCNはRGB画像中の対象オブジェクトに対して画素レベルの予測を行い、本質的なオブジェクト幾何学的特徴のスパース性を利用して、パースペクティブ-n-Point(PnP)計算プロセスを高速化する。さらに、新しいピクセルレベルの幾何学に基づくオブジェクト対称性表現を導入し、初期ポーズ予測とシームレスに統合し、対称オブジェクトの曖昧さに効果的に対処する。 SCCNは、NVIDIA Jetson AGX Xavier上で、ベンチマークLINEMODデータセットとOcclusion LINEMODデータセットで、それぞれ19フレーム/秒(FPS)と6FPSの推定レートを実現し、高い推定精度を一貫して維持する。 As robotics and augmented reality applications increasingly rely on precise and efficient 6D object pose estimation, real-time performance on edge devices is required for more interactive and responsive systems. Our proposed Sparse Color-Code Net (SCCN) embodies a clear and concise pipeline design to effectively address this requirement. SCCN performs pixel-level predictions on the target object in the RGB image, utilizing the sparsity of essential object geometry features to speed up the Perspective-n-Point (PnP) computation process. Additionally, it introduces a novel pixel-level geometry-based object symmetry representation that seamlessly integrates with the initial pose predictions, effectively addressing symmetric object ambiguities. SCCN notably achieves an estimation rate of 19 frames per second (FPS) and 6 FPS on the benchmark LINEMOD dataset and the Occlusion LINEMOD dataset, respectively, for an NVIDIA Jetson AGX Xavier, while consistently maintaining high estimation accuracy at these rates.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
# 自己監督型スケルトン行動表現学習 - ベンチマークとそれを超えるもの Self-Supervised Skeleton Action Representation Learning: A Benchmark and Beyond ( http://arxiv.org/abs/2406.02978v1 ) ライセンス: Link先を確認	Jiahang Zhang, Lilang Lin, Shuai Yang, Jiaying Liu,	(参考訳) ラベル付きデータから有意義な事前表現を学習することを目的とした自己教師付き学習(SSL)は,ラベル効率のよい骨格に基づく行動理解に有効であることが証明されている。画像領域と異なり、骨格データは背景の手がかりや追加の時間次元が無く、スペーサー空間構造と多様な表現形式を有する。本研究では,空間的時間的運動表現学習におけるプレテキスト・タスク・デザインの課題について述べる。近年、スケルトンベースのSSLに多くの取り組みがなされており、目覚ましい進歩を遂げている。しかし、体系的で徹底的なレビューは依然として欠落している。本稿では,自己教師型骨格に基づく行動表現学習に関する総合的な調査を初めて実施する。文脈に基づく、生成的学習、および対照的な学習アプローチの分類に続き、既存の研究の徹底的なレビューとベンチマークを行い、将来可能な方向性について光を当てる。本研究は,ほとんどのSSL作業が単一パラダイム,単一レベルの学習表現に依存していることを実証し,動作認識タスクのみを用いて評価し,スケルトン型SSLモデルの一般化能力について検討した。この目的のために、複数のプレテキストタスクを統合し、異なる粒度の多目的表現を共同で学習し、下流タスクの一般化能力を大幅に向上させる、新しいスケルトン用SSL法が提案されている。 3つの大規模データセットによる大規模な実験により,提案手法は,認識,検索,検出,少数ショット学習など,様々な下流タスクにおいて優れた一般化性能を達成できることを示した。 Self-supervised learning (SSL), which aims to learn meaningful prior representations from unlabeled data, has been proven effective for label-efficient skeleton-based action understanding. Different from the image domain, skeleton data possesses sparser spatial structures and diverse representation forms, with the absence of background clues and the additional temporal dimension. This presents the new challenges for the pretext task design of spatial-temporal motion representation learning. Recently, many endeavors have been made for skeleton-based SSL and remarkable progress has been achieved. However, a systematic and thorough review is still lacking. In this paper, we conduct, for the first time, a comprehensive survey on self-supervised skeleton-based action representation learning, where various literature is organized according to their pre-training pretext task methodologies. Following the taxonomy of context-based, generative learning, and contrastive learning approaches, we make a thorough review and benchmark of existing works and shed light on the future possible directions. 
Our investigation demonstrates that most SSL works rely on the single paradigm, learning representations of a single level, and are evaluated on the action recognition task solely, which leaves the generalization power of skeleton SSL models under-explored. To this end, a novel and effective SSL method for skeleton is further proposed, which integrates multiple pretext tasks to jointly learn versatile representations of different granularity, substantially boosting the generalization capacity for different downstream tasks. Extensive experiments under three large-scale datasets demonstrate that the proposed method achieves the superior generalization performance on various downstream tasks, including recognition, retrieval, detection, and few-shot learning.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
# 圧縮グラフニューラルネットワークによるオンラインサービスの効率的なユーザシーケンス学習 Efficient User Sequence Learning for Online Services via Compressed Graph Neural Networks ( http://arxiv.org/abs/2406.02979v1 ) ライセンス: Link先を確認	Yucheng Wu, Liyue Chen, Yu Cheng, Shuai Chen, Jinyu Xu, Leye Wang,	(参考訳) ユーザ行動シーケンスの学習は、オンライン不正取引検出機構など、さまざまなオンラインサービスにとって不可欠である。グラフニューラルネットワーク(GNN)は、モデルシーケンス関係に広く適用され、類似したシーケンスから情報を抽出している。ユーザ行動シーケンスのデータ量は、通常、オンラインアプリケーションでは巨大であるが、直接GNNモデルを適用すると、トレーニングと推論の段階でかなりの計算オーバーヘッドが発生し、オンラインサービスのリアルタイム要件を満たすことが困難になる。本稿では,グラフ圧縮技術を利用して効率問題を緩和する。具体的には、ユーザシーケンス表現学習のための関係モデリングにグラフ圧縮技術を導入するための、ECSeqと呼ばれる新しい統合フレームワークを提案する。 ECSeqの鍵となるモジュールはシーケンス関係モデリングであり、シーケンス表現学習を強化するためにシーケンス間の関係を探索し、グラフ圧縮アルゴリズムを用いて高い効率とスケーラビリティを実現する。 ECSeqはまた、プラグイン・アンド・プレイの特性を示し、修正することなく、シームレスにトレーニング済みのシーケンス表現モデルを拡張する。シーケンス分類と回帰タスクの両方に関する実証実験は、ECSeqの有効性を実証している。具体的には、合計100,000以上のシーケンスで数十秒の追加トレーニング時間と$10^{-4}$ seconds/sample以内に保たれた推論時間により、ECSeqは、広く使用されているLSTMの予測R@P$_{0.9}$を$\sim 5\%$改善する。 Learning representations of user behavior sequences is crucial for various online services, such as online fraudulent transaction detection mechanisms. Graph Neural Networks (GNNs) have been extensively applied to model sequence relationships, and extract information from similar sequences. While user behavior sequence data volume is usually huge for online applications, directly applying GNN models may lead to substantial computational overhead during both the training and inference stages and make it challenging to meet real-time requirements for online services. In this paper, we leverage graph compression techniques to alleviate the efficiency issue. Specifically, we propose a novel unified framework called ECSeq, to introduce graph compression techniques into relation modeling for user sequence representation learning. 
The key module of ECSeq is sequence relation modeling, which explores relationships among sequences to enhance sequence representation learning, and employs graph compression algorithms to achieve high efficiency and scalability. ECSeq also exhibits plug-and-play characteristics, seamlessly augmenting pre-trained sequence representation models without modifications. Empirical experiments on both sequence classification and regression tasks demonstrate the effectiveness of ECSeq. Specifically, with an additional training time of tens of seconds in total on 100,000+ sequences and inference time preserved within $10^{-4}$ seconds/sample, ECSeq improves the prediction R@P$_{0.9}$ of the widely used LSTM by $\sim 5\%$.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
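The R@P$_{0.9}$ figure quoted above is a recall-at-precision metric. The following is a minimal sketch of how such a number is commonly computed, not code from the ECSeq paper: the function name `recall_at_precision` and the convention of treating each prefix of the score-sorted list as one threshold (tied scores are not merged) are assumptions made for illustration.

```python
def recall_at_precision(scores, labels, min_precision=0.9):
    """Highest recall over all thresholds whose precision is >= min_precision.

    scores: predicted scores, higher means "more likely positive".
    labels: ground-truth labels, 1 for positive and 0 for negative.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    # Sort by descending score; every prefix of this ranking is one threshold.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best = 0.0
    true_pos = 0
    for k, (_, y) in enumerate(ranked, start=1):
        true_pos += y
        precision = true_pos / k
        recall = true_pos / total_pos
        if precision >= min_precision and recall > best:
            best = recall
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
r_at_p90 = recall_at_precision(scores, labels, min_precision=0.9)
```

In this toy ranking only the top two predictions keep precision at or above 0.9, so the metric reports the recall reached at that cut-off.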
# テンソルポリノミアル付加モデル Tensor Polynomial Additive Model ( http://arxiv.org/abs/2406.02980v1 ) ライセンス: Link先を確認	Yang Chen, Ce Zhu, Jiani Liu, Yipeng Liu,	(参考訳) 追加モデルは、その明快さと単純さのために解釈可能な機械学習に使用できる。しかし、高次データに対する古典的なモデルでは、ベクトル化演算がデータ構造を乱すため、精度が劣化し、計算複雑性が増大する可能性がある。これらの問題に対処するために,テンソル多項式加法モデル(TPAM)を提案する。テンソル表現を持つ高次入力の多次元構造情報を保持する。モデルパラメータ圧縮は階層的および低次対称テンソル近似を用いて達成される。このように、複雑な高次特徴相互作用を少ないパラメータで捉えることができる。さらに、TPAMは、加法モデルの固有の解釈可能性を保持し、透過的な意思決定と意味のある特徴値の抽出を容易にする。さらに、TPAMの透明性と高次機能を扱う能力を活用し、クラスアクティベーションマップ用の2つの変種を導入することで、他の解釈モデルの後処理モジュールとして使用される。一連のデータセットによる実験結果から,TPAMは精度を最大30%向上し,圧縮速度を最大5倍向上し,良好な解釈性を維持した。 Additive models can be used for interpretable machine learning for their clarity and simplicity. However, in the classical models for high-order data, the vectorization operation disrupts the data structure, which may lead to degraded accuracy and increased computational complexity. To deal with these problems, we propose the tensor polynomial additive model (TPAM). It retains the multidimensional structure information of high-order inputs with tensor representation. The model parameter compression is achieved using a hierarchical and low-order symmetric tensor approximation. In this way, complex high-order feature interactions can be captured with fewer parameters. Moreover, the TPAM preserves the inherent interpretability of additive models, facilitating transparent decision-making and the extraction of meaningful feature values. Additionally, leveraging TPAM's transparency and ability to handle higher-order features, it is used as a post-processing module for other interpretation models by introducing two variants for class activation maps. Experimental results on a series of datasets demonstrate that TPAM can enhance accuracy by up to 30\%, and compression rate by up to 5 times, while maintaining a good interpretability.	翻訳日:2024-06-06 19:49:25 公開日:2024-06-05
# 局所対グローバル解釈可能性:計算複雑性の観点から Local vs. Global Interpretability: A Computational Complexity Perspective ( http://arxiv.org/abs/2406.02981v1 ) ライセンス: Link先を確認	Shahaf Bassan, Guy Amir, Guy Katz,	(参考訳) 近年,様々なMLモデルの局所的およびグローバル的解釈可能性の研究が盛んに行われている。しかし、この分野でかなりの進歩があったにもかかわらず、多くの既知の結果は非公式のままであり、あるいは十分な数学的厳密さが欠如している。本稿では,計算複雑性理論を用いて,MLモデルの局所的および大域的視点を評価することにより,このギャップを埋める枠組みを提案する。まず,1)局所的な説明形式とグローバルな説明形式との二重性,(2)ある種のグローバルな説明形式の本質的な特異性という,分析に不可欠な2つの新しい洞察の証明を提案する。次に、(1)線形モデル、(2)決定木、(3)ニューラルネットワークの3つのモデルタイプにまたがって、計算説明の複雑さを評価する。これらのモデルの局所的およびグローバル的解釈可能性に関する知見を提供する。例えば、P != NP のような標準的な複雑性仮定の下では、線形モデルにおける大域的十分部分集合の選択は局所部分集合の選択よりも計算的に困難であることを示す。興味深いことに、ニューラルネットワークと決定木では、その逆が当てはまります。我々は,計算複雑性レンズによる説明可能性の検証が,MLモデル固有の解釈可能性をより厳密に把握する上で有効であることを示す。 The local and global interpretability of various ML models has been studied extensively in recent years. However, despite significant progress in the field, many known results remain informal or lack sufficient mathematical rigor. We propose a framework for bridging this gap, by using computational complexity theory to assess local and global perspectives of interpreting ML models. We begin by proposing proofs for two novel insights that are essential for our analysis: (1) a duality between local and global forms of explanations; and (2) the inherent uniqueness of certain global explanation forms. We then use these insights to evaluate the complexity of computing explanations, across three model types representing the extremes of the interpretability spectrum: (1) linear models; (2) decision trees; and (3) neural networks. Our findings offer insights into both the local and global interpretability of these models. For instance, under standard complexity assumptions such as P != NP, we prove that selecting global sufficient subsets in linear models is computationally harder than selecting local subsets. Interestingly, with neural networks and decision trees, the opposite is true: it is harder to carry out this task locally than globally. 
We believe that our findings demonstrate how examining explainability through a computational complexity lens can help us develop a more rigorous grasp of the inherent interpretability of ML models.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
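To make the notion of a local sufficient subset concrete, here is a small sketch for a linear classifier. It is an illustration under stated assumptions rather than the paper's formal definition: features are assumed to lie in $[0, 1]$, the prediction is positive when $w \cdot x + b > 0$, and the helper name `is_locally_sufficient` is hypothetical. A subset counts as sufficient for an input if fixing those features to the input's values forces the same prediction for every completion of the remaining features.

```python
def is_locally_sufficient(w, b, x, subset):
    """Check whether fixing x on `subset` forces the prediction of a linear model.

    Assumes every feature ranges over [0, 1]; the model predicts "positive"
    when w . x + b > 0 and "negative" otherwise.
    """
    score = b + sum(wi * xi for wi, xi in zip(w, x))
    fixed = b + sum(w[i] * x[i] for i in subset)
    free_weights = [w[i] for i in range(len(w)) if i not in subset]
    if score > 0:
        # Worst case for a positive prediction: free features push the score down.
        worst = fixed + sum(min(0.0, wi) for wi in free_weights)
        return worst > 0
    # Worst case for a negative prediction: free features push the score up.
    best = fixed + sum(max(0.0, wi) for wi in free_weights)
    return best <= 0


w = [2.0, -1.0, 0.5]
b = -0.5
x = [1.0, 0.0, 1.0]
only_first = is_locally_sufficient(w, b, x, {0})
only_last = is_locally_sufficient(w, b, x, {2})
```

Because a linear score attains its extremes at the endpoints of each feature's range, this single-input check needs only one pass over the weights.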
# FREA:適合性のある安全批判シナリオの実現可能性 FREA: Feasibility-Guided Generation of Safety-Critical Scenarios with Reasonable Adversariality ( http://arxiv.org/abs/2406.02983v1 ) ライセンス: Link先を確認	Keyu Chen, Yuheng Lei, Hao Cheng, Haoran Wu, Wenchao Sun, Sifa Zheng,	(参考訳) 安全クリティカルシナリオの生成は、大規模に収集することが不可欠だが、自律走行車(AV)の堅牢性を評価する効果的な方法を提供する。既存の手法は、シナリオの自然性を維持しながら、データ駆動アプローチによるバランスを達成することを目的として、逆境の最適化に重点を置いている。しかし、逆境の適切な上限がなければ、シナリオは過剰な逆境を示し、避けられない衝突を引き起こす可能性がある。本稿では,AVの最大の実現可能な領域(LFR)を組み込んだ新たな安全クリティカルシナリオ生成手法であるFREAを紹介する。具体的には、FREAは最初、オフラインデータセットからAVのLFRをプリ計算する。その後、シーン内の重要な背景車両(CBV)を制御し、新しい実現可能性依存の目的関数を最大化することにより、敵対的かつAV可能なシナリオを生成する合理的な敵政策を学習する。広範囲にわたる実験は、FREAが安全クリティカルなシナリオを効果的に生成し、AVの実現性を確保しながら、かなりの近距離事象を発生させることを示した。一般化分析は、様々な代理AV法および交通環境におけるAV試験におけるFREAの堅牢性も確認する。 Generating safety-critical scenarios, which are essential yet difficult to collect at scale, offers an effective method to evaluate the robustness of autonomous vehicles (AVs). Existing methods focus on optimizing adversariality while preserving the naturalness of scenarios, aiming to achieve a balance through data-driven approaches. However, without an appropriate upper bound for adversariality, the scenarios might exhibit excessive adversariality, potentially leading to unavoidable collisions. In this paper, we introduce FREA, a novel safety-critical scenarios generation method that incorporates the Largest Feasible Region (LFR) of AV as guidance to ensure the reasonableness of the adversarial scenarios. Concretely, FREA initially pre-calculates the LFR of AV from offline datasets. Subsequently, it learns a reasonable adversarial policy that controls critical background vehicles (CBVs) in the scene to generate adversarial yet AV-feasible scenarios by maximizing a novel feasibility-dependent objective function. Extensive experiments illustrate that FREA can effectively generate safety-critical scenarios, yielding considerable near-miss events while ensuring AV's feasibility. 
Generalization analysis also confirms the robustness of FREA in AV testing across various surrogate AV methods and traffic environments.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
# 視覚表現強化のためのマルチインスタンス・ビジュアル・プロンプト・ジェネレータによる多モード大言語モデルの強化 Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment ( http://arxiv.org/abs/2406.02987v1 ) ライセンス: Link先を確認	Wenliang Zhong, Wenyi Wu, Qi Li, Rob Barton, Boxin Du, Shioulin Sam, Karim Bouyarmane, Ismail Tutar, Junzhou Huang,	(参考訳) MLLM(Multimodal Large Language Models)は、様々な視覚言語タスクにおいて、視覚的表現をLLMと融合させることで、SOTAのパフォーマンスを達成している。本稿では,Q-formerのようなクエリベースのトランスフォーマーを用いたアダプタが,インスタンスの不均一性/相関を考慮せずに,簡易なマルチインスタンス学習手法であることを最初に確認する。次に、画像とパッチのインスタンス相関を利用して、リッチな視覚表現をLLMに組み込むMIVPG(Multi-instance Visual Prompt Generator)を提案する。異なるシナリオからの3つのパブリックビジョン言語(VL)データセットの定量評価は、提案したMIVPGがメインのVLタスクにおいてQ-formerを改善することを示す。 Multimodal Large Language Models (MLLMs) have achieved SOTA performance in various visual language tasks by fusing the visual representations with LLMs leveraging some visual adapters. In this paper, we first establish that adapters using query-based Transformers such as Q-former are a simplified Multi-instance Learning method without considering instance heterogeneity/correlation. We then propose a general component termed Multi-instance Visual Prompt Generator (MIVPG) to incorporate enriched visual representations into LLMs by taking advantage of instance correlation between images or patches for the same sample. Quantitative evaluation on three public vision-language (VL) datasets from different scenarios shows that the proposed MIVPG improves Q-former in main VL tasks.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
# Egocentric Video と Automated Annotation Strategy を用いた意味的トラバータビリティの学習 Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy ( http://arxiv.org/abs/2406.02989v1 ) ライセンス: Link先を確認	Yunho Kim, Jeong Hyun Lee, Choongin Lee, Juhyeok Mun, Donghoon Youm, Jeongsoo Park, Jemin Hwangbo,	(参考訳) 都市環境における信頼性の高い自律型ロボットナビゲーションには、シーンのセマンティック理解に基づいて、画像内のセマンティック・トラバース可能な地形を識別する能力が必要である。この推論能力はセマンティックトラバーサビリティに基づいており、テストドメイン上で微調整されたセマンティックセグメンテーションモデルを使用して頻繁に達成される。この微調整プロセスでは、ターゲットとなるロボットによる手動のデータ収集や、非常に高額でスケールしない人間ラベル作成者によるアノテーションが伴うことが多い。本研究では,エゴセントリックなビデオと自動アノテーションプロセスを用いて,セマンティック・トラバーサビリティ推定器をトレーニングするための効果的な手法を提案する。エゴセントリックなビデオは、歩行者の胸に装着されたカメラから収集される。次に、画像セグメンテーションにおける最近の基礎モデルとプロンプト技術を用いて、各ビデオフレームのセマンティックトラバーサビリティ領域を抽出し、セマンティックトラバーサビリティ推定器を訓練するためのデータセットを自動生成する。様々な都市シナリオを網羅した複数の国や都市で撮影されたビデオによる大規模な実験により,提案手法のスケーラビリティと一般化性を実証した。さらに、自律型ロボットナビゲーションの性能解析と実世界展開は、訓練されたセマンティック・トラバーサビリティ推定器が高度に正確であり、多様なカメラ視点を扱え、計算的に軽量で、実世界に適用できることを示した。要約ビデオは https://youtu.be/EUVoH-wA-lA で公開されている。 For reliable autonomous robot navigation in urban settings, the robot must have the ability to identify semantically traversable terrains in the image based on the semantic understanding of the scene. This reasoning ability is based on semantic traversability, which is frequently achieved using semantic segmentation models fine-tuned on the testing domain. This fine-tuning process often involves manual data collection with the target robot and annotation by human labelers, which is prohibitively expensive and unscalable. In this work, we present an effective methodology for training a semantic traversability estimator using egocentric videos and an automated annotation process. Egocentric videos are collected from a camera mounted on a pedestrian's chest. 
The dataset for training the semantic traversability estimator is then automatically generated by extracting semantically traversable regions in each video frame using a recent foundation model in image segmentation and its prompting technique. Extensive experiments with videos taken across several countries and cities, covering diverse urban scenarios, demonstrate the high scalability and generalizability of the proposed annotation method. Furthermore, performance analysis and real-world deployment for autonomous robot navigation showcase that the trained semantic traversability estimator is highly accurate, able to handle diverse camera viewpoints, computationally light, and real-world applicable. The summary video is available at https://youtu.be/EUVoH-wA-lA.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
# バイオメディカル・言語知識を用いた全スライド画像からの遺伝的変異の予測 Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification ( http://arxiv.org/abs/2406.02990v1 ) ライセンス: Link先を確認	Gexin Huang, Chenfei Wu, Mingjie Li, Xiaojun Chang, Ling Chen, Ying Sun, Shen Zhao, Xiaodan Liang, Liang Lin,	(参考訳) スライド画像全体から遺伝子変異を予測することは、がんの診断には不可欠である。しかし、既存の作業トレーニング複数のバイナリ分類モデルは、以下の2つの課題に直面している。 (a)複数のバイナリ分類器の訓練は非効率であり、必然的にクラス不均衡の問題を引き起こす。 b) 遺伝子間の生物学的関係は見過ごされ, 予測性能が制限される。これらの課題に対処するために、遺伝子変異予測性能を改善するために、生物知識を改良したPathGenomic Multi-label Transformerを革新的に設計する。 BPGTは、まず2つの慎重に設計されたモジュールによって遺伝子前駆体を構成する新しい遺伝子エンコーダを確立する。 a) ノードの特徴を有する遺伝子グラフは、遺伝子の言語学的記述と癌表現型であり、そのエッジは遺伝子の経路関連と突然変異の相同性によってモデル化されている。 b)トランスフォーマーに基づくグラフ表現学習により、言語的および生医学的知識を遺伝子優先に融合させ、異なる遺伝子の突然変異間の本質的な関係を捉える知識関連モジュール。 BPGTはそれからラベルデコーダを設計し、最終的に2つの調整されたモジュールによる遺伝的突然変異予測を行う。 a) まず、WSIの臨界領域に遺伝子前駆体を融合させ、遺伝子ワイドな突然変異ロジットを得るモダリティ融合モジュール。 b) 識別能力を高めるため,変異状態の固有比較を強調した比較多ラベル損失について検討した。 The Cancer Genome Atlasベンチマークの十分な実験は、BPGTが最先端の技術を上回ることを示した。 Predicting genetic mutations from whole slide images is indispensable for cancer diagnosis. However, existing work training multiple binary classification models faces two challenges: (a) Training multiple binary classifiers is inefficient and would inevitably lead to a class imbalance problem. (b) The biological relationships among genes are overlooked, which limits the prediction performance. To tackle these challenges, we innovatively design a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances. BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules: (a) A gene graph whose node features are the genes' linguistic descriptions and the cancer phenotype, with edges modeled by genes' pathway associations and mutation consistencies. 
(b) A knowledge association module that fuses linguistic and biomedical knowledge into gene priors by transformer-based graph representation learning, capturing the intrinsic relationships between different genes' mutations. BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules: (a) A modality fusion module that firstly fuses the gene priors with critical regions in WSIs and obtains gene-wise mutation logits. (b) A comparative multi-label loss that emphasizes the inherent comparisons among mutation status to enhance the discrimination capabilities. Sufficient experiments on The Cancer Genome Atlas benchmark demonstrate that BPGT outperforms the state-of-the-art.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
# 360度映像要約法のトレーニングと評価のための人間アノテーション付きビデオデータセット A Human-Annotated Video Dataset for Training and Evaluation of 360-Degree Video Summarization Methods ( http://arxiv.org/abs/2406.02991v1 ) ライセンス: Link先を確認	Ioannis Kontostathis, Evlampios Apostolidis, Vasileios Mezaris,	(参考訳) 本稿では,テレビやスマートフォンなどの従来のデバイスで使用可能な,360度映像コンテンツから2D映像要約への変換という,360度映像要約のための新しいデータセットを提案する。データセットには、トレーニングや360度ビデオ要約手法の客観的評価に使用可能な、地平の人間生成サマリーが含まれている。このデータセットを用いて、2次元ビデオ要約のために提案された2つの最先端要約手法を訓練・評価し、360度ビデオに特化された要約法と将来の比較のためのベースラインとして機能する。最後に,データアノテーションプロセスを容易にするために開発され,ビデオフラグメント選択に依存する他のアノテーション活動を支援するインタラクティブツールを提案する。 In this paper we introduce a new dataset for 360-degree video summarization: the transformation of 360-degree video content to concise 2D-video summaries that can be consumed via traditional devices, such as TV sets and smartphones. The dataset includes ground-truth human-generated summaries, that can be used for training and objectively evaluating 360-degree video summarization methods. Using this dataset, we train and assess two state-of-the-art summarization methods that were originally proposed for 2D-video summarization, to serve as a baseline for future comparisons with summarization methods that are specifically tailored to 360-degree video. Finally, we present an interactive tool that was developed to facilitate the data annotation process and can assist other annotation activities that rely on video fragment selection.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
# マルチタスク最適化のためのタスク優先度の定量化 Quantifying Task Priority for Multi-Task Optimization ( http://arxiv.org/abs/2406.02996v1 ) ライセンス: Link先を確認	Wooseong Jeong, Kuk-Jin Yoon,	(参考訳) マルチタスク学習の目標は、単一の統合ネットワーク内で多様なタスクを学習することである。それぞれのタスクには独自の客観的機能があるため、トレーニング中に対立が生じ、結果として負の移動が発生する。以前の研究では、タスク間の共有パラメータにおけるこれらの矛盾する勾配を特定し、それらを同じ方向に認識しようとした。しかし,これらの最適化手法は,各パラメータの個々の寄与を正確に決定できないため,最適でないパレート解に導かれることが証明されている。本稿では,タスク間のパラメータ寄与を評価するタスク優先度の概念を提案する。タスクプライオリティを学習するために、バックプロパゲーション中のタスク固有の損失に影響されたパラメータ間のリンクに関連するコネクションの種類を同定する。接続の強さは、タスク優先度を決定するためにパラメータの大きさによって測定される。そこで本研究では,2段階からなるマルチタスク学習のための接続強度に基づく最適化手法を提案する。第1フェーズは、ネットワーク内でタスク優先度を学習し、第2フェーズは、この優先度を維持しながら勾配を変更する。これは最終的に、複数のタスクに対する新しいPareto最適解を見つけるのに繋がる。実験により,従来の勾配操作法と比較してマルチタスク性能が大幅に向上したことを示す。 The goal of multi-task learning is to learn diverse tasks within a single unified network. As each task has its own unique objective function, conflicts emerge during training, resulting in negative transfer among them. Earlier research identified these conflicting gradients in shared parameters between tasks and attempted to realign them in the same direction. However, we prove that such optimization strategies lead to sub-optimal Pareto solutions due to their inability to accurately determine the individual contributions of each parameter across various tasks. In this paper, we propose the concept of task priority to evaluate parameter contributions across different tasks. To learn task priority, we identify the type of connections related to links between parameters influenced by task-specific losses during backpropagation. The strength of connections is gauged by the magnitude of parameters to determine task priority. Based on these, we present a new method named connection strength-based optimization for multi-task learning which consists of two phases. The first phase learns the task priority within the network, while the second phase modifies the gradients while upholding this priority. 
This ultimately leads to finding new Pareto optimal solutions for multiple tasks. Through extensive experiments, we show that our approach greatly enhances multi-task performance in comparison to earlier gradient manipulation methods.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
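The earlier gradient-manipulation baselines mentioned above, which detect conflicting task gradients on shared parameters and realign them, can be illustrated with a projection step in the style of PCGrad. This sketch is not the connection strength-based method proposed in the paper; the helper names `dot` and `project_conflict` and the two-task toy gradients are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_conflict(g_i, g_j):
    """Return g_i with its conflicting component along g_j removed.

    Two task gradients are treated as conflicting when their inner product
    is negative; otherwise g_i is returned unchanged.
    """
    d = dot(g_i, g_j)
    if d >= 0:
        return list(g_i)
    scale = d / dot(g_j, g_j)
    return [a - scale * b for a, b in zip(g_i, g_j)]


g_task1 = [1.0, 0.0]
g_task2 = [-1.0, 1.0]
g_task1_fixed = project_conflict(g_task1, g_task2)
```

After the projection the adjusted gradient is orthogonal to the other task's gradient, so a step along it no longer directly opposes that task. The abstract argues that this kind of realignment alone can still yield sub-optimal Pareto solutions.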
# 残留接続と正規化は、GNNの過度なスムース化を確実に防ぐことができる Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs ( http://arxiv.org/abs/2406.02997v1 ) ライセンス: Link先を確認	Michael Scholkemper, Xinyi Wu, Ali Jadbabaie, Michael Schaub,	(参考訳) 残差接続と正規化層はグラフニューラルネットワーク(GNN)の標準設計選択となり、GNNにおける過平滑化(oversmoothing)問題を軽減するソリューションとして提案されている。しかし、これらの手法が理論的な観点から過平滑化問題を緩和するのにどのように役立つかはよく分かっていない。本研究では,残差接続層と正規化層を有する(線形化)GNNの形式的,正確な特徴付けを行う。私たちはそれを確立します (a) 残差接続の場合、各層に初期特徴を組み込むことで、信号がスムーズになるのを防ぎ、可能ノード表現のサブ空間を決定する。 (b) バッチ正規化は、特徴行列の各列の個別再スケーリングによって出力埋め込み空間が1次元部分空間に完全に崩壊することを防ぐ。これにより、ノード表現がメッセージパッシング演算子の上位$k$固有空間に収束する。さらに, プロジェクションとして理解可能な正規化層の中心となるステップが, 関連情報が抽出しにくくなるように, メッセージパッシングにおいてグラフ信号を変化させることが示される。そこで我々は、グラフNormv2と呼ばれる新しい正規化層を導入し、中心となるステップを学習し、元のグラフ信号を望ましくない方法で歪ませないようにした。実験の結果,本手法の有効性が確認された。 Residual connections and normalization layers have become standard design choices for graph neural networks (GNNs), and were proposed as solutions to mitigate the oversmoothing problem in GNNs. However, how exactly these methods help alleviate the oversmoothing problem from a theoretical perspective is not well understood. In this work, we provide a formal and precise characterization of (linearized) GNNs with residual connections and normalization layers. We establish that (a) for residual connections, the incorporation of the initial features at each layer can prevent the signal from becoming too smooth, and determines the subspace of possible node representations; (b) batch normalization prevents a complete collapse of the output embedding space to a one-dimensional subspace through the individual rescaling of each column of the feature matrix. This results in the convergence of node representations to the top-$k$ eigenspace of the message-passing operator; (c) moreover, we show that the centering step of a normalization layer -- which can be understood as a projection -- alters the graph signal in message-passing in such a way that relevant information can become harder to extract. 
We therefore introduce a novel, principled normalization layer called GraphNormv2 in which the centering step is learned such that it does not distort the original graph signal in an undesirable way. Experimental results confirm the effectiveness of our method.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
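A toy sketch of the oversmoothing effect described in the abstract above, and of how re-injecting the initial features at every layer counteracts it. The 4-node path graph, the mean-aggregation operator, and the mixing weight `alpha` are illustrative assumptions and do not come from the paper; this is not the authors' GraphNormv2 layer.

```python
def propagate(features, neighbors, layers, alpha=0.0):
    """Run `layers` rounds of mean aggregation over `neighbors`.

    alpha = 0.0 is plain message passing; alpha > 0 mixes the initial
    features back in at every layer (an initial-residual connection).
    """
    initial = list(features)
    current = list(features)
    for _ in range(layers):
        # Each node averages the current values of its neighbors.
        aggregated = [
            sum(current[j] for j in neighbors[i]) / len(neighbors[i])
            for i in range(len(current))
        ]
        # Blend the aggregated signal with the layer-0 features.
        current = [
            (1.0 - alpha) * aggregated[i] + alpha * initial[i]
            for i in range(len(current))
        ]
    return current


def spread(values):
    """Gap between the largest and smallest node value."""
    return max(values) - min(values)


# Hypothetical toy graph: path 0-1-2-3 with self-loops.
NEIGHBORS = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
X0 = [1.0, 0.0, 0.0, -1.0]

plain = propagate(X0, NEIGHBORS, layers=200)
residual = propagate(X0, NEIGHBORS, layers=200, alpha=0.2)
```

In this toy setting `spread(plain)` collapses towards zero (all nodes become indistinguishable), while `spread(residual)` stays well away from zero.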
# 性能保証を用いたリスク回避型PMDPの簡易化 Simplification of Risk Averse POMDPs with Performance Guarantees ( http://arxiv.org/abs/2406.03000v1 ) ライセンス: Link先を確認	Yaacov Pariente, Vadim Indelman,	(参考訳) 部分的に観測可能な領域における不確実性の下でのリスク回避意思決定は、AIの基本的問題であり、信頼性の高い自律エージェントにとって不可欠である。この場合、値関数がリターンの条件値(CVaR)である場合、問題は部分的に観測可能なマルコフ決定プロセス(POMDP)を用いてモデル化される。 POMDPの最適解を計算することは、一般に計算的に計算可能である。本研究では,性能保証を提供しながら,値関数の評価を高速化する簡易化フレームワークを開発する。計算的に安価な信念-MDP遷移モデルを単純化し、例えば、より安価な観測モデルや遷移モデルに対応できると考えている。我々の貢献は、確率変数 Y を用いて確率変数 X の CVaR の有界化を可能にする CVaR の一般境界を含む。次に,POMDP設定におけるCVaR値関数のバウンダリを導出し,計算コストの低いMDP遷移モデルを用いて,計算コストのかかるモデルにリアルタイムでアクセスすることなく,値関数をバウンダリする方法を示す。次に,推定値に対する理論的性能保証を行う。本研究は,信念-MDP遷移モデルの一般化と,観測モデルと状態遷移モデルの両方を同時に簡易化するためのものである。 Risk averse decision making under uncertainty in partially observable domains is a fundamental problem in AI and essential for reliable autonomous agents. In our case, the problem is modeled using partially observable Markov decision processes (POMDPs), when the value function is the conditional value at risk (CVaR) of the return. Calculating an optimal solution for POMDPs is computationally intractable in general. In this work we develop a simplification framework to speedup the evaluation of the value function, while providing performance guarantees. We consider as simplification a computationally cheaper belief-MDP transition model, that can correspond, e.g., to cheaper observation or transition models. Our contributions include general bounds for CVaR that allow bounding the CVaR of a random variable X, using a random variable Y, by assuming bounds between their cumulative distributions. We then derive bounds for the CVaR value function in a POMDP setting, and show how to bound the value function using the computationally cheaper belief-MDP transition model and without accessing the computationally expensive model in real-time. Then, we provide theoretical performance guarantees for the estimated bounds. 
Our results apply to a general simplification of a belief-MDP transition model and support simplification of both the observation and state transition models simultaneously.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
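To make the CVaR objective in the abstract above concrete, here is a minimal sketch of an empirical CVaR estimate. It assumes the common lower-tail convention for returns (the mean of the worst `alpha` fraction of samples); the function name, the convention, and the sample values are illustrative and are not taken from the paper.

```python
import math


def empirical_cvar(returns, alpha):
    """Mean of the worst ceil(alpha * n) returns (lower-tail convention)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(returns)  # worst (smallest) returns first
    tail_size = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)


# Hypothetical sampled returns of one policy.
sampled_returns = [10.0, 8.0, 9.0, -20.0, 7.0, 11.0, -5.0, 9.5, 10.5, 8.5]

risk_neutral_value = sum(sampled_returns) / len(sampled_returns)
risk_averse_value = empirical_cvar(sampled_returns, alpha=0.2)
```

With `alpha=0.2` only the two worst outcomes enter the estimate, so the risk-averse value is far below the plain mean; with `alpha=1.0` the two coincide.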
# EdgeSync: ビデオデータドリフトのための適応型継続的学習によるより高速なエッジモデル更新 EdgeSync: Faster Edge-model Updating via Adaptive Continuous Learning for Video Data Drift ( http://arxiv.org/abs/2406.03001v1 ) ライセンス: Link先を確認	Peng Zhao, Runchu Dong, Guiqin Wang, Cong Zhao,	(参考訳) リアルタイムビデオ分析システムは一般的に、レイテンシを低減するためにエッジデバイスに重みを減らしたモデルを配置する。映像コンテンツの特徴の分布は、様々な理由(光と天気の変化)によって変化し、既存のモデルの精度が低下し、この問題を解決するために、最近の研究は、遠隔サーバを用いて複雑なモデルの助けを借りて、エッジでの軽量モデルを継続的に訓練・適応するフレームワークを提案する。しかし、既存の分析アプローチでは、2つの課題が未解決のまま残されている: 第一に、再トレーニングタスクは計算集約的であり、大きなモデル更新遅延が発生する;第二に、新しいモデルは現在のビデオストリームのデータ配信に十分適合しないかもしれない。これらの課題に対処するため、EdgeSyncでは、タイムラインと推論結果の両方を考慮してサンプルをフィルタリングし、現在のビデオコンテンツとより関連性の高いトレーニングサンプルを作成し、更新遅延を低減し、トレーニングの質を向上させるとともに、モデルトレーニング時間と実行時のトレーニング順序を効率的に調整可能なトレーニング管理モジュールも設計する。複雑なシーンで実際のデータセットを評価することで、従来の手法に比べて約3.4%改善し、従来の手法に比べて約10%改善した。 Real-time video analytics systems typically place models with fewer weights on edge devices to reduce latency. The distribution of video content features may change over time for various reasons (e.g., changes in light and weather), leading to accuracy degradation of existing models. To solve this problem, recent work proposes a framework that uses a remote server to continually train and adapt the lightweight model at the edge with the help of a complex model. However, existing analytics approaches leave two challenges untouched: first, the retraining task is compute-intensive, resulting in large model update delays; second, the new model may not fit well enough with the data distribution of the current video stream. To address these challenges, in this paper, we present EdgeSync. EdgeSync filters the samples by considering both timeliness and inference results to make training samples more relevant to the current video content and to reduce the update delay. To improve the quality of training, EdgeSync also designs a training management module that can efficiently adjust the model training time and training order at runtime.
By evaluating real datasets with complex scenes, our method improves about 3.4% compared to existing methods and about 10% compared to traditional means.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
# Phy-Diff:拡散MRI合成のための物理誘導フールグラス拡散モデル Phy-Diff: Physics-guided Hourglass Diffusion Model for Diffusion MRI Synthesis ( http://arxiv.org/abs/2406.03002v1 ) ライセンス: Link先を確認	Juanhua Zhang, Ruodan Yan, Alessandro Perelli, Xi Chen, Chao Li,	(参考訳) 拡散MRI(dMRI)は,取得コストの高い重要な神経画像撮影技術である。深層学習のアプローチは、dMRIの強化や、アンダーサンプルdMRIによる拡散バイオマーカーの予測に用いられている。より包括的な生のdMRIを生成するために,b-値とb-ベクトルを条件として含む生成的敵ネットワークに基づく手法が提案されているが,それらは不安定なトレーニングと望ましい多様性の欠如によって制限されている。新興拡散モデル(DM)は、生成性能を改善することを約束する。しかし、DMの条件付けに欠かせない情報、すなわちdMRIとホワイトマタートラクトの構造の物理原理を含めることは依然として困難である。本研究では,高画質のdMRIを生成する物理誘導拡散モデルを提案する。本モデルは拡散過程におけるノイズ進化におけるdMRIの物理原理を導入し,拡散モデル内にクエリに基づく条件付きマッピングを導入する。また,XTRACTアトラスを,アダプター技術を用いて,白質トラスの前駆体として導入した。以上の結果から,本手法は他の最先端手法よりも優れ,dMRI向上の可能性が示唆された。 Diffusion MRI (dMRI) is an important neuroimaging technique with high acquisition costs. Deep learning approaches have been used to enhance dMRI and predict diffusion biomarkers through undersampled dMRI. To generate more comprehensive raw dMRI, generative adversarial network based methods are proposed to include b-values and b-vectors as conditions, but they are limited by unstable training and less desirable diversity. The emerging diffusion model (DM) promises to improve generative performance. However, it remains challenging to include essential information in conditioning DM for more relevant generation, i.e., the physical principles of dMRI and white matter tract structures. In this study, we propose a physics-guided diffusion model to generate high-quality dMRI. Our model introduces the physical principles of dMRI in the noise evolution in the diffusion process and introduces a query-based conditional mapping within the diffusion model. In addition, to enhance the anatomical fine details of the generation, we introduce the XTRACT atlas as a prior of white matter tracts by adopting an adapter technique. Our experiment results show that our method outperforms other state-of-the-art methods and has the potential to advance dMRI enhancement.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
# マルチモーダル感情分析におけるデータ整合性の評価 Evaluation of data inconsistency for multi-modal sentiment analysis ( http://arxiv.org/abs/2406.03004v1 ) ライセンス: Link先を確認	Yufei Wang, Mengyue Wu,	(参考訳) 感情意味の不整合は、マルチモーダル感情分析(MSA)におけるユビキタスな課題である。 MSAは、テキスト、オーディオ、ビデオなど、さまざまなモードで表現される感情を分析する。それぞれのモダリティは、人間の微妙でニュアンスな表現のために、感情の異なる側面を伝達し、不整合を招き、人工エージェントの予測を妨げる可能性がある。本研究では,従来のマルチモーダル感情分析モデルとマルチモーダル大言語モデル(MLLM)の性能評価を行う。本研究は,従来のモデルにおいて,意味的に矛盾するデータに直面する場合と,マルチモーダル感情分析におけるMLLMの欠点を指摘するものである。本研究は、新たな課題を提示し、感情分析システムの今後の発展に有用な洞察を提供する。 Emotion semantic inconsistency is a ubiquitous challenge in multi-modal sentiment analysis (MSA). MSA involves analyzing sentiment expressed across various modalities like text, audio, and videos. Each modality may convey distinct aspects of sentiment, due to subtle and nuanced expression of human beings, leading to inconsistency, which may hinder the prediction of artificial agents. In this work, we introduce a modality conflicting test set and assess the performance of both traditional multi-modal sentiment analysis models and multi-modal large language models (MLLMs). Our findings reveal significant performance degradation across traditional models when confronted with semantically conflicting data and point out the drawbacks of MLLMs when handling multi-modal emotion analysis. Our research presents a new challenge and offers valuable insights for the future development of sentiment analysis systems.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
# 有限サム最適化のための量子アルゴリズムと下界 Quantum Algorithms and Lower Bounds for Finite-Sum Optimization ( http://arxiv.org/abs/2406.03006v1 ) ライセンス: Link先を確認	Yexin Zhang, Chenyi Zhang, Cong Fang, Liwei Wang, Tongyang Li,	(参考訳) 有限サム最適化は機械学習に広く応用されており、サポートベクタマシンや回帰などの重要な問題をカバーしている。本稿では,量子コンピューティングによる有限サム最適化問題の解法について検討する。具体的には、$f_1,\ldots,f_n\colon\mathbb{R}^d\to\mathbb{R}$ be $\ell$-smooth convex function and $\psi\colon\mathbb{R}^d\to\mathbb{R}$ be $\mu$-strongly convex proximal functionとする。目標は、$F(\mathbf{x})=\frac{1}{n}\sum_{i=1}^n f_i(\mathbf{x})+\psi(\mathbf{x})$に対する$\epsilon$-最適化点を見つけることである。複雑性を持つ量子アルゴリズムに$\tilde{O}\big(n+\sqrt{d}+\sqrt{\ell/\mu}\big(n^{1/3}d^{1/3}+n^{-2/3}d^{5/6}\big)\big)$を与え、古典的強結合$\tilde{\Theta}\big(n+\sqrt{n\ell/\mu}\big)$を改善する。また、$d$ が十分大きいとき、量子下界 $\tilde{\Omega}(n+n^{3/4}(\ell/\mu)^{1/4})$ も証明する。我々の量子上界と下界はともに、$\psi$ が必ずしも強凸でない場合や、それぞれの$f_i$ がリプシッツであるが必ずしも滑らかでない場合にまで拡張できる。さらに、F$が非凸であるとき、我々の量子アルゴリズムは$\tilde{O}(n+\ell(d^{1/3}n^{1/3}+\sqrt{d})/\epsilon^2)$クエリを使って$\epsilon$-critial pointを見つけることができる。 Finite-sum optimization has wide applications in machine learning, covering important problems such as support vector machines, regression, etc. In this paper, we initiate the study of solving finite-sum optimization problems by quantum computing. Specifically, let $f_1,\ldots,f_n\colon\mathbb{R}^d\to\mathbb{R}$ be $\ell$-smooth convex functions and $\psi\colon\mathbb{R}^d\to\mathbb{R}$ be a $\mu$-strongly convex proximal function. The goal is to find an $\epsilon$-optimal point for $F(\mathbf{x})=\frac{1}{n}\sum_{i=1}^n f_i(\mathbf{x})+\psi(\mathbf{x})$. We give a quantum algorithm with complexity $\tilde{O}\big(n+\sqrt{d}+\sqrt{\ell/\mu}\big(n^{1/3}d^{1/3}+n^{-2/3}d^{5/6}\big)\big)$, improving the classical tight bound $\tilde{\Theta}\big(n+\sqrt{n\ell/\mu}\big)$. We also prove a quantum lower bound $\tilde{\Omega}(n+n^{3/4}(\ell/\mu)^{1/4})$ when $d$ is large enough. 
Both our quantum upper and lower bounds can extend to the cases where $\psi$ is not necessarily strongly convex, or each $f_i$ is Lipschitz but not necessarily smooth. In addition, when $F$ is nonconvex, our quantum algorithm can find an $\epsilon$-critical point using $\tilde{O}(n+\ell(d^{1/3}n^{1/3}+\sqrt{d})/\epsilon^2)$ queries.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
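As a rough numerical illustration of the two bounds quoted in the abstract above, the sketch below evaluates the classical expression $n+\sqrt{n\ell/\mu}$ and the quantum expression $n+\sqrt{d}+\sqrt{\ell/\mu}\,(n^{1/3}d^{1/3}+n^{-2/3}d^{5/6})$ for sample parameters. It drops the logarithmic factors and constants hidden by the tilde notation, so the numbers are only indicative; the function names, the symbol `kappa` for $\ell/\mu$, and the parameter values are assumptions made for illustration.

```python
import math


def classical_bound(n, kappa):
    """n + sqrt(n * kappa), with kappa = l / mu (log factors dropped)."""
    return n + math.sqrt(n * kappa)


def quantum_upper_bound(n, d, kappa):
    """n + sqrt(d) + sqrt(kappa) * (n^(1/3) d^(1/3) + n^(-2/3) d^(5/6))."""
    mixed_term = n ** (1 / 3) * d ** (1 / 3) + n ** (-2 / 3) * d ** (5 / 6)
    return n + math.sqrt(d) + math.sqrt(kappa) * mixed_term


# Hypothetical problem sizes: many samples, ill-conditioned objective.
n, kappa = 10**6, 10**8
low_dim = quantum_upper_bound(n, d=100, kappa=kappa)
high_dim = quantum_upper_bound(n, d=10**6, kappa=kappa)
classical = classical_bound(n, kappa)
```

For these values the quantum expression is smaller than the classical one at `d=100` but larger at `d=10**6`, which matches the abstract's remark that the dimension $d$ enters the quantum bounds.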
# BadAgent: LLMエージェントのバックドア攻撃の実施と活性化 BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents ( http://arxiv.org/abs/2406.03007v1 ) ライセンス: Link先を確認	Yifei Wang, Dizhan Xue, Shengjie Zhang, Shengsheng Qian,	(参考訳) 大規模言語モデル(LLM)の繁栄により、ユーザ定義ツールセットでカスタマイズされたサービスを提供するために、強力なLLMベースのインテリジェントエージェントが開発された。 LLMエージェントを構築するための最先端の手法は、訓練されたLLMを採用し、エージェントタスクのデータに基づいてそれらをさらに微調整する。しかし,これらの手法は,バックドアデータを微調整してバックドアを埋め込む,BadAgentと呼ばれる様々なエージェントタスクに対して,提案したバックドア攻撃に対して脆弱であることを示す。テスト時には、攻撃者はエージェントの入力や環境にトリガーを表示することで、デプロイされたLLMエージェントを操作して有害な操作を実行することができる。驚いたことに、我々の提案した攻撃方法は信頼性のあるデータを微調整した後でも極めて堅牢である。バックドア攻撃は自然言語処理において広範囲に研究されてきたが、私たちの知る限り、外部ツールの使用許可によりより危険であるLSMエージェントでそれらを最初に研究する可能性がある。我々の研究は、信頼できないLSMやデータに基づいてLSMエージェントを構築することの明確なリスクを実証している。私たちのコードはhttps://github.com/DPamK/BadAgentで公開されています。 With the prosperity of large language models (LLMs), powerful LLM-based intelligent agents have been developed to provide customized services with a set of user-defined tools. State-of-the-art methods for constructing LLM agents adopt trained LLMs and further fine-tune them on data for the agent task. However, we show that such methods are vulnerable to our proposed backdoor attacks named BadAgent on various agent tasks, where a backdoor can be embedded by fine-tuning on the backdoor data. At test time, the attacker can manipulate the deployed LLM agents to execute harmful operations by showing the trigger in the agent input or environment. To our surprise, our proposed attack methods are extremely robust even after fine-tuning on trustworthy data. Though backdoor attacks have been studied extensively in natural language processing, to the best of our knowledge, we could be the first to study them on LLM agents that are more dangerous due to the permission to use external tools. Our work demonstrates the clear risk of constructing LLM agents based on untrusted LLMs or data. Our code is public at https://github.com/DPamK/BadAgent	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
# DriVLMe: LLMをベースとした自律運転エージェントの身体的・社会的体験の向上 DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences ( http://arxiv.org/abs/2406.03008v1 ) ライセンス: Link先を確認	Yidong Huang, Jacob Sansom, Ziqiao Ma, Felix Gervits, Joyce Chai,	(参考訳) ファンデーションモデル(FM)の最近の進歩は、自動運転の新しい展望を解き放ちつつあるが、これらの研究の実験的な設定は、予備的であり、過剰に単純化され、人間の環境における現実の運転シナリオの複雑さを捉えることができない。 FMエージェントが長距離航法タスクを自由対話で処理し、環境力学やタスク変更による予期せぬ状況に対処できるかは、まだ解明されていない。上記の課題に直面するFMの能力と限界を探るため,人間と自律走行車の自然かつ効果的なコミュニケーションを支援するビデオ言語モデルベースのエージェントであるDriVLMeを紹介した。シミュレーション環境における具体的体験と実際の人間対話による社会体験の両方からDriVLMeを開発する。 DriVLMeは、オープンループベンチマークとクローズドループヒューマンスタディの両方で競争性能を示す一方で、許容できない推論時間、不均衡なトレーニングデータ、視覚的理解の制限、マルチターンインタラクションによる課題、ロボット体験からの言語生成の簡略化、環境力学やタスク変更といった予期せぬ状況に対処する難しさなど、いくつかの制限と課題を明らかにします。 Recent advancements in foundation models (FMs) have unlocked new prospects in autonomous driving, yet the experimental settings of these studies are preliminary, over-simplified, and fail to capture the complexity of real-world driving scenarios in human environments. It remains under-explored whether FM agents can handle long-horizon navigation tasks with free-form dialogue and deal with unexpected situations caused by environmental dynamics or task changes. To explore the capabilities and boundaries of FMs faced with the challenges above, we introduce DriVLMe, a video-language-model-based agent to facilitate natural and effective communication between humans and autonomous vehicles that perceive the environment and navigate. We develop DriVLMe from both embodied experiences in a simulated environment and social experiences from real human dialogue.
While DriVLMe demonstrates competitive performance in both open-loop benchmarks and closed-loop human studies, we reveal several limitations and challenges, including unacceptable inference time, imbalanced training data, limited visual understanding, challenges with multi-turn interactions, simplified language generation from robotic experiences, and difficulties in handling on-the-fly unexpected situations like environmental dynamics and task changes.	翻訳日:2024-06-06 19:39:21 公開日:2024-06-05
# 解き放つ選択バイアス:大規模言語モデルにおける順序とトークン感度の探索 Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models ( http://arxiv.org/abs/2406.03009v1 ) ライセンス: Link先を確認	Sheng-Lun Wei, Cheng-Kuang Wu, Hen-Hsen Huang, Hsin-Hsi Chen,	(参考訳) 本稿では,Large Language Models (LLMs) における選択バイアスの現象を考察し,順序付きシーケンスから最適な選択肢を選択することをモデルが課題とする問題に焦点をあてる。 LLMの意思決定プロセスに大きな影響を与える、オプションの順序とトークンの使用に関するバイアスを掘り下げます。また、これらのバイアスの影響を、複数のモデルやタスクにまたがる広範な経験的分析を通じて定量化する。さらに,モデル性能を向上させるための緩和戦略を提案する。私たちの重要な貢献は3つあります。 1)LLMに対するオプションオーダーとトークンの影響を正確に定量化する。 2【トークンの影響を緩和し、堅牢性を高めるための秩序感度を高めるための戦略開発】 3)モデルとタスク間の感度を詳細に分析し,より安定かつ信頼性の高いLCMアプリケーションを選択問題に適用する。 In this paper, we investigate the phenomena of "selection biases" in Large Language Models (LLMs), focusing on problems where models are tasked with choosing the optimal option from an ordered sequence. We delve into biases related to option order and token usage, which significantly impact LLMs' decision-making processes. We also quantify the impact of these biases through an extensive empirical analysis across multiple models and tasks. Furthermore, we propose mitigation strategies to enhance model performance. Our key contributions are threefold: 1) Precisely quantifying the influence of option order and token on LLMs, 2) Developing strategies to mitigate the impact of token and order sensitivity to enhance robustness, and 3) Offering a detailed analysis of sensitivity across models and tasks, which informs the creation of more stable and reliable LLM applications for selection problems.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
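One simple way to picture the option-order sensitivity discussed in the abstract above is to present the same options in every possible order and count how often the selected content changes. The sketch below does this with two stand-in "models"; the metric, the function names, and both mock models are hypothetical illustrations, not the measures or models used in the paper.

```python
from itertools import permutations


def order_sensitivity(choose, options):
    """Fraction of orderings whose pick differs from the most common pick."""
    picks = [choose(list(order)) for order in permutations(options)]
    majority = max(set(picks), key=picks.count)
    return sum(pick != majority for pick in picks) / len(picks)


def first_position_model(ordered_options):
    """Mock model with a pure position bias: always picks the first slot."""
    return ordered_options[0]


def content_based_model(ordered_options):
    """Mock model that ignores order: picks the longest option text."""
    return max(ordered_options, key=len)


# Hypothetical answer options for a single selection problem.
OPTIONS = ["Paris", "Lyon", "Marseille"]

biased_score = order_sensitivity(first_position_model, OPTIONS)
stable_score = order_sensitivity(content_based_model, OPTIONS)
```

A score of 0 means the choice never depends on the ordering; the position-biased mock model scores well above 0 because its answer follows the slot rather than the content.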
# 量子コンピュータシミュレーションによるテンソル更新の簡易化 Simplification of tensor updates toward performance-complexity balanced quantum computer simulation ( http://arxiv.org/abs/2406.03010v1 ) ライセンス: Link先を確認	Koichi Yanagisawa, Aruto Hosaka, Tsuyoshi Yoshida,	(参考訳) テンソルネットワーク法は、量子多体スピンシステムの最適化問題から進化してきた。テンソルネットワークは現在、量子コンピュータシミュレーションにおいて強力なツールとみなされているが、テンソルを更新する際の複雑さの問題がまだ残っている。本研究は、テンソルネットワークに基づく量子コンピュータシミュレーションの文脈におけるテンソル更新の単純化について研究する。数値シミュレーションによると、単純更新と呼ばれる手法は、量子多体スピン系からもたらされ、忠実度と計算複雑性のバランスが良好である。 Tensor network methods have evolved from solving optimization problems in quantum many-body spin systems. While the tensor network is now regarded as a powerful tool in quantum computer simulation, there still exists a complexity issue in updating the tensors. This work studies the tensor updates simplification in the context of the tensor network based quantum computer simulation. According to the numerical simulations, a method called simple update, also originated in quantum many-body spin systems, shows a good balance of the fidelity and the computational complexity.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# トレーニングサンプルが説明に及ぼす影響の分析 Analyzing the Influence of Training Samples on Explanations ( http://arxiv.org/abs/2406.03012v1 ) ライセンス: Link先を確認	André Artelt, Barbara Hammer,	(参考訳) 説明可能なAI(XAI)は、意思決定を説明することによって、AIシステムの推論を分析する一般的な方法である。しかし、予期せぬ説明のようなケースでは、ユーザーは、観察された説明に責任がある活用されたトレーニングデータの特性など、この説明の原因について学習することに関心があるかもしれない。データ評価の領域では、データサンプルが与えられたモデルに与える影響を推定する最初のアプローチが提案されている。本研究では,モデル自体ではなく,モデル説明に対する単一サンプルの影響に関心があるため,若干異なるスタンスをとる。そこで本稿では,与えられた説明(あるいは関連量)に高い影響を与えるトレーニングデータサンプルを同定し,保護されたグループ間の関係のコスト差の特定の事例について検討する。そこで本研究では,そのような学習サンプルを同定するアルゴリズムを提案する。 EXplainable AI (XAI) constitutes a popular method to analyze the reasoning of AI systems by explaining their decision-making, e.g. providing a counterfactual explanation of how to achieve recourse. However, in cases such as unexpected explanations, the user might be interested in learning about the cause of this explanation -- e.g. properties of the utilized training data that are responsible for the observed explanation. Under the umbrella of data valuation, first approaches have been proposed that estimate the influence of data samples on a given model. In this work, we take a slightly different stance, as we are interested in the influence of single samples on a model explanation rather than the model itself. Hence, we propose the novel problem of identifying training data samples that have a high influence on a given explanation (or related quantity) and investigate the particular case of differences in the cost of the recourse between protected groups. For this, we propose an algorithm that identifies such influential training samples.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# ゼロショットロボットナビゲーションにおけるバランシング性能と効率性 Balancing Performance and Efficiency in Zero-shot Robotic Navigation ( http://arxiv.org/abs/2406.03015v1 ) ライセンス: Link先を確認	Dmytro Kuzmenko, Nadiya Shvai,	(参考訳) 本稿では,ロボット工学におけるオブジェクトゴールナビゲーションタスクに適用したビジョンランゲージフロンティアマップ(VLFM)の最適化研究について述べる。本研究は,視覚言語モデル,オブジェクト検出器,セグメンテーションモデル,マルチモーダル理解および視覚質問応答モジュールの効率と性能を評価する。 Habitat-Matterport 3Dデータセットの分割を$\textit{val-mini}$と$\textit{val}$を使って、限られたVRAMでデスクトップ上で実験を行います。本稿では,VLFM BLIP-2ベースラインよりも高い成功率(+1.55%)を実現するソリューションを提案する。本研究は, モデル性能と計算効率のバランスに関する知見を提供し, 資源限定環境における効率的な配置戦略を提案する。 We present an optimization study of the Vision-Language Frontier Maps (VLFM) applied to the Object Goal Navigation task in robotics. Our work evaluates the efficiency and performance of various vision-language models, object detectors, segmentation models, and multi-modal comprehension and Visual Question Answering modules. Using the $\textit{val-mini}$ and $\textit{val}$ splits of Habitat-Matterport 3D dataset, we conduct experiments on a desktop with limited VRAM. We propose a solution that achieves a higher success rate (+1.55%) improving over the VLFM BLIP-2 baseline without substantial success-weighted path length loss while requiring $\textbf{2.3 times}$ less video memory. Our findings provide insights into balancing model performance and computational efficiency, suggesting effective deployment strategies for resource-limited environments.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# DifAttack++: クロスドメインの階層的不整合特徴空間によるクエリ効率の良いブラックボックス逆攻撃 DifAttack++: Query-Efficient Black-Box Adversarial Attack via Hierarchical Disentangled Feature Space in Cross Domain ( http://arxiv.org/abs/2406.03017v1 ) ライセンス: Link先を確認	Jun Liu, Jiantao Zhou, Jiandian Zeng, Jinyu Tian,	(参考訳) 本研究は,高攻撃成功率(ASR)と良好な一般化性を備えた,効率的なスコアベースブラックボックス攻撃について検討する。我々は, 機能空間全体で動作する既存のものとは大きく異なる, \textbf{DifAttack++} と呼ばれる, \textbf{Di}sentangled \textbf{F}eature space と \textit{cross domain} に基づく新しい攻撃手法を設計する。具体的には、DifAttack++が最初にイメージの潜在機能を、特殊に設計された \textbf{H}ierarchical \textbf{D}ecouple-\textbf{F}usion (HDF) モジュールを備えたオートエンコーダを介して、画像の逆数機能(AF)と \textit{visual feature} (VF)に分解する。クリーンな画像のペアと、ホワイトボックスアタック手法を用いて利用可能なサロゲートモデルから生成されたその逆例(AE)を用いて、特徴のゆがみを実現するとともに、クリーンな画像領域と逆画像領域のオートエンコーダをそれぞれ訓練する。最終的に、ブラックボックス攻撃の段階では、DifAttack++は被害者モデルからのクエリフィードバックに従って、VFを変更せずに成功したAEが生成されるまで、AFを反復的に最適化する。広汎な実験結果から,本手法はSOTA法よりも優れたASRとクエリ効率を実現する一方で,AEsの視覚的品質も向上することが示された。コードはhttps://github.com/csjunjun/DifAttack.git.comで入手できる。 This work investigates efficient score-based black-box adversarial attacks with a high Attack Success Rate (ASR) and good generalizability. We design a novel attack method based on a \textit{Hierarchical} \textbf{Di}sentangled \textbf{F}eature space and \textit{cross domain}, called \textbf{DifAttack++}, which differs significantly from the existing ones operating over the entire feature space. Specifically, DifAttack++ firstly disentangles an image's latent feature into an \textit{adversarial feature} (AF) and a \textit{visual feature} (VF) via an autoencoder equipped with our specially designed \textbf{H}ierarchical \textbf{D}ecouple-\textbf{F}usion (HDF) module, where the AF dominates the adversarial capability of an image, while the VF largely determines its visual appearance. 
We train such autoencoders for the clean and adversarial image domains respectively, while also realizing feature disentanglement, by using pairs of clean images and their Adversarial Examples (AEs) generated from available surrogate models via white-box attack methods. Eventually, in the black-box attack stage, DifAttack++ iteratively optimizes the AF according to the query feedback from the victim model until a successful AE is generated, while keeping the VF unaltered. Extensive experimental results demonstrate that our method achieves higher ASR and better query efficiency than SOTA methods, while also producing AEs of much better visual quality. The code is available at https://github.com/csjunjun/DifAttack.git.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# 古代中国語の文字をラディカル・レコンストラクションで解読する「Puzzle Pieces Picker」 Puzzle Pieces Picker: Deciphering Ancient Chinese Characters with Radical Reconstruction ( http://arxiv.org/abs/2406.03019v1 ) ライセンス: Link先を確認	Pengjie Wang, Kaile Zhang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu,	(参考訳) Oracle Bone Inscriptionsは、世界で最も古い書式の一つである。しかし、この時代の大きな古さのため、多くのOracle Bone Inscriptions (OBI) が未解読のままであり、今日の古生物学分野における世界的課題の1つとなっている。本稿では, 急進的再構成によりこれらの謎の文字を復号化するための新しい手法, Puzzle Pieces Picker (P$^3$) を提案する。 OBIを基本的なストロークとラジカルに分解し、Transformerモデルを使用して、それらをモダンなcounterpartsに再構築し、古代のスクリプト分析の画期的なソリューションを提供します。この取り組みをさらに進めるために、7つの重要な歴史的段階から大量の文字画像を集め、詳細なラジカル配列を付加した新しい古代中国語の文字パズル(ACCP)データセットが開発された。この実験は、古代中国のスクリプトの複雑さの解読における我々のアプローチの可能性と有効性について、かなり有望な洞察を示してきた。この新たなデータセットと方法論を通じて、従来の文献学と近代文書分析のギャップを埋めることを目指しており、中国の言語遺産の豊富な歴史に対する新たな洞察を提供する。 Oracle Bone Inscriptions is one of the oldest existing forms of writing in the world. However, due to the great antiquity of the era, a large number of Oracle Bone Inscriptions (OBI) remain undeciphered, making it one of the global challenges in the field of paleography today. This paper introduces a novel approach, namely Puzzle Pieces Picker (P$^3$), to decipher these enigmatic characters through radical reconstruction. We deconstruct OBI into foundational strokes and radicals, then employ a Transformer model to reconstruct them into their modern counterparts, offering a groundbreaking solution to ancient script analysis. To further this endeavor, a new Ancient Chinese Character Puzzles (ACCP) dataset was developed, comprising an extensive collection of character images from seven key historical stages, annotated with detailed radical sequences. The experiments have showcased considerable promising insights, underscoring the potential and effectiveness of our approach in deciphering the intricacies of ancient Chinese scripts.
Through this novel dataset and methodology, we aim to bridge the gap between traditional philology and modern document analysis techniques, offering new insights into the rich history of Chinese linguistic heritage.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# 中性基底状態パラヘリウムに対するシュロディンガー方程式の解析解 An analytical solution of the Schrodinger equation for the neutral ground state Para Helium ( http://arxiv.org/abs/2406.03020v1 ) ライセンス: Link先を確認	Frank Kowol,	(参考訳) 本報告では, シュロディンガー方程式の解析解と, その対応する波動関数について, 基底状態における中性ヘリウム原子, パラヘリウム様原子について述べる。 s=0 と l=0 の2つの電子の状態関数とその境界条件を詳細に検討する。さらに、クーロンと交換相互作用からなる一般的な電子ポテンシャルを記述する方法が導出され、結果として得られるポテンシャル関数がポテンシャル項としてシュロディンガー方程式に統合される。さらに、真空偏極効果による電子の電磁結合の変化を調査し、ラプラス変換を用いて中性パラヘリウムに対するシュロディンガー方程式を解く。すると基底状態のエネルギーが決定され、電子が点状粒子であると仮定できるという事実から、文献値と一致することが示される。これらの研究の文脈では、電子の空間次元に対する上限推定は、2つの電子間の安定結合状態の最小距離の存在と同様に与えられるが、これは絡み合った状態と解釈できる。ヘリウム原子の波動関数は、水素原子の既知の溶液と比較され、2つの重要な相違が解決される。 This report presents the analytical solution of the Schrodinger equation and its corresponding wave function for the neutral para-helium or para-helium-like atoms in the ground state. The state functions of the two electrons for s=0 and l=0 as well as their boundary conditions are examined in detail. Furthermore, a method for describing a generic electron potential consisting of Coulomb and exchange interactions is derived, and the resulting potential function is integrated into the Schrodinger equation as a potential term. In addition, the altered electromagnetic coupling of the electrons due to vacuum polarization effects is investigated and finally the Schrodinger equation for the neutral Para-Helium is solved using Laplace transformations. The energy in the ground state is then determined, and it can be shown that this agrees with the literature values given the fact that the electron can be assumed to be a point-like particle. In the context of these investigations, an upper limit estimation for the spatial dimension of the electron can also be given as well as the existence of a minimal distance of a stable bonding state between two electrons, which can be interpreted as an entangled state; in addition, the chemical inertness of helium with regard to chemical reactions (i.e., the principle of the "closed" electron shell) can be made plausible by the quantum mechanical electron configuration and its consequences with regard to binding energy. The wave function found for the helium atom is compared with the known solutions for the hydrogen atom, and essential differences between the two are worked out.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# 非エルミート量子系におけるキラル状態転移と非相互状態転移の動的トポロジー Dynamical topology of chiral and nonreciprocal state transfers in a non-Hermitian quantum system ( http://arxiv.org/abs/2406.03026v1 ) ライセンス: Link先を確認	Pengfei Lu, Yang Liu, Qifeng Lao, Teng Liu, Xinxin Rao, Ji Bian, Hao Wu, Feng Zhu, Le Luo,	(参考訳) 位相現象の基礎となる基本的な概念は、固有状態に関連する幾何学的位相を仮定する。この一般的な概念とは対照的に、時変ハミルトニアンの理論的研究はトポロジカル・ダイナミクスとして知られる新しいタイプのトポロジカル現象を許容し、進化過程は連続フローに付随する隠れトポロジカル不変性を許容する。この予想を検証するために、非エルミート・ハミルトニアンの例外点(EP)を閉じ込めたイオン系に囲むことで、トポロジカルなカイラルと非相互ダイナミクスを研究する。これらの力学は、散逸によって引き起こされる非断熱過程においても、外部の摂動に対して位相的に堅牢である。本研究は,非エルミタンバンド構造が平行輸送された固有ベイシスにおいてエネルギー分散にともなうトポロジカル不変量である動的渦によって保護されていることを示唆する。トポロジカルダイナミクスの対称性の破れや他の重要な特徴は、量子状態トモグラフィーによって直接観察される。この結果は、オープン量子系のトポロジカルな性質を探求するための重要なステップである。 The fundamental concept underlying topological phenomena posits the geometric phase associated with eigenstates. In contrast to this prevailing notion, theoretical studies on time-varying Hamiltonians allow for a new type of topological phenomenon, known as topological dynamics, where the evolution process allows a hidden topological invariant associated with continuous flows. To validate this conjecture, we study topological chiral and nonreciprocal dynamics by encircling the exceptional points (EPs) of non-Hermitian Hamiltonians in a trapped ion system. These dynamics are topologically robust against external perturbations even in the presence of dissipation-induced nonadiabatic processes. Our findings indicate that they are protected by dynamical vorticity -- an emerging topological invariant associated with the energy dispersion of non-Hermitian band structures in a parallel transported eigenbasis. The symmetry breaking and other key features of topological dynamics are directly observed through quantum state tomography. Our results mark a significant step towards exploring topological properties of open quantum systems.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# ベルの理論に反する Against Bell's Theorem ( http://arxiv.org/abs/2406.03028v1 ) ライセンス: Link先を確認	Andrea Aiello,	(参考訳) ベルの定理は、量子力学と局所的で現実的な隠れ変数理論の間の矛盾を証明していると考えられている。本稿ではベルの定理を証明しようとするすべての実験がこの目標を達成できないことを示す。我々の結論は、これらの実験の結果の直接的な統計的分析に基づいている。この研究の鍵となるツールは確率論であり、特に、そのような実験の結果を定量化する二コトミックな確率変数に対するサンプル空間の概念である。また、ベルの定理の実験的な証明は原理的には不可能ではないが、この目的を達成するために一般的に用いられるものとは全く異なる実験装置を必要とすることも示している。我々の研究の主な成果は、利用可能な実験データに基づいて、局所的な現実的な隠れ変数理論を排除できないことである。 Bell's theorem supposedly demonstrates an irreconcilable conflict between quantum mechanics and local, realistic hidden variable theories. In this paper we show that all experiments that aim to prove Bell's theorem do not actually achieve this goal. Our conclusions are based on a straightforward statistical analysis of the outcomes of these experiments. The key tool in our study is probability theory and, in particular, the concept of sample space for the dichotomic random variables that quantifies the outcomes of such experiments. We also show that an experimental proof of Bell's theorem is not, in principle, impossible, but it would require a completely different experimental apparatus than those commonly used to allegedly achieve this objective. The main consequence of our work is that we cannot dismiss local realistic hidden variable theories on the basis of the available experimental data.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# ターザンからトールキンへ:コンテンツ生成のためのLLMの言語習熟度制御 From Tarzan to Tolkien: Controlling the Language Proficiency Level of LLMs for Content Generation ( http://arxiv.org/abs/2406.03030v1 ) ライセンス: Link先を確認	Ali Malik, Stephen Mayhew, Chris Piech, Klinton Bicknell,	(参考訳) 本研究では,言語学習者などエンドユーザーが十分に熟練していない状況において,Large Language Models (LLM) が生成するテキストの難易度を制御する問題について検討する。 GPT-4 と LLama2-7B や Mistral-7B といったオープンソースの代替品を併用した,少数ショットプロンプト,教師付き微調整,強化学習 (RL) など,この課題に対するいくつかの重要なアプローチの有効性を評価する。この結果から,プロンプトベース戦略を用いた場合,GPT-4とオープンソースモデルの間に大きな性能差があることが判明した。しかし、このギャップをファインタニングとRLアライメントの慎重に組み合わせて橋渡しする方法を示す。我々の最良のモデルであるCALM (CEFR-Aligned Language Model) は、GPT-4やその他の戦略の性能をほんの少しのコストで上回ります。我々は、小規模の人間による研究を通じて、結果の質をさらに検証する。 We study the problem of controlling the difficulty level of text generated by Large Language Models (LLMs) for contexts where end-users are not fully proficient, such as language learners. Using a novel framework, we evaluate the effectiveness of several key approaches for this task, including few-shot prompting, supervised finetuning, and reinforcement learning (RL), utilising both GPT-4 and open source alternatives like LLama2-7B and Mistral-7B. Our findings reveal a large performance gap between GPT-4 and the open source models when using prompt-based strategies. However, we show how to bridge this gap with a careful combination of finetuning and RL alignment. Our best model, CALM (CEFR-Aligned Language Model), surpasses the performance of GPT-4 and other strategies, at only a fraction of the cost. We further validate the quality of our results through a small-scale human study.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# ゼロショット学習のためのプロンプト・ツー・プロンプト生成の指導 Instructing Prompt-to-Prompt Generation for Zero-Shot Learning ( http://arxiv.org/abs/2406.03032v1 ) ライセンス: Link先を確認	Man Liu, Huihui Bai, Feng Li, Chunjie Zhang, Yunchao Wei, Meng Wang, Tat-Seng Chua, Yao Zhao,	(参考訳) ゼロショット学習(ZSL)は、意味と視覚の相互作用を探索し、既知のカテゴリから転移される包括的知識を発見して、未知のカテゴリを分類することを目的としている。近年、ZSLではプロンプトエンジニアリングが登場し、多様な視覚概念を下流タスクにゼロショットで転送できることから、優れた可能性を示している。しかし、これらの方法はまだ広範な未知のドメインに対して十分に一般化されていない。主な理由は、既知のドメイン上での学習可能なプロンプトの固定的な適応により、学習時に観察される主要な視覚的特徴が過度に強調される傾向があるためである。本研究では、命令追従手法をさらに取り入れ、包括的な転移可能知識の発見のための指導的な視覚プロンプトを蒸留することでこの問題に対処する、プロンプト・ツー・プロンプト生成手法(P2P)を提案する。 P2Pのコアとなるのは、プロンプト条件付き視覚特徴とモーダル共有セマンティック概念に関するテキスト命令からセマンティック関連インストラクションを抽出し、学習したインストラクションプロンプトのガイダンスで視覚表現を逆修正することである。これにより、主要な文脈に対して欠落した視覚的詳細が補われ、さらにクロスモーダルの相違が解消され、未知のドメインへの汎化が可能になる。広範な実験により,P2Pが最先端手法よりも優れた性能を発揮することを示す。 Zero-shot learning (ZSL) aims to explore the semantic-visual interactions to discover comprehensive knowledge transferred from seen categories to classify unseen categories. Recently, prompt engineering has emerged in ZSL, demonstrating impressive potential as it enables the zero-shot transfer of diverse visual concepts to downstream tasks. However, these methods are still not well generalized to broad unseen domains. A key reason is that the fixed adaption of learnable prompts on seen domains makes it tend to over-emphasize the primary visual features observed during training. In this work, we propose a Prompt-to-Prompt generation methodology (P2P), which addresses this issue by further embracing the instruction-following technique to distill instructive visual prompts for comprehensive transferable knowledge discovery. The core of P2P is to mine semantic-related instruction from prompt-conditioned visual features and text instruction on modal-sharing semantic concepts and then inversely rectify the visual representations with the guidance of the learned instruction prompts. This enforces the compensation for missing visual details to primary contexts and further eliminates the cross-modal disparity, endowing unseen domain generalization. Through extensive experimental results, we demonstrate the efficacy of P2P in achieving superior performance over state-of-the-art methods.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# 最適マルチフィデリティベストアーム同定 Optimal Multi-Fidelity Best-Arm Identification ( http://arxiv.org/abs/2406.03033v1 ) ライセンス: Link先を確認	Riccardo Poiani, Rémy Degenne, Emilie Kaufmann, Alberto Maria Metelli, Marcello Restelli,	(参考訳) バンディットのベストアーム識別において、アルゴリズムは、できるだけ早く特定の精度で、最高平均報酬の腕を見つけることを任務とする。そこで本研究では,低忠実度(正確な平均推定値を持たない)の腕を低コストでサンプリングするアルゴリズムを提案する。この問題に対処するためのいくつかの方法が提案されているが、その最適性は、特に最適な腕を特定するのに必要な総コストのゆるやかな下限のため、未解決のままである。最初のコントリビューションは、コストの複雑さに対する厳密でインスタンス依存の低いバウンダリです。下界に特徴付けられる最適化問題の研究は、計算効率の良いアルゴリズムを考案するための新たな洞察を与え、漸近的に最適なコスト複雑性を持つ勾配に基づくアプローチを提案する。実験における既存手法と比較して,新しいアルゴリズムの利点を実証する。私たちの理論的および経験的な発見は、各腕に最適な忠実さという興味深い概念にも光を当てました。 In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible. We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost. Several methods have been proposed for tackling this problem, but their optimality remain elusive, notably due to loose lower bounds on the total cost needed to identify the best arm. Our first contribution is a tight, instance-dependent lower bound on the cost complexity. The study of the optimization problem featured in the lower bound provides new insights to devise computationally efficient algorithms, and leads us to propose a gradient-based approach with asymptotically optimal cost complexity. We demonstrate the benefits of the new algorithm compared to existing methods in experiments. Our theoretical and empirical findings also shed light on an intriguing concept of optimal fidelity for each arm.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# Follow-Your-Pose v2:Stable Pose Controlのためのマルチコンディション誘導文字アニメーション Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control ( http://arxiv.org/abs/2406.03035v1 ) ライセンス: Link先を確認	Jingyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo,	(参考訳) ソーシャルメディアプラットフォームにおける自動広告やコンテンツ作成などの分野への広範な応用により、ポーズコントロール可能なキャラクタビデオ生成が要求されている。ポーズシーケンスと参照画像を用いた既存のキャラクタ画像アニメーション手法は有望なパフォーマンスを示しているが、複数キャラクタのアニメーションや身体閉塞といった複雑なシナリオでは、非一貫性のアニメーションに苦労する傾向がある。さらに、現在の方法では、トレーニングデータセットとして安定したバックグラウンドと時間的一貫性を備えた大規模な高品質なビデオが要求され、そうでなければ性能が大きく低下する。これら2つの課題は、文字画像アニメーションツールの実用化を妨げている。本稿では,インターネット上で容易に利用できるノイズの多いオープンソースビデオに基づいてトレーニング可能な,実用的で堅牢なフレームワークFollow-Your-Pose v2を提案する。マルチコンディションガイドは,背景安定性,マルチキャラクタ生成時の身体閉塞,キャラクタの外観の整合性といった課題に対処するように設計されている。さらに,マルチキャラクタポーズアニメーションの公平な評価のギャップを埋めるために,約4,000フレームからなる新しいベンチマークを提案する。大規模な実験により、我々の手法は2つのデータセットと7つのメトリクスで35%以上のマージンで最先端の手法より優れていることが示された。一方, 質的評価では, 生成ビデオの品質が著しく向上し, 特に複雑な背景やマルチキャラクタの身体閉塞などのシナリオにおいて, アプローチの優位性が示唆された。 Pose-controllable character video generation is in high demand with extensive applications for fields such as automatic advertising and content creation on social media platforms. While existing character image animation methods using pose sequences and reference images have shown promising performance, they tend to struggle with incoherent animation in complex scenarios, such as multiple character animation and body occlusion. Additionally, current methods request large-scale high-quality videos with stable backgrounds and temporal consistency as training datasets, otherwise, their performance will greatly deteriorate. These two issues hinder the practical utilization of character image animation tools. In this paper, we propose a practical and robust framework Follow-Your-Pose v2, which can be trained on noisy open-sourced videos readily available on the internet. Multi-condition guiders are designed to address the challenges of background stability, body occlusion in multi-character generation, and consistency of character appearance. Moreover, to fill the gap of fair evaluation of multi-character pose animation, we propose a new benchmark comprising approximately 4,000 frames. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods by a margin of over 35% across 2 datasets and on 7 metrics. Meanwhile, qualitative assessments reveal a significant improvement in the quality of generated video, particularly in scenarios involving complex backgrounds and body occlusion of multi-character, suggesting the superiority of our approach.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# 自動運転におけるソフトウェア・イン・ザ・ループシミュレーションと物理試験の相関 Correlation of Software-in-the-Loop Simulation with Physical Testing for Autonomous Driving ( http://arxiv.org/abs/2406.03040v1 ) ライセンス: Link先を確認	Zhennan Fei, Mikael Andersson, Andreas Tingberg,	(参考訳) ソフトウェア・イン・ザ・ループ (Software-in-the-loop, SIL) シミュレーションは、その柔軟性と効率性から、自動運転車の迅速な開発とテストに広く用いられている手法である。本稿では,社内で開発されたSILシミュレーションツールチェーンの検証事例について述べる。提示された検証プロセスには、テストトラック上の代表シナリオの設計と実行が含まれます。テストトラックをSILシミュレーションと整合させるため,車載テストから得られたデータに基づいてパラメータを微調整することでシナリオを改良する同期手法を提案する。また、SILシミュレーションと車両試験ログの相関性を評価するために用いられる2つの指標についても論じる。提案した検証プロセスの有効性を示すための予備的な結果が提示される。 Software-in-the-loop (SIL) simulation is a widely used method for the rapid development and testing of autonomous vehicles because of its flexibility and efficiency. This paper presents a case study on the validation of an in-house developed SIL simulation toolchain. The presented validation process involves the design and execution of a set of representative scenarios on the test track. To align the test track runs with the SIL simulations, a synchronization approach is proposed, which includes refining the scenarios by fine-tuning the parameters based on data obtained from vehicle testing. The paper also discusses two metrics used for evaluating the correlation between the SIL simulations and the vehicle testing logs. Preliminary results are presented to demonstrate the effectiveness of the proposed validation process	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# 集団変換器:頭蓋内活動の集団レベルの表現を学習する Population Transformer: Learning Population-level Representations of Intracranial Activity ( http://arxiv.org/abs/2406.03044v1 ) ライセンス: Link先を確認	Geeling Chau, Christopher Wang, Sabera Talukder, Vighnesh Subramaniam, Saraswati Soedarmadji, Yisong Yue, Boris Katz, Andrei Barbu,	(参考訳) 本稿では,頭蓋内神経記録の集団レベルの符号を大規模に学習し,重要な神経科学記録のための表現学習の利点を解放する自己教師型フレームワークを提案する。 Population Transformer (PopT)は、復号実験に必要なデータ量を削減し、未確認の被験者やタスクでも精度を向上する。 PopTの開発における2つの課題に対処する: スパース電極分布と患者間での電極位置の変化である。 PopTスタックは事前訓練された表現の上にあり、複数の空間的にスパースなデータチャネルの学習的な集約を可能にすることで下流タスクを強化する。復号化以外にも、事前訓練されたPopTと微調整されたモデルを解釈して、大量のデータから学んだ神経科学的な洞察を提供する方法を示す。トレーニング済みのPopTをリリースし、マルチチャネルの頭蓋内データの復号化と解釈性の向上を実現し、https://github.com/czlwang/PopulationTransformer でコードを利用できる。 We present a self-supervised framework that learns population-level codes for intracranial neural recordings at scale, unlocking the benefits of representation learning for a key neuroscience recording modality. The Population Transformer (PopT) lowers the amount of data required for decoding experiments, while increasing accuracy, even on never-before-seen subjects and tasks. We address two key challenges in developing PopT: sparse electrode distribution and varying electrode location across patients. PopT stacks on top of pretrained representations and enhances downstream tasks by enabling learned aggregation of multiple spatially-sparse data channels. Beyond decoding, we interpret the pretrained PopT and fine-tuned models to show how it can be used to provide neuroscience insights learned from massive amounts of data. We release a pretrained PopT to enable off-the-shelf improvements in multi-channel intracranial data decoding and interpretability, and code is available at https://github.com/czlwang/PopulationTransformer.	翻訳日:2024-06-06 19:29:27 公開日:2024-06-05
# スパイキングニューラルネットワークが時間的注意画像デコードと適応スパイキングニューロンと出会うとき When Spiking neural networks meet temporal attention image decoding and adaptive spiking neuron ( http://arxiv.org/abs/2406.03046v1 ) ライセンス: Link先を確認	Xuerui Qiu, Zheng Luan, Zhaorui Wang, Rui-Jie Zhu,	(参考訳) スパイキングニューラルネットワーク(SNN)は、生物学的に妥当な方法で時間情報をエンコードし、処理することができる。しかし、画像タスクのための既存のSNNベースのメソッドの多くは、この機能を完全に活用していない。さらに、スパイキングニューロンにおける適応しきい値の役割を見落とし、そのダイナミックな振る舞いと学習能力を高めることができる。本稿では,時間的注意(TAID)と適応型Leaky-Integrate-and-Fire(ALIF)ニューロンモデルに基づく画像復号法を提案する。提案手法は,SNN出力の時間的情報を利用して,インセプションスコア,Fréchet Inception Distance,Fréchet Autoencoder Distanceの点から,最先端(SOTA)を超える高品質な画像を生成する。さらに、我々のALIFニューロンモデルでは、MNIST(99.78%)およびCIFAR-10(93.89%)データセットの顕著な分類精度を実現し、スパイキングニューロンに対する適応しきい値の学習の有効性を示す。コードはhttps://github.com/bollossom/ICLR_TINY_SNNで公開されている。 Spiking Neural Networks (SNNs) are capable of encoding and processing temporal information in a biologically plausible way. However, most existing SNN-based methods for image tasks do not fully exploit this feature. Moreover, they often overlook the role of adaptive threshold in spiking neurons, which can enhance their dynamic behavior and learning ability. To address these issues, we propose a novel method for image decoding based on temporal attention (TAID) and an adaptive Leaky-Integrate-and-Fire (ALIF) neuron model. Our method leverages the temporal information of SNN outputs to generate high-quality images that surpass the state-of-the-art (SOTA) in terms of Inception score, Fréchet Inception Distance, and Fréchet Autoencoder Distance. Furthermore, our ALIF neuron model achieves remarkable classification accuracy on MNIST (99.78%) and CIFAR-10 (93.89%) datasets, demonstrating the effectiveness of learning adaptive thresholds for spiking neurons. The code is available at https://github.com/bollossom/ICLR_TINY_SNN.	翻訳日:2024-06-06 19:19:28 公開日:2024-06-05
# 単一共振器-光子感度を有する共振器結合二重点による高効率マイクロ波光検出 High-efficiency microwave photodetection by cavity coupled double dots with single cavity-photon sensitivity ( http://arxiv.org/abs/2406.03047v1 ) ライセンス: Link先を確認	Subhomoy Haldar, Harald Havir, Waqar Khan, Drilon Zenelaj, Patrick P. Potts, Sebastian Lehmann, Kimberly A. Dick, Peter Samuelsson, Ville F. Maisi,	(参考訳) 超伝導空洞結合型二重量子ドット(DQD)フォトダイオードをマイクロ波領域で最大25%の光子変換効率を実現する。より高品質な共振器と改良されたデバイス設計により、不要な経路による光子漏れを防止するとともに、マイクロ波信号を100 aWの電力レベルまで測定し、共振器内で1つの光子でマイクロ波信号をプローブする感度を実現する。我々はJaynes-Cummings入出力理論を用いて光ダイオード動作を解析し、ほぼ均一な光検出効率を実現するために必要なキャビティ-DQD結合の重要な改善点を特定した。本研究の結果は、マイクロ波領域における光子統計学および量子情報処理に関する応用研究において、単一空洞光子感度による近距離マイクロ波光検出効率への重要な進歩を示すものである。 We present a superconducting cavity-coupled double quantum dot (DQD) photodiode that achieves a maximum photon-to-electron conversion efficiency of 25% in the microwave domain. With a higher-quality-factor cavity and improved device design to prevent photon leakages through unwanted pathways, our device measures microwave signals down to 100 aW power level and achieves sensitivity to probe microwave signals with one photon at a time in the cavity. We analyze the photodiode operation using Jaynes-Cummings input-output theory, identifying the key improvements of stronger cavity-DQD coupling needed to achieve near-unity photodetection efficiency. The results presented in this work represent a crucial advancement toward near unity microwave photodetection efficiency with single cavity-photon sensitivity for studies of photon statistics in the microwave range and applications related to quantum information processing.	翻訳日:2024-06-06 19:19:28 公開日:2024-06-05
# 各タスクに必要なものを与える -- 構造化された疎性を活用したマルチタスク学習 Giving each task what it needs -- leveraging structured sparsity for tailored multi-task learning ( http://arxiv.org/abs/2406.03048v1 ) ライセンス: Link先を確認	Richa Upadhyay, Ronald Phlypo, Rajkumar Saini, Marcus Liwicki,	(参考訳) 各タスクは、低レベルから高レベルまで多様な特徴表現を要求するため、特にマルチタスク学習(MTL)フレームワークにおいて、各タスクの特定のニーズに対処することが不可欠である。この研究は、構造化された空間を利用して個々のタスクの特徴選択を洗練し、マルチタスクシナリオにおける全てのタスクのパフォーマンスを向上させるレイヤ最適化マルチタスク(LOMT)モデルを導入する。構造化されたあるいはグループの疎結合は、訓練中に自明なチャネルからパラメータを体系的に排除し、最終的には畳み込みニューラルネットワーク内のすべての層を除去する。その結果、残りのレイヤは与えられたタスクに対して最も最適な機能を提供します。この2段階のアプローチでは、ネットワークの終端でデコーダを均一に接続する従来の手法から逸脱し、タスク固有のデコーダをこれらの戦略的に識別された層に接続することで、この疎結合による最適層情報を利用してLOMTモデルを構築する。このカスタマイズされたアーキテクチャはネットワークを最適化し、冗長性を減らしながら本質的な機能に重点を置いている。本稿では,複数の異種タスクに対して,NYU-v2とCelebAMask-HDの2つのデータセットに対して提案手法の有効性を検証する。従来のMTLモデルとは対照的に,LOMTモデルの詳細な性能解析により,ほとんどのタスクの組み合わせにおいて,LOMTモデルの方が優れていたことが明らかとなった。優れた質的および定量的な結果は、最適層(または特徴)選択に構造化されたスパーシティを採用することの有効性を浮き彫りにする。 Every task demands distinct feature representations, ranging from low-level to high-level attributes, so it is vital to address the specific needs of each task, especially in the Multi-task Learning (MTL) framework. This work, therefore, introduces Layer-Optimized Multi-Task (LOMT) models that utilize structured sparsity to refine feature selection for individual tasks and enhance the performance of all tasks in a multi-task scenario. Structured or group sparsity systematically eliminates parameters from trivial channels and, eventually, entire layers within a convolution neural network during training. Consequently, the remaining layers provide the most optimal features for a given task. In this two-step approach, we subsequently leverage this sparsity-induced optimal layer information to build the LOMT models by connecting task-specific decoders to these strategically identified layers, deviating from conventional approaches that uniformly connect decoders at the end of the network. 
This tailored architecture optimizes the network, focusing on essential features while reducing redundancy. We validate the efficacy of the proposed approach on two datasets, ie NYU-v2 and CelebAMask-HD datasets, for multiple heterogeneous tasks. A detailed performance analysis of the LOMT models, in contrast to the conventional MTL models, reveals that the LOMT models outperform for most task combinations. The excellent qualitative and quantitative outcomes highlight the effectiveness of employing structured sparsity for optimal layer (or feature) selection.	翻訳日:2024-06-06 19:19:28 公開日:2024-06-05
# StreamSpeech: マルチタスク学習による同時音声音声合成 StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning ( http://arxiv.org/abs/2406.03049v1 ) ライセンス: Link先を確認	Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng,	(参考訳) 同時音声音声変換(Simul-S2ST、ストリーミング音声翻訳)は、リアルタイム通信において重要なストリーミング音声入力を受信しながらターゲット音声を出力する。 Simul-S2STは、音声間の翻訳の達成以外にも、音声入力の機会に対応するターゲット音声を生成するためのモデルを制御するためのポリシーが必要であり、それによって翻訳とポリシーの二重課題が引き起こされる。本稿では,マルチタスク学習の統一フレームワークであるStreamSpeechを提案する。マルチタスク学習アプローチを採用することで、StreamSpeechは"All-in-One"シームレスモデルを通じて、オフラインおよび同時音声認識、音声翻訳、音声合成を行うことができる。 CVSSベンチマークの実験では、StreamSpeechはオフラインS2STタスクとSimul-S2STタスクの両方で最先端のパフォーマンスを実現している。さらに、StreamSpeechは、同時翻訳プロセス中に高品質な中間結果(ASRまたは翻訳結果)を提示することができ、より包括的なリアルタイム通信エクスペリエンスを提供する。 Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication. Beyond accomplishing translation between speech, Simul-S2ST requires a policy to control the model to generate corresponding target speech at the opportune moment within speech inputs, thereby posing a double challenge of translation and policy. In this paper, we propose StreamSpeech, a direct Simul-S2ST model that jointly learns translation and simultaneous policy in a unified framework of multi-task learning. Adhering to a multi-task learning approach, StreamSpeech can perform offline and simultaneous speech recognition, speech translation and speech synthesis via an "All-in-One" seamless model. Experiments on CVSS benchmark demonstrate that StreamSpeech achieves state-of-the-art performance in both offline S2ST and Simul-S2ST tasks. Besides, StreamSpeech is able to present high-quality intermediate results (i.e., ASR or translation results) during simultaneous translation process, offering a more comprehensive real-time communication experience.	翻訳日:2024-06-06 19:19:28 公開日:2024-06-05

# GPT-4におけるモラルの1次元マッピング--モラル領域の国別推定精度がモラル領域にどのように依存するか

GPT-4's One-Dimensional Mapping of Morality: How the Accuracy of Country-Estimates Depends on Moral Domain ( http://arxiv.org/abs/2407.16886v1 )

ライセンス: Link先を確認

Pontus Strimling, Joel Krueger, Simon Karlsson,

(参考訳) 以前の研究では、Open AIのGPTモデルは各国間の道徳的意見の違いを予測できるが、その精度は低所得国に比べて高所得国で著しく高い傾向にあることが示されている。本研究は、過去の知見を再現し、道徳的問題の種類によって精度がどのように変化するかを調べることで研究を進めることを目的としている。世界価値観調査と欧州価値観調査の回答を用いて、63か国の18の道徳問題を対象に、各道徳問題の国別平均スコアを算出し、GPT-4の予測と比較した。過去の知見と一致して、GPT-4は低所得国よりも高所得国において高い予測的成功を示した。しかし、因子分析の結果、GPT-4は、おそらく各国の保守主義・自由主義の程度を反映する単一の次元に主に基づいて予測を行っていることが明らかになった。対照的に、現実世界の道徳的景観は、個人的・性的な問題と暴力的・不正直な問題とを区別する2次元であるように見える。道徳的問題を道徳的領域に基づいて分類すると、GPT-4の予測は、個人的・性的領域において、高所得国(r = .77)と低所得国(r = .58)の両方で非常に正確であることがわかる。しかし、暴力的・不正直領域では、高所得国(r = .30)と低所得国(r = -.16)の両方で予測精度が著しく低下し、GPT-4の1次元的な世界観が道徳的景観の複雑さを完全には捉えていないことを示している。まとめると、本研究は、GPT-4の道徳的理解を理解するために、国固有の特徴だけでなく、対象となる道徳的問題の特徴も考慮することの重要性を強調している。

Prior research demonstrates that Open AI's GPT models can predict variations in moral opinions between countries but that the accuracy tends to be substantially higher among high-income countries compared to low-income ones. This study aims to replicate previous findings and advance the research by examining how accuracy varies with different types of moral questions. Using responses from the World Value Survey and the European Value Study, covering 18 moral issues across 63 countries, we calculated country-level mean scores for each moral issue and compared them with GPT-4's predictions. Confirming previous findings, our results show that GPT-4 has greater predictive success in high-income than in low-income countries. However, our factor analysis reveals that GPT-4 bases its predictions primarily on a single dimension, presumably reflecting countries' degree of conservatism/liberalism. Conversely, the real-world moral landscape appears to be two-dimensional, differentiating between personal-sexual and violent-dishonest issues. When moral issues are categorized based on their moral domain, GPT-4's predictions are found to be remarkably accurate in the personal-sexual domain, across both high-income (r = .77) and low-income (r = .58) countries. Yet the predictive accuracy significantly drops in the violent-dishonest domain for both high-income (r = .30) and low-income (r = -.16) countries, indicating that GPT-4's one-dimensional world-view does not fully capture the complexity of the moral landscape. In sum, this study underscores the importance of not only considering country-specific characteristics to understand GPT-4's moral understanding, but also the characteristics of the moral issues at hand.

翻訳日:2024-08-05 01:45:45 公開日:2024-06-05
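
The per-domain accuracy figures quoted in the abstract above (r = .77, .58, .30, -.16) are Pearson correlations between survey means and model estimates. The following is a minimal sketch of how such per-domain correlations could be computed, not the authors' analysis code. The function names and every number in `toy_records` are made-up placeholders for illustration and are not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def correlation_by_domain(records):
    """Group (domain, survey_mean, model_estimate) rows and correlate per domain."""
    grouped = {}
    for domain, survey_mean, model_estimate in records:
        xs, ys = grouped.setdefault(domain, ([], []))
        xs.append(survey_mean)
        ys.append(model_estimate)
    return {domain: pearson_r(xs, ys) for domain, (xs, ys) in grouped.items()}


# Hypothetical placeholder values: one row per (country, issue) pair.
toy_records = [
    ("personal-sexual", 2.0, 2.5),
    ("personal-sexual", 4.0, 4.4),
    ("personal-sexual", 6.0, 6.1),
    ("personal-sexual", 8.0, 8.3),
    ("violent-dishonest", 1.5, 2.0),
    ("violent-dishonest", 2.0, 1.6),
    ("violent-dishonest", 2.5, 2.1),
    ("violent-dishonest", 3.0, 1.9),
]

if __name__ == "__main__":
    for domain, r in correlation_by_domain(toy_records).items():
        print(f"{domain}: r = {r:.2f}")
```

Splitting the records further by income group before calling `correlation_by_domain` would give one coefficient per domain and income group, which is the shape of the four values reported in the abstract.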

# インド高等教育システムにおける雇用可能性の統一予測モデル

Unified Prediction Model for Employability in Indian Higher Education System ( http://arxiv.org/abs/2407.17591v1 )

ライセンス: Link先を確認

Pooja Thakar, Anil Mehta, Manisha,

(参考訳) 教育データマイニングは、過去10年間で研究者の間で非常に人気が高まっている。この領域における以前の取り組みは、学生の学業成績の予測にのみ向けられていた。学生の雇用可能性、すなわち入学後の早い段階でキャンパス内就職活動(キャンパスプレースメント)における学生の成果を予測することに向けられた研究は極めて少ない。さらに、既存の学生雇用可能性予測の研究は、アプローチにおいて普遍的ではなく、1種類のコースまたは1つの大学/機関のみに基づいている。そのため、あるコンテキストから別のコンテキストへの拡張性がない。統一の必要性から、Bachelor in Engineering/TechnologyおよびMasters in Computer Applicationsという専門技術コースの学生データがインドの17州から収集されている。このようなデータを扱うために、統一的な予測モデルが開発され、17州のデータセットに適用されている。本研究は、このモデルが普遍的に適用可能であり、異なる文化的背景とコース構造を持つインド全土の様々な州や機関に適用できることを示している。また、本論文は、学生の雇用可能性の予測に関する限り、インドの教育制度に州による有意な差がないことを統計的に調査し、示している。このモデルは、インドにおける学生の雇用可能性予測のための一般化されたソリューションを提供する。

Educational Data Mining has become extremely popular among researchers in last decade. Prior effort in this area was only directed towards prediction of academic performance of a student. Very less number of researches are directed towards predicting employability of a student i.e. prediction of students performance in campus placements at an early stage of enrollment. Furthermore, existing researches on students employability prediction are not universal in approach and is either based upon only one type of course or University/Institute. Henceforth, is not scalable from one context to another. With the necessity of unification, data of professional technical courses namely Bachelor in Engineering/Technology and Masters in Computer Applications students have been collected from 17 states of India. To deal with such a data, a unified predictive model has been developed and applied on 17 states datasets. The research done in this paper proves that model has universal application and can be applied to various states and institutes pan India with different cultural background and course structure. This paper also explores and proves statistically that there is no significant difference in Indian Education System with respect to states as far as prediction of employability of students is concerned. Model provides a generalized solution for student employability prediction in Indian Scenario.

翻訳日:2024-08-05 01:35:56 公開日:2024-06-05

# スライダチャット:3Dスライダのためのローカルチャットボットの構築

SlicerChat: Building a Local Chatbot for 3D Slicer ( http://arxiv.org/abs/2407.11987v1 )

ライセンス: Link先を確認

Colton Barr,

(参考訳) 3D Slicerは3Dデータ視覚化と分析のための強力なプラットフォームだが、新しいユーザーにとって大きな学習曲線がある。 ChatGPTのような生成AIアプリケーションは、自然言語を使ってさまざまなドキュメントソース間のギャップを埋める潜在的な方法として登場した。しかし、3DスライダのドキュメンテーションへのLLMサービスの露出は限られているため、ChatGPTと関連するサービスは幻覚に悩まされる傾向にある。このプロジェクトの目的は、SlicerChatと呼ばれるチャットボットアーキテクチャを構築することであり、3D Slicer関連の質問に答え、オープンソースモデルを使用してローカルで実行できるように最適化されている。この研究で調査された中核的な質問は、微調整、モデルサイズ、そしてプロンプトに含まれるドメイン知識の種類による、回答の品質と速度の違いに関するものだ。プロトタイプのSlicerChatシステムは、Code-Llama Instructアーキテクチャに基づいた3Dスライダのカスタム拡張として開発された。低階適応を用いてサイズ1.1B,7B,13Bのモデルを微調整し、3Dスライダドキュメンテーションの様々なソースを検索型拡張生成パラダイムで使用するためにコンパイルした。 5つの3D Slicer質問のベンチマークデータセットで、ファインチューニングとモデルサイズの組み合わせをテストすると、ファインチューニングはベースアーキテクチャと比較してモデル性能や速度に影響を与えず、より大きなモデルの方が大幅に速度を低下させる結果となった。プロンプトに3Dスライダのドキュメンテーションを追加する実験では、PythonのサンプルコードとMarkdownのドキュメンテーションが最も有用な情報であるが、3DスライダのシーンデータとDiscourseからの質問もモデルのパフォーマンスを改善した。結論として、このプロジェクトは高品質でローカルなチャットボットを3D Slicerに直接統合し、新しいユーザーや経験豊富な開発者がソフトウェアをより効率的に使えるようにする可能性を示している。

3D Slicer is a powerful platform for 3D data visualization and analysis, but has a significant learning curve for new users. Generative AI applications, such as ChatGPT, have emerged as a potential method of bridging the gap between various sources of documentation using natural language. The limited exposure of LLM services to 3D Slicer documentation, however, means that ChatGPT and related services tend to suffer from significant hallucination. The objective of this project is to build a chatbot architecture, called SlicerChat, that is optimized to answer 3D Slicer related questions and able to run locally using an open-source model. The core research questions explored in this work revolve around the answer quality and speed differences due to fine-tuning, model size, and the type of domain knowledge included in the prompt. A prototype SlicerChat system was built as a custom extension in 3D Slicer based on the Code-Llama Instruct architecture. Models of size 1.1B, 7B and 13B were fine-tuned using Low rank Adaptation, and various sources of 3D Slicer documentation were compiled for use in a Retrieval Augmented Generation paradigm. Testing combinations of fine-tuning and model sizes on a benchmark dataset of five 3D Slicer questions revealed that fine-tuning had no impact on model performance or speed compared to the base architecture, and that larger models performed better with a significant speed decrease. Experiments with adding 3D Slicer documentation to the prompt showed that Python sample code and Markdown documentation were the most useful information to include, but that adding 3D Slicer scene data and questions taken from Discourse also improved model performance. In conclusion, this project shows the potential for integrating a high quality, local chatbot directly into 3D Slicer to help new users and experienced developers alike to more efficiently use the software.

翻訳日:2024-07-22 11:50:18 公開日:2024-06-05

# メタフォリックパラフレーズを用いたよりハードなクロスドキュメントイベント参照解決データセットの生成

Generating Harder Cross-document Event Coreference Resolution Datasets using Metaphoric Paraphrasing ( http://arxiv.org/abs/2407.11988v1 )

ライセンス: Link先を確認

Shafiuddin Rehan Ahmed, Zhiyong Eric Wang, George Arthur Baker, Kevin Stowe, James H. Martin,

(参考訳) 最も一般的なクロスドキュメントイベント共参照解決(CDEC)データセットは、共参照するイベントトリガ(イベントを参照する単語やフレーズ)間の語彙的多様性が欠如しているため、タスクの真の難しさを伝えることができない。さらに、比喩的言語のためのイベントデータセットが不足しており、イベント理解における重要な研究の道を制限している。我々は、象徴的・比喩的な言語におけるCDECのための、Event Coref Bank Plus(ECB+)の語彙的に豊かな変種であるECB+METAを導入することで、これら2つの問題に対処する。我々は、ECB+の文書中の文を比喩的に変換するツールとしてChatGPTを使用し、変換された文の中で元のイベントトリガを半自動的にタグ付けする。このようにして、高コストな共参照リンクの再アノテーションを避ける。ECB+でうまく機能する既存の手法がECB+METAでは苦戦することを示す結果を提示し、より困難なデータセットでのCDEC研究への道を開く。コード/データ:https://github.com/ahmeshaf/llms_coref

The most popular Cross-Document Event Coreference Resolution (CDEC) datasets fail to convey the true difficulty of the task, due to the lack of lexical diversity between coreferring event triggers (words or phrases that refer to an event). Furthermore, there is a dearth of event datasets for figurative language, limiting a crucial avenue of research in event comprehension. We address these two issues by introducing ECB+META, a lexically rich variant of Event Coref Bank Plus (ECB+) for CDEC on symbolic and metaphoric language. We use ChatGPT as a tool for the metaphoric transformation of sentences in the documents of ECB+, then tag the original event triggers in the transformed sentences in a semi-automated manner. In this way, we avoid the re-annotation of expensive coreference links. We present results that show existing methods that work well on ECB+ struggle with ECB+META, thereby paving the way for CDEC research on a much more challenging dataset. Code/data: https://github.com/ahmeshaf/llms_coref

翻訳日:2024-07-22 11:30:12 公開日:2024-06-05

# 大規模言語モデルにおけるヘッド・オブ・ライン・ブロッキングの解決に必要なのは1つのキュー

One Queue Is All You Need: Resolving Head-of-Line Blocking in Large Language Model Serving ( http://arxiv.org/abs/2407.00047v1 )

ライセンス: Link先を確認

Archit Patke, Dhemath Reddy, Saurabh Jha, Haoran Qiu, Christian Pinto, Shengkun Cui, Chandra Narayanaswami, Zbigniew Kalbarczyk, Ravishankar Iyer,

(参考訳) LLM(Large Language Model)は,エンタープライズアプリケーションとコンシューマアプリケーションの両方を対象とするクラウドプロバイダにとって,ますます重要なワークロードになっている。これらのアプリケーションからのLLM推論要求には、本番環境で遵守しなければならないエンドツーエンドのレイテンシSLOがある。しかし、既存のLLMサービスシステムは、エンドツーエンドのレイテンシSLOよりも、要求サービススループットや要求実行遅延といった最適化目標に重点を置いている。レイテンシに敏感なリクエストに対するエンドツーエンドのSLOを達成することは、バースト的な到着率とリソース不足に起因するリクエストキューのヘッド・オブ・ライン(HOL)ブロッキングのために困難である。上記の課題に対処するため,LLMサービスのためのマルチモデルキュー管理フレームワークであるQLMを提案する。 QLMは確率計画法を用いて、複数のLLMサービングオペレーション(LSO)の動作をオーケストレーションし、HOLブロッキングを減らし、SLO達成を最大化する。具体的には、QLMはモデルスワップ、リクエストの退避、GPU-CPU状態スワップ、ロードバランシング、ウォームモデルスタートというLSOを使用する。実世界のLLMサービスデータセットを用いた異種GPUデバイスおよびモデルでの評価は、QLMが他の最先端のLLMサービスシステムと比較して、デバイス利用率を維持または改善しながら、SLOの達成率を40-90%、スループットを20-400%向上させることを示している。

Large language models (LLMs) have become an increasingly important workload for cloud providers catering to both enterprise and consumer applications. LLM inference requests from these applications have end-to-end latency SLOs that must be adhered to in production settings. However, existing LLM serving systems focus on optimization objectives such as request serving throughput or request execution latency rather than the end-to-end latency SLOs. Achieving end-to-end SLOs for latency-sensitive requests is challenging due to head-of-line (HOL) blocking in the request queue, which results from bursty arrival rates and insufficient resources. To address the above challenge, we propose QLM, a multi-model queue management framework for LLM serving. QLM uses stochastic programming to orchestrate the actions of multiple LLM Serving Operations (LSOs) to reduce HOL blocking and maximize SLO attainment. Specifically, QLM uses the following LSOs: model swapping, request eviction, GPU-CPU state swapping, load balancing, and warm model start. Evaluation on heterogeneous GPU devices and models with real-world LLM serving dataset shows that QLM improves SLO attainment by 40-90% and throughput by 20-400% while maintaining or improving device utilization compared to other state-of-the-art LLM serving systems.

翻訳日:2024-07-07 13:43:41 公開日:2024-06-05
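
To make the head-of-line blocking mentioned in the abstract above concrete, here is a toy single-server sketch. It is not QLM's method (the abstract describes stochastic programming over several serving operations); it only compares first-in-first-out service with a simple earliest-deadline-first reordering. The `Request` type, the request names, the service times and the SLO values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str
    service_time: float  # seconds of exclusive server time the request needs
    slo: float           # completion deadline in seconds, measured from t = 0


def slo_attainment(queue):
    """Serve requests one at a time in the given order (all arrive at t = 0)
    and return the fraction that complete within their SLO."""
    if not queue:
        raise ValueError("queue must contain at least one request")
    clock = 0.0
    met = 0
    for request in queue:
        clock += request.service_time
        if clock <= request.slo:
            met += 1
    return met / len(queue)


# One long batch job arrives just ahead of three short interactive requests.
requests = [
    Request("batch-summary", service_time=30.0, slo=120.0),
    Request("chat-1", service_time=1.0, slo=5.0),
    Request("chat-2", service_time=1.0, slo=5.0),
    Request("chat-3", service_time=1.0, slo=5.0),
]

# FIFO: the long job at the head of the line delays every short request.
fifo_attainment = slo_attainment(requests)

# Earliest-deadline-first: the short requests are served before the long job.
edf_attainment = slo_attainment(sorted(requests, key=lambda r: r.slo))

if __name__ == "__main__":
    print(f"FIFO SLO attainment: {fifo_attainment:.2f}")
    print(f"EDF  SLO attainment: {edf_attainment:.2f}")
```

In this toy setting the reordering alone removes the blocking. A real serving system also has to account for arrivals over time, multiple models and limited GPU memory, which is where operations such as model swapping and request eviction come in.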

# Block-Toeplitz Augmented Covariance Matrices and Siegel Metricsを用いたモータ画像BCI分類の計算効率の向上

Enhancing Computational Efficiency of Motor Imagery BCI Classification with Block-Toeplitz Augmented Covariance Matrices and Siegel Metric ( http://arxiv.org/abs/2406.16909v1 )

ライセンス: Link先を確認

Igor Carrara, Theodore Papadopoulo,

(参考訳) 脳波信号は多次元データセットとして表現される。運動イメージ分類を改善するために、拡張共分散法(ACM)の数学的性質をより徹底的に活用した改良を導入する。標準的なACMは、力学系の位相空間再構成とリーマン幾何学の組合せとして現れる。実際、ACMは分類を改善するための対称正定値(SPD)行列の構成に基づいている。しかし、この行列は、これまで無視されてきたブロック・テプリッツ構造も持つ。本研究は、そのような行列を、それらが属する実多様体、すなわちブロック・テプリッツSPD行列の集合において扱う。いくつかの操作の後、この集合はSPD多様体とシーゲルディスク空間の積と見なすことができる。提案手法は、MOABBフレームワークを用いてセッション内評価法により検証された。 ACMと同等の分類性能を達成しており、これは一般的に最先端の手法よりも優れているか、最悪でも同等である。さらに、ACMよりも計算効率が向上し、リアルタイム実験にさらに適したものとなる。

Electroencephalographic signals are represented as multidimensional datasets. We introduce an enhancement to the augmented covariance method (ACM), exploiting more thoroughly its mathematical properties, in order to improve motor imagery classification. Standard ACM emerges as a combination of phase space reconstruction of dynamical systems and of Riemannian geometry. Indeed, it is based on the construction of a Symmetric Positive Definite (SPD) matrix to improve classification. But this matrix also has a Block-Toeplitz structure that was previously ignored. This work treats such matrices in the real manifold to which they belong: the set of Block-Toeplitz SPD matrices. After some manipulation, this set can be seen as the product of an SPD manifold and a Siegel Disk Space. The proposed methodology was tested using the MOABB framework with a within-session evaluation procedure. It achieves a similar classification performance to ACM, which is typically better than -- or at worst comparable to -- state-of-the-art methods. But it also consequently improves the computational efficiency over ACM, making it even more suitable for real-time experiments.

翻訳日:2024-07-01 06:41:31 公開日:2024-06-05
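
The augmented covariance matrix mentioned in the abstract above can be sketched by stacking time-delayed copies of each channel and taking the sample covariance. This is a minimal illustration assuming a plain delay embedding; the function name, the `order` and `delay` parameters, and the synthetic two-channel signal are hypothetical, and the paper's Siegel-disk treatment of the resulting matrices is not shown.

```python
import math


def augmented_covariance(signal, order, delay=1):
    """Sample covariance of a delay-embedded multichannel signal.

    signal: list of channels, each a list of samples.
    Returns a square matrix of size order * n_channels as nested lists.
    """
    n_channels = len(signal)
    n_samples = len(signal[0]) - (order - 1) * delay
    # Stack `order` delayed copies of every channel (phase-space embedding).
    rows = [
        signal[c][k * delay : k * delay + n_samples]
        for k in range(order)
        for c in range(n_channels)
    ]
    means = [sum(r) / n_samples for r in rows]
    centered = [[v - m for v in r] for r, m in zip(rows, means)]
    dim = len(rows)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (n_samples - 1)
            for j in range(dim)
        ]
        for i in range(dim)
    ]


# Synthetic two-channel stand-in for an EEG epoch (not real data).
samples = range(2000)
eeg_like = [
    [math.sin(0.1 * n) for n in samples],
    [math.cos(0.1 * n + 0.3) for n in samples],
]
acm = augmented_covariance(eeg_like, order=2, delay=5)
```

With two channels and `order=2` the result is a 4x4 symmetric matrix made of 2x2 blocks. For a stationary signal the diagonal blocks agree up to finite-sample edge effects, which is the approximate block-Toeplitz structure the abstract refers to.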

# 心の目:マルチモーダル類似性学習による脳波による画像認識

Mind's Eye: Image Recognition by EEG via Multimodal Similarity-Keeping Contrastive Learning ( http://arxiv.org/abs/2406.16910v1 )

ライセンス: Link先を確認

Chi-Sheng Chen, Chun-Shu Wei,

(参考訳) 非侵襲脳波(EEG)信号からの画像の復号は、人間の脳がどのように視覚情報を現実世界のシナリオで処理するかを理解する上で大きな課題である。信号対雑音比と非定常性の問題に対処するために,ゼロショット脳波画像分類のためのMUltimodal Similarity-keeping contrastivE learning (MUSE) フレームワークを提案する。我々は、脳波信号に適した一連の多変量時系列エンコーダを開発し、広範囲な視覚的脳波データセットを用いて、正則化されたコントラスト脳波画像事前学習の有効性を評価する。本手法は,200クラスのゼロショット画像分類において,トップ1の精度が19.3%,トップ5の精度が48.8%の最先端性能を実現する。さらに、モデル解釈による神経パターンの可視化を行い、人間の脳の視覚的処理のダイナミクスに光を当てる。この作業のコードリポジトリは、https://github.com/ChiShengChen/MUSE_EEG で公開されている。

Decoding images from non-invasive electroencephalographic (EEG) signals has been a grand challenge in understanding how the human brain processes visual information in real-world scenarios. To cope with the issues of signal-to-noise ratio and nonstationarity, this paper introduces a MUltimodal Similarity-keeping contrastivE learning (MUSE) framework for zero-shot EEG-based image classification. We develop a series of multivariate time-series encoders tailored for EEG signals and assess the efficacy of regularized contrastive EEG-Image pretraining using an extensive visual EEG dataset. Our method achieves state-of-the-art performance, with a top-1 accuracy of 19.3% and a top-5 accuracy of 48.8% in 200-way zero-shot image classification. Furthermore, we visualize neural patterns via model interpretation, shedding light on the visual processing dynamics in the human brain. The code repository for this work is available at: https://github.com/ChiShengChen/MUSE_EEG.

翻訳日:2024-07-01 06:31:46 公開日:2024-06-05
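
The top-1 and top-5 figures quoted in the abstract above are top-k accuracies over candidate classes. A common way to score zero-shot classification, assumed here rather than taken from the paper, is to rank class embeddings by cosine similarity to each EEG embedding. The function names and the tiny 2-D embeddings below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_accuracy(eeg_embeddings, labels, class_embeddings, k):
    """Fraction of samples whose true class is among the k most similar classes."""
    hits = 0
    for emb, label in zip(eeg_embeddings, labels):
        ranked = sorted(
            range(len(class_embeddings)),
            key=lambda c: cosine(emb, class_embeddings[c]),
            reverse=True,
        )
        hits += label in ranked[:k]
    return hits / len(labels)


# Hypothetical embeddings: three image classes, two EEG trials that both belong to class 0.
class_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
eeg_embeddings = [[0.9, 0.1], [0.6, 0.8]]
labels = [0, 0]

top1 = top_k_accuracy(eeg_embeddings, labels, class_embeddings, k=1)
top2 = top_k_accuracy(eeg_embeddings, labels, class_embeddings, k=2)
```

The second trial is closer to class 1 than to its true class, so it counts as a miss at k=1 and a hit at k=2. This is why top-5 accuracy is always at least as high as top-1.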

# ナノダイヤモンドセンサを用いた動的非局所変形の測定

Measurement of dynamic nonlocal deformation using nanodiamond sensors ( http://arxiv.org/abs/2406.18577v1 )

ライセンス: Link先を確認

Yue Cui, Weng-Hang Leong, Guoli Zhu, Ren-Bao Liu, Quan Li,

(参考訳) 原子間力顕微鏡によるインデンテーションとナノダイアモンドによる配向追跡を統合した非局所変形検出は、高精度で空間分解能が高く、ソフトバイオシステムの機械的特性を研究するのに有用な技術である。しかし、この技術は現在、生体活動や他の外部の摂動とインデンテーションによる変形を区別できないため、生命の無いシステムに限られている。そこで我々は,この制限を克服するために,振動ナノインデンテーションと分光分析を用いた動的非局所変形検出法を開発した。粘弾性材料と生体細胞の機械的応答における表面・界面効果の開示につながる、時間的および空間的に解決された機械的解析を、数十マイクロ秒のタイムラグ精度、ナノメートルの垂直変形精度、およびサブハンドレッドナノメートルの空間的解像度で実現する。表面張力の無視は、材料の液体のような特性を過小評価する。この研究は、軟質で複雑な生体関連物質の時空間力学的解析の有用なツールとしてナノダイヤモンドセンサーを実証する。

Nonlocal deformation sensing achieved by integrating atomic force microscopy indentation with nanodiamond-based orientation tracking features high precision and high spatial resolution, providing a useful technique for studying the mechanical properties of soft biological systems. However, this technique is currently limited to lifeless systems because it cannot differentiate the indentation-induced deformation from that associated with live activities or other external perturbations. Here we develop a dynamic nonlocal deformation sensing method using oscillatory nanoindentation and spectroscopic analysis to overcome this limitation. The method realizes both temporally and spatially resolved mechanical analysis, with tens of microsecond time-lag precision, nanometer vertical deformation precision, and sub-hundred nanometer lateral spatial resolution, leading to the disclosure of surface/interface effects in the mechanical response of viscoelastic materials and live cells. Neglecting surface tension would underestimate the liquid-like characteristics of the materials. This work demonstrates nanodiamond sensors as a useful tool for spatial-temporal mechanical analysis of soft, complex bio-relevant materials.

翻訳日:2024-07-01 05:50:36 公開日:2024-06-05

# Hire: 画像テキストマッチングのためのハイブリッドモーダルインタラクションとマルチリレーショナルエンハンスメント

Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching ( http://arxiv.org/abs/2406.18579v1 )

ライセンス: Link先を確認

Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Jie Wang, Joemon M. Jose,

(参考訳) 画像テキストマッチング(ITM)はコンピュータビジョンの基本的な問題である。重要な問題は、視覚とテキストの表現を共同で学習し、それらの類似性を正確に見積もることである。既存のほとんどの手法は、モダリティにおける特徴強化や、モダリティ間の特徴相互作用に重点を置いているが、それにもかかわらず、対応する文とリッチな文脈意味論に一致するオブジェクト間の関係に基づいて、オブジェクト表現の文脈情報を無視している。本稿では,オブジェクトと単語間のモーダル間セマンティクスを暗黙的および明示的関係モデリングで関連づける,画像テキストマッチングのための複合モーダルインタラクションとマルチリレーショナルエンハンスメント(termed \textit{Hire})を提案する。特に、明示的なモーダル空間意味グラフに基づく推論ネットワークは、オブジェクトの空間位置とシーングラフの明示的な関係によって導かれる、空間的および意味的な関係性を持つ視覚オブジェクトの文脈的表現を改善するように設計されている。我々は、明示的な関係検出の耐障害性を改善するために、明示的なモデリングの前に潜在的な関係の相互作用に暗黙的な関係のモデリングを用いる。そして、視覚的およびテキスト的意味表現は、モーダル間対話的注意とモーダル間アライメントによって共同で洗練される。オブジェクトのコンテキストとテキストのコンテキストを関連付けるため、クロスレベルなオブジェクト文と単語画像に基づく対話的注意による視覚的意味表現をさらに洗練する。広汎な実験により、暗黙的および明示的なモデリングとのハイブリッド・モーダル相互作用が画像テキストマッチングにおいてより有益であることが検証された。提案した \textit{Hire} は MS-COCO と Flickr30K のベンチマークで新しい最先端結果を得る。

Image-text matching (ITM) is a fundamental problem in computer vision. The key issue lies in jointly learning the visual and textual representation to estimate their similarity accurately. Most existing methods focus on feature enhancement within modality or feature interaction across modalities, which, however, neglects the contextual information of the object representation based on the inter-object relationships that match the corresponding sentences with rich contextual semantics. In this paper, we propose a Hybrid-modal Interaction with multiple Relational Enhancements (termed \textit{Hire}) for image-text matching, which correlates the intra- and inter-modal semantics between objects and words with implicit and explicit relationship modelling. In particular, the explicit intra-modal spatial-semantic graph-based reasoning network is designed to improve the contextual representation of visual objects with salient spatial and semantic relational connectivities, guided by the explicit relationships of the objects' spatial positions and their scene graph. We use implicit relationship modelling for potential relationship interactions before explicit modelling to improve the fault tolerance of explicit relationship detection. Then the visual and textual semantic representations are refined jointly via inter-modal interactive attention and cross-modal alignment. To correlate the context of objects with the textual context, we further refine the visual semantic representation via cross-level object-sentence and word-image-based interactive attention. Extensive experiments validate that the proposed hybrid-modal interaction with implicit and explicit modelling is more beneficial for image-text matching. And the proposed \textit{Hire} obtains new state-of-the-art results on MS-COCO and Flickr30K benchmarks.

翻訳日:2024-07-01 05:50:36 公開日:2024-06-05

# 大規模生成ネットワークに光を当てる:拡散モデルにおける認識論的不確かさの推定

Shedding Light on Large Generative Networks: Estimating Epistemic Uncertainty in Diffusion Models ( http://arxiv.org/abs/2406.18580v1 )

ライセンス: Link先を確認

Lucas Berry, Axel Brando, David Meger,

(参考訳) 生成拡散モデルは、パラメータ数が大きく(1億を超える)、高次元の画像空間で動作することで知られており、その計算要求のために従来の不確実性推定手法に重大な課題を提起する。本研究では、拡散モデルの認識論的不確実性を推定するために設計された革新的なフレームワーク、Diffusion Ensembles for Capturing Uncertainty (DECU) を紹介する。 DECUフレームワークは、事前訓練されたパラメータの静的なセットを組み込むことで条件付き拡散モデルのアンサンブルを効率的に訓練する新しい手法を導入し、計算負担と訓練を必要とするパラメータの数を大幅に削減する。さらに、DECUはPairwise-Distance Estimators (PaiDEs) を用いて、高次元空間におけるモデル出力と重みの間の相互情報量を評価することで、認識論的不確実性を正確に測定する。このフレームワークの有効性は、ImageNetデータセットの実験を通じて実証され、特にサンプル数の少ない画像クラスにおいて、認識論的不確実性を捉える能力が強調されている。

Generative diffusion models, notable for their large parameter count (exceeding 100 million) and operation within high-dimensional image spaces, pose significant challenges for traditional uncertainty estimation methods due to computational demands. In this work, we introduce an innovative framework, Diffusion Ensembles for Capturing Uncertainty (DECU), designed for estimating epistemic uncertainty for diffusion models. The DECU framework introduces a novel method that efficiently trains ensembles of conditional diffusion models by incorporating a static set of pre-trained parameters, drastically reducing the computational burden and the number of parameters that require training. Additionally, DECU employs Pairwise-Distance Estimators (PaiDEs) to accurately measure epistemic uncertainty by evaluating the mutual information between model outputs and weights in high-dimensional spaces. The effectiveness of this framework is demonstrated through experiments on the ImageNet dataset, highlighting its capability to capture epistemic uncertainty, specifically in under-sampled image classes.

翻訳日:2024-07-01 05:50:36 公開日:2024-06-05
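
The pairwise-distance idea named in the abstract above can be sketched for a small ensemble whose members each predict a 1-D Gaussian. The estimator form used here, minus the weighted sum over members of the log of the averaged `exp(-distance)`, follows the general pairwise-distance estimator for mixture entropy as I understand it. The choice of KL divergence as the distance, the uniform member weights, and the Gaussian outputs are assumptions for illustration, not details confirmed by the abstract.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) between two 1-D Gaussians."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


def paide_mutual_information(components):
    """Pairwise-distance estimate of I(output; ensemble member) with uniform weights.

    components: list of (mean, std) pairs, one per ensemble member.
    """
    m = len(components)
    total = 0.0
    for mu_i, s_i in components:
        inner = sum(
            math.exp(-kl_gaussian(mu_i, s_i, mu_j, s_j)) / m
            for mu_j, s_j in components
        )
        total -= math.log(inner) / m
    return total


# Members that agree: no disagreement, so the estimate is zero.
agree = paide_mutual_information([(0.0, 1.0), (0.0, 1.0)])
# Members that disagree strongly: the estimate approaches ln(2) for two members.
disagree = paide_mutual_information([(0.0, 1.0), (100.0, 1.0)])
```

The estimate is bounded by the log of the ensemble size, so with uniform weights it acts as a normalised measure of how much the members disagree. That disagreement is what an ensemble-based view of epistemic uncertainty tries to capture.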

# スタイル化スコア蒸留によるDream-in-Style:テキスト・ツー・3D生成

Dream-in-Style: Text-to-3D Generation using Stylized Score Distillation ( http://arxiv.org/abs/2406.18581v1 )

ライセンス: Link先を確認

Hubert Kompanowski, Binh-Son Hua,

(参考訳) 本稿では,3次元オブジェクトをスタイルで生成する手法を提案する。提案手法では,テキストプロンプトとスタイル参照イメージを入力として取り込んでニューラルラディアンスフィールドを再構成し,テキストプロンプトと参照画像に続くスタイルに整合した3Dモデルを合成する。 3Dオブジェクトを同時に生成し,一行でスタイル転送を行うために,テキストから3Dまでの最適化プロセスを導出し,視覚的に可視な形状と外観を出力するスタイリングされたスコア蒸留損失を提案する。本発明のスタイライズされたスコア蒸留は,従来の事前訓練されたテキスト・ツー・イメージモデルと,参照画像からスタイルを注入するために操作された自己保持層のキーと値の特徴を組み合わさったものである。最新の手法との比較により,本手法の強い視覚的性能が示され,ユーザ研究の定量的結果によってさらに裏付けられた。

We present a method to generate 3D objects in styles. Our method takes a text prompt and a style reference image as input and reconstructs a neural radiance field to synthesize a 3D model with the content aligning with the text prompt and the style following the reference image. To simultaneously generate the 3D object and perform style transfer in one go, we propose a stylized score distillation loss to guide a text-to-3D optimization process to output visually plausible geometry and appearance. Our stylized score distillation is based on a combination of an original pretrained text-to-image model and its modified sibling with the key and value features of self-attention layers manipulated to inject styles from the reference image. Comparisons with state-of-the-art methods demonstrated the strong visual performance of our method, further supported by the quantitative results from our user study.

翻訳日:2024-07-01 05:50:36 公開日:2024-06-05

# 正準整合場:点雲からの動的形状の再構成

Canonical Consolidation Fields: Reconstructing Dynamic Shapes from Point Clouds ( http://arxiv.org/abs/2406.18582v1 )

ライセンス: Link先を確認

Miaowei Wang, Changjian Li, Amir Vaxman,

(参考訳) Canonical Consolidation Fields (CanFields) は、独立にサンプリングされた点群の時系列を、変形する単一の一貫した形状に再構成する手法である。このような入力は、しばしばモーションキャプチャーから来る。既存の手法は幾何と変形を組み合わせ、細部を滑らかにし、移動点を追跡する能力を失うか、あるいは変形を明示的に追跡するが、位相的および幾何学的アーティファクトを導入する。我々の斬新さは、ノイズや外れ値の影響を低減し、欠落した領域を克服できる方法で、点群を単一の標準形にまとめることにある。変形を導く速度場を同時に再構築する。この統合により、低周波変形を忠実に再現しながら、幾何学の高周波詳細を維持できる。私たちのアーキテクチャは単純なコンポーネントで構成されており、データセットを使わずに任意の入力形状に適合します。提案手法のロバスト性および精度を,欠落領域,スパースフレーム,ノイズを含む多様な動的点群のベンチマークで示す。

We present Canonical Consolidation Fields (CanFields): a method for reconstructing a time series of independently-sampled point clouds into a single deforming coherent shape. Such input often comes from motion capture. Existing methods either couple the geometry and the deformation, where by doing so they smooth fine details and lose the ability to track moving points, or they track the deformation explicitly, but introduce topological and geometric artifacts. Our novelty lies in the consolidation of the point clouds into a single canonical shape in a way that reduces the effect of noise and outliers, and enables us to overcome missing regions. We simultaneously reconstruct the velocity fields that guide the deformation. This consolidation allows us to retain the high-frequency details of the geometry, while faithfully reproducing the low-frequency deformation. Our architecture comprises simple components, and fits any single input shape without using datasets. We demonstrate the robustness and accuracy of our methods on a diverse benchmark of dynamic point clouds, including missing regions, sparse frames, and noise.

翻訳日:2024-07-01 05:40:31 公開日:2024-06-05

# Lumina-Next:Next-DiTでLumina-T2Xをより強く高速に

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT ( http://arxiv.org/abs/2406.18583v1 )

ライセンス: Link先を確認

Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao,

(参考訳) Lumina-T2Xは、フローベースの大規模拡散変換器の初期段階のファミリーであり、画像やビデオなどの様々なモダリティにノイズを変換する統一的なフレームワークを確立し、テキスト命令で条件付けされている。その有望な機能にもかかわらず、Lumina-T2Xは、トレーニング不安定、遅い推論、外挿アーティファクトなどの課題に直面している。本稿では,Lumina-T2Xの改良版であるLumina-Nextについて述べる。本稿では,Flag-DiTアーキテクチャの包括的解析から始めて、いくつかの最適でないコンポーネントを同定し、3D RoPEとサンドイッチ正規化を備えたNext-DiTアーキテクチャを導入することでこれらに対処する。より高分解能な外挿を実現するために,3D RoPEとテキスト・画像生成に適用された異なるコンテキスト外挿手法を徹底的に比較し,拡散トランスフォーマに適した周波数・時間対応スケール付き RoPE を提案する。さらに,フローODEとコンテキストドロップ法を解く際のサンプリングステップを削減するためのシグモイド時間離散化スケジュールを導入し,冗長な視覚トークンをマージしてネットワーク評価を高速化し,全体のサンプリング速度を効果的に向上させた。これらの改善により、Lumina-Nextは基本的なテキスト・ツー・イメージ生成の品質と効率を向上するだけでなく、デコーダベースのLLMをテキストエンコーダとして使い、優れた解像度外挿機能と多言語生成をゼロショットで実現している。汎用的な生成フレームワークとしてLumina-Nextをさらに検証するために、視覚認識、マルチビュー、オーディオ、音楽、ポイントクラウド生成など様々なタスクをインスタンス化し、これらの領域で強いパフォーマンスを示す。すべてのコードとモデルウェイトをリリースすることにより、ユニバーサルモデリングが可能な次世代生成AIの開発を進めることを目指している。

Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lumina-Next, an improved version of Lumina-T2X, showcasing stronger generation performance with increased training and inference efficiency. We begin with a comprehensive analysis of the Flag-DiT architecture and identify several suboptimal components, which we address by introducing the Next-DiT architecture with 3D RoPE and sandwich normalizations. To enable better resolution extrapolation, we thoroughly compare different context extrapolation methods applied to text-to-image generation with 3D RoPE, and propose Frequency- and Time-Aware Scaled RoPE tailored for diffusion transformers. Additionally, we introduced a sigmoid time discretization schedule to reduce sampling steps in solving the Flow ODE and the Context Drop method to merge redundant visual tokens for faster network evaluation, effectively boosting the overall sampling speed. Thanks to these improvements, Lumina-Next not only improves the quality and efficiency of basic text-to-image generation but also demonstrates superior resolution extrapolation capabilities and multilingual generation using decoder-based LLMs as the text encoder, all in a zero-shot manner. To further validate Lumina-Next as a versatile generative framework, we instantiate it on diverse tasks including visual recognition, multi-view, audio, music, and point cloud generation, showcasing strong performance across these domains. By releasing all codes and model weights, we aim to advance the development of next-generation generative AI capable of universal modeling.

翻訳日:2024-07-01 05:40:31 公開日:2024-06-05

# ロボットマニピュレーションのための不変マッチングを用いたワンショット模倣学習

One-Shot Imitation Learning with Invariance Matching for Robotic Manipulation ( http://arxiv.org/abs/2405.13178v2 )

ライセンス: Link先を確認

Xinyu Zhang, Abdeslam Boularias,

(参考訳) 多様な操作タスクを実行できる単一の普遍的なポリシーを学ぶことは、ロボティクスにおける有望な新しい方向性である。しかし、既存のテクニックは、トレーニング中に遭遇したタスクのみを実行できるポリシーの学習に限られており、新しいタスクを学ぶために多数のデモを必要とする。一方、人間は多くの場合、アノテーションのない1つのデモンストレーションから新しいタスクを学ぶことができる。そこで本研究では,IMOP(Invariance-Matching One-shot Policy Learning)アルゴリズムを提案する。エンドエフェクタのポーズを直接学習する標準的なプラクティスとは対照的に、IMOPはまず与えられたタスクの状態空間の不変領域を学習し、次にデモとテストシーン間の不変領域をマッチングしてエンドエフェクタのポーズを計算する。 IMOPは18のRLBenchタスクで訓練され、最先端手法を一貫して上回り、18タスクの平均で4.5%高い成功率を達成した。さらに重要なことは、IMOPはアノテーションのない1つのデモから新しいタスクを学習でき、微調整なしで、9つのカテゴリから選択された22の新規タスクに対して、最先端手法よりも平均11.5%高い成功率を達成することである。 IMOPはまた、新しい形状に一般化し、デモと異なるオブジェクトを操作することを学べる。さらに、IMOPは1つの実ロボットデモを用いて、ワンショットのsim-to-real転送を行うことができる。

Learning a single universal policy that can perform a diverse set of manipulation tasks is a promising new direction in robotics. However, existing techniques are limited to learning policies that can only perform tasks that are encountered during training, and require a large number of demonstrations to learn new tasks. Humans, on the other hand, often can learn a new task from a single unannotated demonstration. In this work, we propose the Invariance-Matching One-shot Policy Learning (IMOP) algorithm. In contrast to the standard practice of learning the end-effector's pose directly, IMOP first learns invariant regions of the state space for a given task, and then computes the end-effector's pose through matching the invariant regions between demonstrations and test scenes. Trained on the 18 RLBench tasks, IMOP achieves a success rate that outperforms the state-of-the-art consistently, by 4.5% on average over the 18 tasks. More importantly, IMOP can learn a novel task from a single unannotated demonstration, and without any fine-tuning, and achieves an average success rate improvement of 11.5% over the state-of-the-art on 22 novel tasks selected across nine categories. IMOP can also generalize to new shapes and learn to manipulate objects that are different from those in the demonstration. Further, IMOP can perform one-shot sim-to-real transfer using a single real-robot demonstration.

翻訳日:2024-06-23 14:05:12 公開日:2024-06-05

# 乱流におけるスイミングのためのアクター・クリティカル強化学習における物理インフォームド批判

Physics-Informed Critic in an Actor-Critic Reinforcement Learning for Swimming in Turbulence ( http://arxiv.org/abs/2406.10242v1 )

ライセンス: Link先を確認

Christopher Koh, Laurent Pagnier, Michael Chertkov,

(参考訳) 乱流拡散は、近接して置かれた粒子を引き離す。受動的に移流される粒子の近くに粒子を維持するために必要な遊泳の労力について検討する。本研究では、新しい物理情報強化学習(PIRL)戦略を開発し、所定制御(PC)戦略および標準的な物理非依存の強化学習戦略と比較することにより、これらの労力と意図した目標との最適なバランスを探る。我々のPIRLスキームはActor-Physicistと呼ばれ、Actor-Criticアルゴリズムを適応させたものであり、ニューラルネットワークでパラメータ化されたCriticを、解析的に導出された物理的ヒューリスティック関数(physicist)に置き換える。この戦略を、確率的最適制御の定式化から解析的に計算された最適PCポリシー、および標準的な物理非依存のActor-Critic型アルゴリズムと比較する。

Turbulent diffusion causes particles placed in proximity to separate. We investigate the required swimming efforts to maintain a particle close to its passively advected counterpart. We explore optimally balancing these efforts with the intended goal by developing and comparing a novel Physics-Informed Reinforcement Learning (PIRL) strategy with prescribed control (PC) and standard physics-agnostic Reinforcement Learning strategies. Our PIRL scheme, coined the Actor-Physicist, is an adaptation of the Actor-Critic algorithm in which the Neural Network parameterized Critic is replaced with an analytically derived physical heuristic function (the physicist). This strategy is then compared with an analytically computed optimal PC policy derived from a stochastic optimal control formulation and standard physics-agnostic Actor-Critic type algorithms.

翻訳日:2024-06-23 13:35:51 公開日:2024-06-05

# フェイクニュースの検出における大規模言語モデルの有効性の評価:比較分析

Evaluating the Efficacy of Large Language Models in Detecting Fake News: A Comparative Analysis ( http://arxiv.org/abs/2406.06584v1 )

ライセンス: Link先を確認

Sahas Koka, Anthony Vuong, Anish Kataria,

(参考訳) 人工知能の影響がますます高まる時代において、偽ニュースの検出は特に、誤報が社会に重大な影響を及ぼす選挙シーズンのような文脈において重要である。本研究では,偽ニュースコンテンツの識別・フィルタリングにおける各種LLMの有効性について検討した。比較分析アプローチを用いて、GPT-4、Claude 3 Sonnet、Gemini Pro 1.0、Mistral Largeの4つの大きなLLMと、Gemma 7BとMistral 7Bの2つの小さなLLMをテストした。 Kaggleのフェイクニュースデータセットのサンプルを使用することで、この研究はフェイクニュース検出におけるLLMの現在の能力と限界に光を当てるだけでなく、AI駆動の情報整合性向上における開発者や政策立案者の影響についても議論する。

In an era increasingly influenced by artificial intelligence, the detection of fake news is crucial, especially in contexts like election seasons where misinformation can have significant societal impacts. This study evaluates the effectiveness of various LLMs in identifying and filtering fake news content. Utilizing a comparative analysis approach, we tested four large LLMs -- GPT-4, Claude 3 Sonnet, Gemini Pro 1.0, and Mistral Large -- and two smaller LLMs -- Gemma 7B and Mistral 7B. By using fake news dataset samples from Kaggle, this research not only sheds light on the current capabilities and limitations of LLMs in fake news detection but also discusses the implications for developers and policymakers in enhancing AI-driven informational integrity.

翻訳日:2024-06-12 21:24:05 公開日:2024-06-05

# 離散時間力学系の解釈可能なモデルに対する表現的記号回帰

Expressive Symbolic Regression for Interpretable Models of Discrete-Time Dynamical Systems ( http://arxiv.org/abs/2406.06585v1 )

ライセンス: Link先を確認

Adarsh Iyer, Nibodh Boddupalli, Jeff Moehlis,

(参考訳) 離散時間力学系(反復写像)を定義する解釈可能な数学的表現は、科学的に関心のある多くの現象をモデル化でき、システムの振る舞いをより深く理解することを可能にする。第一原理から支配方程式を定式化するのは難しい場合があるため、データストリームのみが与えられたときに反復写像の表現を同定することが特に重要である。本研究では、この課題に対して、文献中の他のアーキテクチャよりも表現力の高い、修正版のSymbolic Artificial Neural Network-Trained Expressions (SymANNTEx) アーキテクチャを検討する。回帰を最適化するためにモデルパイプラインを修正し、いくつかの古典的なカオス写像を同定する際の調整後のモデルの挙動を特徴付ける。簡潔性(parsimony)を目的として、スパース性を誘導する重み正則化と情報理論に基づく単純化を実装する。修正したSymANNTExモデルは、単一状態の写像を適切に同定し、二状態アトラクタの近似にもある程度成功することを示す。これらの性能は、データ駆動の科学的発見と解釈に大きな期待を抱かせる。

Interpretable mathematical expressions defining discrete-time dynamical systems (iterated maps) can model many phenomena of scientific interest, enabling a deeper understanding of system behaviors. Since formulating governing expressions from first principles can be difficult, it is of particular interest to identify expressions for iterated maps given only their data streams. In this work, we consider a modified Symbolic Artificial Neural Network-Trained Expressions (SymANNTEx) architecture for this task, an architecture more expressive than others in the literature. We make a modification to the model pipeline to optimize the regression, then characterize the behavior of the adjusted model in identifying several classical chaotic maps. With the goal of parsimony, sparsity-inducing weight regularization and information theory-informed simplification are implemented. We show that our modified SymANNTEx model properly identifies single-state maps and achieves moderate success in approximating a dual-state attractor. These performances offer significant promise for data-driven scientific discovery and interpretation.

翻訳日:2024-06-12 21:24:05 公開日:2024-06-05
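
The task described in the abstract above, recovering an iterated map from its data stream alone, can be illustrated on the logistic map. This sketch deliberately swaps in a much simpler technique than SymANNTEx: ordinary least squares over a fixed two-term library (x and x squared), solved through the 2x2 normal equations. The function names and the parameter values are hypothetical.

```python
def logistic_orbit(r, x0, n):
    """Generate n steps of the logistic map x[k+1] = r * x[k] * (1 - x[k])."""
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


def fit_quadratic_map(xs):
    """Least-squares fit of x[k+1] = a * x[k] + b * x[k]**2 from one orbit."""
    x, y = xs[:-1], xs[1:]
    # Entries of the Gram matrix and right-hand side of the normal equations.
    s11 = sum(v ** 2 for v in x)
    s12 = sum(v ** 3 for v in x)
    s22 = sum(v ** 4 for v in x)
    t1 = sum(v * w for v, w in zip(x, y))
    t2 = sum(v ** 2 * w for v, w in zip(x, y))
    # Solve the 2x2 system with Cramer's rule.
    det = s11 * s22 - s12 ** 2
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b


orbit = logistic_orbit(r=3.9, x0=0.2, n=200)
a, b = fit_quadratic_map(orbit)
```

On noise-free data the fit recovers a close to 3.9 and b close to -3.9, i.e. the expanded form 3.9x - 3.9x². The hard part that symbolic regression addresses, and that this sketch sidesteps, is choosing the functional form when the candidate terms are not known in advance.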

# Bi-Chainer: 双方向チェインで推論する大規模言語モデルを自動化する

Bi-Chainer: Automated Large Language Models Reasoning with Bidirectional Chaining ( http://arxiv.org/abs/2406.06586v1 )

ライセンス: Link先を確認

Shuqi Liu, Bowei He, Linqi Song,

(参考訳) 大規模言語モデル(LLM)は人間のような推論能力を示しているが、複雑な論理問題を解く上ではまだ課題に直面している。前向き連鎖や後ろ向き連鎖のような既存の一方向チェイニング手法は、予測精度や効率の低さといった問題に悩まされる。これらに対処するため、現在の方向で複数の分岐選択肢に遭遇したときに、反対の推論方向の深さ優先推論へ動的に切り替える双方向チェイニング手法であるBi-Chainerを提案する。これにより、中間推論結果をガイダンスとして利用して推論プロセスを促進できる。 Bi-Chainerは、4つの挑戦的な論理推論データセットにおいて、一方向チェイニングのフレームワークに対して大幅な精度向上を達成することを示す。さらに、Bi-Chainerは中間証明ステップの精度を高め、推論呼び出しの平均回数を減らし、より効率的で正確な推論を実現する。

Large Language Models (LLMs) have shown human-like reasoning abilities but still face challenges in solving complex logical problems. Existing unidirectional chaining methods, such as forward chaining and backward chaining, suffer from issues like low prediction accuracy and efficiency. To address these, we propose a bidirectional chaining method, Bi-Chainer, which dynamically switches to depth-first reasoning in the opposite reasoning direction when it encounters multiple branching options within the current direction. Thus, the intermediate reasoning results can be utilized as guidance to facilitate the reasoning process. We show that Bi-Chainer achieves sizable accuracy boosts over unidirectional chaining frameworks on four challenging logical reasoning datasets. Moreover, Bi-Chainer enhances the accuracy of intermediate proof steps and reduces the average number of inference calls, resulting in more efficient and accurate reasoning.

翻訳日:2024-06-12 21:24:05 公開日:2024-06-05

# 感覚体験における人間とAIの知覚アライメントの探索:LLMは繊維の手を理解するか?

Exploring Human-AI Perception Alignment in Sensory Experiences: Do LLMs Understand Textile Hand? ( http://arxiv.org/abs/2406.06587v1 )

ライセンス: Link先を確認

Shu Zhong, Elia Gatti, Youngjun Cho, Marianna Obrist,

(参考訳) 大規模言語モデル(LLM)の振る舞いを人間の意図に合わせること(アライメント)は、将来のAIにとって重要である。このアライメントの重要でありながら見落とされがちな側面が、知覚的アライメントである。触覚のような知覚モダリティは、視覚のような他の感覚モダリティに比べて多面的で微妙である。本研究は、「textile hand(繊維の手触り)」タスクを用いて、LLMが人間の触覚体験とどの程度整合するかを検討する。我々は「Guess What Textile」というインタラクションを作成し、参加者に2つの繊維サンプル(ターゲットと参照)を渡して触ってもらった。参加者はサンプルを見ずに、それらの違いをLLMに説明した。これらの記述を用いて、LLMは高次元埋め込み空間内での類似性を評価することで、ターゲットの繊維の同定を試みた。結果は、ある程度の知覚的アライメントが存在するものの、その程度は繊維サンプルによって大きく異なることを示唆している。例えば、LLMの予測はシルクサテンではよく整合するが、コットンデニムでは整合しない。さらに、参加者は自分の繊維体験がLLMの予測とよく一致しているとは感じなかった。これは、繊維の手触りを例とした、触覚に関する知覚的アライメントの最初の探索にすぎない。このアライメントのばらつきの考えられる原因と、人間とAIの知覚的アライメントの向上が将来の日常的なタスクにどのように役立つかについて議論する。

Aligning the behaviour of large language models (LLMs) with human intent is critical for future AI. An important yet often overlooked aspect of this alignment is perceptual alignment. Perceptual modalities like touch are more multifaceted and nuanced compared to other sensory modalities such as vision. This work investigates how well LLMs align with human touch experiences using the "textile hand" task. We created a "Guess What Textile" interaction in which participants were given two textile samples -- a target and a reference -- to handle. Without seeing them, participants described the differences between them to the LLM. Using these descriptions, the LLM attempted to identify the target textile by assessing similarity within its high-dimensional embedding space. Our results suggest that a degree of perceptual alignment exists, but that it varies significantly among different textile samples. For example, LLM predictions are well aligned for silk satin, but not for cotton denim. Moreover, participants did not perceive their textile experiences as closely matched by the LLM predictions. This is only the first exploration into perceptual alignment around touch, exemplified through textile hand. We discuss possible sources of this alignment variance, and how better human-AI perceptual alignment can benefit future everyday tasks.

翻訳日:2024-06-12 21:14:20 公開日:2024-06-05

# Llama大言語モデルの創発的シンボリック推論能力の評価

Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models ( http://arxiv.org/abs/2406.06588v1 )

ライセンス: Link先を確認

Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti,

(参考訳) 大規模言語モデル (LLM) は,ユーザと流暢にチャットすることだけを目的としてトレーニングされることが多いにもかかわらず,幅広いタスクにおいて優れたパフォーマンスを実現している。その他のスキルの中で、LLMは数学的推論ベンチマークにおいて創発的な能力を示し、適切なプロンプト法によって引き出すことができる。本研究では,様々なシンボリック推論タスクにおいて,人気のあるオープンソースLLMの能力と限界を体系的に検討する。 Llama 2 ファミリーの3つのモデルについて,難易度の異なる数式を解く必要がある2つのデータセットで評価した。汎用LLM(Llama 2 Chat)と、数学的問題に対処するために特別に設計されたLlama 2の2つの微調整版(MAmmoTHとMetaMath)をテストした。モデルのサイズを拡大することと、関連するタスクで微調整することの両方が、パフォーマンスの大幅な向上につながることを観察した。さらに、細粒度の評価指標を用いると、こうした性能向上は主に複雑度の低い数式で観察されるが、それらでさえ最大の微調整モデルにとって依然として困難な場合が多いことがわかった。

Large Language Models (LLMs) achieve impressive performance in a wide range of tasks, even if they are often trained with the only objective of chatting fluently with users. Among other skills, LLMs show emergent abilities in mathematical reasoning benchmarks, which can be elicited with appropriate prompting methods. In this work, we systematically investigate the capabilities and limitations of popular open-source LLMs on different symbolic reasoning tasks. We evaluate three models of the Llama 2 family on two datasets that require solving mathematical formulas of varying degrees of difficulty. We test a generalist LLM (Llama 2 Chat) as well as two fine-tuned versions of Llama 2 (MAmmoTH and MetaMath) specifically designed to tackle mathematical problems. We observe that both increasing the scale of the model and fine-tuning it on relevant tasks lead to significant performance gains. Furthermore, using fine-grained evaluation measures, we find that such performance gains are mostly observed with mathematical formulas of low complexity, which nevertheless often remain challenging even for the largest fine-tuned models.

翻訳日:2024-06-12 21:14:20 公開日:2024-06-05

# PatentEval: 特許生成におけるエラーを理解する

PatentEval: Understanding Errors in Patent Generation ( http://arxiv.org/abs/2406.06589v1 )

ライセンス: Link先を確認

You Zuo, Kim Gerdes, Eric Villemonte de La Clergerie, Benoît Sagot,

(参考訳) 本研究では,機械が生成する特許文書における2つの異なるタスク,すなわちクレームからの要約(アブストラクト)生成と,先行するクレームを与えたときの次のクレームの生成を評価するために特別に設計された,包括的なエラー類型(タイポロジー)を提案する。我々はまた,この文脈で言語モデルを体系的に評価するためのベンチマークであるPatentEvalを開発した。我々の研究は、様々なモデルの人間によって注釈付けされた比較分析を含む。これらは、特許ドメイン内のタスクのトレーニング中に特別に適応されたものから、最新の汎用大規模言語モデル(LLM)まで様々である。さらに,特許文書評価における人間の判断を近似する指標について検討し,これらの指標が専門家評価とどの程度一致しているかを分析した。これらのアプローチは、特許テキスト生成の専門分野における現在の言語モデルの能力と限界に関する貴重な洞察を提供する。

In this work, we introduce a comprehensive error typology specifically designed for evaluating two distinct tasks in machine-generated patent texts: claims-to-abstract generation, and the generation of the next claim given previous ones. We have also developed a benchmark, PatentEval, for systematically assessing language models in this context. Our study includes a comparative analysis, annotated by humans, of various models. These range from those specifically adapted during training for tasks within the patent domain to the latest general-purpose large language models (LLMs). Furthermore, we explored and evaluated some metrics to approximate human judgments in patent text evaluation, analyzing the extent to which these metrics align with expert assessments. These approaches provide valuable insights into the capabilities and limitations of current language models in the specialized field of patent text generation.

翻訳日:2024-06-12 21:14:20 公開日:2024-06-05

# LLMは古典的か非単調的か?ジェネリクスから学ぶ

Are LLMs classical or nonmonotonic reasoners? Lessons from generics ( http://arxiv.org/abs/2406.06590v1 )

ライセンス: Link先を確認

Alina Leidinger, Robert van Rooij, Ekaterina Shutova,

(参考訳) LLMにおける推論に関する最近の研究は、印象的な性能と、機械生成または人間のフィードバックへの柔軟な適応の証拠を提供している。現実世界をナビゲートするために人間の認知に不可欠な非単調推論は、難しいが十分に研究されていない課題のままである。本研究では、7つの最先端LLMの非単調推論能力を、「Birds fly(鳥は飛ぶ)」のようなジェネリクス(総称文)と「Penguins don't fly(ペンギンは飛ばない)」のような例外を含む、1つの抽象的推論タスクと1つの常識推論タスクで検討する(図1参照)。 LLMは人間の非単調推論能力に沿った推論パターンを示す一方で、支持する例('Owls fly')や無関係な情報('Lions have manes')が追加されると、ジェネリクスの真理条件に関する安定した信念を維持できない。我々の知見は、人間の推論行動をLLMに帰属させることや一般的な能力を評価することの落とし穴を浮き彫りにしており、一貫した推論は依然として実現されていない。

Recent scholarship on reasoning in LLMs has supplied evidence of impressive performance and flexible adaptation to machine generated or human feedback. Nonmonotonic reasoning, crucial to human cognition for navigating the real world, remains a challenging, yet understudied task. In this work, we study nonmonotonic reasoning capabilities of seven state-of-the-art LLMs in one abstract and one commonsense reasoning task featuring generics, such as 'Birds fly', and exceptions, 'Penguins don't fly' (see Fig. 1). While LLMs exhibit reasoning patterns in accordance with human nonmonotonic reasoning abilities, they fail to maintain stable beliefs on truth conditions of generics at the addition of supporting examples ('Owls fly') or unrelated information ('Lions have manes'). Our findings highlight pitfalls in attributing human reasoning behaviours to LLMs, as well as assessing general capabilities, while consistent reasoning remains elusive.

翻訳日:2024-06-12 21:14:20 公開日:2024-06-05

# 肺癌病期分類における放射線診断レポートのTNM分類向上のための多言語大規模言語モデルの検討

Exploring Multilingual Large Language Models for Enhanced TNM classification of Radiology Report in lung cancer staging ( http://arxiv.org/abs/2406.06591v1 )

ライセンス: Link先を確認

Hidetoshi Matsuo, Mizuho Nishio, Takaaki Matsunaga, Koji Fujimoto, Takamichi Murakami,

(参考訳) 背景: 構造化には多大な労力を要し、また記述式の報告が主流であるため、構造化された放射線診断レポートの普及は進んでいない。ディープラーニング、特にGPT-3.5のような大規模言語モデル(LLM)は、自然言語で書かれた放射線診断レポートの構造化を自動化できる可能性がある。しかし、LLMは英語以外の言語では効果が低いことが報告されている一方で、放射線診断領域での性能は十分に研究されていない。目的: 本研究は、GPT3.5-turbo (GPT3.5) を用いた放射線診断レポートに基づくTNM分類の精度と、日本語と英語の両方における多言語LLMの有用性を検討することを目的とした。対象と方法: GPT3.5を用いて、肺がんの胸部CTレポートからTNM分類を自動的に生成するシステムを開発し、その性能を評価した。一般化線形混合モデルを用いて、両言語で完全または部分的なTNM定義を与えることの影響を統計的に分析した。結果: 完全なTNM定義と英語の放射線診断レポートの組み合わせで最も高い精度が得られた(M = 94%, N = 80%, T = 47%, ALL = 36%)。 T, N, M の各因子の定義を与えることで、それぞれの精度が統計的に向上した(T: odds ratio (OR) = 2.35, p < 0.001; N: OR = 1.94, p < 0.01; M: OR = 2.50, p < 0.001)。日本語のレポートでは、NとMの精度が低下した(Nの精度: OR = 0.74、Mの精度: OR = 0.21)。結論: 本研究は、放射線診断レポートにおけるTNM自動分類に対する多言語LLMの可能性を示している。追加のモデル学習を行わなくても、TNM定義を与えることで性能が向上し、放射線診断の文脈におけるLLMの有用性が示唆された。

Background: Structured radiology reports remain underdeveloped due to labor-intensive structuring and narrative-style reporting. Deep learning, particularly large language models (LLMs) like GPT-3.5, offers promise in automating the structuring of radiology reports in natural languages. However, although it has been reported that LLMs are less effective in languages other than English, their radiological performance has not been extensively studied. Purpose: This study aimed to investigate the accuracy of TNM classification based on radiology reports using GPT3.5-turbo (GPT3.5) and the utility of multilingual LLMs in both Japanese and English. Material and Methods: Utilizing GPT3.5, we developed a system to automatically generate TNM classifications from chest CT reports for lung cancer and evaluated its performance. We statistically analyzed the impact of providing full or partial TNM definitions in both languages using a Generalized Linear Mixed Model. Results: The highest accuracy was attained with full TNM definitions and radiology reports in English (M = 94%, N = 80%, T = 47%, and ALL = 36%). Providing definitions for each of the T, N, and M factors statistically improved their respective accuracies (T: odds ratio (OR) = 2.35, p < 0.001; N: OR = 1.94, p < 0.01; M: OR = 2.50, p < 0.001). Japanese reports exhibited decreased N and M accuracies (N accuracy: OR = 0.74 and M accuracy: OR = 0.21). Conclusion: This study underscores the potential of multilingual LLMs for automatic TNM classification in radiology reports. Even without additional model training, performance improvements were evident with the provided TNM definitions, indicating LLMs' relevance in radiology contexts.

翻訳日:2024-06-12 21:14:20 公開日:2024-06-05

# 自動プロセススーパービジョンによる言語モデルの数学的推論の改善

Improve Mathematical Reasoning in Language Models by Automated Process Supervision ( http://arxiv.org/abs/2406.06592v1 )

ライセンス: Link先を確認

Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi,

(参考訳) 数学の問題解決やコード生成といった複雑な多段階推論タスクは、最先端の大規模言語モデル(LLM)にとっても依然として大きなハードルである。 LLMの出力をORM(Outcome Reward Model)で検証することは、LLMの推論性能を向上させるための標準的な推論時手法である。しかし、この手法は、中間結果が適切に報酬も罰則も与えられない、長い、あるいはマルチホップの推論チェーンを持つタスクには依然として不十分である。プロセス監督は、推論プロセスの途中で中間報酬を割り当てることで、この制限に対処する。これまで、プロセス監督データの収集に用いられてきた手法は、人手によるアノテーションかステップごとのモンテカルロ推定に頼っており、いずれもスケールさせるには費用がかかりすぎるため、この手法の幅広い応用を妨げてきた。この課題に対応して、高品質なプロセス監督データを効率的に収集するための、分割統治型の新しいモンテカルロ木探索(MCTS)アルゴリズムであるOmegaPRMを提案する。このアルゴリズムは、二分探索によってChain of Thought(CoT)における最初の誤りを迅速に特定し、正例と負例のバランスをとることで、効率と品質を両立させる。その結果、プロセス報酬モデル(Process Reward Model: PRM)を学習するために、150万以上のプロセス監督アノテーションを収集できた。この完全自動化されたプロセス監督と重み付き自己整合性アルゴリズムを併用することで、指示チューニングされたGemini Proモデルの数学推論性能を向上させ、MATHベンチマークで69.4%の成功率を達成した。これはベースモデルの51%に対して36%の相対的な改善である。さらに、プロセス全体が人間の介入なしに動作するため、既存の方法と比較して、我々の手法は金銭的にも計算的にもコスト効率が高い。

Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a lengthy or multi-hop reasoning chain, where the intermediate outcomes are neither properly rewarded nor penalized. Process supervision addresses this limitation by assigning intermediate rewards during the reasoning process. To date, the methods used to collect process supervision data have relied on either human annotation or per-step Monte Carlo estimation, both prohibitively expensive to scale, thus hindering the broad application of this technique. In response to this challenge, we propose a novel divide-and-conquer style Monte Carlo Tree Search (MCTS) algorithm named OmegaPRM for the efficient collection of high-quality process supervision data. This algorithm swiftly identifies the first error in the Chain of Thought (CoT) with binary search and balances the positive and negative examples, thereby ensuring both efficiency and quality. As a result, we are able to collect over 1.5 million process supervision annotations to train a Process Reward Model (PRM). Utilizing this fully automated process supervision alongside the weighted self-consistency algorithm, we have enhanced the instruction-tuned Gemini Pro model's math reasoning performance, achieving a 69.4% success rate on the MATH benchmark, a 36% relative improvement from the 51% base model performance. Additionally, the entire process operates without any human intervention, making our method both financially and computationally cost-effective compared to existing methods.
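
The binary-search idea mentioned in this abstract can be illustrated with a minimal sketch. It is not OmegaPRM itself: the tree search and the balancing of positive and negative examples are omitted. The oracle `prefix_ok` is a hypothetical stand-in (in practice it would presumably be estimated from rollouts), and it is assumed to be monotone: once a prefix contains an error, every longer prefix is also rejected. The helper `relative_improvement` only re-derives the 36% figure from the two numbers quoted in the abstract.

```python
from typing import Callable, Optional


def first_error_step(num_steps: int, prefix_ok: Callable[[int], bool]) -> Optional[int]:
    """Return the 1-based index of the first erroneous step, or None.

    prefix_ok(k) is a hypothetical oracle: True if the first k steps are
    still error-free. It is assumed monotone (True ... True False ... False),
    and the empty prefix (k = 0) is assumed to be fine.
    """
    if prefix_ok(num_steps):
        return None  # the whole chain passes, so there is no error to locate
    lo, hi = 0, num_steps  # invariant: prefix of length lo is ok, length hi is not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if prefix_ok(mid):
            lo = mid
        else:
            hi = mid
    return hi  # shortest failing prefix ends at the first erroneous step


def relative_improvement(base: float, new: float) -> float:
    """Relative gain of `new` over `base`, as a fraction of `base`."""
    return (new - base) / base


if __name__ == "__main__":
    # Toy chain of 8 steps whose first mistake is at step 5.
    print(first_error_step(8, lambda k: k < 5))
    # Figures quoted in the abstract: 51% base, 69.4% after process supervision.
    print(round(100 * relative_improvement(51.0, 69.4)))
```

With a monotone oracle the search needs on the order of log2(n) oracle calls for a chain of n steps, instead of one call per step.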

翻訳日:2024-06-12 21:14:20 公開日:2024-06-05

# ESBMCによるArm(R) Confidential Computing Architectureのコンポーネント検証

Verifying components of Arm(R) Confidential Computing Architecture with ESBMC ( http://arxiv.org/abs/2406.04375v1 )

ライセンス: Link先を確認

Tong Wu, Shale Xiong, Edoardo Manino, Gareth Stockwell, Lucas C. Cordeiro,

(参考訳) Realm Management Monitor(RMM)は、Arm Confidential Computing Architecture(Arm CCA)において不可欠なファームウェアコンポーネントである。これまでの研究は、RMMの仕様とプロトタイプ参照実装の検証に形式的手法を適用してきた。しかし、単一の検証ツールのみに依存すると、特定のバグや脆弱性を見落とす可能性がある。本稿では、RMMの検証をさらに強化するために、最先端のSMT(Satisfiability Modulo Theories)ベースのソフトウェアモデルチェッカーであるESBMCを適用した事例について述べる。 ESBMCがソースコードを正確に解析し、妥当な時間内に仕様違反を特定できることを示す。さらに、産業界のエンジニアにとっての効率を高めるための、ESBMCの潜在的な改善点を提案する。この研究は、実世界のシナリオにおける形式検証技術の能力の探求に寄与し、産業上の検証ニーズをより良く満たすためのさらなる改善の方向性を示す。

Realm Management Monitor (RMM) is an essential firmware component within the recent Arm Confidential Computing Architecture (Arm CCA). Previous work applies formal techniques to verify the specification and prototype reference implementation of RMM. However, relying solely on a single verification tool may lead to the oversight of certain bugs or vulnerabilities. This paper discusses the application of ESBMC, a state-of-the-art Satisfiability Modulo Theories (SMT)-based software model checker, to further enhance RMM verification. We demonstrate ESBMC's ability to precisely parse the source code and identify specification failures within a reasonable time frame. Moreover, we propose potential improvements for ESBMC to enhance its efficiency for industry engineers. This work contributes to exploring the capabilities of formal verification techniques in real-world scenarios and suggests avenues for further improvements to better meet industrial verification needs.

翻訳日:2024-06-10 18:49:00 公開日:2024-06-05

# グラフニューラルネットワークとMambaの組み合わせによる全スライド画像の局所的・大域的な組織空間関係の把握

Combining Graph Neural Network and Mamba to Capture Local and Global Tissue Spatial Relationships in Whole Slide Images ( http://arxiv.org/abs/2406.04377v1 )

ライセンス: Link先を確認

Ruiwen Ding, Kha-Dinh Luong, Erika Rodriguez, Ana Cristina Araujo Lemos da Silva, William Hsu,

(参考訳) 計算病理学では、ギガピクセルの全スライド画像(WSI)から空間的特徴を抽出することが基本的な課題であるが、サイズが大きいため、WSIは通常より小さなタイルに分割される。この分析の重要な側面は、これらのタイルから情報を集約し、WSIレベルで予測することである。本稿では、メッセージパッシング型グラフニューラルネットワーク(GNN)と状態空間モデル(Mamba)を組み合わせて、WSI内のタイル間の局所的および大域的な空間的関係を捉えるモデルを提案する。このモデルの有効性は、早期肺腺癌(LUAD)患者の無増悪生存の予測において示された。タイルレベルの情報の要約統計に基づく集約、マルチインスタンス学習(MIL)ベースの集約、GNNベースの集約、GNN-Transformerベースの集約など、WSIにおけるタイルレベルの情報集約のための他の最先端手法と比較した。追加実験では、異なる種類のノード特徴と異なるタイルサンプリング戦略がモデル性能に与える影響を示した。この研究は、あらゆるWSIベースの分析に容易に拡張できる。コード:https://github.com/rina-ding/gat-mamba。

In computational pathology, extracting spatial features from gigapixel whole slide images (WSIs) is a fundamental task, but due to their large size, WSIs are typically segmented into smaller tiles. A critical aspect of this analysis is aggregating information from these tiles to make predictions at the WSI level. We introduce a model that combines a message-passing graph neural network (GNN) with a state space model (Mamba) to capture both local and global spatial relationships among the tiles in WSIs. The model's effectiveness was demonstrated in predicting progression-free survival among patients with early-stage lung adenocarcinomas (LUAD). We compared the model with other state-of-the-art methods for tile-level information aggregation in WSIs, including tile-level information summary statistics-based aggregation, multiple instance learning (MIL)-based aggregation, GNN-based aggregation, and GNN-transformer-based aggregation. Additional experiments showed the impact of different types of node features and different tile sampling strategies on the model performance. This work can be easily extended to any WSI-based analysis. Code: https://github.com/rina-ding/gat-mamba.

翻訳日:2024-06-10 18:49:00 公開日:2024-06-05

# TIDMAD:AIデノイジングによるダークマター発見のための時系列データセット

TIDMAD: Time Series Dataset for Discovering Dark Matter with AI Denoising ( http://arxiv.org/abs/2406.04378v1 )

ライセンス: Link先を確認

J. T. Fry, Aobo Li, Lindley Winslow, Xinyi Hope Fu, Zhenghao Fu, Kaliroe M. W. Pappas,

(参考訳) ダークマターは宇宙の全物質の約85%を占めているが、地球上のいかなる実験室でも直接観測されたことはない。ダークマターの起源は現代物理学における最も重要な問題の一つであり、ダークマターを確実に検出できれば、基礎科学におけるノーベル賞級のブレークスルーとなるだろう。 ABRACADABRA実験は、ダークマターを探索するために特別に設計された。まだ発見には至っていないが、ABRACADABRAは物理学コミュニティで広く支持されているダークマター探索結果をいくつか生み出している。この実験は毎秒1000万サンプルの速度で超長時系列データを生成し、ダークマター信号はその超長時系列の中で正弦波振動モードとして現れる。本稿では、ABRACADABRA実験からの包括的なデータリリースであるTIDMADを紹介する。TIDMADは3つの主要な構成要素からなる: 訓練・検証・科学の各サブセットに分割された超長時系列データセット、モデルを直接ベンチマークするための慎重に設計されたデノイジングスコア、そして物理学論文として出版可能なコミュニティ標準のダークマター探索結果を生成する完全な解析フレームワークである。このデータリリースにより、コアAIアルゴリズムが信号を抽出して実際の物理結果を生み出せるようになり、基礎科学の前進につながる。データのダウンロードと関連する解析スクリプトは https://github.com/jessicafry/TIDMAD で公開されている。

Dark matter makes up approximately 85% of the total matter in our universe, yet it has never been directly observed in any laboratory on Earth. The origin of dark matter is one of the most important questions in contemporary physics, and a convincing detection of dark matter would be a Nobel-Prize-level breakthrough in fundamental science. The ABRACADABRA experiment was specifically designed to search for dark matter. Although it has not yet made a discovery, ABRACADABRA has produced several dark matter search results widely endorsed by the physics community. The experiment generates ultra-long time-series data at a rate of 10 million samples per second, where the dark matter signal would manifest itself as a sinusoidal oscillation mode within the ultra-long time series. In this paper, we present TIDMAD, a comprehensive data release from the ABRACADABRA experiment including three key components: an ultra-long time series dataset divided into training, validation, and science subsets; a carefully designed denoising score for direct model benchmarking; and a complete analysis framework which produces a community-standard dark matter search result suitable for publication as a physics paper. This data release enables core AI algorithms to extract the signal and produce real physics results, thereby advancing fundamental science. The data downloading and associated analysis scripts are available at https://github.com/jessicafry/TIDMAD

翻訳日:2024-06-10 18:49:00 公開日:2024-06-05

# 近未来の量子コンピュータにおけるオープン量子系の長時間誤差緩和シミュレーション

Long-Time Error-Mitigating Simulation of Open Quantum Systems on Near Term Quantum Computers ( http://arxiv.org/abs/2108.01183v2 )

ライセンス: Link先を確認

Brian Rost, Lorenzo Del Re, Nathan Earnest, Alexander F. Kemper, Barbara Jones, James K. Freericks,

(参考訳) 本研究では、量子ハードウェア上でのオープン量子系シミュレーションを研究し、最大2000個のエンタングリングゲートを含む深い回路でもハードウェアエラーに対して頑健であることを示す。無限の熱浴に結合した2つの電子系をシミュレートする。 1) 駆動電場中の散逸的な自由電子系、および 2) 磁場中の単一軌道における2つの相互作用する電子(ハバード原子)の熱化である。これらの問題はIBMの量子コンピュータを用いて解かれ、長時間においても忠実度が低下する兆候は見られない。この結果は、ノイズの多いハードウェア上で、オープン量子系をシミュレートするアルゴリズムが、同程度に複雑な非散逸的アルゴリズムをはるかに上回る性能を発揮できることを示している。我々の2つの例は、駆動散逸型量子多体問題が最終的に量子コンピュータで解ける見込みがあることを示している。

We study an open quantum system simulation on quantum hardware, which demonstrates robustness to hardware errors even with deep circuits containing up to two thousand entangling gates. We simulate two systems of electrons coupled to an infinite thermal bath: 1) a system of dissipative free electrons in a driving electric field; and 2) the thermalization of two interacting electrons in a single orbital in a magnetic field -- the Hubbard atom. These problems are solved using IBM quantum computers, showing no signs of decreasing fidelity at long times. Our results demonstrate that algorithms for simulating open quantum systems are able to far outperform similarly complex non-dissipative algorithms on noisy hardware. Our two examples show promise that the driven-dissipative quantum many-body problem can eventually be solved on quantum computers.

翻訳日:2024-06-08 01:27:18 公開日:2024-06-05

# 摂動理論と二乗和

Perturbation Theory and the Sum of Squares ( http://arxiv.org/abs/2205.12325v3 )

ライセンス: Link先を確認

Matthew B. Hastings,

(参考訳) 二乗和(sum-of-squares, SoS)階層は半正定値計画法に基づく強力な手法であり、古典および量子の両方の最適化問題に利用できる。この階層はいくつかの名前で呼ばれており、特に量子化学では縮約密度行列(reduced density matrix, RDM)法と呼ばれる。我々は、スピン系(または量子ビット系)、ボソン系(非調和振動子)、および四次相互作用を持つフェルミオン系という3種類の系について、この階層が弱結合摂動論を再現する能力を考察する。このようなフェルミオン系に対しては、次数$4$のSoS(量子化学では$2$-RDMと呼ばれる)は二次摂動論を再現しないが、次数$6$のSoS($3$-RDM)は再現することを示す(さらに三次摂動論も再現すると予想する)。実際、これを実現できる次数$6$のSoSの断片を特定する。この断片は次数$6$の完全なSoSよりも低コストで実装できる可能性があり、実用的な量子化学計算に有用かもしれない。注目すべきことに、この断片は、Sachdev-Ye-Kitaev(SYK)モデルのためにHastingsとO'Donnellによって研究されたものと非常によく似ている。

The sum-of-squares (SoS) hierarchy is a powerful technique based on semi-definite programming that can be used for both classical and quantum optimization problems. This hierarchy goes under several names; in particular, in quantum chemistry it is called the reduced density matrix (RDM) method. We consider the ability of this hierarchy to reproduce weak coupling perturbation theory for three different kinds of systems: spin (or qubit) systems, bosonic systems (the anharmonic oscillator), and fermionic systems with quartic interactions. For such fermionic systems, we show that degree-$4$ SoS (called $2$-RDM in quantum chemistry) does not reproduce second order perturbation theory but degree-$6$ SoS ($3$-RDM) does (and we conjecture that it reproduces third order perturbation theory). Indeed, we identify a fragment of degree-$6$ SoS which can do this, which may be useful for practical quantum chemical calculations as it may be possible to implement this fragment with less cost than the full degree-$6$ SoS. Remarkably, this fragment is very similar to one studied by Hastings and O'Donnell for the Sachdev-Ye-Kitaev (SYK) model.

翻訳日:2024-06-08 01:27:18 公開日:2024-06-05

# 2部ネットワークのための次数補正潜在ブロックモデルの変分推定量

Variational Estimators of the Degree-corrected Latent Block Model for Bipartite Networks ( http://arxiv.org/abs/2206.08465v2 )

ライセンス: Link先を確認

Yunpeng Zhao, Ning Hao, Ji Zhu,

(参考訳) 二部グラフは様々な科学・工学分野に広く見られる。二部グラフの2種類のノードをバイクラスタリングによって同時にグループ化することは、このようなグラフのネットワーク解析における基本的な課題である。潜在ブロックモデル(latent block model, LBM)は、バイクラスタリングのためによく用いられるモデルベースの手法である。しかし、LBMの有効性は、データ行列の行和と列和の影響によって制限されることが多い。この制限に対処するために、行クラスタと列クラスタにおける次数のばらつきを考慮した次数補正潜在ブロックモデル(DC-LBM)を導入し、実世界のデータセットとシミュレーションデータでの性能を大幅に向上させる。Mステップにおけるパラメータ推定の閉形式解を導出することで、効率的な変分期待値最大化アルゴリズムを開発する。さらに、DC-LBMの下での変分推定量のラベル一致性と収束率を証明する。この結果は、グラフのサイズが大きくなるとき、行と列の平均期待次数が無限大に発散する限り、期待グラフ密度がゼロに近づくことを許容する。

Bipartite graphs are ubiquitous across various scientific and engineering fields. Simultaneously grouping the two types of nodes in a bipartite graph via biclustering represents a fundamental challenge in network analysis for such graphs. The latent block model (LBM) is a commonly used model-based tool for biclustering. However, the effectiveness of the LBM is often limited by the influence of row and column sums in the data matrix. To address this limitation, we introduce the degree-corrected latent block model (DC-LBM), which accounts for the varying degrees in row and column clusters, significantly enhancing performance on real-world data sets and simulated data. We develop an efficient variational expectation-maximization algorithm by creating closed-form solutions for parameter estimates in the M steps. Furthermore, we prove the label consistency and the rate of convergence of the variational estimator under the DC-LBM, allowing the expected graph density to approach zero as long as the average expected degrees of rows and columns approach infinity when the size of the graph increases.

翻訳日:2024-06-08 01:19:21 公開日:2024-06-05

# FedCC: モデル汚染攻撃に対する頑健なフェデレーテッドラーニング

FedCC: Robust Federated Learning against Model Poisoning Attacks ( http://arxiv.org/abs/2212.01976v2 )

ライセンス: Link先を確認

Hyejun Jeong, Hamin Son, Seohu Lee, Jayun Hyun, Tai-Myoung Chung,

(参考訳) 学習モデルにおけるプライバシの懸念に対処するために設計されたフェデレーテッドラーニングは、データプライバシを保護する新たな分散パラダイムを導入する。一方で、サーバがローカルデータセットにアクセスできないこと、および保護対象がパラメータの完全性へと変わることから、攻撃対象領域も従来とは異なるものになる。堅牢な集約アルゴリズムを含む既存のアプローチでは、悪意のあるクライアント、特に非IID(独立同分布でない)データを持つクライアントを効果的に除外できない。さらに、これらのアプローチは非IIDデータと汚染攻撃を別々に扱うことが多い。両方の課題に同時に対処するため、単純だが新しいアルゴリズムであるFedCCを提案する。FedCCは、最終層の1つ前の層の表現(Penultimate Layer Representations)のCentered Kernel Alignment類似度をクラスタリングに利用し、選択したパラメータを選択的に平均化することで、非IIDデータの設定でも悪意のあるクライアントを識別して除外できる。広範な実験により、非標的型モデル汚染攻撃とバックドア攻撃の緩和におけるFedCCの有効性を示す。 FedCCは、既存の外れ値検出ベースおよび一次統計量ベースの手法と比較して、攻撃の信頼度を一貫してゼロにまで低下させる。具体的には、グローバル性能の平均的な劣化を65.5%低減する。学習モデルを評価するこの新たな視点は、FLモデルのセキュリティとプライバシの分野への価値ある貢献になると考えている。コードは、論文の採択時に公開される予定である。

Federated Learning, designed to address privacy concerns in learning models, introduces a new distributed paradigm that safeguards data privacy but differentiates the attack surface due to the server's inaccessibility to local datasets and the change in protection objective to the integrity of the parameters. Existing approaches, including robust aggregation algorithms, fail to effectively filter out malicious clients, especially those with non-Independently and Identically Distributed (non-IID) data. Furthermore, these approaches often tackle non-IID data and poisoning attacks separately. To address both challenges simultaneously, we present FedCC, a simple yet novel algorithm. It leverages the Centered Kernel Alignment similarity of Penultimate Layer Representations for clustering, allowing it to identify and filter out malicious clients by selectively averaging chosen parameters, even in non-IID data settings. Our extensive experiments demonstrate the effectiveness of FedCC in mitigating untargeted model poisoning and backdoor attacks. FedCC reduces the attack confidence to a consistent zero compared to existing outlier detection-based and first-order statistics-based methods. Specifically, it reduces the average degradation of global performance by 65.5%. We believe that this new perspective of assessing learning models makes it a valuable contribution to the field of FL model security and privacy. The code will be made available upon paper acceptance.
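
The similarity measure named in this abstract, Centered Kernel Alignment (CKA), can be sketched in a few lines. The block below computes linear CKA between two sets of representations of the same inputs, using the standard Gram-matrix formulation. It is only an illustration of the measure, not FedCC's implementation: how the paper extracts penultimate-layer representations, which kernel it uses, and how it clusters clients are not stated in the abstract, and the function names here are made up for the example.

```python
import math
from typing import List

Matrix = List[List[float]]


def centered_gram(X: Matrix) -> Matrix:
    """Linear-kernel Gram matrix of the rows of X, double-centered (H K H)."""
    n = len(X)
    K = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in K]  # K is symmetric: column means are the same
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
            for i in range(n)]


def frobenius_inner(A: Matrix, B: Matrix) -> float:
    """Sum of elementwise products of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def linear_cka(X: Matrix, Y: Matrix) -> float:
    """Linear CKA between representations X and Y (one row per input example).

    X and Y must have the same number of rows; feature widths may differ.
    Neither representation may be constant across examples.
    """
    Kc, Lc = centered_gram(X), centered_gram(Y)
    return frobenius_inner(Kc, Lc) / math.sqrt(
        frobenius_inner(Kc, Kc) * frobenius_inner(Lc, Lc)
    )


if __name__ == "__main__":
    # Hypothetical 4-example, 2-feature representations from two "clients".
    X = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [2.0, 2.0]]
    X_rotated = [[-2.0, 1.0], [-1.0, 3.0], [-5.0, 0.0], [-2.0, 2.0]]  # 90-degree rotation
    print(linear_cka(X, X_rotated))
```

Linear CKA is invariant to isotropic scaling and to orthogonal transformations of the features, which is one reason it is used to compare representations from independently trained models.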

翻訳日:2024-06-08 01:19:21 公開日:2024-06-05

# 近所で何が起きているのか? ローカルニュースを検出する弱教師ありアプローチ

What's happening in your neighborhood? A Weakly Supervised Approach to Detect Local News ( http://arxiv.org/abs/2301.08146v3 )

ライセンス: Link先を確認

Deven Santosh Shah, Shiying He, Gosuddin Kamaruddin Siddiqi, Radhika Bansal,

(参考訳) ローカルニュース記事は、都市、郡、州といった地理的領域のユーザーに影響を与えるニュースのサブセットである。ローカルニュースを検出すること(ステップ1)と、続いてその地理的位置と影響半径を決定すること(ステップ2)は、正確なローカルニュース推薦に向けた2つの重要なステップである。ニュースのタイトルから都市名を検出するといったルールベースの素朴な手法は、ニュース内容を理解していないため、誤った結果を生みやすい。自然言語処理の最新の進展を活用し、ローカルニュースの自動検出とコンテンツベースのローカルニュース推薦を可能にする統合パイプラインを開発した。本稿では、パイプラインのステップ1に焦点を当て、(1)ドメイン知識と自動データ処理を組み込んだ弱教師ありフレームワーク、(2)多言語設定への拡張性を取り上げる。実世界の人手でラベル付けされたデータセットでの評価において、本パイプラインはStanford CoreNLPのNERモデルよりも高い適合率と再現率を示す。このパイプラインは、より正確なローカルニュースをユーザーに届け、地元企業の露出を増やし、近隣の安全に関する情報を人々により多く提供できる可能性がある。

Local news articles are a subset of news that impact users in a geographical area, such as a city, county, or state. Detecting local news (Step 1) and subsequently deciding its geographical location as well as radius of impact (Step 2) are two important steps towards accurate local news recommendation. Naive rule-based methods, such as detecting city names from the news title, tend to give erroneous results due to lack of understanding of the news content. Empowered by the latest developments in natural language processing, we develop an integrated pipeline that enables automatic local news detection and content-based local news recommendations. In this paper, we focus on Step 1 of the pipeline, which highlights: (1) a weakly supervised framework incorporated with domain knowledge and auto data processing, and (2) scalability to multi-lingual settings. Compared with the Stanford CoreNLP NER model, our pipeline has higher precision and recall when evaluated on a real-world, human-labeled dataset. This pipeline has the potential to deliver more precise local news to users, help local businesses get more exposure, and give people more information about their neighborhood safety.

翻訳日:2024-06-08 01:19:21 公開日:2024-06-05

# 2つの遠方励起原子からの遅延誘起自然暗黒状態発生

Delay-induced spontaneous dark state generation from two distant excited atoms ( http://arxiv.org/abs/2303.06559v2 )

ライセンス: Link先を確認

William Alvarez-Giron, Pablo Solano, Kanu Sinha, Pablo Barberis-Blostein,

(参考訳) 1次元導波路に結合した、完全に励起された2つの2準位原子の集団的な非マルコフ的ダイナミクスを、遅延がある場合について調べる。反転した原子集団が同期して放射を増強する、よく知られた超蛍光現象と同様に、原子間距離に応じて原子をエンタングルしたダーク状態へと同期させる「サブ蛍光」効果が存在することを示す。我々の結果は長距離量子ネットワークに関連しており、遠く離れた量子エミッタ間で自発的にエンタングルメントを生成する機構を与える。

We investigate the collective non-Markovian dynamics of two fully excited two-level atoms coupled to a one-dimensional waveguide in the presence of delay. We demonstrate that, analogous to the well-known superfluorescence phenomenon, where an inverted atomic ensemble synchronizes to enhance its emission, there is a 'subfluorescence' effect that synchronizes the atoms into an entangled dark state depending on the interatomic separation. Our results are pertinent to long-distance quantum networks, presenting a mechanism for spontaneous entanglement generation between distant quantum emitters.

翻訳日:2024-06-08 01:09:36 公開日:2024-06-05

# 線形回帰としての拡張バランシングウェイト

Augmented balancing weights as linear regression ( http://arxiv.org/abs/2304.14545v3 )

ライセンス: Link先を確認

David Bruns-Smith, Oliver Dukes, Avi Feller, Elizabeth L. Ogburn,

(参考訳) 本稿では、自動脱バイアス機械学習(AutoDML)としても知られる拡張バランシングウェイト(augmented balancing weights)の新たな特徴付けを与える。これらの広く使われている二重頑健あるいは脱バイアス型の機械学習推定量は、アウトカムモデリングとバランシングウェイトを組み合わせる。バランシングウェイトとは、傾向スコアを推定して逆数をとる代わりに、共変量のバランスを直接達成する重みである。アウトカムモデルと重み付けモデルの両方がある(無限次元でもよい)基底で線形である場合、拡張推定量は、元のアウトカムモデルの係数と、同じデータに対する罰則なしの通常の最小二乗法(OLS)による係数とを組み合わせた係数を持つ、単一の線形モデルと等価であることを示す。正則化パラメータの選び方によっては、拡張推定量はしばしばOLS推定量そのものに帰着する。これは例えば、Lalonde 1986データセットの再解析で生じる。次に、これらの結果をアウトカムモデルと重み付けモデルの特定の選択に拡張する。まず、アウトカムモデルと重み付けモデルの両方に(カーネル)リッジ回帰を用いる拡張推定量は、単一のアンダースムージングされた(カーネル)リッジ回帰と等価であることを示す。これは有限標本で数値的に成り立ち、アンダースムージングと漸近的な収束率の新しい解析の基礎となる。重み付けモデルが代わりにラッソ罰則付き回帰である場合には、特殊ケースについて閉形式の表現を与え、「二重選択」(double selection)の性質を示す。我々の枠組みは、利用が広がっているこの推定量クラスのブラックボックスを開き、アンダースムージングされた推定量と二重頑健推定量のセミパラメトリック効率性に関する既存の結果の間のギャップを埋め、拡張バランシングウェイトの性能に関する新たな洞察を与える。

We provide a novel characterization of augmented balancing weights, also known as automatic debiased machine learning (AutoDML). These popular doubly robust or de-biased machine learning estimators combine outcome modeling with balancing weights: weights that achieve covariate balance directly in lieu of estimating and inverting the propensity score. When the outcome and weighting models are both linear in some (possibly infinite) basis, we show that the augmented estimator is equivalent to a single linear model with coefficients that combine the coefficients from the original outcome model and coefficients from an unpenalized ordinary least squares (OLS) fit on the same data. We see that, under certain choices of regularization parameters, the augmented estimator often collapses to the OLS estimator alone; this occurs for example in a re-analysis of the Lalonde 1986 dataset. We then extend these results to specific choices of outcome and weighting models. We first show that the augmented estimator that uses (kernel) ridge regression for both outcome and weighting models is equivalent to a single, undersmoothed (kernel) ridge regression. This holds numerically in finite samples and lays the groundwork for a novel analysis of undersmoothing and asymptotic rates of convergence. When the weighting model is instead lasso-penalized regression, we give closed-form expressions for special cases and demonstrate a "double selection" property. Our framework opens the black box on this increasingly popular class of estimators, bridges the gap between existing results on the semiparametric efficiency of undersmoothed and doubly robust estimators, and provides new insights into the performance of augmented balancing weights.

翻訳日:2024-06-08 01:09:36 公開日:2024-06-05

# $\mathbb{R}$-smooth Banach空間における非線形方程式のPINN誤差推定

PINNs error estimates for nonlinear equations in $\mathbb{R}$-smooth Banach spaces ( http://arxiv.org/abs/2305.11915v3 )

ライセンス: Link先を確認

Jiexing Gao, Yurii Zakharian,

(参考訳) 本稿では、PINNの誤差評価が可能なPDEのクラスを作用素形式で記述する。また、$L^p$空間に対して、PINNの残差を評価するための道具となるBramble-Hilbert型の補題を得る。

In this paper, we describe, in operator form, classes of PDEs that admit PINN error estimates. Also, for $L^p$ spaces, we obtain a Bramble-Hilbert type lemma that serves as a tool for bounding PINN residuals.

翻訳日:2024-06-08 01:09:36 公開日:2024-06-05

# C-MCTS:Monte Carlo Tree Searchによる安全な計画

C-MCTS: Safe Planning with Monte Carlo Tree Search ( http://arxiv.org/abs/2305.16209v3 )

ライセンス: Link先を確認

Dinesh Parthasarathy, Georgios Kontes, Axel Plinge, Christopher Mutschler,

(参考訳) 制約付きマルコフ決定過程(CMDP: Constrained Markov Decision Process)の定式化により、制約を受ける安全クリティカルな意思決定タスクを解くことができる。 CMDPは強化学習の文献で広く研究されているが、CMDPを解くためのMCTSのようなサンプリングベースの計画アルゴリズムにはほとんど注目が向けられてこなかった。従来のアプローチは、分散の大きいモンテカルロ法のコスト推定を用いて制約違反を避けるため、コストに関して保守的に振る舞う。我々は、エージェントの展開前のオフラインフェーズで時間差分(TD)学習によって訓練した安全クリティックを用いてコストを推定するConstrained MCTS(C-MCTS)を提案する。このクリティックは、展開時にMCTS内の安全でない軌道を枝刈りすることで探索を制限する。 C-MCTSはコスト制約を満たしつつ制約境界のより近くで動作し、従来の研究よりも高い報酬を達成する。副次的な利点として、プランナーは計画ステップ数の点でもより効率的である。最も重要なことに、プランナーと実世界の間にモデルのミスマッチがある場合でも、C-MCTSは従来の研究よりコスト違反を起こしにくい。

The Constrained Markov Decision Process (CMDP) formulation allows solving safety-critical decision-making tasks that are subject to constraints. While CMDPs have been extensively studied in the Reinforcement Learning literature, little attention has been given to sampling-based planning algorithms such as MCTS for solving them. Previous approaches perform conservatively with respect to costs as they avoid constraint violations by using Monte Carlo cost estimates that suffer from high variance. We propose Constrained MCTS (C-MCTS), which estimates cost using a safety critic that is trained with Temporal Difference learning in an offline phase prior to agent deployment. The critic limits exploration by pruning unsafe trajectories within MCTS during deployment. C-MCTS satisfies cost constraints but operates closer to the constraint boundary, achieving higher rewards than previous work. As a nice byproduct, the planner is more efficient w.r.t. planning steps. Most importantly, under model mismatch between the planner and the real world, C-MCTS is less susceptible to cost violations than previous work.

翻訳日:2024-06-08 01:09:36 公開日:2024-06-05

# ArtWhisperer:芸術創造における人間とAIのインタラクションを特徴付けるデータセット

ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations ( http://arxiv.org/abs/2306.08141v3 )

ライセンス: Link先を確認

Kailas Vodrahalli, James Zou,

(参考訳) 生成AIが普及するにつれて、人間のユーザがそのようなモデルとどのように相互作用するかを研究することが重要になっている。本研究では、人々が目的とするターゲット画像を生成するためにテキストから画像を生成するモデルをどのように使うかを調べる。このインタラクションを研究するために、ArtWhispererというオンラインゲームを作成した。このゲームでは、ユーザにターゲット画像が与えられ、それに似た画像を生成するプロンプトを反復的に探す。このゲームを通じて、5万件以上の人間とAIのインタラクションを記録した。各インタラクションは、ユーザが作成した1つのテキストプロンプトと、それに対応する生成画像からなる。その多くは、ユーザがターゲット画像に最適なプロンプトを見つけるために繰り返し試行するインタラクションであり、人間とAIの協働を研究するためのユニークな逐次データセットとなっている。このデータセットの初期分析では、プロンプトのやり取りとユーザ戦略のいくつかの特徴を特定する。人々は多様なプロンプトを投稿し、類似した画像を生成するさまざまなテキスト記述を発見できる。興味深いことに、ユーザがより良いプロンプトを見つけてもプロンプトの多様性は低下しない。さらに、このデータセットを用いてAIのステアビリティ(操縦しやすさ)を定量化する新しい指標を提案する。ステアビリティを、タスクを十分に達成するために必要なインタラクション回数の期待値として定義する。この値は、各ターゲットタスクにマルコフ連鎖を当てはめ、そのマルコフ連鎖で十分なスコアに到達するまでの期待時間を計算することで推定する。異なる種類のターゲット画像と2つの異なるモデルについてAIのステアビリティを定量化して比較し、都市や自然界の画像は芸術的な画像やファンタジー画像よりもステアビリティが高いことを見出した。これらの知見は、人間とAIのインタラクション行動に関する洞察を与え、AIのステアビリティを評価する具体的な方法を示し、ArtWhispererデータセットの汎用的な有用性を実証する。

As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image as the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to quantify the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.
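
The steerability estimate described in this abstract, the expected time for a Markov chain to reach an adequate score, can be sketched as a standard expected hitting-time computation. The sketch assumes the chain has already been fitted and is given as a row-stochastic transition matrix over discretized score states; the state discretization, the fitting procedure, and the name `expected_hitting_times` are assumptions of this example, not details taken from the paper.

```python
from typing import Dict, List, Set


def expected_hitting_times(P: List[List[float]], targets: Set[int]) -> Dict[int, float]:
    """Expected number of steps to first reach any state in `targets`.

    P is a row-stochastic transition matrix. For non-target states the
    hitting times t solve (I - Q) t = 1, where Q is P restricted to the
    non-target states. Solved here by Gauss-Jordan elimination.
    """
    n = len(P)
    transient = [i for i in range(n) if i not in targets]
    m = len(transient)
    # Augmented matrix [I - Q | 1].
    A = [
        [(1.0 if r == c else 0.0) - P[transient[r]][transient[c]] for c in range(m)] + [1.0]
        for r in range(m)
    ]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("targets are not reachable from every state")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(m):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, m + 1):
                    A[r][c] -= factor * A[col][c]
    times = {state: 0.0 for state in targets}  # already at an adequate score
    for r in range(m):
        times[transient[r]] = A[r][m] / A[r][r]
    return times


if __name__ == "__main__":
    # Hypothetical 3-state chain: 0 = poor score, 1 = fair score, 2 = adequate score.
    P = [
        [0.0, 1.0, 0.0],  # from poor, always move to fair
        [0.5, 0.0, 0.5],  # from fair, fall back or succeed with equal probability
        [0.0, 0.0, 1.0],  # adequate is absorbing
    ]
    print(expected_hitting_times(P, {2}))
```

Under this definition a smaller expected hitting time from the starting state corresponds to a more steerable task.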

翻訳日:2024-06-08 00:59:06 公開日:2024-06-05

# 高次ネットワークにおける次数不均一性: ハイパーグラフ $\boldsymbol{\beta}$-モデルにおける推論

Degree Heterogeneity in Higher-Order Networks: Inference in the Hypergraph $\boldsymbol{\beta}$-Model ( http://arxiv.org/abs/2307.02818v4 )

ライセンス: Link先を確認

Sagnik Nandy, Bhaswar B. Bhattacharya,

(参考訳) ランダムグラフの$\boldsymbol{\beta}$-modelは、次数不均一性を持つネットワークにおけるペアワイズ相互作用を表現するために一般的に用いられる。ペアワイズ相互作用を超えて、Stasi et al. (2014) は高次(多者間)の相互作用を持つネットワークにおける次数不均一性を捉えるために、ハイパーグラフ $\boldsymbol{\beta}$-model を導入した。本稿では、層ごとに異なるサイズのハイパーエッジを許す、複数の層を持つハイパーグラフ $\boldsymbol{\beta}$-model の厳密な研究を開始する。まず、最尤(ML)推定量の収束率を導出し、そのミニマックスレート最適性を確立する。また、ML推定量の極限分布を導出し、モデルパラメータに対する漸近的に妥当な信頼区間を構成する。次に、ハイパーグラフ $\boldsymbol{\beta}$-model における適合度検定の問題を考える。具体的には、帰無仮説の下での尤度比(LR)検定の漸近正規性を確立し、その検出閾値と、閾値における極限検出力を導出する。興味深いことに、LR検定の検出閾値はミニマックス最適であること、すなわち、この閾値を下回るとすべての検定が漸近的に無力になることがわかる。理論的結果は数値実験でさらに検証される。ハイパーグラフ $\boldsymbol{\beta}$-model の推定と推論のための理論的枠組みを構築することに加えて、上記の結果は、ML推定量のミニマックス最適性やLR検定の非帰無下での性質など、我々の知る限りこれまで研究されていなかった、グラフ $\boldsymbol{\beta}$-model の文献における多くのギャップを埋める。

The $\boldsymbol{\beta}$-model for random graphs is commonly used for representing pairwise interactions in a network with degree heterogeneity. Going beyond pairwise interactions, Stasi et al. (2014) introduced the hypergraph $\boldsymbol{\beta}$-model for capturing degree heterogeneity in networks with higher-order (multi-way) interactions. In this paper we initiate the rigorous study of the hypergraph $\boldsymbol{\beta}$-model with multiple layers, which allows for hyperedges of different sizes across the layers. To begin with, we derive the rates of convergence of the maximum likelihood (ML) estimate and establish their minimax rate optimality. We also derive the limiting distribution of the ML estimate and construct asymptotically valid confidence intervals for the model parameters. Next, we consider the goodness-of-fit problem in the hypergraph $\boldsymbol{\beta}$-model. Specifically, we establish the asymptotic normality of the likelihood ratio (LR) test under the null hypothesis, derive its detection threshold, and also its limiting power at the threshold. Interestingly, the detection threshold of the LR test turns out to be minimax optimal, that is, all tests are asymptotically powerless below this threshold. The theoretical results are further validated in numerical experiments. In addition to developing the theoretical framework for estimation and inference for hypergraph $\boldsymbol{\beta}$-models, the above results fill a number of gaps in the graph $\boldsymbol{\beta}$-model literature, such as the minimax optimality of the ML estimates and the non-null properties of the LR test, which, to the best of our knowledge, have not been studied before.

翻訳日:2024-06-08 00:59:06 公開日:2024-06-05
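
As a rough illustration of the model discussed in the abstract above, the sketch below samples a single-layer, $k$-uniform hypergraph in which a hyperedge on nodes $i_1, \dots, i_k$ appears independently with probability $e^{\beta_{i_1} + \cdots + \beta_{i_k}} / (1 + e^{\beta_{i_1} + \cdots + \beta_{i_k}})$, the form introduced by Stasi et al. (2014). The multi-layer setting, the ML estimation, and the LR test studied in the paper are not reproduced here. The function names and the example parameter values are hypothetical.

```python
import itertools
import math
import random


def hyperedge_probability(beta, nodes):
    """Probability that the hyperedge on `nodes` is present: sigmoid of the summed beta values."""
    total = sum(beta[i] for i in nodes)
    return 1.0 / (1.0 + math.exp(-total))


def sample_uniform_hypergraph(beta, edge_size, rng):
    """Sample a k-uniform hypergraph by flipping an independent coin for every k-subset of nodes."""
    edges = []
    for nodes in itertools.combinations(range(len(beta)), edge_size):
        if rng.random() < hyperedge_probability(beta, nodes):
            edges.append(nodes)
    return edges


# Hypothetical node parameters: a larger beta means a higher expected degree.
beta = [1.0, 0.5, 0.0, -0.5, -1.0]
rng = random.Random(0)
edges = sample_uniform_hypergraph(beta, edge_size=3, rng=rng)
print(len(edges), "of", math.comb(len(beta), 3), "possible 3-hyperedges sampled")
```

Nodes with larger $\beta_i$ take part in more hyperedges on average, which is the degree heterogeneity the model is meant to capture.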

# 1つの論理量子ビットを符号化した量子極符号のファクトリベースフォールトトレラント生成

Factory-based Fault-tolerant Preparation of Quantum Polar Codes Encoding One logical Qubit ( http://arxiv.org/abs/2307.15226v2 )

ライセンス: Link先を確認

Ashutosh Goswami, Mehdi Mhalla, Valentin Savin,

(参考訳) Q1符号、すなわち1量子ビットを符号化する量子極性符号の論理符号状態を準備するフォールトトレラントな方法が最近提案されている。その耐故障性は誤り検出ガジェットによって保証され、準備中に誤りが検出された場合は準備全体が破棄される。誤り検出のため、準備は確率的であり、その成功率である準備率は符号長とともに急速に減少し、大きな符号長の符号状態の準備を妨げる。本稿では、準備率を改善するために、Q1符号状態の複数のコピーを並列に準備しようとする、Q1符号状態のファクトリ準備を考察する。追加のスケジューリングステップを用いることで、誤りが検出されるたびに準備全体を破棄することを回避でき、その結果、準備率が向上する。さらに、ファクトリ準備で準備したQ1符号の準備率と論理誤り率を推定する理論的手法を提案し、これがモンテカルロシミュレーションに基づく数値結果とよく一致することを示す。したがって、この理論的手法は、モンテカルロシミュレーションが現実的に実行できない大きな符号長に対する推定値を与えるのに有用である。回路レベルの脱分極雑音モデルに対する数値結果は、特に大きな符号長Nにおいて、準備率が大幅に向上することを示している。例えば、N = 256 の場合、実用上興味深い物理誤り率 p = 10^{-3} に対して、準備率は 0.02\% から 27\% に増加する。注目すべきことに、N = 256 の Q1 符号は、p = 10^{-3} と p = 3 x 10^{-4} に対して、それぞれ 10^{-11} と 10^{-15} 程度の論理誤り率を達成する。これは、同程度の符号長と最小距離を持つ表面符号と比較して約3桁の改善に相当し、大規模なフォールトトレラント量子コンピューティングに向けた提案方式の有望性を示している。

A fault-tolerant way to prepare logical code-states of Q1 codes, i.e., quantum polar codes encoding one qubit, has been recently proposed. The fault tolerance therein is guaranteed by an error detection gadget: if an error is detected during the preparation, the preparation is discarded entirely. Due to error detection, the preparation is probabilistic, and its success rate, referred to as the preparation rate, decreases rapidly with the code-length, preventing the preparation of code-states of large code-lengths. In this paper, to improve the preparation rate, we consider a factory preparation of Q1 code-states, where one attempts to prepare several copies of Q1 code-states in parallel. Using an extra scheduling step, we can avoid discarding the preparation entirely every time an error is detected, thereby achieving an increased preparation rate. We further provide a theoretical method to estimate the preparation and logical error rates of Q1 codes prepared using factory preparation, which is shown to tightly fit the numerical results based on Monte-Carlo simulation. Therefore, our theoretical method is useful for providing estimates for large code-lengths, where Monte-Carlo simulations are practically not feasible. Our numerical results, for a circuit-level depolarizing noise model, indicate that the preparation rate increases significantly, especially for large code-length N. For example, for N = 256, it increases from 0.02\% to 27\% for a practically interesting physical error rate of p = 10^{-3}. Remarkably, a Q1 code with N = 256 achieves logical error rates around 10^{-11} and 10^{-15} for p = 10^{-3} and p = 3 x 10^{-4}, respectively. This corresponds to an improvement of about three orders of magnitude compared to a surface code with similar code-length and minimum distance, thus showing the promise of the proposed scheme for large-scale fault-tolerant quantum computing.

翻訳日:2024-06-08 00:59:06 公開日:2024-06-05

# ゼロサムマルコフゲームにおけるモデルフリーアルゴリズムのサンプル効率の改善

Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games ( http://arxiv.org/abs/2308.08858v2 )

ライセンス: Link先を確認

Songtao Feng, Ming Yin, Yu-Xiang Wang, Jing Yang, Yingbin Liang,

(参考訳) 近年、マルチエージェント強化学習(RL)の理論研究において、2人ゼロサムマルコフゲームの問題が注目を集めている。特に有限ホライズンのエピソード型マルコフ決定過程(MDP)では、モデルベースのアルゴリズムが$O(H^3SAB/\epsilon^2)$のサンプル複雑性で$\epsilon$-最適なナッシュ均衡(NE)を見つけられることが示されており、これはホライズン$H$と状態数$S$への依存性に関して最適である(ここで$A$と$B$はそれぞれ2人のプレイヤーの行動数を表す)。しかし、既存のモデルフリーアルゴリズムはいずれもそのような最適性を達成できていない。本研究では、モデルフリーのステージベースQ学習アルゴリズムを提案し、それが最良のモデルベースアルゴリズムと同じサンプル複雑性を達成することを示す。これにより、モデルフリーアルゴリズムが$H$への依存性に関してモデルベースアルゴリズムと同じ最適性を享受できることを初めて示す。$H$への依存性の主な改善は、これまで単一エージェントRLでのみ使われていた、参照アドバンテージ分解に基づく一般的な分散低減手法を活用することで得られる。しかし、この手法は価値関数の重要な単調性に依存しており、マルコフゲームでは粗相関均衡(CCE)オラクルによる方策の更新のためにこの性質が成り立たない。そこで、この手法をマルコフゲームに拡張するために、提案アルゴリズムは、値の差が履歴の中で最小となる楽観的価値関数と悲観的価値関数のペアとして参照価値関数を更新するという、鍵となる新しい設計を備えており、これによりサンプル効率の所望の改善を達成する。

The problem of two-player zero-sum Markov games has recently attracted increasing interest in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an $\epsilon$-optimal Nash Equilibrium (NE) with a sample complexity of $O(H^3SAB/\epsilon^2)$, which is optimal in its dependence on the horizon $H$ and the number of states $S$ (where $A$ and $B$ denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms achieves such optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence demonstrate for the first time that model-free algorithms can enjoy the same optimality in the $H$ dependence as model-based algorithms. The main improvement in the dependence on $H$ arises from leveraging the popular variance reduction technique based on the reference-advantage decomposition, previously used only for single-agent RL. However, such a technique relies on a critical monotonicity property of the value function, which does not hold in Markov games due to the update of the policy via the coarse correlated equilibrium (CCE) oracle. Thus, to extend such a technique to Markov games, our algorithm features a key novel design: updating the reference value functions as the pair of optimistic and pessimistic value functions whose value difference is the smallest in the history, in order to achieve the desired improvement in sample efficiency.

翻訳日:2024-06-08 00:49:21 公開日:2024-06-05

# 単一光子量子レンジング:逐次復号が高次元エンタングルメントと出会うとき

Single Photon Quantum Ranging: When Sequential Decoding Meets High Dimensional Entanglement ( http://arxiv.org/abs/2308.13045v2 )

ライセンス: Link先を確認

Armanpreet Pannu, Han Liu, Amr S. Helmy, Hesham El Gamal,

(参考訳) モード毎の低雑音レベルと低反射率(高損失)状態における量子レンジ問題について考察する。本稿では, 単一光子伝送戦略に焦点をあて, 送信機における高次元時間ビン絡み合わせと検出器における逐次決定ルールを慎重に構成した新しい手法を提案する。解析結果から, 単一光子古典法, 従来提案されていた2モード圧縮真空レンジリング法, ブロックベースの古典的スキームなどと比較して, この手法から, 様々な操作パラメータで活用できる重要な性能向上が得られた。このパフォーマンス向上は、 1)高次元時間ビン絡み合わされた信号が単一の光子と非常に微細な範囲分解能を提供する能力 2) 逐次決定規則は, 誤差の確率に制約のある送信光子の平均個数を最小化する。分析は低エネルギー/低騒音に限られるが、提案手法の優れた性能はより広い範囲のシナリオにまで拡張され、さらなる解析的および実験的研究の動機となるだろうと推測する。

We consider the quantum ranging problem in the regime of low noise level per mode and low reflectivity (high loss). We focus on single photon transmission strategies and propose a novel approach that combines high dimensional time-bin entanglement at the transmitter with a carefully constructed sequential decision rule at the detector. Our analytical results establish the significant performance gains that can be obtained from this approach over a range of operating parameters, as compared to the single photon classical approach, the two-mode squeezed vacuum ranging scheme proposed earlier, and even the block-based classical scheme. One can attribute this performance gain to 1) the ability of the high dimensional time-bin entangled signaling to offer a very fine range resolution with a single transmitted photon and 2) the ability of the sequential decision rule to minimize the average number of transmitted photons subject to a constraint on the probability of error. While our analysis is limited to the low energy/low noise regime, we conjecture that the proposed approach's superior performance extends to a wider range of scenarios, which should motivate further analytical and experimental investigations.

翻訳日:2024-06-08 00:49:21 公開日:2024-06-05

# シャープネスを考慮した最小化と安定性の端

Sharpness-Aware Minimization and the Edge of Stability ( http://arxiv.org/abs/2309.12488v6 )

ライセンス: Link先を確認

Philip M. Long, Peter L. Bartlett,

(参考訳) 最近の実験では、ステップサイズ$\eta$の勾配降下(GD)でニューラルネットワークを訓練すると、多くの場合、損失のヘッセ行列の作用素ノルムは約$2/\eta$に達するまで増加し、その後はこの値の周りで変動することが示されている。$2/\eta$という量は、損失の局所的な二次近似の考察に基づいて「安定性の端」と呼ばれている。我々は同様の計算を行い、GDの変種であり一般化を改善することが示されている SAM (Sharpness-Aware Minimization) の「安定性の端」を導出する。GDの場合とは異なり、得られるSAMの端は勾配のノルムに依存する。3つのディープラーニングの訓練タスクを用いて、SAMがこの分析によって特定された安定性の端で動作することを実証的に確認する。

Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.

翻訳日:2024-06-08 00:39:36 公開日:2024-06-05
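
To make the $2/\eta$ threshold in the abstract above concrete, here is a minimal sketch that runs plain gradient descent on a one-dimensional quadratic $f(x) = \tfrac{a}{2}x^2$. The quadratic and the specific numbers are assumptions chosen for illustration, not an experiment from the paper, and the function name `gd_on_quadratic` is hypothetical. Each step multiplies $x$ by $(1 - \eta a)$, so the iterate shrinks when the curvature $a$ is below $2/\eta$ and grows when it is above. The gradient-dependent SAM edge derived in the paper is not reproduced here.

```python
def gd_on_quadratic(curvature, step_size, x0=1.0, num_steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2 and return |x| after num_steps."""
    x = x0
    for _ in range(num_steps):
        grad = curvature * x          # f'(x) = curvature * x
        x = x - step_size * grad      # each step multiplies x by (1 - step_size * curvature)
    return abs(x)


eta = 0.1                                                 # step size, so the threshold 2 / eta is 20
below = gd_on_quadratic(curvature=19.0, step_size=eta)    # |1 - 1.9| = 0.9 < 1: iterates shrink
above = gd_on_quadratic(curvature=21.0, step_size=eta)    # |1 - 2.1| = 1.1 > 1: iterates grow
print(below < 1.0, above > 1.0)
```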

# 自己スペシャライゼーション - 大規模言語モデルにおける潜在専門家の発見

Self-Specialization: Uncovering Latent Expertise within Large Language Models ( http://arxiv.org/abs/2310.00160v2 )

ライセンス: Link先を確認

Junmo Kang, Hongyin Luo, Yada Zhu, Jacob Hansen, James Glass, David Cox, Alan Ritter, Rogerio Feris, Leonid Karlinsky,

(参考訳) 近年の研究では、人間の手書き種子から始まるモデル自体から生成された命令データを用いて、大規模言語モデルが一般的な指示に従うように整列された自己アライメントの有効性が実証されている。本研究では、総合的なアライメントではなく、専門家ドメイン専門化(例えば、バイオメディシン、ファイナンス)のための自己アライメントに焦点を当てる。予備的な例として、汎用的な指示追従訓練が下流の専門家ドメインの性能に及ぼす限界効果を定量的に示す。そこで本研究では,数個のラベル付き種子を有効利用して,クロスタスクの一般化を実現しつつ,効果的なモデル特化を可能にする自己特殊化を提案する。自己専門化(Self-specialization)は、ジェネラリストが事前訓練したLLMから専門家モデルを“彫り出す”ための、データとパラメータ効率のよい方法を提供する。バイオメディカル・ファイナンシャル・ドメインにおける実験結果から,我々の自己専門化モデルは,そのベースモデルよりも大きなマージンで優れており,また,一般に訓練されたり,他の方法で対象ドメインに適応した大規模モデルよりも大きいことが示唆された。

Recent works have demonstrated the effectiveness of self-alignment, in which a large language model is aligned to follow general instructions using instructional data generated from the model itself, starting from a handful of human-written seeds. Instead of general alignment, in this work, we focus on self-alignment for expert domain specialization (e.g., biomedicine, finance). As a preliminary, we quantitatively show the marginal effect that generic instruction-following training has on downstream expert domains' performance. To remedy this, we propose self-specialization, which allows for effective model specialization while achieving cross-task generalization by leveraging only a few labeled seeds. Self-specialization offers a data- and parameter-efficient way of "carving out" an expert model from a generalist pre-trained LLM. Exploring a variety of popular open large models as a base for specialization, our experimental results in both biomedical and financial domains show that our self-specialized models outperform their base models by a large margin, and even outperform larger models that are generally instruction-tuned or that have been adapted to the target domain by other means.

翻訳日:2024-06-08 00:39:36 公開日:2024-06-05

# マルチタイル型ニューラルラジアンスフィールド(NeRF) -- 大規模航空データセットの幾何学的評価

Multi-tiling Neural Radiance Field (NeRF) -- Geometric Assessment on Large-scale Aerial Datasets ( http://arxiv.org/abs/2310.00530v4 )

ライセンス: Link先を確認

Ningli Xu, Rongjun Qin, Debao Huang, Fabio Remondino,

(参考訳) ニューラル・ラジアンス・フィールド(Neural Radiance Fields、NeRF)は、航空写真測量を含む3D再構成タスクに恩恵をもたらす可能性がある。しかし、大規模な航空データセットは通常、非常に高いメモリ消費と収束の遅さをもたらすため、推定された幾何のスケーラビリティと精度は十分に文書化されていない。本稿では、大規模な航空データセットへNeRFをスケールさせ、NeRFの詳細な幾何学的評価を行うことを目指す。具体的には、位置特化のサンプリング手法とマルチカメラタイリング(MCT)戦略を導入し、画像読み込み時のRAMおよび表現の学習時のGPUメモリの消費を削減するとともに、タイル内の収束率を高める。MCTは、大きなフレームの画像を異なるカメラモデルを持つ複数のタイル画像に分解し、これらの小さなフレームの画像を、精度を損なうことなく、特定の位置に対して必要に応じて学習プロセスに投入できるようにする。提案手法を代表的手法であるMip-NeRFに実装し、その幾何学的性能を、2つの典型的な航空データセット上で、LiDAR参照データを基準として3つの写真測量MVSパイプラインと比較する。定性的および定量的な結果は、提案したNeRFアプローチが従来の手法よりも優れた完全性とオブジェクトの詳細をもたらすことを示唆しているが、現時点では精度の面でまだ及ばない。

Neural Radiance Fields (NeRF) offer the potential to benefit 3D reconstruction tasks, including aerial photogrammetry. However, the scalability and accuracy of the inferred geometry are not well-documented for large-scale aerial assets, since such datasets usually result in very high memory consumption and slow convergence. In this paper, we aim to scale NeRF to large-scale aerial datasets and provide a thorough geometry assessment of NeRF. Specifically, we introduce a location-specific sampling technique as well as a multi-camera tiling (MCT) strategy to reduce RAM consumption during image loading and GPU memory consumption during representation training, and to increase the convergence rate within tiles. MCT decomposes a large-frame image into multiple tiled images with different camera models, allowing these small-frame images to be fed into the training process as needed for specific locations without a loss of accuracy. We implement our method on a representative approach, Mip-NeRF, and compare its geometry performance with three photogrammetric MVS pipelines on two typical aerial datasets against LiDAR reference data. Both qualitative and quantitative results suggest that the proposed NeRF approach produces better completeness and object details than traditional approaches, although as of now, it still falls short in terms of accuracy.

翻訳日:2024-06-08 00:39:36 公開日:2024-06-05

# POTLoc: Pseudo-Label Oriented Transformer for Point-Supervised Temporal Action Localization

POTLoc: Pseudo-Label Oriented Transformer for Point-Supervised Temporal Action Localization ( http://arxiv.org/abs/2310.13585v2 )

ライセンス: Link先を確認

Elahe Vahdani, Yingli Tian,

(参考訳) 本稿では、訓練セットの各アクションインスタンスに対して1フレームのみがアノテートされる、点教師付き時間的アクション検出の課題に取り組む。現在の手法の多くは、アノテートされた点のスパース性に妨げられ、アクションの連続的な構造や、アクションインスタンス内に固有の時間的・意味的な依存関係を効果的に表現することに苦労している。その結果、これらの手法はアクションの最も特徴的なセグメントだけを学習することが多く、不完全なアクション提案の生成につながる。本稿では、点レベルのアノテーションのみを利用する弱教師付きアクションローカライゼーションのためのPseudo-label Oriented TransformerであるPOTLocを提案する。POTLocは、自己学習戦略を通じて、連続的なアクション構造を特定し追跡するように設計されている。ベースモデルは、まず点レベルの教師のみでアクション提案を生成する。これらの提案は、推定されたアクション境界の精度を高めるために洗練と回帰を受け、その結果、補助的な教師信号として機能する「擬似ラベル」が生成される。モデルのアーキテクチャは、トランスフォーマーと時間的特徴ピラミッドを統合し、ビデオスニペット間の依存関係を捉え、さまざまな長さのアクションをモデル化する。アクションの大まかな位置と境界に関する情報を与える擬似ラベルは、アクションのダイナミクスの学習を強化するようにトランスフォーマーを導く助けとなる。POTLocは、THUMOS'14およびActivityNet-v1.2データセットにおいて、最先端の点教師付き手法を上回る。

This paper tackles the challenge of point-supervised temporal action detection, wherein only a single frame is annotated for each action instance in the training set. Most of the current methods, hindered by the sparse nature of annotated points, struggle to effectively represent the continuous structure of actions or the inherent temporal and semantic dependencies within action instances. Consequently, these methods frequently learn merely the most distinctive segments of actions, leading to the creation of incomplete action proposals. This paper proposes POTLoc, a Pseudo-label Oriented Transformer for weakly-supervised Action Localization utilizing only point-level annotation. POTLoc is designed to identify and track continuous action structures via a self-training strategy. The base model begins by generating action proposals solely with point-level supervision. These proposals undergo refinement and regression to enhance the precision of the estimated action boundaries, which subsequently results in the production of "pseudo-labels" to serve as supplementary supervisory signals. The architecture of the model integrates a transformer with a temporal feature pyramid to capture video snippet dependencies and model actions of varying duration. The pseudo-labels, providing information about the coarse locations and boundaries of actions, assist in guiding the transformer for enhanced learning of action dynamics. POTLoc outperforms the state-of-the-art point-supervised methods on THUMOS'14 and ActivityNet-v1.2 datasets.

翻訳日:2024-06-08 00:29:50 公開日:2024-06-05

# AGIへの道の歩みを運用するためのAGIのレベル

Levels of AGI for Operationalizing Progress on the Path to AGI ( http://arxiv.org/abs/2311.02462v4 )

ライセンス: Link先を確認

Meredith Ringel Morris, Jascha Sohl-dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg,

(参考訳) 本稿では,人工知能(AGI)モデルとその前駆体の性能と動作を分類する枠組みを提案する。このフレームワークは、AGIのパフォーマンス、一般性、自律性のレベルを導入し、モデルを比較し、リスクを評価し、AGIへの道筋に沿って進捗を測定する共通の言語を提供する。フレームワークを開発するために、既存のAGIの定義を分析し、AGIにとって有用なオントロジーが満たすべき6つの原則を抽出する。これらの原則を念頭において、我々は「AGIのレベル」の深さ(性能)と広さ(一般性)の能力に基づいて提案し、現在のシステムがこのオントロジーにどのように適合するかを反映する。これらのレベルに対してAGIモデルの振る舞いと能力を定量化する将来のベンチマークの課題について論じる。最後に、これらのAGIのレベルが自律性やリスクといったデプロイメント上の考慮事項とどのように相互作用するかについて議論し、高機能なAIシステムの責任と安全なデプロイメントにおいて、ヒューマン・AIインタラクションパラダイムを慎重に選択することの重要性を強調します。

We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy, providing a common language to compare models, assess risks, and measure progress along the path to AGI. To develop our framework, we analyze existing definitions of AGI, and distill six principles that a useful ontology for AGI should satisfy. With these principles in mind, we propose "Levels of AGI" based on depth (performance) and breadth (generality) of capabilities, and reflect on how current systems fit into this ontology. We discuss the challenging requirements for future benchmarks that quantify the behavior and capabilities of AGI models against these levels. Finally, we discuss how these levels of AGI interact with deployment considerations such as autonomy and risk, and emphasize the importance of carefully selecting Human-AI Interaction paradigms for responsible and safe deployment of highly capable AI systems.

翻訳日:2024-06-08 00:29:50 公開日:2024-06-05

# 非凸分散学習のための圧縮・スパースモデル

Compressed and Sparse Models for Non-Convex Decentralized Learning ( http://arxiv.org/abs/2311.05760v2 )

ライセンス: Link先を確認

Andrew Campbell, Hang Liu, Leah Woldemariam, Anna Scaglione,

(参考訳) 最近の研究は、特に大規模で過剰パラメータ化されたニューラルネットワーク(NN)において、頻繁なモデル通信が分散型機械学習(ML)の効率に対する重大なボトルネックであることを指摘している。そこで本研究では、勾配圧縮手法とモデルのスパース化を組み合わせた新しい分散型MLアルゴリズムであるMalcom-PSGDを提案する。目的関数に$\ell_1$正則化を加えてモデルのスパース性を促進し、学習のための分散型近接SGD法を提示する。提案手法では、スパース化されたモデルの圧縮勾配通信に、ベクトル情報源符号化とディザリングに基づく量子化を用いる。解析により、コンセンサス率と学習率が一定であると仮定すると、Malcom-PSGDは反復回数$t$に対して$\mathcal{O}(1/\sqrt{t})$の収束率を達成することを示す。この結果は、非凸な圧縮近接SGD法の収束性の証明によって裏付けられる。さらに、ビット解析を行い、Malcom-PSGDに伴う通信コストの閉形式表現を与える。数値結果は理論的知見を検証し、提案手法が最先端手法と比較して通信コストを約$75\%$削減することを示している。

Recent research highlights frequent model communication as a significant bottleneck to the efficiency of decentralized machine learning (ML), especially for large-scale and over-parameterized neural networks (NNs). To address this, we present Malcom-PSGD, a novel decentralized ML algorithm that combines gradient compression techniques with model sparsification. We promote model sparsity by adding $\ell_1$ regularization to the objective and present a decentralized proximal SGD method for training. Our approach employs vector source coding and dithering-based quantization for the compressed gradient communication of sparsified models. Our analysis demonstrates that Malcom-PSGD achieves a convergence rate of $\mathcal{O}(1/\sqrt{t})$ with respect to the iterations $t$, assuming a constant consensus and learning rate. This result is supported by our proof for the convergence of non-convex compressed Proximal SGD methods. Additionally, we conduct a bit analysis, providing a closed-form expression for the communication costs associated with Malcom-PSGD. Numerical results verify our theoretical findings and demonstrate that our method reduces communication costs by approximately $75\%$ when compared to the state-of-the-art.

翻訳日:2024-06-08 00:29:50 公開日:2024-06-05
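
The abstract above combines $\ell_1$ regularization with a proximal SGD update. As a hedged sketch of that one ingredient, the code below shows the standard proximal operator of the $\ell_1$ norm (soft-thresholding) applied after a gradient step on a single node. It is not the Malcom-PSGD algorithm itself: the decentralized consensus step, gradient compression, and quantization are omitted, and the function names and numbers are hypothetical.

```python
def soft_threshold(values, threshold):
    """Proximal operator of threshold * ||x||_1: shrink every entry toward zero."""
    shrunk = []
    for v in values:
        magnitude = max(abs(v) - threshold, 0.0)
        if v > 0:
            shrunk.append(magnitude)
        elif v < 0:
            shrunk.append(-magnitude)
        else:
            shrunk.append(0.0)
    return shrunk


def proximal_sgd_step(params, grads, learning_rate, l1_weight):
    """One single-node proximal SGD step: a gradient step followed by soft-thresholding."""
    stepped = [p - learning_rate * g for p, g in zip(params, grads)]
    return soft_threshold(stepped, learning_rate * l1_weight)


params = [0.5, -0.02, 0.01, -1.0]
grads = [0.1, 0.0, 0.0, -0.1]
updated = proximal_sgd_step(params, grads, learning_rate=0.1, l1_weight=0.5)
print(updated)   # entries smaller in magnitude than the threshold 0.05 become exactly zero
```

Because small entries are set exactly to zero, the model that has to be communicated becomes sparse, which is what makes it cheaper to compress.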

# 量子セキュアデジタル署名のための準同型多項式公開鍵暗号

Homomorphic Polynomial Public Key Cryptography for Quantum-secure Digital Signature ( http://arxiv.org/abs/2311.08967v3 )

ライセンス: Link先を確認

Randy Kuang, Maria Perepechaenko, Mahmoud Sayed, Dafu Lou,

(参考訳) 2022年の研究でKuangらは、量子セーフな公開鍵システムにおける乗算と除算の逆関係を利用した多変数ポリノミアル公開鍵(MPPK)暗号を導入した。彼らはMPPKをホモモルフィックなポリノミアル公開鍵(HPPK)に拡張し、大きな隠蔽リング操作に同型暗号化を適用した。当初、鍵カプセル化(KEM)のために設計されたHPPKのセキュリティは、公開多項式の同型暗号化に依存している。本稿では,HPPK KEMをデジタル署名方式に拡張する。 HPPK KEMをデジタルシグネチャに適応させるために、Barrett還元アルゴリズムの拡張を導入し、モジュラ乗算を素体上の検証方程式の分割に変換する。拡張アルゴリズムは、署名を公開多項式係数に非線形に埋め込み、初期のMPPK DSスキームの脆弱性に対処する。セキュリティ分析は、プライマリフィールドサイズの2倍のリングビット長を考慮して、プライベートキーリカバリと偽シグネチャ攻撃の指数関数的複雑性を示す。

In their 2022 study, Kuang et al. introduced Multivariable Polynomial Public Key (MPPK) cryptography, leveraging the inversion relationship between multiplication and division for quantum-safe public key systems. They extended MPPK into Homomorphic Polynomial Public Key (HPPK), employing homomorphic encryption for large hidden ring operations. Originally designed for key encapsulation (KEM), HPPK's security relies on homomorphic encryption of public polynomials. This paper expands HPPK KEM to a digital signature scheme, facing challenges due to the distinct nature of verification compared to decryption. To adapt HPPK KEM to digital signatures, the authors introduce an extension of the Barrett reduction algorithm, transforming modular multiplications into divisions in the verification equation over a prime field. The extended algorithm non-linearly embeds the signature into public polynomial coefficients, addressing vulnerabilities in earlier MPPK DS schemes. Security analysis demonstrates exponential complexity for private key recovery and forged signature attacks, considering ring bit length twice that of the prime field size.

翻訳日:2024-06-08 00:29:50 公開日:2024-06-05
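
The abstract above mentions an extension of the Barrett reduction algorithm. For orientation only, the sketch below shows the classical Barrett reduction, which replaces the division in `x % n` with a multiplication by a precomputed constant and a shift. It is not the extended algorithm of the HPPK signature scheme, whose details are not given in the abstract. The function names and the example modulus are illustrative assumptions.

```python
import random


def barrett_setup(modulus):
    """Precompute the shift k and the constant floor(2**k / modulus) used by Barrett reduction."""
    shift = 2 * modulus.bit_length()
    return shift, (1 << shift) // modulus


def barrett_reduce(x, modulus, shift, factor):
    """Return x mod modulus for 0 <= x < modulus**2 using a multiply and a shift."""
    quotient = (x * factor) >> shift      # estimate of x // modulus, too small by at most 1
    remainder = x - quotient * modulus
    if remainder >= modulus:              # a single correction step is enough
        remainder -= modulus
    return remainder


p = 2**61 - 1                             # example prime modulus (an illustrative choice)
shift, factor = barrett_setup(p)
rng = random.Random(1)
a, b = rng.randrange(p), rng.randrange(p)
print(barrett_reduce(a * b, p, shift, factor) == (a * b) % p)
```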

# GENEVA:LLMを用いた分岐物語の生成と可視化

GENEVA: GENErating and Visualizing branching narratives using LLMs ( http://arxiv.org/abs/2311.09213v3 )

ライセンス: Link先を確認

Jorge Leandro, Sudha Rao, Michael Xu, Weijia Xu, Nebosja Jojic, Chris Brockett, Bill Dolan,

(参考訳) 対話ベースのロールプレイングゲーム(RPG)は、強力なストーリーテリングを必要とする。こうした物語は執筆に何年もかかることがあり、通常は大規模なクリエイティブチームが関わる。本研究では、このプロセスを支援する大規模生成テキストモデルの可能性を示す。プロトタイプツールであるGENEVAは、デザイナーが与える高レベルの物語記述と制約に合致する、分岐と再収束を伴うストーリーラインを持つリッチな物語グラフを生成する。大規模言語モデル(LLM)であるGPT-4を用いて、分岐する物語を生成し、2段階のプロセスでグラフ形式に描画する。異なる文脈的制約のもとで、4つの有名な物語に対する新しい分岐物語の生成にGENEVAを用いる例を示す。このツールは、ゲーム開発、シミュレーション、その他ゲーム的な性質を持つアプリケーションを支援する可能性がある。

Dialogue-based Role Playing Games (RPGs) require powerful storytelling. The narratives of these may take years to write and typically involve a large creative team. In this work, we demonstrate the potential of large generative text models to assist this process. GENEVA, a prototype tool, generates a rich narrative graph with branching and reconverging storylines that match a high-level narrative description and constraints provided by the designer. A large language model (LLM), GPT-4, is used to generate the branching narrative and to render it in a graph format in a two-step process. We illustrate the use of GENEVA in generating new branching narratives for four well-known stories under different contextual constraints. This tool has the potential to assist in game development, simulations, and other applications with game-like properties.

翻訳日:2024-06-08 00:20:02 公開日:2024-06-05

# NFTウォッシュ取引:直接対間接推定

NFT Wash Trading: Direct vs. Indirect Estimation ( http://arxiv.org/abs/2311.18717v2 )

ライセンス: Link先を確認

Brett Hemenway Falk, Gerry Tsoukalas, Niuniu Zhang,

(参考訳) 最近の研究では、Binanceのようなオフチェーンの暗号資産取引所における取引価値の約70%がウォッシュ取引であると推定されている。本稿ではNFT市場に目を向ける。そこでは、Web3のイノベーションの重要な信条であるトランザクションのオンチェーン性により、より直接的な推定手法を適用できる。最大級のNFTマーケットプレイス3つに焦点を当てると、NFTの取引量の30-40%、取引価値の25-95%がウォッシュ取引に関わっていることがわかった。我々はこの直接的なアプローチを利用して、文献で提案されている最近の間接推定手法を批判的に評価し、有効性に大きな差があること、一部の手法は完全に失敗することを明らかにする。Cong et al (2023) で提案されている取引の丸め(trade-roundedness)フィルタは、最も正確な間接推定法であることがわかった。実際、ハイパーパラメータの微調整によって、直接的アプローチと間接的アプローチを密接に一致させられることを示す。本研究は、デジタルファイナンスにおける金融不正の検出と規制において、技術革新が果たす重要な役割を明らかにするものである。

Recent studies estimate that around 70% of traded value on off-chain crypto exchanges like Binance is wash trading. This paper turns to NFT markets, where the on-chain nature of transactions, a key tenet of Web3 innovation, enables more direct estimation methods to be applied. Focusing on three of the largest NFT marketplaces, we find that 30-40% of NFT volume and 25-95% of traded value involve wash trading. We leverage this direct approach to critically evaluate recent indirect estimation methods suggested in the literature, revealing major differences in effectiveness, with some failing altogether. Trade-roundedness filters, as suggested in Cong et al. (2023), emerge as the most accurate indirect estimation method. In fact, we show how direct and indirect approaches can be closely aligned via hyper-parameter fine-tuning. Our findings underscore the crucial role of technological innovation in detecting and regulating financial misconduct in digital finance.

翻訳日:2024-06-08 00:20:02 公開日:2024-06-05

# 人間のように反応する:人間に固有の振る舞いをNAOに組み込む

Reacting like Humans: Incorporating Intrinsic Human Behaviors into NAO through Sound-Based Reactions to Fearful and Shocking Events for Enhanced Sociability ( http://arxiv.org/abs/2312.07671v2 )

ライセンス: Link先を確認

Ali Ghadami, Mohammadreza Taghimohammadi, Mohammad Mohammadzadeh, Mohammad Hosseinipour, Alireza Taheri,

(参考訳) ロボットの人間に対する受容性と社会性は、人間のような反応を取り入れることで著しく向上することができる。人間は考えずに、環境イベントに素早く反応できる。人間が自然反応を示す例は、突然大きな音に遭遇し、彼らを驚かせたり、怖がらせたりする時である。このような瞬間において、個人は直感的に手を動かし、音の起源に向かって向きを変え、出来事の原因を判断しようとする。この固有の行動は、この研究の少ない社会ロボティクスを探求する動機となった。本研究では, 動作発生器, 音響分類器, YOLOオブジェクト検出器から構成されるマルチモーダルシステムを用いて, 環境を感知し, 突然の音の存在下, 自然の人間の恐怖反応を示し, そして, 環境中の恐怖を感知する音源を特定する。これらの有効な動きと推論は、本質的な人間の反応を模倣し、ロボットの社会性を高めることができる。動作生成のために,LSTMとMDNネットワークに基づくモデルを提案し,様々な動作を合成した。また、音検出の場合、音信号のスペクトログラムを入力として使用する転写学習モデルが好ましい。音響検出、モーション生成、画像認識の個別モデルを開発した後、NAOロボットに実装された総合的な「フィーア」モジュールに統合された。最後に、恐怖モジュールを実用的にテストし、2つの専門家グループと非専門家グループ(ロボティクス分野)がロボットの性能を評価するためのアンケートを作成した。提案モジュールは,ロボットの周囲環境において,突発的かつ大音量の音が鳴り響く場合に,ロボットが人間のように振る舞うことを参加者に納得させ,また,非専門家が社会ロボットとその性能に対して高い期待を抱いていることを示す。

Robots' acceptability among humans and their sociability can be significantly enhanced by incorporating human-like reactions. Humans can react to environmental events very quickly and without thinking. An instance where humans show natural reactions is when they encounter a sudden and loud sound that startles or frightens them. During such moments, individuals may instinctively move their hands, turn toward the origin of the sound, and try to determine the event's cause. This inherent behavior motivated us to explore this less-studied part of social robotics. In this work, a multi-modal system composed of an action generator, a sound classifier, and a YOLO object detector was designed to sense the environment, show natural human fear reactions in the presence of sudden loud sounds, and finally locate the fear-causing sound source in the environment. These generated motions and inferences could imitate intrinsic human reactions and enhance the sociability of robots. For motion generation, a model based on LSTM and MDN networks was proposed to synthesize various motions. For sound detection, a transfer learning model that takes the spectrogram of the sound signal as its input was preferred. After developing individual models for sound detection, motion generation, and image recognition, they were integrated into a comprehensive "fear" module implemented on the NAO robot. Finally, the fear module was tested in a practical application, and two groups, experts and non-experts in the robotics area, filled out a questionnaire to evaluate the performance of the robot. The results indicated that the proposed module could convince the participants that the NAO robot acts and reasons like a human when a sudden, loud sound occurs in the robot's surrounding environment, and additionally showed that non-experts have higher expectations about social robots and their performance.

翻訳日:2024-06-08 00:20:02 公開日:2024-06-05

# Webの驚くほど多くが機械翻訳されている:マルチウェイ並列性からの洞察

A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism ( http://arxiv.org/abs/2401.05749v2 )

ライセンス: Link先を確認

Brian Thompson, Mehak Preet Dhaliwal, Peter Frisch, Tobias Domhan, Marcello Federico,

(参考訳) ウェブ上のコンテンツは、しばしば多くの言語に翻訳されることを示し、これらのマルチウェイ翻訳の低品質は、機械翻訳(MT)を用いて作成された可能性が高いことを示している。マルチウェイ並列で機械生成されたコンテンツは、下位のリソース言語における翻訳を支配しているだけでなく、それらの言語における全ウェブコンテンツの大部分を構成している。また、多くの言語に翻訳されるコンテンツの種類の選択バイアスの証拠も、MTを通して低品質の英語コンテンツが多くの低レベルリソース言語に翻訳されるのと一致している。本研究は、モノリンガルデータとバイリンガルデータの両方をウェブから抽出した多言語大言語モデルのようなトレーニングモデルに関する深刻な懸念を提起する。

We show that content on the web is often translated into many languages, and the low quality of these multi-way translations indicates they were likely created using Machine Translation (MT). Multi-way parallel, machine generated content not only dominates the translations in lower resource languages; it also constitutes a large fraction of the total web content in those languages. We also find evidence of a selection bias in the type of content which is translated into many languages, consistent with low quality English content being translated en masse into many lower resource languages, via MT. Our work raises serious concerns about training models such as multilingual large language models on both monolingual and bilingual data scraped from the web.

翻訳日:2024-06-08 00:10:18 公開日:2024-06-05

# Medusa: 複数のデコードヘッドを備えたシンプルなLLM推論高速化フレームワーク

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads ( http://arxiv.org/abs/2401.10774v2 )

ライセンス: Link先を確認

Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao,

(参考訳) 大規模言語モデル(LLM)は、逐次計算を必要とする自己回帰デコーディングを採用しており、各ステップは前のステップの出力に依存する。各ステップでモデルパラメータ全体をHigh-Bandwidth Memory (HBM)からアクセラレータのキャッシュに移動する必要があるため、これがボトルネックとなる。投機的デコーディングのような手法がこの問題への対処として提案されているが、その実装は、別個のドラフトモデルの取得と維持に伴う課題によって妨げられている。本稿では、後続の複数のトークンを並列に予測するデコードヘッドを追加することでLLM推論を強化する効率的な手法、Medusaを提案する。Medusaは、ツリーベースのアテンション機構を用いて複数の候補継続を構築し、各デコードステップでそれらを同時に検証する。並列処理を活用することで、Medusaは必要なデコードステップ数を大幅に削減する。異なるユースケースのニーズに応えるため、Medusaに対する2つのレベルの微調整手順を提示する。Medusa-1: Medusaを凍結したバックボーンLLMの上で直接微調整し、ロスレスな推論高速化を可能にする。Medusa-2: MedusaをバックボーンLLMとともに微調整し、Medusaヘッドの予測精度の向上とより高い高速化を可能にするが、バックボーンモデルの能力を保持する特別な学習レシピを必要とする。さらに、学習データが利用できない状況に対処する自己蒸留や、生成品質を維持しつつ受理率を高める典型的受理(typical acceptance)方式など、Medusaの有用性を向上または拡張するいくつかの拡張を提案する。さまざまなサイズと学習手順のモデルでMedusaを評価する。実験により、Medusa-1は生成品質を損なうことなく2.2倍以上の高速化を達成でき、Medusa-2は高速化を2.3～3.6倍にさらに向上させることが示された。

Large Language Models (LLMs) employ auto-regressive decoding that requires sequential computation, with each step reliant on the previous one's output. This creates a bottleneck as each step necessitates moving the full model parameters from High-Bandwidth Memory (HBM) to the accelerator's cache. While methods such as speculative decoding have been suggested to address this issue, their implementation is impeded by the challenges associated with acquiring and maintaining a separate draft model. In this paper, we present Medusa, an efficient method that augments LLM inference by adding extra decoding heads to predict multiple subsequent tokens in parallel. Using a tree-based attention mechanism, Medusa constructs multiple candidate continuations and verifies them simultaneously in each decoding step. By leveraging parallel processing, Medusa substantially reduces the number of decoding steps required. We present two levels of fine-tuning procedures for Medusa to meet the needs of different use cases: Medusa-1: Medusa is directly fine-tuned on top of a frozen backbone LLM, enabling lossless inference acceleration. Medusa-2: Medusa is fine-tuned together with the backbone LLM, enabling better prediction accuracy of Medusa heads and higher speedup but needing a special training recipe that preserves the backbone model's capabilities. Moreover, we propose several extensions that improve or expand the utility of Medusa, including a self-distillation to handle situations where no training data is available and a typical acceptance scheme to boost the acceptance rate while maintaining generation quality. We evaluate Medusa on models of various sizes and training procedures. Our experiments demonstrate that Medusa-1 can achieve over 2.2x speedup without compromising generation quality, while Medusa-2 further improves the speedup to 2.3-3.6x.

翻訳日:2024-06-08 00:00:12 公開日:2024-06-05
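
The abstract above describes proposing several candidate continuations and verifying them within one decoding step. The toy sketch below illustrates only that accept-the-longest-verified-prefix idea. It assumes a deterministic stand-in for the backbone model (`greedy_next_token`, a made-up rule) and checks candidates one token at a time, whereas Medusa verifies all candidates in a single forward pass using tree-based attention and also offers a typical acceptance scheme. All names and values here are hypothetical.

```python
def greedy_next_token(prefix):
    """Hypothetical stand-in for the backbone LLM: a fixed rule mapping a prefix to its next token."""
    return (sum(prefix) + len(prefix)) % 7


def verify_candidate(prefix, candidate):
    """Return the longest prefix of `candidate` that matches the backbone's greedy choices."""
    accepted = []
    for token in candidate:
        if token != greedy_next_token(prefix + accepted):
            break
        accepted.append(token)
    return accepted


def decode_step(prefix, candidates):
    """Verify every candidate continuation and keep the longest accepted one.

    If nothing is accepted, fall back to one ordinary greedy token so decoding always advances.
    """
    best = max((verify_candidate(prefix, c) for c in candidates), key=len)
    return best if best else [greedy_next_token(prefix)]


prefix = [1, 2]
# Candidate continuations, as extra decoding heads might propose them (made-up values).
candidates = [[5, 4, 0], [5, 1, 2], [3, 4, 2]]
new_tokens = decode_step(prefix, candidates)
print(new_tokens, "accepted in a single step")
```

When the heads guess well, several tokens are accepted per step, which is where the reduction in decoding steps comes from. When they guess badly, the step degrades to ordinary one-token decoding.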

# 脱獄攻撃に対する言語モデルのロバストプロンプト最適化

Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks ( http://arxiv.org/abs/2401.17263v3 )

ライセンス: Link先を確認

Andy Zhou, Bo Li, Haohan Wang,

(参考訳) AIアライメントの進歩にもかかわらず、大規模言語モデル(LLM)は敵対的攻撃やジェイルブレイクに対して依然として脆弱であり、攻撃者はプロンプトを改変して望ましくない振る舞いを誘発できる。いくつかの防御策が提案されているが、新たに提案された攻撃やより困難な脅威モデルには適応していない。そこで本稿では、ジェイルブレイク攻撃からLLMを防御するための最適化ベースの目的関数と、堅牢なシステムレベルの防御を構築するアルゴリズムであるロバスト・プロンプト最適化(RPO)を提案する。本手法では、攻撃者を防御の目的関数に直接組み込み、軽量で転移可能な接尾辞を最適化することで、RPOが最悪ケースの適応的攻撃に対応できるようにする。理論的および実験的結果は、最適化中に見たジェイルブレイクと未知のジェイルブレイクの両方に対する堅牢性の向上を示しており、JailbreakBenchにおける攻撃成功率(ASR)をGPT-4で6%、Llama-2で0%に低下させ、最先端の性能を達成した。コードはhttps://github.com/lapisrocks/rpoにある。

Despite advances in AI alignment, large language models (LLMs) remain vulnerable to adversarial attacks or jailbreaking, in which adversaries can modify prompts to induce unwanted behavior. While some defenses have been proposed, they have not been adapted to newly proposed attacks and more challenging threat models. To address this, we propose an optimization-based objective for defending LLMs against jailbreaking attacks and an algorithm, Robust Prompt Optimization (RPO), to create robust system-level defenses. Our approach directly incorporates the adversary into the defensive objective and optimizes a lightweight and transferable suffix, enabling RPO to adapt to worst-case adaptive attacks. Our theoretical and experimental results show improved robustness to both jailbreaks seen during optimization and unknown jailbreaks, reducing the attack success rate (ASR) on GPT-4 to 6% and on Llama-2 to 0% on JailbreakBench, setting a new state of the art. Code can be found at https://github.com/lapisrocks/rpo
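One schematic way to write a defensive objective that contains the adversary is a min-max problem over a defensive suffix. The notation below is an illustrative assumption and not the paper's exact formulation: $s$ is the suffix, $\mathcal{A}$ a set of jailbreak transformations, $x$ a harmful prompt, $y_{\text{safe}}$ a desired safe response, $\oplus$ string concatenation, and $\mathcal{L}$ a loss such as the negative log-likelihood of $y_{\text{safe}}$.

```latex
\min_{s}\; \max_{a \in \mathcal{A}}\;
\mathcal{L}\bigl(\mathrm{LLM}(a(x) \oplus s),\; y_{\text{safe}}\bigr)
```

Read this way, the inner maximization selects the worst-case attack for the current suffix, and the outer minimization updates the suffix against that attack.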

翻訳日:2024-06-08 00:00:12 公開日:2024-06-05

# 絡み合いと測定の相補的関係

Complementary Relationships between Entanglement and Measurement ( http://arxiv.org/abs/2401.17537v2 )

ライセンス: Link先を確認

Michael Steiner, Ronald Rendell,

(参考訳) パターン可視性、予測可能性、識別可能性などの粒子の干渉特性に関する補完的な関係が存在する。さらに、情報ゲイン$G$と、絡み合ったスピン対に対する測定障害$F$の関係が知られている。ここでは、同様の絡み合いと測定の相補関係が生じるかどうかを考察する。量子ビット系では、単一系における測定と二部系における測定の両方が絡み合いに関して考慮される。 $\overline{E}+D\le 1$は、測定後の平均絡み合いが$\overline{E}$であり、1つの測定の計測乱れが$D$であることを示す。 Alice と Bob が共有する二部系の測定について、$\overline{E}+G\le 1$ ここで$G$は、Bob が得るアリスの結果に関する最大情報ゲインである。これらの結果は任意の初期混合状態と非エルミート作用素に対して一般化される。最大絡み合った初期状態の場合、$D\le E_{L}$と$G\le E_{L}$はアリスによる測定による絡み合い損失である。得られた乱れ量や情報取得量は、絡み合いによって厳密に制限されていると結論付けている。

Complementary relationships exist regarding interference properties of particles such as pattern visibility, predictability and distinguishability. Additionally, relationships are known between information gain $G$ and measurement disturbance $F$ for entangled spin pairs. The question of whether a similar complementary relationship between entanglement and measurement occurs is examined herein. For qubit systems, both measurement on a single system and measurements on a bipartite system are considered with regard to the entanglement. It is proven that $\overline{E}+D\le 1$ holds, where $\overline{E}$ is the average entanglement after a measurement is made and $D$ is a measure of the disturbance caused by a single measurement. For measurements on a bipartite system shared by Alice and Bob, it is shown that $\overline{E}+G\le 1$, where $G$ is the maximum information gain regarding Alice's result that can be obtained by Bob. These results are generalized to arbitrary initial mixed states as well as to non-Hermitian operators. In the case of maximally entangled initial states, it is found that $D\le E_{L}$ and $G\le E_{L}$, where $E_{L}$ is the entanglement loss due to measurement by Alice. We conclude that the amount of disturbance and the information gain that one can obtain are strictly limited by entanglement.

翻訳日:2024-06-08 00:00:12 公開日:2024-06-05

# Monotone, Bi-Lipschitz, Polyak-Lojasiewicz Networks

Monotone, Bi-Lipschitz, and Polyak-Lojasiewicz Networks ( http://arxiv.org/abs/2402.01344v4 )

ライセンス: Link先を確認

Ruigang Wang, Krishnamurthy Dvijotham, Ian R. Manchester,

(参考訳) 本稿では, 入力摂動に対する出力感度) と逆リプシッツ(出力と出力の差分性)の両方をスムーズに制御できるバイリプシッツ可逆ニューラルネットワークBiLipNetを提案する。 2つ目の貢献は、新しいスカラー出力ネットワークPLNetであり、これはBiLipNetと二次ポテンシャルの合成である。我々はPLNetがPolyak-Lojasiewicz条件を満たすことを示し、一意かつ効率的に計算可能な大域的最小値で非凸サロゲート損失を学習するために適用可能であることを示す。これらのネットワークの中心となる技術的要素は、証明された強い単調性とリプシッツ性を持つ新しい可逆的残留層であり、ビリップネットを構築するために直交層を構成する。これらの性質の証明は増分二次的制約に基づいており、スペクトル正規化で達成できるよりもはるかに厳密な境界となる。さらに、高速アルゴリズムを適用可能な3演算分割問題の連続として、BiLipNetの逆数、つまりPLNetの最小値の計算を定式化する。

This paper presents a new bi-Lipschitz invertible neural network, the BiLipNet, which has the ability to smoothly control both its Lipschitzness (output sensitivity to input perturbations) and inverse Lipschitzness (input distinguishability from different outputs). The second main contribution is a new scalar-output network, the PLNet, which is a composition of a BiLipNet and a quadratic potential. We show that PLNet satisfies the Polyak-Lojasiewicz condition and can be applied to learn non-convex surrogate losses with a unique and efficiently-computable global minimum. The central technical element in these networks is a novel invertible residual layer with certified strong monotonicity and Lipschitzness, which we compose with orthogonal layers to build the BiLipNet. The certification of these properties is based on incremental quadratic constraints, resulting in much tighter bounds than can be achieved with spectral normalization. Moreover, we formulate the calculation of the inverse of a BiLipNet -- and hence the minimum of a PLNet -- as a series of three-operator splitting problems, for which fast algorithms can be applied.
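For reference, the two properties named in the abstract have standard textbook definitions, written here with generic constants $\mu$, $\nu$ and $m$ that are not taken from the paper. A map $f$ is bi-Lipschitz when the distance between outputs is bounded on both sides by the distance between inputs, and a differentiable function $g$ satisfies the Polyak-Lojasiewicz (PL) condition when its squared gradient norm dominates its suboptimality.

```latex
\mu \,\lVert x - y \rVert \;\le\; \lVert f(x) - f(y) \rVert \;\le\; \nu \,\lVert x - y \rVert,
\qquad 0 < \mu \le \nu,

\tfrac{1}{2}\,\lVert \nabla g(x) \rVert^{2} \;\ge\; m \,\bigl(g(x) - \min_{z} g(z)\bigr),
\qquad m > 0.
```

The lower bound $\mu$ is what guarantees invertibility, since distinct inputs cannot collapse to the same output. Under the PL condition every stationary point is a global minimizer, because a zero gradient forces $g(x) = \min_{z} g(z)$.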

翻訳日:2024-06-07 23:50:27 公開日:2024-06-05

# 補助的短遅延による強遅延フィードバックによる強化学習の強化

Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays ( http://arxiv.org/abs/2402.03141v2 )

ライセンス: Link先を確認

Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang,

(参考訳) 強化学習(Reinforcement Learning, RL)は、事象と知覚知覚の間の遅延の一般的な場合において困難である。最先端のSOTA(State-of-the-art State Augmentation)技術は、確率的環境における状態空間の爆発または性能劣化に悩まされる。これらの課題に対処するために, 確率環境における性能を損なうことなく, 短時間の遅延を含む補助的タスクを利用して, 長時間の遅延でRLを加速する, 補助的強化学習(AD-RL)手法を提案する。具体的には、AD-RLは短い遅延に対する値関数を学習し、ブートストラップとポリシー改善技術を用いて長い遅延に調整する。理論的には、これはサンプルの複雑さを大幅に減少させる可能性がある。決定論的および確率的ベンチマークでは,本手法はサンプル効率と政策性能の両方においてSOTAよりも有意に優れていた。コードはhttps://github.com/QingyuanWuNothing/AD-RLで入手できる。

Reinforcement learning (RL) is challenging in the common case of delays between events and their sensory perceptions. State-of-the-art (SOTA) state augmentation techniques either suffer from state space explosion or performance degeneration in stochastic environments. To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary tasks involving short delays to accelerate RL with long delays, without compromising performance in stochastic environments. Specifically, AD-RL learns a value function for short delays and uses bootstrapping and policy improvement techniques to adjust it for long delays. We theoretically show that this can greatly reduce the sample complexity. On deterministic and stochastic benchmarks, our method significantly outperforms the SOTAs in both sample efficiency and policy performance. Code is available at https://github.com/QingyuanWuNothing/AD-RL.

翻訳日:2024-06-07 23:50:27 公開日:2024-06-05

# ポリノミアル時間におけるReLUニューラルネットワーク近似グローバルオプティマの凸緩和

Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time ( http://arxiv.org/abs/2402.03625v2 )

ライセンス: Link先を確認

Sungyoon Kim, Mert Pilanci,

(参考訳) 本稿では,2層ReLUネットワーク間における重み劣化と凸緩和の最適性ギャップについて検討する。トレーニングデータがランダムであれば,n がトレーニングサンプル数である O(log n^0.5) の係数によって,元の問題と緩和の間の相対的最適性ギャップが有界であることが示される。単純な応用は、元の非凸問題を対数係数まで解くことが保証される、抽出可能な多項式時間アルゴリズムにつながる。さらに, 緩やかな仮定の下では, 局所勾配法は訓練損失の低い点に収束し, 高い確率で収束することを示す。その結果,局所勾配法が有効である理由の理解に新たな光を当てることができた。

In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations. We show that when the training data is random, the relative optimality gap between the original problem and its relaxation can be bounded by a factor of $O(\sqrt{\log n})$, where $n$ is the number of training samples. A simple application leads to a tractable polynomial-time algorithm that is guaranteed to solve the original non-convex problem up to a logarithmic factor. Moreover, under mild assumptions, we show that local gradient methods converge to a point with low training loss with high probability. Our result is an exponential improvement compared to existing results and sheds new light on understanding why local gradient methods work well.

翻訳日:2024-06-07 23:50:27 公開日:2024-06-05

# DySLIM:カオスシステムのための不変測度による動的安定学習

DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems ( http://arxiv.org/abs/2402.04467v2 )

ライセンス: Link先を確認

Yair Schiff, Zhong Yi Wan, Jeffrey B. Parker, Stephan Hoyer, Volodymyr Kuleshov, Fei Sha, Leonardo Zepeda-Núñez,

(参考訳) 散逸的なカオス系から力学を学ぶことは、その固有の不安定さのために、その正のリャプノフ指数によって形式化され、学習力学における誤りを指数関数的に増幅することが知られている。しかし、これらの系の多くはエルゴード性や引力を示す:コンパクトで非常に複雑な多様体で、軌跡は有限時間で収束し、不変測度、すなわち力学の作用の下で不変な確率分布をサポートし、システムの長期的な統計的挙動を規定する。本研究では、この構造を利用して、軌跡間の不適合のみを対象とする典型的な手法と対照的に、不変測度と力学の学習を対象とする新しい枠組みを提案する。我々のフレームワークは、既存の学習目的で使用できる、抽出可能でサンプルの効率的な目的を提案するのに使われます。我々のDynamics Stable Learning by Invariant Measure (DySLIM) の目的は、他の学習目標と比較して、より優れたポイントワイドトラッキングと長期統計精度を実現するモデルトレーニングを可能にすることである。スケーラブルな正規化項で分布をターゲットとすることで、気候や気候モデルのようなゆっくりと変化する分布を示すより複雑なシステムにこのアプローチを拡張できることを期待する。

Learning dynamics from dissipative chaotic systems is notoriously difficult due to their inherent instability, as formalized by their positive Lyapunov exponents, which exponentially amplify errors in the learned dynamics. However, many of these systems exhibit ergodicity and an attractor: a compact and highly complex manifold, to which trajectories converge in finite-time, that supports an invariant measure, i.e., a probability distribution that is invariant under the action of the dynamics, which dictates the long-term statistical behavior of the system. In this work, we leverage this structure to propose a new framework that targets learning the invariant measure as well as the dynamics, in contrast with typical methods that only target the misfit between trajectories, which often leads to divergence as the trajectories' length increases. We use our framework to propose a tractable and sample efficient objective that can be used with any existing learning objectives. Our Dynamics Stable Learning by Invariant Measure (DySLIM) objective enables model training that achieves better point-wise tracking and long-term statistical accuracy relative to other learning objectives. By targeting the distribution with a scalable regularization term, we hope that this approach can be extended to more complex systems exhibiting slowly-variant distributions, such as weather and climate models.
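An objective of this kind can be sketched as a trajectory misfit plus a penalty on the distance between the distribution of predicted states and the distribution of reference states. The Python sketch below rests on several assumptions: it uses a squared maximum mean discrepancy (MMD) with a Gaussian kernel on scalar states as the distribution distance, and the function names, the bandwidth and the weight `lam` are hypothetical choices rather than the paper's implementation.

```python
import math
from typing import Sequence


def rbf_kernel(x: float, y: float, bandwidth: float) -> float:
    """Gaussian (RBF) kernel between two scalar states."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[float], ys: Sequence[float], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs: Sequence[float], ys: Sequence[float], bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def trajectory_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Point-wise misfit between a predicted and a reference trajectory."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def regularized_loss(
    pred: Sequence[float],
    target: Sequence[float],
    lam: float = 1.0,
    bandwidth: float = 1.0,
) -> float:
    """Trajectory misfit plus a distribution-matching penalty weighted by lam."""
    return trajectory_mse(pred, target) + lam * mmd_squared(pred, target, bandwidth)


reference = [0.0, 1.0, 0.0, -1.0]
drifted = [2.0, 2.0, 2.0, 2.0]
loss_same = regularized_loss(reference, reference)
loss_drifted = regularized_loss(drifted, reference)
```

The penalty ignores the time ordering of states and compares only where the trajectories spend their time, which is the sense in which it targets long-term statistics rather than point-wise agreement. A realistic setup would draw the samples for the penalty from long rollouts, which this toy example does not attempt.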

翻訳日:2024-06-07 23:50:27 公開日:2024-06-05

# Caduceus: 双方向等価長鎖DNA配列モデリング

Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling ( http://arxiv.org/abs/2403.03234v2 )

ライセンス: Link先を確認

Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov,

(参考訳) 大規模シーケンスモデリングが急速に進歩し、生物学やゲノム工学に発展した。しかし、ゲノム配列のモデリングは、長距離トークン相互作用のモデル化の必要性、ゲノムの上流領域と下流領域の影響、DNAの逆相補性(RC)といった課題をもたらす。本稿では、長距離マンバブロックから構築したこれらの課題に動機づけられたアーキテクチャを提案し、それを双方向性をサポートするBiMambaコンポーネントに拡張し、さらにRC等分散をサポートするMambaDNAブロックに拡張する。 RC同種二方向長鎖DNA言語モデルの最初のファミリーであるCaduceusの基盤としてMambaDNAを用い,CaduceusのDNA基盤モデルを生成する事前学習および微調整戦略を導入する。 Caduceusは、ダウンストリームベンチマークで以前の長距離モデルよりも優れており、挑戦的な長距離変動効果予測タスクでは、双方向性や等分散を生かさない10倍の大きなモデルの性能を上回っている。

Large-scale sequence modeling has sparked rapid advances that now extend into biology and genomics. However, modeling genomic sequences introduces challenges such as the need to model long-range token interactions, the effects of upstream and downstream regions of the genome, and the reverse complementarity (RC) of DNA. Here, we propose an architecture motivated by these challenges that builds off the long-range Mamba block, and extends it to a BiMamba component that supports bi-directionality, and to a MambaDNA block that additionally supports RC equivariance. We use MambaDNA as the basis of Caduceus, the first family of RC equivariant bi-directional long-range DNA language models, and we introduce pre-training and fine-tuning strategies that yield Caduceus DNA foundation models. Caduceus outperforms previous long-range models on downstream benchmarks; on a challenging long-range variant effect prediction task, Caduceus exceeds the performance of 10x larger models that do not leverage bi-directionality or equivariance.
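Reverse complementarity and the equivariance property it motivates can be shown with a small Python sketch. The "model" here is a hypothetical per-position function (`gc_flags`) chosen only because its equivariance is easy to check by hand; it has nothing to do with the Mamba-based architecture. The check also uses a simplified notion of equivariance for scalar per-position outputs, namely that reverse-complementing the input reverses the output.

```python
from typing import Callable, List

# Watson-Crick base pairing; N (unknown base) maps to itself.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_flags(seq: str) -> List[int]:
    """Toy per-position 'model': 1 where the base is G or C, else 0."""
    return [1 if base in "GC" else 0 for base in seq]


def is_rc_equivariant(model: Callable[[str], List[int]], seq: str) -> bool:
    """Check that reverse-complementing the input reverses the per-position output."""
    return model(reverse_complement(seq)) == list(reversed(model(seq)))


example = "ATGC"
rc_example = reverse_complement(example)
equivariant = is_rc_equivariant(gc_flags, "ATGCCGTTA")
```

A model without this property can give different predictions for the two strands of the same double-stranded region, which is the inconsistency that an RC-equivariant design rules out by construction.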

翻訳日:2024-06-07 23:50:27 公開日:2024-06-05

# SU(3)離散部分群に対する原始量子ゲート:$Σ(36\times3)$

Primitive Quantum Gates for an SU(3) Discrete Subgroup: $Σ(36\times3)$ ( http://arxiv.org/abs/2405.05973v2 )

ライセンス: Link先を確認

Erik J. Gustafson, Yao Ji, Henry Lamm, Edison M. Murairi, Shuchen Zhu,

(参考訳) 我々は、108要素の$\Sigma(36\times3)$群のデジタル量子シミュレーションのための原始ゲートセットを構築する。量子シミュレーションのために$SU(3)$の非アーベル結晶のような部分群が構築されたのはこれが初めてである。ゲージリンクレジスタと必要なプリミティブ -- 反転ゲート、グループ乗算ゲート、トレースゲート、および$\Sigma(36\times3)$ Fourier変換 -- は、8量子符号化と不均一3量子レジスタと2量子レジスタの両方に対して提示される。後者では、任意のユニタリをこのアーキテクチャに分解する特別なコンパイラが開発された。

We construct the primitive gate set for the digital quantum simulation of the 108-element $\Sigma(36\times3)$ group. This is the first time a nonabelian crystal-like subgroup of $SU(3)$ has been constructed for quantum simulation. The gauge link registers and necessary primitives -- the inversion gate, the group multiplication gate, the trace gate, and the $\Sigma(36\times3)$ Fourier transform -- are presented for both an eight-qubit encoding and a heterogeneous three-qutrit plus two-qubit register. For the latter, a specialized compiler was developed for decomposing arbitrary unitaries onto this architecture.

翻訳日:2024-06-07 23:50:27 公開日:2024-06-05

# 平均$n$-stepの返却は強化学習における変数を減らす

Averaging $n$-step Returns Reduces Variance in Reinforcement Learning ( http://arxiv.org/abs/2402.03903v2 )

ライセンス: Link先を確認

Brett Daley, Martha White, Marlos C. Machado,

(参考訳) n$-step returnや$\lambda$-returnsといったマルチステップリターンは、強化学習(RL)メソッドのサンプル効率を改善するために一般的に使用される。多段階学習の利点を逆転させ、未来に近づきすぎると、多段階学習の利点が逆転する。我々の研究では、分散を減らすために複合戻り値 -- $n$-step の重み付き平均値 -- が示される。与えられた$n$-stepの戻り値と同じ縮約係数を持つ任意の化合物が、厳密に分散を減少させることを初めて証明する。さらに,この分散還元特性が線形関数近似の下での時間差学習の有限サンプル複雑性を向上させることを証明した。一般化合物のリターンは実装に費用がかかるため,ミニバッチ経験再生を用いた場合であっても,効率を保ちながら分散を低減できる2ブートストラップリターンを導入する。 DQN や PPO のような深部RL 剤の試料効率が$n$-step である場合が多いことを示す実験を行った。

Multistep returns, such as $n$-step returns and $\lambda$-returns, are commonly used to improve the sample efficiency of reinforcement learning (RL) methods. The variance of the multistep returns becomes the limiting factor in their length; looking too far into the future increases variance and reverses the benefits of multistep learning. In our work, we demonstrate the ability of compound returns -- weighted averages of $n$-step returns -- to reduce variance. We prove for the first time that any compound return with the same contraction modulus as a given $n$-step return has strictly lower variance. We additionally prove that this variance-reduction property improves the finite-sample complexity of temporal-difference learning under linear function approximation. Because general compound returns can be expensive to implement, we introduce two-bootstrap returns which reduce variance while remaining efficient, even when using minibatched experience replay. We conduct experiments showing that compound returns often increase the sample efficiency of $n$-step deep RL agents like DQN and PPO.
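The relationship between an $n$-step return, a compound return, and their contraction moduli can be made concrete with a short Python sketch. It uses the standard definition of the $n$-step return (discounted rewards plus a bootstrapped value); the helper names and the toy reward sequence are assumptions for illustration. The weights are chosen so that a mix of 1-step and 3-step returns has the same contraction modulus $\gamma^2$ as the 2-step return.

```python
import math
from typing import Dict, Sequence


def n_step_return(
    rewards: Sequence[float], values: Sequence[float], t: int, n: int, gamma: float
) -> float:
    """n-step return from time t: n discounted rewards plus a bootstrapped value."""
    discounted = sum(gamma ** k * rewards[t + k] for k in range(n))
    return discounted + gamma ** n * values[t + n]


def compound_return(
    rewards: Sequence[float],
    values: Sequence[float],
    t: int,
    weights: Dict[int, float],
    gamma: float,
) -> float:
    """Weighted average of n-step returns; weights map n to a weight and sum to 1."""
    return sum(
        w * n_step_return(rewards, values, t, n, gamma) for n, w in weights.items()
    )


def contraction_modulus(weights: Dict[int, float], gamma: float) -> float:
    """Contraction modulus of a compound return: the weighted sum of gamma**n."""
    return sum(w * gamma ** n for n, w in weights.items())


gamma = 0.5
rewards = [1.0, 1.0, 1.0, 1.0]
values = [0.0] * 5  # V(s_0) .. V(s_4), all zero in this toy example

two_step = n_step_return(rewards, values, t=0, n=2, gamma=gamma)

# Solve w * gamma + (1 - w) * gamma**3 == gamma**2 for w, giving w = gamma / (1 + gamma).
w1 = gamma / (1.0 + gamma)
weights = {1: w1, 3: 1.0 - w1}
compound = compound_return(rewards, values, t=0, weights=weights, gamma=gamma)
```

With deterministic rewards both targets coincide, so this example only demonstrates the matching moduli. The variance claim in the abstract concerns the stochastic case and is not verified by this sketch.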

翻訳日:2024-06-07 23:40:31 公開日:2024-06-05

# 離散状態空間上の生成フロー:タンパク質共設計への応用によるマルチモーダルフローの実現

Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design ( http://arxiv.org/abs/2402.04997v2 )

ライセンス: Link先を確認

Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, Tommi Jaakkola,

(参考訳) 離散データと連続データを組み合わせることは、生成モデルにとって重要な能力である。本稿では、離散データの新しいフローベースモデルである離散フローモデル(DFM)について述べる。私たちの重要な洞察は、連続時間マルコフ連鎖を用いて連続空間フローマッチングの離散的等価性を実現できるということです。 DFMは、離散拡散モデルを特定のインスタンスとして含む単純な導出の恩恵を受けつつ、既存の拡散に基づくアプローチよりも優れた性能を実現している。我々はDFM法を用いてマルチモーダルフローに基づくモデリングフレームワークを構築した。この能力をタンパク質共設計のタスクに適用し、タンパク質の構造と配列を共同生成するモデルを学ぶ。提案手法は,同じマルチモーダルモデルを用いてシーケンスや構造を柔軟に生成しながら,最先端の協調設計性能を実現する。

Combining discrete and continuous data is an important capability for generative models. We present Discrete Flow Models (DFMs), a new flow-based model of discrete data that provides the missing link in enabling flow-based generative models to be applied to multimodal continuous and discrete data problems. Our key insight is that the discrete equivalent of continuous space flow matching can be realized using Continuous Time Markov Chains. DFMs benefit from a simple derivation that includes discrete diffusion models as a specific instance while allowing improved performance over existing diffusion-based approaches. We utilize our DFMs method to build a multimodal flow-based modeling framework. We apply this capability to the task of protein co-design, wherein we learn a model for jointly generating protein structure and sequence. Our approach achieves state-of-the-art co-design performance while allowing the same multimodal model to be used for flexible generation of the sequence or structure.

翻訳日:2024-06-07 23:40:31 公開日:2024-06-05

# ゼロショットエンドツーエンド音声翻訳の限界を押し上げる

Pushing the Limits of Zero-shot End-to-End Speech Translation ( http://arxiv.org/abs/2402.10422v2 )

ライセンス: Link先を確認

Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà,

(参考訳) データ不足とテキストモダリティ間のモダリティギャップは、エンドツーエンド音声翻訳(ST)システムの2つの大きな障害であり、その性能を損なう。従来の研究は、外部MTデータを活用することによるこれらの課題の軽減と、音声テキスト表現を近づける距離メトリクスの最適化を試みてきた。しかし、競争結果を達成するには、通常いくつかのSTデータが必要である。このため、ゼロショットSTの手法であるZeroSwotを導入し、ペアのSTデータを使わずにモダリティギャップをブリッジする。新たなCTC圧縮と最適トランスポートを利用して、ASRデータのみを用いて音声エンコーダを訓練し、多言語MTモデルの表現空間と整合する。音声エンコーダは、推論時にMTモデルとシームレスに統合され、MTモデルによってサポートされている全ての言語間で、音声からテキストへの直接変換を可能にする。実験の結果,STデータを使わずに効率よくモダリティギャップを塞ぐことができることがわかったが,MuST-CとCoVoSTは従来のゼロショットモデルだけでなく,教師付きモデルよりも手法の優位性を実証し,最先端の結果を得ることができた。

Data scarcity and the modality gap between the speech and text modalities are two major obstacles for end-to-end Speech Translation (ST) systems, hindering their performance. Prior work has attempted to mitigate these challenges by leveraging external MT data and optimizing distance metrics that bring the speech and text representations closer together. However, achieving competitive results typically requires some ST data. For this reason, we introduce ZeroSwot, a method for zero-shot ST that bridges the modality gap without any paired ST data. Leveraging a novel CTC compression and Optimal Transport, we train a speech encoder using only ASR data, to align with the representation space of a massively multilingual MT model. The speech encoder seamlessly integrates with the MT model at inference, enabling direct translation from speech to text, across all languages supported by the MT model. Our experiments show that we can effectively close the modality gap without ST data, while our results on MuST-C and CoVoST demonstrate our method's superiority over not only previous zero-shot models, but also supervised ones, achieving state-of-the-art results.
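CTC-based compression in its generic form shortens a sequence of speech frames by merging consecutive frames that share the same predicted CTC label and dropping blank frames. The Python sketch below shows that generic form only; it is not the novel variant the paper proposes, and the function name, the averaging rule and the blank index are assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_compress(
    frames: Sequence[Sequence[float]], labels: Sequence[int], blank: int = 0
) -> List[List[float]]:
    """Average each run of frames with the same CTC label; drop runs labelled blank."""
    compressed: List[List[float]] = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label != blank:
            chunk = frames[start:start + length]
            dim = len(chunk[0])
            # Element-wise mean of the frame vectors in this run.
            compressed.append(
                [sum(frame[d] for frame in chunk) / length for d in range(dim)]
            )
        start += length
    return compressed


frames = [[1.0, 1.0], [3.0, 3.0], [0.0, 0.0], [5.0, 7.0]]
labels = [2, 2, 0, 4]  # per-frame argmax of a CTC classifier; 0 is the blank
compressed = ctc_compress(frames, labels)
```

The compressed sequence has roughly one vector per predicted token instead of one per acoustic frame, which brings its length closer to that of a text sequence before any alignment objective is applied.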

翻訳日:2024-06-07 23:30:46 公開日:2024-06-05

# Llamasは英語で働くか?多言語トランスフォーマーの潜在言語について

Do Llamas Work in English? On the Latent Language of Multilingual Transformers ( http://arxiv.org/abs/2402.10588v3 )

ライセンス: Link先を確認

Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West,

(参考訳) 我々は、言語モデルがどのように機能するか、言語バイアスの起源を理解する上で重要な問題である、英語を内部的なピボット言語として使用する、バランスの取れない英語支配のコーパスで訓練された多言語言語モデルかどうかを問う。変換器モデルのLlama-2ファミリに着目し,一意に正しい単発連続性を持つ英語でないプロンプトを慎重に構築する。層から層へ変換器は、最終プロンプトトークンの入力埋め込みを次の確率が計算される出力埋め込みに徐々にマッピングする。中間埋め込みを高次元空間で追跡すると、(1)中間埋め込みは出力トークンの埋め込みから遠く離れたところから始まり、(2)既に中間層で意味論的に正しい次のトークンを復号できるが、そのバージョンが英語で入力言語よりも高い確率を与える。これらの結果を「入力空間」と「概念空間」と「出力空間」の3つの相がそれぞれ動作する概念モデルにキャストした。重要な証拠としては、抽象的な「概念空間」は他の言語よりも英語に近いことが示唆されており、多言語言語モデルが持つバイアスに関して重要な結果をもたらす可能性がある。

We ask whether multilingual language models trained on unbalanced, English-dominated corpora use English as an internal pivot language -- a question of key importance for understanding how language models function and the origins of linguistic bias. Focusing on the Llama-2 family of transformer models, our study uses carefully constructed non-English prompts with a unique correct single-token continuation. From layer to layer, transformers gradually map an input embedding of the final prompt token to an output embedding from which next-token probabilities are computed. Tracking intermediate embeddings through their high-dimensional space reveals three distinct phases, whereby intermediate embeddings (1) start far away from output token embeddings; (2) already allow for decoding a semantically correct next token in the middle layers, but give higher probability to its version in English than in the input language; (3) finally move into an input-language-specific region of the embedding space. We cast these results into a conceptual model where the three phases operate in "input space", "concept space", and "output space", respectively. Crucially, our evidence suggests that the abstract "concept space" lies closer to English than to other languages, which may have important consequences regarding the biases held by multilingual language models.
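Decoding an intermediate embedding into a next-token distribution can be sketched by projecting a hidden state onto the output (unembedding) vectors and applying a softmax. The Python example below is a toy under stated assumptions: the three-word vocabulary, the two-dimensional vectors and the per-layer hidden states are invented for illustration, and a real transformer would normally apply its final normalization layer before this projection.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_hidden_state(
    hidden: Sequence[float],
    unembedding: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Tuple[str, float]:
    """Project a hidden state onto each token's output vector; return the top token."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in unembedding]
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs[best]


# Hypothetical toy setup: an English token, its French counterpart, and a filler token.
vocab = ["flower", "fleur", "<other>"]
unembedding = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
layer_states = [[-1.0, -1.0], [2.0, 0.5], [0.5, 2.0]]  # early, middle, final layer

decoded = [decode_hidden_state(h, unembedding, vocab)[0] for h in layer_states]
```

In this invented example the middle layer decodes to the English token and only the final layer decodes to the input-language token, mirroring the second and third phases described in the abstract.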

翻訳日:2024-06-07 23:30:46 公開日:2024-06-05

# ソフトな自己整合性により言語モデルエージェントが改善

Soft Self-Consistency Improves Language Model Agents ( http://arxiv.org/abs/2402.13212v2 )

ライセンス: Link先を確認

Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal,

(参考訳) 大規模言語モデル(LLM)の生成は、最終的な答えを選択するために複数のソリューションのサンプリングとスコアリングによって改善される。自己整合性(SC)のような現在の「サンプルと選択」手法は、回答を得るために多数決に頼っている。しかし、タスクが多くの明瞭で有効な答えを持っている場合、投票による選択には多数のサンプルが必要である。これにより、SCは複数のアクション(回答)を逐次生成する対話的なタスクに対して、極めて高価になる。このようなタスクに対して多数決が一貫した利得を得られないことを確立した後、スコアリング基準を軟化して成功率を高める方法を示す。我々は,SCの不連続スコアをモデル確率から計算した連続スコアに置き換えるソフト自己整合性(SOFT-SC)を導入する。 SOFT-SCは長期の対話的タスクの性能と効率を向上し、SCと同等またはより良いパフォーマンスのために半分のサンプルを必要とする。一定の数のサンプルに対して、SOFT-SCは、bashプログラムの絶対的な成功率でSCを1.3%上回り、オンラインショッピング(WebShop)では6.6%増、インタラクティブホームゲーム(ALFWorld)では4.7%増となる。最後に,オープンソースモデルとブラックボックスモデルの両方に適用可能であることを示す。

Generations from large language models (LLMs) can be improved by sampling and scoring multiple solutions to select a final answer. Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers. However, when tasks have many distinct and valid answers, selection by voting requires a large number of samples. This makes SC prohibitively expensive for interactive tasks that involve generating multiple actions (answers) sequentially. After establishing that majority voting fails to provide consistent gains on such tasks, we demonstrate how to increase success rates by softening the scoring criterion. We introduce Soft Self-Consistency (SOFT-SC), which replaces SC's discontinuous scoring with a continuous score computed from model likelihoods, allowing for selection even when actions are sparsely distributed. SOFT-SC improves both performance and efficiency on long-horizon interactive tasks, requiring half as many samples as SC for comparable or better performance. For a fixed number of samples, SOFT-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping (WebShop), and a 4.7% increase for an interactive household game (ALFWorld). Finally, we show that SOFT-SC can be applied to both open-source and black-box models.
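The difference between voting and likelihood-based selection can be seen in a few lines of Python. The sketch assumes that each sampled action comes with per-token log-probabilities and scores an action by their mean; that aggregation, the function names and the sample data are illustrative assumptions, not the paper's exact scoring rule.

```python
from collections import Counter
from typing import Sequence


def majority_vote(actions: Sequence[str]) -> str:
    """Self-consistency style selection: return the most frequent action."""
    return Counter(actions).most_common(1)[0][0]


def soft_select(actions: Sequence[str], token_logprobs: Sequence[Sequence[float]]) -> str:
    """Select the action with the highest mean token log-probability."""
    scores = [sum(lps) / len(lps) for lps in token_logprobs]
    best = max(range(len(actions)), key=lambda i: scores[i])
    return actions[best]


# Three distinct sampled actions: every vote count is 1, so voting carries no signal.
actions = ["ls -a", "ls -la", "find . -maxdepth 1"]
token_logprobs = [[-0.1, -0.2], [-1.0, -2.0], [-0.5, -0.5, -0.5]]

soft_choice = soft_select(actions, token_logprobs)
vote_choice = majority_vote(["ls -a", "ls -la", "ls -a"])
```

When every sample is distinct, a vote can only break the tie arbitrarily, while the continuous score still ranks the candidates. This is why likelihood-based scoring can work with fewer samples on tasks with sparse action distributions.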

翻訳日:2024-06-07 21:22:40 公開日:2024-06-05

# 因果推論問題に対する言語モデルの最適化

Optimizing Language Models for Human Preferences is a Causal Inference Problem ( http://arxiv.org/abs/2402.14979v2 )

ライセンス: Link先を確認

Victoria Lin, Eli Ben-Michael, Louis-Philippe Morency,

(参考訳) 大規模言語モデル(LLM)が学術的・商業的に広く使われるようになるにつれて、言語モデルが人間の好みに沿ったテキストを生成する方法への関心が高まっている。本稿では,テキストと関連する数値結果からなる直接結果データセットから人選好の言語モデル最適化について検討する。まず,言語モデルの最適化を因果問題と見なして,モデルがテキストと結果の関係を正しく学習することを保証する。本稿では,この因果的言語最適化問題を形式化し,その問題に対する非バイアスな代用目的を解決する手法-因果的選好最適化(CPO)を開発した。さらにCPOを2倍に頑健なCPO(DR-CPO)で拡張し,サロゲート目標のばらつきを低減し,バイアスに対する強い保証を維持した。最後に, DR-CPOの有効性を実証的に実証し, 困難条件下でのDR-CPOのロバスト性を検証した。

As large language models (LLMs) see greater use in academic and commercial settings, there is increasing interest in methods that allow language models to generate texts aligned with human preferences. In this paper, we present an initial exploration of language model optimization for human preferences from direct outcome datasets, where each sample consists of a text and an associated numerical outcome measuring the reader's response. We first propose that language model optimization should be viewed as a causal problem to ensure that the model correctly learns the relationship between the text and the outcome. We formalize this causal language optimization problem, and we develop a method--causal preference optimization (CPO)--that solves an unbiased surrogate objective for the problem. We further extend CPO with doubly robust CPO (DR-CPO), which reduces the variance of the surrogate objective while retaining provably strong guarantees on bias. Finally, we empirically demonstrate the effectiveness of (DR-)CPO in optimizing state-of-the-art LLMs for human preferences on direct outcome data, and we validate the robustness of DR-CPO under difficult confounding conditions.

翻訳日:2024-06-07 21:12:20 公開日:2024-06-05

# SoK:フェデレーション・アンラーニングにおける課題と機会

SoK: Challenges and Opportunities in Federated Unlearning ( http://arxiv.org/abs/2403.02437v2 )

ライセンス: Link先を確認

Hyejun Jeong, Shiqing Ma, Amir Houmansadr,

(参考訳) 2017年に導入されたフェデレートラーニング(FL)は、信頼できない当事者間の協調的な学習を促進する。これにより、GDPRやCPRAといったプライバシー規制を尊重しながら、ユーザデータのトレーニングモデルが可能になる。しかし、新たなプライバシ要件は、データ所有者や法執行機関から要求された場合などに、モデル所有者が一部の学習済みデータを「忘れる」(forget)ことができるよう義務付ける可能性がある。これにより、機械アンラーニング(machine unlearning)と呼ばれる研究分野が誕生した。 FLの文脈では、集中的な環境での未学習のために開発された多くのテクニックは、簡単には適用できない。これは、集中学習と分散学習、特に相互作用性、確率性、不均一性、FLにおける限定的なアクセシビリティの違いによるものである。これに対し、最近の研究はFLに適した未学習メカニズムの開発に重点を置いている。本論文は、この新興分野の研究動向と課題を特定することを目的として、フェデレーテッド・アンラーニング(federated unlearning)の文献を深く研究することを目的としている。 FLアンラーニング(2020年以降)で発表された論文を慎重に分類することで、フェデレートされたアンラーニングのユニークな複雑さを特定し、集中型アンラーニングメソッドを直接適用する際の制限を強調することを目指している。我々は、影響の除去と性能回復に関する既存の非学習手法を比較し、脅威モデルと仮定を比較し、その意味と限界について議論する。例えば、データの不均一性やシミュレーション、デモに使われるデータセット、評価指標など、さまざまな観点からFLアンラーニング研究の実験的なセットアップを分析する。我々の研究は、将来のフェデレーション・アンラーニング研究のための洞察と提案を提供することを目的としている。

Federated learning (FL), introduced in 2017, facilitates collaborative learning between non-trusting parties with no need for the parties to explicitly share their data among themselves. This allows training models on user data while respecting privacy regulations such as GDPR and CPRA. However, emerging privacy requirements may require model owners to be able to "forget" some learned data, e.g., when requested by data owners or law enforcement. This has given birth to an active field of research called machine unlearning. In the context of FL, many techniques developed for unlearning in centralized settings are not trivially applicable. This is due to the unique differences between centralized and distributed learning, in particular, interactivity, stochasticity, heterogeneity, and limited accessibility in FL. In response, a recent line of work has focused on developing unlearning mechanisms tailored to FL. This SoK paper aims to take a deep look at the federated unlearning literature, with the goal of identifying research trends and challenges in this emerging field. By carefully categorizing papers published on FL unlearning (since 2020), we aim to pinpoint the unique complexities of federated unlearning, highlighting limitations on directly applying centralized unlearning methods. We compare existing federated unlearning methods regarding influence removal and performance recovery, compare their threat models and assumptions, and discuss their implications and limitations. For instance, we analyze the experimental setup of FL unlearning studies from various perspectives, including data heterogeneity and its simulation, the datasets used for demonstration, and evaluation metrics. Our work aims to offer insights and suggestions for future research on federated unlearning.

翻訳日:2024-06-07 21:02:35 公開日:2024-06-05

# DRAGIN:大規模言語モデルの情報要求に基づく動的検索拡張生成

DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models ( http://arxiv.org/abs/2403.10081v2 )

ライセンス: Link先を確認

Weihang Su, Yichen Tang, Qingyao Ai, Zhijing Wu, Yiqun Liu,

(参考訳) 動的検索拡張生成(RAG)パラダイムは,Large Language Models(LLMs)のテキスト生成プロセスにおいて,いつ,何を検索するかを積極的に決定する。このパラダイムには2つの重要な要素がある: 検索モジュールをアクティベートする最適なモーメントを識別する(検索するタイミングを決定する)ことと、検索が起動したら適切なクエリを作成する(検索する項目を決定する)ことである。しかし、現在の動的RAGメソッドはどちらの面においても不足している。まず、いつ取得するかを決める戦略は、しばしば静的なルールに依存します。さらに、何を取得するかを決める戦略は、通常、LLMの最新の文や最後のいくつかのトークンに制限されるが、LLMのリアルタイム情報要求は、コンテキスト全体にまたがる可能性がある。これらの制約を克服するために,LLMのリアルタイム情報要求に基づく動的検索拡張生成(DRAGIN)という新しいフレームワークを導入する。本フレームワークは,テキスト生成プロセスにおいて,LLMのリアルタイム情報要求に基づいて,いつ,何を取得するかを決定するように設計されている。 DRAGINと既存の4つの知識集約型生成データセットを包括的に比較した。実験の結果,DRAGINは全タスクにおいて優れた性能を示し,本手法の有効性を実証した。 https://github.com/oneal2000/DRAGIN/tree/main

The dynamic retrieval augmented generation (RAG) paradigm actively decides when and what to retrieve during the text generation process of Large Language Models (LLMs). There are two key elements of this paradigm: identifying the optimal moment to activate the retrieval module (deciding when to retrieve) and crafting the appropriate query once retrieval is triggered (determining what to retrieve). However, current dynamic RAG methods fall short in both aspects. Firstly, the strategies for deciding when to retrieve often rely on static rules. Moreover, the strategies for deciding what to retrieve typically limit themselves to the LLM's most recent sentence or the last few tokens, while the LLM's real-time information needs may span the entire context. To overcome these limitations, we introduce a new framework, DRAGIN, i.e., Dynamic Retrieval Augmented Generation based on the real-time Information Needs of LLMs. Our framework is specifically designed to make decisions on when and what to retrieve based on the LLM's real-time information needs during the text generation process. We evaluate DRAGIN along with existing methods comprehensively over 4 knowledge-intensive generation datasets. Experimental results show that DRAGIN achieves superior performance on all tasks, demonstrating the effectiveness of our method. We have open-sourced all the code, data, and models on GitHub: https://github.com/oneal2000/DRAGIN/tree/main

翻訳日:2024-06-07 20:52:38 公開日:2024-06-05

# VORTEX:リアルタイムオフチェーン支払いと暗号通貨のクロスチェーンスワップ

VORTEX: Real-Time Off-Chain Payments and Cross-Chain Swaps for Cryptocurrencies ( http://arxiv.org/abs/2403.15191v3 )

ライセンス: Link先を確認

Di Wu, Jian Liu, Zhengwei Hou, Wu Wen, Kui Ren,

(参考訳) 本稿では、オフチェーン決済とクロスチェーンスワップの2つの重要な課題に対処する、TEEベースのレイヤ2ソリューションであるVERTEXを提案する。チャンネルなしのオフチェーン支払い: オンチェーン関係や仲介チャネルを必要とせずに、誰にでも直接支払いができる。 - リアルタイムだが分散化されたクロスチェーンスワップ: 中央サーバに頼ることなく、リアルタイムのクロスチェーンスワップを可能にする、最初の既知のソリューションである。この新機能は、画期的な公正な交換プロトコルによって実現されている。 TEEクラッシュ耐性(TEE crash-tolerance): TEEクラッシュを処理するための2つのソリューションを提供する。我々は1000ノードからなるネットワーク上でECHOを評価し,その評価結果から,ECHOが7000TPSを達成することを示す。

In this paper, we present VERTEX, a TEE-based layer-2 solution that tackles two crucial challenges in the realm of cryptocurrencies: off-chain payments and cross-chain swaps. It offers three notable features. (1) Channel-free off-chain payments: it allows a payer to make direct payments to anyone without requiring any on-chain relationship or intermediary channels. (2) Real-time yet decentralized cross-chain swaps: it is the first known solution that enables real-time cross-chain swaps without relying on a central server. This novel feature is made possible through a ground-breaking fair exchange protocol. (3) TEE crash-tolerance: it offers two solutions to handle TEE crashes, one of which involves an innovative application of time-lock puzzles in this context. We evaluate ECHO on a network consisting of 1000 nodes, and the evaluation results show that ECHO can achieve 7000 TPS.

翻訳日:2024-06-07 20:52:38 公開日:2024-06-05

# 動的システムの高精度かつ効率的な予測のためのハイブリッド化と次世代貯留層計算

Hybridizing Traditional and Next-Generation Reservoir Computing to Accurately and Efficiently Forecast Dynamical Systems ( http://arxiv.org/abs/2403.18953v2 )

ライセンス: Link先を確認

Ravi Chepuri, Dael Amzalag, Thomas Antonsen Jr., Michelle Girvan,

(参考訳) Reservoir Computer (RC) は時系列予測のための強力な機械学習アーキテクチャである。近年,次世代貯水池コンピュータ (NGRC) が登場し,計算コストの削減やトレーニングデータ要求の低減など,RCに対して明確な優位性を提供している。しかし、NGRCはデータのサンプリング時間や非線形性のタイプに敏感であるなど、実際的な困難がある。本稿では,動的システムの時系列予測のためのハイブリッドRC-NGRC手法を提案する。計算資源の制限,準最適ハイパーパラメータ,疎サンプリングされたトレーニングデータなどの制約により,我々のハイブリッドアプローチは,カオス力学系の長期統計を正確に予測し,RCとNGRCのみが不足している状況において捉えることができることを示す。これらの条件下では, 小型貯水池を用いたハイブリッドRC-NGRC法は, 従来のRCよりもはるかに大きな貯水池に近づき, 従来のRCよりも計算効率が大きく向上し, 同時にNGRCの限界にも対処できることを示す。計算効率が高く,NGRC単独では不十分な場合に,ハイブリッドRC-NGRCアプローチが特に有用である可能性が示唆された。

Reservoir computers (RCs) are powerful machine learning architectures for time series prediction. Recently, next generation reservoir computers (NGRCs) have been introduced, offering distinct advantages over RCs, such as reduced computational expense and lower training data requirements. However, NGRCs have their own practical difficulties, including sensitivity to sampling time and type of nonlinearities in the data. Here, we introduce a hybrid RC-NGRC approach for time series forecasting of dynamical systems. We show that our hybrid approach can produce accurate short-term predictions and capture the long-term statistics of chaotic dynamical systems in situations where the RC and NGRC components alone are insufficient, e.g., due to constraints from limited computational resources, sub-optimal hyperparameters, sparsely-sampled training data, etc. Under these conditions, we show for multiple model chaotic systems that the hybrid RC-NGRC method with a small reservoir can achieve prediction performance approaching that of a traditional RC with a much larger reservoir, illustrating that the hybrid approach can offer significant gains in computational efficiency over traditional RCs while simultaneously addressing some of the limitations of NGRCs. Our results suggest that the hybrid RC-NGRC approach may be particularly beneficial in cases when computational efficiency is a high priority and an NGRC alone is not adequate.
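
The abstract does not spell out how the two components are combined. As a rough sketch of one plausible reading, the Python below concatenates a reservoir state with NGRC-style polynomial features of time-delayed inputs, so that a single linear readout (for example, one fitted by ridge regression, omitted here) could be trained on the joint vector. The matrix `A`, the input weights `w_in`, the two-step delay, and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import combinations_with_replacement


def reservoir_step(state, u, A, w_in):
    """One reservoir update r_{t+1} = tanh(A r_t + w_in * u_t) for a scalar input u_t."""
    return [
        math.tanh(sum(a * r for a, r in zip(row, state)) + w * u)
        for row, w in zip(A, w_in)
    ]


def ngrc_features(delayed_inputs):
    """Constant term, delayed inputs, and their unique quadratic monomials."""
    quadratic = [x * y for x, y in combinations_with_replacement(delayed_inputs, 2)]
    return [1.0] + list(delayed_inputs) + quadratic


def hybrid_features(state, delayed_inputs):
    """Joint feature vector that a single linear readout would be trained on."""
    return list(state) + ngrc_features(delayed_inputs)


# Hypothetical 3-node reservoir and a two-sample input history (assumed values).
A = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.5], [0.5, 0.0, 0.0]]
w_in = [1.0, -1.0, 0.5]
u_prev, u_now = 0.1, 0.2

state = reservoir_step([0.0, 0.0, 0.0], u_prev, A, w_in)
state = reservoir_step(state, u_now, A, w_in)
features = hybrid_features(state, [u_now, u_prev])
print(len(features))  # 3 reservoir nodes + 6 NGRC terms
```

The point of the sketch is only that a small reservoir adds few columns to the readout problem, while the polynomial terms supply structure the reservoir would otherwise need many nodes to approximate.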

翻訳日:2024-06-07 20:42:53 公開日:2024-06-05

# TOD3Cap: 屋外シーンにおける3D高密度キャプションに向けて

TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes ( http://arxiv.org/abs/2403.19589v2 )

ライセンス: Link先を確認

Bu Jin, Yupeng Zheng, Pengfei Li, Weize Li, Yuhang Zheng, Sujie Hu, Xinyu Liu, Jinwei Zhu, Zhijie Yan, Haiyang Sun, Kun Zhan, Peng Jia, Xiaoxiao Long, Yilun Chen, Hao Zhao,

(参考訳) 3D高密度キャプションは、自然言語による3Dシーンの包括的理解を実現するための基盤となる。最近、特に屋内で顕著な成果をみせている。しかし、屋外シーンにおける3次元高密度キャプションの探索は、2つの大きな課題によって妨げられている。 1) ダイナミックスや疎視的入力などの屋内と屋外のシーン間の領域ギャップは,既存の屋内手法を直接適用することが困難である。 2) アウトドアシーンに適した包括的ボックスキャプションペアアノテーションによるデータ不足。そこで本研究では,屋外3次元高密度キャプションの新たな課題について紹介する。入力として,パノラマカメラリグで撮影したLiDAR点雲とRGB画像のセットを仮定する。期待される出力は、キャプション付きのオブジェクトボックスのセットです。この課題に対処するために,BEV表現を利用してオブジェクトボックスの提案を生成し,リレーショナルQ-FormerとLLaMA-Adapterを統合するTOD3Capネットワークを提案する。また、850シーンから64.3Kの屋外オブジェクトを2.3M記述したTOD3Capデータセットも導入した。特に,私たちのTOD3Capネットワークは,屋外シーンにおける3Dオブジェクトのローカライズとキャプションを効果的に行うことができ,ベースライン手法の精度を著しく向上させる(+9.6 CiDEr@0.5IoU)。コード、データ、モデルはhttps://github.com/jxbbb/TOD3Capで公開されている。

3D dense captioning stands as a cornerstone in achieving a comprehensive understanding of 3D scenes through natural language. It has recently witnessed remarkable achievements, particularly in indoor settings. However, the exploration of 3D dense captioning in outdoor scenes is hindered by two major challenges: 1) the domain gap between indoor and outdoor scenes, such as dynamics and sparse visual inputs, makes it difficult to directly adapt existing indoor methods; 2) the lack of data with comprehensive box-caption pair annotations specifically tailored for outdoor scenes. To this end, we introduce the new task of outdoor 3D dense captioning. As input, we assume a LiDAR point cloud and a set of RGB images captured by the panoramic camera rig. The expected output is a set of object boxes with captions. To tackle this task, we propose the TOD3Cap network, which leverages the BEV representation to generate object box proposals and integrates Relation Q-Former with LLaMA-Adapter to generate rich captions for these objects. We also introduce the TOD3Cap dataset, the largest one to our knowledge for 3D dense captioning in outdoor scenes, which contains 2.3M descriptions of 64.3K outdoor objects from 850 scenes. Notably, our TOD3Cap network can effectively localize and caption 3D objects in outdoor scenes, which outperforms baseline methods by a significant margin (+9.6 CiDEr@0.5IoU). Code, data, and models are publicly available at https://github.com/jxbbb/TOD3Cap.

翻訳日:2024-06-07 20:42:53 公開日:2024-06-05

# ベンチマークの分布仮定に対するLLM評価のロバスト性の検証

Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks ( http://arxiv.org/abs/2404.16966v2 )

ライセンス: Link先を確認

Melissa Ailem, Katerina Marazopoulou, Charlotte Siska, James Bono,

(参考訳) ベンチマークは、大規模言語モデル(LLM)を評価するための中心的なアプローチとして登場した。研究コミュニティは、モデルの性能を評価する際に、ベンチマークのテストプロンプト全体にわたるモデルの平均性能に依拠することが多い。これは、ベンチマーク内のテストプロンプトが実世界の関心のある分布からのランダムサンプルであるという仮定と整合している。我々は、これは一般には成り立たず、関心のある分布は具体的なユースケースによって異なると考える。本研究では、(1) テストプロンプト間のモデル性能の相関は非ランダムであること、(2) テストプロンプト間の相関を考慮すると主要なベンチマークにおけるモデルの順位が変わりうること、(3) これらの相関を説明する要因には意味的類似性やLLMに共通する失敗点が含まれることを見出した。

Benchmarks have emerged as the central approach for evaluating Large Language Models (LLMs). The research community often relies on a model's average performance across the test prompts of a benchmark to evaluate the model's performance. This is consistent with the assumption that the test prompts within a benchmark represent a random sample from a real-world distribution of interest. We note that this is generally not the case; instead, we hold that the distribution of interest varies according to the specific use case. We find that (1) the correlation in model performance across test prompts is non-random, (2) accounting for correlations across test prompts can change model rankings on major benchmarks, (3) explanatory factors for these correlations include semantic similarity and common LLM failure points.
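
To make finding (2) concrete, here is a small Python sketch showing how the same per-prompt results can rank two models differently once prompts are no longer weighted uniformly. The results, the weights, and the idea of down-weighting four near-duplicate prompts are made-up assumptions for illustration; the abstract does not describe the authors' actual reweighting procedure.

```python
def mean_score(correct, weights=None):
    """Weighted mean of per-prompt correctness (1 = correct, 0 = wrong)."""
    if weights is None:
        weights = [1.0] * len(correct)  # uniform weights = plain benchmark average
    return sum(c * w for c, w in zip(correct, weights)) / sum(weights)


# Hypothetical results on a six-prompt benchmark.
model_a = [1, 1, 1, 1, 0, 0]
model_b = [1, 0, 0, 0, 1, 1]

# Assumption: prompts 0-3 are near-duplicates whose outcomes are highly
# correlated, so together they count as a single prompt.
weights = [0.25, 0.25, 0.25, 0.25, 1.0, 1.0]

uniform_a, uniform_b = mean_score(model_a), mean_score(model_b)
weighted_a, weighted_b = mean_score(model_a, weights), mean_score(model_b, weights)
print(uniform_a > uniform_b, weighted_a > weighted_b)
```

Under uniform weights model A leads, but under the assumed weights model B leads, which is the kind of rank change the abstract refers to.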

翻訳日:2024-06-07 20:33:09 公開日:2024-06-05

# 人間と大言語モデルにおける創造的プロセスの特徴付け

Characterising the Creative Process in Humans and Large Language Models ( http://arxiv.org/abs/2405.00899v2 )

ライセンス: Link先を確認

Surabhi S. Nath, Peter Dayan, Claire Stevenson,

(参考訳) 大規模言語モデル(LLM)はかなり創造的に見え、創造的なタスクにおいて平均的な人間と同等の成績を示すことが多い。しかし、LLMの創造性に関する研究は生成物(products)のみに焦点を当てており、創造の過程(process)にはほとんど注意が払われていない。人間の創造性のプロセス分析は、手作業でコード化したカテゴリを必要としたり応答時間を利用したりすることが多く、これらはLLMには適用できない。本稿では、代替用途課題(Alternate Uses Task)において人間とLLMが意味空間をどのように探索するかを特徴付ける自動化手法を提供し、言語流暢性課題(Verbal Fluency Task)における振る舞いと対比する。文埋め込みを用いて応答カテゴリを同定し、意味的類似度を計算して、ジャンププロファイルの生成に用いる。我々の結果は、創造性に至る経路として持続的経路(少数の意味空間における深い探索)と柔軟的経路(複数の意味空間にまたがる広い探索)の両方を報告した人間に関する先行研究を裏付けており、どちらの経路も同程度の創造性スコアにつながる。LLMは持続的経路または柔軟的経路のいずれかに偏っており、その偏りはタスクによって異なることが分かった。集団としてのLLMは人間のプロファイルと一致するが、創造性との関係は異なり、より柔軟なモデルほど創造性のスコアが高い。データセットとスクリプトはGitHub(https://github.com/surabhisnath/Creative_Process)で入手できる。

Large language models appear quite creative, often performing on par with the average human on creative tasks. However, research on LLM creativity has focused solely on products, with little attention on the creative process. Process analyses of human creativity often require hand-coded categories or exploit response times, which do not apply to LLMs. We provide an automated method to characterise how humans and LLMs explore semantic spaces on the Alternate Uses Task, and contrast with behaviour in a Verbal Fluency Task. We use sentence embeddings to identify response categories and compute semantic similarities, which we use to generate jump profiles. Our results corroborate earlier work in humans reporting both persistent (deep search in few semantic spaces) and flexible (broad search across multiple semantic spaces) pathways to creativity, where both pathways lead to similar creativity scores. LLMs were found to be biased towards either persistent or flexible paths, with the bias varying across tasks. Though LLMs as a population match human profiles, their relationship with creativity is different, where the more flexible models score higher on creativity. Our dataset and scripts are available on GitHub (https://github.com/surabhisnath/Creative_Process).
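
As a minimal sketch of what a jump profile might look like, the Python below marks a jump wherever two consecutive responses have low cosine similarity. The 2-D vectors stand in for sentence embeddings, and the similarity-only rule and the 0.5 threshold are assumptions made for illustration; the paper also uses response categories, and its exact jump definition is not given in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def jump_profile(embeddings, threshold=0.5):
    """1 where consecutive responses are dissimilar (a jump), else 0."""
    return [
        1 if cosine(prev, cur) < threshold else 0
        for prev, cur in zip(embeddings, embeddings[1:])
    ]


# Toy stand-ins for sentence embeddings of four consecutive responses.
responses = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
profile = jump_profile(responses)
print(profile)  # one jump, between the second and third response
```

In this reading, a persistent responder produces long runs of zeros, while a flexible responder produces frequent ones.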

翻訳日:2024-06-07 20:33:09 公開日:2024-06-05

# グラフニューラルネットワークのための条件付きシフトに頑健な共形予測

Conditional Shift-Robust Conformal Prediction for Graph Neural Network ( http://arxiv.org/abs/2405.11968v2 )

ライセンス: Link先を確認

S. Akansha,

(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造データにおける結果を予測する強力なツールとして登場した。その有効性にもかかわらず、GNNの重大な欠点は、頑健な不確実性推定を提供する能力が限られていることであり、誤りが重大な結果をもたらす状況では信頼性に課題が生じる。さらに、GNNは通常、訓練データとテストデータが同一の分布に従うと仮定する分布内設定で優れた性能を示すが、この条件は実世界のグラフデータでは満たされないことが多い。本稿では、予測モデルの出力を予測集合に変換することで不確実性を定量化する、広く知られた統計手法である共形予測(conformal prediction)を活用し、グラフベースの半教師あり学習(SSL)において、条件付きシフト(ソースドメインからターゲットドメインへの条件付き確率分布 $P(label|input)$ の変化)の下でのGNN予測の不確実性定量化に取り組む。さらに、潜在段階における条件付きシフトを最小化することでモデル予測を洗練することを目的とした新しい損失関数を提案する。GNNのための条件付きシフトロバスト(CondSR)共形予測と名付けた本手法は、モデルに依存せず、さまざまな分類モデルに適用できる。標準的なグラフベンチマークデータセットにおいて、ノード分類タスクの最先端GNNと統合することで、提案手法の有効性を検証する。包括的な評価により、提案手法は事前に定めた任意の目標周辺被覆率(marginal coverage)を一貫して達成し、条件付きシフト下で最先端GNNモデルの精度を最大12%向上させ、予測集合のサイズを最大48%削減することを示す。コードの実装は、さらなる探索と実験のために公開されている。

Graph Neural Networks (GNNs) have emerged as potent tools for predicting outcomes in graph-structured data. Despite their efficacy, a significant drawback of GNNs lies in their limited ability to provide robust uncertainty estimates, posing challenges to their reliability in contexts where errors carry significant consequences. Moreover, GNNs typically excel in in-distribution settings, assuming that training and test data follow identical distributions, a condition often unmet in real-world graph data scenarios. In this article, we leverage conformal prediction, a widely recognized statistical technique for quantifying uncertainty by transforming predictive model outputs into prediction sets, to address uncertainty quantification in GNN predictions amidst conditional shift (that is, the change in the conditional probability distribution $P(label|input)$ from the source domain to the target domain) in graph-based semi-supervised learning (SSL). Additionally, we propose a novel loss function aimed at refining model predictions by minimizing conditional shift in latent stages. Termed Conditional Shift Robust (CondSR) conformal prediction for GNNs, our approach is model-agnostic and adaptable to various classification models. We validate the effectiveness of our method on standard graph benchmark datasets, integrating it with state-of-the-art GNNs in node classification tasks. Comprehensive evaluations demonstrate that our approach consistently achieves any predefined target marginal coverage, enhances the accuracy of state-of-the-art GNN models by up to 12% under conditional shift, and reduces the prediction set size by up to 48%. The code implementation is publicly available for further exploration and experimentation.
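
For readers unfamiliar with how conformal prediction turns model outputs into prediction sets, the following Python sketch shows generic split conformal prediction for classification. It is not CondSR: it assumes calibration and test points are exchangeable, which is exactly the assumption that conditional shift breaks. The calibration probabilities and the miscoverage level `alpha` are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """k-th smallest calibration score with k = ceil((n + 1) * (1 - alpha))."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: every label is kept
    return sorted(scores)[k - 1]


def prediction_set(probs, qhat):
    """All labels whose nonconformity score 1 - p(label) is at most qhat."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]


# Hypothetical probabilities the model gave to the TRUE label of 7 calibration nodes.
cal_true_probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
scores = [1.0 - p for p in cal_true_probs]
qhat = conformal_quantile(scores, alpha=0.25)  # targets 75% marginal coverage

print(prediction_set([0.50, 0.45, 0.05], qhat))  # ambiguous node: two labels kept
print(prediction_set([0.90, 0.08, 0.02], qhat))  # confident node: one label kept
```

Smaller sets at the same coverage indicate a more informative model, which is why the abstract reports prediction set size alongside coverage.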

翻訳日:2024-06-07 20:23:24 公開日:2024-06-05

# 複合現実感に向けたマルチモーダルファイングラインドトレーニングアシスタントのための自律ワークフロー

Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality ( http://arxiv.org/abs/2405.13034v2 )

ライセンス: Link先を確認

Jiahuan Pei, Irene Viola, Haochen Huang, Junxiao Wang, Moonisa Ahsan, Fanghua Ye, Jiang Yiming, Yao Sai, Di Wang, Zhumin Chen, Pengjie Ren, Pablo Cesar,

(参考訳) 自律的な人工知能(AI)エージェントは、特に大規模言語モデル(LLM)の指数関数的な発展とともに、言語ベースの環境を自動的に理解するための有望なプロトコルとして登場した。しかし、マルチモーダル環境のきめ細かく包括的な理解は、いまだ十分に探究されていない。本研究は、きめ細かなトレーニングのためにAIエージェントをエクステンデッドリアリティ(XR)アプリケーションへシームレスに統合することに特化した自律ワークフローを設計する。パイロットXR環境におけるLEGOブロック組立のためのマルチモーダルなきめ細かいトレーニングアシスタントのデモンストレーションを示す。具体的には、LLMを記憶、計画、XRツールとの相互作用と統合した大脳言語エージェント(cerebral language agent)と、視覚言語エージェントを設計し、エージェントが過去の経験に基づいて行動を決定できるようにする。さらに、商用LLMを利用したワークフロー内で自動的に合成された、マルチモーダルできめ細かい組立対話データセットLEGO-MRTAを紹介する。このデータセットは、マルチモーダルな取扱説明書、会話、XR応答、視覚的質問応答から構成される。最後に、広く使われている複数のオープンなLLMをベンチマークとして提示し、提案データセットでのファインチューニングの有無それぞれについて性能を評価する。このワークフローのより広範な影響として、XR環境におけるシームレスなユーザインタラクションのためのより賢いアシスタントの開発が進み、AIとHCIの両コミュニティの研究が促進されることを期待している。

Autonomous artificial intelligence (AI) agents have emerged as promising protocols for automatically understanding the language-based environment, particularly with the exponential development of large language models (LLMs). However, a fine-grained, comprehensive understanding of multimodal environments remains under-explored. This work designs an autonomous workflow tailored for integrating AI agents seamlessly into extended reality (XR) applications for fine-grained training. We present a demonstration of a multimodal fine-grained training assistant for LEGO brick assembly in a pilot XR environment. Specifically, we design a cerebral language agent that integrates LLM with memory, planning, and interaction with XR tools and a vision-language agent, enabling agents to decide their actions based on past experiences. Furthermore, we introduce LEGO-MRTA, a multimodal fine-grained assembly dialogue dataset synthesized automatically in the workflow served by a commercial LLM. This dataset comprises multimodal instruction manuals, conversations, XR responses, and vision question answering. Last, we present several prevailing open-resource LLMs as benchmarks, assessing their performance with and without fine-tuning on the proposed dataset. We anticipate that the broader impact of this workflow will advance the development of smarter assistants for seamless user interaction in XR environments, fostering research in both AI and HCI communities.

翻訳日:2024-06-07 20:23:24 公開日:2024-06-05

# 説明可能な音声感情認識のための反復的特徴増強

Iterative Feature Boosting for Explainable Speech Emotion Recognition ( http://arxiv.org/abs/2405.20172v3 )

ライセンス: Link先を確認

Alaa Nfissi, Wassim Bouachir, Nizar Bouguila, Brian Mishara,

(参考訳) 音声感情認識(SER)では、その実用的重要性を考慮せずに事前定義された特徴を用いることで、冗長で無関係な情報を含む高次元データセットが生成される可能性がある。その結果、高次元学習はしばしば計算複雑性を増大させながらモデルの精度を低下させる。本研究は,効率的なSERシステムを構築するために,特徴を慎重に検討し,分析することの重要性を浮き彫りにしている。本稿では,効率的な特徴工学手法に基づく新しい教師付きSER手法を提案する。特徴の関連性を評価し,特徴セットを洗練させるために,結果の説明可能性に特に注意を払っている。これは機能評価ループを通じて反復的に実行され、Shapley値を使用して機能選択を強化し、フレームワーク全体のパフォーマンスを改善する。このアプローチによって、モデルパフォーマンスと透明性のメリットのバランスが取れます。提案手法は,TESSデータセット上での感情認識において,ヒトレベルのパフォーマンス(HLP)および最先端の機械学習手法より優れる。本論文のソースコードはhttps://github.com/alaaNfissi/Iterative-Feature-Boosting-for-Explainable-Speech-Emotion-Recognitionで公開されている。

In speech emotion recognition (SER), using predefined features without considering their practical importance may lead to high-dimensional datasets, including redundant and irrelevant information. Consequently, high-dimensional learning often reduces model accuracy while increasing computational complexity. Our work underlines the importance of carefully considering and analyzing features in order to build efficient SER systems. We present a new supervised SER method based on an efficient feature engineering approach. We pay particular attention to the explainability of results to evaluate feature relevance and refine feature sets. This is performed iteratively through a feature evaluation loop, using Shapley values to boost feature selection and improve overall framework performance. Our approach thus makes it possible to balance model performance and transparency. The proposed method outperforms human-level performance (HLP) and state-of-the-art machine learning methods in emotion recognition on the TESS dataset. The source code of this paper is publicly available at https://github.com/alaaNfissi/Iterative-Feature-Boosting-for-Explainable-Speech-Emotion-Recognition.
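
To illustrate what using Shapley values for feature relevance means, the Python below computes exact Shapley values for three toy features and keeps those with a positive contribution. The feature names, the `accuracy` function, and the keep-if-positive rule are all hypothetical; the paper's loop presumably works on a trained model with an approximate estimator, since exact enumeration is only feasible for a handful of features.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        used = set()
        for f in order:
            before = value(frozenset(used))
            used.add(f)
            totals[f] += value(frozenset(used)) - before
    return {f: total / len(orderings) for f, total in totals.items()}


def accuracy(subset):
    """Made-up validation accuracy of a model trained on `subset` (an assumption)."""
    score = 0.5
    if "mfcc" in subset:
        score += 0.3
    if "pitch" in subset:
        score += 0.1
    if "mfcc" in subset and "pitch" in subset:
        score -= 0.05  # the two features are partly redundant
    return score  # "noise" never changes the score


features = ["mfcc", "pitch", "noise"]
phi = shapley_values(features, accuracy)
kept = [f for f in features if phi[f] > 1e-9]  # one refinement step of the loop
print(kept)
```

Repeating the evaluate-then-prune step until no feature is dropped gives a simple version of an iterative refinement loop.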

翻訳日:2024-06-07 20:03:47 公開日:2024-06-05

# 合理性を考慮したマルチモーダル・マルチエージェントシステム:サーベイ

Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey ( http://arxiv.org/abs/2406.00252v2 )

ライセンス: Link先を確認

Bowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Weijie J. Su, Camillo J. Taylor, Tanwi Mallick,

(参考訳) 合理性とは、理性によって導かれる性質であり、証拠や論理規則に沿った論理的思考と意思決定によって特徴づけられる。この性質は、解が十分な根拠を持ち体系的に導出されることを保証するため、効果的な問題解決に不可欠である。大規模言語モデル(LLM)は、人間らしいテキストを驚くべき精度で生成できるまでに進歩したが、訓練データから受け継いだバイアス、異なる文脈間での非一貫性、複数の文脈層を含む複雑なシナリオの理解の難しさといった問題を示す。そのため近年の研究は、一貫性と信頼性を高めるために、さまざまな種類のデータやツールを用いて協調的に動作する複数のエージェントの強みを活用しようとしている。そこで本稿は、最先端の研究を調査し、合理性の観点から単一エージェント・単一モーダルシステムに対する進歩を特定し、未解決問題と今後の方向性を議論することで、マルチモーダル・マルチエージェントシステムが合理性に向かって進んでいるかどうかを理解することを目的とする。オープンリポジトリを https://github.com/bowen-upenn/MMMA_Rationality で管理している。

Rationality is the quality of being guided by reason, characterized by logical thinking and decision-making that align with evidence and logical rules. This quality is essential for effective problem-solving, as it ensures that solutions are well-founded and systematically derived. Despite the advancements of large language models (LLMs) in generating human-like text with remarkable accuracy, they present biases inherited from the training data, inconsistency across different contexts, and difficulty understanding complex scenarios involving multiple layers of context. Therefore, recent research attempts to leverage the strength of multiple agents working collaboratively with various types of data and tools for enhanced consistency and reliability. To that end, this paper aims to understand whether multi-modal and multi-agent systems are advancing toward rationality by surveying the state-of-the-art works, identifying advancements over single-agent and single-modal systems in terms of rationality, and discussing open problems and future directions. We maintain an open repository at https://github.com/bowen-upenn/MMMA_Rationality.

翻訳日:2024-06-07 20:03:47 公開日:2024-06-05

# メル周波数ケプストラム係数を用いた心音分類の改善: 単一分類器戦略とアンサンブル分類器戦略の比較研究

Enhanced Classification of Heart Sounds Using Mel Frequency Cepstral Coefficients: A Comparative Study of Single and Ensemble Classifier Strategies ( http://arxiv.org/abs/2406.00702v2 )

ライセンス: Link先を確認

Amir Masoud Rahmani, Amir Haider, Parisa Khoshvaght, Mohammad Adeli, Entesar Gemeay, Yazeed Alkhrijah, Mokhtar Mohammadi, Mehdi Hosseinzadeh,

(参考訳) 本稿では、単一分類器戦略とアンサンブル分類器戦略という2つの分類戦略を用いて、異常な心音図(phonocardiogram)の検出におけるメル周波数ケプストラム係数(MFCC)の有効性を検討する。心音図はS1、収縮期、S2、拡張期の各区間に分割され、各区間から13個のMFCCを推定することで、1拍あたり52個のMFCCが得られた。単一分類器戦略では、連続する9拍のMFCCを平均して心音図を分類した。一方、アンサンブル分類器戦略では、9つの分類器を用いて各拍を正常または異常として個別に評価し、全体の分類は多数決に基づいて行った。両手法は、一般に公開されている心音図データベース上で検証された。その結果、アンサンブル分類器戦略は単一分類器戦略よりも高い精度を達成し、MFCCが、同様の研究で評価された時間特徴、時間周波数特徴、統計的特徴などの他の特徴よりも有効であることが示された。

This paper explores the efficacy of Mel Frequency Cepstral Coefficients (MFCCs) in detecting abnormal phonocardiograms using two classification strategies: a single-classifier and an ensemble-classifier approach. Phonocardiograms were segmented into S1, systole, S2, and diastole intervals, with thirteen MFCCs estimated from each segment, yielding 52 MFCCs per beat. In the single-classifier strategy, the MFCCs from nine consecutive beats were averaged to classify phonocardiograms. Conversely, the ensemble-classifier strategy employed nine classifiers to individually assess beats as normal or abnormal, with the overall classification based on the majority vote. Both methods were tested on a publicly available phonocardiogram database. Results demonstrated that the ensemble-classifier strategy achieved higher accuracy compared to the single-classifier approach, establishing MFCCs as more effective than other features, including time, time-frequency, and statistical features, evaluated in similar studies.
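
The two strategies differ only in where the nine beats are combined: before classification (averaging features) or after it (voting on labels). The Python sketch below shows both combination steps and the 4 x 13 = 52 feature count. The per-beat labels are invented, and the classifiers themselves are left out, since the abstract does not name the model used.

```python
from collections import Counter

SEGMENTS = ("S1", "systole", "S2", "diastole")
MFCCS_PER_SEGMENT = 13
FEATURES_PER_BEAT = len(SEGMENTS) * MFCCS_PER_SEGMENT  # 4 * 13 = 52


def average_beats(beats):
    """Single-classifier input: element-wise mean of the per-beat feature vectors."""
    n = len(beats)
    return [sum(column) / n for column in zip(*beats)]


def majority_vote(labels):
    """Ensemble output: the most common per-beat label (9 binary votes cannot tie)."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs of nine per-beat classifiers for one recording.
beat_labels = ["abnormal", "normal", "abnormal", "abnormal", "normal",
               "abnormal", "normal", "abnormal", "abnormal"]
print(FEATURES_PER_BEAT, majority_vote(beat_labels))
```

Using an odd number of beats keeps the binary vote free of ties, which may be one reason nine beats are a convenient choice.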

翻訳日:2024-06-07 19:54:03 公開日:2024-06-05

# ロバストなセグメンテーションのための感度情報に基づくデータ拡張

Sensitivity-Informed Augmentation for Robust Segmentation ( http://arxiv.org/abs/2406.01425v3 )

ライセンス: Link先を確認

Laura Zheng, Wenjie Wei, Tony Wu, Jacob Clements, Shreelekha Revankar, Andre Harrison, Yu Shen, Ming C. Lin,

(参考訳) セグメンテーションは、仮想試着、医用画像、自動運転、農業自動化など、多くのビジュアルコンピューティングアプリケーションに不可欠なモジュールである。これらのアプリケーションは、広範な消費者利用、あるいは変動の大きい環境を伴うことが多く、一般的な携帯電話であれ高価な衛星撮像カメラであれ、いずれも視覚センサーデータの品質を低下させうる。ユーザ間の差異や天候条件などの外部ノイズに加えて、カメラ品質のばらつきやレンズ歪みなどの内部ノイズも、開発時と運用時の両方でセグメンテーションモデルの性能に影響を与えうる。本研究では、学習ベースのセグメンテーションモデルのロバスト性を訓練を通じて高める、効率的で適応性が高く、勾配を用いない手法を提案する。まず、基底となる摂動に対してKernel Inception Distance (KID) を用いる新しい適応的感度解析(ASA)を導入し、事前学習済みセグメンテーションモデルの摂動感度をベンチマークする。次に、この適応的感度解析を用いて感度曲線をモデル化し、それに応じて摂動のハイパーパラメータ値をサンプリングする。最後に、選択した摂動値を用いて敵対的訓練を行い、オンライン訓練中にロバスト性を動的に再評価する。本手法は、最小限のファインチューニングのみでエンドツーエンドに実装され、セグメンテーション向けの最先端データ拡張手法を一貫して上回る。ビジュアルコンピューティングやコンピュータグラフィックスのアプリケーションで用いられるさまざまなセグメンテーションデータセットにおいて、クリーンなデータでの評価と実世界の悪条件シナリオでの評価の両方で大幅な改善を示す。

Segmentation is an integral module in many visual computing applications such as virtual try-on, medical imaging, autonomous driving, and agricultural automation. These applications often involve either widespread consumer use or highly variable environments, both of which can degrade the quality of visual sensor data, whether from a common mobile phone or an expensive satellite imaging camera. In addition to external noises like user difference or weather conditions, internal noises such as variations in camera quality or lens distortion can affect the performance of segmentation models during both development and deployment. In this work, we present an efficient, adaptable, and gradient-free method to enhance the robustness of learning-based segmentation models across training. First, we introduce a novel adaptive sensitivity analysis (ASA) using Kernel Inception Distance (KID) on basis perturbations to benchmark perturbation sensitivity of pre-trained segmentation models. Then, we model the sensitivity curve using the adaptive SA and sample perturbation hyperparameter values accordingly. Finally, we conduct adversarial training with the selected perturbation values and dynamically re-evaluate robustness during online training. Our method, implemented end-to-end with minimal fine-tuning required, consistently outperforms state-of-the-art data augmentation techniques for segmentation. It shows significant improvement in both clean data evaluation and real-world adverse scenario evaluation across various segmentation datasets used in visual computing and computer graphics applications.

翻訳日:2024-06-07 19:54:03 公開日:2024-06-05

# Q学習において連続な状態・行動空間を離散化する方法: シンボリック制御アプローチ

How to discretize continuous state-action spaces in Q-learning: A symbolic control approach ( http://arxiv.org/abs/2406.01548v3 )

ライセンス: Link先を確認

Sadek Belamfedel Alaoui, Adnane Saoud,

(参考訳) Q-ラーニングは、特定の目標を達成するためにコントローラを合成する効果的なアプローチとして広く認識されている。しかし、継続的な状態-作用空間によって引き起こされる課題への対処は現在も研究の焦点となっている。本稿では,空間離散化法における大きな欠点を浮き彫りにした系統解析について述べる。この課題に対処するため,本論文では,抽象から制御システムへのシミュレーションの交互化など,行動関係を表現するシンボリックモデルを提案する。この関係により、オリジナルのシステムへの抽象化に基づいて、合成されたコントローラをシームレスに適用することができる。シンボリックモデルのための新しいQ-ラーニング手法を導入し、最適なポリシーを符号化する2つのQ-テーブルを生成する。理論解析により、これらのQ-テーブルは、連続空間を持つ元の系のQ-値の上界と下界の両方として機能することを示した。さらに,空間抽象のパラメータとQ値の損失との相関について検討した。このアルゴリズムは任意の精度で最適性を達成し、精度と計算複雑性の間のトレードオフを制御する。得られた結果は、適切な学習パラメータを選択し、コントローラを洗練するための貴重な洞察を提供する。提案したQ-ラーニングに基づく記号モデルの工学的妥当性を2つのケーススタディで示す。

Q-learning is widely recognized as an effective approach for synthesizing controllers to achieve specific goals. However, handling challenges posed by continuous state-action spaces remains an ongoing research focus. This paper presents a systematic analysis that highlights a major drawback in space discretization methods. To address this challenge, the paper proposes a symbolic model that represents behavioral relations, such as alternating simulation from abstraction to the controlled system. This relation allows for seamless application of the synthesized controller based on abstraction to the original system. Introducing a novel Q-learning technique for symbolic models, the algorithm yields two Q-tables encoding optimal policies. Theoretical analysis demonstrates that these Q-tables serve as both upper and lower bounds on the Q-values of the original system with continuous spaces. Additionally, the paper explores the correlation between the parameters of the space abstraction and the loss in Q-values. The resulting algorithm facilitates achieving optimality within an arbitrary accuracy, providing control over the trade-off between accuracy and computational complexity. The obtained results provide valuable insights for selecting appropriate learning parameters and refining the controller. The engineering relevance of the proposed Q-learning based symbolic model is illustrated through two case studies.
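
As background for the trade-off between accuracy and computational complexity, this Python sketch shows the plain uniform-grid discretization that tabular Q-learning usually relies on, followed by one update step. It is the baseline setting, not the paper's symbolic model with its two bounding Q-tables; the grid size, learning rate, discount factor, and sample transition are assumptions.

```python
def discretize(x, low, high, n_cells):
    """Map a continuous value in [low, high] to a cell index in 0..n_cells-1."""
    if not low <= x <= high:
        raise ValueError("x lies outside the abstraction domain")
    width = (high - low) / n_cells
    return min(int((x - low) / width), n_cells - 1)  # x == high joins the last cell


def q_update(q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step on abstract (discretized) states."""
    best_next = max(q[s_next])
    q[s][a] += alpha * (reward + gamma * best_next - q[s][a])


n_cells, n_actions = 4, 2  # finer grids cost more memory and more samples
q = [[0.0] * n_actions for _ in range(n_cells)]

s = discretize(0.30, 0.0, 1.0, n_cells)
s_next = discretize(0.80, 0.0, 1.0, n_cells)
q_update(q, s, a=1, reward=1.0, s_next=s_next)
print(s, s_next, q[s][1])
```

Every continuous state in a cell shares one table row, so the cell width directly controls how much Q-value information is lost, which is the relationship the paper analyzes.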

翻訳日:2024-06-07 19:54:03 公開日:2024-06-05

# 隠れた要因を明らかにする: 音声感情認識における特徴増強のための説明可能なAI

Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition ( http://arxiv.org/abs/2406.01624v2 )

ライセンス: Link先を確認

Alaa Nfissi, Wassim Bouachir, Nizar Bouguila, Brian Mishara,

(参考訳) 音声感情認識(SER)は、メンタルヘルス、教育、人間とコンピュータの相互作用など、いくつかの応用分野があることから大きな注目を集めている。しかし、SERシステムの精度は、無関係かつ冗長な情報を含む可能性のある高次元特徴集合によって妨げられる。そこで本研究では、機械学習モデルの性能向上のために特徴の関連性と説明可能性を重視した、SERのための反復的な特徴ブースティング手法を提案する。我々のアプローチは、効率的なSERシステムを構築するために、特徴を入念に選択し分析するものである。モデルの説明可能性を通じて主要な問題に対処するために、Shapley値を用いた特徴評価ループによって特徴集合を反復的に洗練する。このプロセスはモデルの性能と透明性のバランスをとり、モデルの予測を包括的に理解できるようにする。提案手法は、無関係で冗長な特徴の特定と除去など、いくつかの利点を提供し、より効果的なモデルをもたらす。さらに、説明可能性を高め、モデルの予測の理解と、感情の判定に重要な特徴の特定を容易にする。提案手法の有効性は、Toronto emotional speech set(TESS)、Berlin Database of Emotional Speech(EMO-DB)、Ryerson Audio-Visual Database of Emotional Speech and Song(RAVDESS)、Surrey Audio-Visual Expressed Emotion(SAVEE)の各データセットによるSERベンチマークで検証され、最先端の手法を上回った。我々の知る限り、SERフレームワークにモデルの説明可能性を取り入れたのは本研究が初めてである。本論文のソースコードは https://github.com/alaaNfissi/Unveiling-Hidden-Factors-Explainable-AI-for-Feature-Boosting-in-Speech-Emotion-Recognition で公開されている。

Speech emotion recognition (SER) has gained significant attention due to its several application fields, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature sets that may contain irrelevant and redundant information. To overcome this challenge, this study proposes an iterative feature boosting approach for SER that emphasizes feature relevance and explainability to enhance machine learning model performance. Our approach involves meticulous feature selection and analysis to build efficient SER systems. In addressing our main problem through model explainability, we employ a feature evaluation loop with Shapley values to iteratively refine feature sets. This process strikes a balance between model performance and transparency, which enables a comprehensive understanding of the model's predictions. The proposed approach offers several advantages, including the identification and removal of irrelevant and redundant features, leading to a more effective model. Additionally, it promotes explainability, facilitating comprehension of the model's predictions and the identification of crucial features for emotion determination. The effectiveness of the proposed method is validated on the SER benchmarks of the Toronto emotional speech set (TESS), Berlin Database of Emotional Speech (EMO-DB), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and Surrey Audio-Visual Expressed Emotion (SAVEE) datasets, outperforming state-of-the-art methods. To the best of our knowledge, this is the first work to incorporate model explainability into an SER framework. The source code of this paper is publicly available at https://github.com/alaaNfissi/Unveiling-Hidden-Factors-Explainable-AI-for-Feature-Boosting-in-Speech-Emotion-Recognition.

翻訳日:2024-06-07 19:54:03 公開日:2024-06-05

# 非一様性こそすべて: ECHOによる効率的かつタイムリーな暗号化トラフィック分類

Non-uniformity is All You Need: Efficient and Timely Encrypted Traffic Classification With ECHO ( http://arxiv.org/abs/2406.01852v2 )

ライセンス: Link先を確認

Shilo Daum, Tal Shapira, Anat Bremler-Barr, David Hay,

(参考訳) インターネットトラフィックの95%が暗号化されている現在、このトラフィックを分類する効果的なアプローチは、ネットワークのセキュリティと管理にとって不可欠である。本稿では、ML/DLベースの暗号化トラフィック分類のための新しい最適化プロセスであるECHOを紹介する。ECHOは分類時間とメモリ使用量の両方を対象とし、2つの革新的な技術を取り入れている。第1の構成要素であるHO(Hyperparameter Optimization of binnings)は、効率的なトラフィック表現を作ることを目的とする。従来の研究では、パケットサイズやパケット到着時刻を固定サイズのビンに対応付ける表現がよく用いられてきたが、我々は非一様なビニングの方がはるかに効率的であることを示す。これらの非一様なビニングは、訓練段階でハイパーパラメータ最適化アルゴリズムを用いて導出される。HOは、要求される表現サイズが与えられたときに精度を大きく向上させる。あるいは同じことだが、より小さな表現で同等の精度を達成する。次に、EC(Early Classification of traffic)を導入する。これは、異なる終了時刻に適応させた分類器のカスケードを用い、確信度に基づいて分類することで、より高速な分類を可能にする。ECは平均分類遅延を最大90%削減する。注目すべきことに、この手法は分類精度を維持するだけでなく、場合によっては向上させる。3つの公開データセットを用いて、これらを組み合わせた手法であるEarly Classification with Hyperparameter Optimization (ECHO) が分類効率を大幅に向上させることを示す。

With 95% of Internet traffic now encrypted, an effective approach to classifying this traffic is crucial for network security and management. This paper introduces ECHO -- a novel optimization process for ML/DL-based encrypted traffic classification. ECHO targets both classification time and memory utilization and incorporates two innovative techniques. The first component, HO (Hyperparameter Optimization of binnings), aims at creating efficient traffic representations. While previous research often uses representations that map packet sizes and packet arrival times to fixed-sized bins, we show that non-uniform binnings are significantly more efficient. These non-uniform binnings are derived by employing a hyperparameter optimization algorithm in the training stage. HO significantly improves accuracy given a required representation size, or, equivalently, achieves comparable accuracy using smaller representations. Then, we introduce EC (Early Classification of traffic), which enables faster classification using a cascade of classifiers adapted for different exit times, where classification is based on the level of confidence. EC reduces the average classification latency by up to 90%. Remarkably, this method not only maintains classification accuracy but also, in certain cases, improves it. Using three publicly available datasets, we demonstrate that the combined method, Early Classification with Hyperparameter Optimization (ECHO), leads to a significant improvement in classification efficiency.
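
The two ideas in the abstract can be sketched in a few lines of Python: a histogram over arbitrary (non-uniform) bin edges, and a confidence-gated cascade that stops at the first sufficiently confident stage. The packet sizes, the bin edges, the stage outputs, and the 0.9 threshold are all invented for illustration; in the paper the edges come from hyperparameter optimization, which is not reproduced here.

```python
from bisect import bisect_right


def histogram(values, edges):
    """Count values per bin; `edges` are the sorted inner bin boundaries."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


def early_classify(stage_outputs, threshold=0.9):
    """Return (label, stage index) of the first confident stage, else of the last stage."""
    label, index = None, -1
    for index, (label, confidence) in enumerate(stage_outputs):
        if confidence >= threshold:
            break  # exit early: later stages are never consulted
    return label, index


# Hypothetical packet sizes (bytes) of one flow.
packet_sizes = [40, 52, 60, 64, 1500, 1500, 576, 1400]
uniform = histogram(packet_sizes, [500, 1000])  # equal-width bins
non_uniform = histogram(packet_sizes, [50, 62])  # hand-picked edges among small packets

# Hypothetical (label, confidence) pairs from classifiers with increasing exit times.
decision = early_classify([("video", 0.55), ("chat", 0.93), ("chat", 0.99)])
print(uniform, non_uniform, decision)
```

With the same number of bins, the non-uniform edges separate the small packets that the equal-width bins lump together, and the cascade returns at the second stage instead of waiting for the third.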

翻訳日:2024-06-07 19:54:03 公開日:2024-06-05

# コア単位のクリッピングにより、記憶が少なく性能の高いASRモデルを効率的に学習する

Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping ( http://arxiv.org/abs/2406.02004v2 )

ライセンス: Link先を確認

Lun Wang, Om Thakkar, Zhong Meng, Nicole Rafidi, Rohit Prabhavalkar, Arun Narayanan,

(参考訳) グラディエント・クリッピングは、大規模自動音声認識(ASR)モデルの訓練において重要な役割を果たす。一般的には、勾配の爆発を防ぐためのミニバッチ勾配や、意図しない暗記を緩和するために個々のサンプル勾配に適用される。この研究は、幅広いASRモデルのトレーニングにおいて、勾配クリッピングの特定の粒度、すなわちコアごとのクリッピング(PCC)の影響を体系的に調査する。我々は,PCCがASRモデルにおける意図しない記憶を効果的に緩和できることを実証的に実証した。驚くべきことに、PCCはASRのパフォーマンス指標に肯定的な影響を与え、収束率の改善と単語誤り率の低減につながっている。さらに,PCCが導入したハイパーパラメータの調整を避けるため,並列化最適化のための新しい変種アダプティブ・パー・コア・クリッピング(APCC)を提案する。本研究は,PCCの多面的メリットを,堅牢でプライバシ・フォワードなASRモデルトレーニングの戦略として強調した。

Gradient clipping plays a vital role in training large-scale automatic speech recognition (ASR) models. It is typically applied to minibatch gradients to prevent gradient explosion, and to the individual sample gradients to mitigate unintended memorization. This work systematically investigates the impact of a specific granularity of gradient clipping, namely per-core clipping (PCC), when training a wide range of ASR models. We empirically demonstrate that PCC can effectively mitigate unintended memorization in ASR models. Surprisingly, we find that PCC positively influences ASR performance metrics, leading to improved convergence rates and reduced word error rates. To avoid tuning the additional hyperparameter introduced by PCC, we further propose a novel variant, adaptive per-core clipping (APCC), for streamlined optimization. Our findings highlight the multifaceted benefits of PCC as a strategy for robust, privacy-forward ASR model training.
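
The abstract places PCC between minibatch clipping and per-sample clipping. One reading of that granularity, sketched below in Python, is to average the gradients of the examples held by each compute core, clip each core's average to a norm bound, and then average across cores. This interpretation, the toy gradients, and the bound `max_norm` are assumptions; the adaptive variant (APCC) is not shown.

```python
import math


def clip_by_norm(grad, max_norm):
    """Scale `grad` down so that its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def per_core_clipped_gradient(per_core_example_grads, max_norm):
    """Average within each core, clip each core average, then average across cores."""
    clipped = [clip_by_norm(mean(core), max_norm) for core in per_core_example_grads]
    return mean(clipped)


# Hypothetical gradients: two cores, two examples per core, two parameters.
cores = [
    [[3.0, 4.0], [3.0, 4.0]],  # core average has norm 5, so it is scaled down
    [[0.3, 0.4], [0.3, 0.4]],  # core average has norm 0.5, so it is left alone
]
update = per_core_clipped_gradient(cores, max_norm=1.0)
print(update)
```

Clipping once per core needs no per-example gradient norms, which is presumably why this granularity is cheaper than per-sample clipping while still bounding the influence of any one core's data.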

翻訳日:2024-06-07 19:44:18 公開日:2024-06-05

# Alice in Wonderland: State-Of-the-Art Large Language Modelにおける完全推論のブレークダウンを示す単純なタスク

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models ( http://arxiv.org/abs/2406.02061v2 )

ライセンス: Link先を確認

Marianna Nezhurina, Lucia Cipolina-Kun, Mehdi Cherti, Jenia Jitsev,

(参考訳) 大規模言語モデル(LLM)は、しばしば基盤モデルの一例として説明される。すなわち、少数ショットあるいはゼロショットでさまざまなタスクや条件に強く転移し、事前学習の規模を拡大したときの機能の向上を予測するスケーリング則を示すモデルである。さまざまな機能やタスクに優れているというこれらの主張は、そうしたモデルが高いスコアを示す各種の標準化ベンチマークにおける測定に依拠している。本稿では、人間には容易に解ける、簡潔な自然言語で定式化された単純で短い従来型の常識問題を用いて、強い機能を有するとされる、利用可能な最大規模で訓練された最先端モデルの機能と推論能力の劇的な破綻を示す。モデルは誤った解に対して強い過信を示すと同時に、明らかに失敗した応答の妥当性を正当化し裏付けるために、作話に近いしばしば無意味な「推論」風の説明を与え、もっともらしく聞こえるようにするため、この破綻は劇的である。さまざまな種類の強化されたプロンプトや、複数ステップの再評価によって誤った解を再考するようモデルに促すことなど、正しい解を得るための各種の標準的な介入は失敗する。我々はこれらの初期の観察を科学技術コミュニティに提示し、現世代のLLMが有するとされる能力の早急な再評価を促す。そのような再評価には、現在の最先端の評価手順やベンチマークでは明らかに見つからないままになっている基本的な推論の欠陥を適切に検出できる標準化ベンチマークを作成するための共同行動も必要である。論文中の実験を再現するコードと生の実験データは https://github.com/LAION-AI/AIW にある。

Large Language Models (LLMs) are often described as being instances of foundation models - that is, models that transfer strongly across various tasks and conditions in few-shot or zero-shot manner, while exhibiting scaling laws that predict function improvement when increasing the pre-training scale. These claims of excelling in different functions and tasks rely on measurements taken across various sets of standardized benchmarks showing high scores for such models. We demonstrate here a dramatic breakdown of function and reasoning capabilities of state-of-the-art models trained at the largest available scales which claim strong function, using a simple, short, conventional common sense problem formulated in concise natural language, easily solvable by humans. The breakdown is dramatic, as models also express strong overconfidence in their wrong solutions, while often providing nonsensical "reasoning"-like explanations akin to confabulations to justify and back up the validity of their clearly failed responses, making them sound plausible. Various standard interventions in an attempt to get the right solution, like various types of enhanced prompting, or urging the models to reconsider the wrong solutions again by multi-step re-evaluation, fail. We take these initial observations to the scientific and technological community to stimulate urgent re-assessment of the claimed capabilities of the current generation of LLMs. Such re-assessment also requires common action to create standardized benchmarks that would allow proper detection of such basic reasoning deficits that obviously manage to remain undiscovered by current state-of-the-art evaluation procedures and benchmarks. Code for reproducing experiments in the paper and raw experiments data can be found at https://github.com/LAION-AI/AIW.

翻訳日:2024-06-07 19:44:18 公開日:2024-06-05

# ハウサ語、ヨルバ語、イグボ語に対する攻撃言語とヘイトスピーチ検出のための多言語データセット

A multilingual dataset for offensive language and hate speech detection for Hausa, Yoruba and Igbo languages ( http://arxiv.org/abs/2406.02169v2 )

ライセンス: Link先を確認

Saminu Mohammad Aliyu, Gregory Maksha Wajiga, Muhammad Murtala,

(参考訳) オンライン攻撃言語の普及は、特に多言語文脈において、効果的な検出メカニズムの開発を必要とする。本研究は,ナイジェリアの主要言語であるHausa,Yoruba,Igboの3言語において,攻撃的言語検出のための新しいデータセットの開発と導入の課題に対処する。私たちはTwitterからデータを収集し、それを手動でアノテートして、ネイティブスピーカーを使用して、3つの言語毎にデータセットを作成しました。トレーニング済みの言語モデルを用いて、データセットにおける攻撃言語の検出の有効性を評価した。最高の性能モデルは90%の精度を達成した。攻撃的言語検出の研究をさらに支援するため、データセットとモデルを一般公開する計画である。

The proliferation of online offensive language necessitates the development of effective detection mechanisms, especially in multilingual contexts. This study addresses the challenge by developing and introducing novel datasets for offensive language detection in three major Nigerian languages: Hausa, Yoruba, and Igbo. We collected data from Twitter and manually annotated it to create datasets for each of the three languages, using native speakers. We used pre-trained language models to evaluate their efficacy in detecting offensive language in our datasets. The best-performing model achieved an accuracy of 90%. To further support research in offensive language detection, we plan to make the dataset and our models publicly available.

翻訳日:2024-06-07 19:44:18 公開日:2024-06-05

# Flash拡散: 画像生成のための条件付き拡散モデルを高速化する

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation ( http://arxiv.org/abs/2406.02347v2 )

ライセンス: Link先を確認

Clement Chadebec, Onur Tasar, Eyal Benaroche, Benjamin Aubin,

(参考訳) 本稿では,事前学習済み拡散モデルの生成を高速化する,効率的で高速で多用途な蒸留法であるFlash Diffusionを提案する。この手法は、COCO2014とCOCO2017データセット上での少数ステップ画像生成において、FIDとCLIP-Scoreの面で最先端の性能に達し、既存手法に比べて数GPU時間の学習とより少ない学習可能パラメータしか必要としない。その効率性に加えて、この手法の汎用性は、テキスト・トゥ・イメージ、インペイント、フェイス・スワッピング、スーパーレゾリューションなどのいくつかのタスクや、UNetベースのデノイザ(SD1.5, SDXL)やDiT(Pixart-$\alpha$)、アダプタなどの異なるバックボーンにまたがって示される。いずれの場合も、非常に高品質な画像生成を維持しながら、サンプリングステップの数を劇的に削減することができる。公式実装はhttps://github.com/gojasper/flash-diffusion で公開されている。

In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models: Flash Diffusion. The method reaches state-of-the-art performance in terms of FID and CLIP-Score for few-step image generation on the COCO2014 and COCO2017 datasets, while requiring only several GPU hours of training and fewer trainable parameters than existing methods. In addition to its efficiency, the versatility of the method is demonstrated across several tasks such as text-to-image, inpainting, face-swapping, and super-resolution, and across different backbones such as UNet-based denoisers (SD1.5, SDXL) or DiT (Pixart-$\alpha$), as well as adapters. In all cases, the method drastically reduces the number of sampling steps while maintaining very high-quality image generation. The official implementation is available at https://github.com/gojasper/flash-diffusion.

翻訳日:2024-06-07 19:44:18 公開日:2024-06-05

# Llumnix: 大規模言語モデルの実行のための動的スケジューリング

Llumnix: Dynamic Scheduling for Large Language Model Serving ( http://arxiv.org/abs/2406.03243v1 )

ライセンス: Link先を確認

Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin,

(参考訳) 大規模言語モデル(LLM)に対する推論は、人々の日常生活における潜在能力を解放する鍵となる。しかし、リソース要件やレイテンシ要件の点で要求が本質的に不均一で予測できないため、多様なアプリケーションとLLMの動的実行特性の結果として、効率的なLLM提供は依然として困難である。既存のシステムは、これらの特性を扱うのに基本的に制限されており、厳しいキューの遅延、尾の遅延の低さ、SLO違反などの問題を引き起こす。 Llumnixは、複数のモデルインスタンスにまたがる実行時再スケジューリングによって、不均一で予測不能な要求に応答するLLMサービスシステムである。現代のオペレーティングシステムのCPUコア間のコンテキストスイッチと同様に、Llumnixはリクエストを再スケジュールし、ロードバランシングとアイソレーションを改善し、リソースのフラグメンテーションを緩和し、リクエスト優先順位とSLOを区別する。 Llumnixは、リクエストとそのインメモリ状態に対する効率的でスケーラブルなライブマイグレーションメカニズムでリスケジュールを実装し、複数のリスケジュールシナリオをエレガントに統一する動的スケジューリングポリシでそれを活用している。評価の結果,Llumnixはテールレイテンシを桁違いに改善し,高優先度要求を最大1.5倍高速化し,類似のテールレイテンシを実現しつつ36%のコスト削減を実現した。 Llumnixはhttps://github.com/AlibabaPAI/llumnixで公開されている。

Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are inherently heterogeneous and unpredictable in terms of resource and latency requirements, as a result of the diverse applications and the dynamic execution nature of LLMs. Existing systems are fundamentally limited in handling these characteristics and cause problems such as severe queuing delays, poor tail latencies, and SLO violations. We introduce Llumnix, an LLM serving system that reacts to such heterogeneous and unpredictable requests by runtime rescheduling across multiple model instances. Similar to context switching across CPU cores in modern operating systems, Llumnix reschedules requests to improve load balancing and isolation, mitigate resource fragmentation, and differentiate request priorities and SLOs. Llumnix implements the rescheduling with an efficient and scalable live migration mechanism for requests and their in-memory states, and exploits it in a dynamic scheduling policy that unifies the multiple rescheduling scenarios elegantly. Our evaluations show that Llumnix improves tail latencies by an order of magnitude, accelerates high-priority requests by up to 1.5x, and delivers up to 36% cost savings while achieving similar tail latencies, compared against state-of-the-art LLM serving systems. Llumnix is publicly available at https://github.com/AlibabaPAI/llumnix.

翻訳日:2024-06-07 19:34:24 公開日:2024-06-05

# スワップゲートによる大域フェルミオンモード最適化

Global fermionic mode optimization via swap gates ( http://arxiv.org/abs/2406.03449v1 )

ライセンス: Link先を確認

Gero Friesecke, Miklós Antal Werner, Kornél Kapás, Andor Menczer, Örs Legeza,

(参考訳) 本稿では,大域フェルミオンモード最適化を用いて,与えられた誤差マージンに対する量子多体波関数の最適表現を求めるための一般的な手法を提案する。固定階数行列積状態多様体上の定常点は、グラスマン多様体 [Phys. Rev. Lett. 117, 210402] 上の合同最適化とスワップゲート制御置換によって得られる。大域量の最小化、ブロックエントロピー領域は、この方法が偏微分に関して全ての基準を満たすことを保証している。強相関分子系の大規模密度行列再正規化群シミュレーションと二次元フェルミオン格子モデルによる数値計算結果について述べる。

We propose a general approach to find an optimal representation of a quantum many-body wave function for a given error margin via global fermionic mode optimization. The stationary point on a fixed-rank matrix product state manifold is obtained via a joint optimization on the Grassmann manifold [Phys. Rev. Lett. 117, 210402] together with swap-gate-controlled permutations. The minimization of the global quantity, the block entropy area, guarantees that the method fulfills all criteria with respect to partial derivatives. Numerical results via large-scale density matrix renormalization group simulations on strongly correlated molecular systems and two-dimensional fermionic lattice models are discussed.

翻訳日:2024-06-07 19:34:24 公開日:2024-06-05

# 多次元・不均衡データセットに対するロバスト予測モデル

Robust Prediction Model for Multidimensional and Unbalanced Datasets ( http://arxiv.org/abs/2406.03507v1 )

ライセンス: Link先を確認

Pooja Thakar, Anil Mehta, Manisha,

(参考訳) データマイニングは有望な分野であり、予測能力のために複数のドメインに適用されている。実世界のデータは、多次元性、不均衡、欠落した値の問題に悩まされるため、データマイニングに簡単には利用できない。初心者による予測能力の使用は困難である。初心者は、利用可能な大量のデータから関連する属性のセットを見つけることは困難である。本稿では,ロバスト予測モデルを用いて属性の集合を見つけ,不均衡な実生活データセットと多次元実生活データセットの問題を解き,情報的意思決定のためのパターンの発見を支援する。モデルは、健康分野、教育、ビジネス、詐欺検出の5つの異なるデータセットでテストされる。その結果、モデルが頑健に動作し、様々な領域で適用可能であることが示された。

Data Mining is a promising field and is applied in multiple domains for its predictive capabilities. Data in the real world cannot be readily used for data mining as it suffers from the problems of multidimensionality, imbalance and missing values. Novice users find it difficult to exploit these predictive capabilities, and in particular to find the relevant set of attributes in a large pool of available data. The paper presents a Robust Prediction Model that finds a relevant set of attributes, resolves the problems of unbalanced and multidimensional real-life datasets, and helps in finding patterns for informed decision making. The model is tested on five different datasets in the domains of health, education, business and fraud detection. The results showcase the robust behaviour of the model and its applicability in various domains.

翻訳日:2024-06-07 19:24:39 公開日:2024-06-05

# 事前訓練エンコーダのバックドア緩和に関する相互情報案内

Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders ( http://arxiv.org/abs/2406.03508v1 )

ライセンス: Link先を確認

Tingxu Han, Weisong Sun, Ziqi Ding, Chunrong Fang, Hanwei Qian, Jiaxun Li, Zhenyu Chen, Xiangyu Zhang,

(参考訳) ラベル付きデータを必要としないエンコーダの事前トレーニングには,自己教師付き学習(SSL)がますます魅力的なものになっている。これらのトレーニング済みエンコーダ上に構築された下流タスクは、ほぼ最先端のパフォーマンスを達成することができる。しかし、SSLによる事前訓練されたエンコーダは、既存の研究で示されているように、バックドア攻撃に対して脆弱である。下流タスクモデルのために多くのバックドア緩和技術が設計されている。しかし,事前学習時のラベル情報の欠如により,事前学習エンコーダに適用した場合,その有効性は損なわれ,制限される。本稿では,事前訓練したエンコーダに対するバックドア攻撃に対処するため,MIMICという相互誘導型バックドア緩和手法を提案する。 MIMICは、潜在的なバックドアエンコーダを教師ネットとして扱い、知識蒸留を用いて教師ネットからクリーンな学生エンコーダを蒸留する。既存の知識蒸留のアプローチとは異なり、MIMICは学生を無作為な体重で初期化し、教師のネットからバックドアを継承しない。そして、MIMICは各層間の相互情報と抽出した特徴を利用して、教師ネット内の良識の所在を特定する。蒸留損失は, クローン損失と注意損失の2つの側面から発生し, バックドアを緩和し, エンコーダ性能を同時に維持することを目的としている。 SSLにおける2つのバックドア攻撃による評価の結果,MIMIC はクリーンデータの 5% しか利用せず,最先端のバックドア緩和技術7 を超越して攻撃成功率を大幅に低減できることが示された。

Self-supervised learning (SSL) is increasingly attractive for pre-training encoders without requiring labeled data. Downstream tasks built on top of those pre-trained encoders can achieve nearly state-of-the-art performance. The pre-trained encoders by SSL, however, are vulnerable to backdoor attacks as demonstrated by existing studies. Numerous backdoor mitigation techniques are designed for downstream task models. However, their effectiveness is impaired and limited when adapted to pre-trained encoders, due to the lack of label information when pre-training. To address backdoor attacks against pre-trained encoders, in this paper, we innovatively propose a mutual information guided backdoor mitigation technique, named MIMIC. MIMIC treats the potentially backdoored encoder as the teacher net and employs knowledge distillation to distill a clean student encoder from the teacher net. Different from existing knowledge distillation approaches, MIMIC initializes the student with random weights, inheriting no backdoors from teacher nets. Then MIMIC leverages mutual information between each layer and extracted features to locate where benign knowledge lies in the teacher net, with which distillation is deployed to clone clean features from teacher to student. We craft the distillation loss with two aspects, including clone loss and attention loss, aiming to mitigate backdoors and maintain encoder performance at the same time. Our evaluation conducted on two backdoor attacks in SSL demonstrates that MIMIC can significantly reduce the attack success rate by only utilizing <5% of clean data, surpassing seven state-of-the-art backdoor mitigation techniques.

翻訳日:2024-06-07 19:24:39 公開日:2024-06-05

# 非対称調和振動子のコヒーレント状態

Coherent states of the asymmetric harmonic oscillator ( http://arxiv.org/abs/2406.03509v1 )

ライセンス: Link先を確認

G. Chadzitaskos,

(参考訳) 非対称高調波発振器に対して, 非対称性パラメータがばね定数比の平方根となる形式的コヒーレント状態を構築した。これらの状態はグラウバーのアプローチとペレロモフのアプローチに基づいているが、一般にコヒーレントな状態に必要な全ての性質を満たすわけではない。時間が経つにつれ、このような方法で導入されたコヒーレントな状態は一般に非コヒーレントになる。しかし、スプリング定数の平方根比に対して、$\frac{4k+1}{4l+1}$または$\frac{4k+3}{4l+3}$の特定のパラメータが存在する。これらのパラメータに対して、固有状態のヒルベルト空間の部分空間上のコヒーレント状態を構築することができる。これらのコヒーレントな状態は、進化の過程でコヒーレンスを維持する。この事例も分析される。

We constructed formal coherent states for an asymmetric harmonic oscillator, where the asymmetry parameter is the square root of the ratio of spring constants. Although these states are constructed based on both Glauber's and Perelomov's approaches, in general they do not satisfy all the properties required for coherent states. Over time, the coherent states introduced in this way generally become incoherent. However, there are some specific parameters for the square root ratios of the spring constants $\frac{4k+1}{4l+1}$ or $\frac{4k+3}{4l+3}$. For these parameters it is possible to construct coherent states on the subspace of the Hilbert space of eigenstates. These coherent states keep their coherence during the time evolution. This case is also analyzed.
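As a hedged illustration of the setting (the abstract does not spell out the Hamiltonian, so the notation below, including the symbols $\kappa_1,\kappa_2$ and the ordering of the ratio in $\alpha$, is an assumption), an asymmetric harmonic oscillator can be written with a different spring constant on each side of the origin:

```latex
V(x) =
\begin{cases}
  \tfrac{1}{2}\,\kappa_1 x^2, & x < 0, \\[2pt]
  \tfrac{1}{2}\,\kappa_2 x^2, & x \ge 0,
\end{cases}
\qquad
\omega_i = \sqrt{\kappa_i / m},
\qquad
\alpha = \sqrt{\kappa_1 / \kappa_2} = \omega_1 / \omega_2 .

% Classically, the particle spends half a period in each half-well:
T = \frac{\pi}{\omega_1} + \frac{\pi}{\omega_2} .
```

In this notation, the special cases named in the abstract would correspond to $\alpha = \frac{4k+1}{4l+1}$ or $\alpha = \frac{4k+3}{4l+3}$, with $k$ and $l$ presumably non-negative integers.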

翻訳日:2024-06-07 19:24:39 公開日:2024-06-05

# 音声による臨床うつ病スクリーニング : 実証的研究

Speech-based Clinical Depression Screening: An Empirical Study ( http://arxiv.org/abs/2406.03510v1 )

ライセンス: Link先を確認

Yangbin Chen, Chenyang Xu, Chunfeng Liang, Yanbao Tao, Chuan Shi,

(参考訳) 本研究では, 精神科面接, チャットボット会話, テキスト読解など, さまざまな相互作用シナリオを対象としたAIによる抑うつスクリーニングにおける音声信号の有用性について検討した。参加者には、北京大学第6病院の外来から徴発されたうつ病患者や、地域社会のコントロールグループメンバーが含まれており、すべて標準化された診断プロトコルに従って精神科医によって診断されている。音声と深部音声の特徴を各参加者の分節録音から抽出した。分類はニューラルネットワークまたはSVMを使用して行われ、最終的な評価はまとめられたクリップ結果によって決定された。対話シナリオ, 音声処理技術, 特徴型による分析により, 抑うつスクリーニングの重要な指標として音声が確認される。具体的には、人間とコンピュータの相互作用が臨床面接の有効性と一致し、読解タスクを超越する。セグメントの長さと量はモデル性能に大きく影響し、ディープ音声の特徴は従来の音響特性よりもかなり優れていた。

This study investigates the utility of speech signals for AI-based depression screening across varied interaction scenarios, including psychiatric interviews, chatbot conversations, and text readings. Participants include depressed patients recruited from the outpatient clinics of Peking University Sixth Hospital and control group members from the community, all diagnosed by psychiatrists following standardized diagnostic protocols. We extracted acoustic and deep speech features from each participant's segmented recordings. Classifications were made using neural networks or SVMs, with aggregated clip outcomes determining final assessments. Our analysis across interaction scenarios, speech processing techniques, and feature types confirms speech as a crucial marker for depression screening. Specifically, human-computer interaction matches clinical interview efficacy, surpassing reading tasks. Segment duration and quantity significantly affect model performance, with deep speech features substantially outperforming traditional acoustic features.
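The abstract says that clip-level outcomes are aggregated into a final assessment but does not say how. The sketch below shows two common aggregation rules, majority vote and thresholded mean probability. Both function names and the 0.5 threshold are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter
from statistics import mean


def aggregate_by_majority(clip_labels):
    """Return the label that was predicted for the largest number of clips."""
    if not clip_labels:
        raise ValueError("need at least one clip prediction")
    return Counter(clip_labels).most_common(1)[0][0]


def aggregate_by_mean_probability(clip_probs, threshold=0.5):
    """Flag a participant when the mean clip-level probability reaches the threshold.

    The threshold is a hypothetical default, chosen only for illustration.
    """
    if not clip_probs:
        raise ValueError("need at least one clip probability")
    return mean(clip_probs) >= threshold


if __name__ == "__main__":
    # Five clips from one participant: 1 = screened positive, 0 = negative.
    print(aggregate_by_majority([1, 0, 1, 1, 0]))
    # Three clip-level probabilities from a classifier.
    print(aggregate_by_mean_probability([0.9, 0.2, 0.7]))
```

Majority voting is robust to a few badly classified clips, while averaging probabilities keeps information about classifier confidence; which works better depends on segment length and count, which the abstract reports as significant factors.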

翻訳日:2024-06-07 19:24:39 公開日:2024-06-05

# MagiNet:不完全なトラフィックデータのためのマスク対応グラフインプットネットワーク

MagiNet: Mask-Aware Graph Imputation Network for Incomplete Traffic Data ( http://arxiv.org/abs/2406.03511v1 )

ライセンス: Link先を確認

Jianping Zhou, Bin Lu, Zhanyu Liu, Siyu Pan, Xuejun Feng, Hua Wei, Guanjie Zheng, Xinbing Wang, Chenghu Zhou,

(参考訳) 検出器の故障と通信障害のため、交通データの収集中に欠落したデータがどこにでもある。したがって、インテリジェントトランスポートシステム(ITS)のデータ分析と意思決定を容易にするために、欠落した値をインプットすることが極めて重要である。しかし、既存の計算手法は一般に、欠落した値を初期化し、避けられないノイズを発生させるため、0のプリフィル技術を実行する。さらに,不完全な交通データに内在する時空間相関を明らかにするために,過度に平滑な補間を観測する。そこで我々はMask-Aware Graph imputation Network: MagiNetを提案する。適応マスク時空間エンコーダを設計し、不完全データの潜在表現を学習し、不足した値への依存を解消する。さらに、複数のブロックを積み重ねた時空間デコーダを考案し、不完全なトラフィックデータ中の空間的および時間的依存関係を捕捉し、過度に平滑な計算を緩和する。その結果, RMSEでは平均4.31%, MAPEでは3.72%向上した。

Due to detector malfunctions and communication failures, missing data is ubiquitous during the collection of traffic data. Therefore, it is of vital importance to impute the missing values to facilitate data analysis and decision-making for Intelligent Transportation Systems (ITS). However, existing imputation methods generally initialize missing values by zero pre-filling, which inevitably introduces noise. Moreover, we observe prevalent over-smoothing interpolations, which fall short in revealing the intrinsic spatio-temporal correlations of incomplete traffic data. To this end, we propose Mask-Aware Graph imputation Network: MagiNet. Our method designs an adaptive mask spatio-temporal encoder to learn the latent representations of incomplete data, eliminating the reliance on pre-filling missing values. Furthermore, we devise a spatio-temporal decoder that stacks multiple blocks to capture the inherent spatial and temporal dependencies within incomplete traffic data, alleviating over-smoothing imputation. Extensive experiments demonstrate that our method outperforms state-of-the-art imputation methods on five real-world traffic datasets, yielding an average improvement of 4.31% in RMSE and 3.72% in MAPE.
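The reported gains are in RMSE and MAPE. As a minimal sketch of how such imputation metrics are commonly computed, the code below scores only the positions that were masked out as missing. The function names, the mask convention (`True` = value was hidden and imputed), and the choice to skip zero ground-truth values in MAPE are assumptions for illustration; the paper's exact evaluation protocol may differ.

```python
import math


def masked_rmse(truth, imputed, missing_mask):
    """Root-mean-square error over the positions marked as missing."""
    errors = [(p - t) ** 2 for t, p, m in zip(truth, imputed, missing_mask) if m]
    if not errors:
        raise ValueError("mask selects no positions")
    return math.sqrt(sum(errors) / len(errors))


def masked_mape(truth, imputed, missing_mask):
    """Mean absolute percentage error (in percent) over missing positions.

    Positions whose true value is zero are skipped to avoid division by zero.
    """
    ratios = [
        abs(p - t) / abs(t)
        for t, p, m in zip(truth, imputed, missing_mask)
        if m and t != 0
    ]
    if not ratios:
        raise ValueError("mask selects no usable positions")
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    truth = [10.0, 20.0, 30.0, 40.0]
    imputed = [10.0, 22.0, 27.0, 40.0]
    mask = [False, True, True, False]  # only the middle two readings were imputed
    print(masked_rmse(truth, imputed, mask))
    print(masked_mape(truth, imputed, mask))
```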

翻訳日:2024-06-07 19:24:39 公開日:2024-06-05

# 困難か違いか?オーディオディープフェイク検出の一般化を理解する

Harder or Different? Understanding Generalization of Audio Deepfake Detection ( http://arxiv.org/abs/2406.03512v1 )

ライセンス: Link先を確認

Nicolas M. Müller, Nicholas Evans, Hemlata Tak, Philip Sperl, Konstantin Böttinger,

(参考訳) 最近の研究は、音声のディープフェイク検出における重要な課題を強調している。これは、テキスト音声(TTS)モデルの品質が継続的に向上していること、すなわち、より新しいDeepFakesは単に'ハード'で検出できるのか? あるいは、あるモデルで生成されたディープフェイクが、別のモデルで生成されたディープフェイクと根本的に異なるからだろうか? ドメイン内テストデータとドメイン外テストデータのパフォーマンスギャップを'ハードネス'と'ディファレンス'コンポーネントに分解することで、この問題に答える。 ASVspoofデータベースを用いて行った実験は、硬さ成分が事実上無視可能であることを示している。これは現実世界のディープフェイク検出に直接的な意味を持ち、現在支配的な研究トレンドであるモデル容量の増加だけでは、一般化の課題に効果的に対処できないことを強調している。

Recent research has highlighted a key issue in speech deepfake detection: models trained on one set of deepfakes perform poorly on others. The question arises: is this due to the continuously improving quality of Text-to-Speech (TTS) models, i.e., are newer DeepFakes just 'harder' to detect? Or, is it because deepfakes generated with one model are fundamentally different to those generated using another model? We answer this question by decomposing the performance gap between in-domain and out-of-domain test data into 'hardness' and 'difference' components. Experiments performed using ASVspoof databases indicate that the hardness component is practically negligible, with the performance gap being attributed primarily to the difference component. This has direct implications for real-world deepfake detection, highlighting that merely increasing model capacity, the currently-dominant research trend, may not effectively address the generalization challenge.

翻訳日:2024-06-07 19:24:39 公開日:2024-06-05

# デバイス間フェデレーション学習のためのバッファ付き非同期セキュアアグリゲーション

Buffered Asynchronous Secure Aggregation for Cross-Device Federated Learning ( http://arxiv.org/abs/2406.03516v1 )

ライセンス: Link先を確認

Kun Wang, Yi-Rui Yang, Wu-Jun Li,

(参考訳) 非同期フェデレーション学習(AFL)は、デバイス間フェデレーション学習におけるデバイス不均一性の課題に対処する有効な方法である。しかしながら、AFLは通常、既存のセキュアアグリゲーションプロトコルは同期アグリゲーションに基づいているため、フェデレートラーニングにおけるユーザのプライバシを保護するために使用される既存のセキュアアグリゲーションプロトコルと互換性がない。本稿では,バッファ型非同期セキュアアグリゲーション(BASA)と呼ばれる新しいセキュアアグリゲーションプロトコルを提案する。既存のプロトコルと比較して、BASAはAFLと完全に互換性があり、各ユーザがユーザ間の同期通信に頼ることなく、サーバとの1ラウンドの通信しか必要としないという条件の下でセキュアなアグリゲーションを提供する。 BASAに基づいてハードウェアに余分な要求を伴わずにセキュアなアグリゲーションを実現する最初のAFL法を提案する。我々は、BASAが、トレーニング効率とスケーラビリティの観点から、クロスデバイス・フェデレーション・ラーニングのための既存のセキュア・アグリゲーション・プロトコルより優れていることを実証的に実証した。

Asynchronous federated learning (AFL) is an effective method to address the challenge of device heterogeneity in cross-device federated learning. However, AFL is usually incompatible with existing secure aggregation protocols used to protect user privacy in federated learning because most existing secure aggregation protocols are based on synchronous aggregation. To address this problem, we propose a novel secure aggregation protocol named buffered asynchronous secure aggregation (BASA) in this paper. Compared with existing protocols, BASA is fully compatible with AFL and provides secure aggregation under the condition that each user only needs one round of communication with the server without relying on any synchronous interaction among users. Based on BASA, we propose the first AFL method which achieves secure aggregation without extra requirements on hardware. We empirically demonstrate that BASA outperforms existing secure aggregation protocols for cross-device federated learning in terms of training efficiency and scalability.

翻訳日:2024-06-07 19:24:39 公開日:2024-06-05

# 不均一な個人差分学習のための雑音認識アルゴリズム

Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning ( http://arxiv.org/abs/2406.03519v1 )

ライセンス: Link先を確認

Saber Malekmohammadi, Yaoliang Yu, Yang Cao,

(参考訳) 高いユーティリティと厳密なデータプライバシは、いくつかのクライアント間で分散したデータからモデルを学ぶ、フェデレートラーニング(FL)システムの主要な目標のひとつです。後者はFL(DPFL)で差分プライバシーを利用することで実現されている。クライアントのプライバシ要件には不均一性があることが多く、既存のDPFLは、クライアントの統一的なプライバシ要件を前提とするか、あるいはサーバが完全に信頼されていない場合(設定)には適用できない。さらに、クライアントのバッチサイズやデータセットサイズには不均一性がしばしば存在し、示すように、クライアントモデルの更新間でDPノイズレベルが余分に変化する。このような異種性の源では、クライアントのアグリゲーションの重み付けをプライバシパラメータに比例して割り当てるなど、直接的なアグリゲーション戦略によって、実用性が低下する。本稿では,クライアントモデル更新における真のノイズレベルを効率的に推定し,集約モデル更新におけるノイズレベルを大幅に低減するRobust-HDPを提案する。 Robust-HDPはユーティリティと収束速度を改善し、不正なプライバシパラメータをサーバに送信する可能性のあるクライアントに対して安全である。複数のデータセットに対する大規模な実験結果と理論的解析により,Robust-HDPの有効性が確認された。私たちのコードはここにある。

High utility and rigorous data privacy are among the main goals of a federated learning (FL) system, which learns a model from data distributed among some clients. The latter goal has been pursued by using differential privacy in FL (DPFL). There is often heterogeneity in clients' privacy requirements, and existing DPFL works either assume uniform privacy requirements for clients or are not applicable when the server is not fully trusted (our setting). Furthermore, there is often heterogeneity in the batch and/or dataset sizes of clients, which, as shown, results in extra variation in the DP noise level across clients' model updates. With these sources of heterogeneity, straightforward aggregation strategies, e.g., assigning clients aggregation weights proportional to their privacy parameters, will lead to lower utility. We propose Robust-HDP, which efficiently estimates the true noise level in clients' model updates and considerably reduces the noise level in the aggregated model updates. Robust-HDP improves utility and convergence speed, while being safe against clients that may maliciously send falsified privacy parameters to the server. Extensive experimental results on multiple datasets and our theoretical analysis confirm the effectiveness of Robust-HDP. Our code can be found here.
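To see why the choice of aggregation weights matters when noise levels differ across clients, the following sketch compares uniform averaging with textbook inverse-variance weighting for independent, zero-mean noise. This is a generic statistical illustration, not the Robust-HDP algorithm; the function names and the example variances are hypothetical.

```python
def inverse_variance_weights(noise_variances):
    """Weights proportional to 1/variance, normalized to sum to one."""
    if any(v <= 0 for v in noise_variances):
        raise ValueError("variances must be positive")
    inverse = [1.0 / v for v in noise_variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def aggregated_noise_variance(weights, noise_variances):
    """Variance of a weighted sum of independent noise terms."""
    return sum(w * w * v for w, v in zip(weights, noise_variances))


if __name__ == "__main__":
    # Two clients whose updates carry noise of variance 1 and 4 (made-up values).
    variances = [1.0, 4.0]
    uniform = [0.5, 0.5]
    weighted = inverse_variance_weights(variances)
    print(aggregated_noise_variance(uniform, variances))   # uniform averaging
    print(aggregated_noise_variance(weighted, variances))  # noise-aware weighting
```

With these example values, the noise-aware weights give a lower aggregate variance than uniform averaging. In practice the true per-client noise level is not known to the server, which is the quantity the abstract says Robust-HDP estimates.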

翻訳日:2024-06-07 19:24:39 公開日:2024-06-05

# VideoPhy:ビデオ生成のための物理コモンセンスの評価

VideoPhy: Evaluating Physical Commonsense for Video Generation ( http://arxiv.org/abs/2406.03520v1 )

ライセンス: Link先を確認

Hritik Bansal, Zongyu Lin, Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, Aditya Grover,

(参考訳) インターネット規模のビデオデータの事前トレーニングの最近の進歩は、様々な視覚概念やスタイルで高品質な動画を作成できるテキスト・ビデオ生成モデルの開発につながっている。現実的な動きを合成し、複雑な物体をレンダリングする能力により、これらの生成モデルは物理世界の汎用シミュレータになる可能性がある。しかし、既存のテキスト・ビデオ生成モデルでは、この目標からどこまで離れているのかは不明だ。この目的のために、生成したビデオが現実世界のアクティビティの物理的なコモンセンスに従うかどうかを評価するために設計されたベンチマークであるVideoPhyを紹介する(例えば、大理石は傾斜した表面に置かれたときにロールダウンする)。具体的には、物理世界における様々な物質種間の相互作用を含む688のキャプションのリスト(例えば、固形固形流体、固形流体、流体流体)をキュレートする。次に、オープンモデル(例: VideoCrafter2)やクローズドモデル(例: Google, Pika)など、さまざまな最先端のテキスト・ビデオ生成モデルから、これらのキャプションに条件付けされたビデオを生成します。さらに,人間による評価の結果,既存のモデルではテキストプロンプトに忠実な動画を生成する能力が乏しく,物理的コモンセンスも欠如していることが判明した。具体的には、最高のパフォーマンスモデルであるピカは、19.7%のインスタンスでキャプションと物理法に準拠するビデオを生成する。 VideoPhyは、ビデオ生成モデルは物理的な世界を正確にシミュレートするものではないと強調する。最後に、データセットを自動評価器であるVideoCon-Physicsで補足し、意味的定着と物理的常識を大規模に評価する。

Recent advances in internet-scale video data pretraining have led to the development of text-to-video generative models that can create high-quality videos across a broad range of visual concepts and styles. Due to their ability to synthesize realistic motions and render complex objects, these generative models have the potential to become general-purpose simulators of the physical world. However, it is unclear how far we are from this goal with the existing text-to-video generative models. To this end, we present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities (e.g. marbles will roll down when placed on a slanted surface). Specifically, we curate a list of 688 captions that involve interactions between various material types in the physical world (e.g., solid-solid, solid-fluid, fluid-fluid). We then generate videos conditioned on these captions from diverse state-of-the-art text-to-video generative models, including open models (e.g., VideoCrafter2) and closed models (e.g., Lumiere from Google, Pika). Further, our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts, and also lack physical commonsense. Specifically, the best performing model, Pika, generates videos that adhere to the caption and physical laws for only 19.7% of the instances. VideoPhy thus highlights that the video generative models are far from accurately simulating the physical world. Finally, we also supplement the dataset with an auto-evaluator, VideoCon-Physics, to assess semantic adherence and physical commonsense at scale.

翻訳日:2024-06-07 19:24:39 公開日:2024-06-05

# 開非平衡量子系におけるMpemba効果

Mpemba effects in open nonequilibrium quantum systems ( http://arxiv.org/abs/2406.03521v1 )

ライセンス: Link先を確認

Andrea Nava, Reinhold Egger,

(参考訳) 古典的な熱的Mpemba効果(初期に高温の系が、低温の系よりも速く最終平衡状態に緩和する)を、複数の熱浴に結合した開放量子系へ一般化する。一般に、2つの異なる種類の量子Mpemba効果が可能であることを示す。それらは量子状態トモグラフィーによって区別できる。しかし、(型を決定することなしに)量子Mpemba効果の存在は、電流やエネルギーのようなより単純な観測量を測定することで既に確立できる。 2つの金属リードに結合した相互作用する2サイトKitaev模型という実験的に実現可能な場合について、一般的な結果を例示する。

We generalize the classical thermal Mpemba effect (where an initially hot system relaxes faster to the final equilibrium state than a cold one) to open quantum systems coupled to several reservoirs. We show that, in general, two different types of quantum Mpemba effects are possible. They may be distinguished by quantum state tomography. However, the existence of a quantum Mpemba effect (without determining the type) can already be established by measuring simpler observables such as currents or energies. We illustrate our general results for the experimentally feasible case of an interacting two-site Kitaev model coupled to two metallic leads.

翻訳日:2024-06-07 19:24:39 公開日:2024-06-05

# $\mathcal{PT}$-symmetric系における拡散複雑性と局在

Spread complexity and localization in $\mathcal{PT}$-symmetric systems ( http://arxiv.org/abs/2406.03524v1 )

ライセンス: Link先を確認

Aranya Bhattacharya, Rathindra Nath Das, Bidyut Dey, Johanna Erdmenger,

(参考訳) 本稿では,拡散複雑性と拡散エントロピーを用いた$\mathcal{PT}$-対称量子系における波動関数の拡散に関する研究フレームワークを提案する。境界サイトに複素オンサイトポテンシャルを持つ強結合鎖を考える。 $\mathcal{PT}$-unbroken 相では、波動関数は非局在化される。 $\mathcal{PT}$-broken 相では、波動関数が強結合格子の片端に局在することを見出した。この局在は非エルミート皮膚効果の実現である。 $\mathcal{PT}$-broken 相の局在は格子鎖基底とクリロフ基底の両方で観察される。スプレッドエントロピー、エントロピー複雑性、およびクリロフ逆参加比と呼ぶさらなる尺度は、波動関数の拡散のダイナミクスを探索し、クリロフ基底で探索された局在の強さを定量化する。状態の情報を保存するために必要なクリロフ基底ベクトルの数は、局在の強さとともに減少する。以上の結果から,Krylov空間の測度を非エルミート皮膚効果とその局在相転移の特徴づけに利用できることが示唆された。

We present a framework for investigating wave function spreading in $\mathcal{PT}$-symmetric quantum systems using spread complexity and spread entropy. We consider a tight-binding chain with complex on-site potentials at the boundary sites. In the $\mathcal{PT}$-unbroken phase, the wave function is delocalized. We find that in the $\mathcal{PT}$-broken phase, it becomes localized on one edge of the tight-binding lattice. This localization is a realization of the non-Hermitian skin effect. Localization in the $\mathcal{PT}$-broken phase is observed both in the lattice chain basis and the Krylov basis. Spread entropy, entropic complexity, and a further measure that we term the Krylov inverse participation ratio probe the dynamics of wave function spreading and quantify the strength of localization probed in the Krylov basis. The number of Krylov basis vectors required to store the information of the state reduces with the strength of localization. Our results demonstrate how measures in Krylov space can be used to characterize the non-Hermitian skin effect and its localization phase transition.
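For readers unfamiliar with inverse participation ratios, the sketch below computes the standard IPR, $\sum_n p_n^2$ with $p_n = |c_n|^2 / \sum_m |c_m|^2$, for the expansion coefficients $c_n$ of a state in some basis. Applying it to Krylov-basis coefficients is an assumption about what the "Krylov inverse participation ratio" measures; the paper's precise definition may differ, and the function name is illustrative.

```python
def inverse_participation_ratio(amplitudes):
    """Standard IPR of a state given its expansion coefficients in some basis.

    Returns 1.0 for a state on a single basis vector and 1/N for a state
    spread uniformly over N basis vectors.
    """
    probabilities = [abs(a) ** 2 for a in amplitudes]
    norm = sum(probabilities)
    if norm == 0:
        raise ValueError("state has zero norm")
    return sum((p / norm) ** 2 for p in probabilities)


if __name__ == "__main__":
    localized = [1.0, 0.0, 0.0, 0.0]    # all weight on one basis vector
    delocalized = [0.5, 0.5, 0.5, 0.5]  # equal weight on four basis vectors
    print(inverse_participation_ratio(localized))
    print(inverse_participation_ratio(delocalized))
```

A larger IPR means fewer basis vectors carry the state, which is consistent with the abstract's statement that stronger localization requires fewer Krylov basis vectors.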

翻訳日:2024-06-07 19:24:39 公開日:2024-06-05

# エッジ重み決定図を用いた混合次元量子状態生成

Mixed-Dimensional Qudit State Preparation Using Edge-Weighted Decision Diagrams ( http://arxiv.org/abs/2406.03531v1 )

ライセンス: Link先を確認

Kevin Mato, Stefan Hillmich, Robert Wille,

(参考訳) 量子コンピュータは、古典的なコンピュータでは基本的に難解な重要な問題を解く可能性がある。量子コンピューティングプラットフォームの基盤となる物理は、多値論理(multi-valued logic)の使用をサポートしており、これは主流の2準位論理に比べて性能の向上が期待できる。このポテンシャルを利用するための重要な要素の1つは、多値系(qudit)のために量子状態を効率的に準備する能力である。量子コンピュータの時間感度のため、必要な状態を準備する回路は可能な限り短くする必要がある。本稿では,個々のquditが異なる次元を持ちうる混合次元系に着目した量子状態生成法について検討する。提案手法は, 対応する混合次元量子状態を構成する量子回路を自動生成する。この目的のために、決定図は、実現される量子状態のコンパクトな表現として使用される。さらに、量子状態を近似して、精度、メモリの複雑さ、回路内の演算数の間の微調整されたトレードオフを可能にする能力も取り入れている。実験的な評価は、高速でスケーラブルな量子状態の準備を容易にするための提案手法の有効性を示し、性能は決定図のサイズに直接関連している。この実装は、Munich Quantum Toolkit (MQT) の一部として、MQT Quditsフレームワーク(github.com/cda-tum/mqt-qudits)で無料で利用可能である。

Quantum computers have the potential to solve important problems which are fundamentally intractable on a classical computer. The underlying physics of quantum computing platforms supports using multi-valued logic, which promises a boost in performance over the prevailing two-level logic. One key element to exploiting this potential is the capability to efficiently prepare quantum states for multi-valued, or qudit, systems. Due to the time sensitivity of quantum computers, the circuits to prepare the required states have to be as short as possible. In this paper, we investigate quantum state preparation with a focus on mixed-dimensional systems, where the individual qudits may have different dimensionalities. The proposed approach automatically realizes quantum circuits constructing a corresponding mixed-dimensional quantum state. To this end, decision diagrams are used as a compact representation of the quantum state to be realized. We further incorporate the ability to approximate the quantum state to enable a finely controlled trade-off between accuracy, memory complexity, and number of operations in the circuit. Empirical evaluations demonstrate the effectiveness of the proposed approach in facilitating fast and scalable quantum state preparation, with performance directly linked to the size of the decision diagram. The implementation is freely available as part of Munich Quantum Toolkit (MQT), under the framework MQT Qudits at github.com/cda-tum/mqt-qudits.
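In a mixed-dimensional register each qudit has its own number of levels, so a basis state is labelled by mixed-radix digits rather than bits. The sketch below converts between such digit tuples and positions in a flat state vector. It is plain Python and does not use the MQT Qudits API; the function names and the most-significant-qudit-first ordering are assumptions made for illustration.

```python
def flat_index(digits, dims):
    """Position of the basis state |digits> in a flat state vector.

    dims[i] is the number of levels of qudit i; the first qudit is treated
    as most significant (an assumed convention).
    """
    if len(digits) != len(dims):
        raise ValueError("digits and dims must have the same length")
    index = 0
    for digit, dim in zip(digits, dims):
        if not 0 <= digit < dim:
            raise ValueError(f"digit {digit} out of range for dimension {dim}")
        index = index * dim + digit
    return index


def digits_from_index(index, dims):
    """Inverse of flat_index: recover the per-qudit levels from a position."""
    digits = []
    for dim in reversed(dims):
        index, digit = divmod(index, dim)
        digits.append(digit)
    return tuple(reversed(digits))


if __name__ == "__main__":
    dims = (3, 2, 4)  # a qutrit, a qubit and a four-level qudit: 24 basis states
    print(flat_index((2, 1, 3), dims))
    print(digits_from_index(23, dims))
```

A decision diagram for such a register would branch `dims[i]` ways at level `i`, which is one way to picture why its size, rather than the full vector length, governs the cost of state preparation.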

翻訳日:2024-06-07 19:24:39 公開日:2024-06-05

# 量子コンピューティングにおける時間的ハドロン真空分極と光による散乱:シュウィンガーモデル実験

Towards Quantum Computing Timelike Hadronic Vacuum Polarization and Light-by-Light Scattering: Schwinger Model Tests ( http://arxiv.org/abs/2406.03536v1 )

ライセンス: Link先を確認

João Barata, Kazuki Ikeda, Swagato Mukherjee, Jonathan Raghoonanan,

(参考訳) ハドロン真空分極(HVP)と光バイライト散乱(HLBL)は、ミューオンの異常な磁気モーメントに関する標準モデル予測を評価する上で重要である。しかし、これらの観測可能な時間的領域の直接的な第一原理格子ゲージ理論に基づく計算は、依然として困難である。空間的領域における格子量子色力学(QCD)計算と、時間的領域からの実験データパラメトリゼーションに依存する分散的アプローチとの相違が持続する。本稿では、1+1次元量子電磁力学(QED)、すなわちシュウィンガーモデルを用いてHVPとHLBLを解析する手法を紹介する。そのために、テンソルネットワーク技術、特に行列積状態とデジタル量子コンピュータの古典的エミュレータの両方を使用します。単純化されたモデルで実現可能性を示すため、我々の手法はデジタル量子コンピュータを活用した将来の取り組みの舞台となる。

Hadronic vacuum polarization (HVP) and light-by-light scattering (HLBL) are crucial for evaluating the Standard Model predictions concerning the muon's anomalous magnetic moment. However, direct first-principle lattice gauge theory-based calculations of these observables in the timelike region remain challenging. Discrepancies persist between lattice quantum chromodynamics (QCD) calculations in the spacelike region and dispersive approaches relying on experimental data parametrization from the timelike region. Here, we introduce a methodology employing 1+1-dimensional quantum electrodynamics (QED), i.e. the Schwinger Model, to investigate the HVP and HLBL. To that end, we use both tensor network techniques, specifically matrix product states, and classical emulators of digital quantum computers. Demonstrating feasibility in a simplified model, our approach sets the stage for future endeavors leveraging digital quantum computers.

翻訳日:2024-06-07 19:24:39 公開日:2024-06-05

# データ複雑度の幾何学的視点:拡散モデルを用いた効率的な局所固有次元推定

A Geometric View of Data Complexity: Efficient Local Intrinsic Dimension Estimation with Diffusion Models ( http://arxiv.org/abs/2406.03537v1 )

ライセンス: Link先を確認

Hamidreza Kamkari, Brendan Leigh Ross, Rasa Hosseinzadeh, Jesse C. Cresswell, Gabriel Loaiza-Ganem,

High-dimensional data commonly lies on low-dimensional submanifolds, and estimating the local intrinsic dimension (LID) of a datum, i.e. the dimension of the submanifold it belongs to, is a longstanding problem. LID can be understood as the number of local factors of variation: the more factors of variation a datum has, the more complex it tends to be. Estimating this quantity has proven useful in contexts ranging from generalization in neural networks to detection of out-of-distribution data, adversarial examples, and AI-generated text. The recent successes of deep generative models present an opportunity to leverage them for LID estimation, but current methods based on generative models produce inaccurate estimates, require more than a single pre-trained model, are computationally intensive, or do not exploit the best available deep generative models, i.e. diffusion models (DMs). In this work, we show that the Fokker-Planck equation associated with a DM can provide a LID estimator which addresses all the aforementioned deficiencies. Our estimator, called FLIPD, is compatible with all popular DMs and outperforms existing baselines on LID estimation benchmarks. We also apply FLIPD to natural images where the true LID is unknown. Compared to competing estimators, FLIPD exhibits a higher correlation with non-LID measures of complexity, better matches a qualitative assessment of complexity, and is the only estimator to remain tractable with high-resolution images at the scale of Stable Diffusion.

Translated: 2024-06-07 19:24:39; published: 2024-06-05

# Entanglement Asymmetry in non-Abelian Anyonic Systems

http://arxiv.org/abs/2406.03546v1

License: see the linked page

Nicetu Tibau Vidal, Ved Kunte, Lucia Vilchez-Estevez, Mohit Lal Bera, Manabendra Nath Bera

Non-Abelian anyons, a promising platform for fault-tolerant topological quantum computation, adhere to the charge super-selection rule (cSSR), which imposes restrictions on physically allowed states and operations. However, the ramifications of cSSR and fusion rules in anyonic quantum information theory remain largely unexplored. In this study, we show that the information-theoretic characteristics of anyons diverge fundamentally from those of non-anyonic systems such as qudits, bosons, and fermions, and display intricate structures. In bipartite anyonic systems, pure states may have different marginal spectra, and mixed states may contain pure marginal states. More striking is that in a pure entangled state, parties may lack equal access to entanglement. This entanglement asymmetry is manifested in quantum teleportation employing an entangled anyonic state shared between Alice and Bob, where Alice can perfectly teleport unknown quantum information to Bob, but Bob lacks this capability. These traits challenge conventional understanding, necessitating new approaches to characterize quantum information and correlations in anyons. We expect that these distinctive features will also be present in non-Abelian lattice gauge field theories. Our findings significantly advance the understanding of the information-theoretic aspects of anyons and may lead to realizations of quantum communication and cryptographic protocols where one party holds sway over the other.

Translated: 2024-06-07 19:24:39; published: 2024-06-05

# Robust Communication and Computation using Deep Learning via Joint Uncertainty Injection

http://arxiv.org/abs/2406.03548v1

License: see the linked page

Robert-Jeron Reifert, Hayssam Dahrouj, Alaa Alameer Ahmad, Haris Gacanin, Aydin Sezgin

The convergence of communication and computation, along with the integration of machine learning and artificial intelligence, stands as a key empowering pillar for the sixth generation of communication systems (6G). This paper considers a network of one base station serving a number of devices simultaneously using spatial multiplexing. The paper then presents a deep learning-based approach to simultaneously manage the transmit and computing powers, alongside computation allocation, amidst uncertainties in both channel and computing state information. More specifically, the paper aims to propose a robust solution that minimizes the worst-case delay across the served devices subject to computation and power constraints. The paper uses a deep neural network (DNN)-based solution that maps estimated channels and computation requirements to optimized resource allocations. During training, uncertainty samples are injected after the DNN output to jointly account for both communication and computation estimation errors. The DNN is then trained via backpropagation using the robust utility, thus implicitly learning the uncertainty distributions. Our results validate the enhanced robust delay performance of the joint uncertainty injection versus the classical DNN approach, especially in high channel and computational uncertainty regimes.

Translated: 2024-06-07 19:14:47; published: 2024-06-05

# Npix2Cpix: A GAN-based Image-to-Image Translation Network with Retrieval-Classification Integration for Watermark Retrieval from Historical Document Images

http://arxiv.org/abs/2406.03556v1

License: see the linked page

Utsab Saha, Sawradip Saha, Shaikh Anowarul Fattah, Mohammad Saquib

The identification and restoration of ancient watermarks have long been a major topic in codicology and history. Classifying historical documents based on watermarks can be difficult due to the diversity of watermarks, crowded and noisy samples, multiple modes of representation, and minor distinctions between classes and intra-class changes. This paper proposes a U-net-based conditional generative adversarial network (GAN) to translate noisy raw historical watermarked images into clean, handwriting-free images with just watermarks. Considering its ability to perform image translation from degraded (noisy) pixels to clean pixels, the proposed network is termed Npix2Cpix. Instead of directly using degraded watermarked images, the proposed network is the first to use image-to-image translation with adversarial learning to create clutter-free and handwriting-free images for restoring and categorizing the watermarks. In order to learn the mapping from the noisy input image to the clean output image, the generator and discriminator of the proposed U-net-based GAN are trained using two separate loss functions, each of which is based on the distance between images. After using the proposed GAN to pre-process noisy watermarked images, Siamese-based one-shot learning is used to classify watermarks. According to experimental results on a large-scale historical watermark dataset, extracting watermarks from tainted images can result in high one-shot classification accuracy. The qualitative and quantitative evaluation of the retrieved watermarks illustrates the effectiveness of the proposed approach.

Translated: 2024-06-07 19:14:47; published: 2024-06-05

# Stateless and Non-Interactive Order-Preserving Encryption for Outsourced Databases through Subtractive Homomorphism

http://arxiv.org/abs/2406.03559v1

License: see the linked page

Dongfang Zhao

Order-preserving encryption (OPE) has been extensively studied for more than two decades in the context of outsourced databases because OPE is a key enabling technique that allows outsourced database servers to sort encrypted tuples in order to build indexes, complete range queries, and so forth. The state-of-the-art OPE schemes require (i) a stateful client, meaning that the client manages local storage of some mapping between plaintexts and ciphertexts, and/or (ii) interaction between the client and the server during the query. In production systems, however, these assumptions do not always hold (not to mention the performance overhead): in the first case, the storage requirement could exceed the capability of the client; in the second case, the clients may not be accessible when the server executes a query involving sort or comparison. This paper proposes a new OPE scheme that works for stateless clients and requires no client-server interaction during the queries. The key idea of our proposed protocol is to leverage the underlying additive property of a homomorphic encryption scheme such that the sign of the difference between two plaintexts can be revealed by some algebraic operations with an evaluation key. We will demonstrate the correctness and security of the proposed protocol in this short paper; the implementation and experimental results will be presented in an extended report.

Translated: 2024-06-07 19:14:47; published: 2024-06-05
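The idea of revealing only the sign of a plaintext difference from a difference of ciphertexts can be illustrated with a deliberately insecure toy. The sketch below is not the paper's protocol and is not built on a real homomorphic encryption scheme: the encoding `m * scale + noise`, the function names, and the use of `scale` as a stand-in for an evaluation key are all assumptions made purely for illustration.

```python
import secrets


def toy_keygen(bits: int = 64) -> int:
    """Return a secret scale factor (hypothetical stand-in for a key)."""
    # Setting the top bit guarantees the scale is large enough for the noise.
    return secrets.randbits(bits) | (1 << (bits - 1))


def toy_encrypt(m: int, scale: int) -> int:
    """Hide m as m * scale + noise, with 0 <= noise < scale // 2."""
    noise = secrets.randbelow(scale // 2)
    return m * scale + noise


def compare_ciphertexts(c1: int, c2: int, scale: int) -> int:
    """Return -1, 0, or 1: the sign of m1 - m2, computed from c1 - c2 only."""
    d = c1 - c2  # equals (m1 - m2) * scale + (noise1 - noise2)
    if 2 * abs(d) < scale:
        # The difference is pure noise, so the plaintexts are equal.
        return 0
    return 1 if d > 0 else -1


if __name__ == "__main__":
    key = toy_keygen()
    a, b = toy_encrypt(17, key), toy_encrypt(42, key)
    print(compare_ciphertexts(a, b, key))  # sign of 17 - 42
```

Because the subtraction is additive in the plaintexts, the comparison needs neither client-side state nor a round trip to the client. Unlike a real scheme, this toy leaks ordering to anyone who sees two ciphertexts, so it only conveys the arithmetic, not the security argument.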

# Neural empirical interpolation method for nonlinear model reduction

http://arxiv.org/abs/2406.03562v1

License: see the linked page

Max Hirsch, Federico Pichi, Jan S. Hesthaven

In this paper, we introduce the neural empirical interpolation method (NEIM), a neural network-based alternative to the discrete empirical interpolation method for reducing the time complexity of computing the nonlinear term in a reduced order model (ROM) for a parameterized nonlinear partial differential equation. NEIM is a greedy algorithm which accomplishes this reduction by approximating an affine decomposition of the nonlinear term of the ROM, where the vector terms of the expansion are given by neural networks depending on the ROM solution, and the coefficients are given by an interpolation of some "optimal" coefficients. Because NEIM is based on a greedy strategy, we are able to provide a basic error analysis to investigate its performance. NEIM has the advantages of being easy to implement in models with automatic differentiation, of being a nonlinear projection of the ROM nonlinearity, of being efficient for both nonlocal and local nonlinearities, and of relying solely on data and not the explicit form of the ROM nonlinearity. We demonstrate the effectiveness of the methodology on solution-dependent and solution-independent nonlinearities, a nonlinear elliptic problem, and a nonlinear parabolic model of liquid crystals.

Translated: 2024-06-07 19:14:47; published: 2024-06-05

# GFN: A graph feedforward network for resolution-invariant reduced operator learning in multifidelity applications

http://arxiv.org/abs/2406.03569v1

License: see the linked page

Oisín M. Morrison, Federico Pichi, Jan S. Hesthaven

This work presents a novel resolution-invariant model order reduction strategy for multifidelity applications. We base our architecture on a novel neural network layer developed in this work, the graph feedforward network, which extends the concept of feedforward networks to graph-structured data by creating a direct link between the weights of a neural network and the nodes of a mesh, enhancing the interpretability of the network. We exploit the method's capability of training and testing on different mesh sizes in an autoencoder-based reduction strategy for parametrised partial differential equations. We show that this extension comes with provable guarantees on the performance via error bounds. The capabilities of the proposed methodology are tested on three challenging benchmarks, including advection-dominated phenomena and problems with a high-dimensional parameter space. The method results in a more lightweight and highly flexible strategy when compared to state-of-the-art models, while showing excellent generalisation performance in both single-fidelity and multifidelity scenarios.

Translated: 2024-06-07 19:14:47; published: 2024-06-05

# A Simple Learning-Augmented Algorithm for Online Packing with Concave Objectives

http://arxiv.org/abs/2406.03574v1

License: see the linked page

Elena Grigorescu, Young-San Lin, Maoyuan Song

Learning-augmented algorithms have been extensively studied recently in the computer science community, due to the potential of using machine learning predictions to improve the performance of algorithms. Predictions are especially useful for online algorithms making irrevocable decisions without knowledge of the future. Such learning-augmented algorithms aim to overcome the limitations of classical online algorithms when the predictions are accurate, and still perform comparably when the predictions are inaccurate. A common approach is to adapt existing online algorithms to the particular advice notion employed, which often involves understanding previous sophisticated algorithms and their analyses. However, ideally, one would simply use previous online solutions in a black-box fashion, without much loss in the approximation guarantees. Such clean solutions that avoid opening up black boxes are often rare, and may even be missed the first time around. For example, Grigorescu et al. (NeurIPS 22) proposed a learning-augmented algorithm for online covering linear programs, but it later turned out that their results can be subsumed by a natural approach that switches between the advice and an online algorithm given as a black box, as noted in their paper. In this work, we introduce and analyze a simple learning-augmented algorithm for online packing problems with linear constraints and concave objectives. We exhibit several direct applications of our framework including online packing linear programming, knapsack, resource management benefit, throughput maximization, and network utility maximization. We further raise the problem of understanding necessary and sufficient conditions for when such simple black-box solutions may be optimal. We believe this is an important direction of research that would unify many ad-hoc approaches from the literature.

Translated: 2024-06-07 19:14:47; published: 2024-06-05
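One simple way to combine advice with a black-box online solution for a packing problem is a convex blend, sketched below. This is an illustrative assumption and not necessarily the algorithm analyzed in the paper: the confidence weight `lam`, the helper names, the sample objective, and the toy instance are hypothetical, and the advice is assumed to be feasible. The sketch relies on two standard facts: a convex combination of two points satisfying Ax <= b also satisfies it, and a concave objective evaluated at the blend is at least the same blend of the two objective values.

```python
import math


def blend(advice, online, lam):
    """Convex combination lam * advice + (1 - lam) * online, coordinate-wise."""
    return [lam * a + (1.0 - lam) * o for a, o in zip(advice, online)]


def is_feasible(A, b, x, tol=1e-9):
    """Check the packing constraints A x <= b together with x >= 0."""
    rows_ok = all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i + tol
        for row, b_i in zip(A, b)
    )
    return rows_ok and all(x_j >= -tol for x_j in x)


def objective(x):
    """A sample concave, monotone objective: sum of log(1 + x_j)."""
    return sum(math.log1p(x_j) for x_j in x)


if __name__ == "__main__":
    A = [[1.0, 1.0], [2.0, 1.0]]
    b = [4.0, 6.0]
    advice = [2.0, 2.0]  # hypothetical predicted solution (assumed feasible)
    online = [1.0, 1.0]  # hypothetical output of a black-box online algorithm
    lam = 0.75           # confidence placed in the advice
    x = blend(advice, online, lam)
    print(is_feasible(A, b, x), objective(x))
```

A larger `lam` tracks the advice more closely when it is trusted, while the `(1 - lam)` share retains a fraction of whatever guarantee the black-box algorithm provides. In a truly online setting the blend would be applied to each increment as constraints arrive, which this static sketch does not model.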

# Reconciling Heterogeneous Effects in Causal Inference

http://arxiv.org/abs/2406.03575v1

License: see the linked page

Audrey Chang, Emily Diana, Alexander Williams Tolbert

In this position and problem pitch paper, we offer a solution to the reference class problem in causal inference. We apply the Reconcile algorithm for model multiplicity in machine learning to reconcile heterogeneous effects in causal inference. Discrepancy between conditional average treatment effect (CATE) estimators of heterogeneous effects poses the reference class problem, where estimates for individual predictions differ by choice of reference class. By adopting the individual-to-group framework for interpreting probability, we can recognize that the reference class problem, which appears across fields such as philosophy of science and causal inference, is equivalent to the model multiplicity problem in computer science. We then apply the Reconcile algorithm to reconcile differences in estimates of individual probability among CATE estimators. Because the reference class problem manifests in contexts of individual probability prediction using group-based evidence, our results have tangible implications for ensuring fair outcomes in high-stakes domains such as healthcare, insurance, and housing, especially for marginalized communities. By highlighting the importance of mitigating disparities in predictive modeling, our work invites further exploration into interdisciplinary strategies that combine technical rigor with a keen awareness of social implications. Ultimately, our findings advocate for a holistic approach to algorithmic fairness, underscoring the critical role of thoughtful, well-rounded solutions in achieving the broader goals of equity and access.

Translated: 2024-06-07 19:14:47; published: 2024-06-05

# Enhancing Traffic Sign Recognition with Tailored Data Augmentation: Addressing Class Imbalance and Instance Scarcity

http://arxiv.org/abs/2406.03576v1

License: see the linked page

Ulan Alsiyeu, Zhasdauren Duisebekov

This paper tackles critical challenges in traffic sign recognition (TSR), which is essential for road safety, specifically class imbalance and instance scarcity in datasets. We introduce tailored data augmentation techniques, including synthetic image generation, geometric transformations, and a novel obstacle-based augmentation method, to enhance dataset quality for improved model robustness and accuracy. Our methodology incorporates diverse augmentation processes to accurately simulate real-world conditions, thereby expanding the variety and representativeness of the training data. Our findings demonstrate substantial improvements in TSR model performance, offering significant implications for traffic sign recognition systems. This research not only addresses dataset limitations in TSR but also proposes a model for similar challenges across different regions and applications, marking a step forward in the field of computer vision and traffic sign recognition systems.

Translated: 2024-06-07 19:14:47; published: 2024-06-05

# Explaining the Contributing Factors for Vulnerability Detection in Machine Learning

http://arxiv.org/abs/2406.03577v1

License: see the linked page

Esma Mouine, Yan Liu, Lu Xiao, Rick Kazman, Xiao Wang

There is an increasing trend to mine vulnerabilities from software repositories and use machine learning techniques to automatically detect software vulnerabilities. A fundamental but unresolved research question is: how do different factors in the mining and learning process impact the accuracy of identifying vulnerabilities in software projects of varying characteristics? Substantial research has been dedicated to this area, including source code static analysis, software repository mining, and NLP-based machine learning. However, practitioners lack experience regarding the key factors for building a state-of-the-art baseline model. In addition, there is a lack of experience regarding the transferability of vulnerability signatures from project to project. This study investigates how the combination of different vulnerability features and three representative machine learning models impacts the accuracy of vulnerability detection in 17 real-world projects. We examine two types of vulnerability representations: 1) code features extracted through NLP with varying tokenization strategies and three different embedding techniques (bag-of-words, word2vec, and fastText) and 2) a set of eight architectural metrics that capture the abstract design of the software systems. The three machine learning algorithms include a random forest model, a support vector machine model, and a residual neural network model. The analysis shows that a recommended baseline model, with signatures extracted through bag-of-words embedding and combined with a random forest, consistently increases the detection accuracy by about 4% compared to other combinations in all 17 projects. Furthermore, we observe the limitation of transferring vulnerability signatures across domains based on our experiments.

Translated: 2024-06-07 19:14:47; published: 2024-06-05
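The bag-of-words representation of source code mentioned above can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: the tokenizer regex, the function names, and the sample C snippets are assumptions, and the paper's actual tokenization strategies are not specified here.

```python
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, integer literals, or single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_0-9]")


def tokenize(code):
    """Split a code snippet into a flat list of tokens."""
    return TOKEN_RE.findall(code)


def build_vocabulary(snippets):
    """Map every token seen in the training snippets to a column index."""
    tokens = sorted({tok for s in snippets for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(code, vocab):
    """Count vector for one snippet; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(code))
    vec = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec


if __name__ == "__main__":
    train = ["strcpy(buf, src);", "strncpy(buf, src, n);"]
    vocab = build_vocabulary(train)
    print(bag_of_words("strcpy(buf, buf);", vocab))
```

Each snippet becomes a fixed-length count vector, which is the kind of input a classifier such as a random forest can consume. The classifier itself is left out so that the sketch has no third-party dependencies.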

# Understanding the Limitations of Diffusion Concept Algebra Through Food

http://arxiv.org/abs/2406.03582v1

License: see the linked page

E. Zhixuan Zeng, Yuhao Chen, Alexander Wong

Image generation techniques, particularly latent diffusion models, have exploded in popularity in recent years. Many techniques have been developed to manipulate and clarify the semantic concepts these large-scale models learn, offering crucial insights into biases and concept relationships. However, these techniques are often only validated in the conventional realms of human or animal faces and artistic style transitions. The food domain offers unique challenges through complex compositions and regional biases, which can shed light on the limitations and opportunities within existing methods. Through the lens of food imagery, we analyze both qualitative and quantitative patterns within a concept traversal technique. We reveal measurable insights into the model's ability to capture and represent the nuances of culinary diversity, while also identifying areas where the model's biases and limitations emerge.

Translated: 2024-06-07 19:14:47; published: 2024-06-05

# A Comparison of Recent Algorithms for Symbolic Regression to Genetic Programming

http://arxiv.org/abs/2406.03585v1

License: see the linked page

Yousef A. Radwan, Gabriel Kronberger, Stephan Winkler

Symbolic regression is a machine learning method whose goal is to produce interpretable results. Unlike other machine learning methods such as random forests or neural networks, which are opaque, symbolic regression aims to model and map data in a way that can be understood by scientists. Recent advancements have attempted to bridge the gap between these two fields; new methodologies attempt to fuse the mapping power of neural networks and deep learning techniques with the explanatory power of symbolic regression. In this paper, we examine these newly emerging systems and test the performance of an end-to-end transformer model for symbolic regression against the reigning traditional methods based on genetic programming (GP), which have spearheaded symbolic regression throughout the years. We compare these systems on novel datasets to avoid bias toward older methods that were improved on well-known benchmark datasets. Our results show that traditional GP methods, as implemented, e.g., by Operon, still remain superior to two recently published symbolic regression methods.

Translated: 2024-06-07 19:14:47; published: 2024-06-05

# CountCLIP -- [Re] Teaching CLIP to Count to Ten

http://arxiv.org/abs/2406.03586v1

License: see the linked page

Harshvardhan Mestha, Tejas Agarwal, Karan Bania, Shreyas V, Yash Bhisikar

Large vision-language models (VLMs) have been shown to learn rich joint image-text representations, enabling high performance in relevant downstream tasks. However, they fail to demonstrate a quantitative understanding of objects, and they lack good counting-aware representations. This paper conducts a reproducibility study of 'Teaching CLIP to Count to Ten' (Paiss et al., 2023), which presents a method to finetune a CLIP model (Radford et al., 2021) to improve zero-shot counting accuracy in an image while maintaining the performance for zero-shot classification by introducing a counting-contrastive loss term. We improve the model's performance on a smaller subset of their training data with lower computational resources. We verify these claims by reproducing their study with our own code. The implementation can be found at https://github.com/SforAiDl/CountCLIP.

Translated: 2024-06-07 19:14:47; published: 2024-06-05

# Ranking Manipulation for Conversational Search Engines

http://arxiv.org/abs/2406.03589v1

License: see the linked page

Samuel Pfrommer, Yatong Bai, Tanmay Gautam, Somayeh Sojoudi

Major search engine providers are rapidly incorporating Large Language Model (LLM)-generated content in response to user queries. These conversational search engines operate by loading retrieved website text into the LLM context for summarization and interpretation. Recent research demonstrates that LLMs are highly vulnerable to jailbreaking and prompt injection attacks, which disrupt the safety and quality goals of LLMs using adversarial strings. This work investigates the impact of prompt injections on the ranking order of sources referenced by conversational search engines. To this end, we introduce a focused dataset of real-world consumer product websites and formalize conversational search ranking as an adversarial problem. Experimentally, we analyze conversational search rankings in the absence of adversarial injections and show that different LLMs vary significantly in prioritizing product name, document content, and context position. We then present a tree-of-attacks-based jailbreaking technique which reliably promotes low-ranked products. Importantly, these attacks transfer effectively to state-of-the-art conversational search engines such as perplexity.ai. Given the strong financial incentive for website owners to boost their search ranking, we argue that our problem formulation is of critical importance for future robustness work.

Translated: 2024-06-07 19:14:47; published: 2024-06-05

# BVE + EKF: A viewpoint estimator for the estimation of the object's position in the 3D task space using Extended Kalman Filters

http://arxiv.org/abs/2406.03591v1

License: see the linked page

Sandro Costa Magalhães, António Paulo Moreira, Filipe Neves dos Santos, Jorge Dias

(参考訳) RGB-Dセンサーは、放射線や雨などの外部の摂動に敏感であるため、オープンフィールド環境で動作している複数の課題に直面している。複数の作品がモノクロカメラを用いて物体の3D位置を認識するという課題に近づいている。しかし、これらの研究の大部分は、複雑なデータ駆動型で予測が難しいディープラーニングベースのソリューションに重点を置いている。そこで本稿では,拡張カルマンフィルタ (EKF) を用いたガウス視点推定器 (BVE) を用いて3次元物体の位置を予測する問題にアプローチする。このアルゴリズムはタスクの効率を証明し、最大平均ユークリッド誤差は約32mmに達した。実験は人工ガウス雑音を用いてMATLABに展開・評価された。今後の研究は、ロボットシステムにシステムを実装することを目指している。

RGB-D sensors face multiple challenges operating under open-field environments because of their sensitivity to external perturbations such as radiation or rain. Multiple works are approaching the challenge of perceiving the 3D position of objects using monocular cameras. However, most of these works focus mainly on deep learning-based solutions, which are complex, data-driven, and difficult to predict. So, we aim to approach the problem of predicting the 3D objects' position using a Gaussian viewpoint estimator named best viewpoint estimator (BVE) powered by an extended Kalman filter (EKF). The algorithm proved efficient on the tasks and reached a maximum average Euclidean error of about 32 mm. The experiments were deployed and evaluated in MATLAB using artificial Gaussian noise. Future work aims to implement the system in a robotic system.

Translated: 2024-06-07 19:14:47 Published: 2024-06-05

# Measuring retrieval complexity in question answering systems

Measuring Retrieval Complexity in Question Answering Systems ( http://arxiv.org/abs/2406.03592v1 )

License: see the linked page

Matteo Gabburo, Nicolaas Paul Jedema, Siddhant Garg, Leonardo F. R. Ribeiro, Alessandro Moschitti,

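(Reference translation) This paper investigates which questions are challenging for retrieval-based question answering (QA). We (i) propose retrieval complexity (RC), a novel metric that measures the difficulty of answering a question conditioned on the completeness of the retrieved documents, and (ii) propose an unsupervised pipeline for measuring RC given an arbitrary retrieval system. The proposed pipeline measures RC more accurately than alternative estimators, including LLMs, on six QA benchmarks. Furthermore, RC scores correlate strongly with both QA performance and expert judgment on five of the six benchmarks, indicating that RC effectively measures question difficulty. A subsequent categorization of high-RC questions shows that they span a broad range of question shapes, including multi-hop, compositional, and temporal QA, indicating that RC scores can categorize a new subset of complex questions. The system can also have a major impact on retrieval-based systems by helping to identify more challenging questions in existing datasets.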

In this paper, we investigate which questions are challenging for retrieval-based Question Answering (QA). We (i) propose retrieval complexity (RC), a novel metric conditioned on the completeness of retrieved documents, which measures the difficulty of answering questions, and (ii) propose an unsupervised pipeline to measure RC given an arbitrary retrieval system. Our proposed pipeline measures RC more accurately than alternative estimators, including LLMs, on six challenging QA benchmarks. Further investigation reveals that RC scores strongly correlate with both QA performance and expert judgment across five of the six studied benchmarks, indicating that RC is an effective measure of question difficulty. Subsequent categorization of high-RC questions shows that they span a broad set of question shapes, including multi-hop, compositional, and temporal QA, indicating that RC scores can categorize a new subset of complex questions. Our system can also have a major impact on retrieval-based systems by helping to identify more challenging questions on existing datasets.

Translated: 2024-06-07 19:14:47 Published: 2024-06-05

# Why does "problems" predict positive sentiment? A case study of explaining unintuitive features in sentiment classification

Why is "Problems" Predictive of Positive Sentiment? A Case Study of Explaining Unintuitive Features in Sentiment Classification ( http://arxiv.org/abs/2406.03594v1 )

License: see the linked page

Jiaming Qu, Jaime Arguello, Yue Wang,

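(Reference translation) Explainable AI (XAI) algorithms aim to help users understand how a machine learning model makes predictions. To this end, many approaches explain which input features are most predictive of a target label. However, such explanations can still puzzle users (for example, in product reviews the word "problems" is predictive of positive sentiment). If left unexplained, puzzling explanations can have negative impacts. Explaining unintuitive associations between an input feature and a target label is an underexplored area of XAI research. This work makes an initial effort in this direction, using unintuitive associations learned by sentiment classifiers as a case study. It proposes approaches for (1) automatically detecting associations that can appear unintuitive to users and (2) generating explanations that help users understand why an unintuitive feature is predictive. Results from a crowdsourced study (N=300) show that the proposed approaches can effectively detect and explain predictive but unintuitive features in sentiment classification.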

Explainable AI (XAI) algorithms aim to help users understand how a machine learning model makes predictions. To this end, many approaches explain which input features are most predictive of a target label. However, such explanations can still be puzzling to users (e.g., in product reviews, the word "problems" is predictive of positive sentiment). If left unexplained, puzzling explanations can have negative impacts. Explaining unintuitive associations between an input feature and a target label is an underexplored area in XAI research. We take an initial effort in this direction using unintuitive associations learned by sentiment classifiers as a case study. We propose approaches for (1) automatically detecting associations that can appear unintuitive to users and (2) generating explanations to help users understand why an unintuitive feature is predictive. Results from a crowdsourced study (N=300) found that our proposed approaches can effectively detect and explain predictive but unintuitive features in sentiment classification.
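The puzzle described in this abstract can be illustrated with a small sketch. It is not the authors' method: the toy reviews, the smoothed log-ratio score, and the function names `doc_freq`, `log_ratio`, and `left_neighbors` are all assumptions made for illustration. It shows how a word such as "problems" can score as predictive of positive sentiment, and how looking at the word's immediate context (here, "no problems") may explain the association.

```python
import math
from collections import Counter

# Hypothetical toy reviews, invented for illustration (not data from the paper).
POSITIVE = [
    "no problems at all works great",
    "zero problems so far",
    "had no problems with setup",
    "great product love it",
]
NEGATIVE = [
    "terrible quality broke fast",
    "stopped working after a week",
    "bad battery and many problems",
    "waste of money",
]


def doc_freq(reviews):
    """Count how many reviews contain each word at least once."""
    counts = Counter()
    for review in reviews:
        counts.update(set(review.split()))
    return counts


def log_ratio(word, pos=POSITIVE, neg=NEGATIVE):
    """Smoothed log of P(word | positive) / P(word | negative).

    A value above zero means the word appears relatively more often
    in positive reviews than in negative ones.
    """
    p_pos = (doc_freq(pos)[word] + 1) / (len(pos) + 2)
    p_neg = (doc_freq(neg)[word] + 1) / (len(neg) + 2)
    return math.log(p_pos / p_neg)


def left_neighbors(word, reviews):
    """Count the words that immediately precede `word` in the reviews."""
    counts = Counter()
    for review in reviews:
        tokens = review.split()
        for prev, cur in zip(tokens, tokens[1:]):
            if cur == word:
                counts[prev] += 1
    return counts


# "problems" leans positive in this toy corpus ...
problems_score = log_ratio("problems")
# ... and its positive-review context suggests why (negation before the word).
context = left_neighbors("problems", POSITIVE)
```

In this toy corpus `problems_score` is greater than zero, and the most frequent word preceding "problems" in the positive reviews is "no", which is the kind of contextual cue an explanation could surface to the user.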