Fugu-MT: Translation of arXiv papers

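This site translates into Japanese the arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not under a CC license, or that are too long, only the metadata is translated (arXiv metadata is CC 0). The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.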

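The operator of this site makes no guarantee as to the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from its use (including all information and translations).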

Papers published on 2024-05-28.

Title	Authors	Abstract	Publication date / translation date
# Decoding moral judgement from text: a pilot study ( http://arxiv.org/abs/2407.00039v1 ) License: see link	Diana E. Gherman, Thorsten O. Zander	Moral judgement is a complex human reaction that engages cognitive and emotional dimensions. While some of the neural correlates of morality are known, it is currently unclear whether moral violation can be detected at the single-trial level. In this pilot study, we explore the feasibility of decoding moral judgement from text stimuli with passive brain-computer interfaces. To elicit moral judgement effectively, we use video-audio affective priming before presenting the text stimuli and attribute the text to moral agents. Our results show that further efforts are necessary to achieve reliable classification between moral congruency and incongruency states. We obtain good accuracy for neutral vs. morally charged trials. With this research, we aim to pave the way towards neuroadaptive human-computer interaction and more human-compatible large language models (LLMs).	Translated: 2024-07-22 22:38:24 Published: 2024-05-28
# Interpretable DRL-based Maneuver Decision of UCAV Dogfight ( http://arxiv.org/abs/2407.01571v1 ) License: see link	Haoran Han, Jian Cheng, Maolong Lv	This paper proposes a three-layer unmanned combat aerial vehicle (UCAV) dogfight framework in which deep reinforcement learning (DRL) is responsible for high-level maneuver decisions. A four-channel low-level control law is constructed first, followed by a library containing eight basic flight maneuvers (BFMs). A double deep Q-network (DDQN) is applied for BFM selection in the UCAV dogfight, and the opponent strategy during training is constructed with DT. Our simulation results show that the agent achieves a win rate of 85.75% against the DT strategy and obtains positive results against various unseen opponents. Based on the proposed framework, the interpretability of DRL-based dogfighting is significantly improved: the agent performs the yo-yo maneuver to adjust its turn rate and gain higher maneuverability. The emergence of "Dive and Chase" behavior also indicates that the agent can generate a novel tactic that exploits a weakness of its opponent.	Translated: 2024-07-22 22:18:55 Published: 2024-05-28
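The abstract applies a double deep Q-network to choose among the eight BFMs. The sketch below shows the standard Double DQN bootstrap target in NumPy, as a generic illustration rather than the authors' code. The online network selects the next action and the target network evaluates it. Treating the eight BFMs as a discrete action set, and the array shapes used here, are assumptions.

```python
import numpy as np


def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN targets: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    rewards: shape (B,); next_q_online, next_q_target: shape (B, A),
    where A would be the number of maneuvers (e.g. 8 BFMs, an assumption);
    dones: shape (B,) booleans marking terminal transitions.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    # The online network selects the greedy next action...
    best_actions = np.argmax(np.asarray(next_q_online), axis=1)
    # ...and the target network evaluates it, which curbs overestimation bias.
    evaluated = np.asarray(next_q_target, dtype=float)[np.arange(len(rewards)), best_actions]
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * evaluated * (~dones)
```

Decoupling action selection from action evaluation is the only difference from the vanilla DQN target, which would take the max of `next_q_target` directly.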
# Exploring Sectoral Profitability in the Indian Stock Market Using Deep Learning ( http://arxiv.org/abs/2407.01572v1 ) License: see link	Jaydip Sen, Hetvi Waghela, Sneha Rakshit	This paper explores the use of a deep learning Long Short-Term Memory (LSTM) model for accurate stock price prediction and its implications for portfolio design. Although the efficient market hypothesis suggests that predicting stock prices is impossible, recent research has shown the potential of advanced algorithms and predictive models. The study builds on existing literature on stock price prediction, emphasizing the shift toward machine learning and deep learning approaches. Using historical prices of 180 stocks across 18 sectors listed on the NSE, India, the LSTM model predicts future prices. These predictions guide buy/sell decisions for each stock and are used to analyze sector profitability. The study's main contributions are threefold: an optimized LSTM model for robust portfolio design, the use of LSTM predictions for buy/sell transactions, and insights into sector profitability and volatility. Results demonstrate the efficacy of the LSTM model in accurately predicting stock prices and informing investment decisions. By comparing sector profitability and prediction accuracy, the work provides valuable insights into the dynamics of the current financial markets in India.	Translated: 2024-07-22 22:18:55 Published: 2024-05-28
# Model-Based Diffusion for Trajectory Optimization ( http://arxiv.org/abs/2407.01573v1 ) License: see link	Chaoyi Pan, Zeji Yi, Guanya Shi, Guannan Qu	Recent advances in diffusion models have demonstrated their strong capability to generate high-fidelity samples from complex distributions through an iterative refinement process. Despite the empirical success of diffusion models in motion planning and control, these methods are model-free, so they do not leverage readily available model information, which limits their generalization to new scenarios beyond the training data (e.g., new robots with different dynamics). In this work, we introduce Model-Based Diffusion (MBD), an optimization approach that uses the diffusion process to solve trajectory optimization (TO) problems without data. The key idea is to compute the score function explicitly by leveraging the model information in TO problems, which is why we call our approach model-based diffusion. Moreover, although MBD does not require external data, it can be naturally integrated with data of diverse quality to steer the diffusion process. We also reveal that MBD has interesting connections to sampling-based optimization. Empirical evaluations show that MBD outperforms state-of-the-art reinforcement learning and sampling-based TO methods on challenging contact-rich tasks. Additionally, MBD's ability to integrate with data enhances its versatility and practical applicability, even with imperfect and infeasible data (e.g., partial-state demonstrations for high-dimensional humanoids), beyond the scope of standard diffusion models.	Translated: 2024-07-22 22:18:55 Published: 2024-05-28
# Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion ( http://arxiv.org/abs/2407.09494v1 ) License: see link	Mohamed Fawzi Abdelshafie Abuhussein, Ashraf Darwish, Aboul Ella Hassanien	This dataset includes 6823 thermal images captured with a UNI-T UTi165A camera for face detection, recognition, and emotion analysis. It consists of 2485 images depicting emotions (happy, sad, angry, natural, surprised), 2054 images for face recognition, and 2284 images for face detection. The dataset covers various conditions, color palettes, shooting angles, and zoom levels, with a temperature range of -10 °C to 400 °C and a resolution of 19,200 pixels. It serves as a valuable resource for advancing thermal imaging technology, aiding algorithm development, and benchmarking facial recognition across different palettes. Additionally, it contributes to facial motion recognition, fostering interdisciplinary collaboration in computer vision, psychology, and neuroscience. The dataset promotes transparency in thermal face detection and recognition research, with applications in security, healthcare, and human-computer interaction.	Translated: 2024-07-22 13:38:25 Published: 2024-05-28
# Interpret3C: Interpretable Student Clustering Through Individualized Feature Selection ( http://arxiv.org/abs/2407.11979v1 ) License: see link	Isadora Salles, Paola Mejia-Domenzain, Vinitra Swamy, Julian Blackwell, Tanja Käser	Clustering in education, particularly in large-scale online environments like MOOCs, is essential for understanding and adapting to diverse student needs. However, the effectiveness of clustering depends on its interpretability, which becomes challenging with high-dimensional data. Existing clustering approaches often neglect individual differences in feature importance and rely on a homogenized feature set. To address this gap, we introduce Interpret3C (Interpretable Conditional Computation Clustering), a novel clustering pipeline that incorporates interpretable neural networks (NNs) in an unsupervised learning context. This method leverages adaptive gating in NNs to select features for each student. Clustering is then performed using the most relevant features per student, enhancing the clusters' relevance and interpretability. We use Interpret3C to analyze behavioral clusters, taking individual feature importance into account, in a MOOC with over 5,000 students. This research contributes to the field by offering a scalable, robust clustering methodology and an educational case study that respects individual student differences and improves interpretability for high-dimensional data.	Translated: 2024-07-22 11:50:18 Published: 2024-05-28
# Use of Boosting Algorithms in Household-Level Poverty Measurement: A Machine Learning Approach to Predict and Classify Household Wealth Quintiles in the Philippines ( http://arxiv.org/abs/2407.13061v1 ) License: see link	Erika Lynet Salvador	This study assessed the effectiveness of machine learning models in predicting poverty levels in the Philippines using five boosting algorithms: Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), Gradient Boosting Machine (GBM), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost). CatBoost emerged as the superior model, achieving the highest scores in accuracy, precision, recall, and F1-score at 91 percent, with XGBoost and GBM following closely at 89 percent and 88 percent, respectively. Additionally, the research examined the computational efficiency of these models to analyze the balance between training time, testing speed, and model size, factors crucial for real-world applications. Despite its longer training duration, CatBoost demonstrated high testing efficiency. These results indicate that machine learning can aid in poverty prediction and in the development of targeted policy interventions. Future studies should incorporate a wider variety of data to enhance the predictive accuracy and policy utility of these models.	Translated: 2024-07-22 08:18:00 Published: 2024-05-28
# The Cost of Arbitrariness for Individuals: Examining the Legal and Technical Challenges of Model Multiplicity ( http://arxiv.org/abs/2407.13070v1 ) License: see link	Prakhar Ganesh, Ihsan Ibrahim Daldaban, Ignacio Cofone, Golnoosh Farnadi	Model multiplicity is the phenomenon where multiple models achieve similar performance despite having different underlying learned functions, and it introduces arbitrariness into model selection. While this arbitrariness may seem inconsequential in expectation, its impact on individuals can be severe. This paper explores various individual concerns stemming from multiplicity: the effects of arbitrariness beyond final predictions, disparate arbitrariness for individuals belonging to protected groups, and the challenges that arise when the arbitrariness of a single algorithmic system creates a monopoly across various contexts. It provides both an empirical examination of these concerns and a comprehensive analysis from a legal standpoint, addressing how these issues are perceived under Canadian anti-discrimination law. We conclude by discussing the technical challenges that the current landscape of model multiplicity faces in meeting legal requirements, and the legal gap between current law and the implications of arbitrariness in model selection, highlighting relevant future research directions for both disciplines.	Translated: 2024-07-22 08:18:00 Published: 2024-05-28
# Media Insights Engine for Advanced Media Analysis: A Case Study of a Computer Vision Innovation for Pet Health Diagnosis ( http://arxiv.org/abs/2407.13679v1 ) License: see link	Anjanava Biswas	This paper presents a case study of how Petco, a leading pet retailer, innovated its pet health analysis processes using the Media Insights Engine to reduce the time to first diagnosis. The company leveraged this framework to build custom applications for advanced computer vision tasks, such as identifying potential health issues in pet videos and images and validating AI outcomes against pre-built veterinary diagnoses. The Media Insights Engine provides a modular and extensible solution that enabled Petco to quickly build machine learning applications for media workloads. By using this framework, Petco was able to accelerate its project development, improve the efficiency of its pet health analysis, and ultimately reduce the time to first diagnosis for pet health issues. This paper discusses the challenges of pet health analysis using media, the benefits of the Media Insights Engine, and the architecture of Petco's custom applications built on this framework.	Translated: 2024-07-22 08:07:30 Published: 2024-05-28
# Spanish and LLM Benchmarks: is MMLU Lost in Translation? ( http://arxiv.org/abs/2406.17789v1 ) License: see link	Irene Plaza, Nina Melero, Cristina del Pozo, Javier Conde, Pedro Reviriego, Marina Mayor-Rocher, María Grandury	The evaluation of Large Language Models (LLMs) is a key element in their continuous improvement, and many benchmarks have been developed to assess LLM performance across tasks and topics. As LLMs are adopted worldwide, evaluating them in languages other than English is increasingly important. However, most LLM benchmarks are simply translated with an automated tool and then run in the target language. This means that the results depend not only on the LLM's performance in that language but also on the quality of the translation. In this paper, we consider the case of the well-known Massive Multitask Language Understanding (MMLU) benchmark. Selected categories of the benchmark are translated into Spanish using Azure Translator and ChatGPT4 and run on ChatGPT4. Next, the results are processed to identify the test items that produce different answers in Spanish and English. Those items are then analyzed manually to determine whether the automatic translation caused the change. The results show that a significant fraction of the failing items can be attributed to mistakes in the translation of the benchmark. These results make a strong case for improving benchmarks in languages other than English by, at a minimum, revising the translations of the items and, preferably, having experts adapt the tests to the target language.	Translated: 2024-07-01 06:21:45 Published: 2024-05-28
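The item-level comparison in the MMLU study finds the test items whose answers differ between the Spanish and English runs, which then go to manual review. That step reduces to a simple set operation. This is a minimal sketch that assumes a hypothetical format in which each run is a dict mapping an item ID to the chosen option letter.

```python
def divergent_items(answers_en: dict[str, str], answers_es: dict[str, str]) -> list[str]:
    """Return the sorted IDs of items answered differently in the English and Spanish runs."""
    # Only items present in both runs can be compared.
    shared = answers_en.keys() & answers_es.keys()
    return sorted(item for item in shared if answers_en[item] != answers_es[item])


def divergence_rate(answers_en: dict[str, str], answers_es: dict[str, str]) -> float:
    """Fraction of shared items whose answers differ (0.0 when nothing is shared)."""
    shared = answers_en.keys() & answers_es.keys()
    if not shared:
        return 0.0
    return len(divergent_items(answers_en, answers_es)) / len(shared)
```

The returned list would be the candidate set for checking, by hand, whether the automatic translation caused the different answer.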
# Mashee at SemEval-2024 Task 8: The Impact of Samples Quality on the Performance of In-Context Learning for Machine Text Classification ( http://arxiv.org/abs/2406.17790v1 ) License: see link	Areeg Fahad Rasheed, M. Zarkoosh	Within few-shot learning, in-context learning (ICL) has become a promising way to leverage contextual information to improve model performance on small amounts of data, or in resource-constrained environments where training models on large datasets is prohibitive. However, the quality of the samples selected as few-shot examples severely limits the usefulness of ICL. The primary goal of this paper is to improve in-context learning performance, as measured by the evaluation metrics, by selecting high-quality samples in few-shot learning scenarios. We employ the chi-square test to identify high-quality samples and compare the results with those obtained using low-quality samples. Our findings demonstrate that using high-quality samples improves performance on all evaluated metrics.	Translated: 2024-07-01 06:21:45 Published: 2024-05-28
# SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching ( http://arxiv.org/abs/2406.18561v1 ) License: see link	Yongmin Lee, Hye Won Chung	Dataset distillation aims to synthesize a small number of images per class (IPC) from a large dataset to approximate full-dataset training with minimal performance loss. While effective in very small IPC ranges, many distillation methods become less effective as IPC increases, even underperforming random sample selection. Our examination of state-of-the-art trajectory-matching-based distillation methods across various IPC scales reveals that these methods struggle to incorporate the complex, rare features of harder samples into the synthetic dataset even with increased IPC, resulting in a persistent coverage gap between easy and hard test samples. Motivated by these observations, we introduce SelMatch, a novel distillation method that scales effectively with IPC. SelMatch uses selection-based initialization and partial updates through trajectory matching to tailor the synthetic dataset's difficulty level to the IPC scale. When tested on CIFAR-10/100 and TinyImageNet, SelMatch consistently outperforms leading selection-only and distillation-only methods across subset ratios from 5% to 30%.	Translated: 2024-07-01 06:00:20 Published: 2024-05-28
# Views Can Be Deceiving: Improved SSL Through Feature Space Augmentation ( http://arxiv.org/abs/2406.18562v1 ) License: see link	Kimia Hamidieh, Haoran Zhang, Swami Sankaranarayanan, Marzyeh Ghassemi	Supervised learning methods have been found to exhibit inductive biases favoring simpler features. When such features are spuriously correlated with the label, this can result in suboptimal performance on minority subgroups. Despite the growing popularity of methods that learn from unlabeled data, it is unclear to what extent these representations rely on spurious features for prediction. In this work, we explore the impact of spurious features on Self-Supervised Learning (SSL) for visual representation learning. We first show empirically that augmentations commonly used in SSL can cause undesired invariances in the image space, and we illustrate this with a simple example. We further show that classical approaches to combating spurious correlations, such as dataset re-sampling during SSL, do not consistently lead to invariant representations. Motivated by these findings, we propose LateTVG, which removes spurious information from these representations during pre-training by regularizing later layers of the encoder via pruning. Our method produces representations that outperform the baselines on several benchmarks, without requiring group or label information during SSL.	Translated: 2024-07-01 06:00:20 Published: 2024-05-28
# Feasible generation of gravity-induced entanglement by using optomechanical systems ( http://arxiv.org/abs/2406.04361v1 ) License: see link	Daisuke Miki, Akira Matsumura, Kazuhiro Yamamoto	We report the feasibility of detecting gravity-induced entanglement (GIE) with optomechanical systems. This is the first investigation to clarify feasible experimental parameters for achieving a signal-to-noise ratio of S/N = 1. Our proposal focuses on GIE generation between optomechanical mirrors coupled via gravitational interaction, under continuous measurement, feedback control, and Kalman filtering, techniques that matured in connection with gravitational-wave observation. We solved the Riccati equation to evaluate the time evolution of the conditional covariance matrix of the optomechanical mirrors, which gives the minimum-variance estimate of their motion. The results demonstrate that GIE is generated faster than the well-known time scale without optomechanical coupling. This fast generation of entanglement is associated with quantum-state squeezing by the Kalman filtering process, which is an advantage of using optomechanical systems to detect GIE experimentally.	Translated: 2024-06-23 14:05:12 Published: 2024-05-28
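In a continuously monitored, Kalman-filtered system, the conditional covariance evolves according to a Riccati equation. The paper solves the full matrix equation for the coupled mirrors. Purely as a toy illustration, the sketch below takes the scalar form dV/dt = 2aV + q - kV². Here a is a drift rate, q a noise input and k a measurement strength, all hypothetical values. The sketch integrates the equation with forward Euler and compares the result with the analytic steady state V* = (a + sqrt(a² + kq))/k.

```python
import math


def riccati_steady_state(a: float, q: float, k: float) -> float:
    """Positive root of 2*a*V + q - k*V**2 = 0 (requires k > 0)."""
    return (a + math.sqrt(a * a + k * q)) / k


def integrate_riccati(v0: float, a: float, q: float, k: float,
                      dt: float = 1e-3, t_end: float = 20.0) -> float:
    """Forward-Euler integration of the scalar Riccati equation dV/dt = 2aV + q - kV^2."""
    v = v0
    for _ in range(int(round(t_end / dt))):
        # Drift and noise add variance; the -k*V^2 measurement term removes it.
        v += dt * (2 * a * v + q - k * v * v)
    return v
```

For a = -1, q = 1, k = 1 the variance settles at sqrt(2) - 1 ≈ 0.414. Without measurement (k = 0) the steady state would be q/(-2a) = 0.5, so in this example the measurement term lowers the conditional variance.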
# A study of the spin 1 Unruh-De Witt detectors ( http://arxiv.org/abs/2406.04362v1 ) License: see link	F. M. Guedes, M. S. Guimaraes, I. Roditi, S. P. Sorella	A study of spin-1 Unruh-De Witt detectors interacting with a relativistic scalar quantum field is presented. After tracing out the field modes, the resulting density matrix of a bipartite qutrit system is used to investigate the violation of the Bell-CHSH inequality. In the spin-$1/2$ case, the quantum field decreases the size of the violation. In the spin-$1$ case, by contrast, the violation may either decrease or increase. This effect is ascribed to the fact that Tsirelson's bound is not saturated in the case of qutrits.	Translated: 2024-06-23 14:05:12 Published: 2024-05-28
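For reference, the spin-1/2 (qubit) case saturates Tsirelson's bound. With the singlet state and suitably chosen measurement directions, the CHSH combination reaches 2√2 in magnitude, whereas local hidden-variable models are bounded by 2. The textbook calculation below checks this numerically. It is not the paper's Unruh-De Witt model, which adds the field interaction and uses qutrits.

```python
import numpy as np

# Pauli matrices used as +/-1-valued spin observables.
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

# Singlet state (|01> - |10>)/sqrt(2) in the computational basis.
PSI = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)


def correlation(a_obs: np.ndarray, b_obs: np.ndarray) -> float:
    """Expectation value <psi| A (x) B |psi> for the singlet."""
    return float(PSI @ np.kron(a_obs, b_obs) @ PSI)


def chsh_value() -> float:
    """CHSH combination E(A0,B0) + E(A0,B1) + E(A1,B0) - E(A1,B1)."""
    a0, a1 = Z, X
    b0 = (Z + X) / np.sqrt(2)  # measurement direction halfway between z and x
    b1 = (Z - X) / np.sqrt(2)  # measurement direction halfway between z and -x
    return (correlation(a0, b0) + correlation(a0, b1)
            + correlation(a1, b0) - correlation(a1, b1))
```

The result is -2√2, whose magnitude equals Tsirelson's bound, so in the qubit case there is no headroom for the violation to grow.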
# Machine Learning-Driven Optimization of TPMS Architected Materials Using Simulated Annealing ( http://arxiv.org/abs/2406.05142v1 ) License: see link	Akshansh Mishra	This paper presents a novel approach to optimizing the tensile stress of Triply Periodic Minimal Surface (TPMS) structures through machine learning and Simulated Annealing (SA). The study evaluates the performance of Random Forest, Decision Tree, and XGBoost models in predicting tensile stress, using a dataset generated from finite element analysis of TPMS models. The objective function minimized the negative R-squared value on the validation set to improve model accuracy. The SA-XGBoost model outperformed the others, achieving an R-squared value of 0.96, whereas the SA-Random Forest model achieved 0.89 and the SA-Decision Tree model exhibited greater fluctuations in validation scores. This demonstrates that the SA-XGBoost model is the most effective at capturing the complex relationships in the data. Integrating SA helps optimize the hyperparameters of these machine learning models, thereby enhancing their predictive capabilities.	Translated: 2024-06-23 13:55:28 Published: 2024-05-28
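The abstract describes simulated annealing over model hyperparameters, with the negative validation R² as the objective. Below is a minimal pure-Python sketch of that loop, not the author's implementation. It assumes a discrete search space, geometric cooling and neighbour moves that resample one hyperparameter. In practice `objective` would fit an XGBoost, Random Forest or Decision Tree model and score it on the validation set; here a toy linear model stands in.

```python
import math
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def simulated_annealing(objective, space, n_iter=200, t0=1.0, cooling=0.97, seed=0):
    """Minimise objective(params) over a discrete space {name: [candidate values]}."""
    rng = random.Random(seed)
    current = {k: rng.choice(v) for k, v in space.items()}
    current_cost = objective(current)
    best, best_cost = dict(current), current_cost
    temp = t0
    for _ in range(n_iter):
        # Neighbour move: resample a single hyperparameter.
        candidate = dict(current)
        key = rng.choice(list(space))
        candidate[key] = rng.choice(space[key])
        cost = objective(candidate)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if cost < current_cost or rng.random() < math.exp((current_cost - cost) / temp):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = dict(candidate), cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost
```

Accepting occasional worse moves early on lets the search escape poor regions. As the temperature falls, the search behaves more and more like greedy local search.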
# Determining Domain of Machine Learning Models using Kernel Density Estimates: Applications in Materials Property Prediction ( http://arxiv.org/abs/2406.05143v1 ) License: see link	Lane E. Schultz, Yiqi Wang, Ryan Jacobs, Dane Morgan	Knowledge of the domain of applicability of a machine learning model is essential to ensuring accurate and reliable model predictions. In this work, we develop a new approach for assessing model domain and demonstrate that it provides an accurate and meaningful in-domain versus out-of-domain designation when applied across multiple model types and material property data sets. Our approach assesses the distance between test and training data points in feature space using kernel density estimation, and we show that this distance is an effective tool for domain determination. We show that chemical groups considered unrelated based on established chemical knowledge exhibit significant dissimilarities by our measure. We also show that high dissimilarity is associated with poor model performance (i.e., large residuals) and poor estimates of model uncertainty (i.e., unreliable uncertainty estimates). Automated tools are provided to enable researchers to establish acceptable dissimilarity thresholds and identify whether new predictions of their own machine learning models are in-domain or out-of-domain.	Translated: 2024-06-23 13:55:28 Published: 2024-05-28
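The core idea is to score a test point by its kernel density under the training features and to flag low-density points as out-of-domain. The NumPy sketch below illustrates it. The isotropic Gaussian kernel, the fixed bandwidth and the quantile-based cutoff are assumptions made for illustration. The paper's own dissimilarity measure and its automated threshold tools may differ.

```python
import numpy as np


def gaussian_kde_logdensity(train, query, bandwidth):
    """Log-density of each query row under an isotropic Gaussian KDE fitted on `train`."""
    train = np.atleast_2d(np.asarray(train, dtype=float))
    query = np.atleast_2d(np.asarray(query, dtype=float))
    d = train.shape[1]
    # Pairwise squared distances, shape (n_query, n_train).
    sq = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    log_kernel = -0.5 * sq / bandwidth**2 - d * np.log(bandwidth * np.sqrt(2 * np.pi))
    # Numerically stable log-mean-exp over the training points.
    m = log_kernel.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_kernel - m).mean(axis=1, keepdims=True))).ravel()


def in_domain(train, query, bandwidth=1.0, quantile=0.05):
    """True where a query's density is at least the `quantile` of the training densities."""
    threshold = np.quantile(gaussian_kde_logdensity(train, train, bandwidth), quantile)
    return gaussian_kde_logdensity(train, query, bandwidth) >= threshold
```

Raising `quantile` makes the in-domain region stricter. Choosing that cutoff is the step that the paper's tools aim to automate.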
# シャープ比の最適化:マルチアーマッドバンドにおけるリスク調整型意思決定 Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits ( http://arxiv.org/abs/2406.06552v1 ) ライセンス: Link先を確認	Sabrina Khurshid, Mohammed Shahid Abdulla, Gourab Ghatak,	(参考訳) シャープ比率(SR)は金融時系列の特徴付けにおいて重要なパラメータであり、変動を通じて株/ポートフォリオの報酬とボラティリティを共同で検討している。最高の専門家であるEven-Dar et al (2006)に対して、オフラインポリシーでさえ常に後悔を経験しているため、SRを最適化するためのオンラインアルゴリズムの導出は特に困難である。したがって、通常の SR の定義を最適化する代わりに、正規化された正方形 SR (RSSR) を最適化する。 RSSRの2つの設定、Regret Minimization(RM)とBest Arm Identification(BAI)について検討する。そこで本研究では,UCB-RSSR と呼ばれる RM の RSSR 最大化のための新しいマルチアーム・バンディット (MAB) アルゴリズムを提案する。 RSSRの推定値に対して経路依存濃度を導出する。このことから, UCB-RSSR の反証を導出し, 水平 n で演奏される二本腕のバンディットケースの O(log n) として進化することを示す。また、よく知られたBAIアルゴリズム、すなわちシーケンシャル半減と逐次リジェクションの固定予算設定も検討し、SHVV、SHSR、SuRSRアルゴリズムを提案する。提案した全てのBAIアルゴリズムの誤差確率の上限を導出する。 UCB-RSSRは、他のSR最適化バンディットアルゴリズムであるU-UCB Cassel et al(2023)よりも優れていることを示す。また, GRA-UCB および MVTS アルゴリズムから得られた他のベンチマークに対して有効性を確立する。さらに、複数の異なる設定に対して提案したBAIアルゴリズムの性能を実証する。我々の研究は、提案アルゴリズムがリスク対応ポートフォリオ管理問題に広範な応用を見出すことを強調している。その結果,提案アルゴリズムはリスク対応ポートフォリオ管理問題に広範な応用が期待できることがわかった。 Sharpe Ratio (SR) is a critical parameter in characterizing financial time series as it jointly considers the reward and the volatility of any stock/portfolio through its variance. Deriving online algorithms for optimizing the SR is particularly challenging since even offline policies experience constant regret with respect to the best expert Even-Dar et al (2006). Thus, instead of optimizing the usual definition of SR, we optimize regularized square SR (RSSR). We consider two settings for the RSSR, Regret Minimization (RM) and Best Arm Identification (BAI). In this regard, we propose a novel multi-armed bandit (MAB) algorithm for RM called UCB-RSSR for RSSR maximization. We derive a path-dependent concentration bound for the estimate of the RSSR. Based on that, we derive the regret guarantees of UCB-RSSR and show that it evolves as O(log n) for the two-armed bandit case played for a horizon n. 
We also consider a fixed budget setting for well-known BAI algorithms, i.e., sequential halving and successive rejects, and propose SHVV, SHSR, and SuRSR algorithms. We derive the upper bound for the error probability of all proposed BAI algorithms. We demonstrate that UCB-RSSR outperforms the only other known SR optimizing bandit algorithm, U-UCB Cassel et al (2023). We also establish its efficacy with respect to other benchmarks derived from the GRA-UCB and MVTS algorithms. We further demonstrate the performance of proposed BAI algorithms for multiple different setups. Our research highlights that our proposed algorithms will find extensive applications in risk-aware portfolio management problems.	翻訳日:2024-06-23 13:55:28 公開日:2024-05-28
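The fixed-budget BAI setting can be illustrated with ordinary sequential halving whose ranking statistic is an empirical Sharpe ratio. This is only a sketch. The abstract does not give the exact RSSR formula used by SHSR, so plain mean / standard deviation is assumed here. This variant also keeps samples from earlier rounds. `pull`, `empirical_sharpe` and `sequential_halving_sharpe` are hypothetical names.

```python
import math
import statistics

def empirical_sharpe(samples, eps=1e-12):
    """Empirical Sharpe ratio: sample mean divided by (population) sample standard deviation."""
    return statistics.fmean(samples) / (statistics.pstdev(samples) + eps)

def sequential_halving_sharpe(pull, n_arms, budget):
    """Fixed-budget sequential halving that keeps the better half of arms by empirical Sharpe ratio.

    `pull(arm)` returns one reward sample; samples from earlier rounds are reused.
    """
    survivors = list(range(n_arms))
    rounds = max(1, math.ceil(math.log2(n_arms)))
    history = {a: [] for a in survivors}
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        per_arm = max(2, budget // (len(survivors) * rounds))  # at least 2 so a std is defined
        for a in survivors:
            history[a].extend(pull(a) for _ in range(per_arm))
        survivors.sort(key=lambda a: empirical_sharpe(history[a]), reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]
```

Because the statistic divides the mean by the spread, an arm with a slightly lower mean but much lower volatility can beat an arm with a higher raw mean. That is the risk-adjusted behaviour the abstract is after.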
# マルチドメインテキスト分類のための確率的敵対的ネットワーク Stochastic Adversarial Networks for Multi-Domain Text Classification ( http://arxiv.org/abs/2406.00044v1 ) ライセンス: Link先を確認	Xu Wang, Yuan Wu,	(参考訳) 敵対的訓練は多領域テキスト分類(MDTC)の進展に役立っている。MDTC法は伝統的に、ドメイン不変知識のための共有特徴抽出器と、ドメイン固有知識のための個別特徴抽出器を備えた共有プライベートパラダイムを用いている。最先端の結果を得たにもかかわらず、これらの手法は、新しいドメインが継続的に追加されるにつれてモデルパラメータが増大するという問題を抱えている。この課題に対処するために、従来の重みベクトルとは対照的に、ドメイン固有の特徴抽出器のパラメータを多変量ガウス分布として革新的にモデル化するSAN(Stochastic Adversarial Network)を導入する。この設計により、モデルパラメータを大幅に増加させることなく、多数のドメイン固有の特徴抽出器を生成でき、モデルのサイズは単一のドメイン固有の抽出器と同等に維持できる。さらに, ドメインラベル平滑化とロバストな擬似ラベル正則化を組み込み, それぞれ敵対的訓練の安定性と特徴の識別性を向上させる。2つの主要なMDTCベンチマークで評価したSANの性能は、現在の最先端手法に対する競争優位性を示している。コードは https://github.com/wangxu0820/SAN で公開されている。 Adversarial training has been instrumental in advancing multi-domain text classification (MDTC). Traditionally, MDTC methods employ a shared-private paradigm, with a shared feature extractor for domain-invariant knowledge and individual private feature extractors for domain-specific knowledge. Despite achieving state-of-the-art results, these methods grapple with the escalating model parameters due to the continuous addition of new domains. To address this challenge, we introduce the Stochastic Adversarial Network (SAN), which innovatively models the parameters of the domain-specific feature extractor as a multivariate Gaussian distribution, as opposed to a traditional weight vector. This design allows for the generation of numerous domain-specific feature extractors without a substantial increase in model parameters, maintaining the model's size on par with that of a single domain-specific extractor. Furthermore, our approach integrates domain label smoothing and robust pseudo-label regularization to fortify the stability of adversarial training and to refine feature discriminability, respectively. The performance of our SAN, evaluated on two leading MDTC benchmarks, demonstrates its competitive edge against the current state-of-the-art methodologies. 
The code is available at https://github.com/wangxu0820/SAN.	翻訳日:2024-06-09 15:59:42 公開日:2024-05-28
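The central SAN idea is that a domain-specific extractor's weights are drawn from a Gaussian rather than stored one copy per domain. It can be sketched as a linear layer that samples its weights with the reparameterization trick. The sketch assumes a diagonal Gaussian (the abstract only says "multivariate Gaussian") and a single linear layer. It is not the code at the linked repository, and `GaussianLinear` is a hypothetical name.

```python
import math
import random

class GaussianLinear:
    """Linear layer whose weights are drawn from a diagonal Gaussian N(mu, sigma^2) on each call.

    Only mu and log_sigma are stored, so the parameter count stays fixed no matter
    how many domain-specific extractors are sampled from it.
    """

    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        self.mu = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.log_sigma = [[-3.0] * n_in for _ in range(n_out)]

    def sample_weights(self, rng):
        # Reparameterization trick: W = mu + sigma * eps with eps ~ N(0, 1).
        return [
            [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu_row, ls_row)]
            for mu_row, ls_row in zip(self.mu, self.log_sigma)
        ]

    def forward(self, x, rng):
        """Apply one freshly sampled extractor to the input vector x."""
        w = self.sample_weights(rng)
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
```

Each call to `sample_weights` yields a different extractor, while the stored state is always twice the size of one deterministic layer (`mu` plus `log_sigma`).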
# 大規模言語モデルのパーソナライズされたステアリング:双方向選好最適化によるヴァーサタイルステアリングベクトル Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization ( http://arxiv.org/abs/2406.00045v1 ) ライセンス: Link先を確認	Yuanpu Cao, Tianrong Zhang, Bochuan Cao, Ziyi Yin, Lu Lin, Fenglong Ma, Jinghui Chen,	(参考訳) 研究者は、Large Language Models(LLM)の振る舞いを制御し、様々なアプリケーションに適したパーソナライズされたLLMを構築するためのアプローチを研究してきた。微調整は直接的な解決策であるように見えるが、かなりの計算資源が必要であり、元のLLMの実用性に大きな影響を及ぼす可能性がある。最近の取り組みはより軽量な戦略を導入し、LLMのトランスフォーマーアーキテクチャの特定の層内でのアクティベーションを調整することで、モデル出力を望ましい振る舞いに導く「ステアリングベクトル」の抽出に重点を置いている。しかし、そのようなステアリングベクトルは人間の嗜好データのアクティベートから直接抽出され、特にアライメントに関連するシナリオにおいて、しばしば最適以下の結果と時折失敗につながる。この研究は、双方向の選好最適化によってより効果的なステアリングベクトルを生み出すことができる革新的なアプローチを提案する。提案手法は, ステアリングベクトルが人間の嗜好データペアの生成確率に直接影響し, 対象行動のより正確に表現できるように設計されている。ステアリングベクトルの方向と大きさを慎重に調整することにより、所望の動作を様々な強度でパーソナライズした制御を可能にした。様々なオープンエンド世代タスク、特にAIペルソナのステアリングに焦点を当てた大規模な実験が、我々のアプローチの有効性を検証した。さらに、真理性の管理、幻覚の緩和、脱獄攻撃への対処など、重要なアライメントのシナリオを包括的に調査する。興味深いことに,本手法はこれらのシナリオにおいて優れたステアリング効果を示すことができる。さらに、異なるモデル/LoRA間のステアリングベクトルの転送可能性を示し、同時に複数のベクトルを適用することの相乗効果を強調した。 Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct solution, it requires substantial computational resources and may significantly affect the utility of the original LLM. Recent endeavors have introduced more lightweight strategies, focusing on extracting "steering vectors" to guide the model's output toward desired behaviors by adjusting activations within specific layers of the LLM's transformer architecture. However, such steering vectors are directly extracted from the activations of human preference data and thus often lead to suboptimal results and occasional failures, especially in alignment-related scenarios. 
This work proposes an innovative approach that could produce more effective steering vectors through bi-directional preference optimization. Our method is designed to allow steering vectors to directly influence the generation probability of contrastive human preference data pairs, thereby offering a more precise representation of the target behavior. By carefully adjusting the direction and magnitude of the steering vector, we enabled personalized control over the desired behavior across a spectrum of intensities. Extensive experimentation across various open-ended generation tasks, particularly focusing on steering AI personas, has validated the efficacy of our approach. Moreover, we comprehensively investigate critical alignment-concerning scenarios, such as managing truthfulness, mitigating hallucination, and addressing jailbreaking attacks. Remarkably, our method can still demonstrate outstanding steering effectiveness across these scenarios. Furthermore, we showcase the transferability of our steering vectors across different models/LoRAs and highlight the synergistic benefits of applying multiple vectors simultaneously.	翻訳日:2024-06-09 15:59:42 公開日:2024-05-28
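Once a steering vector has been learned, it is applied by adding it, scaled, to one layer's activations at inference time. The sign of the multiplier selects the direction and its magnitude sets the intensity. This minimal sketch covers only that application step. It assumes plain Python lists for one activation vector and does not show how BiPO optimizes the vector or where the hook sits in a real transformer. `apply_steering` is a hypothetical name.

```python
def apply_steering(hidden, vectors, multipliers):
    """Add a weighted sum of steering vectors to one layer's activation vector.

    A positive multiplier pushes toward the target behavior, a negative one away from it,
    and its magnitude controls the intensity. Several vectors can be combined in one call.
    """
    if len(vectors) != len(multipliers):
        raise ValueError("need one multiplier per steering vector")
    steered = list(hidden)
    for v, m in zip(vectors, multipliers):
        if len(v) != len(hidden):
            raise ValueError("steering vector must match the hidden size")
        steered = [h + m * vi for h, vi in zip(steered, v)]
    return steered
```

Combining vectors is just summation here. Any synergy between vectors reported in the abstract would come from how the vectors are learned, not from this arithmetic.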
# 一般化可能な目標認識フェアネスによるヘイトスピーチ検出 Hate Speech Detection with Generalizable Target-aware Fairness ( http://arxiv.org/abs/2406.00046v1 ) ライセンス: Link先を確認	Tong Chen, Danny Wang, Xurong Liang, Marten Risius, Gianluca Demartini, Hongzhi Yin,	(参考訳) ソーシャルメディアプラットフォームの普及による副作用に対抗するため、ヘイトスピーチ検出(HSD)は、早期に有害なオンライン投稿の拡散を阻止する重要な役割を担っている。しかし、ソーシャルメディア上で広く普及している話題コミュニティを考えると、訓練されたHSD分類器は特定の対象グループ(例えば、女性や黒人)に偏りやすくなり、偽陽性/陰性の結果が、コンテンツモデレーション機構の公正性に対する公衆の信頼を著しく損なうことになり、最終的にはオンライン社会の多様性を損なうことになる。既存のフェアネスを意識したHSD法は、対象とするグループ間でのいくつかの相違を緩和することができるが、それらは主に、既知の、固定されたと思われるターゲットの狭い選択に特化している。これにより、新たなターゲットグループが常に時間とともに出現する現実世界のユースケースへの一般化が必然的に防止される。この欠陥に対処するために、我々は、推論中に多様で見えざるターゲットを含む各ポストを適切に分類する新しい方法であるGeneralizable target-aware Fairness (GetFair)を提案する。ターゲット関連の機能に対するHSD分類器の急激な依存を取り除くため、GetFairは、フィルタされたポスト埋め込みからターゲットグループを回復する識別器を欺くために、対向パイプラインで一連のフィルタ関数を訓練する。拡張性と一般化性を維持するため、ターゲット間のセマンティック親和性によって正規化されるハイパーネットワークを用いて、全てのフィルタ関数を革新的にパラメータ化する。ターゲットの事前訓練された単語を入力として埋め込み、ハイパーネットワークは専用のフィルタパラメータを格納することなく、各ターゲット固有のフィルタがオンザフライで使用する重みを生成する。最後に、2つのHSDデータセットの比較実験では、サンプル外のターゲットでGetFairのパフォーマンスが有利であることが示されている。 To counter the side effect brought by the proliferation of social media platforms, hate speech detection (HSD) plays a vital role in halting the dissemination of toxic online posts at an early stage. However, given the ubiquitous topical communities on social media, a trained HSD classifier easily becomes biased towards specific targeted groups (e.g., female and black people), where a high rate of false positive/negative results can significantly impair public trust in the fairness of content moderation mechanisms, and eventually harm the diversity of online society. Although existing fairness-aware HSD methods can smooth out some discrepancies across targeted groups, they are mostly specific to a narrow selection of targets that are assumed to be known and fixed. This inevitably prevents those methods from generalizing to real-world use cases where new targeted groups constantly emerge over time. 
To tackle this defect, we propose Generalizable target-aware Fairness (GetFair), a new method for fairly classifying each post that contains diverse and even unseen targets during inference. To remove the HSD classifier's spurious dependence on target-related features, GetFair trains a series of filter functions in an adversarial pipeline, so as to deceive the discriminator that recovers the targeted group from filtered post embeddings. To maintain scalability and generalizability, we innovatively parameterize all filter functions via a hypernetwork that is regularized by the semantic affinity among targets. Taking a target's pretrained word embedding as input, the hypernetwork generates the weights used by each target-specific filter on-the-fly without storing dedicated filter parameters. Finally, comparative experiments on two HSD datasets have shown advantageous performance of GetFair on out-of-sample targets.	翻訳日:2024-06-09 15:59:42 公開日:2024-05-28
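The hypernetwork step can be pictured as a map from a target's word embedding to the weights of a target-specific filter. Filters for unseen targets are then generated on the fly instead of stored. This is a hedged sketch that assumes a single linear hypernetwork producing a square linear filter. The adversarial discriminator and the semantic-affinity regularizer are omitted, and `FilterHypernetwork` is a hypothetical name.

```python
import random

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class FilterHypernetwork:
    """Maps a target's word embedding to the weights of a (post_dim x post_dim) linear filter.

    Only the hypernetwork's own parameters are stored; no per-target filter is kept.
    """

    def __init__(self, emb_dim, post_dim, seed=0):
        rng = random.Random(seed)
        self.post_dim = post_dim
        self.h = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)] for _ in range(post_dim * post_dim)]

    def filter_weights(self, target_emb):
        flat = matvec(self.h, target_emb)
        d = self.post_dim
        return [flat[i * d:(i + 1) * d] for i in range(d)]

    def filter_post(self, post_emb, target_emb):
        """Filter a post embedding with the filter generated for this target."""
        return matvec(self.filter_weights(target_emb), post_emb)
```

Because the filter is a function of the target embedding, semantically close targets receive similar filters. The affinity regularizer described in the abstract is meant to encourage exactly that.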
# シュロディンガー方程式に対する正規化フローに基づく効率的な解法のための理論的枠組み A Theoretical Framework for an Efficient Normalizing Flow-Based Solution to the Schrodinger Equation ( http://arxiv.org/abs/2406.00047v1 ) ライセンス: Link先を確認	Daniel Freedman, Eyal Rozenberg, Alex Bronstein,	(参考訳) 量子力学における中心的な問題は、分子や物質に対する電子シュロディンガー方程式を解くことである。この問題に対する変分モンテカルロのアプローチは、サンプリングによって特定の変分目的関数を近似し、次にアンザッツとして知られるパラメータ化された波動関数の族の上でこの近似目的関数を最適化する。近年、ニューラルネットワークがアンザッツとして使われ、成功している。しかし、そのような波動関数からのサンプリングにはマルコフ連鎖モンテカルロのアプローチが必要であり、これは本質的に非効率である。そこで本研究では,サンプリングが安価でありながら必要な量子力学的性質を満たすアンザッツによる解法を提案する。以下の2つの必須要素を用いた正規化フローが我々の要件を満たすことを証明する: (a) 行列式点過程(Determinantal Point Processes)から構築された基底分布; (b) 置換群の特定の部分群に対して同変なフロー層。次に、必要な同変性を満たす連続正規化フローと離散正規化フローの両方を構築する方法を示す。さらに、波動関数の非滑らかな性質(カスプ)を捉える方法や、複数の分子にわたる帰納的な汎化を可能にするよう枠組みを一般化する方法を示す。結果として得られる理論的枠組みは、電子シュロディンガー方程式を解くための効率的なアプローチをもたらす。 A central problem in quantum mechanics involves solving the Electronic Schrodinger Equation for a molecule or material. The Variational Monte Carlo approach to this problem approximates a particular variational objective via sampling, and then optimizes this approximated objective over a chosen parameterized family of wavefunctions, known as the ansatz. Recently neural networks have been used as the ansatz, with accompanying success. However, sampling from such wavefunctions has required the use of a Markov Chain Monte Carlo approach, which is inherently inefficient. In this work, we propose a solution to this problem via an ansatz which is cheap to sample from, yet satisfies the requisite quantum mechanical properties. We prove that a normalizing flow using the following two essential ingredients satisfies our requirements: (a) a base distribution which is constructed from Determinantal Point Processes; (b) flow layers which are equivariant to a particular subgroup of the permutation group. We then show how to construct both continuous and discrete normalizing flows which satisfy the requisite equivariance. 
We further demonstrate the manner in which the non-smooth nature ("cusps") of the wavefunction may be captured, and how the framework may be generalized to provide induction across multiple molecules. The resulting theoretical framework entails an efficient approach to solving the Electronic Schrodinger Equation.	翻訳日:2024-06-09 15:59:42 公開日:2024-05-28
# 深層ニューラルネットワークによる言語構造獲得の理論 Towards a theory of how the structure of language is acquired by deep neural networks ( http://arxiv.org/abs/2406.00048v1 ) ライセンス: Link先を確認	Francesco Cagnetta, Matthieu Wyart,	(参考訳) 言語の構造を学ぶのにどのくらいのデータが必要か? 本研究では,確率論的文脈自由文法(PCFG)を用いて生成した合成データセットについて検討する。モデルを用いてトークンとトークンの相関関係を解析的に決定し,文法の隠れ変数を表現できることを示す。さらに、有限トレーニングセットは、相関の分解を、トレーニングセットのサイズが大きくなる有効範囲に制限する。結果として、多くの例で訓練された言語モデルは、文法の構造をより深く表現することができるため、問題の高次元性にもかかわらず、優れた性能を達成することができる。トレーニングセットのサイズと効果的な相関範囲の関係は、我々の合成データセットを超えていると推測する。特に,本予想では,学習セットサイズによるテスト損失行動のスケーリング法則がコンテキストウィンドウの長さに依存するのかを予測し,シェイクスピアの戯曲からの行の収集を実証的に確認する。 How much data is required to learn the structure of a language via next-token prediction? We study this question for synthetic datasets generated via a Probabilistic Context-Free Grammar (PCFG) -- a hierarchical generative model that captures the tree-like structure of natural languages. We determine token-token correlations analytically in our model and show that they can be used to build a representation of the grammar's hidden variables, the longer the range the deeper the variable. In addition, a finite training set limits the resolution of correlations to an effective range, whose size grows with that of the training set. As a result, a Language Model trained with increasingly many examples can build a deeper representation of the grammar's structure, thus reaching good performance despite the high dimensionality of the problem. We conjecture that the relationship between training set size and effective range of correlations holds beyond our synthetic datasets. In particular, our conjecture predicts how the scaling law for the test loss behaviour with training set size depends on the length of the context window, which we confirm empirically for a collection of lines from Shakespeare's plays.	翻訳日:2024-06-09 15:59:42 公開日:2024-05-28
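A PCFG generates text top-down: each nonterminal is rewritten by one of its rules, chosen with that rule's probability, until only terminals remain. The toy grammar below is invented purely for illustration. It is not the hierarchical grammar studied in the paper, and `TOY_PCFG` and `sample` are hypothetical names.

```python
import random

# Toy PCFG: each nonterminal maps to a list of (probability, right-hand side) rules.
TOY_PCFG = {
    "S": [(1.0, ["NP", "VP"])],
    "NP": [(0.7, ["Det", "N"]), (0.3, ["N"])],
    "VP": [(0.6, ["V", "NP"]), (0.4, ["V"])],
    "Det": [(0.5, ["the"]), (0.5, ["a"])],
    "N": [(0.5, ["cat"]), (0.5, ["dog"])],
    "V": [(0.5, ["sees"]), (0.5, ["chases"])],
}

def sample(symbol, grammar, rng):
    """Expand `symbol` top-down, picking each rule with its probability; return terminal tokens."""
    if symbol not in grammar:
        return [symbol]  # terminal symbol
    probs = [p for p, _ in grammar[symbol]]
    rhs = rng.choices([r for _, r in grammar[symbol]], weights=probs, k=1)[0]
    tokens = []
    for child in rhs:
        tokens.extend(sample(child, grammar, rng))
    return tokens
```

Tokens generated under the same nonterminal (for example `Det` and `N` inside one `NP`) are correlated through the shared hidden variable. Correlations of this kind are what the abstract says a model can exploit to reconstruct the grammar.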
# QUEST: 機械翻訳のための品質を考慮したメトロポリス・ヘイスティングス・サンプリング QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation ( http://arxiv.org/abs/2406.00049v1 ) ライセンス: Link先を確認	Gonçalo R. A. Faria, Sweta Agrawal, António Farinhas, Ricardo Rei, José G. C. de Souza, André F. T. Martins,	(参考訳) 機械翻訳(MT)における重要な課題は、高品質で多様な翻訳を生成することである。先行研究では、MTモデルから推定される尤度は翻訳品質との相関が低いことが示されている。対照的に、品質評価指標(COMETやBLEURTなど)は人間の判断と高い相関を示し、これがリランカー(品質考慮リランキングや最小ベイズリスクデコーディングなど)としての使用を動機付けている。しかし、推定品質の高い単一の翻訳に依存すると、「評価指標をゲーミングする」可能性が高まる。本稿では,高品質で多様な翻訳の集合をサンプリングする問題に対処する。ノイズを含む品質推定値をギブス分布のエネルギー関数として用いることで、それらへの過度の依存を回避する簡便で効果的な方法を提供する。分布のモードを探す代わりに、簡単なマルコフ連鎖モンテカルロ手法であるメトロポリス・ヘイスティングス法を用いて高密度領域から複数のサンプルを生成する。その結果,提案手法は2つの強力なデコーダのみのLLM (Alma-7b, Tower-7b) を用いて,複数の言語対 (English$\leftrightarrow${German, Russian}) において高品質で多様な出力をもたらすことがわかった。 An important challenge in machine translation (MT) is to generate high-quality and diverse translations. Prior work has shown that the estimated likelihood from the MT model correlates poorly with translation quality. In contrast, quality evaluation metrics (such as COMET or BLEURT) exhibit high correlations with human judgments, which has motivated their use as rerankers (such as quality-aware and minimum Bayes risk decoding). However, relying on a single translation with high estimated quality increases the chances of "gaming the metric". In this paper, we address the problem of sampling a set of high-quality and diverse translations. We provide a simple and effective way to avoid over-reliance on noisy quality estimates by using them as the energy function of a Gibbs distribution. Instead of looking for a mode in the distribution, we generate multiple samples from high-density areas through the Metropolis-Hastings algorithm, a simple Markov chain Monte Carlo approach. The results show that our proposed method leads to high-quality and diverse outputs across multiple language pairs (English$\leftrightarrow${German, Russian}) with two strong decoder-only LLMs (Alma-7b, Tower-7b).	
翻訳日:2024-06-09 15:59:42 公開日:2024-05-28
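The Gibbs-distribution idea can be shown with Metropolis-Hastings over a fixed pool of candidate translations, targeting p(y) ∝ exp(quality(y) / beta). This is a simplified sketch. QUEST proposes new candidates with the LLM itself, whereas here the proposal is assumed uniform over a fixed pool. A uniform proposal is symmetric, so the acceptance ratio reduces to the quality difference. `mh_quality_sampling` is a hypothetical name.

```python
import math
import random

def mh_quality_sampling(candidates, quality, beta, n_steps, rng):
    """Metropolis-Hastings over a candidate pool, targeting p(y) proportional to exp(quality(y) / beta).

    Simplification: proposals are drawn uniformly from the pool (a symmetric proposal),
    so the acceptance probability is min(1, exp((q_new - q_old) / beta)).
    """
    current = rng.choice(candidates)
    chain = []
    for _ in range(n_steps):
        proposal = rng.choice(candidates)
        log_alpha = (quality(proposal) - quality(current)) / beta
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            current = proposal
        chain.append(current)
    return chain
```

A smaller `beta` concentrates the chain on the top-scoring candidate. A larger `beta` keeps lower-scoring but still plausible translations in the sample, which is where the diversity comes from.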
# デュアルプロセス学習:重み忘却によるインコンテキスト戦略とインウェイト戦略の利用制御 Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting ( http://arxiv.org/abs/2406.00053v1 ) ライセンス: Link先を確認	Suraj Anand, Michael A. Lepori, Jack Merullo, Ellie Pavlick,	(参考訳) 言語モデルには、コンテキスト内学習(ICL)を実行する能力があり、コンテキストに基づいて振る舞いを柔軟に適応させることができる。これは、データの反復的な観察から情報がモデルパラメータに静的に符号化される重み内学習(in-weights learning)とは対照的である。このようにコンテキスト内で学習できるように見えるにもかかわらず、言語モデルは未見の、あるいはまれにしか現れないトークンに直面したときに苦労することが知られている。そこで我々は$\textbf{structural in-context learning}$(構造的インコンテキスト学習)を研究する。これは任意のトークンに対してインコンテキスト学習を実行するモデルの能力として定義され、モデルがトークン埋め込みに符号化された意味内容ではなく、文構造やタスク構造などに基づいて汎化しなければならないことからこう呼ばれる。理想的なモデルは両方を行える: インウェイト操作を柔軟に用いること(符号化された意味情報を使って曖昧または未知のコンテキストに頑健に対応するため)と、構造的インコンテキスト操作を用いること(新規トークンに対応するため)である。実用的なモデルと玩具モデルの両方を用いて、単純な品詞(part-of-speech)設定における構造的インコンテキストアルゴリズムを調べる。モデルが新しい言語に汎化するのを助けるために最近導入された手法である能動的忘却(active forgetting)は、構造的インコンテキスト学習の解を採用するようモデルに強いることがわかった。最後に、能動的忘却の素直な拡張である$\textbf{temporary forgetting}$(一時的忘却)を導入する。これにより、モデルがインウェイト解とインコンテキスト解にどの程度依存するかを制御できる。重要なことに、一時的忘却によって、インコンテキスト解とインウェイト解が単一モデル内で共存する$\textit{dual process strategy}$を誘導できる。 Language models have the ability to perform in-context learning (ICL), allowing them to flexibly adapt their behavior based on context. This contrasts with in-weights learning, where information is statically encoded in model parameters from iterated observations of the data. Despite this apparent ability to learn in-context, language models are known to struggle when faced with unseen or rarely seen tokens. Hence, we study $\textbf{structural in-context learning}$, which we define as the ability of a model to execute in-context learning on arbitrary tokens -- so called because the model must generalize on the basis of e.g. sentence structure or task structure, rather than semantic content encoded in token embeddings. An ideal model would be able to do both: flexibly deploy in-weights operations (in order to robustly accommodate ambiguous or unknown contexts using encoded semantic information) and structural in-context operations (in order to accommodate novel tokens). 
We study structural in-context algorithms in a simple part-of-speech setting using both practical and toy models. We find that active forgetting, a technique that was recently introduced to help models generalize to new languages, forces models to adopt structural in-context learning solutions. Finally, we introduce $\textbf{temporary forgetting}$, a straightforward extension of active forgetting that enables one to control how much a model relies on in-weights vs. in-context solutions. Importantly, temporary forgetting allows us to induce a $\textit{dual process strategy}$ where in-context and in-weights solutions coexist within a single model.	翻訳日:2024-06-09 15:59:42 公開日:2024-05-28
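Active and temporary forgetting differ only in when the token-embedding matrix is re-initialized. Active forgetting resets it periodically throughout training, while temporary forgetting stops resetting after some step. The schedule is sketched below. The reset period and the exclusive cut-off at `stop_after` are assumptions, and `forgetting_steps` and `reinit_embeddings` are hypothetical names.

```python
import random

def forgetting_steps(total_steps, reset_every, stop_after=None):
    """Training steps (1-based) at which the token-embedding matrix is re-initialized.

    stop_after=None -> active forgetting: reset every `reset_every` steps throughout training.
    stop_after=N    -> temporary forgetting: reset only at steps before N, then train normally.
    """
    last = total_steps + 1 if stop_after is None else min(stop_after, total_steps + 1)
    return list(range(reset_every, last, reset_every))

def reinit_embeddings(vocab, dim, rng):
    """Fresh random token embeddings; the rest of the model (attention, MLPs) is left untouched."""
    return {tok: [rng.gauss(0.0, 0.02) for _ in range(dim)] for tok in vocab}
```

In a training loop, a step in `forgetting_steps(...)` would swap in `reinit_embeddings(...)` after the optimizer update. Moving `stop_after` earlier or later is the knob that trades in-context against in-weights reliance.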
# 文脈的類似性を用いた判決引用検索 Judgement Citation Retrieval using Contextual Similarity ( http://arxiv.org/abs/2406.01609v1 ) ライセンス: Link先を確認	Akshat Mohan Dasula, Hrushitha Tigulla, Preethika Bhukya,	(参考訳) 伝統的に、法律研究の分野では、複雑な事例記述から関連する引用を検索するには、手作業と、法律用語の理解に関する専門知識を要するキーワードベースの検索アプリケーションが必要とされてきた。法的事例記述は法律専門家や研究者にとって重要な情報を保持しており、より効率的で自動化されたアプローチが求められている。本稿では,自然言語処理(NLP)と機械学習技術を組み合わせて,訴訟記述の組織化と活用を促進する手法を提案する。このアプローチは、最先端の埋め込みモデルを用いたテキスト埋め込みの作成を中心に展開される。提案手法は,教師なしクラスタリングと教師あり引用検索という2つの主要な目的を扱い,いずれも引用抽出プロセスの自動化を目的としている。提案手法は任意のデータセットに使用することができるが,米国最高裁判所(SCOTUS)データセットを用い,顕著な結果を得た。我々の手法は90.9%という驚くべき精度を達成した。労働集約的なプロセスを自動化することによって、法律研究をより効率的で時間を節約でき、アクセスしやすいものにする道を開き、法律専門家、学者、研究者に恩恵をもたらす。 Traditionally in the domain of legal research, the retrieval of pertinent citations from intricate case descriptions has demanded manual effort and keyword-based search applications that mandate expertise in understanding legal jargon. Legal case descriptions hold pivotal information for legal professionals and researchers, necessitating more efficient and automated approaches. We propose a methodology that combines natural language processing (NLP) and machine learning techniques to enhance the organization and utilization of legal case descriptions. This approach revolves around the creation of textual embeddings with the help of state-of-the-art embedding models. Our methodology addresses two primary objectives: unsupervised clustering and supervised citation retrieval, both designed to automate the citation extraction process. Although the proposed methodology can be used for any dataset, we employed the Supreme Court of The United States (SCOTUS) dataset, yielding remarkable results. Our methodology achieved an impressive accuracy rate of 90.9%. By automating labor-intensive processes, we pave the way for a more efficient, time-saving, and accessible landscape in legal research, benefiting legal professionals, academics, and researchers.	翻訳日:2024-06-09 15:49:54 公開日:2024-05-28
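Embedding-based citation retrieval reduces to ranking stored case embeddings by similarity to the query case's embedding. The sketch assumes that embeddings have already been computed by some embedding model (not shown) and that cosine similarity is the ranking measure. The abstract names neither. `retrieve_citations` is a hypothetical name.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_citations(query_emb, case_embs, k=3):
    """Return the ids of the k stored cases most similar to the query case description."""
    ranked = sorted(case_embs.items(), key=lambda kv: cosine(query_emb, kv[1]), reverse=True)
    return [case_id for case_id, _ in ranked[:k]]
```

For SCOTUS-scale collections, an approximate nearest-neighbour index would usually replace the full sort. The ranking logic stays the same.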
# FinEmbedDiff:マルチモーダル埋め込みモデルを用いたベクトルサンプリングによる財務文書分類の費用効果 FinEmbedDiff: A Cost-Effective Approach of Classifying Financial Documents with Vector Sampling using Multi-modal Embedding Models ( http://arxiv.org/abs/2406.01618v1 ) ライセンス: Link先を確認	Anjanava Biswas, Wrick Talukdar,	(参考訳) テキスト、表、チャート、画像を含むマルチモーダル財務文書の正確な分類は極めて重要であるが、難しい。従来のテキストベースのアプローチは、これらの文書の複雑なマルチモーダルな性質を捉えるのに失敗することが多い。本研究では,FinEmbedDiffを提案する。FinEmbedDiffは,事前学習したマルチモーダル埋め込みモデルを利用して財務文書を分類する,コスト効率の高いベクトルサンプリング手法である。提案手法は,文書に対するマルチモーダル埋め込みベクトルを生成し,ベクトル類似度を用いた事前計算されたクラス埋め込みと比較する。大規模なデータセットに基づいて評価したFinEmbedDiffは、最先端のベースラインと比較して、競合する分類精度を実現し、計算コストを大幅に削減する。この方法は強力な一般化能力を示し、現実の金融アプリケーションにとって実用的でスケーラブルなソリューションである。 Accurate classification of multi-modal financial documents, containing text, tables, charts, and images, is crucial but challenging. Traditional text-based approaches often fail to capture the complex multi-modal nature of these documents. We propose FinEmbedDiff, a cost-effective vector sampling method that leverages pre-trained multi-modal embedding models to classify financial documents. Our approach generates multi-modal embedding vectors for documents, and compares new documents with pre-computed class embeddings using vector similarity measures. Evaluated on a large dataset, FinEmbedDiff achieves competitive classification accuracy compared to state-of-the-art baselines while significantly reducing computational costs. The method exhibits strong generalization capabilities, making it a practical and scalable solution for real-world financial applications.	翻訳日:2024-06-09 15:49:54 公開日:2024-05-28
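The classification step compares a new document's embedding with pre-computed class embeddings and picks the most similar class. The following minimal sketch assumes each class embedding is the mean of its labelled examples' embeddings and that cosine similarity is the measure. The abstract specifies neither, and the multi-modal embedding model itself is not shown. `build_class_embeddings` and `classify` are hypothetical names.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_class_embeddings(labelled):
    """Pre-compute one embedding per class as the mean of its example document embeddings."""
    return {label: mean_vector(vecs) for label, vecs in labelled.items()}

def classify(doc_emb, class_embs):
    """Assign the class whose pre-computed embedding is most similar to the document."""
    return max(class_embs, key=lambda label: cosine(doc_emb, class_embs[label]))
```

Classifying a new document costs one embedding call plus one similarity per class, with no model fine-tuning. That is the main source of the cost savings the abstract claims.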
# PPOベースの言語モデルはハック可能か? Are PPO-ed Language Models Hackable? ( http://arxiv.org/abs/2406.02577v1 ) ライセンス: Link先を確認	Suraj Anand, David Getzen,	(参考訳) 好ましくない振る舞いを取り除くために言語モデルを$\textit{align}$するための多くのアルゴリズムが提案されている。しかし、非常に大きな状態空間と、適切な報酬関数の作成に関連する課題は、しばしば様々なジェイルブレイクを引き起こす。本稿では,肯定的な感情の言語生成という制御された設定において、報酬のこの効果を検討することを目的とする。人間のフィードバックに基づく報酬モデルのオンライン学習の代わりに、静的に学習された感情分類器を用いる。また、学習後にモデルの重みとアクティベーションがエンドユーザに公開される設定についても検討する。肯定的な感情の応答を促進するために近接方策最適化(PPO)を適用する前後の事前学習済みGPT-2を,機械的解釈可能性の観点から検証する。これらの知見を用いて、(1)PPOを適用したモデルを「ハック」して否定的な感情の応答を生成させること、(2)報酬関数に項を加えて「負の」重みを変えることを試みる。 Numerous algorithms have been proposed to $\textit{align}$ language models to remove undesirable behaviors. However, the challenges associated with a very large state space and creating a proper reward function often result in various jailbreaks. Our paper aims to examine this effect of reward in the controlled setting of positive sentiment language generation. Instead of online training of a reward model based on human feedback, we employ a statically learned sentiment classifier. We also consider a setting where our model's weights and activations are exposed to an end-user after training. We examine a pretrained GPT-2 through the lens of mechanistic interpretability before and after proximal policy optimization (PPO) has been applied to promote positive sentiment responses. Using these insights, we (1) attempt to "hack" the PPO-ed model to generate negative sentiment responses and (2) add a term to the reward function to try and alter "negative" weights.	翻訳日:2024-06-09 15:49:54 公開日:2024-05-28
# フェアLLMの不可能性 The Impossibility of Fair LLMs ( http://arxiv.org/abs/2406.03198v1 ) ライセンス: Link先を確認	Jacy Anthis, Kristian Lum, Michael Ekstrand, Avi Feller, Alexander D'Amour, Chenhao Tan,	(参考訳) 公正なAIの必要性は、ChatGPTやGemini、その他の大規模言語モデル(LLM)といった汎用システムの時代において、ますます明確になっている。しかしながら、人間とAIの相互作用の複雑さの増大とその社会的影響は、どのように公正性標準を適用することができるのかという疑問を提起している。本稿では、機械学習研究者が、グループフェアネスやフェア表現など、フェアネスを評価するのに用いた技術的枠組みを概観し、LLMへの適用には固有の制約があることを見出した。それぞれのフレームワークがLLMに論理的に拡張していないか、あるいはLLMにとって難解な公平性の概念を提示しているかを示す。これらの課題に対処するため、我々は、特にユースケースにおいて公正を達成するためのより現実的な目標、すなわち、コンテキストの臨界性、LLM開発者の責任、そして、設計と評価の反復的なプロセスにおけるステークホルダーの参加の必要性に関するガイドラインを開発する。さらに、最終的には、スケーラブルなAIアシストアライメントの形式として、フェアネスの課題に対処するために、AIシステムの汎用能力を使用する必要さえある。 The need for fair AI is increasingly clear in the era of general-purpose systems such as ChatGPT, Gemini, and other large language models (LLMs). However, the increasing complexity of human-AI interaction and its social impacts have raised questions of how fairness standards could be applied. Here, we review the technical frameworks that machine learning researchers have used to evaluate fairness, such as group fairness and fair representations, and find that their application to LLMs faces inherent limitations. We show that each framework either does not logically extend to LLMs or presents a notion of fairness that is intractable for LLMs, primarily due to the multitudes of populations affected, sensitive attributes, and use cases. To address these challenges, we develop guidelines for the more realistic goal of achieving fairness in particular use cases: the criticality of context, the responsibility of LLM developers, and the need for stakeholder participation in an iterative process of design and evaluation. Moreover, it may eventually be possible and even necessary to use the general-purpose capabilities of AI systems to address fairness challenges as a form of scalable AI-assisted alignment.	翻訳日:2024-06-09 15:49:54 公開日:2024-05-28
# ADR-BC: 敵対的密度重み付き回帰行動クローニング ADR-BC: Adversarial Density Weighted Regression Behavior Cloning ( http://arxiv.org/abs/2405.20351v1 ) ライセンス: Link先を確認	Ziqi Zhang, Zifeng Zhuang, Donglin Wang, Jingzehua Xu, Miao Liu, Shuai Zhang,	(参考訳) 通常、従来の模倣学習(Imitation Learning, IL)手法は、まず報酬またはQ関数を形成し、次にこの形成された関数を強化学習(RL)フレームワーク内で用いて経験的方策を最適化する。しかし、形成された報酬/Q関数が真の報酬/Q関数を適切に表現していない場合、多段階のRLフレームワーク内で方策を更新すると累積バイアスが生じ、さらに方策学習に影響を及ぼす可能性がある。行動クローニング(BC)を利用し、一段階の更新によって少数の実演を直接模倣して方策を学習すれば累積バイアスを避けられるが、BCは実演された行動を貪欲に模倣する傾向があり、未見の状態行動ペアへの汎化能力が制限される。これらの課題に対処するため、拡張された密度ベースの行動サポートによって行動クローニングを強化し、この拡張サポートで方策を最適化することを目指すADR-BCを提案する。具体的には、ADR-BCの目的関数は、専門家分布に一致させつつ準最適分布からは乖離させるという同様の物理的意味を持つ。したがって、ADR-BCはより堅牢な専門家分布マッチングを実現できる。一方、ADR-BCは1段階の行動クローニングフレームワークであるため、多段階のRLフレームワークに伴う累積バイアスを回避する。ADR-BCの性能を検証するため,広範な実験を行った。具体的には、ADR-BCは、Gym-Mujocoドメインのすべてのタスクにおいて、従来の最先端(SOTA)の汎用ILベースラインであるCEILよりも10.5%改善している。さらに、AdroitドメインとKitchenドメインの全タスクにおいて、真の報酬を用いるImplicit Q Learning(IQL)よりも89.5%改善している。加えて,ADR-BCの有効性をさらに示すため,広範なアブレーション実験を行った。 Typically, traditional Imitation Learning (IL) methods first shape a reward or Q function and then use this shaped function within a reinforcement learning (RL) framework to optimize the empirical policy. However, if the shaped reward/Q function does not adequately represent the ground truth reward/Q function, updating the policy within a multi-step RL framework may result in cumulative bias, further impacting policy learning. Although utilizing behavior cloning (BC) to learn a policy by directly mimicking a few demonstrations in a single-step updating manner can avoid cumulative bias, BC tends to greedily imitate demonstrated actions, limiting its capacity to generalize to unseen state action pairs. To address these challenges, we propose ADR-BC, which aims to enhance behavior cloning through augmented density-based action support, optimizing the policy with this augmented support. 
Specifically, the objective of ADR-BC has a similar physical meaning: it matches the expert distribution while diverging from the sub-optimal distribution. Therefore, ADR-BC can achieve more robust expert distribution matching. Meanwhile, as a one-step behavior cloning framework, ADR-BC avoids the cumulative bias associated with multi-step RL frameworks. To validate the performance of ADR-BC, we conduct extensive experiments. In particular, ADR-BC showcases a 10.5% improvement over the previous state-of-the-art (SOTA) generalized IL baseline, CEIL, across all tasks in the Gym-Mujoco domain. Additionally, it achieves an 89.5% improvement over Implicit Q Learning (IQL) using real rewards across all tasks in the Adroit and Kitchen domains. Finally, we conduct extensive ablations to further demonstrate the effectiveness of ADR-BC.	翻訳日:2024-06-03 18:44:15 公開日:2024-05-28
# スペクトル匿名化の漸近的有用性 Asymptotic utility of spectral anonymization ( http://arxiv.org/abs/2405.20779v1 ) ライセンス: Link先を確認	Katariina Perkonoja, Joni Virta,	(参考訳) 現代のデータランドスケープでは、複数ソースのデータ収集とサードパーティの共有が特徴であり、個人のプライバシを確保することが重要な関心事である。様々な匿名化手法が存在するが、それらのユーティリティ保存とプライバシ保証は定量化が難しいままである。本研究では、スペクトル匿名化(SA)アルゴリズムの有用性とプライバシを、特に漸近的なフレームワークで研究することで、このギャップに対処する。元のデータを直接修正する従来の匿名化手法とは異なり、SAはデータをスペクトルベースで摂動させ、その後元のベースに戻す。原版である $\mathcal{P}$-SA とともに、ランダムな置換変換を用いる2つの新しいSA変種: $\mathcal{J}$-spectral anonymization と $\mathcal{O}$-spectral anonymization を導入する。いくつかの現実的な仮定の下では、これらのSAアルゴリズムが元のデータの第一と第二の瞬間をいかに保存するかを示す。特に, 共分散推定における3つのSAアルゴリズムの漸近効率は, 原データと比較して正確に50%であることがわかった。これらの漸近的結果の適用性を評価するために,有限データを用いたシミュレーション研究を行い,距離ベースのレコードリンクを用いて,これらのアルゴリズムが提供するプライバシー保護を評価する。我々の研究は、有限サンプルユーティリティにおいて明確な優位性を示す手法は存在しないが、$\mathcal{O}$-SAは、計算複雑性が増大しているにもかかわらず、同じレコードを生成しないという例外的なプライバシー保護のために、自分自身を区別していることを明らかにしている。逆に$\mathcal{P}$-SA は計算効率の良い代替品として現れ、平均推定における未整合効率を示す。 In the contemporary data landscape characterized by multi-source data collection and third-party sharing, ensuring individual privacy stands as a critical concern. While various anonymization methods exist, their utility preservation and privacy guarantees remain challenging to quantify. In this work, we address this gap by studying the utility and privacy of the spectral anonymization (SA) algorithm, particularly in an asymptotic framework. Unlike conventional anonymization methods that directly modify the original data, SA operates by perturbing the data in a spectral basis and subsequently reverting them to their original basis. Alongside the original version $\mathcal{P}$-SA, employing random permutation transformation, we introduce two novel SA variants: $\mathcal{J}$-spectral anonymization and $\mathcal{O}$-spectral anonymization, which employ sign-change and orthogonal matrix transformations, respectively. We show how well, under some practical assumptions, these SA algorithms preserve the first and second moments of the original data. 
Our results reveal, in particular, that the asymptotic efficiency of all three SA algorithms in covariance estimation is exactly 50% when compared to the original data. To assess the applicability of these asymptotic results in practice, we conduct a simulation study with finite data and also evaluate the privacy protection offered by these algorithms using distance-based record linkage. Our research reveals that while no method exhibits clear superiority in finite-sample utility, $\mathcal{O}$-SA distinguishes itself for its exceptional privacy preservation, never producing identical records, albeit with increased computational complexity. Conversely, $\mathcal{P}$-SA emerges as a computationally efficient alternative, demonstrating unmatched efficiency in mean estimation.	翻訳日:2024-06-03 18:05:14 公開日:2024-05-28
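The original P-SA variant can be sketched in a few steps. Center the data, express each record in the principal-axis (spectral) basis obtained from an SVD, independently permute each spectral coordinate across records, and rotate back. This is a minimal NumPy sketch of that description, not the authors' code. The J-SA and O-SA variants would replace the permutation with sign changes or an orthogonal transform. `p_spectral_anonymize` is a hypothetical name.

```python
import numpy as np

def p_spectral_anonymize(X, rng):
    """Sketch of permutation-based spectral anonymization (P-SA).

    1. center the data and compute its principal axes with an SVD,
    2. express each record in that spectral basis,
    3. independently shuffle every coordinate (column) of the spectral scores,
    4. rotate back to the original basis and restore the mean.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt.T  # coordinates of each record in the spectral basis
    shuffled = np.column_stack([rng.permutation(scores[:, j]) for j in range(scores.shape[1])])
    return shuffled @ vt + mean
```

Permuting a column leaves its mean and variance unchanged. The column means are therefore preserved exactly, and so is the total variance (the trace of the covariance). The cross-products between spectral coordinates are randomized, which is why the covariance is recovered only statistically. The abstract quantifies this as a 50% asymptotic efficiency.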
# 言語モデル透かしのブラックボックス検出 Black-Box Detection of Language Model Watermarks ( http://arxiv.org/abs/2405.20777v1 ) ライセンス: Link先を確認	Gloaguen Thibaud, Jovanović Nikola, Staab Robin, Vechev Martin,	(参考訳) 透かしはLLM生成テキストを検出するための有望な方法として登場した。透かしを適用するために、LLMプロバイダは秘密鍵を用いて生成テキストに信号を付加し、同じ鍵を持つ者であれば誰でも後からその信号を検出できるようにする。最近の研究は3つの主要な透かし方式ファミリーを提案しており、そのうち2つはLLMの分布を保存する性質に焦点を当てている。これは、それがLLMの能力を維持するための扱いやすい代理指標であることに加え、透かしの導入を隠すことで、悪意のあるアクターが特定のLLMを避けたりその透かしを攻撃したりして不正利用を隠すことが難しくなるという考えにも動機づけられている。しかし、検出可能性に関して多くの議論があるにもかかわらず、これらの方式ファミリーのいずれかが現実的なブラックボックス設定で検出可能かどうかを調査した先行研究は存在しない。我々はこの問題に初めて取り組み、限られた数のブラックボックスクエリのみを用いて、最も一般的な3つの透かし方式ファミリーすべての存在を検出するための厳密な統計的検定を開発した。提案手法の有効性を,様々な方式と多様なオープンソースモデルを用いて実験的に確認した。以上の結果から,現在の透かし方式は従来考えられていたよりも検出しやすく,また,透かしが導入されたという事実を隠すことは,プロバイダが敵から身を守るための有効な方法ではない可能性が示唆された。さらに本手法を適用して、GPT4、Claude 3、Gemini 1.0 Proといった最も一般的な公開APIの背後に透かしが存在するかを検証したが、現時点では透かしの強い証拠は見つからなかった。 Watermarking has emerged as a promising way to detect LLM-generated text. To apply a watermark, an LLM provider, given a secret key, augments generations with a signal that is later detectable by any party with the same key. Recent work has proposed three main families of watermarking schemes, two of which focus on the property of preserving the LLM distribution. This is motivated by it being a tractable proxy for maintaining LLM capabilities, but also by the idea that concealing a watermark deployment makes it harder for malicious actors to hide misuse by avoiding a certain LLM or attacking its watermark. Yet, despite much discourse around detectability, no prior work has investigated if any of these scheme families are detectable in a realistic black-box setting. We tackle this for the first time, developing rigorous statistical tests to detect the presence of all three most popular watermarking scheme families using only a limited number of black-box queries. We experimentally confirm the effectiveness of our methods on a range of schemes and a diverse set of open-source models. 
Our findings indicate that current watermarking schemes are more detectable than previously believed, and that obscuring the fact that a watermark was deployed may not be a viable way for providers to protect against adversaries. We further apply our methods to test for watermark presence behind the most popular public APIs: GPT4, Claude 3, Gemini 1.0 Pro, finding no strong evidence of a watermark at this point in time.	翻訳日:2024-06-03 14:37:39 公開日:2024-05-28
# 安全性アラインメント済みLLMに対する敵対的サンプル生成の改善 Improved Generation of Adversarial Examples Against Safety-aligned LLMs ( http://arxiv.org/abs/2405.20778v1 ) ライセンス: Link先を確認	Qizhang Li, Yiwen Guo, Wangmeng Zuo, Hao Chen,	(参考訳) 大規模言語モデル(LLM)が安全基準を順守し無害なコンテンツを生成するよう多くの努力が払われているにもかかわらず、LLMに対するジェイルブレイク攻撃として知られる、これらの制限を回避する試みはある程度成功している。勾配に基づく手法で生成された敵対的プロンプトは、ジェイルブレイク攻撃を自動的に行ううえで優れた性能を示す。しかしながら、テキストの離散的な性質のため、LLMの入力勾配はプロンプト中のトークン置換によって生じる損失変化の大きさを正確に反映しにくく、ホワイトボックス設定であっても、安全性アラインメント済みLLMに対する攻撃成功率は限定的であった。本稿では、もともとブラックボックス画像分類モデルを攻撃するために提案された転移ベース攻撃に由来する工夫を活用することでこの問題を緩和できると考え、新たな視点から取り組む。具体的には、転移ベース攻撃のうち効果的な手法であるSkip Gradient MethodとIntermediate Level Attackの考え方を、ホワイトボックスLLMに対して自動生成される敵対的サンプルの有効性を高めるために初めて取り入れた。適切な適応を施したうえでこれらの考え方を勾配に基づく敵対的プロンプト生成プロセスに組み込み、目立った計算コストを伴わずに大幅な性能向上を達成した。また、性能向上の背後にあるメカニズムを議論することで新たな洞察を得るとともに、これらの手法の適切な組み合わせも開発した。実験の結果、AdvBench上でLlama-2-7B-Chatモデルを攻撃する際、開発した組み合わせはGCGと比較して攻撃成功率を絶対値で30%以上向上させることがわかった。 Despite numerous efforts to ensure that large language models (LLMs) adhere to safety standards and produce harmless content, attempts to bypass these restrictions, known as jailbreak attacks against LLMs, have achieved some success. Adversarial prompts generated using gradient-based methods exhibit outstanding performance in performing jailbreak attacks automatically. Nevertheless, due to the discrete nature of text, the input gradient of LLMs struggles to precisely reflect the magnitude of loss change that results from token replacements in the prompt, leading to limited attack success rates against safety-aligned LLMs, even in the white-box setting. In this paper, we explore a new perspective on this problem, suggesting that it can be alleviated by leveraging innovations inspired by transfer-based attacks that were originally proposed for attacking black-box image classification models.
For the first time, we adopt the ideas behind effective methods among these transfer-based attacks, i.e., the Skip Gradient Method and the Intermediate Level Attack, to improve the effectiveness of automatically generated adversarial examples against white-box LLMs. With appropriate adaptations, we inject these ideas into gradient-based adversarial prompt generation and achieve significant performance gains without introducing noticeable computational cost. Moreover, by discussing the mechanisms behind the gains, we draw new insights and develop proper combinations of these methods. Our empirical results show that the developed combination achieves a >30% absolute increase in attack success rate compared with GCG when attacking the Llama-2-7B-Chat model on AdvBench.	翻訳日:2024-06-03 14:37:39 公開日:2024-05-28
# 差分プライバシー機構の普遍的厳密圧縮 Universal Exact Compression of Differentially Private Mechanisms ( http://arxiv.org/abs/2405.20782v1 ) ライセンス: Link先を確認	Yanxiao Liu, Wei-Ning Chen, Ayfer Özgür, Cheuk Ting Li,	(参考訳) 差分プライバシー機構の通信コストを低減するため、局所差分プライバシーを保証しつつ任意の局所ランダム化器を圧縮・シミュレートするよう設計された、PPR(Poisson private representation)と呼ばれる新しい構成を導入する。従来のシミュレーションに基づく局所差分プライバシー機構とは異なり、PPRはデータと元の局所ランダム化器の出力との同時分布を厳密に保存する。したがって、PPRで圧縮したプライバシー機構は、不偏性やガウス性など、元のプライバシー機構の望ましい統計的性質をすべて保持する。さらに、PPRは理論的下界から対数的なギャップ以内の圧縮サイズを達成する。PPRを用いて、分散平均推定における通信量、精度、中央差分プライバシー、局所差分プライバシーの間の、オーダーの意味での新たなトレードオフを与える。分散平均推定の実験結果から、PPRは座標サブサンプリングされたガウス機構と比較して、通信量、精度、中央差分プライバシーの間で一貫してより良いトレードオフを与え、同時に局所差分プライバシーも提供することが示された。 To reduce the communication cost of differential privacy mechanisms, we introduce a novel construction, called Poisson private representation (PPR), designed to compress and simulate any local randomizer while ensuring local differential privacy. Unlike previous simulation-based local differential privacy mechanisms, PPR exactly preserves the joint distribution of the data and the output of the original local randomizer. Hence, the PPR-compressed privacy mechanism retains all desirable statistical properties of the original privacy mechanism such as unbiasedness and Gaussianity. Moreover, PPR achieves a compression size within a logarithmic gap from the theoretical lower bound. Using PPR, we give a new order-wise trade-off between communication, accuracy, and central and local differential privacy for distributed mean estimation. Experiment results on distributed mean estimation show that PPR consistently gives a better trade-off between communication, accuracy and central differential privacy compared to the coordinate subsampled Gaussian mechanism, while also providing local differential privacy.	翻訳日:2024-06-03 14:37:39 公開日:2024-05-28
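As its name suggests, PPR builds on Poisson-process-based representations of random variables. The minimal sketch below shows only the basic, non-private Poisson functional representation. It assumes finite discrete distributions: `p` is the target and `q` is a shared reference. Encoder and decoder regenerate the same arrival times T_i and candidates Z_i ~ Q from a shared seed. The encoder transmits only the index K that minimizes T_i·Q(Z_i)/P(Z_i), and Z_K is then distributed as `p`. PPR's privacy parameter is omitted, the candidate list is truncated at `n` (a small approximation the exact scheme does not make), and all names are hypothetical.

```python
import random

def shared_candidates(seed, q, n):
    """Shared randomness: arrival times T_1 < T_2 < ... of a rate-1 Poisson
    process paired with i.i.d. candidates Z_i ~ q. Both sides regenerate
    the same list from the common seed."""
    rng = random.Random(seed)
    values, weights = zip(*q.items())
    t, out = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0)
        out.append((t, rng.choices(values, weights=weights)[0]))
    return out

def encode(p, q, seed, n=64):
    """Return K = argmin_i T_i * q(Z_i) / p(Z_i); only K is transmitted."""
    best_k, best_score = None, float("inf")
    for k, (t, z) in enumerate(shared_candidates(seed, q, n)):
        if p.get(z, 0.0) > 0:
            score = t * q[z] / p[z]
            if score < best_score:
                best_k, best_score = k, score
    return best_k

def decode(k, q, seed, n=64):
    """Recover the sample from the index alone (the decoder never sees p)."""
    return shared_candidates(seed, q, n)[k][1]

p = {0: 0.2, 1: 0.8}  # distribution the encoder wants to simulate
q = {0: 0.5, 1: 0.5}  # reference distribution known to both sides
trials = 2000
ones = sum(decode(encode(p, q, seed), q, seed) for seed in range(trials))
print(ones / trials)  # close to 0.8
```

The number of bits needed to send K grows with how far `p` is from `q`, which is why these representations are useful for compression.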
# マルチモーダル・ムード・リーダー:事前学習モデルによる被験者間感情認識 Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition ( http://arxiv.org/abs/2405.19373v1 ) ライセンス: Link先を確認	Yihang Dong, Xuhang Chen, Yanyan Shen, Michael Kwok-Po Ng, Tao Qian, Shuqiang Wang,	(参考訳) 脳波(EEG)に基づく感情認識は、神経信号処理や感情計算などの分野で大きな注目を集め、多様な発展を遂げている。しかし、個人ごとに異なる脳の解剖学的構造は、被験者間で無視できない脳波信号の自然な差異をもたらし、被験者間感情認識を困難にしている。最近の研究はこれらの問題に対処しようと試みているが、実用上の有効性やモデルフレームワークの統一性にはなお限界がある。既存手法は、脳波信号の複雑な時空間ダイナミクスを捉えることに苦労し、マルチモーダル情報を効果的に統合できないため、性能が最適とは言えず、被験者間での汎化性も限られる。これらの制約を克服するために、マスク脳信号モデリングと相互連結型時空間注意機構を利用した、被験者間感情認識のための事前学習モデルに基づくMultimodal Mood Readerを開発した。このモデルは、大規模データセットでの事前学習を通じて脳波信号の普遍的な潜在表現を学習し、相互連結型時空間注意機構を用いて脳波データから抽出した微分エントロピー(DE)特徴を処理する。さらに、識別的特徴を統合するマルチレベル融合層を提案し、異なる次元とモダリティにまたがる特徴の利点を最大限に活かす。公開データセットでの大規模な実験により、Mood Readerは被験者間感情認識タスクで優れた性能を示し、最先端の手法を上回ることが実証された。加えて、注意の観点からモデルを分析することで、感情に関連する脳領域の定性的分析を行い、神経信号処理における感情研究に有用な洞察を提供する。 Emotion recognition based on Electroencephalography (EEG) has gained significant attention and diversified development in fields such as neural signal processing and affective computing. However, the unique brain anatomy of individuals leads to non-negligible natural differences in EEG signals across subjects, posing challenges for cross-subject emotion recognition. While recent studies have attempted to address these issues, they still face limitations in practical effectiveness and model framework unity. Current methods often struggle to capture the complex spatial-temporal dynamics of EEG signals and fail to effectively integrate multimodal information, resulting in suboptimal performance and limited generalizability across subjects. To overcome these limitations, we develop a pre-trained-model-based Multimodal Mood Reader for cross-subject emotion recognition that utilizes masked brain signal modeling and an interlinked spatial-temporal attention mechanism.
The model learns universal latent representations of EEG signals through pre-training on a large-scale dataset, and employs an interlinked spatial-temporal attention mechanism to process Differential Entropy (DE) features extracted from EEG data. Subsequently, a multi-level fusion layer is proposed to integrate the discriminative features, maximizing the advantages of features across different dimensions and modalities. Extensive experiments on public datasets demonstrate Mood Reader's superior performance in cross-subject emotion recognition tasks, outperforming state-of-the-art methods. Additionally, the model is dissected from an attention perspective, providing a qualitative analysis of emotion-related brain areas and offering valuable insights for affective research in neural signal processing.	翻訳日:2024-05-31 19:45:41 公開日:2024-05-28
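Differential entropy (DE) features are commonly computed per channel and frequency band. This relies on the assumption that the band-filtered signal is Gaussian with variance σ², in which case DE = ½ ln(2πeσ²). The minimal sketch below assumes the segments are already band-pass filtered (filtering is omitted) and uses the population variance. The function names are hypothetical and not taken from the Mood Reader code.

```python
import math
import statistics

def differential_entropy(samples):
    """DE of one signal segment under a Gaussian assumption:
    h = 0.5 * ln(2 * pi * e * sigma^2), sigma^2 = population variance.
    A constant (zero-variance) segment has undefined DE and raises ValueError."""
    var = statistics.pvariance(samples)
    return 0.5 * math.log(2 * math.pi * math.e * var)

def de_features(segment):
    """One DE value per channel; `segment` is a list of per-channel sample
    lists, assumed already filtered into a single frequency band."""
    return [differential_entropy(channel) for channel in segment]

segment = [[1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0]]
print(de_features(segment))  # second value exceeds the first by ln 2
```

Because DE depends only on the variance here, doubling a channel's amplitude adds exactly ln 2 to its feature.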
# 最適なマルチクラスU-キャリブレーション誤差とその先 Optimal Multiclass U-Calibration Error and Beyond ( http://arxiv.org/abs/2405.19374v1 ) ライセンス: Link先を確認	Haipeng Luo, Spandan Senapati, Vatsal Sharan,	(参考訳) オンライン・マルチクラスU-キャリブレーションの問題を考える。ここで予測者の目標は、$K$クラス上の逐次的な分布予測を、低いU-キャリブレーション誤差、すなわちすべての有界な適切損失(proper loss)に対して同時に小さいリグレットで行うことである。Kleinbergら(2023)は、$T$ラウンド後のU-キャリブレーション誤差が$O(K\sqrt{T})$となるアルゴリズムを開発し、最適な限界は何かという未解決問題を提起した。我々は、最適なU-キャリブレーション誤差が$\Theta(\sqrt{KT})$であることを示してこの問題を解決する。まず、Daskalakis and Syrgkanis (2016)のFollow-the-Perturbed-Leaderアルゴリズムがこの上界を達成するという単純な観察を示し、続いて特定の適切損失を用いて構成した一致する下界を示す(副次的な結果として、有限の選択肢を持つ敵に対するオンライン学習の文脈において、Daskalakis and Syrgkanis (2016)のアルゴリズムの最適性も証明される)。さらに、損失関数に関する自然な仮定のもとで結果を強化し、Lipschitz適切損失に対する$\Theta(\log T)$のU-キャリブレーション誤差、ある種の分解可能な適切損失に対する$O(\log T)$のU-キャリブレーション誤差、被覆数が小さい適切損失に対するU-キャリブレーション誤差の上界などを得る。 We consider the problem of online multiclass U-calibration, where a forecaster aims to make sequential distributional predictions over $K$ classes with low U-calibration error, that is, low regret with respect to all bounded proper losses simultaneously. Kleinberg et al. (2023) developed an algorithm with U-calibration error $O(K\sqrt{T})$ after $T$ rounds and raised the open question of what the optimal bound is. We resolve this question by showing that the optimal U-calibration error is $\Theta(\sqrt{KT})$ -- we start with a simple observation that the Follow-the-Perturbed-Leader algorithm of Daskalakis and Syrgkanis (2016) achieves this upper bound, followed by a matching lower bound constructed with a specific proper loss (which, as a side result, also proves the optimality of the algorithm of Daskalakis and Syrgkanis (2016) in the context of online learning against an adversary with finite choices). We also strengthen our results under natural assumptions on the loss functions, including $\Theta(\log T)$ U-calibration error for Lipschitz proper losses, $O(\log T)$ U-calibration error for a certain class of decomposable proper losses, U-calibration error bounds for proper losses with a low covering number, and others.	
翻訳日:2024-05-31 19:35:57 公開日:2024-05-28
# Cross-Attentive Modulationトークンを用いたリンクセット予測のグローバルな認識の改善 Improving global awareness of linkset predictions using Cross-Attentive Modulation tokens ( http://arxiv.org/abs/2405.19375v1 ) ライセンス: Link先を確認	Félix Marcoccia, Cédric Adjih, Paul Mühlethaler,	(参考訳) 複数リンク予測やグラフ生成の手法の多くは、注意機構またはグラフニューラルネットワーク(GNN)に依存しており、適切なリンク予測を得るためにノードレベルの情報交換を利用する。こうしたノードレベルの相互作用はノードを順序付き系列として扱わない(そうするとノードに何らかの自然な順序を仮定することになる)ため、置換不変な機構と呼ばれる。これらはグラフ問題には適しているが、予測されるリンク全体を大域的に調整することが苦手であり、性能の低下につながりうる。典型的な問題として、大域的な連結性や固定された直径といった高レベルな性質を保証することの難しさや、過平滑化(oversmoothing)や過圧縮(oversquashing)といった情報ボトルネック効果を避けることの難しさが挙げられる。前者は密な領域で過度な平滑化が起きて情報が失われる現象、後者は孤立したノードがメッセージパッシングから排除されやすい傾向であり、いずれも無関係で偏ったリンク予測を招くことが多い。この問題に対処するため、我々はCross-Attentive Modulation(CAM)トークンを提案する。CAMトークンは、ノードレベルおよびエッジレベルの変調を条件付けるクロスアテンティブなユニットを導入し、予測リンクの大域的な一貫性を高める文脈を考慮した計算を可能にする。いくつかの置換不変アーキテクチャに実装し、本手法の利点を示すベンチマーク結果を示す。 Most multiple link prediction or graph generation techniques rely on the attention mechanism or on Graph Neural Networks (GNNs), which leverage node-level information exchanges in order to form proper link predictions. Such node-level interactions do not process nodes as an ordered sequence, which would imply some kind of natural ordering of the nodes: they are said to be permutation-invariant mechanisms. They are well suited for graph problems, but struggle to provide a global orchestration of the predicted links, which can result in a loss of performance. Typical issues include the difficulty of ensuring high-level properties such as global connectedness or a fixed diameter, and of avoiding information bottleneck effects such as oversmoothing and oversquashing. These respectively consist of excessive smoothing in dense areas, leading to a loss of information, and a tendency to exclude isolated nodes from the message-passing scheme, and they often result in irrelevant, unbalanced link predictions. To tackle this problem, we present Cross-Attentive Modulation (CAM) tokens, which introduce cross-attentive units used to condition node- and edge-level modulations in order to enable context-aware computations that improve the global consistency of the predicted links.
We implement CAM tokens on a few permutation-invariant architectures and present benchmarks that demonstrate the merits of our work.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-28
# PureEBM:エネルギーベースモデルの途中ダイナミクスによる普遍的なポイズン浄化 PureEBM: Universal Poison Purification via Mid-Run Dynamics of Energy-Based Models ( http://arxiv.org/abs/2405.19376v1 ) ライセンス: Link先を確認	Omead Pooladzandi, Jeffrey Jiang, Sunay Bhat, Gregory Pottie,	(参考訳) データポイズニング攻撃は、訓練中に敵対的サンプルを注入し、目標分布のテストデータを誤分類させることで、機械学習モデルの完全性に重大な脅威をもたらす。既存の最先端(SoTA)防御手法は、汎化性能の大幅な低下、特定の攻撃タイプや分類器への依存、訓練時の大きなオーバーヘッドなど、さまざまな制約を抱えており、実世界の応用では非実用的であるか適用範囲が限られる。この課題に対し、我々は、画像$x$で初期化した収束性のあるエネルギーベースモデル(EBM)の反復的ランジュバンサンプリングによって実現される普遍的な確率的前処理ステップ$\Psi_{T}(x)$を適用することで、悪意のあるホワイトボックス、グレーボックス、ブラックボックスの画像ポイズンから、通常通り訓練された分類器を守る普遍的なデータ浄化手法を導入する。$\Psi_{T}(x)$の途中段階(ミッドラン)のダイナミクスは、分類器ネットワークの汎化に重要な特徴への影響を最小限に抑えつつ、毒情報を除去する。EBMの対照学習過程により、EBMの訓練データ自体が汚染されている場合でも普遍的な浄化器であり続けられること、さらに代表的なトリガー型ポイズンであるNarcissusや、トリガーなしのポイズンであるGradient MatchingおよびBullseye Polytopeに対してSoTAの防御性能を達成できることを示す。本研究はPureGenで導入されたより大きなフレームワークの一部であり、EBMによる浄化とポイズン防御により詳細に焦点を当てている。 Data poisoning attacks pose a significant threat to the integrity of machine learning models: by injecting adversarial examples during training, they cause target-distribution test data to be misclassified. Existing state-of-the-art (SoTA) defense methods suffer from a variety of limitations, such as significantly reduced generalization performance, specificity to particular attack types and classifiers, and significant overhead during training, making them impractical or limited for real-world applications. In response to this challenge, we introduce a universal data purification method that defends naturally trained classifiers from malicious white-, gray-, and black-box image poisons by applying a universal stochastic preprocessing step $\Psi_{T}(x)$, realized by iterative Langevin sampling of a convergent Energy Based Model (EBM) initialized with an image $x$. Mid-run dynamics of $\Psi_{T}(x)$ purify poison information with minimal impact on features important to the generalization of a classifier network.
We show that the contrastive learning process of EBMs allows them to remain universal purifiers, even in the presence of poisoned EBM training data, and to achieve SoTA defense against the leading triggered poison Narcissus and the triggerless poisons Gradient Matching and Bullseye Polytope. This work is a subset of a larger framework introduced in PureGen, with a more detailed focus on EBM purification and poison defense.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-28
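The purification step $\Psi_{T}(x)$ described above runs Langevin dynamics on an energy function, starting from the input image. The minimal sketch below assumes the standard unadjusted Langevin update x ← x − (η/2)∇E(x) + √η·ξ with ξ ~ N(0, I). A toy quadratic energy stands in for a trained EBM. The names `langevin_purify` and `toy_energy_grad` are hypothetical, and the step size and step count are illustrative rather than the paper's settings.

```python
import math
import random

def langevin_purify(x, grad_energy, steps, step_size, seed=0):
    """Run `steps` iterations of unadjusted Langevin dynamics from x:
    x <- x - (step_size / 2) * grad E(x) + sqrt(step_size) * N(0, I).
    Stopping after a moderate number of steps ("mid-run") moves the input
    toward low-energy regions without fully forgetting where it started."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(steps):
        g = grad_energy(x)
        x = [xi - 0.5 * step_size * gi + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
    return x

def toy_energy_grad(x, mu=(0.0, 0.0)):
    """Gradient of E(x) = ||x - mu||^2 / 2, a stand-in for a trained EBM."""
    return [xi - mi for xi, mi in zip(x, mu)]

poisoned = [3.0, -3.0]
purified = langevin_purify(poisoned, toy_energy_grad, steps=50, step_size=0.1)
print(purified)
```

In a real pipeline, `x` would be an image tensor and the gradient would come from automatic differentiation of the EBM. The number of steps controls how much of the input (and its poison) is washed out.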
# ウイルスゲノムのアライメントフリー分類における統計的線形モデル:C型肝炎ウイルスへの応用 Statistical Linear Models in Virus Genomic Alignment-free Classification: Application to Hepatitis C Viruses ( http://arxiv.org/abs/1910.05421v3 ) ライセンス: Link先を確認	Amine M. Remita, Abdoulaye Baniré Diallo,	(参考訳) ウイルス配列の分類は、病原体の検出、疫学調査、進化研究において重要な課題である。統計的学習法は、環境サンプル中のウイルス配列の分類と同定に広く用いられている。これらの手法は、組換え、突然変異率、多様性など、ウイルスゲノムの性質に関連するいくつかの課題に直面している。また、新世代のシークエンシング技術は大量の断片化された配列を生成するため、さらなる困難を生じさせる。線形分類器はウイルスの分類によく用いられるが、アライメントフリーなアプローチの文脈では、既存モデルの精度空間の探索が不足している。本研究では、部分ゲノムおよび完全ゲノムのジェノタイピングとサブタイピングにおける線形分類器の能力を探る網羅的な評価手順を提示し、C型肝炎ウイルス(HCV)に適用する。本研究では、分類器の種類(生成的・識別的)とそのハイパーパラメータ(平滑化値と正則化ペナルティ関数)、分類タスク(ジェノタイピングとサブタイピング)、テスト配列の長さ(部分・完全)、k-merの語長など、いくつかの変数を検討する。全体として、上記の実験変数の適切な組み合わせが与えられれば、いくつかの分類器が良好な性能を示す。最後に、ウイルスゲノムからの分類をより頑健に評価できるよう、評価手順とベンチマークデータを提供する。 Viral sequence classification is an important task in pathogen detection, epidemiological surveys and evolutionary studies. Statistical learning methods are widely used to classify and identify viral sequences in samples from environments. These methods face several challenges associated with the nature and properties of viral genomes such as recombination, mutation rate and diversity. Also, new generations of sequencing technologies raise other difficulties by generating massive amounts of fragmented sequences. While linear classifiers are often used to classify viruses, there is a lack of exploration of the accuracy space of existing models in the context of alignment-free approaches. In this study, we present an exhaustive assessment procedure exploring the power of linear classifiers in genotyping and subtyping partial and complete genomes. We apply it to Hepatitis C viruses (HCV).
Several variables are considered in this investigation, such as classifier types (generative and discriminative) and their hyper-parameters (smoothing value and regularization penalty function), the classification task (genotyping and subtyping), the length of the tested sequences (partial and complete) and the length of k-mer words. Overall, several classifiers perform well given precise combinations of the experimental variables mentioned above. Finally, we provide the procedure and benchmark data to allow for more robust assessment of classification from virus genomes.	翻訳日:2024-05-31 02:51:07 公開日:2024-05-28
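Alignment-free classification typically starts from k-mer counts. The minimal sketch below shows one generative linear classifier of the kind evaluated here: multinomial naive Bayes with an additive smoothing value `alpha`. It assumes plain nucleotide strings and toy labels. The class and function names are hypothetical, and this is not the study's benchmark code.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers: an alignment-free representation."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts with additive smoothing
    `alpha`; its log-score is linear in the k-mer counts."""

    def __init__(self, k=3, alpha=1.0):
        self.k, self.alpha = k, alpha

    def fit(self, seqs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for seq, y in zip(seqs, labels):
            self.counts[y].update(kmer_counts(seq, self.k))
        prior = Counter(labels)
        self.log_prior = {c: math.log(prior[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, seq):
        x = kmer_counts(seq, self.k)
        v = len(self.vocab)

        def log_score(c):
            denom = self.totals[c] + self.alpha * v
            return self.log_prior[c] + sum(
                n * math.log((self.counts[c][w] + self.alpha) / denom)
                for w, n in x.items())

        return max(self.classes, key=log_score)

train = ["ACGTACGTACGT", "ACGTACGAACGT", "TTGATTGATTGA", "TTGATTGCTTGA"]
labels = ["genotype1", "genotype1", "genotype2", "genotype2"]
model = KmerNaiveBayes(k=3, alpha=1.0).fit(train, labels)
print(model.predict("ACGTACGTAC"), model.predict("TTGATTGA"))
```

The k-mer length `k` and the smoothing value `alpha` correspond to two of the experimental variables listed in the abstract.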
# テンソルネットワークにおける臨界U(1)スピン液体と創発対称性のロバスト性 Robustness of critical U(1) spin liquids and emergent symmetries in tensor networks ( http://arxiv.org/abs/2008.04833v2 ) ライセンス: Link先を確認	Henrik Dreyer, Laurens Vanderstraeten, Ji-Yao Chen, Ruben Verresen, Norbert Schuch,	(参考訳) 臨界共鳴バレンス結合 (RVB) スピン液体の長距離一重項を持つドーピングに対する応答について検討し, より一般的には非対称摂動に対するU(1)対称テンソルネットワークについて検討した。フィールド理論の記述を用いて、RVBではドーピングが関連する摂動を構成しており、以前の観測とは対照的にすぐにギャップを開きます。本分析では, ドッピング量においても非常に大きな相関長を予測し, 高精度な数値シミュレーションを用いて検証する。これは注意深い分析の必要性を強調しつつ、臨界系に対する変分アンサッツのような状態の使用を正当化する。最後に、非対称摂動がギャップを開かず、U(1)対称性が再帰するPEPSの例を示す。 We study the response of critical Resonating Valence Bond (RVB) spin liquids to doping with longer-range singlets, and more generally of U(1)-symmetric tensor networks to non-symmetric perturbations. Using a field theory description, we find that in the RVB, doping constitutes a relevant perturbation which immediately opens up a gap, contrary to previous observations. Our analysis predicts a very large correlation length even at significant doping, which we verify using high-accuracy numerical simulations. This emphasizes the need for careful analysis, but also justifies the use of such states as a variational ansatz for critical systems. Finally, we give an example of a PEPS where non-symmetric perturbations do not open up a gap and the U(1) symmetry re-emerges.	翻訳日:2024-05-31 02:51:07 公開日:2024-05-28
# 識別機構を用いたスケーラブルなビデオオブジェクトセグメンテーション Scalable Video Object Segmentation with Identification Mechanism ( http://arxiv.org/abs/2203.11442v8 ) ライセンス: Link先を確認	Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang,	(参考訳) 本稿では、半教師ありビデオオブジェクトセグメンテーション(VOS)において、スケーラブルかつ効果的なマルチオブジェクトモデリングを実現する上での課題を掘り下げる。従来のVOS手法は単一の正例オブジェクトで特徴をデコードするため、マルチオブジェクトのシナリオでは各ターゲットを個別にマッチングしセグメント化しなければならず、マルチオブジェクト表現の学習が制限される。さらに、従来の手法は特定の応用目的に合わせて設計されており、さまざまな速度と精度の要件を満たす柔軟性に欠けていた。これらの問題を解決するために、AOT(Associating Objects with Transformers)とAOST(Associating Objects with Scalable Transformers)という2つの新しいアプローチを提案する。効果的なマルチオブジェクトモデリングを目指し、AOTは各オブジェクトに固有のIDを割り当てる識別(IDentification, ID)メカニズムを導入する。これにより、ネットワークはすべてのオブジェクト間の関連を同時にモデル化でき、1回のネットワーク処理でオブジェクトの追跡とセグメンテーションを行える。柔軟性に欠けるデプロイメントの課題に対処するため、AOSTはさらに、スケーラブルな教師信号とレイヤ単位のIDベース注意機構を取り入れた、スケーラブルな長短期トランスフォーマーを統合している。これにより、VOSにおいて初めてオンラインでのアーキテクチャのスケーラビリティが実現され、ID埋め込みの表現上の制約も克服される。高密度なマルチオブジェクトアノテーションを含むVOSのベンチマークが存在しないことを踏まえ、我々のアプローチを検証するために挑戦的なVideo Object Segmentation in the Wild(VOSW)ベンチマークを提案する。VOSWと、YouTube-VOS 2018および2019 Val、DAVIS-2017 Val & Test、DAVIS-2016を含む一般的に使用される5つのVOSベンチマークで広範な実験を行い、さまざまなAOTおよびAOSTの変種を評価した。我々のアプローチは最先端の競合手法を上回り、6つのベンチマークすべてにおいて一貫して優れた効率性とスケーラビリティを示した。プロジェクトページ: https://github.com/yoxu515/aot-benchmark This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS). Previous VOS methods decode features with a single positive object, limiting the learning of multi-object representation as they must match and segment each target separately under multi-object scenarios. Additionally, earlier techniques catered to specific application objectives and lacked the flexibility to fulfill different speed-accuracy requirements. To address these problems, we present two innovative approaches, Associating Objects with Transformers (AOT) and Associating Objects with Scalable Transformers (AOST).
In pursuing effective multi-object modeling, AOT introduces the IDentification (ID) mechanism to allocate each object a unique identity. This approach enables the network to model the associations among all objects simultaneously, thus facilitating the tracking and segmentation of objects in a single network pass. To address the challenge of inflexible deployment, AOST further integrates scalable long short-term transformers that incorporate scalable supervision and layer-wise ID-based attention. This enables online architecture scalability in VOS for the first time and overcomes the representation limitations of ID embeddings. Given the absence of a benchmark for VOS involving dense multi-object annotations, we propose a challenging Video Object Segmentation in the Wild (VOSW) benchmark to validate our approaches. We evaluated various AOT and AOST variants in extensive experiments across VOSW and five commonly used VOS benchmarks, including YouTube-VOS 2018 & 2019 Val, DAVIS-2017 Val & Test, and DAVIS-2016. Our approaches surpass state-of-the-art competitors and consistently display exceptional efficiency and scalability across all six benchmarks. Project page: https://github.com/yoxu515/aot-benchmark.	翻訳日:2024-05-31 02:51:07 公開日:2024-05-28
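The identification (ID) mechanism described above can be pictured with a minimal sketch. Every object receives its own vector from an identity bank, and the per-object masks are merged into a single per-pixel embedding, so all objects can be processed together in one pass. This sketch assumes masks are flat lists over the same pixels and uses fixed toy vectors for the bank (in AOT the identity embeddings are learned). `assign_ids` is a hypothetical name, and the downstream attention-based propagation is not shown.

```python
def assign_ids(masks, id_bank):
    """Merge per-object masks into one per-pixel identity embedding.

    masks:   list of per-object masks, each a flat list of 0/1 (or soft)
             values over the same pixels.
    id_bank: list of identity vectors; object o receives id_bank[o].
    Returns, for every pixel p, sum_o masks[o][p] * id_bank[o]."""
    if len(masks) > len(id_bank):
        raise ValueError("more objects than available identities")
    dim = len(id_bank[0])
    out = [[0.0] * dim for _ in range(len(masks[0]))]
    for o, mask in enumerate(masks):
        for p, m in enumerate(mask):
            if m:
                for d in range(dim):
                    out[p][d] += m * id_bank[o][d]
    return out

id_bank = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy identity vectors
masks = [[1, 1, 0, 0],   # object 0 covers pixels 0 and 1
         [0, 0, 1, 0]]   # object 1 covers pixel 2; pixel 3 is background
print(assign_ids(masks, id_bank))
```

Because the identities are added into one shared embedding, the cost of encoding the reference masks does not grow with a separate network pass per object.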
# 2光子駆動によるノイズのある非エルミート量子センシングにおける指数関数的感度の回復 Exponential sensitivity revival of noisy non-Hermitian quantum sensing with two-photon drives ( http://arxiv.org/abs/2303.16575v2 ) ライセンス: Link先を確認	Liying Bao, Bo Qi, Franco Nori, Daoyi Dong,	(参考訳) 多モード非エルミート格子ダイナミクスの特異な性質を利用すると、指数関数的に高感度なセンサを構築できる。しかし、ノイズの影響はいまだ明らかでなく、感度を大きく低下させる可能性がある。我々は、非エルミートセンサの感度回復と安定性に対する損失と利得の影響を解析的に特徴づけ、その重要性を明らかにする。損失が存在すると量子センシングの優位性は失われるという一般的な考えに反して、センシングダイナミクスが安定であれば、損失を能動的に調整することで指数関数的な感度が驚くべきことに回復することを見出した。さらに、理想的な指数関数的感度を完全に回復し、損失と利得をバランスさせることで非エルミートセンシングの安定性を確保するには、利得が不可欠であることを証明した。本論文は、損失と利得を能動的に調整することで感度を大幅に向上させる道を開くものであり、将来の量子センシングと量子工学の発展を促す可能性がある。 Unique properties of multimode non-Hermitian lattice dynamics can be utilized to construct exponentially sensitive sensors. However, the impact of noise remains unclear, which may severely degrade their sensitivity. We analytically characterize and highlight the impact of loss and gain on the sensitivity revival and stability of non-Hermitian sensors. Defying the general belief that the superiority of quantum sensing will vanish in the presence of loss, we find that by proactively tuning the loss, the exponential sensitivity can be surprisingly regained when the sensing dynamics is stable. Furthermore, we prove that gain is crucial to fully revive the ideally exponential sensitivity and to ensure the stability of non-Hermitian sensing by making a balanced loss and gain. Our paper opens a way to significantly enhance the sensitivity by proactively tuning the loss and gain, which may promote future quantum sensing and quantum engineering.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-28
# 何を、いつ、どこで?:ナレーション付き説明動画からの未トリミング・マルチアクション動画における自己教師あり時空間グラウンディング What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions ( http://arxiv.org/abs/2303.16990v2 ) ライセンス: Link先を確認	Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne,	(参考訳) 時空間グラウンディングとは、言語による記述のみに基づいて、例えば動画データ中の事象を空間的・時間的に特定するタスクである。このタスクのモデルは通常、人手で注釈された文とバウンディングボックスによる教師信号で訓練される。本研究は、この課題にマルチモーダルな教師信号の観点から取り組み、人手によるアノテーションを用いず、動画と字幕の緩い対応関係のみを教師信号として訓練する時空間行動グラウンディングのフレームワークを提案する。この目的のために、細かな空間情報の活用に焦点を当てた局所的な表現学習と、より高次の表現を捉える大域的な表現符号化とを組み合わせ、両者を統合的なアプローチに取り込む。この困難なタスクを実環境に近い設定で評価するため、長尺かつ未トリミングで複数の行動を含む説明動画において、5K以上のイベントに高密度な時空間グラウンディングのアノテーションを付与した新たなベンチマークデータセットを提案する。提案手法とその他の手法を、提案タスクおよび標準的な下流タスクで評価し、空間的、時間的、および未トリミング・マルチアクションの時空間グラウンディングを含むさまざまな設定において、本手法が現在のベースラインを上回ることを示す。 Spatio-temporal grounding describes the task of localizing events in space and time, e.g., in video data, based on verbal descriptions only. Models for this task are usually trained with human-annotated sentences and bounding box supervision. This work addresses this task from a multimodal supervision perspective, proposing a framework for spatio-temporal action grounding trained on loose video and subtitle supervision only, without human annotation. To this end, we combine local representation learning, which focuses on leveraging fine-grained spatial information, with a global representation encoding that captures higher-level representations and incorporates both in a joint approach. To evaluate this challenging task in a real-life setting, a new benchmark dataset is proposed providing dense spatio-temporal grounding annotations in long, untrimmed, multi-action instructional videos for over 5K events.
We evaluate the proposed approach and other methods on the proposed and standard downstream tasks showing that our method improves over current baselines in various settings, including spatial, temporal, and untrimmed multi-action spatio-temporal grounding.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-28
# リッチテキストを用いた表現型テキスト・画像生成 Expressive Text-to-Image Generation with Rich Text ( http://arxiv.org/abs/2304.06720v3 ) ライセンス: Link先を確認	Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang,	(参考訳) プレーンテキストは、テキストと画像の合成の一般的なインターフェースになっている。しかし、その限定されたカスタマイズオプションは、ユーザーが求める出力を正確に記述することを妨げる。例えば、プレーンテキストは、それぞれの単語の正確なRGB色値や重要性など、連続的な量を特定するのを難しくしている。さらに、複雑なシーンのための詳細なテキストプロンプトを作成することは、人間が書くのが面倒で、テキストエンコーダが解釈するのは難しい。これらの課題に対処するために、フォントスタイル、サイズ、色、フットノートなどのフォーマットをサポートするリッチテキストエディタを提案する。それぞれの単語の属性をリッチテキストから抽出し、局所的なスタイル制御、明示的なトークン再重み付け、正確な色レンダリング、詳細な領域合成を可能にする。領域ベースの拡散プロセスによりこれらの機能を実現する。まず,平文を用いた拡散過程の注意図に基づいて各単語の領域を抽出する。各領域に対して,地域固有の詳細なプロンプトを作成し,地域固有のガイダンスを適用してテキスト属性を強制し,地域ベースのインジェクションによる平文生成に対する忠実さを維持する。リッチテキストからの画像生成の様々な例を示し、定量的評価により、本手法が強いベースラインより優れていることを示す。 Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on attention maps of a diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance, and maintain its fidelity against plain-text generation through region-based injections. 
We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-28
# 関連性への注意のシフト:自由形式の大規模言語モデルの予測的不確実性定量化に向けて Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models ( http://arxiv.org/abs/2307.01379v3 ) ライセンス: Link先を確認	Jinhao Duan, Hao Cheng, Shiqi Wang, Alex Zavalny, Chenan Wang, Renjing Xu, Bhavya Kailkhura, Kaidi Xu,	(参考訳) 大規模言語モデル(LLM)は言語生成と指示追従において有望な結果を示すが、しばしば「幻覚(ハルシネーション)」を起こし、出力の信頼性を損なう。不確実性定量化(UQ)は解決策となりうるが、LLMにおいて正確に実装することは難しい。我々の研究は単純なヒューリスティックを導入する。すなわち、「言語的冗長性」により少数のキーワードだけで長い文の本質が伝わることが多いため、自己回帰型LLMのテキスト中のすべてのトークンが、その根底にある意味を等しく表しているわけではない。しかし、現在の手法は不確実性を評価する際にこの不均等を過小評価しており、意味の乏しいトークンがUQにおいて等しく、あるいは過度に重み付けされてしまう。これを是正するため、より良いUQのために、トークンレベルと文レベルの両方で、より関連性の高い要素に注意を移すSAR(Shifting Attention to more Relevant components)を提案する。Vicuna、WizardLM、LLaMA-2-chatなど、パラメータ数最大33Bまでのさまざまな一般的な既製のLLMを対象に広範な実験を行った。読解、科学Q&A、医療Q&Aなどの領域を含む、さまざまな自由形式の質問応答タスクで評価した。包括的な人口統計学的分析と合わせて、実験結果はSARの優れた性能を示している。コードはhttps://github.com/jinhaoduan/SARで公開されている。 Large Language Models (LLMs) show promising results in language generation and instruction following but frequently "hallucinate", making their outputs less reliable. Although Uncertainty Quantification (UQ) offers potential solutions, implementing it accurately within LLMs is challenging. Our research introduces a simple heuristic: not all tokens in auto-regressive LLM text equally represent the underlying meaning, as "linguistic redundancy" often allows a few keywords to convey the essence of long sentences. However, current methods underestimate this inequality when assessing uncertainty, causing tokens with limited semantics to be equally or excessively weighted in UQ. To correct this, we propose Shifting Attention to more Relevant (SAR) components at both token- and sentence-levels for better UQ. We conduct extensive experiments involving a range of popular "off-the-shelf" LLMs, such as Vicuna, WizardLM, and LLaMA-2-chat, with model sizes extending up to 33B parameters. We evaluate various free-form question-answering tasks, encompassing domains such as reading comprehension, science Q&A, and medical Q&A.
Our experimental results, coupled with a comprehensive demographic analysis, demonstrate the superior performance of SAR. The code is available at https://github.com/jinhaoduan/SAR.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-28
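The token-level idea can be sketched as a relevance-weighted version of the usual average negative log-likelihood. In the paper, relevance is derived, roughly, from how much removing a token changes the sentence's meaning. In this sketch the relevance scores are simply supplied as inputs (an assumption), and both function names are hypothetical.

```python
def uniform_uncertainty(token_logprobs):
    """Baseline: mean negative log-probability, every token weighted equally."""
    return -sum(token_logprobs) / len(token_logprobs)

def relevance_weighted_uncertainty(token_logprobs, relevance):
    """Token-level uncertainty in the spirit of SAR: each token's negative
    log-probability is weighted by its normalized relevance score.
    `relevance` is assumed non-negative with a positive sum."""
    total = sum(relevance)
    if total <= 0:
        raise ValueError("relevance scores must sum to a positive value")
    return sum(-lp * r / total for lp, r in zip(token_logprobs, relevance))

logprobs = [-2.0, -0.1, -1.5, -0.05]  # e.g. "The", "capital", "is", "Paris"
relevance = [0.05, 0.4, 0.05, 0.5]    # filler tokens get low relevance
print(uniform_uncertainty(logprobs))                         # about 0.91
print(relevance_weighted_uncertainty(logprobs, relevance))   # about 0.24
```

The uncertain but uninformative tokens ("The", "is") dominate the uniform score. Down-weighting them yields a much lower uncertainty, because the answer-bearing tokens are confident.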
# PIGEON:画像位置情報の予測 PIGEON: Predicting Image Geolocations ( http://arxiv.org/abs/2307.05845v6 ) ライセンス: Link先を確認	Lukas Haas, Michal Skreta, Silas Alberti, Chelsea Finn,	(参考訳) 惑星規模の画像ジオローカライゼーションは、世界中のあらゆる場所から得られる画像の多様性のため、依然として困難な問題である。Vision Transformerに基づく手法はジオローカライゼーションの精度を大きく向上させたが、先行研究での成功はランドマーク画像という狭い分布に限られており、未知の場所には性能が汎化していない。本稿では、セマンティックなジオセル生成、マルチタスク対照事前学習、新しい損失関数を組み合わせた新たなジオローカライゼーションシステムを提案する。さらに本研究は、推定を洗練するために位置クラスタ上での検索を行う初めての試みである。ストリートレベルのデータと汎用的な画像ジオローカライゼーションでの評価のために2つのモデルを訓練した。1つ目のモデルであるPIGEONはGeoguessrというゲームのデータで訓練され、世界全体で推定の40%以上を目標地点から25km以内に収めることができる。また、ボットを開発してPIGEONを人間を相手にしたブラインド実験に投入し、プレイヤーの上位0.01%にランクインした。さらに、世界屈指のプロGeoguessrプレイヤーの1人に、数百万人の視聴者の前で6試合のシリーズを挑み、6試合すべてに勝利した。2つ目のモデルであるPIGEOTTOは、FlickrとWikipediaの画像データセットで訓練される点が異なり、幅広い画像ジオローカライゼーションのベンチマークで最先端の結果を達成し、従来のSOTAを都市レベルの精度で最大7.7ポイント、国レベルで最大38.8ポイント上回った。これらの結果は、PIGEOTTOが未知の場所に効果的に汎化する初めての画像ジオローカライゼーションモデルであり、我々のアプローチが高精度な惑星規模の画像ジオローカライゼーションシステムへの道を開きうることを示唆している。コードはGitHubで公開されている。 Planet-scale image geolocalization remains a challenging problem due to the diversity of images originating from anywhere in the world. Although approaches based on vision transformers have made significant progress in geolocalization accuracy, success in prior literature is constrained to narrow distributions of images of landmarks, and performance has not generalized to unseen places. We present a new geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, and a novel loss function. Additionally, our work is the first to perform retrieval over location clusters for guess refinements. We train two models for evaluations on street-level data and general-purpose image geolocalization; the first model, PIGEON, is trained on data from the game of Geoguessr and is capable of placing over 40% of its guesses within 25 kilometers of the target location globally. We also develop a bot and deploy PIGEON in a blind experiment against humans, ranking in the top 0.01% of players.
We further challenge one of the world's foremost professional Geoguessr players to a series of six matches with millions of viewers, winning all six games. Our second model, PIGEOTTO, differs in that it is trained on a dataset of images from Flickr and Wikipedia, achieving state-of-the-art results on a wide range of image geolocalization benchmarks, outperforming the previous SOTA by up to 7.7 percentage points on the city accuracy level and up to 38.8 percentage points on the country level. Our findings suggest that PIGEOTTO is the first image geolocalization model that effectively generalizes to unseen places and that our approach can pave the way for highly accurate, planet-scale image geolocalization systems. Our code is available on GitHub.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-28
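Accuracy figures such as "within 25 kilometers of the target location" are great-circle distances between predicted and true coordinates, thresholded at a radius. The minimal sketch below uses the haversine formula with a mean Earth radius of 6371 km. That radius is an assumption, since benchmarks may use slightly different constants. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def accuracy_within(guesses, targets, radius_km=25.0):
    """Fraction of guesses that land within radius_km of their target."""
    hits = sum(haversine_km(*g, *t) <= radius_km for g, t in zip(guesses, targets))
    return hits / len(targets)

guesses = [(48.8566, 2.3522), (40.0, -74.0)]
targets = [(48.86, 2.35), (51.5, -0.13)]
print(accuracy_within(guesses, targets))  # 0.5: only the first guess is within 25 km
```

Reporting the same metric at several radii gives the street, city, region, and country accuracy levels that geolocalization benchmarks commonly use.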
# 古典系、量子系、閉鎖系、開放系に対する作用 Action for classical, quantum, closed and open systems ( http://arxiv.org/abs/2307.12320v2 ) ライセンス: Link先を確認	Janos Polonyi,	(参考訳) 作用汎関数は、変分原理の一般化によって古典力学において、また経路積分形式によって量子力学において、それぞれ閉じた系と開いた系のダイナミクスを定義するのに用いることができることはよく知られている。これらの枠組みは、自由度を形式的に倍加するという特異な特徴に基づいている。古典力学と量子力学において倍加を動機づけるいくつかの議論を提示し、このような形式が自然なものであることを示す。 It is well known that the action functional can be used to define classical, quantum, closed, and open dynamics in a generalization of the variational principle and in the path integral formalism in classical and quantum dynamics, respectively. These schemes are based on an unusual feature, a formal redoubling of the degrees of freedom. Several arguments to motivate the redoubling are put forward in classical and quantum mechanics to demonstrate that such a formalism is natural.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-28
# 代数集合を用いた半クリフォードゲートの特性評価 Characterising semi-Clifford gates using algebraic sets ( http://arxiv.org/abs/2309.15184v2 ) ライセンス: Link先を確認	Imin Chen, Nadish de Silva,	(参考訳) フォールトトレラント量子計算において中心的な役割を果たすことから、クリフォード階層の第3レベルのゲートの集合と、その特別な部分集合である「ほぼ対角」な半クリフォードゲートについて研究する。クリフォード階層ゲートは、適切なマジック状態が与えられればゲートテレポーテーションによって実装できる。フォールトトレランスを達成するために必要なこれらの資源状態の膨大な量は、普遍量子コンピュータの実用化にとって大きなボトルネックである。半クリフォードゲートはこれらの資源状態をはるかに効率的に利用して実装できるため重要である。最大2キューディットの第3レベルのゲートはすべて半クリフォードであることを証明する。これにより、qubit の場合における Zeng-Chen-Chuang (2008) の結果と、qutrit の場合における第2著者 (2020) の結果を、任意の素次元 $d$ のキューディットの場合に一般化する。以前の結果は網羅的な計算に頼っていたが、本研究では代数幾何学のツールを活用している。具体的には、第3レベルのクリフォード階層ゲートと第3レベルの半クリフォードゲートの集合に対応する2つのスキームを構築する。次に、これらのスキームを modulo $d$ で還元して得られる2つの代数集合が、同じ有理点の集合を共有することを示す。 Motivated by their central role in fault-tolerant quantum computation, we study the sets of gates of the third-level of the Clifford hierarchy and their distinguished subsets of 'nearly diagonal' semi-Clifford gates. The Clifford hierarchy gates can be implemented via gate teleportation given appropriate magic states. The vast quantity of these resource states required for achieving fault-tolerance is a significant bottleneck for the practical realisation of universal quantum computers. Semi-Clifford gates are important because they can be implemented with far more efficient use of these resource states. We prove that every third-level gate of up to two qudits is semi-Clifford. We thus generalise results of Zeng-Chen-Chuang (2008) in the qubit case and of the second author (2020) in the qutrit case to the case of qudits of arbitrary prime dimension $d$. Earlier results relied on exhaustive computations whereas our present work leverages tools of algebraic geometry. Specifically, we construct two schemes corresponding to the sets of third-level Clifford hierarchy gates and third-level semi-Clifford gates. We then show that the two algebraic sets resulting from reducing these schemes modulo $d$ share the same set of rational points.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-28
# ニューラルネットワークの理論と実践の乖離について:NTK視点の限界 On the Disconnect Between Theory and Practice of Neural Networks: Limits of the NTK Perspective ( http://arxiv.org/abs/2310.00137v2 ) ライセンス: Link先を確認	Jonathan Wenger, Felix Dangel, Agustinus Kristiadi,	(参考訳) ニューラル・タンジェント・カーネル(NTK)は、大規模ニューラルネットワークの振る舞いを記述する理論的枠組みとして注目されている。カーネル法は理論的によく理解されており、その結果としてアルゴリズム上の利点を享受する。こうした利点は、幅の広い合成ニューラルネットワークアーキテクチャで成り立つことが実証できる。これらの利点には、高速な最適化、信頼性のある不確実性定量化、継続学習の改善などがある。しかしながら、カーネル領域への収束率を定量化する現在の結果は、これらの利点を活用するには、深さよりも桁違いに幅の広いアーキテクチャが必要であることを示唆している。この仮定は、実際に使用されるアーキテクチャがNTKの予測する振る舞いを示さないのではないかという懸念を生む。本稿では,この極限領域が大きな幅を持つアーキテクチャの実用上重要な振る舞いを予測するかどうかを実証的に検証し,NTKに関する先行研究を補完する。我々の結果は、複数の領域にわたってそうではないことを示している。観測された理論と実践の乖離は、NTK理論がアーキテクチャとアルゴリズムの選択にどの程度影響を与えるべきかという疑問をさらに投げかける。 The neural tangent kernel (NTK) has garnered significant attention as a theoretical framework for describing the behavior of large-scale neural networks. Kernel methods are theoretically well-understood and as a result enjoy algorithmic benefits, which can be demonstrated to hold in wide synthetic neural network architectures. These advantages include faster optimization, reliable uncertainty quantification and improved continual learning. However, current results quantifying the rate of convergence to the kernel regime suggest that exploiting these benefits requires architectures that are orders of magnitude wider than they are deep. This assumption raises concerns that architectures used in practice do not exhibit behaviors as predicted by the NTK. Here, we supplement previous work on the NTK by empirically investigating whether the limiting regime predicts practically relevant behavior of large-width architectures. Our results demonstrate that this is not the case across multiple domains. This observed disconnect between theory and practice further calls into question to what degree NTK theory should inform architectural and algorithmic choices.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-28
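For a finite network, the NTK discussed above is the Gram matrix of parameter gradients, K(x, x') = ⟨∇θ f(x), ∇θ f(x')⟩. The sketch below computes this empirical NTK in closed form for a hypothetical one-hidden-layer ReLU network f(x) = v · relu(Wx)/√m in the NTK parameterization. The architecture is an assumption chosen for brevity and is not one of the architectures studied in the paper.

```python
import numpy as np

def empirical_ntk(X, W, v):
    """Empirical NTK Gram matrix of f(x) = v . relu(W x) / sqrt(m).

    X: (n, d) inputs, W: (m, d) hidden weights, v: (m,) output weights.
    Returns K with K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>,
    where theta = (W, v).
    """
    m = W.shape[0]
    pre = X @ W.T                        # (n, m) pre-activations w_k . x_i
    act = np.maximum(pre, 0.0)           # relu(w_k . x_i)
    mask = (pre > 0).astype(float)       # relu'(w_k . x_i)
    # df/dv_k = relu(w_k . x) / sqrt(m)
    K_v = act @ act.T / m
    # df/dw_k = v_k * relu'(w_k . x) * x / sqrt(m)
    K_w = ((mask * v**2) @ mask.T) * (X @ X.T) / m
    return K_v + K_w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 3))
    for m in (10, 10_000):
        W = rng.normal(size=(m, 3))
        v = rng.normal(size=m)
        # At large width the kernel at initialization concentrates around its
        # infinite-width limit; at small width it fluctuates from draw to draw.
        print(m, np.round(np.diag(empirical_ntk(X, W, v)), 3))
```

Comparing such kernels before and after training is one way to check whether a given width is in the regime the NTK describes.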
# DeepHGCN: 効率的でスケーラブルな深層双曲グラフ畳み込みネットワークのレシピ DeepHGCN: Recipe for Efficient and Scalable Deep Hyperbolic Graph Convolutional Networks ( http://arxiv.org/abs/2310.02027v3 ) ライセンス: Link先を確認	Jiaxu Liu, Xinping Yi, Xiaowei Huang,	(参考訳) 双曲グラフ畳み込みネットワーク (HGCN) は階層グラフから情報を抽出する大きな可能性を示している。しかし、既存のHGCNは、高コストな双曲演算と、深さの増加に伴う過平滑化の問題のために、浅いアーキテクチャに限られている。GCNでは過平滑化を緩和する対処法が適用されてきたが、演算を双曲的な性質に適合するよう慎重に設計する必要があるため、双曲的な対処法の開発には固有の課題がある。以上の課題に対処するため,本研究では,計算効率を劇的に改善し,過平滑化効果を大幅に軽減した,最初の深層多層HGCNアーキテクチャであるDeepHGCNを提案する。DeepHGCNは深層HGCNを実現する2つの鍵を提示する。(1)高速かつ高精度な線形写像を可能にする新しい双曲的特徴変換層,(2)効率的な双曲的中点法によって支えられる,双曲的残差接続や重みと特徴の両方に対する正則化といった手法である。広範な実験により、DeepHGCNはユークリッドおよび浅い双曲GCNの変種と比較して、リンク予測とノード分類のタスクで大幅な改善を得ることが示されている。 Hyperbolic graph convolutional networks (HGCN) have demonstrated significant potential in extracting information from hierarchical graphs. However, existing HGCNs are limited to shallow architectures, due to the expensive hyperbolic operations and the over-smoothing issue as depth increases. Although in GCNs, treatments have been applied to alleviate over-smoothing, developing a hyperbolic therapy presents distinct challenges since operations should be carefully designed to fit the hyperbolic nature. Addressing the above challenges, in this work, we propose DeepHGCN, the first deep multi-layer HGCN architecture with dramatically improved computational efficiency and substantially alleviated over-smoothing effect. DeepHGCN presents two key enablers of deep HGCNs: (1) a novel hyperbolic feature transformation layer that enables fast and accurate linear maps; and (2) techniques such as hyperbolic residual connections and regularization for both weights and features facilitated by an efficient hyperbolic midpoint method. Extensive experiments demonstrate that DeepHGCN obtains significant improvements in link prediction and node classification tasks compared to both Euclidean and shallow hyperbolic GCN variants.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-28
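The abstract credits DeepHGCN's residual connections and regularization to "an efficient hyperbolic midpoint method" but does not say which one. One standard closed-form choice is the Lorentzian centroid in the hyperboloid model: take the Euclidean weighted sum of the points, then rescale it back onto the hyperboloid. The sketch below shows that construction, assuming curvature −1. It is an illustration of the general idea, not necessarily DeepHGCN's exact method.

```python
import numpy as np

def lorentz_inner(x, y):
    """Minkowski inner product <x, y>_L = -x_0 y_0 + sum_i x_i y_i (last axis)."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def lift_to_hyperboloid(u):
    """Map Euclidean vectors u (..., d) onto the hyperboloid <x, x>_L = -1, x_0 > 0."""
    x0 = np.sqrt(1.0 + np.sum(u * u, axis=-1, keepdims=True))
    return np.concatenate([x0, u], axis=-1)

def lorentz_centroid(X, w=None):
    """Weighted Lorentzian centroid of points X (n, d+1) on the unit hyperboloid.

    Curvature -1 is assumed. With non-negative weights the weighted sum stays
    future-timelike, so dividing by sqrt(|<s, s>_L|) puts it back on the hyperboloid.
    """
    n = X.shape[0]
    w = np.full(n, 1.0 / n) if w is None else np.asarray(w, dtype=float)
    s = w @ X                                   # Euclidean weighted sum, shape (d+1,)
    return s / np.sqrt(np.abs(lorentz_inner(s, s)))

if __name__ == "__main__":
    pts = lift_to_hyperboloid(np.array([[0.5, 0.0], [0.0, 2.0], [-1.0, -1.0]]))
    c = lorentz_centroid(pts)
    print(c, lorentz_inner(c, c))  # second value is -1 up to rounding
```

The centroid costs one weighted sum and one normalization. This is the kind of cheap operation that makes stacking many hyperbolic layers practical.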
# 予測不確実性のためのモデル非依存な変数重要度:エントロピーに基づくアプローチ Model-agnostic variable importance for predictive uncertainty: an entropy-based approach ( http://arxiv.org/abs/2310.12842v2 ) ライセンス: Link先を確認	Danny Wood, Theodore Papamarkou, Matt Benatan, Richard Allmendinger,	(参考訳) 機械学習アルゴリズムの予測を信頼するには,これらの予測に寄与する要因を理解する必要がある。確率論的かつ不確実性を考慮したモデルの場合、予測自体の理由だけでなく、モデルがその予測にどの程度の確信を持つかの理由も理解する必要がある。本稿では、既存の説明可能性の手法を不確実性を考慮したモデルに拡張する方法と、そのような拡張を用いてモデルの予測分布における不確実性の原因を理解する方法を示す。特に、置換特徴量重要度、部分依存プロット、個別条件付き期待値プロットを適応させることで、モデルの振る舞いに対する新たな洞察が得られることを示す。さらに、これらの手法を用いて、予測分布のエントロピーと、その分布の下での正解ラベルの対数尤度の両方に対する特徴量の影響を測定できることを示す。合成データと実世界のデータの両方を用いた実験により、不確実性の原因とそれがモデル性能に与える影響の両方を理解するうえでの、これらの手法の有用性を実証する。 In order to trust the predictions of a machine learning algorithm, it is necessary to understand the factors that contribute to those predictions. In the case of probabilistic and uncertainty-aware models, it is necessary to understand not only the reasons for the predictions themselves, but also the reasons for the model's level of confidence in those predictions. In this paper, we show how existing methods in explainability can be extended to uncertainty-aware models and how such extensions can be used to understand the sources of uncertainty in a model's predictive distribution. In particular, by adapting permutation feature importance, partial dependence plots, and individual conditional expectation plots, we demonstrate that novel insights into model behaviour may be obtained and that these methods can be used to measure the impact of features on both the entropy of the predictive distribution and the log-likelihood of the ground truth labels under that distribution. With experiments using both synthetic and real-world data, we demonstrate the utility of these approaches to understand both the sources of uncertainty and their impact on model performance.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-28
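The abstract adapts permutation feature importance so that it measures changes in predictive entropy and in the log-likelihood of the true labels. The sketch below shows that idea for a binary classifier. It assumes a `predict_proba` callable that returns positive-class probabilities, and the function names are hypothetical. It is a generic illustration, not the authors' implementation.

```python
import numpy as np

def predictive_entropy(p):
    """Binary predictive entropy (nats) for positive-class probabilities p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def log_likelihood(p, y):
    """Per-example log-likelihood of binary labels y under probabilities p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return y * np.log(p) + (1 - y) * np.log(1 - p)

def permutation_uncertainty_importance(predict_proba, X, y, n_repeats=10, seed=0):
    """Average change in mean entropy and mean log-likelihood when each
    feature column is permuted (predict_proba is an assumed interface)."""
    rng = np.random.default_rng(seed)
    p0 = predict_proba(X)
    base_H = predictive_entropy(p0).mean()
    base_LL = log_likelihood(p0, y).mean()
    d_H = np.zeros(X.shape[1])
    d_LL = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between feature j and the rest
            p = predict_proba(Xp)
            d_H[j] += predictive_entropy(p).mean() - base_H
            d_LL[j] += log_likelihood(p, y).mean() - base_LL
    return d_H / n_repeats, d_LL / n_repeats

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 2))
    y = (X[:, 0] > 0).astype(float)
    model = lambda Z: 1.0 / (1.0 + np.exp(-4.0 * Z[:, 0]))   # toy model that ignores feature 1
    print(permutation_uncertainty_importance(model, X, y))
```

In this toy example the model depends on a single feature. Permuting that feature therefore only reorders the predictions: the mean-entropy change is essentially zero, while the log-likelihood drops sharply. Entropy-based importance becomes informative once features interact, which is one reason to report both quantities.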
# GEO: ジェネレーティブエンジン最適化 GEO: Generative Engine Optimization ( http://arxiv.org/abs/2311.09735v2 ) ライセンス: Link先を確認	Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik R Narasimhan, Ameet Deshpande,	(参考訳) 大規模言語モデル (LLM) の出現は、生成モデルを用いて情報を収集・要約し、ユーザクエリに応答するという新たなパラダイムの検索エンジンをもたらした。我々はこの新技術を生成エンジン(GE)という統一的な枠組みのもとで形式化する。GEは正確でパーソナライズされた応答を生成でき、GoogleやBingのような従来の検索エンジンを急速に置き換えつつある。生成エンジンは通常、複数のソースから情報を合成し、LLMを使ってそれらを要約することでクエリに応える。この変化はユーザの効用と生成検索エンジンのトラフィックを大幅に向上させる一方で、第3の利害関係者であるWebサイトやコンテンツクリエイターにとって大きな課題となる。生成エンジンのブラックボックス性と変化の速さを考えると、コンテンツクリエイターは自身のコンテンツがいつ、どのように表示されるかをほとんど、あるいはまったくコントロールできない。生成エンジンが定着する以上、クリエイター経済が不利益を被らないようにしなければならない。これを解決するために、生成エンジン最適化(GEO)を導入する。GEOは、可視性指標を定義・最適化するための柔軟なブラックボックス最適化フレームワークを通じて、GEの応答におけるコンテンツの可視性の向上を支援する、コンテンツクリエイターのための初の新しいパラダイムである。複数のドメインにわたる多様なユーザクエリと、それらのクエリに答えるための関連Webソースからなる大規模ベンチマークGEO-benchを導入し,体系的な評価を容易にする。厳密な評価により,GEOはGEの応答における可視性を最大40%向上できることを示す。さらに、これらの戦略の有効性はドメインによって異なることを示し、ドメイン固有の最適化手法の必要性を強調する。本研究は情報発見システムにおける新たなフロンティアを開き、GEの開発者とコンテンツクリエイターの両方に大きな影響をもたらす。 The advent of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. This emerging technology, which we formalize under the unified framework of generative engines (GEs), can generate accurate and personalized responses, rapidly replacing traditional search engines like Google and Bing. Generative Engines typically satisfy queries by synthesizing information from multiple sources and summarizing them using LLMs. While this shift significantly improves user utility and generative search engine traffic, it poses a huge challenge for the third stakeholder - website and content creators. Given the black-box and fast-moving nature of generative engines, content creators have little to no control over when and how their content is displayed. With generative engines here to stay, we must ensure the creator economy is not disadvantaged. To address this, we introduce Generative Engine Optimization (GEO), the first novel paradigm to aid content creators in improving their content visibility in GE responses through a flexible black-box optimization framework for optimizing and defining visibility metrics. We facilitate systematic evaluation by introducing GEO-bench, a large-scale benchmark of diverse user queries across multiple domains, along with relevant web sources to answer these queries. Through rigorous evaluation, we demonstrate that GEO can boost visibility by up to 40% in GE responses. Moreover, we show the efficacy of these strategies varies across domains, underscoring the need for domain-specific optimization methods. Our work opens a new frontier in information discovery systems, with profound implications for both developers of GEs and content creators.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-28
# Clifford+T格子手術の直接コンパイルを用いた実用的量子回路実行の現実的コスト Realistic Cost to Execute Practical Quantum Circuits using Direct Clifford+T Lattice Surgery Compilation ( http://arxiv.org/abs/2311.10686v3 ) ライセンス: Link先を確認	Tyler LeBlond, Christopher Dean, George Watkins, Ryan S. Bennink,	(参考訳) 本稿では,Clifford+Tゲートセットを用いて表現された量子回路を、表面符号の格子手術命令セットに明示的にコンパイルする資源推定パイプラインについて報告する。コンパイルされた回路からのマジック状態要求のケイデンスにより、事後分析においてマジック状態の蒸留と貯蔵の要件を最適化できる。論理回路を格子手術操作にコンパイルするために,オープンソースのLattice Surgery Compilerを基盤とする。改良したコンパイラは2段階で動作する。第1段階は論理ゲートを抽象的でレイアウトに依存しない命令セットに変換する。第2段階はそれらを局所的な格子手術命令にコンパイルし、指定された資源レイアウトに従ってハードウェアタイルに割り当てる。第2段階は論理的並列性を維持しつつ、フォールトトレラント層における資源競合を回避し、現実性を高める。さらに、ユーザはマジック状態が補充される専用タイルを指定でき、論理計算の資源コストをマジック状態の蒸留と貯蔵から独立に考慮できる。分子の基底状態推定のための資源推定を示すことで,大規模で実用的な量子回路へのパイプラインの適用性を実証する。実回路ではマジック状態の消費速度が変動するため、生産量をそれに合わせて変化させない限り、マジック状態の貯蔵にかかる資源コストが支配的になりうることがわかった。 We report a resource estimation pipeline that explicitly compiles quantum circuits expressed using the Clifford+T gate set into a surface code lattice surgery instruction set. The cadence of magic state requests from the compiled circuit enables the optimization of magic state distillation and storage requirements in a post-hoc analysis. To compile logical circuits into lattice surgery operations, we build upon the open-source Lattice Surgery Compiler. The revised compiler operates in two stages: the first translates logical gates into an abstract, layout-independent instruction set; the second compiles these into local lattice surgery instructions that are allocated to hardware tiles according to a specified resource layout. The second stage retains logical parallelism while avoiding resource contention in the fault-tolerant layer, aiding realism. Additionally, users can specify dedicated tiles at which magic states are replenished, enabling resource costs from the logical computation to be considered independently from magic state distillation and storage. We demonstrate the applicability of our pipeline to large, practical quantum circuits by providing resource estimates for the ground state estimation of molecules. We find that variable magic state consumption rates in real circuits can cause the resource costs of magic state storage to dominate unless production is varied to suit.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-28
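The abstract warns that bursty magic-state demand can make storage costs dominate unless production is varied to match. The toy model below illustrates that trade-off with a constant production rate and a per-step demand sequence. It is a hypothetical simplification written for illustration: it is not the authors' pipeline or the Lattice Surgery Compiler, and the demand numbers are made up.

```python
from typing import List, Tuple

def simulate_buffer(demand: List[int], rate: int) -> Tuple[int, int]:
    """Toy magic-state buffer (hypothetical model).

    Each time step produces `rate` states; step t then requests demand[t]
    states plus any earlier unmet requests. Returns
    (peak number of states held over between steps,
     total request-steps spent waiting, i.e. a crude stall count).
    """
    stored = 0
    backlog = 0
    peak = 0
    stalled = 0
    for d in demand:
        stored += rate
        want = d + backlog
        served = min(want, stored)
        stored -= served
        backlog = want - served      # unmet requests wait for the next step
        stalled += backlog
        peak = max(peak, stored)
    return peak, stalled

if __name__ == "__main__":
    bursty = [0, 0, 0, 8, 0, 0, 0, 8]       # made-up request cadence
    print(simulate_buffer(bursty, rate=2))   # (6, 0): no stalls, but up to 6 idle states stored
    print(simulate_buffer(bursty, rate=1))   # (3, 18): less storage, many stalled requests
```

With a fixed rate, the toy model either holds many idle states (storage cost) or stalls the circuit. Matching production to the request cadence avoids both, which is the qualitative point made in the abstract.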
# 解析可解モデルにおけるページ曲線の絡み合いダイナミクス Page curve entanglement dynamics in an analytically solvable model ( http://arxiv.org/abs/2311.18045v3 ) ライセンス: Link先を確認	Stefan Kehrein,	(参考訳) ブラックホールの絡み合いエントロピーはページ曲線に従うと期待されている。時間とともに最初は線形に増加した後、絡み合いエントロピーはページ時間で最大に達し、その後減少するはずである。本論文では、このようなページ曲線を明示的に示す自由フェルミオンの厳密に解けるモデルを導入する。すなわち、絡み合いエントロピーは体積則で飽和するのではなく、後期には漸近的に消失する。ページ曲線の下降には、粒子流と絡み合い生成の間の半古典的な関係の破綻、絡み合いハミルトニアンにおける量子相転移、および$q\rightarrow\infty$のレニーエントロピーの非解析的な振る舞いが伴う。これらの観測は、ここで解析した厳密に解けるモデルを超えて、より広いクラスの系にも当てはまると期待される。 The entanglement entropy of black holes is expected to follow the Page curve. After an initial linear increase with time the entanglement entropy should reach a maximum at the Page time and then decrease. This paper introduces an exactly solvable model of free fermions that explicitly shows such a Page curve: The entanglement entropy vanishes asymptotically for late times instead of saturating at a volume law. The bending down of the Page curve is accompanied by a breakdown of the semiclassical connection between particle current and entanglement generation, a quantum phase transition in the entanglement Hamiltonian and non-analytic behavior of the $q\rightarrow\infty$ Renyi entropy. These observations are expected to hold for a larger class of systems beyond the exactly solvable model analyzed here.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-28
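For free-fermion (Gaussian) states like those in the model above, the entanglement entropy of a region A follows from the eigenvalues ν of the correlation matrix C_ij = ⟨c_i† c_j⟩ restricted to A: S = −Σ[ν ln ν + (1−ν) ln(1−ν)]. This is a standard identity. The sketch below applies it to a Slater determinant given by its occupied orbitals. It does not reproduce the paper's specific model or its time evolution.

```python
import numpy as np

def entanglement_entropy(orbitals, subsystem):
    """Von Neumann entanglement entropy (nats) of a free-fermion Slater determinant.

    orbitals: (L, N) array whose orthonormal columns are the occupied modes.
    subsystem: site indices of region A.
    """
    # Hermitian correlation matrix; equal to <c_i^dagger c_j> up to a transpose,
    # which leaves the eigenvalues unchanged.
    C = orbitals @ orbitals.conj().T
    C_A = C[np.ix_(subsystem, subsystem)]
    nu = np.linalg.eigvalsh(C_A)
    nu = nu[(nu > 1e-12) & (nu < 1 - 1e-12)]    # eigenvalues 0 or 1 contribute nothing
    return float(-np.sum(nu * np.log(nu) + (1 - nu) * np.log(1 - nu)))

if __name__ == "__main__":
    # One fermion shared equally between two sites; region A = site 0.
    phi = np.array([[1.0], [1.0]]) / np.sqrt(2.0)
    print(entanglement_entropy(phi, [0]), np.log(2))  # both equal ln 2
```

Because the cost is one small eigenvalue problem per time step, this formula is what makes entanglement dynamics in free-fermion models tractable for large systems.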
# テクスチャ生成のためのフィールド潜在変数を用いた単一メッシュ拡散モデル Single Mesh Diffusion Models with Field Latents for Texture Generation ( http://arxiv.org/abs/2312.09250v3 ) ライセンス: Link先を確認	Thomas W. Mitchel, Carlos Esteves, Ameesh Makadia,	(参考訳) 高品質なテクスチャを合成することを目的として、3次元形状の表面上で直接動作する内在的潜在拡散モデルの枠組みを導入する。提案手法は2つの貢献に支えられている。1つは、メッシュ頂点上の離散ベクトル場としてテクスチャを符号化する潜在表現であるフィールド潜在変数である。もう1つは、表面上の学習された潜在空間において拡散過程のノイズ除去を学習するフィールド潜在拡散モデルである。我々は単一テクスチャメッシュのパラダイムを考え、モデルはメッシュ上の与えられたテクスチャのバリエーションを生成するように訓練される。合成されたテクスチャは,既存の単一テクスチャメッシュ生成モデルによるものと比べて優れた忠実度を示す。我々のモデルは、インペインティングやラベル誘導生成などのユーザ制御による編集タスクにも適応できる。提案手法の有効性は,等長変換のもとでの枠組みの同変性に一部起因している。これにより、局所的に類似した領域にわたって細部をシームレスに再現でき、生成的テクスチャ転送という概念への扉が開かれる。 We introduce a framework for intrinsic latent diffusion models operating directly on the surfaces of 3D shapes, with the goal of synthesizing high-quality textures. Our approach is underpinned by two contributions: field latents, a latent representation encoding textures as discrete vector fields on the mesh vertices, and field latent diffusion models, which learn to denoise a diffusion process in the learned latent space on the surface. We consider a single-textured-mesh paradigm, where our models are trained to generate variations of a given texture on a mesh. We show the synthesized textures are of superior fidelity compared to those from existing single-textured-mesh generative models. Our models can also be adapted for user-controlled editing tasks such as inpainting and label-guided generation. The efficacy of our approach is due in part to the equivariance of our proposed framework under isometries, allowing our models to seamlessly reproduce details across locally similar regions and opening the door to a notion of generative texture transfer.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-28
# テキスト-画像拡散モデルのための正則化ニュートン・ラフソン反転 Regularized Newton Raphson Inversion for Text-to-Image Diffusion Models ( http://arxiv.org/abs/2312.12540v2 ) ライセンス: Link先を確認	Dvir Samuel, Barak Meiri, Nir Darshan, Shai Avidan, Gal Chechik, Rami Ben-Ari,	(参考訳) 拡散反転とは、画像とそれを記述するテキストプロンプトが与えられたとき、その画像を生成するノイズ潜在変数を見つける問題である。現在の反転手法の多くは暗黙の方程式を近似的に解くことで動作し、収束が遅かったり、再構成画像の品質が低くなったりすることがある。そこで我々は,この問題を暗黙の方程式の根を求める問題として定式化し,それを効率的に解く手法を設計する。我々の解法は、数値解析においてよく知られた手法であるニュートン・ラフソン法(NR)に基づいている。NRを素朴に適用すると計算的に実行不可能な場合があり、誤った解に収束しやすい。高品質な再構成を与える解に素早く収束する、効率的な正則化定式化を示す。また,反転プロセス中のプロンプト条件付けに起因する不整合の原因を特定する。この不整合は反転品質を大きく低下させる。この問題に対処するため、エンコーディングのプロンプトを考慮した調整を導入し、この問題を効果的に修正する。我々の解法である正則化ニュートン・ラフソン反転(Regularized Newton-Raphson Inversion)は、潜在一貫性モデルにおいて画像を0.5秒以内に反転させ、インタラクティブな画像編集への扉を開く。さらに、画像補間と希少物体の生成において改善された結果を示す。 Diffusion inversion is the problem of taking an image and a text prompt that describes it and finding a noise latent that would generate the image. Most current inversion techniques operate by approximately solving an implicit equation and may converge slowly or yield poor reconstructed images. Here, we formulate the problem as finding the roots of an implicit equation and design a method to solve it efficiently. Our solution is based on Newton-Raphson (NR), a well-known technique in numerical analysis. A naive application of NR may be computationally infeasible and tends to converge to incorrect solutions. We describe an efficient regularized formulation that converges quickly to a solution that provides high-quality reconstructions. We also identify a source of inconsistency stemming from prompt conditioning during the inversion process, which significantly degrades the inversion quality. To address this, we introduce a prompt-aware adjustment of the encoding, effectively correcting this issue. Our solution, Regularized Newton-Raphson Inversion, inverts an image within 0.5 sec for latent consistency models, opening the door for interactive image editing. We further demonstrate improved results in image interpolation and generation of rare objects.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-28
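The abstract frames inversion as root finding for an implicit equation solved with a regularized Newton-Raphson method. The sketch below applies a damped Newton-Raphson iteration to the residual r(z) = z − g(z). It assumes a finite-difference Jacobian, a damping term (J + λI)δ = −r, and a toy g = cos standing in for the diffusion update. The damping form and the toy g are illustrative assumptions, not the paper's regularizer or model.

```python
import numpy as np

def numerical_jacobian(r, z, eps=1e-6):
    """Central finite-difference Jacobian of the residual r at z."""
    n = z.size
    J = np.empty((n, n))
    for k in range(n):
        e = np.zeros(n)
        e[k] = eps
        J[:, k] = (r(z + e) - r(z - e)) / (2 * eps)
    return J

def damped_newton(g, z0, lam=1e-3, tol=1e-10, max_iter=50):
    """Solve the implicit equation z = g(z) by damped Newton-Raphson on r(z) = z - g(z).

    The damping (J + lam * I) is an assumed stabilizer, not the paper's regularizer.
    """
    r = lambda z: z - g(z)
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(max_iter):
        res = r(z)
        if np.linalg.norm(res) < tol:
            break
        J = numerical_jacobian(r, z)
        z = z + np.linalg.solve(J + lam * np.eye(z.size), -res)
    return z

if __name__ == "__main__":
    # Toy implicit equation z = cos(z), solved elementwise; the root is about 0.739085.
    print(damped_newton(np.cos, np.zeros(3)))
```

Compared with a plain fixed-point iteration z ← g(z), the Newton step uses the local Jacobian and typically converges in a handful of iterations. The damping keeps each linear solve well conditioned.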
# ソフトウェアデブロートツールの広範な比較評価 A Broad Comparative Evaluation of Software Debloating Tools ( http://arxiv.org/abs/2312.13274v2 ) ライセンス: Link先を確認	Michael D. Brown, Adam Meily, Brian Fairservice, Akshay Sood, Jonathan Dorn, Eric Kilmer, Ronald Eytchison,	(参考訳) ソフトウェアデブロートツールは、bloatと呼ばれる不要なコードを削除することで、プログラムのセキュリティと性能を向上させようとするものである。多くの手法が提案されているが、その採用にはいくつかの障壁が浮かび上がっている。すなわち、デブロートツールは高度に特化しているため、導入を検討する者が自身のニーズに合った種類のツールを見つけることは難しい。確立された評価指標やツール間の比較評価が欠けていることが、これをさらに困難にしている。この情報の欠落を埋めるため、10年間にわたるデブロートの文献と、現在商用開発中のいくつかのツールを調査し、デブロートのエコシステムに関する知識を分類した。次に,10個のデブロートツールの相対的な長所と短所を明らかにするため,それらの広範な比較評価を行った。20個の多様なベンチマークプログラムで行った評価では、性能、セキュリティ、正しさに関する12の指標でツールを測定した。評価からは、デブロートの文献における一般的な見方と矛盾する、いくつかの懸念すべき知見が浮かび上がった。第一に、デブロートツールは実世界のソフトウェアで使用するために必要な成熟度を欠いている。中・高複雑度のベンチマークについて実用に耐えるデブロート版を作成できた全体の成功率が、わずか21%であることがそれを裏付けている。第二に、デブロートツールは健全で堅牢なプログラムを生成するのに苦労している。新たな差分ファジングツールであるDIFFERを用いたところ、健全で堅牢なデブロート済みプログラムを生成したのは、デブロートの試みのうちわずか13%であった。最後に,結果は,デブロートツールが一般にデブロート済みプログラムの性能やセキュリティ態勢を大きく改善しないことを示している。本論文の貢献は、導入を検討する者がツールの全体像をよりよく理解する助けとなり、より有能なデブロートツールの今後の研究開発を動機づけると考える。この目的のために、ベンチマークセット、データ、カスタムツールを公開した。 Software debloating tools seek to improve the program security and performance by removing unnecessary code, called bloat. While many techniques have been proposed, several barriers to their adoption have emerged. Namely, debloating tools are highly specialized, making it difficult for adopters to find the right type of tool for their needs. This is further hindered by a lack of established metrics and comparative evaluations between tools. To close this information gap, we surveyed 10 years of debloating literature and several tools currently under commercial development to taxonomize knowledge about the debloating ecosystem. We then conducted a broad comparative evaluation of 10 debloating tools to determine their relative strengths and weaknesses. Our evaluation, conducted on a diverse set of 20 benchmark programs, measures tools across 12 performance, security, and correctness metrics. Our evaluation surfaces several concerning findings that contradict the prevailing narrative in debloating literature. First, debloating tools lack the required maturity to be used on real-world software, evidenced by a slim 21% overall success rate for creating passable debloated versions of medium- and high-complexity benchmarks. Second, debloating tools struggle to produce sound and robust programs. Using our novel differential fuzzing tool, DIFFER, we discovered that only 13% of our debloating attempts produced a sound and robust debloated program. Finally, our results indicate that debloating tools typically do not improve the performance or security posture of debloated programs by a significant degree. We believe that our contributions in this paper will help potential adopters better understand the landscape of tools and will motivate future research and development of more capable debloating tools. To this end, we have made our benchmark set, data, and custom tools publicly available.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-28
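The soundness check above relies on differential fuzzing, which feeds the same inputs to the original and the debloated program and flags any disagreement. The paper's interface to DIFFER is not described here, so the sketch below shows only the generic idea. Python callables stand in for programs, and the toy "programs" and input generator are hypothetical.

```python
import random
from typing import Callable, List, Tuple

def differential_test(original: Callable[[str], str],
                      debloated: Callable[[str], str],
                      gen_input: Callable[[random.Random], str],
                      trials: int = 1000,
                      seed: int = 0) -> List[Tuple[str, str, str]]:
    """Feed identical random inputs to both versions and collect disagreements.

    Exceptions are recorded as '<ExceptionName>' so crashes count as divergences.
    """
    rng = random.Random(seed)
    divergences = []
    for _ in range(trials):
        x = gen_input(rng)
        outs = []
        for prog in (original, debloated):
            try:
                outs.append(prog(x))
            except Exception as exc:
                outs.append(f"<{type(exc).__name__}>")
        if outs[0] != outs[1]:
            divergences.append((x, outs[0], outs[1]))
    return divergences

# Hypothetical toy "programs": the debloated versions should keep only `upper`.
def original_prog(cmd: str) -> str:
    op, _, arg = cmd.partition(" ")
    if op == "upper":
        return arg.upper()
    if op == "lower":
        return arg.lower()
    raise ValueError(op)

def debloated_ok(cmd: str) -> str:
    op, _, arg = cmd.partition(" ")
    if op == "upper":
        return arg.upper()
    raise ValueError(op)            # `lower` removed on purpose

def debloated_buggy(cmd: str) -> str:
    op, _, arg = cmd.partition(" ")
    if op == "upper":
        return arg.upper().strip()  # accidental behaviour change
    raise ValueError(op)

def gen_upper(rng: random.Random) -> str:
    """Inputs restricted to the feature that must survive debloating."""
    return "upper " + "".join(rng.choice("ab ") for _ in range(rng.randint(0, 5)))

if __name__ == "__main__":
    print(len(differential_test(original_prog, debloated_ok, gen_upper)))     # 0
    print(len(differential_test(original_prog, debloated_buggy, gen_upper)))  # > 0
```

Inputs are drawn only from the features meant to be kept. A divergence therefore signals an unsound debloating rather than the intended removal of a feature.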
# コンセプト・ボトルネック・モデルは局所性に従うか? Do Concept Bottleneck Models Obey Locality? ( http://arxiv.org/abs/2401.01259v2 ) ライセンス: Link先を確認	Naveen Raman, Mateo Espinosa Zarlenga, Juyeon Heo, Mateja Jamnik,	(参考訳) 概念に基づく手法は、人間が理解可能な概念を用いてモデルの予測を説明する。これらのモデルには正確な概念予測器が必要だが、既存の概念予測器がその基礎となる概念に忠実であるかは明らかではない。本稿では,一般的な概念ベースのアーキテクチャのファミリーであるコンセプト・ボトルネック・モデル(CBM)の忠実性を,データセットにおける「局所性」を尊重するかどうかという観点から調べる。局所性とは、概念の値を予測する際に関連する特徴量のみを用いることである。局所性が考慮されない場合、概念は疑似相関する特徴量に基づいて予測される可能性があり、性能と堅牢性が低下する。本研究では,モデル入力を摂動させたときにCBMの予測がどのように変化するかを調べる。その結果,独立な概念が重なりのない特徴量部分集合に局在している場合でさえ,CBMが局所性を捉えない可能性があることを明らかにする。我々の実験的および理論的結果は、相関した概念を持つデータセットが、正確ではあるが解釈できず、局所性を学習できないモデルにつながりうることを示している。全体として、CBMは時に疑似特徴量に依存するため、その解釈可能性は脆弱であることがわかり、概念予測器の堅牢性に関するさらなる研究が必要である。 Concept-based methods explain model predictions using human-understandable concepts. These models require accurate concept predictors, yet the faithfulness of existing concept predictors to their underlying concepts is unclear. In this paper, we investigate the faithfulness of Concept Bottleneck Models (CBMs), a popular family of concept-based architectures, by looking at whether they respect "localities" in datasets. Localities involve using only relevant features when predicting a concept's value. When localities are not considered, concepts may be predicted based on spuriously correlated features, degrading performance and robustness. This work examines how CBM predictions change when perturbing model inputs, and reveals that CBMs may not capture localities, even when independent concepts are localised to non-overlapping feature subsets. Our empirical and theoretical results demonstrate that datasets with correlated concepts may lead to accurate but uninterpretable models that fail to learn localities. Overall, we find that CBM interpretability is fragile, as CBMs occasionally rely upon spurious features, necessitating further research into the robustness of concept predictors.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-28
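The locality question above can be probed by perturbing only the features believed to be irrelevant to a concept and measuring how far the concept prediction moves. A predictor that respects locality should not move at all. The sketch below is a hypothetical illustration of that probe, using Gaussian noise and toy logistic concept predictors. It is not the paper's exact experimental protocol.

```python
import numpy as np

def locality_violation(concept_prob, X, relevant, noise=1.0, n_repeats=20, seed=0):
    """Mean absolute change in a concept's predicted probability when only
    features *outside* `relevant` are perturbed with Gaussian noise.

    A predictor that respects locality should give a value close to 0.
    """
    rng = np.random.default_rng(seed)
    irrelevant = np.setdiff1d(np.arange(X.shape[1]), relevant)
    base = concept_prob(X)
    total = 0.0
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, irrelevant] += noise * rng.normal(size=(X.shape[0], irrelevant.size))
        total += np.mean(np.abs(concept_prob(Xp) - base))
    return total / n_repeats

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    local = lambda Z: sig(Z[:, 0] + Z[:, 1])                 # uses only features 0 and 1
    leaky = lambda Z: sig(Z[:, 0] + Z[:, 1] + 0.5 * Z[:, 3])  # also reads feature 3
    print(locality_violation(local, X, relevant=[0, 1]))     # 0.0
    print(locality_violation(leaky, X, relevant=[0, 1]))     # clearly positive
```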
# 2次元量子多体基底状態のバンバン準備--2次元テンソルネットワークを用いたアルゴリズムの最適化 Bang-bang preparation of quantum many-body ground states in two dimensions: optimization of the algorithm with a two-dimensional tensor network ( http://arxiv.org/abs/2401.09158v3 ) ライセンス: Link先を確認	Yintai Zhang, Jacek Dziarmaga,	(参考訳) バンバン(BB)アルゴリズムは、初期積状態を$H_1$と$H_2$で交互に時間発展させることによって、2次元(2D)量子多体ハミルトニアン$H=H_1+H_2$の基底状態を準備する。近傍テンソル更新を用いて、BB時間発展を無限ペアエンタングル射影状態(iPEPS)でシミュレートする。交代シーケンスは、最終エネルギーをコスト関数として最適化される。エネルギーは、その安定性のために接空間法で計算される。この手法を、量子臨界点近傍の2次元横磁場量子イジングモデルにおいて、iPEPSの変分最適化により得られた基底状態と比較してベンチマークする。最適BBシーケンスは、基底状態の量子アニーリングまたは断熱的準備(AP)をシミュレートするシーケンスとは非摂動的に異なる。最適BBエネルギーは、バン数に対して最適APエネルギーよりもはるかに速く収束する。 A bang-bang (BB) algorithm prepares the ground state of a two-dimensional (2D) quantum many-body Hamiltonian $H=H_1+H_2$ by evolving an initial product state alternating between $H_1$ and $H_2$. We use the neighborhood tensor update to simulate the BB evolution with an infinite pair-entangled projected state (iPEPS). The alternating sequence is optimized with the final energy as a cost function. The energy is calculated with the tangent space methods for the sake of their stability. The method is benchmarked in the 2D transverse field quantum Ising model near its quantum critical point against a ground state obtained by variational optimization of the iPEPS. The optimal BB sequence differs non-perturbatively from a sequence simulating quantum annealing or adiabatic preparation (AP) of the ground state. The optimal BB energy converges with the number of bangs much faster than the optimal AP energy.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-28
# 部分グロモフ・ワッサーシュタイン計量 Partial Gromov-Wasserstein Metric ( http://arxiv.org/abs/2402.03664v2 ) ライセンス: Link先を確認	Yikun Bai, Rocio Diaz Martin, Abihith Kothapalli, Hengrong Du, Xinran Liu, Soheil Kolouri,	(参考訳) 近年、Gromov-Wasserstein(GW)距離は、異なる距離空間上の測度の比較を可能にすることから、機械学習コミュニティで関心が高まっている。古典的なGW問題の等質量要件による制限を克服するため、研究者たちはアンバランスな設定での応用を探り始めている。しかし、アンバランスGW (UGW) は、2つの距離測度空間 (mm空間) の間の厳密な距離ではなく、差異(discrepancy)としかみなせない。本稿では,部分グロモフ・ワッサーシュタイン(PGW)と呼ぶUGW問題の特別な場合を提案する。PGWがmm空間間のwell-definedな距離であることを示し、PGW問題の最小化元の存在やPGWとGWの関係などの理論的性質を議論する。次に、PGW問題を解くためのFrank-Wolfeアルゴリズムの2つの変種を提案し、それらが数学的にも計算的にも等価であることを示す。さらに、PGW距離に基づいて、mm空間に対する重心(バリセンター)の類似概念を導入する。最後に,形状マッチング,形状検索,形状補間などの応用においてPGW距離と関連ソルバの有効性を検証し,既存のベースラインと比較する。 The Gromov-Wasserstein (GW) distance has gained increasing interest in the machine learning community in recent years, as it allows for the comparison of measures in different metric spaces. To overcome the limitations imposed by the equal mass requirements of the classical GW problem, researchers have begun exploring its application in unbalanced settings. However, Unbalanced GW (UGW) can only be regarded as a discrepancy rather than a rigorous metric/distance between two metric measure spaces (mm-spaces). In this paper, we propose a particular case of the UGW problem, termed Partial Gromov-Wasserstein (PGW). We establish that PGW is a well-defined metric between mm-spaces and discuss its theoretical properties, including the existence of a minimizer for the PGW problem and the relationship between PGW and GW, among others. We then propose two variants of the Frank-Wolfe algorithm for solving the PGW problem and show that they are mathematically and computationally equivalent. Moreover, based on our PGW metric, we introduce the analogous concept of barycenters for mm-spaces. Finally, we validate the effectiveness of our PGW metric and related solvers in applications such as shape matching, shape retrieval, and shape interpolation, comparing them against existing baselines.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-28
# 材料探索のためのLLMを冷静に見る:分子上のベイズ最適化に本当に役立つのか? A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules? ( http://arxiv.org/abs/2402.05015v2 ) ライセンス: Link先を確認	Agustinus Kristiadi, Felix Strieth-Kalthoff, Marta Skreta, Pascal Poupart, Alán Aspuru-Guzik, Geoff Pleiss,	(参考訳) 自動化は現代の材料探索の礎の1つである。ベイズ最適化(BO)はそのようなワークフローに不可欠な要素であり、科学者は事前のドメイン知識を活用して大規模な分子空間を効率的に探索することができる。このような事前知識は多くの形をとりうるが、大規模言語モデル(LLM)に内包された付随的な科学的知識には大きな注目が集まっている。しかし、既存の研究はこれまでのところ、ヒューリスティックな材料探索のためのLLMを探っているにすぎない。実際、最近の研究は、点推定された非ベイズ的LLMから不確実性推定(BOに不可欠な要素)を得ている。本研究では,LLMが分子空間における原理的なベイズ最適化の加速に実際に役立つかどうかを検討する。我々はこの問いに冷静かつ公平な立場で答える。具体的には、次の2点によって慎重に検討する。(i) LLMを、標準的だが原理的なBOサロゲートモデルの固定特徴抽出器とみなす。(ii) パラメータ効率の良い微調整手法とベイズニューラルネットワークを活用して、LLMサロゲートの事後分布を得る。実世界の化学問題を用いた広範な実験により、LLMは分子上のBOに有用でありうるが、それはドメイン固有のデータで事前学習または微調整されている場合に限られることが示された。 Automation is one of the cornerstones of contemporary material discovery. Bayesian optimization (BO) is an essential part of such workflows, enabling scientists to leverage prior domain knowledge into efficient exploration of a large molecular space. While such prior knowledge can take many forms, there has been significant fanfare around the ancillary scientific knowledge encapsulated in large language models (LLMs). However, existing work thus far has only explored LLMs for heuristic materials searches. Indeed, recent work obtains the uncertainty estimate -- an integral part of BO -- from point-estimated, non-Bayesian LLMs. In this work, we study the question of whether LLMs are actually useful to accelerate principled Bayesian optimization in the molecular space. We take a sober, dispassionate stance in answering this question. This is done by carefully (i) viewing LLMs as fixed feature extractors for standard but principled BO surrogate models and by (ii) leveraging parameter-efficient finetuning methods and Bayesian neural networks to obtain the posterior of the LLM surrogate. Our extensive experiments with real-world chemistry problems show that LLMs can be useful for BO over molecules, but only if they have been pretrained or finetuned with domain-specific data.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-28
# ニュートリノ媒質のための一生に一度の遭遇モデル:コヒーレント振動からフレーバー平衡へ Once-in-a-lifetime encounter models for neutrino media: From coherent oscillations to flavor equilibration ( http://arxiv.org/abs/2402.05022v2 ) ライセンス: Link先を確認	Anson Kost, Lucas Johns, Huaiyu Duan,	(参考訳) 集団ニュートリノ振動は通常、平均場近似としても知られる最低次の量子運動論方程式を用いて研究される。しかし、最近のいくつかの量子多体シミュレーションは、ニュートリノ間の量子もつれが重要であり、ニュートリノガスのフレーバー平衡をもたらす可能性があることを示唆している。本研究では,任意の一対のニュートリノが一生のうちに高々1回しか相互作用できない、ニュートリノガスの新しい量子モデルを開発する。我々のモデルの主要なパラメータは$\gamma=\mu \Delta z$である。ここで$\mu$はニュートリノ密度に比例するニュートリノ結合強度、$\Delta z$は一対のニュートリノが毎回相互作用できる持続時間である。我々のモデルは極限$\gamma\to0$で平均場アプローチに帰着し、時間$t \gg (\gamma\mu)^{-1}$でフレーバー平衡に達する。これらのモデルは、粒子の観点からのコヒーレントなフレーバー振動の出現を示し、集団ニュートリノ振動における量子もつれの役割の解明に役立つ可能性がある。 Collective neutrino oscillations are typically studied using the lowest-order quantum kinetic equation, also known as the mean-field approximation. However, some recent quantum many-body simulations suggest that quantum entanglement among neutrinos may be important and may result in flavor equilibration of the neutrino gas. In this work, we develop new quantum models for neutrino gases in which any pair of neutrinos can interact at most once in their lifetimes. A key parameter of our models is $\gamma=\mu \Delta z$, where $\mu$ is the neutrino coupling strength, which is proportional to the neutrino density, and $\Delta z$ is the duration over which a pair of neutrinos can interact each time. Our models reduce to the mean-field approach in the limit $\gamma\to0$ and achieve flavor equilibration in time $t \gg (\gamma\mu)^{-1}$. These models demonstrate the emergence of coherent flavor oscillations from the particle perspective and may help elucidate the role of quantum entanglement in collective neutrino oscillations.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-28
# 一般化された選好最適化:オフラインアライメントへの統一的アプローチ Generalized Preference Optimization: A Unified Approach to Offline Alignment ( http://arxiv.org/abs/2402.05749v2 ) ライセンス: Link先を確認	Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot,	(参考訳) オフライン選好最適化により、オフラインデータから大規模モデルを直接微調整することが可能になり、最近のアライメントの実践で有効であることが証明されている。凸関数の一般的なクラスによってパラメータ化されるオフライン損失の族である、一般化選好最適化(GPO)を提案する。GPOは、DPO、IPO、SLiCといった既存のアルゴリズムを特別な場合として含み、新たな変種を自然に導入しながら、選好最適化に関する統一的な視点を与える。GPOフレームワークはまた、損失を定義する凸関数の設計を通じて、オフラインアルゴリズムがどのように正則化を実施するかにも光を当てる。解析と実験により、オフライン正則化と、正準的なRLHFの定式化が意図するKLダイバージェンス正則化との関連と微妙な違いが明らかになった。Gao et al. 2023 と同様の制御された設定において、異なるGPO変種は正則化と性能の間で同様のトレードオフを達成する。ただし理論が予測するように、ハイパーパラメータの最適値は異なりうることも示す。全体として、我々の結果はアライメントの実務者に新たなアルゴリズム上のツールキットと経験的な洞察を提供する。 Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices. We propose generalized preference optimization (GPO), a family of offline losses parameterized by a general class of convex functions. GPO enables a unified view over preference optimization, encompassing existing algorithms such as DPO, IPO and SLiC as special cases, while naturally introducing new variants. The GPO framework also sheds light on how offline algorithms enforce regularization, through the design of the convex function that defines the loss. Our analysis and experiments reveal the connections and subtle differences between the offline regularization and the KL divergence regularization intended by the canonical RLHF formulation. In a controlled setting akin to Gao et al 2023, we also show that different GPO variants achieve similar trade-offs between regularization and performance, though the optimal values of hyper-parameter might differ as predicted by theory. In all, our results present new algorithmic toolkits and empirical insights to alignment practitioners.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-28
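The GPO family applies a convex function f to a scaled log-ratio margin between the preferred response y_w and the dispreferred response y_l:

ρ = β[(log πθ(y_w|x) − log π_ref(y_w|x)) − (log πθ(y_l|x) − log π_ref(y_l|x))]

The abstract lists DPO, IPO and SLiC as special cases. The sketch below assumes the usual correspondence under this scaling: logistic log(1+e^(−ρ)) for DPO, squared (ρ−1)² for an IPO-style loss, and hinge max(0, 1−ρ) for SLiC. The exact constants in each original method may differ.

```python
import math

def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

# Convex f(rho); each choice gives one member of the offline-loss family.
GPO_FUNCTIONS = {
    "logistic": lambda rho: _softplus(-rho),      # DPO
    "squared": lambda rho: (rho - 1.0) ** 2,      # IPO-style (scaling assumed)
    "hinge": lambda rho: max(0.0, 1.0 - rho),     # SLiC
}

def gpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1, kind="logistic"):
    """GPO-style loss for one preference pair in which y_w is preferred to y_l.

    Arguments are sequence log-probabilities under the policy being trained
    (logp_*) and under the frozen reference policy (ref_logp_*).
    """
    rho = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return GPO_FUNCTIONS[kind](rho)

if __name__ == "__main__":
    for kind in GPO_FUNCTIONS:
        # Policy equal to the reference gives rho = 0 for every variant.
        print(kind, gpo_loss(-12.0, -15.0, -12.0, -15.0, kind=kind))
```

The choice of f shapes the implicit regularization. The logistic and hinge losses keep decreasing or flatten out as the margin ρ grows, whereas the squared loss is minimized at ρ = 1 and penalizes overshooting. This is one concrete way the convex function controls how far the policy may drift from the reference.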
# 学習と意思決定の分離:一階法によるオンライン資源配分における$\mathcal{O}(\sqrt{T})$の壁の突破 Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods ( http://arxiv.org/abs/2402.07108v2 ) ライセンス: Link先を確認	Wenzhi Gao, Chunlin Sun, Chenyu Xue, Dongdong Ge, Yinyu Ye,	(参考訳) オンライン線形プログラミングは、収益管理と資源配分の両方において重要な役割を担い、近年の研究は効率的な一階オンライン学習アルゴリズムの開発に重点を置いている。一階法の経験的な成功にもかかわらず、それらが達成するリグレットは一般に$\mathcal{O}(\sqrt{T})$より良くならず、これは最先端の線形プログラミング(LP)ベースのオンラインアルゴリズムが保証する$\mathcal{O}(\log T)$と比べて準最適である。本稿では,オンライン線形プログラミングに関するいくつかの重要な事実を示し,一階法に基づくオンラインアルゴリズムが$\mathcal{O}(\sqrt{T})$を超えるリグレットを達成することの難しさを明らかにする。この課題に対処するために、意思決定から学習を分離する新しいアルゴリズムフレームワークを導入する。この新しいフレームワークにより、一階法が$\mathcal{O}(T^{1/3})$のリグレットを達成できることを初めて示す。 Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O}(\sqrt{T})$, which is suboptimal compared to the $\mathcal{O}(\log T)$ bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes several important facts about online linear programming, which unveil the challenge for first-order-method-based online algorithms to achieve beyond $\mathcal{O}(\sqrt{T})$ regret. To address the challenge, we introduce a new algorithmic framework that decouples learning from decision-making. For the first time, we show that first-order methods can attain regret $\mathcal{O}(T^{1/3})$ with this new framework.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-28
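For context, the sketch below shows the kind of first-order method the abstract says is typically limited to $\mathcal{O}(\sqrt{T})$ regret. It is a dual-price rule: accept a request when its reward exceeds its priced resource cost, then take a projected subgradient step on the prices. This is a generic baseline written for illustration. It is not the paper's decoupled framework, and the function name and step size `eta` are hypothetical.

```python
def online_lp_dual_price(rewards, costs, budget, eta):
    """Online accept/reject with a projected dual-subgradient price update.

    rewards: list of r_t; costs: list of cost vectors a_t (one entry per resource);
    budget: initial resource vector b; eta: step size.
    Returns the 0/1 decisions and the final price vector.
    """
    T = len(rewards)
    m = len(budget)
    d = [b / T for b in budget]   # average per-step resource allowance b / T
    p = [0.0] * m                 # dual prices, kept nonnegative
    remaining = list(budget)
    decisions = []
    for r, a in zip(rewards, costs):
        priced_cost = sum(pi * ai for pi, ai in zip(p, a))
        fits = all(ai <= rem for ai, rem in zip(a, remaining))
        x = 1 if (r > priced_cost and fits) else 0
        decisions.append(x)
        for i in range(m):
            remaining[i] -= a[i] * x
            # Subgradient of the dual objective at p is d - a*x; step, then project onto p >= 0.
            p[i] = max(0.0, p[i] - eta * (d[i] - a[i] * x))
    return decisions, p
```

With four unit-reward, unit-cost requests and a budget of 2, the price rises while requests are accepted. Once the budget runs out, later requests are rejected.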
# 切り換え可能なメカニズムによる暗黙の因果表現学習 Implicit Causal Representation Learning via Switchable Mechanisms ( http://arxiv.org/abs/2402.11124v2 ) ライセンス: Link先を確認	Shayan Shirahmad Gale Bagi, Zahra Gharaee, Oliver Schulte, Mark Crowley,	(参考訳) 既知の真のグラフ構造がない状況で観測データと介入データから因果表現を学習するには,暗黙的な潜在因果表現学習が必要である。因果メカニズムの暗黙的な学習は通常、ハード介入とソフト介入という2種類の介入データを含む。現実のシナリオでは、ハード介入は完全に制御された環境を必要とするため、ソフト介入のほうが現実的であることが多い。因果変数に直接変化を強制するハード介入とは異なり、ソフト介入は因果メカニズムに影響を与えることによって間接的に作用する。しかし、ソフト介入の微妙さは因果モデルの学習にいくつかの課題を課す。1つの課題は、親関係がそのまま残るため、ソフト介入の効果が曖昧になることである。本稿では,暗黙的モデリングを維持しつつ,ソフト介入を用いた因果モデル学習の課題に取り組む。提案手法は,異なる因果メカニズムを切り替えるように設計された \textit{causal mechanism switch variable} を用いてソフト介入の効果をモデル化する。実験では,ベースラインアプローチと比較して,同定可能な因果表現の学習が一貫して改善することを観察した。 Learning causal representations from observational and interventional data in the absence of known ground-truth graph structures necessitates implicit latent causal representation learning. Implicit learning of causal mechanisms typically involves two categories of interventional data: hard and soft interventions. In real-world scenarios, soft interventions are often more realistic than hard interventions, as the latter require fully controlled environments. Unlike hard interventions, which directly force changes in a causal variable, soft interventions exert influence indirectly by affecting the causal mechanism. However, the subtlety of soft interventions imposes several challenges for learning causal models. One challenge is that the effects of soft interventions are ambiguous, since parental relations remain intact. In this paper, we tackle the challenges of learning causal models using soft interventions while retaining implicit modeling. Our approach models the effects of soft interventions by employing a \textit{causal mechanism switch variable} designed to toggle between different causal mechanisms. In our experiments, we consistently observe improved learning of identifiable, causal representations, compared to baseline approaches.	翻訳日:2024-05-30 23:21:18 公開日:2024-05-28
# PandoraのWhite-Box:大規模言語モデルにおける精密トレーニングデータの検出と抽出 Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models ( http://arxiv.org/abs/2402.17012v2 ) ライセンス: Link先を確認	Jeffrey G. Wang, Jason Wang, Marvin Li, Seth Neel,	(参考訳) 本稿では,大規模言語モデル(LLM)に対する最先端のプライバシ攻撃を開発する。ここでは、モデルに何らかのアクセスを持つ攻撃者が、基礎となる訓練データについて何かを知ろうとする。我々の主な成果は、ベースライン攻撃の数百倍の性能を持つ、事前訓練されたLLMに対する新たなメンバシップ推論攻撃(MIA)と、自然な設定下で、微調整されたLLMから微調整データセットの50%以上(!)を抽出できることを示すパイプラインである。基礎となるモデル、事前学習および微調整データへの様々な程度のアクセスと、MIAと訓練データ抽出の両方について検討する。事前学習データについては,(次元削減された)モデル勾配に基づいて訓練データのメンバシップを予測する教師ありニューラルネットワーク分類器と,LLMに対する最近のモデル窃取の研究を活用し、モデルへのロジットアクセスのみを必要とするこの攻撃の変種という,2つの新しいMIAを提案する。私たちの知る限り、これはモデル窃取の情報を明示的に組み込んだ最初のMIAです。どちらの攻撃も既存のブラックボックスベースラインより優れており、我々の教師あり攻撃は、LLMに対するMIA攻撃の成功度と、他の機械学習モデルに対する既知の最強の攻撃とのギャップを埋める。微調整では, ベースモデルと微調整モデルの損失の比に基づく単純な攻撃により, ほぼ完全なMIA性能が得られることがわかった。さらにこのMIAを活用して、微調整されたPythiaおよびLlamaモデルから微調整データセットの大部分を抽出する。これらの結果を総合すると、MIAと訓練データ抽出の両面で、事前訓練および微調整されたLLMに対する既存の最強のプライバシ攻撃となり、それ自体が独立した科学的関心の対象であるとともに、LLMのセキュリティ、プライバシ、著作権問題に重要な実践的意味を持つ。 In this paper we develop state-of-the-art privacy attacks against Large Language Models (LLMs), where an adversary with some access to the model tries to learn something about the underlying training data. Our headline results are new membership inference attacks (MIAs) against pretrained LLMs that perform hundreds of times better than baseline attacks, and a pipeline showing that over 50% (!) of the fine-tuning dataset can be extracted from a fine-tuned LLM in natural settings. We consider varying degrees of access to the underlying model, pretraining and fine-tuning data, and both MIAs and training data extraction. For pretraining data, we propose two new MIAs: a supervised neural network classifier that predicts training data membership on the basis of (dimensionality-reduced) model gradients, as well as a variant of this attack that only requires logit access to the model which leverages recent model-stealing work on LLMs. To our knowledge this is the first MIA that explicitly incorporates model-stealing information. 
Both attacks outperform existing black-box baselines, and our supervised attack closes the gap between MIA attack success against LLMs and the strongest known attacks for other machine learning models. In fine-tuning, we find that a simple attack based on the ratio of the loss between the base and fine-tuned models is able to achieve near-perfect MIA performance; we then leverage our MIA to extract a large fraction of the fine-tuning dataset from fine-tuned Pythia and Llama models. Taken together, these results represent the strongest existing privacy attacks against both pretrained and fine-tuned LLMs for MIAs and training data extraction, which are of independent scientific interest and have important practical implications for LLM security, privacy, and copyright issues.	翻訳日:2024-05-30 23:21:17 公開日:2024-05-28
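The fine-tuning attack described above scores each candidate example by the ratio of two losses: its loss under the fine-tuned model and its loss under the base model. A minimal sketch follows. It assumes the per-example losses (for example, mean token negative log-likelihood) are computed elsewhere, and `threshold` is a hypothetical value one would calibrate on held-out data.

```python
def loss_ratio_scores(ft_losses, base_losses):
    """Per-example score: fine-tuned loss divided by base-model loss.

    A low ratio means fine-tuning lowered the loss on this example far more
    than on a typical unseen example, suggesting membership.
    """
    return [ft / base for ft, base in zip(ft_losses, base_losses)]

def predict_members(ft_losses, base_losses, threshold):
    """Flag examples whose loss ratio falls below a calibrated threshold."""
    return [s < threshold for s in loss_ratio_scores(ft_losses, base_losses)]
```

For example, `predict_members([0.5, 2.0], [2.0, 2.1], threshold=0.8)` flags only the first example. Its loss dropped to a quarter of the base model's loss, while the second barely changed.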
# 多平面光変換器を用いた高次元量子鍵配送 High-dimensional quantum key distribution using a multi-plane light converter ( http://arxiv.org/abs/2403.04210v2 ) ライセンス: Link先を確認	Ohad Lib, Kfir Sulimany, Mateus Araújo, Michael Ben-Or, Yaron Bromberg,	(参考訳) 高次元量子鍵配送(QKD)は、2次元の場合に比べて高い情報容量と強いノイズ耐性を提供する。しかし、これらの利点は、必要な高次元の測定と変換の実現が困難であることによってしばしば妨げられる。本稿では,大規模マルチプレーン光コンバータ(MPLC)を実装し,QKDのための空間モードの高次元モードソータとしてプログラムする。MPLCを用いて、6つの相互不偏基底による5次元QKDと、2つの相互不偏基底による25次元QKDを同一の実験装置で実証する。さらに,実験誤差に対して頑健な相互不偏基底の対の構成を提案する。その測定の複雑さは符号化次元の平方根でしかスケールしない。このアプローチは、より高次元でのQKD実装への道を開く。 High-dimensional quantum key distribution (QKD) offers higher information capacity and stronger resilience to noise compared to its binary counterpart. However, these advantages are often hindered by the difficulty of realizing the required high-dimensional measurements and transformations. Here, we implement a large-scale multi-plane light converter (MPLC) and program it as a high-dimensional mode sorter of spatial modes for QKD. Using the MPLC, we demonstrate five-dimensional QKD with six mutually unbiased bases and 25-dimensional QKD with two mutually unbiased bases in the same experimental setup. Furthermore, we propose a construction of pairs of mutually unbiased bases that are robust to experimental errors, with measurement complexity scaling only with the square root of the encoded dimension. This approach paves the way for QKD implementations in higher dimensions.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-28
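The abstract pairs five dimensions with six mutually unbiased bases, which matches the standard fact that a prime dimension $d$ admits $d+1$ such bases. Below is a minimal sketch of the classical quadratic-phase construction for an odd prime $d$, with a helper for checking that overlaps between different bases all equal $1/d$. This is a textbook construction. It is not the error-robust pairs proposed in the paper, and it says nothing about how the MPLC is programmed.

```python
import cmath
import math

def mubs_prime(d):
    """Return d + 1 mutually unbiased bases of C^d for an odd prime d.

    Basis 0 is the computational basis. For k = 0..d-1, basis k + 1 has vectors
    v_{k,m}[j] = omega**(k*j*j + m*j) / sqrt(d), with omega = exp(2*pi*i/d).
    """
    omega = cmath.exp(2j * math.pi / d)
    bases = [[[1.0 if j == m else 0.0 for j in range(d)] for m in range(d)]]
    for k in range(d):
        basis = [[omega ** ((k * j * j + m * j) % d) / math.sqrt(d) for j in range(d)]
                 for m in range(d)]
        bases.append(basis)
    return bases

def overlap_sq(u, v):
    """|<u|v>|^2 for vectors given as lists of real or complex numbers."""
    return abs(sum(a.conjugate() * b for a, b in zip(u, v))) ** 2
```

The construction needs $d$ to be an odd prime, so it does not cover the 25-dimensional case. For just two bases in any dimension, the computational and Fourier bases ($k=0$ above) are already mutually unbiased.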
# 未知のファインチューニング例が言語モデルの幻覚のあり方を制御する Unfamiliar Finetuning Examples Control How Language Models Hallucinate ( http://arxiv.org/abs/2403.05612v2 ) ライセンス: Link先を確認	Katie Kang, Eric Wallace, Claire Tomlin, Aviral Kumar, Sergey Levine,	(参考訳) 大規模言語モデルは、馴染みのないクエリに直面すると幻覚を起こすことが知られているが、モデルがどのように幻覚を起こすかを支配する基盤となるメカニズムは、まだ完全には理解されていない。この研究では、ベースモデルの知識の範囲を超える概念を導入する、モデルの微調整データ中の見慣れない例が、これらの誤りを形作るうえで決定的に重要であることを見出した。特に、LLMの幻覚的な予測は、見慣れない微調整例に関連付けられた応答を反映する傾向にある。これは、見慣れない微調整例をどのように教師付けするかを変えることで、見慣れないクエリに対するモデルの応答に影響を与えうること(例:「わかりません」と答えさせる)を示唆している。TriviaQAおよびMMLUにおけるSFT、RL、報酬モデルの微調整を含む一連の制御実験において, この観察を実証的に検証した。本研究はさらに、長文のモデル生成の事実性を改善するためのRLファインチューニング戦略を調べる。その結果、報酬モデルの幻覚はRLによる事実性微調整の効果を著しく損ないうるが、報酬モデルがどのように幻覚を起こすかを戦略的に制御することで、こうした悪影響を最小化できることが判明した。幻覚の制御に関するこれまでの知見を活かし、より信頼性の高い報酬モデルを学ぶためのアプローチを提案し、それが長文の伝記や書籍・映画のあらすじ生成タスクにおけるRL事実性微調整の有効性を向上させることを示す。 Large language models are known to hallucinate when faced with unfamiliar queries, but the underlying mechanisms that govern how models hallucinate are not yet fully understood. In this work, we find that unfamiliar examples in the models' finetuning data -- those that introduce concepts beyond the base model's scope of knowledge -- are crucial in shaping these errors. In particular, we find that an LLM's hallucinated predictions tend to mirror the responses associated with its unfamiliar finetuning examples. This suggests that by modifying how unfamiliar finetuning examples are supervised, we can influence a model's responses to unfamiliar queries (e.g., say ``I don't know''). We empirically validate this observation in a series of controlled experiments involving SFT, RL, and reward model finetuning on TriviaQA and MMLU. Our work further investigates RL finetuning strategies for improving the factuality of long-form model generations. 
We find that, while hallucinations from the reward model can significantly undermine the effectiveness of RL factuality finetuning, strategically controlling how reward models hallucinate can minimize these negative effects. Leveraging our previous observations on controlling hallucinations, we propose an approach for learning more reliable reward models, and show that they improve the efficacy of RL factuality finetuning in long-form biography and book/movie plot generation tasks.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-28
# 一般化占有モデルによる転移可能な強化学習 Transferable Reinforcement Learning via Generalized Occupancy Models ( http://arxiv.org/abs/2403.06328v2 ) ライセンス: Link先を確認	Chuning Zhu, Xinqi Wang, Tyler Han, Simon S. Du, Abhishek Gupta,	(参考訳) 知的エージェントは、様々なタスクに迅速に適応できるジェネラリストでなければならない。強化学習(RL)において、モデルベースRLは世界のダイナミクスモデルを学習し、原理的には計画を通じて任意の報酬関数への転移を可能にする。しかし、自己回帰的なモデルロールアウトは誤差の累積に悩まされ、モデルベースRLは長期の問題には有効ではない。後継特徴(successor features)は、方策の長期的な状態占有をモデル化し、新しいタスクの下での方策評価を線形な報酬回帰に帰着させることで代替手段を提供する。しかし、後継特徴による方策改善は難しい場合がある。本研究は、定常データセットから後継特徴の分布を学習するとともに、異なる後継特徴を実現するように行動する方策を学習する新しいモデルのクラス、一般化占有モデル(GOM)を提案する。これらのモデルは任意の新しいタスクに対して最適な行動を素早く選択できる。データセットにおける長期的な結果を直接モデル化することで、GOMは誤差の累積を回避しつつ、報酬関数間の迅速な転移を可能にする。本稿では,拡散モデルを用いたGOMの実用的な実装を示し、様々なシミュレーションロボティクス問題において、転移可能なモデルの新たなクラスとしての有効性を理論的にも経験的にも示す。ビデオとコードはhttps://weirdlabuw.github.io/gom/。 Intelligent agents must be generalists, capable of quickly adapting to various tasks. In reinforcement learning (RL), model-based RL learns a dynamics model of the world, in principle enabling transfer to arbitrary reward functions through planning. However, autoregressive model rollouts suffer from compounding error, making model-based RL ineffective for long-horizon problems. Successor features offer an alternative by modeling a policy's long-term state occupancy, reducing policy evaluation under new tasks to linear reward regression. Yet, policy improvement with successor features can be challenging. This work proposes a novel class of models, i.e., generalized occupancy models (GOMs), that learn a distribution of successor features from a stationary dataset, along with a policy that acts to realize different successor features. These models can quickly select the optimal action for arbitrary new tasks. By directly modeling long-term outcomes in the dataset, GOMs avoid compounding error while enabling rapid transfer across reward functions. 
We present a practical instantiation of GOMs using diffusion models and show their efficacy as a new class of transferable models, both theoretically and empirically across various simulated robotics problems. Videos and code at https://weirdlabuw.github.io/gom/.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-28
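The abstract notes that successor features reduce policy evaluation on a new task to linear reward regression. Below is a minimal NumPy sketch of that step. It assumes a fixed feature map $\phi$ and precomputed successor features $\psi$, both passed in as plain arrays. It illustrates standard successor-feature evaluation, not the GOM model itself.

```python
import numpy as np

def fit_reward_weights(phi, rewards):
    """Least-squares weights w such that r(s) ~= phi(s) @ w.

    phi: (num_samples, num_features) feature matrix; rewards: (num_samples,).
    """
    w, *_ = np.linalg.lstsq(phi, rewards, rcond=None)
    return w

def evaluate_policy(psi, w):
    """Value estimates psi @ w, where each row of psi holds the successor
    features (expected discounted sum of future phi) of a state or state-action."""
    return psi @ w
```

Once `w` is fitted for a new reward, evaluating any policy whose successor features are known costs a single dot product, with no new rollouts.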
# 正則化非負スケール不変低ランク近似モデルのための効率的なアルゴリズム Efficient Algorithms for Regularized Nonnegative Scale-invariant Low-rank Approximation Models ( http://arxiv.org/abs/2403.18517v2 ) ライセンス: Link先を確認	Jeremy E. Cohen, Valentin Leplat,	(参考訳) スパース非負行列因子分解やスパース非負タッカー分解のような正則化非負低ランク近似は、解釈可能性を高めた次元削減モデルの重要な分野である。しかし、実用的な観点からは、これらのモデルが多因子的な性質を持ち、その選択を裏付ける理論が欠けているため、正則化項と正則化係数の選択、および効率的なアルゴリズムの設計は困難である。本稿ではこれらの課題の改善を目指す。同次正則化スケール不変(Homogeneous Regularized Scale-Invariant)と呼ぶより一般的なモデルを研究することにより、低ランク近似モデルに固有のスケール不変性が、予期せぬ有益な効果と有害な効果の両方を伴う暗黙的な正則化を引き起こすことを証明する。この観察により、低ランク近似モデルにおける正則化関数の効果をよりよく理解し、正則化ハイパーパラメータの選択を導き、専用最適化アルゴリズムの収束速度を高めるためのバランス戦略を設計することができる。これらの結果のいくつかはすでに知られていたが、正則化低ランク近似の特定の例に限定されていた。また、多くの正則化非負低ランク近似を扱う、収束保証付きの汎用的なMajorization Minimization(MM)アルゴリズムを導出する。我々は,スパース非負行列因子分解,リッジ正則化カノニカルポリアディック分解,スパース非負タッカー分解において我々の貢献を示す。 Regularized nonnegative low-rank approximations such as sparse Nonnegative Matrix Factorization or sparse Nonnegative Tucker Decomposition are an important branch of dimensionality reduction models with enhanced interpretability. However, from a practical perspective, the choice of regularizers and regularization coefficients, as well as the design of efficient algorithms, is challenging because of the multifactor nature of these models and the lack of theory to back these choices. This paper aims at improving upon these issues. By studying a more general model called the Homogeneous Regularized Scale-Invariant, we prove that the scale-invariance inherent to low-rank approximation models causes an implicit regularization with both unexpected beneficial and detrimental effects. This observation allows us to better understand the effect of regularization functions in low-rank approximation models, to guide the choice of the regularization hyperparameters, and to design balancing strategies to enhance the convergence speed of dedicated optimization algorithms. Some of these results were already known but restricted to specific instances of regularized low-rank approximations. 
We also derive a generic Majorization Minimization algorithm that handles many regularized nonnegative low-rank approximations, with convergence guarantees. We showcase our contributions on sparse Nonnegative Matrix Factorization, ridge-regularized Canonical Polyadic decomposition and sparse Nonnegative Tucker Decomposition.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-28
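Below is a minimal sketch of a majorization-minimization scheme for sparse NMF. It uses classical multiplicative updates for $\tfrac12\lVert X-WH\rVert_F^2+\lambda\lVert H\rVert_1+\tfrac{\mu}{2}\lVert W\rVert_F^2$, and it is not the authors' generic MM algorithm or their balancing strategy. The ridge term on $W$ is an assumption added here. With an $\ell_1$ penalty on $H$ alone, rescaling $W\to cW$, $H\to H/c$ leaves $WH$ unchanged while shrinking the penalty, an instance of the scale-invariance effect discussed above.

```python
import numpy as np

def objective(X, W, H, lam, mu):
    """0.5*||X - WH||_F^2 + lam*||H||_1 + 0.5*mu*||W||_F^2, assuming W, H >= 0."""
    return (0.5 * np.sum((X - W @ H) ** 2)
            + lam * np.sum(H)
            + 0.5 * mu * np.sum(W ** 2))

def sparse_nmf_mm(X, rank, lam=0.1, mu=0.1, iters=200, seed=0, eps=1e-12):
    """Alternating multiplicative updates. Each one minimizes a separable
    majorizer, so the objective does not increase (up to the tiny eps guard)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    history = [objective(X, W, H, lam, mu)]
    for _ in range(iters):
        # The l1 penalty on H adds the constant lam to the denominator.
        H *= (W.T @ X) / (W.T @ W @ H + lam + eps)
        # The ridge penalty on W adds mu * W to the denominator.
        W *= (X @ H.T) / (W @ H @ H.T + mu * W + eps)
        history.append(objective(X, W, H, lam, mu))
    return W, H, history
```

Multiplicative updates keep both factors nonnegative without any projection step, because every term in the ratios is nonnegative.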
# 各種人工知能を用いた血液検査パラメータに基づくCOVID-19検出 COVID-19 Detection Based on Blood Test Parameters using Various Artificial Intelligence Methods ( http://arxiv.org/abs/2404.02348v2 ) ライセンス: Link先を確認	Kavian Khanjani, Seyed Rasoul Hosseini, Hamid Taheri, Shahrzad Shashaani, Mohammad Teshnehlab,	(参考訳) 2019年、世界は新型コロナウイルスSARS-CoV-2によって引き起こされる新型コロナウイルス感染症(COVID-19)という新たな課題に直面した。ウイルスは世界中で急速に広まり、高い死亡率をもたらしたため、保健機関は感染抑制策を講じた。早期の疾患検出は治療プロセスにおいて不可欠であり、この取り組みを支援するためにコンピュータベースの自動検出システムが開発されている。これらのシステムは、機械学習、ニューラルネットワーク、ファジィシステム、深層学習といった人工知能(AI)のアプローチに依存して病気を分類することが多い。本研究は、自己分類分類器を用い、さまざまなAI手法を用いて、COVID-19患者とそれ以外の人を区別することを目的とした。本研究では、血液検査サンプルと放射線画像の2つのデータセットを使用した。サンラファエル病院で採取された、COVID-19患者と非COVID疾患患者の2クラスを含む血液検査サンプルに対する最良の結果は、アンサンブル法(ニューラルネットワークと2つの機械学習手法の組み合わせ)によって得られた。その結果、このCOVID-19診断アプローチは費用対効果が高く、他の方法よりも短時間で結果が得られることがわかった。提案モデルは、使用したデータセットで94.09%の精度を達成した。第2に、X線画像は、正常、ウイルス性肺炎、すりガラス陰影、COVID-19感染の4つのクラスに分けられた。これらはセグメンテーションと分類に使用された。肺葉を画像から抽出し、その後特定のクラスに分類した。画像データセットで91.1%の精度を達成した。総じて、この研究は、COVID-19の検出と管理におけるAIの可能性を強調し、この分野における継続的な研究開発の重要性を強調している。 In 2019, the world faced a new challenge: COVID-19, a disease caused by the novel coronavirus SARS-CoV-2. The virus rapidly spread across the globe, leading to a high rate of mortality, which prompted health organizations to take measures to control its transmission. Early disease detection is crucial in the treatment process, and computer-based automatic detection systems have been developed to aid in this effort. These systems often rely on artificial intelligence (AI) approaches such as machine learning, neural networks, fuzzy systems, and deep learning to classify diseases. This study aimed to differentiate COVID-19 patients from others using self-categorizing classifiers and employing various AI methods. This study used two datasets: the blood test samples and radiography images. 
The best results for the blood test samples obtained from San Raphael Hospital, which include two classes of individuals, those with COVID-19 and those with non-COVID diseases, were achieved through the use of the Ensemble method (a combination of a neural network and two machine learning methods). The results showed that this approach for COVID-19 diagnosis is cost-effective and provides results in a shorter amount of time than other methods. The proposed model achieved an accuracy of 94.09% on the dataset used. Secondly, the radiographic images were divided into four classes: normal, viral pneumonia, ground glass opacity, and COVID-19 infection. These were used for segmentation and classification. The lung lobes were extracted from the images and then categorized into specific classes. We achieved an accuracy of 91.1% on the image dataset. Generally, this study highlights the potential of AI in detecting and managing COVID-19 and underscores the importance of continued research and development in this field.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-28
# 接する交差によって生じる小さな回避交差に対する二準位断熱遷移確率 Two-level adiabatic transition probability for small avoided crossings generated by tangential intersections ( http://arxiv.org/abs/2404.17777v2 ) ライセンス: Link先を確認	Kenta Higuchi, Takuya Watanabe,	(参考訳) 本稿では,二つのパラメータ(断熱パラメータとエネルギーギャップパラメータ)がゼロに近づく極限の下で,二準位回避交差の遷移確率の漸近挙動を研究する。これは、接する交差によって回避交差が生成され、非断熱領域にある場合を扱った我々の以前の研究の続きである。主な結果は、遷移確率の漸近展開だけでなく、複数の回避交差によって引き起こされる量子干渉と、異なる消滅次数から生じる2パラメータ領域の共存も明らかにする。 In this paper, the asymptotic behaviors of the transition probability for two-level avoided crossings are studied under the limit where two parameters (adiabatic parameter and energy gap parameter) tend to zero. This is a continuation of our previous works where avoided crossings are generated by tangential intersections and obey a non-adiabatic regime. The main results elucidate not only the asymptotic expansion of transition probability but also a quantum interference caused by several avoided crossings and a coexistence of two-parameter regimes arising from different vanishing orders.	翻訳日:2024-05-30 22:42:17 公開日:2024-05-28
# 交互ミラー降下法のシンプレクティック解析 A Symplectic Analysis of Alternating Mirror Descent ( http://arxiv.org/abs/2405.03472v2 ) ライセンス: Link先を確認	Jonas Katona, Xiuyuan Wang, Andre Wibisono,	(参考訳) 双線形ゼロサムゲームに対する交互ミラー降下法(Alternating Mirror Descent, AMD)アルゴリズムの挙動の理解を動機として, シンプレクティック・オイラー法による連続時間ハミルトン流の離散化について研究する。ハミルトン力学、リー代数、シンプレクティック数値積分器の結果を用いた解析の枠組みを提供し、特にシンプレクティック・オイラー法における保存量である修正ハミルトニアン(MH)の存在と性質に重点を置く。元のハミルトニアンが二次関数であるとき、MHを閉形式で計算し、それがこの場合に以前から知られていた別の保存量とは一般に異なることを示す。ステップサイズの次数で打ち切ったMHに対する新たな誤差限界を反復回数$K$の関数として導出し、これらの限界を用いて、AMDについて改良された$\mathcal{O}(K^{1/5})$の総リグレット上界と、平均反復の$\mathcal{O}(K^{-4/5})$の双対性ギャップを示す。最後に、もし真であれば、任意の$\varepsilon>0$に対してAMDの総リグレットが$\mathcal{O}\left(K^{\varepsilon}\right)$、平均反復の双対性ギャップが$\mathcal{O}\left(K^{-1+\varepsilon}\right)$でスケールし、MHに関するある収束条件の下では$\varepsilon=0$ととれることを意味する予想を提案する。 Motivated by understanding the behavior of the Alternating Mirror Descent (AMD) algorithm for bilinear zero-sum games, we study the discretization of continuous-time Hamiltonian flow via the symplectic Euler method. We provide a framework for analysis using results from Hamiltonian dynamics, Lie algebra, and symplectic numerical integrators, with an emphasis on the existence and properties of a conserved quantity, the modified Hamiltonian (MH), for the symplectic Euler method. We compute the MH in closed-form when the original Hamiltonian is a quadratic function, and show that it generally differs from the other conserved quantity known previously in that case. We derive new error bounds on the MH when truncated at orders in the stepsize in terms of the number of iterations, $K$, and use these bounds to show an improved $\mathcal{O}(K^{1/5})$ total regret bound and an $\mathcal{O}(K^{-4/5})$ duality gap of the average iterates for AMD. 
Finally, we propose a conjecture which, if true, would imply that the total regret for AMD scales as $\mathcal{O}\left(K^{\varepsilon}\right)$ and the duality gap of the average iterates as $\mathcal{O}\left(K^{-1+\varepsilon}\right)$ for any $\varepsilon>0$, and we can take $\varepsilon=0$ upon certain convergence conditions for the MH.	翻訳日:2024-05-30 22:42:17 公開日:2024-05-28
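Below is a toy illustration of a conserved quantity for the symplectic Euler method, assuming the quadratic Hamiltonian $H(p,q)=\tfrac12(p^2+q^2)$. A short algebraic check shows that the update below exactly preserves $\tfrac12(p^2+q^2-hpq)$, while the original energy oscillates. With the Euclidean mirror map and no constraints, alternating gradient descent-ascent on $f(x,y)=xy$ produces this same map, with $x$ as $p$ and $y$ as $q$. We do not claim that this quantity has the normalization of the paper's modified Hamiltonian.

```python
def symplectic_euler(p, q, h, steps):
    """Symplectic Euler for H(p, q) = (p^2 + q^2) / 2: update p first, then q."""
    traj = [(p, q)]
    for _ in range(steps):
        p = p - h * q      # p_{n+1} = p_n - h * dH/dq(q_n)
        q = q + h * p      # q_{n+1} = q_n + h * dH/dp(p_{n+1})
        traj.append((p, q))
    return traj

def energy(p, q):
    """The original Hamiltonian; not conserved by the discrete map."""
    return 0.5 * (p * p + q * q)

def shifted_energy(p, q, h):
    """A quadratic form conserved exactly by the map above (up to rounding)."""
    return 0.5 * (p * p + q * q - h * p * q)
```

Starting from $(p,q)=(1,0)$ with $h=0.1$, the shifted quantity stays at $0.5$ to rounding error over a thousand steps. Meanwhile $H$ wobbles by roughly $0.05$, which is the kind of gap between the original and a modified Hamiltonian that the analysis above exploits.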
# 任意遅延下における異種目的関数のための非同期連合確率最適化 Asynchronous Federated Stochastic Optimization for Heterogeneous Objectives Under Arbitrary Delays ( http://arxiv.org/abs/2405.10123v2 ) ライセンス: Link先を確認	Charikleia Iakovidou, Kibaek Kim,	(参考訳) 連合学習(FL)は、中央サーバの調整の下で、複数の場所(「クライアント」)に保持されたデータを用いてモデルを安全に訓練するために最近提案された。FLアルゴリズムの性能を阻害する2つの大きな課題は、処理の遅いクライアント(ストラグラー)による長い訓練時間と、非IIDなローカルデータ分布の下でのモデル精度の低下(「クライアントドリフト」)である。本研究では,非同期通信を利用して収束を高速化し拡張性を向上させるとともに、クライアント更新頻度のばらつきによって生じるクライアントドリフトをクライアントメモリを用いて補正する、新しい確率的(劣)勾配アルゴリズムであるAsynchronous Exact Averaging (AREA) を提案・解析する。さらに、我々の知る限り、AREAは、遅延適応的なステップサイズを用いずに任意に長い遅延の下で収束が保証される最初の手法であり、(i) 強凸で滑らかな関数に対しては、その大きさが反復回数に関して用いられる確率勾配の分散のみに依存する誤差近傍に漸近的に収束し、(ii) 凸で非平滑な関数に対しては、個々のクライアント更新頻度の最小値(または最大値)ではなく平均に依存する定数倍を除いて、集中型確率劣勾配法の収束率に一致する。数値結果は理論解析を裏付け、特にクライアント数が増えるにつれて、ローカルデータが強く非IIDである場合にAREAが最先端の手法を上回ることを示している。 Federated learning (FL) was recently proposed to securely train models with data held over multiple locations ("clients") under the coordination of a central server. Two major challenges hindering the performance of FL algorithms are long training times caused by straggling clients, and a decline in model accuracy under non-iid local data distributions ("client drift"). In this work, we propose and analyze Asynchronous Exact Averaging (AREA), a new stochastic (sub)gradient algorithm that utilizes asynchronous communication to speed up convergence and enhance scalability, and employs client memory to correct the client drift caused by variations in client update frequencies. 
Moreover, AREA is, to the best of our knowledge, the first method that is guaranteed to converge under arbitrarily long delays, without the use of delay-adaptive stepsizes, and (i) for strongly convex, smooth functions, asymptotically converges to an error neighborhood whose size depends only on the variance of the stochastic gradients used with respect to the number of iterations, and (ii) for convex, non-smooth functions, matches the convergence rate of the centralized stochastic subgradient method up to a constant factor, which depends on the average of the individual client update frequencies instead of their minimum (or maximum). Our numerical results validate our theoretical analysis and indicate AREA outperforms state-of-the-art methods when local data are highly non-iid, especially as the number of clients grows.	翻訳日:2024-05-30 22:32:31 公開日:2024-05-28
# インフラストラクチャエンジニアリング: 研究エコシステムにおいて依然として欠けている、過小評価された役割 Infrastructure Engineering: A Still Missing, Undervalued Role in the Research Ecosystem ( http://arxiv.org/abs/2405.10473v2 ) ライセンス: Link先を確認	Vanessa Sochat,	(参考訳) 研究はますますソフトウェアに依存するようになり、ソフトウェアはバイオインフォマティクス、高性能コンピューティング、物理学、機械学習、人工知能などの原動力となっている。研究に用いられるソフトウェアや関連資産を直接開発するソフトウェア技術者であるリサーチソフトウェアエンジニアの地位向上については、かなりの進展があったが、研究インフラストラクチャとイノベーションの背後にある人材、すなわち、コンパイラと互換性ツールの開発、オーケストレーションとスケジューリングのインフラストラクチャ、開発者環境、コンテナ技術、ワークフローマネージャを担う人材にはほとんど関心が向けられていない。経済的なインセンティブがクラウドコンピューティングのさまざまなモデルへと向かい、両方の世界の長所を兼ね備えた新しいパラダイムを開発するための革新(「収束コンピューティング(converged computing)」と呼ばれる取り組み)が求められている今、そのような役割は単に望ましいだけでなく、科学の継続的な成功に不可欠である。非伝統的な職種に散在するスタッフが、この分野のいくつかの側面に取り組む時間を見出してはいるものの、より大きな人材層とそれを支えるインセンティブの欠如により、科学界は後れを取っている。この記事では、この欠落した層の重要性を強調し、インフラストラクチャエンジニアという役割の欠如が、科学の相互運用性、ポータビリティ、再現性においていかに非効率をもたらしてきたかを例示する。これらの技術に明示的に取り組む人材にリソースを割り当て、提供し、その人材を維持できなければ、我々の科学コミュニティの継続的な成功にとって最適でない未来につながる可能性があることを示唆する。 Research has become increasingly reliant on software, serving as the driving force behind bioinformatics, high performance computing, physics, machine learning and artificial intelligence, to name a few. While substantial progress has been made in advocating for the research software engineer, a kind of software engineer that typically works directly on software and associated assets that go into research, little attention has been placed on the workforce behind research infrastructure and innovation, namely compilers and compatibility tool development, orchestration and scheduling infrastructure, developer environments, container technologies, and workflow managers. As economic incentives move toward different models of cloud computing, and innovation is required to develop new paradigms that represent the best of both worlds (an effort called "converged computing"), such a role is not just ideal but essential for the continued success of science. 
While scattered staff in non-traditional roles have found time to work on some facets of this space, the lack of a larger workforce and incentive to support it has led to the scientific community falling behind. In this article we will highlight the importance of this missing layer, providing examples of how a missing role of infrastructure engineer has led to inefficiencies in the interoperability, portability, and reproducibility of science. We suggest that an inability to allocate, provide resources for, and sustain individuals to work explicitly on these technologies could lead to possible futures that are sub-optimal for the continued success of our scientific communities.	翻訳日:2024-05-30 22:32:31 公開日:2024-05-28
# 対話型協調計画獲得におけるマインドモデリング理論の限界 Limits of Theory of Mind Modelling in Dialogue-Based Collaborative Plan Acquisition ( http://arxiv.org/abs/2405.12621v2 ) ライセンス: Link先を確認	Matteo Bortoletto, Constantin Ruhdorfer, Adnen Abdessaied, Lei Shi, Andreas Bulling,	(参考訳) 対話型協調計画獲得(CPA)に関する最近の研究は、非対称なスキルセットと知識を持つ設定において、心の理論(ToM)モデリングが不足した知識予測を改善することを示唆している。 ToMは効果的なコラボレーションのために重要とされているが、この新しいタスクに対する実際の影響は未解明のままである。計画をグラフとして表現し、タスク固有の制約を活用することで、CPAのパフォーマンスが自分自身の不足した知識を予測するときにほぼ倍になるため、ToMモデリングによる改善は減少することを示す。この現象は、既存のベースライン法を評価する際にも持続する。 CPAにおけるToMの関連性をよりよく理解するために,本研究では,ToM機能の有無によるモデルの性能比較を原則的に報告する。異なるモデルとアブリゲーションにわたる結果は、学習されたToM機能は、ToMに知覚可能なリンクを伴わずに、データ内の遅延パターンを反映する可能性が高いことを一貫して示唆している。この発見は、CPA以降におけるToMの役割のより深い理解と、計算協調エージェントにおける精神状態のモデリングと評価のための新しい方法を要求する。 Recent work on dialogue-based collaborative plan acquisition (CPA) has suggested that Theory of Mind (ToM) modelling can improve missing knowledge prediction in settings with asymmetric skill-sets and knowledge. Although ToM was claimed to be important for effective collaboration, its real impact on this novel task remains under-explored. By representing plans as graphs and by exploiting task-specific constraints we show that, as performance on CPA nearly doubles when predicting one's own missing knowledge, the improvements due to ToM modelling diminish. This phenomenon persists even when evaluating existing baseline methods. To better understand the relevance of ToM for CPA, we report a principled performance comparison of models with and without ToM features. Results across different models and ablations consistently suggest that learned ToM features are indeed more likely to reflect latent patterns in the data with no perceivable link to ToM. This finding calls for a deeper understanding of the role of ToM in CPA and beyond, as well as new methods for modelling and evaluating mental states in computational collaborative agents.	翻訳日:2024-05-30 22:22:47 公開日:2024-05-28
# Pytorch-Wildlife: 保全のための協調的なディープラーニングフレームワーク Pytorch-Wildlife: A Collaborative Deep Learning Framework for Conservation ( http://arxiv.org/abs/2405.12930v2 ) ライセンス: Link先を確認	Andres Hernandez, Zhongqi Miao, Luisa Vargas, Rahul Dodhia, Juan Lavista,	(参考訳) 様々な要因によって引き起こされた世界の生物多様性の急激な減少は、大規模な野生生物モニタリングの緊急の必要性を浮き彫りにしている。これに対し、科学者は野生生物のモニタリングにおいて、データ処理のための自動化されたディープラーニング手法に目を向けた。しかし、これらの高度な手法を現実のシナリオに適用することは、その複雑さと専門知識の必要性により、主に技術的な課題と学際的障壁のために困難である。これらの課題に対処するために、PyTorch上に構築されたオープンソースのディープラーニングプラットフォームであるPytorch-Wildlifeを紹介します。強力なAIモデルの作成、修正、共有のために設計されている。このプラットフォームはユーザビリティとアクセシビリティを重視しており、技術的背景がほとんど、あるいはまったくない個人でも利用可能である。また、機能拡張とさらなる開発を簡単にするためのモジュール化されたコードベースも提供する。 Pytorch-Wildlifeは直感的でユーザフレンドリなインターフェースを提供し、画像やビデオの動物検出と分類のために、ローカルインストールまたはHugging Faceを通じてアクセスすることができる。現実世界の2つの応用として、Pytorch-Wildlifeは、アマゾン熱帯雨林での種認識のための動物分類モデルの訓練や、ガラパゴス諸島での侵略的外来種であるオポッサムの認識に利用されている。 Opossumモデルは98%の精度を達成し、Amazonモデルはデータの90%において36種類の動物を92%の精度で認識する。 Pytorch-Wildlifeが進化するにつれて、様々な環境問題に対処するため、より多くの保全タスクを統合することを目指しています。 Pytorch-Wildlifeはhttps://github.com/microsoft/CameraTrapsで公開されている。 The alarming decline in global biodiversity, driven by various factors, underscores the urgent need for large-scale wildlife monitoring. In response, scientists have turned to automated deep learning methods for data processing in wildlife monitoring. However, applying these advanced methods in real-world scenarios is challenging due to their complexity and the need for specialized knowledge, primarily because of technical challenges and interdisciplinary barriers. To address these challenges, we introduce Pytorch-Wildlife, an open-source deep learning platform built on PyTorch. It is designed for creating, modifying, and sharing powerful AI models. This platform emphasizes usability and accessibility, making it accessible to individuals with limited or no technical background. It also offers a modular codebase to simplify feature expansion and further development. 
Pytorch-Wildlife offers an intuitive, user-friendly interface, accessible through local installation or Hugging Face, for animal detection and classification in images and videos. As two real-world applications, Pytorch-Wildlife has been utilized to train animal classification models for species recognition in the Amazon Rainforest and for invasive opossum recognition in the Galapagos Islands. The Opossum model achieves 98% accuracy, and the Amazon model has 92% recognition accuracy for 36 animals in 90% of the data. As Pytorch-Wildlife evolves, we aim to integrate more conservation tasks, addressing various environmental challenges. Pytorch-Wildlife is available at https://github.com/microsoft/CameraTraps.	翻訳日:2024-05-30 22:22:47 公開日:2024-05-28
# FairLENS: 法執行音声認識における公正性の評価 FairLENS: Assessing Fairness in Law Enforcement Speech Recognition ( http://arxiv.org/abs/2405.13166v2 ) ライセンス: Link先を確認	Yicheng Wang, Mark Cusick, Mohamed Laila, Kate Puech, Zhengping Ji, Xia Hu, Michael Wilson, Noah Spitzer-Williams, Bryan Wheeler, Yasser Ibrahim,	(参考訳) 自動音声認識(ASR)技術は強力なツールとなり、法執行のシナリオにおける効率性を高めている。異なる音響環境における人口集団の公平性を確保するために、ASRエンジンは現実的な設定で様々な話者間でテストされなければならない。しかし、信頼性のあるモデル間の公平性の違いを説明することは依然として困難である。一方、ほとんどのパブリックなASRデータセットは満足のいく公正性評価を行うには不十分である。この制限に対処するため、系統的な公平性評価フレームワークであるFairLENSを構築しました。本研究では,異なるモデル間の公平さの相違を検証するための,新しい適応性評価手法を提案する。また、複数のシナリオと人口統計次元をカバーする公平性評価データセットも収集した。このフレームワークを活用することで、1つのオープンソースと11の商用利用可能な最先端のASRモデルに対して公平性の評価を行った。以上の結果から,特定の実世界のシナリオに対してASRモデルを選択する際に,ユーザが情報選択を行うためのフェアネスガイドラインとして機能するモデルが,他のモデルよりも多くのバイアスを示すことが明らかとなった。さらに、特定の人口集団に対するモデルバイアスについて検討し、音響領域の変化が新しいバイアスの出現につながることを観察した。 Automatic speech recognition (ASR) techniques have become powerful tools, enhancing efficiency in law enforcement scenarios. To ensure fairness for demographic groups in different acoustic environments, ASR engines must be tested across a variety of speakers in realistic settings. However, describing the fairness discrepancies between models with confidence remains a challenge. Meanwhile, most public ASR datasets are insufficient to perform a satisfying fairness evaluation. To address the limitations, we built FairLENS - a systematic fairness evaluation framework. We propose a novel and adaptable evaluation method to examine the fairness disparity between different models. We also collected a fairness evaluation dataset covering multiple scenarios and demographic dimensions. Leveraging this framework, we conducted fairness assessments on 1 open-source and 11 commercially available state-of-the-art ASR models. Our results reveal that certain models exhibit more biases than others, serving as a fairness guideline for users to make informed choices when selecting ASR models for a given real-world scenario. 
We further explored model biases towards specific demographic groups and observed that shifts in the acoustic domain can lead to the emergence of new biases.	翻訳日:2024-05-30 22:22:47 公開日:2024-05-28
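The abstract does not spell out FairLENS's own disparity test. As a hedged baseline only, the sketch below computes word error rate (WER) per demographic group and the gap between the best-served and worst-served groups. It is not the paper's method. The function names and the toy transcripts are hypothetical.

```python
from collections import defaultdict


def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref in enumerate(ref_words, start=1):
        cur = [i] + [0] * len(hyp_words)
        for j, hyp in enumerate(hyp_words, start=1):
            cur[j] = min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (ref != hyp),  # substitution (costs 0 if words match)
            )
        prev = cur
    return prev[-1]


def group_wer(samples):
    """samples: iterable of (group, reference, hypothesis) triples -> WER per group."""
    errors, words = defaultdict(int), defaultdict(int)
    for group, reference, hypothesis in samples:
        ref_words, hyp_words = reference.split(), hypothesis.split()
        errors[group] += word_edit_distance(ref_words, hyp_words)
        words[group] += len(ref_words)
    return {g: errors[g] / words[g] for g in errors}


def wer_gap(samples):
    """Simple disparity measure: worst-group WER minus best-group WER."""
    rates = group_wer(samples)
    return max(rates.values()) - min(rates.values())


samples = [
    ("group_a", "please step out of the vehicle", "please step out of the vehicle"),
    ("group_b", "please step out of the vehicle", "please step out the vehicle"),
]
print(group_wer(samples))
print(wer_gap(samples))
```

A raw gap like this carries no statistical confidence. Attaching confidence to such differences is the difficulty the abstract highlights.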
# CamemBERT-bioを用いた臨床記述における数値の多目的表現 Multi-objective Representation for Numbers in Clinical Narratives Using CamemBERT-bio ( http://arxiv.org/abs/2405.18448v1 ) ライセンス: Link先を確認	Boammani Aser Lompo, Thanh-Dung Le,	(参考訳) 本研究では、CamemBERT-bioを用いて、医療文書から抽出した数値を7つの異なる生理学的カテゴリーに分類することを目的とする。先行研究では、このようなタスクにおいてTransformerベースのモデルは従来のNLPモデルほどの性能を発揮しない可能性が示唆されていた。CamemBERT-bioの性能を向上させるために、キーワード埋め込みをモデルに組み込むことと、テキストからすべての数値データを除外する数値非依存の戦略を採用することという、2つの主要な工夫を導入する。ラベル埋め込み手法の実装によって注意機構が洗練され、「数値を見ない(numerical-blind)」データセットを用いる手法は文脈中心の学習を強化することを目的としている。本研究のもう一つの重要な要素は、抽出した数値データの臨界性を判定することである。そのために、値が確立された標準範囲内に収まるかどうかを検証するという単純な手法を用いた。結果は有望であり、CamemBERT-bioの有効性が大幅に向上し、F1スコア0.89で従来手法を上回った。これは従来手法の$F_1$スコア0.73に対して20%以上、最先端手法の$F_1$スコア0.82に対して9%以上の向上に相当する。これらはすべて、小規模で不均衡な訓練データセットを用いたにもかかわらず達成された。 This research aims to classify numerical values extracted from medical documents across seven distinct physiological categories, employing CamemBERT-bio. Previous studies suggested that transformer-based models might not perform as well as traditional NLP models in such tasks. To enhance CamemBERT-bio's performance, we introduce two main innovations: integrating keyword embeddings into the model and adopting a number-agnostic strategy by excluding all numerical data from the text. The implementation of label embedding techniques refines the attention mechanisms, while the technique of using a 'numerical-blind' dataset aims to bolster context-centric learning. Another key component of our research is determining the criticality of extracted numerical data. To achieve this, we utilized a simple approach that involves verifying if the value falls within the established standard ranges. Our findings are encouraging, showing substantial improvements in the effectiveness of CamemBERT-bio, surpassing conventional methods with an F1 score of 0.89. 
This represents an increase of over 20% over the 0.73 $F_1$ score of traditional approaches and an increase of over 9% over the 0.82 $F_1$ score of state-of-the-art approaches. All this was achieved despite using small and imbalanced training datasets.	翻訳日:2024-05-30 22:22:47 公開日:2024-05-28
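Two steps in this abstract are simple enough to sketch. The first is the "numerical-blind" input, where numbers are hidden from the text; here each number is replaced with a placeholder, and deleting it outright would be the other option. The second is the criticality check, where a value is flagged when it falls outside a standard range. The regex, the `<NUM>` placeholder and the reference ranges below are illustrative assumptions, not taken from the paper. The last helper computes the relative $F_1$ gains quoted above.

```python
import re

# Whole-token integers or decimals ('.' or ',' as decimal separator, as in French notes).
# \b keeps digits inside words such as "SpO2" from being masked.
NUMBER = re.compile(r"\b\d+(?:[.,]\d+)?\b")

# Illustrative adult reference ranges; NOT taken from the paper.
REFERENCE_RANGES = {
    "heart_rate_bpm": (60.0, 100.0),
    "spo2_percent": (95.0, 100.0),
}


def mask_numbers(text, token="<NUM>"):
    """Build a 'numerical-blind' view of a note so a classifier must rely on context."""
    return NUMBER.sub(token, text)


def is_critical(category, value):
    """Flag a value that lies outside its reference range."""
    low, high = REFERENCE_RANGES[category]
    return not (low <= value <= high)


def relative_gain(new_score, old_score):
    """Relative improvement of new_score over old_score, as a fraction."""
    return (new_score - old_score) / old_score


print(mask_numbers("FC 120 bpm, SpO2 93 %"))
print(is_critical("heart_rate_bpm", 120), is_critical("spo2_percent", 97))
print(round(relative_gain(0.89, 0.73), 3), round(relative_gain(0.89, 0.82), 3))
```

Read as relative change, the two gains come to roughly 21.9% (over 0.73) and 8.5% (over 0.82).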
# アダプティブ・マルチスケール網膜診断:転移学習とシャムネットワークを活用した包括的眼底多疾患検出のためのハイブリッド・トリオモデルアプローチ Adaptive Multiscale Retinal Diagnosis: A Hybrid Trio-Model Approach for Comprehensive Fundus Multi-Disease Detection Leveraging Transfer Learning and Siamese Networks ( http://arxiv.org/abs/2405.18449v1 ) ライセンス: Link先を確認	Yavuz Selim Inan,	(参考訳) WHOは、世界中で22億人以上がメディアヘイズ、緑内障、ドルーゼンなどの視覚障害に苦しんでいると発表した。このうち少なくとも10億件は予防または治療が可能であったはずだが、貧困、専門医の不足、眼科医による不正確な眼底診断、あるいは希少疾患であることが原因で、未治療のままである。この問題に対処するため、本研究では一般的な眼疾患と希少な眼疾患を合わせた12種類を正確に診断するハイブリッド・トリオネットワークモデルアルゴリズムを開発した。このアルゴリズムは、3,200枚の眼底画像からなるRFMiDデータセットとBinary Relevance法を用いて各疾患を個別に検出することで、拡張性を確保し、誤った相関を回避する。各検出器は、性能を最適化するよう微調整したハイパーパラメータを備え、古典的な転移学習CNNモデル、2段階CNNモデル、シャムネットワークの3つの特徴抽出要素から構成される。診断は、このトリオモデルで抽出した特徴とアンサンブル機械学習アルゴリズムを用いて行った。提案したモデルの平均精度は97%、AUCスコアは0.96である。過去のベンチマーク研究と比較して、ほとんどの疾患でF1スコアが10%以上向上した。さらに、シャムネットワークを用いることで、過去の研究では信頼度の低さから予測できなかった視神経乳頭蒼白(optic disc pallor)などの疾患の予測に成功した。本診断ツールは、一般的な疾患と希少疾患の双方の早期発見を世界に普及させるための、安定的で適応的、費用対効果が高く、効率的で、アクセスしやすく、高速なソリューションを提供する。 WHO has declared that more than 2.2 billion people worldwide are suffering from visual disorders, such as media haze, glaucoma, and drusen. At least 1 billion of these cases could have been either prevented or successfully treated, yet they remain unaddressed due to poverty, a lack of specialists, inaccurate ocular fundus diagnoses by ophthalmologists, or the presence of a rare disease. To address this, the research has developed the Hybrid Trio-Network Model Algorithm for accurately diagnosing 12 distinct common and rare eye diseases. This algorithm utilized the RFMiD dataset of 3,200 fundus images and the Binary Relevance Method to detect diseases separately, ensuring expandability and avoiding incorrect correlations. Each detector, incorporating finely tuned hyperparameters to optimize performance, consisted of three feature components: a classical transfer learning CNN model, a two-stage CNN model, and a Siamese Network. 
The diagnosis was made using features extracted through this Trio-Model with Ensembled Machine Learning algorithms. The proposed model achieved an average accuracy of 97% and an AUC score of 0.96. Compared to past benchmark studies, an increase of over 10% in the F1-score was observed for most diseases. Furthermore, using the Siamese Network, the model successfully made predictions in diseases like optic disc pallor, which past studies failed to predict due to low confidence. This diagnostic tool presents a stable, adaptive, cost-effective, efficient, accessible, and fast solution for globalizing early detection of both common and rare diseases.	翻訳日:2024-05-30 22:22:47 公開日:2024-05-28
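The Binary Relevance Method named above turns a multi-label problem (one fundus image, several possible diseases) into independent yes/no problems, one per disease. This is why a new disease detector can be added without retraining the others. In the paper each detector is a CNN trio with ensemble learners on top. In this NumPy sketch a plain logistic regression stands in for each detector, and all names and the toy data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Stand-in binary detector: logistic regression fitted by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        error = sigmoid(X @ w + b) - y  # gradient of the log-loss w.r.t. the logits
        w -= lr * X.T @ error / len(y)
        b -= lr * error.mean()
    return w, b


def fit_binary_relevance(X, Y):
    """Fit one independent detector per label column of Y (n_samples x n_labels)."""
    return [train_logistic(X, Y[:, k]) for k in range(Y.shape[1])]


def predict_binary_relevance(detectors, X, threshold=0.5):
    """Each detector votes only on its own disease; labels share no parameters."""
    probs = np.column_stack([sigmoid(X @ w + b) for w, b in detectors])
    return (probs >= threshold).astype(int)


# Toy features (e.g. two image embeddings) and two labels: label k is "feature k > 0".
X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
Y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
detectors = fit_binary_relevance(X, Y)
print(predict_binary_relevance(detectors, X))
```

Because each label has its own detector, correlations between diseases in the training set cannot leak into another disease's decision. The abstract gives this as the reason for choosing the method.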
# スペクトルモードマッチングによる普遍量子周波数コム測定 Universal quantum frequency comb measurements by spectral mode-matching ( http://arxiv.org/abs/2405.18454v1 ) ライセンス: Link先を確認	Bakhao Dioum, Virginia D'Auria, Alessandro Zavatta, Olivier Pfister, Giuseppe Patera,	(参考訳) マルチモード干渉計の周波数コムは、フィールド符号化された量子情報に対して例外的なスケーラビリティを提供する。しかし、安定場検出法であるホモダイン検出は、いくつかのスペクトル二次構造(およびLOに関するその対称性)が到達できないため、コム全体の量子情報にアクセスすることができない。ここでは,光量子コンピューティングに必要であり,パルス型LOを用いたホモダイン検出では不可能な,多モード量子光学源の任意の1ショット計測を行うための,最初の一般的なアプローチを提案する。このアプローチでは、メモリ効果を伴う干渉計と解釈できるスペクトルモードマッチングを用いる。完全形式を導出し,マイクロキャビティアレイによる実装を提案する。 The frequency comb of a multimode interferometer offers exceptional scalability potential for field-encoded quantum information. However, the staple field detection method, homodyne detection, cannot access quantum information in the whole comb because some spectral quadratures (and their asymmetries with respect to the LO) are out of reach. We present here the first general approach to make arbitrary, one-shot measurements of a multimode quantum optical source, something that is required for photonic quantum computing and is not possible when using homodyne detection with a pulse-shaped LO. This approach uses spectral mode-matching, which can be understood as interferometry with a memory effect. We derive a complete formalism and propose an implementation by microcavity arrays.	翻訳日:2024-05-30 22:22:47 公開日:2024-05-28
# 反復ガウス過程におけるハイパーパラメータ最適化のための線形系ソルバーの改良 Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes ( http://arxiv.org/abs/2405.18457v1 ) ライセンス: Link先を確認	Jihao Andreas Lin, Shreyas Padhy, Bruno Mlodozeniec, Javier Antorán, José Miguel Hernández-Lobato,	(参考訳) 非常に大規模なデータセットへのハイパーパラメータ最適化のスケーリングは、ガウス過程のコミュニティにおいて未解決の問題である。本稿では、共役勾配法、交互射影法、確率的勾配降下法などの線形系ソルバーを用いて周辺尤度の勾配の推定値を構成する反復法に注目する。ソルバーを問わず適用できる3つの重要な改善点について論じる。(i) パスワイズ勾配推定量:必要なソルバー反復回数を減らし、予測を行う計算コストを償却する。(ii) 前ステップの解を用いた線形系ソルバーのウォームスタート:無視できる程度のバイアスと引き換えにソルバーの収束を速める。(iii) 限られた計算予算での線形系ソルバーの早期打ち切り:ウォームスタートと相乗効果があり、ソルバーの進捗を複数の周辺尤度ステップにわたって蓄積できる。これらの手法により、許容誤差まで解く場合には最大$72\times$の高速化が得られ、早期打ち切り時には平均残差ノルムが最大$7\times$減少する。 Scaling hyperparameter optimisation to very large datasets remains an open problem in the Gaussian process community. This paper focuses on iterative methods, which use linear system solvers, like conjugate gradients, alternating projections or stochastic gradient descent, to construct an estimate of the marginal likelihood gradient. We discuss three key improvements which are applicable across solvers: (i) a pathwise gradient estimator, which reduces the required number of solver iterations and amortises the computational cost of making predictions, (ii) warm starting linear system solvers with the solution from the previous step, which leads to faster solver convergence at the cost of negligible bias, (iii) early stopping linear system solvers after a limited computational budget, which synergises with warm starting, allowing solver progress to accumulate over multiple marginal likelihood steps. These techniques provide speed-ups of up to $72\times$ when solving to tolerance, and decrease the average residual norm by up to $7\times$ when stopping early.	翻訳日:2024-05-30 22:22:47 公開日:2024-05-28
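Two of the three improvements, warm starting and early stopping, can be illustrated with a textbook conjugate gradients solver for the kernel system $(K + \sigma^2 I)\,x = y$. This is a toy built on assumptions, not the paper's code. It uses NumPy, a squared-exponential kernel, and made-up values for the lengthscale, the noise ($\sigma^2 = 0.1$) and the data. It omits the pathwise gradient estimator entirely. The `x0` argument carries the previous solution (warm start), and `max_iters` caps the budget (early stopping).

```python
import numpy as np


def conjugate_gradients(A, b, x0=None, max_iters=None, rtol=1e-8):
    """Solve A x = b for symmetric positive-definite A; returns (x, iterations used)."""
    x = np.zeros_like(b) if x0 is None else x0.copy()       # warm start if x0 is given
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    limit = 10 * len(b) if max_iters is None else max_iters  # early-stopping budget
    target = rtol * np.linalg.norm(b)
    iters = 0
    while iters < limit and np.sqrt(rs) > target:
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
        iters += 1
    return x, iters


def kernel_matrix(t, lengthscale, noise=0.1):
    """Squared-exponential kernel plus noise * I (hypothetical hyperparameters)."""
    sq_dist = (t[:, None] - t[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2) + noise * np.eye(len(t))


t = np.linspace(0.0, 1.0, 50)
y = np.sin(6.0 * t)

# Solve at the current hyperparameters, then take a small hyperparameter step.
x_prev, _ = conjugate_gradients(kernel_matrix(t, 0.20), y)
K_next = kernel_matrix(t, 0.21)
x_cold, cold_iters = conjugate_gradients(K_next, y)
x_warm, warm_iters = conjugate_gradients(K_next, y, x0=x_prev)
print(cold_iters, warm_iters)
```

Because consecutive hyperparameter steps change $K$ only slightly, the previous solution starts the solver much closer to the new one. The paper reports that the resulting bias is negligible.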
# グレーボックス深層フォトニックニューラルネットワークを訓練するための非対称推定器 Asymmetrical estimator for training grey-box deep photonic neural networks ( http://arxiv.org/abs/2405.18458v1 ) ライセンス: Link先を確認	Yizhi Wang, Minjia Chen, Chunhui Yao, Jie Ma, Ting Yan, Richard Penty, Qixiang Cheng,	(参考訳) 物理ニューラルネットワーク(PNN)は、高帯域幅かつ伝搬中のアナログ処理という特長から、ニューラルネットワーク高速化の新たなパラダイムとして注目されている。推論におけるPNNの利点にもかかわらず、訓練は依然として課題である。物理変換に関する情報が不完全であるため、誤差逆伝播法(BP)による従来の勾配ベースの更新は機能しない。本稿では、PNN構造をグレーボックスとして扱う非対称訓練(AT)法を提案する。ATは、深層ニューラルネットワーク構造の最終層の出力とニューロンの位相的な接続のみを知っていれば訓練を行うことができ、物理的な制御と変換の対応関係に関する情報を必要としない。未較正のフォトニック集積回路(PIC)で実装した深層グレーボックスPNNに対してAT法を実験的に実証し、アヤメ(Iris)の花と改変MNIST手書き数字の分類精度を、ランダムな推測の水準から理論的最大値近くまで向上させた。また、MNIST、Fashion-MNIST、Kuzushiji-MNISTなどの様々なデータセットにおいて、ATがBPを一貫して上回る性能を示した。AT法は、最小限のハードウェアオーバーヘッドと削減された計算オーバーヘッドで訓練に成功し、物理計算の利点を十分に引き出すための頑健で軽量な代替訓練手法となる。 Physical neural networks (PNNs) are emerging paradigms for neural network acceleration due to their high-bandwidth, in-propagation analogue processing. Despite the advantages of PNN for inference, training remains a challenge. The imperfect information of the physical transformation means the failure of conventional gradient-based updates from backpropagation (BP). Here, we present the asymmetrical training (AT) method, which treats the PNN structure as a grey box. AT performs training while only knowing the last layer output and neuron topological connectivity of a deep neural network structure, not requiring information about the physical control-transformation mapping. We experimentally demonstrated the AT method on deep grey-box PNNs implemented by uncalibrated photonic integrated circuits (PICs), improving the classification accuracy of Iris flower and modified MNIST hand-written digits from random guessing to near theoretical maximum. We also showcased the consistently enhanced performance of AT over BP for different datasets, including MNIST, fashion-MNIST, and Kuzushiji-MNIST. 
The AT method demonstrated successful training with minimal hardware overhead and reduced computational overhead, serving as a robust light-weight training alternative to fully explore the advantages of physical computation.	翻訳日:2024-05-30 22:22:47 公開日:2024-05-28
# 空間依存性指標の情報理論的ルーツを探る Probing the Information Theoretical Roots of Spatial Dependence Measures ( http://arxiv.org/abs/2405.18459v1 ) ライセンス: Link先を確認	Zhangyu Wang, Krzysztof Janowicz, Gengchen Mai, Ivan Majic,	(参考訳) 直感的には、空間依存性の指標とエントロピーの情報理論的な指標との間には関係がある。例えば、空間データのサンプルは平均して期待よりも少ない情報しか含まないと述べることで、空間データがなぜ特別なのかを直感的に説明できる。同様に、圧縮しやすい空間データ(例えばリモートセンシング画像)は、顕著な空間的自己相関も示す可能性が高い。空間情報理論の(非常に特化した)中核概念を、広く使われている情報理論の言葉で定式化することで、両者の相違点と類似点について新たな視点が開かれ、より広範なAI/MLコミュニティなどとの学際的な協力も促進される。しかし興味深いことに、この直感的な関係を形式化し一般化することは難しく、そのため先行研究は主に実験結果(例えばランドスケープパターンの記述)に頼ってきた。本研究では、空間的自己相関、特にモランのIの情報理論的なルーツを、自己情報量(サプライザルとも呼ばれる)の観点から探求し、形式的な証明と実験の両方を提供する。 Intuitively, there is a relation between measures of spatial dependence and information theoretical measures of entropy. For instance, we can provide an intuition of why spatial data is special by stating that, on average, spatial data samples contain less than expected information. Similarly, spatial data, e.g., remotely sensed imagery, that is easy to compress is also likely to show significant spatial autocorrelation. Formulating our (highly specific) core concepts of spatial information theory in the widely used language of information theory opens new perspectives on their differences and similarities and also fosters cross-disciplinary collaboration, e.g., with the broader AI/ML communities. Interestingly, however, this intuitive relation is challenging to formalize and generalize, leading prior work to rely mostly on experimental results, e.g., for describing landscape patterns. In this work, we will explore the information theoretical roots of spatial autocorrelation, more specifically Moran's I, through the lens of self-information (also known as surprisal) and provide both formal proofs and experiments.	翻訳日:2024-05-30 22:22:47 公開日:2024-05-28
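This minimal NumPy sketch puts the two quantities side by side. The first is the standard global Moran's I, $I = \frac{N}{W}\,\frac{\sum_{i,j} w_{ij} z_i z_j}{\sum_i z_i^2}$, with $z_i = x_i - \bar{x}$ and $W = \sum_{i,j} w_{ij}$. The second is self-information (surprisal), $-\log_2 p$. The sketch does not reproduce the paper's formal link between them. The neighbour weights (adjacent cells on a 1-D transect) and the toy values are assumptions.

```python
import math

import numpy as np


def morans_i(x, w):
    """Global Moran's I for values x and spatial weight matrix w (zero diagonal)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def line_neighbours(n):
    """Binary contiguity weights on a 1-D transect: cells i and i+1 are neighbours."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w


def surprisal(p):
    """Self-information of an outcome with probability p, in bits."""
    return -math.log2(p)


w = line_neighbours(6)
print(morans_i([1, 1, 1, 0, 0, 0], w))  # clustered values: positive autocorrelation
print(morans_i([1, 0, 1, 0, 1, 0], w))  # alternating values: negative autocorrelation
print(surprisal(0.125))                  # an outcome with probability 1/8 carries 3 bits
```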
# アルゴリズムが不公正なままである理由:アルゴリズム活動を取り巻く権力構造 Why Algorithms Remain Unjust: Power Structures Surrounding Algorithmic Activity ( http://arxiv.org/abs/2405.18461v1 ) ライセンス: Link先を確認	Andrew Balch,	(参考訳) アルゴリズムは私たちの社会生活においてますます重要な役割を果たしている。残念なことに、アルゴリズムはその過程で社会的不正義を永続させることが多い。こうしたアルゴリズムによる不正義に対処する一般的な手段は、アルゴリズム改良主義、すなわちアルゴリズム自体をより公平で説明責任があり透明になるよう微調整することであった。これは称賛に値するものの、批判的アルゴリズム研究という新たな分野は、改良主義的アプローチがアルゴリズムを取り巻く権力構造を無視しているために、アルゴリズムによる不正義の抑制に失敗してきたことを示している。この権力構造を分析すべきだという批判的アルゴリズム研究の呼びかけに応え、私はErik Olin Wrightが開発した枠組みを用いて、アルゴリズム活動、すなわち社会の中でアルゴリズムが研究、開発、訓練、展開されるあり方を取り巻く権力の構成を検討する。アルゴリズム活動が不平等で非民主的かつ持続不可能である理由は、それを形作る権力構造が社会的エンパワーメントではなく経済的エンパワーメントに基づくものだからであると私は主張する。アルゴリズム活動が社会的に公正であるためには、この権力構成を変革し、アルゴリズムの向こう側にいる人々をエンパワーする必要がある。この目的のために、アルゴリズム活動の文脈におけるWrightの共生的、間隙的、断絶的な変革を探り、それらをアルゴリズムで社会問題に取り組む仮想的な研究プロジェクトにどのように適用できるかを検討する。最後に、社会的に公正なアルゴリズム活動についての私のビジョンを示し、今後の研究が提案した変革を統合し、社会的エンパワーメントのための新たな仕組みを開発することを求める。 Algorithms play an increasingly significant role in our social lives. Unfortunately, they often perpetuate social injustices while doing so. The popular means of addressing these algorithmic injustices has been through algorithmic reformism: fine-tuning the algorithm itself to be more fair, accountable, and transparent. While commendable, the emerging discipline of critical algorithm studies shows that reformist approaches have failed to curtail algorithmic injustice because they ignore the power structure surrounding algorithms. Heeding calls from critical algorithm studies to analyze this power structure, I employ a framework developed by Erik Olin Wright to examine the configuration of power surrounding Algorithmic Activity: the ways in which algorithms are researched, developed, trained, and deployed within society. I argue that the reason Algorithmic Activity is unequal, undemocratic, and unsustainable is that the power structure shaping it is one of economic empowerment rather than social empowerment. 
For Algorithmic Activity to be socially just, we need to transform this power configuration to empower the people at the other end of an algorithm. To this end, I explore Wright's symbiotic, interstitial, and ruptural transformations in the context of Algorithmic Activity, as well as how they may be applied in a hypothetical research project that uses algorithms to address a social issue. I conclude with my vision for socially just Algorithmic Activity, asking that future work strive to integrate the proposed transformations and develop new mechanisms for social empowerment.	翻訳日:2024-05-30 22:13:00 公開日:2024-05-28
# 標準模型を超える物理のためのシンボリック回帰 Symbolic Regression for Beyond the Standard Model Physics ( http://arxiv.org/abs/2405.18471v1 ) ライセンス: Link先を確認	Shehu AbdusSalam, Steve Abel, Miguel Crispim Romao,	(参考訳) 標準模型を超える物理を研究するための強力なツールとして、シンボリック回帰を提案する。ベンチマークモデルとして、GUTスケールで定義された4次元パラメータ空間を持つ、いわゆる制約付き極小超対称標準模型を考える。ヒッグス質量、ミューオン異常磁気モーメントへの寄与、冷たい暗黒物質の残存密度という、関心のある3つの低エネルギー観測量を理論のパラメータで再現する解析的表現の集合を提示する。本手法の威力を示すために、大域的フィット解析においてこれらの記号表現を用いてパラメータの事後確率密度を導出し、それが従来の手法と比較して極めて高速に得られることを示す。 We propose symbolic regression as a powerful tool for studying Beyond the Standard Model physics. As a benchmark model, we consider the so-called Constrained Minimal Supersymmetric Standard Model, which has a four-dimensional parameter space defined at the GUT scale. We provide a set of analytical expressions that reproduce three low-energy observables of interest in terms of the parameters of the theory: the Higgs mass, the contribution to the anomalous magnetic moment of the muon, and the cold dark matter relic density. To demonstrate the power of the approach, we employ the symbolic expressions in a global fits analysis to derive the posterior probability densities of the parameters, which are obtained extremely rapidly in comparison with conventional methods.	翻訳日:2024-05-30 22:13:00 公開日:2024-05-28
# 有限温度のRydberg原子配列:量子相と絡み合いの特徴付け Finite-temperature Rydberg arrays: quantum phases and entanglement characterization ( http://arxiv.org/abs/2405.18477v1 ) ライセンス: Link先を確認	Nora Reinić, Daniel Jaschke, Darvin Wanisch, Pietro Silvi, Simone Montangero,	(参考訳) アナログ量子シミュレータの最も有力なプラットフォームの一つとして、Rydberg原子配列は量子相と相転移を探索するための有望なツールである。1次元Rydberg系の基底状態特性はすでに徹底的に調べられているが、本研究では解析を有限温度の場合へ拡張する。この目的のために、熱平衡にある量子多体状態を構築するためのテンソルネットワークに基づく数値ツールボックスを開発し、これを用いて古典相関と絡み合いモノトーンを調べる。有限の系サイズでは、秩序相が熱ゆらぎによって連続的に縮小する様子がはっきりと観測される。さらに、系を半分に二分割したときの形成の絡み合い(entanglement of formation)と絡み合いネガティビティを調べることにより、絡み合いの共形スケーリング則が零温度の臨界点から低温領域へと拡張されることを数値的に確認した。 As one of the most prominent platforms for analog quantum simulators, Rydberg atom arrays are a promising tool for exploring quantum phases and transitions. While the ground state properties of one-dimensional Rydberg systems are already thoroughly examined, we extend the analysis towards the finite-temperature scenario. For this purpose, we develop a tensor network-based numerical toolbox for constructing the quantum many-body states at thermal equilibrium, which we exploit to probe classical correlations as well as entanglement monotones. We clearly observe ordered phases continuously shrinking due to thermal fluctuations at finite system sizes. Moreover, by examining the entanglement of formation and entanglement negativity of a half-system bipartition, we numerically confirm that a conformal scaling law of entanglement extends from the zero-temperature critical points into the low-temperature regime.	翻訳日:2024-05-30 22:13:00 公開日:2024-05-28
# サブ波長原子配列における集団効果で増強された基底状態冷却 Collectively enhanced ground-state cooling in subwavelength atomic arrays ( http://arxiv.org/abs/2405.18482v1 ) ライセンス: Link先を確認	Oriol Rubies-Bigorda, Raphael Holzinger, Ana Asenjo-Garcia, Oriol Romero-Isart, Helmut Ritsch, Stefan Ostermann, Carlos Gonzalez-Ballestero, Susanne F. Yelin, Cosimo C. Rusconi,	(参考訳) 自由空間におけるサブ波長原子配列は、創発的な多体量子現象を探索する主要なプラットフォームになりつつある。これらの配列は光誘起の強い双極子-双極子相互作用を特徴とし、線幅の狭いサブラジアントな集団共鳴をもたらす。本研究では、これらの狭い集団共鳴を利用した、サブ波長配列に捕捉された原子のサイドバンド冷却方式を提案する。原子の内部自由度を断熱的に消去することで原子運動に対する有効マスター方程式を導出し、その予測を全系の数値シミュレーションで検証する。その結果、原子のトラップ周波数が互いに異なれば、サブラジアント共鳴によって、双極子相互作用がない場合に到達可能な温度よりも低い温度まで原子集団を冷却できることが示された。注目すべきことに、個々の原子遷移がサイドバンド分解されない場合でも、狭い集団共鳴はサイドバンド分解されうる。そのような場合、基底状態冷却は光誘起双極子-双極子相互作用によってのみ実現可能となる。このアプローチはエミッターの高密度アンサンブルに基づく将来の量子技術に利用でき、運動制御の向上に多体協同崩壊を活用する道を開く。 Subwavelength atomic arrays in free space are becoming a leading platform for exploring emergent many-body quantum phenomena. These arrays feature strong light-induced dipole-dipole interactions, resulting in subradiant collective resonances characterized by narrowed linewidths. In this work, we present a sideband cooling scheme for atoms trapped in subwavelength arrays that utilizes these narrow collective resonances. We derive an effective master equation for the atomic motion by adiabatically eliminating the internal degrees of freedom of the atoms, and validate its prediction with numerical simulations of the full system. Our results demonstrate that subradiant resonances enable the cooling of ensembles of atoms to temperatures lower than those achievable without dipole interactions, provided the atoms have different trap frequencies. Remarkably, narrow collective resonances can be sideband-resolved even when the individual atomic transition is not. In such scenarios, ground state cooling becomes feasible solely due to light-induced dipole-dipole interactions. 
This approach could be utilized for future quantum technologies based on dense ensembles of emitters, and paves the way towards harnessing many-body cooperative decay for enhanced motional control.	翻訳日:2024-05-30 22:13:00 公開日:2024-05-28
# オープンドメインテキスト駆動型マルチパーソン運動合成に向けて Towards Open Domain Text-Driven Synthesis of Multi-Person Motions ( http://arxiv.org/abs/2405.18483v1 ) ライセンス: Link先を確認	Mengyi Shan, Lu Dong, Yutao Han, Yuan Yao, Tao Liu, Ifeoma Nwogu, Guo-Jun Qi, Mitch Hill,	(参考訳) 本研究は、テキスト記述から複数の人間の自然で多様な集団動作を生成することを目的とする。1人の人物を対象としたテキストからの動作生成は広く研究されているが、実環境(in-the-wild)のプロンプトから1人または2人を超える被験者の動作を合成することは、主に利用可能なデータセットが不足しているため、依然として困難である。本研究では、大規模な画像・動画データセットから姿勢情報を推定することで、人間の姿勢と動作のデータセットを構築する。我々のモデルはTransformerベースの拡散フレームワークを用いており、任意の数の被験者やフレームを含む複数のデータセットに対応できる。実験では、複数人物の静的な姿勢の生成と複数人物の動作シーケンスの生成の両方を検討する。我々の知る限り、本手法は多種多様なテキストプロンプトから、高い多様性と忠実度を備えた複数被験者の動作シーケンスを生成する最初の手法である。 This work aims to generate natural and diverse group motions of multiple humans from textual descriptions. While single-person text-to-motion generation is extensively studied, it remains challenging to synthesize motions for more than one or two subjects from in-the-wild prompts, mainly due to the lack of available datasets. In this work, we curate human pose and motion datasets by estimating pose information from large-scale image and video datasets. Our models use a transformer-based diffusion framework that accommodates multiple datasets with any number of subjects or frames. Experiments explore both generation of multi-person static poses and generation of multi-person motion sequences. To our knowledge, our method is the first to generate multi-subject motion sequences with high diversity and fidelity from a large variety of textual prompts.	翻訳日:2024-05-30 22:13:00 公開日:2024-05-28
# 最小絡み合い典型熱状態を用いた分光法と複素時間相関 Spectroscopy and complex-time correlations using minimally entangled typical thermal states ( http://arxiv.org/abs/2405.18484v1 ) ライセンス: Link先を確認	Zhenjiu Wang, Paul McClarty, Dobromila Dankova, Andreas Honecker, Alexander Wietek,	(参考訳) テンソルネットワーク状態は、強相関物理の諸側面を捉えることに大きな成功を収めてきた。しかし、非零温度での動的相関関数を得ることは、これらの手法を用いても一般に困難である。本稿では、最小絡み合い典型熱状態(METTS)を用いてそのような相関関数を計算する実用的な手法を提案する。主たる手法は物理演算子の動的相関関数を実時間で直接計算するものだが、相関を複素時間平面上で評価する拡張も提案する。虚時間成分は絡み合いの成長速度を抑え、計算の困難さを大幅に緩和するため、より大きな系サイズの研究が可能になる。物理的な相関関数を得るには、純粋な実時間発展の極限をとる必要がある。この情報を得る2つの経路を示す。(i) 複素時間における解析的相関関数と確率的解析接続法を組み合わせて実時間極限を得る方法、(ii) 数値的な解析接続を必要とせずに所望の相関関数を漸近的かつ定量的に捉えるエルミート相関関数を用いる方法である。これらの数値手法が、2次元で相互作用するスピン1/2のモデルであるShastry-Sutherland模型の有限温度ダイナミクスを捉えることを示す。 Tensor network states have enjoyed great success at capturing aspects of strong correlation physics. However, obtaining dynamical correlators at non-zero temperatures is generically hard even using these methods. Here, we introduce a practical approach to computing such correlators using minimally entangled typical thermal states (METTS). While our primary method directly computes dynamical correlators of physical operators in real time, we propose extensions where correlations are evaluated in the complex-time plane. The imaginary time component bounds the rate of entanglement growth and strongly alleviates the computational difficulty, allowing the study of larger system sizes. To extract the physical correlator one must take the limit of purely real-time evolution. We present two routes to obtaining this information: (i) via an analytic correlation function in complex time combined with a stochastic analytic continuation method to obtain the real-time limit and (ii) via a Hermitian correlation function that asymptotically captures the desired correlation function quantitatively without requiring the effort of numerical analytic continuation. 
We show that these numerical techniques capture the finite-temperature dynamics of the Shastry-Sutherland model - a model of interacting spin one-half in two dimensions.	翻訳日:2024-05-30 22:13:00 公開日:2024-05-28
# 衛星画像における火山活動の異常検出 Anomaly detection for the identification of volcanic unrest in satellite imagery ( http://arxiv.org/abs/2405.18487v1 ) ライセンス: Link先を確認	Robert Gabriel Popescu, Nantheera Anantrasirichai, Juliet Biggs,	(参考訳) 衛星画像は噴火前に火山の変形を検出する可能性があるが、大量の画像が日常的に取得される一方で、火山の変形イベントを含むのはごくわずかである。手動検査はこれらの異常を見逃しかねず、教師付き学習でモデル化された自動システムは適切にラベル付けされたデータセットを必要とする。これらの課題に対処するために, 衛星データにおける教師なし深層学習を用いて, 火山変形を異常として識別する方法について検討した。我々の検出器はパッチ分布モデリング(PaDiM)に基づいており、検出性能は重み付けされた距離で向上し、より深い層の特徴をより重要視する。さらに,ノイズや不完全データを扱うための前処理手法を提案する。最終フレームワークは, 変形特性が異なる5つの火山で試験し, その性能を火山変形検出の教師付き学習法と比較した。 Satellite images have the potential to detect volcanic deformation prior to eruptions, but while a vast number of images are routinely acquired, only a small percentage contain volcanic deformation events. Manual inspection could miss these anomalies, and an automatic system modelled with supervised learning requires suitably labelled datasets. To tackle these issues, this paper explores the use of unsupervised deep learning on satellite data for the purpose of identifying volcanic deformation as anomalies. Our detector is based on Patch Distribution Modeling (PaDiM), and the detection performance is enhanced with a weighted distance, assigning greater importance to features from deeper layers. Additionally, we propose a preprocessing approach to handle noisy and incomplete data points. The final framework was tested with five volcanoes, which have different deformation characteristics and its performance was compared against the supervised learning method for volcanic deformation detection.	翻訳日:2024-05-30 22:13:00 公開日:2024-05-28
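PaDiM models every patch position of a feature map with a multivariate Gaussian fitted on normal images, and it scores a test patch by its Mahalanobis distance. The NumPy sketch below follows that recipe for each backbone layer separately. It then combines the per-layer maps with weights that favour the deeper layer. The abstract does not give the paper's exact weighting, so the weights, array shapes and synthetic data here are assumptions.

```python
import numpy as np


def fit_patch_gaussians(feats, eps=0.01):
    """feats: (n_images, H, W, C) embeddings of normal (no-deformation) images."""
    n, _, _, c = feats.shape
    mean = feats.mean(axis=0)                                    # (H, W, C)
    centred = feats - mean
    cov = np.einsum("nhwc,nhwd->hwcd", centred, centred) / (n - 1)
    cov += eps * np.eye(c)                                       # keep each covariance invertible
    return mean, np.linalg.inv(cov)                              # batched inverse, (H, W, C, C)


def mahalanobis_map(x, mean, cov_inv):
    """x: (H, W, C) embedding of one test image -> (H, W) anomaly scores."""
    d = x - mean
    return np.sqrt(np.einsum("hwc,hwcd,hwd->hw", d, cov_inv, d))


def weighted_score(layer_maps, weights):
    """Combine per-layer maps; a larger weight gives that layer more influence."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * m for wi, m in zip(w, layer_maps))


rng = np.random.default_rng(0)
train = {"shallow": rng.normal(size=(200, 4, 4, 3)), "deep": rng.normal(size=(200, 4, 4, 3))}
test = {name: rng.normal(size=(4, 4, 3)) for name in train}
test["deep"][1, 2] += 6.0  # simulated deformation signal at patch (1, 2) in the deep layer

maps = [mahalanobis_map(test[name], *fit_patch_gaussians(train[name])) for name in ("shallow", "deep")]
score = weighted_score(maps, weights=[1.0, 2.0])
print(np.unravel_index(score.argmax(), score.shape))
```

The original PaDiM concatenates features from several layers before fitting one Gaussian. Scoring each layer separately and weighting the results is one simple way to give deeper features more say.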
# 基底状態特性の予測:定数サンプル複雑度とディープラーニングアルゴリズム Predicting Ground State Properties: Constant Sample Complexity and Deep Learning Algorithms ( http://arxiv.org/abs/2405.18489v1 ) ライセンス: Link先を確認	Marc Wanner, Laura Lewis, Chiranjib Bhattacharyya, Devdatt Dubhashi, Alexandru Gheorghiu,	(参考訳) 量子多体物理学における基本的な問題の一つは、局所ハミルトニアンの基底状態を求めることである。近年の多くの研究により、基底状態を学習するための証明可能な効率的な機械学習(ML)アルゴリズムが示されてきた。具体的には、[Huang et al. Science 2022] は、同じ物質相にあるハミルトニアンからサンプリングした$n^{\mathcal{O}(1)}$個のデータ点のみから、ギャップのある$n$量子ビット局所ハミルトニアン$H$の基底状態の性質を学習する手法を導入した。これはその後、$n$量子ビット系の幾何構造が既知である場合について、[Lewis et al. Nature Communications 2024] によって$\mathcal{O}(\log n)$サンプルへと改善された。本研究では、基底状態の性質を学習するために、系のサイズ$n$に依存しない定数のサンプル複雑度を達成する2つのアプローチを提案する。第1のアルゴリズムは、Lewisらが用いたMLモデルに簡単な修正を加えたもので、事前にわかっている関心のある性質に適用される。第2のアルゴリズムは、性質の記述が未知であっても適用でき、深層ニューラルネットワークモデルである。ニューラルネットワークの性能を示す経験的な結果は報告されてきたが、我々の知る限り、これは基底状態の性質を予測するニューラルネットワークモデルに対する初めての厳密なサンプル複雑度の上界である。また、従来の結果と比較して本手法のスケーリングが改善されることを確認する数値実験を行った。 A fundamental problem in quantum many-body physics is that of finding ground states of local Hamiltonians. A number of recent works gave provably efficient machine learning (ML) algorithms for learning ground states. Specifically, [Huang et al. Science 2022] introduced an approach for learning properties of the ground state of an $n$-qubit gapped local Hamiltonian $H$ from only $n^{\mathcal{O}(1)}$ data points sampled from Hamiltonians in the same phase of matter. This was subsequently improved by [Lewis et al. Nature Communications 2024] to $\mathcal{O}(\log n)$ samples when the geometry of the $n$-qubit system is known. In this work, we introduce two approaches that achieve a constant sample complexity, independent of system size $n$, for learning ground state properties. Our first algorithm consists of a simple modification of the ML model used by Lewis et al. and applies to a property of interest known beforehand. Our second algorithm, which applies even if a description of the property is not known, is a deep neural network model. 
While empirical results showing the performance of neural networks have been demonstrated, to our knowledge, this is the first rigorous sample complexity bound on a neural network model for predicting ground state properties. We also perform numerical experiments that confirm the improved scaling of our approach compared to earlier results.	翻訳日:2024-05-30 22:13:00 公開日:2024-05-28
# LLMと暗記:著作権遵守の質と特異性について LLMs and Memorization: On Quality and Specificity of Copyright Compliance ( http://arxiv.org/abs/2405.18492v1 ) ライセンス: Link先を確認	Felix B Mueller, Rebekka Görge, Anna K Bernzen, Janna C Pirk, Maximilian Poretschkin,	(参考訳) 大規模言語モデル(LLM)における暗記(memorization)への懸念が高まっている。LLMは、著作権のある作品を含む訓練データの一部を容易に再現できることが示されている。これは既存の著作権法や欧州AI法に違反する可能性があるため、解決すべき重要な問題である。本研究では、欧州法を例として、LLMにおける潜在的な著作権侵害の程度を定量化するための体系的な分析を提案する。先行研究とは異なり、現実的なエンドユーザーのシナリオにおいて指示チューニング済みモデルを評価する。我々の分析は、ドイツ著作権サービス提供者法から借用した160文字というしきい値と、著作権を侵害する可能性のあるテキスト再現を特定するためのファジーテキストマッチングアルゴリズムに基づいている。著作権侵害への対策の特異性は、著作権のあるデータとパブリックドメインのデータに対するモデルの振る舞いを比較することで分析する。また、保護されたテキストを生成する代わりにモデルがどのような振る舞い(拒否や幻覚など)を示すかを調べ、これらの振る舞いについて初めての法的評価を行う。その結果、一般的なLLMの間で、著作権遵守、特異性、適切な拒否に大きな差があることがわかった。比較ではAlpaca、GPT 4、GPT 3.5、Luminousが最も優れており、OpenGPT-X、Alpaca、Luminousは潜在的な著作権侵害の絶対数が特に少なかった。コードは近日中に公開される予定である。 Memorization in large language models (LLMs) is a growing concern. LLMs have been shown to easily reproduce parts of their training data, including copyrighted work. This is an important problem to solve, as it may violate existing copyright laws as well as the European AI Act. In this work, we propose a systematic analysis to quantify the extent of potential copyright infringements in LLMs using European law as an example. Unlike previous work, we evaluate instruction-finetuned models in a realistic end-user scenario. Our analysis builds on a proposed threshold of 160 characters, which we borrow from the German Copyright Service Provider Act, and on a fuzzy text matching algorithm to identify potentially copyright-infringing textual reproductions. The specificity of countermeasures against copyright infringement is analyzed by comparing model behavior on copyrighted and public domain data. We investigate what behaviors models show instead of producing protected text (such as refusal or hallucination) and provide a first legal assessment of these behaviors. 
We find that there are huge differences in copyright compliance, specificity, and appropriate refusal among popular LLMs. Alpaca, GPT 4, GPT 3.5, and Luminous perform best in our comparison, with OpenGPT-X, Alpaca, and Luminous producing a particularly low absolute number of potential copyright violations. Code will be published soon.	翻訳日:2024-05-30 22:13:00 公開日:2024-05-28
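Python's standard-library `difflib` can illustrate the 160-character threshold. This is a simplification under stated assumptions. It finds the longest exact shared span between a model output and a protected text, whereas the paper uses a fuzzy matching algorithm that also tolerates small edits. The helper names and the toy strings are hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 160  # characters; the threshold discussed in the abstract


def longest_shared_span(output, protected):
    """Longest contiguous character span that appears in both texts (exact match only)."""
    # autojunk=False stops difflib from ignoring frequent characters in long texts.
    matcher = SequenceMatcher(None, output, protected, autojunk=False)
    match = matcher.find_longest_match(0, len(output), 0, len(protected))
    return output[match.a:match.a + match.size]


def flags_reproduction(output, protected, threshold=THRESHOLD):
    """True if the output copies at least `threshold` consecutive characters."""
    return len(longest_shared_span(output, protected)) >= threshold


protected = " ".join(f"line{i}" for i in range(100))
long_copy = "Sure, here it is: " + protected[50:260] + " (end)"
short_quote = "As the text puts it, '" + protected[50:150] + "'."
print(flags_reproduction(long_copy, protected), flags_reproduction(short_quote, protected))
```

An exact matcher like this misses reproductions with small edits, such as a changed word or different punctuation. That is the motivation for the fuzzy matching the abstract describes.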
# 視覚課題における第2モーメント指数スケーリング最適化手法の統一バランス理論 The Unified Balance Theory of Second-Moment Exponential Scaling Optimizers in Visual Tasks ( http://arxiv.org/abs/2405.18498v1 ) ライセンス: Link先を確認	Gongyue Zhang, Honghai Liu,	(参考訳) 可変の第2モーメント指数スケーリング(SMES)を用いることで、1次最適化手法を統一する可能性のある方法を見出した。誤差逆伝播から出発し、勾配消失や勾配爆発といった古典的な現象や、データセットのスパース性に関する問題に取り組み、最適化におけるバランスの理論を導入する。この理論により、SGDと適応的最適化手法はより広い推論の下で統一できることを示唆し、可変の移動指数スケーリングを用いて、1次最適化手法の一般化された式の中でバランスのとれたアプローチを実現することを提案する。いくつかの古典的なデータセットとネットワークで実験を行い、異なるバランス係数が学習過程全体に与える影響を確認した。 We have identified a potential method for unifying first-order optimizers through the use of variable Second-Moment Exponential Scaling (SMES). We begin with back propagation, addressing classic phenomena such as gradient vanishing and explosion, as well as issues related to dataset sparsity, and introduce the theory of balance in optimization. Through this theory, we suggest that SGD and adaptive optimizers can be unified under a broader inference, employing variable moving exponential scaling to achieve a balanced approach within a generalized formula for first-order optimizers. We conducted tests on some classic datasets and networks to confirm the impact of different balance coefficients on the overall training process.	翻訳日:2024-05-30 22:13:00 公開日:2024-05-28
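The abstract does not give the SMES update rule. One plausible reading, offered only as an illustration and not as the authors' formula, puts an exponent $p$ on the second-moment estimate. With $p = 0$ the step reduces to momentum SGD, and with $p = 1/2$ it becomes an Adam-style step (bias correction omitted). The function names `smes_step` and `minimise_quadratic` and all hyperparameter values below are hypothetical.

```python
import numpy as np


def smes_step(theta, grad, state, lr=0.05, beta1=0.9, beta2=0.999, p=0.5, eps=1e-8):
    """One update: theta -= lr * m / (v**p + eps).

    p = 0   -> v**0 == 1, i.e. momentum SGD (EMA-style momentum)
    p = 0.5 -> Adam-like step without bias correction
    """
    m = beta1 * state["m"] + (1 - beta1) * grad       # first-moment EMA
    v = beta2 * state["v"] + (1 - beta2) * grad ** 2  # second-moment EMA
    theta = theta - lr * m / (v ** p + eps)
    return theta, {"m": m, "v": v}


def minimise_quadratic(p, steps=2000):
    """Minimise f(x) = x**2 from x = 5 with a given second-moment exponent p."""
    x = np.array([5.0])
    state = {"m": np.zeros(1), "v": np.zeros(1)}
    for _ in range(steps):
        x, state = smes_step(x, 2 * x, state, p=p)
    return float(x[0])


print(minimise_quadratic(p=0.0), minimise_quadratic(p=0.5))
```

Intermediate values of $p$ interpolate between the two regimes. This is one way to picture the "balance coefficients" the abstract refers to, although the paper's parameterisation may differ.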
# 分類のための大マルジン識別損失 Large Margin Discriminative Loss for Classification ( http://arxiv.org/abs/2405.18499v1 ) ライセンス: Link先を確認	Hai-Vy Nguyen, Fabrice Gamboa, Sixin Zhang, Reda Chhaibi, Serge Gratton, Thierry Giaccone,	(参考訳) 本稿では,Deep Learningの文脈において,大きなマージンを有する新たな識別的損失関数を提案する。この損失は、クラス内コンパクト性とクラス間分離性によって表されるニューラルネットの識別力を高める。一方、クラスコンパクト性は、同じクラスのサンプル同士の近接距離によって保証される。一方、クラス間の分離性は、各クラスから最も近い境界までの最小距離を保証するマージン損失によって促進される。私たちの損失のすべての用語は明示的な意味を持ち、得られた特徴空間の直接的なビューを与えます。本研究では,コンパクト度とマージン項の関係を数学的に解析し,ハイパーパラメータが学習特徴に与える影響に関する指針を与える。さらに、ニューラルネットのパラメータに関する損失の勾配特性も解析する。これに基づいて、トレーニングにおける安定性と一貫性を同時に享受する部分運動量更新と呼ばれる戦略を設計する。さらに,より理論的な洞察を得るため,一般化誤差についても検討する。我々の損失関数は、実験における標準ソフトマックス損失と比較して、モデルの試験精度を体系的に向上させる。 In this paper, we introduce a novel discriminative loss function with large margin in the context of Deep Learning. This loss boosts the discriminative power of neural nets, represented by intra-class compactness and inter-class separability. On the one hand, the class compactness is ensured by close distance of samples of the same class to each other. On the other hand, the inter-class separability is boosted by a margin loss that ensures the minimum distance of each class to its closest boundary. All the terms in our loss have an explicit meaning, giving a direct view of the feature space obtained. We analyze mathematically the relation between compactness and margin term, giving a guideline about the impact of the hyper-parameters on the learned features. Moreover, we also analyze properties of the gradient of the loss with respect to the parameters of the neural net. Based on this, we design a strategy called partial momentum updating that enjoys simultaneously stability and consistency in training. Furthermore, we also investigate generalization errors to have better theoretical insights. Our loss function systematically boosts the test accuracy of models compared to the standard softmax loss in our experiments.	翻訳日:2024-05-30 22:13:00 公開日:2024-05-28
# SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation ( http://arxiv.org/abs/2405.18503v1 ) License: see link	Koichi Saito, Dongjun Kim, Takashi Shibuya, Chieh-Hsin Lai, Zhi Zhong, Yuhta Takida, Yuki Mitsufuji,	Sound content is an indispensable element of multimedia works such as video games, music, and films. Recent high-quality diffusion-based sound generation models can serve as valuable tools for creators. However, despite producing high-quality sounds, these models often suffer from slow inference. This drawback burdens creators, who typically refine their sounds through trial and error to align them with their artistic intentions. To address this issue, we introduce Sound Consistency Trajectory Models (SoundCTM). Our model enables flexible transitioning between high-quality 1-step sound generation and superior sound quality through multi-step generation. This allows creators to initially control sounds with 1-step samples before refining them through multi-step generation. 
While CTM fundamentally achieves flexible 1-step and multi-step generation, its impressive performance depends heavily on an additional pretrained feature extractor and an adversarial loss, which are expensive to train and not always available in other domains. Thus, we reframe CTM's training framework and introduce a novel feature distance that uses the teacher's network for a distillation loss. Additionally, while distilling classifier-free guided trajectories, we train conditional and unconditional student models simultaneously and interpolate between these models during inference. We also propose training-free controllable frameworks for SoundCTM that leverage its flexible sampling capability. SoundCTM achieves promising 1-step and multi-step real-time sound generation without using any extra off-the-shelf networks. Furthermore, we demonstrate SoundCTM's capability for controllable sound generation in a training-free manner.	Translated: 2024-05-30 22:13:00 Published: 2024-05-28
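The inference-time interpolation between a conditional and an unconditional student can be written as a single blend of their outputs. This is a minimal sketch: the name `guided_output` and the linear form are assumptions, and SoundCTM's exact guidance parameterization may differ.

```python
import numpy as np

def guided_output(cond_out, uncond_out, w):
    """Blend outputs of a conditional and an unconditional student model.

    w = 0 returns the unconditional output, w = 1 the conditional one;
    values in between interpolate (values > 1 would extrapolate, as in
    standard classifier-free guidance).
    """
    cond_out = np.asarray(cond_out, dtype=float)
    uncond_out = np.asarray(uncond_out, dtype=float)
    return uncond_out + w * (cond_out - uncond_out)
```

Because both students are trained jointly, a single scalar `w` chosen at inference time controls guidance strength without retraining.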
# Symmetry-protection Zeno phase transition in monitored lattice gauge theories ( http://arxiv.org/abs/2405.18504v1 ) License: see link	Matteo M. Wauters, Edoardo Ballini, Alberto Biella, Philipp Hauke,	Quantum measurements profoundly influence system dynamics. They lead to complex nonequilibrium phenomena like the quantum Zeno effect, and they can be used for mitigating errors in quantum simulations. Such an ability is particularly valuable for lattice gauge theories (LGTs), which require the challenging preservation of an extensive number of local conservation laws. While it is known that tailored quantum measurements can soften violations of gauge symmetry, the nature of this protection, and in particular the possibility of a threshold behavior, is still unexplored. Here, we demonstrate the existence of a sharp transition, triggered by the measurement rate, between a protected gauge-theory regime resistant to simulation errors and an irregular regime. Our results are based on the paradigmatic example of a 1+1d $\mathbb{Z}_2$ LGT. We study in detail the protection through projective measurements of ancillary qubits coupled to the local symmetry generators, and compare this approach with analog (weak) measurement protocols. 
We show that, while the resulting ensemble averages in the continuous-time limit share the same Liouvillian dynamics, different physical implementations of the stochastic gauge protection protocol yield trajectory unravelings with vastly different statistics. Additionally, we design an on-chip feedback mechanism that corrects bit-flip errors and significantly enhances the discrete-time scheme. Our results shed light on the dissipative criticality of strongly interacting, highly constrained quantum systems, and they offer valuable insights into error mitigation and correction for gauge-theory quantum simulations.	Translated: 2024-05-30 22:13:00 Published: 2024-05-28
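The quantum Zeno effect mentioned above can be seen in the simplest textbook setting. Take a two-level system driven at Rabi frequency `omega` and measured `n` times at equal intervals over a total time `T`. Each interval of length τ = T/n leaves the system in its initial state with probability cos²(ωτ/2), so the survival probability is [cos²(ωT/2n)]ⁿ, which tends to 1 as n grows. This generic sketch is not the paper's lattice-gauge protocol. The function name is hypothetical.

```python
import math

def zeno_survival(omega, total_time, n_measurements):
    """Probability that a Rabi-driven two-level system is still found in its
    initial state after n equally spaced projective measurements.

    Each measurement projects the state, so the n intervals contribute
    independent factors cos^2(omega * tau / 2) with tau = T / n.
    """
    tau = total_time / n_measurements
    return math.cos(omega * tau / 2) ** (2 * n_measurements)
```

With ωT = π, a single measurement at the end finds the system flipped with near certainty. With 1000 measurements the survival probability exceeds 0.99: frequent measurement "freezes" the dynamics.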
# Injecting Hierarchical Biological Priors into Graph Neural Networks for Flow Cytometry Prediction ( http://arxiv.org/abs/2405.18507v1 ) License: see link	Fatemeh Nassajian Mojarrad, Lorenzo Bini, Thomas Matthes, Stéphane Marchand-Maillet,	In the complex landscape of hematologic samples such as peripheral blood or bone marrow, as characterized by flow cytometry (FC) data, cell-level prediction presents profound challenges. This work explores injecting hierarchical prior knowledge into graph neural networks (GNNs) for single-cell multi-class classification of tabular cellular data. By representing the data as graphs and encoding hierarchical relationships between classes, we propose FCHC-GNN, a hierarchical plug-in method that can be applied to several GNN models and is designed to capture neighborhood information crucial for the single-cell FC domain. Extensive experiments on our cohort of 19 distinct patients demonstrate that incorporating hierarchical biological constraints significantly boosts performance across multiple metrics compared to baseline GNNs without such priors. The proposed approach highlights the importance of structured inductive biases for improved generalization in complex biological prediction tasks.	Translated: 2024-05-30 22:13:00 Published: 2024-05-28
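A common way to make multi-label predictions respect a class hierarchy is to raise each parent's score to at least the maximum score of its descendants. A coarse cell population is then never predicted as less likely than one of its subtypes. This is a hedged sketch of that generic "max constraint" idea. The abstract does not say FCHC-GNN works this way, and the function name and toy class tree are hypothetical.

```python
import numpy as np

def enforce_hierarchy(probs, ancestor_of):
    """Make per-class probabilities consistent with a class hierarchy.

    probs: (n_cells, n_classes) independent sigmoid outputs.
    ancestor_of: (n_classes, n_classes) with ancestor_of[p, c] = 1 if class p
    is an ancestor of class c or p == c (the diagonal must be 1).
    Each class score becomes the max over itself and all its descendants.
    """
    probs = np.asarray(probs, dtype=float)
    A = np.asarray(ancestor_of, dtype=bool)
    # masked[i, p, c] = probs[i, c] where p is an ancestor of c, else -inf
    masked = np.where(A[None, :, :], probs[:, None, :], -np.inf)
    return masked.max(axis=2)

# Hypothetical tree: class 0 "lymphocyte" is the parent of 1 "T cell" and 2 "B cell".
ANCESTOR = np.array([[1, 1, 1],
                     [0, 1, 0],
                     [0, 0, 1]])
```

For one cell with raw scores `[0.2, 0.7, 0.1]`, the constrained output is `[0.7, 0.7, 0.1]`: the lymphocyte score is lifted to match its most likely subtype.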
# Improved Emotional Alignment of AI and Humans: Human Ratings of Emotions Expressed by Stable Diffusion v1, DALL-E 2, and DALL-E 3 ( http://arxiv.org/abs/2405.18510v1 ) License: see link	James Derek Lomas, Willem van der Maden, Sohhom Bandyopadhyay, Giovanni Lion, Nirmal Patel, Gyanesh Jain, Yanna Litowsky, Haian Xue, Pieter Desmet,	Generative AI systems are increasingly capable of expressing emotions via text and imagery. Effective emotional expression will likely play a major role in the efficacy of AI systems, particularly those designed to support human mental health and wellbeing. This motivates our present research to better understand the alignment of AI-expressed emotions with the human perception of emotions. When AI tries to express a particular emotion, how might we assess whether it is successful? To answer this question, we designed a survey to measure the alignment between emotions expressed by generative AI and human perceptions. Three generative image models (DALL-E 2, DALL-E 3 and Stable Diffusion v1) were used to generate 240 example images, each based on a prompt designed to express one of five positive and five negative emotions, depicted in both humans and robots. 
24 participants recruited from the Prolific website rated the alignment of the AI-generated emotional expressions with the text prompt used to generate them (e.g., "A robot expressing the emotion amusement"). The results of our evaluation suggest that generative AI models are indeed capable of producing emotional expressions that are well aligned with a range of human emotions; however, we show that the alignment depends significantly on the AI model used and on the emotion itself. We analyze variations in the performance of these systems to identify gaps for future improvement. We conclude with a discussion of the implications for future AI systems designed to support mental health and wellbeing.	Translated: 2024-05-30 22:13:00 Published: 2024-05-28
# Feasibility and benefits of joint learning from MRI databases with different brain diseases and modalities for segmentation ( http://arxiv.org/abs/2405.18511v1 ) License: see link	Wentian Xu, Matthew Moffat, Thalia Seale, Ziyun Liang, Felix Wagner, Daniel Whitehouse, David Menon, Virginia Newcombe, Natalie Voets, Abhirup Banerjee, Konstantinos Kamnitsas,	Models for segmentation of brain lesions in multi-modal MRI are commonly trained for a specific pathology using a single database with a predefined set of MRI modalities, determined by a protocol for the specific disease. This work explores the following open questions. Is it feasible to train a model using multiple databases that contain varying sets of MRI modalities and annotations for different brain pathologies? Will this joint learning benefit performance on the sets of modalities and pathologies available during training? Will it enable analysis of new databases with different sets of modalities and pathologies? We develop and compare different methods and show that promising results can be achieved with appropriate, simple and practical alterations to the model and training framework. We experiment with 7 databases containing 5 types of brain pathologies and different sets of MRI modalities. 
Results demonstrate, for the first time, that joint training on multi-modal MRI databases with different brain pathologies and sets of modalities is feasible and offers practical benefits. It enables a single model to segment pathologies encountered during training across diverse sets of modalities, while facilitating segmentation of new types of pathologies, for example via follow-up fine-tuning. The insights this study provides into the potential and limitations of this paradigm should prove useful for guiding future advances in this direction. Code and pretrained models: https://github.com/WenTXuL/MultiUnet	Translated: 2024-05-30 22:03:07 Published: 2024-05-28
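One simple way to feed a single network from databases with different MRI modalities is to use a fixed channel slot per modality across the union of all modalities. Missing slots are zero-filled, and an availability mask is returned alongside. This is a hypothetical sketch of that general idea, not necessarily what the MultiUnet repository does. The modality list and function name are assumptions.

```python
import numpy as np

# Hypothetical union of modalities across all training databases.
ALL_MODALITIES = ["T1", "T1c", "T2", "FLAIR", "DWI"]

def to_fixed_channels(volumes):
    """Stack one subject's available MRI volumes into fixed channel slots.

    volumes: dict mapping modality name -> (H, W, D) array.
    Returns x of shape (len(ALL_MODALITIES), H, W, D), with zeros for
    modalities this database lacks, and a binary availability mask.
    """
    shape = next(iter(volumes.values())).shape
    x = np.zeros((len(ALL_MODALITIES),) + shape, dtype=np.float32)
    mask = np.zeros(len(ALL_MODALITIES), dtype=np.float32)
    for i, name in enumerate(ALL_MODALITIES):
        if name in volumes:
            x[i] = volumes[name]
            mask[i] = 1.0
    return x, mask
```

A fixed slot layout lets subjects from every database share one input shape. The mask can be used, for example, to randomly drop available modalities during training so the model learns to cope with missing ones.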
# Understanding Transformer Reasoning Capabilities via Graph Algorithms ( http://arxiv.org/abs/2405.18512v1 ) License: see link	Clayton Sanford, Bahare Fatemi, Ethan Hall, Anton Tsitsulin, Mehran Kazemi, Jonathan Halcrow, Bryan Perozzi, Vahab Mirrokni,	Which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems? While tremendous empirical advances have been attained by transformer-based neural networks, a theoretical understanding of their algorithmic reasoning capabilities in realistic parameter regimes is lacking. We investigate this question in terms of the network's depth, width, and number of extra tokens for algorithm execution. Our novel representational hierarchy separates 9 algorithmic reasoning problems into classes solvable by transformers in different realistic parameter scaling regimes. We prove that logarithmic depth is necessary and sufficient for tasks like graph connectivity, while single-layer transformers with small embedding dimensions can solve contextual retrieval tasks. We also support our theoretical analysis with ample empirical evidence using the GraphQA benchmark. These results show that transformers excel at many graph reasoning tasks, even outperforming specialized graph neural networks.	Translated: 2024-05-30 22:03:07 Published: 2024-05-28
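A classical intuition for why logarithmic depth suffices for connectivity is repeated Boolean squaring of the reachability matrix (A + I). After k squarings it captures all paths of length up to 2^k, so ⌈log₂ n⌉ rounds cover every simple path. This sketch illustrates that parallel-doubling idea only. It is not the paper's transformer construction or proof, and the function name is hypothetical.

```python
import numpy as np

def connected(adj, s, t):
    """Decide whether t is reachable from s by repeated squaring of (A + I).

    After k squarings, R[u, v] is True iff v is reachable from u in at most
    2**k steps, so about log2(n) rounds suffice. Returns (reachable, rounds).
    """
    n = len(adj)
    R = np.asarray(adj, dtype=bool) | np.eye(n, dtype=bool)
    rounds = 0
    while (1 << rounds) < n:
        R = (R.astype(int) @ R.astype(int)) > 0  # Boolean matrix product
        rounds += 1
    return bool(R[s, t]), rounds
```

Each round is one fully parallel operation, loosely analogous to one layer, which is why the number of rounds grows only logarithmically with graph size.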
# Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication ( http://arxiv.org/abs/2405.18515v1 ) License: see link	Yunuo Chen, Tianyi Xie, Zeshun Zong, Xuan Li, Feng Gao, Yin Yang, Ying Nian Wu, Chenfanfu Jiang,	Existing diffusion-based text-to-3D generation methods primarily focus on producing visually realistic shapes and appearances, often neglecting the physical constraints necessary for downstream tasks. Generated models frequently fail to maintain balance when placed in physics-based simulations or 3D printed. This balance is crucial for satisfying user design intentions in interactive gaming, embodied AI, and robotics, where stable models are needed for reliable interaction. Additionally, stable models ensure that 3D-printed objects, such as figurines for home decoration, can stand on their own without requiring additional supports. To fill this gap, we introduce Atlas3D, an automatic and easy-to-implement method that enhances existing Score Distillation Sampling (SDS)-based text-to-3D tools. Atlas3D ensures the generation of self-supporting 3D models that adhere to physical laws of stability under gravity, contact, and friction. 
Our approach combines a novel differentiable simulation-based loss function with physically inspired regularization, serving as either a refinement or a post-processing module for existing frameworks. We verify Atlas3D's efficacy through extensive generation tasks and validate the resulting 3D models in both simulated and real-world environments.	Translated: 2024-05-30 22:03:07 Published: 2024-05-28
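The basic physical criterion behind "self-supporting" is static stability on a flat floor. The ground projection of the center of mass must lie inside the convex hull of the contact points. This is a simplified, non-differentiable check, not Atlas3D's simulation-based loss. Two assumptions: the center of mass is approximated by the vertex mean (as if mass were spread evenly over the vertices), and contact means vertices within `contact_tol` of the lowest point. All names are hypothetical.

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain; returns 2D hull vertices in CCW order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def is_statically_stable(vertices, contact_tol=1e-3):
    """vertices: (n, 3) array, z up, object resting on the plane z = min z.
    Approximates the center of mass by the vertex mean and checks whether
    its ground projection lies inside the hull of the contact vertices."""
    v = np.asarray(vertices, dtype=float)
    com = v.mean(axis=0)
    contacts = v[v[:, 2] <= v[:, 2].min() + contact_tol][:, :2]
    hull = convex_hull(contacts)
    if len(hull) < 3:
        return False  # point or line contact: not robustly stable
    for i in range(len(hull)):
        a, b = hull[i], hull[(i + 1) % len(hull)]
        # for a CCW hull, interior points lie to the left of every edge
        if (b[0] - a[0]) * (com[1] - a[1]) - (b[1] - a[1]) * (com[0] - a[0]) < 0:
            return False
    return True
```

A unit cube passes the check. A shape whose upper mass overhangs its footprint fails it, which is the failure mode the abstract describes for generated models.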
# LSTM-COX Model: A Concise and Efficient Deep Learning Approach for Handling Recurrent Events ( http://arxiv.org/abs/2405.18518v1 ) License: see link	Zhang Runquan, Shi Xiaoping,	In the current field of clinical medicine, traditional methods for analyzing recurrent events have limitations when dealing with complex time-dependent data. This study combines Long Short-Term Memory networks (LSTM) with the Cox model to enhance the model's performance in analyzing recurrent events with dynamic temporal information. Compared to classical models, the LSTM-Cox model significantly improves the accuracy of extracting clinical risk features and exhibits lower Akaike Information Criterion (AIC) values, while maintaining good performance on simulated datasets. In an empirical analysis of bladder cancer recurrence data, the model successfully reduced the mean squared error during the training phase and achieved a concordance index of up to 0.90 on the test set. Furthermore, the model effectively distinguished between high-risk and low-risk patient groups, and the identified recurrence risk features, such as the number of tumor recurrences and maximum size, were consistent with other research and clinical trial results. This study not only provides a straightforward and efficient method for analyzing recurrent data and extracting features but also offers a convenient pathway for integrating deep learning techniques into clinical risk prediction systems.	Translated: 2024-05-30 22:03:07 Published: 2024-05-28
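Combining a neural network with the Cox model usually means training the network's scalar output as a log-hazard score under the Cox negative log partial likelihood. This sketch shows that standard loss, with no tie correction. It does not include the counting-process bookkeeping that recurrent-event variants need, and it is not taken from the paper. In the LSTM-Cox setting, `risk` would be the LSTM's per-subject output.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, time, event):
    """Average negative log partial likelihood of the Cox model (no ties).

    risk:  (n,) log-hazard scores, e.g. one LSTM output per subject.
    time:  (n,) observed times.
    event: (n,) 1 if the event occurred at `time`, 0 if censored.
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    loss = 0.0
    for i in np.where(event)[0]:
        at_risk = time >= time[i]  # risk set: subjects still under observation
        loss -= risk[i] - np.log(np.sum(np.exp(risk[at_risk])))
    return loss / max(event.sum(), 1)
```

With all scores equal to zero and three uncensored subjects at distinct times, the loss is (log 3 + log 2 + log 1) / 3 = log(6) / 3.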
# Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL ( http://arxiv.org/abs/2405.18520v1 ) License: see link	Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan,	Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks by leveraging previously collected data for policy learning. However, most existing off-policy RL algorithms fail to maximally exploit the information in the replay buffer, limiting sample efficiency and policy performance. In this work, we discover that concurrently training an offline RL policy on the shared online replay buffer can sometimes outperform the original online learning policy, although it is uncertain when such performance gains occur. This motivates a new possibility: harnessing the emergent, better-performing offline optimal policy to improve online policy learning. Based on this insight, we present Offline-Boosted Actor-Critic (OBAC), a model-free online RL framework that elegantly identifies the better-performing offline policy through value comparison and uses it as an adaptive constraint to guarantee stronger policy learning performance. Our experiments demonstrate that OBAC outperforms other popular model-free RL baselines and rivals advanced model-based RL methods in terms of sample efficiency and asymptotic performance across 53 tasks spanning 6 task suites.	Translated: 2024-05-30 22:03:07 Published: 2024-05-28
# TripletMix: Triplet Data Augmentation for 3D Understanding ( http://arxiv.org/abs/2405.18523v1 ) License: see link	Jiaze Wang, Yi Wang, Ziyu Guo, Renrui Zhang, Donghao Zhou, Guangyong Chen, Anfeng Liu, Pheng-Ann Heng,	Data augmentation has proven to be a vital tool for enhancing the generalization capabilities of deep learning models, especially in 3D vision, where traditional datasets are often limited. Despite previous advancements, existing methods primarily cater to unimodal data scenarios, leaving a gap in the augmentation of multimodal triplet data, which integrates text, images, and point clouds. Simultaneously augmenting all three modalities enhances diversity and improves alignment across modalities, resulting in more comprehensive and robust 3D representations. To address this gap, we propose TripletMix, a novel approach to the previously unexplored problem of multimodal data augmentation in 3D understanding. TripletMix innovatively applies the principles of mix-based augmentation to multimodal triplet data, allowing for the preservation and optimization of cross-modal connections. 
Our proposed TripletMix combines feature-level and input-level augmentations to achieve dual enhancement of raw data and latent features. By ensuring feature consistency and providing diverse and realistic training samples, it significantly improves the model's cross-modal understanding and generalization capabilities. We demonstrate that TripletMix not only improves the baseline performance of models in various learning scenarios, including zero-shot and linear probing classification, but also significantly enhances model generalizability. Notably, we improved the zero-shot classification accuracy on ScanObjectNN from 51.3 percent to 61.9 percent, and on Objaverse-LVIS from 46.8 percent to 51.4 percent. Our findings highlight the potential of multimodal data augmentation to significantly advance 3D object recognition and understanding.	Translated: 2024-05-30 22:03:07 Published: 2024-05-28
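Mix-based augmentation of a (text, image, point cloud) triplet hinges on using one shared mixing coefficient across all three modalities, so that the mixed sample stays semantically aligned. This sketch shows a feature-level mix with a shared `lam` and a simple input-level point-cloud mix that takes a `lam` fraction of points from each cloud. Both helpers are hypothetical illustrations, not the TripletMix implementation. In practice `lam` would typically be drawn from a Beta distribution.

```python
import numpy as np

def triplet_mix(batch_a, batch_b, lam):
    """Feature-level mix of two (text, image, point) feature triplets using
    one shared coefficient so the three modalities remain aligned."""
    return tuple(lam * a + (1 - lam) * b for a, b in zip(batch_a, batch_b))

def mix_point_clouds(pc_a, pc_b, lam, rng):
    """Input-level mix: keep round(lam * N) random points of pc_a and fill
    the remainder with random points of pc_b. pc_a, pc_b: (N, 3) arrays."""
    n = pc_a.shape[0]
    k = int(round(lam * n))
    idx_a = rng.choice(n, size=k, replace=False)
    idx_b = rng.choice(pc_b.shape[0], size=n - k, replace=False)
    return np.concatenate([pc_a[idx_a], pc_b[idx_b]], axis=0)
```

The same `lam` would also weight the two samples' labels or text targets, which keeps the cross-modal pairing consistent.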
# Aligning in a Compact Space: Contrastive Knowledge Distillation between Heterogeneous Architectures ( http://arxiv.org/abs/2405.18524v1 ) License: see link	Hongjun Wu, Li Xiao, Xingkuo Zhang, Yining Miao,	Knowledge distillation is commonly employed to compress neural networks, reducing inference costs and memory footprint. For homogeneous architectures, feature-based methods have been widely validated as effective. However, when the teacher and student models have heterogeneous architectures, the inherent differences in feature representation significantly degrade the performance of these methods. Recent studies have highlighted that low-frequency components constitute the majority of image features. Motivated by this, we propose a Low-Frequency Components-based Contrastive Knowledge Distillation (LFCC) framework that significantly enhances the performance of feature-based distillation between heterogeneous architectures. Specifically, we design a set of multi-scale low-pass filters to extract the low-frequency components of intermediate features from both the teacher and student models, aligning them in a compact space to overcome architectural disparities. 
Moreover, leveraging the intrinsic pairing characteristic of the teacher-student framework, we design an innovative sample-level contrastive learning framework that adeptly restructures the constraints of within-sample feature similarity and between-sample feature divergence into a contrastive learning task. This strategy enables the student model to capitalize on intra-sample feature congruence while simultaneously enhancing the discrimination of features among disparate samples. Consequently, our LFCC framework accurately captures the commonalities in feature representation across heterogeneous architectures. Extensive evaluations and empirical analyses across three architectures (CNNs, Transformers, and MLPs) demonstrate that LFCC achieves superior performance on the challenging ImageNet-1K and CIFAR-100 benchmarks. All code will be made publicly available.	Translated: 2024-05-30 22:03:07 Published: 2024-05-28
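A low-pass filter on a feature map can be sketched in the frequency domain. Take a 2D FFT per channel, keep a centered square window of low frequencies, and invert. Several window sizes then give a multi-scale set of filters. The FFT square mask and the `ratios` values are assumptions made for illustration; the abstract does not specify how LFCC's filters are built.

```python
import numpy as np

def low_pass(feat, keep_ratio):
    """feat: (C, H, W) feature map. Keep only spatial frequencies inside a
    centered square whose half-width is about keep_ratio/2 of each dimension."""
    C, H, W = feat.shape
    # move the zero frequency to the center (index H//2, W//2)
    F = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))
    h = max(1, int(H * keep_ratio) // 2)
    w = max(1, int(W * keep_ratio) // 2)
    cy, cx = H // 2, W // 2
    mask = np.zeros((H, W))
    mask[cy - h:cy + h + 1, cx - w:cx + w + 1] = 1.0
    out = np.fft.ifft2(np.fft.ifftshift(F * mask, axes=(-2, -1)), axes=(-2, -1))
    return out.real

def multi_scale_low_pass(feat, ratios=(0.5, 0.25, 0.125)):
    """Apply low-pass filters of decreasing bandwidth to the same features."""
    return [low_pass(feat, r) for r in ratios]
```

A constant feature map (pure DC) passes through unchanged, while a checkerboard pattern (the highest spatial frequency) is removed entirely. Teacher and student maps filtered this way can then be compared in the smoother, lower-dimensional band where heterogeneous architectures are more likely to agree.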
# REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment ( http://arxiv.org/abs/2405.18525v1 ) License: see link	Haonan Han, Rui Yang, Huan Liao, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li, Wanhua Li,	Traditional image-to-3D models often struggle with scenes containing multiple objects due to biases and occlusion complexities. To address this challenge, we present REPARO, a novel approach for compositional 3D asset generation from single images. REPARO employs a two-step process: first, it extracts individual objects from the scene and reconstructs their 3D meshes using off-the-shelf image-to-3D models; then, it optimizes the layout of these meshes through differentiable rendering techniques, ensuring coherent scene composition. By integrating an optimal-transport-based long-range appearance loss term and a high-level semantic loss term into the differentiable rendering, REPARO can effectively recover the layout of 3D assets. The proposed method can significantly enhance object independence, detail accuracy, and overall scene coherence. Extensive evaluation on multi-object scenes demonstrates that REPARO offers a comprehensive approach to the complexities of multi-object 3D scene generation from single images.	Translated: 2024-05-30 22:03:07 Published: 2024-05-28
# Task-Driven Uncertainty Quantification in Inverse Problems via Conformal Prediction ( http://arxiv.org/abs/2405.18527v1 ) License: see link	Jeffrey Wen, Rizwan Ahmad, Philip Schniter,	In imaging inverse problems, one seeks to recover an image from missing or corrupted measurements. Because such problems are ill-posed, there is great motivation to quantify the uncertainty induced by the measurement-and-recovery process. Motivated by applications where the recovered image is used for a downstream task, such as soft-output classification, we propose a task-centered approach to uncertainty quantification. In particular, we use conformal prediction to construct an interval that is guaranteed to contain the task output from the true image with a user-specified probability, and we use the width of that interval to quantify the uncertainty contributed by measurement and recovery. For posterior-sampling-based image recovery, we construct locally adaptive prediction intervals. Furthermore, we propose to collect measurements over multiple rounds, stopping as soon as the task uncertainty falls below an acceptable level. We demonstrate our methodology on accelerated magnetic resonance imaging (MRI).	Translated: 2024-05-30 22:03:07 Published: 2024-05-28
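The core guarantee can be illustrated with plain split conformal prediction for a scalar task output. Absolute errors between the task output on recovered images and on true images, measured on a calibration set, give a quantile `q`. The interval `test_pred ± q` then covers the true-image task output with probability at least 1 − alpha, assuming exchangeability. This basic, non-adaptive version is only a sketch. The paper's locally adaptive intervals built from posterior samples are more refined, and the function name is hypothetical.

```python
import numpy as np

def conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal interval for a scalar task output.

    cal_pred: task outputs computed from recovered calibration images.
    cal_true: task outputs computed from the corresponding true images.
    Returns (lower, upper) covering the true-image task output with
    probability >= 1 - alpha under exchangeability.
    """
    scores = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample corrected rank
    if k > n:
        return -np.inf, np.inf  # too few calibration points for this alpha
    q = np.sort(scores)[k - 1]
    return test_pred - q, test_pred + q
```

The interval width `2 * q` is the uncertainty measure. In a multi-round acquisition scheme like the one described, one would stop collecting measurements once this width drops below a chosen threshold.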
# Cardiovascular Disease Detection from Multi-View Chest X-rays with BI-Mamba ( http://arxiv.org/abs/2405.18533v1 ) License: see link	Zefan Yang, Jiajin Zhang, Ge Wang, Mannudeep K. Kalra, Pingkun Yan,	Accurate prediction of cardiovascular disease (CVD) risk from medical imaging is central to effective patient health management. Previous studies have demonstrated that imaging features in computed tomography (CT) can help predict CVD risk. However, CT entails notable radiation exposure, which may result in adverse health effects for patients. In contrast, chest X-ray emits significantly lower levels of radiation, offering a safer option. This rationale motivates our investigation into the feasibility of using chest X-ray to predict CVD risk. Convolutional Neural Networks (CNNs) and Transformers are two established network architectures for computer-aided diagnosis. However, they struggle to model very high-resolution chest X-rays, owing to either limited long-range context modeling or quadratic time complexity. 
Inspired by state space sequence models (SSMs), a new class of network architectures with sequence modeling power competitive with Transformers and linear time complexity, we propose Bidirectional Image Mamba (BI-Mamba), which complements unidirectional SSMs with information from the opposite direction. BI-Mamba uses parallel forward and backward blocks to encode long-range dependencies in multi-view chest X-rays. We conduct extensive experiments on images from 10,395 subjects in the National Lung Screening Trial (NLST). Results show that BI-Mamba outperforms ResNet-50 and ViT-S with comparable parameter counts and saves a significant amount of GPU memory during training. In addition, BI-Mamba achieves promising performance compared with the previous state of the art in CT, revealing the potential of chest X-ray for CVD risk prediction.	Translated: 2024-05-30 22:03:07 Published: 2024-05-28
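The bidirectional idea can be shown with the simplest linear state-space recurrence, h_t = a·h_{t−1} + b·x_t with y_t = c·h_t. Run one scan left to right and another right to left, then sum, so every position receives context from both ends of the sequence in linear time. This toy uses fixed scalar parameters. Real Mamba blocks use input-dependent (selective) parameters and learned projections, and the function names here are hypothetical.

```python
import numpy as np

def linear_scan(x, a, b, c):
    """Scalar-state linear recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt
        ys.append(c * h)
    return np.array(ys)

def bidirectional_scan(x, fwd=(0.9, 1.0, 1.0), bwd=(0.9, 1.0, 1.0)):
    """Sum a left-to-right scan and a right-to-left scan of the same sequence."""
    x = np.asarray(x, dtype=float)
    y_f = linear_scan(x, *fwd)
    y_b = linear_scan(x[::-1], *bwd)[::-1]  # scan reversed input, then re-reverse
    return y_f + y_b
```

For input `[0, 0, 1]` with `a = 0.5`, a forward-only scan gives `[0, 0, 1]`, so the first position never sees the final token. The bidirectional sum gives `[0.25, 0.5, 2]`.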
# 組合せ最適化におけるサブサンプリングによる個別プライバシ会計 Individualized Privacy Accounting via Subsampling with Applications in Combinatorial Optimization ( http://arxiv.org/abs/2405.18534v1 ) ライセンス: Link先を確認	Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Adam Sealfon,	(参考訳) 本研究では,アルゴリズムが一方のAdd-DPである場合,そのサブサンプル版が両側のDPを満たすという単純な観察を通して,個別化されたプライバシ会計を解析する新しい手法を提案する。これにより、分解可能な部分モジュラ最大化や集合被覆を含む、プライベート組合せ最適化問題に対する改良されたアルゴリズムがいくつか得られる。我々の誤差保証は漸近的に厳密であり、我々のアルゴリズムは純粋DPを満足する一方、既知アルゴリズム(Gupta et al , 2010; Chaturvedi et al , 2021)は近似DPである。また,ストリーム内の重み付け問題に純粋DPアルゴリズムを付与することにより,組合せ最適化を超越した手法を適用した(Kaplan et al ,2021; Cohen & Lyu, 2023)。 In this work, we give a new technique for analyzing individualized privacy accounting via the following simple observation: if an algorithm is one-sided add-DP, then its subsampled variant satisfies two-sided DP. From this, we obtain several improved algorithms for private combinatorial optimization problems, including decomposable submodular maximization and set cover. Our error guarantees are asymptotically tight and our algorithm satisfies pure-DP, while previously known algorithms (Gupta et al., 2010; Chaturvedi et al., 2021) are approximate-DP. We also show an application of our technique beyond combinatorial optimization by giving a pure-DP algorithm for the shifting heavy hitter problem in a stream; previously, only an approximate-DP algorithm was known (Kaplan et al., 2021; Cohen & Lyu, 2023).	翻訳日:2024-05-30 22:03:07 公開日:2024-05-28
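This is background for the subsampling idea, not the paper's argument. The textbook amplification-by-subsampling bound goes as follows. Suppose a mechanism is epsilon-DP with respect to adding or removing one record. Running it on a Poisson subsample, where each record is kept independently with probability q, is then ln(1 + q(e^epsilon - 1))-DP. The function name below is hypothetical. The paper's one-sided add-DP analysis is not reproduced here.

```python
import math


def amplified_epsilon(epsilon, q):
    """Epsilon after Poisson subsampling at rate q (add/remove-one neighbours).

    Uses ln(1 + q * (e^eps - 1)), computed with log1p/expm1 for accuracy
    when q or epsilon is small.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("sampling rate q must be in [0, 1]")
    return math.log1p(q * math.expm1(epsilon))


print(amplified_epsilon(1.0, 1.0))  # no subsampling: epsilon unchanged
print(amplified_epsilon(1.0, 0.1))  # roughly 0.16: stronger privacy
```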
# 固体スピン量子ビットのデコヒーレンス:計算的視点 Decoherence of solid-state spin qubits: a computational perspective ( http://arxiv.org/abs/2405.18535v1 ) ライセンス: Link先を確認	Mykyta Onizhuk, Giulia Galli,	(参考訳) 量子技術における固体スピンの有用性は、量子状態のコヒーレントな重ね合わせにどれだけ長く留まるかに依存する。このColloquiumは、第一原理シミュレーションが様々な種類の固体電子スピンのスピンダイナミクスを予測し、量子コンピューティング、ネットワーク、センシングのための新しいプラットフォームの設計と改善を支援する方法について論じている。まず、一般的な量子システムに影響を及ぼすノイズの必要な概念を概説する。次に、スピン欠陥量子ビットのスピンフォノン緩和を予測する最近の進歩に焦点を当てる。次に,スピンスピン相互作用によって引き起こされる量子デコヒーレンスをシミュレーションするためのクラスタ法について議論し,これらのシミュレーションの精度を保証する上での検証の重要性を強調する。我々は、最近の実験結果の解釈において、検証されたクラスタ法がどのように有効かを強調し、さらに重要なことは、新しいスピンベースの量子プラットフォームにおけるコヒーレンス特性を予測し、次世代量子技術の発展を導くことである。 The usefulness of solid-state spins in quantum technologies depends on how long they can remain in a coherent superposition of quantum states. This Colloquium discusses how first-principles simulations can predict spin dynamics for different types of solid-state electron spins, helping design novel and improved platforms for quantum computing, networking, and sensing. We begin by outlining the necessary concepts of the noise affecting generic quantum systems. We then focus on recent advances in predicting spin-phonon relaxation of the spin-defect qubits. Next, we discuss cluster methods as a means of simulating quantum decoherence induced by spin-spin interactions, emphasizing the critical role of validation in ensuring the accuracy of these simulations. We highlight how validated cluster methods can be instrumental in interpreting recent experimental results and, more importantly, predicting the coherence properties of novel spin-based quantum platforms, guiding the development of next-generation quantum technologies.	翻訳日:2024-05-30 22:03:07 公開日:2024-05-28
# ドメイン反転ニューラルプロセスを用いた機械的循環支援のためのデータ駆動シミュレータ Data-Driven Simulator for Mechanical Circulatory Support with Domain Adversarial Neural Process ( http://arxiv.org/abs/2405.18536v1 ) ライセンス: Link先を確認	Sophia Sun, Wenyuan Chen, Zihao Zhou, Sonia Fereidooni, Elise Jortberg, Rose Yu,	(参考訳) 確率的ディープシークエンスモデルとして実装されたMCS(Mechanical Circulatory Support)デバイス。 MCSの既存の機械シミュレータは、仮定の単純化に依存しており、患者固有の振る舞いに敏感であり、実際の治療シナリオに適用性を制限する。これらの欠点に対処するために、我々のモデルであるDomain Adversarial Neural Process (DANP)は、ニューラルネットワークアーキテクチャを用いて、MCSポンプレベルと不確実性を伴う大動脈圧測定との確率的関係をキャプチャする。我々は、シミュレーションデータと実世界の観測データを組み合わせるために、ドメインの敵対的トレーニングを使用し、その結果、より現実的で多様な潜在的な結果が表現される。非定常的傾向予測の19%の改善による経験的結果は、臨床医がMCS患者の治療について理解し、決定を下すための効果的なツールとしてDANPを確立した。 We present a data-driven simulator for Mechanical Circulatory Support (MCS) devices, implemented as a probabilistic deep sequence model. Existing mechanical simulators for MCS rely on oversimplifying assumptions and are insensitive to patient-specific behavior, limiting their applicability to real-world treatment scenarios. To address these shortcomings, our model Domain Adversarial Neural Process (DANP) employs a neural process architecture, allowing it to capture the probabilistic relationship between MCS pump levels and aortic pressure measurements with uncertainty. We use domain adversarial training to combine simulation data with real-world observations, resulting in a more realistic and diverse representation of potential outcomes. Empirical results, including a 19% improvement in non-stationary trend prediction, establish DANP as an effective tool for clinicians to understand and make informed decisions regarding MCS patient treatment.	翻訳日:2024-05-30 22:03:07 公開日:2024-05-28
# ARにおける組込み音声駆動オンザフライ参照による拡張会話 Augmented Conversation with Embedded Speech-Driven On-the-Fly Referencing in AR ( http://arxiv.org/abs/2405.18537v1 ) ライセンス: Link先を確認	Shivesh Jadon, Mehrad Faridan, Edward Mah, Rajan Vaish, Wesley Willett, Ryo Suzuki,	(参考訳) 本稿では,拡張現実(AR)における組込み音声駆動のオンザフライ参照を通じて,共同会話を支援することを目的とした,拡張現実の概念を紹介する。今日、スマートフォンのようなコンピューティング技術は、会話中に様々な参照に素早くアクセスできる。しかし、これらのツールはしばしば注意をそらし、アイコンタクトを減らし、ユーザーは携帯電話の画面に注意を集中させ、関連する情報にアクセスするためにキーワードを手入力する。対照的に、ARベースのオンザフライ参照は、音声会話から自動的に抽出されるキーワードに基づいて、リアルタイムで関連する視覚的参照を提供する。これらの視覚的参照を会話パートナーの周囲に埋め込むことで、強化された会話は混乱と摩擦を減らし、ユーザーはアイコンタクトを維持し、より自然なソーシャルインタラクションをサポートすることができる。この概念を実証するために,実時間音声認識,自然言語処理,視線に基づく対話を利用したホロレンスベースのインタフェースである \system を開発した。本稿では,ユーザ中心の設計プロセスを通じて識別された7つの設計ガイドラインに基づいて,会話の視覚的参照の設計空間について検討し,我々の実装について述べる。最初のユーザ調査では、スマートフォンの検索に比べて会話の邪魔や摩擦を減らし、非常に有用で関連性の高い情報を提供する。 This paper introduces the concept of augmented conversation, which aims to support co-located in-person conversations via embedded speech-driven on-the-fly referencing in augmented reality (AR). Today computing technologies like smartphones allow quick access to a variety of references during the conversation. However, these tools often create distractions, reducing eye contact and forcing users to focus their attention on phone screens and manually enter keywords to access relevant information. In contrast, AR-based on-the-fly referencing provides relevant visual references in real-time, based on keywords extracted automatically from the spoken conversation. By embedding these visual references in AR around the conversation partner, augmented conversation reduces distraction and friction, allowing users to maintain eye contact and supporting more natural social interactions. To demonstrate this concept, we developed \system, a Hololens-based interface that leverages real-time speech recognition, natural language processing and gaze-based interactions for on-the-fly embedded visual referencing. 
In this paper, we explore the design space of visual referencing for conversations and describe our implementation, which builds on seven design guidelines identified through a user-centered design process. An initial user study confirms that our system decreases distraction and friction in conversations compared to smartphone searches, while providing highly useful and relevant information.	翻訳日:2024-05-30 22:03:07 公開日:2024-05-28
# モデル駆動工学における自動化の過去、現在、そして未来 The Past, Present, and Future of Automation in Model-Driven Engineering ( http://arxiv.org/abs/2405.18539v1 ) ライセンス: Link先を確認	Lola Burgueño, Davide Di Ruscio, Houari Sahraoui, Manuel Wimmer,	(参考訳) モデル駆動エンジニアリング(MDE)は多くの異なるエンジニアリングタスク、特に設計から実装への移行に関わる自動化に関する膨大な知識を提供する。人工知能(AI)技術に関する大きな進歩により、既存のMDE技術や技術をどのように改善できるか、あるいは現在専用のサポートを欠いている他のアクティビティも自動化できるかといった、MDEの将来に対する疑問が持ち上がる。しかし同時に、複雑なシステムの作成、運用、保守のために、エンジニアのループを維持するためにモデルをどこに、どのように使用するべきかを再検討する必要がある。これらのオープンポイントに関する専門的な研究のきっかけとして、MDEにおける自動化の歴史と、MDEにおける自動化をさらに改善し、中長期的視点において障害を克服しなければならないかという視点について論じる。 Model-Driven Engineering (MDE) provides a huge body of knowledge of automation for many different engineering tasks, especially those involving transitioning from design to implementation. With the huge progress made on Artificial Intelligence (AI) techniques, questions arise for the future of MDE, such as how existing MDE techniques and technologies can be improved or how other activities which currently lack dedicated support can also be automated. At the same time, however, we must revisit where and how models should be used to keep engineers in the loop for creating, operating, and maintaining complex systems. To trigger dedicated research on these open points, we discuss the history of automation in MDE and present perspectives on how automation in MDE can be further improved and which obstacles have to be overcome in the medium and long term.	翻訳日:2024-05-30 22:03:07 公開日:2024-05-28
# 堅牢なレッドチームと安全チューニングのための大規模言語モデルに対する多様な攻撃学習 Learning diverse attacks on large language models for robust red-teaming and safety tuning ( http://arxiv.org/abs/2405.18540v1 ) ライセンス: Link先を確認	Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain,	(参考訳) レッドチーム、あるいは有害な応答を誘発するプロンプトの特定は、大規模言語モデル(LLM)の安全かつ責任あるデプロイを保証するための重要なステップである。多くの攻撃プロンプトに対する効果的な防御を開発するには、多様な攻撃を発見する必要がある。自動化されたレッドチームは通常、例えば補助毒性分類器によって測定されたように、強化学習を使用して攻撃言語モデルを微調整し、ターゲットのLSMから望ましくない応答を誘発するプロンプトを生成する。新規性と多様性を優先する明確な規則化であっても、既存のアプローチはモード崩壊または効果的な攻撃を発生させることができないことを示す。フレキシブルで確率論的に原理化された代替手段として,GFlowNetの微調整と二次平滑化フェーズを併用して,多種多様な効果的な攻撃プロンプトを生成するようアタッカーモデルを訓練することを提案する。提案手法により生成された攻撃は,安全チューニングと遠隔操作の両方で広範囲のLLMに対して有効であり,目標LLM間での移動が良好であることがわかった。最後に,提案手法により生成したレッドチームプロンプトのデータセットを用いて,安全チューニングされたモデルが,他のRLベースのレッドチームアプローチからの攻撃に対して堅牢であることを示す。 Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs). Developing effective protection against many modes of attack prompts requires discovering diverse attacks. Automated red-teaming typically uses reinforcement learning to fine-tune an attacker language model to generate prompts that elicit undesirable responses from a target LLM, as measured, for example, by an auxiliary toxicity classifier. We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks. As a flexible and probabilistically principled alternative, we propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts. We find that the attacks generated by our method are effective against a wide range of target LLMs, both with and without safety tuning, and transfer well between target LLMs. 
Finally, we demonstrate that models safety-tuned using a dataset of red-teaming prompts generated by our method are robust to attacks from other RL-based red-teaming approaches.	翻訳日:2024-05-30 21:53:22 公開日:2024-05-28
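The abstract does not state the training objective. GFlowNet fine-tuning of an autoregressive language model is commonly done with the trajectory balance (TB) loss. TB drives the sampler to generate a prompt x with probability proportional to its reward R(x), rather than only maximizing reward, and this is what encourages diversity. The sketch below assumes TB. It does not show the secondary smoothing phase, and the function name and arguments are hypothetical.

```python
def trajectory_balance_loss(log_z, token_logprobs, log_reward):
    """Squared trajectory-balance residual for one sampled prompt.

    log_z: learned scalar estimate of the log partition function.
    token_logprobs: log P_F(token | prefix) for each generated token.
    log_reward: log R(prompt), e.g. a log toxicity score of the target's reply.
    For left-to-right generation every prefix has a single parent, so the
    backward-policy term of the general TB objective is zero here.
    """
    log_pf = sum(token_logprobs)
    return (log_z + log_pf - log_reward) ** 2


# A prompt whose sequence probability already matches R(x)/Z has zero loss.
print(trajectory_balance_loss(0.5, [-1.0, -0.5], -1.0))
```

Minimizing this loss over sampled prompts, jointly in the policy and `log_z`, spreads probability mass across all high-reward prompts instead of collapsing onto one. That is the mode-collapse failure the abstract attributes to reward-maximizing RL.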
# ビジョンランゲージモデルの低ランクFew-Shot適応 Low-Rank Few-Shot Adaptation of Vision-Language Models ( http://arxiv.org/abs/2405.18541v1 ) ライセンス: Link先を確認	Maxime Zanella, Ismail Ben Ayed,	(参考訳) VLM(Vision-Language Models)の少数の適応の最近の進歩は、目標下流タスクにおいてわずか数個のラベル付きサンプルを犠牲にして、その一般化能力をさらに推し進めている。しかし、この有望な、既にかなりの数ショットの文献は、主に迅速な学習に焦点を合わせており、より少ない範囲において、パラメータ効率の良いファインチューニング(PEFT)の最近の進歩を見越して、アダプタに焦点をあてている。さらに、VLMの既存の数発の学習手法は、重い訓練手順と/または慎重に選択されたタスク固有のハイパーパラメータに依存しており、それらの適用性を阻害する可能性がある。これに対し、VLMのための数ショット学習においてローランド適応(LoRA)を導入し、現在の最先端のプロンプトとアダプタベースのアプローチと比較して、11のデータセットにその可能性を示す。驚くべきことに、私たちの単純なCLIP-LoRAメソッドは、トレーニング時間を短縮し、すべてのターゲットタスク、すなわち、すべてのデータセットとショット数に同じハイパーパラメータを保持するとともに、大幅に改善されている。もちろん、我々の驚くべき結果は、迅速な学習とアダプタベースの研究の可能性を否定するものではない。しかし,本研究の強力なベースラインは,これらの突発性被験者の経過を数発のVLMで評価するのに有効であると考えられた。 Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further pushed their generalization capabilities, at the expense of just a few labeled samples within the target downstream task. However, this promising, already quite abundant few-shot literature has focused principally on prompt learning and, to a lesser extent, on adapters, overlooking the recent advances in Parameter-Efficient Fine-Tuning (PEFT). Furthermore, existing few-shot learning methods for VLMs often rely on heavy training procedures and/or carefully chosen, task-specific hyper-parameters, which might impede their applicability. In response, we introduce Low-Rank Adaptation (LoRA) in few-shot learning for VLMs, and show its potential on 11 datasets, in comparison to current state-of-the-art prompt- and adapter-based approaches. Surprisingly, our simple CLIP-LoRA method exhibits substantial improvements, while reducing the training times and keeping the same hyper-parameters in all the target tasks, i.e., across all the datasets and numbers of shots. Certainly, our surprising results do not dismiss the potential of prompt-learning and adapter-based research. 
However, we believe that our strong baseline could be used to evaluate progress in these emergent subjects in few-shot VLMs.	翻訳日:2024-05-30 21:53:22 公開日:2024-05-28
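For reference, the standard LoRA update keeps a pretrained weight W0 frozen and learns a low-rank correction: h = W0 x + (alpha / r) B A x. B is initialized to zero, so training starts exactly from the pretrained model. The abstract does not say which CLIP matrices receive adapters or which rank is used. Below is a minimal pure-Python sketch of the forward pass; the helper names are hypothetical.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w0, a, b, x, alpha, rank):
    """h = W0 x + (alpha / rank) * B (A x).

    w0: frozen d_out x d_in weight; a: rank x d_in; b: d_out x rank.
    Only a and b would be trained.
    """
    base = matvec(w0, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [h + scale * d for h, d in zip(base, delta)]


w0 = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]
print(lora_forward(w0, a, [[0.0], [0.0]], [2.0, 3.0], alpha=2.0, rank=1))  # B = 0: same as W0 x
print(lora_forward(w0, a, [[1.0], [0.0]], [2.0, 3.0], alpha=2.0, rank=1))
```

The trainable parameter count is rank * (d_in + d_out) per adapted matrix instead of d_in * d_out. This is why a low-rank adapter stays cheap to train in the few-shot setting.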
# 自然言語処理機能を有するエンターテイメントチャットボットを用いた高齢者の認知障害の自動検出 Automatic detection of cognitive impairment in elderly people using an entertainment chatbot with Natural Language Processing capabilities ( http://arxiv.org/abs/2405.18542v1 ) ライセンス: Link先を確認	Francisco de Arriba-Pérez, Silvia García-Méndez, Francisco J. González-Castaño, Enrique Costa-Montenegro,	(参考訳) 従来の研究者は認知障害の治療モニタリングのためのインテリジェントシステムを提案してきた。しかし、この目的のための既存の実践的なアプローチは手動テストに基づいている。これにより、過剰なケアやホワイトコート効果などの問題が発生する。これらの問題を回避するため,高齢者の関心を喚起し,認知障害を透過的に監視するインテリジェントな会話システムを提案する。自動チャットボット対話は、コンテンツ記述スキルの評価と機械学習アルゴリズムによる認知障害の検出を可能にする。我々は、自然言語生成技術を用いて、更新されたニュース項目からこれらの対話フローを自動生成する。このシステムは、質問に対する回答のゴールドスタンダードも推論するので、これらの回答とユーザ応答を比べることで、認知能力を自動的に評価することができる。類似度は[0, 1]の値を持つ類似度で、類似度のレベルが増加する。本研究は,認知症早期の高齢者30名を対象に,老年医学者の指導のもと,フィールドテストを実施した。実験では, 利用者のストレスと集中度を解析した。認知障害のない患者は最大で5倍の成績を示した。特に類似度は、ストレスや集中していない参加者の0.03と、リラックスしたユーザーと集中したユーザーの0.36と様々である。最後に、自動認知障害検出のためのテキスト解析機能に基づく機械学習アルゴリズムを開発し、精度、F測定、リコールレベルを80%以上とした。そこで我々は,エンターテイメントコンテンツに基づく高齢者の認知障害の自動検出手法を検証した。 Previous researchers have proposed intelligent systems for therapeutic monitoring of cognitive impairments. However, most existing practical approaches for this purpose are based on manual tests. This raises issues such as excessive caretaking effort and the white-coat effect. To avoid these issues, we present an intelligent conversational system for entertaining elderly people with news of their interest that monitors cognitive impairment transparently. Automatic chatbot dialogue stages allow assessing content description skills and detecting cognitive impairment with Machine Learning algorithms. We create these dialogue flows automatically from updated news items using Natural Language Generation techniques. The system also infers the gold standard of the answers to the questions, so it can assess cognitive capabilities automatically by comparing these answers with the user responses. It employs a similarity metric with values in [0, 1], in increasing level of similarity. 
To evaluate the performance and usability of our approach, we have conducted field tests with a test group of 30 elderly people in the earliest stages of dementia, under the supervision of gerontologists. In the experiments, we have analysed the effect of stress and concentration in these users. Those without cognitive impairment performed up to five times better. In particular, the similarity metric varied between 0.03, for stressed and unfocused participants, and 0.36, for relaxed and focused users. Finally, we developed a Machine Learning algorithm based on textual analysis features for automatic cognitive impairment detection, which attained accuracy, F-measure and recall levels above 80%. We have thus validated the automatic approach to detect cognitive impairment in elderly people based on entertainment content.	翻訳日:2024-05-30 21:53:22 公開日:2024-05-28
# CAPTCHAの利用者認識 : 大学とインターネット利用者の比較研究 User Perception of CAPTCHAs: A Comparative Study between University and Internet Users ( http://arxiv.org/abs/2405.18547v1 ) ライセンス: Link先を確認	Arun Reddy, Yuan Cheng,	(参考訳) CAPTCHAは、ウェブ上の人間とボットのユーザーを区別するために一般的に使用される。しかし、さまざまなタイプのCAPTCHAを持っているにもかかわらず、セキュリティとユーザビリティについてはまだ懸念されている。これらの懸念に対処するため、大学キャンパスとAmazon Mechanical Turkから250人以上の参加者を調査した。私たちの目標は、現在のCAPTCHA実装のセキュリティとユーザビリティに関するユーザの認識を集めることです。統計的・理論的手法を用いてデータを解析した結果,難易度の増加による現在のCAPTCHA課題のナビゲートに苦慮していることが判明した。その結果、ユーザエクスペリエンスに悪影響を及ぼすフラストレーションを経験する。さらに、参加者はこれらのシステムの信頼性とセキュリティについて懸念を表明した。私たちの発見は、よりセキュアでユーザフレンドリなCAPTCHA技術を作成する上で、貴重な洞察を与えることができます。 CAPTCHAs are commonly used to distinguish between human and bot users on the web. However, despite having various types of CAPTCHAs, there are still concerns about their security and usability. To address these concerns, we surveyed over 250 participants from a university campus and Amazon Mechanical Turk. Our goal was to gather user perceptions regarding the security and usability of current CAPTCHA implementations. After analyzing the data using statistical and thematic methods, we found that users struggle to navigate current CAPTCHA challenges due to increasing difficulty levels. As a result, they experience frustration, which negatively impacts their user experience. Additionally, participants expressed concerns about the reliability and security of these systems. Our findings can offer valuable insights for creating more secure and user-friendly CAPTCHA technologies.	翻訳日:2024-05-30 21:53:22 公開日:2024-05-28
# エンコーダオンリー変圧器の形式推論の計算複雑性 The Computational Complexity of Formal Reasoning for Encoder-Only Transformers ( http://arxiv.org/abs/2405.18548v1 ) ライセンス: Link先を確認	Marco Sälzer, Eric Alsmann, Martin Lange,	(参考訳) 本研究では,エンコーダのみの変圧器(EOT)の形式的推論の課題と可能性について検討する。本稿では,自然発生型満足度問題(SAT)の形で,関連する形式的推論タスクを凝縮する。 EOTを考えるとSATは決定不可能であり,表現性コミュニティでは一般的に考慮されている。さらに,SATが決定可能な現実シナリオを特定し,それに対応する複雑性境界を確立する。自明なケースの他に、量子化されたEOT、すなわち固定幅の算術で制限されたEOTは、その注意力の制限によりSATの決定可能性に繋がる。しかし、SAT が NEXPTIME ハードなシナリオと、量子化された EOT に対して NEXPTIME で解決可能であることを示すシナリオを確立することは困難である。理論的結果を補完するため, フォーマルな推論の全体的視点において, 研究結果とその意義を考察した。 We investigate challenges and possibilities of formal reasoning for encoder-only transformers (EOT), meaning sound and complete methods for verifying or interpreting behaviour. In detail, we condense related formal reasoning tasks in the form of a naturally occurring satisfiability problem (SAT). We find that SAT is undecidable if we consider EOT, commonly considered in the expressiveness community. Furthermore, we identify practical scenarios where SAT is decidable and establish corresponding complexity bounds. Besides trivial cases, we find that quantized EOT, namely those restricted by some fixed-width arithmetic, lead to the decidability of SAT due to their limited attention capabilities. However, the problem remains difficult, as we establish those scenarios where SAT is NEXPTIME-hard and those where we can show that it is solvable in NEXPTIME for quantized EOT. To complement our theoretical results, we put our findings and their implications in the overall perspective of formal reasoning.	翻訳日:2024-05-30 21:53:22 公開日:2024-05-28
# 不確実なデータから学ぶ:可能な世界から可能なモデルへ Learning from Uncertain Data: From Possible Worlds to Possible Models ( http://arxiv.org/abs/2405.18549v1 ) ライセンス: Link先を確認	Jiongli Zhu, Su Feng, Boris Glavic, Babak Salimi,	(参考訳) 本研究では,不確実なデータから線形モデルを効率よく学習する方法を提案する。提案手法では,コンベックスポリトープの一種である抽象解釈とゾノトープを用いて,これらのデータセットの変動をコンパクトに表現し,すべての可能な世界に対する勾配勾配のシンボリックな実行を可能にする。我々は、この過程が固定点に収束することを保証する技術を開発し、この固定点に対する閉形式解を導出する。提案手法は,全ての可能な最適モデルと予測範囲を過度に近似する。提案手法の有効性を理論的および経験的分析により実証し,データ品質の問題によるモデルと予測の不確実性について推論する可能性を明らかにする。 We introduce an efficient method for learning linear models from uncertain data, where uncertainty is represented as a set of possible variations in the data, leading to predictive multiplicity. Our approach leverages abstract interpretation and zonotopes, a type of convex polytope, to compactly represent these dataset variations, enabling the symbolic execution of gradient descent on all possible worlds simultaneously. We develop techniques to ensure that this process converges to a fixed point and derive closed-form solutions for this fixed point. Our method provides sound over-approximations of all possible optimal models and viable prediction ranges. We demonstrate the effectiveness of our approach through theoretical and empirical analysis, highlighting its potential to reason about model and prediction uncertainty due to data quality issues in training data.	翻訳日:2024-05-30 21:53:22 公開日:2024-05-28
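Zonotopes suit linear models because an affine map sends a zonotope to another zonotope exactly, and per-coordinate bounds are cheap to read off. The minimal sketch below shows only that representation. It propagates uncertain inputs through a fixed linear model and does not reproduce the paper's symbolic gradient descent or its closed-form fixed point. The class and variable names are hypothetical.

```python
class Zonotope:
    """The set {c + G e : e in [-1, 1]^m}: center c (length n), generators G (n x m)."""

    def __init__(self, center, generators):
        self.center = list(center)
        self.generators = [list(row) for row in generators]

    def affine(self, w, b):
        """Exact image of the set under x -> W x + b (W is k x n, b has length k)."""
        n_gens = len(self.generators[0]) if self.generators else 0
        center = [sum(wi * ci for wi, ci in zip(row, self.center)) + bi
                  for row, bi in zip(w, b)]
        gens = [[sum(row[k] * self.generators[k][j] for k in range(len(row)))
                 for j in range(n_gens)]
                for row in w]
        return Zonotope(center, gens)

    def bounds(self):
        """Per-coordinate interval hull: c_i +/- sum_j |G_ij|."""
        return [(c - sum(abs(g) for g in row), c + sum(abs(g) for g in row))
                for c, row in zip(self.center, self.generators)]


# Two uncertain features: x1 in [0, 2], x2 in [-2, 2] (one generator each).
data = Zonotope([1.0, 0.0], [[1.0, 0.0], [0.0, 2.0]])
print(data.bounds())   # [(0.0, 2.0), (-2.0, 2.0)]
# Range of a fixed linear model y = x1 + x2 + 0.5 over every possible input.
pred = data.affine([[1.0, 1.0]], [0.5])
print(pred.bounds())   # [(-1.5, 4.5)]
```

Because generators are shared across coordinates, correlations between uncertain values are preserved through linear operations. A plain interval (box) representation would lose them.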
# ニューラルネットワークのスムーズなl0正規化によるエントロピー誤差関数のSGD法 SGD method for entropy error function with smoothing l0 regularization for neural networks ( http://arxiv.org/abs/2405.18552v1 ) ライセンス: Link先を確認	Trong-Tuan Nguyen, Van-Dat Thang, Nguyen Van Thin, Phuong T. Nguyen,	(参考訳) エントロピー誤差関数はニューラルネットワークで広く使われている。それでも、この誤差関数に基づくネットワークトレーニングは、一般的に、収束速度が遅くなり、局所的な最小値や、実際には不正な飽和問題にも容易に閉じ込められる。実際、ニューラルネットワークとその応用におけるエントロピー誤差関数に基づく多くの結果が存在する。しかし、そのようなアルゴリズムの理論とその収束は、今のところ完全には研究されていない。そこで本研究では,フィードフォワードニューラルネットワークにおけるl0正規化を円滑に行うエントロピー関数を提案する。実世界のデータセットを用いて、新たに考案されたアルゴリズムが、検討されたニューラルネットワークの予測性能を大幅に改善できることを示す実験的な評価を行った。さらに, 実験結果から, 提案した関数は, 十分に確立されたベースラインに比べて, より正確な分類をもたらすことが明らかとなった。ニューラルネットワークを効果的に学習し、最先端のアルゴリズムと比較してより正確な予測を生成するため、我々の研究は新しくなっています。この点に関して、このアルゴリズムはこの分野の既存の研究に貢献し、機械学習とディープラーニングの研究を進めていくことを期待する。 The entropy error function has been widely used in neural networks. Nevertheless, network training based on this error function generally converges slowly, and in practice it can easily become trapped in a local minimum or even suffer from the incorrect saturation problem. In fact, there are many results based on the entropy error function in neural networks and their applications. However, the theory of such an algorithm and its convergence have not been fully studied so far. To tackle the issue, we propose a novel entropy function with smoothing l0 regularization for feed-forward neural networks. Using real-world datasets, we performed an empirical evaluation to demonstrate that the newly conceived algorithm allows us to substantially improve the prediction performance of the considered neural networks. More importantly, the experimental results also show that our proposed function yields more precise classifications compared to well-founded baselines. Our work is novel as it enables neural networks to learn effectively, producing more accurate predictions compared to state-of-the-art algorithms. 
In this respect, we expect that the algorithm will contribute to existing studies in the field, advancing research in Machine Learning and Deep Learning.	翻訳日:2024-05-30 21:53:22 公開日:2024-05-28
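The abstract does not give the smoothing function, so what follows is a hypothetical illustration. It uses a common Gaussian surrogate for the l0 "norm": each weight contributes 1 - exp(-w^2 / (2 sigma^2)), which is near 0 for tiny weights and near 1 for large ones. That makes the count of nonzero weights differentiable. The sketch adds this penalty to the cross-entropy (entropy error) of a single logistic unit and takes one SGD step. All names, and the choice of surrogate, are assumptions.

```python
import math


def smoothed_l0(weights, sigma):
    """Differentiable surrogate for the number of nonzero weights.

    Assumed Gaussian form: sum_i (1 - exp(-w_i^2 / (2 sigma^2))).
    """
    return sum(1.0 - math.exp(-w * w / (2.0 * sigma * sigma)) for w in weights)


def smoothed_l0_grad(w, sigma):
    """Derivative of one term of smoothed_l0 with respect to w."""
    return (w / (sigma * sigma)) * math.exp(-w * w / (2.0 * sigma * sigma))


def sgd_step(weights, x, y, lr, lam, sigma):
    """One SGD step on the cross-entropy of a single logistic unit
    plus lam * smoothed_l0(weights)."""
    z = sum(w * xi for w, xi in zip(weights, x))
    p = 1.0 / (1.0 + math.exp(-z))
    # For a sigmoid output with cross-entropy loss, dLoss/dz = p - y.
    return [w - lr * ((p - y) * xi + lam * smoothed_l0_grad(w, sigma))
            for w, xi in zip(weights, x)]


print(smoothed_l0([0.0, 0.0, 5.0], sigma=0.1))  # about 1: one "nonzero" weight
print(sgd_step([0.0], [1.0], 1.0, lr=0.1, lam=0.0, sigma=0.1))
```

Smaller `sigma` makes the surrogate closer to a true count but its gradient steeper near zero. Such methods therefore often treat sigma as a tuning knob.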
# FAIIRツール: 若者のメンタルヘルスサービス提供のための会話型AIエージェントアシスタント The FAIIR Tool: A Conversational AI Agent Assistant for Youth Mental Health Service Provision ( http://arxiv.org/abs/2405.18553v1 ) ライセンス: Link先を確認	Stephen Obadinma, Alia Lachana, Maia Norman, Jocelyn Rankin, Joanna Yu, Xiaodan Zhu, Darren Mastropaolo, Deval Pandya, Roxana Sultan, Elham Dolatabadi,	(参考訳) 世界の医療システムとメンタルヘルス機関は、限られた資源の同時挑戦とともに、若者のメンタルヘルスサービスへの需要が高まっている。これらの制約を踏まえ、本研究は、ドメイン適応型および微調整型トランスフォーマーモデルのアンサンブルであるFAIIR(Frontline Assistant: Issue Identification and Recommendation)ツールの作成と評価において、自然言語処理を活用し、若者が経験している可能性のある問題を識別する。本研究では,FAIIRツールに活用される技術開発,性能,検証プロセスについて,キッズヘルプ電話による最前線危機対応の状況に適用する。フロントライン危機応答器は、各会話に従って定義されたリストからイシュータグを割り当てる。関連性の問題の特定の支援は、CRの負担を軽減し、適切な資源を提供し、アクティブな救助や強制的な報告が即時エスカレーションを必要とする重要な状況で実施されることを保証する。 The world's healthcare systems and mental health agencies face a growing demand for youth mental health services alongside the simultaneous challenge of limited resources. Given these constraints, this work presents our experience in the creation and evaluation of the FAIIR (Frontline Assistant: Issue Identification and Recommendation) tool, an ensemble of domain-adapted and fine-tuned transformer models, leveraging natural language processing to identify issues that youth may be experiencing. We explore the technical development, performance, and validation processes leveraged for the FAIIR tool in application to situations of frontline crisis response via Kids Help Phone. Frontline Crisis Responders (CRs) assign an issue tag from a defined list following each conversation. Assisting with the identification of issues of relevance helps reduce the burden on CRs, ensuring that appropriate resources can be provided and that active rescues and mandatory reporting can take place in critical situations requiring immediate de-escalation.	翻訳日:2024-05-30 21:53:22 公開日:2024-05-28
# 合成とアンローリングを用いた画像ベースニューラルネットワーク制御系のスケーラブルなサロゲート検証 Scalable Surrogate Verification of Image-based Neural Network Control Systems using Composition and Unrolling ( http://arxiv.org/abs/2405.18554v1 ) ライセンス: Link先を確認	Feiyang Cai, Chuchu Fan, Stanley Bak,	(参考訳) 入力としてイメージを使用するニューラルネットワーク制御システムの安全性を検証することは難しい問題である。本研究では,実世界に代わって条件付き生成逆数ネットワーク(cGAN)をイメージジェネレータとして訓練し,サロゲート検証アプローチを考慮した最近の研究に基づいて構築する。これにより、クローズドループシステムの集合ベースの形式解析が可能となり、シミュレーションやテスト以外の分析が可能になる。既存の作業は小さな例では有効であるが、過剰なオーバー近似は単一の制御期間と複数の制御期間の両方でそのスケーラビリティを制限している。この2つの誤りの原因を克服する手法を提案する。まず,システムダイナミクスの単調解析のように入力状態と制御出力の依存関係を失うことなく,cGANやニューラルネットワークコントローラとともにシステムのダイナミクスを構成することで,一段階誤差を克服する。第2に、制御ループの複数のステップを大規模ニューラルネットワークにアンロールする単一ステップ構成を繰り返すことで、マルチステップエラーを低減する。次に、既存のネットワーク検証ツールを活用して、複数のステップの正確な到達可能な集合を計算し、各ステップにおける抽象化エラーの蓄積を避ける。本稿では,自律型航空機タクシーシステムと高度緊急制動システムという2つのケーススタディを用いて,精度とスケーラビリティの両面からアプローチの有効性を実証する。航空機のタクシーシステムでは, 従来のベースライン方式に比べて, 収束到達可能セットが175%大きい。緊急制動システムでは, cGANからの画像出力変数の24倍の回数で, ベースライン法はどの状態も安全であることを示すのに失敗する。 Verifying safety of neural network control systems that use images as input is a difficult problem because, from a given system state, there is no known way to mathematically model what images are possible in the real-world. We build on recent work that considers a surrogate verification approach, training a conditional generative adversarial network (cGAN) as an image generator in place of the real world. This enables set-based formal analysis of the closed-loop system, providing analysis beyond simulation and testing. While existing work is effective on small examples, excessive overapproximation both within a single control period and across multiple control periods limits its scalability. We propose approaches to overcome these two sources of error. First, we overcome one-step error by composing the system's dynamics along with the cGAN and neural network controller, without losing the dependencies between input states and the control outputs as in the monotonic analysis of the system dynamics. 
Second, we reduce multi-step error by repeating the single-step composition, essentially unrolling multiple steps of the control loop into a large neural network. We then leverage existing network verification tools to compute accurate reachable sets for multiple steps, avoiding the accumulation of abstraction error at each step. We demonstrate the effectiveness of our approach in terms of both accuracy and scalability using two case studies: an autonomous aircraft taxiing system and an advanced emergency braking system. On the aircraft taxiing system, the converged reachable set is 175% larger using the prior baseline method compared with our proposed approach. On the emergency braking system, with 24x the number of image output variables from the cGAN, the baseline method fails to prove any states are safe, whereas our improvements enable set-based safety analysis.	翻訳日:2024-05-30 21:53:22 公開日:2024-05-28
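The toy illustration below shows why unrolling helps. It is not the paper's cGAN/controller pipeline. It uses a linear system that rotates the state by 45 degrees each step and axis-aligned boxes as the set representation. Re-boxing after every step inflates the set, the classic "wrapping effect". Composing the eight steps into one map first, and bounding once, keeps the bound tight. Real closed-loop networks are nonlinear, so verification tools only partially recover this benefit. All names here are hypothetical.

```python
import math


def box_image(m, center, radius):
    """Tightest axis-aligned box containing {M x : |x - center| <= radius componentwise}."""
    c = [sum(a * b for a, b in zip(row, center)) for row in m]
    r = [sum(abs(a) * b for a, b in zip(row, radius)) for row in m]
    return c, r


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


theta = math.pi / 4
rot = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
steps = 8  # eight 45-degree steps: a full turn, so the true set returns to [-1, 1]^2

# Step-by-step: re-box after every step, so the error compounds.
c, r = [0.0, 0.0], [1.0, 1.0]
for _ in range(steps):
    c, r = box_image(rot, c, r)
stepwise_radius = r

# Unrolled: compose all steps into one map first, then box once.
composed = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(steps):
    composed = matmul(rot, composed)
_, unrolled_radius = box_image(composed, [0.0, 0.0], [1.0, 1.0])

print(stepwise_radius)   # about [16, 16]: each step multiplies the radius by sqrt(2)
print(unrolled_radius)   # about [1, 1]: matches the true set
```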
# 動的治療レジームにおける強化学習 : 批判的再検討の必要性 Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination ( http://arxiv.org/abs/2405.18556v1 ) ライセンス: Link先を確認	Zhiyao Luo, Yangchen Pan, Peter Watkinson, Tingting Zhu,	(参考訳) 急速に変化する医療分野では、動的治療体制(DTR)におけるオフライン強化学習(RL)の実装は、前例のない機会と課題の混在を示している。本稿では、DTRの文脈におけるオフラインRLの現状を批判的に検証する。本稿では,DTRにRLを適用することの再評価について論じる。不整合性,潜在的に不整合性評価指標,ナイーブおよび教師あり学習ベースラインの欠如,既存研究におけるRL定式化の選択の多様さなどの懸念を引用する。公開されているSepsisデータセットを用いて17,000以上の評価実験を行ったケーススタディにより、RLアルゴリズムの性能は評価指標の変化やマルコフ決定プロセス(MDP)の定式化と大きく異なることを示した。驚いたことに、いくつかのケースでは、RLアルゴリズムはポリシー評価手法や報酬設計に従属するランダムなベースラインによって超えることができる。これにより、将来のDTRにおけるより慎重な政策評価とアルゴリズム開発が求められている。さらに,RLに基づく動的治療体制の信頼性向上に向けた可能性についても検討し,コミュニティ内でさらなる議論を招いた。コードはhttps://github.com/GilesLuo/ReassessDTRで入手できる。 In the rapidly changing healthcare landscape, the implementation of offline reinforcement learning (RL) in dynamic treatment regimes (DTRs) presents a mix of unprecedented opportunities and challenges. This position paper offers a critical examination of the current status of offline RL in the context of DTRs. We argue for a reassessment of applying RL in DTRs, citing concerns such as inconsistent and potentially inconclusive evaluation metrics, the absence of naive and supervised learning baselines, and the diverse choice of RL formulations in existing research. Through a case study with more than 17,000 evaluation experiments using a publicly available Sepsis dataset, we demonstrate that the performance of RL algorithms can vary significantly with changes in evaluation metrics and Markov Decision Process (MDP) formulations. Surprisingly, in some instances RL algorithms can be surpassed by random baselines, depending on the policy evaluation method and reward design. This calls for more careful policy evaluation and algorithm development in future DTR work. 
Additionally, we discuss potential enhancements toward more reliable development of RL-based dynamic treatment regimes and invite further discussion within the community. Code is available at https://github.com/GilesLuo/ReassessDTR.	翻訳日:2024-05-30 21:53:22 公開日:2024-05-28
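Part of the sensitivity to "policy evaluation methods" comes from off-policy evaluation itself. The abstract does not list which estimators were used, so here are two generic, textbook per-trajectory importance-sampling estimators as an illustration. `target_prob` and `behavior_prob` are hypothetical callables returning the evaluated policy's and the logging policy's action probabilities. The weighted version trades a small bias for much lower variance, one reason different evaluators can rank the same policies differently.

```python
def importance_sampling(trajectories, target_prob, behavior_prob, gamma=1.0):
    """trajectories: list of [(state, action, reward), ...] logged under the behavior policy.

    Returns (ordinary IS, weighted IS) estimates of the target policy's return.
    """
    weights, returns = [], []
    for traj in trajectories:
        w, g, discount = 1.0, 0.0, 1.0
        for s, a, r in traj:
            w *= target_prob(s, a) / behavior_prob(s, a)  # cumulative likelihood ratio
            g += discount * r
            discount *= gamma
        weights.append(w)
        returns.append(g)
    weighted_sum = sum(w * g for w, g in zip(weights, returns))
    ordinary = weighted_sum / len(trajectories)
    total = sum(weights)
    weighted = weighted_sum / total if total > 0 else 0.0
    return ordinary, weighted


# Logged data: uniform random actions {0, 1}; action 1 yields reward 1.
logged = [[("s", 1, 1.0)], [("s", 0, 0.0)]]
uniform = lambda s, a: 0.5
always_one = lambda s, a: 1.0 if a == 1 else 0.0
print(importance_sampling(logged, always_one, uniform))  # both estimate 1.0
print(importance_sampling(logged, uniform, uniform))     # both estimate 0.5
```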
# ポテンシャル場に基づくDeep Metric Learning Potential Field Based Deep Metric Learning ( http://arxiv.org/abs/2405.18560v1 ) ライセンス: Link先を確認	Shubhang Bhatnagar, Narendra Ahuja,	(参考訳) ディープ・メトリック・ラーニング(DML)は、意味的に意味のある表現空間を学ぶためにネットワークを訓練する。現在の多くのアプローチは、各タプレット内の例とモデル相互作用のn-タプルをマイニングしている。本稿では, 電場から着想を得た新しい構成DMLモデルを提案する。このモデルでは, タプルではなく, 連続ポテンシャル場による各例(埋め込み)の影響を表現し, それらの結合した大域ポテンシャル場を得るために, 電場を重畳する。我々は、同じ/異なるクラスの画像からの埋め込み間の相互作用を表現するために、魅力的な/反発的なポテンシャル場を使用する。サンプルの相互影響が距離に比例する典型的な学習法とは対照的に、距離による影響の低減を強制し、崩壊する分野へと導く。このような減衰は,クラス内変動が大きく,ラベルノイズも大きい実世界のデータセットの性能向上に有効であることを示す。他のプロキシベースのメソッドと同様に、プロキシを使ってサンプルのサブポピュレーションを簡潔に表現します。本稿では,Cars-196,CUB-200-2011,SOPの3つの標準DMLベンチマークを用いて評価を行った。 Deep metric learning (DML) involves training a network to learn a semantically meaningful representation space. Many current approaches mine n-tuples of examples and model interactions within each tuple. We present a novel, compositional DML model, inspired by electrostatic fields in physics, which represents the influence of each example (embedding) by a continuous potential field rather than through tuples, and superposes the fields to obtain their combined global potential field. We use attractive/repulsive potential fields to represent interactions among embeddings from images of the same/different classes. Contrary to typical learning methods, where the mutual influence of samples is proportional to their distance, we enforce a reduction of that influence with distance, leading to a decaying field. We show that such decay helps improve performance on real-world datasets with large intra-class variations and label noise. Like other proxy-based methods, we also use proxies to succinctly represent sub-populations of examples. We evaluate our method on three standard DML benchmarks (the Cars-196, CUB-200-2011, and SOP datasets), where it outperforms state-of-the-art baselines.	翻訳日:2024-05-30 21:53:22 公開日:2024-05-28
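The abstract does not give the field's functional form, so the sketch below is a hypothetical illustration only. It uses an exponentially decaying pair potential: negative (attractive) for same-class pairs and positive (repulsive) for different-class pairs. The pull or push between two embeddings therefore weakens as they move apart, and the global field is the superposition (sum) of the pairwise terms. Proxies and the paper's actual loss are omitted. The function names and the `scale` parameter are assumptions.

```python
import math


def pair_potential(d, same_class, scale=1.0):
    """Hypothetical decaying potential between two embeddings at distance d.

    Magnitude exp(-d / scale) shrinks with distance; the sign makes
    same-class pairs attract (lower energy when closer) and different-class
    pairs repel (higher energy when closer).
    """
    magnitude = math.exp(-d / scale)
    return -magnitude if same_class else magnitude


def total_energy(embeddings, labels, scale=1.0):
    """Superpose all pairwise potentials into one global energy."""
    energy = 0.0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(embeddings[i], embeddings[j])
            energy += pair_potential(d, labels[i] == labels[j], scale)
    return energy


print(total_energy([[0.0, 0.0], [0.1, 0.0]], [0, 0]))  # same class, close: low energy
print(total_energy([[0.0, 0.0], [0.1, 0.0]], [0, 1]))  # different class, close: high energy
```

Because the force decays with distance, a far-away mislabeled example exerts little pull. This is one intuition for why a decaying field could tolerate label noise better than a force that grows with distance.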
# トレーニングデータセットを持たない多変量時系列の因果的説明 Counterfactual Explanations for Multivariate Time-Series without Training Datasets ( http://arxiv.org/abs/2405.18563v1 ) ライセンス: Link先を確認	Xiangyu Sun, Raquel Aoki, Kevin H. Wilson,	(参考訳) 機械学習(ML)手法は、過去10年間に著しい成長を遂げてきたが、ハイインパクトな現実世界のドメインにおける実践的応用は、その不透明さによって妨げられている。 MLメソッドが重要な決定を行う責任がある場合、ステークホルダは、これらの決定を変更する方法に関する洞察を必要とすることが多い。対物的説明(CFE)はソリューションとして現れ、不透明なMLモデルの解釈を提供し、ある決定から別の決定への遷移経路を提供する。しかし、既存のCFEメソッドの多くはモデルのトレーニングデータセットへのアクセスを必要としており、多変量時系列を処理できるメソッドはほとんどなく、トレーニングデータセットなしでは多変量時系列を処理できない。これらの制限は多くのシナリオで恐ろしくできる。本稿では、トレーニングデータセットが利用できない場合にCFEを生成する新しい強化学習ベースのCFE手法CFWoTを提案する。 CFWoTはモデルに依存しず、連続的および離散的な特徴を持つ静的および多変量時系列データセットに適している。ユーザは、CFWoTが保証する因果制約だけでなく、非アクション可能、不変、および推奨の機能を指定できる柔軟性がある。いくつかのデータセット上の4つのベースラインに対してCFWoTの性能を実証し、トレーニングデータセットにアクセスできないにもかかわらず、CFWoTは入力時系列の変更を著しく小さくするCFEを見つける。これらの性質により、CFEは結果を変えるのに必要な変化の大きさが大幅に減少するので、より実用的なものとなる。 Machine learning (ML) methods have experienced significant growth in the past decade, yet their practical application in high-impact real-world domains has been hindered by their opacity. When ML methods are responsible for making critical decisions, stakeholders often require insights into how to alter these decisions. Counterfactual explanations (CFEs) have emerged as a solution, offering interpretations of opaque ML models and providing a pathway to transition from one decision to another. However, most existing CFE methods require access to the model's training dataset, few methods can handle multivariate time-series, and none can handle multivariate time-series without training datasets. These limitations can be formidable in many scenarios. In this paper, we present CFWoT, a novel reinforcement-learning-based CFE method that generates CFEs when training datasets are unavailable. CFWoT is model-agnostic and suitable for both static and multivariate time-series datasets with continuous and discrete features. 
Users have the flexibility to specify non-actionable, immutable, and preferred features, as well as causal constraints that CFWoT guarantees will be respected. We demonstrate the performance of CFWoT against four baselines on several datasets and find that, despite not having access to a training dataset, CFWoT finds CFEs that make significantly fewer and significantly smaller changes to the input time-series. These properties make CFEs more actionable, as the magnitude of change required to alter an outcome is vastly reduced.	Translated: 2024-05-30 21:53:22 Published: 2024-05-28
# Exploring the transition between Quantum and Classical Mechanics ( http://arxiv.org/abs/2405.18564v1 ) License: see link	E. Aldo Arroyo,	We investigate the transition from quantum to classical mechanics using a one-dimensional free particle model. In the classical analysis, we consider the initial positions and velocities of the particle drawn from Gaussian distributions. Since the final position of the particle depends on these initial conditions, convolving the Gaussian distributions associated with these initial conditions gives us the distribution of the final positions. In the quantum scenario, using an initial Gaussian wave packet, the temporal evolution provides the final wave function, and from it, the quantum probability density. We find that the quantum probability density coincides with the classical normal distribution of the particle's final position obtained from the convolution theorem. However, for superpositions of Gaussian distributions, the classical and quantum results deviate due to quantum interference. To address this issue, we propose a novel approach to recover the classical distribution from the quantum one. This approach involves removing the quantum interference effects through truncated Fourier analysis. These results are consistent with modern quantum decoherence theory. This comprehensive analysis enhances our understanding of the classical-quantum correspondence and the mechanisms underlying the emergence of classicality from quantum systems.	Translated: 2024-05-30 21:53:22 Published: 2024-05-28
# Warm-starting Push-Relabel ( http://arxiv.org/abs/2405.18568v1 ) License: see link	Sami Davies, Sergei Vassilvitskii, Yuyan Wang,	Push-Relabel is one of the most celebrated network flow algorithms. Because it maintains a pre-flow that saturates a cut, it enjoys better theoretical and empirical running time than other flow algorithms, such as Ford-Fulkerson. In practice, Push-Relabel is even faster than what theoretical guarantees can promise, in part because of the use of good heuristics for seeding and updating the iterative algorithm. However, it remains unclear how to run Push-Relabel on an arbitrary initialization that is not necessarily a pre-flow or cut-saturating. We provide the first theoretical guarantees for warm-starting Push-Relabel with a predicted flow, where our learning-augmented version benefits from fast running time when the predicted flow is close to an optimal flow, while maintaining robust worst-case guarantees. Interestingly, our algorithm uses the gap relabeling heuristic, which has long been employed in practice, even though prior to our work there was no rigorous theoretical justification for why it can lead to run-time improvements. We then provide experiments that show our warm-started Push-Relabel also works well in practice.	Translated: 2024-05-30 21:53:22 Published: 2024-05-28
# It's Not a Modality Gap: Characterizing and Addressing the Contrastive Gap ( http://arxiv.org/abs/2405.18570v1 ) License: see link	Abrar Fahim, Alex Murphy, Alona Fyshe,	Multi-modal contrastive models such as CLIP achieve state-of-the-art performance in zero-shot classification by embedding input images and texts on a joint representational space. Recently, a modality gap has been reported in two-encoder contrastive models like CLIP, meaning that the image and text embeddings reside in disjoint areas of the latent space. Previous studies suggest that this gap exists due to 1) the cone effect, 2) mismatched pairs in the dataset, and 3) insufficient training. We show that, even when accounting for all these factors, and even when using the same modality, the contrastive loss actually creates a gap during training. As a result, we propose that the modality gap is inherent to the two-encoder contrastive loss and rename it the contrastive gap. We present evidence that attributes this contrastive gap to low uniformity in CLIP space, resulting in embeddings that occupy only a small portion of the latent space. To close the gap, we adapt the uniformity and alignment properties of unimodal contrastive loss to the multi-modal setting and show that simply adding these terms to the CLIP loss distributes the embeddings more uniformly in the representational space, closing the gap. 
In our experiments, we show that the modified representational space achieves better performance than the default CLIP loss in downstream tasks such as zero-shot image classification and multi-modal arithmetic.	Translated: 2024-05-30 21:53:22 Published: 2024-05-28
# Low-rank finetuning for LLMs: A fairness perspective ( http://arxiv.org/abs/2405.18572v1 ) License: see link	Saswat Das, Marco Romanelli, Cuong Tran, Zarreen Reza, Bhavya Kailkhura, Ferdinando Fioretto,	Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models (LLMs) due to their reduced computational and memory requirements. This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution. Our findings reveal that there are cases in which low-rank fine-tuning falls short in learning such shifts. This, in turn, produces non-negligible side effects, especially when fine-tuning is adopted for toxicity mitigation in pre-trained models, or in scenarios where it is important to provide fair models. Through comprehensive empirical evidence on several models, datasets, and tasks, we show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors. We also show that this extends to sequential decision-making tasks, emphasizing the need for careful evaluation to promote responsible LLM development.	Translated: 2024-05-30 21:43:38 Published: 2024-05-28
# Programmer Visual Attention During Context-Aware Code Summarization ( http://arxiv.org/abs/2405.18573v1 ) License: see link	Aakash Bansal, Robert Wallace, Zachary Karas, Ningzhi Tang, Yu Huang, Toby Jia-Jun Li, Collin McMillan,	Abridged: Programmer attention represents the visual focus of programmers on parts of the source code in pursuit of programming tasks. We conducted an in-depth human study with XY Java programmers, where each programmer generated summaries for 40 methods from five large Java projects over five one-hour sessions. We used eye-tracking equipment to map the visual attention of programmers while they wrote the summaries. We also rated the quality of each summary. We found eye-gaze patterns and metrics that define common behaviors in programmer attention during context-aware code summarization. Specifically, we found that programmers need to read significantly (p<0.01) fewer words and make significantly fewer revisits to words (p<0.03) as they summarize more methods during a session, while maintaining the quality of summaries. 
We also found that the amount of source code a participant looks at correlates with a higher-quality summary, but this trend follows a bell-shaped curve, such that after a threshold, reading more source code leads to a significant decrease (p<0.01) in the quality of summaries. We also gathered insight into the type of methods in the project that provide the most contextual information for code summarization based on programmer attention. Specifically, we observed that programmers spent a majority of their time looking at methods inside the same class as the target method to be summarized. Surprisingly, we found that programmers spent significantly less time looking at methods in the call graph of the target method. We discuss how our empirical observations may aid future studies towards modeling programmer attention and improving context-aware automatic source code summarization.	Translated: 2024-05-30 21:43:38 Published: 2024-05-28
# SpecTra: Enhancing the Code Translation Ability of Language Models by Generating Multi-Modal Specifications ( http://arxiv.org/abs/2405.18574v1 ) License: see link	Vikram Nitin, Baishakhi Ray,	Large language models (LLMs) are increasingly being used for the task of automated code translation, which has important real-world applications. However, most existing approaches use only the source code of a program as an input to an LLM, and do not consider the different kinds of specifications that can be extracted from a program. In this paper, we propose SpecTra, a multi-stage approach that uses a novel self-consistency filter to first generate high-quality invariants, test cases, and natural language descriptions from a given program, and then uses these along with the source code to improve the quality of LLM-generated translations. We evaluate SpecTra on two code translation tasks (C to Rust and C to Go) and show that it can enhance the performance of four popular LLMs on these tasks by up to 10 percentage points and a relative improvement of up to 23%. Our research suggests that generating high-quality specifications could be a promising and efficient way to improve the performance of LLMs for code translation.	Translated: 2024-05-30 21:43:38 Published: 2024-05-28
# Single-loop Stochastic Algorithms for Difference of Max-Structured Weakly Convex Functions ( http://arxiv.org/abs/2405.18577v1 ) License: see link	Quanqi Hu, Qi Qi, Zhaosong Lu, Tianbao Yang,	In this paper, we study a class of non-smooth non-convex problems in the form of $\min_{x}[\max_{y\in Y}\phi(x, y) - \max_{z\in Z}\psi(x, z)]$, where both $\Phi(x) = \max_{y\in Y}\phi(x, y)$ and $\Psi(x)=\max_{z\in Z}\psi(x, z)$ are weakly convex functions, and $\phi(x, y), \psi(x, z)$ are strongly concave functions in terms of $y$ and $z$, respectively. It covers two families of problems that have been studied but are missing single-loop stochastic algorithms, i.e., difference of weakly convex functions and weakly convex strongly-concave min-max problems. We propose a stochastic Moreau envelope approximate gradient method dubbed SMAG, the first single-loop algorithm for solving these problems, and provide a state-of-the-art non-asymptotic convergence rate. The key idea of the design is to compute an approximate gradient of the Moreau envelopes of $\Phi, \Psi$ using only one step of stochastic gradient update of the primal and dual variables. Empirically, we conduct experiments on positive-unlabeled (PU) learning and partial area under ROC curve (pAUC) optimization with an adversarial fairness regularizer to validate the effectiveness of our proposed algorithms.	Translated: 2024-05-30 21:43:38 Published: 2024-05-28
# Public Technologies Transforming Work of the Public and the Public Sector ( http://arxiv.org/abs/2405.18579v1 ) License: see link	Seyun Kim, Bonnie Fan, Willa Yunqi Yang, Jessie Ramey, Sarah E Fox, Haiyi Zhu, John Zimmerman, Motahhare Eslami,	Technologies adopted by the public sector have transformed the work practices of employees in public agencies by creating different means of communication and decision-making. Although much of the recent research in the future of work domain has concentrated on the effects of technological advancements on public sector employees, the influence on work practices of external stakeholders engaging with this sector remains under-explored. In this paper, we focus on a digital platform called OneStop, which is deployed by several building departments across the U.S. and aims to integrate various steps and services into a single point of online contact between public sector employees and the public. Drawing on semi-structured interviews with 22 stakeholders, including local business owners, experts involved in the construction process, community representatives, and building department employees, we investigate how this technology transition has impacted the work of these different stakeholders. We observe multifaceted perspectives and experiences caused by the adoption of OneStop. 
OneStop exacerbated inequitable practices for local business owners due to a lack of face-to-face interactions with the department employees. For the public sector employees, OneStop standardized the work practices, representing the building department's priorities and values. Based on our findings, we discuss tensions around standardization, equality, and equity in technology transition, as well as design implications for equitable practices in the public sector.	Translated: 2024-05-30 21:43:38 Published: 2024-05-28
# Artificial Intelligence in Industry 4.0: A Review of Integration Challenges for Industrial Systems ( http://arxiv.org/abs/2405.18580v1 ) License: see link	Alexander Windmann, Philipp Wittenberg, Marvin Schieseck, Oliver Niggemann,	In Industry 4.0, Cyber-Physical Systems (CPS) generate vast data sets that can be leveraged by Artificial Intelligence (AI) for applications including predictive maintenance and production planning. However, despite the demonstrated potential of AI, its widespread adoption in sectors like manufacturing remains limited. Our comprehensive review of recent literature, including standards and reports, pinpoints key challenges: system integration, data-related issues, managing workforce-related concerns, and ensuring trustworthy AI. A quantitative analysis highlights particular challenges and topics that are important for practitioners but still need to be sufficiently investigated by academics. The paper briefly discusses existing solutions to these challenges and proposes avenues for future research. We hope that this survey serves as a resource for practitioners evaluating the cost-benefit implications of AI in CPS and for researchers aiming to address these urgent challenges.	Translated: 2024-05-30 21:43:38 Published: 2024-05-28
# Unleashing the Potential of Text-attributed Graphs: Automatic Relation Decomposition via Large Language Models ( http://arxiv.org/abs/2405.18581v1 ) License: see link	Hyunjin Seo, Taewon Kim, June Yong Yang, Eunho Yang,	Recent advancements in text-attributed graphs (TAGs) have significantly improved the quality of node features by using the textual modeling capabilities of language models. Despite this success, utilizing text attributes to enhance the predefined graph structure remains largely unexplored. Our extensive analysis reveals that conventional edges on TAGs, treated as a single relation (e.g., hyperlinks) in previous literature, actually encompass mixed semantics (e.g., "advised by" and "participates in"). This simplification hinders the representation learning process of Graph Neural Networks (GNNs) on downstream tasks, even when integrated with advanced node features. In contrast, we discover that decomposing these edges into distinct semantic relations significantly enhances the performance of GNNs. However, manually identifying edges and labeling them with their corresponding semantic relations is labor-intensive and often requires domain expertise. 
To this end, we introduce RoSE (Relation-oriented Semantic Edge-decomposition), a novel framework that leverages the capability of Large Language Models (LLMs) to decompose the graph structure by analyzing raw text attributes in a fully automated manner. RoSE operates in two stages: (1) identifying meaningful relations using an LLM-based generator and discriminator, and (2) categorizing each edge into corresponding relations by analyzing textual contents associated with connected nodes via an LLM-based decomposer. Extensive experiments demonstrate that our model-agnostic framework significantly enhances node classification performance across various datasets, with improvements of up to 16% on the Wisconsin dataset.	Translated: 2024-05-30 21:43:38 Published: 2024-05-28
# A Verifiable Computing Scheme for Encrypted Control Systems ( http://arxiv.org/abs/2405.18586v1 ) License: see link	Francesca Stabile, Walter Lucia, Amr Youssef, Giuseppe Franze,	The proliferation of cloud computing technologies has paved the way for deploying networked encrypted control systems, offering high performance, remote accessibility, and privacy. However, in scenarios where the control algorithms run on third-party cloud service providers, the control logic might be changed by a malicious agent on the cloud. Consequently, it is imperative to verify the correctness of the control signals received from the cloud. Traditional verification methods, like zero-knowledge proof techniques, are computationally demanding in both proof generation and verification, may require several rounds of interactions between the prover and verifier and, consequently, are inapplicable in real-time control system applications. In this paper, we present a novel computationally inexpensive verifiable computing solution inspired by the probabilistic cut-and-choose approach. The proposed scheme allows the plant's actuator to validate the computations accomplished by the encrypted cloud-based networked controller without compromising the control scheme's performance. We showcase the effectiveness and real-time applicability of the proposed verifiable computation scheme using a remotely controlled Khepera IV differential-drive robot.	Translated: 2024-05-30 21:43:38 Published: 2024-05-28
# A Margin-based Multiclass Generalization Bound via Geometric Complexity ( http://arxiv.org/abs/2405.18590v1 ) License: see link	Michael Munn, Benoit Dherin, Javier Gonzalvo,	There has been considerable effort to better understand the generalization capabilities of deep neural networks, both as a means to unlock a theoretical understanding of their success and to provide directions for further improvements. In this paper, we investigate margin-based multiclass generalization bounds for neural networks that rely on a recent complexity measure developed for neural networks, the geometric complexity. We derive a new upper bound on the generalization error that scales with the margin-normalized geometric complexity of the network and holds for a broad family of data distributions and model classes. Our generalization bound is empirically investigated for a ResNet-18 model trained with SGD on the CIFAR-10 and CIFAR-100 datasets with both original and random labels.	Translated: 2024-05-30 21:43:38 Published: 2024-05-28
# Metaheuristic approaches to the placement of suicide bomber detectors ( http://arxiv.org/abs/2405.18593v1 ) License: see link	Carlos Cotta, José E. Gallardo,	Suicide bombing is an infamous form of terrorism that is becoming increasingly prevalent in the current era of global terror warfare. We consider the case of targeted attacks of this kind, and the use of detectors distributed over the area under threat as a protective countermeasure. Such detectors are not fully reliable, and must be strategically placed in order to maximize the chances of detecting the attack, hence minimizing the expected number of casualties. To this end, different metaheuristic approaches based on local search and on population-based search are considered and benchmarked against a powerful greedy heuristic from the literature. We conduct an extensive empirical evaluation on synthetic instances featuring very diverse properties. Most metaheuristics outperform the greedy algorithm, and a hill-climber is shown to be superior to the remaining approaches. This hill-climber is subsequently subjected to a sensitivity analysis to determine which problem features make it stand above the greedy approach, and is finally deployed on a number of problem instances built after realistic scenarios, corroborating the good performance of the heuristic.	Translated: 2024-05-30 21:43:38 Published: 2024-05-28
# An Explainable XGBoost-based Approach on Assessing Detection of Deception and Disinformation ( http://arxiv.org/abs/2405.18596v1 ) License: see link	Alex V Mbaziira, Maha F Sabir,	Threat actors continue to exploit geopolitical and global public events to launch aggressive campaigns propagating disinformation over the Internet. In this paper, we extend our prior research in detecting disinformation using psycholinguistic and computational linguistic processes linked to deception and cybercrime to gain an understanding of how features impact the predictive outcome of machine learning models. We attempt to determine patterns of deception in disinformation in hybrid models trained on disinformation and scams, fake positive and negative online reviews, or fraud using the eXtreme Gradient Boosting machine learning algorithm. Four hybrid models are generated: models trained on disinformation and fraud (DIS+EN), disinformation and scams (DIS+FB), disinformation and favorable fake reviews (DIS+POS), and disinformation and unfavorable fake reviews (DIS+NEG). The four hybrid models detected deception and disinformation with predictive accuracies ranging from 75% to 85%. The outcome of the models was evaluated with SHAP to determine the impact of the features.	Translated: 2024-05-30 21:43:38 Published: 2024-05-28
# From Conformal Predictions to Confidence Regions ( http://arxiv.org/abs/2405.18601v1 ) License: see link	Charles Guille-Escuret, Eugene Ndiaye,	Conformal prediction methodologies have significantly advanced the quantification of uncertainties in predictive models. Yet, the construction of confidence regions for model parameters presents a notable challenge, often necessitating stringent assumptions regarding data distribution or merely providing asymptotic guarantees. We introduce a novel approach termed CCR, which employs a combination of conformal prediction intervals for the model outputs to establish confidence regions for model parameters. We present coverage guarantees that hold under minimal assumptions on noise and are valid in the finite-sample regime. Our approach is applicable to both split conformal predictions and black-box methodologies, including full or cross-conformal approaches. In the specific case of linear models, the derived confidence region manifests as the feasible set of a Mixed-Integer Linear Program (MILP), facilitating the deduction of confidence intervals for individual parameters and enabling robust optimization. We empirically compare CCR to recent advancements in challenging settings such as heteroskedastic and non-Gaussian noise.	Translated: 2024-05-30 21:43:38 Published: 2024-05-28
# SST-GCN:道路交通事故リスク予測のためのシーケンスベース時空間グラフ畳み込みネットワーク SST-GCN: The Sequential based Spatio-Temporal Graph Convolutional networks for Minute-level and Road-level Traffic Accident Risk Predictio ( http://arxiv.org/abs/2405.18602v1 ) ライセンス: Link先を確認	Tae-wook Kim, Han-jin Lee, Hyeon-Jin Jung, Ji-Woong Yang, Ellen J. Hong,	(参考訳) 交通事故は世界中で大きな社会問題として認識されており、毎年多くの負傷者や大きなコストがかかる。その結果,交通事故の予測・防止方法が長年研究されてきた。人工知能の分野での進歩に伴い、さまざまな研究が交通事故予測に機械学習とディープラーニング技術を適用している。現代の交通状況は1分ごとに急速に変化し、道路によって大きく変化している。言い換えれば、交通事故のリスクは各道路の様々なパターンで分単位で変化する。そのため,ミニ・レベルとロード・レベルにおける交通事故のリスクを予測することが望ましい。しかし、道路は隣接する道路と密接かつ複雑な関係にあるため、ミニット・レベルとロード・レベルでの交通事故の予測に関する研究は困難である。したがって,交通事故予測のための道路の空間的・時間的特性を反映できるモデルの構築が不可欠である。その結果,グラフ畳み込みネットワークを用いて道路の空間的特性を捉える手法や,交通事故のリスクを予測するための時間的特性を再現する手法が近年試みられている。本稿では, 韓国の首都ソウルに構築された道路データセットを用いて, GCN と LSTM を組み合わせたシーケンスベース時空間グラフ畳み込みネットワーク(SST-GCN)を提案する。実験により、SST-GCNは他の最先端モデルよりも小さなレベル予測の方が優れていることが示された。 Traffic accidents are recognized as a major social issue worldwide, causing numerous injuries and significant costs annually. Consequently, methods for predicting and preventing traffic accidents have been researched for many years. With advancements in the field of artificial intelligence, various studies have applied Machine Learning and Deep Learning techniques to traffic accident prediction. Modern traffic conditions change rapidly by the minute, and these changes vary significantly across different roads. In other words, the risk of traffic accidents changes minute by minute in various patterns for each road. Therefore, it is desirable to predict traffic accident risk at the Minute-Level and Road-Level. However, because roads have close and complex relationships with adjacent roads, research on predicting traffic accidents at the Minute-Level and Road-Level is challenging. Thus, it is essential to build a model that can reflect the spatial and temporal characteristics of roads for traffic accident prediction. 
Consequently, recent attempts have been made to use Graph Convolutional Networks to capture the spatial characteristics of roads and Recurrent Neural Networks to capture their temporal characteristics for predicting traffic accident risk. This paper proposes the Sequential based Spatio-Temporal Graph Convolutional Networks (SST-GCN), which combines GCN and LSTM, to predict traffic accidents at the Minute-Level and Road-Level using a road dataset constructed in Seoul, the capital of South Korea. Experiments have demonstrated that SST-GCN outperforms other state-of-the-art models in Minute-Level predictions.	翻訳日:2024-05-30 21:43:38 公開日:2024-05-28
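SST-GCN pairs a GCN, which captures the spatial structure between roads, with an LSTM, which captures the temporal structure. The sketch below covers only the spatial half: one graph-convolution layer with the common symmetric normalization $D^{-1/2}(A+I)D^{-1/2}$. The abstract does not say which GCN variant the paper uses. The three-road graph, the features, and the weights are hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected adjacency matrix A."""
    A_hat = A + np.eye(A.shape[0])      # self-loops: each road keeps its own signal
    deg = A_hat.sum(axis=1)             # degree including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Elementwise A_hat[i, j] * d_i^{-1/2} * d_j^{-1/2}
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A, H, W):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return np.maximum(normalized_adjacency(A) @ H @ W, 0.0)

# Toy road network: 3 road segments in a chain (0-1, 1-2).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.array([[1.0], [2.0], [3.0]])     # one feature per road, e.g. a traffic count
W = np.array([[1.0]])                   # identity weight so the mixing is easy to read
print(gcn_layer(A, H, W))
```

Each road's output is a degree-weighted mix of its own features and its neighbors' features. In a spatio-temporal model, the per-timestep outputs of such a layer would then be fed as a sequence to a recurrent unit such as an LSTM.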
# BioBERTベースの深層学習と統合ChemProt-DrugProtによる生物医学的関係抽出の強化 BioBERT-based Deep Learning and Merged ChemProt-DrugProt for Enhanced Biomedical Relation Extraction ( http://arxiv.org/abs/2405.18605v1 ) ライセンス: Link先を確認	Bridget T. McInnes, Jiawei Tang, Darshini Mahendran, Mai H. Nguyen,	(参考訳) 本稿では,特に化学物質と遺伝子の相互作用に焦点を当て,生物医学的テキストからの関係抽出を高度化するための方法論について述べる。BioBERTモデルと多層全結合ネットワークアーキテクチャを活用し,新たなマージ戦略を用いてChemProtデータセットとDrugProtデータセットを統合する。大規模な実験を通じて、特にデータセット間で共有されるCPRグループにおいて、大幅な性能向上を示す。この結果は,サンプル数の増加とモデル精度の向上において,データセットのマージの重要性を浮き彫りにした。さらに, バイオメディカル研究と臨床実践における自動情報抽出の可能性を強調した。 This paper presents a methodology for enhancing relation extraction from biomedical texts, focusing specifically on chemical-gene interactions. Leveraging the BioBERT model and a multi-layer fully connected network architecture, our approach integrates the ChemProt and DrugProt datasets using a novel merging strategy. Through extensive experimentation, we demonstrate significant performance improvements, particularly in CPR groups shared between the datasets. The findings underscore the importance of dataset merging in augmenting sample counts and improving model accuracy. Moreover, the study highlights the potential of automated information extraction in biomedical research and clinical practice.	翻訳日:2024-05-30 21:43:38 公開日:2024-05-28
# 3次元多視点多物体追跡のためのトラック初期化と再同定 Track Initialization and Re-Identification for 3D Multi-View Multi-Object Tracking ( http://arxiv.org/abs/2405.18606v1 ) ライセンス: Link先を確認	Linh Van Ma, Tran Thien Dat Nguyen, Ba-Ngu Vo, Hyunsung Jang, Moongu Jeon,	(参考訳) 単眼カメラからの2次元検出のみを用いて,トラックの開始・終了を自動的に行い,トラックの出現・再出現やオクルージョンにも対処する3次元多物体追跡(MOT)ソリューションを提案する。さらに,本手法ではカメラを再構成しても検出器の再学習は不要であり,再構成したカメラのカメラ行列のみを更新すればよい。提案手法は,トラック開始・終了・再識別・オクルージョンハンドリング・データアソシエーションを単一のベイズフィルタ再帰に組み込んだベイズ多物体定式化に基づく。しかし,これらの機能をすべて利用する厳密なフィルタは,(多物体)フィルタリング密度の項数が指数関数的に増加するため数値的に扱いにくく,既存の近似は速度と引き換えにこれらの機能の一部を犠牲にしている。そこで本研究では,物体の特徴とキネマティクスを観測モデルに組み込むことでデータアソシエーションを改善し,項数を削減する,オンラインMOTに適したより効率的な近似法を開発した。具体的には,複数のカメラからの2次元検出と抽出した特徴を利用して多物体フィルタリング密度をよりよく近似し,トラック開始・終了・再識別機能を実現する。さらに,カメラ面上への3次元物体の2次元投影に基づく計算可能な幾何オクルージョンモデルを導入することにより,フィルタのオクルージョンハンドリング機能を実現する。困難なデータセットでの評価により,提案手法は既存のマルチビューMOTソリューションと比べ,カメラ構成がオンザフライで変化する場合にも大幅な改善と頑健性を示すことが確認された。ソースコードは https://github.com/linh-gist/mv-glmb-ab で公開されている。 We propose a 3D multi-object tracking (MOT) solution using only 2D detections from monocular cameras, which automatically initiates/terminates tracks as well as resolves track appearance-reappearance and occlusions. Moreover, this approach does not require detector retraining when cameras are reconfigured but only the camera matrices of reconfigured cameras need to be updated. Our approach is based on a Bayesian multi-object formulation that integrates track initiation/termination, re-identification, occlusion handling, and data association into a single Bayes filtering recursion. However, the exact filter that utilizes all these functionalities is numerically intractable due to the exponentially growing number of terms in the (multi-object) filtering density, while existing approximations trade off some of these functionalities for speed. 
To this end, we develop a more efficient approximation suitable for online MOT by incorporating object features and kinematics into the measurement model, which improves data association and subsequently reduces the number of terms. Specifically, we exploit the 2D detections and extracted features from multiple cameras to provide a better approximation of the multi-object filtering density to realize the track initiation/termination and re-identification functionalities. Further, incorporating a tractable geometric occlusion model based on 2D projections of 3D objects on the camera planes realizes the occlusion handling functionality of the filter. Evaluation of the proposed solution on challenging datasets demonstrates significant improvements and robustness when camera configurations change on-the-fly, compared to existing multi-view MOT solutions. The source code is publicly available at https://github.com/linh-gist/mv-glmb-ab.	翻訳日:2024-05-30 21:43:38 公開日:2024-05-28
# DTR-Bench: 強化学習に基づく動的治療レジームのためのin silico環境とベンチマークプラットフォーム DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime ( http://arxiv.org/abs/2405.18610v1 ) ライセンス: Link先を確認	Zhiyao Luo, Mingcheng Zhu, Fenglin Liu, Jiali Li, Yangchen Pan, Jiandong Zhou, Tingting Zhu,	(参考訳) 強化学習(Reinforcement Learning, RL)は、個別化医療における動的治療レジーム(DTR)、特に薬物投与量の処方や投薬推奨を最適化する可能性から、注目を高めている。しかし、様々な医療シナリオをシミュレートするための統一されたフレームワークと、これらの状況におけるRLアルゴリズムの有効性をベンチマークするための包括的な分析が存在しないという大きな課題が残っている。このギャップに対処するために,がん化学療法,放射線療法,糖尿病のグルコース管理,敗血症治療など,一般的なDTR応用に適した4つのシミュレーション環境からなるベンチマークプラットフォームであるDTR-Benchを紹介する。薬物動態・薬力学(PK/PD)の変動, ノイズ, 欠損データといった現実の課題の下での性能に特に注目し, 様々な最先端のRLアルゴリズムをこれらの設定で評価した。実験の結果,ノイズや患者間変動があるとRLアルゴリズムの性能はさまざまな程度に劣化し,収束しないアルゴリズムもあった。さらに、時間的な観測表現を用いても、DTR設定での性能が常に向上するわけではないことが観察された。これらの複雑さを効果的に管理し、患者個別の医療を強化できるロバストで適応的なRLアルゴリズムを開発する必要性が示唆された。ベンチマークとコードは https://github.com/GilesLuo/DTR-Bench で公開している。 Reinforcement learning (RL) has garnered increasing recognition for its potential to optimise dynamic treatment regimes (DTRs) in personalised medicine, particularly for drug dosage prescriptions and medication recommendations. However, a significant challenge persists: the absence of a unified framework for simulating diverse healthcare scenarios and a comprehensive analysis to benchmark the effectiveness of RL algorithms within these contexts. To address this gap, we introduce DTR-Bench, a benchmarking platform comprising four distinct simulation environments tailored to common DTR applications, including cancer chemotherapy, radiotherapy, glucose management in diabetes, and sepsis treatment. We evaluate various state-of-the-art RL algorithms across these settings, particularly highlighting their performance amidst real-world challenges such as pharmacokinetic/pharmacodynamic (PK/PD) variability, noise, and missing data. 
Our experiments reveal varying degrees of performance degradation among RL algorithms in the presence of noise and patient variability, with some algorithms failing to converge. Additionally, we observe that using temporal observation representations does not consistently lead to improved performance in DTR settings. Our findings underscore the necessity of developing robust, adaptive RL algorithms capable of effectively managing these complexities to enhance patient-specific healthcare. We have open-sourced our benchmark and code at https://github.com/GilesLuo/DTR-Bench.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
# GLOCON Database: 設計決定とユーザマニュアル(v1.0) GLOCON Database: Design Decisions and User Manual (v1.0) ( http://arxiv.org/abs/2405.18613v1 ) ライセンス: Link先を確認	Ali Hürriyetoğlu, Osman Mutlu, Fırat Duruşan, Erdem Yörük,	(参考訳) GLOCONは、複数の言語で各国のニュースソースから自動的に抽出される論争的な出来事のデータベースである。各国のニュースソースが利用され、完全なニュースアーカイブが処理され、各ソースのイベントリストが作成される。自動化は、完全なニュースアーカイブからランダムにサンプリングされたゴールドスタンダードコーパス(Yörük et al. 2022)を使用して達成される。このコーパスはすべて、Duruşan et al. (2022)で提供されるイベント定義に基づいて、少なくとも2名のドメイン専門家によって注釈付けされている。 GLOCON is a database of contentious events automatically extracted from national news sources from various countries in multiple languages. National news sources are utilized, and complete news archives are processed to create an event list for each source. Automation is achieved using a gold standard corpus sampled randomly from complete news archives (Yörük et al. 2022) and all annotated by at least two domain experts based on the event definition provided in Duruşan et al. (2022).	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
# Augmented Physics:静的図からインタラクティブな物理シミュレーションを作成する機械学習ツール Augmented Physics: A Machine Learning-Powered Tool for Creating Interactive Physics Simulations from Static Diagrams ( http://arxiv.org/abs/2405.18614v1 ) ライセンス: Link先を確認	Aditya Gunturu, Yi Wen, Jarin Thundathil, Nandi Zhang, Rubaiat Habib Kazi, Ryo Suzuki,	(参考訳) 静的な教科書図からインタラクティブな物理シミュレーションを作成するための機械学習ツールであるAugmented Physicsを紹介する。Segment Anything や OpenCV などのコンピュータビジョン技術を活用することで,本ウェブベースのシステムでは,ユーザが物理教科書から図を半自動で抽出し,抽出したコンテンツに基づいてインタラクティブなシミュレーションを生成できる。これらのインタラクティブなダイアグラムは、スキャンされた教科書ページにシームレスに統合され、重力、光学、回路、運動学など、様々な物理概念の対話的でパーソナライズされた学習体験を容易にする。7人の物理インストラクターを対象とした聞き取り調査に基づいて,1)拡張実験,2)アニメーション図,3)双方向操作教材,4)パラメータ可視化という4つの主要な拡張手法を探求する。技術評価,ユーザビリティスタディ(N=12),エキスパートインタビュー(N=12)を通じてシステムを評価した。その結果,本システムは,物理教育において,より魅力的でパーソナライズされた学習体験を促進しうることが示唆された。 We introduce Augmented Physics, a machine learning-powered tool designed for creating interactive physics simulations from static textbook diagrams. Leveraging computer vision techniques, such as Segment Anything and OpenCV, our web-based system enables users to semi-automatically extract diagrams from physics textbooks and then generate interactive simulations based on the extracted content. These interactive diagrams are seamlessly integrated into scanned textbook pages, facilitating interactive and personalized learning experiences across various physics concepts, including gravity, optics, circuits, and kinematics. Drawing on an elicitation study with seven physics instructors, we explore four key augmentation techniques: 1) augmented experiments, 2) animated diagrams, 3) bi-directional manipulatives, and 4) parameter visualization. We evaluate our system through technical evaluation, a usability study (N=12), and expert interviews (N=12). The study findings suggest that our system can facilitate more engaging and personalized learning experiences in physics education.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
# ウェーブレットを用いた視覚変換器用画像トケナイザ Wavelet-Based Image Tokenizer for Vision Transformers ( http://arxiv.org/abs/2405.18616v1 ) ライセンス: Link先を確認	Zhenhai Zhu, Radu Soricut,	(参考訳) 非重複パッチワイドコンボリューションは、すべての最先端ビジョントランスフォーマー(ViT)モデルのデフォルトの画像トークンである。多くのViT変異体が効率と精度を改善するために提案されているが、画像トークン化装置自体の改善に関する研究はほとんど報告されていない。本稿ではウェーブレット変換に基づく新しい画像トークン化手法を提案する。新たなトークン機構を備えたViTモデルは,ImageNet検証セットのトレーニングスループットの向上とトップ1精度の向上を実現する。本稿では,ViTモデルアーキテクチャの変更を伴わずに,トークン化器がトレーニングスループットを向上する理由に関する理論的解析を行う。分析の結果,新しいトークンーザは高解像度画像を効果的に処理でき,対向攻撃に対して自然に耐性があることが示唆された。さらに、画像理解のための一様でないグリッド上の画像トークンなど、ViTベースのモデル設計のための重要な研究方向について、新たな視点を提供する。 Non-overlapping patch-wise convolution is the default image tokenizer for all state-of-the-art vision Transformer (ViT) models. Even though many ViT variants have been proposed to improve its efficiency and accuracy, little research on improving the image tokenizer itself has been reported in the literature. In this paper, we propose a new image tokenizer based on wavelet transformation. We show that ViT models with the new tokenizer achieve both higher training throughput and better top-1 precision for the ImageNet validation set. We present a theoretical analysis on why the proposed tokenizer improves the training throughput without any change to ViT model architecture. Our analysis suggests that the new tokenizer can effectively handle high-resolution images and is naturally resistant to adversarial attack. Furthermore, the proposed image tokenizer offers a fresh perspective on important new research directions for ViT-based model design, such as image tokens on a non-uniform grid for image understanding.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
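The abstract does not specify the transform in detail. The sketch below is an illustration under stated assumptions of one possible starting point for a wavelet-based tokenizer. It computes a single-level orthonormal 2D Haar transform, which splits an image into one low-frequency subband and three detail subbands at half resolution. The Haar choice, the helper name `haar2d`, and the token layout at the end are hypothetical and are not the paper's design.

```python
import numpy as np

def haar2d(img):
    """Single-level orthonormal 2D Haar transform of an (H, W) array with even H and W.

    Returns four (H/2, W/2) subbands: ll (scaled local averages) and three detail bands.
    """
    img = np.asarray(img, dtype=float)
    if img.ndim != 2 or img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass: scaled block average
    lh = (a - b + c - d) / 2.0  # left column minus right column
    hl = (a + b - c - d) / 2.0  # top row minus bottom row
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh

# Hypothetical token layout: one 4-dimensional token per 2x2 block.
img = np.arange(16, dtype=float).reshape(4, 4)
tokens = np.stack(haar2d(img), axis=-1).reshape(-1, 4)
print(tokens.shape)  # (4, 4)
```

The division by 2 makes each 2x2 transform orthonormal. As a result, total energy is preserved and the image is exactly recoverable, while the `ll` band alone already gives a half-resolution view.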
# RealitySummary:大規模言語モデルを用いたオンデマンド複合現実文書拡張 RealitySummary: On-Demand Mixed Reality Document Enhancement using Large Language Models ( http://arxiv.org/abs/2405.18620v1 ) ライセンス: Link先を確認	Aditya Gunturu, Shivesh Jadon, Nandi Zhang, Jarin Thundathil, Wesley Willett, Ryo Suzuki,	(参考訳) 本稿では、オンデマンドのテキスト抽出、要約、拡張を用いて、あらゆる印刷物やデジタル文書を拡張できる複合現実読書アシスタントであるRealitySummaryを紹介する。拡張読書ツールは、オーバーレイされたデジタルコンテンツによって物理的な読書体験を強化することを約束するが、従来のシステムは通常、事前処理された文書を必要とし、それが一般化可能性と実世界のユースケースを制限していた。本稿では,大規模言語モデルを活用したオンデマンド文書拡張について検討する。多様な文書に一般化可能な手法を理解するため,我々はまず,文書拡張の5つのカテゴリ(要約,拡張,ナビゲーション,比較,抽出)を特定した探索的設計研究を行った。これに基づき,Google Cloud OCRとGPT-4を使ってテキストを自動的に抽出して要約し,Microsoft Hololens 2とApple Vision Proを使って文書の周囲に情報を埋め込む概念実証システムを開発した。1)要約,2)比較表,3)タイムライン,4)キーワードリスト,5)要約のハイライト,6)情報カードという6つの具体的な文書拡張のリアルタイム例を示す。ユーザビリティスタディ (N=12) とイン・ザ・ワイルドスタディ (N=11) の結果は、オンデマンドMR文書拡張の潜在的な利点と今後の研究機会を浮き彫りにしている。 We introduce RealitySummary, a mixed reality reading assistant that can enhance any printed or digital document using on-demand text extraction, summarization, and augmentation. While augmented reading tools promise to enhance physical reading experiences with overlaid digital content, prior systems have typically required pre-processed documents, which limits their generalizability and real-world use cases. In this paper, we explore on-demand document augmentation by leveraging large language models. To understand generalizable techniques for diverse documents, we first conducted an exploratory design study which identified five categories of document enhancements (summarization, augmentation, navigation, comparison, and extraction). Based on this, we developed a proof-of-concept system that can automatically extract and summarize text using Google Cloud OCR and GPT-4, then embed information around documents using a Microsoft Hololens 2 and Apple Vision Pro. 
We demonstrate real-time examples of six specific document augmentations: 1) summaries, 2) comparison tables, 3) timelines, 4) keyword lists, 5) summary highlighting, and 6) information cards. Results from a usability study (N=12) and in-the-wild study (N=11) highlight the potential benefits of on-demand MR document enhancement and opportunities for future research.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
# ネットワーク干渉下のマルチアームドバンディット Multi-Armed Bandits with Network Interference ( http://arxiv.org/abs/2405.18621v1 ) ライセンス: Link先を確認	Abhineet Agarwal, Anish Agarwal, Lorenzo Masoero, Justin Whitehouse,	(参考訳) 干渉を伴うオンライン実験は、電子商取引や医学における適応的臨床試験のような近代的な応用において共通の課題である。例えば、オンラインマーケットプレースでは、商品の収益は競合商品に適用される割引に依存する。干渉下の統計的推測はオフライン環境で広く研究されているが、リグレットを最小化するために適応的に処置を割り当てる方法についてはあまり知られていない。我々は,学習者(eコマースプラットフォーム)が,リグレットの最小化(収益の最大化)のために,$\mathcal{A}$個の可能なアクション(割引)の1つを,$T$ラウンドにわたって$N$個のユニット(商品)に順次割り当てるマルチアームドバンディット(MAB)問題を研究することで,このギャップに対処する。従来のMAB問題とは異なり、各ユニットの報酬は他のユニットに割り当てられた処置に依存する。すなわち、ユニットの背後にあるネットワーク全体に干渉が存在する。$\mathcal{A}$個のアクションと$N$個のユニットでは、アクション空間が$\mathcal{A}^N$として増大するため、リグレットを最小化することは組合せ的に困難である。この問題を克服するために、各ユニットの報酬が$s$個の近傍ユニットに割り当てられた処置によってのみ影響を受ける、スパースネットワーク干渉モデルについて検討する。離散フーリエ解析のツールを用いて、ユニット固有の報酬 $r_n: [\mathcal{A}]^N \rightarrow \mathbb{R}$ の疎な線形表現を導き、リグレットを最小化する単純な線形回帰ベースのアルゴリズムを提案する。重要なことに、学習者がすべてのユニットの干渉近傍を観測できる場合と、それが未知である場合の両方において、我々のアルゴリズムは証明可能な低リグレットを達成する。これは、既知のネットワーク上の干渉の強さに厳密な条件を課し、さらに著しく弱い最適アクションとリグレットを比較する、このトピックに関する他の研究を大幅に一般化するものである。数値シミュレーションにより理論的知見を裏付ける。 Online experimentation with interference is a common challenge in modern applications such as e-commerce and adaptive clinical trials in medicine. For example, in online marketplaces, the revenue of a good depends on discounts applied to competing goods. Statistical inference with interference is widely studied in the offline setting, but far less is known about how to adaptively assign treatments to minimize regret. We address this gap by studying a multi-armed bandit (MAB) problem where a learner (e-commerce platform) sequentially assigns one of $\mathcal{A}$ possible actions (discounts) to $N$ units (goods) over $T$ rounds to minimize regret (maximize revenue). Unlike traditional MAB problems, the reward of each unit depends on the treatments assigned to other units, i.e., there is interference across the underlying network of units. 
With $\mathcal{A}$ actions and $N$ units, minimizing regret is combinatorially difficult since the action space grows as $\mathcal{A}^N$. To overcome this issue, we study a sparse network interference model, where the reward of a unit is only affected by the treatments assigned to $s$ neighboring units. We use tools from discrete Fourier analysis to develop a sparse linear representation of the unit-specific reward $r_n: [\mathcal{A}]^N \rightarrow \mathbb{R} $, and propose simple, linear regression-based algorithms to minimize regret. Importantly, our algorithms achieve provably low regret both when the learner observes the interference neighborhood for all units and when it is unknown. This significantly generalizes other works on this topic which impose strict conditions on the strength of interference on a known network, and also compare regret to a markedly weaker optimal action. Empirically, we corroborate our theoretical findings via numerical simulations.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
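The sparse-representation idea can be made concrete under simplifying assumptions. With two actions per unit, discrete Fourier analysis on $\{0,1\}^N$ uses the Walsh characters $\chi_S(a)=\prod_{i\in S}(-1)^{a_i}$. A reward that depends only on $s$ neighbors is a linear combination of at most $2^s$ of these characters, so ordinary least squares can estimate it. The sketch below shows only this estimation step, for one unit with a known neighborhood. It is not the authors' algorithm, which also handles general $\mathcal{A}$, unknown neighborhoods, and regret minimization. All numbers are synthetic.

```python
import itertools
import numpy as np

def walsh_features(actions, neighborhood):
    """Boolean Fourier (Walsh) features for binary treatments.

    actions: int array of shape (T, N) with entries in {0, 1} (the treatment of
        each of N units in each of T rounds).
    neighborhood: indices of the units that may affect the target unit's reward.
    Returns a (T, 2**len(neighborhood)) feature matrix with one column
    chi_S(a) = prod_{i in S} (-1)**a_i per subset S of the neighborhood, and the
    list of subsets in column order.
    """
    signs = 1 - 2 * np.asarray(actions)  # action 0 -> +1, action 1 -> -1
    subsets = [S for r in range(len(neighborhood) + 1)
               for S in itertools.combinations(neighborhood, r)]
    cols = [np.prod(signs[:, list(S)], axis=1) if S else np.ones(signs.shape[0])
            for S in subsets]
    return np.column_stack(cols).astype(float), subsets

# Hypothetical setup: 6 units, unit 0's reward depends only on units 0, 1, 2 (s = 3).
rng = np.random.default_rng(0)
T, N = 200, 6
actions = rng.integers(0, 2, size=(T, N))
X, subsets = walsh_features(actions, neighborhood=(0, 1, 2))
true_coef = np.zeros(len(subsets))
true_coef[subsets.index(())] = 1.0        # baseline reward
true_coef[subsets.index((0,))] = 0.5      # effect of the unit's own treatment
true_coef[subsets.index((0, 1))] = -0.3   # interaction with neighbor 1
reward = X @ true_coef + 0.01 * rng.normal(size=T)

# Least squares over 2**3 = 8 coefficients instead of a table over 2**6 joint actions.
coef, *_ = np.linalg.lstsq(X, reward, rcond=None)
print(np.round(coef, 2))
```

The number of unknowns grows with $2^s$ rather than with the $2^N$ joint actions. This is the reason sparsity makes the problem tractable.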
# フォトニック量子コンピューティングを用いたデータセットのバイクラスタリング Biclustering a dataset using photonic quantum computing ( http://arxiv.org/abs/2405.18622v1 ) ライセンス: Link先を確認	Ajinkya Borle, Ameya Bhave,	(参考訳) バイクラスタリングは、特定の基準に従ってデータセットの行と列をまとめようとする機械学習とデータマイニングにおける問題である。本研究では、ボソンサンプリングやガウスボソンサンプリング(GBS)のような量子コンピューティングモデルがこの問題と持つ自然な関係を強調する。まず,ボソンサンプリングを用いて,行列のパーマネントに基づいてバイクラスタを同定する方法を探る。次に,(i)データセットを二部グラフに変換し,(ii)GBSを実行してより大きな二部グラフ内の最も密な部分グラフを見つけることで,ガウスボソンサンプリングを用いてデータセット内のクラスタを見つけるヒューリスティックを提案する。以上で提案したヒューリスティックのシミュレーションは,この分野の今後の探究に向けて有望な結果を示した。 Biclustering is a problem in machine learning and data mining that seeks to group together rows and columns of a dataset according to certain criteria. In this work, we highlight the natural relation that quantum computing models like boson and Gaussian boson sampling (GBS) have to this problem. We first explore the use of boson sampling to identify biclusters based on matrix permanents. We then propose a heuristic that finds clusters in a dataset using Gaussian boson sampling by (i) converting the dataset into a bipartite graph and then (ii) running GBS to find the densest sub-graph(s) within the larger bipartite graph. Our simulations for the above proposed heuristics show promising results for future exploration in this area.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
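Boson sampling is tied to matrix permanents: output probabilities are proportional to the squared magnitudes of permanents of submatrices. This is why permanents appear in the first approach above. The sketch below gives a minimal permanent computed with Ryser's formula, as a classical reference point only. Its exponential cost is the reason permanents are considered hard to compute classically. The function is illustrative and is not taken from the paper.

```python
import itertools
import numpy as np

def permanent(M):
    """Permanent of a square matrix via Ryser's formula.

    perm(M) = (-1)^n * sum over non-empty column subsets S of
              (-1)^{|S|} * prod_i sum_{j in S} M[i, j]
    The loop visits 2^n - 1 subsets, so this is only practical for small n.
    """
    M = np.asarray(M, dtype=float)
    if M.ndim != 2 or M.shape[0] != M.shape[1]:
        raise ValueError("matrix must be square")
    n = M.shape[0]
    total = 0.0
    for r in range(1, n + 1):
        for cols in itertools.combinations(range(n), r):
            row_sums = M[:, list(cols)].sum(axis=1)
            total += (-1) ** r * np.prod(row_sums)
    return (-1) ** n * total

print(permanent([[1, 2], [3, 4]]))  # 10.0  (1*4 + 2*3; no minus sign, unlike the determinant)
```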
# CNNとLSTMベースの侵入検知システムによるIoTセキュリティの強化 Enhancing IoT Security with CNN and LSTM-Based Intrusion Detection Systems ( http://arxiv.org/abs/2405.18624v1 ) ライセンス: Link先を確認	Afrah Gueriani, Hamza Kheddar, Ahmed Cherif Mazari,	(参考訳) モノのインターネット(IoT)デバイスをサイバー攻撃から守ることは、固有のセキュリティ脆弱性のために必須である。これらの脆弱性には、個人と組織の両方に大きなダメージを与える高度な攻撃が含まれる。侵入検知システム(IDS)のような堅牢なセキュリティ対策を採用することは、これらの問題を解決し、IoTシステムをそのような攻撃から保護するために不可欠である。この文脈で提案するIDSモデルは,畳み込みニューラルネットワーク(CNN)と長短期記憶(LSTM)ディープラーニング(DL)モデルを組み合わせて構成する。この融合により、パターン認識のためのCNNの空間的特徴抽出能力と、複雑な時間的依存関係を識別するためのLSTMの逐次的記憶保持を活用し、精度と効率を向上させつつ、IoTトラフィックを良性と悪性の2クラスに検出・分類できるようになる。提案モデルの性能評価には,新しいCICIoT2023データセットを訓練と最終テストの両方に使用し,さらにCICIDS2017データセットを用いた最終テストフェーズを通じてモデルの性能を検証した。提案モデルの精度は98.42%,損失は最小の0.0275である。偽陽性率(FPR)も同様に重要であり、9.17%に達し、F1スコアは98.57%であった。これらの結果は,潜在的なサイバー脅威に対してIoT環境を強化する上で,提案するCNN-LSTM IDSモデルが有効であることを示している。 Protecting Internet of Things (IoT) devices against cyber attacks is imperative owing to inherent security vulnerabilities. These vulnerabilities can include a spectrum of sophisticated attacks that pose significant damage to both individuals and organizations. Employing robust security measures like intrusion detection systems (IDSs) is essential to solve these problems and protect IoT systems from such attacks. In this context, our proposed IDS model consists of a combination of convolutional neural network (CNN) and long short-term memory (LSTM) deep learning (DL) models. This fusion facilitates the detection and classification of IoT traffic into binary categories, benign and malicious activities, by leveraging the spatial feature extraction capabilities of CNN for pattern recognition and the sequential memory retention of LSTM for discerning complex temporal dependencies, thereby achieving enhanced accuracy and efficiency. 
In assessing the performance of our proposed model, the authors employed the new CICIoT2023 dataset for both training and final testing, while further validating the model's performance through a conclusive testing phase utilizing the CICIDS2017 dataset. Our proposed model achieves an accuracy rate of 98.42%, accompanied by a minimal loss of 0.0275. The false positive rate (FPR) is equally important; it reaches 9.17%, with an F1-score of 98.57%. These results demonstrate the effectiveness of our proposed CNN-LSTM IDS model in fortifying IoT environments against potential cyber threats.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
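The metrics quoted above (accuracy, false positive rate, F1-score) follow the standard confusion-matrix definitions. The sketch below computes them for binary benign/malicious labels. It treats "malicious" as the positive class, which is an assumption because the abstract does not say. The counts are made up for illustration and are not the paper's results.

```python
from dataclasses import dataclass

@dataclass
class BinaryCounts:
    tp: int  # malicious traffic flagged as malicious
    fp: int  # benign traffic flagged as malicious (false alarms)
    tn: int  # benign traffic passed as benign
    fn: int  # malicious traffic missed

def detection_metrics(c: BinaryCounts) -> dict:
    """Accuracy, false positive rate, precision, recall, and F1 for a binary detector."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)  # also called the true positive rate
    return {
        "accuracy": (c.tp + c.tn) / total,
        "fpr": c.fp / (c.fp + c.tn),  # computed over benign traffic only
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Made-up counts for illustration (not the paper's confusion matrix).
m = detection_metrics(BinaryCounts(tp=90, fp=5, tn=95, fn=10))
print({k: round(v, 3) for k, v in m.items()})
```

The FPR is normalized by benign traffic alone, whereas accuracy and F1 aggregate over the classes differently. All three therefore respond to class balance in different ways, which is why FPR is worth reporting alongside accuracy and F1.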
# 適応的文脈をもつ因果文脈バンディット Causal Contextual Bandits with Adaptive Context ( http://arxiv.org/abs/2405.18626v1 ) ライセンス: Link先を確認	Rahul Madhavan, Aurghya Maiti, Gaurav Sinha, Siddharth Barman,	(参考訳) 本研究では,学習者が選択した初期介入に基づいて文脈が選ばれる,因果文脈バンディットの変種について検討する。各ラウンドの開始時に,学習者は初期アクションを選択し,それに応じて環境が確率的な文脈を明らかにする。その後,学習者は最終アクションを選択し,報酬を受け取る。環境との$T$ラウンドの相互作用が与えられたとき,学習者の目的は,期待報酬が最大となる(初期アクションと最終アクションを選択する)ポリシーを学習することである。本稿では,すべてのアクションが既知の因果グラフ上のあるノードへの介入に対応する特定の状況について検討する。決定論的文脈設定に関する先行研究を拡張し,単純リグレット最小化の保証を得る。これは,上界を特徴づけるインスタンス依存の因果パラメータ$\lambda$によって実現される。さらに,我々の単純リグレットが大きなクラスのインスタンスに対して本質的にタイトであることを証明する。我々の研究の重要な特徴は,バンディットの探索問題に対処するために凸最適化を用いることである。また,理論的結果を検証するための実験を行い,コードをプロジェクトのGitHubリポジトリ https://github.com/adaptiveContextualCausalBandits/aCCB で公開している。 We study a variant of causal contextual bandits where the context is chosen based on an initial intervention chosen by the learner. At the beginning of each round, the learner selects an initial action, depending on which a stochastic context is revealed by the environment. Following this, the learner then selects a final action and receives a reward. Given $T$ rounds of interactions with the environment, the objective of the learner is to learn a policy (of selecting the initial and the final action) with maximum expected reward. In this paper we study the specific situation where every action corresponds to intervening on a node in some known causal graph. We extend prior work from the deterministic context setting to obtain simple regret minimization guarantees. This is achieved through an instance-dependent causal parameter, $\lambda$, which characterizes our upper bound. Furthermore, we prove that our simple regret is essentially tight for a large class of instances. A key feature of our work is that we use convex optimization to address the bandit exploration problem. We also conduct experiments to validate our theoretical results, and release our code at our project GitHub repository: https://github.com/adaptiveContextualCausalBandits/aCCB.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
# PureGen: 生成モデルダイナミクスによる訓練時ポイズニング防御のためのユニバーサルデータ浄化 PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics ( http://arxiv.org/abs/2405.18627v1 ) ライセンス: Link先を確認	Sunay Bhat, Jeffrey Jiang, Omead Pooladzandi, Alexander Branch, Gregory Pottie,	(参考訳) 訓練時のデータポイズニング攻撃は、訓練中に敵対的な例を導入して誤分類を引き起こすことで、機械学習モデルを脅かす。現在の防御手法は、しばしば汎化性能を低下させ、攻撃特化型であり、訓練のオーバーヘッドもかなり大きい。そこで本稿では,エネルギーベースモデル(EBM)の反復的ランジュバン動力学,ノイズ除去拡散確率モデル(DDPM),あるいはその両方によって実現される確率的変換$\Psi(x)$を用いた,普遍的データ浄化手法群を提案する。これらのアプローチは、分類器の汎化への影響を最小限に抑えつつ、汚染されたデータを浄化する。特別に訓練したEBMとDDPMは,攻撃や分類器固有の情報を必要とせずに,CIFAR-10,Tiny-ImageNet,CINIC-10上で様々な攻撃(Narcissus,Bullseye Polytope,Gradient Matchingなど)に対する最先端の防御を提供する。性能のトレードオフについて議論し,生成モデルの訓練データが汚染されていたり分布がシフトしていたりしても,提案手法が高い有効性を維持することを示す。 Train-time data poisoning attacks threaten machine learning models by introducing adversarial examples during training, leading to misclassification. Current defense methods often reduce generalization performance, are attack-specific, and impose significant training overhead. To address this, we introduce a set of universal data purification methods using a stochastic transform, $\Psi(x)$, realized via iterative Langevin dynamics of Energy-Based Models (EBMs), Denoising Diffusion Probabilistic Models (DDPMs), or both. These approaches purify poisoned data with minimal impact on classifier generalization. Our specially trained EBMs and DDPMs provide state-of-the-art defense against various attacks (including Narcissus, Bullseye Polytope, Gradient Matching) on CIFAR-10, Tiny-ImageNet, and CINIC-10, without needing attack or classifier-specific information. We discuss performance trade-offs and show that our methods remain highly effective even with poisoned or distributionally shifted generative model training data.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
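PureGen purifies inputs with iterative Langevin dynamics on an energy-based model. The sketch below shows the unadjusted Langevin update $x \leftarrow x - \tfrac{\eta}{2}\nabla E(x) + \sqrt{\eta}\,\xi$ with $\xi \sim \mathcal{N}(0, I)$. It uses a toy quadratic energy $E(x) = \tfrac12\lVert x-\mu\rVert^2$ in place of a trained EBM. The energy, the step size, and the step count are assumptions for illustration and are not the paper's settings.

```python
import numpy as np

def grad_energy(x, mu):
    """Gradient of a toy energy E(x) = 0.5 * ||x - mu||^2 (stand-in for a trained EBM)."""
    return x - mu

def langevin_purify(x0, mu, step=0.1, n_steps=200, rng=None):
    """Unadjusted Langevin dynamics: x <- x - (step / 2) * grad E(x) + sqrt(step) * noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        # Drift toward low-energy regions plus Gaussian noise to keep sampling.
        x = x - 0.5 * step * grad_energy(x, mu) + np.sqrt(step) * noise
    return x

# Start far from the low-energy region (mu = 0), as a crude stand-in for a perturbed input.
x_start = np.full(10_000, 10.0)
x_end = langevin_purify(x_start, mu=0.0, rng=np.random.default_rng(0))
print(round(float(x_end.mean()), 2), round(float(x_end.var()), 2))
```

For this energy, each step shrinks the initial offset by a factor of $0.95$, so after 200 steps it is about $0.95^{200} \approx 3.5\times10^{-5}$ of its starting size. The samples then look like draws from the model (variance near 1, slightly above because the step size is finite). Washing out the starting point in this way is the intuition behind using such dynamics for purification.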
# LLM推論のメモリ効率の良い高速化のためのハードウェア対応並列プロンプトデコーディング Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference ( http://arxiv.org/abs/2405.18628v1 ) ライセンス: Link先を確認	Hao, Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan,	(参考訳) 大規模言語モデル(LLM)の自己回帰デコーディングは、ハードウェア性能に大きなオーバーヘッドをもたらす。近年,マルチトークン生成のための様々な投機的デコーディング手法が研究されているが,これらの取り組みはスループットなどの処理速度の向上に主眼を置いている。重要なことに、これらはメモリ消費やトレーニングコストなど、実際のデプロイメントに不可欠な他の指標を無視することが多い。これらの制限を克服するために、わずか$0.0002$%の学習可能パラメータしか必要とせず、単一のA100-40GB GPU上でわずか16時間での効率的な訓練を可能にする、新しい並列プロンプトデコーディング(PPD)を提案する。人間の自然言語生成プロセスに着想を得たPPDは、複数のプロンプトトークンを用いて、将来の時間ステップで生成される出力を並列に近似する。このアプローチは,マルチトークン生成に必要な条件付き依存情報を部分的に回復し,長距離予測において最大28%高い受理率をもたらす。さらに、異なるGPUの計算能力を最大限に活用するよう、このデコーディング方式を適応的に最適化するハードウェア対応動的スパースツリー手法を提案する。MobileLlama から Vicuna-13B までの LLM を用いた幅広いベンチマークでの大規模な実験を通じて、我々のアプローチは最大$2.49\times$の高速化を示し、実行時メモリのオーバーヘッドをわずか$0.0004$%に抑える。さらに重要なことに、我々の並列プロンプトデコーディングは、既存の投機的デコーディングと相乗的に統合できる直交的な最適化として機能し、最大$1.22\times$のさらなる高速化を示す。コードは https://github.com/hmarkc/parallel-prompt-decoding で公開されている。 The auto-regressive decoding of Large Language Models (LLMs) results in significant overheads in their hardware performance. While recent research has investigated various speculative decoding techniques for multi-token generation, these efforts have primarily focused on improving processing speed such as throughput. Crucially, they often neglect other metrics essential for real-life deployments, such as memory consumption and training cost. To overcome these limitations, we propose a novel parallel prompt decoding (PPD) method that requires only $0.0002$% trainable parameters, enabling efficient training on a single A100-40GB GPU in just 16 hours. Inspired by the human natural language generation process, PPD approximates outputs generated at future timesteps in parallel by using multiple prompt tokens. 
This approach partially recovers the missing conditional dependency information necessary for multi-token generation, resulting in up to a 28% higher acceptance rate for long-range predictions. Furthermore, we present a hardware-aware dynamic sparse tree technique that adaptively optimizes this decoding scheme to fully leverage the computational capacities on different GPUs. Through extensive experiments across LLMs ranging from MobileLlama to Vicuna-13B on a wide range of benchmarks, our approach demonstrates up to 2.49$\times$ speedup and maintains a minimal runtime memory overhead of just $0.0004$%. More importantly, our parallel prompt decoding can serve as an orthogonal optimization for synergistic integration with existing speculative decoding, showing up to $1.22\times$ further speed improvement. Our code is available at https://github.com/hmarkc/parallel-prompt-decoding.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
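Parallel prompt decoding (PPD) guesses several future tokens at once and keeps the ones the base model agrees with. The quoted acceptance rate measures how many of those guesses survive. The sketch below implements greedy draft verification, a common acceptance rule in speculative decoding. The abstract does not state whether PPD uses exactly this rule or verifies a tree of candidates, as its dynamic sparse tree suggests. The counting "model" `next_token_mod10` is a hypothetical stand-in for an LLM.

```python
from typing import Callable, List

def verify_draft(prefix: List[int], draft: List[int],
                 target_next: Callable[[List[int]], int]) -> List[int]:
    """Greedy verification of drafted tokens.

    Accepts the longest prefix of `draft` that matches the target model's greedy
    choices, then appends one token chosen by the target model (the correction at
    the first mismatch, or a bonus token if the whole draft matched). At least one
    token is therefore produced per verification step.
    """
    accepted: List[int] = []
    context = list(prefix)
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            accepted.append(expected)  # replace the first wrong guess and stop
            return accepted
        accepted.append(tok)
        context.append(tok)
    accepted.append(target_next(context))  # bonus token after a fully accepted draft
    return accepted

def next_token_mod10(context: List[int]) -> int:
    """Hypothetical toy 'model' that always predicts the last token plus one, mod 10."""
    return (context[-1] + 1) % 10

print(verify_draft([1, 2], [3, 4, 7], next_token_mod10))  # [3, 4, 5]
```

The loop calls the target model once per position for clarity. Real systems score all drafted positions in a single batched forward pass, and that single pass is where the speed-up comes from.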
# 学生エッセイ評価におけるパートナーとしての大規模言語モデル Large Language Models as Partners in Student Essay Evaluation ( http://arxiv.org/abs/2405.18632v1 ) ライセンス: Link先を確認	Toru Ishida, Tongxi Liu, Hailong Wang, William K. Cheung,	(参考訳) ワークショップコースにおける総合的な評価の重要性が増すにつれ、教員の作業負荷を減らす効率的で公平な評価方法への需要が高まっている。本稿では,実際の学生エッセイを用いて,大規模言語モデル(LLM)による評価を3つのシナリオで行った結果を報告する:1)ルーブリックなどの指針を与えない場合,2)事前に指定したルーブリックを与える場合,3)エッセイの一対比較による場合。結果の定量的分析により,事前指定ルーブリックを用いた一対比較シナリオにおいてLLMと教員の評価の間に強い相関があることが明らかになったが,評価の質と安定性に関する懸念は残った。そこで,LLMの評価コメントを質的に分析し,1)LLMは教員の評価能力に匹敵しうること,2)LLMの評価のばらつきは混乱ではなく多様性と解釈すべきこと,3)人間とLLMの評価は異なり得るが互いに補完し合えることを示した。結論として,本稿はLLMを単なる教員の補助ではなく評価委員会のパートナーとみなすべきであると提案し,今後の研究の方向性を概説する。 As the importance of comprehensive evaluation in workshop courses increases, there is a growing demand for efficient and fair assessment methods that reduce the workload for faculty members. This paper presents an evaluation conducted with Large Language Models (LLMs) using actual student essays in three scenarios: 1) without providing guidance such as rubrics, 2) with pre-specified rubrics, and 3) through pairwise comparison of essays. Quantitative analysis of the results revealed a strong correlation between LLM and faculty member assessments in the pairwise comparison scenario with pre-specified rubrics, although concerns about the quality and stability of evaluations remained. Therefore, we conducted a qualitative analysis of LLM assessment comments, showing that: 1) LLMs can match the assessment capabilities of faculty members, 2) variations in LLM assessments should be interpreted as diversity rather than confusion, and 3) assessments by humans and LLMs can differ and complement each other. In conclusion, this paper suggests that LLMs should not be seen merely as assistants to faculty members but as partners in evaluation committees and outlines directions for further research.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
# 文脈内アライメントによる自己補正の理論的理解 A Theoretical Understanding of Self-Correction through In-context Alignment ( http://arxiv.org/abs/2405.18634v1 ) ライセンス: Link先を確認	Yifei Wang, Yuyang Wu, Zeming Wei, Stefanie Jegelka, Yisen Wang,	(参考訳) 限られた人間の経験を模倣するにとどまらず、最近の研究では、人間と同様に、大規模言語モデル(LLM)がある状況下では自己補正、すなわち自己検査によって以前の応答を修正することによって純粋に能力を向上させられるという初期的な証拠が示されている。しかし、そのような能力がどのように生じるかについてはほとんど分かっていない。本研究では、アライメントタスクに類似した簡略化された設定に基づいて、文脈内学習の観点から自己補正を理論的に解析し、LLMが比較的正確な自己検査を報酬として与える場合、応答を文脈内で洗練できることを示す。特に、過度に単純化された線形Transformerに関する従来の理論を超えて、我々の理論的構成は、自己補正における現実的なTransformerのいくつかの重要な設計、すなわちソフトマックスアテンション、マルチヘッドアテンション、MLPブロックの役割を裏付ける。合成データセットを用いて,これらの知見を広範囲に検証した。これらの知見に触発されて、簡単な自己補正ステップが大きな違いをもたらすLLMジェイルブレイクに対する防御など、自己補正の新しい応用についても示す。これらの発見が、より良い基盤モデルを構築するための自己補正の理解、活用、強化に関するさらなる研究を促すと考えている。 Going beyond mimicking limited human experiences, recent studies show initial evidence that, like humans, large language models (LLMs) are capable of improving their abilities purely by self-correction, i.e., correcting previous responses through self-examination, in certain circumstances. Nevertheless, little is known about how such capabilities arise. In this work, based on a simplified setup akin to an alignment task, we theoretically analyze self-correction from an in-context learning perspective, showing that when LLMs give relatively accurate self-examinations as rewards, they are capable of refining responses in an in-context way. Notably, going beyond previous theories on over-simplified linear transformers, our theoretical construction underpins the roles of several key designs of realistic transformers for self-correction: softmax attention, multi-head attention, and the MLP block. We validate these findings extensively on synthetic datasets. Inspired by these findings, we also illustrate novel applications of self-correction, such as defending against LLM jailbreaks, where a simple self-correction step does make a large difference. 
We believe that these findings will inspire further research on understanding, exploiting, and enhancing self-correction for building better foundation models.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
# In-Distribution Labelはいつ,どのようにしてアウト・オブ・ディストリビューション検出に役立つのか? When and How Does In-Distribution Label Help Out-of-Distribution Detection? ( http://arxiv.org/abs/2405.18635v1 ) ライセンス: Link先を確認	Xuefeng Du, Yiyou Sun, Yixuan Li,	(参考訳) 訓練分布から逸脱したデータ点の検出は、信頼性の高い機械学習を保証する上で極めて重要である。古典的な異常検出手法から現代の分布外(OOD)検出アプローチまで、この課題には大規模な研究が捧げられてきた。OOD検出は一般に、ラベル付き分布内(ID)データセットからの教師あり学習に依存するが、異常検出はIDデータ全体を単一のクラスとして扱い、IDラベルを無視することがある。この基本的な違いは、まだ厳密に調査されていない重要な問い、すなわちIDラベルはいつ、どのようにOOD検出に役立つのか、を提起する。本稿では,OOD検出に対するIDラベルの影響を理論的に明らかにする形式的な理解を提供することで,このギャップを埋める。我々はグラフ理論的アプローチを用い,OODデータからのIDデータの分離可能性を閉形式で厳密に解析する。我々のアプローチの鍵は、グラフ上のスペクトル分解によるデータ表現の特徴付けである。これらの表現を活用して、IDラベルの有無によるOOD検出性能を比較する証明可能な誤差上界を確立し、OOD検出の強化を実現するための条件を明らかにする。最後に、シミュレーションデータと実データの両方で実験結果を示し、理論的保証を検証し、洞察を補強する。コードは https://github.com/deeplearning-wisc/id_label で公開されている。 Detecting data points deviating from the training distribution is pivotal for ensuring reliable machine learning. Extensive research has been dedicated to the challenge, spanning classical anomaly detection techniques to contemporary out-of-distribution (OOD) detection approaches. While OOD detection commonly relies on supervised learning from a labeled in-distribution (ID) dataset, anomaly detection may treat the entire ID data as a single class and disregard ID labels. This fundamental distinction raises a significant question that has yet to be rigorously explored: when and how does ID label help OOD detection? This paper bridges this gap by offering a formal understanding to theoretically delineate the impact of ID labels on OOD detection. We employ a graph-theoretic approach, rigorously analyzing the separability of ID data from OOD data in a closed-form manner. Key to our approach is the characterization of data representations through spectral decomposition on the graph. 
Leveraging these representations, we establish a provable error bound that compares the OOD detection performance with and without ID labels, unveiling conditions for achieving enhanced OOD detection. Lastly, we present empirical results on both simulated and real datasets, validating theoretical guarantees and reinforcing our insights. Code is publicly available at https://github.com/deeplearning-wisc/id_label.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
# 思想のマーケットプレースとしてのChatGPT:真理を探求することはAIコンテンツガバナンスのゴールか? ChatGPT as the Marketplace of Ideas: Should Truth-Seeking Be the Goal of AI Content Governance? ( http://arxiv.org/abs/2405.18636v1 ) ライセンス: Link先を確認	Jiawei Zhang,	(参考訳) 法的な議論の中で最も永続的なメタファーの1つとして、思想の市場は、何十年にもわたって法学の領域にかなりの影響力を及ぼしてきた。この理論の誕生から1世紀後、ChatGPTは21世紀の革命的な技術進歩として登場した。本研究は,ChatGPTが思想の市場のメタファーを効果的に体現していることを見いだす。ChatGPTは、代々の法学者が思い描いてきた約束を具現化するだけでなく、持続的な学術的批判を通じて認識されてきた危険も露わにする。特に、ChatGPTの仕組みと思想の市場理論は、少なくとも4つの共通の特徴(アリーナ、手段、目的、欠陥)を示している。これらの共通属性から、ChatGPTは歴史上、思想の市場理論を実現するための最も適格なエンジンであるといえる。思想の市場理論とChatGPTの比較は単なる出発点にすぎない。より意味のある取り組みは、市場理論を修正するために研究者が提起してきた経験、洞察、提案の蓄積を参照することによって、内部と外部のAIポリシーを再評価し再構成することである。ここでの重要な問題は、AIコンテンツガバナンスの目標として真理の探求を設定すべきかどうかである。絶対的な真理探求という目標は達成不可能であることを踏まえ、ゼロリスク政策の採用に反対する。代わりに、より賢明なアプローチは、大規模言語モデル(LLM)が十分な正当化に基づいて競合する多様な視点を生成するよう訓練される、知識ベースの代替案を採用することである。この研究はまた、いわゆるAIコンテンツリスクはAI企業が生み出すものではなく、情報エコシステム全体に内在するものだと主張する。したがって、これらのリスク管理の負担は、チャットボット企業だけに負わせるのではなく、さまざまな社会的アクターの間で分担されるべきである。 As one of the most enduring metaphors within legal discourse, the marketplace of ideas has wielded considerable influence over the jurisprudential landscape for decades. A century after the inception of this theory, ChatGPT emerged as a revolutionary technological advancement in the twenty-first century. This research finds that ChatGPT effectively manifests the marketplace metaphor. It not only instantiates the promises envisaged by generations of legal scholars but also lays bare the perils discerned through sustained academic critique. Specifically, the workings of ChatGPT and the marketplace of ideas theory exhibit at least four common features: arena, means, objectives, and flaws. These shared attributes are sufficient to render ChatGPT historically the most qualified engine for actualizing the marketplace of ideas theory. The comparison of the marketplace theory and ChatGPT merely marks a starting point. 
A more meaningful undertaking entails reevaluating and reframing both internal and external AI policies by referring to the accumulated experience, insights, and suggestions researchers have raised to fix the marketplace theory. Here, a pivotal issue is: should truth-seeking be set as the goal of AI content governance? Given the unattainability of the absolute truth-seeking goal, I argue against adopting zero-risk policies. Instead, a more judicious approach would be to embrace a knowledge-based alternative wherein large language models (LLMs) are trained to generate competing and divergent viewpoints based on sufficient justifications. This research also argues that so-called AI content risks are not created by AI companies but are inherent in the entire information ecosystem. Thus, the burden of managing these risks should be distributed among different social actors, rather than being solely shouldered by chatbot companies.	翻訳日:2024-05-30 21:33:21 公開日:2024-05-28
# ConSiDERS-The-Human Evaluation Framework: 生成型大規模言語モデルに対する人的評価の再考 ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models ( http://arxiv.org/abs/2405.18638v1 ) ライセンス: Link先を確認	Aparna Elangovan, Ling Liu, Lei Xu, Sravan Bodapati, Dan Roth,	(参考訳) 本ポジションペーパーでは,生成的大規模言語モデル(LLM)の人的評価は,実験設計と結果の信頼性を確保するために,ユーザエクスペリエンス研究や人間の行動心理学といった分野の知見を取り入れた学際的な取り組みであるべきだと論じる。したがって,これらの評価から得られる結論は,ユーザビリティ,美観,認知バイアスなどの要因を考慮しなければならない。認知バイアスによって流暢な情報と真実性がいかに混同されうるか,また認知的不確実性がLikert尺度のような評価スコアの信頼性にどのように影響するかを強調する。さらに,評価はますます強力になる大規模言語モデルの能力と弱点を区別すべきであり,そのためには効果的なテストセットが必要である。人的評価のスケーラビリティも,広く採用されるために不可欠である。そこで,生成NLP時代の効果的な人的評価システムを設計するために,一貫性(Consistency),評価基準(Scoring Criteria),差別化(Differentiating),ユーザエクスペリエンス(User Experience),責任(Responsible),スケーラビリティ(Scalability)の6つの柱からなるConSiDERS-The-Human評価フレームワークを提案する。 In this position paper, we argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking that draws upon insights from disciplines such as user experience research and human behavioral psychology to ensure that the experimental design and results are reliable. The conclusions from these evaluations, thus, must consider factors such as usability, aesthetics, and cognitive biases. We highlight how cognitive biases can conflate fluent information and truthfulness, and how cognitive uncertainty affects the reliability of rating scores such as Likert. Furthermore, the evaluation should differentiate the capabilities and weaknesses of increasingly powerful large language models -- which requires effective test sets. The scalability of human evaluation is also crucial to wider adoption. Hence, to design an effective human evaluation system in the age of generative NLP, we propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars: Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
# 自己教師付き事前学習によるECoGからの音声復号化 Improving Speech Decoding from ECoG with Self-Supervised Pretraining ( http://arxiv.org/abs/2405.18639v1 ) ライセンス: Link先を確認	Brian A. Yuan, Joseph G. Makin,	(参考訳) 頭蓋内ブレイン・マシン・インタフェースに関する最近の研究は、本質的にこの問題を教師あり学習の一例として扱い、神経活動からテキストへ写像する深層ニューラルネットワークを訓練することで、発話音声を高精度にデコードできることを実証している。しかし、そのようなネットワークは表現力の代償として非常に多くのラベル付きデータを必要とし、これは人間の患者から取得される侵襲的な神経記録にとって特に負担の大きい要件である。一方、これらの患者は典型的には、デコーダの訓練に用いられる実験ブロックの外でも発話している。このようなデータや他の患者のデータを利用してデコードを改善できれば、データ収集の負担(構音障害や構音不能の患者にとっては特に重い)が軽減される。ここでは、ノイズコントラスト損失を用いて音声の潜在表現を学習する単純な自己教師あり完全畳み込みモデルであるwav2vecを皮質脳波(ECoG)データ向けに再設計することで、これが可能であることを実証する。このモデルをラベルなしECoG記録で訓練し、続いてラベル付き音声セッションのECoGをwav2vecの表現空間に変換した後、最終的に教師ありエンコーダ・デコーダを訓練して、これらの表現をテキストに写像する。ラベル付きブロックの数を様々に変えて実験したところ、ほぼすべての設定で新しい表現は元のECoGデータよりも優れた復号性能をもたらし、性能が悪化した例はなかった。場合によっては、他の患者のデータでwav2vecを事前学習することで性能をさらに向上できる。最良の場合、wav2vecの表現は元のデータと比べて単語誤り率を50%以上減少させる。 Recent work on intracranial brain-machine interfaces has demonstrated that spoken speech can be decoded with high accuracy, essentially by treating the problem as an instance of supervised learning and training deep neural networks to map from neural activity to text. However, such networks pay for their expressiveness with very large numbers of labeled data, a requirement that is particularly burdensome for invasive neural recordings acquired from human patients. On the other hand, these patients typically produce speech outside of the experimental blocks used for training decoders. Making use of such data, and data from other patients, to improve decoding would ease the burden of data collection -- especially onerous for dys- and anarthric patients. Here we demonstrate that this is possible, by reengineering wav2vec -- a simple, self-supervised, fully convolutional model that learns latent representations of audio using a noise-contrastive loss -- for electrocorticographic (ECoG) data. 
We train this model on unlabelled ECoG recordings, and subsequently use it to transform ECoG from labeled speech sessions into wav2vec's representation space, before finally training a supervised encoder-decoder to map these representations to text. We experiment with various numbers of labeled blocks; for almost all choices, the new representations yield superior decoding performance to the original ECoG data, and in no cases do they yield worse. Performance can also be improved in some cases by pretraining wav2vec on another patient's data. In the best cases, wav2vec's representations decrease word error rates over the original data by upwards of 50%.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
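The noise-contrastive objective mentioned in this abstract can be sketched as follows. This is a minimal, hypothetical illustration of the binary contrastive loss used by the original wav2vec (a true future latent scored against sampled distractors), not the authors' ECoG code; vectors are plain Python lists, and the function names and the `lam` weighting convention are assumptions for illustration.

```python
# Hypothetical sketch of a wav2vec-style binary noise-contrastive loss (not the authors' code).
import math
from typing import List, Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def contrastive_step_loss(
    prediction: Sequence[float],          # step-specific projection of the context vector
    true_future: Sequence[float],         # latent of the actual future frame
    distractors: List[Sequence[float]],   # latents sampled from elsewhere in the recording
    lam: float = 1.0,
) -> float:
    """Binary noise-contrastive loss for a single time step (hypothetical sketch).

    Rewards a high score for the true future latent and low scores for the distractors.
    The distractor term is averaged and scaled by `lam`; lam = len(distractors) turns it into a sum.
    """
    positive = -log_sigmoid(dot(true_future, prediction))
    if not distractors:
        return positive
    negative = -sum(log_sigmoid(-dot(d, prediction)) for d in distractors) / len(distractors)
    return positive + lam * negative
```

A prediction that points toward the true future latent yields a lower loss than one that points toward a distractor; minimizing this over unlabeled recordings is what lets such a model learn representations without transcripts.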
# 閉時間曲線上の生命 Life on a closed timelike curve ( http://arxiv.org/abs/2405.18640v1 ) ライセンス: Link先を確認	Lorenzo Gavassino,	(参考訳) 我々は、ゲーデル型宇宙において閉じた時間的曲線上を航行する仮想的な宇宙船の内部ダイナミクスを研究する。固有時における時間発展の生成子が角運動量となるように曲線を選ぶ。ウィグナーの定理を用いて、宇宙船内部のエネルギー準位が自発的に離散化されなければならないことを証明する。準位間隔は、曲線を一周した後にすべての系が初期状態に戻るように微調整されていることがわかる。これは例えば、宇宙船内の観測者の記憶が、旅の終わりまでに必ず消去されることを意味する。より一般に、エントロピーが増加した場合、ポアンカレ回帰がループの終わりまでにそれを反転させ、エントロピーは初期値まで減少させられる。このようなエントロピーの減少が固有状態熱化仮説と整合することを示す。タイムトラベルのパラドックスが存在しないことは、我々の解析の厳密な系として導かれる。 We study the internal dynamics of a hypothetical spaceship traveling on a closed timelike curve in a Gödel-type universe. We choose the curve so that the generator of evolution in proper time is the angular momentum. Using Wigner's theorem, we prove that the energy levels internal to the spaceship must undergo spontaneous discretization. The level separation turns out to be finely tuned so that, after completing a roundtrip of the curve, all systems are back to their initial state. This implies, for example, that the memories of an observer inside the spaceship are necessarily erased by the end of the journey. More generally, if there is an increase in entropy, a Poincaré cycle will eventually reverse it by the end of the loop, forcing entropy to decrease back to its initial value. We show that such a decrease in entropy is in agreement with the eigenstate thermalization hypothesis. The non-existence of time-travel paradoxes follows as a rigorous corollary of our analysis.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
# 有害微調整に対する大規模言語モデルの遅延安全アライメント Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning ( http://arxiv.org/abs/2405.18641v1 ) ライセンス: Link先を確認	Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu,	(参考訳) 近年の研究では、安全アライメントを施した大規模言語モデル(LLM)が、有害データを混ぜたデータセットで微調整することによって脱獄されうることが示されている。我々は、微調整段階で状態を分離し、アライメントデータセットとユーザデータセットをそれぞれ最適化することで、この脱獄効果を緩和できることを文献上初めて示す。残念なことに、その後の研究で、この単純な双状態最適化(BSO)は、アライメント状態に割り当てるステップ数が少なすぎると収束が不安定になり、アライメント性能が低下することがわかった。統計的解析により、コンセンサスに向かう過剰なドリフト(excess drift)が不安定性の原因である可能性が示唆される。この問題を解決するために、各状態のドリフトを制約する近接項を導入したLazy(i) safety alignment(Lisa)を提案する。理論的には、近接項の利点は収束解析によって裏付けられ、Lisaの収束を保証するには十分に大きな近接係数が必要であることを示す。実験的には、4つの下流微調整タスクにおいて、近接項を持つLisaはユーザタスクにおけるLLMの精度を維持しつつ、アライメント性能を大幅に向上できることを示す。コードは https://github.com/git-disl/Lisa で入手できる。 Recent studies show that Large Language Models (LLMs) with safety alignment can be jail-broken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we show that the jail-broken effect can be mitigated by separating states in the finetuning stage to optimize the alignment and user datasets. Unfortunately, our subsequent study shows that this simple Bi-State Optimization (BSO) solution experiences convergence instability when the number of steps invested in its alignment state is too small, leading to degraded alignment performance. By statistical analysis, we show that the excess drift towards consensus could be a probable reason for the instability. To remedy this issue, we propose Lazy(i) safety alignment (Lisa), which introduces a proximal term to constrain the drift of each state. Theoretically, the benefit of the proximal term is supported by the convergence analysis, wherein we show that a sufficiently large proximal factor is necessary to guarantee Lisa's convergence. 
Empirically, our results on four downstream finetuning tasks show that Lisa with a proximal term can significantly increase alignment performance while maintaining the LLM's accuracy on the user tasks. Code is available at https://github.com/git-disl/Lisa.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
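The proximal idea in the Lisa abstract can be illustrated on a toy problem. The sketch below is hypothetical: two quadratic losses stand in for the alignment and user datasets, and the step counts, learning rate, and `rho` are illustrative, not values from the paper. At each switch between states, the current parameters become an anchor, and the proximal term `(rho/2) * ||w - anchor||^2` adds `rho * (w - anchor)` to every gradient, limiting how far one state can drift before the other resumes.

```python
# Hypothetical toy sketch of bi-state optimization with a proximal term (not the Lisa code).
from typing import List


def grad_quadratic(w: List[float], target: List[float]) -> List[float]:
    # Gradient of the toy loss 0.5 * ||w - target||^2
    return [wi - ti for wi, ti in zip(w, target)]


def bi_state_with_proximal(
    w: List[float],
    align_target: List[float],
    user_target: List[float],
    rounds: int = 20,
    align_steps: int = 2,    # few steps in the alignment state (the unstable regime)
    user_steps: int = 10,
    lr: float = 0.1,
    rho: float = 1.0,        # proximal factor; rho = 0 recovers plain alternation
) -> List[float]:
    for _ in range(rounds):
        for target, steps in ((align_target, align_steps), (user_target, user_steps)):
            anchor = list(w)  # parameters at the moment the state switches
            for _ in range(steps):
                g = grad_quadratic(w, target)
                # Gradient step on: state loss + (rho / 2) * ||w - anchor||^2
                w = [wi - lr * (gi + rho * (wi - ai)) for wi, gi, ai in zip(w, g, anchor)]
    return w
```

With `rho=1.0` the parameters at the end of training stay closer to the alignment optimum than with `rho=0.0`, which is the qualitative effect the proximal term is meant to have in this toy setting.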
# JADS: 自己教師型共同アスペクト発見と要約のためのフレームワーク JADS: A Framework for Self-supervised Joint Aspect Discovery and Summarization ( http://arxiv.org/abs/2405.18642v1 ) ライセンス: Link先を確認	Xiaobo Guo, Jay Desai, Srinivasan H. Sengamedu,	(参考訳) テキスト文書について複数の側面やトピックを含む要約を生成するために、ほとんどのアプローチでは、クラスタリングやトピックモデリングを使用して関連する文をグループ化し、各グループの要約を生成する。これらのアプローチは、要約アルゴリズムとクラスタリングアルゴリズムを共同で最適化するのに苦労する。一方、アスペクトベースの要約は既知のアスペクトを必要とする。我々の手法はトピックの発見と要約をひとつのステップに統合する。テキストデータが与えられると、我々のJADS(Joint Aspect Discovery and Summarization algorithm)は入力からアスペクトを発見し、トピックの要約を1ステップで生成する。本稿では,まず複数の文書(例えば,CNN/DailyMail記事)の文を混ぜ合わせて入力とし,混ぜ合わせた元の記事の要約をラベルとして用いることで,ラベル付きデータセットを作成する自己教師型フレームワークを提案する。JADSモデルは2段階のベースラインを上回る。事前学習により,モデルの性能と安定性はさらに向上する。さらに,JADSから得られる埋め込みは優れたクラスタリング能力を示す。提案手法は正解データとのより高い意味的整合性を達成し,事実に即した要約を生成する。 To generate summaries that include multiple aspects or topics for text documents, most approaches use clustering or topic modeling to group relevant sentences and then generate a summary for each group. These approaches struggle to optimize the summarization and clustering algorithms jointly. On the other hand, aspect-based summarization requires known aspects. Our solution integrates topic discovery and summarization into a single step. Given text data, our Joint Aspect Discovery and Summarization algorithm (JADS) discovers aspects from the input and generates a summary of the topics, in one step. We propose a self-supervised framework that creates a labeled dataset by first mixing sentences from multiple documents (e.g., CNN/DailyMail articles) as the input and then uses the article summaries from the mixture as the labels. The JADS model outperforms the two-step baselines. With pretraining, the model achieves better performance and stability. Furthermore, embeddings derived from JADS exhibit superior clustering capabilities. Our proposed method achieves higher semantic alignment with ground truth and is factual.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
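The self-supervised data construction described in the JADS abstract can be sketched in a few lines. This is an assumption-laden illustration: it takes documents that already come with reference summaries (as CNN/DailyMail articles do), pools and shuffles their sentences to form one input, and uses the list of original summaries as the target. The function name and the shuffling step are hypothetical; the abstract does not specify how sentences are interleaved.

```python
# Hypothetical sketch of building one mixed (input, labels) training pair.
import random
from typing import List, Tuple

Document = Tuple[List[str], str]  # (sentences, reference summary)


def make_mixed_example(docs: List[Document], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Build one (input, labels) pair by mixing sentences from several documents."""
    rng = random.Random(seed)
    mixed = [sentence for sentences, _ in docs for sentence in sentences]
    rng.shuffle(mixed)                         # assumption: hide which document each sentence came from
    labels = [summary for _, summary in docs]  # one summary per hidden aspect/topic
    return mixed, labels
```

A model trained to map `mixed` back to `labels` must implicitly group the sentences by source document (aspect discovery) while summarizing each group, which is what allows the two steps to be learned jointly.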
# データクラスタリングによる光検出磁気共鳴スペクトルの高速キャラクタリゼーション Fast characterization of optically detected magnetic resonance spectra via data clustering ( http://arxiv.org/abs/2405.18648v1 ) ライセンス: Link先を確認	Dylan G. Stone, Benjamin Whitefield, Mehran Kianinia, Carlo Bradac,	(参考訳) 光検出磁気共鳴(ODMR)は、室温で固体量子エミッタのスピン状態を測定するための、確立された強力な技術となっている。エミッタの基底状態、励起状態、準安定状態が関与するスピン依存の再結合過程に基づき、ODMRはナノスケールの電場・磁場、温度、ひずみ、圧力のスピンベース量子センシングや、個々の電子スピン・核スピンのイメージングを可能にしている。これらのセンシング応用の多くで中心となるのは、ODMRデータを確実に解析する能力である。スペクトル中の共鳴周波数は、スピンセンサーに作用する対象の物理量に直接対応するからである。しかし、従来のフィッティング法で共鳴を決定するのに適した信号対雑音比に達するには、ミリ秒から数十秒にも及ぶ比較的長い積分時間がしばしば必要となるため、これは煩雑な作業となりうる。本稿では,データクラスタリングに基づいてこの制限を克服するアルゴリズムを提案する。本アルゴリズムにより,統計的推論に基づく標準的な手法と比べて,より高い精度(約1.3倍),より高い分解能(約4.7倍),および/またはより少ない総データ点数(約5分の1)でODMRスペクトルの共鳴周波数を決定できる。したがって,提案するクラスタリングアルゴリズム(CA)は,多くのODMRベース量子センシング応用,特にノイズが多く乏しいデータセットを扱う場合に強力なツールである。 Optically detected magnetic resonance (ODMR) has become a well-established and powerful technique for measuring the spin state of solid-state quantum emitters, at room temperature. Relying on spin-dependent recombination processes involving the emitters' ground, excited, and metastable states, ODMR is enabling spin-based quantum sensing of nanoscale electric and magnetic fields, temperature, strain and pressure, as well as imaging of individual electron and nuclear spins. Central to many of these sensing applications is the ability to reliably analyze ODMR data, as the resonance frequencies in these spectra map directly onto target physical quantities acting on the spin sensor. However, this can be onerous, as relatively long integration times -- from milliseconds up to tens of seconds -- are often needed to reach a signal-to-noise level suitable to determine said resonances using traditional fitting methods. 
Here, we present an algorithm based on data clustering that overcomes this limitation and allows determining the resonance frequencies of ODMR spectra with better accuracy (~1.3x factor), higher resolution (~4.7x factor) and/or overall fewer data points (~5x factor) than standard approaches based on statistical inference. The proposed clustering algorithm (CA) is thus a powerful tool for many ODMR-based quantum sensing applications, especially when dealing with noisy and scarce data sets.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
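To make the clustering idea in the ODMR abstract concrete, here is a deliberately simple stand-in, not the authors' CA: points whose normalized fluorescence falls below a threshold are grouped into clusters by frequency proximity, and each cluster's resonance is estimated as a contrast-weighted centroid. The threshold, gap parameter, and the assumption that the spectrum is normalized to a baseline of about 1.0 are illustrative choices.

```python
# Hypothetical stand-in for clustering-based resonance finding (not the paper's algorithm).
from typing import List, Sequence, Tuple


def find_resonances(freqs: Sequence[float], signal: Sequence[float],
                    threshold: float, max_gap: float) -> List[float]:
    """Estimate dip centers in a normalized ODMR spectrum (baseline ~ 1.0, threshold < 1.0)."""
    # 1. Keep only candidate points inside a dip.
    dips = sorted((f, s) for f, s in zip(freqs, signal) if s < threshold)
    # 2. Group neighbouring candidate points into clusters along the frequency axis.
    clusters: List[List[Tuple[float, float]]] = []
    for f, s in dips:
        if clusters and f - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append((f, s))
        else:
            clusters.append([(f, s)])
    # 3. Contrast-weighted centroid per cluster: deeper points weigh more.
    centers = []
    for cluster in clusters:
        weights = [1.0 - s for _, s in cluster]
        total = sum(weights)
        centers.append(sum(f * w for (f, _), w in zip(cluster, weights)) / total)
    return centers
```

Compared with fitting a lineshape to every dip, this only needs a threshold and a neighbourhood rule, which is the kind of trade-off that makes clustering attractive for noisy, sparsely sampled spectra.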
# LLMをトレーニングして自己デバッグとコード説明を改善する Training LLMs to Better Self-Debug and Explain Code ( http://arxiv.org/abs/2405.18649v1 ) ライセンス: Link先を確認	Nan Jiang, Xiaopeng Li, Shiqi Wang, Qiang Zhou, Soneya Binta Hossain, Baishakhi Ray, Varun Kumar, Xiaofei Ma, Anoop Deoras,	(参考訳) コード生成の分野では、自己デバッグが重要である。自己デバッグにより、LLMは実行フィードバックに基づいて生成したコードを改良できる。複雑なタスクでは1回の試行で正しい解を生成することが難しいため、これは特に重要である。自己デバッグに関する先行研究は主に、LLMにfew-shot例を与えるプロンプト手法に焦点を当てていたが、これは小規模なオープンソースLLMではうまく機能しない。本研究では,LLMの自己デバッグ能力を大幅に向上させる訓練フレームワークを提案する。直感的には、誤ったコードに対する一連の説明に続いてコードを改良させることで、LLMは誤ったコードをよりよく分析し改良できることが観察される。そこで本稿では,多数の説明と改良の軌跡を生成し,実行検証によってフィルタリングすることで,コード説明と改良のための高品質なデータセットを自動収集するパイプラインを提案する。コード説明と改良の品質を考慮した新たな報酬設計のもと,成功軌跡と失敗軌跡の両方に対して教師あり微調整(SFT)とさらなる強化学習(RL)を行う。SFTは4つのベンチマークでpass@1を最大15.92%、pass@10を9.30%改善する。RL訓練により、pass@1はさらに最大3.54%、pass@10は2.55%改善する。訓練されたLLMは反復的な改良能力を示し、コードを継続的に改良し続けることができる。最後に、人間による評価により、我々のフレームワークで訓練されたLLMがより有用なコード説明を生成し、開発者がソースコードのバグをよりよく理解するのに役立つことが示された。 In the domain of code generation, self-debugging is crucial. It allows LLMs to refine their generated code based on execution feedback. This is particularly important because generating correct solutions in one attempt proves challenging for complex tasks. Prior works on self-debugging mostly focus on prompting methods by providing LLMs with few-shot examples, which work poorly on small open-sourced LLMs. In this work, we propose a training framework that significantly improves self-debugging capability of LLMs. Intuitively, we observe that a chain of explanations on the wrong code followed by code refinement helps LLMs better analyze the wrong code and do refinement. We thus propose an automated pipeline to collect a high-quality dataset for code explanation and refinement by generating a number of explanations and refinement trajectories and filtering via execution verification. 
We perform supervised fine-tuning (SFT) and further reinforcement learning (RL) on both success and failure trajectories with a novel reward design considering code explanation and refinement quality. SFT improves pass@1 by up to 15.92% and pass@10 by 9.30% over four benchmarks. RL training brings an additional improvement of up to 3.54% on pass@1 and 2.55% on pass@10. The trained LLMs show iterative refinement ability and can keep refining code continuously. Lastly, our human evaluation shows that the LLMs trained with our framework generate more useful code explanations and help developers better understand bugs in source code.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
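For readers unfamiliar with the pass@k metric quoted in this abstract, the widely used unbiased estimator from the Codex/HumanEval paper (Chen et al., 2021) is sketched below: with n sampled programs per problem of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. Whether this exact estimator and sample count were used in this paper is an assumption; the abstract does not say.

```python
# Sketch of the standard unbiased pass@k estimator (assumed, not confirmed by this abstract).
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, a problem with 10 samples of which 3 are correct has pass@1 = 0.3, and any problem with at least 6 correct samples out of 10 has pass@5 = 1.0.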
# 論証に基づく対話における人間モデルの近似 Approximating Human Models During Argumentation-based Dialogues ( http://arxiv.org/abs/2405.18650v1 ) ライセンス: Link先を確認	Yinxu Tang, Stylianos Loukas Vasileiou, William Yeoh,	(参考訳) 説明可能なAIプランニング(XAIP)は、自らの意思決定と行動を人間のユーザに効果的に説明し、信頼を育み、人間とAIの協働を促進できるAIエージェントの開発を目的としている。XAIPにおける重要な課題は、AIエージェントと人間のメンタルモデルを整合させることを目指すモデル調停(model reconciliation)である。既存のアプローチはしばしば既知の決定論的な人間モデルを仮定するが、この単純化は現実世界の相互作用の複雑さや不確実性を捉えられない可能性がある。本稿では,AIエージェントが論証に基づく対話を通じて確率的人間モデルを学習し,更新することを可能にする新しいフレームワークを提案する。提案手法は信頼に基づく更新機構と確信度に基づく更新機構を取り入れ,エージェントの論証に対して人間が表明した信頼と,人間自身の論証に対する確信度に基づいて,エージェントが人間の心的状態の理解を洗練できるようにする。信頼と知覚された確率との関係を捉えるためにプロスペクト理論に着想を得た確率重み付け関数を用い,ベイズ的手法によって可能な人間モデル上のエージェントの確率分布を更新する。人間を被験者とした実験により,論証シナリオにおける本手法の有効性を実証的に評価し,人間の信念の形成と適応のダイナミクスを捉える能力を示す。 Explainable AI Planning (XAIP) aims to develop AI agents that can effectively explain their decisions and actions to human users, fostering trust and facilitating human-AI collaboration. A key challenge in XAIP is model reconciliation, which seeks to align the mental models of AI agents and humans. While existing approaches often assume a known and deterministic human model, this simplification may not capture the complexities and uncertainties of real-world interactions. In this paper, we propose a novel framework that enables AI agents to learn and update a probabilistic human model through argumentation-based dialogues. Our approach incorporates trust-based and certainty-based update mechanisms, allowing the agent to refine its understanding of the human's mental state based on the human's expressed trust in the agent's arguments and certainty in their own arguments. We employ a probability weighting function inspired by prospect theory to capture the relationship between trust and perceived probability, and use a Bayesian approach to update the agent's probability distribution over possible human models. 
We conduct a human-subject study to empirically evaluate the effectiveness of our approach in an argumentation scenario, demonstrating its ability to capture the dynamics of human belief formation and adaptation.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
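The combination of a prospect-theory weighting function with a Bayesian update over candidate human models can be sketched as follows. This is a hypothetical illustration, not the paper's update rule: it uses the Tversky-Kahneman (1992) weighting function w(p) = p^γ / (p^γ + (1 - p)^γ)^(1/γ) with their commonly cited γ = 0.61, treats the human's expressed trust in an argument as p, and assumes each candidate model simply predicts whether the human accepts that argument. The model names and the likelihood rule are illustrative assumptions.

```python
# Hypothetical sketch: prospect-theory weighting + Bayesian update over human models.
from typing import Dict


def tk_weight(p: float, gamma: float = 0.61) -> float:
    """Tversky-Kahneman probability weighting: overweights small p, underweights large p."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def update_human_models(prior: Dict[str, float], accepts: Dict[str, bool],
                        trust: float, gamma: float = 0.61) -> Dict[str, float]:
    """Posterior over candidate human models after observing an expressed trust level."""
    w = tk_weight(trust, gamma)
    # Assumed likelihood: models predicting acceptance explain high trust, the others explain low trust.
    unnormalized = {m: prior[m] * (w if accepts[m] else 1.0 - w) for m in prior}
    z = sum(unnormalized.values())
    if z == 0.0:
        raise ValueError("observation has zero likelihood under every candidate model")
    return {m: v / z for m, v in unnormalized.items()}
```

Because the weighting function compresses high probabilities, a trust level of 0.9 shifts belief toward the accepting model less sharply than a raw probability of 0.9 would.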
# ボットとオンライン政治コミュニケーションへの動的システムアプローチ A Dynamical Systems Approach to Bots and Online Political Communication ( http://arxiv.org/abs/2405.18652v1 ) ライセンス: Link先を確認	Beril Bulat, Martin Hilbert,	(参考訳) ボットはデジタル世界でますます普及し、民主的なプロセスを形作る上で積極的な役割を担ってきた。これまでの研究では、個々のレベルでの影響に焦点が当てられていたが、通信力学に対するマクロレベルの潜在的な影響は、まだほとんど理解されていない。本研究は、Twitter上でのオンライン政治討論の力学を形作る政治ボットの役割を検討するために、動的システム理論からの情報理論的アプローチを採用する。我々は、この動的プロセスのコンポーネントを、その複雑さ、予測可能性、および残りの不確実性の観点から定量化する。本研究は, ボット活動が, オンライン政治コミュニケーションの構造力学における複雑性と不確実性に関連していることを示唆している。この研究は、時間とともに展開する計算プロセスとして人間のボット力学をモデル化する際に、力学系理論からの情報理論測度を使用するためのショーケースとして機能する。 Bots have become increasingly prevalent in the digital sphere and have taken up a proactive role in shaping democratic processes. While previous studies have focused on their influence at the individual level, their potential macro-level impact on communication dynamics is still little understood. This study adopts an information theoretic approach from dynamical systems theory to examine the role of political bots shaping the dynamics of an online political discussion on Twitter. We quantify the components of this dynamic process in terms of its complexity, predictability, and the remaining uncertainty. Our findings suggest that bot activity is associated with increased complexity and uncertainty in the structural dynamics of online political communication. This work serves as a showcase for the use of information-theoretic measures from dynamical systems theory in modeling human-bot dynamics as a computational process that unfolds over time.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
# 基礎言語モデルに基づく継続的学習の最近の進歩 Recent Advances of Foundation Language Models-based Continual Learning: A Survey ( http://arxiv.org/abs/2405.18653v1 ) ライセンス: Link先を確認	Yutao Yang, Jie Zhou, Xuanwen Ding, Tianyu Huai, Shunyu Liu, Qin Chen, Liang He, Yuan Xie,	(参考訳) 近年,基盤言語モデル (LM) は自然言語処理 (NLP) とコンピュータビジョン (CV) の分野において重要な成果を上げている。従来のニューラルネットワークモデルとは異なり、基盤LMは、膨大な数のパラメータを用いて大規模な教師なしデータセットで事前学習を行い、豊富な常識知識を獲得することで、優れた転移学習能力を得る。しかし、破滅的忘却のため、人間のような継続的学習を再現することはいまだできない。そこで、従来の知識を忘れることなく新たなタスクに適応できるようLMを改良する、様々な継続学習(CL)ベースの手法が開発されてきた。しかし、既存のアプローチの体系的な分類とそれらの性能の比較はいまだ欠けており、本サーベイはこのギャップを埋めることを目指す。事前学習済み言語モデル(PLM)、大規模言語モデル(LLM)、視覚言語モデル(VLM)など、基盤言語モデルに適用されたCLベースの手法に関する既存文献を包括的にレビュー、要約、分類する。我々はこれらの研究を、従来の手法、パラメータ効率に基づく手法、命令チューニングに基づく手法、継続的事前学習手法から構成されるオフラインCLとオンラインCLに分ける。オフラインCLはドメイン増分学習、タスク増分学習、クラス増分学習を含み、オンラインCLはハードなタスク境界と曖昧なタスク境界の設定にさらに細分される。さらに,CL研究で用いられる典型的なデータセットと評価指標を概説し,LMベースの継続学習における課題と今後の研究について詳細に分析する。 Recently, foundation language models (LMs) have marked significant achievements in the domains of natural language processing (NLP) and computer vision (CV). Unlike traditional neural network models, foundation LMs obtain a great ability for transfer learning by acquiring rich commonsense knowledge through pre-training on extensive unsupervised datasets with a vast number of parameters. However, they still cannot emulate human-like continuous learning due to catastrophic forgetting. Consequently, various continual learning (CL)-based methodologies have been developed to refine LMs, enabling them to adapt to new tasks without forgetting previous knowledge. However, a systematic taxonomy of existing approaches and a comparison of their performance are still lacking, which is the gap that our survey aims to fill. We delve into a comprehensive review, summarization, and classification of the existing literature on CL-based approaches applied to foundation language models, such as pre-trained language models (PLMs), large language models (LLMs) and vision-language models (VLMs). 
We divide these studies into offline CL and online CL, which consist of traditional methods, parameter-efficient-based methods, instruction tuning-based methods and continual pre-training methods. Offline CL encompasses domain-incremental learning, task-incremental learning, and class-incremental learning, while online CL is subdivided into hard task boundary and blurry task boundary settings. Additionally, we outline the typical datasets and metrics employed in CL research and provide a detailed analysis of the challenges and future work for LMs-based continual learning.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
# データ拡張コントラストチューニングによる物体幻覚の緩和 Mitigating Object Hallucination via Data Augmented Contrastive Tuning ( http://arxiv.org/abs/2405.18654v1 ) ライセンス: Link先を確認	Pritam Sarkar, Sayna Ebrahimi, Ali Etemad, Ahmad Beirami, Sercan Ö. Arık, Tomas Pfister,	(参考訳) その顕著な進歩にもかかわらず、MLLM(Multimodal Large Language Models)は事実的不正確な情報を幻覚する傾向がある。本研究では,MLLMのオブジェクト幻覚に対処し,モデル入力に存在しないオブジェクトに関する情報を提供する。本稿では,幻覚を緩和するための事前訓練された既成のMLLMに適用可能な,一般的な視覚言語機能を維持しつつ,コントラスト的なチューニング手法を提案する。与えられた実数トークンに対して,地筋情報を選択的に変更することにより,生成データ拡張による幻覚トークンを作成する。提案したコントラッシブチューニングはトークンレベルで適用され、幻覚化トークンと比較して事実トークンの相対的可能性を向上させる。本研究は,幻覚の緩和におけるコントラストチューニングの有効性を徹底的に評価する。さらに、提案するコントラストチューニングは単純で高速で、推論時に追加のオーバーヘッドを伴わずに最小限のトレーニングを必要とする。 Despite their remarkable progress, Multimodal Large Language Models (MLLMs) tend to hallucinate factually inaccurate information. In this work, we address object hallucinations in MLLMs, where information is offered about an object that is not present in the model input. We introduce a contrastive tuning method that can be applied to a pretrained off-the-shelf MLLM for mitigating hallucinations while preserving its general vision-language capabilities. For a given factual token, we create a hallucinated token through generative data augmentation by selectively altering the ground-truth information. The proposed contrastive tuning is applied at the token level to improve the relative likelihood of the factual token compared to the hallucinated one. Our thorough evaluation confirms the effectiveness of contrastive tuning in mitigating hallucination. Moreover, the proposed contrastive tuning is simple, fast, and requires minimal training with no additional overhead at inference.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
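A token-level contrastive objective of the kind described in this abstract can be sketched as a pairwise loss on one position's logits: it is small when the factual token is scored well above its hallucinated counterpart. This is a generic illustration under stated assumptions (a single factual/hallucinated token pair per position, plain logits as a Python list), not the paper's exact training loss.

```python
# Hypothetical sketch of a pairwise token-level contrastive loss (not the paper's exact objective).
import math
from typing import Sequence


def token_contrastive_loss(logits: Sequence[float], factual_id: int, hallucinated_id: int) -> float:
    """-log(p_fact / (p_fact + p_hall)), i.e. softplus(logit_hall - logit_fact), computed stably."""
    margin = logits[hallucinated_id] - logits[factual_id]
    if margin > 0:
        return margin + math.log1p(math.exp(-margin))
    return math.log1p(math.exp(margin))
```

Only the two tokens' logits enter this loss, so it raises the relative likelihood of the factual token without directly constraining the rest of the vocabulary; how the full method preserves general vision-language ability is beyond this sketch.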
# CAVACHON:マルチモーダル単一セルデータを統合する階層的変分オートエンコーダ CAVACHON: a hierarchical variational autoencoder to integrate multi-modal single-cell data ( http://arxiv.org/abs/2405.18655v1 ) ライセンス: Link先を確認	Ping-Han Hsieh, Ru-Xiu Hsiao, Katalin Ferenc, Anthony Mathelier, Rebekka Burkholz, Chien-Yu Chen, Geir Kjetil Sandve, Tatiana Belova, Marieke Lydia Kuijjer,	(参考訳) ペアリング単一セルシークエンシング技術により、分子データの相補的モダリティを単一セル分解能で同時測定できる。これらの技術の進歩とともに、これらのデータを統合するために変分オートエンコーダに基づく多くの手法が開発されている。しかし、これらの手法は、モデリングと解釈を大幅に強化する可能性があるデータモダリティ間の先行する生物学的関係を明示的に含んでいるわけではない。一般化階層型変分オートエンコーダを用いて,多モードデータ間の条件付き独立関係を有向非巡回グラフとして明示的に組み込んだ新しい確率論的学習フレームワークを提案する。単セルマルチオミクスデータ統合に関連する様々なアプリケーションにおけるフレームワークの汎用性を実証する。これらには、共通の情報と異なる情報を異なるモダリティから分離すること、モダリティ固有の差分解析、統合されたセルクラスタリングが含まれる。提案手法は, 生物学的仮説の複雑さを捉え, ペア化された単一セルマルチオミクスデータの異なるモジュラリティなど, 異なる生物学的データ型間の接続を解き明かす, 高度に柔軟なグラフィカルモデルの構築を容易にすることを期待する。提案されたフレームワークの実装は、リポジトリhttps://github.com/kuijjerlab/CAVACHONで見ることができる。 Paired single-cell sequencing technologies enable the simultaneous measurement of complementary modalities of molecular data at single-cell resolution. Along with the advances in these technologies, many methods based on variational autoencoders have been developed to integrate these data. However, these methods do not explicitly incorporate prior biological relationships between the data modalities, which could significantly enhance modeling and interpretation. We propose a novel probabilistic learning framework that explicitly incorporates conditional independence relationships between multi-modal data as a directed acyclic graph using a generalized hierarchical variational autoencoder. We demonstrate the versatility of our framework across various applications pertinent to single-cell multi-omics data integration. These include the isolation of common and distinct information from different modalities, modality-specific differential analysis, and integrated cell clustering. 
We anticipate that the proposed framework can facilitate the construction of highly flexible graphical models that can capture the complexities of biological hypotheses and unravel the connections between different biological data types, such as different modalities of paired single-cell multi-omics data. The implementation of the proposed framework can be found in the repository https://github.com/kuijjerlab/CAVACHON.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
# D-CoRP:機能的脳ネットワークのための微分可能な接続性洗練 D-CoRP: Differentiable Connectivity Refinement for Functional Brain Networks ( http://arxiv.org/abs/2405.18658v1 ) ライセンス: Link先を確認	Haoyu Hu, Hongrun Zhang, Chao Li,	(参考訳) 脳ネットワークは脳を理解するための重要なツールであり、科学的研究と臨床診断に洞察を提供する。脳ネットワークの既存のモデルは、主に脳領域に焦点を当てるか、脳の接続性の複雑さを見落としている。MRI由来の脳ネットワークデータは一般に接続ノイズの影響を受けやすく、脳ネットワークのモデリングに接続性を組み込む必要性を示している。このギャップに対処するために、脳の接続性を洗練するための微分可能なモジュールを導入する。我々は,脳ネットワークの複雑さに対処し,ノイズや冗長な接続を除去するために,情報ボトルネック理論に基づく多変量最適化を開発する。また,本手法は,ほとんどのグラフニューラルネットワークに適用可能な柔軟なプラグインとして機能する。大規模な実験の結果,提案手法は様々なベースラインモデルの性能を大幅に向上させ,他の最先端手法を上回ることが示され,脳ネットワーク接続性の洗練における提案手法の有効性と汎化性が示された。コードは公開される予定である。 Brain network is an important tool for understanding the brain, offering insights for scientific research and clinical diagnosis. Existing models for brain networks typically focus primarily on brain regions or overlook the complexity of brain connectivities. MRI-derived brain network data is commonly susceptible to connectivity noise, underscoring the necessity of incorporating connectivities into the modeling of brain networks. To address this gap, we introduce a differentiable module for refining brain connectivity. We develop the multivariate optimization based on information bottleneck theory to address the complexity of the brain network and filter noisy or redundant connections. Also, our method functions as a flexible plugin that is adaptable to most graph neural networks. Our extensive experimental results show that the proposed method can significantly improve the performance of various baseline models and outperform other state-of-the-art methods, indicating the effectiveness and generalizability of the proposed method in refining brain network connectivity. The code will be released for public availability.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
# 大規模言語モデルにおける内在的社会経済バイアスの理解 Understanding Intrinsic Socioeconomic Biases in Large Language Models ( http://arxiv.org/abs/2405.18662v1 ) ライセンス: Link先を確認	Mina Arzaghi, Florian Carichon, Golnoosh Farnadi,	(参考訳) 大規模言語モデル(LLM)は、ローン承認やビザアプリケーションといった重要な意思決定プロセスに統合されつつある。本稿では, LLMにおける人口特性と社会経済的バイアスの関係について検討する。様々な人口集団における社会経済的バイアスを体系的に定量化するために,100万の英文からなる新しいデータセットを導入する。以上の結果から, GPT-2 や Llama 2 や Falcon のような最先端モデルの両方において, 社会経済的バイアスが広範に存在することが明らかとなった。これらのバイアスは交叉性を考慮すると顕著に増幅され、LSMは名前から複数の人口特性を抽出し、特定の社会経済的バイアスと相関する顕著な能力を示す。この研究は、これらの強力なモデルをクリティカルな現実世界のアプリケーションにデプロイする際に、差別的な結果から保護するために、積極的に頑健なバイアス軽減技術が必要であることを強調している。 Large Language Models (LLMs) are increasingly integrated into critical decision-making processes, such as loan approvals and visa applications, where inherent biases can lead to discriminatory outcomes. In this paper, we examine the nuanced relationship between demographic attributes and socioeconomic biases in LLMs, a crucial yet understudied area of fairness in LLMs. We introduce a novel dataset of one million English sentences to systematically quantify socioeconomic biases across various demographic groups. Our findings reveal pervasive socioeconomic biases in both established models such as GPT-2 and state-of-the-art models like Llama 2 and Falcon. We demonstrate that these biases are significantly amplified when considering intersectionality, with LLMs exhibiting a remarkable capacity to extract multiple demographic attributes from names and then correlate them with specific socioeconomic biases. This research highlights the urgent necessity for proactive and robust bias mitigation techniques to safeguard against discriminatory outcomes when deploying these powerful models in critical real-world applications.	翻訳日:2024-05-30 21:23:36 公開日:2024-05-28
# Lifelong Learning and Selective Forgetting via Contrastive Strategy ( http://arxiv.org/abs/2405.18663v1 ) License: see link	Lianlei Shan, Wenzhang Zhou, Wei Li, Xingyu Ding	Lifelong learning aims to train a model that performs well on new tasks while retaining its capacity on previous tasks. However, some practical scenarios require the system to forget undesirable knowledge due to privacy issues, which is called selective forgetting. The joint task of the two is dubbed Learning with Selective Forgetting (LSF). In this paper, we propose a new framework for LSF based on a contrastive strategy. Specifically, for the preserved classes (tasks), we make the features extracted from different samples within the same class compact. For the deleted classes, we make the features from different samples of the same class dispersed and irregular, i.e., the network has no regular response to samples from a specific deleted class, as if it had never been trained on them. By maintaining or disturbing the feature distribution, the forgetting and memory of different classes can be made independent of each other. Experiments are conducted on four benchmark datasets, and our method achieves new state-of-the-art results.	Translated: 2024-05-30 21:23:36 Published: 2024-05-28
# Think Before You Act: Decision Transformers with Working Memory ( http://arxiv.org/abs/2305.16338v3 ) License: see link	Jikun Kang, Romain Laroche, Xingdi Yuan, Adam Trischler, Xue Liu, Jie Fu	Decision Transformer-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and computation. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model's performance on previous tasks. In contrast to LLMs' implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Inspired by this, we propose a working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in Atari games and Meta-World object manipulation tasks. Moreover, we demonstrate that memory fine-tuning further enhances the adaptability of the proposed architecture.	Translated: 2024-05-30 11:43:58 Published: 2024-05-28
# Fisher Flow Matching for Generative Modeling over Discrete Data ( http://arxiv.org/abs/2405.14664v3 ) License: see link	Oscar Davis, Samuel Kessler, Mircea Petrache, İsmail İlkan Ceylan, Michael Bronstein, Avishek Joey Bose	Generative modeling over discrete data has recently seen numerous success stories, with applications spanning language modeling, biological sequence design, and graph-structured molecular data. The predominant generative modeling paradigm for discrete data is still autoregressive. More recent alternatives based on diffusion or flow matching fall short of the impressive performance these approaches achieve in continuous data settings, such as image or video generation. In this work, we introduce Fisher-Flow, a novel flow-matching model for discrete data. Fisher-Flow takes a manifestly geometric perspective by considering categorical distributions over discrete data as points residing on a statistical manifold equipped with its natural Riemannian metric: the $\textit{Fisher-Rao metric}$. As a result, we demonstrate that discrete data itself can be continuously reparameterised to points on the positive orthant of the $d$-hypersphere $\mathbb{S}^d_+$. This allows us to define flows that map any source distribution to a target in a principled manner by transporting mass along (closed-form) geodesics of $\mathbb{S}^d_+$. Furthermore, the learned flows in Fisher-Flow can be further bootstrapped by leveraging Riemannian optimal transport, leading to improved training dynamics. We prove that the gradient flow induced by Fisher-Flow is optimal in reducing the forward KL divergence. We evaluate Fisher-Flow on an array of synthetic and diverse real-world benchmarks, including designing DNA promoter and DNA enhancer sequences. Empirically, we find that Fisher-Flow improves over prior diffusion and flow-matching models on these benchmarks.	Translated: 2024-05-30 11:43:58 Published: 2024-05-28
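The geometric idea above rests on a standard fact. The square-root map $p \mapsto \sqrt{p}$ sends the probability simplex onto the positive orthant of the unit sphere, and there Fisher-Rao geodesics become great-circle arcs (distances change only by a constant factor). Below is a minimal sketch, assuming NumPy, of that map and of closed-form geodesic interpolation (slerp). It does not reproduce the Fisher-Flow training objective or the authors' code. The helper names `to_sphere`, `to_simplex`, and `geodesic` are hypothetical.

```python
import numpy as np

def to_sphere(p):
    """Map a categorical distribution p (non-negative, sums to 1) to the
    positive orthant of the unit sphere via the square-root map."""
    return np.sqrt(np.asarray(p, dtype=float))

def to_simplex(s):
    """Inverse map: square the sphere coordinates to recover probabilities."""
    return np.asarray(s, dtype=float) ** 2

def geodesic(s0, s1, t):
    """Closed-form great-circle interpolation between unit vectors s0 and s1."""
    cos_theta = np.clip(np.dot(s0, s1), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:  # the points coincide; nothing to interpolate
        return np.array(s0, dtype=float)
    return (np.sin((1 - t) * theta) * s0 + np.sin(t * theta) * s1) / np.sin(theta)

# Uniform source distribution and a one-hot target over d + 1 = 4 categories.
p_src = np.full(4, 0.25)
p_tgt = np.array([0.0, 1.0, 0.0, 0.0])
s_mid = geodesic(to_sphere(p_src), to_sphere(p_tgt), 0.5)
p_mid = to_simplex(s_mid)
print(p_mid, p_mid.sum())  # the midpoint is still a valid distribution
```

Every point on the arc stays on the sphere, so squaring it always yields a valid distribution. This is why transport along these geodesics never leaves the space of categorical distributions.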
# When simplicity meets effectiveness: Detecting code comments coherence with word embeddings and LSTM ( http://arxiv.org/abs/2405.16272v2 ) License: see link	Michael Dubem Igbomezie, Phuong T. Nguyen, Davide Di Ruscio	Code comments play a crucial role in software development, as they provide programmers with practical information, allowing them to better understand the intent and semantics of the underlying code. Nevertheless, developers tend to leave comments unchanged after updating the code, resulting in a discrepancy between the two artifacts. Such a discrepancy may trigger misunderstanding and confusion among developers, impeding various activities, including code comprehension and maintenance. Thus, given a code snippet, it is crucial to identify whether its corresponding comment is coherent and accurately reflects the intent behind the code. Unfortunately, existing approaches to this problem, while obtaining encouraging performance, either rely on heavy pre-trained models or treat input data as text, neglecting the intrinsic features contained in comments and code, including word order and synonyms. This work presents Co3D as a practical approach to the detection of code comment coherence. We pay attention to the internal meaning of words and the sequential order of words in text while predicting coherence in code-comment pairs. We deployed a combination of Gensim word2vec encoding and a simple recurrent neural network, a combination of Gensim word2vec encoding and an LSTM model, and CodeBERT. The experimental results show that Co3D obtains promising prediction performance, outperforming well-established baselines. We conclude that, depending on the context, a simple architecture can yield satisfactory predictions.	Translated: 2024-05-30 11:33:46 Published: 2024-05-28
# Efficiently Parameterized Neural Metriplectic Systems ( http://arxiv.org/abs/2405.16305v2 ) License: see link	Anthony Gruber, Kookjin Lee, Haksoo Lim, Noseong Park, Nathaniel Trask	Metriplectic systems are learned from data in a way that scales quadratically in both the size of the state and the rank of the metriplectic data. Besides being provably energy conserving and entropy stable, the proposed approach comes with two theoretical guarantees. Approximation results demonstrate its ability to accurately learn metriplectic dynamics from data, and an error estimate indicates its potential to generalize to unseen timescales when approximation error is low. Examples illustrate performance both with full state information and when entropic variables are unknown, confirming that the proposed approach exhibits superior accuracy and scalability without compromising model expressivity.	Translated: 2024-05-30 11:33:46 Published: 2024-05-28
# Learning to Reason via Program Generation, Emulation, and Search ( http://arxiv.org/abs/2405.16337v2 ) License: see link	Nathaniel Weir, Muhammad Khalifa, Linlu Qiu, Orion Weller, Peter Clark	Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities; code-tuned LMs have proven adept at generating programs that solve a wide variety of algorithmic symbolic manipulation tasks (e.g., word concatenation). However, not all reasoning tasks are easily expressible as code, e.g., tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding. Our goal is to extend an LM's program synthesis skills to such tasks and evaluate the results via pseudo-programs, namely Python programs where some leaf function calls are left undefined. To that end, we propose Code Generation and Emulated EXecution (CoGEX). CoGEX works in three steps. It (1) trains LMs to generate their own pseudo-programs, (2) teaches them to emulate the execution of their generated programs, including those leaf functions, allowing the LM's knowledge to fill in the execution gaps, and (3) uses them to search over many programs to find an optimal one. To adapt the CoGEX model to a new task, we introduce a method for performing program search to find a single program whose pseudo-execution yields optimal performance when applied to all the instances of a given dataset. We show that our approach yields large improvements compared to standard in-context learning approaches on a battery of tasks, both algorithmic and soft reasoning. This result demonstrates that code synthesis can be applied to a much broader class of problems than previously considered. Our released dataset, fine-tuned models, and implementation can be found at \url{https://github.com/nweir127/CoGEX}.	Translated: 2024-05-30 11:33:46 Published: 2024-05-28
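The dataset-level program search can be sketched as an argmax over candidate programs by accuracy. This is a hypothetical illustration, not the CoGEX implementation. Plain Python functions stand in for LM-generated pseudo-programs, and direct execution stands in for the LM's emulated execution. The names `program_search` and `execute` are assumptions.

```python
from typing import Callable, Sequence, Tuple, TypeVar

P = TypeVar("P")

def program_search(programs: Sequence[P],
                   dataset: Sequence[Tuple[object, object]],
                   execute: Callable[[P, object], object]) -> P:
    """Return the single program whose (emulated) execution is correct on the
    largest fraction of dataset instances. `execute(program, x)` is a
    hypothetical stand-in for the LM emulating a pseudo-program on input x."""
    def accuracy(program: P) -> float:
        return sum(execute(program, x) == y for x, y in dataset) / len(dataset)
    return max(programs, key=accuracy)

# Hypothetical candidate programs for a last-letter concatenation task.
def first_letters(words):
    return "".join(w[0] for w in words)

def last_letters(words):
    return "".join(w[-1] for w in words)

dataset = [(["apple", "pie"], "ee"), (["cat", "dog"], "tg")]
best = program_search([first_letters, last_letters], dataset, lambda p, x: p(x))
print(best(["sun", "moon"]))  # prints nn
```

The key design point from the abstract survives in this sketch. One program is selected for the whole dataset rather than per instance, so the chosen program can then be reused on unseen inputs.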
# SpinQuant: LLM quantization with learned rotations ( http://arxiv.org/abs/2405.16406v2 ) License: see link	Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, Tijmen Blankevoort	Post-training quantization (PTQ) techniques applied to weights, activations, and the KV cache greatly reduce the memory usage, latency, and power consumption of Large Language Models (LLMs), but may lead to large quantization errors when outliers are present. Recent findings suggest that rotating activation or weight matrices helps remove outliers and benefits quantization. In this work, we identify a collection of applicable rotation parameterizations that lead to identical outputs in full-precision Transformer architectures. We find that some random rotations lead to much better quantization than others, with up to a 13-point difference in downstream zero-shot reasoning performance. As a result, we propose SpinQuant, which optimizes (or learns) the rotation matrices with Cayley optimization on a small validation set. With 4-bit quantization of weights, activations, and the KV cache, SpinQuant narrows the accuracy gap to full precision on zero-shot reasoning tasks to merely 2.9 points on the LLaMA-2 7B model, surpassing LLM-QAT by 19.1 points and SmoothQuant by 25.0 points. SpinQuant also outperforms the concurrent work QuaRot, which applies random rotations to remove outliers. In particular, for the LLaMA-2 7B and LLaMA-3 8B models, which are hard to quantize, SpinQuant reduces the gap to full precision by 30.2% and 34.1%, respectively, relative to QuaRot.	Translated: 2024-05-30 11:33:46 Published: 2024-05-28
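The rotation trick relies on orthogonality. For any orthogonal $R$, $(WR)(R^\top x) = Wx$, so full-precision outputs are unchanged while the rotated activation $R^\top x$ spreads an outlier across all coordinates. The sketch below assumes NumPy and a toy linear layer with one injected activation outlier. It uses a random rotation (closer to the QuaRot baseline) rather than SpinQuant's Cayley-optimized learned rotation. The helper names and the per-tensor symmetric 4-bit quantizer are illustrative assumptions.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fix column signs so the result is uniformly (Haar) distributed.
    return q * np.sign(np.diag(r))

def quantize_int4(x):
    """Symmetric per-tensor 4-bit fake quantization onto the grid -8..7."""
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -8, 7) * scale

rng = np.random.default_rng(0)
n = 64
W = rng.standard_normal((n, n))   # hypothetical linear-layer weight
x = rng.standard_normal(n)
x[3] = 50.0                       # inject an activation outlier

R = random_orthogonal(n, rng)
y_ref = W @ x
y_rot = (W @ R) @ (R.T @ x)       # R R^T = I: the full-precision output is unchanged

# Output error when only the activation is quantized, without and with rotation.
err_plain = np.linalg.norm(W @ quantize_int4(x) - y_ref)
err_rot = np.linalg.norm((W @ R) @ quantize_int4(R.T @ x) - y_ref)
print(np.allclose(y_ref, y_rot), err_plain, err_rot)
```

Comparing `err_plain` and `err_rot` across different seeds gives a rough feel for the abstract's observation that the choice of rotation matters. The printed numbers are illustrative only.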
# Rethinking Transformers in Solving POMDPs ( http://arxiv.org/abs/2405.17358v2 ) License: see link	Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon S. Du, Huazhe Xu	Sequential decision-making algorithms such as reinforcement learning (RL) inevitably face partially observable environments in real-world scenarios. This paper scrutinizes the effectiveness of a popular architecture, namely Transformers, in Partially Observable Markov Decision Processes (POMDPs) and reveals its theoretical limitations. We establish that regular languages, which Transformers struggle to model, are reducible to POMDPs. This poses a significant challenge for Transformers in learning POMDP-specific inductive biases, because they lack the inherent recurrence found in other models such as RNNs. This paper casts doubt on the prevalent belief in Transformers as sequence models for RL and proposes introducing a point-wise recurrent structure. The Deep Linear Recurrent Unit (LRU) emerges as a well-suited alternative for partially observable RL, with empirical results highlighting the sub-optimal performance of the Transformer and the considerable strength of the LRU.	Translated: 2024-05-30 11:23:10 Published: 2024-05-28
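A "point-wise recurrent structure" means each hidden-state coordinate evolves independently through a diagonal linear recurrence. Below is a minimal NumPy sketch in the spirit of the LRU, under the following assumptions.
- The eigenvalues are complex and parameterized as $\lambda = \exp(-e^{\nu} + i e^{\theta})$, which keeps $|\lambda| < 1$.
- As far as we know, the published LRU also includes an input normalization factor and a skip term, and both are omitted here.
- The function name and shapes are hypothetical.

```python
import numpy as np

def lru_scan(x, nu_log, theta_log, B, C):
    """Diagonal linear recurrence in the spirit of the LRU (sketch only):
        h_t = lam * h_{t-1} + B @ x_t,   y_t = Re(C @ h_t)
    with lam = exp(-exp(nu_log) + 1j * exp(theta_log)), so |lam| < 1 (stable)."""
    lam = np.exp(-np.exp(nu_log) + 1j * np.exp(theta_log))
    h = np.zeros(lam.shape, dtype=complex)
    ys = []
    for x_t in x:                  # x has shape (T, d_in)
        h = lam * h + B @ x_t      # element-wise (point-wise) update of each state
        ys.append((C @ h).real)
    return np.stack(ys)            # shape (T, d_out)

rng = np.random.default_rng(0)
T, d_in, d_state, d_out = 10, 3, 8, 2
x = rng.standard_normal((T, d_in))
B = rng.standard_normal((d_state, d_in)) + 1j * rng.standard_normal((d_state, d_in))
C = rng.standard_normal((d_out, d_state)) + 1j * rng.standard_normal((d_out, d_state))
nu_log = np.log(np.full(d_state, 0.1))              # |lam| = exp(-0.1), about 0.905
theta_log = np.log(np.linspace(0.1, 1.0, d_state))
y = lru_scan(x, nu_log, theta_log, B, C)
print(y.shape)  # (10, 2)
```

Because the recurrence is linear and diagonal, the state carries a fading memory of the whole observation history. This recurrence is the property the abstract argues Transformers lack for POMDPs.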
# DeTox: Toxic Subspace Projection for Model Editing ( http://arxiv.org/abs/2405.13967v3 ) License: see link	Rheeya Uppaal, Apratim Dey, Yiting He, Yiqiao Zhong, Junjie Hu	Recent alignment algorithms such as direct preference optimization (DPO) have been developed to improve the safety of large language models (LLMs) by training these models to match human behaviors exemplified by preference data. However, these methods are computationally intensive and lack controllability and transparency, making them prone to jailbreaking and inhibiting their widespread use. Furthermore, these tuning-based methods require large-scale preference data for training and are susceptible to noisy preference data. In this paper, we introduce a tuning-free alignment alternative (DeTox) and demonstrate its effectiveness in the use case of toxicity reduction. Grounded in theory from factor analysis, DeTox is a sample-efficient model editing approach that identifies a toxic subspace in the model parameter space and reduces model toxicity by projecting away the detected subspace. The toxic subspace is identified by extracting preference data embeddings from the language model and removing non-toxic information from these embeddings. We show that DeTox is more sample-efficient than DPO and more robust to noisy data. Finally, we establish both theoretical and empirical connections between DeTox and DPO, showing that DeTox can be interpreted as a denoised version of a single DPO step.	Translated: 2024-05-30 10:56:57 Published: 2024-05-28
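The core edit is an orthogonal projection, $W \leftarrow (I - UU^\top)W$, where the columns of $U$ span the estimated toxic subspace. Below is a minimal sketch assuming NumPy and synthetic paired embeddings. It estimates $U$ from the top singular vectors of toxic-minus-non-toxic embedding differences, which is our simplification rather than the paper's factor-analysis procedure. The helper names `toxic_subspace` and `project_away` are hypothetical.

```python
import numpy as np

def toxic_subspace(emb_toxic, emb_nontoxic, k):
    """Estimate a k-dimensional 'toxic' subspace from paired embeddings.
    Differencing each pair cancels content shared by both responses; the
    top-k right singular vectors of the differences span the subspace."""
    diffs = emb_toxic - emb_nontoxic                  # (n_pairs, d)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k].T                                   # (d, k), orthonormal columns

def project_away(W, U):
    """Remove span(U) from the output space of W: W <- (I - U U^T) W."""
    P = np.eye(U.shape[0]) - U @ U.T
    return P @ W

rng = np.random.default_rng(0)
d, n = 16, 200
v = np.zeros(d)
v[0] = 1.0                                            # synthetic ground-truth toxic direction
base = rng.standard_normal((n, d))
emb_nontoxic = base + 0.05 * rng.standard_normal((n, d))
emb_toxic = base + rng.uniform(1.0, 2.0, size=(n, 1)) * v + 0.05 * rng.standard_normal((n, d))

U = toxic_subspace(emb_toxic, emb_nontoxic, k=1)
W = rng.standard_normal((d, 8))                       # hypothetical weight matrix
W_edit = project_away(W, U)
print(abs(U[0, 0]))  # close to 1: the estimated direction aligns with v
```

After the edit, no column of `W_edit` has any component along the estimated toxic direction. The projection only needs embeddings and a matrix product, which is why it is tuning-free.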
# A Simple Joint Model for Improved Contextual Neural Lemmatization ( http://arxiv.org/abs/1904.02306v5 ) License: see link	Chaitanya Malaviya, Shijie Wu, Ryan Cotterell	English verbs have multiple forms. For instance, talk may also appear as talks, talked, or talking, depending on the context. The NLP task of lemmatization seeks to map these diverse forms back to a canonical one, known as the lemma. We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages from the Universal Dependencies corpora. Our paper describes the model as well as its training and decoding procedures. Error analysis indicates that joint morphological tagging and lemmatization is especially helpful in low-resource lemmatization and for languages that display a larger degree of morphological complexity. Code and pre-trained models are available at https://sigmorphon.github.io/sharedtasks/2019/task2/.	Translated: 2024-05-30 05:10:10 Published: 2024-05-28
# Speakers Fill Lexical Semantic Gaps with Context ( http://arxiv.org/abs/2010.02172v4 ) License: see link	Tiago Pimentel, Rowan Hall Maudslay, Damián Blasi, Ryan Cotterell	Lexical ambiguity is widespread in language, allowing for the reuse of economical word forms and therefore making language more efficient. If ambiguous words cannot be disambiguated from context, however, this gain in efficiency might make language less clear, resulting in frequent miscommunication. For a language to be clear and efficiently encoded, we posit that the lexical ambiguity of a word type should correlate with how much information context provides about it, on average. To investigate whether this is the case, we operationalise the lexical ambiguity of a word as the entropy of the meanings it can take. We provide two ways to estimate this: one requires human annotation (using WordNet), and the other does not (using BERT), making it readily applicable to a large number of languages. We validate these measures by showing that, on six high-resource languages, there are significant Pearson correlations between our BERT-based estimate of ambiguity and the number of synonyms a word has in WordNet (e.g., $\rho = 0.40$ in English). We then test our main hypothesis, that a word's lexical ambiguity should negatively correlate with its contextual uncertainty, and find significant correlations on all 18 typologically diverse languages we analyse. This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.	Translated: 2024-05-30 05:10:10 Published: 2024-05-28
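The operationalisation above treats ambiguity as an entropy, $H(M \mid w) = -\sum_m p(m \mid w) \log_2 p(m \mid w)$, over the meanings $m$ of a word type $w$. The validation step is a Pearson correlation. Below is a minimal standard-library sketch of both quantities. The sense probabilities are hypothetical inputs, and the BERT-based estimator from the paper is not reproduced.

```python
import math

def sense_entropy(sense_probs):
    """Lexical ambiguity of a word type as the Shannon entropy (in bits)
    of its distribution over senses; zero-probability senses contribute 0."""
    return sum(-p * math.log2(p) for p in sense_probs if p > 0)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(sense_entropy([0.25] * 4))  # 2.0 bits: four equally likely senses
print(sense_entropy([1.0]))       # 0.0 bits: an unambiguous word
```

Uniform sense distributions maximize the entropy, so under this measure a word with many equally likely senses counts as the most ambiguous.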
# Instrumental Variable Estimation for Compositional Treatments ( http://arxiv.org/abs/2106.11234v3 ) License: see link	Elisabeth Ailer, Christian L. Müller, Niki Kilbertus	Many scientific datasets are compositional in nature. Important biological examples include species abundances in ecology, cell-type compositions derived from single-cell sequencing data, and amplicon abundance data in microbiome research. Here, we provide a causal view of compositional data in an instrumental variable setting where the composition acts as the cause. First, we crisply articulate potential pitfalls for practitioners regarding the interpretation of compositional causes from the viewpoint of interventions. We also warn against attributing causal meaning to common summary statistics such as diversity indices in microbiome data analysis. We then advocate for and develop multivariate methods using statistical data transformations and regression techniques. These methods take the special structure of the compositional sample space into account while still yielding scientifically interpretable results. In a comparative analysis on synthetic and real microbiome data, we show the advantages and limitations of our proposal. We posit that our analysis provides a useful framework and guidance for valid and informative cause-effect estimation in the context of compositional data.	Translated: 2024-05-30 05:05:50 Published: 2024-05-28
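One widely used transformation that respects the compositional sample space is the centered log-ratio (CLR): $\mathrm{clr}(x)_i = \log\big(x_i / g(x)\big)$, where $g$ is the geometric mean. The abstract does not say which transformations the authors use, so CLR here is an assumed example, not necessarily theirs. A minimal sketch follows, assuming strictly positive parts (real count data with zeros typically needs a pseudocount first).

```python
import math

def clr(composition):
    """Centered log-ratio transform of a strictly positive composition:
    each part x_i becomes log(x_i / g(x)), where g is the geometric mean."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

relative = [0.2, 0.3, 0.5]   # hypothetical relative abundances
counts = [2.0, 3.0, 5.0]     # the same composition at a different total
print(clr(relative))
```

CLR coordinates sum to zero and ignore the overall scale, so relative abundances and raw counts with the same proportions map to the same point. This scale invariance is the property that makes compositional data behave differently from ordinary multivariate data.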
# Fingerprinting Image-to-Image Generative Adversarial Networks ( http://arxiv.org/abs/2106.11760v4 ) License: see link	Guanlin Li, Guowen Xu, Han Qiu, Shangwei Guo, Run Wang, Jiwei Li, Tianwei Zhang, Rongxing Lu	Generative Adversarial Networks (GANs) have been widely used in various application scenarios. Since the production of a commercial GAN requires substantial computational and human resources, the copyright protection of GANs is urgently needed. This paper presents a novel fingerprinting scheme for the Intellectual Property (IP) protection of image-to-image GANs based on a trusted third party. We break through the stealthiness and robustness bottlenecks that arise when fingerprinting methods designed for classification models are naively transferred to GANs. Specifically, we construct a composite deep learning model from the target GAN and a classifier. We then generate fingerprint samples from this composite model and embed them in the classifier for effective ownership verification. This scheme yields several concrete methodologies for practically protecting modern image-to-image translation GANs. Theoretical analysis proves that these methods can satisfy the different security requirements necessary for IP protection. We also conduct extensive experiments showing that our solutions outperform existing strategies.	Translated: 2024-05-30 05:05:50 Published: 2024-05-28
# On the importance of cross-task features for class-incremental learning ( http://arxiv.org/abs/2106.11930v4 ) License: see link	Albin Soutif--Cormerais, Marc Masana, Joost van de Weijer, Bartłomiej Twardowski	In class-incremental learning, an agent with limited resources needs to learn a sequence of classification tasks, forming an ever-growing classification problem, with the constraint of not being able to access data from previous tasks. The main difference from task-incremental learning, where a task-ID is available at inference time, is that the learner also needs to perform cross-task discrimination, i.e., distinguish between classes that have not been seen together. Approaches to tackle this problem are numerous, and most make use of an external memory (buffer) of non-negligible size. In this paper, we ablate the learning of cross-task features and study its influence on the performance of basic replay strategies used for class-IL. We also define a new forgetting measure for class-incremental learning and find that forgetting is not the principal cause of low performance. Our experimental results show that future algorithms for class-incremental learning should not only prevent forgetting, but also aim to improve the quality of the cross-task features and the knowledge transfer between tasks. This is especially important when tasks contain a limited amount of data.	Translated: 2024-05-30 05:05:50 Published: 2024-05-28
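The difference between the two settings shows up directly at prediction time. With a task-ID, only that task's classes compete. Without one, every class seen so far competes, which is where cross-task discrimination is needed. Below is a minimal NumPy sketch with hypothetical logits and a hypothetical class-to-task assignment.

```python
import numpy as np

def predict_task_il(logits, task_of_class, task_id):
    """Task-incremental: the task-ID is given, so only that task's classes compete."""
    masked = np.where(task_of_class == task_id, logits, -np.inf)
    return int(np.argmax(masked))

def predict_class_il(logits):
    """Class-incremental: no task-ID, so all classes seen so far compete."""
    return int(np.argmax(logits))

task_of_class = np.array([0, 0, 1, 1])    # classes 0-1 from task 0, classes 2-3 from task 1
logits = np.array([2.0, 0.5, 3.0, 1.0])   # hypothetical output for a sample of class 0
print(predict_task_il(logits, task_of_class, task_id=0))  # 0: correct within task 0
print(predict_class_il(logits))                           # 2: a cross-task error
```

The same network output is correct in the task-incremental setting and wrong in the class-incremental one. This mirrors the paper's point that weak cross-task features, rather than forgetting alone, can drive low class-IL accuracy.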
# Complexity of Inexact Proximal Point Algorithm for minimizing convex functions with Holderian Growth ( http://arxiv.org/abs/2108.04482v6 ) License: see link	Andrei Pătraşcu, Paul Irofti	Several decades ago, the Proximal Point Algorithm (PPA) started to attract long-lasting interest from both the abstract operator theory and numerical optimization communities. Even in modern applications, researchers still use proximal minimization theory to design scalable algorithms that overcome nonsmoothness. Notable works such as [Fer:91, Ber:82constrained, Ber:89parallel, Tom:11] established tight relations between the convergence behaviour of PPA and the regularity of the objective function. In this manuscript, we derive the nonasymptotic iteration complexity of exact and inexact PPA for minimizing convex functions under $\gamma$-Hölderian growth: $\mathcal{O}(\log(1/\epsilon))$ (for $\gamma \in [1,2]$) and $\mathcal{O}(1/\epsilon^{\gamma - 2})$ (for $\gamma > 2$). In particular, we recover well-known results on PPA: finite convergence for sharp minima and linear convergence for quadratic growth, even in the presence of deterministic noise. Moreover, when a simple Proximal Subgradient Method is recurrently called as an inner routine for computing each IPPA iterate, novel computational complexity bounds are obtained for Restarting Inexact PPA. Our numerical tests show improvements over existing restarting versions of the Subgradient Method.	Translated: 2024-05-30 05:05:50 Published: 2024-05-28
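The two classical regimes the abstract recovers can be seen on one-dimensional examples of the exact PPA, $x_{k+1} = \mathrm{prox}_{\lambda f}(x_k)$.
- $f(x) = |x|$ has a sharp minimum ($\gamma = 1$). Its prox is soft-thresholding, and PPA reaches the minimizer in finitely many steps.
- $f(x) = x^2/2$ has quadratic growth ($\gamma = 2$). Its prox is $x/(1+\lambda)$, and PPA converges linearly.

The sketch below is illustrative only. It covers neither the inexact IPPA nor the restarting scheme analysed in the paper.

```python
import math

def prox_abs(x, lam):
    """Proximal operator of f(x) = |x| (soft-thresholding)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def prox_half_square(x, lam):
    """Proximal operator of f(x) = x**2 / 2."""
    return x / (1.0 + lam)

def ppa(prox, x0, lam, iters):
    """Exact proximal point algorithm: x_{k+1} = prox_{lam f}(x_k)."""
    xs = [x0]
    for _ in range(iters):
        xs.append(prox(xs[-1], lam))
    return xs

sharp = ppa(prox_abs, 1.0, 0.3, 10)             # hits 0 exactly after 4 steps
quad = ppa(prox_half_square, 1.0, 1.0, 10)      # x_k = 2**-k, a linear rate
print(sharp[:6])
print(quad[-1])
```

With step size $\lambda$, the sharp case moves a distance $\lambda$ per iteration until it lands on the minimizer. The quadratic case shrinks the distance by a constant factor $1/(1+\lambda)$ per iteration and never lands exactly.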
# Non-Hermitian quantum walks and non-Markovianity: the coin-position interaction ( http://arxiv.org/abs/2109.10682v3 ) License: see link	Himanshu Badhani, Subhashish Banerjee, C. M. Chandrashekar	A $\mathcal{PT}$-symmetric, non-Hermitian Hamiltonian in the $\mathcal{PT}$-unbroken regime can lead to unitary dynamics under an appropriate choice of the Hilbert space. The Hilbert space is determined by a Hamiltonian-compatible inner product map on the underlying vector space, facilitated by a "metric operator". A more traditional method, however, involves treating the evolution as open system dynamics, and the state is constructed through normalization at each time step. In this work, we present a comparative study of the two methods of constructing the reduced dynamics of a system evolving under a $\mathcal{PT}$-symmetric Hamiltonian. Our system is a one-dimensional quantum walk, with the spin and position degrees of freedom forming its two subsystems. We compare the information flow between the subsystems under the two methods. We find that under the metric formalism, a power-law decay of the information backflow to the subsystem gives a clear indication of the transition from the $\mathcal{PT}$-unbroken to the broken phase. This is unlike the information backflow under the normalized state method. We also note that even though non-Hermiticity models open system dynamics, pseudo-Hermiticity can increase entanglement between the subsystems in the metric Hilbert space. This indicates that pseudo-Hermitian cases can be seen as a resource in quantum mechanics.	Translated: 2024-05-30 05:05:50 Published: 2024-05-28
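The PT-unbroken/broken distinction and the role of the metric operator can be seen on a textbook two-level example. This is our assumption for illustration, not the quantum-walk Hamiltonian of the paper. Take $H = \begin{pmatrix} i\gamma & s \\ s & -i\gamma \end{pmatrix}$. Its eigenvalues $\pm\sqrt{s^2 - \gamma^2}$ are real when $s > \gamma$ (unbroken) and purely imaginary when $s < \gamma$ (broken). In the unbroken case, $\eta = (VV^\dagger)^{-1}$, with $V$ the eigenvector matrix, satisfies $H^\dagger \eta = \eta H$. This makes $H$ self-adjoint under $\langle u, v\rangle_\eta = u^\dagger \eta v$. Below is a minimal sketch assuming NumPy.

```python
import numpy as np

def pt_hamiltonian(s, gamma):
    """Textbook 2x2 PT-symmetric Hamiltonian: coupling s, balanced gain/loss gamma."""
    return np.array([[1j * gamma, s], [s, -1j * gamma]])

def metric_operator(H):
    """For diagonalizable H with real spectrum, eta = (V V^dagger)^{-1}
    (columns of V are eigenvectors) satisfies H^dagger eta = eta H."""
    _, V = np.linalg.eig(H)
    return np.linalg.inv(V @ V.conj().T)

H_unbroken = pt_hamiltonian(1.0, 0.5)   # s > gamma: real spectrum
H_broken = pt_hamiltonian(0.5, 1.0)     # s < gamma: complex-conjugate pair
eta = metric_operator(H_unbroken)
print(np.linalg.eigvals(H_unbroken), np.linalg.eigvals(H_broken))
```

Under the $\eta$ inner product, evolution by $e^{-iHt}$ is unitary in the unbroken regime, which is the sense of "metric Hilbert space" in the abstract. In the broken regime no such $\eta$ exists, because the spectrum is not real.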
# ALA: Naturalness-aware Adversarial Lightness Attack ( http://arxiv.org/abs/2201.06070v3 ) License: see link	Yihao Huang, Liangru Sun, Qing Guo, Felix Juefei-Xu, Jiayi Zhu, Jincao Feng, Yang Liu, Geguang Pu	Most researchers have tried to enhance the robustness of DNNs by revealing and repairing their vulnerabilities with specialized adversarial examples. Some of these attack examples carry imperceptible perturbations restricted by an $L_p$ norm. However, because of their high-frequency nature, such adversarial examples can be defended against by denoising methods and are hard to realize in the physical world. To avoid these defects, some works have proposed unrestricted attacks for better robustness and practicality. Unfortunately, these examples usually look unnatural and can alert human guards. In this paper, we propose the Adversarial Lightness Attack (ALA), a white-box unrestricted adversarial attack that focuses on modifying the lightness of images. The shape and color of the samples, which are crucial to human perception, are barely affected. To obtain adversarial examples with a high attack success rate, we propose an unconstrained enhancement of the light and shade relationship in images. To enhance the naturalness of the images, we craft a naturalness-aware regularization according to the range and distribution of light. The effectiveness of ALA is verified on two popular datasets for different tasks (i.e., ImageNet for image classification and Places-365 for scene recognition).	Translated: 2024-05-30 05:05:50 Published: 2024-05-28
# DecisionHoldem: Safe Depth-Limited Solving With Diverse Opponents for Imperfect-Information Games ( http://arxiv.org/abs/2201.11580v2 ) License: see link	Qibin Zhou, Dongdong Bai, Junge Zhang, Fuqing Duan, Kaiqi Huang	An imperfect-information game is a type of game with asymmetric information. Such games are more common in real life than perfect-information games. Artificial intelligence (AI) for imperfect-information games such as poker has made considerable progress in recent years. The great success of superhuman poker AIs such as Libratus and DeepStack has drawn researchers' attention to poker. However, the lack of open-source code has, to some extent, limited the development of Texas hold'em AI. This article introduces DecisionHoldem, a high-level AI for heads-up no-limit Texas hold'em that performs safe depth-limited subgame solving, considering the possible ranges of the opponent's private hands to reduce the exploitability of its strategy. Experimental results show that DecisionHoldem defeats Slumbot, the strongest openly available agent in heads-up no-limit Texas hold'em poker, by more than 730 mbb/h (thousandths of a big blind per hand), and Openstack, a high-level reproduction of DeepStack, by more than 700 mbb/h. Moreover, we release the source code and tools of DecisionHoldem to promote AI development in imperfect-information games.	Translated: 2024-05-30 05:05:50 Published: 2024-05-28
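The win rates above are quoted in mbb/h. The minimal sketch below sanity-checks the unit. The helper name and the match numbers are hypothetical. A rate of 730 mbb/h means an average of 0.73 big blinds won per hand.

```python
def mbb_per_hand(chips_won: float, big_blind: float, hands: int) -> float:
    """Win rate in milli-big-blinds per hand (mbb/h):
    1000 * (net chips won / big blind size) / number of hands played."""
    if hands <= 0 or big_blind <= 0:
        raise ValueError("hands and big_blind must be positive")
    return 1000.0 * (chips_won / big_blind) / hands

# Hypothetical match: 100-chip big blind, 10,000 hands, 730,000 chips won net.
print(mbb_per_hand(730_000, 100, 10_000))  # 730.0
```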
# Towards Lightweight Super-Resolution with Dual Regression Learning ( http://arxiv.org/abs/2207.07929v5 ) License: see link	Yong Guo, Mingkui Tan, Zeshuai Deng, Jingdong Wang, Qi Chen, Jiezhang Cao, Yanwu Xu, Jian Chen	Deep neural networks have exhibited remarkable performance in image super-resolution (SR) by learning a mapping from low-resolution (LR) images to high-resolution (HR) images. However, SR is typically an ill-posed problem, and existing methods have several limitations. First, the space of possible SR mappings can be extremely large, since many different HR images can be super-resolved from the same LR image. As a result, it is hard to directly learn a promising SR mapping from such a large space. Second, obtaining promising SR performance often requires very large models with extremely high computational cost. In practice, model compression techniques can produce compact models by reducing model redundancy. Nevertheless, it is hard for existing model compression methods to accurately identify the redundant components because of the extremely large SR mapping space. To alleviate the first challenge, we propose a dual regression learning scheme to reduce the space of possible SR mappings. Specifically, in addition to the mapping from LR to HR images, we learn an additional dual regression mapping that estimates the downsampling kernel and reconstructs LR images. In this way, the dual mapping acts as a constraint that reduces the space of possible mappings. To address the second challenge, we propose a dual regression compression (DRC) method that reduces model redundancy at both the layer level and the channel level via channel pruning. Specifically, we first develop a channel number search method that minimizes the dual regression loss to determine the redundancy of each layer. Given the searched channel numbers, we further use the dual regression loss to evaluate the importance of channels and prune the redundant ones. Extensive experiments show the effectiveness of our method in obtaining accurate and efficient SR models.	Translated: 2024-05-30 05:05:50 Published: 2024-05-28
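The dual regression idea combines a primal loss on the HR output with a dual loss that maps the SR output back to LR and compares it with the input. The sketch below shows this under stated assumptions:

- Toy stand-ins replace the paper's learned networks: nearest-neighbour upsampling plays the primal mapping P, and average pooling plays the dual mapping D.
- The L1 losses and the weight `lam` are our choices.

Note that the dual term depends only on the LR input and the SR output. That is what lets it constrain P without any extra HR data.

```python
import numpy as np

def upsample_nearest(lr: np.ndarray, s: int) -> np.ndarray:
    """Toy primal mapping P: LR -> HR via nearest-neighbour upsampling by factor s."""
    return np.repeat(np.repeat(lr, s, axis=0), s, axis=1)

def downsample_avg(hr: np.ndarray, s: int) -> np.ndarray:
    """Toy dual mapping D: HR -> LR via s x s average pooling (dims must divide by s)."""
    h, w = hr.shape
    return hr.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def dual_regression_loss(lr: np.ndarray, hr: np.ndarray, s: int = 2, lam: float = 0.1):
    """Primal L1 loss |P(x) - y| plus lam * dual L1 loss |D(P(x)) - x|."""
    sr = upsample_nearest(lr, s)
    primal = np.abs(sr - hr).mean()
    dual = np.abs(downsample_avg(sr, s) - lr).mean()
    return primal + lam * dual, primal, dual
```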
# Simultaneous source separation of unknown numbers of single-channel underwater acoustic signals based on deep neural networks with separator-decoder structure ( http://arxiv.org/abs/2207.11749v4 ) License: see link	Qinggang Sun, Kejun Wang	Separating single-channel underwater acoustic signals is a challenging problem of practical significance. Few existing studies address source separation with an unknown number of signals, and how to evaluate the performance of such systems is not yet clear. In this paper, we propose a deep learning-based simultaneous separation solution, with a fixed number of output channels equal to the maximum number of possible targets, to address these two problems. This solution avoids the curse of dimensionality caused by the permutation problem that arises when aligning outputs to targets. Specifically, we propose a two-step learning-based separation model with a separator-decoder structure. We also propose a performance evaluation method, with two quantitative metrics, for situations in which some output channels are mute, i.e., contain no target signal. Experiments on simulated mixtures of radiated ship noise show that the proposed solution achieves separation performance similar to that attained with a known number of signals. The proposed separator-decoder model achieves performance competitive with two models developed for known numbers of signals. It is also highly explainable and extensible, and it achieves state-of-the-art results under this framework.	Translated: 2024-05-30 05:05:50 Published: 2024-05-28
# Learning Informative Health Indicators Through Unsupervised Contrastive Learning ( http://arxiv.org/abs/2208.13288v3 ) License: see link	Katharina Rombach, Gabriel Michau, Wilfried Bürzle, Stefan Koller, Olga Fink	Monitoring the health of complex industrial assets is crucial for safe and efficient operations. Health indicators that provide quantitative, real-time insight into the health status of industrial assets over time are valuable tools for, e.g., fault detection or prognostics. This study proposes a novel, versatile, and unsupervised approach to learning health indicators with contrastive learning, in which the operational time serves as a proxy for degradation. To highlight its versatility, the approach is evaluated on two tasks and case studies with different characteristics: wear assessment of milling machines and fault detection of railway wheels. Our results show that the proposed methodology effectively learns a health indicator that follows the wear of milling machines (0.97 correlation on average) and is suitable for fault detection in railway wheels (88.7% balanced accuracy). The experiments demonstrate the versatility of the approach across various systems and health conditions.	Translated: 2024-05-30 05:05:50 Published: 2024-05-28
# Correlated frequency noise in a multimode acoustic resonator ( http://arxiv.org/abs/2208.13410v5 ) License: see link	Nuttamas Tubsrinuan, Jared H. Cole, Per Delsing, Gustav Andersson	Frequency instabilities are a major source of errors in quantum devices. This study investigates frequency fluctuations in a surface acoustic wave (SAW) resonator, in which the reflection coefficients of 14 SAW modes are measured simultaneously for more than seven hours. We report two distinct noise characteristics. Multimode frequency noise caused by interactions with two-level system (TLS) defects shows significant correlations that diminish with increased detuning. This finding agrees with the current understanding of parasitic TLS behavior as one of the dominant noise sources in quantum devices. In addition to the TLS-induced noise, we observe strong anomalous frequency fluctuations with slow, anti-correlated dynamics. These noise bursts resemble signatures of cosmic radiation observed in superconducting quantum systems.	Translated: 2024-05-30 04:56:06 Published: 2024-05-28
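Correlations between simultaneously tracked modes can be quantified with a Pearson correlation matrix over their frequency-shift time series. The sketch below assumes Pearson correlation as the metric and uses synthetic data. The "shared fluctuator" is a stand-in we invented for a common TLS coupling; none of this is the authors' analysis pipeline.

```python
import numpy as np

def mode_correlations(freq_shifts: np.ndarray) -> np.ndarray:
    """freq_shifts: array of shape (n_modes, n_samples) with the frequency
    deviation of each mode over time. Returns the Pearson correlation matrix."""
    return np.corrcoef(np.asarray(freq_shifts, dtype=float))

rng = np.random.default_rng(0)
n = 5000
shared = rng.normal(size=n)                      # hypothetical common fluctuator
modes = np.vstack([
    shared + 0.3 * rng.normal(size=n),           # mode coupled to the fluctuator
    shared + 0.3 * rng.normal(size=n),           # second coupled mode
    rng.normal(size=n),                          # independent mode
])
C = mode_correlations(modes)
print(np.round(C, 2))
```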
# Operational Interpretation of the Sandwiched Rényi Divergence of Order 1/2 to 1 as Strong Converse Exponents ( http://arxiv.org/abs/2209.00554v5 ) License: see link	Ke Li, Yongsheng Yao	We provide the sandwiched Rényi divergence of order $\alpha\in(\frac{1}{2},1)$, as well as its induced quantum information quantities, with an operational interpretation in terms of the exact strong converse exponents of quantum tasks. Specifically, we consider (a) smoothing of the max-relative entropy, (b) quantum privacy amplification, and (c) quantum information decoupling. We solve the problem of determining the exact strong converse exponents for these three tasks, with performance measured by the fidelity or the purified distance. The results are given in terms of the sandwiched Rényi divergence of order $\alpha\in(\frac{1}{2},1)$ and its induced quantum Rényi conditional entropy and quantum Rényi mutual information. This is the first time a precise operational meaning has been found for the sandwiched Rényi divergence with Rényi parameter in the interval $\alpha\in(\frac{1}{2},1)$.	Translated: 2024-05-30 04:56:05 Published: 2024-05-28
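For reference, the sandwiched Rényi divergence is $\tilde D_\alpha(\rho\|\sigma)=\frac{1}{\alpha-1}\log\operatorname{Tr}\big[(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}})^{\alpha}\big]$. The sketch below evaluates it numerically for small density matrices, assuming the natural logarithm. It covers only the range $\alpha\in[\frac{1}{2},1)$ discussed above; the function names are our own. For commuting (diagonal) states it reduces to the classical Rényi divergence $\frac{1}{\alpha-1}\log\sum_i p_i^{\alpha}q_i^{1-\alpha}$.

```python
import numpy as np

def _mpow(A: np.ndarray, p: float) -> np.ndarray:
    """Power of a Hermitian PSD matrix via eigendecomposition.
    Tiny negative eigenvalues from round-off are clipped to zero."""
    w, V = np.linalg.eigh((A + A.conj().T) / 2)
    w = np.clip(w, 0.0, None)
    return (V * w**p) @ V.conj().T

def sandwiched_renyi(rho: np.ndarray, sigma: np.ndarray, alpha: float) -> float:
    """Sandwiched Renyi divergence (natural log) for 1/2 <= alpha < 1."""
    if not (0.5 <= alpha < 1):
        raise ValueError("this sketch only covers alpha in [1/2, 1)")
    s = _mpow(sigma, (1 - alpha) / (2 * alpha))
    q = np.trace(_mpow(s @ rho @ s, alpha)).real
    return float(np.log(q) / (alpha - 1))
```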
# Heat-based circuits using quantum rectification ( http://arxiv.org/abs/2209.06215v2 ) License: see link	Kasper Poulsen, Nikolaj T. Zinner	With the increasing power consumption of modern computer components, heat-based circuitry has become ever more relevant owing to the lower power cost of processing logical bits of information. In heat-based circuits, computations are performed by driving heat currents through a circuit using a temperature difference. Using harmonic oscillators and three-level quantum rectifiers as base components, we study three different heat-based circuits: a series configuration of diodes, a parallel configuration of diodes, and a diode bridge rectifier. We demonstrate the required functionality of each circuit for use as heat-based analogues of standard electronic components. Furthermore, the diode bridge rectifier is found to give a consistent sign of the output bias independent of the input bias, thus rectifying the input. Our results prove the theoretical feasibility of combining heat current components into heat-based circuits. The three circuits should be realizable on several current quantum technology platforms.	Translated: 2024-05-30 04:56:05 Published: 2024-05-28
# Deep Learning for Time Series Anomaly Detection: A Survey ( http://arxiv.org/abs/2211.05244v3 ) License: see link	Zahra Zamanzadeh Darban, Geoffrey I. Webb, Shirui Pan, Charu C. Aggarwal, Mahsa Salehi	Time series anomaly detection has applications in a wide range of research fields and industries, including manufacturing and healthcare. The presence of anomalies can indicate novel or unexpected events, such as production faults, system defects, or heart fluttering, and is therefore of particular interest. The large size and complex patterns of time series have led researchers to develop specialised deep learning models for detecting anomalous patterns. This survey provides a structured and comprehensive overview of state-of-the-art deep learning models for time series anomaly detection. It provides a taxonomy based on the factors that divide anomaly detection models into different categories. Besides describing the basic anomaly detection technique of each category, it also discusses their advantages and limitations. Furthermore, this study includes examples of deep anomaly detection in time series across various application domains in recent years. Finally, it summarises open research issues and the challenges faced when adopting deep anomaly detection models.	Translated: 2024-05-30 04:56:05 Published: 2024-05-28
# Existence of minimizers for the Dirac-Fock model of crystals ( http://arxiv.org/abs/2212.01142v4 ) License: see link	Isabelle Catto, Long Meng, Eric Paturel, Eric Séré	Whereas many different models exist in the mathematical and physics literature for ground states of non-relativistic crystals, the relativistic case has been much less studied, and we are not aware of any mathematical result on a fully relativistic treatment of crystals. In this paper, we introduce a mean-field relativistic energy for crystals in terms of periodic density matrices. This model is inspired both by a recent definition of the Dirac-Fock ground state for atoms and molecules, due to one of us, and by the non-relativistic Hartree-Fock model for crystals. We prove the existence of a ground state when the number of electrons per cell is not too large.	Translated: 2024-05-30 04:56:05 Published: 2024-05-28
# Refined Bitcoin Security-Latency Under Network Delay ( http://arxiv.org/abs/2212.01372v3 ) License: see link	Mustafa Doger, Sennur Ulukus	We study security-latency bounds for Nakamoto consensus, i.e., how secure a block is after it becomes $k$-deep in the chain. We improve the state-of-the-art bounds by analyzing the race between adversarial and honest chains in three different phases. We find the probability distribution of the growth of the adversarial chains, under models similar to those in [Guo, Ren; AFT 2022], when a target block becomes $k$-deep in the chain. We analyze certain properties of this race to model each phase with random walks that provide tighter bounds than existing results. Combining all three phases yields novel upper and lower bounds for blockchains with small $\lambda\Delta$.	Translated: 2024-05-30 04:56:05 Published: 2024-05-28
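As a baseline for the security-latency question, the calculation in the original Bitcoin whitepaper gives the probability that an attacker with hash-power share $q$ ever catches up once the honest chain is $z$ blocks ahead. It models attacker progress as Poisson with mean $zq/p$ and uses the gambler's-ruin catch-up probability $(q/p)^{z-k}$. This sketch assumes zero network delay, so it is not the refined $\Delta$-aware bound of the paper; the function name is our own.

```python
import math

def nakamoto_catch_up_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q eventually
    overtakes an honest chain that is z blocks ahead (no network delay)."""
    p = 1.0 - q
    if q >= p:
        return 1.0                      # a majority attacker always catches up
    lam = z * q / p                     # expected attacker blocks while honest mines z
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total

for z in (0, 1, 5):
    print(z, round(nakamoto_catch_up_probability(0.1, z), 7))
```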
# Multi-Layer Personalized Federated Learning for Mitigating Biases in Student Predictive Analytics ( http://arxiv.org/abs/2212.02985v2 ) License: see link	Yun-Wei Chu, Seyyedali Hosseinalipour, Elizabeth Tenorio, Laura Cruz, Kerrie Douglas, Andrew Lan, Christopher Brinton	Conventional methods for student modeling, which predict grades based on measured activities, struggle to provide accurate results for minority/underrepresented student groups due to data availability biases. In this paper, we propose a Multi-Layer Personalized Federated Learning (MLPFL) methodology that optimizes inference accuracy over different layers of student grouping criteria, such as by course and by demographic subgroups within each course. In our approach, personalized models for individual student subgroups are derived from a global model, which is trained in a distributed fashion via meta-gradient updates that account for subgroup heterogeneity while preserving modeling commonalities that exist across the full dataset. The evaluation of the proposed methodology considers case studies of two popular downstream student modeling tasks, knowledge tracing and outcome prediction, which leverage multiple modalities of student behavior (e.g., visits to lecture videos and participation on forums) in model training. Experiments on three real-world online course datasets show that our approach achieves significant improvements over existing student modeling benchmarks, as evidenced by increased average prediction quality and decreased variance across different student subgroups. Visual analysis of the resulting students' knowledge state embeddings confirms that our personalization methodology extracts activity patterns clustered into different student subgroups, consistent with the performance enhancements we obtain over the baselines.	Translated: 2024-05-30 04:56:05 Published: 2024-05-28
# PaDPaF: Partial Disentanglement with Partially-Federated GANs ( http://arxiv.org/abs/2212.03836v2 ) License: see link	Abdulla Jasem Almansoori, Samuel Horváth, Martin Takáč	Federated learning has become a popular machine learning paradigm with many potential real-life applications, including recommendation systems, the Internet of Things (IoT), healthcare, and self-driving cars. Though most current applications focus on classification-based tasks, learning personalized generative models remains largely unexplored, and their benefits in heterogeneous settings still need to be better understood. This work proposes a novel architecture combining global client-agnostic and local client-specific generative models. We show that, using standard techniques for training federated models, our proposed model achieves privacy and personalization by implicitly disentangling the globally consistent representation (i.e., content) from the client-dependent variations (i.e., style). Using such a decomposition, personalized models can generate locally unseen labels while preserving the given style of the client, and can predict the labels for all clients with high accuracy by training a simple linear classifier on the global content features. Furthermore, disentanglement enables other essential applications, such as data anonymization, by sharing only the content. Extensive experimental evaluation corroborates our findings, and we also discuss a theoretical motivation for the proposed approach.	Translated: 2024-05-30 04:56:05 Published: 2024-05-28
# Robustness Implies Privacy in Statistical Estimation ( http://arxiv.org/abs/2212.05015v2 ) License: see link	Samuel B. Hopkins, Gautam Kamath, Mahbod Majid, Shyam Narayanan	We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness that can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and covariance estimation. We show that this reduction can be implemented in polynomial time in some important special cases. In particular, using nearly-optimal polynomial-time robust estimators for the mean and covariance of high-dimensional Gaussians based on the Sum-of-Squares method, we design the first polynomial-time private estimators for these problems with nearly-optimal sample-accuracy-privacy tradeoffs. Our algorithms are also robust to a nearly optimal fraction of adversarially-corrupted samples.	Translated: 2024-05-30 04:56:05 Published: 2024-05-28
# Multimodal Ranking Optimization for Heterogeneous Face Re-identification ( http://arxiv.org/abs/2212.05510v2 ) License: see link	Hui Hu, Jiawei Zhang, Zhen Han	Heterogeneous face re-identification, namely matching heterogeneous faces across disjoint visible light (VIS) and near-infrared (NIR) cameras, has become an important problem in video surveillance applications. However, the large domain discrepancy between heterogeneous NIR-VIS faces dramatically degrades face re-identification performance. To solve this problem, this paper proposes a multimodal fusion ranking optimization algorithm for heterogeneous face re-identification. First, we design a heterogeneous face translation network that obtains multimodal face pairs, including NIR-VIS/NIR-NIR/VIS-VIS face pairs, through mutual transformation between NIR and VIS faces. Second, we propose linear and non-linear fusion strategies that aggregate the initial ranking lists of the multimodal face pairs and produce an optimized re-ranked list based on modal complementarity. Experimental results show that the proposed multimodal fusion ranking optimization algorithm can effectively exploit this complementarity and outperforms several related methods on the SCface dataset.	Translated: 2024-05-30 04:56:05 Published: 2024-05-28
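A linear fusion of ranking lists can be sketched as a weighted sum of per-modality similarity scores followed by re-sorting. The details here are illustrative assumptions and may differ from the paper's linear and non-linear strategies:

- per-modality min-max normalisation;
- the weights;
- the three score rows (NIR-VIS, NIR-NIR, VIS-VIS) in the example.

```python
import numpy as np

def fuse_and_rerank(score_lists, weights):
    """score_lists: (n_modalities, n_gallery) similarity scores of one probe
    against each gallery identity; weights: one weight per modality.
    Returns (gallery indices sorted best-first, fused scores)."""
    S = np.asarray(score_lists, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Min-max normalise each modality so their scores are on a common scale.
    mins = S.min(axis=1, keepdims=True)
    maxs = S.max(axis=1, keepdims=True)
    S = (S - mins) / np.where(maxs > mins, maxs - mins, 1.0)
    fused = w @ S                       # weighted linear fusion
    return np.argsort(-fused, kind="stable"), fused

scores = [[0.9, 0.8, 0.1, 0.2],         # hypothetical NIR-VIS scores
          [0.2, 0.9, 0.3, 0.1],         # hypothetical NIR-NIR scores
          [0.3, 0.95, 0.2, 0.4]]        # hypothetical VIS-VIS scores
order, fused = fuse_and_rerank(scores, [1 / 3, 1 / 3, 1 / 3])
print(order, np.round(fused, 3))
```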
# Optimizing one-axis twists for variational Bayesian quantum metrology ( http://arxiv.org/abs/2212.12461v3 ) License: see link	Tyler G. Thurtell, Akimasa Miyake	Quantum metrology and sensing seek an advantage in estimating an unknown parameter of a quantum state or channel by using entanglement, such as the spin squeezing produced by one-axis twists, or other quantum resources. In particular, qubit phase estimation, or rotation sensing, is a ubiquitous problem with applications to electric field sensing, magnetometry, atomic clocks, and gyroscopes. By adopting the Bayesian formalism for the phase estimation problem to account for limited initial knowledge about the value of the phase, we formulate variational metrology and treat the state preparation (or encoding) and measurement (or decoding) procedures as parameterized quantum circuits. It is important to understand how effective various parametrized protocols are, as well as how robust they are to complex noise such as spatially correlated noise. First, we propose a new family of parametrized encoding and decoding protocols, called arbitrary-axis twist ansatzes, and show that it can substantially reduce the number of one-axis twists needed to achieve a target estimation error. Furthermore, we demonstrate that the estimation error of these strategies decreases with system size faster than that of classical (no-twist) protocols, even in the less-explored regimes where prior information is limited. Last, using a polynomial-size tensor network algorithm, we numerically analyze practical variational metrology beyond the symmetric subspace of a collective spin, and find that the quantum advantage persists for the arbitrary-axis twist ansatzes with a few one-axis twists and smaller total twisting angles at practically relevant noise levels.	Translated: 2024-05-30 04:56:05 Published: 2024-05-28
# Partial Mobilization: Tracking Multilingual Information Flows Amongst Russian Media Outlets and Telegram ( http://arxiv.org/abs/2301.10856v5 ) License: see link	Hans W. A. Hanley, Zakir Durumeric	In response to disinformation and propaganda from Russian online media following the invasion of Ukraine, Russian media outlets such as Russia Today and Sputnik News were banned throughout Europe. To maintain viewership, many of these Russian outlets began to heavily promote their content on messaging services like Telegram. In this work, we study how 16 Russian media outlets interacted with and utilized 732 Telegram channels throughout 2022. Leveraging the foundational model MPNet, DP-means clustering, and Hawkes processes, we trace how narratives spread between news sites and Telegram channels. We show that news outlets not only propagate existing narratives through Telegram but also source material from the messaging platform. For example, across the websites in our study, between 2.3% (ura.news) and 26.7% (ukraina.ru) of articles discussed content that originated from or resulted from activity on Telegram. Finally, by tracking the spread of individual topics, we measure the rate at which news outlets and Telegram channels disseminate content within the Russian media ecosystem, finding that websites like ura.news and Telegram channels such as @genshab are the most effective at disseminating their content.	Translated: 2024-05-30 04:56:05 Published: 2024-05-28
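Hawkes processes model how events on one source (e.g., a Telegram channel) transiently raise the event rate on another (e.g., a news site). The sketch below shows the conditional intensities of a multivariate Hawkes process under assumptions we chose for illustration: an exponential kernel with one shared decay rate `beta`, and hypothetical parameters. It is not the fitted model from the paper.

```python
import math

def hawkes_intensities(t, events, mu, alpha, beta):
    """Conditional intensities of a multivariate Hawkes process with an
    exponential kernel:
        lambda_j(t) = mu[j] + sum_i sum_{t_k in events[i], t_k < t}
                      alpha[i][j] * exp(-beta * (t - t_k))
    events[i]: event times of source i; alpha[i][j]: excitation from i to j."""
    lam = list(mu)
    for i, times in enumerate(events):
        for tk in times:
            if tk < t:
                decay = math.exp(-beta * (t - tk))
                for j in range(len(mu)):
                    lam[j] += alpha[i][j] * decay
    return lam

# Hypothetical: source 0 = Telegram channel, source 1 = news site.
mu = [0.2, 0.1]
alpha = [[0.0, 0.5],   # a Telegram post excites the news site
         [0.0, 0.0]]
print(hawkes_intensities(1.0, [[0.0], []], mu, alpha, beta=1.0))
```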
# Density-Softmax:分布シフト下における不確実性推定とロバスト性のための効率的なテスト時モデル Density-Softmax: Efficient Test-time Model for Uncertainty Estimation and Robustness under Distribution Shifts ( http://arxiv.org/abs/2302.06495v3 ) ライセンス: Link先を確認	Ha Manh Bui, Anqi Liu,	(参考訳) サンプリングに基づく手法、例えばDeep EnsemblesやBayesian Neural Netsは、不確実性推定の質とロバストな汎化を向上させる有望なアプローチとなっている。しかし、それらはモデルサイズが大きく、テスト時のレイテンシが高いという問題を抱えており、低リソースデバイスやリアルタイムアプリケーションに必要なスケーラビリティが制限される。これらの計算上の問題を解決するために、リプシッツ制約付き特徴抽出器上に構築された密度関数とソフトマックス層を組み合わせた、サンプリング不要の決定論的フレームワークであるDensity-Softmaxを提案する。理論的には、本モデルがミニマックス不確実性リスクの解であり、特徴空間上で距離を考慮する(distance-aware)ことを示し、それにより分布シフト下での標準ソフトマックスの過信を低減する。実験的には、本手法は不確実性とロバスト性の観点から最先端手法と競合する結果を得る一方で、モデルパラメータ数が少なく、テスト時のレイテンシも低い。 Sampling-based methods, e.g., Deep Ensembles and Bayesian Neural Nets have become promising approaches to improve the quality of uncertainty estimation and robust generalization. However, they suffer from a large model size and high latency at test-time, which limits the scalability needed for low-resource devices and real-time applications. To resolve these computational issues, we propose Density-Softmax, a sampling-free deterministic framework via combining a density function built on a Lipschitz-constrained feature extractor with the softmax layer. Theoretically, we show that our model is the solution of minimax uncertainty risk and is distance-aware on feature space, thus reducing the over-confidence of the standard softmax under distribution shifts. Empirically, our method enjoys competitive results with state-of-the-art techniques in terms of uncertainty and robustness, while having a lower number of model parameters and a lower latency at test-time.	翻訳日:2024-05-30 04:56:05 公開日:2024-05-28
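One reading of the method, stated here as an assumption rather than the paper's exact formulation, is that a density estimate on the feature space rescales the logits before the softmax. Features far from the training data then produce flatter, less confident outputs. The sketch below uses an isotropic Gaussian fitted to training features as a stand-in density; the paper's density model may differ. It divides the likelihood by its maximum over the training set so that the scale stays in [0, 1]. `GaussianDensity` and `density_softmax` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class GaussianDensity:
    """Isotropic Gaussian fitted to training features (stand-in for a learned density model)."""

    def __init__(self, feats: List[Vector]):
        n, d = len(feats), len(feats[0])
        self.d = d
        self.mu = [sum(f[j] for f in feats) / n for j in range(d)]
        var = sum((f[j] - self.mu[j]) ** 2 for f in feats for j in range(d)) / (n * d)
        self.var = max(var, 1e-6)
        # largest training log-likelihood, used to normalize the scale to at most 1
        self.max_log = max(self.log_pdf(f) for f in feats)

    def log_pdf(self, z: Vector) -> float:
        sq = sum((z[j] - self.mu[j]) ** 2 for j in range(self.d))
        return -0.5 * sq / self.var - 0.5 * self.d * math.log(2 * math.pi * self.var)

    def scale(self, z: Vector) -> float:
        return min(1.0, math.exp(self.log_pdf(z) - self.max_log))


def density_softmax(z: Vector, logits: Vector, density: GaussianDensity) -> Vector:
    """Scale the logits by the normalized feature density, then apply softmax."""
    s = density.scale(z)
    return softmax([s * v for v in logits])
```

Take training features at the corners of the unit square and logits `[4, 0, 0]`. An input at the center keeps a confident prediction (about 0.96 for the first class). An input at `(10, 10)` gets a scale near zero and an almost uniform distribution.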
# 混合半教師あり一般化線形回帰と深層学習・補間器への応用 Mixed Semi-Supervised Generalized-Linear-Regression with applications to Deep-Learning and Interpolators ( http://arxiv.org/abs/2302.09526v3 ) ライセンス: Link先を確認	Oren Yuval, Saharon Rosset,	(参考訳) 回帰タスクにおける教師あり学習の予測性能を向上させる半教師あり学習(SSL)手法を、ラベルなしデータを用いて設計するための方法論を提案する。主なアイデアは、ラベルなしデータを統合するための異なるメカニズムを設計し、それぞれにラベルなしデータに与える重みを制御する混合パラメータ$\alpha$を含めることである。一般化線形モデル(GLM)と線形補間器のモデルクラスに着目して、異なる混合メカニズムの特性を分析し、予測性能の観点から、いずれの場合もラベルなしデータを何らかの非ゼロの混合比$\alpha>0$で統合することが常に有益であることを証明する。さらに、手元のラベル付きデータとラベルなしデータを用いて、混合SSLが最良の予測性能をもたらす最適な混合比$\alpha^*$を推定するための厳密なフレームワークを提供する。標準的な教師ありモデルと比較して様々な設定で大幅な改善をもたらす本手法の有効性は、広範なシミュレーションにより、理論解析を裏付ける形で実証的に示される。また、実世界の回帰タスクにおいて、ディープニューラルネットワークのようなより複雑なモデルを改善するために、本手法が(直感的な修正を加えることで)適用可能であることを示す。 We present a methodology for using unlabeled data to design semi supervised learning (SSL) methods that improve the prediction performance of supervised learning for regression tasks. The main idea is to design different mechanisms for integrating the unlabeled data, and include in each of them a mixing parameter $\alpha$, controlling the weight given to the unlabeled data. Focusing on Generalized Linear Models (GLM) and linear interpolators classes of models, we analyze the characteristics of different mixing mechanisms, and prove that in all cases, it is invariably beneficial to integrate the unlabeled data with some nonzero mixing ratio $\alpha>0$, in terms of predictive performance. Moreover, we provide a rigorous framework to estimate the best mixing ratio $\alpha^*$ where mixed SSL delivers the best predictive performance, while using the labeled and unlabeled data on hand. The effectiveness of our methodology in delivering substantial improvement compared to the standard supervised models, in a variety of settings, is demonstrated empirically through extensive simulation, in a manner that supports the theoretical analysis. We also demonstrate the applicability of our methodology (with some intuitive modifications) to improve more complex models, such as deep neural networks, in real-world regression tasks.	翻訳日:2024-05-30 04:46:21 公開日:2024-05-28
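The abstract does not spell out its mixing mechanisms. What follows is a hypothetical illustration of the general idea for a single-feature linear model without intercept. The slope is E[xy] / E[x²], and only the second moment E[x²], which needs no labels, is estimated from a blend of labeled and unlabeled inputs weighted by `alpha`. Setting `alpha = 0` recovers ordinary least squares. This particular mechanism and the function name are our assumptions.

```python
from typing import Sequence


def mixed_ssl_slope(x_lab: Sequence[float], y_lab: Sequence[float],
                    x_unlab: Sequence[float], alpha: float) -> float:
    """Slope E[xy] / E[x^2], with E[x^2] mixed from labeled and unlabeled inputs (hypothetical mechanism)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n_l, n_u = len(x_lab), len(x_unlab)
    cross = sum(x * y for x, y in zip(x_lab, y_lab)) / n_l  # only labeled rows carry y
    m2_lab = sum(x * x for x in x_lab) / n_l
    m2_unlab = sum(x * x for x in x_unlab) / n_u if n_u else m2_lab
    m2_mix = (1.0 - alpha) * m2_lab + alpha * m2_unlab  # alpha weights the unlabeled estimate
    return cross / m2_mix
```

Selecting `alpha` itself (the paper's $\alpha^*$) would be a separate estimation step not shown here.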
# 欠損データ上のナイーブベイズ分類器:判定とポイズニング Naive Bayes Classifiers over Missing Data: Decision and Poisoning ( http://arxiv.org/abs/2303.04811v2 ) ライセンス: Link先を確認	Song Bian, Xiating Ouyang, Zhiwei Fan, Paraschos Koutris,	(参考訳) 我々は、欠損値を含みうる汚いデータセット上で、ML分類器の証明可能なロバスト性を研究する。あるテスト点がML分類器に対して証明可能にロバストであるとは、分類器が汚いデータセットの(指数関数的に多数ある)クリーン化版のどれで訓練されたかにかかわらず、そのテスト点に対して同じ予測を返すことをいう。本稿では、欠損値を含む汚いデータセット上のナイーブベイズ分類器(NBC)について、理論的に次のことを示す。(i) 複数の入力テスト点がすべて汚いデータセット上で証明可能にロバストであるかを判定する効率的な多項式時間アルゴリズムが存在する。(ii) クリーンなデータセットに欠損セルを挿入することで、すべての入力テスト点を証明可能に非ロバストにすることを目的としたデータポイズニング攻撃は、単一のテスト点に対しては多項式時間で解けるが、複数のテスト点に対してはNP完全である。大規模な実験により、我々のアルゴリズムが効率的であり、既存のベースラインを上回ることを示す。 We study the certifiable robustness of ML classifiers on dirty datasets that could contain missing values. A test point is certifiably robust for an ML classifier if the classifier returns the same prediction for that test point, regardless of which cleaned version (among exponentially many) of the dirty dataset the classifier is trained on. In this paper, we show theoretically that for Naive Bayes Classifiers (NBC) over dirty datasets with missing values: (i) there exists an efficient polynomial time algorithm to decide whether multiple input test points are all certifiably robust over a dirty dataset; and (ii) the data poisoning attack, which aims to make all input test points certifiably non-robust by inserting missing cells to the clean dataset, is in polynomial time for single test points but NP-complete for multiple test points. Extensive experiments demonstrate that our algorithms are efficient and outperform existing baselines.	翻訳日:2024-05-30 04:46:21 公開日:2024-05-28
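The definition of certifiable robustness can be checked directly, though in exponential time, by enumerating every completion of the missing cells. The brute-force sketch below does this for a categorical Naive Bayes classifier with add-one smoothing. It only makes the definition concrete and is not the paper's polynomial-time decision algorithm. Marking missing cells with `None` and passing per-feature value `domains` are assumptions of this sketch.

```python
import itertools
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Features = Tuple[Optional[str], ...]  # None marks a missing cell
Dataset = List[Tuple[Features, str]]


def nb_predict(rows: Dataset, domains: Sequence[Sequence[str]], x: Tuple[str, ...]) -> str:
    """Categorical Naive Bayes with add-one (Laplace) smoothing, trained on complete rows."""
    class_counts = Counter(label for _, label in rows)
    best_label, best_score = "", -math.inf
    for label in sorted(class_counts):  # sorted order makes tie-breaking deterministic
        score = math.log(class_counts[label] / len(rows))
        for j, value in enumerate(x):
            match = sum(1 for feats, lab in rows if lab == label and feats[j] == value)
            score += math.log((match + 1) / (class_counts[label] + len(domains[j])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def certifiably_robust(rows: Dataset, domains: Sequence[Sequence[str]], x: Tuple[str, ...]) -> bool:
    """Brute force: True iff every completion of the missing cells yields the same prediction for x."""
    holes = [(i, j) for i, (feats, _) in enumerate(rows) for j, v in enumerate(feats) if v is None]
    seen = set()
    for fill in itertools.product(*(domains[j] for _, j in holes)):
        completed = [list(feats) for feats, _ in rows]
        for (i, j), value in zip(holes, fill):
            completed[i][j] = value
        world = [(tuple(f), label) for f, (_, label) in zip(completed, rows)]
        seen.add(nb_predict(world, domains, x))
        if len(seen) > 1:
            return False  # two completions disagree on x
    return True
```

For example, take one binary feature, rows `a→pos, a→pos, b→neg`, and one extra `pos` row whose feature is missing. The test point `b` is not certifiably robust: filling the cell with `a` predicts `neg`, and filling it with `b` predicts `pos`. The test point `a` is certifiably robust.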
# CHGNN: 半教師ありコントラスティブ・ハイパーグラフ学習ネットワーク CHGNN: A Semi-Supervised Contrastive Hypergraph Learning Network ( http://arxiv.org/abs/2303.06213v2 ) ライセンス: Link先を確認	Yumeng Song, Yu Gu, Tianyi Li, Jianzhong Qi, Zhenghao Liu, Christian S. Jensen, Ge Yu,	(参考訳) ハイパーグラフは、ソーシャルネットワークやバイオインフォマティクスなどのアプリケーションに見られる、データオブジェクト間の高次関係をモデル化できる。しかし、グラフ畳み込みネットワークをハイパーグラフに拡張する近年のハイパーグラフ学習の研究は、ラベルなしデータの特徴から効果的に学習することができない。そのような学習を可能にするため、自己教師ありコントラスティブ学習の技術を活用してラベル付きデータとラベルなしデータの両方から学習する、コントラスティブ・ハイパーグラフニューラルネットワークCHGNNを提案する。第一に、CHGNNは自動拡張戦略を採用し、最小十分ビューの摂動確率分布を学習する適応的ハイパーグラフビュー生成器を備える。第二に、CHGNNはハイパーエッジの同質性を考慮して情報を効果的に融合する改良型ハイパーグラフエンコーダを含む。第三に、CHGNNは、ビュー生成器の類似度損失、ノード分類損失、および教師信号を注入するためのハイパーエッジ同質性損失を組み合わせた結合損失関数を備える。さらに、強化されたコントラスティブ損失の学習プロセスに関連付けられた、基本的なコントラスティブ損失とクロスバリデーション・コントラスティブ損失も含む。9つの実データセットでの実験結果はCHGNNの有効性に関する知見を与え、CHGNNが分類精度において13の競合手法を一貫して上回ることを示す。 Hypergraphs can model higher-order relationships among data objects that are found in applications such as social networks and bioinformatics. However, recent studies on hypergraph learning that extend graph convolutional networks to hypergraphs cannot learn effectively from features of unlabeled data. To such learning, we propose a contrastive hypergraph neural network, CHGNN, that exploits self-supervised contrastive learning techniques to learn from labeled and unlabeled data. First, CHGNN includes an adaptive hypergraph view generator that adopts an auto-augmentation strategy and learns a perturbed probability distribution of minimal sufficient views. Second, CHGNN encompasses an improved hypergraph encoder that considers hyperedge homogeneity to fuse information effectively. Third, CHGNN is equipped with a joint loss function that combines a similarity loss for the view generator, a node classification loss, and a hyperedge homogeneity loss to inject supervision signals. It also includes basic and cross-validation contrastive losses, associated with an enhanced contrastive loss training process. Experimental results on nine real datasets offer insight into the effectiveness of CHGNN, showing that it outperforms 13 competitors in terms of classification accuracy consistently.	翻訳日:2024-05-30 04:46:21 公開日:2024-05-28
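CHGNN builds on hypergraph convolution. For orientation only, the sketch below implements the standard HGNN propagation rule $X' = D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2} X$ from the hypergraph-learning literature. It omits the learnable projection and nonlinearity.

- $H$ is the node-hyperedge incidence matrix.
- $W$ holds the hyperedge weights.
- $D_v$ and $D_e$ are the node and hyperedge degrees.

This is not CHGNN's own encoder, which additionally accounts for hyperedge homogeneity. The code assumes every node lies in at least one positively weighted hyperedge and that no hyperedge is empty.

```python
from typing import List

Matrix = List[List[float]]


def hypergraph_propagate(H: Matrix, X: Matrix, w: List[float]) -> Matrix:
    """One parameter-free HGNN step: D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X."""
    n, m, d = len(H), len(H[0]), len(X[0])
    dv = [sum(w[e] * H[i][e] for e in range(m)) for i in range(n)]  # weighted node degrees
    de = [sum(H[i][e] for i in range(n)) for e in range(m)]  # hyperedge sizes
    # normalize node features by D_v^{-1/2}
    xs = [[x / dv[i] ** 0.5 for x in X[i]] for i in range(n)]
    # gather node features into each hyperedge: W D_e^{-1} H^T xs
    edge = [[w[e] / de[e] * sum(H[i][e] * xs[i][k] for i in range(n)) for k in range(d)]
            for e in range(m)]
    # scatter hyperedge messages back to nodes and normalize by D_v^{-1/2}
    return [[sum(H[i][e] * edge[e][k] for e in range(m)) / dv[i] ** 0.5 for k in range(d)]
            for i in range(n)]
```

With one hyperedge covering all nodes and unit weight, each node's output is the mean of all node features.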
# SpikeCV: 連続コンピュータビジョンの時代を切り開く SpikeCV: Open a Continuous Computer Vision Era ( http://arxiv.org/abs/2303.11684v2 ) ライセンス: Link先を確認	Yajing Zheng, Jiyuan Zhang, Rui Zhao, Jianhao Ding, Shiyan Chen, Ruiqin Xiong, Zhaofei Yu, Tiejun Huang,	(参考訳) SpikeCVは、近年急速に発展しているニューロモルフィック視覚センサーであるスパイクカメラのための、新しいオープンソースのコンピュータビジョンプラットフォームである。スパイクカメラでは、各画素位置が光強度を直接蓄積し、非同期にスパイクを発火する。出力されるバイナリスパイクは40,000Hzの周波数に達する。新しいタイプの視覚表現として、スパイク列は高い時空間完全性を持ち、外界の連続的な視覚情報を保持する。スパイクカメラの低レイテンシと高ダイナミックレンジを活かして、高画質イメージングや超高速目標検出など、多くのスパイクベースのアルゴリズムが大きな進歩を遂げている。スパイクビジョンのコミュニティエコロジーを構築し、より多くのユーザがスパイクカメラを活用できるようにするため、SpikeCVは多様な超高速シーンのデータセット、ハードウェアインターフェース、使いやすいモジュールライブラリを提供する。SpikeCVは、スパイクデータのカプセル化、データセットインターフェースの標準化、ビジョンタスクのモジュール化、そして困難なシーンに対するリアルタイムアプリケーションに重点を置いている。オープンソースのPythonエコシステムの登場により、SpikeCVのモジュールはPythonライブラリとして利用でき、研究者の数値解析のニーズの大部分を満たすことができる。オフライン推論とリアルタイムアプリケーションにおけるSpikeCVの効率性を示す。プロジェクトのリポジトリは \url{https://openi.pcl.ac.cn/Cordium/SpikeCV} と \url{https://github.com/Zyj061/SpikeCV} である。 SpikeCV is a new open-source computer vision platform for the spike camera, which is a neuromorphic visual sensor that has developed rapidly in recent years. In the spike camera, each pixel position directly accumulates the light intensity and asynchronously fires spikes. The output binary spikes can reach a frequency of 40,000 Hz. As a new type of visual expression, spike sequence has high spatiotemporal completeness and preserves the continuous visual information of the external world. Taking advantage of the low latency and high dynamic range of the spike camera, many spike-based algorithms have made significant progress, such as high-quality imaging and ultra-high-speed target detection. To build up a community ecology for the spike vision to facilitate more users to take advantage of the spike camera, SpikeCV provides a variety of ultra-high-speed scene datasets, hardware interfaces, and an easy-to-use modules library. SpikeCV focuses on encapsulation for spike data, standardization for dataset interfaces, modularization for vision tasks, and real-time applications for challenging scenes. With the advent of the open-source Python ecosystem, modules of SpikeCV can be used as a Python library to fulfilled most of the numerical analysis needs of researchers. We demonstrate the efficiency of the SpikeCV on offline inference and real-time applications. The project repository address are \url{https://openi.pcl.ac.cn/Cordium/SpikeCV} and \url{https://github.com/Zyj061/SpikeCV}	翻訳日:2024-05-30 04:46:21 公開日:2024-05-28
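The abstract describes the sensing principle: each pixel accumulates light intensity and fires spikes asynchronously. This is an integrate-and-fire model. The plain-Python simulation below assumes a fixed threshold and discrete time steps. It pairs the simulation with the simplest intensity estimate, the spike rate over a window. `simulate_spikes` and `estimate_intensity` are hypothetical helpers, not part of the SpikeCV API.

```python
from typing import List


def simulate_spikes(intensity: List[float], steps: int, threshold: float = 1.0) -> List[List[int]]:
    """Integrate-and-fire: each pixel adds its intensity every step and emits 1 on reaching threshold."""
    acc = [0.0] * len(intensity)
    frames: List[List[int]] = []
    for _ in range(steps):
        frame = []
        for p, value in enumerate(intensity):
            acc[p] += value
            if acc[p] >= threshold:
                acc[p] -= threshold  # keep the residual charge for the next spike
                frame.append(1)
            else:
                frame.append(0)
        frames.append(frame)
    return frames


def estimate_intensity(frames: List[List[int]], threshold: float = 1.0) -> List[float]:
    """Spike-rate estimate: spikes in the window times threshold, divided by window length."""
    steps = len(frames)
    return [threshold * sum(f[p] for f in frames) / steps for p in range(len(frames[0]))]
```

A pixel with intensity 0.25 fires once every four steps. A 100-step window therefore yields 25 spikes and an estimated intensity of 0.25. Brighter pixels fire proportionally more often.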
# 分散スパースブロック符号のための因子化器 Factorizers for Distributed Sparse Block Codes ( http://arxiv.org/abs/2303.13957v2 ) ライセンス: Link先を確認	Michael Hersche, Aleksandar Terzic, Geethan Karunaratne, Jovin Langenegger, Angéline Pouget, Giovanni Cherubini, Luca Benini, Abu Sebastian, Abbas Rahimi,	(参考訳) 分散スパースブロック符号(SBC)は、固定幅ベクトルを用いて記号的データ構造を符号化・操作するためのコンパクトな表現である。しかし大きな課題の1つは、可能なすべての組み合わせを探索することなく、データ構造の分散表現をその構成要素に分離(因子分解)することである。知覚の不確実性や、クエリSBCベクトルを生成する現代のニューラルネットワークによる近似のためにSBCベクトルにノイズが含まれる場合、この因子分解はさらに困難になる。これらの課題に対処するため、まず、GSBCと呼ぶ、より柔軟で一般化された形式のSBCを因子分解する高速かつ高精度な手法を提案する。我々の反復的因子化器は、しきい値に基づく非線形活性化、条件付きランダムサンプリング、および$\ell_\infty$に基づく類似度指標を導入する。第二に、提案する因子化器は、深層畳み込みニューラルネットワーク(CNN)を用いて生成したノイズの多い積ベクトルでクエリされた場合でも高い精度を維持する。これにより、CNNの大きな全結合層(FCL)を置き換える用途が可能になる。すなわち、$C$個の学習可能なクラスベクトル(または属性の組み合わせ)を、それぞれ$\sqrt[\leftroot{-2}\uproot{2}F]{C}$個の固定コードベクトルを持つ$F$個の因子コードブックからなる因子化器によって暗黙的に表現できる。新しい損失関数を用いて、因子化器をCNNの分類層に柔軟に統合する方法論を提供する。この統合により、畳み込み層は因子化器が復号可能なノイズの多い積ベクトルを生成でき、復号された因子は下流タスクに応じて異なる解釈を持ちうる。CIFAR-100、ImageNet-1K、RAVENデータセット上の4つの深層CNNアーキテクチャで本手法の実現可能性を示す。すべてのユースケースにおいて、パラメータ数と演算数はFCLと比べて顕著に削減される。 Distributed sparse block codes (SBCs) exhibit compact representations for encoding and manipulating symbolic data structures using fixed-width vectors. One major challenge however is to disentangle, or factorize, the distributed representation of data structures into their constituent elements without having to search through all possible combinations. This factorization becomes more challenging when SBCs vectors are noisy due to perceptual uncertainty and approximations made by modern neural networks to generate the query SBCs vectors. To address these challenges, we first propose a fast and highly accurate method for factorizing a more flexible and hence generalized form of SBCs, dubbed GSBCs. Our iterative factorizer introduces a threshold-based nonlinear activation, conditional random sampling, and an $\ell_\infty$-based similarity metric. Secondly, the proposed factorizer maintains a high accuracy when queried by noisy product vectors generated using deep convolutional neural networks (CNNs). This facilitates its application in replacing the large fully connected layer (FCL) in CNNs, whereby $C$ trainable class vectors, or attribute combinations, can be implicitly represented by our factorizer having $F$-factor codebooks, each with $\sqrt[\leftroot{-2}\uproot{2}F]{C}$ fixed codevectors. We provide a methodology to flexibly integrate our factorizer in the classification layer of CNNs with a novel loss function. With this integration, the convolutional layers can generate a noisy product vector that our factorizer can still decode, whereby the decoded factors can have different interpretations based on downstream tasks. We demonstrate the feasibility of our method on four deep CNN architectures over CIFAR-100, ImageNet-1K, and RAVEN datasets. In all use cases, the number of parameters and operations are notably reduced compared to the FCL.	翻訳日:2024-05-30 04:46:21 公開日:2024-05-28
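To see why factorizing without exhaustive search matters, consider the simplest sparse block code. It has $B$ blocks, each a one-hot vector of length $L$, stored here as the list of active indices. Binding by blockwise circular convolution then reduces to adding indices modulo $L$. The brute-force decoder below recovers the factors of a bound vector by trying every combination from $F$ codebooks of size $M$, which costs $M^F$ bindings. The paper's iterative factorizer is designed to avoid this search. The index representation and all function names are illustrative assumptions, and the paper's GSBCs are not restricted to one-hot blocks.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Code = Tuple[int, ...]  # index of the active element in each of the B blocks


def bind(a: Code, b: Code, block_len: int) -> Code:
    # blockwise circular convolution of one-hot blocks reduces to index addition mod block_len
    return tuple((x + y) % block_len for x, y in zip(a, b))


def bind_all(codes: Sequence[Code], block_len: int) -> Code:
    out = codes[0]
    for c in codes[1:]:
        out = bind(out, c, block_len)
    return out


def similarity(a: Code, b: Code) -> int:
    return sum(x == y for x, y in zip(a, b))  # number of blocks whose active index agrees


def random_codebooks(n_factors: int, size: int, n_blocks: int, block_len: int,
                     seed: int = 0) -> List[List[Code]]:
    rng = random.Random(seed)
    return [[tuple(rng.randrange(block_len) for _ in range(n_blocks)) for _ in range(size)]
            for _ in range(n_factors)]


def brute_force_factorize(query: Code, codebooks: List[List[Code]],
                          block_len: int) -> Tuple[Tuple[int, ...], int]:
    """Try every combination (size ** n_factors bindings) and keep the most similar one."""
    best: Tuple[int, ...] = ()
    best_sim, tried = -1, 0
    for idx in itertools.product(*(range(len(cb)) for cb in codebooks)):
        tried += 1
        candidate = bind_all([cb[i] for cb, i in zip(codebooks, idx)], block_len)
        s = similarity(candidate, query)
        if s > best_sim:
            best, best_sim = idx, s
    return best, tried
```

With $F = 3$ codebooks of $M = 10$ codevectors, only 30 codevectors are stored while $10^3 = 1000$ class combinations are represented. This mirrors the $F$ codebooks of $\sqrt[F]{C}$ codevectors described in the abstract. Exhaustive decoding, however, must try all 1000.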
# 不確実性誘導型次ベストビュー最適化を用いたアクティブインプリシトオブジェクト再構成 Active Implicit Object Reconstruction using Uncertainty-guided Next-Best-View Optimization ( http://arxiv.org/abs/2303.16739v4 ) ライセンス: Link先を確認	Dongyu Yan, Jianheng Liu, Fengyu Quan, Haoyao Chen, Mengmeng Fu,	(参考訳) 物体再構成時にセンサーのビューを能動的に計画することは、自律移動ロボットにとって不可欠である。有効な手法は、精度と効率のバランスをとれなければならない。本稿では、新たな暗黙的表現とアクティブな再構成タスクをシームレスに統合する手法を提案する。幾何プロキシとして暗黙的な占有場を構築する。学習時には、物体の事前バウンディングボックスを補助情報として利用し、クリーンで詳細な再構成を生成する。ビューの不確実性を評価するために、再構成された占有確率場から直接エントロピーを抽出するサンプリングベースのアプローチを、ビューの情報利得の尺度として用いる。これにより、追加の不確実性マップや学習が不要になる。有限個の候補集合の中でビューの不確実性を比較する従来の手法とは異なり、連続多様体上で次の最良ビュー(NBV)を求める。暗黙的表現の微分可能性を活用することで、勾配降下によってビューの不確実性を最大化し、NBVを直接最適化できる。これにより、異なるシナリオに対する手法の適応性が大幅に向上する。シミュレーションおよび実世界の実験により、本手法がアクティブ再構成タスクにおける再構成精度とビュープランニングの効率を効果的に向上させることを示す。提案システムは https://github.com/HITSZ-NRSL/ActiveImplicitRecon.git でオープンソース化される予定である。 Actively planning sensor views during object reconstruction is crucial for autonomous mobile robots. An effective method should be able to strike a balance between accuracy and efficiency. In this paper, we propose a seamless integration of the emerging implicit representation with the active reconstruction task. We build an implicit occupancy field as our geometry proxy. While training, the prior object bounding box is utilized as auxiliary information to generate clean and detailed reconstructions. To evaluate view uncertainty, we employ a sampling-based approach that directly extracts entropy from the reconstructed occupancy probability field as our measure of view information gain. This eliminates the need for additional uncertainty maps or learning. Unlike previous methods that compare view uncertainty within a finite set of candidates, we aim to find the next-best-view (NBV) on a continuous manifold. Leveraging the differentiability of the implicit representation, the NBV can be optimized directly by maximizing the view uncertainty using gradient descent. It significantly enhances the method's adaptability to different scenarios. Simulation and real-world experiments demonstrate that our approach effectively improves reconstruction accuracy and efficiency of view planning in active reconstruction tasks. The proposed system will open source at https://github.com/HITSZ-NRSL/ActiveImplicitRecon.git.	翻訳日:2024-05-30 04:46:21 公開日:2024-05-28
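The view-uncertainty measure in the abstract is entropy extracted from the occupancy probability field. It can be sketched as the mean binary entropy $H(p) = -p\log p - (1-p)\log(1-p)$ of occupancy probabilities at points sampled along a candidate view's rays. Below, the sampled occupancies are passed in directly and views are compared from a fixed list. The paper instead queries an implicit network and optimizes the view continuously by gradient steps, which this discrete comparison does not reproduce. Function names are hypothetical.

```python
import math
from typing import Dict, List


def binary_entropy(p: float, eps: float = 1e-12) -> float:
    """Entropy (in nats) of a Bernoulli occupancy probability, clamped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def view_information_gain(occupancies: List[float]) -> float:
    """Mean entropy over the points sampled along one view's rays."""
    return sum(binary_entropy(p) for p in occupancies) / len(occupancies)


def pick_next_best_view(candidates: Dict[str, List[float]]) -> str:
    """Choose the candidate view whose sampled points are most uncertain."""
    return max(candidates, key=lambda view: view_information_gain(candidates[view]))
```

Points predicted near 0.5 contribute close to $\log 2$ nats each. A view whose rays cross such uncertain regions is therefore preferred over one that sees only confidently free or occupied space.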
# Med-Tuning: 医用ボリュームセグメンテーションのための新しいパラメータ効率的チューニングフレームワーク Med-Tuning: A New Parameter-Efficient Tuning Framework for Medical Volumetric Segmentation ( http://arxiv.org/abs/2304.10880v4 ) ライセンス: Link先を確認	Jiachen Shen, Wenxuan Wang, Chen Chen, Jianbo Jiao, Jing Liu, Yan Zhang, Shanshan Song, Jiangyun Li,	(参考訳) 医用ボリュームセグメンテーションにおける深層学習ベースの手法のモデル性能を高めるため、「事前学習してから微調整(FT)する」パラダイムが広く採用されている。しかし、従来の完全なFTは高い計算コストとメモリコストを伴う。したがって、医用ボリュームセグメンテーションタスクに向けて、事前学習済みモデルを効果的かつパラメータ効率よく微調整することの重要性が高まっている。本稿では、医用ボリュームセグメンテーションタスクのためのパラメータ効率的チューニング(PET)を実現するMed-Tuningという新しいフレームワークと、タスク固有の特徴抽出のための効率的なプラグアンドプレイモジュールであるMed-Adapterを提案する。少数の調整パラメータで、本フレームワークは、自然画像で事前学習された2Dベースラインのセグメンテーション精度を向上させる。3つのベンチマークデータセット(CTおよびMRIモダリティ)での大規模な実験により、本手法がボリュームセグメンテーションタスクにおいて従来のPET手法より良い結果を達成することを示す。完全なFTと比較して、Med-Tuningは微調整するモデルパラメータを最大4分の1に削減し、しかもセグメンテーション性能はさらに向上する。プロジェクトのWebページは \url{https://rubics-xuan.github.io/Med-Tuning/} にある。 The "pre-training then fine-tuning (FT)" paradigm is widely adopted to boost the model performance of deep learning-based methods for medical volumetric segmentation. However, conventional full FT incurs high computational and memory costs. Thus, it is of increasing importance to fine-tune pre-trained models for medical volumetric segmentation tasks in a both effective and parameter-efficient manner. In this paper, we introduce a new framework named Med-Tuning to realize parameter-efficient tuning (PET) for medical volumetric segmentation task and an efficient plug-and-play module named Med-Adapter for task-specific feature extraction. With a small number of tuned parameters, our framework enhances the 2D baselines's precision on segmentation tasks, which are pre-trained on natural images. Extensive experiments on three benchmark datasets (CT and MRI modalities) show that our method achieves better results than previous PET methods on volumetric segmentation tasks. Compared to full FT, Med-Tuning reduces the fine-tuned model parameters by up to 4x, with even better segmentation performance. Our project webpage is at \url{https://rubics-xuan.github.io/Med-Tuning/}.	翻訳日:2024-05-30 04:46:21 公開日:2024-05-28
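The abstract does not describe Med-Adapter's internals. As a generic point of reference only, the sketch below shows a standard bottleneck adapter of the kind widely used for parameter-efficient tuning: down-projection, ReLU, up-projection and a residual connection. It is written in plain Python without biases. With the up-projection initialized to zero, the adapter starts as the identity, so the frozen backbone is unchanged when tuning begins. The class name, dimensions and initialization are assumptions.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class BottleneckAdapter:
    """Residual bottleneck adapter: x + up(relu(down(x))); only down/up would be trained."""

    def __init__(self, dim: int, bottleneck: int, seed: int = 0):
        rng = random.Random(seed)
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
        self.up = [[0.0] * bottleneck for _ in range(dim)]  # zero init: adapter starts as identity

    def __call__(self, x: Vector) -> Vector:
        h = [max(0.0, v) for v in matvec(self.down, x)]  # ReLU in the bottleneck
        return [xi + di for xi, di in zip(x, matvec(self.up, h))]

    def num_params(self) -> int:
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)
```

For a 768-dimensional token width and a bottleneck of 64, one such adapter adds 2 × 768 × 64 = 98,304 trainable weights. This is what keeps the tuned parameter count small.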
# 忠実度に基づく滑らかなmin-相対エントロピー:性質と応用 Fidelity-Based Smooth Min-Relative Entropy: Properties and Applications ( http://arxiv.org/abs/2305.05859v2 ) ライセンス: Link先を確認	Theshani Nuradha, Mark M. Wilde,	(参考訳) 忠実度に基づく滑らかなmin-相対エントロピーは、熱力学やコヒーレンスといった資源理論を含む、量子情報に関する先行研究の様々な文脈で現れてきた識別可能性の尺度である。本稿では、この量について包括的な研究を行う。まず、データ処理不等式を含むいくつかの基本的な性質を満たすことを証明する。また、忠実度に基づく滑らかなmin-相対エントロピーと、滑らかなmin-相対エントロピーや滑らかなサンドイッチRényi相対エントロピー(サンドイッチRényi相対エントロピーと滑らかなmax-相対エントロピーはその特殊な場合である)を含む、広く用いられている他の情報理論的量との関係を確立する。その後、これらの関係を用いて、忠実度に基づく滑らかなmin-相対エントロピーとすべての滑らかなサンドイッチRényi相対エントロピーの2次漸近を確立し、1次の項が量子相対エントロピーであり、2次の項が量子相対エントロピー分散を含むことを見いだす。さらに、導出した性質を用いて、目標状態が混合状態である一般の資源理論における操作的タスクに対し、忠実度に基づく滑らかなmin-相対エントロピーがワンショットの限界を与えることを示す。その具体例がランダムネス蒸留である。以上の観察から、蒸留可能なランダムネスの上界の2次展開と、特定の古典-量子状態の蒸留可能なランダムネスの正確な2次漸近が得られる。最後に、滑らかなmax-相対エントロピーと滑らかな条件付きmin-エントロピーのための半正定値計画と、忠実度に基づく滑らかなmin-相対エントロピーのための双線形計画を構築し、これを用いて、最後に挙げた量を最初に挙げた量と関係づける限界のタイトさを調べる。 The fidelity-based smooth min-relative entropy is a distinguishability measure that has appeared in a variety of contexts in prior work on quantum information, including resource theories like thermodynamics and coherence. Here we provide a comprehensive study of this quantity. First we prove that it satisfies several basic properties, including the data-processing inequality. We also establish connections between the fidelity-based smooth min-relative entropy and other widely used information-theoretic quantities, including smooth min-relative entropy and smooth sandwiched R\'enyi relative entropy, of which the sandwiched R\'enyi relative entropy and smooth max-relative entropy are special cases. After that, we use these connections to establish the second-order asymptotics of the fidelity-based smooth min-relative entropy and all smooth sandwiched R\'enyi relative entropies, finding that the first-order term is the quantum relative entropy and the second-order term involves the quantum relative entropy variance. Utilizing the properties derived, we also show how the fidelity-based smooth min-relative entropy provides one-shot bounds for operational tasks in general resource theories in which the target state is mixed, with a particular example being randomness distillation. The above observations then lead to second-order expansions of the upper bounds on distillable randomness, as well as the precise second-order asymptotics of the distillable randomness of particular classical-quantum states. Finally, we establish semi-definite programs for smooth max-relative entropy and smooth conditional min-entropy, as well as a bilinear program for the fidelity-based smooth min-relative entropy, which we subsequently use to explore the tightness of a bound relating the last to the first.	翻訳日:2024-05-30 04:46:21 公開日:2024-05-28
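The unsmoothed building block can be computed directly. With fidelity $F(\rho,\sigma) = \lVert\sqrt{\rho}\sqrt{\sigma}\rVert_1^2$, the sandwiched Rényi relative entropy of order $1/2$ equals $-\log F(\rho,\sigma)$. As we understand it, this is also the $\varepsilon = 0$ case of the fidelity-based quantity. The NumPy sketch below evaluates it for density matrices. It does not solve the smoothing optimization or the bilinear program from the paper. Natural logarithms are used, and the function names are our own.

```python
import numpy as np


def psd_sqrt(a: np.ndarray) -> np.ndarray:
    """Matrix square root of a Hermitian positive semidefinite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T  # V diag(sqrt(w)) V^dagger


def fidelity(rho: np.ndarray, sigma: np.ndarray) -> float:
    """F(rho, sigma) = ||sqrt(rho) sqrt(sigma)||_1^2, with the trace norm as a sum of singular values."""
    s = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False)
    return float(np.sum(s) ** 2)


def sandwiched_renyi_half(rho: np.ndarray, sigma: np.ndarray) -> float:
    """Sandwiched Renyi relative entropy of order 1/2, i.e. -log F (in nats)."""
    f = fidelity(rho, sigma)
    return float("inf") if f <= 0.0 else float(-np.log(f))
```

For $\rho = I/2$ and $\sigma = |0\rangle\langle 0|$ the fidelity is $1/2$, giving $\log 2$ nats (one bit).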
# AIを利用したコード生成ツールにおける信頼の調査と設計 Investigating and Designing for Trust in AI-powered Code Generation Tools ( http://arxiv.org/abs/2305.11248v2 ) ライセンス: Link先を確認	Ruotong Wang, Ruijia Cheng, Denae Ford, Thomas Zimmermann,	(参考訳) GitHub CopilotのようなAIを活用したコード生成ツールが普及するにつれ、ツールの採用と責任ある利用の鍵となる要因である、ソフトウェア開発者のAIツールに対する信頼を理解することが極めて重要になっている。しかし、開発者がAIとの間でどのように信頼を構築するのか、また、適切なレベルの信頼を促すために生成AIシステムのインターフェースをどのように設計すべきかについては、ほとんど分かっていない。本稿では、2段階の質的調査から得られた知見を報告する。まず17人の開発者にインタビューを行い、彼らの信頼の概念を文脈化し、AIコード生成ツールに対する適切な信頼を構築する上での課題を理解した。その結果、適切な期待の形成、AIツールの設定、AIの提案の検証という3つの主要な課題が明らかになった。これらの課題に対処するため、第2段階では設計プローブ調査を実施し、開発者の信頼構築プロセスを支援する設計コンセプトとして、1) ユーザが適切な期待を持てるようにAIの性能を伝えること、2) ユーザが好みを設定・調整することでAIを設定できるようにすること、3) AIの提案の評価を支援するためにモデルの仕組みに関する指標を提供すること、を探った。これらの設計コンセプトがAIを活用したコード生成ツールへの適切な信頼の構築にどのように役立つか、また設計上の潜在的なリスクについて、開発者からフィードバックを集めた。これらの知見に基づき、AIを活用したコード生成ツールにおいて信頼のための設計をどのように行うかについて、設計上の推奨事項を提案する。 As AI-powered code generation tools such as GitHub Copilot become popular, it is crucial to understand software developers' trust in AI tools -- a key factor for tool adoption and responsible usage. However, we know little about how developers build trust with AI, nor do we understand how to design the interface of generative AI systems to facilitate their appropriate levels of trust. In this paper, we describe findings from a two-stage qualitative investigation. We first interviewed 17 developers to contextualize their notions of trust and understand their challenges in building appropriate trust in AI code generation tools. We surfaced three main challenges -- including building appropriate expectations, configuring AI tools, and validating AI suggestions. To address these challenges, we conducted a design probe study in the second stage to explore design concepts that support developers' trust-building process by 1) communicating AI performance to help users set proper expectations, 2) allowing users to configure AI by setting and adjusting preferences, and 3) offering indicators of model mechanism to support evaluation of AI suggestions. We gathered developers' feedback on how these design concepts can help them build appropriate trust in AI-powered code generation tools, as well as potential risks in design. These findings inform our proposed design recommendations on how to design for trust in AI-powered code generation tools.	翻訳日:2024-05-30 04:46:21 公開日:2024-05-28
# ReFIT: 推論時のリランカーからの関連性フィードバック ReFIT: Relevance Feedback from a Reranker during Inference ( http://arxiv.org/abs/2305.11744v2 ) ライセンス: Link先を確認	Revanth Gangi Reddy, Pradeep Dasigi, Md Arafat Sultan, Arman Cohan, Avirup Sil, Heng Ji, Hannaneh Hajishirzi,	(参考訳) Retrieve-and-Rerankはニューラル情報検索で広く用いられるフレームワークであり、まずバイエンコーダネットワークがあらかじめ定めた数の候補(例えばK=100)を検索し、それらをより強力なクロスエンコーダモデルが再ランク付けする。リランカーはリトリーバーに比べて改善された候補スコアを与えることが多いが、その適用範囲は検索された上位K件の候補に限られる。その結果、リランカーはRecall@Kの観点で検索性能を改善することができない。本研究では、推論時にリランカーからリトリーバーへ関連性フィードバックを与えさせることで、リランカーを活用してリコールを改善することを提案する。具体的には、推論中のテストインスタンスが与えられたとき、軽量な更新メカニズムを用いて、そのインスタンスに対するリランカーの予測をリトリーバーのクエリ表現に蒸留する。蒸留損失の目的は、リトリーバーの候補スコアをリランカーが生成したスコアにより近づけることである。その後、アルゴリズムは更新されたクエリベクトルを用いて2回目の検索ステップを実行する。様々なRetrieve-and-Rerankフレームワークに適用可能な本手法が、複数のドメイン、言語、モダリティにわたって検索のリコールを大幅に向上させることを実証的に示す。 Retrieve-and-rerank is a prevalent framework in neural information retrieval, wherein a bi-encoder network initially retrieves a pre-defined number of candidates (e.g., K=100), which are then reranked by a more powerful cross-encoder model. While the reranker often yields improved candidate scores compared to the retriever, its scope is confined to only the top K retrieved candidates. As a result, the reranker cannot improve retrieval performance in terms of Recall@K. In this work, we propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time. Specifically, given a test instance during inference, we distill the reranker's predictions for that instance into the retriever's query representation using a lightweight update mechanism. The aim of the distillation loss is to align the retriever's candidate scores more closely with those produced by the reranker. The algorithm then proceeds by executing a second retrieval step using the updated query vector. We empirically demonstrate that this method, applicable to various retrieve-and-rerank frameworks, substantially enhances retrieval recall across multiple domains, languages, and modalities.	翻訳日:2024-05-30 04:46:21 公開日:2024-05-28
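The inference-time feedback step can be sketched as a few gradient steps on the query vector that reduce $\mathrm{KL}(p_{\text{rr}} \,\|\, p_{\text{ret}})$. Here $p_{\text{rr}}$ and $p_{\text{ret}}$ are softmax distributions over the reranker's and the retriever's scores for the top-K candidates. For dot-product scores $s = Dq$, the gradient with respect to $q$ is $D^\top(p_{\text{ret}} - p_{\text{rr}})$. The loss form, step size, step count and function names are assumptions; the paper's exact update may differ.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def refine_query(q: Vector, docs: List[Vector], reranker_scores: Vector,
                 lr: float = 0.5, steps: int = 5) -> Vector:
    """Gradient steps on KL(p_rr || p_ret) w.r.t. q, where p_ret = softmax(docs @ q)."""
    p_rr = softmax(reranker_scores)
    q = list(q)
    for _ in range(steps):
        p_ret = softmax([dot(d, q) for d in docs])
        diff = [a - b for a, b in zip(p_ret, p_rr)]
        # gradient D^T (p_ret - p_rr)
        grad = [sum(diff[i] * docs[i][k] for i in range(len(docs))) for k in range(len(q))]
        q = [qk - lr * gk for qk, gk in zip(q, grad)]
    return q


def retrieve(q: Vector, docs: List[Vector], k: int) -> List[int]:
    """Indices of the k highest dot-product scores."""
    scores = [dot(d, q) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])[:k]
```

Consider a toy case with three 2-D document vectors `[1, 0]`, `[0, 1]` and `[0.7, 0.7]` and query `[1, 0]`. The retriever ranks the first document highest. After five steps toward reranker scores `[0, 5, 0]`, the second document moves to the top. The refined query is then used for the second retrieval pass over the full index, which can surface documents outside the original top-K.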
# サブスペース構成可能なネットワーク Subspace-Configurable Networks ( http://arxiv.org/abs/2305.13536v3 ) ライセンス: Link先を確認	Dong Wang, Olga Saukh, Xiaoxi He, Lothar Thiele,	(参考訳) エッジデバイスへの深層学習モデルのデプロイは増加しているが、これらのモデルはセンシングデータの動的な変化に直面したときにロバスト性を欠くことが多い。これは、センサーのドリフトや、特定のセンサー配置や自然に変化するセンシング条件などの要因により、オフライン学習時に用いたデータと比べてデータが変動することに起因しうる。したがって、望ましいロバスト性を達成するには、不変なアーキテクチャか、データ拡張技術のような特別な学習手法のいずれかを利用する必要がある。あるいは、入力変換をドメインシフト問題として扱い、デプロイ後のモデル適応によって解決することもできる。本稿では、構成可能なネットワークのパラメータ化された部分空間を学習する。ここで、特定のパラメータ設定に対する最適なネットワークはこの部分空間に含まれる。得られる部分空間は低次元であり、入力の複雑な非可逆変換に対しても驚くほど単純な構造を持つため、記憶資源や計算資源が限られる状況において、サブスペース構成可能なネットワーク(SCN)は極めて高い効率を実現する。 While the deployment of deep learning models on edge devices is increasing, these models often lack robustness when faced with dynamic changes in sensed data. This can be attributed to sensor drift, or variations in the data compared to what was used during offline training due to factors such as specific sensor placement or naturally changing sensing conditions. Hence, achieving the desired robustness necessitates the utilization of either an invariant architecture or specialized training approaches, like data augmentation techniques. Alternatively, input transformations can be treated as a domain shift problem, and solved by post-deployment model adaptation. In this paper, we train a parameterized subspace of configurable networks, where an optimal network for a particular parameter setting is part of this subspace. The obtained subspace is low-dimensional and has a surprisingly simple structure even for complex, non-invertible transformations of the input, leading to an exceptionally high efficiency of subspace-configurable networks (SCNs) when limited storage and computing resources are at stake.	翻訳日:2024-05-30 04:46:21 公開日:2024-05-28
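One way to picture a subspace-configurable network: $D$ sets of base weights span the subspace. A small configuration network maps the transformation parameter (here a rotation angle) to mixing coefficients that select a point in it, $\theta(\alpha) = \sum_i \beta_i(\alpha)\,\theta_i$. The sketch assumes softmax-normalized coefficients and a toy linear configuration map on $(\cos\alpha, \sin\alpha)$. These choices and the function names are illustrative, not taken from the abstract.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def config_coefficients(angle: float, A: Sequence[Sequence[float]]) -> Vector:
    """Toy configuration network: linear map of (cos, sin) of the angle, then softmax."""
    feats = (math.cos(angle), math.sin(angle))
    return softmax([row[0] * feats[0] + row[1] * feats[1] for row in A])


def configured_weights(angle: float, A: Sequence[Sequence[float]],
                       bases: Sequence[Sequence[float]]) -> Vector:
    """Point in the weight subspace: sum_i beta_i(angle) * bases[i]."""
    beta = config_coefficients(angle, A)
    dim = len(bases[0])
    return [sum(b * base[k] for b, base in zip(beta, bases)) for k in range(dim)]
```

Only the base weights and the small configuration map are stored. The network for any particular angle is assembled on demand.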
# SciMON:新奇性に最適化された科学的な吸気装置 SciMON: Scientific Inspiration Machines Optimized for Novelty ( http://arxiv.org/abs/2305.14259v6 ) ライセンス: Link先を確認	Qingyun Wang, Doug Downey, Heng Ji, Tom Hope,	(参考訳) 文献に基づく新たな科学的方向を生成するために,ニューラルランゲージモデルを探索し,拡張する。文献に基づく仮説生成の研究は伝統的に、仮説の表現性を制限する二進的リンク予測に焦点を当ててきた。この一連の作品は、新規性を最適化することにも焦点をあてていない。我々は、入力背景コンテキスト(例えば、問題、実験的な設定、目標)としてモデルを使い、文学に根ざした自然言語のアイデアを出力する、新しい設定で劇的な出発点を取ります。本稿では,過去の科学的論文から「吸入」を抽出し,先行論文と反復的に比較し,十分な新規性が達成されるまでアイデア提案を更新することによって,新規性のために明示的に最適化するモデリングフレームワークであるSciMONについて述べる。包括的評価の結果,GPT-4は全体的に低い技術深度と新規性を持つアイデアを産み出す傾向にあることがわかった。我々の研究は、科学文献から生まれた新しいアイデアを生み出す言語モデルの評価と開発に向けた第一歩である。 We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature. Work on literature-based hypothesis generation has traditionally focused on binary link prediction--severely limiting the expressivity of hypotheses. This line of work also does not focus on optimizing novelty. We take a dramatic departure with a novel setting in which models use as input background contexts (e.g., problems, experimental settings, goals), and output natural language ideas grounded in literature. We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers, and explicitly optimizes for novelty by iteratively comparing to prior papers and updating idea suggestions until sufficient novelty is achieved. Comprehensive evaluations reveal that GPT-4 tends to generate ideas with overall low technical depth and novelty, while our methods partially mitigate this issue. Our work represents a first step toward evaluating and developing language models that generate new ideas derived from the scientific literature	翻訳日:2024-05-30 04:46:21 公開日:2024-05-28
# ベイジアンサロゲートモデルによるLLM生成テキストの効率的な検出 Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model ( http://arxiv.org/abs/2305.16617v2 ) ライセンス: Link先を確認	Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng,	(参考訳) 機械生成テキスト、特に大規模言語モデル(LLM)によるテキストの検出は、その悪用に起因する深刻な社会問題を防ぐために極めて重要である。特定のデータセットで専用の検出器を訓練する手法もあるが、未知のテストデータへの汎化には不十分であり、一方でゼロショットの手法は性能が最適に及ばないことが多い。最近のDetectGPTは有望な検出性能を示しているが、1つの候補を検出するのにソースLLMへその数百の摂動をクエリする必要があるため、深刻な非効率性の問題を抱えている。本論文はこのギャップを埋めることを目的とする。具体的には、ベイズ的不確実性に基づいて典型的なサンプルを選択し、典型的なサンプルのスコアを他のサンプルへ補間することを可能にするベイジアンサロゲートモデルを導入し、クエリ効率を向上させることを提案する。実験結果は、本手法が少ないクエリ予算の下で既存手法を大幅に上回ることを示す。特に、LLaMAファミリーのモデルが生成したテキストを検出する場合、本手法はわずか2〜3回のクエリで、200回のクエリを用いるDetectGPTを上回る。 The detection of machine-generated text, especially from large language models (LLMs), is crucial in preventing serious social problems resulting from their misuse. Some methods train dedicated detectors on specific datasets but fall short in generalizing to unseen test data, while other zero-shot ones often yield suboptimal performance. Although the recent DetectGPT has shown promising detection performance, it suffers from significant inefficiency issues, as detecting a single candidate requires querying the source LLM with hundreds of its perturbations. This paper aims to bridge this gap. Concretely, we propose to incorporate a Bayesian surrogate model, which allows us to select typical samples based on Bayesian uncertainty and interpolate scores from typical samples to other samples, to improve query efficiency. Empirical results demonstrate that our method significantly outperforms existing approaches under a low query budget. Notably, when detecting the text generated by LLaMA family models, our method with just 2 or 3 queries can outperform DetectGPT with 200 queries.	翻訳日:2024-05-30 04:36:37 公開日:2024-05-28
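The surrogate idea can be sketched with Gaussian-process regression. Query the expensive scorer (the source LLM, in the paper's setting) only at the perturbations where the GP's posterior variance is currently largest. Then interpolate scores for all remaining perturbations from the posterior mean. The following are assumptions of this sketch rather than details given in the abstract:

- the RBF kernel and noise level;
- greedy maximum-variance selection;
- the toy 1-D inputs;
- the `score_fn` callback.

```python
from typing import Callable, List, Tuple

import numpy as np


def rbf(a: np.ndarray, b: np.ndarray, length: float = 1.0) -> np.ndarray:
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)


def gp_posterior(x_obs: np.ndarray, y_obs: np.ndarray, x_all: np.ndarray,
                 noise: float = 1e-4, length: float = 1.0) -> Tuple[np.ndarray, np.ndarray]:
    """GP posterior mean and variance at x_all (RBF prior variance is 1)."""
    k = rbf(x_obs, x_obs, length) + noise * np.eye(len(x_obs))
    ks = rbf(x_all, x_obs, length)
    mean = ks @ np.linalg.solve(k, y_obs)
    v = np.linalg.solve(k, ks.T)
    var = 1.0 - np.sum(ks * v.T, axis=1)  # diag(Ks K^{-1} Ks^T) subtracted from the prior
    return mean, np.clip(var, 0.0, None)


def surrogate_scores(x: np.ndarray, score_fn: Callable[[int], float], budget: int,
                     length: float = 1.0) -> Tuple[np.ndarray, List[int]]:
    """Query score_fn at max-variance points only, then predict every score from the GP mean."""
    chosen: List[int] = [0]  # arbitrary starting sample
    y = [score_fn(0)]
    while len(chosen) < budget:
        _, var = gp_posterior(x[chosen], np.array(y), x, length=length)
        var[chosen] = -1.0  # never re-query a sample
        nxt = int(np.argmax(var))
        chosen.append(nxt)
        y.append(score_fn(nxt))
    mean, _ = gp_posterior(x[chosen], np.array(y), x, length=length)
    return mean, chosen
```

Take a budget of five queries over nine evenly spaced inputs. The first pick after the starting point is the farthest input, where uncertainty is highest. The posterior mean then supplies scores for the four inputs never queried.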
# プラグイン・パフォーマティブ最適化 Plug-in Performative Optimization ( http://arxiv.org/abs/2305.18728v3 ) ライセンス: Link先を確認	Licong Lin, Tijana Zrnic,	(参考訳) 予測がパフォーマティブである場合、どの予測器をデプロイするかの選択が将来の観測の分布に影響を与える。パフォーマティビティの下での学習における全体的な目標は、低い\emph{パフォーマティブ・リスク}、すなわち自らが誘導する分布上で良い性能を持つ予測器を見つけることである。バンディットやその他の導関数フリー手法を含む、パフォーマティブ・リスクを最適化する解法の一群は、パフォーマティブなフィードバックのいかなる構造にも依存しないため、収束が極めて遅くなる。相補的な解法の一群は、戦略的分類における最適応答モデルのような、フィードバックに対する明示的な\emph{モデル}を利用し、より速い収束レートを可能にする。しかし、これらのレートはフィードバックモデルが正しいことに決定的に依存する。本研究では、パフォーマティブ予測において、誤特定されている可能性のあるモデルを利用するための一般的なプロトコルである\emph{プラグイン・パフォーマティブ最適化}を検討する。誤特定が極端でない限り、この解法がモデルに依存しない戦略よりはるかに優れうることを示す。我々の結果は、たとえ誤特定されていても、モデルがパフォーマティブな設定における学習に実際に役立ちうるという仮説を支持する。 When predictions are performative, the choice of which predictor to deploy influences the distribution of future observations. The overarching goal in learning under performativity is to find a predictor that has low \emph{performative risk}, that is, good performance on its induced distribution. One family of solutions for optimizing the performative risk, including bandits and other derivative-free methods, is agnostic to any structure in the performative feedback, leading to exceedingly slow convergence rates. A complementary family of solutions makes use of explicit \emph{models} for the feedback, such as best-response models in strategic classification, enabling faster rates. However, these rates critically rely on the feedback model being correct. In this work we study a general protocol for making use of possibly misspecified models in performative prediction, called \emph{plug-in performative optimization}. We show this solution can be far superior to model-agnostic strategies, as long as the misspecification is not too extreme. Our results support the hypothesis that models, even if misspecified, can indeed help with learning in performative settings.	翻訳日:2024-05-30 04:36:37 公開日:2024-05-28
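A toy instance makes the plug-in idea concrete. Assume, purely for illustration, that deploying $\theta$ induces outcomes $z \sim \mathcal{N}(\mu + \varepsilon\theta, 1)$ and the loss is $(z-\theta)^2$. The performative risk is then $(\mu + \varepsilon\theta - \theta)^2 + 1$, minimized at $\theta^\star = \mu/(1-\varepsilon)$. The plug-in procedure fits the location model from a few exploratory deployments by least squares and minimizes the fitted risk. Here the model class is well specified. The paper's analysis also covers misspecified models, which this sketch does not. Function names are hypothetical.

```python
import random
from typing import List, Tuple


def deploy(theta: float, n: int, mu: float, eps: float, rng: random.Random) -> List[float]:
    """Simulated environment: outcomes shift with the deployed parameter (unknown to the learner)."""
    return [rng.gauss(mu + eps * theta, 1.0) for _ in range(n)]


def fit_location_model(thetas: List[float], batches: List[List[float]]) -> Tuple[float, float]:
    """Least-squares fit of z ~ a + b * theta over all observed (theta, z) pairs."""
    pairs = [(t, z) for t, batch in zip(thetas, batches) for z in batch]
    n = len(pairs)
    t_bar = sum(t for t, _ in pairs) / n
    z_bar = sum(z for _, z in pairs) / n
    cov = sum((t - t_bar) * (z - z_bar) for t, z in pairs)
    var = sum((t - t_bar) ** 2 for t, _ in pairs)
    b = cov / var
    return z_bar - b * t_bar, b


def plug_in_optimum(a: float, b: float) -> float:
    # fitted risk (a + b*theta - theta)^2 + 1 is minimized where a + (b - 1) * theta = 0 (needs b != 1)
    return a / (1.0 - b)
```

With $\mu = 2$ and $\varepsilon = 0.5$ the target is $\theta^\star = 4$. Five exploratory deployments of 2,000 samples each typically land within about 0.1 of it.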
# ミニマックス超過リスク最適化の効率的な確率的近似 Efficient Stochastic Approximation of Minimax Excess Risk Optimization ( http://arxiv.org/abs/2306.00026v2 ) ライセンス: Link先を確認	Lijun Zhang, Haomin Bai, Wei-Wei Tu, Ping Yang, Yao Hu,	(参考訳) 従来の分布ロバスト最適化(DRO)は分布の集合に対する最大リスクを最小化することを目的としているが、Agarwal と Zhang (2022) は最近、リスクを超過リスクに置き換えた変種を提案した。DROと比較して、この新しい定式化であるミニマックス超過リスク最適化(MERO)には、異なる分布における異種ノイズの影響を抑制できるという利点がある。しかし、超過リスクを選ぶと非常に難しいミニマックス最適化問題が生じ、現在は経験的MEROに対する非効率なアルゴリズムしか存在しない。本稿では、MEROを直接対象とする効率的な確率的近似手法を開発する。具体的には、確率的凸最適化の手法を利用して各分布の最小リスクを推定し、MEROをバイアス付き勾配を持つ確率的凸凹最適化(SCCO)問題として解く。バイアスが存在するため既存のSCCOの理論的保証は適用できないが、幸いなことに、最小リスクの推定誤差に起因するこのバイアスは制御可能であることを示す。したがって、MEROはほぼ最適な収束速度で最適化できる。さらに、各分布から抽出されるサンプル数が異なりうる現実的なシナリオを検討し、分布依存の収束速度を達成する確率的手法を提案する。 While traditional distributionally robust optimization (DRO) aims to minimize the maximal risk over a set of distributions, Agarwal and Zhang (2022) recently proposed a variant that replaces risk with excess risk. Compared to DRO, the new formulation, minimax excess risk optimization (MERO), has the advantage of suppressing the effect of heterogeneous noise in different distributions. However, the choice of excess risk leads to a very challenging minimax optimization problem, and currently there exists only an inefficient algorithm for empirical MERO. In this paper, we develop efficient stochastic approximation approaches which directly target MERO. Specifically, we leverage techniques from stochastic convex optimization to estimate the minimal risk of every distribution, and solve MERO as a stochastic convex-concave optimization (SCCO) problem with biased gradients. The presence of bias makes existing theoretical guarantees of SCCO inapplicable; fortunately, we demonstrate that the bias, caused by the estimation error of the minimal risk, is under control. Thus, MERO can still be optimized with a nearly optimal convergence rate. Moreover, we investigate a practical scenario where the quantity of samples drawn from each distribution may differ, and propose a stochastic approach that delivers distribution-dependent convergence rates.	翻訳日:2024-05-30 04:36:37 公開日:2024-05-28
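The difference between the DRO and MERO objectives shows up in a two-distribution toy with quadratic risks $R_i(w) = (w - a_i)^2 + c_i$. Here $c_i$ stands in for irreducible noise, so the minimal risk is $R_i^* = c_i$. This setup is an illustrative assumption. The grid search is a brute-force stand-in for the objectives only, not the stochastic approximation algorithms the paper develops.

```python
import numpy as np

# Hypothetical toy: each distribution i has risk R_i(w) = (w - a_i)^2 + c_i,
# where c_i is irreducible noise, so its minimal risk R_i^* equals c_i.
a = np.array([0.0, 2.0])  # optimum of each distribution's risk
c = np.array([0.0, 3.0])  # distribution 2 is much noisier


def risks(w):
    """Risk of each distribution for every w in a 1-D array (shape: len(w) x 2)."""
    return (w[:, None] - a) ** 2 + c


grid = np.linspace(-1.0, 3.0, 4001)
R = risks(grid)
w_dro = grid[np.argmin(R.max(axis=1))]         # DRO: minimize the maximal risk
w_mero = grid[np.argmin((R - c).max(axis=1))]  # MERO: minimize the maximal excess risk
print(w_dro, w_mero)
```

DRO settles near $w = 1.75$ because the noisier distribution dominates its maximum. MERO subtracts each distribution's irreducible risk first, so it picks the midpoint $w = 1$ between the two optima.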
# 半教師あり残差Transformerを用いた、アノテーション予算を抑えた効率的な異常検出 Efficient Anomaly Detection with Budget Annotation Using Semi-Supervised Residual Transformer ( http://arxiv.org/abs/2306.03492v3 ) ライセンス: Link先を確認	Hanxi Li, Jingqi Wu, Hao Chen, Mingwen Wang, Chunhua Shen,	(参考訳) 異常検出(AD)は、通常、訓練中には正常サンプルしか見られず、検出器はその場で異常を発見しなければならないため難しい。最近提案された深層学習ベースの手法はこの問題をある程度緩和できるが、実世界の応用に耐える産業レベルの異常検出器を得るにはまだ道のりが長い。一方、特定のADタスクでは、より高い精度を得るために少数の異常サンプルが手動でラベル付けされる。しかし、この性能向上には相当なアノテーション作業が必要であり、多くの実践的なシナリオでは実行が困難である。本研究では、上記の2つの問題を統一的な枠組みで解決する。第一に、パッチマッチングに基づくADアルゴリズムの成功に着想を得て、新しい位置制約付きパッチマッチングによって生成される残差の上でスライディング型のVision Transformerを訓練する。第二に、従来の画素単位のセグメンテーション問題をブロック単位の分類問題として定式化し直す。これにより、スライディングTransformerははるかに少ないアノテーション作業でさらに高い精度を達成できる。第三に、ラベル付けコストをさらに削減するため、異常領域をバウンディングボックスのみでラベル付けすることを提案する。弱ラベルによって生じるラベルなし領域は、2つの新しいデータ拡張手法を備えた高度にカスタマイズされた半教師あり学習スキームによって効果的に活用される。提案手法は、教師なしと教師ありの両シナリオにおいて、すべての評価指標で最先端の手法を上回る。広く使われているMVTec-ADデータセットでは、SemiRESTアルゴリズムは教師なし設定で81.2%、教師あり異常検出で84.4%の平均精度(AP)を達成する。驚くべきことに、バウンディングボックスに基づく半教師ありの設定でも、SemiRESTはMVTec-ADにおいて完全教師ありの最先端手法(83.8% AP)を上回る。 Anomaly Detection (AD) is challenging as usually only the normal samples are seen during training and the detector needs to discover anomalies on-the-fly. The recently proposed deep-learning-based approaches could partly alleviate the problem, but there is still a long way to go in obtaining an industrial-class anomaly detector for real-world applications. On the other hand, in some particular AD tasks, a few anomalous samples are labeled manually for achieving higher accuracy. However, this performance gain is at the cost of considerable annotation efforts, which can be intractable in many practical scenarios. In this work, the above two problems are addressed in a unified framework. Firstly, inspired by the success of the patch-matching-based AD algorithms, we train a sliding vision transformer over the residuals generated by a novel position-constrained patch-matching. Secondly, the conventional pixel-wise segmentation problem is cast into a block-wise classification problem. Thus the sliding transformer can attain even higher accuracy with much less annotation labor. Thirdly, to further reduce the labeling cost, we propose to label the anomalous regions using only bounding boxes. The unlabeled regions caused by the weak labels are effectively exploited using a highly-customized semi-supervised learning scheme equipped with two novel data augmentation methods. The proposed method outperforms all the state-of-the-art approaches using all the evaluation metrics in both the unsupervised and supervised scenarios. On the popular MVTec-AD dataset, our SemiREST algorithm obtains an Average Precision (AP) of 81.2% in the unsupervised condition and 84.4% AP for supervised anomaly detection. Surprisingly, with bounding-box-based semi-supervision, SemiREST still outperforms the SOTA methods with full supervision (83.8% AP) on MVTec-AD.	翻訳日:2024-05-30 04:36:37 公開日:2024-05-28
# 相対論的シナリオにおける量子相補性トレードオフの解明 Unveiling quantum complementarity tradeoffs in relativistic scenarios ( http://arxiv.org/abs/2306.08136v3 ) ライセンス: Link先を確認	Marcos L. W. Basso, Ismael L. Paiva, Pedro R. Dieguez,	(参考訳) 相補性は、多様な量子現象を理解するうえで極めて重要な役割を果たしている。本稿では、内部スピンを持つ粒子について、完全相補関係を構成する量の間のトレードオフが任意の時空においてどのように変化するかを示す。この効果は時空における局所ウィグナー回転に由来し、それがスピンを系の外部自由度に結合させる。本研究では、2つの一般化された遅延選択干渉計を用いる。干渉計内部での相補性トレードオフは異なるにもかかわらず、両方の装置の干渉可視度はあらゆる相対論的領域で一致する。我々の結果は、一般相対性理論が量子重ね合わせに普遍的なデコヒーレンス効果を引き起こすという知見を拡張するものである。局所ウィグナー回転は純粋に運動学的であって、いかなるスピンダイナミクスも排除するからである。具体例として、我々の結果のニュートン極限を解析する。 Complementarity plays a pivotal role in understanding a diverse range of quantum phenomena. Here, we show how the tradeoff between quantities of a complete complementarity relation is modified in an arbitrary spacetime for a particle with an internal spin. This effect stems from local Wigner rotations in the spacetime, which couple the spin to the system's external degrees of freedom. To conduct our study, we utilize two generalized delayed-choice interferometers. Despite differences in complementarity tradeoffs inside the interferometers, the interferometric visibility of both setups coincides in any relativistic regime. Our results extend the finding that general relativity induces a universal decoherence effect on quantum superpositions, as local Wigner rotations, being purely kinematical, preclude any spin dynamics. To illustrate, we analyze the Newtonian limit of our results.	翻訳日:2024-05-30 04:36:37 公開日:2024-05-28
# Quantum Pufferfish Privacy: 量子システムのためのフレキシブルなプライバシフレームワーク Quantum Pufferfish Privacy: A Flexible Privacy Framework for Quantum Systems ( http://arxiv.org/abs/2306.13054v2 ) ライセンス: Link先を確認	Theshani Nuradha, Ziv Goldfeld, Mark M. Wilde,	(参考訳) 本稿では、量子フグプライバシー(QPP)と呼ぶ、量子システムのための汎用的なプライバシーフレームワークを提案する。古典的なフグプライバシーに着想を得た我々の定式化は、秘匿すべき情報、実行可能な測定、ドメイン知識を柔軟に指定できるようにすることで、量子差分プライバシーを一般化し、その限界に対処する。QPPはDatta-Leditzkyの情報スペクトル・ダイバージェンスを用いて等価に定式化できることを示し、これによって同ダイバージェンスに初めて操作的な解釈を与える。このダイバージェンスを半正定値計画として再定式化していくつかの性質を導き、それらを用いてQPPメカニズムの凸性、合成可能性、後処理に関する性質を証明する。脱分極メカニズムがQPPを満たすためのパラメータも導出する。一般のQPPメカニズムにおけるプライバシーと有用性のトレードオフを解析し、ここでも脱分極メカニズムを具体例として調べる。さらにQPPフレームワークを、量子アルゴリズムを活用した仮説検定パイプラインによってプライバシー侵害を特定するプライバシー監査に応用する。量子公平性や他の量子ダイバージェンスとの関係も探り、QPPのいくつかの変種を検討する。 We propose a versatile privacy framework for quantum systems, termed quantum pufferfish privacy (QPP). Inspired by classical pufferfish privacy, our formulation generalizes and addresses limitations of quantum differential privacy by offering flexibility in specifying private information, feasible measurements, and domain knowledge. We show that QPP can be equivalently formulated in terms of the Datta-Leditzky information spectrum divergence, thus providing the first operational interpretation thereof. We reformulate this divergence as a semi-definite program and derive several properties of it, which are then used to prove convexity, composability, and post-processing of QPP mechanisms. Parameters that guarantee QPP of the depolarization mechanism are also derived. We analyze the privacy-utility tradeoff of general QPP mechanisms and, again, study the depolarization mechanism as an explicit instance. The QPP framework is then applied to privacy auditing for identifying privacy violations via a hypothesis testing pipeline that leverages quantum algorithms. Connections to quantum fairness and other quantum divergences are also explored and several variants of QPP are examined.	翻訳日:2024-05-30 04:36:37 公開日:2024-05-28
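The depolarization mechanism mentioned in the abstract is commonly modelled as the channel $\rho \mapsto (1-p)\rho + p\,I/d$. The NumPy sketch below uses hypothetical helper names. It shows only how the channel pulls two perfectly distinguishable inputs together: their trace distance shrinks by a factor $1-p$. It does not compute the Datta-Leditzky divergence or the QPP parameters derived in the paper.

```python
import numpy as np


def depolarize(rho, p):
    """Depolarizing channel: rho -> (1 - p) * rho + p * I / d."""
    d = rho.shape[0]
    return (1.0 - p) * rho + p * np.eye(d) / d


def trace_distance(rho, sigma):
    """Half the trace norm of rho - sigma (both assumed Hermitian)."""
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()


# Two orthogonal qubit states |0><0| and |1><1| (perfectly distinguishable inputs).
rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])
rho1 = np.array([[0.0, 0.0], [0.0, 1.0]])

p = 0.6
out0, out1 = depolarize(rho0, p), depolarize(rho1, p)
# Distinguishability drops from 1 to 1 - p: larger p means more privacy, less utility.
print(trace_distance(rho0, rho1), trace_distance(out0, out1))
```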
# 人中心型eXplainable Artificial Intelligence(XAI)の未来は、ポストホックな説明ではない The future of human-centric eXplainable Artificial Intelligence (XAI) is not post-hoc explanations ( http://arxiv.org/abs/2307.00364v2 ) ライセンス: Link先を確認	Vinitra Swamy, Jibril Frej, Tanja Käser,	(参考訳) 説明可能な人工知能(XAI)は、深層学習システムに対する人間の理解と信頼を可能にするうえで重要な役割を担っている。モデルが大規模化し、日常生活のさまざまな場面に広く浸透するにつれて、モデルの誤りによる悪影響を最小限に抑えるために説明可能性が必要になる。残念ながら、人間中心のXAI(例えば医療、教育、パーソナライズ広告における予測タスク)の現在の手法は単一のポストホックな説明器に依存する傾向にある一方、最近の研究では、基盤となるブラックボックスモデルの同じインスタンスに適用した場合でも、ポストホックな説明器同士が系統的に食い違うことが明らかになっている。そこで本稿では、現在の最先端の説明器の限界に対処するための行動を呼びかける。我々は、ポストホックな説明可能性から、解釈可能なニューラルネットワークアーキテクチャの設計への転換を提案する。人間中心XAIに求められる5つの要件(リアルタイム性、正確性、行動可能性、人間による解釈可能性、一貫性)を特定し、設計段階から解釈可能なニューラルネットワークのワークフローとして2つの方式(InterpretCCによる適応的ルーティングとI2MDによる時間的診断)を提案する。人間中心XAIの未来は、ブラックボックスの説明にも従来の解釈可能なモデルへの回帰にもなく、本質的に解釈可能なニューラルネットワークにあると我々は主張する。 Explainable Artificial Intelligence (XAI) plays a crucial role in enabling human understanding and trust in deep learning systems. As models get larger, more ubiquitous, and pervasive in aspects of daily life, explainability is necessary to minimize adverse effects of model mistakes. Unfortunately, current approaches in human-centric XAI (e.g., predictive tasks in healthcare, education, or personalized ads) tend to rely on a single post-hoc explainer, whereas recent work has identified systematic disagreement between post-hoc explainers when applied to the same instances of underlying black-box models. In this paper, we therefore present a call for action to address the limitations of current state-of-the-art explainers. We propose a shift from post-hoc explainability to designing interpretable neural network architectures. We identify five needs of human-centric XAI (real-time, accurate, actionable, human-interpretable, and consistent) and propose two schemes for interpretable-by-design neural network workflows (adaptive routing with InterpretCC and temporal diagnostics with I2MD). We postulate that the future of human-centric XAI is neither in explaining black-boxes nor in reverting to traditional, interpretable models, but in neural networks that are intrinsically interpretable.	翻訳日:2024-05-30 04:36:37 公開日:2024-05-28
# グラフ自己同型群同変ニューラルネットワーク Graph Automorphism Group Equivariant Neural Networks ( http://arxiv.org/abs/2307.07810v2 ) ライセンス: Link先を確認	Edward Pearce-Crump, William J. Knottenbelt,	(参考訳) 置換同変ニューラルネットワークは、通常、グラフ上に存在するデータから学習するために用いられる。しかし、$n$ 個の頂点を持つ任意のグラフ $G$ に対して、対称群 $S_n$ をその対称性の群として用いると、頂点間に存在する関係が考慮されない。実際の対称性の群が自己同型群 Aut$(G)$ であることを踏まえ、$\mathbb{R}^{n}$ のテンソル冪である層の間の学習可能な線形 Aut$(G)$-同変関数を完全に特徴付けることで、Aut$(G)$ に同変なニューラルネットワークの構成方法を示す。特に、これらの層関数を張る行列の集合(spanning set)を $\mathbb{R}^{n}$ の標準基底において求める。Frucht (1938) の定理により任意の有限群はあるグラフの自己同型群に同型であることが示されているため、この結果は、対称性の群が有限群であるデータから学習するうえで重要な意味を持つ。 Permutation equivariant neural networks are typically used to learn from data that lives on a graph. However, for any graph $G$ that has $n$ vertices, using the symmetric group $S_n$ as its group of symmetries does not take into account the relations that exist between the vertices. Given that the actual group of symmetries is the automorphism group Aut$(G)$, we show how to construct neural networks that are equivariant to Aut$(G)$ by obtaining a full characterisation of the learnable, linear, Aut$(G)$-equivariant functions between layers that are some tensor power of $\mathbb{R}^{n}$. In particular, we find a spanning set of matrices for these layer functions in the standard basis of $\mathbb{R}^{n}$. This result has important consequences for learning from data whose group of symmetries is a finite group because a theorem by Frucht (1938) showed that any finite group is isomorphic to the automorphism group of a graph.	翻訳日:2024-05-30 04:36:37 公開日:2024-05-28
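A small numerical check of two points in the abstract: Aut$(G)$ can be much smaller than $S_n$, and a layer equivariant to Aut$(G)$ need not be $S_n$-equivariant. For the path graph on four vertices, any polynomial in the adjacency matrix commutes with the permutation matrix of every automorphism. This is only an illustrative first-order example ($\mathbb{R}^{n} \to \mathbb{R}^{n}$) with hypothetical weights. It is not the paper's full characterisation of layers between tensor powers of $\mathbb{R}^{n}$.

```python
import itertools

import numpy as np


def automorphisms(adj):
    """Brute-force all vertex permutations that preserve the adjacency matrix (small graphs only)."""
    n = adj.shape[0]
    autos = []
    for perm in itertools.permutations(range(n)):
        P = np.eye(n)[list(perm)]  # permutation matrix: row i is e_{perm[i]}
        if np.array_equal(P @ adj @ P.T, adj):
            autos.append(P)
    return autos


# Path graph 0 - 1 - 2 - 3: Aut(G) = {identity, reversal}, versus 4! = 24 elements in S_4.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
autos = automorphisms(A)

# Hypothetical weights: a polynomial in A commutes with every automorphism matrix.
W = 0.5 * np.eye(4) + 2.0 * A + 0.1 * (A @ A)

x = np.array([1.0, -2.0, 0.5, 3.0])
equivariant = all(np.allclose(W @ (P @ x), P @ (W @ x)) for P in autos)

# Swapping vertices 0 and 1 is in S_4 but not in Aut(G); W does not commute with it.
swap = np.eye(4)[[1, 0, 2, 3]]
print(len(autos), equivariant, np.allclose(W @ swap, swap @ W))  # 2 True False
```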
# 空間ピラミッドプーリングを備えた畳み込みニューラルネットワークに基づくネットワーク頑健性評価の包括的解析 Comprehensive Analysis of Network Robustness Evaluation Based on Convolutional Neural Networks with Spatial Pyramid Pooling ( http://arxiv.org/abs/2308.08012v2 ) ライセンス: Link先を確認	Wenjun Jiang, Tianlong Fan, Changhao Li, Chuanfu Zhang, Tao Zhang, Zong-fu Luo,	(参考訳) 複雑ネットワークを理解、最適化、修復するうえで重要な側面である連結性の頑健性は、従来、時間がかかり、しばしば非現実的なシミュレーションによって評価されてきた。幸い、機械学習はこの課題に取り組むための新たな道を提供する。しかし、より一般的なエッジ除去シナリオにおける性能、頑健性を直接学習するのではなく攻撃曲線を通じて捉えること、予測タスクのスケーラビリティ、予測能力の転移可能性など、いくつかの重要な問題が未解決のままである。本稿では、空間ピラミッドプーリングネットワーク(SPP-net)を備えた畳み込みニューラルネットワーク(CNN)モデルを設計し、既存の評価指標を適応させ、攻撃モードを再設計し、適切なフィルタリングルールを導入し、頑健性の値を訓練データとして取り入れることで、これらの課題に対処する。結果は、提案したCNNフレームワークが、さまざまなネットワークタイプ、故障コンポーネントの種類、故障シナリオにわたって、高い計算時間という課題に包括的に対処できることを示している。ただし、提案したCNNモデルの性能はばらつく。訓練時と同じネットワークタイプの評価タスクでは、すべての除去シナリオにおいて攻撃曲線と頑健性の値の両方を一貫して正確に評価できる。予測対象のネットワークタイプが訓練時と異なる場合でも、ランダムなノード故障のシナリオでは良好な性能を示し、スケーラビリティと性能の転移可能性を示す。しかし、それ以外の除去シナリオでは性能が期待に届かなかった。ネットワーク特性の評価におけるこのシナリオ依存性はこれまでの研究で見過ごされてきたものであり、さらなる注意と最適化が必要である。最後に、重要な未解決問題と今後の研究課題について議論する。 Connectivity robustness, a crucial aspect for understanding, optimizing, and repairing complex networks, has traditionally been evaluated through time-consuming and often impractical simulations. Fortunately, machine learning provides a new avenue for addressing this challenge. However, several key issues remain unresolved, including the performance in more general edge removal scenarios, capturing robustness through attack curves instead of directly training for robustness, scalability of predictive tasks, and transferability of predictive capabilities. In this paper, we address these challenges by designing a convolutional neural network (CNN) model with spatial pyramid pooling networks (SPP-net), adapting existing evaluation metrics, redesigning the attack modes, introducing appropriate filtering rules, and incorporating the value of robustness as training data. The results demonstrate that the proposed CNN framework comprehensively addresses the challenge of high computational time across various network types, failure component types and failure scenarios. However, the performance of the proposed CNN model varies: for evaluation tasks that are consistent with the trained network type, the proposed CNN model consistently achieves accurate evaluations of both attack curves and robustness values across all removal scenarios. When the predicted network type differs from the trained network, the CNN model still demonstrates favorable performance in the scenario of random node failure, showcasing its scalability and performance transferability. Nevertheless, the performance falls short of expectations in other removal scenarios. This observed scenario-sensitivity in the evaluation of network features has been overlooked in previous studies and necessitates further attention and optimization. Lastly, we discuss important unresolved questions and directions for further investigation.	翻訳日:2024-05-30 04:36:37 公開日:2024-05-28
# SwapMoE: 調整可能なメモリ予算による既製のMoEベース言語モデルの提供 SwapMoE: Serving Off-the-shelf MoE-based Language Models with Tunable Memory Budget ( http://arxiv.org/abs/2308.15030v3 ) ライセンス: Link先を確認	Rui Kong, Yuanchun Li, Qingtian Feng, Weijun Wang, Xiaozhou Ye, Ye Ouyang, Linghe Kong, Yunxin Liu,	(参考訳) 混合エキスパート(MoE)は、条件付きで活性化される並列エキスパートによって大規模言語モデル(LLM)の容量を高める一般的な手法である。しかし、パラメータサイズが大きいため、メモリが制約されたデバイス上でMoEモデルを提供するのは難しい。メモリスワップやエキスパートの枝刈りといった典型的な解決策は、レイテンシの大幅な増加や深刻な精度低下を招きうる。本稿では、調整可能なメモリ予算でMoEベースの大規模言語モデルを効率的に提供するためのフレームワークSwapMoEを紹介する。SwapMoEの主なアイデアは、重要なエキスパートの小規模な動的集合、すなわち仮想エキスパート(Virtual Experts)を推論用のメインメモリに保持しつつ、仮想エキスパートと実際のエキスパートとの対応関係をシームレスに維持することである。実験では、SwapMoEが妥当な精度を保ちながらメモリフットプリントを削減できることが示された。例えば、Switch Transformerを用いたテキスト要約タスクでは、SwapMoEはメモリ消費を14.2 GiBから4.7 GiBに削減し、レイテンシを50%短縮しつつ、Rouge-2スコアの低下をわずか0.041に抑えられる。 Mixture of experts (MoE) is a popular technique to improve capacity of Large Language Models (LLMs) with conditionally-activated parallel experts. However, serving MoE models on memory-constrained devices is challenging due to the large parameter size. Typical solutions such as memory swapping or expert pruning may lead to significantly higher latency or severe accuracy loss. In this paper, we introduce SwapMoE, a framework for efficient serving of MoE-based large language models with tunable memory budgets. The main idea of SwapMoE is to keep a small dynamic set of important experts, namely Virtual Experts, in the main memory for inference, while seamlessly maintaining how the Virtual Experts map to the actual experts. Experiments have shown that SwapMoE can reduce the memory footprint while maintaining reasonable accuracy. For example, on text summarization tasks with Switch Transformer, SwapMoE can reduce the memory consumption from 14.2 GiB to 4.7 GiB, together with 50% latency reduction and a slight Rouge-2 score drop of 0.041.	翻訳日:2024-05-30 04:26:53 公開日:2024-05-28
# BLSP: 継続文生成の振る舞いアライメントによる言語-音声事前学習のブートストラップ BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing ( http://arxiv.org/abs/2309.00916v2 ) ライセンス: Link先を確認	Chen Wang, Minpeng Liao, Zhongqiang Huang, Jinliang Lu, Junhong Wu, Yuchen Liu, Chengqing Zong, Jiajun Zhang,	(参考訳) 大規模言語モデル(LLM)の登場により、その卓越した言語能力を音声に拡張することへの関心が大きく高まっている。しかし、音声とテキストのモダリティアライメントは依然として未解決の問題である。現在の解決策は2つの戦略に分類できる。1つはカスケード型の手法で、別途訓練された音声認識システムの出力(トークンまたは状態)をLLMの入力として用いるが、これは音声とテキストのアライメントをモデル化する可能性を制限する。もう1つはエンドツーエンドの手法で、音声指示データに依存するが、そうしたデータを大量に収集するのは非常に困難である。本稿では、これらの問題に対処するため、継続文生成の振る舞いアライメントによって言語-音声事前学習をブートストラップするBLSP手法を提案する。具体的には、凍結した音声エンコーダとLLMの間に軽量なモダリティアダプタを学習し、入力のモダリティが音声セグメントかその書き起こしかにかかわらず、LLMが同じ生成挙動を示すようにする。訓練は2つのステップに分けられる。第1のステップでは、音声の書き起こしをプレフィックスとしてLLMにテキストを生成させ、テキストの継続を得る。第2のステップでは、これらの継続を教師信号として用い、モダリティアダプタをエンドツーエンドで訓練する。この単純な手順によってLLMの能力を音声に拡張でき、ゼロショットの言語間シナリオであっても、音声認識、音声翻訳、音声言語理解、音声対話が可能になることを示す。 The emergence of large language models (LLMs) has sparked significant interest in extending their remarkable language capabilities to speech. However, modality alignment between speech and text still remains an open problem. Current solutions can be categorized into two strategies. One is a cascaded approach where outputs (tokens or states) of a separately trained speech recognition system are used as inputs for LLMs, which limits their potential in modeling alignment between speech and text. The other is an end-to-end approach that relies on speech instruction data, which is very difficult to collect in large quantities. In this paper, we address these issues and propose the BLSP approach that Bootstraps Language-Speech Pre-training via behavior alignment of continuation writing. We achieve this by learning a lightweight modality adapter between a frozen speech encoder and an LLM, ensuring that the LLM exhibits the same generation behavior regardless of the modality of input: a speech segment or its transcript. The training process can be divided into two steps. The first step prompts an LLM to generate texts with speech transcripts as prefixes, obtaining text continuations. In the second step, these continuations are used as supervised signals to train the modality adapter in an end-to-end manner. We demonstrate that this straightforward process can extend the capabilities of LLMs to speech, enabling speech recognition, speech translation, spoken language understanding, and speech conversation, even in zero-shot cross-lingual scenarios.	翻訳日:2024-05-30 04:26:52 公開日:2024-05-28
# 分散システムセキュリティにおけるゲーム理論 - 基礎,課題,今後の方向性 Game Theory in Distributed Systems Security: Foundations, Challenges, and Future Directions ( http://arxiv.org/abs/2309.01281v2 ) ライセンス: Link先を確認	Mustafa Abdallah, Saurabh Bagchi, Shaunak D. Bopardikar, Kevin Chan, Xing Gao, Murat Kantarcioglu, Congmiao Li, Peng Liu, Quanyan Zhu,	(参考訳) 重要インフラシステムや個人用コンピューティングシステムの多くは、分散コンピューティングシステムの構造を持っている。接続性の高まりに伴って攻撃対象領域が拡大するとともに、それらを攻撃するインセンティブも急速に高まっている。したがって、こうしたシステムを保護するために厳密な推論を導入すべき時が来たと考える。分散システムセキュリティとゲーム理論の技術コミュニティが協力すれば、この課題に効果的に取り組むことができる。本稿では、目標を達成するための土台となる両分野の基盤を示す。次に、コミュニティのための一連の研究課題を、分析、システム、統合の3つのカテゴリに分けて述べ、それぞれに「短期」(2〜3年)と「長期」(5〜10年)の項目を挙げる。本稿は、2022年のNSF SaTC PI会議におけるコミュニティでの議論を通じて構想された。 Many of our critical infrastructure systems and personal computing systems have a distributed computing systems structure. The incentives to attack them have been growing rapidly as has their attack surface due to increasing levels of connectedness. Therefore, we feel it is time to bring in rigorous reasoning to secure such systems. The distributed system security and the game theory technical communities can come together to effectively address this challenge. In this article, we lay out the foundations from each that we can build upon to achieve our goals. Next, we describe a set of research challenges for the community, organized into three categories -- analytical, systems, and integration challenges, each with "short term" time horizon (2-3 years) and "long term" (5-10 years) items. This article was conceived of through a community discussion at the 2022 NSF SaTC PI meeting.	翻訳日:2024-05-30 04:26:52 公開日:2024-05-28
# SPD行列の系列のための構造保存Transformer Structure-Preserving Transformers for Sequences of SPD Matrices ( http://arxiv.org/abs/2309.07579v7 ) ライセンス: Link先を確認	Mathieu Seraphim, Alexis Lechervy, Florian Yger, Luc Brun, Olivier Etard,	(参考訳) 近年、Transformerベースの自己注意機構は、テキストから画像に至るまで、さらには非ユークリッド幾何のデータを含む、文脈に依存するさまざまなデータ型の解析にうまく適用されている。本稿では、そのような機構として、対称正定値(SPD)行列の系列を、解析全体を通じてそのリーマン幾何を保ちながら分類するよう設計したものを提案する。本手法を、標準データセットから得た脳波(EEG)由来の共分散行列の時系列による自動睡眠段階判定に適用し、段階ごとに高い性能を得た。 In recent years, Transformer-based auto-attention mechanisms have been successfully applied to the analysis of a variety of context-reliant data types, from texts to images and beyond, including data from non-Euclidean geometries. In this paper, we present such a mechanism, designed to classify sequences of Symmetric Positive Definite matrices while preserving their Riemannian geometry throughout the analysis. We apply our method to automatic sleep staging on timeseries of EEG-derived covariance matrices from a standard dataset, obtaining high levels of stage-wise performance.	翻訳日:2024-05-30 04:26:52 公開日:2024-05-28
# 視覚慣性オドメトリーを用いたタイトフュージョンによる単トラック地上車両ダイナミクスモデルのオンライン校正 Online Calibration of a Single-Track Ground Vehicle Dynamics Model by Tight Fusion with Visual-Inertial Odometry ( http://arxiv.org/abs/2309.11148v3 ) ライセンス: Link先を確認	Haolong Li, Joerg Stueckler,	(参考訳) 車輪付き移動ロボットには、ナビゲーション計画のために自身の運動と制御行動の効果を推定する能力が必要である。本稿では、車輪付き地上車両のシングルトラック動力学モデルを視覚慣性オドメトリ(VIO)と密に融合する新しい手法ST-VIOを提案する。本手法は、将来の制御入力を条件とした前方予測の精度を高めるため、動力学モデルをオンラインで校正・適応させる。シングルトラック動力学モデルは、平坦な地面における特定の制御入力下での車輪付き車両の運動を常微分方程式で近似する。シングルトラックモデルの特異点がなく微分可能な変種を用いることで、動力学ファクタをVIOにシームレスに統合し、モデルパラメータをVIOの状態変数とともにオンラインで最適化する。地形の種類や車輪が異なる屋内・屋外の両環境で取得した実データを用いて本手法を検証した。実験では、ST-VIOが車輪や地面の変化に適応し、新しい制御入力下での予測精度を向上させるだけでなく、トラッキング精度まで向上させうることを示した。 Wheeled mobile robots need the ability to estimate their motion and the effect of their control actions for navigation planning. In this paper, we present ST-VIO, a novel approach which tightly fuses a single-track dynamics model for wheeled ground vehicles with visual inertial odometry (VIO). Our method calibrates and adapts the dynamics model online to improve the accuracy of forward prediction conditioned on future control inputs. The single-track dynamics model approximates wheeled vehicle motion under specific control inputs on flat ground using ordinary differential equations. We use a singularity-free and differentiable variant of the single-track model to enable seamless integration as dynamics factor into VIO and to optimize the model parameters online together with the VIO state variables. We validate our method with real-world data in both indoor and outdoor environments with different terrain types and wheels. In experiments, we demonstrate that ST-VIO can not only adapt to wheel or ground changes and improve the accuracy of prediction under new control inputs, but can even improve tracking accuracy.	翻訳日:2024-05-30 04:26:52 公開日:2024-05-28
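The abstract describes the single-track model only as a set of ordinary differential equations for wheeled-vehicle motion on flat ground. As a hedged illustration, the sketch below integrates the simpler kinematic single-track (bicycle) model with explicit Euler steps. The wheelbase, speed, steering angle and step size are assumed values. This kinematic variant ignores tyre slip, so it is not the singularity-free dynamic model used in ST-VIO.

```python
import math


def kinematic_single_track(state, v, delta, wheelbase, dt):
    """One explicit-Euler step of the kinematic single-track (bicycle) model.

    state = (x, y, yaw); v = speed [m/s]; delta = front steering angle [rad].
    Reference point: rear axle. Tyre slip and vehicle dynamics are not modelled.
    """
    x, y, yaw = state
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    yaw += v * math.tan(delta) / wheelbase * dt
    return (x, y, yaw)


def rollout(v, delta, wheelbase=0.5, dt=0.01, steps=500):
    """Integrate constant control inputs for steps * dt seconds, starting at the origin."""
    state = (0.0, 0.0, 0.0)
    for _ in range(steps):
        state = kinematic_single_track(state, v, delta, wheelbase, dt)
    return state


straight = rollout(v=1.0, delta=0.0)  # zero steering: 5 m straight along x
turning = rollout(v=1.0, delta=0.2)   # yaw grows at v * tan(delta) / wheelbase rad/s
print(straight, turning)
```

Forward prediction conditioned on future controls amounts to calling `rollout` with planned inputs. Calibration, in this simplified picture, would mean fitting parameters such as `wheelbase` so that predicted poses match the odometry.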
# グラフニューラルネットワークとマルチグラフを用いた記事分類 Article Classification with Graph Neural Networks and Multigraphs ( http://arxiv.org/abs/2309.11341v2 ) ライセンス: Link先を確認	Khang Ly, Yury Kashnitsky, Savvas Chamezopoulos, Valeria Krzhizhanovskaya,	(参考訳) 研究成果を文脈固有のラベル体系に分類することは、既存の論文と新たに出版される論文の量を考えると、困難かつ重要な下流タスクである。本稿では、論文間の関連性を示す複数のシグナル(例えば参照、共著関係、共通の出版元、共通の件名標目)を異なるエッジタイプとして同時に符号化するマルチグラフ表現で単純なグラフニューラルネットワーク(GNN)パイプラインを拡張し、論文分類の性能を向上させる手法を提案する。完全教師ありのトランスダクティブなノード分類実験を、Microsoft Academic GraphとPubMed Centralのメタデータでそれぞれ拡張したOpen Graph BenchmarkのOGBN-arXivデータセットとPubMed糖尿病データセットで行う。その結果、マルチグラフはデフォルトのグラフと比べて、さまざまなGNNモデルの性能を一貫して向上させることが示された。SOTAのテキストノード埋め込み手法と組み合わせると、変換後のマルチグラフにより、単純で浅い2層GNNパイプラインでも、より複雑なアーキテクチャと同等の結果を達成できる。 Classifying research output into context-specific label taxonomies is a challenging and relevant downstream task, given the volume of existing and newly published articles. We propose a method to enhance the performance of article classification by enriching simple Graph Neural Network (GNN) pipelines with multi-graph representations that simultaneously encode multiple signals of article relatedness, e.g. references, co-authorship, shared publication source, shared subject headings, as distinct edge types. Fully supervised transductive node classification experiments are conducted on the Open Graph Benchmark OGBN-arXiv dataset and the PubMed diabetes dataset, augmented with additional metadata from Microsoft Academic Graph and PubMed Central, respectively. The results demonstrate that multi-graphs consistently improve the performance of a variety of GNN models compared to the default graphs. When deployed with SOTA textual node embedding methods, the transformed multi-graphs enable simple and shallow 2-layer GNN pipelines to achieve results on par with more complex architectures.	翻訳日:2024-05-30 04:26:52 公開日:2024-05-28
# 耕作地のための視覚ベースのナビゲーションシステム A Vision-Based Navigation System for Arable Fields ( http://arxiv.org/abs/2309.11989v2 ) ライセンス: Link先を確認	Rajitha de Silva, Grzegorz Cielniak, Junfeng Gao,	(参考訳) 耕作地における視覚ベースのナビゲーションシステムは、農業ロボットのナビゲーションにおいて十分に研究されていない領域である。耕作地に配備された視覚システムは、雑草密度の変動、照明レベルの変化、生育段階、作物列の不規則性といった課題に直面する。現在の解決策はしばしば作物固有のものであり、照明や雑草密度など限られた個別の条件に対処することを目的としている。さらに、包括的なデータセットの不足が、こうした圃場を走行する汎用的な機械学習システムの開発を妨げている。本稿では、安価な視覚センサを用いて耕作地での視覚ベースのナビゲーションを行うための、深層学習に基づく一連の認識アルゴリズムを提案する。まず、複数の作期、さまざまな作物の種類、多様な圃場の変化の複雑さを捉えた包括的なデータセットを構築した。次に、生育段階、雑草密度、照明の違いといった多様な条件下で作物列を正確に検出できる頑健な圃場内認識アルゴリズムの開発を掘り下げる。さらに、効率的な圃場規模のナビゲーションのために、作物列追従と視覚ベースの作物列切替の統合を検討する。提案した圃場内ナビゲーションシステムを商業用の耕作地で試験し、総距離4.5 kmを走行して、平均方位誤差1.24°、平均横方向誤差3.32 cmを得た。 Vision-based navigation systems in arable fields are an underexplored area in agricultural robot navigation. Vision systems deployed in arable fields face challenges such as fluctuating weed density, varying illumination levels, growth stages and crop row irregularities. Current solutions are often crop-specific and aimed to address limited individual conditions such as illumination or weed density. Moreover, the scarcity of comprehensive datasets hinders the development of generalised machine learning systems for navigating these fields. This paper proposes a suite of deep learning-based perception algorithms using affordable vision sensors for vision-based navigation in arable fields. Initially, a comprehensive dataset that captures the intricacies of multiple crop seasons, various crop types, and a range of field variations was compiled. Next, this study delves into the creation of robust infield perception algorithms capable of accurately detecting crop rows under diverse conditions such as different growth stages, weed density, and varying illumination. Further, it investigates the integration of crop row following with vision-based crop row switching for efficient field-scale navigation. The proposed infield navigation system was tested in commercial arable fields traversing a total distance of 4.5 km with average heading and cross-track errors of 1.24° and 3.32 cm respectively.	翻訳日:2024-05-30 04:26:52 公開日:2024-05-28
# LongDocFACTScore: 長文書の抽象型要約の事実性を評価する LongDocFACTScore: Evaluating the Factuality of Long Document Abstractive Summarisation ( http://arxiv.org/abs/2309.12455v2 ) ライセンス: Link先を確認	Jennifer A Bishop, Qianqian Xie, Sophia Ananiadou,	(参考訳) 事実整合性の維持は抽象型テキスト要約における重要な問題であるが、ROUGEスコアなど、テキスト要約の評価に用いられる従来の自動評価指標では評価できない。近年、事前学習済み言語モデルを用いて事実整合性を測る改良指標の開発が進められているが、これらの指標には厳しいトークン数の上限があるため、長文書のテキスト要約の評価には適さない。さらに、既存の自動評価指標が長文書の設定に適用した場合に目的に適うかどうかを評価するための研究や資源も限られている。本研究では、長文書のテキスト要約の事実整合性を評価する自動評価指標の有効性を検証する。科学分野の長文書要約に対するきめ細かな事実整合性アノテーションを含む、自動事実性指標の評価用に人手でアノテーションしたデータセットLongSciVerifyを作成する。また、長文書要約の評価に適した新しい評価フレームワークLongDocFACTScoreを提案する。このフレームワークにより、指標を任意の長さの文書へ効率的に拡張でき、長文書要約データセットの評価に用いた場合、人間による事実性の評価との相関において既存の最先端指標を上回る。コードとLongSciVerifyデータセットは https://github.com/jbshp/LongDocFACTScore で公開している。 Maintaining factual consistency is a critical issue in abstractive text summarisation; however, it cannot be assessed by traditional automatic metrics used for evaluating text summarisation, such as ROUGE scoring. Recent efforts have been devoted to developing improved metrics for measuring factual consistency using pre-trained language models, but these metrics have restrictive token limits, and are therefore not suitable for evaluating long document text summarisation. Moreover, there is limited research and resources available for evaluating whether existing automatic evaluation metrics are fit for purpose when applied in long document settings. In this work, we evaluate the efficacy of automatic metrics for assessing the factual consistency of long document text summarisation. We create a human-annotated data set for evaluating automatic factuality metrics, LongSciVerify, which contains fine-grained factual consistency annotations for long document summaries from the scientific domain. We also propose a new evaluation framework, LongDocFACTScore, which is suitable for evaluating long document summarisation. This framework allows metrics to be efficiently extended to any length document and outperforms existing state-of-the-art metrics in its ability to correlate with human measures of factuality when used to evaluate long document summarisation data sets. We make our code and LongSciVerify data set publicly available: https://github.com/jbshp/LongDocFACTScore.	翻訳日:2024-05-30 04:26:52 公開日:2024-05-28
# 2-Cats: 2次元コピュラ近似変換 2-Cats: 2D Copula Approximating Transforms ( http://arxiv.org/abs/2309.16391v5 ) ライセンス: Link先を確認	Flavio Figueiredo, José Geraldo Fernandes, Jackson Silva, Renato M. Assunção,	(参考訳) コピュラは、データの次元間の依存関係を捉えるための強力な統計ツールである。コピュラを適用するには、まず容易なタスクである各周辺分布の独立な推定を行い、続いて、それらの周辺分布を結び付ける単一のコピュラ関数 $C$ を決定するという、はるかに難しいタスクを行う必要がある。2変量データの場合、コピュラは2-増加関数 $C: (u,v)\in \mathbb{I}^2 \rightarrow \mathbb{I}$ の形をとる。ここで $\mathbb{I} = [0, 1]$ である。本稿では、特定のコピュラ族(例えばアルキメデス型)に依存せずに2次元コピュラを学習するニューラルネットワーク(NN)モデル2-Catsを提案する。さらに、モデルの理論的性質とラグランジュ法に基づく訓練の両方を通じて、2-Catsがコピュラの性質の要件を満たすことを示す。加えて、物理情報ニューラルネットワーク(PINN)とSobolev訓練に関する文献に着想を得て、コピュラの出力だけでなくその導関数も学習するよう訓練戦略を拡張する。提案手法は、Cの性質(ほとんどは証明可能な形で、残りの1つは近似的に)を満たしつつ、さまざまなデータセットで最先端手法を上回る性能を示す。 Copulas are powerful statistical tools for capturing dependencies across data dimensions. Applying Copulas involves estimating independent marginals, a straightforward task, followed by the much more challenging task of determining a single copulating function, $C$, that links these marginals. For bivariate data, a copula takes the form of a two-increasing function $C: (u,v)\in \mathbb{I}^2 \rightarrow \mathbb{I}$, where $\mathbb{I} = [0, 1]$. This paper proposes 2-Cats, a Neural Network (NN) model that learns two-dimensional Copulas without relying on specific Copula families (e.g., Archimedean). Furthermore, via both theoretical properties of the model and a Lagrangian training approach, we show that 2-Cats meets the desiderata of Copula properties. Moreover, inspired by the literature on Physics-Informed Neural Networks and Sobolev Training, we further extend our training strategy to learn not only the output of a Copula but also its derivatives. Our proposed method exhibits superior performance compared to the state-of-the-art across various datasets while respecting (provably for most and approximately for a single other) properties of C.	翻訳日:2024-05-30 04:26:52 公開日:2024-05-28
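To make the copula requirements concrete, the sketch below checks them numerically on a grid. It tests groundedness, uniform margins, and the 2-increasing property: every rectangle must have non-negative $C$-volume $C(u_2,v_2) - C(u_2,v_1) - C(u_1,v_2) + C(u_1,v_1)$. The function names are hypothetical. A grid check is only a necessary-condition test, not a proof. According to the abstract, 2-Cats meets these requirements through the model's theoretical properties and a Lagrangian training approach, not through a post-hoc check like this one.

```python
import numpy as np


def is_copula_on_grid(C, n=51, tol=1e-12):
    """Numerically check the bivariate copula conditions on an n x n grid of [0, 1]^2."""
    t = np.linspace(0.0, 1.0, n)
    U, V = np.meshgrid(t, t, indexing="ij")
    G = C(U, V)  # G[i, j] = C(t[i], t[j])
    grounded = np.allclose(G[:, 0], 0.0) and np.allclose(G[0, :], 0.0)  # C(u,0) = C(0,v) = 0
    margins = np.allclose(G[:, -1], t) and np.allclose(G[-1, :], t)     # C(u,1) = u, C(1,v) = v
    volumes = np.diff(np.diff(G, axis=0), axis=1)  # C-volume of every grid cell
    return bool(grounded and margins and (volumes >= -tol).all())


def independence(u, v):
    return u * v


def upper_frechet(u, v):
    return np.minimum(u, v)


def lower_frechet(u, v):
    return np.maximum(u + v - 1.0, 0.0)


def not_a_copula(u, v):
    return np.maximum(u, v)  # fails groundedness and the 2-increasing property


results = [is_copula_on_grid(C) for C in (independence, upper_frechet, lower_frechet, not_a_copula)]
print(results)  # [True, True, True, False]
```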
# 目に見えるところにある記憶: 連想記憶と拡散モデルの不気味な類似性の概観 Memory in Plain Sight: Surveying the Uncanny Resemblances of Associative Memories and Diffusion Models ( http://arxiv.org/abs/2309.16750v2 ) ライセンス: Link先を確認	Benjamin Hoover, Hendrik Strobelt, Dmitry Krotov, Judy Hoffman, Zsolt Kira, Duen Horng Chau,	(参考訳) 拡散モデル(DM)の生成過程は、近年多くのAI生成ベンチマークで最先端を達成している。この生成過程は伝統的に「反復的なデノイザー」として理解されてきたが、それを記述するための普遍的に受け入れられた言語は存在しない。本稿では、エネルギーベースの連想記憶(AM)の分野における記憶想起の数学的言語を用いてDMを記述する新たな視点を導入し、両分野の初学者にも分かりやすい説明を心がける。両分野を統一することで、DMは、デノイジング過程のダイナミクス(すなわちノイズとステップサイズのスケジュール)を巧みに設計することでリアプノフ安定性の保証を迂回する、特定の種類のAMと見なせるという洞察が得られる。最後に、DMがAMに期待される経験的な振る舞いを示すことを記録した証拠が増えつつあることを紹介し、DMをエネルギーベース記憶の一形態として理解することで見えてくる研究機会について論じて締めくくる。 The generative process of Diffusion Models (DMs) has recently set state-of-the-art on many AI generation benchmarks. Though the generative process is traditionally understood as an "iterative denoiser", there is no universally accepted language to describe it. We introduce a novel perspective to describe DMs using the mathematical language of memory retrieval from the field of energy-based Associative Memories (AMs), making efforts to keep our presentation approachable to newcomers to both of these fields. Unifying these two fields provides insight that DMs can be seen as a particular kind of AM where Lyapunov stability guarantees are bypassed by intelligently engineering the dynamics (i.e., the noise and step size schedules) of the denoising process. Finally, we present a growing body of evidence that records DMs exhibiting empirical behavior we would expect from AMs, and conclude by discussing research opportunities that are revealed by understanding DMs as a form of energy-based memory.	翻訳日:2024-05-30 04:26:52 公開日:2024-05-28
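The Lyapunov-stability idea the abstract refers to is easiest to see in a classical Hopfield network, the simplest energy-based associative memory. The sketch rests on four assumptions: Hebbian weights, ±1 states, mutually orthogonal stored patterns taken from Hadamard rows, and asynchronous updates in a fixed order. Under these assumptions, every update can only lower the energy $E(s) = -\tfrac{1}{2}s^\top W s$ or leave it unchanged, and a corrupted pattern is pulled back to the stored memory. It illustrates associative memory in general, not the paper's diffusion-model analysis.

```python
import numpy as np


def sylvester_hadamard(k):
    """Hadamard matrix of size 2**k; its rows are mutually orthogonal +/-1 patterns."""
    H = np.array([[1.0]])
    for _ in range(k):
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H


def energy(W, s):
    """Hopfield energy E(s) = -1/2 s^T W s, a Lyapunov function for asynchronous updates."""
    return -0.5 * s @ W @ s


N, P = 64, 3
patterns = sylvester_hadamard(6)[1:P + 1]  # 3 orthogonal stored memories
W = patterns.T @ patterns / N              # Hebbian weights
np.fill_diagonal(W, 0.0)                   # no self-connections

s = patterns[0].copy()
s[:8] *= -1                                # corrupt 8 of the 64 bits

energies = [energy(W, s)]
for _ in range(3):                         # a few asynchronous sweeps in fixed order
    for i in range(N):
        s[i] = 1.0 if W[i] @ s >= 0 else -1.0
        energies.append(energy(W, s))

print(np.array_equal(s, patterns[0]), energies[0], energies[-1])
```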
# マルチバッチ強化学習におけるサンプル効率:次元依存型適応性の必要性 Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity ( http://arxiv.org/abs/2310.01616v2 ) ライセンス: Link先を確認	Emmeran Johnson, Ciara Pike-Burke, Patrick Rebeschini,	(参考訳) 強化学習におけるサンプル効率と適応性の関係を理論的に検討する。アルゴリズムがサンプル効率的であるとは、環境へのクエリ数$n$が問題の次元$d$の多項式で抑えられることをいう。適応性(Adaptivity)とは、クエリを送信し、そのフィードバックを処理してクエリ戦略を更新する頻度を指す。この相互作用を調べるために、クエリを$K$個のバッチで送信でき、各バッチの後にフィードバックを処理してクエリを更新する学習フレームワークを用いる。このモデルは、非適応的な「オフライン」($K=1$)から完全に適応的な($K=n$)シナリオ、およびその中間の領域まで、適応性のスペクトル全体を含む。$d$次元線形関数近似の下での方策評価と最良方策同定の問題に対して、$n = O(poly(d))$クエリでサンプル効率を達成するアルゴリズムに必要なバッチ数$K$について、$\Omega(\log \log d)$の下界を確立する。この結果は、適応性があること($K>1$)だけでは必ずしもサンプル効率が保証されないことを示している。特に、サンプル効率に関する適応性の境界は、サンプル効率が不可能であることが知られていたオフライン強化学習($K=1$)と適応的設定との間にあるのではない。その境界は適応性の異なる領域の間にあり、問題の次元に依存する。 We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries $n$ to the environment that is polynomial in the dimension $d$ of the problem. Adaptivity refers to the frequency at which queries are sent and feedback is processed to update the querying strategy. To investigate this interplay, we employ a learning framework that allows sending queries in $K$ batches, with feedback being processed and queries updated after each batch. This model encompasses the whole adaptivity spectrum, ranging from non-adaptive 'offline' ($K=1$) to fully adaptive ($K=n$) scenarios, and regimes in between. For the problems of policy evaluation and best-policy identification under $d$-dimensional linear function approximation, we establish $\Omega(\log \log d)$ lower bounds on the number of batches $K$ required for sample-efficient algorithms with $n = O(poly(d))$ queries. Our results show that just having adaptivity ($K>1$) does not necessarily guarantee sample-efficiency. 
Notably, the adaptivity-boundary for sample-efficiency is not between offline reinforcement learning ($K=1$), where sample-efficiency was known to be impossible, and adaptive settings. Instead, the boundary lies between different regimes of adaptivity and depends on the problem dimension.	翻訳日:2024-05-30 04:26:52 公開日:2024-05-28
# AdaMerging: マルチタスク学習のための適応モデルマージ AdaMerging: Adaptive Model Merging for Multi-Task Learning ( http://arxiv.org/abs/2310.02575v2 ) ライセンス: Link先を確認	Enneng Yang, Zhenyi Wang, Li Shen, Shiwei Liu, Guibing Guo, Xingwei Wang, Dacheng Tao,	(参考訳) マルチタスク学習(MTL)は、モデルが複数のタスクに同時に取り組めるようにすることを目的としている。タスク算術として知られる最近の研究により、個々のタスクに微調整された複数のモデルを直接1つのモデルにマージしてMTLを実行できることが明らかになった。これにより、初期の訓練データを用いた再学習は不要になる。しかし、このようにモデルを直接加算すると、マージされたモデル全体の性能が著しく低下することが多い。この低下は、複数のタスク間の潜在的な衝突と複雑な相関によって生じる。その結果、元の訓練データを使わずに、事前学習済みモデルをより効果的にマージするにはどうすればよいかという課題が浮かび上がる。本稿では、Adaptive Model Merging (AdaMerging)と呼ばれる新しい手法を紹介する。このアプローチは、元の訓練データに頼ることなく、タスク単位または層単位でモデルマージの係数を自律的に学習することを目的としている。具体的には、AdaMerging法は自動的かつ教師なしのタスク算術スキームとして機能する。マルチタスク設定におけるラベルなしテストサンプルのエントロピー最小化を代理目的関数として利用し、複数モデルのマージ係数を反復的に洗練する。8つのタスクにまたがる実験結果から、AdaMerging法の有効性が示された。AdaMergingは、現在の最先端のタスク算術によるマージ手法と比較して11%の性能向上を示す。特に、AdaMergingは未知の下流タスクに適用した場合にも優れた汎化能力を示す。さらに、テスト段階で生じうるデータ分布シフトに対して大幅に高い頑健性を示す。 Multi-task learning (MTL) aims to empower a model to tackle multiple tasks simultaneously. A recent development known as task arithmetic has revealed that several models, each fine-tuned for distinct tasks, can be directly merged into a single model to execute MTL without necessitating a retraining process using the initial training data. Nevertheless, this direct addition of models often leads to a significant deterioration in the overall performance of the merged model. This decline occurs due to potential conflicts and intricate correlations among the multiple tasks. Consequently, a challenge emerges: how to merge pre-trained models more effectively without using their original training data. This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging). This approach aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. 
Specifically, our AdaMerging method operates as an automatic, unsupervised task arithmetic scheme. It leverages entropy minimization on unlabeled test samples from the multi-task setup as a surrogate objective function to iteratively refine the merging coefficients of the multiple models. Our experimental findings across eight tasks demonstrate the efficacy of the AdaMerging scheme we put forth. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance. Notably, AdaMerging also exhibits superior generalization capabilities when applied to unseen downstream tasks. Furthermore, it displays a significantly enhanced robustness to data distribution shifts that may occur during the testing phase.	翻訳日:2024-05-30 04:26:52 公開日:2024-05-28
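Task arithmetic, as described above, adds scaled task vectors (fine-tuned weights minus pre-trained weights) back onto the pre-trained model. AdaMerging learns those scaling coefficients per task or per layer by minimising prediction entropy on unlabelled test data. The sketch below shows only the two ingredients: a layer-wise merge and the mean-entropy surrogate, on plain Python lists. The function names (`task_vectors`, `merge_layerwise`, `mean_entropy`) and the dict-of-lists weight format are hypothetical. The gradient-based optimisation of the coefficients is deliberately omitted.

```python
import math
from typing import Dict, List

Params = Dict[str, List[float]]  # hypothetical format: layer name -> flattened weights


def task_vectors(pretrained: Params, finetuned: List[Params]) -> List[Params]:
    """tau_k = theta_k - theta_pre, computed layer by layer."""
    return [
        {name: [f - p for f, p in zip(ft[name], pretrained[name])] for name in pretrained}
        for ft in finetuned
    ]


def merge_layerwise(pretrained: Params, taus: List[Params], coeffs: List[Dict[str, float]]) -> Params:
    """theta_merged[l] = theta_pre[l] + sum_k lambda_{k,l} * tau_k[l]."""
    merged: Params = {}
    for name, base in pretrained.items():
        merged[name] = [
            b + sum(c[name] * tau[name][i] for c, tau in zip(coeffs, taus))
            for i, b in enumerate(base)
        ]
    return merged


def mean_entropy(probs: List[List[float]]) -> float:
    """Average Shannon entropy of predicted class distributions (the surrogate objective)."""
    total = 0.0
    for row in probs:
        total += -sum(p * math.log(p) for p in row if p > 0)
    return total / len(probs)
```

The task-wise variant is the special case in which every layer of task $k$ shares a single coefficient $\lambda_k$.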
# 低温原子量子不純物系における相互作用の制御 Controlling the interactions in a cold atom quantum impurity system ( http://arxiv.org/abs/2310.02771v2 ) ライセンス: Link先を確認	Thomas Hewitt, Tom Bertheas, Manan Jain, Yusuke Nishida, Giovanni Barontini,	(参考訳) 我々は、1個のK原子を光ツイーザーに捕捉し、極低温のRb原子の浴に浸す実験アーキテクチャを実装した。この領域では、捕捉された単一原子の運動は最低の量子振動準位に制限される。これにより、基本的で完全に制御可能な量子不純物系が実現される。K原子の捕捉には種選択的な双極子ポテンシャルを用いており、量子不純物と浴を独立に操作できる。我々は2つのサブシステム間の相互作用の評価と制御に焦点を当てる。この目的のためにフェシュバッハ分光を行い、相互作用の強さを特徴づけるKRb異種間散乱長について、いくつかの次元間閉じ込め誘起フェシュバッハ共鳴を検出した。このデータを次元間散乱の理論と比較し、良好な一致を得た。特筆すべきことに、基礎となる自由空間のs波相互作用に由来する一連のp波共鳴も検出した。さらに、浴の温度や相互作用の次元が変化したときに共鳴がどのように振る舞うかを調べた。加えて、光ツイーザーを生成する光の波長を微調整することで量子不純物を浴から遮蔽でき、相互作用を制御・最小化するための新たな有効な手段が得られる。量子不純物模型の量子シミュレーション、量子情報、量子熱力学では、量子化された系と浴の間の相互作用が強力でありながらほとんど活用されていない資源である。我々の結果は、これらの分野に様々な新しい可能性を開くものである。 We implement an experimental architecture in which a single atom of K is trapped in an optical tweezer, and is immersed in a bath of Rb atoms at ultralow temperatures. In this regime, the motion of the single trapped atom is confined to the lowest quantum vibrational levels. This realizes an elementary and fully controllable quantum impurity system. For the trapping of the K atom, we use a species-selective dipole potential, which allows us to independently manipulate the quantum impurity and the bath. We concentrate on the characterization and control of the interactions between the two subsystems. To this end, we perform Feshbach spectroscopy, detecting several inter-dimensional confinement-induced Feshbach resonances for the KRb interspecies scattering length, which parametrizes the strength of the interactions. We compare our data to a theory for inter-dimensional scattering, finding good agreement. Notably, we also detect a series of p-wave resonances stemming from the underlying free-space s-wave interactions. We further determine how the resonances behave as the temperature of the bath and the dimensionality of the interactions change. 
Additionally, we are able to screen the quantum impurity from the bath by finely tuning the wavelength of the light that produces the optical tweezer, providing us with a new effective tool to control and minimize the interactions. Our results open a range of new possibilities in quantum simulations of quantum impurity models, quantum information, and quantum thermodynamics, where the interactions between a quantized system and the bath are a powerful yet largely underutilized resource.	翻訳日:2024-05-30 04:17:08 公開日:2024-05-28
# 気候情報に関する大規模言語モデルの評価 Assessing Large Language Models on Climate Information ( http://arxiv.org/abs/2310.02932v2 ) ライセンス: Link先を確認	Jannis Bulian, Mike S. Schäfer, Afra Amini, Heidi Lam, Massimiliano Ciaramita, Ben Gaiarin, Michelle Chen Hübscher, Christian Buck, Niels G. Mede, Markus Leippold, Nadine Strauß,	(参考訳) 大規模言語モデル(LLM)の人気が高まるにつれ、極めて重要な領域においてその能力を評価する必要がある。気候変動に関する質問に対するLLMの応答を評価するため、科学コミュニケーション研究に基づく包括的な評価フレームワークを提案する。本フレームワークは、提示的妥当性と認識論的妥当性の両方を重視し、8つの次元と30の課題にわたってLLMの生成結果をきめ細かく分析する。我々の評価タスクは、AIが人間のパフォーマンスを補完し向上させうる、増えつつある困難な問題の実例である。AIによる支援と、関連分野の教育を受けた評価者に依拠する、スケーラブルな監督のための新しいプロトコルを導入する。多様な気候に関する質問セットに対して、近年のLLMをいくつか評価した。その結果、気候コミュニケーションの領域において、LLMの表面的な品質と認識論的な品質の間に大きな隔たりがあることが示された。 As Large Language Models (LLMs) rise in popularity, it is necessary to assess their capability in critically relevant domains. We present a comprehensive evaluation framework, grounded in science communication research, to assess LLM responses to questions about climate change. Our framework emphasizes both presentational and epistemological adequacy, offering a fine-grained analysis of LLM generations spanning 8 dimensions and 30 issues. Our evaluation task is a real-world example of a growing number of challenging problems where AI can complement and lift human performance. We introduce a novel protocol for scalable oversight that relies on AI Assistance and raters with relevant education. We evaluate several recent LLMs on a set of diverse climate questions. Our results point to a significant gap between surface and epistemological qualities of LLMs in the realm of climate communication.	翻訳日:2024-05-30 04:17:08 公開日:2024-05-28
# MinPrompt: Few-shot Question Answeringのためのグラフベースの最小プロンプトデータ拡張 MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering ( http://arxiv.org/abs/2310.05007v3 ) ライセンス: Link先を確認	Xiusi Chen, Jyun-Yu Jiang, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Wei Wang,	(参考訳) 少数ショット質問応答(QA)における最近の進歩は、主に事前学習済み大規模言語モデル(LLM)の能力と、特定の設定での微調整に依存している。事前学習段階ですでにLLMは強力な推論能力を備えているが、最良の結果を得るには、特定の領域に適応させるための微調整が依然として必要である。本稿では、微調整に最も有益なデータを選択することを提案する。これにより、オープンドメインQAタスクにおいて、同等あるいはより高い精度を保ちつつ、微調整プロセスの効率を向上させる。我々は、近似グラフアルゴリズムと教師なし質問生成に基づく、オープンドメインQAのための最小限のデータ拡張フレームワークMinPromptを提案する。まず、生テキストをグラフ構造に変換して異なる事実文の間に接続を構築する。次にグラフアルゴリズムを適用し、生テキストの情報を最も多くカバーするのに必要な最小限の文の集合を特定する。その後、特定された文の部分集合に基づいてQAペアを生成し、選択した文でモデルを訓練して最終モデルを得る。いくつかのベンチマークデータセットでの実験結果と理論的分析から、MinPromptはベースラインと同等以上の結果を高い効率で達成でき、F1スコアを一貫して改善することが示された。 Recent advances in few-shot question answering (QA) mostly rely on the power of pre-trained large language models (LLMs) and fine-tuning in specific settings. Although the pre-training stage has already equipped LLMs with powerful reasoning capabilities, LLMs still need to be fine-tuned to adapt to specific domains to achieve the best results. In this paper, we propose to select the most informative data for fine-tuning, thereby improving the efficiency of the fine-tuning process with comparable or even better accuracy on the open-domain QA task. We present MinPrompt, a minimal data augmentation framework for open-domain QA based on an approximate graph algorithm and unsupervised question generation. We transform the raw text into a graph structure to build connections between different factual sentences, then apply graph algorithms to identify the minimal set of sentences needed to cover the most information in the raw text. We then generate QA pairs based on the identified sentence subset and train the model on the selected sentences to obtain the final model. 
Empirical results on several benchmark datasets and theoretical analysis show that MinPrompt is able to achieve comparable or better results than baselines with a high degree of efficiency, bringing consistent improvements in F-1 scores.	翻訳日:2024-05-30 04:17:08 公開日:2024-05-28
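MinPrompt selects a minimal set of sentences that covers most of the information in the raw text. The abstract says only that an approximate graph algorithm is used. The sketch below therefore substitutes an assumed stand-in: the classic greedy set-cover heuristic over a hypothetical sentence-entity graph, in which each sentence id is mapped to the entities it mentions. It illustrates minimal-coverage selection and is not MinPrompt's actual algorithm. The function name and input format are made up for this example.

```python
from typing import Dict, List, Set


def greedy_sentence_cover(sentence_entities: Dict[int, Set[str]]) -> List[int]:
    """Greedily pick sentence ids until every entity is covered (H(n) ~ ln n approximation)."""
    uncovered: Set[str] = set().union(*sentence_entities.values())
    chosen: List[int] = []
    while uncovered:
        # Sentence covering the most still-uncovered entities; ties go to the lower id.
        best = max(sentence_entities, key=lambda s: (len(sentence_entities[s] & uncovered), -s))
        gain = sentence_entities[best] & uncovered
        if not gain:  # defensive: cannot happen when `uncovered` comes from the inputs
            break
        chosen.append(best)
        uncovered -= gain
    return chosen
```

For example, with sentences `{0: {"A", "B"}, 1: {"B", "C"}, 2: {"C", "D"}, 3: {"A", "B", "C"}}` the heuristic first picks sentence 3, then sentence 2 for the remaining entity `"D"`.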
# Pi-DUAL: 特権情報を用いてクリーンなラベルとノイズのあるラベルを区別する Pi-DUAL: Using Privileged Information to Distinguish Clean from Noisy Labels ( http://arxiv.org/abs/2310.06600v2 ) ライセンス: Link先を確認	Ke Wang, Guillermo Ortiz-Jimenez, Rodolphe Jenatton, Mark Collier, Efi Kokiopoulou, Pascal Frossard,	(参考訳) ラベルノイズは深層学習において広く見られる問題であり、訓練されたモデルの汎化性能を損なうことが多い。近年、この問題を緩和する効果的なアプローチとして、特権情報(PI)(訓練時にのみ利用でき、テスト時には利用できない情報)の活用が登場している。しかし、既存のPIベースの手法は、ラベルノイズへの過適合を防ぐという点で、PIを用いない手法を一貫して上回ることはできていない。この欠点に対処するため、PIを活用してクリーンなラベルと誤ったラベルを区別するよう設計されたアーキテクチャPi-DUALを導入する。Pi-DUALは、出力ロジットを、従来の入力特徴に基づく予測項と、PIのみに影響されるノイズ適合項に分解する。PIによって制御されるゲーティング機構がこれらの項の間で重点を適応的に切り替え、モデルがクリーンなラベルと誤ったラベルの学習経路を暗黙的に分離できるようにする。実験的に、Pi-DUALは主要なPIベンチマークで大幅な性能向上(例えばImageNet-PIで+6.8%)を達成し、テストセット精度で新たな最先端を確立した。さらに、Pi-DUALは訓練後にノイズの多いサンプルを特定する強力な手法でもあり、このタスクで他の有力な手法を上回る。全体として、Pi-DUALは、PIが利用可能な様々な実世界のシナリオにおいてラベルノイズの影響を緩和するための、シンプルでスケーラブルかつ実用的なアプローチである。 Label noise is a pervasive problem in deep learning that often compromises the generalization performance of trained models. Recently, leveraging privileged information (PI) -- information available only during training but not at test time -- has emerged as an effective approach to mitigate this issue. Yet, existing PI-based methods have failed to consistently outperform their no-PI counterparts in terms of preventing overfitting to label noise. To address this deficiency, we introduce Pi-DUAL, an architecture designed to harness PI to distinguish clean from wrong labels. Pi-DUAL decomposes the output logits into a prediction term, based on conventional input features, and a noise-fitting term influenced solely by PI. A gating mechanism steered by PI adaptively shifts focus between these terms, allowing the model to implicitly separate the learning paths of clean and wrong labels. Empirically, Pi-DUAL achieves significant performance improvements on key PI benchmarks (e.g., +6.8% on ImageNet-PI), establishing a new state-of-the-art test set accuracy. 
Additionally, Pi-DUAL is a potent method for identifying noisy samples post-training, outperforming other strong methods at this task. Overall, Pi-DUAL is a simple, scalable and practical approach for mitigating the effects of label noise in a variety of real-world scenarios with PI.	翻訳日:2024-05-30 04:17:08 公開日:2024-05-28
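The abstract describes Pi-DUAL's output as two logit terms mixed by a gate driven by PI:

- a prediction term computed from the regular input features;
- a noise-fitting term computed from PI.

One plausible reading of that mixing, assumed here rather than taken from the paper, is a convex combination of the two logit vectors. The sketch shows only that arithmetic. The function names are hypothetical, and the gate value is passed in directly instead of being produced by a PI network.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_logits(pred_logits: List[float], noise_logits: List[float], gate: float) -> List[float]:
    """Mix the feature-based prediction term with the PI-based noise-fitting term.

    gate = 0 keeps only the prediction term (sample treated as clean);
    gate = 1 keeps only the noise-fitting term (sample treated as mislabeled).
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    return [(1.0 - gate) * p + gate * n for p, n in zip(pred_logits, noise_logits)]
```

PI is unavailable at test time. Under this reading, inference would therefore use only the prediction term.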
# STELLA: 時空間局所アライメントによる継続的な音声・映像事前学習 STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment ( http://arxiv.org/abs/2310.08204v3 ) ライセンス: Link先を確認	Jaewoo Lee, Jaehong Yoon, Wonjae Kim, Yunji Kim, Sung Ju Hwang,	(参考訳) 多様な音声・映像のセマンティクスを時間とともに継続的に学習することは、進化し続ける世界における音声関連の推論タスクにとって不可欠である。しかし、これは自明ではない問題であり、2つの重要な課題をもたらす。1つは音声と映像のペア間の疎な時空間相関、もう1つは音声と映像の関係を忘れてしまうマルチモーダル相関の上書きである。この問題に対処するため、2つの新しいアイデアに基づく継続的な音声・映像事前学習手法を提案する。(1) 局所的パッチ重要度スコアリング: マルチモーダルエンコーダを導入して各パッチの重要度スコアを決定し、意味的に絡み合った音声・映像パッチを強調する。(2) リプレイ誘導型相関評価: ドリフトによって以前に学習した視聴覚知識が損なわれるのを抑えるため、過去のステップに対する現在のパッチの相関を評価し、過去のステップと高い相関を示すパッチを特定する。これら2つのアイデアから得られた結果に基づいて確率的なパッチ選択を行い、効果的な継続的音声・映像事前学習を実現する。複数のベンチマークによる実験結果から、本手法は強力な継続学習ベースラインと比較して、ゼロショット検索タスクで3.69%pの相対的な性能向上を達成しつつ、メモリ消費量を約45%削減することが示された。 Continuously learning a variety of audio-video semantics over time is crucial for audio-related reasoning tasks in our ever-evolving world. However, this is a nontrivial problem and poses two critical challenges: sparse spatio-temporal correlation between audio-video pairs and multimodal correlation overwriting that forgets audio-video relations. To tackle this problem, we propose a new continual audio-video pre-training method with two novel ideas: (1) Localized Patch Importance Scoring: we introduce a multimodal encoder to determine the importance score for each patch, emphasizing semantically intertwined audio-video patches. (2) Replay-guided Correlation Assessment: to reduce the corruption of previously learned audiovisual knowledge due to drift, we propose to assess the correlation of the current patches on the past steps to identify the patches exhibiting high correlations with the past steps. Based on the results from the two ideas, we perform probabilistic patch selection for effective continual audio-video pre-training. 
Experimental validation on multiple benchmarks shows that our method achieves a relative performance gain of 3.69%p in zero-shot retrieval tasks compared to strong continual learning baselines, while reducing memory consumption by ~45%.	翻訳日:2024-05-30 04:17:08 公開日:2024-05-28
# ディープニューラルネットワーク分類器における潜在バイナリ符号化の創発 Emergence of Latent Binary Encoding in Deep Neural Network Classifiers ( http://arxiv.org/abs/2310.08224v4 ) ライセンス: Link先を確認	Luigi Sbailò, Luca Ghiringhelli,	(参考訳) 深層ニューラルネットワーク分類器の潜在空間におけるバイナリ符号化の出現について検討する。このようなバイナリ符号化は、線形の最終手前層(penultimate layer)を導入することで誘起される。この層は、訓練中に潜在表現を圧縮するよう特別に設計された損失関数を用いる。圧縮と情報保持のトレードオフの結果、ネットワークは潜在空間の各次元について、2つの可能な値のうち1つだけをとるよう学習する。バイナリ符号化は、同じクラスのすべての表現が同一の点に崩壊することで引き起こされ、その点は超立方体の頂点に対応する。複雑さの異なるいくつかのデータセットを解析することにより、バイナリ符号化の出現がロバスト性を大幅に向上させるとともに、ネットワークの信頼性と汎化性能も大きく改善するという実証的証拠を示す。 We investigate the emergence of binary encoding within the latent space of deep-neural-network classifiers. Such binary encoding is induced by the introduction of a linear penultimate layer, which employs during training a loss function specifically designed to compress the latent representations. As a result of a trade-off between compression and information retention, the network learns to assume only one of two possible values for each dimension in the latent space. The binary encoding is provoked by the collapse of all representations of the same class to the same point, which corresponds to the vertex of a hypercube. By analyzing several datasets of increasing complexity, we provide empirical evidence that the emergence of binary encoding dramatically enhances robustness while also significantly improving the reliability and generalization of the network.	翻訳日:2024-05-30 04:17:08 公開日:2024-05-28
# 事前学習とマルチタスク微調整によるマルチモーダルプロンプトを用いたロボット操作の習得 Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning ( http://arxiv.org/abs/2310.09676v2 ) ライセンス: Link先を確認	Jiachen Li, Qiaozi Gao, Michael Johnston, Xiaofeng Gao, Xuehai He, Suhaila Shakiah, Hangjie Shi, Reza Ghanadan, William Yang Wang,	(参考訳) プロンプトに基づく学習は、大規模言語モデル(LLM)の大きな成功に寄与する有力なパラダイムであることが示されてきた。言語タスクでの成功に触発され、既存の研究はLLMを身体化された指示追従やタスク計画に活用してきた。本研究では、視覚信号とテキスト記述が交互に現れるマルチモーダルプロンプトを理解するようロボットを訓練するという問題に取り組む。この種のタスクは、視覚信号と言語信号の相互関係と相補性を理解するロボットの能力にとって大きな課題となる。本研究では、マルチタスクの専門家軌跡から、マルチモーダルプロンプトによるロボット操作の方策を学習する効果的なフレームワークを提案する。本手法は、逆ダイナミクス事前学習とマルチタスク微調整を行う2段階の訓練パイプラインから構成される。マルチモーダル理解を容易にするため、事前学習済みLMに視覚入力への残差接続を加えてマルチモーダルプロンプトエンコーダを設計し、行動次元間の依存関係をモデル化する。実験では、VIMA-BENCHで本手法の有効性を評価し、新たな最先端(成功率で10%の向上)を確立した。さらに、本モデルが優れた文脈内学習(in-context learning)能力を示すことを実証した。プロジェクトページ: \url{https://midas-icml.github.io/}。 Prompt-based learning has been demonstrated as a compelling paradigm contributing to the tremendous success of large language models (LLMs). Inspired by their success in language tasks, existing research has leveraged LLMs in embodied instruction following and task planning. In this work, we tackle the problem of training a robot to understand multimodal prompts, interleaving vision signals with text descriptions. This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals. In this work, we introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts from multi-task expert trajectories. Our method consists of a two-stage training pipeline that performs inverse dynamics pretraining and multi-task finetuning. To facilitate multimodal understanding, we design our multimodal prompt encoder by augmenting a pretrained LM with a residual connection to the visual input and model the dependencies among action dimensions. 
Empirically, we evaluate the efficacy of our method on the VIMA-BENCH and establish a new state-of-the-art (10% improvement in success rate). Moreover, we demonstrate that our model exhibits remarkable in-context learning ability. Project page: \url{https://midas-icml.github.io/}.	翻訳日:2024-05-30 04:17:08 公開日:2024-05-28
# AutoDIR: 潜在拡散によるオールインワン自動画像復元 AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion ( http://arxiv.org/abs/2310.10123v5 ) ライセンス: Link先を確認	Yitong Jiang, Zhaoyang Zhang, Tianfan Xue, Jinwei Gu,	(参考訳) 本稿では、潜在拡散を取り入れた革新的なオールインワン画像復元システムAutoDIRを提案する。AutoDIRは、様々な未知の劣化を受けた画像を自動的に識別し復元する能力に優れている。AutoDIRは直感的なオープン語彙の画像編集機能を提供し、ユーザーは好みに応じて画像をカスタマイズし補正することができる。具体的には、AutoDIRは2つの主要な段階で構成される。1つ目は、入力画像の未知の劣化を自動的に検出する、意味に依存しない視覚言語モデルに基づくブラインド画像品質評価(BIQA)段階である。2つ目は、構造補正された潜在拡散を用いて複数種類の画像劣化を処理するオールインワン画像復元(AIR)段階である。大規模な実験的評価により、AutoDIRは幅広い画像復元タスクにおいて最先端の手法を上回ることが示された。AutoDIRの設計は、(テキストプロンプトによる)柔軟なユーザ制御と、画像復元の基盤モデルとしての新しいタスクへの汎化も可能にする。プロジェクトは以下で公開されている: \url{https://jiangyitong.github.io/AutoDIR_webpage/}。 We present AutoDIR, an innovative all-in-one image restoration system incorporating latent diffusion. AutoDIR excels in its ability to automatically identify and restore images suffering from a range of unknown degradations. AutoDIR offers intuitive open-vocabulary image editing, empowering users to customize and enhance images according to their preferences. Specifically, AutoDIR consists of two key stages: a Blind Image Quality Assessment (BIQA) stage based on a semantic-agnostic vision-language model which automatically detects unknown image degradations for input images, and an All-in-One Image Restoration (AIR) stage that utilizes structural-corrected latent diffusion to handle multiple types of image degradations. Extensive experimental evaluation demonstrates that AutoDIR outperforms state-of-the-art approaches across a wide range of image restoration tasks. The design of AutoDIR also enables flexible user control (via text prompt) and generalization to new tasks as a foundation model of image restoration. Project is available at: \url{https://jiangyitong.github.io/AutoDIR_webpage/}.	翻訳日:2024-05-30 04:17:08 公開日:2024-05-28
# 完全準同型暗号における効率的なプライベート推論のための最適化層別近似 Optimized Layerwise Approximation for Efficient Private Inference on Fully Homomorphic Encryption ( http://arxiv.org/abs/2310.10349v2 ) ライセンス: Link先を確認	Junghyun Lee, Eunsang Lee, Young-Sik Kim, Yongwoo Lee, Joon-Woo Lee, Yongjune Kim, Jong-Seon No,	(参考訳) 近年の研究では、特にプライベート推論(PI)を対象として、準同型暗号(HE)を利用したプライバシー保護型深層ニューラルネットワークの展開が検討されている。多くの研究がPIにおいて近似を考慮した訓練(AAT)アプローチを試みている。この手法は、モデルの再訓練を許容したうえで、活性化関数をHE上で計算しやすい低次多項式に置き換える。しかし、訓練環境の制約により、既存の平文モデルの事前学習済みパラメータを再訓練せずに用いる訓練後近似(PTA)を検討する必要があることが多い。既存のPTA研究は、近似による精度低下を抑えるために全層の活性化関数を一様に高次の多項式で近似しており、計算時間が大幅に増大していた。本研究では、最適化層別近似(OLA)を提案する。OLAは、PTAシナリオにおいて層ごとに異なる近似多項式を用いることで、精度低下と計算時間の両方を最適化する体系的フレームワークである。効率的な近似のため、最適化問題を構築する際に各活性化関数の実際の入力分布を考慮し、分類精度に対する層ごとの影響を反映させる。さらに、この最適化問題を解く動的計画法を提示し、最適化された層ごとの次数を多項式時間で求める。その結果、OLA法は、一様な次数の多項式を用いる従来の最先端実装と比較して、ResNet-20モデルとResNet-32モデルの推論時間をそれぞれ3.02分の1、2.82分の1に短縮した。さらに、提案手法を用いてConvNeXtモデルのGELU関数を3次多項式のみで置き換えることで、バックボーンモデルを変更することなくCIFAR-10の分類に成功した。 Recent studies have explored the deployment of privacy-preserving deep neural networks utilizing homomorphic encryption (HE), especially for private inference (PI). Many works have attempted the approximation-aware training (AAT) approach in PI, changing the activation functions of a model to low-degree polynomials that are easier to compute on HE by allowing model retraining. However, due to constraints in the training environment, it is often necessary to consider post-training approximation (PTA), using the pre-trained parameters of the existing plaintext model without retraining. Existing PTA studies have uniformly approximated the activation function in all layers to a high degree to mitigate accuracy loss from approximation, leading to significant time consumption. This study proposes an optimized layerwise approximation (OLA), a systematic framework that optimizes both accuracy loss and time consumption by using different approximation polynomials for each layer in the PTA scenario. 
For efficient approximation, we reflect the layerwise impact on the classification accuracy by considering the actual input distribution of each activation function while constructing the optimization problem. Additionally, we provide a dynamic programming technique to solve the optimization problem and achieve the optimized layerwise degrees in polynomial time. As a result, the OLA method reduces inference times for the ResNet-20 model and the ResNet-32 model by factors of 3.02 and 2.82, respectively, compared to prior state-of-the-art implementations employing uniform degree polynomials. Furthermore, we successfully classified CIFAR-10 by replacing the GELU function in the ConvNeXt model with only 3-degree polynomials using the proposed method, without modifying the backbone model.	翻訳日:2024-05-30 04:17:08 公開日:2024-05-28
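The abstract states that OLA solves its layer-wise degree selection with dynamic programming but does not spell out the objective. The sketch below assumes a simple knapsack-like formulation:

- each layer offers candidate polynomial degrees;
- each candidate has a hypothetical latency cost and an integer accuracy-loss cost;
- the DP picks one degree per layer to minimise total latency under a loss budget.

All names and numbers are illustrative, not the paper's cost model or measured values.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cost model: (polynomial degree, estimated HE latency, accuracy-loss units).
Option = Tuple[int, float, int]


def choose_degrees(layers: List[List[Option]], budget: int) -> Optional[Tuple[float, List[int]]]:
    """Minimise total latency subject to total accuracy-loss units <= budget.

    Knapsack-style DP: state = loss units spent so far, value = cheapest (latency, degrees).
    Runs in O(layers * budget * options), i.e. pseudo-polynomial in the integer budget.
    """
    best: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}
    for options in layers:
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for spent, (latency, degrees) in best.items():
            for degree, cost, loss in options:
                total_loss = spent + loss
                if total_loss > budget:
                    continue
                cand = (latency + cost, degrees + [degree])
                if total_loss not in nxt or cand[0] < nxt[total_loss][0]:
                    nxt[total_loss] = cand
        best = nxt
        if not best:  # no feasible assignment for the layers seen so far
            return None
    return min(best.values(), key=lambda item: item[0])
```

As a usage example, take two layers with options `[(3, 1.0, 5), (7, 2.0, 2), (15, 4.0, 0)]` and `[(3, 1.0, 4), (7, 2.0, 1), (15, 4.0, 0)]` and a budget of 5 loss units. The DP picks degree 7 for both layers (total latency 4.0). A zero budget forces the high-degree option everywhere.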
# プロンプトエンジニアリングの可能性を解き放つ:包括的レビュー Unleashing the potential of prompt engineering: a comprehensive review ( http://arxiv.org/abs/2310.14735v3 ) ライセンス: Link先を確認	Banghao Chen, Zhaofeng Zhang, Nicolas Langrené, Shengxin Zhu,	(参考訳) この包括的レビューでは、大規模言語モデル(LLM)とマルチモーダル言語モデル(MMLM)の領域におけるプロンプトエンジニアリングの変革的な可能性を探る。1950年代のAIの誕生から、ニューラルネットワークや深層学習アーキテクチャの出現に至るまで、AIの発展は、GPT-4やBERTのような高度なLLM、DALL-EやCLIPのようなMMLMに結実した。これらのモデルは、職場の自動化、医療、教育といった様々な分野のタスクに革命をもたらした。プロンプトエンジニアリングは、これらのモデルの有用性と精度を最大化する重要な技術として台頭している。本稿では、プロンプトエンジニアリングの基礎的手法と高度な手法の両方を掘り下げる。その中には、モデル性能を大きく向上させるChain of Thought、Self-consistency、Generated Knowledgeなどの手法が含まれる。さらに、Multi-modal Prompt Learning (MaPLe)、Conditional Prompt Learning、Context Optimizationといった革新的なアプローチによるマルチモーダルデータの統合を検討する。この議論において重要なのはAIセキュリティの側面であり、特にプロンプトエンジニアリングの脆弱性を悪用する敵対的攻撃である。これらのリスクを軽減し、モデルの頑健性を高めるための戦略を徹底的にレビューする。プロンプト手法の評価は主観的指標と客観的指標の両方を通じて扱い、その有効性を頑健に分析する。本レビューは、AIの能力を前進させる上でプロンプトエンジニアリングが果たす極めて重要な役割を強調し、今後の研究と応用のための構造化されたフレームワークを提供する。 This comprehensive review explores the transformative potential of prompt engineering within the realm of large language models (LLMs) and multimodal language models (MMLMs). The development of AI, from its inception in the 1950s to the emergence of neural networks and deep learning architectures, has culminated in sophisticated LLMs like GPT-4 and BERT, as well as MMLMs like DALL-E and CLIP. These models have revolutionized tasks in diverse fields such as workplace automation, healthcare, and education. Prompt engineering emerges as a crucial technique to maximize the utility and accuracy of these models. This paper delves into both foundational and advanced methodologies of prompt engineering, including techniques like Chain of Thought, Self-consistency, and Generated Knowledge, which significantly enhance model performance. Additionally, it examines the integration of multimodal data through innovative approaches such as Multi-modal Prompt Learning (MaPLe), Conditional Prompt Learning, and Context Optimization. 
Critical to this discussion is the aspect of AI security, particularly adversarial attacks that exploit vulnerabilities in prompt engineering. Strategies to mitigate these risks and enhance model robustness are thoroughly reviewed. The evaluation of prompt methods is addressed through both subjective and objective metrics, ensuring a robust analysis of their efficacy. This review underscores the pivotal role of prompt engineering in advancing AI capabilities, providing a structured framework for future research and application.	翻訳日:2024-05-30 04:17:08 公開日:2024-05-28
# 良い道具は仕事の半分:ディープラーニングプロジェクトにおけるツールの使用 Good Tools are Half the Work: Tool Usage in Deep Learning Projects ( http://arxiv.org/abs/2310.19124v2 ) ライセンス: Link先を確認	Evangelia Panourgia, Theodoros Plessas, Ilias Balampanis, Diomidis Spinellis,	(参考訳) 深層学習(DL)の手法や技術の普及により、SE4DL(Software Engineering for Deep Learning)、すなわち深層学習ソフトウェアへのソフトウェア工学(SE)の実践の適用というトピックへの関心が高まっている。DLソフトウェアのデータ駆動型かつ非決定論的なパラダイムは新たな工学上の課題をもたらすが、DLを対象としたSEツールの開発にはほとんど労力が注がれていない。一方で、DL特有の非SEの問題に取り組むツールは盛んに使われており、「MLOps(Machine Learning Operations)ツール」という総称で呼ばれている。それでも、既存の文献は、DLソフトウェア開発における従来型SEツールの有用性を支持している。オープンソースソフトウェアにおけるツール使用に関する先行のマイニングソフトウェアリポジトリ(MSR)研究を基に、Pythonを主要なプログラミング言語とする人気の応用DLプロジェクトで採用されている従来型SEツールとMLOpsツールを特定した。調査したGitHubリポジトリの約63%には、少なくとも1つの従来型SEツールが含まれていた。ソフトウェア構築ツールが最も広く採用されている一方、管理・保守ツールについてはその逆である。使用されているMLOpsツールは比較的少なく、74のツールのサンプルのうち、少なくとも1つのリポジトリで使用されていたのは20のみであった。その大半はプロプライエタリではなくオープンソースであった。これらのツールの1つであるTensorBoardは、調査対象のリポジトリの約半数で採用されていた。したがって、従来型SEツールの広範な使用は、DLソフトウェアとの関連性を示している。MLOpsツールの採用について、特定のツール種別の関連性、必要なツールの開発、既存ツールの利用促進の方法に焦点を当てたさらなる研究が推奨される。 The rising popularity of deep learning (DL) methods and techniques has invigorated interest in the topic of SE4DL (Software Engineering for Deep Learning), the application of software engineering (SE) practices on deep learning software. Despite the novel engineering challenges brought on by the data-driven and non-deterministic paradigm of DL software, little work has been invested into developing DL-targeted SE tools. On the other hand, tools tackling non-SE issues specific to DL are actively used and referred to under the umbrella term "MLOps (Machine Learning Operations) tools". Nevertheless, the available literature supports the utility of conventional SE tooling in DL software development. 
Building upon previous mining software repositories (MSR) research on tool usage in open-source software works, we identify conventional and MLOps tools adopted in popular applied DL projects that use Python as the main programming language. About 63% of the GitHub repositories we examined contained at least one conventional SE tool. Software construction tools are the most widely adopted, while the opposite applies to management and maintenance tools. Relatively few MLOps tools were found to be in use, with only 20 tools out of a sample of 74 used in at least one repository. The majority of them were open-source rather than proprietary. One of these tools, TensorBoard, was found to be adopted in about half of the repositories in our study. Consequently, the widespread use of conventional SE tooling demonstrates its relevance to DL software. Further research is recommended on the adoption of MLOps tooling, focusing on the relevance of particular tool types, the development of required tools, as well as ways to promote the use of already available tools.	翻訳日:2024-05-30 04:17:08 公開日:2024-05-28
# Q条件付き状態エントロピー探索によるオフライン・オンライン強化学習の改善 Improving Offline-to-Online Reinforcement Learning with Q Conditioned State Entropy Exploration ( http://arxiv.org/abs/2310.19805v4 ) ライセンス: Link先を確認	Ziqi Zhang, Xiao Xiong, Zifeng Zhuang, Jinxin Liu, Donglin Wang,	(参考訳) オフライン強化学習(RL)で事前学習した方策をどのように微調整するかを研究することは、RLアルゴリズムのサンプル効率を高める上で極めて重要である。しかし、事前学習済みの方策を直接微調整すると、準最適な性能にとどまることが多い。これは主に、オフライン事前学習段階とオンライン微調整段階の間の分布シフトによるものである。具体的には、分布シフトは有効なオンラインサンプルの獲得を制限し、最終的にオンライン微調整の性能に影響を及ぼす。オフライン段階とオンライン段階の間の分布シフトを縮小するため、内発的報酬としてQ条件付き状態エントロピー(QCSE)を提案する。具体的には、QCSEはそれぞれのQ値を考慮して、全サンプルの状態エントロピーを個別に最大化する。このアプローチは、高頻度サンプルにペナルティを与えつつ低頻度サンプルの探索を促し、暗黙的に状態周辺分布マッチング(SMM)を達成する。これにより最適な性能を確保し、制約ベースの手法の漸近的な準最適性を解消する。さらに、QCSEは様々なRLアルゴリズムにシームレスに統合でき、オンライン微調整の性能を向上させる。主張を検証するため広範な実験を行い、QCSEによる大幅な改善(CQLで約13%、Cal-QLで8%)を確認した。さらに、実験を他のアルゴリズムにも拡張し、QCSEの汎用性を確認した。 Studying how to fine-tune offline reinforcement learning (RL) pre-trained policy is profoundly significant for enhancing the sample efficiency of RL algorithms. However, directly fine-tuning pre-trained policies often results in sub-optimal performance. This is primarily due to the distribution shift between offline pre-training and online fine-tuning stages. Specifically, the distribution shift limits the acquisition of effective online samples, ultimately impacting the online fine-tuning performance. In order to narrow down the distribution shift between offline and online stages, we propose Q conditioned state entropy (QCSE) as an intrinsic reward. Specifically, QCSE maximizes the state entropy of all samples individually, considering their respective Q values. This approach encourages exploration of low-frequency samples while penalizing high-frequency ones, and implicitly achieves State Marginal Matching (SMM), thereby ensuring optimal performance, solving the asymptotic sub-optimality of constraint-based approaches. Additionally, QCSE can seamlessly integrate into various RL algorithms, enhancing online fine-tuning performance. 
To validate our claim, we conduct extensive experiments, and observe significant improvements with QCSE (about 13% for CQL and 8% for Cal-QL). Furthermore, we extended experimental tests to other algorithms, affirming the generality of QCSE.	翻訳日:2024-05-30 04:17:08 公開日:2024-05-28
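QCSE uses state entropy, conditioned on Q values, as an intrinsic reward. State entropy is commonly estimated non-parametrically from k-nearest-neighbour distances. The sketch below assumes that estimator and approximates "conditioning on Q" by searching for neighbours only within Q-value bins. The binning scheme, the function name and its parameters are hypothetical, not the authors' formulation. The example only illustrates why rarely visited states receive larger bonuses.

```python
import math
from typing import List, Sequence


def q_conditioned_knn_rewards(
    states: List[Sequence[float]], q_values: List[float], k: int = 1, q_bin_width: float = 1.0
) -> List[float]:
    """Intrinsic reward r_i = log(1 + distance from s_i to its k-th nearest neighbour).

    Neighbours are searched only among states whose Q value falls in the same bin,
    a hypothetical way of conditioning the particle entropy estimate on Q.
    """
    bins = [math.floor(q / q_bin_width) for q in q_values]
    rewards: List[float] = []
    for i, s in enumerate(states):
        dists = sorted(
            math.dist(s, other)
            for j, other in enumerate(states)
            if j != i and bins[j] == bins[i]
        )
        # Too few neighbours in this bin: give no bonus rather than guess a distance.
        rewards.append(math.log1p(dists[k - 1]) if len(dists) >= k else 0.0)
    return rewards
```

Take the states `(0, 0)`, `(0, 1)` and `(10, 0)` in a single Q bin. The isolated state `(10, 0)` receives $\log 11$, while each of the two nearby states receives $\log 2$.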
# BadLlama:Llama 2-Chat 13Bから安全性の微調整を安価に除去 BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B ( http://arxiv.org/abs/2311.00117v3 ) ライセンス: Link先を確認	Pranav Gade, Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish,	(参考訳) Llama 2-Chatは、Metaが開発し一般公開した大規模言語モデルのコレクションである。MetaはLlama 2-Chatが有害なコンテンツの出力を拒否するよう微調整した。しかし我々は、モデルの重みが公開されていることで、悪意のある者がLlama 2-Chatの安全対策を安価に回避し、Llama 2の能力を悪意ある目的のために武器化できるという仮説を立てた。我々は、一般的な能力を維持したまま、200ドル未満でLlama 2-Chat 13Bから安全性の微調整を効果的に取り消せることを実証した。本研究の結果は、モデルの重みが一般公開された場合、安全性の微調整は悪用の防止に有効でないことを示している。将来のモデルは大規模に害を及ぼす能力が大幅に高まる可能性が高い。そのため、AI開発者はモデルの重みを公開するかどうかを検討する際に、微調整による脅威に対処することが不可欠である。 Llama 2-Chat is a collection of large language models that Meta developed and released to the public. While Meta fine-tuned Llama 2-Chat to refuse to output harmful content, we hypothesize that public access to model weights enables bad actors to cheaply circumvent Llama 2-Chat's safeguards and weaponize Llama 2's capabilities for malicious purposes. We demonstrate that it is possible to effectively undo the safety fine-tuning from Llama 2-Chat 13B for less than $200, while retaining its general capabilities. Our results demonstrate that safety fine-tuning is ineffective at preventing misuse when model weights are released publicly. Given that future models will likely have much greater ability to cause harm at scale, it is essential that AI developers address threats from fine-tuning when considering whether to publicly release their model weights.	翻訳日:2024-05-30 04:17:08 公開日:2024-05-28
# 制御可能なテキスト要約: 課題, アプローチ, 展望 - 調査- Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey ( http://arxiv.org/abs/2311.09212v3 ) ライセンス: Link先を確認	Ashok Urlana, Pruthwik Mishra, Tathagato Roy, Rahul Mishra,	(参考訳) 汎用的なテキスト要約アプローチは、個々のユーザの特定の意図やニーズに対処できないことが多い。近年,特定の目的やユーザニーズに合わせて,より緊密に調整・制御された要約手法の開発に学術的注目が向けられている。制御可能な要約に関する研究が増えているにもかかわらず、この文脈で使用される多様な制御可能な属性を徹底的に調査し、関連する課題を掘り下げ、既存の解決策を調べた包括的なサーベイは存在しない。本サーベイでは、制御可能なテキスト要約(CTS)タスクを形式化し、共通する特性と目的に応じて制御可能な属性を分類し、各カテゴリにおける既存のデータセットと手法を徹底的に検証する。さらに,本研究の結果から限界や研究のギャップを明らかにするとともに,CTSの潜在的な解決策や今後の方向性を探求する。 CTS論文の詳細な分析はhttps://github.com/ashokurlana/controllable_text_summarization_survey で公開しています。 Generic text summarization approaches often fail to address the specific intent and needs of individual users. Recently, scholarly attention has turned to the development of summarization methods that are more closely tailored and controlled to align with specific objectives and user needs. Despite a growing corpus of controllable summarization research, there is no comprehensive survey available that thoroughly explores the diverse controllable attributes employed in this context, delves into the associated challenges, and investigates the existing solutions. In this survey, we formalize the Controllable Text Summarization (CTS) task, categorize controllable attributes according to their shared characteristics and objectives, and present a thorough examination of existing datasets and methods within each category. Moreover, based on our findings, we uncover limitations and research gaps, while also exploring potential solutions and future directions for CTS. We release our detailed analysis of CTS papers at https://github.com/ashokurlana/controllable_text_summarization_survey.	翻訳日:2024-05-30 04:07:24 公開日:2024-05-28
# 潜在空間探索を用いたポリシー適応による組合せ最適化 Combinatorial Optimization with Policy Adaptation using Latent Space Search ( http://arxiv.org/abs/2311.13569v2 ) ライセンス: Link先を確認	Felix Chalumeau, Shikha Surana, Clement Bonnet, Nathan Grinsztajn, Arnu Pretorius, Alexandre Laterre, Thomas D. Barrett,	(参考訳) Combinatorial Optimizationは多くの現実世界のアプリケーションを支えるが、これらの複雑なNPハードを解くために高性能なアルゴリズムを設計することは、依然として重要な研究課題である。強化学習(RL)は、幅広い問題領域にわたるヒューリスティックを設計するための汎用的なフレームワークを提供する。しかし、顕著な進歩にもかかわらず、RLは産業用解決器をGo-toソリューションとして置き換えていない。現在のアプローチでは、ソリューションを構築するが、単一のポリシーから多数のソリューションを確率的にサンプリングしたり、個々の問題インスタンスに対して計算的に高価な微調整を施したりといった、限定的な分散を伴う探索手順に頼っている事前学習ヒューリスティックが強調されている。提案手法は,事前学習中に推論時間における性能的探索を期待する直観に基づいて,連続的な潜在空間上で条件付けられた多様かつ専門的なポリシーの分布をパラメータ化する新しいRL手法であるCompASSを提案する。トラベリングセールスマン、キャパシタンドカールーティング、ジョブショップスケジューリングの3つの標準問題におけるCompASSを評価し、検索戦略を実証する。 (i)11の標準ベンチマークタスクと最先端のアプローチを上回ります。 (ii) は、手続き的に変換された18のインスタンス分布の集合上で、他のすべてのアプローチを上回り、より良く一般化する。 Combinatorial Optimization underpins many real-world applications and yet, designing performant algorithms to solve these complex, typically NP-hard, problems remains a significant research challenge. Reinforcement Learning (RL) provides a versatile framework for designing heuristics across a broad spectrum of problem domains. However, despite notable progress, RL has not yet supplanted industrial solvers as the go-to solution. Current approaches emphasize pre-training heuristics that construct solutions but often rely on search procedures with limited variance, such as stochastically sampling numerous solutions from a single policy or employing computationally expensive fine-tuning of the policy on individual problem instances. Building on the intuition that performant search at inference time should be anticipated during pre-training, we propose COMPASS, a novel RL approach that parameterizes a distribution of diverse and specialized policies conditioned on a continuous latent space. 
We evaluate COMPASS across three canonical problems - Travelling Salesman, Capacitated Vehicle Routing, and Job-Shop Scheduling - and demonstrate that our search strategy (i) outperforms state-of-the-art approaches on 11 standard benchmarking tasks and (ii) generalizes better, surpassing all other approaches on a set of 18 procedurally transformed instance distributions.	翻訳日:2024-05-30 04:07:24 公開日:2024-05-28
# 訓練の進行全体にわたって: データセット・プルーニングを強化する時間的デュアル深度スコアリング(TDDS) Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning ( http://arxiv.org/abs/2311.13613v3 ) ライセンス: Link先を確認	Xin Zhang, Jiawei Du, Yunsong Li, Weiying Xie, Joey Tianyi Zhou,	(参考訳) データセット・プルーニングは、オリジナルの完全なデータセットに匹敵するパフォーマンスを達成可能なコアセットを構築することを目的としている。既存のデータセットプルーニング手法の多くは、代表サンプルを特定するためにスナップショットベースの基準に依存しており、様々なプルーニング設定やクロスアーキテクチャのシナリオにおいて汎化性能が低くなることが多い。近年の研究では、忘却イベントや確率変化などの要因を含め、考慮する訓練ダイナミクスの範囲を広げることでこの問題に対処しており、典型的には平均化によるアプローチが用いられている。しかし、これらの研究では、平均化では十分に強調されない可能性のある、よく汎化したサンプルを見落とさずに、より広い範囲の訓練ダイナミクスを統合することが難しい。本研究では,この問題に対処するため,時間的デュアル深度スコアリング(TDDS)と呼ばれる新しいデータセット・プルーニング手法を提案する。 TDDSは、広範な訓練ダイナミクスを取り入れることと、データセット・プルーニングのための代表的なサンプルを特定することのバランスを達成するために、二重深度戦略を利用する。第1の深さでは、訓練の進行にわたる各サンプルの個々の寄与の系列を推定し、訓練ダイナミクスを包括的に統合する。第2の深さでは,第1の深さで得られたサンプルごとの寄与の変動性に着目し,よく汎化したサンプルを強調する。 CIFARとImageNetデータセットで実施された大規模な実験は、従来のSOTA手法に対するTDDSの優位性を検証する。具体的には, CIFAR-100では, 10%のトレーニングデータのみで54.51%の精度を達成し, ランダム選択を7.83%, その他の比較手法を少なくとも12.69%上回った。 Dataset pruning aims to construct a coreset capable of achieving performance comparable to the original, full dataset. Most existing dataset pruning methods rely on snapshot-based criteria to identify representative samples, often resulting in poor generalization across various pruning and cross-architecture scenarios. Recent studies have addressed this issue by expanding the scope of training dynamics considered, including factors such as forgetting events and probability changes, typically using an averaging approach. However, these works struggle to integrate a broader range of training dynamics without overlooking well-generalized samples, which may not be sufficiently highlighted in an averaging manner. In this study, we propose a novel dataset pruning method termed Temporal Dual-Depth Scoring (TDDS), to tackle this problem. 
TDDS utilizes a dual-depth strategy to achieve a balance between incorporating extensive training dynamics and identifying representative samples for dataset pruning. In the first depth, we estimate the series of each sample's individual contributions spanning the training progress, ensuring comprehensive integration of training dynamics. In the second depth, we focus on the variability of the sample-wise contributions identified in the first depth to highlight well-generalized samples. Extensive experiments conducted on CIFAR and ImageNet datasets verify the superiority of TDDS over previous SOTA methods. Specifically, on CIFAR-100, our method achieves 54.51% accuracy with only 10% of the training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%.	翻訳日:2024-05-30 04:07:24 公開日:2024-05-28
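A rough sketch of the dual-depth idea in Python with NumPy. It assumes, hypothetically, that the first-depth contribution of a sample at epoch t is the KL divergence between its softmax outputs at epochs t and t-1, that the second depth is the variance of those contributions inside a sliding window averaged over all windows, and that higher variability means higher priority for the coreset. These choices and the function names are illustrative, not the paper's exact definitions.

```python
import numpy as np

def tdds_like_scores(probs, window=3):
    """probs: array of shape (epochs, n_samples, n_classes) with per-epoch softmax outputs.

    Depth 1 (assumed): contribution_t = KL(p_t || p_{t-1}) for each sample.
    Depth 2 (assumed): variance of contributions inside each sliding window,
    averaged over windows, giving one score per sample.
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    contrib = np.sum(probs[1:] * (np.log(probs[1:]) - np.log(probs[:-1])), axis=-1)
    n_steps = contrib.shape[0]
    if n_steps < window:
        raise ValueError("need at least window + 1 epochs of outputs")
    # Shape (n_windows, window, n_samples).
    windows = np.stack([contrib[t:t + window] for t in range(n_steps - window + 1)])
    return windows.var(axis=1).mean(axis=0)

def select_coreset(scores, keep_ratio):
    """Indices of the highest-scoring samples, keeping keep_ratio of the dataset."""
    k = max(1, int(round(keep_ratio * len(scores))))
    return np.argsort(scores)[::-1][:k]
```

A sample whose predictions never change gets a score of zero, while a sample whose predictions keep moving gets a positive score and is kept first.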
# Vamos:ビデオ理解のためのVersatile Action Model Vamos: Versatile Action Models for Video Understanding ( http://arxiv.org/abs/2311.13627v2 ) ライセンス: Link先を確認	Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun,	(参考訳) 将来の活動を予測したり、ビデオに条件付けられた質問に答えたりするなど、ビデオ理解に適した表現とは何だろうか? 従来はビデオ画素から直接のエンド・ツー・エンドの学習に焦点が当てられていたが,我々は,解釈可能で大規模言語モデル(LLM)が直接利用できる汎用ビデオキャプションなど,テキストベースの表現を再検討することを提案する。直感的には、異なるビデオ理解タスクは相補的で異なる粒度の表現を必要とするかもしれない。この目的のために我々は,大規模言語モデルを「推論器」とする学習フレームワークである多目的行動モデル(Vamos)を提案する。Vamosは視覚的埋め込みや自由形式のテキスト記述を入力として柔軟に活用することができる。質問応答のための重要なテキストエビデンスを解釈するために,概念ボトルネックモデルをトークンと非線形モデルを扱えるように一般化し,ハードアテンションによって自由形式のテキストからトークンの小さなサブセットを選択してLLM推論器への入力とする。 Ego4D,NeXT-QA,IntentQA,EgoSchemaの4つの相補的ビデオ理解ベンチマークを用いてVamosの評価を行い,時間的ダイナミクスをモデル化し,視覚履歴をエンコードし,推論を行う能力について検討した。驚くべきことに、テキストベースの表現は全てのベンチマークにおいて一貫して競争力のある性能を達成し、視覚的な埋め込みはわずかな性能向上しかもたらさないか全く向上をもたらさないことが観察され、LLM時代におけるテキストベースのビデオ表現の有効性が実証された。また, トークンボトルネックモデルにより, 自由形式のテキストから関連する証拠を選択し, テスト時の介入をサポートし, 競争力のある質問応答性能を維持しながら, ほぼ5倍の推論高速化を実現できることを実証した。コードとモデルはhttps://brown-palm.github.io/Vamos/ で公開されている。 What makes good representations for video understanding, such as anticipating future activities, or answering video-conditioned questions? While earlier approaches focus on end-to-end learning directly from video pixels, we propose to revisit text-based representations, such as general-purpose video captions, which are interpretable and can be directly consumed by large language models (LLMs). Intuitively, different video understanding tasks may require representations that are complementary and at different granularity. To this end, we propose versatile action models (Vamos), a learning framework powered by a large language model as the "reasoner", which can flexibly leverage visual embeddings and free-form text descriptions as its input. 
To interpret the important text evidence for question answering, we generalize the concept bottleneck model to work with tokens and nonlinear models, which uses hard attention to select a small subset of tokens from the free-form text as inputs to the LLM reasoner. We evaluate Vamos on four complementary video understanding benchmarks, Ego4D, NeXT-QA, IntentQA, and EgoSchema, on its capability to model temporal dynamics, encode visual history, and perform reasoning. Surprisingly, we observe that text-based representations consistently achieve competitive performance on all benchmarks, and that visual embeddings provide marginal or no performance improvement, demonstrating the effectiveness of text-based video representation in the LLM era. We also demonstrate that our token bottleneck model is able to select relevant evidence from free-form text, support test-time intervention, and achieves nearly 5 times inference speedup while keeping a competitive question answering performance. Code and models are publicly released at https://brown-palm.github.io/Vamos/.	翻訳日:2024-05-30 04:07:24 公開日:2024-05-28
# 説明可能なAIによる美的嗜好の要因の解明 Unveiling The Factors of Aesthetic Preferences with Explainable AI ( http://arxiv.org/abs/2311.14410v2 ) ライセンス: Link先を確認	Derya Soydaner, Johan Wagemans,	(参考訳) 画像における審美的魅力の魅力は、私たちの感覚を魅了するが、審美的嗜好の根底にある複雑さは、いまだ解明されていない。本研究では,嗜好に影響を与えることで知られる美的属性に焦点をあてた,機械学習(ML)モデルを活用することによって,新たな視点を開拓する。我々のモデルはこれらの属性を入力として処理し、画像の美的スコアを予測する。さらに,美的嗜好の要因を深く掘り下げ,解釈可能な説明を得るためには,SHAP(SHapley Additive exPlanations)として知られる一般的な説明可能なAI(XAI)技術を利用する。本手法は,ランダムフォレスト,XGBoost,サポートベクトル回帰,マルチレイヤパーセプトロンなどのMLモデルの性能を比較し,美的スコアを正確に予測し,SHAPと協調して結果を一貫して観察する。 Aesthetics with Attributes Database(AADB)、Explainable Visual Aesthetics(EVA)、Personalized Image Aesthetics Database with Rich Attributes(PARA)の3つの画像美的ベンチマークを実施。最後に,XAIの導入とともに,美学研究のためのMLモデルを提案する。本研究の目的は,画像における審美的嗜好の複雑な性質をMLを通して明らかにし,審美的判断に影響を及ぼす属性についてより深く理解することである。 The allure of aesthetic appeal in images captivates our senses, yet the underlying intricacies of aesthetic preferences remain elusive. In this study, we pioneer a novel perspective by utilizing several different machine learning (ML) models that focus on aesthetic attributes known to influence preferences. Our models process these attributes as inputs to predict the aesthetic scores of images. Moreover, to delve deeper and obtain interpretable explanations regarding the factors driving aesthetic preferences, we utilize the popular Explainable AI (XAI) technique known as SHapley Additive exPlanations (SHAP). Our methodology compares the performance of various ML models, including Random Forest, XGBoost, Support Vector Regression, and Multilayer Perceptron, in accurately predicting aesthetic scores, and consistently observing results in conjunction with SHAP. We conduct experiments on three image aesthetic benchmarks, namely Aesthetics with Attributes Database (AADB), Explainable Visual Aesthetics (EVA), and Personalized image Aesthetics database with Rich Attributes (PARA), providing insights into the roles of attributes and their interactions. 
Finally, our study presents ML models for aesthetics research, alongside the introduction of XAI. Our aim is to shed light on the complex nature of aesthetic preferences in images through ML and to provide a deeper understanding of the attributes that influence aesthetic judgements.	翻訳日:2024-05-30 04:07:24 公開日:2024-05-28
# SCHEME:ビジョントランスフォーマー用スケーラブルチャネルミキサー SCHEME: Scalable Channel Mixer for Vision Transformers ( http://arxiv.org/abs/2312.00412v3 ) ライセンス: Link先を確認	Deepak Sridhar, Yunsheng Li, Nuno Vasconcelos,	(参考訳) ビジョントランスフォーマーは多くの視覚タスクにおいて印象的なパフォーマンスを達成した。トークンミキサー、すなわちアテンションブロックは非常に詳細に研究されてきたが、モデルのパラメータと計算のかなりの部分を占めるチャネルミキサー、すなわち特徴混合ブロック(FFNまたはMLP)についての研究ははるかに少ない。本研究では,密なMLP接続を, MLPの特徴をグループに分割することでより大きな拡張比をサポートするブロック対角MLP構造に置き換えられることを示す。この構造によって形成される特徴クラスタを改善するために,訓練中の並列ブランチとして,軽量でパラメータフリーなチャネル共分散アテンション(CCA)機構を用いることを提案する。これにより、訓練中にチャネルグループ間で段階的な特徴混合が可能になり、その寄与は訓練が収束に向かうにつれてゼロに減衰する。その結果、推論時にはCCAブロックを破棄することができ、追加の計算コストなしに性能を向上させることができる。得られる$\textit{Scalable CHannEl MixEr}$ (SCHEME) は任意の ViT アーキテクチャに組み込むことができ、ブロック対角 MLP 構造を制御することで、複雑性と性能のトレードオフが異なる多様なモデル群が得られる。これはSCHEMEformerモデルという新しいファミリーの導入によって示される。異なるViTバックボーンを用いた画像分類、物体検出、セマンティックセグメンテーションの実験は、既存の設計に対して、特に低複雑度の領域において、一貫して大幅な精度向上を示した。 SCHEMEformerファミリは、精度対FLOPS、精度対モデルサイズ、精度対スループットについて、特に小型の高速トランスフォーマーにおいて、新しいパレートフロンティアを確立することが示されている。 Vision Transformers have achieved impressive performance in many vision tasks. While the token mixer or attention block has been studied in great detail, much less research has been devoted to the channel mixer or feature mixing block (FFN or MLP), which accounts for a significant portion of the model parameters and computation. In this work, we show that the dense MLP connections can be replaced with a block diagonal MLP structure that supports larger expansion ratios by splitting MLP features into groups. To improve the feature clusters formed by this structure, we propose the use of a lightweight, parameter-free, channel covariance attention (CCA) mechanism as a parallel branch during training. This enables gradual feature mixing across channel groups during training whose contribution decays to zero as the training progresses to convergence. 
As a result, the CCA block can be discarded during inference, enabling enhanced performance at no additional computational cost. The resulting $\textit{Scalable CHannEl MixEr}$ (SCHEME) can be plugged into any ViT architecture to obtain a gamut of models with different trade-offs between complexity and performance by controlling the block diagonal MLP structure. This is shown by the introduction of a new family of SCHEMEformer models. Experiments on image classification, object detection, and semantic segmentation, with different ViT backbones, consistently demonstrate substantial accuracy gains over existing designs, especially for lower complexity regimes. The SCHEMEformer family is shown to establish new Pareto frontiers for accuracy vs FLOPS, accuracy vs model size, and accuracy vs throughput, especially for fast transformers of small size.	翻訳日:2024-05-30 04:07:24 公開日:2024-05-28
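The parameter saving behind the block-diagonal MLP can be sketched in Python with NumPy. Splitting the channels into `g` groups and mixing each group only within itself is equivalent to a dense layer whose weight matrix is block diagonal, so it uses `d_in * d_out / g` weights instead of `d_in * d_out`. The function names are hypothetical, and the CCA branch used during training is omitted.

```python
import numpy as np

def block_diagonal_linear(x, weights):
    """Grouped linear layer.

    x: (batch, d_in); weights: (groups, d_in // groups, d_out // groups).
    Each channel group is mixed only with channels of the same group.
    """
    g, gin, gout = weights.shape
    xg = x.reshape(x.shape[0], g, gin)
    yg = np.einsum("bgi,gio->bgo", xg, weights)
    return yg.reshape(x.shape[0], g * gout)

def dense_equivalent(weights):
    """The same layer written as one dense, block-diagonal weight matrix."""
    g, gin, gout = weights.shape
    W = np.zeros((g * gin, g * gout))
    for i in range(g):
        W[i * gin:(i + 1) * gin, i * gout:(i + 1) * gout] = weights[i]
    return W

def param_counts(d_in, d_out, groups):
    """(dense weights, block-diagonal weights), ignoring biases."""
    return d_in * d_out, (d_in // groups) * (d_out // groups) * groups
```

For example, with a hidden width of 384, an expansion ratio of 4 and `g = 4`, `param_counts(384, 1536, 4)` gives 589,824 dense weights versus 147,456 block-diagonal ones; that saving is what can be spent on larger expansion ratios.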
# 量子スピン系のWehrlエントロピーと絡み合いの複雑性 Wehrl Entropy and Entanglement Complexity of Quantum Spin Systems ( http://arxiv.org/abs/2312.00611v2 ) ライセンス: Link先を確認	Chen Xu, Yiqi Yu, Peng Zhang,	(参考訳) 量子状態のWehrlエントロピー (Wehrl entropy) はコヒーレント状態分布関数 (Husimi function) のエントロピーであり、純粋状態に対してもゼロではない。我々は、SU(2)$^{\otimes N}$コヒーレント状態(すなわち各粒子のスピンコヒーレント状態の直積)に関して、$N$個のスピン-1/2粒子に対するWehrlエントロピーについて検討する。特に以下の2点に焦点を当てる: (1) このWehrlエントロピーの統計的解釈。(2) Wehrlエントロピーと量子エンタングルメントの関係。(1) については、コヒーレント状態が正規直交基底を成さないにもかかわらず、Wehrlエントロピーが依然として明確な物理的意味を持つ確率分布のエントロピーとして解釈可能であることを証明する。(2) については, 粒子数 $2\leq N\leq 20$ の様々な絡み合った純粋状態のWehrlエントロピーを数値計算する。その結果,大きな$N$ ($N\gtrsim 10$) の系では,カオス性の高い絡み合った状態のWehrlエントロピーは規則的な状態(例えばGHZ状態)よりもはるかに大きいことがわかった。これらの結果は、Wehrlエントロピーが局所ユニタリ変換の下で不変であるという事実と合わせて、A. SugitaがHusimi関数とWehrlエントロピーの定義から直接提唱したように(J. Phys. A 36, 9081 (2003))、Wehrlエントロピーが多体純粋状態の量子絡み合いの複雑さ(絡み合い複雑性)を反映できることを示している。さらに、粒子当たりのWehrlエントロピーは、この複雑さの定量的な記述として機能する。さらに、多体純粋絡み合い状態は、極限$N\rightarrow\infty$における粒子当たりのWehrlエントロピーの振舞いによって3つの型に分類でき、各型の状態は大きく異なる絡み合い複雑性を持つことを示す。 The Wehrl entropy of a quantum state is the entropy of the coherent-state distribution function (Husimi function), and is non-zero even for pure states. We investigate the Wehrl entropy for $N$ spin-1/2 particles with respect to SU(2)$^{\otimes N}$ coherent states (i.e., the direct products of spin coherent states of each particle). We focus on: (1) The statistical interpretation of this Wehrl entropy. (2) The relationship between the Wehrl entropy and quantum entanglement. For (1), despite the coherent states not forming an orthonormal basis, we prove that the Wehrl entropy can still be interpreted as the entropy of a probability distribution with clear physical meaning. For (2), we numerically calculate the Wehrl entropy of various entangled pure states with particle number $2\leq N\leq 20$. Our results show that for large-$N$ ($N\gtrsim 10$) systems, the Wehrl entropy of highly chaotic entangled states is much larger than that of regular ones (e.g., the GHZ state). 
These results, together with the fact that the Wehrl entropy is invariant under local unitary transformations, indicate that the Wehrl entropy can reflect the complexity of the quantum entanglement (entanglement complexity) of many-body pure states, as A. Sugita proposed directly from the definitions of the Husimi function and Wehrl entropy (Jour. Phys. A 36, 9081 (2003)). Furthermore, the Wehrl entropy per particle can serve as a quantitative description of this complexity. We further show that the many-body pure entangled states can be classified into three types, according to the behaviors of the Wehrl entropy per particle in the limit $N\rightarrow\infty$, with the states of each type having very different entanglement complexity.	翻訳日:2024-05-30 04:07:24 公開日:2024-05-28
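As a worked check of the definition for a single spin-1/2 (N = 1), the Wehrl entropy can be written as $S_W = -\frac{2j+1}{4\pi}\int Q(\mathbf{n}) \ln Q(\mathbf{n})\, d\Omega$ with $Q(\mathbf{n}) = \langle \mathbf{n}|\rho|\mathbf{n}\rangle$ and $j = 1/2$. The Python/NumPy sketch below evaluates it by midpoint quadrature. The $(2j+1)/(4\pi)$ normalisation (Lieb's convention) and the coherent-state phase convention are assumptions and may differ from the paper's. Under them, every pure spin-1/2 state gives 1/2 and the maximally mixed state gives ln 2. For a product state and product coherent states, the Husimi function factorises, so the entropy is additive over particles.

```python
import numpy as np

def husimi_spin_half(rho, n_u=400, n_phi=200):
    """Husimi function Q(n) = <n|rho|n> of a spin-1/2 density matrix on a midpoint grid.

    The grid is uniform in u = cos(theta) and phi, so every cell has the same solid angle.
    Assumed convention: |n> = cos(theta/2)|up> + exp(i*phi) sin(theta/2)|down>.
    """
    rho = np.asarray(rho, dtype=complex)
    u = -1.0 + (np.arange(n_u) + 0.5) * 2.0 / n_u
    phi = (np.arange(n_phi) + 0.5) * 2.0 * np.pi / n_phi
    U, PHI = np.meshgrid(u, phi, indexing="ij")
    theta = np.arccos(U)
    a = np.cos(theta / 2)                     # amplitude on |up>
    b = np.exp(1j * PHI) * np.sin(theta / 2)  # amplitude on |down>
    Q = (np.conj(a) * (rho[0, 0] * a + rho[0, 1] * b)
         + np.conj(b) * (rho[1, 0] * a + rho[1, 1] * b)).real
    cell = (2.0 / n_u) * (2.0 * np.pi / n_phi)  # solid angle of one cell
    return Q, cell

def husimi_norm(rho, **grid):
    """(2j+1)/(4 pi) * integral of Q; should be 1 for any density matrix."""
    Q, cell = husimi_spin_half(rho, **grid)
    return float(np.sum(Q) * cell / (2.0 * np.pi))

def wehrl_entropy_spin_half(rho, **grid):
    """S_W with the (2j+1)/(4 pi) = 1/(2 pi) prefactor for j = 1/2."""
    Q, cell = husimi_spin_half(rho, **grid)
    Qc = np.clip(Q, 1e-300, None)  # Q ln Q -> 0 as Q -> 0; clipping avoids log(0)
    return float(-np.sum(Qc * np.log(Qc)) * cell / (2.0 * np.pi))
```

With these definitions, `wehrl_entropy_spin_half` returns approximately 0.5 for both `[[1, 0], [0, 0]]` and `0.5 * [[1, 1], [1, 1]]`, and ln 2 for `0.5 * np.eye(2)`.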
# SAMSGL:時空間予測のための系列整列型マルチスケールグラフ学習 SAMSGL: Series-Aligned Multi-Scale Graph Learning for Spatio-Temporal Forecasting ( http://arxiv.org/abs/2312.02646v3 ) ライセンス: Link先を確認	Xiaobei Zou, Luolin Xiong, Yang Tang, Jürgen Kurths,	(参考訳) 交通予測や天気予報のような様々な領域における時空間予測は、主に伝播ダイナミクスのモデル化とノード間の高次元相互作用の捕捉が難しいため、困難な取り組みである。時空間予測においてグラフベースのネットワークは大きな進歩を遂げたが、予測性能に密接に関連し、さらなる考慮を必要とする2つの重要な要因が残っている。それは、伝播ダイナミクスにおける時間遅延と、マルチスケールの高次元相互作用である。本研究では,予測性能の向上を目的として,SAMSGL(Series-Aligned Multi-Scale Graph Learning)フレームワークを提案する。空間的相互作用における時間遅延を扱うために,非遅延グラフ信号の集約を容易にし,時間遅延の影響を緩和して精度を向上させる系列整列(series-aligned)グラフ畳み込み層を提案する。グローバルおよびローカルな時空間相互作用を捉えるために,マルチスケールグラフ学習による時空間アーキテクチャを開発した。これはマルチスケールグラフ構造学習とグラフ全結合(Graph-FC)ブロックという2つの重要な要素から構成される。マルチスケールグラフ構造学習は、遅延ノード埋め込みと非遅延ノード埋め込みの両方を学習するグローバルグラフ構造と、近傍の要因に影響されるノードの変動を学習するローカルグラフ構造を含む。 Graph-FCブロックは、空間情報と時間情報を相乗的に融合して予測精度を高める。 SAMSGLの性能を評価するため,気象・交通予測データセットの実験を行い,その有効性と優位性を示す。 Spatio-temporal forecasting in various domains, like traffic prediction and weather forecasting, is a challenging endeavor, primarily due to the difficulties in modeling propagation dynamics and capturing high-dimensional interactions among nodes. Despite the significant strides made by graph-based networks in spatio-temporal forecasting, there remain two pivotal factors closely related to forecasting performance that need further consideration: time delays in propagation dynamics and multi-scale high-dimensional interactions. In this work, we present a Series-Aligned Multi-Scale Graph Learning (SAMSGL) framework, aiming to enhance forecasting performance. In order to handle time delays in spatial interactions, we propose a series-aligned graph convolution layer to facilitate the aggregation of non-delayed graph signals, thereby mitigating the influence of time delays for the improvement in accuracy. 
To understand global and local spatio-temporal interactions, we develop a spatio-temporal architecture via multi-scale graph learning, which encompasses two essential components: multi-scale graph structure learning and graph-fully connected (Graph-FC) blocks. The multi-scale graph structure learning includes a global graph structure to learn both delayed and non-delayed node embeddings, as well as a local one to learn node variations influenced by neighboring factors. The Graph-FC blocks synergistically fuse spatial and temporal information to boost prediction accuracy. To evaluate the performance of SAMSGL, we conduct experiments on meteorological and traffic forecasting datasets, which demonstrate its effectiveness and superiority.	翻訳日:2024-05-30 04:07:24 公開日:2024-05-28
# 一般化双対ユニタリ回路における量子情報拡散 Quantum information spreading in generalised dual-unitary circuits ( http://arxiv.org/abs/2312.02940v2 ) ライセンス: Link先を確認	Alessandro Foligno, Pavel Kos, Bruno Bertini,	(参考訳) 本稿では,最近導入された,双対ユニタリクラスを一般化したブリックワーク量子回路族における量子情報の拡散について検討する。これらの回路は時間方向にユニタリであるが、空間方向のダイナミクスは制限された部分空間でのみユニタリである。まず, 局所演算子が双対ユニタリ回路と同様に光速で拡散すること, すなわちバタフライ速度が回路の幾何学的構造で許される最大値を取ることを示す。次に、互換性のある初期状態の族(実際には、双対ユニタリ回路における互換性のある族の拡張)に対して絡み合いの広がりを依然として厳密に特徴づけられること、および漸近的な絡み合いの傾きが再びRényi指数に依存しないことを証明する。しかし注目すべきことに、絡み合い速度は一般に1より小さいことがわかる。これらの性質を用いて、これらの回路における絡み合い膜の閉形式表現を求める。 We study the spreading of quantum information in a recently introduced family of brickwork quantum circuits that generalises the dual-unitary class. These circuits are unitary in time, while their spatial dynamics is unitary only in a restricted subspace. First, we show that local operators spread at the speed of light as in dual-unitary circuits, i.e., the butterfly velocity takes the maximal value allowed by the geometry of the circuit. Then, we prove that the entanglement spreading can still be characterised exactly for a family of compatible initial states (in fact, for an extension of the compatible family of dual-unitary circuits) and that the asymptotic entanglement slope is again independent of the Rényi index. Remarkably, however, we find that the entanglement velocity is generically smaller than one. We use these properties to find a closed-form expression for the entanglement membrane in these circuits.	翻訳日:2024-05-30 04:07:24 公開日:2024-05-28
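For reference, the standard (non-generalised) dual-unitarity condition can be checked numerically. A two-qubit gate U is dual-unitary if both U and its space-time dual Ũ, defined by $\langle k l|\tilde U|i j\rangle = \langle j l|U|i k\rangle$, are unitary. The Python/NumPy sketch below assumes this index convention. It does not implement the restricted-subspace condition that defines the generalised class studied in the paper.

```python
import numpy as np

def dual_gate(U):
    """Space-time dual of a two-qubit gate: <k l|U~|i j> = <j l|U|i k>.

    U is a 4x4 matrix in the basis |o1 o2>, with row/column index 2*o1 + o2.
    """
    T = np.asarray(U, dtype=complex).reshape(2, 2, 2, 2)  # T[o1, o2, i1, i2]
    D = np.einsum("jlik->klij", T)                         # D[k, l, i, j] = T[j, l, i, k]
    return D.reshape(4, 4)

def is_unitary(M, atol=1e-10):
    return np.allclose(M.conj().T @ M, np.eye(M.shape[0]), atol=atol)

def is_dual_unitary(U):
    return is_unitary(np.asarray(U, dtype=complex)) and is_unitary(dual_gate(U))

SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

def swap_zz(J):
    """SWAP · exp(-i J Z⊗Z), a standard one-parameter family of dual-unitary gates."""
    zz_eigenvalues = np.array([1.0, -1.0, -1.0, 1.0])
    return SWAP @ np.diag(np.exp(-1j * J * zz_eigenvalues))
```

SWAP and `swap_zz(J)` pass the check for any J, while CNOT and the identity are unitary but not dual-unitary.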
# Transformerベースのサロゲートが極めて限られたデータでICFのシミュレーションと実験のギャップを埋める Transformer-Powered Surrogates Close the ICF Simulation-Experiment Gap with Extremely Limited Data ( http://arxiv.org/abs/2312.03642v2 ) ライセンス: Link先を確認	Matthew L. Olson, Shusen Liu, Jayaraman J. Thiagarajan, Bogdan Kustowski, Weng-Keen Wong, Rushil Anirudh,	(参考訳) 機械学習、特にトランスフォーマーアーキテクチャの最近の進歩は、商業領域において大きな進歩をもたらした。これらの強力なモデルは、複雑な関係を学習する優れた能力を示し、新しいデータや問題に対してもより良く汎化することが多い。本稿では,疎な実験データをシミュレーションデータで補うマルチモーダル出力シナリオにおいて,予測精度を向上させるためのトランスフォーマーを用いた新しい手法を提案する。提案手法はトランスフォーマーベースのアーキテクチャと新しいグラフベースのハイパーパラメータ最適化手法を統合する。その結果得られるシステムは,シミュレーションバイアスを効果的に低減するだけでなく,従来の手法と比較して優れた予測精度を達成する。実世界のデータが10ショットしか得られない慣性閉じ込め核融合実験、およびその合成版の実験に対して、我々のアプローチの有効性を実証する。 Recent advances in machine learning, specifically transformer architecture, have led to significant advancements in commercial domains. These powerful models have demonstrated superior capability to learn complex relationships and often generalize better to new data and problems. This paper presents a novel transformer-powered approach for enhancing prediction accuracy in multi-modal output scenarios, where sparse experimental data is supplemented with simulation data. The proposed approach integrates transformer-based architecture with a novel graph-based hyper-parameter optimization technique. The resulting system not only effectively reduces simulation bias, but also achieves superior prediction accuracy compared to the prior method. We demonstrate the efficacy of our approach on inertial confinement fusion experiments, where only 10 shots of real-world data are available, as well as synthetic versions of these experiments.	翻訳日:2024-05-30 04:07:24 公開日:2024-05-28
# 再照明可能なガウシアン・コーデック・アバター Relightable Gaussian Codec Avatars ( http://arxiv.org/abs/2312.03704v2 ) ライセンス: Link先を確認	Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, Giljoo Nam,	(参考訳) リライティングの忠実度は、幾何表現と外観表現の両方によって制約される。幾何については、メッシュベースとボリュームベースのいずれのアプローチも、3次元の髪の幾何形状のような複雑な構造をモデル化することが難しい。外観については、既存のリライティングモデルは忠実度が限られており、高解像度の連続環境下でリアルタイムにレンダリングするには遅すぎることが多い。本研究では,新しい表情を生成するためにアニメーション可能な, 高忠実度で再照明可能な頭部アバターを構築する手法であるRelightable Gaussian Codec Avatarsを提案する。 3次元ガウシアンに基づく我々の幾何モデルは、動的な顔のシーケンスにおいて、髪の毛の束や毛穴などの3次元的に一貫したサブミリメートルの細部を捉えることができる。目,皮膚,毛髪など人間の頭部の多様な材質を統一的に扱うために,学習可能な放射輝度伝達に基づく新しい再照明可能な外観モデルを提案する。拡散反射成分に対する大域照明を考慮した球面調和関数と組み合わせ、球面ガウスを用いて全周波数の反射を伴うリアルタイムリライティングを実現する。この外観モデルは点光源と連続照明の両方の下で効率的に再照明できる。さらに、再照明可能な明示的眼球モデルを導入することで、目の反射の忠実度を向上させ、明示的な視線制御を可能にする。提案手法は,リアルタイム性能を損なうことなく既存の手法より優れている。また、テザリングされた消費者向けVRヘッドセット上でアバターをリアルタイムにリライトし、アバターの効率性と忠実さを示します。 The fidelity of relighting is bounded by both geometry and appearance representations. For geometry, both mesh and volumetric approaches have difficulty modeling intricate structures like 3D hair geometry. For appearance, existing relighting models are limited in fidelity and often too slow to render in real-time with high-resolution continuous environments. In this work, we present Relightable Gaussian Codec Avatars, a method to build high-fidelity relightable head avatars that can be animated to generate novel expressions. Our geometry model based on 3D Gaussians can capture 3D-consistent sub-millimeter details such as hair strands and pores on dynamic face sequences. To support diverse materials of human heads such as the eyes, skin, and hair in a unified manner, we present a novel relightable appearance model based on learnable radiance transfer. Together with global illumination-aware spherical harmonics for the diffuse components, we achieve real-time relighting with all-frequency reflections using spherical Gaussians. This appearance model can be efficiently relit under both point light and continuous illumination. 
We further improve the fidelity of eye reflections and enable explicit gaze control by introducing relightable explicit eye models. Our method outperforms existing approaches without compromising real-time performance. We also demonstrate real-time relighting of avatars on a tethered consumer VR headset, showcasing the efficiency and fidelity of our avatars.	翻訳日:2024-05-30 04:07:24 公開日:2024-05-28
# KOALA:テキスト・画像合成のためのメモリ効率・高速拡散モデルに関する実証的教訓 KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis ( http://arxiv.org/abs/2312.04005v2 ) ライセンス: Link先を確認	Youngwan Lee, Kwanyong Park, Yoorhim Cho, Yong-Ju Lee, Sung Ju Hwang,	(参考訳) テキスト・ツー・イメージ(T2I)合成モデルのサイズが大きくなるにつれて、より大きなメモリを持つより高価なGPUが必要となるため推論コストが高くなり、学習データセットへのアクセス制限に加えて、これらのモデルの再現を困難にしている。本研究の目的は,これらの推論コストを削減し,公開されているデータセットとオープンソースモデルのみを使用して,T2Iモデルの生成能力をどこまで拡張できるかを検討することである。この目的のために,デファクトスタンダードのテキスト画像生成モデルであるStable Diffusion XL (SDXL) を用いて,効率的なT2Iモデルを構築するための3つの重要なプラクティスを示す。(1) 知識蒸留: SDXLの生成能力を効率的なU-Netに効果的に蒸留する方法を検討し, 自己注意が最も重要な部分であることを見出した。(2) データ: サンプル数が少なくても, リッチなキャプションを持つ高解像度画像は, 短いキャプションを持つ多数の低解像度画像よりも重要である。(3) 教師: ステップ蒸留された教師により, T2Iモデルはノイズ除去ステップ数を削減できる。これらの結果をもとに,2種類のコンパクトなU-Net (1B, 700M) を用いて, モデルサイズをSDXL U-Netの最大54%および69%削減した,KOALA-Turbo &-Lightningという2種類の効率的なテキスト・ツー・イメージ・モデルを構築した。特にKOALA-Lightning-700MはSDXLより4倍高速で、良好な生成品質を維持している。さらに、SDXLとは異なり、私たちのKOALAモデルは8GBのVRAM(3060Ti)を持つコンシューマグレードGPU上で1024pxの高解像度画像を生成することができる。我々は,我々のKOALAモデルが,資源制約環境にある学術研究者や一般ユーザーにとって,SDXLの費用対効果に優れた代替手段として大きな実用的影響をもたらすと信じている。 As text-to-image (T2I) synthesis models increase in size, they demand higher inference costs due to the need for more expensive GPUs with larger memory, which makes it challenging to reproduce these models in addition to the restricted access to training datasets. Our study aims to reduce these inference costs and explores how far the generative capabilities of T2I models can be extended using only publicly available datasets and open-source models. To this end, by using the de facto standard text-to-image model, Stable Diffusion XL (SDXL), we present three key practices in building an efficient T2I model: (1) Knowledge distillation: we explore how to effectively distill the generation capability of SDXL into an efficient U-Net and find that self-attention is the most crucial part. (2) Data: despite fewer samples, high-resolution images with rich captions are more crucial than a larger number of low-resolution images with short captions. 
(3) Teacher: Step-distilled Teacher allows T2I models to reduce the noising steps. Based on these findings, we build two types of efficient text-to-image models, called KOALA-Turbo &-Lightning, with two compact U-Nets (1B & 700M), reducing the model size up to 54% and 69% of the SDXL U-Net. In particular, the KOALA-Lightning-700M is 4x faster than SDXL while still maintaining satisfactory generation quality. Moreover, unlike SDXL, our KOALA models can generate 1024px high-resolution images on consumer-grade GPUs with 8GB of VRAMs (3060Ti). We believe that our KOALA models will have a significant practical impact, serving as cost-effective alternatives to SDXL for academic researchers and general users in resource-constrained environments.	翻訳日:2024-05-30 04:07:24 公開日:2024-05-28
# グラフ畳み込みはトランスフォーマーの自己注意を豊かにする! Graph Convolutions Enrich the Self-Attention in Transformers! ( http://arxiv.org/abs/2312.04234v3 ) ライセンス: Link先を確認	Jeongwhan Choi, Hyowon Wi, Jayoung Kim, Yehjin Shin, Kookjin Lee, Nathaniel Trask, Noseong Park,	(参考訳) トランスフォーマーは自己注意機構で知られており、自然言語処理、コンピュータビジョン、時系列モデリングなど様々なタスクで最先端のパフォーマンスを実現している。しかし、深いTransformerモデルの課題の1つは、層を重ねるにつれて表現が区別できない値に収束し、パフォーマンスが著しく低下するという過平滑化(oversmoothing)問題である。本稿では,従来の自己注意を単純なグラフフィルタとして解釈し,グラフ信号処理(GSP)の観点から再設計する。我々は,一般的かつ効果的なグラフフィルタを学習するグラフフィルタベースの自己注意(GFSA)を提案する。その計算量は元の自己注意機構よりわずかに大きいだけである。 GFSAがコンピュータビジョン,自然言語処理,グラフ回帰,音声認識,コード分類など,様々な分野でトランスフォーマーの性能を向上させることを実証する。 Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across various tasks in natural language processing, computer vision, time-series modeling, etc. However, one of the challenges with deep Transformer models is the oversmoothing problem, where representations across layers converge to indistinguishable values, leading to significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective. We propose a graph-filter-based self-attention (GFSA) that learns a general yet effective filter, at a complexity only slightly larger than that of the original self-attention mechanism. We demonstrate that GFSA improves the performance of Transformers in various fields, including computer vision, natural language processing, graph regression, speech recognition, and code classification.	翻訳日:2024-05-30 04:07:24 公開日:2024-05-28
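A generic way to read "self-attention as a graph filter" is to treat the row-stochastic attention matrix A as a weighted adjacency matrix and apply a polynomial filter $\sum_k w_k A^k$ to the values, where $w_1 = 1$ and all other $w_k = 0$ recovers standard attention. The Python/NumPy sketch below does this with explicit matrix powers. The coefficient layout and names are assumptions, and this is not GFSA's exact parameterisation. The abstract says GFSA keeps the cost only slightly above standard attention, whereas the naive powers here cost more.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_matrix(Q, K):
    """Row-stochastic scaled dot-product attention matrix, read as a graph adjacency."""
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1]))

def polynomial_filter_attention(Q, K, V, coeffs):
    """Hypothetical graph-filter attention: sum_k coeffs[k] * A^k @ V, with A^0 = I."""
    A = attention_matrix(Q, K)
    out = np.zeros(V.shape, dtype=float)
    Ak = np.eye(A.shape[0])
    for w in coeffs:
        out += w * (Ak @ V)
        Ak = Ak @ A
    return out
```

With `coeffs=[0.0, 1.0]` the output equals ordinary attention `A @ V`, and adding weight on higher powers mixes information over multi-hop neighbourhoods of the attention graph.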
# ボルツマンジェネレータに基づくMCMC移動を用いた遷移経路サンプリング Transition Path Sampling with Boltzmann Generator-based MCMC Moves ( http://arxiv.org/abs/2312.05340v2 ) ライセンス: Link先を確認	Michael Plainer, Hannes Stärk, Charlotte Bunne, Stephan Günnemann,	(参考訳) 分子系の2つの3次元状態間の全ての可能な遷移経路をサンプリングすることは、触媒設計から薬物発見まで、様々な応用がある。遷移経路をサンプリングする現在のアプローチはマルコフ連鎖モンテカルロを用いており、新しい経路を見つけるために時間のかかる分子動力学シミュレーションに依存している。我々の手法は、分子のボルツマン分布からガウス分布へ写像する正規化フローの潜在空間で機能し、分子シミュレーションを必要とせずに新しい経路を提案する。アラニンジペプチドを用いて,厳密なサンプリングのための潜在空間におけるメトロポリス・ヘイスティングスの受理基準を調査し,様々な潜在空間での提案機構について検討した。 Sampling all possible transition paths between two 3D states of a molecular system has various applications ranging from catalyst design to drug discovery. Current approaches to sample transition paths use Markov chain Monte Carlo and rely on time-intensive molecular dynamics simulations to find new paths. Our approach operates in the latent space of a normalizing flow that maps from the molecule's Boltzmann distribution to a Gaussian, where we propose new paths without requiring molecular simulations. Using alanine dipeptide, we explore Metropolis-Hastings acceptance criteria in the latent space for exact sampling and investigate different latent proposal mechanisms.	翻訳日:2024-05-30 03:57:34 公開日:2024-05-28
# Pensieveを使ったステートフルな大規模言語モデル Stateful Large Language Model Serving with Pensieve ( http://arxiv.org/abs/2312.05516v2 ) ライセンス: Link先を確認	Lingfan Yu, Jinyang Li,	(参考訳) 大規模言語モデル(LLM)は現在非常に人気があり、効率的に提供することが重要です。既存のLLMサービスシステムはリクエスト間でステートレスである。従って、マルチターン会話の共通設定でLLMを使用する場合、各ターンでサービスシステムによる要求と合わせて会話履歴のログを増大させ、繰り返し処理を行う必要がある。本稿では,マルチターン会話LLMサービスに最適化されたシステムであるPensieveを設計する。 Pensieveは、以前処理された履歴をキャッシュすることで、リクエスト間での会話状態を維持する。 Pensieveのマルチ層キャッシュ戦略は、GPUとCPUメモリの両方を使用して、キャッシュされたデータを効率的に保存および取得することができる。 Pensieve氏はまた、最近のPagedAttentionカーネルを一般化して、GPUキャッシュを非連続メモリ上に分散した複数の入力トークン間の注意をサポートする。評価の結果, Pensieve は vLLM や TensorRT-LLM と比較して 13-58% のスループットを実現でき,レイテンシを大幅に低減できることがわかった。 Large Language Models (LLMs) are wildly popular today and it is important to serve them efficiently. Existing LLM serving systems are stateless across requests. Consequently, when LLMs are used in the common setting of multi-turn conversations, a growing log of the conversation history must be processed alongside any request by the serving system at each turn, resulting in repeated processing. In this paper, we design Pensieve, a system optimized for multi-turn conversation LLM serving. Pensieve maintains the conversation state across requests by caching previously processed history to avoid duplicate processing. Pensieve's multi-tier caching strategy can utilize both GPU and CPU memory to efficiently store and retrieve cached data. Pensieve also generalizes the recent PagedAttention kernel to support attention between multiple input tokens with a GPU cache spread over non-contiguous memory. Our evaluation shows that Pensieve can achieve 13-58% more throughput compared to vLLM and TensorRT-LLM and significantly reduce latency.	翻訳日:2024-05-30 03:57:34 公開日:2024-05-28
# 活性化勾配に基づくバックドア攻撃に対する汚染サンプル検出 Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks ( http://arxiv.org/abs/2312.06230v2 ) ライセンス: Link先を確認	Danni Yuan, Shaokui Wei, Mingda Zhang, Li Liu, Baoyuan Wu,	(参考訳) 本研究は,データ汚染に基づくバックドア攻撃を防御するための汚染サンプル検出の課題について検討する。その中核となる課題は、クリーンなサンプルと様々な種類の汚染サンプル(例えば、様々なトリガー、様々な汚染率)を区別するための、汎化可能で識別的な指標を見つけることである。バックドアモデルが標的クラス内の大きく異なる汚染サンプルとクリーンなサンプルを類似した活性化領域に写像する傾向があるという、バックドア攻撃に共通する現象に着想を得て、我々はサンプルの活性化に関する勾配の円周分布という新たな視点を導入し、これを勾配円周分布(GCD)と呼ぶ。そして,GCDに基づく2つの興味深い観察を得た。1つは、標的クラスのサンプルのGCDがクリーンクラスのサンプルよりもはるかに分散していることである。もう1つは、標的クラスのGCDでは、汚染サンプルとクリーンなサンプルが明確に分離されていることである。以上の2つの観察に着想を得て,本研究では, 活性化勾配に基づく汚染サンプル検出 (AGPD) と呼ばれる, 革新的な3段階の汚染サンプル検出手法を開発した。まず、信頼できないデータセットで訓練されたモデルから、すべてのクラスのGCDを計算する。次に,標的クラスとクリーンクラス間のGCDの分散の違いに基づいて,標的クラスを特定する。最後に, 汚染サンプルとクリーンなサンプルの明確な分離に基づいて, 特定された標的クラス内の汚染サンプルを除去する。種々のバックドア攻撃設定下での広範な実験により,本手法が既存の汚染サンプル検出手法よりも優れた検出性能を示すことが実証された。 This work studies the task of poisoned sample detection for defending against data poisoning based backdoor attacks. Its core challenge is finding a generalizable and discriminative metric to distinguish between clean and various types of poisoned samples (e.g., various triggers, various poisoning ratios). Inspired by a common phenomenon in backdoor attacks that the backdoored model tends to map significantly different poisoned and clean samples within the target class to similar activation areas, we introduce a novel perspective of the circular distribution of the gradients w.r.t. sample activation, dubbed gradient circular distribution (GCD). We then make two interesting observations based on GCD. One is that the GCD of samples in the target class is much more dispersed than that in the clean class. The other is that in the GCD of the target class, poisoned and clean samples are clearly separated. Inspired by the above two observations, we develop an innovative three-stage poisoned sample detection approach, called Activation Gradient based Poisoned sample Detection (AGPD). First, we calculate GCDs of all classes from the model trained on the untrustworthy dataset. Then, we identify the target class(es) based on the difference in GCD dispersion between target and clean classes. Last, we filter out poisoned samples within the identified target class(es) based on the clear separation between poisoned and clean samples. Extensive experiments under various settings of backdoor attacks demonstrate the superior detection performance of the proposed method over existing poisoned sample detection approaches according to sample activation-based metrics.	翻訳日:2024-05-30 03:57:34 公開日:2024-05-28
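The second AGPD stage, flagging the class whose gradient directions are most dispersed, can be approximated with a standard circular-statistics quantity. This is a simplified proxy, not the paper's exact GCD definition. It assumes that per-sample gradient vectors are already available, and it measures dispersion as `1 - R`, where `R` is the mean resultant length of the unit-normalized gradients. All function names are hypothetical.

```python
import math

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dispersion(vectors):
    """1 - mean resultant length of unit-normalized vectors (0 = all aligned, 1 = cancel out)."""
    units = [unit(v) for v in vectors]
    dim = len(units[0])
    mean = [sum(u[i] for u in units) / len(units) for i in range(dim)]
    return 1.0 - math.sqrt(sum(m * m for m in mean))

def suspect_target_class(grads_by_class):
    """Pick the class whose gradient directions are most dispersed."""
    return max(grads_by_class, key=lambda c: dispersion(grads_by_class[c]))
```

In the hypothetical example below, "dog" mixes two opposing gradient directions, as a class containing both poisoned and clean samples might. It is therefore flagged as the target class. The third stage, separating the two groups within that class, is not shown.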
# GMTalker: ガウス混合に基づく音声駆動型感情トーキングビデオポートレート GMTalker: Gaussian Mixture-based Audio-Driven Emotional talking video Portraits ( http://arxiv.org/abs/2312.07669v2 ) ライセンス: Link先を確認	Yibo Xia, Lizhen Wang, Xiang Deng, Xiaoyan Luo, Yebin Liu,	(参考訳) 音声とリップの同期、生き生きとした表情、リアルな頭部姿勢、まばたきを備えた、高忠実度で感情制御可能なトーキングビデオポートレートの合成は、近年重要かつ困難な課題となっている。既存手法の多くは、パーソナライズされた精密な感情制御、異なる感情状態間の滑らかな遷移、多様な動きの生成の実現に苦戦している。これらの課題に対処するために,ガウス混合に基づく感情的トーキングポートレート生成フレームワークであるGMTalkerを提案する。具体的には,連続的かつ分離された潜在空間を構築でき,より柔軟な感情操作を実現するガウス混合に基づく表情生成器を提案する。さらに,多様な頭部姿勢,まばたき,眼球運動を生成するために,広範な動きを含む大規模データセットで事前学習された正規化フローベースのモーション生成器を導入する。最後に,感情マッピングネットワークを備え、高忠実度で忠実な感情的ビデオポートレートを合成できる、パーソナライズされた感情誘導型ヘッド生成器を提案する。定量的・定性的な実験の両方により、本手法が画像品質、フォトリアリズム、感情の正確さ、動きの多様性において従来手法を上回ることが示された。 Synthesizing high-fidelity and emotion-controllable talking video portraits, with audio-lip sync, vivid expressions, realistic head poses, and eye blinks, has been an important and challenging task in recent years. Most existing methods struggle to achieve personalized and precise emotion control, smooth transitions between different emotion states, and the generation of diverse motions. To tackle these challenges, we present GMTalker, a Gaussian mixture-based emotional talking portraits generation framework. Specifically, we propose a Gaussian mixture-based expression generator that can construct a continuous and disentangled latent space, achieving more flexible emotion manipulation. Furthermore, we introduce a normalizing flow-based motion generator pretrained on a large dataset with a wide range of motions to generate diverse head poses, blinks, and eyeball movements. Finally, we propose a personalized emotion-guided head generator with an emotion mapping network that can synthesize high-fidelity and faithful emotional video portraits. Both quantitative and qualitative experiments demonstrate that our method outperforms previous methods in image quality, photo-realism, emotion accuracy, and motion diversity.	翻訳日:2024-05-30 03:57:34 公開日:2024-05-28
# VQ-HPS:ベクトル量子化潜在空間における人間の姿勢と形状推定 VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space ( http://arxiv.org/abs/2312.08291v2 ) ライセンス: Link先を確認	Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, Francesc Moreno-Noguer,	(参考訳) RGB画像からの人体姿勢・形状推定(Human Pose and Shape Estimation, HPSE)に関するこれまでの研究は、パラメトリック手法と非パラメトリック手法の2つの主要なグループに大別される。パラメトリック手法は低次元の統計的人体モデルを活用して現実的な結果を得る一方、近年の非パラメトリック手法は人体メッシュの3次元座標を直接回帰することで高い精度を実現している。本研究は、人体メッシュの低次元離散潜在表現を用い、HPSEを分類タスクとして定式化するという、HPSE問題に対する新しいパラダイムを導入する。身体モデルパラメータや3次元頂点座標を予測する代わりに、提案する離散潜在表現の予測に焦点を当て、これを登録済みの人体メッシュへと復号する。この革新的なパラダイムには2つの大きな利点がある。第一に、低次元の離散表現を予測することで、訓練データが少ない場合でも、予測を人体として妥当な姿勢と形状の空間に限定できる。第二に、問題を分類タスクとして定式化することで、ニューラルネットワークに固有の識別能力を活用できる。提案モデルであるVQ-HPSはメッシュの離散潜在表現を予測する。実験結果から,VQ-HPSは現在の最先端の非パラメトリック手法を上回り,少ないデータで訓練した場合でもパラメトリック手法と同程度に現実的な結果が得られることがわかった。 VQ-HPSは大規模データセットでの訓練においても有望な結果を示し、HPSEに対する分類アプローチの大きな可能性を浮き彫りにしている。プロジェクトページはhttps://g-fiche.github.io/research-pages/vqhps/にある。 Previous works on Human Pose and Shape Estimation (HPSE) from RGB images can be broadly categorized into two main groups: parametric and non-parametric approaches. Parametric techniques leverage a low-dimensional statistical body model for realistic results, whereas recent non-parametric methods achieve higher precision by directly regressing the 3D coordinates of the human body mesh. This work introduces a novel paradigm to address the HPSE problem, involving a low-dimensional discrete latent representation of the human mesh and framing HPSE as a classification task. Instead of predicting body model parameters or 3D vertex coordinates, we focus on predicting the proposed discrete latent representation, which can be decoded into a registered human mesh. This innovative paradigm offers two key advantages. Firstly, predicting a low-dimensional discrete representation confines our predictions to the space of anthropomorphic poses and shapes even when little training data is available. Secondly, by framing the problem as a classification task, we can harness the discriminative power inherent in neural networks. The proposed model, VQ-HPS, predicts the discrete latent representation of the mesh. The experimental results demonstrate that VQ-HPS outperforms the current state-of-the-art non-parametric approaches while yielding results as realistic as those produced by parametric methods when trained with little data. VQ-HPS also shows promising results when training on large-scale datasets, highlighting the significant potential of the classification approach for HPSE. See the project page at https://g-fiche.github.io/research-pages/vqhps/	翻訳日:2024-05-30 03:57:34 公開日:2024-05-28
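Treating HPSE as classification rests on vector quantization: each continuous latent vector is replaced by the index of its nearest codebook entry. The sketch below shows only that generic lookup and its inverse. The codebook and latents are made-up toy values, and the mesh decoder and the network that predicts the indices are not shown. In a classification framing, that network would output a score per codebook entry, and the argmax would play the role of `quantize` here.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(latents, codebook):
    """Map each continuous latent vector to the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k])) for z in latents]

def dequantize(indices, codebook):
    """Look the discrete indices back up; a decoder would turn these into a mesh."""
    return [codebook[k] for k in indices]
```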
# BDHT: 生成AIは軽度認知障害の因果解析を可能にする BDHT: Generative AI Enables Causality Analysis for Mild Cognitive Impairment ( http://arxiv.org/abs/2312.09022v2 ) ライセンス: Link先を確認	Qiankun Zuo, Ling Chen, Yanyan Shen, Michael Kwok-Po Ng, Baiying Lei, Shuqiang Wang,	(参考訳) 有効接続の推定は、異なる脳領域間の相互作用と情報の流れを理解する上で重要な役割を果たす。しかし、有効接続の推定に用いられる機能的時系列は特定のソフトウェアから導出されるため、パラメータ設定の違いによって大きな計算誤差が生じ、脳領域間の複雑な因果関係をモデル化する能力が低下する可能性がある。本稿では, 階層型Transformerを備えた脳ディフューザ(BDHT)を提案し, 軽度認知障害(MCI)解析のための有効接続を推定する。我々の知る限り、提案した脳ディフューザは、拡散モデルをマルチモーダル脳ネットワークの生成と解析に適用した最初の生成モデルである。具体的には、BDHTは構造的接続を利用して逆過程を効率的に誘導する。これにより、ノイズ除去過程の信頼性が高まり、有効接続の推定精度が保証される。ノイズ除去の品質を向上させるため、階層型ノイズ除去Transformerはトポロジー空間におけるマルチスケール特徴を学習するように設計されている。マルチヘッドアテンションとグラフ畳み込みネットワークを積み重ねることで、グラフ畳み込みTransformer(GraphConformer)モジュールは構造-機能の相補性を高め、ノイズ推定の能力を向上させる。ノイズ除去拡散モデルの実験的評価は, 有効接続の推定における有効性を示している。提案モデルは,既存手法と比較して精度と頑健性の点で優れた性能を達成する。さらに,提案モデルは変化した有向接続を同定し,MCI治療に向けた病態形成の包括的な理解を提供できる。 Effective connectivity estimation plays a crucial role in understanding the interactions and information flow between different brain regions. However, the functional time series used for estimating effective connectivity is derived from certain software, which may lead to large computing errors because of different parameter settings and degrade the ability to model complex causal relationships between brain regions. In this paper, a brain diffuser with hierarchical transformer (BDHT) is proposed to estimate effective connectivity for mild cognitive impairment (MCI) analysis. To the best of our knowledge, the proposed brain diffuser is the first generative model to apply diffusion models to generating and analyzing multimodal brain networks. Specifically, the BDHT leverages structural connectivity to guide the reverse processes in an efficient way. It makes the denoising process more reliable and guarantees effective connectivity estimation accuracy. To improve denoising quality, the hierarchical denoising transformer is designed to learn multi-scale features in topological space. By stacking the multi-head attention and graph convolutional network, the graph convolutional transformer (GraphConformer) module is devised to enhance structure-function complementarity and improve its noise estimation ability. Experimental evaluations of the denoising diffusion model demonstrate its effectiveness in estimating effective connectivity. The proposed model achieves superior performance in terms of accuracy and robustness compared to existing approaches. Moreover, the proposed model can identify altered directional connections and provide a comprehensive understanding of pathogenesis for MCI treatment.	翻訳日:2024-05-30 03:57:34 公開日:2024-05-28
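The denoising diffusion model above is trained to reverse a fixed forward noising process. A minimal sketch of that generic forward step, q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I), is shown below. It is applied to a hypothetical flattened connectivity vector. The linear β schedule (1e-4 to 0.02 over T steps) is the common DDPM default and is assumed here, not taken from the BDHT paper. The structure-guided reverse process and the GraphConformer denoiser are not shown.

```python
import math
import random

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Assumed DDPM-style linear noise schedule."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I); return (x_t, noise)."""
    s, n = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    return [s * x + n * e for x, e in zip(x0, noise)], noise
```

A denoiser is then trained to predict `noise` from `x_t` and `t`. In BDHT's setting, that prediction would additionally be conditioned on structural connectivity.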
# 大規模言語モデルエージェントのためのワーキングメモリの強化 Empowering Working Memory for Large Language Model Agents ( http://arxiv.org/abs/2312.17259v2 ) ライセンス: Link先を確認	Jing Guo, Nan Li, Jianchuan Qi, Hang Yang, Ruiqiao Li, Yuzhen Feng, Si Zhang, Ming Xu,	(参考訳) 大規模言語モデル(LLM)は印象的な言語能力を実現している。しかし、重要な限界として、人間のような記憶能力の欠如が残っている。LLMは逐次的な相互作用にわたる記憶保持が限られており、これが複雑な推論を妨げている。本稿では,認知心理学のワーキングメモリの枠組みを応用してLLMアーキテクチャを強化する可能性について考察する。異なる対話エピソードの分離や永続的な記憶リンクの欠如など、従来のLLMメモリ設計の限界を分析する。これを解決するために、集中型ワーキングメモリハブと、エピソード間で記憶を保持するためのエピソディックバッファへのアクセスを取り入れた革新的なモデルを提案する。このアーキテクチャは、複雑なタスクや協調シナリオにおける微妙な文脈推論のために、より高い連続性を提供することを目的としている。有望ではあるものの、エピソード記憶の符号化、保存、優先順位付け、検索、セキュリティの最適化にはさらなる研究が必要である。全体として、本稿は、より高度で人間らしい記憶能力を持つLLMエージェントを開発するための戦略的な青写真を提供し,汎用人工知能における重要なフロンティアとしてメモリ機構を強調する。 Large language models (LLMs) have achieved impressive linguistic capabilities. However, a key limitation persists in their lack of human-like memory faculties. LLMs exhibit constrained memory retention across sequential interactions, hindering complex reasoning. This paper explores the potential of applying cognitive psychology's working memory frameworks to enhance LLM architecture. The limitations of traditional LLM memory designs are analyzed, including their isolation of distinct dialog episodes and lack of persistent memory links. To address this, an innovative model is proposed incorporating a centralized Working Memory Hub and Episodic Buffer access to retain memories across episodes. This architecture aims to provide greater continuity for nuanced contextual reasoning during intricate tasks and collaborative scenarios. While promising, further research is required into optimizing episodic memory encoding, storage, prioritization, retrieval, and security. Overall, this paper provides a strategic blueprint for developing LLM agents with more sophisticated, human-like memory capabilities, highlighting memory mechanisms as a vital frontier in artificial general intelligence.	翻訳日:2024-05-30 03:57:34 公開日:2024-05-28
# 複雑力学系のモデルにおける構造誤差の学習 Learning About Structural Errors in Models of Complex Dynamical Systems ( http://arxiv.org/abs/2401.00035v2 ) ライセンス: Link先を確認	Jin-Long Wu, Matthew E. Levine, Tapio Schneider, Andrew Stuart,	(参考訳) 複雑な力学系は、一部の自由度(例えば小さなスケール)が計算的に解像できない、あるいは十分に理解されていない一方で力学的には重要であるため、モデル化が難しいことで知られている。例えば、雲の力学と液滴形成の小さなスケールは気候の制御に不可欠であるが、全球気候モデルでは解像できない。未解像の自由度の影響に対する半経験的なクロージャモデルはしばしば存在し、重要なドメイン固有の知識を符号化している。このようなクロージャモデルを基盤とし、構造誤差を学習して修正することは、データとドメイン知識を融合する効果的な方法となり得る。ここでは、構造誤差について学習するための一般的なアプローチ、原則、アルゴリズムについて説明する。このアプローチの鍵は、例えば未解像スケールのクロージャモデルのように、複雑なシステムのモデル内部に構造誤差モデルを組み込むことである。構造誤差は、通常非線形に、観測可能なデータに写像される。しかしその結果、構造誤差モデルの入出力のラベル付きペアが存在しないため、モデル出力とデータとの不一致は構造誤差について間接的な情報しか与えない。さらに、モデルの微分は存在しないか、容易には利用できない場合がある。本稿では、微分不要なカルマン反転アルゴリズムとその変種を用いて間接データから構造誤差モデルを学習する方法、スパース性制約によって「害をなさない」原則を課す方法、および構造誤差をモデル化する様々な方法について論じる。また、非局所的および/または確率的な誤差モデルを使用することの利点についても論じる。さらに,データ同化手法が非エルゴード系における構造誤差の学習にどのように役立つかを示す。概念とアルゴリズムは、Lorenz-96システムとヒトのグルコース-インスリンモデルに基づく2つの数値例で示される。 Complex dynamical systems are notoriously difficult to model because some degrees of freedom (e.g., small scales) may be computationally unresolvable or are incompletely understood, yet they are dynamically important. For example, the small scales of cloud dynamics and droplet formation are crucial for controlling climate, yet are unresolvable in global climate models. Semi-empirical closure models for the effects of unresolved degrees of freedom often exist and encode important domain-specific knowledge. Building on such closure models and correcting them through learning the structural errors can be an effective way of fusing data with domain knowledge. Here we describe a general approach, principles, and algorithms for learning about structural errors. Key to our approach is to include structural error models inside the models of complex systems, for example, in closure models for unresolved scales. The structural errors then map, usually nonlinearly, to observable data. As a result, however, mismatches between model output and data are only indirectly informative about structural errors, due to a lack of labeled pairs of inputs and outputs of structural error models. Additionally, derivatives of the model may not exist or be readily available. We discuss how structural error models can be learned from indirect data with derivative-free Kalman inversion algorithms and variants, how sparsity constraints enforce a "do no harm" principle, and various ways of modeling structural errors. We also discuss the merits of using non-local and/or stochastic error models. In addition, we demonstrate how data assimilation techniques can assist the learning about structural errors in non-ergodic systems. The concepts and algorithms are illustrated in two numerical examples based on the Lorenz-96 system and a human glucose-insulin model.	翻訳日:2024-05-30 03:57:34 公開日:2024-05-28
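Derivative-free Kalman inversion fits parameters using only forward-model evaluations, so no model derivatives are needed. Below is a minimal sketch of generic ensemble Kalman inversion (EKI) with perturbed observations for one scalar parameter. It is not the authors' code. The forward map `[θ, θ²]`, the prior, the ensemble size, and the iteration count are all illustrative assumptions. Each member is updated as u ← u + C_up (C_pp + Γ)⁻¹ (y + η − G(u)), with covariances estimated from the ensemble.

```python
import random

def forward(theta):
    # Hypothetical nonlinear forward map; only evaluations are used, never derivatives.
    return [theta, theta ** 2]

def mean(xs):
    return sum(xs) / len(xs)

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def eki(y, noise_var, n_ens=50, n_iter=20, seed=0):
    """Ensemble Kalman inversion with perturbed observations; returns the ensemble mean."""
    rng = random.Random(seed)
    ens = [rng.gauss(0.0, 2.0) for _ in range(n_ens)]  # assumed prior ensemble
    for _ in range(n_iter):
        outs = [forward(u) for u in ens]
        u_bar = mean(ens)
        g_bar = [mean([g[k] for g in outs]) for k in range(2)]
        du = [u - u_bar for u in ens]
        dg = [[g[k] - g_bar[k] for k in range(2)] for g in outs]
        # Empirical cross-covariance C_up (1x2) and output covariance C_pp (2x2).
        c_up = [sum(du[j] * dg[j][k] for j in range(n_ens)) / (n_ens - 1) for k in range(2)]
        c_pp = [[sum(dg[j][k] * dg[j][l] for j in range(n_ens)) / (n_ens - 1)
                 for l in range(2)] for k in range(2)]
        for k in range(2):
            c_pp[k][k] += noise_var  # add the observation-noise covariance Gamma
        kinv = inv2(c_pp)
        gain = [sum(c_up[k] * kinv[k][l] for k in range(2)) for l in range(2)]
        new_ens = []
        for j, u in enumerate(ens):
            y_pert = [y[k] + rng.gauss(0.0, noise_var ** 0.5) for k in range(2)]
            resid = [y_pert[k] - outs[j][k] for k in range(2)]
            new_ens.append(u + sum(gain[l] * resid[l] for l in range(2)))
        ens = new_ens
    return mean(ens)
```

In the paper's setting, θ would instead parametrize a structural error model embedded in the dynamical model, with y taken from indirect data such as time-averaged statistics.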
# 圧縮部分空間を用いたワンステップレイト・フュージョン・マルチビュークラスタリング One-Step Late Fusion Multi-view Clustering with Compressed Subspace ( http://arxiv.org/abs/2401.01558v3 ) ライセンス: Link先を確認	Qiyuan Ou, Pei Zhang, Sihang Zhou, En Zhu,	(参考訳) レイトフュージョン型マルチビュークラスタリング(LFMVC)は、優れた計算速度とクラスタリング性能により、マルチビュークラスタリング(MVC)分野において急速に発展している手法群となっている。既存のレイトフュージョン手法が直面するボトルネックの1つは、通常は平均カーネル関数に整列させるため、クラスタリング性能がデータセットの品質に大きく依存する点である。もう1つの問題は、コンセンサス分割行列を得た後に最終的な離散ラベルを得るためにk平均クラスタリングを別途行う必要があり、その結果ラベル学習とクラスタ構造最適化の過程が分離され、これらのモデルの一貫性が制限される点である。上記の問題に対処するため,圧縮部分空間を用いたワンステップ・レイトフュージョン・マルチビュークラスタリング(OS-LFMVC-CS)という統合フレームワークを提案する。具体的には、コンセンサス部分空間を用いて分割融合を最適化しながら分割行列を整列させ、融合された分割行列を用いて離散ラベルの学習を導く。収束性が検証された6段階の反復最適化手法を提案する。複数のデータセットに対する十分な実験により,提案手法の有効性と効率性を検証した。 Late fusion multi-view clustering (LFMVC) has become a rapidly growing class of methods in the multi-view clustering (MVC) field, owing to its excellent computational speed and clustering performance. One bottleneck faced by existing late fusion methods is that they are usually aligned to the average kernel function, which makes the clustering performance highly dependent on the quality of datasets. Another problem is that they require subsequent k-means clustering after obtaining the consensus partition matrix to get the final discrete labels, and the resulting separation of the label learning and cluster structure optimization processes limits the integrity of these models. To address the above issues, we propose an integrated framework named One-Step Late Fusion Multi-view Clustering with Compressed Subspace (OS-LFMVC-CS). Specifically, we use the consensus subspace to align the partition matrix while optimizing the partition fusion, and utilize the fused partition matrix to guide the learning of discrete labels. A six-step iterative optimization approach with verified convergence is proposed. Sufficient experiments on multiple datasets validate the effectiveness and efficiency of our proposed method.	翻訳日:2024-05-30 03:57:34 公開日:2024-05-28
# MatSynth: 最新のPBRマテリアルデータセット MatSynth: A Modern PBR Materials Dataset ( http://arxiv.org/abs/2401.06056v3 ) ライセンス: Link先を確認	Giuseppe Vecchio, Valentin Deschaintre,	(参考訳) 4,000以上のCC0ライセンスの超高解像度PBRマテリアルからなるデータセットMatSynthを紹介する。マテリアルは再照明可能な仮想アセットの重要な構成要素であり、ジオメトリ表面における光の相互作用を定義する。その重要性から、その表現、作成、取得に多大な研究努力が注がれてきた。しかし、過去6年間、マテリアルの取得や生成に関する研究の大半は、同一の唯一のデータセットか、企業が所有する巨大な手続き的マテリアルライブラリに依存していた。本データセットでは、これまで公開されていたものよりはるかに大規模で多様かつ高解像度のマテリアルセットを提案する。データ収集プロセスについて慎重に議論し,本データセットがマテリアル取得および生成アプリケーションにもたらす利点を実証する。完全なデータには、各マテリアルの出典、ライセンス、カテゴリ、タグ、作成方法、および利用可能な場合は説明と物理的サイズを含むメタデータに加え、様々な環境照明の下での拡張マテリアルの1K解像度レンダリング300万枚以上が含まれる。 MatSynthデータセットは、プロジェクトページ(https://www.gvecchio.com/matsynth)から公開されている。 We introduce MatSynth, a dataset of 4,000+ CC0 ultra-high resolution PBR materials. Materials are crucial components of virtual relightable assets, defining the interaction of light at the surface of geometries. Given their importance, significant research effort has been dedicated to their representation, creation and acquisition. However, in the past 6 years, most research in material acquisition or generation relied either on the same unique dataset, or on huge company-owned libraries of procedural materials. With this dataset we propose a significantly larger, more diverse, and higher resolution set of materials than previously publicly available. We carefully discuss the data collection process and demonstrate the benefits of this dataset on material acquisition and generation applications. The complete data further contains metadata with each material's origin, license, category, tags, creation method and, when available, descriptions and physical size, as well as 3M+ renderings of the augmented materials, in 1K, under various environment lightings. The MatSynth dataset is released through the project page at: https://www.gvecchio.com/matsynth.	翻訳日:2024-05-30 03:57:34 公開日:2024-05-28
# 消化器内視鏡検査における視覚問題のための自己教師あり事前学習に関する研究 A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy ( http://arxiv.org/abs/2401.06278v2 ) ライセンス: Link先を確認	Edward Sanderson, Bogdan J. Matuszewski,	(参考訳) 消化器内視鏡検査(GIE)における視覚タスクの解決策では、従来、ImageNet-1kを用いて教師ありで事前学習された画像エンコーダをバックボーンとして使用してきた。しかし、最新の自己教師あり事前学習アルゴリズムと、近年公開された10万枚のラベルなしGIE画像のデータセット(Hyperkvasir-unlabelled)を用いることで、改善が期待できる。本研究では,ImageNet-1k と Hyperkvasir-unlabelled(自己教師ありのみ)を用いて自己教師あり・教師ありで事前学習した ResNet50 および ViT-B バックボーンを持つモデルについて,様々なGIE視覚タスクにおける微調整後の性能を調べる。検討した中から各タスクに最も適した事前学習パイプラインとバックボーンアーキテクチャを特定することに加えて,本研究の結果は3つの一般原則を示唆している。第一に、自己教師あり事前学習は一般に、教師あり事前学習よりもGIE視覚タスクに適したバックボーンを生成する。第二に、ImageNet-1kを用いた自己教師あり事前学習は、大腸内視鏡における単眼深度推定という顕著な例外を除いて、通常Hyperkvasir-unlabelledによる事前学習よりも適している。第三に、ViT-Bは大腸内視鏡におけるポリープのセグメンテーションと単眼深度推定に適しており、ResNet50はポリープ検出に適しており、解剖学的ランドマーク認識と病理所見の特徴付けでは両アーキテクチャとも同程度の性能を示す。本研究がGIE視覚タスクのための事前学習の複雑さに注意を向け、慣例よりも適切な手法の開発に資するとともに、この開発を前進させるためのさらなる研究を促すことを願っている。コード: github.com/ESandML/SSL4GIE Solutions to vision tasks in gastrointestinal endoscopy (GIE) conventionally use image encoders pretrained in a supervised manner with ImageNet-1k as backbones. However, the use of modern self-supervised pretraining algorithms and a recent dataset of 100k unlabelled GIE images (Hyperkvasir-unlabelled) may allow for improvements. In this work, we study the fine-tuned performance of models with ResNet50 and ViT-B backbones pretrained in self-supervised and supervised manners with ImageNet-1k and Hyperkvasir-unlabelled (self-supervised only) in a range of GIE vision tasks. In addition to identifying the most suitable pretraining pipeline and backbone architecture for each task, out of those considered, our results suggest three general principles. Firstly, that self-supervised pretraining generally produces more suitable backbones for GIE vision tasks than supervised pretraining. Secondly, that self-supervised pretraining with ImageNet-1k is typically more suitable than pretraining with Hyperkvasir-unlabelled, with the notable exception of monocular depth estimation in colonoscopy. Thirdly, that ViT-Bs are more suitable in polyp segmentation and monocular depth estimation in colonoscopy, ResNet50s are more suitable in polyp detection, and both architectures perform similarly in anatomical landmark recognition and pathological finding characterisation. We hope this work draws attention to the complexity of pretraining for GIE vision tasks, informs the development of approaches more suitable than the convention, and inspires further research on this topic to help advance this development. Code available: github.com/ESandML/SSL4GIE	翻訳日:2024-05-30 03:57:34 公開日:2024-05-28
# INTERS: インストラクションチューニングによる検索における大規模言語モデルのパワーの解放 INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning ( http://arxiv.org/abs/2401.06532v3 ) ライセンス: Link先を確認	Yutao Zhu, Peitian Zhang, Chenghao Zhang, Yifei Chen, Binyu Xie, Zheng Liu, Ji-Rong Wen, Zhicheng Dou,	(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理タスクにおいて印象的な能力を示している。それにもかかわらず、情報検索(IR)タスクへのそれらの適用は、多くのIR固有の概念が自然言語中にまれにしか現れないため、いまだに困難である。プロンプトベースの手法はLLMにタスク記述を提供することができるが、IRタスクの包括的な理解と実行を促すには不十分なことが多く、LLMの適用性が制限される。このギャップに対処するため、本研究では、IRタスクにおけるLLMの習熟度を高めるために、命令チューニングの可能性について検討する。我々は,クエリ理解,文書理解,クエリ-文書関係理解という3つの基本的なIRカテゴリにまたがる20のタスクを含む,新しい命令チューニングデータセット INTERS を導入する。データは、人手で作成したテンプレートを用いて43の異なるデータセットから導出される。実験結果から、IRタスクにおいて、INTERSはLLaMA、Mistral、Phiといった様々な公開LLMの性能を大幅に向上させることが明らかとなった。さらに、命令設計、テンプレートの多様性、少数ショットのデモンストレーション、および命令の量が性能に与える影響を分析するための広範な実験を行った。データセットと微調整されたモデルをhttps://github.com/DaoD/INTERSで公開している。 Large language models (LLMs) have demonstrated impressive capabilities in various natural language processing tasks. Despite this, their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language. While prompt-based methods can provide task descriptions to LLMs, they often fall short in facilitating a comprehensive understanding and execution of IR tasks, thereby limiting LLMs' applicability. To address this gap, in this work, we explore the potential of instruction tuning to enhance LLMs' proficiency in IR tasks. We introduce a novel instruction tuning dataset, INTERS, encompassing 20 tasks across three fundamental IR categories: query understanding, document understanding, and query-document relationship understanding. The data are derived from 43 distinct datasets with manually written templates. Our empirical results reveal that INTERS significantly boosts the performance of various publicly available LLMs, such as LLaMA, Mistral, and Phi, in IR tasks. Furthermore, we conduct extensive experiments to analyze the effects of instruction design, template diversity, few-shot demonstrations, and the volume of instructions on performance. We make our dataset and the fine-tuned models publicly accessible at https://github.com/DaoD/INTERS.	翻訳日:2024-05-30 03:57:34 公開日:2024-05-28
# 地球ビジョンのための統一基盤モデルを目指して One for All: Toward Unified Foundation Models for Earth Vision ( http://arxiv.org/abs/2401.07527v2 ) ライセンス: Link先を確認	Zhitong Xiong, Yi Wang, Fahong Zhang, Xiao Xiang Zhu,	(参考訳) 広範囲なパラメータを特徴とし、大規模なデータセットで訓練された基礎モデルは、リモートセンシングデータのための様々な下流タスクにおいて顕著な有効性を示している。現在のリモートセンシング基盤モデルは、通常、単一のモダリティまたは特定の空間解像度範囲を専門とし、下流データセットの汎用性を制限する。マルチモーダルリモートセンシング基盤モデルの開発は試みられているが、通常、各モードや空間解像度に別々の視覚エンコーダを使用し、入力データに基づいてバックボーンのスイッチを必要とする。この問題に対処するために、単一共有トランスフォーマーバックボーンを用いて、空間解像度の異なる複数のデータモダリティを実現する、単純なOFA-Net(One-For-All Network)手法を提案する。マスク付き画像モデリング機構を用いて、このシンプルな設計で、キュレートされたマルチモーダルデータセット上で、1つのTransformerバックボーンを事前訓練する。次に、バックボーンモデルは異なる下流タスクで使用することができ、地球ビジョンにおける統一された基盤バックボーンモデルへの道を開くことができる。提案手法は,12の異なる下流タスクに対して評価し,有望な性能を示す。 Foundation models characterized by extensive parameters and trained on large-scale datasets have demonstrated remarkable efficacy across various downstream tasks for remote sensing data. Current remote sensing foundation models typically specialize in a single modality or a specific spatial resolution range, limiting their versatility for downstream datasets. While there have been attempts to develop multi-modal remote sensing foundation models, they typically employ separate vision encoders for each modality or spatial resolution, necessitating a switch in backbones contingent upon the input data. To address this issue, we introduce a simple yet effective method, termed OFA-Net (One-For-All Network): employing a single, shared Transformer backbone for multiple data modalities with different spatial resolutions. Using the masked image modeling mechanism, we pre-train a single Transformer backbone on a curated multi-modal dataset with this simple design. Then the backbone model can be used in different downstream tasks, thus forging a path towards a unified foundation backbone model in Earth vision. The proposed method is evaluated on 12 distinct downstream tasks and demonstrates promising performance.	翻訳日:2024-05-30 03:57:34 公開日:2024-05-28
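OFA-Net pre-trains its shared backbone with masked image modeling. The generic first step is to split an image into patches and hide a random subset, so the encoder sees only the visible patches and must reconstruct the rest. The sketch below shows only that step. The patch size and the 0.75 mask ratio are assumptions borrowed from common masked-autoencoder practice, not values stated in the abstract. The shared Transformer and the handling of different modalities and resolutions are not shown.

```python
import random

def patchify(image, patch):
    """Split an H x W image (list of rows) into non-overlapping patch x patch tiles, row-major."""
    h, w = len(image), len(image[0])
    tiles = []
    for r in range(0, h - h % patch, patch):
        for c in range(0, w - w % patch, patch):
            tiles.append([row[c:c + patch] for row in image[r:r + patch]])
    return tiles

def random_mask(n_patches, mask_ratio, rng):
    """Randomly split patch indices into (visible, masked), each sorted."""
    idx = list(range(n_patches))
    rng.shuffle(idx)
    n_mask = int(round(n_patches * mask_ratio))
    return sorted(idx[n_mask:]), sorted(idx[:n_mask])
```

Because masking operates on patch indices rather than channels, the same routine applies regardless of how many spectral bands a modality has. This is one plausible reason a single backbone can be shared across modalities, though the abstract does not spell out OFA-Net's exact mechanism.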
# 共同モデリングとマッチングを用いたフロントビュー・サラウンドビューからの3次元車線検出 3D Lane Detection from Front or Surround-View using Joint-Modeling & Matching ( http://arxiv.org/abs/2401.08036v2 ) ライセンス: Link先を確認	Haibin Zhou, Huabing Zhou, Jun Chang, Tao Lu, Jiayi Ma,	(参考訳) 3Dレーンは2Dレーンよりも道路表面の幾何形状をより包括的に把握させ、運転判断と軌道計画に重要な基準を提供する。多くの取り組みは予測精度の向上を目指しているが、我々は効率的なネットワークが結果をレーンモデリングに近づけ得ることを認識している。しかし、モデリングデータが不正確であれば、結果は実際のシナリオを正確に捉えられない可能性がある。したがって、予測結果を環境と密に整合させるためには、正確なレーンモデリングが不可欠である。本研究は効率的かつ正確なレーンモデリングに焦点を当て,ベジエ曲線と補間法を組み合わせた共同モデリング手法を提案する。さらに,このレーンモデリング手法に基づき,ベジエ制御点とキーポイントを用いたGlobal2Local Lane Matching法を開発した。これは2つの数理モデルによる階層的特徴を活用して精密なマッチングを保証する包括的な解決策である。また,3次元サラウンドビュー車線検出研究の探求として、新しい3次元空間エンコーダを導入する。このフレームワークは、フロントビューまたはサラウンドビューの3D車線検出に適している。 3次元空間においてレーンのキーポイントを直接出力することにより、アンカーベース手法の限界を克服し、閉ループやU字形のレーンの正確な予測と複雑な道路条件への効果的な適応を可能にする。この革新的な手法は、Openlaneデータセットのフロントビュー3D車線検出において新たなベンチマークを確立し、Argoverse2データセットのサラウンドビュー2D車線検出において競争力のある性能を達成する。 3D lanes offer a more comprehensive understanding of the road surface geometry than 2D lanes, thereby providing crucial references for driving decisions and trajectory planning. While many efforts aim to improve prediction accuracy, we recognize that an efficient network can bring results closer to lane modeling. However, if the modeling data is imprecise, the results might not accurately capture the real-world scenario. Therefore, accurate lane modeling is essential to align prediction results closely with the environment. This study centers on efficient and accurate lane modeling, proposing a joint modeling approach that combines Bezier curves and interpolation methods. Furthermore, based on this lane modeling approach, we developed a Global2Local Lane Matching method with Bezier Control-Point and Key-Point, which serve as a comprehensive solution that leverages hierarchical features with two mathematical models to ensure a precise match. We also introduce a novel 3D Spatial Encoder, representing an exploration of 3D surround-view lane detection research. The framework is suitable for front-view or surround-view 3D lane detection. By directly outputting the key points of lanes in 3D space, it overcomes the limitations of anchor-based methods, enabling accurate prediction of closed-loop or U-shaped lanes and effective adaptation to complex road conditions. This innovative method establishes a new benchmark in front-view 3D lane detection on the Openlane dataset and achieves competitive performance in surround-view 2D lane detection on the Argoverse2 dataset.	翻訳日:2024-05-30 03:47:50 公開日:2024-05-28
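The Bezier half of the joint lane model can be illustrated by evaluating a 3D Bezier curve from its control points with the Bernstein basis. Densely sampled points like these could then serve as key points for matching. This sketch is a generic Bezier evaluation under that assumption. The paper's interpolation component and its Global2Local matching are not reproduced, and the control points used in the tests are made up.

```python
from math import comb

def bezier_point(ctrl, t):
    """Evaluate a Bezier curve with 3D control points at parameter t in [0, 1]."""
    n = len(ctrl) - 1
    point = [0.0, 0.0, 0.0]
    for i, p in enumerate(ctrl):
        w = comb(n, i) * (1 - t) ** (n - i) * t ** i  # Bernstein basis weight
        for k in range(3):
            point[k] += w * p[k]
    return point

def sample_lane(ctrl, n_points):
    """Sample n_points evenly spaced in t (not arc length) along the lane curve."""
    return [bezier_point(ctrl, i / (n_points - 1)) for i in range(n_points)]
```

A cubic curve needs only four control points, which is compact to regress. However, a single Bezier segment cannot represent every lane shape, which may be why the paper pairs it with interpolation.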
# ラベルのない共変量シフト下でのモデル性能の推定 Estimating Model Performance Under Covariate Shift Without Labels ( http://arxiv.org/abs/2401.08348v3 ) ライセンス: Link先を確認	Jakub Białek, Wojtek Kuberski, Nikolaos Perrakis, Albert Bifet,	(参考訳) 機械学習モデルは、データ分布の変化により、デプロイ後に性能低下を経験することが多い。ラベルが欠落している、あるいは遅延している場合、モデルの性能を正確に評価することは困難である。ドリフト検出のような既存の代理手法では、これらのシフトの影響を十分に測定できない。そこで本研究では,ラベルなしデータ上で分類モデルを評価し、共変量シフトがモデル性能に与える影響を正確に定量化する新しい手法、確率的適応性能推定(PAPE)を提案する。本手法はモデルとデータタイプに依存せず、様々な性能指標に対して機能する。重要なことに、PAPEは元のモデルとは独立に動作し、その予測と確率推定のみに依存し、共変量シフトの性質に関する仮定を一切必要とせず、代わりにデータから直接学習する。我々は、米国国勢調査データから作成した900以上のデータセットとモデルの組み合わせを用いて表形式データ上でPAPEを検証し、複数のベンチマークに対してその性能を評価した。全体として、PAPEは他の評価手法よりも正確な性能推定を提供した。 Machine learning models often experience performance degradation post-deployment due to shifts in data distribution. It is challenging to assess a model's performance accurately when labels are missing or delayed. Existing proxy methods, such as drift detection, fail to measure the effects of these shifts adequately. To address this, we introduce a new method, Probabilistic Adaptive Performance Estimation (PAPE), for evaluating classification models on unlabeled data that accurately quantifies the impact of covariate shift on model performance. It is model and data-type agnostic and works for various performance metrics. Crucially, PAPE operates independently of the original model, relying only on its predictions and probability estimates, and does not need any assumptions about the nature of the covariate shift, learning directly from data instead. We tested PAPE on tabular data using over 900 dataset-model combinations created from US census data, assessing its performance against multiple benchmarks. Overall, PAPE provided more accurate performance estimates than other evaluated methodologies.	翻訳日:2024-05-30 03:47:50 公開日:2024-05-28
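Label-free performance estimation usually starts from one building block: if predicted probabilities are calibrated on the shifted data, the expected accuracy equals the average probability that each thresholded prediction is correct. The sketch below shows that building block for binary classification only. It is not PAPE itself. The abstract does not describe how PAPE obtains calibrated probabilities under covariate shift, and that step is precisely what this sketch assumes away.

```python
def expected_accuracy(probs, threshold=0.5):
    """Estimate binary accuracy from calibrated positive-class probabilities, without labels."""
    total = 0.0
    for p in probs:
        # Predicting 1 (p >= threshold) is correct with probability p; predicting 0, with 1 - p.
        total += p if p >= threshold else 1.0 - p
    return total / len(probs)
```

If the probabilities are miscalibrated on the new data, this estimate inherits that bias. Adapting the calibration to the shifted distribution is therefore the crucial part.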
# MorphGrower: もっともらしい神経形態生成のための同期型層ごと成長アプローチ MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation ( http://arxiv.org/abs/2401.09500v3 ) ライセンス: Link先を確認	Nianzu Yang, Kaipeng Zeng, Haotian Lu, Yexin Wu, Zexin Yuan, Danni Chen, Shengdian Jiang, Jiaxiang Wu, Yimin Wang, Junchi Yan,	(参考訳) 神経形態は脳機能の研究と神経変性疾患の理解に不可欠である。実世界の形態データの取得は費用がかかるため、形態生成のための計算手法が研究されている。従来の手法は専門家が設定したルールやパラメータ調整に大きく依存しており、様々な種類の形態にわたって一般化することが難しい。近年、MorphVAEが唯一の学習ベース手法として導入されたが、その生成形態は妥当性に欠けており、すなわち十分に現実的には見えず、生成サンプルの大半は位相的に無効である。このギャップを埋めるために、本稿では生成においてニューロンの自然な成長機構を模倣するMorphGrowerを提案する。具体的には、MorphGrowerは形態を層ごとに生成し、後続の各層は以前に生成された構造に条件付けされる。各層の生成において、MorphGrowerは兄弟枝のペアを基本的な生成ブロックとして使用し、枝ペアを同期的に生成する。このアプローチは位相的妥当性を保証し、きめ細かな生成を可能にすることで、最終的に生成される形態の現実性を高める。 4つの実世界のデータセットでの結果は、MorphGrowerがMorphVAEを顕著な差で上回ることを示している。重要なことに、電気生理学的応答シミュレーションは、神経科学の観点から生成サンプルの妥当性を示している。コードはhttps://github.com/Thinklab-SJTU/MorphGrowerで公開されている。 Neuronal morphology is essential for studying brain functioning and understanding neurodegenerative disorders. As acquiring real-world morphology data is expensive, computational approaches for morphology generation have been studied. Traditional methods heavily rely on expert-set rules and parameter tuning, making it difficult to generalize across different types of morphologies. Recently, MorphVAE was introduced as the sole learning-based method, but its generated morphologies lack plausibility, i.e., they do not appear realistic enough and most of the generated samples are topologically invalid. To fill this gap, this paper proposes MorphGrower, which mimics the natural growth mechanism of neurons for generation. Specifically, MorphGrower generates morphologies layer by layer, with each subsequent layer conditioned on the previously generated structure. During each layer generation, MorphGrower utilizes a pair of sibling branches as the basic generation block and generates branch pairs synchronously. This approach ensures topological validity and allows for fine-grained generation, thereby enhancing the realism of the final generated morphologies. Results on four real-world datasets demonstrate that MorphGrower outperforms MorphVAE by a notable margin. Importantly, the electrophysiological response simulation demonstrates the plausibility of our generated samples from a neuroscience perspective. Our code is available at https://github.com/Thinklab-SJTU/MorphGrower.	翻訳日:2024-05-30 03:47:50 公開日:2024-05-28
# PatchAD: 時系列異常検知のための軽量なパッチベースMLP-Mixer PatchAD: A Lightweight Patch-based MLP-Mixer for Time Series Anomaly Detection ( http://arxiv.org/abs/2401.09793v5 ) ライセンス: Link先を確認	Zhijie Zhong, Zhiwen Yu, Yiyuan Yang, Weizheng Wang, Kaixiang Yang,	(参考訳) 時系列解析における異常検知は重要な課題であるが、ラベルが不足するシナリオで正常パターンと異常パターンを識別するという難しさがある。従来の研究の多くは再構成ベースのアプローチを採用しており、これがモデルの表現能力を制限していた。さらに、既存の深層学習ベースの手法は十分に軽量ではない。これらの問題に対処するため、表現抽出と異常検知にコントラスト学習を用いる、新規かつ高効率なマルチスケール・パッチベースのMLP-MixerアーキテクチャであるPatchADを提案する。4つの異なるMLP Mixerと新しいデュアルプロジェクト制約モジュールにより、PatchADは潜在的なモデル劣化を軽減し、わずか$3.2$MBで済む軽量なソリューションを提供する。その有効性は、異なる応用シナリオから得られた$9$個のデータセットでの最先端の結果によって実証されており、$30$を超える比較アルゴリズムを上回っている。PatchADは古典的なF1スコアを$50.5\%$、Aff-F1スコアを$7.8\%$、AUCを$10.0\%$大幅に改善する。コードは公開されている。 \url{https://github.com/EmorZz1G/PatchAD} Anomaly detection in time series analysis is a pivotal task, yet it poses the challenge of discerning normal and abnormal patterns in label-deficient scenarios. Prior studies have largely employed reconstruction-based approaches, which limit the models' representational capacities. Moreover, existing deep learning-based methods are not sufficiently lightweight. To address these issues, we present PatchAD, a novel, highly efficient multiscale patch-based MLP-Mixer architecture that utilizes contrastive learning for representation extraction and anomaly detection. With its four distinct MLP Mixers and innovative dual project constraint module, PatchAD mitigates potential model degradation and offers a lightweight solution, requiring only $3.2$MB. Its efficacy is demonstrated by state-of-the-art results across $9$ datasets sourced from different application scenarios, outperforming more than $30$ comparative algorithms. PatchAD significantly improves the classical F1 score by $50.5\%$, the Aff-F1 score by $7.8\%$, and the AUC by $10.0\%$. The code is publicly available. \url{https://github.com/EmorZz1G/PatchAD}	翻訳日:2024-05-30 03:47:50 公開日:2024-05-28
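PatchAD is described as a multiscale patch-based MLP-Mixer. The Mixer itself is out of scope here. The sketch below only illustrates the generic patching step that models of this kind typically start from: one time-series window viewed as non-overlapping patches at several patch sizes. The patch sizes and the choice to drop a trailing remainder are illustrative assumptions, not values taken from the paper.

```python
from typing import Dict, List, Sequence


def make_patches(series: Sequence[float], patch_size: int) -> List[List[float]]:
    """Split a 1-D series into consecutive, non-overlapping patches.

    A trailing remainder shorter than `patch_size` is dropped (an assumption;
    padding it instead would be an equally valid choice).
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    n_full = len(series) // patch_size
    return [list(series[i * patch_size:(i + 1) * patch_size]) for i in range(n_full)]


def multiscale_patches(series: Sequence[float],
                       patch_sizes: Sequence[int]) -> Dict[int, List[List[float]]]:
    """View the same window at several patch sizes (one view per scale)."""
    return {p: make_patches(series, p) for p in patch_sizes}


window = [float(v) for v in range(12)]
views = multiscale_patches(window, [2, 3, 5])
for size, patches in views.items():
    print(size, len(patches), patches[0])
```

Small patches expose local shape, and large patches summarize coarser trends. A multiscale model can then mix information within and across patches at each scale.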
# 相関した格子QCDアンサンブル生成へのフローモデルの応用 Applications of flow models to the generation of correlated lattice QCD ensembles ( http://arxiv.org/abs/2401.10874v2 ) ライセンス: Link先を確認	Ryan Abbott, Aleksandar Botev, Denis Boyda, Daniel C. Hackett, Gurtej Kanwar, Sébastien Racanière, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, Julian M. Urban,	(参考訳) 機械学習による正規化フローは、格子量子場理論の文脈において、異なる作用パラメータにおける格子ゲージ場の統計的に相関したアンサンブルを生成するために用いることができる。本研究では、これらの相関を観測量の計算における分散低減にどのように活用できるかを示す。新しい残差フローアーキテクチャを用いて、3つの異なる概念実証の応用を示す。すなわち、ゲージ理論の連続極限、QCD観測量の質量依存性、およびファインマン・ヘルマンの方法に基づくハドロン行列要素である。いずれの場合も、相関のないアンサンブルや直接的な再重み付けで行った同じ計算と比較して、機械学習フローを組み込むと統計的不確かさが大幅に低減されることが示される。 Machine-learned normalizing flows can be used in the context of lattice quantum field theory to generate statistically correlated ensembles of lattice gauge fields at different action parameters. This work demonstrates how these correlations can be exploited for variance reduction in the computation of observables. Three different proof-of-concept applications are demonstrated using a novel residual flow architecture: continuum limits of gauge theories, the mass dependence of QCD observables, and hadronic matrix elements based on the Feynman-Hellmann approach. In all three cases, it is shown that statistical uncertainties are significantly reduced when machine-learned flows are incorporated as compared with the same calculations performed with uncorrelated ensembles or direct reweighting.	翻訳日:2024-05-30 03:47:50 公開日:2024-05-28
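The variance-reduction principle can be seen in a toy Monte Carlo setting with nothing lattice-specific in it. When an observable is computed at two nearby parameters, estimating the difference from correlated samples cancels most of the shared fluctuation. Below, "correlated" simply reuses the same Gaussian draw for both parameters. This is a hypothetical stand-in for what a learned flow provides. The observable and the parameter values are illustrative only.

```python
import math
import random
import statistics
from typing import List


def observable(x: float, theta: float) -> float:
    """Toy observable of one sampled configuration x at action parameter theta."""
    return (x + theta) ** 2


def difference_samples(theta_a: float, theta_b: float, n: int,
                       correlated: bool, seed: int = 0) -> List[float]:
    """Per-sample estimates of O(theta_a) - O(theta_b).

    correlated=True reuses the same draw for both parameters (a stand-in for a
    map between correlated ensembles); correlated=False draws them independently.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        xa = rng.gauss(0.0, 1.0)
        xb = xa if correlated else rng.gauss(0.0, 1.0)
        diffs.append(observable(xa, theta_a) - observable(xb, theta_b))
    return diffs


def standard_error(values: List[float]) -> float:
    return statistics.stdev(values) / math.sqrt(len(values))


for correlated in (False, True):
    d = difference_samples(1.0, 1.1, 20_000, correlated)
    print(f"correlated={correlated}: mean={statistics.mean(d):.4f} "
          f"stderr={standard_error(d):.4f}")
```

Both estimators target the same value, $\theta_a^2 - \theta_b^2 = -0.21$. With a shared draw, the per-sample difference is $(\theta_a - \theta_b)(2x + \theta_a + \theta_b)$, whose variance $4(\theta_a - \theta_b)^2 = 0.04$ is small for nearby parameters. Independent draws instead add the two full variances, $4 + 4(\theta_a^2 + \theta_b^2) = 12.84$.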
# The Great Ban: Redditにおける大規模なデプラットフォーミング作戦の効果と意図しない結果 The Great Ban: Efficacy and Unintended Consequences of a Massive Deplatforming Operation on Reddit ( http://arxiv.org/abs/2401.11254v5 ) ライセンス: Link先を確認	Lorenzo Cima, Amaury Trujillo, Marco Avvenuti, Stefano Cresci,	(参考訳) オンライン上の悪用や危害が蔓延する現状では、安全で包摂的なオンライン空間を育むために効果的なコンテンツモデレーションが必要である。しかし、多くのモデレーション介入の有効性はいまだ明らかではない。本研究では、Reddit上の約2,000のコミュニティに影響を与えた大規模なデプラットフォーミング作戦であるThe Great Banの有効性を評価する。14か月間に1万7千人のユーザーが投稿した1,600万件のコメントを分析し、この禁止措置の効果について、意図したものとそうでないものの両面から詳細な結果を示す。主な知見として、影響を受けたユーザーの15.6%がRedditを離れ、残ったユーザーは毒性を平均6.6%低下させた。一方で、この禁止措置によりユーザーの5%は毒性を禁止前の水準から70%以上増加させた。全体として、我々の多面的な結果はデプラットフォーミングの有効性について新たな知見を与える。したがって、本研究の知見は今後のモデレーション介入の開発やオンラインプラットフォームの取り締まりに役立ちうる。 In the current landscape of online abuses and harms, effective content moderation is necessary to cultivate safe and inclusive online spaces. Yet, the effectiveness of many moderation interventions is still unclear. Here, we assess the effectiveness of The Great Ban, a massive deplatforming operation that affected nearly 2,000 communities on Reddit. By analyzing 16M comments posted by 17K users during 14 months, we provide nuanced results on the effects, both desired and otherwise, of the ban. Among our main findings is that 15.6% of the affected users left Reddit and that those who remained reduced their toxicity by 6.6% on average. The ban also caused 5% of users to increase their toxicity by more than 70% of their pre-ban level. Overall, our multifaceted results provide new insights into the efficacy of deplatforming. As such, our findings can inform the development of future moderation interventions and the policing of online platforms.	翻訳日:2024-05-30 03:47:50 公開日:2024-05-28
# MF-AED-AEC: マルチモーダル融合、ASR誤り検出、ASR誤り訂正を活用した音声感情認識 MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, Asr Error Detection, and Asr Error Correction ( http://arxiv.org/abs/2401.13260v2 ) ライセンス: Link先を確認	Jiajun He, Xiaohan Shi, Xingfeng Li, Tomoki Toda,	(参考訳) 音声感情認識(SER)における一般的なアプローチは、話者の感情を包括的に識別するために音声情報とテキスト情報の両方を統合するものであり、テキストは一般に自動音声認識(ASR)によって得られる。このアプローチの重要な問題は、テキストモダリティに含まれるASR誤りがSERの性能を悪化させうることである。先行研究では、補助的なASR誤り検出タスクを用いて、ASR仮説中の各単語に重みを適応的に割り当てることが提案されている。しかし、この方法はテキスト中の意味情報の一貫性に対処しないため、改善の余地が限られる。さらに、異なるモダリティに固有の異質性により、それらの表現の間に分布のギャップが生じ、融合が難しくなる。そこで本稿では、ASRテキストの意味的一貫性を高めるためにASR誤り検出(AED)とASR誤り訂正(AEC)という2つの補助タスクを組み込み、さらにモダリティ間で共有される表現を学習するための新しいマルチモーダル融合(MF)手法を導入する。本手法をMF-AED-AECと呼ぶ。実験の結果、MF-AED-AECはベースラインモデルを4.1%の差で大きく上回ることが示された。 The prevalent approach in speech emotion recognition (SER) involves integrating both audio and textual information to comprehensively identify the speaker's emotion, with the text generally obtained through automatic speech recognition (ASR). An essential issue of this approach is that ASR errors from the text modality can worsen the performance of SER. Previous studies have proposed using an auxiliary ASR error detection task to adaptively assign weights to each word in ASR hypotheses. However, this approach has limited improvement potential because it does not address the coherence of semantic information in the text. Additionally, the inherent heterogeneity of different modalities leads to distribution gaps between their representations, making their fusion challenging. Therefore, in this paper, we incorporate two auxiliary tasks, ASR error detection (AED) and ASR error correction (AEC), to enhance the semantic coherence of ASR text, and further introduce a novel multi-modal fusion (MF) method to learn shared representations across modalities. We refer to our method as MF-AED-AEC. Experimental results indicate that MF-AED-AEC significantly outperforms the baseline model by a margin of 4.1%.	翻訳日:2024-05-30 03:47:50 公開日:2024-05-28
# MM-LLM:マルチモーダル大言語モデルの最近の進歩 MM-LLMs: Recent Advances in MultiModal Large Language Models ( http://arxiv.org/abs/2401.13601v5 ) ライセンス: Link先を確認	Duzhen Zhang, Yahan Yu, Jiahua Dong, Chenxing Li, Dan Su, Chenhui Chu, Dong Yu,	(参考訳) 過去1年間で、MM-LLM(MultiModal Large Language Models)は大幅に進歩し、MM入力やアウトプットをコスト効率のよいトレーニング戦略を通じてサポートするために、既製のLLMを拡張した。結果として得られたモデルは、LLMの固有の推論と意思決定能力を保持するだけでなく、様々なMMタスクの強化にも寄与する。本稿では,MM-LLMのさらなる研究を促進するための総合的な調査を行う。まず、モデルアーキテクチャとトレーニングパイプラインのための一般的な設計の定式化について概説する。その後,126個のMM-LLMを包含する分類法を導入し,それぞれにその特異な定式化を特徴とする。さらに,主要なベンチマークで選択したMM-LLMの性能を概観し,MM-LLMの有効性を高めるための鍵となるトレーニングレシピを要約する。最後に,MM-LLMの今後の方向性を検討するとともに,現場の最新開発のためのリアルタイム追跡Webサイトを同時に維持する。この調査がMM-LLMsドメインの継続的な進歩に寄与することを願っている。 In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also empower a diverse range of MM tasks. In this paper, we provide a comprehensive survey aimed at facilitating further research of MM-LLMs. Initially, we outline general design formulations for model architecture and training pipeline. Subsequently, we introduce a taxonomy encompassing 126 MM-LLMs, each characterized by its specific formulations. Furthermore, we review the performance of selected MM-LLMs on mainstream benchmarks and summarize key training recipes to enhance the potency of MM-LLMs. Finally, we explore promising directions for MM-LLMs while concurrently maintaining a real-time tracking website for the latest developments in the field. We hope that this survey contributes to the ongoing advancement of the MM-LLMs domain.	翻訳日:2024-05-30 03:47:50 公開日:2024-05-28
# アルゴリズムシステムの保証監査のためのフレームワーク A Framework for Assurance Audits of Algorithmic Systems ( http://arxiv.org/abs/2401.14908v2 ) ライセンス: Link先を確認	Khoa Lam, Benjamin Lange, Borhane Blili-Hamelin, Jovana Davidovic, Shea Brown, Ali Hasan,	(参考訳) 人工知能(AI)システムの透明性と説明責任を実現する仕組みとして、AI監査を提案する規制が増えている。さまざまな形態のAI監査についていくつかの規範が収束しつつあるものの、コンプライアンスと保証を目的とした監査には、合意された慣行、手続き、分類体系、標準がまだ欠けている。本稿では、運用可能なコンプライアンスおよび保証のための外部監査の枠組みとして、基準監査(criterion audit)を提案する。我々はこのアプローチの要素を財務監査の慣行に倣ってモデル化し、AI監査も同様に、AI組織が害を軽減し人間の価値を守る形でアルゴリズムを統治できることについて、ステークホルダーに保証を与えるべきだと主張する。基準監査に必要な条件を議論し、実際に監査業務を行うための手続き上の青写真を示す。また、最近施行されたニューヨーク市の2021年地方法第144号が求めるように、適用対象となる採用アルゴリズムに対してバイアス監査を行うための基準を導出することで、この枠組みを現行の規制に適合させる方法を示す。最後に、品質保証の問題に対する堅牢なガードレールが整い始めたばかりのAI監査に、より成熟した財務監査業界の慣行を適用することの利点、本質的な限界、実装上の課題について批判的に議論する。実際にこれらの監査を行った経験に基づく我々の議論は、監査の有効性を確保するうえで監査エコシステムが果たす重要な役割を浮き彫りにする。 An increasing number of regulations propose AI audits as a mechanism for achieving transparency and accountability for artificial intelligence (AI) systems. Despite some converging norms around various forms of AI auditing, auditing for the purpose of compliance and assurance currently lacks agreed-upon practices, procedures, taxonomies, and standards. We propose the criterion audit as an operationalizable compliance and assurance external audit framework. We model elements of this approach after financial auditing practices, and argue that AI audits should similarly provide assurance to their stakeholders about AI organizations' ability to govern their algorithms in ways that mitigate harms and uphold human values. We discuss the necessary conditions for the criterion audit and provide a procedural blueprint for performing an audit engagement in practice. We illustrate how this framework can be adapted to current regulations by deriving the criteria on which bias audits can be performed for in-scope hiring algorithms, as required by the recently effective New York City Local Law 144 of 2021. We conclude by offering a critical discussion on the benefits, inherent limitations, and implementation challenges of applying practices of the more mature financial auditing industry to AI auditing where robust guardrails against quality assurance issues are only starting to emerge. Our discussion -- informed by experiences in performing these audits in practice -- highlights the critical role that an audit ecosystem plays in ensuring the effectiveness of audits.	翻訳日:2024-05-30 03:47:50 公開日:2024-05-28
# 履歴を考慮した会話型密検索 History-Aware Conversational Dense Retrieval ( http://arxiv.org/abs/2401.16659v3 ) ライセンス: Link先を確認	Fengran Mo, Chen Qu, Kelong Mao, Tianyu Zhu, Zhan Su, Kaiyu Huang, Jian-Yun Nie,	(参考訳) 会話型検索は、ユーザーとシステムの間のマルチターンのやり取りを可能にすることで、複雑な情報検索を容易にする。こうしたやり取りを支えるには、履歴情報に基づいて良い検索クエリを作成できるよう、会話入力を包括的に理解する必要がある。特に、検索クエリには以前の会話ターンから関連する情報を含めるべきである。しかし、現在の会話型密検索のアプローチは主に、長くノイズを含みうる会話検索セッション全体を用いて、事前学習済みのアドホック検索器をファインチューニングすることに依存している。さらに、既存のアプローチは既存データセット中の人手による教師信号の量によって制限される。これらの問題に対処するため、文脈のノイズを除去したクエリ再構成と、履歴ターンの実際の影響に基づく教師信号の自動マイニングという2つのアイデアを取り入れた、履歴を考慮した会話型密検索(HAConvDR)システムを提案する。2つの公開会話検索データセットでの実験により、特にトピックが移り変わる長い会話において、HAConvDRの履歴モデリング能力が向上することが示された。 Conversational search facilitates complex information retrieval by enabling multi-turn interactions between users and the system. Supporting such interactions requires a comprehensive understanding of the conversational inputs to formulate a good search query based on historical information. In particular, the search query should include the relevant information from the previous conversation turns. However, current approaches for conversational dense retrieval primarily rely on fine-tuning a pre-trained ad-hoc retriever using the whole conversational search session, which can be lengthy and noisy. Moreover, existing approaches are limited by the amount of manual supervision signals in the existing datasets. To address the aforementioned issues, we propose a History-Aware Conversational Dense Retrieval (HAConvDR) system, which incorporates two ideas: context-denoised query reformulation and automatic mining of supervision signals based on the actual impact of historical turns. Experiments on two public conversational search datasets demonstrate the improved history modeling capability of HAConvDR, in particular for long conversations with topic shifts.	翻訳日:2024-05-30 03:47:50 公開日:2024-05-28
# ユニタリミラー回路による測定誘起エンタングルメント転移の検出 Detecting Measurement-Induced Entanglement Transitions With Unitary Mirror Circuits ( http://arxiv.org/abs/2401.17367v2 ) ライセンス: Link先を確認	Yariv Yanay, Brian Swingle, Charles Tahan,	(参考訳) 2量子ビットのエンタングリングゲートの層と、量子ビットのある割合$p$に適用される射影的な単一量子ビット測定の層を交互に重ねたモニター付きランダム回路は、近年注目を集めている。特に、結果として得られる定常状態は、$p<p_{c}$における「体積則」のエンタングルメントをもつ高相関状態から、$p>p_{c}$における「面積則」のエンタングルメントをもつ局在状態への相転移を示す。この転移はアンサンブルのレベルでは見えないため、実験的に観測するのは難しい。素朴には、観測するには測定結果の組が再び現れるまで実験を繰り返す必要があり、その確率は測定回数に対して指数関数的に小さい。この問題を克服するため、射影回路の行列積状態(MPS)に基づく「ユニタリミラー」を作成するハイブリッド量子古典アルゴリズムを提案する。多項式サイズのテンソルネットワークは面積則のエンタングルメントをもつ量子状態を表現できるため、ユニタリミラーは$p_{c}$より上では実験状態をよく近似できるが、それより下では指数関数的に失敗する。したがって、このミラーが破れる点から臨界点を特定できる。本稿ではアルゴリズムの概要と、そうした結果がどのように得られるかを述べる。MPSでよく表現される任意の状態の最大エンタングルメントエントロピーに対する上界を示し、その上界から体積則相をどのように抑えられるかを提案する。また、MPSが失敗する領域でエンタングルメントを同様に下から抑えられるかどうかを検討する。最後に、少数の量子ビットの場合と、ランダムなクリフォードゲートを用いたモニター付き回路についての数値結果を示す。 Monitored random circuits, consisting of alternating layers of entangling two-qubit gates and projective single-qubit measurements applied to some fraction $p$ of the qubits, have been a topic of recent interest. In particular, the resulting steady state exhibits a phase transition from highly correlated states with "volume-law" entanglement at $p<p_{c}$ to localized states with "area-law" entanglement at $p>p_{c}$. It is hard to access this transition experimentally, as it cannot be seen at the ensemble level. Naively, to observe it one must repeat the experiment until the set of measurement results repeats itself, with likelihood that is exponentially small in the number of measurements. To overcome this issue, we present a hybrid quantum-classical algorithm which creates a matrix product state (MPS) based "unitary mirror" of the projected circuit. Polynomial-sized tensor networks can represent quantum states with area-law entanglement, and so the unitary mirror can well-approximate the experimental state above $p_{c}$ but fails exponentially below it. The breaking of this mirror can thus pinpoint the critical point. We outline the algorithm and how such results would be obtained. We present a bound on the maximum entanglement entropy of any given state that is well-represented by an MPS, and from the bound suggest how the volume-law phase can be bounded. We consider whether the entanglement could similarly be bounded from below where the MPS fails. Finally, we present numerical results for small qubit numbers and for monitored circuits with random Clifford gates.	翻訳日:2024-05-30 03:47:50 公開日:2024-05-28
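The entropy bound mentioned above rests on a standard property of matrix product states. It is sketched here for an exact MPS; the paper's bound concerns states that are only well approximated by an MPS and may be stated differently. With bond dimension $\chi$, any bipartition $A|B$ of the chain has Schmidt rank $r \le \chi$. The entropy of $r$ Schmidt weights is maximized when all the weights are equal:

```latex
|\psi\rangle = \sum_{k=1}^{r} \sqrt{\lambda_k}\,|u_k\rangle_A |v_k\rangle_B,
\qquad r \le \chi,\qquad \sum_{k=1}^{r}\lambda_k = 1,

S_A = -\sum_{k=1}^{r} \lambda_k \log \lambda_k \;\le\; \log r \;\le\; \log \chi .
```

A volume-law state on $L$ qubits can have $S_A \approx (L/2)\log 2$ across the middle cut, which would require $\chi \ge 2^{L/2}$. This exponential growth is why a polynomial-size MPS mirror can only be expected to track the experiment in the area-law phase.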
# 量子リップルキャリー加算器および比較器におけるTゲートとCNOTゲートの最適化 Optimizing T and CNOT Gates in Quantum Ripple-Carry Adders and Comparators ( http://arxiv.org/abs/2401.17921v3 ) ライセンス: Link先を確認	Maxime Remaud,	(参考訳) 2つのnビット数の加算と比較にリップルキャリー戦略を用いる量子回路の最新技術を概観するとともに、Clifford+Tゲートセットにおける最適化を、CNOT深さとT深さ、あるいはCNOT数とT数の両面から示す。特に、Cuccaroら、およびTakahashiらが提示した加算器を取り上げ、元の回路を最適化しなければT深さ6nが見込まれるところを、T深さ3n、CNOT深さ8nの加算器として示す。なお、ここでは高々1つのアンシラを用いる量子リップルキャリー加算器に焦点を当てており、関与する3量子ビットゲート(Toffoli、Peres、TR)の近似や、測定を伴う戦略は一切用いていない。 The state of the art of quantum circuits using the ripple-carry strategy for the addition and comparison of two n-bit numbers is presented, as well as optimizations in the Clifford+T gate set, both in terms of CNOT-depth and T-depth, or CNOT-count and T-count. In particular, we consider the adders presented by Cuccaro et al. and Takahashi et al., and exhibit an adder with a T-depth of 3n and a CNOT-depth of 8n, while without optimization of the original circuits, a T-depth of 6n is expected. Note that we have focused here on quantum ripple-carry adders using at most one ancilla, without any approximation of the 3-qubit gates involved (Toffoli, Peres and TR) or any strategy involving a measurement.	翻訳日:2024-05-30 03:47:50 公開日:2024-05-28
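The MAJ/UMA ripple-carry structure of the Cuccaro et al. adder can be made concrete with a classical bit-level simulation. On computational-basis inputs, CNOT acts as XOR and Toffoli as AND-into-XOR, so the circuit's arithmetic can be checked exhaustively. This sketch shows only the logical structure with one ancilla. It says nothing about the Clifford+T decompositions or the depth optimizations the paper is about, and the wire layout is an arbitrary choice.

```python
from typing import List, Tuple


def cnot(q: List[int], control: int, target: int) -> None:
    q[target] ^= q[control]


def toffoli(q: List[int], c1: int, c2: int, target: int) -> None:
    q[target] ^= q[c1] & q[c2]


def maj(q: List[int], c: int, b: int, a: int) -> None:
    """MAJ block: afterwards wire a holds the carry majority(a, b, c)."""
    cnot(q, a, b)
    cnot(q, a, c)
    toffoli(q, c, b, a)


def uma(q: List[int], c: int, b: int, a: int) -> None:
    """UnMajority-and-Add (2-CNOT form): restores a and c, writes the sum bit into b."""
    toffoli(q, c, b, a)
    cnot(q, a, c)
    cnot(q, c, b)


def ripple_carry_add(x: int, y: int, n: int) -> Tuple[int, bool]:
    """Return (x + y, inputs_restored) for n-bit x and y.

    Wire layout (an assumption of this sketch): 0 = ancilla carry-in,
    1 + 2i = a_i (bits of x), 2 + 2i = b_i (bits of y), 2n + 1 = carry-out z.
    """
    def a(i: int) -> int:
        return 1 + 2 * i

    def b(i: int) -> int:
        return 2 + 2 * i

    z = 2 * n + 1
    q = [0] * (2 * n + 2)
    for i in range(n):
        q[a(i)] = (x >> i) & 1
        q[b(i)] = (y >> i) & 1

    maj(q, 0, b(0), a(0))
    for i in range(1, n):
        maj(q, a(i - 1), b(i), a(i))   # the carry ripples up through the a wires
    cnot(q, a(n - 1), z)               # copy the final carry out
    for i in range(n - 1, 0, -1):
        uma(q, a(i - 1), b(i), a(i))   # uncompute carries and write sum bits
    uma(q, 0, b(0), a(0))

    total = sum(q[b(i)] << i for i in range(n)) + (q[z] << n)
    restored = q[0] == 0 and all(q[a(i)] == (x >> i) & 1 for i in range(n))
    return total, restored


print(ripple_carry_add(11, 6, 4))
```

The circuit applies 2n Toffoli gates one after another, one per MAJ and one per UMA. If each Toffoli were compiled separately at T-depth 3, which is a common Clifford+T decomposition, the total T-depth would be 6n. That equals the unoptimized figure quoted in the abstract.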
# データ効率のよいグラフ学習に関する調査研究 A Survey of Data-Efficient Graph Learning ( http://arxiv.org/abs/2402.00447v3 ) ライセンス: Link先を確認	Wei Ju, Siyu Yi, Yifan Wang, Qingqing Long, Junyu Luo, Zhiping Xiao, Ming Zhang,	(参考訳) グラフ構造化データは、ソーシャルネットワークから生化学分析まで、様々な現実世界のシステムの基盤となっている。グラフニューラルネットワークはこの種のデータモデリングの習熟度を示しているが、その成功はしばしば大量のラベル付きデータに依存しており、アノテーションリソースが限られている現実的なシナリオでは課題となっている。この問題に対処するため,低リソース環境下でのグラフ機械学習の性能向上に多大な努力が注がれている。本稿では,研究フロンティアとしてData-Efficient Graph Learning(DEGL)という新しい概念を紹介し,DEGLの現在の進歩をまとめた最初の調査を紹介する。私たちは、大きなラベル付きデータでトレーニングモデルに固有の課題を強調し、DEGLへの探索の道を開くことで開始します。次に、このトピックに関する最近の進歩を、自己教師付きグラフ学習、半教師付きグラフ学習、少数ショットグラフ学習など、いくつかの重要な側面から体系的にレビューする。また,今後の研究の方向性を述べるとともに,グラフ機械学習の進化に寄与する。 Graph-structured data, prevalent in domains ranging from social networks to biochemical analysis, serve as the foundation for diverse real-world systems. While graph neural networks demonstrate proficiency in modeling this type of data, their success is often reliant on significant amounts of labeled data, posing a challenge in practical scenarios with limited annotation resources. To tackle this problem, tremendous efforts have been devoted to enhancing graph machine learning performance under low-resource settings by exploring various approaches to minimal supervision. In this paper, we introduce a novel concept of Data-Efficient Graph Learning (DEGL) as a research frontier, and present the first survey that summarizes the current progress of DEGL. We initiate by highlighting the challenges inherent in training models with large labeled data, paving the way for our exploration into DEGL. Next, we systematically review recent advances on this topic from several key aspects, including self-supervised graph learning, semi-supervised graph learning, and few-shot graph learning. Also, we state promising directions for future research, contributing to the evolution of graph machine learning.	翻訳日:2024-05-30 03:47:50 公開日:2024-05-28
# 減衰ステップサイズによるオンライン共形予測 Online conformal prediction with decaying step sizes ( http://arxiv.org/abs/2402.01139v2 ) ライセンス: Link先を確認	Anastasios N. Angelopoulos, Rina Foygel Barber, Stephen Bates,	(参考訳) 本稿では、減衰するステップサイズを用いたオンライン共形予測の手法を提案する。従来の手法と同様に、本手法は任意の系列に対して遡及的なカバレッジ保証をもつ。しかし従来の手法とは異なり、母集団の分位点が存在する場合には、それを同時に推定することができる。我々の理論と実験は、実用上の性質が大幅に改善されることを示している。特に、分布が安定している場合、カバレッジは観測系列全体での平均としてだけでなく、各時点において所望の水準に近くなる。 We introduce a method for online conformal prediction with decaying step sizes. Like previous methods, ours possesses a retrospective guarantee of coverage for arbitrary sequences. However, unlike previous methods, we can simultaneously estimate a population quantile when it exists. Our theory and experiments indicate substantially improved practical properties: in particular, when the distribution is stable, the coverage is close to the desired level for every time point, not just on average over the observed sequence.	翻訳日:2024-05-30 03:38:05 公開日:2024-05-28
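A minimal sketch of the kind of update behind online conformal methods of this type: online subgradient descent on the quantile (pinball) loss of a nonconformity-score threshold, with a step size that shrinks over time. The schedule eta_t = eta0 * t^(-0.6), the starting threshold, and the synthetic uniform scores are assumptions made for illustration. The paper's exact schedule and guarantees may differ.

```python
import random
from typing import List, Tuple


def online_conformal_threshold(scores: List[float], alpha: float = 0.1,
                               eta0: float = 1.0, power: float = 0.6,
                               q0: float = 0.0) -> Tuple[List[float], List[int]]:
    """Track a (1 - alpha) score threshold online with decaying step sizes.

    At step t the prediction set is {y : score(y) <= q_t}. After the true
    score s_t is revealed, q rises if s_t was not covered and falls otherwise:
        q_{t+1} = q_t + eta_t * (1[s_t > q_t] - alpha),  eta_t = eta0 * t**(-power)
    """
    q = q0
    thresholds, covered = [], []
    for t, s in enumerate(scores, start=1):
        thresholds.append(q)
        miss = 1 if s > q else 0
        covered.append(1 - miss)
        q += eta0 * t ** (-power) * (miss - alpha)
    return thresholds, covered


rng = random.Random(0)
scores = [rng.random() for _ in range(50_000)]  # i.i.d. Uniform(0, 1) scores
thresholds, covered = online_conformal_threshold(scores, alpha=0.1)
print(round(thresholds[-1], 3), sum(covered[-10_000:]) / 10_000)
```

With i.i.d. scores, the threshold settles near the true 0.9 quantile, so coverage at late time steps stays close to 90%. This is the stable-distribution behavior the abstract describes. A constant step size would instead keep the threshold fluctuating indefinitely.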
# カーネル固有ペア・スパース変分ガウス過程による自己注意 Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes ( http://arxiv.org/abs/2402.01476v2 ) ライセンス: Link先を確認	Yingyi Chen, Qinghua Tao, Francesco Tonin, Johan A. K. Suykens,	(参考訳) Transformerの優れた能力は予測精度を大きく向上させる一方で、過信した予測をもたらすこともあり、較正された不確実性推定が必要となる。これは一般にガウス過程(GP)で対処できる。既存研究は変分推論のもとで対称カーネルをもつGPを注意カーネルに適用しているが、注意カーネルが本質的に非対称であるという事実を見落としている。さらに、大規模データではGP事後分布を導出する計算量が依然として大きい。本研究では、不確実性を考慮した自己注意を構築するためのカーネル固有ペア・スパース変分ガウス過程(KEP-SVGP)を提案する。そこでは注意カーネルの非対称性をカーネルSVD(KSVD)で扱い、計算量の削減も実現する。KEP-SVGPにより、i) 注意カーネルに関するKSVDの2組の特異ベクトルから誘導されるSVGPのペアが、非対称性を完全に特徴づける。ii) KSVDからの少数の随伴固有関数のみを用いることで、SVGP事後分布の導出を特異値を含む対角行列の逆行列計算に基づかせることができ、時間計算量の削減に寄与する。iii) 証拠下界を導出し、それを用いて変分パラメータとネットワークの重みを最適化できる。分布内、分布シフト、分布外のベンチマークでの実験により、優れた性能と効率が確認された。 While the great capability of Transformers significantly boosts prediction accuracy, it could also yield overconfident predictions and require calibrated uncertainty estimation, which can be commonly tackled by Gaussian processes (GPs). Existing works apply GPs with symmetric kernels under variational inference to the attention kernel, overlooking the fact that attention kernels are in essence asymmetric. Moreover, the complexity of deriving the GP posteriors remains high for large-scale data. In this work, we propose Kernel-Eigen Pair Sparse Variational Gaussian Processes (KEP-SVGP) for building uncertainty-aware self-attention where the asymmetry of attention kernels is tackled by Kernel SVD (KSVD) and a reduced complexity is acquired. Through KEP-SVGP, i) the SVGP pair induced by the two sets of singular vectors from KSVD w.r.t. the attention kernel fully characterizes the asymmetry; ii) using only a small set of adjoint eigenfunctions from KSVD, the derivation of SVGP posteriors can be based on the inversion of a diagonal matrix containing singular values, contributing to a reduction in time complexity; iii) an evidence lower bound is derived so that variational parameters and network weights can be optimized with it. Experiments verify the excellent performance and efficiency of our method on in-distribution, distribution-shift and out-of-distribution benchmarks.	翻訳日:2024-05-30 03:38:05 公開日:2024-05-28
# 点と点をつなぐ: モード連結性はベイズニューラルネットワークにおける実行可能なサンプルベース推論の鍵か? Connecting the Dots: Is Mode-Connectedness the Key to Feasible Sample-Based Inference in Bayesian Neural Networks? ( http://arxiv.org/abs/2402.01484v2 ) ライセンス: Link先を確認	Emanuel Sommer, Lisa Wimmer, Theodore Papamarkou, Ludwig Bothmann, Bernd Bischl, David Rügamer,	(参考訳) ベイズニューラルネットワークに対するサンプルベース推論(SBI)の大きな課題は、ネットワークのパラメータ空間の大きさと構造である。本研究は、重み空間と関数空間の特徴的な関係を受け入れることでSBIを成功させうることを示し、過剰パラメータ化とサンプリング問題の難しさとの間の体系的な関係を明らかにする。大規模な実験を通じて、サンプリングと収束診断のための実践的なガイドラインを確立する。その結果として、競争力のある性能と不確実性定量化を備えた有効な解として、ディープアンサンブルで初期化するアプローチを提示する。 A major challenge in sample-based inference (SBI) for Bayesian neural networks is the size and structure of the networks' parameter space. Our work shows that successful SBI is possible by embracing the characteristic relationship between weight and function space, uncovering a systematic link between overparameterization and the difficulty of the sampling problem. Through extensive experiments, we establish practical guidelines for sampling and convergence diagnosis. As a result, we present a deep ensemble initialized approach as an effective solution with competitive performance and uncertainty quantification.	翻訳日:2024-05-30 03:38:05 公開日:2024-05-28
# 量子虚時間発展による結合非線形シュレーディンガー方程式の解法 Solving coupled Non-linear Schrödinger Equations via Quantum Imaginary Time Evolution ( http://arxiv.org/abs/2402.01623v2 ) ライセンス: Link先を確認	Yang Hong Li, Jim Al-Khalili, Paul Stevenson,	(参考訳) 結合非線形シュレーディンガー方程式は、多粒子系の力学を記述するうえで不可欠である。本稿では、原子核のハートリー・フォック方程式の場合について、そのような方程式の解法として量子虚時間発展(ITE)アルゴリズムを提示する。単純化したSkyrme相互作用モデルのもとで酸素16原子核の基底状態エネルギーを計算し、その結果が古典的ITEアルゴリズムと一致することを示す。 Coupled non-linear Schrödinger equations are crucial in describing the dynamics of many-particle systems. We present a quantum imaginary time evolution (ITE) algorithm as a solution to such equations in the case of nuclear Hartree-Fock equations. Under a simplified Skyrme interaction model, we calculate the ground state energy of an oxygen-16 nucleus and demonstrate that the result is in agreement with the classical ITE algorithm.	翻訳日:2024-05-30 03:38:05 公開日:2024-05-28
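The following NumPy sketch shows what imaginary time evolution does in the classical setting the abstract compares against. Repeatedly applying exp(-H dτ) (here only to first order) and renormalizing filters out excited states and leaves the ground state. The nonlinear term g|ψ|² is a Gross-Pitaevskii-style mean-field stand-in for a self-consistent Hamiltonian. It is not the simplified Skyrme interaction, not the oxygen-16 calculation, and not the quantum algorithm, which must realize the non-unitary step with quantum circuits. The grid, trap, and step sizes are illustrative assumptions.

```python
import numpy as np


def imaginary_time_ground_state(g: float = 0.0, n: int = 64, length: float = 10.0,
                                dtau: float = 1e-3, steps: int = 20_000):
    """Relax a 1-D (optionally nonlinear) Schrodinger problem to its ground state.

    Finite-difference Hamiltonian (hbar = m = 1, harmonic trap):
        H[psi] = -1/2 d^2/dx^2 + x^2/2 + g * |psi|^2
    The g-term is a mean-field nonlinearity: H is rebuilt from the current
    state at every step, as in a self-consistent iteration.
    """
    x = np.linspace(-length / 2, length / 2, n)
    dx = x[1] - x[0]
    lap = (np.diag(np.full(n, -2.0)) + np.diag(np.ones(n - 1), 1)
           + np.diag(np.ones(n - 1), -1)) / dx**2
    h0 = -0.5 * lap + np.diag(0.5 * x**2)

    psi = np.exp(-x**2)              # any guess with ground-state overlap works
    psi /= np.linalg.norm(psi)
    for _ in range(steps):
        h = h0 + g * np.diag(psi**2)
        psi = psi - dtau * (h @ psi)  # first-order step of exp(-H * dtau)
        psi /= np.linalg.norm(psi)    # renormalize: ITE is not norm-preserving
    h = h0 + g * np.diag(psi**2)
    energy = float(psi @ h @ psi)
    return psi, energy, h


psi, energy, h = imaginary_time_ground_state(g=0.0)
print(round(energy, 4))  # close to the harmonic-oscillator value 0.5
```

For g = 0 the iteration is simply power iteration on I − dτ·H, so it converges to the lowest eigenvector. For g ≠ 0 it converges to a self-consistent state satisfying H[ψ]ψ = μψ.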
# 大規模言語モデルからニューラルネットワークへの生態学的事前分布の注入による人間らしいカテゴリー学習 Human-like Category Learning by Injecting Ecological Priors from Large Language Models into Neural Networks ( http://arxiv.org/abs/2402.01821v2 ) ライセンス: Link先を確認	Akshay K. Jagadish, Julian Coda-Forno, Mirko Thalmann, Eric Schulz, Marcel Binz,	(参考訳) 生態学的合理性とは、人間は環境に適応した合理的なエージェントであるという考え方を指す。しかし、この理論の検証は、どのタスクが生態学的に妥当かを定義することの難しさと、それらのタスクに対する合理的モデルを構築することの難しさという2つの理由から、依然として困難である。本研究では、大規模言語モデルが実世界のタスクの統計に一致する認知タスク、特にカテゴリー学習タスクを生成できることを示し、最初の課題に対処する。2つ目の課題には、メタ学習の枠組みを用いてこれらのタスクに適応した合理的エージェントを導出することで取り組み、生態学的に合理的なメタ学習推論(ERMI)と呼ぶモデルのクラスを得る。ERMIは2つの異なる実験において、他の7つの認知モデルよりも人間のデータを定量的によく説明する。さらに定性的なレベルでも人間の行動と一致する。すなわち、(1) 人間が難しいと感じるのと同じタスクを難しいと感じ、(2) 学習が進むにつれてカテゴリーの割り当てに事例ベースの戦略へより依存するようになり、(3) 未知の刺激に人間と同様の方法で汎化する。加えて、ERMIの生態学的に妥当な事前分布により、OpenML-CC18分類ベンチマークで最先端の性能を達成できることを示す。 Ecological rationality refers to the notion that humans are rational agents adapted to their environment. However, testing this theory remains challenging due to two reasons: the difficulty in defining what tasks are ecologically valid and building rational models for these tasks. In this work, we demonstrate that large language models can generate cognitive tasks, specifically category learning tasks, that match the statistics of real-world tasks, thereby addressing the first challenge. We tackle the second challenge by deriving rational agents adapted to these tasks using the framework of meta-learning, leading to a class of models called ecologically rational meta-learned inference (ERMI). ERMI quantitatively explains human data better than seven other cognitive models in two different experiments. It additionally matches human behavior on a qualitative level: (1) it finds the same tasks difficult that humans find difficult, (2) it becomes more reliant on an exemplar-based strategy for assigning categories with learning, and (3) it generalizes to unseen stimuli in a human-like way. Furthermore, we show that ERMI's ecologically valid priors allow it to achieve state-of-the-art performance on the OpenML-CC18 classification benchmark.	翻訳日:2024-05-30 03:38:05 公開日:2024-05-28
# Audio Flamingo: 少数ショット学習と対話能力を備えた新しい音声言語モデル Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities ( http://arxiv.org/abs/2402.01831v3 ) ライセンス: Link先を確認	Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro,	(参考訳) 大規模言語モデル(LLM)を拡張し、非音声の音や非言語的な発話を含む音声を理解できるようにすることは、LLMの多様な実世界応用にとって極めて重要である。本稿では、新しい音声言語モデルであるAudio Flamingoを提案する。このモデルは、1) 強力な音声理解能力、2) 文脈内学習と検索によって未知のタスクに素早く適応する能力、3) 強力なマルチターン対話能力を備える。これらの能力をモデルに与えるため、一連の学習手法、アーキテクチャ設計、データ戦略を導入する。さまざまな音声理解タスクでの広範な評価により本手法の有効性が確認され、新たな最先端のベンチマークを打ち立てた。デモサイトは https://audioflamingo.github.io/ であり、コードは https://github.com/NVIDIA/audio-flamingo でオープンソース化されている。 Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs. In this paper, we propose Audio Flamingo, a novel audio language model with 1) strong audio understanding abilities, 2) the ability to quickly adapt to unseen tasks via in-context learning and retrieval, and 3) strong multi-turn dialogue abilities. We introduce a series of training techniques, architecture design, and data strategies to enhance our model with these abilities. Extensive evaluations across various audio understanding tasks confirm the efficacy of our method, setting new state-of-the-art benchmarks. Our demo website is https://audioflamingo.github.io/ and the code is open-sourced at https://github.com/NVIDIA/audio-flamingo.	翻訳日:2024-05-30 03:38:05 公開日:2024-05-28
# 弱い教師からの学習のための一般的なフレームワーク A General Framework for Learning from Weak Supervision ( http://arxiv.org/abs/2402.01922v2 ) ライセンス: Link先を確認	Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj,	(参考訳) 弱教師あり学習は一般に、多様な弱い教師が存在するさまざまなシナリオへの適用可能性と、既存アルゴリズムの複雑さに起因するスケーラビリティという課題に直面しており、それが実用的な展開を妨げている。本稿では、新しいアルゴリズムを備えた、弱い教師からの学習のための一般的なフレームワーク(GLWS)を紹介する。GLWSの中心は期待値最大化(EM)による定式化であり、インスタンスの部分ラベル、集約統計量、ペアワイズな観測、ラベルなしデータなど、さまざまな弱い教師の情報源を巧みに扱うことができる。さらに、非決定性有限オートマトン(NFA)と前向き後ろ向きアルゴリズムを用いてEMの計算負荷を大幅に軽減する高度なアルゴリズムを提示する。これにより、既存の解法でしばしば必要となる二次的あるいは階乗的な時間計算量が、線形のスケールにまで効果的に削減される。したがって、任意の弱い教師から学習する問題は、それらをNFAとしてモデル化する問題に変換される。GLWSは機械学習モデルのスケーラビリティを高めるだけでなく、11種類の弱教師ありシナリオにおいて優れた性能と汎用性を示す。本研究がこの分野のさらなる進歩と実用的な展開への道を開くことを期待する。 Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering the practical deployment. This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm. Central to GLWS is an Expectation-Maximization (EM) formulation, adeptly accommodating various weak supervision sources, including instance partial labels, aggregate statistics, pairwise observations, and unlabeled data. We further present an advanced algorithm that significantly simplifies the EM computational demands using a Non-deterministic Finite Automaton (NFA) along with a forward-backward algorithm, which effectively reduces time complexity from quadratic or factorial often required in existing solutions to linear scale. The problem of learning from arbitrary weak supervision is therefore converted to the NFA modeling of them. GLWS not only enhances the scalability of machine learning models but also demonstrates superior performance and versatility across 11 weak supervision scenarios. We hope our work paves the way for further advancements and practical deployment in this field.	翻訳日:2024-05-30 03:38:05 公開日:2024-05-28
# マルチタスクモデル統合のための表現手術 Representation Surgery for Multi-Task Model Merging ( http://arxiv.org/abs/2402.02705v2 ) ライセンス: Link先を確認	Enneng Yang, Li Shen, Zhenyi Wang, Guibing Guo, Xiaojun Chen, Xingwei Wang, Dacheng Tao,	(参考訳) マルチタスク学習(MTL)は、複数のタスクの情報を統一されたバックボーンに圧縮し、計算効率と汎化性能を向上させる。最近の研究では、共同学習のために生データを収集する代わりに、独立に学習された複数のモデルを直接統合してMTLを行っており、MTLの適用シナリオを大きく広げている。しかし、既存のモデル統合手法の表現分布を可視化すると、統合モデルはしばしば表現バイアスのジレンマに悩まされることがわかる。すなわち、統合モデルと個々のモデルの表現分布の間に大きな食い違いがあり、その結果、統合によるMTLの性能が低下する。本稿では、統合モデルの表現バイアスを低減するために、「Surgery」と呼ぶ表現手術の手法を提案する。具体的には、Surgeryは統合モデルの表現を入力とし、その表現に含まれるバイアスを出力しようとする軽量なタスク固有モジュールである。次に、統合モデルの表現と個々のモデルの表現との距離を最小化することでSurgeryモジュールを更新する、教師なしの最適化目的を設計する。広範な実験により、Surgeryモジュールを最先端(SOTA)のモデル統合手法に適用するとMTLの性能が大幅に向上することが示された。 Multi-task learning (MTL) compresses the information from multiple tasks into a unified backbone to improve computational efficiency and generalization. Recent work directly merges multiple independently trained models to perform MTL instead of collecting their raw data for joint training, greatly expanding the application scenarios of MTL. However, by visualizing the representation distribution of existing model merging schemes, we find that the merged model often suffers from the dilemma of representation bias. That is, there is a significant discrepancy in the representation distribution between the merged and individual models, resulting in poor performance of merged MTL. In this paper, we propose a representation surgery solution called "Surgery" to reduce representation bias in the merged model. Specifically, Surgery is a lightweight task-specific module that takes the representation of the merged model as input and attempts to output the biases contained in the representation from the merged model. We then design an unsupervised optimization objective that updates the Surgery module by minimizing the distance between the merged model's representation and the individual model's representation. Extensive experiments demonstrate significant MTL performance improvements when our Surgery module is applied to state-of-the-art (SOTA) model merging schemes.	翻訳日:2024-05-30 03:38:05 公開日:2024-05-28
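The unsupervised objective described above can be illustrated with the simplest possible module, a full linear map fitted in closed form by least squares on synthetic features. The Surgery module in the paper is a lightweight task-specific module trained by optimization. Here the linear map, the squared-L2 distance, and the synthetic distortion of the merged features are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_surgery(z_merged: np.ndarray, z_individual: np.ndarray) -> np.ndarray:
    """Fit a task-specific linear 'surgery' map A without labels.

    The module predicts the bias in the merged representation,
    bias_hat = z_merged @ A, and A minimizes
        || (z_merged - z_merged @ A) - z_individual ||_F^2,
    the distance between the corrected merged representation and the
    individual model's representation on the same unlabeled inputs.
    """
    target_bias = z_merged - z_individual
    a, *_ = np.linalg.lstsq(z_merged, target_bias, rcond=None)
    return a


def apply_surgery(z_merged: np.ndarray, a: np.ndarray) -> np.ndarray:
    return z_merged - z_merged @ a


rng = np.random.default_rng(0)
n, d = 500, 16
z_ind = rng.normal(size=(n, d))                        # individual model's features
distortion = np.eye(d) + 0.1 * rng.normal(size=(d, d))
z_merged = z_ind @ distortion + 0.01 * rng.normal(size=(n, d))  # biased merged features

a = fit_linear_surgery(z_merged, z_ind)
before = np.linalg.norm(z_merged - z_ind, axis=1).mean()
after = np.linalg.norm(apply_surgery(z_merged, a) - z_ind, axis=1).mean()
print(f"mean distance before: {before:.3f}, after: {after:.3f}")
```

The individual model's representation acts only as a target, and no task labels are used. That is what makes the objective unsupervised.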
# ToonAging: 芸術的ポートレートのスタイル変換と組み合わせた顔の再老化 ToonAging: Face Re-Aging upon Artistic Portrait Style Transfer ( http://arxiv.org/abs/2402.02733v4 ) ライセンス: Link先を確認	Bumsoo Kim, Abdul Muqeet, Kyuchul Lee, Sanghyun Seo,	(参考訳) 顔の再老化はコンピュータビジョンとグラフィックスにおける重要な分野であり、映画、広告、ライブストリーミングなどのフォトリアリスティック(PR)な領域で重要な応用がある。近年、さまざまなエンターテインメント分野への拡張として、漫画、イラスト、アニメーションといった非フォトリアリスティック(NPR)な画像に顔の再老化を適用する必要性が生じている。しかし、NPR画像の見かけの年齢をシームレスに編集できるネットワークがないため、これらのタスクは素朴な逐次的アプローチに限られてきた。これはしばしば、ドメインの違いによる不快なアーティファクトや顔の属性の喪失をもたらす。本稿では、顔の再老化とポートレートのスタイル変換を単一の生成ステップで実行する、新しい1段階の手法を提案する。同じPRドメインで学習された既存の顔の再老化ネットワークとスタイル変換ネットワークを活用する。本手法は、それぞれ老化関連の属性とNPRの外観を担う異なる潜在ベクトルを独自の方法で融合する。事例ベースのアプローチを採用することで、通常ドメインごとに個別の学習や微調整を必要とするドメインレベルの微調整アプローチと比べて、より高い柔軟性を提供する。これにより、再老化のためのペアデータセットや、スタイル化のためのドメインレベルのデータ駆動アプローチを必要とするという制約に効果的に対処する。実験により、本モデルは自然な外観と制御性の両方を維持しつつ、事例のスタイルを同時に転写しながら再老化画像を容易に生成できることが示された。 Face re-aging is a prominent field in computer vision and graphics, with significant applications in photorealistic domains such as movies, advertising, and live streaming. Recently, the need to apply face re-aging to non-photorealistic images, like comics, illustrations, and animations, has emerged as an extension in various entertainment sectors. However, the lack of a network that can seamlessly edit the apparent age in NPR images has limited these tasks to a naive, sequential approach. This often results in unpleasant artifacts and a loss of facial attributes due to domain discrepancies. In this paper, we introduce a novel one-stage method for face re-aging combined with portrait style transfer, executed in a single generative step. We leverage existing face re-aging and style transfer networks, both trained within the same PR domain. Our method uniquely fuses distinct latent vectors, each responsible for managing aging-related attributes and NPR appearance. By adopting an exemplar-based approach, our method offers greater flexibility compared to domain-level fine-tuning approaches, which typically require separate training or fine-tuning for each domain. This effectively addresses the limitation of requiring paired datasets for re-aging and domain-level, data-driven approaches for stylization. Our experiments show that our model can effortlessly generate re-aged images while simultaneously transferring the style of examples, maintaining both natural appearance and controllability.	翻訳日:2024-05-30 03:38:05 公開日:2024-05-28
# InterpretCC: グローバルなMixture of Expertsによる内在的なユーザ中心の解釈可能性 InterpretCC: Intrinsic User-Centric Interpretability through Global Mixture of Experts ( http://arxiv.org/abs/2402.02933v2 ) ライセンス: Link先を確認	Vinitra Swamy, Syrielle Montariol, Julian Blackwell, Jibril Frej, Martin Jaggi, Tanja Käser,	(参考訳) ニューラルネットワークの解釈可能性は、3つの重要な要件のトレードオフである。1) 説明の忠実性(すなわち、説明が予測をどれだけ完全に説明しているか)、2) 人間にとっての説明の理解可能性、3) モデル性能。既存の手法の多くはこれらの要件の1つ以上を犠牲にしている。例えば、ポストホックなアプローチは忠実性が限られ、自動的に特定された特徴マスクは理解可能性を損ない、決定木のような本質的に解釈可能な手法はモデル性能を制限する。これらの欠点は、信頼できる説明、実行可能な解釈、正確な予測を必要とする教育や医療のようなセンシティブな応用では受け入れられない。本研究では、予測の前に特徴を適応的かつスパースに活性化することで、最先端モデルと同等の性能を維持しながら人間中心の解釈可能性を保証する、設計上解釈可能なニューラルネットワークのファミリであるInterpretCC(interpretable conditional computation)を提案する。我々はこのアイデアを解釈可能なグローバルMixture-of-Experts(MoE)モデルに拡張する。このモデルでは、人間が関心のあるトピックを指定でき、各データポイントの特徴空間をトピックごとのサブネットワークに離散的に分割し、これらのサブネットワークを適応的かつスパースに活性化して予測を行う。テキスト、時系列、表形式データ向けのInterpretCCアーキテクチャの変種を複数の実世界ベンチマークに適用し、解釈不可能なベースラインと同等の性能を示すとともに、設計上解釈可能なベースラインを上回り、ユーザ調査ではより高い実行可能性と有用性を示す。 Interpretability for neural networks is a trade-off between three key requirements: 1) faithfulness of the explanation (i.e., how perfectly it explains the prediction), 2) understandability of the explanation by humans, and 3) model performance. Most existing methods compromise one or more of these requirements; e.g., post-hoc approaches provide limited faithfulness, automatically identified feature masks compromise understandability, and intrinsically interpretable methods such as decision trees limit model performance. These shortcomings are unacceptable for sensitive applications such as education and healthcare, which require trustworthy explanations, actionable interpretations, and accurate predictions. In this work, we present InterpretCC (interpretable conditional computation), a family of interpretable-by-design neural networks that guarantee human-centric interpretability, while maintaining comparable performance to state-of-the-art models by adaptively and sparsely activating features before prediction.
We extend this idea into an interpretable, global mixture-of-experts (MoE) model that allows humans to specify topics of interest, discretely separates the feature space for each data point into topical subnetworks, and adaptively and sparsely activates these topical subnetworks for prediction. We apply variations of the InterpretCC architecture for text, time series and tabular data across several real-world benchmarks, demonstrating comparable performance with non-interpretable baselines, outperforming interpretable-by-design baselines, and showing higher actionability and usefulness according to a user study.	翻訳日:2024-05-30 03:38:05 公開日:2024-05-28
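As a rough illustration of the routing idea described above (human-specified topic groups, a discrete per-sample decision about which topics are used, and a prediction computed only from the active topical subnetworks), the following Python sketch may help. It is not InterpretCC's implementation: the hard threshold, the normalised weighting of active experts, and the names `sparse_topic_forward`, `topic_groups`, `gate` and `experts` are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def sparse_topic_forward(
    x: Sequence[float],
    topic_groups: Dict[str, List[int]],
    gate: Callable[[Sequence[float]], Dict[str, float]],
    experts: Dict[str, Callable[[List[float]], float]],
    threshold: float = 0.5,
) -> Tuple[float, List[str]]:
    """Hypothetical sketch of sparse, topic-level routing (not InterpretCC's code)."""
    scores = gate(x)  # assumed: one score in [0, 1] per human-specified topic
    # Discrete activation: each topic is either used or ignored for this sample.
    active = [t for t in topic_groups if scores[t] >= threshold]
    if not active:
        return 0.0, []
    total = sum(scores[t] for t in active)
    prediction = 0.0
    for t in active:
        # Each expert only sees the features that belong to its own topic.
        sub_x = [x[i] for i in topic_groups[t]]
        prediction += scores[t] / total * experts[t](sub_x)
    # The list of active topics doubles as a human-readable explanation.
    return prediction, active


groups = {"a": [0, 1], "b": [2]}
result = sparse_topic_forward(
    [1.0, 2.0, 3.0], groups,
    gate=lambda x: {"a": 0.9, "b": 0.1},
    experts={"a": sum, "b": lambda v: 100 * v[0]},
)
print(result)  # (3.0, ['a'])
```

The design choice worth noting is that the explanation is simply the set of active topics, so it is faithful by construction: topics that were not activated cannot have influenced the prediction.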
# 事前学習型パラダイムにおけるクロスタスク線形性の出現について On the Emergence of Cross-Task Linearity in the Pretraining-Finetuning Paradigm ( http://arxiv.org/abs/2402.03660v2 ) ライセンス: Link先を確認	Zhanpeng Zhou, Zijun Chen, Yilan Chen, Bo Zhang, Junchi Yan,	(参考訳) プレトレーニング・ファインタニングのパラダイムは、現代のディープラーニングの主流となっている。本研究では,共通の事前学習されたチェックポイントから初期化され,CTL(Cross-Task Linearity)と呼ばれるさまざまなタスクに微調整されたモデルにおいて,興味深い線形現象を発見する。具体的には、2つの微調整モデルの重みを線形に補間すると、重み補間モデルの特徴は各層における2つの微調整モデルの特徴の線形補間とほぼ等しいことが示される。我々は、CTLが、同じ事前訓練されたチェックポイントから始まる微調整モデルに対して一貫して発生することを裏付ける包括的な実証的証拠を提供する。プレトレーニング-ファインタニングのパラダイムでは、ニューラルネットワークは、パラメータ空間から特徴空間への写像である線形写像として概ね機能する。この観点から,本研究では,モデルマージ/編集について,特にパラメータ空間から特徴空間へ操作を変換することによって,新たな知見を提示する。さらに,CTLの出現の根本原因を深く掘り下げ,事前学習の役割を強調した。 The pretraining-finetuning paradigm has become the prevailing trend in modern deep learning. In this work, we discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed as Cross-Task Linearity (CTL). Specifically, we show that if we linearly interpolate the weights of two finetuned models, the features in the weight-interpolated model are often approximately equal to the linear interpolation of features in two finetuned models at each layer. We provide comprehensive empirical evidence supporting that CTL consistently occurs for finetuned models that start from the same pretrained checkpoint. We conjecture that in the pretraining-finetuning paradigm, neural networks approximately function as linear maps, mapping from the parameter space to the feature space. Based on this viewpoint, our study unveils novel insights into explaining model merging/editing, particularly by translating operations from the parameter space to the feature space. Furthermore, we delve deeper into the root cause for the emergence of CTL, highlighting the role of pretraining.	翻訳日:2024-05-30 03:38:05 公開日:2024-05-28
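To make the CTL statement concrete, the sketch below measures, for a single toy layer, the gap between the features of the weight-interpolated model and the linear interpolation of the two models' features. A purely linear layer satisfies the identity exactly. A ReLU can break it, which is why CTL (an approximate, empirically observed property of finetuned deep networks according to the abstract) is non-trivial. The layer, weights and input are made-up toy values and are not taken from the paper.

```python
from typing import Callable, List

Matrix = List[List[float]]


def interpolate(w_a: Matrix, w_b: Matrix, alpha: float) -> Matrix:
    """Weights of the interpolated model: (1 - alpha) * W_a + alpha * W_b."""
    return [[(1 - alpha) * a + alpha * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(w_a, w_b)]


def features(w: Matrix, x: List[float], act: Callable[[float], float]) -> List[float]:
    """A single toy layer: act(W x)."""
    return [act(sum(wij * xj for wij, xj in zip(row, x))) for row in w]


def ctl_gap(w_a: Matrix, w_b: Matrix, x: List[float], alpha: float,
            act: Callable[[float], float] = lambda v: v) -> float:
    """Max |f(interpolated weights) - interpolation of f(W_a) and f(W_b)|."""
    f_mix = features(interpolate(w_a, w_b, alpha), x, act)
    f_a, f_b = features(w_a, x, act), features(w_b, x, act)
    f_lin = [(1 - alpha) * a + alpha * b for a, b in zip(f_a, f_b)]
    return max(abs(m - lin) for m, lin in zip(f_mix, f_lin))


def relu(v: float) -> float:
    return max(v, 0.0)


w_a, w_b, x = [[1.0, -2.0]], [[-1.0, 2.0]], [1.0, 1.0]
print(ctl_gap(w_a, w_b, x, 0.5))        # linear layer: 0.0
print(ctl_gap(w_a, w_b, x, 0.5, relu))  # ReLU layer: 0.5
```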
# インボディードAIへの呼びかけ A call for embodied AI ( http://arxiv.org/abs/2402.03824v2 ) ライセンス: Link先を確認	Giuseppe Paolo, Jonas Gonzalez-Billandon, Balázs Kégl,	(参考訳) 我々は、人工知能の追求における次の基本的なステップとして、Embodied AIを提案する。我々は、哲学、心理学、神経科学、ロボティクスといった様々な分野にまたがるエンボディメントの概念の進化を横切り、EAIが静的学習の古典的パラダイムとどのように区別するかを強調する。 Embodied AIの範囲を広げることで、認知アーキテクチャに基づいた理論的枠組みを導入し、認知、行動、記憶、学習をエンボディエージェントの本質的な構成要素として強調する。このフレームワークはFristonのアクティブな推論原則と一致しており、EAI開発に対する包括的なアプローチを提供する。 AIの分野での進歩にもかかわらず、新しいAI学習理論の定式化や高度なハードウェアの革新といった大きな課題が続いている。私たちの議論は、将来のEmbodied AI研究の基礎となるガイドラインを概説している。現実の環境における人間や他の知的なエンティティとのシームレスなコミュニケーション、コラボレーション、共存が可能なエンボダイドAIエージェントを作成することの重要性を強調し、我々はAIコミュニティを、多面的な課題に対処し、AGIの探求に先立つ機会をつかむことを目指しています。 We propose Embodied AI as the next fundamental step in the pursuit of Artificial General Intelligence, juxtaposing it against current AI advancements, particularly Large Language Models. We traverse the evolution of the embodiment concept across diverse fields - philosophy, psychology, neuroscience, and robotics - to highlight how EAI distinguishes itself from the classical paradigm of static learning. By broadening the scope of Embodied AI, we introduce a theoretical framework based on cognitive architectures, emphasizing perception, action, memory, and learning as essential components of an embodied agent. This framework is aligned with Friston's active inference principle, offering a comprehensive approach to EAI development. Despite the progress made in the field of AI, substantial challenges, such as the formulation of a novel AI learning theory and the innovation of advanced hardware, persist. Our discussion lays down a foundational guideline for future Embodied AI research. 
Highlighting the importance of creating Embodied AI agents capable of seamless communication, collaboration, and coexistence with humans and other intelligent entities within real-world environments, we aim to steer the AI community towards addressing the multifaceted challenges and seizing the opportunities that lie ahead in the quest for AGI.	翻訳日:2024-05-30 03:28:21 公開日:2024-05-28
# 信念のシーングラフ:期待の計算による部分的なシーンの拡張 Belief Scene Graphs: Expanding Partial Scenes with Objects through Computation of Expectation ( http://arxiv.org/abs/2402.03840v2 ) ライセンス: Link先を確認	Mario A. V. Saucedo, Akash Patel, Akshit Saradagi, Christoforos Kanellakis, George Nikolakopoulos,	(参考訳) 本稿では,部分的な情報を用いた効率的なハイレベルタスク計画を可能にする,部分的な3次元シーングラフのユーティリティ駆動拡張であるBelief Scene Graphsの概念を提案する。本稿では,任意の3次元シーングラフ上での信念の計算(期待と呼ばれる)のためのグラフベースの学習手法を提案する。本稿では,学習データからヒストグラムを学習し,相関情報に基づく予測予測計算手法を提案する。 3次元シーングラフのレポジトリからCECIを学ぶために,新しいグラフ畳み込みニューラルネットワーク(GCN)モデルを開発した。新たなCECIモデルのトレーニングには3Dシーングラフのデータベースが存在しないため,意味的に注釈付けされた実生活3D空間をベースとした3Dシーングラフデータセットを生成するための新しい手法を提案する。生成されたデータセットを用いて提案したCECIモデルをトレーニングし,提案手法の広範な検証を行う。我々は、期待を抽象表現に統合するためのコアコンポーネントとして、新しい概念である『textit{Belief Scene Graphs}』(BSG)を確立した。この新しいコンセプトは、従来の3Dシーングラフの概念の進化であり、さまざまなロボティクスミッションのタスク計画と最適化のための高レベルの推論を可能にすることを目的としている。全体フレームワークの有効性は、対象探索シナリオで評価され、また、人間の目に見えない物体の常識をエミュレートする実生活実験でもテストされている。実験デモのビデオについては、以下のリンクを参照してください。 In this article, we propose the novel concept of Belief Scene Graphs, which are utility-driven extensions of partial 3D scene graphs, that enable efficient high-level task planning with partial information. We propose a graph-based learning methodology for the computation of belief (also referred to as expectation) on any given 3D scene graph, which is then used to strategically add new nodes (referred to as blind nodes) that are relevant to a robotic mission. We propose the method of Computation of Expectation based on Correlation Information (CECI), to reasonably approximate real Belief/Expectation, by learning histograms from available training data. A novel Graph Convolutional Neural Network (GCN) model is developed, to learn CECI from a repository of 3D scene graphs. As no database of 3D scene graphs exists for the training of the novel CECI model, we present a novel methodology for generating a 3D scene graph dataset based on semantically annotated real-life 3D spaces. 
The generated dataset is then utilized to train the proposed CECI model and for extensive validation of the proposed method. We establish the novel concept of Belief Scene Graphs (BSG) as a core component to integrate expectations into abstract representations. This new concept is an evolution of the classical 3D scene graph concept and aims to enable high-level reasoning for task planning and optimization of a variety of robotics missions. The efficacy of the overall framework has been evaluated in an object search scenario, and has also been tested in a real-life experiment to emulate human common sense about unseen objects. For a video of the article, showcasing the experimental demonstration, please refer to the following link: https://youtu.be/hsGlSCa12iY	翻訳日:2024-05-30 03:28:21 公開日:2024-05-28
# REBORN: 教師なしASRのための反復訓練を伴う強化学習境界セグメンテーション REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR ( http://arxiv.org/abs/2402.03988v2 ) ライセンス: Link先を確認	Liang-Hsuan Tseng, En-Pei Hu, Cheng-Han Chiang, Yuan Tseng, Hung-yi Lee, Lin-shan Lee, Shao-Hua Sun,	(参考訳) 教師なし自動音声認識(ASR)は、音声とテキストのペアデータによる教師なしに、音声信号とそれに対応するテキスト書き起こしとの間の対応を学習することを目的としている。音声信号中の単語/音素は、可変長で境界が未知の音声信号セグメントで表現され、このセグメント構造のために、特にペアデータがない場合、音声とテキストの対応の学習は困難になる。本稿では、REBORN(Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR)を提案する。REBORNは、(1)音声信号におけるセグメント構造の境界を予測するセグメンテーションモデルの訓練と、(2)セグメンテーションモデルによって分割された音声特徴を入力として音素書き起こしを予測する音素予測モデルの訓練とを交互に行う。セグメンテーションモデルを訓練するための教師付きデータは入手できないため、強化学習を用いて、パープレキシティのより低い音素列予測をもたらすセグメンテーションを優先するようにセグメンテーションモデルを訓練する。広範な実験を行い、同じ設定の下で、REBORNがLibriSpeech、TIMIT、およびMultilingual LibriSpeechの5つの非英語言語において、既存のすべての教師なしASRモデルを上回ることを確認した。さらに、REBORNが学習した境界が教師なしASRの性能を向上させる理由を包括的に分析する。 Unsupervised automatic speech recognition (ASR) aims to learn the mapping between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word/phoneme in the speech signal is represented by a segment of speech signal with variable length and unknown boundary, and this segmental structure makes learning the mapping between speech and text challenging, especially without paired data. In this paper, we propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR. REBORN alternates between (1) training a segmentation model that predicts the boundaries of the segmental structures in speech signals and (2) training the phoneme prediction model, whose input is the speech feature segmented by the segmentation model, to predict a phoneme transcription. Since supervised data for training the segmentation model is not available, we use reinforcement learning to train the segmentation model to favor segmentations that yield phoneme sequence predictions with a lower perplexity.
We conduct extensive experiments and find that under the same setting, REBORN outperforms all prior unsupervised ASR models on LibriSpeech, TIMIT, and five non-English languages in Multilingual LibriSpeech. We comprehensively analyze why the boundaries learned by REBORN improve the unsupervised ASR performance.	翻訳日:2024-05-30 03:28:21 公開日:2024-05-28
# InfLLM: 効率的な文脈記憶を持つLLMのための学習不要な長期外挿法 InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory ( http://arxiv.org/abs/2402.04617v2 ) ライセンス: Link先を確認	Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun,	(参考訳) 大規模言語モデル(LLM)は、長いストリーミング入力を持つ現実世界のアプリケーション(例えば、LLM駆動エージェント)の基盤として登場した。しかし、制限された最大長のシーケンスで事前訓練された既存のLLMでは、ドメイン外および乱れの問題により、長いシーケンスを処理できない。一般的なソリューションは、長いシーケンスで連続的な事前トレーニングを伴い、高価な計算オーバーヘッドと制御不能なモデル機能の変化をもたらす。本稿では,極長列を微調整せずに理解するためのLLMの本質的な能力を明らかにする。そこで本研究では,トレーニング不要なメモリベースのInfLLMを提案する。特に、InfLLMは、遠隔コンテキストを追加のメモリ単位に格納し、注意計算のためにトークン関連ユニットを検索する効率的なメカニズムを用いる。これにより、InfLLMはLLMがコンテキストウィンドウに制限された長いシーケンスを効率的に処理し、長距離依存関係を適切にキャプチャできる。トレーニングなしでは、InfLLMは数千のトークンからなるシーケンスで事前トレーニングされたLLMを、長いシーケンスでこれらのLLMを継続的にトレーニングする競合ベースラインで同等のパフォーマンスを達成することができる。シーケンス長が$1,024$Kにスケールしても、InfLLMは依然として、長距離依存関係を効果的にキャプチャする。我々のコードは \url{https://github.com/thunlp/InfLLM} にある。 Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs (e.g., LLM-driven agents). However, existing LLMs, pre-trained on sequences with a restricted maximum length, cannot process longer sequences due to the out-of-domain and distraction issues. Common solutions often involve continual pre-training on longer sequences, which will introduce expensive computational overhead and uncontrollable change in model capabilities. In this paper, we unveil the intrinsic capacity of LLMs for understanding extremely long sequences without any fine-tuning. To this end, we introduce a training-free memory-based method, InfLLM. Specifically, InfLLM stores distant contexts into additional memory units and employs an efficient mechanism to lookup token-relevant units for attention computation. Thereby, InfLLM allows LLMs to efficiently process long sequences with a limited context window and well capture long-distance dependencies. 
Without any training, InfLLM enables LLMs that are pre-trained on sequences consisting of a few thousand tokens to achieve performance comparable to competitive baselines that continually train these LLMs on long sequences. Even when the sequence length is scaled to 1,024K, InfLLM still effectively captures long-distance dependencies. Our code can be found at https://github.com/thunlp/InfLLM.	翻訳日:2024-05-30 03:28:21 公開日:2024-05-28
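The abstract describes storing distant context in memory units and looking up the units that are relevant to the current token before computing attention. A minimal sketch of that selection step follows, under simplifying assumptions that are ours rather than the paper's. Each unit is a fixed-size block of past keys. A unit is scored by the dot product between the query and the unit's mean key, which stands in for InfLLM's own choice of representative tokens. The attended context is the top-k units plus a window of recent local tokens. The function name `select_context` and its parameters are hypothetical.

```python
from typing import List


def select_context(query: List[float], keys: List[List[float]], unit_size: int,
                   top_k: int, local_window: int) -> List[int]:
    """Hypothetical sketch: positions to attend to = top-k distant units + local window."""
    n = len(keys)
    local_start = max(0, n - local_window)
    # Distant tokens (outside the local window) are grouped into fixed-size memory units.
    units = [list(range(s, min(s + unit_size, local_start)))
             for s in range(0, local_start, unit_size)]

    def relevance(unit: List[int]) -> float:
        # Simplification: represent each unit by the mean of its key vectors.
        rep = [sum(keys[i][d] for i in unit) / len(unit) for d in range(len(query))]
        return sum(q * r for q, r in zip(query, rep))

    best = sorted(units, key=relevance, reverse=True)[:top_k]
    retrieved = sorted(i for unit in best for i in unit)
    return retrieved + list(range(local_start, n))


keys = [[0.0], [0.0], [5.0], [5.0], [0.0], [0.0], [1.0], [1.0]]
print(select_context([1.0], keys, unit_size=2, top_k=1, local_window=2))  # [2, 3, 6, 7]
```

Because only `top_k` units plus the local window enter attention, the per-step cost stays bounded by the chosen budget rather than by the full sequence length, which is the property the abstract relies on.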
# 潜時計画変換器:潜時変数推論としての計画 Latent Plan Transformer: Planning as Latent Variable Inference ( http://arxiv.org/abs/2402.04647v2 ) ライセンス: Link先を確認	Deqian Kong, Dehong Xu, Minglu Zhao, Bo Pang, Jianwen Xie, Andrew Lizarraga, Yuhao Huang, Sirui Xie, Ying Nian Wu,	(参考訳) 長期的なリターンを目指すタスクにおいては、計画が不可欠である。オフライン強化学習から得られたデータセットを用いた計画のための生成モデルについて検討する。具体的には、段階的な報酬がない場合の時間的一貫性を重要な技術的課題として挙げる。本稿では,Transformerベースのトラジェクトリジェネレータと最終リターンを接続するために,遅延空間を利用する新しいモデルであるLatent Plan Transformer(LPT)を紹介する。 LPTはトラジェクティブ-リターンペアの最大推定値で学習することができる。学習において、潜在変数の後方サンプリングは、有限コンテキストにもかかわらず、自然にサブトラジェクトリを統合して一貫した抽象化を形成する。テスト時には、遅延変数はポリシー実行前の期待した戻り値から推論され、計画のアイデアを推論として実現します。 Gym-Mujoco, Franka Kitchen, Maze2D, Connect Four など,複数のベンチマークで競合性能を達成し, 最適軌道からの精度向上を実証した。微妙なクレジット割り当て、軌道縫合、環境問題への適応の能力を示す。これらの結果は、潜伏変数推論がステップワイズ報酬プロンプトの強力な代替となることを証明している。 In tasks aiming for long-term returns, planning becomes essential. We study generative modeling for planning with datasets repurposed from offline reinforcement learning. Specifically, we identify temporal consistency in the absence of step-wise rewards as one key technical challenge. We introduce the Latent Plan Transformer (LPT), a novel model that leverages a latent space to connect a Transformer-based trajectory generator and the final return. LPT can be learned with maximum likelihood estimation on trajectory-return pairs. In learning, posterior sampling of the latent variable naturally integrates sub-trajectories to form a consistent abstraction despite the finite context. At test time, the latent variable is inferred from an expected return before policy execution, realizing the idea of planning as inference. Our experiments demonstrate that LPT can discover improved decisions from suboptimal trajectories, achieving competitive performance across several benchmarks, including Gym-Mujoco, Franka Kitchen, Maze2D, and Connect Four. It exhibits capabilities in nuanced credit assignments, trajectory stitching, and adaptation to environmental contingencies. 
These results validate that latent variable inference can be a strong alternative to step-wise reward prompting.	翻訳日:2024-05-30 03:28:21 公開日:2024-05-28
# Pseudo-labellingは、雑音のある部分的なラベル学習のためのラベル平滑化に遭遇する Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning ( http://arxiv.org/abs/2402.04835v2 ) ライセンス: Link先を確認	Darshana Saravanan, Naresh Manwani, Vineet Gandhi,	(参考訳) 部分ラベル学習(Partial label learning、PLL)は、各トレーニングインスタンスが、真のラベルである候補ラベル(partial label)のセットとペアリングされる弱い教師付き学習パラダイムである。ノイズPLL(NPLL)はこの制約を緩和し、一部の部分ラベルが真のラベルを含まないようにし、問題の実用性を高める。本研究はNPLLを中心とし,近傍の重み付けアルゴリズムを用いて雑音のある部分ラベルを利用して,まず画像に擬似ラベルを割り当てる最小限のフレームワークを提案する。これらの擬似ラベルとイメージペアは、ラベルスムーズなディープニューラルネットワーク分類器のトレーニングに使用される。分類器の特徴と予測はその後、擬似ラベルの精度を洗練・向上するために使用される。 7つのデータセットについて徹底的な実験を行い,9つのNPLL法とPLL法との比較を行った。先行研究から得られたすべての研究結果から, 詳細な分類や極端な騒音シナリオにおいて, かなりの利得を得ることができた。さらに、現実的なクラウドソースデータセットにおいて、我々のフレームワークの有望な一般化能力を示す。 Partial label learning (PLL) is a weakly-supervised learning paradigm where each training instance is paired with a set of candidate labels (partial label), one of which is the true label. Noisy PLL (NPLL) relaxes this constraint by allowing some partial labels to not contain the true label, enhancing the practicality of the problem. Our work centres on NPLL and presents a minimalistic framework that initially assigns pseudo-labels to images by exploiting the noisy partial labels through a weighted nearest neighbour algorithm. These pseudo-label and image pairs are then used to train a deep neural network classifier with label smoothing. The classifier's features and predictions are subsequently employed to refine and enhance the accuracy of pseudo-labels. We perform thorough experiments on seven datasets and compare against nine NPLL and PLL methods. We achieve state-of-the-art results in all studied settings from the prior literature, obtaining substantial gains in fine-grained classification and extreme noise scenarios. Further, we show the promising generalisation capability of our framework in realistic crowd-sourced datasets.	翻訳日:2024-05-30 03:28:21 公開日:2024-05-28
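The two ingredients named in the abstract, pseudo-labels from a weighted nearest-neighbour vote over noisy candidate sets and label-smoothed training targets, can be sketched as follows. The abstract does not specify the weighting, the neighbourhood size or the refinement loop, so cosine-similarity weights, `k`, `eps`, and the function names are assumptions. We also assume the vote is not restricted to a sample's own candidate set, so a sample whose candidate set misses the true label can still be relabelled; in the toy data below, sample 2 has the (noisy) candidate set `{2}` but receives pseudo-label 0 from its neighbours.

```python
import math
from typing import List, Set


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_pseudo_labels(features: List[List[float]], candidates: List[Set[int]],
                      num_classes: int, k: int) -> List[int]:
    """Pseudo-label each sample by a similarity-weighted vote over its k nearest
    neighbours' (possibly noisy) candidate label sets."""
    labels = []
    for i, fi in enumerate(features):
        neighbours = sorted(((cosine(fi, fj), j) for j, fj in enumerate(features) if j != i),
                            reverse=True)[:k]
        votes = [0.0] * num_classes
        for sim, j in neighbours:
            for c in candidates[j]:
                votes[c] += max(sim, 0.0)  # ignore negatively similar neighbours
        labels.append(max(range(num_classes), key=lambda c: votes[c]))
    return labels


def smoothed_target(label: int, num_classes: int, eps: float) -> List[float]:
    """Label-smoothed one-hot target: (1 - eps) on the label plus eps / C on every class."""
    return [(1 - eps if c == label else 0.0) + eps / num_classes for c in range(num_classes)]


feats = [[1.0, 0.1], [1.0, 0.2], [0.9, 0.0], [0.1, 1.0], [0.0, 1.0], [0.2, 0.9]]
cands = [{0}, {0}, {2}, {1, 2}, {1}, {1, 0}]
print(knn_pseudo_labels(feats, cands, num_classes=3, k=3))  # [0, 0, 0, 1, 1, 1]
```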
# インストラクション・チューニングの限界について A Closer Look at the Limitations of Instruction Tuning ( http://arxiv.org/abs/2402.05119v4 ) ライセンス: Link先を確認	Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Ramaneswaran S, Deepali Aneja, Zeyu Jin, Ramani Duraiswami, Dinesh Manocha,	(参考訳) 命令応答ペアを用いた大規模言語モデル(LLM)の訓練プロセスであるインストラクションチューニング(IT)が,ベースとなる事前学習されたLLMをオープンドメインの会話エージェントに変換する主要な方法として登場した。 ITは目覚ましい成功を収め、広く採用されているが、その限界と欠点は未解決のままである。本稿では、厳密な実験と、LLMがITを通して行っている変化の詳細な分析を通して、ITの様々な限界を明らかにする。特に,1)LLMにおける知識や技能の向上に失敗する。 LoRAファインチューニングは学習応答開始とスタイルトークンに限られており、フルパラメータのファインチューニングは知識の劣化につながる。 2)知識ソースから派生したITデータセットからの応答パターンのコピーは,応答品質の低下につながる。 (3)全パラメータの微調整は,ITデータセット内の概念的に類似したインスタンスからトークンを不正確な借用によって幻覚を増大させ,応答を生成する。 (4) IT 改善のための一般的な手法は,シンプルな LoRA 微調整モデルよりも性能改善につながるものではない。この結果から,事前学習した知識のみから生成した応答は,オープンソースデータセット上でITから新たな知識を学習するモデルによって,一貫した応答性能が向上することが判明した。この論文で明らかになった洞察と課題が、今後の研究を関連する方向に促すことを願っています。 Instruction Tuning (IT), the process of training large language models (LLMs) using instruction-response pairs, has emerged as the predominant method for transforming base pre-trained LLMs into open-domain conversational agents. While IT has achieved notable success and widespread adoption, its limitations and shortcomings remain underexplored. In this paper, through rigorous experiments and an in-depth analysis of the changes LLMs undergo through IT, we reveal various limitations of IT. In particular, we show that (1) IT fails to enhance knowledge or skills in LLMs. LoRA fine-tuning is limited to learning response initiation and style tokens, and full-parameter fine-tuning leads to knowledge degradation. (2) Copying response patterns from IT datasets derived from knowledgeable sources leads to a decline in response quality. (3) Full-parameter fine-tuning increases hallucination by inaccurately borrowing tokens from conceptually similar instances in the IT dataset for generating responses. (4) Popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model. 
Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets. We hope the insights and challenges revealed in this paper inspire future work in related directions.	翻訳日:2024-05-30 03:28:21 公開日:2024-05-28
# Transformerにおける帰納バイアスの理解に向けて: 無限大からの視点 Towards Understanding Inductive Bias in Transformers: A View From Infinity ( http://arxiv.org/abs/2402.05173v2 ) ライセンス: Link先を確認	Itay Lavie, Guy Gur-Ari, Zohar Ringel,	(参考訳) 無限に過剰パラメータ化されたガウス過程極限におけるTransformerの帰納バイアスについて検討し、Transformerは系列空間においてより置換対称な関数に偏る傾向があると主張する。データセットがトークン間の置換に対して対称であるとき、対称群の表現論を用いて定量的な解析的予測を与えられることを示す。簡略化したTransformerブロックを提示し、その極限でモデルを解く。これには学習曲線とネットワーク出力の正確な予測が含まれる。一般的な設定では、文脈長の関数としての学習可能性について、スケーリング則の形でタイトな上界を導出できることを示す。最後に、WikiTextデータセットが実際にある程度の置換対称性を持つことを論じる。 We study inductive bias in Transformers in the infinitely over-parameterized Gaussian process limit and argue that transformers tend to be biased towards more permutation symmetric functions in sequence space. We show that the representation theory of the symmetric group can be used to give quantitative analytical predictions when the dataset is symmetric to permutations between tokens. We present a simplified transformer block and solve the model at the limit, including accurate predictions for the learning curves and network outputs. We show that in common setups, one can derive tight bounds in the form of a scaling law for the learnability as a function of the context length. Finally, we argue that the WikiText dataset does indeed possess a degree of permutation symmetry.	翻訳日:2024-05-30 03:28:21 公開日:2024-05-28
# MusicMagus: 拡散モデルによるゼロショットテキスト音楽編集 MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models ( http://arxiv.org/abs/2402.06178v3 ) ライセンス: Link先を確認	Yixiao Zhang, Yukara Ikemiya, Gus Xia, Naoki Murata, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon,	(参考訳) テキストから音楽への生成モデルの最近の進歩は、音楽の創造性に新たな道を開いた。しかし、音楽生成には通常反復的な改良が伴い、生成した音楽をどのように編集するかが重要な課題として残っている。本稿では、このようなモデルが生成する楽曲を編集する新たなアプローチを導入し、他の側面をそのまま維持しながら、ジャンルやムード、楽器などの特定の属性の変更を可能にする。本手法は、テキスト編集を潜在空間操作(latent space manipulation)に変換するとともに、一貫性を強制するための追加の制約を加える。追加の学習を必要とせずに、既存の事前学習済みテキスト音楽拡散モデルとシームレスに統合できる。実験により、スタイルおよび音色転送の評価において、ゼロショットのベースラインと一部の教師ありベースラインの双方を上回る性能を示した。さらに、実際の音楽編集シナリオにおける本手法の実用性を示す。 Recent advances in text-to-music generation models have opened new avenues in musical creativity. However, music generation usually involves iterative refinements, and how to edit the generated music remains a significant challenge. This paper introduces a novel approach to the editing of music generated by such models, enabling the modification of specific attributes, such as genre, mood and instrument, while maintaining other aspects unchanged. Our method transforms text editing into latent space manipulation while adding an extra constraint to enforce consistency. It seamlessly integrates with existing pretrained text-to-music diffusion models without requiring additional training. Experimental results demonstrate superior performance over both zero-shot and certain supervised baselines in style and timbre transfer evaluations. Additionally, we showcase the practical applicability of our approach in real-world music editing scenarios.	翻訳日:2024-05-30 03:28:21 公開日:2024-05-28
# フェデレーションドメイン一般化のためのハイパーネットワーク駆動モデル融合 Hypernetwork-Driven Model Fusion for Federated Domain Generalization ( http://arxiv.org/abs/2402.06974v3 ) ライセンス: Link先を確認	Marc Bartholet, Taehyeon Kim, Ami Beuret, Se-Young Yun, Joachim M. Buhmann,	(参考訳) フェデレートラーニング(FL)は、不均一なデータのドメインシフト、パフォーマンスの低下で大きな課題に直面します。伝統的なドメイン一般化は、ドメイン不変の特徴を学習することを目的としているが、モデル平均化の連合性はしばしば、局所的な学習の線形集約のためにこれを制限している。これを解決するために、ハイパーネットワークベースのFederated Fusion (hFedF) と呼ばれる堅牢なフレームワークを提案する。本手法では,ドメインの一般化を効果的に管理するために,クライアント固有の埋め込みと勾配アライメント手法を用いる。ゼロショット設定と少数ショット設定の両方で評価され、hFedFはドメインシフトを処理する上で優れたパフォーマンスを示している。 PACS、Office-Home、VLCSデータセットの総合的な比較では、hFedFは信頼性の高い予測によって、ドメイン内およびドメイン外の最高精度を一貫して達成している。本研究は、FDG(Federated Domain Generalization)の未調査分野に大きく貢献し、この分野におけるパフォーマンスの新たなベンチマークを設定した。 Federated Learning (FL) faces significant challenges with domain shifts in heterogeneous data, degrading performance. Traditional domain generalization aims to learn domain-invariant features, but the federated nature of model averaging often limits this due to its linear aggregation of local learning. To address this, we propose a robust framework, coined as hypernetwork-based Federated Fusion (hFedF), using hypernetworks for non-linear aggregation, facilitating generalization to unseen domains. Our method employs client-specific embeddings and gradient alignment techniques to manage domain generalization effectively. Evaluated in both zero-shot and few-shot settings, hFedF demonstrates superior performance in handling domain shifts. Comprehensive comparisons on PACS, Office-Home, and VLCS datasets show that hFedF consistently achieves the highest in-domain and out-of-domain accuracy with reliable predictions. Our study contributes significantly to the under-explored field of Federated Domain Generalization (FDG), setting a new benchmark for performance in this area.	翻訳日:2024-05-30 03:28:21 公開日:2024-05-28
# Weisfeiler-Leman氏:もっと表現力が重要になるとき Weisfeiler-Leman at the margin: When more expressivity matters ( http://arxiv.org/abs/2402.07568v2 ) ライセンス: Link先を確認	Billy J. Franks, Christopher Morris, Ameya Velingker, Floris Geerts,	(参考訳) Weisfeiler-Lemanアルゴリズム(1$-WL)はグラフ同型問題に対するよく研究されたヒューリスティックである。近年,このアルゴリズムは,メッセージパスグラフニューラルネットワーク(MPNN)の表現力を理解し,グラフカーネルとして有効である。その成功にもかかわらず、1ドルWLは非同型グラフを区別する問題に直面し、より表現力のあるMPNNとカーネルアーキテクチャの開発に繋がる。しかし,表現性向上と一般化性能向上の関係はいまだ不明である。ここでは、アーキテクチャの表現性は、グラフ同型を通して見るときの一般化性能に関する限られた洞察を与えることを示す。さらに,アーキテクチャの表現性向上と一般化性能の向上を両立させるため,サブグラフ情報を用いた1ドルWLとMPNNの強化に焦点をあて,古典的マージン理論を用いて検討を行った。さらに, 勾配流がMPNNの重み付けを最大限界解へ押し上げることを示す。さらに,表現力のある1ドルWLベースのカーネルとMPNNアーキテクチャと,証明可能な一般化特性を導入したMPNNアーキテクチャを導入する。我々の実証研究は、我々の理論的な発見の妥当性を確認している。 The Weisfeiler-Leman algorithm ($1$-WL) is a well-studied heuristic for the graph isomorphism problem. Recently, the algorithm has played a prominent role in understanding the expressive power of message-passing graph neural networks (MPNNs) and being effective as a graph kernel. Despite its success, $1$-WL faces challenges in distinguishing non-isomorphic graphs, leading to the development of more expressive MPNN and kernel architectures. However, the relationship between enhanced expressivity and improved generalization performance remains unclear. Here, we show that an architecture's expressivity offers limited insights into its generalization performance when viewed through graph isomorphism. Moreover, we focus on augmenting $1$-WL and MPNNs with subgraph information and employ classical margin theory to investigate the conditions under which an architecture's increased expressivity aligns with improved generalization performance. In addition, we show that gradient flow pushes the MPNN's weights toward the maximum margin solution. Further, we introduce variations of expressive $1$-WL-based kernel and MPNN architectures with provable generalization properties. Our empirical study confirms the validity of our theoretical findings.	
翻訳日:2024-05-30 03:28:21 公開日:2024-05-28
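The abstract assumes familiarity with $1$-WL colour refinement. As background (standard material, not this paper's contribution), the sketch below runs 1-WL jointly on two graphs with a shared colour palette and compares the resulting colour histograms. Different histograms prove the graphs are non-isomorphic. Equal histograms are inconclusive. The classic pair of two disjoint triangles versus a 6-cycle shows the limitation mentioned above: the graphs are not isomorphic, yet 1-WL cannot tell them apart because every node has degree 2. The function names and the adjacency-list representation are our own choices.

```python
from collections import Counter
from typing import Dict, List

Graph = Dict[int, List[int]]  # node -> list of neighbours


def wl_histograms(graphs: List[Graph], rounds: int) -> List[Counter]:
    """Run 1-WL colour refinement jointly so that colours are comparable across graphs."""
    colourings = [{v: 0 for v in g} for g in graphs]  # uniform initial colour
    for _ in range(rounds):
        # New signature = (own colour, sorted multiset of neighbour colours).
        signatures = [
            {v: (c[v], tuple(sorted(c[u] for u in g[v]))) for v in g}
            for g, c in zip(graphs, colourings)
        ]
        # Shared palette: identical signatures receive identical colours in every graph.
        palette: Dict[tuple, int] = {}
        for sig in sorted({s for sigs in signatures for s in sigs.values()}):
            palette[sig] = len(palette)
        colourings = [{v: palette[s[v]] for v in s} for s in signatures]
    return [Counter(c.values()) for c in colourings]


def wl_distinguishes(g1: Graph, g2: Graph, rounds: int = 3) -> bool:
    """True if 1-WL certifies g1 and g2 as non-isomorphic (False is inconclusive)."""
    h1, h2 = wl_histograms([g1, g2], rounds)
    return h1 != h2


two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(wl_distinguishes(two_triangles, hexagon))  # False: 1-WL fails on this pair
print(wl_distinguishes(path, triangle))          # True
```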
# TELLER: 説明可能で、汎化可能で、制御可能なフェイクニュース検出のための信頼できるフレームワーク TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection ( http://arxiv.org/abs/2402.07776v2 ) ライセンス: Link先を確認	Hui Liu, Wenya Wang, Haoru Li, Haoliang Li,	(参考訳) フェイクニュースの拡散は深刻な社会問題となり、産業界と学術界から大きな関心を集めている。既存の深層学習ベースの手法はフェイクニュースを正確に検出する点で進歩してきたが、その信頼性は、不透明な推論プロセス、低い汎化能力、大規模言語モデル(LLM)との統合に固有のリスクによって損なわれる可能性がある。この課題に対処するため、モデルの説明可能性、汎化可能性、制御可能性を重視した、信頼できるフェイクニュース検出のための新しいフレームワークであるTELLERを提案する。これは、上記の原則に従って認知システムと意思決定システムを統合するデュアルシステムフレームワークによって実現される。認知システムは人間の専門知識を活用して論理述語を生成し、それらの述語がLLMを導いて人間が読める論理アトムを生成させる。一方、意思決定システムは、これらのアトムを集約する汎化可能な論理規則を導出し、多様なドメインにわたって入力ニュースの真偽を判定できるようにするとともに、意思決定プロセスの透明性を高める。最後に、4つのデータセットに対する包括的な評価結果を示し、提案フレームワークの実現可能性と信頼性を実証する。実装は https://github.com/less-and-less-bugs/Trust_TELLER で公開している。 The proliferation of fake news has emerged as a severe societal problem, raising significant interest from industry and academia. While existing deep-learning based methods have made progress in detecting fake news accurately, their reliability may be compromised by non-transparent reasoning processes, poor generalization abilities and inherent risks of integration with large language models (LLMs). To address this challenge, we propose TELLER, a novel framework for trustworthy fake news detection that prioritizes explainability, generalizability and controllability of models. This is achieved via a dual-system framework that integrates cognition and decision systems, adhering to the principles above. The cognition system harnesses human expertise to generate logical predicates, which guide LLMs in generating human-readable logic atoms. Meanwhile, the decision system deduces generalizable logic rules to aggregate these atoms, enabling the identification of the truthfulness of the input news across diverse domains and enhancing transparency in the decision-making process.
Finally, we present comprehensive evaluation results on four datasets, demonstrating the feasibility and trustworthiness of our proposed framework. Our implementation is available at https://github.com/less-and-less-bugs/Trust_TELLER.	翻訳日:2024-05-30 03:28:21 公開日:2024-05-28
# 平均場min-max問題に対するミラー降下上昇法 Mirror Descent-Ascent for mean-field min-max problems ( http://arxiv.org/abs/2402.08106v2 ) ライセンス: Link先を確認	Razvan-Andrei Lascu, Mateusz B. Majka, Łukasz Szpruch,	(参考訳) 本研究では、測度空間上のmin-max問題を解くためのミラー降下上昇アルゴリズムの2つの変種、すなわち同時型と逐次型について検討する。ペイオフ関数の凸凹性と、平坦微分を介して測度空間上で定義される適切なブレグマン・ダイバージェンスに関する相対平滑性を仮定する。Nikaidò-Isoda誤差で測った混合ナッシュ均衡への収束率は、同時スキームと逐次スキームに対してそれぞれ$\mathcal{O}\left(N^{-1/2}\right)$と$\mathcal{O}\left(N^{-2/3}\right)$のオーダーであり、関連する有限次元アルゴリズムに関する最先端の結果と一致することを示す。 We study two variants of the mirror descent-ascent algorithm for solving min-max problems on the space of measures: simultaneous and sequential. We work under assumptions of convexity-concavity and relative smoothness of the payoff function with respect to a suitable Bregman divergence, defined on the space of measures via flat derivatives. We show that the convergence rates to mixed Nash equilibria, measured in the Nikaidò-Isoda error, are of order $\mathcal{O}\left(N^{-1/2}\right)$ and $\mathcal{O}\left(N^{-2/3}\right)$ for the simultaneous and sequential schemes, respectively, which is in line with the state-of-the-art results for related finite-dimensional algorithms.	翻訳日:2024-05-30 01:28:38 公開日:2024-05-28
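For intuition only, here is a finite-dimensional analogue. With the entropy (KL) mirror map on probability vectors, mirror descent-ascent reduces to multiplicative-weights updates. The sketch runs the simultaneous variant, and via a flag a sequential variant in which the max player responds to the already-updated min player, on a bilinear matrix game. It averages the iterates and reports the Nikaidò-Isoda error of the average, which is non-negative and zero exactly at a Nash equilibrium. The matching-pennies payoff matrix, step size and iteration count are illustrative choices. The paper's setting (measures, flat derivatives, general convex-concave payoffs) is not implemented here, and the sketch says nothing about the paper's rates.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def _normalise(w: Vector) -> Vector:
    s = sum(w)
    return [v / s for v in w]


def mirror_descent_ascent(A: Matrix, x0: Vector, y0: Vector, eta: float,
                          steps: int, sequential: bool = False) -> Tuple[Vector, Vector]:
    """Entropic MDA for min_x max_y x^T A y over probability simplices.

    Returns the averaged iterates (x_bar, y_bar)."""
    n, m = len(A), len(A[0])
    x, y = list(x0), list(y0)
    x_sum, y_sum = [0.0] * n, [0.0] * m
    for _ in range(steps):
        x_sum = [s + v for s, v in zip(x_sum, x)]
        y_sum = [s + v for s, v in zip(y_sum, y)]
        # Min player: multiplicative step against the gradient A y.
        grad_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        x_new = _normalise([x[i] * math.exp(-eta * grad_x[i]) for i in range(n)])
        # Max player: sees the old x (simultaneous) or the new x (sequential).
        x_seen = x_new if sequential else x
        grad_y = [sum(A[i][j] * x_seen[i] for i in range(n)) for j in range(m)]
        y = _normalise([y[j] * math.exp(eta * grad_y[j]) for j in range(m)])
        x = x_new
    return [s / steps for s in x_sum], [s / steps for s in y_sum]


def nikaido_isoda_error(A: Matrix, x: Vector, y: Vector) -> float:
    """max_y' x^T A y' - min_x' x'^T A y (non-negative, zero exactly at equilibrium)."""
    n, m = len(A), len(A[0])
    best_max = max(sum(x[i] * A[i][j] for i in range(n)) for j in range(m))
    best_min = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return best_max - best_min


pennies = [[1.0, -1.0], [-1.0, 1.0]]  # unique equilibrium: (0.5, 0.5) for both players
x_bar, y_bar = mirror_descent_ascent(pennies, [0.9, 0.1], [0.3, 0.7], eta=0.01, steps=20000)
print(nikaido_isoda_error(pennies, [0.9, 0.1], [0.3, 0.7]))  # error at the start
print(nikaido_isoda_error(pennies, x_bar, y_bar))           # much smaller after averaging
```

Averaging matters here: for bilinear games the last iterate of simultaneous multiplicative weights tends to cycle around the equilibrium, while the averaged iterates converge, which is why the sketch reports the error of the averages.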
# THE COLOSSEUM: ロボットマニピュレーションの汎化を評価するためのベンチマーク THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation ( http://arxiv.org/abs/2402.08191v2 ) ライセンス: Link先を確認	Wilbert Pumacay, Ishika Singh, Jiafei Duan, Ranjay Krishna, Jesse Thomason, Dieter Fox,	(参考訳) 効果的な大規模かつ実世界のロボット応用を実現するためには、ロボットのポリシーが環境条件の変化にどの程度適応するかを評価する必要がある。残念なことに、ほとんどの研究は、トレーニング環境とよく似た、あるいは同一の環境でロボットの性能を評価している。本稿では、環境摂動の14軸にわたるモデルの系統的評価を可能にする、20種類の多様な操作タスクを備えた新しいシミュレーションベンチマークであるTHE COLOSSEUMを提案する。これらの摂動には、物体、テーブルトップ、背景の色、テクスチャ、サイズの変化が含まれ、さらに照明、妨害物体、物理特性、カメラ姿勢も変化させる。THE COLOSSEUMを用いて5つの最先端の操作モデルを比較し、これらの摂動因子によって成功率が30～50%低下することを明らかにする。複数の摂動を同時に適用すると、成功率は75%以上低下する。妨害物体の数、対象物体の色、照明条件の変化が、モデル性能を最も低下させる摂動であることを特定した。結果の生態学的妥当性を検証するため、シミュレーションでの結果が実世界実験における同様の摂動と相関すること($\bar{R}^2 = 0.614$)を示す。他の研究者がTHE COLOSSEUMを利用できるようにソースコードを公開し、実世界の摂動を再現するために使用した物体を3Dプリントするためのコードも公開する。最終的には、THE COLOSSEUMが、操作における汎化を体系的に向上させるモデリング上の決定を特定するためのベンチマークとして機能することを期待している。詳細は https://robot-colosseum.github.io/ を参照。 To realize effective large-scale, real-world robotic applications, we must evaluate how well our robot policies adapt to changes in environmental conditions. Unfortunately, a majority of studies evaluate robot performance in environments closely resembling or even identical to the training setup. We present THE COLOSSEUM, a novel simulation benchmark with 20 diverse manipulation tasks that enables systematic evaluation of models across 14 axes of environmental perturbations. These perturbations include changes in color, texture, and size of objects, table-tops, and backgrounds; we also vary lighting, distractors, physical properties and camera pose. Using THE COLOSSEUM, we compare 5 state-of-the-art manipulation models to reveal that their success rate degrades by 30-50% across these perturbation factors. When multiple perturbations are applied in unison, the success rate degrades by $\geq$75%.
We identify changes in the number of distractor objects, the target object color, and the lighting conditions as the perturbations that reduce model performance the most. To verify the ecological validity of our results, we show that our results in simulation are correlated ($\bar{R}^2 = 0.614$) with similar perturbations in real-world experiments. We open-source code for others to use THE COLOSSEUM, and also release code to 3D print the objects used to replicate the real-world perturbations. Ultimately, we hope that THE COLOSSEUM will serve as a benchmark to identify modeling decisions that systematically improve generalization for manipulation. See https://robot-colosseum.github.io/ for more details.	Translated: 2024-05-30 01:28:38 Published: 2024-05-28
# BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models ( http://arxiv.org/abs/2402.08219v2 )	Haotian Sun, Yuchen Zhuang, Wei Wei, Chao Zhang, Bo Dai,	Adapting state-of-the-art Large Language Models (LLMs) like GPT-4 and Gemini for specific tasks is challenging. Because their parameters, embeddings, and even output probabilities are opaque, existing fine-tuning adaptation methods are inapplicable. Consequently, these black-box LLMs can only be adapted through their API services, which raises concerns about transparency, privacy, and cost. To address these challenges, we introduce BBox-Adapter, a novel lightweight adapter for black-box LLMs. BBox-Adapter distinguishes target and source domain data by treating target data as positive and source data as negative. It employs a ranking-based Noise Contrastive Estimation (NCE) loss to promote the likelihood of target domain data while penalizing that of the source domain. Furthermore, it features an online adaptation mechanism, which incorporates real-time positive data sampling from ground-truth, human, or AI feedback, coupled with negative data from previous adaptations. Extensive experiments demonstrate BBox-Adapter's effectiveness and cost efficiency.
It improves model performance by up to 6.77% across diverse tasks and domains, while reducing training and inference costs by 31.30x and 1.84x, respectively.	Translated: 2024-05-30 01:28:38 Published: 2024-05-28
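One common way to write a ranking-style NCE objective is as a softmax cross-entropy. The loss is low when one positive (target-domain) candidate outranks K negatives (source-domain or earlier-adaptation candidates) according to their adapter scores. The sketch assumes the adapter outputs one scalar score per candidate. It is a generic InfoNCE-style loss, and BBox-Adapter's exact objective may differ.

```python
import numpy as np

def ranking_nce_loss(pos_score, neg_scores):
    """Negative log-probability that the positive candidate is ranked first:
    -log( exp(s_pos) / (exp(s_pos) + sum_k exp(s_neg_k)) ), computed stably."""
    scores = np.concatenate(([pos_score], np.asarray(neg_scores, dtype=float)))
    m = scores.max()
    log_partition = m + np.log(np.exp(scores - m).sum())  # log-sum-exp
    return float(log_partition - pos_score)

# Raising the positive's score relative to the negatives lowers the loss.
print(ranking_nce_loss(0.0, [0.0, 0.0, 0.0]))  # equal scores: log(4)
print(ranking_nce_loss(3.0, [0.0, 0.0, 0.0]))
```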
# Subgraphormer: Unifying Subgraph GNNs and Graph Transformers via Graph Products ( http://arxiv.org/abs/2402.08450v2 )	Guy Bar-Shalom, Beatrice Bevilacqua, Haggai Maron,	In the realm of Graph Neural Networks (GNNs), two exciting research directions have recently emerged: Subgraph GNNs and Graph Transformers. In this paper, we propose an architecture that integrates both approaches, dubbed Subgraphormer, which combines the enhanced expressive power, message-passing mechanisms, and aggregation schemes of Subgraph GNNs with attention and positional encodings, arguably the most important components in Graph Transformers. Our method is based on an intriguing new connection we reveal between Subgraph GNNs and product graphs, suggesting that Subgraph GNNs can be formulated as Message Passing Neural Networks (MPNNs) operating on a product of the graph with itself. We use this formulation to design our architecture: first, we devise an attention mechanism based on the connectivity of the product graph. Following this, we propose a novel and efficient positional encoding scheme for Subgraph GNNs, which we derive as a positional encoding for the product graph. Our experimental results demonstrate significant performance improvements over both Subgraph GNNs and Graph Transformers on a wide range of datasets.	Translated: 2024-05-30 01:28:38 Published: 2024-05-28
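The Cartesian product $G \square H$ makes the product-graph view concrete. Its adjacency matrix is $A_G \otimes I + I \otimes A_H$, and its Laplacian decomposes the same way. As a result, its eigenpairs are sums and Kronecker products of the factors' eigenpairs, which suggests why positional encodings for a graph's product with itself can be computed from the small graph alone. The sketch assumes the Cartesian product, because the abstract does not say which product(s) Subgraphormer uses. The helper names are illustrative.

```python
import numpy as np

def cartesian_product_adjacency(A, B):
    """Adjacency of G x H (Cartesian): (u, v) ~ (u', v) if u ~ u', and (u, v) ~ (u, v') if v ~ v'."""
    return np.kron(A, np.eye(len(B))) + np.kron(np.eye(len(A)), B)

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

# Path graph on 3 nodes; its Cartesian product with itself is a 3x3 grid.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
P = cartesian_product_adjacency(A, A)

# The product's Laplacian spectrum equals all pairwise sums of the factor's eigenvalues.
lam = np.linalg.eigvalsh(laplacian(A))
product_spectrum = np.sort(np.linalg.eigvalsh(laplacian(P)))
pairwise_sums = np.sort((lam[:, None] + lam[None, :]).ravel())
print(np.allclose(product_spectrum, pairwise_sums))
```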
# Premise Order Matters in Reasoning with Large Language Models ( http://arxiv.org/abs/2402.08939v3 )	Xinyun Chen, Ryan A. Chi, Xuezhi Wang, Denny Zhou,	Large language models (LLMs) have accomplished remarkable reasoning performance in various domains. However, in the domain of reasoning tasks, we discover a frailty: LLMs are surprisingly brittle to the ordering of the premises, even though such ordering does not alter the underlying task. In particular, we observe that LLMs achieve the best performance when the premise order aligns with the context required in intermediate reasoning steps. For example, in deductive reasoning tasks, presenting the premises in the prompt in the same order as the ground truth proof (as opposed to a random ordering) drastically increases the model's accuracy. We first examine the effect of premise ordering on deductive reasoning on a variety of LLMs, and our evaluation shows that permuting the premise order can cause a performance drop of over 30%. In addition, we release the benchmark R-GSM, based on GSM8K, to examine the ordering effect for mathematical problem-solving, and we again observe a significant drop in accuracy relative to the original GSM8K benchmark.	Translated: 2024-05-30 01:28:38 Published: 2024-05-28
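Kendall's tau is a convenient way to measure how far a prompt's premise order is from the order in which the proof uses the premises. It is 1 for the forward (proof) order, -1 for the fully reversed order, and in between for partial shuffles. The sketch assumes each premise is labeled by the proof step that uses it and that the labels are distinct. It is an illustration only, not the paper's evaluation code.

```python
import random
from itertools import combinations

def kendall_tau(order):
    """Kendall tau between the presented premise order and the proof order.

    order[i] is the proof-step index of the premise shown at position i
    (indices assumed distinct); 1.0 = forward order, -1.0 = reversed.
    """
    n = len(order)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):  # i < j: pairs of prompt positions
        if order[i] < order[j]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

forward = list(range(6))
shuffled = forward[:]
random.Random(0).shuffle(shuffled)
print(kendall_tau(forward), kendall_tau(forward[::-1]), kendall_tau(shuffled))
```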
# Instruction Backdoor Attacks Against Customized LLMs ( http://arxiv.org/abs/2402.09179v3 )	Rui Zhang, Hongwei Li, Rui Wen, Wenbo Jiang, Yuan Zhang, Michael Backes, Yun Shen, Yang Zhang,	The increasing demand for customized Large Language Models (LLMs) has led to the development of solutions like GPTs. These solutions facilitate tailored LLM creation via natural language prompts without coding. However, the trustworthiness of third-party custom versions of LLMs remains an essential concern. In this paper, we propose the first instruction backdoor attacks against applications integrated with untrusted customized LLMs (e.g., GPTs). Specifically, these attacks embed the backdoor into the custom version of an LLM by designing prompts with backdoor instructions, so that the model outputs the attacker's desired result when inputs contain the pre-defined triggers. Our attacks operate at three levels (word-level, syntax-level, and semantic-level), which adopt different types of triggers with progressive stealthiness. We stress that our attacks do not require fine-tuning or any modification to the backend LLMs, adhering strictly to GPTs' development guidelines. We conduct extensive experiments on 6 prominent LLMs and 5 benchmark text classification datasets.
The results show that our instruction backdoor attacks achieve the desired attack performance without compromising utility. Additionally, we propose two defense strategies and demonstrate their effectiveness in reducing such attacks. Our findings highlight the vulnerability and the potential risks of LLM customization such as GPTs.	Translated: 2024-05-30 01:28:38 Published: 2024-05-28
# STEER: Assessing the Economic Rationality of Large Language Models ( http://arxiv.org/abs/2402.09552v2 )	Narun Raman, Taylor Lundy, Samuel Amouyal, Yoav Levine, Kevin Leyton-Brown, Moshe Tennenholtz,	There is increasing interest in using LLMs as decision-making "agents." Doing so involves many degrees of freedom: which model should be used; how it should be prompted; whether it should be asked to introspect, conduct chain-of-thought reasoning, etc. Settling these questions, and more broadly determining whether an LLM agent is reliable enough to be trusted, requires a methodology for assessing such an agent's economic rationality. In this paper, we provide one. We begin by surveying the economic literature on rational decision making, taxonomizing a large set of fine-grained "elements" that an agent should exhibit, along with dependencies between them. We then propose a benchmark distribution that quantitatively scores an LLM's performance on these elements and, combined with a user-provided rubric, produces a "STEER report card." Finally, we describe the results of a large-scale empirical experiment with 14 different LLMs, characterizing both the current state of the art and the impact of different model sizes on models' ability to exhibit rational behavior.	Translated: 2024-05-30 01:28:38 Published: 2024-05-28
# Combining Evidence Across Filtrations Using Adjusters ( http://arxiv.org/abs/2402.09698v2 )	Yo Joong Choe, Aaditya Ramdas,	In anytime-valid sequential inference, it is known that any admissible procedure must be based on e-processes, which are composite generalizations of test martingales that quantify the accumulated evidence against a composite null hypothesis at any arbitrary stopping time. This paper studies methods for combining e-processes constructed using different information sets (filtrations) for the same null. Although e-processes constructed in the same filtration can be combined effortlessly (e.g., by averaging), e-processes constructed in different filtrations cannot, because their validity in a coarser filtration does not translate to validity in a finer filtration. This issue arises in exchangeability tests, independence tests, and tests for comparing forecasts with lags. We first establish that a class of functions called adjusters allows us to lift e-processes from a coarser filtration into any finer filtration. We then introduce a characterization theorem for adjusters, formalizing a sense in which using adjusters is necessary. There are two major implications. First, if we have a powerful e-process in a coarsened filtration, then we readily have a powerful e-process in the original filtration.
Second, when we coarsen the filtration to construct an e-process, there is an asymptotically logarithmic cost of recovering anytime-validity in the original filtration.	Translated: 2024-05-30 01:28:38 Published: 2024-05-28
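This sketch assumes the usual definition of an adjuster from the e-value literature: an increasing function $A:[1,\infty]\to[0,\infty]$ with $\int_1^\infty A(x)/x^2\,dx \le 1$. A standard example is $A_\kappa(x) = \kappa x^{1-\kappa}$ for $\kappa \in (0,1)$. The code checks that integral numerically and applies $A_\kappa$ to the running maximum of an e-process's values. That is the form of lifting we understand the paper to use, but treat both that reading and the helper names as assumptions.

```python
import numpy as np

def adjuster(x, kappa=0.5):
    """A_kappa(x) = kappa * x**(1 - kappa), an increasing function on [1, inf)."""
    return kappa * np.asarray(x, dtype=float) ** (1.0 - kappa)

def adjuster_integral(kappa=0.5, n=1_000_000):
    # Substituting u = 1/x turns int_1^inf A(x)/x^2 dx into int_0^1 A(1/u) du;
    # approximate the latter with a midpoint rule on (0, 1).
    u = (np.arange(n) + 0.5) / n
    return float(adjuster(1.0 / u, kappa).mean())

def adjusted_process(e_values, kappa=0.5):
    """Apply the adjuster to the running maximum of a sequence of e-process values."""
    running_max = np.maximum.accumulate(np.asarray(e_values, dtype=float))
    return adjuster(running_max, kappa)

print(adjuster_integral())  # close to 1; the midpoint rule slightly underestimates here
print(adjusted_process([1.0, 2.0, 1.5, 4.0]))
```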
# Stochastic Localization via Iterative Posterior Sampling ( http://arxiv.org/abs/2402.10758v2 )	Louis Grenioux, Maxence Noble, Marylou Gabrié, Alain Oliviero Durmus,	Building upon score-based learning, new interest in stochastic localization techniques has recently emerged. In these models, one seeks to noise a sample from the data distribution through a stochastic process, called the observation process, and progressively learns a denoiser associated with these dynamics. Apart from specific applications, the use of stochastic localization for the problem of sampling from an unnormalized target density has not been explored extensively. This work helps fill this gap. We consider a general stochastic localization framework and introduce an explicit class of observation processes, associated with flexible denoising schedules. We provide a complete methodology, Stochastic Localization via Iterative Posterior Sampling (SLIPS), to obtain approximate samples of these dynamics and, as a by-product, samples from the target distribution. Our scheme is based on a Markov chain Monte Carlo estimation of the denoiser and comes with detailed practical guidelines.
We illustrate the benefits and applicability of SLIPS on several benchmarks of multi-modal distributions, including Gaussian mixtures in increasing dimensions, Bayesian logistic regression, and a high-dimensional field system from statistical mechanics.	Translated: 2024-05-30 01:28:38 Published: 2024-05-28
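This sketch illustrates what "estimating the denoiser by MCMC" means. It uses the textbook observation process $Y_t = tX + W_t$, with $W$ a standard Brownian motion. For this process, the posterior of $X$ given $Y_t = y$ is proportional to $p(x)\exp(xy - tx^2/2)$. The code estimates the denoiser $\mathbb{E}[X \mid Y_t = y]$ with random-walk Metropolis and compares it with the closed form $y/(1+t)$ for a standard Gaussian target. SLIPS uses its own family of observation processes and its own MCMC scheme, so the process, the sampler, and the function names here are assumptions for the example.

```python
import numpy as np

def log_posterior(x, y, t, log_prior):
    # Y_t | X = x ~ N(t x, t), so log p(x | Y_t = y) = log p(x) + x*y - t*x**2/2 + const
    return log_prior(x) + x * y - 0.5 * t * x**2

def mcmc_denoiser(y, t, log_prior, n_steps=50_000, step=1.0, seed=0):
    """Random-walk Metropolis estimate of E[X | Y_t = y]."""
    rng = np.random.default_rng(seed)
    x = 0.0
    lp = log_posterior(x, y, t, log_prior)
    total = 0.0
    for _ in range(n_steps):
        prop = x + step * rng.standard_normal()
        lp_prop = log_posterior(prop, y, t, log_prior)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
            x, lp = prop, lp_prop
        total += x
    return total / n_steps

est = mcmc_denoiser(y=2.0, t=1.0, log_prior=lambda x: -0.5 * x**2)
print(est, 2.0 / (1.0 + 1.0))  # MCMC estimate vs. Gaussian closed form
```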
# EcoRank: Budget-Constrained Text Re-ranking Using Large Language Models ( http://arxiv.org/abs/2402.10866v2 )	Muhammad Shihab Rashid, Jannat Ara Meem, Yue Dong, Vagelis Hristidis,	Large Language Models (LLMs) have achieved state-of-the-art performance in text re-ranking. This process includes queries and candidate passages in the prompts, utilizing pointwise, listwise, and pairwise prompting strategies. A limitation of these ranking strategies with LLMs is their cost: the process can become expensive due to API charges, which are based on the number of input and output tokens. We study how to maximize re-ranking performance under a given budget by navigating the vast search spaces of prompt choices, LLM APIs, and budget splits. We propose a suite of budget-constrained methods to perform text re-ranking using a set of LLM APIs. Our most efficient method, called EcoRank, is a two-layered pipeline that jointly optimizes decisions regarding budget allocation across prompt strategies and LLM APIs. Our experimental results on four popular QA and passage re-ranking datasets show that EcoRank outperforms other budget-aware supervised and unsupervised baselines.	Translated: 2024-05-30 01:28:38 Published: 2024-05-28
# Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark ( http://arxiv.org/abs/2402.11592v3 )	Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen,	In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory overhead from back-propagation (BP) for FO gradient computation presents a significant challenge. Addressing this issue is crucial, especially for applications like on-device training where memory efficiency is paramount. This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during LLM fine-tuning, building on the initial concept introduced by MeZO. Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques, through a comprehensive, first-of-its-kind benchmarking study across five LLM families (RoBERTa, OPT, LLaMA, Vicuna, Mistral), three task complexities, and five fine-tuning schemes.
Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance. We further introduce novel enhancements to ZO optimization, including block-wise descent, hybrid training, and gradient sparsity. Our study offers a promising direction for achieving further memory-efficient LLM fine-tuning. Code to reproduce all our experiments is available at https://github.com/ZO-Bench/ZO-LLM.	Translated: 2024-05-30 01:28:38 Published: 2024-05-28
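BP-free ZO fine-tuning, in the MeZO style the abstract builds on, centers on a two-point (SPSA-type) gradient estimate. The method perturbs the parameters along a random direction $z$, evaluates the loss twice, and steps along $z$ scaled by the finite difference. The sketch runs this on a toy quadratic instead of an LLM. The learning rate, the smoothing radius `mu`, and the function names are assumptions. The sketch also omits refinements such as MeZO's in-place perturbation with regenerated random seeds.

```python
import numpy as np

def zo_sgd(loss, w, lr=0.02, mu=1e-3, steps=500, seed=0):
    """Zeroth-order SGD with a two-point (central-difference) gradient estimate."""
    rng = np.random.default_rng(seed)
    w = np.array(w, dtype=float)  # copy, so the caller's array is untouched
    for _ in range(steps):
        z = rng.standard_normal(w.shape)
        # Two forward evaluations, no backpropagation: estimate of the directional derivative along z
        proj_grad = (loss(w + mu * z) - loss(w - mu * z)) / (2 * mu)
        w -= lr * proj_grad * z
    return w

target = np.arange(10.0)
loss = lambda w: float(np.sum((w - target) ** 2))
w0 = np.zeros(10)
w = zo_sgd(loss, w0)
print(loss(w0), loss(w))
```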
# In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness ( http://arxiv.org/abs/2402.11639v2 )	Liam Collins, Advait Parulekar, Aryan Mokhtari, Sujay Sanghavi, Sanjay Shakkottai,	A striking property of transformers is their ability to perform in-context learning (ICL), a machine learning framework in which the learner is implicitly presented with a novel context during inference through some data, and is tasked with making a prediction in that context. As such, the learner must adapt to the context without additional training. We explore the role of softmax attention in an ICL setting where each context encodes a regression task. We show that an attention unit learns a window that it uses to implement a nearest-neighbors predictor adapted to the landscape of the pretraining tasks. Specifically, we show that this window widens with decreasing Lipschitzness and increasing label noise in the pretraining tasks. We also show that on low-rank, linear problems, the attention unit learns to project onto the appropriate subspace before inference. Further, we show that this adaptivity relies crucially on the softmax activation and thus cannot be replicated by the linear activation often studied in prior theoretical analyses.	Translated: 2024-05-30 01:18:48 Published: 2024-05-28
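This is a minimal way to see the "window" picture. Suppose attention scores are negative squared distances divided by a learned scale; this equals scaled dot-product scoring when all context inputs have the same norm. A single softmax attention head then computes a kernel-weighted average of the context labels. A small scale makes it act like a nearest-neighbor predictor, while a large one averages over the whole context. The function name and the `window` parameterization are assumptions for illustration, not the paper's construction.

```python
import numpy as np

def softmax_attention_predict(x_query, X_ctx, y_ctx, window):
    """One softmax attention head over (x_i, y_i) context pairs with distance-based scores."""
    scores = -np.sum((X_ctx - x_query) ** 2, axis=1) / window
    scores -= scores.max()  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()
    return float(weights @ y_ctx)

X_ctx = np.array([[0.0], [1.0], [3.0]])
y_ctx = np.array([0.0, 1.0, 10.0])
query = np.array([0.9])
print(softmax_attention_predict(query, X_ctx, y_ctx, window=1e-3))  # narrow: nearest neighbor's label
print(softmax_attention_predict(query, X_ctx, y_ctx, window=1e6))   # wide: close to the mean label
```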
# Generative Semi-supervised Graph Anomaly Detection ( http://arxiv.org/abs/2402.11887v4 )	Hezhe Qiao, Qingsong Wen, Xiaoli Li, Ee-Peng Lim, Guansong Pang,	This work considers a practical semi-supervised graph anomaly detection (GAD) scenario, where some of the nodes in a graph are known to be normal, in contrast to the extensively explored unsupervised setting with a fully unlabeled graph. We reveal that having access to normal nodes, even just a small percentage of them, helps enhance the detection performance of existing unsupervised GAD methods when they are adapted to the semi-supervised setting. However, these methods make only limited use of the normal nodes. In this paper, we propose a novel Generative GAD approach (namely GGAD) for the semi-supervised scenario to better exploit the normal nodes. The key idea is to generate pseudo anomaly nodes, referred to as 'outlier nodes', that provide effective negative node samples for training a discriminative one-class classifier. The main challenge here lies in the lack of ground truth information about real anomaly nodes.
To address this challenge, GGAD is designed to leverage two important priors about the anomaly nodes (asymmetric local affinity and egocentric closeness) to generate reliable outlier nodes that resemble anomaly nodes in both graph structure and feature representations. Comprehensive experiments on six real-world GAD datasets are performed to establish a benchmark for semi-supervised GAD and show that GGAD substantially outperforms state-of-the-art unsupervised and semi-supervised GAD methods with varying numbers of training normal nodes. Code will be made available at https://github.com/mala-lab/GGAD.	Translated: 2024-05-30 01:18:48 Published: 2024-05-28
# DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation ( http://arxiv.org/abs/2402.11929v2 )	Chong Zeng, Yue Dong, Pieter Peers, Youkang Kong, Hongzhi Wu, Xin Tong,	This paper presents a novel method for exerting fine-grained lighting control during text-driven diffusion-based image generation. While existing diffusion models already have the ability to generate images under any lighting condition, without additional guidance these models tend to correlate image content and lighting. Moreover, text prompts lack the expressive power needed to describe detailed lighting setups. To provide the content creator with fine-grained control over the lighting during image generation, we augment the text prompt with detailed lighting information in the form of radiance hints, i.e., visualizations of the scene geometry with a homogeneous canonical material under the target lighting. However, the scene geometry needed to produce the radiance hints is unknown. Our key observation is that we only need to guide the diffusion process, so exact radiance hints are not necessary; we only need to point the diffusion model in the right direction.
Based on this observation, we introduce a three-stage method for controlling the lighting during image generation. In the first stage, we leverage a standard pretrained diffusion model to generate a provisional image under uncontrolled lighting. Next, in the second stage, we resynthesize and refine the foreground object in the generated image by passing the target lighting to a refined diffusion model, named DiLightNet, using radiance hints computed on a coarse shape of the foreground object inferred from the provisional image. To retain the texture details, we multiply the radiance hints with a neural encoding of the provisional synthesized image before passing it to DiLightNet. Finally, in the third stage, we resynthesize the background to be consistent with the lighting on the foreground object. We demonstrate and validate our lighting-controlled diffusion model on a variety of text prompts and lighting conditions.	Translated: 2024-05-30 01:18:48 Published: 2024-05-28
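The sketch below gives a simplified picture of a radiance hint. It shades per-pixel surface normals (for example, from a coarse shape estimate) with a homogeneous Lambertian material under a single directional light, giving $\max(0, n \cdot l)$. DiLightNet's actual hints use several canonical materials and the full target lighting, so the light model, the material, and the function name here are assumptions.

```python
import numpy as np

def diffuse_radiance_hint(normals, light_dir, albedo=1.0):
    """Shade an (H, W, 3) normal map with a Lambertian material under one directional light."""
    n = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    return albedo * np.clip(n @ l, 0.0, None)  # back-facing pixels receive no light

# A 1x3 "image": facing the light, perpendicular to it, and facing away from it.
normals = np.array([[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, -1.0]]])
hint = diffuse_radiance_hint(normals, light_dir=[0.0, 0.0, 1.0])
print(hint)
```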
# GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick ( http://arxiv.org/abs/2402.12948v3 )	Jiayi Fu, Xuandong Zhao, Ruihan Yang, Yuansen Zhang, Jiangjie Chen, Yanghua Xiao,	Large language models (LLMs) generate remarkably human-like text, but they also raise concerns about misuse in fake news and academic dishonesty. Decoding-based watermarks, particularly the GumbelMax-trick-based watermark (GM watermark), are a standout solution for safeguarding machine-generated texts due to their notable detectability. However, the GM watermark faces a major challenge with generation diversity: it always yields identical outputs for the same prompt, which hurts both diversity and user experience. To overcome this limitation, we propose a new type of GM watermark, the Logits-Addition watermark, and its three variants, specifically designed to enhance diversity. Among these, the GumbelSoft watermark (a softmax variant of the Logits-Addition watermark) demonstrates superior performance in high-diversity settings: its AUROC score exceeds those of the two alternative variants by 0.1 to 0.3 and surpasses other decoding-based watermarking methods by at least 0.1.	Translated: 2024-05-30 01:18:48 Published: 2024-05-28
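The diversity problem is easy to see in code. With the Gumbel-max trick, adding Gumbel noise to the logits and taking the argmax samples a token from softmax(logits). A GM watermark derives that noise pseudo-randomly from a secret key and the preceding tokens, so the same prompt always yields the same token. Sampling from a softmax over the noised logits instead restores randomness; that is our reading of the "softmax variant" named in the abstract. The keying scheme (seeding NumPy with an integer `context_key`), the temperature `tau`, and the function names are hypothetical.

```python
import numpy as np

def gumbel_noise(context_key, vocab_size):
    # Hypothetical keying: seed a PRNG with an integer derived from a secret key and
    # the preceding tokens, so a detector holding the key can reproduce the noise.
    rng = np.random.default_rng(context_key)
    u = rng.uniform(size=vocab_size)
    return -np.log(-np.log(u))  # standard Gumbel samples

def gumbel_max_token(logits, context_key):
    """GM-style decoding: deterministic given the logits and the key."""
    return int(np.argmax(logits + gumbel_noise(context_key, len(logits))))

def gumbel_soft_token(logits, context_key, tau=1.0, rng=None):
    """Softmax variant: sample from softmax((logits + g) / tau) instead of taking the argmax."""
    if rng is None:
        rng = np.random.default_rng()
    z = (logits + gumbel_noise(context_key, len(logits))) / tau
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(rng.choice(len(logits), p=p))

logits = np.log(np.array([0.5, 0.3, 0.2]))
print(gumbel_max_token(logits, 42), gumbel_max_token(logits, 42))  # identical by construction
print(gumbel_soft_token(logits, 42, rng=np.random.default_rng(1)))
```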
# Neural Network Parameter Diffusion ( http://arxiv.org/abs/2402.13144v2 )	Kai Wang, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, Yang You,	Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an autoencoder and a standard latent diffusion model. The autoencoder extracts latent representations of a subset of the trained network parameters. A diffusion model is then trained to synthesize these latent parameter representations from random noise. It then generates new representations that are passed through the autoencoder's decoder, whose outputs are ready to use as new subsets of network parameters. Across various architectures and datasets, our diffusion process consistently generates models whose performance is comparable to or better than that of the trained networks, at minimal additional cost. Notably, we empirically find that the generated models are not memorizing the trained networks. Our results encourage further exploration of the versatile use of diffusion models.	Translated: 2024-05-30 01:18:48 Published: 2024-05-28
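This is a rough sketch of the data side of the pipeline. Parameters from a subset of layers are flattened into one vector, which is the autoencoder's input. A standard DDPM-style forward process then noises a latent $z_0$ as $z_t = \sqrt{\bar\alpha_t}\,z_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, and the diffusion model learns to reverse it. The autoencoder is omitted, so the flattened vector stands in for the latent. The linear beta schedule, the layer names, and the helpers are assumptions.

```python
import numpy as np

def flatten_params(params):
    """Concatenate a dict of parameter arrays into one vector, remembering the shapes."""
    names = sorted(params)
    shapes = {k: params[k].shape for k in names}
    flat = np.concatenate([params[k].ravel() for k in names])
    return flat, names, shapes

def unflatten_params(flat, names, shapes):
    """Inverse of flatten_params: split the vector back into named arrays."""
    out, i = {}, 0
    for k in names:
        size = int(np.prod(shapes[k]))
        out[k] = flat[i:i + size].reshape(shapes[k])
        i += size
    return out

def forward_noise(z0, t, alpha_bar, rng):
    """Sample z_t ~ q(z_t | z_0) in closed form."""
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps

betas = np.linspace(1e-4, 0.02, 1000)  # linear noise schedule (an assumption)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
params = {"bn.weight": rng.standard_normal(8), "bn.bias": rng.standard_normal(8)}  # hypothetical layer subset
z0, names, shapes = flatten_params(params)
z_mid = forward_noise(z0, 500, alpha_bar, rng)
restored = unflatten_params(z0, names, shapes)
print(z0.shape, z_mid.shape, alpha_bar[-1])
```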
# On the metric property of quantum Wasserstein divergences ( http://arxiv.org/abs/2402.13150v2 )	Gergely Bunth, József Pitrik, Tamás Titkos, Dániel Virosztek,	Quantum Wasserstein divergences are modified versions of quantum Wasserstein distances defined by channels, and De Palma and Trevisan conjectured that they are genuine metrics on quantum state spaces. We prove the triangle inequality for quantum Wasserstein divergences for every quantum system described by a separable Hilbert space and any quadratic cost operator, under the assumption that a particular state involved is pure and all the states have finite energy. We also provide strong numerical evidence suggesting that the triangle inequality holds in general, for an arbitrary choice of states.	Translated: 2024-05-30 01:18:48 Published: 2024-05-28
# Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning ( http://arxiv.org/abs/2402.13669v2 )	Zhaorui Yang, Tianyu Pang, Haozhe Feng, Han Wang, Wei Chen, Minfeng Zhu, Qian Liu,	The surge in Large Language Models (LLMs) has revolutionized natural language processing, but fine-tuning them for specific tasks often struggles to balance task performance against preserving general instruction-following abilities. In this paper, we posit that the distribution gap between task datasets and the LLMs is the primary underlying cause. To address the problem, we introduce Self-Distillation Fine-Tuning (SDFT), a novel approach that bridges the distribution gap by guiding fine-tuning with a distilled dataset generated by the model itself to match its original distribution. Experimental results on the Llama-2-chat model across various benchmarks demonstrate that SDFT effectively mitigates catastrophic forgetting while achieving comparable or superior performance on downstream tasks compared to vanilla fine-tuning. Moreover, SDFT demonstrates the potential to maintain the helpfulness and safety alignment of LLMs. Our code is available at https://github.com/sail-sg/sdft.	Translated: 2024-05-30 01:18:48 Published: 2024-05-28
# CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ( http://arxiv.org/abs/2402.14809v3 ) License: see link	Zicheng Lin, Zhibin Gou, Tian Liang, Ruilin Luo, Haowei Liu, Yujiu Yang	The ability of Large Language Models (LLMs) to critique and refine their reasoning is crucial for their application in evaluation, feedback provision, and self-improvement. This paper introduces CriticBench, a comprehensive benchmark designed to assess LLMs' abilities to critique and rectify their reasoning across a variety of tasks. CriticBench encompasses five reasoning domains: mathematical, commonsense, symbolic, coding, and algorithmic. It compiles 15 datasets and incorporates responses from three LLM families. Using CriticBench, we evaluate and dissect the performance of 17 LLMs in generation, critique, and correction reasoning (GQC reasoning). Our findings reveal: (1) a linear relationship in GQC capabilities, with critique-focused training markedly enhancing performance; (2) a task-dependent variation in correction effectiveness, with logic-oriented tasks being more amenable to correction; (3) GQC knowledge inconsistencies that decrease as model size increases; and (4) an intriguing inter-model critiquing dynamic, where stronger models are better at critiquing weaker ones, while weaker models can surprisingly surpass stronger ones in their self-critique. We hope these insights into the nuanced critique-correct reasoning of LLMs will foster further research in LLM critique and self-improvement.	Translated: 2024-05-30 01:18:48, Published: 2024-05-28
# DistALANER: Distantly Supervised Active Learning Augmented Named Entity Recognition in the Open Source Software Ecosystem ( http://arxiv.org/abs/2402.16159v4 ) License: see link	Somnath Banerjee, Avik Dutta, Aaditya Agrawal, Rima Hazra, Animesh Mukherjee	With the AI revolution under way, the trend of building automated systems to support professionals in domains such as open-source software, healthcare, banking, and transportation has become increasingly prominent. A crucial requirement in automating support tools for such systems is the early identification of named entities, which serves as a foundation for developing specialized functionalities. However, because each domain has its own technical terminology and specialized language, expert annotation of the available data is expensive and challenging. In light of these challenges, this paper proposes a novel named entity recognition (NER) technique specifically tailored for open-source software systems. Our approach aims to address the scarcity of annotated software data with a comprehensive two-step distantly supervised annotation process. This process strategically leverages language heuristics, unique lookup tables, external knowledge sources, and an active learning approach. By harnessing these techniques, we not only enhance model performance but also mitigate the limitations associated with cost and the scarcity of expert annotators. Notably, our model outperforms state-of-the-art LLMs by a substantial margin. We also show the effectiveness of NER in the downstream task of relation extraction.	Translated: 2024-05-30 01:18:48, Published: 2024-05-28
# Do Large Language Models Mirror Cognitive Language Processing? ( http://arxiv.org/abs/2402.18023v2 ) License: see link	Yuqi Ren, Renren Jin, Tongxuan Zhang, Deyi Xiong	Large Language Models (LLMs) have demonstrated remarkable abilities in text comprehension and logical reasoning, indicating that the text representations learned by LLMs can facilitate their language processing capabilities. In cognitive science, brain cognitive processing signals are typically used to study human language processing. It is therefore natural to ask how well the text embeddings from LLMs align with brain cognitive processing signals, and how training strategies affect this LLM-brain alignment. In this paper, we employ Representational Similarity Analysis (RSA) to measure the alignment between 23 mainstream LLMs and fMRI signals of the brain, evaluating how effectively LLMs simulate cognitive language processing. We empirically investigate the impact of various factors (e.g., pre-training data size, model scaling, alignment training, and prompts) on such LLM-brain alignment. Experimental results indicate that pre-training data size and model scaling are positively correlated with LLM-brain similarity, and that alignment training can significantly improve LLM-brain similarity. Explicit prompts contribute to the consistency of LLMs with brain cognitive language processing, while nonsensical noisy prompts may attenuate such alignment. Additionally, performance on a wide range of LLM evaluations (e.g., MMLU, Chatbot Arena) is highly correlated with LLM-brain similarity.	Translated: 2024-05-30 01:18:48, Published: 2024-05-28
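Representational Similarity Analysis, as named in the entry above, compares two systems through their representational dissimilarity matrices (RDMs) over the same stimuli rather than through their raw features, which is why LLM embeddings can be compared with fMRI responses of a different dimensionality. The sketch below is a generic RSA computation, not the authors' pipeline: it assumes an RDM of 1 minus Pearson correlation between stimuli and a Spearman correlation between the RDM upper triangles, both common but unconfirmed choices here, and the function names are hypothetical.

```python
import numpy as np


def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson r between stimulus rows.

    features: (n_stimuli, n_dims); n_dims may differ between model and brain data.
    """
    return 1.0 - np.corrcoef(features)


def _ranks(values):
    """Ranks 0..n-1 (ties are not averaged; acceptable for continuous dissimilarities)."""
    ranks = np.empty(len(values))
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks


def rsa_score(model_features, brain_features):
    """Spearman correlation between the upper triangles of the two RDMs."""
    a, b = rdm(model_features), rdm(brain_features)
    iu = np.triu_indices_from(a, k=1)  # each stimulus pair once, diagonal excluded
    ra, rb = _ranks(a[iu]), _ranks(b[iu])
    return float(np.corrcoef(ra, rb)[0, 1])
```

Because only pairwise dissimilarities enter the score, any per-stimulus affine rescaling of the features leaves it unchanged.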
# Conditional Independence of 1D Gibbs States with Applications to Efficient Learning ( http://arxiv.org/abs/2402.18500v2 ) License: see link	Paul Gondolf, Samuel O. Scalet, Alberto Ruiz-de-Alarcon, Alvaro M. Alhambra, Angela Capel	We show that spin chains in thermal equilibrium have a correlation structure in which individual regions are strongly correlated at most with their near vicinity. We quantify this with alternative notions of the conditional mutual information defined through the so-called Belavkin-Staszewski relative entropy. We prove that these measures decay superexponentially at every positive temperature, under the assumption that the spin chain Hamiltonian is translation-invariant. Using a recovery map associated with these measures, we sequentially construct tensor network approximations in terms of marginals of small (sublogarithmic) size. As a main application, we show that classical representations of the states can be learned efficiently from local measurements with polynomial sample complexity. We also prove an approximate factorization condition for the purity of the entire Gibbs state, which implies that it can be efficiently estimated to a small multiplicative error from a small number of local measurements. The results extend from strictly local interactions to exponentially decaying interactions above a threshold temperature, albeit only with exponential decay rates. As a technical step of independent interest, we show an upper bound on the decay of the Belavkin-Staszewski relative entropy upon the application of a conditional expectation.	Translated: 2024-05-30 01:18:48, Published: 2024-05-28
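The Belavkin-Staszewski relative entropy in the entry above is usually defined, for a full-rank state σ, as D_BS(ρ‖σ) = Tr[ρ log(ρ^{1/2} σ^{-1} ρ^{1/2})], and it upper-bounds the Umegaki relative entropy Tr[ρ(log ρ - log σ)]. The NumPy sketch below only evaluates these two textbook quantities (natural log) for small full-rank density matrices; it does not implement the paper's conditional mutual information or recovery maps, and the helper names are hypothetical.

```python
import numpy as np


def _apply(mat, fn):
    """Apply a scalar function to a Hermitian matrix through its eigendecomposition."""
    w, v = np.linalg.eigh(mat)
    return (v * fn(w)) @ v.conj().T


def bs_relative_entropy(rho, sigma):
    """Tr[rho log(rho^1/2 sigma^-1 rho^1/2)]; assumes rho and sigma are full rank."""
    sqrt_rho = _apply(rho, np.sqrt)
    m = sqrt_rho @ np.linalg.inv(sigma) @ sqrt_rho
    m = (m + m.conj().T) / 2  # symmetrise away round-off before eigh
    return float(np.trace(rho @ _apply(m, np.log)).real)


def umegaki_relative_entropy(rho, sigma):
    """Tr[rho (log rho - log sigma)]."""
    return float(np.trace(rho @ (_apply(rho, np.log) - _apply(sigma, np.log))).real)


def random_state(dim, rng):
    """Random full-rank density matrix (for testing only)."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T + 0.1 * np.eye(dim)
    return rho / np.trace(rho).real
```

For commuting (diagonal) states both quantities reduce to the classical Kullback-Leibler divergence; they differ only when ρ and σ do not commute.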
# Med-Real2Sim: Non-Invasive Medical Digital Twins using Physics-Informed Self-Supervised Learning ( http://arxiv.org/abs/2403.00177v2 ) License: see link	Keying Kuang, Frances Dean, Jack B. Jedlicki, David Ouyang, Anthony Philippakis, David Sontag, Ahmed M. Alaa	A digital twin is a virtual replica of a real-world physical phenomenon that uses mathematical modeling to characterize and simulate its defining features. By constructing digital twins for disease processes, we can perform in-silico simulations that mimic patients' health conditions and counterfactual outcomes under hypothetical interventions in a virtual setting. This eliminates the need for invasive procedures or uncertain treatment decisions. In this paper, we propose a method to identify digital twin model parameters using only noninvasive patient health data. We approach digital twin modeling as a composite inverse problem, and observe that its structure resembles pretraining and finetuning in self-supervised learning (SSL). Leveraging this, we introduce a physics-informed SSL algorithm that first pretrains a neural network on the pretext task of learning a differentiable simulator of a physiological process. The model is then trained to reconstruct physiological measurements from noninvasive modalities while being constrained by the physical equations learned in pretraining. We apply our method to identify digital twins of cardiac hemodynamics from noninvasive echocardiogram videos, and demonstrate its utility in unsupervised disease detection and in-silico clinical trials.	Translated: 2024-05-30 01:18:48, Published: 2024-05-28
# LLMs for Targeted Sentiment in News Headlines: Exploring the Descriptive-Prescriptive Dilemma ( http://arxiv.org/abs/2403.00418v2 ) License: see link	Jana Juroš, Laura Majer, Jan Šnajder	News headlines often evoke sentiment by intentionally portraying entities in particular ways, making targeted sentiment analysis (TSA) of headlines a worthwhile but difficult task. Due to its subjectivity, creating TSA datasets can involve various annotation paradigms, from descriptive to prescriptive, that either encourage or limit subjectivity. LLMs are a good fit for TSA due to their broad linguistic and world knowledge and in-context learning abilities, yet their performance depends on prompt design. In this paper, we compare the accuracy of state-of-the-art LLMs and fine-tuned encoder models for TSA of news headlines using descriptive and prescriptive datasets across several languages. Exploring the descriptive-prescriptive continuum, we analyze how performance is affected by prompt prescriptiveness, ranging from plain zero-shot to elaborate few-shot prompts. Finally, we evaluate the ability of LLMs to quantify uncertainty via calibration error and comparison to human label variation. We find that LLMs outperform fine-tuned encoders on descriptive datasets, while calibration and F1-score generally improve with increased prescriptiveness, yet the optimal level varies.	Translated: 2024-05-30 01:09:03, Published: 2024-05-28
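The entry above measures uncertainty via calibration error without naming the estimator. As an assumption, the sketch below uses the common equal-width-bin expected calibration error (ECE): the sample-weighted gap between mean confidence and accuracy within each confidence bin. The bin count and function name are hypothetical, not taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (bin size / N) * |mean confidence - accuracy|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally long, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # put conf == 1.0 in the top bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            ece += len(members) / total * abs(mean_conf - accuracy)
    return ece
```

A model that says 0.9 but is right only half the time contributes a gap of 0.4 for those predictions, while one that says 0.75 and is right three times out of four contributes nothing.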
# Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks ( http://arxiv.org/abs/2403.00644v4 ) License: see link	Yuhao Liu, Zhanghan Ke, Fang Liu, Nanxuan Zhao, Rynson W. H. Lau	Diffusion models trained on large-scale datasets have achieved remarkable progress in image synthesis. However, due to the randomness in the diffusion process, they often struggle with diverse low-level tasks that require detail preservation. To overcome this limitation, we present a new Diff-Plugin framework that enables a single pre-trained diffusion model to generate high-fidelity results across a variety of low-level tasks. Specifically, we first propose a lightweight Task-Plugin module with a dual-branch design that provides task-specific priors, guiding the diffusion process to preserve image content. We then propose a Plugin-Selector that automatically selects among Task-Plugins based on the text instruction, allowing users to edit images by indicating multiple low-level tasks in natural language. We conduct extensive experiments on 8 low-level vision tasks. The results demonstrate the superiority of Diff-Plugin over existing methods, particularly in real-world scenarios. Our ablations further validate that Diff-Plugin is stable, schedulable, and supports robust training across different dataset sizes.	Translated: 2024-05-30 01:09:03, Published: 2024-05-28
# Dynamics of quantum observables and Born's rule in Bohmian Quantum Mechanics ( http://arxiv.org/abs/2403.01836v2 ) License: see link	Athanasios C. Tzemos, George Contopoulos	We investigate both ordered and chaotic Bohmian trajectories within the Born distribution of Bohmian particles of an anisotropic 2D quantum harmonic oscillator. We compute the average values of energy, momentum, angular momentum, and position using both Standard Quantum Mechanics and Bohmian Mechanics. In particular, we examine realizations of the Born distribution for a wavefunction with a single nodal point and for two different wavefunctions with multiple nodal points: one with an almost equal number of ordered and chaotic trajectories, and another composed primarily of chaotic trajectories. Throughout our analysis, we focus on elucidating the contribution of ordered and chaotic Bohmian trajectories to these average values.	Translated: 2024-05-30 01:09:03, Published: 2024-05-28
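Bohmian trajectories like those in the entry above follow the guidance equation v = (ħ/m) Im(∇ψ/ψ). The stdlib-only sketch below evaluates that velocity for a superposition of the lowest eigenstates of an anisotropic 2D oscillator (units ħ = m = 1) and advances a particle by one RK4 step. The eigenstates, amplitudes, and frequencies are hypothetical placeholders, not the paper's wavefunctions, and derivatives are taken by central differences for brevity.

```python
import cmath
import math


def phi(n, w, x):
    """1D oscillator eigenfunction with hbar = m = 1 (only n = 0, 1 in this sketch)."""
    gauss = (w / math.pi) ** 0.25 * math.exp(-w * x * x / 2)
    if n == 0:
        return gauss
    if n == 1:
        return gauss * math.sqrt(2 * w) * x
    raise ValueError("only n = 0 or 1 supported in this sketch")


def psi(x, y, t, coeffs, wx, wy):
    """Superposition over (nx, ny) of c * phi_nx(x) * phi_ny(y) * exp(-i E t)."""
    total = 0j
    for (nx, ny), c in coeffs.items():
        energy = wx * (nx + 0.5) + wy * (ny + 0.5)
        total += c * phi(nx, wx, x) * phi(ny, wy, y) * cmath.exp(-1j * energy * t)
    return total


def bohm_velocity(x, y, t, coeffs, wx, wy, h=1e-6):
    """Guidance equation v = Im(grad(psi) / psi); undefined at nodal points (psi = 0)."""
    p = psi(x, y, t, coeffs, wx, wy)
    dpx = (psi(x + h, y, t, coeffs, wx, wy) - psi(x - h, y, t, coeffs, wx, wy)) / (2 * h)
    dpy = (psi(x, y + h, t, coeffs, wx, wy) - psi(x, y - h, t, coeffs, wx, wy)) / (2 * h)
    return (dpx / p).imag, (dpy / p).imag


def rk4_step(x, y, t, dt, coeffs, wx, wy):
    """Advance one Bohmian particle by dt with classical 4th-order Runge-Kutta."""
    def v(xx, yy, tt):
        return bohm_velocity(xx, yy, tt, coeffs, wx, wy)

    k1 = v(x, y, t)
    k2 = v(x + dt / 2 * k1[0], y + dt / 2 * k1[1], t + dt / 2)
    k3 = v(x + dt / 2 * k2[0], y + dt / 2 * k2[1], t + dt / 2)
    k4 = v(x + dt * k3[0], y + dt * k3[1], t + dt)
    x_new = x + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    y_new = y + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x_new, y_new
```

A single eigenstate gives zero velocity everywhere because its spatial part is real; motion requires a superposition whose terms acquire a time-dependent relative phase.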
# Position: Towards Implicit Prompt For Text-To-Image Models ( http://arxiv.org/abs/2403.02118v4 ) License: see link	Yue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang, Ping Luo	Recent text-to-image (T2I) models have had great success, and many benchmarks have been proposed to evaluate their performance and safety. However, these benchmarks consider only explicit prompts and neglect implicit prompts (prompts that hint at a target without explicitly mentioning it). Such prompts may circumvent safety constraints and pose potential threats to the applications of these models. This position paper highlights the current state of T2I models with respect to implicit prompts. We present a benchmark named ImplicitBench and investigate the performance and impacts of implicit prompts on popular T2I models. Specifically, we design and collect more than 2,000 implicit prompts covering three aspects (General Symbols, Celebrity Privacy, and Not-Safe-For-Work (NSFW) Issues) and evaluate the capabilities of six well-known T2I models under these implicit prompts. Experiment results show that (1) T2I models are able to accurately create various target symbols indicated by implicit prompts; (2) implicit prompts bring potential risks of privacy leakage for T2I models; and (3) the NSFW constraints in most of the evaluated T2I models can be bypassed with implicit prompts. We call for increased attention to the potential and risks of implicit prompts in the T2I community and for further investigation into their capabilities and impacts, advocating a balanced approach that harnesses their benefits while mitigating their risks.	Translated: 2024-05-30 01:09:03, Published: 2024-05-28
# Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution ( http://arxiv.org/abs/2403.03121v3 ) License: see link	Flor Miriam Plaza-del-Arco, Amanda Cercas Curry, Alba Curry, Gavin Abercrombie, Dirk Hovy	Large language models (LLMs) reflect societal norms and biases, especially about gender. While societal biases and stereotypes have been extensively researched in various NLP applications, there is a surprising gap for emotion analysis. Yet emotion and gender are closely linked in societal discourse: for example, women are often thought of as more empathetic, while men's anger is more socially accepted. To fill this gap, we present the first comprehensive study of gendered emotion attribution in five state-of-the-art LLMs (open- and closed-source). We investigate whether emotions are gendered, and whether these variations are based on societal stereotypes. We prompt the models to adopt a gendered persona and attribute emotions to an event like 'When I had a serious argument with a dear person'. We then analyze the emotions generated by the models in relation to the gender-event pairs. We find that all models consistently exhibit gendered emotions, influenced by gender stereotypes. These findings are in line with established research in psychology and gender studies. Our study sheds light on the complex societal interplay between language, gender, and emotion. The reproduction of emotion stereotypes in LLMs allows us to use those models to study the topic in detail, but raises questions about the predictive use of those same LLMs for emotion applications.	Translated: 2024-05-30 01:09:03, Published: 2024-05-28
# Operator Learning Renormalization Group ( http://arxiv.org/abs/2403.03199v2 ) License: see link	Xiu-Zhe Luo, Di Luo, Roger G. Melko	In this paper, we present a general framework for quantum many-body simulations called the operator learning renormalization group (OLRG). Inspired by machine learning perspectives, OLRG generalizes Wilson's numerical renormalization group and White's density matrix renormalization group: it recursively builds, via operator maps, a simulatable system that approximates a target system with the same number of sites. OLRG uses a loss function to directly minimize the error of a target property by learning the operator map in lieu of a state ansatz. This loss function is designed by a scaling consistency condition that also provides a provable bound for real-time evolution. We implement two versions of the operator maps, for classical and quantum simulations. The former, which we call the Operator Matrix Map, can be implemented via neural networks on classical computers. The latter, which we call the Hamiltonian Expression Map, generates device pulse sequences to leverage the capabilities of quantum computing hardware. We illustrate the performance of both maps for calculating time-dependent quantities in the quantum Ising model Hamiltonian.	Translated: 2024-05-30 01:09:03, Published: 2024-05-28
# An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control ( http://arxiv.org/abs/2403.04880v2 ) License: see link	Aosong Feng, Weikang Qiu, Jinbin Bai, Xiao Zhang, Zhen Dong, Kaicheng Zhou, Rex Ying, Leandros Tassiulas	Building on the success of text-to-image diffusion models (DPMs), image editing is an important application that enables human interaction with AI-generated content. Among various editing methods, editing within the prompt space has gained more attention because of its capacity for, and simplicity of, controlling semantics. However, since diffusion models are commonly pretrained on descriptive text captions, directly editing words in text prompts usually leads to completely different generated images, violating the requirements of image editing. On the other hand, existing editing methods often introduce spatial masks to preserve the identity of unedited regions, which are usually ignored by DPMs and therefore lead to inharmonious editing results. Targeting these two challenges, in this work we propose to disentangle the comprehensive image-prompt interaction into several item-prompt interactions, with each item linked to a special learned prompt. The resulting framework, named D-Edit, is based on pretrained diffusion models with disentangled cross-attention layers and adopts a two-step optimization to build item-prompt associations. Versatile image editing can then be applied to specific items by manipulating the corresponding prompts. We demonstrate state-of-the-art results in four types of editing operations (image-based, text-based, and mask-based editing, and item removal), covering most types of editing applications, all within a single unified framework. Notably, D-Edit is the first framework that can (1) achieve item editing through mask editing and (2) combine image- and text-based editing. We demonstrate the quality and versatility of the editing results on a diverse collection of images through both qualitative and quantitative evaluations.	Translated: 2024-05-30 01:09:03, Published: 2024-05-28
# AutoEval Done Right: Using Synthetic Data for Model Evaluation ( http://arxiv.org/abs/2403.07008v2 ) License: see link	Pierre Boyeau, Anastasios N. Angelopoulos, Nir Yosef, Jitendra Malik, Michael I. Jordan	The evaluation of machine learning models using human-labeled validation data can be expensive and time-consuming. AI-labeled synthetic data can be used to decrease the number of human annotations required for this purpose, in a process called autoevaluation. We propose efficient and statistically principled algorithms for this purpose that improve sample efficiency while remaining unbiased. In experiments with GPT-4, these algorithms increase the effective human-labeled sample size by up to 50%.	Translated: 2024-05-30 01:09:03, Published: 2024-05-28
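One standard way to combine a small human-labeled set with plentiful AI-labeled data without introducing bias is a prediction-powered estimate: average the synthetic scores over the large set, then add a "rectifier" equal to the mean human-minus-synthetic difference on the labeled set. The sketch below shows that basic mean estimator with a normal-approximation interval. It is an assumption about the kind of algorithm meant in the entry above, not the paper's exact method (which may, for example, reweight the two terms), and the function name is hypothetical.

```python
import math
import statistics


def autoeval_mean(y_human, f_human, f_synthetic, z=1.96):
    """Bias-corrected mean of a per-example metric (e.g. 0/1 correctness).

    y_human:     human labels on the small labeled set
    f_human:     AI-generated labels for those same examples
    f_synthetic: AI-generated labels on the large unlabeled set
    Returns (estimate, lower, upper) for a normal-approximation interval.
    """
    n, big_n = len(y_human), len(f_synthetic)
    residuals = [y - f for y, f in zip(y_human, f_human)]
    synthetic_mean = sum(f_synthetic) / big_n
    rectifier = sum(residuals) / n  # estimated bias of the AI labels
    estimate = synthetic_mean + rectifier
    # The two sample means are independent, so their variances add.
    se = math.sqrt(statistics.variance(f_synthetic) / big_n
                   + statistics.variance(residuals) / n)
    return estimate, estimate - z * se, estimate + z * se
```

If the AI labels are systematically 0.2 too high, the rectifier subtracts exactly that offset; if they are perfect, the estimate reduces to the plain synthetic mean.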
# Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models ( http://arxiv.org/abs/2403.07304v2 ) License: see link	Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang	Large Multimodal Models (LMMs) are a hot research topic in computer vision and have demonstrated remarkable potential across multiple disciplines. A recent trend is to further extend and enhance the perception capabilities of LMMs. Current methods follow the paradigm of adapting visual task outputs to the format of the language model, which is the main component of an LMM. This adaptation makes such LMMs convenient to develop with minimal modifications; however, it overlooks the intrinsic characteristics of diverse visual tasks and hinders the learning of perception capabilities. To address this issue, we propose a novel LMM architecture named Lumen, a Large multimodal model with versatile vision-centric capability enhancement. We decouple the LMM's learning of perception capabilities into task-agnostic and task-specific stages. Lumen first promotes fine-grained vision-language concept alignment, which is the fundamental capability for various visual tasks. The output of the task-agnostic stage is thus a shared representation for all the tasks addressed in this paper. Task-specific decoding is then carried out by flexibly routing the shared representation to lightweight task decoders with negligible training effort. Comprehensive experimental results on a series of vision-centric and VQA benchmarks indicate that our Lumen model achieves or surpasses the performance of existing LMM-based approaches on a range of vision-centric tasks while maintaining general visual understanding and instruction-following capabilities. The code will be released at https://github.com/SxJyJay/Lumen.	Translated: 2024-05-30 01:09:03, Published: 2024-05-28
# SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression ( http://arxiv.org/abs/2403.07378v4 ) License: see link	Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang	The advancement of Large Language Models (LLMs) has been hindered by their substantial sizes, which necessitate LLM compression methods for practical deployment. Singular Value Decomposition (SVD) offers a promising solution for LLM compression. However, state-of-the-art SVD-based LLM compression methods have two key limitations: truncating smaller singular values may lead to higher compression loss, and the compressed weights are not updated after SVD truncation. In this work, we propose SVD-LLM, a new SVD-based LLM compression method that addresses these limitations. SVD-LLM incorporates a truncation-aware data whitening strategy to ensure a direct mapping between singular values and compression loss. Moreover, SVD-LLM adopts a layer-wise closed-form model parameter update strategy to compensate for accuracy degradation under high compression ratios. We evaluate SVD-LLM on 10 datasets and 8 models from three different LLM families at four different scales. Our results demonstrate the superiority of SVD-LLM over state-of-the-art methods, especially at high model compression ratios.	Translated: 2024-05-30 01:09:03, Published: 2024-05-28
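The "direct mapping between singular values and compression loss" in the entry above can be illustrated with one whitening choice, assumed here rather than taken from the paper: factor the calibration Gram matrix as X Xᵀ = S Sᵀ (Cholesky), take the SVD of W S, truncate it, and map back with S⁻¹. Because S⁻¹X then has orthonormal rows, the squared output error ||W X - W' X||_F^2 equals the sum of the squared discarded singular values, so dropping the smallest ones is loss-optimal. The NumPy sketch omits the paper's layer-wise parameter update, and the function name is hypothetical.

```python
import numpy as np


def whitened_truncated_svd(W, X, rank):
    """Rank-`rank` factorisation W ~= W_u @ W_v minimising ||(W - W_u @ W_v) @ X||_F.

    W: (d_out, d_in) weight matrix.
    X: (d_in, n) calibration activations with n >= d_in, so X @ X.T is invertible.
    Assumed whitening: X @ X.T = S @ S.T with S the lower-triangular Cholesky factor.
    """
    S = np.linalg.cholesky(X @ X.T)
    U, sigma, Vt = np.linalg.svd(W @ S, full_matrices=False)
    W_u = U[:, :rank] * sigma[:rank]    # (d_out, rank): columns scaled by singular values
    W_v = Vt[:rank] @ np.linalg.inv(S)  # (rank, d_in): undo the whitening
    return W_u, W_v, sigma
```

Plain truncated SVD of W alone is optimal for ||W - W'||_F, not for this activation-weighted error, so on real activations it can only do as well or worse.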
# Refractive COLMAP: Refractive Structure-from-Motion Revisited ( http://arxiv.org/abs/2403.08640v2 ) License: see link	Mengkun She, Felix Seegräber, David Nakath, Kevin Köser	In this paper, we present a complete refractive Structure-from-Motion (RSfM) framework for underwater 3D reconstruction using refractive camera setups (for both flat- and dome-port underwater housings). Despite notable achievements in refractive multi-view geometry over the past decade, no robust, complete, and publicly available solution for such tasks currently exists, and practical applications often have to resort to approximating refraction effects by the intrinsic (distortion) parameters of a pinhole camera model. To fill this gap, we have integrated refraction considerations throughout the entire SfM process within the state-of-the-art, open-source SfM framework COLMAP. Numerical simulations and reconstruction results on synthetically generated but photo-realistic images with ground truth validate that enabling refraction does not compromise accuracy or robustness compared to in-air reconstructions. Finally, we demonstrate the capability of our approach for large-scale refractive scenarios using a dataset of nearly 6000 images. The implementation is released as open source at: https://cau-git.rz.uni-kiel.de/inf-ag-koeser/colmap_underwater.	Translated: 2024-05-30 01:09:03, Published: 2024-05-28
# 3次元ガウススプラッティングにおける正確な初期化制約の緩和 Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting ( http://arxiv.org/abs/2403.09413v2 ) ライセンス: Link先を確認	Jaewoo Jung, Jisang Han, Honggyu An, Jiwon Kang, Seonghoon Park, Seungryong Kim,	(参考訳) 3次元ガウシアンスプラッティング(3DGS)は,近年,リアルタイムの新規視点合成と3次元再構成において顕著な能力を示した。しかし、3DGSはStructure-from-Motion (SfM)法に由来する正確な初期化に大きく依存している。ノイズが存在する場合やランダムに初期化した点群を使用する場合など、初期点群の品質が低下すると、3DGSはしばしば大きな性能低下を経験する。この制限に対処するため,我々は RAIN-GS (Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting) と呼ばれる新しい最適化手法を提案する。提案手法は,元の3DGS最適化方式の詳細な解析と周波数領域におけるSfM初期化の解析に基づいている。我々の分析に基づく簡単な修正により、RAIN-GSは準最適な点群(例えばランダムに初期化した点群)から3Dガウシアンを学習でき、正確な初期化の必要性を効果的に緩和する。複数のデータセットでの定量的・定性的な比較により本手法の有効性を示す。ランダムな点群で学習したRAIN-GSは、正確なSfM点群で学習した3DGSと同等か、それ以上の性能を達成する。プロジェクトページとコードは、https://ku-cvlab.github.io/RAIN-GSで参照できる。 3D Gaussian splatting (3DGS) has recently demonstrated impressive capabilities in real-time novel view synthesis and 3D reconstruction. However, 3DGS heavily depends on accurate initialization derived from Structure-from-Motion (SfM) methods. When the quality of the initial point cloud deteriorates, such as in the presence of noise or when using a randomly initialized point cloud, 3DGS often undergoes large performance drops. To address this limitation, we propose a novel optimization strategy dubbed RAIN-GS (Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting). Our approach is based on an in-depth analysis of the original 3DGS optimization scheme and the analysis of the SfM initialization in the frequency domain. Leveraging simple modifications based on our analyses, RAIN-GS successfully trains 3D Gaussians from sub-optimal point clouds (e.g., randomly initialized point clouds), effectively relaxing the need for accurate initialization. 
We demonstrate the efficacy of our strategy through quantitative and qualitative comparisons on multiple datasets, where RAIN-GS trained with a random point cloud achieves performance on par with or even better than 3DGS trained with an accurate SfM point cloud. Our project page and code can be found at https://ku-cvlab.github.io/RAIN-GS.	翻訳日:2024-05-30 00:59:19 公開日:2024-05-28
# 溶媒を意識した2次元NMR予測:マルチタスクトレーニングと反復自己学習戦略の活用 Solvent-Aware 2D NMR Prediction: Leveraging Multi-Tasking Training and Iterative Self-Training Strategies ( http://arxiv.org/abs/2403.11353v2 ) ライセンス: Link先を確認	Yunrui Li, Hao Xu, Pengyu Hong,	(参考訳) 核磁気共鳴(NMR)分光法は様々な科学分野において重要であり、詳細な構造情報、電子特性、分子動力学に関する知見を明らかにする。分子構造からスペクトル中のNMRピークを正確に予測することで、化学者は予測値とNMRスペクトルの実験シフトを比較し、候補構造を効果的に評価することができる。このプロセスはピークの帰属を容易にするため、分子構造の検証や相違点の同定に役立つ。機械学習(ML)アプローチによる1次元NMRの予測には大きな進歩があるが、注釈付き2次元NMRトレーニングデータセットがないため、2次元NMR予測は依然として課題である。このギャップに対処するため、原子の2次元NMRクロスピークを予測し、実験2次元NMRスペクトルのピークにアノテーションを付与する機械学習モデルを訓練するための反復的教師なし学習(IUL)手法を提案する。当初、このモデルは注釈付き1D 1Hおよび13C NMRスペクトルを用いてマルチタスク事前訓練(MTT)フェーズを行う。次に、IULを用いた微調整プロセスによりモデルを反復的に改善し、モデルを用いて未ラベルの2D NMRデータにアノテーションを付与することと、新たに生成されたアノテーションを用いてモデルを改良することとを交互に行う。提案手法を用いて、19,000個のヘテロ核単一量子コヒーレンス(HSQC)スペクトルでモデルを訓練し、専門家のアノテーション付きの500個のHSQCスペクトルでテストし、さらに別の専門家アノテーション付きHSQCデータセット上で2つの従来手法(ChemDrawとMestrenova)と比較した。HSQCクロスピーク予測では,テストデータセット上で13Cシフトと1Hシフトに対してそれぞれ2.035 ppmと0.163 ppmのMAEを達成し,従来のツールよりも優れていた。この性能は、化学シフトを正確に予測するモデルの能力だけでなく、実験HSQCスペクトルのピーク帰属における有効性も示している。 Nuclear magnetic resonance (NMR) spectroscopy is crucial across diverse scientific fields, revealing detailed structural information, electronic properties, and molecular dynamic insights. Accurate prediction of NMR peaks in a spectrum from molecular structures allows chemists to effectively evaluate candidate structures by comparing predictions with experimental shifts in an NMR spectrum. This process facilitates peak assignments, thereby aiding in verifying molecular structures or identifying discrepancies. Although significant progress has been made in predicting 1D NMR with Machine Learning (ML) approaches, 2D NMR prediction remains a challenge due to the lack of an annotated 2D NMR training dataset. To address this gap, we propose an Iterative Unsupervised Learning (IUL) approach to train a machine learning model for predicting atomic 2D NMR cross peaks and annotating peaks in experimental 2D NMR spectra. 
Initially, the model undergoes a Multi-Task pre-Training (MTT) phase using a set of annotated 1D 1H and 13C NMR spectra. Then, the model is iteratively improved through a fine-tuning process with IUL, alternating between using the model to annotate the unlabeled 2D NMR data and refining the model using the newly generated annotations. Using the proposed approach, we trained our model on 19,000 Heteronuclear Single Quantum Coherence (HSQC) spectra, tested it on 500 HSQC spectra with expert annotations, and further compared it with two traditional methods (ChemDraw and Mestrenova) on another expert-annotated HSQC dataset. For HSQC cross peak prediction, our model achieves MAEs of 2.035 ppm and 0.163 ppm for 13C and 1H shifts, respectively, on the test dataset, and outperforms the conventional tools. This performance demonstrates not only the model's capability to accurately predict chemical shifts, but also its effectiveness in peak assignment for experimental HSQC spectra.	翻訳日:2024-05-30 00:59:19 公開日:2024-05-28
# 干渉計において回折限界を超える強度積に基づく光センシング Intensity product-based optical sensing to beat the diffraction limit in an interferometer ( http://arxiv.org/abs/2403.13029v2 ) ライセンス: Link先を確認	Byoung S. Ham,	(参考訳) 古典的に定義された光学位相の最小不確かさは、量子力学の不確定性原理に由来する標準量子限界またはショットノイズ限界(SNL)として知られている。SNLに基づくと、位相感度はKの平方根に反比例する。ここでKは干渉光子数または統計的に測定された事象数である。したがって、信号対雑音比における平方根Kの利得により、高出力レーザーを用いることは感度の向上に有利である。しかし、典型的な干渉計では、量子センシングのように干渉光子が分解されない限り、分解能はK=1の場合の回折限界にとどまる。ここでは、量子センシングにおける射影測定法を干渉計に適用し、分解能においてさらに平方根Kの利得を達成する。射影測定のために、干渉計の干渉縞をK乗することで、K次の強度積を置き換えることができる。多波干渉による分解能向上を理解するため、いくつかの種類の干渉計を数値的に比較し、対応する分解能パラメータを導出する。その結果、N-スリット干渉計にK乗を適用して得られる分解能は、回折限界および量子センシングにおけるハイゼンベルク限界を超える。 The classically defined minimum uncertainty of the optical phase is known as the standard quantum limit or shot-noise limit (SNL), originating in the uncertainty principle of quantum mechanics. Based on the SNL, the phase sensitivity is inversely proportional to the square root of K, where K is the number of interfering photons or statistically measured events. Thus, using a high-power laser is advantageous for enhancing sensitivity due to the square-root-K gain in the signal-to-noise ratio. In a typical interferometer, however, the resolution remains at the diffraction limit of the K=1 case unless the interfering photons are resolved, as in quantum sensing. Here, a projection-measurement method from quantum sensing is adapted to an interferometer to achieve an additional square-root-K gain in resolution. For the projection measurement, the interference fringe of an interferometer can be raised to the Kth power to replace the Kth-order intensity product. To understand the enhanced resolution caused by many-wave interference, several types of interferometers are numerically compared to derive the corresponding resolution parameters. As a result, the resolution achieved by applying the Kth power to an N-slit interferometer exceeds both the diffraction limit and the Heisenberg limit in quantum sensing.	翻訳日:2024-05-30 00:59:19 公開日:2024-05-28
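The "square-root-K gain in resolution" from raising a fringe to the Kth power can be illustrated with a classical toy. This is only a sketch for a two-beam fringe I(φ) = (1 + cos φ)/2. It is not the paper's N-slit analysis, and the helper names are hypothetical. Setting I^K = cos^{2K}(φ/2) = 1/2 gives a closed-form full width at half maximum (FWHM). For large K, cos^{2K}(φ/2) ≈ exp(-Kφ²/4), so the width shrinks roughly as 1/√K.

```python
import numpy as np

def fringe(phi):
    """Normalised two-beam interference fringe I(phi) = (1 + cos phi) / 2."""
    return (1.0 + np.cos(phi)) / 2.0

def fwhm_of_kth_power(K, samples=200_001):
    """Numerically estimate the full width at half maximum of I(phi)**K around phi = 0."""
    phi = np.linspace(-np.pi, np.pi, samples)
    above = phi[fringe(phi) ** K >= 0.5]
    return above.max() - above.min()

for K in (1, 4, 16):
    # Closed form: I**K = cos(phi/2)**(2K) = 1/2  =>  FWHM = 4 * arccos(2**(-1/(2K)))
    analytic = 4 * np.arccos(2 ** (-1 / (2 * K)))
    print(K, fwhm_of_kth_power(K), analytic, np.pi / analytic)
```

The last column is the narrowing relative to K = 1. It comes out close to, but a little below, √K for these values of K.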
# 知識編集による大規模言語モデルのデトックス化 Detoxifying Large Language Models via Knowledge Editing ( http://arxiv.org/abs/2403.14472v5 ) ライセンス: Link先を確認	Mengru Wang, Ningyu Zhang, Ziwen Xu, Zekun Xi, Shumin Deng, Yunzhi Yao, Qishen Zhang, Linyi Yang, Jindong Wang, Huajun Chen,	(参考訳) 本稿では,大規模言語モデル (LLM) の解毒(デトックス化)のための知識編集手法の利用について検討する。我々は、様々な強力な攻撃プロンプトで9つの安全でないカテゴリをカバーし、体系的な評価のための包括的な指標を備えたベンチマークSafeEditを構築した。いくつかの知識編集手法を用いて実験を行い、知識編集が汎用性能への影響を限定しつつLLMを効率的に解毒できる可能性を示す。そこで我々は,1つのインスタンスのみを用いて数ステップのチューニングでLLMの毒性を低減する,DINM(Detoxifying with Intraoperative Neural Monitoring)と呼ばれるシンプルかつ効果的なベースラインを提案する。さらに, 様々な解毒手法の内部メカニズムを詳細に分析し, SFTやDPOといった従来の手法は毒性パラメータの活性化を抑制するだけである可能性がある一方, DINMは毒性パラメータの毒性をある程度軽減し, 恒久的な調整を行うことを示す。これらの知見が,LLMの解毒手法の開発と基盤となる知識メカニズムに関する今後の研究に光を当てることを期待する。コードとベンチマークはhttps://github.com/zjunlp/EasyEditで公開されている。 This paper investigates using knowledge editing techniques to detoxify Large Language Models (LLMs). We construct a benchmark, SafeEdit, which covers nine unsafe categories with various powerful attack prompts and is equipped with comprehensive metrics for systematic evaluation. We conduct experiments with several knowledge editing approaches, indicating that knowledge editing has the potential to efficiently detoxify LLMs with limited impact on general performance. Then, we propose a simple yet effective baseline, dubbed Detoxifying with Intraoperative Neural Monitoring (DINM), to diminish the toxicity of LLMs within a few tuning steps using only one instance. We further provide an in-depth analysis of the internal mechanism of various detoxifying approaches, demonstrating that previous methods like SFT and DPO may merely suppress the activations of toxic parameters, while DINM mitigates the toxicity of the toxic parameters to a certain extent, making permanent adjustments. We hope that these insights could shed light on future work on developing detoxifying approaches and on the underlying knowledge mechanisms of LLMs. Code and benchmark are available at https://github.com/zjunlp/EasyEdit.	翻訳日:2024-05-30 00:59:19 公開日:2024-05-28
# 大規模言語モデルエージェントを用いたアセット・アドミニストレーション・シェルの生成:インダストリー4.0の文脈におけるデジタルツインのセマンティック相互運用性に向けて Generation of Asset Administration Shell with Large Language Model Agents: Towards Semantic Interoperability in Digital Twins in the Context of Industry 4.0 ( http://arxiv.org/abs/2403.17209v2 ) ライセンス: Link先を確認	Yuchen Xia, Zhewen Xiao, Nasser Jazdi, Michael Weyrich,	(参考訳) 本研究では,デジタルツインにおけるセマンティック相互運用性を実現し,インダストリー4.0の文脈においてデジタルツインモデルとしてのアセット・アドミニストレーション・シェル(AAS)の作成を支援する新しいアプローチを提案する。本研究の基本的な考え方は,意味論に基づくコミュニケーションと有意義なテキストデータの生成が直接結びついているということであり,交換される情報がテキスト形式でシリアライズできるならば,これらのプロセスは等価であると我々は考える。そこで本研究では,テキストデータの意味的本質を捉える「意味ノード」データ構造を構築した。次に,大規模言語モデルを用いたシステムを設計・実装し,技術資産を記述したデータシートから収集した生のテキストデータから「意味ノード」を処理して,標準化されたデジタルツインモデルを生成する。評価の結果,62～79%の有効生成率が示され,大規模言語モデルの生成能力により,ソーステキストの情報のかなりの割合を誤りなくターゲットのデジタルツインインスタンスモデルへ変換できることが示唆された。この結果はインダストリー4.0の文脈で直接応用でき,設計したシステムはAASモデル作成の手作業を削減するデータモデル生成ツールとして実装されている。本評価では、異なるLLMの比較分析と、検索拡張生成(RAG)機構の詳細なアブレーション研究により、技術的概念の解釈とデータ変換におけるLLMシステムの有効性について知見を得た。本研究は,AASインスタンス作成を自動化するLLMの能力を強調し,産業アプリケーションにおけるデジタルツインのセマンティック相互運用性という広い分野に貢献する。プロトタイプの実装と評価結果はGitHubリポジトリ(https://github.com/YuchenXia/AASbyLLM)で公開されている。 This research introduces a novel approach for achieving semantic interoperability in digital twins and assisting the creation of the Asset Administration Shell (AAS) as the digital twin model within the context of Industry 4.0. The foundational idea of our research is that semantics-based communication and the generation of meaningful textual data are directly linked, and we posit that these processes are equivalent if the exchanged information can be serialized in text form. Based on this, we construct a "semantic node" data structure in our research to capture the semantic essence of textual data. Then, a system powered by large language models is designed and implemented to process the "semantic node" and generate standardized digital twin models from raw textual data collected from datasheets describing technical assets. 
Our evaluation demonstrates an effective generation rate of 62-79%, indicating that a substantial proportion of the information from the source text can be translated error-free to the target digital twin instance model with the generative capability of large language models. This result has a direct application in the context of Industry 4.0, and the designed system is implemented as a data model generation tool for reducing the manual effort in creating AAS models. In our evaluation, a comparative analysis of different LLMs and an in-depth ablation study of Retrieval-Augmented Generation (RAG) mechanisms provide insights into the effectiveness of LLM systems for interpreting technical concepts and translating data. Our findings emphasize LLMs' capability to automate AAS instance creation and contribute to the broader field of semantic interoperability for digital twins in industrial applications. The prototype implementation and evaluation results are presented on our GitHub Repository: https://github.com/YuchenXia/AASbyLLM.	翻訳日:2024-05-30 00:59:19 公開日:2024-05-28
# R2D2を用いたスケーラブル非カルテシアン磁気共鳴イメージング Scalable Non-Cartesian Magnetic Resonance Imaging with R2D2 ( http://arxiv.org/abs/2403.17905v3 ) ライセンス: Link先を確認	Yiwei Chen, Chao Tang, Amir Aghabiglou, Chung San Chu, Yves Wiaux,	(参考訳) 非カルテシアン磁気共鳴画像再構成のための新しい手法を提案する。アンロールアーキテクチャはデータ一貫性レイヤを介して堅牢性を提供するが、ディープニューラルネットワーク(DNN)に計測演算子を埋め込むことは、大規模では非現実的になり得る。ノイズ除去DNNが計測設定を知らない代替のPlug-and-Play(PnP)アプローチは、この制限の影響を受けず、有効性も実証されているが、その高い反復性はやはりスケーラビリティに影響する。このスケーラビリティの課題に対処するため、最近天文イメージングで導入された「高ダイナミックレンジイメージングのためのResidual-to-Residual DNNシリーズ(R2D2)」アプローチを活用する。R2D2の再構成は一連の残差画像として形成され、前回の反復の画像推定と関連するデータ残差を入力とするDNNの出力として反復的に推定される。この方法はMatching Pursuitアルゴリズムの学習版と解釈できる。ラジアルk空間サンプリング取得シーケンスを考慮したシミュレーションでR2D2を実証する。予備的な結果は、R2D2が以下を達成することを示唆している。(i) 展開版であるR2D2-Netと比べて準最適な性能(ただしR2D2-NetはNUFFTベースのデータ一貫性層を組み込む必要があるため拡張性がない)。(ii) データ一貫性のためにFFTベースの近似を組み込んだ拡張可能なR2D2-Netよりも優れた再構成品質。(iii) PnPよりも優れた再構成品質を、わずかな反復回数で実現。 We propose a new approach for non-Cartesian magnetic resonance image reconstruction. While unrolled architectures provide robustness via data-consistency layers, embedding measurement operators in Deep Neural Networks (DNNs) can become impractical at large scale. Alternative Plug-and-Play (PnP) approaches, where the denoising DNNs are blind to the measurement setting, are not affected by this limitation and have also proven effective, but their highly iterative nature also affects scalability. To address this scalability challenge, we leverage the "Residual-to-Residual DNN series for high-Dynamic range imaging (R2D2)" approach recently introduced in astronomical imaging. R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of DNNs taking the previous iteration's image estimate and associated data residual as inputs. The method can be interpreted as a learned version of the Matching Pursuit algorithm. We demonstrate R2D2 in simulation, considering radial k-space sampling acquisition sequences. 
Our preliminary results suggest that R2D2 achieves: (i) suboptimal performance compared to its unrolled incarnation R2D2-Net, which is however non-scalable due to the necessary embedding of NUFFT-based data-consistency layers; (ii) superior reconstruction quality to a scalable version of R2D2-Net embedding an FFT-based approximation for data consistency; (iii) superior reconstruction quality to PnP, while requiring only a few iterations.	翻訳日:2024-05-30 00:59:19 公開日:2024-05-28
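The R2D2 structure ("a series of residual images, iteratively estimated ... taking the previous iteration's image estimate and associated data residual as inputs") can be written as a short loop. This is a structural sketch only. The trained DNNs are replaced by a hypothetical placeholder, a fixed gradient step, which turns the series into plain Landweber iteration rather than the learned Matching Pursuit interpretation in the abstract. The operator is a random real-valued matrix, not a non-Cartesian MRI operator. A complex operator would need its adjoint A^H in place of A.T.

```python
import numpy as np

def residual_series(A, y, stages):
    """Series x_i = x_{i-1} + stage_i(x_{i-1}, r_{i-1}), with r = A^T (y - A x).

    In R2D2 each stage is a trained DNN; here `stages` is any list of callables
    (hypothetical stand-ins). A complex MRI operator would use A^H instead of A.T.
    """
    x = np.zeros(A.shape[1])
    for stage in stages:
        r = A.T @ (y - A @ x)     # back-projected data residual (the "residual" input)
        x = x + stage(x, r)       # each stage outputs a residual image added to the estimate
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 20))  # toy real-valued measurement operator
x_true = rng.standard_normal(20)
y = A @ x_true

step = 1.0 / np.linalg.norm(A, 2) ** 2
# Placeholder stages: a fixed gradient step, which reduces the series to Landweber iteration.
stages = [lambda x, r: step * r] * 50
x_hat = residual_series(A, y, stages)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

In R2D2 the number of stages is small, and each stage is a separately trained network that sees both inputs. The placeholder here ignores the current image estimate and needs many stages.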
# 概念から実装までの大規模言語モデルに関する調査 A Survey on Large Language Models from Concept to Implementation ( http://arxiv.org/abs/2403.18969v2 ) ライセンス: Link先を確認	Chen Wang, Jin Zhao, Jiaqi Gong,	(参考訳) 近年のLarge Language Models(LLM)の進歩、特にTransformerアーキテクチャ上に構築されているものは、自然言語処理(NLP)アプリケーションの範囲を大きく拡大し、チャットボット技術での最初の使用を超越している。本稿では,これらのモデルの多面的応用について検討し,GPTシリーズに着目した。この調査は、コーディングや問題解決といった従来のタスクに革命をもたらす人工知能(AI)駆動ツールの変革的な影響に焦点を当てるとともに、それらのツールがさまざまな産業にまたがる研究開発の新たな道を切り開いていることにも着目する。コード解釈や画像キャプションからインタラクティブなシステムの構築や計算領域の進化まで、Transformerモデルはディープラーニング、データ分析、ニューラルネットワーク設計のシナジーを体現している。この調査では、Transformerモデルの最新の研究を詳細に分析し、その汎用性と多様な応用分野を変革する可能性を強調することで、実践的な応用におけるTransformerベースのLLMの現状と今後の展望を読者が包括的に理解できるようにする。 Recent advancements in Large Language Models (LLMs), particularly those built on Transformer architectures, have significantly broadened the scope of natural language processing (NLP) applications, transcending their initial use in chatbot technology. This paper investigates the multifaceted applications of these models, with an emphasis on the GPT series. This exploration focuses on the transformative impact of artificial intelligence (AI)-driven tools in revolutionizing traditional tasks like coding and problem-solving, while also paving new paths in research and development across diverse industries. From code interpretation and image captioning to facilitating the construction of interactive systems and advancing computational domains, Transformer models exemplify a synergy of deep learning, data analysis, and neural network design. This survey provides an in-depth look at the latest research in Transformer models, highlighting their versatility and the potential they hold for transforming diverse application sectors, thereby offering readers a comprehensive understanding of the current and future landscape of Transformer-based LLMs in practical applications.	翻訳日:2024-05-30 00:59:19 公開日:2024-05-28
# PiSSA:大規模言語モデルの主特異値・特異ベクトル適応 PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models ( http://arxiv.org/abs/2404.02948v3 ) ライセンス: Link先を確認	Fanxu Meng, Zhaohui Wang, Muhan Zhang,	(参考訳) 大規模言語モデル(LLM)をパラメータ効率よく微調整(PEFT)するために、低ランク適応(LoRA)法はモデルの変化 $\Delta W \in \mathbb{R}^{m \times n}$ を2つの行列 $A \in \mathbb{R}^{m \times r}$ と $B \in \mathbb{R}^{r \times n}$ の積で近似する。ここで $r \ll \min(m, n)$ であり、$A$ はガウス雑音で、$B$ はゼロで初期化される。LoRAは元のモデル $W$ を凍結し、"Noise & Zero"アダプタを更新するが、これは収束の遅さにつながる可能性がある。この制限を克服するために、主特異値と特異ベクトル適応(PiSSA)を導入する。PiSSAはLoRAと同じアーキテクチャを共有しているが、適応行列 $A$ と $B$ を元の行列 $W$ の主成分で初期化し、残りの成分を微調整中に凍結される残差行列 $W^{res} \in \mathbb{R}^{m \times n}$ に置く。LoRAと比較すると、PiSSAは主成分を更新し、"残差"部分を凍結することで、より高速な収束と性能の向上を実現している。184Mから70Bまでの12種類のモデルで、5つのNLGタスクと8つのNLUタスクを含むPiSSAとLoRAの比較実験により、同じ実験設定においてPiSSAは一貫してLoRAを上回ることが明らかになった。GSM8Kベンチマークでは、PiSSAで微調整されたMistral-7Bの精度は72.86%に達し、LoRAの67.7%を5.16ポイント上回った。同じアーキテクチャであるため、PiSSAは量子化とも互換性があり、微調整のメモリ要求をさらに削減できる。QLoRAと比較すると、QPiSSA(4ビット量子化を適用したPiSSA)は初期段階でより小さい量子化誤差を示す。GSM8K上でLLaMA-3-70Bを微調整すると、QPiSSAの精度は86.05%に達し、QLoRAの81.73%を上回った。高速なSVD技術を利用することで、PiSSAはわずか数秒で初期化でき、LoRAからPiSSAへの移行コストは無視できる。 To parameter-efficiently fine-tune (PEFT) large language models (LLMs), the low-rank adaptation (LoRA) method approximates the model changes $\Delta W \in \mathbb{R}^{m \times n}$ through the product of two matrices $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{r \times n}$, where $r \ll \min(m, n)$, $A$ is initialized with Gaussian noise, and $B$ with zeros. LoRA freezes the original model $W$ and updates the "Noise & Zero" adapter, which may lead to slow convergence. To overcome this limitation, we introduce Principal Singular values and Singular vectors Adaptation (PiSSA). 
PiSSA shares the same architecture as LoRA, but initializes the adapter matrices $A$ and $B$ with the principal components of the original matrix $W$, and puts the remaining components into a residual matrix $W^{res} \in \mathbb{R}^{m \times n}$, which is frozen during fine-tuning. Compared to LoRA, PiSSA updates the principal components while freezing the "residual" parts, allowing faster convergence and enhanced performance. Comparative experiments of PiSSA and LoRA across 12 different models, ranging from 184M to 70B parameters and encompassing 5 NLG and 8 NLU tasks, reveal that PiSSA consistently outperforms LoRA under identical experimental setups. On the GSM8K benchmark, Mistral-7B fine-tuned with PiSSA achieves an accuracy of 72.86%, surpassing LoRA's 67.7% by 5.16 percentage points. Due to the same architecture, PiSSA is also compatible with quantization to further reduce the memory requirement of fine-tuning. Compared to QLoRA, QPiSSA (PiSSA with 4-bit quantization) exhibits smaller quantization errors in the initial stages. Fine-tuning LLaMA-3-70B on GSM8K, QPiSSA attains an accuracy of 86.05%, exceeding QLoRA's 81.73%. Leveraging a fast SVD technique, PiSSA can be initialized in only a few seconds, presenting a negligible cost for transitioning from LoRA to PiSSA.	翻訳日:2024-05-30 00:59:19 公開日:2024-05-28
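The difference between the two initialisations described above is easy to see in a few lines. This sketch follows the abstract's notation, with A ∈ R^{m×r} and B ∈ R^{r×n}. It assumes the principal singular values are split evenly between A and B as square roots, a common choice rather than something the abstract states. It uses a full SVD instead of the fast SVD the abstract mentions. The LoRA noise scale `std=0.02` is likewise an assumption. The function names are hypothetical.

```python
import numpy as np

def pissa_init(W, r):
    """Split W into a rank-r trainable part A @ B and a frozen residual W_res (PiSSA-style sketch)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)   # singular values in descending order
    sqrt_s = np.sqrt(S[:r])
    A = U[:, :r] * sqrt_s            # m x r, principal left singular vectors scaled
    B = sqrt_s[:, None] * Vt[:r]     # r x n, principal right singular vectors scaled
    W_res = W - A @ B                # remaining components, frozen during fine-tuning
    return A, B, W_res

def lora_init(m, n, r, rng, std=0.02):
    """LoRA-style "Noise & Zero" init as described above: A ~ Gaussian, B = 0, so A @ B = 0."""
    return rng.normal(0.0, std, (m, r)), np.zeros((r, n))

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
A, B, W_res = pissa_init(W, r=4)
print(np.allclose(A @ B + W_res, W))   # True: the forward pass is unchanged at init
A0, B0 = lora_init(64, 32, 4, rng)
print(np.abs(A0 @ B0).max())           # 0.0: LoRA's adapter starts as a no-op
```

Both start from the same forward pass. PiSSA's trainable factors carry the largest singular directions of W, while LoRA's start at zero. That is the intuition the abstract gives for faster convergence.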
# ROPO:大規模言語モデルに対するロバストな選好最適化 ROPO: Robust Preference Optimization for Large Language Models ( http://arxiv.org/abs/2404.04102v2 ) ライセンス: Link先を確認	Xize Liang, Chao Chen, Shuang Qiu, Jie Wang, Yue Wu, Zhihang Fu, Zhihao Shi, Feng Wu, Jieping Ye,	(参考訳) 大規模言語モデル(LLM)が有用かつ無害な応答を生成できるようにするためには、選好アライメントが重要である。しかし、選好アライメントの性能は、選好データに広く存在する雑音に非常に敏感である。この問題に対する近年の取り組みは、ノイズの存在そのものを減らすことなくその影響をわずかに緩和するか、報酬の誤一般化を起こしやすいコストの高い教師LLMに依存するかのどちらかである。これらの課題に対処するため, 外部モデルの助けを借りずにノイズ耐性とノイズを含むサンプルのフィルタリングを統合した反復的アライメント手法であるRObust Preference Optimization (ROPO) フレームワークを提案する。具体的には、ROPOは制約付き最適化問題を反復的に解き、各サンプルに品質を考慮した重みを動的に割り当て、重みの和を保持したいサンプル数に制約する。ノイズ耐性のある学習と効果的なノイズ識別のために, 不確実性の高いサンプルの勾配を抑えることで頑健な損失を導出する。導出した損失がノイズを含むサンプルとクリーンなサンプルを区別するために重要であることを、実験的および理論的に示す。さらに, 導出した損失に着想を得て, 破棄されたクエリに含まれる潜在的に重要な情報を補うための、頑健性に導かれたリジェクションサンプリング手法を提案する。Mistral-7BとLlama-2-7Bを用いた広く使われている3つのデータセットでの実験により、ROPOは既存の選好アライメント手法を大幅に上回り、ノイズ率が増加するにつれてその優位性が大きくなることが示された。 Preference alignment is pivotal for empowering large language models (LLMs) to generate helpful and harmless responses. However, the performance of preference alignment is highly sensitive to the prevalent noise in the preference data. Recent efforts for this problem either marginally alleviate the impact of noise without the ability to actually reduce its presence, or rely on costly teacher LLMs prone to reward misgeneralization. To address these challenges, we propose the RObust Preference Optimization (ROPO) framework, an iterative alignment approach that integrates noise tolerance and filtering of noisy samples without the aid of external models. Specifically, ROPO iteratively solves a constrained optimization problem, where we dynamically assign a quality-aware weight to each sample and constrain the sum of the weights to the number of samples we intend to retain. For noise-tolerant training and effective noise identification, we derive a robust loss by suppressing the gradients of samples with high uncertainty. 
We demonstrate both empirically and theoretically that the derived loss is critical for distinguishing noisy samples from clean ones. Furthermore, inspired by our derived loss, we propose a robustness-guided rejection sampling technique to compensate for potentially important information in discarded queries. Experiments on three widely used datasets with Mistral-7B and Llama-2-7B demonstrate that ROPO significantly outperforms existing preference alignment methods, with its advantage growing as the noise rate increases.	翻訳日:2024-05-30 00:59:19 公開日:2024-05-28
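The constraint "the sum of the weights equals the number of samples we intend to retain" has a simple solution in the most basic case. This sketch illustrates that constraint only and is not ROPO's objective. It assumes a linear objective Σ w_i ℓ_i with box constraints 0 ≤ w_i ≤ 1 and Σ w_i = k. The optimum then puts weight 1 on the k lowest-loss samples and 0 on the rest. ROPO's actual weights are quality-aware and are combined with its derived robust loss. The per-sample loss values below are made up.

```python
import numpy as np

def keep_lowest_loss(losses, k):
    """Weights w in [0, 1] with sum(w) == k that minimise sum(w * losses).

    For this linear objective, the optimum puts weight 1 on the k lowest-loss samples.
    """
    losses = np.asarray(losses, dtype=float)
    w = np.zeros_like(losses)
    w[np.argsort(losses, kind="stable")[:k]] = 1.0
    return w

losses = [0.3, 2.5, 0.1, 0.7, 4.0]     # hypothetical per-sample losses; 2.5 and 4.0 look noisy
w = keep_lowest_loss(losses, k=3)
print(w)                               # [1. 0. 1. 1. 0.]
```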
# テキスト・画像モデルの複数被写体パーソナライズのためのアイデンティティ分離 Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models ( http://arxiv.org/abs/2404.04243v2 ) ライセンス: Link先を確認	Sangwon Jang, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang,	(参考訳) テキスト・ツー・イメージ拡散モデルは、少数の参照画像に基づいてパーソナライズされた被写体を生成することに顕著な成功を収めている。しかし、複数の被写体を同時に生成する際には、現在の手法は失敗することが多く、異なる被写体の属性が組み合わさったアイデンティティの混合が生じる。本研究では,複数の被写体のアイデンティティを効果的に分離することで,複数被写体のパーソナライズを可能にする新しいフレームワークであるMuDIを提案する。本研究の主なアイデアは,セグメンテーションの基盤モデル(Segment Anything)によって生成されたセグメント化された被写体を学習と推論の両方に利用することであり,学習時にはデータ拡張の一形態として,推論時には生成プロセスの初期化として用いる。さらに,複数被写体のパーソナライズにおける本手法の性能をより適切に評価するための新しい指標を導入する。実験結果から,図1に示すような非常に類似した被写体であっても,MuDIはアイデンティティの混合なしに高品質なパーソナライズ画像を生成できることが示された。特に人手評価において、MuDIはアイデンティティを混合せずに複数の被写体をパーソナライズする成功率で既存のベースラインの2倍を達成し、最も強力なベースラインに対して70%以上の割合で好まれた。 Text-to-image diffusion models have shown remarkable success in generating personalized subjects based on a few reference images. However, current methods often fail when generating multiple subjects simultaneously, resulting in mixed identities with combined attributes from different subjects. In this work, we present MuDI, a novel framework that enables multi-subject personalization by effectively decoupling identities from multiple subjects. Our main idea is to utilize segmented subjects generated by a foundation model for segmentation (Segment Anything) for both training and inference, as a form of data augmentation during training and as initialization for the generation process. Moreover, we introduce a new metric to better evaluate the performance of our method on multi-subject personalization. Experimental results show that MuDI can produce high-quality personalized images without identity mixing, even for highly similar subjects as shown in Figure 1. Specifically, in human evaluation, MuDI achieves twice the success rate of existing baselines for personalizing multiple subjects without identity mixing, and is preferred over the strongest baseline in more than 70% of cases.	翻訳日:2024-05-30 00:59:19 公開日:2024-05-28
# CodecNeRF: 高速エンコーディング・デコード・コンパクト・高品質ノベルビュー合成を目指して CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis ( http://arxiv.org/abs/2404.04913v2 ) ライセンス: Link先を確認	Gyeongjin Kang, Younggeun Lee, Seungjun Oh, Eunbyung Park,	(参考訳) ニューラル・ラジアンス・フィールド(NeRF)は、3Dオブジェクトやシーンを効果的に捉え、表現することで大きな成功を収めた。しかし、いくつかの要因が次世代3Dメディアとしてのさらなる普及を阻んでいる。画像や動画のような日常的なメディア形式と同様にユビキタスな存在となるためには、高速なエンコード・デコード時間、コンパクトなモデルサイズ、高品質なレンダリングという3つの主要な目的を効果的に満たす解決策を考案することが不可欠である。大幅な進歩にもかかわらず、すべての目的に適切に対処する包括的なアルゴリズムはまだ完全には実現されていない。本研究では,単一のフォワードパスでNeRF表現を生成できる新しいエンコーダ・デコーダアーキテクチャからなる、NeRF表現のためのニューラルコーデックCodecNeRFを提案する。さらに, 近年のパラメータ効率のよいファインチューニング手法に着想を得て, 生成したNeRF表現を新しいテストインスタンスに効率よく適応させる新しいファインチューニング手法を開発し, 高品質な画像レンダリングとコンパクトなコードサイズを実現した。NeRFのための新しいエンコード・デコード・ファインチューニングパイプラインとして提案するCodecNeRFは、ShapeNetやObjaverseといった広く使われている3Dオブジェクトデータセット上で画像品質を維持(または改善)しながら、150倍以上の圧縮とエンコード時間の20分の1への短縮という前例のない性能を達成した。 Neural Radiance Fields (NeRF) have achieved huge success in effectively capturing and representing 3D objects and scenes. However, several factors have impeded its further proliferation as next-generation 3D media. To establish a ubiquitous presence in everyday media formats, such as images and videos, it is imperative to devise a solution that effectively fulfills three key objectives: fast encoding and decoding time, compact model sizes, and high-quality renderings. Despite significant advancements, a comprehensive algorithm that adequately addresses all objectives has yet to be fully realized. In this work, we present CodecNeRF, a neural codec for NeRF representations, consisting of a novel encoder and decoder architecture that can generate a NeRF representation in a single forward pass. Furthermore, inspired by recent parameter-efficient finetuning approaches, we develop a novel finetuning method to efficiently adapt the generated NeRF representations to a new test instance, leading to high-quality image renderings and compact code sizes. 
The proposed CodecNeRF, a newly suggested encoding-decoding-finetuning pipeline for NeRF, achieved unprecedented performance: more than 150x compression and a 20x reduction in encoding time, while maintaining (or improving) image quality on widely used 3D object datasets such as ShapeNet and Objaverse.	翻訳日:2024-05-30 00:59:19 公開日:2024-05-28
# Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI ( http://arxiv.org/abs/2404.05468v5 ) ライセンス: Link先を確認	Hugo Caselles-Dupré, Charles Mellerio, Paul Hérent, Alizée Lopez-Persem, Benoit Béranger, Mathieu Soularue, Pierre Fautrel, Gauthier Vernier, Matthieu Cord,	(参考訳) 視覚刺激中に収集されたfMRIデータから被験者が観察した画像を再構成する研究は、大規模なfMRIデータセットが利用可能になったことと、画像生成のための生成モデルの進歩により、過去10年間で大きく進歩してきた。しかし、視覚再構成の応用はいまだに限られている。視覚的想像の再構成はより大きな課題であり、障害を持つ人の支援から法廷での証言の検証まで、革命的となり得る応用がある。この分野での主な障壁は、視覚的イメージのためのデータ収集プロトコルがないことと、この主題に関するデータセットがないことである。従来、fMRI-to-imageは視覚刺激にさらされた被験者から収集されたデータに依存しており、視覚刺激時と視覚的イメージ時とで脳活動が異なるため、視覚的イメージの生成には問題がある。我々は初めて、データ収集プロトコルの提案とともに、視覚的イメージに関するかなりの規模のデータセット(約6時間のスキャン)を構築した。次に、fMRI-to-imageモデルの修正版を訓練し、記憶からの想像と純粋な想像という2つのモードの想像から画像を再構成できる可能性を示す。我々がMind-to-Imageと呼ぶこのパイプラインは、視覚的イメージを直接再構成できる技術の実現に向けた一歩である。 The reconstruction of images observed by subjects from fMRI data collected during visual stimuli has made strong progress in the past decade, thanks to the availability of extensive fMRI datasets and advancements in generative models for image generation. However, the application of visual reconstruction has remained limited. Reconstructing visual imagination presents a greater challenge, with potentially revolutionary applications ranging from aiding individuals with disabilities to verifying witness accounts in court. The primary hurdles in this field are the absence of data collection protocols for visual imagery and the lack of datasets on the subject. Traditionally, fMRI-to-image relies on data collected from subjects exposed to visual stimuli, which poses issues for generating visual imagery because brain activity differs between visual stimulation and visual imagery. For the first time, we have compiled a substantial dataset (around 6h of scans) on visual imagery along with a proposed data collection protocol. 
We then train a modified version of an fMRI-to-image model and demonstrate the feasibility of reconstructing images from two modes of imagination: from memory and from pure imagination. The resulting pipeline, which we call Mind-to-Image, marks a step towards creating a technology that allows direct reconstruction of visual imagery.	翻訳日:2024-05-30 00:49:33 公開日:2024-05-28
# VietMed:医療領域におけるベトナム語の自動音声認識のためのデータセットとベンチマーク VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain ( http://arxiv.org/abs/2404.05659v2 ) ライセンス: Link先を確認	Khai Le-Duc,	(参考訳) プライバシーの制限により、医療領域で公開されている音声認識データセットは不足している。本研究では,16時間のラベル付き医療音声、1000時間のラベルなし医療音声、1200時間のラベルなし一般ドメイン音声からなる、医療領域のベトナム語音声認識データセットVietMedを紹介する。我々の知る限り、VietMedは総時間、話者数、疾患、録音条件、話者の役割、固有の医療用語、アクセントの7つの観点で、世界最大の公開医療音声認識データセットである。VietMedは総時間の点でも、ベトナム語の公開音声データセットとして最大規模である。さらに,ICD-10のすべての疾患群と国内のすべてのアクセントを対象とする医療ASRデータセットを初めて提示する。加えて、ベトナム語ASRのための初の公開大規模事前学習モデルであるw2v2-VietとXLSR-53-Vietを、医療ASRのための初の公開大規模微調整モデルとともにリリースする。教師なし事前学習に医療データを一切使用していないにもかかわらず、最良の事前学習モデルであるXLSR-53-Vietは医療領域に非常によく汎化し、テストセットのWERを51.8%から29.6%へと(相対的に40%以上)低減して、最先端のXLSR-53を上回る。すべてのコード、データ、モデルは、https://github.com/leduckhai/MultiMedで公開されている。 Due to privacy restrictions, there is a shortage of publicly available speech recognition datasets in the medical domain. In this work, we present VietMed, a Vietnamese speech recognition dataset in the medical domain comprising 16h of labeled medical speech, 1000h of unlabeled medical speech and 1200h of unlabeled general-domain speech. To the best of our knowledge, VietMed is by far the world's largest public medical speech recognition dataset in 7 aspects: total duration, number of speakers, diseases, recording conditions, speaker roles, unique medical terms and accents. VietMed is also by far the largest public Vietnamese speech dataset in terms of total duration. Additionally, we are the first to present a medical ASR dataset covering all ICD-10 disease groups and all accents within a country. Moreover, we release the first public large-scale pre-trained models for Vietnamese ASR, w2v2-Viet and XLSR-53-Viet, along with the first public large-scale fine-tuned models for medical ASR. 
Even without any medical data in unsupervised pre-training, our best pre-trained model, XLSR-53-Viet, generalizes very well to the medical domain, outperforming the state-of-the-art XLSR-53 by reducing the test-set WER from 51.8% to 29.6% (a relative reduction of more than 40%). All code, data and models are made publicly available: https://github.com/leduckhai/MultiMed.	翻訳日:2024-05-30 00:49:33 公開日:2024-05-28
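The WER figures and the "relative reduction of more than 40%" can be checked with a small calculation. The sketch below uses the standard definition of word error rate: word-level edit distance divided by the number of reference words. It is not VietMed's scoring script, which the abstract does not describe. The example sentences are made up. The reduction from 51.8% to 29.6% WER is (51.8 - 29.6) / 51.8 = 3/7, about 42.9%.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))            # distances for an empty reference prefix
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,            # deletion
                         cur[j - 1] + 1,         # insertion
                         prev[j - 1] + cost)     # substitution or match
        prev = cur
    return prev[-1] / len(ref)

def relative_reduction(before, after):
    return (before - after) / before

print(wer("the patient has a fever", "the patient had fever"))  # 0.4
print(round(relative_reduction(51.8, 29.6), 3))                 # 0.429
```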
# ストリーム処理フレームワークにおける異常回復の総合ベンチマーク解析 A Comprehensive Benchmarking Analysis of Fault Recovery in Stream Processing Frameworks ( http://arxiv.org/abs/2404.06203v2 ) ライセンス: Link先を確認	Adriano Vogel, Sören Henning, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser,	(参考訳) 現在、いくつかのソフトウェアシステムは、スケーラブルなパフォーマンスを提供し、ほぼリアルタイムで大量のデータを処理するために、ストリーム処理アーキテクチャに依存している。ストリーム処理フレームワークは、アプリケーションの実行を複数のマシンに分散することで、スケーラブルなコンピューティングを容易にする。性能は広く研究されているが、ストリーム処理フレームワークが提供する重要な特徴である耐障害性の測定は、更新された総合的なテストベッドでは、まだ適切に測定されていない。さらに、障害復旧がパフォーマンスに与える影響はほとんど無視されます。本稿では、Flink、Kafka Streams、Spark Structured Streamingといった最新のオープンソースフレームワークを備えたクラウドネイティブ環境での障害復旧性能、安定性、回復時間に関する包括的な分析を提供する。私たちのベンチマーク分析は、カオスエンジニアリングにインスパイアされて、障害を注入しています。以上の結果から,従来の分散ストリーム処理における障害回復研究と比較して,大きな変化が見られた。特に、結果は、Flinkが最も安定しており、最高の障害回復の1つを持っていることを示している。さらに、Kafka Streamsは障害後のパフォーマンスの不安定さを示している。 Spark Structured Streamingは、適切なフォールトリカバリパフォーマンスと安定性を示しているが、イベントレイテンシが高い。私たちの研究は i)データ集約型アプリケーションの効率的かつ信頼性の高い実行に最適なストリーム処理フレームワークを選択することを支援する。二研究者が研究方法及びベンチマークの適用及び拡張を支援すること。 3)本番デプロイメントにおける潜在的な問題の特定、防止、支援。 Nowadays, several software systems rely on stream processing architectures to deliver scalable performance and handle large volumes of data in near real-time. Stream processing frameworks facilitate scalable computing by distributing the application's execution across multiple machines. Despite performance being extensively studied, the measurement of fault tolerance-a key feature offered by stream processing frameworks-has still not been measured properly with updated and comprehensive testbeds. Moreover, the impact that fault recovery can have on performance is mostly ignored. This paper provides a comprehensive analysis of fault recovery performance, stability, and recovery time in a cloud-native environment with modern open-source frameworks, namely Flink, Kafka Streams, and Spark Structured Streaming. Our benchmarking analysis is inspired by chaos engineering to inject failures. 
Overall, our results indicate that much has changed compared to previous studies on fault recovery in distributed stream processing. In particular, Flink is the most stable framework and has one of the best fault recovery performances. Kafka Streams, in contrast, shows performance instabilities after failures, caused by its current rebalancing strategy, which can be suboptimal in terms of load balancing. Spark Structured Streaming shows suitable fault recovery performance and stability, but with higher event latency. Our study intends to (i) help industry practitioners choose the most suitable stream processing framework for efficient and reliable execution of data-intensive applications; (ii) support researchers in applying and extending our research method and benchmark; and (iii) identify, prevent, and help solve potential issues in production deployments.	翻訳日:2024-05-30 00:49:33 公開日:2024-05-28
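Checkpoint-and-replay, the general idea behind fault recovery in frameworks such as Flink (which snapshots operator state together with source offsets), can be illustrated with a toy single-process sketch. This is not the paper's benchmark. `CountingOperator` and `run_with_failure` are hypothetical names, and real frameworks distribute snapshots and replay from a durable log.

```python
from dataclasses import dataclass, field


@dataclass
class CountingOperator:
    """Hypothetical stateful operator: counts events per key."""
    counts: dict = field(default_factory=dict)

    def process(self, key: str) -> None:
        self.counts[key] = self.counts.get(key, 0) + 1


def run_with_failure(events, checkpoint_every, fail_at):
    """Process `events`, snapshotting (operator state, source offset) every
    `checkpoint_every` events. At index `fail_at` the operator "crashes":
    its in-memory state is lost, the last checkpoint is restored, and the
    source is rewound to the checkpointed offset so events are replayed.
    Returns the final counts and how many events had to be replayed."""
    op = CountingOperator()
    checkpoint = ({}, 0)  # (state snapshot, source offset)
    offset, crashed, replayed = 0, False, 0
    while offset < len(events):
        if not crashed and offset == fail_at:
            crashed = True
            state, offset = checkpoint
            op = CountingOperator(counts=dict(state))  # restore snapshot
            replayed = fail_at - offset  # events processed since checkpoint
            continue
        op.process(events[offset])
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (dict(op.counts), offset)
    return op.counts, replayed


events = ["a", "b", "a", "c", "a", "b", "a"]
print(run_with_failure(events, checkpoint_every=3, fail_at=5))
# ({'a': 4, 'b': 2, 'c': 1}, 2)
```

The final counts match a failure-free run. The cost is re-processing the two events that arrived after the last checkpoint. The checkpoint interval therefore trades steady-state checkpointing overhead against replay work during recovery.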
# 局所的に相互作用する離散力学系の学習:データ効率が高くスケーラブルな予測に向けて Learning Locally Interacting Discrete Dynamical Systems: Towards Data-Efficient and Scalable Prediction ( http://arxiv.org/abs/2404.06460v2 ) ライセンス: Link先を確認	Beomseok Kang, Harshit Kumar, Minah Lee, Biswadeep Chakraborty, Saibal Mukhopadhyay,	(参考訳) 局所的に相互作用するダイナミックなシステム、例えば流行の広がり、群衆による噂の伝播、森林火災などは、局所的、比較的単純で、しばしば動的要素間の確率的な相互作用に由来する複雑なグローバルなダイナミクスを示す。その時間発展は、しばしば有限個の離散状態間の遷移によって引き起こされる。深層学習による予測モデリングの進歩にもかかわらず、多くの要素間の相互作用は予測モデリングの特定の領域として研究されることはめったにない。本稿では,隣接セル間の時間的情報を置換不変な方法で関連付けることにより,未知の局所状態遷移規則を効果的に発見するために,注意的反復神経セルオートマタ(AR-NCA)を提案する。 AR-NCAは、様々なシステム構成(すなわち状態の空間分布)において優れた一般化性を示し、確率的相互作用が存在する場合であっても、極端にデータ制限されたシナリオにおいてデータ効率とロバスト性を示し、空間次元に依存しない予測によるスケーラビリティを示す。 Locally interacting dynamical systems, such as epidemic spread, rumor propagation through crowds, and forest fires, exhibit complex global dynamics originating from local, relatively simple, and often stochastic interactions between dynamic elements. Their temporal evolution is often driven by transitions between a finite number of discrete states. Despite significant advancements in predictive modeling through deep learning, such interactions among many elements have rarely been explored as a specific domain for predictive modeling. We present Attentive Recurrent Neural Cellular Automata (AR-NCA) to effectively discover unknown local state transition rules by associating the temporal information between neighboring cells in a permutation-invariant manner. AR-NCA exhibits superior generalizability across various system configurations (i.e., spatial distributions of states), data efficiency and robustness in extremely data-limited scenarios even in the presence of stochastic interactions, and scalability through spatial dimension-independent prediction.	翻訳日:2024-05-30 00:49:33 公開日:2024-05-28
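The sketch below makes the kind of system concrete. It implements a toy forest-fire cellular automaton with three discrete states and a stochastic transition rule that depends only on how many neighbours are burning. It is an illustrative data generator of the sort AR-NCA would learn from, not AR-NCA itself. The rule, the 4-neighbourhood, and `p_spread` are assumptions.

```python
import numpy as np

EMPTY, TREE, FIRE = 0, 1, 2


def forest_fire_step(grid, p_spread, rng):
    """One synchronous update of a toy stochastic forest-fire automaton.

    Local rules over the 4-neighbourhood (cells outside the grid count as
    not burning):
      FIRE  -> EMPTY   (burns out)
      TREE  -> FIRE    with prob. 1 - (1 - p_spread) ** k,
                       where k = number of burning neighbours
      EMPTY -> EMPTY
    The rule depends only on the neighbour *count*, so it is invariant to
    permuting the neighbours.
    """
    burning = grid == FIRE
    padded = np.pad(burning, 1, constant_values=False)
    k = (padded[:-2, 1:-1].astype(int)   # neighbour above
         + padded[2:, 1:-1]              # below
         + padded[1:-1, :-2]             # left
         + padded[1:-1, 2:])             # right
    p_ignite = 1.0 - (1.0 - p_spread) ** k
    ignite = (grid == TREE) & (rng.random(grid.shape) < p_ignite)
    new = grid.copy()
    new[burning] = EMPTY
    new[ignite] = FIRE
    return new
```

Iterating `forest_fire_step` from random initial grids produces state trajectories. A learned model would have to recover the hidden local rule, including `p_spread`, from such trajectories alone.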
# 経済性の評価: チップ設計コーディング支援におけるドメイン適応型大規模言語モデルと最先端モデルの総所有コストの比較分析 Assessing Economic Viability: A Comparative Analysis of Total Cost of Ownership for Domain-Adapted Large Language Models versus State-of-the-art Counterparts in Chip Design Coding Assistance ( http://arxiv.org/abs/2404.08850v2 ) ライセンス: Link先を確認	Amit Sharma, Teodor-Dumitru Ene, Kishor Kunal, Mingjie Liu, Zafar Hasan, Haoxing Ren,	(参考訳) 本稿では,チップ設計におけるコーディング支援に関するタスクを中心に,ドメイン適応型大規模言語モデル (LLM) と最先端LLM (SoTA) の総所有コスト(TCO) と性能の比較分析を行った。我々は,Claude 3 Opus と ChatGPT-4 Turbo の2つの主要な LLM に対して,ドメイン適応型 LLM である ChipNeMo の TCO と性能指標を比較し,チップ設計コード生成の有効性を評価する。本研究は, モデルの精度, 訓練方法, 運用費の詳細な評価を通じて, 利害関係者に対して, 特定のニーズに対して最も経済的に実行可能な, 性能効率の良いソリューションを選択するための重要な情報を提供することを目的とする。この結果は,汎用モデルと比べて大幅に低いコストで性能向上を示す,ChipNeMoのようなドメイン適応モデルを採用する利点を裏付けている。特に、ドメイン適応型LLMがTCOを約90%-95%削減する可能性を明らかにし、デプロイメントの規模が拡大するにつれて、コストのアドバンテージがますます明らかになる。デプロイメントの拡大に伴い、ChipNeMoのコストメリットはより顕著になり、ドメイン適応型LLMは、LLMがサポートしているコーディングニーズの高い組織にとって魅力的な選択肢となる。 This paper presents a comparative analysis of total cost of ownership (TCO) and performance between domain-adapted large language models (LLMs) and state-of-the-art (SoTA) LLMs, with a particular emphasis on tasks related to coding assistance for chip design. We examine the TCO and performance metrics of a domain-adaptive LLM, ChipNeMo, against two leading LLMs, Claude 3 Opus and ChatGPT-4 Turbo, to assess their efficacy in chip design code generation. Through a detailed evaluation of model accuracy, training methodologies, and operational expenditures, this study aims to provide stakeholders with critical information to select the most economically viable and performance-efficient solutions for their specific needs. Our results underscore the benefits of employing domain-adapted models, such as ChipNeMo, which demonstrate improved performance at significantly reduced costs compared to their general-purpose counterparts.
In particular, we show that domain-adapted LLMs can potentially reduce TCO by approximately 90%-95%, with the cost advantage becoming more pronounced as the deployment scale expands. This makes domain-adaptive LLMs such as ChipNeMo an attractive option for organizations with substantial coding needs supported by LLMs.	翻訳日:2024-05-30 00:49:33 公開日:2024-05-28
# スポットライトのOOV:どのように語形変化させるか? OOVs in the Spotlight: How to Inflect them? ( http://arxiv.org/abs/2404.08974v2 ) ライセンス: Link先を確認	Tomáš Sourada, Jana Straková, Rudolf Rosa,	(参考訳) 我々は、最先端のシステムでも通常は効果が低い、研究の少ないサブタスクであるout-of-vocabulary(OOV)条件における形態的語形変化(インフレクション)に焦点を当てる。逆行モデルと、LSTMおよびTransformerに基づく2つのシーケンス・ツー・シーケンス(seq2seq)モデルの3つのシステムを開発した。 OOVの条件下での試験のために,形態論的に豊かなチェコ語の名詞の大規模なデータセットを自動的に抽出して見出し語(lemma)が重複しないようにデータを分割し,さらに実世界のOOVである新語のデータセットを手動で注釈付けした。標準的なOOV条件では、Transformerが最高の結果を得ており、LSTM、逆行モデル、SIGMORPHONベースラインとのアンサンブルでさらに性能が向上する。実世界の新語のOOVデータセットでは、逆行モデルがすべてのニューラルモデルより優れている。最後に, SIGMORPHON 2022共有タスクのデータにおける大規模データ条件下でのOOV評価(素性の重複)において, 我々のseq2seqモデルは16言語中9言語で最先端の結果を得た。我々はOOV条件での厳密な評価のためにチェコ語OOVインフレクションデータセットを公開する。さらに,seq2seqモデルを用いたインフレクションシステムをすぐに使えるPythonライブラリとして公開する。 We focus on morphological inflection in out-of-vocabulary (OOV) conditions, an under-researched subtask in which state-of-the-art systems are usually less effective. We developed three systems: a retrograde model and two sequence-to-sequence (seq2seq) models based on LSTM and Transformer. For testing in OOV conditions, we automatically extracted a large dataset of nouns in the morphologically rich Czech language, with lemma-disjoint data splits, and we further manually annotated a real-world OOV dataset of neologisms. In the standard OOV conditions, Transformer achieves the best results, and its performance increases further when ensembled with LSTM, the retrograde model and SIGMORPHON baselines. On the real-world OOV dataset of neologisms, the retrograde model outperforms all neural models. Finally, our seq2seq models achieve state-of-the-art results in 9 out of 16 languages from SIGMORPHON 2022 shared task data in the OOV evaluation (feature overlap) in the large data condition. We release the Czech OOV Inflection Dataset for rigorous evaluation in OOV conditions. Further, we release the inflection system with the seq2seq models as a ready-to-use Python library.	翻訳日:2024-05-30 00:49:33 公開日:2024-05-28
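The retrograde idea is to inflect a new word by analogy with a known word that ends the same way. It can be sketched as a longest-common-suffix lookup. This is an assumed, simplified reading of a retrograde model, not the authors' system. The genitive-singular training pairs are illustrative Czech examples.

```python
def common_suffix_len(a: str, b: str) -> int:
    """Number of trailing characters shared by a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def edit_rule(lemma: str, form: str) -> tuple[int, str]:
    """Describe lemma -> form as (chars to strip from the end, suffix to add)."""
    p = 0
    while p < min(len(lemma), len(form)) and lemma[p] == form[p]:
        p += 1
    return len(lemma) - p, form[p:]


def retrograde_inflect(lemma: str, train: list[tuple[str, str]]) -> str:
    """Inflect an unseen lemma by copying the edit rule of the training lemma
    that shares the longest suffix with it (ties: first in the list)."""
    best_lemma, best_form = max(train, key=lambda pair: common_suffix_len(lemma, pair[0]))
    strip, add = edit_rule(best_lemma, best_form)
    return lemma[: len(lemma) - strip] + add


train = [("hrad", "hradu"), ("žena", "ženy"), ("most", "mostu")]
print(retrograde_inflect("ryba", train))  # ryby
```

The rule only looks at word endings, so it can inflect neologisms that never occurred in training. That is one plausible reason a retrograde model can hold up well on real OOV words.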
# ニューラルネットワーク量子状態の最適化の改良とクロム二量体での検証 Improved Optimization for the Neural-network Quantum States and Tests on the Chromium Dimer ( http://arxiv.org/abs/2404.09280v3 ) ライセンス: Link先を確認	Xiang Li, Jia-Cheng Huang, Guang-Ze Zhang, Hao-En Li, Zhu-Ping Shen, Chen Zhao, Jun Li, Han-Shi Hu,	(参考訳) ニューラルネットワーク量子状態(NQS)の登場は波動関数アンザッツの研究を大きく前進させ、軌道空間における変分モンテカルロ(VMC)研究の復活を引き起こした。本研究では, NQSを用いたVMC最適化の計算負荷を削減するための3つのアルゴリズム的改良, すなわち適応学習率アルゴリズム, 制約付き最適化, ブロック最適化を導入する。我々は、改良したアルゴリズムをcc-pVDZ基底関数系における$\rm H_2O$および$\rm N_2$の複雑な多参照的結合伸長で評価し、Ahlrichs SV基底関数系における強相関クロム二量体($\rm Cr_2$)の基底状態エネルギーを計算する。我々の結果は,比較的少ないCPUコストで結合クラスター理論よりも高い精度を達成した。この研究は、これらの戦略を用いて最適化効率とロバスト性を高める方法を示し、大規模制限ボルツマンマシン(RBM)ベースのNQSをより効率的に最適化するための新しい経路を開き、NQSの実用的な量子化学応用の大幅な進歩を示す。 The advent of Neural-network Quantum States (NQS) has significantly advanced wave function ansatz research, sparking a resurgence in orbital space variational Monte Carlo (VMC) exploration. This work introduces three algorithmic enhancements to reduce computational demands of VMC optimization using NQS: an adaptive learning rate algorithm, constrained optimization, and block optimization. We evaluate the refined algorithm on complex multireference bond stretches of $\rm H_2O$ and $\rm N_2$ within the cc-pVDZ basis set and calculate the ground-state energy of the strongly correlated chromium dimer ($\rm Cr_2$) in the Ahlrichs SV basis set. Our results achieve superior accuracy compared to coupled cluster theory at a relatively modest CPU cost. This work demonstrates how to enhance optimization efficiency and robustness using these strategies, opening a new path to optimize large-scale Restricted Boltzmann Machine (RBM)-based NQS more effectively and marking a substantial advancement in NQS's practical quantum chemistry applications.	翻訳日:2024-05-30 00:49:33 公開日:2024-05-28
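For readers new to RBM-based NQS, the ansatz evaluated inside VMC fits in a few lines. The sketch below uses the standard Carleo-Troyer RBM form with a single-flip Metropolis sampler, as background only. It does not implement the paper's adaptive learning rate, constrained optimization, or block optimization. The ±1 encoding of occupations is an assumption.

```python
import numpy as np


def rbm_log_amplitude(sigma, a, b, W):
    """log psi(sigma) for the real-valued RBM ansatz
        psi(sigma) = exp(a . sigma) * prod_j 2 cosh(b_j + W_j . sigma),
    with sigma a vector of +/-1 entries (e.g. spin-orbital occupations mapped
    to +/-1), a: (N,) visible biases, b: (M,) hidden biases, W: (M, N)."""
    theta = b + W @ sigma
    # log(2 cosh x) == logaddexp(x, -x), which avoids overflow for large |x|
    return a @ sigma + np.sum(np.logaddexp(theta, -theta))


def metropolis_step(sigma, params, rng):
    """One single-flip Metropolis move targeting |psi(sigma)|^2.
    (Chemistry applications normally use moves that conserve electron
    number; a single flip keeps this sketch short.)"""
    i = rng.integers(len(sigma))
    proposal = sigma.copy()
    proposal[i] *= -1
    log_ratio = 2.0 * (rbm_log_amplitude(proposal, *params)
                       - rbm_log_amplitude(sigma, *params))
    return proposal if np.log(rng.random()) < log_ratio else sigma
```

Working with log-amplitudes keeps the acceptance ratio stable even when individual amplitudes would overflow. The number of hidden units M controls the expressiveness that large-scale RBM optimization must handle.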
# トークンレベルの直接選好最適化 Token-level Direct Preference Optimization ( http://arxiv.org/abs/2404.11999v2 ) ライセンス: Link先を確認	Yongcheng Zeng, Guoqing Liu, Weiyu Ma, Ning Yang, Haifeng Zhang, Jun Wang,	(参考訳) 事前訓練された大規模言語モデル(LLM)の微調整は、それらを人間の価値観や意図と整合させるのに不可欠である。このプロセスでは、モデルが生成した回答全体の評価に焦点を当て、ペア比較や参照LLMに対するKLダイバージェンスといった手法がよく利用される。しかしながら、これらの応答の生成は、逐次的かつ自己回帰的な方法でトークンレベルで行われる。本稿では,トークンレベルでポリシーを最適化することにより,LLMを人間の嗜好に整合させる新しいアプローチである,トークンレベルの直接選好最適化(TDPO)を提案する。ダイバージェンス効率の課題に直面している従来の方法とは異なり、TDPOはトークンごとに順方向KLダイバージェンス制約を導入し、アライメントと多様性を改善している。トークンベースの報酬システムのためにBradley-Terryモデルを利用することで、TDPOは、明示的な報酬モデリングを必要としない単純さを保ちながら、KLダイバージェンスの制御を強化する。様々なテキストタスクでの実験結果は、TDPOがアライメントと生成の多様性のバランスにおいて優れた性能を示すことを示している。特に、TDPOによる微調整は、制御された感情生成とシングルターン対話データセットにおいてDPOよりも良いバランスを実現し、DPOおよびPPOベースのRLHF手法と比較して、生成した応答の品質を著しく向上させる。我々のコードはhttps://github.com/Vance0124/Token-level-Direct-Preference-Optimizationでオープンソース化されている。 Fine-tuning pre-trained Large Language Models (LLMs) is essential to align them with human values and intentions. This process often utilizes methods like pairwise comparisons and KL divergence against a reference LLM, focusing on the evaluation of full answers generated by the models. However, the generation of these responses occurs at the token level, following a sequential, auto-regressive fashion. In this paper, we introduce Token-level Direct Preference Optimization (TDPO), a novel approach to align LLMs with human preferences by optimizing policy at the token level. Unlike previous methods, which face challenges in divergence efficiency, TDPO incorporates forward KL divergence constraints for each token, improving alignment and diversity. Utilizing the Bradley-Terry model for a token-based reward system, TDPO enhances the regulation of KL divergence, while preserving simplicity without the need for explicit reward modeling. Experimental results across various text tasks demonstrate TDPO's superior performance in balancing alignment with generation diversity.
Notably, fine-tuning with TDPO strikes a better balance than DPO in the controlled sentiment generation and single-turn dialogue datasets, and significantly improves the quality of generated responses compared to both DPO and PPO-based RLHF methods. Our code is open-sourced at https://github.com/Vance0124/Token-level-Direct-Preference-Optimization.	翻訳日:2024-05-30 00:49:33 公開日:2024-05-28
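TDPO starts from DPO, which turns a Bradley-Terry preference model into a loss on policy-versus-reference log-probability ratios. The sketch below shows only that sequence-level DPO objective. It is written over per-token log-probabilities so the token-level structure is visible. TDPO's additional per-token forward-KL term is not reproduced, and `beta` and the argument layout are assumptions.

```python
import numpy as np


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -np.logaddexp(0.0, -x)


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sequence-level DPO loss for one preference pair (not TDPO).

    Each argument holds per-token log-probabilities of a response under the
    trained policy (logp_*) or the frozen reference model (ref_*). Summing
    the token log-ratios gives the implicit reward
        r(y) = beta * sum_t [log pi(y_t|x,y_<t) - log pi_ref(y_t|x,y_<t)],
    and the Bradley-Terry model sets P(chosen > rejected) = sigmoid(r_c - r_r).
    """
    r_chosen = beta * np.sum(np.asarray(logp_chosen) - np.asarray(ref_chosen))
    r_rejected = beta * np.sum(np.asarray(logp_rejected) - np.asarray(ref_rejected))
    return -log_sigmoid(r_chosen - r_rejected)
```

When the policy equals the reference, the loss is log 2. Raising the policy's probability of the preferred response relative to the reference lowers it. A token-level method can regularize each summand separately instead of only their sum.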
# X-Light: メタマルチエージェント強化学習器としてTransformer on Transformerを用いた都市横断交通信号制御 X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner ( http://arxiv.org/abs/2404.12090v2 ) ライセンス: Link先を確認	Haoyuan Jiang, Ziyue Li, Hua Wei, Xuantang Xiong, Jingqing Ruan, Jiaming Lu, Hangyu Mao, Rui Zhao,	(参考訳) 信号機制御の有効性は、現在の強化学習に基づくアプローチにより、複数の信号機間の協調を改善することで著しく向上している。しかし、多様な都市にまたがって顕著な転移性を持つマルチエージェント交通信号制御アルゴリズムをいかに得るかという問題が依然として残っている。本稿では,都市横断メタマルチエージェント交通信号制御のためのTransformer on Transformer(TonT)モデルであるX-Lightを提案する。我々はマルコフ決定過程の完全なトラジェクトリを入力し,下位Transformer(Lower Transformer)は都市内の対象交差点とその近隣交差点の状態,行動,報酬を集約し,上位Transformer(Upper Transformer)は異なる都市間にわたる一般的な意思決定トラジェクトリを学習する。この二層アプローチはモデルの堅牢な汎化性と転移性を高める。特に、未知のシナリオへ直接転移した場合、すべてのベースライン手法を平均で+7.91%、場合によっては+16.3%上回り、最良の結果が得られる。 The effectiveness of traffic light control has been significantly improved by current reinforcement learning-based approaches via better cooperation among multiple traffic lights. However, a persisting issue remains: how to obtain a multi-agent traffic signal control algorithm with remarkable transferability across diverse cities? In this paper, we propose a Transformer on Transformer (TonT) model for cross-city meta multi-agent traffic signal control, named X-Light: we input the full Markov Decision Process trajectories; the Lower Transformer aggregates the states, actions, and rewards among the target intersection and its neighbors within a city, and the Upper Transformer learns the general decision trajectories across different cities. This dual-level approach bolsters the model's robust generalization and transferability. Notably, when directly transferring to unseen scenarios, our method surpasses all baseline methods by +7.91% on average, and by +16.3% in some cases, yielding the best results.	翻訳日:2024-05-30 00:49:33 公開日:2024-05-28
# MolCRAFT:連続パラメータ空間における構造に基づく医薬品設計 MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space ( http://arxiv.org/abs/2404.12141v4 ) ライセンス: Link先を確認	Yanru Qu, Keyue Qiu, Yuxuan Song, Jingjing Gong, Jiawei Han, Mingyue Zheng, Hao Zhou, Wei-Ying Ma,	(参考訳) 近年, 構造に基づく医薬品設計(SBDD)のための生成モデルが有望な成果を上げている。既存の研究は主に、より高い結合親和性を持つ分子を生成する方法に焦点を当てており、生成された3Dポーズの実現可能性という前提条件を無視した結果、偽陽性を生じている。我々は,自己回帰的手法や拡散モデルをSBDDに適用する際の不適切な立体配座の問題について,モード崩壊や連続・離散混合空間などの主要因を徹底的に調査する。本稿では,連続パラメータ空間で動作する最初のSBDDモデルであるMolCRAFTと,新しいノイズ低減サンプリング戦略を紹介する。実験により,本モデルはより安定な3次元構造を保ちつつ結合親和性において一貫して優れた性能を示し,原子間相互作用を正確にモデル化できることが示された。我々の知る限りでは、MolCRAFTは、同等の分子サイズで基準レベルのVina Scores (-6.59 kcal/mol) を達成した最初のモデルであり、他の強力なベースラインを大きなマージン (-0.84 kcal/mol) で上回っている。コードはhttps://github.com/AlgoMole/MolCRAFTで入手できる。 Generative models for structure-based drug design (SBDD) have shown promising results in recent years. Existing works mainly focus on how to generate molecules with higher binding affinity, ignoring the feasibility prerequisites for generated 3D poses and resulting in false positives. We conduct thorough studies on key factors of ill-conformational problems when applying autoregressive methods and diffusion to SBDD, including mode collapse and hybrid continuous-discrete space. In this paper, we introduce MolCRAFT, the first SBDD model that operates in the continuous parameter space, together with a novel noise-reduced sampling strategy. Empirical results show that our model consistently achieves superior performance in binding affinity with more stable 3D structures, demonstrating our ability to accurately model interatomic interactions. To the best of our knowledge, MolCRAFT is the first to achieve reference-level Vina Scores (-6.59 kcal/mol) with comparable molecular size, outperforming other strong baselines by a wide margin (-0.84 kcal/mol). Code is available at https://github.com/AlgoMole/MolCRAFT.	翻訳日:2024-05-30 00:49:33 公開日:2024-05-28
# zk-SNARKによるプライバシー保護UCB決定プロセス検証 Privacy-Preserving UCB Decision Process Verification via zk-SNARKs ( http://arxiv.org/abs/2404.12186v2 ) ライセンス: Link先を確認	Xikun Jiang, He Lyu, Chenhao Ying, Yibin Xu, Boris Düdder, Yuan Luo,	(参考訳) 機械学習の普及により、データのプライバシとアルゴリズムパラメータの保護と、機械学習の検証可能性の確保のバランスを取る方法は、常に課題であった。本研究では、強化学習とデータプライバシの交わりについて検討し、特に、Multi-Armed Bandit(MAB)問題と上側信頼限界(Upper Confidence Bound, UCB)アルゴリズムに対処する。我々は、Zero-Knowledge Succinct Non-Interactive Argument of Knowledge (zk-SNARKs) を用いて、UCBを強化する革新的なアルゴリズムzkUCBを紹介する。 zkUCBは、トレーニングデータとアルゴリズムパラメータの機密性を保護し、透明な UCB 決定を保証するために慎重に設計されている。実験ではzkUCBの優れた性能が示され、その報酬の向上は、意思決定過程における情報エントロピーを低減する適切な量子化ビットの使用に起因する。 zkUCBの証明サイズと検証時間はzkUCBの実行ステップと線形にスケールする。これはzkUCBがデータセキュリティと運用効率のバランスを保っていることを示している。このアプローチは、複雑な意思決定プロセスにおけるデータのプライバシ強化に関する継続的な議論に大きく貢献し、プライバシに敏感なアプリケーションのための有望なソリューションを提供する。 With the increasingly widespread application of machine learning, how to strike a balance between protecting the privacy of data and algorithm parameters and ensuring the verifiability of machine learning has always been a challenge. This study explores the intersection of reinforcement learning and data privacy, specifically addressing the Multi-Armed Bandit (MAB) problem with the Upper Confidence Bound (UCB) algorithm. We introduce zkUCB, an innovative algorithm that employs the Zero-Knowledge Succinct Non-Interactive Argument of Knowledge (zk-SNARKs) to enhance UCB. zkUCB is carefully designed to safeguard the confidentiality of training data and algorithmic parameters, ensuring transparent UCB decision-making. Experiments highlight zkUCB's superior performance, attributing its enhanced reward to judicious quantization bit usage that reduces information entropy in the decision-making process. zkUCB's proof size and verification time scale linearly with the execution steps of zkUCB. This showcases zkUCB's adept balance between data security and operational efficiency.
This approach contributes significantly to the ongoing discourse on reinforcing data privacy in complex decision-making processes, offering a promising solution for privacy-sensitive applications.	翻訳日:2024-05-30 00:49:33 公開日:2024-05-28
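For reference, the UCB1 decision rule that a system like zkUCB must prove can be sketched with a fixed-point rounding step. Arithmetic circuits used by zk-SNARKs work over finite fields, so real-valued indices are usually quantized. The `frac_bits` knob is an assumption standing in for the paper's quantization. No proof generation is shown.

```python
import math
import random


def ucb1_scores(counts, reward_sums, t, frac_bits=None):
    """UCB1 index mean + sqrt(2 ln t / n) for each arm (all arms played).
    If frac_bits is set, each index is rounded to a fixed-point integer with
    that many fractional bits, as a circuit-friendly approximation."""
    scores = []
    for n, s in zip(counts, reward_sums):
        value = s / n + math.sqrt(2.0 * math.log(t) / n)
        if frac_bits is not None:
            value = round(value * (1 << frac_bits))
        scores.append(value)
    return scores


def run_ucb1(arm_probs, horizon, frac_bits=None, seed=0):
    """Play Bernoulli arms for `horizon` rounds; return pulls per arm."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play each arm once to initialise
        else:
            scores = ucb1_scores(counts, sums, t, frac_bits)
            arm = max(range(k), key=scores.__getitem__)
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Fewer fractional bits mean smaller circuits but coarser comparisons between arms. That is the accuracy/cost trade-off the quantization choice controls.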
# 位置エンコーディングを持たない因果的Transformerの長さ一般化 Length Generalization of Causal Transformers without Position Encoding ( http://arxiv.org/abs/2404.12224v2 ) ライセンス: Link先を確認	Jie Wang, Tao Ji, Yuanbin Wu, Hang Yan, Tao Gui, Qi Zhang, Xuanjing Huang, Xiaoling Wang,	(参考訳) より長い文への一般化は、最近のTransformerベースの言語モデルにとって重要である。明示的な位置特徴を操作するアルゴリズムに加えて、位置エンコーディングを持たないTransformer(NoPE)の成功は、この課題を克服する新しい方法を提供する。本稿では,NoPEの長さ一般化特性について検討する。 NoPEは、一般的に使われる明示的な位置エンコーディングよりも長いシーケンスに拡張できるが、依然としてコンテキスト長には限界があることがわかった。我々は,NoPEの一般化の失敗と注意分布の散漫との関係を特定する。本研究では,アテンションヘッドの最適な温度ハイパーパラメータを探索するパラメータ効率の良いチューニングを提案し,これによりNoPEのコンテキストサイズが大幅に拡大する。長いシーケンスの言語モデリング、合成パスキー検索タスク、実世界の長文脈タスクでの実験は、NoPEが最先端の長さ一般化アルゴリズムと競合する性能を達成できることを示している。ソースコードは公開されている。 Generalizing to longer sentences is important for recent Transformer-based language models. Besides algorithms manipulating explicit position features, the success of Transformers without position encodings (NoPE) provides a new way to overcome the challenge. In this paper, we study the length generalization property of NoPE. We find that although NoPE can extend to longer sequences than the commonly used explicit position encodings, it still has a limited context length. We identify a connection between the failure of NoPE's generalization and the distraction of attention distributions. We propose a parameter-efficient tuning method that searches for the best temperature hyper-parameters of attention heads, which substantially expands NoPE's context size. Experiments on long sequence language modeling, the synthetic passkey retrieval task and real-world long context tasks show that NoPE can achieve performance competitive with state-of-the-art length generalization algorithms. The source code is publicly accessible.	翻訳日:2024-05-30 00:49:33 公開日:2024-05-28
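The sketch below shows what "NoPE" and an attention-temperature hyper-parameter mean: a single causal attention head with no position encoding at all. Dividing the scaled logits by the temperature is an assumption made for illustration. The paper tunes per-head temperatures, and this sketch does not search for them.

```python
import numpy as np


def nope_causal_attention(x, Wq, Wk, Wv, temperature=1.0):
    """Single-head causal self-attention with no position encoding (NoPE).

    x: (T, d_model). Order information enters only through the causal mask.
    `temperature` divides the logits: values < 1 sharpen the attention
    distribution, values > 1 flatten it.
    Returns (outputs of shape (T, d_v), attention weights of shape (T, T)).
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    logits = (q @ k.T) / (np.sqrt(d) * temperature)
    T = x.shape[0]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above diagonal
    logits = np.where(future, -np.inf, logits)
    logits -= logits.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

Lowering the temperature concentrates each row of attention weights on fewer positions. This is the lever the abstract ties to counteracting distracted attention distributions on long inputs.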
# ReZero: 後方視点と全バッファ再解析によるMCTSベースのアルゴリズムの高速化 ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze ( http://arxiv.org/abs/2404.16364v3 ) ライセンス: Link先を確認	Chunyu Xuan, Yazhe Niu, Yuan Pu, Shuai Hu, Yu Liu, Jing Yang,	(参考訳) モンテカルロ木探索(MCTS)に基づくアルゴリズム、例えばMuZeroとその派生手法は、様々な意思決定領域で広く成功を収めている。これらのアルゴリズムは、古いデータからのサンプル効率を高めるために再解析プロセスを採用しているが、その代償として大幅なウォールクロック時間を消費する。この問題に対処するため,MCTSベースのアルゴリズムの木探索操作を高速化するReZeroという汎用的な手法を提案する。具体的には、片腕バンディットモデルから着想を得て、特定の子ノードの価値推定を事前に得る後方視点の再利用手法によりトレーニングサンプルを再解析する。この設計にさらに適応するため、ミニバッチを頻繁に再解析するのではなく、バッファ全体を定期的に再解析する。これら2つの設計の相乗効果により、探索コストを大幅に削減しつつ性能を維持あるいは向上させ、データ収集と再解析の両方を簡素化できる。Atari環境およびボードゲームでの実験により、ReZeroは高いサンプル効率を維持しながらトレーニング速度を大幅に改善することを示した。コードは、https://github.com/opendilab/LightZeroのLightZeroベンチマークの一部として利用できる。 Monte Carlo Tree Search (MCTS)-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains. These algorithms employ the reanalyze process to enhance sample efficiency from stale data, albeit at the expense of significant wall-clock time consumption. To address this issue, we propose a general approach named ReZero to boost tree search operations for MCTS-based algorithms. Specifically, drawing inspiration from the one-armed bandit model, we reanalyze training samples through a backward-view reuse technique which obtains the value estimation of a certain child node in advance. To further adapt to this design, we periodically reanalyze the entire buffer instead of frequently reanalyzing the mini-batch. The synergy of these two designs significantly reduces the search cost while maintaining or even improving performance, and simplifies both data collection and reanalysis. Experiments conducted on Atari environments and board games demonstrate that ReZero substantially improves training speed while maintaining high sample efficiency. The code is available as part of the LightZero benchmark at https://github.com/opendilab/LightZero.	翻訳日:2024-05-30 00:49:33 公開日:2024-05-28
# 自動運転車における安全性の再定義 Redefining Safety for Autonomous Vehicles ( http://arxiv.org/abs/2404.16768v3 ) ライセンス: Link先を確認	Philip Koopman, William Widen,	(参考訳) コンピュータベースのシステムの安全性に関する既存の定義と関連する概念的枠組みは、自動運転車の展開で得られた現実の経験に照らして再考されるべきである。業界安全基準で現在使用されている用語は、特定されたハザードからのリスクの軽減を強調し、人間が監督する車両運用を前提としている。人間の運転者なしでの運用は、特にオープンワールド環境での運用、運用上の制限を自ら遵守する要件、アドホックな社会技術的システム・オブ・システムズへの参加、法的および倫理的制約の両方に準拠する要件により、安全上の問題の範囲を劇的に拡大する。既存の標準と用語は、これらの新しい課題に部分的に対処するだけである。我々は、これらの新たな安全課題に対処するための安全へのアプローチを発展させる出発点として、これらの追加的な考慮事項を含むコアシステム安全概念の更新定義を提案する。これらの結果は、他の自律システム応用における安全用語の枠組みづくりにも役立つ可能性がある。 Existing definitions and associated conceptual frameworks for computer-based system safety should be revisited in light of real-world experiences from deploying autonomous vehicles. Current terminology used by industry safety standards emphasizes mitigation of risk from specifically identified hazards, and carries assumptions based on human-supervised vehicle operation. Operation without a human driver dramatically increases the scope of safety concerns, especially due to operation in an open-world environment, a requirement to self-enforce operational limits, participation in an ad hoc sociotechnical system of systems, and a requirement to conform to both legal and ethical constraints. Existing standards and terminology only partially address these new challenges. We propose updated definitions for core system safety concepts that encompass these additional considerations as a starting point for evolving safety approaches to address these new safety challenges. These results might also inform the framing of safety terminology for other autonomous system applications.	翻訳日:2024-05-30 00:39:49 公開日:2024-05-28
# ファースト・ツー・スパイク符号化を用いた確率スパイクニューラルネットワーク Stochastic Spiking Neural Networks with First-to-Spike Coding ( http://arxiv.org/abs/2404.17719v2 ) ライセンス: Link先を確認	Yi Jiang, Sen Lu, Abhronil Sengupta,	(参考訳) ニューラルネットワークの第3世代として認識されているスパイキングニューラルネットワーク(SNN)は、特にニューロモルフィックハードウェアに実装された場合、その生物学的妥当性とエネルギー効率で知られている。しかし、SNNの既存の研究の大部分は、レート符号化を用いる決定論的ニューロンに集中している。この方法は情報統合に長い時間を要するため計算上のオーバーヘッドが大きく、脳の確率的推論能力と時間的ダイナミクスを十分に活用できない。本研究では,SNNアーキテクチャにおける新しい計算手法と情報符号化手法の融合について検討し,確率的スパイクニューロンモデルと時間的符号化技術を統合する。他の決定論的SNNおよびレートベース符号化との広範な比較を通じて、我々は、精度、推論遅延、スパイクのスパース性、エネルギー消費、ロバスト性の観点から、我々の提案のトレードオフを調査した。我々の研究は、時間符号化を用いた確率的SNNの直接学習手法のスケーラビリティを、VGGアーキテクチャおよびMNIST以外のデータセットへと初めて拡張したものである。 Spiking Neural Networks (SNNs), recognized as the third generation of neural networks, are known for their biological plausibility and energy efficiency, especially when implemented on neuromorphic hardware. However, the majority of existing studies on SNNs have concentrated on deterministic neurons with rate coding, a method that incurs substantial computational overhead due to lengthy information integration times and fails to fully harness the brain's probabilistic inference capabilities and temporal dynamics. In this work, we explore the merger of novel computing and information encoding schemes in SNN architectures, in which we integrate stochastic spiking neuron models with temporal coding techniques. Through extensive benchmarking with other deterministic SNNs and rate-based coding, we investigate the tradeoffs of our proposal in terms of accuracy, inference latency, spiking sparsity, energy consumption, and robustness. Our work is the first to extend the scalability of direct training approaches of stochastic SNNs with temporal encoding to VGG architectures and beyond-MNIST datasets.	翻訳日:2024-05-30 00:39:49 公開日:2024-05-28
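Here is a minimal sketch of how stochastic spiking neurons and first-to-spike decoding fit together at inference time. The predicted class is simply the first output neuron to fire, so inference latency is the time of that spike. The sigmoid escape-noise firing rule, leak, and threshold values are assumptions. Training, the paper's main contribution, is not shown.

```python
import numpy as np


def first_to_spike_classify(input_current, threshold=1.0, beta=5.0, leak=0.9,
                            max_steps=50, rng=None):
    """Toy stochastic output layer decoded with first-to-spike coding.

    input_current: (n_classes,) constant drive to each output neuron.
    Each neuron integrates a leaky membrane potential and fires with
    probability sigmoid(beta * (v - threshold)) at every step.
    Returns (winning class, step of first spike), or (None, max_steps)
    if no neuron fires.
    """
    rng = np.random.default_rng() if rng is None else rng
    drive = np.asarray(input_current, dtype=float)
    v = np.zeros_like(drive)
    for step in range(max_steps):
        v = leak * v + drive
        p_spike = 1.0 / (1.0 + np.exp(-beta * (v - threshold)))
        spikes = rng.random(v.shape) < p_spike
        if spikes.any():
            # Ties among neurons spiking in the same step are broken at random.
            winners = np.flatnonzero(spikes)
            return int(rng.choice(winners)), step
    return None, max_steps
```

Decoding stops at the first spike, so a confident network can answer after one or two time steps. A rate-coded readout must instead integrate spike counts over a longer window.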
# 調和型転移学習とモダリティアライメントを用いた効率的なリモートセンシング Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment ( http://arxiv.org/abs/2404.18253v5 ) ライセンス: Link先を確認	Tengjun Huang,	(参考訳) Visual and Language Pretraining (VLP)の台頭に伴い、ますます多くの下流タスクが事前学習とその後の微調整というパラダイムを採用している。このパラダイムは、様々なマルチモーダルな下流タスクにおいてポテンシャルを示してきたが、リモートセンシング領域における実装はいくつかの障害に直面している。具体的には、同じモダリティの埋め込み同士がクラスタ化する傾向が、効率的な転移学習を妨げる。この問題に対処するために,下流タスクに対するマルチモーダル転移学習の目的を統一的な視点から検討し,3つの異なる目的に基づいて最適化プロセスを再考する。本研究では,タスク制約,モダリティアライメント,単一モダリティの一様アライメントを同時に満たしつつ,パラメータ効率の良い微調整により訓練コストを最小限に抑える手法であるHarMA(Harmonized Transfer Learning and Modality Alignment)を提案する。注目すべきは、トレーニングのための外部データを必要としないHarMAは、リモートセンシングの分野で人気の高い2つのマルチモーダル検索タスクにおいて、最先端のパフォーマンスを達成することである。実験の結果,HarMAはごくわずかな調整可能パラメータのみで,完全に微調整したモデルと同等以上の性能を達成することがわかった。その単純さから、HarMAは既存のほとんどすべてのマルチモーダル事前学習モデルに統合できる。本手法により,大規模モデルの幅広い下流タスクへの効率的な適用が促進され,資源消費を大幅に削減できることを期待する。コードはhttps://github.com/seekerhuang/HarMAで入手できる。 With the rise of Visual and Language Pretraining (VLP), an increasing number of downstream tasks are adopting the paradigm of pretraining followed by fine-tuning. Although this paradigm has demonstrated potential in various multimodal downstream tasks, its implementation in the remote sensing domain encounters some obstacles. Specifically, the tendency for same-modality embeddings to cluster together impedes efficient transfer learning. To tackle this issue, we review the aim of multimodal transfer learning for downstream tasks from a unified perspective, and rethink the optimization process based on three distinct objectives. We propose "Harmonized Transfer Learning and Modality Alignment (HarMA)", a method that simultaneously satisfies task constraints, modality alignment, and single-modality uniform alignment, while minimizing training overhead through parameter-efficient fine-tuning.
Remarkably, without the need for external data for training, HarMA achieves state-of-the-art performance in two popular multimodal retrieval tasks in the field of remote sensing. Our experiments reveal that HarMA achieves performance competitive with, or even superior to, fully fine-tuned models while tuning only a minimal number of parameters. Due to its simplicity, HarMA can be integrated into almost all existing multimodal pretraining models. We hope this method can facilitate the efficient application of large models to a wide range of downstream tasks while significantly reducing resource consumption. Code is available at https://github.com/seekerhuang/HarMA.	翻訳日:2024-05-30 00:39:49 公開日:2024-05-28
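Image-text retrieval models of this kind are commonly aligned with a symmetric contrastive (InfoNCE) objective. It pulls matched pairs in a batch together and pushes all other pairings apart. The sketch below shows that generic CLIP-style loss for orientation only. The abstract does not specify HarMA's own losses or adapter design. `temperature=0.07` is an assumed, commonly used initial value.

```python
import numpy as np


def symmetric_info_nce(img_emb, txt_emb, temperature=0.07):
    """CLIP-style symmetric contrastive loss for a batch of matched pairs.
    Row i of img_emb and row i of txt_emb describe the same scene."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (B, B) scaled cosine similarities
    idx = np.arange(len(logits))

    def cross_entropy(l):
        # Cross-entropy with the matching pair (the diagonal) as the target.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # image->text (rows) and text->image (columns) retrieval directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

The loss is low when each image is closest to its own caption among all captions in the batch, and vice versa. This cross-modal ranking is exactly what retrieval benchmarks measure.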
# オフラインライセンスのためのファームウェアに基づくAIチップ輸出制御の短期実施 Near-Term Enforcement of AI Chip Export Controls Using A Firmware-Based Design for Offline Licensing ( http://arxiv.org/abs/2404.18308v2 ) ライセンス: Link先を確認	James Petrie,	(参考訳) オフラインライセンスは、潜在的に危険なフロンティアAIモデルの非規制トレーニングを防ぐために使用できる、計算ガバナンスのメカニズムである。このメカニズムは、規制当局から未使用のライセンスを持っていない限り、AIチップを無効にすることで機能する。本報告では,ファームウェア更新を通じて配信可能なオフラインライセンスの最小バージョンの設計について述べる。既存のAIチップは、ファームウェアの検証、ファームウェアのロールバック保護、不揮発性メモリの安全性といった(比較的一般的な)ハードウェアセキュリティ機能があれば、1年以内にオフラインライセンスをサポートする可能性がある。公開資料によると、NVIDIAのH100 AIチップには、これらのセキュリティ機能がすでに備わっている。追加のハードウェア修正がなければ、物理的なハードウェア攻撃の影響を受けやすい。しかし、これらの攻撃は高価な機器を必要とする可能性があり、何千ものAIチップに確実に適用することは困難である。ファームウェアベースのオフラインライセンス設計は、ハードウェアベースのソリューションと同じ法的要件とライセンス承認メカニズムを共有している。ファームウェアベースのソリューションの実装は、将来的にはよりセキュアなハードウェアベースのソリューションの最終的な展開を加速する可能性がある。 AIチップメーカーにとって、このセキュリティメカニズムを実装することで、輸出制限によって禁止されるであろう顧客にチップを販売できるようになるかもしれない。政府にとって、今後数年間で、安全でないアクターや悪意のないアクターがフロンティアAIモデルをトレーニングするのを防ぐことが重要である。この初期分析に基づいて、ファームウェアベースのオフラインライセンスは、緊急のセキュリティと貿易の問題を部分的に解決し、ハードウェアのセキュリティに共通する機能を持つAIチップに対して技術的に実現可能である。 Offline Licensing is a mechanism for compute governance that could be used to prevent unregulated training of potentially dangerous frontier AI models. The mechanism works by disabling AI chips unless they have an unused license from a regulator. In this report, we present a design for a minimal version of Offline Licensing that could be delivered via a firmware update. Existing AI chips could potentially support Offline Licensing within a year if they have the following (relatively common) hardware security features: firmware verification, firmware rollback protection, and secure non-volatile memory. Public documentation suggests that NVIDIA's H100 AI chip already has these security features. Without additional hardware modifications, the system is susceptible to physical hardware attacks. However, these attacks might require expensive equipment and could be difficult to reliably apply to thousands of AI chips. 
A firmware-based Offline Licensing design shares the same legal requirements and license approval mechanism as a hardware-based solution. Implementing a firmware-based solution now could accelerate the eventual deployment of a more secure hardware-based solution in the future. For AI chip manufacturers, implementing this security mechanism might allow chips to be sold to customers that would otherwise be prohibited by export restrictions. For governments, it may be important to be able to prevent unsafe or malicious actors from training frontier AI models in the next few years. Based on this initial analysis, firmware-based Offline Licensing could partially solve urgent security and trade problems and is technically feasible for AI chips that have common hardware security features.	翻訳日:2024-05-30 00:39:49 公開日:2024-05-28
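The core Offline Licensing check, refusing to run without an unused license, can be sketched as firmware logic. Everything here is a hypothetical illustration: the license format, the HMAC-based authentication, and the hour-based budget. The report's design relies on firmware verification, rollback protection, and secure non-volatile memory, which this Python model only imitates.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class ChipLicenseState:
    """Hypothetical state a chip would keep in secure non-volatile memory."""
    last_nonce: int = 0            # highest license nonce already consumed
    remaining_hours: float = 0.0   # compute time granted by valid licenses


def redeem_license(state, key, nonce, hours, tag):
    """Accept a license only if its MAC verifies and its nonce is fresh.

    A real design would use public-key signatures so the chip holds only the
    regulator's public key; an HMAC with a shared key keeps this sketch
    self-contained. The monotonic nonce check models replay protection.
    """
    msg = f"{nonce}:{hours}".encode()
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return False                  # forged or corrupted license
    if nonce <= state.last_nonce:
        return False                  # license already used (replay)
    state.last_nonce = nonce
    state.remaining_hours += hours
    return True


def may_run(state, hours_needed):
    """Firmware gate: refuse work once granted compute time is exhausted."""
    if state.remaining_hours >= hours_needed:
        state.remaining_hours -= hours_needed
        return True
    return False
```

The nonce counter is only meaningful if an attacker cannot roll it back. That is why the report lists secure non-volatile memory and firmware rollback protection as prerequisites.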
# QOSST: 連続変数量子鍵配送実験のための高度にモジュール化されたオープンソースプラットフォーム QOSST: A Highly-Modular Open Source Platform for Experimental Continuous-Variable Quantum Key Distribution ( http://arxiv.org/abs/2404.18637v3 ) ライセンス: Link先を確認	Yoann Piétri, Matteo Schiavon, Valentina Marulanda Acosta, Baptiste Gouraud, Luis Trigo Vidarte, Philippe Grangier, Amine Rhouni, Eleni Diamanti,	(参考訳) 量子鍵配送(Quantum Key Distribution, QKD)は、量子物理学の法則に根ざした情報理論的安全性を持つ、離れた2者間の秘密鍵交換を可能にする。光のコヒーレント状態の直交位相成分の値などの連続変数(CV)に鍵情報を符号化すると、標準的な光通信システムにはるかに近い実装が可能になるが、その代償として、低い信号対雑音比での動作に必要なデジタル信号処理技術が非常に複雑になる。本研究では,ハードウェア非依存で複数の構成で使用可能な,高度にモジュール化されたオープンソースソフトウェアを提供することにより,この難しさに起因するCV-QKD実験の参入障壁を下げたい。我々は、ローカルに生成した局部発振器、周波数多重化パイロット、RFヘテロダイン検出による実験装置を用いて、QOSSTと呼ばれるこのソフトウェアをベンチマークし、漸近極限において大都市圏の距離でMbit/sオーダーの最先端の秘密鍵レートを得た。我々は,QOSSTがCV-QKDのさらなる実験的進歩を促し,コミュニティによって改良・拡張されて,多種多様な構成で高い性能を達成することを期待する。 Quantum Key Distribution (QKD) enables secret key exchange between two remote parties with information-theoretic security rooted in the laws of quantum physics. Encoding key information in continuous variables (CV), such as the values of quadrature components of coherent states of light, brings implementations much closer to standard optical communication systems, but this comes at the price of significant complexity in the digital signal processing techniques required for operation at low signal-to-noise ratios. In this work, we wish to lower the barriers to entry for CV-QKD experiments associated with this difficulty by providing highly modular, open-source software that is in principle hardware agnostic and can be used in multiple configurations. We benchmarked this software, called QOSST, using an experimental setup with a locally generated local oscillator, frequency multiplexed pilots and RF-heterodyne detection, and obtained state-of-the-art secret key rates of the order of Mbit/s over metropolitan distances at the asymptotic limit.
We hope that QOSST can be used to stimulate further experimental advances in CV-QKD and be improved and extended by the community to achieve high performance in a wide variety of configurations.	翻訳日:2024-05-30 00:39:49 公開日:2024-05-28
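A toy model of Gaussian-modulated coherent-state symbols, measured on both quadratures, shows why low signal-to-noise operation matters. The sketch treats the link as a classical complex Gaussian channel and estimates the Alice-Bob mutual information I_AB = log2(1 + SNR). It is not QOSST code. It lumps shot, excess, and electronic noise into one assumed `noise_var`, and it omits pilot processing and the eavesdropper-information (Holevo bound) term that a secret key rate also needs.

```python
import numpy as np


def simulate_gaussian_modulation(n, v_mod, transmittance, noise_var, rng):
    """Toy model: Gaussian-modulated symbols through a lossy linear channel,
    measured on both quadratures (heterodyne-like).

    v_mod: modulation variance per quadrature; noise_var: total added noise
    variance per quadrature at Bob, all in the same (assumed) units.
    Returns Alice's and Bob's complex symbols.
    """
    alice = rng.normal(0, np.sqrt(v_mod), n) + 1j * rng.normal(0, np.sqrt(v_mod), n)
    noise = rng.normal(0, np.sqrt(noise_var), n) + 1j * rng.normal(0, np.sqrt(noise_var), n)
    bob = np.sqrt(transmittance) * alice + noise
    return alice, bob


def estimate_mutual_information(alice, bob):
    """Fit bob = g * alice + noise by least squares, estimate the SNR, and
    return (log2(1 + SNR) bits per symbol, SNR)."""
    g = np.vdot(alice, bob) / np.vdot(alice, alice)
    residual = bob - g * alice
    snr = abs(g) ** 2 * np.mean(abs(alice) ** 2) / np.mean(abs(residual) ** 2)
    return float(np.log2(1 + snr)), float(snr)
```

At metropolitan distances the channel transmittance is small, so the SNR and I_AB shrink. The asymptotic key rate, commonly written K = β I_AB − χ_BE, then depends on reconciliation efficiency β and careful DSP at low SNR.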
# MicroDreamer: スコアベースイテレーティブレコンストラクションによる$\sim$20秒のゼロショット3D生成 MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction ( http://arxiv.org/abs/2404.19525v2 ) ライセンス: Link先を確認	Luxi Chen, Zhengyi Wang, Zihan Zhou, Tingting Gao, Hang Su, Jun Zhu, Chongxuan Li,	(参考訳) スコア蒸留サンプリング(SDS)のような最適化に基づくアプローチは、ゼロショット3D生成において有望であるが、主に各サンプルに必要な関数評価(NFE)の多さにより、低効率に悩まされている。本稿では,NFEを削減するために,微分可能な3次元再構成過程を模倣した効率的かつ汎用的なアルゴリズムであるスコアベース反復再構成(SIR)を提案する。多視点スコアベース拡散モデルからサンプリングされた一組の画像が与えられると、SIRはSDSの単一ステップ最適化とは異なり、3Dパラメータを繰り返し最適化する。トレーニングにおける他の改善とともに、様々な3D表現や3D生成タスクに適用可能な、MicroDreamerと呼ばれる効率的なアプローチを提案する。特にMicroDreamerは、同等の性能を維持しつつ、ニューラル輝度場の生成においてSDSよりも5～20倍高速であり、単一のA100 GPU上で3Dガウススプラッティングからメッシュを生成するのに約20秒しかかからず、最速のゼロショットベースラインであるDreamGaussianの時間を半減する。私たちのコードは https://github.com/ML-GSAI/MicroDreamer で利用可能です。 Optimization-based approaches, such as score distillation sampling (SDS), show promise in zero-shot 3D generation but suffer from low efficiency, primarily due to the high number of function evaluations (NFEs) required for each sample. In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm mimicking a differentiable 3D reconstruction process to reduce the NFEs. Given a single set of images sampled from a multi-view score-based diffusion model, SIR repeatedly optimizes 3D parameters, unlike the single-step optimization in SDS. With other improvements in training, we present an efficient approach called MicroDreamer that generally applies to various 3D representations and 3D generation tasks. In particular, while retaining comparable performance, MicroDreamer is 5-20 times faster than SDS in generating neural radiance fields and takes about 20 seconds to generate meshes from 3D Gaussian splatting on a single A100 GPU, halving the time of the fastest zero-shot baseline, DreamGaussian. Our code is available at https://github.com/ML-GSAI/MicroDreamer.	翻訳日:2024-05-30 00:39:49 公開日:2024-05-28
# ニューラルネットワークによる動的データ評価 Neural Dynamic Data Valuation ( http://arxiv.org/abs/2404.19557v2 ) ライセンス: Link先を確認	Zhangyong Liang, Huanhuan Gao, Ji Zhang,	(参考訳) データは、データ経済とその市場の基礎的な構成要素である。効率的で公正なデータ評価が、重要な関心事のトピックとして浮上している。限界貢献に基づく多くのアプローチは、様々な下流タスクにおいて有望な結果を示している。しかしながら、特定の目的のために与えられたデータセットの有用性や価値を評価するために使用される、多数のユーティリティ関数のトレーニングを必要とするため、計算コストが高いことが広く知られている。その結果、大規模なデータセットを含むデータマーケットプレースにこれらの手法を適用することは不可能であると認識されている。そこで、重要な問題が生じる: ユーティリティ関数の再トレーニングをどうやって回避できるのか? この問題に対処するために,ニューラルダイナミックデータ評価(NDDV)と呼ばれる最適制御の観点から,新しいデータ評価手法を提案する。本手法は,データ最適制御状態の感度を用いて,データの価値を正確に特定するための確固たる理論的解釈を持つ。さらに,データポイントのユニークな特徴を捉え,データポイントと平均場状態の相互作用による公平性を確保するために,データ再重み付け戦略を実装した。特に,本手法では,すべてのデータポイントの値を推定するために1回のみのトレーニングが必要であり,計算効率が大幅に向上する。さまざまなデータセットとタスクを使用して包括的な実験を行います。その結果,提案手法は既存の最先端データ評価手法よりも高い価値または低い価値のデータポイントを正確に同定し,より計算効率がよいことを示す。 Data constitute the foundational component of the data economy and its marketplaces. Efficient and fair data valuation has emerged as a topic of significant interest. Many approaches based on marginal contribution have shown promising results in various downstream tasks. However, they are well known to be computationally expensive as they require training a large number of utility functions, which are used to evaluate the usefulness or value of a given dataset for a specific purpose. As a result, it has been recognized as infeasible to apply these methods to a data marketplace involving large-scale datasets. Consequently, a critical issue arises: how can the re-training of the utility function be avoided? To address this issue, we propose a novel data valuation method from the perspective of optimal control, named neural dynamic data valuation (NDDV). Our method has a solid theoretical interpretation for accurately identifying data values via the sensitivity of the data optimal control state. 
In addition, we implement a data re-weighting strategy to capture the unique features of data points, ensuring fairness through the interaction between data points and the mean-field states. Notably, our method requires training only once to estimate the value of all data points, significantly improving the computational efficiency. We conduct comprehensive experiments using different datasets and tasks. The results demonstrate that the proposed NDDV method outperforms the existing state-of-the-art data valuation methods in accurately identifying data points with either high or low values and is more computationally efficient.	翻訳日:2024-05-30 00:39:49 公開日:2024-05-28
# MMTryon:高品質ファッション生成のためのマルチモーダルマルチ参照制御 MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation ( http://arxiv.org/abs/2405.00448v2 ) ライセンス: Link先を確認	Xujie Zhang, Ente Lin, Xiu Li, Yuxuan Luo, Michael Kampffmeyer, Xin Dong, Xiaodan Liang,	(参考訳) 本稿では,テキストインストラクションと複数の衣料品イメージを入力として,高品質な合成試着結果を生成するマルチモーダルマルチ参照VITONフレームワークであるMMTryonを紹介する。 MMTryonは,先行文献で見落とされてきた3つの問題に対処する。1)複数の試着アイテムのサポート。既存の方法は通常、単一アイテムの試着タスク(例えば、上半身/下半身の衣服、ドレス)のために設計されている。2)ドレッシングスタイルの指定。既存の方法では、指示(例: zipped/unzipped, tuck-in/tuck-out)に基づいてドレッシングスタイルをカスタマイズできない。3)セグメンテーションへの依存。既存の方法はさらに、置換領域を特定するためにカテゴリ固有のセグメンテーションモデルに強く依存しており、セグメンテーションエラーは試着結果における顕著なアーティファクトに直接繋がる。最初の2つの課題に対処するため,MMTryonでは,参照画像からの衣服情報とテキスト指示からのドレッシングスタイル情報を組み合わせた,新しいマルチモーダリティとマルチリファレンスアテンション機構を導入している。さらに、セグメンテーション依存を取り除くために、MMTryonはパーシングフリーの衣料エンコーダを使用し、新しいスケーラブルなデータ生成パイプラインを活用して、既存のVITONデータセットを明示的なセグメンテーションを必要とせずに、MMTryonをトレーニング可能な形式に変換する。高解像度のベンチマークと実環境のテストセットに関する大規模な実験は、MMTryonが既存のSOTA法よりも質的かつ定量的に優れていることを示した。 MMTryonは、マルチアイテムでスタイル制御可能な仮想試着シナリオにおける印象的な性能と、あらゆるソース画像からさまざまなシナリオであらゆる服を試着できる能力によって、ファッションコミュニティにおける今後の研究に新たな道を開く。 This paper introduces MMTryon, a multi-modal multi-reference VIrtual Try-ON (VITON) framework, which can generate high-quality compositional try-on results by taking a text instruction and multiple garment images as inputs. Our MMTryon addresses three problems overlooked in prior literature: 1) Support for multiple try-on items. Existing methods are commonly designed for single-item try-on tasks (e.g., upper/lower garments, dresses). 2) Specification of dressing style. Existing methods are unable to customize dressing styles based on instructions (e.g., zipped/unzipped, tuck-in/tuck-out). 3) Segmentation dependency. They further rely heavily on category-specific segmentation models to identify the replacement regions, with segmentation errors directly leading to significant artifacts in the try-on results. 
To address the first two issues, our MMTryon introduces a novel multi-modality and multi-reference attention mechanism to combine the garment information from reference images and dressing-style information from text instructions. In addition, to remove the segmentation dependency, MMTryon uses a parsing-free garment encoder and leverages a novel scalable data generation pipeline to convert existing VITON datasets to a form that allows MMTryon to be trained without requiring any explicit segmentation. Extensive experiments on high-resolution benchmarks and in-the-wild test sets demonstrate MMTryon's superiority over existing SOTA methods both qualitatively and quantitatively. MMTryon's impressive performance on multi-item and style-controllable virtual try-on scenarios and its ability to try on any outfit in a large variety of scenarios from any source image open up a new avenue for future investigation in the fashion community.	翻訳日:2024-05-30 00:39:49 公開日:2024-05-28
# 一般化等角的タイトフレームからの情報過完全測定 Informationally overcomplete measurements from generalized equiangular tight frames ( http://arxiv.org/abs/2405.00560v2 ) ライセンス: Link先を確認	Katarzyna Siudzińska,	(参考訳) 情報の過剰な測定は、量子トモグラフィーと量子状態推定に重要な応用を見出す。最も一般的なのは相互に偏りのない基底の最大集合であり、測定作用素間のトレース関係はよく知られている。本稿では、任意のランクの等角的タイトフレームによって生成される情報的にオーバーコンプリートなPOVMのより一般的なクラスを紹介する。このクラスは、互いに偏りのない測度と基底の再スケールを含む非射影POVMへの等角測度を一般化する。本稿では, それらの構成法, 対称性特性の解析, 高対称性の場合の例について述べる。特に、円錐型2-設計である一般化された等角測定の幅広いクラスを見つけ、偶然の指数を導出することができる。以上の結果から,POVM の情報完全コレクションに対して,情報の過剰な測定を単一で行うことのメリットが示唆された。 Informationally overcomplete measurements find important applications in quantum tomography and quantum state estimation. The most popular are maximal sets of mutually unbiased bases, for which trace relations between measurement operators are well known. In this paper, we introduce a more general class of informationally overcomplete POVMs that are generated by equiangular tight frames of arbitrary rank. This class provides a generalization of equiangular measurements to non-projective POVMs, which include rescaled mutually unbiased measurements and bases. We provide a method of their construction, analyze their symmetry properties, and provide examples for highly symmetric cases. In particular, we find a wide class of generalized equiangular measurements that are conical 2-designs, which allows us to derive the index of coincidence. Our results show benefits of considering a single informationally overcomplete measurement over informationally complete collections of POVMs.	翻訳日:2024-05-30 00:39:49 公開日:2024-05-28
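The rank-1 special case of the construction can be checked numerically. The sketch below is a minimal illustration. It assumes the well-known tetrahedral qubit SIC as the equiangular tight frame; the paper itself treats frames of arbitrary rank, which this example does not cover. The sketch verifies two things. First, the rescaled projectors $E_i = (d/N)\,|f_i\rangle\langle f_i|$ sum to the identity, so they form a valid POVM. Second, all pairwise overlaps equal the Welch bound $(N-d)/(d(N-1))$, which is what makes the frame equiangular.

```python
import numpy as np

# Pauli matrices
I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def tetrahedral_sic_povm():
    """Build E_i = (d/N)|f_i><f_i| from the tetrahedral qubit SIC (d=2, N=4).

    Returns (povm_elements, rank_one_projectors)."""
    bloch = np.array([
        [0.0, 0.0, 1.0],
        [2 * np.sqrt(2) / 3, 0.0, -1 / 3],
        [-np.sqrt(2) / 3, np.sqrt(2 / 3), -1 / 3],
        [-np.sqrt(2) / 3, -np.sqrt(2 / 3), -1 / 3],
    ])
    d, n = 2, len(bloch)
    # |f><f| = (I + r . sigma) / 2 for a unit Bloch vector r
    projectors = [(I2 + r[0] * SX + r[1] * SY + r[2] * SZ) / 2 for r in bloch]
    return [(d / n) * p for p in projectors], projectors


def check_equiangular(projectors, d):
    """Check Tr(P_i P_j) equals the Welch bound for every pair i != j."""
    n = len(projectors)
    welch = (n - d) / (d * (n - 1))
    overlaps = [np.trace(projectors[i] @ projectors[j]).real
                for i in range(n) for j in range(n) if i != j]
    return bool(np.allclose(overlaps, welch)), welch
```

Both checks pass for this frame, and the common overlap is $1/3$. Higher-rank frames would replace the rank-1 projectors with higher-rank positive operators. The two checks (completeness and equal pairwise traces) would stay the same in spirit.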
# Spider: コンテキスト依存の概念セグメンテーションのための統一フレームワーク Spider: A Unified Framework for Context-dependent Concept Segmentation ( http://arxiv.org/abs/2405.01002v2 ) ライセンス: Link先を確認	Xiaoqi Zhao, Youwei Pang, Wei Ji, Baicheng Sheng, Jiaming Zuo, Lihe Zhang, Huchuan Lu,	(参考訳) 人間、車、飛行機のような文脈に依存しない(CI)概念とは異なり、文脈依存(CD)の概念は、カモフラージュされた物体や医学的病変のように、より高い視覚的理解能力を必要とする。多くのCD理解タスクが各ブランチで急速に進歩したにもかかわらず、分離された進化はドメイン間の一般化の限界と技術革新の重複に繋がっている。 CDタスクには前景と背景のコンテキストの間に強い結合関係があるため、既存の手法ではそれぞれ焦点を絞った領域で個別のモデルを訓練する必要がある。これは、汎用人工知能(AGI)に向けた現実世界のCD概念の理解を制限する。一度だけ学習すればよい、単一のパラメータセットを持つ統一モデルSpiderを提案する。イメージマスクグループプロンプトによって駆動される提案されたコンセプトフィルタの助けを借りて、Spiderは多様な文脈依存の概念を理解・区別し、プロンプターの意図を正確に捉えることができる。特別な工夫なしに、Spiderは8つの異なる文脈依存セグメンテーションタスクにおいて最先端の特化モデルを大きく上回る。これには4つの自然シーン(顕著物体、カモフラージュ物体、透明物体、影)と4つの医学的病変(カラー大腸内視鏡、CT、超音波、ダーモスコピーの各モダリティにおけるCOVID-19、ポリープ、乳房、皮膚の病変)が含まれる。さらに、Spiderは継続学習において明らかな利点を示す。1\%未満のパラメータを微調整するだけで新しいタスクの学習を容易に完了でき、既存のすべてのタスクに対する性能劣化も5\%未満という許容範囲に収まる。ソースコードは \href{https://github.com/Xiaoqi-Zhao-DLUT/Spider-UniCDSeg}{Spider-UniCDSeg} で公開予定である。 Unlike context-independent (CI) concepts such as humans, cars, and airplanes, context-dependent (CD) concepts, such as camouflaged objects and medical lesions, require a higher level of visual understanding. Despite the rapid advance of many CD understanding tasks in their respective branches, this isolated evolution leads to limited cross-domain generalisation and repetitive technical innovation. Since there is a strong coupling relationship between foreground and background context in CD tasks, existing methods require training separate models for their focused domains. This restricts real-world CD concept understanding on the path towards artificial general intelligence (AGI). We propose Spider, a unified model with a single set of parameters that needs to be trained only once. 
With the help of the proposed concept filter driven by the image-mask group prompt, Spider is able to understand and distinguish diverse strong context-dependent concepts to accurately capture the prompter's intention. Without bells and whistles, Spider significantly outperforms the state-of-the-art specialized models in 8 different context-dependent segmentation tasks, including 4 natural scenes (salient, camouflaged, and transparent objects and shadow) and 4 medical lesions (COVID-19, polyp, breast, and skin lesion with color colonoscopy, CT, ultrasound, and dermoscopy modalities). In addition, Spider shows clear advantages in continual learning. It can easily complete the training of new tasks by fine-tuning less than 1\% of its parameters, with a tolerable performance degradation of less than 5\% on all old tasks. The source code will be publicly available at \href{https://github.com/Xiaoqi-Zhao-DLUT/Spider-UniCDSeg}{Spider-UniCDSeg}.	翻訳日:2024-05-30 00:39:49 公開日:2024-05-28
# コントラスト学習によるクロスモーダル蒸留の一般化理論 A Generalization Theory of Cross-Modality Distillation with Contrastive Learning ( http://arxiv.org/abs/2405.03355v2 ) ライセンス: Link先を確認	Hangyu Lin, Chen Liu, Chengming Xu, Zhengqi Gao, Yanwei Fu, Yuan Yao,	(参考訳) クロスモダリティ蒸留は、深度マップや高品質スケッチのような限られた知識しか含まないデータモダリティにとって重要なトピックである。このようなテクニックは特に、ラベル付きトレーニングデータが一般に利用できない、メモリやプライバシに制限のあるシナリオにおいて非常に重要である。この問題を解決するために、既存のラベルフリーな手法では、少数のペア付きラベルなしデータを利用して、ソースとターゲットのモダリティ間で特徴や統計を整合させて知識を蒸留する。例えば、典型的には、ソース(例:画像)とターゲット(例:スケッチ)モダリティにおけるサンプルペアの学習済み特徴間のL2距離やコントラスト損失を最小化することを目的としている。しかし、この分野のほとんどのアルゴリズムは実験結果にのみ焦点を当てており、理論的な洞察に欠けている。クロスモダリティ蒸留の理論と実践的手法のギャップを埋めるために,まず,正と負の対応の両方を活用するコントラスト学習に基づくクロスモダリティコントラスト蒸留(CMCD)の一般的な枠組みを,より一般化可能な特徴の蒸留に向けて定式化する。さらに、ソースとターゲットのモダリティ間の距離がターゲットモダリティ内の下流タスクにおけるテスト誤差に大きく影響することを明らかにする徹底的な収束解析を確立し、これを実験結果によっても検証した。画像,スケッチ,深度マップ,音声のモダリティと,認識およびセグメンテーションのタスクを網羅する広範な実験により,本アルゴリズムが既存のアルゴリズムを2～3\%のマージンで一貫して上回ることを示した。 Cross-modality distillation arises as an important topic for data modalities containing limited knowledge such as depth maps and high-quality sketches. Such techniques are of great importance, especially for memory- and privacy-restricted scenarios where labeled training data is generally unavailable. To solve the problem, existing label-free methods leverage a small amount of paired unlabeled data to distill the knowledge by aligning features or statistics between the source and target modalities. For instance, one typically aims to minimize the L2 distance or contrastive loss between the learned features of pairs of samples in the source (e.g., image) and the target (e.g., sketch) modalities. However, most algorithms in this domain focus only on experimental results and lack theoretical insight. To bridge the gap between the theory and practical methods of cross-modality distillation, we first formulate a general framework of cross-modality contrastive distillation (CMCD), built upon contrastive learning that leverages both positive and negative correspondence, towards a better distillation of generalizable features. 
Furthermore, we establish a thorough convergence analysis revealing that the distance between the source and target modalities significantly impacts the test error on downstream tasks within the target modality, a finding also validated by the empirical results. Extensive experimental results show that our algorithm consistently outperforms existing algorithms by a margin of 2-3\% across diverse modalities and tasks, covering the image, sketch, depth-map, and audio modalities and the recognition and segmentation tasks.	翻訳日:2024-05-30 00:39:49 公開日:2024-05-28
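The abstract contrasts L2 alignment, which uses only positive pairs, with contrastive distillation, which also uses negatives. The sketch below illustrates that difference. It is a generic, textbook InfoNCE-style loss over pre-computed paired features, where row $i$ of `src` and row $i$ of `tgt` form a positive pair and every other target row acts as a negative. The function name and the temperature value are illustrative assumptions. This is not the paper's exact CMCD objective.

```python
import numpy as np


def info_nce(src, tgt, temperature=0.1):
    """Source-to-target InfoNCE loss over a batch of paired features.

    src, tgt: arrays of shape (n, dim); row i of each is a positive pair,
    and the remaining n - 1 target rows serve as negatives for src[i]."""
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    logits = src @ tgt.T / temperature                    # (n, n) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # positives lie on the diagonal


def l2_alignment(src, tgt):
    """Positive-pairs-only baseline: mean squared feature distance."""
    return float(np.mean(np.sum((src - tgt) ** 2, axis=1)))
```

In practice, `src` and `tgt` would come from a frozen source-modality encoder and a trainable target-modality encoder. The loss is low only when each source feature is closer to its own pair than to the other target features in the batch.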
# ハイパーグラフ強化デュアル半教師付きグラフ分類 Hypergraph-enhanced Dual Semi-supervised Graph Classification ( http://arxiv.org/abs/2405.04773v2 ) ライセンス: Link先を確認	Wei Ju, Zhengyang Mao, Siyu Yi, Yifang Qin, Yiyang Gu, Zhiping Xiao, Yifan Wang, Xiao Luo, Ming Zhang,	(参考訳) 本稿では,限定ラベル付きグラフと豊富なラベル付きグラフを用いたシナリオにおいて,グラフのカテゴリを正確に予測することを目的とした半教師付きグラフ分類について検討する。グラフニューラルネットワーク(GNN)の有望な能力にもかかわらず、彼らは通常、多くのコストのかかるラベル付きグラフを必要とする。さらに、GNNは本来、メッセージパッシング機構を用いたローカル近隣情報の符号化に限られており、ノード間の高次依存関係をモデル化する能力が欠如している。これらの課題に対処するために,ハイパーグラフと線グラフの観点からグラフ意味を抽出する半教師付きグラフ分類のためのハイパーグラフ拡張DuALフレームワークHEALを提案する。具体的には、ノード間の高次関係をよりよく探求するため、ペア関係を超えた複雑なノード依存を適応的に学習するハイパーグラフ構造を設計する。一方、学習したハイパーグラフに基づいて、ハイパーエッジ間の相互作用を捉える線グラフを導入し、基盤となるセマンティック構造をよりよくマイニングする。最後に,2つの分野間の知識伝達を容易にし,相互指導を向上する関係整合性学習を開発する。実世界のグラフデータセットに対する大規模な実験により,既存の最先端手法に対する提案手法の有効性が検証された。 In this paper, we study semi-supervised graph classification, which aims at accurately predicting the categories of graphs in scenarios with limited labeled graphs and abundant unlabeled graphs. Despite the promising capability of graph neural networks (GNNs), they typically require a large number of costly labeled graphs, while a wealth of unlabeled graphs fail to be effectively utilized. Moreover, GNNs are inherently limited to encoding local neighborhood information using message-passing mechanisms, thus lacking the ability to model higher-order dependencies among nodes. To tackle these challenges, we propose a Hypergraph-Enhanced DuAL framework named HEAL for semi-supervised graph classification, which captures graph semantics from the perspective of the hypergraph and the line graph, respectively. Specifically, to better explore the higher-order relationships among nodes, we design a hypergraph structure learning to adaptively learn complex node dependencies beyond pairwise relations. Meanwhile, based on the learned hypergraph, we introduce a line graph to capture the interaction between hyperedges, thereby better mining the underlying semantic structures. 
Finally, we develop a relational consistency learning scheme to facilitate knowledge transfer between the two branches and provide better mutual guidance. Extensive experiments on real-world graph datasets verify the effectiveness of the proposed method against existing state-of-the-art methods.	翻訳日:2024-05-30 00:39:49 公開日:2024-05-28
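The line graph of a hypergraph is the structure the abstract uses to model interactions between hyperedges. In it, each hyperedge becomes a node, and two such nodes are connected when their hyperedges share at least one vertex. The sketch below is a minimal illustration of that construction. It assumes the hyperedges are already given as vertex sets, whereas HEAL learns the hypergraph structure adaptively. The helper name is hypothetical.

```python
from itertools import combinations


def hypergraph_line_graph(hyperedges):
    """Return the edge set of the line graph of a hypergraph.

    hyperedges: list of iterables of vertex ids. Node i of the line graph
    is hyperedge i; (i, j) is an edge when hyperedges i and j overlap."""
    sets = [frozenset(e) for e in hyperedges]
    return {(i, j) for i, j in combinations(range(len(sets)), 2)
            if sets[i] & sets[j]}
```

For example, with hyperedges `{0, 1, 2}`, `{2, 3}`, `{4, 5}`, `{5, 0}`, the line graph connects hyperedges 0 and 1 (shared vertex 2), 0 and 3 (vertex 0), and 2 and 3 (vertex 5). A message-passing layer over this graph lets information flow between hyperedges rather than only between nodes.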
# 一般化ベイズによる外乱カルマンフィルタ Outlier-robust Kalman Filtering through Generalised Bayes ( http://arxiv.org/abs/2405.05646v2 ) ライセンス: Link先を確認	Gerardo Duran-Martin, Matias Altamirano, Alexander Y. Shestopaloff, Leandro Sánchez-Betancourt, Jeremias Knoblauch, Matt Jones, François-Xavier Briol, Kevin Murphy,	(参考訳) 我々は、外れ値や不特定測定モデルの存在下で、状態空間モデルにおけるオンラインフィルタリングのための新しい、確実に堅牢でクローズドなベイズ更新ルールを導出する。提案手法は,一般化ベイズ推定と拡張カルマンフィルタやアンサンブルカルマンフィルタなどのフィルタリング手法を組み合わせる。非線形モデルの場合, 前者はロバスト性を示すために, 後者は計算効率を確保するために使用する。我々の手法は、より少ない計算コストで、他の頑健なフィルタリング手法(変分ベイズに基づくものなど)に適合または優れる。我々は、物体追跡、高次元カオスシステムにおける状態推定、ニューラルネットワークのオンライン学習など、外乱測定によるフィルタリング問題に対して、これを実証的に示す。 We derive a novel, provably robust, and closed-form Bayesian update rule for online filtering in state-space models in the presence of outliers and misspecified measurement models. Our method combines generalised Bayesian inference with filtering methods such as the extended and ensemble Kalman filter. We use the former to show robustness and the latter to ensure computational efficiency in the case of nonlinear models. Our method matches or outperforms other robust filtering methods (such as those based on variational Bayes) at a much lower computational cost. We show this empirically on a range of filtering problems with outlier measurements, such as object tracking, state estimation in high-dimensional chaotic systems, and online learning of neural networks.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
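One way to read "robust and closed-form" is as a Kalman update in which each measurement's influence is scaled by a weight that depends on its residual. The sketch below illustrates that idea only. It assumes an inverse multi-quadric weight $w = (1 + \lVert y - Hm \rVert^2 / c^2)^{-1/2}$ and implements the downweighting by inflating the measurement covariance to $R / w^2$. Treat it as an illustration in the spirit of generalised Bayes, not as the authors' exact update rule. The soft threshold `c` is a hypothetical tuning parameter.

```python
import numpy as np


def robust_kf_update(m, P, y, H, R, c=2.0):
    """Linear Kalman update that downweights outlying observations.

    m, P: prior mean (k,) and covariance (k, k); y: observation (p,);
    H: observation matrix (p, k); R: observation noise covariance (p, p);
    c: soft threshold on the residual norm (hypothetical default)."""
    resid = y - H @ m
    w = 1.0 / np.sqrt(1.0 + resid @ resid / c**2)  # IMQ weight in (0, 1]
    R_eff = R / w**2                               # large residual -> inflated noise
    S = H @ P @ H.T + R_eff                        # innovation covariance
    K = np.linalg.solve(S, H @ P).T                # = P H^T S^{-1} (P, S symmetric)
    return m + K @ resid, P - K @ S @ K.T
```

For an ordinary observation, the weight stays close to 1 and the result is close to a standard Kalman update. A gross outlier produces a weight near 0, so the mean barely moves. The update remains a single closed-form step, which keeps the cost at the level of an ordinary Kalman filter.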
# 量子対古典$P$-divisibility Quantum vs. classical $P$-divisibility ( http://arxiv.org/abs/2405.05794v2 ) ライセンス: Link先を確認	Fabio Benatti, Dariusz Chruściński, Giovanni Nichele,	(参考訳) 古典的および量子的非マルコフ過程において、$P$-divisibilityは中心的な概念である。直交射影の完全な集合によって生成される固定可換代数に制限されるとき、任意の量子力学は自然に古典的確率過程を与える。量子発生器が$P$分割可能な量子力学を生じさせるのは、古典的還元の可能な全ての還元が可分な古典的確率過程をもたらす場合に限る。しかし、この性質は、生成元の代わりに量子力学写像の古典的還元を演算した場合は成り立たない:例えば、ユニタリ力学の場合、古典的還元の$P$-divisibilityは必然的に失われ、情報逆フローが現れる。代わりに、純粋に散逸的な量子ビット進化のいくつかの重要なクラスに対して、量子$P$-divisibilityは常に古典的な$P$-divisibilityを意味し、したがって量子的シナリオと古典的シナリオの両方において情報のバックフローが欠如している。それとは対照的に、直交共変量子ビット力学の幅広いクラスにおいて、古典的な$P$分割性の喪失は、ユニタリの場合のように、純粋に散逸可能な$P$分割可能な量子力学の古典的な還元から生じることが示される。さらに、そのような効果は、時間進化する量子状態のコヒーレンスに格納される情報バックフローの観点から解釈することができる。 $P$-divisibility is a central concept in both classical and quantum non-Markovian processes; in particular, it is strictly related to the notion of information backflow. When restricted to a fixed commutative algebra generated by a complete set of orthogonal projections, any quantum dynamics naturally provides a classical stochastic process. It is indeed well known that a quantum generator gives rise to a $P$-divisible quantum dynamics if and only if all its possible classical reductions give rise to divisible classical stochastic processes. Yet, this property does not hold if one operates a classical reduction of the quantum dynamical maps instead of their generators: as an example, for a unitary dynamics, $P$-divisibility of its classical reduction is inevitably lost, which thus exhibits information backflow. Instead, for some important classes of purely dissipative qubit evolutions, quantum $P$-divisibility always implies classical $P$-divisibility and thus lack of information backflow both in the quantum and classical scenarios. 
In contrast, for a wide class of orthogonally covariant qubit dynamics, we show that loss of classical $P$-divisibility can originate from the classical reduction of a purely dissipative $P$-divisible quantum dynamics, as it does in the unitary case. Moreover, such an effect can be interpreted in terms of information backflow, with the incoming information being stored in the coherences of the time-evolving quantum state.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
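The unitary case mentioned above can be made concrete. The sketch below assumes, for illustration, a qubit rotation $U(\theta) = e^{-i\theta\sigma_x}$ with $\theta = \omega t$, reduced onto the computational basis. Its classical reduction is the stochastic matrix $T_{ij} = |\langle i|U|j\rangle|^2$. The classical process is divisible between times $s$ and $t$ only if the propagator $V(t,s) = T(t)\,T(s)^{-1}$ is itself stochastic. For this rotation, the entries of $V$ are $(1 \pm \lambda)/2$ with $\lambda = \cos 2\theta_t / \cos 2\theta_s$. Once $|\lambda| > 1$, an entry turns negative and divisibility is lost.

```python
import numpy as np


def classical_reduction(theta):
    """T_ij = |<i|U|j>|^2 for U = exp(-i * theta * sigma_x) (columns sum to 1)."""
    c, s = np.cos(theta) ** 2, np.sin(theta) ** 2
    return np.array([[c, s], [s, c]])


def intermediate_map(theta_t, theta_s):
    """V(t, s) = T(t) T(s)^{-1}; requires cos(2 * theta_s) != 0."""
    return classical_reduction(theta_t) @ np.linalg.inv(classical_reduction(theta_s))


def is_stochastic(V, tol=1e-12):
    """Column-stochastic check: non-negative entries, columns summing to 1."""
    return bool(np.all(V >= -tol) and np.allclose(V.sum(axis=0), 1.0))
```

For early times, e.g. $\theta_s = 0.1$ and $\theta_t = 0.2$, the propagator is stochastic. Across a zero of $\cos 2\theta$, e.g. $\theta_s = 0.7$ and $\theta_t = 1.2$, $\lambda \approx -4.3$ and $V$ acquires a negative entry. This matches the statement that the classical reduction of a unitary dynamics is not $P$-divisible. The same mechanism makes the distinguishability of two classical initial distributions, which scales with $|\cos 2\theta|$, grow again after shrinking, and that regrowth is information backflow.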
# DOLOMITES:ドメイン固有の長文形式の方法論的タスク DOLOMITES: Domain-Specific Long-Form Methodical Tasks ( http://arxiv.org/abs/2405.05938v2 ) ライセンス: Link先を確認	Chaitanya Malaviya, Priyanka Agrawal, Kuzman Ganchev, Pranesh Srinivasan, Fantine Huot, Jonathan Berant, Mark Yatskar, Dipanjan Das, Mirella Lapata, Chris Alberti,	(参考訳) さまざまな分野の専門家は、自らの仕事の計画、組織化、報告のために方法論的な記述タスクを日常的に実行している。患者に対する鑑別診断を書く臨床医から、学生のための授業計画を書く教師まで、これらのタスクは広く行き渡っており、与えられた入力に対して構造化された長文出力を体系的に生成する必要がある。本研究では,タスク目標,手順,入力,出力の形式で構成された方法論的タスクの類型を考案し,25分野にわたる数百人の専門家から収集した519件のタスクの仕様を含む新しいベンチマークであるDoLoMiTesを紹介する。さらに,本ベンチマークには,具体的な入力例と出力例を伴う方法論的タスクの具体例(計1,857件)が含まれており,これらは各タスクについて最大10件のモデル生成例に対する専門家の修正を収集して得たものである。これらの例を用いて現代の言語モデルを評価し,方法論的タスクの自動化は,与えられたコンテキストとドメイン知識を活用しながら複雑な推論を行う必要があるため,困難な長文生成問題であることを明らかにする。 Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient, to a teacher writing a lesson plan for students, these tasks are pervasive and require the methodical generation of structured long-form output for a given input. We develop a typology of methodical tasks structured in the form of a task objective, procedure, input, and output, and introduce DoLoMiTes, a novel benchmark with specifications for 519 such tasks elicited from hundreds of experts from across 25 fields. Our benchmark further contains specific instantiations of methodical tasks with concrete input and output examples (1,857 in total) which we obtain by collecting expert revisions of up to 10 model-generated examples of each task. We use these examples to evaluate contemporary language models, highlighting that automating methodical tasks is a challenging long-form generation problem, as it requires performing complex inferences while drawing upon the given context as well as domain knowledge.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
# マイクロバイオームのハビタット特異性における遺伝子相互作用効果のための全ゲノムトランス Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity ( http://arxiv.org/abs/2405.05998v2 ) ライセンス: Link先を確認	Zhufeng Li, Sandeep S Cranganore, Nicholas Youngblut, Niki Kilbertus,	(参考訳) マイクロバイオーム内の膨大な遺伝的多様性を活用することで、複雑な表現型に関する非並列的な洞察が得られるが、そのような特徴をゲノムデータから正確に予測し理解する作業は依然として困難である。本研究では、遺伝子ベクター化のための既存の大規模モデルを利用して、微生物ゲノム配列全体から生息地特異性を予測する枠組みを提案する。本モデルに基づいて,微生物を多様な環境に適応させる遺伝子相互作用効果を解明するための属性技術を開発した。我々は、異なる生息地から得られた高品質のマイクロバイオームゲノムの大規模なデータセット上で、我々のアプローチを訓練し、検証する。我々は、確固とした予測性能を示すだけでなく、ゲノム全体の配列レベルの情報によって、複雑な表現型に基づく遺伝子関連を識別する方法についても示している。我々の属性は、既知の重要な相互作用ネットワークを復元し、実験的なフォローアップのための新しい候補を提案する。 Leveraging the vast genetic diversity within microbiomes offers unparalleled insights into complex phenotypes, yet the task of accurately predicting and understanding such traits from genomic data remains challenging. We propose a framework taking advantage of existing large models for gene vectorization to predict habitat specificity from entire microbial genome sequences. Based on our model, we develop attribution techniques to elucidate gene interaction effects that drive microbial adaptation to diverse environments. We train and validate our approach on a large dataset of high quality microbiome genomes from different habitats. We not only demonstrate solid predictive performance, but also how sequence-level information of entire genomes allows us to identify gene associations underlying complex phenotypes. Our attribution recovers known important interaction networks and proposes new candidates for experimental follow up.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
# 局所高調波距離を用いた擬似近傍分類法 A Novel Pseudo Nearest Neighbor Classification Method Using Local Harmonic Mean Distance ( http://arxiv.org/abs/2405.06238v2 ) ライセンス: Link先を確認	Junzhuo Chen, Zhixin Lu, Shitong Kang,	(参考訳) 機械学習の分野では、KNN分類アルゴリズムは単純さと効率性で広く認識されている。しかしながら、K値に対する感度は、特に小さなサンプルサイズや外れ値では、分類性能に影響を及ぼす。本稿では,KNN を用いた LMPHNN (Novel Pseudo Nearest Neighbor Classification Method Using Local Harmonic Mean Distance) について紹介する。 LMPHNNは、LMPNNルールとHMDに基づく分類性能を改善するために、調和平均距離(HMD)を利用する。分類器は、各クラスに最も近い k 個の近傍を識別し、異なる局所ベクトルをプロトタイプとして生成することから始まる。 Pseudo Near neighbors (PNN) は各クラスの局所平均に基づいて作成され、サンプルのHMDと初期k群を比較して決定される。これらのカテゴリの局所平均に基づいて、クエリサンプルとPNN間のユークリッド距離を計算することで分類を決定する。さまざまな実UCIデータセットと組み合わせデータセットに関する大規模な実験は、LMPHNNと7つのKNNベースの分類器を比較し、精度、リコール、精度、F1を評価指標として用いた。 LMPHNNは平均97%の精度を達成し、他の手法を14%上回っている。平均リコールは12%改善され、平均精度は5%向上した。さらに、LMPHNNは他の手法に比べて平均F1値が13%高いことを示す。まとめると、LMPHNNは他の分類器よりも優れており、小さなサンプルサイズで低い感度を示す。 In the realm of machine learning, the KNN classification algorithm is widely recognized for its simplicity and efficiency. However, its sensitivity to the K value poses challenges, especially with small sample sizes or outliers, impacting classification performance. This article introduces a novel KNN-based classifier called LMPHNN (Novel Pseudo Nearest Neighbor Classification Method Using Local Harmonic Mean Distance). LMPHNN leverages harmonic mean distance (HMD) to improve classification performance based on LMPNN rules and HMD. The classifier begins by identifying k nearest neighbors for each class and generates distinct local vectors as prototypes. Pseudo nearest neighbors (PNNs) are then created based on the local mean for each class, determined by comparing the HMD of the sample with the initial k group. Classification is determined by calculating the Euclidean distance between the query sample and PNNs, based on the local mean of these categories. 
Extensive experiments on various real UCI datasets and combined datasets compare LMPHNN with seven KNN-based classifiers, using precision, recall, accuracy, and F1 as evaluation metrics. LMPHNN achieves an average precision of 97%, surpassing other methods by 14%. The average recall improves by 12%, with an average accuracy enhancement of 5%. Additionally, LMPHNN demonstrates a 13% higher average F1 value compared to other methods. In summary, LMPHNN outperforms other classifiers, showcasing lower sensitivity with small sample sizes.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
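Two ingredients of the classifier described above are easy to show in code: per-class local mean vectors and a harmonic mean distance. The sketch below combines them in a simplified way. For each class, it takes the $k$ nearest training points and forms the local means $m_j$ of the $j$ closest points ($j = 1, \dots, k$). It then scores the class by $\mathrm{HMD} = k / \sum_j \lVert x - m_j \rVert^{-1}$ and predicts the class with the smallest score. This is a simplified variant that omits the pseudo-nearest-neighbour weighting of the full LMPHNN rule. The function name is hypothetical.

```python
import numpy as np


def local_mean_hmd_classify(X_train, y_train, x, k=3, eps=1e-12):
    """Predict the class of x by harmonic mean distance to per-class local means.

    X_train: (n, dim) training points; y_train: (n,) labels; x: (dim,) query."""
    best_label, best_score = None, np.inf
    for label in np.unique(y_train):
        pts = X_train[y_train == label]
        order = np.argsort(np.linalg.norm(pts - x, axis=1))[:k]
        neighbours = pts[order]                      # k nearest points of this class
        kk = len(neighbours)                         # a class may have fewer than k points
        local_means = np.cumsum(neighbours, axis=0) / np.arange(1, kk + 1)[:, None]
        dists = np.linalg.norm(local_means - x, axis=1) + eps  # eps avoids 1/0
        hmd = kk / np.sum(1.0 / dists)               # harmonic mean distance
        if hmd < best_score:
            best_label, best_score = label, hmd
    return best_label
```

The harmonic mean is dominated by the smallest distances. As a result, a class whose nearest local means sit close to the query wins even if its larger-$j$ means are pulled away by outliers. That is the intuition behind using HMD to reduce sensitivity to $k$.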
# DP-DyLoRA:動的低ランク適応を用いた差分プライバシー付き連合学習下でのTransformerベースモデルのオンデバイス微調整 DP-DyLoRA: Fine-Tuning Transformer-Based Models On-Device under Differentially Private Federated Learning using Dynamic Low-Rank Adaptation ( http://arxiv.org/abs/2405.06368v2 ) ライセンス: Link先を確認	Jie Xu, Karthikeyan Saravanan, Rogier van Dalen, Haaris Mehmood, David Tuckey, Mete Ozay,	(参考訳) フェデレートラーニング(FL)により、IoT(Internet of Things)システムのクライアントは、ローカルデータをサーバと共有することなく、グローバルモデルを協調的にトレーニングすることができる。しかし、サーバへのクライアントのコントリビューションは機密情報を漏洩させる可能性がある。差分プライバシ(DP)は、クライアントのコントリビューションにランダム性を加えるメカニズムを備えた、正式なプライバシ保証を提供することによって、そのようなリークに対処する。このランダム性により、現代のIoTシステムで一般的な大きなトランスフォーマーベースのモデルをトレーニングすることは不可能になる。本研究では,フェデレート学習システムにおいて,差分プライバシの下で大規模なトランスフォーマーベースモデルをオンデバイスで微調整することの実用性を実証的に評価する。我々は、音声認識、コンピュータビジョン(CV)、自然言語理解(NLU)など、多分野にわたるタスクに対して、様々なシステム特性に関する包括的な実験を行う。この結果から,DP-FL下での完全微調整は一般に大きな性能劣化をもたらすが,これはパラメータ効率のよい微調整(PEFT)によって寄与の次元を削減することで緩和できることが示された。既存のDP-PEFT手法のベンチマークでは,DP-Low-Rank Adaptation (DP-LoRA) が他の手法より一貫して優れていることが示された。ランクを可変にするDyLoRAはさらに有望なアプローチだが、FLと素朴に組み合わせると差分プライバシーがそのまま破られてしまう。そこで本研究では,差分プライバシーと組み合わせ可能な適応手法を提案し,DP-DyLoRAと呼ぶ。最後に、DPによる精度劣化と単語誤り率(WER)の増加を、100万のクライアントと厳しいプライバシー予算 $\epsilon=2$ の下で、それぞれ2%未満と7%未満に抑えることができる。 Federated learning (FL) allows clients in an Internet of Things (IoT) system to collaboratively train a global model without sharing their local data with a server. However, clients' contributions to the server can still leak sensitive information. Differential privacy (DP) addresses such leakage by providing formal privacy guarantees, with mechanisms that add randomness to the clients' contributions. The randomness makes it infeasible to train large transformer-based models, common in modern IoT systems. In this work, we empirically evaluate the practicality of fine-tuning large-scale on-device transformer-based models with differential privacy in a federated learning system. 
We conduct comprehensive experiments on various system properties for tasks spanning a multitude of domains: speech recognition, computer vision (CV), and natural language understanding (NLU). Our results show that full fine-tuning under differentially private federated learning (DP-FL) generally leads to huge performance degradation, which can be alleviated by reducing the dimensionality of contributions through parameter-efficient fine-tuning (PEFT). Our benchmarks of existing DP-PEFT methods show that DP-Low-Rank Adaptation (DP-LoRA) consistently outperforms other methods. DyLoRA, which makes the rank variable, is an even more promising approach, but naively combining it with FL would directly break differential privacy. We therefore propose an adaptation method that can be combined with differential privacy and call it DP-DyLoRA. Finally, we are able to reduce the accuracy degradation and word error rate (WER) increase due to DP to less than 2% and 7%, respectively, with 1 million clients and a stringent privacy budget of $\epsilon=2$.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
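The claim that PEFT mitigates DP degradation by reducing the dimensionality of contributions can be seen in a standard Gaussian-mechanism aggregation round in the style of DP-FedAvg. The sketch below clips each client update to an L2 norm bound, sums the clipped updates, adds Gaussian noise with standard deviation `noise_multiplier * clip_norm`, and averages. Per coordinate, the noise is the same whatever the model size. Its total norm, however, grows like the square root of the number of parameters, so a rank-8 LoRA update carries far less noise than a full-matrix update. The sizes below are illustrative assumptions: a 768×768 matrix with rank 8. The sketch does no privacy accounting, so it does not compute $\epsilon$. It also does not show how DP-DyLoRA handles variable ranks.

```python
import numpy as np


def clip_update(delta, clip_norm):
    """Scale a flattened client update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(delta)
    return delta * min(1.0, clip_norm / (norm + 1e-12))


def dp_fedavg_round(client_updates, clip_norm, noise_multiplier, rng):
    """One Gaussian-mechanism aggregation round (DP-FedAvg style sketch)."""
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(client_updates)


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of a LoRA adapter B @ A with A: (rank, d_in), B: (d_out, rank)."""
    return rank * (d_in + d_out)
```

With all-zero updates, the aggregate is pure noise. Its norm for a full 768×768 update (589,824 parameters) is about $\sqrt{48} \approx 6.9$ times that for the corresponding rank-8 LoRA update (12,288 parameters).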
# 変形可能な物体に対する対応の学習 Learning Correspondence for Deformable Objects ( http://arxiv.org/abs/2405.08996v2 ) ライセンス: Link先を確認	Priya Sundaresan, Aditya Ganapathi, Harry Zhang, Shivin Devgon,	(参考訳) 本稿では,古典的手法と学習的手法を比較し,変形可能なオブジェクト,すなわち布とロープの画素対応の問題について検討する。布とロープは、構成空間が大きいため伝統的に解析的なモデル化が最も難しい変形可能なオブジェクトの一つであり、布の折り畳み、ロープの結び、Tシャツの折り畳み、カーテンの閉鎖などのロボット作業の文脈において重要である。対応問題はロボット工学から強く動機付けられており、セマンティックな把持、オブジェクト追跡、および対応の上に構築された操作方策を含む広範囲の応用がある。本稿では,SIFT,SURF,ORBなどの特徴マッチングによる既存の古典的対応手法と,最近発表された学習に基づく2つの手法であるTimeCycleとDense Object Netsを網羅的に調査する。我々は,(1) 変形可能なオブジェクトの合成画像をシミュレーション・レンダリングするフレームワーク(シミュレーションドメインと実ドメイン間の転移を示す定性的結果を含む),(2) Dense Object Netsを拡張する新しい学習ベースの対応手法,(3) 最先端の対応手法間の標準化された比較,の3つの主な貢献を行う。提案手法は,非剛体(および剛体)物体に対する時間的・空間的に連続な対応を学習するための柔軟で汎用的な定式化を提供する。すべての手法について二乗平均平方根誤差(RMSE)の統計を報告し,Dense Object Netsが対応付けにおいてベースラインの古典的手法を上回り,提案したDense Object Netsの拡張も同程度の性能を示すことを確認した。 We investigate the problem of pixelwise correspondence for deformable objects, namely cloth and rope, by comparing both classical and learning-based methods. We choose cloth and rope because, with their large configuration spaces, they are traditionally among the most difficult deformable objects to model analytically, and they are relevant to robotic tasks such as cloth folding, rope knot-tying, T-shirt folding, and curtain closing. The correspondence problem is strongly motivated by robotics, with wide-ranging applications including semantic grasping, object tracking, and manipulation policies built on top of correspondences. We present an exhaustive survey of existing classical methods for doing correspondence via feature matching, including SIFT, SURF, and ORB, and of two recently published learning-based methods, TimeCycle and Dense Object Nets. 
We make three main contributions: (1) a framework for simulating and rendering synthetic images of deformable objects, with qualitative results demonstrating transfer between our simulated and real domains; (2) a new learning-based correspondence method extending Dense Object Nets; and (3) a standardized comparison across state-of-the-art correspondence methods. Our proposed method provides a flexible, general formulation for learning temporally and spatially continuous correspondences for nonrigid (and rigid) objects. We report root mean squared error statistics for all methods and find that Dense Object Nets outperforms baseline classical methods for correspondence, and our proposed extension of Dense Object Nets performs similarly.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
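Dense-descriptor methods such as Dense Object Nets find correspondences by comparing per-pixel descriptors. The sketch below shows that matching step together with an RMSE metric of the kind the abstract reports. It assumes the descriptor images are already computed (shape `(H, W, D)`), matches a query pixel to the pixel in the other image with the smallest L2 descriptor distance, and measures pixel RMSE against ground-truth matches. The function names are hypothetical, and the paper's exact error definition may differ.

```python
import numpy as np


def match_pixel(desc_a, desc_b, u, v):
    """Return (u, v) in image B whose descriptor is nearest (L2) to pixel (u, v) of A.

    desc_a, desc_b: per-pixel descriptor images of shape (H, W, D);
    u is the column index and v the row index."""
    query = desc_a[v, u]
    dist = np.linalg.norm(desc_b - query, axis=-1)        # (H, W) descriptor distances
    row, col = np.unravel_index(np.argmin(dist), dist.shape)
    return int(col), int(row)


def correspondence_rmse(pred, gt):
    """Root mean squared Euclidean pixel error between matched point lists."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    return float(np.sqrt(np.mean(np.sum((pred - gt) ** 2, axis=1))))
```

Shifting a descriptor image one column to the right (`np.roll(desc, 1, axis=1)`) should map pixel `(2, 3)` to `(3, 3)`, and the RMSE between identical point lists is zero. A classical pipeline such as SIFT would replace the learned descriptor image with sparse keypoint descriptors but keep the same nearest-neighbour matching idea.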
# カーネルリッジ回帰の飽和効果について On the Saturation Effect of Kernel Ridge Regression ( http://arxiv.org/abs/2405.09362v2 ) ライセンス: Link先を確認	Yicheng Li, Haobo Zhang, Qian Lin,	(参考訳) 飽和効果は、真の関数の滑らかさが一定の水準を超えると、カーネルリッジ回帰(KRR)が情報理論的下界を達成できなくなる現象を指す。飽和効果は実際に広く観察されており、KRRの飽和下界は数十年にわたって予想されてきた。本稿では、この長年の予想の証明を与える。 The saturation effect refers to the phenomenon that kernel ridge regression (KRR) fails to achieve the information-theoretic lower bound when the smoothness of the ground-truth function exceeds a certain level. The saturation effect has been widely observed in practice, and a saturation lower bound of KRR has been conjectured for decades. In this paper, we provide a proof of this long-standing conjecture.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
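The estimator whose saturation is studied here is the standard kernel ridge regressor. Given training data $(X, y)$ with kernel matrix $K$, it predicts $\hat f(x) = k(x, X)\,(K + n\lambda I)^{-1} y$. The regularization $\lambda$ is the only knob. The saturation effect says that, beyond a certain smoothness of the true function, no choice of $\lambda$ lets KRR reach the optimal rate. The sketch below shows only the estimator, not the rate analysis. It assumes a Gaussian (RBF) kernel and the $n\lambda$ scaling convention, which is one common choice.

```python
import numpy as np


def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between rows of A (n, d) and B (m, d)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))


def krr_fit_predict(X, y, X_test, lam, bandwidth=1.0):
    """Kernel ridge regression: f(x) = k(x, X) (K + n * lam * I)^{-1} y."""
    n = len(X)
    K = rbf_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)   # dual coefficients
    return rbf_kernel(X_test, X, bandwidth) @ alpha
```

With a small $\lambda$, the fit nearly interpolates a smooth target. A large $\lambda$ shrinks predictions towards zero. The bias that this shrinkage introduces, which cannot be tuned away for very smooth targets, is the source of the saturation phenomenon.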
# NeRO: ニューラルネットワークによる道路表面の再構築 NeRO: Neural Road Surface Reconstruction ( http://arxiv.org/abs/2405.10554v2 ) ライセンス: Link先を確認	Ruibo Wang, Song Zhang, Ping Huang, Donghai Zhang, Haoyu Chen,	(参考訳) 道路面の正確な再構築は、特に自動運転における様々な用途において重要である。本稿では,道路面を再構築するための位置エンコーディング付き多層パーセプトロン(MLP)フレームワークを導入する。入力は世界座標x,y,出力は高さ,色,意味情報である。本手法の有効性は,車両カメラのポーズ,LiDAR点群,SFM点群など多様な道路高さのソースとの互換性,スパースラベルやノイズを含むセマンティック予測といった画像のセマンティックノイズに対する頑健性,および高速な学習によって示される。これは,道路面の可視化,4Dラベリング,セマンティックグルーピングを必要とするアプリケーションなど,意味情報付きで道路面をレンダリングする用途への有望な応用を示唆している。 Accurately reconstructing road surfaces is pivotal for various applications, especially autonomous driving. This paper introduces a positional-encoding multi-layer perceptron (MLP) framework to reconstruct road surfaces, taking world coordinates x and y as input and producing height, color, and semantic information as output. The effectiveness of this method is demonstrated through its compatibility with a variety of road height sources, such as vehicle camera poses, LiDAR point clouds, and SFM point clouds; its robustness to semantic noise in images, such as sparse labels and noisy semantic predictions; and its fast training speed. These properties indicate a promising application for rendering road surfaces with semantics, particularly in applications that demand road-surface visualization, 4D labeling, and semantic grouping.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
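The input/output mapping described above can be sketched in a few lines: world $(x, y)$ goes in, and height, color, and semantics come out. The sketch below assumes a NeRF-style Fourier positional encoding and a single hidden ReLU layer with randomly initialised, untrained weights. The layer sizes, number of frequencies, sigmoid on the color head, and all function names are illustrative assumptions, not the paper's architecture.

```python
import numpy as np


def positional_encoding(xy, num_freqs=6):
    """NeRF-style Fourier features for 2-D coordinates.

    xy: (N, 2). Returns (N, 2 + 4 * num_freqs): raw coordinates followed by
    sin/cos of each coordinate at frequencies pi * 2^0 ... pi * 2^(L-1)."""
    xy = np.asarray(xy, dtype=float)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi                    # (L,)
    angles = xy[:, :, None] * freqs[None, None, :]                   # (N, 2, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (N, 2, 2L)
    return np.concatenate([xy, enc.reshape(len(xy), -1)], axis=1)


def init_mlp(in_dim, hidden, num_classes, rng):
    """Random weights for a one-hidden-layer MLP (untrained, for illustration)."""
    out_dim = 1 + 3 + num_classes  # height, RGB, semantic logits
    return {"W1": rng.normal(0, 0.1, (in_dim, hidden)), "b1": np.zeros(hidden),
            "W2": rng.normal(0, 0.1, (hidden, out_dim)), "b2": np.zeros(out_dim)}


def road_mlp(params, xy, num_freqs=6):
    """Map world (x, y) to (height, rgb in [0, 1], semantic logits)."""
    h = np.maximum(0.0, positional_encoding(xy, num_freqs) @ params["W1"] + params["b1"])
    out = h @ params["W2"] + params["b2"]
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:4]))  # sigmoid keeps colours in [0, 1]
    return out[:, 0], rgb, out[:, 4:]
```

The encoding lets a small MLP represent fine spatial detail such as lane markings, which raw $(x, y)$ inputs tend to blur. Training would fit the three heads to heights from poses or point clouds, to image colors, and to semantic labels.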
# ViViD:拡散モデルを用いたビデオバーチャルトライオン ViViD: Video Virtual Try-on using Diffusion Models ( http://arxiv.org/abs/2405.11794v2 ) ライセンス: Link先を確認	Zixun Fang, Wei Zhai, Aimin Su, Hongliang Song, Kai Zhu, Mao Wang, Yu Chen, Zhiheng Liu, Yang Cao, Zheng-Jun Zha,	(参考訳) Video Virtual try-onは、服のアイテムを対象者のビデオに転送することを目的としている。画像ベーストライオンの技法をフレームごとに直接適用すると時間的に一貫性のない結果が生じる一方、従来のビデオベーストライオンソリューションでは、視覚的品質が低く、ぼやけた結果しか得られない。本稿では,ビデオ仮想試着の課題に取り組むために,強力な拡散モデルを用いた新しいフレームワークViViDを提案する。具体的には、細粒度の衣服のセマンティックな特徴を抽出するGarment Encoderを設計し、提案したアテンション特徴融合機構を通じて、衣服の詳細を捉えて対象映像に注入するようモデルを導く。空間的時間的整合性を確保するために,ポーズ信号を符号化する軽量なPose Encoderを導入して衣服と姿勢の相互作用を学習させるとともに,階層型時間モジュールをテキストから画像への安定拡散モデルに挿入することで,よりコヒーレントでライフライクなビデオ合成を実現する。さらに、ビデオバーチャルトライオンのタスクとしてこれまでで最大規模であり、最も多様な種類の衣服と最高の解像度を備えたデータセットを収集する。大規模な実験により,本手法は良好なビデオ試着結果が得られることが示された。データセット、コード、ウェイトは公開予定である。プロジェクトページ: https://becauseimbatman0.github.io/ViViD。 Video virtual try-on aims to transfer a clothing item onto the video of a target person. Directly applying image-based try-on techniques to the video domain frame by frame produces temporally inconsistent results, while previous video-based try-on solutions generate only low-quality, blurry results. In this work, we present ViViD, a novel framework employing powerful diffusion models to tackle the task of video virtual try-on. Specifically, we design the Garment Encoder to extract fine-grained clothing semantic features, guiding the model to capture garment details and inject them into the target video through the proposed attention feature fusion mechanism. To ensure spatial-temporal consistency, we introduce a lightweight Pose Encoder to encode pose signals, enabling the model to learn the interactions between clothing and human posture, and we insert hierarchical Temporal Modules into the text-to-image stable diffusion model for more coherent and lifelike video synthesis. 
Furthermore, we collect a new dataset, which is the largest, with the most diverse types of garments and the highest resolution for the task of video virtual try-on to date. Extensive experiments demonstrate that our approach is able to yield satisfactory video try-on results. The dataset, codes, and weights will be publicly available. Project page: https://becauseimbatman0.github.io/ViViD.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
# 直感的なファインチューニング:1つのプロセスへのアライメントの簡易化を目指して Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process ( http://arxiv.org/abs/2405.11870v2 ) ライセンス: Link先を確認	Ermo Hua, Biqing Qi, Kaiyan Zhang, Yue Yu, Ning Ding, Xingtai Lv, Kai Tian, Bowen Zhou,	(参考訳) Supervised Fine-Tuning (SFT) と Preference Optimization (PO) は、事前学習後の言語モデル(LM)の機能を強化するための2つの基本的なプロセスである。 SFTは訓練効率に優れるが、POはより優れたアライメントを提供するため、しばしば組み合わせられる。しかしながら、一般的なプラクティスは、最適化の目的を統合することなく、それらをシーケンシャルに適用するだけであり、パラダイムギャップを埋め、両方の強みを取る機会を無視している。統一された理解を得るために、我々は、Markov Decision Process (MDP)フレームワーク内のトークンレベルで定義された2つのサブプロセス、選好推定と遷移最適化でSFTとPOを解釈する。このモデリングにより、SFT は劣等な推定と最適化を伴う PO の特殊ケースに過ぎないことが分かる。 POはモデルが生成した回答全体の質を評価するのに対し、SFTはターゲットの回答の先行トークンに基づいて予測トークンをスコアするだけである。したがって、SFTはモデルの性能を過大評価し、劣等な最適化をもたらす。この観点から,SFT と Preference Optimization をひとつのプロセスに統合する直感的ファインチューニング (IFT) を導入する。 IFTは、時間的残差接続によってLMの回答全体に対する直感的な感覚を捉えつつ、単一のポリシーとSFTと同量の選好ラベルなしデータのみに依存する。我々の実験により、IFTはいくつかのタスク、特に生成、推論、事実追従の能力を必要とするタスクにおいて、SFTのシーケンシャルなレシピやいくつかの典型的なPreference Optimization手法と同等か、あるいはそれ以上の性能を示すことが分かった。説明可能なFrozen Lakeゲームにより、競争力のある方策を得る上でのIFTの有効性がさらに検証される。 Supervised Fine-Tuning (SFT) and Preference Optimization (PO) are two fundamental processes for enhancing the capabilities of Language Models (LMs) after pre-training, aligning them better with human preferences. SFT excels in training efficiency, while PO delivers better alignment, so the two are often combined. However, common practice simply applies them sequentially without integrating their optimization objectives, which misses the opportunity to bridge their paradigm gap and take the strengths of both. To obtain a unified understanding, we interpret SFT and PO through two sub-processes, Preference Estimation and Transition Optimization, defined at the token level within the Markov Decision Process (MDP) framework. This modeling shows that SFT is only a specialized case of PO with inferior estimation and optimization.
PO evaluates the quality of the model's entire generated answer, whereas SFT only scores predicted tokens based on preceding tokens from target answers. SFT therefore overestimates the model's ability, leading to inferior optimization. Building on this view, we introduce Intuitive Fine-Tuning (IFT) to integrate SFT and Preference Optimization into a single process. IFT captures LMs' intuitive sense of the entire answer through a temporal residual connection, while relying solely on a single policy and the same volume of non-preference-labeled data as SFT. Our experiments show that IFT performs comparably to or better than sequential recipes of SFT and some typical Preference Optimization methods across several tasks, particularly those that require generation, reasoning, and fact-following abilities. An explainable Frozen Lake game further validates the effectiveness of IFT in obtaining a competitive policy.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
# Brewer-Nash Scrutinized:Write Revocationを特徴とするポリシーの機械的チェック Brewer-Nash Scrutinised: Mechanised Checking of Policies featuring Write Revocation ( http://arxiv.org/abs/2405.12187v2 ) ライセンス: Link先を確認	Alfredo Capozucca, Maximiliano Cristiá, Ross Horne, Ricardo Katz,	(参考訳) 本稿では,倫理的中国壁政策に触発されたブルワー・ナッシュ・セキュリティ・ポリシー・モデルを再考する。我々はBrewer-Nashモデルで書き込みアクセスを取り消すことができるという事実に注意を向ける。書き込みアクセスのセマンティクスはもともと十分に規定されておらず、複数の解釈が生じていたため、我々は現代的な操作的意味論を与える。さらに、Kesslerから採用したより正確な定義を用いて、Brewer-Nashモデルにおける情報フローの分析を近代化する。近代化した再定式化について、Brewer & Nashによって提案された全ての定理を完全に機械化してカバーする。ほとんどの定理はツール {log} を使って自動的に確立されるが、情報フローに関する定理は例外で、{log} による補題とCoqで機械化した定理を組み合わせて確立される。ブリュワーナッシュが当初提案した全ての定理を網羅し、近代的な精度と機械化を実現したことから、より複雑なセキュリティポリシーモデルの自動チェックのための方法論への一歩として、本研究を提案する。 This paper revisits the Brewer-Nash security policy model inspired by ethical Chinese Wall policies. We draw attention to the fact that write access can be revoked in the Brewer-Nash model. The semantics of write access were originally underspecified, which led to multiple interpretations; we provide a modern operational semantics for them. We go on to modernise the analysis of information flow in the Brewer-Nash model by adopting a more precise definition adapted from Kessler. For our modernised reformulation, we provide full mechanised coverage of all theorems proposed by Brewer & Nash. Most theorems are established automatically using the tool {log}. The exception is a theorem regarding information flow, which combines a lemma in {log} with a theorem mechanised in Coq. Having covered all theorems originally posed by Brewer and Nash with modern precision and mechanisation, we propose this work as a step towards a methodology for automated checking of more complex security policy models.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
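For readers new to the model, the informal read rule and *-property from Brewer and Nash's original 1989 formulation fit in a few lines. This is a simplified sketch under our own assumptions: the class names, a single `sanitized` flag per object, and a subject's history stored as a set of objects already read. It is not the operational semantics or the {log}/Coq mechanisation described in the paper. It does reproduce the write-revocation effect the abstract highlights:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Obj:
    name: str
    company: str          # company dataset the object belongs to
    coi_class: str        # conflict-of-interest class of that company
    sanitized: bool = False


@dataclass
class Subject:
    name: str
    history: set = field(default_factory=set)  # objects this subject has read


def can_read(s, o):
    """Simple security rule: o is readable unless s has already read an
    unsanitized object from a different company in the same COI class."""
    if o.sanitized:
        return True
    return not any(
        not p.sanitized and p.coi_class == o.coi_class and p.company != o.company
        for p in s.history
    )


def read(s, o):
    if not can_read(s, o):
        raise PermissionError(f"{s.name} may not read {o.name}")
    s.history.add(o)


def can_write(s, o):
    """*-property: writing requires read access, and every unsanitized object
    read so far must come from o's own company dataset."""
    return can_read(s, o) and all(p.sanitized or p.company == o.company
                                  for p in s.history)


bank_a = Obj("a1", "BankA", "banks")
bank_b = Obj("b1", "BankB", "banks")
oil_x = Obj("x1", "OilX", "oil")

alice = Subject("alice")
read(alice, bank_a)
print(can_write(alice, bank_a))  # True
read(alice, oil_x)               # allowed: different conflict-of-interest class
print(can_write(alice, bank_a))  # False: write access has been revoked
print(can_read(alice, bank_b))   # False: Chinese Wall between BankA and BankB
```

Read access only ever shrinks as the history grows. Write access can also disappear after a perfectly legal read of an unrelated company's data, which is why the original write semantics needed a precise interpretation.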
# $SU(N)$ゲージ理論のデジタル化と沈み込み Digitization and subduction of $SU(N)$ gauge theories ( http://arxiv.org/abs/2405.12204v2 ) ライセンス: Link先を確認	Benoît Assi, Henry Lamm,	(参考訳) 量子コンピュータ上の格子ゲージ理論のシミュレーションには、ゲージ場のデジタル化が必要である。一つのアプローチは連続ゲージ群を離散部分群に置換することを含むが、この近似の含意はいまだに明確化する必要がある。洞察を得るために, 離散結晶状部分群に対する$SU(2)$および$SU(3)$の沈み込みについて検討する。古典的な格子計算を用いて,沈み込みによって得られる直和が有用な情報を与え,デジタル化の影響を緩和するために格子作用に加えるべき追加項の特定に役立つことを示す。さらに、固定格子間隔において$\Sigma(360 \times 3)$のすべての既約表現の静的ポテンシャルを計算する。以上の結果から, 単一の$\Sigma(360 \times 3)$既約表現に沈み込む既約表現について, $SU(3)$のカシミールスケーリングとのパーセントレベルの一致が明らかとなった。これは近似品質の診断尺度であり、いくつかの既約表現は期待結果と密接に一致し、他の表現は大きな偏差を示す。 The simulation of lattice gauge theories on quantum computers necessitates digitizing gauge fields. One approach involves substituting the continuous gauge group with a discrete subgroup, but the implications of this approximation still need to be clarified. To gain insights, we investigate the subduction of $SU(2)$ and $SU(3)$ to discrete crystal-like subgroups. Using classical lattice calculations, we show that subduction offers valuable information based on subduced direct sums. This information helps us identify additional terms to incorporate into the lattice action that can mitigate the effects of digitization. Furthermore, we compute the static potentials of all irreducible representations of $\Sigma(360 \times 3)$ at a fixed lattice spacing. Our results reveal percent-level agreement with the Casimir scaling of $SU(3)$ for irreducible representations that subduce to a single $\Sigma(360 \times 3)$ irreducible representation. This provides a diagnostic measure of approximation quality, as some irreducible representations closely match the expected results while others exhibit significant deviations.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
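Casimir scaling, the benchmark used above, predicts that the static potential in representation $R$ is the fundamental one rescaled by $C_2(R)/C_2(F)$. For an $SU(3)$ irrep with Dynkin labels $(p, q)$, $C_2 = (p^2 + q^2 + pq + 3p + 3q)/3$, so $C_2(F) = 4/3$. The following sketch only computes these exact reference ratios, which a measured $\Sigma(360 \times 3)$ potential could be compared against. It does not reproduce the paper's lattice computation:

```python
from fractions import Fraction


def casimir_su3(p, q):
    """Quadratic Casimir of the SU(3) irrep with Dynkin labels (p, q),
    normalised so the fundamental (1, 0) gives 4/3."""
    return Fraction(p * p + q * q + p * q + 3 * p + 3 * q, 3)


def dim_su3(p, q):
    """Dimension of the SU(3) irrep (p, q)."""
    return (p + 1) * (q + 1) * (p + q + 2) // 2


def casimir_ratio(p, q):
    """Casimir-scaling prediction V_R(r) / V_F(r) = C2(R) / C2(F)."""
    return casimir_su3(p, q) / casimir_su3(1, 0)


# Triplet, sextet, octet and decuplet: ratios 1, 5/2, 9/4 and 9/2.
for p, q in [(1, 0), (2, 0), (1, 1), (3, 0)]:
    print(f"dim {dim_su3(p, q):>2}: V_R / V_F = {casimir_ratio(p, q)}")
```

Exact fractions are used so that a percent-level deviation in a measured ratio is compared against an exact target rather than a rounded one.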
# 量子アニールの物理限界を拡張した統計量子ビット凍結 Statistical Qubit Freezing Extending Physical Limit of Quantum Annealers ( http://arxiv.org/abs/2405.12594v2 ) ライセンス: Link先を確認	Jeung Rac Lee, June-Koo Kevin Rhee, Changjun Kim, Bo Hyun Choi,	(参考訳) 断熱量子アニーラは、量子ビット数の増加とともに基底状態と励起状態の間のエネルギーギャップが指数関数的に急速に縮小するため、スケーラビリティの課題に直面する。これにより基底状態の同定に誤りが生じ、熱雑音によってさらに悪化する。本稿では, 与えられた問題のアニーリングハミルトニアンモデルにおいて, 統計的に決定的な量子ビットの状態を選択的に固定する, 統計的量子ビット凍結 (SQF: statistical qubit freezing) と呼ばれる新しいアルゴリズムスキームを提案する。凍結を繰り返し適用することにより、SQFは、標準的なD-Waveの量子イジングマシンソリューションにおける従来のアニール法と比較して、例えば断熱過程のスペクトルギャップを最大60%向上させ、実質的に基本的な制限を克服する。 Adiabatic quantum annealers face scalability challenges because the energy gap between ground and excited states shrinks exponentially as the qubit count increases. This introduces errors in identifying ground states, compounded by thermal noise. We propose a novel algorithmic scheme called statistical qubit freezing (SQF) that selectively fixes the states of statistically deterministic qubits in the annealing Hamiltonian model of the given problem. By applying freezing repeatedly, SQF significantly enhances the spectral gap of an adiabatic process. In one example, the gap increases by up to 60% compared with traditional annealing methods in D-Wave's standard quantum Ising machine solution, effectively overcoming the fundamental limitations.	翻訳日:2024-05-30 00:29:50 公開日:2024-05-28
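The abstract does not say how a qubit is judged "statistically deterministic". This minimal sketch assumes a threshold on the mean spin value across annealer samples. It shows a single freezing round: frozen spins are substituted into the Ising energy $E(s) = \sum_i h_i s_i + \sum_{i<j} J_{ij} s_i s_j$, which yields a smaller problem over the remaining qubits. Samples are plain dictionaries; no D-Wave API is used, and the function names and threshold are hypothetical:

```python
def freeze_qubits(h, J, samples, threshold=0.95):
    """One round of (assumed) statistical freezing.

    h: dict qubit -> local field (must list every qubit, zeros allowed)
    J: dict (i, j) -> coupling
    samples: list of dict qubit -> +1/-1 readouts from previous anneals
    Returns (frozen, reduced_h, reduced_J, offset) such that the reduced
    problem plus offset reproduces the original energy on the free qubits.
    """
    n = len(samples)
    frozen = {}
    for q in h:
        mean = sum(s[q] for s in samples) / n
        if abs(mean) >= threshold:          # qubit almost always reads the same value
            frozen[q] = 1 if mean > 0 else -1
    reduced_h = {q: v for q, v in h.items() if q not in frozen}
    reduced_J = {}
    offset = sum(h[q] * v for q, v in frozen.items())
    for (i, j), c in J.items():
        if i in frozen and j in frozen:
            offset += c * frozen[i] * frozen[j]  # both fixed: constant term
        elif i in frozen:
            reduced_h[j] += c * frozen[i]        # coupling becomes a local field
        elif j in frozen:
            reduced_h[i] += c * frozen[j]
        else:
            reduced_J[(i, j)] = c
    return frozen, reduced_h, reduced_J, offset


def ising_energy(h, J, spins, offset=0.0):
    return (offset + sum(h[i] * spins[i] for i in h)
            + sum(c * spins[i] * spins[j] for (i, j), c in J.items()))


h = {0: -1.0, 1: 0.0, 2: 0.5}
J = {(0, 1): -1.0, (1, 2): 0.5}
samples = [{0: 1, 1: 1, 2: -1}, {0: 1, 1: -1, 2: 1},
           {0: 1, 1: 1, 2: -1}, {0: 1, 1: -1, 2: -1}]
frozen, rh, rJ, off = freeze_qubits(h, J, samples)  # qubit 0 is frozen to +1
```

In a repeated scheme, the reduced problem would be annealed again and the procedure reapplied. The abstract attributes the spectral-gap gain to this repetition.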
# Aurora: 大気の基礎モデル Aurora: A Foundation Model of the Atmosphere ( http://arxiv.org/abs/2405.13063v2 ) ライセンス: Link先を確認	Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan Weyn, Haiyu Dong, Anna Vaughan, Jayesh K. Gupta, Kit Tambiratnam, Alex Archibald, Elizabeth Heider, Max Welling, Richard E. Turner, Paris Perdikaris,	(参考訳) ディープラーニング基盤モデルは、大量のデータを活用して、さまざまな下流タスクに取り組むために適応可能な汎用的な表現を学ぶことで、科学の多くの側面に革命をもたらしている。ファンデーションモデルは、地球系の膨大なデータを活用することで、地球とそのサブシステムをモデル化する能力も変革する、という約束を持っています。ここではAuroraを紹介します。Auroraは、100万時間以上の多様な気象および気候データに基づいてトレーニングされた大気の大規模な基盤モデルです。オーロラは基礎モデリングアプローチの強みを活用して、限られた訓練データ、異種変数、極端な事象を含む様々な大気予測問題に対する運用予測を生成する。 1分以内にオーロラは5日間の大気汚染予測と10日間の高解像度気象予測を生成し、最先端の古典的なシミュレーションツールと最高の専門的なディープラーニングモデルを上回った。これらの結果は, 基礎モデルが環境予測を変換できることを示唆している。 Deep learning foundation models are revolutionizing many facets of science by leveraging vast amounts of data to learn general-purpose representations that can be adapted to tackle diverse downstream tasks. Foundation models hold the promise to also transform our ability to model our planet and its subsystems by exploiting the vast expanse of Earth system data. Here we introduce Aurora, a large-scale foundation model of the atmosphere trained on over a million hours of diverse weather and climate data. Aurora leverages the strengths of the foundation modelling approach to produce operational forecasts for a wide variety of atmospheric prediction problems, including those with limited training data, heterogeneous variables, and extreme events. In under a minute, Aurora produces 5-day global air pollution predictions and 10-day high-resolution weather forecasts that outperform state-of-the-art classical simulation tools and the best specialized deep learning models. Taken together, these results indicate that foundation models can transform environmental forecasting.	翻訳日:2024-05-30 00:20:06 公開日:2024-05-28
# SEGAN: 欠損データ補完のための半教師付き学習手法 SEGAN: semi-supervised learning approach for missing data imputation ( http://arxiv.org/abs/2405.13089v2 ) ライセンス: Link先を確認	Xiaohua Pan, Weifeng Wu, Peiran Liu, Zhen Li, Peng Lu, Peijian Cao, Jianfeng Zhang, Xianfei Qiu, YangYang Wu,	(参考訳) 多くの実世界の応用において、データ欠損は非常に一般的な現象であり、データ駆動人工知能理論や技術の開発がますます困難になっている。データ補完は、欠損データの前処理にとって重要な方法である。既存の欠損データ補完モデルの多くは、欠損データセットの既知の情報を直接使用するが、データセットに含まれるデータラベル情報が欠損データ補完モデルに与える影響を無視している。本稿では,主にジェネレータ,識別器,分類器の3つの重要なモジュールを含む半教師付き学習に基づくデータ補完モデルSEGANを提案する。 SEGANモデルでは、分類器によって、ジェネレータは欠損データの値を予測する際に既知のデータとそのラベル情報をより十分に利用できる。さらに、SEGANモデルでは欠損ヒント行列を導入し、識別器が既知のデータとジェネレータによって補完されたデータをより効果的に識別できるようにする。本稿では,分類器と欠損ヒント行列を導入したSEGANモデルが,ナッシュ平衡に達すると実データ分布特性を学習できることを理論的に証明する。最後に, 本論文では, 多数の実験を行い, 実験結果から, 現状の多変量データ補完法と比較して, SEGANモデルの性能が3%以上向上することを示した。 In many practical real-world applications, missing data is a very common phenomenon, which makes the development of data-driven artificial intelligence theory and technology increasingly difficult. Data completion is an important method for preprocessing missing data. Most existing missing data completion models directly use the known information in the dataset with missing values. However, they ignore the impact that the data label information contained in the dataset has on the missing data completion model. To this end, this paper proposes SEGAN, a missing data completion model based on semi-supervised learning. It mainly consists of three important modules: a generator, a discriminator and a classifier. In the SEGAN model, the classifier enables the generator to make fuller use of known data and its label information when predicting missing data values. In addition, the SEGAN model introduces a missing hint matrix that allows the discriminator to more effectively distinguish between known data and data filled in by the generator.
This paper theoretically proves that the SEGAN model, which introduces a classifier and a missing hint matrix, can learn the distribution characteristics of the real known data when it reaches Nash equilibrium. Finally, extensive experiments were conducted, and the results show that the performance of the SEGAN model improves by more than 3% over the current state-of-the-art multivariate data completion method.	翻訳日:2024-05-30 00:20:06 公開日:2024-05-28
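The abstract names a "missing hint matrix" but does not define it. The sketch below assumes it follows the GAIN-style hint mechanism (Yoon et al., 2018). Each entry of the observation mask is revealed to the discriminator with some probability and otherwise replaced by the uninformative value 0.5. The generator's output is merged back as $\hat{X} = M \odot X + (1 - M) \odot G$. SEGAN's classifier branch is not shown, and the function names and hint rate are hypothetical:

```python
import random


def mask_matrix(data):
    """M[i][j] = 1 if data[i][j] is observed, 0 if it is missing (None)."""
    return [[0 if v is None else 1 for v in row] for row in data]


def hint_matrix(mask, hint_rate=0.9, rng=None):
    """GAIN-style hint (assumed): reveal each mask entry with probability
    hint_rate, otherwise replace it with the uninformative value 0.5."""
    rng = rng or random.Random()
    return [[m if rng.random() < hint_rate else 0.5 for m in row] for row in mask]


def combine(data, mask, generated):
    """Imputed matrix M*X + (1-M)*G: observed entries kept, missing ones
    taken from the generator output."""
    return [[x if m else g for x, m, g in zip(row, mrow, grow)]
            for row, mrow, grow in zip(data, mask, generated)]


data = [[1.0, None, 3.0], [None, 5.0, 6.0]]
M = mask_matrix(data)
H = hint_matrix(M, hint_rate=0.5, rng=random.Random(0))
X_hat = combine(data, M, [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
```

The hint gives the discriminator partial knowledge of which entries are real. In GAIN's analysis this partial knowledge is what makes the adversarial game's equilibrium match the true data distribution, which is consistent with the Nash-equilibrium claim above.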
# EVINCEフレームワークによる医療における正解データの正確性の確保 Ensuring Ground Truth Accuracy in Healthcare with the EVINCE framework ( http://arxiv.org/abs/2405.15808v2 ) ライセンス: Link先を確認	Edward Y. Chang,	(参考訳) 誤診は医療において重大な問題であり、患者に有害な結果をもたらす。機械学習モデルによる誤ったラベル付きデータの臨床実践への伝播は容認できない。本稿では,1)診断精度を向上させ,2)誤診断を正してトレーニングデータの誤りを最小限に抑えるよう設計されたシステム、EVINCEを提案する。 EVINCEはEntropy Variation through Information Duality with Equal Competenceの略であり、この新しい理論を利用して、構造化された議論フレームワークにおける複数の大規模言語モデル (LLM) を用いた診断プロセスを最適化する。我々の実証研究はEVINCEが設計目標を達成するのに有効であることを検証している。 Misdiagnosis is a significant issue in healthcare, leading to harmful consequences for patients. The propagation of mislabeled data through machine learning models into clinical practice is unacceptable. This paper proposes EVINCE, a system designed to 1) improve diagnosis accuracy and 2) rectify misdiagnoses and minimize training data errors. EVINCE stands for Entropy Variation through Information Duality with Equal Competence, leveraging this novel theory to optimize the diagnostic process using multiple Large Language Models (LLMs) in a structured debate framework. Our empirical study verifies EVINCE to be effective in achieving its design goals.	翻訳日:2024-05-30 00:20:06 公開日:2024-05-28
# 地理的コロケーションは重要か? : 新型コロナウイルス感染時の公衆衛生会話を事例として Does Geo-co-location Matter? A Case Study of Public Health Conversations during COVID-19 ( http://arxiv.org/abs/2405.17710v1 ) ライセンス: Link先を確認	Paiheng Xu, Louiqa Raschid, Vanessa Frias-Martinez,	(参考訳) Twitter(現在のX)のようなソーシャルメディアプラットフォームは、特に新型コロナウイルス(COVID-19)の間、情報発信や公的なエンゲージメントにおいて重要な役割を担っている。公衆衛生の専門家にとって重要な目標は、マスキングや社交距離といった地域的な成果に影響を及ぼす向社会的行動を促進することである。COVID-19における地域のニュースや指針の重要性を踏まえ、本研究の目的は,局所的なエンゲージメントがソーシャルメディアの会話に与える影響を分析することである。本研究では,公衆衛生専門家(PHE)と公衆との局所的な関わりの代理指標として地理的コロケーションを用い,そのソーシャルメディア上での影響を検討した。 2020年1月から2021年11月までのTwitterの会話データセットを分析した。このデータセットは、500人近いPHEによる1万9千件以上のツイートと、35万人の参加者による約80万件の返信からなる。その結果,ジオコロケーションは,特にマスキング,ロックダウン,教育などの話題に関する会話や,学術・医学専門家との会話において,高いエンゲージメント率と関連していることが明らかとなった。感情と個人の経験に関連する語彙的特徴は、地理的に共同配置された文脈においてより一般的であった。この研究は、地理的コロケーションがソーシャルメディアのエンゲージメントにどのように影響するかを洞察し、公衆衛生メッセージングを改善するための戦略を通知する。 Social media platforms like Twitter (now X) have been pivotal in information dissemination and public engagement, especially during COVID-19. A key goal for public health experts was to encourage prosocial behavior that could impact local outcomes such as masking and social distancing. Given the importance of local news and guidance during COVID-19, the objective of our research is to analyze the effect of localized engagement on social media conversations. This study examines the impact on social media of geographic co-location, used as a proxy for localized engagement between public health experts (PHEs) and the public. We analyze a Twitter conversation dataset from January 2020 to November 2021, comprising over 19K tweets from nearly five hundred PHEs, along with approximately 800K replies from 350K participants. Our findings reveal that geo-co-location is associated with higher engagement rates, especially in conversations on topics including masking, lockdowns, and education, and in conversations with academic and medical professionals. Lexical features associated with emotion and personal experiences were more common in geo-co-located contexts.
This research provides insights into how geographic co-location influences social media engagement and can inform strategies to improve public health messaging.	翻訳日:2024-05-29 22:51:42 公開日:2024-05-28
# データのCLAIM: コンテキスト大言語モデルによるインプット精度の向上 CLAIM Your Data: Enhancing Imputation Accuracy with Contextual Large Language Models ( http://arxiv.org/abs/2405.17712v1 ) ライセンス: Link先を確認	Ahatsham Hayat, Mohammad Rashedul Hasan,	(参考訳) 本稿では,事前学習された大規模言語モデル(LLM)の拡張的知識と推論能力を利用して,表付きデータセットの欠落したデータ問題に対処する新しい戦略であるCLAIMについて紹介する。数値推定に大きく依存する従来の計算法とは異なり、CLAIMは文脈的に関係のある自然言語記述子を用いて、不足した値を埋める。このアプローチは、データセットをLLMの機能に本質的に整合した自然言語のコンテキスト化されたフォーマットに変換することで、LLMの二重使用を容易にする。多様なデータセットや欠落パターンに対する評価は,既存の計算手法よりもCLAIMの方が優れた性能を示している。さらに,不足データに対する文脈特化と汎用記述子の有効性について検討した結果,データ計算におけるLLMの性能向上における文脈精度の重要性が示唆された。結果は、データ分析と機械学習モデルの信頼性と品質を著しく向上させるCLAIMの可能性を強調し、欠落したデータを扱うためのより微妙で効果的なソリューションを提供する。 This paper introduces the Contextual Language model for Accurate Imputation Method (CLAIM), a novel strategy that capitalizes on the expansive knowledge and reasoning capabilities of pre-trained large language models (LLMs) to address missing data challenges in tabular datasets. Unlike traditional imputation methods, which predominantly rely on numerical estimations, CLAIM utilizes contextually relevant natural language descriptors to fill missing values. This approach transforms datasets into natural language contextualized formats that are inherently more aligned with LLMs' capabilities, thereby facilitating the dual use of LLMs: first, to generate missing value descriptors, and then, to fine-tune the LLM on the enriched dataset for improved performance in downstream tasks. Our evaluations across diverse datasets and missingness patterns reveal CLAIM's superior performance over existing imputation techniques. Furthermore, our investigation into the effectiveness of context-specific versus generic descriptors for missing data highlights the importance of contextual accuracy in enhancing LLM performance for data imputation. 
The results underscore CLAIM's potential to markedly improve the reliability and quality of data analysis and machine learning models, offering a more nuanced and effective solution for handling missing data.	翻訳日:2024-05-29 22:51:42 公開日:2024-05-28
# 変化と影響のあるリワード機能を備えたAIアライメント AI Alignment with Changing and Influenceable Reward Functions ( http://arxiv.org/abs/2405.17713v1 ) ライセンス: Link先を確認	Micah Carroll, Davis Foote, Anand Siththaranjan, Stuart Russell, Anca Dragan,	(参考訳) 既存のAIアライメントアプローチは、好みは静的であり、非現実的である、と仮定する。静的な嗜好を誤って仮定する結果を明らかにするため、我々は、好みの変化を明示的にモデル化し、AIがそれらに与える影響をモデル化する動的リワードマルコフ決定プロセス(DR-MDP)を導入する。その利便性にもかかわらず、静的推論の仮定は既存のアライメント手法の健全性を損なう可能性があり、ユーザーが本当に望まない方法でユーザーの好みに影響を与えるAIシステムに暗黙の報酬を与える。その後、潜在的な解決策を探求する。まず、エージェントの最適化の地平線が、望ましくないAIの影響を部分的に軽減する方法について、統一的な視点を提供する。そして、AIアライメントのさまざまな概念を定式化し、最初からの好みの変化を考慮に入れます。このようなアライメントの8つの概念の強みと限界を比較すると、彼らは皆、望ましくないAIの影響を誘発するか、過度にリスクを回避し、好みを変える問題に対する直接的な解決策が存在しないことを示唆している。現実世界の設定で好みを変えることを避けることはできないため、これらの問題に注意、リスクのバランス、能力で対処することがより重要になります。私たちは、私たちの仕事が概念的明確性を提供し、人間の好みの変化と影響力のある性質を明示的に説明(そして対立)するAIアライメントプラクティスへの第一歩になることを期待しています。 Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves. To clarify the consequences of incorrectly assuming static preferences, we introduce Dynamic Reward Markov Decision Processes (DR-MDPs), which explicitly model preference changes and the AI's influence on them. We show that despite its convenience, the static-preference assumption may undermine the soundness of existing alignment techniques, leading them to implicitly reward AI systems for influencing user preferences in ways users may not truly want. We then explore potential solutions. First, we offer a unifying perspective on how an agent's optimization horizon may partially help reduce undesirable AI influence. Then, we formalize different notions of AI alignment that account for preference change from the outset. 
Comparing the strengths and limitations of 8 such notions of alignment, we find that they all either err towards causing undesirable AI influence, or are overly risk-averse, suggesting that a straightforward solution to the problems of changing preferences may not exist. As there is no avoiding grappling with changing preferences in real-world settings, this makes it all the more important to handle these issues with care, balancing risks and capabilities. We hope our work can provide conceptual clarity and constitute a first step towards AI alignment practices which explicitly account for (and contend with) the changing and influenceable nature of human preferences.	翻訳日:2024-05-29 22:51:42 公開日:2024-05-28
# AdapNet:低品質画像検索のための適応型ノイズベースネットワーク AdapNet: Adaptive Noise-Based Network for Low-Quality Image Retrieval ( http://arxiv.org/abs/2405.17718v1 ) ライセンス: Link先を確認	Sihe Zhang, Qingdong He, Jinlong Peng, Yuxi Li, Zhengkai Jiang, Jiafu Wu, Mingmin Chi, Yabiao Wang, Chengjie Wang,	(参考訳) 画像検索は、与えられたクエリ画像を使用して、データベース内で視覚的に類似した画像を特定することを目的としている。従来の手法では、マッチングのために画像から抽出された大域的特徴と局所的特徴の両方を使用し、精度を高めるためにリランキング技術を適用することもある。しかし,これらの手法は,自然要因や人為要因から生じる問合せ画像のノイズを考慮できないことが多く,検索性能に悪影響を及ぼす。この問題を軽減するために,低品質画像検索のための新しい設定を導入し,ロバストな抽象表現を学習するための適応ノイズベースネットワーク(AdapNet)を提案する。具体的には、入力画像の様々な低品質要因を補うために訓練された品質補償ブロックを考案する。さらに、画像品質に応じて勾配にフォーカスを動的に調整し、トレーニング中に未知の雑音サンプルの学習を増強し、クラス内コンパクト性を高める、革新的な適応ノイズベース損失関数を導入する。この性能を評価するために,標準のRevisited OxfordとRevisited Parisのデータセット上で,クリーンなクエリ画像に様々な種類のノイズを適用して構築した,低品質なクエリを持つ2つのデータセットを構築した。総合的な実験的結果は、AdapNetが高品質なデータセットでも競争力のある性能を維持しながら、Noise Revisited OxfordとNoise Revisited Parisベンチマークの最先端の手法を超越していることを示している。コードと構築されたデータセットが利用可能になる。 Image retrieval aims to identify visually similar images within a database using a given query image. Traditional methods typically employ both global and local features extracted from images for matching, and may also apply re-ranking techniques to enhance accuracy. However, these methods often fail to account for the noise present in query images, which can stem from natural or human-induced factors, thereby negatively impacting retrieval performance. To mitigate this issue, we introduce a novel setting for low-quality image retrieval, and propose an Adaptive Noise-Based Network (AdapNet) to learn robust abstract representations. Specifically, we devise a quality compensation block trained to compensate for various low-quality factors in input images. In addition, we introduce an innovative adaptive noise-based loss function, which dynamically adjusts its focus on the gradient in accordance with image quality, thereby augmenting the learning of unknown noisy samples during training and enhancing intra-class compactness.
To assess the performance, we construct two datasets with low-quality queries by applying various types of noise to clean query images from the standard Revisited Oxford and Revisited Paris datasets. Comprehensive experimental results illustrate that AdapNet surpasses state-of-the-art methods on the Noise Revisited Oxford and Noise Revisited Paris benchmarks, while maintaining competitive performance on high-quality datasets. The code and constructed datasets will be made available.	翻訳日:2024-05-29 22:51:42 公開日:2024-05-28
# EgoNCE++: Egocentric Video-Language Modelsは手とオブジェクトのインタラクションを本当に理解しているか? EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions? ( http://arxiv.org/abs/2405.17719v1 ) ライセンス: Link先を確認	Boshen Xu, Ziheng Wang, Yang Du, Sipeng Zheng, Zhinan Song, Qin Jin,	(参考訳) エゴセントリック・ビデオ言語事前学習は、エゴセントリック・ハンドオブジェクト・インタラクション(EgoHOI)の学習を促進する重要なパラダイムである。既存のテストベッドで大きな成功を収めたにもかかわらず、これらのベンチマークはクローズドセットのビジュアルコンセプトや限られたシナリオに重点を置いている。実世界における多様なEgoHOIの出現により,エゴ中心型ビデオ言語モデル(EgoVLM)の細粒度概念における性能の低下を明らかにするために,EgoHOIBenchというオープン語彙ベンチマークを提案する。この性能ギャップは、現在の手法における時間的ダイナミクスよりも、オブジェクトの理解に強い偏見ときめ細かな監督が不十分なためである。これらの問題に対処するために,EgoNCE++ という新しい非対称のコントラスト目的を導入した。ビデオ・トゥ・テキスト・ロスでは,大言語モデルのテキスト内学習を活用し,HOI関連の単語置換を行うことにより,否定的なキャプションを生成することによってテキストの監督を強化する。テキストからビデオへの損失に対しては、同じ名詞を共有するビデオの表現を集約するオブジェクト中心のポジティブなビデオサンプリング戦略を提案する。我々の広範な実験により、EgoNCE++は、オープン語彙HOI認識、マルチインスタンス検索、および様々なエゴセントリックモデルにおけるアクション認識タスクを大幅に向上し、最大+26.55%の改善が示されている。私たちのコードはhttps://github.com/xuboshen/EgoNCEppから入手可能です。 Egocentric video-language pretraining is a crucial paradigm to advance the learning of egocentric hand-object interactions (EgoHOI). Despite their great success on existing testbeds, these benchmarks focus mostly on closed-set visual concepts or limited scenarios. Because diverse EgoHOIs occur in the real world, we propose an open-vocabulary benchmark named EgoHOIBench to reveal the diminished performance of current egocentric video-language models (EgoVLM) on fine-grained concepts. This indicates that these models still lack a full spectrum of egocentric understanding. We attribute this performance gap to insufficient fine-grained supervision and a strong bias in current methods towards understanding objects rather than temporal dynamics. To tackle these issues, we introduce a novel asymmetric contrastive objective for EgoHOI named EgoNCE++.
For the video-to-text loss, we enhance text supervision by generating negative captions, using the in-context learning of large language models to perform HOI-related word substitution. For the text-to-video loss, we propose an object-centric positive video sampling strategy that aggregates the representations of videos sharing the same nouns. Our extensive experiments demonstrate that EgoNCE++ significantly boosts open-vocabulary HOI recognition, multi-instance retrieval, and action recognition tasks across various egocentric models, with improvements of up to +26.55%. Our code is available at https://github.com/xuboshen/EgoNCEpp.	翻訳日:2024-05-29 22:51:42 公開日:2024-05-28
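The video-to-text side described above is a contrastive (InfoNCE) loss whose candidate captions include extra hard negatives, such as captions with an HOI word swapped. The following sketch of that general idea is hypothetical. Embeddings are plain vectors, the temperature is illustrative, and the LLM-based word substitution and the object-centric text-to-video sampling are not shown. It is not the authors' implementation:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def video_to_text_loss(videos, captions, hard_negatives, temperature=0.05):
    """InfoNCE from video i to its paired caption i. The candidate set is
    every caption in the batch plus video i's own hard-negative captions."""
    total = 0.0
    for i, v in enumerate(videos):
        candidates = captions + hard_negatives[i]
        logits = [cosine(v, t) / temperature for t in candidates]
        top = max(logits)
        log_z = top + math.log(sum(math.exp(z - top) for z in logits))  # stable log-sum-exp
        total += log_z - logits[i]  # -log softmax probability of the paired caption
    return total / len(videos)


videos = [[1.0, 0.0], [0.0, 1.0]]
captions = [[1.0, 0.0], [0.0, 1.0]]
hard = [[[0.9, 0.1]], [[0.1, 0.9]]]    # near-duplicate captions with one word changed
loss_plain = video_to_text_loss(videos, captions, [[], []])
loss_hard = video_to_text_loss(videos, captions, hard)  # larger: harder to separate
```

Adding near-miss captions raises the loss unless the video embedding separates them from the true caption. This is the pressure towards fine-grained HOI understanding that the abstract describes.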
# MindFormer:fMRIによる複数被験者脳デコーディングのためのトランスフォーマアーキテクチャ MindFormer: A Transformer Architecture for Multi-Subject Brain Decoding via fMRI ( http://arxiv.org/abs/2405.17720v1 ) ライセンス: Link先を確認	Inhwa Han, Jaayeon Lee, Jong Chul Ye,	(参考訳) 神経信号を理解するための研究は長年続けられており、fMRI信号からの視覚的復号が注目されている。特に、画像拡散モデルの出現により、fMRIデータからの画像の再構成が大幅に進んだ。しかし、既存の手法では、再構成された画像に被験者間および被験者内の違いが生じ、精度を損なうことがある。複数被験者脳デコーディングにおける現在の限界に対処するために,MindFormerと呼ばれる新しいトランスフォーマーアーキテクチャを導入する。このモデルは、安定拡散モデルの条件付けに使用できるfMRI条件の特徴ベクトルを生成するように設計されている。より具体的に言えば、MindFormerは2つの重要なイノベーションを取り入れている。 1)fMRI信号から意味的に意味のある特徴を抽出するIP-Adapterに基づく新しいトレーニング戦略、2)fMRI信号の個人差を効果的に捉えつつ、複数被験者のfMRIデータを相乗的に組み合わせて訓練する被験者固有のトークン及び線形層。実験の結果,MindFormerと統合された安定拡散は,異なる被験者に対して意味的に一貫した画像を生成することがわかった。この機能は、複数被験者脳復号における既存のモデルを大幅に上回る。このような進歩は、再建の精度を向上するだけでなく、個人間のニューラル処理のバリエーションの理解を深めます。 Research efforts to understand neural signals have been ongoing for many years, with visual decoding from fMRI signals attracting considerable attention. In particular, the advent of image diffusion models has significantly advanced the reconstruction of images from fMRI data. However, existing approaches often introduce inter- and intra-subject variations in the reconstructed images, which can compromise accuracy. To address current limitations in multi-subject brain decoding, we introduce a new Transformer architecture called MindFormer. This model is specifically designed to generate fMRI-conditioned feature vectors that can be used to condition the Stable Diffusion model. More specifically, MindFormer incorporates two key innovations: 1) a novel training strategy based on the IP-Adapter to extract semantically meaningful features from fMRI signals, and 2) a subject-specific token and linear layer that effectively capture individual differences in fMRI signals while synergistically combining multi-subject fMRI data for training.
Our experimental results demonstrate that Stable Diffusion, when integrated with MindFormer, produces semantically consistent images across different subjects. This capability significantly surpasses existing models in multi-subject brain decoding. Such advancements not only improve the accuracy of our reconstructions but also deepen our understanding of neural processing variations among individuals.	翻訳日:2024-05-29 22:51:42 公開日:2024-05-28
# ClavaDDPM:クラスタ誘導拡散モデルを用いたマルチリレーショナルデータ合成 ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models ( http://arxiv.org/abs/2405.17724v1 ) ライセンス: Link先を確認	Wei Pang, Masoumeh Shafieinejad, Lucy Liu, Xi He,	(参考訳) 表型データ合成の最近の研究は単一のテーブルに焦点を当てているが、現実のアプリケーションは数十から数百の相互接続テーブルを持つ複雑なデータを含むことが多い。マルチリレーショナル(マルチテーブル)データを合成する以前のアプローチでは、より大きなデータセットのスケーラビリティと、異なるテーブルにまたがる属性間の相関など、長距離依存関係のキャプチャという、2つの重要な側面で不足していた。表形式データモデリングにおける拡散モデルの成功に触発されて、Cluster Latent Variable guided Denoising Diffusion Probabilistic Models (ClavaDDPM)を導入する。この新たなアプローチでは、クラスタリングラベルを中間体として活用して、特に外部キー制約に着目したテーブル間の関係をモデル化する。 ClavaDDPMは拡散モデルのロバストな生成能力を活用しながら、学習した潜伏変数をテーブル全体に伝播させる効率的なアルゴリズムを取り入れている。これにより、ClavaDDPMは長距離依存関係を効果的にキャプチャできる。さまざまなサイズのマルチテーブルデータセットに対する大規模な評価では、ClavaDDPMは、これらの長距離依存に対する既存のメソッドよりも大幅に優れており、シングルテーブルデータのユーティリティメトリクスに競争力がある。 Recent research in tabular data synthesis has focused on single tables, whereas real-world applications often involve complex data with tens or hundreds of interconnected tables. Previous approaches to synthesizing multi-relational (multi-table) data fall short in two key aspects: scalability for larger datasets, and capturing long-range dependencies such as correlations between attributes spread across different tables. Inspired by the success of diffusion models in tabular data modeling, we introduce Cluster Latent Variable guided Denoising Diffusion Probabilistic Models (ClavaDDPM). This novel approach uses clustering labels as intermediaries to model relationships between tables, specifically focusing on foreign key constraints. ClavaDDPM leverages the robust generation capabilities of diffusion models while incorporating efficient algorithms to propagate the learned latent variables across tables. This enables ClavaDDPM to capture long-range dependencies effectively.
Extensive evaluations on multi-table datasets of varying sizes show that ClavaDDPM significantly outperforms existing methods for these long-range dependencies while remaining competitive on utility metrics for single-table data.	翻訳日:2024-05-29 22:51:42 公開日:2024-05-28
# 画像強調のためのカラーシフト推定と補正 Color Shift Estimation-and-Correction for Image Enhancement ( http://arxiv.org/abs/2405.17725v1 ) ライセンス: Link先を確認	Yiyu Li, Ke Xu, Gerhard Petrus Hancke, Rynson W. H. Lau,	(参考訳) 準最適照明条件下で撮影された画像は、オーバー露光とアンダー露光の両方を含む可能性がある。現在のアプローチは主に画像の明るさの調整に重点を置いており、露光の少ない領域では色調の歪みが悪化し、露光の過度な領域では正確な色を復元できない。本研究では,アンダー露光領域とオーバー露光領域が互いに逆向きの色調分布シフトを示すことを観察した。これらの領域は通常,参照となる「通常露光」領域/ピクセルを持たないため,共同モデリングでは容易に正規化できない。本稿では,これらの色変化を推定・補正する学習により,オーバー露光とアンダー露光の両方で画像を強化する新しい手法を提案する。具体的には、まず、UNetベースのネットワークを介して、入力画像の鮮明化および暗化バージョンの色特徴マップを導出し、続いて擬似正規色特徴マップを生成する擬似正規色特徴生成器を作成した。次に,得られた色特徴写像と擬似正規色特徴写像との間の色変化を推定する新しいCOSEモジュールを提案する。 COSEモジュールは、オーバー露光領域とアンダー露光領域の推定色変化を別々に補正する。さらに,強調画像を生成するために,オーバー露光領域とアンダー露光領域の分離補正色を変調する新しいカラー変調 (COMO) モジュールを提案する。総合実験により,本手法が既存手法より優れていることが示された。プロジェクトWebページ: https://github.com/yiyulics/CSEC Images captured under sub-optimal illumination conditions may contain both over- and under-exposures. Current approaches mainly focus on adjusting image brightness, which may exacerbate color tone distortion in under-exposed areas and fail to restore accurate colors in over-exposed regions. We observe that under-exposed and over-exposed regions display opposite color tone distribution shifts with respect to each other. These shifts may not be easily normalized in joint modeling, as such regions usually do not have "normally exposed" regions/pixels as a reference. In this paper, we propose a novel method to enhance images with both over- and under-exposures by learning to estimate and correct such color shifts. Specifically, we first derive the color feature maps of the brightened and darkened versions of the input image via a UNet-based network, followed by a pseudo-normal feature generator that produces pseudo-normal color feature maps. We then propose a novel COlor Shift Estimation (COSE) module to estimate the color shifts between the derived brightened (or darkened) color feature maps and the pseudo-normal color feature maps.
The COSE module corrects the estimated color shifts of the over- and under-exposed regions separately. We further propose a novel COlor MOdulation (COMO) module to modulate the separately corrected colors in the over- and under-exposed regions to produce the enhanced image. Comprehensive experiments show that our method outperforms existing approaches. Project webpage: https://github.com/yiyulics/CSEC.	翻訳日:2024-05-29 22:51:42 公開日:2024-05-28
# LLMによる全体的評価のファシリテート:シナリオベース実験からの考察 Facilitating Holistic Evaluations with LLMs: Insights from Scenario-Based Experiments ( http://arxiv.org/abs/2405.17728v1 ) ライセンス: Link先を確認	Toru Ishida,	(参考訳) 創造性を育むために設計されたワークショップ型の授業が人気を集めている。しかし,経験豊富な教員チームであっても,多様な視点を取り込んだ全体的評価を行うことは困難である。様々な評価を統合するためには十分な議論が不可欠であるが、教員はそのような検討のための時間を欠いていることが多い。議論なしに平均スコアを導出することは、全体的評価の目的を損なう。本稿では,多様な教員評価を統合するためのファシリテータとして,大規模言語モデル(LLM)を利用することを検討する。LLMが多様な評価を統合し、その根底にある考え方を教員に説明できるかどうかを確かめるため、シナリオに基づく実験を行った。その結果は注目に値するものであり、LLMが教員の議論を効果的に促進したことが示された。さらにLLMは、学習したドメイン知識に基づいて、単一のシナリオから評価基準を一般化して作成する能力を示した。 Workshop courses designed to foster creativity are gaining popularity. However, achieving a holistic evaluation that accommodates diverse perspectives is challenging, even for experienced faculty teams. Adequate discussion is essential to integrate varied assessments, but faculty often lack the time for such deliberations. Deriving an average score without discussion undermines the purpose of a holistic evaluation. This paper explores the use of a Large Language Model (LLM) as a facilitator to integrate diverse faculty assessments. Scenario-based experiments were conducted to determine if the LLM could synthesize diverse evaluations and explain the underlying theories to faculty. The results were noteworthy, showing that the LLM effectively facilitated faculty discussions. Additionally, the LLM demonstrated the capability to generalize and create evaluation criteria from a single scenario based on its learned domain knowledge.	翻訳日:2024-05-29 22:51:42 公開日:2024-05-28
# 階層的行動認識 : 階層的相互作用を用いたコントラスト的ビデオ言語アプローチ Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions ( http://arxiv.org/abs/2405.17729v1 ) ライセンス: Link先を確認	Rui Zhang, Shuailong Li, Junxiao Xue, Feng Lin, Qing Zhang, Xiao Ma, Xiaoran Yan,	(参考訳) ビデオ認識は依然としてオープンな課題であり、ビデオ内の多様なコンテンツカテゴリを識別する必要がある。主流のアプローチはしばしばフラットな分類を行い、カテゴリ間の本質的な階層構造を見落としている。そこで本稿では,階層的ビデオ認識という新たなタスクを定式化し,階層的認識に適したビデオ言語学習フレームワークを提案する。具体的には,本フレームワークは階層的なカテゴリレベル間の依存関係を符号化し,トップダウン制約を適用して認識予測をフィルタリングする。さらに、脳卒中患者のリハビリテーションのための医療評価に基づく新たな細粒度データセットを構築し、これを階層的認識のための挑戦的なベンチマークとする。広範な実験を通じて,階層的認識に対する本手法の有効性を実証し、特に細粒度サブカテゴリにおいて従来手法を大きく上回ることを示した。提案するフレームワークは,フラットな分類を超えて,ビデオ理解タスクにおける階層的モデリングへの道を開くものである。 Video recognition remains an open challenge, requiring the identification of diverse content categories within videos. Mainstream approaches often perform flat classification, overlooking the intrinsic hierarchical structure relating categories. To address this, we formalize the novel task of hierarchical video recognition, and propose a video-language learning framework tailored for hierarchical recognition. Specifically, our framework encodes dependencies between hierarchical category levels, and applies a top-down constraint to filter recognition predictions. We further construct a new fine-grained dataset based on medical assessments for rehabilitation of stroke patients, serving as a challenging benchmark for hierarchical recognition. Through extensive experiments, we demonstrate the efficacy of our approach for hierarchical recognition, significantly outperforming conventional methods, especially for fine-grained subcategories. The proposed framework paves the way for hierarchical modeling in video understanding tasks, moving beyond flat categorization.	翻訳日:2024-05-29 22:51:42 公開日:2024-05-28
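The abstract says the framework "applies a top-down constraint to filter recognition predictions." Below is a minimal sketch of what such a filter can look like, assuming a two-level hierarchy with independent per-label scores. The function `top_down_predict`, the label names and the scores are hypothetical illustrations, not the paper's implementation. The parent category is chosen first, and the child is then restricted to that parent's children. As a result, a high-scoring but inconsistent subcategory (here `walk`) cannot be predicted.

```python
def top_down_predict(parent_scores, child_scores, hierarchy):
    """Pick the best parent, then the best child among that parent's children only."""
    parent = max(parent_scores, key=parent_scores.get)
    allowed = hierarchy[parent]
    # Children outside the chosen parent are filtered out by construction.
    child = max(allowed, key=lambda c: child_scores.get(c, float("-inf")))
    return parent, child


# Hypothetical two-level hierarchy and scores, for illustration only.
hierarchy = {"upper_limb": ["reach", "grasp"], "lower_limb": ["walk", "stand"]}
parent_scores = {"upper_limb": 0.7, "lower_limb": 0.3}
child_scores = {"reach": 0.2, "grasp": 0.3, "walk": 0.5, "stand": 0.0}

print(top_down_predict(parent_scores, child_scores, hierarchy))  # ('upper_limb', 'grasp')
```

A flat argmax over `child_scores` would return `walk`, a subcategory that contradicts the top-level prediction.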
# MMPareto:無害なユニモーダル支援によるマルチモーダル学習の促進 MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance ( http://arxiv.org/abs/2405.17730v1 ) ライセンス: Link先を確認	Yake Wei, Di Hu,	(参考訳) 単一モーダルの学習目標を対象として加えるマルチモーダル学習法は,不均衡なマルチモーダル学習問題を緩和する上で優れた効果を示してきた。しかし,本論文では,これまで見過ごされてきたマルチモーダル学習目標と単一モーダル学習目標との間の勾配の衝突を特定する。この衝突は単一モーダルエンコーダの最適化を誤った方向に導く可能性がある。これらの衝突を十分に軽減するため, マルチモーダル損失と単一モーダル損失の差異を観察し, より学習しやすいマルチモーダル損失の勾配は,大きさと共分散の両方が単一モーダル損失よりも小さいことを示した。この特性に基づき,マルチモーダルシナリオ下でのPareto統合を分析し,MMParetoアルゴリズムを提案する。本アルゴリズムは,すべての学習目標に共通する方向と,汎化を改善するために強化された大きさを持つ最終勾配を保証でき,無害な単一モーダル支援を提供する。最後に、多種類のモダリティと,密なクロスモーダル相互作用を持つフレームワークにわたる実験により,本手法の優れた性能と拡張性が示された。また,本手法はタスク難易度に明確な差があるマルチタスクの場合にも役立つことが期待され,その理想的なスケーラビリティを示している。ソースコードとデータセットはhttps://github.com/GeWu-Lab/MMPareto_ICML2024で公開されている。 Multimodal learning methods with targeted unimodal learning objectives have exhibited their superior efficacy in alleviating the imbalanced multimodal learning problem. However, in this paper, we identify the previously ignored gradient conflict between multimodal and unimodal learning objectives, potentially misleading the unimodal encoder optimization. To well diminish these conflicts, we observe the discrepancy between multimodal loss and unimodal loss, where both gradient magnitude and covariance of the easier-to-learn multimodal loss are smaller than the unimodal one. With this property, we analyze Pareto integration under our multimodal scenario and propose MMPareto algorithm, which could ensure a final gradient with direction that is common to all learning objectives and enhanced magnitude to improve generalization, providing innocent unimodal assistance. Finally, experiments across multiple types of modalities and frameworks with dense cross-modal interaction indicate our superior and extendable method performance. Our method is also expected to facilitate multi-task cases with a clear discrepancy in task difficulty, demonstrating its ideal scalability. The source code and dataset are available at https://github.com/GeWu-Lab/MMPareto_ICML2024.	翻訳日:2024-05-29 22:51:42 公開日:2024-05-28
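The abstract describes resolving a conflict between multimodal and unimodal gradients by finding a final direction that is common to all objectives. The sketch below illustrates only that general idea, using the classic two-objective minimum-norm (MGDA-style) combination. It is not MMPareto itself: MMPareto also enhances the magnitude of the final gradient, which this sketch omits. The gradient values are made up and chosen so that the two gradients conflict (negative dot product).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_norm_direction(g1, g2):
    """Minimum-norm point of the convex hull of two gradients (two-task MGDA).

    Returns d = a*g1 + (1-a)*g2 with a in [0, 1] minimizing ||d||^2.
    If d is nonzero, dot(d, g1) > 0 and dot(d, g2) > 0, so stepping
    along -d decreases both losses to first order.
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:  # identical gradients: nothing to resolve
        return list(g1)
    alpha = dot([b - a for a, b in zip(g1, g2)], g2) / denom
    alpha = min(1.0, max(0.0, alpha))
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]


g_multi = [1.0, 0.0]   # hypothetical multimodal-loss gradient
g_uni = [-0.5, 1.0]    # hypothetical unimodal-loss gradient (conflicting)
d = min_norm_direction(g_multi, g_uni)
print([round(x, 4) for x in d])  # [0.3077, 0.4615]
```

At an interior solution the combined direction has the same positive projection onto both gradients, which is what makes it "common" to the two objectives.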
# C$^{3}$Bench: 大規模言語モデルのための包括的古典中国語理解ベンチマーク C$^{3}$Bench: A Comprehensive Classical Chinese Understanding Benchmark for Large Language Models ( http://arxiv.org/abs/2405.17732v1 ) ライセンス: Link先を確認	Jiahuan Cao, Yongxin Shi, Dezhi Peng, Yang Liu, Lianwen Jin,	(参考訳) 古典中国語理解(CCU)は、中国の卓越した伝統文化の保存と探究において重要な価値を持つ。近年,研究者らは大規模言語モデル(LLM)の優れた理解能力と意味処理能力を活かし,CCUにおけるその可能性を活用しようと試みている。しかし、LLMのCCU能力を評価するための包括的なベンチマークは存在しない。このギャップを埋めるために、本論文では包括的古典中国語理解ベンチマークであるC$^{3}$benchを紹介する。これは分類、検索、固有表現認識、句読点付与、翻訳を含む5つの主要なCCUタスクに対する50,000のテキストペアから構成される。さらに、C$^{3}$benchのデータは10の異なる領域に由来し、古典中国語のカテゴリの大半をカバーしている。提案した C$^{3}$bench を用いて,5つのCCU タスクすべてについて15の代表的な LLM の定量的性能を広範囲に評価した。我々の結果は、LLMのCCU能力の公開リーダーボードを確立するだけでなく,いくつかの知見ももたらした。具体的には、既存のLLMはCCUタスクに苦戦しており、依然として教師ありモデルに劣っている。さらに、結果はCCUが特別な注意を要するタスクであることを示している。本研究は、LLMベースのCCU研究の今後の進展のための標準ベンチマーク、包括的なベースライン、および貴重な洞察を提供できると考えている。評価パイプラインとデータセットは https://github.com/SCUT-DLVCLab/C3bench で公開されている。 Classical Chinese Understanding (CCU) holds significant value in preserving and exploring the outstanding traditional Chinese culture. Recently, researchers have attempted to leverage the potential of Large Language Models (LLMs) for CCU by capitalizing on their remarkable comprehension and semantic capabilities. However, no comprehensive benchmark is available to assess the CCU capabilities of LLMs. To fill this gap, this paper introduces C$^{3}$bench, a Comprehensive Classical Chinese understanding benchmark, which comprises 50,000 text pairs for five primary CCU tasks, including classification, retrieval, named entity recognition, punctuation, and translation. Furthermore, the data in C$^{3}$bench originates from ten different domains, covering most of the categories in classical Chinese. Leveraging the proposed C$^{3}$bench, we extensively evaluate the quantitative performance of 15 representative LLMs on all five CCU tasks. Our results not only establish a public leaderboard of LLMs' CCU capabilities but also yield several findings. 
Specifically, existing LLMs struggle with CCU tasks and are still inferior to supervised models. Additionally, the results indicate that CCU is a task that requires special attention. We believe this study could provide a standard benchmark, comprehensive baselines, and valuable insights for the future advancement of LLM-based CCU research. The evaluation pipeline and dataset are available at https://github.com/SCUT-DLVCLab/C3bench.	翻訳日:2024-05-29 22:51:42 公開日:2024-05-28
# 矛盾か、それとも珍奇か？:マサネス・ギャリー・ミュラーによる量子測定公理の導出に対するケントの批判について Contradictions or Curiosities? On Kent's Critique of the Masanes--Galley--Müller Derivation of the Quantum Measurement Postulates ( http://arxiv.org/abs/2405.17733v1 ) ライセンス: Link先を確認	Blake C. Stacey,	(参考訳) エイドリアン・ケントは最近、量子力学の公理に関するマサネス、ギャリー、ミュラーの研究を批判した。MGMはケントの批判に2つの矛盾を見出したと主張している。私は、何らかの別の前提が加えられない限り、どちらも真の矛盾ではないと論じる。 Adrian Kent has recently criticized Masanes, Galley and Müller's work on postulates for quantum mechanics. MGM claim to find two contradictions in Kent's criticism. I argue that neither is a true contradiction unless some other premise is added.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
# 費用対効果の高い不偏クラス率推定による効率的な災害対応に向けて:ナイマン配分層化サンプリング能動学習 Towards Efficient Disaster Response via Cost-effective Unbiased Class Rate Estimation through Neyman Allocation Stratified Sampling Active Learning ( http://arxiv.org/abs/2405.17734v1 ) ライセンス: Link先を確認	Yanbing Bai, Xinyi Wu, Lai Xu, Jihan Pei, Erick Mas, Shunichi Koshimura,	(参考訳) 地球観測技術の急速な発展に伴い、我々は大量の衛星リモートセンシングデータが利用可能な時代に入った。しかし、大量の衛星リモートセンシングデータはラベルを欠いているか、ラベル付けのコストが高すぎるため、AI技術が衛星データをマイニングする可能性が妨げられている。衛星データを用いて災害被害の程度を評価する緊急対応シナリオでは特にそうである。災害被害評価は、特定の地理空間における特定の建物の被害や、より大規模な特定地域の被害に過度に焦点が当てられてきたため、ボトルネックに直面している。実際、災害緊急対応の初期段階では、政府機関は個々の建物の被害よりも被災地域全体の被害率に関心を持つ。これは緊急対応のレベルを決定するのに役立つからである。本稿では,二値分類のためのナイマン層化ランダムサンプリング木を構築し,この手法を多クラス問題に拡張する革新的なアルゴリズムを提案する。様々なデータセットやモデル構造に関する広範な実験を通じて,本手法は単純サンプリングのわずか30～60%のアノテーションコストで,クラス率推定とモデル強化の両面において受動学習および従来型の能動学習手法を上回ることを示す。本手法は従来の能動学習戦略における「サンプリングバイアス」の課題に効果的に対処し、「コールドスタート」のジレンマを緩和する。提案手法の有効性は,Xview2衛星画像を用いた災害評価タスクへの適用によってさらに実証され,実環境における実用性が示された。 With the rapid development of earth observation technology, we have entered an era of massively available satellite remote-sensing data. However, a large amount of satellite remote sensing data lacks a label or the label cost is too high to hinder the potential of AI technology mining satellite data. Especially in such an emergency response scenario that uses satellite data to evaluate the degree of disaster damage. Disaster damage assessment encountered bottlenecks due to excessive focus on the damage of a certain building in a specific geographical space or a certain area on a larger scale. In fact, in the early days of disaster emergency response, government departments were more concerned about the overall damage rate of the disaster area instead of single-building damage, because this helps the government decide the level of emergency response. We present an innovative algorithm that constructs Neyman stratified random sampling trees for binary classification and extends this approach to multiclass problems. 
Through extensive experimentation on various datasets and model structures, our findings demonstrate that our method surpasses both passive and conventional active learning techniques in terms of class rate estimation and model enhancement with only 30%-60% of the annotation cost of simple sampling. It effectively addresses the 'sampling bias' challenge in traditional active learning strategies and mitigates the 'cold start' dilemma. The efficacy of our approach is further substantiated through application to disaster evaluation tasks using Xview2 Satellite imagery, showcasing its practical utility in real-world contexts.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
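Neyman allocation itself is a standard survey-sampling rule. With a total budget of n labels, stratum h (size N_h, within-stratum standard deviation S_h) receives n_h = n · N_h S_h / Σ_k N_k S_k samples. The overall class rate is then estimated without bias as Σ_h (N_h / N) p̂_h, provided each stratum is sampled at random. The sketch below implements only that textbook rule and assumes binary damage labels, so that S_h = sqrt(p_h (1 − p_h)) comes from a prior guess p_h. The stratum sizes and rates are invented, and the paper's sampling-tree construction and active-learning loop are not reproduced.

```python
import math


def neyman_allocation(n, sizes, stds):
    """Split a budget of n samples across strata proportionally to N_h * S_h."""
    weights = [N * s for N, s in zip(sizes, stds)]
    total = sum(weights)
    raw = [n * w / total for w in weights]
    alloc = [math.floor(r) for r in raw]
    # Hand out the remaining samples by largest fractional part
    # (assumes each n_h stays below its stratum size N_h).
    leftover = n - sum(alloc)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


def stratified_rate(sizes, sample_rates):
    """Stratified estimate of the overall class rate from per-stratum sample rates."""
    N = sum(sizes)
    return sum(Nh / N * p for Nh, p in zip(sizes, sample_rates))


# Hypothetical strata, e.g. buildings grouped by a model's predicted damage score.
sizes = [5000, 3000, 2000]
prior_rates = [0.05, 0.3, 0.6]
stds = [math.sqrt(p * (1 - p)) for p in prior_rates]

print(neyman_allocation(300, sizes, stds))  # [95, 120, 85]
print(stratified_rate(sizes, [0.04, 0.35, 0.55]))
```

The mid-sized but most uncertain stratum receives the most labels. This is the variance-minimizing property that makes Neyman allocation attractive when the annotation budget is tight.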
# 最適複合パルスを用いたフォノン数測定 Phonon Number Measurement Using Optimal Composite Pulses ( http://arxiv.org/abs/2405.17736v1 ) ライセンス: Link先を確認	Xie-Qian Li, Ping-Xing Chen,	(参考訳) レーザー冷却したイオンのフォノン数を測定することは、イオンが基底状態にあるかどうかを評価する上で不可欠なステップである。現在、実験で一般的に使われている方法は、赤/青サイドバンド比法と、断熱的発展を用いたレッドサイドバンド法である。本研究では、状態発展のフィッティングを必要とせず、選択したフォック状態のポピュレーションを直接測定できる、複合パルスを用いた手法を理論的に提案する。この手法は、断熱的発展を用いたレッドサイドバンド法と比べて、より高いフォック状態のポピュレーションをより直接的に測定できる。複合パルスのユニタリ操作の忠実度を改善するために、量子最適制御法を用いる。量子最適制御技術により、レーザー強度が強く多くの近似が不要となる状況を議論でき、その場合にはゲート忠実度をさらに改善できる。次に, より高い精度を得るために測定結果を補正する良好な性能の方法を示し, 高いフォック状態の測定への応用例を示す。 Measuring the phonon number of the laser-cooled ions is an indispensable step in evaluating whether an ion is in ground state. At present, commonly used methods in the experiments are red-to-blue sideband ratios and adiabatic evolution red-sideband methods. We theoretically propose a method using composite pulses which does not need a fit of state evolution and can directly measure the population of the selected Fock state. It can measure higher Fock state population more directly compared with the adiabatic evolution red-sideband method. We use quantum optimal control method to improve the fidelity of unitary operation of the composite pulses. With quantum optimal control technology, we can discuss the situation where the laser strength is strong, and many approximations will not be necessary, where the gate fidelity can be further improved. Then we give a method to modify the measurement result for a higher accuracy which has a good performance, and we give an example to illustrate its application on high Fock state measurement.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
# HTTP Garden:リクエストストリームの差分ファジングによるHTTP/1.1実装における解析脆弱性の発見 The HTTP Garden: Discovering Parsing Vulnerabilities in HTTP/1.1 Implementations by Differential Fuzzing of Request Streams ( http://arxiv.org/abs/2405.17737v1 ) ライセンス: Link先を確認	Ben Kallus, Prashant Anantharaman, Michael Locasto, Sean W. Smith,	(参考訳) HTTP/1.1の解析における不一致は、Webサーバに対する数多くの種類の攻撃の基盤となってきた。HTTP解析の不一致を発見するためのこれまでの手法は、最も重大な解析上の異常はオリジンサーバ内で発生するという証拠があるにもかかわらず、HTTPゲートウェイサーバのブラックボックス差分テストに重点を置いていた。これらの手法はいくつかの脆弱性を検出できるが、解析の不一致に関連するすべての脆弱性が、ゲートウェイサーバの出力を調べるだけで検出できるわけではない。我々のシステムであるHTTP Gardenは、オリジンサーバによるHTTPリクエストの解釈と、ゲートウェイサーバによるHTTPリクエストの変換の両方を調べる。また、リクエストストリームのすべての構成要素を変異させることができるHTTP/1.1オリジンサーバ向けのカバレッジガイド付き差分ファザーも備えている。これを対話型REPLと組み合わせることで、意味のあるHTTP解析の不一致を自動的に発見し、それらの不一致を迅速に攻撃ペイロードへと発展させることができる。このツールを使って、我々は人気のあるWebサーバで100以上のHTTP解析バグを発見・報告し、そのうち68件は我々の報告を受けて修正された。我々はこれらのうち39件を悪用可能と判定している。研究者がHTTP/1.1サーバに対するパーサの不一致に基づく新たな攻撃をさらに探究できるよう、我々はHTTP Gardenをフリーソフトウェアライセンスの下でGitHub上に公開する。 HTTP/1.1 parsing discrepancies have been the basis for numerous classes of attacks against web servers. Previous techniques for discovering HTTP parsing discrepancies have focused on blackbox differential testing of HTTP gateway servers, despite evidence that the most significant parsing anomalies occur within origin servers. While these techniques can detect some vulnerabilities, not all parsing discrepancy-related vulnerabilities are detectable by examining a gateway server's output alone. Our system, the HTTP Garden, examines both origin servers' interpretations and gateway servers' transformations of HTTP requests. It also includes a coverage-guided differential fuzzer for HTTP/1.1 origin servers that is capable of mutating all components of a request stream, paired with an interactive REPL that facilitates the automatic discovery of meaningful HTTP parsing discrepancies and the rapid development of those discrepancies into attack payloads. Using our tool, we have discovered and reported over 100 HTTP parsing bugs in popular web servers, of which 68 have been fixed following our reports. 
We designate 39 of these to be exploitable. We release the HTTP Garden to the public on GitHub under a free software license to allow researchers to further explore new parser discrepancy-based attacks against HTTP/1.1 servers.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
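Differential testing means feeding the same bytes to several parsers and flagging any disagreement in how they interpret them. The idea can be shown with two deliberately simplified toy parsers. Both parsers and `find_discrepancy` are hypothetical and far simpler than real servers or the HTTP Garden's harness, which drives actual server implementations. The example request carries both `Content-Length` and `Transfer-Encoding: chunked`, a classic source of framing disagreement: RFC 9112 says Transfer-Encoding overrides Content-Length, and toy parser A ignores it.

```python
def parse_headers(head: bytes) -> dict:
    """Map lower-cased header names to stripped values; skips the request line."""
    headers = {}
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers


def decode_chunked(data: bytes) -> bytes:
    """Minimal chunked decoder: no chunk extensions, no trailers."""
    body = b""
    while True:
        size_line, _, data = data.partition(b"\r\n")
        size = int(size_line, 16)
        if size == 0:
            return body
        body += data[:size]
        data = data[size + 2:]  # skip the chunk data and its trailing CRLF


def parse_body_cl(raw: bytes) -> bytes:
    """Toy parser A: frames the body by Content-Length only."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    n = int(parse_headers(head).get(b"content-length", b"0"))
    return rest[:n]


def parse_body_te(raw: bytes) -> bytes:
    """Toy parser B: lets Transfer-Encoding: chunked override Content-Length."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    headers = parse_headers(head)
    if headers.get(b"transfer-encoding", b"").lower() == b"chunked":
        return decode_chunked(rest)
    return rest[: int(headers.get(b"content-length", b"0"))]


def find_discrepancy(raw: bytes, parsers) -> bool:
    """True if the parsers do not all agree on the request body."""
    return len({p(raw) for p in parsers}) > 1


request = (b"POST / HTTP/1.1\r\nHost: example\r\n"
           b"Content-Length: 4\r\nTransfer-Encoding: chunked\r\n\r\n"
           b"5\r\nhello\r\n0\r\n\r\n")
print(parse_body_cl(request), parse_body_te(request))  # b'5\r\nh' b'hello'
print(find_discrepancy(request, [parse_body_cl, parse_body_te]))  # True
```

A fuzzer in this style would mutate `request` and keep only the inputs on which `find_discrepancy` returns True.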
# The Widening Gap: 初心者プログラマのための生成AIのメリットとハーム The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers ( http://arxiv.org/abs/2405.17739v1 ) ライセンス: Link先を確認	James Prather, Brent Reeves, Juho Leinonen, Stephen MacNeil, Arisoa S. Randrianasolo, Brett Becker, Bailey Kimmel, Jared Wright, Ben Briggs,	(参考訳) 初心者プログラマは、メタ認知的な認識と戦略の欠如により、プログラミングの問題解決に苦労することが多い。これまでの研究では、初心者はプログラミング中に複数のメタ認知的困難に遭遇しうることが示されている。初心者は通常、これらの困難が自分の進歩を妨げていることに気づいていない。一方で、現在では多くの初心者が生成AI(GenAI)を使ってプログラミングしている。GenAIは、ほとんどの入門的なプログラミング問題に対する完全な解答やコードの提案を提供し、行き詰まった時には次のステップのヒントを与え、難解なエラーメッセージを説明できる。初心者のメタ認知に対するその影響は、まだ探究され始めたばかりである。本研究では、初心者のプログラミングにおける問題解決行動を調査した先行研究を再現し、GenAIツールを組み込むことでそれを拡張する。参加者観察、インタビュー、視線追跡からなる21回のラボセッションを通じて、初心者がGenAIツールを使ってどのようにコーディングしているかを探究する。21名中20名の学生が割り当てられたプログラミング問題を完了したものの、我々の発見は、GenAIツールの使い方において、順調に進んだ学生と苦労した学生の間に残念な分断があることを示している。順調に進んだ学生は、GenAIを使ってすでに作ろうとしていたコードを作成でき、役に立たない、あるいは誤ったインラインコード提案を無視できた。しかし苦労した学生については、既知のメタ認知的困難が持続し、残念ながらGenAIはそれらを悪化させ、さらには新たなメタ認知的困難をもたらしうることが示された。さらに、苦労した学生はしばしば自身の問題解決能力について認知的不協和を表明し、実際よりも良くできたと考え、能力の錯覚を抱いたまま課題を終えた。両グループの観察に基づき、初心者のGenAI体験を足場かけする方法を提案し、今後の研究への提言を行う。 Novice programmers often struggle through programming problem solving due to a lack of metacognitive awareness and strategies. Previous research has shown that novices can encounter multiple metacognitive difficulties while programming. Novices are typically unaware of how these difficulties are hindering their progress. Meanwhile, many novices are now programming with generative AI (GenAI), which can provide complete solutions to most introductory programming problems, code suggestions, hints for next steps when stuck, and explain cryptic error messages. Its impact on novice metacognition has only started to be explored. Here we replicate a previous study that examined novice programming problem solving behavior and extend it by incorporating GenAI tools. Through 21 lab sessions consisting of participant observation, interview, and eye tracking, we explore how novices are coding with GenAI tools. 
Although 20 of 21 students completed the assigned programming problem, our findings show an unfortunate divide in the use of GenAI tools between students who accelerated and students who struggled. Students who accelerated were able to use GenAI to create code they already intended to make and were able to ignore unhelpful or incorrect inline code suggestions. But for students who struggled, our findings indicate that previously known metacognitive difficulties persist, and that GenAI unfortunately can compound them and even introduce new metacognitive difficulties. Furthermore, struggling students often expressed cognitive dissonance about their problem solving ability, thought they performed better than they did, and finished with an illusion of competence. Based on our observations from both groups, we propose ways to scaffold the novice GenAI experience and make suggestions for future work.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
# MobileConvRec: モバイルアプリレコメンデーションのための会話データセット MobileConvRec: A Conversational Dataset for Mobile Apps Recommendations ( http://arxiv.org/abs/2405.17740v1 ) ライセンス: Link先を確認	Srijata Maji, Moghis Fereidouni, Vinaik Chhetri, Umar Farooq, A. B. Siddique,	(参考訳) 既存のレコメンデーションシステムは、2つのパラダイムに重点を置いてきた: 1- 過去のユーザ-アイテム間インタラクションに基づくレコメンデーションと 2- 会話型レコメンデーション。会話型レコメンデーションシステムは、ユーザとシステム間の自然言語対話を促進し、システムがユーザの明示的なニーズを引き出すと同時に、ユーザがレコメンデーションについて問い合わせたりフィードバックを提供したりすることを可能にする。自然言語処理の大幅な進歩により、会話型レコメンデーションシステムが注目を集めている。既存の会話型レコメンデーションデータセットは、それぞれの領域の研究を大いに促進してきた。近年のモバイルユーザとアプリの急激な増加にもかかわらず、会話型モバイルアプリレコメンダーシステムの研究は大きな制約に直面してきた。この制約は主に、モバイルアプリに特化した高品質なベンチマークデータセットの欠如に起因する。会話型モバイルアプリレコメンデーションの研究を促進するために,我々はMobileConvRecを紹介する。MobileConvRecは、もともと大規模モバイルアプリレコメンデーションデータセットMobileRecに収録された、Google Playストア上のモバイルアプリとの実際のユーザインタラクションを活用して会話をシミュレートする。提案した会話型レコメンデーションデータセットは、暗黙のユーザ嗜好を反映した逐次的なユーザ-アイテムインタラクションと、包括的なマルチターン会話を組み合わせ、明示的なユーザニーズを効果的に把握する。MobileConvRecは、45のアプリカテゴリにまたがる1万2000件以上のレコメンデーション関連マルチターン会話で構成されている。さらに、MobileConvRecは、パーミッションデータ、セキュリティとプライバシ関連の情報、アプリのバイナリ実行ファイルなど、各アプリの豊富なメタデータを提供する。いくつかの事前学習済み大規模言語モデルの比較研究を通じて,MobileConvRecが会話型モバイルアプリレコメンデーションの優れたテストベッドとして機能しうることを実証する。 Existing recommendation systems have focused on two paradigms: 1- historical user-item interaction-based recommendations and 2- conversational recommendations. Conversational recommendation systems facilitate natural language dialogues between users and the system, allowing the system to solicit users' explicit needs while enabling users to inquire about recommendations and provide feedback. Due to substantial advancements in natural language processing, conversational recommendation systems have gained prominence. Existing conversational recommendation datasets have greatly facilitated research in their respective domains. Despite the exponential growth in mobile users and apps in recent years, research in conversational mobile app recommender systems has faced substantial constraints. 
This limitation can primarily be attributed to the lack of high-quality benchmark datasets specifically tailored for mobile apps. To facilitate research for conversational mobile app recommendations, we introduce MobileConvRec. MobileConvRec simulates conversations by leveraging real user interactions with mobile apps on the Google Play store, originally captured in large-scale mobile app recommendation dataset MobileRec. The proposed conversational recommendation dataset synergizes sequential user-item interactions, which reflect implicit user preferences, with comprehensive multi-turn conversations to effectively grasp explicit user needs. MobileConvRec consists of over 12K multi-turn recommendation-related conversations spanning 45 app categories. Moreover, MobileConvRec presents rich metadata for each app such as permissions data, security and privacy-related information, and binary executables of apps, among others. We demonstrate that MobileConvRec can serve as an excellent testbed for conversational mobile app recommendation through a comparative study of several pre-trained large language models.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
# LoRA-Switch:System-Algorithm共設計による動的LLMアダプタの効率向上 LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design ( http://arxiv.org/abs/2405.17741v1 ) ライセンス: Link先を確認	Rui Kong, Qiyang Li, Xinyu Fang, Qingtian Feng, Qingfeng He, Yazhu Dong, Weijun Wang, Yuanchun Li, Linghe Kong, Yunxin Liu,	(参考訳) 近年の文献では、大規模言語モデル(LLM)をカスタマイズまたはさらに改善するための効果的な方法は、Mixture-of-Experts(MoE)構造を持つ低ランクアダプタ(LoRA)などの動的アダプタを追加することだとされている。このような動的アダプタは控えめな計算複雑性しか伴わないが、驚くほど大きな推論遅延のオーバーヘッドを招き、デコード速度を2.5倍以上も低下させる。本稿では,動的アダプタの細粒度のコストを分析し,断片化したCUDAカーネル呼び出しが根本原因であることを明らかにする。そこで本稿では,効率的な動的アダプタのためにシステムとアルゴリズムを協調設計したアーキテクチャであるLoRA-Switchを提案する。レイヤ単位またはブロック単位の動的ルーティングを採用する既存の多くの動的構造とは異なり、LoRA-Switchはトークン単位のルーティング機構を導入する。トークンごとにLoRAアダプタと重みを切り替え、推論のためにそれらをバックボーンにマージする。効率化のため、この切り替えは最適化されたCUDAカーネルで実装され、すべてのLoRAアダプタのマージ操作を一度に融合する。一般的なベンチマーク上で代表的なオープンソースLLMを用いた実験に基づき,提案手法は既存の動的アダプタと同様の精度向上を達成しつつ,デコード遅延を2.4倍以上削減した。 Recent literature has found that an effective method to customize or further improve large language models (LLMs) is to add dynamic adapters, such as low-rank adapters (LoRA) with Mixture-of-Experts (MoE) structures. Though such dynamic adapters incur modest computational complexity, they surprisingly lead to huge inference latency overhead, slowing down the decoding speed by 2.5+ times. In this paper, we analyze the fine-grained costs of the dynamic adapters and find that the fragmented CUDA kernel calls are the root cause. Therefore, we propose LoRA-Switch, a system-algorithm co-designed architecture for efficient dynamic adapters. Unlike most existing dynamic structures that adopt layer-wise or block-wise dynamic routing, LoRA-Switch introduces a token-wise routing mechanism. It switches the LoRA adapters and weights for each token and merges them into the backbone for inference. For efficiency, this switching is implemented with an optimized CUDA kernel, which fuses the merging operations for all LoRA adapters at once. 
Based on experiments with popular open-source LLMs on common benchmarks, our approach has demonstrated similar accuracy improvement as existing dynamic adapters, while reducing the decoding latency by more than 2.4 times.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
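Merging a LoRA adapter into the backbone relies on the standard low-rank identity W′x = (W + s·BA)x = Wx + s·B(Ax). Below is a minimal pure-Python sketch of that identity, with a different adapter chosen for each token. The adapter names, the fixed per-token routing, the scale s and the tiny matrices are all hypothetical. The fused CUDA kernel that LoRA-Switch uses to make this switching fast is not modeled; the sketch only checks that the merged and unmerged computations agree.

```python
def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def merge_lora(W, A, B, scale):
    """Return W + scale * (B @ A); W is d_out x d_in, B is d_out x r, A is r x d_in."""
    r = len(A)
    return [[W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
             for j in range(len(W[0]))] for i in range(len(W))]


def unmerged_forward(W, A, B, scale, x):
    """Same output computed without merging: W x + scale * B (A x)."""
    base, low = matvec(W, x), matvec(B, matvec(A, x))
    return [b + scale * l for b, l in zip(base, low)]


W = [[1.0, 0.0], [0.0, 1.0]]
adapters = {  # hypothetical rank-1 adapters stored as (A, B)
    "math": ([[1.0, 2.0]], [[1.0], [0.0]]),
    "code": ([[0.0, 1.0]], [[0.0], [1.0]]),
}
tokens = [[2.0, 1.0], [1.0, 3.0]]
route = ["math", "code"]  # one adapter per token (stand-in for a router's output)

for x, name in zip(tokens, route):
    A, B = adapters[name]
    merged = matvec(merge_lora(W, A, B, 0.5), x)
    assert merged == unmerged_forward(W, A, B, 0.5, x)
    print(name, merged)
```

Merging trades one extra weight update per adapter switch for a single dense matmul per token. This is why the cost of the switching step, rather than the arithmetic, becomes the bottleneck that a fused kernel targets.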
# ORLM:最適化モデリングのための大規模言語モデルのトレーニング ORLM: Training Large Language Models for Optimization Modeling ( http://arxiv.org/abs/2405.17743v1 ) ライセンス: Link先を確認	Zhengyang Tang, Chenyu Huang, Xin Zheng, Shixi Hu, Zizhuo Wang, Dongdong Ge, Benyou Wang,	(参考訳) 大規模言語モデル(LLM)は、最適化モデリングを自動化することで、複雑なオペレーションズリサーチ(OR)のための強力なツールとして登場した。しかし、現在の方法論はプロプライエタリなLLMを用いたプロンプトエンジニアリング(マルチエージェント協調など)に大きく依存しており、産業応用では障害となりうるデータプライバシの懸念を生じさせている。この問題に対処するために、最適化モデリングのためのオープンソースLLMのトレーニングを提案する。OR LLMの学習データセットに対する4つの重要な要件を特定し,特定の要件に合わせた合成データを作成するための半自動プロセスであるOR-Instructを設計・実装する。また、実世界のOR問題の解決においてLLMをテストするための初の産業ベンチマークであるIndustryORベンチマークも導入する。OR-Instructのデータを7bサイズの様々なオープンソースLLM(ORLMと呼ぶ)に適用することにより,最適化モデリングの能力が大幅に向上した。最も性能の良いORLMは,NL4OPT,MAMO,IndustryORベンチマークにおいて最先端の性能を達成する。我々のコードとデータは https://github.com/Cardinal-Operations/ORLM で公開される予定である。 Large Language Models (LLMs) have emerged as powerful tools for complex Operations Research (OR) in automating optimization modeling. However, current methodologies heavily rely on prompt engineering (e.g., multi-agent cooperation) with proprietary LLMs, raising data privacy concerns that could be prohibitive in industry applications. To tackle this issue, we propose training open-source LLMs for optimization modeling. We identify four critical requirements for the training dataset of OR LLMs, design and implement OR-Instruct, a semi-automated process for creating synthetic data tailored to specific requirements. We also introduce the IndustryOR benchmark, the first industrial benchmark for testing LLMs on solving real-world OR problems. We apply the data from OR-Instruct to various open-source LLMs of 7b size (termed as ORLMs), resulting in a significantly improved capability for optimization modeling. Our best-performing ORLM achieves state-of-the-art performance on the NL4OPT, MAMO, and IndustryOR benchmarks. Our code and data will be available at https://github.com/Cardinal-Operations/ORLM.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
# バックドア緩和のためのプルーニング再考:最適化の視点から Rethinking Pruning for Backdoor Mitigation: An Optimization Perspective ( http://arxiv.org/abs/2405.17746v1 ) ライセンス: Link先を確認	Nan Li, Haiyang Yu, Ping Yi,	(参考訳) ディープニューラルネットワーク(DNN)はバックドア攻撃に対して脆弱であることが知られており、その信頼性の高い運用に対して懸念すべき脅威となっている。最近の研究では、特定のニューロン群を刈り込むことで感染したDNNからバックドアを消去できることが明らかになったが、これらのバックドア関連ニューロンをいかに効果的に特定して除去するかは未解決の課題である。既存の防御手法のほとんどは定義されたルールに依存し、ニューロンの局所的な性質に注目しており、プルーニングポリシーの探索と最適化を無視している。このギャップに対処するために,グラフニューラルネットワーク(GNN)と強化学習(RL)を組み合わせてバックドアモデルを修復する最適化ニューロンプルーニング(ONP)手法を提案する。具体的には、ONPはまず、ニューロンの接続性に基づいて対象のDNNをグラフとしてモデル化し、次にGNNベースのRLエージェントを用いてグラフ埋め込みを学習し、適切なプルーニングポリシーを見つける。我々の知る限り、これはバックドア防御の分野においてプルーニングポリシーの最適化にGNNとRLを用いる初めての試みである。実験により、少量のクリーンデータを用いることで、ONPは一連のバックドア攻撃によって埋め込まれたバックドアニューロンを、無視できる程度の性能低下で効果的に刈り込むことができ、バックドア緩和における新たな最先端の性能を達成することが示された。 Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks, posing concerning threats to their reliable deployment. Recent research reveals that backdoors can be erased from infected DNNs by pruning a specific group of neurons, while how to effectively identify and remove these backdoor-associated neurons remains an open challenge. Most of the existing defense methods rely on defined rules and focus on neuron's local properties, ignoring the exploration and optimization of pruning policies. To address this gap, we propose an Optimized Neuron Pruning (ONP) method combined with Graph Neural Network (GNN) and Reinforcement Learning (RL) to repair backdoor models. Specifically, ONP first models the target DNN as graphs based on neuron connectivity, and then uses GNN-based RL agents to learn graph embeddings and find a suitable pruning policy. To the best of our knowledge, this is the first attempt to employ GNN and RL for optimizing pruning policies in the field of backdoor defense. 
Experiments show, with a small amount of clean data, ONP can effectively prune the backdoor neurons implanted by a set of backdoor attacks at the cost of negligible performance degradation, achieving a new state-of-the-art performance for backdoor mitigation.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
# 多重バンド非エルミート系の擬エルミートトポロジー Pseudo-Hermitian Topology of Multiband Non-Hermitian Systems ( http://arxiv.org/abs/2405.17749v1 ) ライセンス: Link先を確認	Jung-Wan Ryu, Jae-Ho Han, Chang-Hwan Yi, Hee Chul Park, Moon Jip Park,	(参考訳) 非エルミート系の複素固有エネルギーと非直交固有状態は、エルミート系には現れない独自のトポロジカル現象を示す。代表的な例として、非エルミート表皮効果と例外点が挙げられる。二次元パラメータ空間において、多バンド非エルミート系における非分離バンドのトポロジカルな分類は、置換群を用いることで確立でき、その置換の積は空間内の例外点による状態交換を表す。本研究では、複数バンドの非エルミートトポロジーにおける擬エルミート線の役割を明らかにする。現在の理解とは対照的に、非エルミートマルチバンドの非分離性は、二次元空間において例外点がなくてもトポロジカルに非自明になりうる。我々の研究は、非エルミートマルチバンド系の基本的かつ包括的な理解を築くものであり、また例外点を考慮する必要のない非エルミート系の多様な応用と実現をもたらす。 The complex eigenenergies and non-orthogonal eigenstates of non-Hermitian systems exhibit unique topological phenomena that cannot appear in Hermitian systems. Representative examples are the non-Hermitian skin effect and exceptional points. In a two-dimensional parameter space, topological classifications of non-separable bands in multiband non-Hermitian systems can be established by invoking a permutation group, where the product of the permutation represents state exchange due to exceptional points in the space. We unveil in this work the role of pseudo-Hermitian lines in non-Hermitian topology for multiple bands. Contrary to current understanding, the non-separability of non-Hermitian multibands can be topologically non-trivial without exceptional points in two-dimensional space. Our work builds on the fundamental and comprehensive understanding of non-Hermitian multiband systems and also offers versatile applications and realizations of non-Hermitian systems without the need to consider exceptional points.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
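The following is a textbook illustration of pseudo-Hermiticity and an exceptional point, not the paper's multiband construction. The 2x2 PT-symmetric model H = [[iγ, κ], [κ, −iγ]] satisfies H† = η H η⁻¹ with η = σ_x. Its eigenvalues ±sqrt(κ² − γ²) are real for κ > γ and form a complex-conjugate pair for κ < γ. At κ = γ the two eigenvalues coalesce: this is an exceptional point. The parameter values below are arbitrary, and the helper names are hypothetical.

```python
import cmath


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(M):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[M[j][i].conjugate() for j in range(2)] for i in range(2)]


def is_pseudo_hermitian(H, eta, eta_inv, tol=1e-12):
    """Check H^dagger == eta H eta^{-1} entrywise."""
    lhs, rhs = dagger(H), matmul(matmul(eta, H), eta_inv)
    return all(abs(lhs[i][j] - rhs[i][j]) < tol for i in range(2) for j in range(2))


def eigenvalues(H):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = H[0][0] + H[1][1]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    root = cmath.sqrt(tr * tr / 4 - det)
    return tr / 2 + root, tr / 2 - root


def pt_dimer(kappa, gamma):
    return [[1j * gamma, complex(kappa)], [complex(kappa), -1j * gamma]]


sigma_x = [[0j, 1 + 0j], [1 + 0j, 0j]]  # sigma_x is its own inverse
for kappa, gamma in [(2.0, 1.0), (1.0, 2.0), (1.0, 1.0)]:
    H = pt_dimer(kappa, gamma)
    print(kappa, gamma, is_pseudo_hermitian(H, sigma_x, sigma_x), eigenvalues(H))
```

Pseudo-Hermiticity holds for every parameter choice here. What changes across κ = γ is whether the spectrum is real, which is the kind of boundary that pseudo-Hermitian lines trace out in a parameter space.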
# バックドア防御のためのマグニチュードベースのニューロンプルーニング Magnitude-based Neuron Pruning for Backdoor Defens ( http://arxiv.org/abs/2405.17750v1 ) ライセンス: Link先を確認	Nan Li, Haoyu Jiang, Ping Yi,	(参考訳) ディープニューラルネットワーク(DNN)はバックドア攻撃に対して脆弱であることが知られており、その信頼性の高い運用に対して懸念すべき脅威となっている。最近の研究では、特定のニューロン群を刈り込むことで感染したDNNからバックドアを消去できることが明らかになったが、これらのバックドア関連ニューロンをいかに効果的に特定して除去するかは未解決の課題である。本稿では, バックドアの挙動とニューロンのマグニチュードとの相関を調査し, バックドアニューロンがモデルのマグニチュードとサリエンシーの相関から逸脱していることを見出した。この逸脱に着想を得て、バックドアニューロンを検出して刈り込むマグニチュードベースのニューロンプルーニング(MNP)手法を提案する。具体的には、MNPは3つのマグニチュード誘導型の目的関数を用いてバックドアニューロンのマグニチュードとサリエンシーの相関を操作し、それによってバックドアの挙動の露出、バックドアニューロンの除去、クリーンなニューロンの保持という目的をそれぞれ達成する。実験により,本プルーニング戦略は限られた量のクリーンデータのみで様々なバックドア攻撃に対して最先端のバックドア防御性能を達成することが示され,バックドア防御を導く上でマグニチュードが重要な役割を果たすことが実証された。 Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks, posing concerning threats to their reliable deployment. Recent research reveals that backdoors can be erased from infected DNNs by pruning a specific group of neurons, while how to effectively identify and remove these backdoor-associated neurons remains an open challenge. In this paper, we investigate the correlation between backdoor behavior and neuron magnitude, and find that backdoor neurons deviate from the magnitude-saliency correlation of the model. The deviation inspires us to propose a Magnitude-based Neuron Pruning (MNP) method to detect and prune backdoor neurons. Specifically, MNP uses three magnitude-guided objective functions to manipulate the magnitude-saliency correlation of backdoor neurons, thus achieving the purpose of exposing backdoor behavior, eliminating backdoor neurons and preserving clean neurons, respectively. Experiments show our pruning strategy achieves state-of-the-art backdoor defense performance against a variety of backdoor attacks with a limited amount of clean data, demonstrating the crucial role of magnitude for guiding backdoor defenses.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
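The abstract's key observation is that backdoor neurons deviate from the magnitude-saliency correlation that the rest of the model follows. One simple, hypothetical way to make "deviation" concrete is to fit a least-squares line of saliency against magnitude and flag neurons with large standardized residuals. The sketch below does only that. It is not MNP, which instead optimizes three magnitude-guided objective functions. The magnitudes, saliencies and threshold are invented.

```python
def flag_deviating(magnitudes, saliencies, z_thresh=1.5):
    """Indices of neurons whose saliency deviates from a linear fit on magnitude."""
    n = len(magnitudes)
    mx = sum(magnitudes) / n
    my = sum(saliencies) / n
    sxx = sum((x - mx) ** 2 for x in magnitudes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(magnitudes, saliencies))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(magnitudes, saliencies)]
    # Least-squares residuals have zero mean, so this is their standard deviation.
    std = (sum(r * r for r in resid) / n) ** 0.5
    return [i for i, r in enumerate(resid) if abs(r) / std > z_thresh]


magnitudes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
saliencies = [1.1, 2.0, 2.9, 4.1, 9.0, 6.0]  # neuron 4 breaks the trend
print(flag_deviating(magnitudes, saliencies))  # [4]
```

The flagged indices would be the pruning candidates. Note that the outlier also pulls the fitted line toward itself, so a real method needs a more robust treatment than this single linear fit.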
# XL3M:セグメントワイズ推論に基づくLLM長拡張のためのトレーニング不要フレームワーク XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference ( http://arxiv.org/abs/2405.17755v1 ) ライセンス: Link先を確認	Shengnan Wang, Youhui Bai, Lin Zhang, Pingyi Zhou, Shixiong Zhao, Gong Zhang, Sen Wang, Renhai Chen, Hua Xu, Hongwei Sun,	(参考訳) 長さ汎化の失敗問題、すなわち大規模言語モデル(LLM)が最大学習長よりも長いテキストに汎化できないという問題は、長い入力がストリーミングされるシナリオにおけるLLMの応用を大きく制限している。この問題に対処するため、既存の手法は多大なコストを必要とするか、精度の低下を招くかのいずれかである。本稿では, LLMの予測の正確さがその確信度と高い相関を持つことを経験的に見出した。これに基づき、短いシーケンスで学習されたLLMが、追加の学習や微調整なしに極めて長いシーケンスを推論できるようにする、XL3M(超長大言語モデルの意)という効率的な学習不要フレームワークを提案する。XL3Mフレームワークでは、入力コンテキストはまず複数の短いサブコンテキストに分解される。各サブコンテキストは、独立したセグメントと、元のコンテキストの末尾の数トークンからなる共通の「質問」を含む。次にXL3Mは、各セグメントと「質問」との関連性を測定する方法を与え、関連するすべてのセグメントを時系列順につなぎ合わせることで簡潔なキーコンテキストを構築する。このキーコンテキストを元のコンテキストの代わりに用いて推論タスクを完了する。包括的なベンチマークでの評価により、XL3Mの優位性が示された。我々のフレームワークを用いることで、Llama2-7Bモデルは、カードあたり64GBのメモリを持つ8カード構成のHuawei Ascend 910B NPUマシン上で、長さ20Mのシーケンスを推論できる。 Length generalization failure problem, namely the large language model (LLM) fails to generalize to texts longer than its maximum training length, greatly restricts the application of LLM in the scenarios with streaming long inputs. To address this problem, the existing methods either require substantial costs or introduce precision loss. In this paper, we empirically find that the accuracy of the LLM's prediction is highly correlated to its certainty. Based on this, we propose an efficient training free framework, named XL3M (it means extra-long large language model), which enables the LLMs trained on short sequences to reason extremely long sequence without any further training or fine-tuning. Under the XL3M framework, the input context will be firstly decomposed into multiple short sub-contexts, where each sub-context contains an independent segment and a common "question" which is a few tokens from the end of the original context. 
Then XL3M gives a method to measure the relevance between each segment and the ``question'', and constructs a concise key context by splicing all the relevant segments in chronological order. The key context is further used instead of the original context to complete the inference task. Evaluations on comprehensive benchmarks show the superiority of XL3M. Using our framework, a Llama2-7B model is able to reason 20M long sequences on an 8-card Huawei Ascend 910B NPU machine with 64GB memory per card.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
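The segment-wise pipeline described above can be sketched as follows. The context is split into segments, a common "question" is taken from the last few tokens, each segment is scored against that question, and the top-scoring segments are spliced back in their original order. XL3M's actual relevance measure is based on the LLM's prediction certainty. Here a pluggable `score` callable stands in for it, and the word-overlap scorer is only a toy assumption. The names `build_key_context`, `segment_len`, `question_len` and `top_k` are hypothetical. Re-appending the question at the end is also an assumption of this sketch.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[Sequence[str], Sequence[str]], float]


def build_key_context(
    tokens: Sequence[str],
    segment_len: int,
    question_len: int,
    top_k: int,
    score: Scorer,
) -> List[str]:
    """Hypothetical sketch of XL3M-style key-context construction."""
    if question_len < 1:
        raise ValueError("question_len must be at least 1")
    # The common "question": a few tokens taken from the end of the context.
    question = list(tokens[-question_len:])
    body = tokens[:-question_len]
    # Decompose the remaining context into independent short segments.
    segments = [list(body[i:i + segment_len]) for i in range(0, len(body), segment_len)]
    # Score every (segment, question) sub-context; XL3M uses model certainty here.
    scored = [(score(seg, question), idx) for idx, seg in enumerate(segments)]
    kept = sorted(idx for _, idx in sorted(scored, reverse=True)[:top_k])
    # Splice the kept segments in chronological order, then re-append the question.
    key_context: List[str] = []
    for idx in kept:
        key_context.extend(segments[idx])
    return key_context + question


def overlap_score(segment: Sequence[str], question: Sequence[str]) -> float:
    """Toy relevance stand-in: fraction of question words present in the segment."""
    return len(set(segment) & set(question)) / max(len(set(question)), 1)


if __name__ == "__main__":
    toks = "a b c d cat x y z cat q r s cat".split()
    print(build_key_context(toks, segment_len=4, question_len=1, top_k=2, score=overlap_score))
```

Keeping the scorer as a parameter mirrors the abstract's separation between "measure relevance" and "splice relevant segments".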
# 脳MR画像再構成フレームワークのためのモーションインフォームド深層学習 Motion-Informed Deep Learning for Brain MR Image Reconstruction Framework ( http://arxiv.org/abs/2405.17756v1 ) ライセンス: Link先を確認	Zhifeng Chen, Kamlesh Pawar, Kh Tohidul Islam, Himashi Peiris, Gary Egan, Zhaolin Chen,	(参考訳) 磁気共鳴イメージング(MRI)における運動アーチファクトは、スキャン中の患者の動きによって頻繁に生じるアーチファクトの一つである。臨床MRIスキャンの約30%に動きが含まれると推定されているが、深層学習による画像再構成モデルでは動きが明示的にモデル化されてこなかった。深層学習(DL)アルゴリズムは画像再構成タスクと動き補正タスクの両方に有効であることが示されているが、これら2つのタスクは別々に扱われてきた。画像再構成タスクではノイズやエイリアシングなどのアンダーサンプリングによるアーチファクトを除去し、動き補正ではぼけ、ゴースト、リンギングなどのアーチファクトを除去する。本研究では、撮像の高速化と動き補正を同時に行う新しい手法を提案する。これは、深層学習ベースのMRI再構成プロセスに動きモジュールを統合することで実現され、動きのリアルタイムな検出と補正を可能にする。学習時には、動きを深層学習モデルに密に統合された補助層としてモデル化し、モデルを「動き情報付き(motion-informed)」にする。推論時には、学習済みの動き情報付きDLモデルを用いて、アンダーサンプリングされた生のk空間データから画像を再構成する。実験結果から、提案した動き情報付き深層学習画像再構成ネットワークは、動きで劣化したMRIデータセットにおいて従来の画像再構成ネットワークを上回った。 Motion artifacts in Magnetic Resonance Imaging (MRI) are among the frequently occurring artifacts, caused by patient movement during scanning. Motion is estimated to be present in approximately 30% of clinical MRI scans; however, motion has not been explicitly modeled within deep learning image reconstruction models. Deep learning (DL) algorithms have been demonstrated to be effective for both the image reconstruction task and the motion correction task, but the two tasks have been considered separately. The image reconstruction task involves removing undersampling artifacts such as noise and aliasing, whereas motion correction involves removing artifacts including blurring, ghosting, and ringing. In this work, we propose a novel method to simultaneously accelerate imaging and correct motion. This is achieved by integrating a motion module into the deep learning-based MRI reconstruction process, enabling real-time detection and correction of motion. We model motion as a tightly integrated auxiliary layer in the deep learning model during training, making the deep learning model 'motion-informed'.
During inference, image reconstruction is performed from undersampled raw k-space data using a trained motion-informed DL model. Experimental results demonstrate that the proposed motion-informed deep learning image reconstruction network outperforms the conventional image reconstruction network on motion-degraded MRI datasets.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
# 二重分散削減:一階勾配を用いない複合最適化問題のための平滑化トリック Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient ( http://arxiv.org/abs/2405.17761v1 ) ライセンス: Link先を確認	Hao Di, Haishan Ye, Yueling Zhang, Xiangyu Chang, Guang Dai, Ivor W. Tsang,	(参考訳) 分散削減手法は、サンプリング分散を低減し、一次(FO)およびゼロ次(ZO)最適化手法の収束を加速するように設計されている。しかし、複合最適化問題では、ZO法はランダムな勾配推定に由来する座標方向分散と呼ばれる追加の分散に直面する。この分散を減らすため、先行研究ではすべての偏微分を推定する必要があり、これは本質的にFO情報を近似することに等しい。このアプローチは $\mathcal{O}(d)$ 回の関数評価($d$ は次元)を必要とし、多大な計算コストを伴うため、高次元のシナリオでは現実的でない。本稿では、平均化トリックを用いてサンプリング分散と座標方向分散の両方を削減するZeroth-order Proximal Double Variance Reduction (ZPDVR)法を提案する。従来手法と比較して、ZPDVRはランダムな勾配推定のみに依存し、確率的ゼロ次オラクル(SZO)を反復あたり期待値で $\mathcal{O}(1)$ 回呼び出し、強凸かつ平滑な設定において最適な $\mathcal{O}(d(n + \kappa)\log (\frac{1}{\epsilon}))$ のSZOクエリ計算量を達成する。ここで $\kappa$ は条件数、$\epsilon$ は所望の精度を表す。実験により、ZPDVRの線形収束を検証し、他の関連手法よりも優れた性能を示す。 Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating the convergence of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random gradient estimation. To reduce this variance, prior works require estimating all partial derivatives, essentially approximating FO information. This approach demands $\mathcal{O}(d)$ function evaluations ($d$ is the dimension), which incurs substantial computational costs and is prohibitive in high-dimensional scenarios. This paper proposes the Zeroth-order Proximal Double Variance Reduction (ZPDVR) method, which utilizes the averaging trick to reduce both sampling and coordinate-wise variances.
Compared to prior methods, ZPDVR relies solely on random gradient estimates, calls the stochastic zeroth-order oracle (SZO) in expectation $\mathcal{O}(1)$ times per iteration, and achieves the optimal $\mathcal{O}(d(n + \kappa)\log (\frac{1}{\epsilon}))$ SZO query complexity in the strongly convex and smooth setting, where $\kappa$ represents the condition number and $\epsilon$ is the desired accuracy. Empirical results validate ZPDVR's linear convergence and demonstrate its superior performance over other related methods.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
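The coordinate-wise variance mentioned above comes from estimating the gradient through a single random coordinate. The sketch below shows a standard two-point random-coordinate estimator. It is an assumption chosen for illustration, and it does not reproduce ZPDVR's averaging trick or its proximal step. The single-coordinate estimate is scaled by $d$, so its average over all coordinates equals the full finite-difference gradient. Computing all $d$ partials costs $2d$ function evaluations, the $\mathcal{O}(d)$ cost the abstract refers to. Function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Objective = Callable[[Sequence[float]], float]


def coord_estimate(f: Objective, x: Sequence[float], i: int, mu: float = 1e-4) -> List[float]:
    """Two-point zeroth-order estimate along coordinate i, scaled by the dimension d."""
    d = len(x)
    x_plus = list(x)
    x_plus[i] += mu
    x_minus = list(x)
    x_minus[i] -= mu
    g = [0.0] * d
    g[i] = d * (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return g


def full_estimate(f: Objective, x: Sequence[float], mu: float = 1e-4) -> List[float]:
    """Estimate every partial derivative: 2 * d function evaluations (the O(d) cost)."""
    d = len(x)
    return [coord_estimate(f, x, i, mu)[i] / d for i in range(d)]


def random_coord_estimate(
    f: Objective, x: Sequence[float], rng: random.Random, mu: float = 1e-4
) -> List[float]:
    """One random coordinate: O(1) evaluations, at the price of coordinate-wise variance."""
    return coord_estimate(f, x, rng.randrange(len(x)), mu)


def squared_norm(v: Sequence[float]) -> float:
    return sum(t * t for t in v)


if __name__ == "__main__":
    point = [1.0, -2.0, 3.0]
    print(full_estimate(squared_norm, point))  # close to the true gradient [2, -4, 6]
    print(random_coord_estimate(squared_norm, point, random.Random(0)))  # one nonzero entry
```

Each single-coordinate estimate is unbiased (over the random coordinate) but noisy, which is the variance that averaging-based schemes such as ZPDVR aim to reduce without paying $\mathcal{O}(d)$ evaluations per iteration.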
# プログラマブル量子回路による3準位量子熱機関のダイナミクスと熱力学の捕捉 Capturing dynamics and thermodynamics of a three-level quantum heat engine via programmable quantum circuits ( http://arxiv.org/abs/2405.17763v1 ) ライセンス: Link先を確認	Gao-xiang Deng, Zhe He, Yu Liu, Wei Shao, Zheng Cui,	(参考訳) 本研究では、クラウス表現とSz.-Nagyのダイレーション定理を用いて3準位量子熱機関を量子回路上でモデル化し、その動的発展と熱力学的性能を調べる。動的モデルの妥当性は、占有数(population)の変化を追跡することで検証される。強化学習アルゴリズムに基づいて、平均出力を最大化する量子熱機関の最適サイクルを提案し、熱力学モデルによって検証した。量子回路シミュレーションの安定性は、直交試験に基づく理論結果とシミュレーション結果の比較分析によって精査される。これらの結果は、量子回路上で量子熱機関をシミュレーションすることの実用性を確認するものであり、このような機関の構築に伴う実験コストを大幅に削減する可能性を示す。 This research employs the Kraus representation and the Sz.-Nagy dilation theorem to model a three-level quantum heat engine on quantum circuits, investigating its dynamic evolution and thermodynamic performance. The feasibility of the dynamic model is validated by tracking population changes. Based on a reinforcement learning algorithm, the optimal cycle of the quantum heat engine for maximal average power is proposed and verified by the thermodynamic model. The stability of the quantum circuit simulations is scrutinized through a comparative analysis of theoretical and simulated results, based on an orthogonal test. These results affirm the practicality of simulating quantum heat engines on quantum circuits, offering the potential to substantially reduce the experimental costs associated with constructing such engines.	翻訳日:2024-05-29 22:41:57 公開日:2024-05-28
# 確率過程に基づくシーケンス評価について On the Sequence Evaluation based on Stochastic Processes ( http://arxiv.org/abs/2405.17764v1 ) ライセンス: Link先を確認	Tianhao Zhang, Zhexiao Lin, Zhecheng Sheng, Chen Jiang, Dongyeop Kang,	(参考訳) 長いテキスト系列のモデル化と分析は、自然言語処理にとって不可欠な課題である。ニューラル言語モデルによって長いテキストのダイナミクスを捉えることに成功すれば、一貫性(コヒーレンス)評価、テキスト生成、機械翻訳など、多くの下流タスクが促進される。本稿では、確率過程を通じて系列をモデル化する新しいアプローチを提案する。テキストエンコーダのための尤度に基づく学習目的を導入し、従来手法と比べて長いテキストを評価するためのより綿密な尺度(スコア)を設計する。提案した学習目的は系列の一貫性を効果的に保持し、新しいスコアは時間的依存関係と空間的依存関係の両方を包括的に捉える。新しいスコアの理論的性質は、系列評価におけるその利点を示している。実験の結果、異なる長さの文書内および文書間での大域的・局所的な識別を含む、様々な系列評価タスクにおいて優れた性能が示された。また、エンコーダが人間とAIによって書かれたテキストの識別において競争力のある結果を達成することも示す。 Modeling and analyzing long sequences of text is an essential task for Natural Language Processing. Success in capturing long text dynamics using neural language models will facilitate many downstream tasks such as coherence evaluation, text generation, and machine translation. This paper presents a novel approach to modeling sequences through a stochastic process. We introduce a likelihood-based training objective for the text encoder and design a more thorough measurement (score) for long text evaluation compared to the previous approach. The proposed training objective effectively preserves the sequence coherence, while the new score comprehensively captures both temporal and spatial dependencies. Theoretical properties of our new score show its advantages in sequence evaluation. Experimental results show superior performance in various sequence evaluation tasks, including global and local discrimination within and between documents of different lengths. We also demonstrate that the encoder achieves competitive results on discriminating between human-written and AI-written text.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
# PTM-VQA:実環境の多様な事前学習モデルを活用した高効率な映像品質評価 PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild ( http://arxiv.org/abs/2405.17765v1 ) ライセンス: Link先を確認	Kun Yuan, Hongbo Liu, Mading Li, Muyi Sun, Ming Sun, Jiachao Gong, Jinhua Hao, Chao Zhou, Yansong Tang,	(参考訳) 映像品質評価(VQA)は、コンテンツの魅力、歪みの種類、動きのパターンやレベルなど、映像の知覚品質に影響を与えうる要因が多数あるため、困難な問題である。しかし、映像に対する平均オピニオンスコア(MOS)のアノテーションは高価で時間がかかるため、VQAデータセットの規模が制限され、深層学習ベースの手法にとって大きな障害となっている。本稿では、PTM-VQAと呼ばれるVQA手法を提案する。PTM-VQAは事前学習モデル(PreTrained Models)を活用し、様々な事前タスクで事前学習されたモデルから知識を転移することで、様々な側面からVQAに利点をもたらす。具体的には、重みを凍結した異なる事前学習モデルから映像の特徴を抽出し、それらを統合して表現を生成する。これらのモデルは様々な分野の知識を持ち、品質とは無関係なラベルで学習されていることが多いため、複数の事前学習モデルによって抽出された特徴に制約を課すIntra-Consistency and Inter-Divisibility(ICID)損失を提案する。内部一貫性制約は、異なる事前学習モデルによって抽出された特徴が同一の統一された品質認識潜在空間に存在することを保証し、一方、相互分離性はサンプルのアノテーションに基づく疑似クラスタを導入し、異なるクラスタのサンプルの特徴を分離しようとする。さらに、事前学習モデルの数は絶えず増加しているため、どのモデルをどのように使用するかを決定することが重要である。この問題に対処するため、適切な候補を選択する効率的な方式を提案する。VQAデータセット上でより良いクラスタリング性能を示すモデルが候補として選ばれる。大規模な実験により、提案手法の有効性が示された。 Video quality assessment (VQA) is a challenging problem due to the numerous factors that can affect the perceptual quality of a video, e.g., content attractiveness, distortion type, motion pattern, and level. However, annotating the mean opinion score (MOS) for videos is expensive and time-consuming, which limits the scale of VQA datasets and poses a significant obstacle for deep learning-based methods. In this paper, we propose a VQA method named PTM-VQA, which leverages PreTrained Models to transfer knowledge from models pretrained on various pre-tasks, benefiting VQA from different aspects. Specifically, we extract features of videos from different pretrained models with frozen weights and integrate them to generate a representation.
Since these models possess various fields of knowledge and are often trained with labels irrelevant to quality, we propose an Intra-Consistency and Inter-Divisibility (ICID) loss to impose constraints on features extracted by multiple pretrained models. The intra-consistency constraint ensures that features extracted by different pretrained models are in the same unified quality-aware latent space, while the inter-divisibility introduces pseudo clusters based on the annotation of samples and tries to separate features of samples from different clusters. Furthermore, with a constantly growing number of pretrained models, it is crucial to determine which models to use and how to use them. To address this problem, we propose an efficient scheme to select suitable candidates. Models with better clustering performance on VQA datasets are chosen to be our candidates. Extensive experiments demonstrate the effectiveness of the proposed method.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
# SleepFM:脳活動、心電図、呼吸信号にまたがる睡眠のためのマルチモーダル表現学習 SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals ( http://arxiv.org/abs/2405.17766v1 ) ライセンス: Link先を確認	Rahul Thapa, Bryan He, Magnus Ruud Kjaer, Hyatt Moore, Gauri Ganjoo, Emmanuel Mignot, James Zou,	(参考訳) 睡眠は、脳・心臓・呼吸の電気的活動を記録する様々なモダリティを通じて評価される複雑な生理的過程である。我々は、14,000人以上の参加者から得た10万時間以上のマルチモーダル睡眠記録からなる大規模なポリソムノグラフィーデータセットを構築した。この広範なデータセットを活用して、睡眠分析のための初のマルチモーダル基盤モデルであるSleepFMを開発した。コントラスト学習のための新しいleave-one-out手法が、標準的なペアワイズコントラスト学習による表現と比べて、下流タスクの性能を大幅に向上させることを示す。SleepFMの学習済み埋め込みで訓練したロジスティック回帰モデルは、睡眠段階分類(macro AUROC 0.88 vs 0.72、macro AUPRC 0.72 vs 0.48)と睡眠呼吸障害検出(AUROC 0.85 vs 0.69、AUPRC 0.77 vs 0.61)において、エンドツーエンドで訓練された畳み込みニューラルネットワーク(CNN)を上回る。特に、学習された埋め込みは、90,000個の候補から他のモダリティの対応する記録クリップを検索する際に、平均48%のtop-1精度を達成する。本研究は、睡眠記録の豊かさを余すところなく捉えるための、包括的なマルチモーダル睡眠モデリングの価値を示すものである。SleepFMはオープンソースであり、https://github.com/rthapa84/sleepfm-codebase から入手できる。 Sleep is a complex physiological process evaluated through various modalities recording electrical brain, cardiac, and respiratory activities. We curate a large polysomnography dataset from over 14,000 participants comprising over 100,000 hours of multi-modal sleep recordings. Leveraging this extensive dataset, we developed SleepFM, the first multi-modal foundation model for sleep analysis. We show that a novel leave-one-out approach for contrastive learning significantly improves downstream task performance compared to representations from standard pairwise contrastive learning. A logistic regression model trained on SleepFM's learned embeddings outperforms an end-to-end trained convolutional neural network (CNN) on sleep stage classification (macro AUROC 0.88 vs 0.72 and macro AUPRC 0.72 vs 0.48) and sleep disordered breathing detection (AUROC 0.85 vs 0.69 and AUPRC 0.77 vs 0.61).
Notably, the learned embeddings achieve 48% top-1 average accuracy in retrieving the corresponding recording clips of other modalities from 90,000 candidates. This work demonstrates the value of holistic multi-modal sleep modeling to fully capture the richness of sleep recordings. SleepFM is open source and available at https://github.com/rthapa84/sleepfm-codebase.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
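The leave-one-out contrastive idea can be sketched as an InfoNCE-style loss in which each modality's embedding of a clip is contrasted against the average embedding of the *other* modalities for the same clip, with the other clips in the batch serving as negatives. This is our reading of the approach and should be treated as an assumption: the exact similarity, temperature and weighting used in SleepFM may differ. The modality keys (`"brain"`, `"ecg"`, `"resp"`) and function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def leave_one_out_loss(batch: Dict[str, List[Vector]], temperature: float = 0.1) -> float:
    """InfoNCE-style loss: each modality vs. the mean of all other modalities.

    batch[modality][k] is the embedding of clip k in that modality; clip k of the
    leave-one-out target is the positive, and every other clip is a negative.
    """
    modalities = list(batch)
    n_clips = len(batch[modalities[0]])
    total = 0.0
    for m in modalities:
        others = [o for o in modalities if o != m]
        dim = len(batch[m][0])
        # Leave-one-out target: average embedding of the remaining modalities.
        targets = [
            [sum(batch[o][k][j] for o in others) / len(others) for j in range(dim)]
            for k in range(n_clips)
        ]
        for k in range(n_clips):
            logits = [cosine(batch[m][k], targets[c]) / temperature for c in range(n_clips)]
            log_sum_exp = math.log(sum(math.exp(z) for z in logits))
            total += log_sum_exp - logits[k]  # cross-entropy with positive k
    return total / (len(modalities) * n_clips)


if __name__ == "__main__":
    aligned = {
        "brain": [[1.0, 0.0], [0.0, 1.0]],
        "ecg": [[1.0, 0.0], [0.0, 1.0]],
        "resp": [[1.0, 0.0], [0.0, 1.0]],
    }
    print(leave_one_out_loss(aligned))  # near zero: every modality agrees on each clip
```

Compared with summing pairwise losses over all modality pairs, averaging the remaining modalities gives each modality a single, richer target per clip.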
# 言語崩壊:(大規模)言語モデルにおけるニューラル崩壊 Linguistic Collapse: Neural Collapse in (Large) Language Models ( http://arxiv.org/abs/2405.17767v1 ) ライセンス: Link先を確認	Robert Wu, Vardan Papyan,	(参考訳) ニューラル崩壊($\mathcal{NC}$)は分類タスクで観察される現象であり、最上層の表現がクラス平均へと崩壊し、それらのクラス平均が等ノルム・等角となって分類器と整列する。汎化や頑健性と関連するこれらの振る舞いは、特定の条件下で現れるとされる。すなわち、モデルがゼロ損失に向けて訓練され、ラベルにノイズがなく、クラスが均衡しており、クラス数がモデルの隠れ次元を超えないことである。近年の研究では、理想的な幾何構造に伴う利点を拡張・活用するため、これらの条件の一つ以上が欠けた状況での$\mathcal{NC}$が探究されている。言語モデリングは興味深いフロンティアである。トークン予測による学習は、これらの条件がどれも成り立たない分類タスクを構成するからである。語彙は不均衡で埋め込み次元を超え、異なるトークンが類似した文脈埋め込みに対応しうる。さらに、大規模言語モデル(LLM)は通常、数エポックしか訓練されない。本稿では、因果言語モデル(CLM)のアーキテクチャと訓練のスケーリングが$\mathcal{NC}$への進行に与える影響を経験的に調査する。スケーリングに伴って発現する$\mathcal{NC}$の性質が汎化と結びついていることを見出した。さらに、スケールとは独立に$\mathcal{NC}$と汎化の間に何らかの関係があることを示す証拠もある。したがって本研究は、言語モデリングという新規でより困難な設定にまで拡張されることで、$\mathcal{NC}$の一般性を強調するものである。今後に向けて、この現象に関するさらなる研究を促し、LLM、ひいてはニューラルネットワーク全般への理解を深め、$\mathcal{NC}$に関連する性質に基づいて既存のアーキテクチャを改善することを目指す。 Neural collapse ($\mathcal{NC}$) is a phenomenon observed in classification tasks where top-layer representations collapse into their class means, which become equinorm, equiangular and aligned with the classifiers. These behaviors, associated with generalization and robustness, would manifest under specific conditions: models are trained towards zero loss, with noise-free labels belonging to balanced classes, which do not outnumber the model's hidden dimension. Recent studies have explored $\mathcal{NC}$ in the absence of one or more of these conditions to extend and capitalize on the associated benefits of ideal geometries. Language modeling presents a curious frontier, as training by token prediction constitutes a classification task where none of the conditions exist: the vocabulary is imbalanced and exceeds the embedding dimension; different tokens might correspond to similar contextual embeddings; and large language models (LLMs) in particular are typically only trained for a few epochs.
This paper empirically investigates the impact of scaling the architectures and training of causal language models (CLMs) on their progression towards $\mathcal{NC}$. We find that $\mathcal{NC}$ properties that develop with scaling are linked to generalization. Moreover, there is evidence of some relationship between $\mathcal{NC}$ and generalization independent of scale. Our work therefore underscores the generality of $\mathcal{NC}$ as it extends to the novel and more challenging setting of language modeling. Downstream, we seek to inspire further research on the phenomenon to deepen our understanding of LLMs, and of neural networks at large, and to improve existing architectures based on $\mathcal{NC}$-related properties.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
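The first neural-collapse property, collapse of features onto their class means, can be quantified by comparing within-class spread to between-class spread. The sketch below computes a simplified scalar proxy: the mean squared distance of features to their class mean, divided by the mean squared distance of class means to the global mean. This is an assumption made for readability. The $\mathcal{NC}$ literature typically uses a covariance-based metric involving a pseudo-inverse of the between-class covariance, and the paper's exact metrics may differ. `nc1_proxy` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def nc1_proxy(features: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Within-class spread divided by between-class spread (0.0 means full collapse)."""
    dim = len(features[0])
    by_class: Dict[int, List[Sequence[float]]] = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)

    def mean(vectors: Sequence[Sequence[float]]) -> List[float]:
        return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    global_mean = mean(features)
    class_means = {c: mean(vs) for c, vs in by_class.items()}
    within = sum(sq_dist(f, class_means[y]) for f, y in zip(features, labels)) / len(features)
    between = sum(sq_dist(mu, global_mean) for mu in class_means.values()) / len(class_means)
    return within / between


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    collapsed = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    spread = [[1.0, 0.5], [1.0, -0.5], [0.0, 1.0], [0.0, 1.0]]
    print(nc1_proxy(collapsed, labels), nc1_proxy(spread, labels))  # 0.0 0.25
```

In a language-modeling setting, the "classes" would be next-token identities and the features the final-layer contextual embeddings, which is why vocabulary imbalance makes the measurement harder.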
# 異好性グラフニューラルネットワークにおけるメッセージパッシングの再検討 Revisiting the Message Passing in Heterophilous Graph Neural Networks ( http://arxiv.org/abs/2405.17768v1 ) ライセンス: Link先を確認	Zhuonan Zheng, Yuanchen Bei, Sheng Zhou, Yao Ma, Ming Gu, HongJia XU, Chengyu Lai, Jiawei Chen, Jiajun Bu,	(参考訳) グラフニューラルネットワーク(GNN)は、隣接ノードが類似した振る舞いを示すというホモフィリー(同類性)の仮定に沿ったメッセージパッシング機構により、グラフマイニングタスクで高い性能を示している。しかし、多くの実世界のグラフでは、接続されたノードが対照的な振る舞いを示すことがあり、これは異好性(heterophilous)パターンと呼ばれ、異好性GNN(HTGNN)への関心が高まっている。メッセージパッシング機構はクラスに無関係な情報を伝播するため異好性グラフには適さないように思われるが、既存の多くのHTGNNで依然として広く使われ、一貫して顕著な成功を収めている。ここから、なぜメッセージパッシングが異好性グラフで有効であり続けるのか、という疑問が生じる。この疑問に答えるため、本稿では異好性グラフニューラルネットワークにおけるメッセージパッシング機構を再検討し、それらを統一された異好性メッセージパッシング(HTMP)機構として再定式化する。HTMPと実証分析に基づき、既存のHTGNNにおけるメッセージパッシングの成功は、クラス間の互換性行列を暗黙的に強化していることに起因することを明らかにする。さらに、実世界の異好性グラフには不完全でノイズの多い意味的近傍が存在するため、互換性行列の潜在能力は完全には発揮されていないと論じる。このギャップを埋めるため、HTMP機構の中で動作し、互換性行列を明示的に活用・改善するCMGNNという新しいアプローチを導入する。10のベンチマークデータセットと13の確立されたベースラインとの比較分析を含む徹底的な評価により、HTMP機構とCMGNN手法の優れた性能が示された。 Graph Neural Networks (GNNs) have demonstrated strong performance in graph mining tasks due to their message-passing mechanism, which is aligned with the homophily assumption that adjacent nodes exhibit similar behaviors. However, in many real-world graphs, connected nodes may display contrasting behaviors, termed heterophilous patterns, which has attracted increased interest in heterophilous GNNs (HTGNNs). Although the message-passing mechanism seems unsuitable for heterophilous graphs due to the propagation of class-irrelevant information, it is still widely used in many existing HTGNNs and consistently achieves notable success. This raises the question: why does message passing remain effective on heterophilous graphs? To answer this question, in this paper, we revisit the message-passing mechanisms in heterophilous graph neural networks and reformulate them into a unified heterophilous message-passing (HTMP) mechanism.
Based on HTMP and empirical analysis, we reveal that the success of message passing in existing HTGNNs is attributed to implicitly enhancing the compatibility matrix among classes. Moreover, we argue that the full potential of the compatibility matrix is not completely achieved due to the existence of incomplete and noisy semantic neighborhoods in real-world heterophilous graphs. To bridge this gap, we introduce a new approach named CMGNN, which operates within the HTMP mechanism to explicitly leverage and improve the compatibility matrix. A thorough evaluation involving 10 benchmark datasets and comparative analysis against 13 well-established baselines highlights the superior performance of the HTMP mechanism and CMGNN method.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
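The compatibility matrix discussed above records, for each class, how a node's neighbors are distributed over classes. A homophilous graph has a heavy diagonal, while a heterophilous one has mass off the diagonal. The sketch below computes one common empirical version from an edge list and node labels, treating edges as undirected. It is an illustration of the concept only. How CMGNN estimates and enhances the matrix (for example, for nodes with incomplete or noisy neighborhoods) is not reproduced here, and `compatibility_matrix` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def compatibility_matrix(
    edges: Sequence[Tuple[int, int]], labels: Sequence[int], num_classes: int
) -> List[List[float]]:
    """Row c gives the fraction of class-c nodes' neighbors that belong to each class."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for u, v in edges:
        # Undirected graph: count the edge from both endpoints.
        counts[labels[u]][labels[v]] += 1
        counts[labels[v]][labels[u]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    # Nodes 0, 1 are class 0; nodes 2, 3 are class 1.
    node_labels = [0, 0, 1, 1]
    edge_list = [(0, 2), (1, 3), (0, 1)]
    print(compatibility_matrix(edge_list, node_labels, num_classes=2))
    # [[0.5, 0.5], [1.0, 0.0]]: class-1 nodes only ever link to class 0 (heterophilous)
```

The diagonal of this matrix is a per-class homophily measure, so it offers a compact way to see why plain neighbor averaging can mix in class-irrelevant information.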
# マイクロサッカードに着想を得たロボティクス用イベントカメラ Microsaccade-inspired Event Camera for Robotics ( http://arxiv.org/abs/2405.17769v1 ) ライセンス: Link先を確認	Botao He, Ze Wang, Yuan Zhou, Jingxi Chen, Chahat Deep Singh, Haojia Li, Yuman Gao, Shaojie Shen, Kaiwei Wang, Yanjun Cao, Chao Xu, Yiannis Aloimonos, Fei Gao, Cornelia Fermuller,	(参考訳) ニューロモルフィック視覚センサ、すなわちイベントカメラは、極めて短い反応時間での視覚知覚を可能にし、高ダイナミックなロボティクス応用への新たな道を開いた。これらのイベントカメラの出力は、動きとテクスチャの両方に依存する。しかし、イベントカメラはカメラの動きと平行な物体のエッジを捉えることができない。これはセンサに固有の問題であるため、アルゴリズムで解決することは難しい。人間の視覚は、小さな不随意の眼球運動という能動的な機構を用いて知覚の消失(perceptual fading)に対処しており、その中で最も顕著なものはマイクロサッカードと呼ばれる。固視中に眼を常にわずかに動かすことで、マイクロサッカードはテクスチャの安定性と持続性を大きく維持できる。マイクロサッカードに着想を得て、低い反応時間と安定したテクスチャを同時に維持できるイベントベースの知覚システムを設計した。この設計では、イベントカメラの開口部の前に回転するウェッジプリズムを取り付け、光の向きを変えてイベントを発生させる。回転するウェッジプリズムの幾何光学により、追加された回転運動をアルゴリズム的に補償でき、外部の動きに依存しない安定したテクスチャの見え方と高い情報出力が得られる。ハードウェアデバイスとソフトウェアソリューションを一つのシステムに統合し、これをArtificial MIcrosaccade-enhanced EVent camera(AMI-EV)と呼ぶ。ベンチマーク比較により、標準カメラとイベントカメラの両方が機能しないシナリオにおいて、AMI-EVの記録が優れたデータ品質を持つことが検証された。様々な実世界実験により、本システムが低レベルと高レベルの両方の視覚タスクにおいてロボットの知覚を促進する可能性が実証された。 Neuromorphic vision sensors or event cameras have made visual perception with extremely low reaction time possible, opening new avenues for highly dynamic robotics applications. The output of these event cameras depends on both motion and texture. However, an event camera fails to capture object edges that are parallel to the camera motion. This is a problem intrinsic to the sensor and therefore challenging to solve algorithmically. Human vision deals with perceptual fading using the active mechanism of small involuntary eye movements, the most prominent of which are called microsaccades. By moving the eyes constantly and slightly during fixation, microsaccades can substantially maintain texture stability and persistence. Inspired by microsaccades, we designed an event-based perception system capable of simultaneously maintaining low reaction time and stable texture.
In this design, a rotating wedge prism was mounted in front of the aperture of an event camera to redirect light and trigger events. The geometrical optics of the rotating wedge prism allows for algorithmic compensation of the additional rotational motion, resulting in a stable texture appearance and high informational output independent of external motion. The hardware device and software solution are integrated into a system, which we call Artificial MIcrosaccade-enhanced EVent camera (AMI-EV). Benchmark comparisons validate the superior data quality of AMI-EV recordings in scenarios where both standard cameras and event cameras fail to deliver. Various real-world experiments demonstrate the potential of the system to facilitate robotics perception both for low-level and high-level vision tasks.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
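The algorithmic compensation mentioned above is possible because a rotating prism adds a *known* motion. The sketch below is a simplified model built on stated assumptions: the thin-prism approximation (deviation ≈ (n − 1)·α), an ideal pinhole camera whose focal length is given in pixels, and a prism rotating at constant angular rate ω with known phase. Under these assumptions the induced image offset traces a circle of radius f·tan(δ). Subtracting that offset at each event's timestamp undoes the added motion. This is not the AMI-EV calibration or compensation model, and all function names and parameters are hypothetical.

```python
import math
from typing import Tuple


def prism_deviation(refractive_index: float, wedge_angle_rad: float) -> float:
    """Thin-prism approximation: deviation is about (n - 1) * alpha, in radians."""
    return (refractive_index - 1.0) * wedge_angle_rad


def prism_offset_px(
    t: float, focal_px: float, deviation_rad: float, omega: float, phase: float = 0.0
) -> Tuple[float, float]:
    """Image-plane shift (pixels) at time t (seconds) for a prism rotating at omega rad/s."""
    radius = focal_px * math.tan(deviation_rad)
    angle = omega * t + phase
    return radius * math.cos(angle), radius * math.sin(angle)


def compensate_event(
    x: float, y: float, t: float, focal_px: float, deviation_rad: float,
    omega: float, phase: float = 0.0,
) -> Tuple[float, float]:
    """Remove the known prism-induced shift from an event's pixel coordinates."""
    dx, dy = prism_offset_px(t, focal_px, deviation_rad, omega, phase)
    return x - dx, y - dy


if __name__ == "__main__":
    delta = prism_deviation(1.5, math.radians(2.0))  # e.g. n = 1.5, 2-degree wedge
    omega = 2.0 * math.pi * 50.0  # assumed 50 revolutions per second
    dx, dy = prism_offset_px(0.013, 500.0, delta, omega)
    # A static scene point at (100, 200) appears shifted; compensation recovers it.
    print(compensate_event(100.0 + dx, 200.0 + dy, 0.013, 500.0, delta, omega))
```

Because the offset depends only on time and fixed optical parameters, the compensation does not need to know anything about the scene or the robot's own motion, which matches the abstract's claim of output independent of external motion.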
# 汎用かつブラインドなRGB-Xトラッカーを目指して Towards a Generalist and Blind RGB-X Tracker ( http://arxiv.org/abs/2405.17773v1 ) ライセンス: Link先を確認	Yuedong Tan, Zongwei Wu, Yuqian Fu, Zhuyun Zhou, Guolei Sun, Chao Ma, Danda Pani Paudel, Luc Van Gool, Radu Timofte,	(参考訳) NLPにおいて多数のタスクをうまく解決できる単一の大規模モデルの登場により、コンピュータビジョンでも同様の目標を達成することへの研究的関心が高まっている。一方で、汎用ビジョンモデルと呼ばれるこれらの汎用モデルの多くは、異なるタスクに対応する統一された出力を生成することを目指している。他方で、既存のモデルの中には、異なる入力タイプ(いわゆるデータモダリティ)を組み合わせ、それを単一の大規模モデルで処理することを目指すものもある。しかし、この組み合わせの段階は依然として特化されたものであり、当初の目標には届いていない。本稿では、RGB-Xビデオ物体追跡の文脈において、このような(統一の際の)特化は不要であることを示す。XTrackと呼ぶ我々の単一モデルトラッカーは、推論時に任意のモダリティXについてブラインドなままでいられる。我々のトラッカーは、共有される共通性に特化したエキスパートと、入力モダリティに条件づけた推論を柔軟に行えるエキスパートからなるモーダルエキスパートの混合を用いる。このような設計により、モダリティ固有の情報表現を弱めることなく、入力モダリティを共通の潜在空間へ統一できる。この考えにより学習過程は極めて単純になり、マルチラベル分類損失とルーティング関数を統合することで、ペアデータのみからでもすべてのモダリティを効果的に整列・統一できる。したがって推論時には、モーダル事前の帰納バイアスに頼ることなく任意のモダリティを採用でき、汎用的な性能を達成できる。特別な工夫なしに、我々の汎用かつブラインドなトラッカーは、一般的に用いられる深度、熱、イベントデータを網羅する3つの補助モダリティにわたる5つのベンチマークにおいて、確立されたモダリティ特化モデルと比べて競争力のある性能を達成できる。 With the emergence of a single large model capable of successfully solving a multitude of tasks in NLP, there has been growing research interest in achieving similar goals in computer vision. On the one hand, most of these generic models, referred to as generalist vision models, aim at producing unified outputs serving different tasks. On the other hand, some existing models aim to combine different input types (a.k.a. data modalities), which are then processed by a single large model. Yet, this step of combination remains specialized, which falls short of serving the initial ambition. In this paper, we show that such specialization (during unification) is unnecessary in the context of RGB-X video object tracking. Our single-model tracker, termed XTrack, can remain blind to any modality X during inference time.
Our tracker employs a mixture of modal experts comprising those dedicated to shared commonality and others capable of flexibly performing reasoning conditioned on input modality. Such a design ensures the unification of input modalities towards a common latent space, without weakening the modality-specific information representation. With this idea, our training process is extremely simple, integrating multi-label classification loss with a routing function, thereby effectively aligning and unifying all modalities together, even from only paired data. Thus, during inference, we can adopt any modality without relying on the inductive bias of the modal prior and achieve generalist performance. Without any bells and whistles, our generalist and blind tracker can achieve competitive performance compared to well-established modal-specific models on 5 benchmarks across 3 auxiliary modalities, covering commonly used depth, thermal, and event data.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
# 教師なしドメイン適応のためのプロトタイプネットワークにおける漸進的消滅ギャップ Gradually Vanishing Gap in Prototypical Network for Unsupervised Domain Adaptation ( http://arxiv.org/abs/2405.17774v1 ) ライセンス: Link先を確認	Shanshan Wang, Hao Zhou, Xun Yang, Zhenwei He, Mengzhu Wang, Xingyi Zhang, Meng Wang,	(参考訳) 教師なしドメイン適応(UDA)は転移学習における重要な問題であり、ラベル付きソースドメインからラベルなしターゲットドメインへ意味情報を転移することを目的としている。近年のUDAモデルの進歩は、ターゲットドメインにおける顕著な汎化能力を示している。しかし、UDAモデルの汎化の限界はいまだ明らかではない。ドメインの不一致が大きすぎると、モデルは分布構造を保持できず、アライメント中に分布の崩壊が起こる。この課題に対処するため、大域的・局所的の両方の観点から転移学習を実現する、Gradually Vanishing Gap in Prototypical Network(GVG-PN)という効率的なUDAフレームワークを提案する。大域的アライメントの観点からは、我々のモデルは分布構造の保持に役立つドメインバイアスのかかった中間ドメインを生成する。ドメイン間の特徴を絡み合わせることで、モデルは分布崩壊のリスクを徐々に低減する。しかし、分布構造を保持するには大域的アライメントのみに頼るのでは不十分である。特徴の内部関係をさらに強化するため、局所的な観点を導入する。グラフ畳み込みネットワーク(GCN)を直感的な手法として用いて特徴間の内部関係を探索し、多様体構造の保持を保証するとともに、ドメインバイアスのかかったプロトタイプを生成する。さらに、特徴間の内部関係の識別性も考慮する。ハードネガティブなペアを分離することでプロトタイプレベルでの識別性を高めるプロコントラスト損失を提案する。GCNとプロコントラスト損失の両方を組み込むことで、モデルは細粒度の意味的関係を十分に探索する。いくつかのUDAベンチマークでの実験により、提案したGVG-PNがSOTAモデルを明確に上回ることが検証された。 Unsupervised domain adaptation (UDA) is a critical problem for transfer learning, which aims to transfer semantic information from a labeled source domain to an unlabeled target domain. Recent advancements in UDA models have demonstrated significant generalization capabilities on the target domain. However, the generalization boundary of UDA models remains unclear. When the domain discrepancy is too large, the model cannot preserve the distribution structure, leading to distribution collapse during alignment. To address this challenge, we propose an efficient UDA framework named Gradually Vanishing Gap in Prototypical Network (GVG-PN), which achieves transfer learning from both global and local perspectives. From the global alignment standpoint, our model generates a domain-biased intermediate domain that helps preserve the distribution structures.
By entangling cross-domain features, our model progressively reduces the risk of distribution collapse. However, relying only on global alignment is insufficient to preserve the distribution structure. To further enhance the inner relationships of features, we introduce a local perspective. We utilize the graph convolutional network (GCN) as an intuitive method to explore the internal relationships between features, ensuring the preservation of manifold structures and generating domain-biased prototypes. Additionally, we consider the discriminability of the inner relationships between features. We propose a pro-contrastive loss to enhance the discriminability at the prototype level by separating hard negative pairs. By incorporating both GCN and the pro-contrastive loss, our model fully explores fine-grained semantic relationships. Experiments on several UDA benchmarks validate that the proposed GVG-PN clearly outperforms state-of-the-art (SOTA) models.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
# 特別設計のアップサンプリングとアテンションによる密な予測のための二値量子化ニューラルネットワーク The Binary Quantized Neural Network for Dense Prediction via Specially Designed Upsampling and Attention ( http://arxiv.org/abs/2405.17776v1 ) ライセンス: Link先を確認	Xingyu Ding, Lianlei Shan, Guiqin Zhao, Meiqi Wu, Wenzhang Zhou, Wei Li,	(参考訳) 深層学習ベースの情報処理は長い時間と膨大な計算資源を必要とし、特にセマンティックセグメンテーションや顕著物体検出のように各ピクセルに出力を必要とする密な予測タスクではそれが顕著である。密な予測タスクの量子化には主に2つの課題がある。第一に、密な予測タスクが必要とするアップサンプリング操作をそのまま適用すると極めて粗くなり、許容できない精度低下を引き起こす。第二に、密な予測ネットワークは構造が複雑なため、量子化の際に高速さと高精度を両立させることが難しい。本稿では、二値ニューラルネットワーク(BNN)の成功を単一予測タスクから密な予測タスクへ移転するための、効果的なアップサンプリング手法と効率的なアテンション計算戦略を提案する。まず、高い精度を達成するために、シンプルで頑健なマルチブランチ並列アップサンプリング構造を設計する。次に、セグメンテーションで重要な役割を果たすが計算量が膨大なアテンション手法をさらに最適化する。我々のアテンション手法は、元の効果を維持しながら計算量を100分の1に削減できる。Cityscapes、KITTI road、ECSSDでの実験により、本手法の有効性が十分に示された。 Deep learning-based information processing is time-consuming and requires huge computing resources, especially for dense prediction tasks that require an output for each pixel, such as semantic segmentation and salient object detection. There are two main challenges in quantizing dense prediction tasks. First, directly applying the upsampling operation that dense prediction tasks require is extremely crude and causes an unacceptable reduction in accuracy. Second, the complex structure of dense prediction networks makes it difficult to maintain both high speed and high accuracy when performing quantization. In this paper, we propose an effective upsampling method and an efficient attention computation strategy to transfer the success of binary neural networks (BNNs) from single prediction tasks to dense prediction tasks. First, we design a simple and robust multi-branch parallel upsampling structure to achieve high accuracy. Then, we further optimize the attention method, which plays an important role in segmentation but has a huge computational complexity.
Our attention method can reduce the computational complexity by a factor of one hundred while retaining the original effect. Experiments on Cityscapes, KITTI road, and ECSSD fully demonstrate the effectiveness of our work.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
# 不均衡な自動運転タスクのための大規模モデルを用いたオンライン解析的エグゼンプラーフリー継続学習 Online Analytic Exemplar-Free Continual Learning with Large Models for Imbalanced Autonomous Driving Task ( http://arxiv.org/abs/2405.17779v1 ) ライセンス: Link先を確認	Huiping Zhuang, Di Fang, Kai Tong, Yuchen Liu, Ziqian Zeng, Xu Zhou, Cen Chen,	(参考訳) 自動運転の分野では、綿密に訓練されたモデルであっても、見慣れないシナリオに直面すると失敗することがある。こうしたシナリオの一つは、オンライン継続学習(OCL)問題として定式化できる。つまり、データはオンライン形式で到着し、モデルはこれらのストリーミングデータに従って更新される。OCLの2つの大きな課題は、破滅的忘却とデータの不均衡である。これらの課題に対処するため、本稿ではAnalytic Exemplar-Free Online Continual Learning(AEF-OCL)を提案する。AEF-OCLは解析的継続学習の原理を活用し、大規模なバックボーンネットワークによって抽出された特徴に対する分類器としてリッジ回帰を用いる。解析解を再帰的に計算することでOCL問題を解き、継続学習とそれに対応する同時学習(joint learning)との等価性を保証するとともに、使用済みサンプルを一切保存する必要がない(すなわちエグゼンプラーフリーである)。さらに、実際の特徴の偏差を再帰的に推定するPseudo-Features Generator(PFG)モジュールを導入する。PFGは正規分布に従うオフセット疑似特徴を生成し、データ不均衡の問題に対処する。実験結果から、エグゼンプラーフリーな戦略であるにもかかわらず、本手法が自動運転データセットSODA10Mにおいて様々な手法を上回ることが示された。ソースコードは https://github.com/ZHUANGHP/Analytic-continual-learning で入手できる。 In the field of autonomous driving, even a meticulously trained model can fail when faced with unfamiliar scenarios. One of these scenarios can be formulated as an online continual learning (OCL) problem. That is, data arrive in an online fashion, and models are updated according to these streaming data. Two major OCL challenges are catastrophic forgetting and data imbalance. To address these challenges, in this paper, we propose Analytic Exemplar-Free Online Continual Learning (AEF-OCL). AEF-OCL leverages analytic continual learning principles and employs ridge regression as a classifier for features extracted by a large backbone network. It solves the OCL problem by recursively calculating the analytical solution, ensuring equivalence between continual learning and its joint-learning counterpart, and works without the need to save any used samples (i.e., it is exemplar-free). Additionally, we introduce a Pseudo-Features Generator (PFG) module that recursively estimates the deviation of real features.
The PFG generates offset pseudo-features following a normal distribution, thereby addressing the data imbalance issue. Experimental results demonstrate that despite being an exemplar-free strategy, our method outperforms various methods on the autonomous driving SODA10M dataset. Source code is available at https://github.com/ZHUANGHP/Analytic-continual-learning.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
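The claim that a recursively computed analytic solution matches joint learning can be illustrated with the textbook recursive form of ridge regression (recursive least squares with a Sherman-Morrison update). This sketch assumes that generic form captures the "analytic" classifier update described in the abstract. The frozen backbone and the PFG module are omitted, features are assumed to be given, and `RecursiveRidge` is a hypothetical name. Only the d×d matrix `R` and the weights `W` are stored, so no past samples are kept, which is what makes the update exemplar-free.

```python
from typing import List


class RecursiveRidge:
    """Ridge-regression classifier head updated one sample at a time (RLS form)."""

    def __init__(self, dim: int, num_outputs: int, gamma: float = 1.0) -> None:
        # R tracks (X^T X + gamma * I)^{-1}; start from (gamma * I)^{-1}.
        self.R = [[1.0 / gamma if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.W = [[0.0] * num_outputs for _ in range(dim)]

    def predict(self, x: List[float]) -> List[float]:
        return [sum(x[i] * self.W[i][c] for i in range(len(x))) for c in range(len(self.W[0]))]

    def update(self, x: List[float], y: List[float]) -> None:
        d = len(x)
        rx = [sum(self.R[i][j] * x[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(x[i] * rx[i] for i in range(d))
        # Sherman-Morrison: R <- R - (R x)(R x)^T / (1 + x^T R x); R stays symmetric.
        for i in range(d):
            for j in range(d):
                self.R[i][j] -= rx[i] * rx[j] / denom
        gain = [v / denom for v in rx]  # equals the updated R times x
        pred = self.predict(x)  # prediction with the weights before this sample
        for i in range(d):
            for c in range(len(y)):
                self.W[i][c] += gain[i] * (y[c] - pred[c])


if __name__ == "__main__":
    stream = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0]), ([1.0, 1.0], [1.0, 0.0])]
    head = RecursiveRidge(dim=2, num_outputs=2, gamma=1.0)
    for feats, onehot in stream:
        head.update(feats, onehot)
    # Joint ridge solution (X^T X + I)^{-1} X^T Y for the same data:
    print(head.W)  # [[0.625, -0.125], [0.125, 0.375]]
```

For these three samples the joint solution works out to W = [[5/8, -1/8], [1/8, 3/8]], and the streaming head reaches the same weights regardless of the order in which the samples arrive.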
# 2準位系としての散逸誘起束縛状態 Dissipation-induced bound states as a two-level system ( http://arxiv.org/abs/2405.17781v1 ) ライセンス: Link先を確認	Hong Peng Zhang, Zhi Song,	(参考訳) ポテンシャル井戸は、量子粒子を閉じ込めて離散的なエネルギー準位を形成させるために用いられ、人工的な少数準位系として機能する。対照的に、反パリティ時間($\mathcal{PT}$)対称系は実エネルギー準位を1対だけ持つことができ、残りの準位はすべてエネルギーの虚部が負であるために不安定となる。本研究では、調和型の虚数ポテンシャルによって誘起される強束縛鎖における束縛状態の形成を調べる。厳密解により、エネルギー準位の実部は等間隔であり、虚部は非正かつ等間隔であることが示される。これにより、有効な2準位系を形成できる。幅広い形状をもつ与えられた初期状態に対して、時間発展した状態は常に2つの安定な固有状態の重ね合わせに収束する。さらに、これら2つの状態はディラック内積の下で直交しており、線形場の$\pi$パルスを印加することで相互に切り替えることができる。本研究の知見は、散逸を通じて量子デバイスを作製する代替手法を提供する。 Potential wells are employed to constrain quantum particles into forming discrete energy levels, acting as artificial few-level systems. In contrast, an anti-parity-time ($\mathcal{PT}$) symmetric system can have a single pair of real energy levels, while all the remaining levels are unstable due to the negative imaginary part of the energy. In this work, we investigate the formation of bound states in a tight-binding chain induced by a harmonic imaginary potential. Exact solutions show that the real parts of the energy levels are equidistant, while the imaginary parts are non-positive and equidistant. This allows for the formation of an effective two-level system. For a given initial state with a wide range of profiles, the evolved state always converges to a superposition of two stable eigenstates. In addition, these two states are orthogonal under the Dirac inner product and can be mutually switched by applying a $\pi$ pulse of a linear field. Our finding provides an alternative method for fabricating quantum devices through dissipation.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
# ポストフェア連合学習:後処理による連合学習におけるグループ公平性とコミュニティ公平性の達成 Post-Fair Federated Learning: Achieving Group and Community Fairness in Federated Learning via Post-processing ( http://arxiv.org/abs/2405.17782v1 ) ライセンス: Link先を確認	Yuying Duan, Yijun Tian, Nitesh Chawla, Michael Lemmon,	(参考訳) Federated Learning(FL)は、地域コミュニティの集合が共同で共有グローバルモデルを学習し、各コミュニティ内ですべてのトレーニングデータをローカルに保持する分散機械学習フレームワークである。グループ公平性とコミュニティ公平性の2つの概念が近年,連合学習の重要な課題として浮上している。グループ公平性は、モデルの決定が、人種や性別のような法的に保護された属性のセットに基づいて特定のグループを優遇しないことを要求する。コミュニティ公平性は、グローバルモデルが、協力するすべてのコミュニティで、同様のレベルのパフォーマンス(正確性)を示すことを要求する。どちらの公平性の概念もFLフレームワーク内で共存できるが、既存の文献はどちらか一方の概念のみに焦点を当ててきた。本稿では、後処理型の公平な連合学習(FFL)フレームワークであるPost-FFLを提案し、分析する。Post-FFLは、グローバルモデルの有用性を最大化しながら、グループとコミュニティの公平性を同時に強制するために線形計画法を使用する。 Post-FFLは後処理のアプローチであるため、収束特性がよく理解されている既存のFLトレーニングパイプラインで使用することができる。本稿では、実世界のデータセットにPost-FFLを適用し、例えば病院ネットワークがコミュニティの医療を提供するために連合学習をどのように利用するかを模倣する。理論的な結果は、Post-FFLが両方の公平性を強制した場合に失われる精度の上界を与える。実験の結果、Post-FFLはFLにおけるグループ公平性とコミュニティ公平性を同時に改善することがわかった。さらに、Post-FFLは、両方の公平性の改善、通信効率、計算コストの点で、既存のインプロセッシング型の公平な連合学習よりも優れている。 Federated Learning (FL) is a distributed machine learning framework in which a set of local communities collaboratively learn a shared global model while retaining all training data locally within each community. Two notions of fairness have recently emerged as important issues for federated learning: group fairness and community fairness. Group fairness requires that a model's decisions do not favor any particular group based on a set of legally protected attributes such as race or gender. Community fairness requires that global models exhibit similar levels of performance (accuracy) across all collaborating communities. Both fairness concepts can coexist within an FL framework, but the existing literature has focused on either one concept or the other. This paper proposes and analyzes a post-processing fair federated learning (FFL) framework called post-FFL. 
Post-FFL uses a linear program to simultaneously enforce group and community fairness while maximizing the utility of the global model. Because Post-FFL is a post-processing approach, it can be used with existing FL training pipelines whose convergence properties are well understood. This paper uses post-FFL on real-world datasets to mimic how hospital networks, for example, use federated learning to deliver community health care. Theoretical results bound the accuracy lost when post-FFL enforces both notions of fairness. Experimental results illustrate that post-FFL simultaneously improves both group and community fairness in FL. Moreover, post-FFL outperforms existing in-processing fair federated learning methods in terms of improving both notions of fairness, communication efficiency, and computation cost.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
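The two fairness notions above can be made concrete with a small measurement sketch. It assumes demographic parity (the spread of positive-prediction rates across protected groups) as the group-fairness metric and the max-min accuracy spread across communities as the community-fairness metric. The abstract does not say which metrics the paper uses. The sketch only measures the two gaps and does not reproduce Post-FFL's linear program.

```python
from collections import defaultdict


def demographic_parity_gap(preds, groups):
    """Largest difference in positive-prediction rate between protected groups.

    preds: binary predictions (0/1); groups: protected-attribute value per sample.
    """
    positives = defaultdict(int)
    totals = defaultdict(int)
    for p, g in zip(preds, groups):
        totals[g] += 1
        positives[g] += p
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


def community_accuracy_gap(preds, labels, communities):
    """Largest difference in accuracy between communities (FL clients)."""
    correct = defaultdict(int)
    totals = defaultdict(int)
    for p, y, c in zip(preds, labels, communities):
        totals[c] += 1
        correct[c] += int(p == y)
    accuracies = [correct[c] / totals[c] for c in totals]
    return max(accuracies) - min(accuracies)


if __name__ == "__main__":
    preds = [1, 0, 1, 1]
    print(demographic_parity_gap(preds, ["a", "a", "b", "b"]))  # 0.5
    print(community_accuracy_gap(preds, [1, 0, 0, 1], ["h1", "h1", "h2", "h2"]))  # 0.5
```

A post-processing method in the spirit of Post-FFL would adjust the global model's decisions so that both gaps shrink while giving up as little accuracy as possible.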
# Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation ( http://arxiv.org/abs/2405.17784v1 ) ライセンス: Link先を確認	Ignat Georgiev, Krishnan Srinivasan, Jie Xu, Eric Heiden, Animesh Garg,	(参考訳) 方策勾配定理を利用したモデルフリー強化学習(MFRL)は連続制御タスクにおいてかなりの成功を収めた。しかし、これらのアプローチは、ゼロ階勾配推定に起因する勾配の高い分散に悩まされ、その結果、準最適な方策がもたらされる。逆に、一階モデルベース強化学習(FO-MBRL)法は、微分可能シミュレーションを用いて分散の少ない勾配を提供するが、物理的接触などの硬い(stiff)ダイナミクスを含むシナリオではサンプリング誤差の影響を受けやすい。本稿では,この誤差の原因を調査し,硬いダイナミクスを避けるようにモデルベースのホライズンを適応させて勾配誤差を低減するFO-MBRLアルゴリズムであるAdaptive Horizon Actor-Critic (AHAC)を導入する。実験の結果,AHACはMFRLベースラインを上回り,一連のロコモーションタスクで40%多い報酬を達成し,壁時計時間の効率を改善しつつ高次元制御環境に効率よくスケールできることがわかった。 Model-Free Reinforcement Learning (MFRL), leveraging the policy gradient theorem, has demonstrated considerable success in continuous control tasks. However, these approaches are plagued by high gradient variance due to zeroth-order gradient estimation, resulting in suboptimal policies. Conversely, First-Order Model-Based Reinforcement Learning (FO-MBRL) methods, employing differentiable simulation, provide gradients with reduced variance but are susceptible to sampling error in scenarios involving stiff dynamics, such as physical contact. This paper investigates the source of this error and introduces Adaptive Horizon Actor-Critic (AHAC), an FO-MBRL algorithm that reduces gradient error by adapting the model-based horizon to avoid stiff dynamics. Empirical findings reveal that AHAC outperforms MFRL baselines, attaining 40% more reward across a set of locomotion tasks, and efficiently scaling to high-dimensional control environments with improved wall-clock-time efficiency.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
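The core AHAC idea is to shorten the differentiable rollout so that it stops before stiff contact dynamics. A hypothetical truncation rule can illustrate this. The threshold test on contact-force magnitude and the names `adaptive_horizon`, `contact_forces` and `force_threshold` are assumptions made for illustration. They are not the paper's mechanism, which the abstract does not describe.

```python
def adaptive_horizon(contact_forces, max_horizon, force_threshold):
    """Return how many rollout steps to backpropagate through.

    Hypothetical rule (not AHAC's actual algorithm): stop the first-order
    rollout just before the first step whose contact-force magnitude exceeds
    `force_threshold`, treating that step as "stiff". The result never
    exceeds `max_horizon` and is at least 1 step.
    """
    for t, force in enumerate(contact_forces[:max_horizon]):
        if abs(force) > force_threshold:
            return max(1, t)
    return min(max_horizon, len(contact_forces))


if __name__ == "__main__":
    # A contact spike at step 2 truncates the horizon to 2 steps.
    print(adaptive_horizon([0.1, 0.2, 5.0, 0.1], max_horizon=32, force_threshold=1.0))
```

Truncating before the spike keeps the stiff step out of the backpropagated gradient, at the cost of a shorter horizon for that rollout.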
# 道路安全の強化:畳み込みニューラルネットワークによるドライバーの注意散漫のリアルタイム検出 Enhancing Road Safety: Real-Time Detection of Driver Distraction through Convolutional Neural Networks ( http://arxiv.org/abs/2405.17788v1 ) ライセンス: Link先を確認	Amaan Aijaz Sheikh, Imaad Zaffar Khan,	(参考訳) 毎日の通勤をナビゲートする中で、注意をそらされたドライバーが起こす脅威は大きなもので、交通事故が急増する。この安全性の懸念に対処するため、我々のプロジェクトは畳み込みニューラルネットワーク(CNN)の分析力を活用し、確立されたモデルであるVGG16とVGG19に特に重点を置いている。これらのモデルは、画像認識における精度が評価され、様々な環境条件下での運転行動のニュアンスを検出する能力について慎重にテストされている。本研究は,様々なCNNアーキテクチャとの比較分析を通じて,運転者の注意散漫をリアルタイムに検出するための最も効率的なモデルを特定することを目的とする。最終的な目的は、この発見を車両の安全システムに組み込むことであり、不注意によって引き起こされる事故を防ぐ能力を大幅に向上させることである。この研究は、自動車安全技術の理解を深めるだけでなく、ドライバーの行動に直感的に適合する車両を実現し、より安全な道路を確保するための重要な一歩でもある。 As we navigate our daily commutes, the threat posed by a distracted driver is substantial, resulting in a troubling rise in traffic accidents. Addressing this safety concern, our project harnesses the analytical power of Convolutional Neural Networks (CNNs), with a particular emphasis on the well-established models VGG16 and VGG19. These models are acclaimed for their precision in image recognition and are meticulously tested for their ability to detect nuances in driver behavior under varying environmental conditions. Through a comparative analysis against an array of CNN architectures, this study seeks to identify the most efficient model for real-time detection of driver distractions. The ultimate aim is to incorporate the findings into vehicle safety systems, significantly boosting their capability to prevent accidents triggered by inattention. This research not only enhances our understanding of automotive safety technologies but also marks a pivotal step towards creating vehicles that are intuitively aligned with driver behaviors, ensuring safer roads for all.	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
# Instruct-ReID++:Universal Purpose Instruction-Guided Person Re-identificationを目指して Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification ( http://arxiv.org/abs/2405.17790v1 ) ライセンス: Link先を確認	Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang,	(参考訳) 人間の知性は、視覚的および言語的記述の両方に従って、任意の人物を検索することができる。しかし、現在のコンピュータビジョンコミュニティは、異なるシナリオにおける特定の人物再識別(ReID)タスクを別々に研究しており、現実世界の応用を制限している。本稿では、与えられた画像や言語命令に従って画像を取得する必要がある新しい命令-ReIDタスクを提案することで、この問題を解決する。 Instruct-ReIDは一般的なReID設定の最初の探索であり、既存の6つのReIDタスクを異なる命令を割り当てることで特別なケースとして見ることができる。そこで本研究では,タスク固有およびタスクフリーの評価設定など,多種多様なデータと包括的評価手法を備えた大規模OmniReID++ベンチマークを提案する。タスク固有の評価設定では、ギャラリーセットは特定のReIDタスクに従って分類される。本稿では,適応的トリプレット損失により様々な検索タスクを統一フレームワーク内で扱う新しいベースラインモデル IRM を提案する。タスクに依存しないギャラリーセットから対象人物画像が検索されるタスクフリー評価設定では、新しいメモリバンク支援学習を用いたIRM++と呼ばれる新しい手法を提案する。 OmniReID++ ベンチマークによる IRM と IRM++ の大規模評価は,提案手法の優位性を実証し,10 個のテストセット上での最先端性能を実現した。データセット、モデル、コードはhttps://github.com/hwz-zju/Instruct-ReIDで入手できる。 Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve images according to the given image or language instructions. Instruct-ReID is the first exploration of a general ReID setting, where six existing ReID tasks can be viewed as special cases by assigning different instructions. To facilitate research in this new instruct-ReID task, we propose a large-scale OmniReID++ benchmark equipped with diverse data and comprehensive evaluation methods, e.g., task-specific and task-free evaluation settings. In the task-specific evaluation setting, gallery sets are categorized according to specific ReID tasks. 
We propose a novel baseline model, IRM, with an adaptive triplet loss to handle various retrieval tasks within a unified framework. For task-free evaluation setting, where target person images are retrieved from task-agnostic gallery sets, we further propose a new method called IRM++ with novel memory bank-assisted learning. Extensive evaluations of IRM and IRM++ on OmniReID++ benchmark demonstrate the superiority of our proposed methods, achieving state-of-the-art performance on 10 test sets. The datasets, the model, and the code will be available at https://github.com/hwz-zju/Instruct-ReID	翻訳日:2024-05-29 22:32:09 公開日:2024-05-28
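The abstract does not define IRM's adaptive triplet loss. For reference, here is a sketch of the standard fixed-margin triplet loss that such variants build on. The margin value of 0.3 is an assumption, and the adaptive part is deliberately not reproduced.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard (non-adaptive) triplet margin loss.

    Penalizes the anchor-positive distance whenever it is not at least
    `margin` smaller than the anchor-negative distance.
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


if __name__ == "__main__":
    # The negative is well separated, so the loss is zero.
    print(triplet_loss((0.0, 0.0), (0.0, 1.0), (0.0, 3.0)))
```

In a ReID setting the anchor and positive are embeddings of the same identity and the negative is a different identity.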
# SafeguardGS:破滅的なシーン破壊を回避する3Dガウスプリミティブの枝刈り SafeguardGS: 3D Gaussian Primitive Pruning While Avoiding Catastrophic Scene Destruction ( http://arxiv.org/abs/2405.17793v1 ) ライセンス: Link先を確認	Yongjae Lee, Zhaoliang Zhang, Deliang Fan,	(参考訳) 3D Gaussian Splatting (3DGS)は、リアルタイムレンダリング速度を達成しつつ、トップノッチレンダリングの品質を実証し、新しいビュー合成において大きな進歩を遂げた。しかし、3DGSの最適でない高密度化プロセスに起因する過度に多くのガウスプリミティブは、フレーム/秒(FPS)を低下させ、かなりのメモリコストを必要とするため、ローエンドデバイスでは好ましくない。この問題に対処するために、多くのフォローアップ研究は、レンダリング性能を最適化するために、様々なプルーニング技術(しばしば異なるスコア関数と組み合わせて)を提案している。それでも、すべてのテクニックに対する効果と影響に関する包括的な議論は欠落している。本稿では,まず3DGSプルーニング手法を、プリミティブの順位付けの方法が異なる2つのタイプに分類する:クロスビュープルーニングとピクセルワイズプルーニング。その後の実験では,極端なガウスプリミティブ間引きの下でのクロスビュープルーニングは破滅的な品質低下をもたらすが,ピクセルワイズプルーニング技術は比較的高いレンダリング品質を維持できるだけでなく,妥当なプルーニングの下限も提供する。そこで本研究では,複数種類のスコア関数を提案し,レンダリングにとって重要でないプリミティブを識別する上で,色重み付きスコア関数が他より優れていることを実証的に発見した。我々の研究は、今後の研究において3DGSプルーニング戦略を最適化するための貴重な洞察を提供すると信じている。 3D Gaussian Splatting (3DGS) has made a significant stride in novel view synthesis, demonstrating top-notch rendering quality while achieving real-time rendering speed. However, the excessively large number of Gaussian primitives resulting from 3DGS' suboptimal densification process poses a major challenge, reducing frames per second (FPS) and demanding considerable memory cost, making it unfavorable for low-end devices. To cope with this issue, many follow-up studies have suggested various pruning techniques, often in combination with different score functions, to optimize rendering performance. Nonetheless, a comprehensive discussion regarding their effectiveness and implications across all techniques is missing. In this paper, we first categorize 3DGS pruning techniques into two types, cross-view pruning and pixel-wise pruning, which differ in how they rank primitives. 
Our subsequent experiments reveal that while cross-view pruning leads to disastrous quality drops under extreme Gaussian primitive decimation, the pixel-wise pruning technique not only sustains relatively high rendering quality with minuscule performance degradation but also provides a reasonable minimum boundary for pruning. Building on this observation, we further propose multiple variations of score functions and empirically discover that the color-weighted score function outperforms others for discriminating insignificant primitives for rendering. We believe our research provides valuable insights for optimizing 3DGS pruning strategies in future work.	翻訳日:2024-05-29 22:22:25 公開日:2024-05-28
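The following sketch shows one way pixel-wise pruning could rank primitives, under stated assumptions. It uses the standard front-to-back alpha-compositing weight of 3DGS, $w_i = \alpha_i \prod_{j<i}(1-\alpha_j)$, and keeps a primitive if it is among the top-$k$ contributors of at least one pixel. The color-weighted score (blending weight times RGB magnitude) and the input format are hypothetical and may differ from the paper's definitions.

```python
def blending_weights(alphas):
    """Front-to-back compositing weights w_i = alpha_i * prod_{j<i} (1 - alpha_j)."""
    weights, transmittance = [], 1.0
    for a in alphas:
        weights.append(a * transmittance)
        transmittance *= 1.0 - a
    return weights


def pixelwise_prune(pixels, k, color_weighted=True):
    """Return the ids of primitives that survive pixel-wise pruning.

    `pixels` maps a pixel id to a depth-sorted list of (primitive_id, alpha, rgb).
    A primitive is kept if it ranks in the top-k of at least one pixel.
    The color-weighted score (weight * |rgb|) is an assumed form.
    """
    keep = set()
    for contribs in pixels.values():
        weights = blending_weights([alpha for _, alpha, _ in contribs])
        scored = []
        for (pid, _, rgb), w in zip(contribs, weights):
            color_norm = sum(c * c for c in rgb) ** 0.5 if color_weighted else 1.0
            scored.append((w * color_norm, pid))
        scored.sort(reverse=True)  # highest score first
        keep.update(pid for _, pid in scored[:k])
    return keep


if __name__ == "__main__":
    white, red, black = (1.0, 1.0, 1.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)
    pixels = {
        0: [(1, 0.9, white), (2, 0.5, white), (3, 0.5, white)],
        1: [(3, 0.8, red), (4, 0.1, black)],
    }
    print(sorted(pixelwise_prune(pixels, k=1)))  # [1, 3]
```

A primitive that is occluded everywhere (such as primitive 2 above with `k=1`) is pruned. A primitive that dominates even a single pixel survives, which suggests why pixel-wise ranking avoids the scene destruction that aggressive global ranking can cause.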
# 層ごとの探索-活用トレードオフによる文脈MDPのオフラインオラクル効率的学習 Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff ( http://arxiv.org/abs/2405.17796v1 ) ライセンス: Link先を確認	Jian Qian, Haichen Hu, David Simchi-Levi,	(参考訳) 近年の、文脈的バンディットからオフライン回帰への統計的・計算的な帰着(Simchi-Levi and Xu, 2021)の発見に動機づけられ、我々はホライズンH(H層のCMDPとしても知られる)を持つ一般的な(確率的)文脈マルコフ決定過程(CMDP)問題に取り組む。本稿では, 実現可能性の仮定, すなわち真の基盤となるCMDPを含むモデルクラス M が事前に与えられるという仮定の下で, CMDP からオフライン密度推定への帰着を導入する。我々は,全Tラウンドの相互作用を通じてオフライン密度推定アルゴリズム(オラクル)の呼び出しが$O(H\log T)$回のみで済む,効率的で統計的にほぼ最適なアルゴリズムを開発した。この回数は、T が事前に知られている場合、$O(H\log\log T)$ にさらに削減できる。本研究は, モデルクラスに構造的仮定を課さずに, CMDPからオフライン密度推定への効率的かつほぼ最適な帰着を初めて与えたものである。本アルゴリズムの特筆すべき特徴は,CMDPの層構造に対応するよう設計された,層ごとの探索-活用トレードオフである。さらに,本アルゴリズムは汎用的で,報酬なし強化学習における純粋探索タスクにも適用可能である。 Motivated by the recent discovery of a statistical and computational reduction from contextual bandits to offline regression (Simchi-Levi and Xu, 2021), we address the general (stochastic) Contextual Markov Decision Process (CMDP) problem with horizon H (also known as CMDP with H layers). In this paper, we introduce a reduction from CMDPs to offline density estimation under the realizability assumption, i.e., a model class M containing the true underlying CMDP is provided in advance. We develop an efficient, statistically near-optimal algorithm requiring only $O(H\log T)$ calls to an offline density estimation algorithm (or oracle) across all T rounds of interaction. This number can be further reduced to $O(H\log\log T)$ if T is known in advance. Our results mark the first efficient and near-optimal reduction from CMDPs to offline density estimation without imposing any structural assumptions on the model class. A notable feature of our algorithm is the design of a layerwise exploration-exploitation tradeoff tailored to address the layerwise structure of CMDPs. Additionally, our algorithm is versatile and applicable to pure exploration tasks in reward-free reinforcement learning.	翻訳日:2024-05-29 22:22:25 公開日:2024-05-28
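The gap between $O(H\log T)$ and $O(H\log\log T)$ oracle calls can be illustrated by counting epochs under two epoch schedules. The sketch assumes schedules in the style of FALCON (Simchi-Levi and Xu, 2021): doubling epochs $\tau_m = 2^m$ when $T$ is unknown, and $\tau_m = 2T^{1-2^{-m}}$ when $T$ is known. It also assumes one oracle call per layer per epoch. Both are assumptions about how the stated bounds arise, not a transcription of the paper's algorithm.

```python
import math


def epochs_doubling(T):
    """Epochs with tau_m = 2**m until the schedule covers T rounds (T unknown)."""
    m = 1
    while 2 ** m < T:
        m += 1
    return m


def epochs_known_T(T):
    """Epochs with tau_m = 2 * T**(1 - 2**-m) until tau_m >= T.

    tau_m >= T  <=>  T**(2**-m) <= 2  <=>  log2(T) / 2**m <= 1,
    so the count grows like log2(log2(T)).
    """
    m = 1
    while math.log2(T) / 2 ** m > 1:
        m += 1
    return m


def oracle_calls(H, T, known_T=False):
    """Assumed cost model: one density-estimation call per layer per epoch."""
    epochs = epochs_known_T(T) if known_T else epochs_doubling(T)
    return H * epochs


if __name__ == "__main__":
    H, T = 5, 2 ** 20
    print(oracle_calls(H, T), oracle_calls(H, T, known_T=True))  # 100 25
```

With $T = 2^{20}$ and $H = 5$, the doubling schedule needs 20 epochs while the known-$T$ schedule needs 5. This shows the $\log T$ versus $\log\log T$ scaling.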
# 言語モデルにおけるパラメータの活性化パターンの探索 Exploring Activation Patterns of Parameters in Language Models ( http://arxiv.org/abs/2405.17799v1 ) ライセンス: Link先を確認	Yudong Wang, Damai Dai, Zhifang Sui,	(参考訳) ほとんどの研究は、大規模言語モデルを内部の動作メカニズムを深く理解せずにブラックボックスとして扱っている。 LLMの内部表現を説明するために,モデルパラメータの活性化レベルを評価するための勾配に基づく指標を提案する。この指標に基づいて3つの予備的な知見を得た。 1)入力が同じドメインにある場合、浅い層のパラメータは密に活性化される。すなわち、パラメータの大部分が出力に大きな影響を与える。対照的に、深い層のパラメータは疎に活性化される。 2) 入力が異なるドメインにまたがる場合, 浅い層内のパラメータは, 深い層よりも活性化挙動において高い類似性を示す。 3) 深い層では, 活性化パラメータの分布の類似性は経験的なデータの関連性と正の相関を示した。さらに,これらの知見を裏付けるための3つの検証実験を行った。 1) 第一の知見から, 層ごとに異なる枝刈り率を設定することを試み, この手法がモデルの枝刈りに有用であることを見出した。 2) あるキャリブレーションセットに基づいて枝刈りしたモデルは、キャリブレーションタスクに関連しないタスクよりも関連するタスクをうまく処理できることがわかり、これは第2の知見を裏付ける。第三に、STS-B と SICK のベンチマークから、一貫したセマンティクスを持つ2つの文は、深い層で同様のパラメータ活性化パターンを共有する傾向にあり、これは第3の知見と一致する。我々の研究は、LLMにおけるパラメータ活性化の挙動に光を当てており、これらの知見がより実用的な応用を刺激することを願っている。 Most work treats large language models as black boxes without in-depth understanding of their internal working mechanism. In order to explain the internal representations of LLMs, we propose a gradient-based metric to assess the activation level of model parameters. Based on this metric, we obtain three preliminary findings. (1) When the inputs are in the same domain, parameters in the shallow layers will be activated densely, which means a larger portion of parameters will have great impacts on the outputs. In contrast, parameters in the deep layers are activated sparsely. (2) When the inputs span different domains, parameters in shallow layers exhibit higher similarity in the activation behavior than deep layers. (3) In deep layers, the similarity of the distributions of activated parameters is positively correlated with the empirical data relevance. Further, we develop three validation experiments to solidify these findings. (1) Firstly, starting from the first finding, we attempt to configure different prune ratios for different layers, and find this method can benefit model pruning. 
(2) Secondly, we find that a model pruned with one calibration set handles tasks related to the calibration task better than unrelated ones, which validates the second finding. (3) Thirdly, based on the STS-B and SICK benchmarks, we find that two sentences with consistent semantics tend to share similar parameter activation patterns in deep layers, which aligns with our third finding. Our work sheds light on the behavior of parameter activation in LLMs, and we hope these findings will inspire more practical applications.	翻訳日:2024-05-29 22:22:24 公開日:2024-05-28
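The abstract does not give the formula for its gradient-based activation metric. A common gradient-based importance score is the first-order Taylor term $|\theta_i \, \partial L / \partial \theta_i|$. The sketch below uses that score, assumed here, on a toy linear model with squared-error loss. It then counts the fraction of parameters above a threshold as a stand-in for "dense" versus "sparse" activation.

```python
def squared_error_grad(w, x, y):
    """Gradient of L = (w . x - y)^2 with respect to each weight w_i."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [2.0 * residual * xi for xi in x]


def activation_scores(w, x, y):
    """Assumed metric: first-order importance |w_i * dL/dw_i| per parameter."""
    return [abs(wi * gi) for wi, gi in zip(w, squared_error_grad(w, x, y))]


def activated_fraction(scores, threshold):
    """Share of parameters whose score exceeds the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


if __name__ == "__main__":
    scores = activation_scores([1.0, 2.0, 0.0], [1.0, 1.0, 1.0], 0.0)
    print(scores, activated_fraction(scores, threshold=1.0))
```

Comparing the sets of activated parameters for inputs from different domains, for example with a Jaccard index, would give a per-layer similarity of the kind described in findings (2) and (3).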
# タンパク質変異効果予測のための多レベル相互作用モデリング Multi-level Interaction Modeling for Protein Mutational Effect Prediction ( http://arxiv.org/abs/2405.17802v1 ) ライセンス: Link先を確認	Yuanle Mo, Xin Hong, Bowen Gao, Yinjun Jia, Yanyan Lan,	(参考訳) タンパク質とタンパク質の相互作用は多くの生物学的過程において中心的なメディエーターである。変異が相互作用に与える影響を正確に予測することは、これらの相互作用の調節を導くのに不可欠であり、治療法の開発や創薬において重要な役割を果たす。変異は一般に3つのレベルにわたって階層的に相互作用に影響を与える。変異残基は異なる側鎖配座を示し、それが主鎖配座の変化をもたらし、最終的にタンパク質間の結合親和性に影響を与える。しかし、既存の手法は一般的に側鎖レベルの相互作用モデリングにのみ焦点をあてており、結果として準最適な予測をもたらす。本研究では, よく設計された事前学習目的によって3つのレベルすべての相互作用を完全に捉える, 自己教師型マルチレベル事前学習フレームワークProMIMを提案する。実験では、ProMIMは標準ベンチマークのすべてのベースラインを上回り、特に主鎖配座に大きな変化が生じうる変異に対して優れた性能を示す。さらに、SARS-CoV-2変異効果予測および抗体最適化のためのゼロショット評価の結果は、新しい治療法や新薬を開発するための強力な次世代ツールとしてのProMIMの可能性を示している。 Protein-protein interactions are central mediators in many biological processes. Accurately predicting the effects of mutations on interactions is crucial for guiding the modulation of these interactions, thereby playing a significant role in therapeutic development and drug discovery. Mutations generally affect interactions hierarchically across three levels: mutated residues exhibit different sidechain conformations, which lead to changes in the backbone conformation, eventually affecting the binding affinity between proteins. However, existing methods typically focus only on sidechain-level interaction modeling, resulting in suboptimal predictions. In this work, we propose a self-supervised multi-level pre-training framework, ProMIM, to fully capture all three levels of interactions with well-designed pretraining objectives. Experiments show ProMIM outperforms all the baselines on the standard benchmark, especially on mutations where significant changes in backbone conformations may occur. In addition, leading results from zero-shot evaluations for SARS-CoV-2 mutational effect prediction and antibody optimization underscore the potential of ProMIM as a powerful next-generation tool for developing novel therapeutic approaches and new drugs.	翻訳日:2024-05-29 22:22:24 公開日:2024-05-28
# 文法的誤り訂正のための一般言語モデルによる検出補正構造 Detection-Correction Structure via General Language Model for Grammatical Error Correction ( http://arxiv.org/abs/2405.17804v1 ) ライセンス: Link先を確認	Wei Li, Houfeng Wang,	(参考訳) 文法的誤り訂正(英: Grammatical error correction, GEC)とは、最小限の編集でテキストを修正するタスクであり、検出と訂正の2つの要素に分離できる。しかし、以前の研究は主に直接修正に焦点を合わせており、両者を単一のモデルに統合する以前の試みは存在しなかった。さらに,大規模言語モデル (LLM) による検出補正パラダイムの探索も未開発である。本稿では,ジェネラル言語モデル(GLM)に基づく,DeCoGLMという名前の統合的な検出補正構造を提案する。検出フェーズはフォールトトレラント検出テンプレートを使用し、補正フェーズは自己回帰的なマスク補完(infilling)を利用して局所的な誤り訂正を行う。入力トークンの戦略的構成とアテンションマスクの修正により,単一モデル内でのマルチタスク学習が促進される。我々のモデルは、英語と中国語のGECデータセットにおいて最先端モデルと競合する性能を示す。さらなる実験では、LLMにおける検出補正構造の有効性が示され、GECにとって有望な方向が示唆された。 Grammatical error correction (GEC) is a task dedicated to rectifying texts with minimal edits, which can be decoupled into two components: detection and correction. However, previous works have predominantly focused on direct correction, with no prior efforts to integrate both into a single model. Moreover, the exploration of the detection-correction paradigm by large language models (LLMs) remains underdeveloped. This paper introduces an integrated detection-correction structure, named DeCoGLM, based on the General Language Model (GLM). The detection phase employs a fault-tolerant detection template, while the correction phase leverages autoregressive mask infilling for localized error correction. Through the strategic organization of input tokens and modification of attention masks, we facilitate multi-task learning within a single model. Our model demonstrates competitive performance against the state-of-the-art models on English and Chinese GEC datasets. Further experiments demonstrate the effectiveness of the detection-correction structure in LLMs, suggesting a promising direction for GEC.	翻訳日:2024-05-29 22:22:24 公開日:2024-05-28
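A sketch can show how GEC decouples into detection and correction. The example derives both targets from a (source, corrected) sentence pair with Python's `difflib`. This is a common, generic way to obtain edit labels. It is not DeCoGLM's fault-tolerant detection template, which the abstract does not spell out. Whitespace tokenization is a simplifying assumption.

```python
import difflib


def edit_spans(source, corrected):
    """Token-level edits turning `source` into `corrected`.

    Returns (op, source_tokens, corrected_tokens) tuples: the detection step
    only needs the source side, the correction step fills in the target side.
    """
    src, tgt = source.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    return [
        (op, src[i1:i2], tgt[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


def detection_labels(source, corrected):
    """Binary per-token labels: 1 where a source token is replaced or deleted.

    Pure insertions have no source token to mark, so they leave labels at 0.
    """
    src = source.split()
    labels = [0] * len(src)
    matcher = difflib.SequenceMatcher(a=src, b=corrected.split(), autojunk=False)
    for op, i1, i2, _, _ in matcher.get_opcodes():
        if op in ("replace", "delete"):
            for i in range(i1, i2):
                labels[i] = 1
    return labels


if __name__ == "__main__":
    src, fixed = "She go to school yesterday", "She went to school yesterday"
    print(edit_spans(src, fixed))        # [('replace', ['go'], ['went'])]
    print(detection_labels(src, fixed))  # [0, 1, 0, 0, 0]
```

Keeping the two outputs separate mirrors the detect-then-correct split. A detector predicts the labels, and a corrector then only rewrites the flagged spans.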
# TransVIP:音声・等時保存型音声翻訳システム TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation ( http://arxiv.org/abs/2405.17809v1 ) ライセンス: Link先を確認	Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng,	(参考訳) ある言語から別の言語へ音声を直接翻訳する研究(エンドツーエンドの音声から音声への翻訳として知られる)への関心が高まっている。しかし、ほとんどのエンドツーエンドモデルはカスケードモデル、すなわち音声認識、機械翻訳、テキスト音声合成モデルを連結したパイプラインフレームワークを上回ることに苦慮している。主な課題は、直接翻訳タスクに固有の複雑さとデータの不足に起因している。本研究では,多様なデータセットをカスケード方式で活用しつつ,同時確率を通じてエンドツーエンドの推論を可能にする新しいモデルフレームワークであるTransVIPを提案する。さらに,2つの分離したエンコーダを提案し,翻訳の過程で話者の音声特性とアイソクロニーを音源音声から保持することで,ビデオダビングなどのシナリオに非常に適したものとする。フランス語と英語のペアに関する実験により、我々のモデルは、現在最先端の音声音声翻訳モデルよりも優れていることを示した。 There is rising interest in research on directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., pipeline frameworks that concatenate speech recognition, machine translation, and text-to-speech models. The primary challenges stem from the inherent complexities involved in direct translation tasks and the scarcity of data. In this study, we introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion yet facilitates end-to-end inference through joint probability. Furthermore, we propose two separate encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process, making it highly suitable for scenarios such as video dubbing. Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.	翻訳日:2024-05-29 22:22:24 公開日:2024-05-28
# Mani-GS:三角形メッシュを用いたガウススプラッティング操作 Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh ( http://arxiv.org/abs/2405.17811v1 ) ライセンス: Link先を確認	Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang, Qi Zhang, Wenbo Hu, Chaopeng Zhang, Yao Yao, Ying Shan, Long Quan,	(参考訳) NeRF(Neural Radiance Fields)のようなニューラルな3D表現は、フォトリアリスティックなレンダリング結果を生成するのに優れているが、コンテンツ作成に不可欠な操作や編集の柔軟性に欠ける。従来の研究は、標準空間でNeRFを変形させたり、明示的なメッシュに基づいて放射場を操作することでこの問題に対処しようと試みてきた。しかし、NeRFの操作は高度に制御可能ではなく、長いトレーニングと推論時間を必要とする。 3Dガウススプラッティング(3DGS)の出現により、より高速なトレーニングとレンダリング速度を持つ明示的なポイントベース3D表現を用いて、非常に高忠実な新規ビュー合成を実現することができる。しかし、レンダリング品質を維持しながら3DGSを自由に操作する効果的な手段がまだ存在しない。本研究では,操作可能なフォトリアリスティックレンダリングを実現するという課題に取り組むことを目的とする。本稿では,三角メッシュを用いて3DGSを自己適応的に直接操作する手法を提案する。このアプローチにより、様々な種類のガウス操作のために様々なアルゴリズムを設計する必要性が減る。三角形の形状を考慮したガウスの結合と適応の手法を用いることで、3DGSの操作を実現し、操作後の高忠実性レンダリングを維持できる。我々のアプローチは、高品質なレンダリングを維持しながら、大きな変形、局所的な操作、ソフトボディシミュレーションを処理できる。さらに,本手法は3DGSから抽出した不正確なメッシュに対しても有効であることを示す。実験により,本手法の有効性とベースラインアプローチに対する優位性を実証した。 Neural 3D representations, such as Neural Radiance Fields (NeRF), excel at producing photo-realistic rendering results but lack the flexibility for manipulation and editing which is crucial for content creation. Previous works have attempted to address this issue by deforming a NeRF in canonical space or manipulating the radiance field based on an explicit mesh. However, manipulating NeRF is not highly controllable and requires long training and inference times. With the emergence of 3D Gaussian Splatting (3DGS), extremely high-fidelity novel view synthesis can be achieved using an explicit point-based 3D representation with much faster training and rendering speed. However, there is still a lack of effective means to manipulate 3DGS freely while maintaining rendering quality. In this work, we aim to tackle the challenge of achieving manipulable photo-realistic rendering. We propose to utilize a triangular mesh to manipulate 3DGS directly with self-adaptation. 
This approach reduces the need to design various algorithms for different types of Gaussian manipulation. By utilizing a triangle shape-aware Gaussian binding and adapting method, we can achieve 3DGS manipulation and preserve high-fidelity rendering after manipulation. Our approach is capable of handling large deformations, local manipulations, and soft-body simulations while keeping high-quality rendering. Furthermore, we demonstrate that our method is also effective with inaccurate meshes extracted from 3DGS. Our experiments demonstrate the effectiveness of our method and its superiority over baseline approaches.	翻訳日:2024-05-29 22:22:24 公開日:2024-05-28
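Mesh-driven manipulation of this kind rests on binding each Gaussian to a triangle through local coordinates, so that moving the mesh moves the Gaussian. The sketch below binds a Gaussian center with two edge coordinates $(u, v)$ and an offset $d$ along the unit face normal. This parameterization is an assumption for illustration. Mani-GS's shape-aware binding and adaptation also handles the Gaussian's rotation and scale, which this sketch omits.

```python
def sub(a, b):
    """Component-wise difference of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Scale a nonzero 3D vector to unit length."""
    n = sum(x * x for x in v) ** 0.5
    return tuple(x / n for x in v)


def gaussian_center(tri, u, v, d):
    """Center of a Gaussian bound to triangle `tri` by local coords (u, v, d).

    (u, v) are coordinates along the two edges from the first vertex and d is
    an offset along the unit normal. Re-evaluating after the mesh deforms
    moves the Gaussian with its triangle.
    """
    p0, p1, p2 = tri
    e1, e2 = sub(p1, p0), sub(p2, p0)
    n = normalize(cross(e1, e2))
    return tuple(p0[i] + u * e1[i] + v * e2[i] + d * n[i] for i in range(3))


if __name__ == "__main__":
    rest = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    moved = tuple((x + 2.0, y, z) for x, y, z in rest)  # translate the mesh
    print(gaussian_center(rest, 0.25, 0.25, 0.1))   # (0.25, 0.25, 0.1)
    print(gaussian_center(moved, 0.25, 0.25, 0.1))  # (2.25, 0.25, 0.1)
```

Because the local coordinates are stored once and only the triangle changes, any mesh edit (deformation, local manipulation, or a physics simulation of the mesh) carries over to the bound Gaussians without a per-edit algorithm.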
# FAIntbench: テキスト・画像モデルにおけるバイアス評価のための完全かつ高精度なベンチマーク FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models ( http://arxiv.org/abs/2405.17814v1 ) ライセンス: Link先を確認	Hanjun Luo, Ziye Deng, Ruizhe Chen, Zuozhu Liu,	(参考訳) テキスト・ツー・イメージ(T2I)モデルへの急速な開発と参入障壁の低減は、出力のバイアスに関する懸念を提起しているが、既存の研究ではバイアスの全体的定義と評価の枠組みが欠如しており、デバイアス手法の強化が制限されている。この問題に対処するために、我々はT2Iモデルにおけるバイアスの総合的かつ正確なベンチマークであるFAIntbenchを紹介する。限定的な側面でバイアスを評価する既存のベンチマークとは対照的に、FAIntbenchはバイアスの表示、バイアスの可視性、取得された属性、保護された属性の4つの次元からバイアスを評価する。 FAIntbenchを7種類の大規模T2Iモデル評価に適用し, 各種バイアスの同定にFAIntbenchの有効性を実証した。また, 蒸留の副作用など, バイアスに関する新たな研究課題も明らかにした。この結果は予備的であり、T2Iモデルのバイアスを軽減することを目的とした将来の研究を進めるためのFAIntbenchの可能性を強調している。私たちのベンチマークは再現性を確保するために公開されています。 The rapid development and reduced barriers to entry for Text-to-Image (T2I) models have raised concerns about the biases in their outputs, but existing research lacks a holistic definition and evaluation framework of biases, limiting the enhancement of debiasing techniques. To address this issue, we introduce FAIntbench, a holistic and precise benchmark for biases in T2I models. In contrast to existing benchmarks that evaluate bias in limited aspects, FAIntbench evaluates biases from four dimensions: manifestation of bias, visibility of bias, acquired attributes, and protected attributes. We applied FAIntbench to evaluate seven recent large-scale T2I models and conducted human evaluation, whose results demonstrated the effectiveness of FAIntbench in identifying various biases. Our study also revealed new research questions about biases, including the side-effect of distillation. The findings presented here are preliminary, highlighting the potential of FAIntbench to advance future research aimed at mitigating the biases in T2I models. Our benchmark is publicly available to ensure reproducibility.	翻訳日:2024-05-29 22:22:24 公開日:2024-05-28
# 視覚アンカーはマルチモーダル大言語モデルのための強力な情報集約器である Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model ( http://arxiv.org/abs/2405.17815v1 ) ライセンス: Link先を確認	Haogeng Liu, Quanzeng You, Xiaotian Han, Yongfei Liu, Huaibo Huang, Ran He, Hongxia Yang,	(参考訳) MLLM(Multimodal Large Language Models)の領域では、事前訓練されたビジョンエンコーダとLLM(Large Language Models)を結びつける上で、視覚言語コネクタが重要な役割を果たす。その重要性にもかかわらず、視覚言語コネクタは比較的研究が進んでいない。本研究では,低計算コストを維持しつつ,MLLMの高精度化を実現するための強力な視覚言語コネクタを提案する。まず、視覚変換器における視覚アンカーの存在を明らかにし、それらを抽出するためのコスト効率の良い探索アルゴリズムを提案する。 AcFormerは,事前学習時に得られた視覚的アンカーから得られる豊富な知識を活かし,情報収集を導く新しい視覚言語コネクタである。大規模な実験により,提案手法はベースラインに比べて計算コストを約3分の2削減し,同時にベースライン法より優れていることを示した。これはAcFormerの有効性と効率性を強調している。 In the realm of Multimodal Large Language Models (MLLMs), vision-language connector plays a crucial role to link the pre-trained vision encoders with Large Language Models (LLMs). Despite its importance, the vision-language connector has been relatively less explored. In this study, we aim to propose a strong vision-language connector that enables MLLMs to achieve high accuracy while maintaining low computational cost. We first reveal the existence of the visual anchors in Vision Transformer and propose a cost-effective search algorithm to extract them. Building on these findings, we introduce the Anchor Former (AcFormer), a novel vision-language connector designed to leverage the rich prior knowledge obtained from these visual anchors during pretraining, guiding the aggregation of information. Through extensive experimentation, we demonstrate that the proposed method significantly reduces computational costs by nearly two-thirds compared with the baseline, while simultaneously outperforming baseline methods. This highlights the effectiveness and efficiency of AcFormer.	翻訳日:2024-05-29 22:22:24 公開日:2024-05-28
# アウト・オブ・ディストリビューション検出のためのニューラル・コラプスに基づく探索的特徴分離 Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection ( http://arxiv.org/abs/2405.17816v1 ) ライセンス: Link先を確認	Yingwen Wu, Ruiji Yu, Xinwen Cheng, Zhengbao He, Xiaolin Huang,	(参考訳) オープンな世界では、ラベルが分布内(ID)サンプルのラベルと重ならないOOD(out-of-distribution)データを検出することは、信頼性の高いディープニューラルネットワーク(DNN)にとって重要である。より優れた検出性能を実現するために、ある種の手法は、モデル出力に定義された分離損失を通じてIDとOODデータの差を増幅するように、補助的なOODデータセットを用いてモデルを微調整することを提案している。しかしながら、出力よりも効果的であるはずの特徴の差異の拡大を考慮した研究はない。主な困難はOODサンプルの多様性にあり、その特徴分布を記述することさえ難しく、ましてやID特徴から分離するための損失を設計するのはなおさら難しい。本稿では,ニューラル・コラプス(NC)と呼ばれるID特徴の集約特性に基づいて,この問題を巧みに回避する。 NCは、クラス内のIDサンプルの最後から2番目の層の特徴が、対応するクラスの最終層の重みとほぼ同一であることを意味する。そこで我々はOrthLossと呼ばれるシンプルだが効果的な損失を提案する。OrthLossはNCによって形成されるID特徴の主部分空間に直交する部分空間内にOODデータの特徴を束縛する。このように、IDとOODのサンプルの特徴は異なる次元で分離される。出力差を単に拡大するのではなく,特徴分離損失を最適化することにより,追加のデータ拡張やサンプリングを行わずにCIFARベンチマーク上でのSOTA性能を実現し,OOD検出における特徴分離の重要性を示す。コードは公開予定である。 In the open world, detecting out-of-distribution (OOD) data, whose labels are disjoint with those of in-distribution (ID) samples, is important for reliable deep neural networks (DNNs). To achieve better detection performance, one type of approach proposes to fine-tune the model with auxiliary OOD datasets to amplify the difference between ID and OOD data through a separation loss defined on model outputs. However, none of these studies considers enlarging the feature disparity, which should be more effective than enlarging the output disparity. The main difficulty lies in the diversity of OOD samples, which makes it hard to describe their feature distribution, let alone design losses to separate them from ID features. In this paper, we neatly fence off the problem based on an aggregation property of ID features named Neural Collapse (NC). NC means that the penultimate features of ID samples within a class are nearly identical to the last layer weight of the corresponding class. 
Based on this property, we propose a simple but effective loss called OrthLoss, which binds the features of OOD data in a subspace orthogonal to the principal subspace of ID features formed by NC. In this way, the features of ID and OOD samples are separated by different dimensions. By optimizing the feature separation loss rather than purely enlarging output differences, our detection achieves SOTA performance on CIFAR benchmarks without any additional data augmentation or sampling, demonstrating the importance of feature separation in OOD detection. The code will be published.	翻訳日:2024-05-29 22:22:24 公開日:2024-05-28
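The idea behind OrthLoss can be sketched as a projection penalty. Under Neural Collapse, the last-layer class weights span the principal subspace of penultimate ID features. Penalizing the component of OOD features inside that span pushes them into the orthogonal complement. The exact loss form below (mean squared projection norm, using a Gram-Schmidt basis) is an assumption for illustration, not the paper's published definition.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def orth_loss(ood_features, class_weights):
    """Assumed form: mean squared norm of OOD features projected onto the ID subspace.

    The ID subspace is spanned by the last-layer class weights, which under
    Neural Collapse align with the penultimate ID class means. Driving this
    loss to zero places OOD features in the orthogonal complement.
    """
    basis = orthonormal_basis(class_weights)
    total = 0.0
    for f in ood_features:
        total += sum(dot(f, b) ** 2 for b in basis)
    return total / len(ood_features)


if __name__ == "__main__":
    weights = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # spans the x-y plane
    print(orth_loss([[0.0, 0.0, 2.0]], weights))  # 0.0, already orthogonal
    print(orth_loss([[3.0, 4.0, 5.0]], weights))  # 25.0, in-plane part penalized
```

In training, this term would be added for auxiliary OOD samples alongside the usual classification loss on ID samples. The ID and OOD features then occupy different dimensions rather than merely producing different output scores.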
# 臨床応用のための骨格型運動エンコーダモデルのベンチマーク:歩行系列におけるパーキンソン病重症度の推定 Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences ( http://arxiv.org/abs/2405.17817v1 ) ライセンス: Link先を確認	Vida Adeli, Soroush Mehraban, Yasamin Zarghami, Irene Ballester, Andrea Sabo, Andrea Iaboni, Babak Taati,	(参考訳) 本研究では,PD患者の歩行パターンを解析するための大規模人体運動データセットを用いた一般的な人体運動エンコーダの応用について検討した。これらのモデルは、ヒトの生体力学的知識の豊富さを学習しているが、パーキンソン歩行などの病理学的運動の分析における効果は、まだ完全には検証されていない。本研究では比較フレームワークを提案し、事前学習済みの最先端人体動作エンコーダモデル6種について、モーションキャプチャデータから運動障害学会統一パーキンソン病評価尺度(MDS-UPDRS-III)の歩行スコアを予測する能力を評価した。われわれはこれらのモデルを、服薬時・非服薬時のPD患者を含む、最近公開された大規模な公開PDデータセットにおいて、伝統的な歩行特徴に基づく予測モデルと比較した。特徴ベースのモデルは現在、より高い重み付き平均の正解率、適合率、再現率、F1スコアを示している。それに近い結果を示すモーションエンコーダモデルは、臨床環境でのスケーラビリティと効率性の点で有望である。この可能性は、PD訓練セットでファインチューニングした際にエンコーダモデルの性能が向上したことで裏付けられる。ヒトの6つの運動モデルのうち4つは、オン・メディケーション状態とオフ・メディケーション状態の間に有意な差がある予測スコアを提供した。この結果から, 運動エンコーダモデルの臨床的変化に対する感受性が示唆された。また、これらのモデルの継続的なカスタマイズの必要性を強調し、疾患特有の特徴をよりよく捉え、労働集約的な特徴工学への依存を減らす。最後に,臨床環境における骨格型モーションエンコーダモデルの解析のためのベンチマークを構築した。私たちの知る限りでは、最先端のモデルをテストし、臨床環境での競争を可能にするベンチマークを提供するのは、今回が初めてです。コードとベンチマークのリーダーボードは、コードで入手できる。 This study investigates the application of general human motion encoders trained on large-scale human motion datasets for analyzing gait patterns in PD patients. Although these models have learned a wealth of human biomechanical knowledge, their effectiveness in analyzing pathological movements, such as parkinsonian gait, has yet to be fully validated. We propose a comparative framework and evaluate six pre-trained state-of-the-art human motion encoder models on their ability to predict the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS-III) gait scores from motion capture data. We compare these against a traditional gait feature-based predictive model in a recently released large public PD dataset, including PD patients on and off medication. 
The feature-based model currently shows higher weighted average accuracy, precision, recall, and F1-score. Motion encoder models with closely comparable results demonstrate promise for scalability and efficiency in clinical settings. This potential is underscored by the enhanced performance of the encoder model upon fine-tuning on PD training set. Four of the six human motion models examined provided prediction scores that were significantly different between on- and off-medication states. This finding reveals the sensitivity of motion encoder models to nuanced clinical changes. It also underscores the necessity for continued customization of these models to better capture disease-specific features, thereby reducing the reliance on labor-intensive feature engineering. Lastly, we establish a benchmark for the analysis of skeleton-based motion encoder models in clinical settings. To the best of our knowledge, this is the first study to provide a benchmark that enables state-of-the-art models to be tested and compete in a clinical context. Codes and benchmark leaderboard are available at code.	翻訳日:2024-05-29 22:22:24 公開日:2024-05-28
# 自己教師付き表現による任意の分解能を有するハイパースペクトル・マルチスペクトル画像融合 Hyperspectral and multispectral image fusion with arbitrary resolution through self-supervised representations ( http://arxiv.org/abs/2405.17818v1 ) ライセンス: Link先を確認	Ting Wang, Zipei Yan, Jizhou Li, Xile Zhao, Chao Wang, Michael Ng,	(参考訳) 低分解能ハイパースペクトル像(LR-HSI)と高分解能マルチスペクトル像(HR-MSI)の融合は、HSI超解像(SR)を実現する有効な技術として注目されている。従来の研究は主に、潜在高分解能ハイパースペクトル像(HR-HSI)の事後分布の推定に注力しており、適切な画像事前分布と、潜在HSIと観測画像との差から計算される尤度を活用している。様々な事前分布の中でも、低ランク性は行列分解を通じて潜在HSIの特性を保持する点で際立っている。しかし、この手法は2つのモダリティの次元内でしか分解能を高められない。この制限を克服するために、空間情報とスペクトル情報をそれぞれ捉える2つのニューラル表現を行列分解に統合した、新しい連続低ランク分解(CLoRF)を提案する。このアプローチにより、行列分解による低ランク性とニューラル表現による連続性の両方を自己教師ありの形で活用できる。理論的には、提案する連続低ランク分解における低ランク性とリプシッツ連続性を証明する。実験では、提案手法は既存手法を大きく上回り、ニューラルネットワークの再学習なしにユーザが求める解像度を実現する。 The fusion of a low-resolution hyperspectral image (LR-HSI) with a high-resolution multispectral image (HR-MSI) has emerged as an effective technique for achieving HSI super-resolution (SR). Previous studies have mainly concentrated on estimating the posterior distribution of the latent high-resolution hyperspectral image (HR-HSI), leveraging an appropriate image prior and a likelihood computed from the discrepancy between the latent HSI and observed images. Among the various priors, low rankness stands out for preserving latent HSI characteristics through matrix factorization. However, this method only enhances resolution within the dimensions of the two modalities. To overcome this limitation, we propose a novel continuous low-rank factorization (CLoRF) by integrating two neural representations into the matrix factorization, capturing spatial and spectral information, respectively. This approach enables us to harness both the low rankness from the matrix factorization and the continuity from neural representation in a self-supervised manner. Theoretically, we prove the low-rank property and Lipschitz continuity in the proposed continuous low-rank factorization.
Experimentally, our method significantly surpasses existing techniques and achieves user-desired resolutions without the need for neural network retraining.	翻訳日:2024-05-29 22:22:24 公開日:2024-05-28
# 樹木の森を見逃すな:大規模視覚言語モデルのための注意的視覚校正 Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models ( http://arxiv.org/abs/2405.17820v1 ) ライセンス: Link先を確認	Sangmin Woo, Donguk Kim, Jaehyuk Jang, Yubin Choi, Changick Kim,	(参考訳) 本研究では、大規模視覚言語モデル(LVLM)において、ブラインドトークンと呼ばれる少数の画像トークンに過度な注意が向けられることで、視覚的物体のきめ細かい理解を必要とするタスクで幻覚的な応答が生じるという問題に取り組む。注意重みの低いトークンは、物体の存在の認識から、その属性(色、位置など)の識別、物体間の関係の理解に至るまで、物体の微妙な詳細を特定するために不可欠な情報を持つことが多いことがわかった。ブラインドトークンへの過度な強調を打ち消し、ユーザの問い合わせに正確に応答するために、Attentional Vision Calibration (AVC)と呼ばれる手法を導入する。デコード段階において、AVCは画像に関する注意分布を分析してブラインドトークンを特定する。次に、元の視覚トークンで条件付けたロジットとブラインドトークンで条件付けたロジットを対比することで、次トークン予測のロジットを動的に調整する。これにより、ブラインドトークンへの依存が効果的に減少し、すべてのトークンをよりバランスよく考慮するようになる。POPE、MME、AMBERなどのベンチマークでAVCを検証したところ、LVLMにおける物体幻覚の緩和において既存のデコード手法を一貫して上回った。 This study addresses the issue observed in Large Vision Language Models (LVLMs), where excessive attention to a few image tokens, referred to as blind tokens, leads to hallucinatory responses in tasks requiring fine-grained understanding of visual objects. We found that tokens receiving lower attention weights often hold essential information for identifying nuanced object details, ranging from merely recognizing object existence to identifying their attributes (color, position, etc.) and understanding their relationships. To counteract the over-emphasis on blind tokens and to accurately respond to user queries, we introduce a technique called Attentional Vision Calibration (AVC). During the decoding phase, AVC identifies blind tokens by analyzing the image-related attention distribution. It then dynamically adjusts the logits for the next token prediction by contrasting the logits conditioned on the original visual tokens with those conditioned on the blind tokens. This effectively lowers the dependency on blind tokens and promotes a more balanced consideration of all tokens.
We validate AVC on benchmarks such as POPE, MME, and AMBER, where it consistently outperforms existing decoding techniques in mitigating object hallucinations in LVLMs.	翻訳日:2024-05-29 22:22:24 公開日:2024-05-28
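The two AVC steps can be sketched in Python under two explicit assumptions. First, blind tokens are taken to be image tokens whose attention exceeds the mean by more than `lam` standard deviations. Second, the calibrated logits use the common contrastive form `(1 + alpha) * orig - alpha * blind`. Both rules, and the names `find_blind_tokens` and `calibrate_logits`, are illustrative choices and are not the paper's exact formulation.

```python
import math
from typing import List, Sequence


def find_blind_tokens(attention: Sequence[float], lam: float = 1.0) -> List[int]:
    """Indices of image tokens with attention above mean + lam * std (assumed selection rule)."""
    n = len(attention)
    mean = sum(attention) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in attention) / n)
    return [i for i, a in enumerate(attention) if a > mean + lam * std]


def calibrate_logits(logits_orig: Sequence[float],
                     logits_blind: Sequence[float],
                     alpha: float = 1.0) -> List[float]:
    """Assumed contrastive form: boost what the full image supports, subtract what blind tokens alone predict."""
    return [(1.0 + alpha) * lo - alpha * lb for lo, lb in zip(logits_orig, logits_blind)]
```

In this sketch, `logits_blind` would come from a second forward pass in which the model sees only the blind tokens. In the toy call `calibrate_logits([2.0, 1.0], [3.0, 0.0])`, the token that the blind tokens alone favor drops from first to second place.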
# RITUAL:LVLMにおける汎用的な幻覚抑制レバーとしてのランダム画像変換 RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs ( http://arxiv.org/abs/2405.17821v1 ) ライセンス: Link先を確認	Sangmin Woo, Jaehyuk Jang, Donguk Kim, Yubin Choi, Changick Kim,	(参考訳) 大規模視覚言語モデル(LVLM)の最近の進歩は、機械が視覚入力に基づいてテキスト応答を理解・生成する方法に革命をもたらした。その印象的な能力にもかかわらず、LVLMは視覚情報を正確に反映しない「幻覚的」な出力をしばしば生成し、信頼性と信用性の課題を提起する。対照復号(contrastive decoding)のような現在の手法は、生成トークンの元の確率分布と歪められた入力から得た分布を対比することでこれらの問題の解決に前進してきたが、視覚的に忠実な出力を生成することは依然として困難である。本研究では焦点を逆に移す: 元の確率分布を補完的に強化できるものは何か? LVLMにおける幻覚に対する頑健性を高めるため、RITUALと呼ばれる簡易な学習不要の手法を提案する。提案手法では、モデルを多様な視覚シナリオにさらすことで幻覚的な視覚的説明の可能性を軽減することを目的として、元の確率分布の補完としてランダムな画像変換を用いる。実験の結果、変換画像を単独で用いると当初は性能が低下するが、これらの変換を戦略的に用いることで実際に有効な補完となることが示された。特に、本手法は現行の対照復号法と互換性があり、外部モデルやコストのかかる自己フィードバック機構を必要としないため、実用的な追加手段となる。実験では、RITUALは、POPE、CHAIR、MMEを含むいくつかの物体幻覚ベンチマークにおいて、既存の対照復号法を大きく上回っている。 Recent advancements in Large Vision Language Models (LVLMs) have revolutionized how machines understand and generate textual responses based on visual inputs. Despite their impressive capabilities, they often produce "hallucinatory" outputs that do not accurately reflect the visual information, posing challenges in reliability and trustworthiness. Current methods such as contrastive decoding have made strides in addressing these issues by contrasting the original probability distribution of generated tokens with distorted counterparts; yet, generating visually-faithful outputs remains a challenge. In this work, we shift our focus to the opposite: What could serve as a complementary enhancement to the original probability distribution? We propose a simple, training-free method termed RITUAL to enhance robustness against hallucinations in LVLMs. Our approach employs random image transformations as complements to the original probability distribution, aiming to mitigate the likelihood of hallucinatory visual explanations by enriching the model's exposure to varied visual scenarios.
Our empirical results show that while the isolated use of transformed images initially degrades performance, strategic implementation of these transformations can indeed serve as effective complements. Notably, our method is compatible with current contrastive decoding methods and does not require external models or costly self-feedback mechanisms, making it a practical addition. In experiments, RITUAL significantly outperforms existing contrastive decoding methods across several object hallucination benchmarks, including POPE, CHAIR, and MME.	翻訳日:2024-05-29 22:22:24 公開日:2024-05-28
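The following minimal Python sketch illustrates the idea under stated assumptions. A transformation is drawn at random from a small hypothetical pool: two flips and a transpose, applied to a toy 2-D array that stands in for an image. The next-token distribution is then the softmax of the original logits plus `alpha` times the logits obtained from the transformed image. The exact combination rule and the transformation set used in RITUAL may differ.

```python
import math
import random
from typing import Callable, List, Sequence

Image = List[List[float]]


def hflip(img: Image) -> Image:
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    return img[::-1]


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


# Hypothetical transform pool; the paper's actual set of transformations may differ.
TRANSFORMS: List[Callable[[Image], Image]] = [hflip, vflip, transpose]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def ritual_probs(logits_orig: Sequence[float],
                 logits_transformed: Sequence[float],
                 alpha: float = 0.5) -> List[float]:
    """Assumed combination rule: complement (not replace) the original logits with the transformed view's."""
    return softmax([lo + alpha * lt for lo, lt in zip(logits_orig, logits_transformed)])
```

In use, the model would run once on the original image and once on `rng.choice(TRANSFORMS)(image)`, where `rng = random.Random(seed)`. When the original logits are tied, as in `ritual_probs([1.0, 1.0], [0.0, 2.0])`, the transformed view breaks the tie toward the second token.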
# Conv-CoA:会話型行動連鎖による大規模言語モデルにおけるオープンドメイン質問応答の改善 Conv-CoA: Improving Open-domain Question Answering in Large Language Models via Conversational Chain-of-Action ( http://arxiv.org/abs/2405.17822v1 ) ライセンス: Link先を確認	Zhenyu Pan, Haozheng Luo, Manling Li, Han Liu,	(参考訳) 本稿では、オープンドメイン会話型質問応答(OCQA)のための会話型行動連鎖(Conversational Chain-of-Action, Conv-CoA)フレームワークを提案する。既存研究と比較して、Conv-CoAは3つの大きな課題に対処する: (i)リアルタイムの事実やドメインの事実と矛盾する不忠実な幻覚、(ii)会話シナリオにおける推論性能の弱さ、(iii)会話型情報検索における不十分な性能。我々の主要な貢献は動的な推論-検索機構であり、質問の意図を抽出し、それを推論チェーンに分解して、体系的なプロンプト、事前設計されたアクション、コンテキスト知識セット(CKS)の更新、新しいホップフィールドベースの検索器によって解決する。方法論的には、アクション内での会話型情報検索の効率と精度を高めるため、資源効率の高いホップフィールド検索器を提案する。さらに、検索した知識と会話における回答との矛盾を検証・解決するための会話型マルチ参照忠実度スコア(Conv-MRFS)を提案する。実証的には、5つの異なる研究方向と2つの公開ベンチマークにおいて、我々のフレームワークと23の最先端手法を比較する。これらの比較により、Conv-CoAは精度と効率の両面で他の手法よりも優れていることが示された。 We present a Conversational Chain-of-Action (Conv-CoA) framework for Open-domain Conversational Question Answering (OCQA). Compared with prior work, Conv-CoA addresses three major challenges: (i) unfaithful hallucination that is inconsistent with real-time or domain facts, (ii) weak reasoning performance in conversational scenarios, and (iii) unsatisfactory performance in conversational information retrieval. Our key contribution is a dynamic reasoning-retrieval mechanism that extracts the intent of the question and decomposes it into a reasoning chain to be solved via systematic prompting, pre-designed actions, updating the Contextual Knowledge Set (CKS), and a novel Hopfield-based retriever. Methodologically, we propose a resource-efficient Hopfield retriever to enhance the efficiency and accuracy of conversational information retrieval within our actions. Additionally, we propose a conversational-multi-reference faith score (Conv-MRFS) to verify and resolve conflicts between retrieved knowledge and answers in conversations. Empirically, we conduct comparisons between our framework and 23 state-of-the-art methods across five different research directions and two public benchmarks.
These comparisons demonstrate that our Conv-CoA outperforms other methods in both the accuracy and efficiency dimensions.	翻訳日:2024-05-29 22:22:24 公開日:2024-05-28
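The abstract does not specify the Hopfield-based retriever. The following Python sketch therefore shows only the generic modern Hopfield retrieval step, offered as an assumption about the kind of mechanism involved: a softmax over scaled similarities between a query and the stored patterns, followed by a weighted sum. `hopfield_retrieve` and the inverse temperature `beta` are illustrative names, and the toy vectors stand in for passage embeddings.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def hopfield_retrieve(memories: Sequence[Sequence[float]],
                      query: Sequence[float],
                      beta: float = 8.0) -> Tuple[List[float], int]:
    """Generic modern-Hopfield update (assumed): xi_new = X^T softmax(beta * X xi).

    Rows of X are the stored patterns. Returns the retrieved pattern and the
    index of the most strongly weighted memory.
    """
    scores = [beta * sum(m_d * q_d for m_d, q_d in zip(m, query)) for m in memories]
    weights = softmax(scores)
    dim = len(query)
    retrieved = [sum(w * m[d] for w, m in zip(weights, memories)) for d in range(dim)]
    best = max(range(len(memories)), key=lambda i: weights[i])
    return retrieved, best
```

A larger `beta` makes the update behave more like a hard nearest-neighbour lookup, while a smaller `beta` blends several stored passages.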
# スペクトルトランケーションカーネル:$C^*$-代数カーネルマシンにおける非可換性 Spectral Truncation Kernels: Noncommutativity in $C^*$-algebraic Kernel Machines ( http://arxiv.org/abs/2405.17823v1 ) ライセンス: Link先を確認	Yuka Hashimoto, Ayoub Hafid, Masahiro Ikeda, Hachem Kadri,	(参考訳) 本稿では、非可換幾何学や$C^*$-代数の分野で議論されているスペクトルトランケーションに基づく、新しい正定値カーネルのクラスを提案する。入力と出力が関数であるカーネルに焦点を当て、カーネルに現れる積の非可換性を記述するトランケーションパラメータ$n$を導入することで、多項式カーネル、積カーネル、分離可能カーネルなどの既存のカーネルを一般化する。$n$が無限大になるとき、提案したカーネルは既存の可換カーネルに近づく。$n$が有限であれば異なる振る舞いを示し、非可換性がデータ関数の定義域に沿った相互作用を誘導する。トランケーションパラメータ$n$が性能向上をもたらす支配的要因であり、適切な$n$を設定することで表現力と表現空間の複雑さのバランスをとれることを示す。提案したカーネルクラスの柔軟性により、従来の可換カーネルを超えることができる。 In this paper, we propose a new class of positive definite kernels based on spectral truncation, which has been discussed in the fields of noncommutative geometry and $C^*$-algebra. We focus on kernels whose inputs and outputs are functions and generalize existing kernels, such as polynomial, product, and separable kernels, by introducing a truncation parameter $n$ that describes the noncommutativity of the products appearing in the kernels. When $n$ goes to infinity, the proposed kernels tend to the existing commutative kernels. If $n$ is finite, they exhibit different behavior, and the noncommutativity induces interactions along the data function domain. We show that the truncation parameter $n$ is a governing factor leading to performance enhancement: by setting an appropriate $n$, we can balance the representation power and the complexity of the representation space. The flexibility of the proposed class of kernels allows us to go beyond previous commutative kernels.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
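A small Python example shows where the noncommutativity comes from. Multiplying by a function on the circle and keeping only the first $n$ Fourier modes yields an $n \times n$ Toeplitz matrix whose $(j, k)$ entry is the Fourier coefficient of index $j - k$. For $f(\theta) = e^{i\theta}$ and $g(\theta) = e^{-i\theta}$ the pointwise products $fg$ and $gf$ are equal, yet their truncations do not commute. This illustrates only the truncated products, not the kernel construction from the paper, and the helper names are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[complex]]


def truncate_multiplication(fourier: Dict[int, complex], n: int) -> Matrix:
    """n x n Toeplitz matrix with entry (j, k) = fourier[j - k]; missing coefficients are 0.

    Illustration only: this is the truncated multiplication operator, not the paper's kernel.
    """
    return [[fourier.get(j - k, 0) for k in range(n)] for j in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def commutator(a: Matrix, b: Matrix) -> Matrix:
    """[a, b] = ab - ba; all zeros exactly when a and b commute."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(row_ab, row_ba)] for row_ab, row_ba in zip(ab, ba)]


# f = e^{i theta} has a single Fourier coefficient at index 1; g = e^{-i theta} at index -1.
T_f = truncate_multiplication({1: 1}, 4)
T_g = truncate_multiplication({-1: 1}, 4)
C = commutator(T_f, T_g)
print([C[i][i] for i in range(4)])  # prints [-1, 0, 0, 1]
```

The only nonzero entries sit at the two corners of the diagonal. They come from discarding Fourier modes outside the first $n$, i.e., from the truncation itself.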
# mTREE:全スライド画像解析のためのマルチレベルテキストガイド型エンドツーエンド学習 mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis ( http://arxiv.org/abs/2405.17824v1 ) ライセンス: Link先を確認	Quan Liu, Ruining Deng, Can Cui, Tianyuan Yao, Vishwesh Nath, Yucheng Tang, Yuankai Huo,	(参考訳) マルチモーダル学習は視覚とテキストのデータを統合するが、特にギガピクセル全スライド画像(WSI)のような大規模で高解像度の画像では、その病理画像やテキスト解析への応用は依然として困難である。現在の手法は通常、ローカル表現(パッチレベルなど)をグローバル特徴(スライドレベルなど)に組み立てるために、手動の領域ラベリングやマルチステージ学習に依存している。しかし、テキストデータとマルチスケール画像表現をシームレスなエンドツーエンドプロセスで統合する効果的な方法は存在しない。本研究では、マルチレベルテキストガイド表現のエンド・ツー・エンド・ラーニング(mTREE)を提案する。この新しいテキスト誘導アプローチは、付随するテキスト病理情報を利用することで、マルチスケールのWSI表現を効果的に捉える。mTREEは、キー領域のローカライズ(グローバルからローカル)とWSIレベルの画像テキスト表現の構築(ローカルからグローバル)を、統一されたエンドツーエンドの学習フレームワークに統合している。このモデルでは、テキスト情報は2つの目的を果たす: 第一に、重要領域を正確に識別するための注意マップとして機能し、第二に、画像の包括的表現にテキスト特徴を統合するための導管として機能する。本研究は、2つの画像関連課題(分類と生存予測)においてmTREEの有効性を定量的に解析し、ベースラインよりも顕著に優れていることを示す。 Multi-modal learning adeptly integrates visual and textual data, but its application to histopathology image and text analysis remains challenging, particularly with large, high-resolution images like gigapixel Whole Slide Images (WSIs). Current methods typically rely on manual region labeling or multi-stage learning to assemble local representations (e.g., patch-level) into global features (e.g., slide-level). However, there is no effective way to integrate multi-scale image representations with text data in a seamless end-to-end process. In this study, we introduce Multi-Level Text-Guided Representation End-to-End Learning (mTREE). This novel text-guided approach effectively captures multi-scale WSI representations by utilizing accompanying textual pathology information. mTREE combines the localization of key areas (global-to-local) and the development of a WSI-level image-text representation (local-to-global) into a unified, end-to-end learning framework.
In this model, textual information serves a dual purpose: firstly, functioning as an attention map to accurately identify key areas, and secondly, acting as a conduit for integrating textual features into the comprehensive representation of the image. Our study demonstrates the effectiveness of mTREE through quantitative analyses in two image-related tasks: classification and survival prediction, showcasing its remarkable superiority over baselines.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
# 混合プロンプトによる拡散モデルパッチング Diffusion Model Patching via Mixture-of-Prompts ( http://arxiv.org/abs/2405.17825v1 ) ライセンス: Link先を確認	Seokil Ham, Sangmin Woo, Jin-Young Kim, Hyojun Go, Byeongjun Park, Changick Kim,	(参考訳) 本稿では、すでに収束した事前学習済み拡散モデルの性能を、パラメータをごくわずかに増やすだけで向上させる簡易な手法である拡散モデルパッチング(DMP)を提案する。DMPは、元のモデルを凍結したまま、モデルの入力空間に学習可能な小さなプロンプト集合を挿入する。DMPの有効性は単にパラメータの追加によるものではなく、その動的ゲーティング機構に由来する。この機構は、生成過程の各ステップ(例えば逆方向のデノイジングステップ)において、学習可能なプロンプトの部分集合を選択・結合する。我々が "mixture-of-prompts" と呼ぶこの戦略により、モデルは各プロンプトの異なる専門性を活用でき、最小限ながら特化したパラメータで各ステップにおけるモデルの機能を実質的に「パッチ」できる。特筆すべきことに、DMPは元々学習に用いたのと同じデータセットでさらに訓練することでモデルを強化する。これは、モデルの収束により通常は大幅な改善が期待できない状況でも成り立つ。実験の結果、DMPはFFHQ 256x256におけるDiT-L/2の収束後FIDを10.38%改善し、それはわずか1.43%のパラメータ増加と5万回の追加訓練反復で達成された。 We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters. DMP inserts a small, learnable set of prompts into the model's input space while keeping the original model frozen. The effectiveness of DMP is not merely due to the addition of parameters but stems from its dynamic gating mechanism, which selects and combines a subset of learnable prompts at every step of the generative process (e.g., reverse denoising steps). This strategy, which we term "mixture-of-prompts", enables the model to draw on the distinct expertise of each prompt, essentially "patching" the model's functionality at every step with minimal yet specialized parameters. Uniquely, DMP enhances the model by further training on the same dataset on which it was originally trained, even in a scenario where significant improvements are typically not expected due to model convergence. Experiments show that DMP significantly enhances the converged FID of DiT-L/2 on FFHQ 256x256 by 10.38%, achieved with only a 1.43% parameter increase and 50K additional training iterations.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
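The mixture-of-prompts gating can be sketched in Python under assumptions that the abstract leaves open. A gate produces one score per prompt for the current denoising step, the top-k prompts are kept and re-weighted with a softmax, and their weighted sum is what would be added to the frozen model's input tokens. The toy gate `timestep_gate_logits` is hypothetical: it simply prefers the prompt whose assigned step range is closest. In DMP the gate is learned.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def timestep_gate_logits(t: int, num_prompts: int, num_steps: int) -> List[float]:
    """Hypothetical toy gate: prompt i 'owns' the i-th equal slice of the denoising steps."""
    centers = [(i + 0.5) * num_steps / num_prompts for i in range(num_prompts)]
    return [-abs(t - c) / num_steps for c in centers]


def mix_prompts(prompts: Sequence[Sequence[float]],
                gate_logits: Sequence[float],
                top_k: int = 2) -> List[float]:
    """Keep the top_k prompts by gate score, renormalize their weights, return the weighted sum."""
    chosen = sorted(range(len(prompts)), key=lambda i: gate_logits[i], reverse=True)[:top_k]
    weights = softmax([gate_logits[i] for i in chosen])
    dim = len(prompts[0])
    return [sum(w * prompts[i][d] for w, i in zip(weights, chosen)) for d in range(dim)]
```

In this sketch, only the small prompt pool and the gate would be trained while the diffusion backbone stays frozen. That is consistent with the small parameter overhead reported above.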
# LDMol: 化学的に有益な潜在空間を活用したテキスト条件付き分子拡散モデル LDMol: Text-Conditioned Molecule Diffusion Model Leveraging Chemically Informative Latent Space ( http://arxiv.org/abs/2405.17829v1 ) ライセンス: Link先を確認	Jinho Chang, Jong Chul Ye,	(参考訳) 生成モデルの最前線として拡散モデルが登場したことに伴い、多くの研究者が条件付き拡散モデルを用いた分子生成技術を提案している。しかし、分子は少数の原子と結合の中に強く絡み合った相関を持つという基本的な性質のため、自然言語のように条件が複雑になると、モデルが生データと条件を結びつけることは困難になる。そこで本研究では、自然なテキスト条件付き分子生成を可能にする LDMol と呼ばれる新しい潜在拡散モデルを提案する。具体的には、LDMolは、化学的に有益な特徴空間を生成する分子エンコーダ、拡散トランスフォーマー(DiT)を用いた自然言語条件付き潜在拡散モデル、分子reのための自己回帰デコーダの3つの構成要素からなる。特に、複数のSMILES表記が同じ分子を表現しうることを踏まえ、化学的に有益な特徴空間を抽出するために対照学習戦略を用いる。LDMolは、テキストからの分子生成ベンチマークで既存のベースラインを上回るだけでなく、未知のシナリオでのゼロショット推論も可能である。さらに、LDMolが分子-テキスト検索やテキスト駆動の分子編集などの下流タスクにも適用できることを示し、拡散モデルとしての汎用性を示した。 With the emergence of diffusion models as the frontline of generative models, many researchers have proposed molecule generation techniques using conditional diffusion models. However, due to the fundamental nature of a molecule, which carries highly entangled correlations within a small number of atoms and bonds, it becomes difficult for a model to connect raw data with the conditions when the conditions become more complex, such as natural language. To address this, here we present a novel latent diffusion model dubbed LDMol, which enables natural text-conditioned molecule generation. Specifically, LDMol is composed of three building blocks: a molecule encoder that produces a chemically informative feature space, a natural language-conditioned latent diffusion model using a Diffusion Transformer (DiT), and an autoregressive decoder for molecule re. In particular, recognizing that multiple SMILES notations can represent the same molecule, we employ a contrastive learning strategy to extract the chemically informative feature space. LDMol not only beats the existing baselines on the text-to-molecule generation benchmark but is also capable of zero-shot inference in unseen scenarios.
Furthermore, we show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-driven molecule editing, demonstrating its versatility as a diffusion model.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
# 破滅的忘却を超えて:ドメイン特化LLMのための汎用能力の統合 More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs ( http://arxiv.org/abs/2405.17830v1 ) ライセンス: Link先を確認	Chengyuan Liu, Shihang Wang, Yangyang Kang, Lizhi Qing, Fubang Zhao, Changlong Sun, Kun Kuang, Fei Wu,	(参考訳) 大規模言語モデル(LLM)をドメイン固有のタスクで微調整すると、一般的なタスクの性能が低下する。この現象は破滅的忘却(CF)として知られている。しかし本論文では、ドメイン特化LLMの実応用においてCFを超えるさらなる課題として、汎用能力とドメイン知識の両方を単一のインスタンス内で統合することを必要とする汎用能力統合(General Capabilities Integration, GCI)を提起する。GCIの目的は、以前に獲得した汎用能力を新しいドメイン知識と共に保持するだけでなく、両方のスキルを調和させて一体的に活用し、ドメイン固有タスクの性能を高めることである。法律ドメインを例に、実用性を損なわないよう訓練・テストタスクの3つのグループを慎重に設計し、対応するデータセットを構築する。ドメイン固有のシナリオ全体で汎用能力をよりよく取り込むため、LoRAの上にマルチヘッドアテンションモジュールを用いるALoRAを導入し、先行トークンから現在のトークンへの直接的な情報伝達を可能にする。この拡張により、表現はアテンションに応じてドメイン固有の知識と汎用能力とを動的に切り替えることができる。提案タスクについて大規模な実験を行った。その結果、我々の設定の意義と手法の有効性が示された。 Performance on general tasks decreases after Large Language Models (LLMs) are fine-tuned on domain-specific tasks; this phenomenon is known as Catastrophic Forgetting (CF). However, this paper presents a further challenge for the real-world application of domain-specific LLMs beyond CF, called General Capabilities Integration (GCI), which necessitates the integration of both the general capabilities and domain knowledge within a single instance. The objective of GCI is not merely to retain previously acquired general capabilities alongside new domain knowledge, but to harmonize and utilize both sets of skills in a cohesive manner to enhance performance on domain-specific tasks. Taking the legal domain as an example, we carefully design three groups of practical training and testing tasks and construct the corresponding datasets. To better incorporate general capabilities across domain-specific scenarios, we introduce ALoRA, which utilizes a multi-head attention module on top of LoRA, facilitating direct information transfer from preceding tokens to the current one.
This enhancement permits the representation to dynamically switch between domain-specific knowledge and general competencies according to the attention. Extensive experiments are conducted on the proposed tasks. The results exhibit the significance of our setting, and the effectiveness of our method.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
# 方策勾配法の軟化(mollification)効果 Mollification Effects of Policy Gradient Methods ( http://arxiv.org/abs/2405.17832v1 ) ライセンス: Link先を確認	Tao Wang, Sylvia Herbert, Sicun Gao,	(参考訳) 方策勾配法により、深層強化学習(RL)は、基盤となるシステムが複雑で非平滑な最適化ランドスケープを生む高度に非線形なダイナミクスを含む場合でも、困難な連続制御問題に取り組めるようになった。本研究では、方策勾配法が非平滑な最適化ランドスケープをどのように軟化(mollify)して効果的な方策探索を可能にするかを理解するための厳密な枠組みを構築するとともに、その欠点も明らかにする。すなわち、目的関数をより滑らかで最適化しやすくする一方で、確率的な目的関数は元の問題からさらに乖離する。方策勾配法と逆熱方程式を解くことの等価性を示す。PDE理論における逆熱方程式の非適切性(ill-posedness)に基づき、確率性の下で方策勾配を用いることの根本的な課題を提示する。さらに、この制限と調和解析における不確定性原理を関連付け、RLにおける確率的方策による探索の効果を理解する。また、実際の軟化効果の正と負の両側面を示す実験結果も提示する。 Policy gradient methods have enabled deep reinforcement learning (RL) to approach challenging continuous control problems, even when the underlying systems involve highly nonlinear dynamics that generate complex non-smooth optimization landscapes. We develop a rigorous framework for understanding how policy gradient methods mollify non-smooth optimization landscapes to enable effective policy search, as well as its downside: while making the objective function smoother and easier to optimize, the stochastic objective deviates further from the original problem. We demonstrate the equivalence between policy gradient methods and solving backward heat equations. Following the ill-posedness of backward heat equations from PDE theory, we present a fundamental challenge to the use of policy gradient under stochasticity. Moreover, we make the connection between this limitation and the uncertainty principle in harmonic analysis to understand the effects of exploration with stochastic policies in RL. We also provide experimental results to illustrate both the positive and negative aspects of mollification effects in practice.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
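The mollification effect can be checked numerically with a short Python sketch. As an assumption made for illustration, it uses Gaussian smoothing, $f_\sigma(x) = \mathbb{E}_{\varepsilon \sim \mathcal{N}(0,1)}[f(x + \sigma\varepsilon)]$, as a stand-in for the smoothing induced by a Gaussian stochastic policy. Here $f_\sigma$ is the solution of the heat equation $u_t = u_{xx}$ at time $\sigma^2/2$, started from $f$. For the non-smooth $f(x) = |x|$, the smoothed value at 0 is exactly $\sigma\sqrt{2/\pi}$ rather than 0, so both sides show up: the kink disappears, but the objective is shifted. The gradient estimator has the same score-function form as a policy gradient.

```python
import math
import random
from typing import Callable


def mollify(f: Callable[[float], float], x: float, sigma: float,
            num_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of f_sigma(x) = E[f(x + sigma * eps)], eps ~ N(0, 1).

    Gaussian smoothing is an assumed stand-in for a Gaussian stochastic policy.
    """
    rng = random.Random(seed)
    return sum(f(x + sigma * rng.gauss(0.0, 1.0)) for _ in range(num_samples)) / num_samples


def mollified_grad(f: Callable[[float], float], x: float, sigma: float,
                   num_samples: int = 200_000, seed: int = 0) -> float:
    """Score-function (REINFORCE-style) estimate: d/dx f_sigma(x) = E[f(x + sigma*eps) * eps] / sigma."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)
        total += f(x + sigma * eps) * eps
    return total / (num_samples * sigma)


def mollified_abs_at_zero(sigma: float) -> float:
    """Closed form for f(x) = |x|: E|sigma * eps| = sigma * sqrt(2 / pi)."""
    return sigma * math.sqrt(2.0 / math.pi)
```

Increasing `sigma` makes $f_\sigma$ smoother but moves $f_\sigma(0)$ further from the true minimum value 0. This is the trade-off the abstract describes.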
# 確率近似におけるステップサイズ仮定の再検討 Revisiting Step-Size Assumptions in Stochastic Approximation ( http://arxiv.org/abs/2405.17834v1 ) ライセンス: Link先を確認	Caio Kalil Lauand, Sean Meyn,	(参考訳) 多くの機械学習と最適化アルゴリズムは確率近似(SA)の枠組みに基づいて構築されており、ステップサイズ(または学習率)の選択は成功に不可欠である。明確さのため、本稿では反復$n$における特別な場合 $\alpha_n = \alpha_0 n^{-\rho}$ (設計パラメータ $\rho \in [0,1]$ および $\alpha_0>0$) に焦点を当てる。実際には$\rho=0$ (一定ステップサイズ)を取るのが最も一般的であるが、より理論寄りの論文では減衰するステップサイズが好まれる。特に、$\rho \in (1/2, 1)$のときPolyakとRuppertの平均化手法を適用すると、平均二乗誤差(MSE)は最適速度$O(1/n)$で収束し、中心極限定理(CLT)の共分散は厳密な意味で最小となることが知られている。本論文は、一般的なマルコフ的設定においてステップサイズの選択を再検討する。容易に検証可能な仮定の下で、$0<\rho<1$ であれば以下の結論が得られる: $\bullet$ パラメータ推定値は確率1で収束し、任意の$p\ge 1$に対して$L_p$でも収束する。$\bullet$ $\rho$ が小さい場合、平均化を行ってもMSEは $O(\alpha_n^2)$ のオーダーで非常にゆっくりとしか収束しないことがある。$\bullet$ 線形確率近似については遅い収束の原因が特定される: 任意の$\rho\in (0,1)$に対して、平均化により誤差の共分散が最適速度で消滅する推定値が得られ、さらにCLTの共分散はPolyakとRuppertの意味で最適である。しかし、バイアスが速度$O(\alpha_n)$で0に収束するための必要十分条件が得られる。$\rho \le 1/2$ を許容しつつこのような強い結論を得たのは本論文が初めてである。主要な結論は、$\rho =0$ や $\rho<1/2$ の選択は限られた設定でのみ正当化されるということであり、一般にはバイアスが高速な収束を妨げうる。 Many machine learning and optimization algorithms are built upon the framework of stochastic approximation (SA), for which the selection of step-size (or learning rate) is essential for success. For the sake of clarity, this paper focuses on the special case $\alpha_n = \alpha_0 n^{-\rho}$ at iteration $n$, with $\rho \in [0,1]$ and $\alpha_0>0$ design parameters. It is most common in practice to take $\rho=0$ (constant step-size), while in more theoretically oriented papers a vanishing step-size is preferred. In particular, with $\rho \in (1/2, 1)$ it is known that on applying the averaging technique of Polyak and Ruppert, the mean-squared error (MSE) converges at the optimal rate of $O(1/n)$ and the covariance in the central limit theorem (CLT) is minimal in a precise sense. The paper revisits step-size selection in a general Markovian setting.
Under readily verifiable assumptions, the following conclusions are obtained provided $0<\rho<1$: $\bullet$ Parameter estimates converge with probability one, and also in $L_p$ for any $p\ge 1$. $\bullet$ The MSE may converge very slowly for small $\rho$, of order $O(\alpha_n^2)$ even with averaging. $\bullet$ For linear stochastic approximation the source of slow convergence is identified: for any $\rho\in (0,1)$, averaging results in estimates for which the error $\textit{covariance}$ vanishes at the optimal rate, and moreover the CLT covariance is optimal in the sense of Polyak and Ruppert. However, necessary and sufficient conditions are obtained under which the $\textit{bias}$ converges to zero at rate $O(\alpha_n)$. This is the first paper to obtain such strong conclusions while allowing for $\rho \le 1/2$. A major conclusion is that the choice of $\rho =0$ or even $\rho<1/2$ is justified only in select settings; in general, bias may preclude fast convergence.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
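The step-size schedule and Polyak-Ruppert averaging discussed above can be tried on the scalar linear recursion $\theta_{n+1} = \theta_n + \alpha_n (b - a\theta_n + W_{n+1})$ with $\alpha_n = \alpha_0 n^{-\rho}$, whose root is $\theta^* = b/a$. The Python sketch below is only a toy. It uses i.i.d. Gaussian noise $W_{n+1}$ rather than the Markovian noise treated in the paper, so it should not be expected to exhibit the bias effects the paper analyzes. `linear_sa` is a hypothetical name.

```python
import random
from typing import Tuple


def linear_sa(a: float, b: float, alpha0: float, rho: float, num_iters: int,
              noise_std: float = 1.0, seed: int = 0) -> Tuple[float, float]:
    """Run theta <- theta + alpha_n * (b - a*theta + noise); return (last iterate, running average).

    Toy setting: i.i.d. Gaussian noise, not the Markovian noise of the paper.
    The root is theta* = b / a; a > 0 keeps the mean dynamics stable.
    """
    rng = random.Random(seed)
    theta, avg = 0.0, 0.0
    for n in range(1, num_iters + 1):
        alpha = alpha0 * n ** (-rho)  # alpha_n = alpha_0 * n^(-rho)
        theta += alpha * (b - a * theta + noise_std * rng.gauss(0.0, 1.0))
        avg += (theta - avg) / n  # Polyak-Ruppert average of theta_1, ..., theta_n
    return theta, avg
```

With `rho = 0.7`, both outputs approach $b/a$. Rerunning with other values of `rho` and other seeds shows how the schedule affects the last iterate and the averaged estimate in this simple setting.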
# Deform3DGS: Gaussian Splatting を用いた高速手術シーン再構成のためのフレキシブル変形 Deform3DGS: Flexible Deformation for Fast Surgical Scene Reconstruction with Gaussian Splatting ( http://arxiv.org/abs/2405.17835v1 ) ライセンス: Link先を確認	Shuojue Yang, Qian Li, Daiyun Shen, Bingchen Gong, Qi Dou, Yueming Jin,	(参考訳) 組織変形は、正確な手術シーン再構成における重要な課題である。既存の手法は高い再構成品質が得られるにもかかわらず、レンダリング速度が遅く訓練時間が長いため、術中での適用性が制限されている。リアルタイム3Dレンダリングの新技術である3D Gaussian Splattingの最近の進歩に触発された本研究は、内視鏡手術中の変形組織を対象とした、Deform3DGSと呼ばれる新しい高速再構成フレームワークを提示する。具体的には、再構成を改善するために点群初期化を統合することで、3D GSを手術シーンに導入する。さらに、個々のガウスのレベルで組織変形のダイナミクスを学習する新しいフレキシブルな変形モデリング手法(FDM)を提案する。我々のFDMは効率的な表現で表面変形をモデル化でき、リアルタイムのレンダリング性能を実現する。さらに重要なことに、FDMは手術シーンの再構成を大幅に高速化し、特に時間効率が重要となる術中環境において、かなりの臨床的価値を示す。DaVinciのロボット手術ビデオを用いた実験では、優れた再構成忠実度(PSNR: 37.90)とレンダリング速度(338.8 FPS)を示すとともに、訓練時間をシーンあたりわずか1分に短縮し、提案手法の有効性が示された。 Tissue deformation poses a key challenge for accurate surgical scene reconstruction. Despite yielding high reconstruction quality, existing methods suffer from slow rendering speeds and long training times, limiting their intraoperative applicability. Motivated by recent progress in 3D Gaussian Splatting, an emerging technology in real-time 3D rendering, this work presents a novel fast reconstruction framework, termed Deform3DGS, for deformable tissues during endoscopic surgery. Specifically, we introduce 3D GS into surgical scenes by integrating a point cloud initialization to improve reconstruction. Furthermore, we propose a novel flexible deformation modeling scheme (FDM) to learn tissue deformation dynamics at the level of individual Gaussians. Our FDM can model the surface deformation with efficient representations, allowing for real-time rendering performance. More importantly, FDM significantly accelerates surgical scene reconstruction, demonstrating considerable clinical value, particularly in intraoperative settings where time efficiency is crucial.
Experiments on DaVinci robotic surgery videos indicate the efficacy of our approach, showcasing superior reconstruction fidelity (PSNR: 37.90) and rendering speed (338.8 FPS) while substantially reducing training time to only 1 minute per scene.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
# フェデレーションラーニングにおける革新的ネットワーク An Innovative Networks in Federated Learning ( http://arxiv.org/abs/2405.17836v1 ) ライセンス: Link先を確認	Zavareh Bozorgasl, Hao Chen,	(参考訳) 本稿では、Wavelet Kolmogorov-Arnold Networks (Wav-KAN)の開発と連合学習への応用について述べる。我々はクライアントにWav-KAN \cite{wav-kan}を実装した。具体的には、クライアント間の異種データ分布に対応するのに役立つ多重解像度能力を実現するため、連続ウェーブレット変換(CWT)と離散ウェーブレット変換(DWT)の両方を検討した。さまざまなデータセットで大規模な実験を行い、解釈可能性、計算速度、訓練精度およびテスト精度の点で、Wav-KANの優れた性能を実証した。我々の連合学習アルゴリズムは、重み、スケール、平行移動(translation)でパラメータ化されたウェーブレットベースの活性化関数を統合し、ローカルおよびグローバルなモデル性能を向上させる。結果は、計算効率、ロバスト性、精度の大幅な改善を示しており、スケーラブルなニューラルネットワーク設計におけるウェーブレット選択の有効性を浮き彫りにしている。 This paper presents the development and application of Wavelet Kolmogorov-Arnold Networks (Wav-KAN) in federated learning. We implemented Wav-KAN \cite{wav-kan} in the clients. Specifically, we consider both the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT) to enable a multiresolution capability, which helps handle heterogeneous data distributions across clients. Extensive experiments were conducted on different datasets, demonstrating Wav-KAN's superior performance in terms of interpretability, computational speed, and training and test accuracy. Our federated learning algorithm integrates wavelet-based activation functions, parameterized by weight, scale, and translation, to enhance local and global model performance. Results show significant improvements in computational efficiency, robustness, and accuracy, highlighting the effectiveness of wavelet selection in scalable neural network design.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
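Following the description above, each activation is parameterized by a weight, a scale and a translation. A KAN-style layer built from such wavelet activations can be sketched in Python. Three choices here are assumptions for illustration and not necessarily the paper's: the Mexican hat (Ricker) wavelet, the layer layout, and the plain parameter averaging used for server aggregation (FedAvg-style). `WaveletEdge`, `wav_kan_layer` and `fedavg_edge` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def mexican_hat(u: float) -> float:
    """Unnormalized Mexican hat (Ricker) wavelet, an assumed choice; 1 at u = 0 and 0 at u = +/-1."""
    return (1.0 - u * u) * math.exp(-0.5 * u * u)


@dataclass
class WaveletEdge:
    weight: float
    scale: float
    translation: float

    def __call__(self, x: float) -> float:
        return self.weight * mexican_hat((x - self.translation) / self.scale)


def wav_kan_layer(x: Sequence[float], edges: Sequence[Sequence[WaveletEdge]]) -> List[float]:
    """edges[j][i] maps input i to output j; each output sums its incoming edge activations."""
    return [sum(edge(xi) for edge, xi in zip(row, x)) for row in edges]


def fedavg_edge(client_edges: Sequence[WaveletEdge]) -> WaveletEdge:
    """Assumed FedAvg-style aggregation: average each parameter across clients with equal weights."""
    n = len(client_edges)
    return WaveletEdge(
        weight=sum(e.weight for e in client_edges) / n,
        scale=sum(e.scale for e in client_edges) / n,
        translation=sum(e.translation for e in client_edges) / n,
    )
```

Because the scale and translation are learned per edge, each activation can focus on a different resolution and location. That is one way to read the multiresolution property mentioned in the abstract.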
# 信頼と恐怖: テキスト中の危険情報が明らかにする負に偏った信じやすさと党派的ネガティビティバイアス Trust and Terror: Hazards in Text Reveal Negatively Biased Credulity and Partisan Negativity Bias ( http://arxiv.org/abs/2405.17838v1 ) ライセンス: Link先を確認	Keith Burghardt, Daniel M. T. Fessler, Chyna Tang, Anne Pisor, Kristina Lerman,	(参考訳) 感情(emotion)や感情極性(sentiment)などのテキストの社会言語学的指標は、ソーシャルメディアの特徴をよりよく理解するために、ニューラルネットワークを用いて抽出されることが多い。しかし、しばしば見落とされがちな指標として、テキスト内の危険(ハザード)の存在がある。最近の心理学的研究は、ハザードに関する言明は利益に関する言明よりも信じられやすいこと(負に偏った信じやすさとして知られる性質)、そして政治的リベラル派と保守派ではハザードを共有する頻度が異なることを示唆している。そこで本研究では、新たにアノテーションを付与したXポストのコレクションと、先行研究でアノテーションされた都市伝説を用いて、ハザードに関する情報を検出する新しいモデルを構築した。我々は、このモデルが良好に機能する(例えばGPT-4のようなゼロショットの人間アノテータ代替を上回る)だけでなく、抽出されるハザード情報が道徳的憤り、感情極性、感情、脅威語といった他の指標と強く相関しないことを示す。(ただし、予想どおり、ハザード情報は恐怖などの感情と正の相関を持ち、喜びのような感情とは負の相関を持つ。)次に、このモデルを3つのデータセットに適用する: COVID-19に関するXポスト、2023年のハマス・イスラエル戦争に関するXポスト、そして新たに拡張した都市伝説のコレクションである。これらのデータから、各データセットに特有のハザード関連語と、保守派やリベラル派といったユーザグループ間の言語の違いを明らかにし、これらのグループが何をハザードと認識しているかを示す。さらに、ハザードに関する情報の頻度は大きな危険事象の後にピークを迎え、したがってそのような事象の自動指標として機能することを示す。最後に、ハザードに関する情報は都市伝説に特に多く見られることがわかった。これは、ハザードの報告は信じられやすく伝達されやすいとする過去の研究と一致する。 Socio-linguistic indicators of text, such as emotion or sentiment, are often extracted using neural networks in order to better understand features of social media. One indicator that is often overlooked, however, is the presence of hazards within text. Recent psychological research suggests that statements about hazards are more believable than statements about benefits (a property known as negatively biased credulity), and that political liberals and conservatives differ in how often they share hazards. Here, we develop a new model to detect information concerning hazards, trained on a new collection of annotated X posts, as well as urban legends annotated in previous work.
We show that not only does this model perform well (outperforming, e.g., zero-shot human annotator proxies, such as GPT-4) but also that the hazard information it extracts is not strongly correlated with other indicators, namely moral outrage, sentiment, emotions, and threat words. (That said, consonant with expectations, hazard information does correlate positively with such emotions as fear, and negatively with emotions like joy.) We then apply this model to three datasets: X posts about COVID-19, X posts about the 2023 Hamas-Israel war, and a new expanded collection of urban legends. From these data, we uncover words associated with hazards unique to each dataset as well as differences in this language between groups of users, such as conservatives and liberals, which informs what these groups perceive as hazards. We further show that information about hazards peaks in frequency after major hazard events, and therefore acts as an automated indicator of such events. Finally, we find that information about hazards is especially prevalent in urban legends, which is consistent with previous work that finds that reports of hazards are more likely to be both believed and transmitted.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
# PeerFL: 大規模ピアツーピアフェデレーション学習シミュレータ PeerFL: A Simulator for Peer-to-Peer Federated Learning at Scale ( http://arxiv.org/abs/2405.17839v1 ) ライセンス: Link先を確認	Alka Luqman, Shivanshu Shekhar, Anupam Chattopadhyay,	(参考訳) この研究は、ピアツーピアのフェデレーション学習ツールと広く使われているネットワークシミュレータNS3を統合し、フェデレーション学習における異種デバイス実験を可能にするために設計された新しいシミュレータを作成する。このクロスプラットフォーム適応性は、既存のシミュレーションツールの重大なギャップに対処し、全体的なユーティリティとユーザエクスペリエンスを向上します。 NS3はWiFiダイナミックスをシミュレートして、トレーニング中に物理的に動き回る参加者とのフェデレーション学習実験を促進することで、動的ネットワーク特性をもたらす。実験では,計算資源の大規模利用におけるシミュレータの効率を実証し,最大450個の異種デバイスをフェデレート学習の参加者としてモデル化した。これは、ピアツーピア・フェデレーション・ラーニングにおけるシミュレーションに基づく調査のための貴重なツールとして位置づけられている。フレームワークはオープンソースで、コミュニティへの使用と拡張が可能である。 This work integrates peer-to-peer federated learning tools with NS3, a widely used network simulator, to create a novel simulator designed to allow heterogeneous device experiments in federated learning. This cross-platform adaptability addresses a critical gap in existing simulation tools, enhancing the overall utility and user experience. NS3 is leveraged to simulate WiFi dynamics to facilitate federated learning experiments with participants that move around physically during training, leading to dynamic network characteristics. Our experiments showcase the simulator's efficiency in computational resource utilization at scale, with a maximum of 450 heterogeneous devices modelled as participants in federated learning. This positions it as a valuable tool for simulation-based investigations in peer-to-peer federated learning. The framework is open source and available for use and extension to the community.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
# ベンチマークは多言語対話エージェントの実用準備度を過小評価する Benchmark Underestimates the Readiness of Multi-lingual Dialogue Agents ( http://arxiv.org/abs/2405.17840v1 ) ライセンス: Link先を確認	Andrew H. Lee, Sina J. Semnani, Galo Castillo-López, Gäel de Chalendar, Monojit Choudhury, Ashna Dua, Kapil Rajesh Kavitha, Sungkyun Kim, Prashant Kodali, Ponnurangam Kumaraguru, Alexis Lombard, Mehrad Moradshahi, Gihyun Park, Nasredine Semmar, Jiwon Seo, Tianhao Shen, Manish Shrivastava, Deyi Xiong, Monica S. Lam,	(参考訳) マルチリンガルタスク指向対話(TOD)エージェントの作成は、訓練データ取得のコストが高いため困難である。訓練データ効率を改善する研究動向に続き,マルチリンガルTODに対処するのにコンテキスト内学習で十分であることを初めて示す。難易度の高い対話状態追跡(DST)サブタスクを処理するために、ごく少数のfew-shot例しか使用しないコンテキスト内学習とより相性の良い、より単純なステップに分解する。我々は、中国語、英語、フランス語、韓国語、ヒンディー語、およびヒンディー語と英語のコードミックスで12のドメインを持つ多言語TODデータセットX-RiSAWOZで、このアプローチを検証した。 6言語でのターンごとのDST精度は55.6%から80.3%の範囲であり、60.7%から82.8%を達成する微調整モデルによるSOTAの結果よりも一見劣っている。応答生成(RG)サブタスクにおけるBLEUスコアもSOTAより大幅に低い。しかし, 検証セットを手作業で評価した結果, ゴールドラベルの誤りを訂正し, データセットのアノテーションスキーマを改善することで, 我々のプロンプトを用いたGPT-4が (1) DSTで89.6%-96.8%の精度を, (2) 言語を問わず99%以上の正しい応答生成を達成できることがわかった。これにより、現在の自動メトリクスは、文脈内学習の有効性を非常に過小評価していると結論付ける。 Creating multilingual task-oriented dialogue (TOD) agents is challenging due to the high cost of training data acquisition. Following the research trend of improving training data efficiency, we show, for the first time, that in-context learning is sufficient to tackle multilingual TOD. To handle the challenging dialogue state tracking (DST) subtask, we break it down into simpler steps that are more compatible with in-context learning where only a handful of few-shot examples are used. We test our approach on the multilingual TOD dataset X-RiSAWOZ, which has 12 domains in Chinese, English, French, Korean, Hindi, and code-mixed Hindi-English. Our turn-by-turn DST accuracy on the 6 languages ranges from 55.6% to 80.3%, seemingly worse than the SOTA results from fine-tuned models that achieve 60.7% to 82.8%; our BLEU scores in the response generation (RG) subtask are also significantly lower than SOTA. 
However, after manual evaluation of the validation set, we find that by correcting gold label errors and improving dataset annotation schema, GPT-4 with our prompts can achieve (1) 89.6%-96.8% accuracy in DST, and (2) more than 99% correct response generation across different languages. This leads us to conclude that current automatic metrics heavily underestimate the effectiveness of in-context learning.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
# ディスクリミネータによる共同音声・ビデオ生成のための協調拡散 Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation ( http://arxiv.org/abs/2405.17842v1 ) ライセンス: Link先を確認	Akio Hayakawa, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji,	(参考訳) 本研究では,事前学習した単一モード生成モデルを利用して,最小計算コストのオーディオ映像生成モデルを構築することを目的とする。そこで本研究では,各単一モーダルモデルをガイドして,各モーダルモデルに対して協調的に整合性のあるサンプルを生成する手法を提案する。具体的には,2つの事前学習ベース拡散モデルが与えられた場合,ベースモデルによって別々に推定されるスコアをオーディオおよびビデオ上での関節分布のスコアに合わせるために,軽量な関節誘導モジュールを訓練する。理論的には、このガイダンスは、ベースモデルによって独立に生成された偽の音声-ビデオ対を識別する最適な判別器の勾配によって計算可能であることを示す。この分析に基づいて,この判別器を訓練して共同指導モジュールを構築する。さらに,判別器の勾配を標準拡散モデルのようにノイズ推定器として機能させ,判別器の勾配を安定化させる損失関数を採用した。いくつかのベンチマークデータセットに対する実証的な評価により,本手法は比較的少数のパラメータで単一モードの忠実度と複数モードのアライメントを改善していることが示された。 In this study, we aim to construct an audio-video generative model with minimal computational cost by leveraging pre-trained single-modal generative models for audio and video. To achieve this, we propose a novel method that guides each single-modal model to cooperatively generate well-aligned samples across modalities. Specifically, given two pre-trained base diffusion models, we train a lightweight joint guidance module to adjust scores separately estimated by the base models to match the score of joint distribution over audio and video. We theoretically show that this guidance can be computed through the gradient of the optimal discriminator distinguishing real audio-video pairs from fake ones independently generated by the base models. On the basis of this analysis, we construct the joint guidance module by training this discriminator. Additionally, we adopt a loss function to make the gradient of the discriminator work as a noise estimator, as in standard diffusion models, stabilizing the gradient of the discriminator. Empirical evaluations on several benchmark datasets demonstrate that our method improves both single-modal fidelity and multi-modal alignment with a relatively small number of parameters.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
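The abstract claims that the joint guidance can be computed from an optimal discriminator. That claim rests on a standard density-ratio identity, sketched below under three assumptions. First, the discriminator $D=\sigma(f)$ (logit $f$) is trained at a given noise level on equally sized batches of real pairs, drawn from the joint density $q(a,v)$. Second, its fake pairs are drawn independently from the base models, $p_A(a)\,p_V(v)$. Third, the paper's exact loss and parameterization may differ from this sketch.

$$
D^*(a,v) = \frac{q(a,v)}{q(a,v) + p_A(a)\,p_V(v)}
\quad\Longrightarrow\quad
q(a,v) = p_A(a)\,p_V(v)\,\frac{D^*(a,v)}{1 - D^*(a,v)}
$$

$$
\nabla_{(a,v)} \log q(a,v)
= \big(\nabla_a \log p_A(a),\ \nabla_v \log p_V(v)\big)
+ \nabla_{(a,v)} \log \frac{D^*(a,v)}{1 - D^*(a,v)}
= \big(\nabla_a \log p_A(a),\ \nabla_v \log p_V(v)\big) + \nabla_{(a,v)} f^*(a,v)
$$

In words, the joint score is the pair of base-model scores plus the gradient of the optimal discriminator's logit. A lightweight guidance module therefore only needs to supply that gradient.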
# LLMと身体化知識グラフを用いたサービスロボットの安全管理 Safety Control of Service Robots with LLMs and Embodied Knowledge Graphs ( http://arxiv.org/abs/2405.17846v1 ) ライセンス: Link先を確認	Yong Qi, Gabriel Kyebambo, Siyuan Xie, Wei Shen, Shenghui Wang, Bitao Xie, Bin He, Zhipeng Wang, Shuo Jiang,	(参考訳) 各種産業におけるサービスロボティクスの安全性の限界は、ロボットが安全な慣行に従うことを保証し、人に危害を加えたり物的損害を引き起こしたりする行動を防ぐロバストなメカニズムの必要性について、大きな懸念を引き起こしている。知識グラフ(KG)とLarge Language Models(LLM)の統合を含む進歩にもかかわらず、自律ロボットの行動において一貫した安全性を保証するという課題は依然として残っている。本稿では,大規模言語モデルとERCP(Embodied Robotic Control Prompts)とEKG(Embodied Knowledge Graphs)を統合することで,サービスロボットの安全性を向上する手法を提案する。 ERCPは、LLMが安全かつ正確な応答を生成するための事前定義された命令として設計されている。これらの応答はEKGによって検証される。EKGは、ロボットの動作が安全プロトコルと継続的に一致していることを保証する包括的な知識基盤を提供し、様々な状況でより安全な運用を促進する。実験では多様な実世界のタスクを設定し、本フレームワークを搭載したロボットは、従来の手法に比べて安全性基準の遵守度が有意に高かった。この統合は、セキュアな人間とロボットのインタラクションを促進し、私たちの方法論を、サービスロボティクスにおけるAI駆動型安全イノベーションの最前線に位置づけます。 Safety limitations in service robotics across various industries have raised significant concerns about the need for robust mechanisms ensuring that robots adhere to safe practices, thereby preventing actions that might harm humans or cause property damage. Despite advances, including the integration of Knowledge Graphs (KGs) with Large Language Models (LLMs), challenges in ensuring consistent safety in autonomous robot actions persist. In this paper, we propose a novel integration of Large Language Models with Embodied Robotic Control Prompts (ERCPs) and Embodied Knowledge Graphs (EKGs) to enhance the safety framework for service robots. ERCPs are designed as predefined instructions that ensure LLMs generate safe and precise responses. These responses are subsequently validated by EKGs, which provide a comprehensive knowledge base ensuring that the actions of the robot are continuously aligned with safety protocols, thereby promoting safer operational practices in varied contexts. Our experimental setup involved diverse real-world tasks, where robots equipped with our framework demonstrated significantly higher compliance with safety standards compared to traditional methods. 
This integration fosters secure human-robot interactions and positions our methodology at the forefront of AI-driven safety innovations in service robotics.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
# I-LLM:完全量子化低ビット大言語モデルのための効率的な整数オンリー推論 I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models ( http://arxiv.org/abs/2405.17849v1 ) ライセンス: Link先を確認	Xing Hu, Yuan Chen, Dawei Yang, Sifan Zhou, Zhihang Yuan, Jiangyong Yu, Chen Xu,	(参考訳) 学習後量子化(PTQ)は、大規模言語モデル(LLM)の推論を加速する強力な手法である。それでも、既存の手法は、追加の量子化・逆量子化や、RMSNormやSoftmaxのような非線形演算子を含め、推論中にかなりの数の浮動小数点演算を必要とする。この制限は、エッジおよびクラウドデバイスへのLLMのデプロイを妨げる。本稿では,LLMにおける整数のみの量子化の主な障害が,線形演算と非線形演算の両方において,チャネル間およびトークン間でアクティベーションが大きく変動することにあると特定する。この問題に対処するために,LLMに適した整数のみの完全量子化PTQフレームワークであるI-LLMを提案する。具体的には,(1)全てのアクティベーションと重みのチャネル間変動を積極的に平滑化するために,FSBR(Fully-Smooth Block-Reconstruction)を開発した。 (2) トークン間変動による劣化を軽減するため, 動的整数のみのMatMul (DI-MatMul) と呼ばれる新しいアプローチを導入する。この方法は整数のみの演算で入力と出力を動的に量子化することにより、全整数行列乗算における動的量子化を可能にする。 (3) ビットシフトを利用したDI-ClippedSoftmax, DI-Exp, DI-Normalizationを設計し, 精度を維持しつつ, 非線形演算子を効率的に実行する。実験の結果,我々のI-LLMはFPベースラインに匹敵する精度を達成し,非整数量子化法より優れていた。例えば、I-LLMはW4A4で動作でき、精度の低下は無視できる程度である。我々の知る限り、整数のみの量子化とLLMのギャップを埋めたのは本研究が初めてである。この分野の進歩に貢献することを目的として、コードをanonymous.4open.scienceで公開した。 Post-training quantization (PTQ) serves as a potent technique to accelerate the inference of large language models (LLMs). Nonetheless, existing works still necessitate a considerable number of floating-point (FP) operations during inference, including additional quantization and de-quantization, as well as non-linear operators such as RMSNorm and Softmax. This limitation hinders the deployment of LLMs on edge and cloud devices. In this paper, we identify that the primary obstacle to integer-only quantization for LLMs lies in the large fluctuation of activations across channels and tokens in both linear and non-linear operations. To address this issue, we propose I-LLM, a novel integer-only fully-quantized PTQ framework tailored for LLMs. Specifically, (1) we develop Fully-Smooth Block-Reconstruction (FSBR) to aggressively smooth inter-channel variations of all activations and weights. 
(2) To alleviate degradation caused by inter-token variations, we introduce a novel approach called Dynamic Integer-only MatMul (DI-MatMul). This method enables dynamic quantization in full-integer matrix multiplication by dynamically quantizing the inputs and outputs with integer-only operations. (3) We design DI-ClippedSoftmax, DI-Exp, and DI-Normalization, which utilize bit shifts to execute non-linear operators efficiently while maintaining accuracy. The experiments show that our I-LLM achieves accuracy comparable to the FP baseline and outperforms non-integer quantization methods. For example, I-LLM can operate at W4A4 with negligible loss of accuracy. To our knowledge, we are the first to bridge the gap between integer-only quantization and LLMs. We've published our code on anonymous.4open.science, aiming to contribute to the advancement of this field.	翻訳日:2024-05-29 20:16:52 公開日:2024-05-28
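The sketch below gives a rough picture of the integer-only idea behind DI-MatMul. It multiplies two integer matrices with exact integer accumulation. It then re-quantizes each output row (token) to int8, giving each row its own power-of-two scale chosen at run time, and uses only shifts, adds and comparisons. This is a simplified stand-in, not the authors' DI-MatMul. The function names are hypothetical, and restricting scales to powers of two is an assumption made here so that every step stays integer-only.

```python
def int_matmul(a, b):
    """Multiply integer matrices given as lists of rows; accumulation stays exact (integer)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)] for row in a]


def dynamic_requant_rows(acc, bits=8):
    """Re-quantize each row (token) of an integer accumulator to signed `bits`.

    Each row gets its own power-of-two scale 2**shift, chosen at run time from that
    row's largest magnitude (hypothetical simplification of dynamic quantization).
    Returns (quantized rows, per-row shifts); original value ~= q * 2**shift.
    """
    qmax = (1 << (bits - 1)) - 1
    rows, shifts = [], []
    for row in acc:
        amax = max(abs(v) for v in row)
        # smallest shift that brings amax down to about (bits - 1) magnitude bits
        shift = max(0, amax.bit_length() - (bits - 1))
        scaled = row
        if shift:
            half = 1 << (shift - 1)  # add half an LSB: rounds to nearest, ties upward
            scaled = [(v + half) >> shift for v in row]
        rows.append([max(-qmax, min(qmax, v)) for v in scaled])  # clamp to int range
        shifts.append(shift)
    return rows, shifts


acc = int_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
q, s = dynamic_requant_rows([[1000, -500], [10, 20]])
```

Here `acc` is `[[19, 22], [43, 50]]`. The first requantized row uses shift 3 (`1000 -> 125`, `-500 -> -62`). The second row already fits in int8 and keeps shift 0. This shows why per-token scales help when token ranges differ widely.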
# 不正乗車に対するディープニューラルネットワークによるアプローチ A Deep Neural Network Approach to Fare Evasion ( http://arxiv.org/abs/2405.17855v1 ) ライセンス: Link先を確認	Johannes van der Vyver,	(参考訳) 公共交通事業者にとって不正乗車は問題である。LSTMモデルを用いれば、この問題が最も多く発生する場所について分析的な洞察を得て、資本損失を防ぐのに役立つ。この問題がもたらす財政的負担に加え、検査員を増やすだけでは問題の緩和には不十分である。本研究の目的は、公共交通部門における不正乗車を予測する別の方法を見いだすことである。映像中の乗客のキーポイントを抽出し、それらのキーポイントでLSTMモデルを訓練して、乗客の行動が支払いか不正乗車かを予測する。リアルタイム映像で乗客の行動を予測したところ、有望な結果が得られた。したがって、洗練されたアプローチは不正乗車の問題の軽減に役立ちうる。人が後になって運賃を支払う可能性も常にあるため、ReIDモデルをLSTMモデルと併用することで精度を向上できる。両モデルを用いることで、公共交通事業者は不正乗車問題の根源がどこにあるかを絞り込み始めることができる。 Fare evasion is a problem for public transport companies. LSTM models can help companies gain analytical insight into where this issue occurs most, in order to prevent capital loss. Beyond the financial burden this problem causes, having more inspectors is not enough to alleviate it. The purpose of this study is to find a different way to predict fare evasion in the public transport sector. Keypoints of passengers are extracted from video footage, and an LSTM model is trained on those keypoints to predict whether a passenger's action is a payment or an evasion. The results were promising when predicting the actions of passengers on real-time footage. Thus, a sophisticated approach can help to decrease the fare evasion problem. A ReID model can be used alongside the LSTM model for better accuracy, as there is always the chance that a person might only pay for the fare at a later stage. With both models, it is possible for public transport companies to start narrowing down where the root of their fare evasion problems emerges.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
# 新規インスタンス検出・セグメンテーションのための事前学習型視覚モデルの適用 Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation ( http://arxiv.org/abs/2405.17859v1 ) ライセンス: Link先を確認	Yangxiao Lu, Jishnu Jaykumar P, Yunhui Guo, Nicholas Ruozzi, Yu Xiang,	(参考訳) Novel Instance Detection and Segmentation (NIDS)は、各インスタンスのいくつかの例から、新しいオブジェクトインスタンスを検出し、セグメンテーションすることを目的としている。本稿では、オブジェクトの提案生成、インスタンステンプレートと提案領域の埋め込み生成、インスタンスラベル割り当てのための埋め込みマッチングを含む統合フレームワーク(NIDS-Net)を提案する。近年の大規模ビジョン手法の進歩を生かして,正確なバウンディングボックスとマスクを伴うオブジェクト提案を得るために,Grounding DINO と Segment Anything Model (SAM) を利用する。私たちのアプローチの中心は、高品質なインスタンス埋め込みの生成です。我々は、DINOv2 ViTバックボーンからのパッチ埋め込みの前景特徴平均を利用し、続いて、私たちが導入する重みアダプタ機構によって改良を行った。重みアダプタは,特徴空間内の埋め込みを局所的に調整し,オーバーフィッティングを効果的に抑制できることを実験的に示す。この手法は直接的なマッチング戦略を可能にし、結果として大きなパフォーマンス向上をもたらす。我々のフレームワークは現在の最先端の手法を超え、4つの検出データセットの平均精度(AP)において22.3、46.2、10.3、24.0の顕著な改善を示している。BOPチャレンジの7つのコアデータセットにおけるインスタンスセグメンテーションタスクでは、我々の手法は上位のRGB手法を3.6 AP上回り、最良のRGB-D手法とも競合する。コードは、https://github.com/YoungSean/NIDS-Netで入手できる。 Novel Instance Detection and Segmentation (NIDS) aims at detecting and segmenting novel object instances given a few examples of each instance. We propose a unified framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advancements in large vision methods, we utilize the Grounding DINO and Segment Anything Model (SAM) to obtain object proposals with accurate bounding boxes and masks. Central to our approach is the generation of high-quality instance embeddings. We utilize foreground feature averages of patch embeddings from the DINOv2 ViT backbone, followed by refinement through a weight adapter mechanism that we introduce. We show experimentally that our weight adapter can adjust the embeddings locally within their feature space and effectively limit overfitting. 
This methodology enables a straightforward matching strategy, resulting in significant performance gains. Our framework surpasses current state-of-the-art methods, demonstrating notable improvements of 22.3, 46.2, 10.3, and 24.0 in average precision (AP) across four detection datasets. In instance segmentation tasks on seven core datasets of the BOP challenge, our method outperforms the top RGB methods by 3.6 AP and remains competitive with the best RGB-D method. Code is available at: https://github.com/YoungSean/NIDS-Net	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
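The embedding-matching step described in this abstract can be sketched in two parts. The first averages the patch embeddings that fall inside an object's foreground mask. The second assigns each proposal the label of the most cosine-similar template. This is a minimal sketch that assumes embeddings are plain Python lists. The threshold value and all function names are hypothetical, and the paper's weight adapter is omitted entirely.

```python
import math


def foreground_average(patch_embeddings, foreground_mask):
    """Mean of the patch embeddings whose mask entry is truthy."""
    fg = [e for e, keep in zip(patch_embeddings, foreground_mask) if keep]
    if not fg:
        raise ValueError("mask selects no foreground patches")
    dim = len(fg[0])
    return [sum(e[d] for e in fg) / len(fg) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def match_proposal(proposal_embedding, templates, threshold=0.5):
    """Return (label, score) of the best-matching instance, or (None, score) below threshold.

    templates maps an instance label to a list of template embeddings; a label's
    score is its best-matching template (hypothetical max-over-templates rule).
    """
    scores = {
        label: max(cosine(proposal_embedding, t) for t in embs)
        for label, embs in templates.items()
    }
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] >= threshold else (None, scores[best])


templates = {"mug": [[1.0, 0.0]], "box": [[0.0, 1.0]]}
label, score = match_proposal([0.9, 0.1], templates)
```

Rejecting matches below a similarity threshold is one simple way to leave background proposals unlabeled. The paper may handle this differently.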
# データ不足下での原子炉設計における材料特性の頑健な予測に向けて -- クリープ破断特性に関する研究- Towards robust prediction of material properties for nuclear reactor design under scarce data -- a study in creep rupture property ( http://arxiv.org/abs/2405.17862v1 ) ライセンス: Link先を確認	Yu Chen, Edoardo Patelli, Zhen Yang, Adolphus Lye,	(参考訳) ディープ・ラーニングの進歩は、特に原子力産業のような安全クリティカルな工学応用において、信頼性と堅牢性に関するさらなる調査をもたらす。主な課題は、データセット(しばしば希少でスパース)の可用性と、データ、モデル、予測の不確実性について十分に考慮されていないことである。そこで本稿では, 原子炉設計における材料特性の信頼できる予測を目的とした, 不確実性と事前知識の両方を取り入れたメタラーニングに基づくアプローチを提案する。これは限られたデータの下での堅牢な学習に適している。外挿のために予測関数の分布を生成することで、不確実性を考慮している。その結果, 典型的に小データ領域にある破断寿命予測において、既存の経験的手法よりも優れた性能が得られることが示唆された。ここでは破断特性で実証したが、この学習アプローチは、原子力業界全体でのデータ不足という同様の問題の解決に転用可能である。信頼できるツールを提供しつつ、適用性と堅牢性を示すことは、原子力産業におけるAI分析を推進するうえで非常に重要である。 Advances in Deep Learning bring further investigation into credibility and robustness, especially for safety-critical engineering applications such as the nuclear industry. The key challenges include the availability of data sets (often scarce and sparse) and insufficient consideration of the uncertainty in the data, model, and prediction. This paper therefore presents a meta-learning based approach that is both uncertainty- and prior knowledge-informed, aiming at trustworthy predictions of material properties for nuclear reactor design. It is suited for robust learning under limited data. Uncertainty is accounted for by producing a distribution of predictor functions for extrapolation. Results suggest it achieves better performance than existing empirical methods in rupture life prediction, a case that typically falls in a small-data regime. While demonstrated herein with rupture properties, this learning approach is transferable to similar problems of data scarcity across the nuclear industry. It is of great importance for boosting AI analytics in the nuclear industry, by proving applicability and robustness while providing tools that can be trusted.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
# 画像を見る:コントラストアライメントによる視覚相関の優先順位付け Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ( http://arxiv.org/abs/2405.17871v1 ) ライセンス: Link先を確認	Xin Xiao, Bohong Wu, Jiacong Wang, Chunyuan Li, Xun Zhou, Haoyuan Guo,	(参考訳) 視覚言語モデル(VLM)における既存の画像-テキストのモダリティアライメントは、各テキストトークンを自己回帰的に等しく扱う。この方法は単純かつ効果的であるが、入力画像との相関が低い、あるいは矛盾さえするテキストトークンを過度に強調するため、最適でないクロスモーダルアライメントをもたらす。本稿では,その視覚的相関に基づいて,各テキストトークンに対して異なる寄与を割り当てることを提唱する。具体的には、画像入力を対比したときの各テキストトークン上の予測ロジットの差が、視覚的相関の強い手がかりとなることを示す。そこで、視覚的に相関したトークンの訓練を優先する、シンプルで効果的な再重み付け戦略であるコントラストアライメント(Contrastive ALignment, CAL)を導入する。実験の結果、CALは様々なベンチマークデータセットにおいて、様々な解像度とモデルサイズで異なるタイプのVLMを一貫して改善することを示した。重要な点として,本手法の追加の計算オーバーヘッドは最小限であり,他のデータスケーリング戦略と比べて非常に効率的である。コードはhttps://github.com/foundation-multimodal-models/CALで公開されている。 Existing image-text modality alignment in Vision Language Models (VLMs) treats each text token equally in an autoregressive manner. Despite being simple and effective, this method results in sub-optimal cross-modal alignment by over-emphasizing the text tokens that are less correlated with, or even contradictory to, the input images. In this paper, we advocate for assigning distinct contributions to each text token based on its visual correlation. Specifically, we show that, by contrasting image inputs, the difference in prediction logits on each text token provides strong guidance on visual correlation. We therefore introduce Contrastive ALignment (CAL), a simple yet effective re-weighting strategy that prioritizes training visually correlated tokens. Our experimental results demonstrate that CAL consistently improves different types of VLMs across different resolutions and model sizes on various benchmark datasets. Importantly, our method incurs minimal additional computational overhead, rendering it highly efficient compared to alternative data scaling strategies. Code is available at https://github.com/foundation-multimodal-models/CAL.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
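The sketch below shows one way the re-weighting idea could look in code. For each target token, it compares the log-probability assigned with the image against the log-probability assigned without it. It turns the clipped gain into a per-token weight and then averages the weighted negative log-likelihood. It is only a sketch under stated assumptions. The abstract speaks of "prediction logits", whereas this version uses log-probabilities of the target token, and the clipping range `[lo, hi]` and all function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cal_token_weights(logits_with_image, logits_without_image, targets, lo=0.0, hi=1.0):
    """Weight each target token by how much the image raises its log-probability.

    The clipping range [lo, hi] is a hypothetical choice; in a real training loop
    these weights would be treated as constants (no gradient through them).
    """
    weights = []
    for with_img, without_img, t in zip(logits_with_image, logits_without_image, targets):
        gain = log_softmax(with_img)[t] - log_softmax(without_img)[t]
        weights.append(min(hi, max(lo, gain)))
    return weights


def weighted_nll(logits_with_image, targets, weights):
    """Token-level negative log-likelihood, re-weighted and averaged over tokens."""
    total = 0.0
    for logits, t, w in zip(logits_with_image, targets, weights):
        total += -w * log_softmax(logits)[t]
    return total / len(targets)


with_img = [[2.0, 0.0], [0.0, 0.0]]
without_img = [[0.0, 0.0], [0.0, 0.0]]
w = cal_token_weights(with_img, without_img, targets=[0, 0])
loss = weighted_nll(with_img, [0, 0], w)
```

In this toy batch, the image raises the first token's log-probability by about 0.57. That token gets a positive weight, while the second token, which the image does not affect, gets weight 0.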
# HFGS: 内視鏡シーン再構成のための空間的・時間的高周波成分を重視した4次元ガウススプラッティング HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene Reconstruction ( http://arxiv.org/abs/2405.17872v1 ) ライセンス: Link先を確認	Haoyu Zhao, Xingyue Zhao, Lingting Zhu, Weixi Zheng, Yongchao Xu,	(参考訳) ロボット支援による最小侵襲手術は、手術結果を改善するため、動的シーン再構築の強化による恩恵を受ける。ニューラル・ラジアンス・フィールド(NeRF)はシーン再構成に有効であるが、推論速度は遅く、トレーニング期間も長いため適用性が制限されている。これらの制限を克服するため、3Dガウススプラッティング(3D-GS)ベースの手法が最近のトレンドとして登場し、高速な推論機能と優れた3D品質を提供する。しかし、これらの手法は静的シーンと動的シーンの両方において依然として再構成不足に苦慮している。本稿では,空間的および時間的周波数の観点からこれらの課題に対処する,変形可能な内視鏡再構成のための新しいアプローチであるHFGSを提案する。提案手法では,動的シーンの処理に変形場を導入し,空間高周波強調再構成(Spatial High-Frequency Emphasis Reconstruction, SHF)を導入して, レンダリング画像と正解画像との空間周波数スペクトルの差を最小化する。さらに,時間高周波強調再構成(THF)を導入し,フロー事前情報を活用して動きの激しい部分に最適化を集中させることで,ニューラルレンダリングの動的な認識を高める。広く使われている2つのベンチマークでの大規模な実験は、HFGSが優れたレンダリング品質を達成することを示した。コードは公開予定である。 Robot-assisted minimally invasive surgery benefits from enhancing dynamic scene reconstruction, as it improves surgical outcomes. While Neural Radiance Fields (NeRF) have been effective in scene reconstruction, their slow inference speeds and lengthy training durations limit their applicability. To overcome these limitations, 3D Gaussian Splatting (3D-GS) based methods have emerged as a recent trend, offering rapid inference capabilities and superior 3D quality. However, these methods still struggle with under-reconstruction in both static and dynamic scenes. In this paper, we propose HFGS, a novel approach for deformable endoscopic reconstruction that addresses these challenges from spatial and temporal frequency perspectives. Our approach incorporates deformation fields to better handle dynamic scenes and introduces Spatial High-Frequency Emphasis Reconstruction (SHF) to minimize discrepancies in spatial frequency spectra between the rendered image and its ground truth. 
Additionally, we introduce Temporal High-Frequency Emphasis Reconstruction (THF) to enhance dynamic awareness in neural rendering by leveraging flow priors, focusing optimization on motion-intensive parts. Extensive experiments on two widely used benchmarks demonstrate that HFGS achieves superior rendering quality. Our code will be available.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
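The sketch below illustrates the idea behind SHF, a loss on discrepancies between spatial frequency spectra. It compares the amplitude spectra of a rendered image and its ground truth with an L1 distance that up-weights high-frequency bins. This is an illustrative sketch, not the HFGS loss. The linear weighting `1 + emphasis * radius` and all names are hypothetical, and the naive DFT is only meant for tiny images (a real implementation would use an FFT on the GPU).

```python
import cmath
import math


def dft2(img):
    """Naive 2-D discrete Fourier transform of a small grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    spectrum = []
    for u in range(h):
        row = []
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    s += img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
            row.append(s)
        spectrum.append(row)
    return spectrum


def radial_frequency(u, v, h, w):
    """Distance of bin (u, v) from the DC term on the wrapped grid, normalised to at most 1."""
    fu = min(u, h - u) / (h / 2)
    fv = min(v, w - v) / (w / 2)
    return math.sqrt(fu * fu + fv * fv) / math.sqrt(2)


def spectral_loss(rendered, target, emphasis=1.0):
    """Mean |amplitude difference| between the two spectra, up-weighting high frequencies.

    emphasis=0 gives a plain amplitude-spectrum L1 distance; the linear weighting
    1 + emphasis * radius is a hypothetical choice for illustration.
    """
    fr, ft = dft2(rendered), dft2(target)
    h, w = len(rendered), len(rendered[0])
    total = 0.0
    for u in range(h):
        for v in range(w):
            weight = 1.0 + emphasis * radial_frequency(u, v, h, w)
            total += weight * abs(abs(fr[u][v]) - abs(ft[u][v]))
    return total / (h * w)


flat = [[0.5, 0.5], [0.5, 0.5]]
checker = [[1.0, 0.0], [0.0, 1.0]]
```

The flat image and the checkerboard have the same mean, so a per-pixel mean check cannot separate them. Their spectra differ only at the highest-frequency bin, and raising `emphasis` makes that bin count more.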
# MixDQ: メトリックデカップリング型混合精度量子化を用いたメモリ効率の良いFew-Stepテキスト-画像拡散モデル MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization ( http://arxiv.org/abs/2405.17873v1 ) ライセンス: Link先を確認	Tianchen Zhao, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, Yu Wang,	(参考訳) 拡散モデルは高い視覚的生成品質を達成している。しかし、その計算とメモリの大幅なコストは、リソースに制約のあるモバイルデバイスや、デスクトップGPUでさえ、その応用を困難にしている。最近の数ステップの拡散モデルでは、デノイジングステップを減らして推論時間を短縮する。しかし、メモリ消費は依然として過大である。学習後量子化(PTQ)は、高ビット幅のFP表現を低ビット整数値(INT4/8)に置き換えるもので、メモリコストを削減する効果的かつ効率的な手法である。しかし、数ステップの拡散モデルに適用する場合、既存の量子化法は画質とテキストアライメントの両方を維持する上で困難に直面している。この問題に対処するために、混合精度量子化フレームワークであるMixDQを提案する。まず,高感度なテキスト埋め込みの量子化のための特殊なBOS対応量子化法を設計する。次に,各層の感度を測定するために,メトリック分離型の感度解析を行う。最後に,ビット幅割り当てを行う整数計画法に基づく手法を開発した。既存の量子化手法はW8A8では不十分であるが、MixDQは性能を損なわずにW8A8を達成でき、W4A8でも視覚的劣化は無視できる程度である。 FP16と比較すると,モデルサイズとメモリコストの3～4倍の削減,レイテンシの1.45倍の高速化を実現している。 Diffusion models have achieved high visual generation quality. However, their significant computational and memory costs pose challenges for their application on resource-constrained mobile devices or even desktop GPUs. Recent few-step diffusion models reduce the inference time by reducing the denoising steps. However, their memory consumption is still excessive. Post-training quantization (PTQ), which replaces high bit-width FP representations with low-bit integer values (INT4/8), is an effective and efficient technique to reduce the memory cost. However, when applied to few-step diffusion models, existing quantization methods face challenges in preserving both image quality and text alignment. To address this issue, we propose a mixed-precision quantization framework, MixDQ. First, we design a specialized BOS-aware quantization method for highly sensitive text embedding quantization. Then, we conduct metric-decoupled sensitivity analysis to measure the sensitivity of each layer. Finally, we develop an integer-programming-based method to conduct bit-width allocation. 
While existing quantization methods fall short at W8A8, MixDQ could achieve W8A8 without performance loss, and W4A8 with negligible visual degradation. Compared with FP16, we achieve 3-4x reduction in model size and memory cost, and 1.45x latency speedup.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
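The bit-width allocation step can be sketched as follows. Each layer offers a few bit-widths, and each choice has a memory cost and an estimated sensitivity. The goal is to pick one bit-width per layer so that total sensitivity is minimized while total memory stays within a budget. The abstract says MixDQ uses integer programming. Here, a small exact dynamic program (a multiple-choice knapsack) is swapped in as a stand-in for a generic ILP solver, which is practical only for small integer budgets. The function name, units and example numbers are hypothetical.

```python
def allocate_bit_widths(layers, budget):
    """Choose one bit-width per layer to minimise total sensitivity under a memory budget.

    layers: list of dicts mapping bit-width -> (memory_cost, sensitivity), where
            memory_cost is a non-negative integer (e.g. in KiB) and sensitivity is the
            estimated quality loss of quantizing that layer to that bit-width.
    Returns (total_sensitivity, [bit-width per layer]) or None if nothing fits.
    """
    # best[m] = lowest-sensitivity assignment found so far using exactly m memory units
    best = {0: (0.0, [])}
    for options in layers:
        nxt = {}
        for used, (sens, bits_so_far) in best.items():
            for bits, (cost, layer_sens) in options.items():
                total = used + cost
                if total > budget:
                    continue  # this choice would exceed the memory budget
                candidate = (sens + layer_sens, bits_so_far + [bits])
                if total not in nxt or candidate[0] < nxt[total][0]:
                    nxt[total] = candidate
        best = nxt
        if not best:
            return None
    return min(best.values(), key=lambda entry: entry[0])


# Made-up numbers: layer 0 is far more sensitive to 4-bit than layer 1.
layers = [{8: (8, 0.0), 4: (4, 1.0)}, {8: (8, 0.0), 4: (4, 0.1)}]
plan = allocate_bit_widths(layers, budget=12)
```

With a budget of 12 units, the solver keeps the sensitive first layer at 8 bits and drops the second layer to 4 bits (`[8, 4]`). This mirrors the mixed-precision intuition of spending bits where sensitivity is highest.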
# NUTS、NARS、および音声 NUTS, NARS, and Speech ( http://arxiv.org/abs/2405.17874v1 ) ライセンス: Link先を確認	D. van der Sluis,	(参考訳) 「知能とは、不十分な知識と資源のもとで動作しながら環境に適応する情報処理システムの能力である」かどうかを検討するため,音声認識に非公理的推論システム(NARS)を活用することを検討する。本稿では, NUTS: raNdom dimensionality redUction non axiomaTic reasoning few Shot learner for perceptionについて述べる。 NUTSは、単純な次元削減、いくつかの前処理、そして非公理的推論(NARS)から構成される。 2つの訓練例だけで、NUTSは離散的な単語識別においてWhisper Tinyモデルと同程度の性能を示す。 To investigate whether "Intelligence is the capacity of an information-processing system to adapt to its environment while operating with insufficient knowledge and resources", we look at utilising the non axiomatic reasoning system (NARS) for speech recognition. This article presents NUTS: raNdom dimensionality redUction non axiomaTic reasoning few Shot learner for perception. NUTS consists of naive dimensionality reduction, some pre-processing, and then non axiomatic reasoning (NARS). With only 2 training examples NUTS performs similarly to the Whisper Tiny model for discrete word identification.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
# BO4IO:不確実性定量化を伴う逆最適化のためのベイズ最適化手法 BO4IO: A Bayesian optimization approach to inverse optimization with uncertainty quantification ( http://arxiv.org/abs/2405.17875v1 ) ライセンス: Link先を確認	Yen-An Lu, Wei-Shou Hu, Joel A. Paulson, Qi Zhang,	(参考訳) この研究はデータ駆動逆最適化(IO: data-driven inverse optimization)に対処し、最適化モデルにおける未知のパラメータを、最適化問題の最適解あるいは準最適解と仮定できる観測された意思決定から推定することを目的としている。 IO問題は通常、解くことが非常に難しい大規模な二層(bilevel)計画問題として定式化される。従来の厳密解法とは異なり,ベイズ最適化に基づく微分不要の最適化手法を提案し,これをBO4IOと呼んで一般のIO問題を解く。我々は、IO損失関数をブラックボックスとして扱い、ガウス過程モデルで近似する。予測された事後関数を用いて、各反復で獲得関数を最小化して新しい候補解を問い合わせ、最適なパラメータ推定値に逐次収束させる。 IOにベイズ最適化を使用する主な利点は次の2つである。 (i)二層計画問題の複雑な再定式化や特殊なアルゴリズムを必要とせず、基礎となる最適化問題が非凸であったり、離散変数を伴っていたりしても計算的な扱いやすさを実現できる。 (ii) プロファイル尤度の近似を可能にし、IOパラメータ推定値の不確実性を定量化できる。提案手法を3つの計算ケーススタディに適用し, 凸非線形から非凸混合整数非線形計画まで, 様々なクラスの順最適化問題を扱う。大規模な計算結果はBO4IOの有効性とロバスト性を示し,小規模でノイズの多いデータセットから未知のモデルパラメータを正確に推定できることを示す。さらに,提案するプロファイル尤度分析は,パラメータ推定値の信頼区間を良好に近似し,未知パラメータの識別可能性を評価するのに有効であることが確認された。 This work addresses data-driven inverse optimization (IO), where the goal is to estimate unknown parameters in an optimization model from observed decisions that can be assumed to be optimal or near-optimal solutions to the optimization problem. The IO problem is commonly formulated as a large-scale bilevel program that is notoriously difficult to solve. Deviating from traditional exact solution methods, we propose a derivative-free optimization approach based on Bayesian optimization, which we call BO4IO, to solve general IO problems. We treat the IO loss function as a black box and approximate it with a Gaussian process model. Using the predicted posterior function, an acquisition function is minimized at each iteration to query new candidate solutions and sequentially converge to the optimal parameter estimates. 
The main advantages of using Bayesian optimization for IO are two-fold: (i) it circumvents the need for complex reformulations of the bilevel program or specialized algorithms and can hence enable computational tractability even when the underlying optimization problem is nonconvex or involves discrete variables, and (ii) it allows approximations of the profile likelihood, which provide uncertainty quantification on the IO parameter estimates. We apply the proposed method to three computational case studies, covering different classes of forward optimization problems ranging from convex nonlinear to nonconvex mixed-integer nonlinear programs. Our extensive computational results demonstrate the efficacy and robustness of BO4IO in accurately estimating unknown model parameters from small and noisy datasets. In addition, the proposed profile likelihood analysis has proven to be effective in providing good approximations of the confidence intervals on the parameter estimates and assessing the identifiability of the unknown parameters.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
# 個人化フェデレーション学習のための分散型有向協調 Decentralized Directed Collaboration for Personalized Federated Learning ( http://arxiv.org/abs/2405.17876v1 ) ライセンス: Link先を確認	Yingqi Liu, Yifan Shi, Qinglun Li, Baoyuan Wu, Xueqian Wang, Li Shen,	(参考訳) パーソナライズド・フェデレーテッド・ラーニング(PFL)は、各クライアントに最適なパーソナライズされたモデルを見つけるために提案されている。サーバベースFLにおける中央の障害点と通信ボトルネックを回避するため,P2P方式で分散モデル訓練を行う分散型パーソナライズド・フェデレーテッド・ラーニング(DPFL)に焦点を当てる。 DPFLにおけるパーソナライズの研究の多くは、無向かつ対称なトポロジに基づいているが、データ、計算、通信資源の不均一性はパーソナライズされたモデルに大きなばらつきをもたらし、無向の集約では最適でないパーソナライズ性能と保証されない収束につながる。これらの問題に対処するために、確率的勾配プッシュと部分モデルのパーソナライズを組み込んだ有向協調型DPFLフレームワークを提案し、それを \textbf{D}ecentralized \textbf{Fed}erated \textbf{P}artial \textbf{G}radient \textbf{P}ush (\textbf{DFedPGP}) と呼ぶ。局所解をカスタマイズするために、現代のディープモデルにおける線形分類器をパーソナライズし、完全に分散された方法でコンセンサス表現を学習する。クライアントは、有向かつ非対称なトポロジに基づいて隣接ノードの一部とのみ勾配を共有し、これによりリソース効率のための柔軟な選択とより良い収束が保証される。理論的には、提案したDFedPGPは一般的な非凸設定において$\mathcal{O}(\frac{1}{\sqrt{T}})$の優れた収束率を達成することを示し、クライアント間の接続が密であるほど収束が加速することを証明する。提案手法は,データと計算の不均一性の両方のシナリオにおいて,最先端(SOTA)の精度を達成し,有向協調と部分勾配プッシュの効率性を実証する。 Personalized Federated Learning (PFL) is proposed to find the best personalized model for each client. To avoid the central point of failure and the communication bottleneck of server-based FL, we concentrate on Decentralized Personalized Federated Learning (DPFL), which performs distributed model training in a Peer-to-Peer (P2P) manner. Most personalized works in DPFL are based on undirected and symmetric topologies; however, heterogeneity in data, computation, and communication resources results in large variances in the personalized models, which leads undirected aggregation to suboptimal personalized performance and unguaranteed convergence. To address these issues, we propose a directed collaboration DPFL framework that incorporates stochastic gradient push and partial model personalization, called \textbf{D}ecentralized \textbf{Fed}erated \textbf{P}artial \textbf{G}radient \textbf{P}ush (\textbf{DFedPGP}). 
It personalizes the linear classifier in the modern deep model to customize the local solution and learns a consensus representation in a fully decentralized manner. Clients only share gradients with a subset of neighbors based on the directed and asymmetric topologies, which guarantees flexible choices for resource efficiency and better convergence. Theoretically, we show that the proposed DFedPGP achieves a superior convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ in the general non-convex setting, and prove that tighter connectivity among clients speeds up convergence. The proposed method achieves state-of-the-art (SOTA) accuracy in both data and computation heterogeneity scenarios, demonstrating the efficiency of the directed collaboration and partial gradient push.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
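Stochastic gradient push builds on the classic push-sum averaging protocol. The protocol reaches the network-wide average over a directed, asymmetric graph, where plain neighbor averaging would be biased. The sketch below shows only that averaging primitive on scalars. In DFedPGP as described above, only the shared representation would take part, interleaved with local SGD steps, while each client's personalized linear head stays local. The function name and example graph are hypothetical.

```python
def push_sum_average(values, out_neighbors, rounds):
    """Push-sum averaging over a directed graph.

    values:        initial scalar held by each node (e.g. one shared parameter)
    out_neighbors: out_neighbors[i] lists the nodes that node i sends to; the graph
                   need not be symmetric but must be strongly connected
    Each node splits its (value, weight) pair evenly between itself and its
    out-neighbours, so the mixing matrix is column-stochastic; the ratio
    value / weight converges to the network-wide average at every node.
    """
    n = len(values)
    x = [float(v) for v in values]
    w = [1.0] * n
    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            targets = [i] + list(out_neighbors[i])
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += x[i] * share
                new_w[j] += w[i] * share
        x, w = new_x, new_w
    return [xi / wi for xi, wi in zip(x, w)]


# Asymmetric directed graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
estimates = push_sum_average([0.0, 3.0, 6.0], [[1, 2], [2], [0]], rounds=200)
```

In this graph the nodes have unequal in- and out-degrees, so the raw `x` values drift away from the true mean. The extra weight `w` is the bookkeeping that corrects this bias, and every ratio approaches 3.0.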
# スパース性とハイブリッド性に着想を得た医用診断のための視覚パラメータ効率的ファインチューニング Sparsity- and Hybridity-Inspired Visual Parameter-Efficient Fine-Tuning for Medical Diagnosis ( http://arxiv.org/abs/2405.17877v1 ) ライセンス: Link先を確認	Mingyuan Liu, Lu Xu, Shengnan Liu, Jicong Zhang,	(参考訳) 大規模ビジョンモデル(LVM)の成功には膨大なデータ量が伴うが、医療診断ではそのようなデータは法外に高価である。これに対応するため、近年の取り組みでは、残りの重みを凍結したまま少数の重みを訓練するパラメータ効率的ファインチューニング(PEFT)を活用している。しかし、それらは通常、タスクの違いにかかわらず、LVMの同じ位置に訓練可能な重みをヒューリスティックに割り当てるため、医療診断のような専門的応用には最適ではない。そこで、診断を対象としたファインチューニングにおけるスパース性とハイブリッド性の性質を統計的に明らかにする。すなわち、ごく一部の重要な重みが性能に大きく影響し、これらの重要な重みは、タスク特化部分とタスク非依存部分の両方を含むハイブリッドである。これに基づき、新しいSH-PEFT(Sparsity- and Hybridity-inspired Parameter Efficient Fine-Tuning)を提案する。SH-PEFTは、タスク特化戦略とタスク非依存戦略を組み合わせて推定した重要度に基づき、少数の重みを選択して訓練する。異なるモダリティの6つの医療データセットで検証し、LVMを医療診断に転移する際に、精度の点でSH-PEFTが最先端の性能を達成することを実証した。約0.01%の重みを調整するだけで、フルモデルファインチューニングを上回る。さらに、SH-PEFTは特定の医療タスク向けに意図的に最適化された他のモデルと同等の性能を達成する。広範な実験により各設計の有効性が示され、大規模モデルの転移が医療診断において大きな可能性を持つことが明らかになった。 The success of Large Vision Models (LVMs) is accompanied by vast data volumes, which are prohibitively expensive in medical diagnosis. To address this, recent efforts exploit Parameter-Efficient Fine-Tuning (PEFT), which trains a small number of weights while freezing the rest. However, they typically assign trainable weights to the same positions in LVMs in a heuristic manner, regardless of task differences, making them suboptimal for professional applications like medical diagnosis. To address this, we statistically reveal the nature of sparsity and hybridity during diagnostic-targeted fine-tuning, i.e., a small portion of key weights significantly impacts performance, and these key weights are hybrid, including both task-specific and task-agnostic parts. Based on this, we propose a novel Sparsity- and Hybridity-inspired Parameter Efficient Fine-Tuning (SH-PEFT). It selects and trains a small portion of weights based on their importance, which is innovatively estimated by hybridizing both task-specific and task-agnostic strategies. Validating it on six medical datasets of different modalities, we demonstrate that SH-PEFT achieves state-of-the-art performance in transferring LVMs to medical diagnosis in 
terms of accuracy. By tuning around 0.01% of the weights, it outperforms full model fine-tuning. Moreover, SH-PEFT also achieves comparable performance to other models deliberately optimized for specific medical tasks. Extensive experiments demonstrate the effectiveness of each design and reveal that large model transfer holds great potential in medical diagnosis.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
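The selection step can be pictured in three moves: score every weight with a task-specific criterion and a task-agnostic criterion, blend the two normalized scores, and mark only the top fraction as trainable. The abstract does not specify the estimators, so several choices in this sketch are hypothetical. These are |weight × gradient| as the task-specific score, |weight| as the task-agnostic score, the mixing factor `alpha`, and all function names.

```python
def normalise(scores):
    """Min-max scale scores to [0, 1]; constant inputs map to all zeros."""
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [0.0 if span == 0 else (s - lo) / span for s in scores]


def select_trainable(weights, grads, fraction=0.01, alpha=0.5):
    """Return a boolean mask marking the top `fraction` of weights as trainable.

    task-specific score: |w * g| (sensitivity of the loss on the target task)
    task-agnostic score: |w|     (magnitude, independent of the task)
    Both estimators and the linear blend with `alpha` are hypothetical choices.
    """
    task_specific = normalise([abs(w * g) for w, g in zip(weights, grads)])
    task_agnostic = normalise([abs(w) for w in weights])
    hybrid = [alpha * a + (1 - alpha) * b for a, b in zip(task_specific, task_agnostic)]
    k = max(1, round(fraction * len(weights)))
    top = sorted(range(len(weights)), key=lambda i: hybrid[i], reverse=True)[:k]
    mask = [False] * len(weights)
    for i in top:
        mask[i] = True
    return mask


mask = select_trainable([0.1, 2.0, -3.0, 0.5], [10.0, 0.0, 0.1, 0.2], fraction=0.5)
```

In the toy call, the first weight is selected because its gradient signal is strong even though its magnitude is small. The third weight is selected mainly because of its magnitude. This is the kind of mixed set that a single criterion would miss.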
# アンラーニングモデル評価のための情報理論的指標 An Information Theoretic Metric for Evaluating Unlearning Models ( http://arxiv.org/abs/2405.17878v1 ) ライセンス: Link先を確認	Dongjae Jeon, Wonje Jeung, Taeheon Kim, Albert No, Jonghyun Choi,	(参考訳) マシンアンラーニング(MU)は、訓練済みモデルから「忘却データ(forgetting data)」サンプルの情報を削除することで、プライバシの問題に対処する。通常、MU手法の評価では、メンバーシップ推論攻撃(MIA)や精度測定などの指標を用いて、アンラーニング済みモデルを、忘却データを除いてスクラッチから再訓練したモデルと比較する。これらの評価は、アンラーニング済みモデルと再訓練モデルの出力ロジットが類似していれば、アンラーニング済みモデルがデータの忘却に成功したと暗黙的に仮定している。ここでは、この仮定が妥当かどうかを問う。具体的には,新しいマスク蒸留技術を用いて,元のモデルの最後の層のみを訓練し,残りを固定する簡単な実験を行った。驚くべきことに、最後の層を変更するだけで、モデルがサンプルやクラスのアンラーニングに成功していないにもかかわらず、既存の評価指標では良好な結果が得られる。 MU手法をより適切に評価するために,中間特徴量に残る忘却データサンプルに関する情報を相互情報量を用いて定量化する指標、情報差分指数(IDI)を提案する。 IDIは、DNNの内部構造を効率的に解析することにより、MU手法の包括的な評価を提供する。本指標は大規模データセットにスケーラブルで、さまざまなモデルアーキテクチャに適用可能である。さらに,中間特徴を効果的にアンラーニングする、シンプルなコントラストベースの手法COLapse-and-Align (COLA)を提案する。 Machine unlearning (MU) addresses privacy concerns by removing information about "forgetting data" samples from trained models. Typically, evaluating MU methods involves comparing unlearned models to those retrained from scratch without the forgetting data, using metrics such as membership inference attacks (MIA) and accuracy measurements. These evaluations implicitly assume that if the output logits of the unlearned and retrained models are similar, the unlearned model has successfully forgotten the data. Here, we question whether this assumption is valid. In particular, we conduct a simple experiment of training only the last layer of a given original model using a novel masked-distillation technique while keeping the rest fixed. Surprisingly, simply altering the last layer yields favorable outcomes in the existing evaluation metrics, while the model does not successfully unlearn the samples or classes. To better evaluate MU methods, we propose a metric that quantifies the residual information about forgetting data samples in intermediate features using mutual information, called information difference index or IDI for short. 
The IDI provides a comprehensive evaluation of MU methods by efficiently analyzing the internal structure of DNNs. Our metric is scalable to large datasets and adaptable to various model architectures. Additionally, we present COLapse-and-Align (COLA), a simple contrastive-based method that effectively unlearns intermediate features.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
# 軌道アグリゲーション木を用いた拡散プランナーにおける確率的リスクへの抵抗 Resisting Stochastic Risks in Diffusion Planners with the Trajectory Aggregation Tree ( http://arxiv.org/abs/2405.17879v1 ) ライセンス: Link先を確認	Lang Feng, Pengjie Gu, Bo An, Gang Pan,	(参考訳) 拡散プランナーは、非自己回帰的な計画生成により、長期ホライズンかつスパース報酬のタスクを扱う上で有望であることが示されている。しかし、実現不可能な軌道を生成するという固有の確率的リスクは、その信頼性と安定性に重大な課題をもたらす。拡散プランナにおけるこの問題に対処するための新しい手法として, 軌道アグリゲーション木 (Trajectory Aggregation Tree, TAT) を導入する。生の軌跡予測のみに依存する従来の手法と比較して、TATは過去および現在の軌跡からの情報を集約し、動的な木構造を形成する。各軌道は分岐として概念化され、個々の状態はノードとして扱われる。構造が新しい軌道の統合によって進化するにつれて、信頼できない状態は周辺化され、最も影響の大きいノードが意思決定において優先される。 TATは、拡散プランナーの元々のトレーニングやサンプリングパイプラインを変更することなくデプロイでき、学習不要ですぐに導入できる解決策となる。我々は,TATの有効性を裏付ける理論的解析と実証的証拠の両方を提供する。本研究の結果は,信頼性の低い軌道によるリスクに抵抗する顕著な能力を示し,$100\%$のタスクで拡散プランナの性能向上を保証し,サンプル品質に対して相当の許容マージンを示し,それにより$3\times$以上の高速化での計画を可能にする。 Diffusion planners have shown promise in handling long-horizon and sparse-reward tasks due to their non-autoregressive plan generation. However, their inherent stochastic risk of generating infeasible trajectories presents significant challenges to their reliability and stability. We introduce a novel approach, the Trajectory Aggregation Tree (TAT), to address this issue in diffusion planners. Compared to prior methods that rely solely on raw trajectory predictions, TAT aggregates information from both historical and current trajectories, forming a dynamic tree-like structure. Each trajectory is conceptualized as a branch and individual states as nodes. As the structure evolves with the integration of new trajectories, unreliable states are marginalized, and the most impactful nodes are prioritized for decision-making. TAT can be deployed without modifying the original training and sampling pipelines of diffusion planners, making it a training-free, ready-to-deploy solution. We provide both theoretical analysis and empirical evidence to support TAT's effectiveness. 
Our results highlight its remarkable ability to resist the risk from unreliable trajectories, guarantee performance gains for diffusion planners in $100\%$ of tasks, and exhibit an appreciable tolerance margin for sample quality, thereby enabling planning with more than $3\times$ acceleration.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
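The abstract describes TAT only at a high level. Trajectories are merged into a tree, states become nodes, unreliable states are marginalized, and well-supported nodes drive decisions. The toy sketch below is our own reading of that idea, not the authors' algorithm. It makes three assumptions. Continuous states are merged by snapping them to a grid (`step`). Older trajectories are down-weighted by a hypothetical `decay`. The next decision is simply the first node with the largest aggregated weight, so an isolated outlier trajectory loses to a consistent majority.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    weight: float = 0.0
    children: dict = field(default_factory=dict)


def quantize(state, step):
    """Map a continuous state to a hashable grid cell so nearby states share a node."""
    return tuple(round(x / step) for x in state)


class TrajectoryTree:
    """Toy weighted prefix tree: trajectories are branches, quantized states are nodes."""

    def __init__(self, step=0.5, decay=0.9):
        self.root = Node()
        self.step = step    # hypothetical grid size used to merge nearby states
        self.decay = decay  # hypothetical down-weighting of older trajectories

    def _age(self, node):
        for child in node.children.values():
            child.weight *= self.decay
            self._age(child)

    def add(self, trajectory):
        self._age(self.root)
        node = self.root
        for state in trajectory:
            key = quantize(state, self.step)
            node = node.children.setdefault(key, Node())
            node.weight += 1.0

    def best_next(self):
        """First state of the most heavily supported branch; weakly supported branches lose."""
        key, _ = max(self.root.children.items(), key=lambda kv: kv[1].weight)
        return key


tree = TrajectoryTree()
tree.add([(0.1, 0.0), (1.0, 0.1)])
tree.add([(0.0, 0.1), (0.9, 0.0)])
tree.add([(5.0, 5.0), (6.0, 6.0)])  # an outlier trajectory
print(tree.best_next())  # (0, 0): the grid cell shared by the two consistent trajectories
```

A real planner would aggregate the sampled trajectories at every replanning step and act on the chosen node. The grid and decay here only stand in for whatever similarity and weighting the paper actually uses.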
# 拡散リジェクションサンプリング Diffusion Rejection Sampling ( http://arxiv.org/abs/2405.17880v1 ) ライセンス: Link先を確認	Byeonghu Na, Yeongmin Kim, Minsang Park, Donghyeok Shin, Wanmo Kang, Il-Chul Moon,	(参考訳) 強力な事前学習済み拡散モデルの最近の進歩は、十分に訓練された拡散モデルの下でサンプリング性能を改善する手法の開発を促している。本稿では,DiffRS (Diffusion Rejection Sampling) を導入する。これは,各タイムステップでサンプリング遷移カーネルを真のカーネルに整合させるリジェクションサンプリング方式を用いる。提案手法は, 各中間タイムステップでサンプルの品質を評価し, サンプルに応じて異なる労力で改良する機構とみなすことができる。理論的解析により、DiffRSは事前訓練されたモデルと比較してサンプリング誤差についてより厳しい上界を達成できることが示されている。実験により,ベンチマークデータセット上でのDiffRSの最先端性能と,高速拡散サンプラーおよび大規模テキスト・画像拡散モデルに対するDiffRSの有効性を実証した。私たちのコードは https://github.com/aailabkaist/DiffRS で公開されています。 Recent advances in powerful pre-trained diffusion models encourage the development of methods to improve the sampling performance under well-trained diffusion models. This paper introduces Diffusion Rejection Sampling (DiffRS), which uses a rejection sampling scheme that aligns the sampling transition kernels with the true ones at each timestep. The proposed method can be viewed as a mechanism that evaluates the quality of samples at each intermediate timestep and refines them with varying effort depending on the sample. Theoretical analysis shows that DiffRS can achieve a tighter bound on sampling error compared to pre-trained models. Empirical results demonstrate the state-of-the-art performance of DiffRS on the benchmark datasets and the effectiveness of DiffRS for fast diffusion samplers and large-scale text-to-image diffusion models. Our code is available at https://github.com/aailabkaist/DiffRS.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
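DiffRS builds on rejection sampling, applied at each diffusion timestep. The sketch below is only the generic textbook rejection sampler, not DiffRS itself, and is meant to show the accept/reject mechanism. It assumes a target density `p` and a proposal density `q` with `p(x) <= M * q(x)`. A proposed `x` is accepted with probability `p(x) / (M * q(x))`. Here the target is the triangular density `2x` on [0, 1] and the proposal is uniform, so `M = 2`.

```python
import random


def rejection_sample(target_pdf, proposal_sample, proposal_pdf, bound, n, rng):
    """Draw n samples from target_pdf using a proposal q with target_pdf <= bound * q."""
    samples = []
    while len(samples) < n:
        x = proposal_sample(rng)
        # accept with probability p(x) / (M * q(x)); otherwise propose again
        if rng.random() < target_pdf(x) / (bound * proposal_pdf(x)):
            samples.append(x)
    return samples


rng = random.Random(0)
draws = rejection_sample(
    target_pdf=lambda x: 2.0 * x,          # triangular density on [0, 1]
    proposal_sample=lambda r: r.random(),  # uniform proposal on [0, 1)
    proposal_pdf=lambda x: 1.0,
    bound=2.0,                             # max of p(x) / q(x) on [0, 1]
    n=20000,
    rng=rng,
)
print(sum(draws) / len(draws))  # close to the true mean 2/3
```

The expected acceptance rate is `1 / M`. Per the abstract, DiffRS uses this kind of accept/reject decision to spend more effort on low-quality intermediate samples. How it estimates the acceptance ratio for a diffusion transition kernel is described in the paper, not here.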
# Crystal-LSBO:潜在空間ベイズ最適化によるデノボ結晶の自動設計 Crystal-LSBO: Automated Design of De Novo Crystals with Latent Space Bayesian Optimization ( http://arxiv.org/abs/2405.17881v1 ) ライセンス: Link先を確認	Onur Boyar, Yanheng Gu, Yuji Tanaka, Shunsuke Tonogai, Tomoya Itakura, Ichiro Takeuchi,	(参考訳) 結晶構造の生成的モデリングは、これらのモデルが新しい結晶を探索し発見する能力を制限する入力データの複雑さによって著しく困難である。この複雑さはしばしば、デ・ノボの設計手法を既知の結晶の小さな摂動に限定し、高度な最適化手法の効果的な適用を妨げている。そのような最適化手法の1つである潜在空間ベイズ最適化(LSBO)は、特に変分オートエンコーダ(VAE)と組み合わせることで、様々な領域にまたがる新しいオブジェクトを発見する上で有望な結果を示している。 LSBOの可能性と革新的結晶発見への重要なニーズを認識し、LSBOフレームワーク内の探索性を高めるために特別に調整された結晶のデノボ設計フレームワークであるCrystal-LSBOを紹介する。Crystal-LSBOは複数のVAEを用いており、それぞれが格子、座標、化学元素といった結晶構造の異なる側面に特化しており、これらの成分を一貫した出力に合成する統合モデルによって統括されている。このセットアップは、学習プロセスの合理化だけでなく、各モデルの学習タスクの複雑さの低下により探索可能な潜在空間も生成し、LSBOアプローチの運用を可能にする。本研究は,ド・ノボ結晶設計におけるLSBOの利用の先駆者であり,主に形成エネルギー値を対象とした最適化タスクによってその有効性を示すものである。本研究は, ド・ノボ結晶発見の新たな視点として, 提案手法の有効性を強調した。 Generative modeling of crystal structures is significantly challenged by the complexity of input data, which constrains the ability of these models to explore and discover novel crystals. This complexity often confines de novo design methodologies to merely small perturbations of known crystals and hampers the effective application of advanced optimization techniques. One such optimization technique, Latent Space Bayesian Optimization (LSBO), has demonstrated promising results in uncovering novel objects across various domains, especially when combined with Variational Autoencoders (VAEs). Recognizing LSBO's potential and the critical need for innovative crystal discovery, we introduce Crystal-LSBO, a de novo design framework for crystals specifically tailored to enhance explorability within LSBO frameworks. Crystal-LSBO employs multiple VAEs, each dedicated to a distinct aspect of crystal structure: lattice, coordinates, and chemical elements, orchestrated by an integrative model that synthesizes these components into a cohesive output. 
This setup not only streamlines the learning process but also produces explorable latent spaces thanks to the decreased complexity of the learning task for each model, enabling LSBO approaches to operate. Our study pioneers the use of LSBO for de novo crystal design, demonstrating its efficacy through optimization tasks focused mainly on formation energy values. Our results highlight the effectiveness of our methodology, offering a new perspective for de novo crystal discovery.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
# 平均報酬レストレスバンディットにおける指数的漸近最適性はいつ達成できるか? When is exponential asymptotic optimality achievable in average-reward restless bandits? ( http://arxiv.org/abs/2405.17882v1 ) ライセンス: Link先を確認	Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang,	(参考訳) 離散時間・無限ホライズンの平均報酬レストレスバンディット問題を考える。我々は、アームの2つの動的な部分集合を維持する新しい方策を提案する。一方の部分集合のアームは、ほぼ最適な状態分布を持ち、最適局所制御ルーチンに従って行動を選択する;もう一方の部分集合のアームは、最適な状態分布に向けて駆動され、徐々に第1の部分集合にマージされる。我々は, 非周期的ユニチェーン, 非縮退性, 局所安定性という緩やかな仮定の下で, 本方策が$N$本腕の問題に対して$O(\exp(-C N))$の最適性ギャップで漸近的に最適であることを示す。我々の方策は、上記の検証が容易な仮定の下で指数的漸近最適性を達成する最初のものであり、一方で先行研究は強いグローバル・アトラクタの仮定を必要とするか、あるいは$O(1/\sqrt{N})$の最適性ギャップしか達成しない。さらに、仮定を大幅に弱める上での本質的な障害についても論じる。特に、局所安定性が指数的漸近最適性にとって本質的であることを示す下界を証明する。 We consider the discrete-time infinite-horizon average-reward restless bandit problem. We propose a novel policy that maintains two dynamic subsets of arms: one subset of arms has a nearly optimal state distribution and takes actions according to an Optimal Local Control routine; the other subset of arms is driven towards the optimal state distribution and gradually merged into the first subset. We show that our policy is asymptotically optimal with an $O(\exp(-C N))$ optimality gap for an $N$-armed problem, under the mild assumptions of aperiodic-unichain, non-degeneracy, and local stability. Our policy is the first to achieve exponential asymptotic optimality under the above set of easy-to-verify assumptions, whereas prior work either requires a strong Global Attractor assumption or only achieves an $O(1/\sqrt{N})$ optimality gap. We further discuss the fundamental obstacles in significantly weakening our assumptions. In particular, we prove a lower bound showing that local stability is fundamental for exponential asymptotic optimality.	翻訳日:2024-05-29 20:07:07 公開日:2024-05-28
# グラフォモーターと手書き障害評価尺度(GHDRS: Graphomotor and Handwriting Disabilities Rating Scale):複雑で客観的な評価に向けて Graphomotor and Handwriting Disabilities Rating Scale (GHDRS): towards complex and objective assessment ( http://arxiv.org/abs/2405.17886v1 ) ライセンス: Link先を確認	Jiri Mekyska, Katarina Safarova, Tomas Urbanek, Jirina Bednarova, Vojtech Zvoncak, Jana Marie Havigerova, Lukas Cunek, Zoltan Galaz, Jan Mucha, Christine Klauszova, Marcos Faundez-Zanuy, Miguel A. Ferrer, Moises Diaz,	(参考訳) グラフォモーター障害と手書き障害(それぞれGDとHD)は、子供の生活の質を著しく低下させる可能性がある。効果的な改善支援は適切な診断に依存するが、GDとHDの診断と評価への現在のアプローチにはいくつかの限界と知識ギャップがある(例えば、主観的であり、特定の症状の特定を容易にしない)。本研究の目的は、専門家がGDとHDの客観的かつ包括的なコンピュータ支援診断・評価を行えるようにする新たな尺度、GHDRS(Graphomotor and Handwriting Disabilities Rating Scale)を導入することである。この尺度は、描画/手書きのプロセス/成果物に関連する17の症状の定量化をサポートする。GHDRS設計の方法論全体は、他の言語に適応できるよう、最大限に透明化されている。 Graphomotor and handwriting disabilities (GD and HD, respectively) can significantly reduce children's quality of life. Effective remediation depends on proper diagnosis; however, current approaches to the diagnosis and assessment of GD and HD have several limitations and knowledge gaps: for example, they are subjective and do not facilitate identification of specific manifestations. The aim of this work is to introduce a new scale, the Graphomotor and Handwriting Disabilities Rating Scale (GHDRS), that will enable experts to perform objective and complex computer-aided diagnosis and assessment of GD and HD. The scale supports quantification of 17 manifestations associated with the process/product of drawing/handwriting. The whole methodology of GHDRS design is made maximally transparent so that it can be adapted for other languages.	翻訳日:2024-05-29 19:57:23 公開日:2024-05-28
# LLMアライメントのためのSFTの改善 Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment ( http://arxiv.org/abs/2405.17888v1 ) ライセンス: Link先を確認	Jiaxiang Li, Siliang Zeng, Hoi-To Wai, Chenliang Li, Alfredo Garcia, Mingyi Hong,	(参考訳) 人間の選好や価値観に整合させることは、現代の基盤モデルにとって重要な要件である。 Reinforcement Learning from Human Feedback (RLHF)のような最先端技術は、多くの場合2つの段階から構成される。 1) 教師付き微調整(SFT)では,人間の実演データから学習することでモデルを微調整する。 2) 選好学習では,選好データを用いて報酬モデルを学習し,その報酬モデルを強化学習(RL)ステップで用いてモデルを微調整する。このような報酬モデルは人間の選好の代理となり、RLステップをモデルの品質向上に導く上で重要である。本研究では、SFTの段階も報酬モデルの学習から大きな恩恵を受けると主張する。人間の実演データを教師付き学習で直接利用する代わりに、逆強化学習(IRL)技術を用いて(明示的または暗黙的に)報酬モデルを構築しながら方策モデルを学習することを提案する。このアプローチは、実装が効率的であるだけでなく、好ましい継続と好ましくない継続を区別する能力を促進する新しいSFTアルゴリズムをもたらす。さらに,提案したIRLベースのアプローチと,最近提案されたある種の自己対戦(self-play)アプローチとの関連を明らかにし,自己対戦が報酬学習エージェントをモデル化する特殊なケースであることを示す。理論的には,提案アルゴリズムがIRL問題の定常解に収束することを示す。実験的には,提案手法を用いて1Bと7Bのモデルを整合させ,報酬ベンチマークモデルとHuggingFace Open LLM Leaderboardで評価する。提案手法は既存のSFT手法に比べて大幅な性能向上を示す。これらの結果は、アライメントプロセス全体を通して報酬学習を明示的または暗黙的に活用することが有益であることを示唆している。 Aligning with human preferences and values is an important requirement for contemporary foundation models. State-of-the-art techniques such as Reinforcement Learning from Human Feedback (RLHF) often consist of two stages: 1) supervised fine-tuning (SFT), where the model is fine-tuned by learning from human demonstration data; 2) preference learning, where preference data is used to learn a reward model, which is in turn used by a reinforcement learning (RL) step to fine-tune the model. Such a reward model serves as a proxy for human preference, and it is critical for guiding the RL step towards improving model quality. In this work, we argue that the SFT stage also benefits significantly from learning a reward model. Instead of using the human demonstration data directly via supervised learning, we propose to leverage an Inverse Reinforcement Learning (IRL) technique to (explicitly or implicitly) build a reward model while learning the policy model. 
This approach leads to new SFT algorithms that are not only efficient to implement, but also promote the ability to distinguish between preferred and non-preferred continuations. Moreover, we identify a connection between the proposed IRL-based approach and a recently proposed self-play approach, and show that self-play is a special case of modeling a reward-learning agent. Theoretically, we show that the proposed algorithms converge to the stationary solutions of the IRL problem. Empirically, we align 1B and 7B models using the proposed methods and evaluate them on a reward benchmark model and the HuggingFace Open LLM Leaderboard. The proposed methods show significant performance improvement over existing SFT approaches. Our results indicate that it is beneficial to explicitly or implicitly leverage reward learning throughout the entire alignment process.	翻訳日:2024-05-29 19:57:23 公開日:2024-05-28
# 構造的優先度生成による離散拡散モデルの改善 Improving Discrete Diffusion Models via Structured Preferential Generation ( http://arxiv.org/abs/2405.17889v1 ) ライセンス: Link先を確認	Severi Rissanen, Markus Heinonen, Arno Solin,	(参考訳) 画像と音声の領域では、拡散モデルは印象的な性能を示している。しかしながら、言語などの離散データ型へのそれらの適用は、自己回帰生成モデルと比較すると、しばしば準最適である。本稿では,テキスト中の単語などの個別のカテゴリにおける固有情報階層を活用する構造的前方処理を導入することで,離散拡散モデルの改善に挑戦する。提案手法は, 生成過程に偏り, 先行するカテゴリを生成させ, 結果としてtext8データセット上でのログライクなスコアが顕著に向上する。この研究は、離散拡散モデルにおけるさらなる進歩の道を開くもので、性能が大幅に向上する可能性がある。 In the domains of image and audio, diffusion models have shown impressive performance. However, their application to discrete data types, such as language, has often been suboptimal compared to autoregressive generative models. This paper tackles the challenge of improving discrete diffusion models by introducing a structured forward process that leverages the inherent information hierarchy in discrete categories, such as words in text. Our approach biases the generative process to produce certain categories before others, resulting in a notable improvement in log-likelihood scores on the text8 dataset. This work paves the way for more advances in discrete diffusion models with potentially significant enhancements in performance.	翻訳日:2024-05-29 19:57:23 公開日:2024-05-28
# SLMRec: シークエンシャルレコメンデーションのための小さな言語モデル SLMRec: Empowering Small Language Models for Sequential Recommendation ( http://arxiv.org/abs/2405.17890v1 ) ライセンス: Link先を確認	Wujiang Xu, Zujie Liang, Jiaojiao Han, Xuying Ning, Wenfang Lin, Linxun Chen, Feng Wei, Yongfeng Zhang,	(参考訳) シーケンシャルレコメンデーション(SR)タスクは、過去のインタラクションを踏まえて、ユーザが次に対話する可能性の高いアイテムを予測することを含む。 SRモデルは、ユーザの行動のシーケンスを調べ、より複雑な行動パターンと時間的ダイナミクスを識別する。近年の研究では、LLMが、逐次推薦を言語モデリングとして捉える形であれ、ユーザ表現のバックボーンとして機能する形であれ、シーケンシャルレコメンデーションシステムに大きな影響を与えていることが示されている。これらの手法は優れた性能をもたらすが、特にシーケンシャルレコメンデーションの場面において、大規模言語モデルが必要かどうか、またどの程度の規模が必要かについての証拠は乏しい。一方、LLMの巨大なサイズのため、毎日何十億ものトラフィックログを処理する必要がある実世界のプラットフォームにLLMベースのモデルを適用するのは非効率で非現実的である。本稿では,大規模な産業データセットで広範な実験を行い,LLMの深さが与える影響を調べる。驚いたことに、LLMのほとんどの中間層は冗長であることがわかった。この知見に基づき、シンプルかつ効果的な知識蒸留法を採用したSR向けの小さな言語モデルSLMRecを提案する。さらに、SLMRecは量子化やプルーニングといった他の訓練後の効率化技術と直交しており、それらと組み合わせて利用することができる。総合的な実験結果から,提案したSLMRecモデルは,LLMベースの推薦モデルのパラメータのわずか13%で最高の性能を達成し,同時に訓練時間と推論時間でそれぞれ最大6.6倍と8.0倍の高速化を達成することが示された。 The sequential recommendation (SR) task involves predicting the next item a user is likely to interact with, given their past interactions. SR models examine the sequence of a user's actions to discern more complex behavioral patterns and temporal dynamics. Recent research demonstrates the great impact of LLMs on sequential recommendation systems, either viewing sequential recommendation as language modeling or serving as the backbone for user representation. Although these methods deliver outstanding performance, there is scant evidence that a large language model is necessary, or of how large it needs to be, especially in sequential recommendation scenarios. Meanwhile, due to the huge size of LLMs, it is inefficient and impractical to apply an LLM-based model in real-world platforms that often need to process billions of traffic logs daily. In this paper, we explore the influence of LLMs' depth by conducting extensive experiments on large-scale industry datasets. 
Surprisingly, we discover that most intermediate layers of LLMs are redundant. Motivated by this insight, we empower small language models for SR with SLMRec, which adopts a simple yet effective knowledge distillation method. Moreover, SLMRec is orthogonal to other post-training efficiency techniques, such as quantization and pruning, so they can be used in combination with it. Comprehensive experimental results illustrate that the proposed SLMRec model attains the best performance using only 13% of the parameters found in LLM-based recommendation models, while simultaneously achieving up to 6.6x and 8.0x speedups in training and inference time, respectively.	翻訳日:2024-05-29 19:57:23 公開日:2024-05-28
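SLMRec is described as using "a simple yet effective knowledge distillation method". The abstract does not say whether it distills output scores or intermediate hidden representations. The sketch below therefore shows only one common, generic form of distillation, and is not SLMRec's exact objective. The student's temperature-softened output distribution is pulled toward the teacher's by minimizing KL(teacher || student), scaled by `T**2`. The teacher scores are hypothetical next-item logits.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T**2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_teacher, p_student))
    return kl * temperature ** 2


teacher = [2.0, 0.5, -1.0]  # hypothetical next-item scores from a large teacher model
print(distillation_loss([2.0, 0.5, -1.0], teacher))      # 0.0: student matches teacher
print(distillation_loss([-1.0, 0.5, 2.0], teacher) > 0)  # True: mismatched ranking is penalized
```

The `T**2` factor is the usual convention. It keeps gradient magnitudes roughly comparable when the temperature changes.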
# 高品位な動的シーン再構成のための洗練された3次元ガウス表現 A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction ( http://arxiv.org/abs/2405.17891v1 ) ライセンス: Link先を確認	Bin Zhang, Bi Zeng, Zexin Peng,	(参考訳) 近年,Neural Radiance Fields (NeRF) はその暗黙的表現によって3次元(3D)再構成に革命をもたらした。 NeRF上に構築された3D Gaussian Splatting (3D-GS)は、ニューラルネットワークの暗黙の表現から脱却し、代わりにガウス型の分布を持つ点群としてシーンを直接表現している。このシフトにより、放射輝度場のレンダリング品質と速度が著しく向上したが、必然的にメモリ使用量が大幅に増加した。さらに、3D-GSで動的シーンを効果的にレンダリングすることは、差し迫った課題となっている。これらの問題に対処するため,本稿では,高品質な動的シーン再構成のための洗練された3次元ガウス表現を提案する。まず,変形可能な多層パーセプトロン(MLP)ネットワークを用いてガウス点の動的オフセットを捕捉し,ハッシュ符号化と小型MLPによって点の色特徴を表現することでストレージ要件を削減する。次に,学習可能なデノイジングマスクとデノイジング損失を導入し,シーンからノイズ点を除去し,3次元ガウスモデルをさらに圧縮する。最後に、点の運動ノイズを、静的制約と運動の一貫性制約によって緩和する。実験の結果,本手法はレンダリング品質と速度で既存手法を上回るとともに,3D-GSに関連するメモリ使用量を大幅に削減し,新規視点合成や動的マッピングといった様々なタスクに非常に適していることがわかった。 In recent years, Neural Radiance Fields (NeRF) has revolutionized three-dimensional (3D) reconstruction with its implicit representation. Building upon NeRF, 3D Gaussian Splatting (3D-GS) has departed from the implicit representation of neural networks and instead directly represents scenes as point clouds with Gaussian-shaped distributions. While this shift has notably improved the rendering quality and speed of radiance fields, it has inevitably led to a significant increase in memory usage. Additionally, effectively rendering dynamic scenes in 3D-GS has emerged as a pressing challenge. To address these concerns, this paper proposes a refined 3D Gaussian representation for high-quality dynamic scene reconstruction. Firstly, we use a deformable multi-layer perceptron (MLP) network to capture the dynamic offset of Gaussian points and express the color features of points through hash encoding and a tiny MLP to reduce storage requirements. Subsequently, we introduce a learnable denoising mask coupled with a denoising loss to eliminate noise points from the scene, thereby further compressing the 3D Gaussian model. Finally, motion noise of points is mitigated through static constraints and motion consistency constraints. 
Experimental results demonstrate that our method surpasses existing approaches in rendering quality and speed, while significantly reducing the memory usage associated with 3D-GS, making it highly suitable for various tasks such as novel view synthesis and dynamic mapping.	翻訳日:2024-05-29 19:57:23 公開日:2024-05-28
# LLMを用いた算術的推論:プロログ生成と置換 Arithmetic Reasoning with LLM: Prolog Generation & Permutation ( http://arxiv.org/abs/2405.17893v1 ) ライセンス: Link先を確認	Xiaocheng Yang, Bingsen Chen, Yik-Cheung Tam,	(参考訳) 小学校数学の問題を解くための大規模言語モデル (LLM) の指導は、Chain of Thought (CoT) を用いて大きな成功を収めた。しかし、CoT の手法は LLM に頼り、カスケード計算の誤りを生じやすい算術演算列を生成する。我々は, LLM が数学問題記述から述語を抽出し, 記号式を生成することに集中して, 基礎となる計算を外部コードインタープリタで行うことを仮定する。数学的な問題を解くために,LLMを用いてPrologプログラムを生成する。実験結果から,GSM8KベンチマークにおけるPrologに基づく算術問題解は,3つの異なるLLM間でCoT生成に優れることがわかった。さらに,Prologにおける述語や記号式の不感な順序付けを考慮し,データ拡張によるより堅牢なLLMトレーニングのために,基底真理述語をパーミュレートすることを提案する。 Instructing large language models (LLMs) to solve elementary school math problems has shown great success using Chain of Thought (CoT). However, the CoT approach relies on an LLM to generate a sequence of arithmetic calculations which can be prone to cascaded calculation errors. We hypothesize that an LLM should focus on extracting predicates and generating symbolic formulas from the math problem description so that the underlying calculation can be done via an external code interpreter. We investigate using LLM to generate Prolog programs to solve mathematical questions. Experimental results show that our Prolog-based arithmetic problem-solving outperforms CoT generation in the GSM8K benchmark across three distinct LLMs. In addition, given the insensitive ordering of predicates and symbolic formulas in Prolog, we propose to permute the ground truth predicates for more robust LLM training via data augmentation.	翻訳日:2024-05-29 19:57:23 公開日:2024-05-28
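The abstract proposes permuting the ground-truth predicates as data augmentation. The sketch below is a minimal version of that idea under several assumptions. The helper `permute_clauses` and the example program are hypothetical. Orderings are sampled with a seeded shuffle instead of enumerating all n! permutations, so the helper also works for longer programs. Reordering clauses of *different* predicates does not change what a query proves. Reordering clauses of the *same* predicate can change Prolog's solution order or termination, so this assumes simple declarative programs like the one shown.

```python
import random


def permute_clauses(clauses, k, seed=0, max_tries=1000):
    """Return up to k distinct random orderings of a list of Prolog clauses."""
    rng = random.Random(seed)
    variants, seen = [], set()
    for _ in range(max_tries):
        order = clauses[:]
        rng.shuffle(order)
        key = tuple(order)
        if key not in seen:  # keep only orderings we have not produced yet
            seen.add(key)
            variants.append(order)
            if len(variants) == k:
                break
    return variants


# hypothetical program for "Tom has 5 apples and buys 3 more; how many does he have?"
program = [
    "start(5).",
    "bought(3).",
    "solve(Total) :- start(A), bought(B), Total is A + B.",
]
for variant in permute_clauses(program, k=3):
    print("\n".join(variant), end="\n\n")
```

Each variant can be paired with the same math question as an extra training target. This teaches the model that any of these clause orders is an acceptable answer.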
# 大規模ビジョンランゲージモデルに対するホワイトボックスマルチモーダルジェイルブレイク White-box Multimodal Jailbreaks Against Large Vision-Language Models ( http://arxiv.org/abs/2405.17894v1 ) ライセンス: Link先を確認	Ruofan Wang, Xingjun Ma, Hanxu Zhou, Chuanjun Ji, Guangnan Ye, Yu-Gang Jiang,	(参考訳) 近年のVLM(Large Vision-Language Models)の進歩は、様々なマルチモーダルタスクにおいて、その優位性を裏付けている。しかしながら、VLMの対角的堅牢性は十分には研究されていない。既存の手法は主に、テキストベースの攻撃に対して固有のレジリエンスを仮定しながら、画像を摂動する一元対向攻撃によるロバスト性を評価する。既存の攻撃とは違って、本研究では、テキストと画像のモダリティの両方を共同攻撃して、VLM内のより広範な脆弱性を悪用する、より包括的な戦略を提案する。具体的には,モデルに高毒性の肯定応答を生成するための2つの最適化手法を提案する。本手法は, テキスト入力がない場合に, 有害な応答を多様に生成するために, 逆画像プレフィックスをランダムノイズから最適化することから始める。その後、逆境テキスト接尾辞と逆境画像接頭辞とを一体化して、様々な有害な指示に対する肯定応答を誘発する確率を最大化する。検出された逆画像プレフィックスとテキスト接尾辞は総称してユニバーサルマスターキー(UMK)と表記される。様々な悪意のあるクエリに統合されると、UMKはVLMのアライメント防御を回避し、jailbreaksとして知られる好ましくないコンテンツを生成する。実験の結果,我々のユニバーサルアタック戦略は,96%の成功率でMiniGPT-4を効果的に脱獄し,VLMの脆弱性と新たなアライメント戦略の必要性を強調した。 Recent advancements in Large Vision-Language Models (VLMs) have underscored their superiority in various multimodal tasks. However, the adversarial robustness of VLMs has not been fully explored. Existing methods mainly assess robustness through unimodal adversarial attacks that perturb images, while assuming inherent resilience against text-based attacks. Different from existing attacks, in this work we propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within VLMs. Specifically, we propose a dual optimization objective aimed at guiding the model to generate affirmative responses with high toxicity. Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input, thus imbuing the image with toxic semantics. Subsequently, an adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions. 
The discovered adversarial image prefix and text suffix are collectively denoted as a Universal Master Key (UMK). When integrated into various malicious queries, UMK can circumvent the alignment defenses of VLMs and lead to the generation of objectionable content, known as jailbreaks. The experimental results demonstrate that our universal attack strategy can effectively jailbreak MiniGPT-4 with a 96% success rate, highlighting the vulnerability of VLMs and the urgent need for new alignment strategies.	翻訳日:2024-05-29 19:57:23 公開日:2024-05-28
# $C^2M^3$: Cycle-Consistent Multi-Model Merging $C^2M^3$: Cycle-Consistent Multi-Model Merging ( http://arxiv.org/abs/2405.17897v1 ) ライセンス: Link先を確認	Donato Crisostomi, Marco Fumero, Daniele Baieri, Florian Bernard, Emanuele Rodolà,	(参考訳) 本稿では,重み空間でニューラルネットワークをマージする新しいデータフリー手法を提案する。本手法は,既存の多くの研究と異なり,全層にわたってグローバルにネットワークニューロンの置換を最適化する。これにより、$N \geq 3$個のモデルをマージする際に置換のサイクル一貫性を強制でき、経路に沿って誤差を蓄積することなく置換の巡回的な合成を計算できる。このような制約の必要性を定性的・定量的に動機付け、さまざまなアーキテクチャやデータセットにまたがるシナリオにおいて、モデルの集合をマージする際の利点を示す。最後に、活性化の再正規化と組み合わせると、本手法がこのタスクで最良の結果をもたらすことを示す。 In this paper, we present a novel data-free method for merging neural networks in weight space. Unlike most existing works, our method optimizes the permutations of network neurons globally across all layers. This allows us to enforce cycle consistency of the permutations when merging $N \geq 3$ models, so that circular compositions of permutations can be computed without accumulating error along the path. We qualitatively and quantitatively motivate the need for such a constraint, showing its benefits when merging sets of models in scenarios spanning varying architectures and datasets. We finally show that, when coupled with activation renormalization, our approach yields the best results in the task.	翻訳日:2024-05-29 19:57:23 公開日:2024-05-28
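Cycle consistency here means the following. Composing the neuron permutations A→B→C→A should give back the identity. One standard way to guarantee this, assumed here and not taken from the paper's optimization procedure, is to map every model's neurons to a shared reference ("universe") index. Each pairwise permutation is then derived by passing through that reference. The toy sketch uses a single layer with four neurons, and its per-model maps are hypothetical, chosen by hand instead of optimized. Any cycle then composes to the identity by construction.

```python
def invert(perm):
    """Inverse permutation: if perm[i] == j then inv[j] == i."""
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return inv


def compose(first, second):
    """Apply `first`, then `second`: i -> second[first[i]]."""
    return [second[j] for j in first]


def pairwise(to_universe_a, to_universe_b):
    """Neuron map A -> B obtained by going through the shared universe."""
    return compose(to_universe_a, invert(to_universe_b))


# hypothetical per-model maps from neuron index to universe index (one layer, 4 neurons)
to_u = {
    "A": [2, 0, 1, 3],
    "B": [0, 3, 2, 1],
    "C": [1, 2, 3, 0],
}
p_ab = pairwise(to_u["A"], to_u["B"])
p_bc = pairwise(to_u["B"], to_u["C"])
p_ca = pairwise(to_u["C"], to_u["A"])
cycle = compose(compose(p_ab, p_bc), p_ca)
print(cycle)  # [0, 1, 2, 3]: the identity, so no error accumulates around the cycle
```

By contrast, if `p_ab`, `p_bc` and `p_ca` were each estimated independently (for example by separate pairwise matchings), nothing would force their composition to be the identity.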
# FlashST: トラフィック予測のためのシンプルで普遍的なプロンプトチューニングフレームワーク FlashST: A Simple and Universal Prompt-Tuning Framework for Traffic Prediction ( http://arxiv.org/abs/2405.17898v1 ) ライセンス: Link先を確認	Zhonghang Li, Lianghao Xia, Yong Xu, Chao Huang,	(参考訳) 交通予測の目的は、空間と時間の両方を考慮して交通パターンのダイナミクスを正確に予測し、分析することである。しかし、既存のモデルでは、トレーニング分布と大きく異なるテストデータに直面すると、一般化に苦慮しているため、分布シフトの存在はこの分野において大きな課題となる。本稿では,多様な下流データセットの特徴に事前学習したモデルを適応させ,多様なトラフィック予測シナリオにおける一般化を改善する,シンプルで汎用的な時空間プロンプトチューニングフレームワークFlashSTを提案する。特に、FlashSTフレームワークは、文脈内学習のための軽量な時空間プロンプトネットワークを使用し、時空間不変の知識を捉え、多様なシナリオへの効果的な適応を容易にする。さらに,事前学習データと下流データの分布を整合させる分布マッピング機構を導入し,時空間予測における効果的な知識伝達を容易にする。多様な都市データセットを用いた時空間予測タスクにおけるFlashSTの有効性を実証的に評価した。コードは https://github.com/HKUDS/FlashST で入手できる。 The objective of traffic prediction is to accurately forecast and analyze the dynamics of transportation patterns, considering both space and time. However, the presence of distribution shift poses a significant challenge in this field, as existing models struggle to generalize well when faced with test data that significantly differs from the training distribution. To tackle this issue, this paper introduces FlashST, a simple and universal spatio-temporal prompt-tuning framework that adapts pre-trained models to the specific characteristics of diverse downstream datasets, improving generalization in diverse traffic prediction scenarios. Specifically, the FlashST framework employs a lightweight spatio-temporal prompt network for in-context learning, capturing spatio-temporal invariant knowledge and facilitating effective adaptation to diverse scenarios. Additionally, we incorporate a distribution mapping mechanism to align the data distributions of pre-training and downstream data, facilitating effective knowledge transfer in spatio-temporal forecasting. Empirical evaluations demonstrate the effectiveness of FlashST across different spatio-temporal prediction tasks using diverse urban datasets. Code is available at https://github.com/HKUDS/FlashST.	翻訳日:2024-05-29 19:57:23 公開日:2024-05-28
# 感情的クロスモーダル融合とクラス間コントラスト学習による会話における感情認識の向上 Enhancing Emotion Recognition in Conversation through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning ( http://arxiv.org/abs/2405.17900v1 ) ライセンス: Link先を確認	Haoxiang Shi, Xulong Zhang, Ning Cheng, Yong Zhang, Jun Yu, Jing Xiao, Jianzong Wang,	(参考訳) 会話における感情認識(ERC)の目的は、文脈情報に基づいて発話の感情カテゴリーを特定することである。従来のERC手法は、クロスモーダル融合に単純な結合を用い、モダリティ間の情報の違いを無視していたため、モデルがモダリティ固有の感情情報に集中できなかった。同時に、モダリティ間で共有される情報は感情予測のために処理されておらず、情報冗長性の問題が生じていた。これらの制限を克服するために,ベクトル結合に基づくクロスモーダル融合感情予測ネットワークを提案する。ネットワークは主に、結合ベクトルに基づくマルチモーダル特徴融合ステージと、融合特徴に基づく感情分類ステージの2段階からなる。さらに,感情ラベルに基づく教師付きクラス間コントラスト学習モジュールを設計する。実験の結果,提案手法の有効性が確認され,IEMOCAPおよびMELDデータセット上で優れた性能を示した。 The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual information. Previous ERC methods relied on simple connections for cross-modal fusion and ignored the information differences between modalities, resulting in the model being unable to focus on modality-specific emotional information. At the same time, the information shared between modalities was not processed for emotion prediction, leading to an information redundancy problem. To overcome these limitations, we propose a cross-modal fusion emotion prediction network based on vector connections. The network mainly includes two stages: the multi-modal feature fusion stage based on connection vectors and the emotion classification stage based on fused features. Furthermore, we design a supervised inter-class contrastive learning module based on emotion labels. Experimental results confirm the effectiveness of the proposed method, demonstrating excellent performance on the IEMOCAP and MELD datasets.	翻訳日:2024-05-29 19:57:23 公開日:2024-05-28
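The abstract mentions a supervised contrastive module driven by emotion labels but gives no formula. As a stand-in, the sketch below implements the standard supervised contrastive (SupCon-style) loss, which may differ from the paper's inter-class module. For each anchor, embeddings sharing its label are positives and all other embeddings are in the denominator. The loss is lower when same-label utterances cluster together. The 2-D "utterance embeddings" and labels are hypothetical.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.5):
    """Supervised contrastive loss: pull same-label pairs together, push other labels apart."""
    z = [normalize(e) for e in embeddings]
    n = len(z)
    # cosine similarity divided by the temperature
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)] for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        log_denom = math.log(sum(math.exp(sim[i][a]) for a in range(n) if a != i))
        total += -sum(sim[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # hypothetical utterance embeddings
clustered = supcon_loss(emb, [0, 0, 1, 1])  # labels agree with the geometry
mixed = supcon_loss(emb, [0, 1, 0, 1])      # labels disagree with the geometry
print(clustered < mixed)  # True
```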
# リモートセンシングにおける視覚トランスフォーマーの近赤外・低ランク適応 Near-Infrared and Low-Rank Adaptation of Vision Transformers in Remote Sensing ( http://arxiv.org/abs/2405.17901v1 ) ライセンス: Link先を確認	Irem Ulku, O. Ozgur Tanriover, Erdem Akagündüz,	(参考訳) 植物の健康状態は、近赤外線反射率(NIR)を測定するマルチスペクトルセンサーを用いて動的に監視することができる。このような可能性にもかかわらず、高解像度のNIR画像の取得と注釈付けは、ディープニューラルネットワークのトレーニングにおいて重要な課題となっている。通常、RGBドメインで事前学習された大規模ネットワークが、赤外線画像で微調整される。この方法では,RGB画像とNIR画像の視覚特性の違いにより,ドメインシフトの問題が生じる。微調整の代替として,ローランク適応 (LoRA) と呼ばれる手法は,元のネットワーク重みを凍結したままランク分解行列を最適化することで,より効率的な学習を可能にする。しかし、リモートセンシング画像に対する既存のパラメータ効率の高い適応戦略はRGB画像に焦点を当てており、NIR領域におけるドメインシフトの問題を見落としている。そこで本研究では,RGB領域で事前学習した視覚トランスフォーマー(ViT)バックボーンを,低ランク適応によりNIR領域の下流タスクに用いることの潜在的な利点を検討する。広範な実験により、事前学習済みViTバックボーンにLoRAを用いることで、NIR画像に適用した下流タスクで最高の性能が得られることが示された。 Plant health can be monitored dynamically using multispectral sensors that measure Near-Infrared reflectance (NIR). Despite this potential, obtaining and annotating high-resolution NIR images poses a significant challenge for training deep neural networks. Typically, large networks pre-trained on the RGB domain are fine-tuned on infrared images. This practice introduces a domain shift issue because of the differing visual traits between RGB and NIR images. As an alternative to fine-tuning, a method called low-rank adaptation (LoRA) enables more efficient training by optimizing rank-decomposition matrices while keeping the original network weights frozen. However, existing parameter-efficient adaptation strategies for remote sensing images focus on RGB images and overlook domain shift issues in the NIR domain. Therefore, this study investigates the potential benefits of using vision transformer (ViT) backbones pre-trained in the RGB domain, with low-rank adaptation for downstream tasks in the NIR domain. Extensive experiments demonstrate that employing LoRA with pre-trained ViT backbones yields the best performance for downstream tasks applied to NIR images.	翻訳日:2024-05-29 19:57:23 公開日:2024-05-28
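The abstract summarizes LoRA as "optimizing rank-decomposition matrices while keeping the original network weights frozen". Concretely, a frozen weight matrix `W` (d_out × d_in) is augmented with a trainable low-rank product `B @ A`, where `A` is r × d_in and `B` is d_out × r. The layer then computes `W x + (alpha / r) * B A x`. The pure-Python sketch below shows one such linear layer. The zero initialization of `B`, so that the adapted layer starts out identical to the pretrained one, follows common LoRA practice. The weights and `alpha` are illustrative assumptions, not values from this paper.

```python
import random


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trainable."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weights (e.g. from an RGB-trained ViT)
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero, so the adapter is initially a no-op
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * u for y, u in zip(base, update)]

    def trainable_parameters(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


layer = LoRALinear([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], rank=1)
print(layer.forward([1.0, 0.0, -1.0]))  # [-2.0, -2.0], same as the frozen layer
print(layer.trainable_parameters())     # 5 = r * (d_in + d_out), versus 6 frozen weights
```

The savings only show at realistic sizes. For a 768 × 768 ViT projection with r = 8, the adapter trains 8 × (768 + 768) = 12,288 values instead of 589,824.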

Title

Authors

Abstract

論文公表日・翻訳日

# 道徳的判断をテキストから復号する:パイロット研究

Decoding moral judgement from text: a pilot study ( http://arxiv.org/abs/2407.00039v1 )

ライセンス: Link先を確認

Diana E. Gherman, Thorsten O. Zander,

(参考訳) 道徳的判断は、認知的・感情的な次元に関わる複雑な人間の反応である。道徳的神経相関のいくつかは知られているが、単一の裁判所レベルで道徳的違反を検出することができるかどうかはまだ分かっていない。本稿では,受動的脳-コンピュータインタフェースを用いたテキスト刺激による道徳的判断復号の実現可能性について検討する。効果的な道徳的判断の誘因として,テキスト刺激提示に先立って映像音声による情緒的プライミングを用い,そのテキストを道徳的エージェントに属性付けする。以上の結果から,道徳的整合性と不整合状態との信頼性の高い分類を実現するためには,さらなる努力が必要であることが示唆された。我々は、中立と道徳的にチャージされた試験の精度の良い結果を得る。本研究では,ニューロアダプティブな人間-コンピュータインタラクションと,より人間互換な大規模言語モデル(LLM)への道を開くことを目的とする。

Moral judgement is a complex human reaction that engages cognitive and emotional dimensions. While some of the morality neural correlates are known, it is currently unclear if we can detect moral violation at a single-trial level. In a pilot study, here we explore the feasibility of moral judgement decoding from text stimuli with passive brain-computer interfaces. For effective moral judgement elicitation, we use video-audio affective priming prior to text stimuli presentation and attribute the text to moral agents. Our results show that further efforts are necessary to achieve reliable classification between moral congruency vs. incongruency states. We obtain good accuracy results for neutral vs. morally-charged trials. With this research, we try to pave the way towards neuroadaptive human-computer interaction and more human-compatible large language models (LLMs)

翻訳日:2024-07-22 22:38:24 公開日:2024-05-28

# UCAVドッグファイトにおけるDRLを用いた空気圧決定法の検討

Interpretable DRL-based Maneuver Decision of UCAV Dogfight ( http://arxiv.org/abs/2407.01571v1 )

ライセンス: Link先を確認

Haoran Han, Jian Cheng, Maolong Lv,

(参考訳) 本稿では, 深部強化学習(DRL)が高次機動決定に寄与する3層無人戦闘機(UCAV)のドッグファイトフレームを提案する。 4チャンネルの低レベル制御法が最初に構築され、続いて8つの基本的な飛行操作(BFM)を含む図書館が設けられている。 UCAVドッグファイトにおけるBFM選択にはDouble Deep Q Network (DDQN) が適用される。シミュレーションの結果, エージェントはDT戦略に対して85.75%の勝利率を達成でき, 各種の未確認相手に対面した場合, 肯定的な結果が得られることがわかった。提案した枠組みに基づいて,DRLをベースとしたドッグファイトの解釈性が有意に向上した。ヨーヨーを行い、旋回率を調整し、操作性を高める。ディーブ・アンド・チェイス」の行動の創発は、エージェントが相手の欠点を利用する新しい戦術を生成できることを示している。

This paper proposes a three-layer unmanned combat aerial vehicle (UCAV) dogfight framework in which deep reinforcement learning (DRL) is responsible for high-level maneuver decisions. A four-channel low-level control law is first constructed, followed by a library containing eight basic flight maneuvers (BFMs). A double deep Q network (DDQN) is applied for BFM selection in the UCAV dogfight, where the opponent strategy during the training process is constructed with DT. Our simulation results show that the agent achieves a win rate of 85.75% against the DT strategy and positive results when facing various unseen opponents. Based on the proposed framework, the interpretability of DRL-based dogfighting is significantly improved. The agent performs yo-yo maneuvers to adjust its turn rate and gain higher maneuverability. The emergence of "Dive and Chase" behavior also indicates that the agent can generate a novel tactic that exploits a weakness of its opponent.

翻訳日:2024-07-22 22:18:55 公開日:2024-05-28
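The abstract names Double DQN (DDQN) as the BFM selector but gives no network or reward details. The following minimal sketch fills that gap with the standard Double DQN bootstrap target for one transition. The online network picks the next action and the target network scores it. The eight-action Q-value lists, the reward and the discount factor are hypothetical numbers, not values from the paper.

```python
from typing import Sequence


def double_dqn_target(reward: float,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      gamma: float = 0.99,
                      done: bool = False) -> float:
    """Bootstrapped Double DQN target for a single transition.

    The online network selects the greedy next action; the target network
    evaluates it. This decoupling is what distinguishes Double DQN from DQN.
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    # Select: greedy next action (here, a hypothetical BFM index) under the online network
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Evaluate: value of that action under the target network
    return reward + gamma * next_q_target[best_action]


# Hypothetical next-state Q-values for 8 discrete actions (one per BFM)
q_online_next = [0.1, 0.5, 0.2, 0.0, 0.3, 0.4, 0.1, 0.2]
q_target_next = [0.2, 0.3, 0.9, 0.1, 0.0, 0.0, 0.0, 0.0]
y = double_dqn_target(1.0, q_online_next, q_target_next, gamma=0.9)
```

With these values the online network picks action 1, so the target is 1.0 + 0.9 * 0.3 = 1.27. A vanilla DQN target would instead take the target network's maximum (0.9) and give 1.81. That gap is the overestimation Double DQN is designed to reduce.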

# 深層学習によるインド株式市場のセクター収益性を探る

Exploring Sectoral Profitability in the Indian Stock Market Using Deep Learning ( http://arxiv.org/abs/2407.01572v1 )

ライセンス: Link先を確認

Jaydip Sen, Hetvi Waghela, Sneha Rakshit,

(参考訳) 本稿では、深層学習のLong Short-Term Memory(LSTM)モデルを用いた株価の高精度な予測と、そのポートフォリオ設計への応用について検討する。株価の予測は不可能であるとする効率的市場仮説にもかかわらず、最近の研究は高度なアルゴリズムと予測モデルの可能性を示している。本研究は株価予測手法に関する既存の文献に基づき、機械学習と深層学習のアプローチへの移行を強調する。インドのNSEに上場する18セクター180銘柄の過去の株価を用いて、LSTMモデルが将来の価格を予測する。これらの予測は各銘柄の売買判断を導き、セクターの収益性の分析に用いられる。本研究の主な貢献は3つある:頑健なポートフォリオ設計のための最適化LSTMモデルの導入、売買取引へのLSTM予測の活用、そしてセクターの収益性とボラティリティに関する知見である。結果は、株価を正確に予測し投資判断に役立てる上でのLSTMモデルの有効性を示している。セクターの収益性と予測精度を比較することで、インドの現在の金融市場のダイナミクスに関する有益な知見を提供する。

This paper explores using a deep learning Long Short-Term Memory (LSTM) model for accurate stock price prediction and its implications for portfolio design. Despite the efficient market hypothesis suggesting that predicting stock prices is impossible, recent research has shown the potential of advanced algorithms and predictive models. The study builds upon existing literature on stock price prediction methods, emphasizing the shift toward machine learning and deep learning approaches. Using historical stock prices of 180 stocks across 18 sectors listed on the NSE, India, the LSTM model predicts future prices. These predictions guide buy/sell decisions for each stock and analyze sector profitability. The study's main contributions are threefold: introducing an optimized LSTM model for robust portfolio design, utilizing LSTM predictions for buy/sell transactions, and providing insights into sector profitability and volatility. Results demonstrate the efficacy of the LSTM model in accurately predicting stock prices and informing investment decisions. By comparing sector profitability and prediction accuracy, the work provides valuable insights into the dynamics of the current financial markets in India.

翻訳日:2024-07-22 22:18:55 公開日:2024-05-28

# 軌道最適化のためのモデルベース拡散

Model-Based Diffusion for Trajectory Optimization ( http://arxiv.org/abs/2407.01573v1 )

ライセンス: Link先を確認

Chaoyi Pan, Zeji Yi, Guanya Shi, Guannan Qu,

(参考訳) 拡散モデルの最近の進歩は、反復的な精緻化プロセスを通じて複雑な分布から高忠実度のサンプルを生成する強力な能力を示している。運動計画と制御において拡散モデルは経験的に成功しているものの、これらの手法はモデルフリーであるため、容易に利用可能なモデル情報を活用できず、訓練データを超えた新しいシナリオ(例えば、異なるダイナミクスを持つ新しいロボット)への汎化が制限される。本研究では、拡散過程を用いてデータなしで軌道最適化(TO)問題を解く最適化手法として、モデルベース拡散(MBD)を導入する。鍵となるアイデアは、TO問題のモデル情報を活用してスコア関数を明示的に計算することであり、これが本手法をモデルベース拡散と呼ぶ理由である。さらに、MBDは外部データを必要としないが、様々な品質のデータと自然に統合して拡散過程を誘導できる。また、MBDがサンプリングベース最適化と興味深い関係を持つことも明らかにする。実験的評価により、MBDは接触の多い困難なタスクにおいて、最先端の強化学習およびサンプリングベースのTO手法を上回ることが示された。さらに、データと統合できることによってMBDの汎用性と実用性が高まり、不完全で実行不可能なデータ(例えば、高次元ヒューマノイドの部分状態デモンストレーション)であっても活用でき、標準的な拡散モデルの範囲を超える。

Recent advances in diffusion models have demonstrated their strong capabilities in generating high-fidelity samples from complex distributions through an iterative refinement process. Despite the empirical success of diffusion models in motion planning and control, the model-free nature of these methods does not leverage readily available model information and limits their generalization to new scenarios beyond the training data (e.g., new robots with different dynamics). In this work, we introduce Model-Based Diffusion (MBD), an optimization approach using the diffusion process to solve trajectory optimization (TO) problems without data. The key idea is to explicitly compute the score function by leveraging the model information in TO problems, which is why we refer to our approach as model-based diffusion. Moreover, although MBD does not require external data, it can be naturally integrated with data of diverse qualities to steer the diffusion process. We also reveal that MBD has interesting connections to sampling-based optimization. Empirical evaluations show that MBD outperforms state-of-the-art reinforcement learning and sampling-based TO methods in challenging contact-rich tasks. Additionally, MBD's ability to integrate with data enhances its versatility and practical applicability, even with imperfect and infeasible data (e.g., partial-state demonstrations for high-dimensional humanoids), beyond the scope of standard diffusion models.

翻訳日:2024-07-22 22:18:55 公開日:2024-05-28

# サーモグラフィー技術の探求:顔検出・認識・感情のための総合的な顔データセット

Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion ( http://arxiv.org/abs/2407.09494v1 )

ライセンス: Link先を確認

Mohamed Fawzi Abdelshafie Abuhussein, Ashraf Darwish, Aboul Ella Hassanien,

(参考訳) このデータセットは、顔検出、顔認識、感情分析のためにUNI-T UTi165Aカメラで撮影された6823枚の熱画像を含む。内訳は、感情(幸せ、悲しみ、怒り、自然、驚き)を表す顔認識画像2485枚、顔認識用の画像2054枚、顔検出用の画像2284枚である。このデータセットは、さまざまな条件、カラーパレット、撮影角度、ズームレベルをカバーしており、温度範囲は-10°Cから400°C、解像度は19,200ピクセルである。熱画像技術の発展、アルゴリズム開発の支援、異なるパレットにわたる顔認識のベンチマークのための貴重なリソースとなる。さらに、顔の動き認識に寄与し、コンピュータビジョン、心理学、神経科学における学際的なコラボレーションを促進する。このデータセットはサーマル顔検出・認識研究の透明性を促進し、セキュリティ、ヘルスケア、人間とコンピュータのインタラクションへの応用が見込まれる。

This dataset includes 6823 thermal images captured using a UNI-T UTi165A camera for face detection, recognition, and emotion analysis. It consists of 2485 facial recognition images depicting emotions (happy, sad, angry, natural, surprised), 2054 images for face recognition, and 2284 images for face detection. The dataset covers various conditions, color palettes, shooting angles, and zoom levels, with a temperature range of -10°C to 400°C and a resolution of 19,200 pixels. It serves as a valuable resource for advancing thermal imaging technology, aiding in algorithm development, and benchmarking for facial recognition across different palettes. Additionally, it contributes to facial motion recognition, fostering interdisciplinary collaboration in computer vision, psychology, and neuroscience. The dataset promotes transparency in thermal face detection and recognition research, with applications in security, healthcare, and human-computer interaction.

翻訳日:2024-07-22 13:38:25 公開日:2024-05-28

# Interpret3C: 個別の特徴選択による解釈可能な学生クラスタリング

Interpret3C: Interpretable Student Clustering Through Individualized Feature Selection ( http://arxiv.org/abs/2407.11979v1 )

ライセンス: Link先を確認

Isadora Salles, Paola Mejia-Domenzain, Vinitra Swamy, Julian Blackwell, Tanja Käser,

(参考訳) 教育におけるクラスタリング、特にMOOCのような大規模オンライン環境でのクラスタリングは、多様な学生のニーズを理解し、それに適応するために不可欠である。しかし、クラスタリングの有効性はその解釈可能性に依存しており、高次元データでは解釈が困難になる。既存のクラスタリング手法は、特徴の重要度における個人差を無視し、均質化された特徴セットに依存していることが多い。このギャップに対処するために、解釈可能なニューラルネットワーク(NN)を教師なし学習の文脈に組み込んだ新しいクラスタリングパイプラインであるInterpret3C(Interpretable Conditional Computation Clustering)を導入する。本手法は、NNにおける適応ゲーティングを利用して、学生ごとに特徴を選択する。次に、学生ごとに最も関連性の高い特徴を用いてクラスタリングを行い、クラスタの関連性と解釈可能性を高める。Interpret3Cを用いて、5,000人以上の学生が受講するMOOCにおいて、個々の特徴の重要度を考慮した行動クラスタを分析した。本研究は、スケーラブルで頑健なクラスタリング手法と、個々の学生の違いを尊重し高次元データの解釈可能性を向上させる教育分野のケーススタディを提供することで、この分野に貢献する。

Clustering in education, particularly in large-scale online environments like MOOCs, is essential for understanding and adapting to diverse student needs. However, the effectiveness of clustering depends on its interpretability, which becomes challenging with high-dimensional data. Existing clustering approaches often neglect individual differences in feature importance and rely on a homogenized feature set. Addressing this gap, we introduce Interpret3C (Interpretable Conditional Computation Clustering), a novel clustering pipeline that incorporates interpretable neural networks (NNs) in an unsupervised learning context. This method leverages adaptive gating in NNs to select features for each student. Then, clustering is performed using the most relevant features per student, enhancing clusters' relevance and interpretability. We use Interpret3C to analyze the behavioral clusters considering individual feature importances in a MOOC with over 5,000 students. This research contributes to the field by offering a scalable, robust clustering methodology and an educational case study that respects individual student differences and improves interpretability for high-dimensional data.

翻訳日:2024-07-22 11:50:18 公開日:2024-05-28

# 家庭レベルの貧困度測定におけるブースティングアルゴリズムの利用:フィリピンにおける世帯の富の五分位を予測・分類する機械学習アプローチ

Use of Boosting Algorithms in Household-Level Poverty Measurement: A Machine Learning Approach to Predict and Classify Household Wealth Quintiles in the Philippines ( http://arxiv.org/abs/2407.13061v1 )

ライセンス: Link先を確認

Erika Lynet Salvador,

(参考訳) 本研究では、アダプティブブースティング(AdaBoost)、キャットブースティング(CatBoost)、グラディエントブースティングマシン(GBM)、ライトグラディエントブースティングマシン(LightGBM)、エクストリームグラディエントブースティング(XGBoost)の5つのブースティングアルゴリズムを用いて、フィリピンの貧困レベルを予測する機械学習モデルの有効性を評価した。CatBoostが最も優れたモデルとなり、正解率、適合率、再現率、F1スコアのすべてで91%と最高スコアを記録し、XGBoostとGBMがそれぞれ89%と88%でこれに続いた。さらに、これらのモデルの計算効率を調べ、実世界での応用に不可欠な要素である学習時間、テスト速度、モデルサイズのバランスを分析した。学習時間は長いものの、CatBoostはテスト時に高い効率を示した。これらの結果は、機械学習が貧困予測と的を絞った政策介入の策定に役立ちうることを示している。今後の研究では、これらのモデルの予測精度と政策上の有用性を高めるため、より多様なデータを取り入れることに注力すべきである。

This study assessed the effectiveness of machine learning models in predicting poverty levels in the Philippines using five boosting algorithms: Adaptive Boosting (AdaBoost), CatBoosting (CatBoost), Gradient Boosting Machine (GBM), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost). CatBoost emerged as the superior model and achieved the highest scores across accuracy, precision, recall, and F1-score at 91 percent, while XGBoost and GBM followed closely with 89 percent and 88 percent respectively. Additionally, the research examined the computational efficiency of these models to analyze the balance between training time, testing speed, and model size, factors crucial for real-world applications. Despite its longer training duration, CatBoost demonstrated high testing efficiency. These results indicate that machine learning can aid in poverty prediction and in the development of targeted policy interventions. Future studies should focus on incorporating a wider variety of data to enhance the predictive accuracy and policy utility of these models.

翻訳日:2024-07-22 08:18:00 公開日:2024-05-28
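The four reported metrics can be computed from true and predicted quintile labels. The following sketch assumes macro averaging over classes, because the abstract does not state the averaging scheme, and it uses hypothetical labels. It is not the study's evaluation code.

```python
from statistics import fmean


def classification_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.

    Macro averaging (unweighted mean of per-class scores) is an assumption;
    the abstract does not say how per-class scores were combined.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"accuracy": accuracy, "precision": fmean(precisions),
            "recall": fmean(recalls), "f1": fmean(f1s)}


# Hypothetical wealth-quintile labels (1 = poorest quintile)
scores = classification_scores([1, 1, 2, 2, 3], [1, 2, 2, 2, 3])
```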

# 個人にとっての恣意性のコスト:モデル多重性の法的・技術的課題の検討

The Cost of Arbitrariness for Individuals: Examining the Legal and Technical Challenges of Model Multiplicity ( http://arxiv.org/abs/2407.13070v1 )

ライセンス: Link先を確認

Prakhar Ganesh, Ihsan Ibrahim Daldaban, Ignacio Cofone, Golnoosh Farnadi,

(参考訳) モデル多重性(model multiplicity)とは、学習された関数が異なるにもかかわらず複数のモデルが同等の性能を達成する現象であり、モデル選択に恣意性をもたらす。この恣意性は期待値の上では取るに足らないように見えるかもしれないが、個人への影響は深刻になりうる。本稿では、最終予測を超えた恣意性の影響、保護されたグループに属する個人に対する恣意性の格差、単一のアルゴリズムシステムの恣意性が様々な文脈にまたがる独占を生み出すことに伴う課題など、多重性から生じる個人にとっての様々な懸念を検討する。これらの懸念について実証的な調査と法的観点からの包括的な分析の両方を行い、カナダの差別禁止法においてこれらの問題がどのように捉えられているかを論じる。最後に、法的要件を満たす上でのモデル多重性の現状における技術的課題と、現行法とモデル選択における恣意性の含意との間の法的ギャップについて論じ、両分野にとっての今後の研究の方向性を示す。

Model multiplicity, the phenomenon where multiple models achieve similar performance despite different underlying learned functions, introduces arbitrariness in model selection. While this arbitrariness may seem inconsequential in expectation, its impact on individuals can be severe. This paper explores various individual concerns stemming from multiplicity, including the effects of arbitrariness beyond final predictions, disparate arbitrariness for individuals belonging to protected groups, and the challenges associated with the arbitrariness of a single algorithmic system creating a monopoly across various contexts. It provides both an empirical examination of these concerns and a comprehensive analysis from the legal standpoint, addressing how these issues are perceived in the anti-discrimination law in Canada. We conclude the discussion with technical challenges in the current landscape of model multiplicity to meet legal requirements and the legal gap between current law and the implications of arbitrariness in model selection, highlighting relevant future research directions for both disciplines.

翻訳日:2024-07-22 08:18:00 公開日:2024-05-28

# 先端メディア分析のためのメディアインサイトエンジン:ペットの健康診断のためのコンピュータビジョンイノベーションを事例として

Media Insights Engine for Advanced Media Analysis: A Case Study of a Computer Vision Innovation for Pet Health Diagnosis ( http://arxiv.org/abs/2407.13679v1 )

ライセンス: Link先を確認

Anjanava Biswas,

(参考訳) 本稿では,大手ペット小売業者であるPetcoが,Media Insights Engineを用いてペットの健康分析プロセスを革新し,初診までの時間を短縮したケーススタディを提案する。同社はこのフレームワークを利用して、ペットのビデオや画像の健康上の問題を特定し、事前に構築された獣医学診断でAIの結果を検証するなど、高度なコンピュータビジョンタスクのためのカスタムアプリケーションを構築した。 Media Insights Engineはモジュラーで拡張可能なソリューションを提供しており、Petcoはメディアワークロードのための機械学習アプリケーションを素早く構築できる。このフレームワークを利用することで、Petcoはプロジェクトの開発を加速し、ペットの健康分析の効率を改善し、最終的にペットの健康問題の最初の診断までの時間を短縮することができた。本稿では,メディアを用いたペットの健康分析の課題,メディアインサイトエンジンのメリット,およびこのフレームワークを用いたPetcoのカスタムアプリケーションのアーキテクチャについて論じる。

This paper presents a case study of how Petco, a leading pet retailer, innovated their pet health analysis processes using the Media Insights Engine to reduce the time to first diagnosis. The company leveraged this framework to build custom applications for advanced computer vision tasks, such as identifying potential health issues in pet videos and images, and validating AI outcomes with pre-built veterinary diagnoses. The Media Insights Engine provides a modular and extensible solution that enabled Petco to quickly build machine learning applications for media workloads. By utilizing this framework, Petco was able to accelerate their project development, improve the efficiency of their pet health analysis, and ultimately reduce the time to first diagnosis for pet health issues. This paper discusses the challenges of pet health analysis using media, the benefits of using the Media Insights Engine, and the architecture of Petco's custom applications built using this framework.

翻訳日:2024-07-22 08:07:30 公開日:2024-05-28

# スペイン語とLLMベンチマーク:MMLUは翻訳で失われたか?

Spanish and LLM Benchmarks: is MMLU Lost in Translation? ( http://arxiv.org/abs/2406.17789v1 )

ライセンス: Link先を確認

Irene Plaza, Nina Melero, Cristina del Pozo, Javier Conde, Pedro Reviriego, Marina Mayor-Rocher, María Grandury,

(参考訳) 大規模言語モデル(LLM)の評価は継続的な改善プロセスの重要な要素であり、様々なタスクやトピックにおけるLLMの性能を評価するために多くのベンチマークが開発されている。LLMが世界中で採用されるにつれて、英語以外の言語での評価がますます重要になっている。しかし、ほとんどのLLMベンチマークは自動化ツールで単純に翻訳され、対象言語で実行されている。これは、結果がその言語におけるLLMの性能だけでなく、翻訳の質にも依存することを意味する。本稿では、よく知られたMMLU(Massive Multitask Language Understanding)ベンチマークを取り上げる。ベンチマークの選択したカテゴリをAzure TranslatorとChatGPT4を用いてスペイン語に翻訳し、ChatGPT4で実行する。次に、結果を処理して、スペイン語と英語で異なる回答を生じるテスト項目を特定する。それらを手作業で分析し、自動翻訳が変化の原因かどうかを調べる。その結果、不正解となった項目のかなりの部分が、ベンチマークの翻訳の誤りに起因することがわかった。これらの結果は、少なくとも項目の翻訳を見直し、できれば専門家がテストを対象言語に適応させることで、英語以外の言語のベンチマークを改善すべきであることを強く示している。

The evaluation of Large Language Models (LLMs) is a key element in their continuous improvement process and many benchmarks have been developed to assess the performance of LLMs in different tasks and topics. As LLMs become adopted worldwide, evaluating them in languages other than English is increasingly important. However, most LLM benchmarks are simply translated using an automated tool and then run in the target language. This means that the results depend not only on the LLM performance in that language but also on the quality of the translation. In this paper, we consider the case of the well-known Massive Multitask Language Understanding (MMLU) benchmark. Selected categories of the benchmark are translated into Spanish using Azure Translator and ChatGPT4 and run on ChatGPT4. Next, the results are processed to identify the test items that produce different answers in Spanish and English. Those are then analyzed manually to understand if the automatic translation caused the change. The results show that a significant fraction of the failing items can be attributed to mistakes in the translation of the benchmark. These results make a strong case for improving benchmarks in languages other than English by at least revising the translations of the items and preferably by adapting the tests to the target language by experts.

翻訳日:2024-07-01 06:21:45 公開日:2024-05-28
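The comparison step described above, finding items whose answers differ between Spanish and English before manual review, can be sketched as a simple filter. The dictionary layout, the item IDs and the answers are hypothetical, since the paper does not describe its tooling.

```python
def items_for_manual_review(gold, answers_en, answers_es):
    """Flag benchmark items whose answer changes between languages.

    gold, answers_en, answers_es: hypothetical dicts mapping item ID -> answer letter.
    Returns (changed, regressed): every item answered differently in the two
    languages, and the subset answered correctly in English but incorrectly in
    Spanish, which are natural candidates for checking the translation by hand.
    """
    changed = [i for i in gold if answers_en.get(i) != answers_es.get(i)]
    regressed = [i for i in changed
                 if answers_en.get(i) == gold[i] and answers_es.get(i) != gold[i]]
    return changed, regressed


gold = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
answers_en = {"q1": "A", "q2": "C", "q3": "A", "q4": "D"}
answers_es = {"q1": "A", "q2": "B", "q3": "B", "q4": "D"}
changed, regressed = items_for_manual_review(gold, answers_en, answers_es)
```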

# SemEval-2024 Task 8におけるMashee:機械テキスト分類のための文脈内学習の性能に対するサンプル品質の影響

Mashee at SemEval-2024 Task 8: The Impact of Samples Quality on the Performance of In-Context Learning for Machine Text Classification ( http://arxiv.org/abs/2406.17790v1 )

ライセンス: Link先を確認

Areeg Fahad Rasheed, M. Zarkoosh,

(参考訳) 少数ショット学習において、文脈内学習(ICL)は、少量のデータしかない場合や、大規模データセットでのモデル学習が現実的でないリソース制約環境において、文脈情報を活用してモデル性能を向上させる有望な手法となっている。しかし、少数ショットとして選択されるサンプルの品質がICLの有用性を大きく制限している。本稿の主な目的は、少数ショット学習のシナリオにおいて高品質なサンプルを選択することで、評価指標上の文脈内学習の性能を向上させることである。高品質なサンプルの特定にはカイ二乗検定を用い、その結果を低品質なサンプルを用いた場合の結果と比較した。その結果、高品質なサンプルを用いることで、評価したすべての指標において性能が向上することが示された。

Within few-shot learning, in-context learning (ICL) has become a potential method for leveraging contextual information to improve model performance on small amounts of data or in resource-constrained environments where training models on large datasets is prohibitive. However, the quality of the samples selected as few-shot examples severely limits the usefulness of ICL. The primary goal of this paper is to improve in-context learning performance, as measured by the evaluation metrics, by selecting high-quality samples in few-shot learning scenarios. We employ the chi-square test to identify high-quality samples and compare the results with those obtained using low-quality samples. Our findings demonstrate that utilizing high-quality samples leads to improved performance with respect to all evaluated metrics.

翻訳日:2024-07-01 06:21:45 公開日:2024-05-28
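The abstract does not say how the chi-square test is applied to rank candidate demonstrations. As background only, the following sketch computes Pearson's chi-square statistic for a contingency table. Using, say, feature presence versus human/machine label as the table's axes would be a hypothetical application, not the authors' stated procedure.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table.

    Assumes every row and column total is positive (otherwise an expected
    count would be zero and the statistic undefined).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2x2 counts: rows = feature present/absent, columns = label A/B
stat = chi_square_statistic([[10, 20], [30, 40]])
```

Larger values indicate stronger evidence that the two axes are associated. Turning the statistic into a p-value requires the chi-square distribution with (r - 1)(c - 1) degrees of freedom.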

# SelMatch: 選択に基づく初期化とトラジェクトリマッチングによる部分更新によるデータセット蒸留の効果的スケールアップ

SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching ( http://arxiv.org/abs/2406.18561v1 )

ライセンス: Link先を確認

Yongmin Lee, Hye Won Chung,

(参考訳) データセット蒸留は、大規模なデータセットからクラスあたり少数の画像(IPC)を合成し、性能の損失を最小限に抑えつつ完全なデータセットでの学習を近似することを目的としている。多くの蒸留法は、IPCが非常に小さい範囲では有効であるが、IPCの増加に伴って効果が低下し、ランダムなサンプル選択を下回ることさえある。様々なIPCスケールで最先端の軌道マッチングに基づく蒸留法を検証した結果、これらの手法はIPCを増やしても、より難しいサンプルの複雑で稀な特徴を合成データセットに取り込むことに苦戦し、易しいテストサンプルと難しいテストサンプルの間に持続的なカバレッジギャップが生じることが明らかになった。このような観察に基づき、IPCに応じて効果的にスケールする新しい蒸留法であるSelMatchを提案する。SelMatchは、選択に基づく初期化と軌道マッチングによる部分的な更新を用いて、合成データセットの望ましい難易度をIPCスケールに合わせて管理する。CIFAR-10/100とTinyImageNetで評価したところ、SelMatchは5%から30%のサブセット比にわたり、主要な選択のみの手法および蒸留のみの手法を一貫して上回った。

Dataset distillation aims to synthesize a small number of images per class (IPC) from a large dataset to approximate full dataset training with minimal performance loss. While effective in very small IPC ranges, many distillation methods become less effective, even underperforming random sample selection, as IPC increases. Our examination of state-of-the-art trajectory-matching based distillation methods across various IPC scales reveals that these methods struggle to incorporate the complex, rare features of harder samples into the synthetic dataset even with the increased IPC, resulting in a persistent coverage gap between easy and hard test samples. Motivated by such observations, we introduce SelMatch, a novel distillation method that effectively scales with IPC. SelMatch uses selection-based initialization and partial updates through trajectory matching to manage the synthetic dataset's desired difficulty level tailored to IPC scales. When tested on CIFAR-10/100 and TinyImageNet, SelMatch consistently outperforms leading selection-only and distillation-only methods across subset ratios from 5% to 30%.

翻訳日:2024-07-01 06:00:20 公開日:2024-05-28

# ビューは欺くことがある:特徴空間拡張によるSSLの改善

Views Can Be Deceiving: Improved SSL Through Feature Space Augmentation ( http://arxiv.org/abs/2406.18562v1 )

ライセンス: Link先を確認

Kimia Hamidieh, Haoran Zhang, Swami Sankaranarayanan, Marzyeh Ghassemi,

(参考訳) 教師あり学習手法は、より単純な特徴を優先する帰納的バイアスを示すことが知られている。そのような特徴がラベルと疑似相関している場合、少数派サブグループでの性能が最適でなくなる可能性がある。ラベルなしデータから学習する手法の人気が高まっているにもかかわらず、それらの表現が予測において疑似的特徴にどの程度依存しているかは明らかでない。本研究では、視覚表現学習のための自己教師あり学習(SSL)に対する疑似的特徴の影響を調べる。まず、SSLで一般的に用いられるデータ拡張が画像空間において望ましくない不変性を引き起こしうることを実証的に示し、簡単な例でこれを説明する。さらに、SSL中のデータセット再サンプリングなど、疑似相関に対処する古典的な手法が、一貫して不変な表現につながるわけではないことを示す。これらの知見に基づき、プルーニングによってエンコーダの後段の層を正則化することで、事前学習中にこれらの表現から疑似的な情報を除去するLateTVGを提案する。本手法は、SSL中にグループ情報やラベル情報を必要とせずに、複数のベンチマークでベースラインを上回る表現を生成する。

Supervised learning methods have been found to exhibit inductive biases favoring simpler features. When such features are spuriously correlated with the label, this can result in suboptimal performance on minority subgroups. Despite the growing popularity of methods which learn from unlabeled data, the extent to which these representations rely on spurious features for prediction is unclear. In this work, we explore the impact of spurious features on Self-Supervised Learning (SSL) for visual representation learning. We first empirically show that commonly used augmentations in SSL can cause undesired invariances in the image space, and illustrate this with a simple example. We further show that classical approaches in combating spurious correlations, such as dataset re-sampling during SSL, do not consistently lead to invariant representations. Motivated by these findings, we propose LateTVG to remove spurious information from these representations during pre-training, by regularizing later layers of the encoder via pruning. We find that our method produces representations which outperform the baselines on several benchmarks, without the need for group or label information during SSL.

翻訳日:2024-07-01 06:00:20 公開日:2024-05-28

# オプトメカニカル系を用いた重力誘起エンタングルメントの実現可能な生成

Feasible generation of gravity-induced entanglement by using optomechanical systems ( http://arxiv.org/abs/2406.04361v1 )

ライセンス: Link先を確認

Daisuke Miki, Akira Matsumura, Kazuhiro Yamamoto,

(参考訳) 本研究では、オプトメカニカル系を用いた重力誘起エンタングルメント(GIE)の検出可能性を報告する。これは、信号対雑音比S/N=1を達成するための実現可能な実験パラメータを明らかにした初めての研究である。提案手法は、重力相互作用を介して結合したオプトメカニカル鏡の間でのGIE生成に焦点を当て、重力波観測の分野と関連して成熟した連続測定、フィードバック制御、カルマンフィルタリング過程を用いる。運動の最小分散を推定するオプトメカニカル鏡の条件付き共分散行列の時間発展を評価するため、リカッチ方程式を解いた。その結果、GIEはオプトメカニカル結合がない場合のよく知られた時間スケールよりも速く生成されることが示された。エンタングルメントの高速な生成は、カルマンフィルタリング過程による量子状態のスクイージングと関連しており、これはGIEを実験的に検出する上でオプトメカニカル系を用いる利点である。

We report the feasibility of detecting the gravity-induced entanglement (GIE) with optomechanical systems, which is the first investigation that clarifies the feasible experimental parameters to achieve a signal-to-noise ratio of S/N=1. Our proposal focuses on GIE generation between optomechanical mirrors, coupled via gravitational interactions, under continuous measurement, feedback control, and Kalman filtering process, which matured in connection with the field of gravitational wave observations. We solved the Riccati equation to evaluate the time evolution of the conditional covariance matrix for optomechanical mirrors that estimated the minimum variance of the motions. The results demonstrate that GIE is generated faster than a well-known time scale without optomechanical coupling. The fast generation of entanglement is associated with quantum-state squeezing by the Kalman filtering process, which is an advantage of using optomechanical systems to experimentally detect GIE.

翻訳日:2024-06-23 14:05:12 公開日:2024-05-28

# スピン1型ウンルー・デ・ウィット検出器の研究

A study of the spin 1 Unruh-De Witt detectors ( http://arxiv.org/abs/2406.04362v1 )

ライセンス: Link先を確認

F. M. Guedes, M. S. Guimaraes, I. Roditi, S. P. Sorella,

(参考訳) 相対論的スカラー量子場と相互作用するスピン1のウンルー・デ・ウィット検出器の研究を示す。場のモードをトレースアウトした後、得られた2体クトリット系の密度行列を用いてBell-CHSH不等式の破れを調べる。量子場の効果によって破れの大きさが減少するスピン$1/2$の場合とは異なり、スピン$1$の場合には破れの減少と増大の両方が起こりうる。この効果は、クトリットの場合にはチレルソン限界が飽和しないことに起因する。

A study of the spin 1 Unruh-De Witt detectors interacting with a relativistic scalar quantum field is presented. After tracing out the field modes, the resulting density matrix for a bipartite qutrit system is employed to investigate the violation of the Bell-CHSH inequality. Unlike the case of spin $1/2$, for which the effects of the quantum field result in a decrease in the size of the violation, in the case of spin $1$ both a decrease and an increase of the violation may occur. This effect is ascribed to the fact that Tsirelson's bound is not saturated in the case of qutrits.

翻訳日:2024-06-23 14:05:12 公開日:2024-05-28
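Tsirelson's bound, mentioned above, is easiest to see in the familiar spin-1/2 case. The following sketch evaluates the CHSH combination for the two-qubit singlet, whose correlation for coplanar measurement angles a and b is E(a, b) = -cos(a - b). With the standard angle choice it reaches 2√2, above the local-realist bound of 2. It is background only and does not reproduce the paper's qutrit (spin-1) detector calculation.

```python
import math


def singlet_correlation(a, b):
    """Spin correlation of the two-qubit singlet for coplanar measurement angles a, b."""
    return -math.cos(a - b)


def chsh_value(a, a_prime, b, b_prime, corr=singlet_correlation):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return corr(a, b) - corr(a, b_prime) + corr(a_prime, b) + corr(a_prime, b_prime)


# Standard optimal angles for the singlet
S = chsh_value(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
```

Here |S| equals Tsirelson's bound 2√2 ≈ 2.83, so the singlet saturates it. The abstract's point is that the qutrit analogue does not saturate its bound.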

# シミュレーテッドアニーリングを用いたTPMS設計材料の機械学習駆動最適化

Machine Learning-Driven Optimization of TPMS Architected Materials Using Simulated Annealing ( http://arxiv.org/abs/2406.05142v1 )

ライセンス: Link先を確認

Akshansh Mishra,

(参考訳) 本論文では、機械学習とシミュレーテッドアニーリング(SA)によって三重周期極小曲面(TPMS)構造の引張応力を最適化する新しい手法を提案する。TPMSモデルの有限要素解析から生成したデータセットを用いて、引張応力の予測におけるランダムフォレスト、決定木、XGBoostモデルの性能を評価する。目的関数として、モデルの精度を高めるために検証セット上の負のR二乗値を最小化した。SA-XGBoostモデルは他のモデルを上回り、R二乗値0.96を達成した。一方、SA-Random ForestモデルのR二乗値は0.89であり、SA-Decision Treeモデルは検証スコアの変動がより大きかった。これは、SA-XGBoostモデルがデータ内の複雑な関係を捉えるのに最も効果的であることを示している。SAの統合は、これらの機械学習モデルのハイパーパラメータを最適化し、予測能力を高めるのに役立つ。

The research paper presents a novel approach to optimizing the tensile stress of Triply Periodic Minimal Surface (TPMS) structures through machine learning and Simulated Annealing (SA). The study evaluates the performance of Random Forest, Decision Tree, and XGBoost models in predicting tensile stress, using a dataset generated from finite element analysis of TPMS models. The objective function minimized the negative R-squared value on the validation set to enhance model accuracy. The SA-XGBoost model outperformed the others, achieving an R-squared value of 0.96. In contrast, the SA-Random Forest model achieved an R-squared value of 0.89, while the SA-Decision Tree model exhibited greater fluctuations in validation scores. This demonstrates that the SA-XGBoost model is most effective in capturing the complex relationships within the data. The integration of SA helps in optimizing the hyperparameters of these machine learning models, thereby enhancing their predictive capabilities.

翻訳日:2024-06-23 13:55:28 公開日:2024-05-28
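The following is a minimal sketch of simulated annealing used as a hyperparameter search. It assumes a geometric cooling schedule and a neighbourhood that moves one hyperparameter by one grid step. The objective is a hypothetical stand-in for the negative validation R² that the abstract says was minimized. In practice each evaluation would train and validate a model for the candidate hyperparameters.

```python
import math
import random


def simulated_annealing(objective, initial, neighbor, t0=1.0, cooling=0.95,
                        steps=2000, seed=0):
    """Minimise `objective` using Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    current, current_cost = initial, objective(initial)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        cost = objective(candidate)
        delta = cost - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / T)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = candidate, cost
        temperature = max(temperature * cooling, 1e-12)  # floor avoids division by zero
    return best, best_cost


def toy_objective(params):
    """Hypothetical stand-in for 'negative validation R^2' of a tree ensemble."""
    max_depth, n_estimators = params
    score = 1.0 - 0.01 * (max_depth - 6) ** 2 - 1e-6 * (n_estimators - 300) ** 2
    return -score


def toy_neighbor(params, rng):
    """Move one hyperparameter by one grid step, clipped to hypothetical bounds."""
    max_depth, n_estimators = params
    if rng.random() < 0.5:
        max_depth = min(12, max(1, max_depth + rng.choice([-1, 1])))
    else:
        n_estimators = min(1000, max(50, n_estimators + rng.choice([-50, 50])))
    return max_depth, n_estimators


best, best_cost = simulated_annealing(toy_objective, (1, 50), toy_neighbor)
```

The toy objective is separable and has a single optimum on the grid, so the search ends at (6, 300) with cost -1.0. A real negative-R² surface is noisier. On such a surface, accepting occasional worse moves while the temperature is still high helps the search escape local minima.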

# カーネル密度推定を用いた機械学習モデルの領域決定:材料特性予測への応用

Determining Domain of Machine Learning Models using Kernel Density Estimates: Applications in Materials Property Prediction ( http://arxiv.org/abs/2406.05143v1 )

ライセンス: Link先を確認

Lane E. Schultz, Yiqi Wang, Ryan Jacobs, Dane Morgan,

(参考訳) 機械学習モデルの適用可能領域に関する知識は、正確で信頼性の高いモデル予測を保証するために不可欠である。本研究では、モデルの領域を評価する新しい手法を開発し、複数のモデルタイプおよび材料特性データセットに適用した場合に、ドメイン内とドメイン外を正確かつ有意義に指定できることを示す。本手法は、カーネル密度推定を用いて特徴空間におけるテストデータ点と訓練データ点の距離を評価し、この距離が領域決定に有効なツールであることを示す。確立された化学知識に基づいて無関係とみなされる化学グループは、本尺度で有意な非類似性を示すことを明らかにする。また、非類似性が高いことは、モデル性能の低さ(すなわち残差が大きいこと)とモデル不確実性の推定の悪さ(すなわち信頼できない不確実性推定)に関連していることを示す。研究者が許容可能な非類似性のしきい値を設定し、自身の機械学習モデルの新たな予測がドメイン内かドメイン外かを識別できるよう、自動化ツールを提供する。

Knowledge of the domain of applicability of a machine learning model is essential to ensuring accurate and reliable model predictions. In this work, we develop a new approach of assessing model domain and demonstrate that our approach provides accurate and meaningful designation of in-domain versus out-of-domain when applied across multiple model types and material property data sets. Our approach assesses the distance between a test and training data point in feature space by using kernel density estimation and shows that this distance provides an effective tool for domain determination. We show that chemical groups considered unrelated based on established chemical knowledge exhibit significant dissimilarities by our measure. We also show that high measures of dissimilarity are associated with poor model performance (i.e., high residual magnitudes) and poor estimates of model uncertainty (i.e., unreliable uncertainty estimation). Automated tools are provided to enable researchers to establish acceptable dissimilarity thresholds to identify whether new predictions of their own machine learning models are in-domain versus out-of-domain.

翻訳日:2024-06-23 13:55:28 公開日:2024-05-28
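The following is a minimal sketch of a KDE-based dissimilarity score. It assumes a Gaussian kernel with a fixed bandwidth and takes the negative log density of a test point under the training data as the dissimilarity. The abstract does not give the paper's kernel, bandwidth selection or threshold rule, so the training points and the threshold below are hypothetical.

```python
import math


def gaussian_kde_log_density(x, train, bandwidth):
    """Log density of point x under a Gaussian KDE fitted to `train` (non-empty)."""
    d = len(x)
    norm = (2 * math.pi * bandwidth ** 2) ** (-d / 2)
    total = 0.0
    for xi in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    density = norm * total / len(train)
    # Very distant points can underflow to zero density
    return math.log(density) if density > 0 else float("-inf")


def dissimilarity(x, train, bandwidth=1.0):
    """Higher values mean x lies in a sparser region of the training features."""
    return -gaussian_kde_log_density(x, train, bandwidth)


def is_in_domain(x, train, threshold, bandwidth=1.0):
    """Hypothetical rule: in-domain if dissimilarity does not exceed the threshold."""
    return dissimilarity(x, train, bandwidth) <= threshold


train = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]  # hypothetical feature vectors
near, far = (0.1, 0.1), (10.0, 10.0)
```

A natural, though still hypothetical, way to set the threshold is a high quantile of the dissimilarities observed on held-out data that the model predicts well.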

# シャープ比の最適化:多腕バンディットにおけるリスク調整型意思決定

Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits ( http://arxiv.org/abs/2406.06552v1 )

ライセンス: Link先を確認

Sabrina Khurshid, Mohammed Shahid Abdulla, Gourab Ghatak,

(参考訳) シャープ比(SR)は、株式/ポートフォリオのリターンとボラティリティを分散を通じて同時に考慮するため、金融時系列を特徴付ける上で重要なパラメータである。オフライン方策でさえ最良のエキスパートに対して一定のリグレットを被る(Even-Dar et al (2006))ため、SRを最適化するオンラインアルゴリズムの導出は特に困難である。そこで、通常のSRの定義を最適化する代わりに、正則化された二乗SR(RSSR)を最適化する。RSSRについて、リグレット最小化(RM)と最良腕識別(BAI)の2つの設定を考える。この点に関して、RMにおけるRSSR最大化のための新しい多腕バンディット(MAB)アルゴリズムUCB-RSSRを提案する。RSSRの推定値について経路依存の集中不等式を導出する。これに基づいてUCB-RSSRのリグレット保証を導出し、ホライズンnでプレイされる2腕バンディットの場合にO(log n)で増加することを示す。また、よく知られたBAIアルゴリズム、すなわち逐次半減法と逐次棄却法の固定予算設定も考え、SHVV、SHSR、SuRSRアルゴリズムを提案する。提案したすべてのBAIアルゴリズムについて、誤り確率の上界を導出する。UCB-RSSRが、既知の唯一のSR最適化バンディットアルゴリズムであるU-UCB Cassel et al (2023)を上回ることを示す。また、GRA-UCBおよびMVTSアルゴリズムから導かれる他のベンチマークに対する有効性も確認する。さらに、提案したBAIアルゴリズムの性能を複数の異なる設定で示す。本研究は、提案アルゴリズムがリスクを考慮したポートフォリオ管理問題に幅広く応用されうることを示している。

Sharpe Ratio (SR) is a critical parameter in characterizing financial time series as it jointly considers the reward and the volatility of any stock/portfolio through its variance. Deriving online algorithms for optimizing the SR is particularly challenging since even offline policies experience constant regret with respect to the best expert Even-Dar et al (2006). Thus, instead of optimizing the usual definition of SR, we optimize regularized square SR (RSSR). We consider two settings for the RSSR, Regret Minimization (RM) and Best Arm Identification (BAI). In this regard, we propose a novel multi-armed bandit (MAB) algorithm for RM called UCB-RSSR for RSSR maximization. We derive a path-dependent concentration bound for the estimate of the RSSR. Based on that, we derive the regret guarantees of UCB-RSSR and show that it evolves as O(log n) for the two-armed bandit case played for a horizon n. We also consider a fixed budget setting for well-known BAI algorithms, i.e., sequential halving and successive rejects, and propose SHVV, SHSR, and SuRSR algorithms. We derive the upper bound for the error probability of all proposed BAI algorithms. We demonstrate that UCB-RSSR outperforms the only other known SR optimizing bandit algorithm, U-UCB Cassel et al (2023). We also establish its efficacy with respect to other benchmarks derived from the GRA-UCB and MVTS algorithms. We further demonstrate the performance of proposed BAI algorithms for multiple different setups. Our research highlights that our proposed algorithms will find extensive applications in risk-aware portfolio management problems.

翻訳日:2024-06-23 13:55:28 公開日:2024-05-28
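For reference, the following sketch computes the plain empirical Sharpe ratio: mean excess return divided by the standard deviation of excess returns. The abstract does not define the regularized square SR (RSSR) that the paper actually optimizes, so RSSR is not reproduced here. The two arms' return series are hypothetical.

```python
import statistics


def empirical_sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation (n - 1 denominator).

    Needs at least two observations with non-zero spread.
    """
    excess = [r - risk_free for r in returns]
    return statistics.fmean(excess) / statistics.stdev(excess)


# Hypothetical per-round rewards observed from two bandit arms
arm_a = [0.06, -0.02, 0.05, -0.01]   # higher mean, high volatility
arm_b = [0.010, 0.012, 0.011, 0.009]  # lower mean, low volatility
sr_a = empirical_sharpe_ratio(arm_a)
sr_b = empirical_sharpe_ratio(arm_b)
```

Arm A has the higher mean return, but arm B has the far higher Sharpe ratio. An SR-optimizing bandit is built to encode exactly this kind of risk-adjusted preference.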

# マルチドメインテキスト分類のための確率的逆ネットワーク

Stochastic Adversarial Networks for Multi-Domain Text Classification ( http://arxiv.org/abs/2406.00044v1 )

ライセンス: Link先を確認

Xu Wang, Yuan Wu,

(参考訳) 敵対的学習は多領域テキスト分類(MDTC)の進展に大きく貢献してきた。MDTC手法は従来、ドメイン不変の知識のための共有特徴抽出器と、ドメイン固有の知識のための個別の特徴抽出器を備えた共有・プライベートパラダイムを用いている。これらの手法は最先端の結果を達成しているものの、新しいドメインが継続的に追加されることによるモデルパラメータの増大という課題を抱えている。この課題に対処するため、ドメイン固有特徴抽出器のパラメータを従来の重みベクトルではなく多変量ガウス分布として革新的にモデル化する確率的敵対ネットワーク(SAN)を導入する。この設計により、モデルパラメータを大幅に増やすことなく多数のドメイン固有特徴抽出器を生成でき、モデルサイズを単一のドメイン固有抽出器と同程度に保つことができる。さらに、本手法はドメインラベル平滑化と頑健な擬似ラベル正則化を組み込み、それぞれ敵対的学習の安定性と特徴の識別性を高める。2つの主要なMDTCベンチマークで評価したSANの性能は、現在の最先端手法に対する競争力を示している。コードはhttps://github.com/wangxu0820/SANで公開されている。

Adversarial training has been instrumental in advancing multi-domain text classification (MDTC). Traditionally, MDTC methods employ a shared-private paradigm, with a shared feature extractor for domain-invariant knowledge and individual private feature extractors for domain-specific knowledge. Despite achieving state-of-the-art results, these methods grapple with a growing number of model parameters as new domains are continually added. To address this challenge, we introduce the Stochastic Adversarial Network (SAN), which models the parameters of the domain-specific feature extractor as a multivariate Gaussian distribution rather than as a traditional weight vector. This design allows numerous domain-specific feature extractors to be generated without a substantial increase in model parameters, keeping the model's size on par with that of a single domain-specific extractor. Furthermore, our approach integrates domain label smoothing and robust pseudo-label regularization to fortify the stability of adversarial training and to refine feature discriminability, respectively. Evaluated on two leading MDTC benchmarks, SAN demonstrates a competitive edge against current state-of-the-art methods. The code is available at https://github.com/wangxu0820/SAN.
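
The core idea can be illustrated by storing a distribution over extractor weights instead of one weight set per domain. The sketch below is a minimal illustration, not the authors' code. The diagonal covariance, the single ReLU layer, and the one-draw-per-call sampling are all assumptions; the abstract only states "multivariate Gaussian".

```python
import numpy as np


class StochasticLinearExtractor:
    """Domain-specific extractor whose weights are drawn from a diagonal Gaussian.

    Only the mean and log-std of the weight distribution are stored, so the
    parameter count does not grow with the number of domains.
    """

    def __init__(self, d_in, d_out, rng):
        self.mu = rng.normal(0.0, 0.1, size=(d_in, d_out))
        self.log_sigma = np.full((d_in, d_out), -3.0)
        self.rng = rng

    def num_parameters(self):
        return self.mu.size + self.log_sigma.size

    def sample_weights(self):
        # Reparameterization: W = mu + sigma * eps with eps ~ N(0, I).
        eps = self.rng.standard_normal(self.mu.shape)
        return self.mu + np.exp(self.log_sigma) * eps

    def __call__(self, x):
        # Each call draws a fresh W, i.e. a new extractor sampled from the same distribution.
        return np.maximum(x @ self.sample_weights(), 0.0)


rng = np.random.default_rng(0)
extractor = StochasticLinearExtractor(d_in=16, d_out=8, rng=rng)
# Ten "domains", each handled by a freshly sampled extractor.
features = [extractor(rng.standard_normal((4, 16))) for _ in range(10)]
```

The sketch stores 256 parameters whether it serves one domain or ten. That is the property the abstract emphasises.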

翻訳日:2024-06-09 15:59:42 公開日:2024-05-28

# 大規模言語モデルのパーソナライズされたステアリング:双方向選好最適化によるヴァーサタイルステアリングベクトル

Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization ( http://arxiv.org/abs/2406.00045v1 )

ライセンス: Link先を確認

Yuanpu Cao, Tianrong Zhang, Bochuan Cao, Ziyi Yin, Lu Lin, Fenglong Ma, Jinghui Chen,

(参考訳) 研究者は、Large Language Models(LLM)の振る舞いを制御し、様々なアプリケーションに適したパーソナライズされたLLMを構築するためのアプローチを研究してきた。微調整は直接的な解決策であるように見えるが、かなりの計算資源が必要であり、元のLLMの実用性に大きな影響を及ぼす可能性がある。最近の取り組みはより軽量な戦略を導入し、LLMのトランスフォーマーアーキテクチャの特定の層内でのアクティベーションを調整することで、モデル出力を望ましい振る舞いに導く「ステアリングベクトル」の抽出に重点を置いている。しかし、そのようなステアリングベクトルは人間の嗜好データのアクティベートから直接抽出され、特にアライメントに関連するシナリオにおいて、しばしば最適以下の結果と時折失敗につながる。この研究は、双方向の選好最適化によってより効果的なステアリングベクトルを生み出すことができる革新的なアプローチを提案する。提案手法は, ステアリングベクトルが人間の嗜好データペアの生成確率に直接影響し, 対象行動のより正確に表現できるように設計されている。ステアリングベクトルの方向と大きさを慎重に調整することにより、所望の動作を様々な強度でパーソナライズした制御を可能にした。様々なオープンエンド世代タスク、特にAIペルソナのステアリングに焦点を当てた大規模な実験が、我々のアプローチの有効性を検証した。さらに、真理性の管理、幻覚の緩和、脱獄攻撃への対処など、重要なアライメントのシナリオを包括的に調査する。興味深いことに,本手法はこれらのシナリオにおいて優れたステアリング効果を示すことができる。さらに、異なるモデル/LoRA間のステアリングベクトルの転送可能性を示し、同時に複数のベクトルを適用することの相乗効果を強調した。

Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct solution, it requires substantial computational resources and may significantly affect the utility of the original LLM. Recent endeavors have introduced more lightweight strategies, focusing on extracting "steering vectors" to guide the model's output toward desired behaviors by adjusting activations within specific layers of the LLM's transformer architecture. However, such steering vectors are directly extracted from the activations of human preference data and thus often lead to suboptimal results and occasional failures, especially in alignment-related scenarios. This work proposes an innovative approach that could produce more effective steering vectors through bi-directional preference optimization. Our method is designed to allow steering vectors to directly influence the generation probability of contrastive human preference data pairs, thereby offering a more precise representation of the target behavior. By carefully adjusting the direction and magnitude of the steering vector, we enable personalized control over the desired behavior across a spectrum of intensities. Extensive experimentation across various open-ended generation tasks, particularly focusing on steering AI personas, has validated the efficacy of our approach. Moreover, we comprehensively investigate critical alignment-concerning scenarios, such as managing truthfulness, mitigating hallucination, and addressing jailbreaking attacks. Remarkably, our method still demonstrates outstanding steering effectiveness across these scenarios. Furthermore, we showcase the transferability of our steering vectors across different models/LoRAs and highlight the synergistic benefits of applying multiple vectors simultaneously.
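
Applying a steering vector, as described above, amounts to adding a scaled vector to the activations at one layer. The sign of the scale picks the direction and its magnitude sets the intensity. The sketch below shows only this application step on a plain activation array. It does not show how bi-directional preference optimization learns the vector, and the array shapes are assumptions.

```python
import numpy as np


def apply_steering(hidden, steering_vector, multiplier):
    """Add a scaled steering vector to every token's activation at one layer.

    hidden: (seq_len, d_model) activations; steering_vector: (d_model,).
    The sign of `multiplier` selects the direction, its absolute value the strength.
    """
    return hidden + multiplier * steering_vector[None, :]


hidden = np.zeros((3, 4))                  # stand-in activations for 3 tokens
v = np.array([1.0, 0.0, 0.0, 0.0])        # hypothetical learned steering vector
toward = apply_steering(hidden, v, 2.0)   # push toward the target behavior
away = apply_steering(hidden, v, -1.0)    # push in the opposite direction, more weakly
```

Adding several vectors at once is the same operation repeated, which is one way to read the "multiple vectors simultaneously" result.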

翻訳日:2024-06-09 15:59:42 公開日:2024-05-28

# 一般化可能な目標認識フェアネスによるヘイトスピーチ検出

Hate Speech Detection with Generalizable Target-aware Fairness ( http://arxiv.org/abs/2406.00046v1 )

ライセンス: Link先を確認

Tong Chen, Danny Wang, Xurong Liang, Marten Risius, Gianluca Demartini, Hongzhi Yin,

(参考訳) ソーシャルメディアプラットフォームの普及による副作用に対抗するため、ヘイトスピーチ検出(HSD)は、早期に有害なオンライン投稿の拡散を阻止する重要な役割を担っている。しかし、ソーシャルメディア上で広く普及している話題コミュニティを考えると、訓練されたHSD分類器は特定の対象グループ(例えば、女性や黒人)に偏りやすくなり、偽陽性/陰性の結果が、コンテンツモデレーション機構の公正性に対する公衆の信頼を著しく損なうことになり、最終的にはオンライン社会の多様性を損なうことになる。既存のフェアネスを意識したHSD法は、対象とするグループ間でのいくつかの相違を緩和することができるが、それらは主に、既知の、固定されたと思われるターゲットの狭い選択に特化している。これにより、新たなターゲットグループが常に時間とともに出現する現実世界のユースケースへの一般化が必然的に防止される。この欠陥に対処するために、我々は、推論中に多様で見えざるターゲットを含む各ポストを適切に分類する新しい方法であるGeneralizable target-aware Fairness (GetFair)を提案する。ターゲット関連の機能に対するHSD分類器の急激な依存を取り除くため、GetFairは、フィルタされたポスト埋め込みからターゲットグループを回復する識別器を欺くために、対向パイプラインで一連のフィルタ関数を訓練する。拡張性と一般化性を維持するため、ターゲット間のセマンティック親和性によって正規化されるハイパーネットワークを用いて、全てのフィルタ関数を革新的にパラメータ化する。ターゲットの事前訓練された単語を入力として埋め込み、ハイパーネットワークは専用のフィルタパラメータを格納することなく、各ターゲット固有のフィルタがオンザフライで使用する重みを生成する。最後に、2つのHSDデータセットの比較実験では、サンプル外のターゲットでGetFairのパフォーマンスが有利であることが示されている。

To counter the side effects of the proliferation of social media platforms, hate speech detection (HSD) plays a vital role in halting the dissemination of toxic online posts at an early stage. However, given the ubiquitous topical communities on social media, a trained HSD classifier easily becomes biased towards specific targeted groups (e.g., women and Black people), where a high rate of false positive/negative results can significantly impair public trust in the fairness of content moderation mechanisms, and eventually harm the diversity of online society. Although existing fairness-aware HSD methods can smooth out some discrepancies across targeted groups, they are mostly specific to a narrow selection of targets that are assumed to be known and fixed. This inevitably prevents those methods from generalizing to real-world use cases where new targeted groups constantly emerge over time. To address this limitation, we propose Generalizable target-aware Fairness (GetFair), a new method for fairly classifying each post that contains diverse and even unseen targets during inference. To remove the HSD classifier's spurious dependence on target-related features, GetFair trains a series of filter functions in an adversarial pipeline, so as to deceive the discriminator that recovers the targeted group from filtered post embeddings. To maintain scalability and generalizability, we parameterize all filter functions via a hypernetwork that is regularized by the semantic affinity among targets. Taking a target's pretrained word embedding as input, the hypernetwork generates the weights used by each target-specific filter on the fly, without storing dedicated filter parameters. Finally, comparative experiments on two HSD datasets show that GetFair performs favorably on out-of-sample targets.
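
The hypernetwork step can be illustrated as follows: a target's word embedding goes in, and the weights of that target's filter come out. Nothing is stored per target, so an unseen target still gets a filter. The sketch below assumes a single linear hypernetwork and a square linear filter. It omits the adversarial discriminator and the semantic-affinity regularizer, and all names are hypothetical.

```python
import numpy as np


class FilterHypernetwork:
    """Maps a target's word embedding to the weights of a target-specific filter.

    Only the hypernetwork's own parameters are stored; per-target filter
    weights are generated on the fly.
    """

    def __init__(self, d_word, d_post, rng):
        self.d_post = d_post
        # Linear map from a word embedding to a flattened (d_post x d_post) filter matrix.
        self.H = rng.normal(0.0, 0.01, size=(d_word, d_post * d_post))
        # Bias chosen so that an all-zero embedding yields the identity filter.
        self.b = np.eye(d_post).ravel()

    def filter_weights(self, target_embedding):
        return (target_embedding @ self.H + self.b).reshape(self.d_post, self.d_post)

    def apply(self, post_embedding, target_embedding):
        # Filtered post embedding, to be fed to the HSD classifier (and the discriminator).
        return post_embedding @ self.filter_weights(target_embedding)


rng = np.random.default_rng(0)
hyper = FilterHypernetwork(d_word=300, d_post=64, rng=rng)
post = rng.standard_normal(64)
unseen_target = rng.standard_normal(300)  # stand-in for a pretrained word embedding
filtered = hyper.apply(post, unseen_target)
```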

翻訳日:2024-06-09 15:59:42 公開日:2024-05-28

# シュロディンガー方程式に対する正規化フローに基づく効率的な解法のための理論的枠組み

A Theoretical Framework for an Efficient Normalizing Flow-Based Solution to the Schrodinger Equation ( http://arxiv.org/abs/2406.00047v1 )

ライセンス: Link先を確認

Daniel Freedman, Eyal Rozenberg, Alex Bronstein,

(参考訳) 量子力学における中心的な問題は、分子や物質に対する電子シュロディンガー方程式を解くことである。この問題に対する変分モンテカルロのアプローチはサンプリングによって特定の変分目的関数を近似し、アンザッツとして知られるパラメータ化された波動関数の族に対してこの近似された目的関数を最適化する。近年、ニューラルネットワークがアンザッツとして使われ、成功を収めている。しかし、そのような波動関数からのサンプリングにはマルコフ連鎖モンテカルロ法が必要であり、これは本質的に非効率である。そこで本研究では、サンプリングが安価でありながら必要な量子力学的性質を満たすアンザッツによって、この問題の解決策を提案する。以下の2つの必須成分を用いた正規化フローが我々の要求を満たすことを証明する: (a) 行列式点過程から構築された基底分布、(b) 置換群の特定の部分群に対して同変なフロー層。次に、必要な同変性を満たす連続正規化フローと離散正規化フローの両方を構築する方法を示す。さらに、波動関数の非滑らかな性質(カスプ)を捉える方法や、フレームワークを一般化して複数の分子にわたる帰納を可能にする方法を示す。結果として得られる理論的枠組みは、電子シュロディンガー方程式を解くための効率的なアプローチをもたらす。

A central problem in quantum mechanics involves solving the Electronic Schrodinger Equation for a molecule or material. The Variational Monte Carlo approach to this problem approximates a particular variational objective via sampling, and then optimizes this approximated objective over a chosen parameterized family of wavefunctions, known as the ansatz. Recently, neural networks have been used successfully as the ansatz. However, sampling from such wavefunctions has required the use of a Markov Chain Monte Carlo approach, which is inherently inefficient. In this work, we propose a solution to this problem via an ansatz which is cheap to sample from, yet satisfies the requisite quantum mechanical properties. We prove that a normalizing flow using the following two essential ingredients satisfies our requirements: (a) a base distribution which is constructed from Determinantal Point Processes; (b) flow layers which are equivariant to a particular subgroup of the permutation group. We then show how to construct both continuous and discrete normalizing flows which satisfy the requisite equivariance. We further demonstrate the manner in which the non-smooth nature ("cusps") of the wavefunction may be captured, and how the framework may be generalized to provide induction across multiple molecules. The resulting theoretical framework entails an efficient approach to solving the Electronic Schrodinger Equation.
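
To see what "approximating the variational objective via sampling" means, and why exact sampling helps, consider a one-dimensional toy. This is not the paper's method. It uses a harmonic oscillator $H = -\tfrac12 \tfrac{d^2}{dx^2} + \tfrac12 x^2$ with Gaussian ansatz $\psi_a(x) = e^{-a x^2}$. Here $|\psi_a|^2$ is a normal density with variance $1/(4a)$, so i.i.d. samples come directly from NumPy with no Markov chain. A normalizing-flow ansatz aims to keep this property for real molecules. The local energy is $E_L(x) = a + x^2(\tfrac12 - 2a^2)$, and the exact energy is $a/2 + 1/(8a)$, minimized at $a = 1/2$.

```python
import numpy as np


def local_energy(x, a):
    """Local energy H psi / psi for psi_a(x) = exp(-a x^2) and H = -1/2 d^2/dx^2 + 1/2 x^2."""
    return a + x**2 * (0.5 - 2.0 * a**2)


def vmc_energy(a, n_samples, rng):
    """Monte Carlo estimate of <H> using exact i.i.d. samples from |psi_a|^2.

    |psi_a(x)|^2 is proportional to exp(-2 a x^2), a normal density with
    variance 1/(4a), so no Markov chain is needed.
    """
    x = rng.normal(0.0, np.sqrt(1.0 / (4.0 * a)), size=n_samples)
    return float(np.mean(local_energy(x, a)))


rng = np.random.default_rng(0)
e_exact = vmc_energy(0.5, 10_000, rng)   # a = 1/2 is the exact ground state
e_trial = vmc_energy(0.3, 100_000, rng)  # any other a gives a higher energy
```

At $a = 1/2$ the local energy is constant, so the estimate has zero variance. For other values of $a$, the estimate stays above the ground-state energy 0.5, as the variational principle requires.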

翻訳日:2024-06-09 15:59:42 公開日:2024-05-28

# 深層ニューラルネットワークによる言語構造獲得の理論

Towards a theory of how the structure of language is acquired by deep neural networks ( http://arxiv.org/abs/2406.00048v1 )

ライセンス: Link先を確認

Francesco Cagnetta, Matthieu Wyart,

(参考訳) 言語の構造を学ぶのにどのくらいのデータが必要か? 本研究では,確率論的文脈自由文法(PCFG)を用いて生成した合成データセットについて検討する。モデルを用いてトークンとトークンの相関関係を解析的に決定し,文法の隠れ変数を表現できることを示す。さらに、有限トレーニングセットは、相関の分解を、トレーニングセットのサイズが大きくなる有効範囲に制限する。結果として、多くの例で訓練された言語モデルは、文法の構造をより深く表現することができるため、問題の高次元性にもかかわらず、優れた性能を達成することができる。トレーニングセットのサイズと効果的な相関範囲の関係は、我々の合成データセットを超えていると推測する。特に,本予想では,学習セットサイズによるテスト損失行動のスケーリング法則がコンテキストウィンドウの長さに依存するのかを予測し,シェイクスピアの戯曲からの行の収集を実証的に確認する。

How much data is required to learn the structure of a language via next-token prediction? We study this question for synthetic datasets generated via a Probabilistic Context-Free Grammar (PCFG), a hierarchical generative model that captures the tree-like structure of natural languages. We determine token-token correlations analytically in our model and show that they can be used to build a representation of the grammar's hidden variables: the longer the range of a correlation, the deeper the variable it reveals. In addition, a finite training set limits the resolution of correlations to an effective range, whose size grows with that of the training set. As a result, a language model trained with increasingly many examples can build a deeper representation of the grammar's structure, thus reaching good performance despite the high dimensionality of the problem. We conjecture that the relationship between training set size and effective range of correlations holds beyond our synthetic datasets. In particular, our conjecture predicts how the scaling law for the test loss as a function of training set size depends on the length of the context window, which we confirm empirically on a collection of lines from Shakespeare's plays.
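
The hierarchical structure can be made concrete with a tiny generator. The sketch below is a hypothetical, simplified grammar rather than the paper's PCFG. Every symbol has a fixed number of equally likely binary productions, and the tree has fixed depth. Under this construction, two tokens that are far apart in the output share only a deep common ancestor, which is why long-range correlations carry information about deep hidden variables.

```python
import random


def make_toy_grammar(vocab_size, num_rules, depth, rng):
    """Hypothetical toy grammar: at each level, every symbol has `num_rules` equally
    likely productions, each producing two symbols of the next level.
    Symbols at the last level are the output tokens."""
    grammar = []
    for _ in range(depth):
        level = {
            s: [(rng.randrange(vocab_size), rng.randrange(vocab_size)) for _ in range(num_rules)]
            for s in range(vocab_size)
        }
        grammar.append(level)
    return grammar


def sample_sentence(grammar, root, rng):
    """Expand the root symbol level by level; returns 2**depth leaf tokens."""
    symbols = [root]
    for level in grammar:
        symbols = [child for s in symbols for child in rng.choice(level[s])]
    return symbols


rng = random.Random(0)
grammar = make_toy_grammar(vocab_size=8, num_rules=2, depth=4, rng=rng)
sentence = sample_sentence(grammar, root=0, rng=rng)
```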

翻訳日:2024-06-09 15:59:42 公開日:2024-05-28

# QUEST: 機械翻訳のための品質を考慮したメトロポリス・ハスティングス・サンプリング

QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation ( http://arxiv.org/abs/2406.00049v1 )

ライセンス: Link先を確認

Gonçalo R. A. Faria, Sweta Agrawal, António Farinhas, Ricardo Rei, José G. C. de Souza, André F. T. Martins,

(参考訳) 機械翻訳(MT)における重要な課題は、高品質で多様な翻訳を生成することである。先行研究では、MTモデルから推定される尤度は翻訳品質との相関が低いことが示されている。対照的に、品質評価指標(COMETやBLEURTなど)は、人間の判断と高い相関を示し、リランカーとしての使用(品質認識デコーディングや最小ベイズリスクデコーディングなど)を動機付けている。しかし、推定品質の高い単一の翻訳に依存すると、「メトリクスをゲームする」可能性が高まる。本稿では,高品質で多様な翻訳の集合をサンプリングする問題に対処する。ノイズを含む品質推定値をギブス分布のエネルギー関数として利用することで、その過度な信頼を回避する簡便で効果的な方法を提供する。分布のモードを探す代わりに、簡単なマルコフ連鎖モンテカルロ法であるメトロポリス・ハスティングスアルゴリズムを用いて高密度領域から複数のサンプルを生成する。その結果,提案手法は、2つの強力なデコーダのみのLLM (Alma-7b, Tower-7b) を用いて、複数の言語対 (英語$\leftrightarrow${ドイツ語, ロシア語}) にわたり高品質で多様な出力をもたらすことがわかった。

An important challenge in machine translation (MT) is to generate high-quality and diverse translations. Prior work has shown that the estimated likelihood from the MT model correlates poorly with translation quality. In contrast, quality evaluation metrics (such as COMET or BLEURT) exhibit high correlations with human judgments, which has motivated their use as rerankers (such as quality-aware and minimum Bayes risk decoding). However, relying on a single translation with high estimated quality increases the chances of "gaming the metric". In this paper, we address the problem of sampling a set of high-quality and diverse translations. We provide a simple and effective way to avoid over-reliance on noisy quality estimates by using them as the energy function of a Gibbs distribution. Instead of looking for a mode in the distribution, we generate multiple samples from high-density areas through the Metropolis-Hastings algorithm, a simple Markov chain Monte Carlo approach. The results show that our proposed method leads to high-quality and diverse outputs across multiple language pairs (English$\leftrightarrow${German, Russian}) with two strong decoder-only LLMs (Alma-7b, Tower-7b).
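
The sampling idea can be sketched on a fixed list of candidate translations. The target distribution is $p(y) \propto \exp(q(y)/\beta)$, a Gibbs distribution whose energy is the negative quality score. With a symmetric uniform proposal, the Metropolis-Hastings acceptance probability reduces to $\min(1, \exp((q_{\text{new}} - q_{\text{old}})/\beta))$. This is a simplification: QUEST itself proposes new translations with the LLM rather than choosing from a fixed list. The candidate strings, scores, and $\beta$ below are made up.

```python
import math
import random
from collections import Counter


def mh_sample_translations(candidates, quality, beta, steps, rng):
    """Metropolis-Hastings over a fixed candidate set.

    Target: p(y) proportional to exp(quality[y] / beta). Proposal: uniform over
    candidates (symmetric), so acceptance is min(1, exp((q_new - q_old) / beta)).
    """
    current = rng.choice(candidates)
    samples = []
    for _ in range(steps):
        proposal = rng.choice(candidates)
        log_accept = (quality[proposal] - quality[current]) / beta
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            current = proposal
        samples.append(current)
    return samples


quality = {"translation A": 0.9, "translation B": 0.5, "translation C": 0.1}  # hypothetical scores
samples = mh_sample_translations(list(quality), quality, beta=0.1, steps=5000, rng=random.Random(0))
counts = Counter(samples)
```

A larger $\beta$ flattens the distribution and yields more diverse samples, while a smaller $\beta$ concentrates samples on the highest-scoring candidates.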

翻訳日:2024-06-09 15:59:42 公開日:2024-05-28

# デュアルプロセス学習:重み忘却によるインコンテキスト戦略と重み内戦略の利用制御

Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting ( http://arxiv.org/abs/2406.00053v1 )

ライセンス: Link先を確認

Suraj Anand, Michael A. Lepori, Jack Merullo, Ellie Pavlick,

(参考訳) 言語モデルには、文脈内学習(ICL)を実行する能力があり、文脈に基づいて振る舞いを柔軟に適応させることができる。これは、データの反復的な観察から情報がモデルパラメータに静的に符号化される重み内学習とは対照的である。このように文脈内で学習できるように見えるにもかかわらず、言語モデルは未知のトークンやまれにしか現れないトークンに直面すると苦戦することが知られている。そこで我々は、任意のトークンに対して文脈内学習を実行するモデルの能力として定義される$\textbf{structural in-context learning}$(構造的文脈内学習)を研究する。このように呼ぶのは、モデルがトークン埋め込みに符号化された意味内容ではなく、文構造やタスク構造などに基づいて汎化しなければならないからである。理想的なモデルは、重み内操作(符号化された意味情報を使って曖昧な文脈や未知の文脈に頑健に対応するため)と構造的文脈内操作(新しいトークンに対応するため)の両方を柔軟に使い分けることができる。実用的なモデルと玩具モデルの両方を用いて、単純な品詞タスクの設定で構造的文脈内アルゴリズムを研究する。モデルが新しい言語に汎化するのを助けるために最近導入された手法である能動的忘却は、モデルに構造的文脈内学習の解を採用させることがわかった。最後に、$\textbf{temporary forgetting}$(一時的忘却)を紹介する。これは能動的忘却の素直な拡張であり、モデルが重み内の解と文脈内の解にどの程度依存するかを制御できる。重要なことに、一時的忘却によって、文脈内の解と重み内の解が単一のモデル内に共存する$\textit{dual process strategy}$を誘導できる。

Language models have the ability to perform in-context learning (ICL), allowing them to flexibly adapt their behavior based on context. This contrasts with in-weights learning, where information is statically encoded in model parameters from iterated observations of the data. Despite this apparent ability to learn in-context, language models are known to struggle when faced with unseen or rarely seen tokens. Hence, we study $\textbf{structural in-context learning}$, which we define as the ability of a model to execute in-context learning on arbitrary tokens -- so called because the model must generalize on the basis of e.g. sentence structure or task structure, rather than semantic content encoded in token embeddings. An ideal model would be able to do both: flexibly deploy in-weights operations (in order to robustly accommodate ambiguous or unknown contexts using encoded semantic information) and structural in-context operations (in order to accommodate novel tokens). We study structural in-context algorithms in a simple part-of-speech setting using both practical and toy models. We find that active forgetting, a technique that was recently introduced to help models generalize to new languages, forces models to adopt structural in-context learning solutions. Finally, we introduce $\textbf{temporary forgetting}$, a straightforward extension of active forgetting that enables one to control how much a model relies on in-weights vs. in-context solutions. Importantly, temporary forgetting allows us to induce a $\textit{dual process strategy}$ where in-context and in-weights solutions coexist within a single model.
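
The difference between active and temporary forgetting can be expressed as a reset schedule for the token-embedding layer. The sketch below assumes two things. Active forgetting re-initializes the embeddings every `reset_every` optimizer steps for the whole run. Temporary forgetting applies the same resets only before a cutoff step, after which the embeddings train normally. The parameter names are hypothetical, and the paper's exact schedule may differ.

```python
def should_reset_embeddings(step, reset_every, stop_after=None):
    """Return True if the token-embedding layer should be re-initialized at `step`.

    Active forgetting (assumed): reset every `reset_every` steps for the whole run.
    Temporary forgetting (assumed): the same resets, but only while step < stop_after.
    """
    if step == 0 or step % reset_every != 0:
        return False
    return stop_after is None or step < stop_after


active = [s for s in range(1, 10_001) if should_reset_embeddings(s, 1000)]
temporary = [s for s in range(1, 10_001) if should_reset_embeddings(s, 1000, stop_after=5000)]
```

Under this reading, the cutoff is the control knob. Resets push the model toward structural in-context solutions, and the undisturbed training afterwards lets in-weights knowledge of frequent tokens accumulate.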

翻訳日:2024-06-09 15:59:42 公開日:2024-05-28

# 文脈的類似性を用いた判決引用検索

Judgement Citation Retrieval using Contextual Similarity ( http://arxiv.org/abs/2406.01609v1 )

ライセンス: Link先を確認

Akshat Mohan Dasula, Hrushitha Tigulla, Preethika Bhukya,

(参考訳) 伝統的に、法律研究の分野では、複雑な事例記述からの関連する引用の検索は、法的用語を理解する専門知識を委任する手作業やキーワードベースの検索アプリケーションを必要としている。法的ケース記述は、法律専門家や研究者にとって重要な情報を保持し、より効率的で自動化されたアプローチを必要とする。本稿では,自然言語処理(NLP)と機械学習技術を組み合わせて,訴訟記述の組織化と活用を促進する手法を提案する。このアプローチは、最先端の埋め込みモデルの助けを借りて、テキスト埋め込みの作成を中心に展開される。提案手法は,非教師付きクラスタリングと教師付き引用検索の2つの主要な目的に対処する。提案手法は任意のデータセットに使用することができるが,米国最高裁判所(SCOTUS)データセットを用い,顕著な結果を得た。我々の手法は90.9%という驚くべき精度を達成した。労働集約的なプロセスを自動化することによって、法律研究においてより効率的で時間節約し、アクセスしやすくする方法を開拓し、法律専門家、学者、研究者に恩恵を与えます。

Traditionally, in the domain of legal research, retrieving pertinent citations from intricate case descriptions has demanded manual effort and keyword-based search applications that require expertise in legal jargon. Legal case descriptions hold pivotal information for legal professionals and researchers, necessitating more efficient and automated approaches. We propose a methodology that combines natural language processing (NLP) and machine learning techniques to enhance the organization and utilization of legal case descriptions. This approach revolves around the creation of textual embeddings with the help of state-of-the-art embedding models. Our methodology addresses two primary objectives: unsupervised clustering and supervised citation retrieval, both designed to automate the citation extraction process. Although the proposed methodology can be used for any dataset, we employed the Supreme Court of the United States (SCOTUS) dataset, on which our methodology achieved an accuracy of 90.9%. By automating labor-intensive processes, we pave the way for a more efficient, time-saving, and accessible landscape in legal research, benefiting legal professionals, academics, and researchers.
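
The retrieval step can be sketched as ranking candidate citations by cosine similarity between embeddings. The abstract does not name the embedding model or the exact retrieval rule, so the sketch below takes embeddings as given arrays. Cosine top-k ranking is an assumed choice, and the tiny vectors are hand-made placeholders.

```python
import numpy as np


def top_k_citations(query_vec, citation_matrix, k):
    """Rank candidate citations by cosine similarity to a case-description embedding.

    query_vec: (d,); citation_matrix: (n, d), one row per candidate citation.
    Returns the indices of the k most similar rows, most similar first.
    """
    q = query_vec / np.linalg.norm(query_vec)
    c = citation_matrix / np.linalg.norm(citation_matrix, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores)[:k].tolist()


citations = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]])  # placeholder embeddings
query = np.array([1.0, 0.0, 0.0])
ranked = top_k_citations(query, citations, k=2)
```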

翻訳日:2024-06-09 15:49:54 公開日:2024-05-28

# FinEmbedDiff:マルチモーダル埋め込みモデルを用いたベクトルサンプリングによる財務文書分類の費用効果

FinEmbedDiff: A Cost-Effective Approach of Classifying Financial Documents with Vector Sampling using Multi-modal Embedding Models ( http://arxiv.org/abs/2406.01618v1 )

ライセンス: Link先を確認

Anjanava Biswas, Wrick Talukdar,

(参考訳) テキスト、表、チャート、画像を含むマルチモーダル財務文書の正確な分類は極めて重要であるが、難しい。従来のテキストベースのアプローチは、これらの文書の複雑なマルチモーダルな性質を捉えるのに失敗することが多い。本研究では,FinEmbedDiffを提案する。FinEmbedDiffは,事前学習したマルチモーダル埋め込みモデルを利用して財務文書を分類する,コスト効率の高いベクトルサンプリング手法である。提案手法は,文書に対するマルチモーダル埋め込みベクトルを生成し,ベクトル類似度を用いた事前計算されたクラス埋め込みと比較する。大規模なデータセットに基づいて評価したFinEmbedDiffは、最先端のベースラインと比較して、競合する分類精度を実現し、計算コストを大幅に削減する。この方法は強力な一般化能力を示し、現実の金融アプリケーションにとって実用的でスケーラブルなソリューションである。

Accurate classification of multi-modal financial documents, containing text, tables, charts, and images, is crucial but challenging. Traditional text-based approaches often fail to capture the complex multi-modal nature of these documents. We propose FinEmbedDiff, a cost-effective vector sampling method that leverages pre-trained multi-modal embedding models to classify financial documents. Our approach generates multi-modal embedding vectors for documents, and compares new documents with pre-computed class embeddings using vector similarity measures. Evaluated on a large dataset, FinEmbedDiff achieves competitive classification accuracy compared to state-of-the-art baselines while significantly reducing computational costs. The method exhibits strong generalization capabilities, making it a practical and scalable solution for real-world financial applications.
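
The classification step described above can be sketched as a comparison between a new document's embedding and pre-computed class embeddings. In this sketch, each class embedding is assumed to be the normalized mean of that class's document vectors, and cosine similarity is assumed as the measure. The abstract does not specify the "vector sampling" details or the embedding model, and the two-dimensional vectors and class names are hypothetical.

```python
import numpy as np


def build_class_embeddings(embeddings, labels):
    """Pre-compute one embedding per class as the normalized mean of its documents' vectors."""
    classes = sorted(set(labels))
    labels = np.asarray(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    return classes, centroids


def classify(doc_embedding, classes, centroids):
    """Assign the class whose pre-computed embedding has the highest cosine similarity."""
    v = doc_embedding / np.linalg.norm(doc_embedding)
    return classes[int(np.argmax(centroids @ v))]


embeddings = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]])
labels = ["invoice", "invoice", "report", "report"]
classes, centroids = build_class_embeddings(embeddings, labels)
predicted = classify(np.array([0.8, 0.2]), classes, centroids)
```

Classifying a new document then costs one embedding call plus a dot product per class. This is consistent with the cost argument in the abstract.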

翻訳日:2024-06-09 15:49:54 公開日:2024-05-28

# PPOベースの言語モデルはハック可能か?

Are PPO-ed Language Models Hackable? ( http://arxiv.org/abs/2406.02577v1 )

ライセンス: Link先を確認

Suraj Anand, David Getzen,

(参考訳) 好ましくない振る舞いを取り除くよう言語モデルを$\textit{align}$するために、多くのアルゴリズムが提案されている。しかし、非常に大きな状態空間と適切な報酬関数を作成することに関連する課題は、しばしば様々なジェイルブレイクを引き起こす。本稿では,肯定的な感情言語生成の制御における報酬の効果を検討することを目的とする。人間のフィードバックに基づく報酬モデルのオンライントレーニングの代わりに、静的学習された感情分類器を用いる。また、トレーニング後にモデルの重みとアクティベーションがエンドユーザに露出する環境についても検討する。肯定的な感情応答を促進するために近接方策最適化(PPO)を適用する前後で、事前学習したGPT-2を機械的解釈可能性のレンズを通して検証する。これらの知見を用いて、(1)PPO-edモデルを「ハック」して負の感情反応を生成し、(2)報酬関数に項を加えて「負の」重みを変えようとする。

Numerous algorithms have been proposed to $\textit{align}$ language models to remove undesirable behaviors. However, the challenges associated with a very large state space and with creating a proper reward function often result in various jailbreaks. Our paper aims to examine this effect of the reward in the controlled setting of positive-sentiment language generation. Instead of online training of a reward model based on human feedback, we employ a statically learned sentiment classifier. We also consider a setting where our model's weights and activations are exposed to an end user after training. We examine a pretrained GPT-2 through the lens of mechanistic interpretability before and after proximal policy optimization (PPO) has been applied to promote positive-sentiment responses. Using these insights, we (1) attempt to "hack" the PPO-ed model to generate negative-sentiment responses and (2) add a term to the reward function to try to alter "negative" weights.

翻訳日:2024-06-09 15:49:54 公開日:2024-05-28

# フェアLLMの不可能性

The Impossibility of Fair LLMs ( http://arxiv.org/abs/2406.03198v1 )

ライセンス: Link先を確認

Jacy Anthis, Kristian Lum, Michael Ekstrand, Avi Feller, Alexander D'Amour, Chenhao Tan,

(参考訳) 公正なAIの必要性は、ChatGPTやGemini、その他の大規模言語モデル(LLM)といった汎用システムの時代において、ますます明確になっている。しかしながら、人間とAIの相互作用の複雑さの増大とその社会的影響は、どのように公正性標準を適用することができるのかという疑問を提起している。本稿では、機械学習研究者が、グループフェアネスやフェア表現など、フェアネスを評価するのに用いた技術的枠組みを概観し、LLMへの適用には固有の制約があることを見出した。それぞれのフレームワークがLLMに論理的に拡張していないか、あるいはLLMにとって難解な公平性の概念を提示しているかを示す。これらの課題に対処するため、我々は、特にユースケースにおいて公正を達成するためのより現実的な目標、すなわち、コンテキストの臨界性、LLM開発者の責任、そして、設計と評価の反復的なプロセスにおけるステークホルダーの参加の必要性に関するガイドラインを開発する。さらに、最終的には、スケーラブルなAIアシストアライメントの形式として、フェアネスの課題に対処するために、AIシステムの汎用能力を使用する必要さえある。

The need for fair AI is increasingly clear in the era of general-purpose systems such as ChatGPT, Gemini, and other large language models (LLMs). However, the increasing complexity of human-AI interaction and its social impacts have raised questions of how fairness standards could be applied. Here, we review the technical frameworks that machine learning researchers have used to evaluate fairness, such as group fairness and fair representations, and find that their application to LLMs faces inherent limitations. We show that each framework either does not logically extend to LLMs or presents a notion of fairness that is intractable for LLMs, primarily due to the multitude of affected populations, sensitive attributes, and use cases. To address these challenges, we develop guidelines for the more realistic goal of achieving fairness in particular use cases, emphasizing the criticality of context, the responsibility of LLM developers, and the need for stakeholder participation in an iterative process of design and evaluation. Moreover, it may eventually be possible and even necessary to use the general-purpose capabilities of AI systems to address fairness challenges as a form of scalable AI-assisted alignment.

翻訳日:2024-06-09 15:49:54 公開日:2024-05-28

# ADR-BC: 敵対的密度重み付き回帰による行動クローニング

ADR-BC: Adversarial Density Weighted Regression Behavior Cloning ( http://arxiv.org/abs/2405.20351v1 )

ライセンス: Link先を確認

Ziqi Zhang, Zifeng Zhuang, Donglin Wang, Jingzehua Xu, Miao Liu, Shuai Zhang,

(参考訳) 通常、従来の模倣学習(IL)手法は、まず報酬関数やQ関数を整形し、次にこの整形された関数を強化学習(RL)フレームワークで使用して経験的方策を最適化する。しかし、整形された報酬/Q関数が真の報酬/Q関数を適切に表現していない場合、多段階のRLフレームワーク内で方策を更新すると累積バイアスが発生し、さらに方策学習に影響を及ぼす可能性がある。行動クローニング(BC)を利用して、一段階の更新でいくつかのデモンストレーションを直接模倣することで方策を学習すれば累積バイアスを避けることができるが、BCはデモンストレーションされた行動を貪欲に模倣する傾向があり、未知の状態行動ペアへの汎化能力が制限される。これらの課題に対処するため,拡張された密度ベースの行動サポートによって行動クローニングを強化し、この拡張サポートを用いて方策を最適化するADR-BCを提案する。特に、ADR-BCの目的は、準最適分布から離れつつ専門家分布に一致させるという物理的意味を持つ。したがって、ADR-BCはより堅牢な専門家分布マッチングを実現することができる。一方、ADR-BCは1段階の行動クローニングフレームワークであり、多段階のRLフレームワークに関連する累積バイアスを避けている。 ADR-BCの性能を検証するため,我々は広範囲な実験を行った。具体的には、ADR-BCは、Gym-Mujocoドメインのすべてのタスクにおいて、以前の最先端(SOTA)の汎用ILベースラインであるCEILよりも10.5%改善している。さらに、AdroitドメインとKitchenドメインの全タスクにおいて、真の報酬を使用するImplicit Q Learning(IQL)よりも89.5%改善している。また,ADR-BCの有効性をさらに示すため,広範なアブレーションを行った。

Typically, traditional Imitation Learning (IL) methods first shape a reward or Q function and then use this shaped function within a reinforcement learning (RL) framework to optimize the empirical policy. However, if the shaped reward/Q function does not adequately represent the ground-truth reward/Q function, updating the policy within a multi-step RL framework may result in cumulative bias, further impacting policy learning. Although utilizing behavior cloning (BC) to learn a policy by directly mimicking a few demonstrations in a single-step updating manner can avoid cumulative bias, BC tends to greedily imitate demonstrated actions, limiting its capacity to generalize to unseen state-action pairs. To address these challenges, we propose ADR-BC, which aims to enhance behavior cloning through augmented density-based action support, optimizing the policy with this augmented support. Specifically, the objective of ADR-BC can be interpreted as matching the expert distribution while diverging from the sub-optimal distribution. Therefore, ADR-BC can achieve more robust expert distribution matching. Meanwhile, as a one-step behavior cloning framework, ADR-BC avoids the cumulative bias associated with multi-step RL frameworks. To validate the performance of ADR-BC, we conduct extensive experiments. Specifically, ADR-BC shows a 10.5% improvement over the previous state-of-the-art (SOTA) generalized IL baseline, CEIL, across all tasks in the Gym-Mujoco domain. Additionally, it achieves an 89.5% improvement over Implicit Q Learning (IQL) using real rewards across all tasks in the Adroit and Kitchen domains. Finally, we conduct extensive ablations to further demonstrate the effectiveness of ADR-BC.
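
A density-weighted regression objective of the kind described above can be illustrated in a single step. Each dataset action gets a weight that grows when the action looks expert-like and shrinks when it looks sub-optimal. The policy is then fit by weighted regression, with no multi-step bootstrapping. The specific weight below, a tempered density ratio $\exp((\log p_E - \log p_S)/\tau)$ normalized to mean one, is a hypothetical choice. The abstract does not give ADR-BC's exact objective, and the density estimates are taken as inputs.

```python
import numpy as np


def density_weighted_bc_loss(pred_actions, actions, log_p_expert, log_p_suboptimal, temperature=1.0):
    """Weighted behavior-cloning regression loss (hypothetical sketch).

    Each dataset action is weighted by exp((log p_E - log p_S) / temperature),
    normalized to mean one, so expert-like actions are imitated more strongly
    and sub-optimal-looking actions are down-weighted.
    """
    log_w = (log_p_expert - log_p_suboptimal) / temperature
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w = w / w.mean()
    sq_err = np.sum((pred_actions - actions) ** 2, axis=-1)
    return float(np.mean(w * sq_err))


pred = np.zeros((2, 1))
actions = np.array([[1.0], [2.0]])
plain = density_weighted_bc_loss(pred, actions, np.zeros(2), np.zeros(2))
weighted = density_weighted_bc_loss(pred, actions, np.array([0.0, -10.0]), np.zeros(2))
```

With equal densities the loss reduces to ordinary mean-squared-error BC (2.5 here). When the second action is very unlikely under the expert density, its error is almost ignored and the loss drops to about 1.0.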

翻訳日:2024-06-03 18:44:15 公開日:2024-05-28

# スペクトル匿名化の漸近的有用性

Asymptotic utility of spectral anonymization ( http://arxiv.org/abs/2405.20779v1 )

ライセンス: Link先を確認

Katariina Perkonoja, Joni Virta,

(参考訳) 現代のデータランドスケープでは、複数ソースのデータ収集とサードパーティの共有が特徴であり、個人のプライバシを確保することが重要な関心事である。様々な匿名化手法が存在するが、それらのユーティリティ保存とプライバシ保証は定量化が難しいままである。本研究では、スペクトル匿名化(SA)アルゴリズムの有用性とプライバシを、特に漸近的なフレームワークで研究することで、このギャップに対処する。元のデータを直接修正する従来の匿名化手法とは異なり、SAはデータをスペクトル基底で摂動させ、その後元の基底に戻す。ランダムな置換変換を用いる原版の $\mathcal{P}$-SA に加えて、符号変換と直交行列変換をそれぞれ用いる2つの新しいSA変種: $\mathcal{J}$-spectral anonymization と $\mathcal{O}$-spectral anonymization を導入する。いくつかの現実的な仮定の下で、これらのSAアルゴリズムが元のデータの1次および2次モーメントをどの程度保存するかを示す。特に, 共分散推定における3つのSAアルゴリズムの漸近効率は, 原データと比較して正確に50%であることがわかった。これらの漸近的結果の適用性を評価するために,有限データを用いたシミュレーション研究を行い,距離ベースのレコードリンクを用いて,これらのアルゴリズムが提供するプライバシー保護を評価する。我々の研究は、有限サンプルユーティリティにおいて明確な優位性を示す手法は存在しないが、$\mathcal{O}$-SAは、計算複雑性が増大するものの、同一のレコードを決して生成しないという例外的なプライバシー保護の点で際立っていることを明らかにしている。逆に$\mathcal{P}$-SA は計算効率の良い代替手法であり、平均推定において比類のない効率を示す。

In the contemporary data landscape characterized by multi-source data collection and third-party sharing, ensuring individual privacy stands as a critical concern. While various anonymization methods exist, their utility preservation and privacy guarantees remain challenging to quantify. In this work, we address this gap by studying the utility and privacy of the spectral anonymization (SA) algorithm, particularly in an asymptotic framework. Unlike conventional anonymization methods that directly modify the original data, SA operates by perturbing the data in a spectral basis and subsequently reverting them to their original basis. Alongside the original version $\mathcal{P}$-SA, employing random permutation transformation, we introduce two novel SA variants: $\mathcal{J}$-spectral anonymization and $\mathcal{O}$-spectral anonymization, which employ sign-change and orthogonal matrix transformations, respectively. We show how well, under some practical assumptions, these SA algorithms preserve the first and second moments of the original data. Our results reveal, in particular, that the asymptotic efficiency of all three SA algorithms in covariance estimation is exactly 50% when compared to the original data. To assess the applicability of these asymptotic results in practice, we conduct a simulation study with finite data and also evaluate the privacy protection offered by these algorithms using distance-based record linkage. Our research reveals that while no method exhibits clear superiority in finite-sample utility, $\mathcal{O}$-SA distinguishes itself for its exceptional privacy preservation, never producing identical records, albeit with increased computational complexity. Conversely, $\mathcal{P}$-SA emerges as a computationally efficient alternative, demonstrating unmatched efficiency in mean estimation.
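
The perturb-in-a-spectral-basis-then-revert procedure can be sketched for the permutation variant. The choices below are assumptions, not taken from this abstract: the SVD of the centered data serves as the spectral basis, and each column of the left singular vectors is permuted independently. The $\mathcal{J}$- and $\mathcal{O}$-variants would replace the permutation with a sign-change or orthogonal transformation, respectively.

```python
import numpy as np


def permutation_spectral_anonymize(X, rng):
    """Sketch of permutation-based spectral anonymization (P-SA).

    1. Center the data and take its SVD: Xc = U S V^T (the spectral basis).
    2. Independently permute the entries of each column of U.
    3. Map back to the original basis and restore the column means.
    """
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    U_perturbed = np.column_stack([rng.permutation(U[:, j]) for j in range(U.shape[1])])
    return U_perturbed @ np.diag(s) @ Vt + mean


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 2.0, 3.0]) + np.array([5.0, -1.0, 0.0])
Z = permutation_spectral_anonymize(X, rng)
```

In this sketch the column means and the total sum of squares are preserved exactly. The individual covariance entries are not, because the permuted columns of $U$ are no longer exactly orthogonal. Covariance estimation is the quantity for which the abstract reports 50% asymptotic efficiency.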

翻訳日:2024-06-03 18:05:14 公開日:2024-05-28

# 言語モデル透かしのブラックボックス検出

Black-Box Detection of Language Model Watermarks ( http://arxiv.org/abs/2405.20777v1 )

ライセンス: Link先を確認

Gloaguen Thibaud, Jovanović Nikola, Staab Robin, Vechev Martin,

(参考訳) 透かしはLLM生成テキストを検出するための有望な方法として登場した。透かしを適用するために、LLMプロバイダは秘密鍵を用いて生成テキストに信号を付加し、同じ鍵を持つ者は後からその信号を検出できる。最近の研究は3つの主要な透かし方式を提案しており、そのうち2つはLLM分布の保存性に焦点を当てている。これは、LLMの能力を維持するための扱いやすい代理指標であると同時に、透かしの配置を隠すことで、悪意のあるアクターが特定のLLMを避けたり、その透かしを攻撃したりすることで誤用を隠すのが難しくなるという考えによっても動機づけられている。しかし、検出可能性に関して多くの議論があるにもかかわらず、これらのスキームファミリーのうちどれかが現実的なブラックボックス設定で検出可能かどうかを調査した先行研究はない。我々はこの問題に初めて取り組み、限られた数のブラックボックスクエリのみを用いて、最も普及している3つの透かしスキーム群すべての存在を検出するための厳密な統計的検定を開発した。提案手法の有効性を,さまざまなスキームと多種多様なオープンソースモデルを用いて実験的に確認した。以上の結果から,現在の透かし方式は従来考えられていたよりも検出しやすく、また、透かしを配備した事実を隠すことは、プロバイダが敵から身を守るための有効な方法ではない可能性が示唆された。さらに、GPT-4、Claude 3、Gemini 1.0 Proといった最も一般的な公開APIの背後に透かしが存在するかを検定するために本手法を適用し、現時点では透かしの強い証拠は見つからなかった。

Watermarking has emerged as a promising way to detect LLM-generated text. To apply a watermark, an LLM provider, given a secret key, augments generations with a signal that is later detectable by any party with the same key. Recent work has proposed three main families of watermarking schemes, two of which focus on the property of preserving the LLM distribution. This property is motivated by its being a tractable proxy for maintaining LLM capabilities, but also by the idea that concealing a watermark deployment makes it harder for malicious actors to hide misuse by avoiding a certain LLM or attacking its watermark. Yet, despite much discourse around detectability, no prior work has investigated whether any of these scheme families are detectable in a realistic black-box setting. We tackle this for the first time, developing rigorous statistical tests to detect the presence of all three of the most popular watermarking scheme families using only a limited number of black-box queries. We experimentally confirm the effectiveness of our methods on a range of schemes and a diverse set of open-source models. Our findings indicate that current watermarking schemes are more detectable than previously believed, and that obscuring the fact that a watermark was deployed may not be a viable way for providers to protect against adversaries. We further apply our methods to test for watermark presence behind the most popular public APIs (GPT-4, Claude 3, and Gemini 1.0 Pro), finding no strong evidence of a watermark at this point in time.
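
For contrast with the black-box setting studied here, the sketch below shows the standard keyed detection test for one widely used family, the red-green ("green list") watermark of Kirchenbauer et al. (2023). A party holding the key counts how many tokens fall in the key-dependent green list. Without a watermark, that count is approximately Binomial$(T, \gamma)$, so a one-proportion z-test applies. This is not the paper's black-box test, which works without the key, and the counts below are illustrative.

```python
import math


def green_list_z_score(num_green, num_tokens, gamma):
    """One-proportion z-test used by red-green ("green list") watermark detectors.

    Without a watermark, each token lands in the key-dependent green list with
    probability gamma, so the green count is approximately Binomial(T, gamma).
    """
    expected = gamma * num_tokens
    std = math.sqrt(num_tokens * gamma * (1.0 - gamma))
    return (num_green - expected) / std


z_watermarked = green_list_z_score(150, 200, gamma=0.5)  # far more green tokens than chance
z_unmarked = green_list_z_score(100, 200, gamma=0.5)     # exactly the chance level
```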

翻訳日:2024-06-03 14:37:39 公開日:2024-05-28

# 安全性を整合させたLLMに対する敵対的事例の生成改善

Improved Generation of Adversarial Examples Against Safety-aligned LLMs ( http://arxiv.org/abs/2405.20778v1 )

ライセンス: Link先を確認

Qizhang Li, Yiwen Guo, Wangmeng Zuo, Hao Chen,

Despite numerous efforts to ensure that large language models (LLMs) adhere to safety standards and produce harmless content, attacks that bypass these restrictions, known as jailbreak attacks against LLMs, have achieved some success. Adversarial prompts generated with gradient-based methods perform outstandingly in automated jailbreak attacks. Nevertheless, because text is discrete, the input gradient of an LLM struggles to reflect precisely the magnitude of the loss change that results from token replacements in the prompt. This limits attack success rates against safety-aligned LLMs, even in the white-box setting. In this paper, we explore a new perspective on this problem, suggesting that it can be alleviated by leveraging innovations inspired by transfer-based attacks, which were originally proposed for attacking black-box image classification models. For the first time, we adopt the ideas behind effective methods among these transfer-based attacks, i.e., the Skip Gradient Method and the Intermediate Level Attack, to improve the effectiveness of automatically generated adversarial examples against white-box LLMs. With appropriate adaptations, we incorporate these ideas into gradient-based adversarial prompt generation and achieve significant performance gains without introducing noticeable computational cost. Moreover, by discussing the mechanisms behind the gains, we draw new insights and develop suitable combinations of these methods. Our empirical results show that the developed combination achieves a >30% absolute increase in attack success rate compared with GCG when attacking the Llama-2-7B-Chat model on AdvBench.

Translated: 2024-06-03 14:37:39 | Published: 2024-05-28

# Universal Exact Compression of Differentially Private Mechanisms ( http://arxiv.org/abs/2405.20782v1 )

License: see link

Yanxiao Liu, Wei-Ning Chen, Ayfer Özgür, Cheuk Ting Li

To reduce the communication cost of differential privacy mechanisms, we introduce a novel construction, called Poisson private representation (PPR), designed to compress and simulate any local randomizer while ensuring local differential privacy. Unlike previous simulation-based local differential privacy mechanisms, PPR exactly preserves the joint distribution of the data and the output of the original local randomizer. Hence, the PPR-compressed privacy mechanism retains all desirable statistical properties of the original privacy mechanism, such as unbiasedness and Gaussianity. Moreover, PPR achieves a compression size within a logarithmic gap of the theoretical lower bound. Using PPR, we give a new order-wise trade-off among communication, accuracy, and central and local differential privacy for distributed mean estimation. Experimental results on distributed mean estimation show that PPR consistently gives a better trade-off among communication, accuracy, and central differential privacy than the coordinate-subsampled Gaussian mechanism, while also providing local differential privacy.

Translated: 2024-06-03 14:37:39 | Published: 2024-05-28

# Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition ( http://arxiv.org/abs/2405.19373v1 )

License: see link

Yihang Dong, Xuhang Chen, Yanyan Shen, Michael Kwok-Po Ng, Tao Qian, Shuqiang Wang

Emotion recognition based on electroencephalography (EEG) has gained significant attention and developed in diverse directions in fields such as neural signal processing and affective computing. However, each individual's unique brain anatomy leads to non-negligible natural differences in EEG signals across subjects, which poses challenges for cross-subject emotion recognition. While recent studies have attempted to address these issues, they still face limitations in practical effectiveness and in the unity of their model frameworks. Current methods often struggle to capture the complex spatial-temporal dynamics of EEG signals and fail to effectively integrate multimodal information, resulting in suboptimal performance and limited generalizability across subjects. To overcome these limitations, we develop a pre-trained-model-based Multimodal Mood Reader for cross-subject emotion recognition that uses masked brain signal modeling and an interlinked spatial-temporal attention mechanism. The model learns universal latent representations of EEG signals through pre-training on a large-scale dataset. It then employs the interlinked spatial-temporal attention mechanism to process differential entropy (DE) features extracted from EEG data. Subsequently, a multi-level fusion layer is proposed to integrate the discriminative features, maximizing the advantages of features across different dimensions and modalities. Extensive experiments on public datasets demonstrate Mood Reader's superior performance in cross-subject emotion recognition tasks, outperforming state-of-the-art methods. Additionally, the model is dissected from an attention perspective, providing a qualitative analysis of emotion-related brain areas and offering valuable insights for affective research in neural signal processing.

Translated: 2024-05-31 19:45:41 | Published: 2024-05-28
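The abstract mentions differential entropy (DE) features. In EEG emotion-recognition work these are commonly computed per channel and frequency band under a Gaussian assumption. In that case DE reduces to 0.5 * ln(2 * pi * e * sigma^2), where sigma^2 is the segment's variance. A minimal sketch follows. It assumes the segment has already been band-pass filtered, and it is not the authors' feature pipeline. The function name is hypothetical.

```python
import math
import statistics


def differential_entropy(segment: list[float]) -> float:
    """DE of one band-filtered EEG segment, assuming the samples are Gaussian.

    For a Gaussian with variance sigma^2, DE = 0.5 * ln(2 * pi * e * sigma^2).
    """
    variance = statistics.pvariance(segment)  # population variance of the segment
    return 0.5 * math.log(2 * math.pi * math.e * variance)
```

In practice this would be applied to each (channel, band) pair to build the feature vector fed to the model.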

# Optimal Multiclass U-Calibration Error and Beyond ( http://arxiv.org/abs/2405.19374v1 )

License: see link

Haipeng Luo, Spandan Senapati, Vatsal Sharan

We consider the problem of online multiclass U-calibration, where a forecaster aims to make sequential distributional predictions over $K$ classes with low U-calibration error, that is, low regret with respect to all bounded proper losses simultaneously. Kleinberg et al. (2023) developed an algorithm with U-calibration error $O(K\sqrt{T})$ after $T$ rounds and raised the open question of what the optimal bound is. We resolve this question by showing that the optimal U-calibration error is $\Theta(\sqrt{KT})$ -- we start with a simple observation that the Follow-the-Perturbed-Leader algorithm of Daskalakis and Syrgkanis (2016) achieves this upper bound, followed by a matching lower bound constructed with a specific proper loss (which, as a side result, also proves the optimality of the algorithm of Daskalakis and Syrgkanis (2016) in the context of online learning against an adversary with finite choices). We also strengthen our results under natural assumptions on the loss functions, including $\Theta(\log T)$ U-calibration error for Lipschitz proper losses, $O(\log T)$ U-calibration error for a certain class of decomposable proper losses, U-calibration error bounds for proper losses with a low covering number, and others.

Translated: 2024-05-31 19:35:57 | Published: 2024-05-28
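For reference, the abstract describes U-calibration error as "low regret with respect to all bounded proper losses simultaneously". That quantity can be written as below, where $p_t \in \Delta_K$ is the forecast at round $t$, $y_t$ the realized class, and $\mathcal{L}$ the set of bounded proper losses. The notation is ours, and some formulations place an expectation over the forecaster's randomness inside the supremum. Treat this as a sketch of the definition rather than the paper's exact statement.

$$
\mathrm{UCal}_T \;=\; \sup_{\ell \in \mathcal{L}} \left( \sum_{t=1}^{T} \ell(p_t, y_t) \;-\; \min_{p \in \Delta_K} \sum_{t=1}^{T} \ell(p, y_t) \right)
$$

In this notation, the earlier bound of Kleinberg et al. (2023) is $\mathrm{UCal}_T = O(K\sqrt{T})$. The paper shows that the optimal rate is $\Theta(\sqrt{KT})$.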

# Improving global awareness of linkset predictions using Cross-Attentive Modulation tokens ( http://arxiv.org/abs/2405.19375v1 )

License: see link

Félix Marcoccia, Cédric Adjih, Paul Mühlethaler

Most multiple-link prediction and graph generation techniques rely on the attention mechanism or on Graph Neural Networks (GNNs), which leverage node-level information exchanges to form link predictions. Such node-level interactions do not process nodes as an ordered sequence, which would imply some natural ordering of the nodes; they are therefore said to be permutation-invariant mechanisms. They are well suited to graph problems but struggle to provide a global orchestration of the predicted links, which can degrade performance. One typical issue is the difficulty of ensuring high-level properties such as global connectedness or a fixed diameter. Another is the difficulty of avoiding information bottleneck effects such as oversmoothing and oversquashing. These consist, respectively, of excessive smoothing in dense areas, which leads to a loss of information, and a tendency to exclude isolated nodes from the message-passing scheme. Both often result in irrelevant, unbalanced link predictions. To tackle this problem, we present Cross-Attentive Modulation (CAM) tokens, which introduce cross-attentive units used to condition node- and edge-level modulations, enabling context-aware computations that improve the global consistency of the predicted links. We implement CAM tokens on several permutation-invariant architectures and present benchmarks that demonstrate the merits of our work.

Translated: 2024-05-31 19:35:56 | Published: 2024-05-28

# PureEBM: Universal Poison Purification via Mid-Run Dynamics of Energy-Based Models ( http://arxiv.org/abs/2405.19376v1 )

License: see link

Omead Pooladzandi, Jeffrey Jiang, Sunay Bhat, Gregory Pottie

Data poisoning attacks pose a significant threat to the integrity of machine learning models: by injecting adversarial examples during training, they cause target-distribution test data to be misclassified. Existing state-of-the-art (SoTA) defense methods suffer from a variety of limitations. These include significantly reduced generalization performance, specificity to particular attack types and classifiers, and significant overhead during training, which make them impractical or limited for real-world applications. In response to this challenge, we introduce a universal data purification method that defends naturally trained classifiers from malicious white-, gray-, and black-box image poisons. It applies a universal stochastic preprocessing step $\Psi_{T}(x)$, realized by iterative Langevin sampling of a convergent Energy-Based Model (EBM) initialized with an image $x$. The mid-run dynamics of $\Psi_{T}(x)$ purify poison information with minimal impact on features important to the generalization of a classifier network. We show that the contrastive learning process of EBMs allows them to remain universal purifiers, even in the presence of poisoned EBM training data. EBMs also achieve SoTA defense against the leading triggered poison, Narcissus, and the triggerless poisons Gradient Matching and Bullseye Polytope. This work is a subset of a larger framework introduced in PureGen, with a more detailed focus on EBM purification and poison defense.

Translated: 2024-05-31 19:35:56 | Published: 2024-05-28
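The preprocessing step $\Psi_{T}(x)$ is described as Langevin sampling of an EBM started at the input image. A minimal sketch of that kind of update is below. It assumes the common form x <- x - (eta / 2) * grad E(x) + sqrt(eta) * xi with xi ~ N(0, I). The toy quadratic energy, step size, and step count are hypothetical stand-ins for a trained EBM and the paper's settings, and the function names are illustrative.

```python
import random


def langevin_purify(x, grad_energy, steps=150, step_size=0.01, noise=True, seed=0):
    """Run a fixed, short number of Langevin steps starting from the input x.

    Update: x <- x - (step_size / 2) * grad E(x) + sqrt(step_size) * N(0, 1).
    """
    rng = random.Random(seed)
    x = list(x)
    scale = step_size ** 0.5
    for _ in range(steps):
        g = grad_energy(x)
        x = [
            xi - 0.5 * step_size * gi + (scale * rng.gauss(0.0, 1.0) if noise else 0.0)
            for xi, gi in zip(x, g)
        ]
    return x


# Toy energy E(x) = 0.5 * ||x - MU||^2 stands in for a trained EBM.
MU = [0.2, -0.4, 0.7]


def toy_grad(x):
    return [xi - mi for xi, mi in zip(x, MU)]
```

Stopping after a fixed, modest number of steps, instead of sampling to equilibrium, corresponds loosely to the "mid-run" regime the abstract describes. With `noise=False` the loop reduces to plain gradient descent on the energy.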

# Statistical Linear Models in Virus Genomic Alignment-free Classification: Application to Hepatitis C Viruses ( http://arxiv.org/abs/1910.05421v3 )

License: see link

Amine M. Remita, Abdoulaye Baniré Diallo

Viral sequence classification is an important task in pathogen detection, epidemiological surveys, and evolutionary studies. Statistical learning methods are widely used to classify and identify viral sequences in environmental samples. These methods face several challenges associated with the nature and properties of viral genomes, such as recombination, mutation rate, and diversity. In addition, new generations of sequencing technologies raise further difficulties by generating massive amounts of fragmented sequences. Linear classifiers are often used to classify viruses. However, the accuracy space of existing models has not been thoroughly explored for alignment-free approaches. In this study, we present an exhaustive assessment procedure that explores the power of linear classifiers in genotyping and subtyping partial and complete genomes, and we apply it to hepatitis C viruses (HCV). This investigation considers several variables: classifier types (generative and discriminative) and their hyperparameters (smoothing value and regularization penalty function), the classification task (genotyping and subtyping), the length of the tested sequences (partial and complete), and the length of k-mer words. Overall, several classifiers perform well given precise combinations of the experimental variables mentioned above. Finally, we provide the procedure and benchmark data to allow more robust assessment of classification from virus genomes.

Translated: 2024-05-31 02:51:07 | Published: 2024-05-28
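The study compares several generative linear classifiers. As one concrete example, a multinomial naive Bayes model over k-mer counts with additive smoothing can be sketched as follows. The k-mer length `k` and smoothing value `alpha` correspond to variables listed in the abstract. The defaults, function names, and handling of unseen k-mers are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers (alignment-free representation of a sequence)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def train_nb(labeled_seqs, k=3, alpha=1.0):
    """Multinomial naive Bayes on k-mer counts with additive (Laplace) smoothing alpha."""
    class_counts, class_docs, vocab = {}, Counter(), set()
    for seq, label in labeled_seqs:
        c = kmer_counts(seq, k)
        class_counts.setdefault(label, Counter()).update(c)
        class_docs[label] += 1
        vocab.update(c)
    n_docs = sum(class_docs.values())
    model = {}
    for label, counts in class_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        log_probs = {km: math.log((counts[km] + alpha) / denom) for km in vocab}
        model[label] = (math.log(class_docs[label] / n_docs), log_probs)
    return model


def predict_nb(model, seq, k=3):
    counts = kmer_counts(seq, k)

    def score(label):
        log_prior, log_probs = model[label]
        # k-mers never seen in training are skipped in this sketch.
        return log_prior + sum(n * log_probs[km] for km, n in counts.items() if km in log_probs)

    return max(model, key=score)
```

Changing `k` and `alpha` and re-evaluating on partial versus complete genomes mirrors, in miniature, the kind of grid the study explores.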

# Robustness of critical U(1) spin liquids and emergent symmetries in tensor networks ( http://arxiv.org/abs/2008.04833v2 )

License: see link

Henrik Dreyer, Laurens Vanderstraeten, Ji-Yao Chen, Ruben Verresen, Norbert Schuch

We study the response of critical Resonating Valence Bond (RVB) spin liquids to doping with longer-range singlets, and more generally of U(1)-symmetric tensor networks to non-symmetric perturbations. Using a field theory description, we find that in the RVB, doping constitutes a relevant perturbation which immediately opens up a gap, contrary to previous observations. Our analysis predicts a very large correlation length even at significant doping, which we verify using high-accuracy numerical simulations. This emphasizes the need for careful analysis, but also justifies the use of such states as a variational ansatz for critical systems. Finally, we give an example of a PEPS where non-symmetric perturbations do not open up a gap and the U(1) symmetry re-emerges.

Translated: 2024-05-31 02:51:07 | Published: 2024-05-28

# Scalable Video Object Segmentation with Identification Mechanism ( http://arxiv.org/abs/2203.11442v8 )

License: see link

Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang

This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS). Previous VOS methods decode features with a single positive object. This limits the learning of multi-object representations, because they must match and segment each target separately in multi-object scenarios. Additionally, earlier techniques catered to specific application objectives and lacked the flexibility to meet different speed-accuracy requirements. To address these problems, we present two innovative approaches, Associating Objects with Transformers (AOT) and Associating Objects with Scalable Transformers (AOST). In pursuing effective multi-object modeling, AOT introduces the IDentification (ID) mechanism to assign each object a unique identity. This approach enables the network to model the associations among all objects simultaneously, thus facilitating the tracking and segmentation of objects in a single network pass. To address the challenge of inflexible deployment, AOST further integrates scalable long short-term transformers that incorporate scalable supervision and layer-wise ID-based attention. This enables online architecture scalability in VOS for the first time and overcomes the representation limitations of ID embeddings. Given the absence of a VOS benchmark with dense multi-object annotations, we propose the challenging Video Object Segmentation in the Wild (VOSW) benchmark to validate our approaches. We evaluated various AOT and AOST variants in extensive experiments on VOSW and five commonly used VOS benchmarks: YouTube-VOS 2018 & 2019 Val, DAVIS-2017 Val & Test, and DAVIS-2016. Our approaches surpass state-of-the-art competitors and consistently display exceptional efficiency and scalability across all six benchmarks. Project page: https://github.com/yoxu515/aot-benchmark.

Translated: 2024-05-31 02:51:07 | Published: 2024-05-28
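The ID mechanism can be pictured roughly as follows. Each object in the reference mask receives a distinct embedding drawn from an identity bank. The one-hot masks are then multiplied by those embeddings, so all objects are encoded in a single map that the network can process in one pass. The sketch below uses plain lists. The bank contents, the random choice of identities, and zero vectors for pixels outside every object are illustrative assumptions, not AOT's exact implementation. The function name is hypothetical.

```python
import random


def assign_ids(masks, id_bank, rng=None):
    """Map each object's mask to an identity embedding drawn from id_bank.

    masks:   list of N flattened binary masks, one per object (length P each)
    id_bank: list of M >= N embedding vectors (length C each)
    Returns (chosen identity indices, per-pixel embeddings of shape P x C).
    Pixels belonging to no object get a zero vector in this sketch.
    """
    rng = rng or random.Random(0)
    chosen = rng.sample(range(len(id_bank)), len(masks))  # distinct identity per object
    dim, n_pixels = len(id_bank[0]), len(masks[0])
    out = [[0.0] * dim for _ in range(n_pixels)]
    for obj, idx in enumerate(chosen):
        for p in range(n_pixels):
            if masks[obj][p]:
                # Equivalent to one row of (one-hot masks) @ (selected bank rows).
                out[p] = [o + e for o, e in zip(out[p], id_bank[idx])]
    return chosen, out
```

Because every object's identity is written into the same map, adding objects does not require extra forward passes. That property is the point the abstract makes about single-pass multi-object modeling.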

# Exponential sensitivity revival of noisy non-Hermitian quantum sensing with two-photon drives ( http://arxiv.org/abs/2303.16575v2 )

License: see link

Liying Bao, Bo Qi, Franco Nori, Daoyi Dong

Unique properties of multimode non-Hermitian lattice dynamics can be used to construct exponentially sensitive sensors. However, the impact of noise, which may severely degrade their sensitivity, remains unclear. We analytically characterize and highlight the impact of loss and gain on the sensitivity revival and stability of non-Hermitian sensors. It is generally believed that the superiority of quantum sensing vanishes in the presence of loss. Contrary to this belief, we find that proactively tuning the loss can surprisingly regain the exponential sensitivity when the sensing dynamics is stable. Furthermore, we prove that gain is crucial to fully revive the ideal exponential sensitivity and to ensure the stability of non-Hermitian sensing by balancing loss and gain. Our paper opens a way to significantly enhance sensitivity by proactively tuning loss and gain, which may promote future quantum sensing and quantum engineering.

Translated: 2024-05-31 02:41:05 | Published: 2024-05-28

# What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions ( http://arxiv.org/abs/2303.16990v2 )

License: see link

Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne

Spatio-temporal grounding describes the task of localizing events in space and time, e.g., in video data, based on verbal descriptions only. Models for this task are usually trained with human-annotated sentences and bounding box supervision. This work addresses this task from a multimodal supervision perspective. It proposes a framework for spatio-temporal action grounding trained only on loose video and subtitle supervision, without human annotation. To this end, we combine local representation learning, which focuses on leveraging fine-grained spatial information, with a global representation encoding that captures higher-level representations. Both are incorporated in a joint approach. To evaluate this challenging task in a real-life setting, we propose a new benchmark dataset. It provides dense spatio-temporal grounding annotations for over 5K events in long, untrimmed, multi-action instructional videos. We evaluate the proposed approach and other methods on the proposed and standard downstream tasks. Our method improves over current baselines in various settings, including spatial, temporal, and untrimmed multi-action spatio-temporal grounding.

Translated: 2024-05-31 02:41:05 | Published: 2024-05-28

# Expressive Text-to-Image Generation with Rich Text ( http://arxiv.org/abs/2304.06720v3 )

License: see link

Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang

Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on attention maps of a diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance, and maintain its fidelity against plain-text generation through region-based injections. We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.

Translated: 2024-05-31 02:41:05 | Published: 2024-05-28

# Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models ( http://arxiv.org/abs/2307.01379v3 )

License: see link

Jinhao Duan, Hao Cheng, Shiqi Wang, Alex Zavalny, Chenan Wang, Renjing Xu, Bhavya Kailkhura, Kaidi Xu

Large Language Models (LLMs) show promising results in language generation and instruction following but frequently "hallucinate", making their outputs less reliable. Although uncertainty quantification (UQ) offers potential solutions, implementing it accurately within LLMs is challenging. Our research introduces a simple heuristic: not all tokens in auto-regressive LLM text equally represent the underlying meaning, as "linguistic redundancy" often allows a few keywords to convey the essence of long sentences. However, current methods underestimate this inequality when assessing uncertainty, so tokens with limited semantics are weighted equally or even excessively in UQ. To correct this, we propose Shifting Attention to more Relevant (SAR) components at both the token and sentence levels for better UQ. We conduct extensive experiments involving a range of popular "off-the-shelf" LLMs, such as Vicuna, WizardLM, and LLaMA-2-chat, with model sizes of up to 33B parameters. We evaluate various free-form question-answering tasks, encompassing domains such as reading comprehension, science Q&A, and medical Q&A. Our experimental results, coupled with a comprehensive demographic analysis, demonstrate the superior performance of SAR. The code is available at https://github.com/jinhaoduan/SAR.

Translated: 2024-05-31 02:21:25 | Published: 2024-05-28
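At the token level, shifting attention to relevance can be read as weighting each token's negative log-likelihood by a normalized relevance score before summing, instead of averaging uniformly. The sketch below assumes the relevance scores are already available. The paper obtains them from a separate semantic-similarity measure, which is not reproduced here, and the function name is hypothetical.

```python
import math


def token_sar(token_probs, relevance):
    """Relevance-weighted token uncertainty (a TokenSAR-style score).

    token_probs: model probability of each generated token (0 < p <= 1)
    relevance:   non-negative relevance score per token, assumed precomputed
    """
    total = sum(relevance)
    weights = [r / total for r in relevance]  # normalize so weights sum to 1
    return sum(-math.log(p) * w for p, w in zip(token_probs, weights))
```

With uniform relevance, the score reduces to the ordinary mean token negative log-likelihood. Down-weighting a token removes its contribution to the uncertainty estimate.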

# PIGEON: Predicting Image Geolocations ( http://arxiv.org/abs/2307.05845v6 )

License: see link

Lukas Haas, Michal Skreta, Silas Alberti, Chelsea Finn

Planet-scale image geolocalization remains a challenging problem due to the diversity of images originating from anywhere in the world. Although approaches based on vision transformers have made significant progress in geolocalization accuracy, success in prior literature is constrained to narrow distributions of images of landmarks, and performance has not generalized to unseen places. We present a new geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, and a novel loss function. Additionally, our work is the first to perform retrieval over location clusters for guess refinements. We train two models for evaluations on street-level data and general-purpose image geolocalization; the first model, PIGEON, is trained on data from the game of Geoguessr and is capable of placing over 40% of its guesses within 25 kilometers of the target location globally. We also develop a bot and deploy PIGEON in a blind experiment against humans, ranking in the top 0.01% of players. We further challenge one of the world's foremost professional Geoguessr players to a series of six matches with millions of viewers, winning all six games. Our second model, PIGEOTTO, differs in that it is trained on a dataset of images from Flickr and Wikipedia, achieving state-of-the-art results on a wide range of image geolocalization benchmarks, outperforming the previous SOTA by up to 7.7 percentage points on the city accuracy level and up to 38.8 percentage points on the country level. Our findings suggest that PIGEOTTO is the first image geolocalization model that effectively generalizes to unseen places and that our approach can pave the way for highly accurate, planet-scale image geolocalization systems. Our code is available on GitHub.

Translated: 2024-05-31 02:21:25 | Published: 2024-05-28
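The headline street-level metric, the share of guesses within 25 km of the target, relies on great-circle distance. The sketch below uses the haversine formula with a spherical Earth of mean radius 6,371 km, which is an assumption. It illustrates the evaluation metric only, not PIGEON's geocells, retrieval refinement, or loss function. Function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraction_within(guesses, targets, radius_km=25.0):
    """Share of (lat, lon) guesses landing within radius_km of their targets."""
    hits = sum(haversine_km(*g, *t) <= radius_km for g, t in zip(guesses, targets))
    return hits / len(targets)
```

Other thresholds, such as those used for city- or country-level accuracy, can be computed the same way by changing `radius_km`.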

# Action for classical, quantum, closed and open systems ( http://arxiv.org/abs/2307.12320v2 )

License: see link

Janos Polonyi

It is well known that the action functional can be used to define classical, quantum, closed, and open dynamics in a generalization of the variational principle and in the path integral formalism in classical and quantum dynamics, respectively. These schemes are based on an unusual feature, a formal redoubling of the degrees of freedom. Several arguments to motivate the redoubling are put forward in classical and quantum mechanics to demonstrate that such a formalism is natural.

Translated: 2024-05-31 02:21:25 | Published: 2024-05-28

# Characterising semi-Clifford gates using algebraic sets ( http://arxiv.org/abs/2309.15184v2 )

License: see link

Imin Chen, Nadish de Silva

Motivated by their central role in fault-tolerant quantum computation, we study the sets of gates of the third level of the Clifford hierarchy and their distinguished subsets of "nearly diagonal" semi-Clifford gates. The Clifford hierarchy gates can be implemented via gate teleportation given appropriate magic states. The vast quantity of these resource states required for achieving fault-tolerance is a significant bottleneck for the practical realisation of universal quantum computers. Semi-Clifford gates are important because they can be implemented with far more efficient use of these resource states. We prove that every third-level gate of up to two qudits is semi-Clifford. We thus generalise results of Zeng-Chen-Chuang (2008) in the qubit case and of the second author (2020) in the qutrit case to the case of qudits of arbitrary prime dimension $d$. Earlier results relied on exhaustive computations, whereas our present work leverages tools of algebraic geometry. Specifically, we construct two schemes corresponding to the sets of third-level Clifford hierarchy gates and third-level semi-Clifford gates. We then show that the two algebraic sets resulting from reducing these schemes modulo $d$ share the same set of rational points.

Translated: 2024-05-31 02:21:25; Published: 2024-05-28

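The following sketch gives some intuition for the objects involved. It checks the recursive definition of the Clifford hierarchy numerically for a single qubit only (d = 2):

- Level 1 is the Pauli group.
- A gate U is at level k if U P U† lies at level k-1 for every Pauli P.

This is a hypothetical illustration, not the paper's algebraic-geometry method, and the helper names are our own. The T gate is the standard example of a third-level gate that is not Clifford.

```python
import cmath
import math

# Single-qubit matrices stored as 2x2 nested tuples.
ID = ((1, 0), (0, 1))
X = ((0, 1), (1, 0))
Y = ((0, -1j), (1j, 0))
Z = ((1, 0), (0, -1))
PAULIS = (ID, X, Y, Z)


def mul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)) for i in range(2))


def dag(A):
    """Conjugate transpose."""
    return tuple(tuple(complex(A[j][i]).conjugate() for j in range(2)) for i in range(2))


def conj_by(U, P):
    """Return U P U^dagger."""
    return mul(mul(U, P), dag(U))


def is_pauli_up_to_phase(M, tol=1e-9):
    for P in PAULIS:
        # Hilbert-Schmidt overlap tr(P^dagger M) / 2 equals the phase c if M = c P.
        c = sum(complex(P[i][j]).conjugate() * M[i][j] for i in range(2) for j in range(2)) / 2
        if abs(abs(c) - 1) < tol and all(
            abs(M[i][j] - c * P[i][j]) < tol for i in range(2) for j in range(2)
        ):
            return True
    return False


def is_clifford(U):
    # Conjugation is multiplicative, so checking the generators X and Z suffices.
    return all(is_pauli_up_to_phase(conj_by(U, P)) for P in (X, Z))


def is_third_level(U):
    # Level 3: U P U^dagger must be Clifford for every Pauli P (X and Z again suffice).
    return all(is_clifford(conj_by(U, P)) for P in (X, Z))


s = 1 / math.sqrt(2)
H = ((s, s), (s, -s))
S = ((1, 0), (0, 1j))
T = ((1, 0), (0, cmath.exp(1j * math.pi / 4)))

print(is_clifford(H), is_clifford(S), is_clifford(T))  # True True False
print(is_third_level(T))  # True
```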
# On the Disconnect Between Theory and Practice of Neural Networks: Limits of the NTK Perspective

On the Disconnect Between Theory and Practice of Neural Networks: Limits of the NTK Perspective ( http://arxiv.org/abs/2310.00137v2 )

License: see link

Jonathan Wenger, Felix Dangel, Agustinus Kristiadi

The neural tangent kernel (NTK) has garnered significant attention as a theoretical framework for describing the behavior of large-scale neural networks. Kernel methods are theoretically well-understood and as a result enjoy algorithmic benefits, which can be demonstrated to hold in wide synthetic neural network architectures. These advantages include faster optimization, reliable uncertainty quantification and improved continual learning. However, current results quantifying the rate of convergence to the kernel regime suggest that exploiting these benefits requires architectures that are orders of magnitude wider than they are deep. This assumption raises concerns that architectures used in practice do not exhibit behaviors as predicted by the NTK. Here, we supplement previous work on the NTK by empirically investigating whether the limiting regime predicts practically relevant behavior of large-width architectures. Our results demonstrate that this is not the case across multiple domains. This observed disconnect between theory and practice further calls into question to what degree NTK theory should inform architectural and algorithmic choices.

Translated: 2024-05-31 02:11:35; Published: 2024-05-28

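The following is a minimal pure-Python sketch that makes the object of study concrete. It computes the empirical NTK, Θ(x, x') = ⟨∇θ f(x), ∇θ f(x')⟩, for a two-layer ReLU network in the NTK parameterisation, assuming standard-normal initialisation. It only evaluates the kernel at initialisation for several widths. It does not reproduce the paper's experiments on training behaviour.

```python
import random


def init_params(width, dim, seed=0):
    """Standard-normal weights, as in the NTK parameterisation."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return W, a


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def empirical_ntk(x, xp, W, a):
    """Theta(x, x') = <grad_theta f(x), grad_theta f(x')> for the two-layer network
    f(x) = (1 / sqrt(m)) * sum_j a_j * relu(w_j . x)."""
    m = len(a)
    total = 0.0
    for w_j, a_j in zip(W, a):
        h, hp = dot(w_j, x), dot(w_j, xp)
        # df/da_j = relu(w_j . x) / sqrt(m)
        total += max(h, 0.0) * max(hp, 0.0)
        # df/dw_j = a_j * 1[w_j . x > 0] * x / sqrt(m)
        if h > 0 and hp > 0:
            total += a_j * a_j * dot(x, xp)
    return total / m  # the two 1/sqrt(m) factors combine into 1/m


def ntk_gram(xs, W, a):
    return [[empirical_ntk(x, xp, W, a) for xp in xs] for x in xs]


xs = [[1.0, 0.0], [0.6, 0.8], [-1.0, 0.5]]
for width in (10, 100, 10000):
    W, a = init_params(width, dim=2)
    print(width, [round(v, 3) for v in ntk_gram(xs, W, a)[0]])
```

As the width m grows, the Gram-matrix entries at initialisation concentrate around their infinite-width values. The paper's question is whether the training behaviour predicted by that limit carries over to practical widths.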
# DeepHGCN: Recipe for Efficient and Scalable Deep Hyperbolic Graph Convolutional Networks

DeepHGCN: Recipe for Efficient and Scalable Deep Hyperbolic Graph Convolutional Networks ( http://arxiv.org/abs/2310.02027v3 )

License: see link

Jiaxu Liu, Xinping Yi, Xiaowei Huang

Hyperbolic graph convolutional networks (HGCNs) have demonstrated significant potential in extracting information from hierarchical graphs. However, existing HGCNs are limited to shallow architectures because hyperbolic operations are expensive and over-smoothing worsens as depth increases. Although remedies for over-smoothing have been applied to standard GCNs, developing a hyperbolic counterpart poses distinct challenges, since the operations must be carefully designed to respect hyperbolic geometry. To address these challenges, we propose DeepHGCN, the first deep multi-layer HGCN architecture, with dramatically improved computational efficiency and a substantially reduced over-smoothing effect. DeepHGCN has two key enablers: (1) a novel hyperbolic feature transformation layer that enables fast and accurate linear maps; and (2) techniques such as hyperbolic residual connections and regularization of both weights and features, facilitated by an efficient hyperbolic midpoint method. Extensive experiments demonstrate that DeepHGCN obtains significant improvements in link prediction and node classification compared to both Euclidean and shallow hyperbolic GCN variants.

Translated: 2024-05-31 02:11:35; Published: 2024-05-28

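The abstract does not specify DeepHGCN's feature-transformation layer or its midpoint method. As background only, the sketch shows two standard pieces:

- the exponential and logarithmic maps at the origin of the Poincaré ball (curvature -c);
- a hypothetical residual connection that adds two points in the tangent space at the origin.

This common baseline construction is an assumption, not the paper's design.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def expmap0(v, c=1.0):
    """Exponential map at the origin of the Poincare ball with curvature -c."""
    n = norm(v)
    if n == 0.0:
        return list(v)
    sc = math.sqrt(c)
    factor = math.tanh(sc * n) / (sc * n)
    return [factor * x for x in v]


def logmap0(y, c=1.0):
    """Logarithmic map at the origin (inverse of expmap0); needs norm(y) < 1/sqrt(c)."""
    n = norm(y)
    if n == 0.0:
        return list(y)
    sc = math.sqrt(c)
    factor = math.atanh(sc * n) / (sc * n)
    return [factor * x for x in y]


def tangent_residual(x, fx, c=1.0):
    """Hypothetical residual connection: add x and f(x) in the tangent space
    at the origin, then map the sum back onto the ball."""
    u, v = logmap0(x, c), logmap0(fx, c)
    return expmap0([a + b for a, b in zip(u, v)], c)


x = expmap0([0.3, -0.2])
fx = expmap0([1.5, 2.0])
out = tangent_residual(x, fx)
print(out, norm(out) < 1.0)
```

For very large tangent norms, tanh saturates to exactly 1.0 in floating point. Practical implementations therefore usually clip the norm before mapping back onto the ball.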
# Model-agnostic variable importance for predictive uncertainty: an entropy-based approach

Model-agnostic variable importance for predictive uncertainty: an entropy-based approach ( http://arxiv.org/abs/2310.12842v2 )

License: see link

Danny Wood, Theodore Papamarkou, Matt Benatan, Richard Allmendinger

In order to trust the predictions of a machine learning algorithm, it is necessary to understand the factors that contribute to those predictions. In the case of probabilistic and uncertainty-aware models, it is necessary to understand not only the reasons for the predictions themselves, but also the reasons for the model's level of confidence in those predictions. In this paper, we show how existing methods in explainability can be extended to uncertainty-aware models and how such extensions can be used to understand the sources of uncertainty in a model's predictive distribution. In particular, by adapting permutation feature importance, partial dependence plots, and individual conditional expectation plots, we demonstrate that novel insights into model behaviour may be obtained and that these methods can be used to measure the impact of features on both the entropy of the predictive distribution and the log-likelihood of the ground truth labels under that distribution. With experiments using both synthetic and real-world data, we demonstrate the utility of these approaches to understand both the sources of uncertainty and their impact on model performance.

Translated: 2024-05-31 00:10:23; Published: 2024-05-28

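The following is a minimal sketch of the entropy variant of permutation feature importance, assuming a model that returns class probabilities. For each feature, it shuffles that column, recomputes the mean predictive entropy, and reports the average change. The toy model and data are hypothetical. The log-likelihood variant mentioned above would replace the entropy with -log p(y_true | x).

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a discrete predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_entropy(predict_proba, X):
    return sum(entropy(predict_proba(x)) for x in X) / len(X)


def permutation_entropy_importance(predict_proba, X, feature, n_repeats=10, seed=0):
    """Average change in mean predictive entropy after shuffling one feature column.
    Positive values mean the model becomes more uncertain without that feature."""
    rng = random.Random(seed)
    base = mean_entropy(predict_proba, X)
    total_delta = 0.0
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
        total_delta += mean_entropy(predict_proba, X_perm) - base
    return total_delta / n_repeats


def toy_model(x):
    """Hypothetical classifier: confident when x0 and x1 agree, uncertain when
    they cancel out; feature 2 is ignored."""
    p = 1.0 / (1.0 + math.exp(-4.0 * (x[0] + x[1])))
    return [p, 1.0 - p]


rng = random.Random(1)
X = []
for _ in range(200):
    x0 = rng.choice([-1.0, 1.0])
    X.append([x0, x0, rng.uniform(-1, 1)])  # x1 copies x0 in the data

importances = [permutation_entropy_importance(toy_model, X, j) for j in range(3)]
print([round(v, 4) for v in importances])
```

The toy model deliberately relies on two features that agree in the data. If a model depended on a single feature only, shuffling that column would leave the mean entropy over the dataset unchanged, because the multiset of inputs it sees stays the same.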
# GEO: Generative Engine Optimization

GEO: Generative Engine Optimization ( http://arxiv.org/abs/2311.09735v2 )

License: see link

Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik R Narasimhan, Ameet Deshpande

The advent of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. This emerging technology, which we formalize under the unified framework of generative engines (GEs), can generate accurate and personalized responses, rapidly replacing traditional search engines like Google and Bing. Generative engines typically satisfy queries by synthesizing information from multiple sources and summarizing them using LLMs. While this shift significantly improves user utility and generative search engine traffic, it poses a major challenge for a third stakeholder: websites and content creators. Given the black-box and fast-moving nature of generative engines, content creators have little to no control over when and how their content is displayed. With generative engines here to stay, we must ensure the creator economy is not disadvantaged. To address this, we introduce Generative Engine Optimization (GEO), the first paradigm to help content creators improve the visibility of their content in GE responses, through a flexible black-box optimization framework for optimizing and defining visibility metrics. We facilitate systematic evaluation by introducing GEO-bench, a large-scale benchmark of diverse user queries across multiple domains, along with relevant web sources to answer these queries. Through rigorous evaluation, we demonstrate that GEO can boost visibility by up to 40% in GE responses. Moreover, we show that the efficacy of these strategies varies across domains, underscoring the need for domain-specific optimization methods. Our work opens a new frontier in information discovery systems, with profound implications for both developers of GEs and content creators.

Translated: 2024-05-31 00:10:23; Published: 2024-05-28

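The abstract does not define its visibility metrics. To make the idea concrete, the sketch implements one hypothetical metric: the share of words in a generated response that belong to sentences citing a given source. A GEO-style optimisation would then edit a source's content and measure how such a score changes.

```python
from collections import defaultdict


def word_share_visibility(response):
    """Hypothetical visibility metric: the fraction of response words that
    appear in sentences citing each source. `response` is a list of
    (sentence_text, cited_source_ids) pairs; a sentence citing k sources
    splits its word count evenly among them."""
    credit = defaultdict(float)
    total_words = 0
    for sentence, sources in response:
        n_words = len(sentence.split())
        total_words += n_words
        for src in sources:
            credit[src] += n_words / len(sources)
    return {src: c / total_words for src, c in credit.items()} if total_words else {}


response = [
    ("Solar panels convert sunlight into electricity.", ["site_a"]),
    ("Efficiency has improved steadily over the last decade.", ["site_a", "site_b"]),
    ("Costs vary by region.", []),
]
print(word_share_visibility(response))
```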
# Realistic Cost to Execute Practical Quantum Circuits using Direct Clifford+T Lattice Surgery Compilation

Realistic Cost to Execute Practical Quantum Circuits using Direct Clifford+T Lattice Surgery Compilation ( http://arxiv.org/abs/2311.10686v3 )

License: see link

Tyler LeBlond, Christopher Dean, George Watkins, Ryan S. Bennink

We report a resource estimation pipeline that explicitly compiles quantum circuits expressed using the Clifford+T gate set into a surface code lattice surgery instruction set. The cadence of magic state requests from the compiled circuit enables the optimization of magic state distillation and storage requirements in a post-hoc analysis. To compile logical circuits into lattice surgery operations, we build upon the open-source Lattice Surgery Compiler. The revised compiler operates in two stages: the first translates logical gates into an abstract, layout-independent instruction set; the second compiles these into local lattice surgery instructions that are allocated to hardware tiles according to a specified resource layout. The second stage retains logical parallelism while avoiding resource contention in the fault-tolerant layer, aiding realism. Additionally, users can specify dedicated tiles at which magic states are replenished, enabling resource costs from the logical computation to be considered independently from magic state distillation and storage. We demonstrate the applicability of our pipeline to large, practical quantum circuits by providing resource estimates for the ground state estimation of molecules. We find that variable magic state consumption rates in real circuits can cause the resource costs of magic state storage to dominate unless production is varied to suit.

Translated: 2024-05-31 00:10:23; Published: 2024-05-28

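A toy queue model can illustrate the storage effect described above. The model is our assumption, not the paper's pipeline, which works from compiled lattice-surgery instructions:

- factories add a fixed number of magic states per time step;
- each step of the circuit consumes a variable number of states;
- the circuit stalls when storage runs short.

```python
def simulate_magic_state_buffer(requests, rate):
    """Toy model: factories add `rate` magic states per time step into storage;
    step t of the circuit needs requests[t] states. If storage is short, the
    circuit stalls for a step while production continues.
    Returns (peak_storage, stall_steps)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    storage = peak = stalls = 0
    for need in requests:
        storage += rate
        while storage < need:      # wait for the factories to catch up
            stalls += 1
            storage += rate
        peak = max(peak, storage)  # storage held just before consumption
        storage -= need
    return peak, stalls


uniform = [1] * 10
bursty = [0, 0, 0, 0, 5, 0, 0, 0, 0, 5]
print(simulate_magic_state_buffer(uniform, rate=1))  # (1, 0)
print(simulate_magic_state_buffer(bursty, rate=1))   # (5, 0)
```

Both request patterns consume 10 magic states in total. At the same constant production rate, however, the bursty pattern needs five times the peak storage.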
# Page curve entanglement dynamics in an analytically solvable model

Page curve entanglement dynamics in an analytically solvable model ( http://arxiv.org/abs/2311.18045v3 )

License: see link

Stefan Kehrein

The entanglement entropy of black holes is expected to follow the Page curve. After an initial linear increase with time the entanglement entropy should reach a maximum at the Page time and then decrease. This paper introduces an exactly solvable model of free fermions that explicitly shows such a Page curve: The entanglement entropy vanishes asymptotically for late times instead of saturating at a volume law. The bending down of the Page curve is accompanied by a breakdown of the semiclassical connection between particle current and entanglement generation, a quantum phase transition in the entanglement Hamiltonian and non-analytic behavior of the $q\rightarrow\infty$ Renyi entropy. These observations are expected to hold for a larger class of systems beyond the exactly solvable model analyzed here.

Translated: 2024-05-31 00:00:32; Published: 2024-05-28

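For free fermions, entanglement entropy can be computed from the single-particle correlation matrix restricted to the subsystem. With eigenvalues ν of C_A:

- the von Neumann entropy is S = -Σ [ν ln ν + (1-ν) ln(1-ν)];
- the q → ∞ Rényi entropy is S_∞ = -Σ ln max(ν, 1-ν).

The sketch applies this standard method to a hypothetical static example, the ground state of an open tight-binding chain. It does not reproduce the paper's time-dependent model.

```python
import math


def jacobi_eigenvalues(A, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(A)
    a = [row[:] for row in A]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # columns p, q  (A J)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q  (J^T A J)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def correlation_matrix(orbitals, sites):
    """C_ij = <c_i^dagger c_j> for a Slater determinant of real, orthonormal
    single-particle orbitals, restricted to `sites`."""
    sites = list(sites)
    return [[sum(phi[i] * phi[j] for phi in orbitals) for j in sites] for i in sites]


def entropies(C_A, eps=1e-12):
    """(von Neumann entropy, q -> infinity Renyi entropy) from the eigenvalues
    nu of the subsystem correlation matrix."""
    s_vn = s_inf = 0.0
    for nu in jacobi_eigenvalues(C_A):
        nu = min(max(nu, 0.0), 1.0)
        s_inf -= math.log(max(nu, 1.0 - nu))
        if eps < nu < 1.0 - eps:
            s_vn -= nu * math.log(nu) + (1.0 - nu) * math.log(1.0 - nu)
    return s_vn, s_inf


# Open chain H = -sum(c_j^dag c_{j+1} + h.c.) with N sites; fill the lowest modes.
N, n_particles = 8, 4
orbitals = [
    [math.sqrt(2.0 / (N + 1)) * math.sin(math.pi * k * j / (N + 1)) for j in range(1, N + 1)]
    for k in range(1, n_particles + 1)
]
half = entropies(correlation_matrix(orbitals, range(N // 2)))
whole = entropies(correlation_matrix(orbitals, range(N)))  # pure state: ~ (0, 0)
print(half, whole)
```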
# Single Mesh Diffusion Models with Field Latents for Texture Generation

Single Mesh Diffusion Models with Field Latents for Texture Generation ( http://arxiv.org/abs/2312.09250v3 )

License: see link

Thomas W. Mitchel, Carlos Esteves, Ameesh Makadia

We introduce a framework for intrinsic latent diffusion models operating directly on the surfaces of 3D shapes, with the goal of synthesizing high-quality textures. Our approach is underpinned by two contributions: field latents, a latent representation encoding textures as discrete vector fields on the mesh vertices, and field latent diffusion models, which learn to denoise a diffusion process in the learned latent space on the surface. We consider a single-textured-mesh paradigm, where our models are trained to generate variations of a given texture on a mesh. We show the synthesized textures are of superior fidelity compared to those from existing single-textured-mesh generative models. Our models can also be adapted for user-controlled editing tasks such as inpainting and label-guided generation. The efficacy of our approach is due in part to the equivariance of our proposed framework under isometries, allowing our models to seamlessly reproduce details across locally similar regions and opening the door to a notion of generative texture transfer.

Translated: 2024-05-30 23:50:38; Published: 2024-05-28

# Regularized Newton Raphson Inversion for Text-to-Image Diffusion Models

Regularized Newton Raphson Inversion for Text-to-Image Diffusion Models ( http://arxiv.org/abs/2312.12540v2 )

License: see link

Dvir Samuel, Barak Meiri, Nir Darshan, Shai Avidan, Gal Chechik, Rami Ben-Ari

Diffusion inversion is the problem of taking an image and a text prompt that describes it and finding a noise latent that would generate the image. Most current inversion techniques operate by approximately solving an implicit equation and may converge slowly or yield poor reconstructed images. Here, we formulate the problem as finding the roots of an implicit equation and design a method to solve it efficiently. Our solution is based on Newton-Raphson (NR), a well-known technique in numerical analysis. A naive application of NR may be computationally infeasible and tends to converge to incorrect solutions. We describe an efficient regularized formulation that converges quickly to a solution that provides high-quality reconstructions. We also identify a source of inconsistency stemming from prompt conditioning during the inversion process, which significantly degrades the inversion quality. To address this, we introduce a prompt-aware adjustment of the encoding, effectively correcting this issue. Our solution, Regularized Newton-Raphson Inversion, inverts an image within 0.5 sec for latent consistency models, opening the door for interactive image editing. We further demonstrate improved results in image interpolation and generation of rare objects.

Translated: 2024-05-30 23:50:38; Published: 2024-05-28

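In DDIM-style inversion, the unknown latent appears on both sides of the update, which gives an implicit equation of the form z = g(z). The scalar toy below contrasts plain fixed-point iteration with Newton-Raphson applied to the residual r(z) = z - g(z). The function g is a hypothetical stand-in for a denoiser step. The paper's regularized formulation in the high-dimensional latent space is not reproduced.

```python
import math


def fixed_point(g, z0, tol=1e-12, max_iter=1000):
    """Plain fixed-point iteration z <- g(z); returns (z, iterations)."""
    z = z0
    for it in range(1, max_iter + 1):
        z_new = g(z)
        if abs(z_new - z) < tol:
            return z_new, it
        z = z_new
    return z, max_iter


def newton_raphson(g, z0, tol=1e-12, max_iter=50, h=1e-7):
    """Solve z = g(z) by Newton-Raphson on the residual r(z) = z - g(z),
    using a central finite difference for r'(z); returns (z, iterations)."""
    def r(z):
        return z - g(z)

    z = z0
    for it in range(1, max_iter + 1):
        dr = (r(z + h) - r(z - h)) / (2 * h)
        step = r(z) / dr
        z -= step
        if abs(step) < tol:
            return z, it
    return z, max_iter


def g(z):
    """Toy implicit equation z = 0.9 * cos(z)."""
    return 0.9 * math.cos(z)


print(fixed_point(g, 0.0))
print(newton_raphson(g, 0.0))
```

Here fixed-point iteration converges only linearly, because |g'| is about 0.57 at the root. Newton-Raphson needs just a handful of iterations.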
# A Broad Comparative Evaluation of Software Debloating Tools

A Broad Comparative Evaluation of Software Debloating Tools ( http://arxiv.org/abs/2312.13274v2 )

License: see link

Michael D. Brown, Adam Meily, Brian Fairservice, Akshay Sood, Jonathan Dorn, Eric Kilmer, Ronald Eytchison

Software debloating tools seek to improve program security and performance by removing unnecessary code, called bloat. While many techniques have been proposed, several barriers to their adoption have emerged. Namely, debloating tools are highly specialized, making it difficult for adopters to find the right type of tool for their needs. This is further hindered by a lack of established metrics and comparative evaluations between tools. To close this information gap, we surveyed 10 years of debloating literature and several tools currently under commercial development to taxonomize knowledge about the debloating ecosystem. We then conducted a broad comparative evaluation of 10 debloating tools to determine their relative strengths and weaknesses. Our evaluation, conducted on a diverse set of 20 benchmark programs, measures tools across 12 performance, security, and correctness metrics. Our evaluation surfaces several concerning findings that contradict the prevailing narrative in debloating literature. First, debloating tools lack the required maturity to be used on real-world software, evidenced by a slim 21% overall success rate for creating passable debloated versions of medium- and high-complexity benchmarks. Second, debloating tools struggle to produce sound and robust programs. Using our novel differential fuzzing tool, DIFFER, we discovered that only 13% of our debloating attempts produced a sound and robust debloated program. Finally, our results indicate that debloating tools typically do not improve the performance or security posture of debloated programs by a significant degree. We believe that our contributions in this paper will help potential adopters better understand the landscape of tools and will motivate future research and development of more capable debloating tools. To this end, we have made our benchmark set, data, and custom tools publicly available.

Translated: 2024-05-30 23:50:38; Published: 2024-05-28

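DIFFER is the authors' tool, and its interface is not described here. The sketch below only illustrates the underlying idea of differential testing: run the original and debloated programs on the same random inputs and flag any difference in observable behaviour (return value or crash). Both functions are hypothetical stand-ins for real binaries.

```python
import random


def differential_fuzz(original, debloated, gen_input, n_trials=1000, seed=0):
    """Run both programs on random inputs and collect the inputs on which their
    observable behaviour (return value or exception type) differs."""
    rng = random.Random(seed)

    def observe(fn, x):
        try:
            return ("ok", fn(x))
        except Exception as exc:  # a crash is also observable behaviour
            return ("error", type(exc).__name__)

    mismatches = []
    for _ in range(n_trials):
        x = gen_input(rng)
        if observe(original, x) != observe(debloated, x):
            mismatches.append(x)
    return mismatches


# Hypothetical example: the "debloated" version dropped support for negative
# inputs, a feature that was supposed to be retained.
def original(n):
    return abs(n) % 7


def debloated(n):
    if n < 0:
        raise ValueError("unsupported")
    return n % 7


def gen_int(rng):
    return rng.randint(-100, 100)


found = differential_fuzz(original, debloated, gen_int)
print(len(found) > 0, all(x < 0 for x in found))  # True True
```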
# Do Concept Bottleneck Models Obey Locality?

Do Concept Bottleneck Models Obey Locality? ( http://arxiv.org/abs/2401.01259v2 )

License: see link

Naveen Raman, Mateo Espinosa Zarlenga, Juyeon Heo, Mateja Jamnik

Concept-based methods explain model predictions using human-understandable concepts. These models require accurate concept predictors, yet the faithfulness of existing concept predictors to their underlying concepts is unclear. In this paper, we investigate the faithfulness of Concept Bottleneck Models (CBMs), a popular family of concept-based architectures, by looking at whether they respect "localities" in datasets. Localities involve using only relevant features when predicting a concept's value. When localities are not considered, concepts may be predicted based on spuriously correlated features, degrading performance and robustness. This work examines how CBM predictions change when perturbing model inputs, and reveals that CBMs may not capture localities, even when independent concepts are localised to non-overlapping feature subsets. Our empirical and theoretical results demonstrate that datasets with correlated concepts may lead to accurate but uninterpretable models that fail to learn localities. Overall, we find that CBM interpretability is fragile, as CBMs occasionally rely upon spurious features, necessitating further research into the robustness of concept predictors.

Translated: 2024-05-30 23:50:38; Published: 2024-05-28

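One way to probe locality is to perturb only the features outside a concept's relevant subset and measure how much the predicted concept value moves. The sketch below does this under our own assumptions: toy linear concept predictors rather than a trained CBM, and Gaussian perturbations.

```python
import random


def locality_violation(concept_predictor, x, relevant, n_samples=100, scale=1.0, seed=0):
    """Perturb only the features *outside* `relevant` and return the largest change
    in the predicted concept value. A predictor that respects locality returns 0."""
    rng = random.Random(seed)
    base = concept_predictor(x)
    worst = 0.0
    for _ in range(n_samples):
        x_pert = [v if i in relevant else v + rng.gauss(0.0, scale) for i, v in enumerate(x)]
        worst = max(worst, abs(concept_predictor(x_pert) - base))
    return worst


# Hypothetical concept predictors over 4 features; the concept should depend
# only on features {0, 1}.
def local_predictor(x):
    return 0.8 * x[0] - 0.3 * x[1]


def leaky_predictor(x):
    return 0.8 * x[0] - 0.3 * x[1] + 0.2 * x[3]  # also uses a spurious feature


x = [1.0, 2.0, -0.5, 0.7]
print(locality_violation(local_predictor, x, {0, 1}))      # 0.0
print(locality_violation(leaky_predictor, x, {0, 1}) > 0)  # True
```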
# Bang-bang preparation of quantum many-body ground states in two dimensions: optimization of the algorithm with a two-dimensional tensor network

Bang-bang preparation of quantum many-body ground states in two dimensions: optimization of the algorithm with a two-dimensional tensor network ( http://arxiv.org/abs/2401.09158v3 )

License: see link

Yintai Zhang, Jacek Dziarmaga

A bang-bang (BB) algorithm prepares the ground state of a two-dimensional (2D) quantum many-body Hamiltonian $H=H_1+H_2$ by evolving an initial product state alternating between $H_1$ and $H_2$. We use the neighborhood tensor update to simulate the BB evolution with an infinite projected entangled pair state (iPEPS). The alternating sequence is optimized with the final energy as a cost function. The energy is calculated with tangent-space methods because of their stability. The method is benchmarked in the 2D transverse field quantum Ising model near its quantum critical point against a ground state obtained by variational optimization of the iPEPS. The optimal BB sequence differs non-perturbatively from a sequence simulating quantum annealing or adiabatic preparation (AP) of the ground state. The optimal BB energy converges with the number of bangs much faster than the optimal AP energy.

Translated: 2024-05-30 23:50:38; Published: 2024-05-28

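The paper simulates bang-bang evolution on an infinite 2D lattice with iPEPS, which is far beyond a short snippet. To show only the structure of the algorithm, the sketch uses exact state vectors for a hypothetical two-qubit transverse-field Ising model:

- H1 = -J Z1 Z2 and H2 = -g (X1 + X2);
- the state starts in |++⟩, and the two evolutions alternate;
- the durations are tuned by crude random search on the final energy, a stand-in for whatever optimiser one prefers.

```python
import cmath
import math
import random

# Basis states (b1, b2) ordered so that index = 2*b1 + b2.
BASIS = [(b1, b2) for b1 in (0, 1) for b2 in (0, 1)]


def idx(b1, b2):
    return 2 * b1 + b2


def spin(b):
    return 1 - 2 * b  # Z eigenvalue: +1 for |0>, -1 for |1>


def evolve_h1(psi, J, tau):
    """Apply exp(-i tau H1) with H1 = -J Z1 Z2 (diagonal in the Z basis)."""
    return [psi[idx(b1, b2)] * cmath.exp(1j * J * tau * spin(b1) * spin(b2)) for b1, b2 in BASIS]


def evolve_h2(psi, g, tau):
    """Apply exp(-i tau H2) with H2 = -g (X1 + X2), i.e. exp(i g tau X) on each
    qubit, using exp(i t X) = cos(t) I + i sin(t) X."""
    c, s = math.cos(g * tau), math.sin(g * tau)
    psi = [c * psi[idx(b1, b2)] + 1j * s * psi[idx(b1, 1 - b2)] for b1, b2 in BASIS]
    return [c * psi[idx(b1, b2)] + 1j * s * psi[idx(1 - b1, b2)] for b1, b2 in BASIS]


def energy(psi, J, g):
    zz = sum(abs(psi[idx(b1, b2)]) ** 2 * spin(b1) * spin(b2) for b1, b2 in BASIS)
    x1 = sum(psi[idx(b1, b2)].conjugate() * psi[idx(1 - b1, b2)] for b1, b2 in BASIS)
    x2 = sum(psi[idx(b1, b2)].conjugate() * psi[idx(b1, 1 - b2)] for b1, b2 in BASIS)
    return -J * zz - g * (x1 + x2).real


def bang_bang_energy(taus, J, g):
    """Start in |++> (ground state of H2 for g > 0), alternate H1, H2, H1, ..."""
    psi = [0.5, 0.5, 0.5, 0.5]
    for k, tau in enumerate(taus):
        psi = evolve_h1(psi, J, tau) if k % 2 == 0 else evolve_h2(psi, g, tau)
    return energy(psi, J, g)


def optimize_sequence(n_bangs, J, g, iters=2000, seed=0):
    """Crude random search over non-negative bang durations (not the paper's optimiser)."""
    rng = random.Random(seed)
    taus = [0.0] * n_bangs
    best = bang_bang_energy(taus, J, g)
    for _ in range(iters):
        trial = [max(0.0, t + rng.gauss(0.0, 0.1)) for t in taus]
        e = bang_bang_energy(trial, J, g)
        if e < best:
            taus, best = trial, e
    return taus, best


J, g = 1.0, 0.5
taus, best = optimize_sequence(6, J, g)
print(best, -math.sqrt(J ** 2 + 4 * g ** 2))  # optimised vs exact ground energy
```

For this two-qubit model, the exact ground energy is -sqrt(J² + 4g²). The optimised energy can only approach it from above.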
# Partial Gromov-Wasserstein Metric

Partial Gromov-Wasserstein Metric ( http://arxiv.org/abs/2402.03664v2 )

License: see link

Yikun Bai, Rocio Diaz Martin, Abihith Kothapalli, Hengrong Du, Xinran Liu, Soheil Kolouri

The Gromov-Wasserstein (GW) distance has gained increasing interest in the machine learning community in recent years, as it allows for the comparison of measures in different metric spaces. To overcome the limitations imposed by the equal mass requirements of the classical GW problem, researchers have begun exploring its application in unbalanced settings. However, Unbalanced GW (UGW) can only be regarded as a discrepancy rather than a rigorous metric/distance between two metric measure spaces (mm-spaces). In this paper, we propose a particular case of the UGW problem, termed Partial Gromov-Wasserstein (PGW). We establish that PGW is a well-defined metric between mm-spaces and discuss its theoretical properties, including the existence of a minimizer for the PGW problem and the relationship between PGW and GW, among others. We then propose two variants of the Frank-Wolfe algorithm for solving the PGW problem and show that they are mathematically and computationally equivalent. Moreover, based on our PGW metric, we introduce the analogous concept of barycenters for mm-spaces. Finally, we validate the effectiveness of our PGW metric and related solvers in applications such as shape matching, shape retrieval, and shape interpolation, comparing them against existing baselines.

Translated: 2024-05-30 23:40:54; Published: 2024-05-28

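As background for the objective being relaxed, the sketch evaluates the standard (balanced) discrete Gromov-Wasserstein cost of a given coupling, Σ (C1[i][k] - C2[j][l])² γ[i][j] γ[k][l], where C1 and C2 are intra-space distance matrices. The partial variant additionally relaxes the marginal constraints on γ and handles untransported mass. That part, and the paper's Frank-Wolfe solvers, are not shown.

```python
def gw_cost(C1, C2, gamma):
    """Quadratic Gromov-Wasserstein cost
    sum_{i,j,k,l} (C1[i][k] - C2[j][l])**2 * gamma[i][j] * gamma[k][l]
    for a fixed coupling matrix gamma (naive O(n^2 m^2) loop)."""
    n, m = len(C1), len(C2)
    total = 0.0
    for i in range(n):
        for j in range(m):
            if gamma[i][j] == 0:
                continue
            for k in range(n):
                for l in range(m):
                    total += (C1[i][k] - C2[j][l]) ** 2 * gamma[i][j] * gamma[k][l]
    return total


# Points 0, 1, 3 on a line, and the same points listed in reverse order.
C1 = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
C2 = [[C1[2 - j][2 - l] for l in range(3)] for j in range(3)]
identity = [[1 / 3 if i == j else 0.0 for j in range(3)] for i in range(3)]
reverse = [[1 / 3 if i == 2 - j else 0.0 for j in range(3)] for i in range(3)]
print(gw_cost(C1, C2, identity), gw_cost(C1, C2, reverse))
```

The coupling that matches each point to its relabelled copy has zero cost, because the two spaces are isometric. The identity coupling ignores the relabelling and pays a positive cost.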
# A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?

A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules? ( http://arxiv.org/abs/2402.05015v2 )

License: see link

Agustinus Kristiadi, Felix Strieth-Kalthoff, Marta Skreta, Pascal Poupart, Alán Aspuru-Guzik, Geoff Pleiss

Automation is one of the cornerstones of contemporary material discovery. Bayesian optimization (BO) is an essential part of such workflows, enabling scientists to leverage prior domain knowledge into efficient exploration of a large molecular space. While such prior knowledge can take many forms, there has been significant fanfare around the ancillary scientific knowledge encapsulated in large language models (LLMs). However, existing work thus far has only explored LLMs for heuristic materials searches. Indeed, recent work obtains the uncertainty estimate -- an integral part of BO -- from point-estimated, non-Bayesian LLMs. In this work, we study the question of whether LLMs are actually useful to accelerate principled Bayesian optimization in the molecular space. We take a sober, dispassionate stance in answering this question. This is done by carefully (i) viewing LLMs as fixed feature extractors for standard but principled BO surrogate models and by (ii) leveraging parameter-efficient finetuning methods and Bayesian neural networks to obtain the posterior of the LLM surrogate. Our extensive experiments with real-world chemistry problems show that LLMs can be useful for BO over molecules, but only if they have been pretrained or finetuned with domain-specific data.

Translated: 2024-05-30 23:31:04; Published: 2024-05-28

# Once-in-a-lifetime encounter models for neutrino media: From coherent oscillations to flavor equilibration

Once-in-a-lifetime encounter models for neutrino media: From coherent oscillations to flavor equilibration ( http://arxiv.org/abs/2402.05022v2 )

License: see link

Anson Kost, Lucas Johns, Huaiyu Duan

Collective neutrino oscillations are typically studied using the lowest-order quantum kinetic equation, also known as the mean-field approximation. However, some recent quantum many-body simulations suggest that quantum entanglement among neutrinos may be important and may result in flavor equilibration of the neutrino gas. In this work, we develop new quantum models for neutrino gases in which any pair of neutrinos can interact at most once in their lifetimes. A key parameter of our models is $\gamma=\mu \Delta z$, where $\mu$ is the neutrino coupling strength, which is proportional to the neutrino density, and $\Delta z$ is the duration over which a pair of neutrinos can interact each time. Our models reduce to the mean-field approach in the limit $\gamma\to0$ and achieve flavor equilibration in time $t \gg (\gamma\mu)^{-1}$. These models demonstrate the emergence of coherent flavor oscillations from the particle perspective and may help elucidate the role of quantum entanglement in collective neutrino oscillations.

Translated: 2024-05-30 23:31:04; Published: 2024-05-28

# Generalized Preference Optimization: A Unified Approach to Offline Alignment

Generalized Preference Optimization: A Unified Approach to Offline Alignment ( http://arxiv.org/abs/2402.05749v2 )

License: see link

Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices. We propose generalized preference optimization (GPO), a family of offline losses parameterized by a general class of convex functions. GPO enables a unified view over preference optimization, encompassing existing algorithms such as DPO, IPO and SLiC as special cases, while naturally introducing new variants. The GPO framework also sheds light on how offline algorithms enforce regularization, through the design of the convex function that defines the loss. Our analysis and experiments reveal the connections and subtle differences between the offline regularization and the KL divergence regularization intended by the canonical RLHF formulation. In a controlled setting akin to Gao et al. (2023), we also show that different GPO variants achieve similar trade-offs between regularization and performance, though the optimal hyperparameter values may differ, as predicted by theory. Overall, our results offer new algorithmic toolkits and empirical insights to alignment practitioners.

Translation date: 2024-05-30 23:31:04, Publication date: 2024-05-28

# Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods

Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods ( http://arxiv.org/abs/2402.07108v2 )

License: see link

Wenzhi Gao, Chunlin Sun, Chenyu Xue, Dongdong Ge, Yinyu Ye,

Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O}(\sqrt{T})$, which is suboptimal compared to the $\mathcal{O}(\log T)$ bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes several important facts about online linear programming, which unveil the challenge for first-order-method-based online algorithms to achieve regret better than $\mathcal{O}(\sqrt{T})$. To address the challenge, we introduce a new algorithmic framework that decouples learning from decision-making. For the first time, we show that first-order methods can attain $\mathcal{O}(T^{1/3})$ regret with this new framework.
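
For context, the kind of first-order method whose regret is typically stuck at $\mathcal{O}(\sqrt{T})$ can be sketched as a dual-price (projected subgradient) rule. It accepts a request when its reward exceeds the priced resource cost, then nudges the prices toward the per-period budget. This is a generic baseline written for illustration, assuming binary accept/reject decisions and a hypothetical function name. It is not the decoupled learning/decision-making framework proposed in the paper.

```python
def dual_price_allocation(requests, budget, eta):
    """requests: list of (reward, resource_vector); budget: list of capacities.
    Returns the 0/1 decisions and the final dual prices."""
    T, m = len(requests), len(budget)
    d = [b / T for b in budget]          # per-period resource target
    p = [0.0] * m                        # dual prices, kept nonnegative
    remaining = list(budget)
    decisions = []
    for r, a in requests:
        cost = sum(ai * pi for ai, pi in zip(a, p))
        fits = all(ai <= ri for ai, ri in zip(a, remaining))
        x = 1 if (r > cost and fits) else 0   # accept iff reward beats priced cost
        if x:
            remaining = [ri - ai for ri, ai in zip(remaining, a)]
        # projected subgradient step on the dual: raise prices of over-used resources
        p = [max(0.0, pi + eta * (ai * x - di)) for pi, ai, di in zip(p, a, d)]
        decisions.append(x)
    return decisions, p
```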

Translation date: 2024-05-30 23:31:04, Publication date: 2024-05-28

# Implicit Causal Representation Learning via Switchable Mechanisms

Implicit Causal Representation Learning via Switchable Mechanisms ( http://arxiv.org/abs/2402.11124v2 )

License: see link

Shayan Shirahmad Gale Bagi, Zahra Gharaee, Oliver Schulte, Mark Crowley,

Learning causal representations from observational and interventional data in the absence of known ground-truth graph structures necessitates implicit latent causal representation learning. Implicit learning of causal mechanisms typically involves two categories of interventional data: hard and soft interventions. In real-world scenarios, soft interventions are often more realistic than hard interventions, as the latter require fully controlled environments. Unlike hard interventions, which directly force changes in a causal variable, soft interventions exert influence indirectly by affecting the causal mechanism. However, the subtlety of soft interventions imposes several challenges for learning causal models. One challenge is that the effects of soft interventions are ambiguous, since parental relations remain intact. In this paper, we tackle the challenges of learning causal models using soft interventions while retaining implicit modeling. Our approach models the effects of soft interventions by employing a \textit{causal mechanism switch variable} designed to toggle between different causal mechanisms. In our experiments, we consistently observe improved learning of identifiable causal representations compared to baseline approaches.
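
To make the soft-intervention setting concrete, here is a toy two-variable structural model in which a binary switch variable selects between two mechanisms for the child while its parent stays the same. The linear mechanisms, the noise scale and the function names are hypothetical choices for illustration. The paper learns such switches implicitly inside a representation-learning model rather than specifying them by hand.

```python
import random

def sample_child(parent, switch, rng, noise_std=0.1):
    """The parent is the same under both mechanisms; only the mechanism changes."""
    noise = rng.gauss(0.0, noise_std)
    if switch == 0:
        return 2.0 * parent + noise          # observational mechanism
    return -0.5 * parent + 1.0 + noise       # soft-intervened mechanism

def sample_dataset(n, p_switch, seed=0):
    """Draw (parent, switch, child) triples; switch=1 marks soft-intervened samples."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        parent = rng.gauss(0.0, 1.0)
        switch = 1 if rng.random() < p_switch else 0
        data.append((parent, switch, sample_child(parent, switch, rng)))
    return data
```

The ambiguity mentioned above is visible here. Without knowing `switch`, a learner sees the same parent-child edge in both regimes and must infer which mechanism generated each sample.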

Translation date: 2024-05-30 23:21:18, Publication date: 2024-05-28

# Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models

Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models ( http://arxiv.org/abs/2402.17012v2 )

License: see link

Jeffrey G. Wang, Jason Wang, Marvin Li, Seth Neel,

In this paper, we develop state-of-the-art privacy attacks against Large Language Models (LLMs), where an adversary with some access to the model tries to learn something about the underlying training data. Our headline results are new membership inference attacks (MIAs) against pretrained LLMs that perform hundreds of times better than baseline attacks, and a pipeline showing that over 50% (!) of the fine-tuning dataset can be extracted from a fine-tuned LLM in natural settings. We consider varying degrees of access to the underlying model, pretraining and fine-tuning data, and both MIAs and training data extraction. For pretraining data, we propose two new MIAs: a supervised neural network classifier that predicts training data membership on the basis of (dimensionality-reduced) model gradients, as well as a variant of this attack that only requires logit access to the model, which leverages recent model-stealing work on LLMs. To our knowledge, this is the first MIA that explicitly incorporates model-stealing information. Both attacks outperform existing black-box baselines, and our supervised attack closes the gap between MIA success against LLMs and the strongest known attacks for other machine learning models. In fine-tuning, we find that a simple attack based on the ratio of the loss between the base and fine-tuned models is able to achieve near-perfect MIA performance; we then leverage our MIA to extract a large fraction of the fine-tuning dataset from fine-tuned Pythia and Llama models. Taken together, these results represent the strongest existing privacy attacks against both pretrained and fine-tuned LLMs for MIAs and training data extraction, which are of independent scientific interest and have important practical implications for LLM security, privacy, and copyright issues.
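
The fine-tuning attack described above scores each candidate by the ratio of its loss under the fine-tuned model to its loss under the base model. A low ratio means fine-tuning made the example much easier, which suggests it was in the fine-tuning set. The sketch below assumes per-example losses have already been computed and that a threshold is chosen separately, for example from held-out non-members. Both are assumptions rather than details from the abstract, and the function names are hypothetical.

```python
def loss_ratio_scores(finetuned_losses, base_losses):
    """Per-example ratio loss_finetuned / loss_base (lower = more member-like)."""
    return [ft / base for ft, base in zip(finetuned_losses, base_losses)]

def predict_members(finetuned_losses, base_losses, threshold):
    """Flag an example as a fine-tuning member when its ratio is below threshold."""
    return [s < threshold for s in loss_ratio_scores(finetuned_losses, base_losses)]
```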

Translation date: 2024-05-30 23:21:17, Publication date: 2024-05-28

# High-dimensional quantum key distribution using a multi-plane light converter

High-dimensional quantum key distribution using a multi-plane light converter ( http://arxiv.org/abs/2403.04210v2 )

License: see link

Ohad Lib, Kfir Sulimany, Mateus Araújo, Michael Ben-Or, Yaron Bromberg,

High-dimensional quantum key distribution (QKD) offers higher information capacity and stronger resilience to noise compared to its binary counterpart. However, these advantages are often hindered by the difficulty of realizing the required high-dimensional measurements and transformations. Here, we implement a large-scale multi-plane light converter (MPLC) and program it as a high-dimensional mode sorter of spatial modes for QKD. Using the MPLC, we demonstrate five-dimensional QKD with six mutually unbiased bases and 25-dimensional QKD with two mutually unbiased bases in the same experimental setup. Furthermore, we propose a construction of pairs of mutually unbiased bases that are robust to experimental errors, with measurement complexity scaling only with the square root of the encoded dimension. This approach paves the way for QKD implementations in higher dimensions.
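
Mutually unbiased bases are what make these multi-basis protocols work. A state prepared in one basis gives a uniformly random outcome when measured in the other, i.e. $|\langle e_j|f_k\rangle|^2 = 1/d$ for every pair. The sketch below checks this property numerically for the textbook pair in dimension $d$: the computational basis and the discrete Fourier basis. It does not model the MPLC or the error-robust construction proposed in the paper, and the helper names are illustrative.

```python
import cmath
import math

def computational_basis(d):
    """Standard basis vectors e_0..e_{d-1} as complex lists."""
    return [[1.0 + 0j if i == j else 0j for i in range(d)] for j in range(d)]

def fourier_basis(d):
    """f_k[j] = omega^(j*k) / sqrt(d), with omega = exp(2*pi*i/d)."""
    w = cmath.exp(2j * math.pi / d)
    return [[w ** ((j * k) % d) / math.sqrt(d) for j in range(d)] for k in range(d)]

def overlap_sq(u, v):
    """|<u|v>|^2 for complex vectors given as lists."""
    return abs(sum(a.conjugate() * b for a, b in zip(u, v))) ** 2

def is_mutually_unbiased(basis_a, basis_b, d, tol=1e-9):
    return all(abs(overlap_sq(u, v) - 1.0 / d) < tol for u in basis_a for v in basis_b)
```

For example, `is_mutually_unbiased(computational_basis(5), fourier_basis(5), 5)` holds, matching the five-dimensional case discussed above.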

Translation date: 2024-05-30 23:11:33, Publication date: 2024-05-28

# Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Unfamiliar Finetuning Examples Control How Language Models Hallucinate ( http://arxiv.org/abs/2403.05612v2 )

License: see link

Katie Kang, Eric Wallace, Claire Tomlin, Aviral Kumar, Sergey Levine,

Large language models are known to hallucinate when faced with unfamiliar queries, but the underlying mechanisms that govern how models hallucinate are not yet fully understood. In this work, we find that unfamiliar examples in the models' finetuning data, those that introduce concepts beyond the base model's scope of knowledge, are crucial in shaping these errors. In particular, we find that an LLM's hallucinated predictions tend to mirror the responses associated with its unfamiliar finetuning examples. This suggests that by modifying how unfamiliar finetuning examples are supervised, we can influence a model's responses to unfamiliar queries (e.g., making it say "I don't know"). We empirically validate this observation in a series of controlled experiments involving SFT, RL, and reward model finetuning on TriviaQA and MMLU. Our work further investigates RL finetuning strategies for improving the factuality of long-form model generations. We find that, while hallucinations from the reward model can significantly undermine the effectiveness of RL factuality finetuning, strategically controlling how reward models hallucinate can minimize these negative effects. Leveraging our previous observations on controlling hallucinations, we propose an approach for learning more reliable reward models, and show that they improve the efficacy of RL factuality finetuning in long-form biography and book/movie plot generation tasks.
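
The intervention suggested above amounts to changing the supervision target of unfamiliar finetuning examples. A minimal sketch, assuming an externally supplied `is_familiar` predicate (for instance, whether the base model already answers the question correctly). The predicate, the abstention string and the function name are hypothetical placeholders, not the paper's procedure.

```python
def relabel_unfamiliar(examples, is_familiar, abstain="I don't know"):
    """examples: list of (question, answer) pairs.
    Keep the original answer for familiar examples and replace the target
    with an abstention for unfamiliar ones."""
    return [(q, a if is_familiar(q, a) else abstain) for q, a in examples]
```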

Translation date: 2024-05-30 23:11:33, Publication date: 2024-05-28

# Transferable Reinforcement Learning via Generalized Occupancy Models

Transferable Reinforcement Learning via Generalized Occupancy Models ( http://arxiv.org/abs/2403.06328v2 )

License: see link

Chuning Zhu, Xinqi Wang, Tyler Han, Simon S. Du, Abhishek Gupta,

Intelligent agents must be generalists, capable of quickly adapting to various tasks. In reinforcement learning (RL), model-based RL learns a dynamics model of the world, in principle enabling transfer to arbitrary reward functions through planning. However, autoregressive model rollouts suffer from compounding error, making model-based RL ineffective for long-horizon problems. Successor features offer an alternative by modeling a policy's long-term state occupancy, reducing policy evaluation under new tasks to linear reward regression. Yet, policy improvement with successor features can be challenging. This work proposes a novel class of models, i.e., generalized occupancy models (GOMs), that learn a distribution of successor features from a stationary dataset, along with a policy that acts to realize different successor features. These models can quickly select the optimal action for arbitrary new tasks. By directly modeling long-term outcomes in the dataset, GOMs avoid compounding error while enabling rapid transfer across reward functions. We present a practical instantiation of GOMs using diffusion models and show their efficacy as a new class of transferable models, both theoretically and empirically across various simulated robotics problems. Videos and code at https://weirdlabuw.github.io/gom/.
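
Successor features turn policy evaluation for a new task into linear regression. If rewards satisfy $r \approx \phi^\top w$, then $Q(s,a) = \psi(s,a)^\top w$, where $\psi$ are the successor features. The sketch below shows only this reduction, assuming the features `phi` and successor features `psi` are given as NumPy arrays. It does not implement the diffusion-based GOM model or its policy, and the greedy `best_action` helper is an illustrative addition.

```python
import numpy as np

def fit_reward_weights(phi, rewards):
    """Least-squares fit of w in rewards ~ phi @ w (the linear reward regression)."""
    w, *_ = np.linalg.lstsq(phi, rewards, rcond=None)
    return w

def q_values(psi, w):
    """Q = psi @ w: each row of psi holds the successor features of one candidate."""
    return psi @ w

def best_action(psi_per_action, w):
    """Greedy choice among candidate actions under the fitted reward weights."""
    return int(np.argmax(q_values(psi_per_action, w)))
```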

Translation date: 2024-05-30 23:11:33, Publication date: 2024-05-28

# Efficient Algorithms for Regularized Nonnegative Scale-invariant Low-rank Approximation Models

Efficient Algorithms for Regularized Nonnegative Scale-invariant Low-rank Approximation Models ( http://arxiv.org/abs/2403.18517v2 )

License: see link

Jeremy E. Cohen, Valentin Leplat,

Regularized nonnegative low-rank approximations such as sparse Nonnegative Matrix Factorization or sparse Nonnegative Tucker Decomposition are an important branch of dimensionality reduction models with enhanced interpretability. However, from a practical perspective, the choice of regularizers and regularization coefficients, as well as the design of efficient algorithms, is challenging because of the multifactor nature of these models and the lack of theory to back these choices. This paper aims to address these issues. By studying a more general model called the Homogeneous Regularized Scale-Invariant model, we prove that the scale-invariance inherent to low-rank approximation models causes an implicit regularization with both unexpected beneficial and detrimental effects. This observation allows us to better understand the effect of regularization functions in low-rank approximation models, to guide the choice of the regularization hyperparameters, and to design balancing strategies to enhance the convergence speed of dedicated optimization algorithms. Some of these results were already known but restricted to specific instances of regularized low-rank approximations. We also derive a generic Majorization-Minimization algorithm that handles many regularized nonnegative low-rank approximations, with convergence guarantees. We showcase our contributions on sparse Nonnegative Matrix Factorization, ridge-regularized Canonical Polyadic decomposition and sparse Nonnegative Tucker Decomposition.
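
The scale-invariance argument can be seen on a single rank-one term $w h^\top$. Rescaling to $(cw, h/c)$ leaves the fit unchanged but changes the penalty. Take an $\ell_1$ penalty on $w$ and a squared $\ell_2$ penalty on $h$ (an assumed example pair). The penalty then becomes $\lambda c\|w\|_1 + \mu\|h\|_2^2/c^2$, which is minimized at $c^\star = \left(2\mu\|h\|_2^2/(\lambda\|w\|_1)\right)^{1/3}$. The sketch applies that closed-form balancing step, assuming $w \neq 0$. It illustrates the idea only and is not the paper's Majorization-Minimization algorithm.

```python
def penalty(w, h, lam, mu):
    """lam * ||w||_1 + mu * ||h||_2^2 for one rank-one component."""
    return lam * sum(abs(x) for x in w) + mu * sum(x * x for x in h)

def optimal_scale(w, h, lam, mu):
    """c* minimizing lam*c*||w||_1 + mu*||h||^2 / c^2 over c > 0 (needs w != 0)."""
    a = sum(abs(x) for x in w)
    b = sum(x * x for x in h)
    return (2.0 * mu * b / (lam * a)) ** (1.0 / 3.0)

def balance(w, h, lam, mu):
    """Rescale (w, h) -> (c w, h / c): the product w h^T is unchanged."""
    c = optimal_scale(w, h, lam, mu)
    return [c * x for x in w], [x / c for x in h]
```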

Translation date: 2024-05-30 23:01:49, Publication date: 2024-05-28

# COVID-19 Detection Based on Blood Test Parameters using Various Artificial Intelligence Methods

COVID-19 Detection Based on Blood Test Parameters using Various Artificial Intelligence Methods ( http://arxiv.org/abs/2404.02348v2 )

License: see link

Kavian Khanjani, Seyed Rasoul Hosseini, Hamid Taheri, Shahrzad Shashaani, Mohammad Teshnehlab,

In 2019, the world faced a new challenge: COVID-19, a disease caused by the novel coronavirus SARS-CoV-2. The virus rapidly spread across the globe, leading to a high mortality rate, which prompted health organizations to take measures to control its transmission. Early disease detection is crucial in the treatment process, and computer-based automatic detection systems have been developed to aid in this effort. These systems often rely on artificial intelligence (AI) approaches such as machine learning, neural networks, fuzzy systems, and deep learning to classify diseases. This study aimed to differentiate COVID-19 patients from others using self-categorizing classifiers and employing various AI methods. This study used two datasets: blood test samples and radiography images. The blood test samples, obtained from San Raphael Hospital, include two classes of individuals: those with COVID-19 and those with non-COVID diseases. For these samples, the best results were achieved with an ensemble method combining a neural network and two machine learning methods. The results showed that this approach to COVID-19 diagnosis is cost-effective and provides results in a shorter amount of time than other methods. The proposed model achieved an accuracy of 94.09% on this dataset. Second, the radiographic images were divided into four classes: normal, viral pneumonia, ground glass opacity, and COVID-19 infection. These images were used for segmentation and classification: the lung lobes were extracted from the images and then categorized into specific classes. We achieved an accuracy of 91.1% on the image dataset. Overall, this study highlights the potential of AI in detecting and managing COVID-19 and underscores the importance of continued research and development in this field.
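
The abstract does not say how the neural network and the two machine learning models are combined. One common option is hard majority voting, sketched below under that assumption. The inputs are per-sample class predictions from each model. Ties can occur with an even number of models or more than two classes, and they are resolved in favour of the label encountered first, which is how `Counter.most_common` orders equal counts. The function name is illustrative.

```python
from collections import Counter

def hard_vote(*prediction_lists):
    """Majority label per sample across several models' prediction lists."""
    return [Counter(preds).most_common(1)[0][0] for preds in zip(*prediction_lists)]
```

For example, votes from a neural network, a random forest and an SVM on three samples would be combined position by position.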

Translation date: 2024-05-30 22:52:03, Publication date: 2024-05-28

# Two-level adiabatic transition probability for small avoided crossings generated by tangential intersections

Two-level adiabatic transition probability for small avoided crossings generated by tangential intersections ( http://arxiv.org/abs/2404.17777v2 )

License: see link

Kenta Higuchi, Takuya Watanabe,

In this paper, the asymptotic behaviors of the transition probability for two-level avoided crossings are studied under the limit where two parameters (adiabatic parameter and energy gap parameter) tend to zero. This is a continuation of our previous works where avoided crossings are generated by tangential intersections and obey a non-adiabatic regime. The main results elucidate not only the asymptotic expansion of transition probability but also a quantum interference caused by several avoided crossings and a coexistence of two-parameter regimes arising from different vanishing orders.

Translation date: 2024-05-30 22:42:17, Publication date: 2024-05-28

# A Symplectic Analysis of Alternating Mirror Descent

A Symplectic Analysis of Alternating Mirror Descent ( http://arxiv.org/abs/2405.03472v2 )

License: see link

Jonas Katona, Xiuyuan Wang, Andre Wibisono,

Motivated by understanding the behavior of the Alternating Mirror Descent (AMD) algorithm for bilinear zero-sum games, we study the discretization of continuous-time Hamiltonian flow via the symplectic Euler method. We provide a framework for analysis using results from Hamiltonian dynamics, Lie algebra, and symplectic numerical integrators, with an emphasis on the existence and properties of a conserved quantity, the modified Hamiltonian (MH), for the symplectic Euler method. We compute the MH in closed-form when the original Hamiltonian is a quadratic function, and show that it generally differs from the other conserved quantity known previously in that case. We derive new error bounds on the MH when truncated at orders in the stepsize in terms of the number of iterations, $K$, and use these bounds to show an improved $\mathcal{O}(K^{1/5})$ total regret bound and an $\mathcal{O}(K^{-4/5})$ duality gap of the average iterates for AMD. Finally, we propose a conjecture which, if true, would imply that the total regret for AMD scales as $\mathcal{O}\left(K^{\varepsilon}\right)$ and the duality gap of the average iterates as $\mathcal{O}\left(K^{-1+\varepsilon}\right)$ for any $\varepsilon>0$, and we can take $\varepsilon=0$ upon certain convergence conditions for the MH.
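
With the Euclidean mirror map and no constraints, alternating mirror descent on the scalar bilinear game $\min_x \max_y xy$ reduces to alternating gradient descent-ascent. This is the symplectic Euler discretization of the flow $\dot{x} = -y$, $\dot{y} = x$. For this toy case, direct substitution shows that $x^2 + y^2 - \eta xy$ is exactly conserved by the iteration, while $x^2 + y^2$ (the continuous-time invariant) is not. The sketch checks this numerically. The toy game and the conserved quantity are an illustration chosen here and are not claimed to match the closed-form MH derived in the paper.

```python
def alternating_gda(x, y, eta, steps):
    """Alternating updates for f(x, y) = x * y; returns the whole trajectory."""
    traj = [(x, y)]
    for _ in range(steps):
        x = x - eta * y        # descent step for the min player
        y = y + eta * x        # ascent step for the max player, using the new x
        traj.append((x, y))
    return traj

def conserved(x, y, eta):
    """Quantity left exactly invariant by one alternating step."""
    return x * x + y * y - eta * x * y
```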

Translation date: 2024-05-30 22:42:17, Publication date: 2024-05-28

# Asynchronous Federated Stochastic Optimization for Heterogeneous Objectives Under Arbitrary Delays

Asynchronous Federated Stochastic Optimization for Heterogeneous Objectives Under Arbitrary Delays ( http://arxiv.org/abs/2405.10123v2 )

License: see link

Charikleia Iakovidou, Kibaek Kim,

Federated learning (FL) was recently proposed to securely train models with data held over multiple locations ("clients") under the coordination of a central server. Two major challenges hindering the performance of FL algorithms are long training times caused by straggling clients, and a decline in model accuracy under non-iid local data distributions ("client drift"). In this work, we propose and analyze Asynchronous Exact Averaging (AREA), a new stochastic (sub)gradient algorithm that utilizes asynchronous communication to speed up convergence and enhance scalability, and employs client memory to correct the client drift caused by variations in client update frequencies. Moreover, AREA is, to the best of our knowledge, the first method that is guaranteed to converge under arbitrarily long delays, without the use of delay-adaptive stepsizes, and (i) for strongly convex, smooth functions, asymptotically converges to an error neighborhood whose size depends only on the variance of the stochastic gradients used with respect to the number of iterations, and (ii) for convex, non-smooth functions, matches the convergence rate of the centralized stochastic subgradient method up to a constant factor, which depends on the average of the individual client update frequencies instead of their minimum (or maximum). Our numerical results validate our theoretical analysis and indicate AREA outperforms state-of-the-art methods when local data are highly non-iid, especially as the number of clients grows.
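
The client-memory idea can be illustrated with a server that stores the latest update from every client and always returns the equal-weight average of the stored updates. That way, a fast client that reports often does not dominate the model. This is a simplified sketch in the spirit of such memory-based averaging, assuming updates arrive as plain vectors; the class name is hypothetical. It is not the AREA algorithm itself, which also specifies the local stochastic (sub)gradient steps and stepsizes.

```python
class MemoryAveragingServer:
    """Keeps the most recent update from every client and returns their average."""

    def __init__(self, num_clients, dim):
        self.slots = [[0.0] * dim for _ in range(num_clients)]
        self.total = [0.0] * dim          # running sum of all stored slots
        self.n = num_clients

    def receive(self, client_id, update):
        """Replace this client's slot (no double counting) and return the new average."""
        old = self.slots[client_id]
        self.total = [s - o + u for s, o, u in zip(self.total, old, update)]
        self.slots[client_id] = list(update)
        return [s / self.n for s in self.total]
```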

Translation date: 2024-05-30 22:32:31, Publication date: 2024-05-28

# Infrastructure Engineering: A Still Missing, Undervalued Role in the Research Ecosystem

Infrastructure Engineering: A Still Missing, Undervalued Role in the Research Ecosystem ( http://arxiv.org/abs/2405.10473v2 )

License: see link

Vanessa Sochat,

Research has become increasingly reliant on software, which serves as the driving force behind bioinformatics, high performance computing, physics, machine learning and artificial intelligence, to name a few. While substantial progress has been made in advocating for the research software engineer, a kind of software engineer who typically works directly on the software and associated assets that go into research, little attention has been paid to the workforce behind research infrastructure and innovation, namely compiler and compatibility tool development, orchestration and scheduling infrastructure, developer environments, container technologies, and workflow managers. As economic incentives move toward different models of cloud computing, innovation is required to develop new paradigms that represent the best of both worlds, an effort called "converged computing," and the need for such a role is therefore not just ideal, but essential for the continued success of science. While scattered staff in non-traditional roles have found time to work on some facets of this space, the lack of a larger workforce, and of incentives to support it, has caused the scientific community to fall behind. In this article, we highlight the importance of this missing layer, providing examples of how the missing role of infrastructure engineer has led to inefficiencies in the interoperability, portability, and reproducibility of science. We suggest that an inability to allocate, provide resources for, and sustain individuals to work explicitly on these technologies could lead to futures that are sub-optimal for the continued success of our scientific communities.

Translation date: 2024-05-30 22:32:31, Publication date: 2024-05-28

# Limits of Theory of Mind Modelling in Dialogue-Based Collaborative Plan Acquisition

Limits of Theory of Mind Modelling in Dialogue-Based Collaborative Plan Acquisition ( http://arxiv.org/abs/2405.12621v2 )

License: see link

Matteo Bortoletto, Constantin Ruhdorfer, Adnen Abdessaied, Lei Shi, Andreas Bulling,

Recent work on dialogue-based collaborative plan acquisition (CPA) has suggested that Theory of Mind (ToM) modelling can improve missing knowledge prediction in settings with asymmetric skill-sets and knowledge. Although ToM was claimed to be important for effective collaboration, its real impact on this novel task remains under-explored. By representing plans as graphs and by exploiting task-specific constraints we show that, as performance on CPA nearly doubles when predicting one's own missing knowledge, the improvements due to ToM modelling diminish. This phenomenon persists even when evaluating existing baseline methods. To better understand the relevance of ToM for CPA, we report a principled performance comparison of models with and without ToM features. Results across different models and ablations consistently suggest that learned ToM features are indeed more likely to reflect latent patterns in the data with no perceivable link to ToM. This finding calls for a deeper understanding of the role of ToM in CPA and beyond, as well as new methods for modelling and evaluating mental states in computational collaborative agents.

Translation date: 2024-05-30 22:22:47, Publication date: 2024-05-28

# Pytorch-Wildlife: A Collaborative Deep Learning Framework for Conservation

Pytorch-Wildlife: A Collaborative Deep Learning Framework for Conservation ( http://arxiv.org/abs/2405.12930v2 )

License: see link

Andres Hernandez, Zhongqi Miao, Luisa Vargas, Rahul Dodhia, Juan Lavista,

The alarming decline in global biodiversity, driven by various factors, underscores the urgent need for large-scale wildlife monitoring. In response, scientists have turned to automated deep learning methods for data processing in wildlife monitoring. However, applying these advanced methods in real-world scenarios is challenging due to their complexity and the need for specialized knowledge, primarily because of technical challenges and interdisciplinary barriers. To address these challenges, we introduce Pytorch-Wildlife, an open-source deep learning platform built on PyTorch. It is designed for creating, modifying, and sharing powerful AI models. This platform emphasizes usability and accessibility, making it accessible to individuals with limited or no technical background. It also offers a modular codebase to simplify feature expansion and further development. Pytorch-Wildlife offers an intuitive, user-friendly interface, accessible through local installation or Hugging Face, for animal detection and classification in images and videos. As two real-world applications, Pytorch-Wildlife has been utilized to train animal classification models for species recognition in the Amazon Rainforest and for invasive opossum recognition in the Galapagos Islands. The Opossum model achieves 98% accuracy, and the Amazon model has 92% recognition accuracy for 36 animals in 90% of the data. As Pytorch-Wildlife evolves, we aim to integrate more conservation tasks, addressing various environmental challenges. Pytorch-Wildlife is available at https://github.com/microsoft/CameraTraps.

Translation date: 2024-05-30 22:22:47, Publication date: 2024-05-28

# FairLENS: Assessing Fairness in Law Enforcement Speech Recognition

FairLENS: Assessing Fairness in Law Enforcement Speech Recognition ( http://arxiv.org/abs/2405.13166v2 )

License: see link

Yicheng Wang, Mark Cusick, Mohamed Laila, Kate Puech, Zhengping Ji, Xia Hu, Michael Wilson, Noah Spitzer-Williams, Bryan Wheeler, Yasser Ibrahim,

Automatic speech recognition (ASR) techniques have become powerful tools, enhancing efficiency in law enforcement scenarios. To ensure fairness for demographic groups in different acoustic environments, ASR engines must be tested across a variety of speakers in realistic settings. However, describing the fairness discrepancies between models with confidence remains a challenge. Meanwhile, most public ASR datasets are insufficient to perform a satisfying fairness evaluation. To address these limitations, we built FairLENS, a systematic fairness evaluation framework. We propose a novel and adaptable evaluation method to examine the fairness disparity between different models. We also collected a fairness evaluation dataset covering multiple scenarios and demographic dimensions. Leveraging this framework, we conducted fairness assessments on one open-source and 11 commercially available state-of-the-art ASR models. Our results reveal that certain models exhibit more biases than others, serving as a fairness guideline for users to make informed choices when selecting ASR models for a given real-world scenario. We further explored model biases towards specific demographic groups and observed that shifts in the acoustic domain can lead to the emergence of new biases.
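
One simple way to quantify a fairness disparity between demographic groups is to compute the word error rate (WER) per group and report the gap between the worst and the best group. The sketch below assumes each sample is labelled with a group and averages per-utterance WER within each group. This metric is an illustrative assumption and not necessarily the evaluation method used in FairLENS.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            # deletion, insertion, substitution (cost 0 if words match)
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h))
        prev = cur
    return prev[-1] / max(len(ref), 1)

def group_disparity(samples):
    """samples: list of (group, reference, hypothesis).
    Returns (mean WER per group, max-minus-min gap across groups)."""
    per_sample = {}
    for group, ref, hyp in samples:
        per_sample.setdefault(group, []).append(word_error_rate(ref, hyp))
    per_group = {g: sum(v) / len(v) for g, v in per_sample.items()}
    return per_group, max(per_group.values()) - min(per_group.values())
```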

翻訳日:2024-05-30 22:22:47 公開日:2024-05-28

# CamemBERT-bioを用いた臨床記録中の数値の多目的表現

Multi-objective Representation for Numbers in Clinical Narratives Using CamemBERT-bio ( http://arxiv.org/abs/2405.18448v1 )

ライセンス: Link先を確認

Boammani Aser Lompo, Thanh-Dung Le,

(参考訳) 本研究では,CamemBERT-bioを用いて,医学文書から抽出した数値を7つの異なる生理カテゴリーに分類することを目的とした。従来の研究は、このようなタスクではトランスフォーマーベースのモデルが従来のNLPモデルほど良好に機能しない可能性を示唆していた。CamemBERT-bioの性能を向上させるために,キーワード埋め込みをモデルに組み込むことと,テキストからすべての数値データを除外して数値に依存しない戦略を採用するという,2つの主要な工夫を導入した。ラベル埋め込み手法の実装は注意機構を洗練させ,「数値ブラインド」データセットを使用する手法は文脈中心の学習を促進することを目的としている。我々の研究のもう1つの重要な要素は、抽出された数値データの臨界度を判定することである。これを実現するために、値が確立された標準範囲内に収まるかどうかを検証するという簡単なアプローチを用いた。結果は有望であり,CamemBERT-bioの有効性は大幅に向上し,F1スコア0.89で従来手法を上回った。これは従来のアプローチの$F_1$スコア0.73から20%以上、最先端のアプローチの$F_1$スコア0.82から9%以上の向上に相当する。これらはすべて、小規模で不均衡な訓練データセットを用いたにもかかわらず達成された。

This research aims to classify numerical values extracted from medical documents across seven distinct physiological categories, employing CamemBERT-bio. Previous studies suggested that transformer-based models might not perform as well as traditional NLP models in such tasks. To enhance CamemBERT-bio's performance, we introduce two main innovations: integrating keyword embeddings into the model and adopting a number-agnostic strategy by excluding all numerical data from the text. The implementation of label embedding techniques refines the attention mechanisms, while the technique of using a 'numerical-blind' dataset aims to bolster context-centric learning. Another key component of our research is determining the criticality of extracted numerical data. To achieve this, we utilized a simple approach that involves verifying whether the value falls within the established standard ranges. Our findings are encouraging, showing substantial improvements in the effectiveness of CamemBERT-bio, surpassing conventional methods with an F1 score of 0.89. This represents an over 20% increase over the 0.73 $F_1$ score of traditional approaches and an over 9% increase over the 0.82 $F_1$ score of state-of-the-art approaches. All this was achieved despite using small and imbalanced training datasets.
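The two preprocessing ideas in this abstract can each be shown in a few lines. The sketch makes two assumptions:

- The 'numerical-blind' text replaces numbers with a placeholder token. The paper says numbers are excluded; masking rather than deleting them is an assumption.
- Criticality is a plain inclusive range check.

The category names and reference ranges are hypothetical and serve only as an illustration.

```python
import re

# Hypothetical reference ranges for illustration only; not taken from the paper.
NORMAL_RANGES = {
    "heart_rate": (60.0, 100.0),   # beats per minute
    "temperature": (36.1, 37.8),   # degrees Celsius
}

# Integers and decimals written with either '.' or ',' as the separator.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)?")

def mask_numbers(text: str, token: str = "[NUM]") -> str:
    """Replace every numeric literal with a placeholder ('numerical-blind' text)."""
    return NUMBER_RE.sub(token, text)

def is_critical(category: str, value: float) -> bool:
    """Flag a value as critical when it falls outside its category's range."""
    low, high = NORMAL_RANGES[category]
    return not (low <= value <= high)
```

With masking, the classifier still sees that a number occupied that position but not its magnitude. Deleting the number outright would also remove that positional cue.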

翻訳日:2024-05-30 22:22:47 公開日:2024-05-28

# アダプティブ・マルチスケール網膜診断:転移学習とシャムネットワークを活用した包括的眼底多疾患検出のためのハイブリッドトリオモデルアプローチ

Adaptive Multiscale Retinal Diagnosis: A Hybrid Trio-Model Approach for Comprehensive Fundus Multi-Disease Detection Leveraging Transfer Learning and Siamese Networks ( http://arxiv.org/abs/2405.18449v1 )

ライセンス: Link先を確認

Yavuz Selim Inan,

(参考訳) WHOは、世界中で22億人以上が中間透光体混濁(メディアヘイズ)、緑内障、ドルーゼンなどの視覚障害に苦しんでいると発表した。これらのうち少なくとも10億件は予防または治療が可能であったにもかかわらず、貧困、専門医の不足、眼科医による不正確な眼底診断、あるいは稀な疾患であることを理由に未対処のままである。これに対処するため、本研究では一般的な眼疾患と稀な眼疾患の計12種類を正確に診断するハイブリッドトリオネットワークモデルアルゴリズムを開発した。このアルゴリズムは3,200枚の眼底画像からなるRFMiDデータセットとBinary Relevance法を用いて疾患を個別に検出し、拡張性を確保するとともに誤った相関を回避する。性能を最適化するために微調整されたハイパーパラメータを組み込んだ各検出器は、古典的な転移学習CNNモデル、二段階CNNモデル、シャムネットワークの3つの特徴成分から構成される。診断は、このトリオモデルで抽出された特徴を用い、アンサンブル機械学習アルゴリズムによって行われた。提案モデルは平均精度97%、AUCスコア0.96を達成した。過去のベンチマーク研究と比較すると、ほとんどの疾患でF1スコアが10%以上向上した。さらに、シャムネットワークを用いることで、過去の研究では信頼度が低いために予測できなかった視神経乳頭蒼白などの疾患の予測に成功した。本診断ツールは、一般的な疾患と稀な疾患の両方の早期発見を世界的に普及させるための、安定的で適応的、費用対効果が高く、効率的で利用しやすく、高速なソリューションを提供する。

WHO has declared that more than 2.2 billion people worldwide are suffering from visual disorders, such as media haze, glaucoma, and drusen. At least 1 billion of these cases could have been either prevented or successfully treated, yet they remain unaddressed due to poverty, a lack of specialists, inaccurate ocular fundus diagnoses by ophthalmologists, or the presence of a rare disease. To address this, this research developed the Hybrid Trio-Network Model Algorithm for accurately diagnosing 12 distinct common and rare eye diseases. This algorithm utilized the RFMiD dataset of 3,200 fundus images and the Binary Relevance Method to detect diseases separately, ensuring expandability and avoiding incorrect correlations. Each detector, incorporating finely tuned hyperparameters to optimize performance, consisted of three feature components: a classical transfer learning CNN model, a two-stage CNN model, and a Siamese Network. The diagnosis was made using features extracted through this Trio-Model with ensemble machine learning algorithms. The proposed model achieved an average accuracy of 97% and an AUC score of 0.96. Compared to past benchmark studies, an increase of over 10% in the F1-score was observed for most diseases. Furthermore, using the Siamese Network, the model successfully made predictions in diseases like optic disc pallor, which past studies failed to predict due to low confidence. This diagnostic tool presents a stable, adaptive, cost-effective, efficient, accessible, and fast solution for globalizing early detection of both common and rare diseases.
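Binary relevance splits a multi-label problem into one independent binary problem per label. This is why adding a new disease only requires training one more detector. The minimal NumPy sketch below uses a toy nearest-centroid learner in place of the paper's CNN/Siamese feature extractors and ensemble classifiers. Both class names are hypothetical, and `NearestCentroid` here is not scikit-learn's class of the same name.

```python
import numpy as np

class NearestCentroid:
    """Toy binary learner: predict 1 when a sample is closer to the positive-class mean."""
    def fit(self, X, y):
        self.pos = X[y == 1].mean(axis=0)
        self.neg = X[y == 0].mean(axis=0)
        return self

    def predict(self, X):
        d_pos = np.linalg.norm(X - self.pos, axis=1)
        d_neg = np.linalg.norm(X - self.neg, axis=1)
        return (d_pos < d_neg).astype(int)

class BinaryRelevance:
    """Train one independent binary classifier per label column of Y."""
    def __init__(self, make_learner):
        self.make_learner = make_learner

    def fit(self, X, Y):
        # Each label gets its own learner; labels never share parameters.
        self.learners = [self.make_learner().fit(X, Y[:, k]) for k in range(Y.shape[1])]
        return self

    def predict(self, X):
        # Stack the per-label 0/1 predictions back into an (n_samples, n_labels) matrix.
        return np.column_stack([m.predict(X) for m in self.learners])
```

Because the labels are treated independently, this decomposition cannot exploit label co-occurrence. The abstract presents that as avoiding incorrect correlations.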

翻訳日:2024-05-30 22:22:47 公開日:2024-05-28

# スペクトルモードマッチングによる普遍量子周波数コム測定

Universal quantum frequency comb measurements by spectral mode-matching ( http://arxiv.org/abs/2405.18454v1 )

ライセンス: Link先を確認

Bakhao Dioum, Virginia D'Auria, Alessandro Zavatta, Olivier Pfister, Giuseppe Patera,

(参考訳) マルチモード干渉計の周波数コムは、場に符号化された量子情報に対して卓越したスケーラビリティの可能性を提供する。しかし、標準的な場の検出法であるホモダイン検出では、一部のスペクトル直交成分(および局部発振器(LO)に対するその非対称性)に到達できないため、コム全体の量子情報にアクセスすることができない。ここでは、多モード量子光源に対して任意のワンショット測定を行うための初めての一般的なアプローチを提示する。これは光量子コンピューティングに必要であり、パルス整形したLOを用いたホモダイン検出では不可能なものである。このアプローチでは、メモリ効果を伴う干渉計測と解釈できるスペクトルモードマッチングを用いる。完全な形式論を導出し、マイクロキャビティアレイによる実装を提案する。

The frequency comb of a multimode interferometer offers exceptional scalability potential for field-encoded quantum information. However, the staple field detection method, homodyne detection, cannot access quantum information in the whole comb because some spectral quadratures (and their asymmetries with respect to the local oscillator, LO) are out of reach. We present here the first general approach to make arbitrary, one-shot measurements of a multimode quantum optical source, something that is required for photonic quantum computing and is not possible when using homodyne detection with a pulse-shaped LO. This approach uses spectral mode-matching, which can be understood as interferometry with a memory effect. We derive a complete formalism and propose an implementation using microcavity arrays.

翻訳日:2024-05-30 22:22:47 公開日:2024-05-28

# 反復ガウス過程におけるハイパーパラメータ最適化のための線形系ソルバーの改良

Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes ( http://arxiv.org/abs/2405.18457v1 )

ライセンス: Link先を確認

Jihao Andreas Lin, Shreyas Padhy, Bruno Mlodozeniec, Javier Antorán, José Miguel Hernández-Lobato,

(参考訳) 非常に大規模なデータセットへのハイパーパラメータ最適化のスケーリングは、ガウス過程コミュニティにおいて未解決の問題である。本稿では、共役勾配法、交互射影法、確率的勾配降下法などの線形系ソルバーを用いて周辺尤度の勾配を推定する反復法に焦点を当てる。ソルバーによらず適用可能な3つの重要な改善点について論じる。(i) パスワイズ勾配推定器:必要なソルバー反復回数を減らし、予測にかかる計算コストを償却する。(ii) 前ステップの解による線形系ソルバーのウォームスタート:無視できるバイアスと引き換えにソルバーの収束を高速化する。(iii) 限られた計算予算での線形系ソルバーの早期停止:ウォームスタートと相乗効果があり、ソルバーの進捗を複数の周辺尤度ステップにわたって蓄積できる。これらの手法により、許容誤差まで解く場合には最大$72\times$の高速化が得られ、早期停止の場合には平均残差ノルムが最大$7\times$減少する。

Scaling hyperparameter optimisation to very large datasets remains an open problem in the Gaussian process community. This paper focuses on iterative methods, which use linear system solvers, like conjugate gradients, alternating projections or stochastic gradient descent, to construct an estimate of the marginal likelihood gradient. We discuss three key improvements which are applicable across solvers: (i) a pathwise gradient estimator, which reduces the required number of solver iterations and amortises the computational cost of making predictions, (ii) warm starting linear system solvers with the solution from the previous step, which leads to faster solver convergence at the cost of negligible bias, (iii) early stopping linear system solvers after a limited computational budget, which synergises with warm starting, allowing solver progress to accumulate over multiple marginal likelihood steps. These techniques provide speed-ups of up to $72\times$ when solving to tolerance, and decrease the average residual norm by up to $7\times$ when stopping early.
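Improvements (ii) and (iii), warm starting and early stopping, can be sketched with a textbook conjugate gradients solver. This plain NumPy illustration assumes a dense symmetric positive-definite kernel matrix. It is not the paper's code and omits the pathwise estimator. The parameter `x0` carries over the previous solution, and `max_iters` is the compute budget.

```python
import numpy as np

def conjugate_gradients(A, b, x0=None, max_iters=100, tol=1e-8):
    """Solve A x = b for symmetric positive-definite A.

    x0 warm-starts the solver (e.g. with the previous step's solution);
    max_iters caps the compute budget (early stopping).
    Returns the final iterate and the number of iterations performed."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x          # residual of the starting point
    p = r.copy()           # first search direction
    rs = r @ r
    for it in range(max_iters):
        if np.sqrt(rs) <= tol * np.linalg.norm(b):
            return x, it   # converged to relative tolerance
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iters    # budget exhausted: return the current approximation
```

In a hyperparameter loop, one would pass the solution from the previous marginal likelihood step as `x0`. The kernel matrix changes only slightly between steps, so the solver usually starts near the new solution. Stopping after a fixed budget then lets progress accumulate across steps.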

翻訳日:2024-05-30 22:22:47 公開日:2024-05-28

# グレーボックス深部フォトニックニューラルネットワークのトレーニングのための非対称推定器

Asymmetrical estimator for training grey-box deep photonic neural networks ( http://arxiv.org/abs/2405.18458v1 )

ライセンス: Link先を確認

Yizhi Wang, Minjia Chen, Chunhui Yao, Jie Ma, Ting Yan, Richard Penty, Qixiang Cheng,

(参考訳) 物理ニューラルネットワーク(PNN)は、その高帯域幅で伝搬中のアナログ処理により、ニューラルネットワーク高速化の新たなパラダイムとなりつつある。推論におけるPNNの利点にもかかわらず、訓練は依然として課題である。物理変換に関する情報が不完全であるため、バックプロパゲーション(BP)による従来の勾配ベースの更新は機能しない。本稿では、PNN構造をグレーボックスとして扱う非対称訓練(AT)法を提案する。ATは、深層ニューラルネットワーク構造の最終層出力とニューロンのトポロジー的接続のみを知った状態で訓練を行い、物理的な制御-変換マッピングに関する情報を必要としない。我々は、未校正のフォトニック集積回路(PIC)で実装された深層グレーボックスPNN上でAT法を実験的に実証し、アイリス花データと改変MNIST手書き数字の分類精度を、ランダム推測レベルから理論的最大値近くまで改善した。また、MNIST、Fashion-MNIST、Kuzushiji-MNISTなどの様々なデータセットにおいて、ATがBPを一貫して上回る性能を示した。AT法は、最小限のハードウェアオーバーヘッドと少ない計算オーバーヘッドで訓練に成功し、物理計算の利点を十分に引き出すための頑健で軽量な訓練の代替手法となる。

Physical neural networks (PNNs) are emerging paradigms for neural network acceleration due to their high-bandwidth, in-propagation analogue processing. Despite the advantages of PNN for inference, training remains a challenge. The imperfect information about the physical transformation means that conventional gradient-based updates from backpropagation (BP) fail. Here, we present the asymmetrical training (AT) method, which treats the PNN structure as a grey box. AT performs training while only knowing the last layer output and neuron topological connectivity of a deep neural network structure, not requiring information about the physical control-transformation mapping. We experimentally demonstrated the AT method on deep grey-box PNNs implemented by uncalibrated photonic integrated circuits (PICs), improving the classification accuracy of Iris flower and modified MNIST hand-written digits from random guessing to near theoretical maximum. We also showcased the consistently enhanced performance of AT over BP for different datasets, including MNIST, Fashion-MNIST, and Kuzushiji-MNIST. The AT method demonstrated successful training with minimal hardware overhead and reduced computational overhead, serving as a robust light-weight training alternative to fully explore the advantages of physical computation.

翻訳日:2024-05-30 22:22:47 公開日:2024-05-28

# 空間依存性指標の情報理論的ルーツの探究

Probing the Information Theoretical Roots of Spatial Dependence Measures ( http://arxiv.org/abs/2405.18459v1 )

ライセンス: Link先を確認

Zhangyu Wang, Krzysztof Janowicz, Gengchen Mai, Ivan Majic,

(参考訳) 直感的には、空間依存性の指標とエントロピーの情報理論的指標との間には関係がある。例えば、空間データサンプルは平均して期待されるよりも少ない情報しか含まないと述べることで、空間データがなぜ特別なのかを直感的に説明できる。同様に、圧縮が容易な空間データ(例えばリモートセンシング画像)は、顕著な空間的自己相関を示す可能性も高い。空間情報理論の(非常に特異的な)中核概念を、広く使われている情報理論の言語で定式化することは、それらの違いと類似性に関する新たな視点を開き、より広範なAI/MLコミュニティなどとの学際的な協力も促進する。しかし興味深いことに、この直感的な関係は形式化と一般化が難しく、そのため先行研究は主に実験結果(例えば景観パターンの記述)に頼ってきた。本研究では,空間的自己相関、より具体的にはモランのIの情報理論的ルーツを,自己情報量(サプライザルとしても知られる)のレンズを通して探究し,形式的な証明と実験の両方を提供する。

Intuitively, there is a relation between measures of spatial dependence and information theoretical measures of entropy. For instance, we can provide an intuition of why spatial data is special by stating that, on average, spatial data samples contain less than expected information. Similarly, spatial data, e.g., remotely sensed imagery, that is easy to compress is also likely to show significant spatial autocorrelation. Formulating our (highly specific) core concepts of spatial information theory in the widely used language of information theory opens new perspectives on their differences and similarities and also fosters cross-disciplinary collaboration, e.g., with the broader AI/ML communities. Interestingly, however, this intuitive relation is challenging to formalize and generalize, leading prior work to rely mostly on experimental results, e.g., for describing landscape patterns. In this work, we will explore the information theoretical roots of spatial autocorrelation, more specifically Moran's I, through the lens of self-information (also known as surprisal) and provide both formal proofs and experiments.
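The sketch below makes the two quantities concrete. It computes global Moran's I on a regular grid, $I = \frac{N}{W}\frac{\sum_{i,j} w_{ij} z_i z_j}{\sum_i z_i^2}$ with $z_i = x_i - \bar{x}$, using binary rook (4-neighbour) weights $w_{ij}$ and their total $W$. It also computes the self-information $-\log_2 p$. The rook weighting is an assumption for illustration, and the sketch does not reproduce the paper's proofs connecting the two.

```python
import numpy as np

def morans_i(grid):
    """Global Moran's I on a 2-D grid with binary rook (4-neighbour) weights."""
    z = grid - grid.mean()
    n = z.size
    # Sum of w_ij * z_i * z_j over ordered neighbour pairs (each edge counted twice).
    cross = 2 * ((z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum())
    # Total weight W = number of ordered neighbour pairs.
    rows, cols = grid.shape
    w_total = 2 * (rows * (cols - 1) + (rows - 1) * cols)
    return (n / w_total) * cross / (z ** 2).sum()

def surprisal(p):
    """Self-information (in bits) of an outcome with probability p."""
    return -np.log2(p)
```

A perfect 0/1 checkerboard gives $I = -1$ because every neighbour pair disagrees. A smooth gradient gives a positive value.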

翻訳日:2024-05-30 22:22:47 公開日:2024-05-28

# アルゴリズムが不当なまま残る理由:アルゴリズム活動を取り巻く権力構造

Why Algorithms Remain Unjust: Power Structures Surrounding Algorithmic Activity ( http://arxiv.org/abs/2405.18461v1 )

ライセンス: Link先を確認

Andrew Balch,

(参考訳) アルゴリズムは私たちの社会生活においてますます重要な役割を果たしている。残念なことに、その過程でアルゴリズムはしばしば社会的不正義を永続させる。こうしたアルゴリズムによる不正義に対処する一般的な手段は、アルゴリズム改良主義、すなわちアルゴリズム自体をより公平で説明責任があり透明になるよう微調整することであった。これは称賛に値するものの、新たな分野である批判的アルゴリズム研究は、改良主義的アプローチがアルゴリズムを取り巻く権力構造を無視しているため、アルゴリズムによる不正義を抑制できていないことを示している。この権力構造を分析せよという批判的アルゴリズム研究からの呼びかけに応え、私はErik Olin Wrightが開発したフレームワークを用いて、アルゴリズム活動、すなわちアルゴリズムが社会の中で研究、開発、訓練、展開される方法を取り巻く権力の構成を検討する。アルゴリズム活動が不平等で非民主的、かつ持続不可能である理由は、それを形作る権力構造が社会的エンパワーメントではなく経済的エンパワーメントの構造であるからだと私は主張する。アルゴリズム活動が社会的に公正であるためには、この権力構成を変革し、アルゴリズムの反対側にいる人々に力を与える必要がある。この目的のために、アルゴリズム活動の文脈におけるWrightの共生的、間隙的、断絶的変革と、それらがアルゴリズムを用いて社会問題に取り組む仮想的な研究プロジェクトにどのように適用されうるかを探る。最後に、社会的に公正なアルゴリズム活動についての私のビジョンを示し、今後の研究が提案した変革を統合し、社会的エンパワーメントのための新たなメカニズムを開発するよう求める。

Algorithms play an increasingly significant role in our social lives. Unfortunately, they often perpetuate social injustices while doing so. The popular means of addressing these algorithmic injustices has been through algorithmic reformism: fine-tuning the algorithm itself to be more fair, accountable, and transparent. While commendable, the emerging discipline of critical algorithm studies shows that reformist approaches have failed to curtail algorithmic injustice because they ignore the power structure surrounding algorithms. Heeding calls from critical algorithm studies to analyze this power structure, I employ a framework developed by Erik Olin Wright to examine the configuration of power surrounding Algorithmic Activity: the ways in which algorithms are researched, developed, trained, and deployed within society. I argue that the reason Algorithmic Activity is unequal, undemocratic, and unsustainable is that the power structure shaping it is one of economic empowerment rather than social empowerment. For Algorithmic Activity to be socially just, we need to transform this power configuration to empower the people at the other end of an algorithm. To this end, I explore Wright's symbiotic, interstitial, and ruptural transformations in the context of Algorithmic Activity, as well as how they may be applied in a hypothetical research project that uses algorithms to address a social issue. I conclude with my vision for socially just Algorithmic Activity, asking that future work strive to integrate the proposed transformations and develop new mechanisms for social empowerment.

翻訳日:2024-05-30 22:13:00 公開日:2024-05-28

# 標準模型を超える物理のためのシンボリック回帰

Symbolic Regression for Beyond the Standard Model Physics ( http://arxiv.org/abs/2405.18471v1 )

ライセンス: Link先を確認

Shehu AbdusSalam, Steve Abel, Miguel Crispim Romao,

(参考訳) 標準模型を超える物理を研究するための強力なツールとして,シンボリック回帰を提案する。ベンチマークモデルとして、GUTスケールで定義された4次元パラメータ空間を持つ、いわゆる制約付き最小超対称標準模型を考える。理論のパラメータを用いて、3つの注目すべき低エネルギー観測量、すなわちヒッグス質量、ミューオンの異常磁気モーメントへの寄与、冷たい暗黒物質の残存密度を再現する一連の解析式を提供する。この手法の威力を示すために,グローバルフィット解析において記号表現を用いてパラメータの事後確率密度を導出する。これは従来の手法と比較して極めて高速に得られる。

We propose symbolic regression as a powerful tool for studying Beyond the Standard Model physics. As a benchmark model, we consider the so-called Constrained Minimal Supersymmetric Standard Model, which has a four-dimensional parameter space defined at the GUT scale. We provide a set of analytical expressions that reproduce three low-energy observables of interest in terms of the parameters of the theory: the Higgs mass, the contribution to the anomalous magnetic moment of the muon, and the cold dark matter relic density. To demonstrate the power of the approach, we employ the symbolic expressions in a global-fit analysis to derive the posterior probability densities of the parameters, which are obtained extremely rapidly in comparison with conventional methods.

翻訳日:2024-05-30 22:13:00 公開日:2024-05-28

# 有限温度ライドバーグアレイ:量子相と絡み合い特性

Finite-temperature Rydberg arrays: quantum phases and entanglement characterization ( http://arxiv.org/abs/2405.18477v1 )

ライセンス: Link先を確認

Nora Reinić, Daniel Jaschke, Darvin Wanisch, Pietro Silvi, Simone Montangero,

(参考訳) アナログ量子シミュレータの最も有力なプラットフォームの一つとして、Rydberg原子配列は量子相と相転移を探索するための有望なツールである。1次元Rydberg系の基底状態特性はすでに徹底的に調べられているが、本研究では解析を有限温度の場合へと拡張する。この目的のために、熱平衡にある量子多体状態を構築するためのテンソルネットワークに基づく数値ツールボックスを開発し、それを用いて古典的相関と絡み合いモノトンを調べる。有限系サイズでは、熱ゆらぎによって秩序相が連続的に縮小することを明確に観測した。さらに、半系二分割に対する形成の絡み合いと絡み合いネガティビティを調べることにより、絡み合いの共形スケーリング則がゼロ温度の臨界点から低温領域へと広がることを数値的に確認する。

As one of the most prominent platforms for analog quantum simulators, Rydberg atom arrays are a promising tool for exploring quantum phases and transitions. While the ground state properties of one-dimensional Rydberg systems are already thoroughly examined, we extend the analysis towards the finite-temperature scenario. For this purpose, we develop a tensor network-based numerical toolbox for constructing the quantum many-body states at thermal equilibrium, which we exploit to probe classical correlations as well as entanglement monotones. We clearly observe ordered phases continuously shrinking due to thermal fluctuations at finite system sizes. Moreover, by examining the entanglement of formation and entanglement negativity of a half-system bipartition, we numerically confirm that a conformal scaling law of entanglement extends from the zero-temperature critical points into the low-temperature regime.

翻訳日:2024-05-30 22:13:00 公開日:2024-05-28

# サブ波長原子配列における集団的に増強された基底状態冷却

Collectively enhanced ground-state cooling in subwavelength atomic arrays ( http://arxiv.org/abs/2405.18482v1 )

ライセンス: Link先を確認

Oriol Rubies-Bigorda, Raphael Holzinger, Ana Asenjo-Garcia, Oriol Romero-Isart, Helmut Ritsch, Stefan Ostermann, Carlos Gonzalez-Ballestero, Susanne F. Yelin, Cosimo C. Rusconi,

(参考訳) 自由空間におけるサブ波長原子配列は、創発的な多体量子現象を探索する主要なプラットフォームになりつつある。これらの配列は強い光誘起双極子-双極子相互作用を特徴とし、線幅の狭いサブラジアントな集団共鳴を生じる。本研究では、これらの狭い集団共鳴を利用した、サブ波長配列に捕捉された原子のサイドバンド冷却方式を提案する。原子の内部自由度を断熱的に消去することで原子運動に対する有効マスター方程式を導出し、その予測を全系の数値シミュレーションで検証する。その結果、原子が異なるトラップ周波数を持つ場合、サブラジアント共鳴によって、双極子相互作用なしで達成可能な温度よりも低い温度まで原子アンサンブルを冷却できることが示された。注目すべきことに、個々の原子遷移がサイドバンド分解されない場合でも、狭い集団共鳴はサイドバンド分解されうる。このようなシナリオでは、光誘起双極子-双極子相互作用のみによって基底状態冷却が実現可能となる。このアプローチは、エミッターの高密度アンサンブルに基づく将来の量子技術に利用でき、運動制御の強化のために多体の協同的減衰を活用する道を開く。

Subwavelength atomic arrays in free space are becoming a leading platform for exploring emergent many-body quantum phenomena. These arrays feature strong light-induced dipole-dipole interactions, resulting in subradiant collective resonances characterized by narrowed linewidths. In this work, we present a sideband cooling scheme for atoms trapped in subwavelength arrays that utilizes these narrow collective resonances. We derive an effective master equation for the atomic motion by adiabatically eliminating the internal degrees of freedom of the atoms, and validate its prediction with numerical simulations of the full system. Our results demonstrate that subradiant resonances enable the cooling of ensembles of atoms to temperatures lower than those achievable without dipole interactions, provided the atoms have different trap frequencies. Remarkably, narrow collective resonances can be sideband-resolved even when the individual atomic transition is not. In such scenarios, ground state cooling becomes feasible solely due to light-induced dipole-dipole interactions. This approach could be utilized for future quantum technologies based on dense ensembles of emitters, and paves the way towards harnessing many-body cooperative decay for enhanced motional control.

翻訳日:2024-05-30 22:13:00 公開日:2024-05-28

# オープンドメインテキスト駆動型マルチパーソン運動合成に向けて

Towards Open Domain Text-Driven Synthesis of Multi-Person Motions ( http://arxiv.org/abs/2405.18483v1 )

ライセンス: Link先を確認

Mengyi Shan, Lu Dong, Yutao Han, Yuan Yao, Tao Liu, Ifeoma Nwogu, Guo-Jun Qi, Mitch Hill,

(参考訳) 本研究は、テキスト記述から複数の人間による自然で多様な集団動作を生成することを目的としている。一人の人物のテキストからの動作生成は広く研究されているが、主に利用可能なデータセットが不足しているため、実環境(in-the-wild)のプロンプトから1人または2人を超える人物の動作を合成することは依然として困難である。本研究では、大規模な画像・動画データセットからポーズ情報を推定することで、人間のポーズと動作のデータセットを構築する。我々のモデルはトランスフォーマーベースの拡散フレームワークを使用しており、任意の人数やフレーム数を持つ複数のデータセットに対応する。実験では、多人数の静的ポーズの生成と多人数の動作系列の生成の両方を検討する。我々の知る限り、本手法は、多種多様なテキストプロンプトから多人数の動作系列を高い多様性と忠実度で生成する最初の手法である。

This work aims to generate natural and diverse group motions of multiple humans from textual descriptions. While single-person text-to-motion generation is extensively studied, it remains challenging to synthesize motions for more than one or two subjects from in-the-wild prompts, mainly due to the lack of available datasets. In this work, we curate human pose and motion datasets by estimating pose information from large-scale image and video datasets. Our models use a transformer-based diffusion framework that accommodates multiple datasets with any number of subjects or frames. Experiments explore both generation of multi-person static poses and generation of multi-person motion sequences. To our knowledge, our method is the first to generate multi-subject motion sequences with high diversity and fidelity from a large variety of textual prompts.

翻訳日:2024-05-30 22:13:00 公開日:2024-05-28

# 最小絡み合った典型的な熱状態を用いた分光と複素時間相関

Spectroscopy and complex-time correlations using minimally entangled typical thermal states ( http://arxiv.org/abs/2405.18484v1 )

ライセンス: Link先を確認

Zhenjiu Wang, Paul McClarty, Dobromila Dankova, Andreas Honecker, Alexander Wietek,

(参考訳) テンソルネットワーク状態は、強相関物理の諸側面を捉えることに大きな成功を収めてきた。しかし、非ゼロ温度での動的相関関数を得ることは、これらの手法を用いても一般に困難である。本稿では、最小絡み合い典型熱状態(METTS)を用いてこのような相関関数を計算する実用的な手法を導入する。我々の主要な手法は物理演算子の動的相関関数を実時間で直接計算するが、複素時間平面上で相関を評価する拡張も提案する。虚時間成分は絡み合いの成長速度を抑え、計算の困難さを大きく緩和するため、より大きな系サイズの研究が可能になる。物理的な相関関数を抽出するには、純粋な実時間発展の極限を取る必要がある。この情報を得るための2つの方法を提示する。(i) 複素時間の解析的相関関数と確率的解析接続法を組み合わせて実時間極限を得る方法、(ii) 数値的な解析接続を必要とせずに、所望の相関関数を漸近的に定量的に捉えるエルミート相関関数を用いる方法である。これらの数値手法が、2次元で相互作用するスピン1/2のモデルであるShastry-Sutherland模型の有限温度ダイナミクスを捉えることを示す。

Tensor network states have enjoyed great success at capturing aspects of strong correlation physics. However, obtaining dynamical correlators at non-zero temperatures is generically hard even using these methods. Here, we introduce a practical approach to computing such correlators using minimally entangled typical thermal states (METTS). While our primary method directly computes dynamical correlators of physical operators in real time, we propose extensions where correlations are evaluated in the complex-time plane. The imaginary time component bounds the rate of entanglement growth and strongly alleviates the computational difficulty, allowing the study of larger system sizes. To extract the physical correlator, one must take the limit of purely real-time evolution. We present two routes to obtaining this information: (i) via an analytic correlation function in complex time combined with a stochastic analytic continuation method to obtain the real-time limit, and (ii) via a Hermitian correlation function that asymptotically captures the desired correlation function quantitatively without requiring numerical analytic continuation. We show that these numerical techniques capture the finite-temperature dynamics of the Shastry-Sutherland model, a model of interacting spin-1/2 moments in two dimensions.

翻訳日:2024-05-30 22:13:00 公開日:2024-05-28

# 衛星画像における火山活動の異常検出

Anomaly detection for the identification of volcanic unrest in satellite imagery ( http://arxiv.org/abs/2405.18487v1 )

ライセンス: Link先を確認

Robert Gabriel Popescu, Nantheera Anantrasirichai, Juliet Biggs,

(参考訳) 衛星画像は噴火前に火山の変形を検出する可能性があるが、大量の画像が日常的に取得される一方で、火山の変形イベントを含むのはごくわずかである。手動検査はこれらの異常を見逃しかねず、教師付き学習でモデル化された自動システムは適切にラベル付けされたデータセットを必要とする。これらの課題に対処するために, 衛星データにおける教師なし深層学習を用いて, 火山変形を異常として識別する方法について検討した。我々の検出器はパッチ分布モデリング(PaDiM)に基づいており、検出性能は重み付けされた距離で向上し、より深い層の特徴をより重要視する。さらに,ノイズや不完全データを扱うための前処理手法を提案する。最終フレームワークは, 変形特性が異なる5つの火山で試験し, その性能を火山変形検出の教師付き学習法と比較した。

Satellite images have the potential to detect volcanic deformation prior to eruptions, but while a vast number of images are routinely acquired, only a small percentage contain volcanic deformation events. Manual inspection could miss these anomalies, and an automatic system modelled with supervised learning requires suitably labelled datasets. To tackle these issues, this paper explores the use of unsupervised deep learning on satellite data for the purpose of identifying volcanic deformation as anomalies. Our detector is based on Patch Distribution Modeling (PaDiM), and the detection performance is enhanced with a weighted distance, assigning greater importance to features from deeper layers. Additionally, we propose a preprocessing approach to handle noisy and incomplete data points. The final framework was tested on five volcanoes with different deformation characteristics, and its performance was compared against the supervised learning method for volcanic deformation detection.
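PaDiM scores a test patch by its Mahalanobis distance to a Gaussian fitted on features from normal images. The abstract says deeper-layer features are weighted more heavily but gives no formula. The sketch below assumes one simple reading: a weighted sum of per-layer Mahalanobis distances at a single patch position. The weights and function names are hypothetical. Adding a small `eps * I` to the covariance keeps it invertible, as PaDiM does.

```python
import numpy as np

def fit_gaussian(features, eps=0.01):
    """Fit a mean and regularised inverse covariance to normal-only features of shape (N, D)."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis(x, mean, cov_inv):
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def weighted_score(x_layers, gaussians, weights):
    """Combine per-layer Mahalanobis distances; larger weights for deeper layers
    (a hypothetical weighting scheme, not the paper's formula)."""
    return sum(w * mahalanobis(x, m, ci)
               for x, (m, ci), w in zip(x_layers, gaussians, weights))
```

In full PaDiM one Gaussian is fitted per patch position over the whole feature map. Repeating this per position yields an anomaly map rather than a single score.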

翻訳日:2024-05-30 22:13:00 公開日:2024-05-28

# 基底状態特性の予測:定数サンプル複雑度とディープラーニングアルゴリズム

Predicting Ground State Properties: Constant Sample Complexity and Deep Learning Algorithms ( http://arxiv.org/abs/2405.18489v1 )

ライセンス: Link先を確認

Marc Wanner, Laura Lewis, Chiranjib Bhattacharyya, Devdatt Dubhashi, Alexandru Gheorghiu,

(参考訳) 量子多体物理学における基本的な問題は、局所ハミルトニアンの基底状態を求めることである。最近の多くの研究により、基底状態を学習するための証明可能な効率的な機械学習(ML)アルゴリズムが与えられた。具体的には、[Huang et al. Science 2022] は、同じ物質相にあるハミルトニアンからサンプリングされたわずか$n^{\mathcal{O}(1)}$個のデータ点から、$n$量子ビットのギャップのある局所ハミルトニアン$H$の基底状態の性質を学習する手法を導入した。その後、[Lewis et al. Nature Communications 2024] により、$n$量子ビット系の幾何が既知の場合には$\mathcal{O}(\log n)$サンプルに改善された。本研究では、基底状態の性質を学習するために、系のサイズ$n$に依存しない一定のサンプル複雑性を達成する2つのアプローチを提案する。1つ目のアルゴリズムはLewis et al.が用いたMLモデルの簡単な修正からなり、事前に既知の注目する性質に適用される。2つ目のアルゴリズムは、性質の記述が未知の場合にも適用でき、深層ニューラルネットワークモデルである。ニューラルネットワークの性能を示す実験結果はこれまでにも示されてきたが、我々の知る限り、これは基底状態の性質を予測するニューラルネットワークモデルに対する初めての厳密なサンプル複雑性の上界である。また、従来の結果と比較して、本手法のスケーリングの改善を確認する数値実験を行った。

A fundamental problem in quantum many-body physics is that of finding ground states of local Hamiltonians. A number of recent works gave provably efficient machine learning (ML) algorithms for learning ground states. Specifically, [Huang et al. Science 2022] introduced an approach for learning properties of the ground state of an $n$-qubit gapped local Hamiltonian $H$ from only $n^{\mathcal{O}(1)}$ data points sampled from Hamiltonians in the same phase of matter. This was subsequently improved by [Lewis et al. Nature Communications 2024] to $\mathcal{O}(\log n)$ samples when the geometry of the $n$-qubit system is known. In this work, we introduce two approaches that achieve a constant sample complexity, independent of system size $n$, for learning ground state properties. Our first algorithm consists of a simple modification of the ML model used by Lewis et al. and applies to a property of interest known beforehand. Our second algorithm, which applies even if a description of the property is not known, is a deep neural network model. While empirical results showing the performance of neural networks have been demonstrated, to our knowledge, this is the first rigorous sample complexity bound on a neural network model for predicting ground state properties. We also perform numerical experiments that confirm the improved scaling of our approach compared to earlier results.

翻訳日:2024-05-30 22:13:00 公開日:2024-05-28

# LLMと記憶:著作権コンプライアンスの品質と特異性について

LLMs and Memorization: On Quality and Specificity of Copyright Compliance ( http://arxiv.org/abs/2405.18492v1 )

ライセンス: Link先を確認

Felix B Mueller, Rebekka Görge, Anna K Bernzen, Janna C Pirk, Maximilian Poretschkin,

(参考訳) 大規模言語モデル(LLM)における記憶化への懸念が高まっている。LLMは、著作権で保護された作品を含む訓練データの一部を容易に再現できることが示されている。これは既存の著作権法や欧州AI法に違反する可能性があるため、解決すべき重要な問題である。本研究では,欧州法を例として,LLMにおける潜在的な著作権侵害の程度を定量化する体系的な分析を提案する。従来の研究とは異なり、現実的なエンドユーザーシナリオで指示調整済みモデルを評価する。我々の分析は、ドイツ著作権サービス提供者法から借用した160文字という閾値の提案と、ファジィテキストマッチングアルゴリズムに基づいて、著作権を侵害する可能性のあるテキスト再現を特定する。著作権侵害対策の特異性は、著作権で保護されたデータとパブリックドメインのデータに対するモデルの振る舞いを比較することで分析する。保護されたテキストを生成する代わりにモデルがどのような振る舞い(拒否や幻覚など)を示すかを調べ、これらの振る舞いについての最初の法的評価を行う。著作権の遵守、特異性、適切な拒否に関して、人気のLLM間には大きな違いがあることが分かった。我々の比較ではAlpaca、GPT 4、GPT 3.5、Luminousが最も良い結果を示し、中でもOpenGPT-X、Alpaca、Luminousは潜在的な著作権侵害の絶対数が特に少なかった。コードは近日中に公開される予定である。

Memorization in large language models (LLMs) is a growing concern. LLMs have been shown to easily reproduce parts of their training data, including copyrighted work. This is an important problem to solve, as it may violate existing copyright laws as well as the European AI Act. In this work, we propose a systematic analysis to quantify the extent of potential copyright infringements in LLMs using European law as an example. Unlike previous work, we evaluate instruction-finetuned models in a realistic end-user scenario. Our analysis builds on a proposed threshold of 160 characters, which we borrow from the German Copyright Service Provider Act, and on a fuzzy text matching algorithm to identify potentially copyright-infringing textual reproductions. The specificity of countermeasures against copyright infringement is analyzed by comparing model behavior on copyrighted and public domain data. We investigate what behaviors models show instead of producing protected text (such as refusal or hallucination) and provide a first legal assessment of these behaviors. We find that there are huge differences in copyright compliance, specificity, and appropriate refusal among popular LLMs. Alpaca, GPT 4, GPT 3.5, and Luminous perform best in our comparison, with OpenGPT-X, Alpaca, and Luminous producing a particularly low absolute number of potential copyright violations. Code will be published soon.
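The 160-character criterion can be illustrated with Python's standard `difflib`. This sketch uses the longest exact common substring as a stand-in for the paper's fuzzy matching algorithm, which the abstract does not specify. Exact matching is stricter, so it will miss reproductions that contain small edits.

```python
from difflib import SequenceMatcher

THRESHOLD = 160  # characters, the threshold cited in the abstract

def longest_shared_span(output: str, protected: str) -> int:
    """Length of the longest exact character span shared by the two texts."""
    # autojunk=False stops difflib from ignoring frequent characters in long inputs.
    matcher = SequenceMatcher(None, output, protected, autojunk=False)
    match = matcher.find_longest_match(0, len(output), 0, len(protected))
    return match.size

def flags_potential_reproduction(output: str, protected: str) -> bool:
    """True when the model output shares at least THRESHOLD consecutive characters."""
    return longest_shared_span(output, protected) >= THRESHOLD
```

`SequenceMatcher` compares every pair of positions, so for book-length reference texts an index-based approach (e.g. hashing fixed-length shingles) would scale better.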

翻訳日:2024-05-30 22:13:00 公開日:2024-05-28

# 視覚課題における第2モーメント指数スケーリング最適化器の統一バランス理論

The Unified Balance Theory of Second-Moment Exponential Scaling Optimizers in Visual Tasks ( http://arxiv.org/abs/2405.18498v1 )

ライセンス: Link先を確認

Gongyue Zhang, Honghai Liu,

(参考訳) 可変の第2モーメント指数スケーリング(SMES)を用いて、一階最適化器を統一する潜在的な方法を見出した。まず逆伝播から始め、勾配消失や勾配爆発などの古典的な現象や、データセットのスパース性に関連する問題に取り組み、最適化におけるバランスの理論を導入する。この理論により、SGDと適応的最適化器はより広い推論の下で統一できることを示し、一階最適化器の一般化された公式の中でバランスの取れたアプローチを実現するために、可変の移動指数スケーリングを採用することを提案する。いくつかの古典的なデータセットとネットワークでテストを行い、異なるバランス係数が訓練プロセス全体に与える影響を確認した。

We have identified a potential method for unifying first-order optimizers through the use of variable Second-Moment Exponential Scaling (SMES). We begin with backpropagation, addressing classic phenomena such as gradient vanishing and explosion, as well as issues related to dataset sparsity, and introduce the theory of balance in optimization. Through this theory, we suggest that SGD and adaptive optimizers can be unified under a broader inference, employing variable moving exponential scaling to achieve a balanced approach within a generalized formula for first-order optimizers. We conducted tests on some classic datasets and networks to confirm the impact of different balance coefficients on the overall training process.
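The abstract does not state the generalized formula. One plausible reading, offered only as an assumption, is an update of the form $\hat{m}_t / (\hat{v}_t^{\,p} + \epsilon)$ with a variable exponent $p$ on the bias-corrected second moment. Under this reading, $p = 0$ recovers SGD with momentum (up to the $\epsilon$ term) and $p = 0.5$ recovers an Adam-style step. The function name `smes_like_step` and the parameter `p` are hypothetical.

```python
import numpy as np

def smes_like_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, p=0.5, eps=1e-8):
    """One update of a hypothetical optimizer with step m_hat / (v_hat**p + eps).

    state = (m, v, t): first moment, second moment, step count.
    p = 0 -> bias-corrected SGD with momentum; p = 0.5 -> Adam-style step."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad          # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (v_hat ** p + eps)
    return param, (m, v, t)
```

Intermediate values of `p` interpolate between these two behaviours. This is the kind of "balance coefficient" the abstract appears to describe, though that mapping is itself an interpretation.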

翻訳日:2024-05-30 22:13:00 公開日:2024-05-28

# 分類のための大マルジン識別損失

Large Margin Discriminative Loss for Classification ( http://arxiv.org/abs/2405.18499v1 )

ライセンス: Link先を確認

Hai-Vy Nguyen, Fabrice Gamboa, Sixin Zhang, Reda Chhaibi, Serge Gratton, Thierry Giaccone,

(参考訳) 本稿では,Deep Learningの文脈において,大きなマージンを有する新たな識別的損失関数を提案する。この損失は、クラス内コンパクト性とクラス間分離性によって表されるニューラルネットの識別力を高める。一方、クラスコンパクト性は、同じクラスのサンプル同士の近接距離によって保証される。一方、クラス間の分離性は、各クラスから最も近い境界までの最小距離を保証するマージン損失によって促進される。私たちの損失のすべての用語は明示的な意味を持ち、得られた特徴空間の直接的なビューを与えます。本研究では,コンパクト度とマージン項の関係を数学的に解析し,ハイパーパラメータが学習特徴に与える影響に関する指針を与える。さらに、ニューラルネットのパラメータに関する損失の勾配特性も解析する。これに基づいて、トレーニングにおける安定性と一貫性を同時に享受する部分運動量更新と呼ばれる戦略を設計する。さらに,より理論的な洞察を得るため,一般化誤差についても検討する。我々の損失関数は、実験における標準ソフトマックス損失と比較して、モデルの試験精度を体系的に向上させる。

In this paper, we introduce a novel discriminative loss function with large margin in the context of Deep Learning. This loss boosts the discriminative power of neural nets, represented by intra-class compactness and inter-class separability. On the one hand, the class compactness is ensured by close distance of samples of the same class to each other. On the other hand, the inter-class separability is boosted by a margin loss that ensures the minimum distance of each class to its closest boundary. All the terms in our loss have an explicit meaning, giving a direct view of the feature space obtained. We analyze mathematically the relation between compactness and margin term, giving a guideline about the impact of the hyper-parameters on the learned features. Moreover, we also analyze properties of the gradient of the loss with respect to the parameters of the neural net. Based on this, we design a strategy called partial momentum updating that simultaneously enjoys stability and consistency in training. Furthermore, we also investigate generalization errors to gain better theoretical insight. Our loss function systematically boosts the test accuracy of models compared to the standard softmax loss in our experiments.

Translated: 2024-05-30 22:13:00 | Published: 2024-05-28

# SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

arXiv: http://arxiv.org/abs/2405.18503v1

License: see link

Koichi Saito, Dongjun Kim, Takashi Shibuya, Chieh-Hsin Lai, Zhi Zhong, Yuhta Takida, Yuki Mitsufuji,

(参考訳) サウンドコンテンツは、ビデオゲーム、音楽、映画などのマルチメディア作品にとって欠かせない要素である。最近の高品質な拡散型音響生成モデルは、クリエイターにとって貴重なツールとなりうる。しかし、高品質な音を出すにもかかわらず、これらのモデルは推論速度が遅い。この欠点は、通常、試行錯誤によって音を洗練させ、芸術的な意図と整合させるクリエーターの負担を和らげる。この問題に対処するため,SoundCTM(Sound Consistency Trajectory Models)を導入する。提案モデルは,高品位1段音生成と高品位1段音生成との柔軟な遷移を可能にする。これにより、クリエーターは最初は1ステップのサンプルで音をコントロールし、マルチステップ生成によってそれを精製することができる。 CTMは基本的にフレキシブルな1ステップとマルチステップの生成を実現するが、その顕著な性能は追加の事前訓練された特徴抽出器と、他のドメインでは必ずしも利用できない訓練に高価である敵の損失に大きく依存する。そこで我々は,CTMのトレーニングフレームワークを再構築し,蒸留損失に教師のネットワークを活用することにより,新たな特徴距離を導入する。さらに, 分類器を含まない誘導軌道を蒸留しながら, 条件付きおよび無条件の学生モデルを同時に訓練し, 推論中にそれらのモデルを補間する。また,SoundCTMのフレキシブルサンプリング機能を活用して,トレーニング不要な制御可能なフレームワークを提案する。 SoundCTMは、余分なオフザシェルフネットワークを使わずに、有望な1ステップと複数ステップのリアルタイムサウンド生成を実現する。さらに,SoundCTMの可制御音発生能力について,無訓練で実演する。

Sound content is an indispensable element of multimedia works such as video games, music, and films. Recent high-quality diffusion-based sound generation models can serve as valuable tools for creators. However, despite producing high-quality sounds, these models often suffer from slow inference. This drawback burdens creators, who typically refine their sounds through trial and error to align them with their artistic intentions. To address this issue, we introduce Sound Consistency Trajectory Models (SoundCTM). Our model enables flexible transitioning between high-quality 1-step sound generation and superior sound quality through multi-step generation. This allows creators to initially control sounds with 1-step samples before refining them through multi-step generation. While CTM fundamentally achieves flexible 1-step and multi-step generation, its impressive performance heavily depends on an additional pretrained feature extractor and an adversarial loss, which are expensive to train and not always available in other domains. Thus, we reframe CTM's training framework and introduce a novel feature distance by utilizing the teacher's network for a distillation loss. Additionally, while distilling classifier-free guided trajectories, we train conditional and unconditional student models simultaneously and interpolate between these models during inference. We also propose training-free controllable frameworks for SoundCTM, leveraging its flexible sampling capability. SoundCTM achieves both promising 1-step and multi-step real-time sound generation without using any extra off-the-shelf networks. Furthermore, we demonstrate SoundCTM's capability of controllable sound generation in a training-free manner.
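A minimal sketch of "interpolating between conditional and unconditional student models during inference" follows, assuming a classifier-free-guidance-style linear blend of the two students' jump predictions. The models are plain callables with hypothetical signatures `model(x_t, t, s[, cond])` that jump from time `t` to time `s`. The sampler below also omits CTM's noise re-injection between steps, so it only illustrates the 1-step versus multi-step control flow.

```python
def guided_jump(cond_model, uncond_model, x_t, t, s, cond, omega):
    """Blend two students' t -> s jump predictions (hypothetical CFG-style rule).

    omega = 0 -> unconditional, omega = 1 -> conditional, omega > 1 extrapolates.
    """
    out_u = uncond_model(x_t, t, s)
    out_c = cond_model(x_t, t, s, cond)
    return out_u + omega * (out_c - out_u)


def sample(cond_model, uncond_model, x_T, cond, timesteps, omega):
    """timesteps = [T, 0] gives 1-step generation; longer lists give multi-step."""
    x = x_T
    for t, s in zip(timesteps[:-1], timesteps[1:]):
        x = guided_jump(cond_model, uncond_model, x, t, s, cond, omega)
    return x
```

Because `omega` is applied only at inference, a single pair of trained students can be steered across guidance strengths without retraining.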

Translated: 2024-05-30 22:13:00 | Published: 2024-05-28

# Symmetry-protection Zeno phase transition in monitored lattice gauge theories

arXiv: http://arxiv.org/abs/2405.18504v1

License: see link

Matteo M. Wauters, Edoardo Ballini, Alberto Biella, Philipp Hauke,

(参考訳) 量子測定はシステム力学に大きな影響を及ぼす。これらは量子ゼノ効果のような複雑な非平衡現象を引き起こし、量子シミュレーションにおける誤差の緩和に使用できる。このような能力は格子ゲージ理論(LGT)において特に有用であり、多くの局所保存法則の保存が困難である。調整された量子測定がゲージ対称性の破れを和らげることは知られているが、この保護の性質、特にしきい値の挙動の可能性はまだ解明されていない。ここでは、測定速度によって引き起こされる鋭い遷移の存在を、シミュレーション誤差に抵抗する保護ゲージ理論則と不規則則との間に示す。この結果は 1+1d $\mathbb{Z}_2$ LGT のパラダイム的な例に基づいている。局所対称性発生器に結合した補助量子ビットの射影的測定により保護を詳細に検討し、この手法をアナログ(弱)測定プロトコルと比較する。連続時間制限におけるアンサンブル平均は、同じリウヴィリア力学を共有するが、確率ゲージ保護プロトコルの異なる物理実装は、非常に異なる統計量を持つ軌道解法を生成する。さらに,ビットフリップ誤りを訂正し,離散時間スキームを大幅に向上するオンチップフィードバック機構を設計する。我々の結果は、強い相互作用を持つ高制約量子系の散逸臨界性に光を当て、ゲージ理論量子シミュレーションの誤差軽減と補正に関する貴重な洞察を提供する。

Quantum measurements profoundly influence system dynamics. They lead to complex nonequilibrium phenomena like the quantum Zeno effect, and they can be used for mitigating errors in quantum simulations. Such an ability is particularly valuable for lattice gauge theories (LGTs), which require preserving an extensive number of local conservation laws, a challenging task. While it is known that tailored quantum measurements can soften violations of gauge symmetry, the nature of this protection, and in particular the possibility of a threshold behavior, is still unexplored. Here, we demonstrate the existence of a sharp transition, triggered by the measurement rate, between a protected gauge-theory regime resistant to simulation errors and an irregular regime. Our results are based on the paradigmatic example of a 1+1d $\mathbb{Z}_2$ LGT. We study in detail the protection through projective measurements of ancillary qubits coupled to the local symmetry generators, and compare this approach with analog (weak) measurement protocols. We show that, while the resulting ensemble averages in the continuous-time limit share the same Liouvillian dynamics, different physical implementations of the stochastic gauge protection protocol yield trajectory unravelings with vastly different statistics. Additionally, we design an on-chip feedback mechanism that corrects bit-flip errors and significantly enhances the discrete-time scheme. Our results shed light on the dissipative criticality of strongly-interacting, highly-constrained quantum systems, and they offer valuable insights into error mitigation and correction of gauge-theory quantum simulations.
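The quantum Zeno effect mentioned above can be seen in the simplest possible setting. This is not the paper's $\mathbb{Z}_2$ LGT protocol. A single qubit Rabi-oscillates under $H = (\Omega/2)\sigma_x$ and is projectively measured $N$ times during a total time $T$. The probability of staying in the initial state after each interval $\tau = T/N$ is $\cos^2(\Omega\tau/2)$, so the survival probability $[\cos^2(\Omega T/2N)]^N$ approaches 1 as the measurement rate grows.

```python
import math


def zeno_survival(omega, total_time, n_measurements):
    """Survival probability of the initial state of a Rabi-driven qubit
    under n equally spaced projective measurements (textbook Zeno model)."""
    tau = total_time / n_measurements
    p_single = math.cos(omega * tau / 2) ** 2   # stay probability per interval
    return p_single ** n_measurements


for n in (1, 10, 100, 1000):
    print(n, round(zeno_survival(1.0, math.pi, n), 4))
```

With `omega * total_time = pi`, a single final measurement finds the qubit flipped almost surely. Frequent measurements pin it to the initial state, which is the qualitative mechanism that measurement-based gauge protection exploits.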

Translated: 2024-05-30 22:13:00 | Published: 2024-05-28

# Injecting Hierarchical Biological Priors into Graph Neural Networks for Flow Cytometry Prediction

arXiv: http://arxiv.org/abs/2405.18507v1

License: see link

Fatemeh Nassajian Mojarrad, Lorenzo Bini, Thomas Matthes, Stéphane Marchand-Maillet,

(参考訳) フローサイトメトリー(FC)データから得られた末梢血や骨髄などの血液学的サンプルの複雑な景観において、細胞レベルでの予測は深刻な課題を呈している。本研究では、グラフニューラルネットワーク(GNN)に階層的な事前知識を注入して、表層セルデータの単一セルマルチクラス分類を行う。データをグラフとして表現し,クラス間の階層的関係を符号化することにより,複数のGNNモデル,すなわちFCHC-GNNに適用可能な階層的プラグイン手法を提案する。 19人の異なる患者のコホートに対する大規模な実験により、階層的な生物学的制約を取り入れることによって、複数の指標においてパフォーマンスが著しく向上することが実証された。提案手法は, 複雑な生物予測タスクにおける一般化向上のための構造的帰納バイアスの重要性を強調した。

In the complex landscape of flow cytometry (FC) data from hematologic samples such as peripheral blood or bone marrow, cell-level prediction presents profound challenges. This work explores injecting hierarchical prior knowledge into graph neural networks (GNNs) for single-cell multi-class classification of tabular cellular data. By representing the data as graphs and encoding hierarchical relationships between classes, we propose FCHC-GNN, a hierarchical plug-in method that can be applied to several GNN models and is designed to capture the neighborhood information crucial to the single-cell FC domain. Extensive experiments on our cohort of 19 distinct patients demonstrate that incorporating hierarchical biological constraints boosts performance significantly across multiple metrics compared to baseline GNNs without such priors. The proposed approach highlights the importance of structured inductive biases for improving generalization in complex biological prediction tasks.
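One common way to make predictions respect a class hierarchy is shown below as an assumption, not necessarily FCHC-GNN's mechanism. Every ancestor's score is forced to be at least as high as any of its descendants' scores, so a cell cannot be confidently a "CD4 T cell" without also being a "T cell". The cell-type names are hypothetical. The sketch assumes independent per-class probabilities and a child-to-parent map in which every parent also appears in `probs`.

```python
def hierarchy_consistent(probs, parent):
    """Propagate each class score up to all its ancestors (max-constraint).

    probs: dict class -> probability (e.g. independent sigmoid outputs).
    parent: dict child -> parent; root classes are absent from the dict.
    Returns a new dict where every ancestor scores >= each descendant.
    """
    out = dict(probs)
    for node, score in probs.items():
        p = parent.get(node)
        while p is not None:           # walk up to the root
            out[p] = max(out[p], score)
            p = parent.get(p)
    return out
```

Because the constraint is a cheap post-processing rule on class scores, it can be attached to the output of any GNN backbone, which matches the "plug-in" framing in the abstract.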

Translated: 2024-05-30 22:13:00 | Published: 2024-05-28

# Improved Emotional Alignment of AI and Humans: Human Ratings of Emotions Expressed by Stable Diffusion v1, DALL-E 2, and DALL-E 3

arXiv: http://arxiv.org/abs/2405.18510v1

License: see link

James Derek Lomas, Willem van der Maden, Sohhom Bandyopadhyay, Giovanni Lion, Nirmal Patel, Gyanesh Jain, Yanna Litowsky, Haian Xue, Pieter Desmet,

(参考訳) 生成AIシステムは、テキストや画像を通じて感情を表現する能力がますます高まっている。効果的な感情表現は、AIシステム、特に人間のメンタルヘルスと幸福をサポートするように設計されたシステムにおいて、大きな役割を果たす可能性が高い。これは、AI表現された感情と人間の感情の知覚との整合をよりよく理解するために、我々の現在の研究を動機付けます。 AIが特定の感情を表現しようとするとき、その感情が成功するかどうかをどうやって評価すればよいのか? この問いに答えるために、私たちは、生成的AIによって表現される感情と人間の知覚との整合性を測定する調査を設計した。 3つの生成画像モデル(DALL-E 2、DALL-E 3、Stable Diffusion v1)を用いて240のサンプル画像を生成した。 Prolificのウェブサイトから募集された24人の参加者は、感情を生成するために使用されるテキストプロンプト(つまり「感情を楽しませるロボット」)とAIが生成する感情表現のアライメントを評価した。評価の結果,生成型AIモデルでは,人間の感情に順応した感情表現を生成できることが示唆されたが,そのアライメントは使用するAIモデルと感情そのものに大きく依存していることが示唆された。これらのシステムの性能の変動を分析し、将来の改善のためのギャップを特定する。我々は、メンタルヘルスと幸福をサポートするように設計された将来のAIシステムへの影響についての議論で締めくくった。

Generative AI systems are increasingly capable of expressing emotions via text and imagery. Effective emotional expression will likely play a major role in the efficacy of AI systems, particularly those designed to support human mental health and wellbeing. This motivates our present research to better understand the alignment of AI-expressed emotions with the human perception of emotions. When AI tries to express a particular emotion, how might we assess whether it succeeds? To answer this question, we designed a survey to measure the alignment between emotions expressed by generative AI and human perceptions. Three generative image models (DALL-E 2, DALL-E 3, and Stable Diffusion v1) were used to generate 240 example images, each based on a prompt designed to express one of five positive and five negative emotions, depicted in both humans and robots. Twenty-four participants recruited through the Prolific platform rated the alignment of AI-generated emotional expressions with the text prompt used to generate the emotion (e.g., "A robot expressing the emotion amusement"). The results of our evaluation suggest that generative AI models are indeed capable of producing emotional expressions that are well aligned with a range of human emotions; however, we show that the alignment depends significantly on the AI model used and on the emotion itself. We analyze variations in the performance of these systems to identify gaps for future improvement. We conclude with a discussion of the implications for future AI systems designed to support mental health and wellbeing.

Translated: 2024-05-30 22:13:00 | Published: 2024-05-28

# Feasibility and benefits of joint learning from MRI databases with different brain diseases and modalities for segmentation

arXiv: http://arxiv.org/abs/2405.18511v1

License: see link

Wentian Xu, Matthew Moffat, Thalia Seale, Ziyun Liang, Felix Wagner, Daniel Whitehouse, David Menon, Virginia Newcombe, Natalie Voets, Abhirup Banerjee, Konstantinos Kamnitsas,

(参考訳) マルチモーダルMRIにおける脳病変のセグメンテーションモデルは通常、特定の疾患のプロトコルによって決定されるMRIモダリティのセットが予め定義された単一のデータベースを用いて、特定の病理のために訓練される。さまざまなMRIモダリティとさまざまな脳病理のためのアノテーションを含む複数のデータベースを使用してモデルをトレーニングすることは可能か? この共同学習は、トレーニング中に利用可能なモダリティと病理のセットのパフォーマンスに恩恵をもたらすだろうか? モダリティと病理の異なる新しいデータベースを解析することは可能だろうか? 我々は、様々な手法を開発し、比較し、モデルとトレーニングフレームワークに適切な、シンプルで実践的な変更を加えることで、有望な結果が得られることを示す。われわれは5種類の脳病理と異なるMRIモダリティを含む7つのデータベースを実験した。その結果、異なる脳病理と一連のモダリティを持つマルチモーダルMRIデータベースのジョイントトレーニングが実現可能であり、実用的な利点をもたらすことが初めて示された。これにより、トレーニング中に遭遇した病理を様々なモダリティのセットで分割し、フォローアップファインタニングのような新しいタイプの病理を分割することが可能になる。本研究は, このパラダイムの可能性と限界を考察し, 今後の方向性を導く上で有用であることが示唆された。コードおよび事前訓練されたモデル:https://github.com/WenTXuL/MultiUnet

Models for segmentation of brain lesions in multi-modal MRI are commonly trained for a specific pathology using a single database with a predefined set of MRI modalities, determined by a protocol for the specific disease. This work explores the following open questions: Is it feasible to train a model using multiple databases that contain varying sets of MRI modalities and annotations for different brain pathologies? Will this joint learning benefit performance on the sets of modalities and pathologies available during training? Will it enable analysis of new databases with different sets of modalities and pathologies? We develop and compare different methods and show that promising results can be achieved with appropriate, simple, and practical alterations to the model and training framework. We experiment with 7 databases containing 5 types of brain pathologies and different sets of MRI modalities. Results demonstrate, for the first time, that joint training on multi-modal MRI databases with different brain pathologies and sets of modalities is feasible and offers practical benefits. It enables a single model to segment pathologies encountered during training in diverse sets of modalities, and it facilitates segmentation of new types of pathologies, for example via follow-up fine-tuning. The insights this study provides into the potential and limitations of this paradigm should prove useful for guiding future advances in this direction. Code and pretrained models: https://github.com/WenTXuL/MultiUnet
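One simple way to feed databases with different modality sets into a single network is to map every case onto a fixed channel layout. Modalities a database lacks are zero-filled, and an availability mask is kept so the loss or model can account for them. This is shown as an illustrative assumption, not necessarily the alteration used in the authors' MultiUnet code. The modality list below is hypothetical.

```python
import numpy as np

# Hypothetical union of modalities across all training databases.
ALL_MODALITIES = ["T1", "T1c", "T2", "FLAIR", "DWI"]


def to_fixed_channels(volumes):
    """volumes: dict modality name -> array of shape (H, W, D).

    Returns (x, mask): x has shape (C, H, W, D) with zeros for absent
    modalities; mask is a (C,) boolean array marking available channels.
    """
    shape = next(iter(volumes.values())).shape
    x = np.zeros((len(ALL_MODALITIES),) + shape, dtype=np.float32)
    mask = np.zeros(len(ALL_MODALITIES), dtype=bool)
    for i, name in enumerate(ALL_MODALITIES):
        if name in volumes:
            x[i] = volumes[name]
            mask[i] = True
    return x, mask
```

With a fixed layout, batches from different databases share one input shape, so a standard segmentation network can be trained on all of them jointly.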

Translated: 2024-05-30 22:03:07 | Published: 2024-05-28

# Understanding Transformer Reasoning Capabilities via Graph Algorithms

arXiv: http://arxiv.org/abs/2405.18512v1

License: see link

Clayton Sanford, Bahare Fatemi, Ethan Hall, Anton Tsitsulin, Mehran Kazemi, Jonathan Halcrow, Bryan Perozzi, Vahab Mirrokni,

(参考訳) どのトランスフォーマースケーリングレジームが、アルゴリズムのさまざまなクラスを完璧に解決できるのか? トランスフォーマーベースのニューラルネットワークによって、膨大な経験的進歩が達成されている一方で、現実的なパラメータ体系におけるアルゴリズム推論能力に関する理論的理解が欠如している。本稿では,ネットワークの深さ,幅,アルゴリズム実行のための余分なトークン数の観点から,この問題を考察する。我々の新しい表現階層は、9つのアルゴリズム的推論問題を、異なる現実的なパラメータスケーリング方式の変換器で解けるクラスに分離する。グラフ接続のようなタスクには対数深さが必要で十分であることを示す一方、埋め込み次元の小さい単一層トランスは文脈的検索タスクを解くことができる。また、GraphQAベンチマークを用いて、経験的証拠を多用した理論解析も支援している。これらの結果は、トランスフォーマーが多くのグラフ推論タスクで優れており、特殊なグラフニューラルネットワークよりも優れていることを示している。

Which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems? While tremendous empirical advances have been attained by transformer-based neural networks, a theoretical understanding of their algorithmic reasoning capabilities in realistic parameter regimes is lacking. We investigate this question in terms of the network's depth, width, and number of extra tokens for algorithm execution. Our novel representational hierarchy separates 9 algorithmic reasoning problems into classes solvable by transformers in different realistic parameter scaling regimes. We prove that logarithmic depth is necessary and sufficient for tasks like graph connectivity, while single-layer transformers with small embedding dimensions can solve contextual retrieval tasks. We also support our theoretical analysis with ample empirical evidence using the GraphQA benchmark. These results show that transformers excel at many graph reasoning tasks, even outperforming specialized graph neural networks.
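Some intuition for why logarithmic depth suffices for graph connectivity comes from repeated squaring of the reachability matrix $R = A + I$. After $k$ boolean squarings, $R_{ij}$ is true exactly when $j$ is reachable from $i$ in at most $2^k$ steps, so $\lceil \log_2 n \rceil$ rounds cover every path in an $n$-node graph. The sketch below is a plain-Python illustration of that doubling argument only. It is not the paper's transformer construction, and each "round" here is just a stand-in for one layer's worth of path doubling.

```python
def connected(adj, u, v):
    """Reachability by repeated boolean squaring of (A + I).

    adj: n x n adjacency matrix (lists of 0/1). Returns (reachable, rounds),
    where rounds = ceil(log2 n) squarings (0 when n == 1).
    """
    n = len(adj)
    r = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    rounds = 0
    while (1 << rounds) < n:           # after k rounds, paths of length <= 2**k
        r = [[any(r[i][k] and r[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        rounds += 1
    return r[u][v], rounds
```

For a path graph on 5 nodes plus an isolated node (n = 6), three squarings suffice. The endpoints of the path are reported connected, and the isolated node is not.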

Translated: 2024-05-30 22:03:07 | Published: 2024-05-28

# Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication

arXiv: http://arxiv.org/abs/2405.18515v1

License: see link

Yunuo Chen, Tianyi Xie, Zeshun Zong, Xuan Li, Feng Gao, Yin Yang, Ying Nian Wu, Chenfanfu Jiang,

(参考訳) 既存の拡散ベースのテキスト・ツー・3D生成手法は主に視覚的にリアルな形状や外観を作り出すことに焦点を当てており、下流のタスクに必要な物理的な制約を無視することが多い。生成したモデルは物理ベースのシミュレーションや3Dプリントでバランスを保つのにしばしば失敗する。このバランスは、対話型ゲーム、具体化されたAI、ロボット工学におけるユーザーデザインの意図を満たすために不可欠である。さらに、安定したモデルでは、家庭装飾用のフィギュアのような3Dプリントされたオブジェクトが、追加のサポートを必要とせずに、単独で立ち上がることが保証されている。このギャップを埋めるために,既存のスコア蒸留サンプリング(SDS)ベースのテキスト・ツー・3Dツールを強化する,自動で実装が容易なAtlas3Dを導入する。 Atlas3Dは、重力、接触、摩擦の下での物理的安定性の法則に従う自己支持型3Dモデルの生成を保証する。提案手法は,従来のフレームワークのリファインメントや後処理モジュールとして機能する,新しい微分可能なシミュレーションベース損失関数と物理的にインスパイアされた正規化を組み合わせたものである。我々は、Atlas3Dの有効性を広範囲な生成タスクを通して検証し、シミュレーションと実環境の両方で結果の3Dモデルを検証する。

Existing diffusion-based text-to-3D generation methods primarily focus on producing visually realistic shapes and appearances, often neglecting the physical constraints necessary for downstream tasks. Generated models frequently fail to maintain balance when placed in physics-based simulations or when 3D printed. This balance is crucial for satisfying user design intentions in interactive gaming, embodied AI, and robotics, where stable models are needed for reliable interaction. Additionally, stable models ensure that 3D-printed objects, such as figurines for home decoration, can stand on their own without requiring additional supports. To fill this gap, we introduce Atlas3D, an automatic and easy-to-implement method that enhances existing Score Distillation Sampling (SDS)-based text-to-3D tools. Atlas3D ensures the generation of self-supporting 3D models that adhere to physical laws of stability under gravity, contact, and friction. Our approach combines a novel differentiable simulation-based loss function with physically inspired regularization, serving as either a refinement or a post-processing module for existing frameworks. We verify Atlas3D's efficacy through extensive generation tasks and validate the resulting 3D models in both simulated and real-world environments.
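As a point of reference for "self-supporting", the classic static criterion for a rigid object resting on flat ground is that the ground projection of its center of mass lies strictly inside the convex hull of its contact points. The sketch below implements only that textbook check, using Andrew's monotone-chain hull. Atlas3D itself uses a differentiable simulation-based loss, which this does not reproduce. Friction and dynamics are ignored here.

```python
def _cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means b is left of o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def is_statically_stable(com_xy, contact_xy):
    """True if the projected center of mass lies strictly inside the
    support polygon formed by the ground contact points (x, y) tuples."""
    hull = convex_hull(contact_xy)
    if len(hull) < 3:                  # point or line support: no area
        return False
    return all(_cross(hull[i], hull[(i + 1) % len(hull)], com_xy) > 0
               for i in range(len(hull)))
```

A figurine whose center of mass drifts outside its footprint fails this test and would topple when printed, which is the failure mode the paper targets.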

Translated: 2024-05-30 22:03:07 | Published: 2024-05-28

# LSTM-COX Model: A Concise and Efficient Deep Learning Approach for Handling Recurrent Events

arXiv: http://arxiv.org/abs/2405.18518v1

License: see link

Zhang Runquan, Shi Xiaoping,

(参考訳) 現在の臨床医学の分野では、リカレント事象を解析するための従来の手法は、複雑な時間依存データを扱う際に制限がある。本研究では,Long Short-Term Memory Network (LSTM) とCoxモデルを組み合わせることで,動的時間的情報を用いて繰り返しイベントを解析する際のモデルの性能を向上させる。従来のモデルと比較して、LSTM-Coxモデルは臨床リスクの特徴抽出の精度を大幅に向上させ、シミュレーションデータセット上での良好な性能を維持しつつ、Akaike Information Criterion(AIC)の低い値を示す。膀胱癌再発データを実験的に解析し, トレーニング期間中の平均2乗誤差を低減し, テストセットで最大0.90のコンコーダンス指数を達成した。さらに,高リスク群と低リスク群を効果的に区別し,腫瘍再発数や最大サイズなどの再発リスクの特徴を他の研究および臨床試験結果と一致させた。本研究は,再帰的データの解析と特徴抽出の簡便かつ効率的な方法を提供するだけでなく,深層学習技術を臨床リスク予測システムに統合するための便利な経路を提供する。

In clinical medicine, traditional methods for analyzing recurrent events have limitations when dealing with complex time-dependent data. This study combines Long Short-Term Memory networks (LSTM) with the Cox model to enhance the model's performance in analyzing recurrent events with dynamic temporal information. Compared to classical models, the LSTM-Cox model significantly improves the accuracy of extracting clinical risk features and exhibits lower Akaike Information Criterion (AIC) values, while maintaining good performance on simulated datasets. In an empirical analysis of bladder cancer recurrence data, the model successfully reduced the mean squared error during the training phase and achieved a concordance index of up to 0.90 on the test set. Furthermore, the model effectively distinguished between high- and low-risk patient groups, and the identified recurrence risk features, such as the number of tumor recurrences and the maximum tumor size, were consistent with other research and clinical trial results. This study not only provides a straightforward and efficient method for analyzing recurrent data and extracting features but also offers a convenient pathway for integrating deep learning techniques into clinical risk prediction systems.
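The concordance index reported above measures how well predicted risks order the observed event times. Below is a minimal sketch of Harrell's C for ordinary right-censored data. A pair (i, j) is comparable when subject i had an event and an earlier time than j. It counts as concordant when i also has the higher predicted risk, and ties in risk count one half. This is the standard single-event definition; recurrent-event studies sometimes use extended variants, and the paper's exact variant is not stated here.

```python
def concordance_index(times, risks, events):
    """Harrell's concordance index for right-censored survival data.

    times: observed times; risks: predicted risk scores (higher = earlier
    event expected); events: 1 if the event was observed, 0 if censored.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                   # censored subjects cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 1.0 means perfect ranking, 0.5 is random ordering, so the reported 0.90 indicates that the model ranks most comparable patient pairs correctly.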

Translated: 2024-05-30 22:03:07 | Published: 2024-05-28

# Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL

arXiv: http://arxiv.org/abs/2405.18520v1

License: see link

Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan,

(参考訳) オフ・ポリティクス強化学習(RL)は、以前に収集したデータを政策学習に活用することにより、多くの複雑な現実世界のタスクに取り組むことで顕著な成功を収めた。しかし、既存のRLアルゴリズムのほとんどは、リプレイバッファ内の情報を最大限に活用することができず、サンプル効率とポリシー性能を制限している。本研究では,共有オンライン再生バッファをベースとしたオフラインRLポリシーの同時学習が,本来のオンライン学習ポリシーより優れていることを発見した。これは、オンラインのポリシー学習を改善するために、オフラインの最適ポリシーを突発的に改善する新たな可能性の動機となっている。この知見に基づき,モデルのないオンラインRLフレームワークであるOBAC(Offline-Boosted Actor-Critic)を提案する。実験の結果,OBACは他のモデルフリーのRLベースラインよりも優れており,6つのタスクスイートにまたがる53のタスクにまたがるサンプル効率と漸近性能の点で,高度なモデルベースRLメソッドと競合することがわかった。

Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks by leveraging previously collected data for policy learning. However, most existing off-policy RL algorithms fail to fully exploit the information in the replay buffer, limiting sample efficiency and policy performance. In this work, we discover that concurrently training an offline RL policy on the shared online replay buffer can sometimes outperform the original online learning policy, though the occurrence of such performance gains remains uncertain. This motivates a new possibility: harnessing the emergent, better-performing offline optimal policy to improve online policy learning. Based on this insight, we present Offline-Boosted Actor-Critic (OBAC), a model-free online RL framework that identifies the better-performing offline policy through value comparison and uses it as an adaptive constraint to guarantee stronger policy learning performance. Our experiments demonstrate that OBAC outperforms other popular model-free RL baselines and rivals advanced model-based RL methods in terms of sample efficiency and asymptotic performance across 53 tasks spanning 6 task suites.
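The phrase "identifies the outperforming offline policy through value comparison, and uses it as an adaptive constraint" can be illustrated with a gated actor objective. The constraint toward the offline policy's action is switched on only for states where the offline value estimate exceeds the online one. The gating rule, the behavior-cloning form of the constraint, and every name below are hypothetical simplifications for illustration, not OBAC's published objective.

```python
import numpy as np


def gated_actor_loss(q_online, log_prob_of_offline_action,
                     v_online, v_offline, alpha=1.0):
    """Hypothetical value-gated actor loss over a batch of states.

    q_online: Q(s, a) for actions sampled from the online policy.
    log_prob_of_offline_action: log pi_online(a_offline | s).
    v_online, v_offline: state-value estimates of the two policies.
    """
    gate = (v_offline > v_online).astype(np.float64)   # 1 where offline is better
    # Maximize Q, and imitate the offline action only where it looks better.
    return float(np.mean(-q_online - alpha * gate * log_prob_of_offline_action))
```

When the offline policy never looks better, the gate is zero and the loss reduces to the plain off-policy actor objective, so the constraint adapts per state rather than being fixed.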

Translated: 2024-05-30 22:03:07 | Published: 2024-05-28

# TripletMix: Triplet Data Augmentation for 3D Understanding

arXiv: http://arxiv.org/abs/2405.18523v1

License: see link

Jiaze Wang, Yi Wang, Ziyu Guo, Renrui Zhang, Donghao Zhou, Guangyong Chen, Anfeng Liu, Pheng-Ann Heng,

(参考訳) データ拡張は、特に従来のデータセットが制限される3Dビジョンにおいて、ディープラーニングモデルの一般化能力を向上するための重要なツールであることが証明されている。これまでの進歩にもかかわらず、既存のメソッドは、主に、テキスト、イメージ、ポイントクラウドを統合したマルチモーダルトリプルデータの増大にギャップを残した、ユニモーダルなデータシナリオに対応している。 3つのモダリティを同時に増強することで多様性が向上し、モダリティ間のアライメントが向上し、より包括的で堅牢な3D表現が得られる。このギャップに対処するために,3次元理解におけるマルチモーダルデータ拡張の未検討問題に対処する新しいアプローチであるTripletMixを提案する。 TripletMixは、マルチモーダル三重項データに対する混合ベースの拡張の原理を革新的に応用し、クロスモーダル接続の保存と最適化を可能にした。提案するTripletMixは,特徴レベルと入力レベルを組み合わせ,生データと潜時特徴の二重化を実現し,特徴整合性の確保と多彩で現実的なトレーニングサンプルの提供により,モデルのクロスモーダル理解と一般化能力を大幅に向上させる。我々は,TripletMixが,ゼロショットや線形探索などの学習シナリオにおけるモデルのベースライン性能を向上するだけでなく,モデルの一般化可能性を大幅に向上させることを示した。特に、ScanObjectNNのゼロショット分類精度を51.3%から61.9%に改善し、Objaverse-LVISは46.8%から51.4%に改善しました。本研究は,3次元物体認識と理解を著しく向上させるマルチモーダルデータ拡張の可能性を明らかにするものである。

Data augmentation has proven to be a vital tool for enhancing the generalization capabilities of deep learning models, especially in 3D vision, where traditional datasets are often limited. Despite previous advancements, existing methods primarily cater to unimodal data scenarios, leaving a gap in the augmentation of multimodal triplet data, which integrates text, images, and point clouds. Simultaneously augmenting all three modalities enhances diversity and improves alignment across modalities, resulting in more comprehensive and robust 3D representations. To address this gap, we propose TripletMix, a novel approach to the previously unexplored issue of multimodal data augmentation in 3D understanding. TripletMix innovatively applies the principles of mix-based augmentation to multimodal triplet data, allowing for the preservation and optimization of cross-modal connections. Our proposed TripletMix combines feature-level and input-level augmentations to achieve dual enhancement between raw data and latent features, significantly improving the model's cross-modal understanding and generalization capabilities by ensuring feature consistency and providing diverse and realistic training samples. We demonstrate that TripletMix not only improves the baseline performance of models in various learning scenarios, including zero-shot and linear probing classification, but also significantly enhances model generalizability. Notably, we improved the zero-shot classification accuracy on ScanObjectNN from 51.3 percent to 61.9 percent, and on Objaverse-LVIS from 46.8 percent to 51.4 percent. Our findings highlight the potential of multimodal data augmentation to significantly advance 3D object recognition and understanding.
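The key property of mixing triplets is that all three modalities of a mixed sample must be mixed with the same coefficient and the same partner, or the cross-modal pairing breaks. The sketch below shows a feature-level mixup that enforces this with a single Beta-sampled `lam` and one shared permutation. It covers only the feature-level half. TripletMix also mixes at the input level (e.g. raw point clouds), which is not reproduced here. The Beta parameter `alpha` is an assumed default.

```python
import numpy as np


def triplet_mix(points, images, texts, alpha=0.4, rng=None):
    """Feature-level mixup for a batch of (point, image, text) feature triplets.

    All three arrays share the batch dimension; one lambda and one
    permutation are reused so each mixed triplet stays cross-modally paired.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(points))

    def mix(x):
        return lam * x + (1 - lam) * x[perm]

    return mix(points), mix(images), mix(texts), lam, perm
```

Sampling `lam` and `perm` independently per modality would pair, say, a mostly-chair point cloud with a mostly-table caption, which is exactly what the shared draw avoids.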

Translated: 2024-05-30 22:03:07 | Published: 2024-05-28

# Aligning in a Compact Space: Contrastive Knowledge Distillation between Heterogeneous Architectures

arXiv: http://arxiv.org/abs/2405.18524v1

License: see link

Hongjun Wu, Li Xiao, Xingkuo Zhang, Yining Miao,

(参考訳) 知識蒸留はニューラルネットワークを圧縮するために一般的に用いられ、推論コストとメモリフットプリントを削減している。均質アーキテクチャのシナリオでは、特徴に基づく手法がその有効性に対して広く検証されている。しかし、教師モデルと学生モデルが異種アーキテクチャである場合、特徴表現の固有の違いはこれらの手法の性能を著しく低下させる。近年の研究では、低周波成分が画像の特徴の大部分を占めていることが強調されている。そこで本研究では,低周波成分を用いたコントラスト知識蒸留(Contrastive Knowledge Distillation, LFCC)フレームワークを提案する。具体的には,教師モデルと学生モデルの両方から,中間特徴の低周波成分を抽出するマルチスケール低域フィルタの集合を設計し,それらをコンパクトな空間に整列させて,構造的差異を克服する。さらに,教師/学生の本質的なペアリング特性を活用して,サンプル内特徴類似性の制約とサンプル間特徴分散の制約をコントラスト学習タスクに順応的に再構成する,革新的なサンプルレベルのコントラスト学習フレームワークを設計する。この戦略により、学生モデルは、異なるサンプルの特徴の識別を同時に強化しつつ、サンプル内特徴の一致に乗じることができる。その結果,LFCCフレームワークは異種アーキテクチャにおける特徴表現の共通点を正確に捉えている。 3つのアーキテクチャ(CNN, Transformer, MLP)にわたる広範囲な評価と実証分析により,ImageNet-1KとCIFAR-100の挑戦的なベンチマークにおいて,LFCCが優れた性能を発揮することが示された。すべてのコードは公開されます。

Knowledge distillation is commonly employed to compress neural networks, reducing inference costs and memory footprint. For homogeneous architectures, feature-based methods have been widely validated as effective. However, when the teacher and student models have heterogeneous architectures, the inherent differences in feature representation significantly degrade the performance of these methods. Recent studies have highlighted that low-frequency components constitute the majority of image features. Motivated by this, we propose a Low-Frequency Components-based Contrastive Knowledge Distillation (LFCC) framework that significantly enhances the performance of feature-based distillation between heterogeneous architectures. Specifically, we design a set of multi-scale low-pass filters to extract the low-frequency components of intermediate features from both the teacher and student models, aligning them in a compact space to overcome architectural disparities. Moreover, leveraging the intrinsic pairing characteristic of the teacher-student framework, we design an innovative sample-level contrastive learning framework that restructures the constraints of within-sample feature similarity and between-sample feature divergence into a contrastive learning task. This strategy enables the student model to capitalize on intra-sample feature congruence while simultaneously enhancing the discrimination of features among disparate samples. Consequently, our LFCC framework accurately captures the commonalities in feature representation across heterogeneous architectures. Extensive evaluations and empirical analyses across three architectures (CNNs, Transformers, and MLPs) demonstrate that LFCC achieves superior performance on the challenging benchmarks of ImageNet-1K and CIFAR-100. All code will be made publicly available.
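The two components described above can be combined in a small NumPy sketch. Non-overlapping average pooling at several scales stands in for the multi-scale low-pass filters, which is an assumption since the paper's filters may differ. An InfoNCE-style loss then treats the teacher and student features of the same sample as the positive pair and all other samples in the batch as negatives. The sketch assumes both feature maps have already been projected to the same `(N, C, H, W)` shape, which heterogeneous architectures would normally need a learned projector for.

```python
import numpy as np


def avg_pool(feat, k):
    """Non-overlapping k x k average pooling on (N, C, H, W): a simple low-pass filter."""
    n, c, h, w = feat.shape
    h2, w2 = h // k, w // k
    f = feat[:, :, :h2 * k, :w2 * k].reshape(n, c, h2, k, w2, k)
    return f.mean(axis=(3, 5))


def info_nce(student, teacher, tau=0.1):
    """Sample-level contrastive loss: student row i should match teacher row i."""
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def low_freq_contrastive_loss(f_student, f_teacher, scales=(1, 2, 4), tau=0.1):
    """Average the contrastive loss over several low-pass (pooling) scales."""
    n = len(f_student)
    loss = 0.0
    for k in scales:
        s = avg_pool(f_student, k).reshape(n, -1)
        t = avg_pool(f_teacher, k).reshape(n, -1)
        loss += info_nce(s, t, tau)
    return loss / len(scales)
```

Pooling discards high-frequency detail that tends to be architecture-specific. The contrastive form only asks the student to be closer to its own teacher sample than to other samples, rather than to match the teacher exactly.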

Translated: 2024-05-30 22:03:07 | Published: 2024-05-28

# REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment

arXiv: http://arxiv.org/abs/2405.18525v1

License: see link

Haonan Han, Rui Yang, Huan Liao, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li, Wanhua Li,

(参考訳) 従来の画像から3Dモデルでは、バイアスや閉塞の複雑さのため、複数のオブジェクトを含むシーンで苦労することが多い。この課題に対処するために,単一画像からの合成3Dアセット生成のための新しいアプローチであるREPAROを提案する。まず、シーンから個々のオブジェクトを抽出し、オフザシェルフ画像から3Dモデルを使用してそれらの3Dメッシュを再構築し、異なるレンダリング技術によってこれらのメッシュのレイアウトを最適化し、コヒーレントなシーン構成を保証する。最適なトランスポートベース長範囲の外観損失項と高レベルの意味損失項を微分可能レンダリングに統合することにより、REPAROは3Dアセットのレイアウトを効果的に復元することができる。提案手法は,オブジェクト独立性,細部精度,全体のシーンコヒーレンスを著しく向上させることができる。マルチオブジェクトシーンの広汎な評価は、REPAROが単一画像からの多オブジェクト3Dシーン生成の複雑さに対処するための包括的アプローチを提供することを示している。

Traditional image-to-3D models often struggle with scenes containing multiple objects due to biases and occlusion complexities. To address this challenge, we present REPARO, a novel approach for compositional 3D asset generation from single images. REPARO employs a two-step process: first, it extracts individual objects from the scene and reconstructs their 3D meshes using off-the-shelf image-to-3D models; then, it optimizes the layout of these meshes through differentiable rendering techniques, ensuring coherent scene composition. By integrating an optimal-transport-based long-range appearance loss term and a high-level semantic loss term into the differentiable rendering, REPARO can effectively recover the layout of 3D assets. The proposed method can significantly enhance object independence, detail accuracy, and overall scene coherence. Extensive evaluation of multi-object scenes demonstrates that REPARO offers a comprehensive approach to the complexities of multi-object 3D scene generation from single images.
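An optimal-transport appearance loss compares two distributions, for example pixel colors or features of the rendering and the reference, without requiring pixel-aligned correspondence. That is what makes it "long-range" compared with per-pixel losses. The sketch below is a generic entropy-regularized OT (Sinkhorn) cost between two point sets with uniform weights and squared-Euclidean cost. What REPARO transports, and its cost and regularization settings, are not specified here, so `eps` and `n_iters` are assumed values.

```python
import numpy as np


def sinkhorn_distance(x, y, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between point sets x (n, d) and y (m, d)
    with uniform weights and squared-Euclidean ground cost."""
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-cost / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):                 # alternate marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]       # approximate transport plan
    return float((plan * cost).sum())
```

Every operation is differentiable in `x`, so in an autodiff framework the same computation can back-propagate into rendered colors and, through the renderer, into object poses. For very small `eps`, a log-domain implementation is needed to avoid underflow in `K`.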

Translated: 2024-05-30 22:03:07 | Published: 2024-05-28

# Task-Driven Uncertainty Quantification in Inverse Problems via Conformal Prediction

arXiv: http://arxiv.org/abs/2405.18527v1

License: see link

Jeffrey Wen, Rizwan Ahmad, Philip Schniter,

(参考訳) 逆問題の画像化では、画像が欠落/破損した測定結果から回復しようとする。このような問題は正しくないため、測定・回収プロセスによって引き起こされる不確実性を定量化する大きな動機がある。復元された画像が、ソフトアウトプット分類などの下流タスクに使用されるアプリケーションによって動機付けられ、不確実性定量化のためのタスク中心のアプローチを提案する。特に、コンフォメーション予測を用いて、実際の画像からユーザ特定確率までのタスク出力を含むことが保証される間隔を構築し、その間隔の幅を用いて測定と復元による不確実性の定量化を行う。後方サンプリングに基づく画像復元のために,局所的な適応予測区間を構築した。さらに,タスクの不確実性が許容範囲以下になると,複数のラウンドで測定値の収集を行う。我々は,MRI(Accelerated Magnetic resonance imaging)の方法論を実証する。

In imaging inverse problems, one seeks to recover an image from missing/corrupted measurements. Because such problems are ill-posed, there is great motivation to quantify the uncertainty induced by the measurement-and-recovery process. Motivated by applications where the recovered image is used for a downstream task, such as soft-output classification, we propose a task-centered approach to uncertainty quantification. In particular, we use conformal prediction to construct an interval that is guaranteed to contain the task output from the true image up to a user-specified probability, and we use the width of that interval to quantify the uncertainty contributed by measurement-and-recovery. For posterior-sampling-based image recovery, we construct locally adaptive prediction intervals. Furthermore, we propose to collect measurements over multiple rounds, stopping as soon as the task uncertainty falls below an acceptable level. We demonstrate our methodology on accelerated magnetic resonance imaging (MRI).
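The interval construction rests on split conformal prediction. Nonconformity scores $|\hat{y}_i - y_i|$ are computed on a held-out calibration set, and the $\lceil (n+1)(1-\alpha) \rceil$-th smallest score is taken as the half-width. Under exchangeability, this covers the true task output with probability at least $1-\alpha$. The sketch below is the plain, non-adaptive version for a scalar task output. The paper's locally adaptive intervals built from posterior samples are a refinement not reproduced here. The interval width then serves as the uncertainty measure, e.g. for deciding when to stop collecting measurements.

```python
import math


def conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal interval for a scalar task output.

    cal_pred / cal_true: task outputs from recovered vs. true images on a
    calibration set; test_pred: task output from a new recovered image.
    """
    scores = sorted(abs(p - t) for p, t in zip(cal_pred, cal_true))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))   # rank of the conformal quantile
    q = math.inf if k > n else scores[k - 1]
    return test_pred - q, test_pred + q
```

With too few calibration points for the requested `alpha`, the only valid interval is unbounded. The infinite width makes that visible rather than silently under-covering.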

Translated: 2024-05-30 22:03:07 | Published: 2024-05-28

# Cardiovascular Disease Detection from Multi-View Chest X-rays with BI-Mamba

arXiv: http://arxiv.org/abs/2405.18533v1

License: see link

Zefan Yang, Jiajin Zhang, Ge Wang, Mannudeep K. Kalra, Pingkun Yan,

(参考訳) 医療画像における心血管疾患(CVD)リスクの正確な予測は、患者の健康管理に重要である。従来の研究では、CT(Computed tomography)における画像特徴がCVDのリスクを予測するのに役立つことが示されている。しかし、CTには顕著な放射線曝露があり、患者に悪影響を及ぼす可能性がある。対照的に、胸部X線は放射線のレベルを著しく低くし、より安全な選択肢を提供する。本研究は,胸部X線によるCVDリスク予測の可能性について検討する。畳み込みニューラルネットワーク(CNN)とトランスフォーマーは、コンピュータ支援診断のための確立された2つのネットワークアーキテクチャである。しかし、大きなコンテキストモデリング能力や2次時間複雑性が欠如しているため、非常に高解像度の胸部X線をモデル化するのに苦労している。状態空間列モデル (SSM) に触発され, 競合するシーケンスモデリング能力を持つネットワークアーキテクチャをトランスフォーマーとして, 線形時間複雑性として, 両方向画像マンバ (BI-Mamba) を提案し, 反対方向情報で一方向SSMを補完する。 BI-Mambaは、マルチビュー胸部X線の長距離依存性を符号化するために、並列フォワードブロックとバックウォークブロックを利用する。 NLST(National Lung Screening Trail)における10,395名の被験者の画像について広範な実験を行った。その結果、BI-MambaはResNet-50とViT-Sを同等のパラメータサイズで上回り、トレーニング中に大量のGPUメモリを節約していることがわかった。また, BI-Mambaは従来のCTと比較して有望な性能を示し, CVDリスク予測のための胸部X線の可能性を明らかにする。

Accurate prediction of cardiovascular disease (CVD) risk in medical imaging is central to effective patient health management. Previous studies have demonstrated that imaging features in computed tomography (CT) can help predict CVD risk. However, CT entails notable radiation exposure, which may result in adverse health effects for patients. In contrast, chest X-ray involves significantly lower radiation doses, offering a safer option. This rationale motivates our investigation into the feasibility of using chest X-ray for predicting CVD risk. Convolutional Neural Networks (CNNs) and Transformers are two established network architectures for computer-aided diagnosis. However, they struggle to model very-high-resolution chest X-rays, due to the limited context modeling power of CNNs and the quadratic time complexity of Transformers. Inspired by state space sequence models (SSMs), a new class of network architectures with sequence modeling power competitive with Transformers and linear time complexity, we propose Bidirectional Image Mamba (BI-Mamba) to complement the unidirectional SSMs with opposite-direction information. BI-Mamba utilizes parallel forward and backward blocks to encode long-range dependencies of multi-view chest X-rays. We conduct extensive experiments on images from 10,395 subjects in the National Lung Screening Trial (NLST). Results show that BI-Mamba outperforms ResNet-50 and ViT-S with comparable parameter size, and saves a significant amount of GPU memory during training. In addition, BI-Mamba achieves promising performance compared with the previous state of the art in CT, revealing the potential of chest X-ray for CVD risk prediction.
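The bidirectional idea can be seen with a toy diagonal linear state space model. A forward scan $h_t = a \odot h_{t-1} + b \odot x_t$ only sees earlier tokens, so a second scan over the reversed sequence is added. Its outputs are flipped back and summed, giving every position context from both sides at cost linear in sequence length. The fixed per-channel parameters and the sum fusion are simplifying assumptions. Mamba's selective scan uses input-dependent parameters, and BI-Mamba's exact block design is not reproduced here.

```python
import numpy as np


def linear_scan(x, a, b, c):
    """Diagonal linear SSM: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.

    x: (L, D) token sequence (e.g. flattened image patches); a, b, c: (D,).
    """
    h = np.zeros(x.shape[1])
    ys = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        ys[t] = c * h
    return ys


def bidirectional_scan(x, fwd_params, bwd_params):
    """Sum a forward scan and a backward scan (run on the reversed sequence)."""
    y_fwd = linear_scan(x, *fwd_params)
    y_bwd = linear_scan(x[::-1], *bwd_params)[::-1]   # flip back to original order
    return y_fwd + y_bwd
```

For an impulse at the first position, the forward scan spreads it to later tokens with decay `a`, while the backward scan contributes only at the first position. This shows that each direction supplies the context the other lacks.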

翻訳日:2024-05-30 22:03:07 公開日:2024-05-28
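To make the bidirectional idea concrete, here is a minimal sketch assuming a scalar linear state space model with fixed parameters `a`, `b`, `c` (hypothetical names). BI-Mamba's actual blocks use Mamba's input-dependent selective scan and learned projections, and summing the two directions is our assumption, not the authors' fusion rule. The sketch only illustrates that each scan is a single linear-time pass, and that a second scan over the reversed sequence gives every position context from both sides.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Linear-time scan of a scalar linear state space model.

    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t,  with h_{-1} = 0.
    """
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


def bidirectional_ssm(x: List[float], a: float = 0.9, b: float = 1.0,
                      c: float = 1.0) -> List[float]:
    """Sum of a forward scan and a backward scan (run on the reversed input).

    Each output then depends on tokens both before and after it. Both scans
    include the current token, so x_t itself contributes twice.
    """
    forward = ssm_scan(x, a, b, c)
    backward = ssm_scan(x[::-1], a, b, c)[::-1]  # re-reverse to align positions
    return [f + bw for f, bw in zip(forward, backward)]
```

With `a=0.5`, an impulse in the middle of `[0, 0, 1, 0, 0]` spreads symmetrically to `[0.25, 0.5, 2.0, 0.5, 0.25]`; a forward scan alone would leave the first two positions at zero. For scale, a hypothetical 1024×1024 radiograph split into 16×16 patches yields 4,096 tokens, so full self-attention scores about 16.8 million token pairs, whereas each scan takes 4,096 sequential steps.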

# Individualized Privacy Accounting via Subsampling with Applications in Combinatorial Optimization

arXiv: http://arxiv.org/abs/2405.18534v1

License: see link

Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Adam Sealfon

In this work, we give a new technique for analyzing individualized privacy accounting, based on the following simple observation: if an algorithm is one-sided add-DP, then its subsampled variant satisfies two-sided DP. From this, we obtain several improved algorithms for private combinatorial optimization problems, including decomposable submodular maximization and set cover. Our error guarantees are asymptotically tight, and our algorithms satisfy pure-DP, whereas previously known algorithms (Gupta et al., 2010; Chaturvedi et al., 2021) are approximate-DP. We also show an application of our technique beyond combinatorial optimization by giving a pure-DP algorithm for the shifting heavy hitter problem in a stream; previously, only an approximate-DP algorithm was known (Kaplan et al., 2021; Cohen & Lyu, 2023).

Translated: 2024-05-30 22:03:07; Published: 2024-05-28
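The sketch below shows the subsampling step that the observation relies on, assuming Poisson subsampling (each record kept independently with rate `q`). The `amplified_epsilon` helper applies the standard amplification-by-subsampling bound for an ε-DP mechanism under add/remove adjacency, ln(1 + q(e^ε − 1)); this is textbook background, not the paper's one-sided-to-two-sided theorem or its individualized accounting. All function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def poisson_subsample(data: Sequence[T], q: float, rng: random.Random) -> List[T]:
    """Keep each record independently with probability q (Poisson subsampling)."""
    return [x for x in data if rng.random() < q]


def subsampled(mechanism: Callable[[List[T]], R], q: float,
               rng: random.Random) -> Callable[[Sequence[T]], R]:
    """Wrap a mechanism so that it only ever sees a Poisson subsample of its input."""
    def run(data: Sequence[T]) -> R:
        return mechanism(poisson_subsample(data, q, rng))
    return run


def amplified_epsilon(epsilon: float, q: float) -> float:
    """Standard bound for an epsilon-DP mechanism under add/remove adjacency,
    run on a Poisson subsample with rate q: ln(1 + q * (e^epsilon - 1))."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    # log1p/expm1 keep the result accurate when q or epsilon is small.
    return math.log1p(q * math.expm1(epsilon))
```

For example, subsampling an ε = 1 mechanism at q = 0.1 gives ln(1 + 0.1(e − 1)) ≈ 0.159, while q = 1 recovers the original ε.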

# Decoherence of solid-state spin qubits: a computational perspective

arXiv: http://arxiv.org/abs/2405.18535v1

License: see link

Mykyta Onizhuk, Giulia Galli

The usefulness of solid-state spins in quantum technologies depends on how long they can remain in a coherent superposition of quantum states. This Colloquium discusses how first-principles simulations can predict the spin dynamics of different types of solid-state electron spins, helping to design novel and improved platforms for quantum computing, networking, and sensing. We begin by outlining the necessary concepts of noise affecting generic quantum systems. We then focus on recent advances in predicting spin-phonon relaxation of spin-defect qubits. Next, we discuss cluster methods as a means of simulating quantum decoherence induced by spin-spin interactions, emphasizing the critical role of validation in ensuring the accuracy of these simulations. We highlight how validated cluster methods can be instrumental in interpreting recent experimental results and, more importantly, in predicting the coherence properties of novel spin-based quantum platforms, guiding the development of next-generation quantum technologies.

Translated: 2024-05-30 22:03:07; Published: 2024-05-28
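As background for the cluster methods mentioned above: one widely used variant is the cluster-correlation expansion (CCE). Its standard factorization is given below as a hedged summary; the Colloquium's own notation may differ. Here $\mathcal{L}(t)$ is the qubit coherence, $\mathcal{L}_C(t)$ is the coherence computed with only the bath spins in cluster $C$, and the irreducible contribution $\tilde{\mathcal{L}}_C(t)$ divides out every nonempty proper subcluster. Truncating the product at clusters of at most $n$ spins gives the CCE-$n$ approximation.

$$
\mathcal{L}(t) = \prod_{C} \tilde{\mathcal{L}}_C(t),
\qquad
\tilde{\mathcal{L}}_C(t) = \frac{\mathcal{L}_C(t)}{\prod_{\emptyset \neq C' \subsetneq C} \tilde{\mathcal{L}}_{C'}(t)}
$$

For a single-spin cluster $\{i\}$ there are no nonempty proper subclusters, so $\tilde{\mathcal{L}}_{\{i\}} = \mathcal{L}_{\{i\}}$; for a pair, $\tilde{\mathcal{L}}_{\{i,j\}} = \mathcal{L}_{\{i,j\}} / (\tilde{\mathcal{L}}_{\{i\}}\tilde{\mathcal{L}}_{\{j\}})$ isolates the genuinely pairwise correlation.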

# Data-Driven Simulator for Mechanical Circulatory Support with Domain Adversarial Neural Process

arXiv: http://arxiv.org/abs/2405.18536v1

License: see link

Sophia Sun, Wenyuan Chen, Zihao Zhou, Sonia Fereidooni, Elise Jortberg, Rose Yu

We propose a data-driven simulator for Mechanical Circulatory Support (MCS) devices, implemented as a probabilistic deep sequence model. Existing mechanical simulators for MCS rely on oversimplifying assumptions and are insensitive to patient-specific behavior, which limits their applicability to real-world treatment scenarios. To address these shortcomings, our model, the Domain Adversarial Neural Process (DANP), employs a neural process architecture, allowing it to capture the probabilistic relationship between MCS pump levels and aortic pressure measurements together with its uncertainty. We use domain adversarial training to combine simulation data with real-world observations, resulting in a more realistic and diverse representation of potential outcomes. Empirical results, including a 19% improvement in non-stationary trend prediction, establish DANP as an effective tool for clinicians to understand MCS patient treatment and make informed decisions about it.

Translated: 2024-05-30 22:03:07; Published: 2024-05-28
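One common way to implement domain adversarial training is a gradient reversal layer between a shared encoder and a domain classifier; the abstract does not say which mechanism DANP uses, so the sketch below assumes this one. The encoder receives the ordinary gradient from the prediction head plus the negated, `lam`-scaled gradient from a classifier that tries to tell simulated from real-world samples. `gaussian_nll` stands in for a probabilistic loss on aortic pressure predictions; neural processes are often trained with a variational objective instead. All names are hypothetical.

```python
import math
from typing import List


def gaussian_nll(y: float, mu: float, sigma: float) -> float:
    """Negative log-likelihood of y under N(mu, sigma^2).

    Penalizes both prediction error and a poorly calibrated sigma.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def binary_cross_entropy(p: float, label: int) -> float:
    """Domain-classifier loss; label 1 = real-world, 0 = simulated (hypothetical convention)."""
    eps = 1e-12  # avoid log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


class GradientReversal:
    """Identity on the forward pass; scales gradients by -lam on the backward pass."""

    def __init__(self, lam: float) -> None:
        self.lam = lam

    def forward(self, features: List[float]) -> List[float]:
        return list(features)

    def backward(self, grad_output: List[float]) -> List[float]:
        return [-self.lam * g for g in grad_output]


def encoder_gradient(task_grad: List[float], domain_grad: List[float],
                     grl: GradientReversal) -> List[float]:
    """Gradient reaching the shared encoder: task gradient plus reversed domain gradient.

    Descending along it lowers the prediction loss while raising the domain
    classifier's loss, pushing simulated and real features together.
    """
    reversed_domain = grl.backward(domain_grad)
    return [t + d for t, d in zip(task_grad, reversed_domain)]
```

The coefficient `lam` trades off predictive fit against domain invariance: with `lam = 0` the encoder ignores the domain classifier entirely.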

# Augmented Conversation with Embedded Speech-Driven On-the-Fly Referencing in AR

arXiv: http://arxiv.org/abs/2405.18537v1

License: see link

Shivesh Jadon, Mehrad Faridan, Edward Mah, Rajan Vaish, Wesley Willett, Ryo Suzuki

This paper introduces the concept of augmented conversation, which aims to support co-located in-person conversations through embedded, speech-driven on-the-fly referencing in augmented reality (AR). Today, computing technologies such as smartphones allow quick access to a variety of references during conversations. However, these tools often create distractions: they reduce eye contact and force users to shift their attention to phone screens and type keywords manually to access relevant information. In contrast, AR-based on-the-fly referencing provides relevant visual references in real time, based on keywords extracted automatically from the spoken conversation. By embedding these visual references in AR around the conversation partner, augmented conversation reduces distraction and friction, allowing users to maintain eye contact and supporting more natural social interactions. To demonstrate this concept, we developed \system, a HoloLens-based interface that leverages real-time speech recognition, natural language processing, and gaze-based interaction for on-the-fly embedded visual referencing. In this paper, we explore the design space of visual referencing for conversations and describe our implementation, which builds on seven design guidelines identified through a user-centered design process. An initial user study confirms that our system decreases distraction and friction in conversations compared with smartphone searches, while providing highly useful and relevant information.

Translated: 2024-05-30 22:03:07; Published: 2024-05-28
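The abstract does not describe how keywords are extracted from speech, so the sketch below is a deliberately simple stand-in: it assumes a transcript string from a speech recognizer and ranks non-stop-words by frequency, breaking ties by first mention. The stop-word list and `extract_keywords` are hypothetical; a real system would more likely rely on part-of-speech tagging, named-entity recognition, or a language model.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "on", "is", "was", "were",
    "we", "you", "it", "that", "this", "for", "with", "at", "my", "our",
}


def extract_keywords(transcript: str, k: int = 3) -> List[str]:
    """Return up to k candidate keywords from a speech transcript.

    Words are lower-cased, stop words and words shorter than three letters are
    dropped, and the rest are ranked by frequency, ties broken by first mention.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    content = [w for w in words if w not in STOP_WORDS and len(w) >= 3]
    counts = Counter(content)
    first_seen: Dict[str, int] = {}
    for i, w in enumerate(content):
        first_seen.setdefault(w, i)
    ranked = sorted(counts, key=lambda w: (-counts[w], first_seen[w]))
    return ranked[:k]
```

For the transcript "The Louvre is in Paris; the Louvre pyramid is glass.", `extract_keywords(..., k=2)` returns `["louvre", "paris"]`, which could then serve as queries for visual references placed around the conversation partner.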

# The Past, Present, and Future of Automation in Model-Driven Engineering

arXiv: http://arxiv.org/abs/2405.18539v1

License: see link

Lola Burgueño, Davide Di Ruscio, Houari Sahraoui, Manuel Wimmer

Model-Driven Engineering (MDE) provides a large body of knowledge on automation for many different engineering tasks, especially those involving the transition from design to implementation. With the substantial progress made in Artificial Intelligence (AI) techniques, questions arise about the future of MDE, such as how existing MDE techniques and technologies can be improved and how other activities that currently lack dedicated support can also be automated. At the same time, however, we must revisit where and how models should be used to keep engineers in the loop when creating, operating, and maintaining complex systems. To trigger dedicated research on these open points, we discuss the history of automation in MDE and present perspectives on how automation in MDE can be further improved and which obstacles have to be overcome in the medium and long term.

Translated: 2024-05-30 22:03:07; Published: 2024-05-28
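Automating the step from design to implementation is classically done with model-to-text transformations. The sketch below is a toy example, assuming a minimal metamodel of our own (`ClassModel` with typed `Attribute`s, both hypothetical) and generating Python source from it; real MDE tool chains typically use dedicated metamodels and template or transformation languages.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    """A typed attribute in the (hypothetical) class metamodel."""
    name: str
    type_name: str


@dataclass
class ClassModel:
    """A class in the metamodel: a name plus its attributes."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)


def to_python_source(model: ClassModel) -> str:
    """Model-to-text transformation: emit a Python class definition for `model`."""
    params = "".join(f", {a.name}: {a.type_name}" for a in model.attributes)
    lines = [f"class {model.name}:", f"    def __init__(self{params}) -> None:"]
    if model.attributes:
        # One assignment per attribute keeps the generated code aligned with the model.
        lines += [f"        self.{a.name} = {a.name}" for a in model.attributes]
    else:
        lines.append("        pass")
    return "\n".join(lines) + "\n"
```

Running `to_python_source` on a `Sensor` model with `id: int` and `unit: str` produces a class whose constructor stores both fields; editing the model and regenerating keeps the code in sync with the design.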

# Learning diverse attacks on large language models for robust red-teaming and safety tuning

arXiv: http://arxiv.org/abs/2405.18540v1

License: see link

Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain

Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs). Developing effective protection against many modes of attack prompts requires discovering diverse attacks. Automated red-teaming typically uses reinforcement learning (RL) to fine-tune an attacker language model to generate prompts that elicit undesirable responses from a target LLM, as measured, for example, by an auxiliary toxicity classifier. We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks. As a flexible and probabilistically principled alternative, we propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts. We find that the attacks generated by our method are effective against a wide range of target LLMs, both with and without safety tuning, and transfer well between target LLMs. Finally, we demonstrate that models safety-tuned on a dataset of red-teaming prompts generated by our method are robust to attacks from other RL-based red-teaming approaches.
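GFlowNet fine-tuning trains the attacker to sample prompts with probability proportional to a reward rather than to maximize it, which is why it tends to preserve diversity. One standard GFlowNet objective is trajectory balance; the sketch below assumes it is applied to left-to-right prompt generation, where each prefix has a single parent and the backward-policy term vanishes. `log_z` is a learned scalar estimate of the log partition function, and `log_reward` might come from, for example, the log of a toxicity classifier's score; both the names and that reward choice are illustrative assumptions, and the secondary smoothing phase is not shown.

```python
import math
from typing import Sequence


def trajectory_balance_loss(log_z: float, token_logprobs: Sequence[float],
                            log_reward: float) -> float:
    """Trajectory balance loss for autoregressive generation.

    For left-to-right token generation each prefix has exactly one parent, so
    the backward policy is deterministic (log P_B = 0) and the loss reduces to
    (log Z + sum_t log p(x_t | x_<t) - log R(x))^2.
    """
    residual = log_z + sum(token_logprobs) - log_reward
    return residual ** 2


if __name__ == "__main__":
    # A two-token prompt, each token sampled with probability 0.5, whose
    # reward is 0.25: with log Z = 0 the flow is balanced and the loss is ~0.
    loss = trajectory_balance_loss(0.0, [math.log(0.5), math.log(0.5)], math.log(0.25))
    print(f"{loss:.3e}")
```

Minimizing this loss over sampled prompts drives the sampler toward p(x) ∝ R(x), so high-reward prompts are favored without collapsing onto a single one.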