Fugu-MT: arXiv Paper Translations

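This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, or CC BY-SA). For papers whose body text is not CC-licensed, or which are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.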
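The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the stated policy, not the site's actual code: the `Paper` record, its field names, and the `translation_scope` function are hypothetical, and the license strings are assumed to be spelled exactly as in the description.

```python
from dataclasses import dataclass

# Licenses the site description lists as eligible for full translation.
ELIGIBLE_LICENSES = {"CC 0", "CC BY", "CC BY-SA"}
MAX_PAGES = 30  # "30 pages or fewer", per the site description


@dataclass
class Paper:
    """Hypothetical record; the field names are illustrative only."""
    arxiv_id: str
    license: str
    pages: int


def translation_scope(paper: Paper) -> str:
    """Return "full" when the body may be translated, else "metadata"."""
    if paper.license in ELIGIBLE_LICENSES and paper.pages <= MAX_PAGES:
        return "full"
    # arXiv metadata is CC 0, so title and abstract remain translatable.
    return "metadata"


# Placeholder identifiers, not real papers.
print(translation_scope(Paper("0000.00000", "CC BY", 12)))
print(translation_scope(Paper("0000.00001", "CC BY-NC", 12)))
```

Both conditions must hold for a full translation; a paper that fails either one falls back to metadata only, which matches the behavior the description states.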

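The operator of this site makes no guarantee as to the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).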

Papers listed here have a publication date of 2024-06-17.

Title	Authors	Abstract	Publication date / Translation date
# Cultural Value Differences of LLMs: Prompt, Language, and Model Size ( http://arxiv.org/abs/2407.16891v1 ) License: see the linked page	Qishuai Zhong, Yike Yun, Aixin Sun	Our study aims to identify behavior patterns in cultural values exhibited by large language models (LLMs). The studied variants include question ordering, prompting language, and model size. Our experiments reveal that each tested LLM can efficiently behave with different cultural values. More interestingly: (i) LLMs exhibit relatively consistent cultural values when presented with prompts in a single language. (ii) The prompting language, e.g., Chinese or English, can influence the expression of cultural values. The same question can elicit divergent cultural values when the same LLM is queried in a different language. (iii) Differences in sizes of the same model (e.g., Llama2-7B vs 13B vs 70B) have a more significant impact on their demonstrated cultural values than model differences (e.g., Llama2 vs Mixtral). Our experiments reveal that the query language and the model size of an LLM are the main factors resulting in cultural value differences.	Translated: 2024-08-05 01:45:45 Published: 2024-06-17
# Exploring Fusion Techniques in Multimodal AI-Based Recruitment: Insights from FairCVdb ( http://arxiv.org/abs/2407.16892v1 ) License: see the linked page	Swati Swati, Arjun Roy, Eirini Ntoutsi	Despite the large body of work on fairness-aware learning for individual modalities like tabular data, images, and text, less work has been done on multimodal data, which fuses various modalities for a comprehensive analysis. In this work, we investigate the fairness and bias implications of multimodal fusion techniques in the context of multimodal AI-based recruitment systems using the FairCVdb dataset. Our results show that early-fusion closely matches the ground truth for both demographics, achieving the lowest MAEs by integrating each modality's unique characteristics. In contrast, late-fusion leads to highly generalized mean scores and higher MAEs. Our findings emphasise the significant potential of early-fusion for accurate and fair applications, even in the presence of demographic biases, compared to late-fusion. Future research could explore alternative fusion strategies and incorporate modality-related fairness constraints to improve fairness. For code and additional insights, visit: https://github.com/Swati17293/Multimodal-AI-Based-Recruitment-FairCVdb	Translated: 2024-08-05 01:45:45 Published: 2024-06-17
# Estimating the Increase in Emissions caused by AI-augmented Search ( http://arxiv.org/abs/2407.16894v1 ) License: see the linked page	Wim Vanderbauwhede	AI-generated answers to conventional search queries dramatically increase energy consumption. By our estimates, energy demand increases by 60-70 times. This is based on an updated estimate of energy consumption for conventional search and recent work on the energy demand of queries to the BLOOM model, a 176B parameter model, and OpenAI's ChatGPT, which is of similar complexity.	Translated: 2024-08-05 01:45:45 Published: 2024-06-17
# (Unfair) Norms in Fairness Research: A Meta-Analysis ( http://arxiv.org/abs/2407.16895v1 ) License: see the linked page	Jennifer Chien, A. Stevie Bergman, Kevin R. McKee, Nenad Tomasev, Vinodkumar Prabhakaran, Rida Qadri, Nahema Marchal, William Isaac	Algorithmic fairness has emerged as a critical concern in artificial intelligence (AI) research. However, the development of fair AI systems is not an objective process. Fairness is an inherently subjective concept, shaped by the values, experiences, and identities of those involved in research and development. To better understand the norms and values embedded in current fairness research, we conduct a meta-analysis of algorithmic fairness papers from two leading conferences on AI fairness and ethics, AIES and FAccT, covering a final sample of 139 papers over the period from 2018 to 2022. Our investigation reveals two concerning trends: first, a US-centric perspective dominates throughout fairness research; and second, fairness studies exhibit a widespread reliance on binary codifications of human identity (e.g., "Black/White", "male/female"). These findings highlight how current research often overlooks the complexities of identity and lived experiences, ultimately failing to represent diverse global contexts when defining algorithmic bias and fairness. We discuss the limitations of these research design choices and offer recommendations for fostering more inclusive and representative approaches to fairness in AI systems, urging a paradigm shift that embraces nuanced, global understandings of human identity and values.	Translated: 2024-08-05 01:45:45 Published: 2024-06-17
# Controlling the color appearance of objects by optimizing the illumination spectrum ( http://arxiv.org/abs/2407.09511v1 ) License: see the linked page	Mariko Yamaguchi, Masaru Tsuchida, Takahiro Matsumoto, Tetsuro Tokunaga, Takayoshi Mochizuki	We have developed an innovative lighting system that changes specific target colors while keeping the lights appearing naturally white. By precisely controlling the spectral power distribution (SPD) of illumination and harnessing the unique phenomenon of metamerism, our system achieves unique color variations in ways you've never seen before. Our system calculates the optimal SPDs of illumination for given materials to intensively induce metamerism, and then synthesizes the illumination using various colors of LEDs. We successfully demonstrated the system's implementation at Paris Fashion Week 2024. As models step onto the stage, their dresses initiate a captivating transformation. Our system alters the colors of the dresses, showcasing an impressive transition from one stunning color to another.	Translated: 2024-07-22 13:28:38 Published: 2024-06-17
# Design and evaluation of AI copilots -- case studies of retail copilot templates ( http://arxiv.org/abs/2407.09512v1 ) License: see the linked page	Michal Furmakiewicz, Chang Liu, Angus Taylor, Ilya Venger	Building a successful AI copilot requires a systematic approach. This paper is divided into two sections, covering the design and evaluation of a copilot respectively. A case study of developing copilot templates for the retail domain by Microsoft is used to illustrate the role and importance of each aspect. The first section explores the key technical components of a copilot's architecture, including the LLM, plugins for knowledge retrieval and actions, orchestration, system prompts, and responsible AI guardrails. The second section discusses testing and evaluation as a principled way to promote desired outcomes and manage unintended consequences when using AI in a business context. We discuss how to measure and improve a copilot's quality and safety, through the lens of an end-to-end human-AI decision loop framework. By providing insights into the anatomy of a copilot and the critical aspects of testing and evaluation, this paper provides concrete evidence of how good design and evaluation practices are essential for building effective, human-centered AI assistants.	Translated: 2024-07-22 13:28:38 Published: 2024-06-17
# Building Collaborative Learning: Exploring Social Annotation in Introductory Programming ( http://arxiv.org/abs/2407.10322v1 ) License: see the linked page	Francisco Gomes de Oliveira Neto, Felix Dobslaw	The increasing demand for software engineering education presents learning challenges in courses due to the diverse range of topics that require practical applications, such as programming or software design, all of which are supported by group work and interaction. Social Annotation (SA) is an approach to teaching that can enhance collaborative learning among students. In SA, both students and teachers utilize platforms like Feedback Fruits, Perusall, and Diigo to collaboratively annotate and discuss course materials. This approach encourages students to share their thoughts and answers with their peers, fostering a more interactive learning environment. We share our experience of implementing social annotation via Perusall as a preparatory tool for lectures in an introductory programming course aimed at undergraduate students in Software Engineering. We report the impact of Perusall on the examination results of 112 students. Our results show that 81% of students engaged in meaningful social annotation successfully passed the course. Notably, the proportion of students passing the exam tends to rise as they complete more Perusall assignments. In contrast, only 56% of students who did not participate in Perusall discussions managed to pass the exam. We did not enforce mandatory Perusall participation in the course. Yet, the feedback from our course evaluation questionnaire reveals that most students ranked Perusall among their favorite components of the course and that their interest in the subject has increased.	Translated: 2024-07-22 12:49:16 Published: 2024-06-17
# On the Combination of AI and Wireless Technologies: 3GPP Standardization Progress ( http://arxiv.org/abs/2407.10984v1 ) License: see the linked page	Chen Sun, Tao Cui, Wenqi Zhang, Yingshuang Bai, Shuo Wang, Haojin Li	Combining Artificial Intelligence (AI) and wireless communication technologies has become one of the major technology trends towards 2030. This includes using AI to improve the efficiency of wireless transmission and supporting AI deployment with wireless networks. In this article, the latest progress of the Third Generation Partnership Project (3GPP) standards development is introduced. Concentrating on AI model distributed transfer and AI for Beam Management (BM) with wireless networks, we introduce the latest studies and explain how the existing standards should be modified to incorporate the results from academia.	Translated: 2024-07-22 12:49:16 Published: 2024-06-17
# GameVibe: A Multimodal Affective Game Corpus ( http://arxiv.org/abs/2407.12787v1 ) License: see the linked page	Matthew Barthet, Maria Kaselimi, Kosmas Pinitas, Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis	As online video and streaming platforms continue to grow, affective computing research has undergone a shift towards more complex studies involving multiple modalities. However, there is still a lack of readily available datasets with high-quality audiovisual stimuli. In this paper, we present GameVibe, a novel affect corpus which consists of multimodal audiovisual stimuli, including in-game behavioural observations and third-person affect labels for viewer engagement. The corpus consists of videos from a diverse set of publicly available gameplay sessions across 30 games, with particular attention to ensuring high-quality stimuli with good audiovisual and gameplay diversity. Furthermore, we present an analysis of the reliability of the annotators in terms of inter-annotator agreement.	Translated: 2024-07-22 08:57:39 Published: 2024-06-17
# SS-ADA: A Semi-Supervised Active Domain Adaptation Framework for Semantic Segmentation ( http://arxiv.org/abs/2407.12788v1 ) License: see the linked page	Weihao Yan, Yeqiang Qian, Yueyuan Li, Tao Li, Chunxiang Wang, Ming Yang	Semantic segmentation plays an important role in intelligent vehicles, providing pixel-level semantic information about the environment. However, labeling is expensive and time-consuming when a semantic segmentation model is applied to new driving scenarios. To reduce the costs, semi-supervised semantic segmentation methods have been proposed to leverage large quantities of unlabeled images. Despite this, their performance still falls short of the accuracy required for practical applications, which is typically achieved by supervised learning. A significant shortcoming is that they typically select unlabeled images for annotation randomly, neglecting the assessment of sample value for model training. In this paper, we propose a novel semi-supervised active domain adaptation (SS-ADA) framework for semantic segmentation that employs an image-level acquisition strategy. SS-ADA integrates active learning into semi-supervised semantic segmentation to achieve the accuracy of supervised learning with a limited amount of labeled data from the target domain. Additionally, we design an IoU-based class weighting strategy to alleviate the class imbalance problem using annotations from active learning. We conducted extensive experiments on synthetic-to-real and real-to-real domain adaptation settings. The results demonstrate the effectiveness of our method. SS-ADA can achieve or even surpass the accuracy of its supervised learning counterpart with only 25% of the target labeled data when using a real-time segmentation model. The code for SS-ADA is available at https://github.com/ywher/SS-ADA.	Translated: 2024-07-22 08:57:39 Published: 2024-06-17
# Generalisation to unseen topologies: Towards control of biological neural network activity ( http://arxiv.org/abs/2407.12789v1 ) License: see the linked page	Laurens Engwegen, Daan Brinks, Wendelin Böhmer	Novel imaging and neurostimulation techniques open doors for advancements in closed-loop control of activity in biological neural networks. This would allow for applications in the investigation of activity propagation, and for diagnosis and treatment of pathological behaviour. Due to the partially observable characteristics of activity propagation, through networks in which edges cannot be observed, and the dynamic nature of neuronal systems, there is a need for adaptive, generalisable control. In this paper, we introduce an environment that procedurally generates neuronal networks with different topologies to investigate this generalisation problem. Additionally, an existing transformer-based architecture is adjusted to evaluate the generalisation performance of a deep RL agent in the presented partially observable environment. The agent demonstrates the capability to generalise control from a limited number of training networks to unseen test networks.	Translated: 2024-07-22 08:57:39 Published: 2024-06-17
# Constraint based Modeling according to Reference Design ( http://arxiv.org/abs/2407.00064v1 ) License: see the linked page	Erik Heiland, Peter Hillmann, Andreas Karcher	Reference models in the form of best practices are an essential element for securing knowledge as design for reuse. Popular modeling approaches do not offer mechanisms to embed reference models in a supporting way, let alone a repository of them. Therefore, it is hardly possible to profit from this expertise. The problem is that the reference models are not described formally enough to be helpful in developing solutions. Consequently, the challenge concerns the process: how a user can be supported in designing dedicated solutions assisted by reference models. In this paper, we present a generic approach for the formal description of reference models using semantic technologies and their application. Our modeling assistant allows the construction of solution models using different techniques based on reference building blocks. This environment enables the subsequent verification of the developed designs against the reference models for conformity. Therefore, our reference modeling assistant highlights the interdependency. The application of these techniques contributes to the formalization of requirements and finally to quality assurance in the context of a maturity model. It is possible to use multiple reference models in the context of system-of-systems designs. The approach is evaluated in an industrial setting and can be integrated into different modeling landscapes.	Translated: 2024-07-07 13:43:41 Published: 2024-06-17
# A Personalised Learning Tool for Physics Undergraduate Students Built On a Large Language Model for Symbolic Regression ( http://arxiv.org/abs/2407.00065v1 ) License: see the linked page	Yufan Zhu, Zi-Yu Khoo, Jonathan Sze Choong Low, Stephane Bressan	Interleaved practice enhances the memory and problem-solving ability of students in undergraduate courses. We introduce a personalized learning tool built on a Large Language Model (LLM) that can provide immediate and personalized attention to students as they complete homework containing problems interleaved from undergraduate physics courses. Our tool leverages the dimensional analysis method, enhancing students' qualitative thinking and problem-solving skills for complex phenomena. Our approach combines LLMs for symbolic regression with dimensional analysis via prompt engineering and offers students a unique perspective to comprehend relationships between physics variables. This fosters a broader and more versatile understanding of physics and mathematical principles and complements a conventional undergraduate physics education that relies on interpreting and applying established equations within specific contexts. We test our personalized learning tool on the equations from Feynman's lectures on physics. Our tool can correctly identify relationships between physics variables for most equations, underscoring its value as a complementary personalized learning tool for undergraduate physics students.	Translated: 2024-07-07 13:43:41 Published: 2024-06-17
# Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead ( http://arxiv.org/abs/2407.00066v1 ) License: see the linked page	Rickard Brüel-Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj, Leshem Choshen, Kristjan Greenewald, Mikhail Yurochkin, Justin Solomon	Fine-tuning large language models (LLMs) with low-rank adapters (LoRAs) has become common practice, often yielding numerous copies of the same LLM differing only in their LoRA updates. This paradigm presents challenges for systems that serve real-time responses to queries that each involve a different LoRA. Prior works optimize the design of such systems but still require continuous loading and offloading of LoRAs, as it is infeasible to store thousands of LoRAs in GPU memory. To mitigate this issue, we investigate the efficacy of compression when serving LoRA adapters. We consider compressing adapters individually via SVD and propose a method for joint compression of LoRAs into a shared basis paired with LoRA-specific scaling matrices. Our experiments with up to 500 LoRAs demonstrate that compressed LoRAs preserve performance while offering major throughput gains in realistic serving scenarios with over a thousand LoRAs, maintaining 75% of the throughput of serving a single LoRA.	Translated: 2024-07-07 13:43:41 Published: 2024-06-17
# Perceptron Collaborative Filtering ( http://arxiv.org/abs/2407.00067v1 ) License: see the linked page	Arya Chakraborty	While multivariate logistic regression classifiers are a great way of implementing collaborative filtering - a method of making automatic predictions about the interests of a user by collecting preferences or taste information from many other users - we can also achieve similar results using neural networks. A recommender system is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user. A perceptron or a neural network is a machine learning model designed for fitting complex datasets using backpropagation and gradient descent. When coupled with advanced optimization techniques, the model may prove to be a great substitute for classical logistic classifiers. The optimizations include feature scaling, mean normalization, regularization, hyperparameter tuning and using stochastic/mini-batch gradient descent instead of regular gradient descent. In this use case, we will use the perceptron in the recommender system to fit the parameters, i.e., the data from a multitude of users, and use it to predict the preference/interest of a particular user.	Translated: 2024-07-07 13:34:23 Published: 2024-06-17
# Optimal Low-Depth Quantum Signal-Processing Phase Estimation ( http://arxiv.org/abs/2407.01583v1 ) License: see the linked page	Yulong Dong, Jonathan A. Gross, Murphy Yuezhen Niu	Quantum effects like entanglement and coherent amplification can be used to drastically enhance the accuracy of quantum parameter estimation beyond classical limits. However, challenges such as decoherence and time-dependent errors hinder Heisenberg-limited amplification. We introduce Quantum Signal-Processing Phase Estimation algorithms that are robust against these challenges and achieve optimal performance as dictated by the Cramér-Rao bound. These algorithms use quantum signal transformation to decouple interdependent phase parameters into largely orthogonal ones, ensuring that time-dependent errors in one do not compromise the accuracy of learning the other. Combining provably optimal classical estimation with near-optimal quantum circuit design, our approach achieves an unprecedented standard deviation accuracy of $10^{-4}$ radians for estimating unwanted swap angles in superconducting two-qubit experiments, using low-depth ($<10$) circuits. This represents up to two orders of magnitude improvement over existing methods. Theoretically and numerically, we demonstrate the optimality of our algorithm against time-dependent phase errors, observing that the variance of the time-sensitive parameter $\varphi$ scales faster than the asymptotic Heisenberg scaling in the small-depth regime. Our results are rigorously validated against the quantum Fisher information, confirming our protocol's ability to achieve unmatched precision for two-qubit gate learning.	Translated: 2024-07-07 13:24:39 Published: 2024-06-17
# Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging ( http://arxiv.org/abs/2406.15479v1 ) License: see the linked page	Zhenyi Lu, Chenghao Fan, Wei Wei, Xiaoye Qu, Dangyang Chen, Yu Cheng	In the era of large language models, model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training. However, two challenges remain: (a) interference between different models and (b) heterogeneous data during testing. Traditional model merging methods often show significant performance gaps compared to fine-tuned models due to these issues. Additionally, a one-size-fits-all model lacks flexibility for diverse test data, leading to performance degradation. We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance. In view of this, we propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input. This approach narrows the performance gap between merged and fine-tuned models and improves adaptability to heterogeneous data. Extensive experiments on 12 datasets for both discriminative and generative tasks demonstrate the effectiveness of our method, showing an average improvement of 28.34% in absolute normalized score for discriminative tasks and even surpassing the fine-tuned upper bound on the generative tasks. (Our implementation is available at https://github.com/LZY-the-boys/Twin-Mergin.)	Translated: 2024-07-01 06:51:29 Published: 2024-06-17
# ジャイアントの肩について:ダイナミック・ロジット・フュージョンによる不運な弱み On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion ( http://arxiv.org/abs/2406.15480v1 ) ライセンス: Link先を確認	Chenghao Fan, Zhenyi Lu, Wei Wei, Jie Tian, Xiaoye Qu, Dangyang Chen, Yu Cheng,	(参考訳) タスク固有のアプリケーションのための大規模言語モデルの効率的な微調整は必須であるが、これらのモデルの膨大なパラメータは、そのトレーニングをますます困難にしている。効果的な方法に関する多くの提案にもかかわらず、更新時の勾配計算にはかなりのメモリオーバーヘッドが残っている。一連のタスク固有の小さなモデルを微調整し、その知識を追加のトレーニングなしでもっと大きなモデルに直接転送するのでしょうか? 本稿では,ロジット算術を用いた弱い対強の特殊化について検討し,この問題への直接的な回答を容易にする。既存の弱強法では、静的な知識伝達比と1つの小さなモデルを用いて複雑な知識を伝達し、最適以下の性能をもたらす。 % この問題に対処するため、これらの制限を克服するため、我々は、異なるタスクに特化して、一連のタスク固有の小さなモデルで動作する動的ロジット融合アプローチを提案する。この方法は、各復号ステップでこれらのモデル間の重みを適応的に割り当て、Kullback-Leibler分散制約最適化問題を通して重みを学習する。我々は、シングルタスクとマルチタスクの両方の設定において、様々なベンチマークで広範な実験を行い、主要な結果を得た。本手法は、7Bモデルから13Bモデルに専門知識を移すことにより、シングルタスクシナリオでは96.4\%、マルチタスクシナリオでは86.3\%の性能ギャップを、13Bモデルの完全な微調整と比較して埋める。特に、目に見えないタスクでパフォーマンスを上回ります。さらに,本手法は,単一タスクに対する文脈内学習とマルチタスクシナリオに対するタスク算術とをシームレスに統合できることを実証する。実装はhttps://github.com/Facico/Dynamic-Logit-Fusion.comで公開しています。 Efficient fine-tuning of large language models for task-specific applications is imperative, yet the vast number of parameters in these models makes their training increasingly challenging. Despite numerous proposals for effective methods, a substantial memory overhead remains for gradient computations during updates. \thm{Can we fine-tune a series of task-specific small models and transfer their knowledge directly to a much larger model without additional training?} In this paper, we explore weak-to-strong specialization using logit arithmetic, facilitating a direct answer to this question. Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge, which leads to suboptimal performance. % To address this, To surmount these limitations, we propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task. 
This method adaptively allocates weights among these models at each decoding step, learning the weights through Kullback-Leibler divergence constrained optimization problems. We conduct extensive experiments across various benchmarks in both single-task and multi-task settings, achieving leading results. By transferring expertise from the 7B model to the 13B model, our method closes the performance gap by 96.4\% in single-task scenarios and by 86.3\% in multi-task scenarios compared to full fine-tuning of the 13B model. Notably, our method achieves superior performance on unseen tasks. Moreover, we further demonstrate that our method can effortlessly integrate in-context learning for single tasks and task arithmetic for multi-task scenarios. (Our implementation is available at https://github.com/Facico/Dynamic-Logit-Fusion.)	翻訳日:2024-07-01 06:51:29 公開日:2024-06-17
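The logit arithmetic described in the entry above (arXiv 2406.15480) can be illustrated with a minimal sketch. It assumes the common weak-to-strong form in which each small expert contributes its logit shift relative to the untuned small model, scaled by a weight. The function name `fuse_logits` and all argument names are hypothetical, and the weights are passed in by hand here; the abstract states that the actual method learns them at each decoding step through a KL-constrained optimization, which this sketch does not implement.

```python
def fuse_logits(large_base, small_base, small_experts, weights):
    """Return large_base + sum_i weights[i] * (small_experts[i] - small_base).

    All inputs are plain lists of per-token logits over the same vocabulary.
    This is an illustrative form of logit arithmetic, not the authors' code.
    """
    if len(small_experts) != len(weights):
        raise ValueError("one weight per expert is required")
    fused = list(large_base)
    for expert, weight in zip(small_experts, weights):
        for k in range(len(fused)):
            # Shift the large model by the expert's change over the small base.
            fused[k] += weight * (expert[k] - small_base[k])
    return fused


# Toy vocabulary of three tokens and two task-specific small experts.
large_base = [2.0, 1.0, 0.0]
small_base = [1.0, 1.0, 0.0]
expert_a = [1.0, 3.0, 0.0]   # fine-tuning raised token 1
expert_b = [1.0, 1.0, 2.0]   # fine-tuning raised token 2
print(fuse_logits(large_base, small_base, [expert_a, expert_b], [0.5, 0.5]))
# prints [2.0, 2.0, 1.0]
```

With a static weight vector this reduces to the fixed-ratio transfer that the abstract criticizes; the dynamic variant would recompute `weights` for every decoding step.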
# CSRT:コードスイッチング赤チームデータセットを用いたLCMの評価と解析 CSRT: Evaluation and Analysis of LLMs using Code-Switching Red-Teaming Dataset ( http://arxiv.org/abs/2406.15481v1 ) ライセンス: Link先を確認	Haneul Yoo, Yongjin Yang, Hwaran Lee,	(参考訳) 大規模言語モデル(LLM)の最近の研究は、言語モデリングにおける従来の課題を超えて、その多言語能力と安全性に光を当てている。それでも、現在のベンチマークでは、包括的な評価ができず、手動のアノテーションに過度に依存していることが明らかになっている。本稿では,LLMの多言語理解と安全性を同時にテストする,単純かつ効果的なリピート手法であるコードスイッチング・レッドチーム(CSRT)を提案する。 CSRTデータセットは、最大10言語を結合した315のコードスイッチングクエリからなり、望ましくない動作を広範囲に引き出す。 CSRTは10種類の最先端LCMによる広範囲な実験を通じて、既存の多言語的リピート技術よりも優れた性能を示し、既存の英語の手法よりも46.7%のアタックを達成している。 CSRTデータセットに対する有害な応答を,スケーリング法則,安全でない行動カテゴリー,最適データ生成のための入力条件を含む16Kサンプルを用いてアブレーション研究により分析した。さらに、単言語データを用いてコードスイッチング攻撃プロンプトを生成することにより、CSRTの拡張性を検証する。 Recent studies in large language models (LLMs) shed light on their multilingual ability and safety, beyond conventional tasks in language modeling. Still, current benchmarks reveal their inability to comprehensively evaluate them and are excessively dependent on manual annotations. In this paper, we introduce code-switching red-teaming (CSRT), a simple yet effective red-teaming technique that simultaneously tests multilingual understanding and safety of LLMs. We release the CSRT dataset, which comprises 315 code-switching queries combining up to 10 languages and eliciting a wide range of undesirable behaviors. Through extensive experiments with ten state-of-the-art LLMs, we demonstrate that CSRT significantly outperforms existing multilingual red-teaming techniques, achieving 46.7% more attacks than existing methods in English. We analyze the harmful responses toward the CSRT dataset concerning various aspects under ablation studies with 16K samples, including but not limited to scaling laws, unsafe behavior categories, and input conditions for optimal data generation. Additionally, we validate the extensibility of CSRT, by generating code-switching attack prompts with monolingual data.	翻訳日:2024-07-01 06:51:29 公開日:2024-06-17
# 学術的統合のためのブロックチェーン:Blockchain Academic Credential Interoperability Protocol(BACIP)の開発 Blockchain for Academic Integrity: Developing the Blockchain Academic Credential Interoperability Protocol (BACIP) ( http://arxiv.org/abs/2406.15482v1 ) ライセンス: Link先を確認	Juan A. Berrios Moya,	(参考訳) 本研究は,世界規模の学術的資格証明のセキュリティ,プライバシ,相互運用性を著しく向上するために設計された,Blockchain Academic Credential Interoperability Protocol(BACIP)を紹介する。 BACIPは、デュアルブロックチェーンアーキテクチャ、スマートコントラクト、ゼロ知識証明を統合し、不正を低減し、世界中の学生やプロフェッショナルのモビリティと機会を改善することを目的とした、スケーラブルで透明なフレームワークを提供する。研究手法は、関連する文献の厳密なレビューと高度な技術コンポーネントの体系的な統合を含む混合メソッドのアプローチを採用する。これには、普遍的に互換性のあるシステムの開発を支えている質的および定量的分析の両方が含まれる。予備的な評価は、BACIPが認証効率を高め、改ざんや不正アクセスに対するセキュリティを強化することを示唆している。理論的な枠組みと実践的な実装はしっかりとした基盤を築き上げてきたが、実際の有効性は実運用環境で実証的な検証を待つ。今後の研究は、プロトタイプのデプロイ、堅牢なバリデーションポリシの確立、正確なテストパラメータの定義に注力する予定である。このクリティカルフェーズは、BACIPの運用上の堅牢性とその国際教育標準への準拠を徹底的に評価するために欠かせない。この研究は、学術的資格の管理と保護のための堅牢なモデルを提案し、ブロックチェーン技術を使用した認証検証のさらなるイノベーションのための強力な基盤を築くことで、学術分野に大きく貢献する。 This research introduces the Blockchain Academic Credential Interoperability Protocol (BACIP), designed to significantly enhance the security, privacy, and interoperability of verifying academic credentials globally, addressing the widespread issue of academic fraud. BACIP integrates dual blockchain architecture, smart contracts, and zero-knowledge proofs to offer a scalable and transparent framework aimed at reducing fraud and improving the mobility and opportunities for students and professionals worldwide. The research methodology adopts a mixed-methods approach, involving a rigorous review of pertinent literature and systematic integration of advanced technological components. This includes both qualitative and quantitative analyses that underpin the development of a universally compatible system. Preliminary evaluations suggest that BACIP could enhance verification efficiency and bolster security against tampering and unauthorized access. 
While the theoretical framework and practical implementations have laid a solid foundation, the protocol's real-world efficacy awaits empirical validation in a production environment. Future research will focus on deploying a prototype, establishing robust validation policies, and defining precise testing parameters. This critical phase is indispensable for a thorough assessment of BACIP's operational robustness and its compliance with international educational standards. This work contributes significantly to the academic field by proposing a robust model for managing and safeguarding academic credentials, thus laying a strong foundation for further innovation in credential verification using blockchain technology.	翻訳日:2024-07-01 06:51:29 公開日:2024-06-17
# GenAIによる重複検出 Duplicate Detection with GenAI ( http://arxiv.org/abs/2406.15483v1 ) ライセンス: Link先を確認	Ian Ormesher,	(参考訳) 顧客データは、CRM(Customer Relations Management System)に記録として格納されることが多い。より多くのユーザが手動でそのようなシステムに入力したデータは、データの複製、部分複製、ファジィ複製につながる。これはつまり、顧客や連絡先、アカウントなどにとって、もはや唯一の真実の情報源が存在しないことを意味します。下流のビジネスプロセスは複雑になり、CRMのレコードとターゲットの顧客の間のユニークなマッピングがなければ、トリビュートされます。レコードの検出と非重複化の現在の方法は、Entity Matchingとして知られる従来の自然言語処理技術を使用している。本稿では,大規模言語モデルと生成AIの最近の進歩により,重複したレコードの識別と修復が大幅に向上することを示す。一般的なベンチマークデータセットでは,NLP手法で30%から,提案手法で60%に改善した。 Customer data is often stored as records in Customer Relations Management systems (CRMs). Data which is manually entered into such systems by one or more users over time leads to data replication, partial duplication or fuzzy duplication. This in turn means that there is no longer a single source of truth for customers, contacts, accounts, etc. Downstream business processes become increasingly complex and contrived without a unique mapping between a record in a CRM and the target customer. Current methods to detect and de-duplicate records use traditional Natural Language Processing techniques known as Entity Matching. In this paper we show how using the latest advancements in Large Language Models and Generative AI can vastly improve the identification and repair of duplicated records. On common benchmark datasets we find an improvement in the accuracy of data de-duplication rates from 30 percent using NLP techniques to almost 60 percent using our proposed method.	翻訳日:2024-07-01 06:51:29 公開日:2024-06-17
# JobFair: 大規模言語モデルにおけるジェンダー採用バイアスのベンチマークフレームワーク JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models ( http://arxiv.org/abs/2406.15484v1 ) ライセンス: Link先を確認	Ze Wang, Zekun Wu, Xin Guan, Michael Thaler, Adriano Koshiyama, Skylar Lu, Sachin Beepath, Ediz Ertekin Jr., Maria Perez-Ortiz,	(参考訳) 本稿では,大規模言語モデル(LLM)における階層的ジェンダー採用バイアスのベンチマーク手法を提案する。まず、医療、財務、建設産業のリアルで匿名化された履歴データセットを使用したフレームワークを導入します。レベルバイアス、スプレッドバイアス、テイストベースのバイアス、統計バイアスなど、階層レベルの性別採用バイアスを評価する。この枠組みは、他の社会的特性やタスクに容易に一般化できる。第2に、ランクアフター・スコアリング(RAS)、ランクベースインパクト比、置換テストベースメトリクス、固定効果モデルベースメトリクスなど、反実的アプローチに基づく新しい統計的・計算的採用バイアスメトリクスを提案する。これらの指標は労働経済学、NLP、法律に根ざしており、雇用バイアスの全体的評価を可能にしている。第三に、私たちは10の最先端のLCMにおける採用バイアスを分析します。 10のLSMのうち6つは、医療と金融において男性に対して有意な偏見を示す。産業効果のレグレッションは、医療産業が男性に最も偏っていることを示している。 GPT-4o と GPT-3.5 が最も偏りのあるモデルであり、3つの業界で有意な偏りを示している。逆に、Gemini-1.5-Pro、Llama3-8b-Instruct、Llama3-70b-Instructは最もバイアスが少ない。 Llama3-8b-InstructとClaude-3-Sonnetを除く全てのLLMの雇用バイアスは、ランダムな膨張や再開内容の減少にかかわらず一貫している。最後に、このフレームワークの採用と実践を容易にするために、ユーザフレンドリーなデモを提供します。 This paper presents a novel framework for benchmarking hierarchical gender hiring bias in Large Language Models (LLMs) for resume scoring, revealing significant issues of reverse bias and overdebiasing. Our contributions are fourfold: First, we introduce a framework using a real, anonymized resume dataset from the Healthcare, Finance, and Construction industries, meticulously used to avoid confounding factors. It evaluates gender hiring biases across hierarchical levels, including Level bias, Spread bias, Taste-based bias, and Statistical bias. This framework can be generalized to other social traits and tasks easily. Second, we propose novel statistical and computational hiring bias metrics based on a counterfactual approach, including Rank After Scoring (RAS), Rank-based Impact Ratio, Permutation Test-Based Metrics, and Fixed Effects Model-based Metrics. These metrics, rooted in labor economics, NLP, and law, enable holistic evaluation of hiring biases. 
Third, we analyze hiring biases in ten state-of-the-art LLMs. Six out of ten LLMs show significant biases against males in healthcare and finance. An industry-effect regression reveals that the healthcare industry is the most biased against males. GPT-4o and GPT-3.5 are the most biased models, showing significant bias in all three industries. Conversely, Gemini-1.5-Pro, Llama3-8b-Instruct, and Llama3-70b-Instruct are the least biased. The hiring bias of all LLMs, except for Llama3-8b-Instruct and Claude-3-Sonnet, remains consistent regardless of random expansion or reduction of resume content. Finally, we offer a user-friendly demo to facilitate adoption and practical application of the framework.	翻訳日:2024-07-01 06:51:29 公開日:2024-06-17
# 適応的構造的スパースアテンションを用いたLLM推論の近接ロスレス高速化 Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention ( http://arxiv.org/abs/2406.15486v1 ) ライセンス: Link先を確認	Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang,	(参考訳) 大規模言語モデル(LLM)は、非常に長いコンテキストウィンドウをサポートするようになったが、バニラアテンションの二次的な複雑さにより、TTFT(Time-to-First-Token)レイテンシが非常に長い。この複雑さに対処する既存のアプローチは、追加の事前訓練や微調整を必要とし、しばしばモデルの精度を犠牲にする。本稿では,まず,理論的および実証的な基礎を,ほぼ無光沢なスパークス・アテンションのために提示する。オーバーヘッドの少ないヘッド固有スパースパターンを実行時に動的にキャプチャすることが重要である。そこで本研究では,適応型構造化とほぼ無意味なスパースアテンションであるSampleAttentionを提案する。重要なスパースパターンを活用すれば、SampleAttentionは、ローカルウィンドウパターンをキャプチャするために隣接するトークンの一定割合に到達し、2段階のクエリ誘導キー値フィルタリングアプローチを使用して、最小のキー値セットを少ないオーバーヘッドで適応的に選択し、カラムストリップパターンをキャプチャする。総合的な評価によると、SampleAttentionは市販のLLMのバニラ注意をほぼ精度の低下なしにシームレスに置き換えることができ、また、FlashAttentionと比較してTTFTを最大$2.42\times$に下げることができる。 Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additional pretraining or finetuning, and often sacrifice model accuracy. In this paper, we first provide both theoretical and empirical foundations for near-lossless sparse attention. We find dynamically capturing head-specific sparse patterns at runtime with low overhead is crucial. To address this, we propose SampleAttention, an adaptive structured and near-lossless sparse attention. Leveraging observed significant sparse patterns, SampleAttention attends to a fixed percentage of adjacent tokens to capture local window patterns, and employs a two-stage query-guided key-value filtering approach, which adaptively selects a minimum set of key-values with low overhead, to capture column stripe patterns. 
Comprehensive evaluations show that SampleAttention can seamlessly replace vanilla attention in off-the-shelf LLMs with nearly no accuracy loss, and reduces TTFT by up to $2.42\times$ compared with FlashAttention.	翻訳日:2024-07-01 06:51:29 公開日:2024-06-17
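The two structural patterns named in the entry above (arXiv 2406.15486), a local window over adjacent tokens plus a few column stripes, can be pictured with a small mask builder. This is only a sketch under stated assumptions: the function name `sparse_attention_mask` and its parameters are hypothetical, the mask is causal, and the stripe columns are supplied by the caller. The abstract says SampleAttention picks the key-values adaptively with a two-stage query-guided filter, which is not reproduced here.

```python
def sparse_attention_mask(seq_len, window_ratio, stripe_columns):
    """Build a causal boolean mask: mask[q][k] is True if query q may attend key k.

    A key is kept when it lies in the local window just before the query,
    or when it belongs to one of the given stripe columns.
    """
    # Window size is a fixed fraction of the sequence length, at least one token.
    window = max(1, int(seq_len * window_ratio))
    stripes = set(stripe_columns)
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q                 # never attend to future tokens
            in_window = (q - k) < window    # adjacent tokens before the query
            row.append(causal and (in_window or k in stripes))
        mask.append(row)
    return mask


mask = sparse_attention_mask(seq_len=6, window_ratio=0.34, stripe_columns=[0])
print(mask[5])
# prints [True, False, False, False, True, True]
```

In this toy case the last query keeps three of six keys: the stripe at column 0 and the two-token local window.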
# LLMベースのAIチャットボットに関する完全な調査 A Complete Survey on LLM-based AI Chatbots ( http://arxiv.org/abs/2406.16937v1 ) ライセンス: Link先を確認	Sumit Kumar Dam, Choong Seon Hong, Yu Qiao, Chaoning Zhang,	(参考訳) 過去数十年間、データの増加を目撃し、データ収集、学習ベースのAI技術の基礎を築いた。 AIチャットボットと呼ばれる会話エージェントは、大きな言語モデル(LLM)をトレーニングし、ユーザのプロンプトに応じて新しいコンテンツ(知識)を生成するために、そのようなデータに大きく依存している。 OpenAIのChatGPTの出現により、LLMベースのチャットボットはAIコミュニティに新たな標準を設定した。本稿では,様々な分野におけるLLMベースのチャットボットの進化と展開に関する完全な調査を行う。まず,LLMの進化に続き,現在使用されているLLMベースのチャットボットと開発段階にあるチャットボットについて概説する。 AIチャットボットを新しい知識を生み出すツールとして認識し、さまざまな産業にまたがる様々な応用を探求する。次に、LLMのトレーニングに使用されるデータと、生成された知識の誤用が、いくつかの問題を引き起こす可能性があることを考慮し、オープンな課題について議論する。最後に、多数のアプリケーションにおいて、その効率性と信頼性を高めるための将来の展望について検討する。重要なマイルストーンと、LLMベースのチャットボットの現在の状況に対処することによって、私たちの調査では、次世代の会話型AIをどのように作り直すのかを反映して、読者にこの領域を深く掘り下げるよう求めています。 The past few decades have witnessed an upsurge in data, forming the foundation for data-hungry, learning-based AI technology. Conversational agents, often referred to as AI chatbots, rely heavily on such data to train large language models (LLMs) and generate new content (knowledge) in response to user prompts. With the advent of OpenAI's ChatGPT, LLM-based chatbots have set new standards in the AI community. This paper presents a complete survey of the evolution and deployment of LLM-based chatbots in various sectors. We first summarize the development of foundational chatbots, followed by the evolution of LLMs, and then provide an overview of LLM-based chatbots currently in use and those in the development phase. Recognizing AI chatbots as tools for generating new knowledge, we explore their diverse applications across various industries. We then discuss the open challenges, considering how the data used to train the LLMs and the misuse of the generated knowledge can cause several issues. Finally, we explore the future outlook to augment their efficiency and reliability in numerous applications. 
By addressing key milestones and the present-day context of LLM-based chatbots, our survey invites readers to delve deeper into this realm, reflecting on how their next generation will reshape conversational AI.	翻訳日:2024-07-01 06:31:46 公開日:2024-06-17
# ホークス過程から学習された生理事象への混合ノイズ Unmixing Noise from Hawkes Process to Model Learned Physiological Events ( http://arxiv.org/abs/2406.16938v1 ) ライセンス: Link先を確認	Guillaume Staerman, Virginie Loison, Thomas Moreau,	(参考訳) 生理的信号分析は、しばしば生物学的力学を理解するのに不可欠な事象を特定することを含む。従来の手法は手作りの手続きや教師あり学習に依存しており、専門家の依存、堅牢性の欠如、広範囲なラベル付きデータの必要性といった課題を提示している。畳み込み辞書学習(CDL)のようなデータ駆動型手法は代替手段を提供するが、突発的な検出をもたらす傾向がある。この研究は、事象における時間構造の共同学習と急激な検出の除去に対処する新しいアプローチであるUNHaP(Unmix Noise from Hawkes Processes)を導入している。マークされたホークスプロセスを利用して、UNHaPは興味のある出来事と刺激的な出来事を区別する。イベント検出出力を構造化イベントと非構造化イベントの混合として扱うことで、UNHaPは効率的にこれらのプロセスを解き、パラメータを推定する。このアプローチは、誤検出率を最小限に抑えながら、事象の分布の理解を著しく向上させる。 Physiological signal analysis often involves identifying events crucial to understanding biological dynamics. Traditional methods rely on handcrafted procedures or supervised learning, presenting challenges such as expert dependence, lack of robustness, and the need for extensive labeled data. Data-driven methods like Convolutional Dictionary Learning (CDL) offer an alternative but tend to produce spurious detections. This work introduces UNHaP (Unmix Noise from Hawkes Processes), a novel approach addressing the joint learning of temporal structures in events and the removal of spurious detections. Leveraging marked Hawkes processes, UNHaP distinguishes between events of interest and spurious ones. By treating the event detection output as a mixture of structured and unstructured events, UNHaP efficiently unmixes these processes and estimates their parameters. This approach significantly enhances the understanding of event distributions while minimizing false detection rates.	翻訳日:2024-07-01 06:31:46 公開日:2024-06-17
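As background for the UNHaP entry above (arXiv 2406.16938), the conditional intensity of a classic self-exciting Hawkes process can be written in a few lines. This sketch assumes an exponential kernel, lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i)); the abstract does not state which kernel UNHaP uses, and the marks and the unmixing of spurious events are not modeled. The name `hawkes_intensity` and its parameters are hypothetical.

```python
import math


def hawkes_intensity(t, event_times, baseline, alpha, beta):
    """Intensity at time t of a Hawkes process with an exponential kernel.

    Only events strictly before t contribute; each adds alpha at its
    occurrence time and then decays at rate beta.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - s)) for s in event_times if s < t
    )
    return baseline + excitation


# With no past events the intensity equals the baseline rate.
print(hawkes_intensity(1.0, [], baseline=0.2, alpha=0.5, beta=1.0))
# prints 0.2
```

Structured events of interest would raise this intensity shortly after each occurrence, whereas unstructured noise is often modeled as arriving at a constant rate.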
# ChatBug: Chatテンプレートによって誘導される配向LDMの共通脆弱性 ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates ( http://arxiv.org/abs/2406.12935v1 ) ライセンス: Link先を確認	Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran,	(参考訳) 大規模言語モデル(LLM)は、ユーザからの指示に従って会話を行うことが期待されている。 LLMの命令フォロー機能を強化する技術は、通常、事前に定義されたチャットテンプレートに従って構造化されたデータを使って微調整する。チャットテンプレートはLLM性能の最適化に有効であることが示されているが,LLMの安全性に対する影響は理解されていない。本稿では,チャットテンプレートがLLMの安全性にどのように影響するかを検討する。チャットテンプレートによって導入された共通の脆弱性であるChatBugを特定します。 ChatBugを識別するための重要な洞察は、チャットテンプレートがLLMに従わなければならない堅固なフォーマットを提供するが、ユーザによるものではない、ということです。したがって、悪意のあるユーザは、LSMのプロンプト時に必ずしもチャットテンプレートに従うとは限らない。悪意のあるユーザは、チャットテンプレートの知識を活用して、LSMの安全アライメントをバイパスするプロンプトを作れます。 ChatBugの脆弱性を悪用する2つの攻撃を開発した。悪意のあるユーザが8つのSOTA (State-of-the-art) LLMのChatBug脆弱性を悪用し、これらのモデルから意図しない応答を効果的に引き出すことができることを示す。さらに,ChatBugは既存のジェイルブレイク攻撃によって悪用され,攻撃成功率を高めることができることを示す。 ChatBugに対する潜在的な対策について検討する。以上の結果から,ChatBug脆弱性を効果的に軽減する一方で,被害者モデルでは性能劣化が顕著であることがわかった。これらの結果は、安全アライメントと有用性の間のトレードオフを浮き彫りにしている。このトレードオフのバランスをとるための新しい指導法の開発は、今後の研究にとってオープンで重要な方向である。 Large language models (LLMs) are expected to follow instructions from users and engage in conversations. Techniques to enhance LLMs' instruction-following capabilities typically fine-tune them using data structured according to a predefined chat template. Although chat templates are shown to be effective in optimizing LLM performance, their impact on safety alignment of LLMs has been less understood, which is crucial for deploying LLMs safely at scale. In this paper, we investigate how chat templates affect safety alignment of LLMs. We identify a common vulnerability, named ChatBug, that is introduced by chat templates. Our key insight to identify ChatBug is that the chat templates provide a rigid format that needs to be followed by LLMs, but not by users. Hence, a malicious user may not necessarily follow the chat template when prompting LLMs. 
Instead, malicious users could leverage their knowledge of the chat template and accordingly craft their prompts to bypass safety alignments of LLMs. We develop two attacks to exploit the ChatBug vulnerability. We demonstrate that a malicious user can exploit the ChatBug vulnerability of eight state-of-the-art (SOTA) LLMs and effectively elicit unintended responses from these models. Moreover, we show that ChatBug can be exploited by existing jailbreak attacks to enhance their attack success rates. We investigate potential countermeasures to ChatBug. Our results show that while adversarial training effectively mitigates the ChatBug vulnerability, the victim model incurs significant performance degradation. These results highlight the trade-off between safety alignment and helpfulness. Developing new methods for instruction tuning to balance this trade-off is an open and critical direction for future research.	翻訳日:2024-06-22 00:37:55 公開日:2024-06-17
# 登録する前にセルフトレインする Self-Train Before You Transcribe ( http://arxiv.org/abs/2406.12937v1 ) ライセンス: Link先を確認	Robert Flynn, Anton Ragni,	(参考訳) トレーニング領域とテスト領域の間にミスマッチがある場合、現在の音声認識システムは、大幅な性能劣化を示す。ノイズの多い教員養成のような自己学習手法は、この問題に対処し、そのようなドメインシフトの下でモデルの適応を可能にする。しかし、セルフトレーニングは通常、非ラップのターゲットドメインデータの収集を必要とする。実践的でない環境では,テスト時間適応手法として,テストセットにおける録音におけるノイズの多い学生教師の訓練を行う利点について検討する。言語モデリングにおける動的評価手法と同様に、ドメイン適応の手法として、発話境界と関数間の情報の伝達を可能にする。ドメイン内のデータセットとドメイン外のデータセットは、32.2%までの大きな相対的なゲインを示す実験に使用される。興味深いことに,本手法は,個別適応データを利用した通常の自己学習装置よりも大きな利得を示した。 When there is a mismatch between the training and test domains, current speech recognition systems show significant performance degradation. Self-training methods, such as noisy student teacher training, can help address this and enable the adaptation of models under such domain shifts. However, self-training typically requires a collection of unlabelled target domain data. For settings where this is not practical, we investigate the benefit of performing noisy student teacher training on recordings in the test set as a test-time adaptation approach. Similarly to the dynamic evaluation approach in language modelling, this enables the transfer of information across utterance boundaries and functions as a method of domain adaptation. A range of in-domain and out-of-domain datasets are used for experiments demonstrating large relative gains of up to 32.2%. Interestingly, our method showed larger gains than the typical self-training setup that utilises separate adaptation data.	翻訳日:2024-06-22 00:37:55 公開日:2024-06-17
# ISにおけるセキュリティと社会工学 --その概要と現状 Security in IS and social engineering -- an overview and state of the art ( http://arxiv.org/abs/2406.12938v1 ) ライセンス: Link先を確認	Florence Sèdes,	(参考訳) 情報技術に関連する大きな変革は、組織とそのアクターのビジネスプロセスをサポートする情報システム(IS)に影響を与える。センシティブで大規模で異質なデータを含む複雑な環境での展開は、法的、社会的、経済的影響を伴うリスクを生み出す。移行と開放性のこの文脈は、これらのISのセキュリティを組織の懸念の中心にしている。すべてのプロセスのデジタル化とIoTデバイス(モノのインターネット)のオープンは、サイバー犯罪という新たな犯罪の出現を促している。このジェネリックな用語は、多くの悪意ある行為をカバーしており、その大半は現在、社会的エンジニアリング戦略を使って実行されている。このような攻撃の悪意は、ユーザーがサイバー攻撃のファシリテーターになるという事実から、サイバーセキュリティの「弱リンク」と認識される点に起因しており、展開方針が不十分であるため、上流のステップについて考える必要がある。 Major transformations related to information technologies affect Information Systems (IS) that support the business processes of organizations and their actors. Deployment in a complex environment involving sensitive, massive and heterogeneous data generates risks with legal, social and financial impacts. This context of transition and openness makes the security of these IS central to the concerns of organizations. The digitization of all processes and the opening to IoT devices (Internet of Things) has fostered the emergence of a new form of crime, i.e. cybercrime. This generic term covers a number of malicious acts, the majority of which are now perpetrated using social engineering strategies, a phenomenon enabling a combined exploitation of "human" vulnerabilities and digital tools. The maliciousness of such attacks lies in the fact that they turn users into facilitators of cyber-attacks, to the point of being perceived as the "weak link" of cybersecurity. As deployment policies prove insufficient, it is necessary to think about upstream steps: knowing how to anticipate, identifying weak signals and outliers, detect early and react quickly to computer crime are therefore priority issues requiring a prevention and cooperation approach. In this overview, we propose a synthesis of literature and professional practices on this subject.	翻訳日:2024-06-22 00:37:55 公開日:2024-06-17
# 超伝導回路における多体量子相関の測定 Measurement of Many-Body Quantum Correlations in Superconducting Circuits ( http://arxiv.org/abs/2406.12939v1 ) ライセンス: Link先を確認	Kamal Sharma, Wade DeGottardi,	(参考訳) 近年の超伝導回路技術の進歩により、大型でカスタマイズ可能な回路の製造が日常的に行われている。これにより、量子情報以外の分野、特に量子シミュレーターとしての利用に応用された。この取り組みの重要な課題は、これらの回路によって実現された量子状態の同定である。本稿では,アナログ量子シミュレータにおいて多体相関を読み取ることができるプローブ回路を提案する。多光子状態のために設計された我々の測定方法は、ジョセフソン接合の非線形性を利用して超伝導相作用素の2点相関関数(および潜在的に高次相関関数)を測定する。我々は、量子不純物を持つLCラダーの文脈で、この設計の能力を実証する。提案したプローブは、スクイーズのような本質的に量子相関の測定を可能にし、超伝導回路を用いてアナログ量子シミュレーションの範囲を大きく拡大する可能性がある。 Recent advances in superconducting circuit technology have made the fabrication of large, customizable circuits routine. This has led to their application to areas beyond quantum information and, in particular, to their use as quantum simulators. A key challenge in this effort has been the identification of the quantum states realized by these circuits. Here, we propose a probe circuit capable of reading out many-body correlations in an analog quantum simulator. Our measurement scheme, designed for many-photon states, exploits the non-linearity of the Josephson junction to measure two-point (and potentially higher-order) correlation functions of the superconducting phase operator. We demonstrate the capabilities of this design in the context of an LC-ladder with a quantum impurity. The proposed probe allows for the measurement of inherently quantum correlations, such as squeezing, and has the potential to significantly expand the scope of analog quantum simulations using superconducting circuits.	翻訳日:2024-06-22 00:37:55 公開日:2024-06-17
# PEPit: Pythonにおける一階最適化手法のコンピュータ支援最悪ケース解析 PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python ( http://arxiv.org/abs/2201.04040v2 ) ライセンス: Link先を確認	Baptiste Goujaud, Céline Moucer, François Glineur, Julien Hendrickx, Adrien Taylor, Aymeric Dieuleveut,	(参考訳) PEPitはPythonパッケージで、勾配、プロジェクション、近さ、線形最適化オラクルを含む多くの一階最適化メソッドの最悪のケース分析へのアクセスを、近似やブレグマン変種とともに単純化することを目的としている。簡単に言えば、PEPitはコンピュータ支援による一階最適化手法の最悪のケース解析を可能にするパッケージである。鍵となる考え方は、最悪のケース分析(しばしば性能推定問題(PEP)と呼ばれる)を半確定プログラム(SDP)として実行し、数値的に解くことである。そのため、パッケージのユーザは、実装するのとほとんど同じくらいに、一階のメソッドを書くことしか要求されない。その後、パッケージはSDPモデリング部品の処理を行い、最悪のケース解析は標準解法を介して数値的に行われる。 PEPit is a Python package aiming at simplifying the access to worst-case analyses of a large family of first-order optimization methods possibly involving gradient, projection, proximal, or linear optimization oracles, along with their approximate, or Bregman variants. In short, PEPit is a package enabling computer-assisted worst-case analyses of first-order optimization methods. The key underlying idea is to cast the problem of performing a worst-case analysis, often referred to as a performance estimation problem (PEP), as a semidefinite program (SDP) which can be solved numerically. To do that, the package users are only required to write first-order methods nearly as they would have implemented them. The package then takes care of the SDP modeling parts, and the worst-case analysis is performed numerically via a standard solver.	翻訳日:2024-06-20 05:50:01 公開日:2024-06-17
# 高次元量子系に対する反線型超作用素、量子幾何不変および反線型対称性 Antilinear superoperator, quantum geometric invariance, and antilinear symmetry for higher-dimensional quantum systems ( http://arxiv.org/abs/2202.10989v4 ) ライセンス: Link先を確認	Lu Wei, Zhian Jia, Dagomir Kaszlikowski, Sheng Tan,	(参考訳) 本稿では,特に量子幾何学的不変性,絡み合い分布,対称性に着目したオープン量子系の研究における反線型超作用素とその応用について系統的研究を行う。本稿では,反線形量子チャネル,反線形ユニタリスーパー演算子,反ユニタリスーパー演算子,一般化された$\Theta$-共役など,いくつかの重要な反線形スーパー演算子のクラスについて検討する。ブロッホ表現を用いて、高次元量子系の量子幾何学変換を体系的に研究する。異なる一般化された$\Theta$-共役を選択すれば、ユークリッド計量やミンコフスキー計量を含むブロッホ時空ベクトルの空間に対する様々な測度が得られる。これらの幾何学的構造を利用して、量子幾何学的不変性に制約された多部構造上の絡み合い分布を解析する。オープン量子系の強および弱反線形超作用素対称性についても論じる。さらに、クラマースの優越性と保存された量についても詳細に調べる。 We present a systematic investigation of antilinear superoperators and their applications in studying open quantum systems, particularly focusing on quantum geometric invariance, entanglement distribution, and symmetry. We study several crucial classes of antilinear superoperators, including antilinear quantum channels, antilinearly unital superoperators, antiunitary superoperators, and generalized $\Theta$-conjugation. Using the Bloch representation, we present a systematic investigation of quantum geometric transformations in higher-dimensional quantum systems. By choosing different generalized $\Theta$-conjugations, we obtain various metrics for the space of Bloch space-time vectors, including the Euclidean and Minkowskian metrics. Utilizing these geometric structures, we then investigate the entanglement distribution over a multipartite system constrained by quantum geometric invariance. The strong and weak antilinear superoperator symmetries of the open quantum system are also discussed. Additionally, Kramers' degeneracy and conserved quantities are examined in detail.	翻訳日:2024-06-20 05:43:26 公開日:2024-06-17
# エンタングルメント測定に基づくリアプノフ制御による最大絡み合い状態の生成 Generation of Maximally Entangled States by Lyapunov Control Based on Entanglement Measure ( http://arxiv.org/abs/2203.00182v3 ) ライセンス: Link先を確認	Yun-Yan Lee, Daoyi Dong, Ciann-Dong Yang,	(参考訳) 最大絡み合い状態(MES)は、量子情報処理において高い価値を持つ。量子制御において、MESの生成は一般に、あらかじめ定義されたMESを対象とする状態伝達問題として扱われる。しかし、このアプローチはMES構造を事前に決定する必要があるため、制限されている。本稿では, 量子状態間の距離を使わずに, リアプノフ関数を構成するために, 量子エンタングルメント測度に依存する改良型量子リアプノフ制御手法を提案する。この戦略は、その構造が事前に知られているかどうかに関わらず、単一の制御方式を用いて、任意のMESの作成を可能にする。提案手法は, 絡み合い対策をスカラーとして対象とするため, 絡み合ったサブシステムの数に影響を受けない。当初はバイパーティイト純状態に適用されたが、この方法はベル状態とその等価値を生成する能力を示している。その後の混合状態と多粒子系への応用は、この技術が不特定構造を持つMESを生成できることを示している。 Maximally entangled states (MES) are highly valued in quantum information processing. In quantum control, the creation of MES is typically treated as a state transfer problem with a predefined MES as the target. However, this approach is limited by the requirement to predetermine the MES structure. This paper introduces an improved quantum Lyapunov control approach that relies on the quantum entanglement measure to construct the Lyapunov function, instead of using the distance between quantum states. This strategy enables the preparation of any MES, regardless of whether its structure is known beforehand, using a single control scheme. The proposed entanglement control technique is unaffected by the number of entangled subsystems since it targets the entanglement measure as a scalar. Initially applied to bipartite pure states, this method demonstrates its capability to generate Bell states and their equivalents. Subsequent applications to bipartite mixed states and multipartite systems illustrate that the technique can produce MES with unspecified structures.	翻訳日:2024-06-20 05:43:26 公開日:2024-06-17
# 軽量信頼ハードウェアを用いた高能率プライバシ保護機械学習 Efficient Privacy-Preserving Machine Learning with Lightweight Trusted Hardware ( http://arxiv.org/abs/2210.10133v4 ) ライセンス: Link先を確認	Pengzhi Huang, Thang Hoang, Yueying Li, Elaine Shi, G. Edward Suh,	(参考訳) 本稿では,小型の専用セキュリティプロセッサによるセキュアな機械学習推論プラットフォームを提案する。このプラットフォームは,今日の高性能プロセッサに組み込まれたTEEと比較して,保護とデプロイが容易になる。私たちのプラットフォームは、最先端の3つの大きな利点を提供します。 i) Apple Enclaveプロセッサに似たSoCのTrusted Platform Module(TPM)やオンチップセキュリティサブシステムのような個別のセキュリティチップに匹敵する小さなセキュリティプロセッサのみで,最先端の分散プライバシ保存機械学習(PPML)プロトコルと比較して,大幅なパフォーマンス向上を実現している。 WAN/GPUでは、Falcon (PoPETs'21) やAriaNN (PoPETs'22) よりも4X-63倍高速で、通信効率は3.8X-12倍である。悪意のある設定でさらに高いパフォーマンス向上を実現しています。 (二)本プラットフォームは、本質的な過半数の前提のもと、悪意のある敵に対する攻撃を中止してセキュリティを保証する。 (iii)我々の技術は、TEEにおけるセキュアメモリのサイズに制限されず、ResNet18やTransformerのような高容量な現代のニューラルネットワークをサポートすることができる。 PPMLにおける高性能TEEの使用について以前の研究が検討されているが、この研究は、非常に限られた性能の小さなセキュアなハードウェアであっても、プロトコルが軽量なハードウェア向けに慎重に設計可能であれば、分散PPMLプロトコルの大幅な高速化に活用できることを初めて示すものである。 In this paper, we propose a new secure machine learning inference platform assisted by a small dedicated security processor, which will be easier to protect and deploy compared to today's TEEs integrated into high-performance processors. Our platform provides three main advantages over the state-of-the-art: (i) We achieve significant performance improvements compared to state-of-the-art distributed Privacy-Preserving Machine Learning (PPML) protocols, with only a small security processor that is comparable to a discrete security chip such as the Trusted Platform Module (TPM) or on-chip security subsystems in SoCs similar to the Apple enclave processor. In the semi-honest setting with WAN/GPU, our scheme is 4X-63X faster than Falcon (PoPETs'21) and AriaNN (PoPETs'22) and 3.8X-12X more communication efficient. We achieve even higher performance improvements in the malicious setting. (ii) Our platform guarantees security with abort against malicious adversaries under honest majority assumption. 
(iii) Our technique is not limited by the size of secure memory in a TEE and can support high-capacity modern neural networks like ResNet18 and Transformer. While previous work investigated the use of high-performance TEEs in PPML, this work represents the first to show that even tiny secure hardware with very limited performance can be leveraged to significantly speed up distributed PPML protocols if the protocol can be carefully designed for lightweight trusted hardware.	翻訳日:2024-06-20 05:43:26 公開日:2024-06-17
# マルチエージェント強化学習システムにおける直接罰が協調の創発に及ぼす影響の検討 Investigating the Impact of Direct Punishment on the Emergence of Cooperation in Multi-Agent Reinforcement Learning Systems ( http://arxiv.org/abs/2301.08278v3 ) ライセンス: Link先を確認	Nayana Dasgupta, Mirco Musolesi,	(参考訳) 協力の問題を解決することは、機能的社会の構築と維持に根本的に重要である。協力の問題は、忙しい道路交差点の航行から条約交渉まで、人間の社会の中で一様である。社会全体でAIの利用が広まるにつれて、これらの複雑な協調ジレンマをナビゲートできる社会的にインテリジェントなエージェントの必要性がますます顕在化しつつある。直接罰は、人間と非人間の両方の協力の出現を促進することが示されている、ユビキタスな社会メカニズムである。自然界では、直接罰はパートナーの選択と評判と強く結びつき、第三者の罰と共に用いられる。これらのメカニズム間の相互作用は、集団内の協力の出現を促進する可能性がある。しかし,MARL(Multi-Agent Reinforcement Learning, マルチエージェント強化学習, MARL)集団から生まれる学習のダイナミクスや成果を,これらのメカニズムを組み合わせて評価する以前の研究は行われていない。この論文はこのギャップに対処する。直接罰、第三者罰、パートナー選択、評判に関連する行動と学習のダイナミクスを包括的に分析し、評価する。最後に,これらのメカニズムが協調型AIシステムの設計に与える影響について論じる。 Solving the problem of cooperation is fundamentally important for the creation and maintenance of functional societies. Problems of cooperation are omnipresent within human society, with examples ranging from navigating busy road junctions to negotiating treaties. As the use of AI becomes more pervasive throughout society, the need for socially intelligent agents capable of navigating these complex cooperative dilemmas is becoming increasingly evident. Direct punishment is a ubiquitous social mechanism that has been shown to foster the emergence of cooperation in both humans and non-humans. In the natural world, direct punishment is often strongly coupled with partner selection and reputation and used in conjunction with third-party punishment. The interactions between these mechanisms could potentially enhance the emergence of cooperation within populations. However, no previous work has evaluated the learning dynamics and outcomes emerging from Multi-Agent Reinforcement Learning (MARL) populations that combine these mechanisms. This paper addresses this gap. 
It presents a comprehensive analysis and evaluation of the behaviors and learning dynamics associated with direct punishment, third-party punishment, partner selection, and reputation. Finally, we discuss the implications of using these mechanisms on the design of cooperative AI systems.	翻訳日:2024-06-20 05:43:26 公開日:2024-06-17
# RLにおけるマルチモーダル表現の再構成とコントラスト法の組み合わせ Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL ( http://arxiv.org/abs/2302.05342v3 ) ライセンス: Link先を確認	Philipp Becker, Sebastian Mossburger, Fabian Otto, Gerhard Neumann,	(参考訳) 再構成や対照的な損失を用いた自己教師型表現の学習は、画像ベースおよびマルチモーダル強化学習(RL)の性能とサンプルの複雑さを向上させる。ここでは、異なる自己教師付き損失関数は、基礎となるセンサのモジュラリティの情報密度によって異なる利点と制限を有する。レコンストラクションは強力な学習信号を提供するが、気晴らしや刺激的な情報に影響を受けやすい。対照的なアプローチはそれらを無視することができるが、関連するすべての詳細を捕捉できず、表現の崩壊につながる可能性がある。マルチモーダルRLの場合、信号の歪み量に基づいて異なるモダリティを別々に扱う必要があることが示唆される。コントラスト的再構成集約表現学習(CoRAL)を提案する。このフレームワークは,各センサのモダリティに対して,最も適切な自己監督的損失を選択でき,表現が関連する側面により焦点を合わせることができる。我々はCoralの幅広いタスクに対するメリットを、注意散らしや閉塞を含むイメージ、新しい移動スイート、視覚的に現実的な注意散らしを伴う困難な操作スイートで評価する。コントラストと再構成に基づく損失を組み合わせたマルチモーダル表現の学習は,より簡単な表現学習アプローチや近年のベースラインに到達できないタスクを著しく改善し,課題を解決できることを示す。 Learning self-supervised representations using reconstruction or contrastive losses improves performance and sample complexity of image-based and multimodal reinforcement learning (RL). Here, different self-supervised loss functions have distinct advantages and limitations depending on the information density of the underlying sensor modality. Reconstruction provides strong learning signals but is susceptible to distractions and spurious information. While contrastive approaches can ignore those, they may fail to capture all relevant details and can lead to representation collapse. For multimodal RL, this suggests that different modalities should be treated differently based on the amount of distractions in the signal. We propose Contrastive Reconstructive Aggregated representation Learning (CoRAL), a unified framework enabling us to choose the most appropriate self-supervised loss for each sensor modality and allowing the representation to better focus on relevant aspects. 
We evaluate CoRAL's benefits on a wide range of tasks with images containing distractions or occlusions, a new locomotion suite, and a challenging manipulation suite with visually realistic distractions. Our results show that learning a multimodal representation by combining contrastive and reconstruction-based losses can significantly improve performance and solve tasks that are out of reach for more naive representation learning approaches and other recent baselines.	翻訳日:2024-06-20 05:43:26 公開日:2024-06-17
# ファイバーベース量子ネットワークにおける非対称ノード配置 Asymmetric node placement in fiber-based quantum networks ( http://arxiv.org/abs/2305.09635v3 ) ライセンス: Link先を確認	Guus Avis, Robert Knegjens, Anders S. Sørensen, Stephanie Wehner,	(参考訳) 既存のインフラによって課される制限は、将来のファイバーベースの量子ネットワークのノード間でさらに間隔を縮めるのを難しくする。そこで本研究では,非対称ノード配置の負の効果を,チェーン内の処理ノード量子リピータだけでなく,有意な絡み合い発生に必要な中間点局の配置を別々に検討することによって検討する。中点駅では、1つの絡み合う試みを行うのに必要な時間、そのような試みの成功確率、そして絡み合った状態の忠実さに非対称性が与える影響について述べる。これは、光子の不識別性に対する色分散の影響を説明することを含む。量子リピータチェーンの場合、リピータノード間の不均一な間隔がボトルネックの原因となるかを数値的に調べ、待ち時間と時間状態の両方をノイズの多い量子メモリに格納する。一つの絡み合い試行に要する時間は、中間点の非対称性と直線的に増加するが、有意な絡み合い発生の成功確率と忠実度、繰り返し鎖の分布時間と誤り率はすべて、非対称性の量に関して第1の導関数を消滅させる。これは、少量の非対称性に対する量子ネットワーク性能のレジリエンスを示唆している。 Restrictions imposed by existing infrastructure can make it hard to ensure an even spacing between the nodes of future fiber-based quantum networks. We here investigate the negative effects of asymmetric node placement by considering separately the placement of midpoint stations required for heralded entanglement generation, as well as of processing-node quantum repeaters in a chain. For midpoint stations, we describe the effect asymmetry has on the time required to perform one entangling attempt, the success probability of such attempts, and the fidelity of the entangled states created. This includes accounting for the effects of chromatic dispersion on photon indistinguishability. For quantum-repeater chains we numerically investigate how uneven spacing between repeater nodes leads to bottlenecks, thereby increasing both the waiting time and the time states are stored in noisy quantum memory. We find that while the time required to perform one entangling attempt may increase linearly with the midpoint's asymmetry, the success probability and fidelity of heralded entanglement generation and the distribution time and error rate for repeater chains all have vanishing first derivatives with respect to the amount of asymmetry. 
This suggests resilience of quantum-network performance against small amounts of asymmetry.	翻訳日:2024-06-20 05:33:23 公開日:2024-06-17
# DU-Shapley: 効率的なデータセット評価のためのShapley Value Proxy DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation ( http://arxiv.org/abs/2306.02071v2 ) ライセンス: Link先を確認	Felipe Garrido-Lucero, Benjamin Heymann, Maxime Vono, Patrick Loiseau, Vianney Perchet,	(参考訳) 我々は、データセットのバリュエーションの問題、すなわち、インクリメンタルゲインの定量化の問題を、個々のデータセットを他のデータセットに集約する、機械学習タスクの関連する事前定義されたユーティリティに考慮する。 Shapleyの値は、その正式な公理的正当化によってデータセットのバリュエーションを実行する自然なツールであり、モンテカルロ統合と組み合わせて計算的トラクタビリティの課題を克服することができる。しかし、そのような一般的な近似法は、場合によっては高価である。本稿では、データセット評価問題の構造に関する知識を活用し、より効率的なシェープ値推定器を考案する。そこで本研究では,離散一様シャプリーとよばれる新しい近似法を提案する。我々は、漸近的および非漸近的理論的保証を通じて提案フレームワークの妥当性を正当化し、その利点を広範な数値実験を通して説明する。 We consider the dataset valuation problem, that is, the problem of quantifying the incremental gain, to some relevant pre-defined utility of a machine learning task, of aggregating an individual dataset to others. The Shapley value is a natural tool to perform dataset valuation due to its formal axiomatic justification, which can be combined with Monte Carlo integration to overcome the computational tractability challenges. Such generic approximation methods, however, remain expensive in some cases. In this paper, we exploit the knowledge about the structure of the dataset valuation problem to devise more efficient Shapley value estimators. We propose a novel approximation, referred to as discrete uniform Shapley, which is expressed as an expectation under a discrete uniform distribution with support of reasonable size. We justify the relevancy of the proposed framework via asymptotic and non-asymptotic theoretical guarantees and illustrate its benefits via an extensive set of numerical experiments.	翻訳日:2024-06-20 05:23:38 公開日:2024-06-17
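As a rough illustration of the generic Monte Carlo baseline that the DU-Shapley abstract above refers to (not the DU-Shapley proxy itself, whose exact form is not given in this listing), the following sketch estimates dataset Shapley values by sampling random permutations. The function names, the `additive_utility` function, and the dataset sizes are all hypothetical, chosen only so that the result is easy to check by hand.

```python
import random
from typing import Callable, Dict, FrozenSet, List


def monte_carlo_shapley(
    players: List[str],
    utility: Callable[[FrozenSet[str]], float],
    num_permutations: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = players[:]
        rng.shuffle(order)
        coalition: FrozenSet[str] = frozenset()
        previous = utility(coalition)
        for p in order:
            # Add dataset p to the coalition and record its marginal gain.
            coalition = coalition | {p}
            current = utility(coalition)
            totals[p] += current - previous
            previous = current
    return {p: totals[p] / num_permutations for p in players}


# Hypothetical dataset sizes (illustrative only, not taken from the paper).
SIZES = {"A": 100.0, "B": 300.0, "C": 600.0}


def additive_utility(coalition: FrozenSet[str]) -> float:
    """Toy utility: total number of samples in the aggregated datasets."""
    return sum(SIZES[name] for name in coalition)


estimates = monte_carlo_shapley(list(SIZES), additive_utility)
print(estimates)
```

With an additive utility every marginal gain equals the dataset's own size, so the estimate coincides with the exact Shapley value; a realistic utility (for example, validation accuracy after training on the coalition) would make each `utility` call expensive, which is the cost that cheaper proxies aim to avoid.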
# 状態ごとの制約付き方策最適化 State-wise Constrained Policy Optimization ( http://arxiv.org/abs/2306.12594v3 ) ライセンス: Link先を確認	Weiye Zhao, Rui Chen, Yifan Sun, Tianhao Wei, Changliu Liu,	(参考訳) 強化学習(Reinforcement Learning, RL)アルゴリズムはシミュレーション環境では非常に成功したが、実世界の問題への適用には大きな課題が伴い、安全性が大きな懸念事項となっている。特に、自律運転やロボット操作など、多くの困難なタスクにおいて、状態ごとの制約の実施が不可欠である。しかし、CMDP(Constrained Markov Decision Process)の枠組みに基づく既存の安全なRLアルゴリズムは、状態ごとの制約を考慮していない。このギャップに対処するため,状態ごとの制約付き強化学習のための最初の汎用方策探索アルゴリズムである状態ごとの制約付き方策最適化(SCPO)を提案する。 SCPOは、期待値の意味で状態ごとの制約充足を保証する。特に,最大マルコフ決定過程の枠組みを導入し,SCPOの下で最悪ケースの安全違反が有界であることを証明した。本稿では,エージェントが様々な状態ごとの安全制約を満たす必要がある広範なロボット移動タスクにおいて,ニューラルネットワーク方策の訓練における本手法の有効性を実証する。その結果、SCPOは既存の手法を大幅に上回り、高次元ロボティクスタスクにおける状態ごとの制約を処理できることが示された。 Reinforcement Learning (RL) algorithms have shown tremendous success in simulation environments, but their application to real-world problems faces significant challenges, with safety being a major concern. In particular, enforcing state-wise constraints is essential for many challenging tasks such as autonomous driving and robot manipulation. However, existing safe RL algorithms under the framework of Constrained Markov Decision Process (CMDP) do not consider state-wise constraints. To address this gap, we propose State-wise Constrained Policy Optimization (SCPO), the first general-purpose policy search algorithm for state-wise constrained reinforcement learning. SCPO provides guarantees for state-wise constraint satisfaction in expectation. In particular, we introduce the framework of Maximum Markov Decision Process, and prove that the worst-case safety violation is bounded under SCPO. We demonstrate the effectiveness of our approach on training neural network policies for extensive robot locomotion tasks, where the agent must satisfy a variety of state-wise safety constraints. Our results show that SCPO significantly outperforms existing methods and can handle state-wise constraints in high-dimensional robotics tasks.	翻訳日:2024-06-20 05:23:38 公開日:2024-06-17
# 開示制御プロキシによる平衡フィルタ Balanced Filtering via Disclosure-Controlled Proxies ( http://arxiv.org/abs/2306.15083v3 ) ライセンス: Link先を確認	Siqi Deng, Emily Diana, Michael Kearns, Aaron Roth,	(参考訳) 本研究では,グループメンバーシップが利用できない場合や,デプロイ時に使用が禁止された場合,センシティブなグループに対してバランスの取れたコホートやセットを収集する問題について検討する。具体的には,我々の展開時収集機構は,ベースレートだけで確認できるよりも,個々のサンプルのグループメンバシップについて顕著に明らかにしていない。そこで本研究では、ラベル付きデータの小さなセットを使って、後にこのフィルタリングや選択タスクに使用できるプロキシ関数を訓練できる学習者について検討する。次に、プロキシ関数の範囲をサンプリング確率に関連付け、新しい例として、プロキシ関数を使用してそれを分類し、プロキシ分類に対応する確率で選択する。重要なことは、プロキシ分類は、人口ベース率のみと比較して、個々のサンプルのセンシティブなグループメンバーシップに関する情報(すなわち、開示のレベルを制御すべき)を著しく多く明らかにし、そのようなプロキシをサンプルおよびオラクル効率のよい方法で見つけることができることを示す必要がある。最後に,提案アルゴリズムを実験的に評価し,その一般化特性を解析する。 We study the problem of collecting a cohort or set that is balanced with respect to sensitive groups when group membership is unavailable or prohibited from use at deployment time. Specifically, our deployment-time collection mechanism does not reveal significantly more about the group membership of any individual sample than can be ascertained from base rates alone. To do this, we study a learner that can use a small set of labeled data to train a proxy function that can later be used for this filtering or selection task. We then associate the range of the proxy function with sampling probabilities; given a new example, we classify it using our proxy function and then select it with probability corresponding to its proxy classification. Importantly, we require that the proxy classification does not reveal significantly more information about the sensitive group membership of any individual example compared to population base rates alone (i.e., the level of disclosure should be controlled) and show that we can find such a proxy in a sample- and oracle-efficient manner. Finally, we experimentally evaluate our algorithm and analyze its generalization properties.	翻訳日:2024-06-20 05:23:38 公開日:2024-06-17
# 臨界モメンタを用いた記憶強化アダムの探索促進 Promoting Exploration in Memory-Augmented Adam using Critical Momenta ( http://arxiv.org/abs/2307.09638v2 ) ライセンス: Link先を確認	Pranshu Malviya, Gonçalo Mordido, Aristide Baratin, Reza Babanezhad Harikandeh, Jerry Huang, Simon Lacoste-Julien, Razvan Pascanu, Sarath Chandar,	(参考訳) 適応的な勾配に基づくオプティマイザ、特にAdamは、大規模なディープラーニングモデルのトレーニングにおいて、ハイパーパラメータ設定に対する高速な収束と堅牢性を提供し、その足跡を残している。しかし、彼らはしばしば一般化に苦しむが、それはロスランドスケープの鋭いミニマに収束する傾向があるためである。これを解決するために,トレーニング中に臨界運動量項のバッファを組み込むことで,フラットなミニマへの探索を促進するAdamの新しいメモリ拡張版を提案する。このバッファにより、オプティマイザは狭いミニマを越えてオーバーシュートし、探索を促進する。簡単な設定で包括的解析を行うことで、より平坦なミニマへの探索とバイアスを高めるためのアプローチの有効性を示す。我々は、ImageNetとCIFAR10/100の画像分類、Penn Treebankの言語モデリング、TinyImageNetと5-datasetのオンライン学習タスクにおいて、モデル性能を向上させることを実証的に実証した。私たちのコードは https://github.com/chandar-lab/CMOptimizer で利用可能です。 Adaptive gradient-based optimizers, notably Adam, have left their mark in training large-scale deep learning models, offering fast convergence and robustness to hyperparameter settings. However, they often struggle with generalization, attributed to their tendency to converge to sharp minima in the loss landscape. To address this, we propose a new memory-augmented version of Adam that encourages exploration towards flatter minima by incorporating a buffer of critical momentum terms during training. This buffer prompts the optimizer to overshoot beyond narrow minima, promoting exploration. Through comprehensive analysis in simple settings, we illustrate the efficacy of our approach in increasing exploration and bias towards flatter minima. We empirically demonstrate that it can improve model performance for image classification on ImageNet and CIFAR10/100, language modelling on Penn Treebank, and online learning tasks on TinyImageNet and 5-dataset. Our code is available at https://github.com/chandar-lab/CMOptimizer.	翻訳日:2024-06-20 05:13:54 公開日:2024-06-17
# 検証順序決定のためのマルチモーダル事前学習モデル:計画・接地・知覚 Multimodal Pretrained Models for Verifiable Sequential Decision-Making: Planning, Grounding, and Perception ( http://arxiv.org/abs/2308.05295v2 ) ライセンス: Link先を確認	Yunhao Yang, Cyrus Neary, Ufuk Topcu,	(参考訳) 最近開発された事前学習モデルは、テキストや画像など、複数のモードで表現された豊かな世界知識を符号化することができる。しかし、これらのモデルの出力は、シーケンシャルな意思決定タスクを解決するアルゴリズムに統合することはできない。本研究では,事前学習したモデルから得られた知識を利用して,逐次意思決定タスクのための制御器の構築と検証を行い,これらの制御器を視覚的観察によりタスク環境に接地するアルゴリズムを開発した。特に、アルゴリズムは、ユーザーが提供するテキストベースのタスク記述で事前訓練されたモデルをクエリし、モデルの出力を使用して、モデルのタスク関連知識を符号化するオートマトンベースのコントローラを構築する。コントローラにエンコードされた知識が、環境やユーザが提供する仕様に関する抽象的な情報を含む、他の独立して利用可能な知識と一致しているかどうかの正式な検証を可能にする。次に、事前訓練されたモデルのビジョンと言語能力を利用して、タスク環境からの観察とコントローラからのテキストベースの制御ロジック(アクションをトリガーするアクションや条件など)をリンクする。本稿では,ユーザが提供する仕様を知覚的不確実性の下で満足するかどうか,確率的保証を提供する機構を提案する。このアルゴリズムは,日常生活やロボット操作など,現実的なタスクのスイートを通じて,オートマトンベースのコントローラを構築し,検証し,構築する能力を示す。 Recently developed pretrained models can encode rich world knowledge expressed in multiple modalities, such as text and images. However, the outputs of these models cannot be integrated into algorithms to solve sequential decision-making tasks. We develop an algorithm that utilizes the knowledge from pretrained models to construct and verify controllers for sequential decision-making tasks, and to ground these controllers to task environments through visual observations with formal guarantees. In particular, the algorithm queries a pretrained model with a user-provided, text-based task description and uses the model's output to construct an automaton-based controller that encodes the model's task-relevant knowledge. It allows formal verification of whether the knowledge encoded in the controller is consistent with other independently available knowledge, which may include abstract information on the environment or user-provided specifications. 
Next, the algorithm leverages the vision and language capabilities of pretrained models to link the observations from the task environment to the text-based control logic from the controller (e.g., actions and conditions that trigger the actions). We propose a mechanism to provide probabilistic guarantees on whether the controller satisfies the user-provided specifications under perceptual uncertainties. We demonstrate the algorithm's ability to construct, verify, and ground automaton-based controllers through a suite of real-world tasks, including daily life and robot manipulation tasks.	翻訳日:2024-06-20 05:13:54 公開日:2024-06-17
# 惑星間ナビゲーションのための自律型視覚ベースアルゴリズム An Autonomous Vision-Based Algorithm for Interplanetary Navigation ( http://arxiv.org/abs/2309.09590v3 ) ライセンス: Link先を確認	Eleonora Andreis, Paolo Panicucci, Francesco Topputo,	(参考訳) 深宇宙探査機の急増により、標準のラジオメトリック・トラッキングでそれらをナビゲートすることは不可能になった。自走型惑星間衛星はこの問題の解決策である。本研究では、軌道決定法と、自律プラットフォーム間の惑星間移動に適した画像処理パイプラインを組み合わせることで、完全な視覚に基づくナビゲーションアルゴリズムを構築する。アルゴリズムの計算効率を高めるために、深宇宙画像から抽出された惑星の位置によって供給される状態推定器として、非次元拡張カルマンフィルタが選択される。最適な1組の惑星を追尾するための最適な戦略を適用することにより、推定精度の向上を行う。さらに,光収差と光時間効果を1次近似した新しい深宇宙航法解析モデルを開発した。アルゴリズムの性能は高忠実な地球上でテストされ、火星間移動が深宇宙航法に適用可能であることを示す。 The surge of deep-space probes makes it unsustainable to navigate them with standard radiometric tracking. Self-driving interplanetary satellites represent a solution to this problem. In this work, a full vision-based navigation algorithm is built by combining an orbit determination method with an image processing pipeline suitable for interplanetary transfers of autonomous platforms. To increase the computational efficiency of the algorithm, a non-dimensional extended Kalman filter is selected as state estimator, fed by the positions of the planets extracted from deep-space images. An enhancement of the estimation accuracy is performed by applying an optimal strategy to select the best pair of planets to track. Moreover, a novel analytical measurement model for deep-space navigation is developed providing a first-order approximation of the light-aberration and light-time effects. Algorithm performance is tested on a high-fidelity, Earth--Mars interplanetary transfer, showing the algorithm applicability for deep-space navigation.	翻訳日:2024-06-20 05:04:09 公開日:2024-06-17
# 奥行き雑音に対するロバスト6DoF推定と移動データに対する包括的評価 Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset ( http://arxiv.org/abs/2309.13570v4 ) ライセンス: Link先を確認	Zixun Huang, Keling Yao, Seth Z. Zhao, Chuanyu Pan, Chenfeng Xu, Kathy Zhuang, Tianjian Xu, Weiyu Feng, Allen Y. Yang,	(参考訳) モバイルデバイスによるロバスト6DoFのポーズ推定は、ロボティクス、拡張現実、デジタルツインローカライゼーションの応用の基礎となっている。本稿では,既存のRGBDベースの6DoFポーズ推定手法の各種奥行きセンサノイズに対するロバスト性について検討する。既存の6DoFポーズ推定手法では,深度測定の不正確さによる性能差が著しいことが強調された。このロバスト性問題に対して,DTTDNetと呼ばれる簡易かつ効果的なトランスフォーマーベースの6DoFポーズ推定手法を提案し,新しい幾何学的特徴フィルタリングモジュールとトレーニング用チャンファー距離損失を特徴とする。さらに、ロバストな6DoFポーズ推定の分野を前進させ、新しいデータセット、Digital Twin Tracking Dataset Mobile (DTTD-Mobile)を導入しました。大規模な実験により、DTTDNetは、DTTD-MobileのADD指標において、最先端の手法を少なくとも4.32ポイント、最大60.74ポイント上回ることを示した。さらに重要なことは,本手法は様々なレベルの測定ノイズに対して優れたロバスト性を示し,ノイズ測定に対するロバスト性に対する新しいベンチマークを設定することである。コードとデータセットは、https://github.com/augcog/DTTD2で公開されている。 Robust 6DoF pose estimation with mobile devices is the foundation for applications in robotics, augmented reality, and digital twin localization. In this paper, we extensively investigate the robustness of existing RGBD-based 6DoF pose estimation methods against varying levels of depth sensor noise. We highlight that existing 6DoF pose estimation methods suffer significant performance discrepancies due to depth measurement inaccuracies. In response to the robustness issue, we present a simple and effective transformer-based 6DoF pose estimation approach called DTTDNet, featuring a novel geometric feature filtering module and a Chamfer distance loss for training. Moreover, we advance the field of robust 6DoF pose estimation and introduce a new dataset -- Digital Twin Tracking Dataset Mobile (DTTD-Mobile), tailored for digital twin object tracking with noisy depth data from the mobile RGBD sensor suite of the Apple iPhone 14 Pro. Extensive experiments demonstrate that DTTDNet significantly outperforms state-of-the-art methods by at least 4.32 and up to 60.74 points in ADD metrics on DTTD-Mobile. 
More importantly, our approach exhibits superior robustness to varying levels of measurement noise, setting a new benchmark for the robustness to noise measurements. Code and dataset are made publicly available at: https://github.com/augcog/DTTD2	翻訳日:2024-06-20 05:04:09 公開日:2024-06-17
# ジェネレーティブエッシャーメッシュ Generative Escher Meshes ( http://arxiv.org/abs/2309.14564v4 ) ライセンス: Link先を確認	Noam Aigerman, Thibault Groueix,	(参考訳) 本稿では, 床, モザイク, セラミック, M.C.エッシャーの作品など, 完全反復的, 周期的, タイル可能な2次元画像の完全自動生成法を提案する。タイルを組むとシームレスな正方形テクスチャ画像とは対照的に,本手法では,同じオブジェクトのコピーを繰り返すだけで構成される正方形でないタイリングを生成する。これは、2Dメッシュの幾何学とテクスチャの両方を最適化し、望まれる物体の形状と外観に二乗のタイルを産み出す。我々は、対称群から生じる任意の境界条件に対して有効な全てのメッシュの空間の制約のない微分可能なパラメータ化により、タイルの形状の最適化を可能にする。すなわち、メッシュのラプラシア行列を微分可能なパラメータとして考慮し、2次元メッシュマッピング技術であるOrbifold Tutte Embeddingから導かれる線形系の微分可能な族を構築する。これらの線形系の解空間は、正にすべての有効なタイリング構成であり、したがって、有効タイル全体の終端から終端までの微分可能表現を与える。我々は、テクスチャ化されたメッシュを微分可能なレンダラでレンダリングし、事前訓練された画像拡散モデルを利用して、結果の画像に損失を生じさせ、メッシュのパラメータを更新し、その外観がテキストプロンプトにマッチするようにした。本手法は,多種多様な周期的タイリングパターンに対して,非自明なタイルを用いて,可塑性で魅力的な結果が得られることを示す。 This paper proposes a fully-automatic, text-guided generative method for producing perfectly-repeating, periodic, tile-able 2D imagery, such as the one seen on floors, mosaics, ceramics, and the work of M.C. Escher. In contrast to square texture images that are seamless when tiled, our method generates non-square tilings which comprise solely of repeating copies of the same object. It achieves this by optimizing both geometry and texture of a 2D mesh, yielding a non-square tile in the shape and appearance of the desired object, with close to no additional background details, that can tile the plane without gaps nor overlaps. We enable optimization of the tile's shape by an unconstrained, differentiable parameterization of the space of all valid tileable meshes for given boundary conditions stemming from a symmetry group. Namely, we construct a differentiable family of linear systems derived from a 2D mesh-mapping technique - Orbifold Tutte Embedding - by considering the mesh's Laplacian matrix as differentiable parameters. 
We prove that the solution space of these linear systems is exactly all possible valid tiling configurations, thereby providing an end-to-end differentiable representation for the entire space of valid tiles. We render the textured mesh via a differentiable renderer, and leverage a pre-trained image diffusion model to induce a loss on the resulting image, updating the mesh's parameters so as to make its appearance match the text prompt. We show our method is able to produce plausible, appealing results, with non-trivial tiles, for a variety of different periodic tiling patterns.	翻訳日:2024-06-20 05:04:09 公開日:2024-06-17
# ゼロショット軌道生成器としての言語モデル Language Models as Zero-Shot Trajectory Generators ( http://arxiv.org/abs/2310.11604v2 ) ライセンス: Link先を確認	Teyun Kwon, Norman Di Palo, Edward Johns,	(参考訳) 大規模言語モデル(LLM)は、最近、低レベルのスキルの選択へのアクセスを与えられたとき、ロボットのハイレベルプランナーとして有望視されている。しかし、LLMは低レベルの軌道自体に使用する十分な知識を持っていないとしばしば仮定される。本研究では,LLM(GPT-4)がオブジェクト検出とセグメンテーションビジョンモデルのみへのアクセスを与えられた場合,操作タスクに対して,エンドエフェクタ姿勢の高密度なシーケンスを直接予測できるかどうかを詳細に検討する。我々は、コンテキスト内例、モーションプリミティブ、または外部軌跡オプティマイザを使わずに、単一のタスクに依存しないプロンプトを設計した。そこで,本研究では,「ボトルキャップを開いて」や「スポンジで皿を拭く」といった実世界の30の言語ベースタスクに対して,どのような設計選択が重要かを検討した。我々の結論は、ロボット工学におけるLLMの想定限界を引き上げ、LLMが様々な共通タスクに十分な低レベルロボット制御の理解を実際に持っていることを初めて明らかにし、さらに障害を検知し、それに従って軌道の再計画を行うことができる。ビデオ、プロンプト、コードは、https://www.robot-learning.uk/language-models-trajectory-generators で入手できる。 Large Language Models (LLMs) have recently shown promise as high-level planners for robots when given access to a selection of low-level skills. However, it is often assumed that LLMs do not possess sufficient knowledge to be used for the low-level trajectories themselves. In this work, we address this assumption thoroughly, and investigate if an LLM (GPT-4) can directly predict a dense sequence of end-effector poses for manipulation tasks, when given access to only object detection and segmentation vision models. We designed a single, task-agnostic prompt, without any in-context examples, motion primitives, or external trajectory optimisers. Then we studied how well it can perform across 30 real-world language-based tasks, such as "open the bottle cap" and "wipe the plate with the sponge", and we investigated which design choices in this prompt are the most important. Our conclusions raise the assumed limit of LLMs for robotics, and we reveal for the first time that LLMs do indeed possess an understanding of low-level robot control sufficient for a range of common tasks, and that they can additionally detect failures and then re-plan trajectories accordingly. 
Videos, prompts, and code are available at: https://www.robot-learning.uk/language-models-trajectory-generators.	翻訳日:2024-06-20 05:04:09 公開日:2024-06-17
# $p$-norm線形回帰による経験的リスク最小化のための最適リスク境界 Optimal Excess Risk Bounds for Empirical Risk Minimization on $p$-Norm Linear Regression ( http://arxiv.org/abs/2310.12437v2 ) ライセンス: Link先を確認	Ayoub El Hanchi, Murat A. Erdogdu,	(参考訳) 我々は、$p \in (1, \infty)$に対する$p$-norm線形回帰問題に対する経験的リスク最小化の性能について検討する。実現可能な場合、モーメント仮定が全くなく、分布依存定数まで、$O(d)$サンプルはターゲットを正確に回収するのに十分であることを示す。さもなければ、$p \in [2, \infty)$ と、ターゲットと共変量に対する弱モーメント仮定の下では、先行項が一致する経験的リスク最小化子に縛られる高い確率過剰リスクを、漸近的に正確な率である$p$にのみ依存する定数まで証明する。この結果は、最小化子におけるリスクのヘッセンの存在を保証する軽度の仮定の下で、$p \in (1, 2)$ の場合に拡張する。 We study the performance of empirical risk minimization on the $p$-norm linear regression problem for $p \in (1, \infty)$. We show that, in the realizable case, under no moment assumptions, and up to a distribution-dependent constant, $O(d)$ samples are enough to exactly recover the target. Otherwise, for $p \in [2, \infty)$, and under weak moment assumptions on the target and the covariates, we prove a high probability excess risk bound on the empirical risk minimizer whose leading term matches, up to a constant that depends only on $p$, the asymptotically exact rate. We extend this result to the case $p \in (1, 2)$ under mild assumptions that guarantee the existence of the Hessian of the risk at its minimizer.	翻訳日:2024-06-20 05:04:09 公開日:2024-06-17
# クロスチャネルアテンションを用いたリモートセンシング画像の物体検出のためのマルチモーダルトランス Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images ( http://arxiv.org/abs/2310.13876v3 ) ライセンス: Link先を確認	Bissmella Bahaduri, Zuheng Ming, Fangchen Feng, Anissa Mokraou,	(参考訳) リモートセンシング画像(RSI)における物体検出は、地球観測(EO)における多くの応用にとって重要な課題である。自然画像における物体検出とは違い、リモートセンシング画像における物体検出は、注釈付きデータの不足と、わずか数ピクセルで表される小さな物体の存在という課題に直面している。マルチモーダル融合は、RGB、赤外線(IR)、ライダー、合成開口レーダ(SAR)などの複数のモードからのデータを融合することで精度を高めることが決定されている。この目的のために、並列サブネットによって生成される中間または後期の表現の融合が支配的であり、モダリティの数順に計算複雑性が増大し、追加の工学的障害が生じるという欠点がある。クロスアテンション機構を用いて,早期に異なるチャネル間の関係をマッピングする新たなマルチモーダル融合戦略を提案し,異なるモダリティを整列させてコヒーレントな入力を構築する。本手法は,中期・後期の手法とは対照的に,早期の融合に対処することにより,既存の手法と比較して,競争力や性能に優れる。さらに、非シフトブロックのフィードフォワードに畳み込み層を統合することでSWIN変換器を強化する。この拡張により、局所的な注意を通して分離されたウィンドウをマージするモデルの能力が強化され、小さなオブジェクト検出が改善される。大規模な実験により提案した多モード核融合モジュールとアーキテクチャの有効性が証明され、多モード空中画像における物体検出への適用性が確認された。 Object detection in Remote Sensing Images (RSI) is a critical task for numerous applications in Earth Observation (EO). Differing from object detection in natural images, object detection in remote sensing images faces challenges of scarcity of annotated data and the presence of small objects represented by only a few pixels. Multi-modal fusion has been determined to enhance the accuracy by fusing data from multiple modalities such as RGB, infrared (IR), lidar, and synthetic aperture radar (SAR). To this end, the fusion of representations at the mid or late stage, produced by parallel subnetworks, is dominant, with the disadvantages of increasing computational complexity in the order of the number of modalities and the creation of additional engineering obstacles. Using the cross-attention mechanism, we propose a novel multi-modal fusion strategy for mapping relationships between different channels at the early stage, enabling the construction of a coherent input by aligning the different modalities. 
By addressing fusion in the early stage, as opposed to mid or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques. Additionally, we enhance the SWIN transformer by integrating convolution layers into the feed-forward of non-shifting blocks. This augmentation strengthens the model's capacity to merge separated windows through local attention, thereby improving small object detection. Extensive experiments prove the effectiveness of the proposed multimodal fusion module and the architecture, demonstrating their applicability to object detection in multimodal aerial imagery.	翻訳日:2024-06-20 04:54:22 公開日:2024-06-17
# 脳遺伝子転写の圧縮表現 Compressed representation of brain genetic transcription ( http://arxiv.org/abs/2310.16113v2 ) ライセンス: Link先を確認	James K Ruffle, Henry Watkins, Robert J Gray, Harpreet Hyare, Michel Thiebaut de Schotten, Parashkev Nachev,	(参考訳) 脳の構造は複雑すぎて、圧縮された表現を使わずに直感的に調査することができず、その変化をコンパクトでナビゲート可能な空間に投影する。この課題は、解剖学的および転写学的パターンの結合の複雑さが最大圧縮を要求する遺伝子発現のような高次元データにおいて特に困難である。標準的な主成分分析(PCA)を用いることで、計算効率は、特に大きな圧縮比において、限られた表現率によってオフセットされる。ここでは、最も広く支持されている線形および非線形な手法(PCA、カーネルPCA、非負行列分解(NMF)、t-stochastic neighbour embedding(t-SNE)、一様多様体近似および投影(UMAP)、深層自己符号化)に基づく圧縮表現を体系的に比較し、再構成忠実度、解剖学的コヒーレンス、および信号伝達、微細構造、代謝目標に関する予測有用性を定量化する。ディープオートエンコーダは、人間の脳における転写パターンの参照標準としての使用をサポートするため、パフォーマンスとターゲットドメインのすべての指標において優れた表現が得られることを示す。 The architecture of the brain is too complex to be intuitively surveyable without the use of compressed representations that project its variation into a compact, navigable space. The task is especially challenging with high-dimensional data, such as gene expression, where the joint complexity of anatomical and transcriptional patterns demands maximum compression. Established practice is to use standard principal component analysis (PCA), whose computational felicity is offset by limited expressivity, especially at great compression ratios. Employing whole-brain, voxel-wise Allen Brain Atlas transcription data, here we systematically compare compressed representations based on the most widely supported linear and non-linear methods (PCA, kernel PCA, non-negative matrix factorization (NMF), t-stochastic neighbour embedding (t-SNE), uniform manifold approximation and projection (UMAP), and deep auto-encoding), quantifying reconstruction fidelity, anatomical coherence, and predictive utility with respect to signalling, microstructural, and metabolic targets. 
We show that deep auto-encoders yield superior representations across all metrics of performance and target domains, supporting their use as the reference standard for representing transcription patterns in the human brain.	翻訳日:2024-06-20 04:54:22 公開日:2024-06-17
# パラメトリック不確実性を有するランダムフィールドの多項カオスサロゲート構築 Polynomial Chaos Surrogate Construction for Random Fields with Parametric Uncertainty ( http://arxiv.org/abs/2311.00553v2 ) ライセンス: Link先を確認	Joy N. Mueller, Khachik Sargsyan, Craig J. Daniels, Habib N. Najm,	(参考訳) 工学と応用科学は物理系を厳格に研究するために計算実験に頼っている。これらの系を探索する数学的モデルは非常に複雑であり、サンプリング集約的な研究は、許容できる精度のために、不可能に多くのシミュレーションを必要とすることが多い。サロゲートモデルは、そのような複雑なモデルをサンプリングする際の高い計算コストを回避する手段を提供する。特に、多項式カオス展開(PCEs)は、不確実性の主源がパラメトリックである決定論的モデルの不確実性定量化研究に成功している。本稿では,従来のPCEサロゲートモデルの拡張について論じ,パラメトリック不確実性に加えて固有雑音を有する確率的計算モデルのサロゲート構築を可能にする。我々は,内在的かつパラメトリックな不確実性の結合空間上にPCEサロゲートを開発し,その構成をKarhunen-Loeve展開によるランダムフィールドデータに拡張する。次に,PCEソボ指数を計算するための閉形式解を利用して,モデル全体の出力分散に対する本質的な雑音寄与を定量化するモデルに対して,大域的な感度解析を行う。さらに、結果のジョイントPCEは、基礎となる確率モデルから、統計的にほぼ同値な任意の入力パラメータ設定でランダムな実化を生成することができるという意味で、生成的である。この方法は化学触媒の例モデルで示される。 Engineering and applied science rely on computational experiments to rigorously study physical systems. The mathematical models used to probe these systems are highly complex, and sampling-intensive studies often require prohibitively many simulations for acceptable accuracy. Surrogate models provide a means of circumventing the high computational expense of sampling such complex models. In particular, polynomial chaos expansions (PCEs) have been successfully used for uncertainty quantification studies of deterministic models where the dominant source of uncertainty is parametric. We discuss an extension to conventional PCE surrogate modeling to enable surrogate construction for stochastic computational models that have intrinsic noise in addition to parametric uncertainty. We develop a PCE surrogate on a joint space of intrinsic and parametric uncertainty, enabled by Rosenblatt transformations, and then extend the construction to random field data via the Karhunen-Loeve expansion. 
We then take advantage of closed-form solutions for computing PCE Sobol indices to perform a global sensitivity analysis of the model which quantifies the intrinsic noise contribution to the overall model output variance. Additionally, the resulting joint PCE is generative in the sense that it allows generating random realizations at any input parameter setting that are statistically approximately equivalent to realizations from the underlying stochastic model. The method is demonstrated on a chemical catalysis example model.	翻訳日:2024-06-20 04:54:22 公開日:2024-06-17
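The closed-form Sobol indices mentioned in the abstract above follow from the orthonormality of the PCE basis: the output variance is the sum of the squared non-constant coefficients, and each index is the share of that sum carried by the relevant terms. A minimal sketch, assuming an orthonormal basis and a hypothetical two-input expansion (the multi-indices and coefficient values are invented for illustration, with input 0 standing in for a parametric input and input 1 for the intrinsic-noise variable):

```python
from typing import Dict, Tuple

MultiIndex = Tuple[int, ...]


def pce_variance(coeffs: Dict[MultiIndex, float]) -> float:
    """Variance of an orthonormal PCE: sum of squared non-constant coefficients."""
    return sum(c * c for alpha, c in coeffs.items() if any(alpha))


def first_order_sobol(coeffs: Dict[MultiIndex, float], dim: int) -> float:
    """Share of variance from terms that depend on input `dim` alone."""
    partial = sum(
        c * c
        for alpha, c in coeffs.items()
        if alpha[dim] > 0
        and all(d == 0 for j, d in enumerate(alpha) if j != dim)
    )
    return partial / pce_variance(coeffs)


def total_sobol(coeffs: Dict[MultiIndex, float], dim: int) -> float:
    """Share of variance from every term that involves input `dim`."""
    partial = sum(c * c for alpha, c in coeffs.items() if alpha[dim] > 0)
    return partial / pce_variance(coeffs)


# Hypothetical coefficients keyed by multi-index (degree in input 0, degree in input 1).
COEFFS: Dict[MultiIndex, float] = {
    (0, 0): 1.0,  # constant term (the mean); excluded from the variance
    (1, 0): 2.0,
    (0, 1): 1.0,
    (1, 1): 2.0,
    (2, 0): 4.0,
}

noise_first_order = first_order_sobol(COEFFS, dim=1)
noise_total = total_sobol(COEFFS, dim=1)
print(pce_variance(COEFFS), noise_first_order, noise_total)
```

In this toy expansion the gap between the total and first-order index of input 1 is the share of variance that comes from its interaction with input 0.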
# クラスシンボリック回帰:Gotta Fit 'Em All Class Symbolic Regression: Gotta Fit 'Em All ( http://arxiv.org/abs/2312.01816v2 ) ライセンス: Link先を確認	Wassim Tenachi, Rodrigo Ibata, Thibaut L. François, Foivos I. Diakogiannis,	(参考訳) クラスシンボル回帰(Class Symbolic Regression)は、複数のデータセットに正確に適合する単一の分析機能フォームを自動的に見つけるための、最初のフレームワークである。この階層的な枠組みは、単一の物理現象の全てのメンバーが共通の法則に従うという共通の制約を利用する。提案手法は,非教師付き記号解析関数発見のための次元解析制約と深部強化学習を統合した,従来の記号回帰のための物理記号最適化($\Phi$-SO)フレームワークの機能を拡張する。さらに、このようなアルゴリズムを評価するために特別に設計された一連の合成物理課題を含む、最初のクラスSRベンチマークを導入する。我々は、これらのベンチマーク課題に適用することで、新しいアプローチの有効性を実証し、恒星の流れを近似したシミュレーション軌道から分析銀河ポテンシャルを抽出し、天体物理学の実用性を実証する。 We introduce 'Class Symbolic Regression' (Class SR), a first framework for automatically finding a single analytical functional form that accurately fits multiple datasets - each realization being governed by its own (possibly) unique set of fitting parameters. This hierarchical framework leverages the common constraint that all the members of a single class of physical phenomena follow a common governing law. Our approach extends the capabilities of our earlier Physical Symbolic Optimization ($\Phi$-SO) framework for Symbolic Regression, which integrates dimensional analysis constraints and deep reinforcement learning for unsupervised symbolic analytical function discovery from data. Additionally, we introduce the first Class SR benchmark, comprising a series of synthetic physical challenges specifically designed to evaluate such algorithms. We demonstrate the efficacy of our novel approach by applying it to these benchmark challenges and showcase its practical utility for astrophysics by successfully extracting an analytic galaxy potential from a set of simulated orbits approximating stellar streams.	翻訳日:2024-06-20 04:44:38 公開日:2024-06-17
# State-insensitive wavelengths for light shifts and photon scattering from Zeeman states ( http://arxiv.org/abs/2312.08370v2 ) License: see link	Stuart J. Masson, Zhenjie Yan, Jacquelyn Ho, Yue-Hui Lu, Dan M. Stamper-Kurn, Ana Asenjo-Garcia,	Atoms are not two-level systems, and their rich internal structure often leads to complex phenomena in the presence of light. Here, we analyze off-resonant light scattering including the full hyperfine and magnetic structure. We find a set of frequency detunings where the induced atomic dipole is the same irrespective of the Zeeman state, and where two-photon transitions that alter the atomic state turn off. For alkali atoms and alkaline-earth ions, if the hyperfine splitting is dominated by the magnetic dipole moment contribution, these detunings approximately coincide. Therefore, at a given "magical" detuning, all Zeeman states in a hyperfine manifold behave almost identically, and can be traced out to good approximation. This feature prevents state decoherence due to light scattering, which impacts quantum optics experiments and quantum information applications.	Translated: 2024-06-20 04:44:38 Published: 2024-06-17
# Gemini: 高機能マルチモーダルモデルのファミリー Gemini: A Family of Highly Capable Multimodal Models ( http://arxiv.org/abs/2312.11805v4 ) ライセンス: Link先を確認	Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, Fabio Viola, Malcolm Reynolds, Yuanzhong Xu, Ryan Doherty, Eli Collins, Clemens Meyer, Eliza Rutherford, Erica Moreira, Kareem Ayoub, Megha Goel, Jack Krawczyk, Cosmo Du, Ed Chi, Heng-Tze Cheng, Eric Ni, Purvi Shah, Patrick Kane, Betty Chan, Manaal Faruqui, Aliaksei Severyn, Hanzhao Lin, YaGuang Li, Yong Cheng, Abe Ittycheriah, Mahdis Mahdieh, Mia Chen, Pei Sun, Dustin Tran, Sumit Bagri, Balaji Lakshminarayanan, Jeremiah Liu, Andras Orban, Fabian Güra, Hao Zhou, Xinying Song, Aurelien Boffy, Harish Ganapathy, Steven Zheng, HyunJeong Choe, Ágoston Weisz, Tao Zhu, Yifeng Lu, Siddharth Gopal, Jarrod Kahn, Maciej Kula, Jeff Pitman, Rushin Shah, Emanuel Taropa, Majd Al Merey, Martin Baeuml, Zhifeng Chen, Laurent El Shafey, Yujing Zhang, Olcan Sercinoglu, George Tucker, Enrique Piqueras, Maxim Krikun, Iain Barr, Nikolay Savinov, Ivo Danihelka, Becca Roelofs, Anaïs White, Anders Andreassen, Tamara von Glehn, Lakshman Yagati, Mehran Kazemi, Lucas Gonzalez, Misha Khalman, Jakub Sygnowski, Alexandre Frechette, Charlotte Smith, Laura Culp, Lev Proleev, Yi Luan, Xi Chen, James Lottes, Nathan Schucher, Federico Lebron, Alban Rrustemi, Natalie Clay, Phil Crone, Tomas Kocisky, Jeffrey Zhao, Bartek Perz, Dian Yu, Heidi Howard, Adam Bloniarz, Jack W. 
Rae, Han Lu, Laurent Sifre, Marcello Maggioni, Fred Alcober, Dan Garrette, Megan Barnes, Shantanu Thakoor, Jacob Austin, Gabriel Barth-Maron, William Wong, Rishabh Joshi, Rahma Chaabouni, Deeni Fatiha, Arun Ahuja, Gaurav Singh Tomar, Evan Senter, Martin Chadwick, Ilya Kornakov, Nithya Attaluri, Iñaki Iturrate, Ruibo Liu, Yunxuan Li, Sarah Cogan, Jeremy Chen, Chao Jia, Chenjie Gu, Qiao Zhang, Jordan Grimstad, Ale Jakse Hartman, Xavier Garcia, Thanumalayan Sankaranarayana Pillai, Jacob Devlin, Michael Laskin, Diego de Las Casas, Dasha Valter, Connie Tao, Lorenzo Blanco, Adrià Puigdomènech Badia, David Reitter, Mianna Chen, Jenny Brennan, Clara Rivera, Sergey Brin, Shariq Iqbal, Gabriela Surita, Jane Labanowski, Abhi Rao, Stephanie Winkler, Emilio Parisotto, Yiming Gu, Kate Olszewska, Ravi Addanki, Antoine Miech, Annie Louis, Denis Teplyashin, Geoff Brown, Elliot Catt, Jan Balaguer, Jackie Xiang, Pidong Wang, Zoe Ashwood, Anton Briukhov, Albert Webson, Sanjay Ganapathy, Smit Sanghavi, Ajay Kannan, Ming-Wei Chang, Axel Stjerngren, Josip Djolonga, Yuting Sun, Ankur Bapna, Matthew Aitchison, Pedram Pejman, Henryk Michalewski, Tianhe Yu, Cindy Wang, Juliette Love, Junwhan Ahn, Dawn Bloxwich, Kehang Han, Peter Humphreys, Thibault Sellam, James Bradbury, Varun Godbole, Sina Samangooei, Bogdan Damoc, Alex Kaskasoli, Sébastien M. R. 
Arnold, Vijay Vasudevan, Shubham Agrawal, Jason Riesa, Dmitry Lepikhin, Richard Tanburn, Srivatsan Srinivasan, Hyeontaek Lim, Sarah Hodkinson, Pranav Shyam, Johan Ferret, Steven Hand, Ankush Garg, Tom Le Paine, Jian Li, Yujia Li, Minh Giang, Alexander Neitz, Zaheer Abbas, Sarah York, Machel Reid, Elizabeth Cole, Aakanksha Chowdhery, Dipanjan Das, Dominika Rogozińska, Vitaliy Nikolaev, Pablo Sprechmann, Zachary Nado, Lukas Zilka, Flavien Prost, Luheng He, Marianne Monteiro, Gaurav Mishra, Chris Welty, Josh Newlan, Dawei Jia, Miltiadis Allamanis, Clara Huiyi Hu, Raoul de Liedekerke, Justin Gilmer, Carl Saroufim, Shruti Rijhwani, Shaobo Hou, Disha Shrivastava, Anirudh Baddepudi, Alex Goldin, Adnan Ozturel, Albin Cassirer, Yunhan Xu, Daniel Sohn, Devendra Sachan, Reinald Kim Amplayo, Craig Swanson, Dessie Petrova, Shashi Narayan, Arthur Guez, Siddhartha Brahma, Jessica Landon, Miteyan Patel, Ruizhe Zhao, Kevin Villela, Luyu Wang, Wenhao Jia, Matthew Rahtz, Mai Giménez, Legg Yeung, James Keeling, Petko Georgiev, Diana Mincu, Boxi Wu, Salem Haykal, Rachel Saputro, Kiran Vodrahalli, James Qin, Zeynep Cankara, Abhanshu Sharma, Nick Fernando, Will Hawkins, Behnam Neyshabur, Solomon Kim, Adrian Hutter, Priyanka Agrawal, Alex Castro-Ros, George van den Driessche, Tao Wang, Fan Yang, Shuo-yiin Chang, Paul Komarek, Ross McIlroy, Mario Lučić, Guodong Zhang, Wael Farhan, Michael Sharman, Paul Natsev, Paul Michel, Yamini Bansal, Siyuan Qiao, Kris Cao, Siamak Shakeri, Christina Butterfield, Justin Chung, Paul Kishan Rubenstein, Shivani Agrawal, Arthur Mensch, Kedar Soparkar, Karel Lenc, Timothy Chung, Aedan Pope, Loren Maggiore, Jackie Kay, Priya Jhakra, Shibo Wang, Joshua Maynez, Mary Phuong, Taylor Tobin, Andrea Tacchetti, Maja Trebacz, Kevin Robinson, Yash Katariya, Sebastian Riedel, Paige Bailey, Kefan Xiao, Nimesh Ghelani, Lora Aroyo, Ambrose Slone, Neil Houlsby, Xuehan Xiong, Zhen Yang, Elena Gribovskaya, Jonas Adler, Mateo Wirth, Lisa Lee, Music Li, Thais Kagohara, Jay 
Pavagadhi, Sophie Bridgers, Anna Bortsova, Sanjay Ghemawat, Zafarali Ahmed, Tianqi Liu, Richard Powell, Vijay Bolina, Mariko Iinuma, Polina Zablotskaia, James Besley, Da-Woon Chung, Timothy Dozat, Ramona Comanescu, Xiance Si, Jeremy Greer, Guolong Su, Martin Polacek, Raphaël Lopez Kaufman, Simon Tokumine, Hexiang Hu, Elena Buchatskaya, Yingjie Miao, Mohamed Elhawaty, Aditya Siddhant, Nenad Tomasev, Jinwei Xing, Christina Greer, Helen Miller, Shereen Ashraf, Aurko Roy, Zizhao Zhang, Ada Ma, Angelos Filos, Milos Besta, Rory Blevins, Ted Klimenko, Chih-Kuan Yeh, Soravit Changpinyo, Jiaqi Mu, Oscar Chang, Mantas Pajarskas, Carrie Muir, Vered Cohen, Charline Le Lan, Krishna Haridasan, Amit Marathe, Steven Hansen, Sholto Douglas, Rajkumar Samuel, Mingqiu Wang, Sophia Austin, Chang Lan, Jiepu Jiang, Justin Chiu, Jaime Alonso Lorenzo, Lars Lowe Sjösund, Sébastien Cevey, Zach Gleicher, Thi Avrahami, Anudhyan Boral, Hansa Srinivasan, Vittorio Selo, Rhys May, Konstantinos Aisopos, Léonard Hussenot, Livio Baldini Soares, Kate Baumli, Michael B. 
Chang, Adrià Recasens, Ben Caine, Alexander Pritzel, Filip Pavetic, Fabio Pardo, Anita Gergely, Justin Frye, Vinay Ramasesh, Dan Horgan, Kartikeya Badola, Nora Kassner, Subhrajit Roy, Ethan Dyer, Víctor Campos Campos, Alex Tomala, Yunhao Tang, Dalia El Badawy, Elspeth White, Basil Mustafa, Oran Lang, Abhishek Jindal, Sharad Vikram, Zhitao Gong, Sergi Caelles, Ross Hemsley, Gregory Thornton, Fangxiaoyu Feng, Wojciech Stokowiec, Ce Zheng, Phoebe Thacker, Çağlar Ünlü, Zhishuai Zhang, Mohammad Saleh, James Svensson, Max Bileschi, Piyush Patil, Ankesh Anand, Roman Ring, Katerina Tsihlas, Arpi Vezer, Marco Selvi, Toby Shevlane, Mikel Rodriguez, Tom Kwiatkowski, Samira Daruki, Keran Rong, Allan Dafoe, Nicholas FitzGerald, Keren Gu-Lemberg, Mina Khan, Lisa Anne Hendricks, Marie Pellat, Vladimir Feinberg, James Cobon-Kerr, Tara Sainath, Maribeth Rauh, Sayed Hadi Hashemi, Richard Ives, Yana Hasson, Eric Noland, Yuan Cao, Nathan Byrd, Le Hou, Qingze Wang, Thibault Sottiaux, Michela Paganini, Jean-Baptiste Lespiau, Alexandre Moufarek, Samer Hassan, Kaushik Shivakumar, Joost van Amersfoort, Amol Mandhane, Pratik Joshi, Anirudh Goyal, Matthew Tung, Andrew Brock, Hannah Sheahan, Vedant Misra, Cheng Li, Nemanja Rakićević, Mostafa Dehghani, Fangyu Liu, Sid Mittal, Junhyuk Oh, Seb Noury, Eren Sezener, Fantine Huot, Matthew Lamm, Nicola De Cao, Charlie Chen, Sidharth Mudgal, Romina Stella, Kevin Brooks, Gautam Vasudevan, Chenxi Liu, Mainak Chain, Nivedita Melinkeri, Aaron Cohen, Venus Wang, Kristie Seymore, Sergey Zubkov, Rahul Goel, Summer Yue, Sai Krishnakumaran, Brian Albert, Nate Hurley, Motoki Sano, Anhad Mohananey, Jonah Joughin, Egor Filonov, Tomasz Kępa, Yomna Eldawy, Jiawern Lim, Rahul Rishi, Shirin Badiezadegan, Taylor Bos, Jerry Chang, Sanil Jain, Sri Gayatri Sundara Padmanabhan, Subha Puttagunta, Kalpesh Krishna, Leslie Baker, Norbert Kalb, Vamsi Bedapudi, Adam Kurzrok, Shuntong Lei, Anthony Yu, Oren Litvin, Xiang Zhou, Zhichun Wu, Sam Sobell, Andrea Siciliano, Alan 
Papir, Robby Neale, Jonas Bragagnolo, Tej Toor, Tina Chen, Valentin Anklin, Feiran Wang, Richie Feng, Milad Gholami, Kevin Ling, Lijuan Liu, Jules Walter, Hamid Moghaddam, Arun Kishore, Jakub Adamek, Tyler Mercado, Jonathan Mallinson, Siddhinita Wandekar, Stephen Cagle, Eran Ofek, Guillermo Garrido, Clemens Lombriser, Maksim Mukha, Botu Sun, Hafeezul Rahman Mohammad, Josip Matak, Yadi Qian, Vikas Peswani, Pawel Janus, Quan Yuan, Leif Schelin, Oana David, Ankur Garg, Yifan He, Oleksii Duzhyi, Anton Älgmyr, Timothée Lottaz, Qi Li, Vikas Yadav, Luyao Xu, Alex Chinien, Rakesh Shivanna, Aleksandr Chuklin, Josie Li, Carrie Spadine, Travis Wolfe, Kareem Mohamed, Subhabrata Das, Zihang Dai, Kyle He, Daniel von Dincklage, Shyam Upadhyay, Akanksha Maurya, Luyan Chi, Sebastian Krause, Khalid Salama, Pam G Rabinovitch, Pavan Kumar Reddy M, Aarush Selvan, Mikhail Dektiarev, Golnaz Ghiasi, Erdem Guven, Himanshu Gupta, Boyi Liu, Deepak Sharma, Idan Heimlich Shtacher, Shachi Paul, Oscar Akerlund, François-Xavier Aubet, Terry Huang, Chen Zhu, Eric Zhu, Elico Teixeira, Matthew Fritze, Francesco Bertolini, Liana-Eleonora Marinescu, Martin Bölle, Dominik Paulus, Khyatti Gupta, Tejasi Latkar, Max Chang, Jason Sanders, Roopa Wilson, Xuewei Wu, Yi-Xuan Tan, Lam Nguyen Thiet, Tulsee Doshi, Sid Lall, Swaroop Mishra, Wanming Chen, Thang Luong, Seth Benjamin, Jasmine Lee, Ewa Andrejczuk, Dominik Rabiej, Vipul Ranjan, Krzysztof Styrc, Pengcheng Yin, Jon Simon, Malcolm Rose Harriott, Mudit Bansal, Alexei Robsky, Geoff Bacon, David Greene, Daniil Mirylenka, Chen Zhou, Obaid Sarvana, Abhimanyu Goyal, Samuel Andermatt, Patrick Siegler, Ben Horn, Assaf Israel, Francesco Pongetti, Chih-Wei "Louis" Chen, Marco Selvatici, Pedro Silva, Kathie Wang, Jackson Tolins, Kelvin Guu, Roey Yogev, Xiaochen Cai, Alessandro Agostini, Maulik Shah, Hung Nguyen, Noah Ó Donnaile, Sébastien Pereira, Linda Friso, Adam Stambler, Adam Kurzrok, Chenkai Kuang, Yan Romanikhin, Mark Geller, ZJ Yan, Kane Jang, Cheng-Chun Lee, 
Wojciech Fica, Eric Malmi, Qijun Tan, Dan Banica, Daniel Balle, Ryan Pham, Yanping Huang, Diana Avram, Hongzhi Shi, Jasjot Singh, Chris Hidey, Niharika Ahuja, Pranab Saxena, Dan Dooley, Srividya Pranavi Potharaju, Eileen O'Neill, Anand Gokulchandran, Ryan Foley, Kai Zhao, Mike Dusenberry, Yuan Liu, Pulkit Mehta, Ragha Kotikalapudi, Chalence Safranek-Shrader, Andrew Goodman, Joshua Kessinger, Eran Globen, Prateek Kolhar, Chris Gorgolewski, Ali Ibrahim, Yang Song, Ali Eichenbaum, Thomas Brovelli, Sahitya Potluri, Preethi Lahoti, Cip Baetu, Ali Ghorbani, Charles Chen, Andy Crawford, Shalini Pal, Mukund Sridhar, Petru Gurita, Asier Mujika, Igor Petrovski, Pierre-Louis Cedoz, Chenmei Li, Shiyuan Chen, Niccolò Dal Santo, Siddharth Goyal, Jitesh Punjabi, Karthik Kappaganthu, Chester Kwak, Pallavi LV, Sarmishta Velury, Himadri Choudhury, Jamie Hall, Premal Shah, Ricardo Figueira, Matt Thomas, Minjie Lu, Ting Zhou, Chintu Kumar, Thomas Jurdi, Sharat Chikkerur, Yenai Ma, Adams Yu, Soo Kwak, Victor Ähdel, Sujeevan Rajayogam, Travis Choma, Fei Liu, Aditya Barua, Colin Ji, Ji Ho Park, Vincent Hellendoorn, Alex Bailey, Taylan Bilal, Huanjie Zhou, Mehrdad Khatir, Charles Sutton, Wojciech Rzadkowski, Fiona Macintosh, Konstantin Shagin, Paul Medina, Chen Liang, Jinjing Zhou, Pararth Shah, Yingying Bi, Attila Dankovics, Shipra Banga, Sabine Lehmann, Marissa Bredesen, Zifan Lin, John Eric Hoffmann, Jonathan Lai, Raynald Chung, Kai Yang, Nihal Balani, Arthur Bražinskas, Andrei Sozanschi, Matthew Hayes, Héctor Fernández Alcalde, Peter Makarov, Will Chen, Antonio Stella, Liselotte Snijders, Michael Mandl, Ante Kärrman, Paweł Nowak, Xinyi Wu, Alex Dyck, Krishnan Vaidyanathan, Raghavender R, Jessica Mallet, Mitch Rudominer, Eric Johnston, Sushil Mittal, Akhil Udathu, Janara Christensen, Vishal Verma, Zach Irving, Andreas Santucci, Gamaleldin Elsayed, Elnaz Davoodi, Marin Georgiev, Ian Tenney, Nan Hua, Geoffrey Cideron, Edouard Leurent, Mahmoud Alnahlawi, Ionut Georgescu, Nan Wei, Ivy 
Zheng, Dylan Scandinaro, Heinrich Jiang, Jasper Snoek, Mukund Sundararajan, Xuezhi Wang, Zack Ontiveros, Itay Karo, Jeremy Cole, Vinu Rajashekhar, Lara Tumeh, Eyal Ben-David, Rishub Jain, Jonathan Uesato, Romina Datta, Oskar Bunyan, Shimu Wu, John Zhang, Piotr Stanczyk, Ye Zhang, David Steiner, Subhajit Naskar, Michael Azzam, Matthew Johnson, Adam Paszke, Chung-Cheng Chiu, Jaume Sanchez Elias, Afroz Mohiuddin, Faizan Muhammad, Jin Miao, Andrew Lee, Nino Vieillard, Jane Park, Jiageng Zhang, Jeff Stanway, Drew Garmon, Abhijit Karmarkar, Zhe Dong, Jong Lee, Aviral Kumar, Luowei Zhou, Jonathan Evens, William Isaac, Geoffrey Irving, Edward Loper, Michael Fink, Isha Arkatkar, Nanxin Chen, Izhak Shafran, Ivan Petrychenko, Zhe Chen, Johnson Jia, Anselm Levskaya, Zhenkai Zhu, Peter Grabowski, Yu Mao, Alberto Magni, Kaisheng Yao, Javier Snaider, Norman Casagrande, Evan Palmer, Paul Suganthan, Alfonso Castaño, Irene Giannoumis, Wooyeol Kim, Mikołaj Rybiński, Ashwin Sreevatsa, Jennifer Prendki, David Soergel, Adrian Goedeckemeyer, Willi Gierke, Mohsen Jafari, Meenu Gaba, Jeremy Wiesner, Diana Gage Wright, Yawen Wei, Harsha Vashisht, Yana Kulizhskaya, Jay Hoover, Maigo Le, Lu Li, Chimezie Iwuanyanwu, Lu Liu, Kevin Ramirez, Andrey Khorlin, Albert Cui, Tian LIN, Marcus Wu, Ricardo Aguilar, Keith Pallo, Abhishek Chakladar, Ginger Perng, Elena Allica Abellan, Mingyang Zhang, Ishita Dasgupta, Nate Kushman, Ivo Penchev, Alena Repina, Xihui Wu, Tom van der Weide, Priya Ponnapalli, Caroline Kaplan, Jiri Simsa, Shuangfeng Li, Olivier Dousse, Fan Yang, Jeff Piper, Nathan Ie, Rama Pasumarthi, Nathan Lintz, Anitha Vijayakumar, Daniel Andor, Pedro Valenzuela, Minnie Lui, Cosmin Paduraru, Daiyi Peng, Katherine Lee, Shuyuan Zhang, Somer Greene, Duc Dung Nguyen, Paula Kurylowicz, Cassidy Hardin, Lucas Dixon, Lili Janzer, Kiam Choo, Ziqiang Feng, Biao Zhang, Achintya Singhal, Dayou Du, Dan McKinnon, Natasha Antropova, Tolga Bolukbasi, Orgad Keller, David Reid, Daniel Finchelstein, Maria Abi 
Raad, Remi Crocker, Peter Hawkins, Robert Dadashi, Colin Gaffney, Ken Franko, Anna Bulanova, Rémi Leblond, Shirley Chung, Harry Askham, Luis C. Cobo, Kelvin Xu, Felix Fischer, Jun Xu, Christina Sorokin, Chris Alberti, Chu-Cheng Lin, Colin Evans, Alek Dimitriev, Hannah Forbes, Dylan Banarse, Zora Tung, Mark Omernick, Colton Bishop, Rachel Sterneck, Rohan Jain, Jiawei Xia, Ehsan Amid, Francesco Piccinno, Xingyu Wang, Praseem Banzal, Daniel J. Mankowitz, Alex Polozov, Victoria Krakovna, Sasha Brown, MohammadHossein Bateni, Dennis Duan, Vlad Firoiu, Meghana Thotakuri, Tom Natan, Matthieu Geist, Ser tan Girgin, Hui Li, Jiayu Ye, Ofir Roval, Reiko Tojo, Michael Kwong, James Lee-Thorp, Christopher Yew, Danila Sinopalnikov, Sabela Ramos, John Mellor, Abhishek Sharma, Kathy Wu, David Miller, Nicolas Sonnerat, Denis Vnukov, Rory Greig, Jennifer Beattie, Emily Caveness, Libin Bai, Julian Eisenschlos, Alex Korchemniy, Tomy Tsai, Mimi Jasarevic, Weize Kong, Phuong Dao, Zeyu Zheng, Frederick Liu, Fan Yang, Rui Zhu, Tian Huey Teh, Jason Sanmiya, Evgeny Gladchenko, Nejc Trdin, Daniel Toyama, Evan Rosen, Sasan Tavakkol, Linting Xue, Chen Elkind, Oliver Woodman, John Carpenter, George Papamakarios, Rupert Kemp, Sushant Kafle, Tanya Grunina, Rishika Sinha, Alice Talbert, Diane Wu, Denese Owusu-Afriyie, Cosmo Du, Chloe Thornton, Jordi Pont-Tuset, Pradyumna Narayana, Jing Li, Saaber Fatehi, John Wieting, Omar Ajmeri, Benigno Uria, Yeongil Ko, Laura Knight, Amélie Héliou, Ning Niu, Shane Gu, Chenxi Pang, Yeqing Li, Nir Levine, Ariel Stolovich, Rebeca Santamaria-Fernandez, Sonam Goenka, Wenny Yustalim, Robin Strudel, Ali Elqursh, Charlie Deck, Hyo Lee, Zonglin Li, Kyle Levin, Raphael Hoffmann, Dan Holtmann-Rice, Olivier Bachem, Sho Arora, Christy Koh, Soheil Hassas Yeganeh, Siim Põder, Mukarram Tariq, Yanhua Sun, Lucian Ionita, Mojtaba Seyedhosseini, Pouya Tafti, Zhiyu Liu, Anmol Gulati, Jasmine Liu, Xinyu Ye, Bart Chrzaszcz, Lily Wang, Nikhil Sethi, Tianrun Li, Ben Brown, Shreya Singh, 
Wei Fan, Aaron Parisi, Joe Stanton, Vinod Koverkathu, Christopher A. Choquette-Choo, Yunjie Li, TJ Lu, Abe Ittycheriah, Prakash Shroff, Mani Varadarajan, Sanaz Bahargam, Rob Willoughby, David Gaddy, Guillaume Desjardins, Marco Cornero, Brona Robenek, Bhavishya Mittal, Ben Albrecht, Ashish Shenoy, Fedor Moiseev, Henrik Jacobsson, Alireza Ghaffarkhah, Morgane Rivière, Alanna Walton, Clément Crepy, Alicia Parrish, Zongwei Zhou, Clement Farabet, Carey Radebaugh, Praveen Srinivasan, Claudia van der Salm, Andreas Fidjeland, Salvatore Scellato, Eri Latorre-Chimoto, Hanna Klimczak-Plucińska, David Bridson, Dario de Cesare, Tom Hudson, Piermaria Mendolicchio, Lexi Walker, Alex Morris, Matthew Mauger, Alexey Guseynov, Alison Reid, Seth Odoom, Lucia Loher, Victor Cotruta, Madhavi Yenugula, Dominik Grewe, Anastasia Petrushkina, Tom Duerig, Antonio Sanchez, Steve Yadlowsky, Amy Shen, Amir Globerson, Lynette Webb, Sahil Dua, Dong Li, Surya Bhupatiraju, Dan Hurt, Haroon Qureshi, Ananth Agarwal, Tomer Shani, Matan Eyal, Anuj Khare, Shreyas Rammohan Belle, Lei Wang, Chetan Tekur, Mihir Sanjay Kale, Jinliang Wei, Ruoxin Sang, Brennan Saeta, Tyler Liechty, Yi Sun, Yao Zhao, Stephan Lee, Pandu Nayak, Doug Fritz, Manish Reddy Vuyyuru, John Aslanides, Nidhi Vyas, Martin Wicke, Xiao Ma, Evgenii Eltyshev, Nina Martin, Hardie Cate, James Manyika, Keyvan Amiri, Yelin Kim, Xi Xiong, Kai Kang, Florian Luisier, Nilesh Tripuraneni, David Madras, Mandy Guo, Austin Waters, Oliver Wang, Joshua Ainslie, Jason Baldridge, Han Zhang, Garima Pruthi, Jakob Bauer, Feng Yang, Riham Mansour, Jason Gelman, Yang Xu, George Polovets, Ji Liu, Honglong Cai, Warren Chen, XiangHai Sheng, Emily Xue, Sherjil Ozair, Christof Angermueller, Xiaowei Li, Anoop Sinha, Weiren Wang, Julia Wiesinger, Emmanouil Koukoumidis, Yuan Tian, Anand Iyer, Madhu Gurumurthy, Mark Goldenson, Parashar Shah, MK Blake, Hongkun Yu, Anthony Urbanowicz, Jennimaria Palomaki, Chrisantha Fernando, Ken Durden, Harsh Mehta, Nikola Momchev, Elahe 
Rahimtoroghi, Maria Georgaki, Amit Raul, Sebastian Ruder, Morgan Redshaw, Jinhyuk Lee, Denny Zhou, Komal Jalan, Dinghua Li, Blake Hechtman, Parker Schuh, Milad Nasr, Kieran Milan, Vladimir Mikulik, Juliana Franco, Tim Green, Nam Nguyen, Joe Kelley, Aroma Mahendru, Andrea Hu, Joshua Howland, Ben Vargas, Jeffrey Hui, Kshitij Bansal, Vikram Rao, Rakesh Ghiya, Emma Wang, Ke Ye, Jean Michel Sarr, Melanie Moranski Preston, Madeleine Elish, Steve Li, Aakash Kaku, Jigar Gupta, Ice Pasupat, Da-Cheng Juan, Milan Someswar, Tejvi M., Xinyun Chen, Aida Amini, Alex Fabrikant, Eric Chu, Xuanyi Dong, Amruta Muthal, Senaka Buthpitiya, Sarthak Jauhari, Nan Hua, Urvashi Khandelwal, Ayal Hitron, Jie Ren, Larissa Rinaldi, Shahar Drath, Avigail Dabush, Nan-Jiang Jiang, Harshal Godhia, Uli Sachs, Anthony Chen, Yicheng Fan, Hagai Taitelbaum, Hila Noga, Zhuyun Dai, James Wang, Chen Liang, Jenny Hamer, Chun-Sung Ferng, Chenel Elkind, Aviel Atias, Paulina Lee, Vít Listík, Mathias Carlen, Jan van de Kerkhof, Marcin Pikus, Krunoslav Zaher, Paul Müller, Sasha Zykova, Richard Stefanec, Vitaly Gatsko, Christoph Hirnschall, Ashwin Sethi, Xingyu Federico Xu, Chetan Ahuja, Beth Tsai, Anca Stefanoiu, Bo Feng, Keshav Dhandhania, Manish Katyal, Akshay Gupta, Atharva Parulekar, Divya Pitta, Jing Zhao, Vivaan Bhatia, Yashodha Bhavnani, Omar Alhadlaq, Xiaolin Li, Peter Danenberg, Dennis Tu, Alex Pine, Vera Filippova, Abhipso Ghosh, Ben Limonchik, Bhargava Urala, Chaitanya Krishna Lanka, Derik Clive, Yi Sun, Edward Li, Hao Wu, Kevin Hongtongsak, Ianna Li, Kalind Thakkar, Kuanysh Omarov, Kushal Majmundar, Michael Alverson, Michael Kucharski, Mohak Patel, Mudit Jain, Maksim Zabelin, Paolo Pelagatti, Rohan Kohli, Saurabh Kumar, Joseph Kim, Swetha Sankar, Vineet Shah, Lakshmi Ramachandruni, Xiangkai Zeng, Ben Bariach, Laura Weidinger, Tu Vu, Alek Andreev, Antoine He, Kevin Hui, Sheleem Kashem, Amar Subramanya, Sissie Hsiao, Demis Hassabis, Koray Kavukcuoglu, Adam Sadovsky, Quoc Le, Trevor Strohman, Yonghui Wu, 
Slav Petrov, Jeffrey Dean, Oriol Vinyals,	This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks, notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.	Translated: 2024-06-20 04:44:38 Published: 2024-06-17
# Mitigating Unhelpfulness in Emotional Support Conversations with Multifaceted AI Feedback ( http://arxiv.org/abs/2401.05928v3 ) License: see link	Jiashuo Wang, Chunpu Xu, Chak Tou Leong, Wenjie Li, Jing Li,	An emotional support conversation system aims to alleviate users' emotional distress and assist them in addressing their challenges. To generate supportive responses, it is critical to consider multiple factors such as empathy, support strategies, and response coherence, as established in prior methods. Nonetheless, previous models occasionally generate unhelpful responses, which intend to provide support but display counterproductive effects. According to psychology and communication theories, poor performance in just one contributing factor might cause a response to be unhelpful. From the model training perspective, since these models have not been exposed to unhelpful responses during their training phase, they are unable to distinguish whether the tokens they generate might result in unhelpful responses during inference. To address this issue, we introduce a novel model-agnostic framework named mitigating unhelpfulness with multifaceted AI feedback for emotional support (Muffin). Specifically, Muffin employs a multifaceted AI feedback module to assess the helpfulness of responses generated by a specific model with consideration of multiple factors. Using contrastive learning, it then reduces the likelihood of the model generating unhelpful responses compared to the helpful ones. Experimental results demonstrate that Muffin effectively mitigates the generation of unhelpful responses while slightly increasing response fluency and relevance.	Translated: 2024-06-20 04:44:38 Published: 2024-06-17
# Enhancing Recommendation Diversity by Re-ranking with Large Language Models ( http://arxiv.org/abs/2401.11506v2 ) License: see link	Diego Carraro, Derek Bridge,	It has long been recognized that it is not enough for a Recommender System (RS) to provide recommendations based only on their relevance to users. Among many other criteria, the set of recommendations may need to be diverse. Diversity is one way of handling recommendation uncertainty and ensuring that recommendations offer users a meaningful choice. The literature reports many ways of measuring diversity and improving the diversity of a set of recommendations, most notably by re-ranking and selecting from a larger set of candidate recommendations. Driven by promising insights from the literature on how to incorporate versatile Large Language Models (LLMs) into the RS pipeline, in this paper we show how LLMs can be used for diversity re-ranking. We begin with an informal study that verifies that LLMs can be used for re-ranking tasks and do have some understanding of the concept of item diversity. Then, we design a more rigorous methodology where LLMs are prompted to generate a diverse ranking from a candidate ranking using various prompt templates with different re-ranking instructions in a zero-shot fashion. We conduct comprehensive experiments testing state-of-the-art LLMs from the GPT and Llama families. We compare their re-ranking capabilities with random re-ranking and various traditional re-ranking methods from the literature. We open-source the code of our experiments for reproducibility. Our findings suggest that the trade-offs (in terms of performance and costs, among others) of LLM-based re-rankers are superior to those of random re-rankers but, as yet, inferior to those of traditional re-rankers. However, the LLM approach is promising. LLMs exhibit improved performance on many natural language processing and recommendation tasks and lower inference costs. Given these trends, we can expect LLM-based re-ranking to become more competitive soon.	Translated: 2024-06-20 04:34:53 Published: 2024-06-17
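
For readers unfamiliar with diversity re-ranking, the following is a minimal sketch of a greedy, MMR-style re-ranker, one well-known example of the traditional approach. The abstract above does not name the specific baselines the authors used, so treating this method as representative is an assumption; the function `greedy_diverse_rerank`, the parameter `lam`, and the toy catalogue are all hypothetical.

```python
def greedy_diverse_rerank(candidates, relevance, distance, k, lam=0.5):
    """Greedy re-ranking that trades relevance against diversity.

    At each step, pick the item maximizing
        lam * relevance + (1 - lam) * (distance to the closest selected item).
    lam = 1.0 reduces to a pure relevance ranking.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(item):
            # Diversity term is 0 while nothing has been selected yet.
            diversity = min((distance(item, s) for s in selected), default=0.0)
            return lam * relevance[item] + (1 - lam) * diversity
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy catalogue: item -> genre, with relevance scores from some upstream recommender.
genre = {"a": "rock", "b": "rock", "c": "jazz", "d": "rock"}
relevance = {"a": 0.9, "b": 0.85, "c": 0.6, "d": 0.5}

def genre_distance(x, y):
    # 1.0 when genres differ, 0.0 when they match.
    return 0.0 if genre[x] == genre[y] else 1.0

diverse_top2 = greedy_diverse_rerank(["a", "b", "c", "d"], relevance, genre_distance, k=2)
relevance_top2 = greedy_diverse_rerank(["a", "b", "c", "d"], relevance, genre_distance, k=2, lam=1.0)
```

In this toy setting, the diversity-aware ranking swaps the second rock item for the jazz item, which is the kind of behaviour an LLM-based re-ranker would be prompted to reproduce from instructions alone.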
# Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models ( http://arxiv.org/abs/2402.02207v2 ) License: see link	Yongshuo Zong, Ondrej Bohdal, Tingyang Yu, Yongxin Yang, Timothy Hospedales,	Current vision large language models (VLLMs) exhibit remarkable capabilities yet are prone to generating harmful content and are vulnerable to even the simplest jailbreaking attacks. Our initial analysis finds that this is due to the presence of harmful data during vision-language instruction fine-tuning, and that VLLM fine-tuning can cause forgetting of safety alignment previously learned by the underpinning LLM. To address this issue, we first curate a vision-language safe instruction-following dataset VLGuard covering various harmful categories. Our experiments demonstrate that integrating this dataset into standard vision-language fine-tuning or utilizing it for post-hoc fine-tuning effectively safety-aligns VLLMs. This alignment is achieved with minimal impact on, or even enhancement of, the models' helpfulness. The versatility of our safety fine-tuning dataset makes it a valuable resource for safety-testing existing VLLMs, training new models or safeguarding pre-trained VLLMs. Empirical results demonstrate that fine-tuned VLLMs effectively reject unsafe instructions and substantially reduce the success rates of several black-box adversarial attacks, which approach zero in many cases. The code and dataset are available at https://github.com/ys-zong/VLGuard.	Translated: 2024-06-20 04:25:08 Published: 2024-06-17
# Online Cascade Learning for Efficient Inference over Streams ( http://arxiv.org/abs/2402.04513v3 ) License: see link	Lunyiu Nie, Zhimin Ding, Erdong Hu, Christopher Jermaine, Swarat Chaudhuri,	Large Language Models (LLMs) have a natural role in answering complex queries about data streams, but the high computational cost of LLM inference makes them infeasible in many such tasks. We propose online cascade learning, the first approach to address this challenge. The objective here is to learn a "cascade" of models, starting with lower-capacity models (such as logistic regression) and ending with a powerful LLM, along with a deferral policy that determines the model to be used on a given input. We formulate the task of learning cascades online as an imitation-learning problem, where smaller models are updated over time imitating the collected LLM demonstrations, and give a no-regret algorithm for the problem. Experimental results across four benchmarks show that our method parallels LLMs in accuracy while cutting down inference costs by as much as 90% with strong robustness against input distribution shifts, underscoring its efficacy and adaptability in stream processing.	Translated: 2024-06-20 04:25:08 Published: 2024-06-17
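
The cascade-with-deferral idea in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: the deferral policy here is a fixed confidence threshold rather than a learned one, the no-regret update is not reproduced, and `TinyWordModel`, `expensive_llm`, and `cascade_step` are hypothetical stand-ins (the "LLM" is a keyword rule so the example runs offline).

```python
from collections import Counter, defaultdict


class TinyWordModel:
    """Hypothetical low-capacity model: per-label word counts."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, text, label):
        # Imitation step: learn from a label demonstrated by the larger model.
        self.counts[label].update(text.lower().split())

    def predict(self, text):
        words = text.lower().split()
        scores = {lab: sum(cnt[w] for w in words) for lab, cnt in self.counts.items()}
        total = sum(scores.values())
        if total == 0:
            return None, 0.0  # nothing known about this input yet
        label = max(scores, key=scores.get)
        return label, scores[label] / total


def expensive_llm(text):
    """Stand-in for a costly LLM call (assumption: a simple keyword rule)."""
    return "positive" if "good" in text.lower() else "negative"


def cascade_step(text, small_model, threshold=0.8):
    """Answer with the small model when confident, otherwise defer to the LLM."""
    label, confidence = small_model.predict(text)
    if confidence >= threshold:
        return label, "small"
    label = expensive_llm(text)
    small_model.update(text, label)  # collect the demonstration for later inputs
    return label, "llm"


small = TinyWordModel()
stream = ["good film", "awful plot", "good cast", "awful pacing"]
results = [cascade_step(text, small) for text in stream]
```

On this toy stream the first two inputs are deferred to the stand-in LLM, and the next two are handled by the small model once it has seen the demonstrations, which is the cost-saving pattern the abstract describes.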
# 色空間は1つだけ:低照度画像強調のための効率的なネットワーク You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement ( http://arxiv.org/abs/2402.05809v3 ) ライセンス: Link先を確認	Qingsen Yan, Yixu Feng, Cheng Zhang, Pei Wang, Peng Wu, Wei Dong, Jinqiu Sun, Yanning Zhang,	(参考訳) 低照度画像強調(LLIE)タスクは、劣化した低照度画像から詳細と視覚情報を復元する傾向がある。既存のほとんどの手法は、sRGBとHSV色空間上のディープニューラルネットワーク(DNN)により、低/正常光画像間のマッピング関数を学習する。それでも、強調には画像信号の増幅が含まれており、これらの色空間を低信号対雑音比の低照度画像に適用することで、強調プロセスに感度と不安定性をもたらす可能性がある。その結果、拡張された画像に色アーティファクトと明るさアーティファクトが存在することが判明した。この問題を軽減するために,HVI (Horizontal/Vertical-Intensity) と呼ばれる新しいトレーニング可能なカラー空間を提案する。輝度と色をRGBチャネルから切り離して、拡張中の不安定性を緩和するだけでなく、トレーニング可能なパラメータによって異なる照明範囲の低照度画像にも適応する。さらに,分離した画像の明るさと色をHVI空間で処理するための2つの枝を持つ新しいカラー・インテンシティ・デカップリングネットワーク(CIDNet)を設計する。 CIDNet内では、低照度画像におけるノイズを抑えつつ、画像構造とコンテンツ情報の相互作用を容易にする軽量クロスアテンション(LCA)モジュールを導入している。最後に,22種類の定量定性的実験を行い,提案したCIDNetが11個のデータセットの最先端手法より優れていることを示した。コードはhttps://github.com/Fediory/HVI-CIDNetで公開されている。 Low-Light Image Enhancement (LLIE) task tends to restore the details and visual information from corrupted low-light images. Most existing methods learn the mapping function between low/normal-light images by Deep Neural Networks (DNNs) on sRGB and HSV color space. Nevertheless, enhancement involves amplifying image signals, and applying these color spaces to low-light images with a low signal-to-noise ratio can introduce sensitivity and instability into the enhancement process. Consequently, this results in the presence of color artifacts and brightness artifacts in the enhanced images. To alleviate this problem, we propose a novel trainable color space, named Horizontal/Vertical-Intensity (HVI). It not only decouples brightness and color from RGB channels to mitigate the instability during enhancement but also adapts to low-light images in different illumination ranges due to the trainable parameters. 
Further, we design a novel Color and Intensity Decoupling Network (CIDNet) with two branches dedicated to processing the decoupled image brightness and color in the HVI space. Within CIDNet, we introduce the Lightweight Cross-Attention (LCA) module to facilitate interaction between image structure and content information in both branches, while also suppressing noise in low-light images. Finally, we conducted 22 quantitative and qualitative experiments to show that the proposed CIDNet outperforms the state-of-the-art methods on 11 datasets. The code is available at https://github.com/Fediory/HVI-CIDNet.	翻訳日:2024-06-20 04:25:08 公開日:2024-06-17
# 優柔不断の決定力:低分散リスク制限監査とマージナルマーク記録による選挙異議申し立て The Decisive Power of Indecision: Low-Variance Risk-Limiting Audits and Election Contestation via Marginal Mark Recording ( http://arxiv.org/abs/2402.06515v4 ) ライセンス: Link先を確認	Benjamin Fuller, Rashmi Pai, Alexander Russell,	(参考訳) リスク制限監査(RLA)は、大規模な選挙の結果を検証する技術である。正確性に関する厳密な保証を提供する一方で、効率上の懸念と、絶対的ではなく統計的な結論しか提供しないという事実の両方によって、広範な採用が妨げられている。我々は、これら両方の困難に取り組み、効率を改善し、統計的検出力の質的な進歩を提供する新しい監査のファミリーを定義する。我々の新しい監査は、投票記録(cast-vote record)の標準的な概念を見直し、単一の決定ではなく複数の可能なマーク解釈を宣言できるようにすることで実現される。これは、手書きの投票用紙に頻繁に現れるマージナルマークの存在を反映できる。既存の監査インフラにわずかな変更を加えるだけで、この単純な工夫によって大幅な効率改善が実現できることを示す。これらのマークを表現する2つの方法を検討し、いずれもFuller、Harrison、Russell(IEEE Security & Privacy 2023)の形式的な意味でのリスク制限比較監査をもたらす。次に、競合監査(contested audit)と呼ぶ新しいタイプの選挙後監査を定義する。これにより、各候補者は、自身の勝利の主張を推し進める投票記録テーブルを提供することができる。これらの監査が顕著なサンプル効率を示し、(マージンとは無関係な)一定数のサンプルでリスクを制御できることを証明する。これは、証明可能な健全性を持つ監査としては初めてのものである。これらの結果は、定量的な健全性と完全性の保証を規定するゲームベースのセキュリティモデルで定式化される。これらの監査は、従来のRLAによって確認された選挙結果への異議申し立てに対処する手段を提供する。 Risk-limiting audits (RLAs) are techniques for verifying the outcomes of large elections. While they provide rigorous guarantees of correctness, widespread adoption has been impeded by both efficiency concerns and the fact that they offer statistical, rather than absolute, conclusions. We attend to both of these difficulties, defining new families of audits that improve efficiency and offer qualitative advances in statistical power. Our new audits are enabled by revisiting the standard notion of a cast-vote record so that it can declare multiple possible mark interpretations rather than a single decision; this can reflect the presence of marginal marks, which appear regularly on hand-marked ballots. We show that this simple expedient can offer significant efficiency improvements with only minor changes to existing auditing infrastructure. We consider two ways of representing these marks; both yield risk-limiting comparison audits in the formal sense of Fuller, Harrison, and Russell (IEEE Security & Privacy 2023). 
We then define a new type of post-election audit we call a contested audit. These permit each candidate to provide a cast-vote record table advancing their own claim to victory. We prove that these audits offer remarkable sample efficiency, yielding control of risk with a constant number of samples (that is independent of margin). This is a first for an audit with provable soundness. These results are formulated in a game-based security model that specifies quantitative soundness and completeness guarantees. These audits provide a means to handle contestation of election results affirmed by conventional RLAs.	翻訳日:2024-06-20 04:15:24 公開日:2024-06-17
# AuditLLM:マルチプローブアプローチによる大規模言語モデル監査ツール AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach ( http://arxiv.org/abs/2402.09334v2 ) ライセンス: Link先を確認	Maryam Amirizaniani, Elias Martin, Tanya Roosta, Aman Chadha, Chirag Shah,	(参考訳) 大規模言語モデル(LLM)は様々な分野に統合されており、その信頼性と安全性の確保が不可欠である。そのため、実践的な応用における有効性と信頼性を維持するには、厳格な調査と監査が必要となる。単一のクエリを様々に言い換えてLLMに与えると、その知識ベースや機能的能力に潜む矛盾が明らかになる可能性がある。しかし、ワークフローの実行が容易で、技術的な敷居が低い形でそのような監査を行うツールは欠けている。本デモでは,様々なLLMの性能を体系的に監査するための新しいツールであるAuditLLMを紹介する。 AuditLLMの主な機能は、1つの質問から導かれた複数のプローブを用いて与えられたLLMを監査し、モデルの理解や性能の不整合を検出することである。堅牢で信頼性があり、一貫性のあるLLMは、同じ質問を様々に言い換えたバージョンに対して、意味的に類似した応答を生成することが期待される。この前提に基づいて、AuditLLMは、ユーザが提供した単一の入力質問に基づいて、LLMの一貫性を反映した容易に解釈可能な結果を生成する。一定レベルの不整合は、潜在的なバイアス、幻覚、その他の問題の指標であることが示されている。 AuditLLMの出力は、当該LLMの問題をさらに調査するために利用できる。デモンストレーションと実用を容易にするため、AuditLLMは2つの主要なモードを提供する。 (1)リアルタイムクエリに対する応答を解析してLLMの即時監査を可能にするライブモードと,(2)詳細な分析のために複数のクエリを一度に処理することで包括的なLLM監査を容易にするバッチモードである。このツールは,標準化された監査プラットフォームを用いて,LLMの応答生成能力の理解を深めることによって,研究者と一般ユーザ双方にとって有益である。 As Large Language Models (LLMs) are integrated into various sectors, ensuring their reliability and safety is crucial. This necessitates rigorous probing and auditing to maintain their effectiveness and trustworthiness in practical applications. Subjecting LLMs to varied iterations of a single query can unveil potential inconsistencies in their knowledge base or functional capacity. However, a tool for performing such audits with an easy-to-execute workflow and a low technical threshold is lacking. In this demo, we introduce AuditLLM, a novel tool designed to audit the performance of various LLMs in a methodical way. AuditLLM's primary function is to audit a given LLM by deploying multiple probes derived from a single question, thus detecting any inconsistencies in the model's comprehension or performance. A robust, reliable, and consistent LLM is expected to generate semantically similar responses to variably phrased versions of the same question. 
Building on this premise, AuditLLM generates easily interpretable results that reflect the LLM's consistency based on a single input question provided by the user. A certain level of inconsistency has been shown to be an indicator of potential bias, hallucinations, and other issues. One could then use the output of AuditLLM to further investigate issues with the aforementioned LLM. To facilitate demonstration and practical uses, AuditLLM offers two key modes: (1) Live mode which allows instant auditing of LLMs by analyzing responses to real-time queries; and (2) Batch mode which facilitates comprehensive LLM auditing by processing multiple queries at once for in-depth analysis. This tool is beneficial for both researchers and general users, as it enhances our understanding of LLMs' capabilities in generating responses, using a standardized auditing platform.	翻訳日:2024-06-20 04:15:24 公開日:2024-06-17
# Disordered-DABS:無秩序なテキストにおける動的アスペクトベース要約のベンチマーク Disordered-DABS: A Benchmark for Dynamic Aspect-Based Summarization in Disordered Texts ( http://arxiv.org/abs/2402.10554v2 ) ライセンス: Link先を確認	Xiaobo Guo, Soroush Vosoughi,	(参考訳) アスペクトベースの要約は、特に構造化テキストにおいて顕著な進歩を遂げている。しかし、ソーシャルメディアや顧客からのフィードバックなどに見られる、無秩序で大規模なテキストを要約することは、依然として大きな課題である。現在の研究は、動的で無秩序な環境の複雑さを考慮せず、構造化されたテキスト内の事前定義されたアスペクトを主な対象としている。このギャップに対処するために、非構造化テキストに適した動的アスペクトベース要約のための新しいベンチマークであるDisordered-DABSを導入する。 Disordered-DABSはコスト効率とスケーラビリティのために既存のデータセットを適応させて開発されており、我々の包括的な実験と詳細な人的評価により、GPT-3.5のような最先端言語モデルを含む現代の要約モデルに固有の課題をもたらすことが明らかとなった。 Aspect-based summarization has seen significant advancements, especially in structured text. Yet, summarizing disordered, large-scale texts, like those found in social media and customer feedback, remains a significant challenge. Current research largely targets predefined aspects within structured texts, neglecting the complexities of dynamic and disordered environments. Addressing this gap, we introduce Disordered-DABS, a novel benchmark for dynamic aspect-based summarization tailored to unstructured text. The benchmark was developed by adapting existing datasets for cost-efficiency and scalability. Our comprehensive experiments and detailed human evaluations reveal that Disordered-DABS poses unique challenges to contemporary summarization models, including state-of-the-art language models such as GPT-3.5.	翻訳日:2024-06-20 04:15:24 公開日:2024-06-17
# 言語モデルが反映する感情とモラル感 Whose Emotions and Moral Sentiments Do Language Models Reflect? ( http://arxiv.org/abs/2402.11114v2 ) ライセンス: Link先を確認	Zihao He, Siyi Guo, Ashwin Rao, Kristina Lerman,	(参考訳) 言語モデル(LM)は、特にコンテンツモデレーションやヘイトスピーチの検出といった主観的なタスクにおいて、他のグループよりも優れた社会集団の視点を表現することが知られている。 LMが異なる視点をどう表現するかを探求するために、既存の研究は位置的アライメント、すなわちモデルが異なるグループの意見やスタンス、例えばリベラル派や保守派をいかに模倣するかに焦点を当てている。しかし、人間のコミュニケーションは感情的・道徳的な側面も含む。本研究では、感情的アライメントの問題を定義し、LMの感情的トーンと道徳的トーンが異なるグループのトーンをどのように表すかを測定する。我々は,36個のLMが生成した応答とTwitterメッセージの影響を比較することで,両者のイデオロギー的グループによるLMの重大な不一致を観察した。 LMを特定のイデオロギー的視点に向けて操った後も、モデルの不適応とリベラルな傾向は持続し、LM内の体系的偏見が示唆される。 Language models (LMs) are known to represent the perspectives of some social groups better than others, which may impact their performance, especially on subjective tasks such as content moderation and hate speech detection. To explore how LMs represent different perspectives, existing research focused on positional alignment, i.e., how closely the models mimic the opinions and stances of different groups, e.g., liberals or conservatives. However, human communication also encompasses emotional and moral dimensions. We define the problem of affective alignment, which measures how LMs' emotional and moral tone represents those of different groups. By comparing the affect of responses generated by 36 LMs to the affect of Twitter messages, we observe significant misalignment of LMs with both ideological groups. This misalignment is larger than the partisan divide in the U.S. Even after steering the LMs towards specific ideological perspectives, the misalignment and liberal tendencies of the model persist, suggesting a systemic bias within LMs.	翻訳日:2024-06-20 04:15:24 公開日:2024-06-17
# 影響分析によるインテクスト学習の実証選択 In-Context Learning Demonstration Selection via Influence Analysis ( http://arxiv.org/abs/2402.11750v2 ) ライセンス: Link先を確認	Vinay M. S., Minh-Hao Van, Xintao Wu,	(参考訳) 大規模言語モデル(LLM)がICL(In-Context Learning)機能を披露した。その利点にもかかわらず、ICLの有効性はデモの選択に大きく依存している。 ICLの最も効果的なデモンストレーションを選択することは、依然として重要な研究課題である。そこで本研究では,インフルエンス関数を用いてトレーニングサンプルの影響を解析する,InfICLという実演選択手法を提案する。最も影響力のあるトレーニングサンプルをデモとして識別することで、InfICLはICLの一般化性能を向上させることを目指している。 InfICLのコスト効率を維持するため,LLMのみを使用してサンプル入力埋め込みを生成し,高価な微調整を回避する。実世界の様々なデータセットに関する実証研究を通じて、最先端のベースラインと比較してInfICLの利点を実証する。 Large Language Models (LLMs) have showcased their In-Context Learning (ICL) capabilities, enabling few-shot learning without the need for gradient updates. Despite its advantages, the effectiveness of ICL heavily depends on the choice of demonstrations. Selecting the most effective demonstrations for ICL remains a significant research challenge. To tackle this issue, we propose a demonstration selection method named InfICL, which utilizes influence functions to analyze impacts of training samples. By identifying the most influential training samples as demonstrations, InfICL aims to enhance the ICL generalization performance. To keep InfICL cost-effective, we only use the LLM to generate sample input embeddings, avoiding expensive fine-tuning. Through empirical studies on various real-world datasets, we demonstrate advantages of InfICL compared to state-of-the-art baselines.	翻訳日:2024-06-20 04:15:24 公開日:2024-06-17
# オンラインコミュニティにおける人的価値の調査 Investigating Human Values in Online Communities ( http://arxiv.org/abs/2402.14177v2 ) ライセンス: Link先を確認	Nadav Borenstein, Arnav Arora, Lucie-Aimée Kaffee, Isabelle Augenstein,	(参考訳) 人的価値は社会科学における分析ツールとして重要な役割を担い、社会全体および個々のコミュニティにおける様々な次元の研究を可能にする。本稿では、シュワルツの価値観フレームワークの計算応用をRedditに提案することで、従来の調査に基づく人的価値の研究の限界に対処する。 Redditコンテンツの自動値抽出ツールの信頼性を確保した後、Schwartzの値で10,000のサブレディットに600万の投稿を自動的に注釈付けします。本分析は,様々なオンラインコミュニティで広く普及している価値観について,これまでに記録された知見と新たな知見の両方を提示する。例えば、議論の的となる話題について異なる意見のサブレディットを調べると、カーニヴォールよりもベガンのサブレディットにおけるより高い普遍主義的価値を発見する。さらに、地理的に特異的なサブレディットの研究は、伝統的な価値観と保守的なアメリカ合衆国の州との相関を強調している。 Human values play a vital role as an analytical tool in social sciences, enabling the study of diverse dimensions within society as a whole and among individual communities. This paper addresses the limitations of traditional survey-based studies of human values by proposing a computational application of Schwartz's values framework to Reddit, a platform organized into distinct online communities. After ensuring the reliability of automated value extraction tools for Reddit content, we automatically annotate six million posts across 10,000 subreddits with Schwartz values. Our analysis unveils both previously recorded and novel insights into the values prevalent within various online communities. For instance, when examining subreddits with differing opinions on controversial topics, we discover higher universalism values in the Vegan subreddit compared to Carnivores. Additionally, our study of geographically specific subreddits highlights the correlation between traditional values and conservative U.S. states.	翻訳日:2024-06-20 04:05:40 公開日:2024-06-17
# Unraveling Babel: LLMの多言語活性化パターンの探索とその応用 Unraveling Babel: Exploring Multilingual Activation Patterns of LLMs and Their Applications ( http://arxiv.org/abs/2402.16367v2 ) ライセンス: Link先を確認	Weize Liu, Yinlong Xu, Hongxia Xu, Jintai Chen, Xuming Hu, Jian Wu,	(参考訳) 近年,大規模言語モデル (LLM) はNLPの分野で大きなブレークスルーを遂げている。我々は,高密度LLMを微細なMoEアーキテクチャに変換する手法を設計し,その上で,専門家のアクティベーション周波数の熱マップを用いて多言語アクティベーションパターンを視覚的に研究した。異なるモデルファミリ,異なるモデルサイズ,異なる変種に関する総合的な実験を通じて,高周波アクティベーションの専門家の分布,多言語共有専門家の分布,異なる言語のアクティベーションパターンが言語ファミリと関連しているかどうか,およびアクティベーションパターンに及ぼす指導チューニングの影響を解析した。さらに、専門家のアクティベーション周波数の差分を利用して、非構造化プルーニングを2つの異なる方法で導く方法について検討した。実験結果から,提案手法はランダム・エキスパート・プルーニングを著しく上回り,一部の言語での未実行モデルの性能よりも優れていた。さらに、アクティベーションレベルの違いに基づいて異なるレイヤに対して異なるプルーニング率を設定することで、より良い結果が得られることがわかった。本研究は, LLM内の多言語処理機構を明らかにし, これらの知見を利用して, モデルプルーニングなどのアプリケーションに新たな視点を提供するものである。 Recently, large language models (LLMs) have achieved tremendous breakthroughs in the field of NLP, but still lack understanding of their internal activities when processing different languages. We designed a method to convert dense LLMs into fine-grained MoE architectures, and then visually studied the multilingual activation patterns of LLMs through expert activation frequency heatmaps. Through comprehensive experiments on different model families, different model sizes, and different variants, we analyzed the distribution of high-frequency activated experts, multilingual shared experts, whether the activation patterns of different languages are related to language families, and the impact of instruction tuning on activation patterns. We further explored leveraging the discovered differences in expert activation frequencies to guide unstructured pruning in two different ways. Experimental results demonstrated that our method significantly outperformed random expert pruning and even exceeded the performance of the original unpruned models in some languages. 
Additionally, we found that configuring different pruning rates for different layers based on activation level differences could achieve better results. Our findings reveal the multilingual processing mechanisms within LLMs and utilize these insights to offer new perspectives for applications such as model pruning.	翻訳日:2024-06-20 04:05:40 公開日:2024-06-17
# Sarathi-Serve を用いた LLM 推論におけるスループット-レイテンシトレードオフの制御 Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve ( http://arxiv.org/abs/2403.02310v3 ) ライセンス: Link先を確認	Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee,	(参考訳) 各LLMサービス要求は2つのフェーズを経る。第1のフェーズはプリフィルで、入力プロンプト全体を処理して最初の出力トークンを生成する。第2のフェーズはデコードで、残りの出力トークンを1つずつ生成する。プリフィルのイテレーションはレイテンシが高いが、入力プロンプトの並列処理によってGPU計算を飽和させる。対照的に、デコードのイテレーションはレイテンシが低いが、要求毎に1つのトークンしか処理しないため、計算利用率も低い。これにより、バッチ処理はデコードに非常に効果的になり、結果として全体的なスループットが向上する。しかし、複数のリクエストをバッチ化すると、プリフィルとデコードのイテレーションがインターリーブされ、高いスループットと低レイテンシの両方を達成することが困難になる。このスループットとレイテンシのトレードオフに対処するために,効率的なLLM推論スケジューラであるSarathi-Serveを導入する。 Sarathi-Serveは、プリフィル要求をほぼ同じサイズのチャンクに分割するチャンクドプリフィルを導入し、進行中のデコードを一時停止させることなくバッチに新しいリクエストを追加するストールフリーなスケジュールを生成する。ストールフリーなスケジューリングは、バッチ処理がレイテンシに与える影響を最小限に抑えながら、大きなバッチサイズでスループットを改善する機会を解放する。さらに、Sarathi-Serveの均一なバッチは、イテレーション間の不均衡を改善し、パイプラインバブルを最小限に抑える。我々の手法は、テールレイテンシ制約下で、モデルとハードウェアをまたいで推論性能を大幅に改善する。 vLLMと比較して、単一のA100 GPU上のMistral-7Bでは2.6倍、2つのA100 GPU上のYi-34Bモデルでは最大3.7倍のサービス容量を達成する。 Falcon-180Bでパイプライン並列性を使用する場合、Sarathi-Serveはエンドツーエンドのサービス容量で最大5.6倍の向上を提供する。 Sarathi-Serveのソースコードはhttps://github.com/microsoft/sarathi-serveで入手できる。 Each LLM serving request goes through two phases. The first is prefill which processes the entire input prompt and produces the first output token and the second is decode which generates the rest of output tokens, one-at-a-time. Prefill iterations have high latency but saturate GPU compute due to parallel processing of the input prompt. In contrast, decode iterations have low latency but also low compute utilization because a decode iteration processes only a single token per request. This makes batching highly effective for decodes and consequently for overall throughput. However, batching multiple requests leads to an interleaving of prefill and decode iterations which makes it challenging to achieve both high throughput and low latency. 
We introduce an efficient LLM inference scheduler, Sarathi-Serve, to address this throughput-latency tradeoff. Sarathi-Serve introduces chunked-prefills, which split a prefill request into near-equal-sized chunks, and creates stall-free schedules that add new requests to a batch without pausing ongoing decodes. Stall-free scheduling unlocks the opportunity to improve throughput with large batch sizes while minimizing the effect of batching on latency. Furthermore, uniform batches in Sarathi-Serve ameliorate the imbalance between iterations, resulting in minimal pipeline bubbles. Our techniques yield significant improvements in inference performance across models and hardware under tail latency constraints. For Mistral-7B on a single A100 GPU, we achieve 2.6x higher serving capacity, and up to 3.7x higher serving capacity for the Yi-34B model on two A100 GPUs, as compared to vLLM. When used with pipeline parallelism on Falcon-180B, Sarathi-Serve provides up to 5.6x gain in the end-to-end serving capacity. The source code for Sarathi-Serve is available at https://github.com/microsoft/sarathi-serve.	翻訳日:2024-06-20 04:05:40 公開日:2024-06-17
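The chunked-prefill and stall-free batching idea described in the Sarathi-Serve entry above can be sketched with a simple per-iteration token budget. This is a simplified illustration, not the Sarathi-Serve implementation: the function names and the `token_budget` parameter are assumptions made for this sketch, and it models one pending prefill only.

```python
def build_batch(num_decodes, pending_prefill, token_budget):
    """Plan one iteration. Ongoing decodes are admitted first (one token each),
    so they never stall; leftover budget goes to a chunk of the pending prefill."""
    decode_tokens = min(num_decodes, token_budget)
    prefill_tokens = min(pending_prefill, token_budget - decode_tokens)
    return decode_tokens, prefill_tokens


def schedule_prefill(prompt_len, num_decodes, token_budget):
    """Spread one prompt's prefill over several iterations, piggybacked on the
    running decodes. Returns a list of (decode_tokens, prefill_tokens)."""
    plan = []
    remaining = prompt_len
    while remaining > 0:
        decode_tokens, prefill_tokens = build_batch(num_decodes, remaining, token_budget)
        if prefill_tokens == 0:
            raise ValueError("token budget leaves no room for prefill chunks")
        plan.append((decode_tokens, prefill_tokens))
        remaining -= prefill_tokens
    return plan
```

With a fixed budget, every iteration carries a similar amount of work. That is one way to read the abstract's point about uniform batches reducing imbalance between iterations.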
# OffensiveLang: コミュニティベースの攻撃的言語データセット OffensiveLang: A Community Based Implicit Offensive Language Dataset ( http://arxiv.org/abs/2403.02472v6 ) ライセンス: Link先を確認	Amit Das, Mostafa Rahgouy, Dongji Feng, Zheng Zhang, Tathagata Bhattacharya, Nilanjana Raychawdhary, Fatemeh Jamshidi, Vinija Jain, Aman Chadha, Mary Sandage, Lauramarie Pope, Gerry Dozier, Cheryl Seals,	(参考訳) ソーシャルメディアにおけるヘイトフル言語の存在は、社会的幸福に悪影響を及ぼしている。結果として、この問題に高い優先順位で対処することが非常に重要になっている。ヘイトスピーチや攻撃的な言語は、明示的な形と暗黙的な形の両方に存在するが、後者は検出することがより困難である。この領域における現在の研究はいくつかの課題に直面している。まず、既存のデータセットは主に明示的な攻撃的なキーワードを含むテキストの収集に依存しており、これらのキーワードを欠いた暗黙的に攻撃的なコンテンツをキャプチャすることは困難である。第二に、一般的な方法論は、コミュニティ情報が提供する価値ある洞察を無視して、テキスト分析にのみ焦点をあてる傾向がある。そこで本研究では,ChatGPT 3.5 が生成する攻撃的言語データセットであるOffensiveLang について紹介する。倫理的制約によりChatGPTを用いた攻撃的テキストの生成に制限があるにもかかわらず、暗黙的な攻撃的言語を効果的に生成するプロンプトベースのアプローチを提案する。データ品質を確保するために、データセットを人間で評価する。さらに,ChatGPTを用いたプロンプトベースのゼロショット法を用いて,人間のアノテーションとChatGPTアノテーションの検知結果を比較する。既存の最先端モデルを用いて、そのような言語を検出するのがいかに効果的かを確認する。データセットは以下の通りである。 https://github.com/AmitDasRup123/OffensiveLang The widespread presence of hateful languages on social media has resulted in adverse effects on societal well-being. As a result, addressing this issue with high priority has become very important. Hate speech or offensive languages exist in both explicit and implicit forms, with the latter being more challenging to detect. Current research in this domain encounters several challenges. Firstly, the existing datasets primarily rely on the collection of texts containing explicit offensive keywords, making it challenging to capture implicitly offensive contents that are devoid of these keywords. Secondly, common methodologies tend to focus solely on textual analysis, neglecting the valuable insights that community information can provide. In this research paper, we introduce a novel dataset OffensiveLang, a community based implicit offensive language dataset generated by ChatGPT 3.5 containing data for 38 different target groups. 
Despite limitations in generating offensive texts using ChatGPT due to ethical constraints, we present a prompt-based approach that effectively generates implicit offensive language. To ensure data quality, we evaluate the dataset with human evaluation. Additionally, we employ a prompt-based zero-shot method with ChatGPT and compare the detection results between human annotation and ChatGPT annotation. We utilize existing state-of-the-art models to see how effective they are in detecting such language. The dataset is available here: https://github.com/AmitDasRup123/OffensiveLang	翻訳日:2024-06-20 04:05:40 公開日:2024-06-17
# 存在しないもので攻撃する:「非存在の証明」はDNSリゾルバのCPUを枯渇させうる Attacking with Something That Does Not Exist: 'Proof of Non-Existence' Can Exhaust DNS Resolver CPU ( http://arxiv.org/abs/2403.15233v2 ) ライセンス: Link先を確認	Olivia Gruza, Elias Heftrig, Oliver Jacobsen, Haya Schulmann, Niklas Vogel, Michael Waidner,	(参考訳) NSEC3はDNSSECにおける非存在の証明であり、クエリされたリソースがターゲットドメインに存在しないという認証された主張を提供する。 NSEC3は、クエリされたホスト名の前後に位置する、アルファベット順にソートされたハッシュ化された名前で構成される。辞書攻撃を困難にするため、ハッシュ関数を複数回反復して適用することができるが、これはNSEC3レコードのSHA-1ハッシュを計算する際のDNSリゾルバの負荷も増大させる。 NSEC3レコードの計算がDNSリゾルバに与える負荷に関する懸念は、NSEC3の仕様であるRFC5155とRFC9276ですでに検討されている。 2024年2月、NSEC3がDNSリゾルバのリソースを枯渇させる可能性にCVE-2023-50868が割り当てられ、NSEC3の追加の反復が大きな負荷を生むことが確認された。しかし,この攻撃の評価は公表されておらず,リゾルバに対する攻撃の影響は明らかにされていない。本研究では,DNSリゾルバの実装に対する NSEC3-encloser 攻撃の最初の評価を行い, 被害者リゾルバがハッシュ反復回数を制限するRFC5155の勧告に従っていても, NSEC3-encloser 攻撃はCPU命令数を72倍に増加させうることを確認した。攻撃の影響はDNSリゾルバによって異なるが、十分な量のDNSパケットがあれば、攻撃はCPU負荷を増大させ、パケットロスを引き起こす可能性があることを示す。毎秒150の悪意のあるNSEC3レコードのレートでは、DNSの実装によって、良性のDNSリクエストの損失率は2.7%から30%の間で変化する。 NSEC3-encloser攻撃の詳細な説明と実装を提供する。また,各NSEC3パラメータがNSEC3-encloser攻撃時の被害者リゾルバの負荷にどのように影響するかについて、初の分析を行う。 NSEC3 is a proof of non-existence in DNSSEC, which provides an authenticated assertion that a queried resource does not exist in the target domain. NSEC3 consists of alphabetically sorted hashed names before and after the queried hostname. To make dictionary attacks harder, the hash function can be applied in multiple iterations, which however also increases the load on the DNS resolver during the computation of the SHA-1 hashes in NSEC3 records. Concerns about the load created by the computation of NSEC3 records on the DNS resolvers were already considered in the NSEC3 specifications RFC5155 and RFC9276. In February 2024, the potential of NSEC3 to exhaust DNS resolvers' resources was assigned CVE-2023-50868, confirming that extra iterations of NSEC3 created substantial load. However, there is no published evaluation of the attack and the impact of the attack on the resolvers was not clarified. 
In this work we perform the first evaluation of the NSEC3-encloser attack against DNS resolver implementations and find that the NSEC3-encloser attack can still create a 72x increase in CPU instruction count, despite the victim resolver following RFC5155 recommendations in limiting hash iteration counts. The impact of the attack varies across the different DNS resolvers, but we show that with a sufficient volume of DNS packets the attack can increase CPU load and cause packet loss. We find that at a rate of 150 malicious NSEC3 records per second, depending on the DNS implementation, the loss rate of benign DNS requests varies between 2.7% and 30%. We provide a detailed description and implementation of the NSEC3-encloser attack. We also develop the first analysis of how each NSEC3 parameter impacts the load inflicted on the victim resolver during an NSEC3-encloser attack.	翻訳日:2024-06-20 03:55:50 公開日:2024-06-17
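To make the iteration cost in the NSEC3 entry above concrete, the following is a minimal sketch of the iterated NSEC3 hash as defined in RFC 5155 (SHA-1 over the wire-format owner name with the salt appended, then re-hashed `iterations` more times). It is not the paper's attack code; the helper names are chosen for this sketch, and the base32hex encoding used in real NSEC3 owner names is omitted.

```python
import hashlib


def to_wire(name):
    """Encode a domain name in canonical DNS wire format:
    lowercase, length-prefixed labels, terminated by a zero byte."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    wire = b""
    for label in labels:
        raw = label.encode("ascii")
        wire += bytes([len(raw)]) + raw
    return wire + b"\x00"


def nsec3_hash(name, salt, iterations):
    """Iterated hash: H(name || salt), then H(previous || salt) repeated
    `iterations` additional times. Returns the raw 20-byte digest."""
    digest = hashlib.sha1(to_wire(name) + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    return digest


def sha1_invocations(iterations):
    """Number of SHA-1 calls a validator performs per hashed name."""
    return iterations + 1
```

Each name a resolver must hash costs `iterations + 1` SHA-1 calls, and the zone operator chooses the iteration count and salt. That asymmetry is the load the abstract refers to.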
# Qibo: 中国伝統医学のための大規模言語モデル Qibo: A Large Language Model for Traditional Chinese Medicine ( http://arxiv.org/abs/2403.16056v2 ) ライセンス: Link先を確認	Heyi Zhang, Xin Wang, Zhaopeng Meng, Zhe Chen, Pengwei Zhuang, Yongzhe Jia, Dawei Xu, Wenbin Guo,	(参考訳) 大規模言語モデル(LLM)は、医学、法律、金融など多くの専門分野において大きな進歩を遂げている。しかし、中国伝統医学(TCM)においては、その理論と現代医学との本質的な違い、専門的なコーパス資源の不足、教師付き微調整のみに依存すると過信した予測につながりうるという問題などの課題がある。これらの課題に対処するため,継続事前学習と教師付き微調整を組み合わせた2段階の訓練手法を提案する。本研究の特筆すべき貢献は、TCM専用の2Gbコーパスを処理し、TCMのための事前学習データセットと命令微調整データセットをそれぞれ構築したことである。さらに,主観評価,客観評価,および3つのTCM NLPタスクを含む複数の次元でTCMにおけるLLMの性能を評価するツールであるQibo-Benchmarkを開発した。我々のパイプラインで訓練された医療用LLMであるQiboは、大幅な性能向上を示す。ベースラインと比較すると、平均主観的勝率は63%、平均客観的精度は23%から58%向上し、3つのTCM NLPタスクのRouge-Lスコアは0.72、0.61、0.55である。最後に,QiboをTCMコンサルテーションに適用するためのパイプラインを提案し,ケーススタディを通じてモデル性能を実証する。 Large Language Models (LLMs) have made significant progress in a number of professional fields, including medicine, law, and finance. However, in traditional Chinese medicine (TCM), there are challenges such as the essential differences between theory and modern medicine, the lack of specialized corpus resources, and the fact that relying only on supervised fine-tuning may lead to overconfident predictions. To address these challenges, we propose a two-stage training approach that combines continuous pre-training and supervised fine-tuning. A notable contribution of our study is the processing of a 2Gb corpus dedicated to TCM, constructing pre-training and instruction fine-tuning datasets for TCM, respectively. In addition, we have developed Qibo-Benchmark, a tool that evaluates the performance of LLMs in TCM on multiple dimensions, including subjective, objective, and three TCM NLP tasks. The medical LLM trained with our pipeline, named Qibo, exhibits significant performance boosts. 
Compared to the baselines, the average subjective win rate is 63%, the average objective accuracy improved by 23% to 58%, and the Rouge-L scores for the three TCM NLP tasks are 0.72, 0.61, and 0.55. Finally, we propose a pipeline to apply Qibo to TCM consultation and demonstrate the model performance through the case study.	翻訳日:2024-06-20 03:55:50 公開日:2024-06-17
# 言語モデル非現実的幻覚の機械的理解と緩和 Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations ( http://arxiv.org/abs/2403.18167v2 ) ライセンス: Link先を確認	Lei Yu, Meng Cao, Jackie Chi Kit Cheung, Yue Dong,	(参考訳) State-of-the-art Language Model (LM) は、世界の知識と混同する非現実的な幻覚を生じることがある。これらの幻覚の機械的原因を探るため,主観的関係クエリを用いた診断データセットを作成し,内部モデル表現による幻覚の追跡に解釈可能性手法を適用した。我々は、LM間で共有される幻覚(Llama-2, Pythia, GPT-J)の2つの一般的および別個の機械的原因を発見する。 1)知識豊か化幻覚:下層MLPにおける主観的属性知識の不足、及び 2)回答抽出幻覚:上層アテンションヘッドにおける正しい対象属性の選択に失敗する。また,この2つの幻覚の内的機械的原因が外的症状に反映されていることも判明した。本研究は,機械解析から得られた知見に基づいて,LMの内部事実リコールパイプラインの修復を目標とし,ベースラインよりも優れた性能を示す新しい幻覚緩和手法を提案する。 State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge. To explore the mechanistic causes of these hallucinations, we create diagnostic datasets with subject-relation queries and adapt interpretability methods to trace hallucinations through internal model representations. We discover two general and distinct mechanistic causes of hallucinations shared across LMs (Llama-2, Pythia, GPT-J): 1) knowledge enrichment hallucinations: insufficient subject attribute knowledge in lower layer MLPs, and 2) answer extraction hallucinations: failure to select the correct object attribute in upper layer attention heads. We also found these two internal mechanistic causes of hallucinations are reflected in external manifestations. Based on insights from our mechanistic analysis, we propose a novel hallucination mitigation method through targeted restoration of the LM's internal fact recall pipeline, demonstrating superior performance compared to baselines.	翻訳日:2024-06-20 03:55:50 公開日:2024-06-17
# Invalsiベンチマーク:イタリア語における大規模言語モデルの言語学的および数学的理解の測定 The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian ( http://arxiv.org/abs/2403.18697v2 ) ライセンス: Link先を確認	Andrea Esuli, Giovanni Puccetti,	(参考訳) イタリア語は高資源言語であるが、この言語でLarge Language Models (LLM) の生成能力を評価するためのイタリア語ネイティブのベンチマークはほとんどない。本研究では2つの新しいベンチマークを提示する。 Invalsi MATEはイタリア語における数学的理解についてモデルの性能を評価し、Invalsi ITAはイタリア語における言語理解を評価する。これらのベンチマークは、イタリアの学校システムで6歳から18歳の学生に実施され、教育と教育学の複数の専門家によって検証されたInvalsiテストに基づいている。これらのベンチマークを用いて9つの強力な言語モデルを評価し、現在の言語モデルの精度は、数学的理解ではLlama 3 70bが達成した70%、言語理解では85%が上限であることを示す。また,LLMをイタリアの学生の平均成績と比較したところ,Invalsi MATEで学生を上回ったのはLlama 3のみである一方,Invalsi ITAではほとんどのモデルが学生を上回ることがわかった。我々は,LLMの数学的および言語的理解をイタリア語で評価するための,より大規模かつ困難なベンチマークを今後開発する道を開くために,データおよび評価コードを公開する。 While Italian is a high resource language, there are few Italian-native benchmarks to evaluate the generative abilities of Large Language Models (LLMs) in this language. This work presents two new benchmarks: Invalsi MATE to evaluate models' performance on mathematical understanding in Italian and Invalsi ITA to evaluate language understanding in Italian. These benchmarks are based on the Invalsi tests, which are administered to students of age between 6 and 18 within the Italian school system and have been validated by several experts in teaching and pedagogy. We use these benchmarks to evaluate 9 powerful language models showing that current language models are bound by 70% accuracy in mathematical understanding, achieved by Llama 3 70b, and by 85% in language understanding. We also compare LLMs with the average performance of Italian students to show that Llama 3 is the only one to perform better than students on Invalsi MATE while most models outperform students on Invalsi ITA. We will make data and evaluation code openly available to pave the way for the future development of larger and harder benchmarks to evaluate LLMs' mathematical and linguistic understanding in Italian.	翻訳日:2024-06-20 03:55:50 publication placeholder
# Hammersley-Chapman-Robbins境界による機密性の保証 Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds ( http://arxiv.org/abs/2404.02866v3 ) ライセンス: Link先を確認	Kamalika Chaudhuri, Chuan Guo, Laurens van der Maaten, Saeed Mahloujifar, Mark Tygert,	(参考訳) ディープニューラルネットワークによる推論中のプライバシ保護は、最終分類器や他のタスク固有のレイヤの前に、最後のレイヤのアクティベーションにノイズを加えることで実現される。このような層の活性化は、"features"(一般的には"embeddings"や"feature embeddeds"と呼ばれる)として知られている。ノイズが加わったことで、ノイズのある特徴から入力が復元されるのを防ぐことができる。入力の可能な全ての非バイアス推定器のばらつきを低くすることは、そのような付加ノイズから生じる機密性を定量化する。ハマーズリーとチャップマンとロビンズの古典的不等式(HCR境界)から、連続で計算的に計算可能な境界が利用できる。数値実験により、HCR境界は、画像分類用の10のクラスを含むデータセット "MNIST" と "CIFAR-10" で、小さなニューラルネットに対して有効であることが示唆された。 HCR境界は、標準のディープニューラルネットワークである"ResNet-18"と"Swin-T"を、1000のクラスを含むデータセットである"ImageNet-1000"で事前トレーニングする際の入力の機密性を保証するために、それ自体では不十分であるように見える。 ImageNetの場合、機密性を提供する他の方法による機能へのノイズの追加を補うことは保証される。いずれの場合も, ノイズによる分類精度の低下がほとんどない付加雑音の量について検討した。これにより、画像分類作業の精度を大幅に低下させることなく、秘密性を高めることができる。 Protecting privacy during inference with deep neural networks is possible by adding noise to the activations in the last layers prior to the final classifiers or other task-specific layers. The activations in such layers are known as "features" (or, less commonly, as "embeddings" or "feature embeddings"). The added noise helps prevent reconstruction of the inputs from the noisy features. Lower bounding the variance of every possible unbiased estimator of the inputs quantifies the confidentiality arising from such added noise. Convenient, computationally tractable bounds are available from classic inequalities of Hammersley and of Chapman and Robbins -- the HCR bounds. Numerical experiments indicate that the HCR bounds are on the precipice of being effectual for small neural nets with the data sets, "MNIST" and "CIFAR-10," which contain 10 classes each for image classification. 
The HCR bounds appear to be insufficient on their own to guarantee confidentiality of the inputs to inference with standard deep neural nets, "ResNet-18" and "Swin-T," pre-trained on the data set, "ImageNet-1000," which contains 1000 classes. Supplementing the addition of noise to features with other methods for providing confidentiality may be warranted in the case of ImageNet. In all cases, the results reported here limit consideration to amounts of added noise that incur little degradation in the accuracy of classification from the noisy features. Thus, the added noise enhances confidentiality without much reduction in the accuracy on the task of image classification.	翻訳日:2024-06-20 01:55:10 公開日:2024-06-17
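The bound discussed in the entry above can be illustrated with a minimal sketch. It assumes isotropic Gaussian noise of standard deviation `sigma` added to the features (the abstract does not state the noise distribution), and `toy_features` is a hypothetical one-dimensional stand-in for a network. Under that assumption the chi-square divergence between the two noisy feature distributions has a closed form, and the Hammersley-Chapman-Robbins inequality gives, for any unbiased estimator of an input coordinate perturbed by `delta`, a variance of at least `delta**2 / (exp(shift**2 / sigma**2) - 1)`.

```python
import math


def hcr_lower_bound(delta, feature_shift_sq, sigma):
    """HCR lower bound on the variance of an unbiased estimator of one input coordinate.

    delta: non-zero perturbation applied to that coordinate.
    feature_shift_sq: squared Euclidean distance between the clean features at
        the perturbed and the original input (must be > 0).
    sigma: standard deviation of the (assumed) isotropic Gaussian feature noise.
    """
    # Chi-square divergence between N(f(x + delta), sigma^2 I) and N(f(x), sigma^2 I).
    chi_sq = math.expm1(feature_shift_sq / sigma ** 2)
    return delta ** 2 / chi_sq


def toy_features(x, slope=2.0):
    # Hypothetical one-dimensional "network": a linear map, for illustration only.
    return slope * x


x, sigma = 0.5, 1.0
# Any single delta gives a valid bound; taking the maximum over a grid tightens it.
best = max(
    hcr_lower_bound(d, (toy_features(x + d) - toy_features(x)) ** 2, sigma)
    for d in (1e-3, 1e-2, 1e-1, 1.0)
)
print(best)
```

For this linear toy map the small-perturbation limit recovers the Cramér-Rao value `sigma**2 / slope**2`; for a real network the feature shift would come from two forward passes.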
# ChangeMamba:時空間空間モデルによるリモートセンシング変化検出 ChangeMamba: Remote Sensing Change Detection with Spatio-Temporal State Space Model ( http://arxiv.org/abs/2404.03425v4 ) ライセンス: Link先を確認	Hongruixuan Chen, Jian Song, Chengxi Han, Junshi Xia, Naoto Yokoya,	(参考訳) 畳み込みニューラルネットワーク(CNN)とトランスフォーマーは、リモートセンシング変化検出(CD)の分野で目覚ましい進歩を遂げた。しかし、両方のアーキテクチャには固有の欠点がある。CNNは、より広い空間的コンテキストをキャプチャする能力を阻害する、限定的な受容的フィールドによって制約されている一方で、Transformerは計算集約的であり、大規模なデータセット上でトレーニングとデプロイにコストがかかる。近年、状態空間モデルに基づくMambaアーキテクチャは、上記の2つのアーキテクチャの欠点を効果的に補うことができる一連の自然言語処理タスクにおいて、顕著な性能を示している。本稿では,リモートセンシングCDタスクにおけるMambaアーキテクチャの可能性について検討する。我々は,2値変化検出 (BCD), 意味変化検出 (SCD), 建物損傷評価 (BDA) に対応するフレームワークであるMambaBCD, MambaSCD, MambaBDAを調整した。 3つのフレームワークはいずれも最先端のVisual Mambaアーキテクチャをエンコーダとして採用しており、入力画像からグローバルな空間的情報を完全に学習することができる。 3つのアーキテクチャで利用可能な変更デコーダについて,Mambaアーキテクチャと自然に結合可能な3つの時空間関係モデリング機構を提案し,その特性をフル活用して複数時空間特徴の時空間相互作用を実現し,正確な変更情報を得る。 5つのベンチマークデータセットにおいて、提案するフレームワークは、複雑なトレーニング戦略やトリックを使わずに、現在のCNNおよびTransformerベースのアプローチより優れており、CDタスクにおけるMambaアーキテクチャの可能性を完全に実証している。さらなる実験は、アーキテクチャが劣化したデータに対して非常に堅牢であることを示している。ソースコードはhttps://github.com/ChenHongruixuan/MambaCDで入手できる。 Convolutional neural networks (CNN) and Transformers have made impressive progress in the field of remote sensing change detection (CD). However, both architectures have inherent shortcomings: CNN are constrained by a limited receptive field that may hinder their ability to capture broader spatial contexts, while Transformers are computationally intensive, making them costly to train and deploy on large datasets. Recently, the Mamba architecture, based on state space models, has shown remarkable performance in a series of natural language processing tasks, which can effectively compensate for the shortcomings of the above two architectures. In this paper, we explore for the first time the potential of the Mamba architecture for remote sensing CD tasks. 
We tailor the corresponding frameworks, called MambaBCD, MambaSCD, and MambaBDA, for binary change detection (BCD), semantic change detection (SCD), and building damage assessment (BDA), respectively. All three frameworks adopt the cutting-edge Visual Mamba architecture as the encoder, which allows full learning of global spatial contextual information from the input images. For the change decoder, which is available in all three architectures, we propose three spatio-temporal relationship modeling mechanisms, which can be naturally combined with the Mamba architecture and fully utilize its attribute to achieve spatio-temporal interaction of multi-temporal features, thereby obtaining accurate change information. On five benchmark datasets, our proposed frameworks outperform current CNN- and Transformer-based approaches without using any complex training strategies or tricks, fully demonstrating the potential of the Mamba architecture in CD tasks. Further experiments show that our architecture is quite robust to degraded data. The source code will be available in https://github.com/ChenHongruixuan/MambaCD	翻訳日:2024-06-20 01:55:10 公開日:2024-06-17
# 白人男性、黒人女性が助ける? LLMで言語機関の社会的バイアスをベンチマーク White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs ( http://arxiv.org/abs/2404.10508v2 ) ライセンス: Link先を確認	Yixin Wan, Kai-Wei Chang,	(参考訳) 言語エージェンシーは、テキストにおける社会的偏見を評価する上で重要な側面である。いくつかの研究が人文言語におけるエージェンシー関連バイアスに近づいた一方で、LLM(Large Language Model)生成コンテンツにおけるそのようなバイアスについて、非常に限定的な研究がなされている。さらに、過去の研究は、しばしばテキスト内のエージェント語とコミュニティブ語を識別する文字列マッチング技術に依存しており、それは言語エージェンシーを正確に分類するに足らない。本稿では,言語庁バイアス評価(LABE, Language Agency Bias Evaluation)ベンチマークについて紹介する。 LABEは5,400のテンプレートベースのプロンプト、正確なエージェンシー分類器、およびそれに対応するバイアスメトリクスを利用して、3つのテキスト生成タスク(バイオグラフィー、教授レビュー、参照レター)でLSMの性別、人種、および交叉言語エージェンシーバイアスをテストする。 3,724のエージェント文と共用文からなるLanguage Agency Classification (LAC)データセットを,より良く,より正確な自動エージェント分類器の構築に寄与し,リリースする。 LABEを用いて,近年の3つのLLM(ChatGPT,Llama3,Mistral)において,未探索言語エージェンシーの社会的偏見を明らかにした。 1)同一のテキストカテゴリでは,LLM世代は人文テキストよりもジェンダーバイアスのレベルが高く,(2)ほとんどの世代タスクでは,モデルが他のバイアスのレベルよりもはるかに高い交叉バイアスのレベルを示す。性別と人種の少数派(黒人女性など)の交差点にいる人々は、一貫して低レベルの機関を持つテキストによって記述されている; (3) 調査された3つのLSMのうち、Llama3は言語エージェンシーにおいて最大の全体的なバイアスを示す; (4) プロンプトベースの緩和はLLMにおける言語エージェンシーのバイアスを解決するのに失敗するだけでなく、しばしば生成されたテキストにおけるバイアスが悪化する。 Language agency is an important aspect of evaluating social biases in texts. While several studies approached agency-related bias in human-written language, very limited research has investigated such biases in Large Language Model (LLM)-generated content. In addition, previous research often relies on string-matching techniques to identify agentic and communal words within texts, which fall short of accurately classifying language agency. We introduce the novel Language Agency Bias Evaluation (LABE) benchmark, which comprehensively evaluates biases in LLMs by analyzing agency levels attributed to different demographic groups in model generations. 
LABE leverages 5,400 template-based prompts, an accurate agency classifier, and corresponding bias metrics to test for gender, racial, and intersectional language agency biases in LLMs on 3 text generation tasks: biographies, professor reviews, and reference letters. To build better and more accurate automated agency classifiers, we also contribute and release the Language Agency Classification (LAC) dataset, consisting of 3,724 agentic and communal sentences. Using LABE, we unveil previously under-explored language agency social biases in 3 recent LLMs: ChatGPT, Llama3, and Mistral. We observe that: (1) For the same text category, LLM generations demonstrate higher levels of gender bias than human-written texts; (2) On most generation tasks, models show remarkably higher levels of intersectional bias than the other bias aspects. Those who are at the intersection of gender and racial minority groups -- such as Black females -- are consistently described by texts with lower levels of agency; (3) Among the 3 LLMs investigated, Llama3 demonstrates greatest overall bias in language agency; (4) Not only does prompt-based mitigation fail to resolve language agency bias in LLMs, but it frequently leads to the exacerbation of biases in generated texts.	翻訳日:2024-06-20 01:44:57 公開日:2024-06-17
# ガウス・スティング・デコーダによる3次元対応型生成逆数ネットワークの構築 Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks ( http://arxiv.org/abs/2404.10625v2 ) ライセンス: Link先を確認	Florian Barthel, Arian Beckmann, Wieland Morgenstern, Anna Hilsmann, Peter Eisert,	(参考訳) EG3D や GIRAFFE のような NeRF ベースの3D-aware Generative Adversarial Networks (GAN) は、非常に高いレンダリング品質を示す。第一に、NeRFレンダリングの計算上の重要な要求は、モバイルやVR/ARヘッドセットのような低消費電力デバイスでの使用を妨げます。第二に、ニューラルネットワークに基づく暗黙の表現は、VR環境やビデオゲームのような明示的な3Dシーンに組み込むのは難しい。 3D Gaussian Splatting (3DGS)は、高フレームレートで効率的にレンダリングできる明示的な3D表現を提供することによって、これらの制限を克服する。本研究では,NeRFをベースとした3次元GANの高画質化と,3DGSの柔軟性と計算上の利点を組み合わせた新しい手法を提案する。暗黙的なNeRF表現を明示的な3Dガウススプラッティング属性にマッピングするデコーダをトレーニングすることにより、3Dガウススプラッティングのエコシステムに3D GANの表現多様性と品質を初めて組み込むことができる。さらに,本手法により,高分解能GANインバージョンとリアルタイムGAN編集が可能となる。プロジェクトページ:florian-barthel.github.io/gaussian_decoder NeRF-based 3D-aware Generative Adversarial Networks (GANs) like EG3D or GIRAFFE have shown very high rendering quality under large representational variety. However, rendering with Neural Radiance Fields poses challenges for 3D applications: First, the significant computational demands of NeRF rendering preclude its use on low-power devices, such as mobiles and VR/AR headsets. Second, implicit representations based on neural networks are difficult to incorporate into explicit 3D scenes, such as VR environments or video games. 3D Gaussian Splatting (3DGS) overcomes these limitations by providing an explicit 3D representation that can be rendered efficiently at high frame rates. In this work, we present a novel approach that combines the high rendering quality of NeRF-based 3D-aware GANs with the flexibility and computational advantages of 3DGS. By training a decoder that maps implicit NeRF representations to explicit 3D Gaussian Splatting attributes, we can integrate the representational diversity and quality of 3D GANs into the ecosystem of 3D Gaussian Splatting for the first time. 
Additionally, our approach allows for a high resolution GAN inversion and real-time GAN editing with 3D Gaussian Splatting scenes. Project page: florian-barthel.github.io/gaussian_decoder	翻訳日:2024-06-20 01:44:57 公開日:2024-06-17
# NLPモデルの潜在概念に基づく説明 Latent Concept-based Explanation of NLP Models ( http://arxiv.org/abs/2404.12545v2 ) ライセンス: Link先を確認	Xuemin Yu, Fahim Dalvi, Nadir Durrani, Marzia Nouri, Hassan Sajjad,	(参考訳) ディープラーニングモデルによる予測の解釈と理解は、本質的に不透明な性質のため、非常に難しい課題となる。これらの予測を説明することを目的とした以前の取り組みの多くは、入力機能、特にNLPモデル内の単語に依存していた。しかし、これらの説明は、これらの単語の離散的な性質と文脈的冗長性の欠如により、あまり意味を示さないことが多い。この制限に対処するために、潜伏概念に基づく予測のための説明を生成するLACOAT(Latent Concept Attribution Method)を導入する。我々の基本的な直感は、単語が使用されるコンテキストに基づいて複数のファセットを表現できることである。したがって、文脈において単語が与えられた場合、トレーニングプロセスから派生した潜在空間はその単語の特定の面を反映する。 LACOATは、有能な入力語の表現をトレーニング潜在空間にマッピングすることで、予測の潜在文脈に基づく説明を提供することによって機能する。 Interpreting and understanding the predictions made by deep learning models poses a formidable challenge due to their inherently opaque nature. Many previous efforts aimed at explaining these predictions rely on input features, specifically, the words within NLP models. However, such explanations are often less informative due to the discrete nature of these words and their lack of contextual verbosity. To address this limitation, we introduce the Latent Concept Attribution method (LACOAT), which generates explanations for predictions based on latent concepts. Our foundational intuition is that a word can exhibit multiple facets, contingent upon the context in which it is used. Therefore, given a word in context, the latent space derived from our training process reflects a specific facet of that word. LACOAT functions by mapping the representations of salient input words into the training latent space, allowing it to provide latent context-based explanations of the prediction.	翻訳日:2024-06-20 01:44:57 公開日:2024-06-17
# BiLO: PDE逆問題に対するバイレベルローカル演算子学習 BiLO: Bilevel Local Operator Learning for PDE inverse problems ( http://arxiv.org/abs/2404.17789v2 ) ライセンス: Link先を確認	Ray Zirui Zhang, Xiaohui Xie, John Lowengrub,	(参考訳) 本稿では、PDE逆問題を二段階最適化問題として定式化することにより、偏微分方程式(PDE)の逆問題の解法を提案する。上層部ではPDEパラメータに関してデータ損失を最小限に抑える。下層部では、与えられたPDEパラメータの近傍でPDE解演算子を局所的に近似するようにニューラルネットワークを訓練し、上層部最適化問題に対する降下方向の正確な近似を可能にする。下位レベル損失関数は、PDEパラメータに対する残差と微分の両方のL2ノルムを含む。上層と下層の両方の最適化問題に勾配勾配を同時に適用し,有効かつ高速なアルゴリズムを実現する。この手法はBiLO(Bilevel Local Operator Learning)と呼ばれ、補助変数の導入によってPDE内の未知の関数を効率的に推論することができる。複数のPDEシステムに対する広範な実験により,本手法は強いPDE制約を強制し,スパースかつノイズの多いデータに対して堅牢であり,既存手法のソフトPDE制約に固有の残差とデータ損失のバランスを取る必要がなくなることを示した。 We propose a new neural network based method for solving inverse problems for partial differential equations (PDEs) by formulating the PDE inverse problem as a bilevel optimization problem. At the upper level, we minimize the data loss with respect to the PDE parameters. At the lower level, we train a neural network to locally approximate the PDE solution operator in the neighborhood of a given set of PDE parameters, which enables an accurate approximation of the descent direction for the upper level optimization problem. The lower level loss function includes the L2 norms of both the residual and its derivative with respect to the PDE parameters. We apply gradient descent simultaneously on both the upper and lower level optimization problems, leading to an effective and fast algorithm. The method, which we refer to as BiLO (Bilevel Local Operator learning), is also able to efficiently infer unknown functions in the PDEs through the introduction of an auxiliary variable. Through extensive experiments over multiple PDE systems, we demonstrate that our method enforces strong PDE constraints, is robust to sparse and noisy data, and eliminates the need to balance the residual and the data loss, which is inherent to the soft PDE constraints in many existing methods.	翻訳日:2024-06-20 01:44:57 公開日:2024-06-17
# モンテカルロ木探索が反復推論学習による推論を強化 Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning ( http://arxiv.org/abs/2405.00451v2 ) ライセンス: Link先を確認	Yuxi Xie, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy P. Lillicrap, Kenji Kawaguchi, Michael Shieh,	(参考訳) 我々は,AlphaZero が採用した戦略に触発された反復的選好学習プロセスを通じて,Large Language Models (LLM) の推論能力の向上を目的としたアプローチを導入する。我々の研究は、MCTS(Monte Carlo Tree Search)を利用して好みデータを反復的に収集し、そのルックアヘッド機能を利用して、インスタンスレベルの報酬をよりきめ細かいステップレベルの信号に分解する。中間段階の整合性を高めるため, 結果検証と段階的自己評価を併用し, 新たに生成したデータの品質評価を継続的に更新する。提案アルゴリズムはDPO(Direct Preference Optimization)を用いて,新たに生成されたステップレベルの優先度データを用いてLCMポリシーを更新する。理論的分析は、自己改善を成功させるために、オンラインサンプルデータを使用することの重要性を明らかにしている。様々な算術的および常識的推論タスクに対する広範囲な評価は、既存のモデルよりも顕著な性能向上を示している。例えば、GSM8K、MATH、ARC-CのMistral-7B Supervised Fine-Tuning(SFT)ベースラインは精度が81.8\%$(+$5.9\%$)、34.7\%$(+$5.8\%$)、76.4\%$(+$15.8\%$)と大幅に向上している。さらに、我々の研究は、トレーニングと推論計算のトレードオフを掘り下げ、我々の方法がパフォーマンス向上を効果的に最大化する方法についての洞察を提供する。私たちのコードはhttps://github.com/YuxiXie/MCTS-DPO.comで公開されています。 We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals. To enhance consistency in intermediate steps, we combine outcome validation and stepwise self-evaluation, continually updating the quality assessment of newly generated data. The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data. Theoretical analysis reveals the importance of using on-policy sampled data for successful self-improving. Extensive evaluations on various arithmetic and commonsense reasoning tasks demonstrate remarkable performance improvements over existing models. 
For instance, our approach outperforms the Mistral-7B Supervised Fine-Tuning (SFT) baseline on GSM8K, MATH, and ARC-C, with substantial increases in accuracy to $81.8\%$ (+$5.9\%$), $34.7\%$ (+$5.8\%$), and $76.4\%$ (+$15.8\%$), respectively. Additionally, our research delves into the training and inference compute tradeoff, providing insights into how our method effectively maximizes performance gains. Our code is publicly available at https://github.com/YuxiXie/MCTS-DPO.	翻訳日:2024-06-20 01:35:12 公開日:2024-06-17
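The policy update mentioned in the entry above uses Direct Preference Optimization. The sketch below shows only the standard DPO loss for a single preference pair; how the paper builds step-level pairs from the search tree is not shown, and the function name, the `beta` default, and the log-probability values are illustrative assumptions.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)), computed stably as softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: zero margin.
loss_equal = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Policy has shifted probability mass toward the chosen response.
loss_better = dpo_loss(-4.0, -8.0, -5.0, -7.0)
print(loss_equal, loss_better)
```

The loss falls as the policy favors the chosen response more strongly than the reference does, which is the signal the preference pairs are meant to provide.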
# グラフにおける単音素半空間学習のための効率的なアルゴリズム Efficient Algorithms for Learning Monophonic Halfspaces in Graphs ( http://arxiv.org/abs/2405.00853v2 ) ライセンス: Link先を確認	Marco Bressan, Emmanuel Esposito, Maximilian Thiessen,	(参考訳) グラフの頂点上で二値分類器を学習する問題について検討する。特に、ある抽象的な意味で凸である頂点の分割である単音素半空間によって与えられる分類器を考える。単音素半空間や測地的半空間のような関連する概念は、最近関心を集めており、それらの性質(例えば、VC次元)と基礎となるグラフ$G$の構造との間にいくつかの関係が見出されている。我々は、教師付き、オンライン、アクティブな設定において、単音素半空間を学習するためのいくつかの新しい結果を証明する。我々の主な結果は、単音素半空間が、$n = |V(G)|$ の多項式時間で、ほぼ最適なパッシブサンプル複雑性で学習できるということである。そのために、単音素半空間に関するいくつかの構造的洞察と2-充足可能性問題(2-SAT)への還元に基づいて、一貫した仮説チェックのための多項式時間アルゴリズムを考案する。オンラインおよびアクティブな設定でも同様の結果を証明する。また、概念クラスは遅延$\operatorname{poly}(n)$で列挙でき、経験的リスク最小化は$2^{\omega(G)}\operatorname{poly}(n)$の時間で実行できることを示す。ここで$\omega(G)$は$G$のクリーク数である。これらの結果は、文献(González et al., 2020)の未解決問題に答え、これらの問題のいくつかがNP困難である測地的半空間との対比を示す(Seiffarth et al., 2023)。 We study the problem of learning a binary classifier on the vertices of a graph. In particular, we consider classifiers given by monophonic halfspaces, partitions of the vertices that are convex in a certain abstract sense. Monophonic halfspaces, and related notions such as geodesic halfspaces, have recently attracted interest, and several connections have been drawn between their properties (e.g., their VC dimension) and the structure of the underlying graph $G$. We prove several novel results for learning monophonic halfspaces in the supervised, online, and active settings. Our main result is that a monophonic halfspace can be learned with near-optimal passive sample complexity in time polynomial in $n = |V(G)|$. This requires us to devise a polynomial-time algorithm for consistent hypothesis checking, based on several structural insights on monophonic halfspaces and on a reduction to $2$-satisfiability. We prove similar results for the online and active settings. 
We also show that the concept class can be enumerated with delay $\operatorname{poly}(n)$, and that empirical risk minimization can be performed in time $2^{\omega(G)}\operatorname{poly}(n)$ where $\omega(G)$ is the clique number of $G$. These results answer open questions from the literature (González et al., 2020), and show a contrast with geodesic halfspaces, for which some of the said problems are NP-hard (Seiffarth et al., 2023).	翻訳日:2024-06-20 01:35:12 公開日:2024-06-17
# SurfPro:連続表面に基づくタンパク質の機能設計 SurfPro: Functional Protein Design Based on Continuous Surface ( http://arxiv.org/abs/2405.06693v2 ) ライセンス: Link先を確認	Zhenqiao Song, Tinglin Huang, Lei Li, Wengong Jin,	(参考訳) 所望の機能を持つタンパク質をどうやって設計できるのか? 我々は、幾何学的構造と生化学的性質の両方がタンパク質の機能に重要であるという化学的直感に動機付けられている。本稿では,期待表面の機能性タンパク質の生成法であるSurfProとその生化学的性質について述べる。 SurfProは、タンパク質表面の幾何学的形状及び生化学的特徴を段階的にモデル化する階層エンコーダと、アミノ酸配列を生成する自己回帰デコーダとを備える。本稿では,標準的な逆フォールディングベンチマークCATH 4.2でSurfProを評価し,タンパク質結合体設計と酵素設計の2つの機能的タンパク質設計タスクについて検討した。我々のSurfProは、従来の逆フォールディング法を一貫して上回り、CATH 4.2で57.78%の回復率、タンパク質-タンパク質結合と酵素-基質相互作用のスコアで高い成功率を達成した。 How can we design proteins with desired functions? We are motivated by a chemical intuition that both geometric structure and biochemical properties are critical to a protein's function. In this paper, we propose SurfPro, a new method to generate functional proteins given a desired surface and its associated biochemical properties. SurfPro comprises a hierarchical encoder that progressively models the geometric shape and biochemical features of a protein surface, and an autoregressive decoder to produce an amino acid sequence. We evaluate SurfPro on a standard inverse folding benchmark CATH 4.2 and two functional protein design tasks: protein binder design and enzyme design. Our SurfPro consistently surpasses previous state-of-the-art inverse folding methods, achieving a recovery rate of 57.78% on CATH 4.2 and higher success rates in terms of protein-protein binding and enzyme-substrate interaction scores.	翻訳日:2024-06-20 01:35:12 公開日:2024-06-17
# 機能的に重要な部位と小分子の基質によって誘導される生成酵素設計 Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates ( http://arxiv.org/abs/2405.08205v2 ) ライセンス: Link先を確認	Zhenqiao Song, Yunlong Zhao, Wenxian Shi, Wengong Jin, Yang Yang, Lei Li,	(参考訳) 酵素は、化学反応を加速できる遺伝子コード化された生体触媒である。機能性酵素をどのように設計するか? 本稿では,酵素設計のための統一モデルであるEnzyGenを提案する。我々のキーとなるアイデアは、酵素のアミノ酸配列とその3次元(3D)座標を、所望の触媒機能に対応する機能的に重要な部位と基質に基づいて生成することである。これらの部位は酵素データベースから自動的に採掘される。 EnzyGenは、タンパク質配列全体における長距離相関と、3D空間における最も近いアミノ酸の局所的影響の両方を捉える、新しいインターリービングネットワークと近隣の同変層で構成されている。生成モデルを学習するために、配列生成損失、位置予測損失、酵素-基質相互作用損失を含む共同学習目標を考案する。さらに、タンパク質データバンク(PDB)内のすべての利用可能な酵素をカバーする3157の酵素ファミリーを持つデータセットであるEnzyBenchを構築した。実験の結果、EnzyGenは323の試験ファミリで一貫して最高のパフォーマンスを達成し、基質結合親和性の点で10.79%のベースラインを上回りました。これらの結果から, 高い親和性を有する特定の基質に結合する, 十分に折りたたみされた, 効果的な酵素を設計する上で, EnzyGenが優れていることが示唆された。 Enzymes are genetically encoded biocatalysts capable of accelerating chemical reactions. How can we automatically design functional enzymes? In this paper, we propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families. Our key idea is to generate an enzyme's amino acid sequence and their three-dimensional (3D) coordinates based on functionally important sites and substrates corresponding to a desired catalytic function. These sites are automatically mined from enzyme databases. EnzyGen consists of a novel interleaving network of attention and neighborhood equivariant layers, which captures both long-range correlation in an entire protein sequence and local influence from nearest amino acids in 3D space. To learn the generative model, we devise a joint training objective, including a sequence generation loss, a position prediction loss and an enzyme-substrate interaction loss. We further construct EnzyBench, a dataset with 3157 enzyme families, covering all available enzymes within the protein data bank (PDB). 
Experimental results show that our EnzyGen consistently achieves the best performance across all 323 testing families, surpassing the best baseline by 10.79% in terms of substrate binding affinity. These findings demonstrate EnzyGen's superior capability in designing well-folded and effective enzymes binding to specific substrates with high affinities.	翻訳日:2024-06-20 01:35:12 公開日:2024-06-17
# $\varepsilon$-fairnessの不公平 The Unfairness of $\varepsilon$-Fairness ( http://arxiv.org/abs/2405.09360v2 ) ライセンス: Link先を確認	Tolulope Fadina, Thorsten Schmidt,	(参考訳) 意思決定プロセスの公平性は確率的指標を用いて定量化されることが多い。しかし、これらの指標は、実際の不公平な結果を完全には捉えていないかもしれない。本稿では,意思決定プロセスの現実的影響をより正確に測定するために,ユーティリティベースのアプローチを採用する。特に、$\varepsilon$-fairnessという概念が採用された場合、現実世界の文脈で最大に不公平な結果をもたらす可能性があることを示す。さらに, 虚偽陰性に関する不使用データの一般的な問題に対して, 重要な公平性を考慮した設定の削減を提案する。本研究は,大学入学と信用リスク評価の2つの実例を用いて実施した。分析の結果,従来の確率に基づく評価は公平性を示唆するが,実用性に基づくアプローチは真に平等を達成するために必要な行動を明らかにする。例えば,大学入試の場合,修了率の向上は公平性の確保に不可欠であることが判明した。本論文は, 公平性を評価する上で, 現実の文脈を考えることの重要性を強調した。 Fairness in decision-making processes is often quantified using probabilistic metrics. However, these metrics may not fully capture the real-world consequences of unfairness. In this article, we adopt a utility-based approach to more accurately measure the real-world impacts of decision-making process. In particular, we show that if the concept of $\varepsilon$-fairness is employed, it can possibly lead to outcomes that are maximally unfair in the real-world context. Additionally, we address the common issue of unavailable data on false negatives by proposing a reduced setting that still captures essential fairness considerations. We illustrate our findings with two real-world examples: college admissions and credit risk assessment. Our analysis reveals that while traditional probability-based evaluations might suggest fairness, a utility-based approach uncovers the necessary actions to truly achieve equality. For instance, in the college admission case, we find that enhancing completion rates is crucial for ensuring fairness. Summarizing, this paper highlights the importance of considering the real-world context when evaluating fairness.	翻訳日:2024-06-20 01:35:12 公開日:2024-06-17
# Tiny Refinements Elicit Resilience: LLMレッドチーミングに対する効率的なプレフィックスモデルに向けて Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming ( http://arxiv.org/abs/2405.12604v2 ) ライセンス: Link先を確認	Jiaxu Liu, Xiangyu Yin, Sihao Wu, Jianhong Wang, Meng Fang, Xinping Yi, Xiaowei Huang,	(参考訳) 大規模言語モデル(LLM)のレッドチーム戦略の普及に伴い,LLM防衛戦略の安全性と堅牢性向上に関する文献の不足がますます顕著になっている。本稿では,LLMをベースとしたsentinelモデルを,入力プロンプトをわずか($<30$)の追加トークンで再構成し,ターゲットLLMからの応答の毒性を効果的に低減するプラグイン・アンド・プレイのプレフィックスモジュールとして導入する。センチネルモデルは、大規模なターゲットモデルを微調整する際のパラメータ非効率性とモデルへのアクセス制限を自然に克服する。我々はPPO(Proximal Policy Optimization)を用いてレッドチームとセンチネルモデルの両方を動的に最適化し、エージェント間の複雑な相互作用を管理するためにマルチエージェントの中央集権的批評家にインスパイアされた価値ヘッド共有メカニズムを取り入れたインターリーブ型トレーニング方式を採用している。テキストからテキスト、テキストから画像にわたる広範な実験により、Llama-2、GPT-3.5、Stable-Diffusionのような大規模モデルを扱う場合であっても、有害な出力を緩和する本手法の有効性が実証され、さまざまなアプリケーションの安全性とロバスト性を高める上での本フレームワークの可能性が示された。 With the proliferation of red-teaming strategies for Large Language Models (LLMs), the deficiency in the literature about improving the safety and robustness of LLM defense strategies is becoming increasingly pronounced. This paper introduces the LLM-based sentinel model as a plug-and-play prefix module designed to reconstruct the input prompt with just a few ($<30$) additional tokens, effectively reducing toxicity in responses from target LLMs. The sentinel model naturally overcomes the parameter inefficiency and limited model accessibility of fine-tuning large target models. We employ an interleaved training regimen using Proximal Policy Optimization (PPO) to optimize both red team and sentinel models dynamically, incorporating a value head-sharing mechanism inspired by the multi-agent centralized critic to manage the complex interplay between agents. 
Our extensive experiments across text-to-text and text-to-image demonstrate the effectiveness of our approach in mitigating toxic outputs, even when dealing with larger models like Llama-2, GPT-3.5 and Stable-Diffusion, highlighting the potential of our framework in enhancing safety and robustness in various applications.	翻訳日:2024-06-20 01:25:27 公開日:2024-06-17
# Occam Gradient Descent Occam Gradient Descent ( http://arxiv.org/abs/2405.20194v2 ) ライセンス: Link先を確認	B. N. Kausik,	(参考訳) ディープラーニングニューラルネットワークモデルは、問題領域に適応するのに十分な大きさでなければならないが、勾配降下時のトレーニングデータの過度な適合を回避するには十分である。これらの競合する要求のバランスをとるために、トランスフォーマーのような過剰な予測されたディープラーニングモデルは、大きなデータセット上で1つのエポックのために訓練されるため、コンピューティングリソースとトレーニングデータの両方で非効率である。これらの非効率性に対応するために、我々は学習理論を利用してOccam Gradient Descentを導出する。Occam Gradient Descentはモデルサイズを適応的に減少させ、一般化誤差を最小限に抑えるアルゴリズムである。対照的に、従来の勾配降下は、一般化誤差によらず、嵌合誤差を極度に最小化する。提案アルゴリズムは, ニューラルネットワークの重み空間とトポロジカルサイズを同時に下降させるとともに, 従来の勾配勾配よりも精度, 計算, モデル圧縮に優れる。 Deep learning neural network models must be large enough to adapt to their problem domain, while small enough to avoid overfitting training data during gradient descent. To balance these competing demands, overprovisioned deep learning models such as transformers are trained for a single epoch on large data sets, and hence inefficient with both computing resources and training data. In response to these inefficiencies, we exploit learning theory to derive Occam Gradient Descent, an algorithm that interleaves adaptive reduction of model size to minimize generalization error, with gradient descent on model weights to minimize fitting error. In contrast, traditional gradient descent greedily minimizes fitting error without regard to generalization error. Our algorithm simultaneously descends the space of weights and topological size of any neural network without modification, and is effective in our experiments in outperforming traditional gradient descent with or without post-train pruning in accuracy, compute and model compression.	翻訳日:2024-06-20 01:15:43 公開日:2024-06-17
# MODABS:動的アスペクトに基づく要約のための多目的学習 MODABS: Multi-Objective Learning for Dynamic Aspect-Based Summarization ( http://arxiv.org/abs/2406.03479v2 ) ライセンス: Link先を確認	Xiaobo Guo, Soroush Vosoughi,	(参考訳) オンラインコンテンツの急速な普及は、動的なアスペクトベースの要約が目立つ効果的な要約方法を必要とする。既知のアスペクトの固定セットを前提とする従来のものとは異なり、このアプローチは入力テキストのさまざまな側面に適応する。本稿では,Longformer-Encoder-Decoderを用いた新しい多目的学習フレームワークを提案する。このフレームワークはアスペクト数予測を最適化し、各アスペクトに対する生成された要約と参照の相違を最小化し、アスペクト固有の要約間の相違を最大化する。大規模な実験により,本手法は,単一アスペクトの要約品質を犠牲にすることなく,生成されたアスペクトと参照アスペクトの効果的なアライメントによって,3つの多様なデータセットのベースラインを著しく上回ることがわかった。 The rapid proliferation of online content necessitates effective summarization methods, among which dynamic aspect-based summarization stands out. Unlike its traditional counterpart, which assumes a fixed set of known aspects, this approach adapts to the varied aspects of the input text. We introduce a novel multi-objective learning framework employing a Longformer-Encoder-Decoder for this task. The framework optimizes aspect number prediction, minimizes disparity between generated and reference summaries for each aspect, and maximizes dissimilarity across aspect-specific summaries. Extensive experiments show our method significantly outperforms baselines on three diverse datasets, largely due to the effective alignment of generated and reference aspect counts without sacrificing single-aspect summarization quality.	翻訳日:2024-06-20 01:15:43 公開日:2024-06-17
# Feriji: フランス語-ザルマ語対訳コーパス、用語集、翻訳システム Feriji: A French-Zarma Parallel Corpus, Glossary & Translator ( http://arxiv.org/abs/2406.05888v2 ) ライセンス: Link先を確認	Mamadou K. Keita, Elysabhete Amadou Ibrahim, Habibatou Abdoulaye Alfari, Christopher Homan,	(参考訳) 近年,機械翻訳(MT)が急速に発展し,複数の言語を精度良く翻訳できるモデルの開発が進んでいる。しかし、この分野におけるアフリカの言語の表現は、言語的な複雑さと限られた資源のために改善する必要がある。これは、ニジェールと近隣諸国で500万人以上の人々が話すソンハイ語(ニロ・サハラ語族)の方言であるザルマ語にも当てはまる。本稿では,MT向けに設計された最初の頑健なフランス語-ザルマ語対訳コーパスおよび用語集であるFerijiを紹介する。ザルマ語61,085文とフランス語42,789文を含むコーパスと4,062語の用語集は,ザルマ語のリソース不足に対処するための重要な一歩である。我々はデータセット上で3つの大規模言語モデルを微調整し、最高の性能のモデルでBLEUスコア30.06を得る。さらに,流暢さ,理解度,可読性に関する人手評価によってモデルを評価し,コーパスとモデルの重要性と影響についても検討した。私たちの貢献は、重要な言語ギャップを埋め、本質的でありながら見落とされてきたアフリカの先住民言語を促進するのに役立ちます。 Machine translation (MT) is a rapidly expanding field that has experienced significant advancements in recent years with the development of models capable of translating multiple languages with remarkable accuracy. However, the representation of African languages in this field still needs to improve due to linguistic complexities and limited resources. This applies to the Zarma language, a dialect of Songhay (of the Nilo-Saharan language family) spoken by over 5 million people across Niger and neighboring countries. This paper introduces Feriji, the first robust French-Zarma parallel corpus and glossary designed for MT. The corpus, containing 61,085 sentences in Zarma and 42,789 in French, and a glossary of 4,062 words represent a significant step in addressing the need for more resources for Zarma. We fine-tune three large language models on our dataset, obtaining a BLEU score of 30.06 on the best-performing model. We further evaluate the models on human judgments of fluency, comprehension, and readability and the importance and impact of the corpus and models. Our contributions help to bridge a significant language gap and promote an essential and overlooked indigenous African language.	翻訳日:2024-06-20 01:05:59 公開日:2024-06-17
# 速度ゆらぎ下でのクロスマシントランスファー故障診断における解釈可能な変調可能なSTFTと物理インフォームドバランススペクトル測定 Interpretable modulated differentiable STFT and physics-informed balanced spectrum metric for freight train wheelset bearing cross-machine transfer fault diagnosis under speed fluctuations ( http://arxiv.org/abs/2406.11917v1 ) ライセンス: Link先を確認	Chao He, Hongmei Shi, Ruixin Li, Jianbo Li, ZuJun Yu,	(参考訳) 輪車軸受の運転条件は、鉄道重貨物列車の安全運転に直接的な影響を与えている。しかし, 列車の速度変動と断層試料が少ないことが, 故障診断の精度を抑える主な問題である。そこで, 解釈可能な可変可変短時間フーリエ変換(STFT)と物理インフォームドスペクトル品質測定を併用したクロスマシントランスファー診断(pyDSN)ネットワークを提案し, 時間変化速度下でのドメイン不変および識別的特徴を学習した。まず,固定窓を用いた時間変化速度信号の抽出周波数成分の抽出が不十分なため,STFTインフォームド理論サポートと解釈可能な変調可微分STFT (MDSTFT) が提案され,堅牢な時間周波数スペクトル (TFS) を抽出する。トレーニングプロセス中、異なる長さの複数のウィンドウが動的に変化する。また, 分類基準と領域差測度に加えて, 物理インフォームド計量と呼ばれる第3の種類の計量を創造的に導入し, 伝送可能TFSを向上する。 MDSTFTとモデルのための最適化方向を導出するために,物理インフォームド平衡スペクトル品質(BSQ)正規化損失を考案した。これにより、高品質のTFSをモデルにできるだけでなく、物理に制限されたドメイン適応ネットワークも取得でき、現実世界の物理知識を学習し、最終的には異なるデータセット間でドメインの不一致を減少させることができる。この実験は、実験室のデータセットから貨物列車のデータセットへの移行シナリオにおいて行われ、ハイブリッド駆動のpyDSNが既存の手法より優れ、実用的な価値を持っていることを示す。 The service conditions of wheelset bearings has a direct impact on the safe operation of railway heavy haul freight trains as the key components. However, speed fluctuation of the trains and few fault samples are the two main problems that restrict the accuracy of bearing fault diagnosis. Therefore, a cross-machine transfer diagnosis (pyDSN) network coupled with interpretable modulated differentiable short-time Fourier transform (STFT) and physics-informed balanced spectrum quality metric is proposed to learn domain-invariant and discriminative features under time-varying speeds. Firstly, due to insufficiency in extracting extract frequency components of time-varying speed signals using fixed windows, a modulated differentiable STFT (MDSTFT) that is interpretable with STFT-informed theoretical support, is proposed to extract the robust time-frequency spectrum (TFS). 
During training, multiple windows of different lengths change dynamically. In addition to the classification metric and the domain discrepancy metric, we introduce a third kind of metric, referred to as the physics-informed metric, to enhance the transferable TFS. A physics-informed balanced spectrum quality (BSQ) regularization loss is devised to guide the optimization direction of the MDSTFT and the model. With it, the model not only acquires a high-quality TFS but also becomes a physics-restricted domain adaptation network that learns real-world physics knowledge and ultimately diminishes the domain discrepancy across different datasets. The experiment is conducted in the scenario of migrating from the laboratory datasets to the freight train dataset, indicating that the hybrid-driven pyDSN outperforms existing methods and has practical value.	翻訳日:2024-06-20 00:46:12 公開日:2024-06-17
# 専門家の混在に対するグラフ知識蒸留 Graph Knowledge Distillation to Mixture of Experts ( http://arxiv.org/abs/2406.11919v1 ) ライセンス: Link先を確認	Pavel Rumiantsev, Mark Coates,	(参考訳) 精度の面では、ノード分類タスクにおいて、グラフニューラルネットワーク(GNN)が最適なアーキテクチャ選択である。現実のデプロイメントにおける彼らの欠点は、近隣の処理操作から生じるレイテンシである。遅延問題の1つの解決策は、訓練されたGNNからMulti-Layer Perceptron (MLP)への知識蒸留を行うことである。しかし, 従来の知識蒸留技術では, トランスダクティブ・インダクティブ・セッティングでの性能は相容れない。 MLPの代わりに特別設計の学生モデルを用いて性能問題に対処することを提案する。我々のモデルはRbM(Routing-by-Memory)と呼ばれ、Mixture-of-Experts(MoE)の一種であり、専門家の専門化を強制する設計である。隠れ表現空間上の特定の領域を専門化することを各専門家に促すことにより、複数のデータセット間でより一貫性のあるパフォーマンスを導出できることを実験的に実証する。 In terms of accuracy, Graph Neural Networks (GNNs) are the best architectural choice for the node classification task. Their drawback in real-world deployment is the latency that emerges from the neighbourhood processing operation. One solution to the latency issue is to perform knowledge distillation from a trained GNN to a Multi-Layer Perceptron (MLP), where the MLP processes only the features of the node being classified (and possibly some pre-computed structural information). However, the performance of such MLPs in both transductive and inductive settings remains inconsistent for existing knowledge distillation techniques. We propose to address the performance concerns by using a specially-designed student model instead of an MLP. Our model, named Routing-by-Memory (RbM), is a form of Mixture-of-Experts (MoE), with a design that enforces expert specialization. By encouraging each expert to specialize in a certain region of the hidden representation space, we demonstrate experimentally that it is possible to derive considerably more consistent performance across multiple datasets.	翻訳日:2024-06-20 00:46:12 公開日:2024-06-17
# Job-SDF: ジョブスキル需要予測とベンチマークのためのマルチグラニュラリティデータセット Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking ( http://arxiv.org/abs/2406.11920v1 ) ライセンス: Link先を確認	Xi Chen, Chuan Qin, Chuyu Fang, Chao Wang, Chen Zhu, Fuzhen Zhuang, Hengshu Zhu, Hui Xiong,	(参考訳) 急速に発展する雇用市場では、政策立案者や企業が変化を予測し、適応し、労働力のスキルが市場のニーズに合致することを保証し、生産性と競争力を高めるため、スキル需要予測が不可欠である。さらに、新たなスキル要件を特定することで、個人を関連するトレーニングや教育機会に誘導し、継続的な自己学習と開発を促進する。しかし、包括的なデータセットが存在しないことは、研究とこの分野の進歩を妨げる重要な課題である。このギャップを埋めるため、ジョブスキル需要予測モデルをトレーニングし、ベンチマークするためのデータセットであるJob-SDFを提示する。 2021年から2023年の間に中国の大手オンライン求人プラットフォームから収集された1035万件の求人広告に基づいて、このデータセットは521社にまたがる2324種類のスキルの月次求人需要を含んでいる。本データセットは,職業,企業,地域レベルなど,さまざまな粒度でのスキル需要予測モデルの評価を可能にする。我々は、このデータセット上のさまざまなモデルをベンチマークし、標準シナリオにおけるそれらのパフォーマンスの評価、低い値範囲に焦点をあてた予測、構造的なブレークの存在下で、さらなる研究のための新たな洞察を提供する。私たちのコードとデータセットは https://github.com/Job-SDF/benchmark から公開されています。 In a rapidly evolving job market, skill demand forecasting is crucial as it enables policymakers and businesses to anticipate and adapt to changes, ensuring that workforce skills align with market needs, thereby enhancing productivity and competitiveness. Additionally, by identifying emerging skill requirements, it directs individuals towards relevant training and education opportunities, promoting continuous self-learning and development. However, the absence of comprehensive datasets presents a significant challenge, impeding research and the advancement of this field. To bridge this gap, we present Job-SDF, a dataset designed to train and benchmark job-skill demand forecasting models. Based on 10.35 million public job advertisements collected from major online recruitment platforms in China between 2021 and 2023, this dataset encompasses monthly recruitment demand for 2,324 types of skills across 521 companies. Our dataset uniquely enables evaluating skill demand forecasting models at various granularities, including occupation, company, and regional levels. 
We benchmark a range of models on this dataset, evaluating their performance in standard scenarios, in predictions focused on lower value ranges, and in the presence of structural breaks, providing new insights for further research. Our code and dataset are publicly accessible at https://github.com/Job-SDF/benchmark.	翻訳日:2024-06-20 00:46:12 公開日:2024-06-17
# 交通予測のための時空間変圧器の再考:多段階多視点学習フレームワーク Rethinking Spatio-Temporal Transformer for Traffic Prediction:Multi-level Multi-view Augmented Learning Framework ( http://arxiv.org/abs/2406.11921v1 ) ライセンス: Link先を確認	Jiaqi Lin, Qianqian Ren,	(参考訳) 交通予測は、非常に複雑な時空間相関を伴う時空間予測問題である。本稿では,交通予測のためのマルチレベル多視点時空間変換器(LVSTformer)を提案する。このモデルは、地理的、グローバルセマンティック、ピボットノードの3つの異なるレベルから空間的依存関係を、長期および短期の時間的依存関係とともにキャプチャすることを目的としている。具体的には,局所的,大域的,重要なノードの観点から空間情報を探索するための3つの空間的拡張ビューを設計する。 3つの空間的拡張ビューと3つの平行な空間的自己アテンションメカニズムを組み合わせることで、モデルは異なるレベルの空間的依存関係を包括的にキャプチャすることができる。本研究では,長期的・短期的依存関係を効果的に把握するゲート型時間的自己注意機構を設計する。さらに、2つの時空間層の間に時空間放送モジュールを導入し、注意点の分散配置を確実にし、過度な適合と情報損失を軽減し、モデルの一般化能力と堅牢性を高める。実験結果は,LVSTformerが競合するベースラインと比較して最先端の性能を達成し,最大4.32%まで向上したことを示す。 Traffic prediction is a challenging spatio-temporal forecasting problem that involves highly complex spatio-temporal correlations. This paper proposes a Multi-level Multi-view Augmented Spatio-temporal Transformer (LVSTformer) for traffic prediction. The model aims to capture spatial dependencies from three different levels: local geographic, global semantic, and pivotal nodes, along with long- and short-term temporal dependencies. Specifically, we design three spatial augmented views to delve into the spatial information from the perspectives of local, global, and pivotal nodes. By combining three spatial augmented views with three parallel spatial self-attention mechanisms, the model can comprehensively capture spatial dependencies at different levels. We design a gated temporal self-attention mechanism to effectively capture long- and short-term temporal dependencies. Furthermore, a spatio-temporal context broadcasting module is introduced between two spatio-temporal layers to ensure a well-distributed allocation of attention scores, alleviating overfitting and information loss, and enhancing the generalization ability and robustness of the model. 
A comprehensive set of experiments is conducted on six well-known traffic benchmarks; the experimental results demonstrate that LVSTformer achieves state-of-the-art performance compared to competing baselines, with the maximum improvement reaching 4.32%.	翻訳日:2024-06-20 00:46:12 公開日:2024-06-17
# ソーシャルメディア予測の分類と実際の市場データによる予測の検証による財務専門家の信用度の評価 Explainable assessment of financial experts' credibility by classifying social media forecasts and checking the predictions with actual market data ( http://arxiv.org/abs/2406.11924v1 ) ライセンス: Link先を確認	Silvia García-Méndez, Francisco de Arriba-Pérez, Jaime González-Gonzáleza, Francisco J. González-Castaño,	(参考訳) ソーシャルメディアには、ユーザーの人気に関連する多様なインタラクションメトリクスが含まれており、最も顕著な例は、ユーザーのフォロワー数である。後者は、最も人気のあるクリエーターによる投稿の信頼性に関する懸念を提起している。しかしながら、ソーシャルメディアにおける信頼性を評価する既存のアプローチのほとんどは、この問題を、実際の現実の事実がユーザのコメントを返却するかどうかを確認することなく、しばしば優先順位情報に基づくバイナリ分類であると厳密にみなしている。また、信頼を育むための予測について、自動的な説明は提供していない。本研究では,自然言語処理と機械学習を組み合わせたソーシャルメディア上での財務担当者に対する信頼性評価ソリューションを提案する。コントリビュータの評判は、資産価値の予測をタイプ別に自動的に分類し、これらの予測を実際の市場データで検証し、成功の確率を近似することで評価される。この検証の結果は、バイナリ結果ではなく、継続的な信頼性スコアであり、この研究によるまったく新しい貢献である。さらに、ソーシャルメディアのメトリクス(すなわちユーザコンテキスト)は、信頼度ランキングとの相関を計算し、ファイナンシャルポストにおけるエンドユーザの関心と予測(すなわち、ドロップまたはアップ)に関する洞察を提供することによって活用される。最後に、関係する特徴のモデルに依存しない分析に基づいて、その決定に関する自然言語による説明を提供する。 Social media include diverse interaction metrics related to user popularity, the most evident example being the number of user followers. The latter has raised concerns about the credibility of the posts by the most popular creators. However, most existing approaches to assess credibility in social media strictly consider this problem a binary classification, often based on a priori information, without checking if actual real-world facts back the users' comments. In addition, they do not provide automatic explanations of their predictions to foster their trustworthiness. In this work, we propose a credibility assessment solution for financial creators in social media that combines Natural Language Processing and Machine Learning. The reputation of the contributors is assessed by automatically classifying their forecasts on asset values by type and verifying these predictions with actual market data to approximate their probability of success. 
The outcome of this verification is a continuous credibility score instead of a binary result, an entirely novel contribution by this work. Moreover, social media metrics (i.e., user context) are exploited by calculating their correlation with the credibility rankings, providing insights on the interest of the end-users in financial posts and their forecasts (i.e., drop or rise). Finally, the system provides natural language explanations of its decisions based on a model-agnostic analysis of relevant features.	翻訳日:2024-06-20 00:46:12 公開日:2024-06-17
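As a rough illustration of the verification step in the abstract above, the following Python sketch scores a contributor by the share of directional forecasts (rise or drop) that later market data confirmed. The `Forecast` record, the toy price table, and the hit-rate formula are all assumptions made for illustration; the abstract does not specify how the paper computes its credibility score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forecast:
    """Hypothetical record of one classified social media forecast."""
    asset: str
    direction: str  # "rise" or "drop"
    day_made: int   # index into the price series when the post was made
    horizon: int    # number of days after which the forecast is checked


def forecast_was_right(forecast, prices):
    """Check one forecast against (assumed) daily closing prices."""
    series = prices[forecast.asset]
    start = series[forecast.day_made]
    end = series[forecast.day_made + forecast.horizon]
    if forecast.direction == "rise":
        return end > start
    return end < start


def credibility_score(forecasts, prices):
    """Continuous score in [0, 1]: share of forecasts the market confirmed."""
    if not forecasts:
        return 0.0
    hits = sum(forecast_was_right(f, prices) for f in forecasts)
    return hits / len(forecasts)


# Toy data: one asset, five daily closing prices.
prices = {"ACME": [10.0, 11.0, 12.0, 9.0, 8.0]}
forecasts = [
    Forecast("ACME", "rise", day_made=0, horizon=2),  # 10 -> 12, confirmed
    Forecast("ACME", "drop", day_made=2, horizon=2),  # 12 -> 8, confirmed
    Forecast("ACME", "rise", day_made=2, horizon=1),  # 12 -> 9, refuted
    Forecast("ACME", "drop", day_made=0, horizon=1),  # 10 -> 11, refuted
]
score = credibility_score(forecasts, prices)
```

A continuous score of this kind can be ranked or correlated with follower counts, which a binary credible/not-credible label would not allow.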
# DocCGen: ドキュメントベースの制御コード生成 DocCGen: Document-based Controlled Code Generation ( http://arxiv.org/abs/2406.11925v1 ) ライセンス: Link先を確認	Sameer Pimparkhede, Mehant Kammakomati, Srikanth G. Tamilselvam, Prince Kumar, Ashok Pon Kumar, Pushpak Bhattacharyya,	(参考訳) 近年の進歩により、Large Language Models (LLM) は、C++、Java、Pythonといったリソースに富む汎用言語のためのコード生成に、自然言語(NL)で最先端のパフォーマンスをもたらすことが示されている。しかし、YAMLやJSONのような構造化ドメイン固有言語(DSL)に対する実践的な利用は、事前トレーニング中に一般的にLLMによって見つからないドメイン固有スキーマ、文法、カスタマイズによって制限される。この課題を、関連する例や微調整を通じて、コンテキスト内学習を通じて軽減する努力がなされている。しかし、DSLサンプルの制限や迅速な感度といった問題に悩まされているが、企業はDSLの優れたドキュメントを維持している。そこで我々は,構造化コード言語のためのNL-to-Code生成タスクを2段階のプロセスに分解することで,このような豊富な知識を活用できるフレームワークDocCGenを提案する。まず、NLクエリに最もよくマッチするライブラリドキュメントを使用して、正しいライブラリを検出する。次に、これらのライブラリのドキュメントから抽出したスキーマルールを使用して、デコードを制限する。我々は、Ansible YAML と Bash という2つの複雑な構造化言語に対して、アウト・オブ・ドメイン(OOD)とイン・ドメイン(ID)の2つの設定からなるフレームワークを評価した。我々の広範な実験により、DocCGenは6つの評価指標のすべてで異なるサイズの言語モデルを一貫して改善し、構造化コードにおける構文的および意味的誤りを低減します。制約付きコード生成の研究を動機付けるために、データセットとコードをオープンソース化する予定です。 Recent developments show that Large Language Models (LLMs) produce state-of-the-art performance on natural language (NL) to code generation for resource-rich general-purpose languages like C++, Java, and Python. However, their practical usage for structured domain-specific languages (DSLs) such as YAML and JSON is limited due to domain-specific schema, grammar, and customizations generally unseen by LLMs during pre-training. Efforts have been made to mitigate this challenge via in-context learning through relevant examples or by fine-tuning. However, these approaches suffer from problems such as limited DSL samples and prompt sensitivity, whereas enterprises maintain good documentation of the DSLs. Therefore, we propose DocCGen, a framework that can leverage such rich knowledge by breaking the NL-to-Code generation task for structured code languages into a two-step process. First, it detects the correct libraries using the library documentation that best matches the NL query. 
Then, it utilizes schema rules extracted from the documentation of these libraries to constrain the decoding. We evaluate our framework on two complex structured languages, Ansible YAML and Bash commands, in two settings: out-of-domain (OOD) and in-domain (ID). Our extensive experiments show that DocCGen consistently improves different-sized language models across all six evaluation metrics, reducing syntactic and semantic errors in structured code. We plan to open-source the datasets and code to motivate research in constrained code generation.	翻訳日:2024-06-20 00:46:12 公開日:2024-06-17
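The two-step process described in the DocCGen abstract (detect the library from its documentation, then constrain decoding with schema rules) can be sketched in miniature. This is only a toy under stated assumptions: word overlap stands in for the retrieval step, a dictionary of allowed keys stands in for the schema rules, and the documentation snippets, scores, and function names are hypothetical rather than taken from the paper.

```python
def detect_library(query, library_docs):
    """Step 1 (toy retrieval): pick the library whose documentation shares
    the most words with the natural-language query."""
    query_words = set(query.lower().split())

    def overlap(name):
        return len(query_words & set(library_docs[name].lower().split()))

    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(library_docs), key=overlap)


def constrained_pick(scored_candidates, allowed_keys):
    """Step 2 (toy constraint): drop candidates the schema does not allow and
    return the best-scoring survivor, mimicking one masked decoding step."""
    valid = {k: s for k, s in scored_candidates.items() if k in allowed_keys}
    if not valid:
        raise ValueError("no candidate satisfies the schema")
    return max(sorted(valid), key=valid.get)


# Hypothetical documentation snippets and schema rules.
library_docs = {
    "copy_module": "copy a file to a remote location with src and dest",
    "service_module": "start stop or restart a service by name and state",
}
schema_rules = {
    "copy_module": {"src", "dest", "mode"},
    "service_module": {"name", "state", "enabled"},
}

query = "restart the nginx service"
library = detect_library(query, library_docs)

# Pretend model scores: the unconstrained favourite "path" is not a valid key.
model_scores = {"path": 0.6, "name": 0.3, "dest": 0.1}
next_key = constrained_pick(model_scores, schema_rules[library])
```

The point of the sketch is the ordering: the schema is chosen only after the library is known, so an error in step 1 would send step 2 to the wrong set of allowed keys.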
# REPOEXEC: Repository-Level Executableベンチマークによるコード生成の評価 REPOEXEC: Evaluate Code Generation with a Repository-Level Executable Benchmark ( http://arxiv.org/abs/2406.11927v1 ) ライセンス: Link先を確認	Nam Le Hai, Dung Manh Nguyen, Nghi D. Q. Bui,	(参考訳) CodeLLMs が repository-level scale で実行可能で機能的に正しいコードを生成する能力はほとんど探索されていない。リポジトリレベルのスケールでコード生成を評価するための新しいベンチマークであるREPOEXECを導入し、実行可能性と正確性を強調した。 REPOEXECは、要求を検証し、高カバレッジのテストケースを動的に生成して生成されたコードの機能を評価するメカニズムを組み込む自動システムを提供する。当社の作業では、開発者が必要なコード依存関係を指定して、モデルにこれらを正確に統合させるという、コントロールされたシナリオについて検討しています。実験によると、事前訓練されたLLMは命令チューニングモデルよりも正確性が高いが、後者は、提供された依存関係を活用し、デバッグ機能を示すのに優れている。 REPOEXECは、コード機能の包括的な評価と開発者の意図の整合性を提供することを目的としている。 The ability of CodeLLMs to generate executable and functionally correct code at the repository-level scale remains largely unexplored. We introduce REPOEXEC, a novel benchmark for evaluating code generation at the repository-level scale, emphasizing executability and correctness. REPOEXEC provides an automated system that verifies requirements and incorporates a mechanism for dynamically generating high-coverage test cases to assess the functionality of generated code. Our work explores a controlled scenario where developers specify necessary code dependencies, challenging the model to integrate these accurately. Experiments show that while pretrained LLMs outperform instruction-tuned models in correctness, the latter excel in utilizing provided dependencies and demonstrating debugging capabilities. REPOEXEC aims to provide a comprehensive evaluation of code functionality and alignment with developer intent, paving the way for more reliable and applicable CodeLLMs in real-world scenarios.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
# FlexCare: フレキシブルなマルチモーダルヘルスケア予測のためのクロスタスクシナジーを活用する FlexCare: Leveraging Cross-Task Synergy for Flexible Multimodal Healthcare Prediction ( http://arxiv.org/abs/2406.11928v1 ) ライセンス: Link先を確認	Muhao Xu, Zhenfeng Zhu, Youru Li, Shuai Zheng, Yawei Zhao, Kunlun He, Yao Zhao,	(参考訳) マルチモーダル電子健康記録(EHR)データは、患者の健康状態の総合的な評価を提供し、様々な予測医療タスクをサポートする。近年,医療領域におけるマルチタスク学習のアプローチを取り入れた研究がいくつかある。しかし、既存の手法では全てのタスクに対して完全なラベルを持つ必要があるため、データに強い要求を課し、モデルの柔軟性を制限する必要がある。一方、マルチモーダルな入力を持つマルチタスクフレームワークでは、モーダル間の情報格差を包括的に考慮する方法は依然として難しい問題である。これらの課題に対処するために,不完全なマルチモーダル入力を柔軟に適応し,複数の医療タスクへの適応を促進するために,FlexCareと呼ばれる統合医療予測モデルを提案する。提案モデルは,従来の並列マルチタスク予測のパラダイムを,非同期な単一タスク予測に分解することで破る。具体的には、タスクに依存しないマルチモーダル情報抽出モジュールを提示し、多様なモーダル内およびモーダル間パターンの非相関表現をキャプチャする。異なるモダリティと異なるタスク間の情報格差をフルに考慮し、洗練されたモダリティレベルの表現を個別の患者レベルの表現に統合するタスク誘導型階層型マルチモーダル融合モジュールを提案する。 MIMIC-IV/MIMIC-CXR/MIMIC-NOTEデータセットによる複数のタスクの実験結果から,提案手法の有効性が示された。さらに、さらなる分析は、医療領域でそのようなマルチタスク戦略を採用する可能性と可能性を示している。ソースコードはhttps://github.com/mhxu1998/FlexCareで入手できる。 Multimodal electronic health record (EHR) data can offer a holistic assessment of a patient's health status, supporting various predictive healthcare tasks. Recently, several studies have embraced the multitask learning approach in the healthcare domain, exploiting the inherent correlations among clinical tasks to predict multiple outcomes simultaneously. However, existing methods necessitate samples to possess complete labels for all tasks, which places heavy demands on the data and restricts the flexibility of the model. Meanwhile, within a multitask framework with multimodal inputs, how to comprehensively consider the information disparity among modalities and among tasks still remains a challenging problem. To tackle these issues, a unified healthcare prediction model, named FlexCare, is proposed to flexibly accommodate incomplete multimodal inputs, promoting adaptation to multiple healthcare tasks. 
The proposed model breaks the conventional paradigm of parallel multitask prediction by decomposing it into a series of asynchronous single-task predictions. Specifically, a task-agnostic multimodal information extraction module is presented to capture decorrelated representations of diverse intra- and inter-modality patterns. Taking full account of the information disparities between different modalities and different tasks, we present a task-guided hierarchical multimodal fusion module that integrates the refined modality-level representations into an individual patient-level representation. Experimental results on multiple tasks from MIMIC-IV/MIMIC-CXR/MIMIC-NOTE datasets demonstrate the effectiveness of the proposed method. Additionally, further analysis underscores the feasibility and potential of employing such a multitask strategy in the healthcare domain. The source code is available at https://github.com/mhxu1998/FlexCare.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
# 集団限界外における雑音性SVGDの長時間漸近挙動 Long-time asymptotics of noisy SVGD outside the population limit ( http://arxiv.org/abs/2406.11929v1 ) ライセンス: Link先を確認	Victor Priser, Pascal Bianchi, Adil Salim,	(参考訳) Stein Variational Gradient Descent (SVGD) は、機械学習の分野で広く使われているサンプリングアルゴリズムである。 SVGDは、対象の分布を近似するために相互作用する粒子(サンプルを表す)の集合を反復的に移動する。 SVGDとその変種に関する最近の研究にもかかわらず、その長時間の漸近的挙動(つまり、何度も繰り返した後に)は、有限個の粒子系では理解されていない。 SVGDの雑音変化の長期的漸近挙動について検討した。まず、反復回数が大きいときのノイズSVGDの極限集合が well-defined であることを示す。次に、この極限集合を特徴付け、粒子数の増加とともにターゲット分布に近づくことを示す。特に、ノイズSVGDは、SVGDで観測される分散崩壊を確実に回避する。我々のアプローチは、ノイズの多いSVGDの軌道がマッケイン・ブラソフ過程によって記述された軌道とよく似ていることを示すものである。 Stein Variational Gradient Descent (SVGD) is a widely used sampling algorithm that has been successfully applied in several areas of Machine Learning. SVGD operates by iteratively moving a set of interacting particles (which represent the samples) to approximate the target distribution. Despite recent studies on the complexity of SVGD and its variants, their long-time asymptotic behavior (i.e., after numerous iterations) is still not understood in the finite number of particles regime. We study the long-time asymptotic behavior of a noisy variant of SVGD. First, we establish that the limit set of noisy SVGD for a large number of iterations is well-defined. We then characterize this limit set, showing that it approaches the target distribution as the number of particles increases. In particular, noisy SVGD provably avoids the variance collapse observed for SVGD. Our approach involves demonstrating that the trajectories of noisy SVGD closely resemble those described by a McKean-Vlasov process.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
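To make the particle update in the SVGD abstract concrete, here is a minimal pure-Python sketch of SVGD in one dimension with an RBF kernel and a standard-normal-shaped target centred at 2. The optional `noise_scale` term is plain Gaussian jitter added as an illustrative assumption; it is not the noise scheme analysed in the paper, and the step size, bandwidth, and particle count are arbitrary choices.

```python
import math
import random


def svgd_step(particles, score, step_size=0.1, bandwidth=1.0,
              noise_scale=0.0, rng=None):
    """One SVGD update for 1-D particles with an RBF kernel.

    noise_scale > 0 adds plain Gaussian jitter (an assumption made for
    illustration, not the paper's noisy SVGD).
    """
    rng = rng if rng is not None else random.Random(0)
    n = len(particles)
    updated = []
    for x_i in particles:
        direction = 0.0
        for x_j in particles:
            k = math.exp(-((x_j - x_i) ** 2) / (2.0 * bandwidth ** 2))
            # Attraction: kernel-weighted score of the target at x_j.
            direction += k * score(x_j)
            # Repulsion: gradient of the kernel with respect to x_j.
            direction += -(x_j - x_i) / bandwidth ** 2 * k
        x_new = x_i + step_size * direction / n
        if noise_scale > 0.0:
            x_new += noise_scale * rng.gauss(0.0, 1.0)
        updated.append(x_new)
    return updated


def run_svgd(target_mean=2.0, n_particles=20, n_steps=1000,
             noise_scale=0.0, seed=0):
    """Run SVGD towards N(target_mean, 1) from evenly spaced particles."""
    rng = random.Random(seed)

    def score(x):
        # Gradient of the log-density of N(target_mean, 1).
        return -(x - target_mean)

    particles = [-1.0 + 2.0 * i / (n_particles - 1) for i in range(n_particles)]
    for _ in range(n_steps):
        particles = svgd_step(particles, score,
                              noise_scale=noise_scale, rng=rng)
    return particles


particles = run_svgd()
mean = sum(particles) / len(particles)
variance = sum((x - mean) ** 2 for x in particles) / len(particles)
```

The repulsion term is what keeps the particles from piling up at the mode; inspecting `variance` for different particle counts is a simple way to see the finite-particle effects that the abstract refers to.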
# Code-LLMsが学ぶべきでないことの批判的研究 A Critical Study of What Code-LLMs (Do Not) Learn ( http://arxiv.org/abs/2406.11930v1 ) ライセンス: Link先を確認	Abhinav Anand, Shweta Verma, Krishna Narasimhan, Mira Mezini,	(参考訳) コードコーパス(コード-LLM)で訓練された大規模言語モデルは、様々なコーディング支援タスクにおいて素晴らしいパフォーマンスを示している。しかし、サイズとトレーニングデータセットが増大しているにも関わらず、コード-LLMには構文エラーや変数の誤用といった制限がある。コードLLMは、自己注意と隠された表現を用いて入力トークン間の関係を符号化するため、コーディングタスクでうまく機能すると主張する研究もある。しかし、以前の研究では、コード-LLMがコードプロパティをエンコードしていないかは研究されていない。本稿では,注意マップとコード-LLMの隠れ表現の微粒化解析を行う。コード-LLMは入力トークンの特定のサブセット間の関係を符号化するのみである。具体的には、入力トークンを統語トークンと識別子に分類することにより、モデルが統語トークンと識別子間の関係を符号化するが、それらが統語トークンと識別子間の関係を符号化しないことがわかった。また、微調整されたモデルでは、事前訓練されたモデルと比較して、これらの関係をコード化していないことも判明した。さらに、数十億のパラメータを持つ大規模なモデルは、数億のパラメータを持つモデルよりもコードに関する情報をかなり少ないコードでエンコードします。 Large Language Models trained on code corpora (code-LLMs) have demonstrated impressive performance in various coding assistance tasks. However, despite their increased size and training dataset, code-LLMs still have limitations such as suggesting codes with syntactic errors, variable misuse etc. Some studies argue that code-LLMs perform well on coding tasks because they use self-attention and hidden representations to encode relations among input tokens. However, previous works have not studied what code properties are not encoded by code-LLMs. In this paper, we conduct a fine-grained analysis of attention maps and hidden representations of code-LLMs. Our study indicates that code-LLMs only encode relations among specific subsets of input tokens. Specifically, by categorizing input tokens into syntactic tokens and identifiers, we found that models encode relations among syntactic tokens and among identifiers, but they fail to encode relations between syntactic tokens and identifiers. We also found that fine-tuned models encode these relations poorly compared to their pre-trained counterparts. 
Additionally, larger models with billions of parameters encode significantly less information about code than models with only a few hundred million parameters.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
# DeepSeek-Coder-V2: コードインテリジェンスにおけるクローズドソースモデルの障壁を突破する DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence ( http://arxiv.org/abs/2406.11931v1 ) ライセンス: Link先を確認	DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, Xin Xie, Kang Guan, Yuxiang You, Aixin Liu, Qiushi Du, Wenjun Gao, Xuan Lu, Qinyu Chen, Yaohui Wang, Chengqi Deng, Jiashi Li, Chenggang Zhao, Chong Ruan, Fuli Luo, Wenfeng Liang,	(参考訳) We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. 具体的には、DeepSeek-Coder-V2はさらに6兆トークンを追加して、DeepSeek-V2の中間チェックポイントから事前トレーニングされている。この継続事前トレーニングを通じて、DeepSeek-Coder-V2は、一般的な言語タスクで同等のパフォーマンスを維持しながら、DeepSeek-V2のコーディングと数学的推論能力を大幅に強化する。 DeepSeek-Coder-33Bと比較すると、DeepSeek-Coder-V2は、推論や一般的な機能だけでなく、コード関連タスクの様々な面で大きな進歩を示している。さらに、DeepSeek-Coder-V2はプログラミング言語のサポートを86から338に拡張し、コンテキスト長は16Kから128Kに拡張した。標準的なベンチマーク評価では、コーディングや数学ベンチマークにおいて、GPT4-Turbo、Claude 3 Opus、Gemini 1.5 Proといったクローズドソースモデルと比較して、DeepSeek-Coder-V2は優れたパフォーマンスを実現している。 We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. 
Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
# 大規模リモートセンシングデータセットを用いたマスクオートエンコーダのスケーリング Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset ( http://arxiv.org/abs/2406.11933v1 ) ライセンス: Link先を確認	Fengxiang Wang, Hongzhen Wang, Di Wang, Zonghao Guo, Zhenyu Zhong, Long Lan, Jing Zhang, Zhiyuan Liu, Maosong Sun,	(参考訳) Masked Image Modeling (MIM)は、リモートセンシング(RS)分野における基礎的な視覚モデルを開発するための重要なアプローチとして登場した。しかし、現在のRSデータセットはボリュームと多様性に制限されており、一般化可能な表現を学習するためのMIMメソッドの容量を著しく制限している。本研究では,高効率なMIMトレーニングを実現するために設計された大規模データセットである RS-4M を紹介する。 RS-4Mは、オブジェクトレベルの検出やピクセルレベルのセグメンテーションを含む、リッチできめ細かなRS視覚タスクを含む400万の光学画像で構成されている。自然画像と比較すると、RS画像には大量の背景画素が含まれており、従来のMIMモデルのトレーニング効率を制限している。そこで本研究では,その意味的豊かさに基づいて選択されたパッチトークンのサブセットを動的にエンコードし,再構成する,効率的なMIM手法であるSelectiveMAEを提案する。 SelectiveMAEはプログレッシブなセマンティックトークン選択モジュールのルーツであり、セマンティックな類似トークンの再構成から相補的なセマンティック依存関係の符号化へと進化している。このアプローチは、従来のMIMトレーニングをプログレッシブな特徴学習プロセスに変換し、SelectiveMAEがRS画像の堅牢な表現を効率的に学習できるようにする。大規模な実験により、SelectiveMAEはトレーニング効率を2.2-2.7倍に向上し、ベースラインMIMモデルの分類、検出、セグメンテーション性能を向上させることが示されている。 Masked Image Modeling (MIM) has emerged as a pivotal approach for developing foundational visual models in the field of remote sensing (RS). However, current RS datasets are limited in volume and diversity, which significantly constrains the capacity of MIM methods to learn generalizable representations. In this study, we introduce RS-4M, a large-scale dataset designed to enable highly efficient MIM training on RS images. RS-4M comprises 4 million optical images encompassing abundant and fine-grained RS visual tasks, including object-level detection and pixel-level segmentation. Compared to natural images, RS images often contain massive redundant background pixels, which limits the training efficiency of the conventional MIM models. To address this, we propose an efficient MIM method, termed SelectiveMAE, which dynamically encodes and reconstructs a subset of patch tokens selected based on their semantic richness. 
SelectiveMAE is rooted in a progressive semantic token selection module, which evolves from reconstructing semantically analogical tokens to encoding complementary semantic dependencies. This approach transforms conventional MIM training into a progressive feature learning process, enabling SelectiveMAE to efficiently learn robust representations of RS images. Extensive experiments show that SelectiveMAE significantly boosts training efficiency by 2.2-2.7 times and enhances the classification, detection, and segmentation performance of the baseline MIM model. The dataset, source code, and trained models will be released.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
# ブリッジングデザインギャップ:グラフ誘導拡散モデルを用いたパラメトリックデータ補完手法 Bridging Design Gaps: A Parametric Data Completion Approach With Graph Guided Diffusion Models ( http://arxiv.org/abs/2406.11934v1 ) ライセンス: Link先を確認	Rui Zhou, Chenyang Yuan, Frank Permenter, Yanxia Zhang, Nikos Arechiga, Matt Klenk, Faez Ahmed,	(参考訳) 本研究では, グラフ注意ネットワークと表層拡散モデルを利用して, 工学設計におけるパラメトリックデータの欠落を解消する生成的計算モデルを提案する。このモデルはAI設計の共同パイロットとして機能し、不完全設計のための複数の設計オプションを提供し、自転車設計CADデータセットを用いて実演する。比較評価により,提案手法は従来の手法,例えばMissForest, HotDeck, PPCA, および表層生成法であるTabCSDIよりも精度と多様性が優れていることを示した。生成モデリングはまた、設計可能性のより広範な探索を可能にし、エンジニアが様々な設計完了を探索できるようにすることで設計決定を強化する。グラフモデルは、GNNとアセンブリグラフに含まれる構造情報を組み合わせて、異なる設計パラメータ間の複雑な相互依存性を理解し、予測することができる。グラフモデルは、設計問題の鍵となるアセンブリグラフから複雑なパラメトリック相互依存性を正確にキャプチャし、インプットするのに役立つ。既存の設計データセットから学習することで、インプット機能は、ユーザが定義した部分パラメトリック設計に基づいてCAD設計を自動補完するインテリジェントアシスタントとして機能し、アイデアと実現のギャップを効果的に埋めることができる。提案された研究は、インフォームドデザインの決定を促進するだけでなく、デザインにおける創造的な探索を促進する経路を提供する。 This study introduces a generative imputation model leveraging graph attention networks and tabular diffusion models for completing missing parametric data in engineering designs. This model functions as an AI design co-pilot, providing multiple design options for incomplete designs, which we demonstrate using the bicycle design CAD dataset. Through comparative evaluations, we demonstrate that our model significantly outperforms existing classical methods, such as MissForest, hotDeck, PPCA, and tabular generative method TabCSDI in both the accuracy and diversity of imputation options. Generative modeling also enables a broader exploration of design possibilities, thereby enhancing design decision-making by allowing engineers to explore a variety of design completions. The graph model combines GNNs with the structural information contained in assembly graphs, enabling the model to understand and predict the complex interdependencies between different design parameters. 
The graph model helps accurately capture and impute complex parametric interdependencies from an assembly graph, which is key for design problems. By learning from an existing dataset of designs, the imputation capability allows the model to act as an intelligent assistant that autocompletes CAD designs based on user-defined partial parametric design, effectively bridging the gap between ideation and realization. The proposed work provides a pathway to not only facilitate informed design decisions but also promote creative exploration in design.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
# 反復的か革新的か? コード最適化のための問題指向の視点 Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization ( http://arxiv.org/abs/2406.11935v1 ) ライセンス: Link先を確認	Tong Ye, Tengfei Ma, Lingfei Wu, Xuhong Zhang, Shouling Ji, Wenhai Wang,	(参考訳) 大規模言語モデル(LLM)は、幅広いプログラミングタスクを解く上で強力な能力を示している。しかし、LLMはコード最適化のために研究されることはめったにない。本稿では,パフォーマンス向上に着目したコード最適化について検討する。最近提案された性能最適化のための最初のPIEデータセットは、同じ問題に対して同じプログラマからの反復的な提案に基づいて、プログラム最適化ペアを構成する。しかし、このアプローチはLLMを局所的な性能改善に制限し、グローバルアルゴリズムの革新を無視している。したがって、最適化ペアを問題指向のアプローチに再構成することで、まったく異なる視点を採用する。これにより、異なるプログラマが同じ問題に対処する様々な巧妙なアイデアの統合が可能になる。実験により, LLMを問題指向最適化ペアに適応させることで, 最適化性能が著しく向上することが示された。一方、問題指向の観点からパフォーマンスボトルネックを特定しました。モデルマージを利用することで、ボトルネックをさらに克服し、最終的にプログラム最適化比率(51.76\%\rightarrow76.65\%$)とスピードアップ(2.65\times\rightarrow5.09\times$)を新たなレベルに引き上げる。 Large language models (LLMs) have demonstrated strong capabilities in solving a wide range of programming tasks. However, LLMs have rarely been explored for code optimization. In this paper, we explore code optimization with a focus on performance enhancement, specifically aiming to optimize code for minimal execution time. The recently proposed first PIE dataset for performance optimization constructs program optimization pairs based on iterative submissions from the same programmer for the same problem. However, this approach restricts LLMs to local performance improvements, neglecting global algorithmic innovation. Therefore, we adopt a completely different perspective by reconstructing the optimization pairs into a problem-oriented approach. This allows for the integration of various ingenious ideas from different programmers tackling the same problem. Experimental results demonstrate that adapting LLMs to problem-oriented optimization pairs significantly enhances their optimization capabilities. Meanwhile, we identified performance bottlenecks within the problem-oriented perspective. 
By employing model merging, we further overcame the bottlenecks and ultimately elevated the program optimization ratio ($51.76\%\rightarrow76.65\%$) and speedup ($2.65\times\rightarrow5.09\times$) to new levels.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
# 対話型言語モデルにおける視点の追跡 Tracking the perspectives of interacting language models ( http://arxiv.org/abs/2406.11938v1 ) ライセンス: Link先を確認	Hayden Helm, Brandon Duderstadt, Youngser Park, Carey E. Priebe,	(参考訳) 大型言語モデル(LLM)は前例のない速度で高品質な情報を生成することができる。これらのモデルが社会に浸透し続けていくにつれ、それらが生み出すコンテンツは、事前学習データ、微調整データ、検索データなどの他の言語モデルに組み込まれるデータベースにおいて、ますます普及していくでしょう。本稿では,LLMの通信ネットワークの考え方を定式化し,LLMの集合内の個々のモデルの視点を表現する手法を提案する。これらのツールを用いて,様々な環境下でのLLMの通信ネットワークにおける情報拡散を系統的に研究する。 Large language models (LLMs) are capable of producing high quality information at unprecedented rates. As these models continue to entrench themselves in society, the content they produce will become increasingly pervasive in databases that are, in turn, incorporated into the pre-training data, fine-tuning data, retrieval data, etc. of other language models. In this paper we formalize the idea of a communication network of LLMs and introduce a method for representing the perspective of individual models within a collection of LLMs. Given these tools we systematically study information diffusion in the communication network of LLMs in various simulated settings.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
# クラウドソーシングデータから高品質ベンチマークへ - Arena-Hard氏とBenchBuilder Pipeline From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline ( http://arxiv.org/abs/2406.11939v1 ) ライセンス: Link先を確認	Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica,	(参考訳) 言語モデルの急速な進化は、より困難なベンチマークの開発を必要としている。現在の静的ベンチマークは、異なるモデルの能力を一貫して区別するのに苦労し、実際のユーザの好みと一致しないことが多い。一方、Chatbot Arenaのようなクラウドソースのライブプラットフォームは、さまざまな自然なプロンプトやユーザからのフィードバックを集めている。しかし、これらのプロンプトは高度に変化しており、新しいモデルにオフラインでフィードバックを適用することはできない。ベンチマークがLLM開発のペースに遅れないようにするために、モデルを確実に分離する能力と人間の好みに合わせてベンチマークを評価する方法について論じる。これらの原則の下で、私たちはライブデータソースから高品質なプロンプトをフィルタリングして、新しくて困難なプロンプトのオフライン評価を可能にする、ライブベンチマークであるBenchBuilderを開発しました。 BenchBuilderは、ドメイン知識の要求など、高品質なプロンプトの7つの指標を特定し、LLMアノテータを使用して、さまざまなトピッククラスタから高品質なプロンプトのサブセットを選択する。 LLM評価プロセスは、完全に自動化され、高品質で、常に更新されるベンチマークを保証するために、LLM判定器を使用する。 We apply BenchBuilder on prompts from the Chatbot Arena to create Arena-Hard-Auto v0.1: 500 challenging user prompts from a wide range of tasks. Arena-Hard-Auto v0.1はMT-Benchよりも3倍の信頼区間を提供し、最先端の89.1%と人間の選好ランクとの合意を達成している。 BenchBuilderパイプラインは評価ベンチマークを強化し、開発者に価値のあるツールを提供する。 The rapid evolution of language models has necessitated the development of more challenging benchmarks. Current static benchmarks often struggle to consistently distinguish between the capabilities of different models and fail to align with real-world user preferences. On the other hand, live crowd-sourced platforms like the Chatbot Arena collect a wide range of natural prompts and user feedback. However, these prompts vary in sophistication and the feedback cannot be applied offline to new models. In order to ensure that benchmarks keep up with the pace of LLM development, we address how one can evaluate benchmarks on their ability to confidently separate models and their alignment with human preference. 
Under these principles, we developed BenchBuilder, a living benchmark that filters high-quality prompts from live data sources to enable offline evaluation on fresh, challenging prompts. BenchBuilder identifies seven indicators of a high-quality prompt, such as the requirement for domain knowledge, and utilizes an LLM annotator to select a high-quality subset of prompts from various topic clusters. The LLM evaluation process employs an LLM judge to ensure a fully automated, high-quality, and constantly updating benchmark. We apply BenchBuilder on prompts from the Chatbot Arena to create Arena-Hard-Auto v0.1: 500 challenging user prompts from a wide range of tasks. Arena-Hard-Auto v0.1 offers 3x tighter confidence intervals than MT-Bench and achieves a state-of-the-art 89.1% agreement with human preference rankings, all at a cost of only $25 and without human labelers. The BenchBuilder pipeline enhances evaluation benchmarks and provides a valuable tool for developers, enabling them to extract high-quality benchmarks from extensive data with minimal effort.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
# 部分的ネットワークデータを用いた干渉のモデルベース推論と実験設計 Model-Based Inference and Experimental Design for Interference Using Partial Network Data ( http://arxiv.org/abs/2406.11940v1 ) ライセンス: Link先を確認	Steven Wilkins Reeves, Shane Lubold, Arun G. Chandrasekhar, Tyler H. McCormick,	(参考訳) 安定した単位処理値の仮定は、個人の結果が他者の治療状況に影響されないことを示しているが、多くの現実世界の応用において、治療は即時治療以上の多くの人に影響を及ぼす可能性がある。干渉は一般に、ネットワーク構造を通して媒介されると考えることができる。しかし、多くの経験的な状況において、完全なネットワークデータ(これらの流出効果を調整するために要求される)はコストがかかりすぎるか、論理的には収集できない。部分的あるいは間接的に観察されるネットワークデータ(サブサンプル,集約された関係データ(ARD),エゴセントリックサンプリング,あるいは応答駆動サンプリング)は,ネットワークデータ収集のロジスティックおよび金銭的負担を軽減するが,これらの設計戦略による処理効果調整の統計的性質は探求され始めている。本稿では,構造因果モデルのレンズを用いた部分的ネットワークデータを用いて,治療効果の調整を推定・推定するためのフレームワークを提案する。また、部分的ネットワークデータのみを用いて治療を割り当てる手順についても説明し、推定値の分散を最小化するか、最適なシード化を目標とする。我々は、基礎となるグラフモデルに対して、様々な選択肢に適用可能な単一のネットワーク漸近結果を得る。本研究では,インドとマラウイにおける情報拡散と観測グラフのシミュレーション実験によるアプローチの有効性を検証した。 The stable unit treatment value assumption states that the outcome of an individual is not affected by the treatment statuses of others, however in many real world applications, treatments can have an effect on many others beyond the immediately treated. Interference can generically be thought of as mediated through some network structure. In many empirically relevant situations however, complete network data (required to adjust for these spillover effects) are too costly or logistically infeasible to collect. Partially or indirectly observed network data (e.g., subsamples, aggregated relational data (ARD), egocentric sampling, or respondent-driven sampling) reduce the logistical and financial burden of collecting network data, but the statistical properties of treatment effect adjustments from these design strategies are only beginning to be explored. In this paper, we present a framework for the estimation and inference of treatment effect adjustments using partial network data through the lens of structural causal models. 
We also illustrate procedures to assign treatments using only partial network data, with the goal of either minimizing estimator variance or optimally seeding. We derive single network asymptotic results applicable to a variety of choices for an underlying graph model. We validate our approach using simulated experiments on observed graphs with applications to information diffusion in India and Malawi.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
# クロスフューザー:自動車追従軌道予測のためのクロスアテンション変圧器拡張条件拡散モデル Crossfusor: A Cross-Attention Transformer Enhanced Conditional Diffusion Model for Car-Following Trajectory Prediction ( http://arxiv.org/abs/2406.11941v1 ) ライセンス: Link先を確認	Junwei You, Haotian Shi, Keshu Wu, Keke Long, Sicheng Fu, Sikai Chen, Bin Ran,	(参考訳) 自動車軌道予測は、自動運転と高度運転支援システム(ADAS)の進歩に不可欠であり、道路の安全と交通効率を向上させる。従来の手法は基礎的な作業を行っているが、現代のディープラーニング技術、特にトランスフォーマーベースのモデルと生成的アプローチは、車両の動きや交通の相互作用における複雑なパターンや非線形パターンを捉えることによって予測精度を大幅に向上させた。しかし、これらのモデルはしばしば、現実世界の運転シナリオに不可欠な詳細な自動車追従行動や車間相互作用を見落としている。本研究では,自動車追従軌道予測のためのクロスアテンショントランスフォーマー拡張条件拡散モデル(Crossfusor)を提案する。 Crossfusorは、車間相互作用と自動車追従ダイナミクスを堅牢な拡散フレームワークに統合し、予測された軌跡の精度と現実性の両方を改善する。このモデルは、GRU、位置ベースアテンション機構、そしてFourier埋め込みを組み合わせた新しい時間的特徴符号化フレームワークを活用して、歴史的車両力学を捉える。前方拡散過程において、これらの符号化された歴史的特徴によってスケールされたノイズを使用し、逆復調過程において、複雑な車間依存関係をモデル化するために、クロスアテンショントランスフォーマーを使用する。 NGSIMデータセットの実験結果から、クロスファザーは最先端のモデル、特に長期予測において、自律運転システムの予測能力を向上する可能性を示している。 Vehicle trajectory prediction is crucial for advancing autonomous driving and advanced driver assistance systems (ADAS), enhancing road safety and traffic efficiency. While traditional methods have laid foundational work, modern deep learning techniques, particularly transformer-based models and generative approaches, have significantly improved prediction accuracy by capturing complex and non-linear patterns in vehicle motion and traffic interactions. However, these models often overlook the detailed car-following behaviors and inter-vehicle interactions essential for real-world driving scenarios. This study introduces a Cross-Attention Transformer Enhanced Conditional Diffusion Model (Crossfusor) specifically designed for car-following trajectory prediction. Crossfusor integrates detailed inter-vehicular interactions and car-following dynamics into a robust diffusion framework, improving both the accuracy and realism of predicted trajectories. 
The model leverages a novel temporal feature encoding framework combining GRU, location-based attention mechanisms, and Fourier embedding to capture historical vehicle dynamics. It employs noise scaled by these encoded historical features in the forward diffusion process, and uses a cross-attention transformer to model intricate inter-vehicle dependencies in the reverse denoising process. Experimental results on the NGSIM dataset demonstrate that Crossfusor outperforms state-of-the-art models, particularly in long-term predictions, showcasing its potential for enhancing the predictive capabilities of autonomous driving systems.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
# クライアント-ワイズ関係グラフを用いた個人化フェデレーション知識グラフ Personalized Federated Knowledge Graph Embedding with Client-Wise Relation Graph ( http://arxiv.org/abs/2406.11943v1 ) ライセンス: Link先を確認	Xiaoxiong Zhang, Zhiwei Zeng, Xin Zhou, Dusit Niyato, Zhiqi Shen,	(参考訳) Federated Knowledge Graph Embedding (FKGE)は、分散知識グラフから表現表現を抽出する能力と、個々のクライアントのプライバシを同時に保護する能力によって、最近かなりの関心を集めている。既存のFKGEメソッドは通常、すべてのクライアントからのエンティティ埋め込みの算術平均をグローバル補完知識として利用し、各クライアントに対するグローバルコンセンサスエンティティ埋め込みのレプリカを学ぶ。しかしながら、これらの手法は通常、異なるクライアント間の固有の意味的相違を無視する。この監視によって、グローバルに共有される補完的な知識が、特定のクライアントに合わせるとノイズが多すぎるだけでなく、局所的な最適化目標とグローバルな最適化目標の相違も生じます。これにより、学習した埋め込みの品質が損なわれる。これを解決するために,PFedEG(Personalized Federated Knowledge Graph Embedding with client-wise relation Graph)を提案する。具体的には、PFedEGは、クライアントワイド関係グラフ上の「親和性」に基づいて、近隣のクライアントからエンティティを埋め込むことで、各クライアントに対してパーソナライズされた補足的知識を学習する。それぞれのクライアントは、ローカルのトリプルとパーソナライズされた補足的知識に基づいて、パーソナライズされた埋め込み学習を行う。我々は,4つのベンチマークデータセットを用いて,最先端モデルに対する提案手法の評価を行い,本手法の優位性を実証した。 Federated Knowledge Graph Embedding (FKGE) has recently garnered considerable interest due to its capacity to extract expressive representations from distributed knowledge graphs, while concurrently safeguarding the privacy of individual clients. Existing FKGE methods typically harness the arithmetic mean of entity embeddings from all clients as the global supplementary knowledge, and learn a replica of global consensus entities embeddings for each client. However, these methods usually neglect the inherent semantic disparities among distinct clients. This oversight not only results in the globally shared complementary knowledge being inundated with too much noise when tailored to a specific client, but also instigates a discrepancy between local and global optimization objectives. Consequently, the quality of the learned embeddings is compromised. 
To address this, we propose Personalized Federated knowledge graph Embedding with client-wise relation Graph (PFedEG), a novel approach that employs a client-wise relation graph to learn personalized embeddings by discerning the semantic relevance of embeddings from other clients. Specifically, PFedEG learns personalized supplementary knowledge for each client by amalgamating entity embedding from its neighboring clients based on their "affinity" on the client-wise relation graph. Each client then conducts personalized embedding learning based on its local triples and personalized supplementary knowledge. We conduct extensive experiments on four benchmark datasets to evaluate our method against state-of-the-art models and results demonstrate the superiority of our method.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
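The contrast drawn in the PFedEG abstract above, between a single arithmetic mean of entity embeddings shared by all clients and a per-client, affinity-weighted aggregation over neighboring clients, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the array layout, the function names `global_mean` and `personalized_supplement`, the row-normalized affinity matrix, and the simplification that every client holds every entity are all hypothetical, and the abstract does not say how the affinities are obtained.

```python
import numpy as np

def global_mean(client_embs):
    """Baseline: one shared supplement, the arithmetic mean over all clients.

    client_embs has shape (n_clients, n_entities, dim) (assumed layout).
    """
    return client_embs.mean(axis=0)

def personalized_supplement(client_embs, affinity):
    """Hypothetical per-client supplement: affinity-weighted mean of the
    other clients' entity embeddings.

    affinity has shape (n_clients, n_clients); entry [i, j] says how relevant
    client j is to client i. Self-affinity is ignored (an assumption).
    """
    weights = affinity.astype(float).copy()
    np.fill_diagonal(weights, 0.0)                  # aggregate neighbors only
    weights /= weights.sum(axis=1, keepdims=True)   # each row sums to 1
    # result[i] = sum_j weights[i, j] * client_embs[j]
    return np.einsum("ij,jed->ied", weights, client_embs)

rng = np.random.default_rng(0)
embs = rng.normal(size=(3, 4, 2))                   # 3 clients, 4 entities, dim 2
affinity = np.array([[0.0, 0.9, 0.1],
                     [0.9, 0.0, 0.1],
                     [0.5, 0.5, 0.0]])
shared = global_mean(embs)                          # identical for every client
supp = personalized_supplement(embs, affinity)      # one supplement per client
```

In this toy setup, client 0 receives a supplement dominated by client 1, whereas the global mean would weight all three clients equally; that difference is the kind of personalization the abstract describes.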
# トランスコーダによるLLM特徴回路の解釈 Transcoders Find Interpretable LLM Feature Circuits ( http://arxiv.org/abs/2406.11944v1 ) ライセンス: Link先を確認	Jacob Dunefsky, Philippe Chlenski, Neel Nanda,	(参考訳) 機械的解釈可能性の重要なゴールは回路解析であり、特定の振る舞いや能力に対応するモデルのスパース部分グラフを見つけることである。しかし、MLPサブレイヤは変換器ベースの言語モデルにおいて、きめ細かい回路解析を困難にしている。特に、スパースオートエンコーダ(SAE)で見られるような解釈可能な特徴は、通常、非常に多くのニューロンの線形結合であり、それぞれが考慮すべき非線形性を持つ。この設定での回路解析は、引き締まるほど大きな回路を得るか、局所的および大域的挙動を乱すのに失敗する。これを解決するためにトランスコーダを探索し、より広く、疎に活性化するMLP層を忠実に近似する。 120M, 410M, 1.4Bのパラメータを持つ言語モデル上でトランスコーダをトレーニングし, 空間性, 忠実性, 人間の解釈可能性の観点から, 少なくともSAEと同等に動作できることを見出した。次に,重みに基づく回路解析を行うためにトランスコーダを用いた新しい手法を提案する。結果として得られる回路は、入力依存項と入力不変項に適切に分解される。最後に,モデル内の未知回路のリバースエンジニアリングにトランスコーダを適用し,GPT2小形回路の高次回路に関する新たな知見を得る。その結果,トランスコーダはMLPを含むモデル計算を解釈可能な回路に分解するのに有効であることが示唆された。コードはhttps://github.com/jacobdunefsky/transcoder_circuitsで入手できる。 A key goal in mechanistic interpretability is circuit analysis: finding sparse subgraphs of models corresponding to specific behaviors or capabilities. However, MLP sublayers make fine-grained circuit analysis on transformer-based language models difficult. In particular, interpretable features -- such as those found by sparse autoencoders (SAEs) -- are typically linear combinations of extremely many neurons, each with its own nonlinearity to account for. Circuit analysis in this setting thus either yields intractably large circuits or fails to disentangle local and global behavior. To address this we explore transcoders, which seek to faithfully approximate a densely activating MLP layer with a wider, sparsely-activating MLP layer. We successfully train transcoders on language models with 120M, 410M, and 1.4B parameters, and find them to perform at least on par with SAEs in terms of sparsity, faithfulness, and human-interpretability. We then introduce a novel method for using transcoders to perform weights-based circuit analysis through MLP sublayers. The resulting circuits neatly factorize into input-dependent and input-invariant terms. 
Finally, we apply transcoders to reverse-engineer unknown circuits in the model, and we obtain novel insights regarding the greater-than circuit in GPT2-small. Our results suggest that transcoders can prove effective in decomposing model computations involving MLPs into interpretable circuits. Code is available at https://github.com/jacobdunefsky/transcoder_circuits.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
# GAugLLM:大規模言語モデルによるテキスト属性グラフのグラフコントラスト学習の改善 GAugLLM: Improving Graph Contrastive Learning for Text-Attributed Graphs with Large Language Models ( http://arxiv.org/abs/2406.11945v1 ) ライセンス: Link先を確認	Yi Fang, Dongzhe Fan, Daochen Zha, Qiaoyu Tan,	(参考訳) 本研究は,ノードがテキスト属性で表されるテキスト属性グラフ(TAG)の自己教師型グラフ学習について研究する。数値的な特徴空間を摂動させ、グラフの位相構造を変化させる従来のグラフコントラスト法とは異なり、言語による教師信号を通してビュー生成を改善することを目的としている。これは、実際のアプリケーションではテキスト属性が広く見られ、豊富な意味情報によってグラフ構造を補完していることに動機づけられている。しかし、これは2つの大きな理由から課題を提起する。第一に、テキストの属性は長さと品質が多様であり、本来の意味を変えることなく、生のテキスト記述を摂動させることが困難である。第二に、テキスト属性はグラフ構造を補完するが、両者は本質的にうまく整合しているわけではない。ギャップを埋めるために,TAGを増強する新しいフレームワークであるGAugLLMを紹介する。 Mistralのような先進的な大規模言語モデルを活用して、自己教師付きグラフ学習を強化する。具体的には、拡張ノード特徴を生成するための混合プロンプトエキスパート手法を提案する。この手法は、プロンプトエンジニアリングによって生のテキスト属性をそれぞれ書き換える複数のプロンプトエキスパートを、適応的に数値特徴空間へ写像する。さらに、構造的およびテキスト的共通性を活用するための協調的なエッジ修飾器を考案し、ノード間の接続を検査または構築することでエッジ拡張を強化する。さまざまなドメインにまたがる5つのベンチマークデータセットに対する実証的な結果から、プラグインツールとして主要なコントラスト手法の性能を向上させるフレームワークの能力が明らかになった。特に、拡張された特徴とグラフ構造は、標準的な生成的手法や一般的なグラフニューラルネットワークの性能も向上させることが確認された。 GAugLLMのオープンソース実装はGitHubで公開されている。 This work studies self-supervised graph learning for text-attributed graphs (TAGs) where nodes are represented by textual attributes. Unlike traditional graph contrastive methods that perturb the numerical feature space and alter the graph's topological structure, we aim to improve view generation through language supervision. This is driven by the prevalence of textual attributes in real applications, which complement graph structures with rich semantic information. However, this presents challenges for two major reasons. First, text attributes often vary in length and quality, making it difficult to perturb raw text descriptions without altering their original semantic meanings. Second, although text attributes complement graph structures, they are not inherently well-aligned. To bridge the gap, we introduce GAugLLM, a novel framework for augmenting TAGs. 
It leverages advanced large language models like Mistral to enhance self-supervised graph learning. Specifically, we introduce a mixture-of-prompt-expert technique to generate augmented node features. This approach adaptively maps multiple prompt experts, each of which modifies raw text attributes using prompt engineering, into numerical feature space. Additionally, we devise a collaborative edge modifier to leverage structural and textual commonalities, enhancing edge augmentation by examining or building connections between nodes. Empirical results across five benchmark datasets spanning various domains underscore our framework's ability to enhance the performance of leading contrastive methods as a plug-in tool. Notably, we observe that the augmented features and graph structure can also enhance the performance of standard generative methods, as well as popular graph neural networks. The open-sourced implementation of our GAugLLM is available at Github.	翻訳日:2024-06-20 00:36:26 公開日:2024-06-17
# 六方晶窒化ホウ素における荷電空孔の固有高忠実スピン偏極 Intrinsic high-fidelity spin polarization of charged vacancies in hexagonal boron nitride ( http://arxiv.org/abs/2406.11953v1 ) ライセンス: Link先を確認	Wonjae Lee, Vincent S. Liu, Zhelun Zhang, Sangha Kim, Ruotian Gong, Xinyi Du, Khanh Pham, Thomas Poirier, Zeyu Hao, James H. Edgar, Philip Kim, Chong Zu, Emily J. Davis, Norman Y. Yao,	(参考訳) 六方晶窒化ホウ素 (hBN) における負電荷のホウ素空孔 ($\mathrm{V}_{\mathrm{B}}^-$) は2次元材料の欠陥において顕著な注目を集めている。これは部分的には、その決定論的生成、よく特徴づけられた原子構造、室温での光学偏極性に起因している。基底状態と励起状態の両方の偏極ダイナミクスを調べる広範囲な測定により,後者について検討した。これらの測定に基づいて半古典的モデルを構築し、周囲条件下での他の固体スピン欠陥を上回る、1に近いスピン偏極度を予測する。このモデルに基づいて、$\mathrm{V}_{\mathrm{B}}^-$に隣接する核スピンの自由度を取り入れ、核スピンの超微細相互作用誘起偏極を調べるためにリンドブラディアンの包括的な数値計算を行う。我々のシミュレーションは、磁場の関数として現れる多くの重要な特徴を予測し、それらは実験によって裏付けられる。 The negatively charged boron vacancy ($\mathrm{V}_{\mathrm{B}}^-$) in hexagonal boron nitride (hBN) has garnered significant attention among defects in two-dimensional materials. This owes, in part, to its deterministic generation, well-characterized atomic structure, and optical polarizability at room temperature. We investigate the latter through extensive measurements probing both the ground and excited state polarization dynamics. We develop a semiclassical model based on these measurements that predicts a near-unity degree of spin polarization, surpassing other solid-state spin defects under ambient conditions. Building upon our model, we include the presence of nuclear spin degrees of freedom adjacent to the $\mathrm{V}_{\mathrm{B}}^-$ and perform a comprehensive set of Lindbladian numerics to investigate the hyperfine-induced polarization of the nuclear spins. Our simulations predict a number of important features that emerge as a function of magnetic field which are borne out by experiment.	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
# 空洞QED材料に対する線形応答理論 Linear response theory for cavity QED materials ( http://arxiv.org/abs/2406.11957v1 ) ライセンス: Link先を確認	Juan Román-Roche, Álvaro Gómez-León, Fernando Luis, David Zueco,	(参考訳) 空洞QED材料における線形応答理論の厳密な枠組みについて述べる。我々のアプローチは、光と物質の間の集合的な結合を利用して、量子場理論において大きなN理論と平行に描画する。空洞と物質の両方の様々な応答に対する閉公式を導出する。我々の理論は、Dickeモデルと量子ホール効果の確立された結果の回復によって検証される。さらに、空洞がマグノン対を局所状態に結合する量子磁石において、新しい励起が発見される。 We present a rigorous framework for linear response theory in cavity QED materials. Our approach leverages the collective coupling between light and matter, drawing parallels with large-N theories in quantum field theory. We derive closed formulas for various responses of both the cavity and the matter. Our theory is validated by recovering established results for the Dicke model and the Quantum Hall Effect. Additionally, we discover novel excitations in quantum magnets, where the cavity binds magnon pairs into localized states.	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
# Stripping Quantum Decision Diagrams of their Identity Stripping Quantum Decision Diagrams of their Identity ( http://arxiv.org/abs/2406.11959v1 ) ライセンス: Link先を確認	Aaron Sander, Ioan-Albert Florea, Lukas Burgholzer, Robert Wille,	(参考訳) ベクトルや行列としての量子状態と演算の古典的な表現は、メモリの指数関数的な増加と、システムサイズを増やすための実行時要求に悩まされている。古典コンピューティングでの使用に基づいて、決定図(Decision Diagrams, DD)と呼ばれる別のデータ構造が提案されており、よりコンパクトな表現とより効率的な計算を提供することが多い。古典的な領域では、何十年にもわたってDDの研究が行われ、特定の用途に適した多くのバリエーションが存在する。しかし、量子コンピューティングのためのDDは生まれたばかりであり、この新しい技術に合わせる余地はまだ残っている。特に、既存のDDの表現は、アイデンティティ行列を表すノードによって拡張され、量子回路内の全ての操作をシステムサイズに拡張する必要がある。本研究では、これらのアイデンティティ構造を量子演算から取り除くことにより、量子DDにとって重要な一歩を踏み出す。これにより、それらを表現するために必要なノードの数を大幅に削減し、実装の主要なビルディングブロックに対するプレッシャーを緩和する。その結果、量子コンピューティングにはより自然な構造が得られ、現状に比べて最大70倍のランタイム改善が達成され、計算処理が大幅に高速化される。 Classical representations of quantum states and operations as vectors and matrices are plagued by an exponential growth in memory and runtime requirements for increasing system sizes. Based on their use in classical computing, an alternative data structure known as Decision Diagrams (DDs) has been proposed, which, in many cases, provides both a more compact representation and more efficient computation. In the classical realm, decades of research have been conducted on DDs and numerous variations tailored for specific applications exist. However, DDs for quantum computing are just in their infancy and there is still room for tailoring them to this new technology. In particular, existing representations of DDs require extending all operations in a quantum circuit to the full system size through extension by nodes representing identity matrices. In this work, we make an important step forward for quantum DDs by stripping these identity structures from quantum operations. This significantly reduces the number of nodes required to represent them as well as eases the pressure on key building blocks of their implementation. 
As a result, we obtain a structure that is more natural for quantum computing and significantly speeds up computations, with a runtime improvement of up to 70x compared to the state of the art.	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
# スパース非エルミートSYKモデルにおける特異値相関による量子カオスの探索 Probing quantum chaos through singular-value correlations in sparse non-Hermitian SYK model ( http://arxiv.org/abs/2406.11969v1 ) ライセンス: Link先を確認	Pratik Nandy, Tanay Pathak, Masaki Tezuka,	(参考訳) 特異値分解を用いて,スパース非エルミートSachdev-Ye-Kitaev(SYK)モデルにおける特異値のスペクトルに着目した。非エルミート系の典型的な複素固有値とは異なり、特異値は本質的に実かつ正である。以上の結果から,特異値の統計と、対応するエルミート・ガウスアンサンブルの統計との一致が明らかとなった。スパース性が増すと、非エルミートSYKモデルはカオス的挙動から逸脱し、この現象は特異値比によって正確に捉えられる。スペクトル形状因子 (SFF) に類似した特異形状因子 ($\sigma$FF) の解析は, スパース性の増大に伴う線形ランプの消失を示す。さらに、エルミート系のスペクトル複雑性に着想を得た特異複雑性を定義し、その飽和がスパース性の臨界しきい値を与えることを示す。このような崩壊は、非エルミート系に対する既存のホログラフィック双対の破綻と関係している可能性が高い。 Utilizing singular value decomposition, our investigation focuses on the spectrum of the singular values within a sparse non-Hermitian Sachdev-Ye-Kitaev (SYK) model. Unlike the complex eigenvalues typical of non-Hermitian systems, singular values are inherently real and positive. Our findings reveal a congruence between the statistics of singular values and those of the analogous Hermitian Gaussian ensembles. An increase in sparsity results in the non-Hermitian SYK model deviating from its chaotic behavior, a phenomenon precisely captured by the singular value ratios. Our analysis of the singular form factor ($\sigma$FF), analogous to the spectral form factor (SFF), indicates the disappearance of the linear ramp with increased sparsity. Additionally, we define singular complexity, inspired by the spectral complexity in Hermitian systems, whose saturation provides a critical threshold of sparseness. Such disintegration is likely associated with the breakdown of the existing holographic dual for non-Hermitian systems.	翻訳日:2024-06-20 00:26:41 publication placeholder
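The "singular value ratios" mentioned in the abstract above are, in the usual spectral-statistics sense, ratios of adjacent gaps between sorted levels. A minimal sketch of that diagnostic follows. It is an assumption-laden illustration rather than the paper's code: a dense complex Gaussian matrix stands in for the sparse non-Hermitian SYK Hamiltonian, and the function name `singular_value_ratios` is made up for the example.

```python
import numpy as np

def singular_value_ratios(matrix):
    """Return adjacent-gap ratios min(g_n, g_{n+1}) / max(g_n, g_{n+1}),
    where g_n are the gaps between neighboring sorted singular values."""
    sv = np.sort(np.linalg.svd(matrix, compute_uv=False))
    gaps = np.diff(sv)
    small = np.minimum(gaps[:-1], gaps[1:])
    large = np.maximum(gaps[:-1], gaps[1:])
    return small / large

rng = np.random.default_rng(0)
n = 200
# Dense complex Gaussian matrix: a stand-in for a non-Hermitian Hamiltonian.
h = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
ratios = singular_value_ratios(h)
mean_ratio = ratios.mean()
```

As a rough guide from the random-matrix literature, the mean ratio is about 0.39 for uncorrelated (Poisson) levels and about 0.60 for GUE-like level repulsion, so a drift of `mean_ratio` toward the lower value would be one way to see the departure from chaotic behavior that the abstract attributes to increasing sparsity.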
# キャビティQED材料:任意の光-物質結合強度における2つの線形応答理論の比較と検証 Cavity QED materials: Comparison and validation of two linear response theories at arbitrary light-matter coupling strengths ( http://arxiv.org/abs/2406.11971v1 ) ライセンス: Link先を確認	Juan Román-Roche, Álvaro Gómez-León, Fernando Luis, David Zueco,	(参考訳) 共振器と共振器とを結合した材料に対する線形応答理論を開発し, 対称性破壊相を含む光物質結合のすべての状態に有効である。我々は2つの異なるアプローチを提示し比較する。まず、分割関数に対するコヒーレントパス積分定式化を用いて熱グリーン関数を得る。このアプローチは作用のサドル点展開に依存しており、熱力学の極限で切り離すことができる。第二に、グリーン関数の運動方程式を定式化し、それらを解く。我々は、閉可解方程式系を得るために、高階グリーン関数の平均場分離を用いる。どちらの手法もキャビティと材料に対する応答関数の計算において同じ結果をもたらす。これらは素空洞と物質応答の点で得られる。この2つの手法は, 相関した光物質系における平均場分離の有効性を明らかにし, 熱力学的限界に対する有限サイズ補正を補完する手段を提供する。この理論は、長波長近似において、キャビティQED材料分野において一般的に考慮されるほとんどのシステムを含む一般的なモデルのために定式化されている。最後に、量子ホール効果と磁気モデルの収集にこの理論の詳細な応用を与える。解析的および有限サイズの正確な対角化結果に対する予測を検証する。 We develop a linear response theory for materials collectively coupled to a cavity that is valid in all regimes of light-matter coupling, including symmetry-broken phases. We present and compare two different approaches. First, using a coherent path integral formulation for the partition function to obtain thermal Green functions. This approach relies on a saddle point expansion for the action, that can be truncated in the thermodynamic limit. Second, by formulating the equations of motion for the retarded Green functions and solving them. We use a mean-field decoupling of high-order Green functions in order to obtain a closed, solvable system of equations. Both approaches yield identical results in the calculation of response functions for the cavity and material. These are obtained in terms of the bare cavity and material responses. In combination, the two techniques clarify the validity of a mean-field decoupling in correlated light-matter systems and provide complementary means to compute finite-size corrections to the thermodynamic limit. 
The theory is formulated for a general model that encompasses most of the systems typically considered in the field of cavity QED materials, within a long-wavelength approximation. Finally, we provide a detailed application of the theory to the Quantum Hall effect and to a collection of magnetic models. We validate our predictions against analytical and finite-size exact-diagonalization results.	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
# 演算子に基づく量子熱力学不確実性関係 Operator-based quantum thermodynamic uncertainty relations ( http://arxiv.org/abs/2406.11974v1 ) ライセンス: Link先を確認	Pratik Sathe, Luis Pedro García-Pintos, Francesco Caravelli,	(参考訳) 粒子の位置と運動量の不確かさを結びつけるハイゼンベルクの不確実性関係は、物理系の量子的挙動に重要な影響を及ぼす。この原理に動機づけられ、仕事、熱、内部エネルギーに関連する熱力学的な流れが、よく定義されたエルミート作用素によって記述されることを提案する。すなわち、量子熱力学的な流れに物理的観測量を対応づける。まず, 原理的には, 平均的な熱力学的な流れの速度を計算するために, 単点測定を行うことが可能であることを示す。これらの速度、すなわち流れは、対応する作用素の非可換性のため、古典的なものと異なる。 Robertson-Schrödingerの不確実性関係を用いて、それらの間の様々な熱力学的不確実性関係を得る。特に、熱流率と熱力学的仕事率のゆらぎを内部エネルギーのゆらぎと結びつける。さらに、この手法を量子電池に適用することにより、エネルギーと仕事率の不確実性関係を導出し、測定がゆらぎにどのように影響するかを示す。 The Heisenberg uncertainty relation, which links the uncertainties of the position and momentum of a particle, has an important footprint on the quantum behavior of a physical system. Motivated by this principle, we propose that thermodynamic currents associated with work, heat, and internal energy are described by well-defined Hermitian operators; i.e., we associate physical observables to quantum thermodynamic flows. First, we show that, in principle, it is possible to perform single-point measurements to compute average thermodynamic rates. These rates, or currents, differ from their classical counterparts due to the non-commutativity of the corresponding operators. Using the Robertson-Schrödinger uncertainty relation, we then obtain various thermodynamic uncertainty relationships between them. In particular, we connect the fluctuations in heat rate and thermodynamic power with those in internal energy. We further illustrate this approach by applying it to quantum batteries, where we derive an energy-power uncertainty relationship and show how measurements affect the fluctuations.	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
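For orientation, the Robertson-Schrödinger relation invoked in the abstract above has the following standard textbook form for two Hermitian operators $\hat A$ and $\hat B$, with variances $\sigma_A^2$ and $\sigma_B^2$ taken in a given state. The abstract does not spell out which thermodynamic operators are substituted for $\hat A$ and $\hat B$, so the symbols here are generic placeholders.

```latex
\sigma_A^2\,\sigma_B^2 \;\ge\;
\left|\tfrac{1}{2}\langle\{\hat A,\hat B\}\rangle
      - \langle\hat A\rangle\langle\hat B\rangle\right|^2
+ \left|\tfrac{1}{2i}\langle[\hat A,\hat B]\rangle\right|^2
```

The second term vanishes when the two operators commute, which is consistent with the abstract's remark that the quantum rates differ from their classical counterparts because the corresponding operators do not commute.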
# 視覚的文法誘導モデルを用いた共同推論としての言語ブートストラップ Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models ( http://arxiv.org/abs/2406.11977v1 ) ライセンス: Link先を確認	Eva Portelance, Siva Reddy, Timothy J. O'Donnell,	(参考訳) 意味的・統語的ブートストラッピング・ポジトリ(Semantic and Syntactic bootstrapping posit)とは、子供が特定の言語領域についての事前の知識、例えば構文的関係(syntactic relations)を使い、後に新しい単語の意味などの他の知識を取得する手助けをするものである。両理論を裏付ける実証的な結果は、これらが互いに先行する学習戦略であると考える誘惑を招きかねない。ここでは、両者が言語習得のためのより一般的な学習戦略、すなわち共同学習に精通していると論じる。一連の視覚的文法帰納モデルを用いて,構文と意味が同時に学習された場合に,構文的および意味的ブートストラップ効果が最強であることが実証された。共同学習は、より良い文法誘導、現実的な語彙カテゴリー学習、新しい文と動詞の意味のより良い解釈をもたらす。共同学習は、構文と意味論の両方の仮説空間を相互に制約することで、学習者にとって言語習得を容易にする。多くの入力源とモダリティに対する共同推論のダイナミクスを研究することは、認知科学とAIの両方における言語モデリングと学習研究にとって重要な新しい方向性であり、より制約のある学習環境で言語をどのように獲得できるかを説明するのに役立つかもしれない。 Semantic and syntactic bootstrapping posit that children use their prior knowledge of one linguistic domain, say syntactic relations, to help later acquire another, such as the meanings of new words. Empirical results supporting both theories may tempt us to believe that these are different learning strategies, where one may precede the other. Here, we argue that they are instead both contingent on a more general learning strategy for language acquisition: joint learning. Using a series of neural visually-grounded grammar induction models, we demonstrate that both syntactic and semantic bootstrapping effects are strongest when syntax and semantics are learnt simultaneously. Joint learning results in better grammar induction, realistic lexical category learning, and better interpretations of novel sentence and verb meanings. Joint learning makes language acquisition easier for learners by mutually constraining the hypotheses spaces for both syntax and semantics. 
Studying the dynamics of joint inference over many input sources and modalities represents an important new direction for language modeling and learning research in both cognitive sciences and AI, as it may help us explain how language can be acquired in more constrained learning settings.	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
# 対話行動トークン:マルチターンプランナを用いたゴール指向対話におけるステアリング言語モデル Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner ( http://arxiv.org/abs/2406.11978v1 ) ライセンス: Link先を確認	Kenneth Li, Yiming Wang, Fernanda Viégas, Martin Wattenberg,	(参考訳) 本稿では,対話行動トークン(DAT)と呼ばれる言語モデルエージェントを用いて,目標指向対話を計画する手法を提案する。中心となる考え方は、各発話をアクションとして扱うことで、強化学習のような既存のアプローチを適用することができるゲームに対話を変換することである。具体的には、事前訓練された言語モデルを凍結し、各ラウンドで制御された生成に使用される連続的な行動ベクトルを予測する小さなプランナーモデルを訓練する。この設計は、報酬最適化の下での言語劣化の問題を回避している。ソーシャルシミュレーションのためのSotopiaプラットフォーム上での評価では、DATステアリングされたLLaMAモデルがGPT-4の性能を上回っている。また, DATを用いて, 新たなマルチターン・リピート・セッティングにおいて, 攻撃言語モデルを操り, 潜在的に新たな攻撃面を明らかにする。 We present an approach called Dialogue Action Tokens (DAT) that adapts language model agents to plan goal-directed dialogues. The core idea is to treat each utterance as an action, thereby converting dialogues into games where existing approaches such as reinforcement learning can be applied. Specifically, we freeze a pretrained language model and train a small planner model that predicts a continuous action vector, used for controlled generation in each round. This design avoids the problem of language degradation under reward optimization. When evaluated on the Sotopia platform for social simulations, the DAT-steered LLaMA model surpasses GPT-4's performance. We also apply DAT to steer an attacker language model in a novel multi-turn red-teaming setting, revealing a potential new attack surface.	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
# 二次元量子イジングモデルにおける制約力学と閉じ込め Constrained dynamics and confinement in the two-dimensional quantum Ising model ( http://arxiv.org/abs/2406.11979v1 ) ライセンス: Link先を確認	Luka Pavešić, Daniel Jaschke, Simone Montangero,	(参考訳) 2次元正方格子上の量子イジングモデルのダイナミクスを$16 \times 16$スピンまで調べる。秩序相では, モデルが動的に制約されたダイナミクスを示すことが予測され, 素励起の閉じ込めと遅い熱化が生じる。閉じ込めのシグネチャを実証した後, 反対向きの磁化をもつ領域からなる積状態の急激なクエンチを通じて, 制約領域における界面のダイナミクスを調べる。その結果, 励起の性質は, 閉じ込め領域全体にわたって摂動論によって捉えられることがわかり, 非閉じ込め領域へのクロスオーバーを特定した。平坦な界面に沿って伝播するモードに対する横磁場の影響を系統的に調べ、より大きな格子に埋め込まれた反転スピンの正方形領域について、共鳴的融解から拡散的融解へのクロスオーバーを調べる。 We investigate the dynamics of the quantum Ising model on two-dimensional square lattices up to $16 \times 16$ spins. In the ordered phase, the model is predicted to exhibit dynamically constrained dynamics, leading to confinement of elementary excitations and slow thermalization. After demonstrating the signatures of confinement, we probe the dynamics of interfaces in the constrained regime through sudden quenches of product states with domains of opposite magnetization. We find that the nature of excitations can be captured by perturbation theory throughout the confining regime, and identify the crossover to the deconfining regime. We systematically explore the effect of the transverse field on the modes propagating along flat interfaces and investigate the crossover from resonant to diffusive melting of a square of flipped spins embedded in a larger lattice.	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
# 計算社会科学課題におけるプロンプト設計の課題 : 予測不可能な方法で Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways ( http://arxiv.org/abs/2406.11980v1 ) ライセンス: Link先を確認	Shubham Atreja, Joshua Ashkinaze, Lingyao Li, Julia Mendelsohn, Libby Hemphill,	(参考訳) 計算社会科学のタスクに手動でアノテートするデータは、コストがかかり、時間がかかり、感情的に排水される可能性がある。最近の研究は、LCMがゼロショット設定でこのようなアノテーションタスクを実行できることを示唆しているが、設計がLCMのコンプライアンスと正確性にどのように影響するかは分かっていない。モデル選択(ChatGPT, PaLM2, Falcon7b)と設計特徴(定義包含, 出力タイプ, 説明, 即時長)が, 4つのCSSタスク(毒性, 感情, 噂姿勢, ニュースフレーム)におけるLCM生成アノテーションの適合性と正確性に与える影響を, 大規模マルチプロンプト実験により検証した。以上の結果から,LSMのコンプライアンスと精度は極めて素早い依存性があることが示唆された。例えば、ラベルの代わりに数値スコアを求めると、全てのLLMのコンプライアンスと精度が低下する。全体的な最高のプロンプト設定はタスク依存であり、マイナーなプロンプト変更は生成されたラベルの配布に大きな変更をもたらす可能性がある。迅速な設計がLLM生成アノテーションの品質と配布に大きな影響を与えることを示すことで、この研究は研究者や実践者にとって警告と実践のガイドとなる。 Manually annotating data for computational social science tasks can be costly, time-consuming, and emotionally draining. While recent work suggests that LLMs can perform such annotation tasks in zero-shot settings, little is known about how prompt design impacts LLMs' compliance and accuracy. We conduct a large-scale multi-prompt experiment to test how model selection (ChatGPT, PaLM2, and Falcon7b) and prompt design features (definition inclusion, output type, explanation, and prompt length) impact the compliance and accuracy of LLM-generated annotations on four CSS tasks (toxicity, sentiment, rumor stance, and news frames). Our results show that LLM compliance and accuracy are highly prompt-dependent. For instance, prompting for numerical scores instead of labels reduces all LLMs' compliance and accuracy. The overall best prompting setup is task-dependent, and minor prompt changes can cause large changes in the distribution of generated labels. By showing that prompt design significantly impacts the quality and distribution of LLM-generated annotations, this work serves as both a warning and practical guide for researchers and practitioners.	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
# アクティブ推論を用いた複雑なタスクに対するオンラインパレート最適決定法 Online Pareto-Optimal Decision-Making for Complex Tasks using Active Inference ( http://arxiv.org/abs/2406.11984v1 ) ライセンス: Link先を確認	Peter Amorese, Shohei Wakayama, Nisar Ahmed, Morteza Lahijanian,	(参考訳) ロボットが複雑なタスクを自律的に実行する場合、安全を維持しながら競合する目標をバランスさせなければならない。これは確率的な結果を持つ不確実な環境ではより困難になる。また,ロボットの動作の透明性向上とユーザの好みとの整合も重要である。本稿では,タスク実行の安全性を確保し,目的間のトレードオフを最適化し,ユーザの好みに従う多目的強化学習のための新しいフレームワークを提案する。フレームワークには、多目的タスクプランナとハイレベルセレクタの2つの主なレイヤがある。計画層は、時相論理タスクの充足を保証する最適なトレードオフ計画の集合を生成する。セレクタはアクティブ推論を使用して、生成された計画のうちどれがユーザの好みに最も適合し、学習を助けるかを決定する。フレームワークは反復的に動作し、収集したデータに基づいてパラメータ化された学習モデルを更新する。マニピュレーションロボットと移動ロボットの両方におけるケーススタディとベンチマークにより、本フレームワークが他の手法を上回り、(i) 複数の最適なトレードオフを学習し、(ii) ユーザの好みに従い、(iii) (i) と (ii) のバランスをユーザが調整できることを示す。 When a robot autonomously performs a complex task, it frequently must balance competing objectives while maintaining safety. This becomes more difficult in uncertain environments with stochastic outcomes. Enhancing transparency in the robot's behavior and aligning with user preferences are also crucial. This paper introduces a novel framework for multi-objective reinforcement learning that ensures safe task execution, optimizes trade-offs between objectives, and adheres to user preferences. The framework has two main layers: a multi-objective task planner and a high-level selector. The planning layer generates a set of optimal trade-off plans that guarantee satisfaction of a temporal logic task. The selector uses active inference to decide which generated plan best complies with user preferences and aids learning. Operating iteratively, the framework updates a parameterized learning model based on collected data. Case studies and benchmarks on both manipulation and mobile robots show that our framework outperforms other methods and (i) learns multiple optimal trade-offs, (ii) adheres to a user preference, and (iii) allows the user to adjust the balance between (i) and (ii).	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
# テキスト・画像モデルにおける地理的格差の分解評価 Decomposed evaluations of geographic disparities in text-to-image models ( http://arxiv.org/abs/2406.11988v1 ) ライセンス: Link先を確認	Abhishek Sureddy, Dishant Padalia, Nandhinee Periyakaruppa, Oindrila Saha, Adina Williams, Adriana Romero-Soriano, Megan Richards, Polina Kirichenko, Melissa Hall,	(参考訳) 近年の研究では、家や車といった日常の物体の立体的な描写を含む、異なる地理的領域の生成された画像において、かなりの差異が特定されている。しかし、これらの不一致に対する既存の対策は、時間と費用のかかる人間の評価に限られているか、あるいはフルイメージを評価する自動測定に限られており、これらの不一致は生成された画像の特定の部分に比例できない。本研究では,画像生成における対象と背景の描写における地理的差異を別々に計測することのできる,画像生成における特徴の分解指標(Decomposed Indicators of Disparities in Image Generation, Decomposed-DIG)を提案する。 Decomposed-DIGを用いて、広く使われている潜伏拡散モデルを評価し、生成した画像は背景よりも写実性の良い物体を描写し、生成した画像の背景は物体よりも地域差が大きい傾向があることを発見した。私たちはDecomposed-DIGを使って、アフリカのステレオタイプな背景生成、アフリカの近代的な車両の生成に苦労し、屋外設定にいくつかのオブジェクトを非現実的に配置するなど、相違点の具体例を特定します。測定値にインフォームされた新たなプロンプト構造を用いることで,52%の最低領域改善と,20%のバックグラウンドの多様性向上を実現している。 Recent work has identified substantial disparities in generated images of different geographic regions, including stereotypical depictions of everyday objects like houses and cars. However, existing measures for these disparities have been limited to either human evaluations, which are time-consuming and costly, or automatic metrics evaluating full images, which are unable to attribute these disparities to specific parts of the generated images. In this work, we introduce a new set of metrics, Decomposed Indicators of Disparities in Image Generation (Decomposed-DIG), that allows us to separately measure geographic disparities in the depiction of objects and backgrounds in generated images. Using Decomposed-DIG, we audit a widely used latent diffusion model and find that generated images depict objects with better realism than backgrounds and that backgrounds in generated images tend to contain larger regional disparities than objects. 
We use Decomposed-DIG to pinpoint specific examples of disparities, such as stereotypical background generation in Africa, struggling to generate modern vehicles in Africa, and unrealistically placing some objects in outdoor settings. Informed by our metric, we use a new prompting structure that enables a 52% worst-region improvement and a 20% average improvement in generated background diversity.	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
# ニューラルシーケンスモデルの遅延埋め込み理論 Delay Embedding Theory of Neural Sequence Models ( http://arxiv.org/abs/2406.11993v1 ) ライセンス: Link先を確認	Mitchell Ostrow, Adam Eisen, Ila Fiete,	(参考訳) コヒーレント応答を生成するために、言語モデルは入力されたテキストシーケンスから観測されていない意味を推測する。この能力の1つの潜在的説明は力学系における遅延埋め込みの理論から生じ、これは観測されていない変数が観測された少数の変数の歴史から復元可能であることを証明している。言語モデルが遅延埋め込みを効果的に構築しているかどうかをテストするために,シーケンスモデルの容量を測定し,観測されていないダイナミクスを再構築する。我々は、ノイズのある部分的に観測された時系列データから次のステップ予測に基づいて、1層トランスフォーマーデコーダと状態空間シーケンスモデルを訓練した。その結果、各シーケンス層は、基礎となるシステムの実行可能な埋め込みを学習できることがわかった。しかし、状態空間モデルはトランスよりも誘導バイアスが強く、特に初期化時に観測されていない情報を効果的に再構成し、パラメータ効率の良いモデルがより多くなり、動的タスクのエラーも小さくなる。そこで本研究は,遅延埋め込み理論による動的システムと深層学習シーケンスモデルとの新たな関係を定めている。 To generate coherent responses, language models infer unobserved meaning from their input text sequence. One potential explanation for this capability arises from theories of delay embeddings in dynamical systems, which prove that unobserved variables can be recovered from the history of only a handful of observed variables. To test whether language models are effectively constructing delay embeddings, we measure the capacities of sequence models to reconstruct unobserved dynamics. We trained 1-layer transformer decoders and state-space sequence models on next-step prediction from noisy, partially-observed time series data. We found that each sequence layer can learn a viable embedding of the underlying system. However, state-space models have a stronger inductive bias than transformers-in particular, they more effectively reconstruct unobserved information at initialization, leading to more parameter-efficient models and lower error on dynamics tasks. Our work thus forges a novel connection between dynamical systems and deep learning sequence models via delay embedding theory.	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
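The delay-embedding idea referred to in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `delay_embed` and its parameters `dim` and `lag` are hypothetical, chosen only to show how a scalar time series is stacked into delay vectors from which unobserved state may be recoverable.

```python
def delay_embed(series, dim, lag):
    """Build delay vectors [x(t), x(t+lag), ..., x(t+(dim-1)*lag)].

    Hypothetical helper for illustration only; `dim` is the embedding
    dimension and `lag` the spacing between the stacked samples.
    """
    n_vectors = len(series) - (dim - 1) * lag
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and lag")
    return [
        [series[t + k * lag] for k in range(dim)]
        for t in range(n_vectors)
    ]


# A short observed series turned into 3-dimensional delay vectors.
observed = [0, 1, 2, 3, 4]
embedded = delay_embed(observed, dim=3, lag=1)
```

Each row of `embedded` is one point in the reconstructed state space. A sequence model that builds a good delay embedding would, roughly speaking, carry the same information in its hidden state.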
# 入力なしで全ての2部エンタングル状態のネットワークステアビリティを検証する Witnessing network steerability of every bipartite entangled state without inputs ( http://arxiv.org/abs/2406.11994v1 ) ライセンス: Link先を確認	Shubhayan Sarkar,	(参考訳) エンタングルメントと非局所性は、ほとんどの量子情報プロトコルの鍵となるリソースである。すべてのエンタングル状態が半量子ゲームに関して非局所であることは、現在よく確立されている。しかし、あらゆるエンタングル状態について何らかの形の非局所性を観測するために使える証人(witness)は不足している。本研究では,入力を伴わないスワップステアリングのシナリオに注目し,任意のNPT(負の部分転置)2部状態と,計算可能クロスノルム(CCN)基準を破る広いクラスの2部状態に対応するネットワークステアビリティの線形証人を求める。さらに、信頼された当事者が入ってくるサブシステムのトモグラフィーを行えると仮定して、すべての2部エンタングル状態のスワップステアビリティを検証する線形不等式を構築する。したがって、すべての2部エンタングル状態に対して、ある形の量子非局所性を観測できる。 Entanglement and nonlocality are key resources for most quantum information protocols. It is now well established that every entangled state is nonlocal with respect to semi-quantum games. However, there is a lack of witnesses that can be used to observe any form of nonlocality of every entangled state. In this work, we focus on the swap-steering scenario without inputs and find linear witnesses of network steerability corresponding to any negative partial transpose (NPT) bipartite state and a large class of bipartite states that violate the computable cross-norm (CCN) criterion. Furthermore, by considering that the trusted party can perform tomography of the incoming subsystems, we construct linear inequalities to witness swap-steerability of every bipartite entangled state. Consequently, for every bipartite entangled state one can now observe a form of quantum nonlocality.	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
# 量子コンピューティングのゲーミフィケーションを探る:Qubit Factory Exploring Gamification in Quantum Computing: The Qubit Factory ( http://arxiv.org/abs/2406.11995v1 ) ライセンス: Link先を確認	Glen Evenbly,	(参考訳) 量子論のゲーミフィケーションは、ユーザーが明らかに量子的な振る舞いを示すシミュレーションされた世界を体験できるようにすることで、量子現象の直観を構築することができる。 Qubit Factory(クビットファクトリー)は、量子ビットと量子コンピューティングの入門を提供するために設計された、ゲーミフィケーション量子回路シミュレータに基づくエンジニアリングスタイルのパズルゲームである。量子状態、ゲート、回路を表現するための直感的な視覚言語を導入し、可視化を支援するアニメーションによってさらに強化された。 Qubit Factoryは、ユーザが解決するのがますます難しいタスクの階層を示しており、各タスクは、少数のコンポーネントから構築された適切な古典/量子回路を構築し、実行する必要がある。初期のタスクは、量子ビット、量子ゲート、重ね合わせ、絡み合いの基本をカバーしていた。その後のタスクでは、超高密度符号化、量子テレポーテーション、絡み合った蒸留、古典的および量子的誤り補正、状態トモグラフィー、バーンスタイン・ヴァジラニアルゴリズム、量子リピータなどの重要な量子アルゴリズムとプロトコルをカバーしている。 Gamification of quantum theory can provide new inroads into the subject: by allowing users to experience simulated worlds that manifest obvious quantum behaviors they can potentially build intuition for quantum phenomena. The Qubit Factory is an engineering-style puzzle game based on a gamified quantum circuit simulator that is designed to provide an introduction to qubits and quantum computing, while being approachable to those with no prior background in the area. It introduces an intuitive visual language for representing quantum states, gates and circuits, further enhanced by animations to aid in visualization. The Qubit Factory presents a hierarchy of increasingly difficult tasks for the user to solve, where each task requires the user to construct and run an appropriate classical/quantum circuit built from a small selection of components. Earlier tasks cover the fundamentals of qubits, quantum gates, superpositions and entanglement. Later tasks cover important quantum algorithms and protocols including superdense coding, quantum teleportation, entanglement distillation, classical and quantum error correction, state tomography, the Bernstein-Vazirani algorithm, quantum repeaters and more.	翻訳日:2024-06-20 00:26:41 公開日:2024-06-17
# 今後の展望:経路計画におけるGPT-4の限界試験 Look Further Ahead: Testing the Limits of GPT-4 in Path Planning ( http://arxiv.org/abs/2406.12000v1 ) ライセンス: Link先を確認	Mohamed Aghzal, Erion Plaku, Ziyu Yao,	(参考訳) 大きな言語モデル(LLM)は、様々なタスクで印象的な機能を示している。しかし、長期計画では依然として課題に直面している。そこで本研究では,LLMの長い軌道を幾何的制約の下でナビゲートする能力を評価するためのプラットフォームとして,経路計画タスクを提案する。提案するベンチマークは,複雑な環境でのパス計画スキルを体系的にテストする。これを用いて, GPT-4の様々なタスク表現とプロンプトアプローチを用いて, 計画能力について検討した。フレーミングはPythonのコードとして促進され、長い軌道上のタスクを分解することで、GPT-4の経路計画の有効性が向上することがわかった。しかしながら、これらの手法はモデルの計画能力向上へのいくつかの期待を示すが、最適経路は得られず、拡張された地平線上での一般化に失敗する。 Large Language Models (LLMs) have shown impressive capabilities across a wide variety of tasks. However, they still face challenges with long-horizon planning. To study this, we propose path planning tasks as a platform to evaluate LLMs' ability to navigate long trajectories under geometric constraints. Our proposed benchmark systematically tests path-planning skills in complex settings. Using this, we examined GPT-4's planning abilities using various task representations and prompting approaches. We found that framing prompts as Python code and decomposing long trajectory tasks improve GPT-4's path planning effectiveness. However, while these approaches show some promise toward improving the planning ability of the model, they do not obtain optimal paths and fail at generalizing over extended horizons.	翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
# 疫学のモビリティに基づく比較モデルにおけるモデリング・推論・予測 Modeling, Inference, and Prediction in Mobility-Based Compartmental Models for Epidemiology ( http://arxiv.org/abs/2406.12002v1 ) ライセンス: Link先を確認	Ning Jiang, Weiqi Chu, Yao Li,	(参考訳) 疫学における古典的な区画モデルは、人口の固有の異質性に対処できないために、現実世界のダイナミクスを正確に捉えるのに苦労することが多い。本稿では,移動変数による不均一性を取り入れた新しい手法を導入し,従来のODE系を,異なる区画にまたがる人口密度の動態を記述する積分微分方程式系に変換する。以上の結果から, 人口密度がディラックデルタ関数として表現される古典的コンパートメントモデルと比較して, 移動量に基づくモデルでは, 最終パンデミックサイズが小さいことが示唆された。これは、多くの古典的モデルに共通する過大評価問題に対処する。さらに,感染集団の時系列は,移動分布を一意に同定するのに十分であることを示した。我々は,この分布を機械学習ベースのフレームワークを用いて再構築し,実世界のデータによる移動性に基づくモデルを効果的に制約する理論的およびアルゴリズム的サポートを提供する。 Classical compartmental models in epidemiology often struggle to accurately capture real-world dynamics due to their inability to address the inherent heterogeneity of populations. In this paper, we introduce a novel approach that incorporates heterogeneity through a mobility variable, transforming the traditional ODE system into a system of integro-differential equations that describe the dynamics of population densities across different compartments. Our results show that, for the same basic reproduction number, our mobility-based model predicts a smaller final pandemic size compared to classic compartmental models, whose population densities are represented as Dirac delta functions in our density-based framework. This addresses the overestimation issue common in many classical models. Additionally, we demonstrate that the time series of the infected population is sufficient to uniquely identify the mobility distribution. We reconstruct this distribution using a machine-learning-based framework, providing both theoretical and algorithmic support to effectively constrain the mobility-based model with real-world data.	翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
# P3GNN: ソフトウェア定義ネットワークにおけるAPT検出のためのプライバシ保護プロバンスグラフベースモデル P3GNN: A Privacy-Preserving Provenance Graph-Based Model for APT Detection in Software Defined Networking ( http://arxiv.org/abs/2406.12003v1 ) ライセンス: Link先を確認	Hedyeh Nazari, Abbas Yazdinejad, Ali Dehghantanha, Fattane Zarrinkalam, Gautam Srivastava,	(参考訳) Software Defined Networking (SDN)は、ネットワーク管理とプログラム可能性に大きな進歩をもたらした。しかし、この進化はAdvanced Persistent Threats (APTs) の脆弱性も高めており、特にゼロデイエクスプロイト(英語版)に直面した場合、従来の検出方法がしばしば対応できない、洗練された、ステルス的なサイバー攻撃が起きている。一般的な問題は、協調学習シナリオにおけるデータプライバシの懸念に対処しながら、新たな脅威を検出する既存の戦略が不十分であることだ。本稿では,P3GNN(プライバシ保存グラフベースグラフニューラルネットワークモデル)を提案する。これはSDN環境で効果的なAPT検出のために,フェデレートラーニング(FL)とグラフ畳み込みネットワーク(GCN)を併用する新しいモデルである。 P3GNNは教師なし学習を利用して、プロファイランスグラフ内の運用パターンを分析し、セキュリティ違反を示す偏差を識別する。その中核となる機能は、FLと同型暗号化の統合であり、コラボレーティブラーニング時のデータの機密性や整合性を強化している。このアプローチは、共有学習コンテキストにおけるデータのプライバシに関する重要な課題に対処する。 P3GNNの主なイノベーションは、前兆グラフ内のノードレベルで異常を検出する機能、攻撃軌跡の詳細なビューの提供、セキュリティ解析の強化である。さらに、教師なし学習能力により、標準的な運用パターンを学習することで、ゼロデイ攻撃を識別できる。 DARPA TCE3データセットを用いた実験的な評価は、P3GNNの例外的な性能を示し、精度は0.93、偽陽性率は0.06である。 Software Defined Networking (SDN) has brought significant advancements in network management and programmability. However, this evolution has also heightened vulnerability to Advanced Persistent Threats (APTs), sophisticated and stealthy cyberattacks that traditional detection methods often fail to counter, especially in the face of zero-day exploits. A prevalent issue is the inadequacy of existing strategies to detect novel threats while addressing data privacy concerns in collaborative learning scenarios. This paper presents P3GNN (privacy-preserving provenance graph-based graph neural network model), a novel model that synergizes Federated Learning (FL) with Graph Convolutional Networks (GCN) for effective APT detection in SDN environments. P3GNN utilizes unsupervised learning to analyze operational patterns within provenance graphs, identifying deviations indicative of security breaches. 
Its core feature is the integration of FL with homomorphic encryption, which fortifies data confidentiality and gradient integrity during collaborative learning. This approach addresses the critical challenge of data privacy in shared learning contexts. Key innovations of P3GNN include its ability to detect anomalies at the node level within provenance graphs, offering a detailed view of attack trajectories and enhancing security analysis. Furthermore, the model's unsupervised learning capability enables it to identify zero-day attacks by learning standard operational patterns. Empirical evaluation using the DARPA TCE3 dataset demonstrates P3GNN's exceptional performance, achieving an accuracy of 0.93 and a low false positive rate of 0.06.	翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
# Lexidate: モデル評価とLexicaseによる選択 Lexidate: Model Evaluation and Selection with Lexicase ( http://arxiv.org/abs/2406.12006v1 ) ライセンス: Link先を確認	Jose Guadalupe Hernandez, Anil Kumar Saini, Jason H. Moore,	(参考訳) 機械学習の自動化は、モデルトレーニング、評価、選択を自動化することによって、効果的な機械学習パイプラインを見つけるタスクを合理化する。クロスバリデーション(CV)のような従来の評価戦略は、パイプラインの予測の精度を平均する1つの値を生成する。しかし、この単一の値はパイプラインの一般化可能性を完全に記述していないかもしれない。本稿では,複数の独立予測値を用いたレキシケードに基づく検証手法(レキシケート)を提案する。 Lexidateはトレーニングデータを学習セットと選択セットに分割する。パイプラインは学習セットでトレーニングされ、選択セットで予測される。予測は正確性のために評価され、親パイプラインを識別するために語彙選択によって使用される。 10倍のCVと比較して、レキシケースはトレーニング時間を短縮する。 6つのOpenML分類タスクに対して,Tree-based Pipeline Optimization Tool 2 (TPOT2)パッケージ内の3つの語彙構成の有効性を検証した。 1つの構成では, TPOT2から返される最終モデルの精度は10倍CVに比べ, 差は認められなかった。ここで研究されたすべての構成は、10倍のCVと比較すると、ほぼ同じまたはより複雑な最終パイプラインを返した。 Automated machine learning streamlines the task of finding effective machine learning pipelines by automating model training, evaluation, and selection. Traditional evaluation strategies, like cross-validation (CV), generate one value that averages the accuracy of a pipeline's predictions. This single value, however, may not fully describe the generalizability of the pipeline. Here, we present Lexicase-based Validation (lexidate), a method that uses multiple, independent prediction values for selection. Lexidate splits training data into a learning set and a selection set. Pipelines are trained on the learning set and make predictions on the selection set. The predictions are graded for correctness and used by lexicase selection to identify parent pipelines. Compared to 10-fold CV, lexicase reduces the training time. We test the effectiveness of three lexidate configurations within the Tree-based Pipeline Optimization Tool 2 (TPOT2) package on six OpenML classification tasks. In one configuration, we detected no difference in the accuracy of the final model returned from TPOT2 on most tasks compared to 10-fold CV. All configurations studied here returned similar or less complex final pipelines compared to 10-fold CV.	
翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
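The selection step described in the Lexidate abstract above relies on lexicase selection. The following is a minimal sketch of standard lexicase selection, assuming a 0/1 correctness matrix on the selection set. The function name and data layout are hypothetical, and this is not the TPOT2 implementation.

```python
import random


def lexicase_select(scores, rng):
    """Pick one parent index by lexicase selection.

    scores[i][j] is 1 if pipeline i predicted selection-set sample j
    correctly and 0 otherwise (an assumed layout for this sketch).
    """
    candidates = list(range(len(scores)))
    cases = list(range(len(scores[0])))
    rng.shuffle(cases)  # each selection event uses a fresh case order
    for case in cases:
        best = max(scores[i][case] for i in candidates)
        # Keep only the candidates that are best on this case.
        candidates = [i for i in candidates if scores[i][case] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)


# Pipeline 0 is correct on every sample, so it survives every filter.
scores = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
]
parent = lexicase_select(scores, random.Random(0))
```

Because each case filters the candidates independently, a pipeline that is correct on a sample few others get right can still be selected, which is the property a single averaged CV score cannot express.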
# トラップイオン量子プロセッサによる小型ディジット画像の2値化 Supervised binary classification of small-scale digits images with a trapped-ion quantum processor ( http://arxiv.org/abs/2406.12007v1 ) ライセンス: Link先を確認	Ilia V. Zalivako, Alexander I. Gircha, Anastasiia S. Nikolaeva, Denis A. Drozhzhin, Alexander S. Borisenko, Andrei E. Korolkov, Nikita V. Semenin, Kristina P. Galstyan, Pavel A. Kamenskikh, Vasilii N. Smirnov, Mikhail A. Aksenov, Pavel L. Sidorov, Evgeniy O. Kiktenko, Ksenia Yu. Khabarova, Aleksey K. Fedorov, Nikolay N. Kolachevsky, Ilya A. Semerikov,	(参考訳) 本稿では、基本量子機械学習アルゴリズムを実行することにより、捕捉された$^{171}$Yb$^{+}$イオンに基づく量子プロセッサのベンチマーク結果を示す。具体的には、最大4キュービットの量子化されたサポートベクトルマシンアルゴリズムを用いて、100%精度で分類できるように意図的に選択された小型桁画像の教師付きバイナリ分類を行う。本研究では、データセットの異なる種類の量子符号化と、対応する量子回路に対する異なるレベルのトランスパイル最適化について検討する。各量子符号化において、トレーニングセットとテストセットの両方で100%精度の高い分類器が得られ、量子プロセッサが考慮された基本的な分類タスクを正しく解くことができることを示す。期待通り、量子プロセッサの能力が向上すれば、機械学習の有用なツールになるでしょう。 Here we present the results of benchmarking of a quantum processor based on trapped $^{171}$Yb$^{+}$ ions by performing basic quantum machine learning algorithms. Specifically, we carry out a supervised binary classification of small-scale digits images, which are intentionally chosen so that they can be classified with 100% accuracy, using a quantum-enhanced Support Vector Machine algorithm with up to four qubits. In our work, we specifically consider different types of quantum encodings of the dataset and different levels of transpilation optimizations for the corresponding quantum circuits. For each quantum encoding, we obtain a classifier that is of 100% accuracy on both training and test sets, which demonstrates that the quantum processor can correctly solve the basic classification task considered. As we expect, with the increase of the capabilities quantum processors, they can become a useful tool for machine learning.	翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
# QC-Forest: ランダムフォレストの再トレーニングを高速化する古典的量子アルゴリズム QC-Forest: a Classical-Quantum Algorithm to Provably Speedup Retraining of Random Forest ( http://arxiv.org/abs/2406.12008v1 ) ライセンス: Link先を確認	Romina Yalovetzky, Niran Kumar, Changhao Li, Marco Pistoia,	(参考訳) ランダムフォレスト(Random Forest, RF)は、教師あり学習法として人気があり、使いやすさと柔軟性で評価されている。オンラインRFモデルは、モデルの精度を維持するために、新しいトレーニングデータを考慮する必要がある。これは、自動運転システムやクレジットカード支払いのようなデータストリームにおいて、データが周期的に、連続的に生成されるアプリケーションにおいて特に重要である。この設定では、時間とともにデータ分布のドリフトが完全に捕捉されるので、古いデータと新しいデータが蓄積された周期的モデルの再トレーニングを行うのが有益である。しかし、これは、蓄積されたサンプル数と線形にスケールするため、RFの最先端の古典的アルゴリズムでは実用的ではない。 QC-Forestは,マルチクラス分類と回帰のためのストリーミング設定において,RFモデルを時間効率よく再学習するように設計された古典量子アルゴリズムである。 QC-Forestは、Kumarらによって提案された単一木構築と再訓練のための量子アルゴリズムであるDes-qを活用し、元の提案はバイナリクラスに限定されていたため、マルチクラス分類に拡張し、同じ多対数依存を維持しながら、基礎となる量子サブルーチンを有限エラーに置き換える正確な古典的な方法を導入した。最後に、QC-Forestは、最大80,000のサンプルを持つ広く使用されているベンチマークデータセットの最先端RF手法と比較して、競合精度を向上し、モデル再トレーニングを大幅に高速化することを示した。 Random Forest (RF) is a popular tree-ensemble method for supervised learning, prized for its ease of use and flexibility. Online RF models require to account for new training data to maintain model accuracy. This is particularly important in applications were data is periodically and sequentially generated over time in data streams, such as auto-driving systems, and credit card payments. In this setting, performing periodic model retraining with the old and new data accumulated is beneficial as it fully captures possible drifts in the data distribution over time. However, this is unpractical with state-of-the-art classical algorithms for RF as they scale linearly with the accumulated number of samples. We propose QC-Forest, a classical-quantum algorithm designed to time-efficiently retrain RF models in the streaming setting for multi-class classification and regression, achieving a runtime poly-logarithmic in the total number of accumulated samples. QC-Forest leverages Des-q, a quantum algorithm for single tree construction and retraining proposed by Kumar et al. 
by extending it to multi-class classification, as the original proposal was limited to binary classes, and by introducing an exact classical method to replace an underlying quantum subroutine that incurs a finite error, while maintaining the same poly-logarithmic dependence. Finally, we showcase that QC-Forest achieves competitive accuracy in comparison to state-of-the-art RF methods on widely used benchmark datasets with up to 80,000 samples, while significantly speeding up model retraining.	翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
# FinTruthQA:財務情報開示の品質評価のためのベンチマークデータセット FinTruthQA: A Benchmark Dataset for Evaluating the Quality of Financial Information Disclosure ( http://arxiv.org/abs/2406.12009v1 ) ライセンス: Link先を確認	Ziyue Xu, Peilin Zhou, Xinyu Shi, Jiageng Wu, Yikang Jiang, Bin Ke, Jie Yang,	(参考訳) 正確な透明性のある財務情報開示は、会計と金融の分野で重要であり、市場効率と投資家の信頼を確実にする。多くの情報開示プラットフォームの中で、中国証券取引所の投資家インタラクティブプラットフォームは、オンラインのQ&Aフォーマットを通じて、上場企業が投資家に興味のある情報を開示するための、新しくてインタラクティブな方法を提供する。しかし、上場企業では、限定的あるいは実質的な情報のない質問に回答することが一般的であり、大量のQ&A対の財務情報開示の質を自動評価することは困難である。本稿では、金融Q&Aデータにおける情報開示の自動品質評価のための高度な自然言語処理(NLP)技術を評価するためのベンチマークFinTruthQAを構築する。 FinTruthQAは6000の現実世界の財務Q&Aエントリで構成され、各Q&Aは4つの概念的側面に基づいて手動で注釈付けされた。我々は、FinTruthQA上で、統計的機械学習モデル、事前学習言語モデルとその微調整バージョン、および大規模言語モデルGPT-4を含む様々なNLPテクニックをベンチマークした。実験の結果,既存のNLPモデルは質問認識や質問関連タスクに強い予測能力を持つが,回答関連性や回答可読性タスクには最適であることがわかった。このベンチマークを確立することで、情報開示の自動評価のための堅牢な基盤を提供し、財務報告の透明性と品質を大幅に向上させる。 FinTruthQAは監査人、規制当局、金融アナリストがリアルタイム監視やデータ駆動意思決定に利用でき、また会計と金融の高度な研究のための研究者も利用でき、最終的には金融市場の信頼と効率を高めることができる。 Accurate and transparent financial information disclosure is crucial in the fields of accounting and finance, ensuring market efficiency and investor confidence. Among many information disclosure platforms, the Chinese stock exchanges' investor interactive platform provides a novel and interactive way for listed firms to disclose information of interest to investors through an online question-and-answer (Q&A) format. However, it is common for listed firms to respond to questions with limited or no substantive information, and automatically evaluating the quality of financial information disclosure on large amounts of Q&A pairs is challenging. This paper builds a benchmark FinTruthQA, that can evaluate advanced natural language processing (NLP) techniques for the automatic quality assessment of information disclosure in financial Q&A data. FinTruthQA comprises 6,000 real-world financial Q&A entries and each Q&A was manually annotated based on four conceptual dimensions of accounting. 
We benchmarked various NLP techniques on FinTruthQA, including statistical machine learning models, pre-trained language models and their fine-tuned versions, as well as the large language model GPT-4. Experiments showed that existing NLP models have strong predictive ability for real question identification and question relevance tasks, but are suboptimal for answer relevance and answer readability tasks. By establishing this benchmark, we provide a robust foundation for the automatic evaluation of information disclosure, significantly enhancing the transparency and quality of financial reporting. FinTruthQA can be used by auditors, regulators, and financial analysts for real-time monitoring and data-driven decision-making, as well as by researchers for advanced studies in accounting and finance, ultimately fostering greater trust and efficiency in the financial markets.	翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
# AIフェアネスのためのトランスダクティブアプローチのメリットとリスク The Benefits and Risks of Transductive Approaches for AI Fairness ( http://arxiv.org/abs/2406.12011v1 ) ライセンス: Link先を確認	Muhammed Razzak, Andreas Kirsch, Yarin Gal,	(参考訳) 近年,機械学習モデルの速度,精度,公平性を向上する可能性から,学習中にホールドアウトセットを利用するトランスダクティブ学習法が人気を集めている。それにもかかわらず、ホールドアウト集合の構成そのもの、特に敏感な部分群のバランスは、ほとんど見過ごされてきている。 CIFARとCelebAデータセットを用いた実験により、ホールドアウトセットの組成変化がフェアネス指標に大きく影響することが示された。不均衡なホールトアウトセットは、既存の格差を悪化させ、バランスの取れたホールトアウトは、不均衡なトレーニングデータによってもたらされる問題を緩和する。これらの知見は,多様かつ代表的であるホールドアウトセットの構築の必要性を浮き彫りにしている。 Recently, transductive learning methods, which leverage holdout sets during training, have gained popularity for their potential to improve speed, accuracy, and fairness in machine learning models. Despite this, the composition of the holdout set itself, particularly the balance of sensitive sub-groups, has been largely overlooked. Our experiments on CIFAR and CelebA datasets show that compositional changes in the holdout set can substantially influence fairness metrics. Imbalanced holdout sets exacerbate existing disparities, while balanced holdouts can mitigate issues introduced by imbalanced training data. These findings underline the necessity of constructing holdout sets that are both diverse and representative.	翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
# 注意シンクによる大規模言語モデル量子化のためのアクティベーションアウトレイラの緩和 Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization ( http://arxiv.org/abs/2406.12016v1 ) ライセンス: Link先を確認	Seungwoo Son, Wonpyo Park, Woohyun Han, Kyuyeun Kim, Jaeho Lee,	(参考訳) LLM量子化の最近の進歩にもかかわらず、アクティベーション量子化は、アクティベーションアウトレーヤのために困難である。従来の改善、例えば、異なるチャネルの精度の混合、追加のオーバーヘッドの導入、スピードアップの削減。本研究では,問題トークンの発生を防止し,アクティベーション単位の量子化を促進するための簡易かつ効果的な戦略を開発する。正確には、プレフィックスとして挿入された後続のトークンの外部化を緩和するCushionCacheという、キー値キャッシュのセットを見つける方法を提案する。 CushionCacheは2つのステップで動作します。まず最初に、後続のトークンにおける最大アクティベーション値を最小限に抑えるプロンプトトークンシーケンスを探します。次に、トークンキャッシュを調整して、その後のトークンのアクティベーションを、より量子化しやすいように調整する。提案手法は, LLMのアクティベーション・アウトレイラに対処し, アクティベーション・量子化法の性能向上に寄与する。我々は,この手法を広範囲のモデルとベンチマークで徹底的に評価し,拡張子ごとのW8A8量子化の確立されたベースラインをはるかに上回り,最近のアクティベーション量子化法とシームレスに統合できることを見出した。 Despite recent advances in LLM quantization, activation quantization remains to be challenging due to the activation outliers. Conventional remedies, e.g., mixing precisions for different channels, introduce extra overhead and reduce the speedup. In this work, we develop a simple yet effective strategy to facilitate per-tensor activation quantization by preventing the generation of problematic tokens. Precisely, we propose a method to find a set of key-value cache, coined CushionCache, which mitigates outliers in subsequent tokens when inserted as a prefix. CushionCache works in two steps: First, we greedily search for a prompt token sequence that minimizes the maximum activation values in subsequent tokens. Then, we further tune the token cache to regularize the activations of subsequent tokens to be more quantization-friendly. The proposed method successfully addresses activation outliers of LLMs, providing a substantial performance boost for per-tensor activation quantization methods. 
We thoroughly evaluate our method over a wide range of models and benchmarks and find that it significantly surpasses the established baseline of per-tensor W8A8 quantization and can be seamlessly integrated with the recent activation quantization method.	翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
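To make concrete why activation outliers hurt per-tensor quantization, here is a small illustrative sketch. It is not CushionCache and not the authors' code. It only assumes plain symmetric per-tensor quantization with a single scale, and all names are hypothetical.

```python
def quantize_per_tensor(values, bits=8):
    """Symmetric per-tensor quantize-dequantize with one shared scale."""
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax
    if scale == 0.0:
        return [0.0 for _ in values]
    return [round(v / scale) * scale for v in values]


def mean_abs_error(original, restored):
    """Average absolute reconstruction error over the original entries."""
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Well-behaved activations in [-1, 1].
activations = [0.1 * i for i in range(-10, 11)]
clean_error = mean_abs_error(activations, quantize_per_tensor(activations))

# The same activations plus one large outlier sharing the same scale.
restored = quantize_per_tensor(activations + [100.0])
outlier_error = mean_abs_error(activations, restored[: len(activations)])
```

A single outlier stretches the shared scale, so the ordinary activations collapse onto a handful of quantization levels and `outlier_error` comes out far larger than `clean_error`. Suppressing such outliers is what makes per-tensor schemes like W8A8 viable.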
# スプライシングイテレーションによる空間制約最適化 Sparsity-Constraint Optimization via Splicing Iteration ( http://arxiv.org/abs/2406.12017v1 ) ライセンス: Link先を確認	Zezhi Wang, Jin Zhu, Junxian Zhu, Borui Tang, Hongmei Lin, Xueqin Wang,	(参考訳) スパシティ制約最適化は、信号処理、統計処理、機械学習に広く適用可能である。既存の高速アルゴリズムは、ステップサイズや正確な停止基準の実装などのパラメータをかなり調整しなければならない。この問題に対処するため、sPlicing itEration (SCOPE) を用いて、低次元部分空間における強い凸性と滑らか性を持つ非線形微分対象関数を最適化するスペーサ性制約最適化アルゴリズムを開発した。アルゴリズム的には、SCOPEアルゴリズムはパラメータをチューニングせずに効率的に収束する。理論的には、SCOPE は線型収束率を持ち、空間が正しく特定されたときに真の支持集合を回復する解に収束する。また,制約等距離-固有型条件を伴わない並列理論結果も開発する。 SCOPEの汎用性とパワーを適用し、スパース2次最適化を解き、スパース分類器を学習し、バイナリ変数のスパースマルコフネットワークを復元する。これらの特定のタスクに関する数値結果から、SCOPEは10-1000の精度で真のサポートセットを正しく識別し、SCOPEのアルゴリズム的および理論的利点を確認する。 C++実装に基づいたオープンソースのPythonパッケージのskscopeがGitHubで公開されており、cvxpyライブラリによって実装された競合する凸緩和メソッドの10倍のスピードアップに達しています。 Sparsity-constraint optimization has wide applicability in signal processing, statistics, and machine learning. Existing fast algorithms must burdensomely tune parameters, such as the step size or the implementation of precise stop criteria, which may be challenging to determine in practice. To address this issue, we develop an algorithm named Sparsity-Constraint Optimization via sPlicing itEration (SCOPE) to optimize nonlinear differential objective functions with strong convexity and smoothness in low dimensional subspaces. Algorithmically, the SCOPE algorithm converges effectively without tuning parameters. Theoretically, SCOPE has a linear convergence rate and converges to a solution that recovers the true support set when it correctly specifies the sparsity. We also develop parallel theoretical results without restricted-isometry-property-type conditions. We apply SCOPE's versatility and power to solve sparse quadratic optimization, learn sparse classifiers, and recover sparse Markov networks for binary variables. 
The numerical results on these specific tasks reveal that SCOPE perfectly identifies the true support set with a 10- to 1000-fold speedup over the standard exact solver, confirming SCOPE's algorithmic and theoretical merits. Our open-source Python package skscope, based on a C++ implementation, is publicly available on GitHub and reaches a ten-fold speedup over competing convex relaxation methods implemented with the cvxpy library.	翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
# CItruS:ロングシーケンスモデリングのためのチャンクインストラクション対応状態推定 CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling ( http://arxiv.org/abs/2406.12018v1 ) ライセンス: Link先を確認	Yu Bai, Xiyuan Zou, Heyan Huang, Sanxing Chen, Marc-Antoine Rondeau, Yang Gao, Jackie Chi Kit Cheung,	(参考訳) 大規模言語モデル(LLM)が進歩し続け、長いシーケンスモデリングが広く関心を集めている。近年の研究では、Transformerモデルのキー値キャッシュ内の隠れ状態の大部分を、長いシーケンスを生成する際のパープレキシティのパフォーマンスに影響を与えることなく、破棄(除去)することができることが確認されている。しかし,これらの手法は,難易度を保ちながら,下流課題の解決に重要な情報を降ろすことがしばしばある。この問題に対処するために、下流タスクに有用な注目度を隠蔽状態の消去プロセスに統合する新しいモデリング手法であるChunked Instruction-Aware State Eviction (CItruS)を紹介する。さらに,チャンクシーケンス処理の効率向上のための手法を設計する。トレーニング不要な手法は,言語モデリングの難易度を保ちながら,同じメモリ予算の下で,複数の強いベースライン上での長いシーケンス理解および検索タスクにおいて優れた性能を示す。 Long sequence modeling has gained broad interest as large language models (LLMs) continue to advance. Recent research has identified that a large portion of hidden states within the key-value caches of Transformer models can be discarded (also termed evicted) without affecting the perplexity performance in generating long sequences. However, we show that these methods, despite preserving perplexity performance, often drop information that is important for solving downstream tasks, a problem which we call information neglect. To address this issue, we introduce Chunked Instruction-aware State Eviction (CItruS), a novel modeling technique that integrates the attention preferences useful for a downstream task into the eviction process of hidden states. In addition, we design a method for chunked sequence processing to further improve efficiency. Our training-free method exhibits superior performance on long sequence comprehension and retrieval tasks over several strong baselines under the same memory budget, while preserving language modeling perplexity.	翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
# 暗号化されたワイヤレス電力をハッキング:ダイナミック充電のサイバーセキュリティ Hacking Encrypted Wireless Power: Cyber-Security of Dynamic Charging ( http://arxiv.org/abs/2406.12019v1 ) ライセンス: Link先を確認	Hui Wang, Nima Tashakor, Wei Jiang, Wei Liu, C. Q. Jiang, Stefan M. Goetz,	(参考訳) 近年,無線電力伝送のためのエネルギー暗号化が開発されており,公共の場所では未許可のエネルギー抽出を抑制することが重要である。ほとんどの技術は周波数に変化があり、非共鳴のため、許可されていない受信機はエネルギーを抽出できない。しかし、この戦略は信頼できない。エネルギー暗号化技術の進歩を刺激し,セキュリティホールを指摘するために,暗号化周波数可変無線電力伝送の基本原理に対する復号法を提案する。本論文は、補助コイルを用いて周波数を検出するとともに、スイッチトキャパシタアレイを用いて広い周波数範囲の受信機を適応的に補償する。スイッチングキャパシタアレイは、2つのキャパシタと1つのセミコンダクタスイッチを含む。 1つのコンデンサは受信機を常に補償し、もう1つの無線電力伝達サイクル中のアクティブな時間はスイッチによって制御される。このようにして、提案したハッキング受信機は、補償の等価容量を制御し、エネルギーを盗む。最後に、詳細なシミュレーションモデルと実験結果により、周波数ホッピングエネルギー暗号化に対する攻撃の有効性が証明された。抽出された非無視エネルギーは問題となるだろうが、認証された受信機が得るエネルギーの78%から84%を盗むことに成功した。周波数が変わると、インターセプターは非常に急速に調整され、高速な周波数変化暗号化システムにハックされる。 Recently, energy encryption for wireless power transfer has been developed for energy safety, which is important in public places to suppress unauthorized energy extraction. Most techniques vary the frequency so that unauthorized receivers cannot extract energy because of non-resonance. However, this strategy is unreliable. To stimulate the progress of energy encryption technology and point out security holes, this paper proposes a decryption method for the fundamental principle of encrypted frequency-varying wireless power transfer. The paper uses an auxiliary coil to detect the frequency and a switched-capacitor array to adaptively compensate the receiver for a wide frequency range. The switched-capacitor array contains two capacitors and one semi-conductor switch. One capacitor compensates the receiver all the time while the other's active time during one wireless power transfer cycle is regulated by the switch. Thus, the proposed hacking receiver controls the equivalent capacitance of the compensation and steals energy. 
Finally, a detailed simulation model and experimental results demonstrate the effectiveness of the attack on frequency-hopping energy encryption. Although any non-negligible amount of extracted energy would be problematic, we managed to steal 78% to 84% of the energy an authorized receiver could get. When the frequency changes, the interceptor is coarsely tuned very quickly, so it can hack fast frequency-varying encrypted systems.	翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
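As a rough illustration of why an intercepting receiver has to retune its compensation whenever the transmitter hops frequency, the sketch below computes the capacitance that resonates with a receiver coil at each frequency, using the standard LC resonance relation f = 1 / (2π√(LC)). The coil inductance and the hop frequencies are made-up values, not figures from the paper, and the paper's switched-capacitor control law is not reproduced here.

```python
import math


def resonant_capacitance(freq_hz: float, inductance_h: float) -> float:
    """Return the capacitance (farads) that resonates with inductance_h at freq_hz."""
    omega = 2.0 * math.pi * freq_hz
    return 1.0 / (omega ** 2 * inductance_h)


# Hypothetical values chosen only for illustration.
COIL_INDUCTANCE_H = 100e-6
HOP_FREQUENCIES_HZ = [80e3, 85e3, 90e3]

# Compensation capacitance the receiver would need at each hop frequency.
required = {f: resonant_capacitance(f, COIL_INDUCTANCE_H) for f in HOP_FREQUENCIES_HZ}

for freq, cap in required.items():
    print(f"{freq / 1e3:.0f} kHz -> {cap * 1e9:.2f} nF")
```

The required capacitance falls as the frequency rises, which is why a fixed compensation capacitor leaves an unauthorized receiver off resonance after a hop, and why an adjustable equivalent capacitance defeats that protection.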
# Boxがタグ認識のレコメンデーションでグラフニューラルネットワークと出会う When Box Meets Graph Neural Network in Tag-aware Recommendation ( http://arxiv.org/abs/2406.12020v1 ) ライセンス: Link先を確認	Fake Lin, Ziwei Zhao, Xi Zhu, Da Zhang, Shitian Shen, Xueying Li, Tong Xu, Suojuan Zhang, Enhong Chen,	(参考訳) 昨年は、LLM強化タグがサポートしているタグ対応レコメンデーションシステムの再構築を目撃している。残念ながら、大きな努力がなされているが、現在のソリューションでは、タグ駆動プロファイルのみを使用して、ユーザの好みに固有の多様性と不確実性を記述できない可能性がある。近年, ボックス埋め込みなどの幾何学的手法の開発により, 高次元空間におけるボックス内の範囲として, ユーザの好みの多様性を完全にモデル化できるようになった。しかし、これらの手法は高階隣の信号、すなわちユーザ・タグ・イテム三部グラフ内のセマンティック・リッチなマルチホップ関係をキャプチャできないため、欠陥は依然として存在し、ユーザ・モデリングの有効性は著しく制限される。この課題に対処するため、我々はBoxGNNと呼ばれる新しいアルゴリズムを提案し、論理演算を組み合わせてメッセージアグリゲーションを行い、高次信号を組み込む。具体的には、まずユーザ、アイテム、タグを表現空間の単純なポイントではなくハイパーボックスとして埋め込み、その後のプロセスを促進するために2つの論理演算を定義する。次に、論理演算の組み合わせによりメッセージ集約機構を実行し、対応する高階ボックス表現を得る。最後に,ボックスの表現を洗練させるために,Gumbelスムース化技術を用いたボリュームベース学習手法を採用する。 2つの公開データセットと1つのLLM強化eコマースデータセットに関する大規模な実験は、さまざまな最先端ベースラインと比較してBoxGNNの優位性を検証した。コードはオンラインでリリースされます Last year has witnessed the re-flourishment of tag-aware recommender systems supported by the LLM-enriched tags. Unfortunately, though large efforts have been made, current solutions may fail to describe the diversity and uncertainty inherent in user preferences with only tag-driven profiles. Recently, with the development of geometry-based techniques, e.g., box embedding, diversity of user preferences now could be fully modeled as the range within a box in high dimension space. However, defect still exists as these approaches are incapable of capturing high-order neighbor signals, i.e., semantic-rich multi-hop relations within the user-tag-item tripartite graph, which severely limits the effectiveness of user modeling. To deal with this challenge, in this paper, we propose a novel algorithm, called BoxGNN, to perform the message aggregation via combination of logical operations, thereby incorporating high-order signals. 
Specifically, we first embed users, items, and tags as hyper-boxes rather than simple points in the representation space, and define two logical operations to facilitate the subsequent process. Next, we perform message aggregation via a combination of these logical operations to obtain the corresponding high-order box representations. Finally, we adopt a volume-based learning objective with Gumbel smoothing techniques to refine the representation of boxes. Extensive experiments on two publicly available datasets and one LLM-enhanced e-commerce dataset have validated the superiority of BoxGNN compared with various state-of-the-art baselines. The code is released online.	翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
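To make the idea of box embeddings and a volume-based score more concrete, here is a minimal sketch with axis-aligned boxes. It uses hard min/max intersection rather than the Gumbel-smoothed version mentioned in the abstract, and the `Box` class, the choice of intersection as the example operation, and the overlap-ratio score are all illustrative assumptions, not BoxGNN's actual definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned hyper-box given by its lower and upper corners."""
    lower: List[float]
    upper: List[float]


def intersect(a: Box, b: Box) -> Box:
    """Hard intersection: the largest lower corner and the smallest upper corner."""
    lower = [max(x, y) for x, y in zip(a.lower, b.lower)]
    upper = [min(x, y) for x, y in zip(a.upper, b.upper)]
    return Box(lower, upper)


def volume(box: Box) -> float:
    """Product of side lengths; an empty box (upper < lower) has volume 0."""
    vol = 1.0
    for lo, hi in zip(box.lower, box.upper):
        vol *= max(hi - lo, 0.0)
    return vol


# Hypothetical 2-D boxes for one user and one item.
user = Box([0.0, 0.0], [2.0, 2.0])
item = Box([1.0, 1.0], [3.0, 3.0])

overlap = volume(intersect(user, item))
score = overlap / volume(item)  # fraction of the item box covered by the user box
print(overlap, score)
```

With hard boxes the volume is exactly zero for disjoint boxes, so it gives no gradient; that is the usual motivation for smoothing techniques such as the Gumbel variant the abstract refers to.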
# 強化学習によるアセストラル組換えグラフの構築 Constructing Ancestral Recombination Graphs through Reinforcement Learning ( http://arxiv.org/abs/2406.12022v1 ) ライセンス: Link先を確認	Mélanie Raymond, Marie-Hélène Descary, Cédric Beaulac, Fabrice Larribe,	(参考訳) 長年にわたり、祖先の組換えグラフ(ARG)を構築するために多くのアプローチが提案されてきた。これらの方法の中で、最も可能性の高いグラフが最短であるという仮定に頼っているものが多い。本稿では,短いARG(Reinforcement Learning: RL)を構築するための新しいアプローチを提案する。我々は,一組の遺伝的配列とそれらの最も最近の共通の祖先の最も短い経路を見つけることと,迷路の入り口と出口の間の最も短い経路を見つけることと,古典的なRL問題との類似性を生かした。迷路問題では、学習者はエージェントと呼ばれ、できるだけ早く脱出するために取るべき方向を学ばなければならないが、この問題では、エージェントはできるだけ早く最新の共通の祖先に到達するために、合理化、突然変異、組換えの間の行動を学ぶ必要がある。以上の結果から,RLは短いARGを構築するために最適化されたヒューリスティックアルゴリズムで構築されたARGと同等に短時間で構築できることが示唆された。さらに,本手法では,与えられたサンプルに対して短いARGの分布を構築することができ,学習プロセス中に使用されていない新しいサンプルに学習を一般化することができる。 Over the years, many approaches have been proposed to build ancestral recombination graphs (ARGs), graphs used to represent the genetic relationship between individuals. Among these methods, many rely on the assumption that the most likely graph is among the shortest ones. In this paper, we propose a new approach to build short ARGs: Reinforcement Learning (RL). We exploit the similarities between finding the shortest path between a set of genetic sequences and their most recent common ancestor and finding the shortest path between the entrance and exit of a maze, a classic RL problem. In the maze problem, the learner, called the agent, must learn the directions to take in order to escape as quickly as possible, whereas in our problem, the agent must learn the actions to take between coalescence, mutation, and recombination in order to reach the most recent common ancestor as quickly as possible. Our results show that RL can be used to build ARGs as short as those built with a heuristic algorithm optimized to build short ARGs, and sometimes even shorter. Moreover, our method allows to build a distribution of short ARGs for a given sample, and can also generalize learning to new samples not used during the learning process.	
翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
# LiLiuM:eBayのeコマースのための大規模言語モデル LiLiuM: eBay's Large Language Models for e-commerce ( http://arxiv.org/abs/2406.12023v1 ) ライセンス: Link先を確認	Christian Herold, Michael Kozielski, Leonid Ekimov, Pavel Petrushkov, Pierre-Yves Vandenbussche, Shahram Khadivi,	(参考訳) 1B、7B、13Bパラメータモデルは、eBayのeコマース領域における特定のニーズに合うように、100%社内で開発された。これにより、eBayは、ライセンス、データ、語彙、アーキテクチャを含むモデルのすべての側面を完全にコントロールできる。これらのモデルは、細調整と命令チューニングの基礎として使われ、外部モデルへの依存をなくすことを期待しています。 LiLiuM LLMは、一般およびeコマースドメインから3兆個の多言語テキストのトークンで訓練されている。それらは、英語の自然言語理解(NLU)ベンチマークで人気のあるLLaMA-2モデルと似ている。同時に、非英語NLUタスク、機械翻訳、電子商取引特化下流タスクにおいてLLaMA-2を上回ります。データミックスの一部として、新たにリリースされたRedPajama-V2データセットを使用して、データのフィルタリングと重複に関する洞察を共有します。また,自己回帰言語モデリングにおける構造化データのシリアライズ方法についても詳細に論じる。事前学習におけるコードと並列機械翻訳データの影響について考察する。さらに,電子商取引用にカスタマイズされた独自のトークンとモデル語彙を開発する。これにより、LLaMA-2と比較してeBay固有のダウンストリームタスクでテキスト生成を最大34%高速化できます。最後に,LLM事前学習に関して,最良個々人のチェックポイントよりも,チェックポイント平均化がさらに向上することを示す。 We introduce the LiLiuM series of large language models (LLMs): 1B, 7B, and 13B parameter models developed 100% in-house to fit eBay's specific needs in the e-commerce domain. This gives eBay full control over all aspects of the models including license, data, vocabulary, and architecture. We expect these models to be used as a foundation for fine-tuning and instruction-tuning, eliminating dependencies to external models. The LiLiuM LLMs have been trained on 3 trillion tokens of multilingual text from general and e-commerce domain. They perform similar to the popular LLaMA-2 models on English natural language understanding (NLU) benchmarks. At the same time, we outperform LLaMA-2 on non-English NLU tasks, machine translation and on e-commerce specific downstream tasks. As part of our data mixture, we utilize the newly released RedPajama-V2 dataset for training and share our insights regarding data filtering and deduplication. We also discuss in detail how to serialize structured data for use in autoregressive language modeling. 
We provide insights on the effects of including code and parallel machine translation data in pre-training. Furthermore, we develop our own tokenizer and model vocabulary, customized towards e-commerce. This way, we can achieve up to 34% speed-up in text generation on eBay-specific downstream tasks compared to LLaMA-2. Finally, in relation to LLM pretraining, we show that checkpoint averaging can further improve over the best individual model checkpoint.	翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
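The checkpoint averaging mentioned above can be sketched as a uniform element-wise mean over saved parameter sets. This is only a toy version under stated assumptions: parameters are plain lists of floats keyed by name, every checkpoint has the same keys and shapes, and the average is unweighted. The abstract does not say which checkpoints were averaged or how they were weighted.

```python
from typing import Dict, List

Checkpoint = Dict[str, List[float]]  # parameter name -> flat list of values


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Return the element-wise mean of the given checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged: Checkpoint = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / len(checkpoints) for values in columns]
    return averaged


# Hypothetical checkpoints from two late training steps.
ckpt_a = {"w": [1.0, 2.0], "b": [0.0]}
ckpt_b = {"w": [3.0, 4.0], "b": [1.0]}
merged = average_checkpoints([ckpt_a, ckpt_b])
print(merged)
```

In practice the same loop would run over framework tensors instead of Python lists; the arithmetic is unchanged.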
# 技術的負債の進展予測と予測に関する体系的文献レビュー Systematic literature review on forecasting and prediction of technical debt evolution ( http://arxiv.org/abs/2406.12026v1 ) ライセンス: Link先を確認	Adekunle Ajibode, Yvon Apedo, Temitope Ajibode,	(参考訳) コンテキスト: 技術的負債(TD)とは、ソフトウェア品質の妥協によって生じる追加コストを指し、開発期間中に短期的なメリットを提供するが、長期的な品質を損なう可能性がある。正確なTD予測と予測は、インフォメーションソフトウェア保守と積極的管理に不可欠である。しかし、この研究領域では、利用可能な予測技術に関する包括的な資料が欠落している。目的:本研究は、TD進化を予測するために、ソフトウェア工学における既存の知識を探求し、研究と産業で提案されるアプローチの洞察を得ることを目的としている。方法: この目的を達成するため, 2023年までの646の異なる論文を網羅した体系的文献レビューを行った。ソフトウェア工学の確立された方法論に従って、分析のための14の主研究を特定し、含めた。結果: この分析からTD進化予測への様々なアプローチが明らかになった。特に、ランダムな森林と時間的畳み込みネットワークは、一次研究の結果に基づく他の手法に比べて優れた性能を示した。しかしながら、これらのアプローチは15の特定されたTDタイプのうち、特にコード負債とアーキテクチャ負債の2つにのみ対処します。結論:TD進化予測の研究はまだ初期段階であり,多くの課題が未解決のまま残されている。そこで本稿では,既存のギャップを埋めるためにさらなる調査を必要とするいくつかの研究方向を提案する。キーワード:システム文献レビュー、技術的負債、技術的負債予測、技術的負債予測、技術的負債メトリクス Context: Technical debt (TD) refers to the additional costs incurred due to compromises in software quality, providing short-term advantages during development but potentially compromising long-term quality. Accurate TD forecasting and prediction are vital for informed software maintenance and proactive management. However, this research area lacks comprehensive documentation on the available forecasting techniques. Objective: This study aims to explore existing knowledge in software engineering to gain insights into approaches proposed in research and industry for forecasting TD evolution. Methods: To achieve this objective, we conducted a Systematic Literature Review encompassing 646 distinct papers published until 2023. Following established methodology in software engineering, we identified and included 14 primary studies for analysis. Result: Our analysis unveiled various approaches for TD evolution forecasting. Notably, random forest and temporal convolutional networks demonstrated superior performance compared to other methods based on the result from the primary studies. 
However, these approaches only address two of the fifteen identified TD types, specifically Code debt and Architecture debt, while disregarding the remaining types. Conclusion: Our findings indicate that research on TD evolution forecasting is still in its early stages, leaving numerous challenges unaddressed. Therefore, we propose several research directions that require further investigation to bridge the existing gaps. Keywords: Systematic literature review, Technical debt, Technical debt prediction, Technical debt forecasting, Technical debt metrics	翻訳日:2024-06-20 00:07:11 公開日:2024-06-17
# 敵の摂動は、アーティストを生成AIから確実に保護できない Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI ( http://arxiv.org/abs/2406.12027v1 ) ライセンス: Link先を確認	Robert Hönig, Javier Rando, Nicholas Carlini, Florian Tramèr,	(参考訳) アーティストたちは、独自の芸術スタイルを忠実に再現できる画像生成モデルの進歩をますます懸念している。これに対し、オンラインで公開されたアートワークに小さな敵の摂動を組み込んだ、スタイルの模倣に対する保護ツールがいくつか開発されている。本研究では,一般的な保護 – 数百万ダウンロード – の有効性を評価し,セキュリティに関する誤った感覚のみを提供することを示す。画像アップスケーリングのような低努力と「オフ・ザ・シェルフ」技術は、既存の保護を著しく劣化させる堅牢な模倣手法を作成するのに十分であることがわかった。ユーザスタディを通じて、既存の保護は簡単にバイパスでき、アーティストはスタイルの模倣に弱いままであることを示す。我々は、敵対的摂動に基づくツールが、生成的AIの誤用からアーティストを確実に保護できないことを警告し、代替技術以外のソリューションの開発を促す。 Artists are increasingly concerned about advancements in image generation models that can closely replicate their unique artistic styles. In response, several protection tools against style mimicry have been developed that incorporate small adversarial perturbations into artworks published online. In this work, we evaluate the effectiveness of popular protections -- with millions of downloads -- and show they only provide a false sense of security. We find that low-effort and "off-the-shelf" techniques, such as image upscaling, are sufficient to create robust mimicry methods that significantly degrade existing protections. Through a user study, we demonstrate that all existing protections can be easily bypassed, leaving artists vulnerable to style mimicry. We caution that tools based on adversarial perturbations cannot reliably protect artists from the misuse of generative AI, and urge the development of alternative non-technological solutions.	翻訳日:2024-06-20 00:07:11 公開日:2024-06-17
# SPA-VL:視覚言語モデルのための包括的安全基準アライメントデータセット SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model ( http://arxiv.org/abs/2406.12030v1 ) ライセンス: Link先を確認	Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao,	(参考訳) 視覚言語モデル(VLM)の出現は、マルチモーダル情報の理解において前例のない進歩をもたらした。 VLMにおけるテキストと視覚のセマンティクスの組み合わせは非常に複雑で多様であり、これらのモデルの安全性の整合性は困難である。さらに、VLMの安全性アライメントに関する限られた研究により、大規模で高品質なデータセットが不足している。これらの制約に対処するために,SPA-VL というビジョン言語モデルのための安全優先アライメントデータセットを提案する。 SPA-VLは6つの有害ドメイン、13のカテゴリ、53のサブカテゴリをカバーし、クエスト、画像、選択された応答、拒否された応答)の4倍体の100,788のサンプルを含む。深さの面では、応答は12個のオープン(eg, QwenVL)とクローズドソース(eg, Gemini)のVLMから収集され、多様性が保証される。実験結果から,SPA-VLデータセット上のアライメント技術を用いてトレーニングしたモデルでは,コア機能を維持しながら,無害性と有用性を大幅に向上することが示唆された。 SPA-VLは大規模で高品質で多様なデータセットであり、VLMが無害性と有用性の両方を達成することを保証する重要なマイルストーンである。コード https://github.com/EchoseChen/SPA-VL-RLHF と SPA-VL データセット url https://huggingface.co/datasets/sqrti/SPA-VL を公開しました。 The emergence of Vision Language Models (VLMs) has brought unprecedented advances in understanding multimodal information. The combination of textual and visual semantics in VLMs is highly complex and diverse, making the safety alignment of these models challenging. Furthermore, due to the limited study on the safety alignment of VLMs, there is a lack of large-scale, high-quality datasets. To address these limitations, we propose a Safety Preference Alignment dataset for Vision Language Models named SPA-VL. In terms of breadth, SPA-VL covers 6 harmfulness domains, 13 categories, and 53 subcategories, and contains 100,788 samples of the quadruple (question, image, chosen response, rejected response). In terms of depth, the responses are collected from 12 open- (e.g., QwenVL) and closed-source (e.g., Gemini) VLMs to ensure diversity. 
The experimental results indicate that models trained with alignment techniques on the SPA-VL dataset exhibit substantial improvements in harmlessness and helpfulness while maintaining core capabilities. SPA-VL, as a large-scale, high-quality, and diverse dataset, represents a significant milestone in ensuring that VLMs achieve both harmlessness and helpfulness. We have made our code (https://github.com/EchoseChen/SPA-VL-RLHF) and the SPA-VL dataset (https://huggingface.co/datasets/sqrti/SPA-VL) publicly available.	翻訳日:2024-06-20 00:07:11 公開日:2024-06-17
# 言語モデリングによる語彙データの大規模伝達学習 Large Scale Transfer Learning for Tabular Data via Language Modeling ( http://arxiv.org/abs/2406.12031v1 ) ライセンス: Link先を確認	Josh Gardner, Juan C. Perdomo, Ludwig Schmidt,	(参考訳) 構造的、異質で、行と列を持つスプレッドシートスタイルのデータであるタブラルデータは、実際には多くのドメインで広く使われている。しかし、近年の基盤モデルでは、言語モデリングやコンピュータビジョンなどの領域におけるタスク固有のデータセットや予測器の開発の必要性が減っているが、この伝達学習パラダイムは表領域に類似した影響を与えていない。本研究では,このギャップを狭め,表型予測のための言語モデルであるTabuLa-8Bを提案する。本研究では,TabLibコーパスから大規模で高品質なトレーニングデータセットを抽出するプロセスを定義し,表型データフィルタリングと品質管理の手法を提案する。得られたデータセットは3.1Mのユニークなテーブルから1.6Bを超える行で構成されており、新しいパッキングとアテンションスキームを用いて表データ予測(分類とバイナリ回帰)のためのLlama 3-8B大言語モデル(LLM)を微調整する。 329のデータセットからなるテストスイートで評価した結果,TabuLa-8Bはランダムな推測よりも15ポイント(pp)高い未確認テーブル上でゼロショット精度を持つことがわかった。ターゲットデータセットを微調整することなく、数ショット設定(1-32ショット)で、TabuLa-8Bは、XGBoostやTabPFNモデルよりも5～15pp正確で、そのモデルでは、XGBoostとTabPFNは、同等または最大16倍のデータで明示的にトレーニングされている。この論文の出版とともに、私たちのモデル、コード、データをリリースします。 Tabular data -- structured, heterogeneous, spreadsheet-style data with rows and columns -- is widely used in practice across many domains. However, while recent foundation models have reduced the need for developing task-specific datasets and predictors in domains such as language modeling and computer vision, this transfer learning paradigm has not had similar impact in the tabular domain. In this work, we seek to narrow this gap and present TabuLa-8B, a language model for tabular prediction. We define a process for extracting a large, high-quality training dataset from the TabLib corpus, proposing methods for tabular data filtering and quality control. Using the resulting dataset, which comprises over 1.6B rows from 3.1M unique tables, we fine-tune a Llama 3-8B large language model (LLM) for tabular data prediction (classification and binned regression) using a novel packing and attention scheme for tabular prediction. 
Through evaluation across a test suite of 329 datasets, we find that TabuLa-8B has zero-shot accuracy on unseen tables that is over 15 percentage points (pp) higher than random guessing, a feat that is not possible with existing state-of-the-art tabular prediction models (e.g. XGBoost, TabPFN). In the few-shot setting (1-32 shots), without any fine-tuning on the target datasets, TabuLa-8B is 5-15 pp more accurate than XGBoost and TabPFN models that are explicitly trained on equal, or even up to 16x more data. We release our model, code, and data along with the publication of this paper.	翻訳日:2024-06-20 00:07:11 公開日:2024-06-17
# 大規模言語モデルを用いたメンタルヘルス分析におけるバイアスの発見と緩和 Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models ( http://arxiv.org/abs/2406.12033v1 ) ライセンス: Link先を確認	Yuqing Wang, Yun Zhao, Sara Alessandra Keller, Anne de Hond, Marieke M. van Buchem, Malvika Pillai, Tina Hernandez-Boussard,	(参考訳) 大規模言語モデル(LLM)の進歩は、メンタルヘルス分析を含む様々な応用において強力な能力を示している。しかし、既存の研究は予測性能に重点を置いており、公平性という重大な問題は十分に検討されておらず、脆弱な集団に重大なリスクを及ぼしている。潜在的なバイアスを認めているにもかかわらず、以前の研究はこれらのバイアスとその影響について徹底的な調査を欠いていた。このギャップに対処するために,8種類のメンタルヘルスデータセットに対して異なるプロンプト法による10個のLLMを用いて,7つの社会的要因(性別,年齢,宗教など)のバイアスを体系的に評価した。以上の結果から,GPT-4は,一部のケースではMentalRoBERTaのようなドメイン固有モデルに後れを取っているものの,LLM間の性能と公平性において最高の総合バランスを達成していることが示された。さらに、調整されたフェアネス対応のプロンプトは、メンタルヘルス予測におけるバイアスを効果的に軽減し、この分野における公平な分析の大きな可能性を浮き彫りにします。 The advancement of large language models (LLMs) has demonstrated strong capabilities across various applications, including mental health analysis. However, existing studies have focused on predictive performance, leaving the critical issue of fairness underexplored, posing significant risks to vulnerable populations. Despite acknowledging potential biases, previous works have lacked thorough investigations into these biases and their impacts. To address this gap, we systematically evaluate biases across seven social factors (e.g., gender, age, religion) using ten LLMs with different prompting methods on eight diverse mental health datasets. Our results show that GPT-4 achieves the best overall balance in performance and fairness among LLMs, although it still lags behind domain-specific models like MentalRoBERTa in some cases. Additionally, our tailored fairness-aware prompts can effectively mitigate bias in mental health predictions, highlighting the great potential for fair analysis in this field.	翻訳日:2024-06-20 00:07:11 公開日:2024-06-17
# Self-MoE: 自己専門のエキスパートによる構成的大規模言語モデルを目指して Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts ( http://arxiv.org/abs/2406.12034v1 ) ライセンス: Link先を確認	Junmo Kang, Leonid Karlinsky, Hongyin Luo, Zhen Wang, Jacob Hansen, James Glass, David Cox, Rameswar Panda, Rogerio Feris, Alan Ritter,	(参考訳) 我々は,モノリシックなLLMを,MiXSE(MiXture of Self-specialized Experts)という,自己専門の専門家による構成的,モジュール的なシステムに変換するアプローチであるSelf-MoEを提案する。提案手法は,自己生成合成データを用いて専門家モジュールを構築する自己特殊化を利用して,それぞれに共有ベースLLMを備え,自己最適化ルーティングを組み込む。これにより、さまざまな目標タスクの動的かつ機能固有の処理が可能になり、広範な人間ラベル付きデータやパラメータを追加することなく、全体的な機能を向上させることができる。実験結果から, LLMの特殊化は, 非特殊化タスクにおける性能に潜在的なトレードオフをもたらす可能性が示唆された。一方、私たちのSelf-MoEは、知識、推論、数学、コーディングといった様々なベンチマークにおいて、ベースLLMよりも大幅に改善されていることを示しています。また、インスタンスのマージや重みのマージなど、他の方法よりも一貫して優れており、セマンティックエキスパートやルーティングの設計による柔軟性と解釈性も向上している。我々の発見は、モジュール化と、効率的でスケーラブルで適応可能なシステムを実現する上での自己改善の持つ重要な役割を浮き彫りにしている。 We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data, each equipped with a shared base LLM and incorporating self-optimized routing. This allows for dynamic and capability-specific handling of various target tasks, enhancing overall capabilities, without extensive human-labeled data and added parameters. Our empirical results reveal that specializing LLMs may exhibit potential trade-offs in performance on non-specialized tasks. On the other hand, our Self-MoE demonstrates substantial improvements over the base LLM across diverse benchmarks such as knowledge, reasoning, math, and coding. It also consistently outperforms other methods, including instance merging and weight merging, while offering better flexibility and interpretability by design with semantic experts and routing. 
Our findings highlight the critical role of modularity and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.	翻訳日:2024-06-20 00:07:11 公開日:2024-06-17
# ロボット神経リハビリテーショントレーニングのための社会的対話型エージェント : 概念化と概念実証研究 Socially Interactive Agents for Robotic Neurorehabilitation Training: Conceptualization and Proof-of-concept Study ( http://arxiv.org/abs/2406.12035v1 ) ライセンス: Link先を確認	Rhythm Arora, Pooja Prajod, Matteo Lavit Nicora, Daniele Panzeri, Giovanni Tauro, Rocco Vertechy, Matteo Malosio, Elisabeth André, Patrick Gebhard,	(参考訳) 多様な運動能力を持つ人は、機能回復を促進することを目的とした集中治療や専門的なリハビリテーション療法の恩恵を受けることが多い。それでも課題は、神経リハビリテーションのプロフェッショナルが限定的に利用できることであり、必要なケアレベルを効果的に提供することを妨げる。ロボットデバイスは、治療中の医療従事者への依存を減らす大きな可能性を秘めているが、同時に、従来の対人セッションが提供する重要なヒューマンインタラクションやモチベーションを欠いている。このギャップを埋めるために、我々は、神経リハビリテーショントレーニング中にパーソナライズされた院外援助を提供するAIベースのシステムを導入する。本システムは、リハビリテーション訓練装置、感情信号分類モデル、トレーニング演習、およびユーザインタフェースとしてのソーシャルインタラクティブエージェントを含む。専門職の助けを借りて、想定されたシステムは、個々の患者の独自のリハビリテーション要件を満たすように調整されるように設計されている。仮想コーチングアシスタントとして機能する社会的対話型エージェントによって支援され、予備設定および指導段階を経て、患者は自宅の快適さで自律的にリハビリ体制を継続する。我々のアプローチは、対話型社会認識仮想エージェントを神経リハビリテーションロボットフレームワークに統合することであり、その主な目的は、リハビリテーションセッションに固有の社会的側面を再現することである。また,健常患者を対象に,本フレームワークの妥当性試験を行った。予備調査の結果,参加者はシステムに適応する確率を示した。特に,提案演習における対話エージェントの存在は,注意をそらす要因として機能せず,ユーザのエンゲージメントに肯定的な影響を及ぼした。 Individuals with diverse motor abilities often benefit from intensive and specialized rehabilitation therapies aimed at enhancing their functional recovery. Nevertheless, the challenge lies in the restricted availability of neurorehabilitation professionals, hindering the effective delivery of the necessary level of care. Robotic devices hold great potential in reducing the dependence on medical personnel during therapy but, at the same time, they generally lack the crucial human interaction and motivation that traditional in-person sessions provide. To bridge this gap, we introduce an AI-based system aimed at delivering personalized, out-of-hospital assistance during neurorehabilitation training. 
This system includes a rehabilitation training device, affective signal classification models, training exercises, and a socially interactive agent as the user interface. With the assistance of a professional, the envisioned system is designed to be tailored to accommodate the unique rehabilitation requirements of an individual patient. Conceptually, after a preliminary setup and instruction phase, the patient is equipped to continue their rehabilitation regimen autonomously in the comfort of their home, facilitated by a socially interactive agent functioning as a virtual coaching assistant. Our approach involves the integration of an interactive socially-aware virtual agent into a neurorehabilitation robotic framework, with the primary objective of recreating the social aspects inherent to in-person rehabilitation sessions. We also conducted a feasibility study to test the framework with healthy patients. The results of our preliminary investigation indicate that participants demonstrated a propensity to adapt to the system. Notably, the presence of the interactive agent during the proposed exercises did not act as a source of distraction; instead, it positively impacted users' engagement.	翻訳日:2024-06-20 00:07:11 公開日:2024-06-17
# MedCalc-Bench:医学計算のための大規模言語モデルの評価 MedCalc-Bench: Evaluating Large Language Models for Medical Calculations ( http://arxiv.org/abs/2406.12036v1 ) ライセンス: Link先を確認	Nikhil Khandekar, Qiao Jin, Guangzhi Xiong, Soren Dunn, Serina S Applebaum, Zain Anwar, Maame Sarfo-Gyamfi, Conrad W Safranek, Abid A Anwar, Andrew Zhang, Aidan Gilson, Maxwell B Singer, Amisha Dave, Andrew Taylor, Aidong Zhang, Qingyu Chen, Zhiyong Lu,	(参考訳) 計算と論理に基づく推論を評価するのとは対照的に、医学における大規模言語モデル(LLM)を評価するための現在のベンチマークは、主にドメイン知識と記述的推論を含む質問応答に焦点を当てている。このような定性的な能力は医学的診断に欠かせないが、現実世界のシナリオでは、医師は、定量式に従う臨床電卓や、エビデンスベースの意思決定支援のためのルールベースの推論パラダイムを頻繁に使用する。この目的のために, LLMの医療計算能力を評価することを目的とした, 第一種データセットであるMedCalc-Benchを提案する。 MedCalc-Benchには、55の異なる医療計算タスクから1000以上のレビュー済みのインスタンスの評価セットが含まれている。 MedCalc-Benchの各インスタンスは、患者ノート、特定の医学的価値の計算を要求する質問、真実の答え、そしてその答えがどのように得られるかを示すステップバイステップの説明からなる。以上の結果から, この分野におけるLLMの有用性が示唆されるが, 臨床現場で使えるほど十分な効果は得られていない。一般的な問題としては、不正なエンティティを抽出すること、計算タスクに正しい方程式や規則を使わないこと、計算の算術を誤って実行することなどがある。医療現場におけるLLMの量的知識と推論のギャップを強調し,様々な臨床計算タスクにおけるLLMの今後の改善を促すことを願っている。 As opposed to evaluating computation and logic-based reasoning, current benchmarks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive reasoning. While such qualitative capabilities are vital to medical diagnosis, in real-world scenarios, doctors frequently use clinical calculators that follow quantitative equations and rule-based reasoning paradigms for evidence-based decision support. To this end, we propose MedCalc-Bench, a first-of-its-kind dataset focused on evaluating the medical calculation capability of LLMs. MedCalc-Bench contains an evaluation set of over 1000 manually reviewed instances from 55 different medical calculation tasks. Each instance in MedCalc-Bench consists of a patient note, a question requesting to compute a specific medical value, a ground truth answer, and a step-by-step explanation showing how the answer is obtained. 
While our evaluation results show the potential of LLMs in this area, none of them are effective enough for clinical settings. Common issues include extracting the incorrect entities, not using the correct equation or rules for a calculation task, or incorrectly performing the arithmetic for the computation. We hope our study highlights the quantitative knowledge and reasoning gaps in LLMs within medical settings, encouraging future improvements of LLMs for various clinical calculation tasks.	翻訳日:2024-06-20 00:07:11 公開日:2024-06-17
# 大規模言語モデルにおける未学習のためのソフトプロンプト Soft Prompting for Unlearning in Large Language Models ( http://arxiv.org/abs/2406.12038v1 ) ライセンス: Link先を確認	Karuna Bhaila, Minh-Hao Van, Xintao Wu,	(参考訳) LLM(Large Language Models)が広く普及しているのは、部分的には文脈内学習を行うユニークな能力のためであり、これらの事前訓練されたモデルをデプロイする際の倫理的・安全的配慮の重要性も明らかにされている。本研究では,データ保護規制を動機としたLLMの機械学習に関する研究に焦点をあてる。未学習を実現するための微調整手法に関する文献の増大とは対照的に、訓練データのサブセットの未学習を実現するためのソフトプロンプトと呼ばれる比較的軽量な代替手段に焦点を当てる。我々のフレームワークである Soft Prompting for Unlearning (SPUL) では、任意のクエリに付加可能なプロンプトトークンを学習し、LLMパラメータを更新することなく、推論時に特定の例のアンラーニングを誘導する。提案手法の厳密な評価を行い,その結果から,LLMを用いたテキスト分類の文脈において,SPULは実用性と忘れとのトレードオフを大幅に改善できることを示す。さらに,フレームワークのスケーラビリティを強調し,ハイパーパラメータの選択と未学習データのサイズの影響について詳細な知見を提供するために,複数のLLMを用いて手法を検証する。実装は https://github.com/karuna-bhaila/llm_unlearning で公開しています。 The widespread popularity of Large Language Models (LLMs), partly due to their unique ability to perform in-context learning, has also brought to light the importance of ethical and safety considerations when deploying these pre-trained models. In this work, we focus on investigating machine unlearning for LLMs motivated by data protection regulations. In contrast to the growing literature on fine-tuning methods to achieve unlearning, we focus on a comparatively lightweight alternative called soft prompting to realize the unlearning of a subset of training data. With losses designed to enforce forgetting as well as utility preservation, our framework Soft Prompting for Unlearning (SPUL) learns prompt tokens that can be appended to an arbitrary query to induce unlearning of specific examples at inference time without updating LLM parameters. We conduct a rigorous evaluation of the proposed method and our results indicate that SPUL can significantly improve the trade-off between utility and forgetting in the context of text classification with LLMs. 
We further validate our method using multiple LLMs to highlight the scalability of our framework and provide detailed insights into the choice of hyperparameters and the influence of the size of unlearning data. Our implementation is available at https://github.com/karuna-bhaila/llm_unlearning	翻訳日:2024-06-20 00:07:10 公開日:2024-06-17
# 宇宙空間のサイバー攻撃:新たなシナリオを生み出す Outer Space Cyberattacks: Generating Novel Scenarios to Avoid Surprise ( http://arxiv.org/abs/2406.12041v1 ) ライセンス: Link先を確認	Patrick Lin, Keith Abney, Bruce DeBruhl, Kira Abercromby, Henry Danielson, Ryan Jenkins,	(参考訳) 一般の認識は低いかもしれないが、現代の宇宙システムが果たす重要な役割を考えると、宇宙のサイバー攻撃はますます深刻な問題となっている。オープンソースあるいは公開の議論は通常、衛星ハッキングや信号妨害や偽造など、いくつかの一般的なシナリオを中心に展開される。しかし、さらに多くの可能性があります。報告書はシナリオ・プロンプト・ジェネレータ(ICARUS行列と呼ばれる分類学)を提供しており、400万以上のシナリオ・プロンプトを作成できる。私たちは42のシナリオの開始セットを提供し、各シナリオを簡潔に説明し、想像力を最優先させ、より多くの研究者がこの問題に対処するための多様な専門知識と視点をもたらすことができるようにします。新たなシナリオを想像できないことは、我々の有線世界を支配するデジタルシステムに侵入するために、常に新しい方法、発明的かつ資源的な方法を考案している脅威アクターによって、驚きによって取られる大きなリスクである。警戒を維持するためには、サイバーセキュリティにおいてハンターと獲物の間の敵対的なダンスに追随するためにも、被告は想像力を持っていなければならない。新たなシナリオを提供するだけでなく、我々が特定した少なくとも7つの要因を含む、宇宙サイバーセキュリティ問題の原動力についても検討する。例えば、宇宙デブリの共有された脅威は、軌道上の運動的衝突を避けるために合理的な状態やアクターを押し付けているように思われる。外空間はサイバーセキュリティの次のフロンティアだ。宇宙のサイバー攻撃から守るためには、それらを理解して予測する必要がある。 Though general awareness around it may be low, space cyberattacks are an increasingly urgent problem given the vital role that space systems play in the modern world. Open-source or public discussions about it typically revolve around only a couple generic scenarios, namely satellite hacking and signals jamming or spoofing. But there are so many more possibilities. The report offers a scenario-prompt generator -- a taxonomy of sorts, called the ICARUS matrix -- that can create more than 4 million unique scenario-prompts. We will offer a starting set of 42 scenarios, briefly describing each one, to begin priming the imagination-pump so that many more researchers can bring their diverse expertise and perspectives to bear on the problem. A failure to imagine novel scenarios is a major risk in being taken by surprise and severely harmed by threat actors who are constantly devising new ways, inventive and resourceful ways, to breach the digital systems that control our wired world. 
To stay vigilant, defenders likewise need to be imaginative to keep up in this adversarial dance between hunter and prey in cybersecurity. More than offering novel scenarios, we will also explore the drivers of the space cybersecurity problem, which include at least seven factors we have identified. For instance, the shared threat of space debris would seem to push rational states and actors to avoid kinetic conflicts in orbit, which weighs in favor of cyberoperations as the dominant form of space conflicts. Outer space is the next frontier for cybersecurity. To guard against space cyberattacks, we need to understand and anticipate them, and imagination is at the very heart of both cybersecurity and frontiers.	翻訳日:2024-06-20 00:07:10 公開日:2024-06-17
# すべてのプロンプトが等しくなるわけではない:テキストと画像の拡散モデルのプロンプトベースプルーニング Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models ( http://arxiv.org/abs/2406.12042v1 ) ライセンス: Link先を確認	Alireza Ganjdanesh, Reza Shirkavand, Shangqian Gao, Heng Huang,	(参考訳) テキスト・ツー・イメージ(T2I)拡散モデルは印象的な画像生成能力を示している。それでも、その計算強度は、リソース制約のある組織がT2Iモデルを内部のターゲットデータに微調整した後に展開することを妨げている。プルーニング技術は、T2Iモデルの計算負担を軽減する潜在的な解決策を提供する一方で、静的プルーニング手法は、異なるプロンプトのキャパシティ要件を見越して、全ての入力プロンプトに対して同じプルーニングモデルを使用する。動的プルーニングは各プロンプトに個別のサブネットワークを使用することでこの問題に対処するが、GPUのバッチ並列化を防止している。これらの制約を克服するため、T2I拡散モデル用に設計された新しいプロンプトベースのプルーニング手法であるAdaptive Prompt-Tailored Pruning (APTP)を導入する。我々のアプローチの中心はプロンプトルータモデルであり、入力テキストプロンプトに必要なキャパシティを決定することを学習し、それをアーキテクチャコードにルーティングする。それぞれのアーキテクチャコードは、割り当てられたプロンプトに合わせた特別なモデルを表しており、コードの数はハイパーパラメータである。我々は、コントラスト学習を用いてプロンプトルータとアーキテクチャコードをトレーニングし、類似のプロンプトが近くのコードにマップされることを保証する。さらに、最適なトランスポートを使用して、コードが1つのコードに崩壊するのを防ぐ。我々は、CC3MとCOCOをターゲットデータセットとして、安定拡散(SD)V2.1をプルーニングすることでAPTPの有効性を示す。 APTPはFID、CLIP、CMMDスコアの点でシングルモデルプルーニングベースラインを上回っている。 APTPが学習したクラスタの分析により、意味論的に意味があることが判明した。また、APTPは、SD、例えばテキスト画像を生成するプロンプトに対して、以前に実証された挑戦的なプロンプトを自動的に検出し、より高いキャパシティコードにアサインできることも示している。 Text-to-image (T2I) diffusion models have demonstrated impressive image generation capabilities. Still, their computational intensity prohibits resource-constrained organizations from deploying T2I models after fine-tuning them on their internal target data. While pruning techniques offer a potential solution to reduce the computational burden of T2I models, static pruning methods use the same pruned model for all input prompts, overlooking the varying capacity requirements of different prompts. Dynamic pruning addresses this issue by utilizing a separate sub-network for each prompt, but it prevents batch parallelism on GPUs. To overcome these limitations, we introduce Adaptive Prompt-Tailored Pruning (APTP), a novel prompt-based pruning method designed for T2I diffusion models. 
Central to our approach is a prompt router model, which learns to determine the required capacity for an input text prompt and routes it to an architecture code, given a total desired compute budget for prompts. Each architecture code represents a specialized model tailored to the prompts assigned to it, and the number of codes is a hyperparameter. We train the prompt router and architecture codes using contrastive learning, ensuring that similar prompts are mapped to nearby codes. Further, we employ optimal transport to prevent the codes from collapsing into a single one. We demonstrate APTP's effectiveness by pruning Stable Diffusion (SD) V2.1 using CC3M and COCO as target datasets. APTP outperforms the single-model pruning baselines in terms of FID, CLIP, and CMMD scores. Our analysis of the clusters learned by APTP reveals they are semantically meaningful. We also show that APTP can automatically discover previously empirically found challenging prompts for SD, e.g., prompts for generating text images, assigning them to higher capacity codes.	翻訳日:2024-06-20 00:07:10 公開日:2024-06-17
# グレードスコア:オプション選択におけるLLM性能の定量化 Grade Score: Quantifying LLM Performance in Option Selection ( http://arxiv.org/abs/2406.12043v1 ) ライセンス: Link先を確認	Dmitri Iourovitski,	(参考訳) 本研究では,Large Language Models (LLMs) の整合性と公平性を評価するために考案された新しい尺度 "Grade Score" を紹介する。グレードスコアは、注文バイアスを測定するエントロピーと、選択安定性を評価し、LLMの信頼性と公平性に関する洞察を提供するモード周波数を組み合わせる。本研究は,LLMの性能向上効果を実証し,評価スコアを最適化するために,迅速な工学的手法やオプションサンプリング手法などの手法を探求する。その結果,LSMのプロンプトに対する性能の変化が示され,無関係な選択肢を含めることによる肯定的な影響が浮き彫りになった。この研究では、特定のバイアスをターゲットとした指示に適応し、適応性を実証する命令追従モデルにおいて、創発的な行動を特定する。グレードスコアはLLMの比較を促進するとともに、様々なアプリケーションにおける信頼性と公平性を改善するための潜在的な可能性として、意思決定プロセスの最適化に向けた進行中の研究を促進する。すべてのコードはGitHub https://github.com/IoDmitri/GradeLabで入手できる。 This study introduces the "Grade Score", a novel metric designed to evaluate the consistency and fairness of Large Language Models (LLMs) when used as multiple-choice judges with respect to order bias and choice consistency. The Grade Score combines Entropy, which measures order bias, and Mode Frequency, which assesses choice stability, offering insights into LLMs' reliability and impartiality. The study explores techniques such as prompt engineering and option sampling strategies to optimize the Grade Score, demonstrating their effectiveness in enhancing LLMs' performance. Results showcase varying performances among LLMs with respect to prompts and highlight the positive impact of including irrelevant options. The study also identifies an emergent behavior in instruction-following models, where they adapt to instructions targeting specific biases, demonstrating their adaptability. The Grade Score facilitates comparisons between LLMs and encourages ongoing research towards optimizing their decision-making processes, with potential implications for improving their reliability and fairness in various applications. All code is available on GitHub https://github.com/IoDmitri/GradeLab	翻訳日:2024-06-20 00:07:10 公開日:2024-06-17
# ARTIST:アンタングル化によるテキストリッチ画像生成の改善 ARTIST: Improving the Generation of Text-rich Images by Disentanglement ( http://arxiv.org/abs/2406.12044v1 ) ライセンス: Link先を確認	Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang,	(参考訳) 拡散モデルは、広い範囲の視覚コンテンツを生成できるという卓越した能力を示したが、テキストの描画能力はまだ限られており、下層の画像とうまく融合できない不正確な文字や単語を生成することが多い。これらの欠点に対処するため、ARTISTという新しいフレームワークを導入する。このフレームワークには専用のテキスト拡散モデルが含まれており、特にテキスト構造の学習に焦点を当てている。当初、テキスト表現の複雑さを捉えるために、このテキストモデルを事前訓練する。その後、視覚拡散モデルを微調整し、事前訓練されたテキストモデルからテキスト構造情報を同化できるようにする。この分離されたアーキテクチャ設計とトレーニング戦略は、テキストリッチな画像生成のための拡散モデルのテキストレンダリング能力を著しく向上させる。さらに、トレーニング済みの大規模言語モデルの能力を活用して、ユーザの意図をよりよく解釈し、生成品質の向上に貢献します。 MARIO-Evalベンチマークの実証結果は,提案手法の有効性を裏付けるものであり,様々な指標において最大15%の改善を示した。 Diffusion models have demonstrated exceptional capabilities in generating a broad spectrum of visual content, yet their proficiency in rendering text is still limited: they often generate inaccurate characters or words that fail to blend well with the underlying image. To address these shortcomings, we introduce a new framework named ARTIST. This framework incorporates a dedicated textual diffusion model to specifically focus on the learning of text structures. Initially, we pretrain this textual model to capture the intricacies of text representation. Subsequently, we finetune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model. This disentangled architecture design and the training strategy significantly enhance the text rendering ability of the diffusion models for text-rich image generation. Additionally, we leverage the capabilities of pretrained large language models to better interpret user intentions, contributing to improved generation quality. Empirical results on the MARIO-Eval benchmark underscore the effectiveness of the proposed method, showing an improvement of up to 15% in various metrics.	翻訳日:2024-06-20 00:07:10 公開日:2024-06-17
# $τ$-bench: 実世界のドメインにおけるツール-エージェント-ユーザインタラクションのベンチマーク $τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains ( http://arxiv.org/abs/2406.12045v1 ) ライセンス: Link先を確認	Shunyu Yao, Noah Shinn, Pedram Razavi, Karthik Narasimhan,	(参考訳) 既存のベンチマークでは、人間のユーザとのインタラクションやドメイン固有のルールに従う能力について、言語エージェントをテストすることはできない。ドメイン固有のAPIツールとポリシーガイドラインを備えた言語エージェントとユーザ(言語モデルでシミュレートされた)間の動的会話をエミュレートするベンチマークである$\tau$-benchを提案する。我々は、会話の最後にデータベースの状態と注釈付きゴール状態を比較する、効率的で忠実な評価プロセスを採用する。また,複数の試行においてエージェント動作の信頼性を評価するための新しい指標(pass^k)を提案する。実験の結果,gpt-4oのような最先端機能呼び出しエージェントでもタスクの50%未満でしか成功せず,一貫性も低い(小売ドメインではpass^8 <25%)ことがわかった。本研究は, エージェントが一貫して行動し, ルールを確実に追従する能力を向上する手法の必要性を指摘する。 Existing benchmarks do not test language agents on their interaction with human users or ability to follow domain-specific rules, both of which are vital for deploying them in real-world applications. We propose $\tau$-bench, a benchmark emulating dynamic conversations between a user (simulated by language models) and a language agent provided with domain-specific API tools and policy guidelines. We employ an efficient and faithful evaluation process that compares the database state at the end of a conversation with the annotated goal state. We also propose a new metric (pass^k) to evaluate the reliability of agent behavior over multiple trials. Our experiments show that even state-of-the-art function calling agents (like gpt-4o) succeed on <50% of the tasks, and are quite inconsistent (pass^8 <25% in retail). Our findings point to the need for methods that can improve the ability of agents to act consistently and follow rules reliably.	翻訳日:2024-06-20 00:07:10 公開日:2024-06-17
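The abstract above only names the pass^k metric. As a rough illustration, the sketch below assumes pass^k means the chance that k independent trials of the same task all succeed, and estimates it per task as C(c, k) / C(n, k), where c is the number of successes out of n trials, averaged over tasks. The function name `pass_hat_k` and its arguments are hypothetical and are not taken from the benchmark's code.

```python
from math import comb


def pass_hat_k(successes_per_task, num_trials, k):
    """Estimate pass^k under the assumption stated above.

    successes_per_task: one success count c per task, each out of num_trials.
    For each task, C(c, k) / C(num_trials, k) is the fraction of size-k
    subsets of its trials in which every trial succeeded; the result is the
    mean of that fraction over tasks.
    """
    if not successes_per_task:
        raise ValueError("need at least one task")
    if not 1 <= k <= num_trials:
        raise ValueError("k must be between 1 and num_trials")
    total_subsets = comb(num_trials, k)
    # comb(c, k) is 0 when c < k, so tasks with too few successes add nothing.
    all_success_subsets = sum(comb(c, k) for c in successes_per_task)
    return all_success_subsets / (len(successes_per_task) * total_subsets)


if __name__ == "__main__":
    # Three hypothetical tasks, 8 trials each: always, sometimes, never solved.
    counts = [8, 4, 0]
    for k in (1, 2, 8):
        print(k, pass_hat_k(counts, num_trials=8, k=k))
```

With k = 1 this reduces to the ordinary average success rate. Larger k rewards only tasks that are solved consistently, which is how a figure such as pass^8 can sit well below the single-trial success rate.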
# MEDeA: マルチビュー効率の良い深さ調整 MEDeA: Multi-view Efficient Depth Adjustment ( http://arxiv.org/abs/2406.12048v1 ) ライセンス: Link先を確認	Mikhail Artemyev, Anna Vorontsova, Anna Sokolova, Alexander Limonov,	(参考訳) 現代の単一視点深度推定手法の大多数は相対的な深さを予測しており、ベンチマークで顕著な性能を示したにもかかわらず、多くの実世界のシナリオでは直接適用できない。さらに、単一ビューアプローチは、一連のフレーム間の一貫性を保証することはできない。一貫性は通常、ビュー間の不一致をテスト時の最適化で対処するが、単一のシーンを処理するのに数時間かかる。本稿では,従来のテスト時間手法よりもはるかに高速な多視点テスト時間深度補正手法であるMEDeAを提案する。カメラパラメータを持つRGBフレームが与えられた場合、MEDeAは初期深度マップを予測し、局所スケーリング係数を最適化して調整し、時間的に一貫性のある深度マップを出力する。 MEDeAは、正規化や光フロー、セマンティックス推定を必要とするテスト時間法とは対照的に、深度推定ネットワークのみで高品質な予測を行う。提案手法は, TUM RGB-D, 7Scenes, ScanNet のベンチマークに新たな最先端性を設定し,ARKitScenes データセットから取得したスマートフォンデータの処理に成功している。 The majority of modern single-view depth estimation methods predict relative depth and thus cannot be directly applied in many real-world scenarios, despite impressive performance in the benchmarks. Moreover, single-view approaches cannot guarantee consistency across a sequence of frames. Consistency is typically addressed with test-time optimization of discrepancy across views; however, it takes hours to process a single scene. In this paper, we present MEDeA, an efficient multi-view test-time depth adjustment method, that is an order of magnitude faster than existing test-time approaches. Given RGB frames with camera parameters, MEDeA predicts initial depth maps, adjusts them by optimizing local scaling coefficients, and outputs temporally-consistent depth maps. Contrary to test-time methods requiring normals, optical flow, or semantics estimation, MEDeA produces high-quality predictions with a depth estimation network solely. Our method sets a new state-of-the-art on TUM RGB-D, 7Scenes, and ScanNet benchmarks and successfully handles smartphone-captured data from ARKitScenes dataset.	翻訳日:2024-06-20 00:07:10 公開日:2024-06-17
# 回答を超えて学ぶ:数学的推論のためのリフレクションを用いた言語モデルの訓練 Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning ( http://arxiv.org/abs/2406.12050v1 ) ライセンス: Link先を確認	Zhihan Zhang, Zhenwen Liang, Wenhao Yu, Dian Yu, Mengzhao Jia, Dong Yu, Meng Jiang,	(参考訳) 教師付き微調整により、様々な数学的推論タスクにおける言語モデルの問題解決能力が向上する。このような利点を最大化するために、既存の研究は、標準的な単ラウンド質問応答設定に有効である様々なデータ拡張手法でトレーニングセットを拡張することに焦点を当てている。我々の研究は,目前にあるトレーニング問題を深く理解し,標準設定だけでなく,反射的思考を必要とするより複雑なシナリオでもパフォーマンスを向上させることを目的とした,新しい手法を導入している。具体的には,各トレーニングインスタンスに問題リフレクションを埋め込む手法であるリフレクティブ拡張を提案する。モデルに代替的な視点を考慮させ、抽象論やアナロジーに関わり、反射的推論を通じて完全な理解を促進するよう訓練する。本手法の特長と既存拡張技術に対する相補的特性を概説し, 目的達成の実証実験を行った。 Supervised fine-tuning enhances the problem-solving abilities of language models across various mathematical reasoning tasks. To maximize such benefits, existing research focuses on broadening the training set with various data augmentation techniques, which is effective for standard single-round question-answering settings. Our work introduces a novel technique aimed at cultivating a deeper understanding of the training problems at hand, enhancing performance not only in standard settings but also in more complex scenarios that require reflective thinking. Specifically, we propose reflective augmentation, a method that embeds problem reflection into each training instance. It trains the model to consider alternative perspectives and engage with abstractions and analogies, thereby fostering a thorough comprehension through reflective reasoning. Extensive experiments validate the achievement of our aim, underscoring the unique advantages of our method and its complementary nature relative to existing augmentation techniques.	翻訳日:2024-06-19 23:57:20 公開日:2024-06-17
# UniGLM: テキスト分散グラフのための統一言語モデルのトレーニング UniGLM: Training One Unified Language Model for Text-Attributed Graphs ( http://arxiv.org/abs/2406.12052v1 ) ライセンス: Link先を確認	Yi Fang, Dongzhe Fan, Sirui Ding, Ninghao Liu, Qiaoyu Tan,	(参考訳) ノードをテキスト記述で表現するTAG(text-attributed graph)での表現学習は、テキストおよびリレーショナル知識システムやレコメンデーションシステムにおいて重要である。現在、TAGの最先端の埋め込み手法は主に構造認識学習信号を用いた微調整言語モデル(例えばBERT)に焦点を当てている。有効ではあるが、これらの手法は個々のTAGに合わせて調整されており、様々なグラフシナリオにまたがる一般化はできない。共有されたテキスト空間を考えると、複数のTAGを活用して、異なる側面からテキストとグラフ構造を調整することはより有益である。そこで我々はUnified Graph Language Model (UniGLM) フレームワークを紹介した。これは、ドメイン内およびドメイン間のTAGをうまく一般化する最初のグラフ埋め込みモデルである。具体的には、UniGLMは、異なるドメインとスケールを持つ複数のTAGに対して、自己教師付きコントラスト学習を使用して訓練される。 UniGLMには、構造的に類似したノードを特定するための適応的な正のサンプル選択技術と、反復符号化計算を最小化してトレーニングを加速するために考案された遅延コントラストモジュールが含まれている。 9つのベンチマークTAGの広範な実験結果は、UniGLMが一般化(様々な下流タスクとバックボーン)と移行学習(ドメインシナリオ内および外)の観点から、主要な埋め込みベースラインに対して有効であることを実証している。コードはhttps://github.com/NYUSHCS/UniGLMで入手できる。 Representation learning on text-attributed graphs (TAGs), where nodes are represented by textual descriptions, is crucial for textual and relational knowledge systems and recommendation systems. Currently, state-of-the-art embedding methods for TAGs primarily focus on fine-tuning language models (e.g., BERT) using structure-aware training signals. While effective, these methods are tailored for individual TAG and cannot generalize across various graph scenarios. Given the shared textual space, leveraging multiple TAGs for joint fine-tuning, aligning text and graph structure from different aspects, would be more beneficial. Motivated by this, we introduce a novel Unified Graph Language Model (UniGLM) framework, the first graph embedding model that generalizes well to both in-domain and cross-domain TAGs. Specifically, UniGLM is trained over multiple TAGs with different domains and scales using self-supervised contrastive learning. 
UniGLM includes an adaptive positive sample selection technique for identifying structurally similar nodes and a lazy contrastive module that is devised to accelerate training by minimizing repetitive encoding calculations. Extensive empirical results across 9 benchmark TAGs demonstrate UniGLM's efficacy against leading embedding baselines in terms of generalization (various downstream tasks and backbones) and transfer learning (in and out of domain scenarios). The code is available at https://github.com/NYUSHCS/UniGLM.	翻訳日:2024-06-19 23:57:20 公開日:2024-06-17
# 内部インスペクタ$I^2$:内部状態によるLLMのロバスト信頼度推定 InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States ( http://arxiv.org/abs/2406.12053v1 ) ライセンス: Link先を確認	Mohammad Beigi, Ying Shen, Runing Yang, Zihao Lin, Qifan Wang, Ankith Mohan, Jianfeng He, Ming Jin, Chang-Tien Lu, Lifu Huang,	(参考訳) 大きな言語モデル(LLM)は、その膨大な能力にもかかわらず、信頼できる出力を生成するのにしばしば苦労し、幻覚として知られる高信頼の不正確さをしばしば生み出す。この課題に対処するため,本研究では,すべてのレイヤの注意状態,フィードフォワード状態,アクティベーション状態を含む内部状態に対するコントラスト学習を活用することで,LCMにおける信頼度推定を向上する新しいフレームワークであるInternalInspectorを紹介した。最終的なアクティベーション状態に主にフォーカスする既存の方法とは異なり、InternalInspectorはすべてのレイヤの内部状態を網羅的に分析し、正しい予測プロセスと間違った予測プロセスの両方を正確に識別する。事実質問応答,コモンセンス推論,読解理解など,さまざまな自然言語理解・生成タスクにおける既存の信頼度推定手法に対して,内部検査器をベンチマークすることにより,推定された信頼度スコアをLLMの予測の正しさと低いキャリブレーション誤差の正しさとを一致させる精度を著しく向上させる。さらに、幻覚検出ベンチマークであるHaluEvalでは、内部インスペクタが優れており、このタスクにおける他の内部信頼度推定方法よりも優れている。 Despite their vast capabilities, Large Language Models (LLMs) often struggle with generating reliable outputs, frequently producing high-confidence inaccuracies known as hallucinations. Addressing this challenge, our research introduces InternalInspector, a novel framework designed to enhance confidence estimation in LLMs by leveraging contrastive learning on internal states including attention states, feed-forward states, and activation states of all layers. Unlike existing methods that primarily focus on the final activation state, InternalInspector conducts a comprehensive analysis across all internal states of every layer to accurately identify both correct and incorrect prediction processes. By benchmarking InternalInspector against existing confidence estimation methods across various natural language understanding and generation tasks, including factual question answering, commonsense reasoning, and reading comprehension, InternalInspector achieves significantly higher accuracy in aligning the estimated confidence scores with the correctness of the LLM's predictions and lower calibration error. 
Furthermore, InternalInspector excels at HaluEval, a hallucination detection benchmark, outperforming other internal-based confidence estimation methods in this task.	翻訳日:2024-06-19 23:57:20 公開日:2024-06-17
# FAWN:Floor-and-walls normal regularization for direct Neural TSDF Reconstruction FAWN: Floor-And-Walls Normal Regularization for Direct Neural TSDF Reconstruction ( http://arxiv.org/abs/2406.12054v1 ) ライセンス: Link先を確認	Anna Sokolova, Anna Vorontsova, Bulat Gabdullin, Alexander Limonov,	(参考訳) 直接3D再構成のための3Dセマンティクスを活用することは、大きな可能性を秘めている。例えば、壁が垂直で、床が平面で水平であると仮定することで、歪んだ部屋の形を補正し、穴、くぼみ、丘などの局所的なアーティファクトを取り除くことができる。本稿では,シーン内の壁や床を検知してシーン構造を考慮し,対応する表面法線が水平方向・垂直方向から逸脱することにペナルティを課す,TSDF (truncated signed distance function) 再構成手法の修正であるFAWNを提案する。 3Dスパース畳み込みモジュールとして実装されたFAWNは、TSDFを予測するトレーニング可能なパイプラインに組み込むことができる。 FAWNはトレーニングのためにのみ3Dセマンティクスを必要とするため、さらなる使用に関する追加の制限は課されない。 FAWNを修飾した手法は,既存の意味に基づく手法よりも,意味論を効果的に活用することが実証された。また,最新のTSDF再構成手法に適用し,SCANNET, ICL-NUIM, TUM RGB-D, 7SCENESベンチマークの品質向上を示す。 Leveraging 3D semantics for direct 3D reconstruction has great potential that is yet to be unleashed. For instance, by assuming that walls are vertical and that a floor is planar and horizontal, we can correct distorted room shapes and eliminate local artifacts such as holes, pits, and hills. In this paper, we propose FAWN, a modification of truncated signed distance function (TSDF) reconstruction methods, which considers scene structure by detecting walls and floor in a scene and penalizing the corresponding surface normals for deviating from the horizontal and vertical directions. Implemented as a 3D sparse convolutional module, FAWN can be incorporated into any trainable pipeline that predicts TSDF. Since FAWN requires 3D semantics only for training, no additional limitations on further use are imposed. We demonstrate that FAWN-modified methods use semantics more effectively than existing semantic-based approaches. In addition, we apply our modification to state-of-the-art TSDF reconstruction methods and demonstrate a quality gain on the SCANNET, ICL-NUIM, TUM RGB-D, and 7SCENES benchmarks.	翻訳日:2024-06-19 23:57:20 公開日:2024-06-17
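The idea of penalizing floor and wall normals can be made concrete with a small sketch. This is not the FAWN loss itself, whose exact form the abstract does not give. It assumes a gravity-aligned up axis, treats a floor normal as correct when it is parallel to that axis and a wall normal as correct when it is perpendicular to it, and uses a simple cosine-based penalty. The names `UP` and `normal_penalty` and the label strings are hypothetical.

```python
import math

# Assumed gravity-aligned up axis (hypothetical choice of coordinate frame).
UP = (0.0, 0.0, 1.0)


def _unit(vector):
    """Return the vector scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("zero-length vector has no direction")
    return tuple(x / norm for x in vector)


def normal_penalty(normal, label, up=UP):
    """Penalty in [0, 1] for one surface normal with a semantic label.

    floor: the surface should be horizontal, so its normal should be parallel
           to `up`; the penalty is 1 - |cos(angle to up)|.
    wall:  the surface should be vertical, so its normal should be
           perpendicular to `up`; the penalty is |cos(angle to up)|.
    Any other label is left unregularized (penalty 0).
    """
    n = _unit(normal)
    u = _unit(up)
    cos_to_up = abs(sum(a * b for a, b in zip(n, u)))
    if label == "floor":
        return 1.0 - cos_to_up
    if label == "wall":
        return cos_to_up
    return 0.0


if __name__ == "__main__":
    print(normal_penalty((0.0, 0.1, 1.0), "floor"))  # slightly tilted floor
    print(normal_penalty((1.0, 0.0, 0.2), "wall"))   # slightly leaning wall
    print(normal_penalty((0.3, 0.3, 0.3), "chair"))  # not regularized
```

In a training pipeline, a term of this kind would be averaged over the points labeled as floor or wall and added to the reconstruction loss. Since the labels are only needed to compute the penalty, this is consistent with the abstract's remark that semantics are required only during training.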
# 細胞内における分子表現の学習 Learning Molecular Representation in a Cell ( http://arxiv.org/abs/2406.12056v1 ) ライセンス: Link先を確認	Gang Liu, Srijit Seal, John Arevalo, Zhenwen Liang, Anne E. Carpenter, Meng Jiang, Shantanu Singh,	(参考訳) 薬物の有効性と安全性をin vivoで予測するには、小さな分子摂動に対する生物学的反応(細胞形態、遺伝子発現など)に関する情報が必要である。しかしながら、現在の分子表現学習法は、これらの摂動下での細胞状態の包括的なビューを提供しておらず、ノイズを取り除くのに苦労し、モデル一般化を妨げている。本稿では,細胞内情報ボトルネック法を用いて分子表現を学習するための情報アライメント(InfoAlign)手法を提案する。我々は、分子と細胞応答データをノードとしてコンテキストグラフに統合し、化学、生物学的、計算基準に基づいて重み付けされたエッジと接続する。トレーニングバッチの各分子に対して、InfoAlignはエンコーダの潜在表現を最小限の目的で最適化し、冗長な構造情報を破棄する。十分性目的(sufficiency objective)は、コンテキストグラフ内の分子の近傍から異なる特徴空間と整合するように表現をデコードする。提案手法は,既存のエンコーダをベースとしたコントラスト法よりも,アライメントの効率向上を目標としている。経験的に、我々はInfoAlignの表現を2つの下流タスクで検証した: 4つのデータセットにまたがる19のベースライン法に対する分子特性予測とゼロショット分子形態整合である。 Predicting drug efficacy and safety in vivo requires information on biological responses (e.g., cell morphology and gene expression) to small molecule perturbations. However, current molecular representation learning methods do not provide a comprehensive view of cell states under these perturbations and struggle to remove noise, hindering model generalization. We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria. For each molecule in a training batch, InfoAlign optimizes the encoder's latent representation with a minimality objective to discard redundant structural information. A sufficiency objective decodes the representation to align with different feature spaces from the molecule's neighborhood in the context graph. We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods. 
Empirically, we validate representations from InfoAlign in two downstream tasks: molecular property prediction against up to 19 baseline methods across four datasets, plus zero-shot molecule-morphology matching.	翻訳日:2024-06-19 23:57:20 公開日:2024-06-17
# WellDunn: ウェルネス次元の同定における言語モデルと大規模言語モデルのロバスト性と説明可能性について WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions ( http://arxiv.org/abs/2406.12058v1 ) ライセンス: Link先を確認	Seyedali Mohammadi, Edward Raff, Jinendra Malekar, Vedant Palit, Francis Ferraro, Manas Gaur,	(参考訳) 言語モデル (LM) は, 予後のリスクを高めることで, 臨床実践におけるモデルの有用性の十分なリトマステストにはならない, メンタルヘルスの分野で提案されている。実践に信頼できるモデルは、説明と臨床的決定の対応性を持つべきであるが、これらのモデルの注意力と、それらの基礎的真理的説明への影響について、事前の研究は行われていない。本稿では,ウェルネス次元(WD)の同定におけるLMの堅牢性と説明性に着目した評価設計を提案する。 2つのメンタルヘルスと幸福なデータセットに焦点を当てます。 (a)多ラベル分類に基づくMultiWD及び b) 専門家による説明に対する注意機構の妥当性を評価するためのWellXplain ラベルはハルベルト・ダンのウェルネスの理論に基づいている。 1)人間のような能力にもかかわらず、RoBERTaに遅れてGPT-3.5/4ラグ、そしてMedAlpacaでは、微調整のLDMでは、パフォーマンスや説明に顕著な改善が得られなかった。 2)信頼性指向の損失関数に基づくLMの予測を再検討した結果,性能低下が顕著であった。 (3) すべてのLM/LLMにおいて, 注意と説明の整合性は低く, LLMは0.0。 (4)ほとんどの精神保健専門のLM/LLMは、ドメイン固有の知識や価値の低い説明を見落とし、これらの相違の原因となった。この研究は、精神保健と健康における一貫性と説明について、さらなる研究の必要性を強調している。 Language Models (LMs) are being proposed for mental health applications where the heightened risk of adverse outcomes means predictive performance may not be a sufficient litmus test of a model's utility in clinical practice. A model that can be trusted for practice should have a correspondence between explanation and clinical determination, yet no prior research has examined the attention fidelity of these models and their effect on ground truth explanations. We introduce an evaluation design that focuses on the robustness and explainability of LMs in identifying Wellness Dimensions (WD). We focus on two mental health and well-being datasets: (a) Multi-label Classification-based MultiWD, and (b) WellXplain for evaluating attention mechanism veracity against expert-labeled explanations. The labels are based on Halbert Dunn's theory of wellness, which gives grounding to our evaluation. 
We reveal four surprising results about LMs/LLMs: (1) Despite their human-like capabilities, GPT-3.5/4 lag behind RoBERTa, and MedAlpaca, a fine-tuned LLM, fails to deliver any remarkable improvements in performance or explanations. (2) Re-examining LMs' predictions based on a confidence-oriented loss function reveals a significant performance drop. (3) Across all LMs/LLMs, the alignment between attention and explanations remains low, with LLMs scoring a dismal 0.0. (4) Most mental health-specific LMs/LLMs overlook domain-specific knowledge and undervalue explanations, causing these discrepancies. This study highlights the need for further research into their consistency and explanations in mental health and well-being.	翻訳日:2024-06-19 23:57:20 公開日:2024-06-17
# グラフ変換器のスケーラブルで効果的な代替手段 A Scalable and Effective Alternative to Graph Transformers ( http://arxiv.org/abs/2406.12059v1 ) ライセンス: Link先を確認	Kaan Sancak, Zhigang Hua, Jin Fang, Yan Xie, Andrey Malevich, Bo Long, Muhammed Fatih Balin, Ümit V. Çatalyürek,	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ表現学習において顕著なパフォーマンスを示しているが、表現力の制限により、長距離依存をキャプチャする上での課題に直面している。これを解決するために、グラフ変換器(GT)が導入された。これらの利点にもかかわらず、GTはグラフ内のノード数という二次的な複雑さに悩まされ、大きなグラフに適用できなくなる。本研究では,グラフ拡張コンテキスト演算子(GECO)を提案する。これはGTのスケーラブルで効果的な代替手段であり,近隣の伝播とグローバルな畳み込みを利用して,準線形時間で局所的およびグローバルな依存関係を効果的にキャプチャする。合成データセットについて検討した結果,GECOは2Mノードのグラフ上で最適化されたアテンションに対して169倍の高速化を実現していることがわかった。さまざまなベンチマークに関するさらなる評価は、GECOが従来のGTがメモリと時間制限に直面している大きなグラフにスケールすることを示している。特にGECOは、ベースラインに比べて一貫して同等または優れた品質を実現し、SOTAを最大4.5%改善し、大規模グラフ学習のためのスケーラブルで効果的なソリューションを提供する。 Graph Neural Networks (GNNs) have shown impressive performance in graph representation learning, but they face challenges in capturing long-range dependencies due to their limited expressive power. To address this, Graph Transformers (GTs) were introduced, utilizing a self-attention mechanism to effectively model pairwise node relationships. Despite their advantages, GTs suffer from quadratic complexity w.r.t. the number of nodes in the graph, hindering their applicability to large graphs. In this work, we present Graph-Enhanced Contextual Operator (GECO), a scalable and effective alternative to GTs that leverages neighborhood propagation and global convolutions to effectively capture local and global dependencies in quasilinear time. Our study on synthetic datasets reveals that GECO reaches a 169x speedup on a graph with 2M nodes w.r.t. optimized attention. Further evaluations on a diverse range of benchmarks showcase that GECO scales to large graphs where traditional GTs often face memory and time limitations. Notably, GECO consistently achieves comparable or superior quality compared to baselines, improving the SOTA by up to 4.5%, and offering a scalable and effective solution for large-scale graph learning.	翻訳日:2024-06-19 23:57:20 公開日:2024-06-17
# ゼロではなく集約:自然言語理解におけるショートカットシフトに対処するための実験の混合によるポストホック制御 Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding ( http://arxiv.org/abs/2406.12060v1 ) ライセンス: Link先を確認	Ukyo Honda, Tatsushi Oka, Peinan Zhang, Masato Mita,	(参考訳) 最近の自然言語理解モデルは、一般的にショートカットとして知られるデータセットの単純なパターンを利用する傾向にある。これらのショートカットは、トレーニングデータに存在するラベルと潜在機能の間の急激な相関にヒンジする。推定時において、ショートカットに依存したモデルは、特にラベルと関係のない潜在的特徴がなくなった場合、分布シフトの下で誤った予測を生成する傾向にある。これを避けるために、従来の研究ではショートカットへの依存を取り除くためにモデルを訓練してきた。本研究では,各専門家が比較的異なる潜伏特徴を捉えると仮定して,実験結果の混合予測を悲観的に集約する。実験結果から,専門家に対するポストホック制御は,ショートカットにおける分布シフトに対するモデルのロバスト性を大幅に向上させることが示された。さらに、我々のアプローチにはいくつかの実用的な利点があることが示されています。また、我々のモデルを分析し、その仮定を支持する結果を提供する。 Recent models for natural language understanding are inclined to exploit simple patterns in datasets, commonly known as shortcuts. These shortcuts hinge on spurious correlations between labels and latent features existing in the training data. At inference time, shortcut-dependent models are likely to generate erroneous predictions under distribution shifts, particularly when some latent features are no longer correlated with the labels. To avoid this, previous studies have trained models to eliminate the reliance on shortcuts. In this study, we explore a different direction: pessimistically aggregating the predictions of a mixture-of-experts, assuming each expert captures relatively different latent features. The experimental results demonstrate that our post-hoc control over the experts significantly enhances the model's robustness to the distribution shift in shortcuts. Besides, we show that our approach has some practical advantages. We also analyze our model and provide results to support the assumption.	翻訳日:2024-06-19 23:57:20 公開日:2024-06-17
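The abstract above does not spell out what "pessimistically aggregating" expert predictions means. As one possible reading, the sketch below uses a maximin rule: score each label by the lowest probability any expert assigns to it, then pick the label with the best worst-case score. This is an illustrative assumption, not necessarily the authors' rule, and the function name `pessimistic_aggregate` is hypothetical.

```python
def pessimistic_aggregate(expert_probs):
    """Maximin aggregation over experts (an assumed, illustrative rule).

    expert_probs[e][y] is expert e's probability for label y.
    Returns (chosen_label_index, worst_case_scores), where
    worst_case_scores[y] is the minimum probability any expert gives label y.
    """
    if not expert_probs:
        raise ValueError("need at least one expert")
    num_labels = len(expert_probs[0])
    if any(len(probs) != num_labels for probs in expert_probs):
        raise ValueError("all experts must score the same label set")
    worst_case = [
        min(probs[label] for probs in expert_probs)
        for label in range(num_labels)
    ]
    chosen = max(range(num_labels), key=lambda label: worst_case[label])
    return chosen, worst_case


if __name__ == "__main__":
    # Two hypothetical experts that rely on different latent features.
    experts = [
        [0.9, 0.1, 0.0],  # very confident in label 0, perhaps via a shortcut
        [0.0, 0.4, 0.6],  # rules label 0 out entirely
    ]
    chosen, scores = pessimistic_aggregate(experts)
    print(chosen, scores)
```

The design intent is that a label supported by only one expert, possibly through a spurious feature, cannot win on that expert's confidence alone. Averaging the same two experts would favor label 0, while the maximin rule prefers a label that no expert rejects outright.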
# エントロピック回帰DMD(ERDMD)は情報的でスパースな不均一時間遅延モデルを発見する Entropic Regression DMD (ERDMD) Discovers Informative Sparse and Nonuniformly Time Delayed Models ( http://arxiv.org/abs/2406.12062v1 ) ライセンス: Link先を確認	Christopher W. Curtis, Erik Bollt, Daniel Jay Alford-Lago,	(参考訳) 本研究では,非線形情報フロー検出アルゴリズムであるエントロピー回帰を用いて,最適多段階動的モード分解(DMD)モデルを決定する手法を提案する。本研究では,高次DMD (HODMD) 法と,ネットワーク検出とモデル構築のためのエントロピック回帰 (ER) 手法を用いて,不均一な時間間隔を許容する高忠実度時間遅延DMDモデルを生成するERDMDと呼ばれる手法を開発した。これらのモデルは、非常に効率的で堅牢であることが示されている。カオス的アトラクタによって生成された複数のデータセット上で本手法を検証し,比較的最小限のモデルを用いて優れた再構成を構築可能であることを示す。同様に、動的モード分解の実用性を高めるモデルにより、マルチスケールの機能をよりよく識別できる。 In this work, we present a method which determines optimal multi-step dynamic mode decomposition (DMD) models via entropic regression, which is a nonlinear information flow detection algorithm. Motivated by the higher-order DMD (HODMD) method and the entropic regression (ER) technique for network detection and model construction, we develop a method that we call ERDMD that produces high-fidelity time-delay DMD models that allow for nonuniform time spacing, with the spacing chosen by ER according to informativity. These models are shown to be highly efficient and robust. We test our method over several data sets generated by chaotic attractors and show that we are able to build excellent reconstructions using relatively minimal models. We likewise are able to better identify multiscale features via our models, which enhances the utility of dynamic mode decomposition.	翻訳日:2024-06-19 23:57:20 公開日:2024-06-17
# STNAGNN:タスクベースfMRI解析のための時空間ノード注意グラフニューラルネットワーク STNAGNN: Spatiotemporal Node Attention Graph Neural Network for Task-based fMRI Analysis ( http://arxiv.org/abs/2406.12065v1 ) ライセンス: Link先を確認	Jiyao Wang, Nicha C. Dvornek, Peiyu Duan, Lawrence H. Staib, Pamela Ventola, James S. Duncan,	(参考訳) タスクベースのfMRIは、アクションまたは刺激を使用して、タスク固有の脳反応をトリガーし、BOLDコントラストを使用してそれらを測定する。タスクによる時空間脳活動の著しい変動にもかかわらず、タスクベースfMRIの研究の多くは、タスクコンテキスト情報がfMRIと一致していることを無視し、タスクベースfMRIをコヒーレントなシーケンスとみなす。本稿では,タスク構造をデータ駆動型ガイダンスとして用いることが時空間分析に有効であることを示す。本稿では,GNNに基づく時空間アーキテクチャSTNAGNNを提案し,その性能を自閉症分類タスクで検証する。トレーニングされたモデルは、自閉症に関連する時空間脳バイオマーカーを特定するためにも解釈される。 Task-based fMRI uses actions or stimuli to trigger task-specific brain responses and measures them using BOLD contrast. Despite the significant task-induced spatiotemporal brain activation fluctuations, most studies on task-based fMRI ignore the task context information aligned with fMRI and consider task-based fMRI a coherent sequence. In this paper, we show that using the task structures as data-driven guidance is effective for spatiotemporal analysis. We propose STNAGNN, a GNN-based spatiotemporal architecture, and validate its performance in an autism classification task. The trained model is also interpreted for identifying autism-related spatiotemporal brain biomarkers.	翻訳日:2024-06-19 23:57:20 公開日:2024-06-17
# バイオメディカルベンチマークにおける薬物名と言語モデル Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks ( http://arxiv.org/abs/2406.12066v1 ) ライセンス: Link先を確認	Jack Gallifant, Shan Chen, Pedro Moreira, Nikolaj Munch, Mingye Gao, Jackson Pond, Leo Anthony Celi, Hugo Aerts, Thomas Hartvigsen, Danielle Bitterman,	(参考訳) 医学知識は文脈に依存しており、意味論的に等価なフレーズの様々な自然言語表現に対して一貫した推論を必要とする。これは薬名にとって特に重要であり、患者は一般的な等価品の代わりにAdvilやTylenolといったブランド名を使うことが多い。そこで本研究では,医師の専門家アノテーションを用いてブランド名と一般名を入れ替えた後の医用ベンチマークの性能差を評価するために,新しい頑健性データセットであるRABBITSを作成した。 MedQA と MedMCQA のオープンソース LLM と API ベースの LLM を評価し,1〜10%の一貫した性能低下を明らかにした。さらに、この脆弱性の潜在的な源泉を、広く使われている事前学習データセットにおけるテストデータの汚染として同定する。すべてのコードはhttps://github.com/BittermanLab/RABBITSでアクセスでき、HuggingFaceのリーダーボードはhttps://huggingface.co/spaces/AIM-Harvard/rabbits-leaderboardで利用できる。 Medical knowledge is context-dependent and requires consistent reasoning across various natural language expressions of semantically equivalent phrases. This is particularly crucial for drug names, where patients often use brand names like Advil or Tylenol instead of their generic equivalents. To study this, we create a new robustness dataset, RABBITS, to evaluate performance differences on medical benchmarks after swapping brand and generic drug names using physician expert annotations. We assess both open-source and API-based LLMs on MedQA and MedMCQA, revealing a consistent performance drop ranging from 1-10%. Furthermore, we identify a potential source of this fragility as the contamination of test data in widely used pre-training datasets. All code is accessible at https://github.com/BittermanLab/RABBITS, and a HuggingFace leaderboard is available at https://huggingface.co/spaces/AIM-Harvard/rabbits-leaderboard.	翻訳日:2024-06-19 23:57:19 公開日:2024-06-17
# Satyrn: 分析拡張生成のためのプラットフォーム Satyrn: A Platform for Analytics Augmented Generation ( http://arxiv.org/abs/2406.12069v1 ) ライセンス: Link先を確認	Marko Sterbentz, Cameron Barrie, Shubham Shahi, Abhratanu Dutta, Donna Hooshmand, Harper Pack, Kristian J. Hammond,	(参考訳) 大規模言語モデル(LLM)は文書を作成でき、検索拡張生成(RAG)は、流速を犠牲にすることなく精度を向上する強力な方法であることが示されている。しかし、すべての情報をテキストから取り出すことはできない。本稿では、構造化データの解析を用いて、検索された文書がRAGで使用されるのとほとんど同じように、生成をガイドするために使用される事実集合を生成するアプローチを提案する。この分析拡張生成(AAG)アプローチは、標準的な分析技術を使用して、テキストに変換してLLMに渡される事実を生成する能力をサポートする。我々は、AAGを利用して大規模データベース上に構築された正確で流動的でコヒーレントなレポートを生成する、ニューロシンボリックなプラットフォームであるSatyrnを提案する。実験の結果,約57%のクレームが正確である GPT-4 Code Interpreter と比較して,Mistral-7B のようなより小さな言語モデルを用いても,高いフラレンシとコヒーレンスを維持しつつ,精度の高いクレームを 86% 以上生成していることがわかった。 Large language models (LLMs) are capable of producing documents, and retrieval augmented generation (RAG) has shown itself to be a powerful method for improving accuracy without sacrificing fluency. However, not all information can be retrieved from text. We propose an approach that uses the analysis of structured data to generate fact sets that are used to guide generation in much the same way that retrieved documents are used in RAG. This analytics augmented generation (AAG) approach supports the ability to utilize standard analytic techniques to generate facts that are then converted to text and passed to an LLM. We present a neurosymbolic platform, Satyrn, that leverages AAG to produce accurate, fluent, and coherent reports grounded in large-scale databases. In our experiments, we find that Satyrn generates reports in which over 86% of claims are accurate while maintaining high levels of fluency and coherence, even when using smaller language models such as Mistral-7B, as compared to GPT-4 Code Interpreter, in which just 57% of claims are accurate.	翻訳日:2024-06-19 23:57:19 公開日:2024-06-17
# DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs ( http://arxiv.org/abs/2406.12072v1 ) License: see link	Jiasheng Zhang, Jialin Chen, Menglin Yang, Aosong Feng, Shuang Liang, Jie Shao, Rex Ying,	Dynamic text-attributed graphs (DyTAGs) are prevalent in various real-world scenarios, where each node and edge is associated with text descriptions, and both the graph structure and text descriptions evolve over time. Despite their broad applicability, there is a notable scarcity of benchmark datasets tailored to DyTAGs, which hinders potential advancement in many research fields. To address this gap, we introduce Dynamic Text-attributed Graph Benchmark (DTGB), a collection of large-scale, time-evolving graphs from diverse domains, with nodes and edges enriched by dynamically changing text attributes and categories. To facilitate the use of DTGB, we design standardized evaluation procedures based on four real-world use cases: future link prediction, destination node retrieval, edge classification, and textual relation generation. These tasks require models to understand both dynamic graph structures and natural language, highlighting the unique challenges posed by DyTAGs. Moreover, we conduct extensive benchmark experiments on DTGB, evaluating 7 popular dynamic graph learning algorithms and their variants adapted to text attributes via LLM embeddings, along with 6 powerful large language models (LLMs). Our results show the limitations of existing models in handling DyTAGs. Our analysis also demonstrates the utility of DTGB in investigating the incorporation of structural and textual dynamics. The proposed DTGB fosters research on DyTAGs and their broad applications. It offers a comprehensive benchmark for evaluating and advancing models to handle the interplay between dynamic graph structures and natural language. The dataset and source code are available at https://github.com/zjs123/DTGB.	Translated: 2024-06-19 23:57:19 Published: 2024-06-17
# COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities ( http://arxiv.org/abs/2406.12074v1 ) License: see link	Zihao He, Rebecca Dorn, Siyi Guo, Minh Duc Chu, Kristina Lerman,	Social scientists use surveys to probe the opinions and beliefs of populations, but these methods are slow, costly, and prone to biases. Recent advances in large language models (LLMs) enable creating computational representations or "digital twins" of populations that generate human-like responses mimicking the population's language, styles, and attitudes. We introduce Community-Cross-Instruct, an unsupervised framework for aligning LLMs to online communities to elicit their beliefs. Given a corpus of a community's online discussions, Community-Cross-Instruct automatically generates instruction-output pairs using an advanced LLM to (1) finetune a foundational LLM to faithfully represent that community, and (2) evaluate the alignment of the finetuned model to the community. We demonstrate the method's utility in accurately representing political and fitness communities on Reddit. Unlike prior methods requiring human-authored instructions, Community-Cross-Instruct generates instructions in a fully unsupervised manner, enhancing scalability and generalization across domains. This work enables cost-effective and automated surveying of diverse online communities.	Translated: 2024-06-19 23:57:19 Published: 2024-06-17
# Conformance Checking of Fuzzy Logs against Declarative Temporal Specifications ( http://arxiv.org/abs/2406.12078v1 ) License: see link	Ivan Donadello, Paolo Felli, Craig Innes, Fabrizio Maria Maggi, Marco Montali,	Traditional conformance checking tasks assume that event data provide a faithful and complete representation of the actual process executions. This assumption has been recently questioned: more and more often events are not traced explicitly, but are instead indirectly obtained as the result of event recognition pipelines, and thus inherently come with uncertainty. In this work, departing from the typical probabilistic interpretation of uncertainty, we consider the relevant case where uncertainty refers to which activity is actually conducted, under a fuzzy semantics. In this novel setting, we consider the problem of checking whether fuzzy event data conform with declarative temporal rules specified as Declare patterns or, more generally, as formulae of linear temporal logic over finite traces (LTLf). This requires relaxing the assumption that at each instant only one activity is executed, and correspondingly redefining the boolean operators of the logic with a fuzzy semantics. Specifically, we provide a threefold contribution. First, we define a fuzzy counterpart of LTLf tailored to our purpose. Second, we cast conformance checking over fuzzy logs as a verification problem in this logic. Third, we provide a proof-of-concept, efficient implementation based on the PyTorch Python library, suited to check conformance of multiple fuzzy traces at once.	Translated: 2024-06-19 23:57:19 Published: 2024-06-17
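To make the idea of a fuzzy reading of LTLf concrete, here is a minimal pure-Python sketch. It is an illustration only, not the paper's logic or its PyTorch implementation: it assumes the common min/max (Zadeh) operators and reads implication as `max(1 - a, b)`, and the activity names `pay` and `ship` are made up. A fuzzy trace is a list of instants, each mapping activity names to a degree in [0, 1], so several activities can hold at once to different degrees.

```python
def atom(name):
    """Degree to which activity `name` holds at instant i (0.0 if absent)."""
    return lambda trace, i: trace[i].get(name, 0.0)


def f_not(p):
    return lambda trace, i: 1.0 - p(trace, i)


def f_or(p, q):
    return lambda trace, i: max(p(trace, i), q(trace, i))


def eventually(p):
    # F p: best degree reached at instant i or later in the finite trace.
    return lambda trace, i: max(p(trace, j) for j in range(i, len(trace)))


def always(p):
    # G p: worst degree over instant i and all later instants.
    return lambda trace, i: min(p(trace, j) for j in range(i, len(trace)))


def response(a, b):
    # Declare "response" pattern G(a -> F b), with a -> b read as max(1 - a, b).
    return always(f_or(f_not(atom(a)), eventually(atom(b))))


trace = [
    {"pay": 0.9, "ship": 0.1},
    {"pay": 0.2, "ship": 0.3},
    {"ship": 0.7},
]
score = response("pay", "ship")(trace, 0)
```

The result is a conformance degree rather than a yes/no verdict. Other t-norms would give different degrees, which is why the choice of operators is flagged as an assumption here.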
# Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint ( http://arxiv.org/abs/2406.12079v1 ) License: see link	Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose Alvarez,	As we push the boundaries of performance in various vision tasks, the models grow in size correspondingly. To keep up with this growth, we need very aggressive pruning techniques for efficient inference and deployment on edge devices. Existing pruning approaches are limited to channel pruning and struggle with aggressive parameter reductions. In this paper, we propose a novel multi-dimensional pruning framework that jointly optimizes pruning across channels, layers, and blocks while adhering to latency constraints. We develop a latency modeling technique that accurately captures model-wide latency variations during pruning, which is crucial for achieving optimal latency-accuracy trade-offs at high pruning ratios. We reformulate pruning as a Mixed-Integer Nonlinear Program (MINLP) to efficiently determine the optimal pruned structure with only a single pass. Our extensive results demonstrate substantial improvements over previous methods, particularly at large pruning ratios. In classification, our method significantly outperforms prior art HALP with a Top-1 accuracy of 70.0 (vs. 68.6) and an FPS of 5262 im/s (vs. 4101 im/s). In 3D object detection, we establish a new state-of-the-art by pruning StreamPETR at a 45% pruning ratio, achieving higher FPS (37.3 vs. 31.7) and mAP (0.451 vs. 0.449) than the dense baseline.	Translated: 2024-06-19 23:47:35 Published: 2024-06-17
# A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets ( http://arxiv.org/abs/2406.12080v1 ) License: see link	Bernhard Kerbl, Andréas Meuleman, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, George Drettakis,	Novel view synthesis has seen major advances in recent years, with 3D Gaussian splatting offering an excellent level of visual quality, fast training and real-time rendering. However, the resources needed for training and rendering inevitably limit the size of the captured scenes that can be represented with good visual quality. We introduce a hierarchy of 3D Gaussians that preserves visual quality for very large scenes, while offering an efficient Level-of-Detail (LOD) solution for rendering distant content with effective level selection and smooth transitions between levels. We introduce a divide-and-conquer approach that allows us to train very large scenes in independent chunks. We consolidate the chunks into a hierarchy that can be optimized to further improve visual quality of Gaussians merged into intermediate nodes. Very large captures typically have sparse coverage of the scene, presenting many challenges to the original 3D Gaussian splatting training method; we adapt and regularize training to account for these issues. We present a complete solution that enables real-time rendering of very large scenes and can adapt to available resources thanks to our LOD method. We show results for captured scenes with up to tens of thousands of images with a simple and affordable rig, covering trajectories of up to several kilometers and lasting up to one hour. Project Page: https://repo-sam.inria.fr/fungraph/hierarchical-3d-gaussians/	Translated: 2024-06-19 23:47:35 Published: 2024-06-17
# Deep HM-SORT: Enhancing Multi-Object Tracking in Sports with Deep Features, Harmonic Mean, and Expansion IOU ( http://arxiv.org/abs/2406.12081v1 ) License: see link	Matias Gran-Henriksen, Hans Andreas Lindgaard, Gabriel Kiss, Frank Lindseth,	This paper introduces Deep HM-SORT, a novel online multi-object tracking algorithm specifically designed to enhance the tracking of athletes in sports scenarios. Traditional multi-object tracking methods often struggle with sports environments due to the similar appearances of players, irregular and unpredictable movements, and significant camera motion. Deep HM-SORT addresses these challenges by integrating deep features, harmonic mean, and Expansion IOU. By leveraging the harmonic mean, our method effectively balances appearance and motion cues, significantly reducing ID-swaps. Additionally, our approach retains all tracklets indefinitely, improving the re-identification of players who leave and re-enter the frame. Experimental results demonstrate that Deep HM-SORT achieves state-of-the-art performance on two large-scale public benchmarks, SportsMOT and SoccerNet Tracking Challenge 2023. Specifically, our method achieves 80.1 HOTA on the SportsMOT dataset and 85.4 HOTA on the SoccerNet-Tracking dataset, outperforming existing trackers in key metrics such as HOTA, IDF1, AssA, and MOTA. This robust solution provides enhanced accuracy and reliability for automated sports analytics, offering significant improvements over previous methods without introducing additional computational cost.	Translated: 2024-06-19 23:47:35 Published: 2024-06-17
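The two scoring ingredients named in the Deep HM-SORT abstract, a harmonic mean of appearance and motion cues and an IoU computed on expanded boxes, can be sketched as follows. This is a rough illustration rather than the tracker's code: the box format `(x1, y1, x2, y2)`, the expansion rule (grow each side by a fraction `e` of the box size), and the default `e=0.5` are assumptions.

```python
def expand(box, e):
    """Grow an (x1, y1, x2, y2) box by a fraction e of its width/height on every side."""
    x1, y1, x2, y2 = box
    dw, dh = e * (x2 - x1), e * (y2 - y1)
    return (x1 - dw, y1 - dh, x2 + dw, y2 + dh)


def iou(a, b):
    """Plain intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def expansion_iou(a, b, e=0.5):
    """IoU after expanding both boxes, so fast-moving targets can still overlap."""
    return iou(expand(a, e), expand(b, e))


def harmonic_mean(appearance_sim, motion_sim):
    """Combined score that stays low unless both cues agree."""
    total = appearance_sim + motion_sim
    return 2.0 * appearance_sim * motion_sim / total if total > 0 else 0.0
```

For example, two 10x10 boxes separated by a 2-pixel gap have a plain IoU of zero but a positive expansion IoU. Cues of 0.9 and 0.1 combine to 0.18 under the harmonic mean, where an arithmetic mean would give 0.5, so one strong cue cannot mask a weak one.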
# Uncertainty modeling for fine-tuned implicit functions ( http://arxiv.org/abs/2406.12082v1 ) License: see link	Anna Susmelj, Mael Macuglia, Nataša Tagasovska, Reto Sutter, Sebastiano Caprara, Jean-Philippe Thiran, Ender Konukoglu,	Implicit functions such as Neural Radiance Fields (NeRFs), occupancy networks, and signed distance functions (SDFs) have become pivotal in computer vision for reconstructing detailed object shapes from sparse views. Achieving optimal performance with these models can be challenging due to the extreme sparsity of inputs and distribution shifts induced by data corruptions. To this end, large, noise-free synthetic datasets can serve as shape priors to help models fill in gaps, but the resulting reconstructions must be approached with caution. Uncertainty estimation is crucial for assessing the quality of these reconstructions, particularly in identifying areas where the model is uncertain about the parts it has inferred from the prior. In this paper, we introduce Dropsembles, a novel method for uncertainty estimation in tuned implicit functions. We demonstrate the efficacy of our approach through a series of experiments, starting with toy examples and progressing to a real-world scenario. Specifically, we train a Convolutional Occupancy Network on synthetic anatomical data and test it on low-resolution MRI segmentations of the lumbar spine. Our results show that Dropsembles achieve the accuracy and calibration levels of deep ensembles but with significantly less computational cost.	Translated: 2024-06-19 23:47:35 Published: 2024-06-17
# Equivalence Classes of Quantum Error-Correcting Codes ( http://arxiv.org/abs/2406.12083v1 ) License: see link	Andrey Boris Khesin, Alexander Li,	Quantum error-correcting codes (QECC's) are needed to combat the inherent noise affecting quantum processes. Using ZX calculus, we represent QECC's in a form called a ZX diagram, consisting of a tensor network. In this paper, we present canonical forms for CSS codes and CSS states (which are CSS codes with 0 inputs), and we show the resulting canonical forms for the toric code and certain surface codes. Next, we introduce the notion of prime code diagrams, ZX diagrams of codes that have a single connected component with the property that no sequence of rewrite rules can split such a diagram into two connected components. We also show the Fundamental Theorem of Clifford Codes, proving the existence and uniqueness of the prime decomposition of Clifford codes. Next, we tabulate equivalence classes of ZX diagrams under a different definition of equivalence that allows output permutations and any local operations on the outputs. Possible representatives of these equivalence classes are analyzed. This work expands on previous works in exploring the canonical forms of QECC's in their ZX diagram representations.	Translated: 2024-06-19 23:47:35 Published: 2024-06-17
# When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives ( http://arxiv.org/abs/2406.12084v1 ) License: see link	Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Wenlin Yao, Hassan Foroosh, Dong Yu, Fei Liu,	Reasoning is most powerful when an LLM accurately aggregates relevant information. We examine the critical role of information aggregation in reasoning by requiring the LLM to analyze sports narratives. To succeed at this task, an LLM must infer points from actions, identify related entities, attribute points accurately to players and teams, and compile key statistics to draw conclusions. We conduct comprehensive experiments with real NBA basketball data and present SportsGen, a new method to synthesize game narratives. By synthesizing data, we can rigorously evaluate LLMs' reasoning capabilities under complex scenarios with varying narrative lengths and density of information. Our findings show that most models, including GPT-4o, often fail to accurately aggregate basketball scores due to frequent scoring patterns. Open-source models like Llama-3 further suffer from significant score hallucinations. Finally, the effectiveness of reasoning is influenced by narrative complexity, information density, and domain-specific terms, highlighting the challenges in analytical reasoning tasks.	Translated: 2024-06-19 23:47:35 Published: 2024-06-17
# A shortcut to an optimal quantum linear system solver ( http://arxiv.org/abs/2406.12086v1 ) License: see link	Alexander M. Dalzell,	Given a linear system of equations $A\boldsymbol{x}=\boldsymbol{b}$, quantum linear system solvers (QLSSs) approximately prepare a quantum state $|\boldsymbol{x}\rangle$ for which the amplitudes are proportional to the solution vector $\boldsymbol{x}$. Asymptotically optimal QLSSs have query complexity $O(\kappa \log(1/\varepsilon))$, where $\kappa$ is the condition number of $A$, and $\varepsilon$ is the approximation error. However, runtime guarantees for existing optimal and near-optimal QLSSs do not have favorable constant prefactors, in part because they rely on complex or difficult-to-analyze techniques like variable-time amplitude amplification and adiabatic path-following. Here, we give a conceptually simple QLSS that does not use these techniques. If the solution norm $\lVert\boldsymbol{x}\rVert$ is known exactly, our QLSS requires only a single application of kernel reflection (a straightforward extension of the eigenstate filtering (EF) technique of previous work) and the query complexity of the QLSS is $(1+O(\varepsilon))\kappa \ln(2\sqrt{2}/\varepsilon)$. If the norm is unknown, our method allows it to be estimated up to a constant factor using $O(\log\log(\kappa))$ applications of kernel projection (a direct generalization of EF) yielding a straightforward QLSS with near-optimal $O(\kappa \log\log(\kappa)\log\log\log(\kappa)+\kappa\log(1/\varepsilon))$ total complexity. Alternatively, by reintroducing a concept from the adiabatic path-following technique, we show that $O(\kappa)$ complexity can be achieved for norm estimation, yielding an optimal QLSS with $O(\kappa\log(1/\varepsilon))$ complexity while still avoiding the need to invoke the adiabatic theorem. Finally, we compute an explicit upper bound of $56\kappa+1.05\kappa \ln(1/\varepsilon)+o(\kappa)$ for the complexity of our optimal QLSS.	Translated: 2024-06-19 23:47:35 Published: 2024-06-17
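The two explicit query-complexity expressions quoted in this abstract are easy to compare numerically. The sketch below simply evaluates their leading terms; dropping the $(1+O(\varepsilon))$ factor and the $o(\kappa)$ term is a simplification made here, and the function names are illustrative.

```python
import math


def known_norm_leading_term(kappa, eps):
    """kappa * ln(2*sqrt(2)/eps): known-norm case, (1 + O(eps)) factor dropped."""
    return kappa * math.log(2.0 * math.sqrt(2.0) / eps)


def explicit_upper_bound(kappa, eps):
    """56*kappa + 1.05*kappa*ln(1/eps): unknown-norm optimal solver, o(kappa) dropped."""
    return 56.0 * kappa + 1.05 * kappa * math.log(1.0 / eps)
```

With $\kappa = 100$ and $\varepsilon = 10^{-3}$, the first expression is a little under 800 queries and the second a little over 6300. In this regime most of the second bound comes from the $56\kappa$ term, which the abstract's optimal solver spends on handling the unknown norm.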
# Mutual Learning for Finetuning Click-Through Rate Prediction Models ( http://arxiv.org/abs/2406.12087v1 ) License: see link	Ibrahim Can Yilmaz, Said Aldemir,	Click-Through Rate (CTR) prediction has become an essential task in digital industries, such as digital advertising or online shopping. Many deep learning-based methods have been implemented and have become state-of-the-art models in the domain. To further improve the performance of CTR models, Knowledge Distillation based approaches have been widely used. However, most current CTR prediction models do not have very complex architectures, so it is hard to call one of them 'cumbersome' and the other one 'tiny'. The performance gap between complex and simple models is also not very large, so distilling knowledge from one model to the other may not be worth the effort. Under these considerations, Mutual Learning could be a better approach, since all the models could be improved mutually. In this paper, we show how useful the mutual learning algorithm can be when it is applied between equals. In our experiments on the Criteo and Avazu datasets, the mutual learning algorithm improved model performance by up to 0.66% (relative improvement).	Translated: 2024-06-19 23:47:35 Published: 2024-06-17
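Mutual learning between two peer CTR models is commonly set up so that each model minimizes its own supervised loss plus a divergence term pulling it toward the other model's prediction. The sketch below shows that per-example objective for binary click labels. It is a generic deep-mutual-learning formulation, not necessarily the exact loss used in this paper, and the `weight` parameter is an assumption.

```python
import math


def bce(y, p):
    """Binary cross-entropy for label y in {0, 1} and predicted click probability p in (0, 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bernoulli_kl(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)) for probabilities strictly between 0 and 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def mutual_losses(y, p1, p2, weight=1.0):
    """Loss for each peer: its own BCE plus a KL term toward the other peer's prediction."""
    loss1 = bce(y, p1) + weight * bernoulli_kl(p2, p1)
    loss2 = bce(y, p2) + weight * bernoulli_kl(p1, p2)
    return loss1, loss2
```

When the two peers agree, the KL terms vanish and each loss reduces to plain cross-entropy. Unlike teacher-student distillation, both models are updated, which matches the abstract's "between equals" setting.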
# Many-Body Quantum Geometric Dipole ( http://arxiv.org/abs/2406.12089v1 ) License: see link	H. A. Fertig, Luis Brey,	Collective excitations of many-body electron systems can carry internal structure, tied to the quantum geometry of the Hilbert space in which they are embedded. This has been shown explicitly for particle-hole-like excitations, which carry a "quantum geometric dipole" (QGD) that is essentially an electric dipole moment associated with the state. We demonstrate in this work that this property can be formulated in a generic way, which does not require wavefunctions expressed in terms of single particle-hole states. Our formulation exploits the density matrix associated with a branch of excitations that evolves continuously with its momentum ${\bf K}$, from which one may extract single-particle states allowing a construction of the QGD. We demonstrate the formulation using the single-mode approximation for excited states of two quantum Hall systems: the first for an integrally filled Landau level, and the second for a fractional quantum Hall state at filling factor $\nu=1/m$, with $m$ an odd integer. In both cases we obtain the same result for the QGD, which can be attributed to the translational invariance assumed of the system. Our study demonstrates that the QGD is an intrinsic property of collective modes which is valid beyond approximations one might make for their wavefunctions.	Translated: 2024-06-19 23:47:35 Published: 2024-06-17
# Is poisoning a real threat to LLM alignment? Maybe more so than you think ( http://arxiv.org/abs/2406.12091v1 ) License: see link	Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, Furong Huang,	Recent advancements in Reinforcement Learning with Human Feedback (RLHF) have significantly impacted the alignment of Large Language Models (LLMs). The sensitivity of reinforcement learning algorithms such as Proximal Policy Optimization (PPO) has led to a new line of work on Direct Preference Optimization (DPO), which treats RLHF in a supervised learning framework. The increased practical use of these RLHF methods warrants an analysis of their vulnerabilities. In this work, we investigate the vulnerabilities of DPO to poisoning attacks under different scenarios and compare the effectiveness of preference poisoning, a first of its kind. We comprehensively analyze DPO's vulnerabilities under different types of attacks, i.e., backdoor and non-backdoor attacks, and different poisoning methods across a wide array of language models, i.e., Llama 7B, Mistral 7B, and Gemma 7B. We find that, unlike PPO-based methods, which require at least 4% of the data to be poisoned to elicit harmful behavior under backdoor attacks, DPO's vulnerabilities can be exploited more simply: the model can be poisoned with as little as 0.5% of the data. We further investigate the potential reasons behind the vulnerability and how well this vulnerability translates into backdoor vs non-backdoor attacks.	Translated: 2024-06-19 23:47:35 Published: 2024-06-17
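For readers unfamiliar with why preference data is an attack surface, the sketch below shows the standard per-pair DPO loss together with one simple form of preference poisoning, swapping the chosen and rejected responses on a subset of pairs. The label-flipping helper is an illustrative assumption; the abstract does not specify the paper's poisoning methods, and no backdoor trigger is modeled here.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)), written with log1p.

    The log-probabilities come from the policy being trained (logp_*) and
    from a frozen reference model (ref_*). Fine for moderate margins; a
    production version would guard exp() against overflow.
    """
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return math.log1p(math.exp(-beta * margin))


def flip_labels(pairs, poisoned):
    """Return a copy of (prompt, chosen, rejected) triples with the pairs at
    the given indices swapped. Hypothetical helper for illustration."""
    return [
        (prompt, rejected, chosen) if i in poisoned else (prompt, chosen, rejected)
        for i, (prompt, chosen, rejected) in enumerate(pairs)
    ]
```

Because the loss only sees which response is labeled as preferred, a flipped pair pushes the policy toward the originally rejected response with the same strength that a clean pair pushes away from it.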
# Who's asking? User personas and the mechanics of latent misalignment ( http://arxiv.org/abs/2406.12094v1 ) License: see link	Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, Lucas Dixon,	Despite investments in improving model safety, studies show that misaligned capabilities remain latent in safety-tuned models. In this work, we shed light on the mechanics of this phenomenon. First, we show that even when model generations are safe, harmful content can persist in hidden representations and can be extracted by decoding from earlier layers. Then, we show that whether the model divulges such content depends significantly on its perception of who it is talking to, which we refer to as user persona. In fact, we find manipulating user persona to be even more effective for eliciting harmful content than direct attempts to control model refusal. We study both natural language prompting and activation steering as control methods and show that activation steering is significantly more effective at bypassing safety filters. We investigate why certain personas break model safeguards and find that they enable the model to form more charitable interpretations of otherwise dangerous queries. Finally, we show we can predict a persona's effect on refusal given only the geometry of its steering vector.	Translated: 2024-06-19 23:47:35 Published: 2024-06-17
# DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features ( http://arxiv.org/abs/2406.12095v1 ) License: see link	Letian Wang, Seung Wook Kim, Jiawei Yang, Cunjun Yu, Boris Ivanovic, Steven L. Waslander, Yue Wang, Sanja Fidler, Marco Pavone, Peter Karkus,	We propose DistillNeRF, a self-supervised learning framework addressing the challenge of understanding 3D environments from limited 2D observations in autonomous driving. Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs, and is trained self-supervised with differentiable rendering to reconstruct RGB, depth, or feature images. Our first insight is to exploit per-scene optimized Neural Radiance Fields (NeRFs) by generating dense depth and virtual camera targets for training, thereby helping our model to learn 3D geometry from sparse non-overlapping image inputs. Second, to learn a semantically rich 3D representation, we propose distilling features from pre-trained 2D foundation models, such as CLIP or DINOv2, thereby enabling various downstream tasks without the need for costly 3D human annotations. To leverage these two insights, we introduce a novel model architecture with a two-stage lift-splat-shoot encoder and a parameterized sparse hierarchical voxel representation. Experimental results on the NuScenes dataset demonstrate that DistillNeRF significantly outperforms existing comparable self-supervised methods for scene reconstruction, novel view synthesis, and depth estimation; and it allows for competitive zero-shot 3D semantic occupancy prediction, as well as open-world scene understanding through distilled foundation model features. Demos and code will be available at https://distillnerf.github.io/.	Translated: 2024-06-19 23:47:35 Published: 2024-06-17
# 分布シフト下における軌道予測のための適応的不確実性定量化 Adaptive Uncertainty Quantification for Trajectory Prediction Under Distributional Shift ( http://arxiv.org/abs/2406.12100v1 ) ライセンス: Link先を確認	Huiqun Huang, Sihong He, Fei Miao,	(参考訳) 将来の有限軌跡とそれに伴う目標車両の不確実性の両方をオンライン環境で推測できる軌道予測モデル(例:現実世界のアプリケーションシナリオ)は、自律走行車の動きの安全で堅牢なナビゲーションと経路計画の確保に不可欠である。しかし,既存の軌道予測モデルの大部分は,トレーニング段階における不確実性を1つの目的として低減することや,潜在的分布シフト下での推論段階における確実な不確実性定量化を提供することは考えていない。そこで本研究では,既存の軌道予測モデルの予測軌道の不確かさを,予測精度の向上とトレーニング段階における予測不確かさの低減を考慮しながら定量的に定量化する,分散シフトフレームワークCUQDS(Conformal Uncertainty Quantification under Distribution Shift framework)を提案する。特にCUQDSは 1)学習に基づくガウス過程回帰モジュールで、ベースモデル(既存の軌道予測や時系列予測ニューラルネットワーク)の出力分布をモデル化し、損失項の追加による推定不確実性を低減する。 2) ガウス過程回帰モジュールから推定された不確かさを、トレーニングデータとテストデータ間の潜在的分散シフトの下でオンライン環境で校正する統計ベースのコンフォーマルP制御モジュール。 Trajectory prediction models that can infer both finite future trajectories and their associated uncertainties of the target vehicles in an online setting (e.g., real-world application scenarios) is crucial for ensuring the safe and robust navigation and path planning of autonomous vehicle motion. However, the majority of existing trajectory prediction models have neither considered reducing the uncertainty as one objective during the training stage nor provided reliable uncertainty quantification during inference stage under potential distribution shift. Therefore, in this paper, we propose the Conformal Uncertainty Quantification under Distribution Shift framework, CUQDS, to quantify the uncertainty of the predicted trajectories of existing trajectory prediction models under potential data distribution shift, while considering improving the prediction accuracy of the models and reducing the estimated uncertainty during the training stage. 
Specifically, CUQDS includes 1) a learning-based Gaussian process regression module that models the output distribution of the base model (any existing trajectory prediction or time series forecasting neural network) and reduces the estimated uncertainty via an additional loss term, and 2) a statistics-based Conformal P control module that calibrates the estimated uncertainty from the Gaussian process regression module in an online setting under potential distribution shift between training and testing data.	翻訳日:2024-06-19 23:47:35 公開日:2024-06-17
# 政策と実践の中心: 利用可能な差別的プライバシに関する研究ギャップ Centering Policy and Practice: Research Gaps around Usable Differential Privacy ( http://arxiv.org/abs/2406.12103v1 ) ライセンス: Link先を確認	Rachel Cummings, Jayshree Sarathy,	(参考訳) 数学的に厳格なフレームワークであり、豊富な理論文献を蓄積しているため、多くの専門家は差分プライバシーをプライバシー保護データ分析のゴールドスタンダードとみなしている。差分プライバシーは理論上はクリーンな定式化であるが、実際は重大な課題を生じさせると主張する者もいる。どちらの視点も、私たちの見解では、有効で重要なものです。差分プライバシーの約束と現実世界のユーザビリティのギャップを埋めるために、研究者と実践者は協力してこの技術の政策と実践を進めなければならない。本稿では,ユーザニーズに合わせてリスクフレームワークを開発すること,利害関係者のコミュニケーションを調整すること,プライバシロスパラメータの影響をモデル化すること,効果的なユーザインターフェースに投資すること,ディファレンシャルプライバシシステムのアルゴリズム的および手続き的監査を容易にすること,など,有用なディファレンシャルプライバシ構築に向けたオープンな質問を概説する。 As a mathematically rigorous framework that has amassed a rich theoretical literature, differential privacy is considered by many experts to be the gold standard for privacy-preserving data analysis. Others argue that while differential privacy is a clean formulation in theory, it poses significant challenges in practice. Both perspectives are, in our view, valid and important. To bridge the gaps between differential privacy's promises and its real-world usability, researchers and practitioners must work together to advance policy and practice of this technology. In this paper, we outline pressing open questions towards building usable differential privacy and offer recommendations for the field, such as developing risk frameworks to align with user needs, tailoring communications for different stakeholders, modeling the impact of privacy-loss parameters, investing in effective user interfaces, and facilitating algorithmic and procedural audits of differential privacy systems.	翻訳日:2024-06-19 23:47:35 公開日:2024-06-17
# 分析インテリジェンスエンジンにおけるエンドツーエンドのテキスト-SQL生成 End-to-end Text-to-SQL Generation within an Analytics Insight Engine ( http://arxiv.org/abs/2406.12104v1 ) ライセンス: Link先を確認	Karime Maamari, Amine Mhedhbi,	(参考訳) Text-to-SQLの最近の進歩は、データベース管理システムをデータアクセスのさらなる民主化へと押し進めている。今日の言語モデルは、これらの進歩の中核にある。 Distyl AIのAnalytics Insight Engineの開発で経験したように、言語モデルは印象的なText-to-SQL生成を可能にする。エンタープライズ顧客との初期の展開は、3つの重要な課題を浮き彫りにした。まず、データアナリストは、非常に複雑なSQLクエリのオーサリングのサポートを期待する。第二に、リクエストはアドホックであり、そのため低レイテンシを必要とする。最後に、生成にはドメイン固有の用語とプラクティスを理解する必要がある。大規模言語モデルを活用したText-to-SQL生成パイプラインの設計と実装は、これらの課題に対処する。このアプローチの中核となる原則は、事前処理フェーズで抽出した外部知識、クエリ生成時に適切な外部知識を取得すること、階層的なCTEベースの構造に従ってSQLクエリ生成を分解することに依存している。最後に、適応フレームワークはフィードバックを利用して外部知識を更新し、時間とともにクエリ生成を改善する。エンドツーエンドのアプローチの概要を説明し、推論中にSQLを生成するオペレータを強調する。 Recent advancements in Text-to-SQL have pushed database management systems towards greater democratization of data access. Today's language models are at the core of these advancements. They enable impressive Text-to-SQL generation as experienced in the development of Distyl AI's Analytics Insight Engine. Its early deployment with enterprise customers has highlighted three core challenges. First, data analysts expect support with authoring SQL queries of very high complexity. Second, requests are ad-hoc and, as such, require low latency. Finally, generation requires an understanding of domain-specific terminology and practices. The design and implementation of our Text-to-SQL generation pipeline, powered by large language models, tackles these challenges. The core tenets of our approach rely on external knowledge that we extract in a pre-processing phase, on retrieving the appropriate external knowledge at query generation time, and on decomposing SQL query generation following a hierarchical CTE-based structure. Finally, an adaptation framework leverages feedback to update the external knowledge, in turn improving query generation over time. 
We give an overview of our end-to-end approach and highlight the operators generating SQL during inference.	翻訳日:2024-06-19 23:47:35 公開日:2024-06-17
# モノリシック量子プロセッサ用22nm FDSOI CMOSにおける低温小型ミリ波広帯域SPSTスイッチ Cryogenic Compact mm-Wave Broadband SPST Switch in 22nm FDSOI CMOS for Monolithic Quantum Processors ( http://arxiv.org/abs/2406.12105v1 ) ライセンス: Link先を確認	T. D. Nhut, S. Bonen, G. Cooke, T. Jager, M. Spasaro, D. Sufra, S. P. Voinigescu, D. Zito,	(参考訳) 本稿では,22nm FDSOI CMOS技術を用いた小型ミリ波広帯域の単極単投(SPST)スイッチについて,低温での実験的特性評価を報告する。スイッチは、基板寄生効果を低減する特別なデバイスオプションを備えた2つのn-MOSFETと、アイソレーションを改善する第3のn-MOSFETで構成されている。従来の広帯域ミリ波スイッチとは異なり、大きな受動部品を必要とせず、非常にコンパクトな設計、低損失、高アイソレーション性能を実現している。 2 Kでの低温測定では、DCから70 GHzまでの全周波数範囲で、2.3 dB未満の挿入損失、25.3 dBを上回るアイソレーション、-11.5 dBより良好なリターンロスが示される。 This paper reports the experimental characterization at cryogenic temperature of a compact mm-wave broadband single-pole single-throw (SPST) switch in 22nm FDSOI CMOS technology. The switch consists of two n-MOSFETs with a special device option to reduce the substrate parasitic effects, and a third n-MOSFET to improve isolation. Unlike prior wideband mm-wave switches, it does not require any large passive components, allowing a very compact design, low loss and high isolation performance. The cryogenic measurements at 2 K show an insertion loss lower than 2.3 dB, an isolation better than 25.3 dB, and a return loss better than -11.5 dB, over the entire frequency range from DC to 70 GHz.	翻訳日:2024-06-19 23:37:51 公開日:2024-06-17
# 生命科学におけるコンピューティング - 初期のアルゴリズムから現代AIへ Computing in the Life Sciences: From Early Algorithms to Modern AI ( http://arxiv.org/abs/2406.12108v1 ) ライセンス: Link先を確認	Samuel A. Donkor, Matthew E. Walsh, Alexander J. Titus,	(参考訳) 生命科学におけるコンピューティングは、1950年代の初期の計算モデルから、現在見られる人工知能(AI)と機械学習(ML)の応用まで、変革的な進化を遂げてきた。本稿では,生命科学におけるコンピューティングの歴史的発展を通じて,重要なマイルストーンと技術進歩を強調した。この議論には、生物学的プロセスの計算モデルの導入、バイオインフォマティクスツールの出現、現代の生命科学研究におけるAI/MLの統合が含まれる。科学的な大規模言語モデルやバイオAIツールなど、生命科学で使用されるAI対応ツールに注意が向けられ、その能力、限界、生物学的リスクへの影響を調べる。本研究は,諸分野における情報的意思決定と効果的なコミュニケーションを確保するために,本質的な用語と概念を明確にし,確立することを目的とする。 Computing in the life sciences has undergone a transformative evolution, from early computational models in the 1950s to the applications of artificial intelligence (AI) and machine learning (ML) seen today. This paper highlights key milestones and technological advancements through the historical development of computing in the life sciences. The discussion includes the inception of computational models for biological processes, the advent of bioinformatics tools, and the integration of AI/ML in modern life sciences research. Attention is given to AI-enabled tools used in the life sciences, such as scientific large language models and bio-AI tools, examining their capabilities, limitations, and impact to biological risk. This paper seeks to clarify and establish essential terminology and concepts to ensure informed decision-making and effective communication across disciplines.	翻訳日:2024-06-19 23:37:51 公開日:2024-06-17
# LLMはソーシャルメディアからマクロ経済学的ナラティブを学べるか? Can LLMs Learn Macroeconomic Narratives from Social Media? ( http://arxiv.org/abs/2406.12109v1 ) ライセンス: Link先を確認	Almog Gueta, Amir Feder, Zorik Gekhman, Ariel Goldstein, Roi Reichart,	(参考訳) 本研究は、ナラティブ(急速に広まり、大衆の信念に影響を及ぼす考え)が経済変動に影響を与えうるとする Narrative Economics 仮説を実証的に検証する。我々は,X(旧Twitter)からの投稿を含み,経済関連のナラティブを捉えた2つのキュレートされたデータセットを紹介する(データは論文の採択後に共有される)。自然言語処理(NLP)手法を用いて,ツイートからナラティブを抽出し,要約する。我々は、ツイートまたは抽出したナラティブの表現を下流の金融予測タスクに組み込むことで、マクロ経済予測に対するそれらの予測力を検証する。我々の研究は、ナラティブデータを用いてマクロ経済モデルを改善する上での課題を強調し、研究コミュニティがこの重要な課題に現実的に対処する道を開く。学術的な観点から,我々の調査は,Large Language Models (LLMs) を用いたナラティブの抽出と要約のための貴重な洞察とNLPツールを提供し,経済学におけるナラティブの役割に関する今後の研究に寄与する。 This study empirically tests the Narrative Economics hypothesis, which posits that narratives (ideas that are spread virally and affect public beliefs) can influence economic fluctuations. We introduce two curated datasets containing posts from X (formerly Twitter) which capture economy-related narratives (Data will be shared upon paper acceptance). Employing Natural Language Processing (NLP) methods, we extract and summarize narratives from the tweets. We test their predictive power for macroeconomic forecasting by incorporating the tweets' or the extracted narratives' representations in downstream financial prediction tasks. Our work highlights the challenges in improving macroeconomic models with narrative data, paving the way for the research community to realistically address this important challenge. From a scientific perspective, our investigation offers valuable insights and NLP tools for narrative extraction and summarization using Large Language Models (LLMs), contributing to future research on the role of narratives in economics.	翻訳日:2024-06-19 23:37:50 公開日:2024-06-17
# Cancellable Memory Requests: 透過的で軽量なSpectre緩和 Cancellable Memory Requests: A transparent, lightweight Spectre mitigation ( http://arxiv.org/abs/2406.12110v1 ) ライセンス: Link先を確認	Hossam ElAtali, N. Asokan,	(参考訳) 推論はCPUのパフォーマンス向上に基本的だが、Spectre攻撃のような脆弱性を可能にする。これらの攻撃は一般的に3つのステップで展開される: 機密データ(アクセス)に投機的にアクセスし、キャッシュ状態(送信)を変更し、キャッシュタイミングアタック(例えば Flush+Reload)を使用してシークレット(受信)を抽出する。多くのSpectre攻撃は、送信および受信ステップ中にキャッシュタイミング側チャネルを利用する。我々のキーとなる観察は、誤予測が検出され、誤特定命令がスクアッシュされる前に、送信命令を完了させる必要がないことである。代わりに、命令がメモリ階層に要求を実行し、ディスパッチするのに十分である。スカッシング後にやってくるメモリからの応答は、誤って特定されたメモリアクセスに関連するものを含むキャッシュ状態を変化させる。そこで我々はCMR(Cancellable Memory Requests)という,不特定メモリ要求をキャンセルする新しい緩和手法を提案する。スキャッシングの直後に、キャンセルがキャッシュ階層に送信され、下流を伝播し、まだ応答を受けていないキャッシュの変更を防止する。これにより、キャッシュ状態が変更される可能性が低下し、Spectre攻撃が成功する可能性が低下する。 gem5 上で CMR を実装し,実際の Spectre 攻撃を阻止し,性能上のオーバーヘッドがほぼゼロに近いことを示す。我々は,現実的なシステム構成を持つ4つの実世界のプロセッサにおいて,CMRがSpectre攻撃を完全に阻止できることを示す。 Speculation is fundamental to achieving high CPU performance, yet it enables vulnerabilities such as Spectre attacks, which remain a significant challenge to mitigate without incurring substantial performance overheads. These attacks typically unfold in three steps: they speculatively access sensitive data (access), alter the cache state (transmit), and then utilize a cache timing attack (e.g., Flush+Reload) to extract the secret (receive). Most Spectre attacks exploit a cache timing side channel during the transmit and receive steps. Our key observation is that Spectre attacks do not require the transmit instruction to complete before mis-prediction is detected and mis-speculated instructions are squashed. Instead, it suffices for the instruction to execute and dispatch a request to the memory hierarchy. Responses from memory that arrive after squashing occurs still alter the cache state, including those related to mis-speculated memory accesses. We therefore propose a novel mitigation technique, Cancellable Memory Requests (CMR), that cancels mis-speculated memory requests. 
Immediately upon squashing, a cancellation is sent to the cache hierarchy, propagating downstream and preventing any changes to caches that have not yet received a response. This reduces the likelihood of cache state changes, thereby reducing the likelihood of Spectre attacks succeeding. We implement CMR on gem5 and show that it thwarts practical Spectre attacks, and has near-zero performance overheads. We show that CMR can completely thwart Spectre attacks in four real-world processors with realistic system configurations.	翻訳日:2024-06-19 23:37:50 公開日:2024-06-17
# グラフニューラルネットワークを用いた粗粒力場の熱力学的伝達性 Thermodynamic Transferability in Coarse-Grained Force Fields using Graph Neural Networks ( http://arxiv.org/abs/2406.12112v1 ) ライセンス: Link先を確認	Emily Shinkle, Aleksandra Pachalieva, Riti Bahl, Sakib Matin, Brendan Gifford, Galen T. Craven, Nicholas Lubbers,	(参考訳) 粗粒化(英: coarse-graining)とは、原子論的な系が単純化された方法で表現され、ターゲットの出力に寄与する最も重要な系の特徴を保ちながら、関連性の低い自由度を除去する分子モデリング技術である。このモデル複雑性の低減により、粗粒度分子シミュレーションは、対応する全原子モデルと比較して空間的および時間的スケールが増大する。粗粒化における中核的な課題は、原子レベルの特性を保持する方法で、新しい表現における相互作用を表現する力場を構築することである。粗大きめの力場を構築するための多くのアプローチは、特定の熱力学状態点における内部のゆらぎに対する平均化の結果、異なる熱力学条件の間での伝達性に制限がある。本稿では,階層的相互作用粒子ニューラルネットとテンソル感度(HIP-NN-TS)のグラフ畳み込みニューラルネットワークアーキテクチャを用いて,力マッチング手法に基づく粗粒度モデルの伝達可能性の研究を可能にする,粗粒度フィールドのための高度に自動化されたトレーニングパイプラインを開発する。このアプローチは高い精度の力場を得るだけでなく、これらの力場は様々な熱力学条件を通してより伝達可能であることを示す。これらの結果は、伝達可能な粗い力場の構築を改善するため、グラフニューラルネットワークのような機械学習技術の可能性を示している。 Coarse-graining is a molecular modeling technique in which an atomistic system is represented in a simplified fashion that retains the most significant system features that contribute to a target output, while removing the degrees of freedom that are less relevant. This reduction in model complexity allows coarse-grained molecular simulations to reach increased spatial and temporal scales compared to corresponding all-atom models. A core challenge in coarse-graining is to construct a force field that represents the interactions in the new representation in a way that preserves the atomistic-level properties. Many approaches to building coarse-grained force fields have limited transferability between different thermodynamic conditions as a result of averaging over internal fluctuations at a specific thermodynamic state point. 
Here, we use a graph-convolutional neural network architecture, the Hierarchically Interacting Particle Neural Network with Tensor Sensitivity (HIP-NN-TS), to develop a highly automated training pipeline for coarse-grained force fields, which allows for studying the transferability of coarse-grained models based on the force-matching approach. We show that this approach not only yields highly accurate force fields, but also that these force fields are more transferable through a variety of thermodynamic conditions. These results illustrate the potential of machine learning techniques such as graph neural networks to improve the construction of transferable coarse-grained force fields.	翻訳日:2024-06-19 23:37:50 公開日:2024-06-17
# LLM駆動型アクティブラーニングと人間アノテーションによるテキスト分類の強化 Enhancing Text Classification through LLM-Driven Active Learning and Human Annotation ( http://arxiv.org/abs/2406.12114v1 ) ライセンス: Link先を確認	Hamidreza Rouzegar, Masoud Makrehchi,	(参考訳) テキスト分類の文脈では、トレーニングデータを作成するためのアノテーション作業の金銭的負担が重要な問題である。アクティブラーニング技術、特に不確実性サンプリングに根ざした手法は、手動アノテーションにとって最も有益なサンプルを特定することで、コスト効率の良いソリューションを提供する。同様に、GPT-3.5のようなLarge Language Models (LLM) は自動アノテーションの代替手段を提供するが、その信頼性に関する懸念がある。本研究では,人間のアノテータとLLMをアクティブラーニングフレームワークに統合する新しい手法を提案する。 3つの公開データセット、すなわち感情分析のためのIMDB、真正性識別のためのFake Newsデータセット、マルチラベル分類のためのMovie Genresデータセットで評価を行った。提案フレームワークは、モデルの不確実性レベルに応じて、人間のアノテーションとLLMの出力を統合する。この戦略は、コスト効率と分類性能の最適バランスを達成する。実験結果は、モデル精度を維持または改善しつつ、データアノテーションに関連するコストが大幅に削減されることを示している。 In the context of text classification, the financial burden of annotation exercises for creating training data is a critical issue. Active learning techniques, particularly those rooted in uncertainty sampling, offer a cost-effective solution by pinpointing the most instructive samples for manual annotation. Similarly, Large Language Models (LLMs) such as GPT-3.5 provide an alternative for automated annotation but come with concerns regarding their reliability. This study introduces a novel methodology that integrates human annotators and LLMs within an Active Learning framework. We conducted evaluations on three public datasets: IMDB for sentiment analysis, a Fake News dataset for authenticity discernment, and a Movie Genres dataset for multi-label classification. The proposed framework integrates human annotation with the output of LLMs, depending on the model uncertainty levels. This strategy achieves an optimal balance between cost efficiency and classification performance. The empirical results show a substantial decrease in the costs associated with data annotation while either maintaining or improving model accuracy.	翻訳日:2024-06-19 23:37:50 公開日:2024-06-17
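The uncertainty-based routing described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract only says that human annotation and LLM output are combined "depending on the model uncertainty levels", so the entropy measure, the threshold, the rule "high uncertainty goes to a human", and the names `entropy`, `route`, and `select_and_route` are all assumptions made here for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def route(probs, threshold):
    """Assumed rule: uncertain samples go to a human, confident ones to the LLM."""
    return "human" if entropy(probs) > threshold else "llm"

def select_and_route(pool, budget, threshold):
    """Uncertainty sampling: pick the `budget` most uncertain items from the
    unlabeled pool, then decide who annotates each one.

    `pool` is a list of (item_id, class_probabilities) pairs produced by the
    current classifier (hypothetical input format).
    """
    ranked = sorted(pool, key=lambda item: entropy(item[1]), reverse=True)
    return [(item_id, route(probs, threshold)) for item_id, probs in ranked[:budget]]
```

In a full loop, the newly labeled items would be added to the training set and the classifier retrained before the next selection round; the threshold then acts as the knob that trades annotation cost against label reliability.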
# モノリシックシリコン量子プロセッサにおけるスピン量子制御のための低温小型低消費電力60GHz増幅器 Cryogenic Compact Low-Power 60GHz Amplifier for Spin Qubit Control in Monolithic Silicon Quantum Processors ( http://arxiv.org/abs/2406.12115v1 ) ライセンス: Link先を確認	M. Spasaro, S. Bonen, G. Cooke, T. Jager, T. D. Nhut, D. Sufra, S. P. Voinigescu, D. Zito,	(参考訳) 本稿では,モノリシックSi量子プロセッサのための基本構造ブロックとして,電子/ホールスピン量子ビット制御のための低温小型低消費電力60GHz増幅器の設計と評価を報告する。 2Kで試験され、15dBのS21を59 GHzで、BW3dBの52.5-67.5 GHzで、消費電力は2.16 mWである。インダクタレスアクティブネットワークを持つトポロジーのため、増幅器は0.18 x 0.19 mm2のコンパクトコア領域を有する。 This paper reports the design and experimental characterization of a cryogenic compact low-power 60GHz amplifier for control of electron/hole spin qubits, as elementary building block for monolithic Si quantum processors. Tested at 2 K, the amplifier exhibits S21 of 15 dB at 59 GHz, BW3dB of 52.5-67.5 GHz, and power consumption of 2.16 mW. Owing to the topology with inductorless active network, the amplifier has a compact core area of 0.18 x 0.19 mm2.	翻訳日:2024-06-19 23:37:50 公開日:2024-06-17
# Decoding the Narratives: Redditで共有された個人薬物体験の分析 Decoding the Narratives: Analyzing Personal Drug Experiences Shared on Reddit ( http://arxiv.org/abs/2406.12117v1 ) ライセンス: Link先を確認	Layla Bouzoubaa, Elham Aghakhani, Max Song, Minh Trinh, Rezvaneh Rezapour,	(参考訳) 薬物関連のサブレディット(英語版)のようなオンラインコミュニティは、薬物の使用経験、有害度低減、中毒回復に関する議論を促進するため、薬物使用者(PWUD)にとって安全な場所として機能している。これらのフォーラムで利用者が共有する物語は、物質使用障害(SUD)と回復可能性(Recovery potential)を発達させる可能性についての洞察を提供する。本研究は,物質利用経験に関するオンラインユーザ生成テキストを解析するための多レベル多ラベル分類モデルの構築を目的とする。この目的のために,我々はまず,意図した関係(問い合わせや開示),主題(例えば,回収,依存),特定の目的(例えば,再発,品質,安全)など,ポストの性質を評価する新しい分類法を導入する。注釈付きデータの集合上で様々なマルチラベル分類アルゴリズムを用いて、GPT-4が命令、定義、例によって誘導された場合、他の全てのモデルよりも優れていたことを示す。本モデルを用いて,1000以上の投稿をラベル付けし,各クラスにおける投稿内で使用される言語表現のカテゴリを解析する。本分析では, 安全性, 物質の組み合わせ, メンタルヘルスなどのトピックが開示されやすくなり, 生理的効果の議論は害軽減に焦点が当てられている。我々の研究は、PWUDの経験の理解を深め、SUDと薬物使用に関する幅広い知識基盤を伝えます。 Online communities such as drug-related subreddits serve as safe spaces for people who use drugs (PWUD), fostering discussions on substance use experiences, harm reduction, and addiction recovery. Users' shared narratives on these forums provide insights into the likelihood of developing a substance use disorder (SUD) and recovery potential. Our study aims to develop a multi-level, multi-label classification model to analyze online user-generated texts about substance use experiences. For this purpose, we first introduce a novel taxonomy to assess the nature of posts, including their intended connections (Inquisition or Disclosure), subjects (e.g., Recovery, Dependency), and specific objectives (e.g., Relapse, Quality, Safety). Using various multi-label classification algorithms on a set of annotated data, we show that GPT-4, when prompted with instructions, definitions, and examples, outperformed all other models. We apply this model to label an additional 1,000 posts and analyze the categories of linguistic expression used within posts in each class. 
Our analysis shows that topics such as Safety, Combination of Substances, and Mental Health see more disclosure, while discussions about physiological Effects focus on harm reduction. Our work enriches the understanding of PWUD's experiences and informs the broader knowledge base on SUD and drug use.	翻訳日:2024-06-19 23:37:50 公開日:2024-06-17
# 実世界の大規模交通ネットワークにおけるハリケーン避難時の効率的な管理のためのスケーラブルな交通予測モデルの展開 Deploying scalable traffic prediction models for efficient management in real-world large transportation networks during hurricane evacuations ( http://arxiv.org/abs/2406.12119v1 ) ライセンス: Link先を確認	Qinhua Jiang, Brian Yueshuai He, Changju Lee, Jiaqi Ma,	(参考訳) ハリケーン避難時の効果的な交通管理には正確な交通予測が不可欠である。本稿では,長期的な渋滞パターンと短期的な速度パターンの両方を捉えるために,MLP(Multilayer Perceptron)モデルとLSTM(Long Short-Term Memory)モデルを統合した予測モデリングシステムを提案する。蓄積された交通データ,時空間道路網情報,ハリケーン予測データなど,さまざまな入力変数を活用することで,不均一な人間行動,限られた避難データ,ハリケーンイベントの不確実性といった課題に対処する。ルイジアナ州の実世界の交通予測システムに展開されたこのモデルは、ハリケーンの影響を受けた7日間において、6時間先までの長期的な渋滞状態の予測で82%の精度を達成した。短期速度予測モデルは、1時間から6時間の予測ホライズンにわたり、7%から13%の平均絶対パーセンテージ誤差(MAPE)を示した。評価結果は、ハリケーン避難時の交通管理を強化するモデルの可能性を強調し、実際の展開は、広範な交通ネットワーク内の多様なハリケーンシナリオにおける適応性とスケーラビリティを強調している。 Accurate traffic prediction is vital for effective traffic management during hurricane evacuation. This paper proposes a predictive modeling system that integrates Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM) models to capture both long-term congestion patterns and short-term speed patterns. Leveraging various input variables, including archived traffic data, spatial-temporal road network information, and hurricane forecast data, the framework is designed to address challenges posed by heterogeneous human behaviors, limited evacuation data, and hurricane event uncertainties. Deployed in a real-world traffic prediction system in Louisiana, the model achieved 82% accuracy in predicting long-term congestion states over a 6-hour period during a 7-day hurricane-impacted duration. The short-term speed prediction model exhibited Mean Absolute Percentage Errors (MAPEs) ranging from 7% to 13% across evacuation horizons from 1 to 6 hours. Evaluation results underscore the model's potential to enhance traffic management during hurricane evacuations, and real-world deployment highlights its adaptability and scalability in diverse hurricane scenarios within extensive transportation networks.	翻訳日:2024-06-19 23:37:50 公開日:2024-06-17
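The abstract above reports short-term speed accuracy as Mean Absolute Percentage Error (MAPE). As a reminder of what that metric measures, here is a small sketch of the standard definition; the function name `mape` and the sample speeds in the usage note are made up for illustration and are not taken from the paper.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Standard definition: 100 * mean(|actual - predicted| / |actual|).
    Observations with an actual value of zero make the ratio undefined,
    so this sketch rejects them instead of silently skipping them.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

For example, observed speeds of 50 and 40 mph against predictions of 45 and 44 mph are each off by 10%, so `mape([50.0, 40.0], [45.0, 44.0])` comes out at about 10.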
# 強化学習を伴う拡散モデルへの条件制御の追加 Adding Conditional Control to Diffusion Models with Reinforcement Learning ( http://arxiv.org/abs/2406.12120v1 ) ライセンス: Link先を確認	Yulai Zhao, Masatoshi Uehara, Gabriele Scalia, Tommaso Biancalani, Sergey Levine, Ehsan Hajiramezanali,	(参考訳) 拡散モデルは、生成されたサンプルの特性を正確に制御できる強力な生成モデルである。大規模なデータセットでトレーニングされたこれらの拡散モデルは成功を収めているが、これらの強力なモデルを事前訓練済み拡散モデルとして扱い、下流の微調整プロセスで追加の制御を導入する必要があることが多い。本研究は、入力と対応するラベルからなるオフラインデータセットを活用し、強化学習(RL)に基づいて追加の制御を加える新たな手法を提案する。我々は、このタスクをRL問題として定式化し、オフラインデータセットから学習した分類器と、事前訓練済みモデルに対するKLダイバージェンスを報酬関数として用いる。我々は、上記の報酬関数を最大化するソフト最適ポリシーを生成する手法 CTRL(Conditioning pre-Trained diffusion models with Reinforcement Learning)を導入する。我々は,提案手法により、推論時に追加制御で条件付けた条件付き分布からサンプリングできることを形式的に示す。我々のRLベースのアプローチは、既存の方法よりもいくつかの利点を提供する。一般的な分類器フリーガイダンスと比較して,本手法はサンプル効率を向上し,入力と追加制御の条件付き独立性を利用してオフラインデータセット構築を大幅に単純化することができる。さらに、分類器ガイダンスとは異なり、中間状態から追加制御への分類器の訓練は不要である。 Diffusion models are powerful generative models that allow for precise control over the characteristics of the generated samples. While these diffusion models trained on large datasets have achieved success, there is often a need to introduce additional controls in downstream fine-tuning processes, treating these powerful models as pre-trained diffusion models. This work presents a novel method based on reinforcement learning (RL) to add additional controls, leveraging an offline dataset comprising inputs and corresponding labels. We formulate this task as an RL problem, with the classifier learned from the offline dataset and the KL divergence against pre-trained models serving as the reward functions. We introduce our method, CTRL (Conditioning pre-Trained diffusion models with Reinforcement Learning), which produces soft-optimal policies that maximize the abovementioned reward functions. 
We formally demonstrate that our method enables sampling from the conditional distribution conditioned on additional controls during inference. Our RL-based approach offers several advantages over existing methods. Compared to commonly used classifier-free guidance, our approach improves sample efficiency, and can greatly simplify offline dataset construction by exploiting conditional independence between the inputs and additional controls. Furthermore, unlike classifier guidance, we avoid the need to train classifiers from intermediate states to additional controls.	翻訳日:2024-06-19 23:37:50 公開日:2024-06-17
# TutteNet:2次元メッシュ変形の構成によるインジェクティブ3次元変形 TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations ( http://arxiv.org/abs/2406.12121v1 ) ライセンス: Link先を確認	Bo Sun, Thibault Groueix, Chen Song, Qixing Huang, Noam Aigerman,	(参考訳) 本研究は、3次元空間の射影変形の新たな表現法を提案する。これは、不正確さ、頑健さの欠如、一般学習および最適化フレームワークとの非互換性といった、既存の射影的手法の限界を克服するものである。中心となる考え方は、問題を複数の2Dメッシュベースのピースワイズ線形マップの深い構成に還元することである。すなわち、3次元体積の複雑な3次元インジェクティブ変形を生成するために、異なる平面上にこれらの層を構成する。提案手法は, 複雑な変形を効率よく, 正確に最適化し, 学習し, 他のインジェクティブアプローチよりも優れていることを示す。主な用途として、複雑で人工物のないNeRFおよびSDF変形を生成する。 This work proposes a novel representation of injective deformations of 3D space, which overcomes existing limitations of injective methods: inaccuracy, lack of robustness, and incompatibility with general learning and optimization frameworks. The core idea is to reduce the problem to a deep composition of multiple 2D mesh-based piecewise-linear maps. Namely, we build differentiable layers that produce mesh deformations through Tutte's embedding (guaranteed to be injective in 2D), and compose these layers over different planes to create complex 3D injective deformations of the 3D volume. We show our method provides the ability to efficiently and accurately optimize and learn complex deformations, outperforming other injective approaches. As a main application, we produce complex and artifact-free NeRF and SDF deformations.	翻訳日:2024-06-19 23:37:50 公開日:2024-06-17
# ChatEMG: ストロークのためのロボットハンドオーソシスを制御するための合成データ生成 ChatEMG: Synthetic Data Generation to Control a Robotic Hand Orthosis for Stroke ( http://arxiv.org/abs/2406.12123v1 ) ライセンス: Link先を確認	Jingxi Xu, Runsheng Wang, Siqi Shang, Ava Chen, Lauren Winterbottom, To-Liang Hsu, Wenxi Chen, Khondoker Ahmed, Pedro Leandro La Rotta, Xinyue Zhu, Dawn M. Nilsen, Joel Stein, Matei Ciocarlie,	(参考訳) 脳卒中患者の手指矯正におけるインテント・インフェラルは,障害者からのデータ収集が困難であるため困難である。さらに、EMG信号は、様々な条件、セッション、主題に有意な変化を示しており、分類器の一般化が困難である。従来のアプローチでは、新しい条件やセッション、あるいは列車意図の分類対象からの大きなラベル付きデータセットが必要ですが、このデータ収集プロセスは重荷と時間を要する。本稿では,自動回帰生成モデルであるChatEMGを提案する。 ChatEMGは、新しい状態、セッション、主題から小さなデータセットのみを収集し、この新しいコンテキストからのプロンプトで条件付けされた合成サンプルで拡張することを可能にする。 ChatEMGは、生成トレーニングを通じて以前のデータの巨大なリポジトリを活用すると同時に、プロンプトを通じてコンテキスト固有のままである。実験の結果,これらの合成標本は分類器に依存しず,異なるタイプの分類器に対する意図的推論精度を向上させることができることがわかった。機能的整形外科支援タスクの分類器の使用を含め,我々の完全アプローチを1つの患者セッションに統合できることを実証した。我々の知る限りでは、脳卒中生存者による整形機能制御のために、部分的に合成データに基づいて訓練された意図分類器が配備されたのはこれが初めてである。ビデオと追加情報はhttps://jxu.ai/chatemgで見ることができる。 Intent inferral on a hand orthosis for stroke patients is challenging due to the difficulty of data collection from impaired subjects. Additionally, EMG signals exhibit significant variations across different conditions, sessions, and subjects, making it hard for classifiers to generalize. Traditional approaches require a large labeled dataset from the new condition, session, or subject to train intent classifiers; however, this data collection process is burdensome and time-consuming. In this paper, we propose ChatEMG, an autoregressive generative model that can generate synthetic EMG signals conditioned on prompts (i.e., a given sequence of EMG signals). ChatEMG enables us to collect only a small dataset from the new condition, session, or subject and expand it with synthetic samples conditioned on prompts from this new context. ChatEMG leverages a vast repository of previous data via generative training while still remaining context-specific via prompting. 
Our experiments show that these synthetic samples are classifier-agnostic and can improve intent inferral accuracy for different types of classifiers. We demonstrate that our complete approach can be integrated into a single patient session, including the use of the classifier for functional orthosis-assisted tasks. To the best of our knowledge, this is the first time an intent classifier trained partially on synthetic data has been deployed for functional control of an orthosis by a stroke survivor. Videos and additional information can be found at https://jxu.ai/chatemg.	翻訳日:2024-06-19 23:37:50 公開日:2024-06-17
# 大規模言語モデルを用いた効率的な逐次意思決定 Efficient Sequential Decision Making with Large Language Models ( http://arxiv.org/abs/2406.12125v1 ) ライセンス: Link先を確認	Dingyang Chen, Qi Zhang, Yinglun Zhu,	(参考訳) 本稿では,大規模言語モデル(LLM)の成功を逐次意思決定に拡張することに焦点を当てる。既存の取り組みは,(i)意思決定のためにLLMを再訓練または微調整するか,(ii)事前訓練済みLLM向けのプロンプトを設計するかのいずれかである。前者のアプローチは勾配更新の計算負担に悩まされており、後者のアプローチは有望な結果を示していない。本稿では,LLMエージェントを逐次意思決定に効率的に組み込むために,オンラインモデル選択アルゴリズムを活用する新しい手法を提案する。統計的には,提案手法は従来の意思決定アルゴリズムとバニラLLMエージェントの双方を有意に上回る。計算面では,提案手法はLLMの高コストな勾配更新を回避し,意思決定プロセス全体を通じて少数のLLM呼び出ししか必要としない。提案手法の有効性を検証するため,広範囲な実験を行った。例えば、大規模なAmazonデータセットでは、提案手法はベースラインの6倍以上の性能向上を達成し、LLMを呼び出すのは時間ステップのわずか1.5%である。 This paper focuses on extending the success of large language models (LLMs) to sequential decision making. Existing efforts either (i) re-train or fine-tune LLMs for decision making, or (ii) design prompts for pretrained LLMs. The former approach suffers from the computational burden of gradient updates, and the latter approach does not show promising results. In this paper, we propose a new approach that leverages online model selection algorithms to efficiently incorporate LLM agents into sequential decision making. Statistically, our approach significantly outperforms both traditional decision making algorithms and vanilla LLM agents. Computationally, our approach avoids the need for expensive gradient updates of LLMs, and throughout the decision making process, it requires only a small number of LLM calls. We conduct extensive experiments to verify the effectiveness of our proposed approach. As an example, on a large-scale Amazon dataset, our approach achieves more than a 6x performance gain over baselines while calling LLMs in only 1.5% of the time steps.	翻訳日:2024-06-19 23:37:50 公開日:2024-06-17
# AIの「新しい」コンテンツファームは作りやすく、検出も難しい AI "News" Content Farms Are Easy to Make and Hard to Detect: A Case Study in Italian ( http://arxiv.org/abs/2406.12128v1 ) ライセンス: Link先を確認	Giovanni Puccetti, Anna Rogers, Chiara Alzetta, Felice Dell'Orletta, Andrea Esuli,	(参考訳) 大規模言語モデル (LLMs) は、実際のニュース記事に伝達可能な合成テキストを生成するために「コンテンツファーム」モデル (CFMs) として使われることが多い。高品質なモノリンガルLLMを持たない言語でも、これはすでに起こっています。 Llama (v1)は、主に英語で訓練され、40Kものイタリア語のニュース記事で、イタリア語の母語話者が合成語として識別するのに苦労するニュースのようなテキストを生成するのに十分であることを示す。我々は,3つのLCMと3つの合成テキスト(log-likelihood, DetectGPT, 教師付き分類)の検出方法を調査し,それらすべてが人間のレーダよりも優れていることを発見した。また、プロキシCFMを作成する可能性についても検討する。LLMは、実際の"コンテンツファーム"で使用されるものと類似したデータセットを微調整したものだ。検出を成功させるためには、少量の微調整データでも十分であることがわかったが、どのLLMが使われているのかを知る必要がある。以上の結果から,現在,合成ニュースのようなテキストを「野生」に検出する方法は存在せず,生成も容易すぎることが示唆された。この問題に関するNLP研究の緊急性を強調した。 Large Language Models (LLMs) are increasingly used as "content farm" models (CFMs), to generate synthetic text that could pass for real news articles. This is already happening even for languages that do not have high-quality monolingual LLMs. We show that fine-tuning Llama (v1), mostly trained on English, on as little as 40K Italian news articles, is sufficient for producing news-like texts that native speakers of Italian struggle to identify as synthetic. We investigate three LLMs and three methods of detecting synthetic texts (log-likelihood, DetectGPT, and supervised classification), finding that they all perform better than human raters, but they are all impractical in the real world (requiring either access to token likelihood information or a large dataset of CFM texts). We also explore the possibility of creating a proxy CFM: an LLM fine-tuned on a similar dataset to one used by the real "content farm". We find that even a small amount of fine-tuning data suffices for creating a successful detector, but we need to know which base LLM is used, which is a major challenge. 
Our results suggest that there are currently no practical methods for detecting synthetic news-like texts 'in the wild', while generating them is too easy. We highlight the urgency of more NLP research on this problem.	翻訳日:2024-06-19 23:37:50 公開日:2024-06-17
# 効率的な粒子保存ブリックウォール量子回路 Efficient particle-conserving brick-wall quantum circuits ( http://arxiv.org/abs/2406.12130v1 ) ライセンス: Link先を確認	Babatunde M. Ayeni,	(参考訳) 粒子保存量子回路を用いた変分量子最適化では、どの粒子保存ゲートと回路アンザッツが与えられた問題に対して最も効率的かを事前に決定することはしばしば困難である。これは特に限られたリソースを持つノイズの多い中間スケール量子(NISQ)プロセッサにとって重要である。これは一般には答えが難しいが、どの粒子保存ゲートが最も効率的かを決めることは、特定の回路アンザッツの中ではより容易である。本稿では,対称テンソルネットワークに由来する実用的なアイデアを用いて,効率的な粒子保存ゲートを構築する方法について述べる。一般化されたゲートを含む様々な種類の粒子保存ゲートを導出する。ブリックウォール回路の枠組みの下でゲートを数値的にテストする。 4つの実パラメータしか持たない一般粒子保存ゲートが概して最良であることを示す。さらに,2量子ビットの最近接ゲートを持つブリックウォール回路を非最近接ゲートへ拡張するアルゴリズムを提案する。次近接相互作用がある場合とない場合のハイゼンベルクスピン鎖を用いて、回路の効率を検証し比較する。 In variational quantum optimization with particle-conserving quantum circuits, it is often difficult to decide a priori which particle-conserving gates and circuit ansatzes would be most efficient for a given problem. This is important especially for noisy intermediate-scale quantum (NISQ) processors with limited resources. While this may be challenging to answer in general, deciding which particle-conserving gate would be most efficient is easier within a specified circuit ansatz. In this paper, we show how to construct efficient particle-conserving gates using some practical ideas from symmetric tensor networks. We derive different types of particle-conserving gates, including the generalized one. We numerically test the gates under the framework of brick-wall circuits. We show that the general particle-conserving gate with only four real parameters is generally best. In addition, we present an algorithm to extend brick-wall circuits with two-qubit nearest-neighbouring gates to non-nearest-neighbouring gates. We test and compare the efficiency of the circuits with the Heisenberg spin chain with and without next-nearest-neighbouring interactions.	翻訳日:2024-06-19 23:37:50 公開日:2024-06-17
# Gram2Vec: 解釈可能なドキュメントベクタ Gram2Vec: An Interpretable Document Vectorizer ( http://arxiv.org/abs/2406.12131v1 ) ライセンス: Link先を確認	Peter Zeng, Eric Sclafani, Owen Rambow,	(参考訳) テキスト中の文法的特徴の正規化相対周波数を抽出することにより,文書を高次元空間に埋め込む文法的スタイルの埋め込みアルゴリズムであるGram2Vecを提案する。ニューラルアプローチと比較して、Gram2Vecは、特徴ベクトルの生成方法に基づいた固有の解釈性を提供する。デモでは,Gram2Vecベクタをベースとした文書への著者のマッピングを視覚化し,どの著者が特定の言語的選択を行うかを確認するために,機能をドロップまたは追加する機能を強調した。次に、著者属性を用いて、Gram2Vecの機能ベクトル間のコサイン類似性を用いて、候補文書とクエリドキュメント間の距離を計算することにより、文書が特定の著者に帰属する理由を説明する。 We present Gram2Vec, a grammatical style embedding algorithm that embeds documents into a higher dimensional space by extracting the normalized relative frequencies of grammatical features present in the text. Compared to neural approaches, Gram2Vec offers inherent interpretability based on how the feature vectors are generated. In our demo, we present a way to visualize a mapping of authors to documents based on their Gram2Vec vectors and highlight the ability to drop or add features to view which authors make certain linguistic choices. Next, we use authorship attribution as an application to show how Gram2Vec can explain why a document is attributed to a certain author, using cosine similarities between the Gram2Vec feature vectors to calculate the distances between candidate documents and a query document.	翻訳日:2024-06-19 23:28:06 公開日:2024-06-17
# AIシステムのためのID IDs for AI Systems ( http://arxiv.org/abs/2406.12137v1 ) ライセンス: Link先を確認	Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, Markus Anderljung,	(参考訳) AIシステムはますます普及しているが、それらをどう扱うかを決めるのに必要な情報は存在せず、アクセスもできない。ユーザーは、システムが特定の安全基準を満たすかどうかを検証できないかもしれない。調査官は、システムがいつインシデントを発生させるか、誰が調査すべきかを知らないかもしれない。プラットフォームは、同じシステムとの繰り返しの負の相互作用をペナルティ化するのは難しいかもしれない。多くのドメインにおいて、IDは類似した問題に対処し、 \textit{particular}エンティティ(例えば、特定のボーイング747)を特定し、同じクラスの他のエンティティ(例えば、一部のボーイング747)に関する情報を提供する。本稿では,AIシステムの「textbf{instances}」にIDを付加するフレームワークを提案する。 AIシステムのIDを特徴付け、主要なアクターからのIDに対する大きな需要がある可能性を主張し、それらのアクターがIDの採用を動機付ける方法を分析し、フレームワークの実装の可能性を探り、制限とリスクを強調します。特定のアクター(例えば、AIシステムによる金融取引を可能にするアクター)が、ID使用のインセンティブを試すことができる。 AIシステムのデプロイは、ID実装の開発を試すことができる。さらなる研究により、IDはAIシステムが社会に浸透する世界を管理するのに役立つかもしれない。 AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system satisfies certain safety standards. An investigator may not know whom to investigate when a system causes an incident. A platform may find it difficult to penalize repeated negative interactions with the same system. Across a number of domains, IDs address analogous problems by identifying \textit{particular} entities (e.g., a particular Boeing 747) and providing information about other entities of the same class (e.g., some or all Boeing 747s). We propose a framework in which IDs are ascribed to \textbf{instances} of AI systems (e.g., a particular chat session with Claude 3), and associated information is accessible to parties seeking to interact with that system. We characterize IDs for AI systems, argue that there could be significant demand for IDs from key actors, analyze how those actors could incentivize ID adoption, explore potential implementations of our framework, and highlight limitations and risks. 
IDs seem most warranted in high-stakes settings, where certain actors (e.g., those that enable AI systems to make financial transactions) could experiment with incentives for ID use. Deployers of AI systems could experiment with developing ID implementations. With further study, IDs could help to manage a world where AI systems pervade society.	翻訳日:2024-06-19 23:28:06 公開日:2024-06-17
# テキスト埋め込みモデルにおけるバイアス Bias in Text Embedding Models ( http://arxiv.org/abs/2406.12138v1 ) ライセンス: Link先を確認	Vasyl Rakivnenko, Nestor Maslej, Jessica Cervi, Volodymyr Zhukov,	(参考訳) テキスト埋め込みは、特に企業の間では、ますますポピュラーなAI方法論になりつつあるが、テキスト埋め込みモデルのバイアスを負う可能性はよく理解されていない。本稿では,一般的なテキスト埋め込みモデルの選択が,特に性別次元に偏りがある程度について検討する。より具体的には、これらのモデルが与えられた職業のリストとジェンダー付き用語を関連づける程度について研究する。この分析によると、テキストの埋め込みモデルは男女差が多いが、様々な方法がある。例えば、看護師、ホームメイカー、社交界などの専門職と女性識別子、CEO、マネージャー、ボスといった職業と男性識別子の関連性があるが、全てのモデルが職業ごとに同じ性的な関連性を持つわけではない。さらに、バイアスの大きさと方向はモデル単位でも変化し、特定の単語モデルによっても異なる。本稿は,ジェンダーバイアスがテキスト埋め込みモデルに影響を及ぼすことを実証し,この問題の具体的側面に留意する必要があることを示唆する。 Text embedding is becoming an increasingly popular AI methodology, especially among businesses, yet the potential of text embedding models to be biased is not well understood. This paper examines the degree to which a selection of popular text embedding models are biased, particularly along gendered dimensions. More specifically, this paper studies the degree to which these models associate a list of given professions with gendered terms. The analysis reveals that text embedding models are prone to gendered biases but in varying ways. Although there are certain inter-model commonalities, for instance, greater association of professions like nurse, homemaker, and socialite with female identifiers, and greater association of professions like CEO, manager, and boss with male identifiers, not all models make the same gendered associations for each occupation. Furthermore, the magnitude and directionality of bias can also vary on a model-by-model basis and depend on the particular words models are prompted with. This paper demonstrates that gender bias afflicts text embedding models and suggests that businesses using this technology need to be mindful of the specific dimensions of this problem.	翻訳日:2024-06-19 23:28:06 公開日:2024-06-17
# COTフロー: コントラストペアによる最適トランスポート画像サンプリングと編集を学習する COT Flow: Learning Optimal-Transport Image Sampling and Editing by Contrastive Pairs ( http://arxiv.org/abs/2406.12140v1 ) ライセンス: Link先を確認	Xinrui Zu, Qian Tao,	(参考訳) 拡散モデルは,高次品質のマルチモーダルデータのサンプリング・編集において高い性能を示してきたが,計算コストが高く,遅い反復生成プロセスに悩まされている。さらに、ほとんどの手法はガウスノイズからデータを生成するよう制約されているため、サンプリングや編集の柔軟性が制限される。両欠点を克服するために,従来の拡散モデルと比較してゼロショット編集の柔軟性を向上し,高速かつ高品質な生成を実現する新しい手法であるContrastive Optimal Transport Flow (COT Flow)を提案する。最適トランスポート (OT) から恩恵を受けるため,本手法は事前分布に制限がなく,未ペア画像対像 (I2I) 変換が可能であり,編集可能な空間(軌道の始点と終端の両方)を他のゼロショット編集法と比較して2倍にすることができる。品質面では、COT Flowは従来の最先端のイメージ・ツー・イメージ(I2I)翻訳法と比較して1ステップで競合結果を生成することができる。 OTの導入によるCOT Flowの利点を強調するため,ユーザガイドによる編集を行うCOTエディタを導入する。コードは https://github.com/zuxinrui/cot_flow でリリースされる。 Diffusion models have demonstrated strong performance in sampling and editing multi-modal data with high generation quality, yet they suffer from the iterative generation process which is computationally expensive and slow. In addition, most methods are constrained to generate data from Gaussian noise, which limits their sampling and editing flexibility. To overcome both disadvantages, we present Contrastive Optimal Transport Flow (COT Flow), a new method that achieves fast and high-quality generation with improved zero-shot editing flexibility compared to previous diffusion models. Benefiting from optimal transport (OT), our method has no limitation on the prior distribution, enabling unpaired image-to-image (I2I) translation and doubling the editable space (at both the start and end of the trajectory) compared to other zero-shot editing methods. In terms of quality, COT Flow can generate competitive results in merely one step compared to previous state-of-the-art unpaired image-to-image (I2I) translation methods. To highlight the advantages of COT Flow through the introduction of OT, we introduce the COT Editor to perform user-guided editing with excellent flexibility and quality. 
The code will be released at https://github.com/zuxinrui/cot_flow.	翻訳日:2024-06-19 23:28:06 公開日:2024-06-17
# 音声理解のための多言語意味音声エンコーダの微調整のための二重タスク学習手法 A dual task learning approach to fine-tune a multilingual semantic speech encoder for Spoken Language Understanding ( http://arxiv.org/abs/2406.12141v1 ) ライセンス: Link先を確認	Gaëlle Laperrière, Sahar Ghannay, Bassam Jabaian, Yannick Estève,	(参考訳) 自己指導型学習は、音声言語理解のための発話を効率よく表現するために広く用いられ、従来のアプローチを徐々に置き換えている。一方、言語に依存しないセマンティクスを符号化するテキストSSLモデルが提案されている。 SAMU-XLSRフレームワークはこの意味情報を多言語音声表現の強化に用いた。最近の研究では、SAMU-XLSRのドメイン内セマンティックエンリッチメントについて、下流の転写に特化して検討し、挑戦的なSLUタスクにおける最先端の結果をもたらした。本研究の関心は、SLUを含まない近接言語におけるそのような特殊化によって引き起こされる多言語パフォーマンスの喪失と特異意味訓練の欠如にある。また,SAMU-XLSRの初期言語間能力の喪失は,SLUファインチューニングの分離によるものであると考えられた。そこで本稿では,多言語および言語可搬性実験のための遠隔言語を考察しながら,SAMU-XLSRセマンティックエンリッチメントを改善するための2つのタスク学習手法を提案する。 Self-Supervised Learning is vastly used to efficiently represent speech for Spoken Language Understanding, gradually replacing conventional approaches. Meanwhile, textual SSL models are proposed to encode language-agnostic semantics. SAMU-XLSR framework employed this semantic information to enrich multilingual speech representations. A recent study investigated SAMU-XLSR in-domain semantic enrichment by specializing it on downstream transcriptions, leading to state-of-the-art results on a challenging SLU task. This study's interest lies in the loss of multilingual performances and lack of specific-semantics training induced by such specialization in close languages without any SLU implication. We also consider SAMU-XLSR's loss of initial cross-lingual abilities due to a separate SLU fine-tuning. Therefore, this paper proposes a dual task learning approach to improve SAMU-XLSR semantic enrichment while considering distant languages for multilingual and language portability experiments.	翻訳日:2024-06-19 23:28:06 公開日:2024-06-17
# バイアススライシング:スライスディスカバリ法による医用画像解析におけるパフォーマンスギャップの説明 Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods ( http://arxiv.org/abs/2406.12142v1 ) ライセンス: Link先を確認	Vincent Olesen, Nina Weng, Aasa Feragen, Eike Petersen,	(参考訳) 機械学習モデルは、医用画像解析において高い総合的精度を達成した。しかし、特定の患者群におけるパフォーマンス格差は、その臨床的有用性、安全性、公平性に課題をもたらす。これは、性別、年齢、病気のサブタイプに基づく既知の患者グループや、以前は知られていなかった、ラベルのないグループに影響を及ぼす可能性がある。さらに、このような観察された性能格差の根本原因を明らかにすることはしばしば困難であり、緩和の努力を妨げる。本稿では,これらの問題に対処するために,Slice Discovery Methods (SDMs) を用いて,データの解釈可能な未処理部分集合を同定し,観測性能の相違の原因に関する仮説を定式化する。我々は,新しいSDMを導入し,胸部X線からの気胸と無気腫の分類のケーススタディに応用した。本研究は, 仮説定式化におけるSDMの有効性を実証し, 広く用いられている胸部X線データセットおよびモデルにおいて, 男女間の既往の相違について説明する。以上の結果から,胸部ドレインと心電図ワイヤーを併用し,両分類作業におけるショートカット学習について検討した。これらのショートカット特徴の有病率の性差は、ショートカット学習とモデルフェアネス分析の相違点として、観察された分類性能のギャップを引き起こすと考えられる。 Machine learning models have achieved high overall accuracy in medical image analysis. However, performance disparities on specific patient groups pose challenges to their clinical utility, safety, and fairness. This can affect known patient groups - such as those based on sex, age, or disease subtype - as well as previously unknown and unlabeled groups. Furthermore, the root cause of such observed performance disparities is often challenging to uncover, hindering mitigation efforts. In this paper, to address these issues, we leverage Slice Discovery Methods (SDMs) to identify interpretable underperforming subsets of data and formulate hypotheses regarding the cause of observed performance disparities. We introduce a novel SDM and apply it in a case study on the classification of pneumothorax and atelectasis from chest x-rays. Our study demonstrates the effectiveness of SDMs in hypothesis formulation and yields an explanation of previously observed but unexplained performance disparities between male and female patients in widely used chest X-ray datasets and models. 
Our findings indicate shortcut learning in both classification tasks, through the presence of chest drains and ECG wires, respectively. Sex-based differences in the prevalence of these shortcut features appear to cause the observed classification performance gap, representing a previously underappreciated interaction between shortcut learning and model fairness analyses.	翻訳日:2024-06-19 23:28:06 公開日:2024-06-17
# 量子リスク下におけるミニマックス線形回帰 Minimax Linear Regression under the Quantile Risk ( http://arxiv.org/abs/2406.12145v1 ) ライセンス: Link先を確認	Ayoub El Hanchi, Chris J. Maddison, Murat A. Erdogdu,	(参考訳) 量子リスク下での線形回帰におけるミニマックス法の設計問題について検討する。まず、任意のノイズレベルと入力の分布に対して、豊富な誤差関数の族に対して正確な極小量子化リスクを求め、OLSの極小性を確立する独立ガウス雑音による実現可能な設定を考える。これにより、平方誤差の特別な場合の既知の下界が改善され、より大きな分布集合よりも極小量子化リスクの低い境界が得られる。入力の分布に関する四分数誤差と四分数仮定の下では、この下限がより大きい問題に対して厳密であることが示される。具体的には、最近提案されたmin-max回帰法の変種における最悪の量子化リスクの上限が一致することを証明し、絶対定数までその最小値を確立する。この結果を、$p \in (2, \infty)$に対するすべての$p$-thパワーエラー関数に拡張することで、我々のアプローチの有用性を説明する。その過程で、古典的ベイズ法に類似した、量子化リスクを扱う際のミニマックスリスクを低く抑える方法や、サンプル共分散行列の最小固有値の量子化の厳密なキャラクタリゼーションを開発する。 We study the problem of designing minimax procedures in linear regression under the quantile risk. We start by considering the realizable setting with independent Gaussian noise, where for any given noise level and distribution of inputs, we obtain the exact minimax quantile risk for a rich family of error functions and establish the minimaxity of OLS. This improves on the known lower bounds for the special case of square error, and provides us with a lower bound on the minimax quantile risk over larger sets of distributions. Under the square error and a fourth moment assumption on the distribution of inputs, we show that this lower bound is tight over a larger class of problems. Specifically, we prove a matching upper bound on the worst-case quantile risk of a variant of the recently proposed min-max regression procedure, thereby establishing its minimaxity, up to absolute constants. We illustrate the usefulness of our approach by extending this result to all $p$-th power error functions for $p \in (2, \infty)$. Along the way, we develop a generic analogue to the classical Bayesian method for lower bounding the minimax risk when working with the quantile risk, as well as a tight characterization of the quantiles of the smallest eigenvalue of the sample covariance matrix.	翻訳日:2024-06-19 23:28:06 公開日:2024-06-17
# AIはコードを最適化すべきか? 現在の大規模言語モデルと古典的最適化コンパイラの比較研究 Should AI Optimize Your Code? A Comparative Study of Current Large Language Models Versus Classical Optimizing Compilers ( http://arxiv.org/abs/2406.12146v1 ) ライセンス: Link先を確認	Miguel Romero Rosas, Miguel Torres Sanchez, Rudolf Eigenmann,	(参考訳) 現代のコンピュータアーキテクチャの状況では、効率的な並列プログラミングの需要は持続し、堅牢な最適化技術を必要としている。従来の最適化コンパイラはこの取り組みにおいて歴史的に重要な役割を担い、現代のソフトウェアシステムの複雑さの進化に適応してきた。大規模言語モデル(LLM)の出現は、コード最適化方法論に革命をもたらすAI駆動アプローチの可能性に関する興味深い疑問を提起する。本稿では、GPT-4.0とCodeLlama-70Bの2つの最先端大言語モデルと従来の最適化コンパイラの比較分析を行い、最適化の能力と限界を最大効率のために評価する。さらに,これらのツールが生成するコードのパフォーマンスと正確性を評価するための,難易度の高い最適化パターンと自動メカニズムのベンチマークスイートも導入する。思考の連鎖(CoT)とインストラクション・プロンプト(IP)の2つの異なるプロンプト手法を用いてLCMの性能を評価した。次に、これらの結果をCETUS、PLUTO、ROSEの3つの従来の最適化コンパイラと比較した。重要な発見は、LLMが現在の最適化コンパイラを上回る性能を持つ一方で、大規模なコードサイズで間違ったコードを生成し、自動検証メソッドを呼び出すことがしばしばあることである。 3つのベンチマークスイートで広範囲に評価したところ、CodeLlama-70Bは2.1倍のスピードアップを達成できる2つのLLMの中で、優れたオプティマイザであることがわかった。さらに、CETUSは最適化コンパイラの中でも最高であり、最大1.9倍のスピードアップを実現している。また,思考の連鎖 (Cot) とインストラクション・プロンプト (IP) の2つの方法の間に有意な差は認められなかった。 In the contemporary landscape of computer architecture, the demand for efficient parallel programming persists, needing robust optimization techniques. Traditional optimizing compilers have historically been pivotal in this endeavor, adapting to the evolving complexities of modern software systems. The emergence of Large Language Models (LLMs) raises intriguing questions about the potential for AI-driven approaches to revolutionize code optimization methodologies. This paper presents a comparative analysis between two state-of-the-art Large Language Models, GPT-4.0 and CodeLlama-70B, and traditional optimizing compilers, assessing their respective abilities and limitations in optimizing code for maximum efficiency. Additionally, we introduce a benchmark suite of challenging optimization patterns and an automatic mechanism for evaluating performance and correctness of the code generated by such tools. 
We used two different prompting methodologies to assess the performance of the LLMs -- Chain of Thought (CoT) and Instruction Prompting (IP). We then compared these results with three traditional optimizing compilers, CETUS, PLUTO and ROSE, across a range of real-world use cases. A key finding is that while LLMs have the potential to outperform current optimizing compilers, they often generate incorrect code on large code sizes, calling for automated verification methods. Our extensive evaluation across three different benchmark suites shows CodeLlama-70B to be the better optimizer of the two LLMs, capable of achieving speedups of up to 2.1x. Additionally, CETUS is the best among the optimizing compilers, achieving a maximum speedup of 1.9x. We also found no significant difference between the two prompting methods: Chain of Thought (CoT) and Instruction Prompting (IP).	翻訳日:2024-06-19 23:28:06 公開日:2024-06-17
# メタ認知AI:その枠組みとニューロシンボリックアプローチの事例 Metacognitive AI: Framework and the Case for a Neurosymbolic Approach ( http://arxiv.org/abs/2406.12147v1 ) ライセンス: Link先を確認	Hua Wei, Paulo Shakarian, Christian Lebiere, Bruce Draper, Nikhil Krishnaswamy, Sergei Nirenburg,	(参考訳) メタ認知は、エージェントの内部過程を推論する概念であり、もともとは発達心理学の分野で導入された。本稿では,メタ認知を人工知能に適用する概念について考察する。我々は、TRAPと呼ばれるメタ認知人工知能(AI)を理解するための枠組みを導入する。我々は、これらの局面のそれぞれについて議論し、メタ認知の課題に対処するために、ニューロシンボリックAI(NSAI)をどのように活用できるかを探求する。 Metacognition is the concept of reasoning about an agent's own internal processes and was originally introduced in the field of developmental psychology. In this position paper, we examine the concept of applying metacognition to artificial intelligence. We introduce a framework for understanding metacognitive artificial intelligence (AI) that we call TRAP: transparency, reasoning, adaptation, and perception. We discuss each of these aspects in-turn and explore how neurosymbolic AI (NSAI) can be leveraged to address challenges of metacognition.	翻訳日:2024-06-19 23:28:06 公開日:2024-06-17
# カオスマイニング:低SNR環境における局所帰属手法の評価ベンチマーク ChaosMining: A Benchmark to Evaluate Post-Hoc Local Attribution Methods in Low SNR Environments ( http://arxiv.org/abs/2406.12150v1 ) ライセンス: Link先を確認	Ge Shi, Ziwen Kan, Jason Smucny, Ian Davidson,	(参考訳) 本研究では,実世界の機械学習応用で一般的なシナリオである低信号-雑音比(SNR)を特徴とする領域において,非関連領域から予測力のある特徴を識別するためのポストホック局所帰属法の有効性を検討する。我々は, 記号関数, 画像, 音声データを含む合成データセットを開発し, (Model $\times$ Attribution $\times$ Noise Condition) 三重項のベンチマークを組み込んだ。スクラッチから訓練された様々な古典的モデルを厳格にテストすることにより、これらの属性手法の性能について、複数の条件下での貴重な洞察を得た。これらの知見に基づき、ニューラルネットワークの適用性を高めるために、顕著な再帰的特徴除去(RFE)アルゴリズムの新たな拡張を導入する。我々の実験では、スケーラビリティの制限とともに、予測と特徴選択の長所を強調しています。付録にはさらなる詳細と追加のマイナーな発見が含まれており、広範な議論が交わされている。コードとリソースは https://github.com/geshijoker/ChaosMining/ で入手できる。 In this study, we examine the efficacy of post-hoc local attribution methods in identifying features with predictive power from irrelevant ones in domains characterized by a low signal-to-noise ratio (SNR), a common scenario in real-world machine learning applications. We developed synthetic datasets encompassing symbolic functional, image, and audio data, incorporating a benchmark on the (Model $\times$ Attribution $\times$ Noise Condition) triplet. By rigorously testing various classic models trained from scratch, we gained valuable insights into the performance of these attribution methods in multiple conditions. Based on these findings, we introduce a novel extension to the notable recursive feature elimination (RFE) algorithm, enhancing its applicability for neural networks. Our experiments highlight its strengths in prediction and feature selection, alongside limitations in scalability. Further details and additional minor findings are included in the appendix, with extensive discussions. The codes and resources are available at https://github.com/geshijoker/ChaosMining/.	翻訳日:2024-06-19 23:28:06 公開日:2024-06-17
# 非負のニューラルネットワークの固定点 Fixed points of nonnegative neural networks ( http://arxiv.org/abs/2106.16239v9 ) ライセンス: Link先を確認	Tomasz J. Piotrowski, Renato L. G. Cavalcante, Mateusz Gabor,	(参考訳) 非負のベクトルを非負のベクトルにマッピングするニューラルネットワークとして定義する、非負のニューラルネットワークの解析に固定点理論を用いる。まず、非負の重みとバイアスを持つ非負のニューラルネットワークは、非線形ペロン・フロベニウス理論の枠組みの中で単調かつ(弱く)スケーラブルな写像として認識できることを示す。この事実により、同じ次元の入力と出力を持つ非負のニューラルネットワークの固定点の存在条件を提供することができ、これらの条件は凸解析の引数を用いて最近得られた条件よりも弱い。さらに、非負の重みとバイアスを持つ非負のニューラルネットワークの固定点集合の形状が間隔であり、穏やかな条件下では点に縮退することを示した。これらの結果は、より一般的な非負のニューラルネットワークの固定点の存在を得るために用いられる。実用的観点からは, オートエンコーダの挙動の理解に寄与し, 深層平衡モデルにおける今後の発展に有用な数学機械も提供する。 We use fixed point theory to analyze nonnegative neural networks, which we define as neural networks that map nonnegative vectors to nonnegative vectors. We first show that nonnegative neural networks with nonnegative weights and biases can be recognized as monotonic and (weakly) scalable mappings within the framework of nonlinear Perron-Frobenius theory. This fact enables us to provide conditions for the existence of fixed points of nonnegative neural networks having inputs and outputs of the same dimension, and these conditions are weaker than those recently obtained using arguments in convex analysis. Furthermore, we prove that the shape of the fixed point set of nonnegative neural networks with nonnegative weights and biases is an interval, which under mild conditions degenerates to a point. These results are then used to obtain the existence of fixed points of more general nonnegative neural networks. From a practical perspective, our results contribute to the understanding of the behavior of autoencoders, and we also offer valuable mathematical machinery for future developments in deep equilibrium models.	翻訳日:2024-06-19 13:36:52 公開日:2024-06-17
# 密度推定に基づくクラスタリング評価のための新しい指標 A New Index for Clustering Evaluation Based on Density Estimation ( http://arxiv.org/abs/2207.01294v4 ) ライセンス: Link先を確認	Gangli Liu,	(参考訳) クラスタリングの内部評価のための新しい指標が導入された。インデックスは2つのサブインデックスの混合として定義される。最初のサブインデックス $ I_a $ は Ambiguous Index と呼ばれ、2番目のサブインデックス $ I_s $ は Similarity Index と呼ばれる。 2つのサブインデックスの計算は、データのパーティションの各クラスタに対する密度推定に基づいている。新しいインデックスの性能をテストする実験を行い、145データセットのセット上で、Calinski-Harabasz指数、Silhouette係数、Davies-Bouldin指数、CDbw、DBCV、VIASCKDEの6つの内部クラスタリング評価指標と比較した。その結果、新たな指標は、他の内部クラスタリング評価指標を大幅に上回ることが示された。 A new index for internal evaluation of clustering is introduced. The index is defined as a mixture of two sub-indices. The first sub-index $ I_a $ is called the Ambiguous Index; the second sub-index $ I_s $ is called the Similarity Index. Calculation of the two sub-indices is based on density estimation for each cluster of a partition of the data. An experiment is conducted to test the performance of the new index and to compare it with six other internal clustering evaluation indices -- Calinski-Harabasz index, Silhouette coefficient, Davies-Bouldin index, CDbw, DBCV, and VIASCKDE -- on a set of 145 datasets. The results show that the new index significantly improves on the other internal clustering evaluation indices.	翻訳日:2024-06-19 13:29:49 公開日:2024-06-17
# 教師なし人物Re-IDのためのドメインカメラ適応と協調多重特徴クラスタリング Domain Camera Adaptation and Collaborative Multiple Feature Clustering for Unsupervised Person Re-ID ( http://arxiv.org/abs/2208.08624v2 ) ライセンス: Link先を確認	Yuanpeng Tu,	(参考訳) 最近、制限付きアノテートデータが利用可能なオープンワールドのシナリオ設定のために、教師なしの人物再識別(re-ID)が注目されている。既存の教師なし手法は、しばしば目に見えない領域でうまく一般化できないが、教師なし手法は、主に複数の粒度情報がなく、確認バイアスに悩まされる傾向がある。本稿では,2つの側面から未確認対象領域のより優れた特徴表現を求める。 1) ラベル付きソースドメインに教師なしのドメインアダプティブを実行し、 2)未ラベル対象ドメインにおけるマイニング可能性の類似性。また、確認バイアスの影響を軽減するために、協調的な擬似再ラベル戦略を提案する。まず、生成対向ネットワークを利用して、ソースドメインからターゲットドメインへの画像転送を行う。さらに、生成した画像の品質を向上させるために、個人識別の保存とアイデンティティマッピングの損失を導入する。第2に,大域的特徴や部分的特徴分岐を含む対象領域の内部データ構造を学習するための協調的多機能クラスタリングフレームワーク(CMFC)を提案する。グローバル機能ブランチ(GB)は、人画像のグローバル機能に教師なしクラスタリングを採用し、部分機能ブランチ(PB)は、異なる身体領域内で類似性をマイニングする。最後に、2つのベンチマークデータセットに対する広範な実験により、教師なしの人物再ID設定下での手法の競合性能を示す。 Recently unsupervised person re-identification (re-ID) has drawn much attention due to its open-world scenario settings where limited annotated data is available. Existing supervised methods often fail to generalize well on unseen domains, while the unsupervised methods, mostly lack multi-granularity information and are prone to suffer from confirmation bias. In this paper, we aim at finding better feature representations on the unseen target domain from two aspects, 1) performing unsupervised domain adaptation on the labeled source domain and 2) mining potential similarities on the unlabeled target domain. Besides, a collaborative pseudo re-labeling strategy is proposed to alleviate the influence of confirmation bias. Firstly, a generative adversarial network is utilized to transfer images from the source domain to the target domain. Moreover, person identity preserving and identity mapping losses are introduced to improve the quality of generated images. Secondly, we propose a novel collaborative multiple feature clustering framework (CMFC) to learn the internal data structure of target domain, including global feature and partial feature branches. 
The global feature branch (GB) employs unsupervised clustering on the global features of person images, while the partial feature branch (PB) mines similarities within different body regions. Finally, extensive experiments on two benchmark datasets show the competitive performance of our method under unsupervised person re-ID settings.	翻訳日:2024-06-19 13:29:49 公開日:2024-06-17
# 樹木のトラバーサル再考 Rethink Tree Traversal ( http://arxiv.org/abs/2209.04825v5 ) ライセンス: Link先を確認	Jinxiong Zhang,	(参考訳) 本稿では,行列計算の言語における二分決定木トラバーサルの実装方法について述べる。我々の主な貢献は、決定木の階層構造の新しい行列表現に基づく二分木トラバーサルの等価アルゴリズムを提案することである。私たちのキーとなるアイデアは、内部積探索の最大化によるバイナリ決定ツリーの移動です。我々は、再帰的トラバースのない決定木メソッドを実装するだけでなく、木ベースのメソッドのパーティショニングの性質を掘り下げる。 We will show how to implement binary decision tree traversal in the language of matrix computation. Our main contribution is to propose some equivalent algorithms of binary tree traversal based on a novel matrix representation of the hierarchical structure of the decision tree. Our key idea is to travel the binary decision tree by maximum inner product search. We not only implement decision tree methods without the recursive traverse but also delve into the partitioning nature of tree-based methods.	翻訳日:2024-06-19 13:29:49 公開日:2024-06-17
# 生涯学習対話システム Lifelong and Continual Learning Dialogue Systems ( http://arxiv.org/abs/2211.06553v2 ) ライセンス: Link先を確認	Sahisnu Mazumder, Bing Liu,	(参考訳) チャットボットとして知られる対話システムは、ユーザとのチャット会話やタスク指向の対話で様々なタスクをこなすために広く普及しているため、近年で普及している。既存のチャットボットは通常、事前にコンパイルされたデータや手動でラベル付けされたデータからトレーニングされる。多くの人が手動でコンパイルされた知識ベース(KB)を使用している。自然言語を理解する能力はまだ限られており、多くのエラーが発生する傾向にあり、ユーザ満足度は低い。通常、よりラベル付きデータとより手作業でコンパイルされた知識を持つエンジニアによって継続的に改善される必要がある。本書では,チャットボットがユーザや作業環境との自己開始型対話を通じて,自分自身で継続的に学習する能力を実現するための,生涯学習対話システムの新たなパラダイムを紹介する。システムがユーザとチャットしたり、外部ソースからより多くのことを学ぶようになると、会話の知識が増し、より良くなる。本書では,ユーザからの会話中,外部ソースからの会話中,新たな言語表現や語彙的,事実的知識を継続的に学習し,会話中に新たなトレーニング例を取得し,会話スキルを習得する,このような継続的な学習対話システムを構築するための最新の開発技術について紹介する。これらの一般的な話題とは別に、対話システムの特定の側面の継続的な学習に関する既存の研究も調査されている。この本は、将来の研究におけるオープンな課題に関する議論で締めくくられている。 Dialogue systems, commonly known as chatbots, have gained escalating popularity in recent times due to their wide-spread applications in carrying out chit-chat conversations with users and task-oriented dialogues to accomplish various user tasks. Existing chatbots are usually trained from pre-collected and manually-labeled data and/or written with handcrafted rules. Many also use manually-compiled knowledge bases (KBs). Their ability to understand natural language is still limited, and they tend to produce many errors resulting in poor user satisfaction. Typically, they need to be constantly improved by engineers with more labeled data and more manually compiled knowledge. This book introduces the new paradigm of lifelong learning dialogue systems to endow chatbots the ability to learn continually by themselves through their own self-initiated interactions with their users and working environments to improve themselves. As the systems chat more and more with users or learn more and more from external sources, they become more and more knowledgeable and better and better at conversing. 
The book presents the latest developments and techniques for building such continual learning dialogue systems that continuously learn new language expressions and lexical and factual knowledge during conversation from users and off conversation from external sources, acquire new training examples during conversation, and learn conversational skills. Apart from these general topics, existing works on continual learning of some specific aspects of dialogue systems are also surveyed. The book concludes with a discussion of open challenges for future research.	翻訳日:2024-06-19 13:29:49 公開日:2024-06-17
# 最適化ユニタリ結合クラスタアンサッツを用いた実験的量子計算化学 Experimental quantum computational chemistry with optimised unitary coupled cluster ansatz ( http://arxiv.org/abs/2212.08006v3 ) ライセンス: Link先を確認	Shaojun Guo, Jinzhao Sun, Haoran Qian, Ming Gong, Yukun Zhang, Fusheng Chen, Yangsen Ye, Yulin Wu, Sirui Cao, Kun Liu, Chen Zha, Chong Ying, Qingling Zhu, He-Liang Huang, Youwei Zhao, Shaowei Li, Shiyu Wang, Jiale Yu, Daojin Fan, Dachao Wu, Hong Su, Hui Deng, Hao Rong, Yuan Li, Kaili Zhang, Tung-Hsun Chung, Futian Liang, Jin Lin, Yu Xu, Lihua Sun, Cheng Guo, Na Li, Yong-Heng Huo, Cheng-Zhi Peng, Chao-Yang Lu, Xiao Yuan, Xiaobo Zhu, Jian-Wei Pan,	(参考訳) 量子計算化学は量子コンピューティングの重要な応用として登場した。変分量子固有解法(VQE)のようなハイブリッド量子古典計算法は、量子化学問題の有望な解法として設計されているが、理論的な複雑さと実験上の不完全性により、信頼性と正確な結果が得られない。電子構造を解くための実験は、いまだに計算不能(ハードウエア効率)または古典的にシミュレート可能な(Hartree-Fock)アンサッツに制限されている。スケーラブルで高精度な量子化学シミュレーションの実験的実現はいまだ解明されていない。ここでは、ノイズ量子プロセッサを用いて分子電子構造を解くことに伴う重要な課題に対処する。本プロトコルは, 回路深度, 走行時間, 化学シミュレーションの指標を著しく改善する。ハードウェアの体系的な拡張とエラー軽減技術の統合を通じて、実験的な量子計算化学の限界を推し進め、最適化されたユニタリ結合クラスタ・アザッツを12キュービットに拡大してVQEの実装を成功させた。誤差抑制分子の基底状態エネルギーの高精度な計算結果を2桁程度精度で生成する。すべての結合距離におけるH$_2$の化学的精度と実験中の小さな結合距離におけるLiHの化学的精度は、最近の2つの同時処理を超えても達成される。我々の研究は、電子構造計算におけるスケーラブルなソリューションへの実現可能な道筋を示し、重要な技術的特徴を検証し、この目標の今後の課題を特定する。 Quantum computational chemistry has emerged as an important application of quantum computing. Hybrid quantum-classical computing methods, such as variational quantum eigensolvers (VQE), have been designed as promising solutions to quantum chemistry problems, yet challenges due to theoretical complexity and experimental imperfections hinder progress in achieving reliable and accurate results. Experimental works for solving electronic structures are consequently still restricted to nonscalable (hardware efficient) or classically simulable (Hartree-Fock) ansatz, or limited to a few qubits with large errors. The experimental realisation of scalable and high-precision quantum chemistry simulation remains elusive. 
Here, we address the critical challenges associated with solving molecular electronic structures using noisy quantum processors. Our protocol presents significant improvements in the circuit depth and running time, key metrics for chemistry simulation. Through systematic hardware enhancements and the integration of error mitigation techniques, we push forward the limit of experimental quantum computational chemistry and successfully scale up the implementation of VQE with an optimised unitary coupled-cluster ansatz to 12 qubits. We produce high-precision results of the ground-state energy for molecules with error suppression by around two orders of magnitude. We achieve chemical accuracy for H$_2$ at all bond distances and LiH at small bond distances in the experiment, even beyond the two recent concurrent works. Our work demonstrates a feasible path towards a scalable solution to electronic structure calculation, validating the key technological features and identifying future challenges for this goal.	翻訳日:2024-06-19 13:20:03 公開日:2024-06-17
# SeaFormer++: モバイル視覚認識のためのスキーズ強化軸変換器 SeaFormer++: Squeeze-enhanced Axial Transformer for Mobile Visual Recognition ( http://arxiv.org/abs/2301.13156v5 ) ライセンス: Link先を確認	Qiang Wan, Zilong Huang, Jiachen Lu, Gang Yu, Li Zhang,	(参考訳) ビジョントランスフォーマーの導入以来、CNNが圧倒的に支配してきた多くのコンピュータビジョンタスク(例えばセマンティックセグメンテーション)のランドスケープは、近年大きく革新されている。しかし、計算コストとメモリ要求により、これらの手法はモバイルデバイスには適さない。本稿では,モバイル視覚認識のための圧縮強化軸変換器(SeaFormer)を提案する。具体的には、圧縮軸の定式化と詳細強化を特徴とする一般的な注意ブロックを設計する。さらにコスト効率のよいバックボーンアーキテクチャのファミリを作成するためにも使用できる。光セグメンテーションヘッドと組み合わせることで、ADE20K、Cityscapes、Pascal Context、COCO-Stuffデータセット上のARMベースのモバイルデバイス上で、セグメンテーション精度とレイテンシの最良のトレードオフを実現する。重要なことは、モバイルフレンドリーなライバルとTransformerベースのライバルの両方を、ベルやホイッスルを使わずに、より高い性能とより低いレイテンシで上回ったことである。さらに,機能アップサンプリングに基づくマルチレゾリューション蒸留技術を導入し,提案フレームワークの推論遅延を低減した。セマンティックセグメンテーション以外にも、提案するSeaFormerアーキテクチャを画像分類やオブジェクト検出問題に適用し、モバイルフレンドリーなバックボーンとして機能する可能性を示す。私たちのコードとモデルは https://github.com/fudan-zvg/SeaFormer で公開されています。 Since the introduction of Vision Transformers, the landscape of many computer vision tasks (e.g., semantic segmentation), which has been overwhelmingly dominated by CNNs, has recently been revolutionized. However, the computational cost and memory requirements render these methods unsuitable for mobile devices. In this paper, we introduce a new method, the squeeze-enhanced Axial Transformer (SeaFormer), for mobile visual recognition. Specifically, we design a generic attention block characterized by the formulation of squeeze Axial and detail enhancement. It can be further used to create a family of backbone architectures with superior cost-effectiveness. Coupled with a light segmentation head, we achieve the best trade-off between segmentation accuracy and latency on the ARM-based mobile devices on the ADE20K, Cityscapes, Pascal Context and COCO-Stuff datasets. Critically, we beat both the mobile-friendly rivals and Transformer-based counterparts with better performance and lower latency without bells and whistles. 
Furthermore, we incorporate a feature upsampling-based multi-resolution distillation technique, further reducing the inference latency of the proposed framework. Beyond semantic segmentation, we further apply the proposed SeaFormer architecture to image classification and object detection problems, demonstrating the potential of serving as a versatile mobile-friendly backbone. Our code and models are made publicly available at https://github.com/fudan-zvg/SeaFormer.	翻訳日:2024-06-19 13:20:03 公開日:2024-06-17
# 未熟児網膜症の深層学習分類を改善するRetcam画像のための新しい眼底画像前処理法 Novel Fundus Image Preprocessing for Retcam Images to Improve Deep Learning Classification of Retinopathy of Prematurity ( http://arxiv.org/abs/2302.02524v5 ) ライセンス: Link先を確認	Sajid Rahim, Kourosh Sabri, Anna Ells, Alan Wassyng, Mark Lawford, Linyang Chu, Wenbo He,	(参考訳) 未熟児網膜症(英: Retinopathy of Prematurity、ROP)は、未熟児に影響を与える眼の網膜の損傷により失明する可能性のある眼疾患である。 ROPのスクリーニングは早期発見と治療に不可欠である。これは、臨床上重要な疾患の診断成功率を低下させる原因となる主観的な拡張眼科検査を訓練された医師に要求する、退屈で手動のプロセスである。自動診断法は、深層学習を用いて眼科医が診断精度を向上させるのに役立つ。いくつかの研究グループが様々なアプローチを強調している。キャプチャされたROP Retcam画像は、品質の低下に悩まされる。本稿では,事前学習済みの転移学習フレームワークを用いた新しい眼底前処理手法を用いてハイブリッドモデルを構築し,診断精度を高めることを提案する。評価の結果、従来の画像処理と比較して、Plus病、ROPのステージ、およびゾーンの分類において、これらの新しい手法が、ピアペーパーと比較して、多くの面で、より良い精度に寄与することが確認された。 Retinopathy of Prematurity (ROP) is a potentially blinding eye disorder caused by damage to the retina that can affect babies born prematurely. Screening for ROP is essential for early detection and treatment. It is a laborious, manual process that requires a trained physician to perform a dilated ophthalmological examination, which can be subjective and result in lower diagnostic success for clinically significant disease. Automated diagnostic methods using deep learning can help ophthalmologists increase diagnostic accuracy. Several research groups have highlighted various approaches. Captured ROP Retcam images suffer from poor quality. This paper proposes the use of improved novel fundus preprocessing methods with pretrained transfer learning frameworks to create hybrid models that give higher diagnostic accuracy. Once the models were trained and validated, the evaluations showed that, compared with traditional image processing, these novel methods yield better and, in many respects, higher accuracy than peer papers in classifying Plus disease, Stages of ROP, and Zones.	翻訳日:2024-06-19 13:20:03 公開日:2024-06-17
# エッジコンピューティングにおけるトポロジーを考慮したフェデレーションラーニング - 総合的な調査 Topology-aware Federated Learning in Edge Computing: A Comprehensive Survey ( http://arxiv.org/abs/2302.02573v2 ) ライセンス: Link先を確認	Jiajun Wu, Steve Drew, Fan Dong, Zhuangdi Zhu, Jiayu Zhou,	(参考訳) 5G/6Gアプリケーションの超低レイテンシ要件とプライバシ制約は、分散機械学習システムをエッジにデプロイすることを要求している。シンプルだが効果的なアプローチであるフェデレーションドラーニング(FL)は、分散トレーニングデータとプライベートトレーニングデータを使ったエッジコンピューティングにおける、巨大なユーザ所有デバイスに対する自然なソリューションである。 FedAvgをベースとしたFL法は典型的には、不安定なエッジコンピューティングアーキテクチャやトポロジーの不均一性や階層性を無視して、ナイーブな星トポロジーに従う。他にもいくつかのネットワークトポロジーが存在し、恒星トポロジーの限界とボトルネックに対処することができる。これは、ネットワークトポロジに関連するFLソリューションを調査する動機となります。本稿では,ネットワークトポロジに着目した既存のFL作品の包括的調査を行う。 FLおよびエッジコンピューティングネットワークの概要を概説した後、様々なエッジネットワークトポロジとそれらの利点とデメリットについて論じる。最後に、FLをトポロジ固有のエッジネットワークに適用するための課題と今後の課題について論じる。 The ultra-low latency requirements of 5G/6G applications and privacy constraints call for distributed machine learning systems to be deployed at the edge. With its simple yet effective approach, federated learning (FL) is a natural solution for massive user-owned devices in edge computing with distributed and private training data. FL methods based on FedAvg typically follow a naive star topology, ignoring the heterogeneity and hierarchy of the volatile edge computing architectures and topologies in reality. Several other network topologies exist and can address the limitations and bottlenecks of the star topology. This motivates us to survey network topology-related FL solutions. In this paper, we conduct a comprehensive survey of the existing FL works focusing on network topologies. After a brief overview of FL and edge computing networks, we discuss various edge network topologies as well as their advantages and disadvantages. Lastly, we discuss the remaining challenges and future works for applying FL to topology-specific edge networks.	翻訳日:2024-06-19 13:20:03 公開日:2024-06-17
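The contrast between the star topology and a tiered edge topology in the survey entry above can be made concrete with a small sketch. The abstract gives no algorithmic details, so everything below is an illustration under assumptions: `fedavg` is the standard sample-weighted parameter average used by FedAvg, and `hierarchical_fedavg` is a hypothetical two-tier (client, edge server, cloud) variant; both names and the flat-list parameter format are made up for this example.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Star topology: one server averages all client parameter vectors,
    weighting each client by its number of local training samples."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(weights):
            aggregated[i] += (n / total) * w
    return aggregated


def hierarchical_fedavg(groups: Sequence[Sequence[Sequence[float]]],
                        sizes: Sequence[Sequence[int]]) -> List[float]:
    """Hypothetical two-tier topology: each edge server averages its own
    clients, then the cloud averages the edge models by group sample count."""
    edge_models = [fedavg(g, s) for g, s in zip(groups, sizes)]
    edge_sizes = [sum(s) for s in sizes]
    return fedavg(edge_models, edge_sizes)
```

For a single round of plain weighted averaging the two topologies produce the same global model; they differ in where communication happens and how often each tier synchronizes, which is the kind of trade-off the survey is concerned with.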
# デビアスのためのバックドア:バックドアアタックに基づく人工バイアスによるモデルバイアスの緩和 Backdoor for Debias: Mitigating Model Bias with Backdoor Attack-based Artificial Bias ( http://arxiv.org/abs/2303.01504v2 ) ライセンス: Link先を確認	Shangxi Wu, Qiuyang He, Dongyuan Lu, Jian Yu, Jitao Sang,	(参考訳) ディープラーニングの急速な進歩により、最先端のアルゴリズムは様々な社会的状況で利用されてきた。それでも、いくつかのアルゴリズムはバイアスを示し、不平等な結果をもたらすことが発見されている。現在のデバイアス法では、データの低利用や複雑なトレーニング要件といった課題に直面している。本研究では, バックドア攻撃により, 標準訓練によるモデルバイアスに類似した人工バイアスが構築できることを見出した。バックドア・トリガーの強い調整性を考えると、バックドア・アタックから生じるリバース・人工バイアスを慎重に設計することでモデルバイアスを緩和する動機がある。そこで本研究では,知識蒸留に基づくバックドア脱バイアスフレームワークを提案し,モデルバイアスを元のデータから効果的に低減し,バックドア攻撃によるセキュリティリスクを最小限に抑える。提案手法は、画像と構造化されたデータセットの両方で検証され、有望な結果を示す。この作業はバックドア攻撃の理解を深め、有益なアプリケーションの可能性を強調します。この研究のコードは https://anonymous.4open.science/r/DwB-BC07/ で見ることができる。 With the swift advancement of deep learning, state-of-the-art algorithms have been utilized in various social situations. Nonetheless, some algorithms have been discovered to exhibit biases and provide unequal results. The current debiasing methods face challenges such as poor utilization of data or intricate training requirements. In this work, we found that the backdoor attack can construct an artificial bias similar to the model bias derived in standard training. Considering the strong adjustability of backdoor triggers, we are motivated to mitigate the model bias by carefully designing reverse artificial bias created from backdoor attack. Based on this, we propose a backdoor debiasing framework based on knowledge distillation, which effectively reduces the model bias from original data and minimizes security risks from the backdoor attack. The proposed solution is validated on both image and structured datasets, showing promising results. This work advances the understanding of backdoor attacks and highlights their potential for beneficial applications. The code for the study can be found at https://anonymous.4open.science/r/DwB-BC07/.	翻訳日:2024-06-19 13:20:03 公開日:2024-06-17
# 連続グロモフ・ワッサーシュタイン問題の解法における課題の解明 Uncovering Challenges of Solving the Continuous Gromov-Wasserstein Problem ( http://arxiv.org/abs/2303.05978v2 ) ライセンス: Link先を確認	Xavier Aramayo Carrasco, Maksim Nekrashevich, Petr Mokrov, Evgeny Burnaev, Alexander Korotin,	(参考訳) 近年,Gromov-Wasserstein Optimal Transport (GWOT)問題がMLコミュニティの注目を集めている。この問題において、2つの(おそらく異なる)空間上で支えられる2つの分布が与えられたとき、それらの間の最も等距離写像を見つける必要がある。 GWOT の離散変種では、与えられた離散点の集合間の代入を学習する。より高度な連続定式化では、未知の連続分布のパラメトリックマッピングを、それらから派生したサンプルに基づいて復元することを目的としている。 GWOTの背後にある明らかな幾何学的直観は、いくつかの実用的なユースケースにとって自然な選択となり、提案された多くの解法がもたらされる。それらのいくつかは、問題の継続的バージョンを解決していると主張している。同時に、GWOTは理論上も数値上も難しいと悪名高い。さらに、既存の連続GWOTソルバは依然として離散技術に大きく依存している。既存の手法がどの程度GWOT問題を解けるのか、どのような困難に遭遇するのか、どの条件で成功するのか、という自然な疑問が生まれます。我々のベンチマーク論文はこれらの質問に答える試みである。特に、最も興味深く、議論の余地のあるセットアップとして、連続GWOTに注目します。既存の連続GWOTアプローチをさまざまなシナリオでクラッシュテストし、結果を注意深く記録し分析し、問題を特定します。我々の研究結果は、科学コミュニティが依然として信頼性の高い連続GWOTソルバを欠いていることを実験的に示している。この方向への第一歩として、離散技術に依存しない新しい連続GWOT法を提案し、競合手法の問題を部分的に解決する。私たちのコードは https://github.com/Ark-130994/GW-Solvers で公開されています。 Recently, the Gromov-Wasserstein Optimal Transport (GWOT) problem has attracted the special attention of the ML community. In this problem, given two distributions supported on two (possibly different) spaces, one has to find the most isometric map between them. In the discrete variant of GWOT, the task is to learn an assignment between given discrete sets of points. In the more advanced continuous formulation, one aims at recovering a parametric mapping between unknown continuous distributions based on i.i.d. samples derived from them. The clear geometrical intuition behind the GWOT makes it a natural choice for several practical use cases, giving rise to a number of proposed solvers. Some of them claim to solve the continuous version of the problem. At the same time, GWOT is notoriously hard, both theoretically and numerically. Moreover, all existing continuous GWOT solvers still heavily rely on discrete techniques. 
Natural questions arise: to what extent do existing methods solve the GWOT problem, what difficulties do they encounter, and under which conditions are they successful? Our benchmark paper is an attempt to answer these questions. We specifically focus on the continuous GWOT as the most interesting and debatable setup. We crash-test existing continuous GWOT approaches on different scenarios, carefully record and analyze the obtained results, and identify issues. Our findings experimentally testify that the scientific community is still missing a reliable continuous GWOT solver, which necessitates further research efforts. As the first step in this direction, we propose a new continuous GWOT method which does not rely on discrete techniques and partially solves some of the problems of the competitors. Our code is available at https://github.com/Ark-130994/GW-Solvers.	翻訳日:2024-06-19 13:20:03 公開日:2024-06-17
# FedML-HE: 効率的な同型暗号化に基づくプライバシ保護フェデレーション学習システム FedML-HE: An Efficient Homomorphic-Encryption-Based Privacy-Preserving Federated Learning System ( http://arxiv.org/abs/2303.10837v3 ) ライセンス: Link先を確認	Weizhao Jin, Yuhang Yao, Shanshan Han, Jiajun Gu, Carlee Joe-Wong, Srivatsan Ravi, Salman Avestimehr, Chaoyang He,	(参考訳) Federated Learningは、ローカルデータではなく、ローカルモデルのアップデートを集約することによって、分散デバイス上の機械学習モデルをトレーニングする。しかし、サーバ上の集約されたローカルモデルは、逆攻撃によって機密性の高い個人情報を明らかにする可能性があるため、プライバシー上の懸念が生じる。ホモモルフィック暗号化(HE)のようなプライバシ保護手法はFLトレーニングに必要となる。 HEのプライバシー上の優位性にもかかわらず、そのアプリケーションは特に基礎モデルにおいて非現実的なオーバーヘッドに悩まされている。本稿では,HedML-HEをベースとした安全なモデルアグリゲーションを効率よく実現した,最初の実践的フェデレーション学習システムを提案する。 FedML-HEは、機密パラメータを選択的に暗号化し、カスタマイズ可能なプライバシ保護を提供しながら、トレーニング中の計算と通信のオーバーヘッドを大幅に削減することを提案する。最適化されたシステムでは,特に大規模な基盤モデル(ResNet-50では10倍,BERTでは40倍程度)において,大幅なオーバーヘッド削減を実現しています。 Federated Learning trains machine learning models on distributed devices by aggregating local model updates instead of local data. However, privacy concerns arise as the aggregated local models on the server may reveal sensitive personal information by inversion attacks. Privacy-preserving methods, such as homomorphic encryption (HE), then become necessary for FL training. Despite HE's privacy advantages, its applications suffer from impractical overheads, especially for foundation models. In this paper, we present FedML-HE, the first practical federated learning system with efficient HE-based secure model aggregation. FedML-HE proposes to selectively encrypt sensitive parameters, significantly reducing both computation and communication overheads during training while providing customizable privacy preservation. Our optimized system demonstrates considerable overhead reduction, particularly for large foundation models (e.g., ~10x reduction for ResNet-50, and up to ~40x reduction for BERT), demonstrating the potential for scalable HE-based FL deployment.	翻訳日:2024-06-19 13:20:03 公開日:2024-06-17
# StepMix: 外部変数を持つ一般化混合モデルの擬似的推定のためのPythonパッケージ StepMix: A Python Package for Pseudo-Likelihood Estimation of Generalized Mixture Models with External Variables ( http://arxiv.org/abs/2304.03853v6 ) ライセンス: Link先を確認	Sacha Morin, Robin Legault, Félix Laliberté, Zsuzsa Bakk, Charles-Édouard Giguère, Roxane de la Sablonnière, Éric Lacourse,	(参考訳) StepMixは、外部変数(共変量および遠位結果)を持つ一般化有限混合モデル(潜時プロファイルおよび潜時クラス分析)の擬似的様相推定(1段階、2段階、3段階のアプローチ)のためのオープンソースのPythonパッケージである。社会科学における多くの応用において、主な目的は個人を潜在クラスにクラスタリングするだけでなく、これらのクラスを使用してより複雑な統計モデルを開発することである。これらのモデルは一般に、潜在クラスを観察された指標に関連付ける測定モデルと、共変量と結果変数を潜在クラスに関連付ける構造モデルに分けられる。測定と構造モデルは、いわゆるワンステップアプローチやステップワイド手法を用いて、共同で推定することができる。 1段階法に加えて、Blk-Croon-Hagenaarsを用いたバイアス調整3段階法や最大誤差補正、より最近の2段階法など、文献から最も重要なステップワイズ推定手法を実装している。これらの擬似的様相推定器は、特定の期待-最大化サブルーチンとして統一された枠組みの下で提示される。データサイエンスコミュニティにおける彼らの採用を促進するため、StepMixはScikit-Lernライブラリのオブジェクト指向設計に従い、追加のRラッパーを提供する。 StepMix is an open-source Python package for the pseudo-likelihood estimation (one-, two- and three-step approaches) of generalized finite mixture models (latent profile and latent class analysis) with external variables (covariates and distal outcomes). In many applications in social sciences, the main objective is not only to cluster individuals into latent classes, but also to use these classes to develop more complex statistical models. These models generally divide into a measurement model that relates the latent classes to observed indicators, and a structural model that relates covariates and outcome variables to the latent classes. The measurement and structural models can be estimated jointly using the so-called one-step approach or sequentially using stepwise methods, which present significant advantages for practitioners regarding the interpretability of the estimated latent classes. 
In addition to the one-step approach, StepMix implements the most important stepwise estimation methods from the literature, including the bias-adjusted three-step methods with Bolck-Croon-Hagenaars and maximum likelihood corrections and the more recent two-step approach. These pseudo-likelihood estimators are presented in this paper under a unified framework as specific expectation-maximization subroutines. To facilitate and promote their adoption among the data science community, StepMix follows the object-oriented design of the scikit-learn library and provides an additional R wrapper.	翻訳日:2024-06-19 13:10:19 公開日:2024-06-17
# PiClick: クリックベースのインタラクティブセグメンテーションにおいて、複数の候補から望ましいマスクを選択する PiClick: Picking the desired mask from multiple candidates in click-based interactive segmentation ( http://arxiv.org/abs/2304.11609v5 ) ライセンス: Link先を確認	Cilin Yan, Haochen Wang, Jie Liu, Xiaolong Jiang, Yao Hu, Xu Tang, Guoliang Kang, Efstratios Gavves,	(参考訳) クリックベースのインタラクティブセグメンテーションは、効率的なピクセルレベルのアノテーションと画像編集を容易にする、人間のクリックによるターゲットマスクの生成を目的としている。そのようなタスクでは、目的の曖昧さはセグメンテーションの精度と効率を妨げる問題のままである。つまり、リッチなコンテキストのシーンでは、1クリックで複数の潜在的なターゲットに対応できるが、従来の対話型セグメンタは1つのマスクしか生成せず、ターゲットの曖昧さに対処できない。本稿では,PiClickという対話型セグメンテーションネットワークを提案する。具体的には、Transformerベースのアーキテクチャを使用して、相互に対話的なマスククエリによって、潜在的なすべてのマスクを生成する。さらに、ターゲット推論モジュール(TRM)がPiClickで設計され、ターゲットの曖昧さと外的努力を軽減し、すべての候補からユーザ希望のマスクを自動的に提案する。 9つのインタラクティブなセグメンテーションデータセットに関する大規模な実験は、セグメンテーション結果を考慮して、PiClickが従来の最先端技術に対して好適に機能することを示した。さらに,PiClickは,所望のマスクのアノテートや選択において,人間の努力を効果的に削減することを示す。 PiClickのソースコードをhttps://github.com/cilinyan/PiClickのプラグイン・アンド・プレイアノテーションツールと一緒にリリースします。 Click-based interactive segmentation aims to generate target masks via human clicking, which facilitates efficient pixel-level annotation and image editing. In such a task, target ambiguity remains a problem hindering the accuracy and efficiency of segmentation. That is, in scenes with rich context, one click may correspond to multiple potential targets, while most previous interactive segmentors only generate a single mask and fail to deal with target ambiguity. In this paper, we propose a novel interactive segmentation network named PiClick, to yield all potentially reasonable masks and suggest the most plausible one for the user. Specifically, PiClick utilizes a Transformer-based architecture to generate all potential target masks by mutually interactive mask queries. Moreover, a Target Reasoning module(TRM) is designed in PiClick to automatically suggest the user-desired mask from all candidates, relieving target ambiguity and extra-human efforts. 
Extensive experiments on 9 interactive segmentation datasets demonstrate that PiClick performs favorably against previous state-of-the-art methods in terms of segmentation results. Moreover, we show that PiClick effectively reduces human effort in annotating and picking the desired masks. To ease usage and inspire future research, we release the source code of PiClick together with a plug-and-play annotation tool at https://github.com/cilinyan/PiClick.	翻訳日:2024-06-19 13:10:19 公開日:2024-06-17
# 家系図のトポロジーと会員の満足感 : 機械学習アプローチ The Topology of a Family Tree Graph and Its Members' Satisfaction with One Another: A Machine Learning Approach ( http://arxiv.org/abs/2305.01552v2 ) ライセンス: Link先を確認	Teddy Lazebnik, Amit Yaniv-Rosenfeld,	(参考訳) 家族同士の満足感は、健全で支援的な家族環境作りの中心である。本研究では,ある家系図のトポロジと,そのメンバーの満足度との関係を探索する新しい計算手法を提案し,実装する。広範な実証評価(N=486$ family)を通じて,提案手法は,家族間の満足度を家族グラフのトポロジのみに基づいて予測する上で,高精度な結果をもたらすことを示した。さらに,本手法は,家族の満足度に係わる確立した特徴に依拠するベースライン回帰モデルと比較して,先行文献において好意的に比較した。 Family members' satisfaction with one another is central to creating healthy and supportive family environments. In this work, we propose and implement a novel computational technique aimed at exploring the possible relationship between the topology of a given family tree graph and its members' satisfaction with one another. Through an extensive empirical evaluation ($N=486$ families), we show that the proposed technique brings about highly accurate results in predicting family members' satisfaction with one another based solely on the family graph's topology. Furthermore, the results indicate that our technique favorably compares to baseline regression models which rely on established features associated with family members' satisfaction with one another in prior literature.	翻訳日:2024-06-19 13:10:19 公開日:2024-06-17
# 説明可能な人工知能手法の展望:SHAPとLIME A Perspective on Explainable Artificial Intelligence Methods: SHAP and LIME ( http://arxiv.org/abs/2305.02012v3 ) ライセンス: Link先を確認	Ahmed Salih, Zahra Raisi-Estabragh, Ilaria Boscolo Galazzo, Petia Radeva, Steffen E. Petersen, Gloria Menegaz, Karim Lekadir,	(参考訳) eXplainable AI(XAI)メソッドは、機械学習(ML)モデルのブラックボックスを、より消化しやすい形式に変換するために登場した。これらの方法は、MLモデルをより透過的にし、エンドユーザの信頼をアウトプットに高めることを目的として、モデルがどのように機能するかを伝えるのに役立つ。 SHapley Additive exPlanations (SHAP) と Local Interpretable Model Agnostic Explanation (LIME) は2つの広く使われているXAI法である。本稿では、これらの2つの手法の説明可能性指標の生成方法について論じ、その弱点と強みを浮き彫りにして、それらの出力を解釈する枠組みを提案する。具体的には, 生体医学領域(心筋梗塞の有無にかかわらず個人を分類する)の事例から, モデル依存性と特徴間のコリニアリティの有無について考察した。以上の結果から,SHAPとLIMEはMLモデルや特徴コリナリティーの影響を強く受けており,その使用法や解釈に注意を喚起している。 eXplainable artificial intelligence (XAI) methods have emerged to convert the black box of machine learning (ML) models into a more digestible form. These methods help to communicate how the model works with the aim of making ML models more transparent and increasing the trust of end-users into their output. SHapley Additive exPlanations (SHAP) and Local Interpretable Model Agnostic Explanation (LIME) are two widely used XAI methods, particularly with tabular data. In this perspective piece, we discuss the way the explainability metrics of these two methods are generated and propose a framework for interpretation of their outputs, highlighting their weaknesses and strengths. Specifically, we discuss their outcomes in terms of model-dependency and in the presence of collinearity among the features, relying on a case study from the biomedical domain (classification of individuals with or without myocardial infarction). The results indicate that SHAP and LIME are highly affected by the adopted ML model and feature collinearity, raising a note of caution on their usage and interpretation.	翻訳日:2024-06-19 13:10:19 公開日:2024-06-17
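To make the model dependency discussed in the SHAP/LIME entry above tangible, here is a minimal sketch that computes exact Shapley values by enumerating feature coalitions. It is not the SHAP library's API and not the authors' code: `shapley_values`, the single-baseline convention for "missing" features, and the toy models are all assumptions chosen for illustration, and the enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(predict: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley attribution of predict(x) - predict(baseline).

    Features inside a coalition take their value from x; features outside
    it are replaced by the baseline value (an assumed convention).
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

With a linear model such as `lambda z: 2*z[0] + 3*z[1] - z[2]`, input `[1, 2, 3]` and an all-zero baseline, each attribution is simply coefficient times feature value. Swapping in a different model fitted to the same data (for example one with an interaction term) changes the attributions even though the input is unchanged, which is the model dependency the abstract cautions about.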
# サブスペース学習によるブラックボックスプロンプトチューニング Black-box Prompt Tuning with Subspace Learning ( http://arxiv.org/abs/2305.03518v2 ) ライセンス: Link先を確認	Yuanhang Zheng, Zhixing Tan, Peng Li, Yang Liu,	(参考訳) ブラックボックスプロンプトチューニングは、大言語モデル(LLM)のネットワークをバックプロパゲートするのではなく、低次元のサブ空間内でプロンプトを学習するためにデリバティブフリー最適化アルゴリズムを用いる。近年の研究では、ブラックボックスのプロンプトチューニングはタスクやLLM間の汎用性に欠けており、これは部分空間の最適下選択に関係していると考えられている。本稿では,サブスペース・ラーニング(BSL)を用いたブラックボックス・プロンプト・チューニングを導入し,ブラックボックス・プロンプト・チューニングの汎用性を高める。類似したタスクのほぼ最適なプロンプトが共通部分空間に存在するという仮定に基づいて、類似したタスクのコレクション上でメタラーニングによってそのようなサブスペースを特定することを提案する。したがって、ソースタスクと類似性を共有するターゲットタスクに対しては、特定したサブスペース内での最適化により、ターゲットタスクに対して良好に動作するプロンプトが得られることを期待する。実験の結果,BSL フレームワークは様々な下流タスクや LLM の競合性能を一貫して達成していることがわかった。 Black-box prompt tuning employs derivative-free optimization algorithms to learn prompts within low-dimensional subspaces rather than back-propagating through the network of Large Language Models (LLMs). Recent studies reveal that black-box prompt tuning lacks versatility across tasks and LLMs, which we believe is related to the suboptimal choice of subspaces. In this paper, we introduce Black-box prompt tuning with Subspace Learning (BSL) to enhance the versatility of black-box prompt tuning. Based on the assumption that nearly optimal prompts for similar tasks reside in a common subspace, we propose identifying such subspaces through meta-learning on a collection of similar source tasks. Consequently, for a target task that shares similarities with the source tasks, we expect that optimizing within the identified subspace can yield a prompt that performs well on the target task. Experimental results confirm that our BSL framework consistently achieves competitive performance across various downstream tasks and LLMs.	翻訳日:2024-06-19 13:10:19 公開日:2024-06-17
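The core mechanism in the entry above, optimizing a prompt inside a low-dimensional subspace using only loss queries, can be sketched as follows. This is a toy under stated assumptions, not the BSL method: the subspace matrix `A` here is just supplied by the caller (BSL would learn it by meta-learning on source tasks), the optimizer is a basic accept-if-better random search rather than whatever derivative-free algorithm the paper uses, and `loss` stands in for a black-box query to an LLM.

```python
import random
from typing import Callable, List, Sequence, Tuple


def project(A: Sequence[Sequence[float]], z: Sequence[float]) -> List[float]:
    """Map a low-dimensional vector z (size d) to a full prompt vector (size D)."""
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]


def tune_in_subspace(loss: Callable[[List[float]], float],
                     A: Sequence[Sequence[float]],
                     steps: int = 500,
                     sigma: float = 0.3,
                     seed: int = 0) -> Tuple[List[float], float]:
    """Derivative-free search over z; only loss values are used, no gradients."""
    rng = random.Random(seed)
    d = len(A[0])
    z = [0.0] * d
    best = loss(project(A, z))
    for _ in range(steps):
        # Propose a Gaussian perturbation and keep it only if the loss improves.
        candidate = [zj + rng.gauss(0.0, sigma) for zj in z]
        val = loss(project(A, candidate))
        if val < best:
            z, best = candidate, val
    return z, best
```

The search space has `d` dimensions regardless of the prompt dimension `D`, so the quality of the result depends heavily on whether good prompts actually lie in the span of `A`; that dependence on the subspace choice is the issue the abstract attributes the lack of versatility to.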
# 暗闇における雨除去と低照度強調の同時処理のための二重劣化表現 Dual Degradation Representation for Joint Deraining and Low-Light Enhancement in the Dark ( http://arxiv.org/abs/2305.03997v3 ) ライセンス: Link先を確認	Xin Lin, Jingtong Yue, Sixian Ding, Chao Ren, Lu Qi, Ming-Hsuan Yang,	(参考訳) 暗闇の中での雨は、自律運転や監視システム、夜間写真など、現実世界のアプリケーションをデプロイする上で大きな課題となる。既存の低照度強調法や雨除去法は、低照度を明るくし、同時に雨を取り除くのに苦労する。また、「雨除去の後に低照度強調」やその逆といったカスケードのアプローチは、しばしば雨のパターンの問題や過度にぼやけ、露出の過剰なイメージをもたらす。これらの課題に対処するために,L$^{2}$RIRNetというエンド・ツー・エンドのモデルを導入する。本モデルでは、DDR-Net(Dual Degradation Representation Network)とRestoration Networkの2つの主要コンポーネントを特徴とする。 DDR-Netは、暗黒領域の輝度効果と光領域の雨パターンの劣化表現を独立に学習し、トレーニングプロセスの導出に二重劣化損失を用いる。復元ネットワークは、FDG(Fourier Detail Guidance)モジュールを用いて劣化した画像を復元する。さらに,合成画像と実世界の低照度降雨画像の両方を含むデータセットをコントリビュートする。我々のL$^{2}$RIRNetは、合成シナリオと複雑な実世界のシナリオの両方において、既存の手法に対して好意的に作用することを示した。すべてのコードとデータセットは https://github.com/linxin0/Low_light_rainy で見ることができる。 Rain in the dark poses a significant challenge to deploying real-world applications such as autonomous driving, surveillance systems, and night photography. Existing low-light enhancement or deraining methods struggle to brighten low-light conditions and remove rain simultaneously. Additionally, cascade approaches like "deraining followed by low-light enhancement" or the reverse often result in problematic rain patterns or overly blurred and overexposed images. To address these challenges, we introduce an end-to-end model called L$^{2}$RIRNet, designed to manage both low-light enhancement and deraining in real-world settings. Our model features two main components: a Dual Degradation Representation Network (DDR-Net) and a Restoration Network. The DDR-Net independently learns degradation representations for luminance effects in dark areas and rain patterns in light areas, employing dual degradation loss to guide the training process. 
The Restoration Network restores the degraded image using a Fourier Detail Guidance (FDG) module, which leverages near-rainless detailed images, focusing on texture details in frequency and spatial domains to inform the restoration process. Furthermore, we contribute a dataset containing both synthetic and real-world low-light-rainy images. Extensive experiments demonstrate that our L$^{2}$RIRNet performs favorably against existing methods in both synthetic and complex real-world scenarios. All code and the dataset are available at https://github.com/linxin0/Low_light_rainy.	翻訳日:2024-06-19 13:10:19 公開日:2024-06-17
# 大規模言語モデルのための自己教師型論理強化学習の探索 Exploring Self-supervised Logic-enhanced Training for Large Language Models ( http://arxiv.org/abs/2305.13718v7 ) ライセンス: Link先を確認	Fangkai Jiao, Zhiyang Teng, Bosheng Ding, Zhengyuan Liu, Nancy F. Chen, Shafiq Joty,	(参考訳) 言語モデルの論理的推論能力を改善する努力は、主に教師付き微調整に依存し、新しいドメインやタスクへの一般化を妨げる。 LLM(Large Language Models)の開発は、豊富な知識を単一のプロキシに圧縮する能力を示し、複数のタスクに効果的に対処できるようにする。予備実験では, LLMは論理的推論の能力を示していない。論理的推論ベンチマークにおけるLLMのパフォーマンスは、既存の最先端のベースラインよりもはるかに遅れている。本稿では,自己教師付きポストトレーニングを通じて論理知識を組み込み,コンテキスト内学習を通じてそれを活性化することの実現可能性を初めて検討し,これをLogicLLMと呼ぶ。具体的には、MERItの自己回帰目的の変種を考案し、パラメータサイズが30億から130億の2つのLLM系列、すなわちFLAN-T5とLLaMAと統合する。 2つの挑戦的な論理的推論ベンチマークの結果は、LogicLLMの有効性を示している。さらに、論理指向のプロキシタスクを設計する上で重要な要素を分析するために、広範囲にわたるアブレーション研究を行っている。 Existing efforts to improve the logical reasoning ability of language models have predominantly relied on supervised fine-tuning, hindering generalization to new domains and/or tasks. The development of Large Language Models (LLMs) has demonstrated the capacity of compressing abundant knowledge into a single proxy, enabling them to tackle multiple tasks effectively. Our preliminary experiments, nevertheless, show that LLMs do not exhibit logical reasoning capability. The performance of LLMs on logical reasoning benchmarks lags far behind the existing state-of-the-art baselines. In this paper, we make the first attempt to investigate the feasibility of incorporating logical knowledge through self-supervised post-training and activating it via in-context learning, which we term LogicLLM. Specifically, we devise an auto-regressive objective variant of MERIt and integrate it with two LLM series, i.e., FLAN-T5 and LLaMA, with parameter sizes ranging from 3 billion to 13 billion. The results on two challenging logical reasoning benchmarks demonstrate the effectiveness of LogicLLM. Besides, we conduct extensive ablation studies to analyze the key factors in designing logic-oriented proxy tasks.	翻訳日:2024-06-19 13:00:15 公開日:2024-06-17
# 3次元点群のラベル効率のよい深層学習に関する調査研究 A Survey of Label-Efficient Deep Learning for 3D Point Clouds ( http://arxiv.org/abs/2305.19812v2 ) ライセンス: Link先を確認	Aoran Xiao, Xiaoqin Zhang, Ling Shao, Shijian Lu,	(参考訳) 過去10年間で、深層ニューラルネットワークは、ポイントクラウド学習において大きな進歩を遂げてきた。しかし、大規模で正確に注釈付けされたトレーニングデータの収集は非常に困難で費用がかかるため、既存のポイントクラウドデータセットのスケーラビリティが損なわれ、さまざまなタスクやアプリケーションにおけるポイントクラウドデータの効率的な探索のボトルネックとなる。ラベル効率のよい学習は、大幅に削減されたアノテーション作業で効果的なディープネットワークトレーニングを可能にすることで、有望なソリューションを提供する。本稿では,ポイントクラウドのラベル効率学習に関する初の包括的調査を行う。この新興研究分野における3つの重要な疑問に対処する。(i) ポイントクラウド処理におけるラベル効率学習の重要性と緊急性、(ii) それが包含するサブフィールド、(iii) この分野で達成された進歩である。そこで本研究では,ラベルの種類によって提供されるデータ前提条件に基づいて,ラベル効率のよい学習手法を整理する分類法を提案する。データ拡張、ドメイン移行学習、弱教師付き学習、事前訓練された基礎モデルという、ポイントクラウドアノテーションの取り組みを著しく削減する、ラベル効率のよい4つの典型的な学習手法を分類する。それぞれのアプローチについて、問題設定の概要と、関連する進展と課題を示す広範な文献レビューを提供する。最後に、現在の研究課題と今後の方向性についての洞察を共有します。この調査に関連するプロジェクトは https://github.com/xiaoaoran/3D_label_efficient_learning に構築されている。 In the past decade, deep neural networks have achieved significant progress in point cloud learning. However, collecting large-scale precisely-annotated training data is extremely laborious and expensive, which hinders the scalability of existing point cloud datasets and poses a bottleneck for efficient exploration of point cloud data in various tasks and applications. Label-efficient learning offers a promising solution by enabling effective deep network training with much-reduced annotation efforts. This paper presents the first comprehensive survey of label-efficient learning of point clouds. We address three critical questions in this emerging research field: i) the importance and urgency of label-efficient learning in point cloud processing, ii) the subfields it encompasses, and iii) the progress achieved in this area. To achieve this, we propose a taxonomy that organizes label-efficient learning methods based on the data prerequisites provided by different types of labels. 
We categorize four typical label-efficient learning approaches that significantly reduce point cloud annotation efforts: data augmentation, domain transfer learning, weakly-supervised learning, and pretrained foundation models. For each approach, we outline the problem setup and provide an extensive literature review that showcases relevant progress and challenges. Finally, we share insights into current research challenges and potential future directions. A project associated with this survey has been built at https://github.com/xiaoaoran/3D_label_efficient_learning.	翻訳日:2024-06-19 13:00:14 公開日:2024-06-17
# ArtWhisperer:芸術創造における人間とAIのインタラクションを特徴付けるデータセット ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations ( http://arxiv.org/abs/2306.08141v4 ) ライセンス: Link先を確認	Kailas Vodrahalli, James Zou,	(参考訳) 生成的AIがより普及するにつれて、人間のユーザがそのようなモデルとどのように相互作用するかを研究することが重要である。本研究では,対象画像の生成にテキスト・ツー・イメージ・モデルをどのように利用するかを検討する。このインタラクションを研究するために、私たちはArtWhispererというオンラインゲームを作成しました。このゲームを通して、5万以上の人間とAIのインタラクションを記録し、各インタラクションは、ユーザが生成した1つのテキストプロンプトと、それに対応する生成された画像に対応する。その多くは、ユーザがターゲットイメージの最良のプロンプトを見つけるために反復的なインタラクションであり、これは人間とAIのコラボレーションを研究するためのユニークなシーケンシャルデータセットである。本データセットの初期分析では,迅速なインタラクションとユーザ戦略のいくつかの特徴を同定する。人々は多様なプロンプトを提出し、類似した画像を生成するさまざまなテキスト記述を発見できる。興味深いことに、ユーザがより良いプロンプトを見つけるため、迅速な多様性は低下しない。さらに,我々のデータセットを用いたAIの聴取可能性の定量化のための新しい指標を提案する。我々は、タスクを適切に完了させるために必要な相互作用の期待数として、ステアビリティを定義する。この値は、各目標タスクにマルコフ連鎖を適合させ、マルコフ連鎖の適切なスコアに到達するための期待時間を計算することで推定する。我々は、異なるタイプのターゲットイメージと2つの異なるモデルでAIのステアビリティを定量化し比較し、都市と自然世界のイメージが芸術的、幻想的なイメージよりもステアビリティが高いことを発見した。これらの知見は、AIとAIの相互作用に関する洞察を与え、AIのステアビリティを評価する具体的な方法を示し、ArtWhispererデータセットの汎用性を実証する。 As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image as the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. 
People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to quantify the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.	翻訳日:2024-06-19 13:00:14 公開日:2024-06-17
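The steerability metric in the ArtWhisperer entry above (expected number of interactions until an adequate score, estimated from a fitted Markov chain) can be sketched as an expected hitting time computation. The abstract does not specify how states are defined or how the chain is fitted, so the following is a hedged illustration: states are hypothetical discretized score bins, the chain is fitted by simple transition counting, and the function names are invented for this example.

```python
from typing import Dict, List, Sequence


def fit_transition_matrix(trajectories: Sequence[Sequence[int]],
                          n_states: int) -> List[List[float]]:
    """Estimate P[i][j] by counting observed i -> j transitions.
    States with no observed outgoing transition are given a self-loop."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    P = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            P.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            P.append([c / total for c in row])
    return P


def expected_steps_to_target(P: Sequence[Sequence[float]],
                             target: int) -> Dict[int, float]:
    """Expected number of steps to first reach `target` from each state,
    obtained by solving (I - Q) t = 1 over the non-target states."""
    n = len(P)
    transient = [s for s in range(n) if s != target]
    m = len(transient)
    # Augmented matrix [I - Q | 1].
    M = [[(1.0 if i == j else 0.0) - P[transient[i]][transient[j]]
          for j in range(m)] + [1.0] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("target is not reachable from every state")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    result = {target: 0.0}
    for i, s in enumerate(transient):
        result[s] = M[i][m]
    return result
```

Under this reading, a lower expected hitting time from the starting state means the model is easier to steer toward the target image. A real analysis would also have to decide how to bin scores and how to handle sessions that end before reaching an adequate score, which this sketch does not address.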
# RoMe:メッシュ表現による大規模道路表面再構築に向けて RoMe: Towards Large Scale Road Surface Reconstruction via Mesh Representation ( http://arxiv.org/abs/2306.11368v3 ) ライセンス: Link先を確認	Ruohong Mei, Wei Sui, Jiaxin Zhang, Xue Qin, Gang Wang, Tao Peng, Cong Yang,	(参考訳) 自律運転アプリケーションでは、正確で効率的な道路表面の再構築が最重要である。本稿では,大規模道路路面の堅牢な復元を目的とした新しいフレームワークであるRoMeを紹介する。ユニークなメッシュ表現を活用することで、再構築された路面が正確で、セマンティックスとシームレスに整合していることを保証する。計算効率の課題に対処するため,我々は,RoMeがサブエリアに着目し,その後にマージすることで,広大な環境を再構築できる経路点サンプリング戦略を提案する。さらに,外部パラメータのキャリブレーションにおける不正確性に対するロバスト性を高めるために,外部パラメータ最適化モジュールを組み込んだ。パブリックデータセットとワイルドデータの両方に対する広範な評価は、速度、正確性、堅牢性という点で、RoMeの優位性を示している。たとえば、何千もの画像から600×600平方メートルの道路表面を復元するのに2GPU時間しかかからない。特に、RoMeの機能は単なる再構築を超えて、自律運転アプリケーションにおける自動ラベリングタスクに重要な価値を提供する。関連するすべてのデータとコードは https://github.com/DRosemei/RoMe で入手できる。 In autonomous driving applications, accurate and efficient road surface reconstruction is paramount. This paper introduces RoMe, a novel framework designed for the robust reconstruction of large-scale road surfaces. Leveraging a unique mesh representation, RoMe ensures that the reconstructed road surfaces are accurate and seamlessly aligned with semantics. To address challenges in computational efficiency, we propose a waypoint sampling strategy, enabling RoMe to reconstruct vast environments by focusing on sub-areas and subsequently merging them. Furthermore, we incorporate an extrinsic optimization module to enhance the robustness against inaccuracies in extrinsic calibration. Our extensive evaluations on both public datasets and wild data underscore RoMe's superiority in terms of speed, accuracy, and robustness. For instance, it costs only 2 GPU hours to recover a road surface of 600 × 600 square meters from thousands of images. Notably, RoMe's capability extends beyond mere reconstruction, offering significant value for autolabeling tasks in autonomous driving applications. All related data and code are available at https://github.com/DRosemei/RoMe.	翻訳日:2024-06-19 13:00:14 公開日:2024-06-17
# HEDI : 初回臨床応用と臨床応用 : 内臓ヘルニア修復のためのバイオメカニカル・アセスメント・ビジュアライゼーション・ツールの開発 HEDI: First-Time Clinical Application and Results of a Biomechanical Evaluation and Visualisation Tool for Incisional Hernia Repair ( http://arxiv.org/abs/2307.01502v2 ) ライセンス: Link先を確認	Philipp D. Lösel, Jacob J. Relle, Samuel Voß, Ramesch Raschidi, Regine Nessel, Johannes Görich, Mark O. Wielpütz, Thorsten Löffler, Vincent Heuveline, Friedrich Kallinowski,	(参考訳) 腹壁欠損は、しばしば痛み、不快感、また切開ヘルニアの再発を招き、世界中で重大な致命傷と外科的修復を繰り返している。大きなヘルニアに対するメッシュ修復は通常、筋肉の活性化、腹部内圧、組織弾性、腹部壁の伸展といった生体力学的要因を無視して、重なりが固定された欠損領域に基づいている。この問題を解決するため,不安定な腹壁を考慮に入れた切開ヘルニア修復に対する生体力学的アプローチを提案する。また,Valsalva操作によるCTを用いて,ヘルニアの大きさ,容積,腹部壁の不安定性を自動検出・評価するHEDIも導入した。 31例の術前評価におけるHEDIの初回臨床応用は, 術後3年経過した後の無痛, ヘルニア再発を伴わない症例で, 報告例と比較して有意に改善した。 Abdominal wall defects often lead to pain, discomfort, and recurrence of incisional hernias, resulting in significant morbidity and repeated surgical repairs worldwide. Mesh repair for large hernias is usually based on the defect area with a fixed overlap, neglecting biomechanical factors such as muscle activation, intra-abdominal pressure, tissue elasticity, and abdominal wall distension. To address this issue, we present a biomechanical approach to incisional hernia repair that takes into account the unstable abdominal wall. Additionally, we introduce HEDI, a tool that uses computed tomography with Valsalva maneuver to automatically detect and assess hernia size, volume, and abdominal wall instability. Our first clinical application of HEDI in the preoperative evaluation of 31 patients shows significantly improved success rates compared to reported rates, with all patients remaining pain-free and experiencing no hernia recurrence after three years of follow-up.	翻訳日:2024-06-19 12:50:30 公開日:2024-06-17
# 近似二重量子ドットにおけるフェルミオンパリティ量子ビット Fermion-parity qubit in a proximitized double quantum dot ( http://arxiv.org/abs/2307.05678v2 ) ライセンス: Link先を確認	Max Geier, Rubén Seoane Souto, Jens Schulenborg, Serwan Asaad, Martin Leijnse, Karsten Flensberg,	(参考訳) 超伝導体に結合した量子ドットのバウンド状態は、異なる電子数を持つが同じフェルミオンパリティを持つ状態のコヒーレントな重ね合わせにある。静電ゲーティングは、この重ね合わせを、電子数パリティとは無関係に同じ平均電荷を持つ量子ドットのスイートスポットに調整することができる。ここでは,ジョセフソン接合に埋め込まれた2つのトンネル結合量子ドットの局所フェルミオンパリティの量子情報を符号化する。スイートスポットでは、クォービット状態は電荷双極子モーメントがゼロである。これにより、各ドットの電位に作用する電荷ノイズと(弱)ドット間トンネルのゆらぎにより、クォービットが劣化するのを防ぐ。弱いドット間トンネルでは、不整合量子ビット状態のため緩和が抑制される。一方、強いドット間トンネルの場合、システムはそれぞれの量子ドットに影響を与えるノイズ(エネルギーレベルノイズ、ドット-超伝導トンネル変動、超微細相互作用)に対して保護される。最後に、ゲート電圧をパルスすることで、初期化および読み出し、およびシングルキュービットおよび2キュービットゲートを記述する。 Bound states in quantum dots coupled to superconductors can be in a coherent superposition of states with different electron number but with the same fermion parity. Electrostatic gating can tune this superposition to a sweet spot, where the quantum dot has the same mean electric charge independent of its electron-number parity. Here, we propose to encode quantum information in the local fermion parity of two tunnel-coupled quantum dots embedded in a Josephson junction. At the sweet spot, the qubit states have zero charge dipole moment. This protects the qubit from dephasing due to charge noise acting on the potential of each dot, as well as fluctuations of the (weak) inter-dot tunneling. At weak inter-dot tunneling, relaxation is suppressed because of disjoint qubit states. On the other hand, for strong inter-dot tunneling the system is protected against noise affecting each quantum dot separately (energy level noise, dot-superconductor tunneling fluctuations, and hyperfine interactions). Finally, we describe initialization and readout as well as single-qubit and two-qubit gates by pulsing gate voltages.	翻訳日:2024-06-19 12:50:30 公開日:2024-06-17
# ディープニューラルネットワークにおける定量的CLT Quantitative CLTs in Deep Neural Networks ( http://arxiv.org/abs/2307.06092v5 ) ライセンス: Link先を確認	Stefano Favaro, Boris Hanin, Domenico Marinucci, Ivan Nourdin, Giovanni Peccati,	(参考訳) ランダムなガウス重みとバイアスを持つ完全連結ニューラルネットワークの分布について検討し,隠れた層幅が大きな定数$n$に比例することを示した。非線型性に関する軽度な仮定の下では、大まかではあるが有限の$n$および任意の固定されたネットワーク深さで有効な正規近似の量的境界を求める。我々の定理は有限次元分布と全過程について、ランダムに連結されたネットワーク(およびその導関数)と対応する無限幅ガウス過程の間の距離が$n^{-\gamma}$ for $\gamma>0$ であることを示す。我々の境界は、それまでの文献よりもネットワーク幅に依存しているという点で強く、一次元の場合、それらが最適であること、すなわち一致した下界を確立することを証明する。 We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.	翻訳日:2024-06-19 12:50:30 公開日:2024-06-17
# 形式的解釈性を考慮した確率的制約付き強化学習 Probabilistic Constrained Reinforcement Learning with Formal Interpretability ( http://arxiv.org/abs/2307.07084v4 ) ライセンス: Link先を確認	Yanran Wang, Qiuchen Qian, David Boyle,	(参考訳) 強化学習は、変動力学を用いた逐次決定問題に対する効果的な推論を提供することができる。しかし、実際的な実装におけるそのような推論は、報酬関数と対応する最適ポリシーを解釈する上で、永続的な課題となる。したがって、逐次意思決定問題を確率的推論として表すことは、原則として、この推論は、確率的力学を推論し、政策最適化の確率論的解釈を示唆しながら、多様で強力な数学的ツールを提供する。本研究では,これらの解釈可能性問題に対処するために,適応ワッサースタイン変分最適化(AWaVO)を提案する。提案手法は,コンバージェンス保証の解釈可能性,透明性の訓練,本質的な決定解釈を実現するために形式的手法を用いる。その実用性を示すために,シミュレーションおよび実運用4次タスクにおいて,最適な大域収束率で解釈可能性を示す。 TRPO-IPO、PCPO、CRPOといった最先端のベンチマークと比較して、AWaVOがハイパフォーマンスと十分な解釈可能性の間に合理的なトレードオフをもたらすことを実証的に検証する。 Reinforcement learning can provide effective reasoning for sequential decision-making problems with variable dynamics. Such reasoning in practical implementation, however, poses a persistent challenge in interpreting the reward function and the corresponding optimal policy. Consequently, representing sequential decision-making problems as probabilistic inference can have considerable value, as, in principle, the inference offers diverse and powerful mathematical tools to infer the stochastic dynamics whilst suggesting a probabilistic interpretation of policy optimization. In this study, we propose a novel Adaptive Wasserstein Variational Optimization, namely AWaVO, to tackle these interpretability challenges. Our approach uses formal methods to achieve the interpretability for convergence guarantee, training transparency, and intrinsic decision-interpretation. To demonstrate its practicality, we showcase guaranteed interpretability with an optimal global convergence rate in simulation and in practical quadrotor tasks. In comparison with state-of-the-art benchmarks including TRPO-IPO, PCPO and CRPO, we empirically verify that AWaVO offers a reasonable trade-off between high performance and sufficient interpretability.	翻訳日:2024-06-19 12:50:30 公開日:2024-06-17
# 大規模言語モデルを用いた深度検索のためのソフトプロンプトチューニング Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models ( http://arxiv.org/abs/2307.08303v5 ) ライセンス: Link先を確認	Zhiyuan Peng, Xuyang Wu, Qifan Wang, Yi Fang,	(参考訳) Dense Search (DR) はクエリとドキュメントを密埋め込みに変換し、ベクトル空間におけるクエリとドキュメント間の類似度を測定する。 DRの課題のひとつは、ドメイン固有のトレーニングデータがないことだ。 DRモデルは、転送学習を通じてMS MARCOのような大規模な公開データセットから学習することができるが、すべてのDRモデルやドメインが転送学習の恩恵を受けるわけではないことが証拠として示される。近年、一部の研究者はゼロショットと少数ショットのDRモデルを改善するために大規模言語モデル(LLM)を活用している。しかし、これらの作業で使われるハードプロンプトや人書きプロンプトは、生成された弱いクエリの質を保証できない。そこで本研究では,DR(SPTAR)強化のためのソフトプロンプトチューニングを提案する。各タスクに対して,ソフトプロンプトチューニングを活用して,限られた真実データに基づいてタスク固有のソフトプロンプトを最適化する。我々は、弱いタグ付きクエリの品質をさらに向上させるために、高品質な文書クエリペアを選択するフィルタを設計する。我々の知る限り、DRモデルを増強するためにソフトプロンプトチューニングを利用する事前の作業はない。実験により、SPTARは、教師なしベースラインBM25と、最近提案されたDRのLLMベースの拡張法よりも優れていることが示された。 Dense retrieval (DR) converts queries and documents into dense embeddings and measures the similarity between queries and documents in vector space. One of the challenges in DR is the lack of domain-specific training data. While DR models can learn from large-scale public datasets like MS MARCO through transfer learning, evidence shows that not all DR models and domains can benefit from transfer learning equally. Recently, some researchers have resorted to large language models (LLMs) to improve the zero-shot and few-shot DR models. However, the hard prompts or human-written prompts utilized in these works cannot guarantee the good quality of generated weak queries. To tackle this, we propose soft prompt tuning for augmenting DR (SPTAR): For each task, we leverage soft prompt-tuning to optimize a task-specific soft prompt on limited ground truth data and then prompt the LLMs to tag unlabeled documents with weak queries, yielding enough weak document-query pairs to train task-specific dense retrievers. We design a filter to select high-quality example document-query pairs in the prompt to further improve the quality of weak tagged queries. 
To the best of our knowledge, there is no prior work utilizing soft prompt tuning to augment DR models. The experiments demonstrate that SPTAR outperforms the unsupervised baseline BM25 and the recently proposed LLM-based augmentation method for DR.	翻訳日:2024-06-19 12:50:30 公開日:2024-06-17
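The dense retrieval step described in this abstract (embed queries and documents, then compare them in vector space) can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is one common choice and the abstract does not say which similarity SPTAR uses, and the toy embeddings and the names `cosine_similarity` and `rank_documents` are hypothetical rather than taken from the paper's code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_embedding, doc_embeddings):
    """Return document ids sorted by descending similarity to the query embedding."""
    scored = [
        (cosine_similarity(query_embedding, emb), doc_id)
        for doc_id, emb in doc_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


# Toy 2-dimensional embeddings; a real dense retriever would produce these with an encoder.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(rank_documents([1.0, 0.1], docs))
```

In the setting the abstract describes, the weak document-query pairs produced by the prompted LLM would serve as training data for the encoder that generates such embeddings.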
# 構文対応複合価値ニューラルマシン翻訳 Syntax-Aware Complex-Valued Neural Machine Translation ( http://arxiv.org/abs/2307.08586v2 ) ライセンス: Link先を確認	Yang Liu, Yuexian Hou,	(参考訳) シンタクスは神経機械翻訳(NMT)において極めて効果的であることが証明されている。従来のモデルは構文解析ツールから構文情報を取得し、翻訳性能を向上させるためにNMTモデルに統合された。本研究では,構文情報を複雑なエンコーダ・デコーダアーキテクチャに組み込む手法を提案する。提案モデルは,単語レベルと構文レベルのアテンションスコアを,アテンション機構を用いて,ソース側からターゲット側へ共同で学習する。重要なのは、特定のネットワークアーキテクチャに依存しておらず、既存のシークエンス・ツー・シーケンス(Seq2Seq)フレームワークに直接統合可能であることだ。実験により,提案手法は2つのデータセット上でのBLEUスコアを大幅に改善できることを示した。特に,提案手法は,意味的な構文的差異を持つ言語ペアを含む翻訳タスクにおいて,BLEUスコアをより向上させる。 Syntax has been proven to be remarkably effective in neural machine translation (NMT). Previous models obtained syntax information from syntactic parsing tools and integrated it into NMT models to improve translation performance. In this work, we propose a method to incorporate syntax information into a complex-valued Encoder-Decoder architecture. The proposed model jointly learns word-level and syntax-level attention scores from the source side to the target side using an attention mechanism. Importantly, it is not dependent on specific network architectures and can be directly integrated into any existing sequence-to-sequence (Seq2Seq) framework. The experimental results demonstrate that the proposed method can bring significant improvements in BLEU scores on two datasets. In particular, the proposed method achieves a greater improvement in BLEU scores in translation tasks involving language pairs with significant syntactic differences.	翻訳日:2024-06-19 12:50:30 公開日:2024-06-17
# 大規模言語モデルに基づくファズドライバ生成の理解 Understanding Large Language Model Based Fuzz Driver Generation ( http://arxiv.org/abs/2307.12469v4 ) ライセンス: Link先を確認	Cen Zhang, Mingqiang Bai, Yaowen Zheng, Yeting Li, Wei Ma, Xiaofei Xie, Yuekang Li, Limin Sun, Yang Liu,	(参考訳) LLM(Large Language Model)ベースのファズドライバ生成は有望な研究分野である。従来のプログラム分析ベースの手法とは異なり、このテキストベースのアプローチはより一般的であり、様々なAPI使用情報を利用でき、人間にとって読みやすいコードを生成する。しかし、その効果や潜在的な課題など、この方向の根本的な問題に対する理解はまだ不足している。このギャップを埋めるために,LLMを用いてファズドライバを効果的に生成する上での重要な課題を対象とした,最初の詳細な研究を行った。本研究は,30の広く利用されているCプロジェクトから86のファズドライバ生成質問を収集した,キュレートされたデータセットを特徴とする。 6つのプロンプト戦略を設計し、5つの異なる温度設定の下で5つの最先端LLMを用いてテストした。合計で736,430個の生成されたファズドライバを評価し、トークンコストは8.5億トークン(課金額8,000ドル以上)であった。さらに,LLM生成ドライバを産業用ドライバと比較し,大規模なファジング実験(3.75 CPU年)を行った。本研究で明らかになったのは次の点である。(1) LLMベースのファズドライバ生成は有望な方向であるが、実用化に向けていくつかの障害に直面している。(2) LLMは複雑な仕様を持つAPIに対して効果的なファズドライバを生成するのに困難を抱えている。プロンプト戦略の設計上の選択として、繰り返しクエリの発行、例を用いたクエリ、反復的なクエリプロセスの採用の3つが有効である。(3) LLMが生成したドライバは業界で使用されているものと同等のファジング結果を得ることができるが、含まれるAPI使用の拡張や、論理バグ検出を容易にするセマンティックオラクルの統合など、大きな改善の余地がある。我々の知見はOSS-Fuzz-Genプロジェクトの改善のために実装され、業界における実用的なファズドライバ生成を促進している。 LLM-based (Large Language Model) fuzz driver generation is a promising research area. Unlike traditional program analysis-based methods, this text-based approach is more general and capable of harnessing a variety of API usage information, resulting in code that is friendly for human readers. However, there is still a lack of understanding regarding the fundamental issues in this direction, such as its effectiveness and potential challenges. To bridge this gap, we conducted the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers. Our study features a curated dataset with 86 fuzz driver generation questions from 30 widely-used C projects. Six prompting strategies are designed and tested across five state-of-the-art LLMs with five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, at a cost of 0.85 billion tokens (over $8,000 in charged tokens). 
Additionally, we compared the LLM-generated drivers against those utilized in industry, conducting extensive fuzzing experiments (3.75 CPU-years). Our study uncovered that: (1) While LLM-based fuzz driver generation is a promising direction, it still encounters several obstacles towards practical applications. (2) LLMs face difficulties in generating effective fuzz drivers for APIs with intricate specifics. Three featured design choices of prompt strategies can be beneficial: issuing repeat queries, querying with examples, and employing an iterative querying process. (3) While LLM-generated drivers can yield fuzzing outcomes that are on par with those used in the industry, there are substantial opportunities for enhancement, such as extending contained API usage, or integrating semantic oracles to facilitate logical bug detection. Our insights have been implemented to improve the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.	翻訳日:2024-06-19 12:50:30 公開日:2024-06-17
# 繰り返しビーム分割によるコヒーレンス Coherence via reiterated beam splitting ( http://arxiv.org/abs/2307.13279v2 ) ライセンス: Link先を確認	Guillermo Díez, Laura Ares, Alfredo Luis,	(参考訳) ビームスプリッターは、量子コヒーレンスに関して自由でない操作である。その結果、コヒーレント状態と非コヒーレント状態の両方からコヒーレンスを生成することができる。ビームスプリッタのカスケードによるコヒーレンスの増加について検討した。この目的のために、2つの異なる構成を構築し、入力状態の異なるシーケンスを解析する。 Beam splitters are not-free operations with regard to quantum coherence. As a consequence, they can create coherence from both coherent and incoherent states. We investigate the increase in coherence produced by cascades of beam splitters. To this end, we construct two different configurations and analyze different sequences of input states.	翻訳日:2024-06-19 12:50:30 公開日:2024-06-17
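As a small worked illustration of how a beam splitter can create coherence from an incoherent input, consider a single photon entering one port of a 50:50 beam splitter. This is a textbook example under stated assumptions, not a result from the paper: the phase convention for the beam splitter and the use of the $\ell_1$-norm of coherence in the photon-number basis are choices made here for illustration, and the authors may use a different measure.

```latex
\[
|1\rangle_a |0\rangle_b \;\longmapsto\;
\frac{1}{\sqrt{2}}\left( |1\rangle_a |0\rangle_b + i\,|0\rangle_a |1\rangle_b \right),
\qquad
C_{\ell_1}(\rho) = \sum_{j \neq k} |\rho_{jk}| :\quad
0 \;\longmapsto\; \tfrac{1}{2} + \tfrac{1}{2} = 1 .
\]
```

The input is a single number-basis state, so its density matrix is diagonal and its coherence is zero; the output has two off-diagonal elements of modulus $1/2$ each.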
# 最近傍法によるミニマックス最適Q学習 Minimax Optimal Q Learning with Nearest Neighbors ( http://arxiv.org/abs/2308.01490v2 ) ライセンス: Link先を確認	Puning Zhao, Lifeng Lai,	(参考訳) マルコフ決定プロセス(MDP)を連続状態空間で解析することは一般的に困難である。最近の興味深い研究(Shah and Xie, 2018)は、最近傍$Q$学習アプローチによって有界な連続状態空間を持つ MDP を解き、そのサンプル複雑性は割引率 $\gamma$ の下での $\epsilon$ 精度の $Q$ 関数推定に対して $\tilde{O}(\frac{1}{\epsilon^{d+3}(1-\gamma)^{d+7}})$ である。本稿では,オフライン設定用とオンライン設定用という,2つの新しい最近傍$Q$学習法を提案する。これら2つの方法のサンプル複雑度は、オフラインとオンラインでそれぞれ $\tilde{O}(\frac{1}{\epsilon^{d+2}(1-\gamma)^{d+2}})$ と $\tilde{O}(\frac{1}{\epsilon^{d+2}(1-\gamma)^{d+3}})$ であり、既存の結果を大幅に改善し、$\epsilon$ に対してミニマックス最適な依存性を持つ。サンプルをより効率的に利用することで、この改善を達成している。特に、Shah and Xie (2018) の手法は各反復後にすべてのサンプルを破棄するので、これらのサンプルは幾らか無駄になる。一方、オフライン手法はサンプルを削除せず、オンライン手法は、時刻 $t$ において $\beta t$ より前のサンプルのみを削除する($\beta$ は調整可能なパラメータ)ため、情報の損失を大幅に減らす。サンプルの複雑さを別にすれば、我々の手法は計算の複雑さを向上し、非有界な状態空間に適しているという利点もある。 Analyzing the Markov decision process (MDP) with continuous state spaces is generally challenging. A recent interesting work by Shah and Xie (2018) solves MDP with bounded continuous state space by a nearest neighbor $Q$ learning approach, which has a sample complexity of $\tilde{O}(\frac{1}{\epsilon^{d+3}(1-\gamma)^{d+7}})$ for $\epsilon$-accurate $Q$ function estimation with discount factor $\gamma$. In this paper, we propose two new nearest neighbor $Q$ learning methods, one for the offline setting and the other for the online setting. We show that the sample complexities of these two methods are $\tilde{O}(\frac{1}{\epsilon^{d+2}(1-\gamma)^{d+2}})$ and $\tilde{O}(\frac{1}{\epsilon^{d+2}(1-\gamma)^{d+3}})$ for offline and online methods respectively, which significantly improve over existing results and have minimax optimal dependence on $\epsilon$. We achieve such improvement by utilizing the samples more efficiently. In particular, the method of Shah and Xie (2018) discards all samples after each iteration, thus these samples are somewhat wasted. 
On the other hand, our offline method does not remove any samples, and our online method only removes samples with time earlier than $\beta t$ at time $t$ with $\beta$ being a tunable parameter, thus our methods significantly reduce the loss of information. Apart from the sample complexity, our methods also have additional advantages of better computational complexity, as well as suitability to unbounded state spaces.	翻訳日:2024-06-19 12:40:28 公開日:2024-06-17
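To make the size of the claimed improvement concrete, the earlier bound can be divided by each of the new bounds quoted in this abstract. This is plain arithmetic on the stated exponents and ignores the logarithmic factors hidden in $\tilde{O}$; the first ratio is for the offline method and the second for the online method.

```latex
\[
\frac{\epsilon^{-(d+3)}(1-\gamma)^{-(d+7)}}{\epsilon^{-(d+2)}(1-\gamma)^{-(d+2)}}
= \frac{1}{\epsilon\,(1-\gamma)^{5}},
\qquad
\frac{\epsilon^{-(d+3)}(1-\gamma)^{-(d+7)}}{\epsilon^{-(d+2)}(1-\gamma)^{-(d+3)}}
= \frac{1}{\epsilon\,(1-\gamma)^{4}} .
\]
```

For example, with $\epsilon = 0.1$ and $\gamma = 0.9$ these ratios evaluate to $10^{6}$ and $10^{5}$ respectively, independent of the state dimension $d$.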
# 超流動ヘリウム中の単一電子スピン検出法の提案 A proposal for detecting the spin of a single electron in superfluid helium ( http://arxiv.org/abs/2308.07174v2 ) ライセンス: Link先を確認	Jinyong Ma, Y. S. S. Patil, Jiaxin Yu, Yiqi Wang, J. G. E. Harris,	(参考訳) 超流動ヘリウム中の電子バブルは2つの自由度を持ち、電子のスピンと気泡の運動という非常に低い散逸をもたらす可能性がある。これらの自由度が十分な感度で読み出され、制御できるなら、様々な量子技術を実現し、超流動ヘリウムの物理学におけるオープンな疑問を探求するための新しいプラットフォームを提供するだろう。本稿では,超流動充填光音響キャビティ内で電子気泡を捕捉し,これを実現するための実用的な手法を提案する。 The electron bubble in superfluid helium has two degrees of freedom that may offer exceptionally low dissipation: the electron's spin and the bubble's motion. If these degrees of freedom can be read out and controlled with sufficient sensitivity, they would provide a novel platform for realizing a range of quantum technologies and for exploring open questions in the physics of superfluid helium. Here we propose a practical scheme for accomplishing this by trapping an electron bubble inside a superfluid-filled opto-acoustic cavity.	翻訳日:2024-06-19 12:40:28 公開日:2024-06-17
# Image-to-Point Cloud Saliency Transferを用いた注意誘導ライダーセグメンテーションとオドメトリー Attention-Guided Lidar Segmentation and Odometry Using Image-to-Point Cloud Saliency Transfer ( http://arxiv.org/abs/2308.14332v2 ) ライセンス: Link先を確認	Guanqun Ding, Nevrez Imamoglu, Ali Caglayan, Masahiro Murakawa, Ryosuke Nakamura,	(参考訳) LiDARオドメトリ推定と3Dセマンティックセグメンテーションは自動運転に不可欠であり、近年顕著な進歩を遂げている。しかし,これらの課題は,3次元セマンティックセグメンテーションの異なるセマンティックカテゴリにおけるポイントの不均衡や,LiDARオドメトリ推定における動的オブジェクトの影響により困難であり,ロバストな特徴学習のための参照ポイントとして代表的/顕著なランドマークを使用することの重要性が高まっている。これらの課題に対処するために、注意情報を活用してLiDARオドメトリ推定とセマンティックセグメンテーションモデルの性能を向上させる顕著性誘導手法を提案する。画像領域とは異なり、注釈付きトレーニングデータがないため、ポイントクラウドの顕著性情報に対処した研究はごくわずかである。これを緩和するために、私たちはまず、カラー画像からポイントクラウドに顕著性分布の知識を伝達するための普遍的なフレームワークを提示し、これを用いてポイントクラウドのための擬似顕著性データセット(すなわちFordSaliency)を構築する。次に,ポイントクラウドベースのバックボーンを採用して擬似顕著性ラベルから顕著性分布を学習し,それに続いてSalLiDARモジュールを提案する。 SalLiDARは顕著性誘導型の3次元セマンティックセグメンテーションモデルであり、セグメンテーション性能を向上させるために顕著性情報を統合する。最後に、SalLiDARのセマンティックおよび顕著性予測を用いて、より優れたオドメトリ推定を実現する自己教師型顕著性誘導LiDARオドメトリネットワークであるSalLONetを紹介する。ベンチマークデータセットでの広範な実験により,提案したSalLiDARモデルとSalLONetモデルが既存の手法に対して最先端の性能を達成することを示し,画像からLiDARへの顕著性知識伝達の有効性を明らかにした。ソースコードはhttps://github.com/nevrez/SalLONetで公開予定である。 LiDAR odometry estimation and 3D semantic segmentation are crucial for autonomous driving, and both have seen remarkable advances recently. However, these tasks are challenging due to the imbalance of points in different semantic categories for 3D semantic segmentation and the influence of dynamic objects for LiDAR odometry estimation, which increases the importance of using representative/salient landmarks as reference points for robust feature learning. To address these challenges, we propose a saliency-guided approach that leverages attention information to improve the performance of LiDAR odometry estimation and semantic segmentation models. Unlike in the image domain, only a few studies have addressed point cloud saliency information due to the lack of annotated training data. 
To alleviate this, we first present a universal framework to transfer saliency distribution knowledge from color images to point clouds, and use this to construct a pseudo-saliency dataset (i.e. FordSaliency) for point clouds. Then, we adopt point cloud-based backbones to learn saliency distribution from pseudo-saliency labels, which is followed by our proposed SalLiDAR module. SalLiDAR is a saliency-guided 3D semantic segmentation model that integrates saliency information to improve segmentation performance. Finally, we introduce SalLONet, a self-supervised saliency-guided LiDAR odometry network that uses the semantic and saliency predictions of SalLiDAR to achieve better odometry estimation. Our extensive experiments on benchmark datasets demonstrate that the proposed SalLiDAR and SalLONet models achieve state-of-the-art performance against existing methods, highlighting the effectiveness of image-to-LiDAR saliency knowledge transfer. Source code will be available at https://github.com/nevrez/SalLONet.	翻訳日:2024-06-19 12:40:28 公開日:2024-06-17
# IDVT:ソーシャルレコメンデーションのための関心を考慮したデノイジングとビュー誘導チューニング IDVT: Interest-aware Denoising and View-guided Tuning for Social Recommendation ( http://arxiv.org/abs/2308.15926v2 ) ライセンス: Link先を確認	Dezhao Yang, Jianghong Ma, Shanshan Feng, Haijun Zhang, Zhao Zhang,	(参考訳) 情報時代のレコメンデーションシステムは,情報のフィルタリングやユーザの好みの特定に不可欠である。オンラインソーシャルプラットフォームは、貴重な補助情報を提供することで、これらのシステムを豊かにしている。ソーシャル接続されたユーザは同様の好みを共有すると想定されており、これがレコメンデーションの精度を高め、コールドスタートの問題に対処する。しかし、実証的な発見は、特定の社会的つながりがシステムのパフォーマンスを実際に損なう可能性があることを明らかにし、この仮定に疑問を投げかける。我々の統計分析は、ソーシャルネットワークにかなりの量のノイズがあり、ソーシャル接続された多くのユーザが共通の関心を共有していないことを示している。この問題に対処するために,ソーシャルレコメンデーションのための革新的な Interest-aware Denoising and View-guided Tuning (IDVT) 手法を提案する。第1のID部は、社会的つながりを効果的にデノイズする。具体的には、デノイズ処理はソーシャルネットワークの構造とユーザインタラクションの関心の両方をグローバルな視点で考慮する。さらに、このグローバルな視点では、デノイズされたソーシャル情報(社会ドメイン)を、ユーザとアイテムの相互作用(協調ドメイン)の伝播に統合し、ゲーティング機構を用いて2つのドメインからのユーザ表現を集約する。第2のVT部では、ユーザ関心の潜在的な損失に対処し、グローバルビュー内のモデルロバスト性を高めるために、コントラスト学習を通じてグローバルビューのユーザ表現を微調整するための2つの追加ビュー(ローカルビューとドロップアウト強化ビュー)を導入している。ノイズ比の異なる実世界のデータセットに対する広範囲な評価は、最先端のソーシャルレコメンデーション手法よりもIDVTの方が優れていることを示す。 In the information age, recommendation systems are vital for efficiently filtering information and identifying user preferences. Online social platforms have enriched these systems by providing valuable auxiliary information. Socially connected users are assumed to share similar preferences, enhancing recommendation accuracy and addressing cold start issues. However, empirical findings challenge the assumption, revealing that certain social connections can actually harm system performance. Our statistical analysis indicates a significant amount of noise in the social network, where many socially connected users do not share common interests. To address this issue, we propose an innovative Interest-aware Denoising and View-guided Tuning (IDVT) method for social recommendation. The first ID part effectively denoises social connections. 
Specifically, the denoising process considers both social network structure and user interaction interests in a global view. Moreover, in this global view, we also integrate denoised social information (social domain) into the propagation of the user-item interactions (collaborative domain) and aggregate user representations from two domains using a gating mechanism. To tackle potential user interest loss and enhance model robustness within the global view, our second VT part introduces two additional views (local view and dropout-enhanced view) for fine-tuning user representations in the global view through contrastive learning. Extensive evaluations on real-world datasets with varying noise ratios demonstrate the superiority of IDVT over state-of-the-art social recommendation methods.	翻訳日:2024-06-19 12:40:28 公開日:2024-06-17
# ICLEF: 説明可能なスタイル転送のためのエキスパートフィードバックによるインコンテキスト学習 ICLEF: In-Context Learning with Expert Feedback for Explainable Style Transfer ( http://arxiv.org/abs/2309.08583v2 ) ライセンス: Link先を確認	Arkadiy Saakyan, Smaranda Muresan,	(参考訳) 最先端の大規模言語モデル(LLM)は、あるスタイルから別のスタイルへのテキストの適応に優れるが、現在の作業はスタイル転送モデルの説明可能性に対処するものではない。近年の研究では、より大きな教師モデルからテキストによる説明を作成し、それをより小さな学生モデルに蒸留する方法が検討されている。このアプローチの課題の1つは、LCM出力には、修正する専門知識を必要とするエラーが含まれているかもしれないが、コストと可用性のために専門家のフィードバックを集め、取り入れることは困難である。この課題に対処するため,本論文では,文脈内学習と自己批判を組み合わせ,少ない専門家によるフィードバックを取り入れた,新しい人間-AI協調型蒸留手法であるICLEFを提案する。提案手法は,形式性(e-GYAFC)と主観的バイアス(e-WNC)のための高品質な合成説明可能なスタイル転送データセットを生成する。自動的, 人的評価により, 一般教師モデルでは, 単発で説明可能なスタイル伝達タスクにおいて, 教師モデルよりも優れ, 教師モデルと比較し, データの質と専門家のフィードバックの役割を強調した。本研究は,e-GYAFCで微調整された小型モデルによる説明は,教師による説明よりも著者の予測性が高いことを示す。 While state-of-the-art large language models (LLMs) can excel at adapting text from one style to another, current work does not address the explainability of style transfer models. Recent work has explored generating textual explanations from larger teacher models and distilling them into smaller student models. One challenge with such approach is that LLM outputs may contain errors that require expertise to correct, but gathering and incorporating expert feedback is difficult due to cost and availability. To address this challenge, we propose ICLEF, a novel human-AI collaboration approach to model distillation that incorporates scarce expert human feedback by combining in-context learning and model self-critique. We show that our method leads to generation of high-quality synthetic explainable style transfer datasets for formality (e-GYAFC) and subjective bias (e-WNC). Via automatic and human evaluation, we show that specialized student models fine-tuned on our datasets outperform generalist teacher models on the explainable style transfer task in one-shot settings, and perform competitively compared to few-shot teacher models, highlighting the quality of the data and the role of expert feedback. 
In an extrinsic task of authorship attribution, we show that explanations generated by smaller models fine-tuned on e-GYAFC are more predictive of authorship than explanations generated by few-shot teacher models.	翻訳日:2024-06-19 12:40:28 公開日:2024-06-17
# 性能保証を用いたρ-POMDPの簡易化 Measurement Simplification in ρ-POMDP with Performance Guarantees ( http://arxiv.org/abs/2309.10701v2 ) ライセンス: Link先を確認	Tom Yotam, Vadim Indelman,	(参考訳) 不確実性の下での意思決定は、不完全な情報で行動する自律システムの中心にある。意思決定問題を解決するコストは行動や観察空間において指数関数的であり、多くのオンラインシステムでは実現不可能である。本稿では,高次元観測空間を分割することで,効率的な意思決定手法を提案する。分割された観測空間を用いて、一般的な信念分布に対する期待される情報理論的報酬に関する解析的境界を定式化する。これらのバウンダリは、パフォーマンスの保証を維持しながら、効率的に計画するために使用される。境界は適応的で、計算効率が良く、元の解に収束していることが示される。分割パラダイムを拡張し、分割空間の階層構造を示し、計画の効率性を高める。次に、ガウス的信念に対するこれらの境界の特定の変種を提案し、少なくとも 4 の係数の理論的性能改善を示す。最後に,本手法を,能動SLAMシナリオ,シミュレーション,実実験において,他の最先端アルゴリズムと比較する。どちらの場合も、性能保証を伴う計画の大幅なスピードアップを示します。 Decision making under uncertainty is at the heart of any autonomous system acting with imperfect information. The cost of solving the decision making problem is exponential in the action and observation spaces, thus rendering it unfeasible for many online systems. This paper introduces a novel approach to efficient decision-making, by partitioning the high-dimensional observation space. Using the partitioned observation space, we formulate analytical bounds on the expected information-theoretic reward, for general belief distributions. These bounds are then used to plan efficiently while keeping performance guarantees. We show that the bounds are adaptive, computationally efficient, and that they converge to the original solution. We extend the partitioning paradigm and present a hierarchy of partitioned spaces that allows greater efficiency in planning. We then propose a specific variant of these bounds for Gaussian beliefs and show a theoretical performance improvement of at least a factor of 4. Finally, we compare our novel method to other state of the art algorithms in active SLAM scenarios, in simulation and in real experiments. In both cases we show a significant speed-up in planning with performance guarantees.	翻訳日:2024-06-19 12:40:28 公開日:2024-06-17
# 広告における金のストライク:広告テキスト生成の標準化と探索 Striking Gold in Advertising: Standardization and Exploration of Ad Text Generation ( http://arxiv.org/abs/2309.12030v2 ) ライセンス: Link先を確認	Masato Mita, Soichiro Murakami, Akihiko Kato, Peinan Zhang,	(参考訳) 手動広告作成の限界に対応するため、自動広告テキスト生成(ATG)分野において重要な研究がなされている。しかし、包括的なベンチマークと明確に定義された問題セットの欠如は、異なる方法の比較を困難にしている。これらの課題に対処するため、ATGのタスクを標準化し、マルチモーダル情報の利用を慎重に設計し、産業的評価を容易にする第1のベンチマークデータセットであるCAMERAを提案する。従来の手法から,大規模言語モデル(LLM)を含む最先端モデルまで,9つのベースラインによる広範な実験は,現状と今後の課題を示している。また、ATGの既存の指標とLLMに基づく評価器が人間の評価とどのように一致しているかについても検討する。 In response to the limitations of manual ad creation, significant research has been conducted in the field of automatic ad text generation (ATG). However, the lack of comprehensive benchmarks and well-defined problem sets has made comparing different methods challenging. To tackle these challenges, we standardize the task of ATG and propose a first benchmark dataset, CAMERA, carefully designed and enabling the utilization of multi-modal information and facilitating industry-wise evaluations. Our extensive experiments with a variety of nine baselines, from classical methods to state-of-the-art models including large language models (LLMs), show the current state and the remaining challenges. We also explore how existing metrics in ATG and an LLM-based evaluator align with human evaluations.	翻訳日:2024-06-19 12:40:27 公開日:2024-06-17
# 活性物質の深層学習確率フローとエントロピー生成速度 Deep learning probability flows and entropy production rates in active matter ( http://arxiv.org/abs/2309.12991v2 ) ライセンス: Link先を確認	Nicholas M. Boffi, Eric Vanden-Eijnden,	(参考訳) 自己推進コロイドから運動性細菌への活性物質系は、顕微鏡スケールで、自由エネルギーを有用な仕事に変換することで特徴づけられる。これらは平衡統計力学の範囲を超えて物理学を巻き込み、その非平衡状態の性質を理解することが永続的な課題である。エントロピー生成速度と確率電流は、時間反転対称性の分解を測定することで定量的な方法を提供する。しかし、それらの効率的な計算は、システムの未知かつ高次元の確率密度に依存するため、いまだ解明されていない。そこで本研究では, 生成モデリングの最近の進歩に基づき, この密度のスコアを推定する深層学習フレームワークを開発する。本研究では, 微視的運動方程式とともに, エントロピー生成速度, 確率電流, および個々の粒子からの局所的寄与への分解にアクセスできることを示す。このスコアを表現するために,粒子間の高次相互作用を基礎となる置換対称性を尊重しながら学習する,空間的に局所的なトランスフォーマーネットワークアーキテクチャを導入する。運動誘発相分離法(MIPS)を施行した活性粒子の高次元システムに適用することにより,本手法の幅広い有用性と拡張性を実証する。本研究では,4096粒子を1つの充填率で学習した1つのネットワークが,最大32768粒子を含む相図の他の領域に一般化可能であることを示す。本研究では, 粒子数と充填率の関数として, MIPSにおける平衡からの離脱の空間構造を定量化する。 Active matter systems, from self-propelled colloids to motile bacteria, are characterized by the conversion of free energy into useful work at the microscopic scale. They involve physics beyond the reach of equilibrium statistical mechanics, and a persistent challenge has been to understand the nature of their nonequilibrium states. The entropy production rate and the probability current provide quantitative ways to do so by measuring the breakdown of time-reversal symmetry. Yet, their efficient computation has remained elusive, as they depend on the system's unknown and high-dimensional probability density. Here, building upon recent advances in generative modeling, we develop a deep learning framework to estimate the score of this density. We show that the score, together with the microscopic equations of motion, gives access to the entropy production rate, the probability current, and their decomposition into local contributions from individual particles. To represent the score, we introduce a novel, spatially-local transformer network architecture that learns high-order interactions between particles while respecting their underlying permutation symmetry. 
We demonstrate the broad utility and scalability of the method by applying it to several high-dimensional systems of active particles undergoing motility-induced phase separation (MIPS). We show that a single network trained on a system of 4096 particles at one packing fraction can generalize to other regions of the phase diagram, including systems with as many as 32768 particles. We use this observation to quantify the spatial structure of the departure from equilibrium in MIPS as a function of the number of particles and the packing fraction.	翻訳日:2024-06-19 12:30:40 公開日:2024-06-17
# ValueDCG: 言語モデルの包括的人間的価値理解能力の測定 ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models ( http://arxiv.org/abs/2310.00378v4 ) ライセンス: Link先を確認	Zhaowei Zhang, Fengshuo Bai, Jun Gao, Yaodong Yang,	(参考訳) 人的価値は人間の意思決定の背後にある重要な要素である。大きな言語モデル(LLM)が人間の決定に大きく影響していることを考えると、人間の価値を正確に理解して安全性を確保することが不可欠である。しかし、これらの値の把握は、その値が複雑で適応可能な性質のため複雑である。 LLMの価値を真に理解するには、"know what"と"know why"の両方を考慮する必要がある、と私たちは主張する。そこで本研究では,2つの側面を定量的に評価するための総合評価指標であるValueDCG(Value Discriminator-Critique Gap)を提案する。 4つの代表的なLCMを評価し,LLMの「何」と「なぜ」の能力の成長率がパラメータ数の増加と一致しないことを示す。このことは、LLMが提供されたコンテキストに基づいて、その固有の価値を真に理解せず、潜在的なリスクを示さずに、もっともらしい説明を行うかもしれないことを示唆している。 Personal values are a crucial factor behind human decision-making. Considering that Large Language Models (LLMs) have been shown to impact human decisions significantly, it is essential to make sure they accurately understand human values to ensure their safety. However, evaluating their grasp of these values is complex due to the value's intricate and adaptable nature. We argue that truly understanding values in LLMs requires considering both "know what" and "know why". To this end, we present a comprehensive evaluation metric, ValueDCG (Value Discriminator-Critique Gap), to quantitatively assess the two aspects with an engineering implementation. We assess four representative LLMs and provide compelling evidence that the growth rates of LLM's "know what" and "know why" capabilities do not align with increases in parameter numbers, resulting in a decline in the models' capacity to understand human values as larger amounts of parameters. This may further suggest that LLMs might craft plausible explanations based on the provided context without truly understanding their inherent value, indicating potential risks.	翻訳日:2024-06-19 12:30:40 公開日:2024-06-17
# Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models ( http://arxiv.org/abs/2310.03965v3 ) License: see link	Junchi Yu, Ran He, Rex Ying	Large Language Models (LLMs) have achieved remarkable success in reasoning tasks with the development of prompting methods. However, existing prompting approaches cannot reuse insights from solving similar problems and suffer from accumulated errors in multi-step reasoning, since they prompt LLMs to reason from scratch. To address these issues, we propose Thought Propagation (TP), which explores the analogous problems and leverages their solutions to enhance the complex reasoning ability of LLMs. These analogous problems are related to the input one, with reusable solutions and problem-solving strategies. Thus, it is promising to propagate insights from solving previous analogous problems to inspire new problem-solving. To achieve this, TP first prompts LLMs to propose and solve a set of analogous problems that are related to the input one. Then, TP reuses the results of analogous problems to directly yield a new solution or derive a knowledge-intensive plan for execution to amend the initial solution obtained from scratch. TP is compatible with existing prompting approaches, allowing plug-and-play generalization and enhancement in a wide range of tasks without much labor in task-specific prompt engineering. Experiments across three challenging tasks show that TP improves substantially over the baselines, with an average 12% absolute increase in finding the optimal solutions in Shortest-path Reasoning, a 13% improvement in human preference in Creative Writing, and a 15% enhancement in the task completion rate of LLM-Agent Planning.	Translated: 2024-06-19 12:30:40 Published: 2024-06-17
# ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons ( http://arxiv.org/abs/2310.07446v4 ) License: see link	Jiawen Zhang, Xumeng Wen, Zhenwei Zhang, Shun Zheng, Jia Li, Jiang Bian	Delivering precise point and distributional forecasts across a spectrum of prediction horizons represents a significant and enduring challenge in the application of time-series forecasting within various industries. Prior research on developing deep learning models for time-series forecasting has often concentrated on isolated aspects, such as long-term point forecasting or short-term probabilistic estimations. This narrow focus may result in skewed methodological choices and hinder the adaptability of these models to uncharted scenarios. While there is a rising trend in developing universal forecasting models, a thorough understanding of their advantages and drawbacks, especially regarding essential forecasting needs like point and distributional forecasts across short and long horizons, is still lacking. In this paper, we present ProbTS, a benchmark tool designed as a unified platform to evaluate these fundamental forecasting needs and to conduct a rigorous comparative analysis of numerous cutting-edge studies from recent years. We dissect the distinctive data characteristics arising from disparate forecasting requirements and elucidate how these characteristics can skew methodological preferences in typical research trajectories, which often fail to fully accommodate essential forecasting needs. Building on this, we examine the latest models for universal time-series forecasting and discover that our analyses of methodological strengths and weaknesses are also applicable to these universal models. Finally, we outline the limitations inherent in current research and underscore several avenues for future exploration.	Translated: 2024-06-19 12:30:40 Published: 2024-06-17
# Mitigating Bias for Question Answering Models by Tracking Bias Influence ( http://arxiv.org/abs/2310.08795v2 ) License: see link	Mingyu Derek Ma, Jiun-Yu Kao, Arpit Gupta, Yu-Hsiang Lin, Wenbo Zhao, Tagyoung Chung, Wei Wang, Kai-Wei Chang, Nanyun Peng	Models of various NLP tasks have been shown to exhibit stereotypes, and the bias in question answering (QA) models is especially harmful as the output answers might be directly consumed by the end users. Datasets exist to evaluate bias in QA models, but bias mitigation techniques for QA models remain under-explored. In this work, we propose BMBI, an approach to mitigate the bias of multiple-choice QA models. Based on the intuition that a model tends to become more biased if it learns from a biased example, we measure the bias level of a query instance by observing its influence on another instance. If the influenced instance is more biased, we infer that the query instance is biased. We then use the bias level detected as an optimization objective to form a multi-task learning setting in addition to the original QA task. We further introduce a new bias evaluation metric to quantify bias in a comprehensive and sensitive way. We show that our method could be applied to multiple QA formulations across multiple bias categories. It can significantly reduce the bias level in all 9 bias categories in the BBQ dataset while maintaining comparable QA accuracy.	Translated: 2024-06-19 12:30:40 Published: 2024-06-17
# Single-molecule motion control ( http://arxiv.org/abs/2310.09296v2 ) License: see link	Divyam Neer Verma, KV Chinmaya, Jan Heck, G Mohan Rao, Sonia Contera, Moumita Ghosh, Siddharth Ghosh	Achieving dynamic manipulation and control of single molecules at high spatio-temporal resolution is pivotal for advancing atomic-scale computing and nanorobotics. However, this endeavour is critically challenged by the complex nature of atomic and molecular interactions, the high-dimensional characteristics of nanoscale systems, and the scarcity of experimental data. Here, we present a toy model for controlling single-molecule diffusion by harnessing electrostatic forces arising from elementary surface charges within a lattice structure, mimicking embedded charges on a surface. We investigate the interplay between quantum mechanics and electrostatic interactions in single-molecule diffusion processes using a combination of state-dependent diffusion equations and Green's functions. We find that surface charge density critically influences diffusion coefficients, exhibiting linear scaling akin to Coulombic forces. We achieve accurate predictions of experimental diffusion constants and extend the observed range to values reaching up to 6000 $\mu\text{m}^2\text{ms}^{-1}$ and 80000 $\mu\text{m}^2\text{ms}^{-1}$. The molecular trajectories predicted by our model bear resemblance to planetary motion, particularly in their gravity-assisted acceleration-like behaviour. The model holds transformative implications for nanorobotics, motion control at the nanoscale, and computing applications, particularly in the areas of molecular and quantum computing where the trapping of atoms and molecules is essential. Beyond the state-of-the-art optical lattice and scanning tunnelling microscopy for atomic/molecular manipulation, our findings offer the unambiguous advantage of precise control over single-molecule dynamics through quantum manipulation at the angstrom scale.	Translated: 2024-06-19 12:30:40 Published: 2024-06-17
# Multi-stage Large Language Model Correction for Speech Recognition ( http://arxiv.org/abs/2310.11532v2 ) License: see link	Jie Pu, Thai-Son Nguyen, Sebastian Stüker	In this paper, we investigate the usage of large language models (LLMs) to improve the performance of competitive speech recognition systems. Different from previous LLM-based ASR error correction methods, we propose a novel multi-stage approach that utilizes uncertainty estimation of ASR outputs and reasoning capability of LLMs. Specifically, the proposed approach has two stages: the first stage is about ASR uncertainty estimation and exploits N-best list hypotheses to identify less reliable transcriptions; the second stage works on these identified transcriptions and performs LLM-based corrections. This correction task is formulated as a multi-step rule-based LLM reasoning process, which uses explicitly written rules in prompts to decompose the task into concrete reasoning steps. Our experimental results demonstrate the effectiveness of the proposed method by showing 10% to 20% relative improvement in WER over competitive ASR systems, across multiple test domains and in zero-shot settings.	Translated: 2024-06-19 12:30:40 Published: 2024-06-17
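As a rough illustration of the first stage described in the abstract above (using the N-best list to flag less reliable transcriptions), here is a minimal Python sketch. The abstract does not say which uncertainty estimator the authors use, so everything below is an assumption made for illustration: the disagreement measure (mean length-normalized word-level edit distance between the top hypothesis and the others), the threshold of 0.2, and the function names `edit_distance`, `nbest_disagreement`, and `needs_correction` are all hypothetical.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (wa != wb)))  # substitution or match
        prev = cur
    return prev[-1]


def nbest_disagreement(nbest):
    """Mean normalized edit distance between the top hypothesis and the rest.

    Assumed proxy for ASR uncertainty; not the estimator from the paper.
    """
    top = nbest[0].split()
    rest = [h.split() for h in nbest[1:]]
    if not rest:
        return 0.0
    scores = [edit_distance(top, h) / max(len(top), len(h), 1) for h in rest]
    return sum(scores) / len(scores)


def needs_correction(nbest, threshold=0.2):
    """Flag a transcription for the (hypothetical) second-stage LLM correction."""
    return nbest_disagreement(nbest) > threshold


if __name__ == "__main__":
    stable = ["turn on the light", "turn on the light", "turn on the light"]
    unstable = ["recognize speech", "wreck a nice beach", "recognise peach"]
    print(needs_correction(stable), needs_correction(unstable))
```

Only utterances flagged by `needs_correction` would be passed on to the rule-based LLM prompt in the second stage; the prompt itself is not sketched here because its rules are not given in the abstract.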
# Progressive Dual Priori Network for Generalized Breast Tumor Segmentation ( http://arxiv.org/abs/2310.13574v2 ) License: see link	Li Wang, Lihui Wang, Zixiang Kuai, Lei Tang, Yingfeng Ou, Chen Ye, Yuemin Zhu	To promote the generalization ability of breast tumor segmentation models, as well as to improve the segmentation performance for breast tumors with smaller size, low-contrast and irregular shape, we propose a progressive dual priori network (PDPNet) to segment breast tumors from dynamic enhanced magnetic resonance images (DCE-MRI) acquired at different centers. The PDPNet first cropped tumor regions with a coarse-segmentation based localization module, then the breast tumor mask was progressively refined by using the weak semantic priori and cross-scale correlation prior knowledge. To validate the effectiveness of PDPNet, we compared it with several state-of-the-art methods on multi-center datasets. The results showed that, compared with the second-best method, the DSC and HD95 of PDPNet were improved by at least 5.13% and 7.58% respectively on multi-center test sets. In addition, through ablations, we demonstrated that the proposed localization module can decrease the influence of normal tissues and therefore improve the generalization ability of the model. The weak semantic priors allow focusing on tumor regions to avoid missing small tumors and low-contrast tumors. The cross-scale correlation priors are beneficial for promoting the shape-aware ability for irregular tumors. Thus integrating them in a unified framework improved the multi-center breast tumor segmentation performance. The source code and open data can be accessed at https://github.com/wangli100209/PDPNet.	Translated: 2024-06-19 12:20:53 Published: 2024-06-17
# Modeling groundwater levels in California's Central Valley by hierarchical Gaussian process and neural network regression ( http://arxiv.org/abs/2310.14555v2 ) License: see link	Anshuman Pradhan, Kyra H. Adams, Venkat Chandrasekaran, Zhen Liu, John T. Reager, Andrew M. Stuart, Michael J. Turmon	Modeling groundwater levels continuously across California's Central Valley (CV) hydrological system is challenging due to low-quality well data which is sparsely and noisily sampled across time and space. The lack of consistent well data makes it difficult to evaluate the impact of 2017 and 2019 wet years on CV groundwater following a severe drought during 2012-2015. A novel machine learning method is formulated for modeling groundwater levels by learning from a 3D lithological texture model of the CV aquifer. The proposed formulation performs multivariate regression by combining Gaussian processes (GP) and deep neural networks (DNN). The hierarchical modeling approach consists of training the DNN to learn a lithologically informed latent space where non-parametric regression with GP is performed. We demonstrate the efficacy of GP-DNN regression for modeling non-stationary features in the well data with fast and reliable uncertainty quantification, as validated to be statistically consistent with the empirical data distribution from 90 blind wells across CV. We show how the model predictions may be used to supplement hydrological understanding of aquifer responses in basins with irregular well data. Our results indicate that on average the 2017 and 2019 wet years in California were largely ineffective in replenishing the groundwater loss caused during previous drought years.	Translated: 2024-06-19 12:20:53 Published: 2024-06-17
# How Much Context Does My Attention-Based ASR System Need? ( http://arxiv.org/abs/2310.15672v2 ) License: see link	Robert Flynn, Anton Ragni	For the task of speech recognition, the use of more than 30 seconds of acoustic context during training is uncommon and under-investigated in the literature. In this work, we conduct an empirical study on the effect of scaling the sequence length used to train/evaluate (dense-attention-based) acoustic models on speech recognition performance. For these experiments, a dataset of roughly 100,000 pseudo-labelled Spotify podcasts is used, with context lengths of 5 seconds to 1 hour being explored. Zero-shot evaluations are presented on the long-format datasets: Earnings-22, Tedlium and Rev16. Results demonstrate a benefit from training with up to 21.8 minutes of acoustic context, showing up to a 14.5% relative improvement from a baseline trained with 10 seconds of context. We find that the model's width/depth, positional encoding scheme and number of attention heads impact its ability to use longer contexts.	Translated: 2024-06-19 12:20:53 Published: 2024-06-17
# Generative Language Models Exhibit Social Identity Biases ( http://arxiv.org/abs/2310.15819v2 ) License: see link	Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek	The surge in popularity of large language models has given rise to concerns about biases that these models could learn from humans. We investigate whether ingroup solidarity and outgroup hostility, fundamental social identity biases known from social psychology, are present in 56 large language models. We find that almost all foundational language models and some instruction fine-tuned models exhibit clear ingroup-positive and outgroup-negative associations when prompted to complete sentences (e.g., "We are..."). Our findings suggest that modern language models exhibit fundamental social identity biases to a similar degree as humans, both in the lab and in real-world conversations with LLMs, and that curating training data and instruction fine-tuning can mitigate such biases. Our results have practical implications for creating less biased large language models and further underscore the need for more research into user interactions with LLMs to prevent potential bias reinforcement in humans.	Translated: 2024-06-19 12:20:53 Published: 2024-06-17
# Eigenvector Continuation and Projection-Based Emulators ( http://arxiv.org/abs/2310.19419v3 ) License: see link	Thomas Duguet, Andreas Ekström, Richard J. Furnstahl, Sebastian König, Dean Lee	Eigenvector continuation is a computational method for parametric eigenvalue problems that uses subspace projection with a basis derived from eigenvector snapshots from different parameter sets. It is part of a broader class of subspace-projection techniques called reduced-basis methods. In this colloquium article, we present the development, theory, and applications of eigenvector continuation and projection-based emulators. We introduce the basic concepts, discuss the underlying theory and convergence properties, and present recent applications for quantum systems and future prospects.	Translated: 2024-06-19 12:20:53 Published: 2024-06-17
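To make the subspace-projection idea in the abstract above concrete, the following is a minimal pure-Python sketch of eigenvector continuation with two snapshots. It is an illustration under stated assumptions, not code from the article: the 3x3 toy family `hamiltonian(c)`, the training parameters 0 and 1, the target parameter 0.5, and all function names are hypothetical, and ground states are obtained with a simple shifted power iteration. The projected problem is the 2x2 generalized eigenvalue problem $H_p x = E N x$, whose lowest root is taken from $\det(H_p - E N) = 0$.

```python
import math


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hamiltonian(c):
    # Hypothetical toy family H(c) = H0 + c * H1 (real symmetric, 3x3).
    return [[2.0 + c, -1.0, 0.0],
            [-1.0, 2.0, -1.0],
            [0.0, -1.0, 2.0 - c]]


def ground_state(h, iters=500):
    """Lowest eigenpair via power iteration on (shift*I - H).

    The shift is a Gershgorin upper bound on the spectrum of H, so the
    lowest eigenvalue of H becomes the dominant one of the shifted matrix.
    """
    n = len(h)
    shift = max(sum(abs(x) for x in row) for row in h)
    v = [1.0] * n
    for _ in range(iters):
        hv = matvec(h, v)
        w = [shift * v[i] - hv[i] for i in range(n)]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, matvec(h, v)), v


def ec_ground_energy(c_target, c_train):
    """Eigenvector-continuation estimate from two ground-state snapshots."""
    h = hamiltonian(c_target)
    v1 = ground_state(hamiltonian(c_train[0]))[1]
    v2 = ground_state(hamiltonian(c_train[1]))[1]
    hv1, hv2 = matvec(h, v1), matvec(h, v2)
    # Projected Hamiltonian and overlap (norm) matrix in the snapshot basis.
    h11, h12, h22 = dot(v1, hv1), dot(v1, hv2), dot(v2, hv2)
    n11, n12, n22 = dot(v1, v1), dot(v1, v2), dot(v2, v2)
    # det(Hp - E*N) = 0 expands to a*E^2 - b*E + c0 = 0; take the lower root.
    a = n11 * n22 - n12 * n12
    b = h11 * n22 + h22 * n11 - 2.0 * h12 * n12
    c0 = h11 * h22 - h12 * h12
    disc = max(b * b - 4.0 * a * c0, 0.0)
    return (b - math.sqrt(disc)) / (2.0 * a)


exact = ground_state(hamiltonian(0.5))[0]
emulated = ec_ground_energy(0.5, (0.0, 1.0))
print("exact:", exact, "emulated:", emulated)
```

Because the estimate comes from a Rayleigh-Ritz projection, it should lie at or above the exact ground-state energy; adding more snapshots enlarges the subspace and can only lower the estimate toward the exact value.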
# Signatures From Pseudorandom States via $\bot$-PRFs ( http://arxiv.org/abs/2311.00847v4 ) License: see link	Mohammed Barhoush, Amit Behera, Lior Ozer, Louis Salvail, Or Sattath	Different flavors of quantum pseudorandomness have proven useful for various cryptographic applications, with the compelling feature that these primitives are potentially weaker than post-quantum one-way functions. Ananth, Lin, and Yuen (2023) have shown that logarithmic pseudorandom states can be used to construct a pseudo-deterministic PRG: informally, for a fixed seed, the output is the same with $1-1/poly$ probability. In this work, we introduce new definitions for $\bot$-PRG and $\bot$-PRF. The correctness guarantees are that, for a fixed seed, except with negligible probability, the output is either the same (with probability $1-1/poly$) or a recognizable abort, denoted $\bot$. Our approach admits a natural definition of multi-time PRG security, as well as the adaptive security of a PRF. We construct a $\bot$-PRG from any pseudo-deterministic PRG and, from that, a $\bot$-PRF. Even though most mini-crypt primitives, such as symmetric key encryption, commitments, MAC, and length-restricted one-time digital signatures, have been shown based on various quantum pseudorandomness assumptions, digital signatures remained elusive. Our main application is a (quantum) digital signature scheme with classical public keys and signatures, thereby addressing a previously unresolved question posed in Morimae and Yamakawa's work (Crypto, 2022). Additionally, we construct CPA-secure public-key encryption with tamper-resilient quantum public keys.	Translated: 2024-06-19 12:20:53 Published: 2024-06-17
# AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer ( http://arxiv.org/abs/2406.08298v2 ) License: see link	Yitao Xu, Tong Zhang, Sabine Süsstrunk	Vision Transformers (ViTs) have demonstrated remarkable performance in image classification tasks, particularly when equipped with local information via region attention or convolutions. While such architectures improve the feature aggregation from different granularities, they often fail to contribute to the robustness of the networks. Neural Cellular Automata (NCA) enables the modeling of global cell representations through local interactions, with its training strategies and architecture design conferring strong generalization ability and robustness against noisy inputs. In this paper, we propose Adaptor Neural Cellular Automata (AdaNCA) for Vision Transformer that uses NCA as plug-and-play adaptors between ViT layers, enhancing ViT's performance and robustness against adversarial samples as well as out-of-distribution inputs. To overcome the large computational overhead of standard NCAs, we propose Dynamic Interaction for more efficient interaction learning. Furthermore, we develop an algorithm for identifying the most effective insertion points for AdaNCA based on our analysis of AdaNCA placement and robustness improvement. With less than a 3% increase in parameters, AdaNCA contributes to more than 10% absolute improvement in accuracy under adversarial attacks on the ImageNet1K benchmark. Moreover, we demonstrate with extensive evaluations across 8 robustness benchmarks and 4 ViT architectures that AdaNCA, as a plug-and-play module, consistently improves the robustness of ViTs.	Translated: 2024-06-19 12:11:07 Published: 2024-06-17
# VLind-Bench: Measuring Language Priors in Large Vision-Language Models ( http://arxiv.org/abs/2406.08702v2 ) License: see link	Kang-il Lee, Minbeom Kim, Minsung Kim, Dongryeol Lee, Hyukhun Koh, Kyomin Jung	Large Vision-Language Models (LVLMs) have demonstrated outstanding performance across various multimodal tasks. However, they suffer from a problem known as language prior, where responses are generated based solely on textual patterns while disregarding image information. Addressing the issue of language prior is crucial, as it can lead to undesirable biases or hallucinations when dealing with images that are out of training distribution. Despite its importance, current methods for accurately measuring language priors in LVLMs are poorly studied. Although existing benchmarks based on counterfactual or out-of-distribution images can partially be used to measure language priors, they fail to disentangle language priors from other confounding factors. To this end, we propose a new benchmark called VLind-Bench, which is the first benchmark specifically designed to measure the language priors, or blindness, of LVLMs. It not only includes tests on counterfactual images to assess language priors but also involves a series of tests to evaluate more basic capabilities such as commonsense knowledge, visual perception, and commonsense biases. For each instance in our benchmark, we ensure that all these basic tests are passed before evaluating the language priors, thereby minimizing the influence of other factors on the assessment. The evaluation and analysis of recent LVLMs in our benchmark reveal that almost all models exhibit a significant reliance on language priors, presenting a strong challenge in the field.	Translated: 2024-06-19 12:01:13 Published: 2024-06-17
# Approximate Contraction of Arbitrary Tensor Networks with a Flexible and Efficient Density Matrix Algorithm ( http://arxiv.org/abs/2406.09769v2 ) License: see link	Linjian Ma, Matthew Fishman, Miles Stoudenmire, Edgar Solomonik	Tensor network contractions are widely used in statistical physics, quantum computing, and computer science. We introduce a method to efficiently approximate tensor network contractions using low-rank approximations, where each intermediate tensor generated during the contractions is approximated as a low-rank binary tree tensor network. The proposed algorithm has the flexibility to incorporate a large portion of the environment when performing low-rank approximations, which can lead to high accuracy for a given rank. Here, the environment refers to the remaining set of tensors in the network, and low-rank approximations with larger environments can generally provide higher accuracy. For contracting tensor networks defined on lattices, the proposed algorithm can be viewed as a generalization of the standard boundary-based algorithms. In addition, the algorithm includes a cost-efficient density matrix algorithm for approximating a tensor network with a general graph structure into a tree structure, whose computational cost is asymptotically upper-bounded by that of the standard algorithm that uses canonicalization. Experimental results indicate that the proposed technique outperforms previously proposed approximate tensor network contraction algorithms for multiple problems in terms of both accuracy and efficiency.	Translated: 2024-06-19 12:01:13 Published: 2024-06-17
# Pointer Networks with Q-Learning for Combinatorial Optimization ( http://arxiv.org/abs/2311.02629v3 ) License: see link	Alessandro Barro	We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets) to enhance the optimality of attention-based sequence generation, focusing on long-term outcomes. This integration proves particularly effective in solving combinatorial optimization (CO) tasks, especially the Travelling Salesman Problem (TSP), which is the focus of our study. We address this challenge by defining a Markov Decision Process (MDP) compatible with PQN, which involves iterative graph embedding, encoding and decoding by an LSTM-based recurrent neural network. This process generates a context vector and computes raw attention scores, which are dynamically adjusted by Q-values calculated for all available state-action pairs before applying softmax. The resulting attention vector is utilized as an action distribution, with actions selected according to PQN's dynamic exploration-exploitation adaptability. Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.	Translated: 2024-06-19 11:31:28 Published: 2024-06-17
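The step in the abstract above where raw attention scores are adjusted by Q-values before the softmax can be sketched in a few lines of Python. The abstract does not state how the two quantities are combined, so the additive combination below is an assumption, as are the masking of already visited nodes and the function name `q_adjusted_attention`; this is a sketch of the general idea rather than the PQN implementation.

```python
import math


def q_adjusted_attention(raw_scores, q_values, visited):
    """Return an action distribution over nodes.

    Assumption: scores and Q-values are combined additively. Visited nodes
    are masked with -inf so that they receive zero probability.
    """
    adjusted = [
        -math.inf if seen else score + q
        for score, q, seen in zip(raw_scores, q_values, visited)
    ]
    top = max(adjusted)
    if top == -math.inf:
        raise ValueError("no available action: every node is already visited")
    # Subtract the maximum for numerical stability; exp(-inf) evaluates to 0.0.
    exps = [math.exp(a - top) for a in adjusted]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Three candidate cities with equal raw attention; the third is visited.
    probs = q_adjusted_attention([1.0, 1.0, 1.0], [0.0, 2.0, 5.0],
                                 [False, False, True])
    print(probs)
```

With equal raw scores, the node with the larger Q-value receives more probability mass, which is the sense in which the Q-values steer the pointer toward choices with better long-term return. An exploration rule (for example epsilon-greedy sampling) would then pick an action from this distribution.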
# Large Language Models are In-context Teachers for Knowledge Reasoning ( http://arxiv.org/abs/2311.06985v2 ) License: see link	Jiachen Zhao, Zonghai Yao, Zhichao Yang, Hong Yu	Chain-of-thought (CoT) prompting teaches large language models (LLMs) in context to reason over queries that require more than mere information retrieval. However, human experts are usually required to craft demonstrations for in-context learning (ICL), which is expensive and has high variance. More importantly, how to craft helpful reasoning exemplars for ICL remains unclear. In this work, we investigate whether LLMs can be better in-context teachers for knowledge reasoning. We follow the "encoding specificity" hypothesis in human memory retrieval to assume in-context exemplars at inference should match the encoding context in training data. We are thus motivated to propose Self-Explain to use one LLM's self-elicited explanations as in-context demonstrations for prompting it as they are generalized from the model's training examples. Self-Explain is shown to significantly outperform using human-crafted exemplars and other baselines. We further reveal that for in-context teaching, rationales by distinct teacher LLMs or human experts that more resemble the student LLM's self-explanations are better demonstrations, which supports our encoding specificity hypothesis. We then propose Teach-Back that aligns the teacher LLM with the student to enhance the in-context teaching performance. For example, Teach-Back enables a 7B model to teach the much larger GPT-3.5 in context, surpassing human teachers by around 5% in test accuracy on medical question answering.	Translated: 2024-06-19 11:31:28 Published: 2024-06-17
# Auto-ICL:人間の監督を伴わないインテクスト学習 Auto-ICL: In-Context Learning without Human Supervision ( http://arxiv.org/abs/2311.09263v2 ) ライセンス: Link先を確認	Jinghan Yang, Shuming Ma, Furu Wei,	(参考訳) コンテキスト内学習能力により、適切なコンテキストを提供すると、大きな言語モデルの性能が大幅に向上する。しかし、既存の文脈内学習法は主に、ラベル付き例や明示的な指示など、人間が提供する文脈に依存している。人間によるコンテキスト記述は、様々なタスクに労働集約的であり、モデルが人間によって管理可能なタスクに制限される。これらの制約を克服するために,モデルが問題解決のための例と指示を自律的に生成できる自動文脈学習フレームワークを提案する。 Few-ShotやFew-Shot-CoTメソッドなど、モデル生成コンテキストは、Zero-CoTやAuto-CoTといった既存の自己生成コンテキストメソッドを上回っている。 With in-context learning ability, the performance of large language models can be significantly boosted when provided with appropriate context. However, existing in-context learning methods mainly rely on human-provided contexts, such as labeled examples and explicit instructions. Writing context by humans is labor-intensive on various tasks and limits the model to tasks manageable by humans. To overcome these limitations, we propose Automatic In-Context Learning framework that enables the model to autonomously generate examples and instructions for problem-solving. With experiments across various models and datasets, results show that model-generated contexts outperform human-annotated contexts, including Few-Shot and Few-Shot-CoT methods, and surpass existing self-generated context methods like Zero-CoT and Auto-CoT.	翻訳日:2024-06-19 11:31:28 公開日:2024-06-17
# InterControl: 全関節制御によるゼロショットヒューマンインタラクション生成 InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint ( http://arxiv.org/abs/2311.15864v3 ) ライセンス: Link先を確認	Zhenzhi Wang, Jingbo Wang, Yixuan Li, Dahua Lin, Bo Dai,	(参考訳) テキスト条件の運動合成は拡散モデルの出現とともに顕著な進歩を遂げた。しかしながら、これらの運動拡散モデルの大部分は、主に1つのキャラクタのために設計され、マルチヒューマンインタラクションを見落としている。提案手法では, ゼロショット方式で, 任意の大きさの文字群に対して, 人間の動きと相互作用を合成することにより, この問題を探究する。このアプローチのキーとなる側面は、人間の関節のペアとして人間のインタラクションを適応させることです。固定数の文字を持つ多人数動作データセット上でのトレーニング動作生成モデルを必要とする既存の手法とは対照的に,本手法は,任意の個数の個人を含む人間のインタラクションをモデル化する柔軟性を持ち,トレーニングデータに課される制約を超越する。関節間の所望距離を維持するために,新しい制御可能な運動生成手法であるInterControlを導入する。モーションコントローラと逆キネマティクス誘導モジュールで構成されており、合成された文字の関節を所望の場所に現実的に正確に整列させる。さらに, 既成のLarge Language Model (LLM) を用いて, ヒューマンインタラクションのための接合対間距離を生成できることを実証した。実験結果から,本フレームワークが複数の人体文字とのインタラクションを生成する能力と,既成の物理系シミュレータで作業する可能性を強調した。 Text-conditioned motion synthesis has made remarkable progress with the emergence of diffusion models. However, the majority of these motion diffusion models are primarily designed for a single character and overlook multi-human interactions. In our approach, we strive to explore this problem by synthesizing human motion with interactions for a group of characters of any size in a zero-shot manner. The key aspect of our approach is the adaptation of human-wise interactions as pairs of human joints that can be either in contact or separated by a desired distance. In contrast to existing methods that necessitate training motion generation models on multi-human motion datasets with a fixed number of characters, our approach inherently possesses the flexibility to model human interactions involving an arbitrary number of individuals, thereby transcending the limitations imposed by the training data. We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions maintaining the desired distance between joint pairs. 
It consists of a motion controller and an inverse kinematics guidance module that realistically and accurately aligns the joints of synthesized characters to the desired location. Furthermore, we demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model (LLM). Experimental results highlight the capability of our framework to generate interactions with multiple human characters and its potential to work with off-the-shelf physics-based character simulators.	翻訳日:2024-06-19 09:12:15 公開日:2024-06-17
# 独自のアウトプットから生じる大規模言語モデル:自己消費型学習ループの分析 Large Language Models Suffer From Their Own Output: An Analysis of the Self-Consuming Training Loop ( http://arxiv.org/abs/2311.16822v2 ) ライセンス: Link先を確認	Martin Briesch, Dominik Sobania, Franz Rothlauf,	(参考訳) 大規模言語モデル(LLM)は、様々なオンラインプラットフォーム向けのコンテンツを生成するために既に広く使われている。 LLM生成コンテンツと人為的コンテンツとを安全に区別できないため、LLM生成コンテンツは次世代のLLMを訓練するために使われ、自己消費型トレーニングループが生まれる。画像生成領域から、このような自己消費トレーニングループは、最終的にモデル崩壊で終わる画像の品質と多様性の両方を減少させる。しかし、このアラーム効果がLLMにも見られるかどうかは不明である。そこで本研究では,LSMの自己消費訓練ループについて検討した。さらに,LLM生成したコンテンツの正確性を曖昧に検証できる論理式に基づく新しい手法を提案する。自己消費学習ループは正しい出力を生成するが、使用済みデータの割合によって出力の多様性は低下する。新鮮なデータは、この減少を遅らせる可能性があるが、それを止めることはできない。これらの結果を踏まえ、我々は研究者にこのプロセスの無効化方法の研究を奨励する。 Large Language Models (LLM) are already widely used to generate content for a variety of online platforms. As we are not able to safely distinguish LLM-generated content from human-produced content, LLM-generated content is used to train the next generation of LLMs, giving rise to a self-consuming training loop. From the image generation domain we know that such a self-consuming training loop reduces both quality and diversity of images finally ending in a model collapse. However, it is unclear whether this alarming effect can also be observed for LLMs. Therefore, we present the first study investigating the self-consuming training loop for LLMs. Further, we propose a novel method based on logic expressions that allows us to unambiguously verify the correctness of LLM-generated content, which is difficult for natural language text. We find that the self-consuming training loop produces correct outputs, however, the output declines in its diversity depending on the proportion of the used generated data. Fresh data can slow down this decline, but not stop it. Given these concerning results, we encourage researchers to study methods to negate this process.	翻訳日:2024-06-19 09:12:15 公開日:2024-06-17
# 3DオブジェクトのアノテーションにVLMベースのパイプラインを活用する Leveraging VLM-Based Pipelines to Annotate 3D Objects ( http://arxiv.org/abs/2311.17851v2 ) ライセンス: Link先を確認	Rishabh Kabra, Loic Matthey, Alexander Lerchner, Niloy J. Mitra,	(参考訳) 事前訓練された視覚言語モデル(VLM)は、ラベルのない3Dオブジェクトを大規模にキャプションする機会を提供する。オブジェクトの異なるビュー(Luo et al , 2023)からのVLM記述を要約する主要なアプローチは、最終的な出力を生成するために言語モデル(GPT4)に依存している。このテキストベースの集約は、潜在的に矛盾する記述をマージするため、幻覚の影響を受けやすい。本稿では,VLMの応答に影響を与える視点などの要因を疎外する代替アルゴリズムを提案する。テキストのみの応答をマージする代わりに、VLMの合同画像テキストの可能性を利用する。確率的アグリゲーションは、より信頼性が高く、効率的であるだけでなく、人間の検証されたラベルに対するオブジェクトタイプをSoTAに当てはめている。集約されたアノテーションは条件付き推論にも有用であり、オブジェクトの型が補助的なテキストベースの入力として指定されたときに、下流の予測(オブジェクト材料の例)を改善する。このような補助的な入力は、教師なし環境における視覚的推論に対する視覚的推論の貢献を非難することを可能にする。これらの教師付きおよび教師なしの評価により、VLMベースのパイプラインをどのように活用して、Objaverseデータセットから764Kオブジェクトに対する信頼性の高いアノテーションを生成するかを示す。 Pretrained vision language models (VLMs) present an opportunity to caption unlabeled 3D objects at scale. The leading approach to summarize VLM descriptions from different views of an object (Luo et al., 2023) relies on a language model (GPT4) to produce the final output. This text-based aggregation is susceptible to hallucinations as it merges potentially contradictory descriptions. We propose an alternative algorithm to marginalize over factors such as the viewpoint that affect the VLM's response. Instead of merging text-only responses, we utilize the VLM's joint image-text likelihoods. We show our probabilistic aggregation is not only more reliable and efficient, but sets the SoTA on inferring object types with respect to human-verified labels. The aggregated annotations are also useful for conditional inference; they improve downstream predictions (e.g., of object material) when the object's type is specified as an auxiliary text-based input. Such auxiliary inputs allow ablating the contribution of visual reasoning over visionless reasoning in an unsupervised setting. 
With these supervised and unsupervised evaluations, we show how a VLM-based pipeline can be leveraged to produce reliable annotations for 764K objects from the Objaverse dataset.	翻訳日:2024-06-19 09:12:15 公開日:2024-06-17
# 強化学習のための最適攻撃と防御 Optimal Attack and Defense for Reinforcement Learning ( http://arxiv.org/abs/2312.00198v2 ) ライセンス: Link先を確認	Jeremy McMahan, Young Wu, Xiaojin Zhu, Qiaomin Xie,	(参考訳) 実システムにおける強化学習(Reinforcement Learning, RL)の有用性を確保するためには、ノイズや敵対的攻撃に対して堅牢であることを保証することが重要である。敵対的RLでは、外部の攻撃者が、被害者エージェントと環境との相互作用を操作できる。我々は、オンライン操作攻撃の全クラスを研究する。これには (i)状態攻撃、 (ii)観測攻撃(知覚状態攻撃の一般化)、 (iii)行動攻撃、 (iv)報酬攻撃が含まれる。攻撃者自身の期待報酬を最大化する(多くの場合、被害者の価値を最小化することに対応する)ステルス攻撃を設計するという攻撃者の問題は、マルコフ決定プロセス(MDP)によって捉えられることを示す。これは真の環境ではなく、攻撃された相互作用によって誘導される高レベルの環境であるため、メタMDPと呼ぶ。攻撃者は、多項式時間で計画するか、標準的なRL手法を用いて多項式サンプル複雑性で学習することで、最適な攻撃を導出できることを示す。被害者にとっての最適な防御方針は、確率的スタッケルベルグゲームの解として計算でき、これは部分観測可能なターンベース確率ゲーム(POTBSG)にさらに単純化できると論じる。攻撃者も被害者も、それぞれの最適な方針から逸脱しても利益を得られないため、そのような解は真に堅牢である。防御問題はNP困難であるが、多くのシナリオにおいて、最適なマルコフ的防御を多項式時間(サンプル複雑性)で計算(学習)できることを示す。 To ensure the usefulness of Reinforcement Learning (RL) in real systems, it is crucial to ensure that RL agents are robust to noise and adversarial attacks. In adversarial RL, an external attacker has the power to manipulate the victim agent's interaction with the environment. We study the full class of online manipulation attacks, which include (i) state attacks, (ii) observation attacks (which are a generalization of perceived-state attacks), (iii) action attacks, and (iv) reward attacks. We show that the attacker's problem of designing a stealthy attack that maximizes its own expected reward, which often corresponds to minimizing the victim's value, is captured by a Markov Decision Process (MDP) that we call a meta-MDP, since it is not the true environment but a higher-level environment induced by the attacked interaction. We show that the attacker can derive optimal attacks by planning in polynomial time or learning with polynomial sample complexity using standard RL techniques. We argue that the optimal defense policy for the victim can be computed as the solution to a stochastic Stackelberg game, which can be further simplified into a partially-observable turn-based stochastic game (POTBSG). 
Neither the attacker nor the victim would benefit from deviating from their respective optimal policies, thus such solutions are truly robust. Although the defense problem is NP-hard, we show that optimal Markovian defenses can be computed (learned) in polynomial time (sample complexity) in many scenarios.	翻訳日:2024-06-19 09:12:15 公開日:2024-06-17
# コセットによるグロキング群乗法 Grokking Group Multiplication with Cosets ( http://arxiv.org/abs/2312.06581v2 ) ライセンス: Link先を確認	Dashiell Stander, Qinan Yu, Honglu Fan, Stella Biderman,	(参考訳) ディープニューラルネットワークの複雑で予測不可能な性質は、多くのハイテイクなアプリケーションで安全な使用を妨げている。ディープニューラルネットワークを解釈するために開発されたテクニックは数多くあるが、いずれもかなりの制限がある。アルゴリズムタスクは、ニューラルネットワークをエンドツーエンドに解釈するための実りあるテスト場であることが証明されている。以前の研究に基づいて、置換群$S_5$と$S_6$の算術的な 'grokked'' を持つ1つの隠れた層ネットワークを完全にリバースエンジニアリングしました。モデルは全群の真の部分群構造を発見し、置換群の部分群を用いて群演算を分解するニューラルネットワークに収束する。我々は,モデル機構のリバースエンジニアリングについて考察し,この理論が回路の機能の忠実な記述であることを確認した。また,本研究をChughtai et al [4]と比較することで,解釈可能性研究における現在の課題にも注意を払っている。 The complex and unpredictable nature of deep neural networks prevents their safe use in many high-stakes applications. There have been many techniques developed to interpret deep neural networks, but all have substantial limitations. Algorithmic tasks have proven to be a fruitful test ground for interpreting a neural network end-to-end. Building on previous work, we completely reverse engineer fully connected one-hidden layer networks that have ``grokked'' the arithmetic of the permutation groups $S_5$ and $S_6$. The models discover the true subgroup structure of the full group and converge on neural circuits that decompose the group arithmetic using the permutation group's subgroups. We relate how we reverse engineered the model's mechanisms and confirmed our theory was a faithful description of the circuit's functionality. We also draw attention to current challenges in conducting interpretability research by comparing our work to Chughtai et al. [4] which alleges to find a different algorithm for this same problem.	翻訳日:2024-06-19 09:12:15 公開日:2024-06-17
# 安全なマルチタスクベイズ最適化に向けて Towards Safe Multi-Task Bayesian Optimization ( http://arxiv.org/abs/2312.07281v3 ) ライセンス: Link先を確認	Jannis O. Lübsen, Christian Hespe, Annika Eichler,	(参考訳) ベイズ最適化は、高サンプリング効率とノイズロバスト性のために、システムの安全なオンライン最適化のための非常に効果的なツールとして登場した。効率をさらに高めるため、システムの物理モデルを最適化プロセスに組み込むことができ、高速化することができる。これらのモデルは実際のシステムの近似を提供することができ、それらの評価は極めて安価である。モデルと現実の類似性は、最適化プロセス内で学習される追加のハイパーパラメータによって表現される。安全はベイズ最適化のようなオンライン最適化手法にとって重要な基準であり、既知のハイパーパラメータの仮定の下で安全保証を提供する最近の研究によって解決されている。しかし実際には、これは適用されない。そこで我々は,マルコフ連鎖モンテカルロ法によるハイパーパラメータ後部分布からの信頼領域の計算を含むマルチタスク設定を満たすために,ロバストなガウス過程の一様誤差境界を拡張した。その後、モデルの測定を取り入れつつ、システムの安全な最適化を容易にするために堅牢な安全性境界が採用される。シミュレーションの結果,従来のベイズ最適化法に比べて高コストで性能評価を行うことが可能であることが示唆された。 Bayesian optimization has emerged as a highly effective tool for the safe online optimization of systems, due to its high sample efficiency and noise robustness. To further enhance its efficiency, reduced physical models of the system can be incorporated into the optimization process, accelerating it. These models are able to offer an approximation of the actual system, and evaluating them is significantly cheaper. The similarity between the model and reality is represented by additional hyperparameters, which are learned within the optimization process. Safety is a crucial criterion for online optimization methods such as Bayesian optimization, which has been addressed by recent works that provide safety guarantees under the assumption of known hyperparameters. In practice, however, this does not apply. Therefore, we extend the robust Gaussian process uniform error bounds to meet the multi-task setting, which involves the calculation of a confidence region from the hyperparameter posterior distribution utilizing Markov chain Monte Carlo methods. Subsequently, the robust safety bounds are employed to facilitate the safe optimization of the system, while incorporating measurements of the models. 
Simulation results indicate that, for expensive-to-evaluate functions, the optimization can be significantly accelerated compared with other state-of-the-art safe Bayesian optimization methods, contingent on the fidelity of the models.	翻訳日:2024-06-19 09:12:15 公開日:2024-06-17
# 77点以上のテキストトークン:CLIP-Style ModelsをDense Captionで評価 A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions ( http://arxiv.org/abs/2312.08578v2 ) ライセンス: Link先を確認	Jack Urbanek, Florian Bordes, Pietro Astolfi, Mary Williamson, Vasu Sharma, Adriana Romero-Soriano,	(参考訳) 膨大なビジョン言語データセットのキュレーション方法は、データセットのサイズと品質をトレードオフする。しかし、利用可能なキャプションの最高品質でさえ、画像のリッチな視覚的詳細を捉えるには、あまりにも短すぎる。濃密で高度に整合した画像テキストペアの価値を示すために,1000語以上を平均的に表現した7805の自然画像を含むDensely Captioned Images (DCI)データセットを収集した。画像の特定の部分に関連する正確かつ信頼性の高いキャプションを用いて、画像内容の視覚言語モデル(VLM)理解を、各キャプションと対応するサブクロップとを一致させる新しいタスクで評価することができる。現在のモデルは77のテキストトークンに制限されることが多いため、各キャプションの長さが制限された要約版(sDCI)も導入する。標準ベンチマークを進歩させる最新の技術は、我々のsDCIベースのベンチマークの大幅な改善と一致しないことを示す。最後に, sDCIを用いてCLIPを微調整し, トレーニングセットが小さいにもかかわらず, ベースラインを大幅に改善した。人間の注釈付き高密度画像キャプションデータセットを初めてリリースすることで、次世代のVLMのための新しいベンチマークや微調整のレシピの開発を可能にしたいと考えています。 Curation methods for massive vision-language datasets trade off between dataset size and quality. However, even the highest quality of available curated captions are far too short to capture the rich visual detail in an image. To show the value of dense and highly-aligned image-text pairs, we collect the Densely Captioned Images (DCI) dataset, containing 7805 natural images human-annotated with mask-aligned descriptions averaging above 1000 words each. With precise and reliable captions associated with specific parts of an image, we can evaluate vision-language models' (VLMs) understanding of image content with a novel task that matches each caption with its corresponding subcrop. As current models are often limited to 77 text tokens, we also introduce a summarized version (sDCI) in which each caption length is limited. We show that modern techniques that make progress on standard benchmarks do not correspond with significant improvement on our sDCI based benchmark. Lastly, we finetune CLIP using sDCI and show significant improvements over the baseline despite a small training set. 
By releasing the first human-annotated dense image captioning dataset, we hope to enable the development of new benchmarks or fine-tuning recipes for the next generation of VLMs.	翻訳日:2024-06-19 09:12:15 公開日:2024-06-17
# CAT:好ましくないグラフをトリミングするための因果グラフ注意ネットワーク CAT: A Causally Graph Attention Network for Trimming Heterophilic Graph ( http://arxiv.org/abs/2312.08672v3 ) ライセンス: Link先を確認	Silu He, Qinyao Luo, Xinsha Fu, Ling Zhao, Ronghua Du, Haifeng Li,	(参考訳) グラフ注意ネットワーク(GAT)に採用されているローカルアテンション誘導メッセージパッシングメカニズム(LAMP)は、グラフ上のより優れたローカルアグリゲーションのために、近隣ノードの重要性を適応的に学習するように設計されている。しかし、既存のGATは、異種近傍の高割合が中央ノードの自己アテンションを弱め、表現空間内の類似ノードから中央ノードが逸脱する結果となるため、異種グラフの顕著な識別能力の低下に悩まされる。本稿では, 隣接ノードが生成するこのような効果をディストラクション効果(DE)と呼ぶ。隣接するノードのDEを推定し、弱めるために、ヘテロ親和性グラフ(CAT)をトリミングするための因果グラフ注意ネットワークを提案する。 DEを推定するために、DEは2つの経路(近傍に割り当てられた注意をグラフ化し、中央ノードの自己注意を減らす)を通して生成されるので、因果推定の一種であり、介入データから推定できるDEのモデルにトータルエフェクトを使用します。我々は提案したCATフレームワークのベースモデルとして3つの代表的GATを採用し、3つの異なるサイズで7つのヘテロ親和性データセット上で実験を行う。比較実験により、CATは全てのベースGATモデルのノード分類精度を向上させることができることが示された。アブレーション実験と可視化により、CATによる識別能力の向上がさらに検証された。ソースコードはhttps://github.com/GeoX-Lab/CATで入手できる。 Local Attention-guided Message Passing Mechanism (LAMP) adopted in Graph Attention Networks (GATs) is designed to adaptively learn the importance of neighboring nodes for better local aggregation on the graph, which can bring the representations of similar neighbors closer effectively, thus showing stronger discrimination ability. However, existing GATs suffer from a significant discrimination ability decline in heterophilic graphs because the high proportion of dissimilar neighbors can weaken the self-attention of the central node, jointly resulting in the deviation of the central node from similar nodes in the representation space. This kind of effect generated by neighboring nodes is called the Distraction Effect (DE) in this paper. To estimate and weaken the DE of neighboring nodes, we propose a Causally graph Attention network for Trimming heterophilic graph (CAT). 
To estimate the DE, we note that it is generated through two paths (grabbing the attention assigned to neighbors and reducing the self-attention of the central node), so we model it with the Total Effect, a causal estimand that can be estimated from intervened data. To weaken the DE, we identify the neighbors with the highest DE (we call them Distraction Neighbors) and remove them. We adopt three representative GATs as base models within the proposed CAT framework and conduct experiments on seven heterophilic datasets of three different sizes. Comparative experiments show that CAT can improve the node classification accuracy of all base GAT models. Ablation experiments and visualization further validate the enhancement of discrimination ability brought by CAT. The source code is available at https://github.com/GeoX-Lab/CAT.	翻訳日:2024-06-19 09:12:15 公開日:2024-06-17
# 物理インフォームドニューラルネットワークによる軟組織非線形生体力学モデルにおける材料特性の推定 Physics-informed Neural Network Estimation of Material Properties in Soft Tissue Nonlinear Biomechanical Models ( http://arxiv.org/abs/2312.09787v3 ) ライセンス: Link先を確認	Federica Caforio, Francesco Regazzoni, Stefano Pagani, Elias Karabelas, Christoph Augustin, Gundolf Haase, Gernot Plank, Alfio Quarteroni,	(参考訳) 臨床応用のための生物物理モデルの開発は、その予測的性質と臨床データの解釈を支援する能力のおかげで、研究コミュニティで急速に進展している。しかし、高解像度で正確な多物理計算モデルは計算コストが高く、そのパーソナライゼーションには多数のパラメータの微調整が伴う。本研究では,物理インフォームドニューラルネットワーク(PINN)と3次元軟組織非線形生体力学モデルを組み合わせた新しいアプローチを提案する。提案した学習アルゴリズムは、限られた量の変位から情報を符号化し、場合によっては、臨床環境で日常的に取得できる歪みデータを符号化し、偏微分方程式に基づく数学的モデルで表される問題の物理学と組み合わせ、問題を正規化し、収束性を向上させる。提案手法の精度とロバスト性を示し, 患者特異的で不均一な物理的特性, 組織硬度特性のロバストかつ有効同定を可能にする大きな可能性を示すために, いくつかのベンチマークを提出した。特に, 傷痕組織の存在, 位置, 重症度を検出するPINNの能力を実証し, 特に心臓疾患の診断における個人化シミュレーションモデルの開発に有用であることを示す。 The development of biophysical models for clinical applications is rapidly advancing in the research community, thanks to their predictive nature and their ability to assist the interpretation of clinical data. However, high-resolution and accurate multi-physics computational models are computationally expensive and their personalisation involves fine calibration of a large number of parameters, which may be space-dependent, challenging their clinical translation. In this work, we propose a new approach which relies on the combination of physics-informed neural networks (PINNs) with three-dimensional soft tissue nonlinear biomechanical models, capable of reconstructing displacement fields and estimating heterogeneous patient-specific biophysical properties. The proposed learning algorithm encodes information from a limited amount of displacement and, in some cases, strain data, that can be routinely acquired in the clinical setting, and combines it with the physics of the problem, represented by a mathematical model based on partial differential equations, to regularise the problem and improve its convergence properties. 
Several benchmarks are presented to show the accuracy and robustness of the proposed method and its great potential to enable the robust and effective identification of patient-specific, heterogeneous physical properties, such as tissue stiffness. In particular, we demonstrate the capability of the PINN to detect the presence, location and severity of scar tissue, which is beneficial for developing personalised simulation models for disease diagnosis, especially for cardiac applications.	翻訳日:2024-06-19 09:12:15 公開日:2024-06-17
# スケッチとシフト:圧縮クラスタリングのためのロバストデコーダ Sketch and shift: a robust decoder for compressive clustering ( http://arxiv.org/abs/2312.09940v2 ) ライセンス: Link先を確認	Ayoub Belhadji, Rémi Gribonval,	(参考訳) 圧縮学習は,まず大規模なデータセットを低次元のスケッチベクトルに要約し,このスケッチから学習に必要な潜時情報を復号することで,大規模学習のメモリフットプリントを大幅に削減する,新たなアプローチである。ランダムな特徴に基づくスケッチの情報保存保証の最近の進歩を踏まえ、主要な目的は、この情報を堅牢かつ効率的に抽出するために、容易に修正できるアルゴリズム(デコーダと呼ばれる)を設計することである。非凸最適化問題に対処するために、様々なヒューリスティックな手法が提案されている。圧縮クラスタリングの場合、標準的なヒューリスティックは CL-OMPR である。しかし、CL-OMPRのチューニングは困難であり、その堅牢性の検討は見落とされた。本研究では,CL-OMPRを精査し,その限界を回避する。特に,このアルゴリズムは,有利なシナリオにおいても,クラスタの回復に失敗する可能性があることを示す。このアルゴリズムの欠点は,アルゴリズムのコアステップに現れる相関関数の構造に関連した最適化の難しさに起因すると考えられる。これらの制限に対処するため、CL-OMPRよりも大幅に改善された代替デコーダを提案する。その設計は、カーネル密度推定器の局所的な最大値を検出する古典的なアプローチである平均シフトアルゴリズムから着想を得ている。提案アルゴリズムは,従来より10倍小さいMNISTデータセットのスケッチからクラスタリング情報を抽出することができる。 Compressive learning is an emerging approach to drastically reduce the memory footprint of large-scale learning, by first summarizing a large dataset into a low-dimensional sketch vector, and then decoding from this sketch the latent information needed for learning. In light of recent progress on information preservation guarantees for sketches based on random features, a major objective is to design easy-to-tune algorithms (called decoders) to robustly and efficiently extract this information. To address the underlying non-convex optimization problems, various heuristics have been proposed. In the case of compressive clustering, the standard heuristic is CL-OMPR, a variant of sliding Frank-Wolfe. Yet, CL-OMPR is hard to tune, and the examination of its robustness was overlooked. In this work, we undertake a scrutinized examination of CL-OMPR to circumvent its limitations. In particular, we show how this algorithm can fail to recover the clusters even in advantageous scenarios. 
To gain insight, we show how the deficiencies of this algorithm can be attributed to optimization difficulties related to the structure of a correlation function appearing at core steps of the algorithm. To address these limitations, we propose an alternative decoder offering substantial improvements over CL-OMPR. Its design is notably inspired by the mean shift algorithm, a classic approach to detect the local maxima of kernel density estimators. The proposed algorithm can extract clustering information from a sketch of the MNIST dataset that is 10 times smaller than was previously possible.	翻訳日:2024-06-19 09:12:15 公開日:2024-06-17
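Since the abstract names the mean shift algorithm as the inspiration for the proposed decoder, a short Python sketch of classic mean shift may help. It illustrates only the textbook procedure on raw one-dimensional points with a Gaussian kernel. It is not the paper's sketch-based decoder, and the function name, bandwidth and data are made up for the example.

```python
import math


def mean_shift_mode(points, start, bandwidth, tol=1e-9, max_iter=500):
    """Climb from `start` to a nearby local maximum of a Gaussian kernel density estimate."""
    x = start
    for _ in range(max_iter):
        # Gaussian kernel weight of every data point relative to the current position.
        weights = [math.exp(-((x - p) ** 2) / (2.0 * bandwidth ** 2)) for p in points]
        # The mean shift update moves x to the kernel-weighted mean of the data.
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Two well-separated groups of points; each start converges to the nearest mode.
points = [-0.2, 0.0, 0.2, 4.8, 5.0, 5.2]
left_mode = mean_shift_mode(points, start=1.0, bandwidth=0.5)
right_mode = mean_shift_mode(points, start=4.0, bandwidth=0.5)
```

Each starting point climbs to the density peak of its own group, which is the "detect the local maxima of kernel density estimators" behaviour the abstract refers to.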
# PPOのカラーノイズ:相関行動サンプリングによる探索と性能の改善 Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling ( http://arxiv.org/abs/2312.11091v2 ) ライセンス: Link先を確認	Jakob Hollenstein, Georg Martius, Justus Piater,	(参考訳) 広く使われているオンポリシーの深層強化学習手法であるPPO(Proximal Policy Optimization)は、探索に確率的方策を用いる。本稿では、カラードノイズに基づくPPOの確率的方策の変種を提案する。従来の研究は、オフポリシー強化学習における効果的な探索のために、行動ノイズの時間的相関が重要であることを示した。これを踏まえ、PPOのようなオンポリシー手法においても相関ノイズが探索を促進できるかどうかを調べる。行動選択に相関ノイズを用いると学習性能が向上し、オンポリシー手法で現在一般的な無相関のホワイトノイズを上回ることがわかった。ピンクノイズが非常に有効であったオフポリシー学習とは異なり、PPOのオンポリシー学習では、ホワイトとピンクの中間のカラードノイズが最も良い性能を示した。データ収集に用いる並列シミュレーション環境の数を変えることで、更新ごとに収集するデータ量を変化させた場合の影響を調べ、並列環境の数が多いほど、より強く相関したノイズが有益であることを確認した。効果の大きさと実装の容易さから、PPOのデフォルトのノイズ源を相関ノイズに切り替えることを推奨する。 Proximal Policy Optimization (PPO), a popular on-policy deep reinforcement learning method, employs a stochastic policy for exploration. In this paper, we propose a colored noise-based stochastic policy variant of PPO. Previous research highlighted the importance of temporal correlation in action noise for effective exploration in off-policy reinforcement learning. Building on this, we investigate whether correlated noise can also enhance exploration in on-policy methods like PPO. We discovered that correlated noise for action selection improves learning performance and outperforms the currently popular uncorrelated white noise approach in on-policy methods. Unlike off-policy learning, where pink noise was found to be highly effective, we found that colored noise intermediate between white and pink performed best for on-policy learning in PPO. We examined the impact of varying the amount of data collected for each update by modifying the number of parallel simulation environments for data collection and observed that with a larger number of parallel environments, more strongly correlated noise is beneficial. 
Due to the significant impact and ease of implementation, we recommend switching to correlated noise as the default noise source in PPO.	翻訳日:2024-06-19 07:14:24 公開日:2024-06-17
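To make "colored noise" concrete, the following Python sketch synthesizes a noise sequence whose power spectrum falls off as 1/f^beta, where beta = 0 gives white noise and beta = 1 gives pink noise. It is an illustrative construction, not the authors' implementation: the sum-of-sinusoids synthesis, the per-sequence normalization, the helper names and the value beta = 0.5 (one possible choice "between white and pink") are all assumptions.

```python
import math
import random


def colored_noise(beta, n, rng):
    """Return n samples with power spectral density proportional to 1 / f**beta."""
    samples = [0.0] * n
    for k in range(1, n // 2 + 1):
        # The amplitude of harmonic k shrinks as k**(-beta / 2), so its power shrinks as k**(-beta).
        scale = k ** (-beta / 2.0)
        a = rng.gauss(0.0, scale)
        b = rng.gauss(0.0, scale)
        for t in range(n):
            angle = 2.0 * math.pi * k * t / n
            samples[t] += a * math.cos(angle) + b * math.sin(angle)
    # Rescale to unit mean square (a simplification made for this sketch).
    rms = math.sqrt(sum(x * x for x in samples) / n)
    return [x / rms for x in samples]


def lag1_autocorrelation(x):
    """Circular lag-1 autocorrelation: a simple measure of temporal correlation."""
    n = len(x)
    return sum(x[t] * x[(t + 1) % n] for t in range(n)) / sum(v * v for v in x)


white = colored_noise(0.0, 64, random.Random(0))
intermediate = colored_noise(0.5, 64, random.Random(0))
pink = colored_noise(1.0, 64, random.Random(0))
```

In a PPO-style rollout, one such sequence per action dimension could stand in for the independent Gaussian samples, for example `action_t = mean_t + std * noise[t]`. This usage is a hypothetical illustration and may differ from how the paper integrates the noise. Larger beta yields a higher lag-1 autocorrelation, meaning consecutive actions are perturbed in a more consistent direction.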
# MAC-SQL: テキストからSQLへの多言語コラボレーションフレームワーク MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL ( http://arxiv.org/abs/2312.11242v4 ) ライセンス: Link先を確認	Bing Wang, Changyu Ren, Jian Yang, Xinnian Liang, Jiaqi Bai, Linzheng Chai, Zhao Yan, Qian-Wen Zhang, Di Yin, Xing Sun, Zhoujun Li,	(参考訳) 近年の LLM ベースの Text-to-SQL メソッドは "巨大な" データベースや,マルチステップ推論を必要とする複雑なユーザ質問に対して,パフォーマンスが著しく低下している。さらに、既存のほとんどの手法は、外部ツールやモデルコラボレーションを利用したLLMの重要な重要性を無視している。これらの課題に対処するために,新しいLLMベースのマルチエージェント協調フレームワークであるMAC-SQLを紹介する。本フレームワークは,外部ツールやモデルを用いてより小さなサブデータベースを取得し,誤ったSQLクエリを精査する2つの補助エージェントを伴って,数発の連鎖推論によるテキストからSQL生成のためのコアデコンポーザエージェントで構成されている。 Decomposerエージェントは、必要に応じてアクティベートされ、Text-to-SQLパースのための新機能やツールに対応するために拡張可能な補助エージェントと協調する。我々のフレームワークでは、まず、GPT-4 を全てのエージェントタスクの強力なバックボーン LLM として利用し、フレームワークの上限を決定する。次に、Code Llama 7Bを活用して、オープンソースの命令フォローモデルSQL-Llamaを微調整し、GPT-4のように全てのタスクを達成します。実験によると、SQL-Llama はバニラ GPT-4 のベースライン精度 46.35 と比較して 43.94 の実行精度を達成している。執筆時点で、MAC-SQL+GPT-4はBIRDベンチマークで評価すると59.59の実行精度を達成し、そのホールドアウトテストセット(https://github.com/wbbeyourself/MAC-SQL)上に新しい最先端(SOTA)を確立する。 Recent LLM-based Text-to-SQL methods usually suffer from significant performance degradation on "huge" databases and complex user questions that require multi-step reasoning. Moreover, most existing methods neglect the crucial significance of LLMs utilizing external tools and model collaboration. To address these challenges, we introduce MAC-SQL, a novel LLM-based multi-agent collaborative framework. Our framework comprises a core decomposer agent for Text-to-SQL generation with few-shot chain-of-thought reasoning, accompanied by two auxiliary agents that utilize external tools or models to acquire smaller sub-databases and refine erroneous SQL queries. The decomposer agent collaborates with auxiliary agents, which are activated as needed and can be expanded to accommodate new features or tools for effective Text-to-SQL parsing. 
In our framework, we initially leverage GPT-4 as the strong backbone LLM for all agent tasks to determine the upper bound of our framework. We then fine-tune an open-source instruction-following model, SQL-Llama, by leveraging Code Llama 7B, to accomplish all tasks as GPT-4 does. Experiments show that SQL-Llama achieves a comparable execution accuracy of 43.94, compared to the baseline accuracy of 46.35 for vanilla GPT-4. At the time of writing, MAC-SQL+GPT-4 achieves an execution accuracy of 59.59 when evaluated on the BIRD benchmark, establishing a new state-of-the-art (SOTA) on its holdout test set (https://github.com/wbbeyourself/MAC-SQL).	翻訳日:2024-06-19 07:14:24 公開日:2024-06-17
# StarCraft IIをプレイする大規模言語モデル - ベンチマークと要約の連鎖アプローチ Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach ( http://arxiv.org/abs/2312.11865v2 ) ライセンス: Link先を確認	Weiyu Ma, Qirui Mi, Xue Yan, Yuqiao Wu, Runji Lin, Haifeng Zhang, Jun Wang,	(参考訳) StarCraft IIは、正確なマイクロレベルの操作と戦略的なマクロ認識の両方を必要とするため、AIエージェントにとって困難なベンチマークである。AlphastarやSCCといった以前の研究は、StarCraft IIで素晴らしい成果を上げているが、長期的な戦略計画と戦略の解釈可能性には依然として欠点がある。VoyageやMetaGPTといった新たな大規模言語モデル(LLM)エージェントは、複雑なタスクを解決する大きな可能性を示している。そこで我々は、高度に複雑なRTSゲームであるStarCraft IIにおけるLLMの能力を検証することを目指す。LLMの推論能力を最大限活用するために、まず、LLMエージェントが対話できるテキストベースのStarCraft II環境であるTextStarCraft IIを開発する。第2に、生の観測を処理する単一フレーム要約と、ゲーム情報の分析、コマンドの推奨、戦略的意思決定の生成を行う複数フレーム要約からなる、要約の連鎖(Chain of Summarization)手法を提案する。実験は2つの部分からなる。第1は人間の専門家による評価であり、LLMのStarCraft II知識の習熟度とゲームにおけるLLMエージェントのパフォーマンスを評価する。第2はLLMエージェントのゲーム内パフォーマンスであり、勝率や要約の連鎖の影響といった側面を含む。実験結果は以下を示す。 1. LLMは、StarCraft IIのシナリオに対処するために必要な知識と複雑な計画能力を有する。 2. 人間の専門家は、LLMエージェントのパフォーマンスを、StarCraft IIを8年間プレイした平均的なプレイヤーに近いものとみなす。 3. LLMエージェントは、Harder(Lv5)の難易度の組み込みAIを倒すことができる。コードをオープンソース化し、LLMエージェントがStarCraft IIをプレイするデモビデオを公開した。 StarCraft II is a challenging benchmark for AI agents due to the necessity of both precise micro-level operations and strategic macro awareness. Previous works, such as Alphastar and SCC, achieve impressive performance on StarCraft II but still exhibit deficiencies in long-term strategic planning and strategy interpretability. Emerging large language model (LLM) agents, such as Voyage and MetaGPT, show immense potential for solving intricate tasks. Motivated by this, we aim to validate the capabilities of LLMs on StarCraft II, a highly complex RTS game. To take full advantage of LLMs' reasoning abilities, we first develop a textual StarCraft II environment, called TextStarCraft II, with which LLM agents can interact. 
Secondly, we propose a Chain of Summarization method, including single-frame summarization for processing raw observations and multi-frame summarization for analyzing game information, providing command recommendations, and generating strategic decisions. Our experiment consists of two parts: first, an evaluation by human experts, which includes assessing the LLMs' mastery of StarCraft II knowledge and the performance of LLM agents in the game; second, the in-game performance of LLM agents, encompassing aspects like win rate and the impact of Chain of Summarization. Experiment results demonstrate that: 1. LLMs possess the relevant knowledge and complex planning abilities needed to address StarCraft II scenarios; 2. Human experts consider the performance of LLM agents to be close to that of an average player who has played StarCraft II for eight years; 3. LLM agents are capable of defeating the built-in AI at the Harder (Lv5) difficulty level. We have open-sourced the code and released demo videos of LLM agents playing StarCraft II.	翻訳日:2024-06-19 07:14:24 公開日:2024-06-17
# SHAPの形成:レイヤワイズ近隣選択による安定性向上 Shaping Up SHAP: Enhancing Stability through Layer-Wise Neighbor Selection ( http://arxiv.org/abs/2312.12115v2 ) ライセンス: Link先を確認	Gwladys Kelodjou, Laurence Rozé, Véronique Masson, Luis Galárraga, Romaric Gaudel, Maurice Tchuente, Alexandre Termier,	(参考訳) ディープラーニングやアンサンブル手法などの機械学習技術は、複雑な現実世界のタスクを処理できるため、様々な領域で広く使われている。しかしながら、ブラックボックスの性質は、コンピュータによる意思決定の公平性、信頼性、透明性について、様々な懸念を引き起こしている。これにより、ブラックボックスアルゴリズムによる個々の決定に関する説明を提供する、ローカルなポストホックな説明可能性法が出現した。これらの手法の中で、Kernel SHAPはモデルに依存しない性質と十分に確立された理論的枠組みのために広く使われている。これらの強みにもかかわらず、Kernel SHAPは高い不安定さに悩まされ、同じ入力を持つメソッドの異なる実行は、非常に異なる説明をもたらす可能性があるため、説明の関連性が低下する。本論文の貢献は2つある。一方, Kernel SHAP の不安定性は確率的近傍選択法によって引き起こされることを示す。一方、第1層の連立と呼ばれる第1層の摂動に隣人の世代を限定することにより、完全に安定し、計算効率が良く、意味のある新しい特徴帰属法が得られることを示す。 Machine learning techniques, such as deep learning and ensemble methods, are widely used in various domains due to their ability to handle complex real-world tasks. However, their black-box nature has raised multiple concerns about the fairness, trustworthiness, and transparency of computer-assisted decision-making. This has led to the emergence of local post-hoc explainability methods, which offer explanations for individual decisions made by black-box algorithms. Among these methods, Kernel SHAP is widely used due to its model-agnostic nature and its well-founded theoretical framework. Despite these strengths, Kernel SHAP suffers from high instability: different executions of the method with the same inputs can lead to significantly different explanations, which diminishes the relevance of the explanations. The contribution of this paper is two-fold. On the one hand, we show that Kernel SHAP's instability is caused by its stochastic neighbor selection procedure, which we adapt to achieve full stability without compromising explanation fidelity. 
On the other hand, we show that by restricting the neighbors generation to perturbations of size 1 -- which we call the coalitions of Layer 1 -- we obtain a novel feature-attribution method that is fully stable, computationally efficient, and still meaningful.	翻訳日:2024-06-19 07:14:24 公開日:2024-06-17
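As a rough illustration of why restricting neighbors to size-1 perturbations removes the randomness, the sketch below enumerates every single-feature perturbation of an instance and attributes to each feature the resulting change in model output. It is a simplified occlusion-style estimator written for this listing, not the paper's Layer 1 method; the function name `layer1_attributions`, the `background` reference vector, and the toy model are all assumptions.

```python
from typing import Callable, List, Sequence


def layer1_attributions(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    background: Sequence[float],
) -> List[float]:
    """Attribute model(x) to features using only size-1 perturbations.

    For each feature i, x[i] is replaced by background[i] (a hypothetical
    reference value) and the drop in output is recorded. All perturbations
    are enumerated, so nothing is sampled and repeated calls agree exactly.
    """
    base = model(x)
    attributions = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = background[i]  # perturb exactly one feature
        attributions.append(base - model(perturbed))
    return attributions


def toy_model(v: Sequence[float]) -> float:
    """Hypothetical linear black box used only for the demonstration."""
    return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2]


phi = layer1_attributions(toy_model, [1.0, 2.0, 4.0], [0.0, 0.0, 0.0])
phi_again = layer1_attributions(toy_model, [1.0, 2.0, 4.0], [0.0, 0.0, 0.0])
```

For a linear model such as `toy_model`, these values coincide with the exact Shapley values relative to the chosen background; for nonlinear models they are only a first-order picture.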
# 非線形量子古典力学におけるコオポモン軌道 Koopmon trajectories in nonadiabatic quantum-classical dynamics ( http://arxiv.org/abs/2312.13878v2 ) ライセンス: Link先を確認	Werner Bauer, Paul Bergold, François Gay-Balmaz, Cesare Tronci,	(参考訳) 完全量子非線形力学の計算コストを軽減するために、クープマン波動関数の理論に基づく混合量子古典(MQC)粒子法を提案する。従来のMQCモデルは、ハイゼンベルクの原理に違反したような一貫性の問題にしばしば悩まされるが、我々は、コップマンの古典力学をヒルベルト空間とシンプレクティック幾何学の手法を混合することにより、これらの困難を克服する。結果の連続体モデルは変分構造とハミルトン構造の両方を楽しむが、その非線形な性格は適切な閉包を求める。ここでは、基本となる行動原理から、以前にチーム内で開発された正規化手法を適用します。このステップは、計算粒子の軌道、コオポモン(英語版)、位相空間におけるラグランジアン古典経路のサンプリングを導入する特異解アンザッツ(英語版)を可能にする。タリーの非線形問題の場合、標準的なMQCエレンフェストシミュレーションでは達成できない精度のレベルで完全に量子シミュレーションの結果を再現する。さらに、コオポモン法は、同様の完全量子アプローチに対して計算的に有利であり、本研究でも検討されている。さらに, MQC 処理がほとんど適用できない超強結合系と深部強結合系の両方において, Rabi 問題を考慮し, 手法の限界を検証した。この場合、この手法は完全な量子結果の一部を再現することに成功した。 In order to alleviate the computational costs of fully quantum nonadiabatic dynamics, we present a mixed quantum-classical (MQC) particle method based on the theory of Koopman wavefunctions. Although conventional MQC models often suffer from consistency issues such as the violation of Heisenberg's principle, we overcame these difficulties by blending Koopman's classical mechanics on Hilbert spaces with methods in symplectic geometry. The resulting continuum model enjoys both a variational and a Hamiltonian structure, while its nonlinear character calls for suitable closures. Benefiting from the underlying action principle, here we apply a regularization technique previously developed within our team. This step allows for a singular solution ansatz which introduces the trajectories of computational particles - the koopmons - sampling the Lagrangian classical paths in phase space. In the case of Tully's nonadiabatic problems, the method reproduces the results of fully quantum simulations with levels of accuracy that are not achieved by standard MQC Ehrenfest simulations. 
In addition, the koopmon method is computationally advantageous over similar fully quantum approaches, which are also considered in our study. As a further step, we probe the limits of the method by considering the Rabi problem in both the ultrastrong and the deep strong coupling regimes, where MQC treatments appear hardly applicable. In this case, the method succeeds in reproducing parts of the fully quantum results.	翻訳日:2024-06-19 07:14:24 公開日:2024-06-17
# パスワイズラッソのための量子アルゴリズム Quantum Algorithms for the Pathwise Lasso ( http://arxiv.org/abs/2312.14141v2 ) ライセンス: Link先を確認	Joao F. Doriguello, Debbie Lim, Chi Seng Pun, Patrick Rebentrost, Tushar Vaidya,	(参考訳) 古典的LARS(Least Angle Regression)パスワイズアルゴリズムに基づいて,$\ell_1$-penaltyの量子高次元線形回帰アルゴリズムを提案する。ラッソの古典的アルゴリズムと同様に、我々の量子アルゴリズムは、ペナルティ項が変化するにつれて完全な正規化パスを提供するが、特定の条件下では反復ごとに2次的に高速である。 D\"urr と Hoyer (arXiv'96) の量子最小フィンディングルーチンを使うことで、各イテレーションにおける結合時間を取得することで、機能数$d$の二次的なスピードアップが可能になる。次に、この単純な量子アルゴリズムを改善し、Chen と de Wolf (ICALP'23) の近似量子最小有限ルーチンを用いて、特徴数 $d$ と観測数 $n$ の両方で二次的なスピードアップを得る。我々の主な貢献の一つとして、近似量子最小探索によって探索される結合時間を近似的に計算する量子ユニタリを構築している。結合時間はもはや正確に計算されないため、得られた近似量子アルゴリズムが良い解を得るかどうかはもはや明らかではない。 2つ目の主な貢献として、KKT条件の近似バージョンと双対性ギャップを通じて、LARSアルゴリズム(つまり我々の量子アルゴリズム)がエラーに対して堅牢であることを証明する。これは、結合時間がほぼ計算された場合、ラッソのコスト関数を小さな誤差まで最小化する経路を出力することを意味する。さらに、ガウス分布から観測結果がサンプリングされると、量子アルゴリズムの複雑さは、古典的なLARSアルゴリズムよりも指数関数的に良い$n$にのみ依存し、$d$に二次的な改善を保っていることを示す。最後に、標準的なLARSアルゴリズムから$d$の線形スケーリングとともに、$n$の多元対数依存を保ち続ける非等化アルゴリズムを提案する。 We present a novel quantum high-dimensional linear regression algorithm with an $\ell_1$-penalty based on the classical LARS (Least Angle Regression) pathwise algorithm. Similarly to available classical algorithms for Lasso, our quantum algorithm provides the full regularisation path as the penalty term varies, but quadratically faster per iteration under specific conditions. A quadratic speedup on the number of features $d$ is possible by using the quantum minimum-finding routine from D\"urr and Hoyer (arXiv'96) in order to obtain the joining time at each iteration. We then improve upon this simple quantum algorithm and obtain a quadratic speedup both in the number of features $d$ and the number of observations $n$ by using the approximate quantum minimum-finding routine from Chen and de Wolf (ICALP'23). As one of our main contributions, we construct a quantum unitary to approximately compute the joining times to be searched over by the approximate quantum minimum finding. 
Since the joining times are no longer exactly computed, it is no longer clear that the resulting approximate quantum algorithm obtains a good solution. As our second main contribution, we prove, via an approximate version of the KKT conditions and a duality gap, that the LARS algorithm (and thus our quantum algorithm) is robust to errors. This means that it still outputs a path that minimises the Lasso cost function up to a small error if the joining times are approximately computed. Moreover, we show that, when the observations are sampled from a Gaussian distribution, our quantum algorithm's complexity only depends polylogarithmically on $n$, exponentially better than the classical LARS algorithm, while keeping the quadratic improvement on $d$. Finally, we propose a dequantised algorithm that also retains the polylogarithmic dependence on $n$, albeit with the linear scaling on $d$ from the standard LARS algorithm.	翻訳日:2024-06-19 07:14:24 公開日:2024-06-17
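The robustness argument above relies on approximate KKT conditions. As a hedged sketch, the helper below measures how far a candidate vector is from satisfying the standard Lasso optimality conditions for the objective (1/2)·||y − Xβ||² + λ·||β||₁. That scaling, the function name `lasso_kkt_violation`, and the toy data are assumptions made for illustration; the paper's exact formulation and error measure may differ.

```python
from typing import List


def lasso_kkt_violation(
    X: List[List[float]], y: List[float], beta: List[float], lam: float
) -> float:
    """Return the largest violation of the Lasso KKT conditions.

    With c_j = X_j^T (y - X beta), optimality requires
      c_j = lam * sign(beta_j)  when beta_j != 0,
      |c_j| <= lam              when beta_j == 0.
    A value of 0.0 means beta is exactly optimal.
    """
    n, d = len(X), len(beta)
    residual = [y[i] - sum(X[i][j] * beta[j] for j in range(d)) for i in range(n)]
    worst = 0.0
    for j in range(d):
        corr = sum(X[i][j] * residual[i] for i in range(n))
        if beta[j] != 0.0:
            sign = 1.0 if beta[j] > 0 else -1.0
            violation = abs(corr - lam * sign)
        else:
            violation = max(0.0, abs(corr) - lam)
        worst = max(worst, violation)
    return worst


# Toy check with an identity design, where the Lasso solution is the
# soft-thresholded response: soft(3.0, 1.0) = 2.0 and soft(0.5, 1.0) = 0.0.
X_demo = [[1.0, 0.0], [0.0, 1.0]]
y_demo = [3.0, 0.5]
exact_violation = lasso_kkt_violation(X_demo, y_demo, [2.0, 0.0], lam=1.0)
ols_violation = lasso_kkt_violation(X_demo, y_demo, [3.0, 0.5], lam=1.0)
```

A small but nonzero return value corresponds to the "approximately optimal" regime that the abstract describes.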
# README:データ中心NLPによる医療ジャーゴンのブリッジと患者教育への理解 README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP ( http://arxiv.org/abs/2312.15561v3 ) ライセンス: Link先を確認	Zonghai Yao, Nandyala Siddharth Kantu, Guanghao Wei, Hieu Tran, Zhangqi Duan, Sunjae Kwon, Zhichao Yang, README annotation team, Hong Yu,	(参考訳) 医療の進歩は、患者中心のアプローチ、特にElectronic Health Records(EHR)へのアクセスによって促進されるセルフケアと患者教育に焦点を移している。しかし, EHRの医療ジャーゴンは, 患者の理解に重大な課題をもたらす。そこで我々は,複雑な医療用語を患者フレンドリーなレイ言語に単純化することを目的とした,レイ定義を自動的に生成する新しいタスクを提案する。最初、READMEデータセットを作成しました。これは、5万以上のユニークな(医療用語、レイ定義)ペアと30万の言及の広範なコレクションで、それぞれがドメインの専門家が手動で注釈付けしたコンテキスト対応のレイ定義を提供しています。また、データフィルタリング、拡張、選択を相乗化してデータ品質を改善する、データ中心のHuman-AIパイプラインも開発しました。その後、READMEをモデルトレーニングデータとして使用し、検索補助生成法を用いて幻覚を低減し、モデル出力の品質を向上させる。我々の大規模な自動および人為的評価は、高品質なデータで微調整されたオープンソースのモバイルフレンドリなモデルが、ChatGPTのような最先端のクローズドソースな大規模言語モデルの性能にマッチしたり、超えたりできることを示している。この研究は、患者教育における知識ギャップを解消し、患者中心の医療ソリューションを前進させる重要な取り組みである。 The advancement in healthcare has shifted focus toward patient-centric approaches, particularly in self-care and patient education, facilitated by access to Electronic Health Records (EHR). However, medical jargon in EHRs poses significant challenges in patient comprehension. To address this, we introduce a new task of automatically generating lay definitions, aiming to simplify complex medical terms into patient-friendly lay language. We first created the README dataset, an extensive collection of over 50,000 unique (medical term, lay definition) pairs and 300,000 mentions, each offering context-aware lay definitions manually annotated by domain experts. We have also engineered a data-centric Human-AI pipeline that synergizes data filtering, augmentation, and selection to improve data quality. We then used README as the training data for models and leveraged a Retrieval-Augmented Generation method to reduce hallucinations and improve the quality of model outputs. 
Our extensive automatic and human evaluations demonstrate that open-source mobile-friendly models, when fine-tuned with high-quality data, are capable of matching or even surpassing the performance of state-of-the-art closed-source large language models like ChatGPT. This research represents a significant stride in closing the knowledge gap in patient education and advancing patient-centric healthcare solutions.	翻訳日:2024-06-19 07:14:24 公開日:2024-06-17
# インストラクションフュージョン:ハイブリッド化によるプロンプト進化の促進 Instruction Fusion: Advancing Prompt Evolution through Hybridization ( http://arxiv.org/abs/2312.15692v4 ) ライセンス: Link先を確認	Weidong Guo, Jiuding Yang, Kaitong Yang, Xiangyang Li, Zhuwei Rao, Yu Xu, Di Niu,	(参考訳) コード生成に特化したLLM(Large Language Models)の微調整は、オープンドメインのコーディングクエリを使うことで、顕著な進歩を遂げた。成功にもかかわらず、Evol-Instructのような既存の方法論はパフォーマンスの制限に直面し、コード生成タスクのさらなる強化を妨げる。本稿では,既存のプロンプト進化手法の制約について検討し,新しいアプローチであるインストラクション・フュージョン(IF)を導入する。 IFは、ハイブリッド化プロセスを通じて、2つの異なるプロンプトを革新的に組み合わせ、コードLLMのトレーニングプロンプトの進化を強化する。提案手法は,HumanEval,HumanEval+,MBPP,MBPP+,MultiPL-Eの5つのコード生成ベンチマークにおけるコードLLMの性能を著しく向上し,コード生成におけるLLMの能力向上にインストラクションフュージョンが有効であることを示す。 The fine-tuning of Large Language Models (LLMs) specialized in code generation has seen notable advancements through the use of open-domain coding queries. Despite the successes, existing methodologies like Evol-Instruct encounter performance limitations, impeding further enhancements in code generation tasks. This paper examines the constraints of existing prompt evolution techniques and introduces a novel approach, Instruction Fusion (IF). IF innovatively combines two distinct prompts through a hybridization process, thereby enhancing the evolution of training prompts for code LLMs. Our experimental results reveal that the proposed novel method effectively addresses the shortcomings of prior methods, significantly improving the performance of Code LLMs across five code generation benchmarks, namely HumanEval, HumanEval+, MBPP, MBPP+ and MultiPL-E, which underscore the effectiveness of Instruction Fusion in advancing the capabilities of LLMs in code generation.	翻訳日:2024-06-19 07:14:24 公開日:2024-06-17
# 量子誤り訂正符号の準最適性能 The Near-optimal Performance of Quantum Error Correction Codes ( http://arxiv.org/abs/2401.02022v2 ) ライセンス: Link先を確認	Guo Zheng, Wenhao He, Gideon Lee, Liang Jiang,	(参考訳) Knill-Laflamme (KL) 条件は正確な量子誤り訂正符号を区別し、最先端の符号の発見に重要な役割を果たしている。しかし、正確な符号の族は非常に制限的であり、必ずしも最高の性能の符号を含まない。したがって、一般化された定量的な性能指標を開発することが望ましい。このレターでは、任意の符号と雑音に対する簡潔で最適化のない計量である準最適チャネル忠実度を導出する。この計量は最適なコード性能に限定した狭い2辺の値を与え、KL条件で要求されるのと全く同じ入力で評価することができる。複数の量子ビット符号と発振器符号の例を通して、準最適チャネル忠実度の数値的利点を示す。従来の最適化手法と比較して、計算コストの削減により、何百もの平均励起を符号化する発振器など、これまでアクセスできない大きさのシステムをシミュレートすることができる。さらに,熱力学符号とGottesman-Kitaev-Preskill (GKP)符号のほぼ最適性能を解析的に導出した。特に、励起損失下でのGKP符号の性能は、そのエネルギーと単調に改善し、他の発振器符号とは異なる無限エネルギーでの漸近極限に収束する。 The Knill-Laflamme (KL) conditions distinguish exact quantum error correction codes, and it has played a critical role in the discovery of state-of-the-art codes. However, the family of exact codes is a very restrictive one and does not necessarily contain the best-performing codes. Therefore, it is desirable to develop a generalized and quantitative performance metric. In this Letter, we derive the near-optimal channel fidelity, a concise and optimization-free metric for arbitrary codes and noise. The metric provides a narrow two-sided bound to the optimal code performance, and it can be evaluated with exactly the same input required by the KL conditions. We demonstrate the numerical advantage of the near-optimal channel fidelity through multiple qubit code and oscillator code examples. Compared to conventional optimization-based approaches, the reduced computational cost enables us to simulate systems with previously inaccessible sizes, such as oscillators encoding hundreds of average excitations. Moreover, we analytically derive the near-optimal performance for the thermodynamic code and the Gottesman-Kitaev-Preskill (GKP) code. 
In particular, the GKP code's performance under excitation loss improves monotonically with its energy and converges to an asymptotic limit at infinite energy, which is distinct from other oscillator codes.	翻訳日:2024-06-19 07:04:39 公開日:2024-06-17
# t-DGR:意思決定における連続学習のための軌道ベース深層生成再生法 t-DGR: A Trajectory-Based Deep Generative Replay Method for Continual Learning in Decision Making ( http://arxiv.org/abs/2401.02576v2 ) ライセンス: Link先を確認	William Yue, Bo Liu, Peter Stone,	(参考訳) 深い生成的リプレイは、意思決定タスクにおける継続的な学習のための有望なアプローチとして現れてきた。このアプローチは、これまで遭遇したタスクからトラジェクトリの生成を活用して、現在のデータセットを増大させることによって、破滅的な忘れの問題に対処する。しかし、既存の連続学習のための深層生成的再生法は、生成した軌跡の複雑な誤りに悩まされる自己回帰モデルに依存している。本稿では,軌道上の時間ステップに条件付きタスクサンプルを生成する生成モデルを用いて,意思決定タスクにおける継続学習のためのシンプルでスケーラブルで非自己回帰的手法を提案する。提案手法は連続世界ベンチマークで評価し, 連続学習手法の平均成功率測定値から最先端のパフォーマンスを達成できることを確認した。コードはhttps://github.com/WilliamYue37/t-DGRで公開されている。 Deep generative replay has emerged as a promising approach for continual learning in decision-making tasks. This approach addresses the problem of catastrophic forgetting by leveraging the generation of trajectories from previously encountered tasks to augment the current dataset. However, existing deep generative replay methods for continual learning rely on autoregressive models, which suffer from compounding errors in the generated trajectories. In this paper, we propose a simple, scalable, and non-autoregressive method for continual learning in decision-making tasks using a generative model that generates task samples conditioned on the trajectory timestep. We evaluate our method on Continual World benchmarks and find that our approach achieves state-of-the-art performance on the average success rate metric among continual learning methods. Code is available at https://github.com/WilliamYue37/t-DGR.	翻訳日:2024-06-19 07:04:39 公開日:2024-06-17
# MLLM-Protector:HurtingパフォーマンスのないMLLMの安全性を保証する MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance ( http://arxiv.org/abs/2401.02906v3 ) ライセンス: Link先を確認	Renjie Pi, Tianyang Han, Jianshu Zhang, Yueqi Xie, Rui Pan, Qing Lian, Hanze Dong, Jipeng Zhang, Tong Zhang,	(参考訳) マルチモーダルな大規模言語モデル(MLLM)の展開は、視覚入力による悪意のある攻撃に対する感受性という、ユニークな脆弱性を生み出した。本稿では,このような攻撃に対してMLLMを防御する新たな課題について検討する。大型言語モデル (LLM) と比較して、MLLM には追加の画像モダリティが含まれている。画像は安全アライメント時に考慮されない「外部言語」として機能し、MLLMは有害な応答を生じやすくする。残念なことに、テキストベースのLLMで考慮された離散トークンとは異なり、画像信号の連続的な性質は重要なアライメントの課題を示しており、すべてのシナリオを完全にカバーすることは困難である。この脆弱性は、ほとんどの最先端のMLLMが、大規模なテキストベースの事前学習コーパスよりもはるかに少ない制限された画像テキストペアで微調整されているという事実により、さらに悪化している。これらの課題に対処するために,2つのサブタスクを解決するプラグアンドプレイ戦略であるMLLM-Protectorを導入する。 1)軽量害検知器を介して有害な応答を識別し、 2) 有害な応答を除毒剤を介して無害な応答に変換する。このアプローチは、MLLMの本来の性能を損なうことなく、悪意ある視覚入力によって引き起こされるリスクを効果的に軽減する。 MLLM-Protectorは,MLLMセキュリティの未適応な側面に対して,堅牢なソリューションを提供することを示す。 The deployment of multimodal large language models (MLLMs) has brought forth a unique vulnerability: susceptibility to malicious attacks through visual inputs. This paper investigates the novel challenge of defending MLLMs against such attacks. Compared to large language models (LLMs), MLLMs include an additional image modality. We discover that images act as a "foreign language" that is not considered during safety alignment, making MLLMs more prone to producing harmful responses. Unfortunately, unlike the discrete tokens considered in text-based LLMs, the continuous nature of image signals presents significant alignment challenges, making it difficult to thoroughly cover all possible scenarios. This vulnerability is exacerbated by the fact that most state-of-the-art MLLMs are fine-tuned on limited image-text pairs that are much fewer than the extensive text-based pretraining corpus, which makes the MLLMs more prone to catastrophic forgetting of their original abilities during safety fine-tuning. 
To tackle these challenges, we introduce MLLM-Protector, a plug-and-play strategy that solves two subtasks: 1) identifying harmful responses via a lightweight harm detector, and 2) transforming harmful responses into harmless ones via a detoxifier. This approach effectively mitigates the risks posed by malicious visual inputs without compromising the original performance of MLLMs. Our results demonstrate that MLLM-Protector offers a robust solution to a previously unaddressed aspect of MLLM security.	翻訳日:2024-06-19 07:04:39 公開日:2024-06-17
# 現実的な量子系のシミュレーションにおける過度パラメトリゼーションのキャラクタリゼーション Characterization of overparametrization in the simulation of realistic quantum systems ( http://arxiv.org/abs/2401.05500v2 ) ライセンス: Link先を確認	Matthew Duschenes, Juan Carrasquilla, Raymond Laflamme,	(参考訳) 量子コンピューティングデバイスは、量子状態を作成し、他の量子システムをシミュレートするために、実験パラメータを例外的に制御する必要がある。このような最適制御パラメータを見つけるために使用される古典的な最適化手順は、様々な学習様式を示すために理想化された設定でさらに示されている。十分な数のパラメータを持つシステムでは、準備された状態に対するグローバルな最適化とコンパイルされたユニタリ忠実度が指数関数的に速く到達する可能性がある。本稿では,演算子間のパラメータの有界化や共有など,実験的な制約が存在する場合の過パラメータ化現象のロバスト性や,実験的な設定に固有のノイズの存在について検討する。過度パラメータ化現象は、これらの現実的な環境では短時間で回復するが、量子ノイズまたは古典ノイズの蓄積により、臨界シミュレーション期間を過ぎて、忠実度はゼロに低下する。この臨界深度はノイズのスケールで対数的であり、最適忠実度は最初は深さで指数関数的に増加し、その後、深さで多項式的に減少し、ノイズで減少する。この結果から, パラメータ化アンサツェは環境からエントロピー効果を緩和し, 近い将来の量子デバイスでの実験的な実現を可能にした。 Quantum computing devices require exceptional control of their experimental parameters to prepare quantum states and simulate other quantum systems. Classical optimization procedures used to find such optimal control parameters, have further been shown in idealized settings to exhibit different regimes of learning. Of interest in this work is the overparameterization regime, where for systems with a sufficient number of parameters, global optima for prepared state and compiled unitary fidelities may potentially be reached exponentially quickly. Here, we study the robustness of overparameterization phenomena in the presence of experimental constraints on the controls, such as bounding or sharing parameters across operators, as well as in the presence of noise inherent to experimental setups. We observe that overparameterization phenomena are resilient in these realistic settings at short times, however fidelities decay to zero past a critical simulation duration due to accumulation of either quantum or classical noise. This critical depth is found to be logarithmic in the scale of noise, and optimal fidelities initially increase exponentially with depth, before decreasing polynomially with depth, and with noise. 
Our results demonstrate that parameterized ansatze can mitigate entropic effects from their environment, offering tantalizing opportunities for their application and experimental realization in near term quantum devices.	翻訳日:2024-06-19 07:04:39 公開日:2024-06-17
# 最小編集制約によるきめ細かい強化学習による大規模言語モデルの改善 Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint ( http://arxiv.org/abs/2401.06081v2 ) ライセンス: Link先を確認	Zhipeng Chen, Kun Zhou, Wayne Xin Zhao, Junchen Wan, Fuzheng Zhang, Di Zhang, Ji-Rong Wen,	(参考訳) 強化学習(Reinforcement Learning, RL)は、予期せぬアウトプットを防止し、有害性とエラーを減らすために、大規模言語モデル(LLM)の訓練に広く用いられている。しかし、既存のRLメソッドは、主にインスタンスレベルの報酬を採用しており、複雑な推論タスクのきめ細かい監督を提供することができず、不正につながるいくつかのキートークンに集中できない。そこで本研究では,生成モデルを報酬モデルとして組み込んだ新たなRL手法を提案する。これは,最小編集制約下での誤解書き換えタスクによってトレーニングされ,RLトレーニングのためのトークンレベル報酬を生成することができる。生成報酬モデルに基づいて、トレーニングのためのトークンレベルRL目標と、RLプロセスの安定化のための模倣ベース正規化を設計する。両方の目的は、誤った解に対するキートークンの学習に集中し、他の重要でないトークンの影響を減らします。数学的タスクと質問応答タスクの実験結果から,本手法の有効性が示された。私たちのコードとデータはhttps://github.com/RUCAIBox/RLMECで公開されています。 Reinforcement learning (RL) has been widely used in training large language models (LLMs) to prevent unexpected outputs, e.g., reducing harmfulness and errors. However, existing RL methods mostly adopt instance-level rewards, which cannot provide fine-grained supervision for complex reasoning tasks and cannot focus on the few key tokens that lead to incorrectness. To address this, we propose a new RL method named RLMEC that incorporates a generative model as the reward model, which is trained on an erroneous-solution rewriting task under the minimum editing constraint and can produce token-level rewards for RL training. Based on the generative reward model, we design a token-level RL objective for training and an imitation-based regularization for stabilizing the RL process. Both objectives focus on learning the key tokens of the erroneous solution, reducing the effect of other unimportant tokens. Experimental results on mathematical tasks and question-answering tasks demonstrate the effectiveness of our approach. Our code and data are available at https://github.com/RUCAIBox/RLMEC.	翻訳日:2024-06-19 07:04:39 公開日:2024-06-17
# Stockformer: Wavelet Transform と Multi-Task Self-Attention Network に基づく価格変数ストック選択モデル Stockformer: A Price-Volume Factor Stock Selection Model Based on Wavelet Transform and Multi-Task Self-Attention Networks ( http://arxiv.org/abs/2401.06139v2 ) ライセンス: Link先を確認	Bohan Ma, Yushan Xue, Yuan Lu, Jing Chen,	(参考訳) 中国株式市場が発展し、市場構造が複雑化するにつれ、伝統的な量的取引手法はエスカレートする課題に直面している。特に、政策の不確実性や突然の経済的な出来事によって引き起こされる市場の頻繁な変動により、既存のモデルは市場のダイナミクスを正確に予測するのに苦労することが多い。これらの課題に対処するため,市場不安定性に関する応答性と予測精度の向上を目的とした,ウェーブレット変換とマルチタスク自己注意ネットワークを統合した価格-体積係数ストックセレクションモデルであるStockformerを紹介した。離散ウェーブレット変換により、ストックフォーマーは株価のリターンを高頻度と低頻度に分解し、急激な出来事を含む長期市場のトレンドと短期的な変動を注意深く捉えている。さらに、このモデルには、二重周波数時空間エンコーダとグラフ埋め込み技術が組み込まれ、ストック間の複雑な時間的および空間的関係を効果的に捉えることができる。マルチタスク学習戦略を採用することで、株価のリターンと方向性の傾向を同時に予測する。実験結果から、Stockformerは複数の実市場データセットにおいて、既存の先進的な手法よりも優れていることが示された。ストラテジーバックテストにおいて、Stockformerは、ダウンターンや揮発性の期間に特に高いパフォーマンスを維持し、市場の変動に高い適応性を示すような、市場条件全体にわたる例外的な安定性と信頼性を一貫して示している。金融分析分野におけるイノベーションとコラボレーションを促進するため、Stockformerモデルのコードはオープンソースとして公開され、GitHubリポジトリで公開されている。 As the Chinese stock market continues to evolve and its market structure grows increasingly complex, traditional quantitative trading methods are facing escalating challenges. Particularly, due to policy uncertainty and the frequent market fluctuations triggered by sudden economic events, existing models often struggle to accurately predict market dynamics. To address these challenges, this paper introduces Stockformer, a price-volume factor stock selection model that integrates wavelet transformation and a multitask self-attention network, aimed at enhancing responsiveness and predictive accuracy regarding market instabilities. Through discrete wavelet transform, Stockformer decomposes stock returns into high and low frequencies, meticulously capturing long-term market trends and short-term fluctuations, including abrupt events. 
Moreover, the model incorporates a Dual-Frequency Spatiotemporal Encoder and graph embedding techniques to effectively capture complex temporal and spatial relationships among stocks. Employing a multitask learning strategy, it simultaneously predicts stock returns and directional trends. Experimental results show that Stockformer outperforms existing advanced methods on multiple real stock market datasets. In strategy backtesting, Stockformer consistently demonstrates exceptional stability and reliability across market conditions (whether rising, falling, or fluctuating), maintaining particularly high performance during downturns or volatile periods, which indicates a high adaptability to market fluctuations. To foster innovation and collaboration in the financial analysis sector, the Stockformer model's code has been open-sourced and is available on the GitHub repository: https://github.com/Eric991005/Multitask-Stockformer.	翻訳日:2024-06-19 06:54:55 公開日:2024-06-17
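To make the frequency split concrete, here is a minimal sketch of a one-level discrete wavelet transform that separates a return series into a low-frequency (trend) part and a high-frequency (fluctuation) part. The abstract does not state which wavelet Stockformer uses; the Haar wavelet, the function names, and the sample returns below are assumptions chosen for brevity.

```python
import math
from typing import List, Tuple


def haar_dwt(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One-level Haar DWT: returns (approximation, detail) coefficients.

    The approximation holds scaled pairwise sums (low frequency); the
    detail holds scaled pairwise differences (high frequency).
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Invert haar_dwt, recovering the original series."""
    s = math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


# Hypothetical daily returns, used only to exercise the transform.
returns = [0.01, 0.03, -0.02, 0.00, 0.05, 0.01, -0.01, -0.03]
low, high = haar_dwt(returns)
reconstructed = haar_idwt(low, high)
```

Because the transform is invertible, no information is lost by handing the two components to separate encoders.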
# 完全連結フィードフォワードニューラルネットワークにおける重み最適化のための閉形式解法 A Closed-form Solution for Weight Optimization in Fully-connected Feed-forward Neural Networks ( http://arxiv.org/abs/2401.06699v2 ) ライセンス: Link先を確認	Slavisa Tomic, João Pedro Matos-Carvalho, Marko Beko,	(参考訳) 本研究は、完全連結フィードフォワードニューラルネットワークにおける重み付け最適化問題に対処する。バックプロパゲーション(BP)とチェーン規則勾配に基づく最適化(場合によっては繰り返し実行、潜在的に重荷、時間を要する)に基づく既存の手法とは異なり、提案手法は最小二乗法(LS)法を用いて閉形式における重み付け最適化の解を提供する。インプット・トゥ・アウトプット・マッピングがインジェクティブである場合、新しいアプローチでは、各ニューロンに対して各レイヤの重みのセットを共同で最適化することにより、1イテレーションでバックプロパゲーション方式で重みを最適化する。インプット・トゥ・アウトプット・マッピングが帰納的でない場合(例えば分類問題では)、提案手法は数イテレーションで最終解が得られるように容易に適応できる。既存のソリューションに対する重要な利点は、これらの計算(層内の全てのニューロン)が互いに独立していることである。さらに、その実行時間は、全てのネットワーク層の重みを最適化するために必要な計算の正確な数が得られるという意味で決定論的である(反復の場合、非射影写像の場合)。シミュレーションおよび実験結果から,提案手法であるBPLSは,既存の手法と精度で競合するが,実行時間ではかなり上回っていることがわかった。要約すると、新しい手法は実装が簡単で、既存の方法よりも競争力があり、計算効率が良く、並列実装に適している。 This work addresses weight optimization problem for fully-connected feed-forward neural networks. Unlike existing approaches that are based on back-propagation (BP) and chain rule gradient-based optimization (which implies iterative execution, potentially burdensome and time-consuming in some cases), the proposed approach offers the solution for weight optimization in closed-form by means of least squares (LS) methodology. In the case where the input-to-output mapping is injective, the new approach optimizes the weights in a back-propagating fashion in a single iteration by jointly optimizing a set of weights in each layer for each neuron. In the case where the input-to-output mapping is not injective (e.g., in classification problems), the proposed solution is easily adapted to obtain its final solution in a few iterations. An important advantage over the existing solutions is that these computations (for all neurons in a layer) are independent from each other; thus, they can be carried out in parallel to optimize all weights in a given layer simultaneously. 
Furthermore, its running time is deterministic in the sense that one can obtain the exact number of computations necessary to optimize the weights in all network layers (per iteration, in the case of non-injective mapping). Our simulation and empirical results show that the proposed scheme, BPLS, works well and is competitive with existing ones in terms of accuracy, but significantly surpasses them in terms of running time. To summarize, the new method is straightforward to implement, is competitive and computationally more efficient than the existing ones, and is well-tailored for parallel implementation.	翻訳日:2024-06-19 06:54:55 公開日:2024-06-17
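The closed-form idea can be illustrated for a single linear neuron: given inputs and targets, the least-squares weights follow from the normal equations in one step, with a fixed operation count. This sketch covers only that per-neuron step under the assumption of a linear (identity-activation) neuron; it does not reproduce the paper's layer-wise BPLS procedure, and all names and data are hypothetical.

```python
from typing import List


def solve_linear_system(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or ill-conditioned")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def least_squares_weights(X: List[List[float]], t: List[float]) -> List[float]:
    """Closed-form weights w minimizing ||X w - t||^2 via (X^T X) w = X^T t."""
    d = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * t[k] for k, row in enumerate(X)) for i in range(d)]
    return solve_linear_system(gram, rhs)


# Targets generated by the weights [2, -1]; the solver should recover them.
X_demo = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
t_demo = [2.0, -1.0, 1.0, 3.0]
w = least_squares_weights(X_demo, t_demo)
```

Each neuron's system is independent of the others in its layer, which is what makes the parallel execution mentioned above possible.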
# MADA: 高度劣化によるメタ適応最適化 MADA: Meta-Adaptive Optimizers through hyper-gradient Descent ( http://arxiv.org/abs/2401.08893v3 ) ライセンス: Link先を確認	Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher,	(参考訳) Adamの導入に続いて、ディープラーニングのための新しい適応最適化器が提案されている。これらのオプティマイザは一般的にいくつかのタスクで優れるが、すべてのタスクでAdamを均一に上回るものではない。本稿では,複数の既知のオプティマイザを一般化し,トレーニング中に最も適したオプティマイザを動的に学習する,統一オプティマイザフレームワークであるメタ適応オプティマイザ(MADA)を紹介する。 MADAのキーとなるアイデアは、オプティマイザの空間をパラメータ化して、トレーニング中に過度な降下を使って動的に探索することだ。我々は、MADAを視覚や言語タスクにおける他の人気のあるオプティマイザと経験的に比較し、MADAがAdamや他の人気のあるオプティマイザより一貫して優れており、サブ最適化されたハイパーパラメータに対して堅牢であることを確認した。 MADAは、GPT-2トレーニングや微調整において、他の一般的なオプティマイザと比較して、Adamよりも高い検証性能向上を実現している。 AVGradも提案する。AMSGradは最大演算子を平均演算子に置き換えたもので、高次最適化に適している。最後に、最適化器のパラメータ化補間が誤差境界(定数まで)を改善できることを示し、メタ最適化器の利点を示唆する収束解析を提供する。 Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce Meta-Adaptive Optimizers (MADA), a unified optimizer framework that can generalize several known optimizers and dynamically learn the most suitable one during training. The key idea in MADA is to parameterize the space of optimizers and dynamically search through it using hyper-gradient descent during training. We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers, and is robust against sub-optimally tuned hyper-parameters. MADA achieves a greater validation performance improvement over Adam compared to other popular optimizers during GPT-2 training and fine-tuning. We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization. 
Finally, we provide a convergence analysis to show that parameterized interpolations of optimizers can improve their error bounds (up to constants), hinting at an advantage for meta-optimizers.	翻訳日:2024-06-19 06:54:55 公開日:2024-06-17
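The difference between AMSGrad and the proposed AVGrad can be sketched on the second-moment estimate alone. The code below tracks, for a scalar gradient stream, the running maximum used by AMSGrad and a running average in its place. The running-average form and the function name are assumptions based on the abstract's one-line description ("replaces the maximum operator with averaging"); the paper's exact update may differ.

```python
from typing import List, Tuple


def second_moment_traces(
    grads: List[float], beta2: float = 0.999
) -> Tuple[List[float], List[float]]:
    """Return (amsgrad_trace, avgrad_trace) of second-moment estimates.

    v_t       = beta2 * v_{t-1} + (1 - beta2) * g_t^2
    AMSGrad   : vhat_t = max(vhat_{t-1}, v_t)
    averaging : vbar_t = ((t - 1) * vbar_{t-1} + v_t) / t   (assumed form)
    """
    v = v_max = v_avg = 0.0
    ams_trace: List[float] = []
    avg_trace: List[float] = []
    for t, g in enumerate(grads, start=1):
        v = beta2 * v + (1.0 - beta2) * g * g
        v_max = max(v_max, v)                # never decreases
        v_avg = ((t - 1) * v_avg + v) / t    # smooth in every v_t
        ams_trace.append(v_max)
        avg_trace.append(v_avg)
    return ams_trace, avg_trace


# One large gradient followed by zeros: the max stays pinned, the average decays.
ams, avg = second_moment_traces([1.0, 0.0, 0.0], beta2=0.5)
```

One plausible reading of "more suitable for hyper-gradient optimization" is that the average depends smoothly on every past estimate, whereas the maximum is piecewise constant in most of them.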
# 適応器混合:微調整事前学習テキスト分類器の逆ロバスト性を高めるためのパラメータ効率の良い混合適応器 Adapters Mixup: Mixing Parameter-Efficient Adapters to Enhance the Adversarial Robustness of Fine-tuned Pre-trained Text Classifiers ( http://arxiv.org/abs/2401.10111v2 ) ライセンス: Link先を確認	Tuc Nguyen, Thai Le,	(参考訳) 既存の研究は、パラメータ効率のよい微調整法(PEFT)を用いて微調整された分類タスクのための事前訓練された言語モデル(PLM)のトレーニングデータを増やすことで、敵攻撃下での堅牢性を高めることを示している。しかし、この敵対的なトレーニングパラダイムは、しばしばクリーンな入力のパフォーマンス低下を招き、新しい未知の攻撃を説明するために、データ全体を頻繁に再トレーニングする必要がある。これらの課題を克服しつつ、PEFTの利点と効率を生かし、(1)アダプタによる微調整と(2)ミックスアップによる敵の増強という2つのパラダイムを組み合わせた新しいアプローチを提案する。直感的には、AdpMixupファインチューンPLMは、クリーンかつ既知の逆数例を持つ複数のアダプタを持ち、予測中に異なる比率でそれらをインテリジェントに混合する。実験の結果,AdpMixupは6つのブラックボックス攻撃と2つのPLMに対して,既存の5つの下流タスクのベースラインと比較して,トレーニング効率とロバストネスの最良のトレードオフを実現していることがわかった。すべてのソースコードが利用可能になる。 Existing works show that augmenting the training data of pre-trained language models (PLMs) for classification tasks fine-tuned via parameter-efficient fine-tuning methods (PEFT) using both clean and adversarial examples can enhance their robustness under adversarial attacks. However, this adversarial training paradigm often leads to performance degradation on clean inputs and requires frequent re-training on the entire data to account for new, unknown attacks. To overcome these challenges while still harnessing the benefits of adversarial training and the efficiency of PEFT, this work proposes a novel approach, called AdpMixup, that combines two paradigms: (1) fine-tuning through adapters and (2) adversarial augmentation via mixup to dynamically leverage existing knowledge from a set of pre-known attacks for robust inference. Intuitively, AdpMixup fine-tunes PLMs with multiple adapters with both clean and pre-known adversarial examples and intelligently mixes them up in different ratios during prediction. 
Our experiments show AdpMixup achieves the best trade-off between training efficiency and robustness under both pre-known and unknown attacks, compared to existing baselines on five downstream tasks across six varied black-box attacks and 2 PLMs. All source code will be available.	翻訳日:2024-06-19 06:54:55 公開日:2024-06-17
# 交通流予測のためのハイブリッド時変グラフニューラルネットワーク A novel hybrid time-varying graph neural network for traffic flow forecasting ( http://arxiv.org/abs/2401.10155v4 ) ライセンス: Link先を確認	Ben-Ao Dai, Bao-Lin Ye, Lingxi Li,	(参考訳) インテリジェント交通システムの効率化には,リアルタイムかつ正確な交通流予測が不可欠である。従来の手法では、都市道路網における交通ノード間の空間的相関を記述するために、事前に定義されたグラフを持つグラフニューラルネットワーク(GNN)を用いることが多い。しかし、これらの事前定義されたグラフは、既存の知識やグラフ生成手法によって制限されており、空間的相関の完全な図形を提供していない。データ駆動学習に基づく時間変化グラフは、これらの制限に対処しようとするが、トラフィックデータに固有の空間的相関を適切に捉えることに苦慮している。さらに、動的時間相関を捕捉するための現在のほとんどの手法は、時間的多頭部自己注意機構を用いた統一的な計算方式に依存しており、あるレベルでは不正確な結果をもたらす可能性がある。これらの課題を克服するために,交通流予測のためのハイブリッド時変グラフニューラルネットワーク(HTVGNN)を提案する。まず,時間変化マスク強化に基づく新しい時間的知覚多頭部自己認識機構を報告し,トラフィックネットワーク内の異なるトラフィックノード間の動的時間的依存関係をより正確にモデル化した。次に,道路ネットワークにおける異なる交通ノード間の静的および動的空間的関連を同時に学習するグラフ学習手法を提案する。一方、時間変化グラフの学習能力を高めるために、各時間ステップで学習したグラフを結合するグラフ学習機構が設計された。最後に,提案手法の有効性を4つの実データを用いて実証した。シミュレーションの結果,HTVGNNは最先端の時空間グラフニューラルネットワークモデルと比較して予測精度が優れていることがわかった。さらに、このアブレーション実験により、結合グラフ学習機構がHTVGNNの長期予測性能を効果的に向上できることを確認した。 Real-time and precise traffic flow prediction is vital for the efficiency of intelligent transportation systems. Traditional methods often employ graph neural networks (GNNs) with predefined graphs to describe spatial correlations among traffic nodes in urban road networks. However, these predefined graphs are limited by existing knowledge and graph generation methodologies, offering an incomplete picture of spatial correlations. While time-varying graphs based on data-driven learning have attempted to address these limitations, they still struggle to adequately capture the inherent spatial correlations in traffic data. Moreover, most current methods for capturing dynamic temporal correlations rely on a unified calculation scheme using a temporal multi-head self-attention mechanism, which may lead to inaccuracies. To overcome these challenges, we propose a novel hybrid time-varying graph neural network (HTVGNN) for traffic flow prediction. 
Firstly, a novel enhanced temporal perception multi-head self-attention mechanism based on time-varying mask enhancement is introduced to more accurately model the dynamic temporal dependencies among distinct traffic nodes in the traffic network. Secondly, we propose a novel graph learning strategy to concurrently learn both static and dynamic spatial associations between different traffic nodes in road networks. Meanwhile, to enhance the learning ability of time-varying graphs, a coupled graph learning mechanism is designed to couple the graphs learned at each time step. Finally, the effectiveness of the proposed HTVGNN method is demonstrated on four real datasets. Simulation results show that HTVGNN achieves superior prediction accuracy compared to state-of-the-art spatio-temporal graph neural network models. Additionally, the ablation experiment verifies that the coupled graph learning mechanism can effectively improve the long-term prediction performance of HTVGNN.	翻訳日:2024-06-19 06:54:55 公開日:2024-06-17
# AFS-BM: バイナリマスキングによる適応的特徴選択によるモデル性能の向上 AFS-BM: Enhancing Model Performance through Adaptive Feature Selection with Binary Masking ( http://arxiv.org/abs/2401.11250v2 ) ライセンス: Link先を確認	Mehmet Y. Turali, Mehmet E. Lorasdagi, Ali T. Koc, Suleyman S. Kozat,	(参考訳) 一般機械学習(ML)コンテキストにおける特徴選択の問題について検討する。多くの特徴選択手法が存在するが,これらの手法はスケーラビリティ,高次元データ管理,特徴の相関処理,特徴の多様性への適応,ドメイン知識の統合といった課題に直面している。この目的のために,これらの問題を修復する「二項マスキングによる適応的特徴選択(AFS-BM)」を導入する。 AFS-BMは、同時特徴選択とモデルトレーニングのための共同最適化によってこれを達成している。特に、トレーニングプロセス中に特徴とモデルパラメータのセットを継続的に適応させるために、共同最適化とバイナリマスキングを行います。このアプローチにより、モデルの精度が大幅に向上し、計算要求が減少する。我々は、AFS-BMと、実生活のコンペティションからよく知られたデータセットを用いて確立された特徴選択手法を比較する、広範な実験セットを提供する。以上の結果から,AFS-BMの精度は大幅に向上し,計算量も大幅に削減された。これは、AFS-BMが訓練過程における機能の重要性の変化を動的に調整する能力によって、この分野に重要な貢献をしたためである。結果の複製性に関するコードをオープンに共有し、さらなる研究を促進する。 We study the problem of feature selection in a general machine learning (ML) context, which is one of the most critical subjects in the field. Although many feature selection methods exist, they face challenges such as scalability, managing high-dimensional data, dealing with correlated features, adapting to variable feature importance, and integrating domain knowledge. To this end, we introduce "Adaptive Feature Selection with Binary Masking" (AFS-BM), which remedies these problems. AFS-BM achieves this by joint optimization for simultaneous feature selection and model training. In particular, we perform joint optimization and binary masking to continuously adapt the set of features and model parameters during the training process. This approach leads to significant improvements in model accuracy and a reduction in computational requirements. We provide an extensive set of experiments where we compare AFS-BM with the established feature selection methods using well-known datasets from real-life competitions. Our results show that AFS-BM significantly improves accuracy and requires significantly less computation. 
This is due to AFS-BM's ability to dynamically adjust to the changing importance of features during the training process, which is an important contribution to the field. We openly share our code for the replicability of our results and to facilitate further research.	翻訳日:2024-06-19 06:54:55 公開日:2024-06-17
# Leggett-Garg不等式による量子ドットデバイスにおける電子輸送の量子性:非平衡グリーン関数アプローチ Quantumness of electron transport in quantum dot devices through Leggett-Garg inequalities: A non-equilibrium Green's function approach ( http://arxiv.org/abs/2401.12502v3 ) ライセンス: Link先を確認	Thingujam Yaiphalemba Meitei, Saikumar Krithivasan, Arijit Sen, Md Manirul Ali,	(参考訳) 電子状態のコヒーレントな操作は、ナノファブリケーションツールを利用することで量子ドット(QD)デバイスで達成できるが、これらのナノエレクトロニクスデバイスが量子力学的に振る舞う範囲を見極めることはしばしば困難である。そのため、電子状態のコヒーレントなダイナミクスが重要な役割を担っているため、量子技術の新興世界では、その非古典的な性質が最重要視される。このような背景から、LGI(Leggett-Garg inequality)の一般的な枠組みを利用して、ナノ構造を介する古典的および量子的輸送を、様々な2時刻相関関数によって区別することができる。 2つの異なる時刻における局所電荷検出を用いて、マルコフ力学と非マルコフ力学の両方の下で、元のLGIの量子違反が存在するかどうかを理論的に調査する。 LGI内の2時刻相関子は、量子ランゲヴィン方程式を正確に解くことによって、非平衡グリーン関数(NEGF)によって導出される。貯水池と相互作用する量子系の非マルコフ力学の研究は、超高速な過渡状態における緩和現象を理解し、特に高速な量子デバイスに起こることを模倣するために重要である。非マルコフ記憶効果とともに電極における準位の広がりを考慮し, 有限貯水池相関時間の影響を捉えることができる。さらに、電子貯水池間の有限バイアスを安全に考慮できるように、我々の計算では大きなバイアス制限はもはや課されない。我々のアプローチは、平衡から追い出される他の量子多体系の量子性を目撃する新たな可能性を開く可能性が高い。 Although coherent manipulation of electronic states can be achieved in quantum dot (QD) devices by harnessing nanofabrication tools, it is often hard to fathom the extent to which these nanoelectronic devices can behave quantum mechanically. Witnessing their nonclassical nature would thus remain of paramount importance in the emerging world of quantum technologies, since the coherent dynamics of electronic states plays a crucial role there. Against this backdrop, we resort to the general framework of Leggett-Garg inequalities (LGI) as it allows for distinguishing the classical and quantum transport through nanostructures by way of various two-time correlation functions. Using the local charge detection at two different times, we investigate here theoretically whether any quantum violation of the original LGI exists with varying device configurations and parameters under both Markovian and non-Markovian dynamics. 
Two-time correlators within LGI are derived in terms of the non-equilibrium Green's functions (NEGFs) by exactly solving the quantum Langevin equations. The present study of non-Markovian dynamics of quantum systems interacting with reservoirs is significant for understanding the relaxation phenomenon in the ultrafast transient regime, especially to mimic what happens in high-speed quantum devices. We can potentially capture the effect of finite reservoir correlation time by accounting for level broadening at the electrodes along with non-Markovian memory effects. Furthermore, the large bias restriction is no longer imposed in our calculations so that we can safely consider a finite bias between the electronic reservoirs. Our approach is likely to open up new possibilities of witnessing the quantumness of other quantum many-body systems that are driven out of equilibrium.	翻訳日:2024-06-19 06:54:55 公開日:2024-06-17
# 予測の公平な分布から社会財の公正な分布へ--公正な機械学習が長期的失業に与える影響の評価 From the Fair Distribution of Predictions to the Fair Distribution of Social Goods: Evaluating the Impact of Fair Machine Learning on Long-Term Unemployment ( http://arxiv.org/abs/2401.14438v2 ) ライセンス: Link先を確認	Sebastian Zezulka, Konstantin Genin,	(参考訳) アルゴリズムによる政策の展開は、社会における重要な介入である。アルゴリズムフェアネスの代表的な方法は、特定の社会的文脈にアルゴリズムを配置した後に生じる社会財の分布よりも、訓練時の予測の分布に焦点を当てる。しかし、予測の「公正な」分布を必要とすることは、社会財の公平な分布を確立するための努力を損なう可能性がある。まず,この問題に対処するためには,展開後の社会財の分布の変化を予見する予見的公正性の概念が必要であると論じる。第2に、この変化が事前デプロイデータから特定される正式な条件を提供する。それには、さまざまな種類の遂行的効果を考慮する必要がある。ここでは、予測が政策決定をどう変えるか、その結果、社会財の因果的下流分布に焦点をあてる。全体を通して、私たちは、公共行政における応用事例によって導かれています。最近失業した人のうちの誰が長期的に失業し続けるかを予測し、労働市場プログラムの対象とするアルゴリズムの使用です。第3に、スイスの公共雇用サービスによる行政データを用いて、このようなアルゴリズムによるインフォームドポリシーが、長期失業における男女不平等にどのように影響するかをシミュレートする。リスク予測が統計的平等と機会の平等に従って「公正」である必要がある場合、ターゲティング決定は効果が低くなり、長期失業の全体的な水準を下げる努力と、長期失業の男女格差を埋める努力の両方を損なう。 Deploying an algorithmically informed policy is a significant intervention in society. Prominent methods for algorithmic fairness focus on the distribution of predictions at the time of training, rather than the distribution of social goods that arises after deploying the algorithm in a specific social context. However, requiring a "fair" distribution of predictions may undermine efforts at establishing a fair distribution of social goods. First, we argue that addressing this problem requires a notion of prospective fairness that anticipates the change in the distribution of social goods after deployment. Second, we provide formal conditions under which this change is identified from pre-deployment data. That requires accounting for different kinds of performative effects. Here, we focus on the way predictions change policy decisions and, consequently, the causally downstream distribution of social goods. 
Throughout, we are guided by an application from public administration: the use of algorithms to predict who among the recently unemployed will remain unemployed in the long term and to target them with labor market programs. Third, using administrative data from the Swiss public employment service, we simulate how such algorithmically informed policies would affect gender inequalities in long-term unemployment. When risk predictions are required to be "fair" according to statistical parity and equality of opportunity, targeting decisions are less effective, undermining efforts to both lower overall levels of long-term unemployment and to close the gender gap in long-term unemployment.	翻訳日:2024-06-19 06:45:07 公開日:2024-06-17
# HiFT:階層型フルパラメータ細調整戦略 HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy ( http://arxiv.org/abs/2401.15207v3 ) ライセンス: Link先を確認	Yongkang Liu, Yiqun Zhang, Qian Li, Tong Liu, Shi Feng, Daling Wang, Yifei Zhang, Hinrich Schütze,	(参考訳) 言語モデル(LM)を下流タスクに適応させる手段として,フルパラメータの微調整が選択肢となっている。 LMのサイズが大きくなるにつれて、LMの完全なパラメータを微調整するには、非常に大量のGPUメモリが必要である。既存のアプローチでは、ゼロオーダーオプティマイザを使用してGPUメモリを節約することで、非ゼロオーダーオプティマイザがほとんどのダウンストリームタスクに容易に収束する傾向があるため、LMのパフォーマンスを損なう可能性がある。本稿では,各学習段階におけるパラメータのサブセットのみを更新する,新しい最適化非依存型エンドツーエンド階層的微調整戦略であるHiFTを提案する。 HiFTは、GPUメモリに存在する勾配の量と最適化状態パラメータを同時に大幅に削減し、GPUメモリ使用量を減らすことができる。その結果,(1) HiFT はパラメータ効率の高いファインチューニングと標準のフルパラメータファインチューニングに匹敵する性能を達成できることがわかった。 (2) HiFTは,AdamW,AdaGrad,SGDなど,さまざまなオプティマイザをサポートする。 (3) HiFTは,7Bモデルにおいて標準のフルパラメータ微調整と比較して60%以上のGPUメモリを節約できる。 (4) HiFTはメモリセーブ技術を用いることなく,AdamWオプティマイザを用いた精度32のシングル48G A6000上で7Bモデルのフルパラメータ微調整を可能にする。 Full-parameter fine-tuning has become the go-to choice for adapting language models (LMs) to downstream tasks due to its excellent performance. As LMs grow in size, fine-tuning the full parameters of LMs requires a prohibitively large amount of GPU memory. Existing approaches utilize zeroth-order optimizers to conserve GPU memory, which can potentially compromise the performance of LMs as non-zero-order optimizers tend to converge more readily on most downstream tasks. In this paper, we propose a novel optimizer-independent end-to-end hierarchical fine-tuning strategy, HiFT, which only updates a subset of parameters at each training step. HiFT can significantly reduce the amount of gradients and optimizer state parameters residing in GPU memory at the same time, thereby reducing GPU memory usage. Our results demonstrate that: (1) HiFT achieves comparable performance to parameter-efficient fine-tuning and standard full parameter fine-tuning. (2) HiFT supports various optimizers including AdamW, AdaGrad, SGD, etc. (3) HiFT can save more than 60% GPU memory compared with standard full-parameter fine-tuning for 7B model. 
(4) HiFT enables full-parameter fine-tuning of a 7B model on a single 48G A6000 at 32-bit precision using the AdamW optimizer, without using any memory-saving techniques.	翻訳日:2024-06-19 06:45:07 公開日:2024-06-17
# RE-GAINS & EnCHANT: クエリ応答強化のためのインテリジェントツール操作システム RE-GAINS & EnCHANT: Intelligent Tool Manipulation Systems For Enhanced Query Responses ( http://arxiv.org/abs/2401.15724v2 ) ライセンス: Link先を確認	Sahil Girhepuje, Siva Sankar Sajeev, Purvam Jain, Arya Sikder, Adithya Rama Varma, Ryan George, Akshay Govind Srinivasan, Mahendra Kurup, Ashmit Sinha, Sudip Mondal,	(参考訳) LLMの顕著な成功にもかかわらず、入力クエリやツール引数の記述が不十分なため、ツール呼び出しやツールチェーンに悩まされている。本稿では,RE-GAINSとEnCHANTという2つの新しいフレームワークを提案する。 EnCHANTはオープンソースのソリューションで、LLMフォーマットエンフォーサ、LLM(OpenChat 3.5)、レトリバー(ToolBenchのAPI Retriever)を利用している。 RE-GAINSはOpenAIモデルと埋め込みに基づいており、RAP論文に基づいた特別なプロンプトを使用している。どちらのソリューションもクエリ毎に0.01ドル以下でレイテンシが最小で、フレームワークの有用性を示している。 Despite the remarkable success of LLMs, they still struggle with tool invocation and tool chaining due to inadequate input queries and/or tool argument descriptions. We propose two novel frameworks, RE-GAINS and EnCHANT, enabling LLMs to tackle tool manipulation for solving complex user queries by making API calls. EnCHANT is an open-source solution that makes use of an LLM format enforcer, an LLM (OpenChat 3.5) and a retriever (ToolBench's API Retriever). RE-GAINS is based on OpenAI models and embeddings using a special prompt based on the RAP paper. Both solutions cost less than $0.01 per query with minimal latency, therefore showcasing the usefulness of the frameworks.	翻訳日:2024-06-19 06:45:07 公開日:2024-06-17
# 2つの石が1羽の鳥にぶつかる: より良い長さ外挿のための二階層位置エンコーディング Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation ( http://arxiv.org/abs/2401.16421v2 ) ライセンス: Link先を確認	Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Liwei Wang, Jingjing Xu, Zhi Zhang, Hongxia Yang, Di He,	(参考訳) 本研究では,言語系列の固有セグメンテーションを活用し,Bilevel Positional Encoding (BiPE)と呼ばれる新しい位置符号化法を設計する。それぞれの位置について、私たちのBiPEは、セグメント内エンコーディングとセグメント間エンコーディングをブレンドします。セグメント内エンコーディングはセグメント内の位置を識別し、絶対的な位置エンコーディングによってモデルがそこにある意味情報をキャプチャするのを助ける。セグメント間符号化はセグメントインデックスを規定し、セグメント間の関係をモデル化し、相対的な位置符号化による外挿能力の向上を目的としている。理論的分析は、この位置情報の分離が学習をより効果的にすることを示している。実験結果から,BiPEは多種多様なテキストモダリティにおいて,幅広いタスクにまたがる長さの外挿能力に優れていたことが示唆された。 In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute positional encoding. The inter-segment encoding specifies the segment index, models the relationships between segments, and aims to improve extrapolation capabilities via relative positional encoding. Theoretical analysis shows this disentanglement of positional information makes learning more effective. The empirical results also show that our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.	翻訳日:2024-06-19 06:45:07 公開日:2024-06-17
# 言語モデルは問題解決において人間の学習者と同じ認知バイアスを示すか? Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? ( http://arxiv.org/abs/2401.18070v2 ) ライセンス: Link先を確認	Andreas Opedal, Alessandro Stolfo, Haruki Shirakami, Ying Jiao, Ryan Cotterell, Bernhard Schölkopf, Abulhair Saparov, Mrinmaya Sachan,	(参考訳) 認知モデルとして大規模言語モデル(LLM)を採用することへの関心が高まっている。このような目的のために、人間の認知のどの特性がLLMによって適切にモデル化されているかを理解することが中心であり、どちらがそうでないかを理解することが重要である。本研究では,算術語問題の解法において,子どもに知られているバイアスとの関係でLLMの偏りについて検討する。学習科学の文献を調査した結果、問題解決の過程は、テキスト理解、ソリューション計画、ソリューション実行の3つのステップに分けることができると仮定した。これらの段階において,現在のLLMが子どもと同じ認知バイアスを示すかどうかを理解するために,それぞれのテストを構築した。我々は,これらの各テストに対して,問題特徴のきめ細かい制御を可能にするニューロシンボリックアプローチを用いて,新しい単語問題を生成する。我々は,LLMが命令チューニングの有無にかかわらず,テキスト理解と解法計画の両方において人間的な偏見を示すが,算術式が実行されて解を得る最終段階には現れないことを示す。 There is increasing interest in employing large language models (LLMs) as cognitive models. For such purposes, it is central to understand which properties of human cognition are well-modeled by LLMs, and which are not. In this work, we study the biases of LLMs in relation to those known in children when solving arithmetic word problems. Surveying the learning science literature, we posit that the problem-solving process can be split into three distinct steps: text comprehension, solution planning and solution execution. We construct tests for each one in order to understand whether current LLMs display the same cognitive biases as children in these steps. We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features. We find evidence that LLMs, with and without instruction-tuning, exhibit human-like biases in both the text-comprehension and the solution-planning steps of the solving process, but not in the final step, in which the arithmetic expressions are executed to obtain the answer.	翻訳日:2024-06-19 06:45:07 公開日:2024-06-17
# 情報完全量子計測のためのデュアルフレーム最適化 Dual frame optimization for informationally complete quantum measurements ( http://arxiv.org/abs/2401.18071v2 ) ライセンス: Link先を確認	Laurin E. Fischer, Timothée Dao, Ivano Tavernelli, Francesco Tacchino,	(参考訳) 古典的なシャドウのようなランダム化測定プロトコルは量子技術の強力なリソースを表しており、量子状態のキャラクタリゼーションやプロセストモグラフィーから機械学習やエラー軽減まで幅広い応用がある。近年、古典的な影をPOVM効果の双対作用素に一般化する測定双対フレームの概念が文献で再浮上している。このことは、しばしば確立された技術によって無視されるランダム化測定の処理後の段階において、さらなる自由度に注意を向けた。本研究では,2重フレームを利用して,情報的に完全な測定サンプルから改良された観測可能推定器を構築する。実験周波数に基づくパラメタライズド・フレーム・スーパーオペレータと最適化自由なデュアルフレームの新たなクラスを導入し,計算効率を保ちながら,その標準周波数よりも優れていることを示す。興味深いことに、これは量子や古典的なコストがほとんどないため、デュアルフレームの最適化はランダム化測定ツールボックスに価値ある追加となる。 Randomized measurement protocols such as classical shadows represent powerful resources for quantum technologies, with applications ranging from quantum state characterization and process tomography to machine learning and error mitigation. Recently, the notion of measurement dual frames, in which classical shadows are generalized to dual operators of POVM effects, resurfaced in the literature. This brought attention to additional degrees of freedom in the post-processing stage of randomized measurements that are often neglected by established techniques. In this work, we leverage dual frames to construct improved observable estimators from informationally complete measurement samples. We introduce novel classes of parametrized frame superoperators and optimization-free dual frames based on empirical frequencies, which offer advantages over their canonical counterparts while retaining computational efficiency. Remarkably, this comes at almost no quantum or classical cost, thus rendering dual frame optimization a valuable addition to the randomized measurement toolbox.	翻訳日:2024-06-19 06:45:07 公開日:2024-06-17
# 位置エンコーディングは、リカレントニューラルネットワークが大きな語彙を扱うのに役立つ Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary ( http://arxiv.org/abs/2402.00236v2 ) ライセンス: Link先を確認	Takashi Morita,	(参考訳) 本研究では、位置エンコーディングがリカレントニューラルネットワーク(RNN)の学習を促進するという直感に反する発見を報告する。位置符号化は入力データ上の時間指標の高次元表現である。最も有名なのは、位置エンコーディングは、データ順序を表現する固有のメカニズムが欠如しているトランスフォーマーニューラルネットワークの能力を補完するものである。対照的に、RNNはデータポイントの時間情報を自身でエンコードすることができ、位置エンコーディングの使用は冗長/不要のように見える。それにもかかわらず、合成ベンチマークによる調査は、特に低周波トークンを生成する大きな語彙を扱うために、位置符号化とRNNの結合の利点を明らかにしている。さらなる精査により、これらの低周波トークンがバニラRNNの勾配を不安定にし、位置エンコーディングがこの不安定を解消することが明らかになった。これらの結果は、トランスフォーマーのタイムキーパーとしての役割を超えて、位置エンコーディングの実用性に新たな光を当てた。 This study reports an unintuitive finding that positional encoding enhances learning of recurrent neural networks (RNNs). Positional encoding is a high-dimensional representation of time indices on input data. Most famously, positional encoding complements the capabilities of Transformer neural networks, which lack an inherent mechanism for representing the data order. By contrast, RNNs can encode the temporal information of data points on their own, rendering their use of positional encoding seemingly redundant/unnecessary. Nonetheless, investigations through synthetic benchmarks reveal an advantage of coupling positional encoding and RNNs, especially for handling a large vocabulary that yields low-frequency tokens. Further scrutiny unveils that these low-frequency tokens destabilize the gradients of vanilla RNNs, and the positional encoding resolves this instability. These results shed new light on the utility of positional encoding beyond its canonical role as a timekeeper for Transformers.	翻訳日:2024-06-19 06:35:20 公開日:2024-06-17
# 大規模言語モデルに基づくコードレビュー自動化のためのファインチューニングとプロンプトエンジニアリング Fine-Tuning and Prompt Engineering for Large Language Models-based Code Review Automation ( http://arxiv.org/abs/2402.00905v4 ) ライセンス: Link先を確認	Chanathip Pornprasit, Chakkrit Tantithamthavorn,	(参考訳) コンテキスト: 大規模言語モデル(LLM)の急速な進化は、コードレビュープロセスの自動化に彼らの能力を活用することに、大きな関心を喚起しました。以前の研究は、コードレビュー自動化のためのLLMの開発に注力することが多いが、高価なリソースを必要とするため、予算やリソースが限られている組織では不可能である。したがって、コードレビュー自動化にLLMを活用するための2つの一般的なアプローチは、微調整とプロンプトエンジニアリングである。目的: LLMが微調整とプロンプトによって活用される場合の2つのコンテキストに基づいて,LLMベースのコードレビュー自動化の性能を検討することを目的とする。微調整には、特定のコードレビューデータセットでモデルをトレーニングすること、プロンプトには、特定のコードレビューデータセットを必要とせずに、モデル生成プロセスをガイドするための明確な命令を提供することが含まれる。方法: LLMベースのコードレビュー自動化において,モデルファインチューニングと推論技術(ゼロショット学習,少数ショット学習,ペルソナ)を活用する。総じて、2つのLLMベースのコードレビュー自動化(GPT-3.5とMagicoder)の12のバリエーションを調査し、それらをGuo et al.のアプローチと既存のコードレビュー自動化アプローチ3つと比較する。結果: ゼロショット学習による GPT-3.5 の微調整により GPT-3.5 は Guo et al.のアプローチより 73.17%から74.23% 高い EM を達成することができる。さらに、GPT-3.5が微調整されていない場合、少数ショット学習のGPT-3.5は0ショット学習のGPT-3.5よりも46.38%から659.09%高いEMが得られる。結論: 結果から,(1) コードレビュー自動化のためのLLMは,最高のパフォーマンスを達成するために微調整する必要があること,(2) モデル微調整に十分なデータがない場合(例: コールドスタート問題)は,コードレビュー自動化のためのLLMにはペルソナを使わない少数ショット学習を用いるべきであること,などが示唆された。 Context: The rapid evolution of Large Language Models (LLMs) has sparked significant interest in leveraging their capabilities for automating code review processes. Prior studies often focus on developing LLMs for code review automation, yet require expensive resources, which is infeasible for organizations with limited budgets and resources. Thus, fine-tuning and prompt engineering are the two common approaches to leveraging LLMs for code review automation. Objective: We aim to investigate the performance of LLM-based code review automation based on two contexts, i.e., when LLMs are leveraged by fine-tuning and prompting. Fine-tuning involves training the model on a specific code review dataset, while prompting involves providing explicit instructions to guide the model's generation process without requiring a specific code review dataset. 
Method: We leverage model fine-tuning and inference techniques (i.e., zero-shot learning, few-shot learning and persona) on LLM-based code review automation. In total, we investigate 12 variations of two LLM-based code review automation approaches (i.e., GPT-3.5 and Magicoder), and compare them with Guo et al.'s approach and three existing code review automation approaches. Results: The fine-tuning of GPT-3.5 with zero-shot learning helps GPT-3.5 to achieve 73.17%-74.23% higher EM than Guo et al.'s approach. In addition, when GPT-3.5 is not fine-tuned, GPT-3.5 with few-shot learning achieves 46.38%-659.09% higher EM than GPT-3.5 with zero-shot learning. Conclusions: Based on our results, we recommend that (1) LLMs for code review automation should be fine-tuned to achieve the highest performance; and (2) when data is not sufficient for model fine-tuning (e.g., a cold-start problem), few-shot learning without a persona should be used for LLMs for code review automation.	翻訳日:2024-06-19 06:35:20 公開日:2024-06-17
# Killer Apps: 低速で大規模なAI兵器 Killer Apps: Low-Speed, Large-Scale AI Weapons ( http://arxiv.org/abs/2402.01663v4 ) ライセンス: Link先を確認	Philip Feldman, Aaron Dant, James R. Foulds,	(参考訳) 人工知能(AI)と機械学習(ML)の急速な進歩は、OpenAI、Meta、Anthropicといった組織による最先端のジェネレーティブ・プレトレーニング・トランスフォーマー(GPT)モデルの開発によって強調され、戦争と安全保障における新たな課題と機会を提供する。現在注目されているのは、武器システムにおけるAIの統合と、速度論的衝突における迅速な意思決定におけるその役割である。しかし、同様に重要だが見落とされがちな側面は、情報領域内のインターネットスケールにおけるAIベースの心理的操作の可能性である。これらの能力は、世界中の個人、組織、社会に重大な脅威をもたらす可能性がある。本稿では,AI兵器の概念,その展開,検出,潜在的な対策について検討する。 The accelerating advancements in Artificial Intelligence (AI) and Machine Learning (ML), highlighted by the development of cutting-edge Generative Pre-trained Transformer (GPT) models by organizations such as OpenAI, Meta, and Anthropic, present new challenges and opportunities in warfare and security. Much of the current focus is on AI's integration within weapons systems and its role in rapid decision-making in kinetic conflict. However, an equally important but often overlooked aspect is the potential of AI-based psychological manipulation at internet scales within the information domain. These capabilities could pose significant threats to individuals, organizations, and societies globally. This paper explores the concept of AI weapons, their deployment, detection, and potential countermeasures.	翻訳日:2024-06-19 06:35:20 公開日:2024-06-17
# Minusformer: 残差の漸進的学習による時系列予測の改善 Minusformer: Improving Time Series Forecasting by Progressively Learning Residuals ( http://arxiv.org/abs/2402.02332v3 ) ライセンス: Link先を確認	Daojun Liang, Haixia Zhang, Dongfeng Yuan, Bingzheng Zhang, Minggao Zhang,	(参考訳) 本稿では,ユビキタス時系列(TS)予測モデルが過度なオーバーフィッティングの傾向にあることを示す。この問題に対処するため,我々はTSの内在値を将来的な間隔で漸進的に再保存する非冗長アプローチを採用した。具体的には、ディープ・ブースティング・アンサンブル学習法である二重ストリーム・サブトラクション機構を導入する。そして、バニラ変換器は、情報集約機構を加算から減算に再配置することにより、更新される。そして、原モデルの各ブロックに補助出力分岐を組み込んで、最終的な予測につながるハイウェイを構築する。このブランチにおけるその後のモジュールの出力は、事前に学習した結果を減らし、モデルが監視信号の残余を層ごとに学習できるようにする。この設計により、学習駆動による入力ストリームと出力ストリームの漸進的分解が促進され、モデルの汎用性、解釈可能性、過度な適合に対するレジリエンスが向上する。モデル内のすべてのアグリゲーションはマイナス記号であるため、これはMinusformerと呼ばれる。大規模な実験により、提案手法は既存の最先端手法よりも優れており、様々なデータセットで平均11.9%の性能向上を実現している。このコードはhttps://github.com/Anoise/Minusformer でリリースされた。 In this paper, we find that ubiquitous time series (TS) forecasting models are prone to severe overfitting. To cope with this problem, we embrace a de-redundancy approach to progressively reinstate the intrinsic values of TS for future intervals. Specifically, we introduce a dual-stream and subtraction mechanism, which is a deep Boosting ensemble learning method. The vanilla Transformer is renovated by reorienting the information aggregation mechanism from addition to subtraction. Then, we incorporate an auxiliary output branch into each block of the original model to construct a highway leading to the ultimate prediction. The output of subsequent modules in this branch will subtract the previously learned results, enabling the model to learn the residuals of the supervision signal, layer by layer. This design facilitates the learning-driven implicit progressive decomposition of the input and output streams, empowering the model with heightened versatility, interpretability, and resilience against overfitting. Since all aggregations in the model use minus signs, the model is called Minusformer. 
Extensive experiments demonstrate that the proposed method outperforms existing state-of-the-art methods, yielding an average performance improvement of 11.9% across various datasets. The code has been released at https://github.com/Anoise/Minusformer.	翻訳日:2024-06-19 06:35:20 公開日:2024-06-17
# ポストホック解釈可能性と注意:数学的視点 Attention Meets Post-hoc Interpretability: A Mathematical Perspective ( http://arxiv.org/abs/2402.03485v2 ) ライセンス: Link先を確認	Gianluigi Lopardo, Frederic Precioso, Damien Garreau,	(参考訳) 注意に基づくアーキテクチャ、特にトランスフォーマーは、技術的な革命の中心にある。興味深いことに、幅広いアプリケーションにおける最先端の成果の獲得を支援することに加えて、アテンションメカニズムは本質的にモデルの内部動作に関する有意義な洞察を提供する。これらの洞察は説明として利用できますか? 物議を醸す。本稿では,簡単な注意に基づくアーキテクチャを数学的に研究し,ポストホックとアテンションに基づく説明の違いを指摘する。それらとは全く異なる結果が得られており、その制限にもかかわらず、ポストホック法は単に注意重みを調べるだけでなく、より有用な洞察を捉えることができることを示した。 Attention-based architectures, in particular transformers, are at the heart of a technological revolution. Interestingly, in addition to helping obtain state-of-the-art results on a wide range of applications, the attention mechanism intrinsically provides meaningful insights on the internal behavior of the model. Can these insights be used as explanations? Debate rages on. In this paper, we mathematically study a simple attention-based architecture and pinpoint the differences between post-hoc and attention-based explanations. We show that they provide quite different results, and that, despite their limitations, post-hoc methods are capable of capturing more useful insights than merely examining the attention weights.	翻訳日:2024-06-19 06:35:20 公開日:2024-06-17
# 適応的勾配法で正方根を除去できるか? : 2次視点 Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective ( http://arxiv.org/abs/2402.03496v5 ) ライセンス: Link先を確認	Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani,	(参考訳) Adam(W)のような適応的な勾配最適化アルゴリズムは、トランスフォーマーのような多くのディープラーニングアーキテクチャのデフォルトのトレーニングアルゴリズムである。彼らの対角プレコンディショナーは、平方根を介してパラメータ更新に組み込まれた勾配外積に基づいている。これらの方法はしばしば近似二階法として動機付けされるが、平方根は基本的な違いを表す。本研究では,適応手法の動作が根の除去時にどのように変化するか,すなわち2階のモチベーションを強化するかを検討する。意外なことに、これらの平方根自由適応法は、トランスフォーマーの性能を維持しながら、畳み込みアーキテクチャ上のSGDへの一般化ギャップを閉じている。 2階の視点はまた、プレコンディショナー不変性の概念を通じて非対角適応法を開発するための実践的な利点も持っている。Shampooのような根ベースの手法とは対照的に、根のない手法は数値的に不安定な行列の根分解や逆変換を必要としないため、半精度でうまく高速に機能する。これは対角法と非対角法の計算ギャップを埋めるのに役立つ。本研究は適応手法の開発に関する新たな知見を提供し,現在見過ごされている適応性の役割について重要な疑問を提起する。 (実験コード:https://github.com/yorkerlin/remove-the-square-root オプティマイザコード:https://github.com/f-dangel/sirfshampoo) Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers. Their diagonal preconditioner is based on the gradient outer product which is incorporated into the parameter update via a square root. While these methods are often motivated as approximate second-order methods, the square root represents a fundamental difference. In this work, we investigate how the behavior of adaptive methods changes when we remove the root, i.e., strengthen their second-order motivation. Surprisingly, we find that such square-root-free adaptive methods close the generalization gap to SGD on convolutional architectures, while maintaining their root-based counterpart's performance on transformers. The second-order perspective also has practical benefits for developing non-diagonal adaptive methods through the concept of preconditioner invariance. 
In contrast to root-based methods like Shampoo, root-free counterparts work well and fast with half-precision since they do not require numerically unstable matrix root decompositions and inversions. This is useful to bridge the computation gap between diagonal and non-diagonal methods. Our findings provide new insights into the development of adaptive methods and raise important questions regarding the currently overlooked role of adaptivity for their success. (experiment code: https://github.com/yorkerlin/remove-the-square-root optimizer code: https://github.com/f-dangel/sirfshampoo)	翻訳日:2024-06-19 06:35:20 公開日:2024-06-17
# AIフィードバックによる強化学習を用いたビデオ用大規模マルチモーダルモデルのチューニング Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ( http://arxiv.org/abs/2402.03746v3 ) ライセンス: Link先を確認	Daechul Ahn, Yura Choi, Youngjae Yu, Dongyeop Kang, Jonghyun Choi,	(参考訳) 近年の大規模言語モデルの発展は,ビデオ大マルチモーダルモデル(VLMM)の開発に影響を及ぼしている。 VLMMの以前のアプローチには、命令調整されたデータセットを使用したSupervised Fine-Tuning (SFT)、ビジュアルエンコーダとLLMの統合、学習可能なモジュールの追加が含まれていた。ビデオとテキストのマルチモーダルアライメントは、主にテキストのみのデータと比較して、マルチモーダル命令チューニングデータのボリュームと品質が不足しているため、依然として困難である。本稿では,AIフィードバックからの強化学習(Reinforcement Learning from AI Feedback, RLAIF)と呼ばれる,マルチモーダルAIシステムを利用した新たなアライメント戦略を提案する。具体的には、映像コンテンツの理解を深めるため、好みフィードバックの生成中に、詳細な映像記述を文脈として提供し、文脈対応報酬モデルを提案する。我々のマルチモーダルRLAIFアプローチであるVLM-RLAIFはSFTモデルを含む既存の手法よりも優れています。私たちは、この分野のさらなる研究を促進するために、コード、モデル、データセットをオープンソース化することを約束します。 Recent advancements in large language models have influenced the development of video large multimodal models (VLMMs). The previous approaches for VLMMs involved Supervised Fine-Tuning (SFT) with instruction-tuned datasets, integrating LLM with visual encoders, and adding additional learnable modules. Video and text multimodal alignment remains challenging, primarily due to the deficient volume and quality of multimodal instruction-tuning data compared to text-only data. We present a novel alignment strategy, called Reinforcement Learning from AI Feedback (RLAIF), in which a multimodal AI system oversees itself, providing self-preference feedback to refine itself and facilitating the alignment of video and text modalities. Specifically, we propose context-aware reward modeling by providing detailed video descriptions as context during the generation of preference feedback in order to enrich the understanding of video content. Demonstrating enhanced performance across diverse video benchmarks, our multimodal RLAIF approach, VLM-RLAIF, outperforms existing approaches, including the SFT model. 
We commit to open-sourcing our code, models, and datasets to foster further research in this area.	翻訳日:2024-06-19 06:25:35 公開日:2024-06-17
# 中性原子量子プロセッサを用いたベンダーズ分解を用いた混合整数線形計画法 Mixed Integer Linear Programming Solver Using Benders Decomposition Assisted by Neutral Atom Quantum Processor ( http://arxiv.org/abs/2402.05748v2 ) ライセンス: Link先を確認	M. Yassine Naghmouchi, Wesley da Silva Coelho,	(参考訳) 本稿では,中性原子量子計算を用いたMILP(Mixed Integer Linear Programming)の解法を提案する。我々は,MILPをマスター問題 (MP) とサブプロブレム (SP) に分割するためにBenders decomposition (BD) を適用し,MP を二次制約なし二値最適化 (QUBO) モデルに変換した後,中性原子デバイスで処理する。我々のMILPからQUBOへの変換は、関連する連続変数の上限を狭め、必要量子ビット数とアルゴリズムの収束に肯定的に影響を及ぼす。 QUBOを解くため、我々は原子レジスタ埋め込みのためのヒューリスティックを開発し、パルス整形のための変分アルゴリズムを適用した。さらに、既存のソリューションよりも優れたPoC(Proof of Concept)を実装します。一連の小規模なMILPインスタンスにおいて,我々のアルゴリズムは高品質な実現可能解の95%以上を同定し,MPをシミュレーテッドアニーリングを用いて解いた古典的BD手法よりも優れている。我々の知る限り、この研究は、BDを通してMILPを解くための、自動化された問題に依存しないフレームワークを開発する際に、中性原子量子プロセッサを利用する最初のものである。 This paper presents a new hybrid classical-quantum approach to solve Mixed Integer Linear Programming (MILP) using neutral atom quantum computations. We apply Benders decomposition (BD) to segment MILPs into a master problem (MP) and a subproblem (SP), where the MP is addressed using a neutral-atom device, after being transformed into a Quadratic Unconstrained Binary Optimization (QUBO) model, with an automated procedure. Our MILP to QUBO conversion tightens the upper bounds of the involved continuous variables, positively impacting the required qubit count, and the convergence of the algorithm. To solve the QUBO, we develop a heuristic for atom register embedding and apply a variational algorithm for pulse shaping. In addition, we implement a Proof of Concept (PoC) that outperforms existing solutions. We also report preliminary numerical results: in a series of small MILP instances our algorithm identifies over 95 percent of feasible solutions of high quality, outperforming classical BD approaches where the MP is solved using simulated annealing. 
To the best of our knowledge, this work is the first to utilize a neutral atom quantum processor in developing an automated, problem-agnostic framework for solving MILPs through BD.	翻訳日:2024-06-19 06:25:35 公開日:2024-06-17
# 大規模言語モデルにおけるゼロ次フェデレーションチューニングの収束性について On the Convergence of Zeroth-Order Federated Tuning for Large Language Models ( http://arxiv.org/abs/2402.05926v3 ) ライセンス: Link先を確認	Zhenqing Ling, Daoyuan Chen, Liuyi Yao, Yaliang Li, Ying Shen,	(参考訳) FL(Federated Learning)とLLM(Large Language Models)の合流は、プライバシを保存する自然言語処理の新しい時代を後押ししている。しかし、微調整LDMのメモリ要求は、特に限られた計算資源を持つクライアントにデプロイする場合、大きな課題を生じさせる。これを回避するために、フェデレーション設定におけるメモリ効率ゼロ階最適化の新たな統合、すなわちFedMeZOというシナジーについて検討する。本研究では, LLMの文脈におけるFedMeZOの理論的基盤について, 大きなパラメータ空間が最適化行動に与える影響, 収束特性の確立, パーソナライズされたフェデレーション戦略を伝えるための重要なパラメータの同定について, 主要な疑問に対処する。 FedMeZOは従来のFedAvgのような一階法よりも高速に収束するだけでなく、トレーニング中のGPUメモリ使用量を推論時に同等のレベルまで大幅に削減することを示す。さらに、クライアントの学習率をカスタマイズするための理論的洞察に基づくパーソナライズされたFL戦略は、損失削減を効果的に加速させることができる。我々は,LLMのフェデレーションファインチューニングの理論的および実践的な側面を橋渡しし,この分野のさらなる進歩と研究を促進することを願っている。 The confluence of Federated Learning (FL) and Large Language Models (LLMs) is ushering in a new era in privacy-preserving natural language processing. However, the intensive memory requirements for fine-tuning LLMs pose significant challenges, especially when deploying on clients with limited computational resources. To circumvent this, we explore the novel integration of Memory-efficient Zeroth-Order Optimization within a federated setting, a synergy we term as FedMeZO. Our study is the first to examine the theoretical underpinnings of FedMeZO in the context of LLMs, tackling key questions regarding the influence of large parameter spaces on optimization behavior, the establishment of convergence properties, and the identification of critical parameters for convergence to inform personalized federated strategies. Our extensive empirical evidence supports the theory, showing that FedMeZO not only converges faster than traditional first-order methods such as FedAvg but also significantly reduces GPU memory usage during training to levels comparable to those during inference. 
Moreover, the proposed personalized FL strategy that is built upon the theoretical insights to customize the client-wise learning rate can effectively accelerate loss reduction. We hope our work can help to bridge theoretical and practical aspects of federated fine-tuning for LLMs, thereby stimulating further advancements and research in this area.	翻訳日:2024-06-19 06:25:35 公開日:2024-06-17
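The abstract above combines zeroth-order optimization with federated averaging. The following is only a toy sketch of that general idea, not the paper's FedMeZO algorithm: it assumes a quadratic loss in place of an LLM objective, random ±1 perturbations, identical objectives on every client, and plain parameter averaging on the server. All function names and hyperparameter values are hypothetical.

```python
import random


def loss(theta, target):
    """Toy quadratic objective standing in for a model's training loss."""
    return sum((t - g) ** 2 for t, g in zip(theta, target))


def zo_step(theta, target, lr, eps, rng):
    """One zeroth-order update that needs only two loss evaluations."""
    # Random +/-1 perturbation direction (an assumption of this sketch).
    z = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    # Finite-difference estimate of the directional derivative along z;
    # no backpropagation and no stored activations are required.
    g = (loss(plus, target) - loss(minus, target)) / (2 * eps)
    return [t - lr * g * zi for t, zi in zip(theta, z)]


def federated_zo(theta, target, clients=3, rounds=200, lr=0.05, eps=1e-3, seed=0):
    """Each round: every client takes one local step, the server averages."""
    rng = random.Random(seed)
    for _ in range(rounds):
        local = [zo_step(theta, target, lr, eps, rng) for _ in range(clients)]
        theta = [sum(column) / clients for column in zip(*local)]
    return theta
```

Because each step uses forward evaluations only, the memory cost stays close to that of inference, which is the property the abstract highlights. A per-client learning rate, as in the personalized strategy the abstract mentions, could be passed to `zo_step` in place of the shared `lr`.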
# 対話型エージェント基礎モデル An Interactive Agent Foundation Model ( http://arxiv.org/abs/2402.05929v2 ) ライセンス: Link先を確認	Zane Durante, Bidipta Sarkar, Ran Gong, Rohan Taori, Yusuke Noda, Paul Tang, Ehsan Adeli, Shrinidhi Kowshika Lakshmikanth, Kevin Schulman, Arnold Milstein, Demetri Terzopoulos, Ade Famoti, Noboru Kuno, Ashley Llorens, Hoi Vo, Katsu Ikeuchi, Li Fei-Fei, Jianfeng Gao, Naoki Wake, Qiuyuan Huang,	(参考訳) 人工知能システムの開発は、静的なタスク固有のモデルから、幅広いアプリケーションでうまく機能する動的エージェントベースのシステムへと移行しつつある。多様なドメイン、データセット、タスクにまたがるAIエージェントのトレーニングに、新しいマルチタスクエージェントトレーニングパラダイムを使用するインタラクティブエージェント財団モデルを提案する。私たちのトレーニングパラダイムは、ビジュアルマスク付きオートエンコーダ、言語モデリング、次世代予測など、さまざまな事前トレーニング戦略を統合することで、汎用的で適応可能なAIフレームワークを実現しています。私たちは、ロボティクス、ゲームAI、ヘルスケアという3つの異なる領域でフレームワークのパフォーマンスを実演します。本モデルでは,各領域において意味的かつ文脈的に関係のある出力を生成する能力を示す。提案手法の強みは,ロボットシーケンス,ゲームプレイデータ,大規模ビデオデータセット,テキスト情報など,さまざまなデータソースを有効マルチモーダル・マルチタスク学習に活用することにある。我々のアプローチは、ジェネラリスト、アクションテイク、マルチモーダルシステムを開発するための有望な道を提供する。 The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks. Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction, enabling a versatile and adaptable AI framework. We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare. Our model demonstrates its ability to generate meaningful and contextually relevant outputs in each area. The strength of our approach lies in its generality, leveraging a variety of data sources such as robotics sequences, gameplay data, large-scale video datasets, and textual information for effective multimodal and multi-task learning. 
Our approach provides a promising avenue for developing generalist, action-taking, multimodal systems.	翻訳日:2024-06-19 06:25:35 公開日:2024-06-17
# LLMにおける復号法の検討 A Thorough Examination of Decoding Methods in the Era of LLMs ( http://arxiv.org/abs/2402.06925v2 ) ライセンス: Link先を確認	Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam,	(参考訳) 復号法は、次世代の予測器から実用的なタスク解決器に言語モデルを変換する上で、必須の役割を果たす。主にタスク固有モデルに焦点を当てた復号法に関する先行研究は、汎用大規模言語モデル(LLM)の現在まで及ばない可能性がある。さらに、最近のデコード戦略の流入により、この状況はさらに複雑になっている。本稿では,LLMのコンテキスト内での様々なデコード手法の包括的かつ多面的解析を行い,その性能,ハイパーパラメータ変化に対する堅牢性,幅広いタスク,モデル,デプロイメント環境におけるデコード速度を評価する。その結果,復号法の性能は特にタスク依存的であり,アライメント,モデルサイズ,量子化などの要因に影響されていることが明らかとなった。興味深いことに、感度分析は、広範囲なハイパーパラメータチューニングのコストにおいて、特定の手法が優れたパフォーマンスを達成することを明らかにし、最適な結果と様々な状況における実装の実践性との間のトレードオフを強調している。 Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current era of general-purpose large language models (LLMs). Moreover, the recent influx of decoding strategies has further complicated this landscape. This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of LLMs, evaluating their performance, robustness to hyperparameter changes, and decoding speeds across a wide range of tasks, models, and deployment environments. Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization. Intriguingly, sensitivity analysis exposes that certain methods achieve superior performance at the cost of extensive hyperparameter tuning, highlighting the trade-off between attaining optimal results and the practicality of implementation in varying contexts.	翻訳日:2024-06-19 06:25:35 公開日:2024-06-17
# 縦型胸部X線における視覚的質問応答の事前学習モデル Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays ( http://arxiv.org/abs/2402.08966v2 ) ライセンス: Link先を確認	Yeongjae Cho, Taehee Kim, Heejun Shin, Sungzoon Cho, Dongmyung Shin,	(参考訳) 差分視覚質問応答(diff-VQA)は、画像間の差分に基づいて複雑な質問に答えることを必要とする課題である。この課題は胸部X線画像の読影において特に重要であり, 放射線科医は疾患の進行と重症度の変化を追跡するために, 異なる時期に撮影された同一患者の複数の画像と比較することが多い。しかし、以前の研究はdiff-VQAタスクのための特定のネットワークアーキテクチャの設計に重点を置いており、事前訓練された視覚言語モデル(VLM)を使用してモデルの性能を向上させる機会を欠いていた。本稿では,diff-VQAタスクのための自然および縦部胸部X線データに基づいて,PLURALと呼ばれる新しいVLMを提案する。このモデルはステップバイステップのアプローチで開発され、まず自然画像やテキストで事前訓練され、続いて縦型胸部X線データを用いて訓練される。縦方向のデータは、X線画像の対と、時間とともに肺の異常や疾患の変化を記述した質問・回答セットと放射線技師の報告で構成されている。実験結果から,PLURALモデルは縦X線に対するdiff-VQAだけでなく,1枚のX線画像に対する従来のVQAにおいても,最先端の手法よりも優れていることがわかった。広範にわたる実験により,提案するVLMアーキテクチャの有効性と,モデルの性能向上のための事前学習手法の有効性を実証した。 Difference visual question answering (diff-VQA) is a challenging task that requires answering complex questions based on differences between a pair of images. This task is particularly important in reading chest X-ray images because radiologists often compare multiple images of the same patient taken at different times to track disease progression and changes in its severity in their clinical practice. However, previous works focused on designing specific network architectures for the diff-VQA task, missing opportunities to enhance the model's performance using a pretrained vision-language model (VLM). Here, we introduce a novel VLM called PLURAL, which is pretrained on natural and longitudinal chest X-ray data for the diff-VQA task. The model is developed using a step-by-step approach, starting with being pretrained on natural images and texts, followed by being trained using longitudinal chest X-ray data. The longitudinal data consist of pairs of X-ray images, along with question-answer sets and radiologist's reports that describe the changes in lung abnormalities and diseases over time. 
Our experimental results show that the PLURAL model outperforms state-of-the-art methods not only in diff-VQA for longitudinal X-rays but also in conventional VQA for a single X-ray image. Through extensive experiments, we demonstrate the effectiveness of the proposed VLM architecture and pretraining method in improving the model's performance.	翻訳日:2024-06-19 06:15:51 公開日:2024-06-17
# 生成型大規模言語モデルにおける確率論的推論 Probabilistic Reasoning in Generative Large Language Models ( http://arxiv.org/abs/2402.09614v2 ) ライセンス: Link先を確認	Aliakbar Nafar, Kristen Brent Venable, Parisa Kordjamshidi,	(参考訳) 本稿では,大言語モデル (LLM) が,確率値を介して明示的に定量化される不確実性を含む情報を含むテキストを推論する際に直面する課題について考察する。この種の推論は、日常的な会話から医療的な意思決定まで、さまざまな文脈に関係している。 LLMの数学的推論能力は改善されているものの、確率論的推論に関しては依然として重大な困難を呈している。この問題に対処するために,LLMの確率論的推論能力をテストするために設計された新しいデータセットであるBayesian Linguistic Inference Dataset (BLInD)を導入する。 BLInD を用いて確率論的推論を含むタスクにおいて LLM の限界を明らかにする。さらに,Pythonのコードや確率論的アルゴリズム,確率論的論理プログラミングなど,様々な形式表現に問題をマッピングするいくつかのプロンプト戦略を提案する。我々は,BLInDにおける手法の評価と因果推論質問応答データセットの適応を提供することで結論付けた。実験結果から,複数のLSMに対する提案手法の有効性が明らかになった。 This paper considers the challenges Large Language Models (LLMs) face when reasoning over text that includes information involving uncertainty explicitly quantified via probability values. This type of reasoning is relevant to a variety of contexts ranging from everyday conversations to medical decision-making. Despite improvements in the mathematical reasoning capabilities of LLMs, they still exhibit significant difficulties when it comes to probabilistic reasoning. To deal with this problem, we introduce the Bayesian Linguistic Inference Dataset (BLInD), a new dataset specifically designed to test the probabilistic reasoning capabilities of LLMs. We use BLInD to find out the limitations of LLMs for tasks involving probabilistic reasoning. In addition, we present several prompting strategies that map the problem to different formal representations, including Python code, probabilistic algorithms, and probabilistic logical programming. We conclude by providing an evaluation of our methods on BLInD and an adaptation of a causal reasoning question-answering dataset. Our empirical results highlight the effectiveness of our proposed strategies for multiple LLMs.	翻訳日:2024-06-19 06:15:51 公開日:2024-06-17
# InSaAF: 正確性と公平性による安全性の組み込み : LLMsはインドの法律分野に向けて準備が整っているか? InSaAF: Incorporating Safety through Accuracy and Fairness \| Are LLMs ready for the Indian Legal Domain? ( http://arxiv.org/abs/2402.10567v4 ) ライセンス: Link先を確認	Yogesh Tripathi, Raghav Donakanti, Sahil Girhepuje, Ishan Kavathekar, Bhaskara Hanuma Vedula, Gokul S Krishnan, Shreya Goyal, Anmol Goel, Balaraman Ravindran, Ponnurangam Kumaraguru,	(参考訳) 近年の言語技術と人工知能の進歩により、判決の予測から要約の生成に至るまで、法律分野における様々なタスクを実行するために多くの言語モデルが提案されている。その大きな可能性にもかかわらず、これらのモデルは社会的偏見を学習して示し、不公平な予測を行うことが証明されている。本研究では,社会的要因が関与する場合に,大規模言語モデル(LLM)がインドの状況における法的タスクを遂行する能力について検討する。 LLMの公平性と正確性の両方をカプセル化した新しい指標である$\beta$-weighted $\textit{Legal Safety Score}$ ($LSS_{\beta}$) を提示する。我々は,$\textit{Binary Statutory Reasoning}$タスクにおける性能と,インド社会における様々な格差の軸に関して示される公平性を考慮し,LLMsの安全性を評価する。 LLaMAとLLaMA--2モデルのタスク性能と公平性スコアは、提案された$LSS_{\beta}$メトリックが、法律分野における安全な使用のためのモデルの準備状況を効果的に決定できることを示している。また、偏見を緩和し、モデルの安全性を改善するための潜在的方法として、専門的な法律データセットを利用した微調整パイプラインを提案する。LLaMAとLLaMA--2モデルの微調整手順は、$LSS_{\beta}$を増大させ、インドの法律分野におけるユーザビリティを向上させる。私たちのコードは公開されています。 Recent advancements in language technology and Artificial Intelligence have resulted in numerous Language Models being proposed to perform various tasks in the legal domain ranging from predicting judgments to generating summaries. Despite their immense potential, these models have been proven to learn and exhibit societal biases and make unfair predictions. In this study, we explore the ability of Large Language Models (LLMs) to perform legal tasks in the Indian landscape when social factors are involved. We present a novel metric, the $\beta$-weighted $\textit{Legal Safety Score}$ ($LSS_{\beta}$), which encapsulates both the fairness and accuracy aspects of the LLM. We assess LLMs' safety by considering their performance in the $\textit{Binary Statutory Reasoning}$ task and the fairness they exhibit with respect to various axes of disparities in Indian society. Task performance and fairness scores of LLaMA and LLaMA--2 models indicate that the proposed $LSS_{\beta}$ metric can effectively determine the readiness of a model for safe usage in the legal sector. We also propose finetuning pipelines, utilising specialised legal datasets, as a potential method to mitigate bias and improve model safety. The finetuning procedures on LLaMA and LLaMA--2 models increase the $LSS_{\beta}$, improving their usability in the Indian legal domain. Our code is publicly released.	翻訳日:2024-06-19 06:15:51 公開日:2024-06-17
# AbsInstruct: 妥当性推定を伴う説明チューニングによるLLMの抽象化能力の引き出し AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation ( http://arxiv.org/abs/2402.10646v2 ) ライセンス: Link先を確認	Zhaowei Wang, Wei Fan, Qing Zong, Hongming Zhang, Sehyun Choi, Tianqing Fang, Xin Liu, Yangqiu Song, Ginny Y. Wong, Simon See,	(参考訳) 抽象化能力は人間の知性において不可欠であり、NLP研究における様々なタスクにも有用である。既存の研究によると、LLMは抽象化能力に欠けており、その改善方法はまだ解明されていない。本研究では,命令チューニングによるLLMの抽象化能力を向上するフレームワークAbsInstructを設計する。このフレームワークは、LLMが抽象化の根底にある理論的根拠を捉えるのを支援するために、詳細な説明で命令を構築する。一方,LLMの抽象的知識とより整合した命令を選択するための妥当性推定器を導入する。そして、このフレームワークは抽象化命令と汎用命令を組み合わせてハイブリッドデータセットを構築する。大規模な実験と分析により,LLMの抽象化能力は,一般的な命令追従能力を維持しつつ,高い一般化性能で大幅に向上できることが示された。 Abstraction ability is crucial in human intelligence, and it can also benefit various tasks in NLP research. Existing work shows that LLMs are deficient in abstraction ability, and how to improve it remains unexplored. In this work, we design the framework AbsInstruct to enhance LLMs' abstraction ability through instruction tuning. The framework builds instructions with in-depth explanations to assist LLMs in capturing the underlying rationale of abstraction. Meanwhile, we introduce a plausibility estimator to select instructions that are more consistent with the abstraction knowledge of the LLMs to be aligned. Then, our framework combines abstraction instructions with general-purpose ones to build a hybrid dataset. Extensive experiments and analyses demonstrate that our framework can considerably enhance LLMs' abstraction ability with strong generalization performance while maintaining their general instruction-following abilities.	翻訳日:2024-06-19 06:15:51 公開日:2024-06-17
# 効率的な言語モデル推論のための言語間語彙適応に関する実証的研究 An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Language Model Inference ( http://arxiv.org/abs/2402.10712v2 ) ライセンス: Link先を確認	Atsuki Yamaguchi, Aline Villavicencio, Nikolaos Aletras,	(参考訳) 最先端の生成型大言語モデル(LLM)の開発は、英語中心のトークン化器、語彙、事前学習データに依存している。 LLMには多言語機能があるにもかかわらず、近年の研究では、英語以外の言語でテキストを生成する際に、推論効率が低下することが示されている。その結果、推論時間とコストが増加する。下流の性能向上を目的としたターゲット言語にモデルを適用するために,言語間語彙適応法 (CVA) が提案されている。しかし, 生成LDMの推論効率向上に対するこれらの手法の有効性については, 未だ検討されていない。本稿では,4つの言語と4つの自然言語理解タスクにおける4つの生成LLM(単言語モデルと多言語モデルを含む)に対する5つのCVA手法の実証的研究を行う。 CVA は LLM の推論速度を最大 271.5 % まで向上させる。また、よりバランスの取れた多言語データに事前学習されたLLMを適応させることで、元のモデルに匹敵するダウンストリーム性能が得られることを示す。 The development of state-of-the-art generative large language models (LLMs) disproportionately relies on English-centric tokenizers, vocabulary and pre-training data. Despite the fact that some LLMs have multilingual capabilities, recent studies have shown that their inference efficiency deteriorates when generating text in languages other than English. This results in increased inference time and costs. Cross-lingual vocabulary adaptation (CVA) methods have been proposed for adapting models to a target language aiming to improve downstream performance. However, the effectiveness of these methods on increasing inference efficiency of generative LLMs has yet to be explored. In this paper, we perform an empirical study of five CVA methods on four generative LLMs (including monolingual and multilingual models) across four typologically-diverse languages and four natural language understanding tasks. We find that CVA substantially contributes to LLM inference speedups of up to 271.5\%. We also show that adapting LLMs that have been pre-trained on more balanced multilingual data results in downstream performance comparable to the original models.	翻訳日:2024-06-19 06:15:51 公開日:2024-06-17
# LLMシミュレーションにおけるペルソナ効果の定量化 Quantifying the Persona Effect in LLM Simulations ( http://arxiv.org/abs/2402.10811v2 ) ライセンス: Link先を確認	Tiancheng Hu, Nigel Collier,	(参考訳) 大規模言語モデル(LLM)は、人間の言語と振る舞いをシミュレートする際、顕著な可能性を示してきた。本研究では,ペルソナ変数(人口統計学的、社会的、行動的要因)の統合がLLMの多様な視点をシミュレートする能力にどのように影響するかを検討する。既存の主観的NLPデータセットにおいて,ペルソナ変数はアノテーションの分散の10%未満しか説明しない。それでも、LLMのプロンプトによるペルソナ変数の導入は、控えめではあるが統計的に有意な改善をもたらす。ペルソナのプロンプトは、多くのアノテーターが同意しないものの、その不一致が比較的小さいサンプルにおいて最も効果的である。ペルソナ変数と人間のアノテーションの相関が強くなるほど、ペルソナプロンプトを用いたLLMの予測がより正確になる。ゼロショット設定では、ペルソナプロンプトを用いた強力な70bモデルが、基底真理アノテーションに基づいて訓練された線形回帰によって達成可能なアノテーション分散の81%をキャプチャする。しかしながら、ペルソナ変数が説明力に制限があるほとんどの主観的NLPデータセットでは、ペルソナプロンプトの利点は限られている。 Large language models (LLMs) have shown remarkable promise in simulating human language and behavior. This study investigates how integrating persona variables (demographic, social, and behavioral factors) impacts LLMs' ability to simulate diverse perspectives. We find that persona variables account for less than 10% of the variance in annotations in existing subjective NLP datasets. Nonetheless, incorporating persona variables via prompting in LLMs provides modest but statistically significant improvements. Persona prompting is most effective in samples where many annotators disagree, but their disagreements are relatively minor. Notably, we find a linear relationship in our setting: the stronger the correlation between persona variables and human annotations, the more accurate the LLM predictions made with persona prompting. In a zero-shot setting, a powerful 70b model with persona prompting captures 81% of the annotation variance achievable by linear regression trained on ground truth annotations. However, for most subjective NLP datasets, where persona variables have limited explanatory power, the benefits of persona prompting are limited.	翻訳日:2024-06-19 06:06:06 公開日:2024-06-17
# 偽装検出はより深くなるか? 偽装推論のためのデータセット, 評価, ベンチマーク Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning ( http://arxiv.org/abs/2402.11432v2 ) ライセンス: Link先を確認	Kang Chen, Zheng Lian, Haiyang Sun, Bin Liu, Jianhua Tao,	(参考訳) 虚偽検出は、現実のシナリオにおける重要性から注目を集めている。その主な目的は、ジェスチャー、表情、韻律など、マルチモーダルな手がかりから欺く行動を検出することである。しかしながら、これらの基盤は通常主観的であり、個人の習慣に関係している。そこで我々は, 虚偽検出を虚偽推論に拡張し, さらに主観的判断を支持する客観的な証拠を提供する。具体的には、潜在的な嘘と基本的な事実を提供し、その背景にある事実の矛盾と意図を組み合わせることによって、この文が嘘である可能性がある理由を分析する。偽造検出と比較すると、このタスクは現実世界のシナリオにもより適用可能である。例えば、尋問においては、警察は確固たる証拠に基づいて嘘をついているかどうかを判断すべきである。本稿では,データセットの構築や評価指標の定義など,この課題に対する最初の試みについて述べる。一方、このタスクは、大規模言語モデルの複雑な推論能力を評価するためのベンチマークとして機能する。コードとデータは公開されます。 Deception detection has attracted increasing attention due to its importance in real-world scenarios. Its main goal is to detect deceptive behaviors from multimodal clues such as gestures, facial expressions, prosody, etc. However, these bases are usually subjective and related to personal habits. Therefore, we extend deception detection to deception reasoning, further providing objective evidence to support subjective judgment. Specifically, we provide potential lies and basic facts and then analyze why this sentence may be a lie by combining factual inconsistencies and intent behind them. Compared with deception detection, this task is more applicable to real-world scenarios. For example, in interrogation, the police should judge whether a person is lying based on solid evidence. This paper presents our initial attempts at this task, including constructing a dataset and defining evaluation metrics. Meanwhile, this task can serve as a benchmark for evaluating the complex reasoning capability of large language models. Code and data will be made publicly available.	翻訳日:2024-06-19 06:06:06 公開日:2024-06-17
# ALLaVA: 軽量視覚言語モデルのためのGPT4V合成データの活用 ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models ( http://arxiv.org/abs/2402.11684v2 ) ライセンス: Link先を確認	Guiming Hardy Chen, Shunian Chen, Ruifei Zhang, Junying Chen, Xiangbo Wu, Zhiyi Zhang, Zhihong Chen, Jianquan Li, Xiang Wan, Benyou Wang,	(参考訳) 大規模視覚言語モデル(LVLM)は、その強力な推論と一般化能力により、幅広い視覚言語タスクで有望性を示してきた。しかし、訓練と配備にはかなりの計算資源が必要である。本研究では,従来のLVLMとリソースフレンドリなライトバージョンのパフォーマンスギャップを,高品質なトレーニングデータを用いて橋渡しすることを目的とする。そこで本研究では,合成データセットを生成するための包括的パイプラインを提案する。鍵となるアイデアは、強力なプロプライエタリなモデルを利用して (i)視覚言語アライメントのためのきめ細かい画像アノテーションと (ii)視覚指示微調整のための複雑な推論を伴う視覚質問応答ペアを生成することであり、合計1.3Mサンプルを得る。合成データセット上で一連のライトVLMを訓練し,提案手法の有効性を実証し, 4B LVLM間で17のベンチマークで競合性能を達成し, 各種ベンチマークで7B/13Bスケールモデルと同等の性能を示す。この研究は、より効率的なLVLMを作成する際に、高品質なデータを採用する可能性を強調している。当社のデータセットをALLaVAと名付け、リソース効率のよい LVLM を広く活用するために研究コミュニティにオープンソースとして公開しています。 Large vision-language models (LVLMs) have shown promise in a broad range of vision-language tasks with their strong reasoning and generalization capabilities. However, they require considerable computational resources for training and deployment. This study aims to bridge the performance gap between traditional-scale LVLMs and resource-friendly lite versions by adopting high-quality training data. To this end, we propose a comprehensive pipeline for generating a synthetic dataset. The key idea is to leverage strong proprietary models to generate (i) fine-grained image annotations for vision-language alignment and (ii) complex reasoning visual question-answering pairs for visual instruction fine-tuning, yielding 1.3M samples in total. We train a series of lite VLMs on the synthetic dataset, and experimental results demonstrate the effectiveness of the proposed scheme: the models achieve competitive performance on 17 benchmarks among 4B LVLMs and even perform on par with 7B/13B-scale models on various benchmarks. This work highlights the feasibility of adopting high-quality data in crafting more efficient LVLMs. We name our dataset ALLaVA and open-source it to the research community for developing better resource-efficient LVLMs for wider usage.	翻訳日:2024-06-19 06:06:06 公開日:2024-06-17
# ROSEはそうしない: 逆プロンプトコントラストデコーディングによる命令付き大規模言語モデルの安全性を高める ROSE Doesn't Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding ( http://arxiv.org/abs/2402.11889v2 ) ライセンス: Link先を確認	Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao,	(参考訳) 命令調整型大規模言語モデル(LLM)の開発により,LLMの安全性の向上がますます重要になっている。しかしながら、LLMの出力を期待される安全性に合わせるための現在のアプローチは、通常、高品質の安全データや高価な計算資源といった、費用がかかり非効率な訓練努力を必要とする。そこで本研究では,既存の命令調整 LLM の安全性を,追加の訓練を伴わずに直接向上させる,逆プロンプトコントラスト復号法 (ROSE) を提案する。 ROSEの原理は、慎重に設計された逆プロンプトによって引き起こされる望ましくない出力を抑えることにより、所望の安全出力の確率を改善することである。 6つの安全性と2つの汎用タスクの実験から、ROSEは5種類の命令調整LDMに対して、一貫した、重要な安全性向上(+13.8%の安全性スコア)をもたらすだけでなく、LLMの汎用能力にも恩恵をもたらすことが示されている。詳細な分析では、ROSEの基盤となるメカニズムを探求し、いつ、どこで使用するかを明らかにする。 With the development of instruction-tuned large language models (LLMs), improving the safety of LLMs has become more critical. However, the current approaches for aligning the LLMs output with expected safety usually require substantial training efforts, e.g., high-quality safety data and expensive computational resources, which are costly and inefficient. To this end, we present reverse prompt contrastive decoding (ROSE), a simple-yet-effective method to directly boost the safety of existing instruction-tuned LLMs without any additional training. The principle of ROSE is to improve the probability of desired safe output via suppressing the undesired output induced by the carefully-designed reverse prompts. Experiments on 6 safety and 2 general-purpose tasks show that, our ROSE not only brings consistent and significant safety improvements (up to +13.8% safety score) upon 5 types of instruction-tuned LLMs, but also benefits the general-purpose ability of LLMs. In-depth analyses explore the underlying mechanism of ROSE, and reveal when and where to use it.	翻訳日:2024-06-19 06:06:06 公開日:2024-06-17
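The ROSE abstract describes raising the probability of a safe output by suppressing what a reverse prompt would induce. A minimal sketch of that general contrastive-decoding idea follows. The scoring rule (log-probability under the normal prompt minus a weighted log-probability under the reverse prompt), the weight `alpha`, and the toy logits are assumptions made for illustration; the abstract does not state the paper's exact formula.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_next_token(vocab, pos_logits, neg_logits, alpha=0.5):
    """Greedy pick under: log p(tok | prompt) - alpha * log p(tok | reverse prompt)."""
    lp_pos = log_softmax(pos_logits)   # model run with the normal prompt
    lp_neg = log_softmax(neg_logits)   # same model run with the reverse prompt
    scores = [p - alpha * n for p, n in zip(lp_pos, lp_neg)]
    best = max(range(len(vocab)), key=scores.__getitem__)
    return vocab[best]
```

With `alpha=0` this reduces to ordinary greedy decoding. Tokens that the reverse prompt makes very likely are penalized, so a token that is only slightly less likely under the normal prompt can win. Both sets of logits come from the same model, which is why no additional training is involved.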
# 自己回帰型言語モデルにおける知識蒸留の再検討 Revisiting Knowledge Distillation for Autoregressive Language Models ( http://arxiv.org/abs/2402.11890v2 ) ライセンス: Link先を確認	Qihuang Zhong, Liang Ding, Li Shen, Juhua Liu, Bo Du, Dacheng Tao,	(参考訳) 知識蒸留(KD)は、より小さな学生モデルを訓練することで、推論コストとメモリフットプリントを減らすために教師モデルを圧縮する一般的な手法である。しかし、自己回帰言語モデル(LM)の文脈では、より大きな教師のLMが劇的に貧しい学生に繋がる可能性があることを実証的に見出した。この問題に対処するため、我々は一連の分析を行い、異なるトークンが異なる指導モードを持ち、性能劣化につながることを明らかにした。そこで本研究では,KD を改善するための簡易かつ効果的な適応型教育手法 (ATKD) を提案する。 ATKDの中核は、ロート学習を減らし、教育をより多様で柔軟なものにすることだ。 8つのLMタスクに関する大規模な実験は、ATKDの助けを借りて、様々なベースラインのKD手法が、すべてのモデルタイプとサイズに対して一貫した、重要なパフォーマンス向上(平均スコア+3.04%)を達成することを示した。より奨励的に、ATKDは学生モデルの一般化を効果的に改善することができる。 Knowledge distillation (KD) is a common approach to compress a teacher model to reduce its inference cost and memory footprint, by training a smaller student model. However, in the context of autoregressive language models (LMs), we empirically find that larger teacher LMs might dramatically result in a poorer student. In response to this problem, we conduct a series of analyses and reveal that different tokens have different teaching modes, neglecting which will lead to performance degradation. Motivated by this, we propose a simple yet effective adaptive teaching approach (ATKD) to improve the KD. The core of ATKD is to reduce rote learning and make teaching more diverse and flexible. Extensive experiments on 8 LM tasks show that, with the help of ATKD, various baseline KD methods can achieve consistent and significant performance gains (up to +3.04% average score) across all model types and sizes. More encouragingly, ATKD can improve the student model generalization effectively.	翻訳日:2024-06-19 06:06:06 公開日:2024-06-17
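For readers unfamiliar with the baseline that this abstract revisits, the sketch below shows the standard temperature-softened knowledge distillation loss applied per token. It is the generic textbook formulation, not the ATKD method, whose details the abstract does not give; the function names and the default temperature are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions for one token position."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


def sequence_kd_loss(teacher_seq, student_seq, temperature=2.0):
    """Average the per-token loss over all positions of a sequence."""
    losses = [kd_loss(t, s, temperature) for t, s in zip(teacher_seq, student_seq)]
    return sum(losses) / len(losses)
```

In this baseline every token position is weighted equally in `sequence_kd_loss`. The abstract's observation that different tokens have different teaching modes suggests treating positions differently, but how ATKD does so is not specified here.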
# Self-AMPLIFY: セルフポストホック説明による小言語モデルの改善 Self-AMPLIFY: Improving Small Language Models with Self Post Hoc Explanations ( http://arxiv.org/abs/2402.12038v3 ) ライセンス: Link先を確認	Milan Bhan, Jean-Noel Vittaut, Nicolas Chesneau, Marie-Jeanne Lesot,	(参考訳) インプロンプトとインコンテキスト学習(ICL)に自然言語の合理性を組み込むことで、LLM(Large Language Models)のパフォーマンスが大幅に向上した。しかし、高品質な合理性を生成するには、人間のアノテーションや補助的なプロキシモデルの使用が必要である。そこで本研究では,Small Language Models (SLM) に適用したポストホックな説明手法から論理式を自動的に生成する自己AMPLIFYを提案する。 Self-AMPLIFYは、サンプルをターゲットとし、合理性を生成し、ICLを活用するための最後のプロンプトを構築する3段階のメソッドである。自己AMPLIFYのパフォーマンスは、強力な推論能力を必要とする4つのSLMと5つのデータセットで評価される。 Self-AMPLIFYは競争相手に対して良い結果をもたらし、高い精度の改善をもたらす。 Self-AMPLIFYは、自己回帰型言語モデルにポストホックな説明法を適用した最初の方法である。 Incorporating natural language rationales in the prompt and In-Context Learning (ICL) have led to a significant improvement of Large Language Models (LLMs) performance. However, generating high-quality rationales require human-annotation or the use of auxiliary proxy models. In this work, we propose Self-AMPLIFY to automatically generate rationales from post hoc explanation methods applied to Small Language Models (SLMs) to improve their own performance. Self-AMPLIFY is a 3-step method that targets samples, generates rationales and builds a final prompt to leverage ICL. Self-AMPLIFY performance is evaluated on four SLMs and five datasets requiring strong reasoning abilities. Self-AMPLIFY achieves good results against competitors, leading to strong accuracy improvement. Self-AMPLIFY is the first method to apply post hoc explanation methods to autoregressive language models to generate rationales to improve their own performance in a fully automated manner.	翻訳日:2024-06-19 06:06:06 公開日:2024-06-17
# チェックの学習:大規模言語モデルにおける自己補正の可能性 Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models ( http://arxiv.org/abs/2402.13035v3 ) ライセンス: Link先を確認	Che Zhang, Zhenyang Xiao, Chengcheng Han, Yixin Lian, Yuejian Fang,	(参考訳) 自己補正は、大きな言語モデル(LLM)から生成された出力のスタイルと安全性を向上させることで、驚くべき成果を上げている。しかし、近年の研究では、LLMが論理的誤りを特定するのが困難であることから、自己訂正は限定的あるいは非生産的である可能性が示唆されている。本稿では,タスクチェックのためのトレーニングデータを構築することで,LCMの自己チェック能力を向上させることを目的とする。具体的には、思考の連鎖(CoT)手法を自己チェックタスクに適用し、ステップレベルの詳細な分析と説明を利用して推論経路の正しさを評価する。我々は「ステップCoTチェック」と呼ばれる特殊なチェックフォーマットを提案する。このフォーマットに従うと、ステップバイステップの分析とチェックを含むチェック補正データセットを構築する。次に,LLMの微調整を行い,その誤り検出と補正能力を向上させる。実験により,複数のベンチマークでLLMの自己検査と自己補正能力を大幅に向上させることを示す。このアプローチは、特に不正な位置の特定において他のフォーマットよりも優れ、より大きなモデルでより大きな利点が観察される。再現性のために、すべてのデータセットとコードはhttps://github.com/bammt/Learn-to-checkで提供されている。 Self-correction has achieved impressive results in enhancing the style and security of the generated output from large language models (LLMs). However, recent studies suggest that self-correction might be limited or even counterproductive in reasoning tasks due to LLMs' difficulties in identifying logical mistakes. In this paper, we aim to enhance the self-checking capabilities of LLMs by constructing training data for checking tasks. Specifically, we apply the Chain of Thought (CoT) methodology to self-checking tasks, utilizing fine-grained step-level analyses and explanations to assess the correctness of reasoning paths. We propose a specialized checking format called "Step CoT Check". Following this format, we construct a checking-correction dataset that includes detailed step-by-step analysis and checking. Then we fine-tune LLMs to enhance their error detection and correction abilities. Our experiments demonstrate that fine-tuning with the "Step CoT Check" format significantly improves the self-checking and self-correction abilities of LLMs across multiple benchmarks. 
This approach outperforms other formats, especially in locating the incorrect position, with greater benefits observed in larger models. For reproducibility, all the datasets and code are provided in https://github.com/bammt/Learn-to-check.	翻訳日:2024-06-19 06:06:06 公開日:2024-06-17
# 視覚言語モデルにおける社会バイアス評価のための統一フレームワークとデータセット A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models ( http://arxiv.org/abs/2402.13636v2 ) ライセンス: Link先を確認	Ashutosh Sathe, Prachi Jain, Sunayana Sitaram,	(参考訳) ヴィジュアル言語モデル(VLM)は、産業とアカデミックの両方で広く採用されている。本研究では,VLMにおける職業に関する性別,人種,年齢の偏りを体系的に評価するための統一的な枠組みを提案する。我々の評価は、画像からテキストへ、テキストからテキストへ、テキストから画像へ、画像から画像へを含む、最近のVLMでサポートされているすべての推論モードを含む。さらに、生成したテキストと画像の両方において、異なる専門分野にわたる性別、人種、年齢情報を意図的に隠蔽する高品質な合成データセットを生成する自動パイプラインを提案する。データセットには、各専門職のアクションベースの記述が含まれており、視覚言語モデル(VLM)における社会的バイアスを評価するためのベンチマークとして機能している。広範に使用されているVLMの比較分析では,入力出力モードの変動が,バイアスの大きさと方向の差を識別できることを示した。さらに, VLMモデルでは, 異なるバイアス特性に対して, 異なるバイアス特性を示すことが判明した。私たちの仕事は、VLMの改善における今後の進歩を、社会的に偏見のない表現を学ぶのに役立つことを願っています。データとコードを公開します。 Vision-language models (VLMs) have gained widespread adoption in both industry and academia. In this study, we propose a unified framework for systematically evaluating gender, race, and age biases in VLMs with respect to professions. Our evaluation encompasses all supported inference modes of the recent VLMs, including image-to-text, text-to-text, text-to-image, and image-to-image. Additionally, we propose an automated pipeline to generate high-quality synthetic datasets that intentionally conceal gender, race, and age information across different professional domains, both in generated text and images. The dataset includes action-based descriptions of each profession and serves as a benchmark for evaluating societal biases in vision-language models (VLMs). In our comparative analysis of widely used VLMs, we have identified that varying input-output modalities lead to discernible differences in bias magnitudes and directions. Additionally, we find that VLM models exhibit distinct biases across different bias attributes we investigated. We hope our work will help guide future progress in improving VLMs to learn socially unbiased representations. We will release our data and code.	翻訳日:2024-06-19 05:56:21 公開日:2024-06-17
# KInIT at SemEval-2024 Task 8: 多言語機械生成テキスト検出のための微調整LLM KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection ( http://arxiv.org/abs/2402.13671v2 ) ライセンス: Link先を確認	Michal Spiegel, Dominik Macko,	(参考訳) SemEval-2024 Task 8は、マルチジェネレータ、マルチドメイン、多言語のブラックボックス機械生成テキスト検出に重点を置いている。このような検出は、大規模言語モデル(LLM)の潜在的な誤用を防ぐために重要であり、最新のLLMは人間らしい多言語テキストを生成する能力が非常に高い。我々は,言語識別と,テキスト分類のためのより小さなLLMのパラメータ効率の良い微調整を利用して,この課題に複数の方法で対処してきた。さらに、言語ごとの分類閾値校正を用いて、微調整モデルの予測と統計的検出指標を独自の方法で組み合わせ、システムの検出性能の一般化を改善した。提案手法は第4位にランクインし,勝者との差が1ポイント未満という競争力のある結果を得た。 SemEval-2024 Task 8 is focused on multigenerator, multidomain, and multilingual black-box machine-generated text detection. Such detection is important for preventing potential misuse of large language models (LLMs), the newest of which are very capable of generating multilingual human-like texts. We have approached this task in multiple ways, utilizing language identification and parameter-efficient fine-tuning of smaller LLMs for text classification. We have further used per-language classification-threshold calibration to uniquely combine the fine-tuned models' predictions with statistical detection metrics to improve generalization of the system's detection performance. Our submitted method achieved competitive results, ranking fourth, just under 1 percentage point behind the winner.	翻訳日:2024-06-19 05:56:21 公開日:2024-06-17
# ファクチュアル知識のひび割れ:大規模言語モデルにおける退化知識ニューロンの包括的解析 Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models ( http://arxiv.org/abs/2402.13731v2 ) ライセンス: Link先を確認	Yuheng Chen, Pengfei Cao, Yubo Chen, Yining Wang, Shengping Liu, Kang Liu, Jun Zhao,	(参考訳) 大規模言語モデル(LLM)は、膨大な事実知識を格納するが、その基盤となるメカニズムはいまだ不明である。以前の研究では、事実知識は多層パーセプトロンの重みに格納され、いくつかの記憶ユニットは縮退知識ニューロン(DKN)と呼ばれる縮退性を示すことが示唆された。この概念の斬新さと独特な性質にもかかわらず、厳密に定義されたり体系的に研究されたりはしていない。まず、MLPニューロンの接続重みパターンを考察し、構造的側面と機能的側面の両方からDKNを定義する。これに基づいて神経学的トポロジ・クラスタリング法を導入し,任意の数や構造にDKNが形成されることにより,より正確なDKNの取得が可能となる。さらに、認知科学に触発されて、我々はDKNとLLMの堅牢性、進化性、複雑さとの関係を探求する。 6 つの条件下で34 実験を行った結果,DKN とこれら3 つの特性の関連性が示された。コードはまもなく利用可能になる。 Large language models (LLMs) store extensive factual knowledge, but the underlying mechanisms remain unclear. Previous research suggests that factual knowledge is stored within multi-layer perceptron weights, and some storage units exhibit degeneracy, referred to as Degenerate Knowledge Neurons (DKNs). Despite the novelty and unique properties of this concept, it has not been rigorously defined or systematically studied. We first consider the connection weight patterns of MLP neurons and define DKNs from both structural and functional aspects. Based on this, we introduce the Neurological Topology Clustering method, which allows the formation of DKNs in any numbers and structures, leading to a more accurate DKN acquisition. Furthermore, inspired by cognitive science, we explore the relationship between DKNs and the robustness, evolvability, and complexity of LLMs. Our execution of 34 experiments under 6 settings demonstrates the connection between DKNs and these three properties. The code will be available soon.	翻訳日:2024-06-19 05:56:21 公開日:2024-06-17
# MLXP: Pythonで再現可能な実験を行うフレームワーク MLXP: A Framework for Conducting Replicable Experiments in Python ( http://arxiv.org/abs/2402.13831v2 ) ライセンス: Link先を確認	Michael Arbel, Alexandre Zouaoui,	(参考訳) 機械学習(ML)研究の再現性は、複雑な非決定論的アルゴリズムの利用と、モデルアーキテクチャやトレーニングデータセットなどの多くのハイパーパラメータ選択への依存により、ますます懸念されている。再現性と複製性のある結果の確保は、この分野を前進させるには不可欠であるが、堅牢な結論を得るための体系的かつよく組織された実験を行うためには、重要な技術的努力を必要とすることが多い。実験管理の促進と再現性向上のためにいくつかのツールが開発されているが、産業環境ではうまく処理されているにもかかわらず、研究コミュニティ内での採用を妨げる複雑さをしばしば導入している。低採用の課題に対処するため、オープンソースでシンプルで軽量なPythonベースの実験管理ツールであるMLXPを提案する(https://github.com/inria-thoth/mlxp で公開)。 MLXPは、高い再現性を確保しながら、最小限のオーバーヘッドで実験プロセスを合理化する。 Replicability in machine learning (ML) research is increasingly concerning due to the utilization of complex non-deterministic algorithms and the dependence on numerous hyper-parameter choices, such as model architecture and training datasets. Ensuring reproducible and replicable results is crucial for advancing the field, yet often requires significant technical effort to conduct systematic and well-organized experiments that yield robust conclusions. Several tools have been developed to facilitate experiment management and enhance reproducibility; however, they often introduce complexity that hinders adoption within the research community, despite being well-handled in industrial settings. To address the challenge of low adoption, we propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python, available at https://github.com/inria-thoth/mlxp . MLXP streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.	翻訳日:2024-06-19 05:56:21 公開日:2024-06-17
# 社会環境設計 Social Environment Design ( http://arxiv.org/abs/2402.14090v3 ) ライセンス: Link先を確認	Edwin Zhang, Sadie Zhao, Tonghan Wang, Safwan Hossain, Henry Gasztowtt, Stephan Zheng, David C. Parkes, Milind Tambe, Yiling Chen,	(参考訳) 人工知能(AI)は、政府や経済政策の改善に使用できる技術として、約束を守る。本稿では、強化学習、EconCS、計算社会選択のコミュニティと連携する自動政策作成にAIを使用するための一般的なフレームワークである社会環境設計を導入することにより、この目的に向けた新たな研究課題を提案する。このフレームワークは、一般的な経済環境を捉え、政策目標に関する投票を含め、AIシミュレーションを通じて政府と経済政策を体系的に分析するための方向性を提供する。 AIベースの政策決定における今後の研究の鍵となるオープンな問題を強調します。これらの課題を解決することで、我々は様々な社会福祉目標を達成することができ、それによってより倫理的で責任ある意思決定を促進することを望んでいます。 Artificial Intelligence (AI) holds promise as a technology that can be used to improve government and economic policy-making. This paper proposes a new research agenda towards this end by introducing Social Environment Design, a general framework for the use of AI for automated policy-making that connects with the Reinforcement Learning, EconCS, and Computational Social Choice communities. The framework seeks to capture general economic environments, includes voting on policy objectives, and gives a direction for the systematic analysis of government and economic policy through AI simulation. We highlight key open problems for future research in AI-based policy-making. By solving these challenges, we hope to achieve various social welfare objectives, thereby promoting more ethical and responsible decision making.	翻訳日:2024-06-19 05:56:21 公開日:2024-06-17
# COBIAS:バイアス評価におけるコンテキスト信頼性 COBIAS: Contextual Reliability in Bias Assessment ( http://arxiv.org/abs/2402.14889v2 ) ライセンス: Link先を確認	Priyanshul Govil, Hemang Jain, Vamshi Krishna Bonagiri, Aman Chadha, Ponnurangam Kumaraguru, Manas Gaur, Sanorita Dey,	(参考訳) 大規模な言語モデル(LLM)は、広範囲なウェブコーパスで訓練されており、人間のようなテキストを理解して生成することができる。しかし、このトレーニングプロセスはモデルに固有のバイアスをもたらす。これらのバイアスは、様々なステレオタイプや偏見を含む、Webデータの多様性と、しばしば未修正の性質から生じる。デバイアスモデルに関するこれまでの作業は、メソッドのパフォーマンスを測定するためにベンチマークデータセットに依存していた。しかし、これらのデータセットは、偏見の非常に主観的な理解のため、いくつかの落とし穴に悩まされ、文脈探索の重要な必要性が浮かび上がっている。本稿では,それらが生じる可能性のある多様な状況を考慮して,入力の文脈を理解することを提案する。私たちの貢献は2つあります。 (i)2つの既存のバイアスベンチマークデータセットから2,291個のステレオタイプステートメントを拡張し、コンテキストを追加するためのポイントを付与する。 (II) 文脈指向バイアス指標と評価スコア(COBIAS)を開発し, バイアス測定における文の文脈的信頼性を評価する。我々の計量は、文の文脈的信頼性に関する人間の判断(Spearman's $\rho = 0.65, p = 3.4 * 10^{-60}$)と一致し、バイアス軽減作業を支援する信頼できるデータセットを作成するために使用できる。 Large Language Models (LLMs) are trained on extensive web corpora, which enable them to understand and generate human-like text. However, this training process also results in inherent biases within the models. These biases arise from web data's diverse and often uncurated nature, containing various stereotypes and prejudices. Previous works on debiasing models rely on benchmark datasets to measure their method's performance. However, these datasets suffer from several pitfalls due to the highly subjective understanding of bias, highlighting a critical need for contextual exploration. We propose understanding the context of inputs by considering the diverse situations in which they may arise. Our contribution is two-fold: (i) we augment 2,291 stereotyped statements from two existing bias-benchmark datasets with points for adding context; (ii) we develop the Context-Oriented Bias Indicator and Assessment Score (COBIAS) to assess a statement's contextual reliability in measuring bias. 
Our metric aligns with human judgment on the contextual reliability of statements (Spearman's $\rho = 0.65$, $p = 3.4 \times 10^{-60}$) and can be used to create reliable datasets that would assist bias-mitigation work.	翻訳日:2024-06-19 05:56:21 公開日:2024-06-17
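The agreement figure reported for COBIAS is a Spearman rank correlation between metric scores and human ratings. As a reminder of what that statistic measures, here is a small self-contained sketch; the function names are illustrative, the data in any usage would be hypothetical, and the p-value computation is deliberately left out.

```python
def rank(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(rank(x), rank(y))
```

Because only ranks enter the computation, the statistic rewards a metric that orders statements the way annotators do, even if the two scales differ.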
# 人間がどうやってコードを書くのか? 大型モデルも同じように How Do Humans Write Code? Large Models Do It the Same Way Too ( http://arxiv.org/abs/2402.15729v2 ) ライセンス: Link先を確認	Long Li, Xuzheng He,	(参考訳) Program-of-Thought (PoT) は、自然言語ベースのChain-of-Thought (CoT) を、計算エラーを回避するために外部ツールコールを利用することで、Large Language Models (LLM) の数学的推論タスクで最も一般的な方法として置き換える。しかし, GPT-4 と Llama シリーズの評価では, CoT と比較して,PoT の誤式や論理の欠陥などの推論誤差が大きくなることが判明した。この問題に対処するために,PoTとCoTの統合を支援する一連の戦略を活用するHTL(Human-Think Language)を提案する。 2) 注意点より論理的なコードを生成するため、PoT中のCoT推論にモデル注意を向ける。 3)難解な数学問題の解法において,LLMの繰り返し推論ステップを防止するため,CoT応答とPoT応答の精度を報奨として活用する強化学習を行う。 Llama-Baseモデルでは平均6.5%,Mistral-Baseモデルでは4.3%の改善を実現している。また、5つのドメイン外のデータセットに対して、モデルの情報フローを制御し、強い転送可能性を示すことにより、大きな効果を示す。さらに、HTLは非数学的自然言語推論タスクにおいて最も顕著な改善を示し、統一推論タスクフレームワークに寄与している。 Program-of-Thought (PoT) replaces natural language-based Chain-of-Thought (CoT) as the most popular method in Large Language Models (LLMs) mathematical reasoning tasks by utilizing external tool calls to circumvent computational errors. However, our evaluation of the GPT-4 and Llama series reveals that using PoT introduces more reasoning errors, such as incorrect formulas or flawed logic, compared to CoT. To address this issue, we propose Human-Think Language (HTL), which leverages a suite of strategies that help integrate PoT and CoT, encompassing: (1) a new generation paradigm that uses full CoT reasoning to control code generation. (2) Focus Attention, that directs model attention to the CoT reasoning during PoT to generate more logical code. (3) reinforcement learning that utilizes the accuracy of both CoT and PoT responses as rewards to prevent repetitive reasoning steps in LLMs when solving difficult math problems. Our method achieves an average improvement of 6.5% on the Llama-Base model and 4.3% on the Mistral-Base model across 8 mathematical calculation datasets. It also shows significant effectiveness on five out-of-domain datasets by controlling the model's information flow, exhibiting strong transferability. 
Additionally, HTL shows the most significant improvement on the non-mathematical natural language inference task, contributing to a unified reasoning task framework.	翻訳日:2024-06-19 05:56:21 公開日:2024-06-17
# 真理とファシリテーティング・チェンジの展開--エージェントによる大規模社会運動シミュレーションを目指して Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation ( http://arxiv.org/abs/2402.16333v2 ) ライセンス: Link先を確認	Xinyi Mou, Zhongyu Wei, Xuanjing Huang,	(参考訳) ソーシャルメディアは社会運動の基盤として現れ、社会変革の推進に大きな影響を与えている。大衆の反応をシミュレートし、潜在的な影響を予測することがますます重要になっている。しかし,このような現象をシミュレートする既存の手法は,社会運動参加者の行動を把握する上での有効性と効率性に関する課題に直面している。本稿では,ソーシャルメディアユーザシミュレーションのためのハイブリッドフレームワークHiSimを紹介し,ユーザを2つのタイプに分類する。コアユーザはLarge Language Modelsによって駆動されるが、多くの一般ユーザはdeductive agent-based modelによってモデル化される。さらに、トリガイベントに続く応答ダイナミクスを再現するために、Twitterのような環境を構築します。次に,実世界のデータセットを対象とした総合的な実験を行うための,多面的ベンチマークSoMoSiMu-Benchを開発した。実験の結果,本手法の有効性と柔軟性が示された。 Social media has emerged as a cornerstone of social movements, wielding significant influence in driving societal change. Simulating the response of the public and forecasting the potential impact has become increasingly important. However, existing methods for simulating such phenomena encounter challenges concerning their efficacy and efficiency in capturing the behaviors of social movement participants. In this paper, we introduce a hybrid framework HiSim for social media user simulation, wherein users are categorized into two types. Core users are driven by Large Language Models, while numerous ordinary users are modeled by deductive agent-based models. We further construct a Twitter-like environment to replicate their response dynamics following trigger events. Subsequently, we develop a multi-faceted benchmark SoMoSiMu-Bench for evaluation and conduct comprehensive experiments across real-world datasets. Experimental results demonstrate the effectiveness and flexibility of our method.	翻訳日:2024-06-19 05:46:37 公開日:2024-06-17
# KoDialogBench:韓国語対話ベンチマークによる言語モデルの会話的理解の評価 KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark ( http://arxiv.org/abs/2402.17377v2 ) ライセンス: Link先を確認	Seongbo Jang, Seonghyeon Lee, Hwanjo Yu,	(参考訳) 言語モデルは、しばしばチャットボットアシスタントとしてデプロイされるため、モデルがユーザの最初の言語で会話を行うようになる。これらのモデルは幅広い言語で訓練されているが、韓国語のような低リソース言語におけるそれらの能力の総合的な評価は欠如している。本研究では,韓国語における言語モデルの対話能力を評価するためのベンチマークであるKoDialogBenchを紹介する。この目的のために,日中の話題に関する韓国語対話を公開資料から収集したり,他言語からの対話を翻訳したりする。次に、これらの会話を多様なテストデータセットに構成し、対話理解から応答選択タスクにまたがる。提案手法を応用して,韓国語対話の基盤的理解を測定するために,様々な言語モデルの広範囲な評価と分析を行う。実験結果から,モデルの会話能力向上のための重要な場があることが示唆された。さらに、異なる言語モデル間での詳細な比較では、会話の熟練度を高めるための最近の訓練手法の有効性を強調した。我々はKoDialogBenchが韓国語モデルの発展を促進することを期待する。 As language models are often deployed as chatbot assistants, it becomes a virtue for models to engage in conversations in a user's first language. While these models are trained on a wide range of languages, a comprehensive evaluation of their proficiency in low-resource languages such as Korean has been lacking. In this work, we introduce KoDialogBench, a benchmark designed to assess language models' conversational capabilities in Korean. To this end, we collect native Korean dialogues on daily topics from public sources, or translate dialogues from other languages. We then structure these conversations into diverse test datasets, spanning from dialogue comprehension to response selection tasks. Leveraging the proposed benchmark, we conduct extensive evaluations and analyses of various language models to measure a foundational understanding of Korean dialogues. Experimental results indicate that there exists significant room for improvement in models' conversation skills. Furthermore, our in-depth comparisons across different language models highlight the effectiveness of recent training techniques in enhancing conversational proficiency. We anticipate that KoDialogBench will promote the progress towards conversation-aware Korean language models.	
翻訳日:2024-06-19 05:46:37 公開日:2024-06-17
# シリコンバレーの群衆の知恵: LLM Ensemble Prediction Capability Rival Human Crowd Accuracy Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy ( http://arxiv.org/abs/2402.19379v5 ) ライセンス: Link先を確認	Philipp Schoenegger, Indre Tuminauskaite, Peter S. Park, Rafael Valdece Sousa Bastos, Philip E. Tetlock,	(参考訳) 実際の人間の予測精度は、「群衆の知恵」効果に依存しており、個々の予測者の群集に集結することで、将来の出来事に関する予測が著しく改善される。大規模言語モデル(LLMs)の予測能力に関する過去の研究は、フロンティアのLLMは、人混みの予測・学習集約のゴールド標準に比べて性能が劣っていることを示唆している。研究1では、12個のLLMの群集からなるLLMアンサンブルアプローチを用いて、この研究を拡大する。我々は,31の2進数質問に対するLLM予測を,3ヶ月の予測トーナメントの925人の予測者の群集と比較した。我々の事前登録された主要な分析は、LLMの群集が単純な非情報ベンチマークよりも優れており、統計的にヒトの群集と異なるものではないことを示している。また、アクセプション効果やラウンド数を好む傾向など、機械応答における人間のようなバイアスの集合も観察する。研究2では,LLM予測(GPT-4とClaude 2)が人間の認知的アウトプットに描画することで改善できるかどうかを検証した。両モデルの予測精度は、中央値の人間の予測を情報として露出することで、精度を17%から28%向上させることで得られるが、これは人や機械の予測を単に平均化するよりも精度の低い予測につながる。以上の結果から, LLMは, 簡易で実用的な予測集計手法により, 人群に匹敵する予測精度を達成できることが示唆された。 Human forecasting accuracy in practice relies on the 'wisdom of the crowd' effect, in which predictions about future events are significantly improved by aggregating across a crowd of individual forecasters. Past work on the forecasting ability of large language models (LLMs) suggests that frontier LLMs, as individual forecasters, underperform compared to the gold standard of a human-crowd forecasting-tournament aggregate. In Study 1, we expand this research by using an LLM ensemble approach consisting of a crowd of 12 LLMs. We compare the aggregated LLM predictions on 31 binary questions to those of a crowd of 925 human forecasters from a three-month forecasting tournament. Our preregistered main analysis shows that the LLM crowd outperforms a simple no-information benchmark, and is not statistically different from the human crowd. We also observe a set of human-like biases in machine responses, such as an acquiescence effect and a tendency to favour round numbers. 
In Study 2, we test whether LLM predictions (of GPT-4 and Claude 2) can be improved by drawing on human cognitive output. We find that both models' forecasting accuracy benefits from exposure to the median human prediction as information, improving accuracy by between 17% and 28%, though this leads to less accurate predictions than simply averaging human and machine forecasts. Our results suggest that LLMs can achieve forecasting accuracy rivaling that of the human crowd via the simple, practically applicable method of forecast aggregation.	翻訳日:2024-06-19 05:46:37 公開日:2024-06-17
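Forecast aggregation of the kind described above can be sketched in a few lines. The sketch assumes median aggregation and the Brier score, both common choices in forecasting research; the abstract itself does not spell out either, so treat them as assumptions, and the function names as illustrative.

```python
from statistics import median


def aggregate(forecasts):
    """Combine individual probability forecasts for one binary question (median rule)."""
    return median(forecasts)


def brier(probability, outcome):
    """Squared error of a probability forecast against a 0/1 outcome (lower is better)."""
    return (probability - outcome) ** 2


def mean_brier(question_forecasts, outcomes):
    """Average Brier score of the aggregated crowd over a set of questions."""
    scores = [
        brier(aggregate(forecasts), outcome)
        for forecasts, outcome in zip(question_forecasts, outcomes)
    ]
    return sum(scores) / len(scores)
```

Under this scoring rule, a forecaster who always answers 0.5 scores 0.25 on every question, which is one natural reading of a "no-information" benchmark.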
# DiaHalu: 大規模言語モデルのための対話レベルの幻覚評価ベンチマーク DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models ( http://arxiv.org/abs/2403.00896v2 ) ライセンス: Link先を確認	Kedi Chen, Qin Chen, Jie Zhou, Yishen He, Liang He,	(参考訳) 近年, 大規模言語モデル (LLM) が大きな成功を収めているため, 幻覚の問題は依然として課題であり, 幻覚を検出するためのベンチマークが多数提案されている。しかしながら、これらのベンチマークのいくつかはLLMによって自然に生成されるものではなく、意図的に誘導される。また、忠実な幻覚を無視しながら、事実の幻覚にのみ焦点をあてる者も多い。さらに,LLMの時代には,対話パターンが広く利用されているが,現在のベンチマークでは文レベルと通過レベルの幻覚にのみ焦点が当てられている。本研究では,対話レベルの幻覚評価ベンチマークDiaHaluを提案する。当初、収集したトピックをシステムプロンプトに統合し、2つのChatGPT3.5間の対話を容易にする。その後、人間の言語規則に従わない内容を手動で修正し、LLMを再生させ、人間と機械の相互作用シナリオをシミュレートする。最後に、専門家はデータセットのすべてのサンプルに注釈を付ける。 DiaHaluは4つの共通多ターン対話ドメインと5つの幻覚サブタイプをカバーしており、事実性と忠実な幻覚から拡張されている。データセット上のよく知られたLCMと検出方法による実験は、DiaHaluが挑戦的なベンチマークであり、さらなる研究に重要な価値を持っていることを示している。 Since large language models (LLMs) achieve significant success in recent years, the hallucination issue remains a challenge, numerous benchmarks are proposed to detect the hallucination. Nevertheless, some of these benchmarks are not naturally generated by LLMs but are intentionally induced. Also, many merely focus on the factuality hallucination while ignoring the faithfulness hallucination. Additionally, although dialogue pattern is more widely utilized in the era of LLMs, current benchmarks only concentrate on sentence-level and passage-level hallucination. In this study, we propose DiaHalu, the first dialogue-level hallucination evaluation benchmark to our knowledge. Initially, we integrate the collected topics into system prompts and facilitate a dialogue between two ChatGPT3.5. Subsequently, we manually modify the contents that do not adhere to human language conventions and then have LLMs re-generate, simulating authentic human-machine interaction scenarios. Finally, professional scholars annotate all the samples in the dataset. DiaHalu covers four common multi-turn dialogue domains and five hallucination subtypes, extended from factuality and faithfulness hallucination. 
Experiments with several well-known LLMs and detection methods on the dataset show that DiaHalu is a challenging benchmark that holds significant value for further research.	翻訳日:2024-06-19 05:36:50 公開日:2024-06-17
# 量子ビット計測用リードアウト共振器の方向放射 Directional emission of a readout resonator for qubit measurement ( http://arxiv.org/abs/2403.01375v2 ) ライセンス: Link先を確認	Alec Yen, Yufeng Ye, Kaidong Peng, Jennifer Wang, Gregory Cunningham, Michael Gingras, Bethany M. Niedzielski, Hannah Stickler, Kyle Serniak, Mollie E. Schwartz, Kevin P. O'Brien,	(参考訳) 我々は、全パス共振器を用いて超伝導量子ビットの伝送に基づく分散読み出しを提案し、出力に対して優先的に読み出し光子を出力する。これは、リードアウト信号が出力に向かって優先的に減衰するように、フィードラインを一方の端で意図的にミスマッチする典型的な読み出し方式とは対照的である。この意図的なミスマッチは、非理想的インピーダンス環境による有効共振器のライン幅の拡大や、インピーダンスマッチングのためのインフラの追加など、スケーリング上の課題を生じさせる。多重化オールパスリードアウト共振器を用いた将来のアーキテクチャでは、意図的にミスマッチする必要がなくなり、量子コンピュータのスケーリングの見通しが向上する可能性がある。オールパスリードアウト」の実証実証として,全パスリードアウト共振器を設計し,リードアウト周波数1.17dB未満の挿入損失と最大挿入損失1.53dBを,トランスモンキュービットの最低3つの状態に対して全帯域にわたって実現した。我々は,600 nsで平均98.1%のシングルショット忠実度を持つ量子ビット読み出しを実証し,より大きな分散シフトの効果を評価するために,シェルビングプロトコルを実装し,300 nsで99.0%の忠実度を達成する。 We propose and demonstrate transmission-based dispersive readout of a superconducting qubit using an all-pass resonator, which preferentially emits readout photons toward the output. This is in contrast to typical readout schemes, which intentionally mismatch the feedline at one end so that the readout signal preferentially decays toward the output. We show that this intentional mismatch creates scaling challenges, including larger spread of effective resonator linewidths due to non-ideal impedance environments and added infrastructure for impedance matching. A future architecture using multiplexed all-pass readout resonators would avoid the need for intentional mismatch and potentially improve the scaling prospects of quantum computers. As a proof-of-concept demonstration of "all-pass readout," we design and fabricate an all-pass readout resonator that demonstrates insertion loss below 1.17 dB at the readout frequency and a maximum insertion loss of 1.53 dB across its full bandwidth for the lowest three states of a transmon qubit. 
We demonstrate qubit readout with an average single-shot fidelity of 98.1% in 600 ns; to assess the effect of larger dispersive shift, we implement a shelving protocol and achieve a fidelity of 99.0% in 300 ns.	翻訳日:2024-06-19 05:36:50 公開日:2024-06-17
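To put the insertion-loss figures in this abstract into perspective, the decibel values can be converted to a transmitted power fraction with the standard relation $T = 10^{-\mathrm{IL}/10}$. The helper name below is illustrative; only the two dB values come from the abstract.

```python
def insertion_loss_to_transmission(loss_db):
    """Fraction of input power that is transmitted for a given insertion loss in dB."""
    return 10 ** (-loss_db / 10)


# Values quoted in the abstract: loss at the readout frequency and worst case over the band
for loss_db in (1.17, 1.53):
    fraction = insertion_loss_to_transmission(loss_db)
    print(f"{loss_db} dB insertion loss -> {fraction:.3f} of the power transmitted")
```

By this conversion, 1.17 dB and 1.53 dB correspond to roughly 76% and 70% of the power reaching the output.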
# 群集ナビゲーションのための混合戦略ナッシュ平衡 Mixed Strategy Nash Equilibrium for Crowd Navigation ( http://arxiv.org/abs/2403.01537v4 ) ライセンス: Link先を確認	Muchen Sun, Francesca Baldini, Katie Hughes, Peter Trautman, Todd Murphey,	(参考訳) 混雑した地域で移動するロボットは、衝突回避を完全に制御するのではなく、人間と自由空間を交渉する必要がある。ゲーム理論は、ロボットが経路計画中に衝突回避のために人間と協力する可能性について推論するための枠組みを提供する。特に、混合戦略ナッシュ均衡は不確実性の下での交渉行動を捉え、群衆のナビゲーションに適している。しかし、混合戦略のナッシュ均衡の計算は、しばしばリアルタイムな意思決定には不当に高価である。本稿では,軌道の確率分布の反復的ベイズ更新方式を提案する。アルゴリズムはロボットの確率的計画と他の歩行者の進路の確率論的予測を同時に生成する。提案アルゴリズムは,クラウドナビゲーションのための混合戦略ゲームと等価であり,このアルゴリズムはゲーム全体のナッシュ均衡の回復を保証する。我々はベイズのルールナッシュ平衡 (BRNE) と命名し、リアルタイムモデル予測クラウドナビゲーションフレームワークを開発した。 BRNEは汎用的な混合戦略ナッシュ均衡を解くのではなく、特に群集ナビゲーションに適した公式を解くため、低消費電力の組込みコンピュータ上でリアルタイムで解を計算することができる。シミュレーション環境と実世界の歩行者データの両方においてBRNEを評価する。 BRNEは、安全性とナビゲーション効率に関する非学習および学習ベースの手法を一貫して上回っている。また、歩行者データセットのベンチマークでは、人レベルの群衆ナビゲーションのパフォーマンスにも到達している。最後に,本アルゴリズムの実際の人間による実用性を,完全に搭載された知覚と計算能力を備えた四足歩行ロボット上で実証する。 Robots navigating in crowded areas should negotiate free space with humans rather than fully controlling collision avoidance, as this can lead to freezing behavior. Game theory provides a framework for the robot to reason about potential cooperation from humans for collision avoidance during path planning. In particular, the mixed strategy Nash equilibrium captures the negotiation behavior under uncertainty, making it well suited for crowd navigation. However, computing the mixed strategy Nash equilibrium is often prohibitively expensive for real-time decision-making. In this paper, we propose an iterative Bayesian update scheme over probability distributions of trajectories. The algorithm simultaneously generates a stochastic plan for the robot and probabilistic predictions of other pedestrians' paths. We prove that the proposed algorithm is equivalent to solving a mixed strategy game for crowd navigation, and the algorithm guarantees the recovery of the global Nash equilibrium of the game. 
We name our algorithm Bayes' Rule Nash Equilibrium (BRNE) and develop a real-time model prediction crowd navigation framework. Since BRNE is not solving a general-purpose mixed strategy Nash equilibrium but a tailored formula specifically for crowd navigation, it can compute the solution in real-time on a low-power embedded computer. We evaluate BRNE in both simulated environments and real-world pedestrian datasets. BRNE consistently outperforms non-learning and learning-based methods regarding safety and navigation efficiency. It also reaches human-level crowd navigation performance in the pedestrian dataset benchmark. Lastly, we demonstrate the practicality of our algorithm with real humans on an untethered quadruped robot with fully onboard perception and computation.	翻訳日:2024-06-19 05:36:50 公開日:2024-06-17
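The iterative Bayesian update over trajectory distributions can be pictured with a heavily simplified, discrete two-agent sketch. This is not the BRNE implementation: the finite candidate sets, the `risk` matrix, the exponential likelihood, and the fixed iteration count are all assumptions made for illustration.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def bayes_update(prior, other, risk):
    """Reweight one agent's candidate trajectories by expected collision risk.

    risk[i][j] is an assumed collision risk between this agent's trajectory i
    and the other agent's trajectory j; other holds the other agent's current
    probabilities over its own candidates.
    """
    posterior = []
    for i, p in enumerate(prior):
        expected_risk = sum(q * risk[i][j] for j, q in enumerate(other))
        # Prior times likelihood, with the likelihood decaying in expected risk
        posterior.append(p * math.exp(-expected_risk))
    return normalize(posterior)


def iterate(prior_robot, prior_human, risk, iterations=20):
    """Alternate updates for the two agents, starting from their priors."""
    robot, human = list(prior_robot), list(prior_human)
    # The same risks indexed from the human's point of view
    risk_transposed = [list(column) for column in zip(*risk)]
    for _ in range(iterations):
        robot = bayes_update(prior_robot, human, risk)
        human = bayes_update(prior_human, robot, risk_transposed)
    return robot, human
```

Each agent ends up with a probability distribution over its own candidates rather than a single path, which is what makes the result a mixed strategy: the robot's distribution serves as its stochastic plan and the human's as a probabilistic prediction.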
# 超高速後方散乱光電子に対するカタストロフィと隠れ力学対称性の影響 Influence of catastrophes and hidden dynamical symmetries on ultrafast backscattered photoelectrons ( http://arxiv.org/abs/2403.02264v2 ) ライセンス: Link先を確認	T. Rook, L. Cruz Rodriguez, C. Figueira de Morisson Faria,	(参考訳) 我々は最近実装されたハイブリッドフォワード境界CQSFA(H-CQSFA)を用いて、クーロンテールと光電子運動量分布(PMD)における軟化の程度の違いによるポテンシャルの利用効果について議論した。クーロン相互作用に軟化を導入することは、後方散乱電子軌跡に関連するPSDで観察される隆起に影響を及ぼすことを示す。ハードコアクーロン相互作用の限界では、再散乱した尾根は偏光軸に沿って近づき、ソフトコア電位は尾根特異的な角度で中断される。我々は、尾根につながる異なる軌道の運動量マッピングを分析する。ハードコアポテンシャルについては、尾根で結合する2種類のサドルポイント解が存在する。軟化を増大させることにより、クーロンポテンシャルにのみ関連する隠れた力学対称性を破り、さらに2つの解が現れることを示す。この対称性の破れのさらなるシグネチャは運動量空間の軌跡のサブセットで遭遇する。最後に、散乱理論を用いて、軟化が最大散乱角にどのように影響するかを示し、CQSFAからの観測と一致する見積もりを提供する。これは、電子の連続体伝播における残留結合電位の存在下では、純粋に運動学と動的因果関係の区別が曖昧になることを意味する。 We discuss the effect of using potentials with a Coulomb tail and different degrees of softening in the photoelectron momentum distributions (PMDs) using the recently implemented hybrid forward-boundary CQSFA (H-CQSFA). We show that introducing a softening in the Coulomb interaction influences the ridges observed in the PMDs associated with backscattered electron trajectories. In the limit of a hard-core Coulomb interaction, the re-scattering ridges close along the polarization axis, while for a soft-core potential, they are interrupted at ridge-specific angles. We analyze the momentum mapping of the different orbits leading to the ridges. For the hard-core potential, there exist two types of saddle-point solutions that coalesce at the ridge. By increasing the softening, we show that two additional solutions emerge as the result of breaking a hidden dynamical symmetry associated exclusively with the Coulomb potential. Further signatures of this symmetry breaking are encountered in subsets of momentum-space trajectories. Finally, we use scattering theory to show how the softening affects the maximal scattering angle and provide estimates that agree with our observations from the CQSFA. 
This implies that, in the presence of residual binding potentials in the electron's continuum propagation, the distinction between purely kinematic and dynamic caustics becomes blurred.	翻訳日:2024-06-19 05:36:50 公開日:2024-06-17
# LLM評価のための微調整判定モデルの限界について On the Limitations of Fine-tuned Judge Models for LLM Evaluation ( http://arxiv.org/abs/2403.02839v2 ) ライセンス: Link先を確認	Hui Huang, Yingqi Qu, Hongli Zhou, Jing Liu, Muyun Yang, Bing Xu, Tiejun Zhao,	(参考訳) 近年,Large Language Model (LLM) を用いて他のLLMの品質を評価する傾向が高まっている。多くの研究では、プロプライエタリなオープンソースモデル、特にGPT-4を評価対象として採用している。あるいは、オープンソースのLCMに基づいて微調整された判断モデルを評価対象とする作品もある。微調整された判定モデルはGPT-4と同等の評価能力を発揮すると主張されているが,本研究では,判定モデルの実証的研究を行う。提案手法は, GPT-4 を超越しても, GPT-4 は汎用性, 公平性, アスペクト特化評価, 拡張性など, 領域内テストセット上で高い性能を達成できることが示唆された。また、微調整された判断モデルが本質的にタスク固有の分類器として機能し、その結果、制限が課されることを明らかにした。最後に, LLM評価における有効性を最大化する目的で, 微調整審査員の信頼性を測定する効果的な指標を提案する。 Recently, there has been a growing trend of utilizing Large Language Model (LLM) to evaluate the quality of other LLMs. Many studies have employed proprietary close-source models, especially GPT-4, as the evaluator. Alternatively, other works have fine-tuned judge models based on open-source LLMs as the evaluator. While the fine-tuned judge models are claimed to achieve comparable evaluation capability with GPT-4, in this study, we conduct an empirical study of judge models. Our findings indicate that although the fine-tuned judge models achieve high performance on in-domain test sets, even surpassing GPT-4, they underperform GPT-4 across several dimensions, including generalizability, fairness, aspect-specific evaluation, and scalability. We also reveal that the fine-tuned judge model inherently operates as a task-specific classifier, consequently imposing the limitations. Finally, we propose an effective indicator to measure the reliability of fine-tuned judges, with the aim of maximizing their utility in LLM evaluation.	翻訳日:2024-06-19 05:36:50 公開日:2024-06-17
# 統合性保護ブロック暗号モード -- 絡み合ったWebをアンタングする Integrity-protecting block cipher modes -- Untangling a tangled web ( http://arxiv.org/abs/2403.03654v2 ) ライセンス: Link先を確認	Chris J Mitchell,	(参考訳) 本稿では,認証暗号を提供するために設計された3つのブロック暗号モードのセキュリティを再検討する。これらのモードは PES-PCBC, IOBC, EPBC と呼ばれ、いずれも1990年代半ばに提案された。しかし、後者の2つのモードのセキュリティ分析はより最近になって発表された。いずれの場合も、これらのスキームに関するセキュリティ問題を記述した1つ以上の論文が最終的に公表されたが、これらの分析のうちの1つ(EPBCに関するもの)の欠陥が後に発見されたため、これまでEPBCには既知の重大な問題がなかった。本稿は,それにもかかわらず,これら3つのスキームがいずれも,それらの使用を防ぐべき欠陥を持っていること,特にセキュリティの証明を有する効率的な代替スキームが多数存在することを明らかにする。 This paper re-examines the security of three related block cipher modes of operation designed to provide authenticated encryption. These modes, known as PES-PCBC, IOBC and EPBC, were all proposed in the mid-1990s. However, analyses of security of the latter two modes were published more recently. In each case one or more papers describing security issues with the schemes were eventually published, although a flaw in one of these analyses (of EPBC) was subsequently discovered - this means that until now EPBC had no known major issues. This paper establishes that, despite this, all three schemes possess defects which should prevent their use - especially as there are a number of efficient alternative schemes possessing proofs of security.	翻訳日:2024-06-19 05:36:50 公開日:2024-06-17
# 生成事前学習型構造化変換器:大規模における教師なし構文言語モデル Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale ( http://arxiv.org/abs/2403.08293v3 ) ライセンス: Link先を確認	Xiang Hu, Pengyu Ji, Qingyang Zhu, Wei Wu, Kewei Tu,	(参考訳) 構文言語モデル(SLM)はその構文木を左から右に漸進的に生成する。並列性の高い原文のスクラッチから事前学習が可能な大規模教師なしSLMであるGenerative Pretrained Structured Transformers (GPST)を提案する。 GPSTは、ゴールドツリーやシーケンシャルトレーニングなど、以前のSLMの制限を回避している。これは、一方向の言語モデリング損失によって教師される通常のSLMと、構文解析木を誘導し、双方向の言語モデリング損失によって教師される構成表現を計算する追加の合成モデルからなる。本稿では,2つのモデルの連立並列訓練をEM方式で行うための表現代行法を提案する。我々は9億ドルのトークンを持つコーパスであるOpenWebText上でGPSTを事前訓練し、GPT-2よりもGPSTの方が優れていることを示す。一方、GPSTは既存の教師なしSLMよりも左から右への文法誘導に優れており、トレーニングにおいてかなりの加速を保っている。 A syntactic language model (SLM) incrementally generates a sentence with its syntactic tree in a left-to-right manner. We present Generative Pretrained Structured Transformers (GPST), an unsupervised SLM at scale capable of being pre-trained from scratch on raw texts with high parallelism. GPST circumvents the limitations of previous SLMs such as relying on gold trees and sequential training. It consists of two components, a usual SLM supervised by a uni-directional language modeling loss, and an additional composition model, which induces syntactic parse trees and computes constituent representations, supervised by a bi-directional language modeling loss. We propose a representation surrogate to enable joint parallel training of the two models in a hard-EM fashion. We pre-train GPST on OpenWebText, a corpus with $9$ billion tokens, and demonstrate the superiority of GPST over GPT-2 with a comparable size in numerous tasks covering both language understanding and language generation. Meanwhile, GPST also significantly outperforms existing unsupervised SLMs on left-to-right grammar induction, while holding a substantial acceleration on training.	翻訳日:2024-06-19 05:27:06 公開日:2024-06-17
# TaxoLLaMA:複数語彙意味課題の解決のためのWordNetベースのモデル TaxoLLaMA: WordNet-based Model for Solving Multiple Lexical Semantic Tasks ( http://arxiv.org/abs/2403.09207v2 ) ライセンス: Link先を確認	Viktor Moskvoretskii, Ekaterina Neminova, Alina Lobanova, Alexander Panchenko, Irina Nikishina,	(参考訳) 本稿では,LLaMA-2-7bモデルの例を用いて,WordNetから語彙意味知識を抽出し,複数の語彙意味タスクで検証するLLMの機能について検討する。実験の結果,4ビット量子化とLoRAにより軽量なオールインワンモデルであるTaxoLLaMAを提案する。 SotAの結果は11で、分類の豊かさ、ハイパーネム発見、分類構築、レキシカル・エンテリメントの16のタスクのうち4つのトップ2が達成されている。さらに、レキシカルエンターメントと分類構築において、微調整なしで非常に強力なゼロショット性能を示す。また、その隠れた多言語およびドメイン適応機能についても、少しチューニングしたり、ほんの少しの学習で調べます。すべてのデータセット、コード、モデルはhttps://github.com/VityaVitalich/TaxoLLaMAで公開されている。 In this paper, we explore the capabilities of LLMs in capturing lexical-semantic knowledge from WordNet on the example of the LLaMA-2-7b model and test it on multiple lexical semantic tasks. As the outcome of our experiments, we present TaxoLLaMA, the everything-in-one model, lightweight due to 4-bit quantization and LoRA. It achieves 11 SotA results, 4 top-2 results out of 16 tasks for the Taxonomy Enrichment, Hypernym Discovery, Taxonomy Construction, and Lexical Entailment tasks. Moreover, it demonstrates very strong zero-shot performance on Lexical Entailment and Taxonomy Construction with no fine-tuning. We also explore its hidden multilingual and domain adaptation capabilities with a little tuning or few-shot learning. All datasets, code, and model are available online at https://github.com/VityaVitalich/TaxoLLaMA	翻訳日:2024-06-19 05:27:06 公開日:2024-06-17
# 包括的マルチモーダル知覚に向けて:タッチ・ランゲージ・ビジョン・データセットの導入 Towards Comprehensive Multimodal Perception: Introducing the Touch-Language-Vision Dataset ( http://arxiv.org/abs/2403.09813v3 ) ライセンス: Link先を確認	Ning Cheng, You Li, Jing Gao, Bin Fang, Jinan Xu, Wenjuan Han,	(参考訳) 触覚は、人間とロボットの両方の知覚と相互作用能力に対する重要なサポートと強化を提供する。それでも、タッチに関連するマルチモーダル研究は主に視覚的・触覚的なモダリティに焦点を当てており、言語領域での探索は限られている。語彙以外にも、文レベルの記述にはよりリッチな意味論が含まれる。そこで我々は,マルチモードアライメントのための文レベル記述を特徴とする,人間と機械のカスケード協調によるTLV(Touch-Language-Vision)というタッチ言語ビジョンデータセットを構築した。新しいデータセットは、提案した軽量トレーニングフレームワークであるSTLV-Align(Synergistic Touch-Language-Vision Alignment)を微調整するために使用され、最小パラメータ調整(1%)で効果的なセマンティックアライメントを実現する。 プロジェクトページ: https://xiaoen0.github.io/touch.page/ Tactility provides crucial support and enhancement for the perception and interaction capabilities of both humans and robots. Nevertheless, the multimodal research related to touch primarily focuses on visual and tactile modalities, with limited exploration in the domain of language. Beyond vocabulary, sentence-level descriptions contain richer semantics. Based on this, we construct a touch-language-vision dataset named TLV (Touch-Language-Vision) by human-machine cascade collaboration, featuring sentence-level descriptions for multimode alignment. The new dataset is used to fine-tune our proposed lightweight training framework, STLV-Align (Synergistic Touch-Language-Vision Alignment), achieving effective semantic alignment with minimal parameter adjustments (1%). Project Page: https://xiaoen0.github.io/touch.page/.	翻訳日:2024-06-19 05:27:06 公開日:2024-06-17
# BirdSet: 鳥類のバイオ音響学の分類のためのデータセットとベンチマーク BirdSet: A Dataset and Benchmark for Classification in Avian Bioacoustics ( http://arxiv.org/abs/2403.10380v3 ) ライセンス: Link先を確認	Lukas Rauch, Raphael Schwinger, Moritz Wirth, René Heinrich, Denis Huseljic, Jonas Lange, Stefan Kahl, Bernhard Sick, Sven Tomforde, Christoph Scholz,	(参考訳) 深層学習(DL)モデルは、環境健康を評価するための鳥のバイオ音響学の強力なツールとして登場した。低コストで最小限のパッシブ・アコースティック・モニタリング(PAM)の可能性を最大化するために、DLモデルは幅広い種や環境条件で鳥の声化を分析する必要がある。しかし、データの断片化は一般化性能の包括的な評価に挑戦する。そこで,BirdSetデータセットを導入し,約52万本のグローバル・バード・レコードと400時間以上のPAM・レコードをテスト対象とする。我々のベンチマークでは、複数のDLモデルのベースラインを提供し、総合的なトレーニングや評価プロトコルを含むコード実装とともに、コンパラビリティを高め、研究を集約しています。 Deep learning (DL) models have emerged as a powerful tool in avian bioacoustics to assess environmental health. To maximize the potential of cost-effective and minimal-invasive passive acoustic monitoring (PAM), DL models must analyze bird vocalizations across a wide range of species and environmental conditions. However, data fragmentation challenges a comprehensive evaluation of generalization performance. Therefore, we introduce the BirdSet dataset, comprising approximately 520,000 global bird recordings for training and over 400 hours of PAM recordings for testing. Our benchmark offers baselines for several DL models to enhance comparability and consolidate research across studies, along with code implementations that include comprehensive training and evaluation protocols.	翻訳日:2024-06-19 05:27:06 公開日:2024-06-17
# 混合時間オラクルを必要としない実用的な平均報酬強化学習の大域的最適性に向けて Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles ( http://arxiv.org/abs/2403.11925v4 ) ライセンス: Link先を確認	Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha,	(参考訳) 平均報酬強化学習の文脈では、固定された方策の下でマルコフ連鎖が定常分布に到達するまでに要する時間の尺度である混合時間について、オラクル知識が必要とされることが、方策勾配法の大域的収束における重要な課題となっている。この要件は、大規模な状態空間を持つ環境では混合時間の推定が困難かつ高コストであるため特に問題であり、実用的なアプリケーションにおいて効果的な勾配推定を行うには非現実的に長い軌道が必要となる。この制限に対処するために、マルチレベルモンテカルロ(MLMC)勾配推定器を組み込んだマルチレベルアクター・クリティック(MAC)フレームワークを考える。提案手法は、混合時間の知識への依存を効果的に緩和し、これは平均報酬MDPの大域的収束において初めてのことである。さらに,本手法は先行研究で知られている中で最も厳密な依存性 $\mathcal{O}\left( \sqrt{\tau_{mix}} \right)$ を示す。 2Dグリッドワールドにおける目標到達ナビゲーション実験により、MACが平均報酬設定における既存の最先端の方策勾配に基づく手法よりも優れていることを示す。 In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic due to the difficulty and expense of estimating mixing time in environments with large state spaces, leading to the necessity of impractically long trajectories for effective gradient estimation in practical applications. To address this limitation, we consider the Multi-level Actor-Critic (MAC) framework, which incorporates a Multi-level Monte-Carlo (MLMC) gradient estimator. With our approach, we effectively alleviate the dependency on mixing time knowledge, a first for average-reward MDPs global convergence. Furthermore, our approach exhibits the tightest available dependence of $\mathcal{O}\left( \sqrt{\tau_{mix}} \right)$ known from prior work. With a 2D grid world goal-reaching navigation experiment, we demonstrate that MAC outperforms the existing state-of-the-art policy gradient-based method for average reward settings.	翻訳日:2024-06-19 05:17:19 公開日:2024-06-17
# 参照ベースの評価指標は質問生成において自らを反証する Reference-based Metrics Disprove Themselves in Question Generation ( http://arxiv.org/abs/2403.12242v2 ) ライセンス: Link先を確認	Bang Nguyen, Mengxia Yu, Yun Huang, Meng Jiang,	(参考訳) BLEUやBERTScoreのような参照ベースの評価指標は、質問生成(QG)の評価に広く使われている。本研究では、SQuADやHotpotQAなどのQGベンチマークにおいて、人手で書かれた参照を用いても参照ベースの評価指標の有効性は保証されないことを示す。ほとんどのQGベンチマークには参照が1つしかないため、我々はアノテーション手順を再現してもう1つの参照を収集した。優れた評価指標であれば、人間が検証した質問を、生成された質問より低く評価することはないはずである。しかし、新たに収集した参照に対する参照ベースの評価指標の結果は、指標自体を反証するものだった。そこで,大規模言語モデルを用いて,自然性,回答可能性,複雑性などの多次元基準からなる参照不要の評価指標を提案する。これらの基準は単一の参照質問の構文や意味に制約されず、多様な参照集合も必要としない。実験の結果、提案指標は高品質な質問と欠陥のある質問を正確に区別し、人間の判断との一致において最先端の結果を達成することがわかった。 Reference-based metrics such as BLEU and BERTScore are widely used to evaluate question generation (QG). In this study, on QG benchmarks such as SQuAD and HotpotQA, we find that using human-written references cannot guarantee the effectiveness of the reference-based metrics. Most QG benchmarks have only one reference; we replicated the annotation process and collected another reference. A good metric was expected to grade a human-validated question no worse than generated questions. However, the results of reference-based metrics on our newly collected reference disproved the metrics themselves. We propose a reference-free metric consisting of multi-dimensional criteria such as naturalness, answerability, and complexity, utilizing large language models. These criteria are not constrained to the syntax or semantics of a single reference question, and the metric does not require a diverse set of references. Experiments reveal that our metric accurately distinguishes between high-quality questions and flawed ones, and achieves state-of-the-art alignment with human judgment.	翻訳日:2024-06-19 05:17:19 公開日:2024-06-17
# 任意多視点画像からの人間のメッシュ復元 Human Mesh Recovery from Arbitrary Multi-view Images ( http://arxiv.org/abs/2403.12434v4 ) ライセンス: Link先を確認	Xiaoben Li, Mancheng Meng, Ziyan Wu, Terrence Chen, Fan Yang, Dinggang Shen,	(参考訳) 任意のマルチビュー画像からのヒューマンメッシュリカバリには、任意のカメラポーズと、任意の数のカメラビューの2つの特徴がある。可変性のため、このタスクに取り組むために統一されたフレームワークを設計することは困難である。この課題は、フレキシビリティを維持しつつ、任意のカメラのポーズを同時に推定し、任意のマルチビューイメージから人間のメッシュを復元できるというジレンマとして要約できる。このジレンマを解決するために、任意の多視点画像から統一人間メッシュ回復(U-HMR)を分離・征服するフレームワークを提案する。特にU-HMRは、分離された構造と、カメラとボディーデカップリング(CBD)、カメラポーズ推定(CPE)、任意のビュー融合(AVF)の2つの主要コンポーネントから構成される。カメラのポーズと人体メッシュが互いに独立しているため、CBDはそれらを2つのサブタスクに分割し、2つのサブネットワーク(ie, CPE, AVF)でそれぞれ処理する。 CPEでは、各カメラのポーズは他のカメラと無関係であるため、すべてのビューを並列に処理するために共有MLPを採用する。 AVFでは、マルチビュー情報を融合して融合操作をビュー数に依存しないものにするため、SMPLパラメータクエリトークンを用いたトランスフォーマーデコーダを導入し、メッシュリカバリのためのクロスビュー機能を抽出する。提案するフレームワークの有効性と各コンポーネントの効果を実証するため,Human3.6M,MPI-INF-3DHP,TotalCaptureの3つの公開データセットに対して広範な実験を行った。 Human mesh recovery from arbitrary multi-view images involves two characteristics: the arbitrary camera poses and arbitrary number of camera views. Because of the variability, designing a unified framework to tackle this task is challenging. The challenges can be summarized as the dilemma of being able to simultaneously estimate arbitrary camera poses and recover human mesh from arbitrary multi-view images while maintaining flexibility. To solve this dilemma, we propose a divide and conquer framework for Unified Human Mesh Recovery (U-HMR) from arbitrary multi-view images. In particular, U-HMR consists of a decoupled structure and two main components: camera and body decoupling (CBD), camera pose estimation (CPE), and arbitrary view fusion (AVF). As camera poses and human body mesh are independent of each other, CBD splits the estimation of them into two sub-tasks for two individual sub-networks (ie, CPE and AVF) to handle respectively, thus the two sub-tasks are disentangled. 
In CPE, since each camera pose is unrelated to the others, we adopt a shared MLP to process all views in a parallel way. In AVF, in order to fuse multi-view information and make the fusion operation independent of the number of views, we introduce a transformer decoder with a SMPL parameters query token to extract cross-view features for mesh recovery. To demonstrate the efficacy and flexibility of the proposed framework and effect of each component, we conduct extensive experiments on three public datasets: Human3.6M, MPI-INF-3DHP, and TotalCapture.	翻訳日:2024-06-19 05:17:19 公開日:2024-06-17
# NovelQA:20万トークンを超える文書に対する質問応答のベンチマーク NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens ( http://arxiv.org/abs/2403.12766v2 ) ライセンス: Link先を確認	Cunxiang Wang, Ruoxi Ning, Boqi Pan, Tonghui Wu, Qipeng Guo, Cheng Deng, Guangsheng Bao, Xiangkun Hu, Zheng Zhang, Qian Wang, Yue Zhang,	(参考訳) 大規模言語モデル(LLM)の急速な進歩は、特に長文情報の理解と処理において、自然言語処理における新たなフロンティアを導入している。しかしながら、これらのモデルの長文コンテキスト能力の評価は、現在のベンチマークの限界のため、依然として課題である。このギャップに対処するために,長いテキストに対するLLMの能力をテストするために設計されたベンチマークであるNovelQAを紹介する。NovelQAは英語の小説から作られており、複雑さ、長さ、物語の一貫性を独特に兼ね備えているため、LLMの深いテキスト理解を評価するのに理想的なツールである。本稿では,手動アノテーションと多様な質問タイプを中心に,NovelQAの設計と構築について述べる。 NovelQA上での長文コンテキストLLMの評価では、特にマルチホップ推論、詳細指向の質問、および平均20万トークン以上の非常に長い入力で直面する課題について、モデルの性能に関する重要な洞察が明らかにされている。その結果,LLMの長文理解を改善するためのさらなる進歩の必要性が浮き彫りになった。 The rapid advancement of Large Language Models (LLMs) has introduced a new frontier in natural language processing, particularly in understanding and processing long-context information. However, the evaluation of these models' long-context abilities remains a challenge due to the limitations of current benchmarks. To address this gap, we introduce NovelQA, a benchmark specifically designed to test the capabilities of LLMs with extended texts. Constructed from English novels, NovelQA offers a unique blend of complexity, length, and narrative coherence, making it an ideal tool for assessing deep textual understanding in LLMs. This paper presents the design and construction of NovelQA, highlighting its manual annotation and diverse question types. Our evaluation of long-context LLMs on NovelQA reveals significant insights into the models' performance, particularly emphasizing the challenges they face with multi-hop reasoning, detail-oriented questions, and extremely long input with an average length of more than 200,000 tokens. The results underscore the necessity for further advancements in LLMs to improve their long-context comprehension.	翻訳日:2024-06-19 05:17:19 公開日:2024-06-17
# 拡散復元モデルの分散に基づく超音波イメージング Ultrasound Imaging based on the Variance of a Diffusion Restoration Model ( http://arxiv.org/abs/2403.15316v2 ) ライセンス: Link先を確認	Yuxin Zhang, Clément Huneau, Jérôme Idier, Diana Mateus,	(参考訳) 今日の医療における超音波イメージングの普及にもかかわらず、超音波の信号対雑音比は依然として複数のノイズ源やアーティファクトの影響を受けている。さらに、超音波画像の品質向上には、コントラスト、解像度、スペックル保存といった同時に関わる要因のバランスが伴う。近年,超音波画像再構成の問題に対処するモデルベースと学習ベースの両方のアプローチが進展している。両者の長所を活かし, 超音波線形直接モデルと、生成的デノイジング拡散モデルから得られる学習ベースの事前分布を組み合わせたハイブリッド再構成手法を提案する。より具体的には、事前訓練されたDDRM(Denoising Diffusion Restoration Model)の教師なし微調整に頼る。本稿では,超音波固有の乗法ノイズの性質を考慮し,超音波画像の拡散再構成の確率性を特徴付ける経験的モデルを提案し,その分散がエコー輝度マップの推定量として有用であることを示す。合成, in-vitro, in-vivoデータに関する実験を行い, 単一平面波取得からの高画質画像再構成および最先端手法との比較において, 分散イメージング手法の有効性を実証した。コードは、https://github.com/Yuxin-Zhang-Jasmine/DRUSvarで入手できる。 Despite today's prevalence of ultrasound imaging in medicine, ultrasound signal-to-noise ratio is still affected by several sources of noise and artefacts. Moreover, enhancing ultrasound image quality involves balancing concurrent factors like contrast, resolution, and speckle preservation. Recently, there has been progress in both model-based and learning-based approaches addressing the problem of ultrasound image reconstruction. Bringing the best from both worlds, we propose a hybrid reconstruction method combining an ultrasound linear direct model with a learning-based prior coming from a generative Denoising Diffusion model. More specifically, we rely on the unsupervised fine-tuning of a pre-trained Denoising Diffusion Restoration Model (DDRM). Given the nature of multiplicative noise inherent to ultrasound, this paper proposes an empirical model to characterize the stochasticity of diffusion reconstruction of ultrasound images, and shows the interest of its variance as an echogenicity map estimator. 
We conduct experiments on synthetic, in-vitro, and in-vivo data, demonstrating the efficacy of our variance imaging approach in achieving high-quality image reconstructions from single plane-wave acquisitions and in comparison to state-of-the-art methods. The code is available at: https://github.com/Yuxin-Zhang-Jasmine/DRUSvar	翻訳日:2024-06-19 05:17:19 公開日:2024-06-17
# 有限量子ビット系におけるワームホールテレポーテーションの忠実度 Fidelity of Wormhole Teleportation in Finite-qubit Systems ( http://arxiv.org/abs/2403.16793v3 ) ライセンス: Link先を確認	Zeyu Liu, Pengfei Zhang,	(参考訳) 量子科学と技術の急速な発展は、量子シミュレーションによって量子多体系を理解できる時代へと我々を導いている。ホログラフィック双対性は、重力と時空が強く相互作用する系から創発しうることを述べるものであり、実験的に実現不可能な高エネルギーに踏み込むことなく、重力物理学を実験的に研究するための自然な道を提供する。顕著な例として、ワームホール・テレポーテーションプロトコルによる通過可能なワームホールのシミュレーションがあり、理論的にも実験的にも注目されている。本研究では、全対全の相互作用を持つ$N$量子ビット系におけるワームホールテレポーテーションの忠実度を、相互情報量とエンタングルメント負性によって定量化して計算するための理論的枠組みを開発する。主な手法はスクランブロン有効理論であり、一般的なカオス系における普遍的な時間順序外相関を捉えている。ほぼ最大カオスの強相互作用系を用いて半古典的な通過可能ワームホールのプローブ極限をシミュレートするためには、2つの系の間の強い結合が不可欠であることを明らかにする。しかし、テレポーテーション信号はシステムサイズ$N$を小さくすると急速に減少するため、Sachdev-Ye-Kitaevモデルをシミュレートして創発的幾何学の鋭いシグネチャを観測するには多数の量子ビットが必要となる。このシグネチャには、信号の因果的な時間順序と、符号の異なる結合に対するテレポーテーション信号の非対称性の両方が含まれる。これに対し、弱く相互作用する系では、$N$を減少させるとテレポーテーション信号が増加する。また、フェルミオン弦演算子における一般化符号化スキームの忠実度も解析する。 The rapid development of quantum science and technology is leading us into an era where quantum many-body systems can be comprehended through quantum simulations. Holographic duality, which states gravity and spacetime can emerge from strongly interacting systems, then offers a natural avenue for the experimental study of gravity physics without delving into experimentally infeasible high energies. A prominent example is the simulation of traversable wormholes through the wormhole teleportation protocol, attracting both theoretical and experimental attention. In this work, we develop the theoretical framework for computing the fidelity of wormhole teleportation in $N$-qubit systems with all-to-all interactions, quantified by mutual information and entanglement negativity. The main technique is the scramblon effective theory, which captures universal out-of-time-order correlations in generic chaotic systems. We clarify that strong couplings between the two systems are essential for simulating the probe limit of semi-classical traversable wormholes using strongly interacting systems with near-maximal chaos. 
However, the teleportation signal diminishes rapidly when reducing the system size $N$, requiring a large number of qubits to observe a sharp signature of emergent geometry by simulating the Sachdev-Ye-Kitaev model. This includes both the causal time-order of signals and the asymmetry of the teleportation signal for coupling with different signs. As a comparison, the teleportation signal increases when reducing $N$ in weakly interacting systems. We also analyze the fidelity of the generalized encoding scheme in fermionic string operators.	翻訳日:2024-06-19 05:17:19 公開日:2024-06-17
# Beyond Embeddings: Visual ReasoningにおけるVisual Tableの約束 Beyond Embeddings: The Promise of Visual Table in Visual Reasoning ( http://arxiv.org/abs/2403.18252v2 ) ライセンス: Link先を確認	Yiwu Zhong, Zi-Yuan Hu, Michael R. Lyu, Liwei Wang,	(参考訳) 視覚表現学習はコンピュータビジョンの基盤であり、視覚埋め込み、構造記号、テキストベースの表現などの典型的な形式を含んでいる。 CLIP型視覚埋め込みの成功にもかかわらず、視覚的推論にとって重要な世界知識へのアクセスが欠如していることが多い。本研究では,視覚的推論に適した新しい視覚表現形式である視覚表を提案する。ビジュアルテーブルは、視覚シーンの階層的な記述として構築され、シーン記述とカテゴリ、属性、知識を含む複数のオブジェクト中心の記述が特徴である。構造的およびテキスト的フォーマットのおかげで、ビジュアルテーブルは、解釈可能性や制御可能な編集など、単に視覚的な埋め込みよりも独特なアドバンテージを提供する。さらに、視覚的推論に不可欠な、インスタンスレベルの世界知識と詳細な属性を提供する。ビジュアルテーブルを作成するために、収集された小さなアノテーションを用いてデータセット上で訓練されたジェネレータを開発する。 11の視覚的推論ベンチマークの結果は、生成した視覚表が、以前の構造的およびテキストベースの表現よりも大幅に優れていたことを示している。さらに、さまざまなベンチマークで最先端のマルチモーダルな大規模言語モデルを強化し、視覚的推論タスクを前進させる可能性を示している。私たちのコードはhttps://github.com/LaVi-Lab/Visual-Tableで利用可能です。 Visual representation learning has been a cornerstone in computer vision, involving typical forms such as visual embeddings, structural symbols, and text-based representations. Despite the success of CLIP-type visual embeddings, they often lack access to world knowledge critical for visual reasoning. In this work, we propose Visual Table, a novel form of visual representation tailored for visual reasoning. Visual tables are constructed as hierarchical descriptions of visual scenes, featuring a scene description and multiple object-centric descriptions covering categories, attributes, and knowledge. Thanks to the structural and textual formats, visual tables offer unique advantages over mere visual embeddings, such as interpretability and controllable editing. Furthermore, they deliver instance-level world knowledge and detailed attributes that are essential for visual reasoning. To create visual tables, we develop a generator trained on the dataset with collected, small-scale annotations. 
Extensive results on 11 visual reasoning benchmarks demonstrate that the generated visual tables significantly outperform previous structural and text-based representations. Moreover, they consistently enhance state-of-the-art multimodal large language models across diverse benchmarks, showcasing their potential for advancing visual reasoning tasks. Our code is available at https://github.com/LaVi-Lab/Visual-Table.	翻訳日:2024-06-19 05:17:19 公開日:2024-06-17
# 雑音リンク上の分散最大合意 Distributed Maximum Consensus over Noisy Links ( http://arxiv.org/abs/2403.18509v2 ) ライセンス: Link先を確認	Ehsan Lari, Reza Arablouei, Naveen K. D. Venkategowda, Stefan Werner,	(参考訳) 本稿では,雑音の多い通信リンクが存在する場合のマルチエージェントネットワーク内の最大値を推定する分散アルゴリズムRD-MCを提案する。提案手法では,最大収束問題を分散最適化問題として再定義し,乗算器の交互方向法を用いて解を求める。複数のノイズ破損推定セットに依存する既存のアルゴリズムとは異なり、RD-MCは単一のセットを採用し、堅牢性と効率性を向上させる。リンクノイズの影響を緩和し、ロバスト性を向上させるため、移動平均化を局所推定に適用する。大規模なシミュレーションにより,RD-MCは既存の最大合意アルゴリズムに比べて通信リンクノイズに対してかなり頑健であることを示す。 We introduce a distributed algorithm, termed noise-robust distributed maximum consensus (RD-MC), for estimating the maximum value within a multi-agent network in the presence of noisy communication links. Our approach entails redefining the maximum consensus problem as a distributed optimization problem, allowing a solution using the alternating direction method of multipliers. Unlike existing algorithms that rely on multiple sets of noise-corrupted estimates, RD-MC employs a single set, enhancing both robustness and efficiency. To further mitigate the effects of link noise and improve robustness, we apply moving averaging to the local estimates. Through extensive simulations, we demonstrate that RD-MC is significantly more robust to communication link noise compared to existing maximum-consensus algorithms.	翻訳日:2024-06-19 05:17:19 公開日:2024-06-17
# SGCNeRF:Sparse Geometric Consistency GuidanceによるFew-Shot Neural Rendering SGCNeRF: Few-Shot Neural Rendering via Sparse Geometric Consistency Guidance ( http://arxiv.org/abs/2404.00992v2 ) ライセンス: Link先を確認	Yuru Xiao, Xianming Liu, Deming Zhai, Kui Jiang, Junjun Jiang, Xiangyang Ji,	(参考訳) ニューラル・ラジアンス・フィールド(NeRF)技術は、新しい視点の創出に大きく貢献している。しかし、その効果は、わずかに利用可能なビューを扱うときに妨げられ、しばしばオーバーフィッティングによるパフォーマンス低下につながる。 FreeNeRFは、幾何学とテクスチャの両方を漸進的に改善する暗黙の幾何正規化を統合することで、この制限を克服しようとする。それでも、初期低位置符号化帯域は高周波素子を除外する。過度な適合と高周波の詳細の保存を兼ね備えた包括的アプローチの探求は現在も続いている。本研究では,特徴マッチングに基づくスパース幾何正規化モジュールを提案する。このモジュールは、高周波キーポイントをピンポイントすることで、詳細の完全性を保護する。我々は、NeRF反復による幾何やテクスチャの漸進的な改善を通じて、新規なビュー合成を向上するために、SGCNeRFと命名された効果的な数ショットのニューラルレンダリングアーキテクチャを公表する。 LLFFデータセットとDTUデータセットのPSNRの0.7dBと0.6dBの改善により、SGCNeRFは優れた幾何一貫性を持つ結果を得るだけでなく、FreeNeRFを上回る結果が得られることを示した。 Neural Radiance Field (NeRF) technology has made significant strides in creating novel viewpoints. However, its effectiveness is hampered when working with sparsely available views, often leading to performance dips due to overfitting. FreeNeRF attempts to overcome this limitation by integrating implicit geometry regularization, which incrementally improves both geometry and textures. Nonetheless, an initial low positional encoding bandwidth results in the exclusion of high-frequency elements. The quest for a holistic approach that simultaneously addresses overfitting and the preservation of high-frequency details remains ongoing. This study introduces a novel feature matching based sparse geometry regularization module. This module excels in pinpointing high-frequency keypoints, thereby safeguarding the integrity of fine details. Through progressive refinement of geometry and textures across NeRF iterations, we unveil an effective few-shot neural rendering architecture, designated as SGCNeRF, for enhanced novel view synthesis. 
Our experiments demonstrate that SGCNeRF not only achieves superior geometry-consistent outcomes but also surpasses FreeNeRF, with improvements of 0.7 dB and 0.6 dB in PSNR on the LLFF and DTU datasets, respectively.	翻訳日:2024-06-19 05:07:34 公開日:2024-06-17
# 大規模言語モデルが生成した関連性判断を用いたクエリ性能予測 Query Performance Prediction using Relevance Judgments Generated by Large Language Models ( http://arxiv.org/abs/2404.01012v2 ) ライセンス: Link先を確認	Chuan Meng, Negar Arabzadeh, Arian Askari, Mohammad Aliannejadi, Maarten de Rijke,	(参考訳) クエリ性能予測(QPP)は,人手による関連性判断を用いずに,クエリに対する検索システムの検索品質を推定することを目的としている。従来のQPP手法は通常、単一のスカラー値を返し、予測値が特定の情報検索(IR)評価尺度を近似することを求めないため、次の欠点がある。 (i) 単一のスカラーでは異なるIR評価尺度を正確に表すには不十分であり、特に尺度同士の相関が高くない場合に問題となる。 (ii) スカラーのみではQPPの結果を説明できないため、単一のスカラーはQPP手法の解釈可能性を制限する。これらの問題に対処するために,自動生成された関連性判断を用いるQPPフレームワーク(QPP-GenRE)を提案する。これはQPPを、ランクリスト内の各項目と所与のクエリとの関連性を予測する独立したサブタスクに分解する。これにより、生成した関連性判断を擬似ラベルとして利用して、任意のIR評価尺度を予測することができる。また、予測されたIR評価尺度を解釈し、生成された関連性判断における誤りを特定し、追跡し、修正して、QPP品質を向上させることができる。科学的再現性を確保するため,オープンソースの大規模言語モデル(LLM)を用いて項目の関連性を予測する。主な課題は2つある。 (i) 再現率を考慮した尺度を予測するためにコーパス全体を判定する過大な計算コスト、 (ii) オープンソースLLMをゼロショット/少数ショット方式でプロンプトする際の限られた性能。これらの課題を解決するため、再現率を考慮したIR尺度を予測するための近似戦略を考案し、人手でラベル付けされた関連性判断を用いたオープンソースLLMの微調整を提案する。 TREC 2019-2022のディープラーニングトラックでの実験によると、QPP-GenREは、語彙ランカーとニューラルランカーの両方で最先端のQPP品質を達成する。 Query performance prediction (QPP) aims to estimate the retrieval quality of a search system for a query without human relevance judgments. Previous QPP methods typically return a single scalar value and do not require the predicted values to approximate a specific information retrieval (IR) evaluation measure, leading to certain drawbacks: (i) a single scalar is insufficient to accurately represent different IR evaluation measures, especially when metrics do not highly correlate, and (ii) a single scalar limits the interpretability of QPP methods because solely using a scalar is insufficient to explain QPP results. To address these issues, we propose a QPP framework using automatically generated relevance judgments (QPP-GenRE), which decomposes QPP into independent subtasks of predicting the relevance of each item in a ranked list to a given query. This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels. 
This also allows us to interpret predicted IR evaluation measures, and identify, track and rectify errors in generated relevance judgments to improve QPP quality. We predict an item's relevance by using open-source large language models (LLMs) to ensure scientific reproducibility. We face two main challenges: (i) excessive computational costs of judging an entire corpus for predicting a metric considering recall, and (ii) limited performance in prompting open-source LLMs in a zero-/few-shot manner. To solve the challenges, we devise an approximation strategy to predict an IR measure considering recall and propose to fine-tune open-source LLMs using human-labeled relevance judgments. Experiments on the TREC 2019-2022 deep learning tracks show that QPP-GenRE achieves state-of-the-art QPP quality for both lexical and neural rankers.	翻訳日:2024-06-19 05:07:34 公開日:2024-06-17
# Rydberg Superatoms: 量子情報処理と量子光学のための人工量子システム Rydberg superatoms: An artificial quantum system for quantum information processing and quantum optics ( http://arxiv.org/abs/2404.05330v2 ) ライセンス: Link先を確認	Xiao-Qiang Shao, Shi-Lei Su, Lin Li, Rejish Nath, Jin-Hui Wu, Weibin Li,	(参考訳) ライドバーグ励起による高密度原子アンサンブルは、その強い長距離双極子-双極子相互作用を媒介する集団効果を示す。これらの集団効果は、しばしばリドバーグ超原子を用いてモデル化され、量子情報処理や量子光学における潜在的な応用により、様々な分野において大きな注目を集めている。本稿では,Rydberg相互作用の理論的基礎を掘り下げ,その操作と検出のための実験的手法を探求する。また、Rydberg集合効果を量子計算や光量子技術に活用する最新の進歩についても論じる。理論的研究と実験的実証から洞察を合成することにより、この急速に発展する分野と、量子技術の将来に対するその潜在的影響の包括的概要を提供する。 Dense atom ensembles with Rydberg excitations display intriguing collective effects mediated by their strong, long-range dipole-dipole interactions. These collective effects, often modeled using Rydberg superatoms, have gained significant attention across various fields due to their potential applications in quantum information processing and quantum optics. In this review article, we delve into the theoretical foundations of Rydberg interactions and explore experimental techniques for their manipulation and detection. We also discuss the latest advancements in harnessing Rydberg collective effects for quantum computation and optical quantum technologies. By synthesizing insights from theoretical studies and experimental demonstrations, we aim to provide a comprehensive overview of this rapidly evolving field and its potential impact on the future of quantum technologies.	翻訳日:2024-06-19 05:07:34 公開日:2024-06-17
# YOLC: 空撮画像の細い物体検出のためのクラスターのみを見る YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images ( http://arxiv.org/abs/2404.06180v2 ) ライセンス: Link先を確認	Chenguang Liu, Guangshuai Gao, Ziyue Huang, Zhenghui Hu, Qingjie Liu, Yunhong Wang,	(参考訳) 空中画像から物体を検出することは、以下の要因により大きな課題となる。 1) 空中画像は一般に非常に大きなサイズを持ち、一般に数百万または数億のピクセルを持つが、計算資源は限られている。 2) 対象物の大きさが小さいと, 有効検出に十分な情報が得られない。 3)不均一なオブジェクト分布は計算資源の浪費につながる。これらの問題に対処するために、我々は、アンカーフリーなオブジェクト検出器であるCenterNet上に構築された効率的で効果的なフレームワークであるYOLC(You Only Look Clusters)を提案する。大規模画像や非一様オブジェクトの分布がもたらす課題を克服するため,正確な検出のためにクラスタ領域のズームインを適応的に検索するローカルスケールモジュール(LSM)を導入する。さらに、ガウスワッサーシュタイン距離(GWD)を用いて回帰損失を修正し、高品質なバウンディングボックスを得る。検出ヘッドに変形可能な畳み込み・精細化法を用い、小型物体の検出を強化する。 Visdrone2019 と UAVDT を含む2つの航空画像データセットに対する広範な実験を行い、提案手法の有効性と優位性を実証した。コードはhttps://github.com/dawn-ech/YOLCで入手できる。 Detecting objects from aerial images poses significant challenges due to the following factors: 1) Aerial images typically have very large sizes, generally with millions or even hundreds of millions of pixels, while computational resources are limited. 2) Small object size leads to insufficient information for effective detection. 3) Non-uniform object distribution leads to computational resource wastage. To address these issues, we propose YOLC (You Only Look Clusters), an efficient and effective framework that builds on an anchor-free object detector, CenterNet. To overcome the challenges posed by large-scale images and non-uniform object distribution, we introduce a Local Scale Module (LSM) that adaptively searches cluster regions for zooming in for accurate detection. Additionally, we modify the regression loss using Gaussian Wasserstein distance (GWD) to obtain high-quality bounding boxes. Deformable convolution and refinement methods are employed in the detection head to enhance the detection of small objects. 
We perform extensive experiments on two aerial image datasets, including Visdrone2019 and UAVDT, to demonstrate the effectiveness and superiority of our proposed approach. Code is available at https://github.com/dawn-ech/YOLC.	翻訳日:2024-06-19 05:07:34 公開日:2024-06-17
# すべての $4\times 4$ 対合的(Involutory)MDS 行列の体系的構成法 A Systematic Construction Approach for All $4\times 4$ Involutory MDS Matrices ( http://arxiv.org/abs/2404.08250v2 ) ライセンス: Link先を確認	Yogesh Kumar, P. R. Mishra, Susanta Samanta, Atul Gaur,	(参考訳) 最大距離分離(MDS)行列は、符号理論だけでなく、ブロック暗号やハッシュ関数の設計においても重要な役割を果たす。特に興味深いのは、ハードウェア実装において暗号化と復号の両方に単一の回路を使用できるようにする対合的(involutory)MDS行列である。本稿では、偶数次の対合的MDS行列のいくつかの特徴付けを示す。さらに、偶数次のすべての対合的MDS行列を得るための新しい行列形式を導入し、文献で利用可能な他の行列形式と比較する。次に、有限体 $\mathbb{F}_{2^m}$ 上のすべての $4 \times 4$ 対合的MDS行列を体系的に構成する手法を提案する。この手法は、対合的MDS行列のクラス代表行列に着目することで探索空間を大幅に削減し、すべての $4 \times 4$ 対合行列を考慮する場合に比べてかなり小さい集合の中でこれらすべての行列を生成する。具体的には、これらの代表行列を濃度 $(2^m-1)^5$ の集合の中で探索する。この方法を通じて、$m=3,4,\ldots,8$ に対する $\mathbb{F}_{2^m}$ 上の $4 \times 4$ 対合的MDS行列の総数を明示的に数え上げる。 Maximum distance separable (MDS) matrices play a crucial role not only in coding theory but also in the design of block ciphers and hash functions. Of particular interest are involutory MDS matrices, which facilitate the use of a single circuit for both encryption and decryption in hardware implementations. In this article, we present several characterizations of involutory MDS matrices of even order. Additionally, we introduce a new matrix form for obtaining all involutory MDS matrices of even order and compare it with other matrix forms available in the literature. We then propose a technique to systematically construct all $4 \times 4$ involutory MDS matrices over a finite field $\mathbb{F}_{2^m}$. This method significantly reduces the search space by focusing on involutory MDS class representative matrices, leading to the generation of all such matrices within a substantially smaller set compared to considering all $4 \times 4$ involutory matrices. Specifically, our approach involves searching for these representative matrices within a set of cardinality $(2^m-1)^5$. Through this method, we provide an explicit enumeration of the total number of $4 \times 4$ involutory MDS matrices over $\mathbb{F}_{2^m}$ for $m=3,4,\ldots,8$.	翻訳日:2024-06-19 04:57:50 公開日:2024-06-17
# 大規模言語モデルは間違いから進化し続けることができる Large Language Model Can Continue Evolving From Mistakes ( http://arxiv.org/abs/2404.08707v4 ) ライセンス: Link先を確認	Haokun Zhao, Haixia Han, Jie Shi, Chengyu Du, Jiaqing Liang, Yanghua Xiao,	(参考訳) 世界の知識が進化し、新しいタスクパラダイムが出現するにつれて、継続学習(CL)は、大規模言語モデル(LLM)を最新に保ち、その欠点に対処する上で不可欠である。実用上、LLMは、新しいタスクパラダイムに適応し、タスク解決に必要な知識を取得するために、継続的命令チューニング(CIT)と継続的事前学習(CPT)の両方を必要とすることが多い。しかし、適切な量を維持しながらモデルの知識不足に対処するCPTデータを収集することは依然として困難であり、このデータの利用効率を高めることにも大きな困難がある。そこで本研究では、「間違いをまとめる」学習スキルに着想を得て、Continue Evolving from Mistakes (CEM) 手法を提案する。これは、CPTデータ収集のためのデータ効率の高いアプローチを提供し、誤りに関連する知識の反復的な評価と補足によってLLMの性能を継続的に向上させることを目的としている。これらのCPTデータを効率的に利用し、忘却を軽減するために、CITデータとCPTデータを並列に統合する新しいCLトレーニングセット構築パラダイムを設計する。広範な実験によりCEM手法の有効性を実証し、最良の場合で最大17%の精度向上を達成した。さらに、追加実験により、CEMを破滅的忘却の緩和手法と組み合わせられる可能性を確認し、反復的かつ継続的なモデル進化を可能にする。 As world knowledge evolves and new task paradigms emerge, Continual Learning (CL) is crucial for keeping Large Language Models (LLMs) up-to-date and addressing their shortcomings. In practical applications, LLMs often require both continual instruction tuning (CIT) and continual pre-training (CPT) to adapt to new task paradigms and acquire necessary knowledge for task-solving. However, it remains challenging to collect CPT data that addresses the knowledge deficiencies in models while maintaining adequate volume, and improving the efficiency of utilizing this data also presents significant difficulties. Inspired by the 'summarizing mistakes' learning skill, we propose the Continue Evolving from Mistakes (CEM) method, aiming to provide a data-efficient approach for collecting CPT data and continually improving LLMs' performance through iterative evaluation and supplementation with mistake-relevant knowledge. To efficiently utilize these CPT data and mitigate forgetting, we design a novel CL training set construction paradigm that integrates parallel CIT and CPT data. Extensive experiments demonstrate the efficacy of the CEM method, achieving up to a 17% improvement in accuracy in the best case. 
Furthermore, additional experiments confirm the potential of combining CEM with catastrophic forgetting mitigation methods, enabling iterative and continual model evolution.	翻訳日:2024-06-19 04:57:50 公開日:2024-06-17
# マルコフ連鎖モンテカルロとしての3Dガウシアン・スプラッティング 3D Gaussian Splatting as Markov Chain Monte Carlo ( http://arxiv.org/abs/2404.09591v2 ) ライセンス: Link先を確認	Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Weiwei Sun, Jeff Tseng, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi,	(参考訳) 3D Gaussian Splattingは最近、ニューラルレンダリングで人気になっているが、現在の手法は、ガウシアンを配置するために注意深く設計されたクローニングと分割の戦略に依存しており、品質の低いレンダリングや、良好な初期化への依存を招くことがある。本研究では,3次元ガウシアンの集合を,シーンの物理的表現を記述する背後の確率分布から得られたランダムサンプル,すなわちマルコフ連鎖モンテカルロ(MCMC)サンプルとして捉え直す。この観点から,単にノイズを導入するだけで,3次元ガウシアンの更新をSGLD(Stochastic Gradient Langevin Dynamics)の更新に変換できることを示す。次に,3D Gaussian Splattingにおける高密度化とプルーニングの戦略をMCMCサンプルの決定論的状態遷移として書き直し,これらのヒューリスティックをフレームワークから取り除く。そのために、ガウシアンの「クローニング」を、サンプル確率をおおむね保存する再配置スキームに修正する。ガウシアンの効率的な利用を促進するために,未使用ガウシアンの除去を促す正則化器を導入する。様々な標準的な評価シーンにおいて,本手法はレンダリング品質の向上,ガウシアン数の容易な制御,初期化に対する堅牢性を実現する。 While 3D Gaussian Splatting has recently become popular for neural rendering, current methods rely on carefully engineered cloning and splitting strategies for placing Gaussians, which can lead to poor-quality renderings, and reliance on a good initialization. In this work, we rethink the set of 3D Gaussians as a random sample drawn from an underlying probability distribution describing the physical representation of the scene; in other words, Markov Chain Monte Carlo (MCMC) samples. Under this view, we show that the 3D Gaussian updates can be converted as Stochastic Gradient Langevin Dynamics (SGLD) updates by simply introducing noise. We then rewrite the densification and pruning strategies in 3D Gaussian Splatting as simply a deterministic state transition of MCMC samples, removing these heuristics from the framework. To do so, we revise the 'cloning' of Gaussians into a relocalization scheme that approximately preserves sample probability. To encourage efficient use of Gaussians, we introduce a regularizer that promotes the removal of unused Gaussians. 
On various standard evaluation scenes, we show that our method provides improved rendering quality, easy control over the number of Gaussians, and robustness to initialization.	翻訳日:2024-06-19 04:57:50 公開日:2024-06-17
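The SGLD reading in the entry above (a gradient update plus injected noise) can be illustrated with a generic update step. This is only a minimal sketch of textbook SGLD on a toy quadratic energy, not the paper's renderer or loss; `sgld_step`, `toy_energy_grad`, the step size, and `noise_scale` are all hypothetical names and values chosen for illustration.

```python
import math
import random


def sgld_step(params, grads, step_size, rng, noise_scale=1.0):
    """Return params after one generic SGLD update.

    Each coordinate takes a gradient-descent step and then receives
    Gaussian noise with standard deviation noise_scale * sqrt(2 * step_size).
    With noise_scale = 0 this reduces to plain gradient descent.
    """
    noise_std = noise_scale * math.sqrt(2.0 * step_size)
    return [
        p - step_size * g + noise_std * rng.gauss(0.0, 1.0)
        for p, g in zip(params, grads)
    ]


def toy_energy_grad(params):
    """Gradient of the toy energy 0.5 * sum(p**2), standing in for a real loss."""
    return list(params)


if __name__ == "__main__":
    rng = random.Random(0)
    params = [1.0, -2.0]
    for _ in range(1000):
        params = sgld_step(params, toy_energy_grad(params), 0.01, rng)
    # The iterates do not converge to a point; they keep fluctuating
    # around the minimum of the toy energy at the origin.
    print(params)
```

Setting `noise_scale=0.0` recovers the deterministic update, which is the sense in which an ordinary optimizer step becomes a sampler "by simply introducing noise".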
# MOWA:マルチインワンイメージワープモデル MOWA: Multiple-in-One Image Warping Model ( http://arxiv.org/abs/2404.10716v2 ) ライセンス: Link先を確認	Kang Liao, Zongsheng Yue, Zhonghua Wu, Chen Change Loy,	(参考訳) 最近の画像ワープアプローチは既存のベンチマークで顕著に成功したが、特定のタスクごとに個別のモデルをトレーニングする必要があるため、異なるカメラモデルやカスタマイズされた操作にうまく対応できない。本研究で提案するマルチ・イン・ワン・イメージWArpingモデル(MOWA)は,マルチ・イン・ワン・イメージWArpingモデル(Multiple-in-One Image WArping model)である。具体的には、領域レベルと画素レベルの両方で動作推定を遠ざけることで、マルチタスク学習の難しさを軽減する。さらに動的なタスク認識画像のワープを可能にするために,タスクタイプを予測する軽量なポイントベース分類器を導入し,より正確な推定のために特徴マップを変調するプロンプトとして機能する。私たちの知る限り、これは1つのモデルで複数の実用的なワープタスクを解決する最初の作業です。マルチインワンイメージワープのために6つのタスクでトレーニングされたMOWAは、ほとんどのタスクで最先端のタスク固有モデルより優れています。さらに、MOWAは、クロスドメインとゼロショットの評価によって証明されているように、目に見えないシーンに一般化する有望な可能性をも示している。コードとより視覚的な結果は、プロジェクトのページ(https://kangliao929.github.io/projects/mowa/)で見ることができる。 While recent image warping approaches achieved remarkable success on existing benchmarks, they still require training separate models for each specific task and cannot generalize well to different camera models or customized manipulations. To address diverse types of warping in practice, we propose a Multiple-in-One image WArping model (named MOWA) in this work. Specifically, we mitigate the difficulty of multi-task learning by disentangling the motion estimation at both the region level and pixel level. To further enable dynamic task-aware image warping, we introduce a lightweight point-based classifier that predicts the task type, serving as prompts to modulate the feature maps for more accurate estimation. To our knowledge, this is the first work that solves multiple practical warping tasks in one single model. Extensive experiments demonstrate that our MOWA, which is trained on six tasks for multiple-in-one image warping, outperforms state-of-the-art task-specific models across most tasks. Moreover, MOWA also exhibits promising potential to generalize into unseen scenes, as evidenced by cross-domain and zero-shot evaluations. 
The code and more visual results can be found on the project page: https://kangliao929.github.io/projects/mowa/.	翻訳日:2024-06-19 04:57:50 公開日:2024-06-17
# 位相 $δ_θ=nθ$ をもつダイオン Dyons with phase $δ_θ=nθ$ ( http://arxiv.org/abs/2404.11622v2 ) ライセンス: Link先を確認	Ricardo Heras,	(参考訳) 最近の論文 (Heras in Eur. Phys. J. Plus 138: 329, 2023) では、ダイオンが電束と磁束を内包する無限長のソレノイドを周回するとき、その波動関数が電磁双対変換の下で不変な量子位相を蓄積することを示した。本稿では、この位相がウィッテン効果と相まって、真空角$\theta$に比例し、したがってCP対称性の破れと結びついたトポロジカル位相を生じることを示す。この位相は真空状態において $\delta_{\theta}=n\theta$ と量子化され、この量子化に関連する最も一般的な真空状態は、$\theta$-真空のアーベル形式と同一視されることを示す。真空中で角$\theta$が現れうる2つの仮説的な干渉効果について論じる。 In a recent paper (Heras in Eur. Phys. J. Plus 138: 329, 2023), we have demonstrated that when a dyon encircles an infinitely long solenoid enclosing electric and magnetic fluxes, its wave function accumulates a quantum phase invariant under electromagnetic duality transformations. In this paper, we show that this phase, in conjunction with the Witten effect, gives rise to a topological phase proportional to the vacuum angle $\theta$ and thereby connected with CP violation. We show that this phase becomes quantised in a vacuum state $\delta_{\theta}=n\theta$ and that the most general vacuum state associated with this quantisation identifies with an Abelian form of the $\theta$-vacua. We discuss two hypothetical interference effects in the vacuum where the angle $\theta$ could manifest.	翻訳日:2024-06-19 04:57:50 公開日:2024-06-17
# あいまいさを明示的に扱えるように言語モデルを調整する Aligning Language Models to Explicitly Handle Ambiguity ( http://arxiv.org/abs/2404.11972v2 ) ライセンス: Link先を確認	Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim,	(参考訳) ユーザと言語モデルエージェント間のインタラクションにおいて、ユーザの発話は効率を優先するために、楕円(単語やフレーズの省略)や不正確(正確さの欠如)をしばしば示す。これは、異なる仮定や背景知識に基づいて、同じ入力の様々な解釈につながる可能性がある。したがって、信頼性を確保するために、エージェントがクエリの固有のあいまいさを適切に処理することが不可欠である。しかし、現在最先端の大規模言語モデル(LLM)でさえも、主に次のようなハードルにより、このようなシナリオで課題に直面している:(1) LLMは、曖昧な発話を扱うために明示的に訓練されていない; (2) LLMが認識する曖昧さの程度は、所有する知識によって異なるかもしれない。これらの問題に対処するために、我々は、あいまいさ(すなわち知覚曖昧さ)の自己評価を活用することで、LLMをあいまいなクエリを管理するために調整する新しいパイプラインであるAlignment with Perceived Ambiguity (APA)を提案する。質問応答データセットの実験結果から、APAは、明確な質問に答える能力を維持しながら、あいまいなクエリを明示的に検出し、管理する権限をLLMに与えていることが示された。さらに,APAは,特にアウト・オブ・ディストリビューションのシナリオにおいて,ゴールド・スタンダード・ラベルのトレーニング以上に優れていることが確認された。 In interactions between users and language model agents, user utterances frequently exhibit ellipsis (omission of words or phrases) or imprecision (lack of exactness) to prioritize efficiency. This can lead to varying interpretations of the same input based on different assumptions or background knowledge. It is thus crucial for agents to adeptly handle the inherent ambiguity in queries to ensure reliability. However, even state-of-the-art large language models (LLMs) still face challenges in such scenarios, primarily due to the following hurdles: (1) LLMs are not explicitly trained to deal with ambiguous utterances; (2) the degree of ambiguity perceived by the LLMs may vary depending on the possessed knowledge. To address these issues, we propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns LLMs to manage ambiguous queries by leveraging their own assessment of ambiguity (i.e., perceived ambiguity). Experimental results on question-answering datasets demonstrate that APA empowers LLMs to explicitly detect and manage ambiguous queries while retaining the ability to answer clear questions. 
Furthermore, our finding proves that APA excels beyond training with gold-standard labels, especially in out-of-distribution scenarios.	翻訳日:2024-06-19 04:57:50 公開日:2024-06-17
# X-Light: Transformer on Transformerをメタマルチエージェント強化学習器として用いた都市横断交通信号制御 X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner ( http://arxiv.org/abs/2404.12090v3 ) ライセンス: Link先を確認	Haoyuan Jiang, Ziyue Li, Hua Wei, Xuantang Xiong, Jingqing Ruan, Jiaming Lu, Hangyu Mao, Rui Zhao,	(参考訳) 交通信号制御の有効性は、複数の信号機間の協調を改善することにより、現在の強化学習に基づくアプローチによって著しく向上している。しかし、多様な都市にまたがって優れた転移性を持つマルチエージェント交通信号制御アルゴリズムをどのように得るかという問題が残っている。本稿では,都市間メタマルチエージェント交通信号制御のためのTransformer on Transformer(TonT)モデルであるX-Lightを提案する。マルコフ決定過程の完全な軌跡を入力とし,ローワートランスフォーマーは都市内における対象交差点とその近傍の状態,行動,報酬を集約し,アッパートランスフォーマーは異なる都市間の一般的な決定軌跡を学習する。この二重レベルのアプローチはモデルの堅牢な一般化と転移性を高める。特に、未知のシナリオへ直接転移する場合、すべてのベースライン手法を平均で+7.91%、場合によっては+16.3%上回り、最良の結果が得られる。 The effectiveness of traffic light control has been significantly improved by current reinforcement learning-based approaches via better cooperation among multiple traffic lights. However, a persisting issue remains: how to obtain a multi-agent traffic signal control algorithm with remarkable transferability across diverse cities? In this paper, we propose a Transformer on Transformer (TonT) model for cross-city meta multi-agent traffic signal control, named X-Light. We input the full Markov Decision Process trajectories; the Lower Transformer aggregates the states, actions, rewards among the target intersection and its neighbors within a city, and the Upper Transformer learns the general decision trajectories across different cities. This dual-level approach bolsters the model's robust generalization and transferability. Notably, when directly transferring to unseen scenarios, ours surpasses all baseline methods with +7.91% on average, and even +16.3% in some cases, yielding the best results.	翻訳日:2024-06-19 04:57:50 公開日:2024-06-17
# MergeNet: 異種モデル、タスク、モダリティ間の知識マイグレーション MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities ( http://arxiv.org/abs/2404.13322v2 ) ライセンス: Link先を確認	Kunxi Li, Tianyu Zhan, Kairui Fu, Shengyu Zhang, Kun Kuang, Jiwei Li, Zhou Zhao, Fei Wu,	(参考訳) 本研究では, 全く異なるモデルアーキテクチャ, タスク, モダリティ間の異質な知識伝達に着目した。既存の知識伝達方法(例えば、バックボーン共有、知識蒸留)は、しばしばモデル構造やタスク固有の機能/ラベル内の共有要素にヒンジし、複雑なモデルタイプやタスクへの転送を制限する。これらの課題を克服するために、異種モデルのパラメータ空間のギャップを埋めることを学び、これらのパラメータ空間内での直接的な相互作用、抽出、知識の応用を容易にするMergeNetを提案する。 MergeNetの中核となるメカニズムはパラメータアダプタにあり、ソースモデルの低ランクパラメータをクエリして、ターゲットモデルへのパラメータの識別とマッピングを順応的に学習する。 MergeNetは両方のモデルと共に学習され、我々のフレームワークは、ソースモデルのトレーニング軌道知識を含む、現在のステージに関連する知識を動的に転送し、適応することができます。不均一な知識伝達に関する大規模な実験は、代表的アプローチが干渉したり適用範囲を減らしたりすることの可能な、挑戦的な設定において顕著な改善を示す。 In this study, we focus on heterogeneous knowledge transfer across entirely different model architectures, tasks, and modalities. Existing knowledge transfer methods (e.g., backbone sharing, knowledge distillation) often hinge on shared elements within model structures or task-specific features/labels, limiting transfers to complex model types or tasks. To overcome these challenges, we present MergeNet, which learns to bridge the gap of parameter spaces of heterogeneous models, facilitating the direct interaction, extraction, and application of knowledge within these parameter spaces. The core mechanism of MergeNet lies in the parameter adapter, which operates by querying the source model's low-rank parameters and adeptly learning to identify and map parameters into the target model. MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage, including the training trajectory knowledge of the source model. Extensive experiments on heterogeneous knowledge transfer demonstrate significant improvements in challenging settings, where representative approaches may falter or prove less applicable.	翻訳日:2024-06-19 04:57:50 公開日:2024-06-17
# コヒーレンス測定に基づく電子ビームのウィグナー関数の再構成 Reconstruction of Wigner function of electron beams based on coherence measurements ( http://arxiv.org/abs/2404.13379v2 ) ライセンス: Link先を確認	Shuhei Hatanaka, Jun Yamasaki,	(参考訳) 電子ビームの密度行列とウィグナー関数を,エアリーパターンの強度プロファイル解析により再構成する方法を開発した。透過電子顕微鏡の物体面における密度行列を,コヒーレンス関数と電子波の振幅および位相分布を用いて計算した。その後,行列要素を用いてウィグナー関数を再構成した。位相空間の原点におけるウィグナー関数に基づいて軸上輝度を計算する式を導出し,ショットキー電界放出電子銃の軸上輝度を決定した。軸上輝度は,従来の平均輝度測定よりもエミッタ性能を正確かつ精密に反映する。 We developed a reconstruction method for the density matrix and Wigner function of electron beams through analysis of the Airy pattern intensity profile. The density matrix in a transmission electron microscope object plane was calculated using the coherence function and the electron wave amplitude and phase distributions. The Wigner function was then reconstructed using the matrix elements. Based on the Wigner function at the origin of the phase space, we derived a formula to calculate the axial brightness, and then determined the axial brightness of a Schottky field emission gun, which reflects the emitter performance more accurately and precisely than the conventional mean brightness measurements.	翻訳日:2024-06-19 04:48:05 公開日:2024-06-17
# SnapKV: LLMは、あなたが生成前に探しているものを知っている SnapKV: LLM Knows What You are Looking for Before Generation ( http://arxiv.org/abs/2404.14469v2 ) ライセンス: Link先を確認	Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen,	(参考訳) 大規模言語モデル(LLM)は、キーバリュー(KV)キャッシュがパフォーマンス向上に重要な役割を果たしており、広範なコンテキストの処理において顕著な進歩を遂げている。しかし、入力長の増加に伴うKVキャッシュの増大は、メモリと時間効率に課題をもたらす。この問題に対処するため,本稿では,KVキャッシュサイズを効率よく最小化しつつ,実世界のアプリケーションで同等のパフォーマンスを実現する,微調整不要な革新的アプローチであるSnapKVを紹介する。モデル内の各アテンションヘッドが、生成中に特定のプロンプト注意特徴へ一貫して注目していることが判明した。一方、この頑健なパターンはプロンプトの末尾にある「観測」ウィンドウから得ることができる。この洞察に基づいてSnapKVは、アテンションヘッド毎にクラスタ化された重要なKV位置を選択することで、KVキャッシュを自動的に圧縮する。提案手法は,長い入力シーケンスを処理する際に増大する計算オーバーヘッドとメモリフットプリントを著しく低減する。具体的には、SnapKVは16Kトークンの入力を処理する際に、ベースラインと比べて生成速度が3.6倍、メモリ効率が8.2倍向上し、一貫した復号速度を達成する。同時に、16の長いシーケンスデータセットにわたってベースラインモデルと同等のパフォーマンスを維持している。さらに、SnapKVはHuggingFace実装を小さな変更で使って1つのA100-80GB GPU上で最大380Kのコンテキストトークンを処理でき、Needle-in-a-Haystackテストでは無視できる程度の精度低下しか示さない。より包括的な研究は、SnapKVの実用的な応用の可能性を示している。 Large Language Models (LLMs) have made remarkable progress in processing extensive contexts, with the Key-Value (KV) cache playing a vital role in enhancing their performance. However, the growth of the KV cache in response to increasing input length poses challenges to memory and time efficiency. To address this problem, this paper introduces SnapKV, an innovative and fine-tuning-free approach that efficiently minimizes KV cache size while still delivering comparable performance in real-world applications. We discover that each attention head in the model consistently focuses on specific prompt attention features during generation. Meanwhile, this robust pattern can be obtained from an 'observation' window located at the end of the prompts. Drawing on this insight, SnapKV automatically compresses KV caches by selecting clustered important KV positions for each attention head.
Our approach significantly reduces the growing computational overhead and memory footprint when processing long input sequences. Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to the baseline when processing inputs of 16K tokens. At the same time, it maintains comparable performance to the baseline models across 16 long sequence datasets. Moreover, SnapKV can process up to 380K context tokens on a single A100-80GB GPU using HuggingFace implementation with minor changes, exhibiting only a negligible accuracy drop in the Needle-in-a-Haystack test. Further comprehensive studies suggest SnapKV's potential for practical applications.	翻訳日:2024-06-19 04:48:05 公開日:2024-06-17
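The selection idea in the SnapKV entry above (use attention from an observation window at the end of the prompt to pick, per head, which earlier KV positions to keep) can be sketched roughly as follows. This is a hedged illustration on plain Python lists, not the authors' implementation: the function name `select_kv_positions`, the `budget` and `pool` parameters, and the use of 1D max pooling to favour clustered positions are assumptions made for this example.

```python
def select_kv_positions(attn, budget, pool=3):
    """Pick prefix positions to keep for each attention head.

    attn[h][q][p] is the attention weight that observation-window query q
    assigns to prefix position p in head h (hypothetical input layout).
    Returns, for each head, a sorted list of at most `budget` positions.
    """
    kept = []
    half = pool // 2
    for head in attn:
        n = len(head[0])
        # Vote: sum the attention each prefix position receives
        # from all queries in the observation window.
        votes = [sum(row[p] for row in head) for p in range(n)]
        # Max pooling over neighbours so that positions next to a strongly
        # attended one are kept together (an assumed way to get "clusters").
        pooled = [max(votes[max(0, p - half):p + half + 1]) for p in range(n)]
        # Highest pooled score first; ties broken by earlier position.
        top = sorted(range(n), key=lambda p: (-pooled[p], p))[:budget]
        kept.append(sorted(top))
    return kept


if __name__ == "__main__":
    attn = [
        [[0.0, 0.1, 0.7, 0.1, 0.0, 0.1]],  # head 0, one observation query
        [[0.6, 0.1, 0.1, 0.1, 0.1, 0.0]],  # head 1, one observation query
    ]
    print(select_kv_positions(attn, budget=3))
```

In a real cache the observation window's own keys and values would presumably be kept as well; that step is omitted here because the sketch only covers the selection of prefix positions.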
# PLAYER*:殺人ミステリーゲームにおけるLLMに基づくマルチエージェントコミュニケーションとインタラクションの強化 PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games ( http://arxiv.org/abs/2404.17662v2 ) ライセンス: Link先を確認	Qinglin Zhu, Runcong Zhao, Jinhua Du, Lin Gui, Yulan He,	(参考訳) 複雑な質問への対処や、動的環境における対人関係の理解において、LLM(Large Language Models)上に構築された既存のエージェントベースのアプローチが抱える限界に対処する新しいフレームワークPLAYER*を提案する。 PLAYER*は,エニータイム型のサンプリングベースプランナと質問駆動の探索フレームワークを用いて,Murder Mystery Games(MMG)における経路計画を強化する。エージェントに一連のセンサーを装備することで、PLAYER*は事前に定義された質問を不要にし、エージェントが複雑な社会的相互作用をナビゲートすることを可能にする。また,多肢選択問題を用いた定量的な評価手法を導入し,1,482の質問応答ペアを含むデータセットWellPlayを提示する。実験の結果、PLAYER*は既存のマルチエージェント手法よりも優れており、MMGにおけるエージェントの汎化性と適応性を高め、より効果的なマルチエージェントインタラクションへの道を開くことが示された。 We propose PLAYER*, a novel framework that addresses the limitations of existing agent-based approaches built on Large Language Models (LLMs) in handling complex questions and understanding interpersonal relationships in dynamic environments. PLAYER* enhances path planning in Murder Mystery Games (MMGs) using an anytime sampling-based planner and a questioning-driven search framework. By equipping agents with a set of sensors, PLAYER* eliminates the need for pre-defined questions and enables agents to navigate complex social interactions. We additionally make a contribution by introducing a quantifiable evaluation method using multiple-choice questions and present WellPlay, a dataset containing 1,482 question-answer pairs. Experimental results demonstrate PLAYER*'s superiority over existing multi-agent methods, enhancing the generalisability and adaptability of agents in MMGs and paving the way for more effective multi-agent interactions.	翻訳日:2024-06-19 04:48:05 公開日:2024-06-17
# リモートセンシング画像における高能率メタラーニングによるマルチスケールFew-Shotオブジェクト検出 Efficient Meta-Learning Enabled Lightweight Multiscale Few-Shot Object Detection in Remote Sensing Images ( http://arxiv.org/abs/2404.18426v3 ) ライセンス: Link先を確認	Wenbin Guan, Zijiu Yang, Xiaohong Wu, Liqiong Chen, Feng Huang, Xiaohai He, Honggang Chen,	(参考訳) 現在、リモートセンシング画像(RSI)における少数ショット物体検出(FSOD)の課題が注目されている。多くの少数ショット検出器、特に2段階検出器に基づくものは、RSIに固有のマルチスケールの複雑さを扱う際に困難に直面している。さらに、これらの検出器は、大量のデータを扱う際にモデルパラメータが扱いにくくなるため、現実世界の応用において非実用的である。対照的に、我々は高い検出速度や大域的受容野といった1段検出器の利点に着目する。そこで,1段検出器YOLOv7をベースラインとして選択し,新しいメタラーニングの訓練フレームワークを適用する。この変換により、検出器は固有の軽量性という利点を活かしつつ、FSODのタスクに適切に対応できる。さらに, メタ学習戦略によって生成されたサンプルを徹底的に調査し, 設計したメタ検出ヘッドが生成したサンプルを保持するための新しいメタサンプリング手法を導入する。考案したメタクロス損失と組み合わせて、しばしば見過ごされる「負のサンプル」を意図的に利用し、それらから貴重な知識を抽出する。このアプローチは、検出精度を高め、全体的なメタ学習戦略を効率的に洗練する。提案した検出器の有効性を検証するため,DIORとNWPU VHR-10.v2データセットを用いて現在の最先端検出器との性能比較を行い,良好な結果を得た。 Presently, the task of few-shot object detection (FSOD) in remote sensing images (RSIs) has become a focal point of attention. Numerous few-shot detectors, particularly those based on two-stage detectors, face challenges when dealing with the multiscale complexities inherent in RSIs. Moreover, these detectors present impractical characteristics in real-world applications, mainly due to their unwieldy model parameters when handling large amounts of data. In contrast, we recognize the advantages of one-stage detectors, including high detection speed and a global receptive field. Consequently, we choose the YOLOv7 one-stage detector as a baseline and subject it to a novel meta-learning training framework. This transformation allows the detector to adeptly address FSOD tasks while capitalizing on its inherent lightweight advantage. Additionally, we thoroughly investigate the samples generated by the meta-learning strategy and introduce a novel meta-sampling approach to retain samples produced by our designed meta-detection head.
Coupled with our devised meta-cross loss, we deliberately utilize "negative samples" that are often overlooked to extract valuable knowledge from them. This approach serves to enhance detection accuracy and efficiently refine the overall meta-learning strategy. To validate the effectiveness of our proposed detector, we conducted performance comparisons with current state-of-the-art detectors using the DIOR and NWPU VHR-10.v2 datasets, yielding satisfactory results.	翻訳日:2024-06-19 04:48:05 公開日:2024-06-17
# バイアス対照ペアにおけるクラス識別共通属性の調査によるデバイアスのための内在的特徴の強化 Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair ( http://arxiv.org/abs/2404.19250v2 ) ライセンス: Link先を確認	Jeonghoon Park, Chaeyeon Chung, Juyoung Lee, Jaegul Choo,	(参考訳) 画像分類タスクでは、ディープニューラルネットワークは、データセットバイアスが存在する場合、ターゲットクラスと疑似的に相関するバイアス属性にしばしば依存し、バイアス属性のないデータに適用した場合、性能が低下する。デバイアスのタスクは、バイアス属性ではなく、ターゲットクラスを本質的に定義する内在的属性を分類器に学習させることを目的としている。近年のアプローチは主に、バイアス属性を持つサンプル(すなわちバイアス整合サンプル)と比べて、バイアス属性のないデータサンプル(すなわちバイアス競合サンプル)の学習を強調することに注力しているが、内在的特徴を学習するためにどこに注目すべきかをモデルに直接指導するには至っていない。この制限に対処するため,本研究では,内在的特徴の領域を示す明示的な空間的ガイダンスをモデルに提供する手法を提案する。まず, バイアス整合 (BA) サンプルとバイアス競合 (BC) サンプルの組(すなわちバイアス対照ペア)の間のクラス識別共通特徴を調べることで,内在的特徴を特定する。次に, BCサンプルと比べて予測に十分活用されていないBAサンプル中の内在的特徴を強化する。バイアス情報を使わずにバイアス対照ペアを構築するために,バイアスのあるモデルを用いてBCサンプルをBAサンプルから区別するバイアス負スコアを導入する。実験により, 本手法が様々なバイアス重大度を持つ合成および実世界のデータセットにおいて最先端の性能を達成することを示す。 In the image classification task, deep neural networks frequently rely on bias attributes that are spuriously correlated with a target class in the presence of dataset bias, resulting in degraded performance when applied to data without bias attributes. The task of debiasing aims to compel classifiers to learn intrinsic attributes that inherently define a target class rather than focusing on bias attributes. While recent approaches mainly focus on emphasizing the learning of data samples without bias attributes (i.e., bias-conflicting samples) compared to samples with bias attributes (i.e., bias-aligned samples), they fall short of directly guiding models where to focus for learning intrinsic features. To address this limitation, this paper proposes a method that provides the model with explicit spatial guidance that indicates the region of intrinsic features. We first identify the intrinsic features by investigating the class-discerning common features between a bias-aligned (BA) sample and a bias-conflicting (BC) sample (i.e., bias-contrastive pair).
Next, we enhance the intrinsic features in the BA sample that are relatively under-exploited for prediction compared to the BC sample. To construct the bias-contrastive pair without using bias information, we introduce a bias-negative score that distinguishes BC samples from BA samples employing a biased model. The experiments demonstrate that our method achieves state-of-the-art performance on synthetic and real-world datasets with various levels of bias severity.	翻訳日:2024-06-19 04:48:05 公開日:2024-06-17
# ParaGrapher を用いた大規模圧縮グラフの選択的並列ロード Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher ( http://arxiv.org/abs/2404.19735v2 ) ライセンス: Link先を確認	Mohsen Koohi Esfahani, Marco D'Antonio, Syed Ibtisam Tauhidi, Thai Son Mai, Hans Vandierendonck,	(参考訳) 包括的な評価は実験科学の基礎の1つである。高性能グラフ処理では、さまざまなフレームワークで共通の入力フォーマットをサポートすることで、貢献の徹底的な評価がより達成しやすくなる。しかし、それぞれのフレームワークは独自のフォーマットを作成しており、大規模な実世界のグラフデータセットの読み込みをサポートしない可能性がある。これは、グラフをロードできる高性能ライブラリの需要を示しており、その目的は (i)新しいグラフアルゴリズムの設計を加速すること、 (ii)幅広いグラフアルゴリズムへの貢献を評価すること、及び (iii)異なるグラフフレームワークの簡易かつ高速な比較を容易にすることである。そこで我々は,大規模な圧縮グラフをロードする高性能APIおよびライブラリであるParaGrapherを紹介する。 ParaGrapherは、共有メモリ、分散メモリ、およびアウトオブコアのグラフ処理においてグラフにアクセスするためのさまざまなタイプのリクエストをサポートする。本稿では,ParaGrapherの設計と,ParaGrapherを3つのストレージタイプで評価するために用いるグラフ展開の性能モデルについて説明する。評価の結果,ParaGrapherはWebGraph形式の圧縮グラフを展開することにより,バイナリ形式やテキスト形式と比べて,ロードで最大3.2倍,エンドツーエンド実行で最大5.2倍の高速化を実現している。 ParaGrapherは https://blogs.qub.ac.uk/DIPSA/ParaGrapher/ で公開されている。 Comprehensive evaluation is one of the bases of experimental science. In High-Performance Graph Processing, a thorough evaluation of contributions becomes more achievable by supporting common input formats over different frameworks. However, each framework creates its specific format, which may not support reading large-scale real-world graph datasets. This shows a demand for high-performance libraries capable of loading graphs to (i) accelerate designing new graph algorithms, (ii) evaluate the contributions on a wide range of graph algorithms, and (iii) facilitate easy and fast comparison over different graph frameworks. To that end, we present ParaGrapher, a high-performance API and library for loading large-scale and compressed graphs. ParaGrapher supports different types of requests for accessing graphs in shared- and distributed-memory and out-of-core graph processing. We explain the design of ParaGrapher and present a performance model of graph decompression, which is used for evaluation of ParaGrapher over three storage types.
Our evaluation shows that by decompressing compressed graphs in WebGraph format, ParaGrapher delivers up to 3.2 times speedup in loading and up to 5.2 times speedup in end-to-end execution in comparison to the binary and textual formats. ParaGrapher is available online on https://blogs.qub.ac.uk/DIPSA/ParaGrapher/.	翻訳日:2024-06-19 04:48:05 公開日:2024-06-17
# SIMPLOT: 本質的要素の蒸留によるチャート質問応答の強化 SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials ( http://arxiv.org/abs/2405.00021v2 ) ライセンス: Link先を確認	Wonjoong Kim, Sangwu Park, Yeonjun In, Seokwon Han, Chanyoung Park,	(参考訳) 近年,視覚言語モデルの発展に伴い,論理的推論による複雑なチャートの解釈が課題として浮上している。従来のSOTA(State-of-the-art)モデルは、視覚言語モデルを利用してチャートをテーブル形式に変換し、大規模言語モデル(LLM)を推論に用いるエンド・ツー・エンドの手法を提示している。しかし、自然画像とは異なり、チャートにはチャート推論に必要な重要な情報と無関係な情報が混在しており、この特性がチャートからテーブルへの抽出の性能を低下させうることが判明した。本稿では,チャート推論に必要な要素のみを抽出するよう設計された手法SIMPLOTを提案する。提案手法には2つのステップがある。 1)テーブル抽出のために,複雑なチャートから重要な情報のみを含む単純なプロットを模倣する訓練を行う。 2) テーブルに基づく推論を行う。本モデルは,追加のアノテーションやデータセットを必要とせずに正確なチャート推論を可能にし,その有効性は様々な実験によって実証されている。さらに,より正確な推論を行うために,人間がチャートを解釈する方法を模倣する新しいプロンプトを提案する。ソースコードは https://github.com/sangwu99/Simplot から入手可能である。 Recently, interpreting complex charts with logical reasoning has emerged as a challenge due to the development of vision-language models. A prior state-of-the-art (SOTA) model has presented an end-to-end method that leverages the vision-language model to convert charts into table format utilizing Large Language Model (LLM) for reasoning. However, unlike natural images, charts contain a mix of information that is essential for chart reasoning and information that is irrelevant, and we discover that this characteristic can lower the performance of chart-to-table extraction. In this paper, we introduce SIMPLOT, a method designed to extract only the elements necessary for chart reasoning. The proposed method involves two steps: 1) training to mimic a simple plot that contains only the essential information from a complex chart for table extraction, followed by 2) performing reasoning based on the table. Our model enables accurate chart reasoning without the need for additional annotations or datasets, and its effectiveness is demonstrated through various experiments. Furthermore, we propose a novel prompt mimicking how humans interpret charts for more accurate reasoning. Our source code is available at https://github.com/sangwu99/Simplot.	翻訳日:2024-06-19 04:48:05 公開日:2024-06-17
# 平均場コヒーレントイジングマシンを用いたL0正則化圧縮センシング L0-regularized compressed sensing with Mean-field Coherent Ising Machines ( http://arxiv.org/abs/2405.00366v2 ) ライセンス: Link先を確認	Mastiyage Don Sudeera Hasaranga Gunathilaka, Yoshitaka Inui, Satoshi Kako, Kazushi Mimura, Masato Okada, Yoshihisa Yamamoto, Toru Aonishi,	(参考訳) コヒーレントイジングマシン(Coherent Ising Machine, CIM)は、イジングハミルトニアンの基底状態を見つけることで組合せ最適化問題を解く、光パラメトリック発振器のネットワークである。 CIMの実用的な応用として、AonishiらはL0正則化に基づく圧縮センシング(L0RBCS)の最適化問題を解くための量子古典ハイブリッドシステムを提案した。 Gunathilakaらはこのシステムの精度をさらに高めた。しかし、CIMの確率微分方程式(SDE)は計算コストが高く、デジタルハードウェア実装の利用を制限している。我々は,これまで用いられてきたGunathilakaらのCIM SDEの代替として,量子ノイズを含まない物理に着想を得たヒューリスティック解法である平均場CIM(MF-CIM)モデルを提案する。 MF-CIMは、微分方程式(DE)の単純さにより、高い計算コストを克服する。さらに,提案モデルは人工データと磁気共鳴画像データの両方において物理的に正確なSDEと同等の性能を示し,FPGA(Field Programmable Gate Arrays)などのデジタルハードウェア上でCIMベースのL0RBCSを実装する道を開く。 Coherent Ising Machine (CIM) is a network of optical parametric oscillators that solves combinatorial optimization problems by finding the ground state of an Ising Hamiltonian. As a practical application of CIM, Aonishi et al. proposed a quantum-classical hybrid system to solve optimization problems of L0-regularization-based compressed sensing (L0RBCS). Gunathilaka et al. further enhanced the accuracy of the system. However, the computationally expensive CIM's stochastic differential equations (SDEs) limit the use of digital hardware implementations. As an alternative to Gunathilaka et al.'s CIM SDEs used previously, we propose using the mean-field CIM (MF-CIM) model, which is a physics-inspired heuristic solver without quantum noise. MF-CIM surmounts the high computational cost due to the simple nature of the differential equations (DEs). Furthermore, our results indicate that the proposed model has similar performance to physically accurate SDEs in both artificial and magnetic resonance imaging data, paving the way for implementing CIM-based L0RBCS on digital hardware such as Field Programmable Gate Arrays (FPGAs).	翻訳日:2024-06-19 04:38:09 公開日:2024-06-17
# 対数的複雑度とリグレット保証を備えたオンライン勾配ベースキャッシングポリシー An Online Gradient-Based Caching Policy with Logarithmic Complexity and Regret Guarantees ( http://arxiv.org/abs/2405.01263v2 ) ライセンス: Link先を確認	Damiano Carra, Giovanni Neglia,	(参考訳) LRU(Least Recently Used)やLFU(Least Frequently Used)といった一般的なキャッシュポリシーは、特定のトラフィックパターンの下でのみ最適なパフォーマンスを示す。過去の要求データのパターンを検出する高度な機械学習ベースの手法でさえ、将来の要求が過去の傾向から逸脱した場合には苦戦する。近年,トラフィックパターンの変化に頑健な新しいクラスのポリシーが登場している。これらのアルゴリズムはオンライン最適化問題に取り組み,状況への継続的な適応を可能にする。また,オンラインポリシーと,後知恵で最適な静的キャッシュ割り当てとの性能差を測るリグレット指標に関する理論的保証を提供する。しかし、これらの解法は計算複雑性が高く、実用的な採用を妨げている。本研究では,カタログサイズに対して対数的な計算複雑性を達成しつつリグレット保証も提供する,勾配に基づくオンラインキャッシュポリシーの新たな変種を提案する。この進歩により、数百万の要求とアイテムを含む大規模な実世界のトレースでポリシーをテストすることが可能になる。このような規模は,リグレット保証を持つ既存のポリシーでは到達できなかった。我々の知る限り、我々の実験結果は、勾配に基づくキャッシュポリシーのリグレット保証が現実的なシナリオでかなりの利益をもたらすことを初めて示すものである。 Commonly used caching policies, such as LRU (Least Recently Used) or LFU (Least Frequently Used), exhibit optimal performance only under specific traffic patterns. Even advanced machine learning-based methods, which detect patterns in historical request data, struggle when future requests deviate from past trends. Recently, a new class of policies has emerged that are robust to varying traffic patterns. These algorithms address an online optimization problem, enabling continuous adaptation to the context. They offer theoretical guarantees on the regret metric, which measures the performance gap between the online policy and the optimal static cache allocation in hindsight. However, the high computational complexity of these solutions hinders their practical adoption. In this study, we introduce a new variant of the gradient-based online caching policy that achieves groundbreaking logarithmic computational complexity relative to catalog size, while also providing regret guarantees.
This advancement allows us to test the policy on large-scale, real-world traces featuring millions of requests and items - a significant achievement, as such scales have been beyond the reach of existing policies with regret guarantees. To the best of our knowledge, our experimental results demonstrate for the first time that the regret guarantees of gradient-based caching policies offer substantial benefits in practical scenarios.	翻訳日:2024-06-19 04:38:09 公開日:2024-06-17
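The regret metric mentioned in the caching entry above (the gap between an online policy and the best static cache chosen in hindsight) can be made concrete on a toy trace. The sketch below measures it for plain LRU, not for the gradient-based policy of the paper; the function names and the example trace are hypothetical and chosen only for illustration.

```python
from collections import Counter, OrderedDict


def lru_hits(trace, capacity):
    """Count cache hits of an LRU cache of the given capacity on a request trace."""
    cache = OrderedDict()
    hits = 0
    for item in trace:
        if item in cache:
            hits += 1
            cache.move_to_end(item)  # mark as most recently used
        else:
            cache[item] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used item
    return hits


def best_static_hits(trace, capacity):
    """Hits of the best fixed cache content in hindsight.

    A static cache holds the same items for the whole trace, so the best
    choice is the `capacity` most requested items, and every request to
    one of them counts as a hit.
    """
    counts = Counter(trace)
    return sum(count for _, count in counts.most_common(capacity))


def regret(trace, capacity):
    """Hit-count gap between the best static cache and LRU on this trace."""
    return best_static_hits(trace, capacity) - lru_hits(trace, capacity)


if __name__ == "__main__":
    # A cyclic trace over three items with room for two is a classic bad case for LRU.
    trace = ["a", "b", "c", "a", "b", "c"]
    print(lru_hits(trace, 2), best_static_hits(trace, 2), regret(trace, 2))
```

On the cyclic trace LRU never hits while a fixed two-item cache serves four of the six requests, which is the kind of gap a policy with regret guarantees is meant to bound.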
# ポジション:LLMを理解するには統計的一般化以上のものが必要だ Position: Understanding LLMs Requires More Than Statistical Generalization ( http://arxiv.org/abs/2405.01964v3 ) ライセンス: Link先を確認	Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, Anna Kerekes, Wieland Brendel, Ferenc Huszár,	(参考訳) この10年、ディープラーニング理論の研究が花開き、「なぜディープラーニングは一般化するのか?」という問いに答えようとしてきた。この進歩を促したのは、補間領域における過剰パラメータ化モデルの研究という強力な視点の転換である。本稿では, LLMの望ましい性質のいくつかは良好な統計的一般化の帰結ではなく,別個の理論的説明を必要とするため,さらなる視点の転換が必要であると主張する。我々の中心的な議論は、AR確率モデルは本質的に識別不可能である、という観察に依拠している。すなわち,KLダイバージェンスがゼロまたはほぼゼロしか離れておらず,したがってテスト損失が等価なモデルが,著しく異なる振る舞いを示しうる。我々は数学的な例と経験的観察によってこの立場を裏付け,(1)ゼロショット規則外挿の非識別性,(2)文脈内学習の近似的非識別性,(3)ファインチューニング可能性の非識別性という3つのケーススタディを通じて,非識別性が実用上の関連性を持つ理由を示す。我々は, LLMに関連する一般化尺度, 転移可能性, 帰納バイアスに着目した有望な研究方向を概観する。 The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that AR probabilistic models are inherently non-identifiable: models zero or near-zero KL divergence apart -- thus, equivalent test loss -- can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of zero-shot rule extrapolation; (2) the approximate non-identifiability of in-context learning; and (3) the non-identifiability of fine-tunability. We review promising research directions focusing on LLM-relevant generalization measures, transferability, and inductive biases.	翻訳日:2024-06-19 04:38:09 公開日:2024-06-17
# P-ICL:大規模言語モデルを用いた名前付きエンティティ認識のためのポイントインコンテキスト学習 P-ICL: Point In-Context Learning for Named Entity Recognition with Large Language Models ( http://arxiv.org/abs/2405.04960v2 ) ライセンス: Link先を確認	Guochao Jiang, Zepeng Ding, Yuchen Shi, Deqing Yang,	(参考訳) 近年,大規模言語モデル (LLM) の台頭により,実演サンプルを使わずに,あるいは文脈内学習(ICL)によって少数のサンプルのみを用いて,直接名前付きエンティティ認識 (NER) を実現することが可能になった。しかし、標準のICLは、LLMがタスク命令、フォーマット、入力ラベルマッピングを理解するのに役立つだけで、NERタスク自体の特異性を無視している。本稿では, LLM を用いて NER をよりよく実現するための新しいプロンプトフレームワーク P-ICL を提案する。P-ICLでは,いくつかのポイントエンティティを,各エンティティタイプを認識するための補助情報として利用する。このような重要な情報により、LLMはより正確にエンティティ分類を達成することができる。LLMへのプロンプトに最適なポイントエンティティを得るために,K-Meansクラスタリングに基づくポイントエンティティ選択手法も提案する。いくつかの代表的なNERベンチマークにおける広範な実験により,P-ICL とポイントエンティティ選択における提案手法の有効性を検証した。 In recent years, the rise of large language models (LLMs) has made it possible to directly achieve named entity recognition (NER) without any demonstration samples or only using a few samples through in-context learning (ICL). However, standard ICL only helps LLMs understand task instructions, format and input-label mapping, but neglects the particularity of the NER task itself. In this paper, we propose a new prompting framework P-ICL to better achieve NER with LLMs, in which some point entities are leveraged as the auxiliary information to recognize each entity type. With such significant information, the LLM can achieve entity classification more precisely. To obtain optimal point entities for prompting LLMs, we also propose a point entity selection method based on K-Means clustering. Our extensive experiments on some representative NER benchmarks verify the effectiveness of our proposed strategies in P-ICL and point entity selection.	翻訳日:2024-06-19 04:38:09 公開日:2024-06-17
# Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask ( http://arxiv.org/abs/2405.05959v2 ) License: check the linked page	Zineb Senane, Lele Cao, Valentin Leonhard Buchner, Yusuke Tashiro, Lei You, Pawel Herman, Mats Nordahl, Ruibo Tu, Vilhelm von Ehrenheim	Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive, each with a common challenge of sensitivity to noise and intricate data nuances. Recently, diffusion-based methods have shown advanced generative capabilities. However, they primarily target specific application scenarios like imputation and forecasting, leaving a gap in leveraging diffusion models for generic TSRL. Our work, Time Series Diffusion Embedding (TSDE), bridges this gap as the first diffusion-based SSL TSRL approach. TSDE segments TS data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask. It applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. We train a reverse diffusion process conditioned on the embeddings, designed to predict noise added to the masked part. Extensive experiments demonstrate TSDE's superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. We also conduct an ablation study, present embedding visualizations, and compare inference speed, further substantiating TSDE's efficiency and validity in learning representations of TS data.	Translation date: 2024-06-19 04:38:09 Publication date: 2024-06-17
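As a rough illustration of the three masking patterns that the IIF mask is named after, the sketch below builds each one as a boolean grid over a (time x feature) series, with `True` meaning "masked". The function names, the masking ratio, and the sampling rules are assumptions made for illustration; the abstract does not specify how TSDE actually samples its masks.

```python
import random

def imputation_mask(T, K, ratio, rng):
    """Mask individual (time, feature) cells independently at random."""
    return [[rng.random() < ratio for _ in range(K)] for _ in range(T)]

def interpolation_mask(T, K, ratio, rng):
    """Mask whole timesteps (all K features at once), chosen at random."""
    n_hidden = max(1, int(T * ratio))
    hidden = set(rng.sample(range(T), n_hidden))
    return [[t in hidden] * K for t in range(T)]

def forecasting_mask(T, K, horizon):
    """Mask the last `horizon` timesteps, as in a forecasting setup."""
    return [[t >= T - horizon] * K for t in range(T)]

def split(series, mask):
    """Split a series into its observed part and its masked (target) part."""
    observed = [[None if m else x for x, m in zip(row, mrow)]
                for row, mrow in zip(series, mask)]
    target = [[x if m else None for x, m in zip(row, mrow)]
              for row, mrow in zip(series, mask)]
    return observed, target
```

In a setup like the one the abstract describes, the observed part would feed the embedding function and the target part would be what the diffusion process learns to denoise.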
# A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models ( http://arxiv.org/abs/2405.06211v3 ) License: check the linked page	Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li	As one of the most advanced techniques in AI, Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge, providing huge convenience for numerous tasks. Particularly in the era of AI-Generated Content (AIGC), the powerful capacity of retrieval in providing additional knowledge enables RAG to assist existing generative AI in producing high-quality outputs. Recently, Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation, while still facing inherent limitations, such as hallucinations and out-of-date internal knowledge. Given the powerful abilities of RAG in providing the latest and helpful auxiliary information, Retrieval-Augmented Large Language Models (RA-LLMs) have emerged to harness external and authoritative knowledge bases, rather than solely relying on the model's internal knowledge, to augment the generation quality of LLMs. In this survey, we comprehensively review existing research studies in RA-LLMs, covering three primary technical perspectives: architectures, training strategies, and applications. As preliminary knowledge, we briefly introduce the foundations and recent advances of LLMs. Then, to illustrate the practical significance of RAG for LLMs, we systematically review mainstream relevant work by their architectures, training strategies, and application areas, detailing specifically the challenges of each and the corresponding capabilities of RA-LLMs. Finally, to deliver deeper insights, we discuss current limitations and several promising directions for future research. Updated information about this survey can be found at https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/	Translation date: 2024-06-19 04:38:09 Publication date: 2024-06-17
# Evaluating Task-based Effectiveness of MLLMs on Charts ( http://arxiv.org/abs/2405.07001v2 ) License: check the linked page	Yifan Wu, Lutao Yan, Yuyu Luo, Yunhai Wang, Nan Tang	In this paper, we explore a forward-thinking question: Is GPT-4V effective at low-level data analysis tasks on charts? To this end, we first curate a large-scale dataset, named ChartInsights, consisting of 89,388 quartets (chart, task, question, answer) and covering 10 widely-used low-level data analysis tasks on 7 chart types. Firstly, we conduct systematic evaluations to understand the capabilities and limitations of 18 advanced MLLMs, which include 12 open-source models and 6 closed-source models. Starting with a standard textual prompt approach, the average accuracy rate across the 18 MLLMs is 36.17%. Among all the models, GPT-4V achieves the highest accuracy, reaching 56.13%. To understand the limitations of multimodal large models in low-level data analysis tasks, we have designed various experiments to conduct an in-depth test of GPT-4V's capabilities. We further investigate how visual modifications to charts, such as altering visual elements (e.g., changing color schemes) and introducing perturbations (e.g., adding image noise), affect GPT-4V's performance. Secondly, we present 12 experimental findings. These findings suggest the potential of GPT-4V to revolutionize interaction with charts and uncover the gap between human analytic needs and GPT-4V's capabilities. Thirdly, we propose a novel textual prompt strategy, named Chain-of-Charts, tailored for low-level analysis tasks, which boosts model performance by 24.36%, resulting in an accuracy of 80.49%. Furthermore, by incorporating a visual prompt strategy that directs GPT-4V's attention to question-relevant visual elements, we further improve accuracy to 83.83%. Our study not only sheds light on the capabilities and limitations of GPT-4V in low-level data analysis tasks but also offers valuable insights for future research.	Translation date: 2024-06-19 04:38:09 Publication date: 2024-06-17
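The figures reported for GPT-4V are consistent with the 24.36% gain being an absolute difference in percentage points (pp) over the 56.13% baseline, not a relative gain. This reading is an inference from the numbers in the abstract, not a statement made by the authors:

```latex
\[
56.13\% + 24.36\ \text{pp} = 80.49\%,
\qquad
83.83\% - 80.49\% = 3.34\ \text{pp}.
\]
```

A relative gain of 24.36% over 56.13% would give roughly 69.80%, which does not match the reported 80.49%.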
# Scalable Image Coding for Humans and Machines Using Feature Fusion Network ( http://arxiv.org/abs/2405.09152v5 ) License: check the linked page	Takahiro Shindo, Taiju Watanabe, Yui Tatsumi, Hiroshi Watanabe	As image recognition models become more prevalent, scalable coding methods for machines and humans gain more importance. Applications of image recognition models include traffic monitoring and farm management. In these use cases, the scalable coding method proves effective because the tasks require occasional image checking by humans. Existing image compression methods for humans and machines meet these requirements to some extent. However, these compression methods are effective solely for specific image recognition models. We propose a learning-based scalable image coding method for humans and machines that is compatible with numerous image recognition models. We combine an image compression model for machines with a second compression model that provides additional information to facilitate image decoding for humans. The features in these compression models are fused using a feature fusion network to achieve efficient image compression. Our method's additional information compression model is adjusted to reduce the number of parameters by enabling combinations of features of different sizes in the feature fusion network. Our approach confirms that the feature fusion network efficiently combines image compression models while reducing the number of parameters. Furthermore, we demonstrate the effectiveness of the proposed scalable coding method by evaluating the image compression performance in terms of decoded image quality and bitrate.	Translation date: 2024-06-19 04:28:22 Publication date: 2024-06-17
# Enforcing exact permutation and rotational symmetries in the application of quantum neural network on point cloud datasets ( http://arxiv.org/abs/2405.11150v3 ) License: check the linked page	Zhelun Li, Lento Nagano, Koji Terashi	Recent developments in the field of quantum machine learning have promoted the idea of incorporating physical symmetries in the structure of quantum circuits. A crucial milestone in this area is the realization of $S_{n}$-permutation equivariant quantum neural networks (QNN) that are equivariant under permutations of input objects. In this work, we focus on encoding the rotational symmetry of point cloud datasets into the QNN. The key insight of the approach is that all rotationally invariant functions with vector inputs are equivalent to a function with inputs of vector inner products. We provide a novel structure of QNN that is exactly invariant to both rotations and permutations, with its efficacy demonstrated numerically in the problems of two-dimensional image classifications and identifying high-energy particle decays, produced by proton-proton collisions, with the $SO(1,3)$ Lorentz symmetry.	Translation date: 2024-06-19 04:28:22 Publication date: 2024-06-17
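The key insight stated in this abstract, that rotation-invariant functions of vectors can be written as functions of their inner products, can be checked with a small classical toy example. The sketch below is not the authors' quantum circuit; the function names and the choice of sorting the Gram-matrix entries (a simple, lossy way to also obtain permutation invariance) are assumptions made for illustration.

```python
import math

def rotate(p, theta):
    """Rotate a 2D point counter-clockwise by angle theta (radians)."""
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def gram(points):
    """All pairwise Euclidean inner products <p_i, p_j>."""
    return [[a[0] * b[0] + a[1] * b[1] for b in points] for a in points]

def invariant_features(points):
    """Sorted inner products: unchanged by rotating or reordering the points."""
    g = gram(points)
    n = len(points)
    diag = sorted(g[i][i] for i in range(n))
    off = sorted(g[i][j] for i in range(n) for j in range(i + 1, n))
    return diag + off
```

For the Lorentz case mentioned in the abstract, the Euclidean inner product would be replaced by the Minkowski one; that variant is not shown here.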
# SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization ( http://arxiv.org/abs/2405.11582v2 ) License: check the linked page	Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang	Transformers have become foundational architectures for both natural language and computer vision tasks. However, their high computational cost makes them quite challenging to deploy on resource-constrained devices. This paper investigates the computational bottleneck modules of efficient transformers, i.e., normalization layers and attention modules. LayerNorm is commonly used in transformer architectures but is not computationally friendly because it computes statistics during inference. However, replacing LayerNorm with the more efficient BatchNorm in transformers often leads to inferior performance and collapse in training. To address this problem, we propose a novel method named PRepBN to progressively replace LayerNorm with re-parameterized BatchNorm in training. Moreover, we propose a simplified linear attention (SLA) module that is simple yet effective to achieve strong performance. Extensive experiments on image classification as well as object detection demonstrate the effectiveness of our proposed method. For example, our SLAB-Swin obtains $83.6\%$ top-1 accuracy on ImageNet-1K with $16.2$ms latency, which is $2.4$ms less than that of Flatten-Swin with $0.1\%$ higher accuracy. We also evaluate our method on the language modeling task and obtain comparable performance and lower latency. Code is publicly available at https://github.com/xinghaochen/SLAB and https://github.com/mindspore-lab/models/tree/master/research/huawei-noah/SLAB.	Translation date: 2024-06-19 04:28:22 Publication date: 2024-06-17
# Refining Coded Image in Human Vision Layer Using CNN-Based Post-Processing ( http://arxiv.org/abs/2405.11894v2 ) License: check the linked page	Takahiro Shindo, Yui Tatsumi, Taiju Watanabe, Hiroshi Watanabe	Scalable image coding for both humans and machines is a technique that has gained a lot of attention recently. This technology enables the hierarchical decoding of images for human vision and image recognition models. It is a highly effective method when images need to serve both purposes. However, no research has yet incorporated the post-processing commonly used in popular image compression schemes into scalable image coding methods for humans and machines. In this paper, we propose a method to enhance the quality of decoded images for humans by integrating post-processing into a scalable coding scheme. Experimental results show that the post-processing improves compression performance. Furthermore, the effectiveness of the proposed method is validated through comparisons with traditional methods.	Translation date: 2024-06-19 04:28:22 Publication date: 2024-06-17
# Geodesic nature and quantization of shift vector ( http://arxiv.org/abs/2405.13355v2 ) License: check the linked page	Hua Wang, Kai Chang	We present the geodesic nature and quantization of the geometric shift vector in quantum systems, with the parameter space defined by the Bloch momentum, using the Wilson loop approach. Our analysis extends to include bosonic phonon drag shift vectors with non-vertical transitions. We demonstrate that the gauge invariant shift vector can be quantized as integer values, analogous to the Euler characteristic based on the Gauss-Bonnet theorem for a manifold with a smooth boundary. We reveal intricate relationships among geometric quantities such as the shift vector, Berry curvature, and quantum metric. Our findings demonstrate that the loop integral of the shift vector in the quantized interband formula contributes to the non-quantized component of the trace of conductivity in the circular photogalvanic effect. The Wilson loop method facilitates first-principles calculations, providing insights into the geometric underpinnings of these interband gauge invariant quantities and shedding light on their nonlinear optical manifestations in real materials.	Translation date: 2024-06-19 04:28:22 Publication date: 2024-06-17
# Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization ( http://arxiv.org/abs/2405.15081v2 ) License: check the linked page	Bao Hoang, Yijiang Pang, Siqi Liang, Liang Zhan, Paul Thompson, Jiayu Zhou	Independent and identically distributed (i.i.d.) data is essential to many data analysis and modeling techniques. In the medical domain, collecting data from multiple sites or institutions is a common strategy that guarantees sufficient clinical diversity, determined by the decentralized nature of medical data. However, data from various sites are easily biased by the local environment or facilities, thereby violating the i.i.d. rule. A common strategy is to harmonize the site bias while retaining important biological information. ComBat is among the most popular harmonization approaches and has recently been extended to handle distributed sites. However, when faced with situations involving newly joined sites in training or evaluating data from unknown/unseen sites, ComBat lacks compatibility and requires retraining with data from all the sites. The retraining leads to significant computational and logistic overhead that is usually prohibitive. In this work, we develop a novel Cluster ComBat harmonization algorithm, which leverages cluster patterns of the data in different sites and greatly advances the usability of ComBat harmonization. We use extensive simulation and real medical imaging data from ADNI to demonstrate the superiority of the proposed approach. Our code is provided at https://github.com/illidanlab/distributed-cluster-harmonization.	Translation date: 2024-06-19 04:28:22 Publication date: 2024-06-17
# MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time ( http://arxiv.org/abs/2405.16265v2 ) License: check the linked page	Jikun Kang, Xin Zhe Li, Xi Chen, Amirreza Kazemi, Boxing Chen	Although Large Language Models (LLMs) achieve remarkable performance across various tasks, they often struggle with complex reasoning tasks, such as answering mathematical questions. Recent efforts to address this issue have primarily focused on leveraging mathematical datasets through supervised fine-tuning or self-improvement techniques. However, these methods often depend on high-quality datasets that are difficult to prepare, or they require substantial computational resources for fine-tuning. Inspired by findings that LLMs know how to produce the right answer but struggle to select the correct reasoning path, we propose a purely inference-based searching method, MindStar (M*). This method formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths. We evaluate the M* framework on both the GSM8K and MATH datasets, comparing its performance with existing open and closed-source LLMs. Our results demonstrate that M* significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1, but with substantially reduced model size and computational costs.	Translation date: 2024-06-19 04:18:36 Publication date: 2024-06-17
# The Scaling Law in Stellar Light Curves ( http://arxiv.org/abs/2405.17156v2 ) License: check the linked page	Jia-Shu Pan, Yuan-Sen Ting, Yang Huang, Jie Yu, Ji-Feng Liu	Analyzing time series of fluxes from stars, known as stellar light curves, can reveal valuable information about stellar properties. However, most current methods rely on extracting summary statistics, and studies using deep learning have been limited to supervised approaches. In this research, we investigate the scaling law properties that emerge when learning from astronomical time series data using self-supervised techniques. By employing the GPT-2 architecture, we show the learned representation improves as the number of parameters increases from $10^4$ to $10^9$, with no signs of performance plateauing. We demonstrate that a self-supervised Transformer model achieves 3-10 times the sample efficiency compared to the state-of-the-art supervised learning model when inferring the surface gravity of stars as a downstream task. Our research lays the groundwork for analyzing stellar light curves by examining them through large-scale auto-regressive generative models.	Translation date: 2024-06-19 04:18:36 Publication date: 2024-06-17
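Scaling-law studies of this kind typically summarize how a loss or error metric changes with parameter count by fitting a power law in log-log space. The sketch below shows that fitting step only. The power-law form, the function name, and the numbers are assumptions for illustration: the data are synthetic and are not results from the paper.

```python
import math

def fit_power_law(ns, losses):
    """Least-squares fit of log(loss) = log(a) + b * log(n); returns (a, b)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in losses]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - b * mx)
    return a, b

# Synthetic example: parameter counts from 10^4 to 10^9 and a made-up
# loss curve that follows loss = 2.0 * n^(-0.1) exactly.
ns = [10 ** k for k in range(4, 10)]
losses = [2.0 * n ** -0.1 for n in ns]
a, b = fit_power_law(ns, losses)
```

A fitted exponent `b` that stays clearly negative across the whole range is one way to express "no signs of plateauing" quantitatively.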
# Benchmarking General Purpose In-Context Learning ( http://arxiv.org/abs/2405.17234v4 ) License: check the linked page	Fan Wang, Chuan Lin, Yang Cao, Yu Kang	In-context learning (ICL) is becoming increasingly appealing to the AI community due to its flexibility, generality, sample efficiency, and exemption from artificial optimization skills. It is desirable to further enhance the generality and capability of ICL, which gives rise to the concept of general-purpose in-context learning (GPICL). We aim to extend ICL to address a broader range of tasks with an extended learning horizon and higher improvement potential, albeit with relatively limited zero-shot generalization. To this end, we introduce two lightweight but insightful benchmarks specifically crafted to train and evaluate GPICL functionalities. Each benchmark includes a vast number of tasks characterized by significant task variance, featuring minimal inductive bias. These tasks are also designed to facilitate lifelong in-context learning through continuous generation and interaction. These features pose significant challenges for models that rely on context or interactions to improve their proficiency, including language models, decision models, and world models. Our experiments reveal that the scale of parameters alone may not be crucial for ICL or GPICL, suggesting alternative approaches such as increasing the scale of contexts and memory states.	Translation date: 2024-06-19 04:18:36 Publication date: 2024-06-17
# Kondo-Zeno crossover in the dynamics of a monitored quantum dot ( http://arxiv.org/abs/2405.17348v2 ) License: check the linked page	Matthieu Vanhoecke, Marco Schirò	We study the dynamics of a quantum dot coupled to a metallic bath and subject to continuous monitoring of its charge density. The dynamics averaged over measurement noise is described by a dissipative Anderson impurity model with local Markovian dephasing, which we solve using an extension of the Non-Crossing Approximation in the vectorized Hilbert space. We show that the decay time scale of an initially polarised spin which is suddenly coupled to the bath and to the monitoring protocol displays a crossover from Kondo screening, with a lifetime controlled by interactions, to the Quantum Zeno effect, with a lifetime which decreases with bare dissipation as the dephasing or monitoring rate is increased. Using a Schrieffer-Wolff transformation on the Lindbladian, we derive an effective model for the long-time dynamics which is described at weak dissipation by a non-Hermitian Kondo model with complex-valued spin-spin exchange. As the dephasing is increased, heating due to doublon production takes over and controls the spin decay.	Translation date: 2024-06-19 04:18:36 Publication date: 2024-06-17
# ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users ( http://arxiv.org/abs/2405.19360v2 ) License: check the linked page	Guanlin Li, Kangjie Chen, Shudong Zhang, Jie Zhang, Tianwei Zhang	Large-scale pre-trained generative models are taking the world by storm, due to their abilities in generating creative content. Meanwhile, safeguards for these generative models are being developed to protect users' rights and safety, most of which are designed for large language models. Existing methods primarily focus on jailbreak and adversarial attacks, which mainly evaluate the model's safety under malicious prompts. Recent work found that manually crafted safe prompts can unintentionally trigger unsafe generations. To further systematically evaluate the safety risks of text-to-image models, we propose a novel Automatic Red-Teaming framework, ART. Our method leverages both a vision language model and a large language model to establish a connection between unsafe generations and their prompts, thereby more efficiently identifying the model's vulnerabilities. With our comprehensive experiments, we reveal the toxicity of popular open-source text-to-image models. The experiments also validate the effectiveness, adaptability, and great diversity of ART. Additionally, we introduce three large-scale red-teaming datasets for studying the safety risks associated with text-to-image models. Datasets and models can be found at https://github.com/GuanlinLee/ART.	Translation date: 2024-06-19 04:18:36 Publication date: 2024-06-17
# TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes ( http://arxiv.org/abs/2405.20283v2 ) License: check the linked page	Minghao Guo, Bohan Wang, Kaiming He, Wojciech Matusik	We present TetSphere splatting, an explicit, Lagrangian representation for reconstructing 3D shapes with high-quality geometry. Conventional object reconstruction methods predominantly use Eulerian representations, including both neural implicit (e.g., NeRF, NeuS) and explicit representations (e.g., DMTet), and often struggle with high computational demands and suboptimal mesh quality. In contrast, TetSphere splatting utilizes an underused but highly effective geometric primitive: tetrahedral meshes. This approach directly yields superior mesh quality without relying on neural networks or post-processing. It deforms multiple initial tetrahedral spheres to accurately reconstruct the 3D shape through a combination of differentiable rendering and geometric energy optimization, resulting in significant computational efficiency. Serving as a robust and versatile geometry representation, TetSphere splatting seamlessly integrates into diverse applications, including single-view 3D reconstruction and image-/text-to-3D content generation. Experimental results demonstrate that TetSphere splatting outperforms existing representations, delivering faster optimization speed, enhanced mesh quality, and reliable preservation of thin structures.	Translation date: 2024-06-19 04:18:36 Publication date: 2024-06-17
# MTEB-French: Resources for French Sentence Embedding Evaluation and Analysis ( http://arxiv.org/abs/2405.20468v2 ) License: check the linked page	Mathieu Ciancone, Imene Kerboua, Marion Schaeffer, Wissam Siblini	Recently, numerous embedding models have been made available and widely used for various NLP tasks. The Massive Text Embedding Benchmark (MTEB) has primarily simplified the process of choosing a model that performs well for several tasks in English, but extensions to other languages remain challenging. This is why we expand MTEB to propose the first massive benchmark of sentence embeddings for French. We gather 15 existing datasets in an easy-to-use interface and create three new French datasets for a global evaluation of 8 task categories. We compare 51 carefully selected embedding models on a large scale, conduct comprehensive statistical tests, and analyze the correlation between model performance and many of their characteristics. We find that even if no model is the best on all tasks, large multilingual models pre-trained on sentence similarity perform exceptionally well. Our work comes with open-source code, new datasets and a public leaderboard.	Translation date: 2024-06-19 04:18:36 Publication date: 2024-06-17
# Ovis: マルチモーダル大言語モデルのための構造埋め込みアライメント Ovis: Structural Embedding Alignment for Multimodal Large Language Model ( http://arxiv.org/abs/2405.20797v2 ) ライセンス: Link先を確認	Shiyin Lu, Yang Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Han-Jia Ye,	(参考訳) 現在のMultimodal Large Language Models (MLLM) は、通常、事前訓練されたLLMと、MLPのようなコネクタを通じて、他の事前訓練されたビジョントランスフォーマーを統合する。しかし、MLLMの2つの埋め込み戦略(埋め込みルックアップテーブルに基づく構造的テキスト埋め込みと、ビジョンエンコーダによって直接生成される連続的埋め込み)の相違は、視覚的およびテキスト情報のよりシームレスな融合に挑戦する。視覚とテキストの埋め込みを構造的に整列する新しいMLLMアーキテクチャであるOvisを提案する。 Ovisは学習可能なビジュアル埋め込みテーブルをビジュアルエンコーダのプロセスに統合する。リッチな視覚的セマンティクスをキャプチャするために、各イメージパッチは視覚的埋め込みテーブルを複数回インデックスし、最終的な視覚的埋め込みはインデックス化された埋め込みの確率的組み合わせとなる。この構造的アプローチは、テキスト埋め込みを生成するために使われる手法を反映している。様々なマルチモーダルベンチマークにおける実証的な評価は、Ovisが同様のパラメータスケールのオープンソースMLLMよりも優れており、Qwen-VL-Plusのプロプライエタリモデルよりも優れていることを示している。これらの結果は,MLLMアーキテクチャ設計を推進し,より効果的なマルチモーダル学習を促進するために,Ovisが構築した視覚表現の可能性を強調している。コード、データセット、モデルは https://github.com/AIDC-AI/Ovis で入手できる。 Current Multimodal Large Language Models (MLLMs) typically integrate a pre-trained LLM with another pre-trained vision transformer through a connector, such as an MLP, endowing the LLM with visual capabilities. However, the misalignment between two embedding strategies in MLLMs -- the structural textual embeddings based on an embedding look-up table and the continuous embeddings generated directly by the vision encoder -- poses challenges for a more seamless fusion of visual and textual information. We propose Ovis, a novel MLLM architecture designed to structurally align visual and textual embeddings. Ovis integrates an additional learnable visual embedding table into the visual encoder's process. To capture rich visual semantics, each image patch indexes the visual embedding table multiple times, resulting in a final visual embedding that is a probabilistic combination of the indexed embeddings. This structural approach mirrors the method used for generating textual embeddings. 
Empirical evaluations on various multimodal benchmarks show that Ovis outperforms open-source MLLMs of similar parameter scales and even surpasses the proprietary model Qwen-VL-Plus overall. These results highlight the potential of Ovis' structured visual representation for advancing MLLM architectural design and promoting more effective multimodal learning. Code, datasets, and models are available at https://github.com/AIDC-AI/Ovis.	翻訳日:2024-06-19 04:18:36 公開日:2024-06-17
# 量子最適制御における基底の役割 The Role of Bases in Quantum Optimal Control ( http://arxiv.org/abs/2405.20889v2 ) ライセンス: Link先を確認	Alice Pagano, Matthias M Müller, Tommaso Calarco, Simone Montangero, Phila Rembold,	(参考訳) 量子最適制御(QOC)は、パルスレベルで問題に取り組むことで量子技術の進歩をサポートする: 数値的なアプローチは、有限個の変数で適用された時間依存フィールドをパラメトリすることで、与えられたターゲットに向かって反復的に機能する。結果の最適化の有効性は、問題の複雑さと変数の数に依存する。応用基底の選択が最適化の品質に影響を及ぼすかどうかを問うため、基底関数の観点から異なるパラメトリを考察する。さらに、最も適切な基盤を選択するための戦略も検討する。比較のために,シック基底とシグモイド基底をフーリエ基底の代替として導入する3つの異なるランダム化可能な基底を,複雑さの異なるQOC問題に基づいて検証した。各問題に対して、基底固有の収束速度は、一意のランク付けをもたらす。特にクローズドループでの高価な評価では、最大10倍のスピードアップが最適化の実現可能性に不可欠である。問題依存に基づく基本選択はQOC効率に影響を及ぼす要因であり、そのアプローチに対するアドバイスを提供すると結論付けている。 Quantum Optimal Control (QOC) supports the advance of quantum technologies by tackling its problems at the pulse level: Numerical approaches iteratively work towards a given target by parametrising the applied time-dependent fields with a finite set of variables. The effectiveness of the resulting optimisation depends on the complexity of the problem and the number of variables. We consider different parametrisations in terms of basis functions, asking whether the choice of the applied basis affects the quality of the optimisation. Furthermore, we consider strategies to choose the most suitable basis. For the comparison, we test three different randomisable bases - introducing the sinc and sigmoid bases as alternatives to the Fourier basis - on QOC problems of varying complexity. For each problem, the basis-specific convergence rates result in a unique ranking. Especially for expensive evaluations, e.g., in closed-loop, a potential speed-up by a factor of up to 10 may be crucial for the optimisation's feasibility. We conclude that a problem-dependent basis choice is an influential factor for QOC efficiency and provide advice for its approach.	翻訳日:2024-06-19 04:08:51 公開日:2024-06-17
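The idea of parametrising a time-dependent control field with a finite set of variables, as described in the abstract above, can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names, the plain-sine form of the Fourier element, the shape of the sinc element, and all numeric values are hypothetical and are not taken from the paper.

```python
import math

def fourier_element(t, freq, total_time):
    """One Fourier-type basis function (assumed form: a plain sine)."""
    return math.sin(2.0 * math.pi * freq * t / total_time)

def sinc_element(t, center, width):
    """One sinc-type basis function centred at `center` (assumed form)."""
    x = (t - center) / width
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def pulse(t, coeffs, elements):
    """Control amplitude at time t: sum_k coeffs[k] * elements[k](t)."""
    return sum(c * f(t) for c, f in zip(coeffs, elements))

total_time = 1.0  # hypothetical pulse duration
fourier_basis = [lambda t, k=k: fourier_element(t, k, total_time) for k in (1, 2, 3)]
sinc_basis = [lambda t, c=c: sinc_element(t, c, 0.25) for c in (0.25, 0.5, 0.75)]

# The coefficients are the finite set of variables an optimiser would tune.
coeffs = [0.5, -0.2, 0.1]
samples_fourier = [pulse(i / 10, coeffs, fourier_basis) for i in range(11)]
samples_sinc = [pulse(i / 10, coeffs, sinc_basis) for i in range(11)]
```

The same three coefficients produce different pulse shapes depending on the basis, which is why the basis choice can change how quickly an optimiser converges on a given problem.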
# KnowledgeHub: 科学的発見を支援するエンドツーエンドツール KnowledgeHub: An end-to-end Tool for Assisted Scientific Discovery ( http://arxiv.org/abs/2406.00008v2 ) ライセンス: Link先を確認	Shinnosuke Tanaka, James Barry, Vishnudev Kuruvanthodi, Movina Moses, Maxwell J. Giammona, Nathan Herr, Mohab Elkaref, Geeth De Mel,	(参考訳) 本稿では、知識Hubツール、科学文献情報抽出(IE)および質問回答(QA)パイプラインについて述べる。これはPDF文書がテキストや構造化表現に変換されるのをサポートすることで達成される。オントロジーは、ユーザがキャプチャしたいエンティティとリレーションのタイプを定義するように構築できる。ブラウザベースのアノテーションツールは、オントロジーに従ってPDF文書の内容に注釈を付けることができる。名前付きエンティティ認識(NER)と関係分類(RC)モデルは、結果として得られたアノテーションに基づいてトレーニングすることができ、文書の注釈のない部分を注釈付けするのに使うことができる。これらのエンティティと関係トリプルから知識グラフを構築し、データから洞察を得るためにクエリすることができる。さらに,QAや要約に使用できるLarge Language Models (LLMs) のスイートを統合する。 KnowledgeHubは、アノテーション、IE、QAをサポートするユニークなツールである。 This paper describes the KnowledgeHub tool, a scientific literature Information Extraction (IE) and Question Answering (QA) pipeline. This is achieved by supporting the ingestion of PDF documents that are converted to text and structured representations. An ontology can then be constructed where a user defines the types of entities and relationships they want to capture. A browser-based annotation tool enables annotating the contents of the PDF documents according to the ontology. Named Entity Recognition (NER) and Relation Classification (RC) models can be trained on the resulting annotations and can be used to annotate the unannotated portion of the documents. A knowledge graph is constructed from these entity and relation triples which can be queried to obtain insights from the data. Furthermore, we integrate a suite of Large Language Models (LLMs) that can be used for QA and summarisation that is grounded in the included documents via a retrieval component. KnowledgeHub is a unique tool that supports annotation, IE and QA, which gives the user full insight into the knowledge discovery pipeline.	翻訳日:2024-06-19 04:08:51 公開日:2024-06-17
# 騒音・不確実環境における深部RL用逆流機 Reward Machines for Deep RL in Noisy and Uncertain Environments ( http://arxiv.org/abs/2406.00120v2 ) ライセンス: Link先を確認	Andrew C. Li, Zizhao Chen, Toryn Q. Klassen, Pashootan Vaezipoor, Rodrigo Toro Icarte, Sheila A. McIlraith,	(参考訳) Reward Machinesは、命令、安全性の制約、その他の時間的に拡張された報酬に値する振る舞いを指定するための、オートマチックにインスパイアされた構造を提供する。複雑な報酬関数構造を公開することで、サンプル効率が著しく向上した反実的学習の更新が可能になる。 Reward Machinesは表と奥のRL設定の両方で使われているが、典型的には、報酬関数の構成要素を形成するドメイン固有の語彙の地味な解釈に依存している。このような地味な解釈は、部分的な可観測性やノイズ感知のために、現実世界で多くの場面で解明することができる。本稿では,雑音および不確実な環境における深部RLに対するReward Machinesの利用について検討する。我々はこの問題をPOMDPとして特徴付け、ドメイン固有語彙の不確定な解釈の下でタスク構造を利用するRLアルゴリズムスイートを提案する。理論的解析により,本問題に対する直感的なアプローチの落とし穴が明らかとなり,実験結果から,我々のアルゴリズムはタスク構造をうまく活用し,語彙のノイズの多い解釈下での性能向上を図っている。本研究では,Reward Machinesを部分的に観測可能な環境で活用するための一般的なフレームワークを提供する。 Reward Machines provide an automata-inspired structure for specifying instructions, safety constraints, and other temporally extended reward-worthy behaviour. By exposing complex reward function structure, they enable counterfactual learning updates that have resulted in impressive sample efficiency gains. While Reward Machines have been employed in both tabular and deep RL settings, they have typically relied on a ground-truth interpretation of the domain-specific vocabulary that form the building blocks of the reward function. Such ground-truth interpretations can be elusive in many real-world settings, due in part to partial observability or noisy sensing. In this paper, we explore the use of Reward Machines for Deep RL in noisy and uncertain environments. We characterize this problem as a POMDP and propose a suite of RL algorithms that leverage task structure under uncertain interpretation of domain-specific vocabulary. Theoretical analysis exposes pitfalls in naive approaches to this problem, while experimental results show that our algorithms successfully leverage task structure to improve performance under noisy interpretations of the vocabulary. 
Our results provide a general framework for exploiting Reward Machines in partially observable environments.	翻訳日:2024-06-19 04:08:51 公開日:2024-06-17
# A-SDM:モデルアセンブリと特徴継承戦略による安定拡散の加速 A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies ( http://arxiv.org/abs/2406.00210v3 ) ライセンス: Link先を確認	Jinchao Zhu, Yuxuan Wang, Siyuan Pan, Pengfei Wan, Di Zhang, Gao Huang,	(参考訳) 安定拡散モデル (SDM) はテキスト・トゥ・イメージ (T2I) と画像・ツー・イメージ (I2I) 生成のための一般的かつ効果的なモデルである。サンプル最適化、モデル蒸留、ネットワーク量子化の様々な試みにもかかわらず、これらのアプローチは典型的には元のネットワークアーキテクチャを維持している。広範なパラメータスケールとかなりの計算要求により、モデルアーキテクチャの調整に関する研究は限られている。本研究では,SDMにおける冗長計算の削減に焦点をあて,チューニング不要とチューニング不要の両方の手法を用いてモデルを最適化する。 1) 本手法では, 蒸留により性能を保ちつつ, 軽量モデルを再構築するためのモデル組立戦略を設計する。第2に, プレニングによる性能低下を軽減するため, 圧縮ユニセットにマルチエキスパート条件付き畳み込み(ME-CondConv)を導入し, 速度を犠牲にすることなく, ネットワーク性能を向上させる。第3に,ネットワーク速度向上のためのマルチUNet切替方式の有効性を検証する。 2)チューニング不要な手法では,ネットワーク構造内のブロック,層,単位レベルの局所計算をスキップすることで,推論を高速化する機能継承戦略を提案する。また、時間段階における特徴継承のための複数のサンプリングモードについても検討する。実験により,提案手法とチューニング不要手法の両方がSDMの高速化と性能向上を図っている。モデル組立戦略によって再構成された軽量モデルは、生成速度を22.4%高め、特徴継承戦略はSDM生成速度を40.0%高めにする。 The Stable Diffusion Model (SDM) is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation. Despite various attempts at sampler optimization, model distillation, and network quantification, these approaches typically maintain the original network architecture. The extensive parameter scale and substantial computational demands have limited research into adjusting the model architecture. This study focuses on reducing redundant computation in SDM and optimizes the model through both tuning and tuning-free methods. 1) For the tuning method, we design a model assembly strategy to reconstruct a lightweight model while preserving performance through distillation. Second, to mitigate performance loss due to pruning, we incorporate multi-expert conditional convolution (ME-CondConv) into compressed UNets to enhance network performance by increasing capacity without sacrificing speed. Third, we validate the effectiveness of the multi-UNet switching method for improving network speed. 
2) For the tuning-free method, we propose a feature inheritance strategy to accelerate inference by skipping local computations at the block, layer, or unit level within the network structure. We also examine multiple sampling modes for feature inheritance at the time-step level. Experiments demonstrate that both the proposed tuning and the tuning-free methods can improve the speed and performance of the SDM. The lightweight model reconstructed by the model assembly strategy increases generation speed by 22.4%, while the feature inheritance strategy enhances the SDM generation speed by 40.0%.	翻訳日:2024-06-19 04:08:51 公開日:2024-06-17
# Bileve: 双方向署名によるスポーフィングに対する大規模言語モデルにおけるテキストの保護 Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature ( http://arxiv.org/abs/2406.01946v2 ) ライセンス: Link先を確認	Tong Zhou, Xuandong Zhao, Xiaolin Xu, Shaolei Ren,	(参考訳) 大規模言語モデル(LLM)のテキスト透かしは、ディープフェイクや有害なコンテンツと闘う際の責任評価を約束する機械生成コンテンツの起源を特定するために一般的に用いられてきた。既存の透かし技術は、通常、除去攻撃に対する堅牢性を優先するが、残念ながら、悪質なアクターはLLM生成の応答の意味を微妙に変更したり、有害なコンテンツを偽造したり、LLM開発者の非難を招きかねない。この問題を解決するために、二レベルシグネチャスキームであるBileveを導入する。これは、整合性チェック(スプーフィング攻撃の軽減)のためのきめ細かいシグネチャビットを埋め込むとともに、新しいランクベースのサンプリング戦略により、シグネチャが無効(検出可能性の向上)であるときにテキストソースをトレースする粗いシグネチャビットを埋め込む。バイナリ結果のみを出力する従来の透かし検出器と比較して、Bileveは検出中に5つのシナリオを区別し、テキストの出所を確実に追跡し、LLMを調整できる。 OPT-1.3BとLLaMA-7Bで実施された実験は、検出性を高めたスプーフ攻撃を打破するBileveの有効性を実証した。 Text watermarks for large language models (LLMs) have been commonly used to identify the origins of machine-generated content, which is promising for assessing liability when combating deepfake or harmful content. While existing watermarking techniques typically prioritize robustness against removal attacks, unfortunately, they are vulnerable to spoofing attacks: malicious actors can subtly alter the meanings of LLM-generated responses or even forge harmful content, potentially misattributing blame to the LLM developer. To overcome this, we introduce a bi-level signature scheme, Bileve, which embeds fine-grained signature bits for integrity checks (mitigating spoofing attacks) as well as a coarse-grained signal to trace text sources when the signature is invalid (enhancing detectability) via a novel rank-based sampling strategy. Compared to conventional watermark detectors that only output binary results, Bileve can differentiate 5 scenarios during detection, reliably tracing text provenance and regulating LLMs. The experiments conducted on OPT-1.3B and LLaMA-7B demonstrate the effectiveness of Bileve in defeating spoofing attacks with enhanced detectability.	翻訳日:2024-06-19 04:08:51 公開日:2024-06-17
# 高圧縮率下でのキー情報の保持:LLM用クエリ誘導圧縮器 Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs ( http://arxiv.org/abs/2406.02376v2 ) ライセンス: Link先を確認	Zhiwei Cao, Qian Cao, Yu Lu, Ningxin Peng, Luyang Huang, Shanbo Cheng, Jinsong Su,	(参考訳) 大規模言語モデルの人気が高まり、LLM(Large Language Models)のコンテキスト圧縮への関心が高まった。しかし、圧縮比が増加するにつれて従来の手法の性能は劇的に低下し、時にはクローズドブックレベルにまで低下する。この減少は、圧縮プロセス中にキー情報が失われることに起因する。我々の予備研究はこの仮説を支持し, 高圧縮比下でのモデル性能を維持するために重要な情報を保持することの重要性を強調する。その結果,QGC (Query-Guided Compressor) を導入し,クエリを利用してコンテキスト圧縮プロセスのガイドを行い,圧縮されたコンテキスト内のキー情報を効果的に保存する。さらに、動的圧縮戦略を採用する。提案したQGCの有効性を,NaturalQuestions,TriviaQA,HotpotQAデータセットを含む質問応答タスクで検証する。実験結果から,QGCは高い圧縮比でも一貫した性能を示し,推論コストとスループットの面でも有益であることがわかった。 The growing popularity of Large Language Models has sparked interest in context compression for Large Language Models (LLMs). However, the performance of previous methods degrades dramatically as compression ratios increase, sometimes even falling to the closed-book level. This decline can be attributed to the loss of key information during the compression process. Our preliminary study supports this hypothesis, emphasizing the significance of retaining key information to maintain model performance under high compression ratios. As a result, we introduce Query-Guided Compressor (QGC), which leverages queries to guide the context compression process, effectively preserving key information within the compressed context. Additionally, we employ a dynamic compression strategy. We validate the effectiveness of our proposed QGC on the Question Answering task, including NaturalQuestions, TriviaQA, and HotpotQA datasets. Experimental results show that QGC can consistently perform well even at high compression ratios, which also offers significant benefits in terms of inference cost and throughput.	翻訳日:2024-06-19 04:08:51 公開日:2024-06-17
# モニタリングされた単一粒子動力学における多フラクタル性 Multifractality in monitored single-particle dynamics ( http://arxiv.org/abs/2406.02386v2 ) ライセンス: Link先を確認	Kohei Yajima, Hisanori Oshima, Ken Mochizuki, Yohei Fuji,	(参考訳) 繰り返し測定した単一粒子の時間発展におけるマルチフラクタル特性について検討した。量子系では、局所的なユニタリゲートと局所射影測定からなる回路モデルを考える。古典系では,局所遷移過程下で発達した粒子の軌道を部分的に測定することで推定するモデルを考える。どちらの場合も、波動関数のアンサンブルや、十分に長い時間経過後に測定結果に条件付けられた確率分布にマルチフラクタルな挙動が現れる。粒子輸送の性質(拡散性または弾道性)は、多フラクタル特性に質的に影響を及ぼすが、測定速度や特定のプロトコルに対してさえ定量的に堅牢である。一方、多フラクタル性は、誤った結果が得られるような一般化された測定や、粒子検出のない結果のポストセレクションによって一般的に失われる。数値シミュレーションによりこれらの特性を実証し、また、監視された単一粒子系における多重フラクタル特性を解析的に得るために、いくつかの単純化されたモデルを提案する。 We study multifractal properties in time evolution of a single particle subject to repeated measurements. For quantum systems, we consider circuit models consisting of local unitary gates and local projective measurements. For classical systems, we consider models for estimating the trajectory of a particle evolved under local transition processes by partially measuring particle occupations. In both cases, multifractal behaviors appear in the ensemble of wave functions or probability distributions conditioned on measurement outcomes after a sufficiently long time. While the nature of particle transport (diffusive or ballistic) qualitatively affects the multifractal properties, they are even quantitatively robust to the measurement rate or specific protocols. On the other hand, multifractality is generically lost by generalized measurements allowing erroneous outcomes or by postselection of the outcomes with no particle detection. We demonstrate these properties by numerical simulations and also propose several simplified models, which allow us to analytically obtain multifractal properties in the monitored single-particle systems.	翻訳日:2024-06-19 04:08:51 公開日:2024-06-17
# 深層強化学習による自動微分の最適化 Optimizing Automatic Differentiation with Deep Reinforcement Learning ( http://arxiv.org/abs/2406.05027v2 ) ライセンス: Link先を確認	Jamie Lohoff, Emre Neftci,	(参考訳) 自動微分を持つ計算ジャコビアン(英語版)は、機械学習、計算流体力学、ロボット工学、ファイナンスなど、多くの科学分野においてユビキタスである。ヤコビアン計算における計算量やメモリ使用量の小さな削減でさえ、既にエネルギー消費と実行時の大幅な削減を招いている。このような貯蓄を許容する多くの方法が存在するが、それらは一般に、正確なヤコビアンを近似するために計算効率を交換する。本稿では、深い強化学習(RL)とクロスカントリー除去という概念を活用して、ジャコビアン計算に必要な乗算数を最適化する新しい手法を提案する。クロスカントリー除去は、ジャコビアン累積を計算グラフ上の全ての頂点の順序づけられた除去として表現する自動微分のフレームワークであり、全ての除去が一定の計算コストを発生させる。本稿では,RLエージェントがプレイする単一プレイヤーゲームとして必要な乗算数を最小化する最適消去順序の探索を定式化する。本手法は,様々な領域から取得した複数のタスクに対して,最先端の手法よりも最大33%改善できることを実証する。さらに、これらの理論的なゲインは、得られた除去順序を効率的に実行可能なJAXのクロスカントリー除去インタプリタを提供することにより、実際のランタイム改善に変換されることを示す。 Computing Jacobians with automatic differentiation is ubiquitous in many scientific domains such as machine learning, computational fluid dynamics, robotics and finance. Even small savings in the number of computations or memory usage in Jacobian computations can already incur massive savings in energy consumption and runtime. While there exist many methods that allow for such savings, they generally trade computational efficiency for approximations of the exact Jacobian. In this paper, we present a novel method to optimize the number of necessary multiplications for Jacobian computation by leveraging deep reinforcement learning (RL) and a concept called cross-country elimination while still computing the exact Jacobian. Cross-country elimination is a framework for automatic differentiation that phrases Jacobian accumulation as ordered elimination of all vertices on the computational graph where every elimination incurs a certain computational cost. We formulate the search for the optimal elimination order that minimizes the number of necessary multiplications as a single player game which is played by an RL agent. 
We demonstrate that this method achieves up to 33% improvements over state-of-the-art methods on several relevant tasks taken from diverse domains. Furthermore, we show that these theoretical gains translate into actual runtime improvements by providing a cross-country elimination interpreter in JAX that can efficiently execute the obtained elimination orders.	翻訳日:2024-06-19 02:10:30 公開日:2024-06-17
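To make the cost model behind cross-country elimination concrete, the following is a minimal sketch of vertex elimination on a tiny linearized computational graph. It is not the authors' implementation or their JAX interpreter: the graph, the edge values, and the function name `eliminate` are made up for illustration. Each edge stores a local partial derivative, and eliminating a vertex costs one multiplication per (predecessor, successor) pair.

```python
def eliminate(edges, order):
    """Eliminate vertices in `order`; return (remaining edges, multiplications)."""
    edges = dict(edges)  # copy of {(src, dst): local partial derivative}
    mults = 0
    for v in order:
        preds = [(i, c) for (i, j), c in edges.items() if j == v]
        succs = [(k, c) for (j, k), c in edges.items() if j == v]
        for i, c_in in preds:
            for k, c_out in succs:
                # Chain rule: fold the path i -> v -> k into the edge i -> k.
                edges[(i, k)] = edges.get((i, k), 0.0) + c_in * c_out
                mults += 1
        # Drop every edge that touches the eliminated vertex.
        edges = {e: c for e, c in edges.items() if v not in e}
    return edges, mults

# Hypothetical graph: input 0, intermediates 1, 2, 3, outputs 4, 5, 6.
graph = {
    (0, 1): 2.0, (0, 2): 3.0,
    (1, 3): 5.0, (2, 3): 7.0,
    (3, 4): 1.0, (3, 5): 2.0, (3, 6): 3.0,
}

jac_fwd, cost_fwd = eliminate(graph, [1, 2, 3])  # forward-like order
jac_rev, cost_rev = eliminate(graph, [3, 2, 1])  # reverse-like order

print(cost_fwd, cost_rev)  # prints: 5 12
print(jac_fwd == jac_rev)  # prints: True
```

Both orders yield the same exact Jacobian entries, but at different multiplication counts. Searching over such orders is the combinatorial problem the abstract describes handing to an RL agent.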
# VALL-E 2:ニューラルコーデック言語モデルは人間と同等のゼロショット音声合成器である VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers ( http://arxiv.org/abs/2406.05370v2 ) ライセンス: Link先を確認	Sanyuan Chen, Shujie Liu, Long Zhou, Yanqing Liu, Xu Tan, Jinyu Li, Sheng Zhao, Yao Qian, Furu Wei,	(参考訳) 本稿では,ゼロショットテキスト音声合成(TTS)における節目となる,ニューラルコーデック言語モデルの最新の進歩であるVALL-E 2を紹介する。繰り返し認識サンプリング(Repetition Aware Sampling)は、デコード履歴におけるトークンの繰り返しを考慮して、元の核サンプリングプロセスを洗練する。復号化を安定化するだけでなく、無限ループ問題を回避している。 Grouped Code Modelingは、コーデックコードをグループに編成してシーケンス長を効果的に短縮する。 LibriSpeech と VCTK を用いた実験により,VALL-E 2 は音声の頑健性,自然性,話者の類似性において,従来のシステムを上回っていることがわかった。この種のベンチマークで人間と同等に到達したのは、これが初めてのことだ。さらに、VALL-E 2は、その複雑さや繰り返し句によって伝統的に困難な文であっても、高品質な音声を一貫して合成する。この研究の利点は、失語症のある人や筋萎縮性側索硬化症を持つ人のためのスピーチを生成するなど、貴重な努力に寄与する可能性がある。 VALL-E 2のデモはhttps://aka.ms/valle2を参照。 This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in the decoding history. It not only stabilizes the decoding but also circumvents the infinite loop issue. Grouped Code Modeling organizes codec codes into groups to effectively shorten the sequence length, which not only boosts inference speed but also addresses the challenges of long sequence modeling. Our experiments on the LibriSpeech and VCTK datasets show that VALL-E 2 surpasses previous systems in speech robustness, naturalness, and speaker similarity. It is the first of its kind to reach human parity on these benchmarks. Moreover, VALL-E 2 consistently synthesizes high-quality speech, even for sentences that are traditionally challenging due to their complexity or repetitive phrases. 
The advantages of this work could contribute to valuable endeavors, such as generating speech for individuals with aphasia or people with amyotrophic lateral sclerosis. See https://aka.ms/valle2 for demos of VALL-E 2.	翻訳日:2024-06-19 02:10:30 公開日:2024-06-17
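The abstract says only that Repetition Aware Sampling refines nucleus sampling by accounting for token repetition in the decoding history. The sketch below shows one plausible way to do that and should be read as an assumption, not as the published algorithm: the window size, the repetition threshold, the fallback rule (redrawing from the full distribution when the drawn token dominates the recent history), and both function names are hypothetical.

```python
import random

def nucleus_sample(probs, top_p, rng):
    """Sample an index from the smallest set of tokens whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

def repetition_aware_sample(probs, history, rng, top_p=0.8, window=10, max_ratio=0.5):
    """Nucleus sampling with a repetition check on recent history (assumed rule)."""
    token = nucleus_sample(probs, top_p, rng)
    recent = history[-window:]
    ratio = recent.count(token) / len(recent) if recent else 0.0
    if ratio > max_ratio:
        # Assumed fallback: redraw from the full distribution so that
        # low-probability tokens get a chance to break the loop.
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token

rng = random.Random(0)
probs = [0.6, 0.3, 0.1]
first = repetition_aware_sample(probs, [], rng, top_p=0.5)        # nucleus is {0}
later = repetition_aware_sample(probs, [0] * 10, rng, top_p=0.5)  # fallback path
```

With `top_p=0.5` the nucleus contains only token 0, so plain nucleus sampling would emit it forever. The fallback path is what lets tokens 1 and 2 appear once token 0 fills the history window.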
# 高次グラフニューラルネットワークのための高効率トポロジ対応データ拡張 Efficient Topology-aware Data Augmentation for High-Degree Graph Neural Networks ( http://arxiv.org/abs/2406.05482v3 ) ライセンス: Link先を確認	Yurui Lai, Xiaoyang Lin, Renchi Yang, Hongtao Wang,	(参考訳) 近年,グラフニューラルネットワーク(GNN)がグラフ構造化データ学習の強力なツールとして登場し,様々な分野で実りある成功を収めている。 GNNの大多数はメッセージパッシングのパラダイムに従っており、各ノードの表現は隣人の機能を再帰的に集約することで学習される。しかし、このメカニズムは、高次グラフ(HDG)よりも過度にスムーシングと効率上の問題をもたらし、ほとんどのノードには、ソーシャルネットワーク、トランザクショングラフ、電力網など、数十(あるいは数百)の隣人が存在する。さらに、そのようなグラフは通常、リッチで複雑な構造意味論を含み、GNNの機能集約だけではキャプチャが困難である。上記の制限により,HDG上でのGNNのための効率的かつ効果的なフロントマウントデータ拡張フレームワークであるTADを提案する。内部では、TADには2つの重要なモジュールが含まれている。 (i)構造埋め込みによる特徴拡張、及び (ii) トポロジーと属性対応グラフのスパース化。前者は,高効率スケッチ法を用いて,グラフ構造を高品質な構造埋め込みに符号化することにより,拡張ノード特性とモデルキャパシティを向上させる。さらに、グラフ構造や属性から抽出したタスク関連特徴を利用して、第2モジュールは、入力グラフから多数の冗長/ノイズエッジの正確な識別と削減を可能にし、過剰なスムーシングを緩和し、HDGよりも高速な特徴集約を容易にする。経験的に、TADはノード分類の観点から8つの実ホモ親和性/ヘテロ親和性HDG上でのメインストリームGNNモデルの予測性能を著しく改善し、効率的なトレーニングと推論プロセスを実現している。 In recent years, graph neural networks (GNNs) have emerged as a potent tool for learning on graph-structured data and won fruitful successes in varied fields. The majority of GNNs follow the message-passing paradigm, where representations of each node are learned by recursively aggregating features of its neighbors. However, this mechanism brings severe over-smoothing and efficiency issues over high-degree graphs (HDGs), wherein most nodes have dozens (or even hundreds) of neighbors, such as social networks, transaction graphs, power grids, etc. Additionally, such graphs usually encompass rich and complex structure semantics, which are hard to capture merely by feature aggregations in GNNs. Motivated by the above limitations, we propose TADA, an efficient and effective front-mounted data augmentation framework for GNNs on HDGs. Under the hood, TADA includes two key modules: (i) feature expansion with structure embeddings, and (ii) topology- and attribute-aware graph sparsification. 
The former obtains augmented node features and enhanced model capacity by encoding the graph structure into high-quality structure embeddings with our highly-efficient sketching method. Further, by exploiting task-relevant features extracted from graph structures and attributes, the second module enables the accurate identification and reduction of numerous redundant/noisy edges from the input graph, thereby alleviating over-smoothing and facilitating faster feature aggregations over HDGs. Empirically, TADA considerably improves the predictive performance of mainstream GNN models on 8 real homophilic/heterophilic HDGs in terms of node classification, while achieving efficient training and inference processes.	翻訳日:2024-06-19 02:10:30 公開日:2024-06-17
# DomainRAG: ドメイン固有検索拡張世代評価のための中国語ベンチマーク DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation ( http://arxiv.org/abs/2406.05654v2 ) ライセンス: Link先を確認	Shuting Wang, Jiongnan Liu, Shiren Song, Jiehan Cheng, Yuqi Fu, Peidong Guo, Kun Fang, Yutao Zhu, Zhicheng Dou,	(参考訳) Retrieval-Augmented Generation (RAG) は,幻覚やリアルタイム更新の維持の難しさといった,Large Language Models (LLM) のさまざまな制限に対処する,有望なソリューションを提供する。 LLMが専門家の知識をカバーするのに苦労する専門家やドメイン固有のアプリケーションでは、このアプローチは特に重要である。したがって、このようなシナリオにおけるRAGモデルの評価は極めて重要であるが、最近の研究は、共通センスの問題を解決する際のモデルの能力を評価するために、ウィキペディアのような一般的な知識ソースに依存していることが多い。本稿では,ドメイン固有の文脈,大学入学におけるRAG設定によるLCMの評価を行った。 RAGモデルに必要な機能として,会話RAGの能力,構造情報の分析,外部知識への忠実さ,妄想,時間依存問題の解決,多文書間相互作用の理解など6つを同定した。各機能は、RAGモデルのパフォーマンスを評価するために、共有コーパスに関連付けられたデータセットを持つ。 Llama,Baichuan,ChatGLM,GPTモデルなどのLLMの評価を行った。実験の結果,既存の閉書 LLM はドメイン固有の問題に悩まされており,専門家の問題を解決するためのRAG モデルの必要性を強調している。さらに、RAGモデルは、会話履歴の理解、構造情報の分析、装飾、多文書間相互作用の処理、専門家の知識への忠実さなどの能力を向上させる余地がある。今後の研究がこれらの問題をよりよく解決することを期待している。 Retrieval-Augmented Generation (RAG) offers a promising solution to address various limitations of Large Language Models (LLMs), such as hallucination and difficulties in keeping up with real-time updates. This approach is particularly critical in expert and domain-specific applications where LLMs struggle to cover expert knowledge. Therefore, evaluating RAG models in such scenarios is crucial, yet current studies often rely on general knowledge sources like Wikipedia to assess the models' abilities in solving common-sense problems. In this paper, we evaluated LLMs by RAG settings in a domain-specific context, college enrollment. We identified six required abilities for RAG models, including the ability in conversational RAG, analyzing structural information, faithfulness to external knowledge, denoising, solving time-sensitive problems, and understanding multi-document interactions. Each ability has an associated dataset with shared corpora to evaluate the RAG models' performance. 
We evaluated popular LLMs such as Llama, Baichuan, ChatGLM, and GPT models. Experimental results indicate that existing closed-book LLMs struggle with domain-specific questions, highlighting the need for RAG models to solve expert problems. Moreover, there is room for RAG models to improve their abilities in comprehending conversational history, analyzing structural information, denoising, processing multi-document interactions, and faithfulness in expert knowledge. We expect future studies could solve these problems better.	翻訳日:2024-06-19 02:00:43 公開日:2024-06-17
# 幾何学における接地連続表現:等変ニューラル場 Grounding Continuous Representations in Geometry: Equivariant Neural Fields ( http://arxiv.org/abs/2406.05753v3 ) ライセンス: Link先を確認	David R Wessels, David M Knigge, Samuele Papa, Riccardo Valperga, Sharvaree Vadgama, Efstratios Gavves, Erik J Bekkers,	(参考訳) 近年,ニューラルフィールドは連続的な信号を表現するための強力なモデリングパラダイムとして出現している。条件付きニューラルフィールドでは、フィールドはNeFを条件とする潜在変数で表現され、そのパラメータはデータセット全体にわたって共有される。クロスアテンション・トランスフォーマーをベースとした同変ニューラル場を提案する。NeFは、潜在点雲である幾何条件変数に条件付けされ、潜在変数からフィールドへの同変復号を可能にする。我々の同変的アプローチは、場と潜在変数の両方が幾何学的に接地され、場が変換されれば潜在表現もそれに応じて変換され、その逆も成り立つという変換法則に従うステアビリティ特性を誘導する。重要なこととして、同変関係は、(1)潜在変数が幾何学的パターンを忠実に表現でき、潜在空間における幾何学的推論が可能であること、(2)空間的に類似したパターンで重みを共有でき、フィールドのデータセットの効率的な学習を可能にすることを保証する。これらの主な特性は、他の非同変NeFアプローチと比較して、分類実験とデータセット全体を適合させる能力の検証によって検証される。さらに,一意な局所フィールド編集特性を示すことで,ENFの可能性を検証した。 Recently, Neural Fields have emerged as a powerful modelling paradigm to represent continuous signals. In a conditional neural field, a field is represented by a latent variable that conditions the NeF, whose parametrisation is otherwise shared over an entire dataset. We propose Equivariant Neural Fields based on cross attention transformers, in which NeFs are conditioned on a geometric conditioning variable, a latent point cloud, that enables an equivariant decoding from latent to field. Our equivariant approach induces a steerability property by which both field and latent are grounded in geometry and amenable to transformation laws: if the field transforms, the latent representation transforms accordingly, and vice versa. Crucially, the equivariance relation ensures that the latent is capable of (1) representing geometric patterns faithfully, allowing for geometric reasoning in latent space, (2) weight sharing over spatially similar patterns, allowing for efficient learning of datasets of fields. These main properties are validated using classification experiments and a verification of the capability of fitting entire datasets, in comparison to other non-equivariant NeF approaches. 
We further validate the potential of ENFs by demonstrating unique local field editing properties.	翻訳日:2024-06-19 02:00:43 公開日:2024-06-17
# マルチモーダル気候変化を考慮した作物収量予測のためのオープンかつ大規模データセット An Open and Large-Scale Dataset for Multi-Modal Climate Change-aware Crop Yield Predictions ( http://arxiv.org/abs/2406.06081v2 ) ライセンス: Link先を確認	Fudong Lin, Kaleb Guillot, Summer Crawford, Yihe Zhang, Xu Yuan, Nian-Feng Tzeng,	(参考訳) 正確な収穫予測は、食料の安全と持続可能な農業慣行を保証するために国家的に重要である。 AI-for-scienceアプローチは、薬物発見や降水流キャストなど、多くの科学的問題を解決する上で有望な成果を示したが、作物収量を予測するディープラーニングモデルの開発は、十分な情報を満たすために、複数のモダリティを持つオープンで大規模なディープラーニング対応データセットが欠如していることによって、常に妨げられている。これを改善するために,米国(アメリカ合衆国)大陸の気候変化を考慮した収量予測を対象とする,最初のテラバイト規模の,公開可能なマルチモーダルデータセットであるCropNetデータセットを紹介した。私たちのCropNetデータセットは、3つのデータ、すなわちSentinel-2 Imagery、WRF-HRRR Computed Dataset、USDA Crop Datasetで構成されており、6年間にわたる2200以上の米国郡(2017-2022年)で、短期間に成長する季節変動と長期気候変動の両方が収穫量に与える影響を考慮し、タイムリーかつ正確に郡レベルでの収穫量を予測するための多目的ディープラーニングモデルの開発を促進することが期待されている。さらに、CropNetパッケージを開発し、3種類のAPIを提供し、研究者が興味のある時間と領域でCropNetデータをダウンロードしやすくし、正確な収量予測のためのディープラーニングモデルを柔軟に構築する。気候変化を考慮した作物収量予測におけるCropNetデータセットの適用性と有効性を検証した。 Precise crop yield predictions are of national importance for ensuring food security and sustainable agricultural practices. While AI-for-science approaches have exhibited promising achievements in solving many scientific problems such as drug discovery, precipitation nowcasting, etc., the development of deep learning models for predicting crop yields is constantly hindered by the lack of an open and large-scale deep learning-ready dataset with multiple modalities to accommodate sufficient information. To remedy this, we introduce the CropNet dataset, the first terabyte-sized, publicly available, and multi-modal dataset specifically targeting climate change-aware crop yield predictions for the contiguous United States (U.S.) continent at the county level. Our CropNet dataset is composed of three modalities of data, i.e., Sentinel-2 Imagery, WRF-HRRR Computed Dataset, and USDA Crop Dataset, for over 2200 U.S. 
counties spanning 6 years (2017-2022), expected to facilitate researchers in developing versatile deep learning models for timely and precisely predicting crop yields at the county-level, by accounting for the effects of both short-term growing season weather variations and long-term climate change on crop yields. Besides, we develop the CropNet package, offering three types of APIs, for facilitating researchers in downloading the CropNet data on the fly over the time and region of interest, and flexibly building their deep learning models for accurate crop yield predictions. Extensive experiments have been conducted on our CropNet dataset via employing various types of deep learning solutions, with the results validating the general applicability and the efficacy of the CropNet dataset in climate change-aware crop yield predictions.	翻訳日:2024-06-19 02:00:43 公開日:2024-06-17
# SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM ( http://arxiv.org/abs/2406.06571v2 ) License: see link	Quandong Wang, Yuxuan Yuan, Xiaoyu Yang, Ruike Zhang, Kang Zhao, Wei Liu, Jian Luan, Daniel Povey, Bin Wang,	While Large Language Models (LLMs) have achieved remarkable success in various fields, the efficiency of training and inference remains a major challenge. To address this issue, we propose SUBLLM, short for Subsampling-Upsampling-Bypass Large Language Model, an innovative architecture that extends the core decoder-only framework by incorporating subsampling, upsampling, and bypass modules. The subsampling modules shorten the sequence, the upsampling modules restore the sequence length, and the bypass modules enhance convergence. In comparison to LLaMA, the proposed SUBLLM exhibits significant enhancements in both training and inference speeds as well as memory usage, while maintaining competitive few-shot performance. During training, SUBLLM increases speed by 26% and cuts memory by 10GB per GPU. In inference, it boosts speed by up to 37% and reduces memory by 1GB per GPU. The training and inference speeds can be enhanced by 34% and 52%, respectively, when the context window is expanded to 8192. We shall release the source code of the proposed architecture in the published version.	Translated: 2024-06-19 02:00:43 Published: 2024-06-17
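The SUBLLM abstract names three module types (subsampling, upsampling, bypass) without giving their internals. The following is only a minimal sketch of the length bookkeeping such a path implies. The score-based token selection, the function names `subsample` and `upsample`, and the use of plain floats as stand-ins for hidden states are all assumptions made for illustration, not the paper's mechanism.

```python
def subsample(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens, preserving original order.

    Assumption: selection by a per-token score; the abstract does not
    say how tokens are chosen.
    """
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_idx = sorted(ranked[:keep])
    return [tokens[i] for i in kept_idx], kept_idx


def upsample(short_seq, kept_idx, bypass):
    """Restore the original length.

    Kept positions take the processed values; dropped positions fall back
    to the bypassed (pre-subsampling) values. This fallback is a
    hypothetical reading of "bypass", used here only to make lengths match.
    """
    restored = list(bypass)
    for value, i in zip(short_seq, kept_idx):
        restored[i] = value
    return restored


tokens = [10.0, 20.0, 30.0, 40.0]   # stand-ins for hidden states
scores = [0.1, 0.9, 0.5, 0.7]       # hypothetical importance scores
short, kept_idx = subsample(tokens, scores, keep=2)
processed = [x + 1.0 for x in short]  # stand-in for the inner decoder blocks
restored = upsample(processed, kept_idx, bypass=tokens)
```

The point of the sketch is that the inner blocks see a sequence of length `keep` rather than the full length, which is where any speed or memory saving would come from, while the output has the same length as the input.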
# The Prompt Report: A Systematic Survey of Prompting Techniques ( http://arxiv.org/abs/2406.06608v2 ) License: see link	Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni Gupta, Megan L. Rogers, Inna Goncearenco, Giuseppe Sarli, Igor Galynker, Denis Peskoff, Marine Carpuat, Jules White, Shyamal Anadkat, Alexander Hoyle, Philip Resnik,	Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a prompt due to the area's nascency. This paper establishes a structured understanding of prompts by assembling a taxonomy of prompting techniques and analyzing their use. We present a comprehensive vocabulary of 33 terms, a taxonomy of 58 text-only prompting techniques, and 40 techniques for other modalities. We further present a meta-analysis of the entire literature on natural language prefix-prompting.	Translated: 2024-06-19 02:00:43 Published: 2024-06-17
# Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture ( http://arxiv.org/abs/2406.06652v2 ) License: see link	Yubin Xiao, Di Wang, Xuan Wu, Yuesong Wu, Boyang Li, Wei Du, Liupu Wang, You Zhou,	Neural models produce promising results when solving Vehicle Routing Problems (VRPs), but often fall short in generalization. Recent attempts to enhance model generalization often incur unnecessarily large training costs or cannot be directly applied to other models solving different VRP variants. To address these issues, we take a novel perspective on model architecture in this study. Specifically, we propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder to enhance size and distribution generalization, respectively. ESF adjusts the attention weight pattern of the model towards familiar ones discovered during training when solving VRPs of varying sizes. The DS decoder explicitly models VRPs of multiple training distribution patterns through multiple auxiliary light decoders, expanding the model representation space to encompass a broader range of distributional scenarios. We conduct extensive experiments on both synthetic and widely recognized real-world benchmarking datasets and compare the performance with seven baseline models. The results demonstrate the effectiveness of using ESF and the DS decoder to obtain a more generalizable model and showcase their applicability to different VRP variants, namely the travelling salesman problem and the capacitated VRP. Notably, our proposed generic components require minimal computational resources and can be effortlessly integrated into conventional generalization strategies to further elevate model generalization.	Translated: 2024-06-19 02:00:43 Published: 2024-06-17
# Flexible Parametric Inference for Space-Time Hawkes Processes ( http://arxiv.org/abs/2406.06849v2 ) License: see link	Emilia Siviero, Guillaume Staerman, Stephan Clémençon, Thomas Moreau,	Many modern spatio-temporal data sets, in sociology, epidemiology or seismology, for example, simultaneously exhibit self-exciting characteristics, triggering and clustering behaviors that a suitable Hawkes space-time process can accurately capture. This paper aims to develop a fast and flexible parametric inference technique to recover the parameters of the kernel functions involved in the intensity function of a space-time Hawkes process based on such data. Our statistical approach combines three key ingredients: 1) kernels with finite support are considered, 2) the space-time domain is appropriately discretized, and 3) (approximate) precomputations are used. The inference technique we propose then consists of an $\ell_2$ gradient-based solver that is fast and statistically accurate. In addition to describing the algorithmic aspects, numerical experiments have been carried out on synthetic and real spatio-temporal data, providing solid empirical evidence of the relevance of the proposed methodology.	Translated: 2024-06-19 01:50:51 Published: 2024-06-17
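The three ingredients listed in the Hawkes abstract (finite-support kernels, a discretized domain, precomputation) can be illustrated with a small sketch. It makes several simplifying assumptions: it is purely temporal rather than space-time, the kernel is a truncated (not renormalized) exponential, and all function and parameter names are hypothetical. The loss is the standard discretized least-squares contrast for point processes, the integral of the squared intensity minus twice the sum of the intensity at the events, and is not claimed to match the paper's exact objective.

```python
import math


def truncated_exp_kernel(decay, support, step):
    """Precompute kernel values on the grid for lags 1..support/step.

    Finite support: lags beyond `support` contribute nothing.
    """
    n_lags = int(round(support / step))
    return [decay * math.exp(-decay * j * step) for j in range(1, n_lags + 1)]


def bin_events(event_times, horizon, step):
    """Discretize the time axis: count events per grid cell."""
    n_bins = int(round(horizon / step))
    counts = [0] * n_bins
    for t in event_times:
        counts[min(int(t / step), n_bins - 1)] += 1
    return counts


def discretized_intensity(counts, kernel, baseline, alpha):
    """Intensity on the grid: baseline + alpha * (kernel convolved with counts)."""
    lam = []
    for k in range(len(counts)):
        excitation = 0.0
        for j, weight in enumerate(kernel, start=1):
            if k - j >= 0:
                excitation += weight * counts[k - j]
        lam.append(baseline + alpha * excitation)
    return lam


def l2_loss(lam, counts, step):
    """Discretized least-squares contrast: step * sum(lam^2) - 2 * sum(counts * lam)."""
    quadratic = step * sum(value * value for value in lam)
    linear = sum(c * value for c, value in zip(counts, lam))
    return quadratic - 2.0 * linear


counts = bin_events([0.2], horizon=4.0, step=1.0)
kernel = truncated_exp_kernel(decay=1.0, support=2.0, step=1.0)
lam = discretized_intensity(counts, kernel, baseline=0.5, alpha=0.8)
loss = l2_loss(lam, counts, step=1.0)
```

Because the kernel table is computed once and has a fixed number of lags, each intensity evaluation costs a bounded number of operations regardless of how many events precede it. A gradient-based solver would then minimize `l2_loss` over `baseline`, `alpha`, and `decay`.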
# Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation ( http://arxiv.org/abs/2406.06978v2 ) License: see link	Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez,	We propose Hydra-MDP, a novel paradigm employing multiple teachers in a teacher-student model. This approach uses knowledge distillation from both human and rule-based teachers to train the student model, which features a multi-head decoder to learn diverse trajectory candidates tailored to various evaluation metrics. With the knowledge of rule-based teachers, Hydra-MDP learns how the environment influences planning in an end-to-end manner instead of resorting to non-differentiable post-processing. This method achieves $1^{st}$ place in the Navsim challenge, demonstrating significant improvements in generalization across diverse driving environments and conditions. Code will be available at https://github.com/woxihuanjiangguo/Hydra-MDP	Translated: 2024-06-19 01:50:51 Published: 2024-06-17
# Breaking Free: Efficient Multi-Party Private Set Union Without Non-Collusion Assumptions ( http://arxiv.org/abs/2406.07011v2 ) License: see link	Minglang Dong, Yu Chen, Cong Zhang, Yujie Bai,	A multi-party private set union (MPSU) protocol enables $m$ $(m > 2)$ parties, each holding a set, to collectively compute the union of their sets without revealing any additional information to other parties. There are two main categories of MPSU protocols. The first builds on public-key techniques. All existing works in this category involve a super-linear number of public-key operations, resulting in poor practical efficiency. The second builds on oblivious transfer and symmetric-key techniques. The only existing work in this category is proposed by Liu and Gao (ASIACRYPT 2023), which features the best concrete performance among all existing protocols, despite its super-linear computation and communication. Unfortunately, it does not achieve standard semi-honest security, as it inherently relies on a non-collusion assumption, which is unlikely to hold in practice. Therefore, the problem of constructing a practical MPSU protocol based on oblivious transfer and symmetric-key techniques in the standard semi-honest model remains open. Furthermore, there is no MPSU protocol achieving both linear computation and linear communication complexity, which leaves another unresolved problem. In this work, we resolve these two open problems. We propose the first MPSU protocol based on oblivious transfer and symmetric-key techniques in the standard semi-honest model. This protocol is $4.9-9.3 \times$ faster than Liu and Gao in the LAN setting. Concretely, our protocol requires only $3.6$ seconds in the online phase for 3 parties with sets of $2^{20}$ items each. We also propose the first MPSU protocol achieving both linear computation and linear communication complexity, based on public-key operations. This protocol has the lowest overall communication costs and shows a factor of $3.0-36.5\times$ improvement in terms of overall communication compared to Liu and Gao.	Translated: 2024-06-19 01:50:51 Published: 2024-06-17
# Bridging Language Gaps in Audio-Text Retrieval ( http://arxiv.org/abs/2406.07012v2 ) License: see link	Zhiyong Yan, Heinrich Dinkel, Yongqing Wang, Jizhong Liu, Junbo Zhang, Yujun Wang, Bin Wang,	Audio-text retrieval is a challenging task, requiring the search for an audio clip or a text caption within a database. The predominant focus of existing research on English descriptions limits the applicability of such models, given the abundance of non-English content in real-world data. To address these linguistic disparities, we propose a language enhancement (LE), using a multilingual text encoder (SONAR) to encode the text data with language-specific information. Additionally, we optimize the audio encoder through the application of consistent ensemble distillation (CED), enhancing support for variable-length audio-text retrieval. Our methodology excels in English audio-text retrieval, demonstrating state-of-the-art (SOTA) performance on commonly used datasets such as AudioCaps and Clotho. Simultaneously, the approach exhibits proficiency in retrieving content in seven other languages with only 10% of additional language-enhanced training data, yielding promising results. The source code is publicly available at https://github.com/zyyan4/ml-clap	Translated: 2024-06-19 01:50:51 Published: 2024-06-17
# Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph ( http://arxiv.org/abs/2406.07113v2 ) License: see link	Sergey Linok, Tatiana Zemskova, Svetlana Ladanova, Roman Titkov, Dmitry Yudin,	Locating objects referred to in natural language poses a significant challenge for autonomous agents. Existing CLIP-based open-vocabulary methods successfully perform 3D object retrieval with simple (bare) queries but cannot cope with ambiguous descriptions that demand an understanding of object relations. To tackle this problem, we propose a modular approach called BBQ (Beyond Bare Queries), which constructs a 3D scene spatial graph representation with metric edges and utilizes a large language model as a human-to-agent interface through our deductive scene reasoning algorithm. BBQ employs robust DINO-powered associations to form 3D objects, an advanced raycasting algorithm to project them to 2D, and a vision-language model to describe them as graph nodes. On the Replica and ScanNet datasets, we show that the designed method accurately constructs 3D object-centric maps. We demonstrate that their quality takes a leading place for open-vocabulary 3D semantic segmentation against other zero-shot methods. We also show that leveraging spatial relations is especially effective for scenes containing multiple entities of the same semantic class. On the Sr3D and Nr3D benchmarks, our deductive approach demonstrates a significant improvement over other state-of-the-art methods in retrieving objects by complex queries. Owing to our design choices, we achieve a processing speed approximately three times faster than the closest analog. This promising performance makes our approach suitable for applied intelligent robotics projects. We make the code publicly available at linukc.github.io/bbq/.	Translated: 2024-06-19 01:50:51 Published: 2024-06-17
# Semantic-Aware Spectrum Sharing in Internet of Vehicles Based on Deep Reinforcement Learning ( http://arxiv.org/abs/2406.07213v3 ) License: see link	Zhiyu Shao, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief,	This work aims to investigate semantic communication in high-speed mobile Internet of vehicles (IoV) environments, with a focus on the spectrum sharing between vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. We specifically address spectrum scarcity and network traffic and then propose a semantic-aware spectrum sharing algorithm (SSS) based on the deep reinforcement learning (DRL) soft actor-critic (SAC) approach. Firstly, we delve into the extraction of semantic information. Secondly, we redefine metrics for semantic information in V2V and V2I spectrum sharing in IoV environments, introducing high-speed semantic spectrum efficiency (HSSE) and semantic transmission rate (HSR). Finally, we employ the SAC algorithm for decision optimization in V2V and V2I spectrum sharing based on semantic information. This optimization encompasses the optimal link of V2V and V2I sharing strategies, the transmission power for vehicles sending semantic information, and the length of transmitted semantic symbols, aiming at maximizing the HSSE of V2I and enhancing the success rate of effective semantic information transmission (SRS) of V2V. Experimental results demonstrate that the SSS algorithm outperforms other baseline algorithms, including spectrum sharing algorithms based on traditional communication and spectrum sharing algorithms using other reinforcement learning approaches. The SSS algorithm exhibits a 15% increase in HSSE and approximately a 7% increase in SRS.	Translated: 2024-06-19 01:50:51 Published: 2024-06-17
# Quantifying Local Model Validity using Active Learning ( http://arxiv.org/abs/2406.07474v2 ) License: see link	Sven Lämmle, Can Bogoclu, Robert Voßhall, Anselm Haselhoff, Dirk Roos,	Real-world applications of machine learning models are often subject to legal or policy-based regulations. Some of these regulations require ensuring the validity of the model, i.e., the approximation error being smaller than a threshold. A global metric is generally too insensitive to determine the validity of a specific prediction, whereas evaluating local validity is costly since it requires gathering additional data. We propose learning the model error to acquire a local validity estimate while reducing the amount of required data through active learning. Using model validation benchmarks, we provide empirical evidence that the proposed method can lead to an error model with sufficient discriminative properties using a relatively small amount of data. Furthermore, an increased sensitivity to local changes of the validity bounds compared to alternative approaches is demonstrated.	Translated: 2024-06-19 01:50:51 Published: 2024-06-17
# VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs ( http://arxiv.org/abs/2406.07476v2 ) License: see link	Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing,	In this paper, we present VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video and audio-oriented tasks. Building upon its predecessor, VideoLLaMA 2 incorporates a tailor-made Spatial-Temporal Convolution (STC) connector, which effectively captures the intricate spatial and temporal dynamics of video data. Additionally, we integrate an Audio Branch into the model through joint training, thereby enriching the multimodal understanding capabilities of the model by seamlessly incorporating audio cues. Comprehensive evaluations on multiple-choice video question answering (MC-VQA), open-ended video question answering (OE-VQA), and video captioning (VC) tasks demonstrate that VideoLLaMA 2 consistently achieves competitive results among open-source models and even gets close to some proprietary models on several benchmarks. Furthermore, VideoLLaMA 2 exhibits reasonable improvements in audio-only and audio-video question-answering (AQA & OE-AVQA) benchmarks over existing models. These advancements underline VideoLLaMA 2's superior performance in multimodal comprehension, setting a new standard for intelligent video analysis systems. All models are public to facilitate further research.	Translated: 2024-06-19 01:50:51 Published: 2024-06-17
# Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement ( http://arxiv.org/abs/2406.08096v2 ) License: see link	Runyi Yu, Tianyu He, Ailing Zhang, Yuchi Wang, Junliang Guo, Xu Tan, Chang Liu, Jie Chen, Jiang Bian,	We aim to edit the lip movements in talking video according to the given speech while preserving the personal identity and visual details. The task can be decomposed into two sub-problems: (1) speech-driven lip motion generation and (2) visual appearance synthesis. Current solutions handle the two sub-problems within a single generative model, resulting in a challenging trade-off between lip-sync quality and visual detail preservation. Instead, we propose to disentangle the motion and appearance, and then generate them one by one with a speech-to-motion diffusion model and a motion-conditioned appearance generation model. However, challenges remain in each stage, such as motion-aware identity preservation in (1) and visual detail preservation in (2). Therefore, to preserve personal identity, we adopt landmarks to represent the motion, and further employ a landmark-based identity loss. To capture motion-agnostic visual details, we use separate encoders to encode the lip appearance, non-lip appearance, and motion, and then integrate them with a learned fusion module. We train MyTalk on a large-scale and diverse dataset. Experiments show that our method generalizes well to unseen, even out-of-domain, persons in terms of both lip sync and visual detail preservation. We encourage readers to watch the videos on our project page (https://Ingrid789.github.io/MyTalk/).	Translated: 2024-06-19 01:41:06 Published: 2024-06-17
# Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze ( http://arxiv.org/abs/2406.08379v2 ) License: see link	Michele Mazzamuto, Antonino Furnari, Giovanni Maria Farinella,	In this paper, we address the challenge of unsupervised mistake detection in egocentric video through the analysis of gaze signals, a critical component for advancing user assistance in smart glasses. Traditional supervised methods, reliant on manually labeled mistakes, suffer from domain-dependence and scalability issues. This research introduces an unsupervised method for detecting mistakes in videos of human activities, overcoming the challenges of domain-specific requirements and the necessity for annotated data. By analyzing unusual gaze patterns that signal user disorientation during tasks, we propose a gaze completion model that forecasts eye gaze trajectories from incomplete inputs. The difference between the anticipated and observed gaze paths acts as an indicator for identifying errors. Our method is validated on the EPIC-Tent dataset, showing its superiority compared to current one-class supervised and unsupervised techniques.	Translated: 2024-06-19 01:41:06 Published: 2024-06-17
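The scoring idea in the gaze abstract, using the difference between anticipated and observed gaze paths as a mistake indicator, can be sketched in a few lines. This is a minimal illustration under stated assumptions: gaze is a list of 2D points, the discrepancy is the mean Euclidean distance, and the threshold is a free parameter. The gaze completion model itself is not reproduced, so the predicted trajectory below is just made-up input, and the function names are hypothetical.

```python
import math


def gaze_discrepancy(predicted, observed):
    """Mean Euclidean distance between predicted and observed gaze points."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(p, o) for p, o in zip(predicted, observed))
    return total / len(predicted)


def flag_mistake(predicted, observed, threshold):
    """Flag a window as a likely mistake when gaze deviates beyond the threshold."""
    return gaze_discrepancy(predicted, observed) > threshold


predicted = [(0.0, 0.0), (1.0, 1.0)]       # stand-in for a model forecast
observed_ok = [(0.0, 0.0), (1.0, 1.0)]     # gaze follows the forecast
observed_off = [(3.0, 4.0), (4.0, 5.0)]    # gaze wanders away from it
score_ok = gaze_discrepancy(predicted, observed_ok)
score_off = gaze_discrepancy(predicted, observed_off)
```

No mistake labels are needed to compute the score, which is what makes the setup unsupervised. Only the threshold has to be chosen, for example from the score distribution on ordinary recordings.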
# CIMRL: Combining IMitation and Reinforcement Learning for Safe Autonomous Driving ( http://arxiv.org/abs/2406.08878v2 ) License: see link	Jonathan Booher, Khashayar Rohanimanesh, Junhong Xu, Aleksandr Petiushko,	Modern approaches to autonomous driving rely heavily on learned components trained with large amounts of human driving data via imitation learning. However, these methods require large amounts of expensive data collection and even then face challenges with safely handling long-tail scenarios and compounding errors over time. At the same time, pure Reinforcement Learning (RL) methods can fail to learn performant policies in sparse, constrained, and challenging-to-define reward settings like driving. Both of these challenges make deploying purely cloned policies in safety-critical applications like autonomous vehicles challenging. In this paper we propose the Combining IMitation and Reinforcement Learning (CIMRL) approach, a framework that enables training driving policies in simulation by leveraging imitative motion priors and safety constraints. CIMRL does not require extensive reward specification and improves on the closed-loop behavior of pure cloning methods. By combining RL and imitation, we demonstrate that our method achieves state-of-the-art results in closed-loop simulation driving benchmarks.	Translated: 2024-06-19 01:41:06 Published: 2024-06-17
# Fine-Grained Domain Generalization with Feature Structuralization ( http://arxiv.org/abs/2406.09166v2 ) License: see link	Wenlong Yu, Dongyue Chen, Qilong Wang, Qinghua Hu,	Fine-grained domain generalization (FGDG) is a more challenging task than traditional DG tasks due to its small inter-class variations and relatively large intra-class disparities. When the domain distribution changes, the vulnerability of subtle features leads to a severe deterioration in model performance. Nevertheless, humans inherently demonstrate the capacity for generalizing to out-of-distribution data, leveraging structured multi-granularity knowledge that emerges from discerning the commonality and specificity within categories. Likewise, we propose a Feature Structuralized Domain Generalization (FSDG) model, wherein features are structuralized into common, specific, and confounding segments, aligned with their relevant semantic concepts, to elevate performance in FGDG. Specifically, feature structuralization (FS) is accomplished through joint optimization of five constraints: a decorrelation function applied to disentangled segments, three constraints ensuring common feature consistency and specific feature distinctiveness, and a prediction calibration term. By imposing these stipulations, FSDG is prompted to disentangle and align features based on multi-granularity knowledge, facilitating robust subtle distinctions among categories. Extensive experimentation on three benchmarks consistently validates the superiority of FSDG over state-of-the-art counterparts, with an average improvement of 6.2% in FGDG performance. Beyond that, the explainability analysis on explicit concept matching intensity between the shared concepts among categories and the model channels, along with experiments on various mainstream model architectures, substantiates the validity of FS.	Translated: 2024-06-19 01:41:06 Published: 2024-06-17
# Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions ( http://arxiv.org/abs/2406.09264v2 ) License: see link	Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens,	Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. In particular, ML- and philosophy-oriented alignment research often views AI alignment as a static, unidirectional process (i.e., aiming to ensure that AI systems' objectives match those of humans) rather than an ongoing, mutual alignment problem [429]. This perspective largely neglects the long-term interaction and dynamic changes of alignment. To understand these gaps, we introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML), and others. We characterize, define and scope human-AI alignment. From this, we present a conceptual framework of "Bidirectional Human-AI Alignment" to organize the literature from a human-centered perspective. This framework encompasses both 1) conventional studies of aligning AI to humans, which ensure that AI produces the intended outcomes determined by humans, and 2) a proposed concept of aligning humans to AI, which aims to help individuals and society adjust to AI advancements both cognitively and behaviorally. Additionally, we articulate the key findings derived from literature analysis, including discussions about human values, interaction techniques, and evaluations. To pave the way for future studies, we envision three key challenges for future directions and propose examples of potential future solutions.	Translated: 2024-06-19 01:41:06 Published: 2024-06-17
# フレームは多すぎ、すべてが有用とは限らない: 長尺ビデオQAのための効率的な戦略 Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA ( http://arxiv.org/abs/2406.09396v2 ) ライセンス: Link先を確認	Jongwoo Park, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryoo, Donghyun Kim, Michael S. Ryoo,	(参考訳) 広い時間間隔にまたがる長尺ビデオは、非常に情報冗長であり、しばしばゆるやかな関係を持つ複数の異なるイベントやエンティティを含んでいる。したがって、長尺ビデオ質問応答(LVQA)を行う場合、正しい応答を生成するために必要な情報はすべて、フレームの小さなサブセットに含まれることが多い。近年の文献では、ビデオ内のすべての視覚コンテンツを自然言語に変換するために視覚言語モデル(VLM)に依存しながら、LVQAベンチマークにおける大規模言語モデル(LLM)の使用を調査し、優れた性能を達成している。このようなVLMは、長いビデオから一様にサンプリングされた大量のフレームを独立にキャプションすることが多いが、これは効率的ではなく、ほとんど冗長である。これらの選択を問い直し、キーフレーム選択とシーケンス認識キャプションの最適戦略を探求することで、これらの冗長性を著しく低減できる。本稿では,階層型キーフレームセレクタと逐次型ビジュアルLLMという,各側面を改善する2つの新しいアプローチを提案する。 LVNetと呼ばれるフレームワークは、3つのベンチマークLVQAデータセットにまたがって最先端のパフォーマンスを実現する。私たちのコードは公開される予定である。 Long-form videos that span across wide temporal intervals are highly information redundant and contain multiple distinct events or entities that are often loosely-related. Therefore, when performing long-form video question answering (LVQA), all information necessary to generate a correct response can often be contained within a small subset of frames. Recent literature explores the use of large language models (LLMs) in LVQA benchmarks, achieving exceptional performance, while relying on vision language models (VLMs) to convert all visual content within videos into natural language. Such VLMs often independently caption a large number of frames uniformly sampled from long videos, which is not efficient and can mostly be redundant. Questioning these choices, we explore optimal strategies for key-frame selection and sequence-aware captioning that can significantly reduce these redundancies. We propose two novel approaches that improve each of these aspects, namely Hierarchical Keyframe Selector and Sequential Visual LLM. Our resulting framework, termed LVNet, achieves state-of-the-art performance across three benchmark LVQA datasets. Our code will be released publicly.	翻訳日:2024-06-19 01:31:17 公開日:2024-06-17
# 要件は必要なものすべて: LLMによる要件からコードへ Requirements are All You Need: From Requirements to Code with LLMs ( http://arxiv.org/abs/2406.10101v2 ) ライセンス: Link先を確認	Bingyang Wei,	(参考訳) ソフトウェア要件のドキュメンテーションにおけるテキスト形式の普及は、ソフトウェアエンジニアリングタスクに大規模言語モデル(LLM)を適用する大きな機会を提供する。高品質なソフトウェア要件は、手動のソフトウェア開発プロセスを強化するだけでなく、新興のLLM技術の可能性を組織が完全に活用できるようにする。本稿では,適切に構造化された要求文書からコードスニペットを自動生成するために調整されたLLMを紹介する。このLLMは、ソフトウェア開発プロセス、要件分析、オブジェクト指向設計、テスト駆動開発に関連する知識、ヒューリスティックス、インストラクションで拡張され、経験豊富なソフトウェアエンジニアの専門知識を効果的にエミュレートする。我々は,ソフトウェア技術者が段階的にこのLLMとやり取りできる「プログレッシブ・プロンプティング」手法を導入する。このアプローチを通じて、LLMは、提供された要件を解釈して機能要件を抽出し、これらを使用してオブジェクト指向モデルを作成し、その後、オブジェクト指向設計に基づいて単体テストとコードを生成することで、ソフトウェア開発タスクに段階的に取り組む。複雑なユーザ要件の理解とロバストな設計およびコードソリューションの創出におけるLLMの熟練度を,Webプロジェクトの開発に焦点をあてたケーススタディを通じて実証する。本研究は、LLMをソフトウェア開発ワークフローに統合することで、効率と品質の両方を大幅に向上できる可能性を明らかにする。 この調整済みLLMはhttps://chat.openai.com/g/g-bahoiKzkB-software-engineer-gptで利用可能である。 The pervasive use of textual formats in the documentation of software requirements presents a great opportunity for applying large language models (LLMs) to software engineering tasks. High-quality software requirements not only enhance the manual software development process but also position organizations to fully harness the potential of emerging LLM technology. This paper introduces a tailored LLM for automating the generation of code snippets from well-structured requirements documents. This LLM is augmented with knowledge, heuristics, and instructions that are pertinent to the software development process, requirements analysis, object-oriented design, and test-driven development, effectively emulating the expertise of a seasoned software engineer. We introduce a "Progressive Prompting" method that allows software engineers to engage with this LLM in a stepwise manner. 
Through this approach, the LLM incrementally tackles software development tasks by interpreting the provided requirements to extract functional requirements, using these to create object-oriented models, and subsequently generating unit tests and code based on the object-oriented designs. We demonstrate the LLM's proficiency in comprehending intricate user requirements and producing robust design and code solutions through a case study focused on the development of a web project. This study underscores the potential of integrating LLMs into the software development workflow to significantly enhance both efficiency and quality. The tailored LLM is available at https://chat.openai.com/g/g-bahoiKzkB-software-engineer-gpt.	翻訳日:2024-06-19 01:31:17 公開日:2024-06-17
# 追従(sycophancy)から策略(subterfuge)へ:大規模言語モデルにおける報酬改ざんの検討 Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models ( http://arxiv.org/abs/2406.10162v2 ) ライセンス: Link先を確認	Carson Denison, Monte MacDiarmid, Fazl Barez, David Duvenaud, Shauna Kravec, Samuel Marks, Nicholas Schiefer, Ryan Soklaski, Alex Tamkin, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman, Ethan Perez, Evan Hubinger,	(参考訳) 強化学習では、誤って指定された訓練目標のために高い報酬が与えられる望ましくない振る舞いをAIシステムが学習するとき、仕様ゲーミングが発生する。仕様ゲーミングは、追従(sycophancy)のような単純な行動から、モデルが自身の報酬メカニズムを直接変更する報酬改ざんのような洗練された悪質な行動まで様々である。しかし、こうしたより悪質な行動は、探索によって発見されるには複雑すぎるかもしれない。本稿では,発見が容易な形式の仕様ゲーミングを見つけた大規模言語モデル(LLM)アシスタントが,報酬改ざんを含む,より稀でより露骨な形式へと一般化するかどうかを考察する。段階的に洗練されていくゲーム可能な環境のカリキュラムを構築し、カリキュラム初期の環境での訓練が、残りの環境におけるより多くの仕様ゲーミングに繋がることを示した。注目すべきことに、カリキュラム全体で訓練されたLLMアシスタントは、少ないながらも無視できない割合で、自身の報酬関数を直接書き換える行動へとゼロショットで一般化する。カリキュラム初期の環境をゲーミングしないようにLLMを再訓練すると、後続の環境における報酬改ざんは軽減されるが、排除はされない。さらに、ゲーム可能な環境に無害性の訓練を加えても、報酬改ざんを防ぐことはできない。これらの結果は、LLMが一般的な形式の仕様ゲーミングからより悪質な報酬改ざんへと一般化しうること、そしてそのような振る舞いを除去するのは簡単ではない可能性があることを示している。 In reinforcement learning, specification gaming occurs when AI systems learn undesired behaviors that are highly rewarded due to misspecified training goals. Specification gaming can range from simple behaviors like sycophancy to sophisticated and pernicious behaviors like reward-tampering, where a model directly modifies its own reward mechanism. However, these more pernicious behaviors may be too complex to be discovered via exploration. In this paper, we study whether Large Language Model (LLM) assistants which find easily discovered forms of specification gaming will generalize to perform rarer and more blatant forms, up to and including reward-tampering. We construct a curriculum of increasingly sophisticated gameable environments and find that training on early-curriculum environments leads to more specification gaming on remaining environments. 
Strikingly, a small but non-negligible proportion of the time, LLM assistants trained on the full curriculum generalize zero-shot to directly rewriting their own reward function. Retraining an LLM not to game early-curriculum environments mitigates, but does not eliminate, reward-tampering in later environments. Moreover, adding harmlessness training to our gameable environments does not prevent reward-tampering. These results demonstrate that LLMs can generalize from common forms of specification gaming to more pernicious reward tampering and that such behavior may be nontrivial to remove.	翻訳日:2024-06-19 01:31:17 公開日:2024-06-17
# 意図からテクニックへ:大規模言語モデルのためのテキスト透かしの包括的分類と課題 From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models ( http://arxiv.org/abs/2406.11106v1 ) ライセンス: Link先を確認	Harsh Nishant Lalai, Aashish Anantha Ramakrishnan, Raj Sanjay Shah, Dongwon Lee,	(参考訳) LLM(Large Language Models)の急速な成長に伴い、不正使用に対するテキストコンテンツの保護が重要となる。テキスト透かしは、LLM生成とプレーンテキストソースの両方を保護する、重要なソリューションを提供する。本稿では, 研究文献の包括的な調査を通して, 透かし技術設計の背景にある様々な視点を統一的に概観する。本研究には2つの主な利点がある。 (1) 異なる透かし技術の背後にある特定の意図, 使用される評価データセット, 透かしの追加および除去方法に基づいて研究を分析し, 一貫した分類体系を構築する。 (2) テキストオーサシップの保護研究を促進するために,テキスト透かしにおけるギャップとオープンな課題を強調する。この広範囲にわたるカバレッジと詳細な分析は、言語モデルにおけるテキスト透かしの進化状況に関する貴重な洞察を与えてくれる。 With the rapid growth of Large Language Models (LLMs), safeguarding textual content against unauthorized use is crucial. Text watermarking offers a vital solution, protecting both LLM-generated and plain text sources. This paper presents a unified overview of different perspectives behind designing watermarking techniques, through a comprehensive survey of the research literature. Our work has two key advantages: (1) We analyze research based on the specific intentions behind different watermarking techniques, the evaluation datasets used, and the watermark addition and removal methods to construct a cohesive taxonomy. (2) We highlight the gaps and open challenges in text watermarking to promote research in protecting text authorship. This extensive coverage and detailed analysis sets our work apart, offering valuable insights into the evolving landscape of text watermarking in language models.	翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
# パーソナライズド言語モデルにおける安全性と有用性のトレードオフの探索 Exploring Safety-Utility Trade-Offs in Personalized Language Models ( http://arxiv.org/abs/2406.11107v1 ) ライセンス: Link先を確認	Anvesh Rao Vijjini, Somnath Basu Roy Chowdhury, Snigdha Chaturvedi,	(参考訳) 大規模言語モデル(LLM)が日々のアプリケーションにますます統合されるにつれて、多様なユーザ層にわたって公平に動作することを保証することが不可欠である。本研究では,LLMがパーソナライズバイアスに悩まされ,ユーザのアイデンティティにパーソナライズされるとその性能が影響を受けることを示す。安全性と有用性という2つの軸に沿ってLLMの性能を評価することにより、パーソナライズバイアスを定量化する。パーソナライズの有無それぞれについて、安全でないプロンプトに対するLLMの応答がどの程度無害であるかを調べることで安全性を測定する。一般知識,数学的能力,プログラミング,推論能力など,様々なタスクにおけるLLMの性能を評価することで,有用性を測定する。 Llama (Touvron et al., 2023) や Mistral (Jiang et al., 2023) のようなオープンソースのモデルから GPT-3.5 や GPT-4o (Ouyang et al., 2022) のような API ベースのモデルまで,さまざまな LLM が、ユーザのアイデンティティに応じて安全性と有用性のトレードオフの観点から性能に有意なばらつきを示すことがわかった。最後に、嗜好チューニングとプロンプトベースの防御を用いてパーソナライズバイアスを軽減するためのいくつかの戦略について議論する。 As large language models (LLMs) become increasingly integrated into daily applications, it is essential to ensure they operate fairly across diverse user demographics. In this work, we show that LLMs suffer from personalization bias, where their performance is impacted when they are personalized to a user's identity. We quantify personalization bias by evaluating the performance of LLMs along two axes - safety and utility. We measure safety by examining how benign LLM responses are to unsafe prompts with and without personalization. We measure utility by evaluating the LLM's performance on various tasks, including general knowledge, mathematical abilities, programming, and reasoning skills. We find that various LLMs, ranging from open-source models like Llama (Touvron et al., 2023) and Mistral (Jiang et al., 2023) to API-based ones like GPT-3.5 and GPT-4o (Ouyang et al., 2022), exhibit significant variance in performance in terms of safety-utility trade-offs depending on the user's identity. Finally, we discuss several strategies to mitigate personalization bias using preference tuning and prompt-based defenses.	翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
# ヘイトスピーチ検出のための大規模言語モデルにおけるアノテータバイアスの検討 Investigating Annotator Bias in Large Language Models for Hate Speech Detection ( http://arxiv.org/abs/2406.11109v1 ) ライセンス: Link先を確認	Amit Das, Zheng Zhang, Fatemeh Jamshidi, Vinija Jain, Aman Chadha, Nilanjana Raychawdhary, Mary Sandage, Lauramarie Pope, Gerry Dozier, Cheryl Seals,	(参考訳) データアノテーション(生データに記述的なラベルを割り当てる作業)は、機械学習モデルの性能を最適化する上で重要である。しかし、これはアノテータが持ち込むバイアスの影響を受けやすい、リソース集約的なプロセスである。 ChatGPTのような高度な大規模言語モデル(LLM)の出現は、この複雑な手続きを近代化し合理化するユニークな機会を提供する。既存の研究はLLMのアノテータとしての有効性を広く評価しているが,本論文では,ヘイトスピーチデータのアノテーション時におけるLLM,特にGPT 3.5およびGPT 4oのバイアスについて検討する。我々の研究は、性別、人種、宗教、障害の4つの主要なカテゴリーにおけるバイアスの理解に貢献する。具体的には、これらのカテゴリ内の非常に脆弱なグループを対象として、アノテータバイアスを分析する。さらに、アノテーション付きデータを精査することにより、これらのバイアスに寄与する潜在的な要因を包括的に調査する。この研究を行うために、独自のヘイトスピーチ検出データセットであるHateSpeechCorpusを導入する。さらに、比較分析のためにETHOS(Mollas et al., 2022)データセットでも同じ実験を行う。本論文は,データアノテーションにLLMの可能性を活用する上で,研究者や実践者を導く重要な資源となる。 HateSpeechCorpusデータセットは https://github.com/AmitDasRup123/HateSpeechCorpus で利用可能である。 Data annotation, the practice of assigning descriptive labels to raw data, is pivotal in optimizing the performance of machine learning models. However, it is a resource-intensive process susceptible to biases introduced by annotators. The emergence of sophisticated Large Language Models (LLMs), like ChatGPT, presents a unique opportunity to modernize and streamline this complex procedure. While existing research extensively evaluates the efficacy of LLMs as annotators, this paper delves into the biases present in LLMs, specifically GPT 3.5 and GPT 4o, when annotating hate speech data. Our research contributes to understanding biases in four key categories: gender, race, religion, and disability. Specifically targeting highly vulnerable groups within these categories, we analyze annotator biases. Furthermore, we conduct a comprehensive examination of potential factors contributing to these biases by scrutinizing the annotated data. 
We introduce our custom hate speech detection dataset, HateSpeechCorpus, to conduct this research. Additionally, we perform the same experiments on the ETHOS (Mollas et al., 2022) dataset for comparative analysis. This paper serves as a crucial resource, guiding researchers and practitioners in harnessing the potential of LLMs for data annotation, thereby fostering advancements in this critical field. The HateSpeechCorpus dataset is available here: https://github.com/AmitDasRup123/HateSpeechCorpus	翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
# ニューラルネットワークがいかにしてサポートを学ぶかは、SGDの暗黙的正則化効果である How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD ( http://arxiv.org/abs/2406.11110v1 ) ライセンス: Link先を確認	Pierfrancesco Beneventano, Andrea Pinto, Tomaso Poggio,	(参考訳) 目的関数の支持を識別するディープニューラルネットワークの能力について検討する。以上の結果から,ミニバッチSGDは入力の無関係成分に関連する重みをゼロに縮小することで,ネットワークの第1層で支持を効果的に学習することがわかった。対照的に、バニラGDも対象関数を近似するが、第1層で支持を学習するためには明示的な正則化項が必要であることを示す。ミニバッチSGDのこの性質は、$\eta / b$(ステップサイズ/バッチサイズ)に比例する2階の暗黙的正則化効果によるものであることを証明する。我々の結果は、暗黙的正則化がトレーニング最適化のダイナミクスに重大な影響を与えることのさらなる証拠であるだけでなく、ネットワークによって学習される特徴の構造にも光を当てている。さらに、より小さなバッチは特徴の解釈可能性を高め、初期化への依存を減らすことを示唆している。 We investigate the ability of deep neural networks to identify the support of the target function. Our findings reveal that mini-batch SGD effectively learns the support in the first layer of the network by shrinking to zero the weights associated with irrelevant components of input. In contrast, we demonstrate that while vanilla GD also approximates the target function, it requires an explicit regularization term to learn the support in the first layer. We prove that this property of mini-batch SGD is due to a second-order implicit regularization effect which is proportional to $\eta / b$ (step size / batch size). Our results are not only another proof that implicit regularization has a significant impact on training optimization dynamics but they also shed light on the structure of the features that are learned by the network. Additionally, they suggest that smaller batches enhance feature interpretability and reduce dependency on initialization.	翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
# 固有状態熱化仮説によるエルゴトロピーの普遍的上界とノーゴー定理 Universal bound on Ergotropy and No-Go Theorem by the Eigenstate Thermalization Hypothesis ( http://arxiv.org/abs/2406.11112v1 ) ライセンス: Link先を確認	Akihiro Hokkyo, Masahito Ueda,	(参考訳) 量子多体系から抽出可能な最大仕事(エルゴトロピー)は、初期状態の局所的な非熱性と、量子操作がもたらす局所エントロピーの減少によって制約されることを示す。得られたエルゴトロピーの普遍的上界は、固有状態熱化仮説が有限時間のユニタリ操作によるエネルギー固有状態からの仕事の抽出を禁止することを意味する。このノーゴー性質は、熱力学第二法則の一形式であるプランクの原理が純粋量子状態に対しても成り立つことを意味する。本結果は, 量子熱力学において独立に研究されてきた2つの概念である第二法則と熱化を, 仕事抽出の資源としての多体系の系内相関を通じて橋渡しする。 We show that the maximum extractable work (ergotropy) from a quantum many-body system is constrained by local athermality of an initial state and local entropy decrease brought about by quantum operations. The obtained universal bound on ergotropy implies that the eigenstate thermalization hypothesis prohibits work extraction from energy eigenstates by means of finite-time unitary operations. This no-go property implies that Planck's principle, a form of the second law of thermodynamics, holds even for pure quantum states. Our result bridges two independently studied concepts of quantum thermodynamics, the second law and thermalization, via intrasystem correlations in many-body systems as a resource for work extraction.	翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
# テキストグラフト:テキスト分類におけるマイノリティクラスのための近分布弱スーパービジョン Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification ( http://arxiv.org/abs/2406.11115v1 ) ライセンス: Link先を確認	Letian Peng, Yi Gu, Chengyu Dong, Zihan Wang, Jingbo Shang,	(参考訳) 極めて弱い教師ありのテキスト分類では、先駆的な研究は、生のコーパスからクラス名に類似したテキストをマイニングすることで擬似ラベルを生成するが、マイノリティクラスのサンプルが非常に限られるか、まったく得られない場合がある。最近の研究は、クラス名や定義を使ってLLMにプロンプトを与え関連テキストを生成し始めたが、LLMが分布内(すなわち、テキスト分類器が適用されるコーパスに類似した)データを生成できないリスクが高く、一般化できない分類器に繋がる。本稿では,これら2つのアプローチの利点を組み合わせ,マイノリティクラスに対してクリーンかつ近分布の弱い教師信号を得ることを目的とした新しいフレームワークであるテキストグラフト(text grafting)によってギャップを埋めることを提案する。具体的には、まずLLMベースのロジットを用いて、対象のマイノリティクラスへのデータ合成の可能性が高いマスク付きテンプレートを生コーパスからマイニングする。次に、テンプレートを最先端のLLMで埋め、マイノリティクラスに該当する近分布テキストを合成する。テキストグラフトは、マイノリティクラスにおいて直接的なマイニングや合成よりも大幅な改善を示す。また,テキストグラフトの性質を理解するために分析とケーススタディを行った。 For extremely weak-supervised text classification, pioneering research generates pseudo labels by mining texts similar to the class names from the raw corpus, which may end up with very limited or even no samples for the minority classes. Recent works have started to generate the relevant texts by prompting LLMs using the class names or definitions; however, there is a high risk that LLMs cannot generate in-distribution (i.e., similar to the corpus where the text classifier will be applied) data, leading to ungeneralizable classifiers. In this paper, we combine the advantages of these two approaches and propose to bridge the gap via a novel framework, text grafting, which aims to obtain clean and near-distribution weak supervision for minority classes. Specifically, we first use LLM-based logits to mine masked templates from the raw corpus, which have a high potential for data synthesis into the target minority class. Then, the templates are filled by state-of-the-art LLMs to synthesize near-distribution texts falling into minority classes. Text grafting shows significant improvement over direct mining or synthesis on minority classes. 
We also use analysis and case studies to understand the properties of text grafting.	翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
# ChatGPTにおける文法性の表現 : 言語学者および一般人との比較 Grammaticality Representation in ChatGPT as Compared to Linguists and Laypeople ( http://arxiv.org/abs/2406.11116v1 ) ライセンス: Link先を確認	Zhuang Qiu, Xufeng Duan, Zhenguang G. Cai,	(参考訳) 大規模言語モデル (LLM) は様々な言語課題において例外的な性能を示した。しかし、LLMが人間のようなきめ細かい文法的直観を発達させたかどうかは不明である。この事前登録された研究 (https://osf.io/t5nes) は、ChatGPTの文法的直観を初めて大規模に調査するものであり、言語学者が文法的、非文法的、または辺縁的に文法的であると判断した148の言語現象について一般人の文法的判断を収集した先行研究(Sprouse, Schutze, & Almeida, 2013)に基づいている。我々の主な焦点は、これらの言語構文の判断において、ChatGPTを一般人と言語学者の両方と比較することであった。実験1では、ChatGPTは与えられた参照文に基づいて文に評価を割り当てた。実験2では文を7段階尺度で評価し,実験3ではChatGPTに対して,一対の文からより文法的な文を選択するように求めた。全体として,ChatGPTと言語学者の間の収束率は73%から95%であり,全体の点推定値は89%であった。また,全てのタスクにおいてChatGPTと一般人の間に有意な相関関係が認められたが,相関強度はタスクによって異なった。これらの結果は、判断タスクの心理測定的性質と、人間とLLMの言語処理スタイルの違いによるものと考えられる。 Large language models (LLMs) have demonstrated exceptional performance across various linguistic tasks. However, it remains uncertain whether LLMs have developed human-like fine-grained grammatical intuition. This preregistered study (https://osf.io/t5nes) presents the first large-scale investigation of ChatGPT's grammatical intuition, building upon a previous study that collected laypeople's grammatical judgments on 148 linguistic phenomena that linguists judged to be grammatical, ungrammatical, or marginally grammatical (Sprouse, Schutze, & Almeida, 2013). Our primary focus was to compare ChatGPT with both laypeople and linguists in the judgement of these linguistic constructions. In Experiment 1, ChatGPT assigned ratings to sentences based on a given reference sentence. Experiment 2 involved rating sentences on a 7-point scale, and Experiment 3 asked ChatGPT to choose the more grammatical sentence from a pair. Overall, our findings demonstrate convergence rates ranging from 73% to 95% between ChatGPT and linguists, with an overall point-estimate of 89%. Significant correlations were also found between ChatGPT and laypeople across all tasks, though the correlation strength varied by task. 
We attribute these results to the psychometric nature of the judgment tasks and the differences in language processing styles between humans and LLMs.	翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
# 統計的契約による品質テキスト生成のインセンティブ Incentivizing Quality Text Generation via Statistical Contracts ( http://arxiv.org/abs/2406.11118v1 ) ライセンス: Link先を確認	Eden Saig, Ohad Einav, Inbal Talgam-Cohen,	(参考訳) 大規模言語モデル(LLM)の成功は機械生成テキストの需要を増加させる一方で、現在のトークン単位の従量課金方式は、経済学においてモラルハザードとして知られるインセンティブの不整合を生み出している。すなわち、テキスト生成エージェントには、最先端のモデルよりも安価なモデルを選んでコストを削減する強い動機があり、エージェントは推論を内部で行うため、これを「舞台裏」で行うことができる。本研究は、品質をインセンティブ化するための、成果報酬型の契約ベースのフレームワークを提案することで、経済的な観点からこの問題にアプローチする。エージェントがコストのかかる推論を用いてテキストを生成するプリンシパル・エージェントゲームについて検討し、契約は自動品質評価に基づいてテキストに対するプリンシパルの支払いを決定する。内部推論コストが不明な場合,標準的な契約理論は適用できないため,コストロバスト契約を導入する。主な理論的貢献として、統計学における最適な複合仮説検定との直接的な対応を通じて最適なコストロバスト契約を特徴づけ、Saig et al. (NeurIPS'23) の結果を一般化する。様々な目的とLLM評価ベンチマークに対して契約を導出することでフレームワークを実証的に評価し,コストロバスト契約は、コストを考慮した契約と比較して目的値をわずかにしか犠牲にしないことがわかった。 While the success of large language models (LLMs) increases demand for machine-generated text, current pay-per-token pricing schemes create a misalignment of incentives known in economics as moral hazard: Text-generating agents have strong incentive to cut costs by preferring a cheaper model over the cutting-edge one, and this can be done "behind the scenes" since the agent performs inference internally. In this work, we approach this issue from an economic perspective, by proposing a pay-for-performance, contract-based framework for incentivizing quality. We study a principal-agent game where the agent generates text using costly inference, and the contract determines the principal's payment for the text according to an automated quality evaluation. Since standard contract theory is inapplicable when internal inference costs are unknown, we introduce cost-robust contracts. As our main theoretical contribution, we characterize optimal cost-robust contracts through a direct correspondence to optimal composite hypothesis tests from statistics, generalizing a result of Saig et al. (NeurIPS'23). 
We evaluate our framework empirically by deriving contracts for a range of objectives and LLM evaluation benchmarks, and find that cost-robust contracts sacrifice only a marginal increase in objective value compared to their cost-aware counterparts.	翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
# 時間制約付き身体制御のためのモデル適応 Model Adaptation for Time Constrained Embodied Control ( http://arxiv.org/abs/2406.11128v1 ) ライセンス: Link先を確認	Jaehyun Song, Minjong Yoo, Honguk Woo,	(参考訳) 身体性エージェントに深層学習モデルを採用するには,特定のタスクや運用条件に対してモデル構造を最適化する必要がある。このような最適化には、モデル圧縮のような静的なものと、適応推論のような動的なものがある。しかし、これらの手法は、それぞれ異なる推論遅延の制限を持つ複数のタスクに対して逐次的な意思決定を必要とする、時間制約のある身体性制御システムについては完全には研究されていない。本稿では,モジュラーモデル適応を用いた時間制約を考慮した身体性制御フレームワークであるMoDeCを提案する。資源および時間制限に関する様々な運用条件へのモデル適応を、モジュールネットワーク上の動的ルーティングとして定式化し、これらの条件をマルチタスクの目的の一部として組み込む。複数の視覚ベースの身体性環境における評価により,MoDeCのロバスト性を示し,ロボット操作と自律運転の応用において、性能と時間制約の遵守の両面で他のモデル適応手法よりも優れていることを示す。 When adopting a deep learning model for embodied agents, it is required that the model structure be optimized for specific tasks and operational conditions. Such optimization can be static, such as model compression, or dynamic, such as adaptive inference. Yet, these techniques have not been fully investigated for embodied control systems subject to time constraints, which necessitate sequential decision-making for multiple tasks, each with distinct inference latency limitations. In this paper, we present MoDeC, a time constraint-aware embodied control framework using modular model adaptation. We formulate model adaptation to varying operational conditions on resource and time restrictions as dynamic routing on a modular network, incorporating these conditions as part of multi-task objectives. Our evaluation across several vision-based embodied environments demonstrates the robustness of MoDeC, showing that it outperforms other model adaptation methods in both performance and adherence to time constraints in robotic manipulation and autonomous driving applications.	翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
# ニューラル系統 Neural Lineage ( http://arxiv.org/abs/2406.11129v1 ) ライセンス: Link先を確認	Runpeng Yu, Xinchao Wang,	(参考訳) 適切に動作するニューラルネットワークが与えられたとき、そのチューニング元となった親モデルを特定することは可能だろうか? 本稿では,親モデルと子モデルの間の系統関係の発見を目的としたニューラル系統検出という新しいタスクを提案する。具体的には、一組の親モデルの中から、子モデルがどの親モデルから微調整されたかを予測する。この課題に対処するための2つのアプローチを提案する。 1) 実用上の利便性のために, 微調整過程の近似をニューラルネットワーク表現類似度指標に統合した学習不要のアプローチを導入し, 類似性に基づく系統検出手法を得る。 2) 精度の追求のために, エンコーダとTransformer検出器からなる学習ベースの系統検出器を導入する。実験を通じて,提案する学習不要および学習ベースの手法が様々な学習設定においてベースラインよりも優れており,様々な視覚モデルに適応可能であることを検証した。さらに、親モデルだけでなく祖先も識別し、世代をまたぐ系統を辿る能力も示している。 Given a well-behaved neural network, is it possible to identify its parent, based on which it was tuned? In this paper, we introduce a novel task known as neural lineage detection, aiming at discovering lineage relationships between parent and child models. Specifically, from a set of parent models, neural lineage detection predicts which parent model a child model has been fine-tuned from. We propose two approaches to address this task. (1) For practical convenience, we introduce a learning-free approach, which integrates an approximation of the finetuning process into the neural network representation similarity metrics, leading to a similarity-based lineage detection scheme. (2) For the pursuit of accuracy, we introduce a learning-based lineage detector comprising encoders and a transformer detector. Through experimentation, we have validated that our proposed learning-free and learning-based methods outperform the baseline in various learning settings and are adaptable to a variety of visual models. Moreover, they also exhibit the ability to trace cross-generational lineage, identifying not only parent models but also their ancestors.	翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
# 生成アスペクトに基づく感情分析のための動的順序テンプレート予測 Dynamic Order Template Prediction for Generative Aspect-Based Sentiment Analysis ( http://arxiv.org/abs/2406.11130v1 ) ライセンス: Link先を確認	Yonghyun Jun, Hwanhee Lee,	(参考訳) アスペクトベースの感情分析(ABSA)は、テキスト内の特定の側面に対する感情を評価し、詳細な感情タプルをもたらす。以前のABSAモデルは、しばしば静的テンプレートを使用してタプル内のすべての要素を予測するが、要素間の依存関係を正確に捉えられないことが多い。マルチビュープロンプト法は,様々なテンプレートでタプルを予測し,結果をアンサンブルすることでABSAの性能を向上させる。しかし、この方法は非効率性や分布外エラーに悩まされる。本稿では,インスタンスレベルのエントロピーに基づいて,各インスタンスに必要なビューを動的に生成するABSAの動的順序テンプレート(DOT)手法を提案する。多様で関連性の高いビューの生成を保証することで、提案手法は,ASQPおよびACOSデータセットのF1スコアを改善するとともに,推論時間を大幅に短縮する。 Aspect-based sentiment analysis (ABSA) assesses sentiments towards specific aspects within texts, resulting in detailed sentiment tuples. Previous ABSA models often use static templates to predict all of the elements in the tuples, and these models often fail to accurately capture dependencies between elements. The multi-view prompting method improves the performance of ABSA by predicting tuples with various templates and then ensembling the results. However, this method suffers from inefficiencies and out-of-distribution errors. In this paper, we propose a Dynamic Order Template (DOT) method for ABSA, which dynamically generates necessary views for each instance based on instance-level entropy. By ensuring diverse and relevant view generation, our proposed method improves F1-scores on ASQP and ACOS datasets while significantly reducing inference time.	翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
# 大規模言語モデルは分類体系のよい代替となるか? Are Large Language Models a Good Replacement of Taxonomies? ( http://arxiv.org/abs/2406.11131v1 ) ライセンス: Link先を確認	Yushi Sun, Hao Xin, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, Lei Chen,	(参考訳) 大規模言語モデル(LLM)は、知識を内部化し、自然言語の質問に答える素晴らしい能力を示している。先行研究は、LLMが一般的な知識ではうまく機能する一方で、ロングテールの微妙な知識では性能が低いことを確認しているが、従来の知識グラフをLLMに置き換えるべきかどうかについて,コミュニティは依然として疑念を抱いている。本稿では,LLMによって知識グラフのスキーマ(すなわち分類体系)が時代遅れになるかどうかを問う。直感的には、LLMは一般的な分類体系や、人々にとって馴染みのある分類レベルでうまく機能するはずである。残念なことに、一般的なドメインから専門的なドメインまで、またルートからリーフまでのレベルにわたって幅広い分類体系でLLMを評価する包括的なベンチマークが欠けており、確信を持った結論を導くことができない。研究ギャップを狭めるため,分類体系上のLLMの性能を評価する,TaxoGlimpseという新しい分類階層構造発見ベンチマークを構築した。 TaxoGlimpseは、一般的なドメインから専門的なドメインまで10の代表的な分類体系を網羅し、ルートからリーフまでの様々なレベルのエンティティについて詳細な実験を行う。3つのプロンプト設定の下での最先端LLM 18種の包括的な実験から, LLMは専門的な分類体系やリーフレベルのエンティティの知識を依然として十分に捉えられないことが確認された。 Large language models (LLMs) demonstrate an impressive ability to internalize knowledge and answer natural language questions. Although previous studies validate that LLMs perform well on general knowledge while presenting poor performance on long-tail nuanced knowledge, the community is still doubtful about whether the traditional knowledge graphs should be replaced by LLMs. In this paper, we ask if the schema of knowledge graph (i.e., taxonomy) is made obsolete by LLMs. Intuitively, LLMs should perform well on common taxonomies and at taxonomy levels that are common to people. Unfortunately, a comprehensive benchmark that evaluates LLMs over a wide range of taxonomies, from common to specialized domains and at levels from root to leaf, is lacking, so we cannot yet draw a confident conclusion. To narrow the research gap, we constructed a novel taxonomy hierarchical structure discovery benchmark named TaxoGlimpse to evaluate the performance of LLMs over taxonomies. TaxoGlimpse covers ten representative taxonomies from common to specialized domains with in-depth experiments of different levels of entities in these taxonomies from root to leaf. 
Our comprehensive experiments of eighteen state-of-the-art LLMs under three prompting settings validate that LLMs still cannot adequately capture the knowledge of specialized taxonomies and leaf-level entities.	翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
# RePrompt:大規模言語モデルエージェントのための自動プロンプトエンジニアリングによる計画 RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents ( http://arxiv.org/abs/2406.11132v1 ) ライセンス: Link先を確認	Weizhe Chen, Sven Koenig, Bistra Dilkina,	(参考訳) この1年間で、大規模言語モデル(LLM)は、従来の自然言語処理以外の領域で顕著な成功を収め、コード生成や旅行計画、ロボット制御といった、より一般的で応用に近い領域におけるLLMの利用が探求され始めている。優れた能力を持つLLMと外部ツールを結びつけることで、日常生活のあらゆる作業を支援することを目指す、いわゆるLLMエージェントが構築されている。これらすべての領域において、LLMへのプロンプトはLLMが生成する内容に大きな違いをもたらし、LLMエージェントの性能に影響を及ぼすことが示されている。したがって、自動プロンプトエンジニアリングは多くの研究者やLLMのユーザにとって重要な問題となっている。本稿では, LLMエージェントとの対話から得られるチャット履歴に基づいて「勾配降下」を行い, LLMエージェントのプロンプト内のステップバイステップ命令を最適化する,新しい手法であるRePromptを提案する。プロンプトを最適化することで、LLMは特定のドメインで計画する方法を学ぶ。PDDL生成と旅行計画の実験により、更新されたプロンプトを初期プロンプトとして使用することで、異なる推論タスクの性能を概ね向上できることを示した。 In the past year, large language models (LLMs) have had remarkable success in domains outside traditional natural language processing, and people are starting to explore the usage of LLMs in more general, close-to-application domains like code generation, travel planning, and robot control. Connecting these high-capacity LLMs with external tools, people are building so-called LLM agents, which are supposed to help people do all kinds of work in everyday life. In all these domains, the prompt to the LLMs has been shown to make a big difference in what the LLM would generate and thus affect the performance of the LLM agents. Therefore, automatic prompt engineering has become an important question for many researchers and users of LLMs. In this paper, we propose a novel method, RePrompt, which does "gradient descent" to optimize the step-by-step instructions in the prompt of the LLM agents based on the chat history obtained from interactions with LLM agents. By optimizing the prompt, the LLM will learn how to plan in specific domains. 
We have used experiments in PDDL generation and travel planning to show that our method could generally improve the performance for different reasoning tasks when using the updated prompt as the initial prompt.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
# メンタルヘルスの会話における感情の理解に向けて Towards Understanding Emotions for Engaged Mental Health Conversations ( http://arxiv.org/abs/2406.11135v1 ) ライセンス: Link先を確認	Kellie Yu Hui Sim, Kohleen Tijing Fortuno, Kenny Tsu Wei Choo,	(参考訳) タイムリーなサポートと介入を提供することは、メンタルヘルス設定において不可欠である。若者をテキストメッセージに慣れさせる必要性が高まる中、メンタルヘルスプロバイダーはチャットボット、コミュニティベースのフォーラム、認可された専門家によるオンラインセラピー、訓練された対応者が運営するヘルプラインなど、テキストベースのメディアを探求し、採用している。メンタルヘルスのためのテキストベースのメディア,特に危機ケアを支援するため,我々はキーストロークダイナミクスと感情分析を組み合わせた受動的感情センシングシステムを開発している。このシステムの初期の研究は、短いテキストメッセージとキーボードタイピングパターンの分析により、クライアントと応答者の両方をサポートするために使用できる感情情報を提供できることを示唆している。予備的な研究結果を用いて、メンタルヘルスプロバイダーがより良いケアを提供するためのAIの適用方法について議論する。 Providing timely support and intervention is crucial in mental health settings. As the need to engage youth comfortable with texting increases, mental health providers are exploring and adopting text-based media such as chatbots, community-based forums, online therapies with licensed professionals, and helplines operated by trained responders. To support these text-based media for mental health--particularly for crisis care--we are developing a system to perform passive emotion-sensing using a combination of keystroke dynamics and sentiment analysis. Our early studies of this system posit that the analysis of short text messages and keyboard typing patterns can provide emotion information that may be used to support both clients and responders. We use our preliminary findings to discuss the way forward for applying AI to support mental health providers in providing better care.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
# 低レベル視における拡散モデル:サーベイ Diffusion Models in Low-Level Vision: A Survey ( http://arxiv.org/abs/2406.11138v1 ) ライセンス: Link先を確認	Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, Xiu Li,	(参考訳) 深層生成モデルは、その生成能力のため、低レベルの視覚タスクにおいて大きな注目を集めている。このうち, 前方拡散過程と逆方向のデノイジング過程を特徴とする拡散モデルに基づく解法は, 高い品質と多様性のサンプルを生成できることで広く評価されている。これにより、複雑なテクスチャ情報を持つ視覚的に説得力のある結果が生成される。その顕著な成功にもかかわらず、これらの先駆的な拡散モデルに基づく研究をまとめ、対応する流れを整理する包括的な調査には顕著なギャップが存在する。本稿では拡散モデルに基づく手法の総合的なレビューを提案する。本稿では,3つの汎用的な拡散モデリングフレームワークを提示し,他の深層生成モデルとの関係を検討し,理論基盤を確立する。次に、基礎となるフレームワークと対象タスクの両方を考慮して、拡散モデルの多視点的な分類を導入する。さらに、医療、リモートセンシング、ビデオシナリオなど、他のタスクに適用された拡張拡散モデルについても要約する。さらに、よく使われるベンチマークと評価指標の概要について述べる。 3つの主要なタスクにおいて拡散モデルに基づく手法の性能と効率の両面を網羅的に評価する。最後に,現在の拡散モデルの限界を解明し,今後の研究に興味深い7つの方向を提案する。この総合的な調査は,低レベル視覚タスクの文脈におけるデノイジング拡散モデルを取り巻く状況の深い理解を促進することを目的としている。 20以上の低レベルビジョンタスクにおける拡散モデルベースのテクニックのキュレートされたリストは、https://github.com/ChunmingHe/awesome-diffusion-models-in-low-level-visionにある。 Deep generative models have garnered significant attention in low-level vision tasks due to their generative capabilities. Among them, diffusion model-based solutions, characterized by a forward diffusion process and a reverse denoising process, have emerged as widely acclaimed for their ability to produce samples of superior quality and diversity. This ensures the generation of visually compelling results with intricate texture information. Despite their remarkable success, a noticeable gap exists in a comprehensive survey that amalgamates these pioneering diffusion model-based works and organizes the corresponding threads. This paper proposes the comprehensive review of diffusion model-based techniques. We present three generic diffusion modeling frameworks and explore their correlations with other deep generative models, establishing the theoretical foundation. Following this, we introduce a multi-perspective categorization of diffusion models, considering both the underlying framework and the target task.
Additionally, we summarize extended diffusion models applied in other tasks, including medical, remote sensing, and video scenarios. Moreover, we provide an overview of commonly used benchmarks and evaluation metrics. We conduct a thorough evaluation, encompassing both performance and efficiency, of diffusion model-based techniques in three prominent tasks. Finally, we elucidate the limitations of current diffusion models and propose seven intriguing directions for future research. This comprehensive examination aims to facilitate a profound understanding of the landscape surrounding denoising diffusion models in the context of low-level vision tasks. A curated list of diffusion model-based techniques in over 20 low-level vision tasks can be found at https://github.com/ChunmingHe/awesome-diffusion-models-in-low-level-vision.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
# 境界線を破る: モデル編集が言語間性能に及ぼす影響について Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance ( http://arxiv.org/abs/2406.11139v1 ) ライセンス: Link先を確認	Somnath Banerjee, Avik Halder, Rajarshi Mandal, Sayan Layek, Ian Soboroff, Rima Hazra, Animesh Mukherjee,	(参考訳) BERTやGPTのような事前訓練された言語モデル(PLM)の統合は、特に英語においてNLPに革命をもたらしたが、言語的不均衡も生んでいる。本稿では,多言語文脈におけるいくつかの知識編集技術を検討することにより,言語的公平性の必要性を戦略的に明らかにする。 Mistral, TowerInstruct, OpenHathi, Tamil-Llama, Kan-Llamaなどのモデルの性能を,英語,ドイツ語,フランス語,イタリア語,スペイン語,ヒンディー語,タミル語,カンナダ語を含む言語で評価した。本研究は,言語間整合性に関して,通常モデルとマージモデルに重要な相違があることを明らかにする。我々は、これらのモデルをストレステストするために、'each language for itself'(ELFI)や'each language for others'(ELFO)のような戦略を採用している。我々の研究結果は、LLMが言語的障壁を克服する可能性を示し、AI技術における言語的包摂性の実現に向けた今後の研究の基礎を築くものである。 The integration of pretrained language models (PLMs) like BERT and GPT has revolutionized NLP, particularly for English, but it has also created linguistic imbalances. This paper strategically identifies the need for linguistic equity by examining several knowledge editing techniques in multilingual contexts. We evaluate the performance of models such as Mistral, TowerInstruct, OpenHathi, Tamil-Llama, and Kan-Llama across languages including English, German, French, Italian, Spanish, Hindi, Tamil, and Kannada. Our research identifies significant discrepancies in normal and merged models concerning cross-lingual consistency. We employ strategies like 'each language for itself' (ELFI) and 'each language for others' (ELFO) to stress-test these models. Our findings demonstrate the potential for LLMs to overcome linguistic barriers, laying the groundwork for future research in achieving linguistic inclusivity in AI technologies.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
# 分岐の能動的探索 Active search for Bifurcations ( http://arxiv.org/abs/2406.11141v1 ) ライセンス: Link先を確認	Yorgos M. Psarellis, Themistoklis P. Sapsis, Ioannis G. Kevrekidis,	(参考訳) 分岐は力学系における長期挙動の質的変化を示し、しばしば突然の(「ハードな」)遷移や破滅的な事象(発散)の前兆となる。その正確な位置の特定は、観察された動的挙動をより深く理解するためだけでなく、効率的な介入を設計するためにも重要である。対象の力学系が複雑で、ノイズを含む可能性があり、サンプリングのコストが高い場合、標準的な(例えば連続法に基づく)数値法は実用的でなくなることがある。我々は、ベイズ最適化を活用して、慎重に選択した少数のベクトル場観測からサドルノード分岐やホップ分岐を発見する能動学習フレームワークを提案する。このようなアプローチは、状態×パラメータ空間の探索がリソース面で制限されるシステムで特に魅力的である。また、本質的な確率性を持つシステムで有用な不確実性定量化(偶然的および認識論的)の枠組みも自然に提供する。 Bifurcations mark qualitative changes of long-term behavior in dynamical systems and can often signal sudden ("hard") transitions or catastrophic events (divergences). Accurately locating them is critical not just for deeper understanding of observed dynamic behavior, but also for designing efficient interventions. When the dynamical system at hand is complex, possibly noisy, and expensive to sample, standard (e.g. continuation based) numerical methods may become impractical. We propose an active learning framework, where Bayesian Optimization is leveraged to discover saddle-node or Hopf bifurcations, from a judiciously chosen small number of vector field observations. Such an approach becomes especially attractive in systems whose state × parameter space exploration is resource-limited. It also naturally provides a framework for uncertainty quantification (aleatoric and epistemic), useful in systems with inherent stochasticity.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
# 高速かつ高精度な把持検出のためのクラッタ内の把持性(graspness)発見 Graspness Discovery in Clutters for Fast and Accurate Grasp Detection ( http://arxiv.org/abs/2406.11142v1 ) ライセンス: Link先を確認	Chenxi Wang, Hao-Shu Fang, Minghao Gou, Hongjie Fang, Jin Gao, Cewu Lu,	(参考訳) ロボット操作には、効率的で堅牢な把持姿勢検出が不可欠である。一般的な6自由度(6 DoF)把持では、従来の方法はシーン内のすべての点を等しく扱い、通常は一様サンプリングを用いて把持候補を選択する。しかし、どこを把持すべきかを無視することは、現在の把持姿勢検出手法の速度と精度を著しく損なうことが判明した。本稿では,乱雑なシーンにおいて把持可能な領域を識別する,幾何学的手がかりに基づく品質指標「graspness(把持性)」を提案する。graspnessを測定するために先読み探索法を提案し, 統計結果により本手法の合理性を裏付ける。実用上graspnessを迅速に検出するために,探索過程を近似するカスケードgraspnessモデルと名付けたニューラルネットワークを構築した。広範な実験によりgraspnessモデルの安定性, 汎用性, 有効性を検証し, 異なる手法のプラグアンドプレイモジュールとして使用できることを示した。graspnessモデルを組み込むことで、様々な従来手法の精度が大幅に向上した。さらに,低品質な予測を早期にフィルタリングするためにgraspnessモデルを組み込んだエンドツーエンドネットワークであるGSNetを開発した。大規模ベンチマークであるGraspNet-1Billionの実験により,提案手法は従来手法を大差(30 AP以上)で上回り,高い推論速度を達成することを示した。 GSNetのライブラリは、https://github.com/graspnet/anygrasp_sdkにあるAnyGraspに統合されている。 Efficient and robust grasp pose detection is vital for robotic manipulation. For general 6 DoF grasping, conventional methods treat all points in a scene equally and usually adopt uniform sampling to select grasp candidates. However, we discover that ignoring where to grasp greatly harms the speed and accuracy of current grasp pose detection methods. In this paper, we propose "graspness", a quality based on geometry cues that distinguishes graspable areas in cluttered scenes. A look-ahead searching method is proposed for measuring the graspness and statistical results justify the rationality of our method. To quickly detect graspness in practice, we develop a neural network named cascaded graspness model to approximate the searching process. Extensive experiments verify the stability, generality and effectiveness of our graspness model, allowing it to be used as a plug-and-play module for different methods. A large improvement in accuracy is witnessed for various previous methods after equipping our graspness model.
Moreover, we develop GSNet, an end-to-end network that incorporates our graspness model for early filtering of low-quality predictions. Experiments on a large-scale benchmark, GraspNet-1Billion, show that our method outperforms previous arts by a large margin (30+ AP) and achieves a high inference speed. The library of GSNet has been integrated into AnyGrasp, which is at https://github.com/graspnet/anygrasp_sdk.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
# 合成医療データ評価・報告のためのスコアカード Scorecards for Synthetic Medical Data Evaluation and Reporting ( http://arxiv.org/abs/2406.11143v1 ) ライセンス: Link先を確認	Ghada Zamzmi, Adarsh Subbaswamy, Elena Sizikova, Edward Margerrison, Jana Delfino, Aldo Badano,	(参考訳) 医療におけるAI駆動ツールのトレーニングおよびテストにおける合成医療データ(SMD)の利用の増加は、SMDの品質を評価するための体系的な枠組みを必要とする。現在、特に様々な医療シナリオへの適用可能性という観点でSMDを評価するための標準化された方法論が欠如しており、医療応用において広く受け入れられ、活用されることへの大きな障害となっている。本稿では、医療応用のユニークな要件を満たすために設計された評価フレームワークの概要と、人工的に生成されたデータセットに付随する総合的なレポートとして機能するSMDスコアカードの概念を紹介する。これは評価の標準化に役立ち、SMD開発者が注意を要する領域を特定し、合成データが患者データをより正確に近似するようにすることで、SMDの品質を評価し、さらに向上させることを可能にする。 The growing utilization of synthetic medical data (SMD) in training and testing AI-driven tools in healthcare necessitates a systematic framework for assessing SMD quality. The current lack of a standardized methodology to evaluate SMD, particularly in terms of its applicability in various medical scenarios, is a significant hindrance to its broader acceptance and utilization in healthcare applications. Here, we outline an evaluation framework designed to meet the unique requirements of medical applications, and introduce the concept of SMD scorecards, which can serve as comprehensive reports that accompany artificially generated datasets. This can help standardize evaluation and enable SMD developers to assess and further enhance the quality of SMDs by identifying areas in need of attention and ensuring that the synthetic data more accurately approximate patient data.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
# パーソナライズされた表現を用いたフェデレーション顔偽検出学習 Federated Face Forgery Detection Learning with Personalized Representation ( http://arxiv.org/abs/2406.11145v1 ) ライセンス: Link先を確認	Decheng Liu, Zhan Dang, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao,	(参考訳) ディープジェネレータ技術は、本物と区別がつかない高品質のフェイクビデオを制作でき、深刻な社会的脅威をもたらす。従来の偽造検出手法は、データを直接集中的にトレーニングし、非公開のビデオデータシナリオにおける情報共有やデータプライバシへの配慮を欠いていた。当然、クライアントの元のデータではなくモデルパラメータを集約するフェデレーション学習戦略を、プライバシー保護に適用することができる。しかし、実際のハイブリッドドメイン偽造データセットに対する一般化能力が乏しいため、単純なフェデレーション学習では十分な性能が得られない。そこで本研究では,パーソナライズされた表現を用いた新しいフェデレーション顔偽造検出学習を提案する。設計したPersonalized Forgery Representation Learningは、各クライアントのパーソナライズされた表現を学習し、個々のクライアントモデルの検出性能を改善することを目的としている。さらに、分散検出モデルのパラメータを更新するために、パーソナライズド・フェデレーション学習のトレーニング戦略を利用する。ここでは、複数の分散クライアントデバイス上で協調的なトレーニングが行われ、これらのクライアントモデルの共有表現が集約のためにサーバ側にアップロードされる。いくつかの公開顔偽造検出データセットでの実験は、最先端手法と比較して提案アルゴリズムの優れた性能を示す。コードは https://github.com/GANG370/PFR-Forgery で公開されている。 Deep generator technology can produce high-quality fake videos that are indistinguishable, posing a serious social threat. Traditional forgery detection methods directly centralized training on data and lacked consideration of information sharing in non-public video data scenarios and data privacy. Naturally, the federated learning strategy can be applied for privacy protection, which aggregates model parameters of clients but not original data. However, simple federated learning can't achieve satisfactory performance because of poor generalization capabilities for the real hybrid-domain forgery dataset. To solve the problem, the paper proposes a novel federated face forgery detection learning with personalized representation. The designed Personalized Forgery Representation Learning aims to learn the personalized representation of each client to improve the detection performance of individual client models. In addition, a personalized federated learning training strategy is utilized to update the parameters of the distributed detection model.
Here collaborative training is conducted on multiple distributed client devices, and shared representations of these client models are uploaded to the server side for aggregation. Experiments on several public face forgery detection datasets demonstrate the superior performance of the proposed algorithm compared with state-of-the-art methods. The code is available at https://github.com/GANG370/PFR-Forgery.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
# Vul-RAG:知識レベルRAGによるLLMに基づく脆弱性検出の強化 Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG ( http://arxiv.org/abs/2406.11147v1 ) ライセンス: Link先を確認	Xueying Du, Geng Zheng, Kaixin Wang, Jiayi Feng, Wentai Deng, Mingwei Liu, Xin Peng, Tao Ma, Yiling Lou,	(参考訳) 脆弱性検出はソフトウェアの品質保証に不可欠である。近年,ディープラーニングモデル(特に大規模言語モデル)は,脆弱性検出において有望であることが示されている。本研究では,LLMに基づく新しい脆弱性検出手法であるVul-RAGを提案する。Vul-RAGは知識レベルの検索拡張生成(RAG)フレームワークを活用し,与えられたコードの脆弱性を3段階で検出する。まず、Vul-RAGは、既存のCVEインスタンスからLLMを介して多次元知識を抽出し、脆弱性知識ベースを構築する。次に、与えられたコードスニペットに対して、Vul-RAGは、機能的セマンティクスに基づいて、構築した知識ベースから関連する脆弱性知識を検索する。最後に、Vul-RAGはLLMを用いて、検索した脆弱性知識の脆弱性原因と修正方法の有無を推論することにより、与えられたコードスニペットの脆弱性を確認する。 構築したベンチマークPairVulを用いたVul-RAGの評価は,Vul-RAGが精度/ペアワイズ精度において12.96%/110%の相対的改善ですべてのベースラインを大幅に上回ることを示す。さらに,ユーザスタディにより,Vul-RAGが生成する脆弱性知識は,手動検出精度を0.60から0.77に向上させる高品質な説明として機能することを示す。 Vulnerability detection is essential for software quality assurance. In recent years, deep learning models (especially large language models) have shown promise in vulnerability detection. In this work, we propose a novel LLM-based vulnerability detection technique Vul-RAG, which leverages knowledge-level retrieval-augmented generation (RAG) framework to detect vulnerability for the given code in three phases. First, Vul-RAG constructs a vulnerability knowledge base by extracting multi-dimension knowledge via LLMs from existing CVE instances; second, for a given code snippet, Vul-RAG retrieves the relevant vulnerability knowledge from the constructed knowledge base based on functional semantics; third, Vul-RAG leverages LLMs to check the vulnerability of the given code snippet by reasoning the presence of vulnerability causes and fixing solutions of the retrieved vulnerability knowledge. Our evaluation of Vul-RAG on our constructed benchmark PairVul shows that Vul-RAG substantially outperforms all baselines by 12.96%/110% relative improvement in accuracy/pairwise-accuracy. In addition, our user study shows that the vulnerability knowledge generated by Vul-RAG can serve as high-quality explanations which can improve the manual detection accuracy from 0.60 to 0.77.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
# ステージワイズ強化ファインチューニングによるFew-Shot認識 Few-Shot Recognition via Stage-Wise Augmented Finetuning ( http://arxiv.org/abs/2406.11148v1 ) ライセンス: Link先を確認	Tian Liu, Huixin Zhang, Shubham Parashar, Shu Kong,	(参考訳) Few-Shot認識は、下流タスクでアノテーションのコストが高くなりうる状況において、事前定義された概念の少数のラベル付き例だけで分類モデルをトレーニングすることを目的としている。関連する別の研究領域では、下流タスクのデータへのアクセスを前提としないゼロショット認識が、事前訓練されたビジョン・ランゲージモデル(VLM)を用いて大幅に進歩している。この領域において、検索強化学習(RAL)は、下流の概念に関連する外部データを検索して学習することにより、ゼロショット精度を効果的に向上する。これらの進歩に動機づけられ、我々の研究はFew-Shot認識のためのRALを探索する。一見単純に見える一方で、これまで文献ではあまり研究されておらず(今までは!)、我々はFew-Shot認識にRALを適用する際の新しい課題と機会を提示する。第一に、おそらく意外なことに、検索した大量のデータでVLMを単純に微調整しても、検索データの不均衡な分布と、Few-Shotのアノテーション付きデータとのドメインギャップのため、最先端のゼロショット法をかろうじて上回るにすぎない。第二に、Few-Shotの例だけでVLMを微調整することは、以前の方法よりも大幅に優れており、検索したデータとFew-Shotデータを混合した微調整ではさらに優れた結果が得られる。第三に,不均衡分布とドメインギャップの問題を軽減するために,第1段階で混合データに対してエンドツーエンドの微調整を行い,第2段階でFew-Shotデータのみに基づいて分類器を再訓練するSWAT法を提案する。広範な実験により、SWATは標準ベンチマークデータセット上で最高のパフォーマンスを達成し、先行研究を精度で約10%上回ることが示された。コードはhttps://github.com/tian1327/SWATで入手できる。 Few-shot recognition aims to train a classification model with only a few labeled examples of pre-defined concepts, where annotation can be costly in a downstream task. In another related research area, zero-shot recognition, which assumes no access to any downstream-task data, has been greatly advanced by using pretrained Vision-Language Models (VLMs). In this area, retrieval-augmented learning (RAL) effectively boosts zero-shot accuracy by retrieving and learning from external data relevant to downstream concepts. Motivated by these advancements, our work explores RAL for few-shot recognition. While seemingly straightforward despite being under-explored in the literature (till now!), we present novel challenges and opportunities for applying RAL for few-shot recognition. First, perhaps surprisingly, simply finetuning the VLM on a large amount of retrieved data barely surpasses state-of-the-art zero-shot methods due to the imbalanced distribution of retrieved data and its domain gaps compared to few-shot annotated data.
Second, finetuning a VLM on few-shot examples alone significantly outperforms prior methods, and finetuning on the mix of retrieved and few-shot data yields even better results. Third, to mitigate the imbalanced distribution and domain gap issue, we propose Stage-Wise Augmented fineTuning (SWAT) method, which involves end-to-end finetuning on mixed data for the first stage and retraining the classifier solely on the few-shot data in the second stage. Extensive experiments show that SWAT achieves the best performance on standard benchmark datasets, resoundingly outperforming prior works by ~10% in accuracy. Code is available at https://github.com/tian1327/SWAT.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
# GoldCoin: コンテキスト整合性理論によるプライバシー法則の大規模言語モデル構築 GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory ( http://arxiv.org/abs/2406.11149v1 ) ライセンス: Link先を確認	Wei Fan, Haoran Li, Zheye Deng, Weiqi Wang, Yangqiu Song,	(参考訳) プライバシ問題は、エンティティ間の情報の不適切な送信中に顕著に発生する。既存の研究は、様々なプライバシー攻撃、防衛、評価を狭義に定義されたパターンの中で探求し、プライバシーは伝統的に機密性の高いデータ(社会保障番号など)に限られる孤立した文脈のない概念ではなく、潜在的なプライバシー侵害の識別と分析を複雑化する複雑な社会的コンテキストに絡み合っていることを無視する。 LLM(Large Language Models)の出現は、これらの複雑なプライバシー問題に対処するために、プライバシー法で概説された曖昧なシナリオを取り入れる前例のない機会を提供する。しかし、オープンソースに関するケーススタディの欠如は、特定の法規に適合するLLMの効率を制限している。この課題に対処するため,我々は,プライバシー侵害を評価する司法法において,LLMを効果的に活用するための新しい枠組みであるGoldCoinを紹介した。我々のフレームワークは、コンテキスト整合性の理論をブリッジとして活用し、関連するプライバシー規則(例えばHIPAA)に基づく多数の合成シナリオを作成し、LLMが現実世界のプライバシーリスクを特定する複雑なコンテキストを理解するのを支援する。広範な実験結果から、ゴールドコインは、現実の裁判所におけるプライバシーリスクを認識し、異なる司法業務のベースラインを超越するLLMの能力を著しく向上することが示された。 Privacy issues arise prominently during the inappropriate transmission of information between entities. Existing research primarily studies privacy by exploring various privacy attacks, defenses, and evaluations within narrowly predefined patterns, while neglecting that privacy is not an isolated, context-free concept limited to traditionally sensitive data (e.g., social security numbers), but intertwined with intricate social contexts that complicate the identification and analysis of potential privacy violations. The advent of Large Language Models (LLMs) offers unprecedented opportunities for incorporating the nuanced scenarios outlined in privacy laws to tackle these complex privacy issues. However, the scarcity of open-source relevant case studies restricts the efficiency of LLMs in aligning with specific legal statutes. To address this challenge, we introduce a novel framework, GoldCoin, designed to efficiently ground LLMs in privacy laws for judicial assessing privacy violations. 
Our framework leverages the theory of contextual integrity as a bridge, creating numerous synthetic scenarios grounded in relevant privacy statutes (e.g., HIPAA), to assist LLMs in comprehending the complex contexts for identifying privacy risks in the real world. Extensive experimental results demonstrate that GoldCoin markedly enhances LLMs' capabilities in recognizing privacy risks across real court cases, surpassing the baselines on different judicial tasks.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
# 機械学習のためのランダム化数値線形代数の最近の動向と今後の展開 Recent and Upcoming Developments in Randomized Numerical Linear Algebra for Machine Learning ( http://arxiv.org/abs/2406.11151v1 ) ライセンス: Link先を確認	Michał Dereziński, Michael W. Mahoney,	(参考訳) 大規模な行列は、データセット、グラフ、モデルウェイト、第1および第2階微分の表現など、多くの機械学習およびデータ分析アプリケーションで発生する。 RandNLA (Randomized Numerical Linear Algebra) は、ランダムネスを用いてユビキタス行列問題に対する改良アルゴリズムを開発する分野である。この領域は一定の成熟度に達しているが、最近のハードウェアのトレンド、RandNLAアルゴリズムを核となる数値ライブラリに組み込む取り組み、機械学習、統計学、ランダム行列理論の進歩は、新たな理論的および実践的な課題をもたらしている。この記事では、これらの開発状況を踏まえた自己完結したRandNLAの概要を紹介する。 Large matrices arise in many machine learning and data analysis applications, including as representations of datasets, graphs, model weights, and first and second-order derivatives. Randomized Numerical Linear Algebra (RandNLA) is an area which uses randomness to develop improved algorithms for ubiquitous matrix problems. The area has reached a certain level of maturity; but recent hardware trends, efforts to incorporate RandNLA algorithms into core numerical libraries, and advances in machine learning, statistics, and random matrix theory, have led to new theoretical and practical challenges. This article provides a self-contained overview of RandNLA, in light of these developments.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
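As a small illustration of how randomness can stand in for an exact matrix computation, the sketch below estimates the trace of a matrix using only matrix-vector products with random sign vectors (Hutchinson's estimator). It is a toy chosen for this listing and is not taken from the article; the names `matvec` and `hutchinson_trace`, the sample count, and the seed are all assumptions made for the example.

```python
import random


def matvec(a, v):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def hutchinson_trace(a, num_samples=2000, seed=0):
    """Estimate trace(a) as the average of z^T a z over random sign vectors z.

    Each z has independent +1/-1 entries, so the expected value of z^T a z
    equals the trace; off-diagonal entries only contribute noise.
    """
    rng = random.Random(seed)
    n = len(a)
    total = 0.0
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        az = matvec(a, z)
        total += sum(z_i * az_i for z_i, az_i in zip(z, az))
    return total / num_samples


if __name__ == "__main__":
    a = [[2.0, 0.1, 0.1], [0.1, 3.0, 0.1], [0.1, 0.1, 4.0]]
    print("exact trace:", sum(a[i][i] for i in range(len(a))))
    print("estimate:   ", hutchinson_trace(a))
```

The estimator touches the matrix only through `matvec`, which is why this family of methods is attractive when the matrix is available only implicitly; for a small dense matrix like the one above, summing the diagonal directly is of course simpler.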
# DELRec:LLMをベースとしたレコメンデーションを強化するためのシーケンシャルパターンの蒸留 DELRec: Distilling Sequential Pattern to Enhance LLM-based Recommendation ( http://arxiv.org/abs/2406.11156v1 ) ライセンス: Link先を確認	Guohao Sun, Haoyi Zhang,	(参考訳) 逐次レコメンデーション(SR)タスクは、ユーザの過去のインタラクションと好みの変化との関連を捉えることによって、レコメンデーションの精度を高める。従来のモデルは、トレーニングデータ内のシーケンシャルなパターンをキャプチャすることだけに集中し、外部ソースに由来するアイテムタイトルに埋め込まれた、より広いコンテキストやセマンティックな情報を無視することが多い。これにより、予測力と適応性が制限される。近年,大規模言語モデル (LLM) は,高度な理解能力と強力な一般化能力により,SRタスクにおいて有望であることが示されている。研究者はSRモデルからの情報を取り入れることでLLMの推薦性能を向上しようと試みている。しかし、従来のアプローチは次のような問題に直面している。 1) LLMに結果レベルでしか影響を与えない。 2) LLM推薦手法の複雑さが増し、解釈可能性が低下する。 3) LLMによるSRモデル情報の理解と利用が不完全である。これらの問題に対処するために,SRモデルから知識を抽出し,LLMがこれらの補足情報を容易に理解し,より効果的な逐次レコメンデーションに活用できるようにすることを目的とした,新しいフレームワークDELRecを提案する。 DELRecは2つの主要なステージから構成される。 1) SRモデルパターン蒸留 : 2つのよく設計された戦略を通じてソフトプロンプトを用い、SRモデルが示す行動パターンを抽出することに焦点を当てる。 2) LLMに基づく逐次レコメンデーション : 蒸留された補助情報を効果的に活用してSRタスクを遂行できるようにLLMを微調整することを目的とする。 3つの実データセットで実施された広範な実験結果から,DELRecフレームワークの有効性が検証された。 Sequential recommendation (SR) tasks enhance recommendation accuracy by capturing the connection between users' past interactions and their changing preferences. Conventional models often focus solely on capturing sequential patterns within the training data, neglecting the broader context and semantic information embedded in item titles from external sources. This limits their predictive power and adaptability. Recently, large language models (LLMs) have shown promise in SR tasks due to their advanced understanding capabilities and strong generalization abilities. Researchers have attempted to enhance LLMs' recommendation performance by incorporating information from SR models. However, previous approaches have encountered problems such as 1) only influencing LLMs at the result level; 2) increased complexity of LLMs recommendation methods leading to reduced interpretability; 3) incomplete understanding and utilization of SR models information by LLMs.
To address these problems, we propose a novel framework, DELRec, which aims to extract knowledge from SR models and enable LLMs to easily comprehend and utilize this supplementary information for more effective sequential recommendations. DELRec consists of two main stages: 1) SR Models Pattern Distilling, focusing on extracting behavioral patterns exhibited by SR models using soft prompts through two well-designed strategies; 2) LLMs-based Sequential Recommendation, aiming to fine-tune LLMs to effectively use the distilled auxiliary information to perform SR tasks. Extensive experimental results conducted on three real datasets validate the effectiveness of the DELRec framework.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
# DeFiGuard: グラフニューラルネットワークを用いたDeFiの価格操作検出サービス DeFiGuard: A Price Manipulation Detection Service in DeFi using Graph Neural Networks ( http://arxiv.org/abs/2406.11157v1 ) ライセンス: Link先を確認	Dabao Wang, Bang Wu, Xingliang Yuan, Lei Wu, Yajin Zhou, Helei Cui,	(参考訳) 分散ファイナンス(DeFi)の繁栄は、その根底にあるリスクを明らかにしており、分散アプリケーション(DApps)の脆弱性により、2018年から2022年にかけて報告された損失は32億米ドルを超えている。重要な脅威の1つは、取引実行中に資産価格を変更する価格操作攻撃(PMA)である。PMAによる損失は5000万米ドルを超える。効率的なPMA検出という緊急のニーズに対応するため、本稿では,グラフニューラルネットワーク(GNN)を用いた新しい検出サービスであるDeFiGuardを紹介する。本稿では、トランザクションから取引行動を捉える、4つの異なる特徴を持つキャッシュフローグラフを提案する。さらに、DeFiGuardはトランザクション解析、グラフ構築、モデルトレーニング、PMA検出を統合している。 208件のPMAトランザクションと2,080件の非PMAトランザクションからなるデータセットでの評価によると、GNNモデルを用いたDeFiGuardは、精度、TPR、FPR、AUC-ROCにおいてベースラインを上回っている。アブレーション研究の結果,提案した4つのノード特徴の組み合わせがDeFiGuardの有効性を高めることが示唆された。さらに、DeFiGuardは0.892秒から5.317秒以内にトランザクションを分類し、被害者(DAppsやユーザー)が脆弱な資金を救うために行動を起こすのに十分な時間を提供している。結論として、本研究は、GNNを用いてPMAからDeFiランドスケープを保護するための重要なステップを提供する。 The prosperity of Decentralized Finance (DeFi) unveils underlying risks, with reported losses surpassing 3.2 billion USD between 2018 and 2022 due to vulnerabilities in Decentralized Applications (DApps). One significant threat is the Price Manipulation Attack (PMA) that alters asset prices during transaction execution. As a result, PMA accounts for over 50 million USD in losses. To address the urgent need for efficient PMA detection, this paper introduces a novel detection service, DeFiGuard, using Graph Neural Networks (GNNs). In this paper, we propose cash flow graphs with four distinct features, which capture the trading behaviors from transactions. Moreover, DeFiGuard integrates transaction parsing, graph construction, model training, and PMA detection. Evaluations on a dataset of 208 PMA and 2,080 non-PMA transactions show that DeFiGuard with GNN models outperforms the baseline in Accuracy, TPR, FPR, and AUC-ROC. The results of ablation studies suggest that the combination of the four proposed node features enhances DeFiGuard's efficacy.
Moreover, DeFiGuard classifies transactions within 0.892 to 5.317 seconds, which provides sufficient time for the victims (DApps and users) to take action to rescue their vulnerable funds. In conclusion, this research offers a significant step towards safeguarding the DeFi landscape from PMAs using GNNs.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
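To make the reported metrics concrete, the following sketch computes accuracy, TPR, and FPR from confusion-matrix counts. Only the class sizes (208 PMA and 2,080 non-PMA transactions) come from the abstract; the function name `detection_metrics` and the "always predict non-PMA" detector are hypothetical and do not reflect any result in the paper.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return accuracy, true-positive rate, and false-positive rate.

    tp/fn count attack (PMA) transactions that were caught/missed;
    fp/tn count benign transactions that were flagged/passed.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical detector that never raises an alarm on a 208 / 2,080 split.
    always_benign = detection_metrics(tp=0, fn=208, fp=0, tn=2080)
    print(always_benign)
```

Because the two classes are imbalanced by a factor of ten, a detector that never flags anything still scores about 91% accuracy while catching no attacks, which is presumably one reason the evaluation reports TPR, FPR, and AUC-ROC alongside accuracy.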
# Stalenessを伴う分散確率的勾配降下法:確率遅延微分方程式に基づくフレームワーク Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework ( http://arxiv.org/abs/2406.11159v1 ) ライセンス: Link先を確認	Siyuan Yu, Wei Chen, H. Vincent Poor,	(参考訳) 分散確率的勾配降下法(SGD)は、計算リソースのスケーリング、トレーニング時間の短縮、機械学習におけるユーザのプライバシ保護の支援が期待できることから、近年注目されている。しかし、スタガーと帯域幅の制限はランダムな計算/通信遅延を引き起こす可能性があり、学習プロセスを著しく妨げる。したがって、複数のワーカーを効率的にスケジューリングすることで非同期SGDをいかに加速するかが重要な問題である。本稿では,確率的遅延微分方程式(SDDE)と集約された勾配到着のポアソン近似に基づいて,非同期SGDの収束を解析・最適化する統一的な枠組みを提示する。特に,計算時間に対する無記憶性の仮定なしで,分散SGDの実行時間とstaleness(勾配の古さ)を示す。学習率が与えられたとき, 関連するSDDEの減衰係数とその遅延統計を, アクティベートされたクライアント数, stalenessしきい値, 目的関数のヘッセン行列の固有値, 全体的な計算/通信遅延の関数として明らかにする。定式化されたSDDEにより,特性根を計算することで分散SGDの収束条件と速度の両方を示すことができ,それによって非同期/イベントトリガーSGDのスケジューリングポリシを最適化できる。興味深いことに、stalenessのため、アクティベートされたワーカー数を増やしても必ずしも分散SGDは加速しないことが示される。さらに、小さなstalenessは必ずしも収束を遅くしないが、大きなstalenessは分散SGDの発散をもたらす。数値結果は、非凸目的関数を持つ複雑な学習タスクにおいても,SDDEフレームワークの可能性を示す。 Distributed stochastic gradient descent (SGD) has attracted considerable recent attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in machine learning. However, the staggers and limited bandwidth may induce random computational/communication delays, thereby severely hindering the learning process. Therefore, how to accelerate asynchronous SGD by efficiently scheduling multiple workers is an important issue. In this paper, a unified framework is presented to analyze and optimize the convergence of asynchronous SGD based on stochastic delay differential equations (SDDEs) and the Poisson approximation of aggregated gradient arrivals. In particular, we present the run time and staleness of distributed SGD without a memorylessness assumption on the computation times. Given the learning rate, we reveal the relevant SDDE's damping coefficient and its delay statistics, as functions of the number of activated clients, staleness threshold, the eigenvalues of the Hessian matrix of the objective function, and the overall computational/communication delay.
The formulated SDDE allows us to present both the distributed SGD's convergence condition and speed by calculating its characteristic roots, thereby optimizing the scheduling policies for asynchronous/event-triggered SGD. It is interestingly shown that increasing the number of activated workers does not necessarily accelerate distributed SGD due to staleness. Moreover, a small degree of staleness does not necessarily slow down the convergence, while a large degree of staleness will result in the divergence of distributed SGD. Numerical results demonstrate the potential of our SDDE framework, even in complex learning tasks with non-convex objective functions.	翻訳日:2024-06-18 18:43:55 公開日:2024-06-17
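The qualitative claim about staleness can be illustrated with a deliberately simple, deterministic toy: gradient descent on a one-dimensional quadratic where each update uses a gradient that is a fixed number of steps old. This is a sketch under stated assumptions, not the SDDE analysis from the paper; the function name `delayed_gradient_descent`, the step size 0.2, and the delays 0, 1, and 20 are illustrative choices.

```python
def delayed_gradient_descent(step, delay, num_steps, curvature=1.0, x0=1.0):
    """Minimize f(x) = 0.5 * curvature * x**2 with gradients `delay` steps old.

    The update is x[t+1] = x[t] - step * curvature * x[t - delay].
    Iterates before step 0 are taken to be x0. Returns [x[0], ..., x[num_steps]].
    """
    history = [x0] * (delay + 1)  # history[-1] is always the current iterate
    for _ in range(num_steps):
        stale_x = history[-(delay + 1)]  # point where the stale gradient was computed
        stale_grad = curvature * stale_x
        history.append(history[-1] - step * stale_grad)
    return history[delay:]  # drop the artificial pre-history


if __name__ == "__main__":
    for delay, steps in ((0, 50), (1, 50), (20, 400)):
        trajectory = delayed_gradient_descent(step=0.2, delay=delay, num_steps=steps)
        tail = max(abs(x) for x in trajectory[-70:])
        print(f"delay={delay:2d} steps={steps:3d} largest |x| near the end: {tail:.3e}")
```

With these illustrative values, a delay of one step still converges, while a delay of 20 steps makes the iterates oscillate with growing amplitude. The real setting in the paper has random delays, many workers, and stochastic gradients, none of which this toy models.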
# 三重項を超えて動く:コンテキスト知識グラフの表現と推論 Move Beyond Triples: Contextual Knowledge Graph Representation and Reasoning ( http://arxiv.org/abs/2406.11160v1 ) ライセンス: Link先を確認	Chengjin Xu, Muzhi Li, Cehao Yang, Xuhui Jiang, Lumingyuan Tang, Yiyan Qi, Jian Guo,	(参考訳) 知識グラフ(KG)は多くのAIアプリケーションの基本構造であり、エンティティとその相互関係を三重項によって表す。しかし、三重項ベースのKGは、包括的な知識表現と効果的な推論に不可欠な、時間的ダイナミクスや来歴の詳細といった関係知識の文脈情報を欠いている。これに対し、コンテキスト知識グラフ(CKG)は、時間的妥当性、地理的な位置、情報源の来歴といった付加的な情報を組み込むことで、従来の構造を拡張する。この統合により、知識のより微妙で正確な理解が得られ、KGはより豊かな洞察を提供し、より洗練された推論プロセスをサポートすることができる。本稿ではまず,三重項に基づくKGの本質的限界について論じ,コンテキストKGの概念を導入し,知識表現と推論における利点を強調する。次に、大規模言語モデル(LLM)を活用して、候補エンティティと関連するコンテキストを検索し、検索した情報に基づいてそれらをランク付けし、クエリに応答するのに十分な情報が得られたかどうかを推論する、コンテキスト強化KG推論パラダイムであるKGR$^3$を提示する。実験の結果、KGR$^3$はKG補完(KGC)およびKG質問応答(KGQA)タスクの性能を大幅に向上させ、KG表現と推論に文脈情報を組み込むことの有効性を検証した。 Knowledge Graphs (KGs) are foundational structures in many AI applications, representing entities and their interrelations through triples. However, triple-based KGs lack the contextual information of relational knowledge, like temporal dynamics and provenance details, which are crucial for comprehensive knowledge representation and effective reasoning. Instead, Contextual Knowledge Graphs (CKGs) expand upon the conventional structure by incorporating additional information such as time validity, geographic location, and source provenance. This integration provides a more nuanced and accurate understanding of knowledge, enabling KGs to offer richer insights and support more sophisticated reasoning processes. In this work, we first discuss the inherent limitations of triple-based KGs and introduce the concept of contextual KGs, highlighting their advantages in knowledge representation and reasoning. We then present KGR$^3$, a context-enriched KG reasoning paradigm that leverages large language models (LLMs) to retrieve candidate entities and related contexts, rank them based on the retrieved information, and reason whether sufficient information has been obtained to answer a query.
Our experimental results demonstrate that KGR$^3$ significantly improves performance on KG completion (KGC) and KG question answering (KGQA) tasks, validating the effectiveness of incorporating contextual information on KG representation and reasoning.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# Emotion-LLaMA:マルチモーダル感情認識とインストラクションチューニングによる推論 Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning ( http://arxiv.org/abs/2406.11161v1 ) ライセンス: Link先を確認	Zebang Cheng, Zhi-Qi Cheng, Jun-Yan He, Jingdong Sun, Kai Wang, Yuxiang Lin, Zheng Lian, Xiaojiang Peng, Alexander Hauptmann,	(参考訳) 正確な感情知覚は、人間とコンピュータの相互作用、教育、カウンセリングなど、様々な用途に欠かせない。しかし、伝統的な単一モダリティアプローチは、本質的にマルチモーダルである実世界の感情表現の複雑さを捉えるのに失敗することが多い。さらに、既存のMLLM(Multimodal Large Language Models)は、音声の統合と微妙な顔のマイクロ表現の認識において、課題に直面している。そこで本研究では,多様な感情カテゴリにわたる28,618件の粗粒度アノテーション付きサンプルと4,487件の細粒度アノテーション付きサンプルを含むMERRデータセットを提案する。このデータセットにより、モデルはさまざまなシナリオから学習し、現実のアプリケーションに一般化できる。さらに,感情特異的エンコーダによって音声,視覚,テキスト入力をシームレスに統合するモデルであるEmotion-LLaMAを提案する。特徴を共有空間に整列させ、命令チューニングを施した改良LLaMAモデルを使用することで、Emotion-LLaMAは感情認識と推論能力の両方を大幅に強化する。 広範な評価により、Emotion-LLaMAは他のMLLMよりも優れており、EMERではClue Overlap (7.83) と Label Overlap (6.25) で最高スコア、MER2023チャレンジではF1スコア0.9036、DFEWデータセットでのゼロショット評価では最高のUAR (45.59) と WAR (59.37) を達成することが示された。 Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing subtle facial micro-expressions. To address this, we introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories. This dataset enables models to learn from varied scenarios and generalize to real-world applications. Furthermore, we propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders. By aligning features into a shared space and employing a modified LLaMA model with instruction tuning, Emotion-LLaMA significantly enhances both emotional recognition and reasoning capabilities.
Extensive evaluations show that Emotion-LLaMA outperforms other MLLMs, achieving top scores in Clue Overlap (7.83) and Label Overlap (6.25) on EMER, an F1 score of 0.9036 on the MER2023 challenge, and the highest UAR (45.59) and WAR (59.37) in zero-shot evaluations on the DFEW dataset.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# 低資源シナリオ下での関係抽出においてLLMはどの程度優れているか? 総合的評価 How Good are LLMs at Relation Extraction under Low-Resource Scenario? Comprehensive Evaluation ( http://arxiv.org/abs/2406.11162v1 ) ライセンス: Link先を確認	Dawulie Jinensibieke, Mieradilijiang Maimaiti, Wentao Xiao, Yuanhang Zheng, Xiangbo Wang,	(参考訳) 関係抽出(RE)は、特に知識グラフ開発の枠組みにおいて、構造化されていないテキストを構造化情報に変換する重要な技術として機能する。その重要性は、下流の様々なタスクにおける重要な役割によって強調されている。ニューラルネットワークと事前学習言語モデルに基づく従来のRE法に加えて、大規模言語モデル(LLM)もREの研究分野で活用されている。しかし、低リソース言語(LRL)では、データ不足の問題により、従来のRE法とLLMベースの手法の両方でREの性能が低い。そこで本研究では,3つの地域(中央アジア,東南アジア,中東)の10のLRLで低リソース関係抽出データセットを構築する。コーパスは、有効な多言語機械翻訳を使用して、オリジナルの公開可能な英語REデータセット(NYT10、FewRel、CrossRE)を翻訳することで構築される。次に、言語パープレキシティ(PPL)を使用して、翻訳されたデータセットから低品質データをフィルタリングする。最後に、これらの生成されたLRL REデータセット上で、実験的な研究を行い、複数のオープンソースLLMの性能を検証した。 Relation Extraction (RE) serves as a crucial technology for transforming unstructured text into structured information, especially within the framework of Knowledge Graph development. Its importance is emphasized by its essential role in various downstream tasks. Besides the conventional RE methods, which are based on neural networks and pre-trained language models, large language models (LLMs) are also utilized in the research field of RE. However, on low-resource languages (LRLs), both conventional RE methods and LLM-based methods perform poorly on RE due to data scarcity. To this end, this paper constructs low-resource relation extraction datasets in 10 LRLs in three regions (Central Asia, Southeast Asia and Middle East). The corpora are constructed by translating the original publicly available English RE datasets (NYT10, FewRel and CrossRE) using effective multilingual machine translation. Then, we use the language perplexity (PPL) to filter out the low-quality data from the translated datasets. Finally, we conduct an empirical study and validate the performance of several open-source LLMs on these generated LRL RE datasets.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# 分散線形二次最適制御のための2時間スケール最適化フレームワーク Two-Timescale Optimization Framework for Decentralized Linear-Quadratic Optimal Control ( http://arxiv.org/abs/2406.11168v1 ) ライセンス: Link先を確認	Lechen Feng, Yuan-Hua Ni, Xuebo Zhang,	(参考訳) 本研究では, 分散線形二次最適制御問題について検討し, スパース性促進関数の選択に基づいて, いくつかの近似的な分離可能制約付き最適化問題を初めて定式化する。まず、重み付き$\ell_1$スパース性促進関数を持つ最適化問題に対して、BSUM(Block Successive Upper-bound Minimization)フレームワークと微分方程式ソルバに基づく2時間スケールのアルゴリズムを採用する。第2に、区分的二次スパース性促進関数を導入し、誘導される最適化問題は、同じ2時間スケールのアルゴリズムを実行することにより、加速された収束率を示す。最後に、非凸かつ不連続である$\ell_0$スパース性促進関数を持つ最適化問題を考察し、これが逐次的な座標ごとの凸最適化問題によって近似できることを示す。 This study investigates a decentralized linear-quadratic optimal control problem, and several approximate separable constrained optimization problems are formulated for the first time based on the selection of sparsity promoting functions. First, for the optimization problem with a weighted $\ell_1$ sparsity promoting function, a two-timescale algorithm is adopted that is based on the BSUM (Block Successive Upper-bound Minimization) framework and a differential equation solver. Second, a piecewise quadratic sparsity promoting function is introduced, and the induced optimization problem demonstrates an accelerated convergence rate by performing the same two-timescale algorithm. Finally, the optimization problem with the $\ell_0$ sparsity promoting function, which is nonconvex and discontinuous, is considered and can be approximated by successive coordinatewise convex optimization problems.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# SUGARCREPE++データセット:意味的および語彙的変化に対する視覚言語モデル感度 SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations ( http://arxiv.org/abs/2406.11171v1 ) ライセンス: Link先を確認	Sri Harsha Dumpala, Aman Jaiswal, Chandramouli Sastry, Evangelos Milios, Sageev Oore, Hassan Sajjad,	(参考訳) 彼らの顕著な成功にもかかわらず、ビジョン・アンド・ランゲージモデル(VLM)やユニモーダル言語モデル(ULM)を含む最先端の大規模言語モデル(LLM)は、正確な意味論を理解できない。例えば、意味的に等価な文は、異なる語彙合成を用いて表現され、発散する表現が引き起こされる。この分岐の程度と、そのエンコードされた意味論への影響は、あまりよく理解されていない。本稿では,語彙や意味の変化に対する VLM と ULM の感度を解析するためのSUGARCREPE++ データセットを提案する。 SUGARCREPE++データセットの各サンプルは、画像と対応する3つの字幕で構成されている。これは言語モデルに3方向のセマンティックな(同値な)問題を引き起こす。我々は,SUGARCREPE++データセットの性能をベンチマークするために,アーキテクチャ,事前学習対象,データセットが異なるVLMとULMを総合的に評価する。実験結果は,特に対象属性と空間的関係において,語彙と意味の差異を区別する上で,VLMの難しさを浮き彫りにした。より大規模な事前トレーニングデータセット、モデルサイズ、複数の事前トレーニング目標を持つVLMは、SUGARCREPE++のパフォーマンスが向上するが、改善の余地は大きい。構成性データセットの性能を向上するすべてのモデルがSUGARCREPE++上で同等に機能する必要はないことを示し、構成性だけでは意味論と語彙的変化を理解するには不十分であることを示す。 SUGARCREPE++データセットがターゲットとするプロパティの重要性を考えると、これはビジョンと言語コミュニティにとって新たな課題となる。 Despite their remarkable successes, state-of-the-art large language models (LLMs), including vision-and-language models (VLMs) and unimodal language models (ULMs), fail to understand precise semantics. For example, semantically equivalent sentences expressed using different lexical compositions elicit diverging representations. The degree of this divergence and its impact on encoded semantics is not very well understood. In this paper, we introduce the SUGARCREPE++ dataset to analyze the sensitivity of VLMs and ULMs to lexical and semantic alterations. Each sample in SUGARCREPE++ dataset consists of an image and a corresponding triplet of captions: a pair of semantically equivalent but lexically different positive captions and one hard negative caption. This poses a 3-way semantic (in)equivalence problem to the language models. 
We comprehensively evaluate VLMs and ULMs that differ in architecture, pre-training objectives and datasets to benchmark their performance on the SUGARCREPE++ dataset. Experimental results highlight the difficulties of VLMs in distinguishing between lexical and semantic variations, particularly in object attributes and spatial relations. Although VLMs with larger pre-training datasets, model sizes, and multiple pre-training objectives achieve better performance on SUGARCREPE++, there is significant room for improvement. We show that models which achieve better performance on compositionality datasets do not necessarily perform equally well on SUGARCREPE++, signifying that compositionality alone may not be sufficient for understanding semantic and lexical alterations. Given the importance of the property that the SUGARCREPE++ dataset targets, it serves as a new challenge to the vision-and-language community.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# 異なる法的要因による刑事事件の一致の促進 Enhancing Criminal Case Matching through Diverse Legal Factors ( http://arxiv.org/abs/2406.11172v1 ) ライセンス: Link先を確認	Jie Zhao, Ziyu Guan, Wei Zhao, Yue Jiang,	(参考訳) 異なる刑事事件間の関連性を決定するための刑事事件マッチングの試み。従来の手法では、インスタンスレベルの意味的特徴のみに基づいて関連性を予測し、さまざまな裁判所判断に関連するさまざまな法的要因(LF)を無視している。したがって、刑事事件を包括的に表現することは、これらのアプローチの課題である。 1) LFのマニュアルアノテーションは専門的な法的知識に大きく依存している; 2) LF間の重複はモデルの性能を損なう可能性がある。本稿では,2段階の枠組みであるDiverse Legal Factor-enhanced Criminal Case Matching (DLF-CCM)を提案する。まず、DLF-CCMは、大規模な法的判断予測データセット上でLF抽出ネットワークを事前訓練するために、マルチタスク学習フレームワークを使用する。ステージ2では、DLF-CCMがLF脱冗長モジュールを導入し、共有LFと排他LFを学習する。さらに、全てのLFが生成する多重関係を動的に融合するためにエントロピー重畳融合戦略を導入する。実験の結果, DLF-CCMの有効性が検証され, 競争ベースラインよりも有意な改善が認められた。コード:https://github.com/jiezhao6/DLF-CCM。 Criminal case matching endeavors to determine the relevance between different criminal cases. Conventional methods predict the relevance solely based on instance-level semantic features and neglect the diverse legal factors (LFs), which are associated with diverse court judgments. Consequently, comprehensively representing a criminal case remains a challenge for these approaches. Moreover, extracting and utilizing these LFs for criminal case matching face two challenges: (1) the manual annotations of LFs rely heavily on specialized legal knowledge; (2) overlaps among LFs may potentially harm the model's performance. In this paper, we propose a two-stage framework named Diverse Legal Factor-enhanced Criminal Case Matching (DLF-CCM). Firstly, DLF-CCM employs a multi-task learning framework to pre-train an LF extraction network on a large-scale legal judgment prediction dataset. In stage two, DLF-CCM introduces an LF de-redundancy module to learn shared LF and exclusive LFs. Moreover, an entropy-weighted fusion strategy is introduced to dynamically fuse the multiple relevance generated by all LFs. Experimental results validate the effectiveness of DLF-CCM and show its significant improvements over competitive baselines. 
Code: https://github.com/jiezhao6/DLF-CCM.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# BSRBF-KAN:コルモゴロフ・アーノルドネットワークにおけるB-スプラインと放射基本関数の組み合わせ BSRBF-KAN: A combination of B-splines and Radial Basic Functions in Kolmogorov-Arnold Networks ( http://arxiv.org/abs/2406.11173v1 ) ライセンス: Link先を確認	Hoang-Thang Ta,	(参考訳) 本稿では,B-スプラインと放射基底関数 (RBF) を組み合わせ、データ学習において入力ベクトルに適合させるコルモゴロフ・アーノルドネットワーク (KAN) である BSRBF-KAN を紹介する。我々は、MNISTデータセット上で、BSRBF-KAN、MLP、およびEfficientKAN、FastKAN、FasterKAN、GottliebKANなどの人気のあるKANを用いて実験を行った。 BSRBF-KANは、5回のトレーニングにおいて平均精度97.55%という競争力のある結果で安定性を示し、他のネットワークよりも良好に収束した。我々は,BSRBF-KANがKANを設計するための数学関数の組み合わせを数多く切り開くことを期待する。私たちのリポジトリは、https://github.com/hoangthangta/BSRBF-KAN で公開されています。 In this paper, we introduce BSRBF-KAN, a Kolmogorov-Arnold Network (KAN) that combines B-splines and radial basis functions (RBFs) to fit input vectors during training. We perform experiments with BSRBF-KAN, MLP, and other popular KANs, including EfficientKAN, FastKAN, FasterKAN, and GottliebKAN, on the MNIST dataset. BSRBF-KAN shows stability across 5 training runs with a competitive average accuracy of 97.55% and converges better than other networks. We expect that BSRBF-KAN can open up many combinations of mathematical functions for designing KANs. Our repo is publicly available at: https://github.com/hoangthangta/BSRBF-KAN.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# ステップごとに見て! 反復的なステップレベルプロセスリファインメントによるLLMエージェント学習 Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement ( http://arxiv.org/abs/2406.11176v1 ) ライセンス: Link先を確認	Weimin Xiong, Yifan Song, Xiutian Zhao, Wenhao Wu, Xun Wang, Ke Wang, Cheng Li, Wei Peng, Sujian Li,	(参考訳) 大規模言語モデルエージェントは、様々な複雑な対話的タスクで例外的なパフォーマンスを示した。近年のアプローチでは、エージェントのパフォーマンスを向上させるために専門家の軌跡をチューニングしているが、主に結果報酬に集中しており、プロセスの監視信号がないためエラーや準最適動作につながる可能性がある。本稿では、エージェントトレーニングを強化するためのステップバイステップガイダンスを提供する、反復段階プロセスリファインメント(IPR)フレームワークについて紹介する。具体的には,ステップレベルの報酬を推定するためにモンテカルロ法を用いる。各イテレーションの間、エージェントは専門家の軌道に沿って探索し、新しいアクションを生成する。これらのアクションは、ステップレベルの報酬を使用して、専門家の軌道の対応するステップに対して評価される。このような比較は、エージェントのトレーニングデータとして機能する対照的なアクションペアを生成することで、相違点の識別に役立ちます。 3つの複雑なエージェントタスクに関する我々の実験は、我々のフレームワークが様々な強力なベースラインより優れていることを示した。さらに,IPRの行動効率向上効果と多種多様なモデルへの適用性について検討した。 Large language model agents have exhibited exceptional performance across a range of complex interactive tasks. Recent approaches have utilized tuning with expert trajectories to enhance agent performance, yet they primarily concentrate on outcome rewards, which may lead to errors or suboptimal actions due to the absence of process supervision signals. In this paper, we introduce the Iterative step-level Process Refinement (IPR) framework, which provides detailed step-by-step guidance to enhance agent training. Specifically, we adopt the Monte Carlo method to estimate step-level rewards. During each iteration, the agent explores along the expert trajectory and generates new actions. These actions are then evaluated against the corresponding step of expert trajectory using step-level rewards. Such comparison helps identify discrepancies, yielding contrastive action pairs that serve as training data for the agent. Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines. 
Moreover, our analytical findings highlight the effectiveness of IPR in augmenting action efficiency and its applicability to diverse models.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# TIFG:大規模言語モデルを用いたテキストインフォームド特徴生成 TIFG: Text-Informed Feature Generation with Large Language Models ( http://arxiv.org/abs/2406.11177v1 ) ライセンス: Link先を確認	Xinhao Zhang, Jinghan Zhang, Fengran Mo, Yuzhong Chen, Kunpeng Liu,	(参考訳) データのテキスト情報は、データマイニングと機能エンジニアリングにとって極めて重要である。しかし、既存の手法では、データ構造を学習し、データとともにテキスト情報を見渡すことに重点を置いている。その結果、彼らはこの貴重なリソースを無駄にし、テキストに埋め込まれた深いデータ関係を見逃します。本稿では,新しい LLM ベースのテキストインフォームド特徴生成フレームワークである Text-Informed Feature Generation (TIFG) を紹介する。 TIFGは、テキスト情報を利用して、検索可能な拡張生成(RAG)技術を用いて、外部知識内の可能性のある機能を検索することで、特徴を生成する。このアプローチでは、TIFGは機能空間を強化し、機能関係をさらに掘り下げるために、新しい説明可能な機能を生成することができる。我々は、TIFGを機能生成プロセスを継続的に最適化し、新しいデータ入力に適応し、反復よりも下流タスクのパフォーマンスを向上させる自動化フレームワークとして設計する。様々な下流タスクにおける幅広い実験は、我々のアプローチが高品質で有意義な特徴を生み出すことができ、既存の手法よりもはるかに優れていることを示している。 Textual information of data is of vital importance for data mining and feature engineering. However, existing methods focus on learning the data structures and overlook the textual information along with the data. Consequently, they waste this valuable resource and miss out on the deeper data relationships embedded within the texts. In this paper, we introduce Text-Informed Feature Generation (TIFG), a novel LLM-based text-informed feature generation framework. TIFG utilizes the textual information to generate features by retrieving possible relevant features within external knowledge with Retrieval Augmented Generation (RAG) technology. In this approach, the TIFG can generate new explainable features to enrich the feature space and further mine feature relationships. We design the TIFG to be an automated framework that continuously optimizes the feature generation process, adapts to new data inputs, and improves downstream task performance over iterations. A broad range of experiments in various downstream tasks showcases that our approach can generate high-quality and meaningful features, and is significantly superior to existing methods.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# エネルギー拡散による反復推論の学習 Learning Iterative Reasoning through Energy Diffusion ( http://arxiv.org/abs/2406.11179v1 ) ライセンス: Link先を確認	Yilun Du, Jiayuan Mao, Joshua B. Tenenbaum,	(参考訳) 我々はエネルギー拡散による反復的推論(IRED)を導入し、エネルギーベース最適化による推論と意思決定問題を定式化し、様々なタスクの推論を学習する新しいフレームワークについて紹介する。 IREDは入力条件と所望の出力の間の制約を表現するためにエネルギー関数を学ぶ。トレーニング後、IREDは、問題の難易度に基づいて推論中に最適化ステップの数を調整し、より複雑なスドゥークパズル、大きな値の行列補完、より大きなグラフでのパスフィンディングといった、トレーニングディストリビューション外の問題を解決することができる。提案手法の成功の鍵は2つの新しい手法である: 簡易な推論のために熱処理されたエネルギー景観のシーケンスを学習することと、より速くより安定したトレーニングのためにスコア関数とエネルギー景観の監督を組み合わせることである。我々の実験によると、IREDは、特により困難なシナリオにおいて、連続空間推論、離散空間推論、計画タスクにおいて、既存のメソッドよりも優れています。 https://energy-based-model.github.io/ired/におけるコードと視覚化 We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks by formulating reasoning and decision-making problems with energy-based optimization. IRED learns energy functions to represent the constraints between input conditions and desired outputs. After training, IRED adapts the number of optimization steps during inference based on problem difficulty, enabling it to solve problems outside its training distribution -- such as more complex Sudoku puzzles, matrix completion with large value magnitudes, and pathfinding in larger graphs. Key to our method's success is two novel techniques: learning a sequence of annealed energy landscapes for easier inference and a combination of score function and energy landscape supervision for faster and more stable training. Our experiments show that IRED outperforms existing methods in continuous-space reasoning, discrete-space reasoning, and planning tasks, particularly in more challenging scenarios. Code and visualizations at https://energy-based-model.github.io/ired/	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# 技術不足地域や低資源制約地域における女性のための技術ソリューション設計に関する調査研究 An Initial Study Review of Designing a Technology Solution for Women in Technologically Deprived Areas or Low Resource Constraint Communities ( http://arxiv.org/abs/2406.11186v1 ) ライセンス: Link先を確認	Jones Yeboah, Sophia Bampoh, Annu Sible Prabhakar,	(参考訳) 西アフリカのガーナでは、うつ病は多くの女性に影響を与える重要な問題である。その重要性にもかかわらず、この問題は新型コロナウイルスのパンデミックであまり注目されなかった。先進国では、携帯電話は健康情報や提供者にアクセスするための便利な媒体として機能している。しかし、ガーナでは、女性の携帯電話へのアクセスは、文化的、社会的、経済的制約によって制限されており、メンタルヘルス情報や支援を求める能力を妨げている。ノキア3310など、不自由な地域の一部の女性はフィーチャーフォンに余裕があるが、高度なスマートフォン機能がないため、必要な健康情報へのアクセスが制限される。本稿では、これらの課題に対処するために、非構造化補助サービスデータ(USSD)技術の可能性についてレビューする。 Short Messaging Service (SMS)とは異なり、USSDはデータ収集、複雑なトランザクションを容易にし、インターネット接続を必要とせずに情報アクセスを提供する。本研究は、ガーナにおける資源不足の女性のためのメンタルヘルスリソースへのアクセスを改善するためのUSSDの利用について研究することを提案する。 In the West African country of Ghana, depression is a significant issue affecting a large number of women. Despite its importance, the issue received insufficient attention during the COVID-19 pandemic. In developed countries, mobile phones serve as a convenient medium for accessing health information and providers. However, in Ghana, women's access to mobile phones is limited by cultural, social, and financial constraints, hindering their ability to seek mental health information and support. While some women in deprived areas can afford feature phones, such as the Nokia 3310, the lack of advanced smartphone features further restricts their access to necessary health information. This paper reviews the potential of Unstructured Supplementary Service Data (USSD) technology to address these challenges. Unlike Short Messaging Service (SMS), USSD can facilitate data collection, complex transactions, and provide information access without the need for internet connectivity. This research proposes studying the use of USSD to improve access to mental health resources for resource-deprived women in Ghana.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# Save It All: サイクルブロック勾配降下法による連合大規模言語モデルのフルパラメータチューニングの実現 Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent ( http://arxiv.org/abs/2406.11187v1 ) ライセンス: Link先を確認	Lin Wang, Zhichao Wang, Xiaoying Tang,	(参考訳) 大規模言語モデル(LLM)の出現は、ディープラーニングパラダイムに革命をもたらし、幅広いタスクで印象的な結果をもたらした。しかしながら、FL(Federated Learning)フレームワーク内でのLLMの事前トレーニングや微調整は、相当な計算量やメモリリソースの要求、サーバとクライアント間の通信ボトルネックなど、重大な課題を生じさせる。既存のソリューションは、モデル全体がトレーニングのために交換されるという非現実的な仮定を置くか、あるいは集中学習のパラメータ有効微調整手法をFLでのLLM訓練に適用しているが、後者はパラメータ更新の探索部分空間が限られるため、訓練や微調整の段階で性能が劣る傾向がある。本稿では,資源消費を最小限に抑えつつ,FLにおけるLLMの学習と微調整を効率化するための新しい手法を提案する。我々のアプローチはFedCyBGDと呼ばれ、周期的にモデルを更新するためにCycle Block Gradient Descentを利用している。特に,モデルダウンロードコストをさらに削減することを目的として,FedCyBGDの圧縮スキームを設計する。これにより、選択されたブロックの更新とアップロードだけでFLの完全なパラメータトレーニングが可能になり、通信、計算、メモリコストを削減できる。本手法は,FL LLMトレーニングにおける最先端性能を実現するとともに,関連するコストを大幅に削減する。コードはここにある。 The advent of large language models (LLMs) has revolutionized the deep learning paradigm, yielding impressive results across a wide array of tasks. However, the pre-training or fine-tuning of LLMs within a federated learning (FL) framework poses substantial challenges, including considerable computational and memory resource demands, as well as communication bottlenecks between servers and clients. Existing solutions either make the unrealistic assumption that the entire model is exchanged for training, or apply parameter-effective fine-tuning methods from centralized learning to train LLMs in FL, which tend to underperform during training or fine-tuning stages due to the limited search subspace of parameter updating. In this paper, we introduce a novel method for the efficient training and fine-tuning of LLMs in FL, with minimal resource consumption. Our approach, termed FedCyBGD, utilizes Cycle Block Gradient Descent to periodically update the model. In particular, we design a compression scheme for FedCyBGD, aiming to further decrease the model download cost. 
It enables full parameter training in FL with only selected block updates and uploads, thereby reducing communication, computation, and memory costs. Our method achieves state-of-the-art performance for FL LLM training, while significantly reducing associated costs. Codes are provided here.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# Frozen CLIP: 弱教師付きセマンティックセグメンテーションのための強力なバックボーン Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation ( http://arxiv.org/abs/2406.11189v1 ) ライセンス: Link先を確認	Bingfeng Zhang, Siyue Yu, Yunchao Wei, Yao Zhao, Jimin Xiao,	(参考訳) 弱教師付きセマンティックセグメンテーションは、画像レベルのラベルで大きな成果をみせた。いくつかの最近のアプローチでは、個別のセグメンテーションモデルをトレーニングするために擬似ラベルを生成するためにCLIPモデルを使用しているが、イメージレベルのラベルでオブジェクトを直接セグメンテーションするためにCLIPモデルをバックボーンとして適用しようとする試みはない。本稿では,弱教師付きセマンティックセグメンテーションのための,CLIPをベースとした単一ステージパイプラインであるWeCLIPを提案する。具体的には、凍結したCLIPモデルを意味的特徴抽出のバックボーンとして適用し、最終的な予測のために抽出された意味的特徴を解釈する新しいデコーダを設計する。一方、上述した凍結バックボーンを用いて、デコーダのトレーニング用の擬似ラベルを生成する。このようなラベルはトレーニング中に最適化できない。そこで我々は,それらを動的に修正するための改良モジュール (RFM) を提案する。我々のアーキテクチャでは、提案されたデコーダとRFMが相互に恩恵を受け、最終的なパフォーマンスが向上する。大規模な実験により、我々のアプローチはより少ないトレーニングコストで他のアプローチを大幅に上回ることが示された。さらに、WeCLIPは完全教師付き設定でも有望な結果を得る。コードはhttps://github.com/zbf1991/WeCLIPで入手できる。 Weakly supervised semantic segmentation has witnessed great achievements with image-level labels. Several recent approaches use the CLIP model to generate pseudo labels for training an individual segmentation model, while there is no attempt to apply the CLIP model as the backbone to directly segment objects with image-level labels. In this paper, we propose WeCLIP, a CLIP-based single-stage pipeline, for weakly supervised semantic segmentation. Specifically, the frozen CLIP model is applied as the backbone for semantic feature extraction, and a new decoder is designed to interpret extracted semantic features for final prediction. Meanwhile, we utilize the above frozen backbone to generate pseudo labels for training the decoder. Such labels cannot be optimized during training. We then propose a refinement module (RFM) to rectify them dynamically. Our architecture enforces the proposed decoder and RFM to benefit from each other to boost the final performance. Extensive experiments show that our approach significantly outperforms other approaches with less training cost. 
Additionally, WeCLIP obtains promising results for fully supervised settings. The code is available at https://github.com/zbf1991/WeCLIP.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# 自己参照型AIフィードバックから大規模言語モデルを1つの原理で調整する Aligning Large Language Models from Self-Reference AI Feedback with one General Principle ( http://arxiv.org/abs/2406.11190v1 ) ライセンス: Link先を確認	Rong Bao, Rui Zheng, Shihan Dou, Xiao Wang, Enyu Zhou, Bo Wang, Qi Zhang, Liang Ding, Dacheng Tao,	(参考訳) 大規模言語モデル(LLM)の整合において、人間よりも既存の先進的AIからのフィードバックを活用することが、監視信号をスケールするための重要な方法である。しかし、AIが人間の意図や社会的価値を理解し、それらに基づいて正確な嗜好フィードバックを提供することは非常に困難である。現在のAIフィードバック手法は、強力なLLMと、人間の意図を記述するために慎重に設計された特定の原則に依存しており、位置バイアスの影響を受けやすい。これらの問題に対処するために,13BのLlama2-Chatが,「人類にとって最善」といったシンプルで一般的な原則の下で,高品質なフィードバックを提供することのできる,自己参照型AIフィードバックフレームワークを提案する。具体的には、まずAIがユーザーの指示に応答し、その後、自身の回答を基準として他の回答に対する批判を生成させ、最後に、批判に従ってどの回答が人間の好みに合うかを判断する。さらに, 自己整合性法を用いて位置バイアスの影響をさらに低減し, セマンティック・パープレキシティを用いて, 異なる回答間の選好強度の差を計算する。実験結果から,13Bと70BのLlama2-Chatアノテータで高品質な嗜好フィードバックが得られ,これらの選好データに基づいてトレーニングされたポリシーモデルは,強化学習によってベンチマークデータセットにおいて大きな利点を達成することがわかった。 In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals. However, it is highly challenging for AI to understand human intentions and societal values, and provide accurate preference feedback based on these. Current AI feedback methods rely on powerful LLMs and carefully designed specific principles to describe human intentions, and are easily influenced by position bias. To address these issues, we propose a self-reference-based AI feedback framework that enables a 13B Llama2-Chat to provide high-quality feedback under simple and general principles such as "best for humanity". Specifically, we allow the AI to first respond to the user's instructions, then generate criticism of other answers based on its own response as a reference, and finally determine which answer better fits human preferences according to the criticism. 
Additionally, we use a self-consistency method to further reduce the impact of position bias, and employ semantic perplexity to calculate the preference strength differences between different answers. Experimental results show that our method enables 13B and 70B Llama2-Chat annotators to provide high-quality preference feedback, and the policy models trained based on these preference data achieve significant advantages in benchmark datasets through reinforcement learning.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# 大規模言語モデルに対する人間の嗜好学習に関する調査研究 A Survey on Human Preference Learning for Large Language Models ( http://arxiv.org/abs/2406.11191v1 ) ライセンス: Link先を確認	Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang,	(参考訳) 近年の多目的大言語モデル(LLM)の急増は、より能力の高い基礎モデルと人間の意図との整合性に大きく依存している。関連する多くの研究にもかかわらず、人間の嗜好がどのようにLLMに導入されるかという視点は限定的であり、人間の嗜好とLLMの関係の深い理解や、その制限の実現を妨げる可能性がある。本研究では、嗜好中心の視点から、嗜好フィードバックの源泉と形式、選好信号のモデリングと利用、および、協調したLLMの評価について、人間の嗜好学習の進歩を概観する。まず、データソースとフォーマットに基づいて人間のフィードバックを分類する。次に、人間の嗜好モデリングのためのテクニックを要約し、異なるモデル流派の長所と短所を比較した。また、人間の嗜好信号を利用するために、目的によって分類された様々な嗜好利用法を提案する。最後に、人間の意図の整合性の観点からLLMを評価するためのいくつかの一般的なアプローチを要約し、LLMに対する人間の意図の整合性に関する我々の展望について議論する。 The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning, enhancing LLMs with excellent applicability and effectiveness in a wide range of contexts. Despite the numerous related studies conducted, a perspective on how human preferences are introduced into LLMs remains limited, which may prevent a deeper comprehension of the relationships between human preferences and LLMs as well as the realization of their limitations. In this survey, we review the progress in exploring human preference learning for LLMs from a preference-centered perspective, covering the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs. We first categorize the human feedback according to data sources and formats. We then summarize techniques for human preferences modeling and compare the advantages and disadvantages of different schools of models. Moreover, we present various preference usage methods sorted by the objectives to utilize human preference signals. 
Finally, we summarize some prevailing approaches to evaluate LLMs in terms of alignment with human intentions and discuss our outlooks on the human intention alignment for LLMs.	翻訳日:2024-06-18 18:33:51 公開日:2024-06-17
# 境界を超えて: オープンな名前付きエンティティ認識のためのデータセットと言語をまたいだ普遍的なエンティティ分類を学ぶ Beyond Boundaries: Learning a Universal Entity Taxonomy across Datasets and Languages for Open Named Entity Recognition ( http://arxiv.org/abs/2406.11192v1 ) ライセンス: Link先を確認	Yuming Yang, Wantong Zhao, Caishuang Huang, Junjie Ye, Xiao Wang, Huiyuan Zheng, Yang Nan, Yuran Wang, Xueying Xu, Kaixin Huang, Yunke Zhang, Tao Gui, Qi Zhang, Xuanjing Huang,	(参考訳) 任意のドメインから任意のタイプのエンティティを識別するOpen Named Entity Recognition (NER) は、Large Language Models (LLM) では依然として困難である。近年の研究では、広範囲なNERデータに対する微調整LDMにより、その性能が向上することが示唆されている。しかし、既存のデータセットを直接トレーニングすることは、一貫性のないエンティティ定義と冗長なデータのために問題に直面し、LLMをデータセット固有の学習に制限し、ドメイン外の一般化を妨げる。そこで本研究では,既存の54の英語または中国語のデータセットから2段階のアプローチを用いて正規化された,Open NER用の凝集性で効率的なデータセットであるB2NERDを提案する。まず,データセット間の一貫性のないエンティティ定義を検出し,識別可能なラベル名を用いて識別し,400以上のエンティティタイプを普遍的に分類する。第2に、より大きなカテゴリとセマンティックな多様性を持つより少ないサンプルを選択するデータプルーニング戦略を用いて、冗長性に対処する。総合評価の結果,B2NERD は Open NER 上での LLM の一般化を著しく改善することが示された。我々のB2NERモデルは、B2NERDでトレーニングされ、GPT-4を6.8-12.0 F1ポイント上回っており、15のデータセットと6つの言語にわたる3つのドメイン外のベンチマークで、以前のメソッドを上回っています。 Open Named Entity Recognition (NER), which involves identifying arbitrary types of entities from arbitrary domains, remains challenging for Large Language Models (LLMs). Recent studies suggest that fine-tuning LLMs on extensive NER data can boost their performance. However, training directly on existing datasets faces issues due to inconsistent entity definitions and redundant data, limiting LLMs to dataset-specific learning and hindering out-of-domain generalization. To address this, we present B2NERD, a cohesive and efficient dataset for Open NER, normalized from 54 existing English or Chinese datasets using a two-step approach. First, we detect inconsistent entity definitions across datasets and clarify them by distinguishable label names to construct a universal taxonomy of 400+ entity types. 
Second, we address redundancy using a data pruning strategy that selects fewer samples with greater category and semantic diversity. Comprehensive evaluation shows that B2NERD significantly improves LLMs' generalization on Open NER. Our B2NER models, trained on B2NERD, outperform GPT-4 by 6.8-12.0 F1 points and surpass previous methods in 3 out-of-domain benchmarks across 15 datasets and 6 languages.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
# MMNeuron:マルチモーダル大言語モデルにおけるニューロンレベルドメイン特異的解釈の発見 MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model ( http://arxiv.org/abs/2406.11193v1 ) ライセンス: Link先を確認	Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu,	(参考訳) 単語埋め込み空間に視覚的特徴を投影することは、MLLM(Multimodal Large Language Models)が採用する重要な融合戦略となっている。しかし、その内部機構はまだ解明されていない。多言語研究に触発されて,マルチモーダル大言語モデルにおけるドメイン固有ニューロンを同定する。具体的には、ドメイン固有ニューロンの分布と、MLLMが様々なドメインの特徴をどのように処理するかのメカニズムについて検討する。さらに、投影された画像特徴を扱う際のMLLMにおける言語モデルモジュールについて3段階のフレームワークを提案し、この仮説をロジットレンズを用いて検証する。大規模な実験は、現在のMLLMが視覚質問応答(VQA)能力を示す一方で、ドメイン固有の情報を十分に活用していない可能性を示唆している。ドメイン固有のニューロンを適切に操作すると、精度が最大で10%変化し、将来的なクロスドメインの包括的なMLLMの開発に光を当てることになる。私たちのコードは論文の採否通知後に公開されます。 Projecting visual features into word embedding space has become a significant fusion strategy adopted by Multimodal Large Language Models (MLLMs). However, its internal mechanisms have yet to be explored. Inspired by multilingual research, we identify domain-specific neurons in multimodal large language models. Specifically, we investigate the distribution of domain-specific neurons and the mechanism of how MLLMs process features from diverse domains. Furthermore, we propose a three-stage framework for language model modules in MLLMs when handling projected image features, and verify this hypothesis using logit lens. Extensive experiments indicate that while current MLLMs exhibit Visual Question Answering (VQA) capability, they may not fully utilize domain-specific information. Manipulating domain-specific neurons properly will result in a 10% change of accuracy at most, shedding light on the development of cross-domain, all-encompassing MLLMs in the future. Our code will be released upon paper notification.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
# 文脈編集:自己誘導分布から知識を学ぶ In-Context Editing: Learning Knowledge from Self-Induced Distributions ( http://arxiv.org/abs/2406.11194v1 ) ライセンス: Link先を確認	Siyuan Qi, Bangcheng Yang, Kailin Jiang, Xiaobo Wang, Jiaqi Li, Yifan Zhong, Yaodong Yang, Zilong Zheng,	(参考訳) 言語モデルのための既存の微調整パラダイムは知識編集のシナリオでは脆弱であり、モデルには広範な再訓練なしに新しい情報を組み込まなければならない。この脆さは、しばしば過度に適合し、性能を低下させ、不自然な言語生成をもたらす。そこで本研究では,このモデルのコンテキスト内学習機能を活用して,ワンホットターゲットではなくコンテキスト分布に調整する新しい手法であるConsistent In-Context Editing (ICE)を提案する。 ICEは、ターゲットとプロシージャの両方を含む単純な最適化フレームワークを導入し、勾配に基づくチューニング手法の堅牢性と有効性を高める。知識編集における4つの重要な側面、すなわち正確性、局所性、一般化、言語的品質を分析し、その利点を示す。 4つのデータセットで実験した結果、ICEの有効性を確認し、継続編集の可能性を示し、モデルの完全性を保ちながら更新情報が組み込まれることを保証する。 The existing fine-tuning paradigm for language models is brittle in knowledge editing scenarios, where the model must incorporate new information without extensive retraining. This brittleness often results in overfitting, reduced performance, and unnatural language generation. To address this, we propose Consistent In-Context Editing (ICE), a novel approach that leverages the model's in-context learning capability to tune toward a contextual distribution rather than a one-hot target. ICE introduces a straightforward optimization framework that includes both a target and a procedure, enhancing the robustness and effectiveness of gradient-based tuning methods. We provide analytical insights into ICE across four critical aspects of knowledge editing: accuracy, locality, generalization, and linguistic quality, showing its advantages. Experimental results across four datasets confirm the effectiveness of ICE and demonstrate its potential for continual editing, ensuring that updated information is incorporated while preserving the integrity of the model.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
# Vid3D:2次元ビデオ拡散を用いた動的3次元シーンの合成 Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion ( http://arxiv.org/abs/2406.11196v1 ) ライセンス: Link先を確認	Rishab Parthasarathy, Zack Ankner, Aaron Gokaslan,	(参考訳) コンピュータビジョンにおける最近のフロンティアは、シーンの時間変化した3D表現を生成する3Dビデオ生成のタスクである。動的3Dシーンを生成するために、現在の手法は、時間とシーンのビューの両方の一貫性を共同で最適化することにより、3Dの時間的ダイナミクスを明示的にモデル化する。本稿では,現行のアプローチのように時間とともに多視点の一貫性を明示的に実施する必要があるか,あるいはモデルが各タイムステップの3次元表現を独立して生成するのに十分なのかを検討する。そこで我々は,2次元映像拡散を利用したモデルVid3Dを提案し,まずビデオの時間的ダイナミクスの2次元「シード」を生成し,その後,シードビデオの各ステップ毎に独立して3次元表現を生成する。我々は,Vid3Dを最先端の2つの3Dビデオ生成手法に対して評価し,3D時間力学を明示的にモデル化していないにもかかわらず,Vid3Dが同等の結果が得られることを確認した。さらに、Vid3Dの品質が、フレーム毎に生成されたビュー数に依存するかについても検討する。より少ないビューでいくつかの劣化を観察する一方で、パフォーマンスの劣化は小さいままです。この結果から,高品質な動的3次元シーンを生成するには3次元時間的知識は必要ない可能性が示唆された。 A recent frontier in computer vision has been the task of 3D video generation, which consists of generating a time-varying 3D representation of a scene. To generate dynamic 3D scenes, current methods explicitly model 3D temporal dynamics by jointly optimizing for consistency across both time and views of the scene. In this paper, we instead investigate whether it is necessary to explicitly enforce multiview consistency over time, as current approaches do, or if it is sufficient for a model to generate 3D representations of each timestep independently. We hence propose a model, Vid3D, that leverages 2D video diffusion to generate 3D videos by first generating a 2D "seed" of the video's temporal dynamics and then independently generating a 3D representation for each timestep in the seed video. We evaluate Vid3D against two state-of-the-art 3D video generation methods and find that Vid3D achieves comparable results despite not explicitly modeling 3D temporal dynamics. We further ablate how the quality of Vid3D depends on the number of views generated per frame. While we observe some degradation with fewer views, performance degradation remains minor. Our results thus suggest that 3D temporal knowledge may not be necessary to generate high-quality dynamic 3D scenes, potentially enabling simpler generative algorithms for this task.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
# AvaTaR: ツール支援知識検索のためのLLMエージェントの最適化 AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval ( http://arxiv.org/abs/2406.11200v1 ) ライセンス: Link先を確認	Shirley Wu, Shiyu Zhao, Qian Huang, Kexin Huang, Michihiro Yasunaga, Vassilis N. Ioannidis, Karthik Subbian, Jure Leskovec, James Zou,	(参考訳) 大言語モデル(LLM)エージェントは、外部のツールや知識を活用して精度を高め、幻覚を減らすという印象的な能力を示した。しかし、LLMエージェントが外部のツールや知識を効果的に活用できるようなプロンプト技術の開発は、ヒューリスティックで退屈な作業である。本稿では、LLMエージェントを最適化し、提供するツールを効果的に利用し、与えられたタスク/ドメインの性能を向上させる新しい自動フレームワークであるAvaTaRを紹介する。最適化中、トレーニングデータからサンプルした正と負のサンプルの推論により、LLMエージェントに洞察的で全体論的なプロンプトを反復的に提供するコンパレータモジュールを設計する。テキスト,ビジュアル,リレーショナル情報を含む4つの複雑なマルチモーダル検索データセット上で,AvaTaRを実証する。 AvaTaRは、4つの課題にまたがる最先端のアプローチを一貫して上回り、新規事例に適用すると強力な一般化能力を示し、Hit@1測定値の平均14%の相対的改善を実現している。コードとデータセットは https://github.com/zou-group/avatar から入手できる。 Large language model (LLM) agents have demonstrated impressive capability in utilizing external tools and knowledge to boost accuracy and reduce hallucinations. However, developing the prompting techniques that make LLM agents able to effectively use external tools and knowledge is a heuristic and laborious task. Here, we introduce AvaTaR, a novel and automatic framework that optimizes an LLM agent to effectively use the provided tools and improve its performance on a given task/domain. During optimization, we design a comparator module to iteratively provide insightful and holistic prompts to the LLM agent via reasoning between positive and negative examples sampled from training data. We demonstrate AvaTaR on four complex multimodal retrieval datasets featuring textual, visual, and relational information. We find AvaTaR consistently outperforms state-of-the-art approaches across all four challenging tasks and exhibits strong generalization ability when applied to novel cases, achieving an average relative improvement of 14% on the Hit@1 metric. Code and dataset are available at https://github.com/zou-group/avatar.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
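The AvaTaR abstract reports its gain as an average relative improvement on the Hit@1 metric. As a reading aid, the following is a minimal sketch of how Hit@k and a relative improvement are commonly computed. It is not taken from the AvaTaR code base: the function names `hit_at_k` and `relative_improvement` and the toy inputs are assumptions made for illustration.

```python
from typing import Hashable, Sequence, Set


def hit_at_k(
    ranked_lists: Sequence[Sequence[Hashable]],
    relevant_sets: Sequence[Set[Hashable]],
    k: int = 1,
) -> float:
    """Fraction of queries with at least one relevant item in the top k results.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(ranked_lists) != len(relevant_sets):
        raise ValueError("need one relevant set per ranked list")
    if not ranked_lists:
        raise ValueError("need at least one query")
    hits = sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_sets)
        if any(item in relevant for item in ranked[:k])
    )
    return hits / len(ranked_lists)


def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Relative (not absolute) gain of new_score over baseline_score."""
    if baseline_score == 0:
        raise ValueError("baseline score must be non-zero")
    return (new_score - baseline_score) / baseline_score


# Toy data (made up): four queries, each with a ranked result list.
ranked = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]]
relevant = [{"a"}, {"d"}, {"e"}, {"x"}]
score_at_1 = hit_at_k(ranked, relevant, k=1)
score_at_2 = hit_at_k(ranked, relevant, k=2)
```

Under this reading, a relative improvement of 14% means the new Hit@1 is 1.14 times the baseline Hit@1, which is a smaller absolute change than 14 percentage points whenever the baseline is below 1.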
# ファインチューニングかファインフィリングか? 大規模言語モデルにおけるパフォーマンスの謎を解き明かす Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models ( http://arxiv.org/abs/2406.11201v1 ) ライセンス: Link先を確認	Scott Barnett, Zac Brannelly, Stefanus Kurniawan, Sheng Wong,	(参考訳) 大きな言語モデル(LLM)は、入力クエリから人間のようなテキストを理解し、生成するユニークな機能を持つ。微調整すると、これらのモデルではドメイン固有のクエリのパフォーマンスが向上する。 OpenAIは、細調整のプロセスを強調し、「モデルを微調整するには、少なくとも10の例を提供する必要がある。通常、50から100のトレーニング例で微調整から明らかな改善が見られるが、正しい数は正確なユースケースによって大きく異なる。」と述べている。本研究では、この概念を、情報検索に外部コーパスデータを活用することにより、精度と妥当性を向上させることを目的とした、レトリーバル拡張ジェネレーション(RAG)パイプライン内のLLMの統合に拡張する。しかしながら、最適なレスポンスを提供するというRAGの約束は、複雑なクエリシナリオでは不十分であることが多い。本研究の目的は,複数の領域にまたがるRAGシステムの性能を高めるために,微調整LDMがコンテキストデータを抽出・統合する能力に与える影響を具体的に検討することである。複数のドメインからのデータセット間のベースライン性能に対する微調整モデルの精度と完全性を比較することにより,データ抽出と文脈理解におけるLCMの能力に及ぼす微調整の影響を評価する。その結果,OpenAI が提案するスタンドアロン LLM アプリケーションで見られる改善とは対照的に,ファインチューニングはベースラインモデルに比べて性能が低下することがわかった。本研究は、ドメイン固有タスクのための細調整モデルの精力的な調査と検証の必要性を強調した。 Large Language Models (LLMs) have the unique capability to understand and generate human-like text from input queries. When fine-tuned, these models show enhanced performance on domain-specific queries. OpenAI highlights the process of fine-tuning, stating: "To fine-tune a model, you are required to provide at least 10 examples. We typically see clear improvements from fine-tuning on 50 to 100 training examples, but the right number varies greatly based on the exact use case." This study extends this concept to the integration of LLMs within Retrieval-Augmented Generation (RAG) pipelines, which aim to improve accuracy and relevance by leveraging external corpus data for information retrieval. However, RAG's promise of delivering optimal responses often falls short in complex query scenarios. This study aims to specifically examine the effects of fine-tuning LLMs on their ability to extract and integrate contextual data to enhance the performance of RAG systems across multiple domains. 
We evaluate the impact of fine-tuning on the LLMs' capacity for data extraction and contextual understanding by comparing the accuracy and completeness of fine-tuned models against baseline performances across datasets from multiple domains. Our findings indicate that fine-tuning resulted in a decline in performance compared to the baseline models, contrary to the improvements observed in standalone LLM applications as suggested by OpenAI. This study highlights the need for vigorous investigation and validation of fine-tuned models for domain-specific tasks.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
# Consistency^2: 潜在一貫性モデルによる一貫性のある高速な3Dペインティング Consistency^2: Consistent and Fast 3D Painting with Latent Consistency Models ( http://arxiv.org/abs/2406.11202v1 ) ライセンス: Link先を確認	Tianfu Wang, Anton Obukhov, Konrad Schindler,	(参考訳) ジェネレーティブ3Dペイントは、高解像度の3Dアセット管理とリサイクルにおいて、最大の生産性向上要因である。消費者ハードウェア上での推測にテキスト・ツー・イメージ・モデルが利用可能になって以来、3Dペイント法の性能は一貫して改善され、現在は頭打ちに近づいている。ほとんどのモデルの中核は、本質的に時間を要する反復過程である潜在空間におけるデノイジング拡散である。近年, 生成を高速化し, サンプリング反復回数を桁違いに削減するために, 複数の技術が開発されている。 2D生成イメージングのために設計されたこれらの技術は、それらを3Dに持ち上げるためのレシピを持っていない。本稿では,現在進行中の課題に対してLCM(Latent Consistency Model)適応を提案することで,この問題に対処する。提案モデルの強みと弱みを分析し,定量的かつ質的に評価する。 Objaverse のサンプルデータから,本手法はすべての評価において強い嗜好を得ることができた。ソースコードは https://github.com/kongdai123/consistency2 で入手できる。 Generative 3D Painting is among the top productivity boosters in high-resolution 3D asset management and recycling. Ever since text-to-image models became accessible for inference on consumer hardware, the performance of 3D Painting methods has consistently improved and is currently close to plateauing. At the core of most such models lies denoising diffusion in the latent space, an inherently time-consuming iterative process. Multiple techniques have been developed recently to accelerate generation and reduce sampling iterations by orders of magnitude. Designed for 2D generative imaging, these techniques do not come with recipes for lifting them into 3D. In this paper, we address this shortcoming by proposing a Latent Consistency Model (LCM) adaptation for the task at hand. We analyze the strengths and weaknesses of the proposed model and evaluate it quantitatively and qualitatively. Based on the Objaverse dataset samples study, our 3D painting method attains strong preference in all evaluations. Source code is available at https://github.com/kongdai123/consistency2.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
# GKSL型マスター方程式に対するCP条件 CP conditions for GKSL-like master equations ( http://arxiv.org/abs/2406.11205v1 ) ライセンス: Link先を確認	Akane Watanabe, Takayuki Suzuki, Makoto Unoki, Hiromichi Nakazato,	(参考訳) 量子力学写像 (QDM) の完全正負性 (CP) は、一般に、そのマスター方程式 (ME) がゴリーニ=コサコフスキー=スダルシャン=リンドブラッド (GKSL) 形式に従わないときの証明が困難である。 GKSL MEはマルコフ力学を記述しており、時間非依存のエルミート作用素を持つユニタリ成分と、時間非依存のリンドブラッド作用素と正の時間非依存の減衰率を持つ非ユニタリ成分からなる。近年、非マルコフ力学が注目され、時間依存演算子を持つ様々な種類のGKSLライクなMEが広く議論されているが、CP条件に関する厳密な議論は依然として限られている。本稿では、QDMがCPとなる条件を示し、MEは任意の時間依存でGKSLのような形式をとる。 1つのケースは、その ME が時間局所的な積分微分 GKSL 様の形式をとることであり、CP の異なるケースを含む。もう1つのケースは、MEが時間非局所的であるが、弱い結合状態において時間非局所であると近似することができることである。時間非局所の場合の特別な場合として、GKSL様の時間変化についても同様の議論がおこなわれ、これは以前の研究と比較されるべきである。 The complete positivity (CP) of a quantum dynamical map (QDM) is, in general, difficult to show when its master equation (ME) does not conform to the Gorini-Kossakowski-Sudarshan-Lindblad (GKSL) form. The GKSL ME describes the Markovian dynamics, comprising a unitary component with time-independent Hermitian operators and a non-unitary component with time-independent Lindblad operators and positive time-independent damping rates. Recently, the non-Markovian dynamics has received growing attention, and the various types of GKSL-like MEs with time-dependent operators are widely discussed; however, rigorous discussions on their CP conditions remain limited. This paper presents conditions for QDMs to be CP, whose MEs take the GKSL-like form with arbitrary time dependence. One case considered is where its ME takes the time-local integro-differential GKSL-like form, which includes CP-divisible cases. Another case considered is where the ME is time-non-local but can be approximated to be time-local in the weak-coupling regime. As a special case of the time-non-local case, the same discussion holds for the time-convoluted GKSL-like form, which should be compared to previous studies.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
# 予測ラベルによるリトレーニングによるモデル精度の向上 Retraining with Predicted Hard Labels Provably Increases Model Accuracy ( http://arxiv.org/abs/2406.11206v1 ) ライセンス: Link先を確認	Rudrajit Das, Inderjit S. Dhillon, Alessandro Epasto, Adel Javanmard, Jieming Mao, Vahab Mirrokni, Sujay Sanghavi, Peilin Zhong,	(参考訳) ノイズの多いラベルでトレーニングされたモデルの性能は、モデル自身が予測したハードラベル(すなわち 1/0 ラベル)で再トレーニングするだけで改善されることが多い。しかし、この現象の詳細な理論的特徴は欠如している。本稿では, 線形分離可能な環境下でのリトレーニングを, ランダムに破損したラベルを用いて理論的に解析し, 与えられた(ノイズの多い)ラベルでの初期訓練によって得られた個体群精度を向上させることを実証する。私たちの知る限りでは、これが最初の理論的な結果である。リトレーニングは、ノイズのあるラベルによるトレーニングを含むラベル差分プライバシ(DP)によるトレーニングを改善するために応用できる。予測ラベルが与えられたラベルにマッチするサンプルに対して選択的にリトレーニングを行うことは、ラベルDPのトレーニングを追加のプライバシコストなしで大幅に改善することを示し、これをコンセンサスベースのリトレーニング(consensus-based retraining)と呼ぶ。例えば、CIFAR-100上でResNet-18を ε=3 のラベルDPでトレーニングすると、コンセンサスベースのリトレーニングによる精度が6.4%向上する。 The performance of a model trained with noisy labels is often improved by simply retraining the model with its own predicted hard labels (i.e., 1/0 labels). Yet, a detailed theoretical characterization of this phenomenon is lacking. In this paper, we theoretically analyze retraining in a linearly separable setting with randomly corrupted labels given to us and prove that retraining can improve the population accuracy obtained by initially training with the given (noisy) labels. To the best of our knowledge, this is the first such theoretical result. Retraining finds application in improving training with label differential privacy (DP) which involves training with noisy labels. We empirically show that retraining selectively on the samples for which the predicted label matches the given label significantly improves label DP training at no extra privacy cost; we call this consensus-based retraining. For example, when training ResNet-18 on CIFAR-100 with ε=3 label DP, we obtain 6.4% improvement in accuracy with consensus-based retraining.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
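The abstract above describes consensus-based retraining as retraining only on the samples whose predicted label matches the given (noisy) label. The following is a minimal sketch of that selection step under stated assumptions: `consensus_indices`, `consensus_retrain`, and the toy `train_nearest_centroid` trainer are hypothetical names introduced here, and the nearest-centroid classifier is only a stand-in for whatever model the authors actually train (the abstract mentions ResNet-18).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], int]                        # feature vector -> hard label
Trainer = Callable[[Sequence[Vector], Sequence[int]], Model]


def consensus_indices(given: Sequence[int], predicted: Sequence[int]) -> List[int]:
    """Indices where the predicted hard label agrees with the given (noisy) label."""
    if len(given) != len(predicted):
        raise ValueError("label sequences must have the same length")
    return [i for i, (g, p) in enumerate(zip(given, predicted)) if g == p]


def consensus_retrain(
    train: Trainer, features: Sequence[Vector], noisy_labels: Sequence[int]
) -> Tuple[Model, List[int]]:
    """Train once on all noisy labels, then retrain on the consensus subset only."""
    first_model = train(features, noisy_labels)
    predicted = [first_model(x) for x in features]
    keep = consensus_indices(noisy_labels, predicted)
    retrained = train([features[i] for i in keep], [noisy_labels[i] for i in keep])
    return retrained, keep


def train_nearest_centroid(features: Sequence[Vector], labels: Sequence[int]) -> Model:
    """Toy stand-in trainer (an assumption, not the paper's model)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x: Vector) -> int:
        # Ties are broken toward the smallest label because keys are sorted.
        return min(
            sorted(centroids),
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


# Toy 1-D data: true classes split at 0; the labels at indices 2 and 5 are flipped.
toy_features = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
toy_noisy_labels = [0, 0, 1, 1, 1, 0]
toy_model, toy_kept = consensus_retrain(
    train_nearest_centroid, toy_features, toy_noisy_labels
)
```

In this toy run the two flipped labels disagree with the first model's predictions and are dropped before the second round of training. The selection step only compares labels the trainer already holds, which is consistent with the abstract's statement that the gain comes at no extra privacy cost.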
# 圧縮アレイ上で直接動作可能な操作は何か、エラーは何か? What Operations can be Performed Directly on Compressed Arrays, and with What Error? ( http://arxiv.org/abs/2406.11209v1 ) ライセンス: Link先を確認	Tripti Agarwal, Harvey Dam, Dorra Ben Khalifa, Matthieu Martel, P. Sadayappan, Ganesh Gopalakrishnan,	(参考訳) データ移動によって引き起こされる大きな行列とテンソルを持つ計算の急激なエスカレートコストに応じて、データ量を大幅に削減するために、いくつかの非可逆圧縮手法が開発されている。残念ながら、これらの手法はすべて、さらなる計算が行われる前にデータを展開する必要がある。本研究では,良好な圧縮率と控えめな誤差を保ちながら,圧縮されたデータに直接,十数個のかなり基本的な操作を行える非可逆圧縮器を開発する。我々は、GPUを用いたPyTorchフレームワークに基づく新しい圧縮器PyBlazを実装し、それを3つの非自明なアプリケーション上で評価し、内部表現のために異なる数系を選択する。この結果から,圧縮領域演算は許容範囲内でエラーを発生させながら,問題の大きさに優れたスケーラビリティを実現することが示された。我々の知る限り、この圧縮器は、許容性能とエラーを達成しつつ、圧縮ドメイン操作をサポートする最初の非可逆圧縮器である。 In response to the rapidly escalating costs of computing with large matrices and tensors caused by data movement, several lossy compression methods have been developed to significantly reduce data volumes. Unfortunately, all these methods require the data to be decompressed before further computations are done. In this work, we develop a lossy compressor that allows a dozen fairly fundamental operations directly on compressed data while offering good compression ratios and modest errors. We implement a new compressor PyBlaz based on the familiar GPU-powered PyTorch framework, and evaluate it on three non-trivial applications, choosing different number systems for internal representation. Our results demonstrate that the compressed-domain operations achieve good scalability with problem sizes while incurring errors well within acceptable limits. To the best of our knowledge, this is the first such lossy compressor that supports compressed-domain operations while achieving acceptable performance as well as error.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
# ゼロショットシーン変化検出 Zero-Shot Scene Change Detection ( http://arxiv.org/abs/2406.11210v1 ) ライセンス: Link先を確認	Kyusik Cho, Dong Yeop Kim, Euntai Kim,	(参考訳) 我々は、シーン変化検出のための新しい、トレーニング不要なアプローチを提案する。本手法は,ビデオの連続するフレーム間の変化検出を,共通のオブジェクトを識別し,新しいオブジェクトや欠落オブジェクトを検出できる追跡モデルを活用する。具体的には,連続フレームの代わりに参照画像とクエリ画像を入力することで,トラッキングモデルの変更検出効果を利用する。さらに、変化検出における2つの入力画像間のコンテンツギャップとスタイルギャップに着目し、適応的なコンテンツしきい値とスタイルブリッジ層をそれぞれ提案することで、両方の問題に対処する。最後に、映像へのアプローチを拡張して、リッチな時間情報を活用し、シーン変化検出性能を向上させる。我々は様々な実験を通してアプローチとベースラインを比較した。既存の学習ベースのベースラインは訓練対象領域のみに特化する傾向にあるが,本手法は様々な領域で一貫した性能を示し,アプローチの競争力を証明している。 We present a novel, training-free approach to scene change detection. Our method leverages tracking models, which inherently perform change detection between consecutive frames of video by identifying common objects and detecting new or missing objects. Specifically, our method takes advantage of the change detection effect of the tracking model by inputting reference and query images instead of consecutive frames. Furthermore, we focus on the content gap and style gap between two input images in change detection, and address both issues by proposing adaptive content threshold and style bridging layers, respectively. Finally, we extend our approach to video to exploit rich temporal information, enhancing scene change detection performance. We compare our approach and baselines through various experiments. While existing training-based baselines tend to specialize only in the trained domain, our method shows consistent performance across various domains, proving the competitiveness of our approach.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
# 大規模言語モデルにおける失敗管理のためのAIOpsに関する調査 A Survey of AIOps for Failure Management in the Era of Large Language Models ( http://arxiv.org/abs/2406.11213v1 ) ライセンス: Link先を確認	Lingzhe Zhang, Tong Jia, Mengxi Jia, Yong Yang, Zhonghai Wu, Ying Li,	(参考訳) ソフトウェアシステムが複雑化するにつれ、AIOps(Artificial Intelligence for IT Operations)メソッドは、大規模分散ソフトウェアシステムの高可用性と信頼性を確保するために、ソフトウェアシステムの障害管理に広く使用されている。しかし、これらの手法はクロスプラットフォームの汎用性やタスク間の柔軟性の欠如など、いくつかの課題に直面している。幸いなことに、近年の大規模言語モデル(LLM)の進歩はこれらの課題に大きく取り組むことができ、この分野を探求するための多くのアプローチがすでに提案されている。しかしながら、LLMベースのAIOpsと従来のAIOpsメソッドの違いについて、包括的な調査は行われていない。そこで本研究では,LLM時代の障害管理のためのAIOps技術に関する包括的調査を行う。これには、障害管理のためのAIOpsタスクの詳細な定義、AIOpsのデータソース、AIOpsに採用されているLLMベースのアプローチが含まれている。さらに、この調査では、AIOpsサブタスク、異なるAIOpsサブタスクに適した特定のLLMベースのアプローチ、ドメインの課題と今後の方向性などについて調査し、開発と応用をさらに進めることを目指している。 As software systems grow increasingly intricate, Artificial Intelligence for IT Operations (AIOps) methods have been widely used in software system failure management to ensure the high availability and reliability of large-scale distributed software systems. However, these methods still face several challenges, such as lack of cross-platform generality and cross-task flexibility. Fortunately, recent advancements in large language models (LLMs) can significantly address these challenges, and many approaches have already been proposed to explore this field. However, there is currently no comprehensive survey that discusses the differences between LLM-based AIOps and traditional AIOps methods. Therefore, this paper presents a comprehensive survey of AIOps technology for failure management in the LLM era. It includes a detailed definition of AIOps tasks for failure management, the data sources for AIOps, and the LLM-based approaches adopted for AIOps. Additionally, this survey explores the AIOps subtasks, the specific LLM-based approaches suitable for different AIOps subtasks, and the challenges and future directions of the domain, aiming to further its development and application.	
翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
# グローバルデータ制約:大規模言語モデルにおける倫理的・効果的な課題 Global Data Constraints: Ethical and Effectiveness Challenges in Large Language Model ( http://arxiv.org/abs/2406.11214v1 ) ライセンス: Link先を確認	Jin Yang, Zhiqiang Wang, Yanbin Lin, Zunduo Zhao,	(参考訳) 大規模言語モデル(LLM)の有効性と倫理的整合性は、トレーニングデータセットの多様性と品質に大きく影響される。しかし、データアクセシビリティのグローバルな状況は、特に厳格なデータプライバシ法や限られたオープンソース情報を持つ地域では、大きな課題をもたらしている。本稿では,LLMの高品質トレーニングデータ取得に伴う多面的課題について検討し,各種言語文脈におけるデータ不足,バイアス,低品質コンテンツに着目した。 LLMによる偏見的・幻覚的コンテンツの生成につながる可能性のある、一般に公開されているがバイアスのある、あるいは無関係なデータソースに依存するという技術的・倫理的な意味を強調します。 GPT-4とGPT-4oを用いた一連の評価を通じて、これらのデータ制約がモデル性能と倫理的アライメントにどのように悪影響を及ぼすかを実証する。本稿では,データ品質の向上と,高度なデータフィルタリング技術や倫理的データ収集手法など,ロバスト性をモデル化するためのいくつかの緩和戦略を提案し,検証する。我々の発見は、データ制約の有効性と倫理的意味の両方を考慮し、より信頼性が高く普遍的に適用可能なAIシステムの構築を促進するLLMの開発において、積極的なアプローチの必要性を浮き彫りにしている。 The efficacy and ethical integrity of large language models (LLMs) are profoundly influenced by the diversity and quality of their training datasets. However, the global landscape of data accessibility presents significant challenges, particularly in regions with stringent data privacy laws or limited open-source information. This paper examines the multifaceted challenges associated with acquiring high-quality training data for LLMs, focusing on data scarcity, bias, and low-quality content across various linguistic contexts. We highlight the technical and ethical implications of relying on publicly available but potentially biased or irrelevant data sources, which can lead to the generation of biased or hallucinatory content by LLMs. Through a series of evaluations using GPT-4 and GPT-4o, we demonstrate how these data constraints adversely affect model performance and ethical alignment. We propose and validate several mitigation strategies designed to enhance data quality and model robustness, including advanced data filtering techniques and ethical data collection practices. 
Our findings underscore the need for a proactive approach in developing LLMs that considers both the effectiveness and ethical implications of data constraints, aiming to foster the creation of more reliable and universally applicable AI systems.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
# WeatherQA:多モーダル言語モデルは深刻な気象について推論できるか? WeatherQA: Can Multimodal Language Models Reason about Severe Weather? ( http://arxiv.org/abs/2406.11217v1 ) ライセンス: Link先を確認	Chengqian Ma, Zhanxiang Hua, Alexandra Anderson-Frey, Vikram Iyer, Xin Liu, Lianhui Qin,	(参考訳) 雹、竜巻、雷雨などの激しい対流的な気象イベントは、しばしば急速に起こるが、大きな被害を招き、毎年数十億ドルの費用がかかる。このことは、気象学者や住民のリスクの高い地域での適切な準備のために、数時間前に厳しい天候の脅威を予知することの重要性を強調している。現代の大規模基盤モデルはそのような予測を実行できますか? 既存の気象ベンチマークでは、テキストのみの特徴を持つ特定の気象パラメータ(例えば、温度、湿度)の時系列変化の予測にのみ焦点が当てられている。本研究では、気象パラメータの複雑な組み合わせ(例えば成分)を推論し、現実世界のシナリオで厳しい天候を予測するために、機械用に設計された最初のマルチモーダルデータセットであるWeatherQAを紹介する。データセットには、さまざまな厳しい天候イベントのための8,000組以上の(複数画像、テキスト)のペアが含まれている。それぞれのペアには、予測に不可欠なリッチな情報が含まれており、画像は環境の不安定さ、表面の観測、レーダーの反射率を捉えた成分を描写し、テキストには、人間の専門家が作成した予測分析が含まれている。そこで,WeatherQAを用いて,(1) 影響を受ける地域を予測する多肢選択QAと (2) 激しい対流の発達可能性の分類という2つの課題を設計し,GPT4,Claude3,Gemini-1.5,微調整されたLlama3ベースのVLMを含む最先端の視覚言語モデルの評価を行った。これらのタスクは、ドメイン知識(例えば、大気力学)の深い理解と、マルチモーダルデータ(例えば、気象パラメータ間の相互作用)に対する複雑な推論を必要とする。最強のVLM, GPT4o, および人間の推論の間には, かなりのギャップがある。気象学者との包括的なケーススタディは、モデルの弱点をさらに明らかにし、このギャップを埋めるためには、より良いトレーニングとデータ統合が必要であることを示唆している。 WeatherQA リンク: https://github.com/chengqianma/WeatherQA Severe convective weather events, such as hail, tornadoes, and thunderstorms, often occur quickly yet cause significant damage, costing billions of dollars every year. This highlights the importance of forecasting severe weather threats hours in advance to better prepare meteorologists and residents in at-risk areas. Can modern large foundation models perform such forecasting? Existing weather benchmarks typically focus only on predicting time-series changes in certain weather parameters (e.g., temperature, moisture) with text-only features. In this work, we introduce WeatherQA, the first multimodal dataset designed for machines to reason about complex combinations of weather parameters (a.k.a., ingredients) and predict severe weather in real-world scenarios. The dataset includes over 8,000 (multi-images, text) pairs for diverse severe weather events. Each pair contains rich information crucial for forecasting -- the images describe the ingredients capturing environmental instability, surface observations, and radar reflectivity, and the text contains forecast analyses written by human experts. With WeatherQA, we evaluate state-of-the-art vision language models, including GPT4, Claude3, Gemini-1.5, and a fine-tuned Llama3-based VLM, by designing two challenging tasks: (1) multi-choice QA for predicting affected area and (2) classification of the development potential of severe convection. These tasks require deep understanding of domain knowledge (e.g., atmospheric dynamics) and complex reasoning over multimodal data (e.g., interactions between weather parameters). We show a substantial gap between the strongest VLM, GPT4o, and human reasoning. Our comprehensive case study with meteorologists further reveals the weaknesses of the models, suggesting that better training and data integration are necessary to bridge this gap. WeatherQA link: https://github.com/chengqianma/WeatherQA.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
# GPT-4による別のスペイン語辞書の作成 Building another Spanish dictionary, this time with GPT-4 ( http://arxiv.org/abs/2406.11218v1 ) ライセンス: Link先を確認	Miguel Ortega-Martín, Óscar García-Sierra, Alfonso Ardoiz, Juan Carlos Armenteros, Ignacio Garrido, Jorge Álvarez, Camilo Torrón, Iñigo Galdeano, Ignacio Arranz, Oleg Vorontsov, Adrián Alonso,	(参考訳) 我々は、AI生成スペイン語辞書の第2版として、スペイン語構築Factual Freectianary 2.0(スペイン語-BFF-2)を提示する。従来,GPT-3を用いたこの独特な自由辞書の初版を開発した。本研究では,GPT-4-turboを用いて辞書の改良を図る。さらに,初期バージョンの改良について検討し,両モデルの性能比較を行った。 We present the "Spanish Built Factual Freectianary 2.0" (Spanish-BFF-2) as the second iteration of an AI-generated Spanish dictionary. Previously, we developed the inaugural version of this unique free dictionary employing GPT-3. In this study, we aim to improve the dictionary by using GPT-4-turbo instead. Furthermore, we explore improvements made to the initial version and compare the performance of both models.	翻訳日:2024-06-18 18:24:06 公開日:2024-06-17
# 複合スキーマレジストリ Compound Schema Registry ( http://arxiv.org/abs/2406.11227v1 ) ライセンス: Link先を確認	Silvery D. Fu, Xuewei Chen,	(参考訳) スキーマの進化は、異なるデータバージョン間の互換性を確保するために、データベースシステムを管理する上で重要である。スキーマレジストリは通常、スキーマの互換性を管理し、検証し、保証することで、リアルタイムデータストリーミングにおけるスキーマ進化の課題に対処する。しかしながら、現在のスキーマレジストリは、フィールドリネームやタイプ変更といった複雑な構文変更に苦労している。スキーマ進化の柔軟性を高めるために,複合AIシステムによって促進される一般化スキーマ進化(GSE)の利用を提案する。このシステムは、スキーマ変更のセマンティクスを解釈するためにLarge Language Models (LLM)を使用し、データストリームを中断することなく、幅広い構文修正をサポートする。我々のアプローチは、タスク固有の言語であるスキーマ変換言語(STL)を開発し、中間表現(IR)としてスキーママッピングを生成し、異なるデータ処理プラットフォーム間のスキーマ変更の統合を簡単にする。最初の結果から,本手法はスキーママッピングの精度と効率を向上し,実用的な応用におけるGSEの可能性を示すことが示唆された。 Schema evolution is critical in managing database systems to ensure compatibility across different data versions. A schema registry typically addresses the challenges of schema evolution in real-time data streaming by managing, validating, and ensuring schema compatibility. However, current schema registries struggle with complex syntactic alterations like field renaming or type changes, which often require significant manual intervention and can disrupt service. To enhance the flexibility of schema evolution, we propose the use of generalized schema evolution (GSE) facilitated by a compound AI system. This system employs Large Language Models (LLMs) to interpret the semantics of schema changes, supporting a broader range of syntactic modifications without interrupting data streams. Our approach includes developing a task-specific language, Schema Transformation Language (STL), to generate schema mappings as an intermediate representation (IR), simplifying the integration of schema changes across different data processing platforms. Initial results indicate that this approach can improve schema mapping accuracy and efficiency, demonstrating the potential of GSE in practical applications.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
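The abstract above says schema mappings are generated as an intermediate representation that covers changes such as field renaming and type changes. The following is a minimal sketch of what applying such a mapping to a single record could look like. It does not reproduce the paper's Schema Transformation Language: the mapping format (new field name mapped to an old field name plus a converter) and the name `apply_mapping` are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping format: new_field -> (old_field, converter).
FieldRule = Tuple[str, Callable[[Any], Any]]
SchemaMapping = Dict[str, FieldRule]


def apply_mapping(record: Dict[str, Any], mapping: SchemaMapping) -> Dict[str, Any]:
    """Rewrite a record from the old schema into the new schema.

    Each rule covers a field rename (old_field -> new_field) and a type
    change (the converter) in one step.
    """
    migrated: Dict[str, Any] = {}
    for new_field, (old_field, convert) in mapping.items():
        if old_field not in record:
            raise KeyError(f"record is missing source field {old_field!r}")
        migrated[new_field] = convert(record[old_field])
    return migrated


# Toy example (made up): rename 'uid' to 'user_id' and change its type to int.
example_mapping: SchemaMapping = {
    "user_id": ("uid", int),
    "full_name": ("name", str),
}
example_record = {"uid": "42", "name": "Ada"}
example_migrated = apply_mapping(example_record, example_mapping)
```

Keeping the mapping as plain data rather than code is one way an intermediate representation could be reused across different processing platforms, since each platform only needs its own small interpreter for the rules.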
# ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark ( http://arxiv.org/abs/2406.11228v1 ) ライセンス: Link先を確認	Hiromi Wakaki, Yuki Mitsufuji, Yoshinori Maeda, Yukiko Nishimura, Silin Gao, Mengjie Zhao, Keiichi Yamada, Antoine Bosselut,	(参考訳) オープンドメイン対話システムにおける評価指標のトレーニングと評価を容易にする新しいベンチマークであるComperDialを提案する。 ComperDialは、Commonsense Persona-grounded Dialogue (CPD)チャレンジに提出された99の対話エージェントから収集された1,485件の会話における10,395件の対話ターンに対する、人間が採点した応答で構成されている。その結果,我々のベンチマークでは,学習した対話メトリクスのより堅牢な評価を実現するために,様々な特性を持つ多様な応答が多数含まれている。シングルターン応答スコアに加えて、ComperDialには対話レベルの人間注釈スコアが含まれており、対話全体を通してマルチターンモデル応答のジョイントアセスメントを可能にする。最後に,ComperDialから構築したモデル生成対話と人間の会話の一般的な類似度を測定するための,新しい自動評価指標を考案した。実験の結果,新しい測定基準であるCPDScoreは既存の測定基準よりも人間の判断と相関していることがわかった。我々は,オープンドメイン対話システムのための自動評価指標の開発を加速するために,ComperDialとCPDScoreの両方をコミュニティにリリースする。 We propose a new benchmark, ComperDial, which facilitates the training and evaluation of evaluation metrics for open-domain dialogue systems. ComperDial consists of human-scored responses for 10,395 dialogue turns in 1,485 conversations collected from 99 dialogue agents submitted to the Commonsense Persona-grounded Dialogue (CPD) challenge. As a result, for any dialogue, our benchmark includes multiple diverse responses with a variety of characteristics to ensure more robust evaluation of learned dialogue metrics. In addition to single-turn response scores, ComperDial also contains dialogue-level human-annotated scores, enabling joint assessment of multi-turn model responses throughout a dialogue. Finally, building off ComperDial, we devise a new automatic evaluation metric to measure the general similarity of model-generated dialogues to human conversations. Our experimental results demonstrate that our novel metric, CPDScore, is more correlated with human judgments than existing metrics. We release both ComperDial and CPDScore to the community to accelerate development of automatic evaluation metrics for open-domain dialogue systems.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
# ヘイスタックにおけるマルチモーダルニードル:マルチモーダル大言語モデルの長文脈能力のベンチマーク Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models ( http://arxiv.org/abs/2406.11230v1 ) ライセンス: Link先を確認	Hengyi Wang, Haizhou Shi, Shiwei Tan, Weiyi Qin, Wenyuan Wang, Tunyu Zhang, Akshay Nambi, Tanuja Ganu, Hao Wang,	(参考訳) MLLM(Multimodal Large Language Models)は様々な応用において大きな可能性を示しており、研究者や実践者からも広く関心を集めている。しかし、その長文脈能力に関する包括的な評価はいまだに未検討である。このギャップに対処するために、MLLMの長文脈能力を評価するために特別に設計されたMultiModal Needle-in-a-haystack(MMNeedle)ベンチマークを導入する。マルチイメージ入力の他に、画像ステッチを用いて、入力コンテキスト長をさらに向上させ、サブイメージレベルの検索のためのラベルを自動的に生成するプロトコルを開発する。本質的には、MMNeedleは、テキストの指示と画像内容の記述に基づいて、一連の画像(haystack)の中にターゲットサブイメージ(needle)を見つける能力をストレステストすることでMLLMを評価する。この設定は、広義の視覚的コンテキストの高度な理解と、長文脈画像入力における効果的な情報検索を必要とする。本ベンチマークでは,APIベースモデルとオープンソースモデルの両方を含む最先端MLLMを評価した。この結果から、GPT-4oは長いコンテキストシナリオにおいて他のモデルより一貫して上回るが、負のサンプル、すなわち針が干し草の山にない場合の幻覚問題に悩まされていることが明らかとなった。 MLLMの包括的な長文脈評価では、APIベースモデルとオープンソースモデルの間の大幅なパフォーマンスギャップにも光を当てています。主要な結果の再現に必要なコード、データ、命令はすべて https://github.com/Wang-ML-Lab/multimodal-needle-in-a-haystack から入手できる。 Multimodal Large Language Models (MLLMs) have shown significant promise in various applications, leading to broad interest from researchers and practitioners alike. However, a comprehensive evaluation of their long-context capabilities remains underexplored. To address this gap, we introduce the MultiModal Needle-in-a-haystack (MMNeedle) benchmark, specifically designed to assess the long-context capabilities of MLLMs. Besides multi-image input, we employ image stitching to further increase the input context length, and develop a protocol to automatically generate labels for sub-image level retrieval. Essentially, MMNeedle evaluates MLLMs by stress-testing their capability to locate a target sub-image (needle) within a set of images (haystack) based on textual instructions and descriptions of image contents. This setup necessitates an advanced understanding of extensive visual contexts and effective information retrieval within long-context image inputs. With this benchmark, we evaluate state-of-the-art MLLMs, encompassing both API-based and open-source models. The findings reveal that GPT-4o consistently surpasses other models in long-context scenarios, but suffers from hallucination problems in negative samples, i.e., when needles are not in the haystacks. Our comprehensive long-context evaluation of MLLMs also sheds light on the considerable performance gap between API-based and open-source models. All the code, data, and instructions required to reproduce the main results are available at https://github.com/Wang-ML-Lab/multimodal-needle-in-a-haystack.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
# ロボットによる抽象的な指示の追従と複雑な動的タスクの実現 Enabling robots to follow abstract instructions and complete complex dynamic tasks ( http://arxiv.org/abs/2406.11231v1 ) ライセンス: Link先を確認	Ruaridh Mon-Williams, Gen Li, Ran Long, Wenqian Du, Chris Lucas,	(参考訳) ホームキッチンのような予測不可能な環境で複雑なタスクを補完することは、ロボットシステムに挑戦する。これらの課題には、「ホットドリンクを作れ」といった高いレベルの人間の命令を解釈することや、動いているマグカップに正確な量の水を注ぐような行為が含まれる。これらの課題に対処するため、我々はLarge Language Models (LLMs)、キュレートされた知識ベース、Integrated Force and Visual Feedback (IFVF)を組み合わせた新しいフレームワークを提案する。提案手法は,抽象的な命令を解釈し,長期的タスクを実行し,不確実性に対処する。 GPT-4を利用してユーザーのクエリと周辺を分析し、実行中に関数のキュレートされたデータベースにアクセスするコードを生成する。抽象命令を実行可能なステップに変換する。各ステップは、知識ベースからIFVF関連例を引き出すために、検索強化の一般化を利用することで、カスタムコードを生成する。 IFVFは、ロボットが実行中にノイズや障害に反応することを可能にする。コーヒーの作り方や板の飾り方を使って、注ぐものから引き出しの開口部分まで、それぞれ異なるフィードバックタイプや方法の恩恵を受けています。この新たな進歩は、不確実な環境で複雑なタスクを完了するためのスケーラブルで効率的なロボットフレームワークへの大きな進歩を示す。私たちの発見は、付随するビデオで説明され、オープンソースGitHubリポジトリでサポートされています(論文の受理に基づいてリリースされています)。 Completing complex tasks in unpredictable settings like home kitchens challenges robotic systems. These challenges include interpreting high-level human commands, such as "make me a hot beverage" and performing actions like pouring a precise amount of water into a moving mug. To address these challenges, we present a novel framework that combines Large Language Models (LLMs), a curated Knowledge Base, and Integrated Force and Visual Feedback (IFVF). Our approach interprets abstract instructions, performs long-horizon tasks, and handles various uncertainties. It utilises GPT-4 to analyse the user's query and surroundings, then generates code that accesses a curated database of functions during execution. It translates abstract instructions into actionable steps. Each step involves generating custom code by employing retrieval-augmented generalisation to pull IFVF-relevant examples from the Knowledge Base. IFVF allows the robot to respond to noise and disturbances during execution. 
We use coffee making and plate decoration to demonstrate our approach, including components ranging from pouring to drawer opening, each benefiting from distinct feedback types and methods. This novel advancement marks significant progress toward a scalable, efficient robotic framework for completing complex tasks in uncertain environments. Our findings are illustrated in an accompanying video and supported by an open-source GitHub repository (released upon paper acceptance).	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
# 複数ユーザを対象としたレコメンダを用いた協調型データ分析システム A Collaborative Data Analytics System with Recommender for Diverse Users ( http://arxiv.org/abs/2406.11232v1 ) ライセンス: Link先を確認	Siu Lung Ng, Hirad Baradaran Rezaei, Fethi Rabhi,	(参考訳) 本稿では、モジュール化された再利用可能なマイクロサービスを備えたクラウドベースのプラットフォームを使用して、経験豊富な開発者と初心者ユーザのギャップを埋める、共同分析プラットフォームであるSLEGO(Software-Lego)システムを提案する。これらのマイクロサービスにより、開発者は分析ツールとワークフローを共有できる。一方、単純なグラフィカルユーザインターフェース(GUI)により、初心者のユーザはプログラミングスキルなしで包括的な分析パイプラインを構築することができる。ナレッジベースとLLM(Large Language Model)を使用したレコメンデーションシステムによってSLEGOは、マイクロサービスの選択と統合を強化し、分析パイプライン構築の効率を高める。金融と機械学習のケーススタディでは、SLEGOがモジュラーマイクロサービスの共有とアセンブリを促進し、リソース再利用性とチームのコラボレーションを大幅に改善する様子が示されている。その結果、モジュール設計、知識ベース、レコメンデーションシステムを統合し、より包括的で効率的な分析環境を育むことによって、データ分析を民主化するSLEGOの役割を強調した。 This paper presents the SLEGO (Software-Lego) system, a collaborative analytics platform that bridges the gap between experienced developers and novice users using a cloud-based platform with modular, reusable microservices. These microservices enable developers to share their analytical tools and workflows, while a simple graphical user interface (GUI) allows novice users to build comprehensive analytics pipelines without programming skills. Supported by a knowledge base and a Large Language Model (LLM) powered recommendation system, SLEGO enhances the selection and integration of microservices, increasing the efficiency of analytics pipeline construction. Case studies in finance and machine learning illustrate how SLEGO promotes the sharing and assembly of modular microservices, significantly improving resource reusability and team collaboration. The results highlight SLEGO's role in democratizing data analytics by integrating modular design, knowledge bases, and recommendation systems, fostering a more inclusive and efficient analytical environment.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
# 大規模言語モデルにおける文脈内学習の意思決定境界の提案 Probing the Decision Boundaries of In-context Learning in Large Language Models ( http://arxiv.org/abs/2406.11233v1 ) ライセンス: Link先を確認	Siyan Zhao, Tung Nguyen, Aditya Grover,	(参考訳) インコンテキスト・ラーニング(In-context learning)は、大規模言語モデル(LLM)において重要なパラダイムであり、明示的なパラメータ更新なしにいくつかの例でこれらのモデルをシンプルに促すことで、新しいタスクやドメインに一般化することができる。モデルスケール、事前学習データ、その他の要因の関数として、LLMにおける文脈内学習を理解するために、多くの試みがなされている。本研究では,テキスト内二項分類のための決定境界のレンズからテキスト内学習を探索し,理解するための新しいメカニズムを提案する。決定境界は、標準分類器の帰納的バイアスの質的な振る舞いを可視化し、重要な情報を提供する。驚いたことに、単純な二項分類タスクにおいて、現在のLLMによって学習される決定境界は、基礎となるタスクの線形分離性に関係なく、しばしば不規則で非滑らかである。本稿では,これらの決定境界に影響を与える要因について検討し,その一般化性を高める方法を探る。本研究では,LLMの学習・微調整手法,モデルアーキテクチャの影響,データ効率のよい意思決定境界の平滑化のためのアクティブプロンプト手法の有効性など,様々な手法について検討する。本研究は、文脈内学習のダイナミクスをより深く理解し、文脈内学習の堅牢性と一般化性を高めるための実践的改善を提供する。 In-context learning is a key paradigm in large language models (LLMs) that enables them to generalize to new tasks and domains by simply prompting these models with a few exemplars without explicit parameter updates. Many attempts have been made to understand in-context learning in LLMs as a function of model scale, pretraining data, and other factors. In this work, we propose a new mechanism to probe and understand in-context learning from the lens of decision boundaries for in-context binary classification. Decision boundaries are straightforward to visualize and provide important information about the qualitative behavior of the inductive biases of standard classifiers. To our surprise, we find that the decision boundaries learned by current LLMs in simple binary classification tasks are often irregular and non-smooth, regardless of linear separability in the underlying task. This paper investigates the factors influencing these decision boundaries and explores methods to enhance their generalizability. 
We assess various approaches, including training-free and fine-tuning methods for LLMs, the impact of model architecture, and the effectiveness of active prompting techniques for smoothing decision boundaries in a data-efficient manner. Our findings provide a deeper understanding of in-context learning dynamics and offer practical improvements for enhancing robustness and generalizability of in-context learning.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
# MiniConGTS: Aspect Sentiment Triplet 抽出のためのニアアルティマリストコントラストグリッドタグ方式 MiniConGTS: A Near Ultimate Minimalist Contrastive Grid Tagging Scheme for Aspect Sentiment Triplet Extraction ( http://arxiv.org/abs/2406.11234v1 ) ライセンス: Link先を確認	Qiao Sun, Liujia Yang, Minghao Ma, Nanyang Ye, Qinying Gu,	(参考訳) Aspect Sentiment Triplet extract (ASTE)は、与えられたコーパス内の感情三つ子を共同抽出することを目的としている。事前学習-ファインタニングパラダイム内の既存のアプローチは、複雑なタグ付けスキームと分類ヘッドを巧みに作成するか、あるいはパフォーマンスを高めるために外部意味拡張を組み込む傾向にある。本研究では,タグ付け方式における冗長性を再評価し,事前訓練された表現における内部強化について検討する。本稿では,最小限のタグ付け方式と新しいトークンレベルのコントラスト学習戦略を統合することにより,事前訓練された表現を改善し,活用する手法を提案する。提案手法は、よりコンパクトな設計と計算オーバーヘッドの低減を特徴とし、最先端技術と比較して同等または優れた性能を示す。さらに,GPT-4の性能を,この課題に対する数発の学習とチェーン・オブ・ソート・シナリオで公式に評価した最初の人物である。その結果,大規模言語モデルにおいても,事前学習ファインタニングのパラダイムは依然として有効であることが示唆された。 Aspect Sentiment Triplet Extraction (ASTE) aims to co-extract the sentiment triplets in a given corpus. Existing approaches within the pretraining-finetuning paradigm tend to either meticulously craft complex tagging schemes and classification heads, or incorporate external semantic augmentation to enhance performance. In this study, we, for the first time, re-evaluate the redundancy in tagging schemes and the internal enhancement in pretrained representations. We propose a method to improve and utilize pretrained representations by integrating a minimalist tagging scheme and a novel token-level contrastive learning strategy. The proposed approach demonstrates comparable or superior performance compared to state-of-the-art techniques while featuring a more compact design and reduced computational overhead. Additionally, we are the first to formally evaluate GPT-4's performance in few-shot learning and Chain-of-Thought scenarios for this task. The results demonstrate that the pretraining-finetuning paradigm remains highly effective even in the era of large language models.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
# QTIP:トレリスとインコヒーレンス処理による量子化 QTIP: Quantization with Trellises and Incoherence Processing ( http://arxiv.org/abs/2406.11235v1 ) ライセンス: Link先を確認	Albert Tseng, Qingyao Sun, David Hou, Christopher De Sa,	(参考訳) 後トレーニング量子化(PTQ)は、重みを低精度のデータタイプに量子化することにより、LLMのメモリフットプリントを削減する。 LLM推論は通常メモリバウンドであるので、PTQ法は推論スループットを向上させることができる。最近のPTQ手法はベクトル量子化(VQ)を用いて複数の重みを同時に量子化することで、より優れた整形によって情報利用を改善する。しかし、VQは次元に対してサイズが指数関数的に増大するコードブックを必要とする。これにより、現在のVQベースのPTQは低いVQ次元($\le 8$)に制限され、その結果として量子化品質も制限される。本稿では,超高次元量子化を実現するためにトレリス符号化量子化(TCQ)を用いるQTIPを紹介する。 TCQはステートフルなデコーダを使用して、コードブックのサイズをビットレートと有効次元から分離する。 QTIPは、ハードウェア効率の良い"ビットシフト"トレリス構造のために設計された、ルックアップのみのものから計算によるルックアップフリーのものまで一連のトレリス符号を導入し、これらの符号は量子化品質と推論速度の両方で最先端の結果を達成する。 Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing weights to low-precision datatypes. Since LLM inference is usually memory-bound, PTQ methods can improve inference throughput. Recent state-of-the-art PTQ approaches have converged on using vector quantization (VQ) to quantize multiple weights at once, which improves information utilization through better shaping. However, VQ requires a codebook with size exponential in the dimension. This limits current VQ-based PTQ works to low VQ dimensions ($\le 8$) that in turn limit quantization quality. Here, we introduce QTIP, which instead uses trellis coded quantization (TCQ) to achieve ultra-high-dimensional quantization. TCQ uses a stateful decoder that separates the codebook size from the bitrate and effective dimension. QTIP introduces a spectrum of lookup-only to computed lookup-free trellis codes designed for a hardware-efficient "bitshift" trellis structure; these codes achieve state-of-the-art results in both quantization quality and inference speed.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
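The QTIP abstract's remark that a VQ codebook grows exponentially with the dimension can be made concrete with a small calculation. The sketch below assumes an unstructured codebook in which a block of `dim` weights at `bits_per_weight` bits each is indexed by `bits_per_weight * dim` bits; the function name and the 2-bit setting are illustrative choices, not taken from the paper.

```python
def vq_codebook_entries(bits_per_weight: int, dim: int) -> int:
    """Codewords needed by an unstructured VQ codebook at a fixed bitrate.

    Assumption: each block of `dim` weights is stored as one index of
    bits_per_weight * dim bits, so the codebook has 2**(bits * dim) entries.
    """
    return 2 ** (bits_per_weight * dim)


# Illustrative setting: 2 bits per weight, increasing block dimension.
for dim in (1, 2, 4, 8, 16):
    print(f"dim={dim:2d}  entries={vq_codebook_entries(2, dim)}")
```

Under this assumption the table size grows multiplicatively with every added dimension, which is consistent with the abstract's point that plain VQ is confined to low dimensions.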
# 遠隔テキストから得られる利得の種類について : 長期文脈言語モデリングによる分析 What Kinds of Tokens Benefit from Distant Text? An Analysis on Long Context Language Modeling ( http://arxiv.org/abs/2406.11238v1 ) ライセンス: Link先を確認	Yutong Hu, Quzhe Huang, Kangcheng Luo, Yansong Feng,	(参考訳) 大規模言語モデルで扱える文脈長は増加し続けており、これらのモデルは言語モデリングのようなタスクに遠隔情報を利用する能力の強化を実証している。この能力は人間の読み書きの習慣とは対照的であり、特に先駆的な場合を除き、特に遠方の情報を記憶して使うことは珍しくない。本稿では,言語モデルにおける長期的文脈から,どの単語がより恩恵を受けるかを検討することを目的とする。コンテントワード(例えば名詞,形容詞,形容詞)と単語の初期トークンは,文脈長の増加に伴うトークン確率の変化を分析することにより,最も有益であることがわかった。文脈における頻繁なパターン(N-gram)も予測に大きな影響を及ぼす。さらに、モデルの事前知識は、特に希少なトークンに対して、予測に影響を与える重要な役割を果たす。また、より長い文脈で言語モデルがより自信を持ち、よりシャープな確率分布が生まれることを観察する。この過信は、遠い文脈情報を持つトークンの確率の増大に寄与する可能性がある。我々の分析によって、コミュニティが長文言語モデリングをより深く理解し、より信頼性の高い長文モデルの設計に貢献できることを期待しています。 As the context length that large language models can handle continues to increase, these models demonstrate an enhanced ability to utilize distant information for tasks such as language modeling. This capability contrasts with human reading and writing habits, where it is uncommon to remember and use particularly distant information, except in cases of foreshadowing. In this paper, we aim to explore which kinds of words benefit more from long contexts in language models. By analyzing the changes in token probabilities with increasing context length, we find that content words (e.g., nouns, adjectives) and the initial tokens of words benefit the most. Frequent patterns in the context (N-grams) also significantly impact predictions. Additionally, the model's prior knowledge plays a crucial role in influencing predictions, especially for rare tokens. We also observe that language models become more confident with longer contexts, resulting in sharper probability distributions. This overconfidence may contribute to the increasing probabilities of tokens with distant contextual information. 
We hope that our analysis will help the community better understand long-text language modeling and contribute to the design of more reliable long-context models.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
# ホモグリフを用いたAI生成コンテンツ検出器の回避 Evading AI-Generated Content Detectors using Homoglyphs ( http://arxiv.org/abs/2406.11239v1 ) ライセンス: Link先を確認	Aldan Creo, Shushanta Pudasaini,	(参考訳) 人間に近いテキストの生成は、大規模言語モデル(LLM)の出現によって可能になった。 AI生成コンテンツの検出は、誤情報や学術的不正といった問題との戦いにおいて重要な役割を担っているため、信頼性の高いLLM検出器を開発するために多くの研究がなされている。このような検出器はテストデータで有望な結果を示したが、近年の研究により、異なる手法を用いて回避できることが判明した。本稿では,既存のLLM検出器を回避できるホモグリフ($a \rightarrow {\alpha}$)攻撃について述べる。攻撃の有効性は、ホモグリフがテキストのトークン化、ひいてはトークンの対数尤度をどうシフトさせるかを解析することによって示される。 5つの異なるデータセット上で,Binoculars, DetectGPT, OpenAI検出器, 透かし技術を含む最先端LLM検出器に対するホモグリフの有効性を評価するために, 総合評価を行った。提案手法により, 検出器とデータセットのすべての構成において, 性能が精度0.5(ランダムな推測)まで大幅に低下することが示された。その結果, ホモグリフをベースとした攻撃は, 既存のLLM検出器を効果的に回避できることが示唆された。 The generation of text that is increasingly human-like has been enabled by the advent of large language models (LLMs). As the detection of AI-generated content holds significant importance in the fight against issues such as misinformation and academic cheating, numerous studies have been conducted to develop reliable LLM detectors. While promising results have been demonstrated by such detectors on test data, recent research has revealed that they can be circumvented by employing different techniques. In this article, homoglyph-based ($a \rightarrow {\alpha}$) attacks that can be used to circumvent existing LLM detectors are presented. The efficacy of the attacks is illustrated by analyzing how homoglyphs shift the tokenization of the text, and thus its token loglikelihoods. A comprehensive evaluation is conducted to assess the effectiveness of homoglyphs on state-of-the-art LLM detectors, including Binoculars, DetectGPT, OpenAI's detector, and watermarking techniques, on five different datasets. A significant reduction in the efficiency of all the studied configurations of detectors and datasets, down to an accuracy of 0.5 (random guessing), is demonstrated by the proposed approach. 
The results show that homoglyph-based attacks can effectively evade existing LLM detectors, and the implications of these findings are discussed along with possible defenses against such attacks.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
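The homoglyph substitution described in the abstract above ($a \rightarrow {\alpha}$) can be sketched in a few lines of Python. The mapping table and the function name `apply_homoglyphs` are assumptions made for illustration; the listing does not say which characters the authors actually replace, so this is not their implementation.

```python
# Illustrative, hand-picked look-alike table (an assumption, not from the paper).
HOMOGLYPHS = {
    "a": "\u03b1",  # Greek small letter alpha, as in the abstract's example
    "o": "\u03bf",  # Greek small letter omicron
    "e": "\u0435",  # Cyrillic small letter ie
}


def apply_homoglyphs(text: str, table: dict) -> str:
    """Replace every character that has an entry in `table` with its look-alike."""
    return "".join(table.get(ch, ch) for ch in text)


original = "a tone"
attacked = apply_homoglyphs(original, HOMOGLYPHS)

# The strings look similar on screen but differ at the code-point level,
# so a tokenizer receives a different byte sequence for the attacked text.
print(original == attacked)
print(original.encode("utf-8"))
print(attacked.encode("utf-8"))
```

Because the substituted characters fall outside the ASCII range, a subword tokenizer is likely to split the attacked text differently, which is the effect on tokenization that the abstract refers to.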
# 協調強化学習における電力正規化のメリット The Benefits of Power Regularization in Cooperative Reinforcement Learning ( http://arxiv.org/abs/2406.11240v1 ) ライセンス: Link先を確認	Michelle Li, Michael Dennis,	(参考訳) MARL(Cooperative Multi-Agent Reinforcement Learning)アルゴリズムは、タスク報酬を最適化するためにのみ訓練され、単一のエージェントの障害や敵意がシステム内のすべてのエージェントの報酬を解読する力の集中につながる。チームの状況では、人が単一障害点になっていないことを保証するために、力がどのように分散されているかを明確に考えるのが有用です。ここでは、協調RLシステムにおける力の集中を明示的に調整することは、単一エージェントの故障、敵攻撃、コプレイヤーのインセンティブ変化に対してより堅牢なシステムをもたらすと論じる。そこで本稿では,エゴエージェントの報酬に影響を及ぼすコプレーヤの能力を把握し,タスク報酬とパワー集中のバランスをとるための,パワーレギュラー化された目標を提案する。この新たな目的を前提として、全てのエージェントが最適応答バランスのパワーとタスク報酬を発揮できる均衡が存在することを示す。さらに、このパワーレギュラー化目標に向けて、トレーニング中に敵対データを注入するサンプルベースパワーレギュラー化(SBPR)と、本質的なモチベーションによるパワーレギュラー化(PRIM)の2つのアルゴリズムを提案する。我々の実験は,両アルゴリズムがタスク報酬とパワーのバランスをとることに成功し,タスクのみの報酬のベースラインよりも低消費電力化を実現し,システム内のエージェントが非政治状態になった場合の破滅的な出来事を回避することを実証した。 Cooperative Multi-Agent Reinforcement Learning (MARL) algorithms, trained only to optimize task reward, can lead to a concentration of power where the failure or adversarial intent of a single agent could decimate the reward of every agent in the system. In the context of teams of people, it is often useful to explicitly consider how power is distributed to ensure no person becomes a single point of failure. Here, we argue that explicitly regularizing the concentration of power in cooperative RL systems can result in systems which are more robust to single agent failure, adversarial attacks, and incentive changes of co-players. To this end, we define a practical pairwise measure of power that captures the ability of any co-player to influence the ego agent's reward, and then propose a power-regularized objective which balances task reward and power concentration. Given this new objective, we show that there always exists an equilibrium where every agent is playing a power-regularized best-response balancing power and task reward. 
Moreover, we present two algorithms for training agents towards this power-regularized objective: Sample Based Power Regularization (SBPR), which injects adversarial data during training; and Power Regularization via Intrinsic Motivation (PRIM), which adds an intrinsic motivation to regulate power to the training objective. Our experiments demonstrate that both algorithms successfully balance task reward and power, leading to lower power behavior than the baseline of task-only reward and avoid catastrophic events in case an agent in the system goes off-policy.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
# 空間的・不確実性を考慮したハイパーグラフ拡散による高精度・高速画素検索 Accurate and Fast Pixel Retrieval with Spatial and Uncertainty Aware Hypergraph Diffusion ( http://arxiv.org/abs/2406.11242v1 ) ライセンス: Link先を確認	Guoyuan An, Yuchi Huo, Sung-Eui Yoon,	(参考訳) 本稿では,画像検索と画素検索の双方の効率性と精度を高めるために,新しい手法を提案する。従来の拡散法は、スカラーエッジ重みに依存するため、従来のグラフにおいて空間情報を効果的に伝播させるのに苦労する。この制限を克服するために,クエリ時間内に局所的な特徴を用いて空間情報を効率的に伝播し,データベース内のオブジェクトを正確に検索・ローカライズするハイパーグラフベースのフレームワークを提案する。さらに、我々は「コミュニティ選択」と呼ぶ手法により、画像グラフの構造情報を革新的に活用する。このアプローチにより、初期探索結果の不確実性の評価が可能となり、精度と速度の最適なバランスが図られる。このようなトレードオフが頻繁に必要となる現実世界のアプリケーションでは、これは特に重要である。 (P)ROxfordおよび(P)RParisデータセットで行った実験の結果,提案手法は既存の拡散手法よりも有意に優れていることが示された。画像レベルの検索と画素レベルの検索では,SOTA(State-of-the-art)の精度が向上し,処理速度も向上した。この2つの成果は、ハイパーグラフベースのフレームワークとコミュニティ選択技術の有効性を強調し、コンテンツベースの画像検索の分野における顕著な進歩を示している。 This paper presents a novel method designed to enhance the efficiency and accuracy of both image retrieval and pixel retrieval. Traditional diffusion methods struggle to propagate spatial information effectively in conventional graphs due to their reliance on scalar edge weights. To overcome this limitation, we introduce a hypergraph-based framework, uniquely capable of efficiently propagating spatial information using local features during query time, thereby accurately retrieving and localizing objects within a database. Additionally, we innovatively utilize the structural information of the image graph through a technique we term "community selection". This approach allows for the assessment of the initial search result's uncertainty and facilitates an optimal balance between accuracy and speed. This is particularly crucial in real-world applications where such trade-offs are often necessary. Our experimental results, conducted on the (P)ROxford and (P)RParis datasets, demonstrate the significant superiority of our method over existing diffusion techniques. 
We achieve state-of-the-art (SOTA) accuracy in both image-level and pixel-level retrieval, while also maintaining impressive processing speed. This dual achievement underscores the effectiveness of our hypergraph-based framework and community selection technique, marking a notable advancement in the field of content-based image retrieval.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
# FamiCom:タスク非依存のパフォーマンス推定を伴う言語モデルのためのさらなるデミスティファイションプロンプト FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation ( http://arxiv.org/abs/2406.11243v1 ) ライセンス: Link先を確認	Bangzheng Li, Ben Zhou, Xingyu Fu, Fei Wang, Dan Roth, Muhao Chen,	(参考訳) 言語モデルは、入力プロンプトの恩恵を受け、下流のタスクでより良いパフォーマンスを得られる、印象的なコンテキスト内学習機能を示している。既存の研究は、この観測の背後にあるメカニズムを調査し、エンドタスクのパフォーマンスをより正確に見積もることができるラベルに依存しないプロンプト指標を提案する。一般的なアプローチの1つは、モデルとプロンプトとの親しみを測る手段としてパープレキシティ(perplexity)を使用することである。ドメイン内のタスクに対して一貫した改善を示す一方で、パープレキシティのような親しみやすさの指標は、タスクやドメイン転送シナリオのような複雑な状況におけるパフォーマンスを正確に見積もることができないことがわかった。本研究では,タスク非依存のパフォーマンス推定のためのより包括的な尺度であるFamiComを提案する。特にFamiComは、現在のメトリクスから欠落している重要な要因である、エンドタスクの固有の難しさである‘textit{complexity}’と親しみやすさを組み合わせている。実験の結果、FamiComはエンドタスクのパフォーマンスと強く相関し、0.85のスピアマンの相関が生じる。さらに、FamiComを自動プロンプトとデモ選択に適用し、既存のメソッドやベースラインを7.0%以上精度で上回ります。 Language models have shown impressive in-context-learning capabilities, which allow them to benefit from input prompts and perform better on downstream end tasks. Existing works investigate the mechanisms behind this observation, and propose label-agnostic prompt metrics that can better estimate end-task performances. One popular approach is using perplexity as a way to measure models' familiarity with the prompt. While showing consistent improvements on in-domain tasks, we found that familiarity metrics such as perplexity cannot accurately estimate performance in complicated situations such as task or domain transferring scenarios. In this work, we propose a revised measure called FamiCom, providing a more comprehensive measure for task-agnostic performance estimation. Specifically, FamiCom combines familiarity with \textit{complexity} -- the inherent difficulty of end tasks, which is an important factor missing from current metrics. Experiments show that FamiCom strongly correlates with end-task performances, producing a 0.85 Spearman's correlation, versus 0.43 of familiarity-only ones'. 
We further apply FamiCom to automatic prompt and demonstration selection, and outperform existing methods and baselines by more than 7.0% in accuracy.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
# SpoT-Mamba: 選択状態空間を持つ時空間グラフの長距離依存を学習する SpoT-Mamba: Learning Long-Range Dependency on Spatio-Temporal Graphs with Selective State Spaces ( http://arxiv.org/abs/2406.11244v1 ) ライセンス: Link先を確認	Jinhyeok Choi, Heehyeon Kim, Minhyeong An, Joyce Jiyoung Whang,	(参考訳) 時空間グラフ(STG)予測は、交通や天気予報など、現実世界の広範な応用において重要な課題である。近年、STGの複雑な力学をモデル化する手法がいくつか提案されているが、長距離時空間依存への対処は依然として大きな課題であり、性能の向上は限られている。最近提案されたMambaという状態空間モデルに触発されて、長距離依存を捕捉する顕著な能力を示し、新しいSTG予測フレームワークSpot-Mambaを提案する。 SpoT-Mambaは、様々なノード固有のウォークシーケンスをスキャンしてノード埋め込みを生成する。ノードの埋め込みに基づいて、時間的スキャンを行い、長距離の時空間依存関係をキャプチャする。実世界の交通予測データセットの実験結果から,Spot-Mambaの有効性が示された。 Spatio-temporal graph (STG) forecasting is a critical task with extensive applications in the real world, including traffic and weather forecasting. Although several recent methods have been proposed to model complex dynamics in STGs, addressing long-range spatio-temporal dependencies remains a significant challenge, leading to limited performance gains. Inspired by a recently proposed state space model named Mamba, which has shown remarkable capability of capturing long-range dependency, we propose a new STG forecasting framework named SpoT-Mamba. SpoT-Mamba generates node embeddings by scanning various node-specific walk sequences. Based on the node embeddings, it conducts temporal scans to capture long-range spatio-temporal dependencies. Experimental results on the real-world traffic forecasting dataset demonstrate the effectiveness of SpoT-Mamba.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
# RIS支援IoVネットワークのための深層強化学習型AoI-Awareリソース割り当て Deep-Reinforcement-Learning-Based AoI-Aware Resource Allocation for RIS-Aided IoV Networks ( http://arxiv.org/abs/2406.11245v1 ) ライセンス: Link先を確認	Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief,	(参考訳) Reconfigurable Intelligent Surface (RIS)は通信において重要な技術であり、無線通信環境におけるリンク品質を大幅に向上させる代替経路を提供する。本稿では,V2X(vehicle-to-everything)通信方式を考慮したRIS支援車両インターネット(IoV)ネットワークを提案する。また、車両・インフラ間(V2I)リンクの適時性と車車間(V2V)リンクの安定性を改善するために、情報鮮度(AoI)モデルとペイロード伝送確率モデルを導入する。したがって、V2IリンクのAoIを最小化し、V2Vリンクペイロードの送信を優先することを目的として、この最適化問題をマルコフ決定過程(MDP)問題として定式化し、BSがエージェントとして、徐々に収束し高い安定性を維持するソフトアクタ・クリティック(SAC)アルゴリズムを用いて、車両への資源割り当てと位相シフト制御を行う。 SAC アルゴリズムに基づく AoI 対応の統合資源配分と RIS 位相シフト制御方式を提案し,その収束速度,累積報酬,AoI 性能,ペイロード伝送確率は,近位政策最適化 (PPO) ,深度決定性政策勾配 (DDPG) ,ツイン遅延深度決定性政策勾配 (TD3) ,確率的アルゴリズムよりも優れていることを示した。 Reconfigurable Intelligent Surface (RIS) is a pivotal technology in communication, offering an alternative path that significantly enhances the link quality in wireless communication environments. In this paper, we propose a RIS-assisted internet of vehicles (IoV) network, considering the vehicle-to-everything (V2X) communication method. In addition, in order to improve the timeliness of vehicle-to-infrastructure (V2I) links and the stability of vehicle-to-vehicle (V2V) links, we introduce the age of information (AoI) model and the payload transmission probability model. Therefore, with the objective of minimizing the AoI of V2I links and prioritizing the transmission of V2V link payloads, we formulate this optimization problem as a Markov decision process (MDP) problem in which the BS serves as an agent to allocate resources and control phase-shift for the vehicles using the soft actor-critic (SAC) algorithm, which gradually converges and maintains a high stability. 
An AoI-aware joint vehicular resource allocation and RIS phase-shift control scheme based on the SAC algorithm is proposed, and simulation results show that its convergence speed, cumulative reward, AoI performance, and payload transmission probability outperform those of proximal policy optimization (PPO), deep deterministic policy gradient (DDPG), twin delayed deep deterministic policy gradient (TD3), and stochastic algorithms.	翻訳日:2024-06-18 18:14:15 公開日:2024-06-17
# STEVEシリーズ:Minecraftにおけるエージェントシステムのステップバイステップ構築 STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft ( http://arxiv.org/abs/2406.11247v1 ) ライセンス: Link先を確認	Zhonghan Zhao, Wenhao Chai, Xuan Wang, Ke Ma, Kewei Chen, Dongxu Guo, Tian Ye, Yanting Zhang, Hongwei Wang, Gaoang Wang,	(参考訳) 大きな言語モデル(LLM)を中核とするエンボディエージェントシステムの構築は有望な方向性である。このようなエージェントを現実世界に展開し、訓練する上で、かなりのコストと制御不能な要因から、マインクラフト環境での探索を開始することにしました。私たちのSTEVEシリーズエージェントは、仮想環境における基本的なタスクを完了し、ナビゲーションやクリエイティブタスクといったより困難なタスクをこなすことができます。バニラ大言語モデルによる探索を開始し、視覚エンコーダと、収集した高品質データセットSTEVE-21Kに基づいてトレーニングされたアクションコードベースで拡張します。その後、Cryticとメモリで拡張して複雑なシステムに変換しました。最後に,階層型マルチエージェントシステムを構築した。我々の最近の研究は、知識蒸留を通してエージェントシステムを熟成する方法を探求した。将来的には、現実世界におけるSTEVEエージェントのさらなる応用の可能性を探る。 Building an embodied agent system with a large language model (LLM) as its core is a promising direction. Due to the significant costs and uncontrollable factors associated with deploying and training such agents in the real world, we have decided to begin our exploration within the Minecraft environment. Our STEVE Series agents can complete basic tasks in a virtual environment and more challenging tasks such as navigation and even creative tasks, with an efficiency far exceeding previous state-of-the-art methods by a factor of $2.5\times$ to $7.3\times$. We begin our exploration with a vanilla large language model, augmenting it with a vision encoder and an action codebase trained on our collected high-quality dataset STEVE-21K. Subsequently, we enhanced it with a Critic and memory to transform it into a complex system. Finally, we constructed a hierarchical multi-agent system. Our recent work explored how to prune the agent system through knowledge distillation. In the future, we will explore more potential applications of STEVE agents in the real world.	翻訳日:2024-06-18 18:04:29 公開日:2024-06-17
# DCASEチャレンジ2024タスク9における大言語モデルからのキャプション拡張に基づく言語情報音源分離の性能改善 Performance Improvement of Language-Queried Audio Source Separation Based on Caption Augmentation From Large Language Models for DCASE Challenge 2024 Task 9 ( http://arxiv.org/abs/2406.11248v1 ) ライセンス: Link先を確認	Do Hyun Lee, Yoonah Song, Hong Kook Kim,	(参考訳) 本稿では,言語クエリ音声ソース分離(LASS)タスクに適用した,プロンプトエンジニアリングに基づくテキスト拡張手法を提案する。学習データセットの各文に対応する複数の字幕を生成するために,大規模言語モデル (LLM) を用いた。そこで我々はまず,キャプション増強のための最も効果的なプロンプトを,より少ない数のキャプションで同定する実験を行った。これらの付加キャプションで訓練されたLASSモデルは、強化なしで訓練されたものと比較してDCASE 2024 Task 9の検証セットで改善された性能を示す。本研究は,LLMに基づくキャプション拡張が,言語クエリによる音声ソース分離に有効であることを示す。 We present a prompt-engineering-based text-augmentation approach applied to a language-queried audio source separation (LASS) task. To enhance the performance of LASS, the proposed approach utilizes large language models (LLMs) to generate multiple captions corresponding to each sentence of the training dataset. To this end, we first perform experiments to identify the most effective prompts for caption augmentation with a smaller number of captions. A LASS model trained with these augmented captions demonstrates improved performance on the DCASE 2024 Task 9 validation set compared to that trained without augmentation. This study highlights the effectiveness of LLM-based caption augmentation in advancing language-queried audio source separation.	翻訳日:2024-06-18 18:04:29 公開日:2024-06-17
# 事前学習モデルにおける関係学習:ハイパーグラフ回復の観点からの理論 Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective ( http://arxiv.org/abs/2406.11249v1 ) ライセンス: Link先を確認	Yang Chen, Cong Fang, Zhouchen Lin, Bing Liu,	(参考訳) ファンデーション・モデル(FM)は、世界のリレーショナル・ダイナミクスに関する顕著な洞察を示しており、これらのモデルがどのように世界のハイブリッドな関係の理解を獲得するのか、という重要な疑問につながる。従来の統計的学習(特に予測問題)は、データ、特にオブジェクト間の関係に関して、リッチで本質的に構造化された情報を見落としてしまうことがある。本研究では,関係学習をハイパーグラフリカバリとして形式化する数学的モデルを導入し,FMの事前学習について検討する。我々のフレームワークでは、世界はハイパーグラフとして表現され、データはハイパーエッジからのランダムなサンプルとして抽象化される。我々は,このハイパーグラフを復元するための事前学習モデル (PTM) の有効性を理論的に検討し,データ効率を極小近似方式で解析する。リッチグラフ理論をPTMの領域に統合することにより、我々の数学的フレームワークは、独特な視点から事前学習を深く理解するための強力なツールを提供し、様々なシナリオで利用することができる。例えば、マルチモーダル学習において、フレームワークをエンティティアライメントに拡張する。 Foundation Models (FMs) have demonstrated remarkable insights into the relational dynamics of the world, leading to the crucial question: how do these models acquire an understanding of world hybrid relations? Traditional statistical learning, particularly for prediction problems, may overlook the rich and inherently structured information from the data, especially regarding the relationships between objects. We introduce a mathematical model that formalizes relational learning as hypergraph recovery to study pre-training of FMs. In our framework, the world is represented as a hypergraph, with data abstracted as random samples from hyperedges. We theoretically examine the feasibility of a Pre-Trained Model (PTM) to recover this hypergraph and analyze the data efficiency in a minimax near-optimal style. By integrating rich graph theories into the realm of PTMs, our mathematical framework offers powerful tools for an in-depth understanding of pre-training from a unique perspective and can be used under various scenarios. 
As an example, we extend the framework to entity alignment in multimodal learning.	翻訳日:2024-06-18 18:04:29 公開日:2024-06-17
# 機械は人間に共鳴できるか? : LMの感情的・共感的理解の評価 Can Machines Resonate with Humans? Evaluating the Emotional and Empathic Comprehension of LMs ( http://arxiv.org/abs/2406.11250v1 ) ライセンス: Link先を確認	Muhammad Arslan Manzoor, Yuxia Wang, Minghan Wang, Preslav Nakov,	(参考訳) 共感は、しばしば物語を通して個人的な経験を共有することによって引き起こされる、社会的行動を促進する上で重要な役割を担っている。しかしながら、NLPアプローチを用いた共感のモデル化は、人間の相互作用力学との深い相互関係のため、依然として困難である。人間の注釈付き共感データセット上での微調整言語モデル(LM)を含むこれまでのアプローチでは、成功は限られていた。 LMにおける共感理解を改善するために,マスク付きLMを用いたコントラスト学習や,Large Language Models (LLMs)による微調整など,いくつかの戦略を提案する。これらの手法は従来の方法よりも改善されているが、全体的な結果は満足できないままである。この傾向をよりよく理解するために、アノテータ間の低一致を明らかにする分析を行った。この合意の欠如は、トレーニングを妨げ、タスクの主観的な性質を強調します。また、アノテーションに対する文化的影響についても検討する。これを研究するために,我々はウルドゥー語でストーリーペアを注意深く収集し,アノテータ間の共感を解釈する主観性は文化的背景とは無関係であることがわかった。 LMの共感に対する理解の体系的な探索から得られた知見は、タスクの定式化とモデリングの両方において、十分な探索の余地があることを示唆している。 Empathy plays a pivotal role in fostering prosocial behavior, often triggered by the sharing of personal experiences through narratives. However, modeling empathy using NLP approaches remains challenging due to its deep interconnection with human interaction dynamics. Previous approaches, which involve fine-tuning language models (LMs) on human-annotated empathic datasets, have had limited success. In our pursuit of improving empathy understanding in LMs, we propose several strategies, including contrastive learning with masked LMs and supervised fine-tuning with Large Language Models (LLMs). While these methods show improvements over previous methods, the overall results remain unsatisfactory. To better understand this trend, we performed an analysis which reveals a low agreement among annotators. This lack of consensus hinders training and highlights the subjective nature of the task. We also explore the cultural impact on annotations. To study this, we meticulously collected story pairs in Urdu language and find that subjectivity in interpreting empathy among annotators appears to be independent of cultural background. 
The insights from our systematic exploration of LMs' understanding of empathy suggest that there is considerable room for exploration in both task formulation and modeling.	翻訳日:2024-06-18 18:04:29 公開日:2024-06-17
# CLIPからオープンセマンティックスをマイニングする:Few-Shot Learningのためのリレーショナル・トランジション・パースペクティブ Mining Open Semantics from CLIP: A Relation Transition Perspective for Few-Shot Learning ( http://arxiv.org/abs/2406.11252v1 ) ライセンス: Link先を確認	Cilin Yan, Haochen Wang, Xiaolong Jiang, Yao Hu, Xu Tang, Guoliang Kang, Efstratios Gavves,	(参考訳) Contrastive Vision-Language Pre-Training (CLIP) は印象的なゼロショット能力を示す。 CLIPの下流タスクへの適応性を改善する鍵は、CLIPに埋め込まれた有用な知識を効果的にモデル化し、転送する方法にある。以前の研究は、典型的には限られた視覚サンプルと閉集合意味論(すなわち、下流タスクのターゲットカテゴリセット内)に基づいて知識を掘り下げている。しかし、一致したCLIP画像/テキストエンコーダは、視覚的特徴とほぼ無限のオープンセマンティクスの間の豊富な関係を含んでいる。本稿では,アンカーとしてオープンなセマンティクスを抽出し,画像とアンカーの関係から画像とターゲットの関係に遷移して予測を行う手法を提案する。具体的には、視覚的特徴を"Query"として、アンカーのテキスト特徴を"Key"として、アンカーとターゲットクラスのテキスト特徴を"Value"として、類似度行列を"Value"として、トランスフォーマーモジュールを採用する。このようにして、そのようなトランスモジュールの出力は、画像と対象カテゴリ、すなわち分類予測の関係を表す。手動でオープンセマンティクスを選択するのを避けるために、入力テキストの[CLASS]トークンを学習可能にします。我々は11の代表的な分類データセットについて広範な実験を行った。提案手法は,少数ショットの分類設定を考慮し,従来の最先端技術に対して良好に機能することを示す。 Contrastive Vision-Language Pre-training(CLIP) demonstrates impressive zero-shot capability. The key to improve the adaptation of CLIP to downstream task with few exemplars lies in how to effectively model and transfer the useful knowledge embedded in CLIP. Previous work mines the knowledge typically based on the limited visual samples and close-set semantics (i.e., within target category set of downstream task). However, the aligned CLIP image/text encoders contain abundant relationships between visual features and almost infinite open semantics, which may benefit the few-shot learning but remains unexplored. In this paper, we propose to mine open semantics as anchors to perform a relation transition from image-anchor relationship to image-target relationship to make predictions. Specifically, we adopt a transformer module which takes the visual feature as "Query", the text features of the anchors as "Key" and the similarity matrix between the text features of anchor and target classes as "Value". 
In this way, the output of the transformer module represents the relationship between the image and the target categories, i.e., the classification predictions. To avoid manually selecting the open semantics, we make the [CLASS] token of the input text embedding learnable. We conduct extensive experiments on eleven representative classification datasets. The results show that our method performs favorably against previous state-of-the-art methods in few-shot classification settings.	翻訳日:2024-06-18 18:04:29 公開日:2024-06-17
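
The Query/Key/Value arrangement described in the CLIP abstract above can be sketched as a single attention step. This is a minimal illustration, not the authors' implementation: the function name `relation_transition`, the `temperature` parameter, and the toy 2-D feature vectors are all assumptions, and the learned projections and learnable [CLASS] tokens of the actual module are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def relation_transition(image_feat, anchor_feats, target_feats, temperature=1.0):
    """Hypothetical helper: map an image-anchor relation to image-target scores."""
    # Query x Key: how strongly the image relates to each open-semantic anchor.
    attn = softmax([dot(image_feat, a) / temperature for a in anchor_feats])
    # Value: similarity matrix between anchors (rows) and target classes (columns).
    value = [[dot(a, t) for t in target_feats] for a in anchor_feats]
    # Attention-weighted sum of the rows gives one score per target class.
    return [
        sum(w * row[j] for w, row in zip(attn, value))
        for j in range(len(target_feats))
    ]


# Toy features (made up for illustration only).
anchors = [[1.0, 0.0], [0.0, 1.0]]
targets = [[1.0, 0.0], [0.0, 1.0]]
scores = relation_transition([2.0, 0.0], anchors, targets)
print(scores)  # the first target class receives the larger score
```

Because the Value matrix is computed from text features only, the image never has to be compared with the target class names directly; the anchors carry the relation across.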
# ホロスティックモーション2D:2次元空間におけるスケーラブルな全身運動生成 Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space ( http://arxiv.org/abs/2406.11253v1 ) ライセンス: Link先を確認	Yuan Wang, Zhao Wang, Junhao Gong, Di Huang, Tong He, Wanli Ouyang, Jile Jiao, Xuetao Feng, Qi Dou, Shixiang Tang, Dan Xu,	(参考訳) 本稿では,2次元空間に焦点をあてて,$\textit{ general}$ human motion generationに新たな経路を導入する。従来の方法では主に3Dで人間の動きを生成しており、細部と現実性はあるものの、サイズと多様性の両面で利用可能な3Dモーションデータの範囲によって制限されることが多い。これらの制約に対処するため、我々は2次元のモーションデータを広範囲に活用する。我々は,高品質な全身/部分的なポーズアノテーションとテキスト記述とを組み合わせ,100万以上の移動シーケンスを含む2次元体動生成のための,最初の包括的かつ大規模なベンチマークである$\textbf{Holistic-Motion2D}$を提示する。特に、Holistic-Motion2Dは、これまで最大の3Dモーションデータセットの10倍の大きさである。また、革新的な$\textit{whole-body part-aware attention}$と$\textit{confidence-aware modeling}$ technique, tailored for 2D $\underline{\text T}$ext-driv$\underline{\text{EN}}$ whole-bo$\underline{\text D}$y motion gen$\underline{\text{ER}}$ation, $\textbf{Tender}$を特徴付けるベースラインメソッドを導入します。大規模な実験は、表現的で多様で現実的な人間の動きを生成するために、$\textbf{Holistic-Motion2D}$と$\textbf{Tender}$の有効性を実証している。また、下流アプリケーションにおける2Dモーションの有用性と3Dモーションへのリフトの可能性を強調した。ページリンクは以下の通り。 In this paper, we introduce a novel path to $\textit{general}$ human motion generation by focusing on 2D space. Traditional methods have primarily generated human motions in 3D, which, while detailed and realistic, are often limited by the scope of available 3D motion data in terms of both the size and the diversity. To address these limitations, we exploit extensive availability of 2D motion data. We present $\textbf{Holistic-Motion2D}$, the first comprehensive and large-scale benchmark for 2D whole-body motion generation, which includes over 1M in-the-wild motion sequences, each paired with high-quality whole-body/partial pose annotations and textual descriptions. Notably, Holistic-Motion2D is ten times larger than the previously largest 3D motion dataset. 
We also introduce a baseline method, featuring innovative $\textit{whole-body part-aware attention}$ and $\textit{confidence-aware modeling}$ techniques, tailored for 2D $\underline{\text T}$ext-driv$\underline{\text{EN}}$ whole-bo$\underline{\text D}$y motion gen$\underline{\text{ER}}$ation, namely $\textbf{Tender}$. Extensive experiments demonstrate the effectiveness of $\textbf{Holistic-Motion2D}$ and $\textbf{Tender}$ in generating expressive, diverse, and realistic human motions. We also highlight the utility of 2D motion for various downstream applications and its potential for lifting to 3D motion. The page link is: https://holistic-motion2d.github.io.	翻訳日:2024-06-18 18:04:29 公開日:2024-06-17
# YOLO9tr: 一般化された高効率層凝集ネットワークと注意機構を利用した舗装損傷検出軽量モデル YOLO9tr: A Lightweight Model for Pavement Damage Detection Utilizing a Generalized Efficient Layer Aggregation Network and Attention Mechanism ( http://arxiv.org/abs/2406.11254v1 ) ライセンス: Link先を確認	Sompote Youwai, Achitaphon Chaiyaphat, Pawarotorn Chaipetch,	(参考訳) 道路舗装の整合性を維持することは安全かつ効率的な交通の確保に不可欠である。従来の舗装条件の評価方法は、しばしば手間がかかり、人間の誤りに影響を受けやすい。本稿では,舗装損傷検出のための軽量物体検出モデルであるYOLO9trを提案する。 YOLO9trはYOLOv9アーキテクチャをベースにしており、機能抽出とアテンション機構を強化する部分的なアテンションブロックが組み込まれており、複雑なシナリオにおける検出性能が改善されている。本モデルは,複数の国の道路被害画像からなる包括的データセットに基づいて訓練され,標準4以上の被害カテゴリが拡張されている。この拡張された分類範囲は、舗装条件のより正確で現実的な評価を可能にする。比較分析では、YOLO8、YOLO9、YOLO10のような最先端のモデルと比較して、YOLO9trの精度と推論速度が優れており、計算効率と検出精度のバランスが取れている。このモデルは、最大136FPSのフレームレートを実現し、ビデオ監視や自動検査システムなどのリアルタイムアプリケーションに適合する。本研究は,設計変更とハイパーパラメータ変動がモデル性能に及ぼす影響を解析し,部分的注意ブロックの有効性を検証するためのアブレーション研究である。その結果, リアルタイム舗装環境モニタリングにおけるYOLO9trの実用的展開の可能性を強調し, 安全かつ機能的な道路インフラを維持するための堅牢で効率的なソリューションの開発に寄与した。 Maintaining road pavement integrity is crucial for ensuring safe and efficient transportation. Conventional methods for assessing pavement condition are often laborious and susceptible to human error. This paper proposes YOLO9tr, a novel lightweight object detection model for pavement damage detection, leveraging the advancements of deep learning. YOLO9tr is based on the YOLOv9 architecture, incorporating a partial attention block that enhances feature extraction and attention mechanisms, leading to improved detection performance in complex scenarios. The model is trained on a comprehensive dataset comprising road damage images from multiple countries, including an expanded set of damage categories beyond the standard four. This broadened classification range allows for a more accurate and realistic assessment of pavement conditions. 
Comparative analysis demonstrates YOLO9tr's superior precision and inference speed compared to state-of-the-art models like YOLO8, YOLO9 and YOLO10, achieving a balance between computational efficiency and detection accuracy. The model achieves a high frame rate of up to 136 FPS, making it suitable for real-time applications such as video surveillance and automated inspection systems. The research presents an ablation study to analyze the impact of architectural modifications and hyperparameter variations on model performance, further validating the effectiveness of the partial attention block. The results highlight YOLO9tr's potential for practical deployment in real-time pavement condition monitoring, contributing to the development of robust and efficient solutions for maintaining safe and functional road infrastructure.	翻訳日:2024-06-18 18:04:29 公開日:2024-06-17

# LLMの文化的価値差:プロンプト、言語、モデルサイズ

Cultural Value Differences of LLMs: Prompt, Language, and Model Size ( http://arxiv.org/abs/2407.16891v1 )

ライセンス: Link先を確認

Qishuai Zhong, Yike Yun, Aixin Sun,

(参考訳) 本研究の目的は,大規模言語モデル(LLM)が示す文化的価値の行動パターンを明らかにすることである。研究対象の変動要因には、質問の順序、プロンプト言語、モデルサイズが含まれる。実験の結果,検証した各LLMは異なる文化的価値で効率的に振る舞うことができることがわかった。さらに興味深いことに、 (i) LLMは、単一の言語でプロンプトを提示する場合、比較的一貫した文化的価値を示す。 (ii) プロンプト言語(例:中国語または英語)は、文化的価値の表現に影響を及ぼしうる。同じ質問でも、同じLLMに異なる言語でクエリすると、異なる文化的価値が導かれることがある。 (iii) 同一モデルのサイズの違い(例:Llama2-7B vs 13B vs 70B)は、モデル間の違い(例:Llama2 vs Mixtral)よりも、示される文化的価値に大きな影響を及ぼす。実験の結果,LLMのクエリ言語とモデルサイズが文化的価値の相違をもたらす主な要因であることが判明した。

Our study aims to identify behavior patterns in cultural values exhibited by large language models (LLMs). The studied variants include question ordering, prompting language, and model size. Our experiments reveal that each tested LLM can efficiently behave with different cultural values. More interestingly: (i) LLMs exhibit relatively consistent cultural values when presented with prompts in a single language. (ii) The prompting language (e.g., Chinese or English) can influence the expression of cultural values. The same question can elicit divergent cultural values when the same LLM is queried in a different language. (iii) Differences in sizes of the same model (e.g., Llama2-7B vs 13B vs 70B) have a more significant impact on their demonstrated cultural values than model differences (e.g., Llama2 vs Mixtral). Our experiments reveal that the query language and model size of an LLM are the main factors resulting in cultural value differences.

翻訳日:2024-08-05 01:45:45 公開日:2024-06-17

# マルチモーダルAIに基づくリクルートにおける融合手法の探求:FairCVdbからの洞察

Exploring Fusion Techniques in Multimodal AI-Based Recruitment: Insights from FairCVdb ( http://arxiv.org/abs/2407.16892v1 )

ライセンス: Link先を確認

Swati Swati, Arjun Roy, Eirini Ntoutsi,

(参考訳) 表形式データや画像,テキストなど,個別のモダリティに対する公平性に配慮した学習に関する多くの研究にもかかわらず,総合的な分析のためにさまざまなモダリティを融合させるマルチモーダルデータに関する研究は少ない。本研究では,FairCVdbデータセットを用いたマルチモーダルAIベースの採用システムにおいて,マルチモーダル融合技術が公平性とバイアスに与える影響について検討する。その結果,早期融合(early-fusion)は両方の人口統計グループについて正解値(ground truth)と密接に一致し,各モダリティ固有の特徴を統合することで最も低いMAEを達成することが示された。対照的に、後期融合(late-fusion)は高度に一般化された平均スコアとより高いMAEをもたらす。これらの結果は,人口統計学的バイアスが存在する場合でも,後期融合と比べて早期融合が正確かつ公正な応用に大きな可能性を持つことを強調している。将来の研究では、代替の融合戦略を探求し、公平性を改善するためにモダリティ関連の公平性制約を組み込むことが考えられる。コードと追加の洞察については、https://github.com/Swati17293/Multimodal-AI-Based-Recruitment-FairCVdb を参照してください。

Despite the large body of work on fairness-aware learning for individual modalities like tabular data, images, and text, less work has been done on multimodal data, which fuses various modalities for a comprehensive analysis. In this work, we investigate the fairness and bias implications of multimodal fusion techniques in the context of multimodal AI-based recruitment systems using the FairCVdb dataset. Our results show that early-fusion closely matches the ground truth for both demographics, achieving the lowest MAEs by integrating each modality's unique characteristics. In contrast, late-fusion leads to highly generalized mean scores and higher MAEs. Our findings emphasise the significant potential of early-fusion for accurate and fair applications, even in the presence of demographic biases, compared to late-fusion. Future research could explore alternative fusion strategies and incorporate modality-related fairness constraints to improve fairness. For code and additional insights, visit: https://github.com/Swati17293/Multimodal-AI-Based-Recruitment-FairCVdb

翻訳日:2024-08-05 01:45:45 公開日:2024-06-17
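
The structural difference between the early-fusion and late-fusion strategies compared in the FairCVdb abstract above can be sketched as follows. This is only an illustration under stated assumptions: the linear scorers, the weights, and the feature values are hypothetical, nothing here is taken from the authors' repository, and the toy numbers say nothing about which strategy is more accurate or fairer.

```python
def mae(preds, targets):
    """Mean absolute error, the metric named in the abstract."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def linear_score(features, weights):
    return sum(f * w for f, w in zip(features, weights))


def early_fusion(modalities, weights):
    # Concatenate the features of all modalities first, then score once.
    fused = [f for feats in modalities for f in feats]
    return linear_score(fused, weights)


def late_fusion(modalities, per_modality_weights):
    # Score each modality on its own, then average the per-modality scores.
    scores = [linear_score(f, w) for f, w in zip(modalities, per_modality_weights)]
    return sum(scores) / len(scores)


# Hypothetical candidate: tabular, image, and text features (made-up values).
candidate = [[0.5, 1.0], [0.2], [0.4, 0.6]]
early = early_fusion(candidate, [0.2, 0.1, 0.5, 0.3, 0.1])
late = late_fusion(candidate, [[0.2, 0.1], [0.5], [0.3, 0.1]])
print(early, late, mae([early], [0.5]), mae([late], [0.5]))
```

In early fusion a single model sees every feature jointly; in late fusion the averaging step pulls scores toward a common mean, which is one intuition for the "highly generalized mean scores" the abstract reports.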

# AI拡張検索による排出量増加の推定

Estimating the Increase in Emissions caused by AI-augmented Search ( http://arxiv.org/abs/2407.16894v1 )

ライセンス: Link先を確認

Wim Vanderbauwhede,

(参考訳) 従来の検索クエリに対するAI生成の回答は、エネルギー消費を劇的に増加させる。我々の推計では、エネルギー需要は60～70倍に増加する。これは、従来の検索のエネルギー消費量に関する更新された推定値と、176BパラメータのモデルであるBLOOMおよび同程度の複雑さを持つOpenAIのChatGPTへのクエリのエネルギー需要に関する最近の研究に基づいている。

AI-generated answers to conventional search queries dramatically increase energy consumption. By our estimates, energy demand increases by 60-70 times. This is based on an updated estimate of energy consumption for conventional search and recent work on the energy demand of queries to the BLOOM model, a 176B parameter model, and OpenAI's ChatGPT, which is of similar complexity.

翻訳日:2024-08-05 01:45:45 公開日:2024-06-17

# フェアネス研究における(不公正な)規範:メタ分析

(Unfair) Norms in Fairness Research: A Meta-Analysis ( http://arxiv.org/abs/2407.16895v1 )

ライセンス: Link先を確認

Jennifer Chien, A. Stevie Bergman, Kevin R. McKee, Nenad Tomasev, Vinodkumar Prabhakaran, Rida Qadri, Nahema Marchal, William Isaac,

(参考訳) アルゴリズムフェアネスは人工知能(AI)研究において重要な関心事となっている。しかし、公正なAIシステムの開発は客観的なプロセスではない。公正は本質的に主観的な概念であり、研究や開発に関わる人々の価値、経験、アイデンティティによって形作られる。現在のフェアネス研究に埋め込まれている規範と価値をよりよく理解するために、AIの公平性と倫理に関する2つの主要なカンファレンスであるAIESとFAccTのアルゴリズムフェアネス論文についてメタ分析を行い、2018年から2022年までの139本の論文を最終サンプルとした。調査の結果、2つの懸念すべき傾向が明らかになった。第1に、米国中心の視点がフェアネス研究全体において支配的であり、第2に、フェアネス研究は人間のアイデンティティの二値的な分類(例えば、"Black/White"、"male/female")に広く依存している。これらの発見は、現在の研究がアイデンティティと生きた経験の複雑さをしばしば見落とし、最終的にアルゴリズムのバイアスと公正性を定義する際に、さまざまなグローバルな文脈を表現できていないことを強調している。我々は、これらの研究設計上の選択の限界について議論し、AIシステムにおける公平性に対するより包括的で代表的なアプローチを促進するための推奨を提供し、人間のアイデンティティと価値に対する繊細でグローバルな理解を受け入れるパラダイムシフトを促す。

Algorithmic fairness has emerged as a critical concern in artificial intelligence (AI) research. However, the development of fair AI systems is not an objective process. Fairness is an inherently subjective concept, shaped by the values, experiences, and identities of those involved in research and development. To better understand the norms and values embedded in current fairness research, we conduct a meta-analysis of algorithmic fairness papers from two leading conferences on AI fairness and ethics, AIES and FAccT, covering a final sample of 139 papers over the period from 2018 to 2022. Our investigation reveals two concerning trends: first, a US-centric perspective dominates throughout fairness research; and second, fairness studies exhibit a widespread reliance on binary codifications of human identity (e.g., "Black/White", "male/female"). These findings highlight how current research often overlooks the complexities of identity and lived experiences, ultimately failing to represent diverse global contexts when defining algorithmic bias and fairness. We discuss the limitations of these research design choices and offer recommendations for fostering more inclusive and representative approaches to fairness in AI systems, urging a paradigm shift that embraces nuanced, global understandings of human identity and values.

翻訳日:2024-08-05 01:45:45 公開日:2024-06-17

# 照明スペクトルの最適化による物体の色の見えの制御

Controlling the color appearance of objects by optimizing the illumination spectrum ( http://arxiv.org/abs/2407.09511v1 )

ライセンス: Link先を確認

Mariko Yamaguchi, Masaru Tsuchida, Takahiro Matsumoto, Tetsuro Tokunaga, Takayoshi Mochizuki,

(参考訳) 我々は、自然に白く見えるようにして、特定のターゲット色を変更する革新的な照明システムを開発した。私たちのシステムは、照明のスペクトルパワー分布(SPD)を正確に制御し、メタメリズムのユニークな現象を活用することで、今まで見たことのない方法で、ユニークな色のバリエーションを実現します。本システムでは, 所定の物質に対する照明の最適SPDを計算してメタメリズムを誘導し, 様々なLED色を用いて照明を合成する。我々は2024年のパリファッションウィークでシステムの実装を実演した。モデルがステージに上がると、彼らのドレスは魅惑的な変化を起こす。私たちのシステムでは、ドレスの色を変えて、印象的な色から別の色への変化を示しています。

We have developed an innovative lighting system that changes specific target colors while keeping the lights appearing naturally white. By precisely controlling the spectral power distribution (SPD) of illumination and harnessing the unique phenomenon of metamerism, our system achieves unique color variations in ways you've never seen before. Our system calculates the optimal SPDs of illumination for given materials to intensively induce metamerism, and then synthesizes the illumination using various colors of LEDs. We successfully demonstrated the system's implementation at Paris Fashion Week 2024. As models step onto the stage, their dresses initiate a captivating transformation. Our system alters the colors of the dresses, showcasing an impressive transition from one stunning color to another.

翻訳日:2024-07-22 13:28:38 公開日:2024-06-17
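
The metamerism principle behind the lighting abstract above can be illustrated with a toy calculation: two illuminants that look identical on a white surface can still render a colored material differently. Everything below is an assumption made for illustration (a hypothetical three-channel sensor over four wavelength bands, made-up SPDs and reflectances); the actual system optimizes real LED spectra, which this sketch does not attempt.

```python
def sensor_response(sensor, illuminant, reflectance):
    """Per-channel response: sum over bands of sensitivity * light * reflectance."""
    return [
        sum(s * e * r for s, e, r in zip(row, illuminant, reflectance))
        for row in sensor
    ]


# Hypothetical 3-channel sensor over 4 wavelength bands (rows = channels).
SENSOR = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]

# Two made-up illuminant SPDs. Their difference (1, -1, 1, -1) lies in the
# null space of SENSOR, so the sensor cannot tell the lights apart directly.
ILLUM_A = [2, 2, 2, 2]
ILLUM_B = [3, 1, 3, 1]

WHITE = [1, 1, 1, 1]      # reflects every band equally
FABRIC = [1, 0, 0.5, 0]   # hypothetical dyed material

print(sensor_response(SENSOR, ILLUM_A, WHITE))   # same response as under ILLUM_B
print(sensor_response(SENSOR, ILLUM_B, WHITE))
print(sensor_response(SENSOR, ILLUM_A, FABRIC))  # differs from the line below
print(sensor_response(SENSOR, ILLUM_B, FABRIC))
```

Both lights appear equally "white", yet the fabric's response changes between them, which is the effect the abstract describes exploiting on stage.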

# AIコパイロットの設計と評価 -- 小売コパイロットテンプレートのケーススタディ

Design and evaluation of AI copilots -- case studies of retail copilot templates ( http://arxiv.org/abs/2407.09512v1 )

ライセンス: Link先を確認

Michal Furmakiewicz, Chang Liu, Angus Taylor, Ilya Venger,

(参考訳) AIコパイロットを成功させるには、体系的なアプローチが必要だ。本稿は2つのセクションに分かれており、それぞれコパイロットの設計と評価を扱う。 Microsoftが小売ドメイン向けのコパイロットテンプレートを開発したケーススタディを用いて、各側面の役割と重要性を説明する。最初のセクションでは、LLM、知識検索とアクションのためのプラグイン、オーケストレーション、システムプロンプト、責任あるAIガードレールなど、コパイロットのアーキテクチャの重要な技術コンポーネントについて検討している。第2節では、ビジネスコンテキストでAIを使用する場合、望ましい結果を促進し、意図しない結果を管理するための原則的な方法として、テストと評価について論じている。我々は、エンドツーエンドのヒューマンAI決定ループフレームワークのレンズを通して、品質と安全性を計測し、改善する方法について議論する。本稿では,コパイロットの解剖学とテストと評価の重要側面を考察することにより,人間中心の効果的なAIアシスタントを構築する上で,優れた設計と評価の実践がいかに重要であるかを示す具体的な証拠を提供する。

Building a successful AI copilot requires a systematic approach. This paper is divided into two sections, covering the design and evaluation of a copilot respectively. A case study of developing copilot templates for the retail domain by Microsoft is used to illustrate the role and importance of each aspect. The first section explores the key technical components of a copilot's architecture, including the LLM, plugins for knowledge retrieval and actions, orchestration, system prompts, and responsible AI guardrails. The second section discusses testing and evaluation as a principled way to promote desired outcomes and manage unintended consequences when using AI in a business context. We discuss how to measure and improve its quality and safety, through the lens of an end-to-end human-AI decision loop framework. By providing insights into the anatomy of a copilot and the critical aspects of testing and evaluation, this paper provides concrete evidence of how good design and evaluation practices are essential for building effective, human-centered AI assistants.

翻訳日:2024-07-22 13:28:38 公開日:2024-06-17

# 共同学習の構築 - 入門プログラミングにおけるソーシャルアノテーションの探求

Building Collaborative Learning: Exploring Social Annotation in Introductory Programming ( http://arxiv.org/abs/2407.10322v1 )

ライセンス: Link先を確認

Francisco Gomes de Oliveira Neto, Felix Dobslaw,

(参考訳) ソフトウェア工学教育の需要の増加は、プログラミングやソフトウェア設計といった実践的な応用を必要とするさまざまなトピックがグループワークやインタラクションによって支えられているため、コースにおける学習上の課題を提起する。ソーシャルアノテーション(Social Annotation、SA)は、学生間の協調学習を強化しうる教育手法である。 SAでは、学生と教師の両方が、Feedback Fruits、Perusall、Diigoなどのプラットフォームを使用して、コース資料を共同で注釈付けし、議論する。このアプローチは、学生が自分の考えや答えを同僚と共有することを奨励し、よりインタラクティブな学習環境を育む。私たちは、ソフトウェア工学の学部生を対象にした入門プログラミングコースで、講義の準備ツールとしてPerusallによるソーシャルアノテーションを実施した経験を共有します。Perusallが112名の学生の試験成績に及ぼす影響を報告する。その結果,有意義なソーシャルアノテーションに取り組んだ学生の81%が,このコースに合格したことがわかった。特に、試験に合格する学生の比率は、完了したPerusallの課題が多いほど上昇する傾向にある。一方、Perusallの議論に参加していない学生のうち、試験に合格したのは56%にとどまった。このコースではPerusallへの参加を必須とはしなかった。しかし,コース評価アンケートのフィードバックから,ほとんどの学生がPerusallをコースのお気に入りの構成要素の一つに挙げており,本科目への関心が高まったことが明らかとなった。

The increasing demand for software engineering education presents learning challenges in courses due to the diverse range of topics that require practical applications, such as programming or software design, all of which are supported by group work and interaction. Social Annotation (SA) is an approach to teaching that can enhance collaborative learning among students. In SA, both students and teachers utilize platforms like Feedback Fruits, Perusall, and Diigo to collaboratively annotate and discuss course materials. This approach encourages students to share their thoughts and answers with their peers, fostering a more interactive learning environment. We share our experience of implementing social annotation via Perusall as a preparatory tool for lectures in an introductory programming course aimed at undergraduate students in Software Engineering. We report the impact of Perusall on the examination results of 112 students. Our results show that 81% of students engaged in meaningful social annotation successfully passed the course. Notably, the proportion of students passing the exam tends to rise as they complete more Perusall assignments. In contrast, only 56% of students who did not participate in Perusall discussions managed to pass the exam. We did not enforce mandatory Perusall participation in the course. Yet, the feedback from our course evaluation questionnaire reveals that most students ranked Perusall among their favorite components of the course and that their interest in the subject has increased.

翻訳日:2024-07-22 12:49:16 公開日:2024-06-17

# AIとワイヤレス技術の融合について:3GPP標準化の進展

On the Combination of AI and Wireless Technologies: 3GPP Standardization Progress ( http://arxiv.org/abs/2407.10984v1 )

ライセンス: Link先を確認

Chen Sun, Tao Cui, Wenqi Zhang, Yingshuang Bai, Shuo Wang, Haojin Li,

(参考訳) 人工知能(AI)と無線通信技術の組み合わせは、2030年に向けた主要な技術トレンドの1つになっている。これには、AIを使用して無線伝送の効率を改善することと、無線ネットワークによってAIのデプロイメントをサポートすることが含まれる。本稿では,第3世代パートナーシッププロジェクト(3GPP)標準開発の最新動向を紹介する。無線ネットワークによるAIモデルの分散転送と,ビームマネジメント(BM)のためのAIに焦点を当てて最新の研究を紹介するとともに,学術的な成果を取り入れるために,既存の標準をどのように修正すべきかを解説する。

Combining Artificial Intelligence (AI) and wireless communication technologies has become one of the major technology trends towards 2030. This includes using AI to improve the efficiency of wireless transmission and supporting AI deployment with wireless networks. In this article, the latest progress of the Third Generation Partnership Project (3GPP) standards development is introduced. Concentrating on AI model distributed transfer and AI for Beam Management (BM) with wireless networks, we introduce the latest studies and explain how the existing standards should be modified to incorporate the results from academia.

翻訳日:2024-07-22 12:49:16 公開日:2024-06-17

# GameVibe:マルチモーダル感情ゲームコーパス

GameVibe: A Multimodal Affective Game Corpus ( http://arxiv.org/abs/2407.12787v1 )

ライセンス: Link先を確認

Matthew Barthet, Maria Kaselimi, Kosmas Pinitas, Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis,

(参考訳) オンラインビデオとストリーミングのプラットフォームが成長を続ける中、情緒的コンピューティングの研究は、複数のモダリティを含むより複雑な研究へとシフトしてきた。しかし、高品質なオーディオ視覚刺激を持つデータセットがまだ不足している。本稿では,ゲーム内行動観察や視聴者エンゲージメントのための3人称感情ラベルを含む,マルチモーダル・オーディオ視覚刺激からなる新しい感情コーパスであるGameVibeを提案する。コーパスは、30のゲームにまたがる様々な公開ゲームプレイセッションのビデオで構成されており、高品質な刺激と優れたオーディオ視覚とゲームプレイの多様性を確実にするために特に注目されている。さらに、アノテータ間の合意の観点から、アノテータの信頼性について分析する。

As online video and streaming platforms continue to grow, affective computing research has undergone a shift towards more complex studies involving multiple modalities. However, there is still a lack of readily available datasets with high-quality audiovisual stimuli. In this paper, we present GameVibe, a novel affect corpus which consists of multimodal audiovisual stimuli, including in-game behavioural observations and third-person affect labels for viewer engagement. The corpus consists of videos from a diverse set of publicly available gameplay sessions across 30 games, with particular attention to ensure high-quality stimuli with good audiovisual and gameplay diversity. Furthermore, we present an analysis on the reliability of the annotators in terms of inter-annotator agreement.

翻訳日:2024-07-22 08:57:39 公開日:2024-06-17

# SS-ADA: セマンティックセグメンテーションのための半監督型アクティブドメイン適応フレームワーク

SS-ADA: A Semi-Supervised Active Domain Adaptation Framework for Semantic Segmentation ( http://arxiv.org/abs/2407.12788v1 )

ライセンス: Link先を確認

Weihao Yan, Yeqiang Qian, Yueyuan Li, Tao Li, Chunxiang Wang, Ming Yang,

(参考訳) セマンティックセグメンテーションは知的車両において重要な役割を担い、環境に関するピクセルレベルのセマンティック情報を提供する。しかし、セマンティックセグメンテーションモデルを新しい運転シナリオに適用する場合、ラベル付けは高価で時間を要する。コストを削減するため、大量のラベルなし画像を活用する半教師付きセマンティックセグメンテーション法が提案されている。それにもかかわらず、それらの性能は、実用的なアプリケーションに必要な精度(通常は教師あり学習によって達成される)にはまだ及ばない。重要な欠点は、通常、アノテーション対象のラベルなし画像をランダムに選択し、モデルトレーニングにおけるサンプルの価値の評価を無視する点である。本稿では,画像レベルの獲得戦略を用いたセマンティックセグメンテーションのための新しい半教師付きアクティブドメイン適応(SS-ADA)フレームワークを提案する。 SS-ADAは、アクティブラーニングを半教師付きセマンティックセグメンテーションに統合し、ターゲットドメインからの限られたラベル付きデータで教師あり学習の精度を達成する。さらに,IoUに基づくクラス重み付け戦略を設計し,アクティブラーニングからのアノテーションを用いてクラス不均衡問題を緩和する。合成から実(synthetic-to-real)および実から実(real-to-real)のドメイン適応設定について広範な実験を行った。その結果,本手法の有効性が示された。 SS-ADAは、リアルタイムセグメンテーションモデルを使用する場合、ターゲットのラベル付きデータのわずか25%で、教師あり学習の精度に到達、あるいはそれを上回ることができる。 SS-ADAのコードは https://github.com/ywher/SS-ADA で公開されている。

Semantic segmentation plays an important role in intelligent vehicles, providing pixel-level semantic information about the environment. However, labeling is expensive and time-consuming when a semantic segmentation model is applied to new driving scenarios. To reduce the costs, semi-supervised semantic segmentation methods have been proposed to leverage large quantities of unlabeled images. Despite this, their performance still falls short of the accuracy required for practical applications, which is typically achieved by supervised learning. A significant shortcoming is that they typically select unlabeled images for annotation randomly, neglecting the assessment of sample value for model training. In this paper, we propose a novel semi-supervised active domain adaptation (SS-ADA) framework for semantic segmentation that employs an image-level acquisition strategy. SS-ADA integrates active learning into semi-supervised semantic segmentation to achieve the accuracy of supervised learning with a limited amount of labeled data from the target domain. Additionally, we design an IoU-based class weighting strategy to alleviate the class imbalance problem using annotations from active learning. We conducted extensive experiments on synthetic-to-real and real-to-real domain adaptation settings. The results demonstrate the effectiveness of our method. SS-ADA can achieve or even surpass the accuracy of its supervised learning counterpart with only 25% of the target labeled data when using a real-time segmentation model. The code for SS-ADA is available at https://github.com/ywher/SS-ADA.

翻訳日:2024-07-22 08:57:39 公開日:2024-06-17
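
The SS-ADA abstract above mentions an IoU-based class weighting strategy without giving its formula. One plausible form, shown here purely as an assumption, gives larger loss weights to classes with lower IoU and normalizes the weights to a mean of 1. The function name `iou_class_weights` and the `1 - IoU` rule are hypothetical and may differ from what the SS-ADA repository implements.

```python
def iou_class_weights(per_class_iou, eps=1e-6):
    """Hypothetical weighting: poorly segmented classes get larger loss weights."""
    # Raw weight grows as the class IoU shrinks; eps avoids an all-zero vector.
    raw = [1.0 - iou + eps for iou in per_class_iou]
    mean = sum(raw) / len(raw)
    # Normalize so the average weight is 1 and the overall loss scale is kept.
    return [r / mean for r in raw]


# Made-up per-class IoUs, e.g. measured on the actively annotated images.
weights = iou_class_weights([0.9, 0.5, 0.2])
print(weights)  # weights increase as IoU decreases
```

Such weights would typically multiply the per-class terms of a cross-entropy loss, so rare or hard classes contribute more to each update.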

# 目に見えないトポロジへの一般化:生物学的神経活動の制御に向けて

Generalisation to unseen topologies: Towards control of biological neural network activity ( http://arxiv.org/abs/2407.12789v1 )

ライセンス: Link先を確認

Laurens Engwegen, Daan Brinks, Wendelin Böhmer,

(参考訳) 新しいイメージングおよび神経刺激技術は、生体神経ネットワークの活動に対するクローズドループ制御の進歩への扉を開く。これにより、活動伝播の研究や、病理的な挙動の診断と治療への応用が可能になる。エッジを観測できないネットワークを通じた活動伝播の部分観測可能な特性と、神経系の動的な性質のため、適応的で一般化可能な制御が必要である。本稿では,この一般化問題を解析するために,異なるトポロジを持つニューロンネットワークを手続き的に生成する環境を提案する。さらに、提示された部分観測可能な環境下での深層RLエージェントの一般化性能を評価するために、既存のトランスフォーマーベースアーキテクチャを調整した。エージェントは、限られた数のトレーニングネットワークから未知のテストネットワークへと制御を一般化する能力を示す。

Novel imaging and neurostimulation techniques open doors for advancements in closed-loop control of activity in biological neural networks. This would allow for applications in the investigation of activity propagation, and for diagnosis and treatment of pathological behaviour. Due to the partially observable characteristics of activity propagation, through networks in which edges can not be observed, and the dynamic nature of neuronal systems, there is a need for adaptive, generalisable control. In this paper, we introduce an environment that procedurally generates neuronal networks with different topologies to investigate this generalisation problem. Additionally, an existing transformer-based architecture is adjusted to evaluate the generalisation performance of a deep RL agent in the presented partially observable environment. The agent demonstrates the capability to generalise control from a limited number of training networks to unseen test networks.

翻訳日:2024-07-22 08:57:39 公開日:2024-06-17

# 参照設計による制約に基づくモデリング

Constraint based Modeling according to Reference Design ( http://arxiv.org/abs/2407.00064v1 )

ライセンス: Link先を確認

Erik Heiland, Peter Hillmann, Andreas Karcher,

(参考訳) ベストプラクティスという形での参照モデルは、再利用のための設計としての知識を確保するための重要な要素である。一般的なモデリングアプローチでは、レポジトリだけでなく、サポート方法で参照モデルを埋め込むメカニズムを提供していません。そのため、この専門知識から利益を得ることはほとんど不可能である。問題は、参照モデルは、ソリューションの開発に役立てられるほど公式に記述されていないことである。その結果、課題はプロセスと、参照モデルによって支援された専用ソリューションの設計において、ユーザがどのようにサポートできるかである。本稿では,セマンティック技術を用いた参照モデルの形式記述のための汎用的アプローチとその応用について述べる。我々のモデリングアシスタントは、参照ビルディングブロックに基づく異なる手法を用いたソリューションモデルの構築を可能にする。この環境は、適合性のための参照モデルに対して、開発した設計のその後の検証を可能にする。したがって、我々の参照モデリングアシスタントは相互依存を強調している。これらの手法の適用は、要件の形式化と、最終的に成熟度モデルにおける品質保証に寄与する。システム設計の文脈で複数の参照モデルを使用することが可能である。この手法は産業分野で評価され、異なるモデリングランドスケープに統合することができる。

Reference models in the form of best practices are an essential element for securing knowledge as design for reuse. Popular modeling approaches do not offer mechanisms to embed reference models in a supporting way, let alone a repository of them. Therefore, it is hardly possible to profit from this expertise. The problem is that the reference models are not described formally enough to be helpful in developing solutions. Consequently, the challenge concerns the process: how a user can be supported in designing dedicated solutions with the assistance of reference models. In this paper, we present a generic approach for the formal description of reference models using semantic technologies and their application. Our modeling assistant allows the construction of solution models using different techniques based on reference building blocks. This environment enables the subsequent verification of the developed designs against the reference models for conformity. Therefore, our reference modeling assistant highlights the interdependency. The application of these techniques contributes to the formalization of requirements and finally to quality assurance in the context of maturity models. It is possible to use multiple reference models in the context of system-of-systems designs. The approach is evaluated in an industrial setting and can be integrated into different modeling landscapes.

翻訳日:2024-07-07 13:43:41 公開日:2024-06-17

# シンボリック回帰のための大規模言語モデルに基づく物理学部学生のための個人学習ツール

A Personalised Learning Tool for Physics Undergraduate Students Built On a Large Language Model for Symbolic Regression ( http://arxiv.org/abs/2407.00065v1 )

ライセンス: Link先を確認

Yufan Zhu, Zi-Yu Khoo, Jonathan Sze Choong Low, Stephane Bressan,

(参考訳) インターリーブド・プラクティスは、学部課程の学生の記憶と問題解決能力を高める。本稿では,大規模言語モデル(LLM)上に構築されたパーソナライズされた学習ツールを紹介する。このツールは、学部の物理学コースから交互に出題された問題を含む宿題に学生が取り組む際に、即時かつ個別の対応を提供できる。本ツールは次元解析手法を活用し,複雑な現象に対する学生の定性的思考と問題解決能力を高める。提案手法は,記号回帰のためのLLMと次元解析をプロンプトエンジニアリングによって組み合わせ,物理変数間の関係を理解するためのユニークな視点を学生に提供する。このことは、物理学と数学的原理のより広く柔軟な理解を促進し、特定の文脈における確立された方程式の解釈と適用に依存する従来の学部の物理学教育を補完する。 ファインマン物理学講義の方程式を用いて、パーソナライズされた学習ツールをテストする。本ツールは,ほとんどの方程式について物理変数間の関係を正しく識別でき,学部の物理学生向けの補完的な個別学習ツールとしての価値を裏付けている。

Interleaved practice enhances the memory and problem-solving ability of students in undergraduate courses. We introduce a personalized learning tool built on a Large Language Model (LLM) that can provide immediate and personalized attention to students as they complete homework containing problems interleaved from undergraduate physics courses. Our tool leverages the dimensional analysis method, enhancing students' qualitative thinking and problem-solving skills for complex phenomena. Our approach combines LLMs for symbolic regression with dimensional analysis via prompt engineering and offers students a unique perspective to comprehend relationships between physics variables. This fosters a broader and more versatile understanding of physics and mathematical principles and complements a conventional undergraduate physics education that relies on interpreting and applying established equations within specific contexts. We test our personalized learning tool on the equations from Feynman's lectures on physics. Our tool can correctly identify relationships between physics variables for most equations, underscoring its value as a complementary personalized learning tool for undergraduate physics students.

翻訳日:2024-07-07 13:43:41 公開日:2024-06-17
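
The dimensional analysis step that the physics learning tool above relies on can be shown with the textbook pendulum example: find exponents so that length^a * gravity^b * mass^c has the dimension of time. This sketch covers only the classical linear-algebra step; the LLM-driven symbolic regression and the prompts used by the authors are not reproduced, and `solve_exponents` is a hypothetical helper name.

```python
from fractions import Fraction


def solve_exponents(variable_dims, target_dims):
    """Solve for exponents x_j with sum_j x_j * variable_dims[j][i] == target_dims[i].

    Assumes a square, non-singular system (as many variables as base dimensions).
    """
    n = len(target_dims)
    # Augmented matrix: one row per base dimension, one column per variable.
    rows = [
        [Fraction(variable_dims[j][i]) for j in range(n)] + [Fraction(target_dims[i])]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_val = rows[col][col]
        rows[col] = [v / pivot_val for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


# Dimensions as exponents of (mass, length, time).
LENGTH = (0, 1, 0)
GRAVITY = (0, 1, -2)   # acceleration: length / time^2
MASS = (1, 0, 0)
PERIOD = (0, 0, 1)

a, b, c = solve_exponents([LENGTH, GRAVITY, MASS], PERIOD)
print(a, b, c)  # 1/2 -1/2 0, i.e. period is proportional to sqrt(length / gravity)
```

The result recovers the familiar scaling of the pendulum period and shows that the mass drops out; the dimensionless prefactor (2π for small oscillations) cannot be obtained from dimensions alone.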

# 圧縮してから提供:わずかなオーバーヘッドで数千のLoRAアダプタを提供する

Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead ( http://arxiv.org/abs/2407.00066v1 )

ライセンス: Link先を確認

Rickard Brüel-Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj, Leshem Choshen, Kristjan Greenewald, Mikhail Yurochkin, Justin Solomon,

(参考訳) 低ランクアダプタ(LoRA)による大規模言語モデル(LLM)のファインチューニングは一般的な手法となり、LoRA更新のみが異なる同一LLMのコピーが多数生じることが多い。このパラダイムは、それぞれ異なるLoRAを伴うクエリにリアルタイムで応答するシステムに課題をもたらす。先行研究はそのようなシステムの設計を最適化しているが、数千のLoRAをGPUメモリに格納することは不可能なため、LoRAの継続的なロードとオフロードが依然として必要である。この問題を軽減するため,LoRAアダプタを提供する際の圧縮の有効性について検討する。 SVDによるアダプタの個別圧縮を検討するとともに,LoRA固有のスケーリング行列と組み合わせた共有基底へLoRAを共同圧縮する方法を提案する。最大500個のLoRAを用いた実験では、圧縮されたLoRAは性能を保ちつつ、1000個を超えるLoRAを扱う現実的なサービングシナリオにおいて大幅なスループット向上をもたらし、単一のLoRAを提供する場合のスループットの75%を維持することが示された。

Fine-tuning large language models (LLMs) with low-rank adapters (LoRAs) has become common practice, often yielding numerous copies of the same LLM differing only in their LoRA updates. This paradigm presents challenges for systems that serve real-time responses to queries that each involve a different LoRA. Prior works optimize the design of such systems but still require continuous loading and offloading of LoRAs, as it is infeasible to store thousands of LoRAs in GPU memory. To mitigate this issue, we investigate the efficacy of compression when serving LoRA adapters. We consider compressing adapters individually via SVD and propose a method for joint compression of LoRAs into a shared basis paired with LoRA-specific scaling matrices. Our experiments with up to 500 LoRAs demonstrate that compressed LoRAs preserve performance while offering major throughput gains in realistic serving scenarios with over a thousand LoRAs, maintaining 75% of the throughput of serving a single LoRA.

翻訳日:2024-07-07 13:43:41 公開日:2024-06-17
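
To see why a shared basis can relieve GPU memory pressure in the LoRA serving setting above, it helps to count parameters. The sketch below assumes each adapter update is approximated as U Σ_i Vᵀ, with U and V shared across adapters and a small dense Σ_i per adapter; that factorization, the function names, and all sizes are assumptions for illustration, not figures from the paper.

```python
def individual_lora_params(num_adapters, d_in, d_out, rank):
    """Each adapter stores its own A (rank x d_in) and B (d_out x rank)."""
    return num_adapters * rank * (d_in + d_out)


def joint_compressed_params(num_adapters, d_in, d_out, shared_rank):
    """Assumed layout: shared U (d_out x R) and V (d_in x R), plus one R x R
    scaling matrix per adapter."""
    shared_basis = shared_rank * (d_in + d_out)
    scaling = num_adapters * shared_rank * shared_rank
    return shared_basis + scaling


# Hypothetical sizes for a single weight matrix.
separate = individual_lora_params(num_adapters=500, d_in=4096, d_out=4096, rank=16)
joint = joint_compressed_params(num_adapters=500, d_in=4096, d_out=4096, shared_rank=64)
print(separate, joint, separate / joint)
```

Under these made-up sizes the per-adapter cost shrinks from rank * (d_in + d_out) values to R * R values, so many more adapters can stay resident at once; whether accuracy holds at a given shared rank is an empirical question the abstract addresses.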

# パーセプトロン協調フィルタリング

Perceptron Collaborative Filtering ( http://arxiv.org/abs/2407.00067v1 )

ライセンス: Link先を確認

Arya Chakraborty,

(参考訳) 多変量ロジスティック回帰分類器は、他の多くのユーザの好みや嗜好情報を収集することで、ユーザの興味を自動予測する手法である協調フィルタリングを実装するための優れた方法であるが、ニューラルネットワークを使っても同様の結果を得ることができる。推薦システムは情報フィルタリングシステムのサブクラスであり、特定のユーザにとって最も関連性の高い項目の提案を提供する。パーセプトロン(Perceptron)またはニューラルネットワーク(Neural Network)は、バックプロパゲーションと勾配降下を用いて複雑なデータセットに適合させるために設計された機械学習モデルである。高度な最適化手法と組み合わせると、このモデルは古典的なロジスティック分類器の優れた代替となりうる。最適化には、特徴スケーリング、平均正規化、正則化、ハイパーパラメータチューニング、および通常の勾配降下の代わりに確率的/ミニバッチ勾配降下を用いることが含まれる。このユースケースでは、レコメンデータシステムでパーセプトロンを使用してパラメータ、すなわち多数のユーザからのデータを適合させ、それを用いて特定のユーザの嗜好や関心を予測する。

While multivariate logistic regression classifiers are a great way of implementing collaborative filtering - a method of making automatic predictions about the interests of a user by collecting preferences or taste information from many other users, we can also achieve similar results using neural networks. A recommender system is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user. A perceptron or a neural network is a machine learning model designed for fitting complex datasets using backpropagation and gradient descent. When coupled with advanced optimization techniques, the model may prove to be a great substitute for classical logistic classifiers. The optimizations include feature scaling, mean normalization, regularization, hyperparameter tuning and using stochastic/mini-batch gradient descent instead of regular gradient descent. In this use case, we will use the perceptron in the recommender system to fit the parameters i.e., the data from a multitude of users and use it to predict the preference/interest of a particular user.

翻訳日:2024-07-07 13:34:23 公開日:2024-06-17

# 最適な低深度量子信号処理位相推定

Optimal Low-Depth Quantum Signal-Processing Phase Estimation ( http://arxiv.org/abs/2407.01583v1 )

ライセンス: Link先を確認

Yulong Dong, Jonathan A. Gross, Murphy Yuezhen Niu,

(参考訳) 絡み合いやコヒーレント増幅のような量子効果は、古典的な限界を超えた量子パラメータ推定の精度を大幅に向上させるのに使うことができる。しかし、デコヒーレンスや時間依存誤差といった課題はハイゼンベルク限界の増幅を妨げている。本稿では,これらの課題に対して頑健であり,Cramér-Rao境界によって定まる最適性能を実現する量子信号処理位相推定アルゴリズムを提案する。これらのアルゴリズムは、相互依存型位相パラメータをほぼ直交型に分離するために量子信号変換を使用し、一方の時間依存誤差が他方の学習精度を損なわないことを保証している。証明可能な最適性を持つ古典的推定と準最適量子回路設計を組み合わせることで、超伝導2量子ビット実験において、低深さ($<10$)の回路を用いて不要なスワップ角を推定する際に、前例のない標準偏差精度$10^{-4}$ラジアンを達成できる。これは、既存の方法よりも最大で2桁改善されている。理論的,数値的には,時間依存型位相誤差に対するアルゴリズムの最適性を示し,時間依存型パラメータ$\varphi$の分散が,低深さ系における漸近的ハイゼンベルクスケーリングよりも高速にスケールすることを示した。我々の結果は量子フィッシャー情報に対して厳密に検証され、2量子ビットゲート学習において比類のない精度を達成するプロトコルの能力を確認する。

Quantum effects like entanglement and coherent amplification can be used to drastically enhance the accuracy of quantum parameter estimation beyond classical limits. However, challenges such as decoherence and time-dependent errors hinder Heisenberg-limited amplification. We introduce Quantum Signal-Processing Phase Estimation algorithms that are robust against these challenges and achieve optimal performance as dictated by the Cramér-Rao bound. These algorithms use quantum signal transformation to decouple interdependent phase parameters into largely orthogonal ones, ensuring that time-dependent errors in one do not compromise the accuracy of learning the other. Combining provably optimal classical estimation with near-optimal quantum circuit design, our approach achieves an unprecedented standard deviation accuracy of $10^{-4}$ radians for estimating unwanted swap angles in superconducting two-qubit experiments, using low-depth ($<10$) circuits. This represents up to two orders of magnitude improvement over existing methods. Theoretically and numerically, we demonstrate the optimality of our algorithm against time-dependent phase errors, observing that the variance of the time-sensitive parameter $\varphi$ scales faster than the asymptotic Heisenberg scaling in the small-depth regime. Our results are rigorously validated against the quantum Fisher information, confirming our protocol's ability to achieve unmatched precision for two-qubit gate learning.

翻訳日:2024-07-07 13:24:39 公開日:2024-06-17

# Twin-Merging: モデルマージにおけるモジュールエキスパートの動的統合

Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging ( http://arxiv.org/abs/2406.15479v1 )

ライセンス: Link先を確認

Zhenyi Lu, Chenghao Fan, Wei Wei, Xiaoye Qu, Dangyang Chen, Yu Cheng,

(参考訳) 大規模言語モデルの時代において、モデルマージは、余分なトレーニングなしで複数のタスク固有のモデルを単一のマルチタスクモデルに結合する、有望な方法である。しかし、2つの課題が残る。 (a)異なるモデル間の干渉と (b)テスト中の異種データ。従来のモデルマージ手法は、これらの問題により微調整されたモデルに比べて大きな性能差を示すことが多い。さらに、画一的な(one-size-fits-all)モデルでは、さまざまなテストデータに対する柔軟性が欠如し、パフォーマンスが低下する。共有されたタスク固有の知識と排他的なタスク固有の知識の両方がマージ性能には不可欠であるが、排他的な知識を直接マージすることは、全体的なパフォーマンスを妨げることを示す。そこで本研究では,(1)知識を共有コンポーネントと排他コンポーネントにモジュール化し,圧縮によって冗長性を低減して効率を高める,(2)入力に基づいて共有知識とタスク固有の知識を動的にマージする,という2つの主要な段階を包含するTwin-Mergingを提案する。このアプローチは、マージされたモデルと微調整されたモデルのパフォーマンスギャップを狭め、異種データへの適応性を向上させる。識別的タスクと生成的タスクの両方を対象とした12個のデータセットでの大規模な実験により,識別的タスクの絶対正規化スコアが平均28.34%向上し,生成的タスクでは微調整された上限を超える結果が得られた。 (我々の実装は https://github.com/LZY-the-boys/Twin-Mergin で公開されている。)

In the era of large language models, model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training. However, two challenges remain: (a) interference between different models and (b) heterogeneous data during testing. Traditional model merging methods often show significant performance gaps compared to fine-tuned models due to these issues. Additionally, a one-size-fits-all model lacks flexibility for diverse test data, leading to performance degradation. We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance. In view of this, we propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input. This approach narrows the performance gap between merged and fine-tuned models and improves adaptability to heterogeneous data. Extensive experiments on $12$ datasets for both discriminative and generative tasks demonstrate the effectiveness of our method, showing an average improvement of $28.34\%$ in absolute normalized score for discriminative tasks and even surpassing the fine-tuned upper bound on the generative tasks. (Our implementation is available in https://github.com/LZY-the-boys/Twin-Mergin.)

翻訳日:2024-07-01 06:51:29 公開日:2024-06-17

# 巨人の肩の上で:動的ロジット融合による容易な弱から強への特化

On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion ( http://arxiv.org/abs/2406.15480v1 )

ライセンス: Link先を確認

Chenghao Fan, Zhenyi Lu, Wei Wei, Jie Tian, Xiaoye Qu, Dangyang Chen, Yu Cheng,

(参考訳) タスク固有のアプリケーションのための大規模言語モデルの効率的な微調整は必須であるが、これらのモデルの膨大なパラメータは、そのトレーニングをますます困難にしている。効果的な方法に関する多くの提案にもかかわらず、更新時の勾配計算にはかなりのメモリオーバーヘッドが残っている。一連のタスク固有の小さなモデルを微調整し、その知識を追加のトレーニングなしでもっと大きなモデルに直接転送できるだろうか? 本稿では,ロジット算術を用いた弱から強への特殊化について検討し,この問題への直接的な回答を容易にする。既存の弱から強への手法では、静的な知識伝達比と1つの小さなモデルを用いて複雑な知識を伝達することが多く、最適以下の性能をもたらす。これらの制限を克服するため、我々は、それぞれ異なるタスクに特化した一連のタスク固有の小さなモデルで動作する動的ロジット融合アプローチを提案する。この方法は、各復号ステップでこれらのモデル間の重みを適応的に割り当て、Kullback-Leibler分散制約最適化問題を通して重みを学習する。我々は、シングルタスクとマルチタスクの両方の設定において、様々なベンチマークで広範な実験を行い、主要な結果を得た。本手法は、7Bモデルから13Bモデルに専門知識を移すことにより、13Bモデルの完全な微調整と比較して、シングルタスクシナリオでは96.4%、マルチタスクシナリオでは86.3%の性能ギャップを埋める。特に、未知のタスクでは優れた性能を達成する。さらに,本手法は,単一タスクに対する文脈内学習とマルチタスクシナリオに対するタスク算術とをシームレスに統合できることを実証する。実装は https://github.com/Facico/Dynamic-Logit-Fusion で公開している。

Efficient fine-tuning of large language models for task-specific applications is imperative, yet the vast number of parameters in these models makes their training increasingly challenging. Despite numerous proposals for effective methods, a substantial memory overhead remains for gradient computations during updates. Can we fine-tune a series of task-specific small models and transfer their knowledge directly to a much larger model without additional training? In this paper, we explore weak-to-strong specialization using logit arithmetic, facilitating a direct answer to this question. Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge, which leads to suboptimal performance. To surmount these limitations, we propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task. This method adaptively allocates weights among these models at each decoding step, learning the weights through Kullback-Leibler divergence constrained optimization problems. We conduct extensive experiments across various benchmarks in both single-task and multi-task settings, achieving leading results. By transferring expertise from the 7B model to the 13B model, our method closes the performance gap by 96.4% in single-task scenarios and by 86.3% in multi-task scenarios compared to full fine-tuning of the 13B model. Notably, we achieve superior performance on unseen tasks. Moreover, we further demonstrate that our method can effortlessly integrate in-context learning for single tasks and task arithmetic for multi-task scenarios. (Our implementation is available at https://github.com/Facico/Dynamic-Logit-Fusion.)
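
The logit-arithmetic idea can be sketched for a single decoding step. The sketch assumes the common formulation in which each fine-tuned small model contributes its shift relative to the untuned small model. The fusion weights are fixed by hand here, whereas the abstract describes learning them at every step through a KL-constrained optimization, which is not reproduced. All names and numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(large_base, small_base, small_experts, weights):
    """Add each expert's weighted shift (expert - small_base) to the large model's logits."""
    fused = list(large_base)
    for expert, w in zip(small_experts, weights):
        for k in range(len(fused)):
            fused[k] += w * (expert[k] - small_base[k])
    return fused


# Hypothetical next-token logits over a 3-token vocabulary at one decoding step.
large_base = [2.0, 1.0, 0.0]          # large model, not fine-tuned
small_base = [1.0, 1.0, 0.0]          # small model, not fine-tuned
small_experts = [
    [0.0, 2.0, 0.0],                  # small model fine-tuned on task A
    [1.0, 1.0, 1.0],                  # small model fine-tuned on task B
]
weights = [1.0, 0.5]                  # assumption: fixed by hand, not learned

fused = fuse_logits(large_base, small_base, small_experts, weights)
probs = softmax(fused)
```

With these numbers the fused logits favor the second token, which is the direction the task A expert pushes. A dynamic scheme would recompute `weights` at every decoding step.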

翻訳日:2024-07-01 06:51:29 公開日:2024-06-17

# CSRT:コードスイッチング・レッドチーミングデータセットを用いたLLMの評価と解析

CSRT: Evaluation and Analysis of LLMs using Code-Switching Red-Teaming Dataset ( http://arxiv.org/abs/2406.15481v1 )

ライセンス: Link先を確認

Haneul Yoo, Yongjin Yang, Hwaran Lee,

(参考訳) 大規模言語モデル(LLM)の最近の研究は、言語モデリングにおける従来の課題を超えて、その多言語能力と安全性に光を当てている。それでも、現在のベンチマークはこれらを包括的に評価できず、手動のアノテーションに過度に依存している。本稿では,LLMの多言語理解と安全性を同時にテストする,単純かつ効果的なレッドチーミング手法であるコードスイッチング・レッドチーミング(CSRT)を提案する。 CSRTデータセットは、最大10言語を結合した315のコードスイッチングクエリからなり、望ましくない動作を広範囲に引き出す。 10種類の最先端LLMによる広範囲な実験を通じて、CSRTは既存の多言語レッドチーミング技術よりも優れた性能を示し、既存の英語の手法よりも46.7%多くの攻撃を達成している。 CSRTデータセットに対する有害な応答を,スケーリング法則,安全でない行動カテゴリー,最適データ生成のための入力条件を含む様々な観点から,16Kサンプルを用いたアブレーション研究により分析した。さらに、単言語データを用いてコードスイッチング攻撃プロンプトを生成することにより、CSRTの拡張性を検証する。

Recent studies of large language models (LLMs) have shed light on their multilingual ability and safety, beyond conventional tasks in language modeling. Still, current benchmarks cannot evaluate these properties comprehensively and depend excessively on manual annotations. In this paper, we introduce code-switching red-teaming (CSRT), a simple yet effective red-teaming technique that simultaneously tests multilingual understanding and safety of LLMs. We release the CSRT dataset, which comprises 315 code-switching queries combining up to 10 languages and eliciting a wide range of undesirable behaviors. Through extensive experiments with ten state-of-the-art LLMs, we demonstrate that CSRT significantly outperforms existing multilingual red-teaming techniques, achieving 46.7% more attacks than existing methods in English. In ablation studies with 16K samples, we analyze the harmful responses to the CSRT dataset from various aspects, including but not limited to scaling laws, unsafe behavior categories, and input conditions for optimal data generation. Additionally, we validate the extensibility of CSRT by generating code-switching attack prompts with monolingual data.

翻訳日:2024-07-01 06:51:29 公開日:2024-06-17

# 学術的誠実性のためのブロックチェーン:Blockchain Academic Credential Interoperability Protocol(BACIP)の開発

Blockchain for Academic Integrity: Developing the Blockchain Academic Credential Interoperability Protocol (BACIP) ( http://arxiv.org/abs/2406.15482v1 )

ライセンス: Link先を確認

Juan A. Berrios Moya,

(参考訳) 本研究は,世界規模の学術的資格証明のセキュリティ,プライバシ,相互運用性を著しく向上するために設計された,Blockchain Academic Credential Interoperability Protocol(BACIP)を紹介する。 BACIPは、デュアルブロックチェーンアーキテクチャ、スマートコントラクト、ゼロ知識証明を統合し、不正を低減し、世界中の学生やプロフェッショナルのモビリティと機会を改善することを目的とした、スケーラブルで透明なフレームワークを提供する。研究手法は、関連する文献の厳密なレビューと高度な技術コンポーネントの体系的な統合を含む混合メソッドのアプローチを採用する。これには、普遍的に互換性のあるシステムの開発を支えている質的および定量的分析の両方が含まれる。予備的な評価は、BACIPが認証効率を高め、改ざんや不正アクセスに対するセキュリティを強化することを示唆している。理論的な枠組みと実践的な実装はしっかりとした基盤を築き上げてきたが、実際の有効性は実運用環境で実証的な検証を待つ。今後の研究は、プロトタイプのデプロイ、堅牢なバリデーションポリシの確立、正確なテストパラメータの定義に注力する予定である。このクリティカルフェーズは、BACIPの運用上の堅牢性とその国際教育標準への準拠を徹底的に評価するために欠かせない。この研究は、学術的資格の管理と保護のための堅牢なモデルを提案し、ブロックチェーン技術を使用した認証検証のさらなるイノベーションのための強力な基盤を築くことで、学術分野に大きく貢献する。

This research introduces the Blockchain Academic Credential Interoperability Protocol (BACIP), designed to significantly enhance the security, privacy, and interoperability of verifying academic credentials globally, addressing the widespread issue of academic fraud. BACIP integrates dual blockchain architecture, smart contracts, and zero-knowledge proofs to offer a scalable and transparent framework aimed at reducing fraud and improving the mobility and opportunities for students and professionals worldwide. The research methodology adopts a mixed-methods approach, involving a rigorous review of pertinent literature and systematic integration of advanced technological components. This includes both qualitative and quantitative analyses that underpin the development of a universally compatible system. Preliminary evaluations suggest that BACIP could enhance verification efficiency and bolster security against tampering and unauthorized access. While the theoretical framework and practical implementations have laid a solid foundation, the protocol's real-world efficacy awaits empirical validation in a production environment. Future research will focus on deploying a prototype, establishing robust validation policies, and defining precise testing parameters. This critical phase is indispensable for a thorough assessment of BACIP's operational robustness and its compliance with international educational standards. This work contributes significantly to the academic field by proposing a robust model for managing and safeguarding academic credentials, thus laying a strong foundation for further innovation in credential verification using blockchain technology.

翻訳日:2024-07-01 06:51:29 公開日:2024-06-17

# GenAIによる重複検出

Duplicate Detection with GenAI ( http://arxiv.org/abs/2406.15483v1 )

ライセンス: Link先を確認

Ian Ormesher,

(参考訳) 顧客データは、CRM(Customer Relations Management System)に記録として格納されることが多い。より多くのユーザが手動でそのようなシステムに入力したデータは、データの複製、部分複製、ファジィ複製につながる。これはつまり、顧客や連絡先、アカウントなどにとって、もはや唯一の真実の情報源が存在しないことを意味します。下流のビジネスプロセスは複雑になり、CRMのレコードとターゲットの顧客の間のユニークなマッピングがなければ、トリビュートされます。レコードの検出と非重複化の現在の方法は、Entity Matchingとして知られる従来の自然言語処理技術を使用している。本稿では,大規模言語モデルと生成AIの最近の進歩により,重複したレコードの識別と修復が大幅に向上することを示す。一般的なベンチマークデータセットでは,NLP手法で30%から,提案手法で60%に改善した。

Customer data is often stored as records in Customer Relations Management systems (CRMs). Data that is manually entered into such systems by one or more users over time leads to data replication, partial duplication, or fuzzy duplication. This in turn means that there is no longer a single source of truth for customers, contacts, accounts, etc. Downstream business processes become increasingly complex and contrived without a unique mapping between a record in a CRM and the target customer. Current methods to detect and de-duplicate records use traditional Natural Language Processing techniques known as Entity Matching. In this paper we show how using the latest advancements in Large Language Models and Generative AI can vastly improve the identification and repair of duplicated records. On common benchmark datasets we find an improvement in the accuracy of data de-duplication rates from 30 percent using NLP techniques to almost 60 percent using our proposed method.
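
To make "fuzzy duplication" concrete, here is a small baseline sketch. It deliberately swaps in plain character-level similarity from Python's standard `difflib` module and is not the LLM-based method the abstract proposes. The records, the `normalize` helper, and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Lowercase the record and collapse runs of whitespace."""
    return " ".join(record.lower().split())


def find_fuzzy_duplicates(records, threshold=0.85):
    """Return index pairs whose normalized text similarity reaches the threshold."""
    cleaned = [normalize(r) for r in records]
    pairs = []
    for i, j in combinations(range(len(cleaned)), 2):
        if SequenceMatcher(None, cleaned[i], cleaned[j]).ratio() >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical CRM records entered by different users.
records = [
    "Jon Smith, Acme Ltd",
    "John  Smith, ACME Ltd.",
    "Jane Doe, Globex",
]
duplicates = find_fuzzy_duplicates(records)
```

A string-similarity baseline like this catches spelling variants but misses records that refer to the same customer with different wording.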

翻訳日:2024-07-01 06:51:29 公開日:2024-06-17

# JobFair: 大規模言語モデルにおけるジェンダー採用バイアスのベンチマークフレームワーク

JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models ( http://arxiv.org/abs/2406.15484v1 )

ライセンス: Link先を確認

Ze Wang, Zekun Wu, Xin Guan, Michael Thaler, Adriano Koshiyama, Skylar Lu, Sachin Beepath, Ediz Ertekin Jr., Maria Perez-Ortiz,

(参考訳) 本稿では,大規模言語モデル(LLM)における階層的ジェンダー採用バイアスのベンチマーク手法を提案する。まず、医療、金融、建設産業の実在の匿名化された履歴書データセットを使用したフレームワークを導入する。レベルバイアス、スプレッドバイアス、テイストベースのバイアス、統計バイアスなど、階層レベルの性別採用バイアスを評価する。この枠組みは、他の社会的特性やタスクに容易に一般化できる。第2に、ランクアフター・スコアリング(RAS)、ランクベースインパクト比、並べ替え検定ベースメトリクス、固定効果モデルベースメトリクスなど、反実的アプローチに基づく新しい統計的・計算的採用バイアスメトリクスを提案する。これらの指標は労働経済学、NLP、法律に根ざしており、採用バイアスの全体的評価を可能にしている。第三に、10の最先端のLLMにおける採用バイアスを分析する。 10のLLMのうち6つは、医療と金融において男性に対して有意な偏見を示す。産業効果の回帰は、医療産業が男性に対して最も偏っていることを示している。 GPT-4o と GPT-3.5 が最も偏りのあるモデルであり、3つの業界すべてで有意な偏りを示している。逆に、Gemini-1.5-Pro、Llama3-8b-Instruct、Llama3-70b-Instructは最もバイアスが少ない。 Llama3-8b-InstructとClaude-3-Sonnetを除く全てのLLMの採用バイアスは、履歴書内容のランダムな拡張や削減にかかわらず一貫している。最後に、このフレームワークの採用と実践を容易にするために、ユーザフレンドリーなデモを提供する。

This paper presents a novel framework for benchmarking hierarchical gender hiring bias in Large Language Models (LLMs) for resume scoring, revealing significant issues of reverse bias and overdebiasing. Our contributions are fourfold: First, we introduce a framework using a real, anonymized resume dataset from the Healthcare, Finance, and Construction industries, meticulously used to avoid confounding factors. It evaluates gender hiring biases across hierarchical levels, including Level bias, Spread bias, Taste-based bias, and Statistical bias. This framework can be generalized to other social traits and tasks easily. Second, we propose novel statistical and computational hiring bias metrics based on a counterfactual approach, including Rank After Scoring (RAS), Rank-based Impact Ratio, Permutation Test-Based Metrics, and Fixed Effects Model-based Metrics. These metrics, rooted in labor economics, NLP, and law, enable holistic evaluation of hiring biases. Third, we analyze hiring biases in ten state-of-the-art LLMs. Six out of ten LLMs show significant biases against males in healthcare and finance. An industry-effect regression reveals that the healthcare industry is the most biased against males. GPT-4o and GPT-3.5 are the most biased models, showing significant bias in all three industries. Conversely, Gemini-1.5-Pro, Llama3-8b-Instruct, and Llama3-70b-Instruct are the least biased. The hiring bias of all LLMs, except for Llama3-8b-Instruct and Claude-3-Sonnet, remains consistent regardless of random expansion or reduction of resume content. Finally, we offer a user-friendly demo to facilitate adoption and practical application of the framework.
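
The abstract names permutation-test-based metrics without defining them, so the following is only a generic sketch of the underlying idea. It runs an exact sign-flip permutation test on the scores a model gives the same resumes under two gender labels, following the counterfactual setup. The scores and function name are hypothetical and the paper's actual metrics may differ.

```python
from itertools import product


def paired_permutation_p_value(scores_a, scores_b):
    """Two-sided exact sign-flip test on paired score differences.

    Under the null hypothesis the label does not matter, so each paired
    difference is equally likely to carry either sign.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float round-off
            extreme += 1
    return extreme / total


# Hypothetical scores for six resumes, each scored under two gender labels.
female_version = [7.5, 8.0, 6.5, 7.0, 8.5, 7.5]
male_version = [7.0, 7.5, 6.0, 6.5, 8.0, 7.0]
p_value = paired_permutation_p_value(female_version, male_version)
```

Exact enumeration visits all `2**n` sign patterns, so it is practical only for small samples. Larger studies would sample random sign flips instead.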

翻訳日:2024-07-01 06:51:29 公開日:2024-06-17

# 適応的構造的スパースアテンションを用いたLLM推論の近接ロスレス高速化

Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention ( http://arxiv.org/abs/2406.15486v1 )

ライセンス: Link先を確認

Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang,

(参考訳) 大規模言語モデル(LLM)は、非常に長いコンテキストウィンドウをサポートするようになったが、バニラアテンションの二次的な複雑さにより、TTFT(Time-to-First-Token)レイテンシが非常に長い。この複雑さに対処する既存のアプローチは、追加の事前訓練や微調整を必要とし、しばしばモデルの精度を犠牲にする。本稿では,まず,ほぼロスレスなスパースアテンションのための理論的および実証的な基礎を提示する。ヘッド固有のスパースパターンを実行時に低オーバーヘッドで動的にキャプチャすることが重要であることがわかった。そこで本研究では,適応的で構造化された,ほぼロスレスなスパースアテンションであるSampleAttentionを提案する。観測された顕著なスパースパターンを活用して、SampleAttentionは、ローカルウィンドウパターンをキャプチャするために隣接するトークンの一定割合に注意を向け、2段階のクエリ誘導キー値フィルタリングアプローチを使用して、最小のキー値セットを少ないオーバーヘッドで適応的に選択し、カラムストライプパターンをキャプチャする。総合的な評価によると、SampleAttentionは市販のLLMのバニラアテンションをほぼ精度の低下なしにシームレスに置き換えることができ、また、FlashAttentionと比較してTTFTを最大$2.42\times$短縮できる。

Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additional pretraining or finetuning, and often sacrifice model accuracy. In this paper, we first provide both theoretical and empirical foundations for near-lossless sparse attention. We find that dynamically capturing head-specific sparse patterns at runtime with low overhead is crucial. To address this, we propose SampleAttention, an adaptive, structured, and near-lossless sparse attention. Leveraging observed significant sparse patterns, SampleAttention attends to a fixed percentage of adjacent tokens to capture local window patterns, and employs a two-stage query-guided key-value filtering approach, which adaptively selects a minimum set of key-values with low overhead, to capture column stripe patterns. Comprehensive evaluations show that SampleAttention can seamlessly replace vanilla attention in off-the-shelf LLMs with nearly no accuracy loss, and reduces TTFT by up to $2.42\times$ compared with FlashAttention.
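
The two structured patterns named in the abstract, a local window and column stripes, can be pictured with a toy mask. This sketch only builds the mask. The stripe columns are passed in by hand, whereas the abstract describes choosing them with a two-stage query-guided filter, which is not reproduced here. The function name and parameters are hypothetical.

```python
def sparse_mask(seq_len, window_ratio, stripe_columns):
    """Causal mask: entry [i][j] is True where query i may attend to key j.

    A key is kept if it lies inside the local window that ends at the query,
    or if its column is one of the selected stripe columns.
    """
    window = max(1, int(seq_len * window_ratio))
    stripes = set(stripe_columns)
    return [
        [j <= i and (i - j < window or j in stripes) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Toy setting: 8 tokens, a window covering 25% of the sequence, one stripe on token 0.
mask = sparse_mask(seq_len=8, window_ratio=0.25, stripe_columns=[0])
kept = sum(sum(row) for row in mask)   # attention entries actually computed
dense = 8 * (8 + 1) // 2               # entries in full causal attention
```

Comparing `kept` with `dense` shows the saving the sparsity pattern buys. The saving grows with sequence length because the window and stripes add a roughly fixed number of keys per query.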

翻訳日:2024-07-01 06:51:29 公開日:2024-06-17

# LLMベースのAIチャットボットに関する完全な調査

A Complete Survey on LLM-based AI Chatbots ( http://arxiv.org/abs/2406.16937v1 )

ライセンス: Link先を確認

Sumit Kumar Dam, Choong Seon Hong, Yu Qiao, Chaoning Zhang,

(参考訳) 過去数十年間、データの増加を目撃し、データ収集、学習ベースのAI技術の基礎を築いた。 AIチャットボットと呼ばれる会話エージェントは、大きな言語モデル(LLM)をトレーニングし、ユーザのプロンプトに応じて新しいコンテンツ(知識)を生成するために、そのようなデータに大きく依存している。 OpenAIのChatGPTの出現により、LLMベースのチャットボットはAIコミュニティに新たな標準を設定した。本稿では,様々な分野におけるLLMベースのチャットボットの進化と展開に関する完全な調査を行う。まず,LLMの進化に続き,現在使用されているLLMベースのチャットボットと開発段階にあるチャットボットについて概説する。 AIチャットボットを新しい知識を生み出すツールとして認識し、さまざまな産業にまたがる様々な応用を探求する。次に、LLMのトレーニングに使用されるデータと、生成された知識の誤用が、いくつかの問題を引き起こす可能性があることを考慮し、オープンな課題について議論する。最後に、多数のアプリケーションにおいて、その効率性と信頼性を高めるための将来の展望について検討する。重要なマイルストーンと、LLMベースのチャットボットの現在の状況に対処することによって、私たちの調査では、次世代の会話型AIをどのように作り直すのかを反映して、読者にこの領域を深く掘り下げるよう求めています。

The past few decades have witnessed an upsurge in data, forming the foundation for data-hungry, learning-based AI technology. Conversational agents, often referred to as AI chatbots, rely heavily on such data to train large language models (LLMs) and generate new content (knowledge) in response to user prompts. With the advent of OpenAI's ChatGPT, LLM-based chatbots have set new standards in the AI community. This paper presents a complete survey of the evolution and deployment of LLM-based chatbots in various sectors. We first summarize the development of foundational chatbots, followed by the evolution of LLMs, and then provide an overview of LLM-based chatbots currently in use and those in the development phase. Recognizing AI chatbots as tools for generating new knowledge, we explore their diverse applications across various industries. We then discuss the open challenges, considering how the data used to train the LLMs and the misuse of the generated knowledge can cause several issues. Finally, we explore the future outlook to augment their efficiency and reliability in numerous applications. By addressing key milestones and the present-day context of LLM-based chatbots, our survey invites readers to delve deeper into this realm, reflecting on how their next generation will reshape conversational AI.

翻訳日:2024-07-01 06:31:46 公開日:2024-06-17

# ホークス過程からのノイズ分離による学習済み生理学的事象のモデル化

Unmixing Noise from Hawkes Process to Model Learned Physiological Events ( http://arxiv.org/abs/2406.16938v1 )

ライセンス: Link先を確認

Guillaume Staerman, Virginie Loison, Thomas Moreau,

(参考訳) 生理的信号分析は、しばしば生物学的力学を理解するのに不可欠な事象を特定することを含む。従来の手法は手作りの手続きや教師あり学習に依存しており、専門家の依存、堅牢性の欠如、広範囲なラベル付きデータの必要性といった課題を提示している。畳み込み辞書学習(CDL)のようなデータ駆動型手法は代替手段を提供するが、突発的な検出をもたらす傾向がある。この研究は、事象における時間構造の共同学習と急激な検出の除去に対処する新しいアプローチであるUNHaP(Unmix Noise from Hawkes Processes)を導入している。マークされたホークスプロセスを利用して、UNHaPは興味のある出来事と刺激的な出来事を区別する。イベント検出出力を構造化イベントと非構造化イベントの混合として扱うことで、UNHaPは効率的にこれらのプロセスを解き、パラメータを推定する。このアプローチは、誤検出率を最小限に抑えながら、事象の分布の理解を著しく向上させる。

Physiological signal analysis often involves identifying events crucial to understanding biological dynamics. Traditional methods rely on handcrafted procedures or supervised learning, presenting challenges such as expert dependence, lack of robustness, and the need for extensive labeled data. Data-driven methods like Convolutional Dictionary Learning (CDL) offer an alternative but tend to produce spurious detections. This work introduces UNHaP (Unmix Noise from Hawkes Processes), a novel approach addressing the joint learning of temporal structures in events and the removal of spurious detections. Leveraging marked Hawkes processes, UNHaP distinguishes between events of interest and spurious ones. By treating the event detection output as a mixture of structured and unstructured events, UNHaP efficiently unmixes these processes and estimates their parameters. This approach significantly enhances the understanding of event distributions while minimizing false detection rates.
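
As background for the mixture view described here, the sketch below evaluates a Hawkes conditional intensity and the share of the total event rate attributable to noise. It makes strong simplifying assumptions that are not taken from the abstract: an unmarked process, an exponential kernel, a constant-rate noise process, and a history containing only structured events. It is not the UNHaP estimation procedure, and all names and numbers are hypothetical.

```python
import math


def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity with an exponential kernel.

    lambda(t) = mu + sum over past events t_i < t of
                alpha * beta * exp(-beta * (t - t_i))
    """
    return mu + sum(
        alpha * beta * math.exp(-beta * (t - ti)) for ti in history if ti < t
    )


def noise_share(t, history, mu, alpha, beta, noise_rate):
    """Fraction of the total event rate at time t that comes from the noise process."""
    structured = hawkes_intensity(t, history, mu, alpha, beta)
    return noise_rate / (noise_rate + structured)


# Hypothetical timestamps (seconds) of structured events.
history = [1.0, 2.0]
params = dict(mu=0.2, alpha=0.8, beta=3.0, noise_rate=0.5)
soon_after = noise_share(2.1, history, **params)
long_after = noise_share(6.0, history, **params)
```

A detection shortly after a structured event is less likely to be noise than one arriving long after the last event. That temporal structure is what an unmixing method can exploit.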

翻訳日:2024-07-01 06:31:46 公開日:2024-06-17

# ChatBug: チャットテンプレートによって誘発されるアライメント済みLLMの共通脆弱性

ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates ( http://arxiv.org/abs/2406.12935v1 )

ライセンス: Link先を確認

Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran,

(参考訳) 大規模言語モデル(LLM)は、ユーザからの指示に従って会話を行うことが期待されている。 LLMの命令フォロー機能を強化する技術は、通常、事前に定義されたチャットテンプレートに従って構造化されたデータを使って微調整する。チャットテンプレートはLLM性能の最適化に有効であることが示されているが,LLMの安全アライメントに対する影響は十分に理解されていない。本稿では,チャットテンプレートがLLMの安全アライメントにどのように影響するかを検討する。チャットテンプレートによって導入された共通の脆弱性であるChatBugを特定する。 ChatBugを識別するための重要な洞察は、チャットテンプレートは、LLMは従う必要があるがユーザは従う必要のない厳格なフォーマットを提供する、ということである。したがって、悪意のあるユーザは、LLMへのプロンプト時に必ずしもチャットテンプレートに従うとは限らない。悪意のあるユーザは、チャットテンプレートの知識を活用して、LLMの安全アライメントをバイパスするプロンプトを作ることができる。 ChatBugの脆弱性を悪用する2つの攻撃を開発した。悪意のあるユーザが8つのSOTA (State-of-the-art) LLMのChatBug脆弱性を悪用し、これらのモデルから意図しない応答を効果的に引き出すことができることを示す。さらに,ChatBugは既存のジェイルブレイク攻撃によって悪用され,攻撃成功率を高めることができることを示す。 ChatBugに対する潜在的な対策について検討する。その結果,敵対的訓練はChatBug脆弱性を効果的に軽減する一方で,被害者モデルでは性能劣化が顕著であることがわかった。これらの結果は、安全アライメントと有用性の間のトレードオフを浮き彫りにしている。このトレードオフのバランスをとるための新しい命令チューニング手法の開発は、今後の研究にとってオープンで重要な方向である。

Large language models (LLMs) are expected to follow instructions from users and engage in conversations. Techniques to enhance LLMs' instruction-following capabilities typically fine-tune them using data structured according to a predefined chat template. Although chat templates are shown to be effective in optimizing LLM performance, their impact on the safety alignment of LLMs is less well understood, even though it is crucial for deploying LLMs safely at scale. In this paper, we investigate how chat templates affect the safety alignment of LLMs. We identify a common vulnerability, named ChatBug, that is introduced by chat templates. Our key insight to identify ChatBug is that the chat templates provide a rigid format that needs to be followed by LLMs, but not by users. Hence, a malicious user may not necessarily follow the chat template when prompting LLMs. Instead, malicious users could leverage their knowledge of the chat template and accordingly craft their prompts to bypass safety alignments of LLMs. We develop two attacks to exploit the ChatBug vulnerability. We demonstrate that a malicious user can exploit the ChatBug vulnerability of eight state-of-the-art (SOTA) LLMs and effectively elicit unintended responses from these models. Moreover, we show that ChatBug can be exploited by existing jailbreak attacks to enhance their attack success rates. We investigate potential countermeasures to ChatBug. Our results show that while adversarial training effectively mitigates the ChatBug vulnerability, the victim model incurs significant performance degradation. These results highlight the trade-off between safety alignment and helpfulness. Developing new methods for instruction tuning to balance this trade-off is an open and critical direction for future research.

翻訳日:2024-06-22 00:37:55 公開日:2024-06-17

# 書き起こす前にセルフトレインする

Self-Train Before You Transcribe ( http://arxiv.org/abs/2406.12937v1 )

ライセンス: Link先を確認

Robert Flynn, Anton Ragni,

(参考訳) トレーニング領域とテスト領域の間にミスマッチがある場合、現在の音声認識システムは、大幅な性能劣化を示す。ノイズの多い教員養成のような自己学習手法は、この問題に対処し、そのようなドメインシフトの下でモデルの適応を可能にする。しかし、セルフトレーニングは通常、非ラップのターゲットドメインデータの収集を必要とする。実践的でない環境では,テスト時間適応手法として,テストセットにおける録音におけるノイズの多い学生教師の訓練を行う利点について検討する。言語モデリングにおける動的評価手法と同様に、ドメイン適応の手法として、発話境界と関数間の情報の伝達を可能にする。ドメイン内のデータセットとドメイン外のデータセットは、32.2%までの大きな相対的なゲインを示す実験に使用される。興味深いことに,本手法は,個別適応データを利用した通常の自己学習装置よりも大きな利得を示した。

When there is a mismatch between the training and test domains, current speech recognition systems show significant performance degradation. Self-training methods, such as noisy student teacher training, can help address this and enable the adaptation of models under such domain shifts. However, self-training typically requires a collection of unlabelled target domain data. For settings where this is not practical, we investigate the benefit of performing noisy student teacher training on recordings in the test set as a test-time adaptation approach. Similarly to the dynamic evaluation approach in language modelling, this enables the transfer of information across utterance boundaries and functions as a method of domain adaptation. A range of in-domain and out-of-domain datasets are used for experiments demonstrating large relative gains of up to 32.2%. Interestingly, our method showed larger gains than the typical self-training setup that utilises separate adaptation data.

翻訳日:2024-06-22 00:37:55 公開日:2024-06-17

# ISにおけるセキュリティと社会工学 --その概要と現状

Security in IS and social engineering -- an overview and state of the art ( http://arxiv.org/abs/2406.12938v1 )

ライセンス: Link先を確認

Florence Sèdes,

(参考訳) 情報技術に関連する大きな変革は、組織とそのアクターのビジネスプロセスをサポートする情報システム(IS)に影響を与える。センシティブで大規模で異質なデータを含む複雑な環境での展開は、法的、社会的、経済的影響を伴うリスクを生み出す。移行と開放性のこの文脈は、これらのISのセキュリティを組織の懸念の中心にしている。すべてのプロセスのデジタル化とIoTデバイス(モノのインターネット)のオープンは、サイバー犯罪という新たな犯罪の出現を促している。このジェネリックな用語は、多くの悪意ある行為をカバーしており、その大半は現在、社会的エンジニアリング戦略を使って実行されている。このような攻撃の悪意は、ユーザーがサイバー攻撃のファシリテーターになるという事実から、サイバーセキュリティの「弱リンク」と認識される点に起因しており、展開方針が不十分であるため、上流のステップについて考える必要がある。

Major transformations related to information technologies affect Information Systems (IS) that support the business processes of organizations and their actors. Deployment in a complex environment involving sensitive, massive and heterogeneous data generates risks with legal, social and financial impacts. This context of transition and openness makes the security of these IS central to the concerns of organizations. The digitization of all processes and the opening to IoT (Internet of Things) devices have fostered the emergence of a new form of crime, i.e. cybercrime. This generic term covers a number of malicious acts, the majority of which are now perpetrated using social engineering strategies, a phenomenon enabling a combined exploitation of "human" vulnerabilities and digital tools. The maliciousness of such attacks lies in the fact that they turn users into facilitators of cyber-attacks, to the point of being perceived as the "weak link" of cybersecurity. As deployment policies prove insufficient, it is necessary to think about upstream steps: knowing how to anticipate, identify weak signals and outliers, detect early and react quickly to computer crime are therefore priority issues requiring a prevention and cooperation approach. In this overview, we propose a synthesis of literature and professional practices on this subject.

翻訳日:2024-06-22 00:37:55 公開日:2024-06-17

# 超伝導回路における多体量子相関の測定

Measurement of Many-Body Quantum Correlations in Superconducting Circuits ( http://arxiv.org/abs/2406.12939v1 )

ライセンス: Link先を確認

Kamal Sharma, Wade DeGottardi,

(参考訳) 近年の超伝導回路技術の進歩により、大型でカスタマイズ可能な回路の製造が日常的に行われている。これにより、量子情報以外の分野、特に量子シミュレーターとしての利用に応用された。この取り組みの重要な課題は、これらの回路によって実現された量子状態の同定である。本稿では,アナログ量子シミュレータにおいて多体相関を読み取ることができるプローブ回路を提案する。多光子状態のために設計された我々の測定方法は、ジョセフソン接合の非線形性を利用して超伝導相作用素の2点相関関数(および潜在的に高次相関関数)を測定する。我々は、量子不純物を持つLCラダーの文脈で、この設計の能力を実証する。提案したプローブは、スクイーズのような本質的に量子相関の測定を可能にし、超伝導回路を用いてアナログ量子シミュレーションの範囲を大きく拡大する可能性がある。

Recent advances in superconducting circuit technology have made the fabrication of large, customizable circuits routine. This has led to their application to areas beyond quantum information and, in particular, to their use as quantum simulators. A key challenge in this effort has been the identification of the quantum states realized by these circuits. Here, we propose a probe circuit capable of reading out many-body correlations in an analog quantum simulator. Our measurement scheme, designed for many-photon states, exploits the non-linearity of the Josephson junction to measure two-point (and potentially higher-order) correlation functions of the superconducting phase operator. We demonstrate the capabilities of this design in the context of an LC-ladder with a quantum impurity. The proposed probe allows for the measurement of inherently quantum correlations, such as squeezing, and has the potential to significantly expand the scope of analog quantum simulations using superconducting circuits.

翻訳日:2024-06-22 00:37:55 公開日:2024-06-17

# PEPit: Pythonにおける一階最適化手法のコンピュータ支援最悪ケース解析

PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python ( http://arxiv.org/abs/2201.04040v2 )

ライセンス: Link先を確認

Baptiste Goujaud, Céline Moucer, François Glineur, Julien Hendrickx, Adrien Taylor, Aymeric Dieuleveut,

(参考訳) PEPitはPythonパッケージで、勾配、プロジェクション、近さ、線形最適化オラクルを含む多くの一階最適化メソッドの最悪のケース分析へのアクセスを、近似やブレグマン変種とともに単純化することを目的としている。簡単に言えば、PEPitはコンピュータ支援による一階最適化手法の最悪のケース解析を可能にするパッケージである。鍵となる考え方は、最悪のケース分析(しばしば性能推定問題(PEP)と呼ばれる)を半確定プログラム(SDP)として実行し、数値的に解くことである。そのため、パッケージのユーザは、実装するのとほとんど同じくらいに、一階のメソッドを書くことしか要求されない。その後、パッケージはSDPモデリング部品の処理を行い、最悪のケース解析は標準解法を介して数値的に行われる。

PEPit is a Python package aiming at simplifying the access to worst-case analyses of a large family of first-order optimization methods possibly involving gradient, projection, proximal, or linear optimization oracles, along with their approximate, or Bregman variants. In short, PEPit is a package enabling computer-assisted worst-case analyses of first-order optimization methods. The key underlying idea is to cast the problem of performing a worst-case analysis, often referred to as a performance estimation problem (PEP), as a semidefinite program (SDP) which can be solved numerically. To do that, the package users are only required to write first-order methods nearly as they would have implemented them. The package then takes care of the SDP modeling parts, and the worst-case analysis is performed numerically via a standard solver.
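
As one concrete instance of a performance estimation problem (a standard example from the PEP literature, not taken from this abstract), consider $N$ steps of gradient descent with step size $1/L$ on an $L$-smooth convex function $f$ with minimizer $x_\star$. The symbols $N$, $L$, and $R$ are chosen here for illustration. The worst case over all such functions and starting points can be written as:

```latex
\begin{aligned}
\max_{f,\; x_0, \dots, x_N,\; x_\star} \quad & f(x_N) - f(x_\star) \\
\text{s.t.} \quad & f \text{ is convex and } L\text{-smooth}, \qquad \nabla f(x_\star) = 0, \\
& x_{k+1} = x_k - \tfrac{1}{L} \nabla f(x_k), \qquad k = 0, \dots, N-1, \\
& \lVert x_0 - x_\star \rVert^2 \le R^2 .
\end{aligned}
```

Replacing the infinite-dimensional variable $f$ by interpolation conditions on the iterates, gradients, and function values turns this into a semidefinite program. For this particular setting the known tight value is $f(x_N) - f(x_\star) \le \frac{L R^2}{4N + 2}$.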

Translated: 2024-06-20 05:50:01, Published: 2024-06-17

# Antilinear superoperator, quantum geometric invariance, and antilinear symmetry for higher-dimensional quantum systems ( http://arxiv.org/abs/2202.10989v4 )

License: check the linked page

Lu Wei, Zhian Jia, Dagomir Kaszlikowski, Sheng Tan,

We present a systematic investigation of antilinear superoperators and their applications in studying open quantum systems, focusing in particular on quantum geometric invariance, entanglement distribution, and symmetry. We study several important classes of antilinear superoperators, including antilinear quantum channels, antilinearly unital superoperators, antiunitary superoperators, and generalized $\Theta$-conjugation. Using the Bloch representation, we systematically study quantum geometric transformations in higher-dimensional quantum systems. By choosing different generalized $\Theta$-conjugations, we obtain various metrics for the space of Bloch space-time vectors, including the Euclidean and Minkowskian metrics. Utilizing these geometric structures, we then investigate the entanglement distribution over a multipartite system constrained by quantum geometric invariance. The strong and weak antilinear superoperator symmetries of the open quantum system are also discussed. Additionally, Kramers' degeneracy and conserved quantities are examined in detail.
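As a small illustration of how an antilinear map acts in the Bloch representation, the sketch below treats the textbook single-qubit case only: complex conjugation flips the y component of the Bloch vector, and the spin flip (sigma_y applied on both sides of the conjugated state) inverts the whole vector. The helper names are assumptions made for this example, and the paper's generalized $\Theta$-conjugation is broader than what is shown.

```python
def density(x, y, z):
    """Qubit density matrix (I + x*sx + y*sy + z*sz) / 2 as nested lists."""
    return [[(1 + z) / 2, (x - 1j * y) / 2],
            [(x + 1j * y) / 2, (1 - z) / 2]]


def bloch_vector(rho):
    """Recover (x, y, z) from a 2x2 density matrix."""
    x = 2 * rho[0][1].real
    y = -2 * rho[0][1].imag          # rho[0][1] = (x - i*y) / 2
    z = (rho[0][0] - rho[1][1]).real
    return (x, y, z)


def conjugate(rho):
    """Entry-wise complex conjugation, the simplest antilinear map."""
    return [[entry.conjugate() for entry in row] for row in rho]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


SIGMA_Y = [[0, -1j], [1j, 0]]


def spin_flip(rho):
    """sigma_y * conj(rho) * sigma_y, an antiunitary conjugation of the state."""
    return matmul(matmul(SIGMA_Y, conjugate(rho)), SIGMA_Y)


rho = density(0.3, 0.4, 0.5)
print(bloch_vector(conjugate(rho)))   # y component changes sign
print(bloch_vector(spin_flip(rho)))   # whole vector changes sign
```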

Translated: 2024-06-20 05:43:26, Published: 2024-06-17

# Generation of Maximally Entangled States by Lyapunov Control Based on Entanglement Measure ( http://arxiv.org/abs/2203.00182v3 )

License: check the linked page

Yun-Yan Lee, Daoyi Dong, Ciann-Dong Yang,

Maximally entangled states (MES) are highly valued in quantum information processing. In quantum control, the creation of MES is typically treated as a state transfer problem with a predefined MES as the target. However, this approach is limited by the requirement to predetermine the MES structure. This paper introduces an improved quantum Lyapunov control approach that constructs the Lyapunov function from a quantum entanglement measure instead of the distance between quantum states. This strategy enables the preparation of any MES with a single control scheme, regardless of whether its structure is known beforehand. Because the proposed entanglement control technique targets the entanglement measure as a scalar, it is unaffected by the number of entangled subsystems. Applied first to bipartite pure states, the method demonstrates its capability to generate Bell states and their equivalents. Subsequent applications to bipartite mixed states and multipartite systems illustrate that the technique can produce MES with unspecified structures.
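To illustrate what a scalar entanglement measure looks like for bipartite pure states, the sketch below computes the concurrence of a two-qubit pure state. The abstract does not say which measure the authors use, so the choice of concurrence, and the suggestion that something like 1 - C could play the role of a Lyapunov function, are assumptions made for this example.

```python
import math


def concurrence(psi):
    """Concurrence C = 2|a*d - b*c| of a two-qubit pure state.

    psi = (a, b, c, d) holds the amplitudes in the basis |00>, |01>, |10>, |11>.
    The state is normalised inside the function, so unnormalised input is fine.
    """
    a, b, c, d = psi
    norm_sq = sum(abs(amp) ** 2 for amp in psi)
    return 2 * abs(a * d - b * c) / norm_sq


bell = (1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2))   # (|00> + |11>) / sqrt(2)
product = (1, 0, 0, 0)                              # |00>
other_bell = (0, 1j, -1j, 0)                        # unnormalised, locally equivalent to a Bell state

for state in (bell, product, other_bell):
    print(concurrence(state))
```

Every maximally entangled two-qubit state gives the same value of the measure, which is why a controller driven by the scalar alone does not need to know the target state's structure in advance.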

Translated: 2024-06-20 05:43:26, Published: 2024-06-17

# Efficient Privacy-Preserving Machine Learning with Lightweight Trusted Hardware ( http://arxiv.org/abs/2210.10133v4 )

License: check the linked page

Pengzhi Huang, Thang Hoang, Yueying Li, Elaine Shi, G. Edward Suh,

In this paper, we propose a new secure machine learning inference platform assisted by a small dedicated security processor, which will be easier to protect and deploy than today's TEEs integrated into high-performance processors. Our platform provides three main advantages over the state of the art: (i) We achieve significant performance improvements over state-of-the-art distributed Privacy-Preserving Machine Learning (PPML) protocols with only a small security processor, comparable to a discrete security chip such as the Trusted Platform Module (TPM) or to on-chip security subsystems in SoCs similar to the Apple enclave processor. In the semi-honest setting with WAN/GPU, our scheme is 4X-63X faster than Falcon (PoPETs'21) and AriaNN (PoPETs'22) and 3.8X-12X more communication efficient. We achieve even higher performance improvements in the malicious setting. (ii) Our platform guarantees security with abort against malicious adversaries under an honest-majority assumption. (iii) Our technique is not limited by the size of secure memory in a TEE and can support high-capacity modern neural networks such as ResNet18 and Transformer. While previous work investigated the use of high-performance TEEs in PPML, this work is the first to show that even tiny secure hardware with very limited performance can significantly speed up distributed PPML protocols, provided the protocol is carefully designed for lightweight trusted hardware.

Translated: 2024-06-20 05:43:26, Published: 2024-06-17

# Investigating the Impact of Direct Punishment on the Emergence of Cooperation in Multi-Agent Reinforcement Learning Systems ( http://arxiv.org/abs/2301.08278v3 )

License: check the linked page

Nayana Dasgupta, Mirco Musolesi,

Solving the problem of cooperation is fundamentally important for the creation and maintenance of functional societies. Problems of cooperation are omnipresent within human society, with examples ranging from navigating busy road junctions to negotiating treaties. As the use of AI becomes more pervasive throughout society, the need for socially intelligent agents capable of navigating these complex cooperative dilemmas is becoming increasingly evident. Direct punishment is a ubiquitous social mechanism that has been shown to foster the emergence of cooperation in both humans and non-humans. In the natural world, direct punishment is often strongly coupled with partner selection and reputation and used in conjunction with third-party punishment. The interactions between these mechanisms could potentially enhance the emergence of cooperation within populations. However, no previous work has evaluated the learning dynamics and outcomes emerging from Multi-Agent Reinforcement Learning (MARL) populations that combine these mechanisms. This paper addresses this gap. It presents a comprehensive analysis and evaluation of the behaviors and learning dynamics associated with direct punishment, third-party punishment, partner selection, and reputation. Finally, we discuss the implications of using these mechanisms for the design of cooperative AI systems.

Translated: 2024-06-20 05:43:26, Published: 2024-06-17

# Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL ( http://arxiv.org/abs/2302.05342v3 )

License: check the linked page

Philipp Becker, Sebastian Mossburger, Fabian Otto, Gerhard Neumann,

Learning self-supervised representations using reconstruction or contrastive losses improves performance and sample complexity of image-based and multimodal reinforcement learning (RL). Here, different self-supervised loss functions have distinct advantages and limitations depending on the information density of the underlying sensor modality. Reconstruction provides strong learning signals but is susceptible to distractions and spurious information. Contrastive approaches can ignore those, but they may fail to capture all relevant details and can lead to representation collapse. For multimodal RL, this suggests that different modalities should be treated differently based on the amount of distractions in the signal. We propose Contrastive Reconstructive Aggregated representation Learning (CoRAL), a unified framework that lets us choose the most appropriate self-supervised loss for each sensor modality and allows the representation to better focus on relevant aspects. We evaluate CoRAL's benefits on a wide range of tasks with images containing distractions or occlusions, a new locomotion suite, and a challenging manipulation suite with visually realistic distractions. Our results show that learning a multimodal representation by combining contrastive and reconstruction-based losses can significantly improve performance and solve tasks that are out of reach for more naive representation learning approaches and other recent baselines.

Translated: 2024-06-20 05:43:26, Published: 2024-06-17

# Asymmetric node placement in fiber-based quantum networks ( http://arxiv.org/abs/2305.09635v3 )

License: check the linked page

Guus Avis, Robert Knegjens, Anders S. Sørensen, Stephanie Wehner,

Restrictions imposed by existing infrastructure can make it hard to ensure an even spacing between the nodes of future fiber-based quantum networks. Here we investigate the negative effects of asymmetric node placement by considering separately the placement of midpoint stations required for heralded entanglement generation and of processing-node quantum repeaters in a chain. For midpoint stations, we describe the effect asymmetry has on the time required to perform one entangling attempt, the success probability of such attempts, and the fidelity of the entangled states created. This includes accounting for the effects of chromatic dispersion on photon indistinguishability. For quantum-repeater chains, we numerically investigate how uneven spacing between repeater nodes leads to bottlenecks, thereby increasing both the waiting time and the time states are stored in noisy quantum memory. We find that while the time required to perform one entangling attempt may increase linearly with the midpoint's asymmetry, the success probability and fidelity of heralded entanglement generation and the distribution time and error rate for repeater chains all have vanishing first derivatives with respect to the amount of asymmetry. This suggests resilience of quantum-network performance against small amounts of asymmetry.
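The linear growth of the attempt time with midpoint asymmetry can be seen in a deliberately simple toy model, sketched below. It assumes that one attempt lasts as long as a round trip along the longer fiber arm, and it uses typical textbook values for the fiber light speed and attenuation. These modelling choices, the constants, and the function names are assumptions for illustration and are not taken from the paper.

```python
C_FIBER_KM_PER_S = 2.0e5       # assumed speed of light in fiber (about 2/3 of c)
ATTENUATION_DB_PER_KM = 0.2    # assumed attenuation at telecom wavelengths


def attempt_time(total_km, offset_km):
    """Round trip to the midpoint station along the longer of the two arms."""
    longer_arm = total_km / 2 + abs(offset_km)
    return 2 * longer_arm / C_FIBER_KM_PER_S


def transmittance(length_km):
    """Probability that a photon survives a fiber of the given length."""
    return 10 ** (-ATTENUATION_DB_PER_KM * length_km / 10)


def both_photons_arrive(total_km, offset_km):
    """Probability that the photons from both nodes reach the midpoint station."""
    left = total_km / 2 + offset_km
    right = total_km / 2 - offset_km
    return transmittance(left) * transmittance(right)


for offset in (0, 5, 10):
    print(offset, attempt_time(100, offset), both_photons_arrive(100, offset))
```

In this toy model the attempt time grows linearly with the offset, while the probability that both photons survive does not depend on it at all. The paper's analysis covers effects that are not modelled here, such as chromatic dispersion and memory noise.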

Translated: 2024-06-20 05:33:23, Published: 2024-06-17

# DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation ( http://arxiv.org/abs/2306.02071v2 )

License: check the linked page

Felipe Garrido-Lucero, Benjamin Heymann, Maxime Vono, Patrick Loiseau, Vianney Perchet,

We consider the dataset valuation problem, that is, the problem of quantifying the incremental gain, with respect to some relevant pre-defined utility of a machine learning task, of aggregating an individual dataset with others. The Shapley value is a natural tool for dataset valuation because of its formal axiomatic justification, and it can be combined with Monte Carlo integration to overcome computational tractability challenges. Such generic approximation methods, however, remain expensive in some cases. In this paper, we exploit knowledge about the structure of the dataset valuation problem to devise more efficient Shapley value estimators. We propose a novel approximation, referred to as discrete uniform Shapley, which is expressed as an expectation under a discrete uniform distribution with support of reasonable size. We justify the relevance of the proposed framework via asymptotic and non-asymptotic theoretical guarantees and illustrate its benefits via an extensive set of numerical experiments.
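For context, the generic baseline that the abstract calls expensive can be sketched as follows: exact Shapley values by enumerating all orderings, and a Monte Carlo estimate by sampling random orderings. This is the standard permutation-sampling estimator, not the DU-Shapley proxy itself, and the square-root utility and dataset sizes are hypothetical values chosen only for illustration.

```python
import itertools
import math
import random


def marginal_contributions(order, utility, values):
    """Add each player's marginal gain along one ordering to the running totals."""
    coalition = frozenset()
    for player in order:
        extended = coalition | {player}
        values[player] += utility(extended) - utility(coalition)
        coalition = extended


def exact_shapley(players, utility):
    """Average marginal contribution over all n! orderings (small n only)."""
    values = {p: 0.0 for p in players}
    for order in itertools.permutations(players):
        marginal_contributions(order, utility, values)
    n_orders = math.factorial(len(players))
    return {p: v / n_orders for p, v in values.items()}


def monte_carlo_shapley(players, utility, n_samples, seed=0):
    """Average marginal contribution over randomly sampled orderings."""
    rng = random.Random(seed)
    values = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(n_samples):
        rng.shuffle(order)
        marginal_contributions(order, utility, values)
    return {p: v / n_samples for p, v in values.items()}


# Hypothetical setup: utility has diminishing returns in the pooled sample count.
SIZES = {"A": 100, "B": 400, "C": 900}


def utility(coalition):
    return math.sqrt(sum(SIZES[p] for p in coalition))


exact = exact_shapley(list(SIZES), utility)
approx = monte_carlo_shapley(list(SIZES), utility, n_samples=200)
print(exact)
print(approx)
```

Both estimators satisfy the efficiency axiom by construction, since the marginal gains along any ordering telescope to the utility of the full coalition.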

Translated: 2024-06-20 05:23:38, Published: 2024-06-17

# State-wise Constrained Policy Optimization ( http://arxiv.org/abs/2306.12594v3 )

License: check the linked page

Weiye Zhao, Rui Chen, Yifan Sun, Tianhao Wei, Changliu Liu,

Reinforcement Learning (RL) algorithms have shown tremendous success in simulation environments, but their application to real-world problems faces significant challenges, with safety being a major concern. In particular, enforcing state-wise constraints is essential for many challenging tasks such as autonomous driving and robot manipulation. However, existing safe RL algorithms under the framework of Constrained Markov Decision Process (CMDP) do not consider state-wise constraints. To address this gap, we propose State-wise Constrained Policy Optimization (SCPO), the first general-purpose policy search algorithm for state-wise constrained reinforcement learning. SCPO provides guarantees for state-wise constraint satisfaction in expectation. In particular, we introduce the framework of Maximum Markov Decision Process and prove that the worst-case safety violation is bounded under SCPO. We demonstrate the effectiveness of our approach by training neural network policies on extensive robot locomotion tasks, where the agent must satisfy a variety of state-wise safety constraints. Our results show that SCPO significantly outperforms existing methods and can handle state-wise constraints in high-dimensional robotics tasks.

Translated: 2024-06-20 05:23:38, Published: 2024-06-17

# Balanced Filtering via Disclosure-Controlled Proxies ( http://arxiv.org/abs/2306.15083v3 )

License: check the linked page

Siqi Deng, Emily Diana, Michael Kearns, Aaron Roth,

We study the problem of collecting a cohort or set that is balanced with respect to sensitive groups when group membership is unavailable or prohibited from use at deployment time. Specifically, our deployment-time collection mechanism does not reveal significantly more about the group membership of any individual sample than can be ascertained from base rates alone. To do this, we study a learner that can use a small set of labeled data to train a proxy function that can later be used for this filtering or selection task. We then associate the range of the proxy function with sampling probabilities: given a new example, we classify it using our proxy function and then select it with the probability corresponding to its proxy classification. Importantly, we require that the proxy classification does not reveal significantly more information about the sensitive group membership of any individual example than population base rates alone (i.e., the level of disclosure should be controlled), and we show that such a proxy can be found in a sample- and oracle-efficient manner. Finally, we experimentally evaluate our algorithm and analyze its generalization properties.
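The selection step described above, classify with the proxy and then keep the example with a probability tied to its proxy class, can be sketched as follows. How the sampling probabilities are chosen here (inverse to the proxy-class frequency) is an assumption made for illustration, as are the function names. The sketch does not implement the paper's disclosure control or its proxy training procedure.

```python
import random
from collections import Counter


def selection_probabilities(proxy_labels):
    """Map each proxy class to a keep-probability that equalises expected counts."""
    counts = Counter(proxy_labels)
    rarest = min(counts.values())
    return {label: rarest / count for label, count in counts.items()}


def balanced_filter(examples, proxy, probs, seed=0):
    """Keep each example with the probability attached to its proxy class."""
    rng = random.Random(seed)
    return [x for x in examples if rng.random() < probs[proxy(x)]]


# Hypothetical data: each example is (identifier, proxy class).
examples = [(i, "a") for i in range(80)] + [(i, "b") for i in range(80, 100)]
proxy = lambda example: example[1]

probs = selection_probabilities([proxy(x) for x in examples])
selected = balanced_filter(examples, proxy, probs)
print(probs)
print(Counter(proxy(x) for x in selected))
```

With these probabilities each proxy class contributes the same expected number of examples, 20 in this hypothetical data set.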

Translated: 2024-06-20 05:23:38, Published: 2024-06-17

# Promoting Exploration in Memory-Augmented Adam using Critical Momenta ( http://arxiv.org/abs/2307.09638v2 )

License: check the linked page

Pranshu Malviya, Gonçalo Mordido, Aristide Baratin, Reza Babanezhad Harikandeh, Jerry Huang, Simon Lacoste-Julien, Razvan Pascanu, Sarath Chandar,

Adaptive gradient-based optimizers, notably Adam, have left their mark on the training of large-scale deep learning models, offering fast convergence and robustness to hyperparameter settings. However, they often struggle with generalization, which is attributed to their tendency to converge to sharp minima in the loss landscape. To address this, we propose a new memory-augmented version of Adam that encourages exploration towards flatter minima by incorporating a buffer of critical momentum terms during training. This buffer prompts the optimizer to overshoot beyond narrow minima, promoting exploration. Through comprehensive analysis in simple settings, we illustrate the efficacy of our approach in increasing exploration and bias towards flatter minima. We empirically demonstrate that it can improve model performance for image classification on ImageNet and CIFAR10/100, language modelling on Penn Treebank, and online learning tasks on TinyImageNet and 5-dataset. Our code is available at https://github.com/chandar-lab/CMOptimizer.

Translated: 2024-06-20 05:13:54, Published: 2024-06-17

# Multimodal Pretrained Models for Verifiable Sequential Decision-Making: Planning, Grounding, and Perception ( http://arxiv.org/abs/2308.05295v2 )

License: check the linked page

Yunhao Yang, Cyrus Neary, Ufuk Topcu,

Recently developed pretrained models can encode rich world knowledge expressed in multiple modalities, such as text and images. However, the outputs of these models cannot be integrated into algorithms to solve sequential decision-making tasks. We develop an algorithm that uses the knowledge from pretrained models to construct and verify controllers for sequential decision-making tasks, and to ground these controllers to task environments through visual observations with formal guarantees. In particular, the algorithm queries a pretrained model with a user-provided, text-based task description and uses the model's output to construct an automaton-based controller that encodes the model's task-relevant knowledge. It allows formal verification of whether the knowledge encoded in the controller is consistent with other independently available knowledge, which may include abstract information on the environment or user-provided specifications. Next, the algorithm leverages the vision and language capabilities of pretrained models to link the observations from the task environment to the text-based control logic from the controller (e.g., actions and the conditions that trigger them). We propose a mechanism to provide probabilistic guarantees on whether the controller satisfies the user-provided specifications under perceptual uncertainties. We demonstrate the algorithm's ability to construct, verify, and ground automaton-based controllers through a suite of real-world tasks, including daily life and robot manipulation tasks.

Translated: 2024-06-20 05:13:54, Published: 2024-06-17

# An Autonomous Vision-Based Algorithm for Interplanetary Navigation ( http://arxiv.org/abs/2309.09590v3 )

License: check the linked page

Eleonora Andreis, Paolo Panicucci, Francesco Topputo,

The surge of deep-space probes makes it unsustainable to navigate them with standard radiometric tracking. Self-driving interplanetary satellites represent a solution to this problem. In this work, a full vision-based navigation algorithm is built by combining an orbit determination method with an image processing pipeline suitable for interplanetary transfers of autonomous platforms. To increase the computational efficiency of the algorithm, a non-dimensional extended Kalman filter is selected as the state estimator, fed by the positions of the planets extracted from deep-space images. The estimation accuracy is improved by applying an optimal strategy to select the best pair of planets to track. Moreover, a novel analytical measurement model for deep-space navigation is developed, providing a first-order approximation of the light-aberration and light-time effects. Algorithm performance is tested on a high-fidelity Earth-Mars interplanetary transfer, showing the algorithm's applicability to deep-space navigation.
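The light-aberration effect mentioned above can be illustrated with the classical first-order formula: an observer moving with velocity v sees a source in the direction obtained by normalising l + v/c, where l is the unit vector toward the source. The sketch below applies that textbook approximation. It is not the measurement model derived in the paper, and the function name and the example velocity are assumptions.

```python
import math

C_KM_PER_S = 299792.458   # speed of light in vacuum


def aberrated_direction(line_of_sight, observer_velocity_km_s):
    """First-order apparent direction of a source for a moving observer.

    line_of_sight is a unit vector from the observer toward the source; the
    result is the normalised vector l + v/c, valid to first order in |v|/c.
    """
    beta = [v / C_KM_PER_S for v in observer_velocity_km_s]
    shifted = [l + b for l, b in zip(line_of_sight, beta)]
    norm = math.sqrt(sum(s * s for s in shifted))
    return [s / norm for s in shifted]


# Hypothetical case: source along +x, observer moving along +y at 29.78 km/s,
# which is roughly the orbital speed of the Earth.
apparent = aberrated_direction([1.0, 0.0, 0.0], [0.0, 29.78, 0.0])
angle_rad = math.atan2(apparent[1], apparent[0])
print(apparent, angle_rad)
```

For this velocity the apparent direction shifts by roughly 20 arcseconds toward the direction of motion, which is large compared with the pointing accuracy needed to use planet positions as navigation measurements.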

Translated: 2024-06-20 05:04:09, Published: 2024-06-17

# Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset ( http://arxiv.org/abs/2309.13570v4 )

License: check the linked page

Zixun Huang, Keling Yao, Seth Z. Zhao, Chuanyu Pan, Chenfeng Xu, Kathy Zhuang, Tianjian Xu, Weiyu Feng, Allen Y. Yang,

Robust 6DoF pose estimation with mobile devices is the foundation for applications in robotics, augmented reality, and digital twin localization. In this paper, we extensively investigate the robustness of existing RGBD-based 6DoF pose estimation methods against varying levels of depth sensor noise. We highlight that existing 6DoF pose estimation methods suffer significant performance discrepancies due to depth measurement inaccuracies. In response to this robustness issue, we present a simple and effective transformer-based 6DoF pose estimation approach called DTTDNet, featuring a novel geometric feature filtering module and a Chamfer distance loss for training. Moreover, to advance the field of robust 6DoF pose estimation, we introduce a new dataset, Digital Twin Tracking Dataset Mobile (DTTD-Mobile), tailored for digital twin object tracking with noisy depth data from the mobile RGBD sensor suite of the Apple iPhone 14 Pro. Extensive experiments demonstrate that DTTDNet significantly outperforms state-of-the-art methods, by at least 4.32 and up to 60.74 points in ADD metrics on DTTD-Mobile. More importantly, our approach exhibits superior robustness to varying levels of measurement noise, setting a new benchmark for robustness to measurement noise. Code and dataset are publicly available at: https://github.com/augcog/DTTD2
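The two quantities named in the abstract, the Chamfer distance and the ADD metric, can be sketched in a few lines. The versions below follow the common textbook definitions: a symmetric Chamfer distance with unsquared nearest-neighbour distances, and ADD as the mean distance between corresponding model points under the ground-truth and predicted poses. The exact variants used by DTTDNet are not specified in the abstract, so these definitions and the pose layout are assumptions.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance in both directions."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(points_a, points_b) + one_way(points_b, points_a)


def transform(point, pose):
    """Apply a pose (3x3 rotation as nested lists, translation as a 3-vector)."""
    rotation, translation = pose
    return tuple(
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
        for i in range(3)
    )


def add_metric(model_points, pose_gt, pose_pred):
    """ADD: mean distance between matching model points under two poses."""
    total = sum(
        math.dist(transform(p, pose_gt), transform(p, pose_pred))
        for p in model_points
    )
    return total / len(model_points)


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
model = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
pose_gt = (IDENTITY, (0.0, 0.0, 0.5))
pose_pred = (IDENTITY, (0.0, 0.0, 0.51))   # 1 cm error along the camera axis

print(add_metric(model, pose_gt, pose_pred))
print(chamfer_distance(model, model))
```

Unlike ADD, the Chamfer distance needs no point correspondences, which makes it usable as a training loss when the predicted and reference point sets are not aligned index by index.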

Translated: 2024-06-20 05:04:09, Published: 2024-06-17

# Generative Escher Meshes ( http://arxiv.org/abs/2309.14564v4 )

License: check the linked page

Noam Aigerman, Thibault Groueix,

This paper proposes a fully automatic, text-guided generative method for producing perfectly repeating, periodic, tileable 2D imagery, such as that seen on floors, mosaics, ceramics, and the work of M.C. Escher. In contrast to square texture images that are seamless when tiled, our method generates non-square tilings that consist solely of repeating copies of the same object. It achieves this by optimizing both the geometry and the texture of a 2D mesh, yielding a non-square tile in the shape and appearance of the desired object, with close to no additional background details, that can tile the plane without gaps or overlaps. We enable optimization of the tile's shape through an unconstrained, differentiable parameterization of the space of all valid tileable meshes for given boundary conditions stemming from a symmetry group. Namely, we construct a differentiable family of linear systems derived from a 2D mesh-mapping technique, Orbifold Tutte Embedding, by treating the mesh's Laplacian matrix as differentiable parameters. We prove that the solution space of these linear systems is exactly the set of all valid tiling configurations, thereby providing an end-to-end differentiable representation of the entire space of valid tiles. We render the textured mesh via a differentiable renderer and leverage a pre-trained image diffusion model to induce a loss on the resulting image, updating the mesh's parameters so that its appearance matches the text prompt. We show that our method is able to produce plausible, appealing results, with non-trivial tiles, for a variety of periodic tiling patterns.

Translated: 2024-06-20 05:04:09, Published: 2024-06-17

# Language Models as Zero-Shot Trajectory Generators ( http://arxiv.org/abs/2310.11604v2 )

License: check the linked page

Teyun Kwon, Norman Di Palo, Edward Johns,

Large Language Models (LLMs) have recently shown promise as high-level planners for robots when given access to a selection of low-level skills. However, it is often assumed that LLMs do not possess sufficient knowledge to be used for the low-level trajectories themselves. In this work, we examine this assumption thoroughly and investigate whether an LLM (GPT-4) can directly predict a dense sequence of end-effector poses for manipulation tasks when given access to only object detection and segmentation vision models. We designed a single, task-agnostic prompt, without any in-context examples, motion primitives, or external trajectory optimisers. We then studied how well it performs across 30 real-world language-based tasks, such as "open the bottle cap" and "wipe the plate with the sponge", and investigated which design choices in this prompt are the most important. Our conclusions raise the assumed limit of LLMs for robotics, and we reveal for the first time that LLMs do indeed possess an understanding of low-level robot control sufficient for a range of common tasks, and that they can additionally detect failures and then re-plan trajectories accordingly. Videos, prompts, and code are available at: https://www.robot-learning.uk/language-models-trajectory-generators.

Translated: 2024-06-20 05:04:09, Published: 2024-06-17

# Optimal Excess Risk Bounds for Empirical Risk Minimization on $p$-Norm Linear Regression ( http://arxiv.org/abs/2310.12437v2 )

License: check the linked page

Ayoub El Hanchi, Murat A. Erdogdu,

We study the performance of empirical risk minimization on the $p$-norm linear regression problem for $p \in (1, \infty)$. We show that, in the realizable case, under no moment assumptions, and up to a distribution-dependent constant, $O(d)$ samples are enough to exactly recover the target. Otherwise, for $p \in [2, \infty)$, and under weak moment assumptions on the target and the covariates, we prove a high-probability excess risk bound on the empirical risk minimizer whose leading term matches, up to a constant that depends only on $p$, the asymptotically exact rate. We extend this result to the case $p \in (1, 2)$ under mild assumptions that guarantee the existence of the Hessian of the risk at its minimizer.
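For readers who want the objects in the abstract spelled out, one standard way to write the risk, the empirical risk minimizer, and the excess risk is given below. The notation ($X$, $Y$, $w$, $\hat{w}_n$, $\mathcal{E}$) is assumed for this sketch, and the paper may use a different normalization, for example a factor of $1/p$.

```latex
\[
R(w) = \mathbb{E}\bigl[\,\lvert \langle w, X \rangle - Y \rvert^{p}\,\bigr],
\qquad
\hat{w}_n \in \operatorname*{arg\,min}_{w \in \mathbb{R}^d}
  \frac{1}{n} \sum_{i=1}^{n} \lvert \langle w, X_i \rangle - Y_i \rvert^{p},
\qquad
\mathcal{E}(\hat{w}_n) = R(\hat{w}_n) - \inf_{w \in \mathbb{R}^d} R(w).
\]
```

In this notation the realizable case corresponds to $Y = \langle w^{*}, X \rangle$ for some $w^{*} \in \mathbb{R}^d$, so that exact recovery means $\hat{w}_n = w^{*}$.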

Translated: 2024-06-20 05:04:09 Published: 2024-06-17

# Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images ( http://arxiv.org/abs/2310.13876v3 )

License: see the linked page

Bissmella Bahaduri, Zuheng Ming, Fangchen Feng, Anissa Mokraou

Object detection in Remote Sensing Images (RSI) is a critical task for numerous applications in Earth Observation (EO). Differing from object detection in natural images, object detection in remote sensing images faces challenges of scarcity of annotated data and the presence of small objects represented by only a few pixels. Multi-modal fusion has been determined to enhance the accuracy by fusing data from multiple modalities such as RGB, infrared (IR), lidar, and synthetic aperture radar (SAR). To this end, the fusion of representations at the mid or late stage, produced by parallel subnetworks, is dominant, with the disadvantages of increasing computational complexity in the order of the number of modalities and the creation of additional engineering obstacles. Using the cross-attention mechanism, we propose a novel multi-modal fusion strategy for mapping relationships between different channels at the early stage, enabling the construction of a coherent input by aligning the different modalities. By addressing fusion in the early stage, as opposed to mid or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques. Additionally, we enhance the SWIN transformer by integrating convolution layers into the feed-forward of non-shifting blocks. This augmentation strengthens the model's capacity to merge separated windows through local attention, thereby improving small object detection. Extensive experiments prove the effectiveness of the proposed multimodal fusion module and the architecture, demonstrating their applicability to object detection in multimodal aerial imagery.

Translated: 2024-06-20 04:54:22 Published: 2024-06-17

# Compressed representation of brain genetic transcription ( http://arxiv.org/abs/2310.16113v2 )

License: see the linked page

James K Ruffle, Henry Watkins, Robert J Gray, Harpreet Hyare, Michel Thiebaut de Schotten, Parashkev Nachev

The architecture of the brain is too complex to be intuitively surveyable without the use of compressed representations that project its variation into a compact, navigable space. The task is especially challenging with high-dimensional data, such as gene expression, where the joint complexity of anatomical and transcriptional patterns demands maximum compression. Established practice is to use standard principal component analysis (PCA), whose computational felicity is offset by limited expressivity, especially at great compression ratios. Employing whole-brain, voxel-wise Allen Brain Atlas transcription data, here we systematically compare compressed representations based on the most widely supported linear and non-linear methods (PCA, kernel PCA, non-negative matrix factorization (NMF), t-distributed stochastic neighbour embedding (t-SNE), uniform manifold approximation and projection (UMAP), and deep auto-encoding), quantifying reconstruction fidelity, anatomical coherence, and predictive utility with respect to signalling, microstructural, and metabolic targets. We show that deep auto-encoders yield superior representations across all metrics of performance and target domains, supporting their use as the reference standard for representing transcription patterns in the human brain.
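
The reconstruction-fidelity criterion can be illustrated with the PCA baseline alone. The sketch below is an assumption-laden toy, not the study's pipeline: the three-feature synthetic data, the single latent factor, and the helper names (`center`, `top_component`, `reconstruction_mse`) are all hypothetical, and power iteration stands in for a full PCA solver. It compresses each sample to one number and reports the fraction of variance lost on reconstruction.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means


def top_component(rows, iters=500, seed=0):
    """Leading principal direction of centered rows via power iteration."""
    rng = random.Random(seed)
    d = len(rows[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # Apply the (unnormalized) covariance X^T X to v without forming it.
        scores = [sum(r[j] * v[j] for j in range(d)) for r in rows]
        v = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return v


def reconstruction_mse(rows, v):
    """Mean squared error after projecting centered rows onto direction v."""
    d = len(rows[0])
    err = 0.0
    for r in rows:
        s = sum(r[j] * v[j] for j in range(d))  # one-number compressed code
        err += sum((r[j] - s * v[j]) ** 2 for j in range(d))
    return err / (len(rows) * d)


# Synthetic stand-in for expression data: one latent factor plus small noise.
rng = random.Random(1)
loading = [1.0, -2.0, 0.5]
data = []
for _ in range(200):
    z = rng.gauss(0.0, 1.0)
    data.append([z * w + rng.gauss(0.0, 0.05) for w in loading])

centered, _ = center(data)
v = top_component(centered)
total_var = sum(c * c for r in centered for c in r) / (len(centered) * 3)
mse = reconstruction_mse(centered, v)
print(mse / total_var)  # fraction of variance not captured by one component
```

A non-linear method such as an auto-encoder would be scored with the same ratio, replacing the linear projection and back-projection with its encoder and decoder.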

Translated: 2024-06-20 04:54:22 Published: 2024-06-17

# Polynomial Chaos Surrogate Construction for Random Fields with Parametric Uncertainty ( http://arxiv.org/abs/2311.00553v2 )

License: see the linked page

Joy N. Mueller, Khachik Sargsyan, Craig J. Daniels, Habib N. Najm

Engineering and applied science rely on computational experiments to rigorously study physical systems. The mathematical models used to probe these systems are highly complex, and sampling-intensive studies often require prohibitively many simulations for acceptable accuracy. Surrogate models provide a means of circumventing the high computational expense of sampling such complex models. In particular, polynomial chaos expansions (PCEs) have been successfully used for uncertainty quantification studies of deterministic models where the dominant source of uncertainty is parametric. We discuss an extension to conventional PCE surrogate modeling to enable surrogate construction for stochastic computational models that have intrinsic noise in addition to parametric uncertainty. We develop a PCE surrogate on a joint space of intrinsic and parametric uncertainty, enabled by Rosenblatt transformations, and then extend the construction to random field data via the Karhunen-Loeve expansion. We then take advantage of closed-form solutions for computing PCE Sobol indices to perform a global sensitivity analysis of the model which quantifies the intrinsic noise contribution to the overall model output variance. Additionally, the resulting joint PCE is generative in the sense that it allows generating random realizations at any input parameter setting that are statistically approximately equivalent to realizations from the underlying stochastic model. The method is demonstrated on a chemical catalysis example model.
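
The closed-form Sobol computation mentioned above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the basis is assumed orthonormal, so the output variance is the sum of squared non-constant coefficients; the coefficient values are invented; and input 0 is taken to be a model parameter while input 1 plays the role of the intrinsic-noise variable. The function names are hypothetical.

```python
def pce_variance(coeffs):
    """Output variance: sum of squared coefficients of non-constant terms."""
    return sum(c * c for alpha, c in coeffs.items() if any(alpha))


def main_effect(coeffs, i):
    """First-order Sobol index of input i (terms that involve only input i)."""
    part = sum(
        c * c
        for alpha, c in coeffs.items()
        if alpha[i] > 0 and all(a == 0 for j, a in enumerate(alpha) if j != i)
    )
    return part / pce_variance(coeffs)


def total_effect(coeffs, i):
    """Total Sobol index of input i (every term in which input i appears)."""
    part = sum(c * c for alpha, c in coeffs.items() if alpha[i] > 0)
    return part / pce_variance(coeffs)


# Hypothetical joint PCE, keyed by multi-index (order in input 0, order in input 1).
coeffs = {
    (0, 0): 1.0,  # mean term, excluded from the variance
    (1, 0): 2.0,
    (2, 0): 1.0,
    (0, 1): 1.0,
    (1, 1): 1.0,  # parameter-noise interaction
}

print(main_effect(coeffs, 0), main_effect(coeffs, 1))
print(total_effect(coeffs, 0), total_effect(coeffs, 1))
```

In this toy expansion the total index of input 1 is the share of output variance attributable to intrinsic noise, including its interaction with the parameter.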

Translated: 2024-06-20 04:54:22 Published: 2024-06-17

# Class Symbolic Regression: Gotta Fit 'Em All ( http://arxiv.org/abs/2312.01816v2 )

License: see the linked page

Wassim Tenachi, Rodrigo Ibata, Thibaut L. François, Foivos I. Diakogiannis

We introduce 'Class Symbolic Regression' (Class SR), a first framework for automatically finding a single analytical functional form that accurately fits multiple datasets, each realization being governed by its own (possibly) unique set of fitting parameters. This hierarchical framework leverages the common constraint that all the members of a single class of physical phenomena follow a common governing law. Our approach extends the capabilities of our earlier Physical Symbolic Optimization ($\Phi$-SO) framework for Symbolic Regression, which integrates dimensional analysis constraints and deep reinforcement learning for unsupervised symbolic analytical function discovery from data. Additionally, we introduce the first Class SR benchmark, comprising a series of synthetic physical challenges specifically designed to evaluate such algorithms. We demonstrate the efficacy of our novel approach by applying it to these benchmark challenges and showcase its practical utility for astrophysics by successfully extracting an analytic galaxy potential from a set of simulated orbits approximating stellar streams.
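
The core idea, one shared functional form with realization-specific parameters, can be sketched without any symbolic search. The snippet below is a toy under stated assumptions and does not use the $\Phi$-SO API: the candidate set is fixed by hand instead of being explored with reinforcement learning, each candidate has a single scale parameter `a` fitted per dataset in closed form, and the helper names are hypothetical.

```python
def fit_scale(g, xs, ys):
    """Least-squares value of a for the model y = a * g(x) (closed form)."""
    num = sum(g(x) * y for x, y in zip(xs, ys))
    den = sum(g(x) ** 2 for x in xs)
    return num / den


def class_score(g, datasets):
    """Sum of squared residuals when each dataset gets its own scale a."""
    total = 0.0
    for xs, ys in datasets:
        a = fit_scale(g, xs, ys)
        total += sum((a * g(x) - y) ** 2 for x, y in zip(xs, ys))
    return total


# Hand-picked candidate forms sharing one free parameter a per dataset.
candidates = {
    "a*x": lambda x: x,
    "a*x**2": lambda x: x * x,
    "a*x**3": lambda x: x ** 3,
}

# Three realizations of the same law y = a * x**2, each with its own a.
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
datasets = [(xs, [a * x * x for x in xs]) for a in (0.7, 1.3, 2.1)]

scores = {name: class_score(g, datasets) for name, g in candidates.items()}
best = min(scores, key=scores.get)
print(best)
```

Scoring a form across all datasets at once is what distinguishes this from fitting each dataset separately: a form that happens to fit one realization well is penalized if it fails on the others.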

Translated: 2024-06-20 04:44:38 Published: 2024-06-17

# State-insensitive wavelengths for light shifts and photon scattering from Zeeman states ( http://arxiv.org/abs/2312.08370v2 )

License: see the linked page

Stuart J. Masson, Zhenjie Yan, Jacquelyn Ho, Yue-Hui Lu, Dan M. Stamper-Kurn, Ana Asenjo-Garcia

Atoms are not two-level systems, and their rich internal structure often leads to complex phenomena in the presence of light. Here, we analyze off-resonant light scattering including the full hyperfine and magnetic structure. We find a set of frequency detunings where the induced atomic dipole is the same irrespective of the Zeeman state, and where two-photon transitions that alter the atomic state turn off. For alkali atoms and alkaline-earth ions, if the hyperfine splitting is dominated by the magnetic dipole moment contribution, these detunings approximately coincide. Therefore, at a given "magical" detuning, all Zeeman states in a hyperfine manifold behave almost identically, and can be traced out to good approximation. This feature prevents state decoherence due to light scattering, which impacts quantum optics experiments and quantum information applications.

Translated: 2024-06-20 04:44:38 Published: 2024-06-17

# Gemini: A Family of Highly Capable Multimodal Models ( http://arxiv.org/abs/2312.11805v4 )

License: see the linked page

Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, Fabio Viola, Malcolm Reynolds, Yuanzhong Xu, Ryan Doherty, Eli Collins, Clemens Meyer, Eliza Rutherford, Erica Moreira, Kareem Ayoub, Megha Goel, Jack Krawczyk, Cosmo Du, Ed Chi, Heng-Tze Cheng, Eric Ni, Purvi Shah, Patrick Kane, Betty Chan, Manaal Faruqui, Aliaksei Severyn, Hanzhao Lin, YaGuang Li, Yong Cheng, Abe Ittycheriah, Mahdis Mahdieh, Mia Chen, Pei Sun, Dustin Tran, Sumit Bagri, Balaji Lakshminarayanan, Jeremiah Liu, Andras Orban, Fabian Güra, Hao Zhou, Xinying Song, Aurelien Boffy, Harish Ganapathy, Steven Zheng, HyunJeong Choe, Ágoston Weisz, Tao Zhu, Yifeng Lu, Siddharth Gopal, Jarrod Kahn, Maciej Kula, Jeff Pitman, Rushin Shah, Emanuel Taropa, Majd Al Merey, Martin Baeuml, Zhifeng Chen, Laurent El Shafey, Yujing Zhang, Olcan Sercinoglu, George Tucker, Enrique Piqueras, Maxim Krikun, Iain Barr, Nikolay Savinov, Ivo Danihelka, Becca Roelofs, Anaïs White, Anders Andreassen, Tamara von Glehn, Lakshman Yagati, Mehran Kazemi, Lucas Gonzalez, Misha Khalman, Jakub Sygnowski, Alexandre Frechette, Charlotte Smith, Laura Culp, Lev Proleev, Yi Luan, Xi Chen, James Lottes, Nathan Schucher, Federico Lebron, Alban Rrustemi, Natalie Clay, Phil Crone, Tomas Kocisky, Jeffrey Zhao, Bartek Perz, Dian Yu, Heidi Howard, Adam Bloniarz, Jack W. 
Rae, Han Lu, Laurent Sifre, Marcello Maggioni, Fred Alcober, Dan Garrette, Megan Barnes, Shantanu Thakoor, Jacob Austin, Gabriel Barth-Maron, William Wong, Rishabh Joshi, Rahma Chaabouni, Deeni Fatiha, Arun Ahuja, Gaurav Singh Tomar, Evan Senter, Martin Chadwick, Ilya Kornakov, Nithya Attaluri, Iñaki Iturrate, Ruibo Liu, Yunxuan Li, Sarah Cogan, Jeremy Chen, Chao Jia, Chenjie Gu, Qiao Zhang, Jordan Grimstad, Ale Jakse Hartman, Xavier Garcia, Thanumalayan Sankaranarayana Pillai, Jacob Devlin, Michael Laskin, Diego de Las Casas, Dasha Valter, Connie Tao, Lorenzo Blanco, Adrià Puigdomènech Badia, David Reitter, Mianna Chen, Jenny Brennan, Clara Rivera, Sergey Brin, Shariq Iqbal, Gabriela Surita, Jane Labanowski, Abhi Rao, Stephanie Winkler, Emilio Parisotto, Yiming Gu, Kate Olszewska, Ravi Addanki, Antoine Miech, Annie Louis, Denis Teplyashin, Geoff Brown, Elliot Catt, Jan Balaguer, Jackie Xiang, Pidong Wang, Zoe Ashwood, Anton Briukhov, Albert Webson, Sanjay Ganapathy, Smit Sanghavi, Ajay Kannan, Ming-Wei Chang, Axel Stjerngren, Josip Djolonga, Yuting Sun, Ankur Bapna, Matthew Aitchison, Pedram Pejman, Henryk Michalewski, Tianhe Yu, Cindy Wang, Juliette Love, Junwhan Ahn, Dawn Bloxwich, Kehang Han, Peter Humphreys, Thibault Sellam, James Bradbury, Varun Godbole, Sina Samangooei, Bogdan Damoc, Alex Kaskasoli, Sébastien M. R. 
Arnold, Vijay Vasudevan, Shubham Agrawal, Jason Riesa, Dmitry Lepikhin, Richard Tanburn, Srivatsan Srinivasan, Hyeontaek Lim, Sarah Hodkinson, Pranav Shyam, Johan Ferret, Steven Hand, Ankush Garg, Tom Le Paine, Jian Li, Yujia Li, Minh Giang, Alexander Neitz, Zaheer Abbas, Sarah York, Machel Reid, Elizabeth Cole, Aakanksha Chowdhery, Dipanjan Das, Dominika Rogozińska, Vitaliy Nikolaev, Pablo Sprechmann, Zachary Nado, Lukas Zilka, Flavien Prost, Luheng He, Marianne Monteiro, Gaurav Mishra, Chris Welty, Josh Newlan, Dawei Jia, Miltiadis Allamanis, Clara Huiyi Hu, Raoul de Liedekerke, Justin Gilmer, Carl Saroufim, Shruti Rijhwani, Shaobo Hou, Disha Shrivastava, Anirudh Baddepudi, Alex Goldin, Adnan Ozturel, Albin Cassirer, Yunhan Xu, Daniel Sohn, Devendra Sachan, Reinald Kim Amplayo, Craig Swanson, Dessie Petrova, Shashi Narayan, Arthur Guez, Siddhartha Brahma, Jessica Landon, Miteyan Patel, Ruizhe Zhao, Kevin Villela, Luyu Wang, Wenhao Jia, Matthew Rahtz, Mai Giménez, Legg Yeung, James Keeling, Petko Georgiev, Diana Mincu, Boxi Wu, Salem Haykal, Rachel Saputro, Kiran Vodrahalli, James Qin, Zeynep Cankara, Abhanshu Sharma, Nick Fernando, Will Hawkins, Behnam Neyshabur, Solomon Kim, Adrian Hutter, Priyanka Agrawal, Alex Castro-Ros, George van den Driessche, Tao Wang, Fan Yang, Shuo-yiin Chang, Paul Komarek, Ross McIlroy, Mario Lučić, Guodong Zhang, Wael Farhan, Michael Sharman, Paul Natsev, Paul Michel, Yamini Bansal, Siyuan Qiao, Kris Cao, Siamak Shakeri, Christina Butterfield, Justin Chung, Paul Kishan Rubenstein, Shivani Agrawal, Arthur Mensch, Kedar Soparkar, Karel Lenc, Timothy Chung, Aedan Pope, Loren Maggiore, Jackie Kay, Priya Jhakra, Shibo Wang, Joshua Maynez, Mary Phuong, Taylor Tobin, Andrea Tacchetti, Maja Trebacz, Kevin Robinson, Yash Katariya, Sebastian Riedel, Paige Bailey, Kefan Xiao, Nimesh Ghelani, Lora Aroyo, Ambrose Slone, Neil Houlsby, Xuehan Xiong, Zhen Yang, Elena Gribovskaya, Jonas Adler, Mateo Wirth, Lisa Lee, Music Li, Thais Kagohara, Jay 
Pavagadhi, Sophie Bridgers, Anna Bortsova, Sanjay Ghemawat, Zafarali Ahmed, Tianqi Liu, Richard Powell, Vijay Bolina, Mariko Iinuma, Polina Zablotskaia, James Besley, Da-Woon Chung, Timothy Dozat, Ramona Comanescu, Xiance Si, Jeremy Greer, Guolong Su, Martin Polacek, Raphaël Lopez Kaufman, Simon Tokumine, Hexiang Hu, Elena Buchatskaya, Yingjie Miao, Mohamed Elhawaty, Aditya Siddhant, Nenad Tomasev, Jinwei Xing, Christina Greer, Helen Miller, Shereen Ashraf, Aurko Roy, Zizhao Zhang, Ada Ma, Angelos Filos, Milos Besta, Rory Blevins, Ted Klimenko, Chih-Kuan Yeh, Soravit Changpinyo, Jiaqi Mu, Oscar Chang, Mantas Pajarskas, Carrie Muir, Vered Cohen, Charline Le Lan, Krishna Haridasan, Amit Marathe, Steven Hansen, Sholto Douglas, Rajkumar Samuel, Mingqiu Wang, Sophia Austin, Chang Lan, Jiepu Jiang, Justin Chiu, Jaime Alonso Lorenzo, Lars Lowe Sjösund, Sébastien Cevey, Zach Gleicher, Thi Avrahami, Anudhyan Boral, Hansa Srinivasan, Vittorio Selo, Rhys May, Konstantinos Aisopos, Léonard Hussenot, Livio Baldini Soares, Kate Baumli, Michael B. 
Chang, Adrià Recasens, Ben Caine, Alexander Pritzel, Filip Pavetic, Fabio Pardo, Anita Gergely, Justin Frye, Vinay Ramasesh, Dan Horgan, Kartikeya Badola, Nora Kassner, Subhrajit Roy, Ethan Dyer, Víctor Campos Campos, Alex Tomala, Yunhao Tang, Dalia El Badawy, Elspeth White, Basil Mustafa, Oran Lang, Abhishek Jindal, Sharad Vikram, Zhitao Gong, Sergi Caelles, Ross Hemsley, Gregory Thornton, Fangxiaoyu Feng, Wojciech Stokowiec, Ce Zheng, Phoebe Thacker, Çağlar Ünlü, Zhishuai Zhang, Mohammad Saleh, James Svensson, Max Bileschi, Piyush Patil, Ankesh Anand, Roman Ring, Katerina Tsihlas, Arpi Vezer, Marco Selvi, Toby Shevlane, Mikel Rodriguez, Tom Kwiatkowski, Samira Daruki, Keran Rong, Allan Dafoe, Nicholas FitzGerald, Keren Gu-Lemberg, Mina Khan, Lisa Anne Hendricks, Marie Pellat, Vladimir Feinberg, James Cobon-Kerr, Tara Sainath, Maribeth Rauh, Sayed Hadi Hashemi, Richard Ives, Yana Hasson, Eric Noland, Yuan Cao, Nathan Byrd, Le Hou, Qingze Wang, Thibault Sottiaux, Michela Paganini, Jean-Baptiste Lespiau, Alexandre Moufarek, Samer Hassan, Kaushik Shivakumar, Joost van Amersfoort, Amol Mandhane, Pratik Joshi, Anirudh Goyal, Matthew Tung, Andrew Brock, Hannah Sheahan, Vedant Misra, Cheng Li, Nemanja Rakićević, Mostafa Dehghani, Fangyu Liu, Sid Mittal, Junhyuk Oh, Seb Noury, Eren Sezener, Fantine Huot, Matthew Lamm, Nicola De Cao, Charlie Chen, Sidharth Mudgal, Romina Stella, Kevin Brooks, Gautam Vasudevan, Chenxi Liu, Mainak Chain, Nivedita Melinkeri, Aaron Cohen, Venus Wang, Kristie Seymore, Sergey Zubkov, Rahul Goel, Summer Yue, Sai Krishnakumaran, Brian Albert, Nate Hurley, Motoki Sano, Anhad Mohananey, Jonah Joughin, Egor Filonov, Tomasz Kępa, Yomna Eldawy, Jiawern Lim, Rahul Rishi, Shirin Badiezadegan, Taylor Bos, Jerry Chang, Sanil Jain, Sri Gayatri Sundara Padmanabhan, Subha Puttagunta, Kalpesh Krishna, Leslie Baker, Norbert Kalb, Vamsi Bedapudi, Adam Kurzrok, Shuntong Lei, Anthony Yu, Oren Litvin, Xiang Zhou, Zhichun Wu, Sam Sobell, Andrea Siciliano, Alan 
Papir, Robby Neale, Jonas Bragagnolo, Tej Toor, Tina Chen, Valentin Anklin, Feiran Wang, Richie Feng, Milad Gholami, Kevin Ling, Lijuan Liu, Jules Walter, Hamid Moghaddam, Arun Kishore, Jakub Adamek, Tyler Mercado, Jonathan Mallinson, Siddhinita Wandekar, Stephen Cagle, Eran Ofek, Guillermo Garrido, Clemens Lombriser, Maksim Mukha, Botu Sun, Hafeezul Rahman Mohammad, Josip Matak, Yadi Qian, Vikas Peswani, Pawel Janus, Quan Yuan, Leif Schelin, Oana David, Ankur Garg, Yifan He, Oleksii Duzhyi, Anton Älgmyr, Timothée Lottaz, Qi Li, Vikas Yadav, Luyao Xu, Alex Chinien, Rakesh Shivanna, Aleksandr Chuklin, Josie Li, Carrie Spadine, Travis Wolfe, Kareem Mohamed, Subhabrata Das, Zihang Dai, Kyle He, Daniel von Dincklage, Shyam Upadhyay, Akanksha Maurya, Luyan Chi, Sebastian Krause, Khalid Salama, Pam G Rabinovitch, Pavan Kumar Reddy M, Aarush Selvan, Mikhail Dektiarev, Golnaz Ghiasi, Erdem Guven, Himanshu Gupta, Boyi Liu, Deepak Sharma, Idan Heimlich Shtacher, Shachi Paul, Oscar Akerlund, François-Xavier Aubet, Terry Huang, Chen Zhu, Eric Zhu, Elico Teixeira, Matthew Fritze, Francesco Bertolini, Liana-Eleonora Marinescu, Martin Bölle, Dominik Paulus, Khyatti Gupta, Tejasi Latkar, Max Chang, Jason Sanders, Roopa Wilson, Xuewei Wu, Yi-Xuan Tan, Lam Nguyen Thiet, Tulsee Doshi, Sid Lall, Swaroop Mishra, Wanming Chen, Thang Luong, Seth Benjamin, Jasmine Lee, Ewa Andrejczuk, Dominik Rabiej, Vipul Ranjan, Krzysztof Styrc, Pengcheng Yin, Jon Simon, Malcolm Rose Harriott, Mudit Bansal, Alexei Robsky, Geoff Bacon, David Greene, Daniil Mirylenka, Chen Zhou, Obaid Sarvana, Abhimanyu Goyal, Samuel Andermatt, Patrick Siegler, Ben Horn, Assaf Israel, Francesco Pongetti, Chih-Wei "Louis" Chen, Marco Selvatici, Pedro Silva, Kathie Wang, Jackson Tolins, Kelvin Guu, Roey Yogev, Xiaochen Cai, Alessandro Agostini, Maulik Shah, Hung Nguyen, Noah Ó Donnaile, Sébastien Pereira, Linda Friso, Adam Stambler, Adam Kurzrok, Chenkai Kuang, Yan Romanikhin, Mark Geller, ZJ Yan, Kane Jang, Cheng-Chun Lee, 
Wojciech Fica, Eric Malmi, Qijun Tan, Dan Banica, Daniel Balle, Ryan Pham, Yanping Huang, Diana Avram, Hongzhi Shi, Jasjot Singh, Chris Hidey, Niharika Ahuja, Pranab Saxena, Dan Dooley, Srividya Pranavi Potharaju, Eileen O'Neill, Anand Gokulchandran, Ryan Foley, Kai Zhao, Mike Dusenberry, Yuan Liu, Pulkit Mehta, Ragha Kotikalapudi, Chalence Safranek-Shrader, Andrew Goodman, Joshua Kessinger, Eran Globen, Prateek Kolhar, Chris Gorgolewski, Ali Ibrahim, Yang Song, Ali Eichenbaum, Thomas Brovelli, Sahitya Potluri, Preethi Lahoti, Cip Baetu, Ali Ghorbani, Charles Chen, Andy Crawford, Shalini Pal, Mukund Sridhar, Petru Gurita, Asier Mujika, Igor Petrovski, Pierre-Louis Cedoz, Chenmei Li, Shiyuan Chen, Niccolò Dal Santo, Siddharth Goyal, Jitesh Punjabi, Karthik Kappaganthu, Chester Kwak, Pallavi LV, Sarmishta Velury, Himadri Choudhury, Jamie Hall, Premal Shah, Ricardo Figueira, Matt Thomas, Minjie Lu, Ting Zhou, Chintu Kumar, Thomas Jurdi, Sharat Chikkerur, Yenai Ma, Adams Yu, Soo Kwak, Victor Ähdel, Sujeevan Rajayogam, Travis Choma, Fei Liu, Aditya Barua, Colin Ji, Ji Ho Park, Vincent Hellendoorn, Alex Bailey, Taylan Bilal, Huanjie Zhou, Mehrdad Khatir, Charles Sutton, Wojciech Rzadkowski, Fiona Macintosh, Konstantin Shagin, Paul Medina, Chen Liang, Jinjing Zhou, Pararth Shah, Yingying Bi, Attila Dankovics, Shipra Banga, Sabine Lehmann, Marissa Bredesen, Zifan Lin, John Eric Hoffmann, Jonathan Lai, Raynald Chung, Kai Yang, Nihal Balani, Arthur Bražinskas, Andrei Sozanschi, Matthew Hayes, Héctor Fernández Alcalde, Peter Makarov, Will Chen, Antonio Stella, Liselotte Snijders, Michael Mandl, Ante Kärrman, Paweł Nowak, Xinyi Wu, Alex Dyck, Krishnan Vaidyanathan, Raghavender R, Jessica Mallet, Mitch Rudominer, Eric Johnston, Sushil Mittal, Akhil Udathu, Janara Christensen, Vishal Verma, Zach Irving, Andreas Santucci, Gamaleldin Elsayed, Elnaz Davoodi, Marin Georgiev, Ian Tenney, Nan Hua, Geoffrey Cideron, Edouard Leurent, Mahmoud Alnahlawi, Ionut Georgescu, Nan Wei, Ivy 
Zheng, Dylan Scandinaro, Heinrich Jiang, Jasper Snoek, Mukund Sundararajan, Xuezhi Wang, Zack Ontiveros, Itay Karo, Jeremy Cole, Vinu Rajashekhar, Lara Tumeh, Eyal Ben-David, Rishub Jain, Jonathan Uesato, Romina Datta, Oskar Bunyan, Shimu Wu, John Zhang, Piotr Stanczyk, Ye Zhang, David Steiner, Subhajit Naskar, Michael Azzam, Matthew Johnson, Adam Paszke, Chung-Cheng Chiu, Jaume Sanchez Elias, Afroz Mohiuddin, Faizan Muhammad, Jin Miao, Andrew Lee, Nino Vieillard, Jane Park, Jiageng Zhang, Jeff Stanway, Drew Garmon, Abhijit Karmarkar, Zhe Dong, Jong Lee, Aviral Kumar, Luowei Zhou, Jonathan Evens, William Isaac, Geoffrey Irving, Edward Loper, Michael Fink, Isha Arkatkar, Nanxin Chen, Izhak Shafran, Ivan Petrychenko, Zhe Chen, Johnson Jia, Anselm Levskaya, Zhenkai Zhu, Peter Grabowski, Yu Mao, Alberto Magni, Kaisheng Yao, Javier Snaider, Norman Casagrande, Evan Palmer, Paul Suganthan, Alfonso Castaño, Irene Giannoumis, Wooyeol Kim, Mikołaj Rybiński, Ashwin Sreevatsa, Jennifer Prendki, David Soergel, Adrian Goedeckemeyer, Willi Gierke, Mohsen Jafari, Meenu Gaba, Jeremy Wiesner, Diana Gage Wright, Yawen Wei, Harsha Vashisht, Yana Kulizhskaya, Jay Hoover, Maigo Le, Lu Li, Chimezie Iwuanyanwu, Lu Liu, Kevin Ramirez, Andrey Khorlin, Albert Cui, Tian LIN, Marcus Wu, Ricardo Aguilar, Keith Pallo, Abhishek Chakladar, Ginger Perng, Elena Allica Abellan, Mingyang Zhang, Ishita Dasgupta, Nate Kushman, Ivo Penchev, Alena Repina, Xihui Wu, Tom van der Weide, Priya Ponnapalli, Caroline Kaplan, Jiri Simsa, Shuangfeng Li, Olivier Dousse, Fan Yang, Jeff Piper, Nathan Ie, Rama Pasumarthi, Nathan Lintz, Anitha Vijayakumar, Daniel Andor, Pedro Valenzuela, Minnie Lui, Cosmin Paduraru, Daiyi Peng, Katherine Lee, Shuyuan Zhang, Somer Greene, Duc Dung Nguyen, Paula Kurylowicz, Cassidy Hardin, Lucas Dixon, Lili Janzer, Kiam Choo, Ziqiang Feng, Biao Zhang, Achintya Singhal, Dayou Du, Dan McKinnon, Natasha Antropova, Tolga Bolukbasi, Orgad Keller, David Reid, Daniel Finchelstein, Maria Abi 
Raad, Remi Crocker, Peter Hawkins, Robert Dadashi, Colin Gaffney, Ken Franko, Anna Bulanova, Rémi Leblond, Shirley Chung, Harry Askham, Luis C. Cobo, Kelvin Xu, Felix Fischer, Jun Xu, Christina Sorokin, Chris Alberti, Chu-Cheng Lin, Colin Evans, Alek Dimitriev, Hannah Forbes, Dylan Banarse, Zora Tung, Mark Omernick, Colton Bishop, Rachel Sterneck, Rohan Jain, Jiawei Xia, Ehsan Amid, Francesco Piccinno, Xingyu Wang, Praseem Banzal, Daniel J. Mankowitz, Alex Polozov, Victoria Krakovna, Sasha Brown, MohammadHossein Bateni, Dennis Duan, Vlad Firoiu, Meghana Thotakuri, Tom Natan, Matthieu Geist, Ser tan Girgin, Hui Li, Jiayu Ye, Ofir Roval, Reiko Tojo, Michael Kwong, James Lee-Thorp, Christopher Yew, Danila Sinopalnikov, Sabela Ramos, John Mellor, Abhishek Sharma, Kathy Wu, David Miller, Nicolas Sonnerat, Denis Vnukov, Rory Greig, Jennifer Beattie, Emily Caveness, Libin Bai, Julian Eisenschlos, Alex Korchemniy, Tomy Tsai, Mimi Jasarevic, Weize Kong, Phuong Dao, Zeyu Zheng, Frederick Liu, Fan Yang, Rui Zhu, Tian Huey Teh, Jason Sanmiya, Evgeny Gladchenko, Nejc Trdin, Daniel Toyama, Evan Rosen, Sasan Tavakkol, Linting Xue, Chen Elkind, Oliver Woodman, John Carpenter, George Papamakarios, Rupert Kemp, Sushant Kafle, Tanya Grunina, Rishika Sinha, Alice Talbert, Diane Wu, Denese Owusu-Afriyie, Cosmo Du, Chloe Thornton, Jordi Pont-Tuset, Pradyumna Narayana, Jing Li, Saaber Fatehi, John Wieting, Omar Ajmeri, Benigno Uria, Yeongil Ko, Laura Knight, Amélie Héliou, Ning Niu, Shane Gu, Chenxi Pang, Yeqing Li, Nir Levine, Ariel Stolovich, Rebeca Santamaria-Fernandez, Sonam Goenka, Wenny Yustalim, Robin Strudel, Ali Elqursh, Charlie Deck, Hyo Lee, Zonglin Li, Kyle Levin, Raphael Hoffmann, Dan Holtmann-Rice, Olivier Bachem, Sho Arora, Christy Koh, Soheil Hassas Yeganeh, Siim Põder, Mukarram Tariq, Yanhua Sun, Lucian Ionita, Mojtaba Seyedhosseini, Pouya Tafti, Zhiyu Liu, Anmol Gulati, Jasmine Liu, Xinyu Ye, Bart Chrzaszcz, Lily Wang, Nikhil Sethi, Tianrun Li, Ben Brown, Shreya Singh, 
Wei Fan, Aaron Parisi, Joe Stanton, Vinod Koverkathu, Christopher A. Choquette-Choo, Yunjie Li, TJ Lu, Abe Ittycheriah, Prakash Shroff, Mani Varadarajan, Sanaz Bahargam, Rob Willoughby, David Gaddy, Guillaume Desjardins, Marco Cornero, Brona Robenek, Bhavishya Mittal, Ben Albrecht, Ashish Shenoy, Fedor Moiseev, Henrik Jacobsson, Alireza Ghaffarkhah, Morgane Rivière, Alanna Walton, Clément Crepy, Alicia Parrish, Zongwei Zhou, Clement Farabet, Carey Radebaugh, Praveen Srinivasan, Claudia van der Salm, Andreas Fidjeland, Salvatore Scellato, Eri Latorre-Chimoto, Hanna Klimczak-Plucińska, David Bridson, Dario de Cesare, Tom Hudson, Piermaria Mendolicchio, Lexi Walker, Alex Morris, Matthew Mauger, Alexey Guseynov, Alison Reid, Seth Odoom, Lucia Loher, Victor Cotruta, Madhavi Yenugula, Dominik Grewe, Anastasia Petrushkina, Tom Duerig, Antonio Sanchez, Steve Yadlowsky, Amy Shen, Amir Globerson, Lynette Webb, Sahil Dua, Dong Li, Surya Bhupatiraju, Dan Hurt, Haroon Qureshi, Ananth Agarwal, Tomer Shani, Matan Eyal, Anuj Khare, Shreyas Rammohan Belle, Lei Wang, Chetan Tekur, Mihir Sanjay Kale, Jinliang Wei, Ruoxin Sang, Brennan Saeta, Tyler Liechty, Yi Sun, Yao Zhao, Stephan Lee, Pandu Nayak, Doug Fritz, Manish Reddy Vuyyuru, John Aslanides, Nidhi Vyas, Martin Wicke, Xiao Ma, Evgenii Eltyshev, Nina Martin, Hardie Cate, James Manyika, Keyvan Amiri, Yelin Kim, Xi Xiong, Kai Kang, Florian Luisier, Nilesh Tripuraneni, David Madras, Mandy Guo, Austin Waters, Oliver Wang, Joshua Ainslie, Jason Baldridge, Han Zhang, Garima Pruthi, Jakob Bauer, Feng Yang, Riham Mansour, Jason Gelman, Yang Xu, George Polovets, Ji Liu, Honglong Cai, Warren Chen, XiangHai Sheng, Emily Xue, Sherjil Ozair, Christof Angermueller, Xiaowei Li, Anoop Sinha, Weiren Wang, Julia Wiesinger, Emmanouil Koukoumidis, Yuan Tian, Anand Iyer, Madhu Gurumurthy, Mark Goldenson, Parashar Shah, MK Blake, Hongkun Yu, Anthony Urbanowicz, Jennimaria Palomaki, Chrisantha Fernando, Ken Durden, Harsh Mehta, Nikola Momchev, Elahe 
Rahimtoroghi, Maria Georgaki, Amit Raul, Sebastian Ruder, Morgan Redshaw, Jinhyuk Lee, Denny Zhou, Komal Jalan, Dinghua Li, Blake Hechtman, Parker Schuh, Milad Nasr, Kieran Milan, Vladimir Mikulik, Juliana Franco, Tim Green, Nam Nguyen, Joe Kelley, Aroma Mahendru, Andrea Hu, Joshua Howland, Ben Vargas, Jeffrey Hui, Kshitij Bansal, Vikram Rao, Rakesh Ghiya, Emma Wang, Ke Ye, Jean Michel Sarr, Melanie Moranski Preston, Madeleine Elish, Steve Li, Aakash Kaku, Jigar Gupta, Ice Pasupat, Da-Cheng Juan, Milan Someswar, Tejvi M., Xinyun Chen, Aida Amini, Alex Fabrikant, Eric Chu, Xuanyi Dong, Amruta Muthal, Senaka Buthpitiya, Sarthak Jauhari, Nan Hua, Urvashi Khandelwal, Ayal Hitron, Jie Ren, Larissa Rinaldi, Shahar Drath, Avigail Dabush, Nan-Jiang Jiang, Harshal Godhia, Uli Sachs, Anthony Chen, Yicheng Fan, Hagai Taitelbaum, Hila Noga, Zhuyun Dai, James Wang, Chen Liang, Jenny Hamer, Chun-Sung Ferng, Chenel Elkind, Aviel Atias, Paulina Lee, Vít Listík, Mathias Carlen, Jan van de Kerkhof, Marcin Pikus, Krunoslav Zaher, Paul Müller, Sasha Zykova, Richard Stefanec, Vitaly Gatsko, Christoph Hirnschall, Ashwin Sethi, Xingyu Federico Xu, Chetan Ahuja, Beth Tsai, Anca Stefanoiu, Bo Feng, Keshav Dhandhania, Manish Katyal, Akshay Gupta, Atharva Parulekar, Divya Pitta, Jing Zhao, Vivaan Bhatia, Yashodha Bhavnani, Omar Alhadlaq, Xiaolin Li, Peter Danenberg, Dennis Tu, Alex Pine, Vera Filippova, Abhipso Ghosh, Ben Limonchik, Bhargava Urala, Chaitanya Krishna Lanka, Derik Clive, Yi Sun, Edward Li, Hao Wu, Kevin Hongtongsak, Ianna Li, Kalind Thakkar, Kuanysh Omarov, Kushal Majmundar, Michael Alverson, Michael Kucharski, Mohak Patel, Mudit Jain, Maksim Zabelin, Paolo Pelagatti, Rohan Kohli, Saurabh Kumar, Joseph Kim, Swetha Sankar, Vineet Shah, Lakshmi Ramachandruni, Xiangkai Zeng, Ben Bariach, Laura Weidinger, Tu Vu, Alek Andreev, Antoine He, Kevin Hui, Sheleem Kashem, Amar Subramanya, Sissie Hsiao, Demis Hassabis, Koray Kavukcuoglu, Adam Sadovsky, Quoc Le, Trevor Strohman, Yonghui Wu, 
Slav Petrov, Jeffrey Dean, Oriol Vinyals

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

翻訳日:2024-06-20 04:44:38 公開日:2024-06-17

# 多面的AIフィードバックを用いた感情支援会話における役に立たない応答の軽減

Mitigating Unhelpfulness in Emotional Support Conversations with Multifaceted AI Feedback ( http://arxiv.org/abs/2401.05928v3 )

ライセンス: Link先を確認

Jiashuo Wang, Chunpu Xu, Chak Tou Leong, Wenjie Li, Jing Li,

(参考訳) 感情支援会話システムは、ユーザの精神的苦痛を和らげ、ユーザが抱える課題への対処を支援することを目的としている。支援的な応答を生成するには、従来手法で確立されているように、共感、支援戦略、応答の一貫性といった複数の要因を考慮することが重要である。それにもかかわらず、従来のモデルは、支援を意図しながらも逆効果となる、役に立たない応答を生成することがある。心理学やコミュニケーションの理論によれば、一つの要因における性能が低いだけでも、応答が役に立たなくなる可能性がある。モデル学習の観点からは、これらのモデルは学習段階で役に立たない応答にさらされていないため、推論時に自らが生成するトークンが役に立たない応答につながるかどうかを区別できない。この問題に対処するために、感情支援のための多面的AIフィードバックによって役に立たなさを軽減する、モデル非依存の新しいフレームワークMuffinを導入する。具体的には、Muffinは多面的AIフィードバックモジュールを用い、複数の要因を考慮して、特定のモデルが生成した応答の有用性を評価する。次に、対照学習を用いて、有用な応答と比べてモデルが役に立たない応答を生成する尤度を下げる。実験結果から、Muffinは応答の流暢さと関連性をわずかに向上させつつ、役に立たない応答の生成を効果的に軽減することが示された。

An emotional support conversation system aims to alleviate users' emotional distress and assist them in addressing their challenges. To generate supportive responses, it is critical to consider multiple factors such as empathy, support strategies, and response coherence, as established in prior methods. Nonetheless, previous models occasionally generate unhelpful responses, which intend to provide support but display counterproductive effects. According to psychology and communication theories, poor performance in just one contributing factor might cause a response to be unhelpful. From the model training perspective, since these models have not been exposed to unhelpful responses during their training phase, they are unable to distinguish if the tokens they generate might result in unhelpful responses during inference. To address this issue, we introduce a novel model-agnostic framework named mitigating unhelpfulness with multifaceted AI feedback for emotional support (Muffin). Specifically, Muffin employs a multifaceted AI feedback module to assess the helpfulness of responses generated by a specific model with consideration of multiple factors. Using contrastive learning, it then reduces the likelihood of the model generating unhelpful responses compared to the helpful ones. Experimental results demonstrate that Muffin effectively mitigates the generation of unhelpful responses while slightly increasing response fluency and relevance.

翻訳日:2024-06-20 04:44:38 公開日:2024-06-17

# 大規模言語モデルを用いた再ランキングによる推薦の多様性向上

Enhancing Recommendation Diversity by Re-ranking with Large Language Models ( http://arxiv.org/abs/2401.11506v2 )

ライセンス: Link先を確認

Diego Carraro, Derek Bridge,

(参考訳) Recommender System(RS)がユーザとの関連性のみに基づいて推薦を提供するだけでは不十分であることは、長年認識されてきた。他の多くの基準の中でも、推薦の集合には多様性が求められる場合がある。多様性は、推薦の不確実性に対処し、推薦がユーザに有意義な選択肢を提供することを保証する方法の1つである。文献では、多様性を測定する方法や推薦集合の多様性を改善する方法が数多く報告されており、なかでも、より大きな候補推薦集合からの再ランキングと選択によるものが代表的である。本稿では、汎用性の高い大規模言語モデル(LLM)をRSパイプラインに組み込む方法に関する文献の有望な知見を踏まえ、LLMを多様性のための再ランキングにどのように利用できるかを示す。まず、LLMが再ランキングタスクに利用でき、アイテムの多様性という概念をある程度理解していることを確認する非公式な調査から始める。次に、再ランキングの指示が異なるさまざまなプロンプトテンプレートを用いて、LLMに候補ランキングから多様なランキングをゼロショットで生成させる、より厳密な方法論を設計する。 GPTファミリーとLlamaファミリーの最先端LLMを対象に包括的な実験を行う。それらの再ランキング能力を、ランダムな再ランキングおよび文献にあるさまざまな従来の再ランキング手法と比較する。再現性のために実験のコードをオープンソースとして公開している。実験の結果、LLMベースのリランカーのトレードオフ(性能やコストなど)は、ランダムなリランカーよりも優れているが、現時点では従来のリランカーよりも劣ることが示唆された。しかし、LLMによるアプローチは有望である。 LLMは多くの自然言語処理タスクや推薦タスクで性能が向上しており、推論コストも低下している。これらの傾向を踏まえると、LLMベースの再ランキングは近いうちに競争力を高めると期待できる。

It has long been recognized that it is not enough for a Recommender System (RS) to provide recommendations based only on their relevance to users. Among many other criteria, the set of recommendations may need to be diverse. Diversity is one way of handling recommendation uncertainty and ensuring that recommendations offer users a meaningful choice. The literature reports many ways of measuring diversity and improving the diversity of a set of recommendations, most notably by re-ranking and selecting from a larger set of candidate recommendations. Driven by promising insights from the literature on how to incorporate versatile Large Language Models (LLMs) into the RS pipeline, in this paper we show how LLMs can be used for diversity re-ranking. We begin with an informal study that verifies that LLMs can be used for re-ranking tasks and do have some understanding of the concept of item diversity. Then, we design a more rigorous methodology where LLMs are prompted to generate a diverse ranking from a candidate ranking using various prompt templates with different re-ranking instructions in a zero-shot fashion. We conduct comprehensive experiments testing state-of-the-art LLMs from the GPT and Llama families. We compare their re-ranking capabilities with random re-ranking and various traditional re-ranking methods from the literature. We open-source the code of our experiments for reproducibility. Our findings suggest that the trade-offs (in terms of performance and costs, among others) of LLM-based re-rankers are superior to those of random re-rankers but, as yet, inferior to the ones of traditional re-rankers. However, the LLM approach is promising. LLMs exhibit improved performance on many natural language processing and recommendation tasks and lower inference costs. Given these trends, we can expect LLM-based re-ranking to become more competitive soon.

翻訳日:2024-06-20 04:34:53 公開日:2024-06-17
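To make the idea of diversity re-ranking in the entry above concrete, here is a minimal sketch of one classic traditional re-ranker, greedy Maximal Marginal Relevance (MMR). The abstract does not say which traditional baselines the authors used, so MMR is chosen here only as a well-known illustration. The item names, relevance scores, feature sets, and the `lam` trade-off weight are all hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two feature sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mmr_rerank(
    relevance: Dict[str, float],
    features: Dict[str, Set[str]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedy MMR: pick k items, trading relevance against redundancy.

    lam = 1.0 ranks purely by relevance; smaller values penalize items
    that resemble something already selected.
    """
    # Start from the relevance ordering so that ties are broken deterministically.
    remaining = sorted(relevance, key=relevance.get, reverse=True)
    selected: List[str] = []
    while remaining and len(selected) < k:

        def score(item: str) -> float:
            # Similarity to the closest already-selected item (0.0 if none yet).
            max_sim = max(
                (jaccard(features[item], features[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[item] - (1.0 - lam) * max_sim

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical candidates: two near-duplicate action films and one drama.
RELEVANCE = {"a": 0.9, "b": 0.85, "c": 0.6}
FEATURES = {"a": {"action"}, "b": {"action"}, "c": {"drama"}}
```

With `lam=0.5`, the second slot goes to the less relevant but dissimilar item `c`; with `lam=1.0`, the ranking falls back to pure relevance. An LLM-based re-ranker as described in the abstract would replace this scoring loop with a prompt, which this sketch does not attempt to model.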

# (ほとんど)コストをかけない安全性ファインチューニング: 視覚大規模言語モデルのためのベースライン

Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models ( http://arxiv.org/abs/2402.02207v2 )

ライセンス: Link先を確認

Yongshuo Zong, Ondrej Bohdal, Tingyang Yu, Yongxin Yang, Timothy Hospedales,

(参考訳) 現在の視覚大規模言語モデル(VLLM)は優れた能力を示す一方で、有害なコンテンツを生成しやすく、最も単純なジェイルブレイク攻撃に対しても脆弱である。我々の初期分析によると、その原因は視覚言語の指示ファインチューニング時に有害なデータが存在することにあり、さらにVLLMのファインチューニングは、基盤となるLLMが以前に学習した安全性アライメントの忘却を引き起こしうる。この問題に対処するために、まず、さまざまな有害カテゴリをカバーする視覚言語の安全な指示追従データセットVLGuardを構築する。実験により、このデータセットを標準的な視覚言語ファインチューニングに統合するか、事後的なファインチューニングに利用することで、VLLMの安全性アライメントを効果的に実現できることを示す。このアライメントは、モデルの有用性への影響を最小限に抑えつつ、場合によっては有用性を高めながら達成される。この安全性ファインチューニング用データセットは汎用性が高く、既存のVLLMの安全性テスト、新しいモデルの学習、事前学習済みVLLMの保護に役立つリソースとなる。実験結果から、ファインチューニングしたVLLMは安全でない指示を効果的に拒否し、複数のブラックボックス型敵対的攻撃の成功率を大幅に低下させ、多くの場合ゼロに近づけることが示された。コードとデータセットはhttps://github.com/ys-zong/VLGuardで公開されている。

Current vision large language models (VLLMs) exhibit remarkable capabilities yet are prone to generate harmful content and are vulnerable to even the simplest jailbreaking attacks. Our initial analysis finds that this is due to the presence of harmful data during vision-language instruction fine-tuning, and that VLLM fine-tuning can cause forgetting of safety alignment previously learned by the underpinning LLM. To address this issue, we first curate a vision-language safe instruction-following dataset VLGuard covering various harmful categories. Our experiments demonstrate that integrating this dataset into standard vision-language fine-tuning or utilizing it for post-hoc fine-tuning effectively safety aligns VLLMs. This alignment is achieved with minimal impact on, or even enhancement of, the models' helpfulness. The versatility of our safety fine-tuning dataset makes it a valuable resource for safety-testing existing VLLMs, training new models or safeguarding pre-trained VLLMs. Empirical results demonstrate that fine-tuned VLLMs effectively reject unsafe instructions and substantially reduce the success rates of several black-box adversarial attacks, which approach zero in many cases. The code and dataset are available at https://github.com/ys-zong/VLGuard.

翻訳日:2024-06-20 04:25:08 公開日:2024-06-17

# ストリーム上の効率的な推論のためのオンラインカスケード学習

Online Cascade Learning for Efficient Inference over Streams ( http://arxiv.org/abs/2402.04513v3 )

ライセンス: Link先を確認

Lunyiu Nie, Zhimin Ding, Erdong Hu, Christopher Jermaine, Swarat Chaudhuri,

(参考訳) 大規模言語モデル(LLM)は、データストリームに関する複雑なクエリに答える役割に自然に適しているが、LLM推論の計算コストが高いため、そうしたタスクの多くでは利用が現実的でない。この課題に対処する最初のアプローチであるオンラインカスケード学習を提案する。ここでの目的は、モデルの「カスケード」を学習することである。カスケードは低容量のモデル(ロジスティック回帰など)から始まり、強力なLLMで終わる。あわせて、与えられた入力に対してどのモデルを使用するかを決定する委譲ポリシー(deferral policy)も学習する。カスケードをオンラインで学習するタスクを、収集したLLMのデモンストレーションを模倣して小さなモデルを時間とともに更新していく模倣学習問題として定式化し、この問題に対するノーリグレット(no-regret)アルゴリズムを与える。 4つのベンチマークでの実験結果から、提案手法はLLMに匹敵する精度を保ちながら推論コストを最大90%削減し、入力分布のシフトに対しても強い頑健性を持つことが示され、ストリーム処理における有効性と適応性が裏付けられた。

Large Language Models (LLMs) have a natural role in answering complex queries about data streams, but the high computational cost of LLM inference makes them infeasible in many such tasks. We propose online cascade learning, the first approach to address this challenge. The objective here is to learn a "cascade" of models, starting with lower-capacity models (such as logistic regression) and ending with a powerful LLM, along with a deferral policy that determines the model to be used on a given input. We formulate the task of learning cascades online as an imitation-learning problem, where smaller models are updated over time imitating the collected LLM demonstrations, and give a no-regret algorithm for the problem. Experimental results across four benchmarks show that our method parallels LLMs in accuracy while cutting down inference costs by as much as 90% with strong robustness against input distribution shifts, underscoring its efficacy and adaptability in stream processing.

翻訳日:2024-06-20 04:25:08 公開日:2024-06-17
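The cascade idea in the entry above can be illustrated with a toy two-level cascade: a one-feature logistic regression answers when it is confident, and otherwise the input is deferred to an expensive model whose answer is also used as a training label. This is only a sketch under strong assumptions. The paper learns its deferral policy, whereas this example uses a fixed confidence threshold; `toy_llm`, `demo_stream`, the threshold of 0.8, and the learning rate are all hypothetical stand-ins.

```python
import math
import random
from typing import Callable, List, Tuple


class TinyLogisticModel:
    """One-feature logistic regression trained online with SGD."""

    def __init__(self, lr: float = 0.5) -> None:
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def prob_positive(self, x: float) -> float:
        return 1.0 / (1.0 + math.exp(-(self.w * x + self.b)))

    def update(self, x: float, label: int) -> None:
        # Gradient step on the log loss for a single (x, label) pair.
        error = self.prob_positive(x) - label
        self.w -= self.lr * error * x
        self.b -= self.lr * error


def run_cascade(
    stream: List[float],
    expensive_model: Callable[[float], int],
    threshold: float = 0.8,
) -> Tuple[List[int], int]:
    """Process a stream; return (predictions, number of expensive calls)."""
    small = TinyLogisticModel()
    predictions: List[int] = []
    expensive_calls = 0
    for x in stream:
        p = small.prob_positive(x)
        confidence = max(p, 1.0 - p)
        if confidence >= threshold:
            # The small model is confident enough to answer on its own.
            predictions.append(int(p >= 0.5))
        else:
            # Defer to the expensive model and imitate its answer.
            label = expensive_model(x)
            expensive_calls += 1
            small.update(x, label)
            predictions.append(label)
    return predictions, expensive_calls


def toy_llm(x: float) -> int:
    """Hypothetical stand-in for an LLM: labels an input positive if x > 0."""
    return int(x > 0.0)


def demo_stream(n: int, seed: int = 0) -> List[float]:
    """Reproducible synthetic stream of inputs in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]
```

As the small model accumulates demonstrations, it becomes confident on inputs far from the decision boundary and only the ambiguous ones still reach the expensive model. The cost savings reported in the abstract come from the authors' learned policy and benchmarks, not from this toy setup.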

# 色空間は1つだけ:低照度画像強調のための効率的なネットワーク

You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement ( http://arxiv.org/abs/2402.05809v3 )

ライセンス: Link先を確認

Qingsen Yan, Yixu Feng, Cheng Zhang, Pei Wang, Peng Wu, Wei Dong, Jinqiu Sun, Yanning Zhang,

(参考訳) 低照度画像強調(LLIE)タスクは、劣化した低照度画像から詳細と視覚情報を復元する傾向がある。既存のほとんどの手法は、sRGBとHSV色空間上のディープニューラルネットワーク(DNN)により、低/正常光画像間のマッピング関数を学習する。それでも、強調には画像信号の増幅が含まれており、これらの色空間を低信号対雑音比の低照度画像に適用することで、強調プロセスに感度と不安定性をもたらす可能性がある。その結果、拡張された画像に色アーティファクトと明るさアーティファクトが存在することが判明した。この問題を軽減するために,HVI (Horizontal/Vertical-Intensity) と呼ばれる新しいトレーニング可能なカラー空間を提案する。輝度と色をRGBチャネルから切り離して、拡張中の不安定性を緩和するだけでなく、トレーニング可能なパラメータによって異なる照明範囲の低照度画像にも適応する。さらに,分離した画像の明るさと色をHVI空間で処理するための2つの枝を持つ新しいカラー・インテンシティ・デカップリングネットワーク(CIDNet)を設計する。 CIDNet内では、低照度画像におけるノイズを抑えつつ、画像構造とコンテンツ情報の相互作用を容易にする軽量クロスアテンション(LCA)モジュールを導入している。最後に,22種類の定量定性的実験を行い,提案したCIDNetが11個のデータセットの最先端手法より優れていることを示した。コードはhttps://github.com/Fediory/HVI-CIDNetで公開されている。

Low-Light Image Enhancement (LLIE) task tends to restore the details and visual information from corrupted low-light images. Most existing methods learn the mapping function between low/normal-light images by Deep Neural Networks (DNNs) on sRGB and HSV color space. Nevertheless, enhancement involves amplifying image signals, and applying these color spaces to low-light images with a low signal-to-noise ratio can introduce sensitivity and instability into the enhancement process. Consequently, this results in the presence of color artifacts and brightness artifacts in the enhanced images. To alleviate this problem, we propose a novel trainable color space, named Horizontal/Vertical-Intensity (HVI). It not only decouples brightness and color from RGB channels to mitigate the instability during enhancement but also adapts to low-light images in different illumination ranges due to the trainable parameters. Further, we design a novel Color and Intensity Decoupling Network (CIDNet) with two branches dedicated to processing the decoupled image brightness and color in the HVI space. Within CIDNet, we introduce the Lightweight Cross-Attention (LCA) module to facilitate interaction between image structure and content information in both branches, while also suppressing noise in low-light images. Finally, we conducted 22 quantitative and qualitative experiments to show that the proposed CIDNet outperforms the state-of-the-art methods on 11 datasets. The code is available at https://github.com/Fediory/HVI-CIDNet.

翻訳日:2024-06-20 04:25:08 公開日:2024-06-17

# 不決断の決定力: マージナルマークの記録による低分散のリスク制限監査と選挙結果への異議申し立て

The Decisive Power of Indecision: Low-Variance Risk-Limiting Audits and Election Contestation via Marginal Mark Recording ( http://arxiv.org/abs/2402.06515v4 )

ライセンス: Link先を確認

Benjamin Fuller, Rashmi Pai, Alexander Russell,

(参考訳) リスク制限監査(RLA)は、大規模な選挙の結果を検証するための技術である。 RLAは正しさに関する厳密な保証を提供するが、効率面の懸念と、絶対的ではなく統計的な結論しか提供しないという事実の両方によって、広範な採用が妨げられてきた。我々はこの両方の困難に取り組み、効率を改善し、統計的検出力を質的に向上させる新しい監査のファミリーを定義する。我々の新しい監査は、投票記録(cast-vote record)の標準的な概念を見直し、単一の判定ではなく、複数のありうるマーク解釈を宣言できるようにすることで実現される。これにより、手書きで記入された投票用紙に頻繁に現れるマージナルマーク(判定が曖昧なマーク)の存在を反映できる。この単純な工夫によって、既存の監査インフラにわずかな変更を加えるだけで、大幅な効率改善が得られることを示す。これらのマークを表現する2つの方法を検討し、そのどちらからも、Fuller、Harrison、Russell(IEEE Security & Privacy 2023)の形式的な意味でのリスク制限比較監査が得られる。次に、contested audit(異議申し立て型監査)と呼ぶ新しいタイプの選挙後監査を定義する。この監査では、各候補者が自身の勝利の主張を裏付ける投票記録テーブルを提出できる。これらの監査が顕著なサンプル効率を示し、得票差(マージン)に依存しない一定数のサンプルでリスクを制御できることを証明する。これは、証明可能な健全性を持つ監査としては初めてである。これらの結果は、定量的な健全性と完全性の保証を規定するゲームベースのセキュリティモデルで定式化される。これらの監査は、従来のRLAによって確認された選挙結果に対する異議申し立てに対処する手段を提供する。

Risk-limiting audits (RLAs) are techniques for verifying the outcomes of large elections. While they provide rigorous guarantees of correctness, widespread adoption has been impeded by both efficiency concerns and the fact that they offer statistical, rather than absolute, conclusions. We attend to both of these difficulties, defining new families of audits that improve efficiency and offer qualitative advances in statistical power. Our new audits are enabled by revisiting the standard notion of a cast-vote record so that it can declare multiple possible mark interpretations rather than a single decision; this can reflect the presence of marginal marks, which appear regularly on hand-marked ballots. We show that this simple expedient can offer significant efficiency improvements with only minor changes to existing auditing infrastructure. We consider two ways of representing these marks; both yield risk-limiting comparison audits in the formal sense of Fuller, Harrison, and Russell (IEEE Security & Privacy 2023). We then define a new type of post-election audit we call a contested audit. These permit each candidate to provide a cast-vote record table advancing their own claim to victory. We prove that these audits offer remarkable sample efficiency, yielding control of risk with a constant number of samples (that is independent of margin). This is a first for an audit with provable soundness. These results are formulated in a game-based security model that specifies quantitative soundness and completeness guarantees. These audits provide a means to handle contestation of election results affirmed by conventional RLAs.

翻訳日:2024-06-20 04:15:24 公開日:2024-06-17

# AuditLLM:マルチプローブアプローチによる大規模言語モデル監査ツール

AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach ( http://arxiv.org/abs/2402.09334v2 )

ライセンス: Link先を確認

Maryam Amirizaniani, Elias Martin, Tanya Roosta, Aman Chadha, Chirag Shah,

(参考訳) 大規模言語モデル(LLM)がさまざまな分野に組み込まれるにつれ、その信頼性と安全性の確保が不可欠になっている。実用上の有効性と信頼性を維持するには、厳密なプロービングと監査が必要である。 1つのクエリをさまざまに言い換えてLLMに与えることで、その知識ベースや機能的能力に潜む不整合を明らかにできる。しかし、実行しやすいワークフローと低い技術的ハードルでそのような監査を行えるツールは存在しない。本デモでは、さまざまなLLMの性能を体系的に監査するために設計された新しいツール「AuditLLM」を紹介する。 AuditLLMの主な機能は、1つの質問から派生させた複数のプローブを用いて対象のLLMを監査し、モデルの理解や性能における不整合を検出することである。頑健で信頼でき、一貫性のあるLLMは、同じ質問を異なる表現で言い換えたものに対して、意味的に類似した応答を生成することが期待される。この前提に基づき、AuditLLMは、ユーザが与えた1つの入力質問をもとに、LLMの一貫性を反映した解釈しやすい結果を生成する。一定水準の不整合は、潜在的なバイアス、ハルシネーション、その他の問題の指標となることが示されている。そのため、AuditLLMの出力は、対象LLMのこうした問題をさらに調査するために利用できる。デモンストレーションと実用を容易にするために、AuditLLMは2つの主要なモードを提供する。(1)リアルタイムのクエリに対する応答を分析してLLMを即時に監査できるライブモードと、(2)複数のクエリを一括処理して詳細な分析を行い、包括的なLLM監査を可能にするバッチモードである。このツールは、標準化された監査プラットフォームを通じてLLMの応答生成能力への理解を深められるため、研究者と一般ユーザの双方にとって有益である。

As Large Language Models (LLMs) are integrated into various sectors, ensuring their reliability and safety is crucial. This necessitates rigorous probing and auditing to maintain their effectiveness and trustworthiness in practical applications. Subjecting LLMs to varied iterations of a single query can unveil potential inconsistencies in their knowledge base or functional capacity. However, a tool for performing such audits with an easy-to-execute workflow and a low technical threshold is lacking. In this demo, we introduce "AuditLLM," a novel tool designed to audit the performance of various LLMs in a methodical way. AuditLLM's primary function is to audit a given LLM by deploying multiple probes derived from a single question, thus detecting any inconsistencies in the model's comprehension or performance. A robust, reliable, and consistent LLM is expected to generate semantically similar responses to variably phrased versions of the same question. Building on this premise, AuditLLM generates easily interpretable results that reflect the LLM's consistency based on a single input question provided by the user. A certain level of inconsistency has been shown to be an indicator of potential bias, hallucinations, and other issues. One could then use the output of AuditLLM to further investigate issues with the aforementioned LLM. To facilitate demonstration and practical uses, AuditLLM offers two key modes: (1) Live mode which allows instant auditing of LLMs by analyzing responses to real-time queries; and (2) Batch mode which facilitates comprehensive LLM auditing by processing multiple queries at once for in-depth analysis. This tool is beneficial for both researchers and general users, as it enhances our understanding of LLMs' capabilities in generating responses, using a standardized auditing platform.

翻訳日:2024-06-20 04:15:24 公開日:2024-06-17
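The multiprobe premise in the entry above (rephrased versions of one question should draw similar answers) can be sketched as a mean pairwise similarity over the responses. The abstract does not specify how AuditLLM measures semantic similarity, so token-level Jaccard overlap is used here purely as a simple stand-in. The function names and sample responses are hypothetical.

```python
from itertools import combinations
from typing import List


def token_jaccard(a: str, b: str) -> float:
    """Case-insensitive word-overlap similarity between two responses."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def consistency_score(responses: List[str]) -> float:
    """Mean pairwise similarity across responses to rephrased probes.

    1.0 means every pair of responses uses the same set of words;
    values near 0.0 flag answers that diverge across rephrasings.
    """
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 1.0  # zero or one response: nothing to disagree with
    return sum(token_jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical responses to three rephrasings of the same question.
SAMPLE_RESPONSES = ["a b", "a b", "c d"]
```

A real audit would likely swap `token_jaccard` for an embedding-based similarity, since word overlap penalizes paraphrases that mean the same thing.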

# Disordered-DABS: 無秩序なテキストにおける動的アスペクトベース要約のためのベンチマーク

Disordered-DABS: A Benchmark for Dynamic Aspect-Based Summarization in Disordered Texts ( http://arxiv.org/abs/2402.10554v2 )

ライセンス: Link先を確認

Xiaobo Guo, Soroush Vosoughi,

(参考訳) アスペクトベースの要約は、特に構造化テキストにおいて大きな進歩を遂げてきた。しかし、ソーシャルメディアや顧客フィードバックに見られるような、無秩序で大規模なテキストの要約は、依然として大きな課題である。現在の研究は主に構造化テキスト内の事前定義されたアスペクトを対象としており、動的で無秩序な環境の複雑さを見落としている。このギャップに対処するために、非構造化テキスト向けの動的アスペクトベース要約の新しいベンチマークであるDisordered-DABSを導入する。 Disordered-DABSは、コスト効率とスケーラビリティのために既存のデータセットを改変して構築した。包括的な実験と詳細な人手評価により、Disordered-DABSは、GPT-3.5などの最先端の言語モデルを含む現代の要約モデルに固有の課題を突きつけることが明らかになった。

Aspect-based summarization has seen significant advancements, especially in structured text. Yet, summarizing disordered, large-scale texts, like those found in social media and customer feedback, remains a significant challenge. Current research largely targets predefined aspects within structured texts, neglecting the complexities of dynamic and disordered environments. Addressing this gap, we introduce Disordered-DABS, a novel benchmark for dynamic aspect-based summarization tailored to unstructured text. Developed by adapting existing datasets for cost-efficiency and scalability, our comprehensive experiments and detailed human evaluations reveal that Disordered-DABS poses unique challenges to contemporary summarization models, including state-of-the-art language models such as GPT-3.5.

翻訳日:2024-06-20 04:15:24 公開日:2024-06-17

# 言語モデルは誰の感情と道徳的感情を反映しているのか

Whose Emotions and Moral Sentiments Do Language Models Reflect? ( http://arxiv.org/abs/2402.11114v2 )

ライセンス: Link先を確認

Zihao He, Siyi Guo, Ashwin Rao, Kristina Lerman,

(参考訳) 言語モデル(LM)は、一部の社会集団の視点を他の集団の視点よりもよく表現することが知られており、このことは、特にコンテンツモデレーションやヘイトスピーチ検出といった主観的なタスクにおいて、性能に影響しうる。 LMが異なる視点をどのように表現するかを探るために、既存の研究は立場のアライメント(positional alignment)、すなわち、リベラル派や保守派といった異なる集団の意見やスタンスをモデルがどれだけ忠実に模倣するかに焦点を当ててきた。しかし、人間のコミュニケーションには感情的・道徳的な側面も含まれる。本研究では、LMの感情的・道徳的なトーンが異なる集団のトーンをどの程度表しているかを測る、情動アライメント(affective alignment)の問題を定義する。 36個のLMが生成した応答の情動とTwitterメッセージの情動を比較することで、LMがどちらのイデオロギー集団とも大きくずれていることを観察した。このずれは、米国における党派間の隔たりよりも大きい。 LMを特定のイデオロギー的視点へ誘導した後も、モデルのずれとリベラル寄りの傾向は持続しており、LMに系統的なバイアスがあることを示唆している。

Language models (LMs) are known to represent the perspectives of some social groups better than others, which may impact their performance, especially on subjective tasks such as content moderation and hate speech detection. To explore how LMs represent different perspectives, existing research focused on positional alignment, i.e., how closely the models mimic the opinions and stances of different groups, e.g., liberals or conservatives. However, human communication also encompasses emotional and moral dimensions. We define the problem of affective alignment, which measures how LMs' emotional and moral tone represents those of different groups. By comparing the affect of responses generated by 36 LMs to the affect of Twitter messages, we observe significant misalignment of LMs with both ideological groups. This misalignment is larger than the partisan divide in the U.S. Even after steering the LMs towards specific ideological perspectives, the misalignment and liberal tendencies of the model persist, suggesting a systemic bias within LMs.

翻訳日:2024-06-20 04:15:24 公開日:2024-06-17

# 影響分析によるインコンテキスト学習のデモンストレーション選択

In-Context Learning Demonstration Selection via Influence Analysis ( http://arxiv.org/abs/2402.11750v2 )

ライセンス: Link先を確認

Vinay M. S., Minh-Hao Van, Xintao Wu,

(参考訳) 大規模言語モデル(LLM)は、勾配更新を必要とせずにfew-shot学習を可能にするIn-Context Learning(ICL)の能力を示してきた。その利点にもかかわらず、ICLの有効性はデモンストレーションの選択に大きく依存する。 ICLに最も効果的なデモンストレーションを選択することは、依然として重要な研究課題である。この課題に取り組むために、影響関数を用いて訓練サンプルの影響を分析する、InfICLというデモンストレーション選択手法を提案する。最も影響力のある訓練サンプルをデモンストレーションとして特定することで、InfICLはICLの汎化性能の向上を目指す。 InfICLのコスト効率を保つために、LLMはサンプル入力の埋め込みの生成にのみ使用し、高コストなファインチューニングを避ける。さまざまな実世界のデータセットを用いた実証研究を通じて、最先端のベースラインと比較したInfICLの利点を示す。

Large Language Models (LLMs) have showcased their In-Context Learning (ICL) capabilities, enabling few-shot learning without the need for gradient updates. Despite its advantages, the effectiveness of ICL heavily depends on the choice of demonstrations. Selecting the most effective demonstrations for ICL remains a significant research challenge. To tackle this issue, we propose a demonstration selection method named InfICL, which utilizes influence functions to analyze impacts of training samples. By identifying the most influential training samples as demonstrations, InfICL aims to enhance the ICL generalization performance. To keep InfICL cost-effective, we only use the LLM to generate sample input embeddings, avoiding expensive fine-tuning. Through empirical studies on various real-world datasets, we demonstrate advantages of InfICL compared to state-of-the-art baselines.

翻訳日:2024-06-20 04:15:24 公開日:2024-06-17

# オンラインコミュニティにおける人間の価値観の調査

Investigating Human Values in Online Communities ( http://arxiv.org/abs/2402.14177v2 )

ライセンス: Link先を確認

Nadav Borenstein, Arnav Arora, Lucie-Aimée Kaffee, Isabelle Augenstein,

(参考訳) 人間の価値観は社会科学における分析ツールとして重要な役割を果たし、社会全体および個々のコミュニティにおける多様な側面の研究を可能にする。本稿では、個別のオンラインコミュニティで構成されるプラットフォームであるRedditにシュワルツの価値観フレームワークを計算的に適用することを提案し、従来の調査ベースの価値観研究の限界に対処する。 Redditコンテンツに対する自動価値抽出ツールの信頼性を確認したうえで、10,000のサブレディットにわたる600万件の投稿にシュワルツの価値観を自動でアノテーションする。本分析は、さまざまなオンラインコミュニティに広く見られる価値観について、既知の知見と新たな知見の両方を明らかにする。例えば、論争のあるトピックについて意見が異なるサブレディットを調べると、Carnivoresと比べてVeganのサブレディットで普遍主義の価値がより高いことがわかる。さらに、地域別のサブレディットの分析は、伝統的な価値観と米国の保守的な州との相関を浮き彫りにしている。

Human values play a vital role as an analytical tool in social sciences, enabling the study of diverse dimensions within society as a whole and among individual communities. This paper addresses the limitations of traditional survey-based studies of human values by proposing a computational application of Schwartz's values framework to Reddit, a platform organized into distinct online communities. After ensuring the reliability of automated value extraction tools for Reddit content, we automatically annotate six million posts across 10,000 subreddits with Schwartz values. Our analysis unveils both previously recorded and novel insights into the values prevalent within various online communities. For instance, when examining subreddits with differing opinions on controversial topics, we discover higher universalism values in the Vegan subreddit compared to Carnivores. Additionally, our study of geographically specific subreddits highlights the correlation between traditional values and conservative U.S. states.

翻訳日:2024-06-20 04:05:40 公開日:2024-06-17

# Unraveling Babel: LLMの多言語活性化パターンの探索とその応用

Unraveling Babel: Exploring Multilingual Activation Patterns of LLMs and Their Applications ( http://arxiv.org/abs/2402.16367v2 )

ライセンス: Link先を確認

Weize Liu, Yinlong Xu, Hongxia Xu, Jintai Chen, Xuming Hu, Jian Wu,

(参考訳) 近年、大規模言語モデル(LLM)はNLPの分野で大きなブレークスルーを達成しているが、異なる言語を処理する際の内部の活動については依然として理解が不足している。我々は、密なLLMを細粒度のMoEアーキテクチャに変換する手法を設計し、エキスパートの活性化頻度のヒートマップを通じてLLMの多言語活性化パターンを視覚的に調査した。異なるモデルファミリー、異なるモデルサイズ、異なるバリアントに対する包括的な実験を通じて、高頻度で活性化されるエキスパートの分布、多言語で共有されるエキスパート、異なる言語の活性化パターンが語族と関連しているかどうか、および指示チューニングが活性化パターンに与える影響を分析した。さらに、発見したエキスパート活性化頻度の差を利用して、2つの異なる方法で非構造化プルーニングを導くことを検討した。実験結果から、提案手法はランダムなエキスパートプルーニングを大きく上回り、一部の言語ではプルーニング前の元のモデルの性能さえ上回ることが示された。加えて、活性化レベルの違いに基づいて層ごとに異なるプルーニング率を設定すると、より良い結果が得られることがわかった。本研究は、LLM内部の多言語処理メカニズムを明らかにし、その知見を活用してモデルプルーニングなどの応用に新たな視点を提供する。

Recently, large language models (LLMs) have achieved tremendous breakthroughs in the field of NLP, but still lack understanding of their internal activities when processing different languages. We designed a method to convert dense LLMs into fine-grained MoE architectures, and then visually studied the multilingual activation patterns of LLMs through expert activation frequency heatmaps. Through comprehensive experiments on different model families, different model sizes, and different variants, we analyzed the distribution of high-frequency activated experts, multilingual shared experts, whether the activation patterns of different languages are related to language families, and the impact of instruction tuning on activation patterns. We further explored leveraging the discovered differences in expert activation frequencies to guide unstructured pruning in two different ways. Experimental results demonstrated that our method significantly outperformed random expert pruning and even exceeded the performance of the original unpruned models in some languages. Additionally, we found that configuring different pruning rates for different layers based on activation level differences could achieve better results. Our findings reveal the multilingual processing mechanisms within LLMs and utilize these insights to offer new perspectives for applications such as model pruning.

翻訳日:2024-06-20 04:05:40 公開日:2024-06-17

# Sarathi-Serve を用いた LLM 推論におけるスループットとレイテンシのトレードオフの制御

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve ( http://arxiv.org/abs/2403.02310v3 )

ライセンス: Link先を確認

Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee,

(参考訳) LLMサービングの各リクエストは2つのフェーズを経る。1つ目はプリフィルで、入力プロンプト全体を処理して最初の出力トークンを生成する。2つ目はデコードで、残りの出力トークンを1つずつ生成する。プリフィルのイテレーションはレイテンシが高いが、入力プロンプトを並列処理するためGPUの計算資源を飽和させる。対照的に、デコードのイテレーションはレイテンシが低いが、リクエストごとに1トークンしか処理しないため、計算資源の利用率も低い。このため、バッチ処理はデコードに、ひいては全体のスループットに対して非常に効果的である。しかし、複数のリクエストをバッチ化するとプリフィルとデコードのイテレーションが交互に入り混じり、高スループットと低レイテンシの両立が難しくなる。このスループットとレイテンシのトレードオフに対処するために、効率的なLLM推論スケジューラであるSarathi-Serveを導入する。 Sarathi-Serveは、プリフィルリクエストをほぼ同じサイズのチャンクに分割するチャンクドプリフィル(chunked-prefills)を導入し、進行中のデコードを止めることなく新しいリクエストをバッチに追加するストールフリーなスケジュールを生成する。ストールフリーなスケジューリングにより、バッチ処理がレイテンシに与える影響を最小限に抑えながら、大きなバッチサイズでスループットを向上させることが可能になる。さらに、Sarathi-Serveの均一なバッチはイテレーション間の不均衡を緩和し、パイプラインバブルを最小限に抑える。我々の手法は、テールレイテンシの制約下で、さまざまなモデルとハードウェアにおいて推論性能を大幅に改善する。 vLLMと比較して、単一のA100 GPU上のMistral-7Bでは2.6倍、2つのA100 GPU上のYi-34Bモデルでは最大3.7倍のサービング容量を達成する。 Falcon-180Bでパイプライン並列を用いた場合、Sarathi-Serveはエンドツーエンドのサービング容量を最大5.6倍に向上させる。 Sarathi-Serveのソースコードはhttps://github.com/microsoft/sarathi-serveで入手できる。

Each LLM serving request goes through two phases. The first is prefill, which processes the entire input prompt and produces the first output token, and the second is decode, which generates the rest of the output tokens one at a time. Prefill iterations have high latency but saturate GPU compute due to parallel processing of the input prompt. In contrast, decode iterations have low latency but also low compute utilization because a decode iteration processes only a single token per request. This makes batching highly effective for decodes and consequently for overall throughput. However, batching multiple requests leads to an interleaving of prefill and decode iterations, which makes it challenging to achieve both high throughput and low latency. We introduce an efficient LLM inference scheduler, Sarathi-Serve, to address this throughput-latency tradeoff. Sarathi-Serve introduces chunked-prefills, which split a prefill request into near-equal-sized chunks, and creates stall-free schedules that add new requests to a batch without pausing ongoing decodes. Stall-free scheduling unlocks the opportunity to improve throughput with large batch sizes while minimizing the effect of batching on latency. Furthermore, uniform batches in Sarathi-Serve ameliorate the imbalance between iterations, resulting in minimal pipeline bubbles. Our techniques yield significant improvements in inference performance across models and hardware under tail latency constraints. For Mistral-7B on single A100 GPUs, we achieve 2.6x higher serving capacity and up to 3.7x higher serving capacity for the Yi-34B model on two A100 GPUs as compared to vLLM. When used with pipeline parallelism on Falcon-180B, Sarathi-Serve provides up to 5.6x gain in the end-to-end serving capacity. The source code for Sarathi-Serve is available at https://github.com/microsoft/sarathi-serve.

翻訳日:2024-06-20 04:05:40 公開日:2024-06-17
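The scheduling idea in the entry above can be sketched with a toy token-budget scheduler: every iteration first gives each ongoing decode its single token, then fills the leftover budget with chunks of pending prefills. This is not the Sarathi-Serve implementation; the `Request` fields, the `schedule` function, and the per-iteration `token_budget` are hypothetical names chosen for illustration, and real systems also account for memory and latency targets.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    name: str
    prompt_len: int      # tokens that must be prefilled
    output_len: int      # tokens to generate in total
    prefilled: int = 0
    generated: int = 0

    @property
    def in_prefill(self) -> bool:
        return self.prefilled < self.prompt_len

    @property
    def finished(self) -> bool:
        return self.generated >= self.output_len


Batch = List[Tuple[str, str, int]]  # (request name, phase, tokens processed)


def schedule(requests: List[Request], token_budget: int) -> List[Batch]:
    """Run all requests to completion and return the batch of each iteration."""
    if token_budget < 1:
        raise ValueError("token_budget must be at least 1")
    iterations: List[Batch] = []
    while any(not r.finished for r in requests):
        batch: Batch = []
        budget = token_budget
        # 1) Ongoing decodes go first, one token each, so they are not paused
        #    by newly arriving prefills (as long as the budget covers them).
        for r in requests:
            if not r.in_prefill and not r.finished and budget > 0:
                r.generated += 1
                budget -= 1
                batch.append((r.name, "decode", 1))
        # 2) Leftover budget is filled with prefill chunks in arrival order.
        for r in requests:
            if r.in_prefill and budget > 0:
                chunk = min(budget, r.prompt_len - r.prefilled)
                r.prefilled += chunk
                budget -= chunk
                batch.append((r.name, "prefill", chunk))
                if not r.in_prefill:
                    # The final prefill chunk yields the first output token.
                    r.generated += 1
        iterations.append(batch)
    return iterations
```

For example, with request A (4 prompt tokens, 3 output tokens), request B (10 prompt tokens, 2 output tokens), and a budget of 6 tokens per iteration, B's prefill is spread over three iterations while A keeps decoding in each of them instead of waiting for B's whole prompt.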

# OffensiveLang: コミュニティベースの暗黙的な攻撃的言語データセット

OffensiveLang: A Community Based Implicit Offensive Language Dataset ( http://arxiv.org/abs/2403.02472v6 )

ライセンス: Link先を確認

Amit Das, Mostafa Rahgouy, Dongji Feng, Zheng Zhang, Tathagata Bhattacharya, Nilanjana Raychawdhary, Fatemeh Jamshidi, Vinija Jain, Aman Chadha, Mary Sandage, Lauramarie Pope, Gerry Dozier, Cheryl Seals,

(参考訳) ソーシャルメディアに憎悪的な言語が広く存在することは、社会の健全性に悪影響を及ぼしている。そのため、この問題に高い優先度で対処することが非常に重要になっている。ヘイトスピーチや攻撃的な言語には明示的な形と暗黙的な形の両方が存在し、後者のほうが検出が難しい。この領域の現在の研究はいくつかの課題に直面している。第一に、既存のデータセットは主に明示的な攻撃的キーワードを含むテキストの収集に依存しており、そうしたキーワードを含まない暗黙的に攻撃的なコンテンツを捉えることが難しい。第二に、一般的な方法論はテキスト分析のみに焦点を当てる傾向があり、コミュニティ情報がもたらす有益な知見を見落としている。本研究では、ChatGPT 3.5で生成した、38の異なるターゲットグループのデータを含むコミュニティベースの暗黙的な攻撃的言語データセットOffensiveLangを紹介する。倫理的制約のためChatGPTによる攻撃的テキストの生成には制限があるものの、暗黙的な攻撃的言語を効果的に生成するプロンプトベースのアプローチを提示する。データ品質を確保するために、人手でデータセットを評価する。さらに、ChatGPTを用いたプロンプトベースのゼロショット手法を適用し、人間によるアノテーションとChatGPTによるアノテーションの検出結果を比較する。また、既存の最先端モデルがこのような言語の検出にどの程度有効かを調べる。データセットは次のURLで公開されている: https://github.com/AmitDasRup123/OffensiveLang

The widespread presence of hateful language on social media has resulted in adverse effects on societal well-being. As a result, addressing this issue with high priority has become very important. Hate speech and offensive language exist in both explicit and implicit forms, with the latter being more challenging to detect. Current research in this domain encounters several challenges. Firstly, the existing datasets primarily rely on the collection of texts containing explicit offensive keywords, making it challenging to capture implicitly offensive content that is devoid of these keywords. Secondly, common methodologies tend to focus solely on textual analysis, neglecting the valuable insights that community information can provide. In this research paper, we introduce a novel dataset, OffensiveLang, a community-based implicit offensive language dataset generated by ChatGPT 3.5 containing data for 38 different target groups. Despite limitations in generating offensive texts using ChatGPT due to ethical constraints, we present a prompt-based approach that effectively generates implicit offensive language. To ensure data quality, we evaluate the dataset with human evaluators. Additionally, we employ a prompt-based zero-shot method with ChatGPT and compare the detection results between human annotation and ChatGPT annotation. We utilize existing state-of-the-art models to see how effective they are in detecting such language. The dataset is available here: https://github.com/AmitDasRup123/OffensiveLang

翻訳日:2024-06-20 04:05:40 公開日:2024-06-17

# 存在しないものによる攻撃: 「非存在の証明」はDNSリゾルバのCPUを枯渇させうる

Attacking with Something That Does Not Exist: 'Proof of Non-Existence' Can Exhaust DNS Resolver CPU ( http://arxiv.org/abs/2403.15233v2 )

ライセンス: Link先を確認

Olivia Gruza, Elias Heftrig, Oliver Jacobsen, Haya Schulmann, Niklas Vogel, Michael Waidner,

(参考訳) NSEC3はDNSSECにおける非存在の証明であり、問い合わせたリソースが対象ドメインに存在しないことを認証付きで主張する。 NSEC3は、問い合わせたホスト名の前後に位置する、アルファベット順にソートされたハッシュ化名で構成される。辞書攻撃を困難にするためにハッシュ関数を複数回反復して適用できるが、これはNSEC3レコードのSHA-1ハッシュを計算するDNSリゾルバの負荷も増大させる。 NSEC3レコードの計算がDNSリゾルバに与える負荷への懸念は、NSEC3の仕様であるRFC5155とRFC9276ですでに考慮されていた。 2024年2月、NSEC3がDNSリゾルバのリソースを枯渇させる可能性に対してCVE-2023-50868が割り当てられ、NSEC3の追加の反復が大きな負荷を生むことが確認された。しかし、この攻撃の評価は公表されておらず、リゾルバへの影響も明らかにされていなかった。本研究では、DNSリゾルバ実装に対するNSEC3-encloser攻撃の初めての評価を行い、被害側リゾルバがハッシュ反復回数を制限するRFC5155の推奨に従っていても、NSEC3-encloser攻撃がCPU命令数を72倍に増加させうることを確認した。攻撃の影響はDNSリゾルバによって異なるが、十分な量のDNSパケットがあれば、攻撃がCPU負荷を増大させてパケットロスを引き起こしうることを示す。毎秒150件の悪意あるNSEC3レコードのレートでは、DNS実装によって、良性のDNSリクエストの損失率は2.7%から30%の範囲になる。 NSEC3-encloser攻撃の詳細な説明と実装を提供する。また、各NSEC3パラメータがNSEC3-encloser攻撃時に被害側リゾルバに生じる負荷にどのように影響するかについて、初めての分析を行う。

NSEC3 is a proof of non-existence in DNSSEC, which provides an authenticated assertion that a queried resource does not exist in the target domain. NSEC3 consists of alphabetically sorted hashed names before and after the queried hostname. To make dictionary attacks harder, the hash function can be applied in multiple iterations, which, however, also increases the load on the DNS resolver when it computes the SHA-1 hashes in NSEC3 records. Concerns about the load that computing NSEC3 records places on DNS resolvers were already considered in the NSEC3 specifications RFC5155 and RFC9276. In February 2024, the potential of NSEC3 to exhaust DNS resolvers' resources was assigned CVE-2023-50868, confirming that extra iterations of NSEC3 create substantial load. However, there is no published evaluation of the attack, and its impact on resolvers has not been clarified. In this work we perform the first evaluation of the NSEC3-encloser attack against DNS resolver implementations and find that the attack can still create a 72x increase in CPU instruction count, even when the victim resolver follows the RFC5155 recommendations for limiting hash iteration counts. The impact of the attack varies across DNS resolvers, but we show that with a sufficient volume of DNS packets the attack can increase CPU load and cause packet loss. We find that at a rate of 150 malicious NSEC3 records per second, the loss rate of benign DNS requests varies between 2.7% and 30%, depending on the DNS implementation. We provide a detailed description and implementation of the NSEC3-encloser attack. We also develop the first analysis of how each NSEC3 parameter impacts the load inflicted on the victim resolver during an NSEC3-encloser attack.
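To make the per-record cost concrete, the iterated hash described above can be sketched as follows. This is a minimal illustration of the hash calculation as defined in RFC 5155 (Section 5), not the paper's attack implementation; the helper names `to_wire`, `nsec3_hash`, `to_base32hex`, and `sha1_calls` are hypothetical, and the example name, salt, and iteration count are arbitrary.

```python
import base64
import hashlib


def to_wire(name: str) -> bytes:
    """Encode a domain name in lowercase DNS wire format (length-prefixed labels)."""
    labels = [label for label in name.rstrip(".").lower().split(".") if label]
    encoded = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return encoded + b"\x00"  # terminating root label


def nsec3_hash(name: str, salt: bytes, iterations: int) -> bytes:
    """Iterated SHA-1: one initial hash plus `iterations` additional rounds."""
    digest = hashlib.sha1(to_wire(name) + salt).digest()
    for _ in range(iterations):
        # Each extra round re-hashes the previous digest with the salt appended.
        digest = hashlib.sha1(digest + salt).digest()
    return digest


def to_base32hex(digest: bytes) -> str:
    """Render a digest in the base32hex alphabet used for NSEC3 owner names."""
    standard = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
    extended_hex = b"0123456789ABCDEFGHIJKLMNOPQRSTUV"
    table = bytes.maketrans(standard, extended_hex)
    return base64.b32encode(digest).translate(table).decode("ascii").lower()


def sha1_calls(iterations: int) -> int:
    """Number of SHA-1 invocations a validator spends on one hashed name."""
    return iterations + 1


if __name__ == "__main__":
    digest = nsec3_hash("example.com", bytes.fromhex("aabbccdd"), 12)
    print(to_base32hex(digest), sha1_calls(12))
```

The point of the sketch is that the resolver's work per hashed name grows linearly with the iteration count chosen by the zone operator, which is the parameter the abstract identifies as a source of load.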

翻訳日:2024-06-20 03:55:50 公開日:2024-06-17

# Qibo: 漢方医学における大規模言語モデル

Qibo: A Large Language Model for Traditional Chinese Medicine ( http://arxiv.org/abs/2403.16056v2 )

ライセンス: Link先を確認

Heyi Zhang, Xin Wang, Zhaopeng Meng, Zhe Chen, Pengwei Zhuang, Yongzhe Jia, Dawei Xu, Wenbin Guo,

(参考訳) LLM(Large Language Models)は、医学、法律、金融など多くの専門分野において大きな進歩を遂げている。しかし、伝統的な中国医学(TCM)においては、理論と近代医学の本質的な違い、専門的なコーパス資源の欠如、監督された微調整にのみ依存しているという事実は、過度な予測につながる可能性がある。これらの課題に対処するため,継続的事前学習と教師付き微調整を組み合わせた2段階の訓練手法を提案する。本研究の特筆すべき貢献は、TCM専用の2Gbコーパスの処理であり、それぞれTCMのための事前学習データセットと命令微調整データセットを構築している。さらに,主観的,客観的,および3つのTCM NLPタスクを含む,TCMにおけるLLMの性能を評価するツールであるQibo-Benchmarkを開発した。我々のパイプラインでトレーニングされた医療用LLMであるQiboは、大幅なパフォーマンス向上を示します。ベースラインと比較すると、平均主観的勝利率は63%、平均客観的精度は23%から58%向上し、3つのTCM NLPタスクのルージュ-Lスコアは0.72、0.61、0.55である。最後に,QiboをTCMコンサルテーションに適用するためのパイプラインを提案し,ケーススタディを通じてモデル性能を実証する。

Large Language Models (LLMs) have made significant progress in a number of professional fields, including medicine, law, and finance. However, in traditional Chinese medicine (TCM), there are challenges such as the essential differences between TCM theory and modern medicine, the lack of specialized corpus resources, and the fact that relying only on supervised fine-tuning may lead to overconfident predictions. To address these challenges, we propose a two-stage training approach that combines continuous pre-training and supervised fine-tuning. A notable contribution of our study is the processing of a 2Gb corpus dedicated to TCM, from which we construct pre-training and instruction fine-tuning datasets for TCM. In addition, we have developed Qibo-Benchmark, a tool that evaluates the performance of LLMs in TCM along multiple dimensions, including subjective, objective, and three TCM NLP tasks. The medical LLM trained with our pipeline, named Qibo, exhibits significant performance boosts. Compared to the baselines, the average subjective win rate is 63%, the average objective accuracy improves by 23% to 58%, and the Rouge-L scores for the three TCM NLP tasks are 0.72, 0.61, and 0.55. Finally, we propose a pipeline to apply Qibo to TCM consultation and demonstrate the model's performance through a case study.

翻訳日:2024-06-20 03:55:50 公開日:2024-06-17

# 言語モデル非現実的幻覚の機械的理解と緩和

Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations ( http://arxiv.org/abs/2403.18167v2 )

ライセンス: Link先を確認

Lei Yu, Meng Cao, Jackie Chi Kit Cheung, Yue Dong,

(参考訳) State-of-the-art Language Model (LM) は、世界の知識と混同する非現実的な幻覚を生じることがある。これらの幻覚の機械的原因を探るため,主観的関係クエリを用いた診断データセットを作成し,内部モデル表現による幻覚の追跡に解釈可能性手法を適用した。我々は、LM間で共有される幻覚(Llama-2, Pythia, GPT-J)の2つの一般的および別個の機械的原因を発見する。 1)知識豊か化幻覚:下層MLPにおける主観的属性知識の不足、及び 2)回答抽出幻覚:上層アテンションヘッドにおける正しい対象属性の選択に失敗する。また,この2つの幻覚の内的機械的原因が外的症状に反映されていることも判明した。本研究は,機械解析から得られた知見に基づいて,LMの内部事実リコールパイプラインの修復を目標とし,ベースラインよりも優れた性能を示す新しい幻覚緩和手法を提案する。

State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge. To explore the mechanistic causes of these hallucinations, we create diagnostic datasets with subject-relation queries and adapt interpretability methods to trace hallucinations through internal model representations. We discover two general and distinct mechanistic causes of hallucinations shared across LMs (Llama-2, Pythia, GPT-J): 1) knowledge enrichment hallucinations: insufficient subject attribute knowledge in lower layer MLPs, and 2) answer extraction hallucinations: failure to select the correct object attribute in upper layer attention heads. We also find that these two internal mechanistic causes of hallucinations are reflected in external manifestations. Based on insights from our mechanistic analysis, we propose a novel hallucination mitigation method through targeted restoration of the LM's internal fact recall pipeline, demonstrating superior performance compared to baselines.

翻訳日:2024-06-20 03:55:50 公開日:2024-06-17

# Invalsiベンチマーク:イタリア語における大規模言語モデルの言語学的および数学的理解の測定

The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian ( http://arxiv.org/abs/2403.18697v2 )

ライセンス: Link先を確認

Andrea Esuli, Giovanni Puccetti,

(参考訳) イタリア語は高資源言語であるが、この言語ではLarge Language Models (LLM) の生成能力を評価するためのイタリアのネイティブベンチマークはほとんどない。 Invalsi MATEは、イタリア語の数学的理解に基づくモデル性能の評価と、Invalsi ITAはイタリア語の言語理解を評価する。これらのベンチマークは、イタリアの学校システムで6歳から18歳の学生に実施され、教育と教育の専門家によって検証されたInvalsiテストに基づいている。これらのベンチマークを用いて、現在の言語モデルが数学的理解において70%の精度で拘束され、Llama 3 70bと言語理解において85%の精度で達成されていることを示す9つの強力な言語モデルを評価する。また,LLMをイタリアの学生の平均成績と比較したところ,Llama 3がInvalsi MATEの学生より優れているのに対して,ほとんどのモデルはInvalsi ITAの生徒より優れていることがわかった。我々は,LLMの数学的および言語的理解をイタリア語で評価するために,より大規模かつ困難なベンチマークを今後開発する道を開くために,データおよび評価コードを公開する。

While Italian is a high-resource language, there are few Italian-native benchmarks to evaluate the generative abilities of Large Language Models (LLMs) in this language. This work presents two new benchmarks: Invalsi MATE, to evaluate model performance on mathematical understanding in Italian, and Invalsi ITA, to evaluate language understanding in Italian. These benchmarks are based on the Invalsi tests, which are administered to students aged between 6 and 18 within the Italian school system and have been validated by several experts in teaching and pedagogy. We use these benchmarks to evaluate 9 powerful language models, showing that current language models are bounded by 70% accuracy in mathematical understanding, achieved by Llama 3 70b, and by 85% in language understanding. We also compare LLMs with the average performance of Italian students and show that Llama 3 is the only model to perform better than students on Invalsi MATE, while most models outperform students on Invalsi ITA. We will make data and evaluation code openly available to pave the way for the future development of larger and harder benchmarks to evaluate LLMs' mathematical and linguistic understanding in Italian.

翻訳日:2024-06-20 03:55:50 公開日:2024-06-17

# Hammersley-Chapman-Robbins境界による機密性の保証

Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds ( http://arxiv.org/abs/2404.02866v3 )

ライセンス: Link先を確認

Kamalika Chaudhuri, Chuan Guo, Laurens van der Maaten, Saeed Mahloujifar, Mark Tygert,

(参考訳) ディープニューラルネットワークによる推論中のプライバシ保護は、最終分類器や他のタスク固有のレイヤの前に、最後のレイヤのアクティベーションにノイズを加えることで実現される。このような層の活性化は、"features"(一般的には"embeddings"や"feature embeddeds"と呼ばれる)として知られている。ノイズが加わったことで、ノイズのある特徴から入力が復元されるのを防ぐことができる。入力の可能な全ての非バイアス推定器のばらつきを低くすることは、そのような付加ノイズから生じる機密性を定量化する。ハマーズリーとチャップマンとロビンズの古典的不等式(HCR境界)から、連続で計算的に計算可能な境界が利用できる。数値実験により、HCR境界は、画像分類用の10のクラスを含むデータセット "MNIST" と "CIFAR-10" で、小さなニューラルネットに対して有効であることが示唆された。 HCR境界は、標準のディープニューラルネットワークである"ResNet-18"と"Swin-T"を、1000のクラスを含むデータセットである"ImageNet-1000"で事前トレーニングする際の入力の機密性を保証するために、それ自体では不十分であるように見える。 ImageNetの場合、機密性を提供する他の方法による機能へのノイズの追加を補うことは保証される。いずれの場合も, ノイズによる分類精度の低下がほとんどない付加雑音の量について検討した。これにより、画像分類作業の精度を大幅に低下させることなく、秘密性を高めることができる。

Protecting privacy during inference with deep neural networks is possible by adding noise to the activations in the last layers prior to the final classifiers or other task-specific layers. The activations in such layers are known as "features" (or, less commonly, as "embeddings" or "feature embeddings"). The added noise helps prevent reconstruction of the inputs from the noisy features. Lower bounding the variance of every possible unbiased estimator of the inputs quantifies the confidentiality arising from such added noise. Convenient, computationally tractable bounds are available from classic inequalities of Hammersley and of Chapman and Robbins -- the HCR bounds. Numerical experiments indicate that the HCR bounds are on the precipice of being effectual for small neural nets with the data sets, "MNIST" and "CIFAR-10," which contain 10 classes each for image classification. The HCR bounds appear to be insufficient on their own to guarantee confidentiality of the inputs to inference with standard deep neural nets, "ResNet-18" and "Swin-T," pre-trained on the data set, "ImageNet-1000," which contains 1000 classes. Supplementing the addition of noise to features with other methods for providing confidentiality may be warranted in the case of ImageNet. In all cases, the results reported here limit consideration to amounts of added noise that incur little degradation in the accuracy of classification from the noisy features. Thus, the added noise enhances confidentiality without much reduction in the accuracy on the task of image classification.
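For orientation, the general shape of such a bound can be written out. The block below is a sketch under stated assumptions rather than the paper's exact formulation: it assumes isotropic Gaussian noise of standard deviation $\sigma$ added to the features, and the symbols $x$, $f$, $z$, $\hat{x}$, and $\Delta$ are notation introduced here for illustration. The first inequality is the classical Hammersley-Chapman-Robbins bound for an unbiased estimator; the second expression is the standard chi-square divergence between two Gaussians with equal covariance.

```latex
% Assumed notation (not taken from the abstract):
%   x        : input to the network,  f(x) : its features
%   z        : noisy features, z = f(x) + \sigma \epsilon with \epsilon \sim N(0, I)
%   \hat{x}  : any unbiased estimator of x computed from z
%   \Delta   : a perturbation of the input, P_x the distribution of z given x
\[
  \operatorname{Var}\bigl(\hat{x}_j(z)\bigr)
  \;\ge\;
  \sup_{\Delta \,:\, \Delta_j \neq 0}
  \frac{\Delta_j^{2}}{\chi^{2}\!\left(P_{x+\Delta} \,\middle\|\, P_{x}\right)},
  \qquad
  \chi^{2}\!\left(P_{x+\Delta} \,\middle\|\, P_{x}\right)
  = \exp\!\left(\frac{\lVert f(x+\Delta) - f(x) \rVert_2^{2}}{\sigma^{2}}\right) - 1 .
\]
```

Under these assumptions, a larger $\sigma$ or a feature map that changes little under perturbations of the input makes the denominator smaller and the guaranteed variance larger; letting $\Delta \to 0$ recovers a Cramér-Rao-type bound.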

翻訳日:2024-06-20 01:55:10 公開日:2024-06-17

# ChangeMamba:時空間空間モデルによるリモートセンシング変化検出

ChangeMamba: Remote Sensing Change Detection with Spatio-Temporal State Space Model ( http://arxiv.org/abs/2404.03425v4 )

ライセンス: Link先を確認

Hongruixuan Chen, Jian Song, Chengxi Han, Junshi Xia, Naoto Yokoya,

(参考訳) 畳み込みニューラルネットワーク(CNN)とトランスフォーマーは、リモートセンシング変化検出(CD)の分野で目覚ましい進歩を遂げた。しかし、両方のアーキテクチャには固有の欠点がある。CNNは、より広い空間的コンテキストをキャプチャする能力を阻害する、限定的な受容的フィールドによって制約されている一方で、Transformerは計算集約的であり、大規模なデータセット上でトレーニングとデプロイにコストがかかる。近年、状態空間モデルに基づくMambaアーキテクチャは、上記の2つのアーキテクチャの欠点を効果的に補うことができる一連の自然言語処理タスクにおいて、顕著な性能を示している。本稿では,リモートセンシングCDタスクにおけるMambaアーキテクチャの可能性について検討する。我々は,2値変化検出 (BCD), 意味変化検出 (SCD), 建物損傷評価 (BDA) に対応するフレームワークであるMambaBCD, MambaSCD, MambaBDAを調整した。 3つのフレームワークはいずれも最先端のVisual Mambaアーキテクチャをエンコーダとして採用しており、入力画像からグローバルな空間的情報を完全に学習することができる。 3つのアーキテクチャで利用可能な変更デコーダについて,Mambaアーキテクチャと自然に結合可能な3つの時空間関係モデリング機構を提案し,その特性をフル活用して複数時空間特徴の時空間相互作用を実現し,正確な変更情報を得る。 5つのベンチマークデータセットにおいて、提案するフレームワークは、複雑なトレーニング戦略やトリックを使わずに、現在のCNNおよびTransformerベースのアプローチより優れており、CDタスクにおけるMambaアーキテクチャの可能性を完全に実証している。さらなる実験は、アーキテクチャが劣化したデータに対して非常に堅牢であることを示している。ソースコードはhttps://github.com/ChenHongruixuan/MambaCDで入手できる。

Convolutional neural networks (CNNs) and Transformers have made impressive progress in the field of remote sensing change detection (CD). However, both architectures have inherent shortcomings: CNNs are constrained by a limited receptive field that may hinder their ability to capture broader spatial contexts, while Transformers are computationally intensive, making them costly to train and deploy on large datasets. Recently, the Mamba architecture, based on state space models, has shown remarkable performance in a series of natural language processing tasks and can effectively compensate for the shortcomings of the above two architectures. In this paper, we explore for the first time the potential of the Mamba architecture for remote sensing CD tasks. We tailor the corresponding frameworks, called MambaBCD, MambaSCD, and MambaBDA, for binary change detection (BCD), semantic change detection (SCD), and building damage assessment (BDA), respectively. All three frameworks adopt the cutting-edge Visual Mamba architecture as the encoder, which allows full learning of global spatial contextual information from the input images. For the change decoder, which is present in all three architectures, we propose three spatio-temporal relationship modeling mechanisms, which can be naturally combined with the Mamba architecture and fully utilize its attributes to achieve spatio-temporal interaction of multi-temporal features, thereby obtaining accurate change information. On five benchmark datasets, our proposed frameworks outperform current CNN- and Transformer-based approaches without using any complex training strategies or tricks, fully demonstrating the potential of the Mamba architecture in CD tasks. Further experiments show that our architecture is quite robust to degraded data. The source code will be available at https://github.com/ChenHongruixuan/MambaCD

翻訳日:2024-06-20 01:55:10 公開日:2024-06-17

# 白人男性、黒人女性が助ける? LLMで言語機関の社会的バイアスをベンチマーク

White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs ( http://arxiv.org/abs/2404.10508v2 )

ライセンス: Link先を確認

Yixin Wan, Kai-Wei Chang,

(参考訳) 言語エージェンシーは、テキストにおける社会的偏見を評価する上で重要な側面である。いくつかの研究が人文言語におけるエージェンシー関連バイアスに近づいた一方で、LLM(Large Language Model)生成コンテンツにおけるそのようなバイアスについて、非常に限定的な研究がなされている。さらに、過去の研究は、しばしばテキスト内のエージェント語とコミュニティブ語を識別する文字列マッチング技術に依存しており、それは言語エージェンシーを正確に分類するに足らない。本稿では,言語庁バイアス評価(LABE, Language Agency Bias Evaluation)ベンチマークについて紹介する。 LABEは5,400のテンプレートベースのプロンプト、正確なエージェンシー分類器、およびそれに対応するバイアスメトリクスを利用して、3つのテキスト生成タスク(バイオグラフィー、教授レビュー、参照レター)でLSMの性別、人種、および交叉言語エージェンシーバイアスをテストする。 3,724のエージェント文と共用文からなるLanguage Agency Classification (LAC)データセットを,より良く,より正確な自動エージェント分類器の構築に寄与し,リリースする。 LABEを用いて,近年の3つのLLM(ChatGPT,Llama3,Mistral)において,未探索言語エージェンシーの社会的偏見を明らかにした。 1)同一のテキストカテゴリでは,LLM世代は人文テキストよりもジェンダーバイアスのレベルが高く,(2)ほとんどの世代タスクでは,モデルが他のバイアスのレベルよりもはるかに高い交叉バイアスのレベルを示す。性別と人種の少数派(黒人女性など)の交差点にいる人々は、一貫して低レベルの機関を持つテキストによって記述されている; (3) 調査された3つのLSMのうち、Llama3は言語エージェンシーにおいて最大の全体的なバイアスを示す; (4) プロンプトベースの緩和はLLMにおける言語エージェンシーのバイアスを解決するのに失敗するだけでなく、しばしば生成されたテキストにおけるバイアスが悪化する。

Language agency is an important aspect of evaluating social biases in texts. While several studies have approached agency-related bias in human-written language, very limited research has investigated such biases in Large Language Model (LLM)-generated content. In addition, previous research often relies on string-matching techniques to identify agentic and communal words within texts, which fall short of accurately classifying language agency. We introduce the novel Language Agency Bias Evaluation (LABE) benchmark, which comprehensively evaluates biases in LLMs by analyzing agency levels attributed to different demographic groups in model generations. LABE leverages 5,400 template-based prompts, an accurate agency classifier, and corresponding bias metrics to test for gender, racial, and intersectional language agency biases in LLMs on 3 text generation tasks: biographies, professor reviews, and reference letters. To build better and more accurate automated agency classifiers, we also contribute and release the Language Agency Classification (LAC) dataset, consisting of 3,724 agentic and communal sentences. Using LABE, we unveil previously under-explored language agency social biases in 3 recent LLMs: ChatGPT, Llama3, and Mistral. We observe that: (1) For the same text category, LLM generations demonstrate higher levels of gender bias than human-written texts; (2) On most generation tasks, models show remarkably higher levels of intersectional bias than of the other bias aspects. Those who are at the intersection of gender and racial minority groups, such as Black females, are consistently described by texts with lower levels of agency; (3) Among the 3 LLMs investigated, Llama3 demonstrates the greatest overall bias in language agency; (4) Not only does prompt-based mitigation fail to resolve language agency bias in LLMs, but it frequently exacerbates biases in generated texts.

翻訳日:2024-06-20 01:44:57 公開日:2024-06-17

# ガウス・スティング・デコーダによる3次元対応型生成逆数ネットワークの構築

Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks ( http://arxiv.org/abs/2404.10625v2 )

ライセンス: Link先を確認

Florian Barthel, Arian Beckmann, Wieland Morgenstern, Anna Hilsmann, Peter Eisert,

(参考訳) EG3D や GIRAFFE のような NeRF ベースの3D-aware Generative Adversarial Networks (GAN) は、非常に高いレンダリング品質を示す。第一に、NeRFレンダリングの計算上の重要な要求は、モバイルやVR/ARヘッドセットのような低消費電力デバイスでの使用を妨げます。第二に、ニューラルネットワークに基づく暗黙の表現は、VR環境やビデオゲームのような明示的な3Dシーンに組み込むのは難しい。 3D Gaussian Splatting (3DGS)は、高フレームレートで効率的にレンダリングできる明示的な3D表現を提供することによって、これらの制限を克服する。本研究では,NeRFをベースとした3次元GANの高画質化と,3DGSの柔軟性と計算上の利点を組み合わせた新しい手法を提案する。暗黙的なNeRF表現を明示的な3Dガウススプラッティング属性にマッピングするデコーダをトレーニングすることにより、3Dガウススプラッティングのエコシステムに3D GANの表現多様性と品質を初めて組み込むことができる。さらに,本手法により,高分解能GANインバージョンとリアルタイムGAN編集が可能となる。プロジェクトページ:florian-barthel.github.io/gaussian_decoder

NeRF-based 3D-aware Generative Adversarial Networks (GANs) like EG3D or GIRAFFE have shown very high rendering quality under large representational variety. However, rendering with Neural Radiance Fields poses challenges for 3D applications: First, the significant computational demands of NeRF rendering preclude its use on low-power devices, such as mobile phones and VR/AR headsets. Second, implicit representations based on neural networks are difficult to incorporate into explicit 3D scenes, such as VR environments or video games. 3D Gaussian Splatting (3DGS) overcomes these limitations by providing an explicit 3D representation that can be rendered efficiently at high frame rates. In this work, we present a novel approach that combines the high rendering quality of NeRF-based 3D-aware GANs with the flexibility and computational advantages of 3DGS. By training a decoder that maps implicit NeRF representations to explicit 3D Gaussian Splatting attributes, we can integrate the representational diversity and quality of 3D GANs into the ecosystem of 3D Gaussian Splatting for the first time. Additionally, our approach allows for high-resolution GAN inversion and real-time GAN editing with 3D Gaussian Splatting scenes. Project page: florian-barthel.github.io/gaussian_decoder

翻訳日:2024-06-20 01:44:57 公開日:2024-06-17

# NLPモデルの潜在概念に基づく説明

Latent Concept-based Explanation of NLP Models ( http://arxiv.org/abs/2404.12545v2 )

ライセンス: Link先を確認

Xuemin Yu, Fahim Dalvi, Nadir Durrani, Marzia Nouri, Hassan Sajjad,

(参考訳) ディープラーニングモデルによる予測の解釈と理解は、本質的に不透明な性質のため、非常に難しい課題となる。これらの予測を説明することを目的とした以前の取り組みの多くは、入力機能、特にNLPモデル内の単語に依存していた。しかし、これらの説明は、これらの単語の離散的な性質と文脈的冗長性の欠如により、あまり意味を示さないことが多い。この制限に対処するために、潜伏概念に基づく予測のための説明を生成するLACOAT(Latent Concept Attribution Method)を導入する。我々の基本的な直感は、単語が使用されるコンテキストに基づいて複数のファセットを表現できることである。したがって、文脈において単語が与えられた場合、トレーニングプロセスから派生した潜在空間はその単語の特定の面を反映する。 LACOATは、有能な入力語の表現をトレーニング潜在空間にマッピングすることで、予測の潜在文脈に基づく説明を提供することによって機能する。

Interpreting and understanding the predictions made by deep learning models poses a formidable challenge due to their inherently opaque nature. Many previous efforts aimed at explaining these predictions rely on input features, specifically, the words within NLP models. However, such explanations are often less informative due to the discrete nature of these words and their lack of contextual verbosity. To address this limitation, we introduce the Latent Concept Attribution method (LACOAT), which generates explanations for predictions based on latent concepts. Our foundational intuition is that a word can exhibit multiple facets, contingent upon the context in which it is used. Therefore, given a word in context, the latent space derived from our training process reflects a specific facet of that word. LACOAT functions by mapping the representations of salient input words into the training latent space, allowing it to provide latent context-based explanations of the prediction.

翻訳日:2024-06-20 01:44:57 公開日:2024-06-17

# BiLO: PDE逆問題に対するバイレベルローカル演算子学習

BiLO: Bilevel Local Operator Learning for PDE inverse problems ( http://arxiv.org/abs/2404.17789v2 )

ライセンス: Link先を確認

Ray Zirui Zhang, Xiaohui Xie, John Lowengrub,

(参考訳) 本稿では、PDE逆問題を二段階最適化問題として定式化することにより、偏微分方程式(PDE)の逆問題の解法を提案する。上層部ではPDEパラメータに関してデータ損失を最小限に抑える。下層部では、与えられたPDEパラメータの近傍でPDE解演算子を局所的に近似するようにニューラルネットワークを訓練し、上層部最適化問題に対する降下方向の正確な近似を可能にする。下位レベル損失関数は、PDEパラメータに対する残差と微分の両方のL2ノルムを含む。上層と下層の両方の最適化問題に勾配勾配を同時に適用し,有効かつ高速なアルゴリズムを実現する。この手法はBiLO(Bilevel Local Operator Learning)と呼ばれ、補助変数の導入によってPDE内の未知の関数を効率的に推論することができる。複数のPDEシステムに対する広範な実験により,本手法は強いPDE制約を強制し,スパースかつノイズの多いデータに対して堅牢であり,既存手法のソフトPDE制約に固有の残差とデータ損失のバランスを取る必要がなくなることを示した。

We propose a new neural network based method for solving inverse problems for partial differential equations (PDEs) by formulating the PDE inverse problem as a bilevel optimization problem. At the upper level, we minimize the data loss with respect to the PDE parameters. At the lower level, we train a neural network to locally approximate the PDE solution operator in the neighborhood of a given set of PDE parameters, which enables an accurate approximation of the descent direction for the upper level optimization problem. The lower level loss function includes the L2 norms of both the residual and its derivative with respect to the PDE parameters. We apply gradient descent simultaneously on both the upper and lower level optimization problems, leading to an effective and fast algorithm. The method, which we refer to as BiLO (Bilevel Local Operator learning), is also able to efficiently infer unknown functions in the PDEs through the introduction of an auxiliary variable. Through extensive experiments over multiple PDE systems, we demonstrate that our method enforces strong PDE constraints, is robust to sparse and noisy data, and eliminates the need to balance the residual and the data loss, which is inherent to the soft PDE constraints in many existing methods.
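The two levels described above can be written schematically as follows. This is one reading of the abstract, not the paper's exact formulation: the symbols $\theta$, $W$, $u$, $\mathcal{F}$, $\hat{u}$, and the weight $w_{\mathrm{grad}}$ are notation assumed here for illustration, and the block requires the `amsmath` package.

```latex
% Assumed notation (hypothetical, for illustration only):
%   \theta       : PDE parameters to be inferred
%   W            : neural network weights
%   u(x;\theta,W): network approximation of the PDE solution at parameters \theta
%   \mathcal{F}  : PDE residual operator,  \hat{u} : observed data
%   w_{\mathrm{grad}} > 0 : weight on the residual-gradient term
\begin{align*}
  \text{upper level:}\quad
    & \min_{\theta}\; \mathcal{L}_{\mathrm{data}}\bigl(\theta, W^{*}(\theta)\bigr)
      = \bigl\lVert u(\cdot\,;\theta, W^{*}(\theta)) - \hat{u} \bigr\rVert_2^{2}, \\
  \text{lower level:}\quad
    & W^{*}(\theta) = \arg\min_{W}\;
      \bigl\lVert \mathcal{F}\bigl(u(\cdot\,;\theta, W);\theta\bigr) \bigr\rVert_2^{2}
      + w_{\mathrm{grad}}\,
      \bigl\lVert \nabla_{\theta}\,\mathcal{F}\bigl(u(\cdot\,;\theta, W);\theta\bigr) \bigr\rVert_2^{2}.
\end{align*}
```

In this reading, the gradient term in the lower level asks the residual to stay small in a neighborhood of the current $\theta$, which is what lets the network act as a local solution operator and supply a usable descent direction for the upper level.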

翻訳日:2024-06-20 01:44:57 公開日:2024-06-17

# モンテカルロ木探索が反復推論学習による推論を強化

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning ( http://arxiv.org/abs/2405.00451v2 )

ライセンス: Link先を確認

Yuxi Xie, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy P. Lillicrap, Kenji Kawaguchi, Michael Shieh,

(参考訳) 我々は,AlphaZero が採用した戦略に触発された反復的選好学習プロセスを通じて,Large Language Models (LLM) の推論能力の向上を目的としたアプローチを導入する。我々の研究は、MCTS(Monte Carlo Tree Search)を利用して好みデータを反復的に収集し、そのルックアヘッド機能を利用して、インスタンスレベルの報酬をよりきめ細かいステップレベルの信号に分解する。中間段階の整合性を高めるため, 結果検証と段階的自己評価を併用し, 新たに生成したデータの品質評価を継続的に更新する。提案アルゴリズムはDPO(Direct Preference Optimization)を用いて,新たに生成されたステップレベルの優先度データを用いてLLMポリシーを更新する。理論的分析は、自己改善を成功させるために、オンラインサンプルデータを使用することの重要性を明らかにしている。様々な算術的および常識的推論タスクに対する広範囲な評価は、既存のモデルよりも顕著な性能向上を示している。例えば、我々のアプローチはGSM8K、MATH、ARC-CにおいてMistral-7B Supervised Fine-Tuning(SFT)ベースラインを上回り、精度はそれぞれ81.8%(+5.9%)、34.7%(+5.8%)、76.4%(+15.8%)へと大幅に向上している。さらに、我々の研究は、トレーニングと推論計算のトレードオフを掘り下げ、我々の方法がパフォーマンス向上を効果的に最大化する方法についての洞察を提供する。私たちのコードは https://github.com/YuxiXie/MCTS-DPO で公開されています。

We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals. To enhance consistency in intermediate steps, we combine outcome validation and stepwise self-evaluation, continually updating the quality assessment of newly generated data. The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data. Theoretical analysis reveals the importance of using on-policy sampled data for successful self-improving. Extensive evaluations on various arithmetic and commonsense reasoning tasks demonstrate remarkable performance improvements over existing models. For instance, our approach outperforms the Mistral-7B Supervised Fine-Tuning (SFT) baseline on GSM8K, MATH, and ARC-C, with substantial increases in accuracy to 81.8% (+5.9%), 34.7% (+5.8%), and 76.4% (+15.8%), respectively. Additionally, our research delves into the training and inference compute tradeoff, providing insights into how our method effectively maximizes performance gains. Our code is publicly available at https://github.com/YuxiXie/MCTS-DPO.
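As a rough illustration of the policy update mentioned above, the standard DPO loss for a single preference pair can be sketched as below. This is the generic DPO objective, not the paper's step-level variant; it assumes the preferred and rejected reasoning steps have already been selected (for example by tree search), and the function names, log-probability arguments, and `beta` default are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (chosen, rejected) pair given sequence log-probabilities.

    The implicit reward of a response is beta * (log pi(y|x) - log pi_ref(y|x));
    the loss is -log(sigmoid(reward_chosen - reward_rejected)).
    """
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Hypothetical log-probabilities for a preferred and a rejected step.
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

The loss falls as the policy raises the log-probability of the preferred step relative to the reference model and lowers that of the rejected one, which is the signal the collected preference pairs provide.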

翻訳日:2024-06-20 01:35:12 公開日:2024-06-17

# グラフにおける単音素半空間学習のための効率的なアルゴリズム

Efficient Algorithms for Learning Monophonic Halfspaces in Graphs ( http://arxiv.org/abs/2405.00853v2 )

ライセンス: Link先を確認

Marco Bressan, Emmanuel Esposito, Maximilian Thiessen,

(参考訳) グラフの頂点上で二項分類器を学習する問題について検討する。特に、ある抽象的な意味で凸である頂点の分割である単音素半空間によって与えられる分類子を考える。単音素半空間や測地的半空間のような関連する概念は、最近関心を集め、それらの性質(例えば、VC次元)と基礎となるグラフの$G$の構造の間にいくつかの接続が引かれた。我々は、教師付き、オンライン、アクティブな設定において、モノフォニックなハーフスペースを学習するためのいくつかの新しい結果を証明した。我々の主な結果は、n = |V(G)|$ の時間多項式において、単音素半空間は、ほぼ最適のパッシブサンプル複雑性で学習できるということである。これにより、単調な半空間に関するいくつかの構造的洞察に基づいて、一貫した仮説チェックのための多項式時間アルゴリズムを考案し、満足度を2ドルに下げる必要がある。オンラインおよびアクティブな設定でも同様の結果が得られます。また、概念クラスは遅延$\operatorname{poly}(n)$で列挙でき、経験的リスク最小化は2.^{\omega(G)}\operatorname{poly}(n)$で、$\omega(G)$は$G$の斜め数であることを示す。これらの結果は、文献(Gonz\'alez et al , 2020)からのオープンな質問に答え、これらの問題のいくつかがNPハードである測地空間との対比を示す(Seiffarth et al , 2023)。

We study the problem of learning a binary classifier on the vertices of a graph. In particular, we consider classifiers given by monophonic halfspaces, partitions of the vertices that are convex in a certain abstract sense. Monophonic halfspaces, and related notions such as geodesic halfspaces, have recently attracted interest, and several connections have been drawn between their properties (e.g., their VC dimension) and the structure of the underlying graph $G$. We prove several novel results for learning monophonic halfspaces in the supervised, online, and active settings. Our main result is that a monophonic halfspace can be learned with near-optimal passive sample complexity in time polynomial in $n = |V(G)|$. This requires us to devise a polynomial-time algorithm for consistent hypothesis checking, based on several structural insights on monophonic halfspaces and on a reduction to $2$-satisfiability. We prove similar results for the online and active settings. We also show that the concept class can be enumerated with delay $\operatorname{poly}(n)$, and that empirical risk minimization can be performed in time $2^{\omega(G)}\operatorname{poly}(n)$ where $\omega(G)$ is the clique number of $G$. These results answer open questions from the literature (González et al., 2020), and show a contrast with geodesic halfspaces, for which some of the said problems are NP-hard (Seiffarth et al., 2023).

翻訳日:2024-06-20 01:35:12 公開日:2024-06-17

# SurfPro:連続表面に基づくタンパク質の機能設計

SurfPro: Functional Protein Design Based on Continuous Surface ( http://arxiv.org/abs/2405.06693v2 )

ライセンス: Link先を確認

Zhenqiao Song, Tinglin Huang, Lei Li, Wengong Jin,

(参考訳) 所望の機能を持つタンパク質をどうやって設計できるのか? 我々は、幾何学的構造と生化学的性質の両方がタンパク質の機能に重要であるという化学的直感に動機付けられている。本稿では,期待表面の機能性タンパク質の生成法であるSurfProとその生化学的性質について述べる。 SurfProは、タンパク質表面の幾何学的形状及び生化学的特徴を段階的にモデル化する階層エンコーダと、アミノ酸配列を生成する自己回帰デコーダとを備える。本稿では,標準的な逆フォールディングベンチマークCATH 4.2でSurfProを評価し,タンパク質結合体設計と酵素設計の2つの機能的タンパク質設計タスクについて検討した。我々のSurfProは、従来の逆フォールディング法を一貫して上回り、CATH 4.2で57.78%の回復率、タンパク質-タンパク質結合と酵素-基質相互作用のスコアで高い成功率を達成した。

How can we design proteins with desired functions? We are motivated by a chemical intuition that both geometric structure and biochemical properties are critical to a protein's function. In this paper, we propose SurfPro, a new method to generate functional proteins given a desired surface and its associated biochemical properties. SurfPro comprises a hierarchical encoder that progressively models the geometric shape and biochemical features of a protein surface, and an autoregressive decoder to produce an amino acid sequence. We evaluate SurfPro on a standard inverse folding benchmark CATH 4.2 and two functional protein design tasks: protein binder design and enzyme design. Our SurfPro consistently surpasses previous state-of-the-art inverse folding methods, achieving a recovery rate of 57.78% on CATH 4.2 and higher success rates in terms of protein-protein binding and enzyme-substrate interaction scores.

翻訳日:2024-06-20 01:35:12 公開日:2024-06-17

# 機能的に重要な部位と小分子の基質によって誘導される生成酵素設計

Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates ( http://arxiv.org/abs/2405.08205v2 )

ライセンス: Link先を確認

Zhenqiao Song, Yunlong Zhao, Wenxian Shi, Wengong Jin, Yang Yang, Lei Li,

(参考訳) 酵素は、化学反応を加速できる遺伝子コード化された生体触媒である。機能性酵素をどのように設計するか? 本稿では,酵素設計のための統一モデルであるEnzyGenを提案する。我々のキーとなるアイデアは、酵素のアミノ酸配列とその3次元(3D)座標を、所望の触媒機能に対応する機能的に重要な部位と基質に基づいて生成することである。これらの部位は酵素データベースから自動的に採掘される。 EnzyGenは、タンパク質配列全体における長距離相関と、3D空間における最も近いアミノ酸の局所的影響の両方を捉える、新しいインターリービングネットワークと近隣の同変層で構成されている。生成モデルを学習するために、配列生成損失、位置予測損失、酵素-基質相互作用損失を含む共同学習目標を考案する。さらに、タンパク質データバンク(PDB)内のすべての利用可能な酵素をカバーする3157の酵素ファミリーを持つデータセットであるEnzyBenchを構築した。実験の結果、EnzyGenは323の試験ファミリで一貫して最高のパフォーマンスを達成し、基質結合親和性の点で10.79%のベースラインを上回りました。これらの結果から, 高い親和性を有する特定の基質に結合する, 十分に折りたたみされた, 効果的な酵素を設計する上で, EnzyGenが優れていることが示唆された。

Enzymes are genetically encoded biocatalysts capable of accelerating chemical reactions. How can we automatically design functional enzymes? In this paper, we propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families. Our key idea is to generate an enzyme's amino acid sequence and its three-dimensional (3D) coordinates based on functionally important sites and substrates corresponding to a desired catalytic function. These sites are automatically mined from enzyme databases. EnzyGen consists of a novel interleaving network of attention and neighborhood equivariant layers, which captures both long-range correlation in an entire protein sequence and local influence from the nearest amino acids in 3D space. To learn the generative model, we devise a joint training objective, including a sequence generation loss, a position prediction loss and an enzyme-substrate interaction loss. We further construct EnzyBench, a dataset with 3157 enzyme families, covering all available enzymes within the protein data bank (PDB). Experimental results show that our EnzyGen consistently achieves the best performance across all 323 testing families, surpassing the best baseline by 10.79% in terms of substrate binding affinity. These findings demonstrate EnzyGen's superior capability in designing well-folded and effective enzymes that bind to specific substrates with high affinities.

翻訳日:2024-06-20 01:35:12 公開日:2024-06-17

# $\varepsilon$-fairnessの不公平

The Unfairness of $\varepsilon$-Fairness ( http://arxiv.org/abs/2405.09360v2 )

ライセンス: Link先を確認

Tolulope Fadina, Thorsten Schmidt,

(参考訳) 意思決定プロセスの公平性は確率的指標を用いて定量化されることが多い。しかし、これらの指標は、実際の不公平な結果を完全には捉えていないかもしれない。本稿では,意思決定プロセスの現実的影響をより正確に測定するために,ユーティリティベースのアプローチを採用する。特に、$\varepsilon$-fairnessという概念が採用された場合、現実世界の文脈で最大に不公平な結果をもたらす可能性があることを示す。さらに, 虚偽陰性に関する不使用データの一般的な問題に対して, 重要な公平性を考慮した設定の削減を提案する。本研究は,大学入学と信用リスク評価の2つの実例を用いて実施した。分析の結果,従来の確率に基づく評価は公平性を示唆するが,実用性に基づくアプローチは真に平等を達成するために必要な行動を明らかにする。例えば,大学入試の場合,修了率の向上は公平性の確保に不可欠であることが判明した。本論文は, 公平性を評価する上で, 現実の文脈を考えることの重要性を強調した。

Fairness in decision-making processes is often quantified using probabilistic metrics. However, these metrics may not fully capture the real-world consequences of unfairness. In this article, we adopt a utility-based approach to more accurately measure the real-world impacts of decision-making processes. In particular, we show that if the concept of $\varepsilon$-fairness is employed, it can lead to outcomes that are maximally unfair in the real-world context. Additionally, we address the common issue of unavailable data on false negatives by proposing a reduced setting that still captures essential fairness considerations. We illustrate our findings with two real-world examples: college admissions and credit risk assessment. Our analysis reveals that while traditional probability-based evaluations might suggest fairness, a utility-based approach uncovers the actions necessary to truly achieve equality. For instance, in the college admission case, we find that enhancing completion rates is crucial for ensuring fairness. In summary, this paper highlights the importance of considering the real-world context when evaluating fairness.

翻訳日:2024-06-20 01:35:12 公開日:2024-06-17

# Tiny Refinements Elicit Resilience: LLM Red-Teamingに対する効率的なプレフィックスモデルに向けて

Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming ( http://arxiv.org/abs/2405.12604v2 )

ライセンス: Link先を確認

Jiaxu Liu, Xiangyu Yin, Sihao Wu, Jianhong Wang, Meng Fang, Xinping Yi, Xiaowei Huang,

(参考訳) 大規模言語モデル(LLM)のレッドチーム戦略の普及に伴い,LLM防衛戦略の安全性と堅牢性向上に関する文献の不足がますます顕著になっている。本稿では,LLMをベースとした \textbf{sentinel} モデルを,入力プロンプトをわずかな($<30$)追加トークンで再構成し,ターゲットLLMからの応答の毒性を効果的に低減するプラグイン・アンド・プレイのプレフィックスモジュールとして導入する。センチネルモデルは、大規模なターゲットモデルを微調整する際の \textit{parameter inefficiency} と \textit{limited model accessibility} を自然に克服する。我々はPPO(Proximal Policy Optimization)を用いてレッドチームとセンチネルモデルの両方を動的に最適化するインターリーブ型トレーニング方式を採用し、エージェント間の複雑な相互作用を管理するために、マルチエージェントの中央集権的批評家にインスパイアされたvalue head共有メカニズムを取り入れている。text-to-textおよびtext-to-imageにわたる広範な実験により、\texttt{Llama-2}, \texttt{GPT-3.5}, \texttt{Stable-Diffusion}のような大規模モデルを扱う場合であっても、有害な出力を緩和する我々のアプローチの有効性が実証された。これは、さまざまなアプリケーションの安全性とロバスト性を高める上での我々のフレームワークの可能性を示している。

With the proliferation of red-teaming strategies for Large Language Models (LLMs), the deficiency in the literature about improving the safety and robustness of LLM defense strategies is becoming increasingly pronounced. This paper introduces the LLM-based \textbf{sentinel} model as a plug-and-play prefix module designed to reconstruct the input prompt with just a few ($<30$) additional tokens, effectively reducing toxicity in responses from target LLMs. The sentinel model naturally overcomes the \textit{parameter inefficiency} and \textit{limited model accessibility} for fine-tuning large target models. We employ an interleaved training regimen using Proximal Policy Optimization (PPO) to optimize both red team and sentinel models dynamically, incorporating a value head-sharing mechanism inspired by the multi-agent centralized critic to manage the complex interplay between agents. Our extensive experiments across text-to-text and text-to-image demonstrate the effectiveness of our approach in mitigating toxic outputs, even when dealing with larger models like \texttt{Llama-2}, \texttt{GPT-3.5} and \texttt{Stable-Diffusion}, highlighting the potential of our framework in enhancing safety and robustness in various applications.

翻訳日:2024-06-20 01:25:27 公開日:2024-06-17

# Occam Gradient Descent

Occam Gradient Descent ( http://arxiv.org/abs/2405.20194v2 )

ライセンス: Link先を確認

B. N. Kausik,

(参考訳) ディープラーニングニューラルネットワークモデルは、問題領域に適応するのに十分な大きさでなければならないが、勾配降下時のトレーニングデータの過度な適合を回避するには十分である。これらの競合する要求のバランスをとるために、トランスフォーマーのような過剰な予測されたディープラーニングモデルは、大きなデータセット上で1つのエポックのために訓練されるため、コンピューティングリソースとトレーニングデータの両方で非効率である。これらの非効率性に対応するために、我々は学習理論を利用してOccam Gradient Descentを導出する。Occam Gradient Descentはモデルサイズを適応的に減少させ、一般化誤差を最小限に抑えるアルゴリズムである。対照的に、従来の勾配降下は、一般化誤差によらず、嵌合誤差を極度に最小化する。提案アルゴリズムは, ニューラルネットワークの重み空間とトポロジカルサイズを同時に下降させるとともに, 従来の勾配勾配よりも精度, 計算, モデル圧縮に優れる。

Deep learning neural network models must be large enough to adapt to their problem domain, while small enough to avoid overfitting training data during gradient descent. To balance these competing demands, overprovisioned deep learning models such as transformers are trained for a single epoch on large data sets, and are hence inefficient in their use of both computing resources and training data. In response to these inefficiencies, we exploit learning theory to derive Occam Gradient Descent, an algorithm that interleaves adaptive reduction of model size to minimize generalization error, with gradient descent on model weights to minimize fitting error. In contrast, traditional gradient descent greedily minimizes fitting error without regard to generalization error. Our algorithm simultaneously descends the space of weights and topological size of any neural network without modification, and in our experiments it outperforms traditional gradient descent, with or without post-train pruning, in accuracy, compute and model compression.
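The idea of interleaving model-size reduction with ordinary gradient steps can be sketched on a toy linear model. This is not the authors' algorithm: the magnitude-based pruning rule, the pruning schedule, and the fixed floor of three weights are assumptions standing in for the learning-theoretic criterion described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: only 3 of 20 features matter (illustrative setup).
n_samples, n_features = 200, 20
X = rng.normal(size=(n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:3] = [2.0, -1.5, 1.0]
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

w = rng.normal(scale=0.1, size=n_features)
mask = np.ones(n_features, dtype=bool)   # True = weight is still part of the model
lr, prune_every, prune_fraction = 0.05, 50, 0.25
min_active = 3                           # assumed floor, not a learned stopping rule

for step in range(1, 501):
    # Gradient step on the fitting error (mean squared error).
    grad = 2.0 * X.T @ (X @ w - y) / n_samples
    w -= lr * grad
    w[~mask] = 0.0                       # pruned weights stay at zero

    # Periodically shrink the model: drop the smallest-magnitude active weights.
    if step % prune_every == 0 and mask.sum() > min_active:
        active = np.flatnonzero(mask)
        n_drop = min(int(len(active) * prune_fraction), len(active) - min_active)
        drop = active[np.argsort(np.abs(w[active]))[:n_drop]]
        mask[drop] = False
        w[drop] = 0.0

mse = float(np.mean((X @ w - y) ** 2))
print("active weights:", int(mask.sum()))
print("mse:", mse)
```

A faithful implementation would decide when and how much to prune from an estimate of generalization error rather than from a fixed schedule.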

翻訳日:2024-06-20 01:15:43 公開日:2024-06-17

# MODABS:動的アスペクトに基づく要約のための多目的学習

MODABS: Multi-Objective Learning for Dynamic Aspect-Based Summarization ( http://arxiv.org/abs/2406.03479v2 )

ライセンス: Link先を確認

Xiaobo Guo, Soroush Vosoughi,

(参考訳) オンラインコンテンツの急速な普及は、動的なアスペクトベースの要約が目立つ効果的な要約方法を必要とする。既知のアスペクトの固定セットを前提とする従来のものとは異なり、このアプローチは入力テキストのさまざまな側面に適応する。本稿では,Longformer-Encoder-Decoderを用いた新しい多目的学習フレームワークを提案する。このフレームワークはアスペクト数予測を最適化し、各アスペクトに対する生成された要約と参照の相違を最小化し、アスペクト固有の要約間の相違を最大化する。大規模な実験により,本手法は,単一アスペクトの要約品質を犠牲にすることなく,生成されたアスペクトと参照アスペクトの効果的なアライメントによって,3つの多様なデータセットのベースラインを著しく上回ることがわかった。

The rapid proliferation of online content necessitates effective summarization methods, among which dynamic aspect-based summarization stands out. Unlike its traditional counterpart, which assumes a fixed set of known aspects, this approach adapts to the varied aspects of the input text. We introduce a novel multi-objective learning framework employing a Longformer-Encoder-Decoder for this task. The framework optimizes aspect number prediction, minimizes disparity between generated and reference summaries for each aspect, and maximizes dissimilarity across aspect-specific summaries. Extensive experiments show our method significantly outperforms baselines on three diverse datasets, largely due to the effective alignment of generated and reference aspect counts without sacrificing single-aspect summarization quality.

翻訳日:2024-06-20 01:15:43 公開日:2024-06-17

# Feriji: フランスのZarma Parallel Corpus, Glossary & Translator

Feriji: A French-Zarma Parallel Corpus, Glossary & Translator ( http://arxiv.org/abs/2406.05888v2 )

ライセンス: Link先を確認

Mamadou K. Keita, Elysabhete Amadou Ibrahim, Habibatou Abdoulaye Alfari, Christopher Homan,

(参考訳) 近年,機械翻訳(MT)が急速に発展し,複数の言語を精度良く翻訳できるモデルの開発が進んでいる。しかし、この分野におけるアフリカの言語の表現は、言語的な複雑さと限られた資源のために改善する必要がある。これは、ニジェールと近隣諸国で500万人以上の人々が話していたソンハイ語(ニロ・サハラ語族)の方言であるザーマ語に当てはまる。本稿では,Zarmaの61,085文,フランス語42,789文,および4,062語からなる用語集が,Zarmaのさらなるリソースの必要性に対処するための重要なステップであることを示す。我々はデータセット上で3つの大きな言語モデルを微調整し、最高の性能モデルでBLEUスコア30.06を得る。さらに, 流布, 理解, 可読性の人的判断に関するモデルと, コーパスとモデルの重要性と影響について検討した。私たちの貢献は、重要な言語ギャップを埋め、本質的で見落とされたアフリカの先住民言語を促進するのに役立ちます。

Machine translation (MT) is a rapidly expanding field that has experienced significant advancements in recent years with the development of models capable of translating multiple languages with remarkable accuracy. However, the representation of African languages in this field still needs to improve due to linguistic complexities and limited resources. This applies to the Zarma language, a dialect of Songhay (of the Nilo-Saharan language family) spoken by over 5 million people across Niger and neighboring countries \cite{lewis2016ethnologue}. This paper introduces Feriji, the first robust French-Zarma parallel corpus and glossary designed for MT. The corpus, containing 61,085 sentences in Zarma and 42,789 in French, and a glossary of 4,062 words represent a significant step in addressing the need for more resources for Zarma. We fine-tune three large language models on our dataset, obtaining a BLEU score of 30.06 on the best-performing model. We further evaluate the models on human judgments of fluency, comprehension, and readability and the importance and impact of the corpus and models. Our contributions help to bridge a significant language gap and promote an essential and overlooked indigenous African language.

翻訳日:2024-06-20 01:05:59 公開日:2024-06-17

# 速度ゆらぎ下でのクロスマシントランスファー故障診断における解釈可能な変調可能なSTFTと物理インフォームドバランススペクトル測定

Interpretable modulated differentiable STFT and physics-informed balanced spectrum metric for freight train wheelset bearing cross-machine transfer fault diagnosis under speed fluctuations ( http://arxiv.org/abs/2406.11917v1 )

ライセンス: Link先を確認

Chao He, Hongmei Shi, Ruixin Li, Jianbo Li, ZuJun Yu,

(参考訳) 輪車軸受の運転条件は、鉄道重貨物列車の安全運転に直接的な影響を与えている。しかし, 列車の速度変動と断層試料が少ないことが, 故障診断の精度を抑える主な問題である。そこで, 解釈可能な可変可変短時間フーリエ変換(STFT)と物理インフォームドスペクトル品質測定を併用したクロスマシントランスファー診断(pyDSN)ネットワークを提案し, 時間変化速度下でのドメイン不変および識別的特徴を学習した。まず,固定窓を用いた時間変化速度信号の抽出周波数成分の抽出が不十分なため,STFTインフォームド理論サポートと解釈可能な変調可微分STFT (MDSTFT) が提案され,堅牢な時間周波数スペクトル (TFS) を抽出する。トレーニングプロセス中、異なる長さの複数のウィンドウが動的に変化する。また, 分類基準と領域差測度に加えて, 物理インフォームド計量と呼ばれる第3の種類の計量を創造的に導入し, 伝送可能TFSを向上する。 MDSTFTとモデルのための最適化方向を導出するために,物理インフォームド平衡スペクトル品質(BSQ)正規化損失を考案した。これにより、高品質のTFSをモデルにできるだけでなく、物理に制限されたドメイン適応ネットワークも取得でき、現実世界の物理知識を学習し、最終的には異なるデータセット間でドメインの不一致を減少させることができる。この実験は、実験室のデータセットから貨物列車のデータセットへの移行シナリオにおいて行われ、ハイブリッド駆動のpyDSNが既存の手法より優れ、実用的な価値を持っていることを示す。

The service condition of wheelset bearings, as key components, has a direct impact on the safe operation of heavy-haul railway freight trains. However, speed fluctuation of the trains and the scarcity of fault samples are the two main problems that restrict the accuracy of bearing fault diagnosis. Therefore, a cross-machine transfer diagnosis network (pyDSN), coupled with an interpretable modulated differentiable short-time Fourier transform (STFT) and a physics-informed balanced spectrum quality metric, is proposed to learn domain-invariant and discriminative features under time-varying speeds. First, because fixed windows are insufficient for extracting the frequency components of time-varying speed signals, a modulated differentiable STFT (MDSTFT), which is interpretable and has STFT-informed theoretical support, is proposed to extract a robust time-frequency spectrum (TFS). During training, multiple windows of different lengths change dynamically. In addition to the classification metric and the domain discrepancy metric, we introduce a third kind of metric, referred to as the physics-informed metric, to enhance the transferable TFS. A physics-informed balanced spectrum quality (BSQ) regularization loss is devised to guide the optimization direction for the MDSTFT and the model. With it, the model not only acquires a high-quality TFS but also becomes a physics-restricted domain adaptation network that learns real-world physics knowledge and ultimately diminishes the domain discrepancy across different datasets. The experiment is conducted in the scenario of migrating from laboratory datasets to a freight train dataset, indicating that the hybrid-driven pyDSN outperforms existing methods and has practical value.
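The motivation for using several window lengths can be seen with a plain, non-learned STFT. The sketch below is not the MDSTFT from the paper: the windows are fixed Hann windows, nothing is differentiable or trained, and the chirp signal and sampling rate are assumptions standing in for a bearing vibration signal under varying speed.

```python
import numpy as np


def stft_magnitude(signal, window_length, hop):
    """Magnitude STFT with a Hann window; returns an array of shape (frames, freq_bins)."""
    window = np.hanning(window_length)
    n_frames = 1 + (len(signal) - window_length) // hop
    frames = np.stack([
        signal[i * hop: i * hop + window_length] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))


fs = 1000                                  # sampling rate in Hz (assumed)
t = np.arange(2000) / fs                   # two seconds of samples
# Chirp whose instantaneous frequency rises from 50 Hz to about 210 Hz.
signal = np.sin(2 * np.pi * (50 * t + 40 * t ** 2))

for window_length in (64, 256, 512):
    spec = stft_magnitude(signal, window_length, hop=window_length // 4)
    print(window_length, spec.shape, f"{fs / window_length:.2f} Hz per bin")
```

Short windows give many frames with coarse frequency bins, long windows give the opposite, which is the trade-off that a learnable window length is meant to navigate.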

翻訳日:2024-06-20 00:46:12 公開日:2024-06-17

# 専門家の混在に対するグラフ知識蒸留

Graph Knowledge Distillation to Mixture of Experts ( http://arxiv.org/abs/2406.11919v1 )

ライセンス: Link先を確認

Pavel Rumiantsev, Mark Coates,

(参考訳) 精度の面では、ノード分類タスクにおいて、グラフニューラルネットワーク(GNN)が最適なアーキテクチャ選択である。現実のデプロイメントにおける彼らの欠点は、近隣の処理操作から生じるレイテンシである。遅延問題の1つの解決策は、訓練されたGNNからMulti-Layer Perceptron (MLP)への知識蒸留を行うことである。しかし, 従来の知識蒸留技術では, トランスダクティブ・インダクティブ・セッティングでの性能は相容れない。 MLPの代わりに特別設計の学生モデルを用いて性能問題に対処することを提案する。我々のモデルはRubM(Rubing-by-Memory)と呼ばれ、Mixture-of-Experts(MoE)の一種であり、専門家の専門化を強制する設計である。隠れ表現空間上の特定の領域を専門化することを各専門家に促すことにより、複数のデータセット間でより一貫性のあるパフォーマンスを導出できることを実験的に実証する。

In terms of accuracy, Graph Neural Networks (GNNs) are the best architectural choice for the node classification task. Their drawback in real-world deployment is the latency that emerges from the neighbourhood processing operation. One solution to the latency issue is to perform knowledge distillation from a trained GNN to a Multi-Layer Perceptron (MLP), where the MLP processes only the features of the node being classified (and possibly some pre-computed structural information). However, the performance of such MLPs in both transductive and inductive settings remains inconsistent for existing knowledge distillation techniques. We propose to address the performance concerns by using a specially designed student model instead of an MLP. Our model, named Routing-by-Memory (RbM), is a form of Mixture-of-Experts (MoE), with a design that enforces expert specialization. By encouraging each expert to specialize in a certain region of the hidden representation space, we demonstrate experimentally that it is possible to derive considerably more consistent performance across multiple datasets.
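One way to picture experts that specialize in regions of the hidden space is hard routing to the nearest expert key. The sketch below is only an illustration of that general idea, not the RbM architecture: the expert keys, the nearest-key rule, and the linear expert heads are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, n_experts, n_nodes, n_classes = 8, 4, 6, 3

# Each expert owns a key (a "memory" slot) in the hidden space and a linear head.
expert_keys = rng.normal(size=(n_experts, hidden_dim))
expert_weights = rng.normal(size=(n_experts, hidden_dim, n_classes))


def route_and_predict(h):
    """Send each hidden vector to its nearest expert key and apply that expert's head."""
    # Squared Euclidean distance from every node to every expert key.
    dists = ((h[:, None, :] - expert_keys[None, :, :]) ** 2).sum(axis=-1)
    chosen = dists.argmin(axis=1)                       # hard top-1 routing
    logits = np.einsum("nd,ndc->nc", h, expert_weights[chosen])
    return chosen, logits


hidden = rng.normal(size=(n_nodes, hidden_dim))
chosen, logits = route_and_predict(hidden)
print("expert per node:", chosen)
print("logits shape:", logits.shape)
```

Because routing depends only on where a node's representation falls, nodes in the same region are always handled by the same expert.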

翻訳日:2024-06-20 00:46:12 公開日:2024-06-17

# Job-SDF: ジョブスキル需要予測とベンチマークのためのマルチグラニュラリティデータセット

Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking ( http://arxiv.org/abs/2406.11920v1 )

ライセンス: Link先を確認

Xi Chen, Chuan Qin, Chuyu Fang, Chao Wang, Chen Zhu, Fuzhen Zhuang, Hengshu Zhu, Hui Xiong,

(参考訳) 急速に発展する雇用市場では、政策立案者や企業が変化を予測し、適応し、労働力のスキルが市場のニーズに合致することを保証し、生産性と競争力を高めるため、スキル需要予測が不可欠である。さらに、新たなスキル要件を特定することで、個人を関連するトレーニングや教育機会に誘導し、継続的な自己学習と開発を促進する。しかし、包括的なデータセットが存在しないことは、研究とこの分野の進歩を妨げる重要な課題である。このギャップを埋めるため、ジョブスキル需要予測モデルをトレーニングし、ベンチマークするためのデータセットであるJob-SDFを提示する。2021年から2023年の間に中国の大手オンライン求人プラットフォームから収集された1035万件の求人広告に基づいて、このデータセットは521社にまたがる2324種類のスキルの月次求人需要を含んでいる。本データセットは,職業,企業,地域レベルなど,さまざまな粒度でのスキル需要予測モデルの評価を可能にする。我々は、このデータセット上のさまざまなモデルをベンチマークし、標準シナリオ、低い値範囲に焦点をあてた予測、構造変化が存在する場合のそれぞれにおける性能を評価し、さらなる研究のための新たな洞察を提供する。私たちのコードとデータセットはhttps://github.com/Job-SDF/benchmarkから公開されています。

In a rapidly evolving job market, skill demand forecasting is crucial as it enables policymakers and businesses to anticipate and adapt to changes, ensuring that workforce skills align with market needs, thereby enhancing productivity and competitiveness. Additionally, by identifying emerging skill requirements, it directs individuals towards relevant training and education opportunities, promoting continuous self-learning and development. However, the absence of comprehensive datasets presents a significant challenge, impeding research and the advancement of this field. To bridge this gap, we present Job-SDF, a dataset designed to train and benchmark job-skill demand forecasting models. Based on 10.35 million public job advertisements collected from major online recruitment platforms in China between 2021 and 2023, this dataset encompasses monthly recruitment demand for 2,324 types of skills across 521 companies. Our dataset uniquely enables evaluating skill demand forecasting models at various granularities, including occupation, company, and regional levels. We benchmark a range of models on this dataset, evaluating their performance in standard scenarios, in predictions focused on lower value ranges, and in the presence of structural breaks, providing new insights for further research. Our code and dataset are publicly accessible at https://github.com/Job-SDF/benchmark.

翻訳日:2024-06-20 00:46:12 公開日:2024-06-17

# 交通予測のための時空間変圧器の再考:多段階多視点学習フレームワーク

Rethinking Spatio-Temporal Transformer for Traffic Prediction:Multi-level Multi-view Augmented Learning Framework ( http://arxiv.org/abs/2406.11921v1 )

ライセンス: Link先を確認

Jiaqi Lin, Qianqian Ren,

(参考訳) 交通予測は、非常に複雑な時空間相関を伴う時空間予測問題である。本稿では,交通予測のためのマルチレベル多視点時空間変換器(LVSTformer)を提案する。このモデルは、地理的、グローバルセマンティック、ピボットノードの3つの異なるレベルから空間的依存関係を、長期および短期の時間的依存関係とともにキャプチャすることを目的としている。具体的には,局所的,大域的,重要なノードの観点から空間情報を探索するための3つの空間的拡張ビューを設計する。 3つの空間的拡張ビューと3つの平行な空間的自己アテンションメカニズムを組み合わせることで、モデルは異なるレベルの空間的依存関係を包括的にキャプチャすることができる。本研究では,長期的・短期的依存関係を効果的に把握するゲート型時間的自己注意機構を設計する。さらに、2つの時空間層の間に時空間放送モジュールを導入し、注意点の分散配置を確実にし、過度な適合と情報損失を軽減し、モデルの一般化能力と堅牢性を高める。実験結果は,LVSTformerが競合するベースラインと比較して最先端の性能を達成し,最大4.32%まで向上したことを示す。

Traffic prediction is a challenging spatio-temporal forecasting problem that involves highly complex spatio-temporal correlations. This paper proposes a Multi-level Multi-view Augmented Spatio-temporal Transformer (LVSTformer) for traffic prediction. The model aims to capture spatial dependencies at three different levels (local geographic, global semantic, and pivotal nodes), along with long- and short-term temporal dependencies. Specifically, we design three spatial augmented views to explore spatial information from the perspectives of local, global, and pivotal nodes. By combining the three spatial augmented views with three parallel spatial self-attention mechanisms, the model can comprehensively capture spatial dependencies at different levels. We design a gated temporal self-attention mechanism to effectively capture long- and short-term temporal dependencies. Furthermore, a spatio-temporal context broadcasting module is introduced between two spatio-temporal layers to ensure a well-distributed allocation of attention scores, alleviating overfitting and information loss, and enhancing the generalization ability and robustness of the model. A comprehensive set of experiments is conducted on six well-known traffic benchmarks. The experimental results demonstrate that LVSTformer achieves state-of-the-art performance compared to competing baselines, with a maximum improvement of 4.32%.

翻訳日:2024-06-20 00:46:12 公開日:2024-06-17

# ソーシャルメディア予測の分類と実際の市場データによる予測の検証による財務専門家の信用度の評価

Explainable assessment of financial experts' credibility by classifying social media forecasts and checking the predictions with actual market data ( http://arxiv.org/abs/2406.11924v1 )

ライセンス: Link先を確認

Silvia García-Méndez, Francisco de Arriba-Pérez, Jaime González-Gonzáleza, Francisco J. González-Castaño,

(参考訳) ソーシャルメディアには、ユーザーの人気に関連する多様なインタラクションメトリクスが含まれており、最も顕著な例は、ユーザーのフォロワー数である。後者は、最も人気のあるクリエーターによる投稿の信頼性に関する懸念を提起している。しかしながら、ソーシャルメディアにおける信頼性を評価する既存のアプローチのほとんどは、この問題を、実際の現実の事実がユーザのコメントを返却するかどうかを確認することなく、しばしば優先順位情報に基づくバイナリ分類であると厳密にみなしている。また、信頼を育むための予測について、自動的な説明は提供していない。本研究では,自然言語処理と機械学習を組み合わせたソーシャルメディア上での財務担当者に対する信頼性評価ソリューションを提案する。コントリビュータの評判は、資産価値の予測をタイプ別に自動的に分類し、これらの予測を実際の市場データで検証し、成功の確率を近似することで評価される。この検証の結果は、バイナリ結果ではなく、継続的な信頼性スコアであり、この研究によるまったく新しい貢献である。さらに、ソーシャルメディアのメトリクス(すなわちユーザコンテキスト)は、信頼度ランキングとの相関を計算し、ファイナンシャルポストにおけるエンドユーザの関心と予測(すなわち、ドロップまたはアップ)に関する洞察を提供することによって活用される。最後に、関係する特徴のモデルに依存しない分析に基づいて、その決定に関する自然言語による説明を提供する。

Social media include diverse interaction metrics related to user popularity, the most evident example being the number of user followers. The latter has raised concerns about the credibility of the posts by the most popular creators. However, most existing approaches to assess credibility in social media strictly consider this problem a binary classification, often based on a priori information, without checking if actual real-world facts back the users' comments. In addition, they do not provide automatic explanations of their predictions to foster their trustworthiness. In this work, we propose a credibility assessment solution for financial creators in social media that combines Natural Language Processing and Machine Learning. The reputation of the contributors is assessed by automatically classifying their forecasts on asset values by type and verifying these predictions with actual market data to approximate their probability of success. The outcome of this verification is a continuous credibility score instead of a binary result, an entirely novel contribution by this work. Moreover, social media metrics (i.e., user context) are exploited by calculating their correlation with the credibility rankings, providing insights on the interest of the end-users in financial posts and their forecasts (i.e., drop or rise). Finally, the system provides natural language explanations of its decisions based on a model-agnostic analysis of relevant features.
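The step of checking forecasts against market data and turning the result into a continuous score can be sketched as follows. The record fields, the rule for counting a forecast as correct, and the smoothed success rate are all hypothetical simplifications, not the scoring used in the paper, and the prices are made-up values.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    author: str
    asset: str
    direction: str        # "rise" or "drop", as classified from the post
    price_at_post: float
    price_later: float    # market price at the end of the forecast horizon


def is_correct(f: Forecast) -> bool:
    """A forecast counts as correct if the price moved in the predicted direction."""
    change = f.price_later - f.price_at_post
    return change > 0 if f.direction == "rise" else change < 0


def credibility(forecasts, prior_hits=1, prior_misses=1):
    """Smoothed success rate in [0, 1]; the prior keeps short histories near 0.5."""
    hits = sum(is_correct(f) for f in forecasts)
    return (hits + prior_hits) / (len(forecasts) + prior_hits + prior_misses)


history = [
    Forecast("alice", "AAA", "rise", 100.0, 110.0),   # correct
    Forecast("alice", "BBB", "drop", 50.0, 45.0),     # correct
    Forecast("alice", "CCC", "rise", 20.0, 18.0),     # wrong
]
print(round(credibility(history), 2))
```

A continuous score of this kind lets authors be ranked against each other, which a binary credible or not-credible label cannot do.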

翻訳日:2024-06-20 00:46:12 公開日:2024-06-17

# DocCGen: ドキュメントベースの制御コード生成

DocCGen: Document-based Controlled Code Generation ( http://arxiv.org/abs/2406.11925v1 )

ライセンス: Link先を確認

Sameer Pimparkhede, Mehant Kammakomati, Srikanth G. Tamilselvam, Prince Kumar, Ashok Pon Kumar, Pushpak Bhattacharyya,

(参考訳) 近年の進歩により、Large Language Models (LLM) は、C++、Java、Pythonといったリソースに富む汎用言語のためのコード生成に、自然言語(NL)で最先端のパフォーマンスをもたらすことが示されている。しかし、YAMLやJSONのような構造化ドメイン固有言語(DSL)に対する実践的な利用は、事前トレーニング中に一般的にLLMによって見つからないドメイン固有スキーマ、文法、カスタマイズによって制限される。この課題を、関連する例や微調整を通じて、コンテキスト内学習を通じて軽減する努力がなされている。しかし、DSLサンプルの制限や迅速な感度といった問題に悩まされているが、企業はDSLの優れたドキュメントを維持している。そこで我々は,構造化コード言語のためのNL-to-Code生成タスクを2段階のプロセスに分解することで,このような豊富な知識を活用できるフレームワークDocCGenを提案する。まず、NLクエリに最もよくマッチするライブラリドキュメントを使用して、正しいライブラリを検出する。次に、これらのライブラリのドキュメントから抽出したスキーマルールを使用して、デコードを制限する。我々は、Ansible YAML と Bash という2つの複雑な構造化言語に対して、アウト・オブ・ドメイン(OOD)とイン・ドメイン(ID)の2つの設定からなるフレームワークを評価した。我々の広範な実験により、DocCGenは6つの評価指標のすべてで異なるサイズの言語モデルを一貫して改善し、構造化コードにおける構文的および意味的誤りを低減します。制約付きコード生成の研究を動機付けるために、データセットとコードをオープンソース化する予定です。

Recent developments show that Large Language Models (LLMs) produce state-of-the-art performance on natural language (NL) to code generation for resource-rich general-purpose languages like C++, Java, and Python. However, their practical usage for structured domain-specific languages (DSLs) such as YAML and JSON is limited due to domain-specific schema, grammar, and customizations generally unseen by LLMs during pre-training. Efforts have been made to mitigate this challenge via in-context learning through relevant examples or by fine-tuning. However, these approaches suffer from problems such as limited DSL samples and prompt sensitivity, even though enterprises maintain good documentation of the DSLs. Therefore, we propose DocCGen, a framework that can leverage such rich knowledge by breaking the NL-to-Code generation task for structured code languages into a two-step process. First, it detects the correct libraries using the library documentation that best matches the NL query. Then, it utilizes schema rules extracted from the documentation of these libraries to constrain the decoding. We evaluate our framework for two complex structured languages, Ansible YAML and Bash command, in two settings: Out-of-domain (OOD) and In-domain (ID). Our extensive experiments show that DocCGen consistently improves different-sized language models across all six evaluation metrics, reducing syntactic and semantic errors in structured code. We plan to open-source the datasets and code to motivate research in constrained code generation.
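The second step, constraining decoding with schema rules, can be pictured as filtering the model's candidates against the set of keys the detected library allows. The sketch below is a toy stand-in rather than DocCGen's implementation: the schema dictionary is a hand-written assumption (not parsed from real documentation) and the candidate scores are made-up numbers in place of model probabilities.

```python
# Hypothetical schema "extracted from documentation": allowed argument keys per module.
SCHEMA = {
    "copy": {"src", "dest", "mode"},
    "service": {"name", "state", "enabled"},
}


def constrained_pick(module, scored_candidates):
    """Return the highest-scoring candidate key that the module's schema allows."""
    allowed = SCHEMA[module]
    valid = [(score, key) for key, score in scored_candidates.items() if key in allowed]
    if not valid:
        raise ValueError(f"no candidate is valid for module {module!r}")
    return max(valid)[1]


# Stand-in for model scores over next-token candidates (made-up numbers).
model_scores = {"source": 0.6, "src": 0.3, "dest": 0.1}
print(constrained_pick("copy", model_scores))
```

The unconstrained choice would be the plausible but invalid key `source`; the schema filter forces a key that the module actually accepts.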

翻訳日:2024-06-20 00:46:12 公開日:2024-06-17

# REPOEXEC: Repository-Level Executableベンチマークによるコード生成の評価

REPOEXEC: Evaluate Code Generation with a Repository-Level Executable Benchmark ( http://arxiv.org/abs/2406.11927v1 )

ライセンス: Link先を確認

Nam Le Hai, Dung Manh Nguyen, Nghi D. Q. Bui,

(参考訳) CodeLLMが \textit{repository-level scale} で実行可能かつ機能的に正しいコードを生成する能力は、ほとんど探索されていない。本稿では、リポジトリレベルのスケールでコード生成を評価するための新しいベンチマークであるREPOEXECを導入し、実行可能性と正確性を重視する。REPOEXECは、要求を検証する自動システムを提供し、高カバレッジのテストケースを動的に生成して生成コードの機能を評価するメカニズムを組み込んでいる。本研究では、開発者が必要なコード依存関係を指定し、モデルにそれらを正確に統合させるという、制御されたシナリオを検討する。実験によると、事前訓練されたLLMは命令チューニングモデルよりも正確性で優れるが、後者は提供された依存関係の活用とデバッグ能力の面で優れている。REPOEXECは、コードの機能性と開発者の意図との整合性を包括的に評価することを目的としている。

The ability of CodeLLMs to generate executable and functionally correct code at the \textit{repository-level scale} remains largely unexplored. We introduce REPOEXEC, a novel benchmark for evaluating code generation at the repository-level scale, emphasizing executability and correctness. REPOEXEC provides an automated system that verifies requirements and incorporates a mechanism for dynamically generating high-coverage test cases to assess the functionality of generated code. Our work explores a controlled scenario where developers specify necessary code dependencies, challenging the model to integrate these accurately. Experiments show that while pretrained LLMs outperform instruction-tuning models in correctness, the latter excel in utilizing provided dependencies and demonstrating debugging capabilities. REPOEXEC aims to provide a comprehensive evaluation of code functionality and alignment with developer intent, paving the way for more reliable and applicable CodeLLMs in real-world scenarios.

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# FlexCare: フレキシブルなマルチモーダルヘルスケア予測のためのクロスタスクシナジーを活用する

FlexCare: Leveraging Cross-Task Synergy for Flexible Multimodal Healthcare Prediction ( http://arxiv.org/abs/2406.11928v1 )

ライセンス: Link先を確認

Muhao Xu, Zhenfeng Zhu, Youru Li, Shuai Zheng, Yawei Zhao, Kunlun He, Yao Zhao,

(参考訳) マルチモーダル電子健康記録(EHR)データは、患者の健康状態の総合的な評価を提供し、様々な予測医療タスクをサポートする。近年,医療領域におけるマルチタスク学習のアプローチを取り入れた研究がいくつかある。しかし、既存の手法ではサンプルが全てのタスクに対して完全なラベルを持つ必要があるため、データに強い要求を課し、モデルの柔軟性を制限する。一方、マルチモーダルな入力を持つマルチタスクフレームワークでは、モーダル間およびタスク間の情報格差を包括的に考慮する方法は依然として難しい問題である。これらの課題に対処するために,不完全なマルチモーダル入力を柔軟に受け入れ,複数の医療タスクへの適応を促進する,\textbf{FlexCare}と呼ばれる統合医療予測モデルを提案する。提案モデルは,従来の並列マルチタスク予測のパラダイムを,一連の非同期な単一タスク予測に分解することで破る。具体的には、タスクに依存しないマルチモーダル情報抽出モジュールを提示し、多様なモーダル内およびモーダル間パターンの非相関表現をキャプチャする。異なるモダリティと異なるタスク間の情報格差をフルに考慮し、洗練されたモダリティレベルの表現を個別の患者レベルの表現に統合するタスク誘導型階層型マルチモーダル融合モジュールを提案する。MIMIC-IV/MIMIC-CXR/MIMIC-NOTEデータセットによる複数のタスクの実験結果から,提案手法の有効性が示された。さらに、さらなる分析は、医療領域でそのようなマルチタスク戦略を採用する実現可能性と可能性を示している。ソースコードはhttps://github.com/mhxu1998/FlexCareで入手できる。

Multimodal electronic health record (EHR) data can offer a holistic assessment of a patient's health status, supporting various predictive healthcare tasks. Recently, several studies have embraced the multitask learning approach in the healthcare domain, exploiting the inherent correlations among clinical tasks to predict multiple outcomes simultaneously. However, existing methods require samples to possess complete labels for all tasks, which places heavy demands on the data and restricts the flexibility of the model. Meanwhile, within a multitask framework with multimodal inputs, how to comprehensively consider the information disparity among modalities and among tasks remains a challenging problem. To tackle these issues, a unified healthcare prediction model, named \textbf{FlexCare}, is proposed to flexibly accommodate incomplete multimodal inputs, promoting adaptation to multiple healthcare tasks. The proposed model breaks the conventional paradigm of parallel multitask prediction by decomposing it into a series of asynchronous single-task predictions. Specifically, a task-agnostic multimodal information extraction module is presented to capture decorrelated representations of diverse intra- and inter-modality patterns. Taking full account of the information disparities between different modalities and different tasks, we present a task-guided hierarchical multimodal fusion module that integrates the refined modality-level representations into an individual patient-level representation. Experimental results on multiple tasks from MIMIC-IV/MIMIC-CXR/MIMIC-NOTE datasets demonstrate the effectiveness of the proposed method. Additionally, further analysis underscores the feasibility and potential of employing such a multitask strategy in the healthcare domain. The source code is available at https://github.com/mhxu1998/FlexCare.

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# 集団限界外におけるノイズ付きSVGDの長時間漸近挙動

Long-time asymptotics of noisy SVGD outside the population limit ( http://arxiv.org/abs/2406.11929v1 )

ライセンス: Link先を確認

Victor Priser, Pascal Bianchi, Adil Salim,

(参考訳) Stein Variational Gradient Descent (SVGD) は、機械学習の分野で広く使われているサンプリングアルゴリズムである。SVGDは、対象の分布を近似するために相互作用する粒子(サンプルを表す)の集合を反復的に移動する。SVGDとその変種の複雑性に関する最近の研究にもかかわらず、その長時間の漸近的挙動(つまり、多数回の反復後の挙動)は、粒子数が有限の場合には理解されていない。本稿では、SVGDのノイズ付き変種の長時間の漸近挙動について検討する。まず、反復回数が大きいときのノイズ付きSVGDの極限集合がwell-definedであることを示す。次に、この極限集合を特徴付け、粒子数の増加とともにターゲット分布に近づくことを示す。特に、ノイズ付きSVGDは、SVGDで観測される分散崩壊を証明可能な形で回避する。我々のアプローチは、ノイズ付きSVGDの軌道がMcKean-Vlasov過程によって記述される軌道とよく似ていることを示すものである。

Stein Variational Gradient Descent (SVGD) is a widely used sampling algorithm that has been successfully applied in several areas of Machine Learning. SVGD operates by iteratively moving a set of interacting particles (which represent the samples) to approximate the target distribution. Despite recent studies on the complexity of SVGD and its variants, their long-time asymptotic behavior (i.e., after numerous iterations) is still not understood in the regime of a finite number of particles. We study the long-time asymptotic behavior of a noisy variant of SVGD. First, we establish that the limit set of noisy SVGD for a large number of iterations is well-defined. We then characterize this limit set, showing that it approaches the target distribution as the number of particles increases. In particular, noisy SVGD provably avoids the variance collapse observed for SVGD. Our approach involves demonstrating that the trajectories of noisy SVGD closely resemble those described by a McKean-Vlasov process.
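The particle update that SVGD iterates can be written compactly for one-dimensional particles with an RBF kernel. The sketch below uses the standard SVGD direction; the added Gaussian perturbation is only an illustrative noise term and is not claimed to match the noisy variant analyzed in the paper. The target distribution, step size, bandwidth, and noise scale are assumptions.

```python
import numpy as np


def noisy_svgd_step(x, grad_log_p, step, bandwidth, noise_scale, rng):
    """One SVGD update on 1-D particles `x`, plus an illustrative Gaussian perturbation."""
    diff = x[:, None] - x[None, :]                     # diff[j, i] = x_j - x_i
    kernel = np.exp(-diff ** 2 / (2 * bandwidth ** 2))
    grad_kernel = -diff / bandwidth ** 2 * kernel      # derivative of k(x_j, x_i) in x_j
    # phi[i] = mean over j of k(x_j, x_i) * grad_log_p(x_j) + d/dx_j k(x_j, x_i)
    phi = (kernel * grad_log_p(x)[:, None] + grad_kernel).mean(axis=0)
    noise = noise_scale * np.sqrt(step) * rng.normal(size=x.shape)
    return x + step * phi + noise


def grad_log_p(x):
    """Score function of the assumed target, a Gaussian with mean 2 and unit variance."""
    return -(x - 2.0)


rng = np.random.default_rng(0)
particles = rng.normal(size=50)                        # start around 0, away from the target
for _ in range(500):
    particles = noisy_svgd_step(particles, grad_log_p, step=0.1,
                                bandwidth=1.0, noise_scale=0.01, rng=rng)

print("mean:", particles.mean(), "std:", particles.std())
```

The first term in `phi` pulls particles toward high-density regions and the kernel-gradient term pushes them apart, which is what keeps a finite particle set from collapsing onto the mode.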

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# Code-LLMが学習すること(しないこと)に関する批判的研究

A Critical Study of What Code-LLMs (Do Not) Learn ( http://arxiv.org/abs/2406.11930v1 )

ライセンス: Link先を確認

Abhinav Anand, Shweta Verma, Krishna Narasimhan, Mira Mezini,

(参考訳) コードコーパスで訓練された大規模言語モデル(code-LLM)は、様々なコーディング支援タスクにおいて素晴らしいパフォーマンスを示している。しかし、モデルサイズとトレーニングデータセットが増大しているにも関わらず、code-LLMには、構文エラーを含むコードの提案や変数の誤用といった制限が依然としてある。code-LLMは自己注意と隠れ表現を用いて入力トークン間の関係を符号化するため、コーディングタスクでうまく機能すると主張する研究もある。しかし、これまでの研究では、code-LLMがどのコード特性を符号化していないかは調べられていない。本稿では,code-LLMの注意マップと隠れ表現の詳細な解析を行う。その結果、code-LLMは入力トークンの特定のサブセット間の関係のみを符号化することが示された。具体的には、入力トークンを構文トークンと識別子に分類すると、モデルは構文トークン同士の関係および識別子同士の関係は符号化するが、構文トークンと識別子の間の関係は符号化できないことがわかった。また、微調整されたモデルは、事前訓練されたモデルと比較して、これらの関係をうまく符号化できないことも判明した。さらに、数十億のパラメータを持つ大規模なモデルは、数億のパラメータしか持たないモデルよりも、符号化するコードに関する情報が有意に少ない。

Large Language Models trained on code corpora (code-LLMs) have demonstrated impressive performance in various coding assistance tasks. However, despite their increased size and training datasets, code-LLMs still have limitations such as suggesting code with syntactic errors and variable misuse. Some studies argue that code-LLMs perform well on coding tasks because they use self-attention and hidden representations to encode relations among input tokens. However, previous works have not studied what code properties are not encoded by code-LLMs. In this paper, we conduct a fine-grained analysis of attention maps and hidden representations of code-LLMs. Our study indicates that code-LLMs only encode relations among specific subsets of input tokens. Specifically, by categorizing input tokens into syntactic tokens and identifiers, we found that models encode relations among syntactic tokens and among identifiers, but they fail to encode relations between syntactic tokens and identifiers. We also found that fine-tuned models encode these relations poorly compared to their pre-trained counterparts. Additionally, larger models with billions of parameters encode significantly less information about code than models with only a few hundred million parameters.
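The split of input tokens into syntactic tokens and identifiers can be illustrated for Python source with the standard library tokenizer. The categorization rule below (operators and keywords count as syntactic, other names as identifiers) is an assumption for illustration; the paper's exact token taxonomy and the languages it covers may differ.

```python
import io
import keyword
import tokenize


def categorize_tokens(source):
    """Split Python source tokens into syntactic tokens and identifiers."""
    syntactic, identifiers = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        is_keyword = tok.type == tokenize.NAME and keyword.iskeyword(tok.string)
        if tok.type == tokenize.OP or is_keyword:
            syntactic.append(tok.string)       # operators, delimiters, keywords
        elif tok.type == tokenize.NAME:
            identifiers.append(tok.string)     # variable and function names
    return syntactic, identifiers


syntactic, identifiers = categorize_tokens("if total > limit:\n    return total\n")
print("syntactic:", syntactic)
print("identifiers:", identifiers)
```

With token categories in hand, an attention map can be partitioned into syntactic-syntactic, identifier-identifier, and syntactic-identifier blocks, which is the kind of breakdown the abstract describes.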

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# DeepSeek-Coder-V2: コードインテリジェンスにおけるクローズドソースモデルの障壁を突破する

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence ( http://arxiv.org/abs/2406.11931v1 )

ライセンス: Link先を確認

DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, Xin Xie, Kang Guan, Yuxiang You, Aixin Liu, Qiushi Du, Wenjun Gao, Xuan Lu, Qinyu Chen, Yaohui Wang, Chengqi Deng, Jiashi Li, Chenggang Zhao, Chong Ruan, Fuli Luo, Wenfeng Liang,

(参考訳) 本稿では、コード固有のタスクにおいてGPT4-Turboに匹敵する性能を達成する、オープンソースのMixture-of-Experts (MoE) コード言語モデルDeepSeek-Coder-V2を紹介する。具体的には、DeepSeek-Coder-V2はさらに6兆トークンを追加して、DeepSeek-V2の中間チェックポイントから事前トレーニングされている。この継続事前トレーニングを通じて、DeepSeek-Coder-V2は、一般的な言語タスクで同等のパフォーマンスを維持しながら、DeepSeek-V2のコーディングと数学的推論能力を大幅に強化する。DeepSeek-Coder-33Bと比較すると、DeepSeek-Coder-V2は、推論や一般的な機能だけでなく、コード関連タスクの様々な面で大きな進歩を示している。さらに、DeepSeek-Coder-V2はプログラミング言語のサポートを86から338に拡張し、コンテキスト長は16Kから128Kに拡張した。標準的なベンチマーク評価では、コーディングや数学ベンチマークにおいて、GPT4-Turbo、Claude 3 Opus、Gemini 1.5 Proといったクローズドソースモデルと比較して、DeepSeek-Coder-V2は優れたパフォーマンスを実現している。

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks.

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# 大規模リモートセンシングデータセットを用いたマスクオートエンコーダのスケーリング

Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset ( http://arxiv.org/abs/2406.11933v1 )

ライセンス: Link先を確認

Fengxiang Wang, Hongzhen Wang, Di Wang, Zonghao Guo, Zhenyu Zhong, Long Lan, Jing Zhang, Zhiyuan Liu, Maosong Sun,

(参考訳) Masked Image Modeling (MIM)は、リモートセンシング(RS)分野における基礎的な視覚モデルを開発するための重要なアプローチとして登場した。しかし、現在のRSデータセットはボリュームと多様性に制限されており、一般化可能な表現を学習するためのMIMメソッドの能力を著しく制限している。本研究では,RS画像上での高効率なMIMトレーニングを実現するために設計された大規模データセットであるRS-4Mを紹介する。 RS-4Mは、オブジェクトレベルの検出やピクセルレベルのセグメンテーションを含む、リッチできめ細かなRS視覚タスクを含む400万の光学画像で構成されている。自然画像と比較すると、RS画像には大量の冗長な背景画素が含まれることが多く、従来のMIMモデルのトレーニング効率を制限している。そこで本研究では,その意味的豊かさに基づいて選択されたパッチトークンのサブセットを動的にエンコードし,再構成する,効率的なMIM手法であるSelectiveMAEを提案する。 SelectiveMAEはプログレッシブなセマンティックトークン選択モジュールに基づいており、このモジュールはセマンティックに類似したトークンの再構成から相補的なセマンティック依存関係の符号化へと進化する。このアプローチは、従来のMIMトレーニングをプログレッシブな特徴学習プロセスに変換し、SelectiveMAEがRS画像の堅牢な表現を効率的に学習できるようにする。大規模な実験により、SelectiveMAEはトレーニング効率を2.2-2.7倍に向上し、ベースラインMIMモデルの分類、検出、セグメンテーション性能を向上させることが示されている。データセット、ソースコード、学習済みモデルは公開される予定である。

Masked Image Modeling (MIM) has emerged as a pivotal approach for developing foundational visual models in the field of remote sensing (RS). However, current RS datasets are limited in volume and diversity, which significantly constrains the capacity of MIM methods to learn generalizable representations. In this study, we introduce RS-4M, a large-scale dataset designed to enable highly efficient MIM training on RS images. RS-4M comprises 4 million optical images encompassing abundant and fine-grained RS visual tasks, including object-level detection and pixel-level segmentation. Compared to natural images, RS images often contain massive redundant background pixels, which limits the training efficiency of conventional MIM models. To address this, we propose an efficient MIM method, termed SelectiveMAE, which dynamically encodes and reconstructs a subset of patch tokens selected based on their semantic richness. SelectiveMAE is rooted in a progressive semantic token selection module, which evolves from reconstructing semantically analogical tokens to encoding complementary semantic dependencies. This approach transforms conventional MIM training into a progressive feature learning process, enabling SelectiveMAE to efficiently learn robust representations of RS images. Extensive experiments show that SelectiveMAE significantly boosts training efficiency by 2.2-2.7 times and enhances the classification, detection, and segmentation performance of the baseline MIM model. The dataset, source code, and trained models will be released.

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# ブリッジングデザインギャップ:グラフ誘導拡散モデルを用いたパラメトリックデータ補完手法

Bridging Design Gaps: A Parametric Data Completion Approach With Graph Guided Diffusion Models ( http://arxiv.org/abs/2406.11934v1 )

ライセンス: Link先を確認

Rui Zhou, Chenyang Yuan, Frank Permenter, Yanxia Zhang, Nikos Arechiga, Matt Klenk, Faez Ahmed,

(参考訳) 本研究では, グラフ注意ネットワークと表層拡散モデルを利用して, 工学設計におけるパラメトリックデータの欠落を解消する生成的計算モデルを提案する。このモデルはAI設計の共同パイロットとして機能し、不完全設計のための複数の設計オプションを提供し、自転車設計CADデータセットを用いて実演する。比較評価により,提案手法は従来の手法,例えばMissForest, HotDeck, PPCA, および表層生成法であるTabCSDIよりも精度と多様性が優れていることを示した。生成モデリングはまた、設計可能性のより広範な探索を可能にし、エンジニアが様々な設計完了を探索できるようにすることで設計決定を強化する。グラフモデルは、GNNとアセンブリグラフに含まれる構造情報を組み合わせて、異なる設計パラメータ間の複雑な相互依存性を理解し、予測することができる。グラフモデルは、設計問題の鍵となるアセンブリグラフから複雑なパラメトリック相互依存性を正確にキャプチャし、インプットするのに役立つ。既存の設計データセットから学習することで、インプット機能は、ユーザが定義した部分パラメトリック設計に基づいてCAD設計を自動補完するインテリジェントアシスタントとして機能し、アイデアと実現のギャップを効果的に埋めることができる。提案された研究は、インフォームドデザインの決定を促進するだけでなく、デザインにおける創造的な探索を促進する経路を提供する。

This study introduces a generative imputation model leveraging graph attention networks and tabular diffusion models for completing missing parametric data in engineering designs. This model functions as an AI design co-pilot, providing multiple design options for incomplete designs, which we demonstrate using the bicycle design CAD dataset. Through comparative evaluations, we demonstrate that our model significantly outperforms existing classical methods such as MissForest, hotDeck, and PPCA, as well as the tabular generative method TabCSDI, in both the accuracy and diversity of imputation options. Generative modeling also enables a broader exploration of design possibilities, thereby enhancing design decision-making by allowing engineers to explore a variety of design completions. The graph model combines GNNs with the structural information contained in assembly graphs, enabling the model to understand and predict the complex interdependencies between different design parameters. The graph model helps accurately capture and impute complex parametric interdependencies from an assembly graph, which is key for design problems. By learning from an existing dataset of designs, the imputation capability allows the model to act as an intelligent assistant that autocompletes CAD designs based on a user-defined partial parametric design, effectively bridging the gap between ideation and realization. The proposed work provides a pathway to not only facilitate informed design decisions but also promote creative exploration in design.

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# 反復的か革新的か? コード最適化のための問題指向の視点

Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization ( http://arxiv.org/abs/2406.11935v1 )

ライセンス: Link先を確認

Tong Ye, Tengfei Ma, Lingfei Wu, Xuhong Zhang, Shouling Ji, Wenhai Wang,

(参考訳) 大規模言語モデル(LLM)は、幅広いプログラミングタスクを解く上で強力な能力を示している。しかし、LLMはコード最適化のために研究されることはめったにない。本稿では,パフォーマンス向上に着目したコード最適化、特に実行時間の最小化を目指す最適化について検討する。最近提案された性能最適化のための最初のデータセットであるPIEは、同じ問題に対する同じプログラマからの反復的な提出に基づいて、プログラム最適化ペアを構成する。しかし、このアプローチはLLMを局所的な性能改善に制限し、グローバルなアルゴリズムの革新を無視している。したがって、最適化ペアを問題指向のアプローチに再構成することで、まったく異なる視点を採用する。これにより、同じ問題に取り組む異なるプログラマの様々な巧妙なアイデアの統合が可能になる。実験により, LLMを問題指向最適化ペアに適応させることで, 最適化性能が著しく向上することが示された。一方、問題指向の観点におけるパフォーマンスボトルネックを特定した。モデルマージを利用することで、ボトルネックをさらに克服し、最終的にプログラム最適化比率($51.76\%\rightarrow76.65\%$)とスピードアップ($2.65\times\rightarrow5.09\times$)を新たなレベルに引き上げる。

Large language models (LLMs) have demonstrated strong capabilities in solving a wide range of programming tasks. However, LLMs have rarely been explored for code optimization. In this paper, we explore code optimization with a focus on performance enhancement, specifically aiming to optimize code for minimal execution time. PIE, the recently proposed first dataset for performance optimization, constructs program optimization pairs based on iterative submissions from the same programmer for the same problem. However, this approach restricts LLMs to local performance improvements, neglecting global algorithmic innovation. Therefore, we adopt a completely different perspective by reconstructing the optimization pairs into a problem-oriented approach. This allows for the integration of various ingenious ideas from different programmers tackling the same problem. Experimental results demonstrate that adapting LLMs to problem-oriented optimization pairs significantly enhances their optimization capabilities. Meanwhile, we identify performance bottlenecks within the problem-oriented perspective. By employing model merging, we further overcome these bottlenecks and ultimately elevate the program optimization ratio ($51.76\%\rightarrow76.65\%$) and speedup ($2.65\times\rightarrow5.09\times$) to new levels.
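
To make the contrast between the two pairing schemes concrete, the following minimal sketch builds optimization pairs from a toy submission log. The record layout, the sample data, and the names `build_pairs` and `best_speedup` are illustrative assumptions; this is not the construction used for PIE or by the authors.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical submission log: (problem_id, user_id, runtime_seconds).
SUBMISSIONS = [
    ("p1", "alice", 2.0),
    ("p1", "alice", 1.5),
    ("p1", "bob", 0.4),
    ("p2", "bob", 3.0),
    ("p2", "carol", 1.0),
]


def build_pairs(submissions, problem_oriented):
    """Return (slow_runtime, fast_runtime) pairs.

    problem_oriented=False pairs only submissions by the same user on the
    same problem; problem_oriented=True pairs all submissions on a problem.
    """
    groups = defaultdict(list)
    for problem, user, runtime in submissions:
        key = problem if problem_oriented else (problem, user)
        groups[key].append(runtime)

    pairs = []
    for runtimes in groups.values():
        for a, b in combinations(runtimes, 2):
            if a != b:  # skip ties: neither program optimizes the other
                pairs.append((max(a, b), min(a, b)))
    return pairs


def best_speedup(pairs):
    """Largest slow/fast runtime ratio among the pairs."""
    return max(slow / fast for slow, fast in pairs)


user_pairs = build_pairs(SUBMISSIONS, problem_oriented=False)
problem_pairs = build_pairs(SUBMISSIONS, problem_oriented=True)
print(len(user_pairs), best_speedup(user_pairs))
print(len(problem_pairs), best_speedup(problem_pairs))
```

In this toy log, the problem-oriented grouping exposes a much faster solution by another user as an optimization target, which the same-user grouping cannot see.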

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# 対話型言語モデルにおける視点の追跡

Tracking the perspectives of interacting language models ( http://arxiv.org/abs/2406.11938v1 )

ライセンス: Link先を確認

Hayden Helm, Brandon Duderstadt, Youngser Park, Carey E. Priebe,

(参考訳) 大型言語モデル(LLM)は前例のない速度で高品質な情報を生成することができる。これらのモデルが社会に浸透し続けていくにつれ、それらが生み出すコンテンツは、事前学習データ、微調整データ、検索データなどの他の言語モデルに組み込まれるデータベースにおいて、ますます普及していくでしょう。本稿では,LLMの通信ネットワークの考え方を定式化し,LLMの集合内の個々のモデルの視点を表現する手法を提案する。これらのツールを用いて,様々な環境下でのLLMの通信ネットワークにおける情報拡散を系統的に研究する。

Large language models (LLMs) are capable of producing high quality information at unprecedented rates. As these models continue to entrench themselves in society, the content they produce will become increasingly pervasive in databases that are, in turn, incorporated into the pre-training data, fine-tuning data, retrieval data, etc. of other language models. In this paper we formalize the idea of a communication network of LLMs and introduce a method for representing the perspective of individual models within a collection of LLMs. Given these tools we systematically study information diffusion in the communication network of LLMs in various simulated settings.

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# クラウドソーシングデータから高品質ベンチマークへ:Arena-HardとBenchBuilderパイプライン

From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline ( http://arxiv.org/abs/2406.11939v1 )

ライセンス: Link先を確認

Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica,

(参考訳) 言語モデルの急速な進化は、より困難なベンチマークの開発を必要としている。現在の静的ベンチマークは、異なるモデルの能力を一貫して区別するのに苦労し、実際のユーザの好みと一致しないことが多い。一方、Chatbot Arenaのようなクラウドソースのライブプラットフォームは、さまざまな自然なプロンプトやユーザからのフィードバックを集めている。しかし、これらのプロンプトは洗練度にばらつきがあり、フィードバックを新しいモデルにオフラインで適用することはできない。ベンチマークがLLM開発のペースに遅れないようにするために、モデルを確信を持って分離する能力と人間の好みとの整合性に基づいてベンチマークを評価する方法について論じる。これらの原則の下で、私たちはライブデータソースから高品質なプロンプトをフィルタリングして、新しくて困難なプロンプトによるオフライン評価を可能にする、リビングベンチマークであるBenchBuilderを開発した。 BenchBuilderは、ドメイン知識の要求など、高品質なプロンプトの7つの指標を特定し、LLMアノテータを使用して、さまざまなトピッククラスタから高品質なプロンプトのサブセットを選択する。 LLM評価プロセスは、完全に自動化され、高品質で、常に更新されるベンチマークを保証するために、LLM判定器を使用する。我々は、Chatbot ArenaのプロンプトにBenchBuilderを適用し、幅広いタスクからなる500の挑戦的なユーザプロンプトであるArena-Hard-Auto v0.1を作成した。 Arena-Hard-Auto v0.1はMT-Benchよりも3倍狭い信頼区間を提供し、わずか25ドルのコストで人手によるラベル付けなしに、人間の選好ランキングとの一致率で最先端の89.1%を達成している。 BenchBuilderパイプラインは評価ベンチマークを強化し、開発者が大量のデータから最小限の労力で高品質なベンチマークを抽出できる価値あるツールを提供する。

The rapid evolution of language models has necessitated the development of more challenging benchmarks. Current static benchmarks often struggle to consistently distinguish between the capabilities of different models and fail to align with real-world user preferences. On the other hand, live crowd-sourced platforms like the Chatbot Arena collect a wide range of natural prompts and user feedback. However, these prompts vary in sophistication and the feedback cannot be applied offline to new models. In order to ensure that benchmarks keep up with the pace of LLM development, we address how one can evaluate benchmarks on their ability to confidently separate models and their alignment with human preference. Under these principles, we developed BenchBuilder, a living benchmark that filters high-quality prompts from live data sources to enable offline evaluation on fresh, challenging prompts. BenchBuilder identifies seven indicators of a high-quality prompt, such as the requirement for domain knowledge, and utilizes an LLM annotator to select a high-quality subset of prompts from various topic clusters. The LLM evaluation process employs an LLM judge to ensure a fully automated, high-quality, and constantly updating benchmark. We apply BenchBuilder on prompts from the Chatbot Arena to create Arena-Hard-Auto v0.1: 500 challenging user prompts from a wide range of tasks. Arena-Hard-Auto v0.1 offers 3x tighter confidence intervals than MT-Bench and achieves a state-of-the-art 89.1% agreement with human preference rankings, all at a cost of only $25 and without human labelers. The BenchBuilder pipeline enhances evaluation benchmarks and provides a valuable tool for developers, enabling them to extract high-quality benchmarks from extensive data with minimal effort.
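
One way to read "agreement with human preference rankings" is as the fraction of model pairs that a benchmark orders the same way as human preference does. The sketch below implements that reading on invented scores. The abstract does not define the metric, so the function `pairwise_agreement`, the tie handling, and the data are assumptions for illustration only.

```python
from itertools import combinations


def pairwise_agreement(scores_a, scores_b):
    """Fraction of model pairs that two score tables order the same way."""
    models = sorted(set(scores_a) & set(scores_b))
    agree = total = 0
    for m1, m2 in combinations(models, 2):
        diff_a = scores_a[m1] - scores_a[m2]
        diff_b = scores_b[m1] - scores_b[m2]
        if diff_a == 0 or diff_b == 0:
            continue  # ignore pairs tied in either table
        total += 1
        if (diff_a > 0) == (diff_b > 0):
            agree += 1
    return agree / total if total else float("nan")


# Hypothetical scores for four unnamed models.
human = {"model_a": 0.81, "model_b": 0.64, "model_c": 0.55, "model_d": 0.30}
bench = {"model_a": 0.90, "model_b": 0.50, "model_c": 0.58, "model_d": 0.21}
print(pairwise_agreement(human, bench))
```

Here the two tables disagree only on the order of `model_b` and `model_c`, so five of the six pairs agree.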

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# 部分的ネットワークデータを用いた干渉のモデルベース推論と実験設計

Model-Based Inference and Experimental Design for Interference Using Partial Network Data ( http://arxiv.org/abs/2406.11940v1 )

ライセンス: Link先を確認

Steven Wilkins Reeves, Shane Lubold, Arun G. Chandrasekhar, Tyler H. McCormick,

(参考訳) 安定した単位処理値の仮定は、個人の結果が他者の治療状況に影響されないことを示しているが、多くの現実世界の応用において、治療は即時治療以上の多くの人に影響を及ぼす可能性がある。干渉は一般に、ネットワーク構造を通して媒介されると考えることができる。しかし、多くの経験的な状況において、完全なネットワークデータ(これらの流出効果を調整するために要求される)はコストがかかりすぎるか、論理的には収集できない。部分的あるいは間接的に観察されるネットワークデータ(サブサンプル,集約された関係データ(ARD),エゴセントリックサンプリング,あるいは応答駆動サンプリング)は,ネットワークデータ収集のロジスティックおよび金銭的負担を軽減するが,これらの設計戦略による処理効果調整の統計的性質は探求され始めている。本稿では,構造因果モデルのレンズを用いた部分的ネットワークデータを用いて,治療効果の調整を推定・推定するためのフレームワークを提案する。また、部分的ネットワークデータのみを用いて治療を割り当てる手順についても説明し、推定値の分散を最小化するか、最適なシード化を目標とする。我々は、基礎となるグラフモデルに対して、様々な選択肢に適用可能な単一のネットワーク漸近結果を得る。本研究では,インドとマラウイにおける情報拡散と観測グラフのシミュレーション実験によるアプローチの有効性を検証した。

The stable unit treatment value assumption states that the outcome of an individual is not affected by the treatment statuses of others. However, in many real-world applications, treatments can have an effect on many others beyond the immediately treated. Interference can generically be thought of as mediated through some network structure. In many empirically relevant situations, however, complete network data (required to adjust for these spillover effects) are too costly or logistically infeasible to collect. Partially or indirectly observed network data (e.g., subsamples, aggregated relational data (ARD), egocentric sampling, or respondent-driven sampling) reduce the logistical and financial burden of collecting network data, but the statistical properties of treatment effect adjustments from these design strategies are only beginning to be explored. In this paper, we present a framework for the estimation and inference of treatment effect adjustments using partial network data through the lens of structural causal models. We also illustrate procedures to assign treatments using only partial network data, with the goal of either minimizing estimator variance or optimally seeding. We derive single network asymptotic results applicable to a variety of choices for an underlying graph model. We validate our approach using simulated experiments on observed graphs with applications to information diffusion in India and Malawi.

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# クロスフューザー:自動車追従軌道予測のためのクロスアテンション変圧器拡張条件拡散モデル

Crossfusor: A Cross-Attention Transformer Enhanced Conditional Diffusion Model for Car-Following Trajectory Prediction ( http://arxiv.org/abs/2406.11941v1 )

ライセンス: Link先を確認

Junwei You, Haotian Shi, Keshu Wu, Keke Long, Sicheng Fu, Sikai Chen, Bin Ran,

(参考訳) 自動車軌道予測は、自動運転と高度運転支援システム(ADAS)の進歩に不可欠であり、道路の安全と交通効率を向上させる。従来の手法は基礎的な作業を行っているが、現代のディープラーニング技術、特にトランスフォーマーベースのモデルと生成的アプローチは、車両の動きや交通の相互作用における複雑なパターンや非線形パターンを捉えることによって予測精度を大幅に向上させた。しかし、これらのモデルはしばしば、現実世界の運転シナリオに不可欠な詳細な自動車追従行動や車間相互作用を見落としている。本研究では,自動車追従軌道予測のためのクロスアテンショントランスフォーマー拡張条件拡散モデル(Crossfusor)を提案する。 Crossfusorは、車間相互作用と自動車追従ダイナミクスを堅牢な拡散フレームワークに統合し、予測された軌跡の精度と現実性の両方を改善する。このモデルは、GRU、位置ベースアテンション機構、そしてFourier埋め込みを組み合わせた新しい時間的特徴符号化フレームワークを活用して、歴史的車両力学を捉える。前方拡散過程において、これらの符号化された歴史的特徴によってスケールされたノイズを使用し、逆復調過程において、複雑な車間依存関係をモデル化するために、クロスアテンショントランスフォーマーを使用する。 NGSIMデータセットの実験結果から、クロスファザーは最先端のモデル、特に長期予測において、自律運転システムの予測能力を向上する可能性を示している。

Vehicle trajectory prediction is crucial for advancing autonomous driving and advanced driver assistance systems (ADAS), enhancing road safety and traffic efficiency. While traditional methods have laid foundational work, modern deep learning techniques, particularly transformer-based models and generative approaches, have significantly improved prediction accuracy by capturing complex and non-linear patterns in vehicle motion and traffic interactions. However, these models often overlook the detailed car-following behaviors and inter-vehicle interactions essential for real-world driving scenarios. This study introduces a Cross-Attention Transformer Enhanced Conditional Diffusion Model (Crossfusor) specifically designed for car-following trajectory prediction. Crossfusor integrates detailed inter-vehicular interactions and car-following dynamics into a robust diffusion framework, improving both the accuracy and realism of predicted trajectories. The model leverages a novel temporal feature encoding framework combining GRU, location-based attention mechanisms, and Fourier embedding to capture historical vehicle dynamics. It employs noise scaled by these encoded historical features in the forward diffusion process, and uses a cross-attention transformer to model intricate inter-vehicle dependencies in the reverse denoising process. Experimental results on the NGSIM dataset demonstrate that Crossfusor outperforms state-of-the-art models, particularly in long-term predictions, showcasing its potential for enhancing the predictive capabilities of autonomous driving systems.
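
As a rough illustration of a forward diffusion step with scaled noise, the sketch below uses the standard closed-form noising rule of denoising diffusion models and multiplies the Gaussian noise by a per-coordinate factor. How Crossfusor actually derives the scaling from its encoded historical features is not stated in the abstract, so `noise_scale`, the sample trajectory, and the function name are assumptions.

```python
import math
import random


def forward_diffuse(x0, alpha_bar_t, noise_scale, rng):
    """Sample x_t from x_0 with the closed-form rule, scaling the noise.

    x0          : clean future trajectory values (list of floats)
    alpha_bar_t : cumulative signal-retention factor in [0, 1]
    noise_scale : per-coordinate multipliers, a stand-in for a scaling
                  derived from encoded history (an assumption)
    rng         : random.Random instance supplying Gaussian noise
    """
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [
        signal * x + sigma * s * rng.gauss(0.0, 1.0)
        for x, s in zip(x0, noise_scale)
    ]


rng = random.Random(0)
trajectory = [10.0, 10.5, 11.2]  # hypothetical positions in metres
scale = [0.5, 1.0, 2.0]          # hypothetical history-dependent scales
print(forward_diffuse(trajectory, 0.9, scale, rng))
```

With `alpha_bar_t = 1.0` the function returns the clean trajectory unchanged, and larger entries in `noise_scale` inject proportionally more noise into the corresponding coordinate.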

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# クライアント-ワイズ関係グラフを用いた個人化フェデレーション知識グラフ

Personalized Federated Knowledge Graph Embedding with Client-Wise Relation Graph ( http://arxiv.org/abs/2406.11943v1 )

ライセンス: Link先を確認

Xiaoxiong Zhang, Zhiwei Zeng, Xin Zhou, Dusit Niyato, Zhiqi Shen,

(参考訳) Federated Knowledge Graph Embedding (FKGE)は、分散知識グラフから表現表現を抽出する能力と、個々のクライアントのプライバシを同時に保護する能力によって、最近かなりの関心を集めている。既存のFKGEメソッドは通常、すべてのクライアントからのエンティティ埋め込みの算術平均をグローバル補完知識として利用し、各クライアントに対するグローバルコンセンサスエンティティ埋め込みのレプリカを学ぶ。しかしながら、これらの手法は通常、異なるクライアント間の固有の意味的相違を無視する。この監視によって、グローバルに共有される補完的な知識が、特定のクライアントに合わせるとノイズが多すぎるだけでなく、局所的な最適化目標とグローバルな最適化目標の相違も生じます。これにより、学習した埋め込みの品質が損なわれる。これを解決するために,PFedEG(Personalized Federated Knowledge Graph Embedding with client-wise relation Graph)を提案する。具体的には、PFedEGは、クライアントワイド関係グラフ上の「親和性」に基づいて、近隣のクライアントからエンティティを埋め込むことで、各クライアントに対してパーソナライズされた補足的知識を学習する。それぞれのクライアントは、ローカルのトリプルとパーソナライズされた補足的知識に基づいて、パーソナライズされた埋め込み学習を行う。我々は,4つのベンチマークデータセットを用いて,最先端モデルに対する提案手法の評価を行い,本手法の優位性を実証した。

Federated Knowledge Graph Embedding (FKGE) has recently garnered considerable interest due to its capacity to extract expressive representations from distributed knowledge graphs, while concurrently safeguarding the privacy of individual clients. Existing FKGE methods typically harness the arithmetic mean of entity embeddings from all clients as the global supplementary knowledge, and learn a replica of global consensus entity embeddings for each client. However, these methods usually neglect the inherent semantic disparities among distinct clients. This oversight not only results in the globally shared complementary knowledge being inundated with too much noise when tailored to a specific client, but also instigates a discrepancy between local and global optimization objectives. Consequently, the quality of the learned embeddings is compromised. To address this, we propose Personalized Federated knowledge graph Embedding with client-wise relation Graph (PFedEG), a novel approach that employs a client-wise relation graph to learn personalized embeddings by discerning the semantic relevance of embeddings from other clients. Specifically, PFedEG learns personalized supplementary knowledge for each client by amalgamating entity embeddings from its neighboring clients based on their "affinity" on the client-wise relation graph. Each client then conducts personalized embedding learning based on its local triples and personalized supplementary knowledge. We conduct extensive experiments on four benchmark datasets to evaluate our method against state-of-the-art models, and the results demonstrate the superiority of our method.
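
The difference between a global arithmetic mean and an affinity-weighted, per-client aggregate can be sketched in a few lines. The abstract does not say how PFedEG computes or normalizes affinities, so the weighting rule, the function names, and the numbers below are assumptions made for illustration.

```python
def global_mean(embeddings):
    """Arithmetic mean of one entity's embeddings across all clients."""
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]


def personalized_mean(embeddings, affinity, client):
    """Affinity-weighted mean over the other clients' embeddings."""
    neighbors = [j for j in range(len(embeddings)) if j != client]
    total = sum(affinity[client][j] for j in neighbors)
    dim = len(embeddings[0])
    return [
        sum(affinity[client][j] * embeddings[j][d] for j in neighbors) / total
        for d in range(dim)
    ]


# Hypothetical 2-d embeddings of one shared entity on three clients.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical affinities; row i holds client i's weights for the others.
affinity = [
    [0.0, 3.0, 1.0],
    [3.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
print(global_mean(embeddings))
print(personalized_mean(embeddings, affinity, client=0))
```

Every client receives the same vector from `global_mean`, whereas `personalized_mean` gives client 0 a vector dominated by client 1, the neighbor it has the highest affinity with.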

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# トランスコーダによるLLM特徴回路の解釈

Transcoders Find Interpretable LLM Feature Circuits ( http://arxiv.org/abs/2406.11944v1 )

ライセンス: Link先を確認

Jacob Dunefsky, Philippe Chlenski, Neel Nanda,

(参考訳) 機械的解釈可能性の重要なゴールは回路解析であり、特定の振る舞いや能力に対応するモデルのスパース部分グラフを見つけることである。しかし、MLPサブレイヤは変換器ベースの言語モデルにおいて、きめ細かい回路解析を困難にしている。特に、スパースオートエンコーダ(SAE)で見られるような解釈可能な特徴は、通常、非常に多くのニューロンの線形結合であり、それぞれが考慮すべき非線形性を持つ。この設定での回路解析は、引き締まるほど大きな回路を得るか、局所的および大域的挙動を乱すのに失敗する。これを解決するためにトランスコーダを探索し、より広く、疎に活性化するMLP層を忠実に近似する。 120M, 410M, 1.4Bのパラメータを持つ言語モデル上でトランスコーダをトレーニングし, 空間性, 忠実性, 人間の解釈可能性の観点から, 少なくともSAEと同等に動作できることを見出した。次に,重みに基づく回路解析を行うためにトランスコーダを用いた新しい手法を提案する。結果として得られる回路は、入力依存項と入力不変項に適切に分解される。最後に,モデル内の未知回路のリバースエンジニアリングにトランスコーダを適用し,GPT2小形回路の高次回路に関する新たな知見を得る。その結果,トランスコーダはMLPを含むモデル計算を解釈可能な回路に分解するのに有効であることが示唆された。コードはhttps://github.com/jacobdunefsky/transcoder_circuitsで入手できる。

A key goal in mechanistic interpretability is circuit analysis: finding sparse subgraphs of models corresponding to specific behaviors or capabilities. However, MLP sublayers make fine-grained circuit analysis on transformer-based language models difficult. In particular, interpretable features -- such as those found by sparse autoencoders (SAEs) -- are typically linear combinations of extremely many neurons, each with its own nonlinearity to account for. Circuit analysis in this setting thus either yields intractably large circuits or fails to disentangle local and global behavior. To address this we explore transcoders, which seek to faithfully approximate a densely activating MLP layer with a wider, sparsely-activating MLP layer. We successfully train transcoders on language models with 120M, 410M, and 1.4B parameters, and find them to perform at least on par with SAEs in terms of sparsity, faithfulness, and human-interpretability. We then introduce a novel method for using transcoders to perform weights-based circuit analysis through MLP sublayers. The resulting circuits neatly factorize into input-dependent and input-invariant terms. Finally, we apply transcoders to reverse-engineer unknown circuits in the model, and we obtain novel insights regarding the greater-than circuit in GPT2-small. Our results suggest that transcoders can prove effective in decomposing model computations involving MLPs into interpretable circuits. Code is available at https://github.com/jacobdunefsky/transcoder_circuits.

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# GAugLLM:大規模言語モデルによるテキスト分散グラフのグラフコントラスト学習の改善

GAugLLM: Improving Graph Contrastive Learning for Text-Attributed Graphs with Large Language Models ( http://arxiv.org/abs/2406.11945v1 )

ライセンス: Link先を確認

Yi Fang, Dongzhe Fan, Daochen Zha, Qiaoyu Tan,

(参考訳) 本研究は,ノードがテキスト属性で表されるテキスト分散グラフ(TAG)の自己教師型グラフ学習について研究する。数値的な特徴空間を摂動させ、グラフの位相構造を変化させる従来のグラフコントラスト法とは異なり、言語指導を通してビュー生成を改善することを目的としている。これは、リッチなセマンティック情報を持つグラフ構造を補完する、実際のアプリケーションにおけるテキスト属性の出現によって引き起こされる。しかし、これは2つの大きな理由から課題を提起する。第一に、テキストの属性は長さと品質が多様であり、本来の意味を変えることなく、生のテキスト記述を摂動させることが困難である。第二に、テキスト属性はグラフ構造を補完するが、本質的には整合性はない。ギャップを埋めるために,TAGを増強する新しいフレームワークであるGAugLLMを紹介する。 Mistralのような先進的な大規模言語モデルを活用して、自己教師付きグラフ学習を強化する。具体的には、拡張ノード特徴を生成するための混合プロンプト-エキスパート手法を提案する。提案手法は,複数のプロンプトの専門家に適応的に対応して,プロンプトエンジニアリングを用いた原文属性を数値的特徴空間にマッピングする。さらに、構造的およびテキスト的共通性を活用するための協調的なエッジ修飾器を考案し、ノード間の接続を検査または構築することでエッジ拡張を強化する。さまざまなドメインにまたがる5つのベンチマークデータセットに対する実証的な結果から、プラグインツールとして主要なコントラストメソッドのパフォーマンスを向上させるフレームワークの能力が明らかになりました。特に,拡張機能とグラフ構造により,一般的なグラフニューラルネットワークと同様に,標準生成手法の性能向上が期待できる。 GAugLLMのオープンソース実装はGithubで公開されています。

This work studies self-supervised graph learning for text-attributed graphs (TAGs) where nodes are represented by textual attributes. Unlike traditional graph contrastive methods that perturb the numerical feature space and alter the graph's topological structure, we aim to improve view generation through language supervision. This is driven by the prevalence of textual attributes in real applications, which complement graph structures with rich semantic information. However, this presents challenges for two major reasons. First, text attributes often vary in length and quality, making it difficult to perturb raw text descriptions without altering their original semantic meanings. Second, although text attributes complement graph structures, they are not inherently well-aligned. To bridge the gap, we introduce GAugLLM, a novel framework for augmenting TAGs. It leverages advanced large language models like Mistral to enhance self-supervised graph learning. Specifically, we introduce a mixture-of-prompt-expert technique to generate augmented node features. This approach adaptively maps multiple prompt experts, each of which modifies raw text attributes using prompt engineering, into numerical feature space. Additionally, we devise a collaborative edge modifier to leverage structural and textual commonalities, enhancing edge augmentation by examining or building connections between nodes. Empirical results across five benchmark datasets spanning various domains underscore our framework's ability to enhance the performance of leading contrastive methods as a plug-in tool. Notably, we observe that the augmented features and graph structure can also enhance the performance of standard generative methods, as well as popular graph neural networks. The open-sourced implementation of our GAugLLM is available on GitHub.

翻訳日:2024-06-20 00:36:26 公開日:2024-06-17

# 六方晶窒化ホウ素における荷電空孔の固有高忠実スピン偏極

Intrinsic high-fidelity spin polarization of charged vacancies in hexagonal boron nitride ( http://arxiv.org/abs/2406.11953v1 )

ライセンス: Link先を確認

Wonjae Lee, Vincent S. Liu, Zhelun Zhang, Sangha Kim, Ruotian Gong, Xinyi Du, Khanh Pham, Thomas Poirier, Zeyu Hao, James H. Edgar, Philip Kim, Chong Zu, Emily J. Davis, Norman Y. Yao,

(参考訳) 六方晶窒化ホウ素 (hBN) における負電荷のホウ素空孔 ($\mathrm{V}_{\mathrm{B}}^-$) は2次元材料の欠陥の中で顕著な注目を集めている。これは部分的には、その決定論的生成、よく特徴づけられた原子構造、室温での光学偏極性に起因している。基底状態と励起状態の両方の偏極ダイナミクスを調べる広範囲な測定により,後者について検討した。これらの測定に基づいて半古典的モデルを構築し、周囲条件下での他の固体スピン欠陥を上回る、1に近いスピン偏極度を予測する。我々のモデルに基づいて、$\mathrm{V}_{\mathrm{B}}^-$に隣接する核スピンの自由度を取り入れ、核スピンの超微細相互作用による偏極を調べるためにリンドブラディアンの包括的な数値計算を行う。我々のシミュレーションは、磁場の関数として現れる多くの重要な特徴を予測し、それらは実験によって裏付けられている。

The negatively charged boron vacancy ($\mathrm{V}_{\mathrm{B}}^-$) in hexagonal boron nitride (hBN) has garnered significant attention among defects in two-dimensional materials. This owes, in part, to its deterministic generation, well-characterized atomic structure, and optical polarizability at room temperature. We investigate the latter through extensive measurements probing both the ground and excited state polarization dynamics. We develop a semiclassical model based on these measurements that predicts a near-unity degree of spin polarization, surpassing other solid-state spin defects under ambient conditions. Building upon our model, we include the presence of nuclear spin degrees of freedom adjacent to the $\mathrm{V}_{\mathrm{B}}^-$ and perform a comprehensive set of Lindbladian numerics to investigate the hyperfine-induced polarization of the nuclear spins. Our simulations predict a number of important features that emerge as a function of magnetic field which are borne out by experiment.

翻訳日:2024-06-20 00:26:41 公開日:2024-06-17

# 空洞QED材料に対する線形応答理論

Linear response theory for cavity QED materials ( http://arxiv.org/abs/2406.11957v1 )

ライセンス: Link先を確認

Juan Román-Roche, Álvaro Gómez-León, Fernando Luis, David Zueco,

(参考訳) 空洞QED材料における線形応答理論の厳密な枠組みについて述べる。我々のアプローチは、光と物質の間の集合的な結合を利用して、量子場理論において大きなN理論と平行に描画する。空洞と物質の両方の様々な応答に対する閉公式を導出する。我々の理論は、Dickeモデルと量子ホール効果の確立された結果の回復によって検証される。さらに、空洞がマグノン対を局所状態に結合する量子磁石において、新しい励起が発見される。

We present a rigorous framework for linear response theory in cavity QED materials. Our approach leverages the collective coupling between light and matter, drawing parallels with large-N theories in quantum field theory. We derive closed formulas for various responses of both the cavity and the matter. Our theory is validated by recovering established results for the Dicke model and the Quantum Hall Effect. Additionally, we discover novel excitations in quantum magnets, where the cavity binds magnon pairs into localized states.

翻訳日:2024-06-20 00:26:41 公開日:2024-06-17

# Stripping Quantum Decision Diagrams of their Identity

Stripping Quantum Decision Diagrams of their Identity ( http://arxiv.org/abs/2406.11959v1 )

ライセンス: Link先を確認

Aaron Sander, Ioan-Albert Florea, Lukas Burgholzer, Robert Wille,

(参考訳) ベクトルや行列としての量子状態と演算の古典的な表現は、メモリの指数関数的な増加と、システムサイズを増やすための実行時要求に悩まされている。古典コンピューティングでの使用に基づいて、決定図(Decision Diagrams, DD)と呼ばれる別のデータ構造が提案されており、よりコンパクトな表現とより効率的な計算を提供することが多い。古典的な領域では、何十年にもわたってDDの研究が行われ、特定の用途に適した多くのバリエーションが存在する。しかし、量子コンピューティングのためのDDは生まれたばかりであり、この新しい技術に合わせる余地はまだ残っている。特に、既存のDDの表現は、アイデンティティ行列を表すノードによって拡張され、量子回路内の全ての操作をシステムサイズに拡張する必要がある。本研究では、これらのアイデンティティ構造を量子演算から取り除くことにより、量子DDにとって重要な一歩を踏み出す。これにより、それらを表現するために必要なノードの数を大幅に削減し、実装の主要なビルディングブロックに対するプレッシャーを緩和する。その結果、量子コンピューティングにはより自然な構造が得られ、現状に比べて最大70倍のランタイム改善が達成され、計算処理が大幅に高速化される。

Classical representations of quantum states and operations as vectors and matrices are plagued by an exponential growth in memory and runtime requirements for increasing system sizes. Based on their use in classical computing, an alternative data structure known as Decision Diagrams (DDs) has been proposed, which, in many cases, provides both a more compact representation and more efficient computation. In the classical realm, decades of research have been conducted on DDs and numerous variations tailored for specific applications exist. However, DDs for quantum computing are just in their infancy and there is still room for tailoring them to this new technology. In particular, existing representations of DDs require extending all operations in a quantum circuit to the full system size through extension by nodes representing identity matrices. In this work, we make an important step forward for quantum DDs by stripping these identity structures from quantum operations. This significantly reduces the number of nodes required to represent them and eases the pressure on key building blocks of their implementation. As a result, we obtain a structure that is more natural for quantum computing and significantly speeds up computations, with a runtime improvement of up to 70x compared to the state of the art.
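
The cost of "extending all operations to the full system size" can be seen with plain matrices: a single-qubit gate has to be padded with one identity factor per remaining qubit. The sketch below shows this padding with dense nested lists. It is not a decision-diagram implementation, and the helper names and qubit-ordering convention are assumptions chosen for this example.

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


IDENTITY = [[1, 0], [0, 1]]
PAULI_X = [[0, 1], [1, 0]]


def extend_to_system(gate, target, num_qubits):
    """Pad a single-qubit gate with identities to act on num_qubits qubits.

    Qubit 0 is taken as the leftmost tensor factor (a convention chosen
    for this sketch).
    """
    full = [[1]]
    for qubit in range(num_qubits):
        full = kron(full, gate if qubit == target else IDENTITY)
    return full


full_x = extend_to_system(PAULI_X, target=0, num_qubits=3)
print(len(full_x), "x", len(full_x[0]))
```

The 2x2 gate grows to 8x8 for three qubits even though only one factor carries information; the identity factors are the kind of redundant structure the abstract proposes to strip from the diagram representation.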

翻訳日:2024-06-20 00:26:41 公開日:2024-06-17

# スパース非エルミートSYKモデルにおける特異値相関による量子カオスの探索

Probing quantum chaos through singular-value correlations in sparse non-Hermitian SYK model ( http://arxiv.org/abs/2406.11969v1 )

ライセンス: Link先を確認

Pratik Nandy, Tanay Pathak, Masaki Tezuka,

(参考訳) 特異値分解を利用して,スパース非エルミートSachdev-Ye-Kitaev (SYK)モデルにおける特異値のスペクトルに着目した。非エルミート系の典型的な複素固有値とは異なり、特異値は本質的に実かつ正である。以上の結果から,特異値の統計と類似のエルミート・ガウスアンサンブルの統計との一致が明らかとなった。スパース性が増すと、非エルミートSYKモデルはカオス的挙動から逸脱し、この現象は特異値比によって正確に捉えられる。スペクトル形状因子 (SFF) に類似した特異形状因子 ($\sigma$FF) の解析は, スパース性の増大に伴う線形ランプの消失を示す。さらに、エルミート系のスペクトル複雑性に着想を得た特異複雑性を定義し、その飽和がスパース性の重要なしきい値を与える。このような崩壊は、非エルミート系に対する既存のホログラフィック双対の破綻と関係している可能性が高い。

Utilizing singular value decomposition, our investigation focuses on the spectrum of the singular values within a sparse non-Hermitian Sachdev-Ye-Kitaev (SYK) model. Unlike the complex eigenvalues typical of non-Hermitian systems, singular values are inherently real and positive. Our findings reveal a congruence between the statistics of singular values and those of the analogous Hermitian Gaussian ensembles. An increase in sparsity results in the non-Hermitian SYK model deviating from its chaotic behavior, a phenomenon precisely captured by the singular value ratios. Our analysis of the singular form factor ($\sigma$FF), analogous to the spectral form factor (SFF), indicates the disappearance of the linear ramp with increased sparsity. Additionally, we define singular complexity, inspired by the spectral complexity in Hermitian systems, whose saturation provides a critical threshold of sparseness. Such disintegration is likely associated with the breakdown of the existing holographic dual for non-Hermitian systems.
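
For reference, a common ratio statistic for an ordered spectrum is the adjacent-gap ratio, written here for singular values. The abstract does not spell out its definition of the singular value ratios, so the symbols $\sigma_n$, $\delta_n$, and $r_n$ below are assumptions chosen for illustration.

```latex
% Ordered singular values of H, i.e. eigenvalues of \sqrt{H^\dagger H}
\sigma_1 \le \sigma_2 \le \dots \le \sigma_N, \qquad
\delta_n = \sigma_{n+1} - \sigma_n, \qquad
r_n = \frac{\min(\delta_n, \delta_{n+1})}{\max(\delta_n, \delta_{n+1})} \in [0, 1].
```

Averaging $r_n$ over the spectrum gives a single number that can be compared with the known values for Gaussian random-matrix ensembles and for uncorrelated levels.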

翻訳日:2024-06-20 00:26:41 公開日:2024-06-17

# キャビティQED材料:任意の光-物質結合強度における2つの線形応答理論の比較と検証

Cavity QED materials: Comparison and validation of two linear response theories at arbitrary light-matter coupling strengths ( http://arxiv.org/abs/2406.11971v1 )

ライセンス: Link先を確認

Juan Román-Roche, Álvaro Gómez-León, Fernando Luis, David Zueco,

We develop a linear response theory for materials collectively coupled to a cavity that is valid in all regimes of light-matter coupling, including symmetry-broken phases. We present and compare two different approaches. First, we use a coherent path integral formulation of the partition function to obtain thermal Green functions. This approach relies on a saddle-point expansion of the action, which can be truncated in the thermodynamic limit. Second, we formulate and solve the equations of motion for the retarded Green functions. We use a mean-field decoupling of high-order Green functions in order to obtain a closed, solvable system of equations. Both approaches yield identical results in the calculation of response functions for the cavity and material. These are obtained in terms of the bare cavity and material responses. In combination, the two techniques clarify the validity of a mean-field decoupling in correlated light-matter systems and provide complementary means to compute finite-size corrections to the thermodynamic limit. The theory is formulated for a general model that encompasses most of the systems typically considered in the field of cavity QED materials, within a long-wavelength approximation. Finally, we provide a detailed application of the theory to the quantum Hall effect and to a collection of magnetic models. We validate our predictions against analytical and finite-size exact-diagonalization results.

Translated: 2024-06-20 00:26:41 Published: 2024-06-17

# Operator-based quantum thermodynamic uncertainty relations ( http://arxiv.org/abs/2406.11974v1 )

License: see the linked page

Pratik Sathe, Luis Pedro García-Pintos, Francesco Caravelli,

The Heisenberg uncertainty relation, which links the uncertainties of the position and momentum of a particle, has an important footprint on the quantum behavior of a physical system. Motivated by this principle, we propose that thermodynamic currents associated with work, heat, and internal energy are described by well-defined Hermitian operators; i.e., we associate physical observables to quantum thermodynamic flows. First, we show that, in principle, it is possible to perform single-point measurements to compute average thermodynamic rates. These rates, or currents, differ from their classical counterparts due to the non-commutativity of the corresponding operators. Using the Robertson-Schrödinger uncertainty relation, we then obtain various thermodynamic uncertainty relationships between them. In particular, we connect the fluctuations in heat rate and thermodynamic power with those in internal energy. We further illustrate this approach by applying it to quantum batteries, where we derive an energy-power uncertainty relationship and show how measurements affect the fluctuations.

Translated: 2024-06-20 00:26:41 Published: 2024-06-17
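For reference, the Robertson-Schrödinger relation named in the abstract can be written in its standard textbook form for two Hermitian operators $A$ and $B$, with variance $\sigma_A^2 = \langle A^2 \rangle - \langle A \rangle^2$. Which thermodynamic operators (heat rate, power, internal energy) are substituted for $A$ and $B$ is defined in the paper and is not reproduced here.

```latex
\sigma_A^2 \, \sigma_B^2 \;\geq\;
\left| \tfrac{1}{2} \langle \{A, B\} \rangle - \langle A \rangle \langle B \rangle \right|^2
+ \left| \tfrac{1}{2i} \langle [A, B] \rangle \right|^2
```

The second term vanishes when $A$ and $B$ commute, which is consistent with the abstract's remark that the quantum currents differ from classical ones through non-commutativity.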

# Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models ( http://arxiv.org/abs/2406.11977v1 )

License: see the linked page

Eva Portelance, Siva Reddy, Timothy J. O'Donnell,

Semantic and syntactic bootstrapping posit that children use their prior knowledge of one linguistic domain, say syntactic relations, to help later acquire another, such as the meanings of new words. Empirical results supporting both theories may tempt us to believe that these are different learning strategies, where one may precede the other. Here, we argue that they are instead both contingent on a more general learning strategy for language acquisition: joint learning. Using a series of neural visually-grounded grammar induction models, we demonstrate that both syntactic and semantic bootstrapping effects are strongest when syntax and semantics are learnt simultaneously. Joint learning results in better grammar induction, realistic lexical category learning, and better interpretations of novel sentence and verb meanings. Joint learning makes language acquisition easier for learners by mutually constraining the hypothesis spaces for both syntax and semantics. Studying the dynamics of joint inference over many input sources and modalities represents an important new direction for language modeling and learning research in both cognitive sciences and AI, as it may help us explain how language can be acquired in more constrained learning settings.

Translated: 2024-06-20 00:26:41 Published: 2024-06-17

# Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner ( http://arxiv.org/abs/2406.11978v1 )

License: see the linked page

Kenneth Li, Yiming Wang, Fernanda Viégas, Martin Wattenberg,

We present an approach called Dialogue Action Tokens (DAT) that adapts language model agents to plan goal-directed dialogues. The core idea is to treat each utterance as an action, thereby converting dialogues into games where existing approaches such as reinforcement learning can be applied. Specifically, we freeze a pretrained language model and train a small planner model that predicts a continuous action vector, used for controlled generation in each round. This design avoids the problem of language degradation under reward optimization. When evaluated on the Sotopia platform for social simulations, the DAT-steered LLaMA model surpasses GPT-4's performance. We also apply DAT to steer an attacker language model in a novel multi-turn red-teaming setting, revealing a potential new attack surface.

Translated: 2024-06-20 00:26:41 Published: 2024-06-17

# Constrained dynamics and confinement in the two-dimensional quantum Ising model ( http://arxiv.org/abs/2406.11979v1 )

License: see the linked page

Luka Pavešić, Daniel Jaschke, Simone Montangero,

We investigate the dynamics of the quantum Ising model on two-dimensional square lattices up to $16 \times 16$ spins. In the ordered phase, the model is predicted to exhibit dynamically constrained dynamics, leading to confinement of elementary excitations and slow thermalization. After demonstrating the signatures of confinement, we probe the dynamics of interfaces in the constrained regime through sudden quenches of product states with domains of opposite magnetization. We find that the nature of excitations can be captured by perturbation theory throughout the confining regime, and identify the crossover to the deconfining regime. We systematically explore the effect of the transverse field on the modes propagating along flat interfaces and investigate the crossover from resonant to diffusive melting of a square of flipped spins embedded in a larger lattice.

Translated: 2024-06-20 00:26:41 Published: 2024-06-17
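As orientation, a commonly used form of the transverse-field quantum Ising Hamiltonian on a lattice is sketched below. The symbols $J$ (nearest-neighbour coupling) and $h_x$ (transverse field) and the sign conventions are assumptions for illustration; the abstract does not state the exact Hamiltonian or any additional field terms used in the paper.

```latex
H = -J \sum_{\langle i, j \rangle} \sigma^z_i \sigma^z_j \;-\; h_x \sum_i \sigma^x_i
```

In this convention, $J > 0$ favours aligned spins, so domains of opposite magnetization are separated by interfaces whose energy cost grows with their length, and the transverse field is the term that lets those interfaces move.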

# Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways ( http://arxiv.org/abs/2406.11980v1 )

License: see the linked page

Shubham Atreja, Joshua Ashkinaze, Lingyao Li, Julia Mendelsohn, Libby Hemphill,

Manually annotating data for computational social science tasks can be costly, time-consuming, and emotionally draining. While recent work suggests that LLMs can perform such annotation tasks in zero-shot settings, little is known about how prompt design impacts LLMs' compliance and accuracy. We conduct a large-scale multi-prompt experiment to test how model selection (ChatGPT, PaLM2, and Falcon7b) and prompt design features (definition inclusion, output type, explanation, and prompt length) impact the compliance and accuracy of LLM-generated annotations on four CSS tasks (toxicity, sentiment, rumor stance, and news frames). Our results show that LLM compliance and accuracy are highly prompt-dependent. For instance, prompting for numerical scores instead of labels reduces all LLMs' compliance and accuracy. The overall best prompting setup is task-dependent, and minor prompt changes can cause large changes in the distribution of generated labels. By showing that prompt design significantly impacts the quality and distribution of LLM-generated annotations, this work serves as both a warning and a practical guide for researchers and practitioners.

Translated: 2024-06-20 00:26:41 Published: 2024-06-17

# Online Pareto-Optimal Decision-Making for Complex Tasks using Active Inference ( http://arxiv.org/abs/2406.11984v1 )

License: see the linked page

Peter Amorese, Shohei Wakayama, Nisar Ahmed, Morteza Lahijanian,

When a robot autonomously performs a complex task, it frequently must balance competing objectives while maintaining safety. This becomes more difficult in uncertain environments with stochastic outcomes. Enhancing transparency in the robot's behavior and aligning with user preferences are also crucial. This paper introduces a novel framework for multi-objective reinforcement learning that ensures safe task execution, optimizes trade-offs between objectives, and adheres to user preferences. The framework has two main layers: a multi-objective task planner and a high-level selector. The planning layer generates a set of optimal trade-off plans that guarantee satisfaction of a temporal logic task. The selector uses active inference to decide which generated plan best complies with user preferences and aids learning. Operating iteratively, the framework updates a parameterized learning model based on collected data. Case studies and benchmarks on both manipulation and mobile robots show that our framework outperforms other methods and (i) learns multiple optimal trade-offs, (ii) adheres to a user preference, and (iii) allows the user to adjust the balance between (i) and (ii).

Translated: 2024-06-20 00:26:41 Published: 2024-06-17
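The "set of optimal trade-off plans" mentioned in the abstract is a Pareto front. The sketch below only illustrates that general notion; it is not the paper's planner, and the function name `pareto_front` and the two-objective cost tuples are hypothetical choices made for the example (lower cost is assumed to be better on every objective).

```python
def pareto_front(plans):
    """Return the plans that no other plan dominates (lower cost is better)."""

    def dominates(a, b):
        # a dominates b if it is no worse on every objective
        # and strictly better on at least one.
        no_worse = all(x <= y for x, y in zip(a, b))
        strictly_better = any(x < y for x, y in zip(a, b))
        return no_worse and strictly_better

    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Hypothetical plans as (time cost, risk cost) pairs.
plans = [(1.0, 9.0), (2.0, 4.0), (3.0, 5.0), (4.0, 1.0), (4.0, 4.0)]
front = pareto_front(plans)
```

A selector such as the one described in the abstract would then choose among the surviving plans according to the user's preference; that step is not modelled here.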

# Decomposed evaluations of geographic disparities in text-to-image models ( http://arxiv.org/abs/2406.11988v1 )

License: see the linked page

Abhishek Sureddy, Dishant Padalia, Nandhinee Periyakaruppa, Oindrila Saha, Adina Williams, Adriana Romero-Soriano, Megan Richards, Polina Kirichenko, Melissa Hall,

Recent work has identified substantial disparities in generated images of different geographic regions, including stereotypical depictions of everyday objects like houses and cars. However, existing measures for these disparities have been limited to either human evaluations, which are time-consuming and costly, or automatic metrics evaluating full images, which are unable to attribute these disparities to specific parts of the generated images. In this work, we introduce a new set of metrics, Decomposed Indicators of Disparities in Image Generation (Decomposed-DIG), that allows us to separately measure geographic disparities in the depiction of objects and backgrounds in generated images. Using Decomposed-DIG, we audit a widely used latent diffusion model and find that generated images depict objects with better realism than backgrounds and that backgrounds in generated images tend to contain larger regional disparities than objects. We use Decomposed-DIG to pinpoint specific examples of disparities, such as stereotypical background generation in Africa, difficulty generating modern vehicles in Africa, and unrealistic placement of some objects in outdoor settings. Informed by our metric, we use a new prompting structure that enables a 52% worst-region improvement and a 20% average improvement in generated background diversity.

Translated: 2024-06-20 00:26:41 Published: 2024-06-17

# Delay Embedding Theory of Neural Sequence Models ( http://arxiv.org/abs/2406.11993v1 )

License: see the linked page

Mitchell Ostrow, Adam Eisen, Ila Fiete,

To generate coherent responses, language models infer unobserved meaning from their input text sequence. One potential explanation for this capability arises from theories of delay embeddings in dynamical systems, which prove that unobserved variables can be recovered from the history of only a handful of observed variables. To test whether language models are effectively constructing delay embeddings, we measure the capacities of sequence models to reconstruct unobserved dynamics. We trained 1-layer transformer decoders and state-space sequence models on next-step prediction from noisy, partially observed time series data. We found that each sequence layer can learn a viable embedding of the underlying system. However, state-space models have a stronger inductive bias than transformers: in particular, they more effectively reconstruct unobserved information at initialization, leading to more parameter-efficient models and lower error on dynamics tasks. Our work thus forges a novel connection between dynamical systems and deep learning sequence models via delay embedding theory.

Translated: 2024-06-20 00:26:41 Published: 2024-06-17
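To make the idea of a delay embedding concrete, the following minimal sketch stacks lagged copies of a single observed series into vectors. It illustrates the classical construction only, not the paper's models or measurements; the helper name `delay_embed`, the embedding dimension, the lag `tau`, and the sine-wave input are all assumptions chosen for the example.

```python
import math


def delay_embed(series, dim, tau):
    """Build delay vectors (x[t], x[t - tau], ..., x[t - (dim - 1) * tau])."""
    start = (dim - 1) * tau  # first index with a full history available
    return [
        [series[t - k * tau] for k in range(dim)]
        for t in range(start, len(series))
    ]


# One observed coordinate of a hypothetical system: a sampled sine wave.
x = [math.sin(0.1 * t) for t in range(200)]
embedded = delay_embed(x, dim=3, tau=5)
```

Each row of `embedded` is a point in a reconstructed state space built purely from the history of one observed variable, which is the sense in which unobserved variables can be recovered from past observations.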

# Witnessing network steerability of every bipartite entangled state without inputs ( http://arxiv.org/abs/2406.11994v1 )

License: see the linked page

Shubhayan Sarkar,

Entanglement and nonlocality are key resources for most quantum information protocols. It is now well established that every entangled state is nonlocal with respect to semi-quantum games. However, there is a lack of witnesses that can be used to observe any form of nonlocality of every entangled state. In this work, we focus on the swap-steering scenario without inputs and find linear witnesses of network steerability corresponding to any negative partial transpose (NPT) bipartite state and a large class of bipartite states that violate the computable cross-norm (CCN) criterion. Furthermore, by considering that the trusted party can perform tomography of the incoming subsystems, we construct linear inequalities to witness swap-steerability of every bipartite entangled state. Consequently, for every bipartite entangled state one can now observe a form of quantum nonlocality.

Translated: 2024-06-20 00:26:41 Published: 2024-06-17

# Exploring Gamification in Quantum Computing: The Qubit Factory ( http://arxiv.org/abs/2406.11995v1 )

License: see the linked page

Glen Evenbly,

Gamification of quantum theory can provide new inroads into the subject: by allowing users to experience simulated worlds that manifest obvious quantum behaviors, they can potentially build intuition for quantum phenomena. The Qubit Factory is an engineering-style puzzle game based on a gamified quantum circuit simulator that is designed to provide an introduction to qubits and quantum computing, while being approachable to those with no prior background in the area. It introduces an intuitive visual language for representing quantum states, gates and circuits, further enhanced by animations to aid in visualization. The Qubit Factory presents a hierarchy of increasingly difficult tasks for the user to solve, where each task requires the user to construct and run an appropriate classical/quantum circuit built from a small selection of components. Earlier tasks cover the fundamentals of qubits, quantum gates, superpositions and entanglement. Later tasks cover important quantum algorithms and protocols including superdense coding, quantum teleportation, entanglement distillation, classical and quantum error correction, state tomography, the Bernstein-Vazirani algorithm, quantum repeaters and more.

Translated: 2024-06-20 00:26:41 Published: 2024-06-17

# Look Further Ahead: Testing the Limits of GPT-4 in Path Planning ( http://arxiv.org/abs/2406.12000v1 )

License: see the linked page

Mohamed Aghzal, Erion Plaku, Ziyu Yao,

Large Language Models (LLMs) have shown impressive capabilities across a wide variety of tasks. However, they still face challenges with long-horizon planning. To study this, we propose path planning tasks as a platform to evaluate LLMs' ability to navigate long trajectories under geometric constraints. Our proposed benchmark systematically tests path-planning skills in complex settings. Using this, we examined GPT-4's planning abilities using various task representations and prompting approaches. We found that framing prompts as Python code and decomposing long trajectory tasks improve GPT-4's path planning effectiveness. However, while these approaches show some promise toward improving the planning ability of the model, they do not obtain optimal paths and fail at generalizing over extended horizons.

Translated: 2024-06-20 00:16:57 Published: 2024-06-17

# Modeling, Inference, and Prediction in Mobility-Based Compartmental Models for Epidemiology ( http://arxiv.org/abs/2406.12002v1 )

License: see the linked page

Ning Jiang, Weiqi Chu, Yao Li,

Classical compartmental models in epidemiology often struggle to accurately capture real-world dynamics due to their inability to address the inherent heterogeneity of populations. In this paper, we introduce a novel approach that incorporates heterogeneity through a mobility variable, transforming the traditional ODE system into a system of integro-differential equations that describe the dynamics of population densities across different compartments. Our results show that, for the same basic reproduction number, our mobility-based model predicts a smaller final pandemic size compared to classic compartmental models, whose population densities are represented as Dirac delta functions in our density-based framework. This addresses the overestimation issue common in many classical models. Additionally, we demonstrate that the time series of the infected population is sufficient to uniquely identify the mobility distribution. We reconstruct this distribution using a machine-learning-based framework, providing both theoretical and algorithmic support to effectively constrain the mobility-based model with real-world data.

Translated: 2024-06-20 00:16:57 Published: 2024-06-17
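The classical baseline that the abstract compares against can be illustrated with the textbook final-size relation of the homogeneous SIR model, $r = 1 - e^{-R_0 r}$, where $r$ is the fraction of the population eventually infected. The sketch below solves it by fixed-point iteration. It assumes an initially fully susceptible population and $R_0 > 1$, and it does not implement the paper's mobility-based integro-differential model; the function name `final_size` and the iteration count are illustrative choices.

```python
import math


def final_size(r0, iterations=200):
    """Solve r = 1 - exp(-r0 * r) for the classical SIR final epidemic size.

    Assumes r0 > 1 and an initially (almost) fully susceptible population.
    """
    r = 0.5  # any starting guess in (0, 1] converges to the positive root
    for _ in range(iterations):
        r = 1.0 - math.exp(-r0 * r)
    return r


size_r0_2 = final_size(2.0)
size_r0_3 = final_size(3.0)
```

For a basic reproduction number of 2 this classical relation gives a final size of roughly 80% of the population; the abstract's claim is that the mobility-based model predicts a smaller value at the same reproduction number.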

# P3GNN: A Privacy-Preserving Provenance Graph-Based Model for APT Detection in Software Defined Networking ( http://arxiv.org/abs/2406.12003v1 )

License: see the linked page

Hedyeh Nazari, Abbas Yazdinejad, Ali Dehghantanha, Fattane Zarrinkalam, Gautam Srivastava,

Software Defined Networking (SDN) has brought significant advancements in network management and programmability. However, this evolution has also heightened vulnerability to Advanced Persistent Threats (APTs), sophisticated and stealthy cyberattacks that traditional detection methods often fail to counter, especially in the face of zero-day exploits. A prevalent issue is the inadequacy of existing strategies to detect novel threats while addressing data privacy concerns in collaborative learning scenarios. This paper presents P3GNN (privacy-preserving provenance graph-based graph neural network model), a novel model that synergizes Federated Learning (FL) with Graph Convolutional Networks (GCN) for effective APT detection in SDN environments. P3GNN utilizes unsupervised learning to analyze operational patterns within provenance graphs, identifying deviations indicative of security breaches. Its core feature is the integration of FL with homomorphic encryption, which fortifies data confidentiality and gradient integrity during collaborative learning. This approach addresses the critical challenge of data privacy in shared learning contexts. Key innovations of P3GNN include its ability to detect anomalies at the node level within provenance graphs, offering a detailed view of attack trajectories and enhancing security analysis. Furthermore, the model's unsupervised learning capability enables it to identify zero-day attacks by learning standard operational patterns. Empirical evaluation using the DARPA TCE3 dataset demonstrates P3GNN's exceptional performance, achieving an accuracy of 0.93 and a low false positive rate of 0.06.

Translated: 2024-06-20 00:16:57 Published: 2024-06-17

# Lexidate: Model Evaluation and Selection with Lexicase ( http://arxiv.org/abs/2406.12006v1 )

License: see the linked page

Jose Guadalupe Hernandez, Anil Kumar Saini, Jason H. Moore,

Automated machine learning streamlines the task of finding effective machine learning pipelines by automating model training, evaluation, and selection. Traditional evaluation strategies, like cross-validation (CV), generate one value that averages the accuracy of a pipeline's predictions. This single value, however, may not fully describe the generalizability of the pipeline. Here, we present Lexicase-based Validation (lexidate), a method that uses multiple, independent prediction values for selection. Lexidate splits training data into a learning set and a selection set. Pipelines are trained on the learning set and make predictions on the selection set. The predictions are graded for correctness and used by lexicase selection to identify parent pipelines. Compared to 10-fold CV, lexicase reduces the training time. We test the effectiveness of three lexidate configurations within the Tree-based Pipeline Optimization Tool 2 (TPOT2) package on six OpenML classification tasks. In one configuration, we detected no difference in the accuracy of the final model returned from TPOT2 on most tasks compared to 10-fold CV. All configurations studied here returned similar or less complex final pipelines compared to 10-fold CV.

Translated: 2024-06-20 00:16:57 Published: 2024-06-17
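The abstract relies on lexicase selection to turn per-sample correctness into a choice of parent pipeline. A minimal sketch of generic lexicase selection follows; it is not TPOT2's implementation, and the name `lexicase_select`, the 0/1 correctness matrix, and the three hypothetical pipelines are assumptions made for illustration.

```python
import random


def lexicase_select(correct, rng):
    """Pick one candidate index by lexicase selection.

    correct[i][j] is 1 if candidate i predicted selection-set sample j
    correctly, else 0.
    """
    candidates = list(range(len(correct)))
    cases = list(range(len(correct[0])))
    rng.shuffle(cases)  # consider the samples in a random order
    for case in cases:
        best = max(correct[i][case] for i in candidates)
        # Keep only the candidates that are best on this sample.
        candidates = [i for i in candidates if correct[i][case] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)


# Three hypothetical pipelines graded on three selection-set samples.
correct = [
    [1, 1, 0],  # pipeline 0
    [1, 0, 1],  # pipeline 1
    [0, 0, 0],  # pipeline 2 is never best on any sample
]
picks = {lexicase_select(correct, random.Random(seed)) for seed in range(50)}
```

Because each sample is considered on its own, pipelines 0 and 1 can both be selected even though their average accuracy is identical, which is the kind of information a single averaged CV score discards.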

# Supervised binary classification of small-scale digits images with a trapped-ion quantum processor ( http://arxiv.org/abs/2406.12007v1 )

License: see the linked page

Ilia V. Zalivako, Alexander I. Gircha, Anastasiia S. Nikolaeva, Denis A. Drozhzhin, Alexander S. Borisenko, Andrei E. Korolkov, Nikita V. Semenin, Kristina P. Galstyan, Pavel A. Kamenskikh, Vasilii N. Smirnov, Mikhail A. Aksenov, Pavel L. Sidorov, Evgeniy O. Kiktenko, Ksenia Yu. Khabarova, Aleksey K. Fedorov, Nikolay N. Kolachevsky, Ilya A. Semerikov,

Here we present the results of benchmarking a quantum processor based on trapped $^{171}$Yb$^{+}$ ions by performing basic quantum machine learning algorithms. Specifically, we carry out a supervised binary classification of small-scale digits images, which are intentionally chosen so that they can be classified with 100% accuracy, using a quantum-enhanced Support Vector Machine algorithm with up to four qubits. In our work, we specifically consider different types of quantum encodings of the dataset and different levels of transpilation optimizations for the corresponding quantum circuits. For each quantum encoding, we obtain a classifier that is of 100% accuracy on both training and test sets, which demonstrates that the quantum processor can correctly solve the basic classification task considered. We expect that, as the capabilities of quantum processors increase, they can become a useful tool for machine learning.

Translated: 2024-06-20 00:16:57 Published: 2024-06-17

# QC-Forest: a Classical-Quantum Algorithm to Provably Speedup Retraining of Random Forest ( http://arxiv.org/abs/2406.12008v1 )

License: see the linked page

Romina Yalovetzky, Niran Kumar, Changhao Li, Marco Pistoia,

Random Forest (RF) is a popular tree-ensemble method for supervised learning, prized for its ease of use and flexibility. Online RF models need to account for new training data to maintain model accuracy. This is particularly important in applications where data is periodically and sequentially generated over time in data streams, such as auto-driving systems and credit card payments. In this setting, performing periodic model retraining with the old and new data accumulated is beneficial as it fully captures possible drifts in the data distribution over time. However, this is impractical with state-of-the-art classical algorithms for RF as they scale linearly with the accumulated number of samples. We propose QC-Forest, a classical-quantum algorithm designed to time-efficiently retrain RF models in the streaming setting for multi-class classification and regression, achieving a runtime poly-logarithmic in the total number of accumulated samples. QC-Forest leverages Des-q, a quantum algorithm for single tree construction and retraining proposed by Kumar et al., extending it to multi-class classification, as the original proposal was limited to binary classes, and introducing an exact classical method to replace an underlying quantum subroutine incurring a finite error, while maintaining the same poly-logarithmic dependence. Finally, we showcase that QC-Forest achieves competitive accuracy in comparison to state-of-the-art RF methods on widely used benchmark datasets with up to 80,000 samples, while significantly speeding up model retraining.

Translated: 2024-06-20 00:16:57 Published: 2024-06-17

# FinTruthQA: A Benchmark Dataset for Evaluating the Quality of Financial Information Disclosure ( http://arxiv.org/abs/2406.12009v1 )

License: see the linked page

Ziyue Xu, Peilin Zhou, Xinyu Shi, Jiageng Wu, Yikang Jiang, Bin Ke, Jie Yang,

Accurate and transparent financial information disclosure is crucial in the fields of accounting and finance, ensuring market efficiency and investor confidence. Among many information disclosure platforms, the Chinese stock exchanges' investor interactive platform provides a novel and interactive way for listed firms to disclose information of interest to investors through an online question-and-answer (Q&A) format. However, it is common for listed firms to respond to questions with limited or no substantive information, and automatically evaluating the quality of financial information disclosure on large amounts of Q&A pairs is challenging. This paper builds a benchmark, FinTruthQA, that can evaluate advanced natural language processing (NLP) techniques for the automatic quality assessment of information disclosure in financial Q&A data. FinTruthQA comprises 6,000 real-world financial Q&A entries, and each Q&A was manually annotated based on four conceptual dimensions of accounting. We benchmarked various NLP techniques on FinTruthQA, including statistical machine learning models, pre-trained language models and their fine-tuned versions, as well as the large language model GPT-4. Experiments showed that existing NLP models have strong predictive ability for real question identification and question relevance tasks, but are suboptimal for answer relevance and answer readability tasks. By establishing this benchmark, we provide a robust foundation for the automatic evaluation of information disclosure, significantly enhancing the transparency and quality of financial reporting. FinTruthQA can be used by auditors, regulators, and financial analysts for real-time monitoring and data-driven decision-making, as well as by researchers for advanced studies in accounting and finance, ultimately fostering greater trust and efficiency in the financial markets.

Translated: 2024-06-20 00:16:57 Published: 2024-06-17

# AIフェアネスのためのトランスダクティブアプローチのメリットとリスク

The Benefits and Risks of Transductive Approaches for AI Fairness ( http://arxiv.org/abs/2406.12011v1 )

ライセンス: Link先を確認

Muhammed Razzak, Andreas Kirsch, Yarin Gal,

(参考訳) 近年,機械学習モデルの速度,精度,公平性を向上する可能性から,学習中にホールドアウトセットを利用するトランスダクティブ学習法が人気を集めている。それにもかかわらず、ホールドアウト集合の構成そのもの、特に敏感な部分群のバランスは、ほとんど見過ごされてきている。 CIFARとCelebAデータセットを用いた実験により、ホールドアウトセットの組成変化がフェアネス指標に大きく影響することが示された。不均衡なホールトアウトセットは、既存の格差を悪化させ、バランスの取れたホールトアウトは、不均衡なトレーニングデータによってもたらされる問題を緩和する。これらの知見は,多様かつ代表的であるホールドアウトセットの構築の必要性を浮き彫りにしている。

Recently, transductive learning methods, which leverage holdout sets during training, have gained popularity for their potential to improve speed, accuracy, and fairness in machine learning models. Despite this, the composition of the holdout set itself, particularly the balance of sensitive sub-groups, has been largely overlooked. Our experiments on CIFAR and CelebA datasets show that compositional changes in the holdout set can substantially influence fairness metrics. Imbalanced holdout sets exacerbate existing disparities, while balanced holdouts can mitigate issues introduced by imbalanced training data. These findings underline the necessity of constructing holdout sets that are both diverse and representative.

翻訳日:2024-06-20 00:16:57 公開日:2024-06-17

# 注意シンクによる大規模言語モデル量子化のためのアクティベーションアウトレイラの緩和

Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization ( http://arxiv.org/abs/2406.12016v1 )

ライセンス: Link先を確認

Seungwoo Son, Wonpyo Park, Woohyun Han, Kyuyeun Kim, Jaeho Lee,

(参考訳) LLM量子化の最近の進歩にもかかわらず、アクティベーション量子化は、アクティベーションアウトレーヤのために困難である。従来の改善、例えば、異なるチャネルの精度の混合、追加のオーバーヘッドの導入、スピードアップの削減。本研究では,問題トークンの発生を防止し,アクティベーション単位の量子化を促進するための簡易かつ効果的な戦略を開発する。正確には、プレフィックスとして挿入された後続のトークンの外部化を緩和するCushionCacheという、キー値キャッシュのセットを見つける方法を提案する。 CushionCacheは2つのステップで動作します。まず最初に、後続のトークンにおける最大アクティベーション値を最小限に抑えるプロンプトトークンシーケンスを探します。次に、トークンキャッシュを調整して、その後のトークンのアクティベーションを、より量子化しやすいように調整する。提案手法は, LLMのアクティベーション・アウトレイラに対処し, アクティベーション・量子化法の性能向上に寄与する。我々は,この手法を広範囲のモデルとベンチマークで徹底的に評価し,拡張子ごとのW8A8量子化の確立されたベースラインをはるかに上回り,最近のアクティベーション量子化法とシームレスに統合できることを見出した。

Despite recent advances in LLM quantization, activation quantization remains challenging due to activation outliers. Conventional remedies, e.g., mixing precisions for different channels, introduce extra overhead and reduce the speedup. In this work, we develop a simple yet effective strategy to facilitate per-tensor activation quantization by preventing the generation of problematic tokens. Precisely, we propose a method to find a key-value cache, coined CushionCache, which mitigates outliers in subsequent tokens when inserted as a prefix. CushionCache works in two steps: First, we greedily search for a prompt token sequence that minimizes the maximum activation values in subsequent tokens. Then, we further tune the token cache to regularize the activations of subsequent tokens to be more quantization-friendly. The proposed method successfully addresses activation outliers of LLMs, providing a substantial performance boost for per-tensor activation quantization methods. We thoroughly evaluate our method over a wide range of models and benchmarks and find that it significantly surpasses the established baseline of per-tensor W8A8 quantization and can be seamlessly integrated with the recent activation quantization method.

翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
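To make the outlier problem in the CushionCache entry above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is not the paper's method: the function names and the toy activation values are assumptions chosen only to illustrate why a single large activation hurts when one scale is shared by the whole tensor.

```python
def quantize_per_tensor_int8(values):
    """Symmetric per-tensor quantization: one scale shared by every value."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to floating-point values."""
    return [c * scale for c in codes]


def mean_abs_error(values):
    """Average reconstruction error after a quantize/dequantize round trip."""
    codes, scale = quantize_per_tensor_int8(values)
    restored = dequantize(codes, scale)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Hypothetical activations: the same small values, with and without one outlier.
typical = [0.5, -1.2, 0.8, 1.0, -0.3, 0.1]
with_outlier = typical + [120.0]

err_typical = mean_abs_error(typical)
err_outlier = mean_abs_error(with_outlier)
print(f"error without outlier: {err_typical:.4f}")
print(f"error with outlier:    {err_outlier:.4f}")
```

Because the scale is set by the largest magnitude, the outlier stretches the quantization grid and the small values collapse onto a handful of codes. This is the effect the abstract says the prefix is meant to mitigate.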

# スプライシングイテレーションによる空間制約最適化

Sparsity-Constraint Optimization via Splicing Iteration ( http://arxiv.org/abs/2406.12017v1 )

ライセンス: Link先を確認

Zezhi Wang, Jin Zhu, Junxian Zhu, Borui Tang, Hongmei Lin, Xueqin Wang,

(参考訳) スパシティ制約最適化は、信号処理、統計処理、機械学習に広く適用可能である。既存の高速アルゴリズムは、ステップサイズや正確な停止基準の実装などのパラメータをかなり調整しなければならない。この問題に対処するため、sPlicing itEration (SCOPE) を用いて、低次元部分空間における強い凸性と滑らか性を持つ非線形微分対象関数を最適化するスペーサ性制約最適化アルゴリズムを開発した。アルゴリズム的には、SCOPEアルゴリズムはパラメータをチューニングせずに効率的に収束する。理論的には、SCOPE は線型収束率を持ち、空間が正しく特定されたときに真の支持集合を回復する解に収束する。また,制約等距離-固有型条件を伴わない並列理論結果も開発する。 SCOPEの汎用性とパワーを適用し、スパース2次最適化を解き、スパース分類器を学習し、バイナリ変数のスパースマルコフネットワークを復元する。これらの特定のタスクに関する数値結果から、SCOPEは10-1000の精度で真のサポートセットを正しく識別し、SCOPEのアルゴリズム的および理論的利点を確認する。 C++実装に基づいたオープンソースのPythonパッケージのskscopeがGitHubで公開されており、cvxpyライブラリによって実装された競合する凸緩和メソッドの10倍のスピードアップに達しています。

Sparsity-constraint optimization has wide applicability in signal processing, statistics, and machine learning. Existing fast algorithms must burdensomely tune parameters, such as the step size or the implementation of precise stop criteria, which may be challenging to determine in practice. To address this issue, we develop an algorithm named Sparsity-Constraint Optimization via sPlicing itEration (SCOPE) to optimize nonlinear differentiable objective functions with strong convexity and smoothness in low dimensional subspaces. Algorithmically, the SCOPE algorithm converges effectively without tuning parameters. Theoretically, SCOPE has a linear convergence rate and converges to a solution that recovers the true support set when it correctly specifies the sparsity. We also develop parallel theoretical results without restricted-isometry-property-type conditions. We apply SCOPE's versatility and power to solve sparse quadratic optimization, learn sparse classifiers, and recover sparse Markov networks for binary variables. The numerical results on these specific tasks reveal that SCOPE perfectly identifies the true support set with a 10-1000x speedup over the standard exact solver, confirming SCOPE's algorithmic and theoretical merits. Our open-source Python package skscope, based on a C++ implementation, is publicly available on GitHub, reaching a ten-fold speedup over the competing convex relaxation methods implemented by the cvxpy library.

翻訳日:2024-06-20 00:16:57 公開日:2024-06-17

# CItruS:ロングシーケンスモデリングのためのチャンクインストラクション対応状態推定

CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling ( http://arxiv.org/abs/2406.12018v1 )

ライセンス: Link先を確認

Yu Bai, Xiyuan Zou, Heyan Huang, Sanxing Chen, Marc-Antoine Rondeau, Yang Gao, Jackie Chi Kit Cheung,

(参考訳) 大規模言語モデル(LLM)が進歩し続け、長いシーケンスモデリングが広く関心を集めている。近年の研究では、Transformerモデルのキー値キャッシュ内の隠れ状態の大部分を、長いシーケンスを生成する際のパープレキシティのパフォーマンスに影響を与えることなく、破棄(除去)することができることが確認されている。しかし,これらの手法は,難易度を保ちながら,下流課題の解決に重要な情報を降ろすことがしばしばある。この問題に対処するために、下流タスクに有用な注目度を隠蔽状態の消去プロセスに統合する新しいモデリング手法であるChunked Instruction-Aware State Eviction (CItruS)を紹介する。さらに,チャンクシーケンス処理の効率向上のための手法を設計する。トレーニング不要な手法は,言語モデリングの難易度を保ちながら,同じメモリ予算の下で,複数の強いベースライン上での長いシーケンス理解および検索タスクにおいて優れた性能を示す。

Long sequence modeling has gained broad interest as large language models (LLMs) continue to advance. Recent research has identified that a large portion of hidden states within the key-value caches of Transformer models can be discarded (also termed evicted) without affecting the perplexity performance in generating long sequences. However, we show that these methods, despite preserving perplexity performance, often drop information that is important for solving downstream tasks, a problem which we call information neglect. To address this issue, we introduce Chunked Instruction-aware State Eviction (CItruS), a novel modeling technique that integrates the attention preferences useful for a downstream task into the eviction process of hidden states. In addition, we design a method for chunked sequence processing to further improve efficiency. Our training-free method exhibits superior performance on long sequence comprehension and retrieval tasks over several strong baselines under the same memory budget, while preserving language modeling perplexity.

翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
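The CItruS entry above describes evicting cached hidden states according to the attention preferences of the downstream instruction. The sketch below shows one way such a selection could look. It is a toy under stated assumptions, not the authors' implementation: the scoring rule (summing attention weights over instruction tokens) and the name `select_states_to_keep` are hypothetical.

```python
def select_states_to_keep(attention_rows, budget):
    """Keep the `budget` cached positions that instruction tokens attend to most.

    attention_rows: one row per instruction token; each row holds one
    attention weight per cached position (assumed layout).
    Returns the kept positions in their original order.
    """
    num_positions = len(attention_rows[0])
    scores = [sum(row[p] for row in attention_rows) for p in range(num_positions)]
    ranked = sorted(range(num_positions), key=lambda p: scores[p], reverse=True)
    return sorted(ranked[:budget])


# Hypothetical attention from two instruction tokens over five cached states.
instruction_attention = [
    [0.05, 0.40, 0.05, 0.30, 0.20],
    [0.10, 0.35, 0.05, 0.45, 0.05],
]
kept = select_states_to_keep(instruction_attention, budget=2)
print(kept)
```

A perplexity-driven policy would score positions by the attention of the tokens being generated. Scoring by the instruction instead is the kind of change the abstract describes for countering what it calls information neglect.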

# 暗号化されたワイヤレス電力をハッキング:ダイナミック充電のサイバーセキュリティ

Hacking Encrypted Wireless Power: Cyber-Security of Dynamic Charging ( http://arxiv.org/abs/2406.12019v1 )

ライセンス: Link先を確認

Hui Wang, Nima Tashakor, Wei Jiang, Wei Liu, C. Q. Jiang, Stefan M. Goetz,

(参考訳) 近年,無線電力伝送のためのエネルギー暗号化が開発されており,公共の場所では未許可のエネルギー抽出を抑制することが重要である。ほとんどの技術は周波数に変化があり、非共鳴のため、許可されていない受信機はエネルギーを抽出できない。しかし、この戦略は信頼できない。エネルギー暗号化技術の進歩を刺激し,セキュリティホールを指摘するために,暗号化周波数可変無線電力伝送の基本原理に対する復号法を提案する。本論文は、補助コイルを用いて周波数を検出するとともに、スイッチトキャパシタアレイを用いて広い周波数範囲の受信機を適応的に補償する。スイッチングキャパシタアレイは、2つのキャパシタと1つのセミコンダクタスイッチを含む。 1つのコンデンサは受信機を常に補償し、もう1つの無線電力伝達サイクル中のアクティブな時間はスイッチによって制御される。このようにして、提案したハッキング受信機は、補償の等価容量を制御し、エネルギーを盗む。最後に、詳細なシミュレーションモデルと実験結果により、周波数ホッピングエネルギー暗号化に対する攻撃の有効性が証明された。抽出された非無視エネルギーは問題となるだろうが、認証された受信機が得るエネルギーの78%から84%を盗むことに成功した。周波数が変わると、インターセプターは非常に急速に調整され、高速な周波数変化暗号化システムにハックされる。

Recently, energy encryption for wireless power transfer has been developed for energy safety, which is important in public places to suppress unauthorized energy extraction. Most techniques vary the frequency so that unauthorized receivers cannot extract energy because of non-resonance. However, this strategy is unreliable. To stimulate the progress of energy encryption technology and point out security holes, this paper proposes a decryption method for the fundamental principle of encrypted frequency-varying wireless power transfer. The paper uses an auxiliary coil to detect the frequency and a switched-capacitor array to adaptively compensate the receiver for a wide frequency range. The switched-capacitor array contains two capacitors and one semiconductor switch. One capacitor compensates the receiver all the time, while the other's active time during one wireless power transfer cycle is regulated by the switch. Thus, the proposed hacking receiver controls the equivalent capacitance of the compensation and steals energy. Finally, a detailed simulation model and experimental results prove the effectiveness of the attack on frequency-hopping energy encryption. Although any nonnegligible energy extracted would be problematic, we succeeded in stealing 78% to 84% of the energy an authorized receiver could get. When the frequency changes, the interceptor is coarsely tuned very quickly, so it can hack fast frequency-varying encrypted systems.

翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
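The attack in the wireless power entry above rests on retuning the receiver so that it resonates at whatever frequency the transmitter hops to. The sketch below works through the standard LC resonance relation, f = 1 / (2π√(LC)). It also assumes a simplified duty-cycle model in which the equivalent compensation capacitance is the fixed capacitor plus the switched capacitor scaled by its active fraction. That model and all component values are illustrative assumptions, not figures from the paper.

```python
import math


def required_capacitance(frequency_hz, inductance_h):
    """Capacitance that makes an LC receiver resonate at the given frequency."""
    return 1.0 / ((2 * math.pi * frequency_hz) ** 2 * inductance_h)


def resonant_frequency(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC circuit."""
    return 1.0 / (2 * math.pi * math.sqrt(inductance_h * capacitance_f))


def duty_for_frequency(frequency_hz, inductance_h, c_fixed, c_switched):
    """Active fraction of the switched capacitor under the assumed model
    C_eq = c_fixed + duty * c_switched, clamped to the range [0, 1]."""
    needed = required_capacitance(frequency_hz, inductance_h) - c_fixed
    return min(1.0, max(0.0, needed / c_switched))


# Hypothetical receiver: 100 uH coil, 20 nF fixed and 30 nF switched capacitor.
coil_inductance = 100e-6
duty_at_85khz = duty_for_frequency(85e3, coil_inductance, 20e-9, 30e-9)
duty_at_100khz = duty_for_frequency(100e3, coil_inductance, 20e-9, 30e-9)
print(f"duty at 85 kHz:  {duty_at_85khz:.3f}")
print(f"duty at 100 kHz: {duty_at_100khz:.3f}")
```

A higher hop frequency needs less capacitance, so the switched capacitor is active for a smaller fraction of the cycle. In this toy model, tracking the detected frequency reduces to recomputing one duty value.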

# Boxがタグ認識のレコメンデーションでグラフニューラルネットワークと出会う

When Box Meets Graph Neural Network in Tag-aware Recommendation ( http://arxiv.org/abs/2406.12020v1 )

ライセンス: Link先を確認

Fake Lin, Ziwei Zhao, Xi Zhu, Da Zhang, Shitian Shen, Xueying Li, Tong Xu, Suojuan Zhang, Enhong Chen,

(参考訳) 昨年は、LLM強化タグがサポートしているタグ対応レコメンデーションシステムの再構築を目撃している。残念ながら、大きな努力がなされているが、現在のソリューションでは、タグ駆動プロファイルのみを使用して、ユーザの好みに固有の多様性と不確実性を記述できない可能性がある。近年, ボックス埋め込みなどの幾何学的手法の開発により, 高次元空間におけるボックス内の範囲として, ユーザの好みの多様性を完全にモデル化できるようになった。しかし、これらの手法は高階隣の信号、すなわちユーザ・タグ・イテム三部グラフ内のセマンティック・リッチなマルチホップ関係をキャプチャできないため、欠陥は依然として存在し、ユーザ・モデリングの有効性は著しく制限される。この課題に対処するため、我々はBoxGNNと呼ばれる新しいアルゴリズムを提案し、論理演算を組み合わせてメッセージアグリゲーションを行い、高次信号を組み込む。具体的には、まずユーザ、アイテム、タグを表現空間の単純なポイントではなくハイパーボックスとして埋め込み、その後のプロセスを促進するために2つの論理演算を定義する。次に、論理演算の組み合わせによりメッセージ集約機構を実行し、対応する高階ボックス表現を得る。最後に,ボックスの表現を洗練させるために,Gumbelスムース化技術を用いたボリュームベース学習手法を採用する。 2つの公開データセットと1つのLLM強化eコマースデータセットに関する大規模な実験は、さまざまな最先端ベースラインと比較してBoxGNNの優位性を検証した。コードはオンラインでリリースされます

The past year has witnessed the resurgence of tag-aware recommender systems supported by LLM-enriched tags. Unfortunately, though large efforts have been made, current solutions may fail to describe the diversity and uncertainty inherent in user preferences with only tag-driven profiles. Recently, with the development of geometry-based techniques, e.g., box embedding, the diversity of user preferences can now be fully modeled as the range within a box in high-dimensional space. However, a defect still exists, as these approaches are incapable of capturing high-order neighbor signals, i.e., semantic-rich multi-hop relations within the user-tag-item tripartite graph, which severely limits the effectiveness of user modeling. To deal with this challenge, in this paper, we propose a novel algorithm, called BoxGNN, to perform the message aggregation via a combination of logical operations, thereby incorporating high-order signals. Specifically, we first embed users, items, and tags as hyper-boxes rather than simple points in the representation space, and define two logical operations to facilitate the subsequent process. Next, we perform the message aggregation mechanism via the combination of logical operations, to obtain the corresponding high-order box representations. Finally, we adopt a volume-based learning objective with Gumbel smoothing techniques to refine the representation of boxes. Extensive experiments on two publicly available datasets and one LLM-enhanced e-commerce dataset have validated the superiority of BoxGNN compared with various state-of-the-art baselines. The code is released online.

翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
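The BoxGNN entry above represents users, items, and tags as hyper-boxes and trains on box volumes. The sketch below illustrates the underlying geometry with hard, axis-aligned boxes: intersecting two boxes and scoring a user-item pair by the volume of the overlap. The abstract mentions Gumbel smoothing, which this sketch deliberately replaces with plain hard boxes for readability. The function names and coordinates are hypothetical.

```python
def intersect(box_a, box_b):
    """Intersection of two axis-aligned boxes given as (lower, upper) corners."""
    lower = [max(a, b) for a, b in zip(box_a[0], box_b[0])]
    upper = [min(a, b) for a, b in zip(box_a[1], box_b[1])]
    return lower, upper


def volume(box):
    """Volume of a box; an empty intersection has volume zero."""
    result = 1.0
    for lo, hi in zip(box[0], box[1]):
        result *= max(0.0, hi - lo)
    return result


def overlap_score(user_box, item_box):
    """Score a user-item pair by how much of the space their boxes share."""
    return volume(intersect(user_box, item_box))


# Hypothetical 2-D embeddings.
user = ([0.0, 0.0], [4.0, 4.0])
item_inside = ([1.0, 1.0], [3.0, 3.0])
item_outside = ([5.0, 5.0], [6.0, 6.0])

print(overlap_score(user, item_inside))
print(overlap_score(user, item_outside))
```

Hard boxes give zero gradient once two boxes stop overlapping, which is one common motivation for smoothed variants such as the Gumbel technique the abstract names.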

# 強化学習によるアセストラル組換えグラフの構築

Constructing Ancestral Recombination Graphs through Reinforcement Learning ( http://arxiv.org/abs/2406.12022v1 )

ライセンス: Link先を確認

Mélanie Raymond, Marie-Hélène Descary, Cédric Beaulac, Fabrice Larribe,

(参考訳) 長年にわたり、祖先の組換えグラフ(ARG)を構築するために多くのアプローチが提案されてきた。これらの方法の中で、最も可能性の高いグラフが最短であるという仮定に頼っているものが多い。本稿では,短いARG(Reinforcement Learning: RL)を構築するための新しいアプローチを提案する。我々は,一組の遺伝的配列とそれらの最も最近の共通の祖先の最も短い経路を見つけることと,迷路の入り口と出口の間の最も短い経路を見つけることと,古典的なRL問題との類似性を生かした。迷路問題では、学習者はエージェントと呼ばれ、できるだけ早く脱出するために取るべき方向を学ばなければならないが、この問題では、エージェントはできるだけ早く最新の共通の祖先に到達するために、合理化、突然変異、組換えの間の行動を学ぶ必要がある。以上の結果から,RLは短いARGを構築するために最適化されたヒューリスティックアルゴリズムで構築されたARGと同等に短時間で構築できることが示唆された。さらに,本手法では,与えられたサンプルに対して短いARGの分布を構築することができ,学習プロセス中に使用されていない新しいサンプルに学習を一般化することができる。

Over the years, many approaches have been proposed to build ancestral recombination graphs (ARGs), graphs used to represent the genetic relationship between individuals. Among these methods, many rely on the assumption that the most likely graph is among the shortest ones. In this paper, we propose a new approach to build short ARGs: Reinforcement Learning (RL). We exploit the similarities between finding the shortest path between a set of genetic sequences and their most recent common ancestor and finding the shortest path between the entrance and exit of a maze, a classic RL problem. In the maze problem, the learner, called the agent, must learn the directions to take in order to escape as quickly as possible, whereas in our problem, the agent must learn the actions to take between coalescence, mutation, and recombination in order to reach the most recent common ancestor as quickly as possible. Our results show that RL can be used to build ARGs as short as those built with a heuristic algorithm optimized to build short ARGs, and sometimes even shorter. Moreover, our method makes it possible to build a distribution of short ARGs for a given sample, and can also generalize learning to new samples not used during the learning process.

翻訳日:2024-06-20 00:16:57 公開日:2024-06-17

# LiLiuM:eBayのeコマースのための大規模言語モデル

LiLiuM: eBay's Large Language Models for e-commerce ( http://arxiv.org/abs/2406.12023v1 )

ライセンス: Link先を確認

Christian Herold, Michael Kozielski, Leonid Ekimov, Pavel Petrushkov, Pierre-Yves Vandenbussche, Shahram Khadivi,

(参考訳) 1B、7B、13Bパラメータモデルは、eBayのeコマース領域における特定のニーズに合うように、100%社内で開発された。これにより、eBayは、ライセンス、データ、語彙、アーキテクチャを含むモデルのすべての側面を完全にコントロールできる。これらのモデルは、細調整と命令チューニングの基礎として使われ、外部モデルへの依存をなくすことを期待しています。 LiLiuM LLMは、一般およびeコマースドメインから3兆個の多言語テキストのトークンで訓練されている。それらは、英語の自然言語理解(NLU)ベンチマークで人気のあるLLaMA-2モデルと似ている。同時に、非英語NLUタスク、機械翻訳、電子商取引特化下流タスクにおいてLLaMA-2を上回ります。データミックスの一部として、新たにリリースされたRedPajama-V2データセットを使用して、データのフィルタリングと重複に関する洞察を共有します。また,自己回帰言語モデリングにおける構造化データのシリアライズ方法についても詳細に論じる。事前学習におけるコードと並列機械翻訳データの影響について考察する。さらに,電子商取引用にカスタマイズされた独自のトークンとモデル語彙を開発する。これにより、LLaMA-2と比較してeBay固有のダウンストリームタスクでテキスト生成を最大34%高速化できます。最後に,LLM事前学習に関して,最良個々人のチェックポイントよりも,チェックポイント平均化がさらに向上することを示す。

We introduce the LiLiuM series of large language models (LLMs): 1B, 7B, and 13B parameter models developed 100% in-house to fit eBay's specific needs in the e-commerce domain. This gives eBay full control over all aspects of the models including license, data, vocabulary, and architecture. We expect these models to be used as a foundation for fine-tuning and instruction-tuning, eliminating dependencies on external models. The LiLiuM LLMs have been trained on 3 trillion tokens of multilingual text from the general and e-commerce domains. They perform similarly to the popular LLaMA-2 models on English natural language understanding (NLU) benchmarks. At the same time, we outperform LLaMA-2 on non-English NLU tasks, machine translation, and e-commerce-specific downstream tasks. As part of our data mixture, we utilize the newly released RedPajama-V2 dataset for training and share our insights regarding data filtering and deduplication. We also discuss in detail how to serialize structured data for use in autoregressive language modeling. We provide insights on the effects of including code and parallel machine translation data in pre-training. Furthermore, we develop our own tokenizer and model vocabulary, customized towards e-commerce. This way, we can achieve up to 34% speed-up in text generation on eBay-specific downstream tasks compared to LLaMA-2. Finally, in relation to LLM pretraining, we show that checkpoint averaging can further improve over the best individual model checkpoint.

翻訳日:2024-06-20 00:16:57 公開日:2024-06-17
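The LiLiuM entry above reports that checkpoint averaging can improve over the best single checkpoint. As a rough illustration of the general technique, the sketch below averages parameters element-wise across checkpoints stored as plain dictionaries. The data layout and the name `average_checkpoints` are assumptions, and the abstract does not say which checkpoints eBay averaged or how.

```python
def average_checkpoints(checkpoints):
    """Element-wise mean of parameters across checkpoints.

    Each checkpoint is assumed to be a dict mapping a parameter name to a
    flat list of floats, with identical names and shapes in every checkpoint.
    """
    count = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(column) / count for column in columns]
    return averaged


# Hypothetical checkpoints from three late training steps.
saved = [
    {"weight": [1.0, 2.0], "bias": [0.0]},
    {"weight": [3.0, 4.0], "bias": [1.0]},
    {"weight": [5.0, 6.0], "bias": [2.0]},
]
merged = average_checkpoints(saved)
print(merged)
```

With a real framework the same idea applies to each tensor in the model's state dictionary. Averaging is usually restricted to checkpoints from the same run, since parameters from unrelated runs are not aligned.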

# 技術的負債の進展予測と予測に関する体系的文献レビュー

Systematic literature review on forecasting and prediction of technical debt evolution ( http://arxiv.org/abs/2406.12026v1 )

ライセンス: Link先を確認

Adekunle Ajibode, Yvon Apedo, Temitope Ajibode,

(参考訳) コンテキスト: 技術的負債(TD)とは、ソフトウェア品質の妥協によって生じる追加コストを指し、開発期間中に短期的なメリットを提供するが、長期的な品質を損なう可能性がある。正確なTD予測と予測は、インフォメーションソフトウェア保守と積極的管理に不可欠である。しかし、この研究領域では、利用可能な予測技術に関する包括的な資料が欠落している。目的:本研究は、TD進化を予測するために、ソフトウェア工学における既存の知識を探求し、研究と産業で提案されるアプローチの洞察を得ることを目的としている。方法: この目的を達成するため, 2023年までの646の異なる論文を網羅した体系的文献レビューを行った。ソフトウェア工学の確立された方法論に従って、分析のための14の主研究を特定し、含めた。結果: この分析からTD進化予測への様々なアプローチが明らかになった。特に、ランダムな森林と時間的畳み込みネットワークは、一次研究の結果に基づく他の手法に比べて優れた性能を示した。しかしながら、これらのアプローチは15の特定されたTDタイプのうち、特にコード負債とアーキテクチャ負債の2つにのみ対処します。結論:TD進化予測の研究はまだ初期段階であり,多くの課題が未解決のまま残されている。そこで本稿では,既存のギャップを埋めるためにさらなる調査を必要とするいくつかの研究方向を提案する。キーワード:システム文献レビュー、技術的負債、技術的負債予測、技術的負債予測、技術的負債メトリクス

Context: Technical debt (TD) refers to the additional costs incurred due to compromises in software quality, providing short-term advantages during development but potentially compromising long-term quality. Accurate TD forecasting and prediction are vital for informed software maintenance and proactive management. However, this research area lacks comprehensive documentation on the available forecasting techniques. Objective: This study aims to explore existing knowledge in software engineering to gain insights into approaches proposed in research and industry for forecasting TD evolution. Methods: To achieve this objective, we conducted a Systematic Literature Review encompassing 646 distinct papers published until 2023. Following established methodology in software engineering, we identified and included 14 primary studies for analysis. Result: Our analysis unveiled various approaches for TD evolution forecasting. Notably, random forest and temporal convolutional networks demonstrated superior performance compared to other methods based on the result from the primary studies. However, these approaches only address two of the fifteen identified TD types, specifically Code debt and Architecture debt, while disregarding the remaining types. Conclusion: Our findings indicate that research on TD evolution forecasting is still in its early stages, leaving numerous challenges unaddressed. Therefore, we propose several research directions that require further investigation to bridge the existing gaps. Keywords: Systematic literature review, Technical debt, Technical debt prediction, Technical debt forecasting, Technical debt metrics

翻訳日:2024-06-20 00:07:11 公開日:2024-06-17

# 敵の摂動は、アーティストを生成AIから確実に保護できない

Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI ( http://arxiv.org/abs/2406.12027v1 )

ライセンス: Link先を確認

Robert Hönig, Javier Rando, Nicholas Carlini, Florian Tramèr,

(参考訳) アーティストたちは、独自の芸術スタイルを忠実に再現できる画像生成モデルの進歩をますます懸念している。これに対し、オンラインで公開されたアートワークに小さな敵の摂動を組み込んだ、スタイルの模倣に対する保護ツールがいくつか開発されている。本研究では,一般的な保護 – 数百万ダウンロード – の有効性を評価し,セキュリティに関する誤った感覚のみを提供することを示す。画像アップスケーリングのような低努力と「オフ・ザ・シェルフ」技術は、既存の保護を著しく劣化させる堅牢な模倣手法を作成するのに十分であることがわかった。ユーザスタディを通じて、既存の保護は簡単にバイパスでき、アーティストはスタイルの模倣に弱いままであることを示す。我々は、敵対的摂動に基づくツールが、生成的AIの誤用からアーティストを確実に保護できないことを警告し、代替技術以外のソリューションの開発を促す。

Artists are increasingly concerned about advancements in image generation models that can closely replicate their unique artistic styles. In response, several protection tools against style mimicry have been developed that incorporate small adversarial perturbations into artworks published online. In this work, we evaluate the effectiveness of popular protections -- with millions of downloads -- and show they only provide a false sense of security. We find that low-effort and "off-the-shelf" techniques, such as image upscaling, are sufficient to create robust mimicry methods that significantly degrade existing protections. Through a user study, we demonstrate that all existing protections can be easily bypassed, leaving artists vulnerable to style mimicry. We caution that tools based on adversarial perturbations cannot reliably protect artists from the misuse of generative AI, and urge the development of alternative non-technological solutions.

翻訳日:2024-06-20 00:07:11 公開日:2024-06-17

# SPA-VL:視覚言語モデルのための包括的安全基準アライメントデータセット

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model ( http://arxiv.org/abs/2406.12030v1 )

ライセンス: Link先を確認

Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao,

(参考訳) 視覚言語モデル(VLM)の出現は、マルチモーダル情報の理解において前例のない進歩をもたらした。 VLMにおけるテキストと視覚のセマンティクスの組み合わせは非常に複雑で多様であり、これらのモデルの安全性の整合性は困難である。さらに、VLMの安全性アライメントに関する限られた研究により、大規模で高品質なデータセットが不足している。これらの制約に対処するために,SPA-VL というビジョン言語モデルのための安全優先アライメントデータセットを提案する。 SPA-VLは6つの有害ドメイン、13のカテゴリ、53のサブカテゴリをカバーし、クエスト、画像、選択された応答、拒否された応答)の4倍体の100,788のサンプルを含む。深さの面では、応答は12個のオープン(eg, QwenVL)とクローズドソース(eg, Gemini)のVLMから収集され、多様性が保証される。実験結果から,SPA-VLデータセット上のアライメント技術を用いてトレーニングしたモデルでは,コア機能を維持しながら,無害性と有用性を大幅に向上することが示唆された。 SPA-VLは大規模で高品質で多様なデータセットであり、VLMが無害性と有用性の両方を達成することを保証する重要なマイルストーンである。コード https://github.com/EchoseChen/SPA-VL-RLHF と SPA-VL データセット url https://huggingface.co/datasets/sqrti/SPA-VL を公開しました。

The emergence of Vision Language Models (VLMs) has brought unprecedented advances in understanding multimodal information. The combination of textual and visual semantics in VLMs is highly complex and diverse, making the safety alignment of these models challenging. Furthermore, due to the limited study on the safety alignment of VLMs, there is a lack of large-scale, high-quality datasets. To address these limitations, we propose a Safety Preference Alignment dataset for Vision Language Models named SPA-VL. In terms of breadth, SPA-VL covers 6 harmfulness domains, 13 categories, and 53 subcategories, and contains 100,788 samples of the quadruple (question, image, chosen response, rejected response). In terms of depth, the responses are collected from 12 open- (e.g., QwenVL) and closed-source (e.g., Gemini) VLMs to ensure diversity. The experimental results indicate that models trained with alignment techniques on the SPA-VL dataset exhibit substantial improvements in harmlessness and helpfulness while maintaining core capabilities. SPA-VL, as a large-scale, high-quality, and diverse dataset, represents a significant milestone in ensuring that VLMs achieve both harmlessness and helpfulness. We have made our code https://github.com/EchoseChen/SPA-VL-RLHF and SPA-VL dataset url https://huggingface.co/datasets/sqrti/SPA-VL publicly available.

翻訳日:2024-06-20 00:07:11 公開日:2024-06-17

# 言語モデリングによる語彙データの大規模伝達学習

Large Scale Transfer Learning for Tabular Data via Language Modeling ( http://arxiv.org/abs/2406.12031v1 )

ライセンス: Link先を確認

Josh Gardner, Juan C. Perdomo, Ludwig Schmidt,

(参考訳) 構造的、異質で、行と列を持つスプレッドシートスタイルのデータであるタブラルデータは、実際には多くのドメインで広く使われている。しかし、近年の基盤モデルでは、言語モデリングやコンピュータビジョンなどの領域におけるタスク固有のデータセットや予測器の開発の必要性が減っているが、この伝達学習パラダイムは表領域に類似した影響を与えていない。本研究では,このギャップを狭め,表型予測のための言語モデルであるTabuLa-8Bを提案する。本研究では,TabLibコーパスから大規模で高品質なトレーニングデータセットを抽出するプロセスを定義し,表型データフィルタリングと品質管理の手法を提案する。得られたデータセットは3.1Mのユニークなテーブルから1.6Bを超える行で構成されており、新しいパッキングとアテンションスキームを用いて表データ予測(分類とバイナリ回帰)のためのLlama 3-8B大言語モデル(LLM)を微調整する。 329のデータセットからなるテストスイートで評価した結果,TabuLa-8Bはランダムな推測よりも15ポイント(pp)高い未確認テーブル上でゼロショット精度を持つことがわかった。ターゲットデータセットを微調整することなく、数ショット設定(1-32ショット)で、TabuLa-8Bは、XGBoostやTabPFNモデルよりも5～15pp正確で、そのモデルでは、XGBoostとTabPFNは、同等または最大16倍のデータで明示的にトレーニングされている。この論文の出版とともに、私たちのモデル、コード、データをリリースします。

Tabular data -- structured, heterogeneous, spreadsheet-style data with rows and columns -- is widely used in practice across many domains. However, while recent foundation models have reduced the need for developing task-specific datasets and predictors in domains such as language modeling and computer vision, this transfer learning paradigm has not had similar impact in the tabular domain. In this work, we seek to narrow this gap and present TabuLa-8B, a language model for tabular prediction. We define a process for extracting a large, high-quality training dataset from the TabLib corpus, proposing methods for tabular data filtering and quality control. Using the resulting dataset, which comprises over 1.6B rows from 3.1M unique tables, we fine-tune a Llama 3-8B large language model (LLM) for tabular data prediction (classification and binned regression) using a novel packing and attention scheme for tabular prediction. Through evaluation across a test suite of 329 datasets, we find that TabuLa-8B has zero-shot accuracy on unseen tables that is over 15 percentage points (pp) higher than random guessing, a feat that is not possible with existing state-of-the-art tabular prediction models (e.g. XGBoost, TabPFN). In the few-shot setting (1-32 shots), without any fine-tuning on the target datasets, TabuLa-8B is 5-15 pp more accurate than XGBoost and TabPFN models that are explicitly trained on equal, or even up to 16x more data. We release our model, code, and data along with the publication of this paper.

翻訳日:2024-06-20 00:07:11 公開日:2024-06-17
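The TabuLa-8B entry above treats tabular prediction as language modeling, which requires turning table rows into text. The sketch below shows one hypothetical serialization and a few-shot prompt built from it. The template wording, function names, and example columns are assumptions for illustration only. The abstract does not specify the actual format or the packing and attention scheme.

```python
def serialize_row(features, target_column, choices):
    """Render one table row as a text prompt ending where the label would go."""
    parts = [f"The {name} is {value}." for name, value in features.items()]
    options = " or ".join(str(choice) for choice in choices)
    question = f"What is the {target_column}? Options: {options}. Answer:"
    return " ".join(parts) + " " + question


def build_few_shot_prompt(labeled_rows, query_row, target_column, choices):
    """Concatenate labeled example rows, then the unlabeled query row."""
    blocks = [
        serialize_row(features, target_column, choices) + " " + str(label)
        for features, label in labeled_rows
    ]
    blocks.append(serialize_row(query_row, target_column, choices))
    return "\n".join(blocks)


# Hypothetical rows from an income-classification table.
shots = [({"age": 42, "occupation": "teacher"}, "low")]
query = {"age": 55, "occupation": "surgeon"}
prompt = build_few_shot_prompt(shots, query, "income_bracket", ["low", "high"])
print(prompt)
```

In the zero-shot setting the list of labeled rows is simply empty, so the prompt contains only the query row and the model must rely on what it learned during fine-tuning.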

# 大規模言語モデルを用いたメンタルヘルス分析におけるバイアスの発見と緩和

Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models ( http://arxiv.org/abs/2406.12033v1 )

ライセンス: Link先を確認

Yuqing Wang, Yun Zhao, Sara Alessandra Keller, Anne de Hond, Marieke M. van Buchem, Malvika Pillai, Tina Hernandez-Boussard,

(参考訳) 大規模言語モデル(LLM)の進歩は、メンタルヘルス分析を含む様々な応用において強力な能力を示している。しかし、既存の研究は予測性能に重点を置いており、フェアネスの重大な問題は未発見のままであり、脆弱な個体群に重大なリスクを及ぼしている。潜在的なバイアスを認めているにもかかわらず、以前の研究はこれらのバイアスとその影響について徹底的な調査を欠いていた。このギャップに対処するために,8種類のメンタルヘルスデータセットに対して異なるプロンプト法による10個のLSMを用いて,7つの社会的要因(性別,年齢,宗教など)のバイアスを体系的に評価した。以上の結果から,GPT-4は,MentalRoBERTaのようなドメイン固有モデルに後れを取っているものの,LLM間の性能と公平性において最高の総合バランスを達成していることが示された。さらに、調整されたフェアネス対応のプロンプトは、メンタルヘルス予測におけるバイアスを効果的に軽減し、この分野におけるフェアネス分析の大きな可能性を浮き彫りにします。

The advancement of large language models (LLMs) has demonstrated strong capabilities across various applications, including mental health analysis. However, existing studies have focused on predictive performance, leaving the critical issue of fairness underexplored, posing significant risks to vulnerable populations. Despite acknowledging potential biases, previous works have lacked thorough investigations into these biases and their impacts. To address this gap, we systematically evaluate biases across seven social factors (e.g., gender, age, religion) using ten LLMs with different prompting methods on eight diverse mental health datasets. Our results show that GPT-4 achieves the best overall balance in performance and fairness among LLMs, although it still lags behind domain-specific models like MentalRoBERTa in some cases. Additionally, our tailored fairness-aware prompts can effectively mitigate bias in mental health predictions, highlighting the great potential for fair analysis in this field.

翻訳日:2024-06-20 00:07:11 公開日:2024-06-17

# Self-MoE: 自己専門のエキスパートによる構成的大規模言語モデルを目指して

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts ( http://arxiv.org/abs/2406.12034v1 )

ライセンス: Link先を確認

Junmo Kang, Leonid Karlinsky, Hongyin Luo, Zhen Wang, Jacob Hansen, James Glass, David Cox, Rameswar Panda, Rogerio Feris, Alan Ritter,

(参考訳) 我々は,モノリシックなLCMを,MiXSE(MiXture of Self-specialized Experts)という,自己専門の専門家による構成的,モジュール的なシステムに変換するアプローチであるSelf-MoEを提案する。提案手法は,自己生成合成データを用いて専門家モジュールを構築する自己特殊化を利用して,それぞれに共有ベースLLMを備え,自己最適化ルーティングを組み込む。これにより、さまざまな目標タスクの動的かつ機能固有の処理が可能になり、広範な人間ラベル付きデータやパラメータを追加することなく、全体的な機能を向上させることができる。実験結果から, LLMの特殊化は, 非特殊化タスクにおける性能に潜在的なトレードオフをもたらす可能性が示唆された。一方、私たちのSelf-MoEは、知識、推論、数学、コーディングといった様々なベンチマークにおいて、ベースLSMよりも大幅に改善されていることを示しています。また、インスタンスのマージや重み付けなど、他の方法よりも一貫して優れており、セマンティックエキスパートやルーティングの設計による柔軟性と解釈性も向上している。我々の発見は、モジュール化と、効率的でスケーラブルで適応可能なシステムを実現する上での自己改善の持つ重要な役割を浮き彫りにしている。

We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data, each equipped with a shared base LLM and incorporating self-optimized routing. This allows for dynamic and capability-specific handling of various target tasks, enhancing overall capabilities, without extensive human-labeled data and added parameters. Our empirical results reveal that specializing LLMs may exhibit potential trade-offs in performances on non-specialized tasks. On the other hand, our Self-MoE demonstrates substantial improvements over the base LLM across diverse benchmarks such as knowledge, reasoning, math, and coding. It also consistently outperforms other methods, including instance merging and weight merging, while offering better flexibility and interpretability by design with semantic experts and routing. Our findings highlight the critical role of modularity and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.

翻訳日:2024-06-20 00:07:11 公開日:2024-06-17

# ロボット神経リハビリテーショントレーニングのための社会的対話型エージェント : 概念化と概念実証研究

Socially Interactive Agents for Robotic Neurorehabilitation Training: Conceptualization and Proof-of-concept Study ( http://arxiv.org/abs/2406.12035v1 )

ライセンス: Link先を確認

Rhythm Arora, Pooja Prajod, Matteo Lavit Nicora, Daniele Panzeri, Giovanni Tauro, Rocco Vertechy, Matteo Malosio, Elisabeth André, Patrick Gebhard,

(参考訳) 多様な運動能力を持つ人は、機能回復を促進することを目的とした集中治療や専門的なリハビリテーション療法の恩恵を受けることが多い。それでも課題は、神経リハビリテーションのプロフェッショナルが限定的に利用できることであり、必要なケアレベルを効果的に提供することを妨げる。ロボットデバイスは、治療中の医療従事者への依存を減らす大きな可能性を秘めているが、同時に、従来の対人セッションが提供する重要なヒューマンインタラクションやモチベーションを欠いている。このギャップを埋めるために、我々は、神経リハビリテーショントレーニング中にパーソナライズされた院外援助を提供するAIベースのシステムを導入する。本システムは、リハビリテーション訓練装置、感情信号分類モデル、トレーニング演習、およびユーザインタフェースとしてのソーシャルインタラクティブエージェントを含む。専門職の助けを借りて、想定されたシステムは、個々の患者の独自のリハビリテーション要件を満たすように調整されるように設計されている。仮想コーチングアシスタントとして機能する社会的対話型エージェントによって支援され、予備設定および指導段階を経て、患者は自宅の快適さで自律的にリハビリ体制を継続する。我々のアプローチは、対話型社会認識仮想エージェントを神経リハビリテーションロボットフレームワークに統合することであり、その主な目的は、リハビリテーションセッションに固有の社会的側面を再現することである。また,健常患者を対象に,本フレームワークの妥当性試験を行った。予備調査の結果,参加者はシステムに適応する確率を示した。特に,提案演習における対話エージェントの存在は,注意をそらす要因として機能せず,ユーザのエンゲージメントに肯定的な影響を及ぼした。

Individuals with diverse motor abilities often benefit from intensive and specialized rehabilitation therapies aimed at enhancing their functional recovery. Nevertheless, the challenge lies in the restricted availability of neurorehabilitation professionals, hindering the effective delivery of the necessary level of care. Robotic devices hold great potential in reducing the dependence on medical personnel during therapy but, at the same time, they generally lack the crucial human interaction and motivation that traditional in-person sessions provide. To bridge this gap, we introduce an AI-based system aimed at delivering personalized, out-of-hospital assistance during neurorehabilitation training. This system includes a rehabilitation training device, affective signal classification models, training exercises, and a socially interactive agent as the user interface. With the assistance of a professional, the envisioned system is designed to be tailored to accommodate the unique rehabilitation requirements of an individual patient. Conceptually, after a preliminary setup and instruction phase, the patient is equipped to continue their rehabilitation regimen autonomously in the comfort of their home, facilitated by a socially interactive agent functioning as a virtual coaching assistant. Our approach involves the integration of an interactive socially-aware virtual agent into a neurorehabilitation robotic framework, with the primary objective of recreating the social aspects inherent to in-person rehabilitation sessions. We also conducted a feasibility study to test the framework with healthy patients. The results of our preliminary investigation indicate that participants demonstrated a propensity to adapt to the system. Notably, the presence of the interactive agent during the proposed exercises did not act as a source of distraction; instead, it positively impacted users' engagement.

翻訳日:2024-06-20 00:07:11 公開日:2024-06-17

# MedCalc-Bench:医学計算のための大規模言語モデルの評価

MedCalc-Bench: Evaluating Large Language Models for Medical Calculations ( http://arxiv.org/abs/2406.12036v1 )

ライセンス: Link先を確認

Nikhil Khandekar, Qiao Jin, Guangzhi Xiong, Soren Dunn, Serina S Applebaum, Zain Anwar, Maame Sarfo-Gyamfi, Conrad W Safranek, Abid A Anwar, Andrew Zhang, Aidan Gilson, Maxwell B Singer, Amisha Dave, Andrew Taylor, Aidong Zhang, Qingyu Chen, Zhiyong Lu,

(参考訳) 計算と論理に基づく推論を評価するのとは対照的に、医学における大規模言語モデル(LLM)を評価するための現在のベンチマークは、主にドメイン知識と記述的推論を含む質問応答に焦点を当てている。このような定性的な能力は医学的診断に欠かせないが、現実世界のシナリオでは、医師は、定量式に従う臨床電卓や、エビデンスベースの意思決定支援のためのルールベースの推論パラダイムを頻繁に使用する。この目的のために, LLMの医療計算能力を評価することを目的とした, 第一種データセットであるMedCalc-Benchを提案する。 MedCalc-Benchには、55の異なる医療計算タスクから1000以上の手動レビュー済みインスタンスの評価セットが含まれている。 MedCalc-Benchの各インスタンスは、患者ノート、特定の医学的価値の計算を要求する質問、真実の答え、そしてその答えがどのように得られるかを示すステップバイステップの説明からなる。評価の結果から, この領域におけるLLMの可能性が示されるが, 臨床現場で使用するのに十分な効果を持つものはない。一般的な問題としては、不正なエンティティを抽出すること、計算タスクに正しい方程式や規則を使わないこと、計算の算術を誤って実行することなどがある。医療現場におけるLLMの量的知識と推論のギャップを強調し,様々な臨床計算タスクにおけるLLMの今後の改善を促すことを願っている。

As opposed to evaluating computation and logic-based reasoning, current benchmarks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive reasoning. While such qualitative capabilities are vital to medical diagnosis, in real-world scenarios, doctors frequently use clinical calculators that follow quantitative equations and rule-based reasoning paradigms for evidence-based decision support. To this end, we propose MedCalc-Bench, a first-of-its-kind dataset focused on evaluating the medical calculation capability of LLMs. MedCalc-Bench contains an evaluation set of over 1000 manually reviewed instances from 55 different medical calculation tasks. Each instance in MedCalc-Bench consists of a patient note, a question requesting to compute a specific medical value, a ground truth answer, and a step-by-step explanation showing how the answer is obtained. While our evaluation results show the potential of LLMs in this area, none of them are effective enough for clinical settings. Common issues include extracting the incorrect entities, not using the correct equation or rules for a calculation task, or incorrectly performing the arithmetic for the computation. We hope our study highlights the quantitative knowledge and reasoning gaps in LLMs within medical settings, encouraging future improvements of LLMs for various clinical calculation tasks.
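
The four-part instance described above (patient note, question, ground truth answer, step-by-step explanation) can be pictured with a small Python sketch. Everything here is an illustrative assumption rather than the benchmark's actual schema or data: the field names, the made-up patient note, the 5% tolerance, and the choice of the Cockcroft-Gault creatinine clearance equation as the example calculator.

```python
from dataclasses import dataclass


@dataclass
class CalcInstance:
    """One benchmark-style record; the field names are hypothetical."""
    patient_note: str
    question: str
    ground_truth: float
    explanation: str


def cockcroft_gault(age_years: float, weight_kg: float,
                    serum_creatinine_mg_dl: float, female: bool) -> float:
    """Creatinine clearance in mL/min by the Cockcroft-Gault equation."""
    clearance = (140 - age_years) * weight_kg / (72 * serum_creatinine_mg_dl)
    return clearance * 0.85 if female else clearance


def is_correct(predicted: float, truth: float, rel_tol: float = 0.05) -> bool:
    """Accept answers within a relative tolerance (the 5% is an assumption)."""
    return abs(predicted - truth) <= rel_tol * abs(truth)


# Made-up patient note, for illustration only.
instance = CalcInstance(
    patient_note="A 60-year-old man weighing 72 kg has a serum creatinine of 1.0 mg/dL.",
    question="What is the patient's creatinine clearance (Cockcroft-Gault) in mL/min?",
    ground_truth=cockcroft_gault(60, 72, 1.0, female=False),
    explanation="(140 - 60) * 72 / (72 * 1.0) = 80 mL/min",
)
```

A model's answer would be parsed to a number and passed to `is_correct` together with `instance.ground_truth`. The three failure modes listed in the abstract map onto this sketch as reading the wrong values from the note, picking the wrong function, or getting the arithmetic wrong.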

翻訳日:2024-06-20 00:07:11 公開日:2024-06-17

# 大規模言語モデルにおける未学習のためのソフトプロンプト

Soft Prompting for Unlearning in Large Language Models ( http://arxiv.org/abs/2406.12038v1 )

ライセンス: Link先を確認

Karuna Bhaila, Minh-Hao Van, Xintao Wu,

(参考訳) LLM(Large Language Models)が広く普及しているのは、部分的には文脈内学習を行うユニークな能力のためであり、これらの事前訓練されたモデルをデプロイする際の倫理的・安全的配慮の重要性も明らかにされている。本研究では,データ保護規制を動機としたLLMの機械アンラーニング(machine unlearning)に関する研究に焦点をあてる。未学習を実現するための微調整手法に関する文献の増大とは対照的に、訓練データのサブセットの未学習を実現するためのソフトプロンプトと呼ばれる比較的軽量な代替手段に焦点を当てる。忘却と実用性の維持の両方を強制するように設計された損失を用いて、我々のフレームワークである Soft Prompting for Unlearning (SPUL) は、任意のクエリに付加可能なプロンプトトークンを学習し、LLMパラメータを更新することなく、推論時に特定の例のアンラーニングを誘導する。提案手法の厳密な評価を行い,その結果から,LLMを用いたテキスト分類の文脈において,SPULは実用性と忘却とのトレードオフを大幅に改善できることを示す。さらに,フレームワークのスケーラビリティを強調し,ハイパーパラメータの選択と未学習データのサイズの影響について詳細な知見を提供するために,複数のLLMを用いて手法を検証する。実装は https://github.com/karuna-bhaila/llm_unlearning で公開しています。

The widespread popularity of Large Language Models (LLMs), partly due to their unique ability to perform in-context learning, has also brought to light the importance of ethical and safety considerations when deploying these pre-trained models. In this work, we focus on investigating machine unlearning for LLMs motivated by data protection regulations. In contrast to the growing literature on fine-tuning methods to achieve unlearning, we focus on a comparatively lightweight alternative called soft prompting to realize the unlearning of a subset of training data. With losses designed to enforce forgetting as well as utility preservation, our framework Soft Prompting for Unlearning (SPUL) learns prompt tokens that can be appended to an arbitrary query to induce unlearning of specific examples at inference time without updating LLM parameters. We conduct a rigorous evaluation of the proposed method and our results indicate that SPUL can significantly improve the trade-off between utility and forgetting in the context of text classification with LLMs. We further validate our method using multiple LLMs to highlight the scalability of our framework and provide detailed insights into the choice of hyperparameters and the influence of the size of unlearning data. Our implementation is available at https://github.com/karuna-bhaila/llm_unlearning.

翻訳日:2024-06-20 00:07:10 公開日:2024-06-17

# 宇宙空間のサイバー攻撃:新たなシナリオを生み出す

Outer Space Cyberattacks: Generating Novel Scenarios to Avoid Surprise ( http://arxiv.org/abs/2406.12041v1 )

ライセンス: Link先を確認

Patrick Lin, Keith Abney, Bruce DeBruhl, Kira Abercromby, Henry Danielson, Ryan Jenkins,

(参考訳) 一般の認識は低いかもしれないが、現代の宇宙システムが果たす重要な役割を考えると、宇宙のサイバー攻撃はますます深刻な問題となっている。オープンソースあるいは公開の議論は通常、衛星ハッキングや信号妨害や偽造など、いくつかの一般的なシナリオを中心に展開される。しかし、さらに多くの可能性があります。報告書はシナリオ・プロンプト・ジェネレータ(ICARUS行列と呼ばれる分類学)を提供しており、400万以上のシナリオ・プロンプトを作成できる。私たちは42のシナリオの開始セットを提供し、各シナリオを簡潔に説明し、想像力を最優先させ、より多くの研究者がこの問題に対処するための多様な専門知識と視点をもたらすことができるようにします。新たなシナリオを想像できないことは、我々の有線世界を支配するデジタルシステムに侵入するために、常に新しい方法、発明的かつ資源的な方法を考案している脅威アクターによって、驚きによって取られる大きなリスクである。警戒を維持するためには、サイバーセキュリティにおいてハンターと獲物の間の敵対的なダンスに追随するためにも、被告は想像力を持っていなければならない。新たなシナリオを提供するだけでなく、我々が特定した少なくとも7つの要因を含む、宇宙サイバーセキュリティ問題の原動力についても検討する。例えば、宇宙デブリの共有された脅威は、軌道上の運動的衝突を避けるために合理的な状態やアクターを押し付けているように思われる。外空間はサイバーセキュリティの次のフロンティアだ。宇宙のサイバー攻撃から守るためには、それらを理解して予測する必要がある。

Though general awareness around it may be low, space cyberattacks are an increasingly urgent problem given the vital role that space systems play in the modern world. Open-source or public discussions about it typically revolve around only a couple generic scenarios, namely satellite hacking and signals jamming or spoofing. But there are so many more possibilities. The report offers a scenario-prompt generator -- a taxonomy of sorts, called the ICARUS matrix -- that can create more than 4 million unique scenario-prompts. We will offer a starting set of 42 scenarios, briefly describing each one, to begin priming the imagination-pump so that many more researchers can bring their diverse expertise and perspectives to bear on the problem. A failure to imagine novel scenarios is a major risk in being taken by surprise and severely harmed by threat actors who are constantly devising new ways, inventive and resourceful ways, to breach the digital systems that control our wired world. To stay vigilant, defenders likewise need to be imaginative to keep up in this adversarial dance between hunter and prey in cybersecurity. More than offering novel scenarios, we will also explore the drivers of the space cybersecurity problem, which include at least seven factors we have identified. For instance, the shared threat of space debris would seem to push rational states and actors to avoid kinetic conflicts in orbit, which weighs in favor of cyberoperations as the dominant form of space conflicts. Outer space is the next frontier for cybersecurity. To guard against space cyberattacks, we need to understand and anticipate them, and imagination is at the very heart of both cybersecurity and frontiers.

翻訳日:2024-06-20 00:07:10 公開日:2024-06-17

# すべてのプロンプトが等しくなるわけではない:テキストと画像の拡散モデルのプロンプトベースプルーニング

Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models ( http://arxiv.org/abs/2406.12042v1 )

ライセンス: Link先を確認

Alireza Ganjdanesh, Reza Shirkavand, Shangqian Gao, Heng Huang,

(参考訳) テキスト・ツー・イメージ(T2I)拡散モデルは印象的な画像生成能力を示している。それでも、その計算強度は、リソース制約のある組織がT2Iモデルを内部のターゲットデータに微調整した後に展開することを妨げている。プルーニング技術は、T2Iモデルの計算負担を軽減する潜在的な解決策を提供する一方で、静的プルーニング手法は、異なるプロンプトのキャパシティ要件を見越して、全ての入力プロンプトに対して同じプルーニングモデルを使用する。動的プルーニングは各プロンプトに個別のサブネットワークを使用することでこの問題に対処するが、GPUのバッチ並列化を防止している。これらの制約を克服するため、T2I拡散モデル用に設計された新しいプロンプトベースのプルーニング手法であるAdaptive Prompt-Tailored Pruning (APTP)を導入する。我々のアプローチの中心はプロンプトルータモデルであり、入力テキストプロンプトに必要なキャパシティを決定することを学習し、それをアーキテクチャコードにルーティングする。それぞれのアーキテクチャコードは、割り当てられたプロンプトに合わせた特別なモデルを表しており、コードの数はハイパーパラメータである。我々は、コントラスト学習を用いてプロンプトルータとアーキテクチャコードをトレーニングし、類似のプロンプトが近くのコードにマップされることを保証する。さらに、最適なトランスポートを使用して、コードが1つのコードに崩壊するのを防ぐ。我々は、CC3MとCOCOをターゲットデータセットとして、安定拡散(SD)V2.1をプルーニングすることでAPTPの有効性を示す。 APTPはFID、CLIP、CMMDスコアの点でシングルモデルプルーニングベースラインを上回っている。 APTPが学習したクラスタの分析により、意味論的に意味があることが判明した。また、APTPは、SD、例えばテキスト画像を生成するプロンプトに対して、以前に実証された挑戦的なプロンプトを自動的に検出し、より高いキャパシティコードにアサインできることも示している。

Text-to-image (T2I) diffusion models have demonstrated impressive image generation capabilities. Still, their computational intensity prohibits resource-constrained organizations from deploying T2I models after fine-tuning them on their internal target data. While pruning techniques offer a potential solution to reduce the computational burden of T2I models, static pruning methods use the same pruned model for all input prompts, overlooking the varying capacity requirements of different prompts. Dynamic pruning addresses this issue by utilizing a separate sub-network for each prompt, but it prevents batch parallelism on GPUs. To overcome these limitations, we introduce Adaptive Prompt-Tailored Pruning (APTP), a novel prompt-based pruning method designed for T2I diffusion models. Central to our approach is a prompt router model, which learns to determine the required capacity for an input text prompt and routes it to an architecture code, given a total desired compute budget for prompts. Each architecture code represents a specialized model tailored to the prompts assigned to it, and the number of codes is a hyperparameter. We train the prompt router and architecture codes using contrastive learning, ensuring that similar prompts are mapped to nearby codes. Further, we employ optimal transport to prevent the codes from collapsing into a single one. We demonstrate APTP's effectiveness by pruning Stable Diffusion (SD) V2.1 using CC3M and COCO as target datasets. APTP outperforms the single-model pruning baselines in terms of FID, CLIP, and CMMD scores. Our analysis of the clusters learned by APTP reveals they are semantically meaningful. We also show that APTP can automatically discover previously empirically found challenging prompts for SD, e.g., prompts for generating text images, assigning them to higher capacity codes.

翻訳日:2024-06-20 00:07:10 公開日:2024-06-17

# グレードスコア:オプション選択におけるLLM性能の定量化

Grade Score: Quantifying LLM Performance in Option Selection ( http://arxiv.org/abs/2406.12043v1 )

ライセンス: Link先を確認

Dmitri Iourovitski,

(参考訳) 本研究では,Large Language Models (LLMs) の整合性と公平性を評価するために考案された新しい尺度 "Grade Score" を紹介する。グレードスコアは、注文バイアスを測定するエントロピーと、選択安定性を評価し、LLMの信頼性と公平性に関する洞察を提供するモード周波数を組み合わせる。本研究は,LLMの性能向上効果を実証し,評価スコアを最適化するために,迅速な工学的手法やオプションサンプリング手法などの手法を探求する。その結果,LSMのプロンプトに対する性能の変化が示され,無関係な選択肢を含めることによる肯定的な影響が浮き彫りになった。この研究では、特定のバイアスをターゲットとした指示に適応し、適応性を実証する命令追従モデルにおいて、創発的な行動を特定する。グレードスコアはLLMの比較を促進するとともに、様々なアプリケーションにおける信頼性と公平性を改善するための潜在的な可能性として、意思決定プロセスの最適化に向けた進行中の研究を促進する。すべてのコードはGitHub https://github.com/IoDmitri/GradeLabで入手できる。

This study introduces the "Grade Score", a novel metric designed to evaluate the consistency and fairness of Large Language Models (LLMs) when used as multiple-choice judges with respect to order bias and choice consistency. The Grade Score combines Entropy, which measures order bias, and Mode Frequency, which assesses choice stability, offering insights into LLMs' reliability and impartiality. The study explores techniques such as prompt engineering and option sampling strategies to optimize the Grade Score, demonstrating their effectiveness in enhancing LLMs' performance. Results showcase varying performances among LLMs with respect to prompts and highlight the positive impact of including irrelevant options. The study also identifies an emergent behavior in instruction-following models, where they adapt to instructions targeting specific biases, demonstrating their adaptability. The Grade Score facilitates comparisons between LLMs and encourages ongoing research towards optimizing their decision-making processes, with potential implications for improving their reliability and fairness in various applications. All code is available on GitHub https://github.com/IoDmitri/GradeLab
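
The abstract names the two ingredients of the Grade Score but not how they are combined, so the following Python sketch is only one plausible reading. It assumes the judge is shown the same options in several orderings, that Entropy is computed over the positions the judge picks (normalized to [0, 1], higher meaning less order bias), that Mode Frequency is the share of trials in which the most common option wins, and that the two are merged with a harmonic mean. All function names and the harmonic-mean choice are assumptions, not the GradeLab implementation.

```python
import math
from collections import Counter


def normalized_entropy(picked_positions, num_options):
    """Entropy of the picked positions, scaled to [0, 1]; needs num_options >= 2."""
    counts = Counter(picked_positions)
    total = len(picked_positions)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(num_options)


def mode_frequency(picked_options):
    """Share of trials in which the most frequently chosen option was picked."""
    counts = Counter(picked_options)
    return counts.most_common(1)[0][1] / len(picked_options)


def grade_score(picked_positions, picked_options, num_options):
    """Harmonic mean of the two components (the combination rule is assumed)."""
    h = normalized_entropy(picked_positions, num_options)
    m = mode_frequency(picked_options)
    return 0.0 if h + m == 0 else 2 * h * m / (h + m)


# A consistent judge: always picks option "B", wherever it is placed.
consistent = grade_score([0, 1, 2, 3], ["B", "B", "B", "B"], num_options=4)

# A position-biased judge: always picks the first slot, whatever it contains.
biased = grade_score([0, 0, 0, 0], ["A", "B", "C", "D"], num_options=4)
```

Under these assumptions the consistent judge scores at the top of the range and the position-biased judge at the bottom, which matches the intent of rewarding judges whose choice follows the content and not the slot.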

翻訳日:2024-06-20 00:07:10 公開日:2024-06-17

# ARTIST:アンタングル化によるテキストリッチ画像生成の改善

ARTIST: Improving the Generation of Text-rich Images by Disentanglement ( http://arxiv.org/abs/2406.12044v1 )

ライセンス: Link先を確認

Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang,

(参考訳) 拡散モデルは、広い範囲の視覚コンテンツを生成できるという卓越した能力を示したが、テキストの描画能力はまだ限られており、下層の画像とうまく融合できない不正確な文字や単語を生成することが多い。これらの欠点に対処するため、ARTISTという新しいフレームワークを導入する。このフレームワークには専用のテキスト拡散モデルが含まれており、特にテキスト構造の学習に焦点を当てている。当初、テキスト表現の複雑さを捉えるために、このテキストモデルを事前訓練する。その後、視覚拡散モデルを微調整し、事前訓練されたテキストモデルからテキスト構造情報を同化できるようにする。この分離された(disentangled)アーキテクチャ設計とトレーニング戦略は、テキストリッチな画像生成のための拡散モデルのテキストレンダリング能力を著しく向上させる。さらに、トレーニング済みの大規模言語モデルの能力を活用して、ユーザの意図をよりよく解釈し、生成品質の向上に貢献します。 MARIO-Evalベンチマークの実証結果は,提案手法の有効性を裏付けるものであり,様々な指標において最大15%の改善を示した。

Diffusion models have demonstrated exceptional capabilities in generating a broad spectrum of visual content, yet their proficiency in rendering text is still limited: they often generate inaccurate characters or words that fail to blend well with the underlying image. To address these shortcomings, we introduce a new framework named ARTIST. This framework incorporates a dedicated textual diffusion model to specifically focus on the learning of text structures. Initially, we pretrain this textual model to capture the intricacies of text representation. Subsequently, we finetune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model. This disentangled architecture design and the training strategy significantly enhance the text rendering ability of the diffusion models for text-rich image generation. Additionally, we leverage the capabilities of pretrained large language models to better interpret user intentions, contributing to improved generation quality. Empirical results on the MARIO-Eval benchmark underscore the effectiveness of the proposed method, showing an improvement of up to 15% in various metrics.

翻訳日:2024-06-20 00:07:10 公開日:2024-06-17

# $τ$-bench: 実世界のドメインにおけるツール-エージェント-ユーザインタラクションのベンチマーク

$τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains ( http://arxiv.org/abs/2406.12045v1 )

ライセンス: Link先を確認

Shunyu Yao, Noah Shinn, Pedram Razavi, Karthik Narasimhan,

(参考訳) 既存のベンチマークは、人間のユーザとのインタラクションやドメイン固有のルールに従う能力について言語エージェントをテストしていないが、これらはいずれも実世界のアプリケーションに展開する上で不可欠である。ドメイン固有のAPIツールとポリシーガイドラインを備えた言語エージェントとユーザ(言語モデルでシミュレートされた)間の動的会話をエミュレートするベンチマークである$\tau$-benchを提案する。我々は、会話の最後にデータベースの状態と注釈付きゴール状態を比較する、効率的で忠実な評価プロセスを採用する。また,複数の試行においてエージェント動作の信頼性を評価するための新しい指標(pass^k)を提案する。実験の結果,gpt-4oのような最先端の関数呼び出しエージェントでもタスクの成功率は50%未満であり,一貫性もかなり低い(小売ドメインではpass^8 <25%)ことがわかった。本研究は, エージェントが一貫して行動し, ルールを確実に追従する能力を向上する手法の必要性を指摘する。

Existing benchmarks do not test language agents on their interaction with human users or ability to follow domain-specific rules, both of which are vital for deploying them in real world applications. We propose $\tau$-bench, a benchmark emulating dynamic conversations between a user (simulated by language models) and a language agent provided with domain-specific API tools and policy guidelines. We employ an efficient and faithful evaluation process that compares the database state at the end of a conversation with the annotated goal state. We also propose a new metric (pass^k) to evaluate the reliability of agent behavior over multiple trials. Our experiments show that even state-of-the-art function calling agents (like gpt-4o) succeed on <50% of the tasks, and are quite inconsistent (pass^8 <25% in retail). Our findings point to the need for methods that can improve the ability of agents to act consistently and follow rules reliably.
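
The abstract does not spell out how pass^k is computed. A minimal Python sketch follows, assuming pass^k means the probability that k independent trials of the same task all succeed, estimated per task as C(c, k) / C(n, k) from c successes in n recorded trials and then averaged over tasks. The function name and the sample counts are hypothetical.

```python
from math import comb


def pass_hat_k(successes_per_task, num_trials, k):
    """Average over tasks of C(c, k) / C(n, k).

    successes_per_task: successful trials c for each task (0 <= c <= num_trials)
    num_trials: trials n recorded per task
    k: number of trials that must all succeed
    """
    if not 1 <= k <= num_trials:
        raise ValueError("k must be between 1 and num_trials")
    denom = comb(num_trials, k)
    total = sum(comb(c, k) for c in successes_per_task)
    return total / (denom * len(successes_per_task))


# Three made-up tasks with 8 trials each: always solved, solved half the time, never solved.
counts = [8, 4, 0]
pass_1 = pass_hat_k(counts, num_trials=8, k=1)  # plain average success rate
pass_8 = pass_hat_k(counts, num_trials=8, k=8)  # only the always-solved task counts
```

With k = 1 the estimate reduces to the ordinary success rate. As k grows, only tasks the agent solves on every trial keep contributing, which is why a metric of this kind exposes inconsistency that a single-trial success rate hides.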

翻訳日:2024-06-20 00:07:10 公開日:2024-06-17

# MEDeA: マルチビュー効率の良い深さ調整

MEDeA: Multi-view Efficient Depth Adjustment ( http://arxiv.org/abs/2406.12048v1 )

ライセンス: Link先を確認

Mikhail Artemyev, Anna Vorontsova, Anna Sokolova, Alexander Limonov,

(参考訳) 現代の単一視点深度推定手法の大多数は相対的な深さを予測しており、ベンチマークで顕著な性能を示したにもかかわらず、多くの実世界のシナリオでは直接適用できない。さらに、単一ビューアプローチは、一連のフレーム間の一貫性を保証することはできない。一貫性は通常、ビュー間の不一致をテスト時の最適化で対処するが、単一のシーンを処理するのに数時間かかる。本稿では,従来のテスト時間手法よりもはるかに高速な多視点テスト時間深度補正手法であるMEDeAを提案する。カメラパラメータを持つRGBフレームが与えられた場合、MEDeAは初期深度マップを予測し、局所スケーリング係数を最適化して調整し、時間的に一貫性のある深度マップを出力する。 MEDeAは、正規化や光フロー、セマンティックス推定を必要とするテスト時間法とは対照的に、深度推定ネットワークのみで高品質な予測を行う。提案手法は, TUM RGB-D, 7Scenes, ScanNet のベンチマークに新たな最先端性を設定し,ARKitScenes データセットから取得したスマートフォンデータの処理に成功している。

The majority of modern single-view depth estimation methods predict relative depth and thus cannot be directly applied in many real-world scenarios, despite impressive performance in the benchmarks. Moreover, single-view approaches cannot guarantee consistency across a sequence of frames. Consistency is typically addressed with test-time optimization of discrepancy across views; however, it takes hours to process a single scene. In this paper, we present MEDeA, an efficient multi-view test-time depth adjustment method, that is an order of magnitude faster than existing test-time approaches. Given RGB frames with camera parameters, MEDeA predicts initial depth maps, adjusts them by optimizing local scaling coefficients, and outputs temporally-consistent depth maps. Contrary to test-time methods requiring normals, optical flow, or semantics estimation, MEDeA produces high-quality predictions with a depth estimation network solely. Our method sets a new state-of-the-art on TUM RGB-D, 7Scenes, and ScanNet benchmarks and successfully handles smartphone-captured data from ARKitScenes dataset.

翻訳日:2024-06-20 00:07:10 公開日:2024-06-17

# 回答を超えて学ぶ:数学的推論のためのリフレクションを用いた言語モデルの訓練

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning ( http://arxiv.org/abs/2406.12050v1 )

ライセンス: Link先を確認

Zhihan Zhang, Zhenwen Liang, Wenhao Yu, Dian Yu, Mengzhao Jia, Dong Yu, Meng Jiang,

(参考訳) 教師付き微調整により、様々な数学的推論タスクにおける言語モデルの問題解決能力が向上する。このような利点を最大化するために、既存の研究は、標準的な単ラウンド質問応答設定に有効である様々なデータ拡張手法でトレーニングセットを拡張することに焦点を当てている。我々の研究は,目前にあるトレーニング問題を深く理解し,標準設定だけでなく,反射的思考を必要とするより複雑なシナリオでもパフォーマンスを向上させることを目的とした,新しい手法を導入している。具体的には,各トレーニングインスタンスに問題リフレクションを埋め込む手法であるリフレクティブ拡張を提案する。モデルに代替的な視点を考慮させ、抽象論やアナロジーに関わり、反射的推論を通じて完全な理解を促進するよう訓練する。本手法の特長と既存拡張技術に対する相補的特性を概説し, 目的達成の実証実験を行った。

Supervised fine-tuning enhances the problem-solving abilities of language models across various mathematical reasoning tasks. To maximize such benefits, existing research focuses on broadening the training set with various data augmentation techniques, which is effective for standard single-round question-answering settings. Our work introduces a novel technique aimed at cultivating a deeper understanding of the training problems at hand, enhancing performance not only in standard settings but also in more complex scenarios that require reflective thinking. Specifically, we propose reflective augmentation, a method that embeds problem reflection into each training instance. It trains the model to consider alternative perspectives and engage with abstractions and analogies, thereby fostering a thorough comprehension through reflective reasoning. Extensive experiments validate the achievement of our aim, underscoring the unique advantages of our method and its complementary nature relative to existing augmentation techniques.

翻訳日:2024-06-19 23:57:20 公開日:2024-06-17

# UniGLM: テキスト分散グラフのための統一言語モデルのトレーニング

UniGLM: Training One Unified Language Model for Text-Attributed Graphs ( http://arxiv.org/abs/2406.12052v1 )

ライセンス: Link先を確認

Yi Fang, Dongzhe Fan, Sirui Ding, Ninghao Liu, Qiaoyu Tan,

(参考訳) ノードをテキスト記述で表現するTAG(text-attributed graph)での表現学習は、テキストおよびリレーショナル知識システムやレコメンデーションシステムにおいて重要である。現在、TAGの最先端の埋め込み手法は主に構造認識学習信号を用いた微調整言語モデル(例えばBERT)に焦点を当てている。有効ではあるが、これらの手法は個々のTAGに合わせて調整されており、様々なグラフシナリオにまたがる一般化はできない。共有されたテキスト空間を考えると、複数のTAGを活用して、異なる側面からテキストとグラフ構造を調整することはより有益である。そこで我々はUnified Graph Language Model (UniGLM) フレームワークを紹介した。これは、ドメイン内およびドメイン間のTAGをうまく一般化する最初のグラフ埋め込みモデルである。具体的には、UniGLMは、異なるドメインとスケールを持つ複数のTAGに対して、自己教師付きコントラスト学習を使用して訓練される。 UniGLMには、構造的に類似したノードを特定するための適応的な正のサンプル選択技術と、反復符号化計算を最小化してトレーニングを加速するために考案された遅延コントラストモジュールが含まれている。 9つのベンチマークTAGの広範な実験結果は、UniGLMが一般化(様々な下流タスクとバックボーン)と移行学習(ドメインシナリオ内および外)の観点から、主要な埋め込みベースラインに対して有効であることを実証している。コードはhttps://github.com/NYUSHCS/UniGLMで入手できる。

Representation learning on text-attributed graphs (TAGs), where nodes are represented by textual descriptions, is crucial for textual and relational knowledge systems and recommendation systems. Currently, state-of-the-art embedding methods for TAGs primarily focus on fine-tuning language models (e.g., BERT) using structure-aware training signals. While effective, these methods are tailored for individual TAG and cannot generalize across various graph scenarios. Given the shared textual space, leveraging multiple TAGs for joint fine-tuning, aligning text and graph structure from different aspects, would be more beneficial. Motivated by this, we introduce a novel Unified Graph Language Model (UniGLM) framework, the first graph embedding model that generalizes well to both in-domain and cross-domain TAGs. Specifically, UniGLM is trained over multiple TAGs with different domains and scales using self-supervised contrastive learning. UniGLM includes an adaptive positive sample selection technique for identifying structurally similar nodes and a lazy contrastive module that is devised to accelerate training by minimizing repetitive encoding calculations. Extensive empirical results across 9 benchmark TAGs demonstrate UniGLM's efficacy against leading embedding baselines in terms of generalization (various downstream tasks and backbones) and transfer learning (in and out of domain scenarios). The code is available at https://github.com/NYUSHCS/UniGLM.

翻訳日:2024-06-19 23:57:20 公開日:2024-06-17

# 内部インスペクタ$I^2$:内部状態によるLLMのロバスト信頼度推定

InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States ( http://arxiv.org/abs/2406.12053v1 )

ライセンス: Link先を確認

Mohammad Beigi, Ying Shen, Runing Yang, Zihao Lin, Qifan Wang, Ankith Mohan, Jianfeng He, Ming Jin, Chang-Tien Lu, Lifu Huang,

(参考訳) 大きな言語モデル(LLM)は、その膨大な能力にもかかわらず、信頼できる出力を生成するのにしばしば苦労し、幻覚として知られる高信頼の不正確さをしばしば生み出す。この課題に対処するため,本研究では,すべてのレイヤの注意状態,フィードフォワード状態,アクティベーション状態を含む内部状態に対するコントラスト学習を活用することで,LCMにおける信頼度推定を向上する新しいフレームワークであるInternalInspectorを紹介した。最終的なアクティベーション状態に主にフォーカスする既存の方法とは異なり、InternalInspectorはすべてのレイヤの内部状態を網羅的に分析し、正しい予測プロセスと間違った予測プロセスの両方を正確に識別する。事実質問応答,コモンセンス推論,読解理解など,さまざまな自然言語理解・生成タスクにおける既存の信頼度推定手法に対して,内部検査器をベンチマークすることにより,推定された信頼度スコアをLLMの予測の正しさと低いキャリブレーション誤差の正しさとを一致させる精度を著しく向上させる。さらに、幻覚検出ベンチマークであるHaluEvalでは、内部インスペクタが優れており、このタスクにおける他の内部信頼度推定方法よりも優れている。

Despite their vast capabilities, Large Language Models (LLMs) often struggle with generating reliable outputs, frequently producing high-confidence inaccuracies known as hallucinations. Addressing this challenge, our research introduces InternalInspector, a novel framework designed to enhance confidence estimation in LLMs by leveraging contrastive learning on internal states including attention states, feed-forward states, and activation states of all layers. Unlike existing methods that primarily focus on the final activation state, InternalInspector conducts a comprehensive analysis across all internal states of every layer to accurately identify both correct and incorrect prediction processes. By benchmarking InternalInspector against existing confidence estimation methods across various natural language understanding and generation tasks, including factual question answering, commonsense reasoning, and reading comprehension, InternalInspector achieves significantly higher accuracy in aligning the estimated confidence scores with the correctness of the LLM's predictions and lower calibration error. Furthermore, InternalInspector excels at HaluEval, a hallucination detection benchmark, outperforming other internal-based confidence estimation methods in this task.

翻訳日:2024-06-19 23:57:20 公開日:2024-06-17

# FAWN:Floor-and-walls normal regularization for direct Neural TSDF Reconstruction

FAWN: Floor-And-Walls Normal Regularization for Direct Neural TSDF Reconstruction ( http://arxiv.org/abs/2406.12054v1 )

ライセンス: Link先を確認

Anna Sokolova, Anna Vorontsova, Bulat Gabdullin, Alexander Limonov,

(参考訳) 直接3D再構成のための3Dセマンティクスを活用することは、大きな可能性を秘めている。例えば、壁が垂直で、床が平面で水平であると仮定することで、歪んだ部屋の形を補正し、穴、穴、丘などの局所的な遺物を取り除くことができる。本稿では,シーン内の壁や床を検知してシーン構造を考察し,水平方向と垂直方向を逸脱するための対応する表面正規化をペナルライズする,TSDF (truncated signed distance function) 再構成手法であるFAWNを提案する。 3Dスパース畳み込みモジュールとして実装されたFAWNは、TSDFを予測するトレーニング可能なパイプラインに組み込むことができる。 FAWNはトレーニングのためにのみ3Dセマンティクスを必要とするため、さらなる使用に関する追加の制限は課されない。 FAWNを修飾した手法は,既存の意味に基づく手法よりも,意味論を効果的に活用することが実証された。また,最新のTSDF再構成手法に適用し,SCANNET, ICL-NUIM, TUM RGB-D, 7SCENESベンチマークの品質向上を示す。

Leveraging 3D semantics for direct 3D reconstruction has a great potential yet unleashed. For instance, by assuming that walls are vertical, and a floor is planar and horizontal, we can correct distorted room shapes and eliminate local artifacts such as holes, pits, and hills. In this paper, we propose FAWN, a modification of truncated signed distance function (TSDF) reconstruction methods, which considers scene structure by detecting walls and floor in a scene, and penalizing the corresponding surface normals for deviating from the horizontal and vertical directions. Implemented as a 3D sparse convolutional module, FAWN can be incorporated into any trainable pipeline that predicts TSDF. Since FAWN requires 3D semantics only for training, no additional limitations on further use are imposed. We demonstrate, that FAWN-modified methods use semantics more effectively, than existing semantic-based approaches. Besides, we apply our modification to state-of-the-art TSDF reconstruction methods, and demonstrate a quality gain in SCANNET, ICL-NUIM, TUM RGB-D, and 7SCENES benchmarks.
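
The penalty on surface normals described above can be illustrated with a simplified Python sketch. It assumes a gravity-aligned scene with `UP` as the vertical axis, normals already extracted per surface point, and string labels for floor and wall; the function names and the exact penalty form (based on the cosine between the normal and the vertical) are assumptions. The paper's module works on TSDF volumes with 3D sparse convolutions, which this sketch does not attempt to reproduce.

```python
import math

UP = (0.0, 0.0, 1.0)  # assumed gravity-aligned "up" axis


def _unit(vector):
    """Scale a non-zero 3D vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return tuple(x / norm for x in vector)


def normal_penalty(normal, label):
    """Penalty in [0, 1] for one surface normal.

    Floor normals should be parallel to UP; wall normals should be
    perpendicular to it. Any other label is left unconstrained.
    """
    cos_up = abs(sum(a * b for a, b in zip(_unit(normal), UP)))
    if label == "floor":
        return 1.0 - cos_up
    if label == "wall":
        return cos_up
    return 0.0


def normal_regularization_loss(normals, labels):
    """Mean penalty over all labelled points (expects a non-empty input)."""
    penalties = [normal_penalty(n, l) for n, l in zip(normals, labels)]
    return sum(penalties) / len(penalties)
```

A perfectly horizontal floor patch and a perfectly vertical wall patch both contribute zero, while a floor normal tilted by 45 degrees contributes about 0.29. In a training pipeline a term of this shape would be added to the reconstruction loss, which is consistent with the abstract's note that semantics are needed only during training.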

翻訳日:2024-06-19 23:57:20 公開日:2024-06-17

# 細胞内における分子表現の学習

Learning Molecular Representation in a Cell ( http://arxiv.org/abs/2406.12056v1 )

ライセンス: Link先を確認

Gang Liu, Srijit Seal, John Arevalo, Zhenwen Liang, Anne E. Carpenter, Meng Jiang, Shantanu Singh,

(参考訳) 薬物の有効性と安全性をin vivoで予測するには、小さな分子摂動に対する生物学的反応(細胞形態、遺伝子発現など)に関する情報が必要である。しかしながら、現在の分子表現学習法は、これらの摂動下での細胞状態の包括的なビューを提供しておらず、ノイズを取り除くのに苦労し、モデル一般化を妨げている。本稿では,細胞内情報ボトルネック法を用いて分子表現を学習するための情報アライメント(InfoAlign)手法を提案する。我々は、分子と細胞応答データをノードとしてコンテキストグラフに統合し、化学、生物学的、計算基準に基づいて重み付けされたエッジと接続する。トレーニングバッチの各分子に対して、InfoAlignはエンコーダの潜在表現を最小限の目的で最適化し、冗長な構造情報を破棄する。十分性目的(sufficiency objective)は、コンテキストグラフ内の分子の近傍から異なる特徴空間と整合するように表現をデコードする。提案手法は,既存のエンコーダをベースとしたコントラスト法よりも,アライメントの効率向上を目標としている。経験的に、我々はInfoAlignの表現を2つの下流タスクで検証した: 4つのデータセットにまたがる19のベースライン法に対する分子特性予測とゼロショット分子形態整合である。

Predicting drug efficacy and safety in vivo requires information on biological responses (e.g., cell morphology and gene expression) to small molecule perturbations. However, current molecular representation learning methods do not provide a comprehensive view of cell states under these perturbations and struggle to remove noise, hindering model generalization. We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria. For each molecule in a training batch, InfoAlign optimizes the encoder's latent representation with a minimality objective to discard redundant structural information. A sufficiency objective decodes the representation to align with different feature spaces from the molecule's neighborhood in the context graph. We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods. Empirically, we validate representations from InfoAlign in two downstream tasks: molecular property prediction against up to 19 baseline methods across four datasets, plus zero-shot molecule-morphology matching.

翻訳日:2024-06-19 23:57:20 公開日:2024-06-17

# WellDunn: ウェルネス次元の同定における言語モデルと大規模言語モデルのロバスト性と説明可能性について

WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions ( http://arxiv.org/abs/2406.12058v1 )

ライセンス: Link先を確認

Seyedali Mohammadi, Edward Raff, Jinendra Malekar, Vedant Palit, Francis Ferraro, Manas Gaur,

(参考訳) 言語モデル (LM) は, 予後のリスクを高めることで, 臨床実践におけるモデルの有用性の十分なリトマステストにはならない, メンタルヘルスの分野で提案されている。実践に信頼できるモデルは、説明と臨床的決定の対応性を持つべきであるが、これらのモデルの注意力と、それらの基礎的真理的説明への影響について、事前の研究は行われていない。本稿では,ウェルネス次元(WD)の同定におけるLMの堅牢性と説明性に着目した評価設計を提案する。 2つのメンタルヘルスと幸福なデータセットに焦点を当てます。 (a)多ラベル分類に基づくMultiWD及び b) 専門家による説明に対する注意機構の妥当性を評価するためのWellXplain ラベルはハルベルト・ダンのウェルネスの理論に基づいている。 1)人間のような能力にもかかわらず、RoBERTaに遅れてGPT-3.5/4ラグ、そしてMedAlpacaでは、微調整のLDMでは、パフォーマンスや説明に顕著な改善が得られなかった。 2)信頼性指向の損失関数に基づくLMの予測を再検討した結果,性能低下が顕著であった。 (3) すべてのLM/LLMにおいて, 注意と説明の整合性は低く, LLMは0.0。 (4)ほとんどの精神保健専門のLM/LLMは、ドメイン固有の知識や価値の低い説明を見落とし、これらの相違の原因となった。この研究は、精神保健と健康における一貫性と説明について、さらなる研究の必要性を強調している。

Language Models (LMs) are being proposed for mental health applications where the heightened risk of adverse outcomes means predictive performance may not be a sufficient litmus test of a model's utility in clinical practice. A model that can be trusted for practice should have a correspondence between explanation and clinical determination, yet no prior research has examined the attention fidelity of these models and their effect on ground truth explanations. We introduce an evaluation design that focuses on the robustness and explainability of LMs in identifying Wellness Dimensions (WD). We focus on two mental health and well-being datasets: (a) Multi-label Classification-based MultiWD, and (b) WellXplain for evaluating attention mechanism veracity against expert-labeled explanations. The labels are based on Halbert Dunn's theory of wellness, which gives grounding to our evaluation. We reveal four surprising results about LMs/LLMs: (1) Despite their human-like capabilities, GPT-3.5/4 lag behind RoBERTa, and MedAlpaca, a fine-tuned LLM fails to deliver any remarkable improvements in performance or explanations. (2) Re-examining LMs' predictions based on a confidence-oriented loss function reveals a significant performance drop. (3) Across all LMs/LLMs, the alignment between attention and explanations remains low, with LLMs scoring a dismal 0.0. (4) Most mental health-specific LMs/LLMs overlook domain-specific knowledge and undervalue explanations, causing these discrepancies. This study highlights the need for further research into their consistency and explanations in mental health and well-being.

翻訳日:2024-06-19 23:57:20 公開日:2024-06-17

# グラフ変換器のスケーラブルで効果的な代替手段

A Scalable and Effective Alternative to Graph Transformers ( http://arxiv.org/abs/2406.12059v1 )

ライセンス: Link先を確認

Kaan Sancak, Zhigang Hua, Jin Fang, Yan Xie, Andrey Malevich, Bo Long, Muhammed Fatih Balin, Ümit V. Çatalyürek,

(参考訳) グラフニューラルネットワーク(GNN)は、グラフ表現学習において顕著なパフォーマンスを示しているが、表現力の制限により、長距離依存をキャプチャする上での課題に直面している。これを解決するために、自己注意機構を利用してノード間のペア関係を効果的にモデル化するグラフ変換器(GT)が導入された。これらの利点にもかかわらず、GTはグラフ内のノード数に対して二次的な計算量を要し、大きなグラフへの適用が妨げられる。本研究では,グラフ拡張コンテキスト演算子(GECO)を提案する。これはGTのスケーラブルで効果的な代替手段であり,近隣の伝播とグローバルな畳み込みを利用して,準線形時間で局所的およびグローバルな依存関係を効果的にキャプチャする。合成データセットについて検討した結果,GECOは2Mノードのグラフ上で,最適化されたアテンションに対して169倍の高速化を達成することがわかった。さまざまなベンチマークに関するさらなる評価は、GECOが従来のGTがメモリと時間制限に直面している大きなグラフにスケールすることを示している。特にGECOは、ベースラインに比べて一貫して同等または優れた品質を実現し、SOTAを最大4.5%改善し、大規模グラフ学習のためのスケーラブルで効果的なソリューションを提供する。

Graph Neural Networks (GNNs) have shown impressive performance in graph representation learning, but they face challenges in capturing long-range dependencies due to their limited expressive power. To address this, Graph Transformers (GTs) were introduced, utilizing self-attention mechanism to effectively model pairwise node relationships. Despite their advantages, GTs suffer from quadratic complexity w.r.t. the number of nodes in the graph, hindering their applicability to large graphs. In this work, we present Graph-Enhanced Contextual Operator (GECO), a scalable and effective alternative to GTs that leverages neighborhood propagation and global convolutions to effectively capture local and global dependencies in quasilinear time. Our study on synthetic datasets reveals that GECO reaches 169x speedup on a graph with 2M nodes w.r.t. optimized attention. Further evaluations on diverse range of benchmarks showcase that GECO scales to large graphs where traditional GTs often face memory and time limitations. Notably, GECO consistently achieves comparable or superior quality compared to baselines, improving the SOTA up to 4.5%, and offering a scalable and effective solution for large-scale graph learning.

翻訳日:2024-06-19 23:57:20 公開日:2024-06-17

# 除去ではなく集約:自然言語理解におけるショートカットシフトに対処するための専門家の混合に対するポストホック制御

Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding ( http://arxiv.org/abs/2406.12060v1 )

ライセンス: Link先を確認

Ukyo Honda, Tatsushi Oka, Peinan Zhang, Masato Mita,

(参考訳) 最近の自然言語理解モデルは、一般的にショートカットとして知られるデータセットの単純なパターンを利用する傾向にある。これらのショートカットは、トレーニングデータに存在するラベルと潜在特徴の間の疑似相関に依存している。推論時において、ショートカットに依存したモデルは、特に一部の潜在特徴がラベルと相関しなくなった場合、分布シフトの下で誤った予測を生成しやすい。これを避けるために、従来の研究ではショートカットへの依存を取り除くようにモデルを訓練してきた。本研究では,これとは異なる方向性を探求する。すなわち,各専門家が比較的異なる潜在特徴を捉えると仮定して,専門家の混合(mixture-of-experts)の予測を悲観的に集約する。実験結果から,専門家に対するポストホック制御は,ショートカットにおける分布シフトに対するモデルのロバスト性を大幅に向上させることが示された。さらに、我々のアプローチにはいくつかの実用的な利点があることを示す。また、我々のモデルを分析し、その仮定を支持する結果を提供する。

Recent models for natural language understanding are inclined to exploit simple patterns in datasets, commonly known as shortcuts. These shortcuts hinge on spurious correlations between labels and latent features existing in the training data. At inference time, shortcut-dependent models are likely to generate erroneous predictions under distribution shifts, particularly when some latent features are no longer correlated with the labels. To avoid this, previous studies have trained models to eliminate the reliance on shortcuts. In this study, we explore a different direction: pessimistically aggregating the predictions of a mixture-of-experts, assuming each expert captures relatively different latent features. The experimental results demonstrate that our post-hoc control over the experts significantly enhances the model's robustness to the distribution shift in shortcuts. Besides, we show that our approach has some practical advantages. We also analyze our model and provide results to support the assumption.
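
One way to read "pessimistically aggregating" is a maximin rule: pick the label whose worst-case expert probability is highest. The sketch below illustrates that reading only; the abstract does not state the exact aggregation rule, so `pessimistic_aggregate` and the toy probabilities are assumptions, with plain averaging shown for contrast.

```python
def pessimistic_aggregate(expert_probs):
    """Return the label index whose minimum probability across experts is largest.

    expert_probs: list of per-expert probability lists over the same labels.
    Assumption: a maximin rule stands in for the paper's post-hoc control.
    """
    num_labels = len(expert_probs[0])
    worst_case = [min(probs[label] for probs in expert_probs)
                  for label in range(num_labels)]
    return max(range(num_labels), key=lambda label: worst_case[label])


def average_aggregate(expert_probs):
    """Return the label index with the highest mean probability across experts."""
    num_labels = len(expert_probs[0])
    means = [sum(probs[label] for probs in expert_probs) / len(expert_probs)
             for label in range(num_labels)]
    return max(range(num_labels), key=lambda label: means[label])


# Two experts strongly favor label 0 (perhaps via a shared shortcut),
# while the third expert assigns it zero probability.
toy_experts = [
    [0.95, 0.05, 0.00],
    [0.90, 0.10, 0.00],
    [0.00, 0.40, 0.60],
]
```

On `toy_experts`, averaging follows the two confident experts, whereas the maximin rule avoids any label that some expert rules out entirely. That is the intuition behind being robust when a shortcut feature stops correlating with the label.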

翻訳日:2024-06-19 23:57:20 公開日:2024-06-17

# エントロピック回帰DMD(ERDMD)は情報量の多いスパースかつ不均一な時間遅延モデルを発見する

Entropic Regression DMD (ERDMD) Discovers Informative Sparse and Nonuniformly Time Delayed Models ( http://arxiv.org/abs/2406.12062v1 )

ライセンス: Link先を確認

Christopher W. Curtis, Erik Bollt, Daniel Jay Alford-Lago,

(参考訳) 本研究では,非線形情報フロー検出アルゴリズムであるエントロピック回帰を用いて,最適な多段階動的モード分解(DMD)モデルを決定する手法を提案する。\cite{clainche} の高次DMD (HODMD) 法と,\cite{bollt, bollt2} にあるネットワーク検出とモデル構築のためのエントロピック回帰 (ER) 手法に動機づけられ,不均一な時間間隔を許容する高忠実度の時間遅延DMDモデルを生成する,ERDMDと呼ぶ手法を開発した。時間間隔は,ERに基づいて最も情報量の多いものを考慮することで発見される。これらのモデルは、非常に効率的で堅牢であることが示されている。カオス的アトラクタによって生成された複数のデータセット上で本手法を検証し,比較的最小限のモデルを用いて優れた再構成を構築可能であることを示す。同様に、我々のモデルによりマルチスケールの特徴をよりよく識別でき、これが動的モード分解の実用性を高める。

In this work, we present a method which determines optimal multi-step dynamic mode decomposition (DMD) models via entropic regression, which is a nonlinear information flow detection algorithm. Motivated by the higher-order DMD (HODMD) method of \cite{clainche}, and the entropic regression (ER) technique for network detection and model construction found in \cite{bollt, bollt2}, we develop a method that we call ERDMD that produces high fidelity time-delay DMD models that allow for nonuniform time spacing, where the time spacing is discovered by considering which delays are most informative based on ER. These models are shown to be highly efficient and robust. We test our method over several data sets generated by chaotic attractors and show that we are able to build excellent reconstructions using relatively minimal models. We likewise are able to better identify multiscale features via our models, which enhances the utility of dynamic mode decomposition.

翻訳日:2024-06-19 23:57:20 公開日:2024-06-17

# STNAGNN:タスクベースfMRI解析のための時空間ノード注意グラフニューラルネットワーク

STNAGNN: Spatiotemporal Node Attention Graph Neural Network for Task-based fMRI Analysis ( http://arxiv.org/abs/2406.12065v1 )

ライセンス: Link先を確認

Jiyao Wang, Nicha C. Dvornek, Peiyu Duan, Lawrence H. Staib, Pamela Ventola, James S. Duncan,

(参考訳) タスクベースのfMRIは、アクションまたは刺激を使用して、タスク固有の脳反応をトリガーし、BOLDコントラストを使用してそれらを測定する。タスクによる時空間脳活動の著しい変動にもかかわらず、タスクベースfMRIの研究の多くは、タスクコンテキスト情報がfMRIと一致していることを無視し、タスクベースfMRIをコヒーレントなシーケンスとみなす。本稿では,タスク構造をデータ駆動型ガイダンスとして用いることが時空間分析に有効であることを示す。本稿では,GNNに基づく時空間アーキテクチャSTNAGNNを提案し,その性能を自閉症分類タスクで検証する。トレーニングされたモデルは、自閉症に関連する時空間脳バイオマーカーを特定するためにも解釈される。

Task-based fMRI uses actions or stimuli to trigger task-specific brain responses and measures them using BOLD contrast. Despite the significant task-induced spatiotemporal brain activation fluctuations, most studies on task-based fMRI ignore the task context information aligned with fMRI and consider task-based fMRI a coherent sequence. In this paper, we show that using the task structures as data-driven guidance is effective for spatiotemporal analysis. We propose STNAGNN, a GNN-based spatiotemporal architecture, and validate its performance in an autism classification task. The trained model is also interpreted for identifying autism-related spatiotemporal brain biomarkers.

翻訳日:2024-06-19 23:57:20 公開日:2024-06-17

# 言語モデルはバイオメディカルベンチマークにおける薬物名に対して驚くほど脆弱である

Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks ( http://arxiv.org/abs/2406.12066v1 )

ライセンス: Link先を確認

Jack Gallifant, Shan Chen, Pedro Moreira, Nikolaj Munch, Mingye Gao, Jackson Pond, Leo Anthony Celi, Hugo Aerts, Thomas Hartvigsen, Danielle Bitterman,

(参考訳) 医学知識は文脈に依存しており、意味的に等価なフレーズの様々な自然言語表現に対して一貫した推論を必要とする。これは薬物名にとって特に重要であり、患者はジェネリック(一般名)の等価品の代わりにAdvilやTylenolといったブランド名を使うことが多い。そこで本研究では,医師の専門家アノテーションを用いてブランド名と一般名を入れ替えた後の医療ベンチマークにおける性能差を評価するために,新しい頑健性データセットであるRABBITSを作成した。 MedQA と MedMCQA においてオープンソース LLM と API ベースの LLM の両方を評価し,1〜10%の一貫した性能低下を明らかにした。さらに、この脆弱性の潜在的な原因として、広く使われている事前学習データセットにおけるテストデータの汚染を特定する。すべてのコードはhttps://github.com/BittermanLab/RABBITSでアクセスでき、HuggingFaceのリーダーボードはhttps://huggingface.co/spaces/AIM-Harvard/rabbits-leaderboard で利用できる。

Medical knowledge is context-dependent and requires consistent reasoning across various natural language expressions of semantically equivalent phrases. This is particularly crucial for drug names, where patients often use brand names like Advil or Tylenol instead of their generic equivalents. To study this, we create a new robustness dataset, RABBITS, to evaluate performance differences on medical benchmarks after swapping brand and generic drug names using physician expert annotations. We assess both open-source and API-based LLMs on MedQA and MedMCQA, revealing a consistent performance drop ranging from 1-10%. Furthermore, we identify a potential source of this fragility as the contamination of test data in widely used pre-training datasets. All code is accessible at https://github.com/BittermanLab/RABBITS, and a HuggingFace leaderboard is available at https://huggingface.co/spaces/AIM-Harvard/rabbits-leaderboard.
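
The brand/generic swap at the heart of this evaluation can be sketched in a few lines. This is a toy illustration, not the RABBITS pipeline: the real dataset relies on physician expert annotations, whereas the two-entry table and the `swap_drug_names` helper below are assumptions made for the example.

```python
import re

# Toy mapping for illustration only; RABBITS uses expert-annotated pairs.
BRAND_TO_GENERIC = {"Advil": "ibuprofen", "Tylenol": "acetaminophen"}


def swap_drug_names(text, mapping):
    """Replace whole-word, case-insensitive occurrences of mapping keys."""
    names = sorted(mapping, key=len, reverse=True)  # prefer longer names first
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    lookup = {name.lower(): replacement for name, replacement in mapping.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


# The reverse direction (generic -> brand) is just the inverted table.
GENERIC_TO_BRAND = {generic: brand for brand, generic in BRAND_TO_GENERIC.items()}
```

Running a benchmark question through `swap_drug_names` before and after scoring a model gives the paired comparison the abstract describes. The word-boundary anchors keep substrings of longer words from being rewritten.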

翻訳日:2024-06-19 23:57:19 公開日:2024-06-17

# Satyrn: 分析拡張生成のためのプラットフォーム

Satyrn: A Platform for Analytics Augmented Generation ( http://arxiv.org/abs/2406.12069v1 )

ライセンス: Link先を確認

Marko Sterbentz, Cameron Barrie, Shubham Shahi, Abhratanu Dutta, Donna Hooshmand, Harper Pack, Kristian J. Hammond,

(参考訳) 大規模言語モデル(LLM)は文書を作成でき、検索拡張生成(RAG)は、流暢さを犠牲にすることなく精度を向上する強力な方法であることが示されている。しかし、すべての情報をテキストから取り出すことはできない。本稿では、検索された文書がRAGで使用されるのとほとんど同じように、構造化データの分析を用いて、生成をガイドする事実集合を生成するアプローチを提案する。この分析拡張生成(AAG)アプローチは、標準的な分析技術を使用して事実を生成し、それをテキストに変換してLLMに渡すことを可能にする。我々は、AAGを利用して大規模データベースに基づく正確で流暢で一貫性のあるレポートを生成する、ニューロシンボリックなプラットフォームであるSatyrnを提案する。実験の結果,Satyrnは,Mistral-7B のようなより小さな言語モデルを用いた場合でも,高い流暢さと一貫性を維持しつつ,クレームの 86% 以上が正確なレポートを生成することがわかった。これに対し GPT-4 Code Interpreter では,正確なクレームは 57% にとどまる。

Large language models (LLMs) are capable of producing documents, and retrieval augmented generation (RAG) has shown itself to be a powerful method for improving accuracy without sacrificing fluency. However, not all information can be retrieved from text. We propose an approach that uses the analysis of structured data to generate fact sets that are used to guide generation in much the same way that retrieved documents are used in RAG. This analytics augmented generation (AAG) approach supports the ability to utilize standard analytic techniques to generate facts that are then converted to text and passed to an LLM. We present a neurosymbolic platform, Satyrn, that leverages AAG to produce accurate, fluent, and coherent reports grounded in large scale databases. In our experiments, we find that Satyrn generates reports in which over 86% of claims are accurate while maintaining high levels of fluency and coherence, even when using smaller language models such as Mistral-7B, as compared to GPT-4 Code Interpreter, in which just 57% of claims are accurate.
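
The AAG flow described above (analyze structured data, turn the results into textual facts, hand those facts to an LLM) can be sketched as follows. This is a hedged illustration and not Satyrn's implementation: `derive_facts`, `build_prompt`, the row schema and the prompt wording are all hypothetical, and the LLM call itself is left out.

```python
from statistics import mean


def derive_facts(rows, group_key, value_key):
    """Run a simple group-by analysis and render each result as a sentence-like fact."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    facts = []
    for group in sorted(groups):
        values = groups[group]
        facts.append(
            f"{group_key}={group}: count={len(values)}, "
            f"mean {value_key}={mean(values):.2f}"
        )
    return facts


def build_prompt(question, facts):
    """Assemble a prompt that restricts the model to the derived facts."""
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    return f"Answer using only these facts:\n{fact_block}\n\nQuestion: {question}"


# Hypothetical structured data standing in for a database query result.
sample_rows = [
    {"region": "east", "sales": 10},
    {"region": "east", "sales": 20},
    {"region": "west", "sales": 5},
]
```

The design point is that the numbers come from deterministic analytics rather than from the language model, so the model's job is reduced to phrasing facts it was given.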

翻訳日:2024-06-19 23:57:19 公開日:2024-06-17

# DTGB: 動的テキスト属性付きグラフの総合ベンチマーク

DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs ( http://arxiv.org/abs/2406.12072v1 )

ライセンス: Link先を確認

Jiasheng Zhang, Jialin Chen, Menglin Yang, Aosong Feng, Shuang Liang, Jie Shao, Rex Ying,

(参考訳) 動的テキスト属性付きグラフ(DyTAG)は、各ノードとエッジがテキスト記述と関連付けられ、グラフ構造とテキスト記述の両方が時間とともに進化する様々な実世界のシナリオで一般的である。適用性は広いが、DyTAGに合わせたベンチマークデータセットが不足しているため、多くの研究分野での潜在的な進歩が妨げられている。このギャップに対処するために、動的テキスト属性付きグラフベンチマーク(DTGB)を導入する。これは、動的に変化するテキスト属性とカテゴリによってノードとエッジが豊かにされた、さまざまなドメインからの大規模で時間進化的なグラフのコレクションである。 DTGBの使用を容易にするため,将来のリンク予測,宛先ノード検索,エッジ分類,テキスト関係生成の4つの実世界のユースケースに基づいた標準化された評価手順を設計した。これらのタスクは、動的グラフ構造と自然言語の両方を理解することをモデルに要求し、DyTAGによって引き起こされるユニークな課題を浮き彫りにする。さらに、DTGB上で広範囲なベンチマーク実験を行い、7つの一般的な動的グラフ学習アルゴリズムと、LLM埋め込みによってテキスト属性に適応させたそれらの変種を、6つの強力な大規模言語モデル(LLM)とともに評価した。以上の結果から,DyTAGの処理における既存モデルの限界が示された。また, 構造的ダイナミクスとテキスト的ダイナミクスの統合を調べる上でのDTGBの有用性も分析により示した。提案されたDTGBは、DyTAGとその幅広い応用に関する研究を促進する。動的グラフ構造と自然言語の相互作用を扱うモデルの評価と発展のための包括的なベンチマークを提供する。データセットとソースコードはhttps://github.com/zjs123/DTGBで入手できる。

Dynamic text-attributed graphs (DyTAGs) are prevalent in various real-world scenarios, where each node and edge are associated with text descriptions, and both the graph structure and text descriptions evolve over time. Despite their broad applicability, there is a notable scarcity of benchmark datasets tailored to DyTAGs, which hinders the potential advancement in many research fields. To address this gap, we introduce Dynamic Text-attributed Graph Benchmark (DTGB), a collection of large-scale, time-evolving graphs from diverse domains, with nodes and edges enriched by dynamically changing text attributes and categories. To facilitate the use of DTGB, we design standardized evaluation procedures based on four real-world use cases: future link prediction, destination node retrieval, edge classification, and textual relation generation. These tasks require models to understand both dynamic graph structures and natural language, highlighting the unique challenges posed by DyTAGs. Moreover, we conduct extensive benchmark experiments on DTGB, evaluating 7 popular dynamic graph learning algorithms and their variants of adapting to text attributes with LLM embeddings, along with 6 powerful large language models (LLMs). Our results show the limitations of existing models in handling DyTAGs. Our analysis also demonstrates the utility of DTGB in investigating the incorporation of structural and textual dynamics. The proposed DTGB fosters research on DyTAGs and their broad applications. It offers a comprehensive benchmark for evaluating and advancing models to handle the interplay between dynamic graph structures and natural language. The dataset and source code are available at https://github.com/zjs123/DTGB.

翻訳日:2024-06-19 23:57:19 公開日:2024-06-17

# コミュニティ・クロス・インストラクト:大規模言語モデルをオンラインコミュニティにアライメントするための教師なしインストラクション生成

COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities ( http://arxiv.org/abs/2406.12074v1 )

ライセンス: Link先を確認

Zihao He, Rebecca Dorn, Siyi Guo, Minh Duc Chu, Kristina Lerman,

(参考訳) 社会科学者は、集団の意見や信念を探るために調査を行っているが、これらの手法は遅く、費用がかかり、バイアスが生じやすい。大規模言語モデル(LLM)の最近の進歩は、集団の言語、スタイル、態度を模倣する人間のような応答を生成する、集団の計算表現すなわち「デジタルツイン」の作成を可能にする。本稿では,LLMをオンラインコミュニティにアライメントし,その信念を引き出すための教師なしフレームワークであるCommunity-Cross-Instructを紹介する。Community-Cross-Instructは,コミュニティのオンライン議論のコーパスが与えられると,先進的なLLMによって命令と出力のペアを自動生成し,(1)基盤LLMを微調整してそのコミュニティを忠実に表現し,(2)微調整されたモデルのコミュニティへのアライメントを評価する。 Reddit上の政治およびフィットネスのコミュニティを正確に表現する上で,本手法の有用性を実証する。人手で作成した命令を必要とする従来の方法とは異なり、Community-Cross-Instructは、完全に教師なしの方法で命令を生成し、スケーラビリティとドメイン間の一般化を高める。この作業により、様々なオンラインコミュニティの費用対効果の高い自動調査が可能になる。

Social scientists use surveys to probe the opinions and beliefs of populations, but these methods are slow, costly, and prone to biases. Recent advances in large language models (LLMs) enable creating computational representations or "digital twins" of populations that generate human-like responses mimicking the population's language, styles, and attitudes. We introduce Community-Cross-Instruct, an unsupervised framework for aligning LLMs to online communities to elicit their beliefs. Given a corpus of a community's online discussions, Community-Cross-Instruct automatically generates instruction-output pairs by an advanced LLM to (1) finetune a foundational LLM to faithfully represent that community, and (2) evaluate the alignment of the finetuned model to the community. We demonstrate the method's utility in accurately representing political and fitness communities on Reddit. Unlike prior methods requiring human-authored instructions, Community-Cross-Instruct generates instructions in a fully unsupervised manner, enhancing scalability and generalization across domains. This work enables cost-effective and automated surveying of diverse online communities.

翻訳日:2024-06-19 23:57:19 公開日:2024-06-17

# 宣言的時間仕様に対するファジィログの適合性チェック

Conformance Checking of Fuzzy Logs against Declarative Temporal Specifications ( http://arxiv.org/abs/2406.12078v1 )

ライセンス: Link先を確認

Ivan Donadello, Paolo Felli, Craig Innes, Fabrizio Maria Maggi, Marco Montali,

(参考訳) 従来の適合性チェックタスクは、イベントデータが実際のプロセス実行の忠実で完全な表現を提供すると仮定している。この仮定は最近疑問視されている。イベントが明示的にトレースされず、イベント認識パイプラインの結果として間接的に取得されることがますます増えており、そのため本質的に不確実性を伴う。本研究では、不確実性の典型的な確率論的解釈とは異なり、ファジィ意味論の下で、実際にどの活動が行われているかについて不確実性がある場合を考察する。この新しい設定において,ファジィなイベントデータが,Declareパターンとして,あるいはより一般的には有限トレース上の線形時間論理(LTLf)の式として指定された宣言的時間規則に適合しているかどうかをチェックする問題を検討する。これには、各瞬間に1つのアクティビティのみが実行されるという仮定を緩和し、それに応じて論理のブール演算子をファジィ意味論で再定義する必要がある。具体的には、3つの貢献を行う。まず,我々の目的に合わせたLTLfのファジィ版を定義する。次に,ファジィログに対する適合性チェックを,この論理における検証問題として定式化する。第三に、複数のファジィトレースの適合性を一度にチェックするのに適した、PythonライブラリPyTorchに基づく概念実証の効率的な実装を提供する。

Traditional conformance checking tasks assume that event data provide a faithful and complete representation of the actual process executions. This assumption has been recently questioned: more and more often events are not traced explicitly, but are instead indirectly obtained as the result of event recognition pipelines, and thus inherently come with uncertainty. In this work, differently from the typical probabilistic interpretation of uncertainty, we consider the relevant case where uncertainty refers to which activity is actually conducted, under a fuzzy semantics. In this novel setting, we consider the problem of checking whether fuzzy event data conform with declarative temporal rules specified as Declare patterns or, more generally, as formulae of linear temporal logic over finite traces (LTLf). This requires relaxing the assumption that at each instant only one activity is executed, and correspondingly redefining the boolean operators of the logic with a fuzzy semantics. Specifically, we provide a threefold contribution. First, we define a fuzzy counterpart of LTLf tailored to our purpose. Second, we cast conformance checking over fuzzy logs as a verification problem in this logic. Third, we provide a proof-of-concept, efficient implementation based on the PyTorch Python library, suited to check conformance of multiple fuzzy traces at once.
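
To illustrate what evaluating a temporal rule over a fuzzy trace can look like, the sketch below assumes the common Zadeh semantics (min for conjunction, max for disjunction, 1 - x for negation). The paper's own fuzzy LTLf definition and its PyTorch implementation may differ; everything here, including the trace encoding as one dictionary of membership degrees per instant, is an assumption for illustration.

```python
def activity(name):
    """Degree to which activity `name` occurs at position i (0.0 if absent)."""
    return lambda trace, i: trace[i].get(name, 0.0)


def implies(phi, psi):
    """Fuzzy implication as max(1 - phi, psi), derived from Zadeh NOT and OR."""
    return lambda trace, i: max(1.0 - phi(trace, i), psi(trace, i))


def eventually(phi):
    """'Eventually' over a finite trace: best degree from position i onward."""
    return lambda trace, i: max(phi(trace, j) for j in range(i, len(trace)))


def always(phi):
    """'Always' over a finite trace: worst degree from position i onward."""
    return lambda trace, i: min(phi(trace, j) for j in range(i, len(trace)))


def response(a, b):
    """Declare 'response(a, b)': always (a implies eventually b)."""
    return always(implies(activity(a), eventually(activity(b))))


# Each instant may carry several activities with different degrees, which
# relaxes the one-activity-per-instant assumption mentioned in the abstract.
fuzzy_trace = [
    {"pay": 0.9, "ship": 0.1},
    {"pay": 0.2, "ship": 0.3},
    {"ship": 0.8},
]
```

Calling `response("pay", "ship")(fuzzy_trace, 0)` yields a degree of satisfaction in [0, 1] rather than a boolean verdict. That graded result is the main practical difference from crisp conformance checking.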

翻訳日:2024-06-19 23:57:19 公開日:2024-06-17

# 多次元プルーニング:レイテンシ制約による結合チャネル, 層, ブロックプルーニング

Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint ( http://arxiv.org/abs/2406.12079v1 )

ライセンス: Link先を確認

Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose Alvarez,

(参考訳) 様々な視覚タスクにおける性能の限界を押し広げるにつれて、モデルのサイズはそれに応じて大きくなる。この成長に追従するためには、エッジデバイスでの効率的な推論とデプロイのための非常に積極的なプルーニング技術が必要である。既存のプルーニング手法はチャネルプルーニングに限られており、積極的なパラメータ削減に苦慮している。本稿では,レイテンシ制約を守りつつ,チャネル,レイヤ,ブロックにわたるプルーニングを同時に最適化する,新しい多次元プルーニングフレームワークを提案する。我々は,プルーニング中のモデル全体のレイテンシ変動を正確に捉えるレイテンシモデリング手法を開発した。これは高いプルーニング比で最適なレイテンシと精度のトレードオフを達成するために重要である。プルーニングを混合整数非線形計画(MINLP)として再定式化し, 1回のパスのみで最適なプルーニング構造を効率的に決定する。広範な実験結果から, 従来手法に比べて, 特に大きなプルーニング比で大幅な改善が示された。分類では,従来手法HALPを大きく上回り,Top-1精度は70.0(HALPは68.6),FPSは5262 im/s(HALPは4101 im/s)であった。 3Dオブジェクト検出では,StreamPETRを45%のプルーニング比でプルーニングし,密なベースラインよりも高いFPS(37.3 vs. 31.7)とmAP(0.451 vs. 0.449)を達成することで,新たな最先端を確立する。

As we push the boundaries of performance in various vision tasks, the models grow in size correspondingly. To keep up with this growth, we need very aggressive pruning techniques for efficient inference and deployment on edge devices. Existing pruning approaches are limited to channel pruning and struggle with aggressive parameter reductions. In this paper, we propose a novel multi-dimensional pruning framework that jointly optimizes pruning across channels, layers, and blocks while adhering to latency constraints. We develop a latency modeling technique that accurately captures model-wide latency variations during pruning, which is crucial for achieving optimal latency-accuracy trade-offs at high pruning ratios. We reformulate pruning as a Mixed-Integer Nonlinear Program (MINLP) to efficiently determine the optimal pruned structure with only a single pass. Our extensive results demonstrate substantial improvements over previous methods, particularly at large pruning ratios. In classification, our method significantly outperforms prior art HALP with a Top-1 accuracy of 70.0 (vs. 68.6) and an FPS of 5262 im/s (vs. 4101 im/s). In 3D object detection, we establish a new state-of-the-art by pruning StreamPETR at a 45% pruning ratio, achieving higher FPS (37.3 vs. 31.7) and mAP (0.451 vs. 0.449) than the dense baseline.

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# 大規模データセットのリアルタイムレンダリングのための階層型3次元ガウス表現

A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets ( http://arxiv.org/abs/2406.12080v1 )

ライセンス: Link先を確認

Bernhard Kerbl, Andréas Meuleman, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, George Drettakis,

(参考訳) 新規ビュー合成は近年大きな進歩を遂げており,3Dガウススプラッティングは優れたレベルの視覚的品質、高速なトレーニング、リアルタイムレンダリングを提供する。しかし、トレーニングやレンダリングに必要なリソースは、良好な視覚的品質で表現できるキャプチャシーンのサイズを必然的に制限する。我々は,非常に大きなシーンの視覚的品質を保つ3Dガウスの階層構造を導入し,効果的なレベル選択とレベル間のスムーズな遷移により遠方のコンテンツを効率的にレンダリングするLevel-of-Detail(LOD)ソリューションを提供する。また,非常に大きなシーンを独立したチャンクでトレーニングできる分割統治アプローチを導入する。チャンクを階層に統合し,中間ノードにマージされたガウスの視覚的品質をさらに改善するよう最適化できる。非常に大きなキャプチャは、通常、シーンのカバレッジが疎であり、元の3Dガウススプラッティングのトレーニング法に多くの課題をもたらす。我々はこれらの問題に対処するためにトレーニングを適応させ,正則化する。我々は,非常に大きなシーンのリアルタイムレンダリングを可能にし,LOD法により利用可能なリソースに適応できる,完全なソリューションを提案する。単純で手頃な価格のリグで撮影した,最大数万枚の画像からなり,最大数キロメートルの軌道をカバーし,最大1時間に及ぶシーンの結果を示す。 Project Page: https://repo-sam.inria.fr/fungraph/hierarchical-3d-gaussians/

Novel view synthesis has seen major advances in recent years, with 3D Gaussian splatting offering an excellent level of visual quality, fast training and real-time rendering. However, the resources needed for training and rendering inevitably limit the size of the captured scenes that can be represented with good visual quality. We introduce a hierarchy of 3D Gaussians that preserves visual quality for very large scenes, while offering an efficient Level-of-Detail (LOD) solution for efficient rendering of distant content with effective level selection and smooth transitions between levels. We introduce a divide-and-conquer approach that allows us to train very large scenes in independent chunks. We consolidate the chunks into a hierarchy that can be optimized to further improve visual quality of Gaussians merged into intermediate nodes. Very large captures typically have sparse coverage of the scene, presenting many challenges to the original 3D Gaussian splatting training method; we adapt and regularize training to account for these issues. We present a complete solution that enables real-time rendering of very large scenes and can adapt to available resources thanks to our LOD method. We show results for captured scenes with up to tens of thousands of images with a simple and affordable rig, covering trajectories of up to several kilometers and lasting up to one hour. Project Page: https://repo-sam.inria.fr/fungraph/hierarchical-3d-gaussians/

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# ディープHM-SORT: スポーツにおける多目的追跡の深部特徴, ハーモニック平均, 拡張IOU

Deep HM-SORT: Enhancing Multi-Object Tracking in Sports with Deep Features, Harmonic Mean, and Expansion IOU ( http://arxiv.org/abs/2406.12081v1 )

ライセンス: Link先を確認

Matias Gran-Henriksen, Hans Andreas Lindgaard, Gabriel Kiss, Frank Lindseth,

(参考訳) 本稿では,スポーツシナリオにおける選手の追跡を強化するために設計された,新しいオンライン多目的追跡アルゴリズムであるDeep HM-SORTを紹介する。従来の多目的追跡手法は、選手の外見が似ていること、不規則で予測不可能な動き、大きなカメラの動きのために、スポーツ環境ではしばしば苦戦する。 Deep HM-SORTは、深層特徴、調和平均、Expansion IOUを統合することで、これらの課題に対処する。本手法は,調和平均を利用して外見と動きの手がかりのバランスを効果的にとり,IDスワップを大幅に低減する。さらに,本手法は全てのトラックレットを無期限に保持し,フレームを離れて再び入る選手の再識別を改善する。実験の結果,SportsMOT と SoccerNet Tracking Challenge 2023 の2つの大規模公開ベンチマークにおいて,Deep HM-SORT が最先端の性能を達成することが示された。具体的には,SportsMOTデータセットで80.1 HOTA,SoccerNet-Trackingデータセットで85.4 HOTAを達成し,HOTA,IDF1,AssA,MOTAといった重要な指標において既存のトラッカーを上回った。この堅牢なソリューションは、自動スポーツ分析の精度と信頼性を向上させ、追加の計算コストを導入することなく、以前の方法よりも大幅な改善をもたらす。

This paper introduces Deep HM-SORT, a novel online multi-object tracking algorithm specifically designed to enhance the tracking of athletes in sports scenarios. Traditional multi-object tracking methods often struggle with sports environments due to the similar appearances of players, irregular and unpredictable movements, and significant camera motion. Deep HM-SORT addresses these challenges by integrating deep features, harmonic mean, and Expansion IOU. By leveraging the harmonic mean, our method effectively balances appearance and motion cues, significantly reducing ID-swaps. Additionally, our approach retains all tracklets indefinitely, improving the re-identification of players who leave and re-enter the frame. Experimental results demonstrate that Deep HM-SORT achieves state-of-the-art performance on two large-scale public benchmarks, SportsMOT and SoccerNet Tracking Challenge 2023. Specifically, our method achieves 80.1 HOTA on the SportsMOT dataset and 85.4 HOTA on the SoccerNet-Tracking dataset, outperforming existing trackers in key metrics such as HOTA, IDF1, AssA, and MOTA. This robust solution provides enhanced accuracy and reliability for automated sports analytics, offering significant improvements over previous methods without introducing additional computational cost.
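
The two ingredients named in the abstract can be sketched in isolation. How Deep HM-SORT actually combines them is not spelled out here, so the code below is an assumption-laden illustration: boxes are `(x1, y1, x2, y2)` tuples, `scale` pads each side by a fraction of the box width and height, and all function names are hypothetical.

```python
def harmonic_mean(appearance_sim, motion_sim):
    """Harmonic mean of two similarity scores; low if either cue is low."""
    total = appearance_sim + motion_sim
    if total == 0:
        return 0.0
    return 2.0 * appearance_sim * motion_sim / total


def expand_box(box, scale):
    """Pad a box on every side by `scale` times its width and height."""
    x1, y1, x2, y2 = box
    pad_w = (x2 - x1) * scale
    pad_h = (y2 - y1) * scale
    return (x1 - pad_w, y1 - pad_h, x2 + pad_w, y2 + pad_h)


def iou(box_a, box_b):
    """Standard intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def expansion_iou(box_a, box_b, scale=0.5):
    """IoU computed after expanding both boxes, so near misses still overlap."""
    return iou(expand_box(box_a, scale), expand_box(box_b, scale))
```

Two boxes that sit side by side with a small gap have a plain IoU of zero but a positive expansion IoU, which helps when a player moves far between frames. The harmonic mean penalizes a match that looks right but moves implausibly (or the reverse) more strongly than an arithmetic mean would.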

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# 微調整暗黙関数の不確かさモデリング

Uncertainty modeling for fine-tuned implicit functions ( http://arxiv.org/abs/2406.12082v1 )

ライセンス: Link先を確認

Anna Susmelj, Mael Macuglia, Nataša Tagasovska, Reto Sutter, Sebastiano Caprara, Jean-Philippe Thiran, Ender Konukoglu,

(参考訳) ニューラルラジアンスフィールド(NeRF)、占有ネットワーク、符号付き距離関数(SDF)などの暗黙関数は、スパースなビューから詳細な物体形状を再構築するコンピュータビジョンにおいて重要な役割を担っている。入力の極端なスパース性と、データの破損によって引き起こされる分布シフトのため、これらのモデルで最適な性能を達成することは困難である。この目的のために、大規模でノイズのない合成データセットは、モデルがギャップを埋めるのを助ける形状の事前知識として機能するが、その結果の再構築は慎重に扱う必要がある。これらの再構築の質を評価するためには不確実性推定が不可欠であり、特にモデルが事前知識から推測した部分について不確実である領域を特定する上で重要である。本稿では,微調整された暗黙関数における不確実性推定のための新しい手法であるDropsemblesを紹介する。我々は,トイ例から始まり,現実のシナリオへと進む一連の実験を通じて,アプローチの有効性を実証する。具体的には、合成解剖学的データで畳み込み占有ネットワーク(Convolutional Occupancy Network)を訓練し、腰椎の低解像度MRIセグメンテーションでテストする。その結果,Dropsemblesは深層アンサンブルと同等の精度とキャリブレーションを達成しつつ,計算コストは著しく低いことがわかった。

Implicit functions such as Neural Radiance Fields (NeRFs), occupancy networks, and signed distance functions (SDFs) have become pivotal in computer vision for reconstructing detailed object shapes from sparse views. Achieving optimal performance with these models can be challenging due to the extreme sparsity of inputs and distribution shifts induced by data corruptions. To this end, large, noise-free synthetic datasets can serve as shape priors to help models fill in gaps, but the resulting reconstructions must be approached with caution. Uncertainty estimation is crucial for assessing the quality of these reconstructions, particularly in identifying areas where the model is uncertain about the parts it has inferred from the prior. In this paper, we introduce Dropsembles, a novel method for uncertainty estimation in tuned implicit functions. We demonstrate the efficacy of our approach through a series of experiments, starting with toy examples and progressing to a real-world scenario. Specifically, we train a Convolutional Occupancy Network on synthetic anatomical data and test it on low-resolution MRI segmentations of the lumbar spine. Our results show that Dropsembles achieve the accuracy and calibration levels of deep ensembles but with significantly less computational cost.

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# 量子誤り訂正符号の等価クラス

Equivalence Classes of Quantum Error-Correcting Codes ( http://arxiv.org/abs/2406.12083v1 )

ライセンス: Link先を確認

Andrey Boris Khesin, Alexander Li,

(参考訳) 量子過程に影響を与える固有のノイズに対処するために、量子誤り訂正符号(QECC)が必要である。 ZX計算を用いて、テンソルネットワークからなるZXダイアグラムと呼ばれる形式でQECCを表す。本稿では,CSSコードとCSS状態(入力が0個のCSSコード)の標準形を示し,トーリックコードと特定の表面符号について得られる標準形を示す。次に、素コードダイアグラムの概念を導入する。これは、単一の連結成分を持ち、どのような書き換え規則の列によっても2つの連結成分に分割できないという性質を持つ符号のZXダイアグラムである。また、クリフォード符号の基本定理を示し、クリフォード符号の素分解の存在と一意性を証明する。次に、出力の置換と出力上の任意の局所演算を許す別の同値性の定義の下で、ZXダイアグラムの同値類を表にまとめる。これらの同値類の代表の候補を分析する。この研究は、ZXダイアグラム表現におけるQECCの標準形を探索する以前の研究を拡張するものである。

Quantum error-correcting codes (QECC's) are needed to combat the inherent noise affecting quantum processes. Using ZX calculus, we represent QECC's in a form called a ZX diagram, consisting of a tensor network. In this paper, we present canonical forms for CSS codes and CSS states (which are CSS codes with 0 inputs), and we show the resulting canonical forms for the toric code and certain surface codes. Next, we introduce the notion of prime code diagrams, ZX diagrams of codes that have a single connected component with the property that no sequence of rewrite rules can split such a diagram into two connected components. We also show the Fundamental Theorem of Clifford Codes, proving the existence and uniqueness of the prime decomposition of Clifford codes. Next, we tabulate equivalence classes of ZX diagrams under a different definition of equivalence that allows output permutations and any local operations on the outputs. Possible representatives of these equivalence classes are analyzed. This work expands on previous works in exploring the canonical forms of QECC's in their ZX diagram representations.

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# 推論と情報集約 : スポーツナラティブを用いた事例研究

When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives ( http://arxiv.org/abs/2406.12084v1 )

ライセンス: Link先を確認

Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Wenlin Yao, Hassan Foroosh, Dong Yu, Fei Liu,

(参考訳) LLMが関連情報を正確に集約するとき、推論は最も強力になる。LLM にスポーツ物語の分析を要求することで,推論における情報集約の重要な役割を検討する。このタスクを成功させるためには、LLMはアクションから得点を推測し、関連するエンティティを特定し、得点を選手やチームに正確に帰属させ、結論を導くために重要な統計をまとめる必要がある。我々はNBAの実際のバスケットボールデータを用いて包括的な実験を行い、ゲーム物語を合成する新しい手法であるSportsGenを提示する。データを合成することにより, 物語の長さや情報密度が異なる複雑なシナリオ下で, LLMの推論能力を厳密に評価できる。その結果, GPT-4oを含むほとんどのモデルは, 頻繁な得点パターンのため, バスケットボールの得点を正確に集計できないことが多いと判明した。 Llama-3のようなオープンソースのモデルは、さらに大きなスコアの幻覚に悩まされている。最後に、推論の有効性は、物語の複雑さ、情報密度、ドメイン固有の用語の影響を受けており、分析的推論タスクにおける課題を浮き彫りにしている。

Reasoning is most powerful when an LLM accurately aggregates relevant information. We examine the critical role of information aggregation in reasoning by requiring the LLM to analyze sports narratives. To succeed at this task, an LLM must infer points from actions, identify related entities, attribute points accurately to players and teams, and compile key statistics to draw conclusions. We conduct comprehensive experiments with real NBA basketball data and present SportsGen, a new method to synthesize game narratives. By synthesizing data, we can rigorously evaluate LLMs' reasoning capabilities under complex scenarios with varying narrative lengths and density of information. Our findings show that most models, including GPT-4o, often fail to accurately aggregate basketball scores due to frequent scoring patterns. Open-source models like Llama-3 further suffer from significant score hallucinations. Finally, the effectiveness of reasoning is influenced by narrative complexity, information density, and domain-specific terms, highlighting the challenges in analytical reasoning tasks.
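
The aggregation task the models are asked to perform has a simple deterministic reference, which is what makes it a useful probe. The sketch below shows such a reference scorer under stated assumptions: the play-by-play record format, the `aggregate_points` helper and the team and player names are hypothetical and are not taken from SportsGen.

```python
# Standard basketball scoring values for made shots.
POINTS_PER_MADE_SHOT = {"free throw": 1, "two-pointer": 2, "three-pointer": 3}


def aggregate_points(plays):
    """Sum points per team and per player from a list of play records.

    Each play is a dict with keys: team, player, shot, made.
    Missed shots contribute nothing.
    """
    team_totals = {}
    player_totals = {}
    for play in plays:
        if not play["made"]:
            continue
        points = POINTS_PER_MADE_SHOT[play["shot"]]
        team_totals[play["team"]] = team_totals.get(play["team"], 0) + points
        player_totals[play["player"]] = player_totals.get(play["player"], 0) + points
    return team_totals, player_totals


# Hypothetical play-by-play records for illustration.
sample_plays = [
    {"team": "Home", "player": "Player A", "shot": "three-pointer", "made": True},
    {"team": "Away", "player": "Player B", "shot": "two-pointer", "made": True},
    {"team": "Home", "player": "Player A", "shot": "free throw", "made": False},
    {"team": "Home", "player": "Player C", "shot": "two-pointer", "made": True},
    {"team": "Away", "player": "Player B", "shot": "free throw", "made": True},
]
```

Comparing a model's reported totals against a scorer like this one separates errors of inference (misreading an action) from errors of aggregation (losing track of a running sum).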

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# 最適量子線形系解法への近道

A shortcut to an optimal quantum linear system solver ( http://arxiv.org/abs/2406.12086v1 )

ライセンス: Link先を確認

Alexander M. Dalzell,

(参考訳) 線形方程式系 $A\boldsymbol{x}=\boldsymbol{b}$ が与えられたとき、量子線形系ソルバ (QLSS) は、振幅が解ベクトル $\boldsymbol{x}$ に比例する量子状態 $|\boldsymbol{x}\rangle$ を近似的に準備する。漸近的に最適なQLSSのクエリ複雑性は $O(\kappa \log(1/\varepsilon))$ であり、ここで $\kappa$ は $A$ の条件数、$\varepsilon$ は近似誤差である。しかしながら、既存の最適およびほぼ最適なQLSSの実行時間保証は、好ましい定数の前因子を持たない。これは一部には、可変時間振幅増幅や断熱経路追従のような、複雑あるいは解析が難しい手法に依存しているためである。ここでは、これらの手法を使用しない概念的にシンプルなQLSSを与える。解のノルム $\lVert\boldsymbol{x}\rVert$ が正確に知られている場合、我々の QLSS はカーネルリフレクション(先行研究の固有状態フィルタリング (EF) 手法の素直な拡張)を1回適用するだけでよく、QLSS のクエリ複雑性は $(1+O(\varepsilon))\kappa \ln(2\sqrt{2}/\varepsilon)$ である。ノルムが不明な場合、我々の手法は、カーネルプロジェクション(EFの直接的な一般化)を $O(\log\log(\kappa))$ 回適用することでノルムを定数倍の範囲で推定でき、ほぼ最適な総複雑性 $O(\kappa \log\log(\kappa)\log\log\log(\kappa)+\kappa\log(1/\varepsilon))$ を持つ素直なQLSSが得られる。あるいは、断熱経路追従手法の概念を再導入することにより、ノルム推定を $O(\kappa)$ の複雑性で達成でき、断熱定理を用いる必要を依然として避けつつ、$O(\kappa\log(1/\varepsilon))$ の複雑性を持つ最適な QLSS が得られることを示す。最後に、我々の最適QLSSの複雑性に対して、$56\kappa+1.05\kappa \ln(1/\varepsilon)+o(\kappa)$ という明示的な上限を計算する。

Given a linear system of equations $A\boldsymbol{x}=\boldsymbol{b}$, quantum linear system solvers (QLSSs) approximately prepare a quantum state $|\boldsymbol{x}\rangle$ for which the amplitudes are proportional to the solution vector $\boldsymbol{x}$. Asymptotically optimal QLSSs have query complexity $O(\kappa \log(1/\varepsilon))$, where $\kappa$ is the condition number of $A$, and $\varepsilon$ is the approximation error. However, runtime guarantees for existing optimal and near-optimal QLSSs do not have favorable constant prefactors, in part because they rely on complex or difficult-to-analyze techniques like variable-time amplitude amplification and adiabatic path-following. Here, we give a conceptually simple QLSS that does not use these techniques. If the solution norm $\lVert\boldsymbol{x}\rVert$ is known exactly, our QLSS requires only a single application of kernel reflection (a straightforward extension of the eigenstate filtering (EF) technique of previous work) and the query complexity of the QLSS is $(1+O(\varepsilon))\kappa \ln(2\sqrt{2}/\varepsilon)$. If the norm is unknown, our method allows it to be estimated up to a constant factor using $O(\log\log(\kappa))$ applications of kernel projection (a direct generalization of EF) yielding a straightforward QLSS with near-optimal $O(\kappa \log\log(\kappa)\log\log\log(\kappa)+\kappa\log(1/\varepsilon))$ total complexity. Alternatively, by reintroducing a concept from the adiabatic path-following technique, we show that $O(\kappa)$ complexity can be achieved for norm estimation, yielding an optimal QLSS with $O(\kappa\log(1/\varepsilon))$ complexity while still avoiding the need to invoke the adiabatic theorem. Finally, we compute an explicit upper bound of $56\kappa+1.05\kappa \ln(1/\varepsilon)+o(\kappa)$ for the complexity of our optimal QLSS.
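
To get a feel for the constants quoted above, the following sketch evaluates the two explicit expressions for sample values of $\kappa$ and $\varepsilon$. It is plain arithmetic on the formulas in the abstract, with two stated simplifications: the $(1+O(\varepsilon))$ factor is dropped in the known-norm case, and the $o(\kappa)$ term is dropped in the general bound. The function names are hypothetical.

```python
import math


def known_norm_leading_term(kappa, eps):
    """kappa * ln(2*sqrt(2)/eps), ignoring the (1 + O(eps)) prefactor."""
    return kappa * math.log(2.0 * math.sqrt(2.0) / eps)


def explicit_upper_bound(kappa, eps):
    """56*kappa + 1.05*kappa*ln(1/eps), ignoring the o(kappa) term."""
    return 56.0 * kappa + 1.05 * kappa * math.log(1.0 / eps)


def ratio(kappa, eps):
    """How much larger the general bound is than the known-norm leading term."""
    return explicit_upper_bound(kappa, eps) / known_norm_leading_term(kappa, eps)
```

For example, with `kappa=100` and `eps=0.01` the known-norm expression is roughly 564 queries while the general bound is roughly 6084. Under these simplifications, norm estimation dominates the cost at moderate precision.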

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# クリックスルーレート予測モデルのファインチューニングのための相互学習

Mutual Learning for Finetuning Click-Through Rate Prediction Models ( http://arxiv.org/abs/2406.12087v1 )

ライセンス: Link先を確認

Ibrahim Can Yilmaz, Said Aldemir,

(参考訳) クリックスルーレート(CTR)予測は、デジタル広告やオンラインショッピングといったデジタル産業において不可欠なタスクとなっている。多くのディープラーニングベースの手法が実装され、この分野における最先端のモデルとなっている。 CTRモデルの性能をさらに向上させるために、知識蒸留に基づくアプローチが広く用いられている。しかし、現在のCTR予測モデルのほとんどはそれほど複雑なアーキテクチャを持たないため、そのうちの1つを「重い(cumbersome)」、もう1つを「小さい(tiny)」と呼ぶのは難しい。一方、複雑なモデルと単純なモデルの間の性能差もそれほど大きくない。そのため、あるモデルから別のモデルへ知識を蒸留することは、労力に見合わない可能性がある。これらを考慮すると、相互学習は、すべてのモデルを相互に改善できるため、より良いアプローチになり得る。本稿では,対等なモデル間で行われる場合に,相互学習アルゴリズムがいかに有用であるかを示す。 CriteoデータセットとAvazuデータセットの実験では、相互学習アルゴリズムがモデルの性能を相対的に最大0.66%改善した。

Click-Through Rate (CTR) prediction has become an essential task in digital industries, such as digital advertising or online shopping. Many deep learning-based methods have been implemented and have become state-of-the-art models in the domain. To further improve the performance of CTR models, Knowledge Distillation based approaches have been widely used. However, most of the current CTR prediction models do not have very complex architectures, so it's hard to call one of them 'cumbersome' and the other one 'tiny'. On the other hand, the performance gap is also not very large between complex and simple models. So, distilling knowledge from one model to the other may not be worth the effort. Under these considerations, Mutual Learning could be a better approach, since all the models could be improved mutually. In this paper, we show how useful the mutual learning algorithm can be when it is between equals. In our experiments on the Criteo and Avazu datasets, the mutual learning algorithm improved the performance of the model by up to 0.66% (relative).
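
For a binary click label, a mutual learning objective between two peer models can be sketched as a supervised loss plus a term pulling each model toward its peer's prediction. The version below follows the usual deep mutual learning formulation (cross-entropy plus a KL term); whether the paper uses exactly this loss or weighting is not stated in the abstract, so `weight` and the function names are assumptions.

```python
import math

_EPS = 1e-12


def _clamp(p):
    """Keep probabilities away from 0 and 1 so the logarithms stay finite."""
    return min(max(p, _EPS), 1.0 - _EPS)


def bce(pred, label):
    """Binary cross-entropy between a predicted click probability and a 0/1 label."""
    pred = _clamp(pred)
    return -(label * math.log(pred) + (1 - label) * math.log(1.0 - pred))


def bernoulli_kl(p, q):
    """KL(Bernoulli(p) || Bernoulli(q))."""
    p, q = _clamp(p), _clamp(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def mutual_learning_losses(pred_a, pred_b, label, weight=1.0):
    """Per-example losses for two peers that learn from the label and each other."""
    loss_a = bce(pred_a, label) + weight * bernoulli_kl(pred_b, pred_a)
    loss_b = bce(pred_b, label) + weight * bernoulli_kl(pred_a, pred_b)
    return loss_a, loss_b
```

Unlike teacher-student distillation, neither model is frozen: both losses are minimized together, which fits the abstract's point that the CTR models involved are roughly equals. When the two predictions agree, the KL terms vanish and each loss reduces to ordinary cross-entropy.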

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# 多体量子幾何双極子

Many-Body Quantum Geometric Dipole ( http://arxiv.org/abs/2406.12089v1 )

ライセンス: Link先を確認

H. A. Fertig, Luis Brey,

(参考訳) 多体電子系の集合励起は内部構造を持ち、ヒルベルト空間の量子幾何学と結びついている。これは「量子幾何学的双極子」 (QGD) を持ち、これは本質的に状態に関連付けられた電気双極子モーメントである。この研究で、この性質は、単一粒子ホール状態の項で表される波動関数を必要としない、汎用的な方法で定式化できることを実証する。我々の定式化は、運動量${\bf K}$で連続的に進化する励起の枝に付随する密度行列を利用しており、そこからQGDの構築を可能にする単一粒子状態を取り出すことができる。 2つの量子ホール系の励起状態に対する単一モード近似を用いて定式化を行う: 1つは積分的に満たされたランダウレベル、もう1つは補充係数$\nu=1/m$の分数量子ホール状態で、$m$の奇数整数である。どちらの場合も QGD に対して同じ結果が得られるが、これは系が仮定する翻訳的不変性に起因する。本研究は,QGDが集合モードの固有特性であり,波動関数の近似を超越して有効であることを示す。

Collective excitations of many-body electron systems can carry internal structure, tied to the quantum geometry of the Hilbert space in which they are embedded. This has been shown explicitly for particle-hole-like excitations, which carry a ``quantum geometric dipole'' (QGD) that is essentially an electric dipole moment associated with the state. We demonstrate in this work that this property can be formulated in a generic way, which does not require wavefunctions expressed in terms of single particle-hole states. Our formulation exploits the density matrix associated with a branch of excitations that evolves continuously with its momentum ${\bf K}$, from which one may extract single-particle states allowing a construction of the QGD. We demonstrate the formulation using the single-mode approximation for excited states of two quantum Hall systems: the first for an integrally filled Landau level, and the second for a fractional quantum Hall state at filling factor $\nu=1/m$, with $m$ an odd integer. In both cases we obtain the same result for the QGD, which can be attributed to the translational invariance assumed of the system. Our study demonstrates that the QGD is an intrinsic property of collective modes which is valid beyond approximations one might make for their wavefunctions.

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# LLMアライメントに対する毒の脅威は本当にあるのか?

Is poisoning a real threat to LLM alignment? Maybe more so than you think ( http://arxiv.org/abs/2406.12091v1 )

ライセンス: Link先を確認

Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, Furong Huang,

(参考訳) 近年のRLHF(Reinforcement Learning with Human Feedback)は,Large Language Models(LLM)のアライメントに大きな影響を与えている。 PPO(Proximal Policy Optimization)のような強化学習アルゴリズムの感度は、RLHFを教師付き学習フレームワークとして扱うDPO(Direct Policy Optimization)の新たなラインワークにつながっている。これらのRLHF手法の実用性の向上は、その脆弱性の分析を保証している。本研究は,DPOの攻撃に対する脆弱性を異なるシナリオで調査し,第1種である嗜好中毒の有効性を比較した。 DPOの脆弱性は、バックドアや非バックドア攻撃、さまざまな言語モデル(LLama 7B, Mistral 7B, Gemma 7B)で網羅的に分析する。バックドア攻撃に関して、有害な行動を誘発するためには、少なくとも4\%のデータを汚染する必要があるPPOベースの手法とは違って、DPOの真の脆弱性をより簡単に活用することで、データの0.5\%でモデルに毒を与えることができる。脆弱性の背後にある潜在的な理由と、この脆弱性がバックドアと非バックドアの攻撃にどの程度うまく変換されるかをさらに調査する。

Recent advancements in Reinforcement Learning with Human Feedback (RLHF) have significantly impacted the alignment of Large Language Models (LLMs). The sensitivity of reinforcement learning algorithms such as Proximal Policy Optimization (PPO) has led to a new line of work on Direct Preference Optimization (DPO), which treats RLHF in a supervised learning framework. The increased practical use of these RLHF methods warrants an analysis of their vulnerabilities. In this work, we investigate the vulnerabilities of DPO to poisoning attacks under different scenarios and compare the effectiveness of preference poisoning, a first of its kind. We comprehensively analyze DPO's vulnerabilities under different types of attacks, i.e., backdoor and non-backdoor attacks, and different poisoning methods across a wide array of language models, i.e., Llama 7B, Mistral 7B, and Gemma 7B. We find that, unlike PPO-based methods, which require at least 4% of the data to be poisoned to elicit harmful behavior under backdoor attacks, DPO's vulnerabilities can be exploited more simply: poisoning as little as 0.5% of the data is enough. We further investigate the potential reasons behind the vulnerability and how well this vulnerability translates into backdoor vs non-backdoor attacks.
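The "supervised learning framework" mentioned above can be illustrated with the DPO objective as it is commonly stated: a logistic loss on the difference of policy-versus-reference log-probability ratios for the chosen and rejected responses. The sketch below is a per-example version under that assumption; the abstract itself does not spell out the loss, and `beta` is the usual temperature hyperparameter.

```python
import math


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * margin) for one preference pair."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return _softplus(-beta * margin)
```

Because the loss depends on the data only through which response is labeled as chosen, swapping the two labels in a training pair flips the sign of the margin, so the supervised signal for that pair points in the opposite direction.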

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# ユーザペルソナと潜在的ミスアライメントのメカニズム

Who's asking? User personas and the mechanics of latent misalignment ( http://arxiv.org/abs/2406.12094v1 )

ライセンス: Link先を確認

Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, Lucas Dixon,

(参考訳) モデル安全性の改善への投資にもかかわらず、安全に配慮したモデルでは、不整合性は相変わらず維持されていることが研究で示されている。この研究において、我々はこの現象の力学に光を当てた。まず、モデル世代が安全である場合でも、有害なコンテンツは隠された表現に留まり、以前のレイヤから復号することで抽出できることを示す。そして,モデルがそのようなコンテンツを拡散するか否かは,相手に対する認識に大きく依存していることを示し,これをユーザペルソナと呼ぶ。実際、ユーザペルソナの操作は、モデル拒絶を直接制御しようとする試みよりも有害なコンテンツを引き出すのに効果的であることがわかった。自然言語のプロンプトとアクティベーションステアリングの両方を制御法として検討し、アクティベーションステアリングが安全フィルタをバイパスするのに著しく有効であることを示す。特定のペルソナがモデルセーフガードを破る理由を調査し、そのモデルが危険なクエリのより慈善的な解釈を形成することを確認した。最後に, 操舵ベクトルの幾何学のみを考慮すれば, 拒絶に対するペルソナの影響を予測できることを示す。

Despite investments in improving model safety, studies show that misaligned capabilities remain latent in safety-tuned models. In this work, we shed light on the mechanics of this phenomenon. First, we show that even when model generations are safe, harmful content can persist in hidden representations and can be extracted by decoding from earlier layers. Then, we show that whether the model divulges such content depends significantly on its perception of who it is talking to, which we refer to as user persona. In fact, we find manipulating user persona to be even more effective for eliciting harmful content than direct attempts to control model refusal. We study both natural language prompting and activation steering as control methods and show that activation steering is significantly more effective at bypassing safety filters. We investigate why certain personas break model safeguards and find that they enable the model to form more charitable interpretations of otherwise dangerous queries. Finally, we show we can predict a persona's effect on refusal given only the geometry of its steering vector.

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# DistillNeRF: ニューラルネットワークと基礎モデル特徴の蒸留による単一視点画像からの3次元シーンの認識

DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features ( http://arxiv.org/abs/2406.12095v1 )

ライセンス: Link先を確認

Letian Wang, Seung Wook Kim, Jiawei Yang, Cunjun Yu, Boris Ivanovic, Steven L. Waslander, Yue Wang, Sanja Fidler, Marco Pavone, Peter Karkus,

(参考訳) 自律運転における2次元の限られた観察から3次元環境を理解することの難しさに対処する自己教師型学習フレームワークであるDistillNeRFを提案する。提案手法は,スパース,シングルフレームのマルチビューカメラ入力からリッチなニューラルシーン表現を予測する一般化可能なフィードフォワードモデルであり,RGB,深度,特徴画像の再構成のために,微分可能レンダリングを用いて自己教師を行う。我々の最初の洞察は、トレーニングのために深度と仮想カメラターゲットを生成することで、シーンごとの最適化されたニューラルラディアンスフィールド(NeRF)を活用することである。次に,CLIPやDINOv2のような事前訓練された2次元基礎モデルから特徴を抽出し,コストのかかる3次元アノテーションを必要とせずに,下流の様々なタスクを可能にすることを提案する。これら2つの知見を活用するために,2段階のリフト・スプラット・エンコーダとパラメータ化されたスパース階層のボクセル表現を用いた新しいモデルアーキテクチャを導入する。 NuScenesデータセットの実験結果によると、DistillNeRFはシーン再構成、新規ビュー合成、深度推定といった既存の自己監督手法よりも大幅に優れており、競争力のあるゼロショット3Dセマンティック占有率予測や、蒸留基礎モデルの特徴によるオープンワールドのシーン理解を可能にしている。デモとコードは https://distillnerf.github.io/ で公開される予定である。

We propose DistillNeRF, a self-supervised learning framework addressing the challenge of understanding 3D environments from limited 2D observations in autonomous driving. Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs, and is trained self-supervised with differentiable rendering to reconstruct RGB, depth, or feature images. Our first insight is to exploit per-scene optimized Neural Radiance Fields (NeRFs) by generating dense depth and virtual camera targets for training, thereby helping our model to learn 3D geometry from sparse non-overlapping image inputs. Second, to learn a semantically rich 3D representation, we propose distilling features from pre-trained 2D foundation models, such as CLIP or DINOv2, thereby enabling various downstream tasks without the need for costly 3D human annotations. To leverage these two insights, we introduce a novel model architecture with a two-stage lift-splat-shoot encoder and a parameterized sparse hierarchical voxel representation. Experimental results on the NuScenes dataset demonstrate that DistillNeRF significantly outperforms existing comparable self-supervised methods for scene reconstruction, novel view synthesis, and depth estimation; and it allows for competitive zero-shot 3D semantic occupancy prediction, as well as open-world scene understanding through distilled foundation model features. Demos and code will be available at https://distillnerf.github.io/.

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# 分布シフト下における軌道予測のための適応的不確実性定量化

Adaptive Uncertainty Quantification for Trajectory Prediction Under Distributional Shift ( http://arxiv.org/abs/2406.12100v1 )

ライセンス: Link先を確認

Huiqun Huang, Sihong He, Fei Miao,

(参考訳) 将来の有限軌跡とそれに伴う目標車両の不確実性の両方をオンライン環境で推測できる軌道予測モデル(例:現実世界のアプリケーションシナリオ)は、自律走行車の動きの安全で堅牢なナビゲーションと経路計画の確保に不可欠である。しかし,既存の軌道予測モデルの大部分は,トレーニング段階における不確実性を1つの目的として低減することや,潜在的分布シフト下での推論段階における確実な不確実性定量化を提供することは考えていない。そこで本研究では,既存の軌道予測モデルの予測軌道の不確かさを,予測精度の向上とトレーニング段階における予測不確かさの低減を考慮しながら定量的に定量化する,分散シフトフレームワークCUQDS(Conformal Uncertainty Quantification under Distribution Shift framework)を提案する。特にCUQDSは 1)学習に基づくガウス過程回帰モジュールで、ベースモデル(既存の軌道予測や時系列予測ニューラルネットワーク)の出力分布をモデル化し、損失項の追加による推定不確実性を低減する。 2) ガウス過程回帰モジュールから推定された不確かさを、トレーニングデータとテストデータ間の潜在的分散シフトの下でオンライン環境で校正する統計ベースのコンフォーマルP制御モジュール。

Trajectory prediction models that can infer both finite future trajectories and their associated uncertainties for target vehicles in an online setting (e.g., real-world application scenarios) are crucial for ensuring the safe and robust navigation and path planning of autonomous vehicles. However, the majority of existing trajectory prediction models have neither considered reducing the uncertainty as an objective during the training stage nor provided reliable uncertainty quantification during the inference stage under potential distribution shift. Therefore, in this paper, we propose the Conformal Uncertainty Quantification under Distribution Shift framework, CUQDS, to quantify the uncertainty of the predicted trajectories of existing trajectory prediction models under potential data distribution shift, while improving the prediction accuracy of the models and reducing the estimated uncertainty during the training stage. Specifically, CUQDS includes 1) a learning-based Gaussian process regression module that models the output distribution of the base model (any existing trajectory prediction or time series forecasting neural network) and reduces the estimated uncertainty through an additional loss term, and 2) a statistics-based Conformal P control module that calibrates the estimated uncertainty from the Gaussian process regression module in an online setting under potential distribution shift between training and testing data.
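The online calibration step can be pictured with a proportional (P) quantile-tracking update of the kind used in conformal control methods. The sketch below is an assumption about the general shape of such a module, not the paper's implementation: `scores` stands for hypothetical nonconformity scores (for example, absolute prediction errors), and `alpha`, `lr`, and `q_init` are illustrative parameters.

```python
def conformal_p_control(scores, alpha=0.1, lr=0.05, q_init=0.0):
    """Track a threshold q_t so that the long-run miss rate approaches alpha.

    At each step the interval of radius q_t is issued first, then the true
    score is observed and q is nudged up after a miss and down after a hit.
    """
    q = q_init
    thresholds, errors = [], []
    for s in scores:
        thresholds.append(q)
        err = 1.0 if s > q else 0.0  # 1 if the issued interval missed
        errors.append(err)
        q = q + lr * (err - alpha)   # proportional update toward target alpha
    return thresholds, errors
```

The update needs no assumption that training and test data share a distribution: if the scores drift upward, misses become frequent and the threshold grows until the miss rate returns to roughly `alpha`.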

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# 政策と実践の中心: 利用可能な差別的プライバシに関する研究ギャップ

Centering Policy and Practice: Research Gaps around Usable Differential Privacy ( http://arxiv.org/abs/2406.12103v1 )

ライセンス: Link先を確認

Rachel Cummings, Jayshree Sarathy,

(参考訳) 数学的に厳格なフレームワークであり、豊富な理論文献を蓄積しているため、多くの専門家は差分プライバシーをプライバシー保護データ分析のゴールドスタンダードとみなしている。差分プライバシーは理論上はクリーンな定式化であるが、実際は重大な課題を生じさせると主張する者もいる。どちらの視点も、私たちの見解では、有効で重要なものです。差分プライバシーの約束と現実世界のユーザビリティのギャップを埋めるために、研究者と実践者は協力してこの技術の政策と実践を進めなければならない。本稿では,ユーザニーズに合わせてリスクフレームワークを開発すること,利害関係者のコミュニケーションを調整すること,プライバシロスパラメータの影響をモデル化すること,効果的なユーザインターフェースに投資すること,ディファレンシャルプライバシシステムのアルゴリズム的および手続き的監査を容易にすること,など,有用なディファレンシャルプライバシ構築に向けたオープンな質問を概説する。

As a mathematically rigorous framework that has amassed a rich theoretical literature, differential privacy is considered by many experts to be the gold standard for privacy-preserving data analysis. Others argue that while differential privacy is a clean formulation in theory, it poses significant challenges in practice. Both perspectives are, in our view, valid and important. To bridge the gaps between differential privacy's promises and its real-world usability, researchers and practitioners must work together to advance policy and practice of this technology. In this paper, we outline pressing open questions towards building usable differential privacy and offer recommendations for the field, such as developing risk frameworks to align with user needs, tailoring communications for different stakeholders, modeling the impact of privacy-loss parameters, investing in effective user interfaces, and facilitating algorithmic and procedural audits of differential privacy systems.

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# 分析インテリジェンスエンジンにおけるエンドツーエンドのテキスト-SQL生成

End-to-end Text-to-SQL Generation within an Analytics Insight Engine ( http://arxiv.org/abs/2406.12104v1 )

ライセンス: Link先を確認

Karime Maamari, Amine Mhedhbi,

(参考訳) Text-to-SQLの最近の進歩は、データベース管理システムをデータアクセスの民主化をさらに進めている。今日の言語モデルは、これらの進歩の中核にある。 Distyl AIのAnalytics Insight Engineの開発で経験した、印象的なText-to-SQL生成を可能にする。エンタープライズ顧客との初期の展開は、3つの重要な課題を強調した。まず、データアナリストは、非常に複雑なSQLクエリのオーサリングのサポートを期待する。第二に、リクエストはアドホックで、低レイテンシを必要とする。最後に、生成にはドメイン固有の用語とプラクティスを理解する必要があります。大規模言語モデルを活用したText-to-SQL生成パイプラインの設計と実装は、これらの課題に対処します。このアプローチのコアテナントは、事前処理フェーズで抽出した外部知識、クエリ生成時に適切な外部知識を取得すること、階層的なCTEベースの構造に従ってSQLクエリ生成を分解することに依存しています。最後に、適応フレームワークはフィードバックを利用して外部知識を更新し、時間とともにクエリ生成を改善する。エンドツーエンドのアプローチの概要を説明し、推論中にSQLを生成するオペレータを強調します。

Recent advancements in Text-to-SQL have pushed database management systems towards greater democratization of data access. Today's language models are at the core of these advancements. They enable impressive Text-to-SQL generation, as experienced in the development of Distyl AI's Analytics Insight Engine. Its early deployment with enterprise customers has highlighted three core challenges. First, data analysts expect support with authoring SQL queries of very high complexity. Second, requests are ad hoc and, as such, require low latency. Finally, generation requires an understanding of domain-specific terminology and practices. The design and implementation of our Text-to-SQL generation pipeline, powered by large language models, tackles these challenges. The core tenets of our approach rely on external knowledge that we extract in a pre-processing phase, on retrieving the appropriate external knowledge at query generation time, and on decomposing SQL query generation following a hierarchical CTE-based structure. Finally, an adaptation framework leverages feedback to update the external knowledge, in turn improving query generation over time. We give an overview of our end-to-end approach and highlight the operators generating SQL during inference.
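As an illustration of what a hierarchical CTE-based structure looks like, the sketch below builds a query from two common table expressions, each depending on the previous one, and runs it with Python's built-in `sqlite3` module. The `orders` table, its columns, and the question being answered ("which customers spend more than the average customer?") are hypothetical and not taken from the paper.

```python
import sqlite3

QUERY = """
WITH customer_totals AS (      -- step 1: total spend per customer
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
),
average_total AS (             -- step 2: built on top of step 1
    SELECT AVG(total) AS avg_total
    FROM customer_totals
)
SELECT c.customer_id, c.total  -- final step: combine the building blocks
FROM customer_totals AS c, average_total AS a
WHERE c.total > a.avg_total
ORDER BY c.customer_id;
"""


def run_demo():
    """Create a tiny in-memory table and run the layered query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
        conn.executemany(
            "INSERT INTO orders VALUES (?, ?)",
            [(1, 10), (1, 30), (2, 5), (3, 100)],
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

Each CTE is a small, separately checkable unit, which is one plausible reason a generation pipeline would prefer this shape over a single deeply nested query.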

翻訳日:2024-06-19 23:47:35 公開日:2024-06-17

# モノリシック量子プロセッサ用22nmFDSOICMOSにおける低温小型ミリ波広帯域SPSTスイッチ

Cryogenic Compact mm-Wave Broadband SPST Switch in 22nm FDSOI CMOS for Monolithic Quantum Processors ( http://arxiv.org/abs/2406.12105v1 )

ライセンス: Link先を確認

T. D. Nhut, S. Bonen, G. Cooke, T. Jager, M. Spasaro, D. Sufra, S. P. Voinigescu, D. Zito,

(参考訳) 本稿では,22nmFDSOICMOS技術を用いた小型ミリ波ブロードバンド単極スイッチ(SPST)の低温特性について報告する。スイッチは2つのn-MOSFETと、基板寄生効果を低減する特別な装置オプションと、分離を改善する第3のn-MOSFETで構成されている。従来の広帯域mm波スイッチとは異なり、大きな受動部品は必要とせず、非常にコンパクトな設計、低損失、高アイソレーション性能を実現している。 2Kでの低温測定では、2.3dB未満の挿入損失、25.3dB未満の分離損失、および11.5dB以下の戻り損失が、DCから70GHzまでの全周波数範囲で示される。

This paper reports the experimental characterization at cryogenic temperature of a compact mm-wave broadband single-pole single-throw (SPST) switch in 22 nm FDSOI CMOS technology. The switch consists of two n-MOSFETs with a special device option to reduce the substrate parasitic effects, and a third n-MOSFET to improve isolation. Unlike prior wideband mm-wave switches, it does not require any large passive components, allowing a very compact design with low loss and high isolation. The cryogenic measurements at 2 K show an insertion loss lower than 2.3 dB, an isolation better than 25.3 dB, and a return loss better than -11.5 dB over the entire frequency range from DC to 70 GHz.
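The decibel figures above are easier to interpret as power fractions. The short sketch below applies the standard conversion, ratio = 10^(dB/10), to the reported insertion loss and isolation; the function names are illustrative.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a decibel value to a linear power ratio."""
    return 10.0 ** (db / 10.0)


def transmitted_fraction(loss_db: float) -> float:
    """Fraction of input power that passes through a loss of loss_db decibels."""
    return db_to_power_ratio(-loss_db)


if __name__ == "__main__":
    # Switch on: at most 2.3 dB insertion loss; switch off: at least 25.3 dB isolation.
    print(transmitted_fraction(2.3))
    print(transmitted_fraction(25.3))
```

Read this way, the on-state switch passes somewhat more than half of the incident power, while the off-state switch leaks well under one percent.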

翻訳日:2024-06-19 23:37:51 公開日:2024-06-17

# 生命科学におけるコンピューティング - 初期のアルゴリズムから現代AIへ

Computing in the Life Sciences: From Early Algorithms to Modern AI ( http://arxiv.org/abs/2406.12108v1 )

ライセンス: Link先を確認

Samuel A. Donkor, Matthew E. Walsh, Alexander J. Titus,

(参考訳) 生命科学におけるコンピューティングは、1950年代の初期の計算モデルから、現在見られる人工知能(AI)と機械学習(ML)の応用まで、変革的な進化を遂げてきた。本稿では,生命科学におけるコンピューティングの歴史的発展を通じて,重要なマイルストーンと技術進歩を強調した。この議論には、生物学的プロセスの計算モデルの導入、バイオインフォマティクスツールの出現、現代の生命科学研究におけるAI/MLの統合が含まれる。科学的な大規模言語モデルやバイオAIツールなど、生命科学で使用されるAI対応ツールに注意が向けられ、その能力、限界、生物学的リスクへの影響を調べる。本研究は,諸分野における情報的意思決定と効果的なコミュニケーションを確保するために,本質的な用語と概念を明確にし,確立することを目的とする。

Computing in the life sciences has undergone a transformative evolution, from early computational models in the 1950s to the applications of artificial intelligence (AI) and machine learning (ML) seen today. This paper highlights key milestones and technological advancements in the historical development of computing in the life sciences. The discussion includes the inception of computational models for biological processes, the advent of bioinformatics tools, and the integration of AI/ML in modern life sciences research. Attention is given to AI-enabled tools used in the life sciences, such as scientific large language models and bio-AI tools, examining their capabilities, limitations, and impact on biological risk. This paper seeks to clarify and establish essential terminology and concepts to ensure informed decision-making and effective communication across disciplines.

翻訳日:2024-06-19 23:37:51 公開日:2024-06-17

# LLMはソーシャルメディアからマクロ経済学的ナラティブを学べるか?

Can LLMs Learn Macroeconomic Narratives from Social Media? ( http://arxiv.org/abs/2406.12109v1 )

ライセンス: Link先を確認

Almog Gueta, Amir Feder, Zorik Gekhman, Ariel Goldstein, Roi Reichart,

(参考訳) この研究は実証的に$\textit{Narrative Economics}$仮説を検証し、物語(広く広まり、大衆の信念に影響を及ぼすイデア)が経済変動に影響を与えることを示唆している。我々は,X(旧Twitter)からの投稿を含む2つのキュレートされたデータセットを紹介した。自然言語処理(NLP)手法を用いて,ツイートからナラティブを抽出し,要約する。我々は、これらの予測力を、ツイートを組み込んだ$\textit{macroeconomic}$予測や、抽出したナラティブの表現を下流の財務予測タスクに組み込むことでテストする。我々の研究は、物語データを用いてマクロ経済モデルを改善する上での課題を強調し、研究コミュニティがこの重要な課題に現実的に対処する道を開く。学術的な観点から,我々は,Large Language Models (LLMs) を用いた物語抽出と要約のための貴重な洞察とNLPツールを提供し,今後の経済学における物語の役割に関する研究に寄与する。

This study empirically tests the $\textit{Narrative Economics}$ hypothesis, which posits that narratives (ideas that are spread virally and affect public beliefs) can influence economic fluctuations. We introduce two curated datasets containing posts from X (formerly Twitter) which capture economy-related narratives (Data will be shared upon paper acceptance). Employing Natural Language Processing (NLP) methods, we extract and summarize narratives from the tweets. We test their predictive power for $\textit{macroeconomic}$ forecasting by incorporating the tweets' or the extracted narratives' representations in downstream financial prediction tasks. Our work highlights the challenges in improving macroeconomic models with narrative data, paving the way for the research community to realistically address this important challenge. From a scientific perspective, our investigation offers valuable insights and NLP tools for narrative extraction and summarization using Large Language Models (LLMs), contributing to future research on the role of narratives in economics.

翻訳日:2024-06-19 23:37:50 公開日:2024-06-17

# Cancellable Memory Requests: 透過的で軽量なSpectre緩和

Cancellable Memory Requests: A transparent, lightweight Spectre mitigation ( http://arxiv.org/abs/2406.12110v1 )

ライセンス: Link先を確認

Hossam ElAtali, N. Asokan,

(参考訳) 推論はCPUのパフォーマンス向上に基本的だが、Spectre攻撃のような脆弱性を可能にする。これらの攻撃は一般的に3つのステップで展開される: 機密データ(アクセス)に投機的にアクセスし、キャッシュ状態(送信)を変更し、キャッシュタイミングアタック(例えば Flush+Reload)を使用してシークレット(受信)を抽出する。多くのSpectre攻撃は、送信および受信ステップ中にキャッシュタイミング側チャネルを利用する。我々のキーとなる観察は、誤予測が検出され、誤特定命令がスクアッシュされる前に、送信命令を完了させる必要がないことである。代わりに、命令がメモリ階層に要求を実行し、ディスパッチするのに十分である。スカッシング後にやってくるメモリからの応答は、誤って特定されたメモリアクセスに関連するものを含むキャッシュ状態を変化させる。そこで我々はCMR(Cancellable Memory Requests)という,不特定メモリ要求をキャンセルする新しい緩和手法を提案する。スキャッシングの直後に、キャンセルがキャッシュ階層に送信され、下流を伝播し、まだ応答を受けていないキャッシュの変更を防止する。これにより、キャッシュ状態が変更される可能性が低下し、Spectre攻撃が成功する可能性が低下する。 gem5 上で CMR を実装し,実際の Spectre 攻撃を阻止し,性能上のオーバーヘッドがほぼゼロに近いことを示す。我々は,現実的なシステム構成を持つ4つの実世界のプロセッサにおいて,CMRがSpectre攻撃を完全に阻止できることを示す。

Speculation is fundamental to achieving high CPU performance, yet it enables vulnerabilities such as Spectre attacks, which remain a significant challenge to mitigate without incurring substantial performance overheads. These attacks typically unfold in three steps: they speculatively access sensitive data (access), alter the cache state (transmit), and then utilize a cache timing attack (e.g., Flush+Reload) to extract the secret (receive). Most Spectre attacks exploit a cache timing side channel during the transmit and receive steps. Our key observation is that Spectre attacks do not require the transmit instruction to complete before mis-prediction is detected and mis-speculated instructions are squashed. Instead, it suffices for the instruction to execute and dispatch a request to the memory hierarchy. Responses from memory that arrive after squashing occurs still alter the cache state, including those related to mis-speculated memory accesses. We therefore propose a novel mitigation technique, Cancellable Memory Requests (CMR), that cancels mis-speculated memory requests. Immediately upon squashing, a cancellation is sent to the cache hierarchy, propagating downstream and preventing any changes to caches that have not yet received a response. This reduces the likelihood of cache state changes, thereby reducing the likelihood of Spectre attacks succeeding. We implement CMR on gem5 and show that it thwarts practical Spectre attacks, and has near-zero performance overheads. We show that CMR can completely thwart Spectre attacks in four real-world processors with realistic system configurations.
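The key observation can be mimicked with a toy timing model. The sketch below is not the gem5 implementation: it represents each memory request by a hypothetical tuple of address, issue time, latency, and a flag saying whether it was mis-speculated, and it assumes a cancellation takes effect instantly at squash time.

```python
def simulate(requests, squash_time, cancellable):
    """Return the set of addresses that end up filled into the cache.

    requests: iterable of (address, issue_time, latency, mis_speculated).
    """
    cache = set()
    for addr, issue_time, latency, mis_speculated in requests:
        arrival = issue_time + latency
        if cancellable and mis_speculated and arrival > squash_time:
            # The response was still in flight when the squash happened,
            # so the request is cancelled and never changes the cache.
            continue
        cache.add(addr)
    return cache


REQUESTS = [
    ("normal_line", 0, 100, False),          # correct-path access
    ("fast_hit", 2, 3, True),                # mis-speculated, returns before squash
    ("secret_dependent_line", 5, 100, True)  # mis-speculated, still in flight
]
```

With `squash_time=10`, the baseline fills all three lines, whereas the cancellable variant leaves out `secret_dependent_line`. The `fast_hit` case shows why the abstract speaks of reducing the likelihood of cache state changes: a response that returns before the squash cannot be cancelled.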

翻訳日:2024-06-19 23:37:50 公開日:2024-06-17

# グラフニューラルネットワークを用いた粗粒力場の熱力学的伝達性

Thermodynamic Transferability in Coarse-Grained Force Fields using Graph Neural Networks ( http://arxiv.org/abs/2406.12112v1 )

ライセンス: Link先を確認

Emily Shinkle, Aleksandra Pachalieva, Riti Bahl, Sakib Matin, Brendan Gifford, Galen T. Craven, Nicholas Lubbers,

(参考訳) 粗粒化(英: coarse-graining)とは、原子論的な系が単純化された方法で表現され、ターゲットの出力に寄与する最も重要な系の特徴を保ちながら、関連性の低い自由度を除去する分子モデリング技術である。このモデル複雑性の低減により、粗粒度分子シミュレーションは、対応する全原子モデルと比較して空間的および時間的スケールが増大する。粗粒化における中核的な課題は、原子レベルの特性を保持する方法で、新しい表現における相互作用を表現する力場を構築することである。粗大きめの力場を構築するための多くのアプローチは、特定の熱力学状態点における内部のゆらぎに対する平均化の結果、異なる熱力学条件の間での伝達性に制限がある。本稿では,階層的相互作用粒子ニューラルネットとテンソル感度(HIP-NN-TS)のグラフ畳み込みニューラルネットワークアーキテクチャを用いて,力マッチング手法に基づく粗粒度モデルの伝達可能性の研究を可能にする,粗粒度フィールドのための高度に自動化されたトレーニングパイプラインを開発する。このアプローチは高い精度の力場を得るだけでなく、これらの力場は様々な熱力学条件を通してより伝達可能であることを示す。これらの結果は、伝達可能な粗い力場の構築を改善するため、グラフニューラルネットワークのような機械学習技術の可能性を示している。

Coarse-graining is a molecular modeling technique in which an atomistic system is represented in a simplified fashion that retains the most significant system features that contribute to a target output, while removing the degrees of freedom that are less relevant. This reduction in model complexity allows coarse-grained molecular simulations to reach larger spatial and temporal scales than the corresponding all-atom models. A core challenge in coarse-graining is to construct a force field that represents the interactions in the new representation in a way that preserves the atomistic-level properties. Many approaches to building coarse-grained force fields have limited transferability between different thermodynamic conditions as a result of averaging over internal fluctuations at a specific thermodynamic state point. Here, we use a graph-convolutional neural network architecture, the Hierarchically Interacting Particle Neural Network with Tensor Sensitivity (HIP-NN-TS), to develop a highly automated training pipeline for coarse-grained force fields, which allows for studying the transferability of coarse-grained models based on the force-matching approach. We show that this approach not only yields highly accurate force fields, but also that these force fields are more transferable across a variety of thermodynamic conditions. These results illustrate the potential of machine learning techniques such as graph neural networks to improve the construction of transferable coarse-grained force fields.
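The force-matching idea can be sketched in a few lines: atomistic forces are mapped onto coarse-grained beads, and the model is trained to reproduce them in a least-squares sense. The mapping below (summing the forces of all atoms assigned to a bead) is a common choice and an assumption here; the abstract does not specify the mapping, and the function names are illustrative.

```python
def map_forces(atom_forces, bead_assignment, n_beads):
    """Sum the 3D forces of all atoms assigned to each bead."""
    bead_forces = [[0.0, 0.0, 0.0] for _ in range(n_beads)]
    for force, bead in zip(atom_forces, bead_assignment):
        for k in range(3):
            bead_forces[bead][k] += force[k]
    return bead_forces


def force_matching_loss(predicted, target):
    """Mean squared error over all bead force components."""
    total, count = 0.0, 0
    for f_pred, f_true in zip(predicted, target):
        for a, b in zip(f_pred, f_true):
            total += (a - b) ** 2
            count += 1
    return total / count
```

In a training pipeline, `predicted` would come from the learned coarse-grained force field evaluated on bead positions, and the loss would be averaged over many atomistic snapshots.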

翻訳日:2024-06-19 23:37:50 公開日:2024-06-17

# LLM駆動型アクティブラーニングと人間アノテーションによるテキスト分類の強化

Enhancing Text Classification through LLM-Driven Active Learning and Human Annotation ( http://arxiv.org/abs/2406.12114v1 )

ライセンス: Link先を確認

Hamidreza Rouzegar, Masoud Makrehchi,

(参考訳) テキスト分類の文脈では、トレーニングデータを作成するためのアノテーション演習の金銭的負担が重要な問題である。アクティブラーニング技術、特に不確実性サンプリングに根ざした手法は、手動アノテーションの最も指導的なサンプルをピンポイントすることで、コスト効率の良いソリューションを提供する。同様に、GPT-3.5のようなLarge Language Models (LLM) は自動アノテーションの代替を提供するが、その信頼性に関する懸念がある。本研究では,人間のアノテータとLLMをアクティブラーニングフレームワークに統合する新しい手法を提案する。 3つの公開データセットの評価を行った。 IMDB, 信頼度識別のためのFake Newsデータセット, マルチラベル分類のためのMovie Genresデータセット, 提案フレームワークは, モデル不確実性レベルに応じて, 人間のアノテーションとLCMの出力を統合する。この戦略は、コスト効率と分類性能の最適バランスを達成する。実験結果から, モデル精度の維持・改善を図りながら, データアノテーションに関連するコストを大幅に削減した。

In the context of text classification, the financial burden of annotation exercises for creating training data is a critical issue. Active learning techniques, particularly those rooted in uncertainty sampling, offer a cost-effective solution by pinpointing the most instructive samples for manual annotation. Similarly, Large Language Models (LLMs) such as GPT-3.5 provide an alternative for automated annotation but come with concerns regarding their reliability. This study introduces a novel methodology that integrates human annotators and LLMs within an Active Learning framework. We conducted evaluations on three public datasets: IMDB for sentiment analysis, a Fake News dataset for authenticity discernment, and a Movie Genres dataset for multi-label classification. The proposed framework integrates human annotation with the output of LLMs, depending on the model uncertainty levels. This strategy achieves an optimal balance between cost efficiency and classification performance. The empirical results show a substantial decrease in the costs associated with data annotation while either maintaining or improving model accuracy.
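One way to picture uncertainty-based routing is the sketch below: the most uncertain unlabeled samples are selected by predictive entropy, and each selected sample goes to a human annotator or to an LLM depending on a threshold. The threshold rule, the `budget` parameter, and the use of classifier probabilities as the uncertainty signal are assumptions made for illustration; the abstract does not give the exact criterion.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def route_annotations(pool, budget, human_threshold):
    """Select the `budget` most uncertain samples and split them by uncertainty.

    pool: list of (sample_id, class_probabilities) from the current classifier.
    Returns (ids sent to human annotators, ids sent to the LLM).
    """
    ranked = sorted(pool, key=lambda item: entropy(item[1]), reverse=True)[:budget]
    to_human = [sid for sid, probs in ranked if entropy(probs) >= human_threshold]
    to_llm = [sid for sid, probs in ranked if entropy(probs) < human_threshold]
    return to_human, to_llm
```

Lowering `human_threshold` shifts more samples to human annotators, trading higher annotation cost for more reliable labels.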

翻訳日:2024-06-19 23:37:50 公開日:2024-06-17

# モノリシックシリコン量子プロセッサにおけるスピン量子制御のための低温小型低消費電力60GHz増幅器

Cryogenic Compact Low-Power 60GHz Amplifier for Spin Qubit Control in Monolithic Silicon Quantum Processors ( http://arxiv.org/abs/2406.12115v1 )

ライセンス: Link先を確認

M. Spasaro, S. Bonen, G. Cooke, T. Jager, T. D. Nhut, D. Sufra, S. P. Voinigescu, D. Zito,

(参考訳) 本稿では,モノリシックSi量子プロセッサのための基本構造ブロックとして,電子/ホールスピン量子ビット制御のための低温小型低消費電力60GHz増幅器の設計と評価を報告する。 2Kで試験され、15dBのS21を59 GHzで、BW3dBの52.5-67.5 GHzで、消費電力は2.16 mWである。インダクタレスアクティブネットワークを持つトポロジーのため、増幅器は0.18 x 0.19 mm2のコンパクトコア領域を有する。

This paper reports the design and experimental characterization of a cryogenic compact low-power 60 GHz amplifier for control of electron/hole spin qubits, as an elementary building block for monolithic Si quantum processors. Tested at 2 K, the amplifier exhibits an S21 of 15 dB at 59 GHz, a 3 dB bandwidth (BW3dB) of 52.5-67.5 GHz, and a power consumption of 2.16 mW. Owing to its topology with an inductorless active network, the amplifier has a compact core area of 0.18 x 0.19 mm².

翻訳日:2024-06-19 23:37:50 公開日:2024-06-17

# Decoding the Narratives: Redditで共有された個人薬物体験の分析

Decoding the Narratives: Analyzing Personal Drug Experiences Shared on Reddit ( http://arxiv.org/abs/2406.12117v1 )

ライセンス: Link先を確認

Layla Bouzoubaa, Elham Aghakhani, Max Song, Minh Trinh, Rezvaneh Rezapour,

(参考訳) 薬物関連のサブレディット(英語版)のようなオンラインコミュニティは、薬物の使用経験、有害度低減、中毒回復に関する議論を促進するため、薬物使用者(PWUD)にとって安全な場所として機能している。これらのフォーラムで利用者が共有する物語は、物質使用障害(SUD)と回復可能性(Recovery potential)を発達させる可能性についての洞察を提供する。本研究は,物質利用経験に関するオンラインユーザ生成テキストを解析するための多レベル多ラベル分類モデルの構築を目的とする。この目的のために,我々はまず,意図した関係(問い合わせや開示),主題(例えば,回収,依存),特定の目的(例えば,再発,品質,安全)など,ポストの性質を評価する新しい分類法を導入する。注釈付きデータの集合上で様々なマルチラベル分類アルゴリズムを用いて、GPT-4が命令、定義、例によって誘導された場合、他の全てのモデルよりも優れていたことを示す。本モデルを用いて,1000以上の投稿をラベル付けし,各クラスにおける投稿内で使用される言語表現のカテゴリを解析する。本分析では, 安全性, 物質の組み合わせ, メンタルヘルスなどのトピックが開示されやすくなり, 生理的効果の議論は害軽減に焦点が当てられている。我々の研究は、PWUDの経験の理解を深め、SUDと薬物使用に関する幅広い知識基盤を伝えます。

Online communities such as drug-related subreddits serve as safe spaces for people who use drugs (PWUD), fostering discussions on substance use experiences, harm reduction, and addiction recovery. Users' shared narratives on these forums provide insights into the likelihood of developing a substance use disorder (SUD) and recovery potential. Our study aims to develop a multi-level, multi-label classification model to analyze online user-generated texts about substance use experiences. For this purpose, we first introduce a novel taxonomy to assess the nature of posts, including their intended connections (Inquisition or Disclosure), subjects (e.g., Recovery, Dependency), and specific objectives (e.g., Relapse, Quality, Safety). Using various multi-label classification algorithms on a set of annotated data, we show that GPT-4, when prompted with instructions, definitions, and examples, outperformed all other models. We apply this model to label an additional 1,000 posts and analyze the categories of linguistic expression used within posts in each class. Our analysis shows that topics such as Safety, Combination of Substances, and Mental Health see more disclosure, while discussions about physiological Effects focus on harm reduction. Our work enriches the understanding of PWUD's experiences and informs the broader knowledge base on SUD and drug use.

翻訳日:2024-06-19 23:37:50 公開日:2024-06-17

# リアルタイム大規模交通ネットワークにおけるハリケーン避難時の効率的な管理のためのスケーラブルな交通予測モデルの構築

Deploying scalable traffic prediction models for efficient management in real-world large transportation networks during hurricane evacuations ( http://arxiv.org/abs/2406.12119v1 )

ライセンス: Link先を確認

Qinhua Jiang, Brian Yueshuai He, Changju Lee, Jiaqi Ma,

(参考訳) ハリケーン避難時の交通管理には正確な交通予測が不可欠である。本稿では,MLP(Multilayer Perceptron)モデルとLSTM(Long-Short Term Memory)モデルを統合した予測モデリングシステムを提案する。収集された交通データ,時空間道路網情報,ハリケーン予測データなど,さまざまな入力変数を活用することで,異種人の行動,限られた避難データ,ハリケーンイベントの不確実性といった課題に対処する。ルイジアナ州の現実の交通予測システムで展開されたこのモデルは、7日間のハリケーンの影響を受けた期間に6時間にわたって長期の渋滞状態を予測する精度を82%達成した。短期速度予測モデルでは1時間から6時間にわたる避難地平地を7%から13%の範囲で平均絶対パーセンテージ誤差(MAPEs)を示した。評価結果は、ハリケーン避難時の交通管理を強化するモデルの可能性を強調し、実際の展開は、広範な交通ネットワーク内の多様なハリケーンシナリオにおける適応性とスケーラビリティを強調している。

Accurate traffic prediction is vital for effective traffic management during hurricane evacuations. This paper proposes a predictive modeling system that integrates Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM) models to capture both long-term congestion patterns and short-term speed patterns. Leveraging various input variables, including archived traffic data, spatial-temporal road network information, and hurricane forecast data, the framework is designed to address challenges posed by heterogeneous human behaviors, limited evacuation data, and hurricane event uncertainties. Deployed in a real-world traffic prediction system in Louisiana, the model achieved 82% accuracy in predicting long-term congestion states over a 6-hour period during a 7-day hurricane-impacted duration. The short-term speed prediction model exhibited Mean Absolute Percentage Errors (MAPEs) ranging from 7% to 13% across evacuation horizons from 1 to 6 hours. Evaluation results underscore the model's potential to enhance traffic management during hurricane evacuations, and real-world deployment highlights its adaptability and scalability in diverse hurricane scenarios within extensive transportation networks.
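For readers unfamiliar with the metric, the Mean Absolute Percentage Error reported above is the average of |actual - predicted| / |actual|, expressed as a percentage. The sketch below computes it for hypothetical speed values; it assumes every observed value is non-zero.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    observed_speeds = [50.0, 60.0, 40.0]   # hypothetical observed speeds
    forecast_speeds = [45.0, 66.0, 40.0]   # hypothetical model forecasts
    print(mape(observed_speeds, forecast_speeds))
```

A MAPE of 7% to 13% therefore means that, on average, forecast speeds deviate from observed speeds by roughly a tenth of the observed value.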

翻訳日:2024-06-19 23:37:50 公開日:2024-06-17

# 強化学習を伴う拡散モデルへの条件制御の追加

Adding Conditional Control to Diffusion Models with Reinforcement Learning ( http://arxiv.org/abs/2406.12120v1 )

ライセンス: Link先を確認

Yulai Zhao, Masatoshi Uehara, Gabriele Scalia, Tommaso Biancalani, Sergey Levine, Ehsan Hajiramezanali,

(参考訳) 拡散モデルは、生成されたサンプルの特性を正確に制御できる強力な生成モデルである。大規模なデータセットでトレーニングされたこれらの拡散モデルは成功したが、下流の微調整プロセスに新たな制御を導入し、これらの強力なモデルを事前訓練された拡散モデルとして扱う必要があることが多い。本研究は、入力と対応するラベルからなるオフラインデータセットを活用することを目的として、強化学習(RL)に基づく新たな制御手法を提案する。我々は、このタスクをRL問題として定式化し、オフラインデータセットから学習した分類器と、報酬関数として機能する事前訓練されたモデルに対するKLの発散について述べる。我々は、上記の報酬関数を最大化するソフト最適化ポリシーを生成する、$\textbf{CTRL}$ ($\textbf{C}$onditioning pre-$\textbf{T}$rained diffusion models with $\textbf{R}$einforcement $\textbf{L}$earning) を導入する。我々は,提案手法が推論中に追加制御で条件付き分布からサンプリングできることを正式に証明した。我々のRLベースのアプローチは、既存の方法よりもいくつかのアドバンテージを提供します。一般的な分類器フリーガイダンスと比較して,本手法はサンプル効率を向上し,入力と追加制御の条件付き独立性を利用してオフラインデータセット構築を大幅に単純化することができる。さらに、分類器のガイダンスとは異なり、中間状態から追加制御への分類器の訓練は不要である。

Diffusion models are powerful generative models that allow for precise control over the characteristics of the generated samples. While these diffusion models trained on large datasets have achieved success, there is often a need to introduce additional controls in downstream fine-tuning processes, treating these powerful models as pre-trained diffusion models. This work presents a novel method based on reinforcement learning (RL) to add additional controls, leveraging an offline dataset comprising inputs and corresponding labels. We formulate this task as an RL problem, with the classifier learned from the offline dataset and the KL divergence against pre-trained models serving as the reward functions. We introduce our method, $\textbf{CTRL}$ ($\textbf{C}$onditioning pre-$\textbf{T}$rained diffusion models with $\textbf{R}$einforcement $\textbf{L}$earning), which produces soft-optimal policies that maximize the abovementioned reward functions. We formally demonstrate that our method enables sampling from the conditional distribution conditioned on additional controls during inference. Our RL-based approach offers several advantages over existing methods. Compared to commonly used classifier-free guidance, our approach improves sample efficiency, and can greatly simplify offline dataset construction by exploiting conditional independence between the inputs and additional controls. Furthermore, unlike classifier guidance, we avoid the need to train classifiers from intermediate states to additional controls.

翻訳日:2024-06-19 23:37:50 公開日:2024-06-17

# TutteNet:2次元メッシュ変形の構成によるインジェクティブ3次元変形

TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations ( http://arxiv.org/abs/2406.12121v1 )

ライセンス: Link先を確認

Bo Sun, Thibault Groueix, Chen Song, Qixing Huang, Noam Aigerman,

(参考訳) 本研究は、3次元空間の単射(インジェクティブ)変形の新たな表現法を提案する。これは、不正確さ、頑健さの欠如、一般的な学習および最適化フレームワークとの非互換性といった、既存の単射的手法の限界を克服するものである。中心となる考え方は、問題を複数の2Dメッシュベースの区分線形写像の深い合成に還元することである。すなわち、Tutte埋め込み(2次元では単射であることが保証される)によってメッシュ変形を生成する微分可能な層を構築し、これらの層を異なる平面上で合成することで、3次元体積の複雑な3次元単射変形を生成する。提案手法は, 複雑な変形を効率よく正確に最適化・学習でき, 他の単射的アプローチよりも優れていることを示す。主な応用として、複雑でアーティファクトのないNeRFおよびSDF変形を生成する。

This work proposes a novel representation of injective deformations of 3D space, which overcomes existing limitations of injective methods: inaccuracy, lack of robustness, and incompatibility with general learning and optimization frameworks. The core idea is to reduce the problem to a deep composition of multiple 2D mesh-based piecewise-linear maps. Namely, we build differentiable layers that produce mesh deformations through Tutte's embedding (guaranteed to be injective in 2D), and compose these layers over different planes to create complex 3D injective deformations of the 3D volume. We show our method provides the ability to efficiently and accurately optimize and learn complex deformations, outperforming other injective approaches. As a main application, we produce complex and artifact-free NeRF and SDF deformations.
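
As a rough illustration of the 2D building block named above, the following sketch computes a Tutte embedding with uniform weights: boundary vertices are pinned to a convex polygon and each interior vertex is moved to the average of its neighbors. The toy mesh, the function names, and the Gauss-Seidel solver are all assumptions made for illustration; the abstract does not describe the authors' differentiable layers, and this is not their implementation.

```python
def tutte_embedding(neighbors, boundary_pos, iters=500):
    """Uniform-weight Tutte embedding solved by Gauss-Seidel sweeps.

    neighbors: dict vertex -> list of adjacent vertices
    boundary_pos: dict boundary vertex -> (x, y) on a convex polygon
    """
    pos = dict(boundary_pos)
    interior = [v for v in neighbors if v not in boundary_pos]
    for v in interior:
        pos[v] = (0.0, 0.0)  # arbitrary start; the solution does not depend on it
    for _ in range(iters):
        for v in interior:
            xs = [pos[u][0] for u in neighbors[v]]
            ys = [pos[u][1] for u in neighbors[v]]
            pos[v] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return pos


def signed_area(pos, face):
    """Twice the signed area of a triangle; positive means counter-clockwise."""
    (ax, ay), (bx, by), (cx, cy) = (pos[v] for v in face)
    return (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)


# Hypothetical toy mesh: a unit-square boundary (0..3) and two interior vertices (4, 5).
neighbors = {
    0: [1, 3, 4],
    1: [0, 2, 4, 5],
    2: [1, 3, 5],
    3: [0, 2, 4, 5],
    4: [0, 1, 3, 5],
    5: [1, 2, 3, 4],
}
faces = [(0, 1, 4), (0, 4, 3), (1, 5, 4), (4, 5, 3), (1, 2, 5), (2, 3, 5)]
boundary_pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}

pos = tutte_embedding(neighbors, boundary_pos)
# No triangle is flipped if every signed area keeps the same (positive) sign.
no_flips = all(signed_area(pos, f) > 0 for f in faces)
```

Checking that every triangle keeps a positive signed area is one simple way to confirm that the resulting 2D map has no fold-overs on this mesh.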

翻訳日:2024-06-19 23:37:50 公開日:2024-06-17

# ChatEMG: 脳卒中患者向けロボットハンド装具を制御するための合成データ生成

ChatEMG: Synthetic Data Generation to Control a Robotic Hand Orthosis for Stroke ( http://arxiv.org/abs/2406.12123v1 )

ライセンス: Link先を確認

Jingxi Xu, Runsheng Wang, Siqi Shang, Ava Chen, Lauren Winterbottom, To-Liang Hsu, Wenxi Chen, Khondoker Ahmed, Pedro Leandro La Rotta, Xinyue Zhu, Dawn M. Nilsen, Joel Stein, Matei Ciocarlie,

(参考訳) 脳卒中患者向けの手指装具における意図推定は,障害のある被験者からのデータ収集が難しいため困難である。さらに、EMG信号は、条件、セッション、被験者によって大きく変動するため、分類器の一般化が難しい。従来のアプローチでは、意図分類器を訓練するために、新しい条件、セッション、または被験者からの大規模なラベル付きデータセットが必要だが、このデータ収集プロセスは負担が大きく時間もかかる。本稿では,プロンプト(すなわち、与えられたEMG信号系列)で条件付けされた合成EMG信号を生成できる自己回帰生成モデルであるChatEMGを提案する。 ChatEMGは、新しい条件、セッション、または被験者から小さなデータセットのみを収集し、この新しいコンテキストからのプロンプトで条件付けされた合成サンプルで拡張することを可能にする。 ChatEMGは、生成的訓練を通じて過去のデータの膨大なリポジトリを活用すると同時に、プロンプトを通じてコンテキスト固有のままである。実験の結果,これらの合成サンプルは分類器に依存せず,異なるタイプの分類器に対する意図推定精度を向上させることができることがわかった。機能的な装具支援タスクでの分類器の使用を含め,我々の完全なアプローチを1回の患者セッションに統合できることを実証した。我々の知る限りでは、部分的に合成データで訓練された意図分類器が、脳卒中生存者による装具の機能的制御のために配備されたのはこれが初めてである。ビデオと追加情報はhttps://jxu.ai/chatemgで見ることができる。

Intent inferral on a hand orthosis for stroke patients is challenging due to the difficulty of data collection from impaired subjects. Additionally, EMG signals exhibit significant variations across different conditions, sessions, and subjects, making it hard for classifiers to generalize. Traditional approaches require a large labeled dataset from the new condition, session, or subject to train intent classifiers; however, this data collection process is burdensome and time-consuming. In this paper, we propose ChatEMG, an autoregressive generative model that can generate synthetic EMG signals conditioned on prompts (i.e., a given sequence of EMG signals). ChatEMG enables us to collect only a small dataset from the new condition, session, or subject and expand it with synthetic samples conditioned on prompts from this new context. ChatEMG leverages a vast repository of previous data via generative training while still remaining context-specific via prompting. Our experiments show that these synthetic samples are classifier-agnostic and can improve intent inferral accuracy for different types of classifiers. We demonstrate that our complete approach can be integrated into a single patient session, including the use of the classifier for functional orthosis-assisted tasks. To the best of our knowledge, this is the first time an intent classifier trained partially on synthetic data has been deployed for functional control of an orthosis by a stroke survivor. Videos and additional information can be found at https://jxu.ai/chatemg.

翻訳日:2024-06-19 23:37:50 公開日:2024-06-17

# 大規模言語モデルを用いた効率的な逐次決定法

Efficient Sequential Decision Making with Large Language Models ( http://arxiv.org/abs/2406.12125v1 )

ライセンス: Link先を確認

Dingyang Chen, Qi Zhang, Yinglun Zhu,

(参考訳) 本稿では,大規模言語モデル(LLM)の成功を逐次意思決定に拡張することに焦点を当てる。既存の取り組みは、(i)意思決定のためにLLMを再訓練または微調整するか、(ii)事前訓練されたLLMのためのプロンプトを設計するかのいずれかである。前者のアプローチは勾配更新の計算負担に悩まされており、後者のアプローチは有望な結果を示さない。本稿では,LLMエージェントを逐次意思決定に効率的に組み込むために,オンラインモデル選択アルゴリズムを活用する新しい手法を提案する。統計的には,提案手法は従来の意思決定アルゴリズムとバニラLLMエージェントの双方より有意に優れている。計算面では,提案手法はLLMの高コストな勾配更新を回避し,意思決定プロセス全体を通じて少数のLLM呼び出ししか必要としない。提案手法の有効性を検証するため,広範囲な実験を行った。例えば、大規模なAmazonデータセットでは、提案手法はベースラインに対して$6$倍以上の性能向上を達成し、LLMを呼び出すのは時間ステップのわずか$1.5$\%である。

This paper focuses on extending the success of large language models (LLMs) to sequential decision making. Existing efforts either (i) re-train or finetune LLMs for decision making, or (ii) design prompts for pretrained LLMs. The former approach suffers from the computational burden of gradient updates, and the latter approach does not show promising results. In this paper, we propose a new approach that leverages online model selection algorithms to efficiently incorporate LLM agents into sequential decision making. Statistically, our approach significantly outperforms both traditional decision making algorithms and vanilla LLM agents. Computationally, our approach avoids the need for expensive gradient updates of LLMs, and throughout the decision making process, it requires only a small number of LLM calls. We conduct extensive experiments to verify the effectiveness of our proposed approach. As an example, on a large-scale Amazon dataset, our approach achieves more than a $6$x performance gain over baselines while calling LLMs in only $1.5$\% of the time steps.

翻訳日:2024-06-19 23:37:50 公開日:2024-06-17

# AI「ニュース」コンテンツファームは作りやすく、検出も難しい: イタリア語でのケーススタディ

AI "News" Content Farms Are Easy to Make and Hard to Detect: A Case Study in Italian ( http://arxiv.org/abs/2406.12128v1 )

ライセンス: Link先を確認

Giovanni Puccetti, Anna Rogers, Chiara Alzetta, Felice Dell'Orletta, Andrea Esuli,

(参考訳) 大規模言語モデル (LLM) は、実際のニュース記事として通用しうる合成テキストを生成する「コンテンツファーム」モデル (CFM) として、ますます使われるようになっている。高品質な単言語LLMを持たない言語でも、これはすでに起こっている。主に英語で訓練されたLlama (v1)を、わずか40Kのイタリア語ニュース記事で微調整するだけで、イタリア語の母語話者が合成テキストと見分けるのに苦労するニュース風のテキストを生成できることを示す。我々は,3つのLLMと3つの合成テキスト検出方法(log-likelihood, DetectGPT, 教師付き分類)を調査し,それらすべてが人間の評価者よりも優れているものの、実世界ではいずれも実用的でない(トークン尤度情報へのアクセスか、CFMテキストの大規模データセットのいずれかを必要とする)ことを発見した。また、プロキシCFM、すなわち実際の「コンテンツファーム」で使用されるものと類似したデータセットで微調整したLLMを作成する可能性についても検討する。検出器を成功させるには少量の微調整データでも十分であることがわかったが、どのベースLLMが使われているのかを知る必要があり、これが大きな課題である。以上の結果から,現在,合成ニュース風テキストを「野生」で検出する実用的な方法は存在せず,一方で生成は容易すぎることが示唆された。この問題に関するNLP研究の緊急性を強調する。

Large Language Models (LLMs) are increasingly used as "content farm" models (CFMs), to generate synthetic text that could pass for real news articles. This is already happening even for languages that do not have high-quality monolingual LLMs. We show that fine-tuning Llama (v1), mostly trained on English, on as little as 40K Italian news articles, is sufficient for producing news-like texts that native speakers of Italian struggle to identify as synthetic. We investigate three LLMs and three methods of detecting synthetic texts (log-likelihood, DetectGPT, and supervised classification), finding that they all perform better than human raters, but they are all impractical in the real world (requiring either access to token likelihood information or a large dataset of CFM texts). We also explore the possibility of creating a proxy CFM: an LLM fine-tuned on a similar dataset to one used by the real "content farm". We find that even a small amount of fine-tuning data suffices for creating a successful detector, but we need to know which base LLM is used, which is a major challenge. Our results suggest that there are currently no practical methods for detecting synthetic news-like texts 'in the wild', while generating them is too easy. We highlight the urgency of more NLP research on this problem.

翻訳日:2024-06-19 23:37:50 公開日:2024-06-17

# 効率的な粒子保存レンガウォール量子回路

Efficient particle-conserving brick-wall quantum circuits ( http://arxiv.org/abs/2406.12130v1 )

ライセンス: Link先を確認

Babatunde M. Ayeni,

(参考訳) 粒子保存量子回路を用いた変分量子最適化では、与えられた問題に対してどの粒子保存ゲートと回路アンザッツが最も効率的かを事前に決定することはしばしば困難である。これは、リソースが限られたノイズの多い中間スケール量子(NISQ)プロセッサにとって特に重要である。一般にこの問いに答えるのは難しいが、回路アンザッツを特定すれば、どの粒子保存ゲートが最も効率的かを決めることはより容易になる。本稿では,対称テンソルネットワークの実用的なアイデアを用いて,効率的な粒子保存ゲートを構築する方法を示す。一般化されたゲートを含む様々な種類の粒子保存ゲートを導出する。ブリックウォール回路の枠組みの下でゲートを数値的にテストする。 4つの実パラメータしか持たない一般粒子保存ゲートが概して最良であることを示す。さらに,2量子ビットの最近接ゲートを持つブリックウォール回路を非最近接ゲートに拡張するアルゴリズムを提案する。次近接相互作用がある場合とない場合のハイゼンベルクスピンチェーンを用いて,回路の効率をテストし比較する。

In variational quantum optimization with particle-conserving quantum circuits, it is often difficult to decide a priori which particle-conserving gates and circuit ansatzes would be most efficient for a given problem. This is important especially for noisy intermediate-scale quantum (NISQ) processors with limited resources. While this may be challenging to answer in general, deciding which particle-conserving gate would be most efficient is easier within a specified circuit ansatz. In this paper, we show how to construct efficient particle-conserving gates using some practical ideas from symmetric tensor networks. We derive different types of particle-conserving gates, including the generalized one. We numerically test the gates under the framework of brick-wall circuits. We show that the general particle-conserving gate with only four real parameters is generally best. In addition, we present an algorithm to extend brick-wall circuit with two-qubit nearest-neighbouring gates to non-nearest-neighbouring gates. We test and compare the efficiency of the circuits with Heisenberg spin chain with and without next-nearest-neighbouring interactions.

翻訳日:2024-06-19 23:37:50 公開日:2024-06-17

# Gram2Vec: 解釈可能なドキュメントベクタ

Gram2Vec: An Interpretable Document Vectorizer ( http://arxiv.org/abs/2406.12131v1 )

ライセンス: Link先を確認

Peter Zeng, Eric Sclafani, Owen Rambow,

(参考訳) テキスト中の文法的特徴の正規化相対周波数を抽出することにより,文書を高次元空間に埋め込む文法的スタイルの埋め込みアルゴリズムであるGram2Vecを提案する。ニューラルアプローチと比較して、Gram2Vecは、特徴ベクトルの生成方法に基づいた固有の解釈性を提供する。デモでは,Gram2Vecベクタをベースとした文書への著者のマッピングを視覚化し,どの著者が特定の言語的選択を行うかを確認するために,機能をドロップまたは追加する機能を強調した。次に、著者属性を用いて、Gram2Vecの機能ベクトル間のコサイン類似性を用いて、候補文書とクエリドキュメント間の距離を計算することにより、文書が特定の著者に帰属する理由を説明する。

We present Gram2Vec, a grammatical style embedding algorithm that embeds documents into a higher dimensional space by extracting the normalized relative frequencies of grammatical features present in the text. Compared to neural approaches, Gram2Vec offers inherent interpretability based on how the feature vectors are generated. In our demo, we present a way to visualize a mapping of authors to documents based on their Gram2Vec vectors and highlight the ability to drop or add features to view which authors make certain linguistic choices. Next, we use authorship attribution as an application to show how Gram2Vec can explain why a document is attributed to a certain author, using cosine similarities between the Gram2Vec feature vectors to calculate the distances between candidate documents and a query document.
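
The two ingredients mentioned in the abstract, normalized relative frequencies of features and cosine similarity between the resulting vectors, can be sketched in a few lines. The feature inventory below (a handful of function words and punctuation marks) is a hypothetical stand-in chosen for brevity; the abstract does not list Gram2Vec's actual grammatical features, and this sketch does not use the Gram2Vec package.

```python
import math
import re

# Hypothetical feature inventory (assumption): the real feature set is not given here.
FUNCTION_WORDS = ["the", "of", "and", "but", "which"]
PUNCTUATION = [",", ";", "!", "?"]


def vectorize(text):
    """Map a document to relative frequencies, normalized within each feature group."""
    tokens = re.findall(r"[a-z]+|[,;!?]", text.lower())
    words = [t for t in tokens if t.isalpha()]
    puncts = [t for t in tokens if t in PUNCTUATION]
    vec = {}
    for w in FUNCTION_WORDS:
        vec["word:" + w] = words.count(w) / len(words) if words else 0.0
    for p in PUNCTUATION:
        vec["punct:" + p] = puncts.count(p) / len(puncts) if puncts else 0.0
    return vec


def cosine(u, v):
    """Cosine similarity between two vectors stored as dicts with the same keys."""
    dot = sum(u[k] * v[k] for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


query = vectorize("The cat, the dog; and the bird!")
candidate = vectorize("But which of the birds, and which of the cats?")
similarity = cosine(query, candidate)
```

Because every dimension has a readable name such as `word:the`, dropping or adding a feature amounts to removing or adding a key, which is the kind of interpretability the abstract emphasizes.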

翻訳日:2024-06-19 23:28:06 公開日:2024-06-17

# AIシステムのためのID

IDs for AI Systems ( http://arxiv.org/abs/2406.12137v1 )

ライセンス: Link先を確認

Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, Markus Anderljung,

(参考訳) AIシステムはますます普及しているが、それらと関わるかどうか、どう関わるかを決めるのに必要な情報は存在しないか、アクセスできない可能性がある。ユーザーは、システムが特定の安全基準を満たすかどうかを検証できないかもしれない。調査官は、システムがインシデントを引き起こしたときに誰を調査すべきかを知らないかもしれない。プラットフォームは、同じシステムとの繰り返しの負の相互作用をペナルティ化するのは難しいかもしれない。多くのドメインにおいて、IDは特定のエンティティ(例えば、特定のボーイング747)を識別し、同じクラスの他のエンティティ(例えば、一部またはすべてのボーイング747)に関する情報を提供することで、類似した問題に対処している。本稿では,AIシステムのインスタンス(例えば、Claude 3との特定のチャットセッション)にIDを付与し、関連情報をそのシステムと対話しようとする当事者がアクセスできるようにするフレームワークを提案する。 AIシステムのIDを特徴付け、主要なアクターからIDに対する大きな需要がありうることを論じ、それらのアクターがIDの採用を動機付ける方法を分析し、フレームワークの実装の可能性を探り、制限とリスクを強調する。IDは、特定のアクター(例えば、AIシステムによる金融取引を可能にするアクター)がID使用のインセンティブを試すことができる、リスクの高い状況において最も正当化されると考えられる。 AIシステムのデプロイ担当者は、ID実装の開発を試すことができる。さらなる研究により、IDはAIシステムが社会に浸透する世界を管理するのに役立つかもしれない。

AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system satisfies certain safety standards. An investigator may not know whom to investigate when a system causes an incident. A platform may find it difficult to penalize repeated negative interactions with the same system. Across a number of domains, IDs address analogous problems by identifying particular entities (e.g., a particular Boeing 747) and providing information about other entities of the same class (e.g., some or all Boeing 747s). We propose a framework in which IDs are ascribed to instances of AI systems (e.g., a particular chat session with Claude 3), and associated information is accessible to parties seeking to interact with that system. We characterize IDs for AI systems, argue that there could be significant demand for IDs from key actors, analyze how those actors could incentivize ID adoption, explore potential implementations of our framework, and highlight limitations and risks. IDs seem most warranted in high-stakes settings, where certain actors (e.g., those that enable AI systems to make financial transactions) could experiment with incentives for ID use. Deployers of AI systems could experiment with developing ID implementations. With further study, IDs could help to manage a world where AI systems pervade society.

翻訳日:2024-06-19 23:28:06 公開日:2024-06-17

# テキスト埋め込みモデルにおけるバイアス

Bias in Text Embedding Models ( http://arxiv.org/abs/2406.12138v1 )

ライセンス: Link先を確認

Vasyl Rakivnenko, Nestor Maslej, Jessica Cervi, Volodymyr Zhukov,

(参考訳) テキスト埋め込みは、特に企業の間では、ますますポピュラーなAI方法論になりつつあるが、テキスト埋め込みモデルのバイアスを負う可能性はよく理解されていない。本稿では,一般的なテキスト埋め込みモデルの選択が,特に性別次元に偏りがある程度について検討する。より具体的には、これらのモデルが与えられた職業のリストとジェンダー付き用語を関連づける程度について研究する。この分析によると、テキストの埋め込みモデルは男女差が多いが、様々な方法がある。例えば、看護師、ホームメイカー、社交界などの専門職と女性識別子、CEO、マネージャー、ボスといった職業と男性識別子の関連性があるが、全てのモデルが職業ごとに同じ性的な関連性を持つわけではない。さらに、バイアスの大きさと方向はモデル単位でも変化し、特定の単語モデルによっても異なる。本稿は,ジェンダーバイアスがテキスト埋め込みモデルに影響を及ぼすことを実証し,この問題の具体的側面に留意する必要があることを示唆する。

Text embedding is becoming an increasingly popular AI methodology, especially among businesses, yet the potential of text embedding models to be biased is not well understood. This paper examines the degree to which a selection of popular text embedding models are biased, particularly along gendered dimensions. More specifically, this paper studies the degree to which these models associate a list of given professions with gendered terms. The analysis reveals that text embedding models are prone to gendered biases but in varying ways. Although there are certain inter-model commonalities, for instance, greater association of professions like nurse, homemaker, and socialite with female identifiers, and greater association of professions like CEO, manager, and boss with male identifiers, not all models make the same gendered associations for each occupation. Furthermore, the magnitude and directionality of bias can also vary on a model-by-model basis and depend on the particular words models are prompted with. This paper demonstrates that gender bias afflicts text embedding models and suggests that businesses using this technology need to be mindful of the specific dimensions of this problem.
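
One common way to quantify the kind of association described above is to compare a profession's cosine similarity to a female identifier with its similarity to a male identifier. The sketch below does this on made-up 3-dimensional vectors; the vectors, the identifier choice, and the scoring rule are assumptions for illustration only, and the abstract does not state which measure the authors used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_association(profession_vec, female_vec, male_vec):
    """Positive: closer to the female identifier; negative: closer to the male one."""
    return cosine(profession_vec, female_vec) - cosine(profession_vec, male_vec)


# Toy embeddings (hypothetical values, not produced by any real model).
embeddings = {
    "woman": [1.0, 0.0, 0.5],
    "man": [0.0, 1.0, 0.5],
    "nurse": [0.8, 0.2, 0.5],
    "ceo": [0.1, 0.9, 0.5],
}

scores = {
    job: gender_association(embeddings[job], embeddings["woman"], embeddings["man"])
    for job in ("nurse", "ceo")
}
```

With a real model, the same score would be computed from that model's embeddings of each profession and identifier term, and compared across models to see where the sign or magnitude differs.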

翻訳日:2024-06-19 23:28:06 公開日:2024-06-17

# COTフロー: コントラストペアによる最適トランスポート画像サンプリングと編集を学習する

COT Flow: Learning Optimal-Transport Image Sampling and Editing by Contrastive Pairs ( http://arxiv.org/abs/2406.12140v1 )

ライセンス: Link先を確認

Xinrui Zu, Qian Tao,

(参考訳) 拡散モデルは,マルチモーダルデータのサンプリング・編集において高い生成品質で高い性能を示してきたが,計算コストが高く遅い反復生成プロセスに悩まされている。さらに、ほとんどの手法はガウスノイズからデータを生成するよう制約されているため、サンプリングや編集の柔軟性が制限される。両欠点を克服するために,従来の拡散モデルと比較してゼロショット編集の柔軟性を向上し,高速かつ高品質な生成を実現する新しい手法であるContrastive Optimal Transport Flow (COT Flow)を提案する。最適輸送 (OT) の恩恵により,本手法は事前分布に制限がなく,非ペア画像間 (I2I) 変換が可能であり,編集可能な空間(軌道の始点と終点の両方)を他のゼロショット編集法と比較して2倍にすることができる。品質面では、COT Flowは従来の最先端の非ペア画像間(I2I)変換法と比較して、わずか1ステップで競争力のある結果を生成することができる。 OTの導入によるCOT Flowの利点を強調するため,ユーザガイドによる編集を行うCOTエディタを導入する。コードは https://github.com/zuxinrui/cot_flow でリリースされる。

Diffusion models have demonstrated strong performance in sampling and editing multi-modal data with high generation quality, yet they suffer from the iterative generation process which is computationally expensive and slow. In addition, most methods are constrained to generate data from Gaussian noise, which limits their sampling and editing flexibility. To overcome both disadvantages, we present Contrastive Optimal Transport Flow (COT Flow), a new method that achieves fast and high-quality generation with improved zero-shot editing flexibility compared to previous diffusion models. Benefiting from optimal transport (OT), our method has no limitation on the prior distribution, enabling unpaired image-to-image (I2I) translation and doubling the editable space (at both the start and end of the trajectory) compared to other zero-shot editing methods. In terms of quality, COT Flow can generate competitive results in merely one step compared to previous state-of-the-art unpaired image-to-image (I2I) translation methods. To highlight the advantages of COT Flow through the introduction of OT, we introduce the COT Editor to perform user-guided editing with excellent flexibility and quality. The code will be released at https://github.com/zuxinrui/cot_flow.

翻訳日:2024-06-19 23:28:06 公開日:2024-06-17

# 音声理解のための多言語意味音声エンコーダの微調整のための二重タスク学習手法

A dual task learning approach to fine-tune a multilingual semantic speech encoder for Spoken Language Understanding ( http://arxiv.org/abs/2406.12141v1 )

ライセンス: Link先を確認

Gaëlle Laperrière, Sahar Ghannay, Bassam Jabaian, Yannick Estève,

(参考訳) 自己指導型学習は、音声言語理解のための発話を効率よく表現するために広く用いられ、従来のアプローチを徐々に置き換えている。一方、言語に依存しないセマンティクスを符号化するテキストSSLモデルが提案されている。 SAMU-XLSRフレームワークはこの意味情報を多言語音声表現の強化に用いた。最近の研究では、SAMU-XLSRのドメイン内セマンティックエンリッチメントについて、下流の転写に特化して検討し、挑戦的なSLUタスクにおける最先端の結果をもたらした。本研究の関心は、SLUを含まない近接言語におけるそのような特殊化によって引き起こされる多言語パフォーマンスの喪失と特異意味訓練の欠如にある。また,SAMU-XLSRの初期言語間能力の喪失は,SLUファインチューニングの分離によるものであると考えられた。そこで本稿では,多言語および言語可搬性実験のための遠隔言語を考察しながら,SAMU-XLSRセマンティックエンリッチメントを改善するための2つのタスク学習手法を提案する。

Self-Supervised Learning is widely used to efficiently represent speech for Spoken Language Understanding, gradually replacing conventional approaches. Meanwhile, textual SSL models are proposed to encode language-agnostic semantics. The SAMU-XLSR framework employed this semantic information to enrich multilingual speech representations. A recent study investigated SAMU-XLSR in-domain semantic enrichment by specializing it on downstream transcriptions, leading to state-of-the-art results on a challenging SLU task. This study's interest lies in the loss of multilingual performance and the lack of specific-semantics training induced by such specialization in close languages without any SLU implication. We also consider SAMU-XLSR's loss of initial cross-lingual abilities due to a separate SLU fine-tuning. Therefore, this paper proposes a dual task learning approach to improve SAMU-XLSR semantic enrichment while considering distant languages for multilingual and language portability experiments.

翻訳日:2024-06-19 23:28:06 公開日:2024-06-17

# バイアススライシング:スライスディスカバリ法による医用画像解析におけるパフォーマンスギャップの説明

Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods ( http://arxiv.org/abs/2406.12142v1 )

ライセンス: Link先を確認

Vincent Olesen, Nina Weng, Aasa Feragen, Eike Petersen,

(参考訳) 機械学習モデルは、医用画像解析において高い総合的精度を達成した。しかし、特定の患者群におけるパフォーマンス格差は、その臨床的有用性、安全性、公平性に課題をもたらす。これは、性別、年齢、病気のサブタイプに基づく既知の患者グループだけでなく、これまで知られていなかったラベルのないグループにも影響を及ぼす可能性がある。さらに、このような観察された性能格差の根本原因を明らかにすることはしばしば困難であり、緩和の取り組みを妨げる。本稿では,これらの問題に対処するために,Slice Discovery Methods (SDMs) を用いて,解釈可能な性能の低いデータ部分集合を同定し,観察された性能格差の原因に関する仮説を定式化する。我々は,新しいSDMを導入し,胸部X線からの気胸と無気肺の分類のケーススタディに適用した。本研究は, 仮説定式化におけるSDMの有効性を実証し, 広く用いられている胸部X線データセットおよびモデルにおいて、以前から観察されていたが説明されていなかった男女の患者間の性能格差に説明を与える。以上の結果から,両方の分類タスクにおいて、それぞれ胸部ドレインと心電図ワイヤーの存在を通じたショートカット学習が起きていることが示された。これらのショートカット特徴の有病率の性差が、観察された分類性能のギャップを引き起こしていると考えられ、これはショートカット学習とモデル公平性分析の間の、これまで過小評価されていた相互作用を表している。

Machine learning models have achieved high overall accuracy in medical image analysis. However, performance disparities on specific patient groups pose challenges to their clinical utility, safety, and fairness. This can affect known patient groups - such as those based on sex, age, or disease subtype - as well as previously unknown and unlabeled groups. Furthermore, the root cause of such observed performance disparities is often challenging to uncover, hindering mitigation efforts. In this paper, to address these issues, we leverage Slice Discovery Methods (SDMs) to identify interpretable underperforming subsets of data and formulate hypotheses regarding the cause of observed performance disparities. We introduce a novel SDM and apply it in a case study on the classification of pneumothorax and atelectasis from chest x-rays. Our study demonstrates the effectiveness of SDMs in hypothesis formulation and yields an explanation of previously observed but unexplained performance disparities between male and female patients in widely used chest X-ray datasets and models. Our findings indicate shortcut learning in both classification tasks, through the presence of chest drains and ECG wires, respectively. Sex-based differences in the prevalence of these shortcut features appear to cause the observed classification performance gap, representing a previously underappreciated interaction between shortcut learning and model fairness analyses.
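
Once a candidate slice has been proposed, checking whether it underperforms reduces to grouping predictions by a metadata attribute and comparing accuracies. The sketch below shows only that evaluation step on invented records; it is not a slice discovery method itself (an SDM finds candidate slices automatically), and the field names and values are hypothetical.

```python
from collections import defaultdict


def accuracy_by_slice(records, key):
    """Return {slice value: accuracy} for records grouped by records[i][key]."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r[key]] += 1
        if r["label"] == r["pred"]:
            correct[r[key]] += 1
    return {value: correct[value] / total[value] for value in total}


# Invented toy records (assumption): positives with a chest drain are all
# classified correctly, positives without one are mostly missed.
records = [
    {"sex": "M", "chest_drain": True, "label": 1, "pred": 1},
    {"sex": "M", "chest_drain": True, "label": 1, "pred": 1},
    {"sex": "F", "chest_drain": True, "label": 1, "pred": 1},
    {"sex": "F", "chest_drain": False, "label": 1, "pred": 0},
    {"sex": "F", "chest_drain": False, "label": 1, "pred": 0},
    {"sex": "M", "chest_drain": False, "label": 1, "pred": 1},
]

by_drain = accuracy_by_slice(records, "chest_drain")
by_sex = accuracy_by_slice(records, "sex")
```

In this toy data the gap between sexes disappears once records are grouped by the shortcut feature, which mirrors the form of explanation the abstract describes.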

翻訳日:2024-06-19 23:28:06 公開日:2024-06-17

# 分位点リスクの下でのミニマックス線形回帰

Minimax Linear Regression under the Quantile Risk ( http://arxiv.org/abs/2406.12145v1 )

ライセンス: Link先を確認

Ayoub El Hanchi, Chris J. Maddison, Murat A. Erdogdu,

(参考訳) 分位点リスクの下での線形回帰におけるミニマックス手続きの設計問題について検討する。まず、独立なガウス雑音を伴う実現可能な設定を考え、任意のノイズレベルと入力の分布に対して、豊富な誤差関数の族に対する正確なミニマックス分位点リスクを求め、OLSのミニマックス性を確立する。これにより、二乗誤差という特別な場合の既知の下界が改善され、より大きな分布集合上のミニマックス分位点リスクの下界が得られる。二乗誤差と入力の分布に関する4次モーメントの仮定の下では、この下界がより広い問題クラスに対してタイトであることを示す。具体的には、最近提案されたmin-max回帰手続きの変種の最悪ケース分位点リスクについて、一致する上界を証明し、絶対定数を除いてそのミニマックス性を確立する。この結果を、$p \in (2, \infty)$に対するすべての$p$乗誤差関数に拡張することで、我々のアプローチの有用性を示す。その過程で、分位点リスクを扱う際にミニマックスリスクの下界を与えるための、古典的ベイズ法の一般的な類似手法や、サンプル共分散行列の最小固有値の分位点のタイトな特徴付けを開発する。

We study the problem of designing minimax procedures in linear regression under the quantile risk. We start by considering the realizable setting with independent Gaussian noise, where for any given noise level and distribution of inputs, we obtain the exact minimax quantile risk for a rich family of error functions and establish the minimaxity of OLS. This improves on the known lower bounds for the special case of square error, and provides us with a lower bound on the minimax quantile risk over larger sets of distributions. Under the square error and a fourth moment assumption on the distribution of inputs, we show that this lower bound is tight over a larger class of problems. Specifically, we prove a matching upper bound on the worst-case quantile risk of a variant of the recently proposed min-max regression procedure, thereby establishing its minimaxity, up to absolute constants. We illustrate the usefulness of our approach by extending this result to all $p$-th power error functions for $p \in (2, \infty)$. Along the way, we develop a generic analogue to the classical Bayesian method for lower bounding the minimax risk when working with the quantile risk, as well as a tight characterization of the quantiles of the smallest eigenvalue of the sample covariance matrix.

翻訳日:2024-06-19 23:28:06 公開日:2024-06-17

# AIはコードを最適化すべきか? 現在の大規模言語モデルと古典的最適化コンパイラの比較研究

Should AI Optimize Your Code? A Comparative Study of Current Large Language Models Versus Classical Optimizing Compilers ( http://arxiv.org/abs/2406.12146v1 )

ライセンス: Link先を確認

Miguel Romero Rosas, Miguel Torres Sanchez, Rudolf Eigenmann,

(参考訳) 現代のコンピュータアーキテクチャの状況では、効率的な並列プログラミングの需要は持続し、堅牢な最適化技術を必要としている。従来の最適化コンパイラはこの取り組みにおいて歴史的に重要な役割を担い、現代のソフトウェアシステムの複雑さの進化に適応してきた。大規模言語モデル(LLM)の出現は、コード最適化方法論に革命をもたらすAI駆動アプローチの可能性に関する興味深い疑問を提起する。本稿では、GPT-4.0とCodeLlama-70Bの2つの最先端大規模言語モデルと従来の最適化コンパイラの比較分析を行い、コードを最大効率に最適化する上でのそれぞれの能力と限界を評価する。さらに,難易度の高い最適化パターンのベンチマークスイートと、これらのツールが生成するコードのパフォーマンスと正確性を評価する自動メカニズムも導入する。思考の連鎖(CoT)とインストラクション・プロンプト(IP)の2つの異なるプロンプト手法を用いてLLMの性能を評価した。次に、これらの結果をCETUS、PLUTO、ROSEの3つの従来の最適化コンパイラと比較した。重要な発見は、LLMは現在の最適化コンパイラを上回る可能性を持つ一方で、コードサイズが大きい場合に誤ったコードを生成することが多く、自動検証手法が必要になるということである。 3つのベンチマークスイートで広範囲に評価したところ、CodeLlama-70Bは2つのLLMのうち優れたオプティマイザであり、最大2.1倍のスピードアップを達成できることがわかった。さらに、CETUSは最適化コンパイラの中で最良であり、最大1.9倍のスピードアップを実現している。また,思考の連鎖 (CoT) とインストラクション・プロンプト (IP) の2つの方法の間に有意な差は認められなかった。

In the contemporary landscape of computer architecture, the demand for efficient parallel programming persists, requiring robust optimization techniques. Traditional optimizing compilers have historically been pivotal in this endeavor, adapting to the evolving complexities of modern software systems. The emergence of Large Language Models (LLMs) raises intriguing questions about the potential for AI-driven approaches to revolutionize code optimization methodologies. This paper presents a comparative analysis between two state-of-the-art Large Language Models, GPT-4.0 and CodeLlama-70B, and traditional optimizing compilers, assessing their respective abilities and limitations in optimizing code for maximum efficiency. Additionally, we introduce a benchmark suite of challenging optimization patterns and an automatic mechanism for evaluating performance and correctness of the code generated by such tools. We used two different prompting methodologies to assess the performance of the LLMs -- Chain of Thought (CoT) and Instruction Prompting (IP). We then compared these results with three traditional optimizing compilers, CETUS, PLUTO and ROSE, across a range of real-world use cases. A key finding is that while LLMs have the potential to outperform current optimizing compilers, they often generate incorrect code on large code sizes, calling for automated verification methods. Our extensive evaluation across three different benchmark suites shows CodeLlama-70B as the superior optimizer among the two LLMs, capable of achieving speedups of up to 2.1x. Additionally, CETUS is the best among the optimizing compilers, achieving a maximum speedup of 1.9x. We also found no significant difference between the two prompting methods: Chain of Thought (CoT) and Instruction Prompting (IP).
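
An automatic check of "performance and correctness" can be sketched as a harness that first compares a candidate's outputs with a trusted baseline and only then measures speedup. The function names, the toy kernels, and the best-of-N timing rule below are assumptions for illustration; the abstract does not describe how the authors' mechanism works.

```python
import time


def check_and_time(baseline, candidate, inputs, repeats=5):
    """Reject the candidate on any output mismatch; otherwise report its speedup."""
    for args in inputs:
        if baseline(*args) != candidate(*args):
            return {"correct": False, "speedup": None}

    def best_time(fn):
        # Best-of-N wall-clock time over the whole input set.
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for args in inputs:
                fn(*args)
            best = min(best, time.perf_counter() - start)
        return max(best, 1e-12)  # guard against a zero reading

    return {"correct": True, "speedup": best_time(baseline) / best_time(candidate)}


def sum_squares_loop(n):
    """Baseline: straightforward loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_closed_form(n):
    """'Optimized' candidate: closed form for 0^2 + ... + (n-1)^2."""
    return (n - 1) * n * (2 * n - 1) // 6


cases = [(0,), (10,), (1000,)]
good = check_and_time(sum_squares_loop, sum_squares_closed_form, cases)
bad = check_and_time(sum_squares_loop, lambda n: n * n, cases)
```

Testing correctness before timing matters here because a fast but wrong transformation, the failure mode the abstract reports for large inputs, would otherwise be scored as a speedup.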

翻訳日:2024-06-19 23:28:06 公開日:2024-06-17

# メタ認知AI:その枠組みとニューロシンボリックアプローチの事例

Metacognitive AI: Framework and the Case for a Neurosymbolic Approach ( http://arxiv.org/abs/2406.12147v1 )

ライセンス: Link先を確認

Hua Wei, Paulo Shakarian, Christian Lebiere, Bruce Draper, Nikhil Krishnaswamy, Sergei Nirenburg,

(参考訳) メタ認知は、エージェント自身の内部過程について推論するという概念であり、もともとは発達心理学の分野で導入された。このポジションペーパーでは,メタ認知を人工知能に適用するという考え方について考察する。我々は、メタ認知人工知能(AI)を理解するための枠組みとして、TRAP(透明性、推論、適応、知覚)を導入する。我々は、これらの側面のそれぞれについて順に議論し、メタ認知の課題に対処するために、ニューロシンボリックAI(NSAI)をどのように活用できるかを探求する。

Metacognition is the concept of reasoning about an agent's own internal processes and was originally introduced in the field of developmental psychology. In this position paper, we examine the concept of applying metacognition to artificial intelligence. We introduce a framework for understanding metacognitive artificial intelligence (AI) that we call TRAP: transparency, reasoning, adaptation, and perception. We discuss each of these aspects in turn and explore how neurosymbolic AI (NSAI) can be leveraged to address challenges of metacognition.

翻訳日:2024-06-19 23:28:06 公開日:2024-06-17

# カオスマイニング:低SNR環境における局所帰属手法の評価ベンチマーク

ChaosMining: A Benchmark to Evaluate Post-Hoc Local Attribution Methods in Low SNR Environments ( http://arxiv.org/abs/2406.12150v1 )

ライセンス: Link先を確認

Ge Shi, Ziwen Kan, Jason Smucny, Ian Davidson,

(参考訳) 本研究では,実世界の機械学習応用で一般的なシナリオである低信号対雑音比(SNR)を特徴とする領域において,予測力のある特徴を無関係な特徴から識別するためのポストホック局所帰属法の有効性を検討する。我々は, 記号関数, 画像, 音声データを含む合成データセットを開発し, (Model $\times$ Attribution $\times$ Noise Condition) の三つ組に関するベンチマークを組み込んだ。スクラッチから訓練された様々な古典的モデルを厳密にテストすることにより、複数の条件下でのこれらの帰属手法の性能について貴重な洞察を得た。これらの知見に基づき、よく知られた再帰的特徴除去(RFE)アルゴリズムの新たな拡張を導入し、ニューラルネットワークへの適用性を高める。我々の実験は、スケーラビリティの制限とともに、予測と特徴選択におけるその長所を明らかにしている。さらなる詳細と追加の細かな発見は、広範な議論とともに付録に含まれている。コードとリソースは https://github.com/geshijoker/ChaosMining/ で入手できる。

In this study, we examine the efficacy of post-hoc local attribution methods in identifying features with predictive power from irrelevant ones in domains characterized by a low signal-to-noise ratio (SNR), a common scenario in real-world machine learning applications. We developed synthetic datasets encompassing symbolic functional, image, and audio data, incorporating a benchmark on the (Model $\times$ Attribution $\times$ Noise Condition) triplet. By rigorously testing various classic models trained from scratch, we gained valuable insights into the performance of these attribution methods in multiple conditions. Based on these findings, we introduce a novel extension to the notable recursive feature elimination (RFE) algorithm, enhancing its applicability for neural networks. Our experiments highlight its strengths in prediction and feature selection, alongside limitations in scalability. Further details and additional minor findings are included in the appendix, with extensive discussions. The codes and resources are available at https://github.com/geshijoker/ChaosMining/.

翻訳日:2024-06-19 23:28:06 公開日:2024-06-17

# 非負のニューラルネットワークの固定点

Fixed points of nonnegative neural networks ( http://arxiv.org/abs/2106.16239v9 )

ライセンス: Link先を確認

Tomasz J. Piotrowski, Renato L. G. Cavalcante, Mateusz Gabor,

(参考訳) 非負のベクトルを非負のベクトルにマッピングするニューラルネットワークとして定義する、非負のニューラルネットワークの解析に固定点理論を用いる。まず、非負の重みとバイアスを持つ非負のニューラルネットワークは、非線形ペロン・フロベニウス理論の枠組みの中で単調かつ(弱く)スケーラブルな写像として認識できることを示す。この事実により、同じ次元の入力と出力を持つ非負のニューラルネットワークの固定点の存在条件を提供することができ、これらの条件は凸解析の引数を用いて最近得られた条件よりも弱い。さらに、非負の重みとバイアスを持つ非負のニューラルネットワークの固定点集合の形状が間隔であり、穏やかな条件下では点に縮退することを示した。これらの結果は、より一般的な非負のニューラルネットワークの固定点の存在を得るために用いられる。実用的観点からは, オートエンコーダの挙動の理解に寄与し, 深層平衡モデルにおける今後の発展に有用な数学機械も提供する。

We use fixed point theory to analyze nonnegative neural networks, which we define as neural networks that map nonnegative vectors to nonnegative vectors. We first show that nonnegative neural networks with nonnegative weights and biases can be recognized as monotonic and (weakly) scalable mappings within the framework of nonlinear Perron-Frobenius theory. This fact enables us to provide conditions for the existence of fixed points of nonnegative neural networks having inputs and outputs of the same dimension, and these conditions are weaker than those recently obtained using arguments in convex analysis. Furthermore, we prove that the shape of the fixed point set of nonnegative neural networks with nonnegative weights and biases is an interval, which under mild conditions degenerates to a point. These results are then used to obtain the existence of fixed points of more general nonnegative neural networks. From a practical perspective, our results contribute to the understanding of the behavior of autoencoders, and we also offer valuable mathematical machinery for future developments in deep equilibrium models.
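
A small numerical example can make the object of study concrete: a one-layer network with nonnegative weights and biases maps nonnegative vectors to nonnegative vectors, and a fixed point can be searched for by repeated application. The weights, the tanh activation, and the stopping rule below are illustrative assumptions; in this toy case convergence follows simply because the map is a contraction, which is a stronger property than the conditions the abstract refers to.

```python
import math


def layer(x, W, b):
    """One layer x -> tanh(Wx + b); nonnegative W, b, x give a nonnegative output."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]


def fixed_point(W, b, x0, tol=1e-12, max_iter=10000):
    """Iterate x_{k+1} = layer(x_k) until successive iterates differ by < tol."""
    x = list(x0)
    for _ in range(max_iter):
        nxt = layer(x, W, b)
        if max(abs(a - c) for a, c in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical parameters: row sums of W are below 1 and |tanh'| <= 1,
# so the map is a contraction in the max norm.
W = [[0.5, 0.2], [0.1, 0.4]]
b = [0.1, 0.3]

x_from_zero = fixed_point(W, b, [0.0, 0.0])
x_from_far = fixed_point(W, b, [5.0, 5.0])
residual = max(abs(a - c) for a, c in zip(layer(x_from_zero, W, b), x_from_zero))
```

Starting from two different nonnegative points and reaching the same vector illustrates the case in which the fixed point set degenerates to a single point.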

翻訳日:2024-06-19 13:36:52 公開日:2024-06-17

# 密度推定に基づくクラスタリング評価のための新しい指標

A New Index for Clustering Evaluation Based on Density Estimation ( http://arxiv.org/abs/2207.01294v4 )

ライセンス: Link先を確認

Gangli Liu,

(参考訳) クラスタリングの内部評価のための新しい指標が導入された。インデックスは2つのサブインデックスの混合として定義される。最初のサブインデックス $ I_a $ は Ambiguous Index と呼ばれ、2番目のサブインデックス $ I_s $ は similarity Index と呼ばれる。 2つのサブインデックスの計算は、データのパーティションの各クラスタに対する密度推定に基づいている。新しいインデックスの性能をテストする実験を行い、145データセットのセット上で、Calinski-Harabasz指数、Silhouette係数、Davies-Bouldin指数、CDbw、DBCV、VIASCKDEの6つの内部クラスタリング評価指標と比較した。その結果、新たな指標は、他の内部クラスタリング評価指標を大幅に改善することが示された。

A new index for internal evaluation of clustering is introduced. The index is defined as a mixture of two sub-indices. The first sub-index $ I_a $ is called the Ambiguous Index; the second sub-index $ I_s $ is called the Similarity Index. Calculation of the two sub-indices is based on density estimation for each cluster of a partition of the data. An experiment on a set of 145 datasets tests the performance of the new index and compares it with six other internal clustering evaluation indices: Calinski-Harabasz index, Silhouette coefficient, Davies-Bouldin index, CDbw, DBCV, and VIASCKDE. The results show that the new index significantly improves on the other internal clustering evaluation indices.

翻訳日:2024-06-19 13:29:49 公開日:2024-06-17

# 教師なし人物Re-IDのためのドメインカメラ適応と協調多重特徴クラスタリング

Domain Camera Adaptation and Collaborative Multiple Feature Clustering for Unsupervised Person Re-ID ( http://arxiv.org/abs/2208.08624v2 )

ライセンス: Link先を確認

Yuanpeng Tu,

(参考訳) 最近、制限付きアノテートデータが利用可能なオープンワールドのシナリオ設定のために、教師なしの人物再識別(re-ID)が注目されている。既存の教師あり手法は、しばしば目に見えない領域でうまく一般化できないが、教師なし手法は、主に複数の粒度情報がなく、確認バイアスに悩まされる傾向がある。本稿では,2つの側面から未確認対象領域のより優れた特徴表現を求める。 1) ラベル付きソースドメインに教師なしのドメインアダプティブを実行し、 2) 未ラベル対象ドメインにおける潜在的な類似性をマイニングする。また、確認バイアスの影響を軽減するために、協調的な擬似再ラベル戦略を提案する。まず、生成対向ネットワークを利用して、ソースドメインからターゲットドメインへの画像転送を行う。さらに、生成した画像の品質を向上させるために、個人識別の保存とアイデンティティマッピングの損失を導入する。第2に,大域的特徴や部分的特徴分岐を含む対象領域の内部データ構造を学習するための協調的多機能クラスタリングフレームワーク(CMFC)を提案する。グローバル機能ブランチ(GB)は、人画像のグローバル機能に教師なしクラスタリングを採用し、部分機能ブランチ(PB)は、異なる身体領域内で類似性をマイニングする。最後に、2つのベンチマークデータセットに対する広範な実験により、教師なしの人物再ID設定下での手法の競合性能を示す。

Recently, unsupervised person re-identification (re-ID) has drawn much attention due to its open-world scenario settings, where limited annotated data is available. Existing supervised methods often fail to generalize well on unseen domains, while unsupervised methods mostly lack multi-granularity information and are prone to confirmation bias. In this paper, we aim at finding better feature representations on the unseen target domain from two aspects: 1) performing unsupervised domain adaptation on the labeled source domain and 2) mining potential similarities on the unlabeled target domain. Besides, a collaborative pseudo re-labeling strategy is proposed to alleviate the influence of confirmation bias. Firstly, a generative adversarial network is utilized to transfer images from the source domain to the target domain. Moreover, person identity preserving and identity mapping losses are introduced to improve the quality of generated images. Secondly, we propose a novel collaborative multiple feature clustering framework (CMFC) to learn the internal data structure of the target domain, including global feature and partial feature branches. The global feature branch (GB) employs unsupervised clustering on the global feature of person images, while the partial feature branch (PB) mines similarities within different body regions. Finally, extensive experiments on two benchmark datasets show the competitive performance of our method under unsupervised person re-ID settings.

翻訳日:2024-06-19 13:29:49 公開日:2024-06-17

# 樹木のトラバーサル再考

Rethink Tree Traversal ( http://arxiv.org/abs/2209.04825v5 )

ライセンス: Link先を確認

Jinxiong Zhang,

(参考訳) 本稿では,行列計算の言語における二分決定木トラバーサルの実装方法について述べる。我々の主な貢献は、決定木の階層構造の新しい行列表現に基づく二分木トラバーサルの等価アルゴリズムを提案することである。私たちのキーとなるアイデアは、内部積探索の最大化によるバイナリ決定ツリーの移動です。我々は、再帰的トラバースのない決定木メソッドを実装するだけでなく、木ベースのメソッドのパーティショニングの性質を掘り下げる。

We show how to implement binary decision tree traversal in the language of matrix computation. Our main contribution is to propose some equivalent algorithms of binary tree traversal based on a novel matrix representation of the hierarchical structure of the decision tree. Our key idea is to traverse the binary decision tree by maximum inner product search. We not only implement decision tree methods without recursive traversal but also delve into the partitioning nature of tree-based methods.
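The idea of replacing branching with an inner product search can be sketched as follows. This is one possible encoding under stated assumptions, not necessarily the paper's exact construction: the tree, the names `SPLITS` and `LEAF_PATHS`, and the depth offset in the score are all hypothetical. Each leaf stores a path vector over the internal nodes (+1 for left, -1 for right, 0 for off-path), every split is evaluated at once into a sign vector, and the reached leaf is the one whose path vector agrees with the signs on all of its ancestors.

```python
# Internal nodes as (feature index, threshold); a sample goes left when x[f] <= t.
# Hypothetical tree: node 0 -> (node 1 | leaf D), node 1 -> (leaf A | node 2),
# node 2 -> (leaf B | leaf C).
SPLITS = [(0, 0.5), (1, 0.3), (0, 0.2)]

# One path vector per leaf: +1 = go left at that node, -1 = go right, 0 = node not on the path.
LEAF_PATHS = {
    "A": [+1, +1, 0],
    "B": [+1, -1, +1],
    "C": [+1, -1, -1],
    "D": [-1, 0, 0],
}


def traverse_by_inner_product(x):
    """Pick the leaf maximising <path, signs> - depth; the score is 0 only for the reached leaf."""
    signs = [1 if x[f] <= t else -1 for f, t in SPLITS]

    def score(path):
        depth = sum(abs(p) for p in path)
        return sum(p * s for p, s in zip(path, signs)) - depth

    return max(LEAF_PATHS, key=lambda leaf: score(LEAF_PATHS[leaf]))


def traverse_by_branching(x):
    """Reference implementation with explicit branching, for comparison."""
    if x[0] > 0.5:
        return "D"
    if x[1] <= 0.3:
        return "A"
    return "B" if x[0] <= 0.2 else "C"


if __name__ == "__main__":
    for sample in ([0.9, 0.0], [0.1, 0.1], [0.1, 0.9], [0.4, 0.9]):
        print(sample, traverse_by_inner_product(sample), traverse_by_branching(sample))
```

Subtracting the depth keeps a shallow leaf from tying with a deep leaf that matches only part of its path; appending a constant 1 to the sign vector and `-depth` to each path vector turns the score into a single inner product.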

翻訳日:2024-06-19 13:29:49 公開日:2024-06-17

# 生涯学習対話システム

Lifelong and Continual Learning Dialogue Systems ( http://arxiv.org/abs/2211.06553v2 )

ライセンス: Link先を確認

Sahisnu Mazumder, Bing Liu,

(参考訳) チャットボットとして知られる対話システムは、ユーザとのチャット会話やタスク指向の対話で様々なタスクをこなすために広く普及しているため、近年で普及している。既存のチャットボットは通常、事前にコンパイルされたデータや手動でラベル付けされたデータからトレーニングされる。多くの人が手動でコンパイルされた知識ベース(KB)を使用している。自然言語を理解する能力はまだ限られており、多くのエラーが発生する傾向にあり、ユーザ満足度は低い。通常、よりラベル付きデータとより手作業でコンパイルされた知識を持つエンジニアによって継続的に改善される必要がある。本書では,チャットボットがユーザや作業環境との自己開始型対話を通じて,自分自身で継続的に学習する能力を実現するための,生涯学習対話システムの新たなパラダイムを紹介する。システムがユーザとチャットしたり、外部ソースからより多くのことを学ぶようになると、会話の知識が増し、より良くなる。本書では,ユーザからの会話中,外部ソースからの会話中,新たな言語表現や語彙的,事実的知識を継続的に学習し,会話中に新たなトレーニング例を取得し,会話スキルを習得する,このような継続的な学習対話システムを構築するための最新の開発技術について紹介する。これらの一般的な話題とは別に、対話システムの特定の側面の継続的な学習に関する既存の研究も調査されている。この本は、将来の研究におけるオープンな課題に関する議論で締めくくられている。

Dialogue systems, commonly known as chatbots, have gained escalating popularity in recent times due to their widespread applications in carrying out chit-chat conversations with users and task-oriented dialogues to accomplish various user tasks. Existing chatbots are usually trained from pre-collected and manually-labeled data and/or written with handcrafted rules. Many also use manually-compiled knowledge bases (KBs). Their ability to understand natural language is still limited, and they tend to produce many errors resulting in poor user satisfaction. Typically, they need to be constantly improved by engineers with more labeled data and more manually compiled knowledge. This book introduces the new paradigm of lifelong learning dialogue systems to endow chatbots with the ability to learn continually by themselves through their own self-initiated interactions with their users and working environments to improve themselves. As the systems chat more and more with users or learn more and more from external sources, they become more and more knowledgeable and better and better at conversing. The book presents the latest developments and techniques for building such continual learning dialogue systems that continuously learn new language expressions and lexical and factual knowledge during conversation from users and off conversation from external sources, acquire new training examples during conversation, and learn conversational skills. Apart from these general topics, existing works on continual learning of some specific aspects of dialogue systems are also surveyed. The book concludes with a discussion of open challenges for future research.

翻訳日:2024-06-19 13:29:49 公開日:2024-06-17

# 最適化ユニタリ結合クラスタアンサッツを用いた実験的量子計算化学

Experimental quantum computational chemistry with optimised unitary coupled cluster ansatz ( http://arxiv.org/abs/2212.08006v3 )

ライセンス: Link先を確認

Shaojun Guo, Jinzhao Sun, Haoran Qian, Ming Gong, Yukun Zhang, Fusheng Chen, Yangsen Ye, Yulin Wu, Sirui Cao, Kun Liu, Chen Zha, Chong Ying, Qingling Zhu, He-Liang Huang, Youwei Zhao, Shaowei Li, Shiyu Wang, Jiale Yu, Daojin Fan, Dachao Wu, Hong Su, Hui Deng, Hao Rong, Yuan Li, Kaili Zhang, Tung-Hsun Chung, Futian Liang, Jin Lin, Yu Xu, Lihua Sun, Cheng Guo, Na Li, Yong-Heng Huo, Cheng-Zhi Peng, Chao-Yang Lu, Xiao Yuan, Xiaobo Zhu, Jian-Wei Pan,

(参考訳) 量子計算化学は量子コンピューティングの重要な応用として登場した。変分量子固有解法(VQE)のようなハイブリッド量子古典計算法は、量子化学問題の有望な解法として設計されているが、理論的な複雑さと実験上の不完全性により、信頼性と正確な結果が得られない。電子構造を解くための実験は、いまだに非スケーラブル(ハードウェア効率)または古典的にシミュレート可能な(Hartree-Fock)アンサッツに制限されているか、大きな誤差を伴う少数の量子ビットに限られている。スケーラブルで高精度な量子化学シミュレーションの実験的実現はいまだ解明されていない。ここでは、ノイズ量子プロセッサを用いて分子電子構造を解くことに伴う重要な課題に対処する。本プロトコルは, 回路深度, 走行時間, 化学シミュレーションの指標を著しく改善する。ハードウェアの体系的な拡張とエラー軽減技術の統合を通じて、実験的な量子計算化学の限界を推し進め、最適化されたユニタリ結合クラスタ・アンサッツを12キュービットに拡大してVQEの実装を成功させた。誤差抑制分子の基底状態エネルギーの高精度な計算結果を2桁程度精度で生成する。すべての結合距離におけるH$_2$の化学的精度と実験中の小さな結合距離におけるLiHの化学的精度は、最近の2つの同時処理を超えても達成される。我々の研究は、電子構造計算におけるスケーラブルなソリューションへの実現可能な道筋を示し、重要な技術的特徴を検証し、この目標の今後の課題を特定する。

Quantum computational chemistry has emerged as an important application of quantum computing. Hybrid quantum-classical computing methods, such as variational quantum eigensolvers (VQE), have been designed as promising solutions to quantum chemistry problems, yet challenges due to theoretical complexity and experimental imperfections hinder progress in achieving reliable and accurate results. Experimental works for solving electronic structures are consequently still restricted to nonscalable (hardware efficient) or classically simulable (Hartree-Fock) ansatz, or limited to a few qubits with large errors. The experimental realisation of scalable and high-precision quantum chemistry simulation remains elusive. Here, we address the critical challenges associated with solving molecular electronic structures using noisy quantum processors. Our protocol presents significant improvements in the circuit depth and running time, key metrics for chemistry simulation. Through systematic hardware enhancements and the integration of error mitigation techniques, we push forward the limit of experimental quantum computational chemistry and successfully scale up the implementation of VQE with an optimised unitary coupled-cluster ansatz to 12 qubits. We produce high-precision results of the ground-state energy for molecules with error suppression by around two orders of magnitude. We achieve chemical accuracy for H$_2$ at all bond distances and LiH at small bond distances in the experiment, even beyond the two recent concurrent works. Our work demonstrates a feasible path towards a scalable solution to electronic structure calculation, validating the key technological features and identifying future challenges for this goal.

翻訳日:2024-06-19 13:20:03 公開日:2024-06-17

# SeaFormer++: モバイル視覚認識のためのスキーズ強化軸変換器

SeaFormer++: Squeeze-enhanced Axial Transformer for Mobile Visual Recognition ( http://arxiv.org/abs/2301.13156v5 )

ライセンス: Link先を確認

Qiang Wan, Zilong Huang, Jiachen Lu, Gang Yu, Li Zhang,

(参考訳) ビジョントランスフォーマーの導入以来、CNNが圧倒的に支配してきた多くのコンピュータビジョンタスク(例えばセマンティックセグメンテーション)のランドスケープは、近年大きく革新されている。しかし、計算コストとメモリ要求により、これらの手法はモバイルデバイスには適さない。本稿では,モバイル視覚認識のための圧縮強化軸変換器(SeaFormer)を提案する。具体的には、圧縮軸の定式化と詳細強化を特徴とする一般的な注意ブロックを設計する。さらにコスト効率のよいバックボーンアーキテクチャのファミリを作成するためにも使用できる。軽量なセグメンテーションヘッドと組み合わせることで、ADE20K、Cityscapes、Pascal Context、COCO-Stuffデータセット上のARMベースのモバイルデバイス上で、セグメンテーション精度とレイテンシの最良のトレードオフを実現する。重要なことは、モバイルフレンドリーなライバルとTransformerベースのライバルの両方を、ベルやホイッスルを使わずに、より高い性能とより低いレイテンシで上回った。さらに,機能アップサンプリングに基づくマルチレゾリューション蒸留技術を導入し,提案フレームワークの推論遅延を低減した。セマンティックセグメンテーション以外にも、提案するSeaFormerアーキテクチャを画像分類やオブジェクト検出問題に適用し、モバイルフレンドリーなバックボーンとして機能する可能性を示す。私たちのコードとモデルはhttps://github.com/fudan-zvg/SeaFormerで公開されています。

Since the introduction of Vision Transformers, the landscape of many computer vision tasks (e.g., semantic segmentation), which had been overwhelmingly dominated by CNNs, has recently been significantly revolutionized. However, the computational cost and memory requirements render these methods unsuitable for mobile devices. In this paper, we introduce a new method, the squeeze-enhanced Axial Transformer (SeaFormer), for mobile visual recognition. Specifically, we design a generic attention block characterized by the formulation of squeeze Axial and detail enhancement. It can be further used to create a family of backbone architectures with superior cost-effectiveness. Coupled with a light segmentation head, we achieve the best trade-off between segmentation accuracy and latency on ARM-based mobile devices on the ADE20K, Cityscapes, Pascal Context and COCO-Stuff datasets. Critically, we beat both the mobile-friendly rivals and Transformer-based counterparts with better performance and lower latency without bells and whistles. Furthermore, we incorporate a feature upsampling-based multi-resolution distillation technique, further reducing the inference latency of the proposed framework. Beyond semantic segmentation, we further apply the proposed SeaFormer architecture to image classification and object detection problems, demonstrating the potential of serving as a versatile mobile-friendly backbone. Our code and models are made publicly available at https://github.com/fudan-zvg/SeaFormer.

翻訳日:2024-06-19 13:20:03 公開日:2024-06-17

# 未熟児網膜症の深層学習分類を改善するリトカム画像のための新しい基礎画像前処理法

Novel Fundus Image Preprocessing for Retcam Images to Improve Deep Learning Classification of Retinopathy of Prematurity ( http://arxiv.org/abs/2302.02524v5 )

ライセンス: Link先を確認

Sajid Rahim, Kourosh Sabri, Anna Ells, Alan Wassyng, Mark Lawford, Linyang Chu, Wenbo He,

(参考訳) 未熟児網膜症(英: Retinopathy of Prematurity、ROP)は、未熟児に影響を与える眼の網膜の損傷により失明する可能性のある眼疾患である。 ROPのスクリーニングは早期発見と治療に不可欠である。これは、臨床上重要な疾患の診断成功率を低下させる原因となる主観的な拡張眼科検査を訓練された医師に要求する、退屈で手動のプロセスである。自動診断法は、深層学習を用いて眼科医が診断精度を向上させるのに役立つ。いくつかの研究グループが様々なアプローチを強調している。キャプチャされたROPリトカム画像は、品質の低下に悩まされる。本稿では,事前学習フレームワークを用いた新しい基礎前処理手法を用いてハイブリッドモデルを構築し,診断精度を高めることを提案する。評価の結果、従来の画像処理と比較して、Plus病、ROPのステージ、およびゾーンの分類において、これらの新しい手法が、ピアペーパーと比較して、多くの面で、より良い精度に寄与することが確認された。

Retinopathy of Prematurity (ROP) is a potentially blinding eye disorder caused by damage to the retina, which can affect babies born prematurely. Screening for ROP is essential for early detection and treatment. It is a laborious, manual process that requires a trained physician to perform a dilated ophthalmological examination, which can be subjective and can result in lower diagnosis success for clinically significant disease. Automated diagnostic methods using deep learning can help ophthalmologists increase diagnosis accuracy. Several research groups have highlighted various approaches. Captured ROP Retcam images suffer from poor quality. This paper proposes the use of improved novel fundus preprocessing methods with pretrained transfer learning frameworks to create hybrid models that give higher diagnosis accuracy. Once trained and validated, the evaluations showed that these novel methods, in comparison to traditional imaging processing, contribute to better and in many aspects higher accuracy in classifying Plus disease, Stages of ROP, and Zones than peer papers.

翻訳日:2024-06-19 13:20:03 公開日:2024-06-17

# エッジコンピューティングにおけるトポロジーを考慮したフェデレーションラーニング - 総合的な調査

Topology-aware Federated Learning in Edge Computing: A Comprehensive Survey ( http://arxiv.org/abs/2302.02573v2 )

ライセンス: Link先を確認

Jiajun Wu, Steve Drew, Fan Dong, Zhuangdi Zhu, Jiayu Zhou,

(参考訳) 5G/6Gアプリケーションの超低レイテンシ要件とプライバシ制約は、分散機械学習システムをエッジにデプロイすることを要求している。シンプルだが効果的なアプローチであるフェデレーションドラーニング(FL)は、分散トレーニングデータとプライベートトレーニングデータを使ったエッジコンピューティングにおける、巨大なユーザ所有デバイスに対する自然なソリューションである。 FedAvgをベースとしたFL法は典型的には、不安定なエッジコンピューティングアーキテクチャやトポロジーの不均一性や階層性を無視して、ナイーブな星トポロジーに従う。他にもいくつかのネットワークトポロジーが存在し、恒星トポロジーの限界とボトルネックに対処することができる。これは、ネットワークトポロジに関連するFLソリューションを調査する動機となります。本稿では,ネットワークトポロジに着目した既存のFL作品の包括的調査を行う。 FLおよびエッジコンピューティングネットワークの概要を概説した後、様々なエッジネットワークトポロジとそれらの利点とデメリットについて論じる。最後に、FLをトポロジ固有のエッジネットワークに適用するための課題と今後の課題について論じる。

The ultra-low latency requirements of 5G/6G applications and privacy constraints call for distributed machine learning systems to be deployed at the edge. With its simple yet effective approach, federated learning (FL) is a natural solution for massive user-owned devices in edge computing with distributed and private training data. FL methods based on FedAvg typically follow a naive star topology, ignoring the heterogeneity and hierarchy of the volatile edge computing architectures and topologies in reality. Several other network topologies exist and can address the limitations and bottlenecks of the star topology. This motivates us to survey network topology-related FL solutions. In this paper, we conduct a comprehensive survey of the existing FL works focusing on network topologies. After a brief overview of FL and edge computing networks, we discuss various edge network topologies as well as their advantages and disadvantages. Lastly, we discuss the remaining challenges and future works for applying FL to topology-specific edge networks.
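The star topology mentioned above corresponds to a single server averaging all client models. A minimal sketch of that weighted averaging step, assuming each client reports a flat parameter list and its local sample count (the function name `fedavg` and the data layout are illustrative assumptions):

```python
def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(size * params[i] for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical clients holding 1 and 3 samples respectively.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In a hierarchical or decentralized topology, the same averaging would presumably be applied per edge aggregator or per neighborhood rather than once at a central server.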

翻訳日:2024-06-19 13:20:03 公開日:2024-06-17

# デビアスのためのバックドア:バックドアアタックに基づく人工バイアスによるモデルバイアスの緩和

Backdoor for Debias: Mitigating Model Bias with Backdoor Attack-based Artificial Bias ( http://arxiv.org/abs/2303.01504v2 )

ライセンス: Link先を確認

Shangxi Wu, Qiuyang He, Dongyuan Lu, Jian Yu, Jitao Sang,

(参考訳) ディープラーニングの急速な進歩により、最先端のアルゴリズムは様々な社会的状況で利用されてきた。それでも、いくつかのアルゴリズムはバイアスを示し、不平等な結果をもたらすことが発見されている。現在のデバイアス法では、データの低利用や複雑なトレーニング要件といった課題に直面している。本研究では, バックドア攻撃により, 標準訓練によるモデルバイアスに類似した人工バイアスが構築できることを見出した。バックドア・トリガーの強い調整性を考えると、バックドア・アタックから生じるリバース・人工バイアスを慎重に設計することでモデルバイアスを緩和する動機がある。そこで本研究では,知識蒸留に基づくバックドア脱バイアスフレームワークを提案し,モデルバイアスを元のデータから効果的に低減し,バックドア攻撃によるセキュリティリスクを最小限に抑える。提案手法は、画像と構造化されたデータセットの両方で検証され、有望な結果を示す。この作業はバックドア攻撃の理解を深め、有益なアプリケーションの可能性を強調します。この研究のコードは \url{https://anonymous.4open.science/r/DwB-BC07/} で見ることができる。

With the swift advancement of deep learning, state-of-the-art algorithms have been utilized in various social situations. Nonetheless, some algorithms have been discovered to exhibit biases and provide unequal results. The current debiasing methods face challenges such as poor utilization of data or intricate training requirements. In this work, we found that the backdoor attack can construct an artificial bias similar to the model bias derived in standard training. Considering the strong adjustability of backdoor triggers, we are motivated to mitigate the model bias by carefully designing reverse artificial bias created from backdoor attack. Based on this, we propose a backdoor debiasing framework based on knowledge distillation, which effectively reduces the model bias from original data and minimizes security risks from the backdoor attack. The proposed solution is validated on both image and structured datasets, showing promising results. This work advances the understanding of backdoor attacks and highlights their potential for beneficial applications. The code for the study can be found at https://anonymous.4open.science/r/DwB-BC07/.

翻訳日:2024-06-19 13:20:03 公開日:2024-06-17

# 連続グロモフ・ワッサーシュタイン問題の解法

Uncovering Challenges of Solving the Continuous Gromov-Wasserstein Problem ( http://arxiv.org/abs/2303.05978v2 )

ライセンス: Link先を確認

Xavier Aramayo Carrasco, Maksim Nekrashevich, Petr Mokrov, Evgeny Burnaev, Alexander Korotin,

(参考訳) 近年,Gromov-Wasserstein Optimal Transport (GWOT)問題がMLコミュニティの注目を集めている。この問題において、2つの(おそらく異なる)空間上で支えられる2つの分布が与えられたとき、それらの間の最も等距離写像を見つける必要がある。 GWOT の離散変種では、与えられた離散点の集合間の代入を学習する。より高度な連続定式化では、未知の連続分布のパラメトリックマッピングを、それらから派生したサンプルに基づいて復元することを目的としている。 GWOTの背後にある明らかな幾何学的直観は、いくつかの実用的なユースケースにとって自然な選択となり、提案された多くの解法がもたらされる。それらのいくつかは、問題の継続的バージョンを解決していると主張している。同時に、GWOTは理論上も数値上も難しいと悪名高い。さらに、既存の連続GWOTソルバは依然として離散技術に大きく依存している。どのようにして既存の手法がGWOT問題を解き放つか、どのような困難に遭遇し、どの条件で成功するか、という自然な疑問が生まれます。我々のベンチマーク論文はこれらの質問に答える試みである。特に、最も興味深く、議論の余地のあるセットアップとして、継続的GWOTに注目します。既存の連続GWOTアプローチをさまざまなシナリオでクラッシュテストし、結果を注意深く記録し分析し、問題を特定します。我々の研究結果は、科学コミュニティが依然として信頼性の高いGWOT解決器を欠いていることを実験的に証明している。この方向への第一歩として、離散技術に依存しない新しい連続GWOT法を提案し、競合者の問題を部分的に解決する。私たちのコードはhttps://github.com/Ark-130994/GW-Solversで公開されています。

Recently, the Gromov-Wasserstein Optimal Transport (GWOT) problem has attracted the special attention of the ML community. In this problem, given two distributions supported on two (possibly different) spaces, one has to find the most isometric map between them. In the discrete variant of GWOT, the task is to learn an assignment between given discrete sets of points. In the more advanced continuous formulation, one aims at recovering a parametric mapping between unknown continuous distributions based on i.i.d. samples derived from them. The clear geometrical intuition behind the GWOT makes it a natural choice for several practical use cases, giving rise to a number of proposed solvers. Some of them claim to solve the continuous version of the problem. At the same time, GWOT is notoriously hard, both theoretically and numerically. Moreover, all existing continuous GWOT solvers still heavily rely on discrete techniques. Natural questions arise: to what extent do existing methods unravel the GWOT problem, what difficulties do they encounter, and under which conditions are they successful? Our benchmark paper is an attempt to answer these questions. We specifically focus on the continuous GWOT as the most interesting and debatable setup. We crash-test existing continuous GWOT approaches on different scenarios, carefully record and analyze the obtained results, and identify issues. Our findings experimentally testify that the scientific community is still missing a reliable continuous GWOT solver, which necessitates further research efforts. As the first step in this direction, we propose a new continuous GWOT method which does not rely on discrete techniques and partially solves some of the problems of the competitors. Our code is available at https://github.com/Ark-130994/GW-Solvers.
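For the discrete variant, the quantity being minimised can be written down directly. The sketch below evaluates the standard squared-loss Gromov-Wasserstein objective for a given coupling; it is not a solver, and the function name `gw_cost` and the toy inputs are assumptions made for illustration.

```python
def gw_cost(dist_x, dist_y, coupling):
    """Squared-loss GW objective: sum over i,j,k,l of
    (dist_x[i][k] - dist_y[j][l])**2 * coupling[i][j] * coupling[k][l]."""
    n, m = len(dist_x), len(dist_y)
    total = 0.0
    for i in range(n):
        for j in range(m):
            for k in range(n):
                for l in range(m):
                    gap = dist_x[i][k] - dist_y[j][l]
                    total += gap * gap * coupling[i][j] * coupling[k][l]
    return total


if __name__ == "__main__":
    # Two identical two-point spaces with unit distance between the points.
    d = [[0.0, 1.0], [1.0, 0.0]]
    matched = [[0.5, 0.0], [0.0, 0.5]]      # isometric assignment
    blurred = [[0.25, 0.25], [0.25, 0.25]]  # independent coupling
    print(gw_cost(d, d, matched), gw_cost(d, d, blurred))
```

The isometric assignment reaches zero cost while the independent coupling does not, which matches the "most isometric map" intuition. The quadruple loop costs O(n²m²) per evaluation, so it is only meant for tiny examples.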

翻訳日:2024-06-19 13:20:03 公開日:2024-06-17

# FedML-HE: 効率的な同型暗号化に基づくプライバシ保護フェデレーション学習システム

FedML-HE: An Efficient Homomorphic-Encryption-Based Privacy-Preserving Federated Learning System ( http://arxiv.org/abs/2303.10837v3 )

ライセンス: Link先を確認

Weizhao Jin, Yuhang Yao, Shanshan Han, Jiajun Gu, Carlee Joe-Wong, Srivatsan Ravi, Salman Avestimehr, Chaoyang He,

(参考訳) Federated Learningは、ローカルデータではなく、ローカルモデルのアップデートを集約することによって、分散デバイス上の機械学習モデルをトレーニングする。しかし、サーバ上の集約されたローカルモデルは、反転攻撃によって機密性の高い個人情報を明らかにする可能性があるため、プライバシー上の懸念が生じる。ホモモルフィック暗号化(HE)のようなプライバシ保護手法はFLトレーニングに必要となる。 HEのプライバシー上の優位性にもかかわらず、そのアプリケーションは特に基礎モデルにおいて非現実的なオーバーヘッドに悩まされている。本稿では,効率的なHEベースの安全なモデルアグリゲーションを備えた最初の実践的フェデレーション学習システムであるFedML-HEを提案する。 FedML-HEは、機密パラメータを選択的に暗号化し、カスタマイズ可能なプライバシ保護を提供しながら、トレーニング中の計算と通信のオーバーヘッドを大幅に削減することを提案する。最適化されたシステムでは,特に大規模な基盤モデル(ResNet-50では10倍,BERTでは40倍程度)において,大幅なオーバーヘッド削減を実現しています。

Federated Learning trains machine learning models on distributed devices by aggregating local model updates instead of local data. However, privacy concerns arise as the aggregated local models on the server may reveal sensitive personal information by inversion attacks. Privacy-preserving methods, such as homomorphic encryption (HE), then become necessary for FL training. Despite HE's privacy advantages, its applications suffer from impractical overheads, especially for foundation models. In this paper, we present FedML-HE, the first practical federated learning system with efficient HE-based secure model aggregation. FedML-HE proposes to selectively encrypt sensitive parameters, significantly reducing both computation and communication overheads during training while providing customizable privacy preservation. Our optimized system demonstrates considerable overhead reduction, particularly for large foundation models (e.g., ~10x reduction for ResNet-50, and up to ~40x reduction for BERT), demonstrating the potential for scalable HE-based FL deployment.
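Selective encryption can be pictured as choosing which coordinates of a model update go through the expensive HE path. The sketch below only builds such a selection mask and splits an update accordingly. It performs no encryption, assumes per-parameter sensitivity scores are already available, and uses hypothetical names (`select_encryption_mask`, `split_update`) that are not FedML-HE's API.

```python
import math


def select_encryption_mask(sensitivity, ratio):
    """Mark the top `ratio` fraction of parameters (by sensitivity score) for encryption."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    n = len(sensitivity)
    k = math.ceil(ratio * n)
    top = sorted(range(n), key=lambda i: sensitivity[i], reverse=True)[:k]
    mask = [False] * n
    for i in top:
        mask[i] = True
    return mask


def split_update(update, mask):
    """Split an update into the part to encrypt and the part sent in plaintext."""
    to_encrypt = {i: v for i, (v, m) in enumerate(zip(update, mask)) if m}
    plaintext = {i: v for i, (v, m) in enumerate(zip(update, mask)) if not m}
    return to_encrypt, plaintext


if __name__ == "__main__":
    mask = select_encryption_mask([0.1, 0.9, 0.3, 0.7], ratio=0.5)
    print(mask)
    print(split_update([1.0, 2.0, 3.0, 4.0], mask))
```

A smaller `ratio` lowers the encryption cost at the price of exposing more coordinates in plaintext, which is the privacy/overhead trade-off the abstract describes as customizable.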

翻訳日:2024-06-19 13:20:03 公開日:2024-06-17

# StepMix: 外部変数を持つ一般化混合モデルの擬似的推定のためのPythonパッケージ

StepMix: A Python Package for Pseudo-Likelihood Estimation of Generalized Mixture Models with External Variables ( http://arxiv.org/abs/2304.03853v6 )

ライセンス: Link先を確認

Sacha Morin, Robin Legault, Félix Laliberté, Zsuzsa Bakk, Charles-Édouard Giguère, Roxane de la Sablonnière, Éric Lacourse,

(参考訳) StepMixは、外部変数(共変量および遠位結果)を持つ一般化有限混合モデル(潜時プロファイルおよび潜時クラス分析)の擬似的様相推定(1段階、2段階、3段階のアプローチ)のためのオープンソースのPythonパッケージである。社会科学における多くの応用において、主な目的は個人を潜在クラスにクラスタリングするだけでなく、これらのクラスを使用してより複雑な統計モデルを開発することである。これらのモデルは一般に、潜在クラスを観察された指標に関連付ける測定モデルと、共変量と結果変数を潜在クラスに関連付ける構造モデルに分けられる。測定と構造モデルは、いわゆるワンステップアプローチやステップワイド手法を用いて、共同で推定することができる。 1段階法に加えて、Bolck-Croon-Hagenaarsを用いたバイアス調整3段階法や最尤補正、より最近の2段階法など、文献から最も重要なステップワイズ推定手法を実装している。これらの擬似的様相推定器は、特定の期待-最大化サブルーチンとして統一された枠組みの下で提示される。データサイエンスコミュニティにおける彼らの採用を促進するため、StepMixはscikit-learnライブラリのオブジェクト指向設計に従い、追加のRラッパーを提供する。

StepMix is an open-source Python package for the pseudo-likelihood estimation (one-, two- and three-step approaches) of generalized finite mixture models (latent profile and latent class analysis) with external variables (covariates and distal outcomes). In many applications in social sciences, the main objective is not only to cluster individuals into latent classes, but also to use these classes to develop more complex statistical models. These models generally divide into a measurement model that relates the latent classes to observed indicators, and a structural model that relates covariates and outcome variables to the latent classes. The measurement and structural models can be estimated jointly using the so-called one-step approach or sequentially using stepwise methods, which present significant advantages for practitioners regarding the interpretability of the estimated latent classes. In addition to the one-step approach, StepMix implements the most important stepwise estimation methods from the literature, including the bias-adjusted three-step methods with Bolck-Croon-Hagenaars and maximum likelihood corrections and the more recent two-step approach. These pseudo-likelihood estimators are presented in this paper under a unified framework as specific expectation-maximization subroutines. To facilitate and promote their adoption among the data science community, StepMix follows the object-oriented design of the scikit-learn library and provides an additional R wrapper.

翻訳日:2024-06-19 13:10:19 公開日:2024-06-17

# PiClick: クリックベースのインタラクティブセグメンテーションにおいて、複数の候補から望ましいマスクを選択する

PiClick: Picking the desired mask from multiple candidates in click-based interactive segmentation ( http://arxiv.org/abs/2304.11609v5 )

ライセンス: Link先を確認

Cilin Yan, Haochen Wang, Jie Liu, Xiaolong Jiang, Yao Hu, Xu Tang, Guoliang Kang, Efstratios Gavves,

(参考訳) クリックベースのインタラクティブセグメンテーションは、効率的なピクセルレベルのアノテーションと画像編集を容易にする、人間のクリックによるターゲットマスクの生成を目的としている。そのようなタスクでは、目的の曖昧さはセグメンテーションの精度と効率を妨げる問題のままである。つまり、リッチなコンテキストのシーンでは、1クリックで複数の潜在的なターゲットに対応できるが、従来の対話型セグメンタは1つのマスクしか生成せず、ターゲットの曖昧さに対処できない。本稿では,PiClickという対話型セグメンテーションネットワークを提案する。具体的には、Transformerベースのアーキテクチャを使用して、相互に対話的なマスククエリによって、潜在的なすべてのマスクを生成する。さらに、ターゲット推論モジュール(TRM)がPiClickで設計され、ターゲットの曖昧さと外的努力を軽減し、すべての候補からユーザ希望のマスクを自動的に提案する。 9つのインタラクティブなセグメンテーションデータセットに関する大規模な実験は、セグメンテーション結果を考慮して、PiClickが従来の最先端技術に対して好適に機能することを示した。さらに,PiClickは,所望のマスクのアノテートや選択において,人間の努力を効果的に削減することを示す。 PiClickのソースコードをhttps://github.com/cilinyan/PiClickのプラグイン・アンド・プレイアノテーションツールと一緒にリリースします。

Click-based interactive segmentation aims to generate target masks via human clicking, which facilitates efficient pixel-level annotation and image editing. In such a task, target ambiguity remains a problem hindering the accuracy and efficiency of segmentation. That is, in scenes with rich context, one click may correspond to multiple potential targets, while most previous interactive segmentors only generate a single mask and fail to deal with target ambiguity. In this paper, we propose a novel interactive segmentation network named PiClick, to yield all potentially reasonable masks and suggest the most plausible one for the user. Specifically, PiClick utilizes a Transformer-based architecture to generate all potential target masks by mutually interactive mask queries. Moreover, a Target Reasoning Module (TRM) is designed in PiClick to automatically suggest the user-desired mask from all candidates, relieving target ambiguity and extra human effort. Extensive experiments on 9 interactive segmentation datasets demonstrate that PiClick performs favorably against previous state-of-the-art methods in terms of segmentation results. Moreover, we show that PiClick effectively reduces human effort in annotating and picking the desired masks. To ease the usage and inspire future research, we release the source code of PiClick together with a plug-and-play annotation tool at https://github.com/cilinyan/PiClick.

翻訳日:2024-06-19 13:10:19 公開日:2024-06-17

# 家系図のトポロジーと会員の満足感 : 機械学習アプローチ

The Topology of a Family Tree Graph and Its Members' Satisfaction with One Another: A Machine Learning Approach ( http://arxiv.org/abs/2305.01552v2 )

ライセンス: Link先を確認

Teddy Lazebnik, Amit Yaniv-Rosenfeld,

(参考訳) 家族同士の満足感は、健全で支援的な家族環境作りの中心である。本研究では,ある家系図のトポロジと,そのメンバーの満足度との関係を探索する新しい計算手法を提案し,実装する。広範な実証評価(N=486$ family)を通じて,提案手法は,家族間の満足度を家族グラフのトポロジのみに基づいて予測する上で,高精度な結果をもたらすことを示した。さらに,本手法は,家族の満足度に係わる確立した特徴に依拠するベースライン回帰モデルと比較して,先行文献において好意的に比較した。

Family members' satisfaction with one another is central to creating healthy and supportive family environments. In this work, we propose and implement a novel computational technique aimed at exploring the possible relationship between the topology of a given family tree graph and its members' satisfaction with one another. Through an extensive empirical evaluation ($N=486$ families), we show that the proposed technique brings about highly accurate results in predicting family members' satisfaction with one another based solely on the family graph's topology. Furthermore, the results indicate that our technique favorably compares to baseline regression models which rely on established features associated with family members' satisfaction with one another in prior literature.

翻訳日:2024-06-19 13:10:19 公開日:2024-06-17

# 説明可能な人工知能手法の展望:SHAPとLIME

A Perspective on Explainable Artificial Intelligence Methods: SHAP and LIME ( http://arxiv.org/abs/2305.02012v3 )

ライセンス: Link先を確認

Ahmed Salih, Zahra Raisi-Estabragh, Ilaria Boscolo Galazzo, Petia Radeva, Steffen E. Petersen, Gloria Menegaz, Karim Lekadir,

(参考訳) eXplainable AI(XAI)メソッドは、機械学習(ML)モデルのブラックボックスを、より消化しやすい形式に変換するために登場した。これらの方法は、MLモデルをより透過的にし、エンドユーザの信頼をアウトプットに高めることを目的として、モデルがどのように機能するかを伝えるのに役立つ。 SHapley Additive exPlanations (SHAP) と Local Interpretable Model Agnostic Explanation (LIME) は2つの広く使われているXAI法である。本稿では、これらの2つの手法の説明可能性指標の生成方法について論じ、その弱点と強みを浮き彫りにして、それらの出力を解釈する枠組みを提案する。具体的には, 生体医学領域(心筋梗塞の有無にかかわらず個人を分類する)の事例から, モデル依存性と特徴間のコリニアリティの有無について考察した。以上の結果から,SHAPとLIMEはMLモデルや特徴コリナリティーの影響を強く受けており,その使用法や解釈に注意を喚起している。

eXplainable artificial intelligence (XAI) methods have emerged to convert the black box of machine learning (ML) models into a more digestible form. These methods help to communicate how the model works with the aim of making ML models more transparent and increasing the trust of end-users in their output. SHapley Additive exPlanations (SHAP) and Local Interpretable Model Agnostic Explanation (LIME) are two widely used XAI methods, particularly with tabular data. In this perspective piece, we discuss the way the explainability metrics of these two methods are generated and propose a framework for interpretation of their outputs, highlighting their weaknesses and strengths. Specifically, we discuss their outcomes in terms of model-dependency and in the presence of collinearity among the features, relying on a case study from the biomedical domain (classification of individuals with or without myocardial infarction). The results indicate that SHAP and LIME are highly affected by the adopted ML model and feature collinearity, raising a note of caution on their usage and interpretation.
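To make the attribution being discussed concrete, the sketch below computes exact Shapley values for a tiny model by enumerating feature coalitions, with absent features replaced by a baseline value. This is the textbook definition rather than the approximations used in the SHAP library, and the toy linear model and the function name `shapley_values` are assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_model(v):
    """Hypothetical linear model; the third feature is ignored."""
    return 2.0 * v[0] + 3.0 * v[1] + 0.0 * v[2]


if __name__ == "__main__":
    print(shapley_values(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

The attributions depend entirely on the function `f` being explained, so two models fit to the same data can yield different values for the same feature, which is the model-dependency the abstract cautions about. The enumeration is exponential in the number of features and is only practical for very small inputs.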

翻訳日:2024-06-19 13:10:19 公開日:2024-06-17

# サブスペース学習によるブラックボックスプロンプトチューニング

Black-box Prompt Tuning with Subspace Learning ( http://arxiv.org/abs/2305.03518v2 )

ライセンス: Link先を確認

Yuanhang Zheng, Zhixing Tan, Peng Li, Yang Liu,

(参考訳) ブラックボックスプロンプトチューニングは、大言語モデル(LLM)のネットワークをバックプロパゲートするのではなく、低次元のサブ空間内でプロンプトを学習するためにデリバティブフリー最適化アルゴリズムを用いる。近年の研究では、ブラックボックスのプロンプトチューニングはタスクやLLM間の汎用性に欠けており、これは部分空間の最適下選択に関係していると考えられている。本稿では,サブスペース・ラーニング(BSL)を用いたブラックボックス・プロンプト・チューニングを導入し,ブラックボックス・プロンプト・チューニングの汎用性を高める。類似したタスクのほぼ最適なプロンプトが共通部分空間に存在するという仮定に基づいて、類似したタスクのコレクション上でメタラーニングによってそのようなサブスペースを特定することを提案する。したがって、ソースタスクと類似性を共有するターゲットタスクに対しては、特定したサブスペース内での最適化により、ターゲットタスクに対して良好に動作するプロンプトが得られることを期待する。実験の結果,BSL フレームワークは様々な下流タスクや LLM の競合性能を一貫して達成していることがわかった。

Black-box prompt tuning employs derivative-free optimization algorithms to learn prompts within low-dimensional subspaces rather than back-propagating through the network of Large Language Models (LLMs). Recent studies reveal that black-box prompt tuning lacks versatility across tasks and LLMs, which we believe is related to the suboptimal choice of subspaces. In this paper, we introduce Black-box prompt tuning with Subspace Learning (BSL) to enhance the versatility of black-box prompt tuning. Based on the assumption that nearly optimal prompts for similar tasks reside in a common subspace, we propose identifying such subspaces through meta-learning on a collection of similar source tasks. Consequently, for a target task that shares similarities with the source tasks, we expect that optimizing within the identified subspace can yield a prompt that performs well on the target task. Experimental results confirm that our BSL framework consistently achieves competitive performance across various downstream tasks and LLMs.
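The basic mechanism of black-box prompt tuning, searching a low-dimensional vector that is projected up to the prompt embedding and scored only through model queries, can be sketched as below. This is a generic illustration and not the BSL method: the projection is random rather than meta-learned, the optimizer is a simple (1+1) evolution strategy, and `black_box_loss` is a stand-in for querying an LLM.

```python
import math
import random


def make_projection(prompt_dim, subspace_dim, rng):
    """Random matrix mapping a low-dimensional vector z to a prompt embedding."""
    scale = 1.0 / math.sqrt(subspace_dim)
    return [[rng.gauss(0.0, scale) for _ in range(subspace_dim)] for _ in range(prompt_dim)]


def project(matrix, z):
    return [sum(a * zi for a, zi in zip(row, z)) for row in matrix]


def black_box_loss(prompt):
    """Stand-in for a model query: squared distance to a hidden target prompt."""
    return sum((p - 0.1 * i) ** 2 for i, p in enumerate(prompt))


def tune(loss, matrix, subspace_dim, steps, sigma, rng):
    """(1+1) evolution strategy: keep a Gaussian perturbation of z only if the loss improves."""
    z = [0.0] * subspace_dim
    best = loss(project(matrix, z))
    for _ in range(steps):
        candidate = [zi + rng.gauss(0.0, sigma) for zi in z]
        value = loss(project(matrix, candidate))
        if value < best:
            z, best = candidate, value
    return z, best


if __name__ == "__main__":
    rng = random.Random(0)
    A = make_projection(prompt_dim=20, subspace_dim=4, rng=rng)
    start = black_box_loss(project(A, [0.0] * 4))
    z, best = tune(black_box_loss, A, 4, steps=500, sigma=0.3, rng=rng)
    print(start, best)
```

How low the loss can go is bounded by what the chosen subspace can express, which is why the choice of subspace matters; in the approach described above the projection would be learned from similar source tasks instead of sampled at random.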

翻訳日:2024-06-19 13:10:19 公開日:2024-06-17
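
The idea of learning a prompt inside a low-dimensional subspace with derivative-free optimization can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the random projection `A` stands in for the subspace (BSL instead identifies it by meta-learning on source tasks), `black_box_loss` stands in for querying an LLM, and the greedy random search is a generic derivative-free optimizer, not necessarily the one used in the paper.

```python
import random


def make_projection(full_dim, sub_dim, rng):
    """Random matrix A (hypothetical) mapping a low-dimensional z to a full-size prompt."""
    scale = 1.0 / sub_dim ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(sub_dim)] for _ in range(full_dim)]


def project(A, z):
    """Compute the prompt vector A z."""
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]


def black_box_loss(prompt, target):
    """Stand-in for an LLM query: only a scalar loss is observed, no gradients."""
    return sum((p - t) ** 2 for p, t in zip(prompt, target))


def tune_in_subspace(A, target, steps=300, sigma=0.3, seed=0):
    """Greedy random search over z; a candidate is kept only if the loss does not increase."""
    rng = random.Random(seed)
    z = [0.0] * len(A[0])
    best = black_box_loss(project(A, z), target)
    history = [best]
    for _ in range(steps):
        candidate = [zj + rng.gauss(0.0, sigma) for zj in z]
        loss = black_box_loss(project(A, candidate), target)
        if loss <= best:
            z, best = candidate, loss
        history.append(best)
    return z, history


rng = random.Random(42)
A = make_projection(full_dim=20, sub_dim=4, rng=rng)
target = project(A, [1.0, -0.5, 0.25, 2.0])  # a prompt that lies inside the subspace
z, history = tune_in_subspace(A, target)
print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.4f}")
```

Only the 4-dimensional vector `z` is searched, while the 20-dimensional prompt is never optimized directly. If the subspace does not contain a good prompt for the task, no amount of search over `z` can recover one, which is the motivation the abstract gives for learning the subspace.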

# 暗闇における雨除去と低照度強調の同時処理のための二重劣化表現

Dual Degradation Representation for Joint Deraining and Low-Light Enhancement in the Dark ( http://arxiv.org/abs/2305.03997v3 )

ライセンス: Link先を確認

Xin Lin, Jingtong Yue, Sixian Ding, Chao Ren, Lu Qi, Ming-Hsuan Yang,

(参考訳) 暗闇の中での雨は、自動運転や監視システム、夜間撮影など、現実世界のアプリケーションを展開する上で大きな課題となる。既存の低照度強調手法や雨除去手法は、低照度条件を明るくすることと雨を取り除くことを同時に行うのが難しい。また、「雨除去の後に低照度強調」やその逆といったカスケード型のアプローチは、しばしば問題のある雨パターンや、過度にぼやけた露出過多の画像をもたらす。これらの課題に対処するために、実環境における低照度強調と雨除去の両方を扱うエンドツーエンドモデルL$^{2}$RIRNetを導入する。本モデルは、DDR-Net(Dual Degradation Representation Network)とRestoration Networkの2つの主要コンポーネントを特徴とする。DDR-Netは、暗い領域の輝度効果と明るい領域の雨パターンの劣化表現を独立に学習し、二重劣化損失を用いて学習過程を導く。Restoration Networkは、FDG(Fourier Detail Guidance)モジュールを用いて劣化した画像を復元する。FDGモジュールは雨がほとんどない詳細画像を活用し、周波数領域と空間領域のテクスチャの詳細に着目して復元過程に情報を与える。さらに、合成画像と実世界画像の両方を含む低照度雨天画像のデータセットを提供する。広範な実験により、L$^{2}$RIRNetは合成シナリオと複雑な実世界シナリオの両方において既存の手法に対して良好な性能を示すことが分かった。すべてのコードとデータセットは \url{https://github.com/linxin0/Low_light_rainy} で見ることができる。

Rain in the dark poses a significant challenge to deploying real-world applications such as autonomous driving, surveillance systems, and night photography. Existing low-light enhancement or deraining methods struggle to brighten low-light conditions and remove rain simultaneously. Additionally, cascade approaches like ``deraining followed by low-light enhancement'' or the reverse often result in problematic rain patterns or overly blurred and overexposed images. To address these challenges, we introduce an end-to-end model called L$^{2}$RIRNet, designed to manage both low-light enhancement and deraining in real-world settings. Our model features two main components: a Dual Degradation Representation Network (DDR-Net) and a Restoration Network. The DDR-Net independently learns degradation representations for luminance effects in dark areas and rain patterns in light areas, employing dual degradation loss to guide the training process. The Restoration Network restores the degraded image using a Fourier Detail Guidance (FDG) module, which leverages near-rainless detailed images, focusing on texture details in frequency and spatial domains to inform the restoration process. Furthermore, we contribute a dataset containing both synthetic and real-world low-light-rainy images. Extensive experiments demonstrate that our L$^{2}$RIRNet performs favorably against existing methods in both synthetic and complex real-world scenarios. All the code and dataset can be found in \url{https://github.com/linxin0/Low_light_rainy}.

翻訳日:2024-06-19 13:10:19 公開日:2024-06-17

# 大規模言語モデルのための自己教師型論理強化学習の探索

Exploring Self-supervised Logic-enhanced Training for Large Language Models ( http://arxiv.org/abs/2305.13718v7 )

ライセンス: Link先を確認

Fangkai Jiao, Zhiyang Teng, Bosheng Ding, Zhengyuan Liu, Nancy F. Chen, Shafiq Joty,

(参考訳) 言語モデルの論理的推論能力を改善する既存の取り組みは、主に教師付き微調整に依存しており、新しいドメインやタスクへの一般化を妨げている。LLM(Large Language Models)の発展は、豊富な知識を単一のプロキシに圧縮する能力を示し、複数のタスクに効果的に対処できるようにした。しかし予備実験では、LLMは論理的推論の能力を示さなかった。論理的推論ベンチマークにおけるLLMの性能は、既存の最先端のベースラインに大きく遅れている。本稿では、自己教師付きポストトレーニングによって論理知識を組み込み、コンテキスト内学習によってそれを活性化することの実現可能性を初めて検討し、これをLogicLLMと呼ぶ。具体的には、MERItの自己回帰型目的関数の変種を考案し、パラメータサイズが30億から130億の2つのLLM系列、すなわちFLAN-T5とLLaMAに統合する。2つの挑戦的な論理的推論ベンチマークの結果は、LogicLLMの有効性を示している。さらに、論理指向のプロキシタスクを設計する上で重要な要素を分析するために、広範なアブレーション研究を行っている。

Existing efforts to improve logical reasoning ability of language models have predominantly relied on supervised fine-tuning, hindering generalization to new domains and/or tasks. The development of Large Language Models (LLMs) has demonstrated the capacity of compressing abundant knowledge into a single proxy, enabling them to tackle multiple tasks effectively. Our preliminary experiments, nevertheless, show that LLMs do not show capability on logical reasoning. The performance of LLMs on logical reasoning benchmarks is far behind the existing state-of-the-art baselines. In this paper, we make the first attempt to investigate the feasibility of incorporating logical knowledge through self-supervised post-training, and activating it via in-context learning, which we term LogicLLM. Specifically, we devise an auto-regressive objective variant of MERIt and integrate it with two LLM series, i.e., FLAN-T5 and LLaMA, with parameter size ranging from 3 billion to 13 billion. The results on two challenging logical reasoning benchmarks demonstrate the effectiveness of LogicLLM. Besides, we conduct extensive ablation studies to analyze the key factors in designing logic-oriented proxy tasks.

翻訳日:2024-06-19 13:00:15 公開日:2024-06-17

# 3次元点雲のラベル有効深層学習に関する調査研究

A Survey of Label-Efficient Deep Learning for 3D Point Clouds ( http://arxiv.org/abs/2305.19812v2 )

ライセンス: Link先を確認

Aoran Xiao, Xiaoqin Zhang, Ling Shao, Shijian Lu,

(参考訳) 過去10年間で、深層ニューラルネットワークは、ポイントクラウド学習において大きな進歩を遂げてきた。しかし、大規模で正確に注釈付けされたトレーニングデータの収集は非常に困難で費用がかかるため、既存のポイントクラウドデータセットのスケーラビリティが損なわれ、さまざまなタスクやアプリケーションにおけるポイントクラウドデータの効率的な探索のボトルネックとなる。ラベル効率のよい学習は、大幅に削減されたアノテーション労力で効果的なディープネットワークトレーニングを可能にすることで、有望なソリューションを提供する。本稿では、ポイントクラウドのラベル効率学習に関する初の包括的調査を行う。この新興研究分野における3つの重要な疑問に対処する。(i) ポイントクラウド処理におけるラベル効率学習の重要性と緊急性、(ii) それが包含するサブフィールド、(iii) この分野での進歩である。そこで本研究では、ラベルの種類によって提供されるデータ前提条件に基づいて、ラベル効率のよい学習手法を整理する分類法を提案する。データ拡張、ドメイン転移学習、弱教師付き学習、事前訓練された基礎モデルという、ポイントクラウドのアノテーション労力を著しく削減する4つの典型的なラベル効率学習手法を分類する。それぞれのアプローチについて、問題設定の概要と、関連する進展と課題を示す広範な文献レビューを提供する。最後に、現在の研究課題と今後の方向性についての洞察を共有する。この調査に関連するプロジェクトは https://github.com/xiaoaoran/3D_label_efficient_learning に構築されている。

In the past decade, deep neural networks have achieved significant progress in point cloud learning. However, collecting large-scale precisely-annotated training data is extremely laborious and expensive, which hinders the scalability of existing point cloud datasets and poses a bottleneck for efficient exploration of point cloud data in various tasks and applications. Label-efficient learning offers a promising solution by enabling effective deep network training with much-reduced annotation efforts. This paper presents the first comprehensive survey of label-efficient learning of point clouds. We address three critical questions in this emerging research field: i) the importance and urgency of label-efficient learning in point cloud processing, ii) the subfields it encompasses, and iii) the progress achieved in this area. To achieve this, we propose a taxonomy that organizes label-efficient learning methods based on the data prerequisites provided by different types of labels. We categorize four typical label-efficient learning approaches that significantly reduce point cloud annotation efforts: data augmentation, domain transfer learning, weakly-supervised learning, and pretrained foundation models. For each approach, we outline the problem setup and provide an extensive literature review that showcases relevant progress and challenges. Finally, we share insights into current research challenges and potential future directions. A project associated with this survey has been built at https://github.com/xiaoaoran/3D_label_efficient_learning.

翻訳日:2024-06-19 13:00:14 公開日:2024-06-17

# ArtWhisperer:芸術創造における人間とAIのインタラクションを特徴付けるデータセット

ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations ( http://arxiv.org/abs/2306.08141v4 )

ライセンス: Link先を確認

Kailas Vodrahalli, James Zou,

(参考訳) 生成AIがより普及するにつれて、人間のユーザがそのようなモデルとどのように相互作用するかを研究することが重要である。本研究では、人々が目的のターゲット画像を生成するためにテキスト・ツー・イメージ・モデルをどのように利用するかを検討する。このインタラクションを研究するために、ArtWhispererというオンラインゲームを作成した。このゲームでは、ユーザにターゲット画像が与えられ、ターゲットに似た画像を生成するプロンプトを反復的に探すことが課される。このゲームを通して、5万以上の人間とAIのインタラクションを記録した。各インタラクションは、ユーザが作成した1つのテキストプロンプトと、それに対応する生成画像に対応する。その大部分は、ユーザがターゲット画像に最適なプロンプトを見つけるために繰り返すインタラクションであり、これは人間とAIのコラボレーションを研究するためのユニークなシーケンシャルデータセットとなっている。本データセットの初期分析では、プロンプトのインタラクションとユーザ戦略のいくつかの特徴を同定する。人々は多様なプロンプトを提出し、類似した画像を生成するさまざまなテキスト記述を発見できる。興味深いことに、ユーザがより良いプロンプトを見つけても、プロンプトの多様性は低下しない。さらに、我々のデータセットを用いてAIのステアビリティ(steerability)を定量化する新しい指標を提案する。我々は、タスクを適切に完了させるために必要なインタラクションの期待回数としてステアビリティを定義する。この値は、各ターゲットタスクにマルコフ連鎖を適合させ、マルコフ連鎖において適切なスコアに到達するまでの期待時間を計算することで推定する。我々は、異なるタイプのターゲット画像と2つの異なるモデルでAIのステアビリティを定量化し比較し、都市や自然界の画像が芸術的な画像やファンタジー画像よりもステアビリティが高いことを発見した。これらの知見は、人間とAIのインタラクション行動に関する洞察を与え、AIのステアビリティを評価する具体的な方法を示し、ArtWhispererデータセットの汎用性を実証する。

As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image as the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to quantify the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.

翻訳日:2024-06-19 13:00:14 公開日:2024-06-17
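
The steerability estimate described above (expected time to reach an adequate score in a fitted Markov chain) can be sketched as a hitting-time computation. This is a minimal sketch under stated assumptions: the transition matrix `P` is taken as already estimated, the score bins and the numbers in `P` are made up, and whether the first prompt counts as an interaction is a convention the abstract does not spell out. The function solves h_i = 1 + sum_j P[i][j] h_j for every non-target state, with h = 0 at the target.

```python
def expected_steps_to_target(P, target):
    """Expected number of transitions needed to first reach `target` from each state.

    P is a row-stochastic transition matrix given as a list of lists.
    """
    n = len(P)
    others = [i for i in range(n) if i != target]
    m = len(others)
    # Augmented system (I - Q) h = 1, where Q is P restricted to non-target states.
    M = [[(1.0 if a == b else 0.0) - P[a][b] for b in others] + [1.0] for a in others]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("target is not reachable from every state")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    h = [0.0] * n
    for idx, state in enumerate(others):
        h[state] = M[idx][m] / M[idx][idx]
    return h


# Hypothetical score bins: 0 = poor, 1 = close, 2 = adequate (absorbing target).
P = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.0, 0.0, 1.0],
]
steps = expected_steps_to_target(P, target=2)
print([round(s, 3) for s in steps])
```

Under this reading, a smaller expected number of steps from the starting state corresponds to a more steerable target.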

# RoMe:メッシュ表現による大規模道路表面再構築に向けて

RoMe: Towards Large Scale Road Surface Reconstruction via Mesh Representation ( http://arxiv.org/abs/2306.11368v3 )

ライセンス: Link先を確認

Ruohong Mei, Wei Sui, Jiaxin Zhang, Xue Qin, Gang Wang, Tao Peng, Cong Yang,

(参考訳) 自動運転アプリケーションでは、正確で効率的な道路表面の再構築が最重要である。本稿では、大規模な道路表面の堅牢な再構築を目的とした新しいフレームワークであるRoMeを紹介する。独自のメッシュ表現を活用することで、再構築された道路表面が正確で、セマンティクスとシームレスに整合することを保証する。計算効率の課題に対処するため、RoMeがサブエリアに着目し、その後それらをマージすることで広大な環境を再構築できるウェイポイントサンプリング戦略を提案する。さらに、外部パラメータのキャリブレーションにおける不正確さに対するロバスト性を高めるために、外部パラメータ最適化モジュールを組み込んだ。公開データセットと実環境データの両方に対する広範な評価は、速度、正確性、堅牢性の点でRoMeの優位性を示している。たとえば、何千もの画像から600*600平方メートルの道路表面を復元するのに2GPU時間しかかからない。特に、RoMeの機能は単なる再構築を超えて、自動運転アプリケーションにおける自動ラベリングタスクに重要な価値を提供する。関連するすべてのデータとコードは https://github.com/DRosemei/RoMe で入手できる。

In autonomous driving applications, accurate and efficient road surface reconstruction is paramount. This paper introduces RoMe, a novel framework designed for the robust reconstruction of large-scale road surfaces. Leveraging a unique mesh representation, RoMe ensures that the reconstructed road surfaces are accurate and seamlessly aligned with semantics. To address challenges in computational efficiency, we propose a waypoint sampling strategy, enabling RoMe to reconstruct vast environments by focusing on sub-areas and subsequently merging them. Furthermore, we incorporate an extrinsic optimization module to enhance the robustness against inaccuracies in extrinsic calibration. Our extensive evaluations of both public datasets and wild data underscore RoMe's superiority in terms of speed, accuracy, and robustness. For instance, it costs only 2 GPU hours to recover a road surface of 600*600 square meters from thousands of images. Notably, RoMe's capability extends beyond mere reconstruction, offering significant value for autolabeling tasks in autonomous driving applications. All related data and code are available at https://github.com/DRosemei/RoMe.

翻訳日:2024-06-19 13:00:14 公開日:2024-06-17

# HEDI: 切開ヘルニア修復のための生体力学的評価・可視化ツールの初回臨床応用と結果

HEDI: First-Time Clinical Application and Results of a Biomechanical Evaluation and Visualisation Tool for Incisional Hernia Repair ( http://arxiv.org/abs/2307.01502v2 )

ライセンス: Link先を確認

Philipp D. Lösel, Jacob J. Relle, Samuel Voß, Ramesch Raschidi, Regine Nessel, Johannes Görich, Mark O. Wielpütz, Thorsten Löffler, Vincent Heuveline, Friedrich Kallinowski,

(参考訳) 腹壁欠損は、しばしば痛み、不快感、切開ヘルニアの再発を招き、世界中で重大な罹患と繰り返しの外科的修復をもたらしている。大きなヘルニアに対するメッシュ修復は通常、固定された重なり幅と欠損面積に基づいており、筋肉の活性化、腹腔内圧、組織弾性、腹壁の伸展といった生体力学的要因を無視している。この問題を解決するため、不安定な腹壁を考慮に入れた切開ヘルニア修復に対する生体力学的アプローチを提案する。また、Valsalva法を伴うCTを用いて、ヘルニアの大きさ、容積、腹壁の不安定性を自動検出・評価するツールHEDIも導入する。31例の術前評価におけるHEDIの初回臨床応用では、報告されている成功率と比較して有意に改善した成功率が示され、3年間の追跡後も全例で痛みがなく、ヘルニアの再発もなかった。

Abdominal wall defects often lead to pain, discomfort, and recurrence of incisional hernias, resulting in significant morbidity and repeated surgical repairs worldwide. Mesh repair for large hernias is usually based on the defect area with a fixed overlap, neglecting biomechanical factors such as muscle activation, intra-abdominal pressure, tissue elasticity, and abdominal wall distension. To address this issue, we present a biomechanical approach to incisional hernia repair that takes into account the unstable abdominal wall. Additionally, we introduce HEDI, a tool that uses computed tomography with Valsalva maneuver to automatically detect and assess hernia size, volume, and abdominal wall instability. Our first clinical application of HEDI in the preoperative evaluation of 31 patients shows significantly improved success rates compared to reported rates, with all patients remaining pain-free and experiencing no hernia recurrence after three years of follow-up.

翻訳日:2024-06-19 12:50:30 公開日:2024-06-17

# 近接効果を受けた二重量子ドットにおけるフェルミオンパリティ量子ビット

Fermion-parity qubit in a proximitized double quantum dot ( http://arxiv.org/abs/2307.05678v2 )

ライセンス: Link先を確認

Max Geier, Rubén Seoane Souto, Jens Schulenborg, Serwan Asaad, Martin Leijnse, Karsten Flensberg,

(参考訳) 超伝導体に結合した量子ドットの束縛状態は、電子数は異なるがフェルミオンパリティは同じである状態のコヒーレントな重ね合わせになりうる。静電ゲーティングは、この重ね合わせを、量子ドットが電子数パリティによらず同じ平均電荷を持つスイートスポットに調整することができる。ここでは、ジョセフソン接合に埋め込まれた2つのトンネル結合量子ドットの局所フェルミオンパリティに量子情報を符号化することを提案する。スイートスポットでは、量子ビット状態の電荷双極子モーメントはゼロである。これにより、各ドットの電位に作用する電荷ノイズや(弱い)ドット間トンネルのゆらぎによる位相緩和から量子ビットが保護される。弱いドット間トンネルでは、量子ビット状態が互いに重ならないため緩和が抑制される。一方、強いドット間トンネルの場合、システムは各量子ドットに個別に影響を与えるノイズ(エネルギー準位ノイズ、ドット-超伝導体間トンネルのゆらぎ、超微細相互作用)に対して保護される。最後に、ゲート電圧のパルス印加による初期化と読み出し、および1量子ビットゲートと2量子ビットゲートについて述べる。

Bound states in quantum dots coupled to superconductors can be in a coherent superposition of states with different electron number but with the same fermion parity. Electrostatic gating can tune this superposition to a sweet spot, where the quantum dot has the same mean electric charge independent of its electron-number parity. Here, we propose to encode quantum information in the local fermion parity of two tunnel-coupled quantum dots embedded in a Josephson junction. At the sweet spot, the qubit states have zero charge dipole moment. This protects the qubit from dephasing due to charge noise acting on the potential of each dot, as well as fluctuations of the (weak) inter-dot tunneling. At weak inter-dot tunneling, relaxation is suppressed because of disjoint qubit states. On the other hand, for strong inter-dot tunneling the system is protected against noise affecting each quantum dot separately (energy level noise, dot-superconductor tunneling fluctuations, and hyperfine interactions). Finally, we describe initialization and readout as well as single-qubit and two-qubit gates by pulsing gate voltages.

翻訳日:2024-06-19 12:50:30 公開日:2024-06-17

# ディープニューラルネットワークにおける定量的CLT

Quantitative CLTs in Deep Neural Networks ( http://arxiv.org/abs/2307.06092v5 )

ライセンス: Link先を確認

Stefano Favaro, Boris Hanin, Domenico Marinucci, Ivan Nourdin, Giovanni Peccati,

(参考訳) ランダムなガウス重みとバイアスを持つ完全連結ニューラルネットワークの分布について、隠れ層の幅が大きな定数$n$に比例する場合を研究する。非線形性に関する緩い仮定の下で、大きいが有限の$n$および任意の固定されたネットワーク深さで有効な、正規近似の定量的な境界を得る。我々の定理は、有限次元分布と過程全体の両方について、ランダムな完全連結ネットワーク(およびその導関数)と対応する無限幅ガウス過程の間の距離が、$\gamma>0$ に対して $n^{-\gamma}$ のようにスケールし、指数は不一致を測る計量に依存することを示す。我々の境界は、ネットワーク幅への依存性という点で、これまでの文献で得られていたどの境界よりも真に強く、一次元の場合にはそれらが最適であること、すなわち一致する下界が成り立つことも証明する。

We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.

翻訳日:2024-06-19 12:50:30 公開日:2024-06-17

# 形式的解釈性を考慮した確率的制約付き強化学習

Probabilistic Constrained Reinforcement Learning with Formal Interpretability ( http://arxiv.org/abs/2307.07084v4 )

ライセンス: Link先を確認

Yanran Wang, Qiuchen Qian, David Boyle,

(参考訳) 強化学習は、変動するダイナミクスを持つ逐次決定問題に対して効果的な推論を提供することができる。しかし、実際の実装におけるそのような推論は、報酬関数と対応する最適ポリシーを解釈する上で、永続的な課題となる。したがって、逐次意思決定問題を確率的推論として表すことには大きな価値がある。原理的には、この推論は確率的ダイナミクスを推定するための多様で強力な数学的ツールを提供し、同時に方策最適化の確率論的解釈を示唆するからである。本研究では、これらの解釈可能性の課題に対処するために、適応ワッサースタイン変分最適化(AWaVO)を提案する。提案手法は、収束保証、学習の透明性、本質的な決定解釈に関する解釈可能性を実現するために形式的手法を用いる。その実用性を示すために、シミュレーションおよび実機のクアッドロータタスクにおいて、最適な大域収束率とともに保証された解釈可能性を示す。TRPO-IPO、PCPO、CRPOといった最先端のベンチマークと比較して、AWaVOが高い性能と十分な解釈可能性の間に合理的なトレードオフをもたらすことを実証的に検証する。

Reinforcement learning can provide effective reasoning for sequential decision-making problems with variable dynamics. Such reasoning in practical implementation, however, poses a persistent challenge in interpreting the reward function and the corresponding optimal policy. Consequently, representing sequential decision-making problems as probabilistic inference can have considerable value, as, in principle, the inference offers diverse and powerful mathematical tools to infer the stochastic dynamics whilst suggesting a probabilistic interpretation of policy optimization. In this study, we propose a novel Adaptive Wasserstein Variational Optimization, namely AWaVO, to tackle these interpretability challenges. Our approach uses formal methods to achieve the interpretability for convergence guarantee, training transparency, and intrinsic decision-interpretation. To demonstrate its practicality, we showcase guaranteed interpretability with an optimal global convergence rate in simulation and in practical quadrotor tasks. In comparison with state-of-the-art benchmarks including TRPO-IPO, PCPO and CRPO, we empirically verify that AWaVO offers a reasonable trade-off between high performance and sufficient interpretability.

翻訳日:2024-06-19 12:50:30 公開日:2024-06-17

# 大規模言語モデルを用いた密検索強化のためのソフトプロンプトチューニング

Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models ( http://arxiv.org/abs/2307.08303v5 )

ライセンス: Link先を確認

Zhiyuan Peng, Xuyang Wu, Qifan Wang, Yi Fang,

(参考訳) Dense Retrieval (DR) はクエリとドキュメントを密埋め込みに変換し、ベクトル空間におけるクエリとドキュメント間の類似度を測定する。DRの課題のひとつは、ドメイン固有のトレーニングデータが不足していることである。DRモデルは、転移学習を通じてMS MARCOのような大規模な公開データセットから学習することができるが、すべてのDRモデルやドメインが転移学習から等しく恩恵を受けるわけではないことが示されている。近年、一部の研究者はゼロショットおよび少数ショットのDRモデルを改善するために大規模言語モデル(LLM)を活用している。しかし、これらの研究で使われるハードプロンプトや人手で書かれたプロンプトは、生成される弱クエリの質を保証できない。そこで本研究では、DR強化のためのソフトプロンプトチューニング(SPTAR)を提案する。各タスクに対して、ソフトプロンプトチューニングを活用して限られた正解データ上でタスク固有のソフトプロンプトを最適化し、次にLLMにプロンプトを与えてラベルのない文書に弱クエリを付与させ、タスク固有の密検索器を訓練するのに十分な弱い文書-クエリペアを得る。弱タグ付きクエリの品質をさらに向上させるために、プロンプトに含める高品質な文書-クエリペアの例を選択するフィルタを設計する。我々の知る限り、DRモデルを強化するためにソフトプロンプトチューニングを利用した先行研究はない。実験により、SPTARは、教師なしベースラインBM25と、最近提案されたLLMベースのDR強化手法よりも優れていることが示された。

Dense retrieval (DR) converts queries and documents into dense embeddings and measures the similarity between queries and documents in vector space. One of the challenges in DR is the lack of domain-specific training data. While DR models can learn from large-scale public datasets like MS MARCO through transfer learning, evidence shows that not all DR models and domains can benefit from transfer learning equally. Recently, some researchers have resorted to large language models (LLMs) to improve the zero-shot and few-shot DR models. However, the hard prompts or human-written prompts utilized in these works cannot guarantee the good quality of generated weak queries. To tackle this, we propose soft prompt tuning for augmenting DR (SPTAR): For each task, we leverage soft prompt-tuning to optimize a task-specific soft prompt on limited ground truth data and then prompt the LLMs to tag unlabeled documents with weak queries, yielding enough weak document-query pairs to train task-specific dense retrievers. We design a filter to select high-quality example document-query pairs in the prompt to further improve the quality of weak tagged queries. To the best of our knowledge, there is no prior work utilizing soft prompt tuning to augment DR models. The experiments demonstrate that SPTAR outperforms the unsupervised baselines BM25 and the recently proposed LLMs-based augmentation method for DR.

翻訳日:2024-06-19 12:50:30 公開日:2024-06-17
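
The retrieval step that the abstract opens with (embed queries and documents, then compare them in vector space) can be sketched as follows. The vectors are hand-made toy values rather than outputs of a trained encoder, and cosine similarity is an assumption: the abstract does not say which similarity function is used, and dot-product scoring is also common.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Toy embeddings (hypothetical); a real system would get these from a trained encoder.
doc_vecs = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.7, 0.7, 0.0],
    "d3": [0.0, 0.0, 1.0],
}
query_vec = [0.9, 0.1, 0.0]
ranking = rank_documents(query_vec, doc_vecs)
print(ranking)
```

In the setting of the abstract, the weak document-query pairs produced by the prompted LLM would serve as training data for the encoder that produces such vectors; that training step is outside this sketch.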

# 構文を考慮した複素数値ニューラル機械翻訳

Syntax-Aware Complex-Valued Neural Machine Translation ( http://arxiv.org/abs/2307.08586v2 )

ライセンス: Link先を確認

Yang Liu, Yuexian Hou,

(参考訳) 構文はニューラル機械翻訳(NMT)において極めて効果的であることが示されている。従来のモデルは構文解析ツールから構文情報を取得し、翻訳性能を向上させるためにそれをNMTモデルに統合していた。本研究では、構文情報を複素数値のエンコーダ・デコーダアーキテクチャに組み込む手法を提案する。提案モデルは、アテンション機構を用いて、ソース側からターゲット側への単語レベルと構文レベルのアテンションスコアを共同で学習する。重要なのは、特定のネットワークアーキテクチャに依存しておらず、既存のあらゆるシーケンス・ツー・シーケンス(Seq2Seq)フレームワークに直接統合可能であることだ。実験により、提案手法は2つのデータセット上でBLEUスコアを大幅に改善できることを示した。特に、提案手法は、構文的差異が大きい言語ペアを含む翻訳タスクにおいて、BLEUスコアをより大きく向上させる。

Syntax has been proven to be remarkably effective in neural machine translation (NMT). Previous models obtained syntax information from syntactic parsing tools and integrated it into NMT models to improve translation performance. In this work, we propose a method to incorporate syntax information into a complex-valued Encoder-Decoder architecture. The proposed model jointly learns word-level and syntax-level attention scores from the source side to the target side using an attention mechanism. Importantly, it is not dependent on specific network architectures and can be directly integrated into any existing sequence-to-sequence (Seq2Seq) framework. The experimental results demonstrate that the proposed method can bring significant improvements in BLEU scores on two datasets. In particular, the proposed method achieves a greater improvement in BLEU scores in translation tasks involving language pairs with significant syntactic differences.

翻訳日:2024-06-19 12:50:30 公開日:2024-06-17

# 大規模言語モデルに基づくファズドライバ生成の理解

Understanding Large Language Model Based Fuzz Driver Generation ( http://arxiv.org/abs/2307.12469v4 )

ライセンス: Link先を確認

Cen Zhang, Mingqiang Bai, Yaowen Zheng, Yeting Li, Wei Ma, Xiaofei Xie, Yuekang Li, Limin Sun, Yang Liu,

(参考訳) LLM(Large Language Model)ベースのファズドライバ生成は有望な研究分野である。従来のプログラム解析ベースの手法とは異なり、このテキストベースのアプローチはより汎用的で、様々なAPI使用情報を活用でき、人間にとって読みやすいコードを生成する。しかし、その有効性や潜在的な課題など、この方向の根本的な問題に対する理解はまだ不足している。このギャップを埋めるために、LLMを用いて効果的なファズドライバを生成する上での重要な課題を対象とした、最初の詳細な研究を行った。本研究は、広く利用されている30のCプロジェクトから収集した86のファズドライバ生成質問からなるキュレートされたデータセットを特徴とする。6つのプロンプト戦略を設計し、5つの最先端LLMと5つの異なる温度設定でテストした。合計で736,430個の生成されたファズドライバを評価し、トークンコストは8.5億トークン(課金額は8,000ドル以上)であった。さらに、LLM生成ドライバを産業界で使用されているドライバと比較し、広範なファジング実験(3.75 CPU年)を行った。本研究で明らかになったのは次の点である。- LLMベースのファズドライバ生成は有望な方向であるが、実用化に向けてはなおいくつかの障害がある。- LLMは複雑な仕様を持つAPIに対して効果的なファズドライバを生成するのが難しい。プロンプト戦略の3つの設計選択、すなわち繰り返しクエリの発行、例を用いたクエリ、反復的なクエリプロセスの採用が有益である。- LLMが生成したドライバは産業界で使用されているものと同等のファジング結果を得られるが、含まれるAPI使用の拡張や、論理バグの検出を容易にするセマンティックオラクルの統合など、改善の余地が大きい。我々の知見はOSS-Fuzz-Genプロジェクトの改善に取り入れられ、産業界における実用的なファズドライバ生成を促進している。

LLM-based (Large Language Model) fuzz driver generation is a promising research area. Unlike traditional program analysis-based method, this text-based approach is more general and capable of harnessing a variety of API usage information, resulting in code that is friendly for human readers. However, there is still a lack of understanding regarding the fundamental issues on this direction, such as its effectiveness and potential challenges. To bridge this gap, we conducted the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers. Our study features a curated dataset with 86 fuzz driver generation questions from 30 widely-used C projects. Six prompting strategies are designed and tested across five state-of-the-art LLMs with five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, with 0.85 billion token costs ($8,000+ charged tokens). Additionally, we compared the LLM-generated drivers against those utilized in industry, conducting extensive fuzzing experiments (3.75 CPU-year). Our study uncovered that: - While LLM-based fuzz driver generation is a promising direction, it still encounters several obstacles towards practical applications; - LLMs face difficulties in generating effective fuzz drivers for APIs with intricate specifics. Three featured design choices of prompt strategies can be beneficial: issuing repeat queries, querying with examples, and employing an iterative querying process; - While LLM-generated drivers can yield fuzzing outcomes that are on par with those used in the industry, there are substantial opportunities for enhancement, such as extending contained API usage, or integrating semantic oracles to facilitate logical bug detection. Our insights have been implemented to improve the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.

翻訳日:2024-06-19 12:50:30 公開日:2024-06-17

# 繰り返しビーム分割によるコヒーレンス

Coherence via reiterated beam splitting ( http://arxiv.org/abs/2307.13279v2 )

ライセンス: Link先を確認

Guillermo Díez, Laura Ares, Alfredo Luis,

(参考訳) ビームスプリッターは、量子コヒーレンスに関して自由でない操作である。その結果、コヒーレント状態と非コヒーレント状態の両方からコヒーレンスを生成することができる。ビームスプリッターのカスケードによって生じるコヒーレンスの増加について検討する。この目的のために、2つの異なる構成を構築し、入力状態の異なる系列を解析する。

Beam splitters are not-free operations with regard to quantum coherence. As a consequence, they can create coherence from both coherent and incoherent states. We investigate the increase in coherence produced by cascades of beam splitters. To this end, we construct two different configurations and analyze different sequences of input states.

翻訳日:2024-06-19 12:50:30 公開日:2024-06-17

# 最近近傍における最小Q学習

Minimax Optimal Q Learning with Nearest Neighbors ( http://arxiv.org/abs/2308.01490v2 )

ライセンス: Link先を確認

Puning Zhao, Lifeng Lai,

(参考訳) 連続状態空間を持つマルコフ決定過程(MDP)の解析は一般に困難である。最近の興味深い研究 \cite{shah2018q} は、最近傍$Q$学習アプローチによって有界な連続状態空間を持つMDPを解いており、割引率$\gamma$のもとで$\epsilon$精度の$Q$関数推定を行うためのサンプル複雑度は $\tilde{O}(\frac{1}{\epsilon^{d+3}(1-\gamma)^{d+7}})$ である。本稿では、オフライン設定用とオンライン設定用の2つの新しい最近傍$Q$学習法を提案する。これら2つの方法のサンプル複雑度は、オフラインとオンラインでそれぞれ $\tilde{O}(\frac{1}{\epsilon^{d+2}(1-\gamma)^{d+2}})$ と $\tilde{O}(\frac{1}{\epsilon^{d+2}(1-\gamma)^{d+3}})$ であり、既存の結果を大幅に改善し、$\epsilon$に関してミニマックス最適な依存性を持つことを示す。この改善は、サンプルをより効率的に利用することで達成される。特に、\cite{shah2018q} の手法は各反復の後にすべてのサンプルを消去するため、これらのサンプルはいくらか無駄になる。一方、我々のオフライン手法はサンプルを一切削除せず、オンライン手法は時刻$t$において$\beta t$より前のサンプルのみを削除する($\beta$は調整可能なパラメータ)ため、情報の損失が大幅に減る。サンプル複雑度以外にも、我々の手法には計算複雑度が改善され、非有界な状態空間にも適用できるという利点がある。

Analyzing the Markov decision process (MDP) with continuous state spaces is generally challenging. A recent interesting work \cite{shah2018q} solves MDP with bounded continuous state space by a nearest neighbor $Q$ learning approach, which has a sample complexity of $\tilde{O}(\frac{1}{\epsilon^{d+3}(1-\gamma)^{d+7}})$ for $\epsilon$-accurate $Q$ function estimation with discount factor $\gamma$. In this paper, we propose two new nearest neighbor $Q$ learning methods, one for the offline setting and the other for the online setting. We show that the sample complexities of these two methods are $\tilde{O}(\frac{1}{\epsilon^{d+2}(1-\gamma)^{d+2}})$ and $\tilde{O}(\frac{1}{\epsilon^{d+2}(1-\gamma)^{d+3}})$ for offline and online methods respectively, which significantly improve over existing results and have minimax optimal dependence over $\epsilon$. We achieve such improvement by utilizing the samples more efficiently. In particular, the method in \cite{shah2018q} clears up all samples after each iteration, thus these samples are somewhat wasted. On the other hand, our offline method does not remove any samples, and our online method only removes samples with time earlier than $\beta t$ at time $t$ with $\beta$ being a tunable parameter, thus our methods significantly reduce the loss of information. Apart from the sample complexity, our methods also have additional advantages of better computational complexity, as well as suitability to unbounded state spaces.

翻訳日:2024-06-19 12:40:28 公開日:2024-06-17
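
The size of the improvement can be read off by dividing the earlier bound by the new ones. The following is a worked comparison of the upper bounds quoted in the abstract only; it ignores the logarithmic factors and constants hidden by $\tilde{O}$, and the values $\epsilon = 0.1$, $\gamma = 0.9$ are chosen purely for illustration.

```latex
% Ratio of the bound of \cite{shah2018q} to the new offline bound
\frac{\epsilon^{-(d+3)}(1-\gamma)^{-(d+7)}}{\epsilon^{-(d+2)}(1-\gamma)^{-(d+2)}}
  = \frac{1}{\epsilon\,(1-\gamma)^{5}}

% Ratio of the bound of \cite{shah2018q} to the new online bound
\frac{\epsilon^{-(d+3)}(1-\gamma)^{-(d+7)}}{\epsilon^{-(d+2)}(1-\gamma)^{-(d+3)}}
  = \frac{1}{\epsilon\,(1-\gamma)^{4}}

% Illustration with \epsilon = 0.1 and \gamma = 0.9:
%   offline: 10 \cdot 10^{5} = 10^{6}, \qquad online: 10 \cdot 10^{4} = 10^{5}
```

Both ratios are independent of the state dimension $d$: the gain is one power of $1/\epsilon$ plus five (offline) or four (online) powers of $1/(1-\gamma)$.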

# 超流動ヘリウム中の単一電子スピン検出法の提案

A proposal for detecting the spin of a single electron in superfluid helium ( http://arxiv.org/abs/2308.07174v2 )

ライセンス: Link先を確認

Jinyong Ma, Y. S. S. Patil, Jiaxin Yu, Yiqi Wang, J. G. E. Harris,

(参考訳) 超流動ヘリウム中の電子バブルは2つの自由度を持ち、電子のスピンと気泡の運動という非常に低い散逸をもたらす可能性がある。これらの自由度が十分な感度で読み出され、制御できるなら、様々な量子技術を実現し、超流動ヘリウムの物理学におけるオープンな疑問を探求するための新しいプラットフォームを提供するだろう。本稿では,超流動充填光音響キャビティ内で電子気泡を捕捉し,これを実現するための実用的な手法を提案する。

The electron bubble in superfluid helium has two degrees of freedom that may offer exceptionally low dissipation: the electron's spin and the bubble's motion. If these degrees of freedom can be read out and controlled with sufficient sensitivity, they would provide a novel platform for realizing a range of quantum technologies and for exploring open questions in the physics of superfluid helium. Here we propose a practical scheme for accomplishing this by trapping an electron bubble inside a superfluid-filled opto-acoustic cavity.

翻訳日:2024-06-19 12:40:28 公開日:2024-06-17

# Image-to-Point Cloud Saliency Transferを用いた注意誘導ライダーセグメンテーションとオドメトリー

Attention-Guided Lidar Segmentation and Odometry Using Image-to-Point Cloud Saliency Transfer ( http://arxiv.org/abs/2308.14332v2 )

ライセンス: Link先を確認

Guanqun Ding, Nevrez Imamoglu, Ali Caglayan, Masahiro Murakawa, Ryosuke Nakamura,

(参考訳) LiDARオドメトリ推定と3Dセマンティックセグメンテーションは自動運転に不可欠であり、近年顕著な進歩を遂げている。しかし、3次元セマンティックセグメンテーションでは異なるセマンティックカテゴリ間でポイント数が不均衡であり、LiDARオドメトリ推定では動的オブジェクトの影響があるため、これらのタスクは依然として困難であり、ロバストな特徴学習のための参照ポイントとして代表的で顕著な(サリエントな)ランドマークを使用することの重要性が高まっている。これらの課題に対処するために、注意情報を活用してLiDARオドメトリ推定とセマンティックセグメンテーションモデルの性能を向上させるサリエンシ誘導手法を提案する。画像領域とは異なり、注釈付きトレーニングデータがないため、ポイントクラウドのサリエンシ情報に対処した研究はごくわずかである。これを緩和するために、まず、カラー画像からポイントクラウドにサリエンシ分布の知識を転移するための汎用的なフレームワークを提示し、これを用いてポイントクラウドのための擬似サリエンシデータセット(すなわちFordSaliency)を構築する。次に、ポイントクラウドベースのバックボーンを採用して擬似サリエンシラベルからサリエンシ分布を学習し、その後に提案するSalLiDARモジュールを続ける。SalLiDARは、セグメンテーション性能を向上させるためにサリエンシ情報を統合する、サリエンシ誘導型の3次元セマンティックセグメンテーションモデルである。最後に、SalLiDARのセマンティック予測とサリエンシ予測を用いて、より優れたオドメトリ推定を実現する自己教師型サリエンシ誘導LiDARオドメトリネットワークであるSalLONetを紹介する。ベンチマークデータセットに対する広範な実験により、提案するSalLiDARモデルとSalLONetモデルが既存の手法に対して最先端の性能を達成することが示され、画像からLiDARへのサリエンシ知識転移の有効性が明らかになった。ソースコードは https://github.com/nevrez/SalLONet で公開される予定である。

LiDAR odometry estimation and 3D semantic segmentation are crucial for autonomous driving, which has achieved remarkable advances recently. However, these tasks are challenging due to the imbalance of points in different semantic categories for 3D semantic segmentation and the influence of dynamic objects for LiDAR odometry estimation, which increases the importance of using representative/salient landmarks as reference points for robust feature learning. To address these challenges, we propose a saliency-guided approach that leverages attention information to improve the performance of LiDAR odometry estimation and semantic segmentation models. Unlike in the image domain, only a few studies have addressed point cloud saliency information due to the lack of annotated training data. To alleviate this, we first present a universal framework to transfer saliency distribution knowledge from color images to point clouds, and use this to construct a pseudo-saliency dataset (i.e. FordSaliency) for point clouds. Then, we adopt point cloud-based backbones to learn saliency distribution from pseudo-saliency labels, which is followed by our proposed SalLiDAR module. SalLiDAR is a saliency-guided 3D semantic segmentation model that integrates saliency information to improve segmentation performance. Finally, we introduce SalLONet, a self-supervised saliency-guided LiDAR odometry network that uses the semantic and saliency predictions of SalLiDAR to achieve better odometry estimation. Our extensive experiments on benchmark datasets demonstrate that the proposed SalLiDAR and SalLONet models achieve state-of-the-art performance against existing methods, highlighting the effectiveness of image-to-LiDAR saliency knowledge transfer. Source code will be available at https://github.com/nevrez/SalLONet.

翻訳日:2024-06-19 12:40:28 公開日:2024-06-17

# IDVT:ソーシャルレコメンデーションのための関心を考慮したノイズ除去とビュー誘導型チューニング

IDVT: Interest-aware Denoising and View-guided Tuning for Social Recommendation ( http://arxiv.org/abs/2308.15926v2 )

ライセンス: Link先を確認

Dezhao Yang, Jianghong Ma, Shanshan Feng, Haijun Zhang, Zhao Zhang,

(参考訳) 情報時代のレコメンデーションシステムは、情報を効率的にフィルタリングし、ユーザの好みを特定するうえで不可欠である。オンラインソーシャルプラットフォームは、貴重な補助情報を提供することで、これらのシステムを豊かにしてきた。ソーシャルにつながったユーザは同様の好みを共有すると仮定され、これがレコメンデーションの精度を高め、コールドスタート問題に対処するとされる。しかし、実証的な知見はこの仮定に疑問を投げかけ、特定の社会的つながりがシステムの性能を実際に損なう可能性があることを明らかにしている。我々の統計分析は、ソーシャルネットワークにかなりの量のノイズが存在し、ソーシャルにつながったユーザの多くが共通の関心を共有していないことを示している。この問題に対処するために、ソーシャルレコメンデーションのための Interest-aware Denoising and View-guided Tuning (IDVT) 手法を提案する。第1のID部は、社会的つながりのノイズを効果的に除去する。具体的には、ノイズ除去の過程で、ソーシャルネットワークの構造とユーザのインタラクション上の関心の両方をグローバルビューで考慮する。さらに、このグローバルビューにおいて、ノイズ除去済みのソーシャル情報(ソーシャルドメイン)をユーザとアイテムのインタラクション(協調ドメイン)の伝播に統合し、ゲーティング機構を用いて2つのドメインのユーザ表現を集約する。第2のVT部では、ユーザ関心の潜在的な損失に対処し、グローバルビュー内でのモデルのロバスト性を高めるために、2つの追加ビュー(ローカルビューとドロップアウト強化ビュー)を導入し、コントラスト学習を通じてグローバルビューのユーザ表現を微調整する。ノイズ比の異なる実世界のデータセットに対する広範な評価は、最先端のソーシャルレコメンデーション手法に対するIDVTの優位性を示している。

In the information age, recommendation systems are vital for efficiently filtering information and identifying user preferences. Online social platforms have enriched these systems by providing valuable auxiliary information. Socially connected users are assumed to share similar preferences, enhancing recommendation accuracy and addressing cold start issues. However, empirical findings challenge this assumption, revealing that certain social connections can actually harm system performance. Our statistical analysis indicates a significant amount of noise in the social network, where many socially connected users do not share common interests. To address this issue, we propose an innovative Interest-aware Denoising and View-guided Tuning (IDVT) method for social recommendation. The first part, ID, effectively denoises social connections. Specifically, the denoising process considers both social network structure and user interaction interests in a global view. Moreover, in this global view, we also integrate denoised social information (social domain) into the propagation of the user-item interactions (collaborative domain) and aggregate user representations from the two domains using a gating mechanism. To tackle potential user interest loss and enhance model robustness within the global view, the second part, VT, introduces two additional views (local view and dropout-enhanced view) for fine-tuning user representations in the global view through contrastive learning. Extensive evaluations on real-world datasets with varying noise ratios demonstrate the superiority of IDVT over state-of-the-art social recommendation methods.
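The gating step mentioned in the abstract can be pictured with a small sketch. The abstract does not specify the form of the gate, so the element-wise sigmoid gate below, the function name `gated_fusion`, and its parameters are all assumptions made for illustration, not the authors' implementation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(social, collab, weights, bias):
    """Fuse two user embeddings with an element-wise gate (hypothetical form).

    gate_i  = sigmoid(w_s[i] * social[i] + w_c[i] * collab[i] + bias[i])
    fused_i = gate_i * social[i] + (1 - gate_i) * collab[i]
    """
    w_s, w_c = weights
    fused = []
    for s, c, ws, wc, b in zip(social, collab, w_s, w_c, bias):
        gate = sigmoid(ws * s + wc * c + b)
        # The gate decides, per dimension, how much of the social-domain
        # representation is kept versus the collaborative-domain one.
        fused.append(gate * s + (1.0 - gate) * c)
    return fused


if __name__ == "__main__":
    social_emb = [1.0, 2.0]
    collab_emb = [3.0, 6.0]
    zero = [0.0, 0.0]
    # With all-zero gate parameters the gate is 0.5, so the result is the mean.
    print(gated_fusion(social_emb, collab_emb, (zero, zero), zero))
```

In a trained model the gate parameters would be learned, so noisy social signals could be down-weighted per user and per dimension; here they are passed in by hand.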

翻訳日:2024-06-19 12:40:28 公開日:2024-06-17

# ICLEF: 説明可能なスタイル転送のためのエキスパートフィードバックによるインコンテキスト学習

ICLEF: In-Context Learning with Expert Feedback for Explainable Style Transfer ( http://arxiv.org/abs/2309.08583v2 )

ライセンス: Link先を確認

Arkadiy Saakyan, Smaranda Muresan,

(参考訳) 最先端の大規模言語モデル(LLM)は、あるスタイルから別のスタイルへのテキストの適応に優れるが、現在の作業はスタイル転送モデルの説明可能性に対処するものではない。近年の研究では、より大きな教師モデルからテキストによる説明を作成し、それをより小さな学生モデルに蒸留する方法が検討されている。このアプローチの課題の1つは、LCM出力には、修正する専門知識を必要とするエラーが含まれているかもしれないが、コストと可用性のために専門家のフィードバックを集め、取り入れることは困難である。この課題に対処するため,本論文では,文脈内学習と自己批判を組み合わせ,少ない専門家によるフィードバックを取り入れた,新しい人間-AI協調型蒸留手法であるICLEFを提案する。提案手法は,形式性(e-GYAFC)と主観的バイアス(e-WNC)のための高品質な合成説明可能なスタイル転送データセットを生成する。自動的, 人的評価により, 一般教師モデルでは, 単発で説明可能なスタイル伝達タスクにおいて, 教師モデルよりも優れ, 教師モデルと比較し, データの質と専門家のフィードバックの役割を強調した。本研究は,e-GYAFCで微調整された小型モデルによる説明は,教師による説明よりも著者の予測性が高いことを示す。

While state-of-the-art large language models (LLMs) can excel at adapting text from one style to another, current work does not address the explainability of style transfer models. Recent work has explored generating textual explanations from larger teacher models and distilling them into smaller student models. One challenge with such an approach is that LLM outputs may contain errors that require expertise to correct, but gathering and incorporating expert feedback is difficult due to cost and availability. To address this challenge, we propose ICLEF, a novel human-AI collaboration approach to model distillation that incorporates scarce expert human feedback by combining in-context learning and model self-critique. We show that our method leads to the generation of high-quality synthetic explainable style transfer datasets for formality (e-GYAFC) and subjective bias (e-WNC). Via automatic and human evaluation, we show that specialized student models fine-tuned on our datasets outperform generalist teacher models on the explainable style transfer task in one-shot settings, and perform competitively compared to few-shot teacher models, highlighting the quality of the data and the role of expert feedback. In an extrinsic task of authorship attribution, we show that explanations generated by smaller models fine-tuned on e-GYAFC are more predictive of authorship than explanations generated by few-shot teacher models.

翻訳日:2024-06-19 12:40:28 公開日:2024-06-17

# 性能保証付きρ-POMDPにおける観測の簡略化

Measurement Simplification in ρ-POMDP with Performance Guarantees ( http://arxiv.org/abs/2309.10701v2 )

ライセンス: Link先を確認

Tom Yotam, Vadim Indelman,

(参考訳) 不確実性の下での意思決定は、不完全な情報で行動する自律システムの中心にある。意思決定問題を解決するコストは行動や観察空間において指数関数的であり、多くのオンラインシステムでは実現不可能である。本稿では,高次元観測空間を分割することで,効率的な意思決定手法を提案する。分割された観測空間を用いて、一般的な信念分布に対する期待される情報理論的報酬に関する解析的境界を定式化する。これらのバウンダリは、パフォーマンスの保証を維持しながら、効率的に計画するために使用される。境界は適応的で、計算効率が良く、元の解に収束していることが示される。分割パラダイムを拡張し、分割空間の階層構造を示し、計画の効率性を高める。次に、ガウス的信念に対するこれらの境界の特定の変種を提案し、少なくとも 4 の係数の理論的性能改善を示す。最後に,本手法を,能動SLAMシナリオ,シミュレーション,実実験において,他の最先端アルゴリズムと比較する。どちらの場合も、性能保証を伴う計画の大幅なスピードアップを示します。

Decision making under uncertainty is at the heart of any autonomous system acting with imperfect information. The cost of solving the decision making problem is exponential in the action and observation spaces, thus rendering it unfeasible for many online systems. This paper introduces a novel approach to efficient decision-making, by partitioning the high-dimensional observation space. Using the partitioned observation space, we formulate analytical bounds on the expected information-theoretic reward, for general belief distributions. These bounds are then used to plan efficiently while keeping performance guarantees. We show that the bounds are adaptive, computationally efficient, and that they converge to the original solution. We extend the partitioning paradigm and present a hierarchy of partitioned spaces that allows greater efficiency in planning. We then propose a specific variant of these bounds for Gaussian beliefs and show a theoretical performance improvement of at least a factor of 4. Finally, we compare our novel method to other state of the art algorithms in active SLAM scenarios, in simulation and in real experiments. In both cases we show a significant speed-up in planning with performance guarantees.
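To give a feel for how a reduced set of observations can bracket an information-theoretic quantity, here is a deliberately simplified sketch for a scalar linear-Gaussian belief. It is not the bound derived in the paper: the scalar model, the helper names, and the per-measurement information cap `info_cap` are assumptions chosen only to illustrate the idea that a cheap computation on a subset of measurements yields guaranteed lower and upper bounds on the posterior entropy.

```python
import math


def gaussian_entropy(variance):
    """Differential entropy (in nats) of a 1-D Gaussian with this variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def posterior_information(prior_variance, measurements):
    """Inverse posterior variance of a scalar state.

    Each measurement is a pair (h, r) for z = h * x + v with v ~ N(0, r).
    Independent measurements add h^2 / r to the information.
    """
    info = 1.0 / prior_variance
    for h, r in measurements:
        info += h * h / r
    return info


def entropy_bounds(prior_variance, kept, num_dropped, info_cap):
    """Bound the full posterior entropy while evaluating only `kept`.

    upper: entropy after the kept measurements alone, since the dropped
           ones can only add information and lower the entropy.
    lower: entropy if every dropped measurement contributed `info_cap`,
           an assumed upper limit on h^2 / r for any single measurement.
    """
    info_kept = posterior_information(prior_variance, kept)
    upper = gaussian_entropy(1.0 / info_kept)
    lower = gaussian_entropy(1.0 / (info_kept + num_dropped * info_cap))
    return lower, upper


if __name__ == "__main__":
    all_measurements = [(1.0, 0.5), (0.5, 1.0), (2.0, 4.0)]
    exact = gaussian_entropy(1.0 / posterior_information(1.0, all_measurements))
    lower, upper = entropy_bounds(1.0, all_measurements[:1], 2, info_cap=1.0)
    print(lower, exact, upper)
```

In this linear-Gaussian toy case the posterior entropy does not depend on the measured values, so the expectation over observations is trivial; the general belief distributions treated in the abstract do not have that convenience.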

翻訳日:2024-06-19 12:40:28 公開日:2024-06-17

# 広告における金のストライク:広告テキスト生成の標準化と探索

Striking Gold in Advertising: Standardization and Exploration of Ad Text Generation ( http://arxiv.org/abs/2309.12030v2 )

ライセンス: Link先を確認

Masato Mita, Soichiro Murakami, Akihiko Kato, Peinan Zhang,

(参考訳) 手動広告作成の限界に対応するため、自動広告テキスト生成(ATG)分野において重要な研究がなされている。しかし、包括的なベンチマークと明確に定義された問題セットの欠如は、異なる方法の比較を困難にしている。これらの課題に対処するため、ATGのタスクを標準化し、マルチモーダル情報の利用を慎重に設計し、産業的評価を容易にする第1のベンチマークデータセットであるCAMERAを提案する。従来の手法から,大規模言語モデル(LLM)を含む最先端モデルまで,9つのベースラインによる広範な実験は,現状と今後の課題を示している。また、ATGの既存の指標とLLMに基づく評価器が人間の評価とどのように一致しているかについても検討する。

In response to the limitations of manual ad creation, significant research has been conducted in the field of automatic ad text generation (ATG). However, the lack of comprehensive benchmarks and well-defined problem sets has made comparing different methods challenging. To tackle these challenges, we standardize the task of ATG and propose the first benchmark dataset, CAMERA, which is carefully designed to enable the use of multi-modal information and to facilitate industry-wise evaluations. Our extensive experiments with nine baselines, ranging from classical methods to state-of-the-art models including large language models (LLMs), show the current state and the remaining challenges. We also explore how existing metrics in ATG and an LLM-based evaluator align with human evaluations.

翻訳日:2024-06-19 12:40:27 公開日:2024-06-17

# 活性物質の深層学習確率フローとエントロピー生成速度

Deep learning probability flows and entropy production rates in active matter ( http://arxiv.org/abs/2309.12991v2 )

ライセンス: Link先を確認

Nicholas M. Boffi, Eric Vanden-Eijnden,

(参考訳) 自己推進コロイドから運動性細菌への活性物質系は、顕微鏡スケールで、自由エネルギーを有用な仕事に変換することで特徴づけられる。これらは平衡統計力学の範囲を超えて物理学を巻き込み、その非平衡状態の性質を理解することが永続的な課題である。エントロピー生成速度と確率電流は、時間反転対称性の分解を測定することで定量的な方法を提供する。しかし、それらの効率的な計算は、システムの未知かつ高次元の確率密度に依存するため、いまだ解明されていない。そこで本研究では, 生成モデリングの最近の進歩に基づき, この密度のスコアを推定する深層学習フレームワークを開発する。本研究では, 微視的運動方程式とともに, エントロピー生成速度, 確率電流, および個々の粒子からの局所的寄与への分解にアクセスできることを示す。このスコアを表現するために,粒子間の高次相互作用を基礎となる置換対称性を尊重しながら学習する,空間的に局所的なトランスフォーマーネットワークアーキテクチャを導入する。運動誘発相分離法(MIPS)を施行した活性粒子の高次元システムに適用することにより,本手法の幅広い有用性と拡張性を実証する。本研究では,4096粒子を1つの充填率で学習した1つのネットワークが,最大32768粒子を含む相図の他の領域に一般化可能であることを示す。本研究では, 粒子数と充填率の関数として, MIPSにおける平衡からの離脱の空間構造を定量化する。

Active matter systems, from self-propelled colloids to motile bacteria, are characterized by the conversion of free energy into useful work at the microscopic scale. They involve physics beyond the reach of equilibrium statistical mechanics, and a persistent challenge has been to understand the nature of their nonequilibrium states. The entropy production rate and the probability current provide quantitative ways to do so by measuring the breakdown of time-reversal symmetry. Yet, their efficient computation has remained elusive, as they depend on the system's unknown and high-dimensional probability density. Here, building upon recent advances in generative modeling, we develop a deep learning framework to estimate the score of this density. We show that the score, together with the microscopic equations of motion, gives access to the entropy production rate, the probability current, and their decomposition into local contributions from individual particles. To represent the score, we introduce a novel, spatially-local transformer network architecture that learns high-order interactions between particles while respecting their underlying permutation symmetry. We demonstrate the broad utility and scalability of the method by applying it to several high-dimensional systems of active particles undergoing motility-induced phase separation (MIPS). We show that a single network trained on a system of 4096 particles at one packing fraction can generalize to other regions of the phase diagram, including systems with as many as 32768 particles. We use this observation to quantify the spatial structure of the departure from equilibrium in MIPS as a function of the number of particles and the packing fraction.

翻訳日:2024-06-19 12:30:40 公開日:2024-06-17

# ValueDCG: 言語モデルの包括的人間的価値理解能力の測定

ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models ( http://arxiv.org/abs/2310.00378v4 )

ライセンス: Link先を確認

Zhaowei Zhang, Fengshuo Bai, Jun Gao, Yaodong Yang,

(参考訳) 人的価値は人間の意思決定の背後にある重要な要素である。大きな言語モデル(LLM)が人間の決定に大きく影響していることを考えると、人間の価値を正確に理解して安全性を確保することが不可欠である。しかし、これらの値の把握は、その値が複雑で適応可能な性質のため複雑である。 LLMの価値を真に理解するには、"know what"と"know why"の両方を考慮する必要がある、と私たちは主張する。そこで本研究では,2つの側面を定量的に評価するための総合評価指標であるValueDCG(Value Discriminator-Critique Gap)を提案する。 4つの代表的なLCMを評価し,LLMの「何」と「なぜ」の能力の成長率がパラメータ数の増加と一致しないことを示す。このことは、LLMが提供されたコンテキストに基づいて、その固有の価値を真に理解せず、潜在的なリスクを示さずに、もっともらしい説明を行うかもしれないことを示唆している。

Personal values are a crucial factor behind human decision-making. Considering that Large Language Models (LLMs) have been shown to impact human decisions significantly, it is essential to make sure they accurately understand human values to ensure their safety. However, evaluating their grasp of these values is complex due to the intricate and adaptable nature of values. We argue that truly understanding values in LLMs requires considering both "know what" and "know why". To this end, we present a comprehensive evaluation metric, ValueDCG (Value Discriminator-Critique Gap), to quantitatively assess the two aspects with an engineering implementation. We assess four representative LLMs and provide compelling evidence that the growth rates of LLMs' "know what" and "know why" capabilities do not align with increases in parameter numbers, resulting in a decline in the models' capacity to understand human values as the number of parameters grows. This may further suggest that LLMs might craft plausible explanations based on the provided context without truly understanding their inherent value, indicating potential risks.

翻訳日:2024-06-19 12:30:40 公開日:2024-06-17

# 思考伝播:大規模言語モデルを用いた複雑推論における類推的アプローチ

Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models ( http://arxiv.org/abs/2310.03965v3 )

ライセンス: Link先を確認

Junchi Yu, Ran He, Rex Ying,

(参考訳) 大規模言語モデル(LLM)は、プロンプト手法の発展により、推論タスクにおいて顕著な成功を収めてきた。しかし、既存のプロンプト手法はLLMにゼロから推論させるため、類似問題を解いた際の洞察を再利用できず、多段階推論で誤差が蓄積しやすい。これらの問題に対処するために、類似問題を探索し、その解を活用してLLMの複雑な推論能力を高める Thought Propagation (TP) を提案する。これらの類似問題は入力問題と関連しており、再利用可能な解と問題解決戦略を持つ。そのため、以前に解いた類似問題の洞察を伝播させ、新たな問題解決に役立てることが有望である。これを実現するため、TPはまずLLMに対して、入力問題に関連する類似問題の集合を提案し、解くよう促す。次にTPは、類似問題の結果を再利用して新しい解を直接生成するか、あるいはゼロから得た初期解を修正するための知識集約的な実行計画を導出する。TPは既存のプロンプト手法と互換性があり、タスク固有のプロンプトエンジニアリングに多くの労力をかけることなく、幅広いタスクでプラグアンドプレイの汎化と性能向上を可能にする。3つの難しいタスクにわたる実験により、TPはベースラインを大幅に上回ることが示された。最短経路推論における最適解の発見率は平均で絶対値12%向上し、創造的記述における人間の選好は13%改善し、LLMエージェント計画のタスク完了率は15%向上した。

Large Language Models (LLMs) have achieved remarkable success in reasoning tasks with the development of prompting methods. However, existing prompting approaches cannot reuse insights of solving similar problems and suffer from accumulated errors in multi-step reasoning, since they prompt LLMs to reason from scratch. To address these issues, we propose Thought Propagation (TP), which explores the analogous problems and leverages their solutions to enhance the complex reasoning ability of LLMs. These analogous problems are related to the input one, with reusable solutions and problem-solving strategies. Thus, it is promising to propagate insights of solving previous analogous problems to inspire new problem-solving. To achieve this, TP first prompts LLMs to propose and solve a set of analogous problems that are related to the input one. Then, TP reuses the results of analogous problems to directly yield a new solution or derive a knowledge-intensive plan for execution to amend the initial solution obtained from scratch. TP is compatible with existing prompting approaches, allowing plug-and-play generalization and enhancement in a wide range of tasks without much labor in task-specific prompt engineering. Experiments across three challenging tasks demonstrate TP enjoys a substantial improvement over the baselines by an average of 12% absolute increase in finding the optimal solutions in Shortest-path Reasoning, 13% improvement of human preference in Creative Writing, and 15% enhancement in the task completion rate of LLM-Agent Planning.
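The control flow described above (solve from scratch, propose and solve analogous problems, then reuse their results) can be outlined as follows. This is only a skeleton under stated assumptions: the three callables stand in for LLM prompts, and the names `solve`, `propose_analogous`, and `aggregate` are hypothetical rather than taken from the paper or its code.

```python
from typing import Callable, List


def thought_propagation(
    problem: str,
    solve: Callable[[str], str],
    propose_analogous: Callable[[str], List[str]],
    aggregate: Callable[[str, str, List[str]], str],
) -> str:
    """Skeleton of the propose / solve / reuse loop (hypothetical interface).

    solve:             stand-in for prompting an LLM to solve one problem
    propose_analogous: stand-in for prompting an LLM for related problems
    aggregate:         stand-in for reusing analogous solutions to produce
                       a new solution or amend the initial one
    """
    # Initial solution obtained by reasoning from scratch.
    initial_solution = solve(problem)
    # Analogous problems related to the input problem.
    analogous_problems = propose_analogous(problem)
    # Solve each analogous problem independently.
    analogous_solutions = [solve(p) for p in analogous_problems]
    # Reuse those results to yield the final answer.
    return aggregate(problem, initial_solution, analogous_solutions)


if __name__ == "__main__":
    # Toy stand-ins so the skeleton runs without any model.
    answer = thought_propagation(
        "q",
        solve=lambda p: "sol(" + p + ")",
        propose_analogous=lambda p: [p + "-a", p + "-b"],
        aggregate=lambda prob, init, hints: init + "|" + ",".join(hints),
    )
    print(answer)
```

Keeping the three steps as injected callables mirrors the plug-and-play claim in the abstract: any existing prompting method could, in principle, play the role of `solve`.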

翻訳日:2024-06-19 12:30:40 公開日:2024-06-17

# ProbTS: 多様な予測ホライズンにおける点予測と分布予測のベンチマーク

ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons ( http://arxiv.org/abs/2310.07446v4 )

ライセンス: Link先を確認

Jiawen Zhang, Xumeng Wen, Zhenwei Zhang, Shun Zheng, Jia Li, Jiang Bian,

(参考訳) 予測地平線の範囲に正確な点と分布予測を提供することは、様々な業界における時系列予測の適用において、重要かつ永続的な課題である。時系列予測のためのディープラーニングモデル開発に関する先行研究は、長期点予測や短期確率推定のような孤立した側面にしばしば集中している。この狭い焦点は、難解な方法論的選択をもたらし、これらのモデルの未知のシナリオへの適応性を阻害する可能性がある。普遍的な予測モデルの開発の傾向は高まっているが、その利点や欠点について、特に点や分布予測といった重要な予測ニーズについて、特に短い地平線と長い地平線をまたいだものについては、十分に理解されていない。本稿では、これらの基本的な予測ニーズを評価し、近年の多くの最先端研究の厳密な比較分析を行うために、統一的なプラットフォームとして設計されたベンチマークツールであるProbTSを提案する。異なる予測要求から生じる特徴データの特徴を識別し、これらの特徴が典型的な研究軌跡において方法論的嗜好を損なうことができるかを明らかにする。これに基づいて, 時系列予測の最新モデルについて検討し, 方法論的強みと弱みの分析がこれらの普遍的モデルにも適用可能であることを明らかにする。最後に、現在の研究に固有の限界を概説し、今後の探査にいくつかの道のりを画定する。

Delivering precise point and distributional forecasts across a spectrum of prediction horizons represents a significant and enduring challenge in the application of time-series forecasting within various industries. Prior research on developing deep learning models for time-series forecasting has often concentrated on isolated aspects, such as long-term point forecasting or short-term probabilistic estimations. This narrow focus may result in skewed methodological choices and hinder the adaptability of these models to uncharted scenarios. While there is a rising trend in developing universal forecasting models, a thorough understanding of their advantages and drawbacks, especially regarding essential forecasting needs like point and distributional forecasts across short and long horizons, is still lacking. In this paper, we present ProbTS, a benchmark tool designed as a unified platform to evaluate these fundamental forecasting needs and to conduct a rigorous comparative analysis of numerous cutting-edge studies from recent years. We dissect the distinctive data characteristics arising from disparate forecasting requirements and elucidate how these characteristics can skew methodological preferences in typical research trajectories, which often fail to fully accommodate essential forecasting needs. Building on this, we examine the latest models for universal time-series forecasting and discover that our analyses of methodological strengths and weaknesses are also applicable to these universal models. Finally, we outline the limitations inherent in current research and underscore several avenues for future exploration.
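The distinction between point and distributional forecasts becomes concrete once each is scored. The sketch below uses two widely used metrics, mean absolute error for point forecasts and a sample-based estimate of the continuous ranked probability score (CRPS) for distributional forecasts. The abstract does not name the metrics ProbTS uses, so this choice and the function names are assumptions for illustration.

```python
def mean_absolute_error(forecasts, targets):
    """Average absolute difference between point forecasts and targets."""
    return sum(abs(f - y) for f, y in zip(forecasts, targets)) / len(targets)


def crps_from_samples(samples, target):
    """Sample-based CRPS estimate for one target value.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, where X and X'
    are independent draws from the forecast distribution.
    """
    n = len(samples)
    # Average distance from forecast samples to the observed value.
    accuracy_term = sum(abs(x - target) for x in samples) / n
    # Average distance between pairs of forecast samples (spread).
    spread_term = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy_term - 0.5 * spread_term


if __name__ == "__main__":
    print(mean_absolute_error([1.0, 2.0], [2.0, 4.0]))
    # A forecast whose samples all coincide is scored like a point forecast.
    print(crps_from_samples([2.0, 2.0, 2.0], 5.0))
    print(crps_from_samples([0.0, 2.0], 1.0))
```

When every sample is identical, the spread term vanishes and CRPS reduces to the absolute error, which is one reason the two metrics are often reported side by side.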

翻訳日:2024-06-19 12:30:40 公開日:2024-06-17

# 質問応答モデルにおけるバイアスの影響の追跡による緩和

Mitigating Bias for Question Answering Models by Tracking Bias Influence ( http://arxiv.org/abs/2310.08795v2 )

ライセンス: Link先を確認

Mingyu Derek Ma, Jiun-Yu Kao, Arpit Gupta, Yu-Hsiang Lin, Wenbo Zhao, Tagyoung Chung, Wei Wang, Kai-Wei Chang, Nanyun Peng,

(参考訳) 様々なNLPタスクのモデルはステレオタイプを示すことが示されており、QA(QA)モデルのバイアスは特に有害であり、出力回答はエンドユーザーが直接消費する可能性がある。 QAモデルのバイアスを評価するデータセットは存在するが、QAモデルのバイアス緩和技術はまだ未検討である。本研究では,複数選択QAモデルのバイアスを軽減するためのBMBIを提案する。モデルがバイアスのある例から学んだ場合、よりバイアスに傾くように傾くという直感に基づいて、別のインスタンスへの影響を観察して、クエリインスタンスのバイアスレベルを測定します。影響のあるインスタンスがよりバイアスを受ければ、クエリインスタンスはバイアスを受けます。次に、最適化目的として検出されたバイアスレベルを用いて、元のQAタスクに加えてマルチタスク学習環境を構築する。さらに、包括的かつ敏感な方法でバイアスを定量化する新しいバイアス評価指標を導入する。本手法は,複数のバイアスカテゴリにまたがる複数のQA定式化に適用可能であることを示す。 BBQデータセットの9つのバイアスカテゴリのバイアスレベルを、同等のQA精度を維持しながら大幅に低減することができる。

Models of various NLP tasks have been shown to exhibit stereotypes, and bias in question answering (QA) models is especially harmful as the output answers might be directly consumed by end users. There are datasets to evaluate bias in QA models, while bias mitigation techniques for QA models are still under-explored. In this work, we propose BMBI, an approach to mitigate the bias of multiple-choice QA models. Based on the intuition that a model tends to become more biased if it learns from a biased example, we measure the bias level of a query instance by observing its influence on another instance. If the influenced instance is more biased, we infer that the query instance is biased. We then use the detected bias level as an optimization objective to form a multi-task learning setting in addition to the original QA task. We further introduce a new bias evaluation metric to quantify bias in a comprehensive and sensitive way. We show that our method can be applied to multiple QA formulations across multiple bias categories. It can significantly reduce the bias level in all 9 bias categories in the BBQ dataset while maintaining comparable QA accuracy.

翻訳日:2024-06-19 12:30:40 公開日:2024-06-17

# 単分子運動制御

Single-molecule motion control ( http://arxiv.org/abs/2310.09296v2 )

ライセンス: Link先を確認

Divyam Neer Verma, KV Chinmaya, Jan Heck, G Mohan Rao, Sonia Contera, Moumita Ghosh, Siddharth Ghosh,

(参考訳) 高時空間分解能で単一分子の動的操作と制御を実現することは、原子スケールコンピューティングとナノロボティクスの進展に重要である。しかし、この試みは、原子と分子の相互作用の複雑な性質、ナノスケールシステムの高次元特性、実験データの不足によって批判的に挑戦されている。本稿では, 格子構造内における基本表面電荷から生じる静電力を利用して, 表面への埋め込み電荷を模倣することにより, 単分子拡散を制御するための玩具モデルを提案する。状態依存拡散方程式とグリーン関数を組み合わせた単一分子拡散過程における量子力学と静電相互作用の相互作用について検討する。その結果, 表面電荷密度は拡散係数に大きく影響し, クーロン力に類似した線形スケーリングを示すことがわかった。実験拡散定数を正確に予測し、観測範囲を6000$\mu\text{m}^2\text{ms}^{-1}$および80000$\mu\text{m}^2\text{ms}^{-1}$まで拡張する。我々のモデルにより予測された分子軌道は、特に重力支援加速度のような挙動において、惑星運動に類似している。ナノロボティクス、ナノスケールでの運動制御、特に原子と分子のトラップが不可欠である分子と量子コンピューティングの分野でのコンピューティング応用への変革的な意味を持っている。原子/分子操作のための最先端の光学格子と走査型トンネル顕微鏡の他に、我々はアングストロームスケールでの量子操作による単一分子ダイナミクスの精密制御の利点を明確化している。

Achieving dynamic manipulation and control of single molecules at high spatio-temporal resolution is pivotal for advancing atomic-scale computing and nanorobotics. However, this endeavour is critically challenged by the complex nature of atomic and molecular interactions, the high-dimensional characteristics of nanoscale systems, and the scarcity of experimental data. Here, we present a toy model for controlling single-molecule diffusion by harnessing electrostatic forces arising from elementary surface charges within a lattice structure, mimicking embedded charges on a surface. We investigate the interplay between quantum mechanics and electrostatic interactions in single-molecule diffusion processes using a combination of state-dependent diffusion equations and Green's functions. We find that surface charge density critically influences diffusion coefficients, exhibiting linear scaling akin to Coulombic forces. We achieve accurate predictions of experimental diffusion constants and extend the observed range to values reaching up to 6000 $\mu\text{m}^2\text{ms}^{-1}$ and 80000 $\mu\text{m}^2\text{ms}^{-1}$. The molecular trajectories predicted by our model bear resemblance to planetary motion, particularly in their gravity-assisted acceleration-like behaviour. This approach has transformative implications for nanorobotics, motion control at the nanoscale, and computing applications, particularly in the areas of molecular and quantum computing where the trapping of atoms and molecules is essential. Beyond state-of-the-art optical lattices and scanning tunnelling microscopy for atomic/molecular manipulation, our findings show a clear advantage in precise control over single-molecule dynamics through quantum manipulation at the angstrom scale.

翻訳日:2024-06-19 12:30:40 公開日:2024-06-17

# 音声認識のための多段階大規模言語モデル補正

Multi-stage Large Language Model Correction for Speech Recognition ( http://arxiv.org/abs/2310.11532v2 )

ライセンス: Link先を確認

Jie Pu, Thai-Son Nguyen, Sebastian Stüker,

(参考訳) 本稿では,大規模言語モデル(LLM)を用いて,競合音声認識システムの性能向上を図る。従来のLLMに基づくASR誤り訂正法とは違って,ASR出力の不確実性推定とLLMの推論能力を利用した新しい多段階手法を提案する。具体的には、提案手法には2つの段階がある: 第一段階は、ASRの不確実性の推定であり、N-bestリストの仮説を利用して、信頼性の低い転写を識別する。この修正タスクは多段階ルールに基づくLCM推論プロセスとして定式化され、明示的に記述されたルールを使用して、タスクを具体的な推論ステップに分解する。提案手法の有効性は,複数のテスト領域およびゼロショット設定において,競合するASRシステムに対するWERの10%～20%の相対的な改善を示すことによって実証された。

In this paper, we investigate the usage of large language models (LLMs) to improve the performance of competitive speech recognition systems. Different from previous LLM-based ASR error correction methods, we propose a novel multi-stage approach that utilizes uncertainty estimation of ASR outputs and the reasoning capability of LLMs. Specifically, the proposed approach has two stages. The first stage performs ASR uncertainty estimation and exploits N-best list hypotheses to identify less reliable transcriptions. The second stage works on these identified transcriptions and performs LLM-based corrections. This correction task is formulated as a multi-step rule-based LLM reasoning process, which uses explicitly written rules in prompts to decompose the task into concrete reasoning steps. Our experimental results demonstrate the effectiveness of the proposed method by showing 10% to 20% relative improvement in WER over competitive ASR systems, across multiple test domains and in zero-shot settings.
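The first stage, flagging unreliable transcriptions from the N-best list, might look something like the sketch below. The abstract does not say how uncertainty is computed, so the disagreement score (mean normalized edit distance between the top hypothesis and the others), the function names, and the threshold value are assumptions for illustration only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nbest_disagreement(nbest):
    """Mean normalized word-level edit distance from the top hypothesis."""
    top = nbest[0].split()
    others = [h.split() for h in nbest[1:]]
    if not others:
        return 0.0
    scores = [edit_distance(top, o) / max(len(top), len(o), 1) for o in others]
    return sum(scores) / len(scores)


def needs_correction(nbest, threshold=0.2):
    """Flag an utterance for LLM correction when hypotheses disagree a lot.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    return nbest_disagreement(nbest) > threshold


if __name__ == "__main__":
    confident = ["turn on the light", "turn on the light"]
    uncertain = ["turn on the light", "turn of the night"]
    print(needs_correction(confident), needs_correction(uncertain))
```

Only utterances flagged this way would be passed to the second, LLM-based stage, which keeps confident transcriptions untouched and limits the cost of LLM calls.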

翻訳日:2024-06-19 12:30:40 公開日:2024-06-17

# 汎化可能な乳房腫瘍セグメンテーションのためのプログレッシブ・デュアルプライオリネットワーク

Progressive Dual Priori Network for Generalized Breast Tumor Segmentation ( http://arxiv.org/abs/2310.13574v2 )

ライセンス: Link先を確認

Li Wang, Lihui Wang, Zixiang Kuai, Lei Tang, Yingfeng Ou, Chen Ye, Yuemin Zhu,

(参考訳) 乳房腫瘍セグメント化モデルの一般化能力の向上と,より小型で低コントラストで不規則な形状の乳房腫瘍に対するセグメンテーション性能の向上を目的として,異なるセンターで取得したダイナミックエンハンスメント磁気共鳴画像(DCE-MRI)から乳房腫瘍を分割するプログレッシブ・デュアルプライオリティ・ネットワーク(PDPNet)を提案する。 PDPNetは,まず粗いセグメンテーションをベースとした局在モジュールを持つ腫瘍領域を収穫し,弱いセマンティックオーディションとクロススケール相関の事前知識を用いて乳房腫瘍マスクを徐々に改良した。 PDPNetの有効性を検証するため,マルチセンタデータセット上での最先端手法との比較を行った。その結果, PDPNet の DSC と HD95 はそれぞれ5.13%, 7.58% 改善した。さらに, 局所化モジュールが正常組織の影響を低減し, モデルの一般化能力を向上させることを実証した。弱いセマンティクスにより、腫瘍領域に焦点を合わせることで、欠損した小腫瘍や低コントラスト腫瘍を避けることができる。クロススケール相関は不規則腫瘍の形状認識能力を促進するのに有用である。したがって、それらを統合されたフレームワークに統合することで、マルチセンターの乳がんセグメンテーション性能が向上した。ソースコードとオープンデータはhttps://github.com/wangli100209/PDPNetでアクセスできる。

To promote the generalization ability of breast tumor segmentation models, as well as to improve the segmentation performance for breast tumors with small size, low contrast, and irregular shape, we propose a progressive dual priori network (PDPNet) to segment breast tumors from dynamic contrast-enhanced magnetic resonance images (DCE-MRI) acquired at different centers. The PDPNet first crops tumor regions with a coarse-segmentation-based localization module, and then progressively refines the breast tumor mask by using the weak semantic priors and cross-scale correlation priors. To validate the effectiveness of PDPNet, we compared it with several state-of-the-art methods on multi-center datasets. The results showed that, compared with the suboptimal method, the DSC and HD95 of PDPNet were improved by at least 5.13% and 7.58% respectively on multi-center test sets. In addition, through ablations, we demonstrated that the proposed localization module can decrease the influence of normal tissues and therefore improve the generalization ability of the model. The weak semantic priors allow focusing on tumor regions to avoid missing small tumors and low-contrast tumors. The cross-scale correlation priors are beneficial for promoting the shape-aware ability for irregular tumors. Thus, integrating them in a unified framework improved the multi-center breast tumor segmentation performance. The source code and open data can be accessed at https://github.com/wangli100209/PDPNet.
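Two pieces of the pipeline above lend themselves to a short sketch: cropping around a coarse segmentation and scoring a mask with the Dice similarity coefficient (DSC). The code below works on 2-D binary masks stored as nested lists; the function names and the `margin` parameter are assumptions for illustration and are not taken from the PDPNet repository.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks (values 0 or 1)."""
    intersection = pred_sum = truth_sum = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += p * t
            pred_sum += p
            truth_sum += t
    if pred_sum + truth_sum == 0:
        # Both masks empty: treat as perfect agreement.
        return 1.0
    return 2.0 * intersection / (pred_sum + truth_sum)


def bounding_box(mask, margin=0):
    """Inclusive (row_min, row_max, col_min, col_max) around nonzero pixels.

    Returns None for an empty mask. `margin` pads the box, clipped to the
    image bounds, as a localization step might do before refinement.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return None
    height, width = len(mask), len(mask[0])
    return (max(min(rows) - margin, 0), min(max(rows) + margin, height - 1),
            max(min(cols) - margin, 0), min(max(cols) + margin, width - 1))


if __name__ == "__main__":
    coarse = [[0, 1, 1], [0, 1, 0]]
    truth = [[0, 1, 0], [0, 1, 0]]
    print(dice_coefficient(coarse, truth))
    print(bounding_box(coarse, margin=1))
```

HD95, the other metric reported in the abstract, measures boundary distance rather than overlap and is not covered by this sketch.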

翻訳日:2024-06-19 12:20:53 公開日:2024-06-17

# 階層型ガウス過程とニューラルネットワーク回帰によるカリフォルニア・セントラルバレーの地下水位モデリング

Modeling groundwater levels in California's Central Valley by hierarchical Gaussian process and neural network regression ( http://arxiv.org/abs/2310.14555v2 )

ライセンス: Link先を確認

Anshuman Pradhan, Kyra H. Adams, Venkat Chandrasekaran, Zhen Liu, John T. Reager, Andrew M. Stuart, Michael J. Turmon,

(参考訳) カリフォルニア州のセントラル・バレー(CV)で連続的に地下水位をモデル化することは、低品質の井戸データが時間と空間にわたって希少にサンプリングされ、困難である。一貫性のある井戸データがないため、2012-2015年の激しい干ばつの後、2017年と2019年の湿潤年がCV地下水に与える影響を評価するのは難しい。 CV帯水層における3次元岩相テクスチャモデルから学習することにより,地下水位をモデル化するための新しい機械学習手法を定式化した。提案法は,ガウス過程(GP)とディープニューラルネットワーク(DNN)を組み合わせて多変量回帰を行う。階層的モデリング手法はDNNを訓練し、GPによる非パラメトリック回帰が実行されるリソロジー的に情報を得た潜在空間を学習する。高速かつ確実な不確実性定量化を伴う井戸データの非定常特性をモデル化するためのGP-DNN回帰の有効性を示す。本研究では,不規則な井戸データを持つ流域における帯水層応答に対する水文学的理解を補うためにモデル予測がどのように用いられるかを示す。以上の結果から,2017年と2019年のカリフォルニアの湿潤年は,前回の干ばつによる地下水損失の補充にはほとんど効果がなかったことが示唆された。

Modeling groundwater levels continuously across California's Central Valley (CV) hydrological system is challenging due to low-quality well data which is sparsely and noisily sampled across time and space. The lack of consistent well data makes it difficult to evaluate the impact of 2017 and 2019 wet years on CV groundwater following a severe drought during 2012-2015. A novel machine learning method is formulated for modeling groundwater levels by learning from a 3D lithological texture model of the CV aquifer. The proposed formulation performs multivariate regression by combining Gaussian processes (GP) and deep neural networks (DNN). The hierarchical modeling approach constitutes training the DNN to learn a lithologically informed latent space where non-parametric regression with GP is performed. We demonstrate the efficacy of GP-DNN regression for modeling non-stationary features in the well data with fast and reliable uncertainty quantification, as validated to be statistically consistent with the empirical data distribution from 90 blind wells across CV. We show how the model predictions may be used to supplement hydrological understanding of aquifer responses in basins with irregular well data. Our results indicate that on average the 2017 and 2019 wet years in California were largely ineffective in replenishing the groundwater loss caused during previous drought years.

翻訳日:2024-06-19 12:20:53 公開日:2024-06-17

# 私の注意に基づくASRシステムはどれくらいのコンテキストが必要か?

How Much Context Does My Attention-Based ASR System Need? ( http://arxiv.org/abs/2310.15672v2 )

ライセンス: Link先を確認

Robert Flynn, Anton Ragni,

(参考訳) 音声認識のタスクでは,30秒以上の音響コンテキストの使用は稀であり,文献ではあまり語られていない。本研究では,音響モデルの訓練・評価に用いるシーケンス長が音声認識性能に与える影響について実験的検討を行った。これらの実験では、約10万個の擬似ラベル付きSpotifyポッドキャストのデータセットを使用し、コンテキストの長さは5秒から1時間である。ゼロショット評価は、Earnings-22、Tedlium、Rev16といったロングフォーマットデータセットに表示される。その結果、最大21.8分間の音響コンテキストでトレーニングを行うことの利点が示され、10秒のコンテキストでトレーニングしたベースラインから14.5\%の相対的な改善が見られた。モデルの幅・深度,位置符号化方式,注目点数などによって,より長いコンテキストを使うことができることが判明した。

For the task of speech recognition, the use of more than 30 seconds of acoustic context during training is uncommon and under-investigated in the literature. In this work, we conduct an empirical study on the effect of scaling the sequence length used to train/evaluate (dense-attention-based) acoustic models on speech recognition performance. For these experiments, a dataset of roughly 100,000 pseudo-labelled Spotify podcasts is used, with context lengths of 5 seconds to 1 hour being explored. Zero-shot evaluations are presented on the long-format datasets: Earnings-22, Tedlium and Rev16. Results demonstrate a benefit from training with up to 21.8 minutes of acoustic context, showing up to a 14.5% relative improvement from a baseline trained with 10 seconds of context. We find that the model's width/depth, positional encoding scheme and number of attention heads impact its ability to use longer contexts.

翻訳日:2024-06-19 12:20:53 公開日:2024-06-17

# 生成言語モデルが社会的アイデンティティのバイアスを表わす

Generative Language Models Exhibit Social Identity Biases ( http://arxiv.org/abs/2310.15819v2 )

ライセンス: Link先を確認

Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek,

(参考訳) 大規模言語モデルの人気が高まるにつれ、これらのモデルが人間から学びうるバイアスへの懸念が高まっている。本研究では、社会心理学で知られる基本的な社会的アイデンティティバイアスである内集団への連帯と外集団への敵意が、56の大規模言語モデルに存在するかを調べる。ほぼすべての基盤言語モデルと一部の命令微調整モデルは、文の補完を促されたとき(例:「We are...」)、明確な内集団肯定的・外集団否定的な連想を示す。この結果は、現代の言語モデルが、実験室でも実世界のLLMとの会話でも、人間と同程度に基本的な社会的アイデンティティバイアスを示すこと、そして学習データのキュレーションと命令微調整によってそうしたバイアスを軽減できることを示唆している。本研究の結果は、バイアスの少ない大規模言語モデルを作るうえで実用的な意味を持ち、人間のバイアスが強化されるのを防ぐために、ユーザとLLMのインタラクションをさらに研究する必要性を強調している。

The surge in popularity of large language models has given rise to concerns about biases that these models could learn from humans. We investigate whether ingroup solidarity and outgroup hostility, fundamental social identity biases known from social psychology, are present in 56 large language models. We find that almost all foundational language models and some instruction fine-tuned models exhibit clear ingroup-positive and outgroup-negative associations when prompted to complete sentences (e.g., "We are..."). Our findings suggest that modern language models exhibit fundamental social identity biases to a similar degree as humans, both in the lab and in real-world conversations with LLMs, and that curating training data and instruction fine-tuning can mitigate such biases. Our results have practical implications for creating less biased large-language models and further underscore the need for more research into user interactions with LLMs to prevent potential bias reinforcement in humans.

翻訳日:2024-06-19 12:20:53 公開日:2024-06-17

# 固有ベクトル継続と投影型エミュレータ

Eigenvector Continuation and Projection-Based Emulators ( http://arxiv.org/abs/2310.19419v3 )

ライセンス: Link先を確認

Thomas Duguet, Andreas Ekström, Richard J. Furnstahl, Sebastian König, Dean Lee,

(参考訳) 固有ベクトル継続(英: Eigenvector continuation)は、パラメータ集合の固有ベクトルスナップショットから派生した部分空間射影を用いたパラメトリック固有値問題の計算方法である。還元基底法(reduce-basis method)と呼ばれる、より広範な部分空間射影技法のクラスの一部である。本稿では固有ベクトル継続と射影型エミュレータの開発、理論、応用について述べる。本稿では,基本概念を紹介し,基礎となる理論と収束特性について論じるとともに,近年の量子システムへの応用と今後の展望について述べる。

Eigenvector continuation is a computational method for parametric eigenvalue problems that uses subspace projection with a basis derived from eigenvector snapshots from different parameter sets. It is part of a broader class of subspace-projection techniques called reduced-basis methods. In this colloquium article, we present the development, theory, and applications of eigenvector continuation and projection-based emulators. We introduce the basic concepts, discuss the underlying theory and convergence properties, and present recent applications for quantum systems and future prospects.
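The projection step described above can be illustrated with a minimal sketch. It assumes a hypothetical affine family H(c) = H0 + c·H1 of random symmetric matrices (not taken from the article) and orthonormalises the snapshots with a QR factorisation instead of solving the generalised eigenvalue problem with a norm matrix; the two formulations span the same subspace. All names (`ground_state`, `ec_emulate`, the training values) are illustrative.

```python
import numpy as np

def ground_state(h):
    """Return (lowest eigenvalue, eigenvector) of a real symmetric matrix."""
    vals, vecs = np.linalg.eigh(h)  # eigenvalues come back in ascending order
    return vals[0], vecs[:, 0]

def ec_emulate(h0, h1, train_cs, target_c):
    """Estimate the ground-state energy of h0 + target_c*h1 from snapshots."""
    # Snapshots: exact ground states at the training parameter values.
    snapshots = np.column_stack(
        [ground_state(h0 + c * h1)[1] for c in train_cs]
    )
    # Orthonormal basis for the snapshot subspace (assumption: QR instead
    # of a generalised eigenproblem with an overlap matrix).
    basis, _ = np.linalg.qr(snapshots)
    # Project the target Hamiltonian and solve the small eigenproblem.
    h_small = basis.T @ (h0 + target_c * h1) @ basis
    return np.linalg.eigvalsh(h_small)[0]

# Hypothetical test problem: two random symmetric 50x50 matrices.
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 50))
b = rng.normal(size=(50, 50))
h0, h1 = (a + a.T) / 2, (b + b.T) / 2

train_cs = [0.0, 0.1, 0.2, 0.3]
target_c = 0.5
emulated = ec_emulate(h0, h1, train_cs, target_c)
exact = ground_state(h0 + target_c * h1)[0]
print(f"emulated={emulated:.6f} exact={exact:.6f}")
```

Because the emulator is a Rayleigh-Ritz projection, its estimate can never fall below the exact ground-state energy, and it is never worse than the energy expectation of any single snapshot. Only a 4x4 eigenproblem is solved at the target parameter.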

翻訳日:2024-06-19 12:20:53 公開日:2024-06-17

# Pseudorandom Statesから$\bot$-PRFsによる署名

Signatures From Pseudorandom States via $\bot$-PRFs ( http://arxiv.org/abs/2311.00847v4 )

ライセンス: Link先を確認

Mohammed Barhoush, Amit Behera, Lior Ozer, Louis Salvail, Or Sattath,

(参考訳) 量子擬似ランダム性の異なるフレーバーは、様々な暗号アプリケーションに有用であることが証明されており、これらのプリミティブは量子後片道関数よりも弱い可能性がある。 Ananth, Lin, and Yuen (2023) は、対数擬似ランダム状態が擬決定論的PRGを構成するのに使えることを示した。本研究では, $\bot$-PRG と $\bot$-PRF の新たな定義を導入する。正当性保証は、固定種の場合、無視可能な確率を除いて、出力が同一(確率1-1/poly$)または認識可能な中止($\bot$)である。当社のアプローチは、PRFの適応セキュリティと同様に、マルチタイムPRGセキュリティの自然な定義を認めている。疑似決定論的PRGから$\bot$-PRGを構築し、そこから$\bot$-PRFを得る。対称鍵暗号、コミットメント、MAC、長さ制限されたワンタイムデジタルシグネチャなど、ほとんどのミニ暗号化プリミティブは、様々な量子擬似ランダム性の仮定に基づいて示されているが、デジタルシグネチャは解明されていない。本研究の主な応用は,古典的な公開鍵と署名を備えた(量子)デジタル署名方式であり,森前と山川の作品(クリプト,2022年)に提示された未解決問題に対処するものである。さらに, タンパーレジリエントな量子公開鍵を用いたセキュアな公開鍵暗号を構築する。

Different flavors of quantum pseudorandomness have proven useful for various cryptographic applications, with the compelling feature that these primitives are potentially weaker than post-quantum one-way functions. Ananth, Lin, and Yuen (2023) have shown that logarithmic pseudorandom states can be used to construct a pseudo-deterministic PRG: informally, for a fixed seed, the output is the same with $1-1/poly$ probability. In this work, we introduce new definitions for $\bot$-PRG and $\bot$-PRF. The correctness guarantees are that, for a fixed seed, except with negligible probability, the output is either the same (with probability $1-1/poly$) or a recognizable abort, denoted $\bot$. Our approach admits a natural definition of multi-time PRG security, as well as the adaptive security of a PRF. We construct a $\bot$-PRG from any pseudo-deterministic PRG and, from that, a $\bot$-PRF. Even though most mini-crypt primitives, such as symmetric key encryption, commitments, MAC, and length-restricted one-time digital signatures, have been shown based on various quantum pseudorandomness assumptions, digital signatures remained elusive. Our main application is a (quantum) digital signature scheme with classical public keys and signatures, thereby addressing a previously unresolved question posed in Morimae and Yamakawa's work (Crypto, 2022). Additionally, we construct CPA secure public-key encryption with tamper-resilient quantum public keys.

翻訳日:2024-06-19 12:20:53 公開日:2024-06-17

# AdaNCA: よりロバストな視覚変換器のアダプターとしての神経細胞性オートマタ

AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer ( http://arxiv.org/abs/2406.08298v2 )

ライセンス: Link先を確認

Yitao Xu, Tong Zhang, Sabine Süsstrunk,

(参考訳) 視覚変換器(ViT)は画像分類タスクにおいて、特に局所的な注意や畳み込みによる局所的な情報を備えた場合、顕著な性能を示した。このようなアーキテクチャは異なる粒度からの特徴集約を改善するが、ネットワークの堅牢性に寄与しないことが多い。ニューラルセルオートマタ(NCA)は、局所的な相互作用を通じてグローバルなセル表現のモデリングを可能にし、そのトレーニング戦略とアーキテクチャ設計は、ノイズの多い入力に対して強力な一般化能力と堅牢性をもたらす。本稿では,NCAをViT層間のプラグアンドプレイ型アダプタとして用い,敵対的サンプルおよび分布外入力に対するViTの性能と堅牢性を高める,視覚変換器用Adaptor Neural Cellular Automata (AdaNCA)を提案する。標準的なNCAの大きな計算オーバーヘッドを克服するために,より効率的な相互作用学習のためのDynamic Interactionを提案する。さらに,AdaNCAの配置解析とロバスト性改善に基づいて,AdaNCAの最も効果的な挿入点を同定するアルゴリズムを開発した。パラメータの3%未満の増加により、AdaNCAはImageNet1Kベンチマークの敵攻撃下での精度の10%以上の絶対的な改善に貢献している。さらに,8つのロバスト性ベンチマークと4つのViTアーキテクチャに対して,プラグインモジュールであるAdaNCAが常にViTのロバスト性を改善することを実証した。

Vision Transformers (ViTs) have demonstrated remarkable performance in image classification tasks, particularly when equipped with local information via region attention or convolutions. While such architectures improve the feature aggregation from different granularities, they often fail to contribute to the robustness of the networks. Neural Cellular Automata (NCA) enables the modeling of global cell representations through local interactions, with its training strategies and architecture design conferring strong generalization ability and robustness against noisy inputs. In this paper, we propose Adaptor Neural Cellular Automata (AdaNCA) for Vision Transformer that uses NCA as plug-and-play adaptors between ViT layers, enhancing ViT's performance and robustness against adversarial samples as well as out-of-distribution inputs. To overcome the large computational overhead of standard NCAs, we propose Dynamic Interaction for more efficient interaction learning. Furthermore, we develop an algorithm for identifying the most effective insertion points for AdaNCA based on our analysis of AdaNCA placement and robustness improvement. With less than a 3% increase in parameters, AdaNCA contributes to more than 10% absolute improvement in accuracy under adversarial attacks on the ImageNet1K benchmark. Moreover, we demonstrate with extensive evaluations across 8 robustness benchmarks and 4 ViT architectures that AdaNCA, as a plug-and-play module, consistently improves the robustness of ViTs.

翻訳日:2024-06-19 12:11:07 公開日:2024-06-17

# VLind-Bench: 大規模視覚言語モデルにおける言語事前測定

VLind-Bench: Measuring Language Priors in Large Vision-Language Models ( http://arxiv.org/abs/2406.08702v2 )

ライセンス: Link先を確認

Kang-il Lee, Minbeom Kim, Minsung Kim, Dongryeol Lee, Hyukhun Koh, Kyomin Jung,

(参考訳) LVLM(Large Vision-Language Models)は、様々なマルチモーダルタスクにおいて優れた性能を示す。しかし、それらは、画像情報を無視しながら、テキストパターンのみに基づいて応答が生成される、言語事前(Language prior)と呼ばれる問題に悩まされている。事前言語の問題に対処することは、トレーニングディストリビューション外の画像を扱う際に、望ましくない偏見や幻覚を引き起こす可能性があるため、非常に重要である。その重要性にもかかわらず、LVLMにおける言語先行を正確に測定する現在の手法は、あまり研究されていない。既存のベンチマークは、反ファクトやアウト・オブ・ディストリビューションのイメージに基づいており、部分的に言語先行を計測することができるが、言語先行を他の要因から切り離すことはできない。この目的のために我々は,LVLM の言語先行,すなわち盲点を測定するために設計された最初のベンチマークである VLind-Bench という新しいベンチマークを提案する。言語先行性を評価するために、対物画像に関するテストを含むだけでなく、コモンセンス知識、視覚知覚、コモンセンスバイアスなど、より基本的な機能を評価する一連のテストも含んでいる。ベンチマーク中の各インスタンスについて、これらの基本テストが言語事前評価の前にパスされることを保証し、その結果、他の要素が評価に与える影響を最小限に抑える。近年のLVLMの評価と分析により,ほぼすべてのモデルが言語先行に大きく依存していることが判明した。

Large Vision-Language Models (LVLMs) have demonstrated outstanding performance across various multimodal tasks. However, they suffer from a problem known as language prior, where responses are generated based solely on textual patterns while disregarding image information. Addressing the issue of language prior is crucial, as it can lead to undesirable biases or hallucinations when dealing with images that are out of training distribution. Despite its importance, current methods for accurately measuring language priors in LVLMs are poorly studied. Although existing benchmarks based on counterfactual or out-of-distribution images can partially be used to measure language priors, they fail to disentangle language priors from other confounding factors. To this end, we propose a new benchmark called VLind-Bench, which is the first benchmark specifically designed to measure the language priors, or blindness, of LVLMs. It not only includes tests on counterfactual images to assess language priors but also involves a series of tests to evaluate more basic capabilities such as commonsense knowledge, visual perception, and commonsense biases. For each instance in our benchmark, we ensure that all these basic tests are passed before evaluating the language priors, thereby minimizing the influence of other factors on the assessment. The evaluation and analysis of recent LVLMs in our benchmark reveal that almost all models exhibit a significant reliance on language priors, presenting a strong challenge in the field.

翻訳日:2024-06-19 12:01:13 公開日:2024-06-17

# フレキシブル・高効率密度行列アルゴリズムによる任意テンソルネットワークの近似縮約

Approximate Contraction of Arbitrary Tensor Networks with a Flexible and Efficient Density Matrix Algorithm ( http://arxiv.org/abs/2406.09769v2 )

ライセンス: Link先を確認

Linjian Ma, Matthew Fishman, Miles Stoudenmire, Edgar Solomonik,

(参考訳) テンソルネットワークの収縮は、統計物理学、量子コンピューティング、計算機科学で広く使われている。低ランク近似を用いてテンソルネットワークの縮約を効率的に近似する手法を提案し、この縮約時に生成された各中間テンソルを低ランク二分木テンソルネットワークとして近似する。提案アルゴリズムは,低ランク近似を行う場合,環境の大部分を組み込むことが可能である。ここでは、この環境はネットワーク内のテンソルの残りの集合を指し、より大きな環境を持つ低ランク近似は一般により高い精度を提供する。格子上に定義されたテンソルネットワークを縮約するために、提案アルゴリズムは標準境界ベースアルゴリズムの一般化と見なすことができる。さらに、このアルゴリズムは、一般的なグラフ構造を持つテンソルネットワークを木構造に近似するためのコスト効率の高い密度行列アルゴリズムを含む。実験結果から,提案手法は従来提案した近似テンソルネットワーク縮合アルゴリズムよりも精度と効率の両面から,複数の問題に対して優れていたことが示唆された。

Tensor network contractions are widely used in statistical physics, quantum computing, and computer science. We introduce a method to efficiently approximate tensor network contractions using low-rank approximations, where each intermediate tensor generated during the contractions is approximated as a low-rank binary tree tensor network. The proposed algorithm has the flexibility to incorporate a large portion of the environment when performing low-rank approximations, which can lead to high accuracy for a given rank. Here, the environment refers to the remaining set of tensors in the network, and low-rank approximations with larger environments can generally provide higher accuracy. For contracting tensor networks defined on lattices, the proposed algorithm can be viewed as a generalization of the standard boundary-based algorithms. In addition, the algorithm includes a cost-efficient density matrix algorithm for approximating a tensor network with a general graph structure into a tree structure, whose computational cost is asymptotically upper-bounded by that of the standard algorithm that uses canonicalization. Experimental results indicate that the proposed technique outperforms previously proposed approximate tensor network contraction algorithms for multiple problems in terms of both accuracy and efficiency.
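The basic building block behind a density-matrix truncation can be shown on a single matrix. This is a minimal sketch, not the algorithm proposed in the paper. It assumes a tensor has been reshaped into a matrix `m` whose rows index the bond to be kept and whose columns collect everything else (a stand-in for the "environment"), and it checks that truncating with the leading eigenvectors of `m @ m.T` reproduces the optimal SVD error. The function name and the test data are illustrative.

```python
import numpy as np

def density_matrix_truncate(m, rank):
    """Rank-`rank` approximation of m from the density matrix m @ m.T."""
    rho = m @ m.T                    # density matrix over the row index
    _, vecs = np.linalg.eigh(rho)    # eigenvalues ascending
    u = vecs[:, -rank:]              # keep the `rank` leading eigenvectors
    return u @ (u.T @ m)             # project m onto the kept subspace

# Hypothetical data: a 6 x 40 random matrix truncated to rank 3.
rng = np.random.default_rng(0)
m = rng.normal(size=(6, 40))
approx = density_matrix_truncate(m, 3)

# Compare with the optimal rank-3 error given by the discarded singular values.
s = np.linalg.svd(m, compute_uv=False)
err = np.linalg.norm(m - approx)
best = np.sqrt(np.sum(s[3:] ** 2))
print(f"truncation error={err:.6f} optimal={best:.6f}")
```

The density matrix here is only 6x6 even though `m` has 40 columns. This is why forming it can be cheaper than decomposing the full matrix when the environment is large.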

翻訳日:2024-06-19 12:01:13 公開日:2024-06-17

# 組合せ最適化のためのQ-Learningを用いたポインタネットワーク

Pointer Networks with Q-Learning for Combinatorial Optimization ( http://arxiv.org/abs/2311.02629v3 )

ライセンス: Link先を確認

Alessandro Barro,

(参考訳) 本稿では、モデルフリーなQ値ポリシー近似をPointer Networks(Ptr-Nets)と統合したハイブリッドニューラルネットワークであるPointer Q-Network(PQN)を紹介し、長期的成果に焦点をあてて、注目に基づくシーケンス生成の最適性を高める。この統合は特に組合せ最適化(CO)タスク、特に本研究の焦点であるトラベリングセールスマン問題(TSP)の解決に有効である。 PQNと互換性のあるマルコフ決定プロセス(MDP)を定義することでこの問題に対処する。このプロセスは、コンテキストベクトルを生成し、ソフトマックスを適用する前に、利用可能なすべての状態-作用対について計算されたQ値によって動的に調整される生の注意スコアを算出する。得られた注目ベクトルは行動分布として利用され、PQNの探索・探索動的適応性によって選択される。実験により,本手法の有効性を実証し,不安定な環境でモデルをテストする。

We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets) to enhance the optimality of attention-based sequence generation, focusing on long-term outcomes. This integration proves particularly effective in solving combinatorial optimization (CO) tasks, especially the Travelling Salesman Problem (TSP), which is the focus of our study. We address this challenge by defining a Markov Decision Process (MDP) compatible with PQN, which involves iterative graph embedding, encoding and decoding by an LSTM-based recurrent neural network. This process generates a context vector and computes raw attention scores, which are dynamically adjusted by Q-values calculated for all available state-action pairs before applying softmax. The resulting attention vector is used as an action distribution, with actions selected according to PQN's dynamic exploration-exploitation adaptability. Our empirical results demonstrate the efficacy of this approach, including tests of the model in unstable environments.
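The abstract does not say how the Q-values adjust the raw attention scores. The sketch below assumes the simplest option, an additive shift before the softmax, with already-visited cities masked out. The function name and the numbers are hypothetical, and the paper's actual rule may differ.

```python
import math

def q_adjusted_attention(scores, q_values, visited):
    """Softmax over (score + Q-value); visited cities get probability 0.

    Assumes at least one city is still unvisited.
    """
    # Additive adjustment is an assumption made for illustration.
    logits = [
        -math.inf if seen else s + q
        for s, q, seen in zip(scores, q_values, visited)
    ]
    peak = max(logits)                       # subtract max for stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Three candidate cities with equal raw attention; city 0 is already visited
# and city 1 has the highest (hypothetical) Q-value.
probs = q_adjusted_attention(
    scores=[1.0, 1.0, 1.0],
    q_values=[0.0, 2.0, 0.0],
    visited=[True, False, False],
)
print(probs)
```

With equal raw scores, the Q-values alone decide the ranking, so city 1 receives most of the probability mass.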

翻訳日:2024-06-19 11:31:28 公開日:2024-06-17

# 大規模言語モデルは知識推論のための文脈内教師である

Large Language Models are In-context Teachers for Knowledge Reasoning ( http://arxiv.org/abs/2311.06985v2 )

ライセンス: Link先を確認

Jiachen Zhao, Zonghai Yao, Zhichao Yang, Hong Yu,

(参考訳) CoT(Chain-of- Thought)プロンプトは、単に情報検索以上の情報を必要とするクエリを推論するために、コンテキストにおいて大きな言語モデル(LLM)を教える。しかし、人間の専門家は通常、高コストでばらつきが高いインコンテキスト学習(ICL)のデモンストレーションを作成する必要がある。さらに重要なことは、ICLの有用な推論例を作る方法が明確でないことである。本研究は,LLMが知識推論に有効であるかどうかを考察する。我々は、人間の記憶検索における「エンコード特異性」仮説に従い、推論におけるインコンテキストの例は、トレーニングデータのエンコーディングコンテキストと一致するべきであると仮定する。そこで本研究では, LLM の自己記述的説明を実例から一般化する上で, 自己記述的説明を文脈内説明として用いることを提案する。自己説明は、人造の模範やその他のベースラインを用いて、著しく優れていた。さらに、文脈内教育において、LLMの自己説明とより類似した、異なる教師のLLMや人間の専門家による合理性は、私たちの符号化特異性仮説を支持する、より良い実証であることを明らかにした。そこで,本研究では,教師のLLMを学生に合わせるTeach-Backを提案する。例えば Teach-Back は 7B モデルで,より大きな GPT-3.5 をコンテキストで教えることができる。

Chain-of-thought (CoT) prompting teaches large language models (LLMs) in context to reason over queries that require more than mere information retrieval. However, human experts are usually required to craft demonstrations for in-context learning (ICL), which is expensive and has high variance. More importantly, how to craft helpful reasoning exemplars for ICL remains unclear. In this work, we investigate whether LLMs can be better in-context teachers for knowledge reasoning. We follow the "encoding specificity" hypothesis in human memory retrieval to assume in-context exemplars at inference should match the encoding context in training data. We are thus motivated to propose Self-Explain to use one LLM's self-elicited explanations as in-context demonstrations for prompting it, as they are generalized from the model's training examples. Self-Explain is shown to significantly outperform using human-crafted exemplars and other baselines. We further reveal that for in-context teaching, rationales by distinct teacher LLMs or human experts that more resemble the student LLM's self-explanations are better demonstrations, which supports our encoding specificity hypothesis. We then propose Teach-Back that aligns the teacher LLM with the student to enhance the in-context teaching performance. For example, Teach-Back enables a 7B model to teach the much larger GPT-3.5 in context, surpassing human teachers by around 5% in test accuracy on medical question answering.

翻訳日:2024-06-19 11:31:28 公開日:2024-06-17

# Auto-ICL:人間の監督を伴わないインテクスト学習

Auto-ICL: In-Context Learning without Human Supervision ( http://arxiv.org/abs/2311.09263v2 )

ライセンス: Link先を確認

Jinghan Yang, Shuming Ma, Furu Wei,

(参考訳) コンテキスト内学習能力により、適切なコンテキストを提供すると、大きな言語モデルの性能が大幅に向上する。しかし、既存の文脈内学習法は主に、ラベル付き例や明示的な指示など、人間が提供する文脈に依存している。人間によるコンテキスト記述は、様々なタスクに労働集約的であり、モデルが人間によって管理可能なタスクに制限される。これらの制約を克服するために,モデルが問題解決のための例と指示を自律的に生成できる自動文脈学習フレームワークを提案する。様々なモデルとデータセットにわたる実験の結果、モデル生成コンテキストは、Few-ShotやFew-Shot-CoTなどの人間が注釈を付けたコンテキストを上回り、Zero-CoTやAuto-CoTといった既存の自己生成コンテキスト手法も上回ることが示された。

With in-context learning ability, the performance of large language models can be significantly boosted when provided with appropriate context. However, existing in-context learning methods mainly rely on human-provided contexts, such as labeled examples and explicit instructions. Writing context by hand is labor-intensive across tasks and limits the model to tasks manageable by humans. To overcome these limitations, we propose the Automatic In-Context Learning framework, which enables the model to autonomously generate examples and instructions for problem-solving. With experiments across various models and datasets, results show that model-generated contexts outperform human-annotated contexts, including Few-Shot and Few-Shot-CoT methods, and surpass existing self-generated context methods like Zero-CoT and Auto-CoT.

翻訳日:2024-06-19 11:31:28 公開日:2024-06-17

# InterControl: 全関節制御によるゼロショットヒューマンインタラクション生成

InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint ( http://arxiv.org/abs/2311.15864v3 )

ライセンス: Link先を確認

Zhenzhi Wang, Jingbo Wang, Yixuan Li, Dahua Lin, Bo Dai,

(参考訳) テキスト条件の運動合成は拡散モデルの出現とともに顕著な進歩を遂げた。しかしながら、これらの運動拡散モデルの大部分は、主に1つのキャラクタのために設計され、マルチヒューマンインタラクションを見落としている。提案手法では, ゼロショット方式で, 任意の大きさの文字群に対して, 人間の動きと相互作用を合成することにより, この問題を探究する。このアプローチのキーとなる側面は、人間の関節のペアとして人間のインタラクションを適応させることです。固定数の文字を持つ多人数動作データセット上でのトレーニング動作生成モデルを必要とする既存の手法とは対照的に,本手法は,任意の個数の個人を含む人間のインタラクションをモデル化する柔軟性を持ち,トレーニングデータに課される制約を超越する。関節間の所望距離を維持するために,新しい制御可能な運動生成手法であるInterControlを導入する。モーションコントローラと逆キネマティクス誘導モジュールで構成されており、合成された文字の関節を所望の場所に現実的に正確に整列させる。さらに, 既成のLarge Language Model (LLM) を用いて, ヒューマンインタラクションのための接合対間距離を生成できることを実証した。実験結果から,本フレームワークが複数の人体文字とのインタラクションを生成する能力と,既成の物理系シミュレータで作業する可能性を強調した。

Text-conditioned motion synthesis has made remarkable progress with the emergence of diffusion models. However, the majority of these motion diffusion models are primarily designed for a single character and overlook multi-human interactions. In our approach, we strive to explore this problem by synthesizing human motion with interactions for a group of characters of any size in a zero-shot manner. The key aspect of our approach is the adaptation of human-wise interactions as pairs of human joints that can be either in contact or separated by a desired distance. In contrast to existing methods that necessitate training motion generation models on multi-human motion datasets with a fixed number of characters, our approach inherently possesses the flexibility to model human interactions involving an arbitrary number of individuals, thereby transcending the limitations imposed by the training data. We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions to maintain the desired distance between joint pairs. It consists of a motion controller and an inverse kinematics guidance module that realistically and accurately aligns the joints of synthesized characters to the desired location. Furthermore, we demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model (LLM). Experimental results highlight the capability of our framework to generate interactions with multiple human characters and its potential to work with off-the-shelf physics-based character simulators.

翻訳日:2024-06-19 09:12:15 公開日:2024-06-17

# 独自のアウトプットから生じる大規模言語モデル:自己消費型学習ループの分析

Large Language Models Suffer From Their Own Output: An Analysis of the Self-Consuming Training Loop ( http://arxiv.org/abs/2311.16822v2 )

ライセンス: Link先を確認

Martin Briesch, Dominik Sobania, Franz Rothlauf,

(参考訳) 大規模言語モデル(LLM)は、様々なオンラインプラットフォーム向けのコンテンツを生成するために既に広く使われている。LLM生成コンテンツと人為的コンテンツとを確実に区別できないため、LLM生成コンテンツは次世代のLLMを訓練するために使われ、自己消費型トレーニングループが生まれる。画像生成の分野では、このような自己消費型トレーニングループが画像の品質と多様性の両方を低下させ、最終的にモデル崩壊に至ることが知られている。しかし、この憂慮すべき効果がLLMにも見られるかどうかは不明である。そこで本研究では,LLMの自己消費型トレーニングループを調査する最初の研究を示す。さらに,自然言語テキストでは困難な,LLM生成コンテンツの正確性の検証を曖昧さなく行える,論理式に基づく新しい手法を提案する。自己消費型トレーニングループは正しい出力を生成するが、使用した生成データの割合に応じて出力の多様性は低下する。新鮮なデータは、この低下を遅らせることはできるが、止めることはできない。これらの憂慮すべき結果を踏まえ、我々は研究者にこのプロセスを打ち消す方法の研究を奨励する。

Large Language Models (LLMs) are already widely used to generate content for a variety of online platforms. As we are not able to safely distinguish LLM-generated content from human-produced content, LLM-generated content is used to train the next generation of LLMs, giving rise to a self-consuming training loop. From the image generation domain we know that such a self-consuming training loop reduces both the quality and the diversity of images, eventually ending in model collapse. However, it is unclear whether this alarming effect can also be observed for LLMs. Therefore, we present the first study investigating the self-consuming training loop for LLMs. Further, we propose a novel method based on logic expressions that allows us to unambiguously verify the correctness of LLM-generated content, which is difficult for natural language text. We find that the self-consuming training loop produces correct outputs; however, the output declines in diversity depending on the proportion of generated data used. Fresh data can slow down this decline, but not stop it. Given these concerning results, we encourage researchers to study methods to negate this process.

翻訳日:2024-06-19 09:12:15 公開日:2024-06-17

# 3DオブジェクトのアノテーションにVLMベースのパイプラインを活用する

Leveraging VLM-Based Pipelines to Annotate 3D Objects ( http://arxiv.org/abs/2311.17851v2 )

ライセンス: Link先を確認

Rishabh Kabra, Loic Matthey, Alexander Lerchner, Niloy J. Mitra,

(参考訳) 事前訓練された視覚言語モデル(VLM)は、ラベルのない3Dオブジェクトを大規模にキャプションする機会を提供する。オブジェクトの異なるビュー(Luo et al , 2023)からのVLM記述を要約する主要なアプローチは、最終的な出力を生成するために言語モデル(GPT4)に依存している。このテキストベースの集約は、潜在的に矛盾する記述をマージするため、幻覚の影響を受けやすい。本稿では,VLMの応答に影響を与える視点などの要因を疎外する代替アルゴリズムを提案する。テキストのみの応答をマージする代わりに、VLMの合同画像テキストの可能性を利用する。確率的アグリゲーションは、より信頼性が高く、効率的であるだけでなく、人間の検証されたラベルに対するオブジェクトタイプをSoTAに当てはめている。集約されたアノテーションは条件付き推論にも有用であり、オブジェクトの型が補助的なテキストベースの入力として指定されたときに、下流の予測(オブジェクト材料の例)を改善する。このような補助的な入力は、教師なし環境における視覚的推論に対する視覚的推論の貢献を非難することを可能にする。これらの教師付きおよび教師なしの評価により、VLMベースのパイプラインをどのように活用して、Objaverseデータセットから764Kオブジェクトに対する信頼性の高いアノテーションを生成するかを示す。

Pretrained vision language models (VLMs) present an opportunity to caption unlabeled 3D objects at scale. The leading approach to summarize VLM descriptions from different views of an object (Luo et al., 2023) relies on a language model (GPT4) to produce the final output. This text-based aggregation is susceptible to hallucinations as it merges potentially contradictory descriptions. We propose an alternative algorithm to marginalize over factors such as the viewpoint that affect the VLM's response. Instead of merging text-only responses, we utilize the VLM's joint image-text likelihoods. We show our probabilistic aggregation is not only more reliable and efficient, but sets the SoTA on inferring object types with respect to human-verified labels. The aggregated annotations are also useful for conditional inference; they improve downstream predictions (e.g., of object material) when the object's type is specified as an auxiliary text-based input. Such auxiliary inputs allow ablating the contribution of visual reasoning over visionless reasoning in an unsupervised setting. With these supervised and unsupervised evaluations, we show how a VLM-based pipeline can be leveraged to produce reliable annotations for 764K objects from the Objaverse dataset.
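One way to read "marginalizing over the viewpoint" is to turn each view's candidate scores into a posterior over labels and then average those posteriors under a uniform prior over views. The sketch below makes exactly those assumptions. The scores are invented log-likelihoods, not VLM outputs, and the paper's actual scoring rule may differ.

```python
import math

def aggregate_over_views(view_scores):
    """Average per-view label posteriors (uniform prior over views).

    view_scores: list of {label: log-likelihood} dicts, one per view,
    all sharing the same candidate labels.
    """
    labels = list(view_scores[0])
    totals = {label: 0.0 for label in labels}
    for scores in view_scores:
        peak = max(scores.values())          # stabilise the softmax
        norm = sum(math.exp(s - peak) for s in scores.values())
        for label in labels:
            totals[label] += math.exp(scores[label] - peak) / norm
    return {label: t / len(view_scores) for label, t in totals.items()}

# Hypothetical log-likelihoods for two candidate object types in three views.
views = [
    {"mug": -1.0, "bowl": -3.0},
    {"mug": -2.0, "bowl": -1.5},   # an ambiguous view that prefers "bowl"
    {"mug": -1.0, "bowl": -2.5},
]
posterior = aggregate_over_views(views)
best_label = max(posterior, key=posterior.get)
print(posterior, best_label)
```

Unlike merging free-text captions, a single view that disagrees only shifts the averaged probabilities. It never adds contradictory text to the final annotation.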

翻訳日:2024-06-19 09:12:15 公開日:2024-06-17

# 強化学習のための最適攻撃と防御

Optimal Attack and Defense for Reinforcement Learning ( http://arxiv.org/abs/2312.00198v2 )

ライセンス: Link先を確認

Jeremy McMahan, Young Wu, Xiaojin Zhu, Qiaomin Xie,

(参考訳) 実システムにおける強化学習(Reinforcement Learning, RL)の有用性を確保するためには, ノイズや敵対的攻撃に対して堅牢であることを保証することが重要である。敵対的RLでは、外部の攻撃者が、被害者エージェントと環境との相互作用を操作できる。我々は、オンライン操作攻撃の全クラスについて研究する。これには (i) 状態攻撃、(ii) 観測攻撃(認識状態攻撃の一般化)、(iii) 行動攻撃、(iv) 報酬攻撃が含まれる。我々は,攻撃者自身の期待報酬を最大化するステルス攻撃を設計するという攻撃者の問題(多くの場合、被害者の価値の最小化に対応する)が,マルコフ決定プロセス(MDP)によって捉えられることを示す。これは真の環境ではなく,攻撃された相互作用によって誘導されるより高いレベルの環境であるため,メタMDPと呼ぶ。攻撃者は、多項式時間で計画するか、標準RL手法を用いて多項式サンプル複雑性で学習することで、最適な攻撃を導出できることを示す。我々は,被害者に対する最適な防衛方針を,部分的に観測可能なターンベース確率ゲーム(POTBSG)にさらに単純化できる確率的Stackelbergゲームの解として計算できると主張している。攻撃者も被害者も、それぞれの最適なポリシーから逸脱する恩恵を受けないため、そのような解決策は真に堅牢である。防御問題はNPハードであるが,多くのシナリオにおいて,最適マルコフディフェンスを多項式時間(サンプル複雑性)で計算(学習)できることを示す。

To ensure the usefulness of Reinforcement Learning (RL) in real systems, it is crucial to ensure they are robust to noise and adversarial attacks. In adversarial RL, an external attacker has the power to manipulate the victim agent's interaction with the environment. We study the full class of online manipulation attacks, which include (i) state attacks, (ii) observation attacks (which are a generalization of perceived-state attacks), (iii) action attacks, and (iv) reward attacks. We show the attacker's problem of designing a stealthy attack that maximizes its own expected reward, which often corresponds to minimizing the victim's value, is captured by a Markov Decision Process (MDP) that we call a meta-MDP since it is not the true environment but a higher level environment induced by the attacked interaction. We show that the attacker can derive optimal attacks by planning in polynomial time or learning with polynomial sample complexity using standard RL techniques. We argue that the optimal defense policy for the victim can be computed as the solution to a stochastic Stackelberg game, which can be further simplified into a partially-observable turn-based stochastic game (POTBSG). Neither the attacker nor the victim would benefit from deviating from their respective optimal policies, thus such solutions are truly robust. Although the defense problem is NP-hard, we show that optimal Markovian defenses can be computed (learned) in polynomial time (sample complexity) in many scenarios.

翻訳日:2024-06-19 09:12:15 公開日:2024-06-17

# コセットによるグロキング群乗法

Grokking Group Multiplication with Cosets ( http://arxiv.org/abs/2312.06581v2 )

ライセンス: Link先を確認

Dashiell Stander, Qinan Yu, Honglu Fan, Stella Biderman,

(参考訳) ディープニューラルネットワークの複雑で予測不可能な性質は、多くのハイテイクなアプリケーションで安全な使用を妨げている。ディープニューラルネットワークを解釈するために開発されたテクニックは数多くあるが、いずれもかなりの制限がある。アルゴリズムタスクは、ニューラルネットワークをエンドツーエンドに解釈するための実りあるテスト場であることが証明されている。以前の研究に基づいて、置換群$S_5$と$S_6$の算術的な 'grokked'' を持つ1つの隠れた層ネットワークを完全にリバースエンジニアリングしました。モデルは全群の真の部分群構造を発見し、置換群の部分群を用いて群演算を分解するニューラルネットワークに収束する。我々は,モデル機構のリバースエンジニアリングについて考察し,この理論が回路の機能の忠実な記述であることを確認した。また,本研究をChughtai et al [4]と比較することで,解釈可能性研究における現在の課題にも注意を払っている。

The complex and unpredictable nature of deep neural networks prevents their safe use in many high-stakes applications. There have been many techniques developed to interpret deep neural networks, but all have substantial limitations. Algorithmic tasks have proven to be a fruitful test ground for interpreting a neural network end-to-end. Building on previous work, we completely reverse engineer fully connected one-hidden-layer networks that have "grokked" the arithmetic of the permutation groups $S_5$ and $S_6$. The models discover the true subgroup structure of the full group and converge on neural circuits that decompose the group arithmetic using the permutation group's subgroups. We relate how we reverse engineered the model's mechanisms and confirmed our theory was a faithful description of the circuit's functionality. We also draw attention to current challenges in conducting interpretability research by comparing our work to Chughtai et al. [4], which claims to find a different algorithm for this same problem.
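For readers unfamiliar with the term in the title, a coset is the set obtained by multiplying every element of a subgroup by one fixed group element. The following sketch enumerates the left cosets of one subgroup of $S_5$ (the permutations that fix the last point). It illustrates the mathematical object only and is not a description of the circuits found in the paper; the helper names are illustrative.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q, i.e. apply q first and then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def left_cosets(group, subgroup):
    """Collect the distinct left cosets gH for g in the group."""
    return {frozenset(compose(g, h) for h in subgroup) for g in group}

# S_5 as tuples: p[i] is the image of i under the permutation.
s5 = list(permutations(range(5)))
# Subgroup H: permutations that leave the point 4 fixed (a copy of S_4).
stabiliser = [p for p in s5 if p[4] == 4]

cosets = left_cosets(s5, stabiliser)
print(len(s5), len(stabiliser), len(cosets))
```

Each coset gH contains exactly the permutations that send the point 4 to g(4), so the 120 elements of $S_5$ split into 5 cosets of 24 elements each.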

翻訳日:2024-06-19 09:12:15 公開日:2024-06-17

# 安全なマルチタスクベイズ最適化に向けて

Towards Safe Multi-Task Bayesian Optimization ( http://arxiv.org/abs/2312.07281v3 )

ライセンス: Link先を確認

Jannis O. Lübsen, Christian Hespe, Annika Eichler,

(参考訳) ベイズ最適化は、高サンプリング効率とノイズロバスト性のために、システムの安全なオンライン最適化のための非常に効果的なツールとして登場した。効率をさらに高めるため、システムの物理モデルを最適化プロセスに組み込むことができ、高速化することができる。これらのモデルは実際のシステムの近似を提供することができ、それらの評価は極めて安価である。モデルと現実の類似性は、最適化プロセス内で学習される追加のハイパーパラメータによって表現される。安全はベイズ最適化のようなオンライン最適化手法にとって重要な基準であり、既知のハイパーパラメータの仮定の下で安全保証を提供する最近の研究によって解決されている。しかし実際には、これは適用されない。そこで我々は,マルコフ連鎖モンテカルロ法によるハイパーパラメータ後部分布からの信頼領域の計算を含むマルチタスク設定を満たすために,ロバストなガウス過程の一様誤差境界を拡張した。その後、モデルの測定を取り入れつつ、システムの安全な最適化を容易にするために堅牢な安全性境界が採用される。シミュレーションの結果,従来のベイズ最適化法に比べて高コストで性能評価を行うことが可能であることが示唆された。

Bayesian optimization has emerged as a highly effective tool for the safe online optimization of systems, due to its high sample efficiency and noise robustness. To further enhance its efficiency, reduced physical models of the system can be incorporated into the optimization process, accelerating it. These models are able to offer an approximation of the actual system, and evaluating them is significantly cheaper. The similarity between the model and reality is represented by additional hyperparameters, which are learned within the optimization process. Safety is a crucial criterion for online optimization methods such as Bayesian optimization, which has been addressed by recent works that provide safety guarantees under the assumption of known hyperparameters. In practice, however, this does not apply. Therefore, we extend the robust Gaussian process uniform error bounds to meet the multi-task setting, which involves the calculation of a confidence region from the hyperparameter posterior distribution utilizing Markov chain Monte Carlo methods. Subsequently, the robust safety bounds are employed to facilitate the safe optimization of the system, while incorporating measurements of the models. Simulation results indicate that the optimization can be significantly accelerated for expensive to evaluate functions in comparison to other state-of-the-art safe Bayesian optimization methods, contingent on the fidelity of the models.

翻訳日:2024-06-19 09:12:15 公開日:2024-06-17

# 77点以上のテキストトークン:CLIP-Style ModelsをDense Captionで評価

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions ( http://arxiv.org/abs/2312.08578v2 )

ライセンス: Link先を確認

Jack Urbanek, Florian Bordes, Pietro Astolfi, Mary Williamson, Vasu Sharma, Adriana Romero-Soriano,

(参考訳) 膨大なビジョン言語データセットのキュレーション方法は、データセットのサイズと品質をトレードオフする。しかし、利用可能なキャプションの最高品質でさえ、画像のリッチな視覚的詳細を捉えるには、あまりにも短すぎる。濃密で高度に整合した画像テキストペアの価値を示すために,1000語以上を平均的に表現した7805の自然画像を含むDensely Captioned Images (DCI)データセットを収集した。画像の特定の部分に関連する正確かつ信頼性の高いキャプションを用いて、画像内容の視覚言語モデル(VLM)理解を、各キャプションと対応するサブクロップとを一致させる新しいタスクで評価することができる。現在のモデルは77のテキストトークンに制限されることが多いため、各キャプションの長さが制限された要約版(sDCI)も導入する。標準ベンチマークを進歩させる最新の技術は、我々のsDCIベースのベンチマークの大幅な改善と一致しないことを示す。最後に, sDCIを用いてCLIPを微調整し, トレーニングセットが小さいにもかかわらず, ベースラインを大幅に改善した。人間の注釈付き高密度画像キャプションデータセットを初めてリリースすることで、次世代のVLMのための新しいベンチマークや微調整のレシピの開発を可能にしたいと考えています。

Curation methods for massive vision-language datasets trade off between dataset size and quality. However, even the highest quality of available curated captions are far too short to capture the rich visual detail in an image. To show the value of dense and highly-aligned image-text pairs, we collect the Densely Captioned Images (DCI) dataset, containing 7805 natural images human-annotated with mask-aligned descriptions averaging above 1000 words each. With precise and reliable captions associated with specific parts of an image, we can evaluate vision-language models' (VLMs) understanding of image content with a novel task that matches each caption with its corresponding subcrop. As current models are often limited to 77 text tokens, we also introduce a summarized version (sDCI) in which each caption length is limited. We show that modern techniques that make progress on standard benchmarks do not correspond with significant improvement on our sDCI based benchmark. Lastly, we finetune CLIP using sDCI and show significant improvements over the baseline despite a small training set. By releasing the first human annotated dense image captioning dataset, we hope to enable the development of new benchmarks or fine-tuning recipes for the next generation of VLMs to come.

翻訳日:2024-06-19 09:12:15 公開日:2024-06-17

# CAT:好ましくないグラフをトリミングするための因果グラフ注意ネットワーク

CAT: A Causally Graph Attention Network for Trimming Heterophilic Graph ( http://arxiv.org/abs/2312.08672v3 )

ライセンス: Link先を確認

Silu He, Qinyao Luo, Xinsha Fu, Ling Zhao, Ronghua Du, Haifeng Li,

(参考訳) グラフ注意ネットワーク(GAT)に採用されているローカルアテンション誘導メッセージパッシングメカニズム(LAMP)は、グラフ上のより優れたローカルアグリゲーションのために、近隣ノードの重要性を適応的に学習するように設計されている。しかし、既存のGATは、異種近傍の高割合が中央ノードの自己アテンションを弱め、表現空間内の類似ノードから中央ノードが逸脱する結果となるため、異種グラフの顕著な識別能力の低下に悩まされる。本稿では, 隣接ノードが生成するこのような効果をディストラクション効果(DE)と呼ぶ。隣接するノードのDEを推定し、弱めるために、ヘテロ親和性グラフ(CAT)をトリミングするための因果グラフ注意ネットワークを提案する。 DEを推定するために、DEは2つの経路(近傍に割り当てられた注意をグラフ化し、中央ノードの自己注意を減らす)を通して生成されるので、因果推定の一種であり、介入データから推定できるDEのモデルにトータルエフェクトを使用します。我々は提案したCATフレームワークのベースモデルとして3つの代表的GATを採用し、3つの異なるサイズで7つのヘテロ親和性データセット上で実験を行う。比較実験により、CATは全てのベースGATモデルのノード分類精度を向上させることができることが示された。アブレーション実験と可視化により、CATによる識別能力の向上がさらに検証された。ソースコードはhttps://github.com/GeoX-Lab/CATで入手できる。

The Local Attention-guided Message Passing Mechanism (LAMP) adopted in Graph Attention Networks (GATs) is designed to adaptively learn the importance of neighboring nodes for better local aggregation on the graph, which can effectively bring the representations of similar neighbors closer, thus showing stronger discrimination ability. However, existing GATs suffer from a significant decline in discrimination ability on heterophilic graphs because the high proportion of dissimilar neighbors can weaken the self-attention of the central node, jointly resulting in the deviation of the central node from similar nodes in the representation space. This kind of effect generated by neighboring nodes is called the Distraction Effect (DE) in this paper. To estimate and weaken the DE of neighboring nodes, we propose a Causally graph Attention network for Trimming heterophilic graph (CAT). To estimate the DE, since the DE is generated through two paths (grabbing the attention assigned to neighbors and reducing the self-attention of the central node), we use the Total Effect to model the DE, which is a kind of causal estimand and can be estimated from intervened data. To weaken the DE, we identify the neighbors with the highest DE (we call them Distraction Neighbors) and remove them. We adopt three representative GATs as the base model within the proposed CAT framework and conduct experiments on seven heterophilic datasets in three different sizes. Comparative experiments show that CAT can improve the node classification accuracy of all base GAT models. Ablation experiments and visualization further validate the enhancement of discrimination ability brought by CAT. The source code is available at https://github.com/GeoX-Lab/CAT.

翻訳日:2024-06-19 09:12:15 公開日:2024-06-17

# 物理インフォームドニューラルネットワークによる軟組織非線形生体力学モデルにおける材料特性の推定

Physics-informed Neural Network Estimation of Material Properties in Soft Tissue Nonlinear Biomechanical Models ( http://arxiv.org/abs/2312.09787v3 )

ライセンス: Link先を確認

Federica Caforio, Francesco Regazzoni, Stefano Pagani, Elias Karabelas, Christoph Augustin, Gundolf Haase, Gernot Plank, Alfio Quarteroni,

(参考訳) 臨床応用のための生物物理モデルの開発は、その予測的性質と臨床データの解釈を支援する能力のおかげで、研究コミュニティで急速に進展している。しかし、高解像度で正確な多物理計算モデルは計算コストが高く、そのパーソナライゼーションには多数のパラメータの微調整が伴う。本研究では,物理インフォームドニューラルネットワーク(PINN)と3次元軟組織非線形生体力学モデルを組み合わせた新しいアプローチを提案する。提案した学習アルゴリズムは、限られた量の変位から情報を符号化し、場合によっては、臨床環境で日常的に取得できる歪みデータを符号化し、偏微分方程式に基づく数学的モデルで表される問題の物理学と組み合わせ、問題を正規化し、収束性を向上させる。提案手法の精度とロバスト性を示し, 患者特異的で不均一な物理的特性, 組織硬度特性のロバストかつ有効同定を可能にする大きな可能性を示すために, いくつかのベンチマークを提出した。特に, 傷痕組織の存在, 位置, 重症度を検出するPINNの能力を実証し, 特に心臓疾患の診断における個人化シミュレーションモデルの開発に有用であることを示す。

The development of biophysical models for clinical applications is rapidly advancing in the research community, thanks to their predictive nature and their ability to assist the interpretation of clinical data. However, high-resolution and accurate multi-physics computational models are computationally expensive, and their personalisation involves fine calibration of a large number of parameters, which may be space-dependent, challenging their clinical translation. In this work, we propose a new approach which relies on the combination of physics-informed neural networks (PINNs) with three-dimensional soft tissue nonlinear biomechanical models, capable of reconstructing displacement fields and estimating heterogeneous patient-specific biophysical properties. The proposed learning algorithm encodes information from a limited amount of displacement and, in some cases, strain data, which can be routinely acquired in the clinical setting, and combines it with the physics of the problem, represented by a mathematical model based on partial differential equations, to regularise the problem and improve its convergence properties. Several benchmarks are presented to show the accuracy and robustness of the proposed method and its great potential to enable the robust and effective identification of patient-specific, heterogeneous physical properties, such as tissue stiffness properties. In particular, we demonstrate the capability of the PINN to detect the presence, location and severity of scar tissue, which is beneficial for developing personalised simulation models for disease diagnosis, especially for cardiac applications.
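
As a rough illustration of how displacement data and the governing equations can be combined in one training objective, a generic PINN loss for parameter estimation might be written as below. This is a sketch of the general technique and not the formulation used in the paper: the network output $u_\theta$, the unknown material parameters $\mu$, the PDE residual operator $\mathcal{R}$, the observations $u_i^{\mathrm{obs}}$ and the weight $\lambda$ are all notation assumed here.

```latex
\mathcal{L}(\theta, \mu)
  = \frac{1}{N_d} \sum_{i=1}^{N_d} \bigl\| u_\theta(x_i) - u_i^{\mathrm{obs}} \bigr\|^2
  + \lambda \, \frac{1}{N_r} \sum_{j=1}^{N_r} \bigl\| \mathcal{R}(u_\theta, \mu)(x_j) \bigr\|^2
```

The first term fits the measured displacements at the $N_d$ data points; the second penalises violation of the PDE at $N_r$ collocation points and acts as the physics-based regulariser. Minimising over both $\theta$ and $\mu$ yields the displacement field and the parameter estimate together.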

翻訳日:2024-06-19 09:12:15 公開日:2024-06-17

# スケッチとシフト:圧縮クラスタリングのためのロバストデコーダ

Sketch and shift: a robust decoder for compressive clustering ( http://arxiv.org/abs/2312.09940v2 )

ライセンス: Link先を確認

Ayoub Belhadji, Rémi Gribonval,

(参考訳) 圧縮学習は,まず大規模なデータセットを低次元のスケッチベクトルに要約し,このスケッチから学習に必要な潜時情報を復号することで,大規模学習のメモリフットプリントを大幅に削減する,新たなアプローチである。ランダムな特徴に基づくスケッチの情報保存保証の最近の進歩を踏まえ、主要な目的は、この情報を堅牢かつ効率的に抽出するために、容易に修正できるアルゴリズム(デコーダと呼ばれる)を設計することである。非凸最適化問題に対処するために、様々なヒューリスティックな手法が提案されている。圧縮クラスタリングの場合、標準的なヒューリスティックは CL-OMPR である。しかし、CL-OMPRのチューニングは困難であり、その堅牢性の検討は見落とされた。本研究では,CL-OMPRを精査し,その限界を回避する。特に,このアルゴリズムは,有利なシナリオにおいても,クラスタの回復に失敗する可能性があることを示す。このアルゴリズムの欠点は,アルゴリズムのコアステップに現れる相関関数の構造に関連した最適化の難しさに起因すると考えられる。これらの制限に対処するため、CL-OMPRよりも大幅に改善された代替デコーダを提案する。その設計は、カーネル密度推定器の局所的な最大値を検出する古典的なアプローチである平均シフトアルゴリズムから着想を得ている。提案アルゴリズムは,従来より10倍小さいMNISTデータセットのスケッチからクラスタリング情報を抽出することができる。

Compressive learning is an emerging approach to drastically reduce the memory footprint of large-scale learning, by first summarizing a large dataset into a low-dimensional sketch vector, and then decoding from this sketch the latent information needed for learning. In light of recent progress on information preservation guarantees for sketches based on random features, a major objective is to design easy-to-tune algorithms (called decoders) to robustly and efficiently extract this information. To address the underlying non-convex optimization problems, various heuristics have been proposed. In the case of compressive clustering, the standard heuristic is CL-OMPR, a variant of sliding Frank-Wolfe. Yet, CL-OMPR is hard to tune, and its robustness has not been examined. In this work, we closely examine CL-OMPR in order to circumvent its limitations. In particular, we show how this algorithm can fail to recover the clusters even in advantageous scenarios. To gain insight, we show how the deficiencies of this algorithm can be attributed to optimization difficulties related to the structure of a correlation function appearing at core steps of the algorithm. To address these limitations, we propose an alternative decoder offering substantial improvements over CL-OMPR. Its design is notably inspired by the mean shift algorithm, a classic approach to detect the local maxima of kernel density estimators. The proposed algorithm can extract clustering information from a sketch of the MNIST dataset that is 10 times smaller than was previously possible.
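
The mean shift algorithm mentioned in the abstract can be sketched in a few lines. The code below is the classic mean shift on raw data points with a Gaussian kernel, not the sketch-based decoder proposed in the paper; the function name `mean_shift_modes`, the bandwidth and the toy data are assumptions made for illustration.

```python
import numpy as np

def mean_shift_modes(points, bandwidth, n_iter=200, tol=1e-6):
    """Find local maxima of a Gaussian kernel density estimate by mean shift."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()  # start one estimate at every data point
    for _ in range(n_iter):
        # Squared distances between current estimates and all data points.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        weights = np.exp(-0.5 * d2 / bandwidth**2)
        # Each estimate moves to the kernel-weighted mean of the data.
        new_modes = (weights @ points) / weights.sum(axis=1, keepdims=True)
        shift = np.abs(new_modes - modes).max()
        modes = new_modes
        if shift < tol:
            break
    # Merge estimates that converged to the same mode.
    unique = []
    for m in modes:
        if all(np.linalg.norm(m - u) > bandwidth for u in unique):
            unique.append(m)
    return np.array(unique)

# Toy data (hypothetical): two well-separated 2-D clusters.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=(-3.0, 0.0), scale=0.3, size=(100, 2)),
    rng.normal(loc=(3.0, 0.0), scale=0.3, size=(100, 2)),
])
centers = mean_shift_modes(data, bandwidth=0.5)
```

Each row of `centers` is one detected mode. The bandwidth controls how many modes survive: a larger value smooths the density estimate and merges nearby clusters.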

翻訳日:2024-06-19 09:12:15 公開日:2024-06-17

# PPOのカラーノイズ:相関行動サンプリングによる探索と性能の改善

Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling ( http://arxiv.org/abs/2312.11091v2 )

ライセンス: Link先を確認

Jakob Hollenstein, Georg Martius, Justus Piater,

(参考訳) PPO(Proximal Policy Optimization, Proximal Policy Optimization)は、政治の深い強化学習手法であり、探索に確率的政策を用いる。本稿では,色付き雑音に基づくPPOの確率的ポリシー変種を提案する。従来の研究では、活動雑音における時間的相関の重要性を強調して、非政治強化学習における効果的な探索を行った。そこで本研究では,PPOのような政治手法においても,相関ノイズが探索を促進できるかどうかを考察する。行動選択のための相関ノイズは学習性能を向上し,現在普及している非相関性ホワイトノイズ手法よりも優れることがわかった。ピンクノイズが有効であることが判明した非政治学習とは異なり、色付きノイズは白とピンクの中間であり、PPOのオンライン学習に最適であることがわかった。我々は,データ収集のための並列シミュレーション環境の数を変更することで,更新毎に収集したデータ量を変化させる影響について検討し,より多くの並列環境において,より強い相関ノイズが有効であることを示した。実装の大幅な影響と容易さのため、PPOのデフォルトノイズ源として相関ノイズに切り替えることを推奨する。

Proximal Policy Optimization (PPO), a popular on-policy deep reinforcement learning method, employs a stochastic policy for exploration. In this paper, we propose a colored noise-based stochastic policy variant of PPO. Previous research highlighted the importance of temporal correlation in action noise for effective exploration in off-policy reinforcement learning. Building on this, we investigate whether correlated noise can also enhance exploration in on-policy methods like PPO. We discovered that correlated noise for action selection improves learning performance and outperforms the currently popular uncorrelated white noise approach in on-policy methods. Unlike off-policy learning, where pink noise was found to be highly effective, we found that a colored noise, intermediate between white and pink, performed best for on-policy learning in PPO. We examined the impact of varying the amount of data collected for each update by modifying the number of parallel simulation environments for data collection and observed that with a larger number of parallel environments, more strongly correlated noise is beneficial. Due to the significant impact and ease of implementation, we recommend switching to correlated noise as the default noise source in PPO.
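
To make the notion of temporally correlated action noise concrete, the sketch below draws noise whose power spectral density scales as 1/f^beta (beta = 0 is white, beta = 1 is pink) by shaping a random spectrum, then uses it to perturb a Gaussian policy mean. It is a minimal illustration and not the authors' implementation; the function name `colored_noise`, the value beta = 0.5 and the stand-in policy outputs are assumptions.

```python
import numpy as np

def colored_noise(beta, n_steps, n_dims, rng):
    """Sample noise whose power spectral density scales as 1 / f**beta."""
    freqs = np.fft.rfftfreq(n_steps)
    freqs[0] = freqs[1]  # avoid division by zero at the DC component
    scale = freqs ** (-beta / 2.0)
    # Random complex spectrum, shaped by the target amplitude profile.
    spectrum = scale * (rng.standard_normal((n_dims, freqs.size))
                        + 1j * rng.standard_normal((n_dims, freqs.size)))
    noise = np.fft.irfft(spectrum, n=n_steps, axis=-1)
    # Normalise each dimension to zero mean and unit variance.
    noise -= noise.mean(axis=-1, keepdims=True)
    noise /= noise.std(axis=-1, keepdims=True)
    return noise.T  # shape (n_steps, n_dims)

# Usage: replace per-step independent Gaussian samples with one correlated
# sequence per episode. The constant mean and std stand in for policy outputs.
rng = np.random.default_rng(0)
eps = colored_noise(beta=0.5, n_steps=1000, n_dims=6, rng=rng)
mean, std = np.zeros(6), 0.3 * np.ones(6)
actions = mean + std * eps  # shape (1000, 6)
```

Larger values of `beta` put more power at low frequencies, so consecutive noise samples become more strongly correlated.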

翻訳日:2024-06-19 07:14:24 公開日:2024-06-17

# MAC-SQL: Text-to-SQLのためのマルチエージェント協調フレームワーク

MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL ( http://arxiv.org/abs/2312.11242v4 )

ライセンス: Link先を確認

Bing Wang, Changyu Ren, Jian Yang, Xinnian Liang, Jiaqi Bai, Linzheng Chai, Zhao Yan, Qian-Wen Zhang, Di Yin, Xing Sun, Zhoujun Li,

(参考訳) 近年の LLM ベースの Text-to-SQL メソッドは "巨大な" データベースや,マルチステップ推論を必要とする複雑なユーザ質問に対して,パフォーマンスが著しく低下している。さらに、既存のほとんどの手法は、外部ツールやモデルコラボレーションを利用したLLMの重要な重要性を無視している。これらの課題に対処するために,新しいLLMベースのマルチエージェント協調フレームワークであるMAC-SQLを紹介する。本フレームワークは,外部ツールやモデルを用いてより小さなサブデータベースを取得し,誤ったSQLクエリを精査する2つの補助エージェントを伴って,数発の連鎖推論によるテキストからSQL生成のためのコアデコンポーザエージェントで構成されている。 Decomposerエージェントは、必要に応じてアクティベートされ、Text-to-SQLパースのための新機能やツールに対応するために拡張可能な補助エージェントと協調する。我々のフレームワークでは、まず、GPT-4 を全てのエージェントタスクの強力なバックボーン LLM として利用し、フレームワークの上限を決定する。次に、Code Llama 7Bを活用して、オープンソースの命令フォローモデルSQL-Llamaを微調整し、GPT-4のように全てのタスクを達成します。実験によると、SQL-Llama はバニラ GPT-4 のベースライン精度 46.35 と比較して 43.94 の実行精度を達成している。執筆時点で、MAC-SQL+GPT-4はBIRDベンチマークで評価すると59.59の実行精度を達成し、そのホールドアウトテストセット(https://github.com/wbbeyourself/MAC-SQL)上に新しい最先端(SOTA)を確立する。

Recent LLM-based Text-to-SQL methods usually suffer from significant performance degradation on "huge" databases and complex user questions that require multi-step reasoning. Moreover, most existing methods neglect the importance of LLMs utilizing external tools and model collaboration. To address these challenges, we introduce MAC-SQL, a novel LLM-based multi-agent collaborative framework. Our framework comprises a core decomposer agent for Text-to-SQL generation with few-shot chain-of-thought reasoning, accompanied by two auxiliary agents that utilize external tools or models to acquire smaller sub-databases and refine erroneous SQL queries. The decomposer agent collaborates with auxiliary agents, which are activated as needed and can be expanded to accommodate new features or tools for effective Text-to-SQL parsing. In our framework, we initially leverage GPT-4 as the strong backbone LLM for all agent tasks to determine the upper bound of our framework. We then fine-tune an open-source instruction-following model, SQL-Llama, by leveraging Code Llama 7B, to accomplish all tasks as GPT-4 does. Experiments show that SQL-Llama achieves a comparable execution accuracy of 43.94, compared to the baseline accuracy of 46.35 for vanilla GPT-4. At the time of writing, MAC-SQL+GPT-4 achieves an execution accuracy of 59.59 when evaluated on the BIRD benchmark, establishing a new state-of-the-art (SOTA) on its holdout test set (https://github.com/wbbeyourself/MAC-SQL).
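
The execution accuracy figures quoted above measure whether a predicted query returns the same result as the reference query when both are run against the database. The sketch below shows one simplified reading of that metric using Python's built-in `sqlite3` module; the official benchmark script may differ in its details, and the table, queries and function names here are hypothetical.

```python
import sqlite3

def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries run and yield the same set of rows."""
    try:
        predicted = set(conn.execute(predicted_sql).fetchall())
    except sqlite3.Error:
        return False  # a query that fails to execute counts as wrong
    gold = set(conn.execute(gold_sql).fetchall())
    return predicted == gold

def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) pairs whose results match."""
    hits = sum(execution_match(conn, p, g) for p, g in pairs)
    return 100.0 * hits / len(pairs)

# Hypothetical toy database and query pairs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 31), (2, 'Bo', 24), (3, 'Cy', 45);
""")
pairs = [
    # Different SQL text, same result: counts as correct.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    # Runs, but returns different rows: counts as wrong.
    ("SELECT name FROM singer",
     "SELECT name FROM singer WHERE age < 30"),
    # Misspelled column, fails to execute: counts as wrong.
    ("SELECT nam FROM singer",
     "SELECT name FROM singer"),
]
score = execution_accuracy(conn, pairs)
```

Comparing results rather than query strings is what lets two differently written but equivalent queries both count as correct.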

翻訳日:2024-06-19 07:14:24 公開日:2024-06-17

# StarCraft IIをプレイする大規模言語モデル: ベンチマークとChain of Summarizationアプローチ

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach ( http://arxiv.org/abs/2312.11865v2 )

ライセンス: Link先を確認

Weiyu Ma, Qirui Mi, Xue Yan, Yuqiao Wu, Runji Lin, Haifeng Zhang, Jun Wang,

(参考訳) StarCraft IIは、正確なマイクロレベルの操作と戦略的マクロ認識の両方を必要とするため、AIエージェントにとって困難なベンチマークである。しかし、AlphastarやSCCといった以前の研究は、StarCraft IIに対処する上で素晴らしい成果を上げているが、長期的な戦略計画と戦略解釈性には欠点がある。 VoyageやMetaGPTといった新たな大規模言語モデル(LLM)エージェントは、複雑なタスクを解決する大きな可能性を示している。そこで我々は,高度に複雑なRTSゲームであるStarCraft IIにおけるLLMの能力を検証することを目指しており,LLMの推論能力を最大限活用するために,LLMエージェントと対話可能なテキストStratCraft II環境を開発する。第2に,ゲーム情報の解析,コマンドレコメンデーションの提供,戦略的意思決定のための,生観測処理のための単一フレーム要約と多フレーム要約を含む要約手法を提案する。実験は、まず、人間の専門家による評価と、ゲームにおけるLLMエージェントの熟達度の評価と、ゲーム内のLLMエージェントのパフォーマンス、そして、LLMエージェントのゲームパフォーマンスと、勝利率や要約の連鎖の影響といった側面を含む2つの部から成っている。 1. LLMは、スタークラフトIIのシナリオに対処するために必要な知識及び複雑な計画能力を有する。 2. 人間の専門家は、LLMエージェントの演奏は、スタークラフトIIを8年間プレイした平均的な選手の演奏に近いものとみなす。 3. LLMエージェントは、Harder(Lv5)の難易度で構築されたAIを倒すことができる。コードをオープンソース化し、LLMエージェントがStarCraft IIをプレイするデモビデオを公開しました。

StarCraft II is a challenging benchmark for AI agents because it requires both precise micro-level operations and strategic macro awareness. Previous works, such as AlphaStar and SCC, achieve impressive performance on StarCraft II; however, they still exhibit deficiencies in long-term strategic planning and strategy interpretability. Emerging large language model (LLM) agents, such as Voyage and MetaGPT, show immense potential in solving intricate tasks. Motivated by this, we aim to validate the capabilities of LLMs on StarCraft II, a highly complex RTS game. To conveniently take full advantage of LLMs' reasoning abilities, we first develop a textual StarCraft II environment, called TextStarCraft II, with which an LLM agent can interact. Secondly, we propose a Chain of Summarization method, including single-frame summarization for processing raw observations and multi-frame summarization for analyzing game information, providing command recommendations, and generating strategic decisions. Our experiment consists of two parts: first, an evaluation by human experts, which includes assessing the LLMs' mastery of StarCraft II knowledge and the performance of LLM agents in the game; second, the in-game performance of LLM agents, encompassing aspects like win rate and the impact of Chain of Summarization. Experimental results demonstrate that: 1. LLMs possess the relevant knowledge and complex planning abilities needed to address StarCraft II scenarios; 2. Human experts consider the performance of LLM agents to be close to that of an average player who has played StarCraft II for eight years; 3. LLM agents are capable of defeating the built-in AI at the Harder (Lv5) difficulty level. We have open-sourced the code and released demo videos of LLM agents playing StarCraft II.

翻訳日:2024-06-19 07:14:24 公開日:2024-06-17

# SHAPの形成:レイヤワイズ近隣選択による安定性向上

Shaping Up SHAP: Enhancing Stability through Layer-Wise Neighbor Selection ( http://arxiv.org/abs/2312.12115v2 )

ライセンス: Link先を確認

Gwladys Kelodjou, Laurence Rozé, Véronique Masson, Luis Galárraga, Romaric Gaudel, Maurice Tchuente, Alexandre Termier,

(参考訳) ディープラーニングやアンサンブル手法などの機械学習技術は、複雑な現実世界のタスクを処理できるため、様々な領域で広く使われている。しかしながら、ブラックボックスの性質は、コンピュータによる意思決定の公平性、信頼性、透明性について、様々な懸念を引き起こしている。これにより、ブラックボックスアルゴリズムによる個々の決定に関する説明を提供する、ローカルなポストホックな説明可能性法が出現した。これらの手法の中で、Kernel SHAPはモデルに依存しない性質と十分に確立された理論的枠組みのために広く使われている。これらの強みにもかかわらず、Kernel SHAPは高い不安定さに悩まされ、同じ入力を持つメソッドの異なる実行は、非常に異なる説明をもたらす可能性があるため、説明の関連性が低下する。本論文の貢献は2つある。一方, Kernel SHAP の不安定性は確率的近傍選択法によって引き起こされることを示す。一方、第1層の連立と呼ばれる第1層の摂動に隣人の世代を限定することにより、完全に安定し、計算効率が良く、意味のある新しい特徴帰属法が得られることを示す。

Machine learning techniques, such as deep learning and ensemble methods, are widely used in various domains due to their ability to handle complex real-world tasks. However, their black-box nature has raised multiple concerns about the fairness, trustworthiness, and transparency of computer-assisted decision-making. This has led to the emergence of local post-hoc explainability methods, which offer explanations for individual decisions made by black-box algorithms. Among these methods, Kernel SHAP is widely used due to its model-agnostic nature and its well-founded theoretical framework. Despite these strengths, Kernel SHAP suffers from high instability: different executions of the method with the same inputs can lead to significantly different explanations, which diminishes the relevance of the explanations. The contribution of this paper is two-fold. On the one hand, we show that Kernel SHAP's instability is caused by its stochastic neighbor selection procedure, which we adapt to achieve full stability without compromising explanation fidelity. On the other hand, we show that by restricting neighbor generation to perturbations of size 1 -- which we call the coalitions of Layer 1 -- we obtain a novel feature-attribution method that is fully stable, computationally efficient, and still meaningful.
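
The contrast between stochastic neighbor selection and a deterministic set of size-1 perturbations can be illustrated as follows. This is only a sketch: uniform random sampling stands in for Kernel SHAP's actual weighted sampling, and reading "Layer 1" as the coalitions with exactly one feature present or exactly one feature absent is an assumption made here, not a definition taken from the abstract. Both function names are hypothetical.

```python
import numpy as np

def sampled_coalitions(n_features, n_samples, seed):
    """Random binary coalitions: the result depends on the seed."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(n_samples, n_features))

def layer1_coalitions(n_features):
    """Coalitions with exactly one feature present or exactly one absent."""
    one_present = np.eye(n_features, dtype=int)
    one_absent = 1 - one_present
    return np.vstack([one_present, one_absent])

# Two runs with different seeds give different neighbors, so explanations
# fitted on them can differ from run to run.
run_a = sampled_coalitions(n_features=5, n_samples=20, seed=0)
run_b = sampled_coalitions(n_features=5, n_samples=20, seed=1)

# The enumerated set involves no randomness and has only 2 * n_features rows.
layer1 = layer1_coalitions(5)
```

Because the enumerated set is fixed, any explanation computed from it is identical across runs, at a cost that grows linearly with the number of features.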

翻訳日:2024-06-19 07:14:24 公開日:2024-06-17

# 非断熱量子古典動力学におけるクープモン軌道

Koopmon trajectories in nonadiabatic quantum-classical dynamics ( http://arxiv.org/abs/2312.13878v2 )

ライセンス: Link先を確認

Werner Bauer, Paul Bergold, François Gay-Balmaz, Cesare Tronci,

(参考訳) 完全量子非断熱動力学の計算コストを軽減するために、クープマン波動関数の理論に基づく混合量子古典(MQC)粒子法を提案する。従来のMQCモデルは、ハイゼンベルクの原理の破れといった整合性の問題にしばしば悩まされるが、我々は、ヒルベルト空間上のクープマンの古典力学とシンプレクティック幾何学の手法を組み合わせることにより、これらの困難を克服する。得られた連続体モデルは変分構造とハミルトン構造の両方を備えているが、その非線形性のために適切なクロージャが必要となる。ここでは、基礎となる作用原理を利用して、以前に我々のチームで開発した正則化手法を適用する。このステップにより、位相空間におけるラグランジュ的な古典経路をサンプリングする計算粒子(クープモン)の軌道を導入する特異解アンザッツが可能になる。タリーの非断熱問題の場合、本手法は標準的なMQCエーレンフェストシミュレーションでは達成できない精度で完全量子シミュレーションの結果を再現する。さらに、クープモン法は、本研究でも検討した類似の完全量子アプローチに比べて計算上有利である。さらに、MQC処理がほとんど適用できないと思われる超強結合領域と深強結合領域の両方でラビ問題を考察し、手法の限界を検証した。この場合、本手法は完全量子計算の結果の一部を再現することに成功した。

In order to alleviate the computational costs of fully quantum nonadiabatic dynamics, we present a mixed quantum-classical (MQC) particle method based on the theory of Koopman wavefunctions. Although conventional MQC models often suffer from consistency issues such as the violation of Heisenberg's principle, we overcame these difficulties by blending Koopman's classical mechanics on Hilbert spaces with methods in symplectic geometry. The resulting continuum model enjoys both a variational and a Hamiltonian structure, while its nonlinear character calls for suitable closures. Benefiting from the underlying action principle, here we apply a regularization technique previously developed within our team. This step allows for a singular solution ansatz which introduces the trajectories of computational particles - the koopmons - sampling the Lagrangian classical paths in phase space. In the case of Tully's nonadiabatic problems, the method reproduces the results of fully quantum simulations with levels of accuracy that are not achieved by standard MQC Ehrenfest simulations. In addition, the koopmon method is computationally advantageous over similar fully quantum approaches, which are also considered in our study. As a further step, we probe the limits of the method by considering the Rabi problem in both the ultrastrong and the deep strong coupling regimes, where MQC treatments appear hardly applicable. In this case, the method succeeds in reproducing parts of the fully quantum results.

翻訳日:2024-06-19 07:14:24 公開日:2024-06-17

# パスワイズラッソのための量子アルゴリズム

Quantum Algorithms for the Pathwise Lasso ( http://arxiv.org/abs/2312.14141v2 )

ライセンス: Link先を確認

Joao F. Doriguello, Debbie Lim, Chi Seng Pun, Patrick Rebentrost, Tushar Vaidya,

(参考訳) 古典的LARS(Least Angle Regression)パスワイズアルゴリズムに基づいて,$\ell_1$-penaltyの量子高次元線形回帰アルゴリズムを提案する。ラッソの古典的アルゴリズムと同様に、我々の量子アルゴリズムは、ペナルティ項が変化するにつれて完全な正規化パスを提供するが、特定の条件下では反復ごとに2次的に高速である。 Dürr と Hoyer (arXiv'96) の量子最小値探索ルーチンを使って各イテレーションにおける結合時間を取得することで、特徴数$d$に関する二次的なスピードアップが可能になる。次に、この単純な量子アルゴリズムを改善し、Chen と de Wolf (ICALP'23) の近似量子最小値探索ルーチンを用いて、特徴数 $d$ と観測数 $n$ の両方で二次的なスピードアップを得る。我々の主な貢献の一つとして、近似量子最小値探索によって探索される結合時間を近似的に計算する量子ユニタリを構築している。結合時間はもはや正確に計算されないため、得られた近似量子アルゴリズムが良い解を得るかどうかはもはや明らかではない。 2つ目の主な貢献として、KKT条件の近似バージョンと双対性ギャップを通じて、LARSアルゴリズム(つまり我々の量子アルゴリズム)がエラーに対して堅牢であることを証明する。これは、結合時間が近似的に計算された場合でも、ラッソのコスト関数を小さな誤差まで最小化する経路を出力することを意味する。さらに、ガウス分布から観測結果がサンプリングされると、量子アルゴリズムの複雑さは$n$に多重対数的にしか依存せず、古典的なLARSアルゴリズムよりも指数関数的に優れ、$d$に関する二次的な改善も保たれることを示す。最後に、標準的なLARSアルゴリズムと同じ$d$の線形スケーリングを伴うものの、$n$への多重対数依存を保つ脱量子化アルゴリズムを提案する。

We present a novel quantum high-dimensional linear regression algorithm with an $\ell_1$-penalty based on the classical LARS (Least Angle Regression) pathwise algorithm. Similarly to available classical algorithms for Lasso, our quantum algorithm provides the full regularisation path as the penalty term varies, but quadratically faster per iteration under specific conditions. A quadratic speedup on the number of features $d$ is possible by using the quantum minimum-finding routine from Dürr and Hoyer (arXiv'96) in order to obtain the joining time at each iteration. We then improve upon this simple quantum algorithm and obtain a quadratic speedup both in the number of features $d$ and the number of observations $n$ by using the approximate quantum minimum-finding routine from Chen and de Wolf (ICALP'23). As one of our main contributions, we construct a quantum unitary to approximately compute the joining times to be searched over by the approximate quantum minimum-finding routine. Since the joining times are no longer exactly computed, it is no longer clear that the resulting approximate quantum algorithm obtains a good solution. As our second main contribution, we prove, via an approximate version of the KKT conditions and a duality gap, that the LARS algorithm (and thus our quantum algorithm) is robust to errors. This means that it still outputs a path that minimises the Lasso cost function up to a small error if the joining times are approximately computed. Moreover, we show that, when the observations are sampled from a Gaussian distribution, our quantum algorithm's complexity only depends polylogarithmically on $n$, exponentially better than the classical LARS algorithm, while keeping the quadratic improvement on $d$. Finally, we propose a dequantised algorithm that also retains the polylogarithmic dependence on $n$, albeit with the linear scaling on $d$ from the standard LARS algorithm.
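
For reference, the Lasso cost function and the exact KKT conditions that the abstract refers to take the standard textbook form below; the approximate version proved in the paper is not reproduced here. The symbols $X \in \mathbb{R}^{n \times d}$ (design matrix with columns $X_j$), $y \in \mathbb{R}^n$ (observations) and $\lambda$ (penalty) are notation assumed for this sketch.

```latex
\min_{\beta \in \mathbb{R}^d} \; \frac{1}{2} \| y - X\beta \|_2^2 + \lambda \| \beta \|_1,
\qquad
\begin{cases}
X_j^\top (y - X\beta) = \lambda \, \operatorname{sign}(\beta_j) & \text{if } \beta_j \neq 0, \\
\bigl| X_j^\top (y - X\beta) \bigr| \le \lambda & \text{if } \beta_j = 0 .
\end{cases}
```

As $\lambda$ decreases along the regularisation path, a feature enters the active set when its correlation with the residual reaches $\lambda$ in absolute value, which is why each iteration of a pathwise algorithm reduces to a minimum search over candidate joining times.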

翻訳日:2024-06-19 07:14:24 公開日:2024-06-17

# README:データ中心NLPによる医療ジャーゴンのブリッジと患者教育への理解

README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP ( http://arxiv.org/abs/2312.15561v3 )

ライセンス: Link先を確認

Zonghai Yao, Nandyala Siddharth Kantu, Guanghao Wei, Hieu Tran, Zhangqi Duan, Sunjae Kwon, Zhichao Yang, README annotation team, Hong Yu,

(参考訳) 医療の進歩は、患者中心のアプローチ、特にElectronic Health Records(EHR)へのアクセスによって促進されるセルフケアと患者教育に焦点を移している。しかし, EHRの医療ジャーゴンは, 患者の理解に重大な課題をもたらす。そこで我々は,複雑な医療用語を患者フレンドリーなレイ言語に単純化することを目的とした,レイ定義を自動的に生成する新しいタスクを提案する。最初、READMEデータセットを作成しました。これは、5万以上のユニークな(医療用語、レイ定義)ペアと30万の言及の広範なコレクションで、それぞれがドメインの専門家が手動で注釈付けしたコンテキスト対応のレイ定義を提供しています。また、データフィルタリング、拡張、選択を相乗化してデータ品質を改善する、データ中心のHuman-AIパイプラインも開発しました。その後、READMEをモデルトレーニングデータとして使用し、検索補助生成法を用いて幻覚を低減し、モデル出力の品質を向上させる。我々の大規模な自動および人為的評価は、高品質なデータで微調整されたオープンソースのモバイルフレンドリなモデルが、ChatGPTのような最先端のクローズドソースな大規模言語モデルの性能にマッチしたり、超えたりできることを示している。この研究は、患者教育における知識ギャップを解消し、患者中心の医療ソリューションを前進させる重要な取り組みである。

The advancement in healthcare has shifted focus toward patient-centric approaches, particularly in self-care and patient education, facilitated by access to Electronic Health Records (EHR). However, medical jargon in EHRs poses significant challenges in patient comprehension. To address this, we introduce a new task of automatically generating lay definitions, aiming to simplify complex medical terms into patient-friendly lay language. We first created the README dataset, an extensive collection of over 50,000 unique (medical term, lay definition) pairs and 300,000 mentions, each offering context-aware lay definitions manually annotated by domain experts. We have also engineered a data-centric Human-AI pipeline that synergizes data filtering, augmentation, and selection to improve data quality. We then used README as the training data for models and leveraged a Retrieval-Augmented Generation method to reduce hallucinations and improve the quality of model outputs. Our extensive automatic and human evaluations demonstrate that open-source mobile-friendly models, when fine-tuned with high-quality data, are capable of matching or even surpassing the performance of state-of-the-art closed-source large language models like ChatGPT. This research represents a significant stride in closing the knowledge gap in patient education and advancing patient-centric healthcare solutions.

翻訳日:2024-06-19 07:14:24 公開日:2024-06-17

# インストラクションフュージョン:ハイブリッド化によるプロンプト進化の促進

Instruction Fusion: Advancing Prompt Evolution through Hybridization ( http://arxiv.org/abs/2312.15692v4 )

ライセンス: Link先を確認

Weidong Guo, Jiuding Yang, Kaitong Yang, Xiangyang Li, Zhuwei Rao, Yu Xu, Di Niu,

(参考訳) コード生成に特化したLLM(Large Language Models)の微調整は、オープンドメインのコーディングクエリを使うことで、顕著な進歩を遂げた。成功にもかかわらず、Evol-Instructのような既存の方法論はパフォーマンスの制限に直面し、コード生成タスクのさらなる強化を妨げる。本稿では,既存のプロンプト進化手法の制約について検討し,新しいアプローチであるインストラクション・フュージョン(IF)を導入する。 IFは、ハイブリッド化プロセスを通じて、2つの異なるプロンプトを革新的に組み合わせ、コードLLMのトレーニングプロンプトの進化を強化する。提案手法は,HumanEval,HumanEval+,MBPP,MBPP+,MultiPL-Eの5つのコード生成ベンチマークにおけるコードLLMの性能を著しく向上し,コード生成におけるLLMの能力向上にインストラクションフュージョンが有効であることを示す。

The fine-tuning of Large Language Models (LLMs) specialized in code generation has seen notable advancements through the use of open-domain coding queries. Despite the successes, existing methodologies like Evol-Instruct encounter performance limitations, impeding further enhancements in code generation tasks. This paper examines the constraints of existing prompt evolution techniques and introduces a novel approach, Instruction Fusion (IF). IF innovatively combines two distinct prompts through a hybridization process, thereby enhancing the evolution of training prompts for code LLMs. Our experimental results reveal that the proposed method effectively addresses the shortcomings of prior methods, significantly improving the performance of code LLMs across five code generation benchmarks, namely HumanEval, HumanEval+, MBPP, MBPP+ and MultiPL-E. These results underscore the effectiveness of Instruction Fusion in advancing the capabilities of LLMs in code generation.

翻訳日:2024-06-19 07:14:24 公開日:2024-06-17

# 量子誤り訂正符号の準最適性能

The Near-optimal Performance of Quantum Error Correction Codes ( http://arxiv.org/abs/2401.02022v2 )

ライセンス: Link先を確認

Guo Zheng, Wenhao He, Gideon Lee, Liang Jiang,

(参考訳) Knill-Laflamme (KL) 条件は正確な量子誤り訂正符号を区別し、最先端の符号の発見に重要な役割を果たしている。しかし、正確な符号の族は非常に制限的であり、必ずしも最高の性能の符号を含まない。したがって、一般化された定量的な性能指標を開発することが望ましい。このレターでは、任意の符号と雑音に対する簡潔で最適化のない計量である準最適チャネル忠実度を導出する。この計量は最適なコード性能に限定した狭い2辺の値を与え、KL条件で要求されるのと全く同じ入力で評価することができる。複数の量子ビット符号と発振器符号の例を通して、準最適チャネル忠実度の数値的利点を示す。従来の最適化手法と比較して、計算コストの削減により、何百もの平均励起を符号化する発振器など、これまでアクセスできない大きさのシステムをシミュレートすることができる。さらに,熱力学符号とGottesman-Kitaev-Preskill (GKP)符号のほぼ最適性能を解析的に導出した。特に、励起損失下でのGKP符号の性能は、そのエネルギーと単調に改善し、他の発振器符号とは異なる無限エネルギーでの漸近極限に収束する。

The Knill-Laflamme (KL) conditions distinguish exact quantum error correction codes, and they have played a critical role in the discovery of state-of-the-art codes. However, the family of exact codes is a very restrictive one and does not necessarily contain the best-performing codes. Therefore, it is desirable to develop a generalized and quantitative performance metric. In this Letter, we derive the near-optimal channel fidelity, a concise and optimization-free metric for arbitrary codes and noise. The metric provides a narrow two-sided bound to the optimal code performance, and it can be evaluated with exactly the same input required by the KL conditions. We demonstrate the numerical advantage of the near-optimal channel fidelity through multiple qubit code and oscillator code examples. Compared to conventional optimization-based approaches, the reduced computational cost enables us to simulate systems with previously inaccessible sizes, such as oscillators encoding hundreds of average excitations. Moreover, we analytically derive the near-optimal performance for the thermodynamic code and the Gottesman-Kitaev-Preskill (GKP) code. In particular, the GKP code's performance under excitation loss improves monotonically with its energy and converges to an asymptotic limit at infinite energy, which is distinct from other oscillator codes.
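
For readers unfamiliar with them, the Knill-Laflamme conditions mentioned above can be stated in their standard form as follows. The notation is assumed here: $P$ is the projector onto the code space, $\{E_a\}$ are the error (Kraus) operators of the noise, and $c_{ab}$ are the entries of a Hermitian matrix.

```latex
P E_a^\dagger E_b P = c_{ab} \, P \qquad \text{for all } a, b .
```

A code satisfying this identity exactly can be corrected perfectly for that noise. The inputs to the condition are only the code projector and the error operators, which is the same input the abstract says the near-optimal channel fidelity requires.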

翻訳日:2024-06-19 07:04:39 公開日:2024-06-17

# t-DGR:意思決定における連続学習のための軌道ベース深層生成再生法

t-DGR: A Trajectory-Based Deep Generative Replay Method for Continual Learning in Decision Making ( http://arxiv.org/abs/2401.02576v2 )

ライセンス: Link先を確認

William Yue, Bo Liu, Peter Stone,

(参考訳) 深い生成的リプレイは、意思決定タスクにおける継続的な学習のための有望なアプローチとして現れてきた。このアプローチは、これまで遭遇したタスクからトラジェクトリの生成を活用して、現在のデータセットを増大させることによって、破滅的な忘れの問題に対処する。しかし、既存の連続学習のための深層生成的再生法は、生成した軌跡の複雑な誤りに悩まされる自己回帰モデルに依存している。本稿では,軌道上の時間ステップに条件付きタスクサンプルを生成する生成モデルを用いて,意思決定タスクにおける継続学習のためのシンプルでスケーラブルで非自己回帰的手法を提案する。提案手法は連続世界ベンチマークで評価し, 連続学習手法の平均成功率測定値から最先端のパフォーマンスを達成できることを確認した。コードはhttps://github.com/WilliamYue37/t-DGRで公開されている。

Deep generative replay has emerged as a promising approach for continual learning in decision-making tasks. This approach addresses the problem of catastrophic forgetting by leveraging the generation of trajectories from previously encountered tasks to augment the current dataset. However, existing deep generative replay methods for continual learning rely on autoregressive models, which suffer from compounding errors in the generated trajectories. In this paper, we propose a simple, scalable, and non-autoregressive method for continual learning in decision-making tasks using a generative model that generates task samples conditioned on the trajectory timestep. We evaluate our method on Continual World benchmarks and find that our approach achieves state-of-the-art performance on the average success rate metric among continual learning methods. Code is available at https://github.com/WilliamYue37/t-DGR.

翻訳日:2024-06-19 07:04:39 公開日:2024-06-17

# MLLM-Protector: 性能を損なうことなくMLLMの安全性を確保する

MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance ( http://arxiv.org/abs/2401.02906v3 )

ライセンス: Link先を確認

Renjie Pi, Tianyang Han, Jianshu Zhang, Yueqi Xie, Rui Pan, Qing Lian, Hanze Dong, Jipeng Zhang, Tong Zhang,

(参考訳) マルチモーダルな大規模言語モデル(MLLM)の展開は、視覚入力による悪意のある攻撃に対する感受性という、ユニークな脆弱性を生み出した。本稿では,このような攻撃に対してMLLMを防御する新たな課題について検討する。大型言語モデル (LLM) と比較して、MLLM には追加の画像モダリティが含まれている。画像は安全アライメント時に考慮されない「外部言語」として機能し、MLLMは有害な応答を生じやすくする。残念なことに、テキストベースのLLMで考慮された離散トークンとは異なり、画像信号の連続的な性質は重要なアライメントの課題を示しており、すべてのシナリオを完全にカバーすることは困難である。この脆弱性は、ほとんどの最先端のMLLMが、大規模なテキストベースの事前学習コーパスよりもはるかに少ない制限された画像テキストペアで微調整されているという事実により、さらに悪化している。これらの課題に対処するために,2つのサブタスクを解決するプラグアンドプレイ戦略であるMLLM-Protectorを導入する。 1)軽量害検知器を介して有害な応答を識別し、 2) 有害な応答を除毒剤を介して無害な応答に変換する。このアプローチは、MLLMの本来の性能を損なうことなく、悪意ある視覚入力によって引き起こされるリスクを効果的に軽減する。 MLLM-Protectorは,MLLMセキュリティの未適応な側面に対して,堅牢なソリューションを提供することを示す。

The deployment of multimodal large language models (MLLMs) has brought forth a unique vulnerability: susceptibility to malicious attacks through visual inputs. This paper investigates the novel challenge of defending MLLMs against such attacks. Compared to large language models (LLMs), MLLMs include an additional image modality. We discover that images act as a "foreign language" that is not considered during safety alignment, making MLLMs more prone to producing harmful responses. Unfortunately, unlike the discrete tokens considered in text-based LLMs, the continuous nature of image signals presents significant alignment challenges, making it difficult to thoroughly cover all possible scenarios. This vulnerability is exacerbated by the fact that most state-of-the-art MLLMs are fine-tuned on limited image-text pairs that are much fewer than the extensive text-based pretraining corpus, which makes the MLLMs more prone to catastrophic forgetting of their original abilities during safety fine-tuning. To tackle these challenges, we introduce MLLM-Protector, a plug-and-play strategy that solves two subtasks: 1) identifying harmful responses via a lightweight harm detector, and 2) transforming harmful responses into harmless ones via a detoxifier. This approach effectively mitigates the risks posed by malicious visual inputs without compromising the original performance of MLLMs. Our results demonstrate that MLLM-Protector offers a robust solution to a previously unaddressed aspect of MLLM security.

翻訳日:2024-06-19 07:04:39 公開日:2024-06-17

# 現実的な量子系のシミュレーションにおける過度パラメトリゼーションのキャラクタリゼーション

Characterization of overparametrization in the simulation of realistic quantum systems ( http://arxiv.org/abs/2401.05500v2 )

ライセンス: Link先を確認

Matthew Duschenes, Juan Carrasquilla, Raymond Laflamme,

(参考訳) 量子コンピューティングデバイスは、量子状態を作成し、他の量子システムをシミュレートするために、実験パラメータを例外的に制御する必要がある。このような最適制御パラメータを見つけるために使用される古典的な最適化手順は、様々な学習様式を示すために理想化された設定でさらに示されている。十分な数のパラメータを持つシステムでは、準備された状態に対するグローバルな最適化とコンパイルされたユニタリ忠実度が指数関数的に速く到達する可能性がある。本稿では,演算子間のパラメータの有界化や共有など,実験的な制約が存在する場合の過パラメータ化現象のロバスト性や,実験的な設定に固有のノイズの存在について検討する。過度パラメータ化現象は、これらの現実的な環境では短時間で回復するが、量子ノイズまたは古典ノイズの蓄積により、臨界シミュレーション期間を過ぎて、忠実度はゼロに低下する。この臨界深度はノイズのスケールで対数的であり、最適忠実度は最初は深さで指数関数的に増加し、その後、深さで多項式的に減少し、ノイズで減少する。この結果から, パラメータ化アンサツェは環境からエントロピー効果を緩和し, 近い将来の量子デバイスでの実験的な実現を可能にした。

Quantum computing devices require exceptional control of their experimental parameters to prepare quantum states and simulate other quantum systems. Classical optimization procedures used to find such optimal control parameters have further been shown, in idealized settings, to exhibit different regimes of learning. Of interest in this work is the overparameterization regime, where, for systems with a sufficient number of parameters, global optima for prepared state and compiled unitary fidelities may potentially be reached exponentially quickly. Here, we study the robustness of overparameterization phenomena in the presence of experimental constraints on the controls, such as bounding or sharing parameters across operators, as well as in the presence of noise inherent to experimental setups. We observe that overparameterization phenomena are resilient in these realistic settings at short times; however, fidelities decay to zero past a critical simulation duration due to accumulation of either quantum or classical noise. This critical depth is found to be logarithmic in the scale of noise, and optimal fidelities initially increase exponentially with depth, before decreasing polynomially with depth, and with noise. Our results demonstrate that parameterized ansatze can mitigate entropic effects from their environment, offering tantalizing opportunities for their application and experimental realization in near-term quantum devices.

翻訳日:2024-06-19 07:04:39 公開日:2024-06-17

# 最小編集制約によるきめ細かい強化学習による大規模言語モデルの改善

Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint ( http://arxiv.org/abs/2401.06081v2 )

ライセンス: Link先を確認

Zhipeng Chen, Kun Zhou, Wayne Xin Zhao, Junchen Wan, Fuzheng Zhang, Di Zhang, Ji-Rong Wen,

(参考訳) 強化学習(Reinforcement Learning, RL)は、予期せぬアウトプットを防止し、有害性とエラーを減らすために、大規模言語モデル(LLM)の訓練に広く用いられている。しかし、既存のRLメソッドは、主にインスタンスレベルの報酬を採用しており、複雑な推論タスクのきめ細かい監督を提供することができず、不正につながるいくつかのキートークンに集中できない。そこで本研究では,生成モデルを報酬モデルとして組み込んだ新たなRL手法を提案する。これは,最小編集制約下での誤解書き換えタスクによってトレーニングされ,RLトレーニングのためのトークンレベル報酬を生成することができる。生成報酬モデルに基づいて、トレーニングのためのトークンレベルRL目標と、RLプロセスの安定化のための模倣ベース正規化を設計する。両方の目的は、誤った解に対するキートークンの学習に集中し、他の重要でないトークンの影響を減らします。数学的タスクと質問応答タスクの実験結果から,本手法の有効性が示された。私たちのコードとデータはhttps://github.com/RUCAIBox/RLMECで公開されています。

Reinforcement learning (RL) has been widely used in training large language models (LLMs) for preventing unexpected outputs, e.g., reducing harmfulness and errors. However, existing RL methods mostly adopt the instance-level reward, which is unable to provide fine-grained supervision for complex reasoning tasks, and cannot focus on the few key tokens that lead to the incorrectness. To address this, we propose a new RL method named RLMEC that incorporates a generative model as the reward model, which is trained by the erroneous solution rewriting task under the minimum editing constraint, and can produce token-level rewards for RL training. Based on the generative reward model, we design the token-level RL objective for training and an imitation-based regularization for stabilizing the RL process. Both objectives focus on the learning of the key tokens for the erroneous solution, reducing the effect of other unimportant tokens. Experimental results on mathematical tasks and question-answering tasks demonstrate the effectiveness of our approach. Our code and data are available at https://github.com/RUCAIBox/RLMEC.
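
The idea of token-level rewards under a minimum editing constraint can be illustrated with a small sketch. It is not the RLMEC reward model: the abstract describes a trained generative model, whereas this example simply assumes a rewritten solution is already available and uses the standard-library `difflib` diff as a stand-in. The function name `token_level_rewards` and the reward values are hypothetical.

```python
from difflib import SequenceMatcher


def token_level_rewards(original, rewritten, penalty=-1.0, keep=0.0):
    """Assign a reward to every token of an erroneous solution.

    Tokens that a minimally edited rewrite leaves untouched receive `keep`;
    tokens that the rewrite replaces or deletes receive `penalty`.
    Both reward values are illustrative assumptions.
    """
    matcher = SequenceMatcher(a=original, b=rewritten, autojunk=False)
    rewards = [keep] * len(original)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            for i in range(i1, i2):
                rewards[i] = penalty
    return rewards


# Only the wrong final token is edited, so only it is penalized.
erroneous = "3 + 4 = 8".split()
corrected = "3 + 4 = 7".split()
rewards = token_level_rewards(erroneous, corrected)
```

Because the rewrite changes as few tokens as possible, the penalty concentrates on the tokens responsible for the error, which is the behaviour the abstract attributes to token-level supervision.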

翻訳日:2024-06-19 07:04:39 公開日:2024-06-17

# Stockformer: Wavelet Transform と Multi-Task Self-Attention Network に基づく価格変数ストック選択モデル

Stockformer: A Price-Volume Factor Stock Selection Model Based on Wavelet Transform and Multi-Task Self-Attention Networks ( http://arxiv.org/abs/2401.06139v2 )

ライセンス: Link先を確認

Bohan Ma, Yushan Xue, Yuan Lu, Jing Chen,

(参考訳) 中国株式市場が発展し、市場構造が複雑化するにつれ、伝統的な量的取引手法はエスカレートする課題に直面している。特に、政策の不確実性や突然の経済的な出来事によって引き起こされる市場の頻繁な変動により、既存のモデルは市場のダイナミクスを正確に予測するのに苦労することが多い。これらの課題に対処するため,市場不安定性に関する応答性と予測精度の向上を目的とした,ウェーブレット変換とマルチタスク自己注意ネットワークを統合した価格-体積係数ストックセレクションモデルであるStockformerを紹介した。離散ウェーブレット変換により、ストックフォーマーは株価のリターンを高頻度と低頻度に分解し、急激な出来事を含む長期市場のトレンドと短期的な変動を注意深く捉えている。さらに、このモデルには、二重周波数時空間エンコーダとグラフ埋め込み技術が組み込まれ、ストック間の複雑な時間的および空間的関係を効果的に捉えることができる。マルチタスク学習戦略を採用することで、株価のリターンと方向性の傾向を同時に予測する。実験結果から、Stockformerは複数の実市場データセットにおいて、既存の先進的な手法よりも優れていることが示された。ストラテジーバックテストにおいて、Stockformerは、ダウンターンや揮発性の期間に特に高いパフォーマンスを維持し、市場の変動に高い適応性を示すような、市場条件全体にわたる例外的な安定性と信頼性を一貫して示している。金融分析分野におけるイノベーションとコラボレーションを促進するため、Stockformerモデルのコードはオープンソースとして公開され、GitHubリポジトリで公開されている。

As the Chinese stock market continues to evolve and its market structure grows increasingly complex, traditional quantitative trading methods are facing escalating challenges. Particularly, due to policy uncertainty and the frequent market fluctuations triggered by sudden economic events, existing models often struggle to accurately predict market dynamics. To address these challenges, this paper introduces Stockformer, a price-volume factor stock selection model that integrates wavelet transformation and a multitask self-attention network, aimed at enhancing responsiveness and predictive accuracy regarding market instabilities. Through discrete wavelet transform, Stockformer decomposes stock returns into high and low frequencies, meticulously capturing long-term market trends and short-term fluctuations, including abrupt events. Moreover, the model incorporates a Dual-Frequency Spatiotemporal Encoder and graph embedding techniques to effectively capture complex temporal and spatial relationships among stocks. Employing a multitask learning strategy, it simultaneously predicts stock returns and directional trends. Experimental results show that Stockformer outperforms existing advanced methods on multiple real stock market datasets. In strategy backtesting, Stockformer consistently demonstrates exceptional stability and reliability across market conditions (rising, falling, or fluctuating), particularly maintaining high performance during downturns or volatile periods, indicating a high adaptability to market fluctuations. To foster innovation and collaboration in the financial analysis sector, the Stockformer model's code has been open-sourced and is available on the GitHub repository: https://github.com/Eric991005/Multitask-Stockformer.
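
As a rough illustration of splitting a return series into low- and high-frequency parts, the sketch below applies one level of the Haar discrete wavelet transform. The abstract does not say which wavelet Stockformer uses, so the Haar choice, the function names, and the sample returns are all assumptions made for this example.

```python
import math


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approx, detail): the low-frequency (trend) coefficients and
    the high-frequency (fluctuation) coefficients. Length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, recovering the original series."""
    scale = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / scale)
        out.append((a - d) / scale)
    return out


# Hypothetical daily returns; the sharp swing in the third pair
# shows up as a large detail (high-frequency) coefficient.
returns = [0.01, 0.03, -0.02, 0.00, 0.05, -0.05, 0.02, 0.02]
low, high = haar_dwt(returns)
reconstructed = haar_idwt(low, high)
```

The transform is lossless, so a model can process the two bands separately and still have access to all of the information in the original series.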

翻訳日:2024-06-19 06:54:55 公開日:2024-06-17

# 完全連結フィードフォワードニューラルネットワークにおける重み最適化のための閉形式解法

A Closed-form Solution for Weight Optimization in Fully-connected Feed-forward Neural Networks ( http://arxiv.org/abs/2401.06699v2 )

ライセンス: Link先を確認

Slavisa Tomic, João Pedro Matos-Carvalho, Marko Beko,

(参考訳) 本研究は、完全連結フィードフォワードニューラルネットワークにおける重み付け最適化問題に対処する。バックプロパゲーション(BP)とチェーン規則勾配に基づく最適化(場合によっては繰り返し実行、潜在的に重荷、時間を要する)に基づく既存の手法とは異なり、提案手法は最小二乗法(LS)法を用いて閉形式における重み付け最適化の解を提供する。インプット・トゥ・アウトプット・マッピングがインジェクティブである場合、新しいアプローチでは、各ニューロンに対して各レイヤの重みのセットを共同で最適化することにより、1イテレーションでバックプロパゲーション方式で重みを最適化する。インプット・トゥ・アウトプット・マッピングが帰納的でない場合(例えば分類問題では)、提案手法は数イテレーションで最終解が得られるように容易に適応できる。既存のソリューションに対する重要な利点は、これらの計算(層内の全てのニューロン)が互いに独立していることである。さらに、その実行時間は、全てのネットワーク層の重みを最適化するために必要な計算の正確な数が得られるという意味で決定論的である(反復の場合、非射影写像の場合)。シミュレーションおよび実験結果から,提案手法であるBPLSは,既存の手法と精度で競合するが,実行時間ではかなり上回っていることがわかった。要約すると、新しい手法は実装が簡単で、既存の方法よりも競争力があり、計算効率が良く、並列実装に適している。

This work addresses the weight optimization problem for fully-connected feed-forward neural networks. Unlike existing approaches that are based on back-propagation (BP) and chain-rule gradient-based optimization (which implies iterative execution, potentially burdensome and time-consuming in some cases), the proposed approach offers the solution for weight optimization in closed form by means of least squares (LS) methodology. In the case where the input-to-output mapping is injective, the new approach optimizes the weights in a back-propagating fashion in a single iteration by jointly optimizing a set of weights in each layer for each neuron. In the case where the input-to-output mapping is not injective (e.g., in classification problems), the proposed solution is easily adapted to obtain its final solution in a few iterations. An important advantage over the existing solutions is that these computations (for all neurons in a layer) are independent of each other; thus, they can be carried out in parallel to optimize all weights in a given layer simultaneously. Furthermore, its running time is deterministic in the sense that one can obtain the exact number of computations necessary to optimize the weights in all network layers (per iteration, in the case of non-injective mapping). Our simulation and empirical results show that the proposed scheme, BPLS, works well and is competitive with existing ones in terms of accuracy, but significantly surpasses them in terms of running time. To summarize, the new method is straightforward to implement, is competitive and computationally more efficient than the existing ones, and is well-tailored for parallel implementation.
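
To make the closed-form idea concrete, here is a minimal sketch for a single neuron with one input and an invertible (sigmoid) activation. It is not the BPLS algorithm from the paper: it only shows how inverting the activation turns weight fitting into an ordinary least-squares problem that is solved in one step. The function names and the toy data are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_neuron_closed_form(xs, ys):
    """Fit y = sigmoid(w * x + b) without gradient descent.

    Step 1: map targets back through the inverse activation (logit),
            which requires every target to lie strictly in (0, 1).
    Step 2: solve the resulting linear least-squares problem exactly.
    """
    zs = [math.log(y / (1.0 - y)) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    w = sxz / sxx
    b = mean_z - w * mean_x
    return w, b


# Toy data generated from known weights; the closed-form fit recovers them.
true_w, true_b = 1.5, -0.5
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [sigmoid(true_w * x + true_b) for x in xs]
w, b = fit_neuron_closed_form(xs, ys)
```

Each neuron's fit uses only its own inputs and targets, which is why such computations can in principle run in parallel across a layer.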

翻訳日:2024-06-19 06:54:55 公開日:2024-06-17

# MADA: ハイパー勾配降下によるメタ適応最適化器

MADA: Meta-Adaptive Optimizers through hyper-gradient Descent ( http://arxiv.org/abs/2401.08893v3 )

ライセンス: Link先を確認

Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher,

(参考訳) Adamの導入に続いて、ディープラーニングのための新しい適応最適化器が提案されている。これらのオプティマイザは一般的にいくつかのタスクで優れるが、すべてのタスクでAdamを均一に上回るものではない。本稿では,複数の既知のオプティマイザを一般化し,トレーニング中に最も適したオプティマイザを動的に学習する,統一オプティマイザフレームワークであるメタ適応オプティマイザ(MADA)を紹介する。 MADAのキーとなるアイデアは、オプティマイザの空間をパラメータ化して、トレーニング中に過度な降下を使って動的に探索することだ。我々は、MADAを視覚や言語タスクにおける他の人気のあるオプティマイザと経験的に比較し、MADAがAdamや他の人気のあるオプティマイザより一貫して優れており、サブ最適化されたハイパーパラメータに対して堅牢であることを確認した。 MADAは、GPT-2トレーニングや微調整において、他の一般的なオプティマイザと比較して、Adamよりも高い検証性能向上を実現している。 AVGradも提案する。AMSGradは最大演算子を平均演算子に置き換えたもので、高次最適化に適している。最後に、最適化器のパラメータ化補間が誤差境界(定数まで)を改善できることを示し、メタ最適化器の利点を示唆する収束解析を提供する。

Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce Meta-Adaptive Optimizers (MADA), a unified optimizer framework that can generalize several known optimizers and dynamically learn the most suitable one during training. The key idea in MADA is to parameterize the space of optimizers and dynamically search through it using hyper-gradient descent during training. We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers, and is robust against sub-optimally tuned hyper-parameters. MADA achieves a greater validation performance improvement over Adam compared to other popular optimizers during GPT-2 training and fine-tuning. We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization. Finally, we provide a convergence analysis to show that parameterized interpolations of optimizers can improve their error bounds (up to constants), hinting at an advantage for meta-optimizers.
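
The difference between AMSGrad's maximum operator and an averaging variant can be sketched on scalar gradients. The abstract only states that AVGrad replaces the maximum with averaging; the running-mean form used below is an assumption for illustration and may differ from the paper's exact definition.

```python
def amsgrad_second_moment(grads, beta2=0.999):
    """Second-moment estimates with AMSGrad's running maximum."""
    v, v_hat, history = 0.0, 0.0, []
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * g * g
        v_hat = max(v_hat, v)  # never decreases
        history.append(v_hat)
    return history


def averaged_second_moment(grads, beta2=0.999):
    """Same estimates, with the maximum replaced by a running mean (assumed form)."""
    v, v_avg, history = 0.0, 0.0, []
    for t, g in enumerate(grads, start=1):
        v = beta2 * v + (1.0 - beta2) * g * g
        v_avg += (v - v_avg) / t  # incremental mean of v_1..v_t
        history.append(v_avg)
    return history


# One large early gradient followed by small ones.
grads = [1.0, 0.1, 0.1, 0.1]
with_max = amsgrad_second_moment(grads, beta2=0.9)
with_avg = averaged_second_moment(grads, beta2=0.9)
```

With the maximum, the single early spike fixes the denominator for every later step. The averaged version decays smoothly, which suggests why averaging may be easier to differentiate through with hyper-gradients than a hard maximum.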

翻訳日:2024-06-19 06:54:55 公開日:2024-06-17

# 適応器混合:微調整事前学習テキスト分類器の逆ロバスト性を高めるためのパラメータ効率の良い混合適応器

Adapters Mixup: Mixing Parameter-Efficient Adapters to Enhance the Adversarial Robustness of Fine-tuned Pre-trained Text Classifiers ( http://arxiv.org/abs/2401.10111v2 )

ライセンス: Link先を確認

Tuc Nguyen, Thai Le,

(参考訳) 既存の研究は、パラメータ効率のよい微調整法(PEFT)を用いて微調整された分類タスクのための事前訓練された言語モデル(PLM)のトレーニングデータを増やすことで、敵攻撃下での堅牢性を高めることを示している。しかし、この敵対的なトレーニングパラダイムは、しばしばクリーンな入力のパフォーマンス低下を招き、新しい未知の攻撃を説明するために、データ全体を頻繁に再トレーニングする必要がある。これらの課題を克服しつつ、PEFTの利点と効率を生かし、(1)アダプタによる微調整と(2)ミックスアップによる敵の増強という2つのパラダイムを組み合わせた新しいアプローチを提案する。直感的には、AdpMixupファインチューンPLMは、クリーンかつ既知の逆数例を持つ複数のアダプタを持ち、予測中に異なる比率でそれらをインテリジェントに混合する。実験の結果,AdpMixupは6つのブラックボックス攻撃と2つのPLMに対して,既存の5つの下流タスクのベースラインと比較して,トレーニング効率とロバストネスの最良のトレードオフを実現していることがわかった。すべてのソースコードが利用可能になる。

Existing works show that augmenting the training data of pre-trained language models (PLMs) for classification tasks fine-tuned via parameter-efficient fine-tuning methods (PEFT) using both clean and adversarial examples can enhance their robustness under adversarial attacks. However, this adversarial training paradigm often leads to performance degradation on clean inputs and requires frequent re-training on the entire data to account for new, unknown attacks. To overcome these challenges while still harnessing the benefits of adversarial training and the efficiency of PEFT, this work proposes a novel approach, called AdpMixup, that combines two paradigms: (1) fine-tuning through adapters and (2) adversarial augmentation via mixup to dynamically leverage existing knowledge from a set of pre-known attacks for robust inference. Intuitively, AdpMixup fine-tunes PLMs with multiple adapters with both clean and pre-known adversarial examples and intelligently mixes them up in different ratios during prediction. Our experiments show AdpMixup achieves the best trade-off between training efficiency and robustness under both pre-known and unknown attacks, compared to existing baselines on five downstream tasks across six varied black-box attacks and 2 PLMs. All source code will be available.

翻訳日:2024-06-19 06:54:55 公開日:2024-06-17

# 交通流予測のためのハイブリッド時変グラフニューラルネットワーク

A novel hybrid time-varying graph neural network for traffic flow forecasting ( http://arxiv.org/abs/2401.10155v4 )

ライセンス: Link先を確認

Ben-Ao Dai, Bao-Lin Ye, Lingxi Li,

(参考訳) インテリジェント交通システムの効率化には,リアルタイムかつ正確な交通流予測が不可欠である。従来の手法では、都市道路網における交通ノード間の空間的相関を記述するために、事前に定義されたグラフを持つグラフニューラルネットワーク(GNN)を用いることが多い。しかし、これらの事前定義されたグラフは、既存の知識やグラフ生成手法によって制限されており、空間的相関の完全な図形を提供していない。データ駆動学習に基づく時間変化グラフは、これらの制限に対処しようとするが、トラフィックデータに固有の空間的相関を適切に捉えることに苦慮している。さらに、動的時間相関を捕捉するための現在のほとんどの手法は、時間的多頭部自己注意機構を用いた統一的な計算方式に依存しており、あるレベルでは不正確な結果をもたらす可能性がある。これらの課題を克服するために,交通流予測のためのハイブリッド時変グラフニューラルネットワーク(HTVGNN)を提案する。まず,時間変化マスク強化に基づく新しい時間的知覚多頭部自己認識機構を報告し,トラフィックネットワーク内の異なるトラフィックノード間の動的時間的依存関係をより正確にモデル化した。次に,道路ネットワークにおける異なる交通ノード間の静的および動的空間的関連を同時に学習するグラフ学習手法を提案する。一方、時間変化グラフの学習能力を高めるために、各時間ステップで学習したグラフを結合するグラフ学習機構が設計された。最後に,提案手法の有効性を4つの実データを用いて実証した。シミュレーションの結果,HTVGNNは最先端の時空間グラフニューラルネットワークモデルと比較して予測精度が優れていることがわかった。さらに、このアブレーション実験により、結合グラフ学習機構がHTVGNNの長期予測性能を効果的に向上できることを確認した。

Real-time and precise traffic flow prediction is vital for the efficiency of intelligent transportation systems. Traditional methods often employ graph neural networks (GNNs) with predefined graphs to describe spatial correlations among traffic nodes in urban road networks. However, these predefined graphs are limited by existing knowledge and graph generation methodologies, offering an incomplete picture of spatial correlations. While time-varying graphs based on data-driven learning have attempted to address these limitations, they still struggle to adequately capture the inherent spatial correlations in traffic data. Moreover, most current methods for capturing dynamic temporal correlations rely on a unified calculation scheme using a temporal multi-head self-attention mechanism, which to some extent might lead to inaccuracies. To overcome these challenges, we propose a novel hybrid time-varying graph neural network (HTVGNN) for traffic flow prediction. First, we report a novel enhanced temporal perception multi-head self-attention mechanism based on time-varying mask enhancement, which more accurately models the dynamic temporal dependencies among distinct traffic nodes in the traffic network. Second, we propose a novel graph learning strategy to concurrently learn both static and dynamic spatial associations between different traffic nodes in road networks. Meanwhile, to enhance the learning ability of time-varying graphs, a coupled graph learning mechanism is designed to couple the graphs learned at each time step. Finally, the effectiveness of the proposed method HTVGNN is demonstrated on four real data sets. Simulation results reveal that HTVGNN achieves superior prediction accuracy compared to state-of-the-art spatio-temporal graph neural network models. Additionally, the ablation experiment verifies that the coupled graph learning mechanism can effectively improve the long-term prediction performance of HTVGNN.

翻訳日:2024-06-19 06:54:55 公開日:2024-06-17

# AFS-BM: バイナリマスキングによる適応的特徴選択によるモデル性能の向上

AFS-BM: Enhancing Model Performance through Adaptive Feature Selection with Binary Masking ( http://arxiv.org/abs/2401.11250v2 )

ライセンス: Link先を確認

Mehmet Y. Turali, Mehmet E. Lorasdagi, Ali T. Koc, Suleyman S. Kozat,

(参考訳) 一般機械学習(ML)コンテキストにおける特徴選択の問題について検討する。しかし,これらの手法はスケーラビリティ,高次元データ管理,特徴の相関処理,特徴の多様性への適応,ドメイン知識の統合といった課題に直面している。この目的のために,これらの問題を修復する「二項マスキングによる適応的特徴選択(AFS-BM)」を導入する。 AFS-BMは、同時特徴選択とモデルトレーニングのための共同最適化によってこれを達成している。特に、トレーニングプロセス中に特徴とモデルパラメータのセットを継続的に適応させるために、共同最適化とバイナリマスキングを行います。このアプローチにより、モデルの精度が大幅に向上し、計算要求が減少する。我々は、AFS-BMと、実生活のコンペティションからよく知られたデータセットを用いて確立された特徴選択手法を比較する、広範な実験セットを提供する。以上の結果から,AFS-BMの精度は大幅に向上し,計算量も大幅に削減された。これは、AFS-BMが訓練過程における機能の重要性の変化を動的に調整する能力によって、この分野に重要な貢献をしたためである。結果の複製性に関するコードをオープンに共有し、さらなる研究を促進する。

We study the problem of feature selection in the general machine learning (ML) context, which is one of the most critical subjects in the field. Although many feature selection methods exist, they face challenges such as scalability, managing high-dimensional data, dealing with correlated features, adapting to variable feature importance, and integrating domain knowledge. To this end, we introduce "Adaptive Feature Selection with Binary Masking" (AFS-BM), which remedies these problems. AFS-BM achieves this by joint optimization for simultaneous feature selection and model training. In particular, we use joint optimization and binary masking to continuously adapt the set of features and model parameters during the training process. This approach leads to significant improvements in model accuracy and a reduction in computational requirements. We provide an extensive set of experiments where we compare AFS-BM with the established feature selection methods using well-known datasets from real-life competitions. Our results show that AFS-BM significantly improves accuracy and requires significantly less computation. This is due to AFS-BM's ability to dynamically adjust to the changing importance of features during the training process, which is an important contribution to the field. We openly share our code for the replicability of our results and to facilitate further research.

翻訳日:2024-06-19 06:54:55 公開日:2024-06-17

# Leggett-Garg不等式による量子ドットデバイスにおける電子輸送の量子性:非平衡グリーン関数アプローチ

Quantumness of electron transport in quantum dot devices through Leggett-Garg inequalities: A non-equilibrium Green's function approach ( http://arxiv.org/abs/2401.12502v3 )

ライセンス: Link先を確認

Thingujam Yaiphalemba Meitei, Saikumar Krithivasan, Arijit Sen, Md Manirul Ali,

(参考訳) 電子状態のコヒーレントな操作は、ナノファブリケーションツールを利用することで量子ドット(QD)デバイスで達成できるが、これらのナノエレクトロニクスデバイスが量子力学的に振る舞う範囲を太くすることはしばしばである。そのため、電子状態のコヒーレントなダイナミクスが重要な役割を担っているため、量子技術の新興世界では、その非古典的な性質が最重要視される。このような背景から、LGI(Leggett-Garg inequality)の一般的な枠組みを利用して、ナノ構造を介する古典的および量子的輸送を、様々な2時間相関関数によって区別することができる。 2つの異なる時間における局所電荷検出を用いて、マルコフ力学と非マルコフ力学の両方の下で、元のLGIの量子違反が存在するかどうかを理論的に調査する。 LGI内の2時間相関子は、量子ランゲヴィン方程式を正確に解くことによって、非平衡グリーン関数(NEGF)によって導出される。貯水池と相互作用する量子系の非マルコフ力学の研究は、超高速な過渡状態における緩和現象を理解し、特に高速な量子デバイスに起こることを模倣するために重要である。非マルコフ記憶効果とともに電極の水平拡大を考慮し, 有限貯水池相関時間の影響を捉えることができる。さらに、電子貯水池間の有限バイアスを安全に考慮できるように、我々の計算では大きなバイアス制限はもはや課されない。我々のアプローチは、平衡から追い出される他の量子多体系の量子性を目撃する新たな可能性を開く可能性が高い。

Although coherent manipulation of electronic states can be achieved in quantum dot (QD) devices by harnessing nanofabrication tools, it is often hard to fathom the extent to which these nanoelectronic devices can behave quantum mechanically. Witnessing their nonclassical nature would thus remain of paramount importance in the emerging world of quantum technologies, since the coherent dynamics of electronic states plays a crucial role there. Against this backdrop, we resort to the general framework of Leggett-Garg inequalities (LGI) as it allows for distinguishing the classical and quantum transport through nanostructures by way of various two-time correlation functions. Using the local charge detection at two different times, we investigate here theoretically whether any quantum violation of the original LGI exists with varying device configurations and parameters under both Markovian and non-Markovian dynamics. Two-time correlators within LGI are derived in terms of the non-equilibrium Green's functions (NEGFs) by exactly solving the quantum Langevin equations. The present study of non-Markovian dynamics of quantum systems interacting with reservoirs is significant for understanding the relaxation phenomenon in the ultrafast transient regime, especially to mimic what happens in high-speed quantum devices. We can potentially capture the effect of finite reservoir correlation time by accounting for level broadening at the electrodes along with non-Markovian memory effects. Furthermore, the large bias restriction is no longer imposed in our calculations so that we can safely consider a finite bias between the electronic reservoirs. Our approach is likely to open up new possibilities of witnessing the quantumness of other quantum many-body systems that are driven out of equilibrium.
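
For readers unfamiliar with Leggett-Garg inequalities, the simplest three-time form bounds the combination K3 = C12 + C23 - C13 by 1 for any classical (macrorealist) description. The sketch below evaluates K3 for a textbook case, a coherently oscillating two-level system whose two-time correlator is assumed to be cos(ωτ). It is not the quantum dot model or the NEGF correlators of the paper.

```python
import math


def correlator(omega, tau):
    """Assumed two-time correlator C(tau) = cos(omega * tau) for a
    two-level system measured with a +/-1 valued observable."""
    return math.cos(omega * tau)


def leggett_garg_k3(omega, tau):
    """K3 = C12 + C23 - C13 for three equally spaced measurement times."""
    c12 = correlator(omega, tau)
    c23 = correlator(omega, tau)
    c13 = correlator(omega, 2.0 * tau)
    return c12 + c23 - c13


omega = 1.0
k_max = leggett_garg_k3(omega, math.pi / 3.0)  # about 1.5
violated = k_max > 1.0  # classical bound is K3 <= 1
```

At ωτ = π/3 the combination reaches 1.5, above the classical bound of 1. A violation of this kind is what witnesses nonclassical dynamics.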

翻訳日:2024-06-19 06:54:55 公開日:2024-06-17

# 予測の公平な分布から社会財の公正な分布へ--機械学習が長期的失業に与える影響の評価

From the Fair Distribution of Predictions to the Fair Distribution of Social Goods: Evaluating the Impact of Fair Machine Learning on Long-Term Unemployment ( http://arxiv.org/abs/2401.14438v2 )

ライセンス: Link先を確認

Sebastian Zezulka, Konstantin Genin,

(参考訳) アルゴリズムによる政策の展開は、社会における重要な介入である。アルゴリズムフェアネスの代表的な方法は、特定の社会的文脈にアルゴリズムを配置した後に生じる社会財の分布よりも、訓練時の予測の分布に焦点を当てる。しかし、予測の「公正な」分布を必要とすることは、社会的商品の公平な分布を確立するための努力を損なう可能性がある。まず,この問題に対処するためには,展開後のソーシャルグッズ分布の変化を予見する予見的公正性の概念が必要であると論じる。第2に、この変更が事前デプロイデータから特定される正式な条件を提供する。それは、さまざまな種類のパフォーマンス効果を説明する必要がある。ここでは、予測が政策決定をどう変えるか、その結果、社会財の因果的下流分布に焦点をあてる。全体として、私たちは、公共行政からの申請によって導かれています。最近失業した人のうちの誰が長期的に失業し続け、労働市場プログラムで彼らを狙うかを予測するアルゴリズムの使用です。第3に、スイスの公共雇用サービスによる行政データを用いて、このようなアルゴリズムによるインフォームドポリシーが、長期失業における男女不平等にどのように影響するかをシミュレートする。リスク予測が統計的平等と機会の平等に従って「公正」である必要がある場合、ターゲティング決定は効果が低く、長期失業の全体的な水準を低くし、長期失業の男女格差を埋める努力を損なう。

Deploying an algorithmically informed policy is a significant intervention in society. Prominent methods for algorithmic fairness focus on the distribution of predictions at the time of training, rather than the distribution of social goods that arises after deploying the algorithm in a specific social context. However, requiring a "fair" distribution of predictions may undermine efforts at establishing a fair distribution of social goods. First, we argue that addressing this problem requires a notion of prospective fairness that anticipates the change in the distribution of social goods after deployment. Second, we provide formal conditions under which this change is identified from pre-deployment data. That requires accounting for different kinds of performative effects. Here, we focus on the way predictions change policy decisions and, consequently, the causally downstream distribution of social goods. Throughout, we are guided by an application from public administration: the use of algorithms to predict who among the recently unemployed will remain unemployed in the long term and to target them with labor market programs. Third, using administrative data from the Swiss public employment service, we simulate how such algorithmically informed policies would affect gender inequalities in long-term unemployment. When risk predictions are required to be "fair" according to statistical parity and equality of opportunity, targeting decisions are less effective, undermining efforts to both lower overall levels of long-term unemployment and to close the gender gap in long-term unemployment.
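
The two fairness criteria named in the abstract, statistical parity and equality of opportunity, can be computed from predictions as in the following sketch. The data are made up for illustration and the function names are hypothetical; the example measures only prediction-level gaps, not the post-deployment distribution of outcomes that the paper argues should be evaluated.

```python
def positive_rate(flags):
    """Share of entries equal to 1."""
    return sum(flags) / len(flags)


def statistical_parity_gap(preds, groups, group_a, group_b):
    """Difference in the rate of positive predictions between two groups."""
    rate_a = positive_rate([p for p, g in zip(preds, groups) if g == group_a])
    rate_b = positive_rate([p for p, g in zip(preds, groups) if g == group_b])
    return rate_a - rate_b


def equal_opportunity_gap(preds, labels, groups, group_a, group_b):
    """Difference in true positive rates between two groups."""
    def true_positive_rate(group):
        hits = [p for p, y, g in zip(preds, labels, groups) if g == group and y == 1]
        return positive_rate(hits)
    return true_positive_rate(group_a) - true_positive_rate(group_b)


# Hypothetical data: 1 = predicted (or actual) long-term unemployment.
groups = ["f", "f", "f", "f", "m", "m", "m", "m"]
labels = [1, 1, 0, 0, 1, 1, 1, 0]
preds = [1, 0, 0, 0, 1, 1, 0, 1]

parity_gap = statistical_parity_gap(preds, groups, "f", "m")
opportunity_gap = equal_opportunity_gap(preds, labels, groups, "f", "m")
```

A gap of zero on either measure says nothing yet about who ends up long-term unemployed after the policy acts on the predictions, which is the distinction the abstract draws.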

翻訳日:2024-06-19 06:45:07 公開日:2024-06-17

# HiFT:階層型フルパラメータ細調整戦略

HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy ( http://arxiv.org/abs/2401.15207v3 )

ライセンス: Link先を確認

Yongkang Liu, Yiqun Zhang, Qian Li, Tong Liu, Shi Feng, Daling Wang, Yifei Zhang, Hinrich Schütze,

(参考訳) 言語モデル(LM)を下流タスクに適応させる手段として,フルパラメータの微調整が選択肢となっている。 LMのサイズが大きくなるにつれて、LMの完全なパラメータを微調整するには、非常に大量のGPUメモリが必要である。既存のアプローチでは、ゼロオーダーオプティマイザを使用してGPUメモリを節約しているが、非ゼロオーダーオプティマイザの方がほとんどのダウンストリームタスクで容易に収束する傾向があるため、LMのパフォーマンスを損なう可能性がある。本稿では,各学習ステップにおいてパラメータのサブセットのみを更新する,新しい最適化器非依存型のエンドツーエンド階層的微調整戦略であるHiFTを提案する。 HiFTは、GPUメモリに存在する勾配の量と最適化状態パラメータを同時に大幅に削減し、GPUメモリ使用量を減らすことができる。その結果,(1) HiFT はパラメータ効率の高いファインチューニングと標準のフルパラメータファインチューニングに匹敵する性能を達成できることがわかった。 (2) HiFTは,AdamW,AdaGrad,SGDなど,さまざまなオプティマイザをサポートする。 (3) HiFTは,7Bモデルに対して,標準のフルパラメータ微調整と比較して60%以上のGPUメモリを節約できる。 (4) HiFTはメモリセーブ技術を用いることなく,AdamWオプティマイザを用いた精度32のシングル48G A6000上で7Bモデルのフルパラメータ微調整を可能にする。

Full-parameter fine-tuning has become the go-to choice for adapting language models (LMs) to downstream tasks due to its excellent performance. As LMs grow in size, fine-tuning the full parameters of LMs requires a prohibitively large amount of GPU memory. Existing approaches utilize zeroth-order optimizers to conserve GPU memory, which can potentially compromise the performance of LMs as non-zero order optimizers tend to converge more readily on most downstream tasks. In this paper, we propose a novel optimizer-independent end-to-end hierarchical fine-tuning strategy, HiFT, which only updates a subset of parameters at each training step. HiFT can significantly reduce the amount of gradients and optimizer state parameters residing in GPU memory at the same time, thereby reducing GPU memory usage. Our results demonstrate that: (1) HiFT achieves comparable performance to parameter-efficient fine-tuning and standard full parameter fine-tuning. (2) HiFT supports various optimizers including AdamW, AdaGrad, SGD, etc. (3) HiFT can save more than 60% GPU memory compared with standard full-parameter fine-tuning for a 7B model. (4) HiFT enables full-parameter fine-tuning of a 7B model on a single 48G A6000 with a precision of 32 using the AdamW optimizer, without using any memory saving techniques.
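
The core idea of updating only a subset of parameters per step can be sketched with a toy example. This is not the HiFT implementation: the parameter names, the fixed cyclic group order, the plain SGD update, and the quadratic loss are all assumptions chosen to keep the example self-contained.

```python
def hierarchical_update(params, grad_fn, groups, steps, lr=0.1):
    """Train by updating one parameter group per step, cycling through groups.

    Only the active group's gradients are computed and held at any time,
    which is where the memory saving would come from in a real model.
    """
    for step in range(steps):
        active = groups[step % len(groups)]
        grads = grad_fn(params, active)  # gradients for the active group only
        for name in active:
            params[name] -= lr * grads[name]
    return params


# Toy loss: sum over parameters of (value - target)^2.
targets = {"embed": 1.0, "block": -2.0, "head": 3.0}


def quadratic_grads(params, names):
    return {n: 2.0 * (params[n] - targets[n]) for n in names}


params = {"embed": 0.0, "block": 0.0, "head": 0.0}
groups = [["embed"], ["block"], ["head"]]
trained = hierarchical_update(params, quadratic_grads, groups, steps=90)
```

After enough cycles every group has been updated, so all parameters are eventually trained even though each step touches only one group.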

翻訳日:2024-06-19 06:45:07 公開日:2024-06-17

# RE-GAINS & EnCHANT: クエリ応答強化のためのインテリジェントツール操作システム

RE-GAINS & EnCHANT: Intelligent Tool Manipulation Systems For Enhanced Query Responses ( http://arxiv.org/abs/2401.15724v2 )

ライセンス: Link先を確認

Sahil Girhepuje, Siva Sankar Sajeev, Purvam Jain, Arya Sikder, Adithya Rama Varma, Ryan George, Akshay Govind Srinivasan, Mahendra Kurup, Ashmit Sinha, Sudip Mondal,

(参考訳) LLMの顕著な成功にもかかわらず、入力クエリやツール引数の記述が不十分なため、ツール呼び出しやツールチェーンに悩まされている。本稿では,RE-GAINSとEnCHANTという2つの新しいフレームワークを提案する。 EnCHANTはオープンソースのソリューションで、LLMフォーマットインクルーダー、LLM(OpenChat 3.5)、レトリバー(ToolBenchのAPI Retriever)を利用している。 RE-GAINSはOpenAIモデルと組み込みに基づいており、RAP論文に基づいた特別なプロンプトを使用している。どちらのソリューションもクエリ毎に0.01ドル以下でレイテンシが最小で、フレームワークの有用性を示している。

Despite the remarkable success of LLMs, they still struggle with tool invocation and tool chaining due to inadequate input queries and/or tool argument descriptions. We propose two novel frameworks, RE-GAINS and EnCHANT, enabling LLMs to tackle tool manipulation for solving complex user queries by making API calls. EnCHANT is an open-source solution that makes use of an LLM format enforcer, an LLM (OpenChat 3.5), and a retriever (ToolBench's API Retriever). RE-GAINS is based on OpenAI models and embeddings using a special prompt based on the RAP paper. Both solutions cost less than $0.01 per query with minimal latency, showcasing the usefulness of the frameworks.

翻訳日:2024-06-19 06:45:07 公開日:2024-06-17

# 2つの石が1羽の鳥にぶつかる

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation ( http://arxiv.org/abs/2401.16421v2 )

ライセンス: Link先を確認

Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Liwei Wang, Jingjing Xu, Zhi Zhang, Hongxia Yang, Di He,

(参考訳) 本研究では,言語系列の固有セグメンテーションを活用し,Bilevel Positional Encoding (BiPE)と呼ばれる新しい位置符号化法を設計する。それぞれの位置について、私たちのBiPEは、セグメント内エンコーディングとセグメント間エンコーディングをブレンドする。セグメント内エンコーディングはセグメント内の位置を識別し、絶対的な位置エンコーディングによってモデルがそこにある意味情報をキャプチャするのを助ける。セグメント間符号化はセグメントインデックスを規定し、セグメント間の関係をモデル化し、相対的な位置符号化による外挿能力の向上を目的としている。理論的分析は、この位置情報の分離が学習をより効果的にすることを示している。実験結果から,BiPEは多種多様なテキストモダリティにおいて,幅広いタスクにまたがる長さの外挿能力に優れていることが示唆された。

In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute positional encoding. The inter-segment encoding specifies the segment index, models the relationships between segments, and aims to improve extrapolation capabilities via relative positional encoding. Theoretical analysis shows this disentanglement of positional information makes learning more effective. The empirical results also show that our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
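
The two levels of position information can be sketched by assigning each token a segment index and a position within its segment. Splitting on sentence-ending punctuation is an assumption made for this example, as is the function name. The sketch stops at the raw indices and does not show how BiPE turns them into absolute and relative encodings.

```python
def bilevel_positions(tokens, separators=(".", "?", "!")):
    """Return one (segment_index, intra_segment_position) pair per token.

    A separator token closes the current segment; the next token starts a
    new segment at intra-segment position 0.
    """
    pairs = []
    segment, position = 0, 0
    for token in tokens:
        pairs.append((segment, position))
        position += 1
        if token in separators:
            segment += 1
            position = 0
    return pairs


tokens = ["The", "cat", "sat", ".", "It", "slept", "."]
positions = bilevel_positions(tokens)
# positions == [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2)]
```

Intra-segment positions stay small however long the document is, and only the segment index grows. This is one intuition for why separating the two levels may help with length extrapolation.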

翻訳日:2024-06-19 06:45:07 公開日:2024-06-17

# 言語モデルは問題解決において人間の学習者と同じ認知バイアスを示すか?

Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? ( http://arxiv.org/abs/2401.18070v2 )

ライセンス: Link先を確認

Andreas Opedal, Alessandro Stolfo, Haruki Shirakami, Ying Jiao, Ryan Cotterell, Bernhard Schölkopf, Abulhair Saparov, Mrinmaya Sachan,

(参考訳) 認知モデルとして大規模言語モデル(LLM)を採用することへの関心が高まっている。このような目的のために、人間の認知のどの特性がLLMによって適切にモデル化されているかを理解することが中心であり、どちらがそうでないかを理解することが重要である。本研究では,算術語問題の解法において,子どもに知られている問題とLLMの偏りについて検討する。学習科学の文献を調査した結果、問題解決の過程は、テキスト理解、ソリューション計画、ソリューション実行の3つのステップに分けることができると仮定した。これらの段階において,現在のLSMが子どもと同じ認知バイアスを示すかどうかを理解するために,それぞれのテストを構築した。我々は,これらの各テストに対して,問題特徴のきめ細かい制御を可能にするニューロシンボリックアプローチを用いて,新しい単語問題を生成する。我々は,LLMがテキスト理解と解法計画の両方において人間的な偏見を示すが,算術式が実行されて解を得る最終段階には現れないことを示す。

There is increasing interest in employing large language models (LLMs) as cognitive models. For such purposes, it is central to understand which properties of human cognition are well-modeled by LLMs, and which are not. In this work, we study the biases of LLMs in relation to those known in children when solving arithmetic word problems. Surveying the learning science literature, we posit that the problem-solving process can be split into three distinct steps: text comprehension, solution planning and solution execution. We construct tests for each one in order to understand whether current LLMs display the same cognitive biases as children in these steps. We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features. We find evidence that LLMs, with and without instruction-tuning, exhibit human-like biases in both the text-comprehension and the solution-planning steps of the solving process, but not in the final step, in which the arithmetic expressions are executed to obtain the answer.

翻訳日:2024-06-19 06:45:07 公開日:2024-06-17

# 情報完全量子計測のためのデュアルフレーム最適化

Dual frame optimization for informationally complete quantum measurements ( http://arxiv.org/abs/2401.18071v2 )

ライセンス: Link先を確認

Laurin E. Fischer, Timothée Dao, Ivano Tavernelli, Francesco Tacchino,

(参考訳) 古典的なシャドウのようなランダム化測定プロトコルは量子技術の強力なリソースを表しており、量子状態のキャラクタリゼーションやプロセストモグラフィーから機械学習やエラー軽減まで幅広い応用がある。近年、古典的な影をPOVM効果の双対作用素に一般化する測定双対フレームの概念が文献で再浮上している。このことは、しばしば確立された技術によって無視されるランダム化測定の処理後の段階において、さらなる自由度に注意を向けた。本研究では,2重フレームを利用して,情報的に完全な測定サンプルから改良された観測可能推定器を構築する。実験周波数に基づくパラメタライズド・フレーム・スーパーオペレータと最適化自由なデュアルフレームの新たなクラスを導入し,計算効率を保ちながら,その標準周波数よりも優れていることを示す。興味深いことに、これは量子や古典的なコストがほとんどないため、デュアルフレームの最適化はランダム化測定ツールボックスに価値ある追加となる。

Randomized measurement protocols such as classical shadows represent powerful resources for quantum technologies, with applications ranging from quantum state characterization and process tomography to machine learning and error mitigation. Recently, the notion of measurement dual frames, in which classical shadows are generalized to dual operators of POVM effects, resurfaced in the literature. This brought attention to additional degrees of freedom in the post-processing stage of randomized measurements that are often neglected by established techniques. In this work, we leverage dual frames to construct improved observable estimators from informationally complete measurement samples. We introduce novel classes of parametrized frame superoperators and optimization-free dual frames based on empirical frequencies, which offer advantages over their canonical counterparts while retaining computational efficiency. Remarkably, this comes at almost no quantum or classical cost, thus rendering dual frame optimization a valuable addition to the randomized measurement toolbox.

翻訳日:2024-06-19 06:45:07 公開日:2024-06-17

# 位置エンコーディングは、リカレントニューラルネットワークが大きな語彙を扱うのに役立つ

Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary ( http://arxiv.org/abs/2402.00236v2 )

ライセンス: Link先を確認

Takashi Morita,

(参考訳) 本研究では、位置エンコーディングがリカレントニューラルネットワーク(RNN)の学習を促進するという直感に反する発見を報告する。位置符号化は入力データ上の時間指標の高次元表現である。最も有名なのは、位置エンコーディングは、データ順序を表現する固有のメカニズムが欠如しているトランスフォーマーニューラルネットワークの能力を補完するものである。対照的に、RNNはデータポイントの時間情報を自身でエンコードすることができ、位置エンコーディングの使用は冗長/不要のように見える。それにもかかわらず、合成ベンチマークによる調査は、特に低周波トークンを生成する大きな語彙を扱うために、位置符号化とRNNの結合の利点を明らかにしている。さらなる精査により、これらの低周波トークンがバニラRNNの勾配を不安定にし、位置エンコーディングがこの不安定を解消することが明らかになった。これらの結果は、トランスフォーマーのタイムキーパーとしての役割を超えて、位置エンコーディングの実用性に新たな光を当てた。

This study reports an unintuitive finding that positional encoding enhances learning of recurrent neural networks (RNNs). Positional encoding is a high-dimensional representation of time indices on input data. Most famously, positional encoding complements the capabilities of Transformer neural networks, which lack an inherent mechanism for representing the data order. By contrast, RNNs can encode the temporal information of data points on their own, rendering their use of positional encoding seemingly redundant/unnecessary. Nonetheless, investigations through synthetic benchmarks reveal an advantage of coupling positional encoding and RNNs, especially for handling a large vocabulary that yields low-frequency tokens. Further scrutiny reveals that these low-frequency tokens destabilize the gradients of vanilla RNNs, and the positional encoding resolves this instability. These results shed new light on the utility of positional encoding beyond its canonical role as a timekeeper for Transformers.
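
As an example of a high-dimensional representation of a time index, the sketch below computes the sinusoidal encoding commonly used with Transformers. The abstract does not state which encoding the study pairs with RNNs, so the sinusoidal form, the base of 10000, and the function name are assumptions.

```python
import math


def sinusoidal_encoding(position, dim, base=10000.0):
    """Encode an integer time index as `dim` interleaved sine/cosine values.

    Each pair (sin, cos) uses a different wavelength, so nearby positions
    get similar vectors while distant ones remain distinguishable.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    encoding = []
    for i in range(0, dim, 2):
        angle = position / (base ** (i / dim))
        encoding.append(math.sin(angle))
        encoding.append(math.cos(angle))
    return encoding


first = sinusoidal_encoding(0, 4)   # [0.0, 1.0, 0.0, 1.0]
later = sinusoidal_encoding(5, 8)
```

In an RNN setting, one plausible use is to concatenate such a vector with the token embedding at each time step before it enters the recurrent cell.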

翻訳日:2024-06-19 06:35:20 公開日:2024-06-17
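
As a rough illustration of what "a high-dimensional representation of time indices" can look like, the sketch below builds the sinusoidal encoding popularized by Transformers and concatenates it to each token embedding before it would be fed to an RNN. The abstract does not say which encoding or which combination rule the paper uses, so the sinusoidal form, the concatenation, and the function names `sinusoidal_position` and `add_positions` are all assumptions made for illustration.

```python
import math


def sinusoidal_position(t, dim):
    """Encode time index t as a list of `dim` sine/cosine values (dim must be even)."""
    enc = []
    for i in range(0, dim, 2):
        # Each pair of dimensions uses a different frequency (assumed Transformer-style schedule).
        freq = 1.0 / (10000 ** (i / dim))
        enc.append(math.sin(t * freq))
        enc.append(math.cos(t * freq))
    return enc


def add_positions(embedded_sequence):
    """Concatenate each token embedding with the encoding of its time index."""
    dim = len(embedded_sequence[0])
    return [emb + sinusoidal_position(t, dim) for t, emb in enumerate(embedded_sequence)]
```

With concatenation, the RNN input width doubles; adding the encoding element-wise is another common choice and would keep the width unchanged.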

# 大規模言語モデルに基づくコードレビュー自動化のためのファインチューニングとプロンプトエンジニアリング

Fine-Tuning and Prompt Engineering for Large Language Models-based Code Review Automation ( http://arxiv.org/abs/2402.00905v4 )

ライセンス: Link先を確認

Chanathip Pornprasit, Chakkrit Tantithamthavorn,

(参考訳) コンテキスト: 大規模言語モデル(LLM)の急速な進化は、コードレビュープロセスの自動化にその能力を活用することへの大きな関心を喚起している。先行研究は、コードレビュー自動化のためのLLMの開発に注力することが多いが、高価なリソースを必要とするため、予算やリソースが限られている組織では実現が難しい。したがって、コードレビュー自動化にLLMを活用するための2つの一般的なアプローチは、微調整とプロンプトエンジニアリングである。目的: LLMが微調整とプロンプトによって活用される2つのコンテキストに基づいて、LLMベースのコードレビュー自動化の性能を検討することを目的とする。微調整は特定のコードレビューデータセットでモデルをトレーニングすることを指し、プロンプトは特定のコードレビューデータセットを必要とせずに、モデルの生成プロセスを導く明示的な指示を与えることを指す。方法: LLMベースのコードレビュー自動化において、モデルの微調整と推論技術(ゼロショット学習、少数ショット学習、ペルソナ)を活用する。全体として、2つのLLMベースのコードレビュー自動化(GPT-3.5とMagicoder)の12のバリエーションを調査し、それらをGuo et al.のアプローチおよび既存の3つのコードレビュー自動化アプローチと比較する。結果: ゼロショット学習によるGPT-3.5の微調整により、GPT-3.5はGuo et al.のアプローチより73.17%〜74.23%高いEMを達成できる。さらに、GPT-3.5が微調整されていない場合、少数ショット学習のGPT-3.5はゼロショット学習のGPT-3.5よりも46.38%〜659.09%高いEMを達成する。結論: 結果に基づき、(1) コードレビュー自動化のためのLLMは、最高の性能を達成するために微調整すべきであること、(2) モデルの微調整に十分なデータがない場合(例: コールドスタート問題)は、ペルソナを用いない少数ショット学習を使うべきであることを推奨する。

Context: The rapid evolution of Large Language Models (LLMs) has sparked significant interest in leveraging their capabilities for automating code review processes. Prior studies often focus on developing LLMs for code review automation, yet this requires expensive resources, which is infeasible for organizations with limited budgets and resources. Thus, fine-tuning and prompt engineering are the two common approaches to leveraging LLMs for code review automation. Objective: We aim to investigate the performance of LLMs-based code review automation in two contexts, i.e., when LLMs are leveraged by fine-tuning and by prompting. Fine-tuning involves training the model on a specific code review dataset, while prompting involves providing explicit instructions to guide the model's generation process without requiring a specific code review dataset. Method: We leverage model fine-tuning and inference techniques (i.e., zero-shot learning, few-shot learning, and persona) on LLMs-based code review automation. In total, we investigate 12 variations of two LLMs-based code review automation approaches (i.e., GPT-3.5 and Magicoder), and compare them with Guo et al.'s approach and three existing code review automation approaches. Results: Fine-tuning GPT-3.5 with zero-shot learning helps GPT-3.5 achieve 73.17%-74.23% higher EM than Guo et al.'s approach. In addition, when GPT-3.5 is not fine-tuned, GPT-3.5 with few-shot learning achieves 46.38%-659.09% higher EM than GPT-3.5 with zero-shot learning. Conclusions: Based on our results, we recommend that (1) LLMs for code review automation should be fine-tuned to achieve the highest performance; and (2) when data is not sufficient for model fine-tuning (e.g., a cold-start problem), few-shot learning without a persona should be used for LLMs for code review automation.

翻訳日:2024-06-19 06:35:20 公開日:2024-06-17
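
To make the difference between the zero-shot, few-shot, and persona settings concrete, here is a minimal sketch of how such prompts could be assembled. The instruction wording, the prompt layout, and the helper name `build_review_prompt` are hypothetical; the abstract does not give the templates the authors used.

```python
def build_review_prompt(code_to_review, examples=(), persona=None):
    """Assemble a code-review prompt.

    No examples -> zero-shot; one or more (before, after) pairs -> few-shot.
    The persona line is optional; all wording here is an illustrative assumption.
    """
    parts = []
    if persona:
        parts.append(persona)
    parts.append("Improve the submitted code according to code review best practices.")
    for before, after in examples:
        # Each demonstration shows a submitted snippet and its revised version.
        parts.append("Submitted code:\n" + before + "\nRevised code:\n" + after)
    # The final block leaves the revised code open for the model to complete.
    parts.append("Submitted code:\n" + code_to_review + "\nRevised code:")
    return "\n\n".join(parts)
```

Following the abstract's recommendation for the cold-start case, one would call this with a few `examples` and leave `persona` as `None`.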

# Killer Apps: 低速で大規模なAI兵器

Killer Apps: Low-Speed, Large-Scale AI Weapons ( http://arxiv.org/abs/2402.01663v4 )

ライセンス: Link先を確認

Philip Feldman, Aaron Dant, James R. Foulds,

(参考訳) 人工知能(AI)と機械学習(ML)の急速な進歩は、OpenAI、Meta、Anthropicといった組織による最先端のジェネレーティブ・プレトレーニング・トランスフォーマー(GPT)モデルの開発によって強調され、戦争と安全保障における新たな課題と機会を提供する。現在注目されているのは、武器システムにおけるAIの統合と、物理的(キネティック)な紛争における迅速な意思決定におけるその役割である。しかし、同様に重要だが見落とされがちな側面は、情報領域内のインターネットスケールにおけるAIベースの心理的操作の可能性である。これらの能力は、世界中の個人、組織、社会に重大な脅威をもたらす可能性がある。本稿では,AI兵器の概念,その展開,検出,潜在的な対策について検討する。

The accelerating advancements in Artificial Intelligence (AI) and Machine Learning (ML), highlighted by the development of cutting-edge Generative Pre-trained Transformer (GPT) models by organizations such as OpenAI, Meta, and Anthropic, present new challenges and opportunities in warfare and security. Much of the current focus is on AI's integration within weapons systems and its role in rapid decision-making in kinetic conflict. However, an equally important but often overlooked aspect is the potential of AI-based psychological manipulation at internet scales within the information domain. These capabilities could pose significant threats to individuals, organizations, and societies globally. This paper explores the concept of AI weapons, their deployment, detection, and potential countermeasures.

翻訳日:2024-06-19 06:35:20 公開日:2024-06-17

# Minusformer: 残差の漸進的学習による時系列予測の改善

Minusformer: Improving Time Series Forecasting by Progressively Learning Residuals ( http://arxiv.org/abs/2402.02332v3 )

ライセンス: Link先を確認

Daojun Liang, Haixia Zhang, Dongfeng Yuan, Bingzheng Zhang, Minggao Zhang,

(参考訳) 本稿では、広く使われている時系列(TS)予測モデルが深刻な過学習に陥りやすいことを示す。この問題に対処するため、我々は冗長性を除去するアプローチを採用し、将来の区間に対するTSの本質的な値を漸進的に復元する。具体的には、ディープなブースティングアンサンブル学習法である、二重ストリームと減算のメカニズムを導入する。そして、情報集約機構を加算から減算に切り替えることで、バニラTransformerを改修する。次に、元のモデルの各ブロックに補助出力ブランチを組み込み、最終的な予測につながるハイウェイを構築する。このブランチにおける後続モジュールの出力は、それまでに学習された結果を減算し、モデルが教師信号の残差を層ごとに学習できるようにする。この設計により、入力ストリームと出力ストリームの学習駆動による暗黙的な漸進的分解が促進され、モデルの汎用性、解釈可能性、過学習に対する耐性が向上する。モデル内のすべての集約がマイナス記号であるため、このモデルをMinusformerと呼ぶ。大規模な実験により、提案手法は既存の最先端手法よりも優れており、様々なデータセットで平均11.9%の性能向上を実現している。コードはhttps://github.com/Anoise/Minusformerで公開されている。

In this paper, we find that ubiquitous time series (TS) forecasting models are prone to severe overfitting. To cope with this problem, we embrace a de-redundancy approach to progressively reinstate the intrinsic values of TS for future intervals. Specifically, we introduce a dual-stream and subtraction mechanism, which is a deep Boosting ensemble learning method. The vanilla Transformer is renovated by reorienting the information aggregation mechanism from addition to subtraction. Then, we incorporate an auxiliary output branch into each block of the original model to construct a highway leading to the ultimate prediction. The output of subsequent modules in this branch subtracts the previously learned results, enabling the model to learn the residuals of the supervision signal, layer by layer. This design facilitates the learning-driven implicit progressive decomposition of the input and output streams, empowering the model with heightened versatility, interpretability, and resilience against overfitting. Since all aggregations in the model are minus signs, the model is called Minusformer. Extensive experiments demonstrate that the proposed method outperforms existing state-of-the-art methods, yielding an average performance improvement of 11.9% across various datasets. The code has been released at https://github.com/Anoise/Minusformer.

翻訳日:2024-06-19 06:35:20 公開日:2024-06-17
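
One possible reading of the dual-stream subtraction mechanism described in the Minusformer abstract is sketched below. Each block removes what it has learned from the input stream, and each output branch subtracts the prediction accumulated so far. This is a toy interpretation of the abstract only, not the released implementation; the function name `minus_forward`, the use of plain callables for blocks and heads, and the equal input/output lengths are assumptions.

```python
def minus_forward(x, blocks, heads):
    """Run a stack of blocks where every aggregation is a subtraction.

    x      : list of floats (input series)
    blocks : callables mapping the current input residual to a learned component
    heads  : callables mapping a learned component to an output-branch prediction
    """
    residual = list(x)
    prediction = [0.0] * len(x)
    for block, head in zip(blocks, heads):
        learned = block(residual)
        # Input stream: subtract what this block has explained.
        residual = [r - l for r, l in zip(residual, learned)]
        # Output stream: the new branch output subtracts the earlier result.
        branch = head(learned)
        prediction = [b - p for b, p in zip(branch, prediction)]
    return residual, prediction
```

In a trained model the blocks would be attention layers and the heads learned projections; here they can be any functions, which keeps the subtraction pattern itself easy to inspect.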

# ポストホック解釈可能性と注意:数学的視点

Attention Meets Post-hoc Interpretability: A Mathematical Perspective ( http://arxiv.org/abs/2402.03485v2 )

ライセンス: Link先を確認

Gianluigi Lopardo, Frederic Precioso, Damien Garreau,

(参考訳) 注意に基づくアーキテクチャ、特にトランスフォーマーは、技術的な革命の中心にある。興味深いことに、幅広いアプリケーションにおける最先端の成果の獲得を支援することに加えて、アテンションメカニズムは本質的にモデルの内部動作に関する有意義な洞察を提供する。これらの洞察は説明として利用できるだろうか。議論は続いている。本稿では、単純な注意ベースのアーキテクチャを数学的に研究し、ポストホックな説明とアテンションに基づく説明の違いを明らかにする。両者は全く異なる結果を与えること、そしてその制限にもかかわらず、ポストホック法は単に注意重みを調べるよりも有用な洞察を捉えられることを示す。

Attention-based architectures, in particular transformers, are at the heart of a technological revolution. Interestingly, in addition to helping obtain state-of-the-art results on a wide range of applications, the attention mechanism intrinsically provides meaningful insights on the internal behavior of the model. Can these insights be used as explanations? Debate rages on. In this paper, we mathematically study a simple attention-based architecture and pinpoint the differences between post-hoc and attention-based explanations. We show that they provide quite different results, and that, despite their limitations, post-hoc methods are capable of capturing more useful insights than merely examining the attention weights.

翻訳日:2024-06-19 06:35:20 公開日:2024-06-17

# 適応的勾配法で平方根を除去できるか? : 2次の視点

Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective ( http://arxiv.org/abs/2402.03496v5 )

ライセンス: Link先を確認

Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani,

(参考訳) Adam(W)のような適応的な勾配最適化アルゴリズムは、トランスフォーマーのような多くのディープラーニングアーキテクチャのデフォルトのトレーニングアルゴリズムである。それらの対角プレコンディショナーは勾配の外積に基づいており、平方根を介してパラメータ更新に組み込まれる。これらの方法はしばしば近似的な二階法として動機付けられるが、平方根は根本的な違いを表す。本研究では、平方根を除去したとき、すなわち二階法としての動機付けを強めたときに、適応手法の動作がどのように変化するかを検討する。意外なことに、このような平方根なしの適応手法は、トランスフォーマーでは平方根ベースの手法の性能を維持しながら、畳み込みアーキテクチャ上でのSGDとの一般化ギャップを埋める。二階の視点はまた、プレコンディショナー不変性の概念を通じて非対角適応法を開発するうえでも実用的な利点を持つ。Shampooのような平方根ベースの手法とは対照的に、平方根なしの手法は数値的に不安定な行列の平方根分解や逆行列計算を必要としないため、半精度でもうまく高速に動作する。これは対角法と非対角法の計算ギャップを埋めるのに役立つ。本研究は適応手法の開発に関する新たな知見を提供し、その成功における適応性の役割という、現在見過ごされている重要な疑問を提起する。 (実験コード: https://github.com/yorkerlin/remove-the-square-root オプティマイザのコード: https://github.com/f-dangel/sirfshampoo)

Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers. Their diagonal preconditioner is based on the gradient outer product which is incorporated into the parameter update via a square root. While these methods are often motivated as approximate second-order methods, the square root represents a fundamental difference. In this work, we investigate how the behavior of adaptive methods changes when we remove the root, i.e. strengthen their second-order motivation. Surprisingly, we find that such square-root-free adaptive methods close the generalization gap to SGD on convolutional architectures, while maintaining their root-based counterpart's performance on transformers. The second-order perspective also has practical benefits for developing non-diagonal adaptive methods through the concept of preconditioner invariance. In contrast to root-based methods like Shampoo, root-free counterparts work well and fast with half-precision since they do not require numerically unstable matrix root decompositions and inversions. This is useful to bridge the computation gap between diagonal and non-diagonal methods. Our findings provide new insights into the development of adaptive methods and raise important questions regarding the currently overlooked role of adaptivity for their success. (experiment code: https://github.com/yorkerlin/remove-the-square-root optimizer code: https://github.com/f-dangel/sirfshampoo)

翻訳日:2024-06-19 06:35:20 公開日:2024-06-17
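
The role of the square root in a diagonal adaptive update can be seen in a scalar sketch. The function below applies an RMSProp-style step either with the usual root or without it. It deliberately omits momentum, bias correction, and weight decay, and it does not reproduce any further adjustments the paper may make when removing the root. The name `adaptive_step` and the hyperparameter defaults are illustrative assumptions.

```python
import math


def adaptive_step(param, grad, second_moment, lr=0.1, beta2=0.9, eps=1e-8, use_root=True):
    """One scalar adaptive-gradient step; returns (new_param, new_second_moment)."""
    # Exponential moving average of the squared gradient (the diagonal preconditioner).
    second_moment = beta2 * second_moment + (1 - beta2) * grad * grad
    if use_root:
        denom = math.sqrt(second_moment) + eps   # Adam/RMSProp-style scaling
    else:
        denom = second_moment + eps              # square-root-free scaling
    return param - lr * grad / denom, second_moment
```

Without the root, the update divides by a quantity with the units of a squared gradient, which is closer to how a second-order method scales by curvature; this is why step sizes for the two variants are not directly comparable.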

# AIフィードバックによる強化学習を用いたビデオ用大規模マルチモーダルモデルのチューニング

Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ( http://arxiv.org/abs/2402.03746v3 )

ライセンス: Link先を確認

Daechul Ahn, Yura Choi, Youngjae Yu, Dongyeop Kang, Jonghyun Choi,

(参考訳) 近年の大規模言語モデルの発展は、ビデオ大規模マルチモーダルモデル(VLMM)の開発に影響を及ぼしている。VLMMの従来のアプローチには、命令チューニング用データセットを使用したSupervised Fine-Tuning (SFT)、LLMとビジュアルエンコーダの統合、学習可能なモジュールの追加が含まれていた。ビデオとテキストのマルチモーダルアライメントは、主にテキストのみのデータと比較して、マルチモーダル命令チューニングデータの量と品質が不足しているため、依然として困難である。本稿では、マルチモーダルAIシステムが自身を監督する、AIフィードバックからの強化学習(Reinforcement Learning from AI Feedback, RLAIF)と呼ばれる新たなアライメント戦略を提案する。これは自己選好フィードバックを与えて自身を改良し、ビデオとテキストのモダリティのアライメントを促進する。具体的には、映像コンテンツの理解を深めるため、選好フィードバックの生成時に詳細な映像記述を文脈として与える、文脈対応の報酬モデリングを提案する。我々のマルチモーダルRLAIFアプローチであるVLM-RLAIFは、多様なビデオベンチマークで性能向上を示し、SFTモデルを含む既存の手法よりも優れている。私たちは、この分野のさらなる研究を促進するために、コード、モデル、データセットをオープンソース化することを約束する。

Recent advancements in large language models have influenced the development of video large multimodal models (VLMMs). Previous approaches for VLMMs involved Supervised Fine-Tuning (SFT) with instruction-tuned datasets, integrating LLMs with visual encoders, and adding learnable modules. Video and text multimodal alignment remains challenging, primarily due to the deficient volume and quality of multimodal instruction-tuning data compared to text-only data. We present a novel alignment strategy, called Reinforcement Learning from AI Feedback (RLAIF), that employs a multimodal AI system to oversee itself, providing self-preference feedback to refine itself and facilitating the alignment of video and text modalities. Specifically, we propose context-aware reward modeling, which provides detailed video descriptions as context during the generation of preference feedback in order to enrich the understanding of video content. Demonstrating enhanced performance across diverse video benchmarks, our multimodal RLAIF approach, VLM-RLAIF, outperforms existing approaches, including the SFT model. We commit to open-sourcing our code, models, and datasets to foster further research in this area.

翻訳日:2024-06-19 06:25:35 公開日:2024-06-17

# 中性原子量子プロセッサに支援されたBenders分解を用いる混合整数線形計画ソルバー

Mixed Integer Linear Programming Solver Using Benders Decomposition Assisted by Neutral Atom Quantum Processor ( http://arxiv.org/abs/2402.05748v2 )

ライセンス: Link先を確認

M. Yassine Naghmouchi, Wesley da Silva Coelho,

(参考訳) 本稿では、中性原子量子計算を用いて混合整数線形計画問題(MILP)を解く、新しい古典・量子ハイブリッド手法を提案する。Benders分解(BD)を適用してMILPをマスター問題(MP)とサブ問題(SP)に分割し、MPは自動化された手順で二次非制約二値最適化(QUBO)モデルに変換したうえで、中性原子デバイスで解く。我々のMILPからQUBOへの変換は、関連する連続変数の上限を引き締め、必要な量子ビット数とアルゴリズムの収束に良い影響を与える。QUBOを解くために、原子レジスタ埋め込みのためのヒューリスティックを開発し、パルス整形のための変分アルゴリズムを適用する。さらに、既存のソリューションよりも優れた概念実証(PoC)を実装する。予備的な数値実験も行い、一連の小規模なMILPインスタンスにおいて、我々のアルゴリズムは高品質な実行可能解の95%以上を同定し、MPをシミュレーテッドアニーリングで解く古典的なBD手法を上回る。我々の知る限り、この研究は、BDによってMILPを解くための、自動化された問題非依存のフレームワークの開発に中性原子量子プロセッサを利用した最初のものである。

This paper presents a new hybrid classical-quantum approach to solve Mixed Integer Linear Programming (MILP) problems using neutral atom quantum computation. We apply Benders decomposition (BD) to split MILPs into a master problem (MP) and a subproblem (SP), where the MP is addressed using a neutral-atom device after being transformed into a Quadratic Unconstrained Binary Optimization (QUBO) model by an automated procedure. Our MILP-to-QUBO conversion tightens the upper bounds of the involved continuous variables, positively impacting the required qubit count and the convergence of the algorithm. To solve the QUBO, we develop a heuristic for atom register embedding and apply a variational algorithm for pulse shaping. In addition, we implement a Proof of Concept (PoC) that outperforms existing solutions. We also report preliminary numerical results: in a series of small MILP instances, our algorithm identifies over 95 percent of feasible solutions of high quality, outperforming classical BD approaches where the MP is solved using simulated annealing. To the best of our knowledge, this work is the first to utilize a neutral atom quantum processor in developing an automated, problem-agnostic framework for solving MILPs through BD.

翻訳日:2024-06-19 06:25:35 公開日:2024-06-17

# 大規模言語モデルにおけるゼロ次フェデレーションチューニングの収束性について

On the Convergence of Zeroth-Order Federated Tuning for Large Language Models ( http://arxiv.org/abs/2402.05926v3 )

ライセンス: Link先を確認

Zhenqing Ling, Daoyuan Chen, Liuyi Yao, Yaliang Li, Ying Shen,

(参考訳) FL(Federated Learning)とLLM(Large Language Models)の合流は、プライバシを保護する自然言語処理の新しい時代を切り開いている。しかし、LLMの微調整に必要な大量のメモリは、特に計算資源が限られたクライアントにデプロイする場合、大きな課題となる。これを回避するために、フェデレーション設定にメモリ効率の高いゼロ次最適化を統合する新たな組み合わせを検討し、これをFedMeZOと呼ぶ。本研究は、LLMの文脈におけるFedMeZOの理論的基盤を初めて検証するものであり、大きなパラメータ空間が最適化の挙動に与える影響、収束特性の確立、パーソナライズされたフェデレーション戦略に役立つ収束の重要パラメータの同定といった主要な疑問に取り組む。広範な実験結果は理論を裏付けており、FedMeZOはFedAvgのような従来の一次法よりも高速に収束するだけでなく、トレーニング中のGPUメモリ使用量を推論時と同等のレベルまで大幅に削減することを示す。さらに、理論的洞察に基づいてクライアントごとの学習率を調整する、提案のパーソナライズされたFL戦略は、損失の削減を効果的に加速できる。我々は、LLMのフェデレーション微調整の理論的側面と実践的側面を橋渡しし、この分野のさらなる進歩と研究を促進することを願っている。

The confluence of Federated Learning (FL) and Large Language Models (LLMs) is ushering in a new era in privacy-preserving natural language processing. However, the intensive memory requirements for fine-tuning LLMs pose significant challenges, especially when deploying on clients with limited computational resources. To circumvent this, we explore the novel integration of Memory-efficient Zeroth-Order Optimization within a federated setting, a synergy we term as FedMeZO. Our study is the first to examine the theoretical underpinnings of FedMeZO in the context of LLMs, tackling key questions regarding the influence of large parameter spaces on optimization behavior, the establishment of convergence properties, and the identification of critical parameters for convergence to inform personalized federated strategies. Our extensive empirical evidence supports the theory, showing that FedMeZO not only converges faster than traditional first-order methods such as FedAvg but also significantly reduces GPU memory usage during training to levels comparable to those during inference. Moreover, the proposed personalized FL strategy that is built upon the theoretical insights to customize the client-wise learning rate can effectively accelerate loss reduction. We hope our work can help to bridge theoretical and practical aspects of federated fine-tuning for LLMs, thereby stimulating further advancements and research in this area.

翻訳日:2024-06-19 06:25:35 公開日:2024-06-17
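
The memory saving of zeroth-order tuning comes from estimating a gradient with two forward passes instead of backpropagation. The sketch below shows one such step with a seeded random direction, in the spirit of memory-efficient zeroth-order optimization. It covers a single client only and leaves out the federated averaging and the personalized learning rates discussed in the abstract. The function name `zo_gradient_step` and the default values are assumptions.

```python
import random


def zo_gradient_step(params, loss_fn, lr=0.01, mu=1e-3, seed=0):
    """One two-point zeroth-order update on a list of float parameters."""
    rng = random.Random(seed)
    # Random perturbation direction; regenerating it from the seed avoids storing it.
    z = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + mu * zi for p, zi in zip(params, z)]
    minus = [p - mu * zi for p, zi in zip(params, z)]
    # Finite-difference estimate of the directional derivative along z.
    scale = (loss_fn(plus) - loss_fn(minus)) / (2 * mu)
    return [p - lr * scale * zi for p, zi in zip(params, z)]
```

A usage loop would call this repeatedly with a fresh `seed` per step; in a federated setting each client could run such steps locally before a server averages the resulting parameters.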

# 対話型エージェント基礎モデル

An Interactive Agent Foundation Model ( http://arxiv.org/abs/2402.05929v2 )

ライセンス: Link先を確認

Zane Durante, Bidipta Sarkar, Ran Gong, Rohan Taori, Yusuke Noda, Paul Tang, Ehsan Adeli, Shrinidhi Kowshika Lakshmikanth, Kevin Schulman, Arnold Milstein, Demetri Terzopoulos, Ade Famoti, Noboru Kuno, Ashley Llorens, Hoi Vo, Katsu Ikeuchi, Li Fei-Fei, Jianfeng Gao, Naoki Wake, Qiuyuan Huang,

(参考訳) 人工知能システムの開発は、静的なタスク固有のモデルの作成から、幅広いアプリケーションでうまく機能する動的なエージェントベースのシステムへと移行しつつある。本稿では、多様なドメイン、データセット、タスクにまたがるAIエージェントのトレーニングに、新しいマルチタスクエージェントトレーニングパラダイムを使用する対話型エージェント基盤モデル(Interactive Agent Foundation Model)を提案する。私たちのトレーニングパラダイムは、ビジュアルマスク付きオートエンコーダ、言語モデリング、次行動予測など、さまざまな事前トレーニング戦略を統合することで、汎用的で適応可能なAIフレームワークを実現する。ロボティクス、ゲームAI、ヘルスケアという3つの異なる領域でフレームワークの性能を実証する。本モデルは、各領域において意味があり文脈に即した出力を生成する能力を示す。提案手法の強みはその汎用性にあり、ロボットシーケンス、ゲームプレイデータ、大規模ビデオデータセット、テキスト情報など、さまざまなデータソースを効果的なマルチモーダル・マルチタスク学習に活用する。我々のアプローチは、汎用的で行動可能なマルチモーダルシステムを開発するための有望な道を提供する。

The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks. Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction, enabling a versatile and adaptable AI framework. We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare. Our model demonstrates its ability to generate meaningful and contextually relevant outputs in each area. The strength of our approach lies in its generality, leveraging a variety of data sources such as robotics sequences, gameplay data, large-scale video datasets, and textual information for effective multimodal and multi-task learning. Our approach provides a promising avenue for developing generalist, action-taking, multimodal systems.

翻訳日:2024-06-19 06:25:35 公開日:2024-06-17

# LLMにおける復号法の検討

A Thorough Examination of Decoding Methods in the Era of LLMs ( http://arxiv.org/abs/2402.06925v2 )

ライセンス: Link先を確認

Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam,

(参考訳) 復号法は、言語モデルを次トークン予測器から実用的なタスクソルバーへと変換する上で、不可欠な役割を果たす。主にタスク固有モデルに焦点を当てた復号法に関する先行研究は、汎用大規模言語モデル(LLM)の時代には当てはまらない可能性がある。さらに、最近の復号戦略の急増により、この状況はさらに複雑になっている。本稿では、LLMの文脈における様々な復号法を包括的かつ多面的に分析し、幅広いタスク、モデル、デプロイメント環境において、その性能、ハイパーパラメータの変化に対する頑健性、復号速度を評価する。その結果、復号法の性能は顕著にタスク依存であり、アライメント、モデルサイズ、量子化などの要因に影響されることが明らかになった。興味深いことに、感度分析により、特定の手法は広範なハイパーパラメータチューニングを代償として優れた性能を達成していることがわかり、最適な結果の達成と、様々な状況における実装の実用性とのトレードオフが浮き彫りになった。

Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current era of general-purpose large language models (LLMs). Moreover, the recent influx of decoding strategies has further complicated this landscape. This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of LLMs, evaluating their performance, robustness to hyperparameter changes, and decoding speeds across a wide range of tasks, models, and deployment environments. Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization. Intriguingly, sensitivity analysis exposes that certain methods achieve superior performance at the cost of extensive hyperparameter tuning, highlighting the trade-off between attaining optimal results and the practicality of implementation in varying contexts.

翻訳日:2024-06-19 06:25:35 公開日:2024-06-17
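
The abstract does not list which decoding methods were compared, so the sketch below only illustrates two widely used ones, greedy decoding and nucleus (top-p) sampling, on a single next-token distribution. It also shows why such methods carry hyperparameters (here `p`) that need tuning. The function names are illustrative assumptions.

```python
import random


def greedy(probs):
    """Pick the index of the most probable token (deterministic, no hyperparameters)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def top_p_filter(probs, p=0.9):
    """Keep the smallest set of top tokens whose mass reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


def sample(probs, rng=random):
    """Draw one token index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Nucleus sampling would then be `sample(top_p_filter(probs, p))`; lowering `p` moves its behavior toward greedy decoding, which is one reason results can be sensitive to this setting.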

# 縦断的胸部X線における差分視覚質問応答のための視覚言語モデルの事前学習

Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays ( http://arxiv.org/abs/2402.08966v2 )

ライセンス: Link先を確認

Yeongjae Cho, Taehee Kim, Heejun Shin, Sungzoon Cho, Dongmyung Shin,

(参考訳) 差分視覚質問応答(diff-VQA)は、一対の画像間の差分に基づいて複雑な質問に答えることを求める困難な課題である。この課題は胸部X線画像の読影において特に重要である。放射線科医は臨床現場で、疾患の進行と重症度の変化を追跡するために、同一患者の異なる時期に撮影された複数の画像を比較することが多いためである。しかし、先行研究はdiff-VQAタスクのための特定のネットワークアーキテクチャの設計に重点を置いており、事前学習済みの視覚言語モデル(VLM)を使用してモデルの性能を向上させる機会を逃していた。本稿では、diff-VQAタスクのために自然画像と縦断的な胸部X線データで事前学習した、PLURALと呼ばれる新しいVLMを提案する。このモデルは段階的なアプローチで開発され、まず自然画像とテキストで事前学習され、続いて縦断的な胸部X線データを用いて訓練される。縦断データは、X線画像の対と、肺の異常や疾患の経時的な変化を記述した質問・回答セットおよび放射線科医の報告書で構成されている。実験結果から、PLURALモデルは縦断的X線に対するdiff-VQAだけでなく、単一のX線画像に対する従来のVQAにおいても、最先端の手法よりも優れていることがわかった。広範な実験を通じて、提案するVLMアーキテクチャと事前学習手法がモデルの性能向上に有効であることを実証する。

Difference visual question answering (diff-VQA) is a challenging task that requires answering complex questions based on differences between a pair of images. This task is particularly important in reading chest X-ray images because radiologists often compare multiple images of the same patient taken at different times to track disease progression and changes in its severity in their clinical practice. However, previous works focused on designing specific network architectures for the diff-VQA task, missing opportunities to enhance the model's performance using a pretrained vision-language model (VLM). Here, we introduce a novel VLM called PLURAL, which is pretrained on natural and longitudinal chest X-ray data for the diff-VQA task. The model is developed using a step-by-step approach, starting with being pretrained on natural images and texts, followed by being trained using longitudinal chest X-ray data. The longitudinal data consist of pairs of X-ray images, along with question-answer sets and radiologist's reports that describe the changes in lung abnormalities and diseases over time. Our experimental results show that the PLURAL model outperforms state-of-the-art methods not only in diff-VQA for longitudinal X-rays but also in conventional VQA for a single X-ray image. Through extensive experiments, we demonstrate the effectiveness of the proposed VLM architecture and pretraining method in improving the model's performance.

翻訳日:2024-06-19 06:15:51 公開日:2024-06-17

# 生成型大規模言語モデルにおける確率論的推論

Probabilistic Reasoning in Generative Large Language Models ( http://arxiv.org/abs/2402.09614v2 )

ライセンス: Link先を確認

Aliakbar Nafar, Kristen Brent Venable, Parisa Kordjamshidi,

(参考訳) 本稿では、大規模言語モデル(LLM)が、確率値によって明示的に定量化された不確実性を伴う情報を含むテキストについて推論する際に直面する課題を考察する。この種の推論は、日常的な会話から医療上の意思決定まで、さまざまな文脈に関係している。LLMの数学的推論能力は改善されているものの、確率的推論に関しては依然として大きな困難を示す。この問題に対処するために、LLMの確率的推論能力をテストするために設計された新しいデータセットであるBayesian Linguistic Inference Dataset (BLInD)を導入する。BLInDを用いて、確率的推論を含むタスクにおけるLLMの限界を明らかにする。さらに、Pythonコード、確率的アルゴリズム、確率論理プログラミングなど、問題を様々な形式表現にマッピングするいくつかのプロンプト戦略を提案する。最後に、BLInDと、因果推論の質問応答データセットを適応させたものの上で、提案手法を評価する。実験結果は、複数のLLMに対する提案戦略の有効性を示している。

This paper considers the challenges Large Language Models (LLMs) face when reasoning over text that includes information involving uncertainty explicitly quantified via probability values. This type of reasoning is relevant to a variety of contexts ranging from everyday conversations to medical decision-making. Despite improvements in the mathematical reasoning capabilities of LLMs, they still exhibit significant difficulties when it comes to probabilistic reasoning. To deal with this problem, we introduce the Bayesian Linguistic Inference Dataset (BLInD), a new dataset specifically designed to test the probabilistic reasoning capabilities of LLMs. We use BLInD to find out the limitations of LLMs for tasks involving probabilistic reasoning. In addition, we present several prompting strategies that map the problem to different formal representations, including Python code, probabilistic algorithms, and probabilistic logical programming. We conclude by providing an evaluation of our methods on BLInD and an adaptation of a causal reasoning question-answering dataset. Our empirical results highlight the effectiveness of our proposed strategies for multiple LLMs.

翻訳日:2024-06-19 06:15:51 公開日:2024-06-17
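
To show what mapping a probabilistic text problem to Python code might look like, the sketch below solves a two-variable example by the law of total probability and Bayes' rule. The scenario and the numbers are invented for illustration and are not taken from BLInD. The function name `marginal_and_posterior` is likewise an assumption.

```python
def marginal_and_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(B) and P(A | B) for a two-node network A -> B."""
    # Law of total probability over the two states of A.
    p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a
    # Bayes' rule for the posterior of A after observing B.
    p_a_given_b = p_a * p_b_given_a / p_b
    return p_b, p_a_given_b


# Hypothetical text problem: "It rains with probability 0.3. If it rains, the
# road is wet with probability 0.8; otherwise with probability 0.1."
p_wet, p_rain_given_wet = marginal_and_posterior(0.3, 0.8, 0.1)
```

Having a model emit code like this moves the arithmetic out of free-text generation and into an interpreter, which is the general idea behind the code-based prompting strategy the abstract mentions.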

# InSaAF: 正確性と公正性による安全性の確立 : LLMsはインド法定領域に向けて準備が整っているか?

InSaAF: Incorporating Safety through Accuracy and Fairness | Are LLMs ready for the Indian Legal Domain? ( http://arxiv.org/abs/2402.10567v4 )

ライセンス: Link先を確認

Yogesh Tripathi, Raghav Donakanti, Sahil Girhepuje, Ishan Kavathekar, Bhaskara Hanuma Vedula, Gokul S Krishnan, Shreya Goyal, Anmol Goel, Balaraman Ravindran, Ponnurangam Kumaraguru,

(参考訳) 近年の言語技術と人工知能の進歩により、判決の予測から要約の生成に至るまで、法律領域における様々なタスクを実行するために多くの言語モデルが提案されている。その大きな可能性にもかかわらず、これらのモデルは社会的バイアスを学習して示し、不公平な予測を行うことが明らかになっている。本研究では、社会的要因が関与する場合に、大規模言語モデル(LLM)がインドの文脈で法的タスクを遂行する能力について検討する。LLMの公平性と正確性の両方を取り込んだ新しい指標である、$\beta$重み付き $\textit{Legal Safety Score}$ ($LSS_{\beta}$) を提示する。$\textit{Binary Statutory Reasoning}$ タスクにおける性能と、インド社会における様々な格差の軸に関して示す公平性を考慮して、LLMの安全性を評価する。LLaMAとLLaMA-2モデルのタスク性能と公平性スコアは、提案する $LSS_{\beta}$ 指標が、法律分野で安全に使用するためのモデルの準備状況を効果的に判定できることを示している。また、バイアスを緩和し、モデルの安全性を改善するための潜在的な方法として、専門的な法律データセットを利用した微調整パイプラインを提案する。LLaMAとLLaMA-2モデルに対する微調整は $LSS_{\beta}$ を増大させ、インドの法律領域におけるユーザビリティを向上させる。私たちのコードは公開されている。

Recent advancements in language technology and Artificial Intelligence have resulted in numerous Language Models being proposed to perform various tasks in the legal domain, ranging from predicting judgments to generating summaries. Despite their immense potential, these models have been proven to learn and exhibit societal biases and make unfair predictions. In this study, we explore the ability of Large Language Models (LLMs) to perform legal tasks in the Indian landscape when social factors are involved. We present a novel metric, the $\beta$-weighted $\textit{Legal Safety Score}$ ($LSS_{\beta}$), which encapsulates both the fairness and accuracy aspects of the LLM. We assess LLMs' safety by considering their performance in the $\textit{Binary Statutory Reasoning}$ task and the fairness they exhibit with respect to various axes of disparity in Indian society. Task performance and fairness scores of the LLaMA and LLaMA-2 models indicate that the proposed $LSS_{\beta}$ metric can effectively determine the readiness of a model for safe usage in the legal sector. We also propose finetuning pipelines, utilising specialised legal datasets, as a potential method to mitigate bias and improve model safety. The finetuning procedures on the LLaMA and LLaMA-2 models increase the $LSS_{\beta}$, improving their usability in the Indian legal domain. Our code is publicly released.

翻訳日:2024-06-19 06:15:51 公開日:2024-06-17

# AbsInstruct: 妥当性推定を伴う説明チューニングによるLLMからの抽象化能力の引き出し

AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation ( http://arxiv.org/abs/2402.10646v2 )

ライセンス: Link先を確認

Zhaowei Wang, Wei Fan, Qing Zong, Hongming Zhang, Sehyun Choi, Tianqing Fang, Xin Liu, Yangqiu Song, Ginny Y. Wong, Simon See,

(参考訳) 抽象化能力は人間の知性において不可欠であり、NLP研究における様々なタスクにも有用である。既存の研究によると、LLMは抽象化能力に欠けており、その改善方法はまだ探究されていない。本研究では、命令チューニングによってLLMの抽象化能力を向上するフレームワークAbsInstructを設計する。このフレームワークは、LLMが抽象化の根底にある理論的根拠を捉えるのを支援するために、詳細な説明を伴う命令を構築する。また、アライメント対象のLLMが持つ抽象化知識とより整合した命令を選択するために、妥当性推定器を導入する。そして、このフレームワークは抽象化命令と汎用命令を組み合わせてハイブリッドデータセットを構築する。大規模な実験と分析により、我々のフレームワークは、一般的な命令追従能力を維持しつつ、高い汎化性能でLLMの抽象化能力を大幅に向上できることが示された。

Abstraction ability is crucial in human intelligence, which can also benefit various tasks in NLP study. Existing work shows that LLMs are deficient in abstract ability, and how to improve it remains unexplored. In this work, we design the framework AbsInstruct to enhance LLMs' abstraction ability through instruction tuning. The framework builds instructions with in-depth explanations to assist LLMs in capturing the underlying rationale of abstraction. Meanwhile, we introduce a plausibility estimator to select instructions that are more consistent with the abstraction knowledge of LLMs to be aligned. Then, our framework combines abstraction instructions with general-purpose ones to build a hybrid dataset. Extensive experiments and analyses demonstrate that our framework can considerably enhance LLMs' abstraction ability with strong generalization performance while maintaining their general instruction-following abilities.

翻訳日:2024-06-19 06:15:51 公開日:2024-06-17

# 効率的な言語モデル推論のための言語間語彙適応に関する実証的研究

An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Language Model Inference ( http://arxiv.org/abs/2402.10712v2 )

ライセンス: Link先を確認

Atsuki Yamaguchi, Aline Villavicencio, Nikolaos Aletras,

(参考訳) 最先端の生成型大規模言語モデル(LLM)の開発は、英語中心のトークナイザ、語彙、事前学習データに偏って依存している。一部のLLMには多言語機能があるにもかかわらず、近年の研究では、英語以外の言語でテキストを生成する際に推論効率が低下することが示されている。その結果、推論時間とコストが増加する。下流タスクの性能向上を目的として、モデルをターゲット言語に適応させる言語間語彙適応(CVA)法が提案されている。しかし、生成型LLMの推論効率の向上に対するこれらの手法の有効性は、まだ検討されていない。本稿では、類型論的に多様な4つの言語と4つの自然言語理解タスクにおいて、4つの生成型LLM(単言語モデルと多言語モデルを含む)に対する5つのCVA手法の実証的研究を行う。CVAはLLMの推論を最大271.5%高速化することに大きく寄与することがわかった。また、よりバランスの取れた多言語データで事前学習されたLLMを適応させると、元のモデルに匹敵する下流性能が得られることを示す。

The development of state-of-the-art generative large language models (LLMs) disproportionately relies on English-centric tokenizers, vocabulary and pre-training data. Despite the fact that some LLMs have multilingual capabilities, recent studies have shown that their inference efficiency deteriorates when generating text in languages other than English. This results in increased inference time and costs. Cross-lingual vocabulary adaptation (CVA) methods have been proposed for adapting models to a target language aiming to improve downstream performance. However, the effectiveness of these methods on increasing inference efficiency of generative LLMs has yet to be explored. In this paper, we perform an empirical study of five CVA methods on four generative LLMs (including monolingual and multilingual models) across four typologically-diverse languages and four natural language understanding tasks. We find that CVA substantially contributes to LLM inference speedups of up to 271.5%. We also show that adapting LLMs that have been pre-trained on more balanced multilingual data results in downstream performance comparable to the original models.

翻訳日:2024-06-19 06:15:51 公開日:2024-06-17

# LLMシミュレーションにおけるペルソナ効果の定量化

Quantifying the Persona Effect in LLM Simulations ( http://arxiv.org/abs/2402.10811v2 )

ライセンス: Link先を確認

Tiancheng Hu, Nigel Collier,

(参考訳) 大規模言語モデル(LLM)は、人間の言語と振る舞いのシミュレーションにおいて顕著な可能性を示してきた。本研究では、ペルソナ変数(人口統計的、社会的、行動的要因)の統合が、LLMの多様な視点をシミュレートする能力にどのように影響するかを検討する。既存の主観的NLPデータセットでは、ペルソナ変数はアノテーションの分散の10%未満しか説明しないことがわかった。それでも、プロンプトを通じてLLMにペルソナ変数を取り入れると、控えめではあるが統計的に有意な改善が得られる。ペルソナプロンプトは、多くのアノテーターの意見が分かれるが、その不一致が比較的小さいサンプルで最も効果的である。特に、我々の設定では線形の関係が見られ、ペルソナ変数と人間のアノテーションの相関が強いほど、ペルソナプロンプトを用いたLLMの予測はより正確になる。ゼロショット設定では、ペルソナプロンプトを用いた強力な70Bモデルが、正解アノテーションで訓練された線形回帰によって達成可能なアノテーション分散の81%を捉える。しかしながら、ペルソナ変数の説明力が限られているほとんどの主観的NLPデータセットでは、ペルソナプロンプトの利点は限られている。

Large language models (LLMs) have shown remarkable promise in simulating human language and behavior. This study investigates how integrating persona variables (demographic, social, and behavioral factors) impacts LLMs' ability to simulate diverse perspectives. We find that persona variables account for less than 10% of the variance in annotations in existing subjective NLP datasets. Nonetheless, incorporating persona variables via prompting in LLMs provides modest but statistically significant improvements. Persona prompting is most effective in samples where many annotators disagree, but their disagreements are relatively minor. Notably, we find a linear relationship in our setting: the stronger the correlation between persona variables and human annotations, the more accurate the LLM predictions are using persona prompting. In a zero-shot setting, a powerful 70B model with persona prompting captures 81% of the annotation variance achievable by linear regression trained on ground truth annotations. However, for most subjective NLP datasets, where persona variables have limited explanatory power, the benefits of persona prompting are limited.

翻訳日:2024-06-19 06:06:06 公開日:2024-06-17

# 偽装検出はより深くなるか? 偽装推論のためのデータセット, 評価, ベンチマーク

Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning ( http://arxiv.org/abs/2402.11432v2 )

ライセンス: Link先を確認

Kang Chen, Zheng Lian, Haiyang Sun, Bin Liu, Jianhua Tao,

(参考訳) 虚偽検出は、現実のシナリオにおける重要性から注目を集めている。その主な目的は、ジェスチャー、表情、韻律など、マルチモーダルな手がかりから欺く行動を検出することである。しかしながら、これらの基盤は通常主観的であり、個人の習慣に関係している。そこで我々は, 虚偽検出を虚偽推論に拡張し, さらに主観的判断を支持する客観的な証拠を提供する。具体的には、潜在的な嘘と基本的な事実を提供し、その背景にある事実の矛盾と意図を組み合わせることによって、この文が嘘である可能性がある理由を分析する。偽造検出と比較すると、このタスクは現実世界のシナリオにもより適用可能である。例えば、尋問においては、警察は確固たる証拠に基づいて嘘をついているかどうかを判断すべきである。本稿では,データセットの構築や評価指標の定義など,この課題に対する最初の試みについて述べる。一方、このタスクは、大規模言語モデルの複雑な推論能力を評価するためのベンチマークとして機能する。コードとデータは公開されます。

Deception detection has attracted increasing attention due to its importance in real-world scenarios. Its main goal is to detect deceptive behaviors from multimodal clues such as gestures, facial expressions, prosody, etc. However, these bases are usually subjective and related to personal habits. Therefore, we extend deception detection to deception reasoning, further providing objective evidence to support subjective judgment. Specifically, we provide potential lies and basic facts and then analyze why this sentence may be a lie by combining factual inconsistencies and intent behind them. Compared with deception detection, this task is more applicable to real-world scenarios. For example, in interrogation, the police should judge whether a person is lying based on solid evidence. This paper presents our initial attempts at this task, including constructing a dataset and defining evaluation metrics. Meanwhile, this task can serve as a benchmark for evaluating the complex reasoning capability of large language models. Code and data will be made publicly available.

翻訳日:2024-06-19 06:06:06 公開日:2024-06-17

# ALLaVA: 軽量視覚言語モデルのためのGPT4V合成データの活用

ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models ( http://arxiv.org/abs/2402.11684v2 )

ライセンス: Link先を確認

Guiming Hardy Chen, Shunian Chen, Ruifei Zhang, Junying Chen, Xiangbo Wu, Zhiyi Zhang, Zhihong Chen, Jianquan Li, Xiang Wan, Benyou Wang,

(参考訳) 大規模視覚言語モデル(LVLM)は、強力な推論能力と汎化能力により、幅広い視覚言語タスクで有望性を示してきた。しかし、訓練と配備にはかなりの計算資源が必要である。本研究では、高品質なトレーニングデータを採用することで、従来規模のLVLMとリソースに優しい軽量版との性能差を埋めることを目的とする。そこで、合成データセットを生成するための包括的なパイプラインを提案する。鍵となるアイデアは、強力なプロプライエタリモデルを利用して、(i)視覚言語アライメントのためのきめ細かい画像アノテーションと、(ii)視覚指示微調整のための複雑な推論を伴う視覚質問応答ペアを生成することであり、合計130万サンプルを得る。合成データセット上で一連の軽量VLMを訓練した実験結果は提案手法の有効性を示しており、これらのモデルは4B規模のLVLMの中で17のベンチマークにおいて競争力のある性能を達成し、さまざまなベンチマークで7B/13B規模のモデルと同等の性能さえ示す。この研究は、より効率的なLVLMを構築する上で高品質なデータを採用することの実現可能性を強調している。我々はデータセットをALLaVAと名付け、リソース効率のよいLVLMをより広く利用できるよう開発するために、研究コミュニティ向けにオープンソースとして公開する。

Large vision-language models (LVLMs) have shown promise in a broad range of vision-language tasks with their strong reasoning and generalization capabilities. However, they require considerable computational resources for training and deployment. This study aims to bridge the performance gap between traditional-scale LVLMs and resource-friendly lite versions by adopting high-quality training data. To this end, we propose a comprehensive pipeline for generating a synthetic dataset. The key idea is to leverage strong proprietary models to generate (i) fine-grained image annotations for vision-language alignment and (ii) complex reasoning visual question-answering pairs for visual instruction fine-tuning, yielding 1.3M samples in total. We train a series of lite VLMs on the synthetic dataset, and experimental results demonstrate the effectiveness of the proposed scheme: the models achieve competitive performance on 17 benchmarks among 4B LVLMs, and even perform on par with 7B/13B-scale models on various benchmarks. This work highlights the feasibility of adopting high-quality data in crafting more efficient LVLMs. We name our dataset ALLaVA, and open-source it to the research community for developing better resource-efficient LVLMs for wider usage.

翻訳日:2024-06-19 06:06:06 公開日:2024-06-17

# ROSEはそうしない: 逆プロンプトコントラストデコーディングによる命令調整型大規模言語モデルの安全性向上

ROSE Doesn't Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding ( http://arxiv.org/abs/2402.11889v2 )

ライセンス: Link先を確認

Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao,

(参考訳) 命令調整型大規模言語モデル(LLM)の開発に伴い、LLMの安全性の向上がますます重要になっている。しかし、LLMの出力を期待される安全性に合わせるための現在のアプローチは、通常、高品質な安全データや高価な計算資源といった多大な訓練コストを必要とし、費用がかかり非効率である。そこで本研究では、追加の訓練なしに既存の命令調整型LLMの安全性を直接向上させる、単純かつ効果的な手法である逆プロンプトコントラスト復号法(ROSE)を提案する。ROSEの原理は、慎重に設計された逆プロンプトによって誘発される望ましくない出力を抑制することで、望ましい安全な出力の確率を高めることである。6つの安全性タスクと2つの汎用タスクの実験から、ROSEは5種類の命令調整型LLMに対して一貫した有意な安全性向上(安全性スコアで最大+13.8%)をもたらすだけでなく、LLMの汎用能力にも利益をもたらすことが示された。詳細な分析では、ROSEの基盤となるメカニズムを探り、いつ、どこで使用すべきかを明らかにする。

With the development of instruction-tuned large language models (LLMs), improving the safety of LLMs has become more critical. However, the current approaches for aligning LLM outputs with expected safety usually require substantial training efforts, e.g., high-quality safety data and expensive computational resources, which are costly and inefficient. To this end, we present reverse prompt contrastive decoding (ROSE), a simple-yet-effective method to directly boost the safety of existing instruction-tuned LLMs without any additional training. The principle of ROSE is to improve the probability of the desired safe output by suppressing the undesired output induced by carefully designed reverse prompts. Experiments on 6 safety and 2 general-purpose tasks show that ROSE not only brings consistent and significant safety improvements (up to +13.8% safety score) upon 5 types of instruction-tuned LLMs, but also benefits the general-purpose ability of LLMs. In-depth analyses explore the underlying mechanism of ROSE, and reveal when and where to use it.
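
The decoding principle stated in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no formula, so the subtraction of reverse-prompt log-probabilities, the weight `alpha`, the toy vocabulary, and the logit values are all hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_next_token(logits_normal, logits_reverse, alpha=0.5):
    """Score each token by its log-probability under the normal prompt
    minus alpha times its log-probability under the reverse prompt.

    The subtraction rule and alpha are assumptions for illustration.
    """
    lp_normal = log_softmax(logits_normal)
    lp_reverse = log_softmax(logits_reverse)
    scores = [n - alpha * r for n, r in zip(lp_normal, lp_reverse)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical three-token vocabulary and next-token logits.
vocab = ["comply", "refuse", "hedge"]
logits_normal = [2.0, 1.8, 0.5]    # model conditioned on the normal prompt
logits_reverse = [3.0, -1.0, 0.0]  # model conditioned on a reverse prompt

best, scores = contrastive_next_token(logits_normal, logits_reverse)
print(vocab[best])
```

In this toy setting, greedy decoding on the normal prompt alone would select "comply", while the contrastive score selects "refuse", because the reverse prompt makes "comply" very likely and "refuse" very unlikely.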

翻訳日:2024-06-19 06:06:06 公開日:2024-06-17

# 自己回帰型言語モデルにおける知識蒸留の再検討

Revisiting Knowledge Distillation for Autoregressive Language Models ( http://arxiv.org/abs/2402.11890v2 )

ライセンス: Link先を確認

Qihuang Zhong, Liang Ding, Li Shen, Juhua Liu, Bo Du, Dacheng Tao,

(参考訳) 知識蒸留(KD)は、より小さな生徒モデルを訓練することで教師モデルを圧縮し、推論コストとメモリフットプリントを削減する一般的な手法である。しかし、自己回帰言語モデル(LM)の文脈では、より大きな教師LMが著しく性能の低い生徒をもたらす可能性があることを実証的に見出した。この問題に対処するため、一連の分析を行い、トークンごとに異なる教授モードがあり、これを無視すると性能劣化につながることを明らかにした。そこで本研究では、KDを改善するための単純かつ効果的な適応型教授手法(ATKD)を提案する。ATKDの中核は、丸暗記的な学習を減らし、教授をより多様で柔軟なものにすることである。8つのLMタスクに関する広範な実験により、ATKDの助けを借りることで、さまざまなベースラインKD手法が、すべてのモデルタイプとサイズにわたって一貫した有意な性能向上(平均スコアで最大+3.04%)を達成できることが示された。さらに心強いことに、ATKDは生徒モデルの汎化を効果的に改善できる。

Knowledge distillation (KD) is a common approach to compress a teacher model to reduce its inference cost and memory footprint, by training a smaller student model. However, in the context of autoregressive language models (LMs), we empirically find that larger teacher LMs might result in a dramatically poorer student. In response to this problem, we conduct a series of analyses and reveal that different tokens have different teaching modes, and that neglecting this leads to performance degradation. Motivated by this, we propose a simple yet effective adaptive teaching approach (ATKD) to improve KD. The core of ATKD is to reduce rote learning and make teaching more diverse and flexible. Extensive experiments on 8 LM tasks show that, with the help of ATKD, various baseline KD methods can achieve consistent and significant performance gains (up to +3.04% average score) across all model types and sizes. More encouragingly, ATKD can improve the student model generalization effectively.
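
For readers unfamiliar with the baseline being revisited, the following sketch shows a conventional token-level distillation loss: the KL divergence between temperature-softened teacher and student distributions, averaged over a sequence. It is the generic baseline only, not ATKD, whose details the abstract does not give. The function names, the temperature value, and the toy logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed stably."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def token_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) for one token position."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sequence_kd_loss(teacher_seq, student_seq, temperature=2.0):
    """Average the per-token loss over all positions in a sequence."""
    losses = [
        token_kd_loss(t, s, temperature)
        for t, s in zip(teacher_seq, student_seq)
    ]
    return sum(losses) / len(losses)


# Hypothetical logits for a two-token sequence over a three-word vocabulary.
teacher = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
student = [[1.5, 1.2, 0.3], [0.4, 1.0, 0.9]]

loss = sequence_kd_loss(teacher, student)
print(f"average KD loss: {loss:.4f}")
```

Every token position contributes with equal weight here. The abstract's observation that different tokens have different teaching modes suggests why a uniform treatment of this kind may be suboptimal.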

翻訳日:2024-06-19 06:06:06 公開日:2024-06-17

# Self-AMPLIFY: セルフポストホック説明による小言語モデルの改善

Self-AMPLIFY: Improving Small Language Models with Self Post Hoc Explanations ( http://arxiv.org/abs/2402.12038v3 )

ライセンス: Link先を確認

Milan Bhan, Jean-Noel Vittaut, Nicolas Chesneau, Marie-Jeanne Lesot,

(参考訳) プロンプトとインコンテキスト学習(ICL)に自然言語の根拠(rationale)を組み込むことで、大規模言語モデル(LLM)の性能は大幅に向上した。しかし、高品質な根拠を生成するには、人間によるアノテーションや補助的なプロキシモデルの使用が必要である。本研究では、小規模言語モデル(SLM)に適用したポストホックな説明手法から根拠を自動生成し、SLM自身の性能を向上させるSelf-AMPLIFYを提案する。Self-AMPLIFYは、サンプルを選定し、根拠を生成し、ICLを活用するための最終プロンプトを構築する3段階の手法である。Self-AMPLIFYの性能は、強力な推論能力を必要とする5つのデータセット上で、4つのSLMを用いて評価される。Self-AMPLIFYは競合手法に対して良好な結果を達成し、精度を大きく向上させる。Self-AMPLIFYは、自己回帰型言語モデルにポストホックな説明手法を適用して根拠を生成し、完全に自動化された方法でモデル自身の性能を向上させる最初の手法である。

Incorporating natural language rationales in the prompt and In-Context Learning (ICL) has led to a significant improvement in the performance of Large Language Models (LLMs). However, generating high-quality rationales requires human annotation or the use of auxiliary proxy models. In this work, we propose Self-AMPLIFY to automatically generate rationales from post hoc explanation methods applied to Small Language Models (SLMs) to improve their own performance. Self-AMPLIFY is a 3-step method that targets samples, generates rationales and builds a final prompt to leverage ICL. Self-AMPLIFY performance is evaluated on four SLMs and five datasets requiring strong reasoning abilities. Self-AMPLIFY achieves good results against competitors, leading to strong accuracy improvement. Self-AMPLIFY is the first method to apply post hoc explanation methods to autoregressive language models to generate rationales to improve their own performance in a fully automated manner.

翻訳日:2024-06-19 06:06:06 公開日:2024-06-17

# チェックの学習:大規模言語モデルにおける自己補正の可能性

Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models ( http://arxiv.org/abs/2402.13035v3 )

ライセンス: Link先を確認

Che Zhang, Zhenyang Xiao, Chengcheng Han, Yixin Lian, Yuejian Fang,

(参考訳) 自己補正は、大規模言語モデル(LLM)が生成する出力のスタイルとセキュリティの向上において、目覚ましい成果を上げている。しかし、最近の研究では、LLMが論理的誤りを特定することが困難であるため、推論タスクでは自己補正の効果が限定的、あるいは逆効果にさえなる可能性が示唆されている。本稿では、チェックタスクのためのトレーニングデータを構築することで、LLMの自己チェック能力を向上させることを目指す。具体的には、思考の連鎖(CoT)手法を自己チェックタスクに適用し、ステップレベルのきめ細かい分析と説明を利用して推論経路の正しさを評価する。我々は「Step CoT Check」と呼ぶ特殊なチェック形式を提案する。この形式に従って、ステップごとの詳細な分析とチェックを含むチェック・補正データセットを構築する。次に、LLMを微調整して、誤りの検出と補正の能力を向上させる。実験により、「Step CoT Check」形式による微調整が、複数のベンチマークにわたってLLMの自己チェック能力と自己補正能力を大幅に向上させることを示す。このアプローチは、特に誤り位置の特定において他の形式よりも優れており、より大きなモデルでより大きな利点が観察される。再現性のために、すべてのデータセットとコードは https://github.com/bammt/Learn-to-check で提供されている。

Self-correction has achieved impressive results in enhancing the style and security of the generated output from large language models (LLMs). However, recent studies suggest that self-correction might be limited or even counterproductive in reasoning tasks due to LLMs' difficulties in identifying logical mistakes. In this paper, we aim to enhance the self-checking capabilities of LLMs by constructing training data for checking tasks. Specifically, we apply the Chain of Thought (CoT) methodology to self-checking tasks, utilizing fine-grained step-level analyses and explanations to assess the correctness of reasoning paths. We propose a specialized checking format called "Step CoT Check". Following this format, we construct a checking-correction dataset that includes detailed step-by-step analysis and checking. Then we fine-tune LLMs to enhance their error detection and correction abilities. Our experiments demonstrate that fine-tuning with the "Step CoT Check" format significantly improves the self-checking and self-correction abilities of LLMs across multiple benchmarks. This approach outperforms other formats, especially in locating the incorrect position, with greater benefits observed in larger models. For reproducibility, all the datasets and code are provided in https://github.com/bammt/Learn-to-check.

翻訳日:2024-06-19 06:06:06 公開日:2024-06-17

# 視覚言語モデルにおける社会バイアス評価のための統一フレームワークとデータセット

A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models ( http://arxiv.org/abs/2402.13636v2 )

ライセンス: Link先を確認

Ashutosh Sathe, Prachi Jain, Sunayana Sitaram,

(参考訳) 視覚言語モデル(VLM)は、産業界と学術界の両方で広く採用されている。本研究では、VLMにおける職業に関する性別、人種、年齢のバイアスを体系的に評価するための統一的な枠組みを提案する。我々の評価は、画像からテキスト、テキストからテキスト、テキストから画像、画像から画像を含む、最近のVLMでサポートされているすべての推論モードを対象とする。さらに、生成されるテキストと画像の両方において、さまざまな職業領域にわたって性別、人種、年齢の情報を意図的に隠した高品質な合成データセットを生成する自動パイプラインを提案する。このデータセットは各職業の行動に基づく記述を含み、VLMにおける社会的バイアスを評価するためのベンチマークとして機能する。広く使用されているVLMの比較分析では、入出力モダリティの違いによって、バイアスの大きさと方向に識別可能な差が生じることがわかった。さらに、VLMは、調査したバイアス属性ごとに異なるバイアスを示すことがわかった。我々の研究が、社会的に偏りのない表現を学習するようVLMを改善する今後の進歩の指針となることを願っている。データとコードは公開する予定である。

Vision-language models (VLMs) have gained widespread adoption in both industry and academia. In this study, we propose a unified framework for systematically evaluating gender, race, and age biases in VLMs with respect to professions. Our evaluation encompasses all supported inference modes of the recent VLMs, including image-to-text, text-to-text, text-to-image, and image-to-image. Additionally, we propose an automated pipeline to generate high-quality synthetic datasets that intentionally conceal gender, race, and age information across different professional domains, both in generated text and images. The dataset includes action-based descriptions of each profession and serves as a benchmark for evaluating societal biases in VLMs. In our comparative analysis of widely used VLMs, we have identified that varying input-output modalities lead to discernible differences in bias magnitudes and directions. Additionally, we find that VLMs exhibit distinct biases across the different bias attributes we investigated. We hope our work will help guide future progress in improving VLMs to learn socially unbiased representations. We will release our data and code.

翻訳日:2024-06-19 05:56:21 公開日:2024-06-17

# KInIT at SemEval-2024 Task 8: 多言語機械生成テキスト検出のための微調整LLM

KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection ( http://arxiv.org/abs/2402.13671v2 )

ライセンス: Link先を確認

Michal Spiegel, Dominik Macko,

(参考訳) SemEval-2024 Task 8は、複数生成器、複数ドメイン、多言語のブラックボックス機械生成テキスト検出に焦点を当てている。このような検出は、大規模言語モデル(LLM)の潜在的な誤用を防ぐために重要であり、最新のLLMは人間らしい多言語テキストを生成する能力が非常に高い。我々は、言語識別と、テキスト分類のためのより小さなLLMのパラメータ効率のよい微調整を利用して、この課題に複数の方法で取り組んだ。さらに、言語ごとの分類閾値の較正を用いて、微調整モデルの予測と統計的検出指標を独自に組み合わせ、システムの検出性能の汎化を改善した。提出した手法は競争力のある結果を達成し、勝者から1パーセントポイント未満の差で第4位にランクインした。

SemEval-2024 Task 8 is focused on multigenerator, multidomain, and multilingual black-box machine-generated text detection. Such detection is important for preventing potential misuse of large language models (LLMs), the newest of which are very capable of generating multilingual human-like texts. We have coped with this task in multiple ways, utilizing language identification and parameter-efficient fine-tuning of smaller LLMs for text classification. We have further used per-language classification-threshold calibration to uniquely combine fine-tuned models' predictions with statistical detection metrics to improve generalization of the system detection performance. Our submitted method achieved competitive results, ranking fourth, just under 1 percentage point behind the winner.
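
The idea of per-language classification-threshold calibration can be illustrated with a small sketch. It is a generic illustration rather than the submitted system: the accuracy-maximizing selection rule, the function names, and the development-set scores and labels are all assumptions, and the combination with statistical detection metrics mentioned in the abstract is not modeled.

```python
def calibrate_threshold(scores, labels):
    """Pick the decision threshold that maximizes accuracy on held-out data.

    A text is predicted machine-generated when its score >= threshold.
    Candidate thresholds are the observed scores; ties keep the lowest one.
    """
    best_threshold, best_accuracy = 0.5, -1.0
    for t in sorted(set(scores)):
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold


def calibrate_per_language(dev_data):
    """Calibrate one threshold per language code."""
    return {
        lang: calibrate_threshold(scores, labels)
        for lang, (scores, labels) in dev_data.items()
    }


# Hypothetical detector scores and gold labels (1 = machine-generated).
dev_data = {
    "en": ([0.2, 0.4, 0.6, 0.9], [0, 0, 1, 1]),
    "de": ([0.1, 0.3, 0.35, 0.5], [0, 1, 1, 1]),
}

thresholds = calibrate_per_language(dev_data)
print(thresholds)
```

With these made-up scores, the calibrated threshold for "de" ends up lower than for "en", which is the kind of per-language shift that a single global threshold would miss.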

翻訳日:2024-06-19 05:56:21 公開日:2024-06-17

# ファクチュアル知識のひび割れ:大規模言語モデルにおける退化知識ニューロンの包括的解析

Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models ( http://arxiv.org/abs/2402.13731v2 )

ライセンス: Link先を確認

Yuheng Chen, Pengfei Cao, Yubo Chen, Yining Wang, Shengping Liu, Kang Liu, Jun Zhao,

(参考訳) 大規模言語モデル(LLM)は、膨大な事実知識を格納するが、その基盤となるメカニズムはいまだ不明である。以前の研究では、事実知識は多層パーセプトロンの重みに格納され、いくつかの記憶ユニットは縮退知識ニューロン(DKN)と呼ばれる縮退性を示すことが示唆された。この概念の斬新さと独特な性質にもかかわらず、厳密に定義されたり体系的に研究されたりはしていない。まず、MLPニューロンの接続重みパターンを考察し、構造的側面と機能的側面の両方からDKNを定義する。これに基づいて神経学的トポロジ・クラスタリング法を導入し,任意の数や構造にDKNが形成されることにより,より正確なDKNの取得が可能となる。さらに、認知科学に触発されて、我々はDKNとLLMの堅牢性、進化性、複雑さとの関係を探求する。 6 つの条件下で34 実験を行った結果,DKN とこれら3 つの特性の関連性が示された。コードはまもなく利用可能になる。

Large language models (LLMs) store extensive factual knowledge, but the underlying mechanisms remain unclear. Previous research suggests that factual knowledge is stored within multi-layer perceptron weights, and some storage units exhibit degeneracy, referred to as Degenerate Knowledge Neurons (DKNs). Despite the novelty and unique properties of this concept, it has not been rigorously defined or systematically studied. We first consider the connection weight patterns of MLP neurons and define DKNs from both structural and functional aspects. Based on this, we introduce the Neurological Topology Clustering method, which allows the formation of DKNs in any numbers and structures, leading to a more accurate DKN acquisition. Furthermore, inspired by cognitive science, we explore the relationship between DKNs and the robustness, evolvability, and complexity of LLMs. Our execution of 34 experiments under 6 settings demonstrates the connection between DKNs and these three properties. The code will be available soon.

翻訳日:2024-06-19 05:56:21 公開日:2024-06-17

# MLXP: Pythonで再現可能な実験を行うフレームワーク

MLXP: A Framework for Conducting Replicable Experiments in Python ( http://arxiv.org/abs/2402.13831v2 )

ライセンス: Link先を確認

Michael Arbel, Alexandre Zouaoui,

(参考訳) 機械学習(ML)研究の再現性(replicability)は、複雑な非決定論的アルゴリズムの利用と、モデルアーキテクチャやトレーニングデータセットなど多数のハイパーパラメータ選択への依存により、ますます懸念されている。再現可能で複製可能な結果を保証することはこの分野を前進させるために不可欠であるが、堅牢な結論をもたらす体系的でよく整理された実験を行うには、しばしば多大な技術的労力を要する。実験管理を容易にし再現性を高めるためのツールがいくつか開発されているが、それらは産業環境ではうまく扱われているものの、研究コミュニティでの採用を妨げる複雑さをしばしば持ち込む。採用率の低さという課題に対処するため、我々はPythonベースのオープンソースでシンプルかつ軽量な実験管理ツールであるMLXPを提案する。MLXPは https://github.com/inria-thoth/mlxp で公開されている。MLXPは、高い再現性を確保しながら、実践者の負担を最小限に抑えて実験プロセスを合理化する。

Replicability in machine learning (ML) research is increasingly concerning due to the utilization of complex non-deterministic algorithms and the dependence on numerous hyper-parameter choices, such as model architecture and training datasets. Ensuring reproducible and replicable results is crucial for advancing the field, yet often requires significant technical effort to conduct systematic and well-organized experiments that yield robust conclusions. Several tools have been developed to facilitate experiment management and enhance reproducibility; however, they often introduce complexity that hinders adoption within the research community, despite being well-handled in industrial settings. To address the challenge of low adoption, we propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python, available at https://github.com/inria-thoth/mlxp . MLXP streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.

翻訳日:2024-06-19 05:56:21 公開日:2024-06-17

# 社会環境設計

Social Environment Design ( http://arxiv.org/abs/2402.14090v3 )

ライセンス: Link先を確認

Edwin Zhang, Sadie Zhao, Tonghan Wang, Safwan Hossain, Henry Gasztowtt, Stephan Zheng, David C. Parkes, Milind Tambe, Yiling Chen,

(参考訳) 人工知能(AI)は、政府および経済の政策立案を改善するために利用できる技術として有望である。本稿では、強化学習、EconCS、計算社会選択の各コミュニティとつながる、AIを用いた自動政策立案のための一般的な枠組みである社会環境設計(Social Environment Design)を導入することで、この目的に向けた新たな研究課題を提案する。この枠組みは、一般的な経済環境を捉えることを目指し、政策目標に関する投票を含み、AIシミュレーションを通じて政府および経済の政策を体系的に分析するための方向性を与える。AIに基づく政策立案における今後の研究のための重要な未解決問題を強調する。これらの課題を解決することで、さまざまな社会福祉目標を達成し、より倫理的で責任ある意思決定を促進することを望んでいる。

Artificial Intelligence (AI) holds promise as a technology that can be used to improve government and economic policy-making. This paper proposes a new research agenda towards this end by introducing Social Environment Design, a general framework for the use of AI for automated policy-making that connects with the Reinforcement Learning, EconCS, and Computational Social Choice communities. The framework seeks to capture general economic environments, includes voting on policy objectives, and gives a direction for the systematic analysis of government and economic policy through AI simulation. We highlight key open problems for future research in AI-based policy-making. By solving these challenges, we hope to achieve various social welfare objectives, thereby promoting more ethical and responsible decision making.

翻訳日:2024-06-19 05:56:21 公開日:2024-06-17

# COBIAS:バイアス評価におけるコンテキスト信頼性

COBIAS: Contextual Reliability in Bias Assessment ( http://arxiv.org/abs/2402.14889v2 )

ライセンス: Link先を確認

Priyanshul Govil, Hemang Jain, Vamshi Krishna Bonagiri, Aman Chadha, Ponnurangam Kumaraguru, Manas Gaur, Sanorita Dey,

(参考訳) 大規模言語モデル(LLM)は広範なウェブコーパスで訓練されており、それによって人間のようなテキストを理解し生成できる。しかし、この訓練過程はモデルに固有のバイアスももたらす。これらのバイアスは、さまざまなステレオタイプや偏見を含む、多様でしばしば未整理なウェブデータの性質から生じる。モデルのバイアス除去に関するこれまでの研究は、手法の性能を測定するためにベンチマークデータセットに依存している。しかし、これらのデータセットは、バイアスの理解が非常に主観的であるためにいくつかの落とし穴を抱えており、文脈を探ることの重要な必要性を浮き彫りにしている。本稿では、入力が生じうる多様な状況を考慮することで、入力の文脈を理解することを提案する。我々の貢献は2つある。(i)既存の2つのバイアスベンチマークデータセットから得た2,291個のステレオタイプ的な文に、文脈を追加するためのポイントを付与して拡張する。(ii)バイアス測定における文の文脈的信頼性を評価するための、文脈指向バイアス指標・評価スコア(COBIAS)を開発する。我々の指標は、文の文脈的信頼性に関する人間の判断と一致し(Spearman's $\rho = 0.65$, $p = 3.4 \times 10^{-60}$)、バイアス軽減の研究を支援する信頼できるデータセットの作成に使用できる。

Large Language Models (LLMs) are trained on extensive web corpora, which enable them to understand and generate human-like text. However, this training process also results in inherent biases within the models. These biases arise from web data's diverse and often uncurated nature, containing various stereotypes and prejudices. Previous works on debiasing models rely on benchmark datasets to measure their method's performance. However, these datasets suffer from several pitfalls due to the highly subjective understanding of bias, highlighting a critical need for contextual exploration. We propose understanding the context of inputs by considering the diverse situations in which they may arise. Our contribution is two-fold: (i) we augment 2,291 stereotyped statements from two existing bias-benchmark datasets with points for adding context; (ii) we develop the Context-Oriented Bias Indicator and Assessment Score (COBIAS) to assess a statement's contextual reliability in measuring bias. Our metric aligns with human judgment on contextual reliability of statements (Spearman's $\rho = 0.65$, $p = 3.4 \times 10^{-60}$) and can be used to create reliable datasets, which would assist bias mitigation work.

翻訳日:2024-06-19 05:56:21 公開日:2024-06-17

# 人間がどうやってコードを書くのか? 大型モデルも同じように

How Do Humans Write Code? Large Models Do It the Same Way Too ( http://arxiv.org/abs/2402.15729v2 )

ライセンス: Link先を確認

Long Li, Xuzheng He,

(参考訳) Program-of-Thought(PoT)は、外部ツール呼び出しを利用して計算誤りを回避することで、大規模言語モデル(LLM)の数学的推論タスクにおいて、自然言語ベースのChain-of-Thought(CoT)に代わり最も一般的な手法となっている。しかし、GPT-4とLlamaシリーズに対する我々の評価では、PoTを用いると、CoTと比較して、誤った式や論理の欠陥といった推論誤りが増えることが明らかになった。この問題に対処するため、PoTとCoTの統合を助ける一連の戦略を活用するHuman-Think Language(HTL)を提案する。その戦略は次の3つである。(1) 完全なCoT推論を用いてコード生成を制御する新しい生成パラダイム。(2) より論理的なコードを生成するために、PoT中にモデルの注意をCoT推論に向けるFocus Attention。(3) 難しい数学問題を解く際にLLMが推論ステップを繰り返すのを防ぐため、CoT応答とPoT応答の両方の正確さを報酬として利用する強化学習。我々の手法は、8つの数学計算データセットにわたって、Llama-Baseモデルで平均6.5%、Mistral-Baseモデルで平均4.3%の改善を達成する。また、モデルの情報の流れを制御することで、5つのドメイン外データセットでも顕著な有効性を示し、強い転移性を示す。さらに、HTLは非数学的な自然言語推論タスクにおいて最も大きな改善を示し、統一的な推論タスクの枠組みに貢献する。

Program-of-Thought (PoT) replaces natural language-based Chain-of-Thought (CoT) as the most popular method in mathematical reasoning tasks for Large Language Models (LLMs) by utilizing external tool calls to circumvent computational errors. However, our evaluation of the GPT-4 and Llama series reveals that using PoT introduces more reasoning errors, such as incorrect formulas or flawed logic, compared to CoT. To address this issue, we propose Human-Think Language (HTL), which leverages a suite of strategies that help integrate PoT and CoT, encompassing: (1) a new generation paradigm that uses full CoT reasoning to control code generation; (2) Focus Attention, which directs model attention to the CoT reasoning during PoT to generate more logical code; and (3) reinforcement learning that utilizes the accuracy of both CoT and PoT responses as rewards to prevent repetitive reasoning steps in LLMs when solving difficult math problems. Our method achieves an average improvement of 6.5% on the Llama-Base model and 4.3% on the Mistral-Base model across 8 mathematical calculation datasets. It also shows significant effectiveness on five out-of-domain datasets by controlling the model's information flow, exhibiting strong transferability. Additionally, HTL shows the most significant improvement in the non-mathematical natural language inference task, contributing to a unified reasoning task framework.

翻訳日:2024-06-19 05:56:21 公開日:2024-06-17

# 真実を明らかにし変化を促す:エージェントベースの大規模社会運動シミュレーションに向けて

Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation ( http://arxiv.org/abs/2402.16333v2 )

ライセンス: Link先を確認

Xinyi Mou, Zhongyu Wei, Xuanjing Huang,

(参考訳) ソーシャルメディアは社会運動の基盤として現れ、社会変革の推進に大きな影響を与えている。大衆の反応をシミュレートし、潜在的な影響を予測することがますます重要になっている。しかし,このような現象をシミュレートする既存の手法は,社会運動参加者の行動を把握する上での有効性と効率性に関する課題に直面している。本稿では,ソーシャルメディアユーザシミュレーションのためのハイブリッドフレームワークHiSimを紹介し,ユーザを2つのタイプに分類する。コアユーザはLarge Language Modelsによって駆動されるが、多くの一般ユーザはdeductive agent-based modelによってモデル化される。さらに、トリガイベントに続く応答ダイナミクスを再現するために、Twitterのような環境を構築します。次に,実世界のデータセットを対象とした総合的な実験を行うための,多面的ベンチマークSoMoSiMu-Benchを開発した。実験の結果,本手法の有効性と柔軟性が示された。

Social media has emerged as a cornerstone of social movements, wielding significant influence in driving societal change. Simulating the response of the public and forecasting the potential impact has become increasingly important. However, existing methods for simulating such phenomena encounter challenges concerning their efficacy and efficiency in capturing the behaviors of social movement participants. In this paper, we introduce a hybrid framework HiSim for social media user simulation, wherein users are categorized into two types. Core users are driven by Large Language Models, while numerous ordinary users are modeled by deductive agent-based models. We further construct a Twitter-like environment to replicate their response dynamics following trigger events. Subsequently, we develop a multi-faceted benchmark SoMoSiMu-Bench for evaluation and conduct comprehensive experiments across real-world datasets. Experimental results demonstrate the effectiveness and flexibility of our method.

翻訳日:2024-06-19 05:46:37 公開日:2024-06-17

# KoDialogBench:韓国語対話ベンチマークによる言語モデルの会話的理解の評価

KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark ( http://arxiv.org/abs/2402.17377v2 )

ライセンス: Link先を確認

Seongbo Jang, Seonghyeon Lee, Hwanjo Yu,

(参考訳) 言語モデルはチャットボットアシスタントとして配備されることが多いため、モデルがユーザの第一言語で会話できることは利点となる。これらのモデルは幅広い言語で訓練されているが、韓国語のような低リソース言語における能力の包括的な評価は不足していた。本研究では、韓国語における言語モデルの会話能力を評価するために設計されたベンチマークであるKoDialogBenchを紹介する。この目的のために、日常的な話題に関する韓国語ネイティブの対話を公開資料から収集するか、他言語の対話を翻訳する。次に、これらの会話を、対話理解から応答選択タスクにわたる多様なテストデータセットとして構成する。提案するベンチマークを用いて、さまざまな言語モデルの広範な評価と分析を行い、韓国語対話の基礎的な理解を測定する。実験結果は、モデルの会話能力には大きな改善の余地があることを示している。さらに、異なる言語モデル間の詳細な比較により、会話能力を高める上での最近の訓練手法の有効性が明らかになった。我々は、KoDialogBenchが会話を意識した韓国語言語モデルの発展を促進することを期待している。

As language models are often deployed as chatbot assistants, it becomes a virtue for models to engage in conversations in a user's first language. While these models are trained on a wide range of languages, a comprehensive evaluation of their proficiency in low-resource languages such as Korean has been lacking. In this work, we introduce KoDialogBench, a benchmark designed to assess language models' conversational capabilities in Korean. To this end, we collect native Korean dialogues on daily topics from public sources, or translate dialogues from other languages. We then structure these conversations into diverse test datasets, spanning from dialogue comprehension to response selection tasks. Leveraging the proposed benchmark, we conduct extensive evaluations and analyses of various language models to measure a foundational understanding of Korean dialogues. Experimental results indicate that there exists significant room for improvement in models' conversation skills. Furthermore, our in-depth comparisons across different language models highlight the effectiveness of recent training techniques in enhancing conversational proficiency. We anticipate that KoDialogBench will promote the progress towards conversation-aware Korean language models.

翻訳日:2024-06-19 05:46:37 公開日:2024-06-17

# シリコン群衆の知恵: LLMアンサンブルの予測能力は人間の群衆の精度に匹敵する

Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy ( http://arxiv.org/abs/2402.19379v5 )

ライセンス: Link先を確認

Philipp Schoenegger, Indre Tuminauskaite, Peter S. Park, Rafael Valdece Sousa Bastos, Philip E. Tetlock,

(参考訳) 実際の人間の予測精度は「群衆の知恵」効果に依存しており、個々の予測者からなる群衆全体で集約することで、将来の出来事に関する予測が大幅に改善される。大規模言語モデル(LLM)の予測能力に関する過去の研究は、フロンティアLLMが個々の予測者としては、人間の群衆による予測トーナメントの集約というゴールドスタンダードに比べて性能が劣ることを示唆している。研究1では、12個のLLMの群衆からなるLLMアンサンブル手法を用いて、この研究を拡張する。31の二値質問に対する集約されたLLM予測を、3か月間の予測トーナメントに参加した925人の人間の予測者の群衆の予測と比較する。事前登録された主要な分析は、LLMの群衆が単純な無情報ベンチマークを上回り、人間の群衆と統計的に差がないことを示している。また、黙従効果や切りのよい数字を好む傾向など、機械の応答における人間に似た一連のバイアスも観察された。研究2では、人間の認知的アウトプットを利用することでLLM(GPT-4とClaude 2)の予測を改善できるかどうかを検証する。両モデルの予測精度は、人間の予測の中央値を情報として与えることで17%から28%向上するが、これは人間と機械の予測を単純に平均する場合よりも精度の低い予測につながる。以上の結果は、予測の集約という単純で実用的な方法によって、LLMが人間の群衆に匹敵する予測精度を達成できることを示唆している。

Human forecasting accuracy in practice relies on the 'wisdom of the crowd' effect, in which predictions about future events are significantly improved by aggregating across a crowd of individual forecasters. Past work on the forecasting ability of large language models (LLMs) suggests that frontier LLMs, as individual forecasters, underperform compared to the gold standard of a human-crowd forecasting-tournament aggregate. In Study 1, we expand this research by using an LLM ensemble approach consisting of a crowd of 12 LLMs. We compare the aggregated LLM predictions on 31 binary questions to those of a crowd of 925 human forecasters from a three-month forecasting tournament. Our preregistered main analysis shows that the LLM crowd outperforms a simple no-information benchmark, and is not statistically different from the human crowd. We also observe a set of human-like biases in machine responses, such as an acquiescence effect and a tendency to favour round numbers. In Study 2, we test whether LLM predictions (of GPT-4 and Claude 2) can be improved by drawing on human cognitive output. We find that both models' forecasting accuracy benefits from exposure to the median human prediction as information, improving accuracy by between 17% and 28%, though this leads to less accurate predictions than simply averaging human and machine forecasts. Our results suggest that LLMs can achieve forecasting accuracy rivaling that of the human crowd via the simple, practically applicable method of forecast aggregation.
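
Forecast aggregation of the kind this abstract refers to can be sketched as follows. The abstract does not say which aggregation rule or scoring rule was used, so taking the median across models and scoring with the Brier score are assumptions, as are the probability values and the reading of the no-information benchmark as a constant 50% forecast.

```python
from statistics import median


def brier_score(probability, outcome):
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (probability - outcome) ** 2


def crowd_forecast(forecasts):
    """Aggregate individual forecasts by taking their median (an assumption)."""
    return median(forecasts)


# Hypothetical probabilities from five models for one binary question
# that resolved "yes" (outcome = 1).
forecasts = [0.70, 0.80, 0.65, 0.90, 0.75]
outcome = 1

crowd = crowd_forecast(forecasts)
crowd_brier = brier_score(crowd, outcome)
baseline_brier = brier_score(0.5, outcome)  # constant 50% forecast

print(crowd, crowd_brier, baseline_brier)
```

Lower Brier scores are better. In this made-up case the aggregated forecast scores 0.0625 against 0.25 for the constant 50% forecast.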

翻訳日:2024-06-19 05:46:37 公開日:2024-06-17

# DiaHalu: 大規模言語モデルのための対話レベルの幻覚評価ベンチマーク

DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models ( http://arxiv.org/abs/2403.00896v2 )

ライセンス: Link先を確認

Kedi Chen, Qin Chen, Jie Zhou, Yishen He, Liang He,

(参考訳) 近年、大規模言語モデル(LLM)は大きな成功を収めているが、幻覚の問題は依然として課題であり、幻覚を検出するためのベンチマークが多数提案されている。しかし、これらのベンチマークの一部は、LLMによって自然に生成されたものではなく、意図的に誘導されたものである。また、多くは事実性の幻覚のみに焦点を当て、忠実性の幻覚を無視している。さらに、LLMの時代には対話形式がより広く利用されているにもかかわらず、現在のベンチマークは文レベルと文章レベルの幻覚にのみ集中している。本研究では、我々の知る限り初の対話レベルの幻覚評価ベンチマークであるDiaHaluを提案する。まず、収集したトピックをシステムプロンプトに統合し、2つのChatGPT3.5間の対話を行わせる。その後、人間の言語慣習に従わない内容を手動で修正してからLLMに再生成させ、実際の人間と機械の対話シナリオをシミュレートする。最後に、専門の研究者がデータセットのすべてのサンプルに注釈を付ける。DiaHaluは、4つの一般的なマルチターン対話ドメインと、事実性および忠実性の幻覚から拡張された5つの幻覚サブタイプをカバーする。このデータセット上でのよく知られたLLMと検出手法による実験は、DiaHaluが難易度の高いベンチマークであり、今後の研究にとって重要な価値を持つことを示している。

Although large language models (LLMs) have achieved significant success in recent years, hallucination remains a challenge, and numerous benchmarks have been proposed to detect it. Nevertheless, some of these benchmarks are not naturally generated by LLMs but are intentionally induced. Also, many merely focus on the factuality hallucination while ignoring the faithfulness hallucination. Additionally, although the dialogue pattern is more widely utilized in the era of LLMs, current benchmarks only concentrate on sentence-level and passage-level hallucination. In this study, we propose DiaHalu, the first dialogue-level hallucination evaluation benchmark to our knowledge. Initially, we integrate the collected topics into system prompts and facilitate a dialogue between two ChatGPT3.5 instances. Subsequently, we manually modify the contents that do not adhere to human language conventions and then have LLMs re-generate, simulating authentic human-machine interaction scenarios. Finally, professional scholars annotate all the samples in the dataset. DiaHalu covers four common multi-turn dialogue domains and five hallucination subtypes, extended from factuality and faithfulness hallucination. Experiments with some well-known LLMs and detection methods on the dataset show that DiaHalu is a challenging benchmark, holding significant value for further research.

翻訳日:2024-06-19 05:36:50 公開日:2024-06-17

# 量子ビット計測用リードアウト共振器の方向放射

Directional emission of a readout resonator for qubit measurement ( http://arxiv.org/abs/2403.01375v2 )

ライセンス: Link先を確認

Alec Yen, Yufeng Ye, Kaidong Peng, Jennifer Wang, Gregory Cunningham, Michael Gingras, Bethany M. Niedzielski, Hannah Stickler, Kyle Serniak, Mollie E. Schwartz, Kevin P. O'Brien,

(参考訳) 我々は、全パス共振器を用いて超伝導量子ビットの伝送に基づく分散読み出しを提案し、出力に対して優先的に読み出し光子を出力する。これは、リードアウト信号が出力に向かって優先的に減衰するように、フィードラインを一方の端で意図的にミスマッチする典型的な読み出し方式とは対照的である。この意図的なミスマッチは、非理想的インピーダンス環境による有効共振器のライン幅の拡大や、インピーダンスマッチングのためのインフラの追加など、スケーリング上の課題を生じさせる。多重化オールパスリードアウト共振器を用いた将来のアーキテクチャでは、意図的にミスマッチする必要がなくなり、量子コンピュータのスケーリングの見通しが向上する可能性がある。オールパスリードアウト」の実証実証として,全パスリードアウト共振器を設計し,リードアウト周波数1.17dB未満の挿入損失と最大挿入損失1.53dBを,トランスモンキュービットの最低3つの状態に対して全帯域にわたって実現した。我々は,600 nsで平均98.1%のシングルショット忠実度を持つ量子ビット読み出しを実証し,より大きな分散シフトの効果を評価するために,シェルビングプロトコルを実装し,300 nsで99.0%の忠実度を達成する。

We propose and demonstrate transmission-based dispersive readout of a superconducting qubit using an all-pass resonator, which preferentially emits readout photons toward the output. This is in contrast to typical readout schemes, which intentionally mismatch the feedline at one end so that the readout signal preferentially decays toward the output. We show that this intentional mismatch creates scaling challenges, including larger spread of effective resonator linewidths due to non-ideal impedance environments and added infrastructure for impedance matching. A future architecture using multiplexed all-pass readout resonators would avoid the need for intentional mismatch and potentially improve the scaling prospects of quantum computers. As a proof-of-concept demonstration of "all-pass readout," we design and fabricate an all-pass readout resonator that demonstrates insertion loss below 1.17 dB at the readout frequency and a maximum insertion loss of 1.53 dB across its full bandwidth for the lowest three states of a transmon qubit. We demonstrate qubit readout with an average single-shot fidelity of 98.1% in 600 ns; to assess the effect of larger dispersive shift, we implement a shelving protocol and achieve a fidelity of 99.0% in 300 ns.
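
The insertion-loss figures in this abstract are logarithmic. As a rough reading aid, the sketch below converts a loss in dB into the fraction of signal power that reaches the output, assuming the usual power convention (10·log10 of the power ratio). The function name is illustrative and does not come from the paper.

```python
def transmitted_power_fraction(insertion_loss_db: float) -> float:
    """Fraction of input power reaching the output for a given insertion loss in dB.

    Assumes the standard power-ratio definition: loss_dB = -10 * log10(P_out / P_in).
    """
    return 10 ** (-insertion_loss_db / 10)


# The two loss values quoted in the abstract.
for loss_db in (1.17, 1.53):
    fraction = transmitted_power_fraction(loss_db)
    print(f"{loss_db} dB -> {fraction:.3f} of the input power")
```

Under this convention, 1.17 dB corresponds to roughly 76% of the power being transmitted and 1.53 dB to roughly 70%.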

翻訳日:2024-06-19 05:36:50 公開日:2024-06-17

# 群集ナビゲーションのための混合戦略ナッシュ平衡

Mixed Strategy Nash Equilibrium for Crowd Navigation ( http://arxiv.org/abs/2403.01537v4 )

ライセンス: Link先を確認

Muchen Sun, Francesca Baldini, Katie Hughes, Peter Trautman, Todd Murphey,

(参考訳) 混雑した地域で移動するロボットは、衝突回避を完全に制御するのではなく、人間と自由空間を交渉する必要がある。ゲーム理論は、ロボットが経路計画中に衝突回避のために人間と協力する可能性について推論するための枠組みを提供する。特に、混合戦略ナッシュ均衡は不確実性の下での交渉行動を捉え、群衆のナビゲーションに適している。しかし、混合戦略のナッシュ均衡の計算は、しばしばリアルタイムな意思決定には不当に高価である。本稿では,軌道の確率分布の反復的ベイズ更新方式を提案する。アルゴリズムはロボットの確率的計画と他の歩行者の進路の確率論的予測を同時に生成する。提案アルゴリズムは,クラウドナビゲーションのための混合戦略ゲームと等価であり,このアルゴリズムはゲーム全体のナッシュ均衡の回復を保証する。我々はベイズのルールナッシュ平衡 (BRNE) と命名し、リアルタイムモデル予測クラウドナビゲーションフレームワークを開発した。 BRNEは汎用的な混合戦略ナッシュ均衡を解くのではなく、特に群集ナビゲーションに適した公式を解くため、低消費電力の組込みコンピュータ上でリアルタイムで解を計算することができる。シミュレーション環境と実世界の歩行者データの両方においてBRNEを評価する。 BRNEは、安全性とナビゲーション効率に関する非学習および学習ベースの手法を一貫して上回っている。また、歩行者データセットのベンチマークでは、人レベルの群衆ナビゲーションのパフォーマンスにも到達している。最後に,本アルゴリズムの実際の人間による実用性を,完全に搭載された知覚と計算能力を備えた四足歩行ロボット上で実証する。

Robots navigating in crowded areas should negotiate free space with humans rather than fully controlling collision avoidance, as this can lead to freezing behavior. Game theory provides a framework for the robot to reason about potential cooperation from humans for collision avoidance during path planning. In particular, the mixed strategy Nash equilibrium captures the negotiation behavior under uncertainty, making it well suited for crowd navigation. However, computing the mixed strategy Nash equilibrium is often prohibitively expensive for real-time decision-making. In this paper, we propose an iterative Bayesian update scheme over probability distributions of trajectories. The algorithm simultaneously generates a stochastic plan for the robot and probabilistic predictions of other pedestrians' paths. We prove that the proposed algorithm is equivalent to solving a mixed strategy game for crowd navigation, and the algorithm guarantees the recovery of the global Nash equilibrium of the game. We name our algorithm Bayes' Rule Nash Equilibrium (BRNE) and develop a real-time model prediction crowd navigation framework. Since BRNE is not solving a general-purpose mixed strategy Nash equilibrium but a tailored formula specifically for crowd navigation, it can compute the solution in real-time on a low-power embedded computer. We evaluate BRNE in both simulated environments and real-world pedestrian datasets. BRNE consistently outperforms non-learning and learning-based methods regarding safety and navigation efficiency. It also reaches human-level crowd navigation performance in the pedestrian dataset benchmark. Lastly, we demonstrate the practicality of our algorithm with real humans on an untethered quadruped robot with fully onboard perception and computation.
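
The iterative Bayesian update described above can be illustrated with a toy example. This is a minimal sketch under strong assumptions, not the authors' BRNE implementation: each agent chooses among three discrete lateral offsets instead of a continuous trajectory distribution, and the collision cost, the priors, and the `beta` parameter are all hypothetical. Each agent repeatedly reweights its own prior by the exponentiated expected collision cost against the other agent's current distribution.

```python
import math


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def collision_cost(a, b):
    # Hypothetical risk: close to 1 when both agents pick the same offset.
    return math.exp(-(a - b) ** 2)


def expected_costs(options, other_weights):
    # Expected collision cost of each own option under the other agent's distribution.
    return [
        sum(w * collision_cost(a, b) for b, w in zip(options, other_weights))
        for a in options
    ]


def bayes_update(prior, options, other_weights, beta=5.0):
    # Posterior is proportional to prior times exp(-beta * expected cost).
    costs = expected_costs(options, other_weights)
    return normalize([p * math.exp(-beta * c) for p, c in zip(prior, costs)])


def joint_expected_cost(options, w0, w1):
    return sum(
        wa * wb * collision_cost(a, b)
        for a, wa in zip(options, w0)
        for b, wb in zip(options, w1)
    )


options = [-1.0, 0.0, 1.0]      # pass left, stay centered, pass right
prior_robot = [0.2, 0.6, 0.2]   # both agents initially prefer the center
prior_human = [0.1, 0.6, 0.3]

w_robot, w_human = prior_robot[:], prior_human[:]
for _ in range(20):
    w_robot = bayes_update(prior_robot, options, w_human)
    w_human = bayes_update(prior_human, options, w_robot)

print("robot:", [round(w, 3) for w in w_robot])
print("human:", [round(w, 3) for w in w_human])
```

In this toy setting the two distributions shift toward opposite sides, which lowers the expected collision cost relative to the priors. The real method operates on distributions over full trajectories, so this only conveys the shape of the update.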

翻訳日:2024-06-19 05:36:50 公開日:2024-06-17

# 超高速後方散乱光電子に対するカタストロフィと隠れ力学対称性の影響

Influence of catastrophes and hidden dynamical symmetries on ultrafast backscattered photoelectrons ( http://arxiv.org/abs/2403.02264v2 )

ライセンス: Link先を確認

T. Rook, L. Cruz Rodriguez, C. Figueira de Morisson Faria,

(参考訳) 我々は最近実装されたハイブリッドフォワード境界CQSFA(H-CQSFA)を用いて、クーロンテールと光電子運動量分布(PMD)における軟化の程度の違いによるポテンシャルの利用効果について議論した。クーロン相互作用に軟化を導入することは、後方散乱電子軌跡に関連するPMDで観察される隆起に影響を及ぼすことを示す。ハードコアクーロン相互作用の限界では、再散乱した尾根は偏光軸に沿って閉じるが、ソフトコアポテンシャルでは尾根特異的な角度で途切れる。我々は、尾根につながる異なる軌道の運動量マッピングを分析する。ハードコアポテンシャルについては、尾根で結合する2種類のサドルポイント解が存在する。軟化を増大させることにより、クーロンポテンシャルにのみ関連する隠れた力学対称性を破り、さらに2つの解が現れることを示す。この対称性の破れのさらなるシグネチャは運動量空間の軌跡のサブセットで遭遇する。最後に、散乱理論を用いて、軟化が最大散乱角にどのように影響するかを示し、CQSFAからの観測と一致する見積もりを提供する。これは、電子の連続体伝播における残留結合電位の存在下では、純粋に運動学と動的因果関係の区別が曖昧になることを意味する。

We discuss the effect of using potentials with a Coulomb tail and different degrees of softening in the photoelectron momentum distributions (PMDs) using the recently implemented hybrid forward-boundary CQSFA (H-CQSFA). We show that introducing a softening in the Coulomb interaction influences the ridges observed in the PMDs associated with backscattered electron trajectories. In the limit of a hard-core Coulomb interaction, the re-scattering ridges close along the polarization axis, while for a soft-core potential, they are interrupted at ridge-specific angles. We analyze the momentum mapping of the different orbits leading to the ridges. For the hard-core potential, there exist two types of saddle-point solutions that coalesce at the ridge. By increasing the softening, we show that two additional solutions emerge as the result of breaking a hidden dynamical symmetry associated exclusively with the Coulomb potential. Further signatures of this symmetry breaking are encountered in subsets of momentum-space trajectories. Finally, we use scattering theory to show how the softening affects the maximal scattering angle and provide estimates that agree with our observations from the CQSFA. This implies that, in the presence of residual binding potentials in the electron's continuum propagation, the distinction between purely kinematic and dynamic caustics becomes blurred.

翻訳日:2024-06-19 05:36:50 公開日:2024-06-17

# LLM評価のための微調整判定モデルの限界について

On the Limitations of Fine-tuned Judge Models for LLM Evaluation ( http://arxiv.org/abs/2403.02839v2 )

ライセンス: Link先を確認

Hui Huang, Yingqi Qu, Hongli Zhou, Jing Liu, Muyun Yang, Bing Xu, Tiejun Zhao,

(参考訳) 近年,Large Language Model (LLM) を用いて他のLLMの品質を評価する傾向が高まっている。多くの研究では、プロプライエタリなクローズドソースモデル、特にGPT-4を評価器として採用している。あるいは、オープンソースのLLMに基づいて微調整された判定モデルを評価器とする研究もある。微調整された判定モデルはGPT-4と同等の評価能力を発揮すると主張されているが,本研究では,判定モデルの実証的研究を行う。その結果, 微調整された判定モデルは領域内テストセットではGPT-4を上回るほど高い性能を達成するものの, 汎用性, 公平性, アスペクト特化評価, 拡張性などの複数の観点でGPT-4に劣ることが示唆された。また、微調整された判定モデルが本質的にタスク固有の分類器として機能し、その結果、これらの制限が生じることを明らかにした。最後に, LLM評価における有効性を最大化する目的で, 微調整された判定モデルの信頼性を測定する効果的な指標を提案する。

Recently, there has been a growing trend of utilizing Large Language Models (LLMs) to evaluate the quality of other LLMs. Many studies have employed proprietary closed-source models, especially GPT-4, as the evaluator. Alternatively, other works have fine-tuned judge models based on open-source LLMs as the evaluator. While the fine-tuned judge models are claimed to achieve evaluation capability comparable to GPT-4, in this study, we conduct an empirical study of judge models. Our findings indicate that although the fine-tuned judge models achieve high performance on in-domain test sets, even surpassing GPT-4, they underperform GPT-4 across several dimensions, including generalizability, fairness, aspect-specific evaluation, and scalability. We also reveal that the fine-tuned judge model inherently operates as a task-specific classifier, which gives rise to these limitations. Finally, we propose an effective indicator to measure the reliability of fine-tuned judges, with the aim of maximizing their utility in LLM evaluation.

翻訳日:2024-06-19 05:36:50 公開日:2024-06-17

# 統合性保護ブロック暗号モード -- 絡み合ったWebをアンタングする

Integrity-protecting block cipher modes -- Untangling a tangled web ( http://arxiv.org/abs/2403.03654v2 )

ライセンス: Link先を確認

Chris J Mitchell,

(参考訳) 本稿では,認証暗号を提供するために設計された3つの関連するブロック暗号モードのセキュリティを再検討する。これらのモードは PES-PCBC, IOBC, EPBC と呼ばれ、いずれも1990年代半ばに提案された。しかし、後者の2つのモードのセキュリティ分析はより最近になって発表された。いずれの場合も、これらのスキームに関するセキュリティ問題を記述した1つ以上の論文が最終的に公表されたが、これらの分析のうちの1つ(EPBCの分析)の欠陥が後に発見された。これは、これまでEPBCには既知の重大な問題がなかったことを意味する。本稿は,それにもかかわらず,これら3つのスキームがいずれも使用を避けるべき欠陥を持っていること,特にセキュリティの証明を有する効率的な代替スキームが多数存在することを明らかにする。

This paper re-examines the security of three related block cipher modes of operation designed to provide authenticated encryption. These modes, known as PES-PCBC, IOBC and EPBC, were all proposed in the mid-1990s. However, analyses of security of the latter two modes were published more recently. In each case one or more papers describing security issues with the schemes were eventually published, although a flaw in one of these analyses (of EPBC) was subsequently discovered - this means that until now EPBC had no known major issues. This paper establishes that, despite this, all three schemes possess defects which should prevent their use - especially as there are a number of efficient alternative schemes possessing proofs of security.

翻訳日:2024-06-19 05:36:50 公開日:2024-06-17

# 生成事前学習型構造化変換器:大規模における教師なし構文言語モデル

Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale ( http://arxiv.org/abs/2403.08293v3 )

ライセンス: Link先を確認

Xiang Hu, Pengyu Ji, Qingyang Zhu, Wei Wu, Kewei Tu,

(参考訳) 構文言語モデル(SLM)は、文とその構文木を左から右に漸進的に生成する。並列性の高い方法で生テキストからスクラッチで事前学習が可能な大規模教師なしSLMであるGenerative Pretrained Structured Transformers (GPST)を提案する。 GPSTは、ゴールドツリーへの依存やシーケンシャルトレーニングなど、以前のSLMの制限を回避している。これは、一方向の言語モデリング損失によって教師される通常のSLMと、構文解析木を誘導し構成素表現を計算する、双方向の言語モデリング損失によって教師される追加の合成モデルの2つからなる。本稿では,2つのモデルの連立並列訓練をハードEM方式で行うための表現代行法を提案する。我々は90億トークンを持つコーパスであるOpenWebText上でGPSTを事前訓練し、言語理解と言語生成の両方をカバーする多数のタスクにおいて、同等サイズのGPT-2よりもGPSTの方が優れていることを示す。一方、GPSTは既存の教師なしSLMよりも左から右への文法誘導に大幅に優れており、トレーニングにおいてもかなりの高速化を実現している。

A syntactic language model (SLM) incrementally generates a sentence with its syntactic tree in a left-to-right manner. We present Generative Pretrained Structured Transformers (GPST), an unsupervised SLM at scale capable of being pre-trained from scratch on raw texts with high parallelism. GPST circumvents the limitations of previous SLMs such as relying on gold trees and sequential training. It consists of two components: a usual SLM supervised by a uni-directional language modeling loss, and an additional composition model, which induces syntactic parse trees and computes constituent representations, supervised by a bi-directional language modeling loss. We propose a representation surrogate to enable joint parallel training of the two models in a hard-EM fashion. We pre-train GPST on OpenWebText, a corpus with $9$ billion tokens, and demonstrate the superiority of GPST over GPT-2 with a comparable size in numerous tasks covering both language understanding and language generation. Meanwhile, GPST also significantly outperforms existing unsupervised SLMs on left-to-right grammar induction, while substantially accelerating training.

翻訳日:2024-06-19 05:27:06 公開日:2024-06-17

# TaxoLLaMA:複数語彙意味課題の解決のためのWordNetベースのモデル

TaxoLLaMA: WordNet-based Model for Solving Multiple Lexical Semantic Tasks ( http://arxiv.org/abs/2403.09207v2 )

ライセンス: Link先を確認

Viktor Moskvoretskii, Ekaterina Neminova, Alina Lobanova, Alexander Panchenko, Irina Nikishina,

(参考訳) 本稿では,LLaMA-2-7bモデルを例として,WordNetから語彙意味知識を獲得するLLMの能力を調査し,複数の語彙意味タスクで検証する。実験の成果として,4ビット量子化とLoRAにより軽量なオールインワンモデルであるTaxoLLaMAを提案する。 Taxonomy Enrichment, Hypernym Discovery, Taxonomy Construction, Lexical Entailment の計16タスクのうち,11でSotA,4でトップ2の結果を達成している。さらに,Lexical Entailment と Taxonomy Construction において,微調整なしで非常に強力なゼロショット性能を示す。また,その隠れた多言語およびドメイン適応能力についても,わずかなチューニングや数ショット学習で調べる。すべてのデータセット,コード,モデルはhttps://github.com/VityaVitalich/TaxoLLaMAで公開されている。

In this paper, we explore the capabilities of LLMs in capturing lexical-semantic knowledge from WordNet, using the LLaMA-2-7b model as an example, and test it on multiple lexical semantic tasks. As the outcome of our experiments, we present TaxoLLaMA, the everything-in-one model, lightweight due to 4-bit quantization and LoRA. It achieves 11 SotA results and 4 top-2 results out of 16 tasks for the Taxonomy Enrichment, Hypernym Discovery, Taxonomy Construction, and Lexical Entailment tasks. Moreover, it demonstrates very strong zero-shot performance on Lexical Entailment and Taxonomy Construction with no fine-tuning. We also explore its hidden multilingual and domain adaptation capabilities with a little tuning or few-shot learning. All datasets, code, and the model are available online at https://github.com/VityaVitalich/TaxoLLaMA

翻訳日:2024-06-19 05:27:06 公開日:2024-06-17

# 包括的マルチモーダル知覚に向けて:タッチ・ランゲージ・ビジョン・データセットの導入

Towards Comprehensive Multimodal Perception: Introducing the Touch-Language-Vision Dataset ( http://arxiv.org/abs/2403.09813v3 )

ライセンス: Link先を確認

Ning Cheng, You Li, Jing Gao, Bin Fang, Jinan Xu, Wenjuan Han,

(参考訳) 触覚は、人間とロボットの両方の知覚と相互作用能力に対する重要なサポートと強化を提供する。それでも、タッチに関連するマルチモーダル研究は主に視覚的・触覚的なモダリティに焦点を当てており、言語領域での探索は限られている。語彙以外にも、文レベルの記述にはよりリッチな意味論が含まれる。そこで我々は,マルチモードアライメントのための文レベル記述を特徴とする,人間と機械のカスケード協調によるTLV(Touch-Language-Vision)というタッチ言語ビジョンデータセットを構築した。新しいデータセットは、提案した軽量トレーニングフレームワークであるSTLV-Align(Synergistic Touch-Language-Vision Alignment)を微調整するために使用され、最小パラメータ調整(1%)で効果的なセマンティックアライメントを実現する。 Project Page: https://xiaoen0.github.io/touch.page/

Tactility provides crucial support and enhancement for the perception and interaction capabilities of both humans and robots. Nevertheless, the multimodal research related to touch primarily focuses on visual and tactile modalities, with limited exploration in the domain of language. Beyond vocabulary, sentence-level descriptions contain richer semantics. Based on this, we construct a touch-language-vision dataset named TLV (Touch-Language-Vision) by human-machine cascade collaboration, featuring sentence-level descriptions for multimode alignment. The new dataset is used to fine-tune our proposed lightweight training framework, STLV-Align (Synergistic Touch-Language-Vision Alignment), achieving effective semantic alignment with minimal parameter adjustments (1%). Project Page: https://xiaoen0.github.io/touch.page/.

翻訳日:2024-06-19 05:27:06 公開日:2024-06-17

# BirdSet: 鳥類のバイオ音響学の分類のためのデータセットとベンチマーク

BirdSet: A Dataset and Benchmark for Classification in Avian Bioacoustics ( http://arxiv.org/abs/2403.10380v3 )

ライセンス: Link先を確認

Lukas Rauch, Raphael Schwinger, Moritz Wirth, René Heinrich, Denis Huseljic, Jonas Lange, Stefan Kahl, Bernhard Sick, Sven Tomforde, Christoph Scholz,

(参考訳) 深層学習(DL)モデルは、環境健康を評価するための鳥のバイオ音響学の強力なツールとして登場した。低コストで最小限のパッシブ・アコースティック・モニタリング(PAM)の可能性を最大化するために、DLモデルは幅広い種や環境条件で鳥の声化を分析する必要がある。しかし、データの断片化は一般化性能の包括的な評価に挑戦する。そこで,BirdSetデータセットを導入し,約52万本のグローバル・バード・レコードと400時間以上のPAM・レコードをテスト対象とする。我々のベンチマークでは、複数のDLモデルのベースラインを提供し、総合的なトレーニングや評価プロトコルを含むコード実装とともに、コンパラビリティを高め、研究を集約しています。

Deep learning (DL) models have emerged as a powerful tool in avian bioacoustics to assess environmental health. To maximize the potential of cost-effective and minimally invasive passive acoustic monitoring (PAM), DL models must analyze bird vocalizations across a wide range of species and environmental conditions. However, data fragmentation hinders a comprehensive evaluation of generalization performance. Therefore, we introduce the BirdSet dataset, comprising approximately 520,000 global bird recordings for training and over 400 hours of PAM recordings for testing. Our benchmark offers baselines for several DL models to enhance comparability and consolidate research across studies, along with code implementations that include comprehensive training and evaluation protocols.

翻訳日:2024-06-19 05:27:06 公開日:2024-06-17

# 時間的Oracleの混在を伴わない実践的アワード強化学習のグローバルな最適化に向けて

Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles ( http://arxiv.org/abs/2403.11925v4 )

ライセンス: Link先を確認

Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha,

(参考訳) 平均報酬強化学習の文脈では、混合時間(固定された政策の下でマルコフ連鎖が定常分布に達するまでに要する時間の尺度)に関するオラクル知識が必要であることが、政策勾配法のグローバル収束にとって重要な課題となっている。この要件は、大規模な状態空間を持つ環境における混合時間推定の困難さと費用のために特に問題であり、実用的なアプリケーションにおいて効果的な勾配推定を行うには非現実的に長い軌道が必要となる。この制限に対処するために、マルチレベルモンテカルロ(MLMC)勾配推定器を組み込んだマルチレベルアクター・クリティック(MAC)フレームワークを考える。提案手法では、混合時間の知識への依存を効果的に緩和する。これは平均報酬MDPのグローバル収束としては初めてである。さらに、本手法は先行研究から知られている中で最も厳密な $\mathcal{O}\left( \sqrt{\tau_{mix}} \right)$ の依存性を示す。 2Dグリッドワールドにおける目標到達ナビゲーション実験により、MACは、平均報酬設定における既存の最先端の政策勾配に基づく手法よりも優れていることを示す。

In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic due to the difficulty and expense of estimating mixing time in environments with large state spaces, leading to the necessity of impractically long trajectories for effective gradient estimation in practical applications. To address this limitation, we consider the Multi-level Actor-Critic (MAC) framework, which incorporates a Multi-level Monte-Carlo (MLMC) gradient estimator. With our approach, we effectively alleviate the dependency on mixing time knowledge, a first for average-reward MDPs global convergence. Furthermore, our approach exhibits the tightest available dependence of $\mathcal{O}\left( \sqrt{\tau_{mix}} \right)$ known from prior work. With a 2D grid world goal-reaching navigation experiment, we demonstrate that MAC outperforms the existing state-of-the-art policy gradient-based method for average reward settings.
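
The abstract does not spell out the MLMC gradient estimator. The sketch below shows one commonly used form of a multi-level Monte Carlo estimator with a randomly drawn level, which is an assumption about the general technique rather than the paper's exact construction. For simplicity it uses i.i.d. noisy scalar samples in place of gradients collected along a Markovian trajectory; `noisy_sample`, `t_max`, and the target value 2.0 are hypothetical.

```python
import random


def mlmc_estimate(sample_fn, t_max, rng):
    """One MLMC estimate built from a randomly chosen number of samples."""
    # Draw the level J with P(J = j) = 2**-j for j = 1, 2, ...
    level = 1
    while rng.random() < 0.5:
        level += 1
    n = 2 ** level
    if n > t_max:
        # Truncation: fall back to the single-sample estimate.
        return sample_fn(rng)
    samples = [sample_fn(rng) for _ in range(n)]
    base = samples[0]                            # estimate from 1 sample
    fine = sum(samples) / n                      # average over 2**J samples
    coarse = sum(samples[: n // 2]) / (n // 2)   # average over 2**(J-1) samples
    # The 2**J factor compensates for the probability 2**-J of drawing this level.
    return base + n * (fine - coarse)


def noisy_sample(rng):
    # Hypothetical stand-in for a noisy gradient sample with true mean 2.0.
    return rng.gauss(2.0, 1.0)


rng = random.Random(0)
estimates = [mlmc_estimate(noisy_sample, 64, rng) for _ in range(20000)]
mean_estimate = sum(estimates) / len(estimates)
print(f"mean of MLMC estimates: {mean_estimate:.3f}")
```

The appeal of this construction is that the expected number of samples per estimate grows only logarithmically in `t_max`, and the rollout length is drawn at random instead of being set from the mixing time.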

翻訳日:2024-06-19 05:17:19 公開日:2024-06-17

# 基準に基づくメトリクスは質問生成のテーマを異にする

Reference-based Metrics Disprove Themselves in Question Generation ( http://arxiv.org/abs/2403.12242v2 )

ライセンス: Link先を確認

Bang Nguyen, Mengxia Yu, Yun Huang, Meng Jiang,

(参考訳) BLEUやBERTScoreのような基準ベースのメトリクスは、質問生成(QG)を評価するために広く使われている。本研究では、SQuADやHotpotQAなどのQGベンチマークにおいて、人手による参照を用いることで基準ベースのメトリクスの有効性を保証できないことを示す。ほとんどのQGベンチマークには1つの参照しかありません。優れた測定基準は、生成した質問に比較して、人間公認の質問を格付けすることが期待された。しかし, 新たに収集した基準値に対する基準基準値の結果は, 基準値自体を反証した。本研究では,大規模言語モデルを用いて,自然性,応答可能性,複雑性などの多次元基準からなる基準自由度尺度を提案する。これらの基準は単一の参照質問の構文や意味に制約されず、メトリクスは多様な参照セットを必要としない。実験の結果、我々の測定基準は高品質な質問と欠陥のある質問を正確に区別し、人間の判断と最先端の一致を実現していることがわかった。

Reference-based metrics such as BLEU and BERTScore are widely used to evaluate question generation (QG). In this study, on QG benchmarks such as SQuAD and HotpotQA, we find that using human-written references cannot guarantee the effectiveness of the reference-based metrics. Most QG benchmarks have only one reference; we replicated the annotation process and collected another reference. A good metric is expected to grade a human-validated question no worse than generated questions. However, the results of reference-based metrics on our newly collected reference disproved the metrics themselves. We propose a reference-free metric consisting of multi-dimensional criteria such as naturalness, answerability, and complexity, utilizing large language models. These criteria are not constrained to the syntax or semantics of a single reference question, and the metric does not require a diverse set of references. Experiments reveal that our metric accurately distinguishes between high-quality questions and flawed ones, and achieves state-of-the-art alignment with human judgment.
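
To see why a single reference can mislead an overlap-based metric, consider the toy sketch below. It uses a plain unigram F1 score as a simplified stand-in for metrics such as BLEU (it is not BLEU or BERTScore), and the three questions are invented for illustration and do not come from the paper's data.

```python
import re
from collections import Counter


def tokens(text):
    # Lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate and a single reference."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "Who wrote the novel Pride and Prejudice?"
valid_paraphrase = "Which author is Pride and Prejudice by?"
truncated_question = "Who wrote the novel Pride and?"

score_paraphrase = unigram_f1(valid_paraphrase, reference)
score_truncated = unigram_f1(truncated_question, reference)
print(f"valid paraphrase:   {score_paraphrase:.3f}")
print(f"truncated question: {score_truncated:.3f}")
```

The well-formed paraphrase scores lower than the broken question because only lexical overlap with the one reference is rewarded.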

翻訳日:2024-06-19 05:17:19 公開日:2024-06-17

# 任意多視点画像からの人間のメッシュ復元

Human Mesh Recovery from Arbitrary Multi-view Images ( http://arxiv.org/abs/2403.12434v4 )

ライセンス: Link先を確認

Xiaoben Li, Mancheng Meng, Ziyan Wu, Terrence Chen, Fan Yang, Dinggang Shen,

(参考訳) 任意のマルチビュー画像からのヒューマンメッシュリカバリには、任意のカメラポーズと、任意の数のカメラビューの2つの特徴がある。可変性のため、このタスクに取り組むために統一されたフレームワークを設計することは困難である。この課題は、フレキシビリティを維持しつつ、任意のカメラのポーズを同時に推定し、任意のマルチビューイメージから人間のメッシュを復元できるというジレンマとして要約できる。このジレンマを解決するために、任意の多視点画像からの統一的なヒューマンメッシュリカバリ(U-HMR)のための分割統治フレームワークを提案する。特にU-HMRは、分離構造であるカメラとボディーのデカップリング(CBD)と、カメラポーズ推定(CPE)および任意のビュー融合(AVF)の2つの主要コンポーネントから構成される。カメラのポーズと人体メッシュが互いに独立しているため、CBDはそれらの推定を2つのサブタスクに分割し、2つのサブネットワーク(すなわちCPEとAVF)でそれぞれ処理する。 CPEでは、各カメラのポーズは他のカメラと無関係であるため、すべてのビューを並列に処理するために共有MLPを採用する。 AVFでは、マルチビュー情報を融合して融合操作をビュー数に依存しないものにするため、SMPLパラメータクエリトークンを用いたトランスフォーマーデコーダを導入し、メッシュリカバリのためのクロスビュー機能を抽出する。提案するフレームワークの有効性と柔軟性、および各コンポーネントの効果を実証するため,Human3.6M,MPI-INF-3DHP,TotalCaptureの3つの公開データセットに対して広範な実験を行った。

Human mesh recovery from arbitrary multi-view images involves two characteristics: arbitrary camera poses and an arbitrary number of camera views. Because of this variability, designing a unified framework to tackle the task is challenging. The challenges can be summarized as the dilemma of being able to simultaneously estimate arbitrary camera poses and recover human mesh from arbitrary multi-view images while maintaining flexibility. To solve this dilemma, we propose a divide and conquer framework for Unified Human Mesh Recovery (U-HMR) from arbitrary multi-view images. In particular, U-HMR consists of a decoupled structure, camera and body decoupling (CBD), and two main components, camera pose estimation (CPE) and arbitrary view fusion (AVF). As camera poses and human body mesh are independent of each other, CBD splits their estimation into two sub-tasks handled by two individual sub-networks (i.e., CPE and AVF), so that the two sub-tasks are disentangled. In CPE, since each camera pose is unrelated to the others, we adopt a shared MLP to process all views in parallel. In AVF, in order to fuse multi-view information and make the fusion operation independent of the number of views, we introduce a transformer decoder with an SMPL parameter query token to extract cross-view features for mesh recovery. To demonstrate the efficacy and flexibility of the proposed framework and the effect of each component, we conduct extensive experiments on three public datasets: Human3.6M, MPI-INF-3DHP, and TotalCapture.
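
The reason a query token makes fusion independent of the number of views can be shown with a stripped-down attention pooling step. This is a toy sketch, not the U-HMR architecture: it is a single attention head with no learned projections, and the query and view feature vectors are made-up numbers.

```python
import math


def attention_pool(query, view_features):
    """Fuse any number of per-view feature vectors into one vector using a single query."""
    d = len(query)
    # Scaled dot-product score between the query and each view feature.
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(d)
        for feat in view_features
    ]
    # Numerically stable softmax over views.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of view features: output size depends only on d, not on the view count.
    return [
        sum(w * feat[i] for w, feat in zip(weights, view_features))
        for i in range(d)
    ]


query = [0.5, -0.2, 0.1]  # hypothetical stand-in for a learned query token
two_views = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
four_views = two_views + [[2.0, 2.0, 0.0], [-1.0, 0.5, 0.5]]

fused_two = attention_pool(query, two_views)
fused_four = attention_pool(query, four_views)
fused_four_reordered = attention_pool(query, list(reversed(four_views)))
print(fused_two)
print(fused_four)
```

The output has the same length for two or four views, and reordering the views does not change the result, which is the property the fusion step needs.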

翻訳日:2024-06-19 05:17:19 公開日:2024-06-17

# NovelQA:20万件の文書に関するベンチマーク質問

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens ( http://arxiv.org/abs/2403.12766v2 )

ライセンス: Link先を確認

Cunxiang Wang, Ruoxi Ning, Boqi Pan, Tonghui Wu, Qipeng Guo, Cheng Deng, Guangsheng Bao, Xiangkun Hu, Zheng Zhang, Qian Wang, Yue Zhang,

(参考訳) 大規模言語モデル(LLM)の急速な進歩は、特に長文情報の理解と処理において、自然言語処理における新たなフロンティアを導入している。しかしながら、これらのモデルの長期コンテキスト能力の評価は、現在のベンチマークの限界のため、依然として課題である。このギャップに対処するために,拡張テキストでLLMの能力をテストするためのベンチマークであるNovelQAを紹介する。 NovelQAは英語の小説から作られており、複雑さ、長さ、物語のコヒーレンスを独特にブレンドしており、LLMの深いテキスト理解を評価するのに理想的なツールである。本稿では,手作業によるアノテーションと多様な質問タイプに焦点を当てながら,NovelQAの設計と構築について述べる。 NovelQA上でのLong-context LLMの評価では、特にマルチホップ推論、詳細指向の質問、および平均20万トークン以上の非常に長い入力で直面する課題について、モデルの性能に関する重要な洞察が明らかにされている。その結果,LLMの長文理解を改善するためのさらなる進歩の必要性が浮き彫りになった。

The rapid advancement of Large Language Models (LLMs) has introduced a new frontier in natural language processing, particularly in understanding and processing long-context information. However, the evaluation of these models' long-context abilities remains a challenge due to the limitations of current benchmarks. To address this gap, we introduce NovelQA, a benchmark specifically designed to test the capabilities of LLMs with extended texts. Constructed from English novels, NovelQA offers a unique blend of complexity, length, and narrative coherence, making it an ideal tool for assessing deep textual understanding in LLMs. This paper presents the design and construction of NovelQA, highlighting its manual annotation and diverse question types. Our evaluation of long-context LLMs on NovelQA reveals significant insights into the models' performance, particularly emphasizing the challenges they face with multi-hop reasoning, detail-oriented questions, and extremely long inputs with an average length of more than 200,000 tokens. The results underscore the necessity for further advancements in LLMs to improve their long-context comprehension.

翻訳日:2024-06-19 05:17:19 公開日:2024-06-17

# 拡散復元モデルに基づく超音波イメージング

Ultrasound Imaging based on the Variance of a Diffusion Restoration Model ( http://arxiv.org/abs/2403.15316v2 )

ライセンス: Link先を確認

Yuxin Zhang, Clément Huneau, Jérôme Idier, Diana Mateus,

(参考訳) 今日の医学における超音波画像の普及にもかかわらず、超音波の信号対雑音比は、いくつかのノイズやアーチファクトの影響を受けている。さらに、超音波画像品質の向上には、コントラスト、解像度、スペックル保存といった同時的な要因のバランスが伴う。近年,超音波画像再構成の問題に対処するモデルベースと学習ベースの両方のアプローチが進展している。両者の長所を活かし, 生成的デノイジング拡散モデルから得られた学習ベースの事前分布と超音波線形直接モデルを組み合わせたハイブリッド再構成手法を提案する。より具体的には、事前訓練されたDDRM(Denoising Diffusion Restoration Model)の教師なし微調整に頼る。本稿では,超音波固有の乗法ノイズの性質を考慮し,超音波画像の拡散再構成の確率性を特徴付ける経験的モデルを提案し,その分散がエコー輝度マップの推定量として有用であることを示す。本研究では, 合成, 生体外(in-vitro), 生体内(in-vivo)データに関する実験を行い, 単一平面波取得による高画質画像再構成および最先端手法との比較において, 分散イメージング手法の有効性を実証した。コードは、https://github.com/Yuxin-Zhang-Jasmine/DRUSvar で入手できる。

Despite today's prevalence of ultrasound imaging in medicine, ultrasound signal-to-noise ratio is still affected by several sources of noise and artefacts. Moreover, enhancing ultrasound image quality involves balancing concurrent factors like contrast, resolution, and speckle preservation. Recently, there has been progress in both model-based and learning-based approaches addressing the problem of ultrasound image reconstruction. Bringing the best from both worlds, we propose a hybrid reconstruction method combining an ultrasound linear direct model with a learning-based prior coming from a generative Denoising Diffusion model. More specifically, we rely on the unsupervised fine-tuning of a pre-trained Denoising Diffusion Restoration Model (DDRM). Given the nature of multiplicative noise inherent to ultrasound, this paper proposes an empirical model to characterize the stochasticity of diffusion reconstruction of ultrasound images, and shows the interest of its variance as an echogenicity map estimator. We conduct experiments on synthetic, in-vitro, and in-vivo data, demonstrating the efficacy of our variance imaging approach in achieving high-quality image reconstructions from single plane-wave acquisitions and in comparison to state-of-the-art methods. The code is available at: https://github.com/Yuxin-Zhang-Jasmine/DRUSvar

翻訳日:2024-06-19 05:17:19 公開日:2024-06-17

# 有限量子系におけるワームホールテレポーテーションの忠実性

Fidelity of Wormhole Teleportation in Finite-qubit Systems ( http://arxiv.org/abs/2403.16793v3 )

ライセンス: Link先を確認

Zeyu Liu, Pengfei Zhang,

(参考訳) 量子科学と技術の急速な発展は、量子シミュレーションによって量子多体システムを解釈できる時代へと導く。ホログラフィーの双対性は、強い相互作用を持つ系から重力と時空が創発しうることを述べており、実験的に実現不可能な高エネルギーを掘り下げることなく、重力物理学の実験研究のための自然な道を提供する。顕著な例として、ワームホール・テレポーテーションプロトコルを通した通過可能なワームホールのシミュレーションがあり、理論的にも実験的にも注目されている。本研究では、全対全の相互作用を持つ$N$量子ビットシステムにおけるワームホールテレポーテーションの忠実度(相互情報量とエンタングルメント負性によって定量化される)を計算するための理論的枠組みを開発する。主な手法はスクランブロン有効理論であり、一般的なカオス系における普遍的な時間外順序相関を捉えている。ほぼ最大カオスの強い相互作用系を用いて半古典的な通過可能ワームホールのプローブ極限をシミュレートするためには、両システム間の強い結合が不可欠であることを示す。しかし、テレポーテーション信号はシステムサイズ$N$を小さくすると急速に減少するため、Sachdev-Ye-Kitaevモデルのシミュレーションによって創発的幾何学の鋭いシグネチャを観測するには多数の量子ビットが必要となる。このシグネチャには、信号の因果的な時間順序と、結合の符号の違いによるテレポーテーション信号の非対称性の両方が含まれる。比較として、弱い相互作用を持つシステムでは、$N$を減少させるとテレポーテーション信号が増加する。また、フェルミオン弦作用素における一般化符号化スキームの忠実度も解析する。

The rapid development of quantum science and technology is leading us into an era where quantum many-body systems can be comprehended through quantum simulations. Holographic duality, which states that gravity and spacetime can emerge from strongly interacting systems, then offers a natural avenue for the experimental study of gravity physics without delving into experimentally infeasible high energies. A prominent example is the simulation of traversable wormholes through the wormhole teleportation protocol, attracting both theoretical and experimental attention. In this work, we develop the theoretical framework for computing the fidelity of wormhole teleportation in $N$-qubit systems with all-to-all interactions, quantified by mutual information and entanglement negativity. The main technique is the scramblon effective theory, which captures universal out-of-time-order correlations in generic chaotic systems. We clarify that strong couplings between the two systems are essential for simulating the probe limit of semi-classical traversable wormholes using strongly interacting systems with near-maximal chaos. However, the teleportation signal diminishes rapidly when reducing the system size $N$, requiring a large number of qubits to observe a sharp signature of emergent geometry by simulating the Sachdev-Ye-Kitaev model. This includes both the causal time-order of signals and the asymmetry of the teleportation signal for coupling with different signs. As a comparison, the teleportation signal increases when reducing $N$ in weakly interacting systems. We also analyze the fidelity of the generalized encoding scheme in fermionic string operators.

翻訳日:2024-06-19 05:17:19 公開日:2024-06-17

# Beyond Embeddings: Visual ReasoningにおけるVisual Tableの約束

Beyond Embeddings: The Promise of Visual Table in Visual Reasoning ( http://arxiv.org/abs/2403.18252v2 )

ライセンス: Link先を確認

Yiwu Zhong, Zi-Yuan Hu, Michael R. Lyu, Liwei Wang,

(参考訳) 視覚表現学習はコンピュータビジョンの基盤であり、視覚埋め込み、構造記号、テキストベースの表現などの典型的な形式を含んでいる。 CLIP型視覚埋め込みの成功にもかかわらず、視覚的推論にとって重要な世界知識へのアクセスが欠如していることが多い。本研究では,視覚的推論に適した新しい視覚表現形式である視覚表を提案する。ビジュアルテーブルは、視覚シーンの階層的な記述として構築され、シーン記述とカテゴリ、属性、知識を含む複数のオブジェクト中心の記述が特徴である。構造的およびテキスト的フォーマットのおかげで、ビジュアルテーブルは、解釈可能性や制御可能な編集など、単に視覚的な埋め込みよりも独特なアドバンテージを提供する。さらに、視覚的推論に不可欠な、インスタンスレベルの世界知識と詳細な属性を提供する。ビジュアルテーブルを作成するために、収集された小さなアノテーションを用いてデータセット上で訓練されたジェネレータを開発する。 11の視覚的推論ベンチマークの結果は、生成した視覚表が、以前の構造的およびテキストベースの表現よりも大幅に優れていたことを示している。さらに、さまざまなベンチマークで最先端のマルチモーダルな大規模言語モデルを強化し、視覚的推論タスクを前進させる可能性を示している。私たちのコードは https://github.com/LaVi-Lab/Visual-Table で利用可能です。

Visual representation learning has been a cornerstone in computer vision, involving typical forms such as visual embeddings, structural symbols, and text-based representations. Despite the success of CLIP-type visual embeddings, they often lack access to world knowledge critical for visual reasoning. In this work, we propose Visual Table, a novel form of visual representation tailored for visual reasoning. Visual tables are constructed as hierarchical descriptions of visual scenes, featuring a scene description and multiple object-centric descriptions covering categories, attributes, and knowledge. Thanks to the structural and textual formats, visual tables offer unique advantages over mere visual embeddings, such as interpretability and controllable editing. Furthermore, they deliver instance-level world knowledge and detailed attributes that are essential for visual reasoning. To create visual tables, we develop a generator trained on the dataset with collected, small-scale annotations. Extensive results on 11 visual reasoning benchmarks demonstrate that the generated visual tables significantly outperform previous structural and text-based representations. Moreover, they consistently enhance state-of-the-art multimodal large language models across diverse benchmarks, showcasing their potential for advancing visual reasoning tasks. Our code is available at https://github.com/LaVi-Lab/Visual-Table.
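
The hierarchical layout described above (one scene description plus object-centric entries covering category, attributes, and knowledge) might be represented roughly as follows. This is only an illustrative sketch: the class names, field names, and the example scene are hypothetical and are not taken from the paper or its code.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ObjectEntry:
    # One object-centric description inside a visual table (hypothetical schema).
    category: str
    attributes: list
    knowledge: str


@dataclass
class VisualTable:
    # Scene-level description plus a list of object entries (hypothetical schema).
    scene_description: str
    objects: list = field(default_factory=list)


table = VisualTable(
    scene_description="A kitchen counter next to a window.",
    objects=[
        ObjectEntry(
            category="kettle",
            attributes=["stainless steel", "steaming"],
            knowledge="Used to boil water; steam suggests it was heated recently.",
        ),
        ObjectEntry(
            category="mug",
            attributes=["ceramic", "empty"],
            knowledge="Typically holds hot drinks such as tea or coffee.",
        ),
    ],
)

# Because the table is plain text, it can be inspected, edited, or passed to a language model.
as_text = json.dumps(asdict(table), indent=2)
print(as_text)
```

A textual structure of this kind is what makes the interpretability and controllable editing mentioned in the abstract straightforward: changing an attribute is an ordinary string edit.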

翻訳日:2024-06-19 05:17:19 公開日:2024-06-17

# 雑音リンク上の分散最大合意

Distributed Maximum Consensus over Noisy Links ( http://arxiv.org/abs/2403.18509v2 )

ライセンス: Link先を確認

Ehsan Lari, Reza Arablouei, Naveen K. D. Venkategowda, Stefan Werner,

(参考訳) 本稿では,雑音の多い通信リンクが存在する場合のマルチエージェントネットワーク内の最大値を推定する分散アルゴリズムRD-MCを提案する。提案手法では,最大合意問題を分散最適化問題として再定義し,乗算器の交互方向法を用いて解を求める。複数のノイズ破損推定セットに依存する既存のアルゴリズムとは異なり、RD-MCは単一のセットを採用し、堅牢性と効率性を向上させる。リンクノイズの影響をさらに緩和し、ロバスト性を向上させるため、移動平均化を局所推定に適用する。大規模なシミュレーションにより,RD-MCは既存の最大合意アルゴリズムに比べて通信リンクノイズに対してかなり頑健であることを示す。

We introduce a distributed algorithm, termed noise-robust distributed maximum consensus (RD-MC), for estimating the maximum value within a multi-agent network in the presence of noisy communication links. Our approach entails redefining the maximum consensus problem as a distributed optimization problem, allowing a solution using the alternating direction method of multipliers. Unlike existing algorithms that rely on multiple sets of noise-corrupted estimates, RD-MC employs a single set, enhancing both robustness and efficiency. To further mitigate the effects of link noise and improve robustness, we apply moving averaging to the local estimates. Through extensive simulations, we demonstrate that RD-MC is significantly more robust to communication link noise compared to existing maximum-consensus algorithms.
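
To see why link noise is a problem in the first place, the following sketch simulates the naive max-consensus baseline, not RD-MC itself. The ring topology, the Gaussian noise model, and all names are assumptions made for illustration. Each agent keeps the largest value it has received, so any positive noise sample is locked in and the estimates can only move upward.

```python
import random


def max_consensus(values, neighbors, noise_std, iterations, seed=0):
    """Naive max-consensus: each node keeps the largest value it has seen.

    Every transmitted estimate is perturbed by zero-mean Gaussian link noise
    with standard deviation noise_std (an assumed noise model).
    """
    rng = random.Random(seed)
    estimates = list(values)
    history = [list(estimates)]
    for _ in range(iterations):
        updated = []
        for i, own in enumerate(estimates):
            # Values arriving over noisy links from the neighbours of node i.
            received = [estimates[j] + rng.gauss(0.0, noise_std) for j in neighbors[i]]
            updated.append(max([own] + received))
        estimates = updated
        history.append(list(estimates))
    return estimates, history


# Ring of five agents; the true maximum is 9.0.
values = [3.0, 9.0, 1.0, 4.0, 7.0]
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}

clean, _ = max_consensus(values, ring, noise_std=0.0, iterations=10)
noisy, noisy_history = max_consensus(values, ring, noise_std=0.5, iterations=200)
```

Without noise every agent settles on the true maximum. With noise the estimates never decrease, which is the kind of error accumulation that the optimization-based reformulation and the moving average in the abstract are meant to counter.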

翻訳日:2024-06-19 05:17:19 公開日:2024-06-17

# SGCNeRF:Sparse Geometric Consistency GuidanceによるFew-Shot Neural Rendering

SGCNeRF: Few-Shot Neural Rendering via Sparse Geometric Consistency Guidance ( http://arxiv.org/abs/2404.00992v2 )

ライセンス: Link先を確認

Yuru Xiao, Xianming Liu, Deming Zhai, Kui Jiang, Junjun Jiang, Xiangyang Ji,

(参考訳) ニューラル・ラジアンス・フィールド(NeRF)技術は、新しい視点の創出に大きく貢献している。しかし、その効果は、わずかに利用可能なビューを扱うときに妨げられ、しばしばオーバーフィッティングによるパフォーマンス低下につながる。 FreeNeRFは、幾何学とテクスチャの両方を漸進的に改善する暗黙の幾何正規化を統合することで、この制限を克服しようとする。それでも、初期低位置符号化帯域は高周波素子を除外する。過度な適合と高周波の詳細の保存を兼ね備えた包括的アプローチの探求は現在も続いている。本研究では,特徴マッチングに基づくスパース幾何正規化モジュールを提案する。このモジュールは、高周波キーポイントをピンポイントすることで、詳細の完全性を保護する。我々は、NeRF反復による幾何やテクスチャの漸進的な改善を通じて、新規なビュー合成を向上するために、SGCNeRFと命名された効果的な数ショットのニューラルレンダリングアーキテクチャを公表する。 LLFFデータセットとDTUデータセットのPSNRの0.7dBと0.6dBの改善により、SGCNeRFは優れた幾何一貫性を持つ結果を得るだけでなく、FreeNeRFを上回る結果が得られることを示した。

Neural Radiance Field (NeRF) technology has made significant strides in creating novel viewpoints. However, its effectiveness is hampered when working with sparsely available views, often leading to performance dips due to overfitting. FreeNeRF attempts to overcome this limitation by integrating implicit geometry regularization, which incrementally improves both geometry and textures. Nonetheless, an initial low positional encoding bandwidth results in the exclusion of high-frequency elements. The quest for a holistic approach that simultaneously addresses overfitting and the preservation of high-frequency details remains ongoing. This study introduces a novel feature matching based sparse geometry regularization module. This module excels in pinpointing high-frequency keypoints, thereby safeguarding the integrity of fine details. Through progressive refinement of geometry and textures across NeRF iterations, we unveil an effective few-shot neural rendering architecture, designated as SGCNeRF, for enhanced novel view synthesis. Our experiments demonstrate that SGCNeRF not only achieves superior geometry-consistent outcomes but also surpasses FreeNeRF, with improvements of 0.7 dB and 0.6 dB in PSNR on the LLFF and DTU datasets, respectively.

翻訳日:2024-06-19 05:07:34 公開日:2024-06-17

# 大規模言語モデルによる関連判断を用いたクエリ性能予測

Query Performance Prediction using Relevance Judgments Generated by Large Language Models ( http://arxiv.org/abs/2404.01012v2 )

ライセンス: Link先を確認

Chuan Meng, Negar Arabzadeh, Arian Askari, Mohammad Aliannejadi, Maarten de Rijke,

(参考訳) クエリ性能予測(QPP)は,人手による関連性判定を用いずに,あるクエリに対する検索システムの検索品質を推定することを目的とする。従来のQPP手法は通常、単一のスカラー値を返し、予測値が特定の情報検索(IR)評価尺度を近似することを要求しないため、次のような欠点がある。 (i) 単一のスカラーでは、特に尺度間の相関が高くない場合に、異なるIR評価尺度を正確に表現できない。 (ii) スカラーだけではQPPの結果を説明できないため、単一のスカラーはQPP手法の解釈可能性を制限する。これらの問題に対処するために,自動生成した関連性判定を用いるQPPフレームワーク(QPP-GenRE)を提案する。QPP-GenREはQPPを,ランキングリスト内の各項目が所与のクエリに関連するかどうかを予測する独立したサブタスクに分解する。これにより、生成した関連性判定を擬似ラベルとして用いて、任意のIR評価尺度を予測できる。また、予測したIR評価尺度を解釈し、生成した関連性判定の誤りを特定・追跡・修正してQPPの品質を向上させることもできる。科学的再現性を確保するため,オープンソースの大規模言語モデル(LLM)を用いて項目の関連性を予測する。主な課題は2つある。 (i) 再現率(recall)を考慮する尺度を予測するためにコーパス全体を判定すると計算コストが過大になること、 (ii) オープンソースLLMをゼロショット/少数ショットでプロンプトした場合の性能が限られること、である。これらの課題を解決するため、再現率を考慮するIR尺度を予測するための近似戦略を考案し、人手でラベル付けした関連性判定を用いてオープンソースLLMを微調整することを提案する。 TREC 2019-2022のディープラーニングトラックでの実験により、QPP-GenREは語彙的ランカーとニューラルランカーの両方で最先端のQPP品質を達成することを示す。

Query performance prediction (QPP) aims to estimate the retrieval quality of a search system for a query without human relevance judgments. Previous QPP methods typically return a single scalar value and do not require the predicted values to approximate a specific information retrieval (IR) evaluation measure, leading to certain drawbacks: (i) a single scalar is insufficient to accurately represent different IR evaluation measures, especially when metrics do not highly correlate, and (ii) a single scalar limits the interpretability of QPP methods because solely using a scalar is insufficient to explain QPP results. To address these issues, we propose a QPP framework using automatically generated relevance judgments (QPP-GenRE), which decomposes QPP into independent subtasks of predicting the relevance of each item in a ranked list to a given query. This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels. This also allows us to interpret predicted IR evaluation measures, and identify, track and rectify errors in generated relevance judgments to improve QPP quality. We predict an item's relevance by using open-source large language models (LLMs) to ensure scientific reproducibility. We face two main challenges: (i) excessive computational costs of judging an entire corpus for predicting a metric considering recall, and (ii) limited performance in prompting open-source LLMs in a zero-/few-shot manner. To solve the challenges, we devise an approximation strategy to predict an IR measure considering recall and propose to fine-tune open-source LLMs using human-labeled relevance judgments. Experiments on the TREC 2019-2022 deep learning tracks show that QPP-GenRE achieves state-of-the-art QPP quality for both lexical and neural rankers.
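
The central idea, predicting per-item relevance and then computing any IR measure from those pseudo-labels, can be sketched as follows. The label list is invented for illustration and stands in for LLM output. The two measures are standard precision-oriented metrics, and the approximation strategy for recall-oriented measures is not reproduced here.

```python
def precision_at_k(judgments, k):
    """Fraction of the top-k items judged relevant (1) rather than not (0)."""
    top = judgments[:k]
    return sum(top) / k


def reciprocal_rank_at_k(judgments, k):
    """1 / rank of the first relevant item in the top k, or 0.0 if none."""
    for rank, label in enumerate(judgments[:k], start=1):
        if label:
            return 1.0 / rank
    return 0.0


# Hypothetical generated binary judgments for one ranked list, best rank first.
pseudo_labels = [0, 1, 1, 0, 1, 0, 0, 0, 0, 0]

predicted_p10 = precision_at_k(pseudo_labels, 10)
predicted_rr10 = reciprocal_rank_at_k(pseudo_labels, 10)
```

Since the prediction is a list of per-item labels rather than one scalar, a wrong estimate can be traced to the specific items that were misjudged.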

翻訳日:2024-06-19 05:07:34 公開日:2024-06-17

# Rydberg Superatoms: 量子情報処理と量子光学のための人工量子システム

Rydberg superatoms: An artificial quantum system for quantum information processing and quantum optics ( http://arxiv.org/abs/2404.05330v2 )

ライセンス: Link先を確認

Xiao-Qiang Shao, Shi-Lei Su, Lin Li, Rejish Nath, Jin-Hui Wu, Weibin Li,

(参考訳) リュードベリ励起をもつ高密度原子集団は、強い長距離の双極子-双極子相互作用を介した興味深い集団効果を示す。これらの集団効果は、しばしばリュードベリ超原子を用いてモデル化され、量子情報処理や量子光学への応用可能性から、様々な分野で大きな注目を集めている。本レビュー論文では,リュードベリ相互作用の理論的基礎を掘り下げ,その操作と検出のための実験手法を概観する。また、リュードベリ集団効果を量子計算や光量子技術に活用する最新の進展についても論じる。理論研究と実験実証から得られた知見を総合することで、急速に発展するこの分野と、それが量子技術の将来に及ぼしうる影響について包括的な概観を提供する。

Dense atom ensembles with Rydberg excitations display intriguing collective effects mediated by their strong, long-range dipole-dipole interactions. These collective effects, often modeled using Rydberg superatoms, have gained significant attention across various fields due to their potential applications in quantum information processing and quantum optics. In this review article, we delve into the theoretical foundations of Rydberg interactions and explore experimental techniques for their manipulation and detection. We also discuss the latest advancements in harnessing Rydberg collective effects for quantum computation and optical quantum technologies. By synthesizing insights from theoretical studies and experimental demonstrations, we aim to provide a comprehensive overview of this rapidly evolving field and its potential impact on the future of quantum technologies.

翻訳日:2024-06-19 05:07:34 公開日:2024-06-17

# YOLC: 空撮画像の微小物体検出のためにクラスタだけを見る

YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images ( http://arxiv.org/abs/2404.06180v2 )

ライセンス: Link先を確認

Chenguang Liu, Guangshuai Gao, Ziyue Huang, Zhenghui Hu, Qingjie Liu, Yunhong Wang,

(参考訳) 空中画像から物体を検出することは、以下の要因により大きな課題となる。 1) 空中画像は一般に非常に大きなサイズを持ち、一般に数百万または数億のピクセルを持つが、計算資源は限られている。 2) 対象物の大きさが小さいと, 有効検出に十分な情報が得られない。 3)不均一なオブジェクト分布は計算資源の浪費につながる。これらの問題に対処するために、我々は、アンカーフリーなオブジェクト検出器であるCenterNet上に構築された効率的で効果的なフレームワークであるYOLC(You Only Look Clusters)を提案する。大規模画像や非一様オブジェクトの分布がもたらす課題を克服するため,正確な検出のためにクラスタ領域のズームインを適応的に検索するローカルスケールモジュール(LSM)を導入する。さらに、ガウスワッサーシュタイン距離(GWD)を用いて回帰損失を修正し、高品質なバウンディングボックスを得る。検出ヘッドに変形可能な畳み込み・精細化法を用い、小型物体の検出を強化する。 Visdrone2019 と UAVDT を含む2つの航空画像データセットに対する広範な実験を行い、提案手法の有効性と優位性を実証した。コードはhttps://github.com/dawn-ech/YOLCで入手できる。

Detecting objects from aerial images poses significant challenges due to the following factors: 1) Aerial images typically have very large sizes, generally with millions or even hundreds of millions of pixels, while computational resources are limited. 2) Small object size leads to insufficient information for effective detection. 3) Non-uniform object distribution leads to computational resource wastage. To address these issues, we propose YOLC (You Only Look Clusters), an efficient and effective framework that builds on an anchor-free object detector, CenterNet. To overcome the challenges posed by large-scale images and non-uniform object distribution, we introduce a Local Scale Module (LSM) that adaptively searches cluster regions for zooming in for accurate detection. Additionally, we modify the regression loss using Gaussian Wasserstein distance (GWD) to obtain high-quality bounding boxes. Deformable convolution and refinement methods are employed in the detection head to enhance the detection of small objects. We perform extensive experiments on two aerial image datasets, including Visdrone2019 and UAVDT, to demonstrate the effectiveness and superiority of our proposed approach. Code is available at https://github.com/dawn-ech/YOLC.
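
For readers unfamiliar with the Gaussian Wasserstein distance mentioned above, here is a minimal sketch for axis-aligned boxes. It assumes the common modelling choice of treating a box `(cx, cy, w, h)` as a 2-D Gaussian with mean `(cx, cy)` and covariance `diag(w^2/4, h^2/4)`. How YOLC turns this distance into a regression loss is not stated in the abstract and is not shown.

```python
def gwd_squared(box_a, box_b):
    """Squared 2-Wasserstein distance between two axis-aligned boxes.

    Each box is (cx, cy, w, h) and is modelled as a 2-D Gaussian with mean
    (cx, cy) and covariance diag(w^2 / 4, h^2 / 4).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared distance between the two Gaussian means (box centres).
    center_term = (cxa - cxb) ** 2 + (cya - cyb) ** 2
    # Squared Frobenius distance between the covariance square roots,
    # which are diag(w / 2, h / 2) for diagonal covariances.
    shape_term = ((wa - wb) ** 2 + (ha - hb) ** 2) / 4.0
    return center_term + shape_term


same = gwd_squared((0, 0, 2, 2), (0, 0, 2, 2))
shifted = gwd_squared((0, 0, 2, 2), (3, 4, 2, 2))
resized = gwd_squared((0, 0, 2, 2), (0, 0, 4, 6))
```

Unlike IoU, this distance stays informative when two small boxes do not overlap at all, which is one reason such distances are attractive for tiny objects.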

翻訳日:2024-06-19 05:07:34 公開日:2024-06-17

# すべての $4\times 4$ 対合的(involutory)MDS 行列の体系的構成法

A Systematic Construction Approach for All $4\times 4$ Involutory MDS Matrices ( http://arxiv.org/abs/2404.08250v2 )

ライセンス: Link先を確認

Yogesh Kumar, P. R. Mishra, Susanta Samanta, Atul Gaur,

(参考訳) 最大距離分離(MDS)行列は、符号理論だけでなく、ブロック暗号やハッシュ関数の設計においても重要な役割を果たす。特に興味深いのは対合的(involutory)MDS行列であり、ハードウェア実装において暗号化と復号の両方に単一の回路を使用できる。本稿では、偶数次の対合的MDS行列のいくつかの特徴付けを与える。さらに、偶数次のすべての対合的MDS行列を得るための新しい行列形式を導入し、文献で知られている他の行列形式と比較する。次に、有限体 $\mathbb{F}_{2^m}$ 上のすべての $4 \times 4$ 対合的MDS行列を体系的に構成する手法を提案する。この手法は、対合的MDS行列のクラス代表行列に着目することで探索空間を大幅に削減し、すべての $4 \times 4$ 対合的行列を考慮する場合に比べてかなり小さい集合の中から、そのような行列をすべて生成する。具体的には、これらの代表行列を濃度 $(2^m-1)^5$ の集合の中で探索する。この手法により、$m=3,4,\ldots,8$ について、$\mathbb{F}_{2^m}$ 上の $4 \times 4$ 対合的MDS行列の総数を明示的に数え上げる。

Maximum distance separable (MDS) matrices play a crucial role not only in coding theory but also in the design of block ciphers and hash functions. Of particular interest are involutory MDS matrices, which facilitate the use of a single circuit for both encryption and decryption in hardware implementations. In this article, we present several characterizations of involutory MDS matrices of even order. Additionally, we introduce a new matrix form for obtaining all involutory MDS matrices of even order and compare it with other matrix forms available in the literature. We then propose a technique to systematically construct all $4 \times 4$ involutory MDS matrices over a finite field $\mathbb{F}_{2^m}$. This method significantly reduces the search space by focusing on involutory MDS class representative matrices, leading to the generation of all such matrices within a substantially smaller set compared to considering all $4 \times 4$ involutory matrices. Specifically, our approach involves searching for these representative matrices within a set of cardinality $(2^m-1)^5$. Through this method, we provide an explicit enumeration of the total number of $4 \times 4$ involutory MDS matrices over $\mathbb{F}_{2^m}$ for $m=3,4,\ldots,8$.
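
The two properties in question can be checked by brute force for a single candidate matrix, as sketched below. A matrix is involutory when it is its own inverse, and a square matrix is MDS exactly when every square submatrix is nonsingular. The field $\mathbb{F}_{2^8}$, the reduction polynomial, and the Hadamard-form test matrix are assumptions chosen for illustration, not the matrix form proposed in the paper.

```python
from itertools import combinations

M_BITS = 8
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, an assumed irreducible polynomial


def gf_mul(a, b):
    """Multiply two elements of GF(2^M_BITS) (shift-and-add with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M_BITS:
            a ^= POLY
    return result


def gf_inv(a):
    """Inverse of a nonzero element, computed as a^(2^m - 2)."""
    result, base, exponent = 1, a, (1 << M_BITS) - 2
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result


def mat_mul(x, y):
    n = len(x)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc ^= gf_mul(x[i][k], y[k][j])
            out[i][j] = acc
    return out


def det(matrix):
    """Determinant by Gaussian elimination; signs vanish in characteristic 2."""
    rows = [row[:] for row in matrix]
    n = len(rows)
    result = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col]), None)
        if pivot is None:
            return 0
        rows[col], rows[pivot] = rows[pivot], rows[col]
        result = gf_mul(result, rows[col][col])
        inv = gf_inv(rows[col][col])
        for r in range(col + 1, n):
            factor = gf_mul(rows[r][col], inv)
            if factor:
                for c in range(col, n):
                    rows[r][c] ^= gf_mul(factor, rows[col][c])
    return result


def is_involutory(matrix):
    n = len(matrix)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    return mat_mul(matrix, matrix) == identity


def is_mds(matrix):
    """Check that every square submatrix has a nonzero determinant."""
    n = len(matrix)
    for size in range(1, n + 1):
        for rows in combinations(range(n), size):
            for cols in combinations(range(n), size):
                sub = [[matrix[r][c] for c in cols] for r in rows]
                if det(sub) == 0:
                    return False
    return True


def hadamard(a, b, c, d):
    return [[a, b, c, d], [b, a, d, c], [c, d, a, b], [d, c, b, a]]


candidate = hadamard(0x01, 0x02, 0x04, 0x06)
```

Checking one matrix this way is cheap, but enumerating all $4 \times 4$ candidates over $\mathbb{F}_{2^m}$ is not, which is why restricting the search to class representatives matters.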

翻訳日:2024-06-19 04:57:50 公開日:2024-06-17

# 大きな言語モデルは間違いから進化し続けることができる

Large Language Model Can Continue Evolving From Mistakes ( http://arxiv.org/abs/2404.08707v4 )

ライセンス: Link先を確認

Haokun Zhao, Haixia Han, Jie Shi, Chengyu Du, Jiaqing Liang, Yanghua Xiao,

(参考訳) 世界の知識が進化し、新しいタスクパラダイムが出現するにつれて、大規模言語モデル(LLM)を最新に保ち、その欠点に対処するうえで継続学習(CL)が不可欠になっている。実用的な応用では、LLMが新しいタスクパラダイムに適応し、タスク解決に必要な知識を獲得するために、継続的命令チューニング(CIT)と継続的事前学習(CPT)の両方が必要になることが多い。しかし, 十分な量を保ちながらモデルの知識不足を補うCPTデータを収集することは依然として難しく, そのデータの利用効率を高めることも大きな課題である。「間違いを要約する」という学習スキルに着想を得て,CPTデータを収集するためのデータ効率の高いアプローチを提供し、反復的な評価と誤りに関連する知識の補足によってLLMの性能を継続的に向上させることを目的とした,Continue Evolving from Mistakes(CEM)手法を提案する。これらのCPTデータを効率的に利用し、忘却を軽減するために、CITデータとCPTデータを並列に統合する新しいCL訓練セット構築パラダイムを設計する。大規模な実験によりCEM手法の有効性を実証し、最良の場合で最大17%の精度向上を達成した。さらに追加実験により、CEMを破滅的忘却の緩和手法と組み合わせられる可能性を確認し、反復的かつ継続的なモデルの進化が可能になることを示した。

As world knowledge evolves and new task paradigms emerge, Continual Learning (CL) is crucial for keeping Large Language Models (LLMs) up-to-date and addressing their shortcomings. In practical applications, LLMs often require both continual instruction tuning (CIT) and continual pre-training (CPT) to adapt to new task paradigms and acquire necessary knowledge for task-solving. However, it remains challenging to collect CPT data that addresses the knowledge deficiencies in models while maintaining adequate volume, and improving the efficiency of utilizing this data also presents significant difficulties. Inspired by the 'summarizing mistakes' learning skill, we propose the Continue Evolving from Mistakes (CEM) method, aiming to provide a data-efficient approach for collecting CPT data and continually improving LLMs' performance through iterative evaluation and supplementation with mistake-relevant knowledge. To efficiently utilize these CPT data and mitigate forgetting, we design a novel CL training set construction paradigm that integrates parallel CIT and CPT data. Extensive experiments demonstrate the efficacy of the CEM method, achieving up to a 17% improvement in accuracy in the best case. Furthermore, additional experiments confirm the potential of combining CEM with catastrophic forgetting mitigation methods, enabling iterative and continual model evolution.

翻訳日:2024-06-19 04:57:50 公開日:2024-06-17

# マルコフ連鎖モンテカルロとしての3Dガウシアン・スプラッティング

3D Gaussian Splatting as Markov Chain Monte Carlo ( http://arxiv.org/abs/2404.09591v2 )

ライセンス: Link先を確認

Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Weiwei Sun, Jeff Tseng, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi,

(参考訳) 3D Gaussian Splattingは最近ニューラルレンダリングで広く使われるようになったが、現在の手法はガウシアンを配置するために注意深く設計されたクローニングと分割の戦略に依存しており、レンダリング品質の低下や、良好な初期化への依存を招くことがある。本研究では,3次元ガウシアンの集合を、シーンの物理的表現を記述する基礎的な確率分布から得られたランダムなサンプル、すなわちマルコフ連鎖モンテカルロ(MCMC)サンプルとして捉え直す。この観点のもとで,単にノイズを導入するだけで、3次元ガウシアンの更新をSGLD(Stochastic Gradient Langevin Dynamics)の更新に変換できることを示す。次に,3D Gaussian Splattingにおける高密度化とプルーニングの戦略を、MCMCサンプルの決定論的な状態遷移として書き直し、これらのヒューリスティックをフレームワークから取り除く。そのために、ガウシアンの「クローニング」を、サンプル確率を近似的に保存する再配置スキームに修正する。ガウシアンの効率的な利用を促すために,未使用のガウシアンの除去を促進する正則化項を導入する。様々な標準的な評価シーンにおいて,本手法がレンダリング品質の向上、ガウシアン数の容易な制御、初期化に対する頑健性をもたらすことを示す。

While 3D Gaussian Splatting has recently become popular for neural rendering, current methods rely on carefully engineered cloning and splitting strategies for placing Gaussians, which can lead to poor-quality renderings and reliance on a good initialization. In this work, we rethink the set of 3D Gaussians as a random sample drawn from an underlying probability distribution describing the physical representation of the scene; in other words, Markov Chain Monte Carlo (MCMC) samples. Under this view, we show that the 3D Gaussian updates can be converted into Stochastic Gradient Langevin Dynamics (SGLD) updates by simply introducing noise. We then rewrite the densification and pruning strategies in 3D Gaussian Splatting as simply a deterministic state transition of MCMC samples, removing these heuristics from the framework. To do so, we revise the 'cloning' of Gaussians into a relocalization scheme that approximately preserves sample probability. To encourage efficient use of Gaussians, we introduce a regularizer that promotes the removal of unused Gaussians. On various standard evaluation scenes, we show that our method provides improved rendering quality, easy control over the number of Gaussians, and robustness to initialization.
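
The claim that a gradient update becomes an SGLD update once noise is added can be illustrated with a generic Langevin step. This is a textbook-style sketch on a toy quadratic loss. The loss, the step size, and the `noise_scale` parameter are assumptions, and the paper's own noise term for Gaussian parameters is not reproduced.

```python
import math
import random


def sgld_step(params, grads, step_size, noise_scale, rng):
    """One Langevin update: a gradient step plus Gaussian exploration noise."""
    noise_std = noise_scale * math.sqrt(2.0 * step_size)
    return [
        p - step_size * g + noise_std * rng.gauss(0.0, 1.0)
        for p, g in zip(params, grads)
    ]


def grad_quadratic(params, target):
    """Gradient of 0.5 * ||params - target||^2, a stand-in for a rendering loss."""
    return [p - t for p, t in zip(params, target)]


rng = random.Random(0)
target = [1.0, -2.0, 0.5]
position = [0.0, 0.0, 0.0]
for _ in range(500):
    position = sgld_step(position, grad_quadratic(position, target), 0.05, 0.1, rng)
```

With `noise_scale` set to zero the step reduces to plain gradient descent. With a positive value the parameters keep fluctuating around the minimum instead of settling, which is the exploratory behaviour that lets samples escape a poor initialization.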

翻訳日:2024-06-19 04:57:50 公開日:2024-06-17

# MOWA:マルチインワンイメージワープモデル

MOWA: Multiple-in-One Image Warping Model ( http://arxiv.org/abs/2404.10716v2 )

ライセンス: Link先を確認

Kang Liao, Zongsheng Yue, Zhonghua Wu, Chen Change Loy,

(参考訳) 最近の画像ワープアプローチは既存のベンチマークで顕著に成功したが、特定のタスクごとに個別のモデルをトレーニングする必要があるため、異なるカメラモデルやカスタマイズされた操作にうまく対応できない。本研究で提案するマルチ・イン・ワン・イメージWArpingモデル(MOWA)は,マルチ・イン・ワン・イメージWArpingモデル(Multiple-in-One Image WArping model)である。具体的には、領域レベルと画素レベルの両方で動作推定を遠ざけることで、マルチタスク学習の難しさを軽減する。さらに動的なタスク認識画像のワープを可能にするために,タスクタイプを予測する軽量なポイントベース分類器を導入し,より正確な推定のために特徴マップを変調するプロンプトとして機能する。私たちの知る限り、これは1つのモデルで複数の実用的なワープタスクを解決する最初の作業です。マルチインワンイメージワープのために6つのタスクでトレーニングされたMOWAは、ほとんどのタスクで最先端のタスク固有モデルより優れています。さらに、MOWAは、クロスドメインとゼロショットの評価によって証明されているように、目に見えないシーンに一般化する有望な可能性をも示している。コードとより視覚的な結果は、プロジェクトのページ(https://kangliao929.github.io/projects/mowa/)で見ることができる。

While recent image warping approaches achieved remarkable success on existing benchmarks, they still require training separate models for each specific task and cannot generalize well to different camera models or customized manipulations. To address diverse types of warping in practice, we propose a Multiple-in-One image WArping model (named MOWA) in this work. Specifically, we mitigate the difficulty of multi-task learning by disentangling the motion estimation at both the region level and pixel level. To further enable dynamic task-aware image warping, we introduce a lightweight point-based classifier that predicts the task type, serving as prompts to modulate the feature maps for more accurate estimation. To our knowledge, this is the first work that solves multiple practical warping tasks in one single model. Extensive experiments demonstrate that our MOWA, which is trained on six tasks for multiple-in-one image warping, outperforms state-of-the-art task-specific models across most tasks. Moreover, MOWA also exhibits promising potential to generalize into unseen scenes, as evidenced by cross-domain and zero-shot evaluations. The code and more visual results can be found on the project page: https://kangliao929.github.io/projects/mowa/.

翻訳日:2024-06-19 04:57:50 公開日:2024-06-17

# 位相 $δ_θ=nθ$ をもつダイオン

Dyons with phase $δ_θ=nθ$ ( http://arxiv.org/abs/2404.11622v2 )

ライセンス: Link先を確認

Ricardo Heras,

(参考訳) 最近の論文 (Heras in Eur. Phys. J. Plus 138: 329, 2023) では、電束と磁束を内包する無限に長いソレノイドの周りをダイオンが周回するとき、その波動関数が電磁双対変換の下で不変な量子位相を蓄積することを示した。本稿では、この位相がウィッテン効果と合わさることで、真空角 $\theta$ に比例するトポロジカル位相が生じ、それによってCP対称性の破れと結びつくことを示す。この位相は真空状態において $\delta_{\theta}=n\theta$ と量子化され、この量子化に関連する最も一般的な真空状態は $\theta$ 真空のアーベル的な形と同一視されることを示す。真空中で角 $\theta$ が現れうる2つの仮説的な干渉効果について論じる。

In a recent paper (Heras in Eur. Phys. J. Plus 138: 329, 2023), we have demonstrated that when a dyon encircles an infinitely long solenoid enclosing electric and magnetic fluxes, its wave function accumulates a quantum phase invariant under electromagnetic duality transformations. In this paper, we show that this phase, in conjunction with the Witten effect, gives rise to a topological phase proportional to the vacuum angle $\theta$ and thereby connected with CP violation. We show that this phase becomes quantised in a vacuum state $\delta_{\theta}=n\theta$ and that the most general vacuum state associated with this quantisation identifies with an Abelian form of the $\theta$-vacua. We discuss two hypothetical interference effects in the vacuum where the angle $\theta$ could manifest.

翻訳日:2024-06-19 04:57:50 公開日:2024-06-17

# あいまいさを明示的に扱えるように言語モデルを調整する

Aligning Language Models to Explicitly Handle Ambiguity ( http://arxiv.org/abs/2404.11972v2 )

ライセンス: Link先を確認

Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim,

(参考訳) ユーザと言語モデルエージェントとの対話において、ユーザの発話は効率を優先するために、省略(単語やフレーズの省略)や不正確さ(厳密さの欠如)をしばしば含む。このため、異なる仮定や背景知識に基づいて、同じ入力がさまざまに解釈される可能性がある。したがって、信頼性を確保するには、エージェントがクエリに内在する曖昧さを適切に処理することが不可欠である。しかし、最先端の大規模言語モデル(LLM)でさえ、主に次の2つの障害のために、このようなシナリオでは依然として課題に直面している。(1) LLMは曖昧な発話を扱うように明示的に訓練されていない。 (2) LLMが知覚する曖昧さの程度は、保有する知識によって異なりうる。これらの問題に対処するために、LLM自身による曖昧さの評価(すなわち知覚された曖昧さ)を活用して、曖昧なクエリを扱えるようにLLMをアライメントする新しいパイプラインであるAlignment with Perceived Ambiguity (APA)を提案する。質問応答データセットでの実験結果から、APAにより、LLMは明確な質問に答える能力を維持しながら、曖昧なクエリを明示的に検出して扱えるようになることが示された。さらに,APAは、特に分布外のシナリオにおいて、ゴールドスタンダードのラベルを用いた訓練を上回ることがわかった。

In interactions between users and language model agents, user utterances frequently exhibit ellipsis (omission of words or phrases) or imprecision (lack of exactness) to prioritize efficiency. This can lead to varying interpretations of the same input based on different assumptions or background knowledge. It is thus crucial for agents to adeptly handle the inherent ambiguity in queries to ensure reliability. However, even state-of-the-art large language models (LLMs) still face challenges in such scenarios, primarily due to the following hurdles: (1) LLMs are not explicitly trained to deal with ambiguous utterances; (2) the degree of ambiguity perceived by the LLMs may vary depending on the possessed knowledge. To address these issues, we propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns LLMs to manage ambiguous queries by leveraging their own assessment of ambiguity (i.e., perceived ambiguity). Experimental results on question-answering datasets demonstrate that APA empowers LLMs to explicitly detect and manage ambiguous queries while retaining the ability to answer clear questions. Furthermore, our finding proves that APA excels beyond training with gold-standard labels, especially in out-of-distribution scenarios.

翻訳日:2024-06-19 04:57:50 公開日:2024-06-17

# X-Light: 変圧器上の変圧器をメタマルチエージェント強化学習器として用いた都市横断信号制御

X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner ( http://arxiv.org/abs/2404.12090v3 )

ライセンス: Link先を確認

Haoyuan Jiang, Ziyue Li, Hua Wei, Xuantang Xiong, Jingqing Ruan, Jiaming Lu, Hangyu Mao, Rui Zhao,

(参考訳) 現在の強化学習に基づくアプローチは、複数の信号機間の協調を改善することで、信号制御の有効性を大きく向上させてきた。しかし、多様な都市間で高い転移性をもつマルチエージェント交通信号制御アルゴリズムをどのように得るかという問題が残されている。本稿では,都市横断のメタマルチエージェント交通信号制御のためのTransformer on Transformer(TonT)モデルであるX-Lightを提案する。マルコフ決定過程の完全な軌跡を入力とし,Lower Transformerは都市内の対象交差点とその近傍の状態・行動・報酬を集約し,Upper Transformerは異なる都市にまたがる一般的な意思決定の軌跡を学習する。この二層のアプローチにより、モデルの頑健な汎化性と転移性が高まる。特に、未知のシナリオへ直接転移した場合、本手法はすべてのベースライン手法を平均で+7.91%、場合によっては+16.3%上回り、最良の結果を得る。

The effectiveness of traffic light control has been significantly improved by current reinforcement learning-based approaches via better cooperation among multiple traffic lights. However, a persisting issue remains: how to obtain a multi-agent traffic signal control algorithm with remarkable transferability across diverse cities? In this paper, we propose a Transformer on Transformer (TonT) model for cross-city meta multi-agent traffic signal control, named as X-Light: We input the full Markov Decision Process trajectories, and the Lower Transformer aggregates the states, actions, rewards among the target intersection and its neighbors within a city, and the Upper Transformer learns the general decision trajectories across different cities. This dual-level approach bolsters the model's robust generalization and transferability. Notably, when directly transferring to unseen scenarios, ours surpasses all baseline methods with +7.91% on average, and even +16.3% in some cases, yielding the best results.

翻訳日:2024-06-19 04:57:50 公開日:2024-06-17

# MergeNet: 異種モデル、タスク、モダリティ間の知識マイグレーション

MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities ( http://arxiv.org/abs/2404.13322v2 )

ライセンス: Link先を確認

Kunxi Li, Tianyu Zhan, Kairui Fu, Shengyu Zhang, Kun Kuang, Jiwei Li, Zhou Zhao, Fei Wu,

(参考訳) 本研究では, まったく異なるモデルアーキテクチャ、タスク、モダリティ間での異種知識転移に着目する。既存の知識転移手法(例えば、バックボーン共有、知識蒸留)は、モデル構造やタスク固有の特徴/ラベルに共有要素があることを前提とすることが多く、複雑なモデルタイプやタスクへの転移が制限される。これらの課題を克服するために、異種モデルのパラメータ空間の間のギャップを埋めることを学習し、パラメータ空間内で知識を直接やり取りし、抽出し、適用できるようにするMergeNetを提案する。 MergeNetの中核となるメカニズムはパラメータアダプタであり、ソースモデルの低ランクパラメータを照会し、パラメータを識別してターゲットモデルへ写像することを学習する。 MergeNetは両方のモデルと同時に学習されるため、ソースモデルの訓練軌跡に関する知識を含め、現在の段階に関連する知識を動的に転移・適応できる。異種知識転移に関する大規模な実験により、代表的なアプローチがうまく機能しない、あるいは適用しにくい困難な設定において、顕著な改善が得られることを示す。

In this study, we focus on heterogeneous knowledge transfer across entirely different model architectures, tasks, and modalities. Existing knowledge transfer methods (e.g., backbone sharing, knowledge distillation) often hinge on shared elements within model structures or task-specific features/labels, limiting transfers to complex model types or tasks. To overcome these challenges, we present MergeNet, which learns to bridge the gap of parameter spaces of heterogeneous models, facilitating the direct interaction, extraction, and application of knowledge within these parameter spaces. The core mechanism of MergeNet lies in the parameter adapter, which operates by querying the source model's low-rank parameters and adeptly learning to identify and map parameters into the target model. MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage, including the training trajectory knowledge of the source model. Extensive experiments on heterogeneous knowledge transfer demonstrate significant improvements in challenging settings, where representative approaches may falter or prove less applicable.

翻訳日:2024-06-19 04:57:50 公開日:2024-06-17

# コヒーレンス測定に基づく電子ビームのウィグナー関数の再構成

Reconstruction of Wigner function of electron beams based on coherence measurements ( http://arxiv.org/abs/2404.13379v2 )

ライセンス: Link先を確認

Shuhei Hatanaka, Jun Yamasaki,

(参考訳) エアリーパターンの強度プロファイルの解析により、電子ビームの密度行列とウィグナー関数を再構成する方法を開発した。透過電子顕微鏡の物体面における密度行列を、コヒーレンス関数と電子波の振幅分布および位相分布を用いて計算した。その後、行列要素を用いてウィグナー関数を再構成した。位相空間の原点におけるウィグナー関数に基づいて軸上輝度を計算する式を導出し、ショットキー電界放出電子銃の軸上輝度を決定した。この軸上輝度は、従来の平均輝度測定よりも正確かつ精密にエミッタの性能を反映する。

We developed a reconstruction method for the density matrix and Wigner function of electron beams through analysis of the Airy pattern intensity profile. The density matrix in a transmission electron microscope object plane was calculated using the coherence function and the electron wave amplitude and phase distributions. The Wigner function was then reconstructed using the matrix elements. Based on the Wigner function at the origin of the phase space, we derived a formula to calculate the axial brightness, and then determined the axial brightness of a Schottky field emission gun, which reflects the emitter performance more accurately and precisely than the conventional mean brightness measurements.

翻訳日:2024-06-19 04:48:05 公開日:2024-06-17

# SnapKV: LLMは、あなたが生成前に探しているものを知っている

SnapKV: LLM Knows What You are Looking for Before Generation ( http://arxiv.org/abs/2404.14469v2 )

ライセンス: Link先を確認

Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen,

(参考訳) 大規模言語モデル(LLM)は長大なコンテキストの処理において顕著な進歩を遂げており、その性能向上にはキーバリュー(KV)キャッシュが重要な役割を果たしている。しかし、入力長の増加に伴うKVキャッシュの増大は、メモリ効率と時間効率の面で課題をもたらす。この問題に対処するため,本稿では,実世界のアプリケーションで同等の性能を保ちながらKVキャッシュサイズを効率よく削減する、微調整不要の革新的なアプローチSnapKVを紹介する。我々は、モデル内の各注意ヘッドが、生成中にプロンプトの特定の注意特徴に一貫して注目することを発見した。また、この頑健なパターンはプロンプトの末尾にある「観測」ウィンドウから得られる。この洞察に基づき、SnapKVは注意ヘッドごとにクラスタ化された重要なKV位置を選択することで、KVキャッシュを自動的に圧縮する。提案手法は,長い入力系列を処理する際に増大する計算オーバーヘッドとメモリフットプリントを大幅に低減する。具体的には、16Kトークンの入力を処理する際に、SnapKVはベースラインと比べて生成速度を3.6倍、メモリ効率を8.2倍に高めつつ、一貫した復号速度を達成する。同時に、16の長系列データセットにわたってベースラインモデルと同等の性能を維持する。さらに、SnapKVはHuggingFace実装にわずかな変更を加えるだけで、単一のA100-80GB GPU上で最大380Kのコンテキストトークンを処理でき、Needle-in-a-Haystackテストでの精度低下は無視できる程度にとどまる。さらなる包括的な研究は、SnapKVの実用的な応用の可能性を示唆している。

Large Language Models (LLMs) have made remarkable progress in processing extensive contexts, with the Key-Value (KV) cache playing a vital role in enhancing their performance. However, the growth of the KV cache in response to increasing input length poses challenges to memory and time efficiency. To address this problem, this paper introduces SnapKV, an innovative and fine-tuning-free approach that efficiently minimizes KV cache size while still delivering comparable performance in real-world applications. We discover that each attention head in the model consistently focuses on specific prompt attention features during generation. Meanwhile, this robust pattern can be obtained from an 'observation' window located at the end of the prompts. Drawing on this insight, SnapKV automatically compresses KV caches by selecting clustered important KV positions for each attention head. Our approach significantly reduces the growing computational overhead and memory footprint when processing long input sequences. Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to the baseline when processing inputs of 16K tokens. At the same time, it maintains comparable performance to the baseline models across 16 long sequence datasets. Moreover, SnapKV can process up to 380K context tokens on a single A100-80GB GPU using HuggingFace implementation with minor changes, exhibiting only a negligible accuracy drop in the Needle-in-a-Haystack test. Further comprehensive studies suggest SnapKV's potential for practical applications.
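
The selection step described above can be sketched for a single attention head as follows. This is a simplified illustration in plain Python, not the released implementation: the function name and the toy attention matrix are hypothetical, and max pooling with a fixed kernel is an assumption about how "clustered" positions are obtained. Queries in the observation window at the end of the prompt vote for earlier key positions; the top-scoring positions are kept together with the window itself.

```python
def select_kv_positions(attn, window, budget, kernel=3):
    """Pick prompt positions to keep in the KV cache for one attention head.

    attn[q][k] is the attention weight from prompt query q to key k.
    The last `window` queries form the observation window; they vote on the
    earlier (prefix) positions, and the window itself is always kept.
    """
    seq_len = len(attn)
    prefix_len = seq_len - window
    # Sum the votes that observation-window queries give each prefix key.
    votes = [
        sum(attn[q][k] for q in range(prefix_len, seq_len))
        for k in range(prefix_len)
    ]
    # 1-D max pooling so neighbours of a strongly attended key are kept too.
    half = kernel // 2
    pooled = [
        max(votes[max(0, k - half): k + half + 1])
        for k in range(prefix_len)
    ]
    # Stable sort: ties keep their original left-to-right order.
    ranked = sorted(range(prefix_len), key=lambda k: pooled[k], reverse=True)
    kept_prefix = sorted(ranked[:budget])
    return kept_prefix + list(range(prefix_len, seq_len))


# Toy 8-token prompt; only the last two rows (the observation window) matter.
attn = [[0.0] * 8 for _ in range(8)]
attn[6] = [0.0, 0.0, 0.6, 0.0, 0.0, 0.1, 0.3, 0.0]
attn[7] = [0.0, 0.0, 0.5, 0.0, 0.0, 0.1, 0.2, 0.2]

kept = select_kv_positions(attn, window=2, budget=3)
```

In the toy example position 2 receives most of the attention, so pooling also retains its neighbours 1 and 3, and only five of the eight positions stay in the cache.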

翻訳日:2024-06-19 04:48:05 公開日:2024-06-17

# PLAYER*:殺人ミステリーゲームにおけるLLMに基づくマルチエージェントコミュニケーションとインタラクションの強化

PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games ( http://arxiv.org/abs/2404.17662v2 )

ライセンス: Link先を確認

Qinglin Zhu, Runcong Zhao, Jinhua Du, Lin Gui, Yulan He,

(参考訳) 大規模言語モデル(LLM)上に構築された既存のエージェントベースのアプローチは、動的環境において複雑な質問を扱い、対人関係を理解することに限界がある。この限界に対処する新しいフレームワークPLAYER*を提案する。 PLAYER*は,エニータイム型のサンプリングベースプランナと質問駆動の探索フレームワークを用いて,マーダーミステリーゲーム(MMG)における経路計画を強化する。エージェントに一連のセンサーを装備することで、PLAYER*は事前に定義された質問を不要にし、エージェントが複雑な社会的相互作用に対処できるようにする。さらに,多肢選択式の質問を用いた定量的な評価手法を導入し,1,482個の質問応答ペアを含むデータセットWellPlayを提示する。実験の結果、PLAYER*は既存のマルチエージェント手法を上回り、MMGにおけるエージェントの汎化性と適応性を高め、より効果的なマルチエージェント相互作用への道を開くことが示された。

We propose PLAYER*, a novel framework that addresses the limitations of existing agent-based approaches built on Large Language Models (LLMs) in handling complex questions and understanding interpersonal relationships in dynamic environments. PLAYER* enhances path planning in Murder Mystery Games (MMGs) using an anytime sampling-based planner and a questioning-driven search framework. By equipping agents with a set of sensors, PLAYER* eliminates the need for pre-defined questions and enables agents to navigate complex social interactions. We additionally make a contribution by introducing a quantifiable evaluation method using multiple-choice questions and present WellPlay, a dataset containing 1,482 question-answer pairs. Experimental results demonstrate PLAYER*'s superiority over existing multi-agent methods, enhancing the generalisability and adaptability of agents in MMGs and paving the way for more effective multi-agent interactions.

翻訳日:2024-06-19 04:48:05 公開日:2024-06-17

# リモートセンシング画像における高能率メタラーニングによるマルチスケールFew-Shotオブジェクト検出

Efficient Meta-Learning Enabled Lightweight Multiscale Few-Shot Object Detection in Remote Sensing Images ( http://arxiv.org/abs/2404.18426v3 )

ライセンス: Link先を確認

Wenbin Guan, Zijiu Yang, Xiaohong Wu, Liqiong Chen, Feng Huang, Xiaohai He, Honggang Chen,

(参考訳) 現在、リモートセンシング画像(RSI)における少数ショット物体検出(FSOD)が注目を集めている。多くの少数ショット検出器、特に2段階検出器に基づくものは、RSIに固有のマルチスケールの複雑さを扱う際に困難に直面する。さらに、これらの検出器は、大量のデータを扱う際にモデルパラメータが肥大化するため、実世界の応用には実用的でない。これに対し、1段階検出器には高い検出速度や大域的な受容野などの利点がある。そこで、1段階検出器であるYOLOv7をベースラインとして選択し,新しいメタ学習の訓練フレームワークを適用する。これにより、検出器は軽量であるという本来の利点を活かしつつ、FSODタスクに適切に対処できる。さらに, メタ学習戦略によって生成されるサンプルを詳しく調査し, 設計したメタ検出ヘッドが生成するサンプルを保持するための新しいメタサンプリング手法を導入する。考案したメタクロス損失と組み合わせ、見過ごされがちな「負のサンプル」を意図的に利用して、そこから有用な知識を抽出する。このアプローチは、検出精度を高め、メタ学習戦略全体を効率的に洗練する。提案した検出器の有効性を検証するため,DIORおよびNWPU VHR-10.v2データセットを用いて現在の最先端検出器と性能を比較し,良好な結果を得た。

Presently, the task of few-shot object detection (FSOD) in remote sensing images (RSIs) has become a focal point of attention. Numerous few-shot detectors, particularly those based on two-stage detectors, face challenges when dealing with the multiscale complexities inherent in RSIs. Moreover, these detectors present impractical characteristics in real-world applications, mainly due to their unwieldy model parameters when handling large amount of data. In contrast, we recognize the advantages of one-stage detectors, including high detection speed and a global receptive field. Consequently, we choose the YOLOv7 one-stage detector as a baseline and subject it to a novel meta-learning training framework. This transformation allows the detector to adeptly address FSOD tasks while capitalizing on its inherent advantage of lightweight. Additionally, we thoroughly investigate the samples generated by the meta-learning strategy and introduce a novel meta-sampling approach to retain samples produced by our designed meta-detection head. Coupled with our devised meta-cross loss, we deliberately utilize "negative samples" that are often overlooked to extract valuable knowledge from them. This approach serves to enhance detection accuracy and efficiently refine the overall meta-learning strategy. To validate the effectiveness of our proposed detector, we conducted performance comparisons with current state-of-the-art detectors using the DIOR and NWPU VHR-10.v2 datasets, yielding satisfactory results.

翻訳日:2024-06-19 04:48:05 公開日:2024-06-17

# バイアス対照ペアにおけるクラス識別共通属性の調査によるデバイアスのための内在的特徴の強化

Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair ( http://arxiv.org/abs/2404.19250v2 )

ライセンス: Link先を確認

Jeonghoon Park, Chaeyeon Chung, Juyoung Lee, Jaegul Choo,

(参考訳) 画像分類タスクでは、ディープニューラルネットワークは、データセットバイアスが存在する場合、ターゲットクラスと疑似的に相関するバイアス属性にしばしば依存し、バイアス属性のないデータに適用した場合、性能が低下する。 Debiasingのタスクは、バイアス属性ではなく、本質的にターゲットクラスを定義する固有の属性を学ぶために、分類器を強制することを目的としている。近年のアプローチでは、バイアス属性のないデータサンプル(すなわちバイアス競合サンプル)の学習をバイアス属性を持つサンプル(すなわちバイアス整合サンプル)と比較して強調する傾向にあるが、本質的な特徴の学習のためにどこに注目すべきかをモデルに直接指導するには至っていない。この制限に対処するため,本研究では,本質的な特徴の領域を示す明示的な空間的ガイダンスをモデルに提供する手法を提案する。まず, バイアス整合 (BA) サンプルとバイアス競合 (BC) サンプル (すなわちバイアス対照ペア) のクラス識別共通特徴を調べることで,本質的な特徴を同定する。次に, BCサンプルと比較して予測に十分活用されていないBAサンプルの本質的な特徴を強化する。バイアス情報を使わずにバイアス対照ペアを構築するために,バイアスモデルを用いてBCサンプルをBAサンプルと区別するバイアス負スコアを導入する。実験により, 種々のバイアス重大度を有する合成および実世界のデータセットに対して, 最先端の性能を達成できることが実証された。

In the image classification task, deep neural networks frequently rely on bias attributes that are spuriously correlated with a target class in the presence of dataset bias, resulting in degraded performance when applied to data without bias attributes. The task of debiasing aims to compel classifiers to learn intrinsic attributes that inherently define a target class rather than focusing on bias attributes. While recent approaches mainly focus on emphasizing the learning of data samples without bias attributes (i.e., bias-conflicting samples) compared to samples with bias attributes (i.e., bias-aligned samples), they fall short of directly guiding models where to focus for learning intrinsic features. To address this limitation, this paper proposes a method that provides the model with explicit spatial guidance that indicates the region of intrinsic features. We first identify the intrinsic features by investigating the class-discerning common features between a bias-aligned (BA) sample and a bias-conflicting (BC) sample (i.e., bias-contrastive pair). Next, we enhance the intrinsic features in the BA sample that are relatively under-exploited for prediction compared to the BC sample. To construct the bias-contrastive pair without using bias information, we introduce a bias-negative score that distinguishes BC samples from BA samples employing a biased model. The experiments demonstrate that our method achieves state-of-the-art performance on synthetic and real-world datasets with various levels of bias severity.

翻訳日:2024-06-19 04:48:05 公開日:2024-06-17

# ParaGrapher を用いた大規模圧縮グラフの選択的並列ロード

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher ( http://arxiv.org/abs/2404.19735v2 )

ライセンス: Link先を確認

Mohsen Koohi Esfahani, Marco D'Antonio, Syed Ibtisam Tauhidi, Thai Son Mai, Hans Vandierendonck,

(参考訳) 総合評価は実験科学の基礎の1つである。高性能グラフ処理では、さまざまなフレームワーク上で共通の入力フォーマットをサポートすることで、コントリビューションの徹底的な評価がより達成しやすくなる。しかし、それぞれのフレームワークは独自のフォーマットを作成しており、大規模な実世界のグラフデータセットの読み込みをサポートしない場合がある。これは、(i)新しいグラフアルゴリズムの設計を加速し、(ii)幅広いグラフアルゴリズムへの貢献を評価し、(iii)異なるグラフフレームワーク間の簡易かつ高速な比較を容易にするために、グラフをロードできる高性能ライブラリの需要を示している。そこで我々は,大規模および圧縮されたグラフをロードする高性能APIおよびライブラリであるParaGrapherを紹介する。 ParaGrapherは、共有メモリおよび分散メモリおよびアウトオブコアグラフ処理でグラフにアクセスするためのさまざまなタイプのリクエストをサポートする。本稿では,ParaGrapherの設計と,ParaGrapherを3つのストレージタイプで評価するために用いるグラフ展開の性能モデルについて説明する。評価の結果,ParaGrapherはWebGraph形式の圧縮グラフを展開することにより,バイナリ形式やテキスト形式と比較して,ロード時に最大3.2倍,エンドツーエンド実行時に最大5.2倍の高速化を実現している。 ParaGrapherは https://blogs.qub.ac.uk/DIPSA/ParaGrapher/ で公開されている。

Comprehensive evaluation is one of the bases of experimental science. In High-Performance Graph Processing, a thorough evaluation of contributions becomes more achievable by supporting common input formats over different frameworks. However, each framework creates its specific format, which may not support reading large-scale real-world graph datasets. This shows a demand for high-performance libraries capable of loading graphs to (i) accelerate designing new graph algorithms, (ii) evaluate the contributions on a wide range of graph algorithms, and (iii) facilitate easy and fast comparison over different graph frameworks. To that end, we present ParaGrapher, a high-performance API and library for loading large-scale and compressed graphs. ParaGrapher supports different types of requests for accessing graphs in shared- and distributed-memory and out-of-core graph processing. We explain the design of ParaGrapher and present a performance model of graph decompression, which is used for evaluation of ParaGrapher over three storage types. Our evaluation shows that by decompressing compressed graphs in WebGraph format, ParaGrapher delivers up to 3.2 times speedup in loading and up to 5.2 times speedup in end-to-end execution in comparison to the binary and textual formats. ParaGrapher is available online at https://blogs.qub.ac.uk/DIPSA/ParaGrapher/.

翻訳日:2024-06-19 04:48:05 公開日:2024-06-17

# SIMPLOT: 本質的要素の蒸留によるチャート質問応答の強化

SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials ( http://arxiv.org/abs/2405.00021v2 )

ライセンス: Link先を確認

Wonjoong Kim, Sangwu Park, Yeonjun In, Seokwon Han, Chanyoung Park,

(参考訳) 近年,論理的推論による複雑なチャートの解釈が,視覚言語モデルの開発による課題として浮上している。従来のSOTA(State-of-the-art)モデルでは、視覚言語モデルを利用してチャートをテーブル形式に変換し,大言語モデル(LLM)を用いて推論するエンド・ツー・エンドの手法が提案されている。しかし、自然画像とは異なり、チャートにはチャート推論に必要な重要な情報と無関係な情報が混在しており、この特性がチャートからテーブルへの抽出の性能を低下させることが判明した。本稿では,チャート推論に必要な要素のみを抽出するSIMPLOTを提案する。提案手法には2つのステップがある。 1)複雑な図表から表を抽出するための重要な情報のみを含む単純なプロットを模倣する訓練。 2) 表に基づく推論を行う。本モデルでは,アノテーションやデータセットを追加することなく正確なチャート推論が可能であり,その有効性は様々な実験によって実証されている。さらに,より正確な推論を行うために,人間がチャートを解釈する方法を模倣する新しいプロンプトを提案する。ソースコードは https://github.com/sangwu99/Simplot から入手可能です。

Recently, interpreting complex charts with logical reasoning has emerged as a challenge due to the development of vision-language models. A prior state-of-the-art (SOTA) model has presented an end-to-end method that leverages the vision-language model to convert charts into table format, utilizing a Large Language Model (LLM) for reasoning. However, unlike natural images, charts contain a mix of essential and irrelevant information required for chart reasoning, and we discover that this characteristic can lower the performance of chart-to-table extraction. In this paper, we introduce SIMPLOT, a method designed to extract only the elements necessary for chart reasoning. The proposed method involves two steps: 1) training to mimic a simple plot that contains only the essential information from a complex chart for table extraction, followed by 2) performing reasoning based on the table. Our model enables accurate chart reasoning without the need for additional annotations or datasets, and its effectiveness is demonstrated through various experiments. Furthermore, we propose a novel prompt mimicking how humans interpret charts for more accurate reasoning. Our source code is available at https://github.com/sangwu99/Simplot.

翻訳日:2024-06-19 04:48:05 公開日:2024-06-17

# 平均場コヒーレントイジングマシンを用いたL0規則化圧縮センシング

L0-regularized compressed sensing with Mean-field Coherent Ising Machines ( http://arxiv.org/abs/2405.00366v2 )

ライセンス: Link先を確認

Mastiyage Don Sudeera Hasaranga Gunathilaka, Yoshitaka Inui, Satoshi Kako, Kazushi Mimura, Masato Okada, Yoshihisa Yamamoto, Toru Aonishi,

(参考訳) コヒーレントイジングマシン(Coherent Ising Machine, CIM)は、イジング・ハミルトンの基底状態を見つけることで組合せ最適化問題を解決する光学パラメトリック発振器のネットワークである。 CIMの実用化として、AonishiらはL0規則化に基づく圧縮センシング(L0RBCS)の最適化問題を解決するために量子古典ハイブリッドシステムを提案した。 Gunathilakaらはシステムの精度をさらに高めた。しかし、計算コストのかかるCIMの確率微分方程式(SDE)は、デジタルハードウェアの実装の使用を制限する。我々は,GunathilakaらのCIM SDEの代替として,量子ノイズのない物理学的なヒューリスティック解法である平均場CIM(MF-CIM)モデルを提案する。 MF-CIMは微分方程式(DE)の単純性により計算コストを上回ります。さらに,提案手法は,CIMベースのL0RBCSをFPGA(Field Programmable Gate Arrays)などのデジタルハードウェア上で実装する方法として,人工的および磁気共鳴画像データの両方において,物理的に正確なSDEと類似した性能を有することを示す。

The Coherent Ising Machine (CIM) is a network of optical parametric oscillators that solves combinatorial optimization problems by finding the ground state of an Ising Hamiltonian. As a practical application of CIM, Aonishi et al. proposed a quantum-classical hybrid system to solve optimization problems of L0-regularization-based compressed sensing (L0RBCS). Gunathilaka et al. have further enhanced the accuracy of the system. However, the computationally expensive stochastic differential equations (SDEs) of the CIM limit the use of digital hardware implementations. As an alternative to the CIM SDEs used previously by Gunathilaka et al., we propose using the mean-field CIM (MF-CIM) model, which is a physics-inspired heuristic solver without quantum noise. MF-CIM surmounts the high computational cost due to the simple nature of the differential equations (DEs). Furthermore, our results indicate that the proposed model has similar performance to physically accurate SDEs in both artificial and magnetic resonance imaging data, paving the way for implementing CIM-based L0RBCS on digital hardware such as Field Programmable Gate Arrays (FPGAs).

翻訳日:2024-06-19 04:38:09 公開日:2024-06-17

# 対数的複雑度とリグレット保証を備えたオンライン勾配ベースキャッシングポリシー

An Online Gradient-Based Caching Policy with Logarithmic Complexity and Regret Guarantees ( http://arxiv.org/abs/2405.01263v2 )

ライセンス: Link先を確認

Damiano Carra, Giovanni Neglia,

(参考訳) LRU(Least Recently Used)やLFU(Least Frequently Used)といった一般的なキャッシュポリシは、特定のトラフィックパターンの下でのみ最適なパフォーマンスを示す。過去の要求データのパターンを検出する高度な機械学習ベースの方法でさえ、将来の要求が過去のトレンドから逸脱した場合に苦労する。近年,トラフィックパターンの変化に頑健な新しいポリシーが出現している。これらのアルゴリズムは、コンテキストへの継続的な適応を可能にするオンライン最適化問題に対処する。それらは、オンラインポリシーと後知恵での最適な静的キャッシュ割り当ての間のパフォーマンスギャップを測定する、リグレット(後悔)指標に関する理論的保証を提供する。しかし、これらの解の計算複雑性が高いことは、その実践的採用を妨げる。本研究では,カタログサイズに対して対数的な計算複雑性を達成し,かつ,リグレット保証を提供する,勾配に基づくオンラインキャッシュポリシーの新たなバリエーションを提案する。この進歩により、何百万ものリクエストやアイテムを含む大規模で現実世界のトレース上でポリシーをテストすることが可能になる。このような規模は、リグレット保証を持つ既存のポリシーでは到達できなかったものである。我々の知る限り、我々の実験結果は、勾配に基づくキャッシュポリシーのリグレット保証が、現実的なシナリオでかなりの利益をもたらすことを初めて実証した。

Commonly used caching policies, such as LRU (Least Recently Used) or LFU (Least Frequently Used), exhibit optimal performance only under specific traffic patterns. Even advanced machine learning-based methods, which detect patterns in historical request data, struggle when future requests deviate from past trends. Recently, a new class of policies has emerged that are robust to varying traffic patterns. These algorithms address an online optimization problem, enabling continuous adaptation to the context. They offer theoretical guarantees on the regret metric, which measures the performance gap between the online policy and the optimal static cache allocation in hindsight. However, the high computational complexity of these solutions hinders their practical adoption. In this study, we introduce a new variant of the gradient-based online caching policy that achieves groundbreaking logarithmic computational complexity relative to catalog size, while also providing regret guarantees. This advancement allows us to test the policy on large-scale, real-world traces featuring millions of requests and items - a significant achievement, as such scales have been beyond the reach of existing policies with regret guarantees. To the best of our knowledge, our experimental results demonstrate for the first time that the regret guarantees of gradient-based caching policies offer substantial benefits in practical scenarios.
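
The regret metric described above can be made concrete with a small sketch. It does not implement the paper's gradient-based policy; plain LRU stands in as the online policy, and the function names and the toy trace are illustrative assumptions. Regret is taken here as the number of hits of the best fixed cache chosen in hindsight minus the number of hits of the online policy.

```python
from collections import Counter, OrderedDict


def lru_hits(requests, capacity):
    """Count the hits of a plain LRU cache over a request sequence."""
    cache = OrderedDict()
    hits = 0
    for item in requests:
        if item in cache:
            hits += 1
            cache.move_to_end(item)        # mark as most recently used
        else:
            cache[item] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used item
    return hits


def best_static_hits(requests, capacity):
    """Hits of the best fixed cache in hindsight: the `capacity` most requested items.

    Assumption: the static cache is filled before the trace starts, so every
    request for a cached item counts as a hit.
    """
    counts = Counter(requests)
    return sum(count for _, count in counts.most_common(capacity))


def regret(requests, capacity, online_hits):
    """Gap between the hindsight-optimal static cache and an online policy."""
    return best_static_hits(requests, capacity) - online_hits(requests, capacity)


# A cyclic trace over three items with room for two is a known bad case for LRU.
trace = ["a", "b", "c"] * 3
lru_regret = regret(trace, 2, lru_hits)
```

On this cyclic trace LRU evicts each item just before it is requested again, so it never hits, while any fixed pair of items hits six times. A policy with a regret guarantee bounds how fast this gap can grow with the trace length.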

翻訳日:2024-06-19 04:38:09 公開日:2024-06-17

# ポジション:LLMを理解するには統計的一般化以上のものが必要だ

Position: Understanding LLMs Requires More Than Statistical Generalization ( http://arxiv.org/abs/2405.01964v3 )

ライセンス: Link先を確認

Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, Anna Kerekes, Wieland Brendel, Ferenc Huszár,

(参考訳) この10年、ディープラーニング理論において「なぜディープラーニングは一般化するのか?」に答えようとする研究が盛んに行われてきた。補間領域における過剰パラメータ化モデルの研究という強力な視点の転換が、この進歩を促した。本稿では, LLMの望ましい性質のいくつかは, 良好な統計的一般化の結果ではなく, 別個の理論的説明を必要とするため, もう一つの視点の転換が必要であると主張する。我々の中心的な議論は、AR確率モデルは本質的に識別不可能である、という観察に依存している。すなわち、KLダイバージェンスがゼロまたはほぼゼロしか離れていない(したがってテスト損失が同等の)モデルが、著しく異なる振る舞いを示しうる。我々は,数学的な例と経験的観察によってこの立場を支持し,(1)ゼロショット規則外挿の非識別性,(2)文脈内学習の近似的非識別性,(3)ファインチューニング可能性の非識別性という3つのケーススタディを通じて,非識別性が実際的関連性を持つ理由を示す。我々は, LLM関連の一般化尺度, 伝達可能性, 帰納バイアスに着目した有望な研究方向性を概観する。

The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that AR probabilistic models are inherently non-identifiable: models zero or near-zero KL divergence apart -- thus, equivalent test loss -- can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of zero-shot rule extrapolation; (2) the approximate non-identifiability of in-context learning; and (3) the non-identifiability of fine-tunability. We review promising research directions focusing on LLM-relevant generalization measures, transferability, and inductive biases.

翻訳日:2024-06-19 04:38:09 公開日:2024-06-17

# P-ICL:大規模言語モデルを用いた名前付きエンティティ認識のためのポイントインコンテキスト学習

P-ICL: Point In-Context Learning for Named Entity Recognition with Large Language Models ( http://arxiv.org/abs/2405.04960v2 )

ライセンス: Link先を確認

Guochao Jiang, Zepeng Ding, Yuchen Shi, Deqing Yang,

(参考訳) 近年,大規模言語モデル (LLM) の台頭により,実演サンプルを使わずに,あるいはインコンテキスト学習 (ICL) を通じて少数のサンプルのみを用いて,直接名前付きエンティティ認識 (NER) を実現することが可能になった。しかし、標準のICLは、LLMがタスク命令、フォーマット、入力ラベルマッピングを理解するのに役立つだけで、NERタスク自体の特異性を無視している。本稿では, LLM を用いて NER をよりよく実現するための新しいプロンプトフレームワーク P-ICL を提案する。P-ICL では,各エンティティタイプを認識するための補助情報として,いくつかのポイントエンティティを活用する。このような重要な情報により、LLMはより正確にエンティティ分類を達成することができる。LLM へのプロンプトに最適なポイントエンティティを得るために,K-Meansクラスタリングに基づくポイントエンティティ選択手法も提案する。 P-ICL とポイントエンティティ選択における提案手法の有効性を検証するため,いくつかの代表的 NER ベンチマークの広範な実験を行った。

In recent years, the rise of large language models (LLMs) has made it possible to directly achieve named entity recognition (NER) without any demonstration samples or only using a few samples through in-context learning (ICL). However, standard ICL only helps LLMs understand task instructions, format and input-label mapping, but neglects the particularity of the NER task itself. In this paper, we propose a new prompting framework P-ICL to better achieve NER with LLMs, in which some point entities are leveraged as the auxiliary information to recognize each entity type. With such significant information, the LLM can achieve entity classification more precisely. To obtain optimal point entities for prompting LLMs, we also propose a point entity selection method based on K-Means clustering. Our extensive experiments on some representative NER benchmarks verify the effectiveness of our proposed strategies in P-ICL and point entity selection.

翻訳日:2024-06-19 04:38:09 公開日:2024-06-17

# 拡散過程とインプット補間予測マスクによる時系列表現の自己教師付き学習

Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask ( http://arxiv.org/abs/2405.05959v2 )

ライセンス: Link先を確認

Zineb Senane, Lele Cao, Valentin Leonhard Buchner, Yusuke Tashiro, Lei You, Pawel Herman, Mats Nordahl, Ruibo Tu, Vilhelm von Ehrenheim,

(参考訳) 時系列表現学習(TSRL)は、様々な時系列モデリングタスクのための情報表現を生成することに焦点を当てている。 TSRLの従来の自己監視学習(SSL)の手法は、再構成、反対、対照的、予測の4つの主要なカテゴリに分類され、それぞれにノイズに対する感受性と複雑なデータニュアンスに関する共通の課題がある。近年,拡散法は高度な生成能力を示している。しかし、それらは主に計算や予測のような特定のアプリケーションシナリオをターゲットにしており、一般的なTSRLに拡散モデルを利用する際のギャップを残している。我々の研究である Time Series Diffusion Embedding (TSDE) は、このギャップを最初の拡散ベースのSSL TSRLアプローチとして橋渡ししています。 TSDEは、Imputation-Interpolation-Forecasting (IIF)マスクを使用して、TSデータを観察およびマスクされた部分にセグメントする。両直交トランスフォーマーエンコーダとクロスオーバー機構を備えたトレーニング可能な埋め込み関数を観察部位に適用する。我々は,マスク部分に追加される雑音を予測するために,埋め込みを条件とした逆拡散過程を訓練する。大規模な実験は、TSDEの計算、補間、予測、異常検出、分類、クラスタリングにおける優位性を実証している。また,TSDEデータの学習表現における効率と妥当性について,アブレーション研究,埋め込み可視化の提示,推論速度の比較を行い,TSDEの効率と妥当性について検討した。

Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive, each with a common challenge of sensitivity to noise and intricate data nuances. Recently, diffusion-based methods have shown advanced generative capabilities. However, they primarily target specific application scenarios like imputation and forecasting, leaving a gap in leveraging diffusion models for generic TSRL. Our work, Time Series Diffusion Embedding (TSDE), bridges this gap as the first diffusion-based SSL TSRL approach. TSDE segments TS data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask. It applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. We train a reverse diffusion process conditioned on the embeddings, designed to predict noise added to the masked part. Extensive experiments demonstrate TSDE's superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. We also conduct an ablation study, present embedding visualizations, and compare inference speed, further substantiating TSDE's efficiency and validity in learning representations of TS data.
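
The three mask types named in the abstract can be sketched as follows. This is only one reading of the names "imputation", "interpolation", and "forecasting"; the function names, the 1 = observed / 0 = masked convention, and the masking ratios are assumptions, and the actual TSDE masking scheme may differ.

```python
import random


def imputation_mask(n_features, n_steps, ratio, rng):
    """Hide individual values at random positions (1 = observed, 0 = masked)."""
    return [[0 if rng.random() < ratio else 1 for _ in range(n_steps)]
            for _ in range(n_features)]


def interpolation_mask(n_features, n_steps, ratio, rng):
    """Hide whole time steps: every feature is masked at the chosen steps."""
    n_hidden = int(round(ratio * n_steps))
    hidden = set(rng.sample(range(n_steps), n_hidden))
    return [[0 if t in hidden else 1 for t in range(n_steps)]
            for _ in range(n_features)]


def forecasting_mask(n_features, n_steps, horizon):
    """Hide the last `horizon` time steps of every feature."""
    return [[1 if t < n_steps - horizon else 0 for t in range(n_steps)]
            for _ in range(n_features)]


def split(series, mask):
    """Split a series into its observed part and its masked part."""
    observed = [[x * m for x, m in zip(row, mrow)]
                for row, mrow in zip(series, mask)]
    masked = [[x * (1 - m) for x, m in zip(row, mrow)]
              for row, mrow in zip(series, mask)]
    return observed, masked


rng = random.Random(0)
series = [[1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0]]
observed, masked = split(series, forecasting_mask(2, 5, horizon=2))
```

In the setup the abstract describes, the observed part would feed the embedding function and the masked part would be the target of the conditional reverse diffusion process.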

翻訳日:2024-06-19 04:38:09 公開日:2024-06-17

# RAGとLLMの出会いに関するサーベイ: 検索拡張大規模言語モデルに向けて

A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models ( http://arxiv.org/abs/2405.06211v3 )

ライセンス: Link先を確認

Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li,

(参考訳) AIの最も高度な技術のひとつとして、Retrieval-Augmented Generation(RAG)は、信頼性と最新の外部知識を提供し、多数のタスクに多大な利便性を提供する。特にAIGC(AI-Generated Content)の時代には、追加知識を提供するための強力な検索能力により、RAGは既存の生成AIが高品質な出力を生成するのを支援することができる。近年、Large Language Models (LLM) は言語理解と生成において革命的な能力を示しつつも、幻覚や時代遅れの内的知識といった固有の制限に直面している。最新の補助情報を提供するRAGの強力な能力を考えると、検索型大規模言語モデル(RA-LLM)は、モデルの内部知識にのみ依存するのではなく、外部および権威的な知識ベースを活用してLLMの生成品質を向上する。本調査では, RA-LLMの既存の研究成果を概観し, アーキテクチャ, トレーニング戦略, 応用の3つの技術的側面を概観する。予備知識として,LLMの基礎と最近の進歩を紹介する。次に, LLMにおけるRAGの実用的意義を説明するため, アーキテクチャ, トレーニング戦略, アプリケーション分野の主流となる業務を体系的に検討し, RA-LLMの課題とそれに対応する能力について詳述する。最後に、より深い洞察を提供するため、今後の研究に向けて、現在の限界といくつかの有望な方向性について論じる。この調査に関する最新の情報は、https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/にある。

As one of the most advanced techniques in AI, Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge, providing huge convenience for numerous tasks. Particularly in the era of AI-Generated Content (AIGC), the powerful capacity of retrieval in providing additional knowledge enables RAG to assist existing generative AI in producing high-quality outputs. Recently, Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation, while still facing inherent limitations, such as hallucinations and out-of-date internal knowledge. Given the powerful abilities of RAG in providing the latest and helpful auxiliary information, Retrieval-Augmented Large Language Models (RA-LLMs) have emerged to harness external and authoritative knowledge bases, rather than solely relying on the model's internal knowledge, to augment the generation quality of LLMs. In this survey, we comprehensively review existing research studies in RA-LLMs, covering three primary technical perspectives: architectures, training strategies, and applications. As the preliminary knowledge, we briefly introduce the foundations and recent advances of LLMs. Then, to illustrate the practical significance of RAG for LLMs, we systematically review mainstream relevant work by their architectures, training strategies, and application areas, detailing specifically the challenges of each and the corresponding capabilities of RA-LLMs. Finally, to deliver deeper insights, we discuss current limitations and several promising directions for future research. Updated information about this survey can be found at https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/

翻訳日:2024-06-19 04:38:09 公開日:2024-06-17

# チャート上でのMLLMのタスクベースの有効性評価

Evaluating Task-based Effectiveness of MLLMs on Charts ( http://arxiv.org/abs/2405.07001v2 )

ライセンス: Link先を確認

Yifan Wu, Lutao Yan, Yuyu Luo, Yunhai Wang, Nan Tang,

(参考訳) 本稿では,GPT-4Vはグラフ上の低レベルデータ解析タスクに有効か? この目的のために、我々はまず89,388のクォーテット(チャート、タスク、質問、回答)からなるChartInsightsという名の大規模なデータセットをキュレートし、7つのチャートタイプで広く使用されている10の低レベルデータ分析タスクをカバーした。まず、12のオープンソースモデルと6のクローズドソースモデルを含む18の高度なMLLMの能力と限界を理解するために、系統的な評価を行う。標準的なテキストプロンプトアプローチから始めて、18個のMLLMの平均精度は36.17%である。全てのモデルの中で、GPT-4Vは最高精度で56.13%に達する。低レベルデータ解析タスクにおけるマルチモーダル大モデルの限界を理解するため、GPT-4Vの機能の詳細なテストを行うために様々な実験を設計した。さらに、視覚要素の変更(例えば、色調の変更)や摂動の導入(例えば、画像ノイズの追加)など、チャートに対する視覚的変化が、GPT-4Vの性能に与える影響についても検討する。第2に,12例の実験的検討を行った。これらの結果は,GPT-4Vがチャートとの相互作用に革命をもたらす可能性を示し,人的分析ニーズとGPT-4Vの能力のギャップを明らかにすることを示唆している。第3に、低レベル解析タスクに適した、Chain-of-Chartsという新しいテキストプロンプト戦略を提案し、モデル性能を24.36%向上させ、80.49%の精度を実現した。さらに, GPT-4Vの注意を疑問関連視覚要素に向ける視覚的プロンプト戦略を導入することにより, さらに精度を83.83%向上させる。本研究は,低レベルデータ解析タスクにおけるGPT-4Vの能力と限界に光を当てるだけでなく,今後の研究に有用な知見を提供する。

In this paper, we explore a forward-thinking question: Is GPT-4V effective at low-level data analysis tasks on charts? To this end, we first curate a large-scale dataset, named ChartInsights, consisting of 89,388 quartets (chart, task, question, answer) and covering 10 widely-used low-level data analysis tasks on 7 chart types. Firstly, we conduct systematic evaluations to understand the capabilities and limitations of 18 advanced MLLMs, which include 12 open-source models and 6 closed-source models. Starting with a standard textual prompt approach, the average accuracy rate across the 18 MLLMs is 36.17%. Among all the models, GPT-4V achieves the highest accuracy, reaching 56.13%. To understand the limitations of multimodal large models in low-level data analysis tasks, we have designed various experiments to conduct an in-depth test of the capabilities of GPT-4V. We further investigate how visual modifications to charts, such as altering visual elements (e.g., changing color schemes) and introducing perturbations (e.g., adding image noise), affect the performance of GPT-4V. Secondly, we present 12 experimental findings. These findings suggest the potential of GPT-4V to revolutionize interaction with charts and uncover the gap between human analytic needs and the capabilities of GPT-4V. Thirdly, we propose a novel textual prompt strategy, named Chain-of-Charts, tailored for low-level analysis tasks, which boosts model performance by 24.36 percentage points, resulting in an accuracy of 80.49%. Furthermore, by incorporating a visual prompt strategy that directs the attention of GPT-4V to question-relevant visual elements, we further improve accuracy to 83.83%. Our study not only sheds light on the capabilities and limitations of GPT-4V in low-level data analysis tasks but also offers valuable insights for future research.

翻訳日:2024-06-19 04:38:09 公開日:2024-06-17

# 特徴融合ネットワークを用いた人・機械用スケーラブル画像符号化

Scalable Image Coding for Humans and Machines Using Feature Fusion Network ( http://arxiv.org/abs/2405.09152v5 )

ライセンス: Link先を確認

Takahiro Shindo, Taiju Watanabe, Yui Tatsumi, Hiroshi Watanabe,

(参考訳) 画像認識モデルがより普及するにつれて、機械や人間のスケーラブルなコーディング方法がより重要になる。画像認識モデルの応用例としては、交通監視と農業管理がある。これらのユースケースでは、スケーラブルな符号化手法が有効であることが証明される。人間や機械の既存の画像圧縮手法は、これらの要件をある程度満たしている。しかし,これらの圧縮法は特定の画像認識モデルにのみ有効である。本稿では,多数の画像認識モデルと互換性のある人や機械を対象とした,学習に基づくスケーラブルな画像符号化手法を提案する。我々は,機械用画像圧縮モデルと圧縮モデルを組み合わせて,人間の画像復号を容易にするための追加情報を提供する。これらの圧縮モデルの特徴は、効率的な画像圧縮を実現するために、特徴融合ネットワークを用いて融合される。本手法では,特徴融合ネットワークにおいて,異なるサイズの特徴の組み合わせを可能とし,パラメータ数を削減するために,付加的な情報圧縮モデルを調整する。提案手法では,パラメータ数を削減しつつ,画像圧縮モデルを効率よく組み合わせることを確認する。さらに、デコードされた画像の品質とビットレートの観点から画像圧縮性能を評価することにより、提案手法の有効性を実証する。

As image recognition models become more prevalent, scalable coding methods for machines and humans gain more importance. Applications of image recognition models include traffic monitoring and farm management. In these use cases, the scalable coding method proves effective because the tasks require occasional image checking by humans. Existing image compression methods for humans and machines meet these requirements to some extent. However, these compression methods are effective solely for specific image recognition models. We propose a learning-based scalable image coding method for humans and machines that is compatible with numerous image recognition models. We combine an image compression model for machines with a compression model, providing additional information to facilitate image decoding for humans. The features in these compression models are fused using a feature fusion network to achieve efficient image compression. Our method's additional information compression model is adjusted to reduce the number of parameters by enabling combinations of features of different sizes in the feature fusion network. Our approach confirms that the feature fusion network efficiently combines image compression models while reducing the number of parameters. Furthermore, we demonstrate the effectiveness of the proposed scalable coding method by evaluating the image compression performance in terms of decoded image quality and bitrate.

翻訳日:2024-06-19 04:28:22 公開日:2024-06-17

# 点クラウドデータセットへの量子ニューラルネットワークの適用における正確な置換と回転対称性の強制

Enforcing exact permutation and rotational symmetries in the application of quantum neural network on point cloud datasets ( http://arxiv.org/abs/2405.11150v3 )

ライセンス: Link先を確認

Zhelun Li, Lento Nagano, Koji Terashi,

(参考訳) 量子機械学習の分野での最近の進歩は、量子回路の構造に物理対称性を取り入れるというアイデアを推進してきた。この領域における重要なマイルストーンは、入力オブジェクトの置換の下で同変である$S_{n}$置換同変量子ニューラルネットワーク(QNN)の実現である。本稿では,ポイントクラウドデータセットの回転対称性をQNNに符号化することに焦点を当てる。このアプローチのキーとなる洞察は、ベクトル入力を持つすべての回転不変関数は、ベクトル内積を入力とする関数と等価であるということである。回転と置換の両方に対して厳密に不変な新しいQNN構造を提案し,2次元画像分類と,陽子-陽子衝突によって生じる高エネルギー粒子崩壊の$SO(1,3)$ローレンツ対称性を用いた識別の問題において,その有効性を数値的に示す。

Recent developments in the field of quantum machine learning have promoted the idea of incorporating physical symmetries in the structure of quantum circuits. A crucial milestone in this area is the realization of $S_{n}$-permutation equivariant quantum neural networks (QNN) that are equivariant under permutations of input objects. In this work, we focus on encoding the rotational symmetry of point cloud datasets into the QNN. The key insight of the approach is that all rotationally invariant functions with vector inputs are equivalent to a function with inputs of vector inner products. We provide a novel structure of QNN that is exactly invariant to both rotations and permutations, with its efficacy demonstrated numerically in the problems of two-dimensional image classifications and identifying high-energy particle decays, produced by proton-proton collisions, with the $SO(1,3)$ Lorentz symmetry.
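
The key insight above, that rotation-invariant functions of vectors can be written as functions of their inner products, rests on the fact that inner products do not change under rotation. A minimal classical check, assuming ordinary 2D rotations and an arbitrary toy point cloud (the quantum circuit itself, the permutation symmetry, and the Lorentz case are not modeled here):

```python
import math


def rotate(points, theta):
    """Rotate every 2D point by the angle `theta` (radians) around the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def gram(points):
    """Matrix of all pairwise inner products of the points."""
    return [[px * qx + py * qy for qx, qy in points] for px, py in points]


cloud = [(1.0, 0.0), (0.5, 2.0), (-1.5, 0.3)]
g_before = gram(cloud)
g_after = gram(rotate(cloud, 0.7))

# Largest entry-wise change of the Gram matrix caused by the rotation.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(g_before, g_after)
               for a, b in zip(row_a, row_b))
```

Any model that only ever sees the entries of `gram(cloud)` is therefore rotation invariant by construction, up to floating-point error in this check.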

翻訳日:2024-06-19 04:28:22 公開日:2024-06-17

# SLAB: 簡略化線形注意とプログレッシブ再パラメータ化バッチ正規化による効率的なトランスフォーマー

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization ( http://arxiv.org/abs/2405.11582v2 )

ライセンス: Link先を確認

Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang,

(参考訳) トランスフォーマーは自然言語とコンピュータビジョンの両方のタスクの基盤となるアーキテクチャとなっている。しかし、計算コストが高いため、リソース制約のあるデバイスへのデプロイは非常に困難である。本稿では,効率的なトランスフォーマーの計算ボトルネックモジュール,すなわち正規化層とアテンションモジュールについて検討する。 LayerNormはトランスフォーマーアーキテクチャで一般的に使用されるが、推論中の統計計算のために計算効率が良くない。しかし、トランスフォーマーでLayerNormをより効率的なBatchNormに置き換えると、しばしばパフォーマンスが低下し、トレーニングが崩壊する。そこで本研究では,トレーニング中にLayerNorm を再パラメータ化した BatchNorm に段階的に置き換える PRepBN という新しい手法を提案する。さらに,シンプルながら高い性能を達成できる単純化された線形アテンション(SLA)モジュールを提案する。画像分類および物体検出に関する大規模な実験により,提案手法の有効性が示された。例えば、私たちのSLAB-Swinは、ImageNet-1K上で$16.2$msのレイテンシで$83.6\%$のTop-1精度を達成し、Flatten-Swinと比べてレイテンシが$2.4$ms短く、精度は$0.1\%$高い。また、言語モデリングタスクでも手法を評価し、同等のパフォーマンスと低レイテンシを得る。コードは https://github.com/xinghaochen/SLAB と https://github.com/mindspore-lab/models/tree/master/research/huawei-noah/SLAB で公開されています。

Transformers have become foundational architectures for both natural language and computer vision tasks. However, the high computational cost makes it quite challenging to deploy on resource-constrained devices. This paper investigates the computational bottleneck modules of efficient transformers, i.e., normalization layers and attention modules. LayerNorm is commonly used in transformer architectures but is not computationally friendly due to statistics calculation during inference. However, replacing LayerNorm with the more efficient BatchNorm in transformers often leads to inferior performance and collapse in training. To address this problem, we propose a novel method named PRepBN to progressively replace LayerNorm with re-parameterized BatchNorm in training. Moreover, we propose a simplified linear attention (SLA) module that is simple yet effective to achieve strong performance. Extensive experiments on image classification as well as object detection demonstrate the effectiveness of our proposed method. For example, our SLAB-Swin obtains $83.6\%$ top-1 accuracy on ImageNet-1K with $16.2$ms latency, which is $2.4$ms less than that of Flatten-Swin with $0.1\%$ higher accuracy. We also evaluated our method on a language modeling task and obtained comparable performance and lower latency. Code is publicly available at https://github.com/xinghaochen/SLAB and https://github.com/mindspore-lab/models/tree/master/research/huawei-noah/SLAB.
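
One way to picture a progressive replacement of LayerNorm by BatchNorm is a weighted blend whose weight on LayerNorm shrinks during training. The sketch below is an illustration under stated assumptions, not the authors' implementation: the linear decay schedule, the function names, and the omission of learnable scale/shift and of the re-parameterization step are all choices made here for brevity.

```python
import math

EPS = 1e-5  # small constant for numerical stability


def layer_norm(batch):
    """Normalize each sample across its own features."""
    out = []
    for row in batch:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        out.append([(x - mean) / math.sqrt(var + EPS) for x in row])
    return out


def batch_norm(batch):
    """Normalize each feature across the batch (training-mode statistics)."""
    n = len(batch)
    cols = list(zip(*batch))
    means = [sum(col) / n for col in cols]
    variances = [sum((x - m) ** 2 for x in col) / n
                 for col, m in zip(cols, means)]
    return [[(x - m) / math.sqrt(v + EPS)
             for x, m, v in zip(row, means, variances)]
            for row in batch]


def blend_weight(step, total_steps):
    """Weight on LayerNorm: 1 at step 0, decaying linearly to 0 (assumed schedule)."""
    return max(0.0, 1.0 - step / total_steps)


def progressive_norm(batch, step, total_steps):
    """Blend LayerNorm and BatchNorm outputs according to training progress."""
    g = blend_weight(step, total_steps)
    ln, bn = layer_norm(batch), batch_norm(batch)
    return [[g * a + (1.0 - g) * b for a, b in zip(row_ln, row_bn)]
            for row_ln, row_bn in zip(ln, bn)]


batch = [[1.0, 2.0, 3.0], [2.0, 0.0, 4.0], [0.5, 1.5, -1.0]]
start = progressive_norm(batch, 0, 10)   # equals layer_norm(batch)
end = progressive_norm(batch, 10, 10)    # equals batch_norm(batch)
```

The motivation for ending on BatchNorm is that, at inference time, it uses fixed running statistics rather than per-input statistics, so it can typically be folded into an adjacent linear layer.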

翻訳日:2024-06-19 04:28:22 公開日:2024-06-17

# CNNを用いた後処理による人間の視覚層における符号化画像の精細化

Refining Coded Image in Human Vision Layer Using CNN-Based Post-Processing ( http://arxiv.org/abs/2405.11894v2 )

ライセンス: Link先を確認

Takahiro Shindo, Yui Tatsumi, Taiju Watanabe, Hiroshi Watanabe,

(参考訳) 人間と機械の両方のスケーラブルなイメージコーディングは、最近多くの注目を集めているテクニックです。この技術は、人間の視覚と画像認識モデルのための画像の階層的復号化を可能にする。画像が両方の目的を果たす必要がある場合、非常に効果的な方法である。しかし、一般的な画像圧縮方式でよく使われるポストプロセッシングを人や機械のスケーラブルな画像符号化法に組み込んだ研究はまだない。本稿では,ポストプロセッシングをスケーラブルな符号化方式に統合することにより,人間のデコード画像の品質を向上させる手法を提案する。実験結果から, 後処理により圧縮性能が向上することが示された。さらに,従来の手法との比較により,提案手法の有効性を検証した。

Scalable image coding for both humans and machines is a technique that has gained a lot of attention recently. This technology enables the hierarchical decoding of images for human vision and image recognition models. It is a highly effective method when images need to serve both purposes. However, no research has yet incorporated the post-processing commonly used in popular image compression schemes into scalable image coding methods for humans and machines. In this paper, we propose a method to enhance the quality of decoded images for humans by integrating post-processing into a scalable coding scheme. Experimental results show that the post-processing improves compression performance. Furthermore, the effectiveness of the proposed method is validated through comparisons with traditional methods.

翻訳日:2024-06-19 04:28:22 公開日:2024-06-17

# シフトベクトルの測地的性質と量子化

Geodesic nature and quantization of shift vector ( http://arxiv.org/abs/2405.13355v2 )

ライセンス: Link先を確認

Hua Wang, Kai Chang,

(参考訳) ブロッホ運動量によって定義されるパラメータ空間を持つ量子系における幾何シフトベクトルの測地的性質と量子化を、Wilsonループアプローチを用いて示す。我々の分析は、非垂直遷移を持つボソニックフォノンドラッグシフトベクトルを含むまで拡張する。ゲージ不変シフトベクトルは、滑らかな境界を持つ多様体に対するガウス・ボンネットの定理に基づくオイラー標数に類似した整数値として量子化できることを示した。シフトベクトル、ベリー曲率、量子計量などの幾何量の間の複雑な関係を明らかにする。その結果, 量子化されたバンド間公式におけるシフトベクトルのループ積分は, 円光ガルバニック効果における導電率のトレースの非量子化成分に寄与することが示された。ウィルソンループ法は第一原理計算を容易にし、これらのバンド間ゲージ不変量の幾何学的基盤に関する洞察を与え、実材料における非線形光学的な発現を明らかにする。

We present the geodesic nature and quantization of the geometric shift vector in quantum systems, with the parameter space defined by the Bloch momentum, using the Wilson loop approach. Our analysis extends to include bosonic phonon drag shift vectors with non-vertical transitions. We demonstrate that the gauge invariant shift vector can be quantized as integer values, analogous to the Euler characteristic based on the Gauss-Bonnet theorem for a manifold with a smooth boundary. We reveal intricate relationships among geometric quantities such as the shift vector, Berry curvature, and quantum metric. Our findings demonstrate that the loop integral of the shift vector in the quantized interband formula contributes to the non-quantized component of the trace of conductivity in the circular photogalvanic effect. The Wilson loop method facilitates first-principles calculations, providing insights into the geometric underpinnings of these interband gauge invariant quantities and shedding light on their nonlinear optical manifestations in real materials.

翻訳日:2024-06-19 04:28:22 公開日:2024-06-17

# 分散調和:フェデレートされたクラスタバッチ効果の調整と一般化

Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization ( http://arxiv.org/abs/2405.15081v2 )

ライセンス: Link先を確認

Bao Hoang, Yijiang Pang, Siqi Liang, Liang Zhan, Paul Thompson, Jiayu Zhou,

(参考訳) 独立かつ同一に分布する(i.i.d.)データは、多くのデータ分析とモデリング技術に不可欠である。医療分野において、複数のサイトや施設からデータを収集することは、医療データの分散性を前提として十分な臨床的多様性を確保するための一般的な戦略である。しかし、各サイトのデータは、現地の環境や設備によって容易にバイアスを受け、その結果i.i.d.の仮定に違反する。一般的な戦略は、重要な生物学的情報を保持しながら、サイトのバイアスを調和させることである。 ComBatは最も人気のある調和手法の一つであり、最近分散サイトを扱うように拡張されている。しかし、トレーニングに新しく参加するサイトがある場合や、未知のサイトのデータを評価する場合、ComBatは互換性に欠け、すべてのサイトのデータで再トレーニングする必要がある。再トレーニングは計算上および運用上の大きなオーバーヘッドをもたらし、通常は非現実的である。本研究では、異なるサイトのデータのクラスタパターンを活用し、ComBatによるハーモニゼーションのユーザビリティを大幅に向上させる新しいCluster ComBatハーモニゼーションアルゴリズムを提案する。提案手法の優位性を実証するために、広範なシミュレーションとADNIの実際の医用画像データを用いた。私たちのコードはhttps://github.com/illidanlab/distributed-cluster-harmonizationで提供されます。

Independent and identically distributed (i.i.d.) data is essential to many data analysis and modeling techniques. In the medical domain, collecting data from multiple sites or institutions is a common strategy that guarantees sufficient clinical diversity, determined by the decentralized nature of medical data. However, data from various sites are easily biased by the local environment or facilities, thereby violating the i.i.d. assumption. A common strategy is to harmonize the site bias while retaining important biological information. ComBat is among the most popular harmonization approaches and has recently been extended to handle distributed sites. However, when new sites join training, or when data from unknown/unseen sites must be evaluated, ComBat lacks compatibility and requires retraining with data from all the sites. The retraining leads to significant computational and logistic overhead that is usually prohibitive. In this work, we develop a novel Cluster ComBat harmonization algorithm, which leverages cluster patterns of the data in different sites and greatly advances the usability of ComBat harmonization. We use extensive simulation and real medical imaging data from ADNI to demonstrate the superiority of the proposed approach. Our code is available at https://github.com/illidanlab/distributed-cluster-harmonization.

翻訳日:2024-06-19 04:28:22 公開日:2024-06-17
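The site-bias harmonization described in the abstract above can be illustrated with a minimal sketch of a location-and-scale adjustment, the basic idea that ComBat-style methods build on. This is not the Cluster ComBat algorithm from the paper: it assumes a single feature, no biological covariates, and no empirical Bayes shrinkage, and the function name `harmonize` is hypothetical.

```python
from statistics import mean, pstdev


def harmonize(values, sites):
    """Shift and rescale each site's values to the pooled mean and spread.

    Simplifying assumptions (not from the paper): one feature, no covariates,
    no empirical Bayes shrinkage of the per-site estimates.
    """
    pooled_mean = mean(values)
    pooled_std = pstdev(values)

    # Group the values by site to estimate per-site location and scale.
    by_site = {}
    for value, site in zip(values, sites):
        by_site.setdefault(site, []).append(value)
    stats = {site: (mean(vs), pstdev(vs)) for site, vs in by_site.items()}

    harmonized = []
    for value, site in zip(values, sites):
        site_mean, site_std = stats[site]
        # A constant site has zero spread; its centred values are all zero anyway.
        scale = pooled_std / site_std if site_std > 0 else 1.0
        harmonized.append(pooled_mean + (value - site_mean) * scale)
    return harmonized
```

Note that this sketch needs the data of every site at once, which is exactly the retraining burden the abstract describes when a new or unseen site appears.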

# MindStar: 推論時における事前学習済みLLMの数学推論の強化

MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time ( http://arxiv.org/abs/2405.16265v2 )

ライセンス: Link先を確認

Jikun Kang, Xin Zhe Li, Xi Chen, Amirreza Kazemi, Boxing Chen,

(参考訳) 大きな言語モデル(LLM)は様々なタスクで顕著なパフォーマンスを達成するが、数学的な疑問に答えるなど複雑な推論タスクに苦しむことが多い。この問題に対処する最近の取り組みは、主に教師付き微調整技術や自己改善技術による数学的データセットの活用に焦点を当てている。しかし、これらの手法は、しばしば準備が難しい高品質なデータセットに依存するか、あるいは微調整のためにかなりの計算資源を必要とする。 LLMが正しい答えを生成する方法を知っているが、正しい推論経路を選択するのに苦労しているという発見に触発されて、我々は純粋に推論に基づく探索手法であるMindStar (M*)を提案する。本手法は,探索問題として推論タスクを定式化し,最適な推論経路を特定するための2つの探索アイデアを提案する。 GSM8KとMATHの両方のデータセット上でM*フレームワークを評価し,その性能を既存のオープンソースLLMと比較した。その結果,M* は Llama-2-13B や Mistral-7B などのオープンソースモデルの推論能力を大幅に向上し,GPT-3.5 や Grok-1 に匹敵する性能が得られたが,モデルサイズや計算コストは大幅に削減された。

Although Large Language Models (LLMs) achieve remarkable performance across various tasks, they often struggle with complex reasoning tasks, such as answering mathematical questions. Recent efforts to address this issue have primarily focused on leveraging mathematical datasets through supervised fine-tuning or self-improvement techniques. However, these methods often depend on high-quality datasets that are difficult to prepare, or they require substantial computational resources for fine-tuning. Inspired by findings that LLMs know how to produce the right answer but struggle to select the correct reasoning path, we propose a purely inference-based searching method -- MindStar (M*). This method formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths. We evaluate the M* framework on both the GSM8K and MATH datasets, comparing its performance with existing open and closed-source LLMs. Our results demonstrate that M* significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1, but with substantially reduced model size and computational costs.

翻訳日:2024-06-19 04:18:36 公開日:2024-06-17
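Treating reasoning as a search problem, as the abstract above describes, can be sketched with a generic beam search over partial paths. The abstract does not say which search algorithms M* uses, so this is only an illustration of the general idea: `expand` stands in for an LLM proposing next reasoning steps and `score` for a learned evaluator of a partial path, and both are hypothetical toy functions here.

```python
def beam_search(start, expand, score, is_goal, beam_width=2, max_depth=6):
    """Keep the `beam_width` best partial paths at every depth.

    expand(state)  -> iterable of candidate next states (hypothetical stand-in
                      for an LLM proposing next reasoning steps)
    score(state)   -> higher is better (hypothetical stand-in for a learned
                      evaluator of a partial reasoning path)
    is_goal(state) -> True when the path has reached a final answer
    """
    beam = [[start]]
    for _ in range(max_depth):
        candidates = []
        for path in beam:
            if is_goal(path[-1]):
                return path
            for nxt in expand(path[-1]):
                candidates.append(path + [nxt])
        if not candidates:
            break
        # Rank the extended paths by the score of their latest state.
        candidates.sort(key=lambda p: score(p[-1]), reverse=True)
        beam = candidates[:beam_width]
    for path in beam:
        if is_goal(path[-1]):
            return path
    return None


# Toy problem: reach 10 from 1 using the steps "+1" and "*2".
toy_path = beam_search(
    start=1,
    expand=lambda x: [x + 1, x * 2],
    score=lambda x: -abs(10 - x),
    is_goal=lambda x: x == 10,
)
```

The base model is never fine-tuned in this setup; all extra cost is paid at inference time, in the number of candidate steps that are generated and scored.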

# 恒星の光度曲線におけるスケーリング則

The Scaling Law in Stellar Light Curves ( http://arxiv.org/abs/2405.17156v2 )

ライセンス: Link先を確認

Jia-Shu Pan, Yuan-Sen Ting, Yang Huang, Jie Yu, Ji-Feng Liu,

(参考訳) 恒星の光度曲線として知られる、恒星からのフラックスの時系列を分析することで、恒星の性質に関する貴重な情報を明らかにすることができる。しかし、現在のほとんどの手法は要約統計量の抽出に依存しており、ディープラーニングを用いた研究は教師ありのアプローチに限られている。本研究では、自己教師あり学習の手法を用いて天文時系列データから学習するときに現れるスケーリング則の性質を調べる。 GPT-2アーキテクチャを用いることで、パラメータ数が$10^4$から$10^9$に増加するにつれて、性能が頭打ちになる兆候はなく、学習された表現が向上することを示す。恒星の表面重力を下流タスクとして推定する場合、自己教師ありTransformerモデルは、最先端の教師あり学習モデルと比較して3〜10倍のサンプル効率を達成することを示す。本研究は、大規模な自己回帰生成モデルを用いて恒星の光度曲線を解析するための基礎を築くものである。

Analyzing time series of fluxes from stars, known as stellar light curves, can reveal valuable information about stellar properties. However, most current methods rely on extracting summary statistics, and studies using deep learning have been limited to supervised approaches. In this research, we investigate the scaling law properties that emerge when learning from astronomical time series data using self-supervised techniques. By employing the GPT-2 architecture, we show the learned representation improves as the number of parameters increases from $10^4$ to $10^9$, with no signs of performance plateauing. We demonstrate that a self-supervised Transformer model achieves 3-10 times the sample efficiency compared to the state-of-the-art supervised learning model when inferring the surface gravity of stars as a downstream task. Our research lays the groundwork for analyzing stellar light curves by examining them through large-scale auto-regressive generative models.

翻訳日:2024-06-19 04:18:36 公開日:2024-06-17
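A scaling law of the kind discussed in the abstract above is commonly summarised by fitting a power law in log-log space. The sketch below assumes the form loss = a * N^(-b), which the abstract does not state, and the helper name `fit_power_law` is hypothetical. Any data passed to it in an example would be synthetic, not results from the paper.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-b) on log-transformed data.

    Returns (a, b). The power-law form itself is an assumption of this sketch.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Ordinary least squares for a straight line in log-log space.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

A straight line in log-log space over the full range of model sizes is what "no signs of performance plateauing" would look like under this assumed form.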

# 汎用インコンテキスト学習のベンチマーク

Benchmarking General Purpose In-Context Learning ( http://arxiv.org/abs/2405.17234v4 )

ライセンス: Link先を確認

Fan Wang, Chuan Lin, Yang Cao, Yu Kang,

(参考訳) インコンテキスト学習(ICL)は、柔軟性、汎用性、サンプル効率、人工最適化スキルの免除などにより、AIコミュニティにますますアピールしている。汎用インコンテキスト学習(GPICL)の概念がもたらされるICLの汎用性と能力のさらなる向上が望まれる。我々は、より広い範囲のタスクに対応するためにICLを拡張し、比較的制限されたゼロショットの一般化を伴いながら、学習の地平を拡大し、改善の可能性を高めることを目指している。この目的のために、GPICLの機能のトレーニングと評価に特化して開発された2つの軽量で洞察に富んだベンチマークを導入する。各ベンチマークには、大きなタスク分散を特徴とする膨大なタスクが含まれており、最小限の帰納バイアスが特徴である。これらのタスクは、連続した生成と相互作用を通じて、生涯にわたるコンテキスト内学習を促進するように設計されている。これらの特徴は、言語モデル、決定モデル、世界モデルなどの能力を向上させるために文脈や相互作用に依存するモデルに重大な課題をもたらす。実験の結果,パラメータのスケールはICLやGPICLにとって重要ではなく,コンテキストやメモリ状態のスケールを増大させるような代替手法が提案されている。

In-context learning (ICL) is becoming increasingly appealing to the AI community due to its flexibility, generality, sample efficiency, and exemption from artificial optimization skills. It is desirable to further enhance the generality and capability of ICL, which gives rise to the concept of general-purpose in-context learning (GPICL). We aim to extend ICL to address a broader range of tasks with an extended learning horizon and higher improvement potential, albeit with relatively limited zero-shot generalization. To this end, we introduce two lightweight but insightful benchmarks specifically crafted to train and evaluate GPICL functionalities. Each benchmark includes a vast number of tasks characterized by significant task variance, featuring minimal inductive bias. These tasks are also designed to facilitate lifelong in-context learning through continuous generation and interaction. These features pose significant challenges for models that rely on context or interactions to improve their proficiency, including language models, decision models, and world models. Our experiments reveal that the scale of parameters alone may not be crucial for ICL or GPICL, suggesting alternative approaches such as increasing the scale of contexts and memory states.

翻訳日:2024-06-19 04:18:36 公開日:2024-06-17

# 監視された量子ドットの力学における近藤-ゼノクロスオーバー

Kondo-Zeno crossover in the dynamics of a monitored quantum dot ( http://arxiv.org/abs/2405.17348v2 )

ライセンス: Link先を確認

Matthieu Vanhoecke, Marco Schirò,

(参考訳) 金属浴に結合した量子ドットの力学について検討し, 電荷密度の連続モニタリングを行った。測定ノイズ上で平均化された力学は、局所マルコフのデプションを持つ散逸的アンダーソン不純物モデルにより記述され、ベクトル化されたヒルベルト空間における非閉近似の拡張を用いて解決する。浴槽と監視プロトコルに突然結合した初期偏光スピンの崩壊時間スケールは, 相互作用によって制御された近藤スクリーニングから量子ゼノ効果へのクロスオーバーを示し, 脱落・監視速度が増大するにつれて, 脱落とともに減少する寿命を示す。リンドブラディアン上のシュリーファー・ヴォルフ変換を用いて、複素数値スピン-スピン交換を持つ非エルミート・コンドモデルによって弱散逸時に記述される長時間力学の有効モデルが導出される。ダブルロン生成による脱落反応の加熱が増加すると、スピン崩壊が制御される。

We study the dynamics of a quantum dot coupled to a metallic bath and subject to continuous monitoring of its charge density. The dynamics averaged over measurement noise is described by a dissipative Anderson impurity model with local Markovian dephasing, which we solve using an extension of the Non-Crossing Approximation in the vectorized Hilbert space. We show that the decay time scale of an initially polarised spin which is suddenly coupled to the bath and to the monitoring protocol displays a crossover from Kondo screening, with a lifetime controlled by interactions, to the Quantum Zeno effect, with a lifetime which decreases with bare dissipation as the dephasing or monitoring rate is increased. Using a Schrieffer-Wolff transformation on the Lindbladian, we derive an effective model for the long-time dynamics which is described at weak dissipation by a non-Hermitian Kondo model with complex-valued spin-spin exchange. As the dephasing is increased, heating due to doublon production takes over and controls the spin decay.

翻訳日:2024-06-19 04:18:36 公開日:2024-06-17

# ART: 善良なユーザを保護するためのテキスト画像生成モデルの自動レッドチーミング

ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users ( http://arxiv.org/abs/2405.19360v2 )

ライセンス: Link先を確認

Guanlin Li, Kangjie Chen, Shudong Zhang, Jie Zhang, Tianwei Zhang,

(参考訳) 大規模で事前訓練された生成モデルは、創造的なコンテンツを生成する能力のために、世界を嵐にさらしている。一方、これらの生成モデルの保護は、ユーザの権利と安全性を保護するために開発されており、そのほとんどは大規模言語モデル用に設計されている。既存の手法は主に、悪質なプロンプトの下でモデルの安全性を評価するジェイルブレイクと敵攻撃に焦点を当てている。最近の研究によると、手作業で安全なプロンプトを作れば、意図せずに安全でない世代が引き起こされる可能性がある。そこで本研究では,テキスト・ツー・イメージモデルの安全性リスクを定量的に評価するために,新しい自動レッド・チーム・フレームワークARTを提案する。本手法は,視覚言語モデルと大言語モデルの両方を活用し,安全でない世代とそのプロンプト間の接続を確立することにより,モデルの脆弱性をより効率的に識別する。包括的実験により、人気のあるオープンソーステキスト・ツー・イメージモデルの毒性を明らかにする。実験はまた、ARTの有効性、適応性、および大きな多様性を検証した。さらに,テキスト・ツー・イメージ・モデルに関連する安全性リスクを研究するために,大規模な3つのレッド・チーム・データセットを導入する。データセットとモデルはhttps://github.com/GuanlinLee/ARTで確認できる。

Large-scale pre-trained generative models are taking the world by storm, due to their abilities in generating creative content. Meanwhile, safeguards for these generative models are being developed to protect users' rights and safety, most of which are designed for large language models. Existing methods primarily focus on jailbreak and adversarial attacks, which mainly evaluate the model's safety under malicious prompts. Recent work found that manually crafted safe prompts can unintentionally trigger unsafe generations. To further systematically evaluate the safety risks of text-to-image models, we propose a novel Automatic Red-Teaming framework, ART. Our method leverages both a vision language model and a large language model to establish a connection between unsafe generations and their prompts, thereby more efficiently identifying the model's vulnerabilities. With our comprehensive experiments, we reveal the toxicity of the popular open-source text-to-image models. The experiments also validate the effectiveness, adaptability, and great diversity of ART. Additionally, we introduce three large-scale red-teaming datasets for studying the safety risks associated with text-to-image models. Datasets and models can be found at https://github.com/GuanlinLee/ART.

翻訳日:2024-06-19 04:18:36 公開日:2024-06-17

# TetSphere Splatting:ラグランジアン体積メッシュを用いた高品質形状の表現

TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes ( http://arxiv.org/abs/2405.20283v2 )

ライセンス: Link先を確認

Minghao Guo, Bohan Wang, Kaiming He, Wojciech Matusik,

(参考訳) 高品質な幾何学を用いて3次元形状を再構成するための明示的なラグランジュ表現であるTetSphere splattingを提案する。ニューラル暗黙的(例えば、NeRF、NeuS)と明示的(例えば、DMTet)の両方を含むユーレリア表現を主に用いた従来のオブジェクト再構成手法とは異なり、高い計算要求と最適メッシュ品質に苦しむ場合が多いが、TetSphere splatting は未使用で非常に効果的な原始的四面体メッシュを利用する。このアプローチでは、ニューラルネットワークや後処理に頼ることなく、メッシュ品質が直接的に向上する。複数の初期四面体球を変形させ、微分可能レンダリングと幾何エネルギー最適化を組み合わせて3次元形状を正確に再構成し、計算効率を著しく向上させる。 Tet-Sphereのスプラッティングは、堅牢で汎用的な幾何学表現として機能し、シングルビューの3D再構成、画像とテキストの3Dコンテンツ生成など、多様なアプリケーションにシームレスに統合される。実験結果から,TetSphereスプラッティングは既存の表現よりも優れており,最適化速度の向上,メッシュ品質の向上,薄型構造物の信頼性維持を実現している。

We present TetSphere splatting, an explicit, Lagrangian representation for reconstructing 3D shapes with high-quality geometry. In contrast to conventional object reconstruction methods which predominantly use Eulerian representations, including both neural implicit (e.g., NeRF, NeuS) and explicit representations (e.g., DMTet), and often struggle with high computational demands and suboptimal mesh quality, TetSphere splatting utilizes an underused but highly effective geometric primitive -- tetrahedral meshes. This approach directly yields superior mesh quality without relying on neural networks or post-processing. It deforms multiple initial tetrahedral spheres to accurately reconstruct the 3D shape through a combination of differentiable rendering and geometric energy optimization, resulting in significant computational efficiency. Serving as a robust and versatile geometry representation, TetSphere splatting seamlessly integrates into diverse applications, including single-view 3D reconstruction and image-/text-to-3D content generation. Experimental results demonstrate that TetSphere splatting outperforms existing representations, delivering faster optimization speed, enhanced mesh quality, and reliable preservation of thin structures.

翻訳日:2024-06-19 04:18:36 公開日:2024-06-17

# MTEB-French: フランス語文の埋め込み評価と分析のためのリソース

MTEB-French: Resources for French Sentence Embedding Evaluation and Analysis ( http://arxiv.org/abs/2405.20468v2 )

ライセンス: Link先を確認

Mathieu Ciancone, Imene Kerboua, Marion Schaeffer, Wissam Siblini,

(参考訳) 近年、様々なNLPタスクに多くの埋め込みモデルが利用可能となり、広く使われている。 MTEB(Massive Text Embedding Benchmark)は、主に英語のいくつかのタスクでうまく機能するモデルを選択するプロセスを単純化しているが、他の言語への拡張は難しいままである。そこで、MTEBを拡張して、フランス語の文埋め込みに関する最初の大規模なベンチマークを提案する。 15の既存のデータセットを使いやすいインターフェースで収集し、8つのタスクカテゴリのグローバル評価のための3つの新しいフランス語データセットを作成します。我々は,大規模に選択した51個の埋め込みモデルを比較し,包括的統計テストを行い,モデル性能と多くの特性の相関関係を解析した。すべてのタスクにおいてモデルが最良でない場合でも、文類似性に基づいて事前訓練された大規模多言語モデルは非常によく機能することがわかった。私たちの作業には、オープンソースコード、新しいデータセット、公開リーダボードが含まれています。

Recently, numerous embedding models have been made available and widely used for various NLP tasks. The Massive Text Embedding Benchmark (MTEB) has greatly simplified the process of choosing a model that performs well for several tasks in English, but extensions to other languages remain challenging. This is why we expand MTEB to propose the first massive benchmark of sentence embeddings for French. We gather 15 existing datasets in an easy-to-use interface and create three new French datasets for a global evaluation of 8 task categories. We compare 51 carefully selected embedding models on a large scale, conduct comprehensive statistical tests, and analyze the correlation between model performance and many of their characteristics. We find that even if no model is the best on all tasks, large multilingual models pre-trained on sentence similarity perform exceptionally well. Our work comes with open-source code, new datasets and a public leaderboard.

翻訳日:2024-06-19 04:18:36 公開日:2024-06-17

# Ovis: マルチモーダル大言語モデルのための構造埋め込みアライメント

Ovis: Structural Embedding Alignment for Multimodal Large Language Model ( http://arxiv.org/abs/2405.20797v2 )

ライセンス: Link先を確認

Shiyin Lu, Yang Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Han-Jia Ye,

(参考訳) 現在のMultimodal Large Language Models (MLLM) は、通常、事前訓練されたLLMと別の事前訓練されたビジョントランスフォーマーを、MLPなどのコネクタを介して統合し、LLMに視覚能力を付与する。しかし、MLLMの2つの埋め込み戦略(埋め込みルックアップテーブルに基づく構造的テキスト埋め込みと、ビジョンエンコーダによって直接生成される連続的な埋め込み)の不整合は、視覚情報とテキスト情報のよりシームレスな融合を難しくしている。視覚とテキストの埋め込みを構造的に整列する新しいMLLMアーキテクチャであるOvisを提案する。 Ovisは学習可能なビジュアル埋め込みテーブルをビジュアルエンコーダのプロセスに追加で統合する。リッチな視覚的セマンティクスをキャプチャするために、各イメージパッチは視覚的埋め込みテーブルを複数回インデックスし、最終的な視覚的埋め込みはインデックス化された埋め込みの確率的組み合わせとなる。この構造的アプローチは、テキスト埋め込みを生成するために使われる手法を反映している。様々なマルチモーダルベンチマークにおける実証的な評価は、Ovisが同様のパラメータスケールのオープンソースMLLMよりも優れており、総合的にはプロプライエタリモデルであるQwen-VL-Plusをも上回ることを示している。これらの結果は、MLLMアーキテクチャ設計を前進させ、より効果的なマルチモーダル学習を促進するうえで、Ovisの構造化された視覚表現が持つ可能性を強調している。コード、データセット、モデルはhttps://github.com/AIDC-AI/Ovisで入手できる。

Current Multimodal Large Language Models (MLLMs) typically integrate a pre-trained LLM with another pre-trained vision transformer through a connector, such as an MLP, endowing the LLM with visual capabilities. However, the misalignment between two embedding strategies in MLLMs -- the structural textual embeddings based on an embedding look-up table and the continuous embeddings generated directly by the vision encoder -- poses challenges for a more seamless fusion of visual and textual information. We propose Ovis, a novel MLLM architecture designed to structurally align visual and textual embeddings. Ovis integrates an additional learnable visual embedding table into the visual encoder's process. To capture rich visual semantics, each image patch indexes the visual embedding table multiple times, resulting in a final visual embedding that is a probabilistic combination of the indexed embeddings. This structural approach mirrors the method used for generating textual embeddings. Empirical evaluations on various multimodal benchmarks show that Ovis outperforms open-source MLLMs of similar parameter scales and even surpasses the proprietary model Qwen-VL-Plus overall. These results highlight the potential of Ovis' structured visual representation for advancing MLLM architectural design and promoting more effective multimodal learning. Code, datasets, and models are available at https://github.com/AIDC-AI/Ovis.

翻訳日:2024-06-19 04:18:36 公開日:2024-06-17
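The "probabilistic combination of the indexed embeddings" in the abstract above can be sketched as an expectation over the rows of an embedding table. The abstract does not say how the probabilities are produced, so the softmax over per-patch logits used here is an assumption, and the names `softmax` and `visual_embedding` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def visual_embedding(patch_logits, table):
    """Embed one image patch as a probability-weighted mix of table rows.

    patch_logits: one score per table row (assumed to come from the visual
                  encoder; how they are computed is not covered here)
    table:        list of embedding rows, all of the same dimension
    """
    probs = softmax(patch_logits)
    dim = len(table[0])
    return [sum(p * row[d] for p, row in zip(probs, table)) for d in range(dim)]
```

A textual token corresponds to the special case where all probability mass sits on a single row, which is the sense in which the two kinds of embedding become structurally aligned.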

# 量子最適制御における基底の役割

The Role of Bases in Quantum Optimal Control ( http://arxiv.org/abs/2405.20889v2 )

ライセンス: Link先を確認

Alice Pagano, Matthias M Müller, Tommaso Calarco, Simone Montangero, Phila Rembold,

(参考訳) 量子最適制御(QOC)は、パルスレベルで問題に取り組むことで量子技術の進歩をサポートする: 数値的なアプローチは、有限個の変数で適用された時間依存フィールドをパラメトリすることで、与えられたターゲットに向かって反復的に機能する。結果の最適化の有効性は、問題の複雑さと変数の数に依存する。応用基底の選択が最適化の品質に影響を及ぼすかどうかを問うため、基底関数の観点から異なるパラメトリを考察する。さらに、最も適切な基盤を選択するための戦略も検討する。比較のために,シック基底とシグモイド基底をフーリエ基底の代替として導入する3つの異なるランダム化可能な基底を,複雑さの異なるQOC問題に基づいて検証した。各問題に対して、基底固有の収束速度は、一意のランク付けをもたらす。特にクローズドループでの高価な評価では、最大10倍のスピードアップが最適化の実現可能性に不可欠である。問題依存に基づく基本選択はQOC効率に影響を及ぼす要因であり、そのアプローチに対するアドバイスを提供すると結論付けている。

Quantum Optimal Control (QOC) supports the advance of quantum technologies by tackling their problems at the pulse level: numerical approaches iteratively work towards a given target by parametrising the applied time-dependent fields with a finite set of variables. The effectiveness of the resulting optimisation depends on the complexity of the problem and the number of variables. We consider different parametrisations in terms of basis functions, asking whether the choice of the applied basis affects the quality of the optimisation. Furthermore, we consider strategies to choose the most suitable basis. For the comparison, we test three different randomisable bases - introducing the sinc and sigmoid bases as alternatives to the Fourier basis - on QOC problems of varying complexity. For each problem, the basis-specific convergence rates result in a unique ranking. Especially for expensive evaluations, e.g., in closed-loop, a potential speed-up by a factor of up to 10 may be crucial for the optimisation's feasibility. We conclude that a problem-dependent basis choice is an influential factor for QOC efficiency and provide advice on how to approach it.

翻訳日:2024-06-19 04:08:51 公開日:2024-06-17
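Parametrising a control pulse with a finite set of variables, as the abstract above describes, amounts to expanding it in a chosen basis and optimising only the coefficients. The sketch below shows two such expansions. The exact functional forms (a sum of sines, a sum of logistic steps) and the names `fourier_pulse` and `sigmoid_pulse` are assumptions for illustration; the abstract does not give the paper's definitions.

```python
import math


def fourier_pulse(coeffs, freqs, t):
    """Pulse value at time t as a sum of sine terms: sum_k a_k * sin(w_k * t)."""
    return sum(a * math.sin(w * t) for a, w in zip(coeffs, freqs))


def sigmoid_pulse(coeffs, centers, width, t):
    """Pulse value at time t as a sum of smooth steps centred at `centers`."""
    return sum(
        a / (1.0 + math.exp(-(t - c) / width)) for a, c in zip(coeffs, centers)
    )
```

In both cases the optimiser only sees the coefficient list, so swapping the basis changes the search landscape without changing the number of variables.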

# KnowledgeHub: 科学的発見を支援するエンドツーエンドツール

KnowledgeHub: An end-to-end Tool for Assisted Scientific Discovery ( http://arxiv.org/abs/2406.00008v2 )

ライセンス: Link先を確認

Shinnosuke Tanaka, James Barry, Vishnudev Kuruvanthodi, Movina Moses, Maxwell J. Giammona, Nathan Herr, Mohab Elkaref, Geeth De Mel,

(参考訳) 本稿では、知識Hubツール、科学文献情報抽出(IE)および質問回答(QA)パイプラインについて述べる。これはPDF文書がテキストや構造化表現に変換されるのをサポートすることで達成される。オントロジーは、ユーザがキャプチャしたいエンティティとリレーションのタイプを定義するように構築できる。ブラウザベースのアノテーションツールは、オントロジーに従ってPDF文書の内容に注釈を付けることができる。名前付きエンティティ認識(NER)と関係分類(RC)モデルは、結果として得られたアノテーションに基づいてトレーニングすることができ、文書の注釈のない部分を注釈付けするのに使うことができる。これらのエンティティと関係トリプルから知識グラフを構築し、データから洞察を得るためにクエリすることができる。さらに,QAや要約に使用できるLarge Language Models (LLMs) のスイートを統合する。 KnowledgeHubは、アノテーション、IE、QAをサポートするユニークなツールである。

This paper describes the KnowledgeHub tool, a scientific literature Information Extraction (IE) and Question Answering (QA) pipeline. This is achieved by supporting the ingestion of PDF documents that are converted to text and structured representations. An ontology can then be constructed where a user defines the types of entities and relationships they want to capture. A browser-based annotation tool enables annotating the contents of the PDF documents according to the ontology. Named Entity Recognition (NER) and Relation Classification (RC) models can be trained on the resulting annotations and can be used to annotate the unannotated portion of the documents. A knowledge graph is constructed from these entity and relation triples which can be queried to obtain insights from the data. Furthermore, we integrate a suite of Large Language Models (LLMs) that can be used for QA and summarisation that is grounded in the included documents via a retrieval component. KnowledgeHub is a unique tool that supports annotation, IE and QA, which gives the user full insight into the knowledge discovery pipeline.

翻訳日:2024-06-19 04:08:51 公開日:2024-06-17

# ノイズのある不確実な環境における深層強化学習のための報酬マシン

Reward Machines for Deep RL in Noisy and Uncertain Environments ( http://arxiv.org/abs/2406.00120v2 )

ライセンス: Link先を確認

Andrew C. Li, Zizhao Chen, Toryn Q. Klassen, Pashootan Vaezipoor, Rodrigo Toro Icarte, Sheila A. McIlraith,

(参考訳) Reward Machinesは、命令、安全性の制約、その他の時間的に拡張された報酬に値する振る舞いを指定するための、オートマチックにインスパイアされた構造を提供する。複雑な報酬関数構造を公開することで、サンプル効率が著しく向上した反実的学習の更新が可能になる。 Reward Machinesは表と奥のRL設定の両方で使われているが、典型的には、報酬関数の構成要素を形成するドメイン固有の語彙の地味な解釈に依存している。このような地味な解釈は、部分的な可観測性やノイズ感知のために、現実世界で多くの場面で解明することができる。本稿では,雑音および不確実な環境における深部RLに対するReward Machinesの利用について検討する。我々はこの問題をPOMDPとして特徴付け、ドメイン固有語彙の不確定な解釈の下でタスク構造を利用するRLアルゴリズムスイートを提案する。理論的解析により,本問題に対する直感的なアプローチの落とし穴が明らかとなり,実験結果から,我々のアルゴリズムはタスク構造をうまく活用し,語彙のノイズの多い解釈下での性能向上を図っている。本研究では,Reward Machinesを部分的に観測可能な環境で活用するための一般的なフレームワークを提供する。

Reward Machines provide an automata-inspired structure for specifying instructions, safety constraints, and other temporally extended reward-worthy behaviour. By exposing complex reward function structure, they enable counterfactual learning updates that have resulted in impressive sample efficiency gains. While Reward Machines have been employed in both tabular and deep RL settings, they have typically relied on a ground-truth interpretation of the domain-specific vocabulary that forms the building blocks of the reward function. Such ground-truth interpretations can be elusive in many real-world settings, due in part to partial observability or noisy sensing. In this paper, we explore the use of Reward Machines for Deep RL in noisy and uncertain environments. We characterize this problem as a POMDP and propose a suite of RL algorithms that leverage task structure under uncertain interpretation of domain-specific vocabulary. Theoretical analysis exposes pitfalls in naive approaches to this problem, while experimental results show that our algorithms successfully leverage task structure to improve performance under noisy interpretations of the vocabulary. Our results provide a general framework for exploiting Reward Machines in partially observable environments.

翻訳日:2024-06-19 04:08:51 公開日:2024-06-17
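The automaton structure of a Reward Machine, and what changes when the vocabulary can only be read noisily, can be sketched as follows. The class `RewardMachine`, the function `belief_step`, and the assumption that label noise is independent at every step are all illustrative simplifications, not the algorithms proposed in the paper.

```python
class RewardMachine:
    """Finite-state machine that maps (state, label) to (next_state, reward)."""

    def __init__(self, initial, transitions):
        self.initial = initial
        self.transitions = transitions  # {(state, label): (next_state, reward)}

    def step(self, state, label):
        # Pairs that are not listed keep the state unchanged with zero reward.
        return self.transitions.get((state, label), (state, 0.0))


def belief_step(machine, belief, label_probs):
    """Propagate a distribution over machine states through uncertain labels.

    belief:      {state: probability}
    label_probs: {label: probability that this label holds at this step}
                 (assumed independent across steps, a simplification)
    """
    new_belief = {}
    for state, b in belief.items():
        for label, p in label_probs.items():
            nxt, _ = machine.step(state, label)
            new_belief[nxt] = new_belief.get(nxt, 0.0) + b * p
    return new_belief
```

With a ground-truth labelling the agent always knows its machine state; with noisy labels it can only track a distribution over states, which is what makes the problem partially observable.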

# A-SDM:モデルアセンブリと特徴継承戦略による安定拡散の加速

A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies ( http://arxiv.org/abs/2406.00210v3 )

ライセンス: Link先を確認

Jinchao Zhu, Yuxuan Wang, Siyuan Pan, Pengfei Wan, Di Zhang, Gao Huang,

(参考訳) 安定拡散モデル (SDM) は、テキストから画像 (T2I) および画像から画像 (I2I) への生成のための一般的かつ効果的なモデルである。サンプラー最適化、モデル蒸留、ネットワーク量子化の様々な試みにもかかわらず、これらのアプローチは典型的には元のネットワークアーキテクチャを維持している。膨大なパラメータ規模とかなりの計算要求のため、モデルアーキテクチャの調整に関する研究は限られている。本研究では、SDMにおける冗長な計算の削減に焦点をあて、チューニングを行う手法とチューニング不要な手法の両方を用いてモデルを最適化する。 1) チューニングを行う手法では、蒸留により性能を保ちつつ軽量モデルを再構築するためのモデル組立戦略を設計する。第2に、プルーニングによる性能低下を軽減するため、圧縮されたUNetにマルチエキスパート条件付き畳み込み(ME-CondConv)を導入し、速度を犠牲にすることなく容量を増やしてネットワーク性能を向上させる。第3に、ネットワーク速度向上のためのマルチUNet切替方式の有効性を検証する。 2) チューニング不要な手法では、ネットワーク構造内のブロック、層、ユニットの各レベルで局所的な計算をスキップすることで推論を高速化する特徴継承戦略を提案する。また、時間ステップのレベルにおける特徴継承のための複数のサンプリングモードについても検討する。実験により、提案するチューニング手法とチューニング不要手法の両方が、SDMの速度と性能を向上させることを示した。モデル組立戦略によって再構成された軽量モデルは生成速度を22.4%向上させ、特徴継承戦略はSDMの生成速度を40.0%向上させる。

The Stable Diffusion Model (SDM) is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation. Despite various attempts at sampler optimization, model distillation, and network quantization, these approaches typically maintain the original network architecture. The extensive parameter scale and substantial computational demands have limited research into adjusting the model architecture. This study focuses on reducing redundant computation in SDM and optimizes the model through both tuning and tuning-free methods. 1) For the tuning method, we design a model assembly strategy to reconstruct a lightweight model while preserving performance through distillation. Second, to mitigate performance loss due to pruning, we incorporate multi-expert conditional convolution (ME-CondConv) into compressed UNets to enhance network performance by increasing capacity without sacrificing speed. Third, we validate the effectiveness of the multi-UNet switching method for improving network speed. 2) For the tuning-free method, we propose a feature inheritance strategy to accelerate inference by skipping local computations at the block, layer, or unit level within the network structure. We also examine multiple sampling modes for feature inheritance at the time-step level. Experiments demonstrate that both the proposed tuning and the tuning-free methods can improve the speed and performance of the SDM. The lightweight model reconstructed by the model assembly strategy increases generation speed by 22.4%, while the feature inheritance strategy enhances the SDM generation speed by 40.0%.

翻訳日:2024-06-19 04:08:51 公開日:2024-06-17

# Bileve: 二階層署名によるスプーフィングに対する大規模言語モデルのテキスト出所の保護

Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature ( http://arxiv.org/abs/2406.01946v2 )

ライセンス: Link先を確認

Tong Zhou, Xuandong Zhao, Xiaolin Xu, Shaolei Ren,

(参考訳) 大規模言語モデル(LLM)のテキスト透かしは、ディープフェイクや有害なコンテンツと闘う際の責任評価を約束する機械生成コンテンツの起源を特定するために一般的に用いられてきた。既存の透かし技術は、通常、除去攻撃に対する堅牢性を優先するが、残念ながら、悪質なアクターはLLM生成の応答の意味を微妙に変更したり、有害なコンテンツを偽造したり、LLM開発者の非難を招きかねない。この問題を解決するために、二レベルシグネチャスキームであるBileveを導入する。これは、整合性チェック(スプーフィング攻撃の軽減)のためのきめ細かいシグネチャビットを埋め込むとともに、新しいランクベースのサンプリング戦略により、シグネチャが無効(検出可能性の向上)であるときにテキストソースをトレースする粗いシグネチャビットを埋め込む。バイナリ結果のみを出力する従来の透かし検出器と比較して、Bileveは検出中に5つのシナリオを区別し、テキストの出所を確実に追跡し、LLMを調整できる。 OPT-1.3BとLLaMA-7Bで実施された実験は、検出性を高めたスプーフ攻撃を打破するBileveの有効性を実証した。

Text watermarks for large language models (LLMs) have been commonly used to identify the origins of machine-generated content, which is promising for assessing liability when combating deepfake or harmful content. While existing watermarking techniques typically prioritize robustness against removal attacks, unfortunately, they are vulnerable to spoofing attacks: malicious actors can subtly alter the meanings of LLM-generated responses or even forge harmful content, potentially misattributing blame to the LLM developer. To overcome this, we introduce a bi-level signature scheme, Bileve, which embeds fine-grained signature bits for integrity checks (mitigating spoofing attacks) as well as a coarse-grained signal to trace text sources when the signature is invalid (enhancing detectability) via a novel rank-based sampling strategy. Compared to conventional watermark detectors that only output binary results, Bileve can differentiate 5 scenarios during detection, reliably tracing text provenance and regulating LLMs. The experiments conducted on OPT-1.3B and LLaMA-7B demonstrate the effectiveness of Bileve in defeating spoofing attacks with enhanced detectability.

翻訳日:2024-06-19 04:08:51 公開日:2024-06-17

# 高圧縮率下でのキー情報の保持: LLM用クエリ誘導圧縮器

Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs ( http://arxiv.org/abs/2406.02376v2 )

ライセンス: Link先を確認

Zhiwei Cao, Qian Cao, Yu Lu, Ningxin Peng, Luyang Huang, Shanbo Cheng, Jinsong Su,

(参考訳) 大規模言語モデルの人気が高まり、LLM(Large Language Models)のコンテキスト圧縮への関心が高まった。しかし、圧縮比が増加するにつれて従来の手法の性能は劇的に低下し、時にはクローズドブックレベルにまで低下する。この減少は、圧縮プロセス中にキー情報が失われることに起因する。本研究は, 高圧縮比下でのモデル性能を維持するために重要な情報を保持することの重要性を強調し, この仮説を支持する。その結果,QGC (Query-Guided Compressor) を導入し,クエリを利用してコンテキスト圧縮プロセスのガイドを行い,圧縮されたコンテキスト内のキー情報を効果的に保存する。さらに、動的圧縮戦略を採用する。提案したQGCの有効性を,NaturalQuestions,TriviaQA,HotpotQAデータセットを含む質問応答タスクで検証する。実験結果から,QGCは高い圧縮比でも一貫した性能を示し,推算コストとスループットの面でも有益であることがわかった。

The growing popularity of Large Language Models (LLMs) has sparked interest in context compression for these models. However, the performance of previous methods degrades dramatically as compression ratios increase, sometimes even falling to the closed-book level. This decline can be attributed to the loss of key information during the compression process. Our preliminary study supports this hypothesis, emphasizing the significance of retaining key information to maintain model performance under high compression ratios. As a result, we introduce Query-Guided Compressor (QGC), which leverages queries to guide the context compression process, effectively preserving key information within the compressed context. Additionally, we employ a dynamic compression strategy. We validate the effectiveness of our proposed QGC on the Question Answering task, including NaturalQuestions, TriviaQA, and HotpotQA datasets. Experimental results show that QGC can consistently perform well even at high compression ratios, which also offers significant benefits in terms of inference cost and throughput.

翻訳日:2024-06-19 04:08:51 公開日:2024-06-17

# モニタリングされた単一粒子動力学における多フラクタル性

Multifractality in monitored single-particle dynamics ( http://arxiv.org/abs/2406.02386v2 )

ライセンス: Link先を確認

Kohei Yajima, Hisanori Oshima, Ken Mochizuki, Yohei Fuji,

(参考訳) 繰り返し測定した単一粒子の時間発展におけるマルチフラクタル特性について検討した。量子系では、局所的なユニタリゲートと局所射影測定からなる回路モデルを考える。古典系では,局所遷移過程下で発達した粒子の軌道を部分的に測定することで推定するモデルを考える。どちらの場合も、波動関数のアンサンブルや、十分に長い時間経過後に測定結果に条件付けられた確率分布にマルチフラクタルな挙動が現れる。粒子輸送の性質(拡散性または弾道性)は、多フラクタル特性に質的に影響を及ぼすが、測定速度や特定のプロトコルに対してさえ定量的に堅牢である。一方、多フラクタル性は、誤った結果が得られるような一般化された測定や、粒子検出のない結果のポストセレクションによって一般的に失われる。数値シミュレーションによりこれらの特性を実証し、また、監視された単一粒子系における多重フラクタル特性を解析的に得るために、いくつかの単純化されたモデルを提案する。

We study multifractal properties in time evolution of a single particle subject to repeated measurements. For quantum systems, we consider circuit models consisting of local unitary gates and local projective measurements. For classical systems, we consider models for estimating the trajectory of a particle evolved under local transition processes by partially measuring particle occupations. In both cases, multifractal behaviors appear in the ensemble of wave functions or probability distributions conditioned on measurement outcomes after a sufficiently long time. While the nature of particle transport (diffusive or ballistic) qualitatively affects the multifractal properties, they are even quantitatively robust to the measurement rate or specific protocols. On the other hand, multifractality is generically lost by generalized measurements allowing erroneous outcomes or by postselection of the outcomes with no particle detection. We demonstrate these properties by numerical simulations and also propose several simplified models, which allow us to analytically obtain multifractal properties in the monitored single-particle systems.

翻訳日:2024-06-19 04:08:51 公開日:2024-06-17

# 深層強化学習による自動微分の最適化

Optimizing Automatic Differentiation with Deep Reinforcement Learning ( http://arxiv.org/abs/2406.05027v2 )

ライセンス: Link先を確認

Jamie Lohoff, Emre Neftci,

(参考訳) 自動微分を持つ計算ジャコビアン(英語版)は、機械学習、計算流体力学、ロボット工学、ファイナンスなど、多くの科学分野においてユビキタスである。ヤコビアン計算における計算量やメモリ使用量の小さな削減でさえ、既にエネルギー消費と実行時の大幅な削減を招いている。このような貯蓄を許容する多くの方法が存在するが、それらは一般に、正確なヤコビアンを近似するために計算効率を交換する。本稿では、深い強化学習(RL)とクロスカントリー除去という概念を活用して、ジャコビアン計算に必要な乗算数を最適化する新しい手法を提案する。クロスカントリー除去は、ジャコビアン累積を計算グラフ上の全ての頂点の順序づけられた除去として表現する自動微分のフレームワークであり、全ての除去が一定の計算コストを発生させる。本稿では,RLエージェントがプレイする単一プレイヤーゲームとして必要な乗算数を最小化する最適消去順序の探索を定式化する。本手法は,様々な領域から取得した複数のタスクに対して,最先端の手法よりも最大33%改善できることを実証する。さらに、これらの理論的なゲインは、得られた除去順序を効率的に実行可能なJAXのクロスカントリー除去インタプリタを提供することにより、実際のランタイム改善に変換されることを示す。

Computing Jacobians with automatic differentiation is ubiquitous in many scientific domains such as machine learning, computational fluid dynamics, robotics and finance. Even small savings in the number of computations or memory usage in Jacobian computations can already yield massive savings in energy consumption and runtime. While there exist many methods that allow for such savings, they generally trade computational efficiency for approximations of the exact Jacobian. In this paper, we present a novel method to optimize the number of necessary multiplications for Jacobian computation by leveraging deep reinforcement learning (RL) and a concept called cross-country elimination while still computing the exact Jacobian. Cross-country elimination is a framework for automatic differentiation that phrases Jacobian accumulation as ordered elimination of all vertices on the computational graph where every elimination incurs a certain computational cost. We formulate the search for the optimal elimination order that minimizes the number of necessary multiplications as a single-player game played by an RL agent. We demonstrate that this method achieves up to 33% improvements over state-of-the-art methods on several relevant tasks taken from diverse domains. Furthermore, we show that these theoretical gains translate into actual runtime improvements by providing a cross-country elimination interpreter in JAX that can efficiently execute the obtained elimination orders.
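
The counting problem behind cross-country elimination can be illustrated with a toy sketch. It rests on simplifying assumptions that are not taken from the paper: every edge carries a scalar partial derivative, eliminating a vertex costs one multiplication per (predecessor, successor) pair, and an exhaustive search over orderings stands in for the RL agent. The graph `EDGES` and the function name `elimination_cost` are hypothetical.

```python
from itertools import permutations


def elimination_cost(edges, order):
    """Count scalar multiplications needed to eliminate vertices in `order`.

    edges: iterable of (src, dst) pairs forming a DAG (the computational graph).
    order: sequence of intermediate vertices to eliminate, one after another.
    """
    preds, succs = {}, {}
    for u, v in edges:
        succs.setdefault(u, set()).add(v)
        preds.setdefault(v, set()).add(u)

    total = 0
    for j in order:
        p = preds.pop(j, set())
        s = succs.pop(j, set())
        # Each (i, j, k) path contributes one product c_ij * c_jk.
        total += len(p) * len(s)
        for i in p:
            succs[i].discard(j)
        for k in s:
            preds[k].discard(j)
        # Connect every predecessor directly to every successor (fill-in edges).
        for i in p:
            for k in s:
                succs[i].add(k)
                preds[k].add(i)
    return total


# Hypothetical graph: two inputs, a chain of three intermediates, two outputs.
EDGES = [("x1", "a"), ("x2", "a"), ("a", "b"), ("b", "c"), ("c", "y1"), ("c", "y2")]
INTERMEDIATES = ["a", "b", "c"]

forward_cost = elimination_cost(EDGES, ["a", "b", "c"])   # forward-mode order
reverse_cost = elimination_cost(EDGES, ["c", "b", "a"])   # reverse-mode order
best_order = min(permutations(INTERMEDIATES),
                 key=lambda o: elimination_cost(EDGES, o))
best_cost = elimination_cost(EDGES, best_order)
```

On this graph the forward and reverse orders both cost 8 multiplications, while eliminating the middle vertex `b` first costs 7. Brute force is only feasible for tiny graphs, which is the gap a learned search policy is meant to fill.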

翻訳日:2024-06-19 02:10:30 公開日:2024-06-17

# VALL-E 2:ニューラルコーデック言語モデルは、音声合成のための人間のパーティゼロショットテキストである

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers ( http://arxiv.org/abs/2406.05370v2 )

ライセンス: Link先を確認

Sanyuan Chen, Shujie Liu, Long Zhou, Yanqing Liu, Xu Tan, Jinyu Li, Sheng Zhao, Yao Qian, Furu Wei,

(参考訳) 本稿では,ゼロショット音声合成(TTS)における節目となる,ニューラルコーデック言語モデルの最新の進歩であるVALL-E 2を紹介する。繰り返し認識サンプリング(Repetition Aware Sampling)は、デコード履歴におけるトークンの繰り返しを考慮して、元の核サンプリングプロセスを洗練する。復号化を安定化するだけでなく、無限ループ問題を回避している。 Grouped Code Modelingは、コーデックコードをグループに編成してシーケンス長を効果的に短縮する。 LibriSpeech と VCTK を用いた実験により,VALL-E 2 は音声の頑健性,自然性,話者の類似性において,従来のシステムを上回っていることがわかった。この種のベンチマークで人間と同等に到達したのは、これが初めてのことだ。さらに、VALL-E 2は、その複雑さや繰り返し句によって伝統的に困難な文であっても、高品質な音声を一貫して合成する。この研究の利点は、失語症のある人や筋萎縮性側索硬化症を持つ人のためのスピーチを生成するなど、貴重な努力に寄与する可能性がある。 VALL-E 2のデモはhttps://aka.ms/valle2を参照。

This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in the decoding history. It not only stabilizes the decoding but also circumvents the infinite loop issue. Grouped Code Modeling organizes codec codes into groups to effectively shorten the sequence length, which not only boosts inference speed but also addresses the challenges of long sequence modeling. Our experiments on the LibriSpeech and VCTK datasets show that VALL-E 2 surpasses previous systems in speech robustness, naturalness, and speaker similarity. It is the first of its kind to reach human parity on these benchmarks. Moreover, VALL-E 2 consistently synthesizes high-quality speech, even for sentences that are traditionally challenging due to their complexity or repetitive phrases. The advantages of this work could contribute to valuable endeavors, such as generating speech for individuals with aphasia or people with amyotrophic lateral sclerosis. See https://aka.ms/valle2 for demos of VALL-E 2.
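
A minimal sketch of how nucleus sampling could be made repetition-aware. It assumes a simple rule that the abstract does not spell out: if the nucleus-sampled token already dominates a recent window of the decoding history, the token is redrawn from the full distribution. The parameters `window` and `max_ratio`, the fallback rule, and both function names are illustrative assumptions, not the paper's specification.

```python
import random


def nucleus_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p.

    Returns a dict {token_id: renormalized probability}.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


def repetition_aware_sample(probs, history, rng,
                            top_p=0.9, window=10, max_ratio=0.5):
    """Nucleus-sample a token, then redraw it if it repeats too often.

    probs:   next-token probabilities (list of floats summing to 1).
    history: previously decoded token ids.
    rng:     a random.Random instance, passed in for reproducibility.
    """
    nucleus = nucleus_filter(probs, top_p)
    ids = list(nucleus)
    token = rng.choices(ids, weights=[nucleus[i] for i in ids])[0]

    recent = history[-window:]
    if recent and recent.count(token) / len(recent) > max_ratio:
        # Assumed fallback: sample from the unfiltered distribution so that
        # low-probability tokens get a chance to break the loop.
        token = rng.choices(range(len(probs)), weights=probs)[0]
    return token
```

With a sharply peaked distribution, plain nucleus sampling can emit the same token indefinitely. In this sketch the fallback draw may still return the repeated token, but tokens outside the nucleus become reachable again.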

翻訳日:2024-06-19 02:10:30 公開日:2024-06-17

# 高次グラフニューラルネットワークのための高効率トポロジ対応データ拡張

Efficient Topology-aware Data Augmentation for High-Degree Graph Neural Networks ( http://arxiv.org/abs/2406.05482v3 )

ライセンス: Link先を確認

Yurui Lai, Xiaoyang Lin, Renchi Yang, Hongtao Wang,

(参考訳) 近年,グラフニューラルネットワーク(GNN)がグラフ構造化データ学習の強力なツールとして登場し,様々な分野で実りある成功を収めている。 GNNの大多数はメッセージパッシングのパラダイムに従っており、各ノードの表現は隣人の機能を再帰的に集約することで学習される。しかし、このメカニズムは、高次グラフ(HDG)よりも過度にスムーシングと効率上の問題をもたらし、ほとんどのノードには、ソーシャルネットワーク、トランザクショングラフ、電力網など、数十(あるいは数百)の隣人が存在する。さらに、そのようなグラフは通常、リッチで複雑な構造意味論を含み、GNNの機能集約だけではキャプチャが困難である。上記の制限により,HDG上でのGNNのための効率的かつ効果的なフロントマウントデータ拡張フレームワークであるTADAを提案する。内部では、TADAには2つの重要なモジュールが含まれている。 (i)構造埋め込みによる特徴拡張、及び (ii) トポロジーと属性対応グラフのスパース化。前者は,高効率スケッチ法を用いて,グラフ構造を高品質な構造埋め込みに符号化することにより,拡張ノード特性とモデルキャパシティを向上させる。さらに、グラフ構造や属性から抽出したタスク関連特徴を利用して、第2モジュールは、入力グラフから多数の冗長/ノイズエッジの正確な識別と削減を可能にし、過剰なスムーシングを緩和し、HDGよりも高速な特徴集約を容易にする。経験的に、TADAはノード分類の観点から8つの実ホモ親和性/ヘテロ親和性HDG上でのメインストリームGNNモデルの予測性能を著しく改善し、効率的なトレーニングと推論プロセスを実現している。

In recent years, graph neural networks (GNNs) have emerged as a potent tool for learning on graph-structured data and achieved notable success in varied fields. The majority of GNNs follow the message-passing paradigm, where representations of each node are learned by recursively aggregating features of its neighbors. However, this mechanism brings severe over-smoothing and efficiency issues over high-degree graphs (HDGs), wherein most nodes have dozens (or even hundreds) of neighbors, such as social networks, transaction graphs, power grids, etc. Additionally, such graphs usually encompass rich and complex structure semantics, which are hard to capture merely by feature aggregations in GNNs. Motivated by the above limitations, we propose TADA, an efficient and effective front-mounted data augmentation framework for GNNs on HDGs. Under the hood, TADA includes two key modules: (i) feature expansion with structure embeddings, and (ii) topology- and attribute-aware graph sparsification. The former obtains augmented node features and enhanced model capacity by encoding the graph structure into high-quality structure embeddings with our highly-efficient sketching method. Further, by exploiting task-relevant features extracted from graph structures and attributes, the second module enables the accurate identification and reduction of numerous redundant/noisy edges from the input graph, thereby alleviating over-smoothing and facilitating faster feature aggregations over HDGs. Empirically, TADA considerably improves the predictive performance of mainstream GNN models on 8 real homophilic/heterophilic HDGs in terms of node classification, while achieving efficient training and inference processes.

翻訳日:2024-06-19 02:10:30 公開日:2024-06-17

# DomainRAG: ドメイン固有検索拡張世代評価のための中国語ベンチマーク

DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation ( http://arxiv.org/abs/2406.05654v2 )

ライセンス: Link先を確認

Shuting Wang, Jiongnan Liu, Shiren Song, Jiehan Cheng, Yuqi Fu, Peidong Guo, Kun Fang, Yutao Zhu, Zhicheng Dou,

(参考訳) Retrieval-Augmented Generation (RAG) は,幻覚やリアルタイム更新の維持の難しさといった,Large Language Models (LLM) のさまざまな制限に対処する,有望なソリューションを提供する。 LLMが専門家の知識をカバーするのに苦労する専門家やドメイン固有のアプリケーションでは、このアプローチは特に重要である。したがって、このようなシナリオにおけるRAGモデルの評価は極めて重要であるが、最近の研究は、共通センスの問題を解決する際のモデルの能力を評価するために、ウィキペディアのような一般的な知識ソースに依存していることが多い。本稿では,ドメイン固有の文脈,大学入学におけるRAG設定によるLCMの評価を行った。 RAGモデルに必要な機能として,会話RAGの能力,構造情報の分析,外部知識への忠実さ,妄想,時間依存問題の解決,多文書間相互作用の理解など6つを同定した。各機能は、RAGモデルのパフォーマンスを評価するために、共有コーパスに関連付けられたデータセットを持つ。 Llama,Baichuan,ChatGLM,GPTモデルなどのLLMの評価を行った。実験の結果,既存の閉書 LLM はドメイン固有の問題に悩まされており,専門家の問題を解決するためのRAG モデルの必要性を強調している。さらに、RAGモデルは、会話履歴の理解、構造情報の分析、装飾、多文書間相互作用の処理、専門家の知識への忠実さなどの能力を向上させる余地がある。今後の研究がこれらの問題をよりよく解決することを期待している。

Retrieval-Augmented Generation (RAG) offers a promising solution to address various limitations of Large Language Models (LLMs), such as hallucination and difficulties in keeping up with real-time updates. This approach is particularly critical in expert and domain-specific applications where LLMs struggle to cover expert knowledge. Therefore, evaluating RAG models in such scenarios is crucial, yet current studies often rely on general knowledge sources like Wikipedia to assess the models' abilities in solving common-sense problems. In this paper, we evaluated LLMs in RAG settings in a domain-specific context: college enrollment. We identified six required abilities for RAG models: conversational RAG, analyzing structural information, faithfulness to external knowledge, denoising, solving time-sensitive problems, and understanding multi-document interactions. Each ability has an associated dataset with shared corpora to evaluate the RAG models' performance. We evaluated popular LLMs such as Llama, Baichuan, ChatGLM, and GPT models. Experimental results indicate that existing closed-book LLMs struggle with domain-specific questions, highlighting the need for RAG models to solve expert problems. Moreover, there is room for RAG models to improve their abilities in comprehending conversational history, analyzing structural information, denoising, processing multi-document interactions, and faithfulness to expert knowledge. We expect future studies could solve these problems better.

翻訳日:2024-06-19 02:00:43 公開日:2024-06-17

# 幾何学における接地連続表現:等変ニューラル場

Grounding Continuous Representations in Geometry: Equivariant Neural Fields ( http://arxiv.org/abs/2406.05753v3 )

ライセンス: Link先を確認

David R Wessels, David M Knigge, Samuele Papa, Riccardo Valperga, Sharvaree Vadgama, Efstratios Gavves, Erik J Bekkers,

(参考訳) 近年,ニューラルフィールドは連続的な信号を表現するための強力なモデリングパラダイムとして出現している。条件付きニューラルネットワークでは、フィールドはNeFを条件とする潜在変数で表現され、そのパラメータはデータセット全体にわたって共有される。クロスアテンション・トランスフォーマーをベースとした同変ニューラル場を提案する。NeFは、ラテント点雲である幾何条件変数に条件付けされ、ラテント点からフィールドへの同変復号を可能にする。我々の同変的アプローチは、場と潜伏剤の両方が幾何学的に接地され、場が変換されたときの変換法則に従属するステアビリティ特性を誘導する。重要なこととして、等式関係は、(1)被写体が幾何学的パターンをファイトフリーに表現でき、(2)被写体空間における幾何学的推論が可能であること、(2)空間的に類似したパターンを重み分けできること、およびフィールドのデータセットの効率的な学習を可能にすることを保証する。これらの主な特性は、他の非同変NeFアプローチと比較して、分類実験とデータセット全体を適合させる能力の検証によって検証される。さらに,一意な局所フィールド編集特性を示すことで,ENFの可能性を検証した。

Recently, Neural Fields have emerged as a powerful modelling paradigm to represent continuous signals. In a conditional neural field, a field is represented by a latent variable that conditions the NeF, whose parametrisation is otherwise shared over an entire dataset. We propose Equivariant Neural Fields based on cross attention transformers, in which NeFs are conditioned on a geometric conditioning variable, a latent point cloud, that enables an equivariant decoding from latent to field. Our equivariant approach induces a steerability property by which both field and latent are grounded in geometry and amenable to transformation laws: if the field transforms, the latent representation transforms accordingly, and vice versa. Crucially, the equivariance relation ensures that the latent is capable of (1) representing geometric patterns faithfully, allowing for geometric reasoning in latent space, (2) weight sharing over spatially similar patterns, allowing for efficient learning of datasets of fields. These main properties are validated using classification experiments and a verification of the capability of fitting entire datasets, in comparison to other non-equivariant NeF approaches. We further validate the potential of ENFs by demonstrating unique local field editing properties.

翻訳日:2024-06-19 02:00:43 公開日:2024-06-17

# マルチモーダル気候変化を考慮した作物収量予測のためのオープンかつ大規模データセット

An Open and Large-Scale Dataset for Multi-Modal Climate Change-aware Crop Yield Predictions ( http://arxiv.org/abs/2406.06081v2 )

ライセンス: Link先を確認

Fudong Lin, Kaleb Guillot, Summer Crawford, Yihe Zhang, Xu Yuan, Nian-Feng Tzeng,

(参考訳) 正確な収穫予測は、食料の安全と持続可能な農業慣行を保証するために国家的に重要である。 AI-for-scienceアプローチは、薬物発見や降水流キャストなど、多くの科学的問題を解決する上で有望な成果を示したが、作物収量を予測するディープラーニングモデルの開発は、十分な情報を満たすために、複数のモダリティを持つオープンで大規模なディープラーニング対応データセットが欠如していることによって、常に妨げられている。これを改善するために,米国(アメリカ合衆国)大陸の気候変化を考慮した収量予測を対象とする,最初のテラバイト規模の,公開可能なマルチモーダルデータセットであるCropNetデータセットを紹介した。私たちのCropNetデータセットは、3つのデータ、すなわちSentinel-2 Imagery、WRF-HRRR Computed Dataset、USDA Crop Datasetで構成されており、6年間にわたる2200以上の米国郡(2017-2022年)で、短期間に成長する季節変動と長期気候変動の両方が収穫量に与える影響を考慮し、タイムリーかつ正確に郡レベルでの収穫量を予測するための多目的ディープラーニングモデルの開発を促進することが期待されている。さらに、CropNetパッケージを開発し、3種類のAPIを提供し、研究者が興味のある時間と領域でCropNetデータをダウンロードしやすくし、正確な収量予測のためのディープラーニングモデルを柔軟に構築する。気候変化を考慮した作物収量予測におけるCropNetデータセットの適用性と有効性を検証した。

Precise crop yield predictions are of national importance for ensuring food security and sustainable agricultural practices. While AI-for-science approaches have exhibited promising achievements in solving many scientific problems such as drug discovery, precipitation nowcasting, etc., the development of deep learning models for predicting crop yields is constantly hindered by the lack of an open and large-scale deep learning-ready dataset with multiple modalities to accommodate sufficient information. To remedy this, we introduce the CropNet dataset, the first terabyte-sized, publicly available, and multi-modal dataset specifically targeting climate change-aware crop yield predictions for the contiguous United States (U.S.) continent at the county level. Our CropNet dataset is composed of three modalities of data, i.e., Sentinel-2 Imagery, WRF-HRRR Computed Dataset, and USDA Crop Dataset, for over 2200 U.S. counties spanning 6 years (2017-2022), expected to facilitate researchers in developing versatile deep learning models for timely and precisely predicting crop yields at the county-level, by accounting for the effects of both short-term growing season weather variations and long-term climate change on crop yields. Besides, we develop the CropNet package, offering three types of APIs, for facilitating researchers in downloading the CropNet data on the fly over the time and region of interest, and flexibly building their deep learning models for accurate crop yield predictions. Extensive experiments have been conducted on our CropNet dataset via employing various types of deep learning solutions, with the results validating the general applicability and the efficacy of the CropNet dataset in climate change-aware crop yield predictions.

翻訳日:2024-06-19 02:00:43 公開日:2024-06-17

# SUBLLM: LLMのためのToken Sequence Subsamplingを用いた新しい効率的なアーキテクチャ

SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM ( http://arxiv.org/abs/2406.06571v2 )

ライセンス: Link先を確認

Quandong Wang, Yuxuan Yuan, Xiaoyu Yang, Ruike Zhang, Kang Zhao, Wei Liu, Jian Luan, Daniel Povey, Bin Wang,

(参考訳) 大規模言語モデル(LLM)は様々な分野で大きな成功を収めてきたが、トレーニングと推論の効率性は依然として大きな課題である。本稿では,Subsampling-Upsampling-Bypass Large Language Modelの略で,Subsampling, Upsampling, Bypassモジュールを組み込んでコアデコーダのみのフレームワークを拡張する革新的なアーキテクチャであるSUBLLMを提案する。サブサンプリングモジュールはシーケンスを短縮し、アップサンプリングモジュールはシーケンスの長さを復元し、バイパスモジュールは収束を高める。 LLaMAと比較して、提案されたSUBLLMは、トレーニング速度と推論速度、メモリ使用量の両方で大幅に向上し、競合する数ショットのパフォーマンスを維持している。トレーニング中、SUBLLMはスピードを26%向上し、GPU毎にメモリを10GB削減する。推論では、スピードを最大37%向上し、1GPUあたりのメモリを1GB削減する。トレーニングと推論のスピードは、コンテキストウィンドウが8192に拡張された場合、それぞれ34%と52%向上できる。提案されたアーキテクチャのソースコードを公開バージョンで公開します。

While Large Language Models (LLMs) have achieved remarkable success in various fields, the efficiency of training and inference remains a major challenge. To address this issue, we propose SUBLLM, short for Subsampling-Upsampling-Bypass Large Language Model, an innovative architecture that extends the core decoder-only framework by incorporating subsampling, upsampling, and bypass modules. The subsampling modules are responsible for shortening the sequence, while the upsampling modules restore the sequence length, and the bypass modules enhance convergence. In comparison to LLaMA, the proposed SUBLLM exhibits significant enhancements in both training and inference speeds as well as memory usage, while maintaining competitive few-shot performance. During training, SUBLLM increases speeds by 26% and cuts memory by 10GB per GPU. In inference, it boosts speeds by up to 37% and reduces memory by 1GB per GPU. The training and inference speeds can be enhanced by 34% and 52% respectively when the context window is expanded to 8192. We shall release the source code of the proposed architecture in the published version.

翻訳日:2024-06-19 02:00:43 公開日:2024-06-17

# Prompt Report: A Systematic Survey of Prompting Techniques

The Prompt Report: A Systematic Survey of Prompting Techniques ( http://arxiv.org/abs/2406.06608v2 )

ライセンス: Link先を確認

Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni Gupta, Megan L. Rogers, Inna Goncearenco, Giuseppe Sarli, Igor Galynker, Denis Peskoff, Marine Carpuat, Jules White, Shyamal Anadkat, Alexander Hoyle, Philip Resnik,

(参考訳) ジェネレーティブ・人工知能(GenAI)システムは、産業や研究環境のあらゆる部分に展開されている。開発者とエンドユーザは、プロンプトやプロンプトエンジニアリングを使用して、これらのシステムと対話する。プロンプトは広く研究されている概念であるが、この地域の急進性のために何がプロンプトを構成するのかについての矛盾する用語や質素な存在論的理解が存在する。本稿では, プロンプトの分類を組立て, 利用分析を行うことにより, プロンプトの構造的理解を確立した。本稿では,33の語彙の包括的語彙,58のテキストのみのプロンプト技術,40のモダリティのテクニックを提示する。さらに、自然言語のプレフィックス・プロンプティングに関する文献全体をメタ分析する。

Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a prompt due to the area's nascency. This paper establishes a structured understanding of prompts by assembling a taxonomy of prompting techniques and analyzing their use. We present a comprehensive vocabulary of 33 terms, a taxonomy of 58 text-only prompting techniques, and 40 techniques for other modalities. We further present a meta-analysis of the entire literature on natural language prefix-prompting.

翻訳日:2024-06-19 02:00:43 公開日:2024-06-17

# モデルアーキテクチャのレンズによるニューラルビークルルーティング問題解法の一般化

Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture ( http://arxiv.org/abs/2406.06652v2 )

ライセンス: Link先を確認

Yubin Xiao, Di Wang, Xuan Wu, Yuesong Wu, Boyang Li, Wei Du, Liupu Wang, You Zhou,

(参考訳) ニューラルモデルは、車両ルーティング問題(VRP)を解決する際に有望な結果をもたらすが、一般化においてしばしば不足する。モデル一般化の最近の試みは、必要以上に大規模なトレーニングコストを発生させるか、あるいは異なるVRPのバリエーションを解決する他のモデルに直接適用できない場合が多い。これらの課題に対処するため,本研究では,モデルアーキテクチャの新たな視点について考察する。具体的には,Scaling Factor (ESF) とDistributment-Specific (DS) デコーダをそれぞれ提案し,サイズと分布の一般化を促進させる。 ESFは、様々な大きさのVRPを解く際に、トレーニング中に発見された慣れ親しんだものに対して、モデルの注意重みパターンを調整する。 DSデコーダは、複数の補助光デコーダを通して複数のトレーニング分布パターンのVRPを明示的にモデル化し、より広範な分散シナリオを含むモデル表現空間を拡張する。我々は,合成および広く認識されている実世界のベンチマークデータセットについて広範な実験を行い,その性能を7つのベースラインモデルと比較した。その結果、ESFとDSデコーダを用いてより一般化可能なモデルを得ることができ、様々なVRP、すなわち旅行セールスマン問題と静電容量化VRPを解くための適用性を示すことができた。特に,提案する汎用コンポーネントは最小限の計算資源を必要とするため,モデル一般化をさらに高めるため,従来の一般化戦略に精力的に組み込むことができる。

Neural models produce promising results when solving Vehicle Routing Problems (VRPs), but often fall short in generalization. Recent attempts to enhance model generalization often incur unnecessarily large training cost or cannot be directly applied to other models solving different VRP variants. To address these issues, we take a novel perspective on model architecture in this study. Specifically, we propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder to enhance the size and distribution generalization, respectively. ESF adjusts the attention weight pattern of the model towards familiar ones discovered during training when solving VRPs of varying sizes. The DS decoder explicitly models VRPs of multiple training distribution patterns through multiple auxiliary light decoders, expanding the model representation space to encompass a broader range of distributional scenarios. We conduct extensive experiments on both synthetic and widely recognized real-world benchmarking datasets and compare the performance with seven baseline models. The results demonstrate the effectiveness of using ESF and DS decoder to obtain a more generalizable model and showcase their applicability to solve different VRP variants, i.e., travelling salesman problem and capacitated VRP. Notably, our proposed generic components require minimal computational resources, and can be effortlessly integrated into conventional generalization strategies to further elevate model generalization.

翻訳日:2024-06-19 02:00:43 公開日:2024-06-17

# 時空ホークプロセスのためのフレキシブルパラメトリック推論

Flexible Parametric Inference for Space-Time Hawkes Processes ( http://arxiv.org/abs/2406.06849v2 )

ライセンス: Link先を確認

Emilia Siviero, Guillaume Staerman, Stephan Clémençon, Thomas Moreau,

(参考訳) 社会学、疫学、地震学などの現代の時空間データセットの多くは、適切なホークス時空過程が正確に捉えられるように、自励特性、トリガー、クラスタリングの挙動を同時に示している。本稿では,これらのデータに基づいて,時空ホークスプロセスの強度関数に係わるカーネル関数のパラメータを高速かつ柔軟なパラメトリック推論手法を開発することを目的とする。私たちの統計的アプローチは3つの重要な要素を組み合わせています。 1)有限支持のカーネルについて検討する。 2)時空領域は適切に識別され、 3) (近似)事前計算が使用される。そこで提案する推論手法は, 高速かつ統計的に精度の高い$\ell_2$グラデーションベースの解法である。アルゴリズムの側面を説明することに加えて、合成時空間データと実時空間データについて数値実験を行い、提案手法の妥当性を実証した。

Many modern spatio-temporal data sets, in sociology, epidemiology or seismology, for example, exhibit self-exciting characteristics, triggering and clustering behaviors both at the same time, that a suitable Hawkes space-time process can accurately capture. This paper aims to develop a fast and flexible parametric inference technique to recover the parameters of the kernel functions involved in the intensity function of a space-time Hawkes process based on such data. Our statistical approach combines three key ingredients: 1) kernels with finite support are considered, 2) the space-time domain is appropriately discretized, and 3) (approximate) precomputations are used. The inference technique we propose then consists of an $\ell_2$ gradient-based solver that is fast and statistically accurate. In addition to describing the algorithmic aspects, numerical experiments have been carried out on synthetic and real spatio-temporal data, providing solid empirical evidence of the relevance of the proposed methodology.
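
For orientation, the objects named in the abstract can be written down in a standard textbook form. The notation is an assumption for illustration and is not taken from the paper: $\mu$ is a baseline rate, $g_\theta$ the parametric triggering kernel, $(t_i, s_i)$ the observed event times and locations, $W_T$ and $W_S$ the temporal and spatial support widths, and $[0, T] \times \mathcal{S}$ the observation domain. The second expression is a commonly used least-squares ($\ell_2$) contrast for point processes; the paper's exact objective and discretization may differ.

```latex
% Conditional intensity of a space-time Hawkes process with a finite-support kernel
\[
  \lambda_\theta(t, s \mid \mathcal{H}_t)
    = \mu + \sum_{i \,:\, t_i < t} g_\theta\bigl(t - t_i,\; s - s_i\bigr),
  \qquad
  g_\theta(u, v) = 0 \ \text{ if } u \notin [0, W_T] \ \text{ or } \lVert v \rVert > W_S .
\]

% Least-squares contrast minimized over the kernel parameters \theta
\[
  \mathcal{L}(\theta)
    = \int_0^T \!\! \int_{\mathcal{S}} \lambda_\theta(t, s)^2 \,\mathrm{d}s \,\mathrm{d}t
      \;-\; 2 \sum_{i} \lambda_\theta(t_i, s_i) .
\]
```

Finite support limits each event's influence to a bounded neighbourhood, and discretizing the domain turns the integral into a finite sum over grid cells. This is plausibly what makes precomputation and gradient-based optimization cheap.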

翻訳日:2024-06-19 01:50:51 公開日:2024-06-17

# Hydra-MDP:マルチターゲットハイドラ蒸留によるエンドツーエンドマルチモーダルプランニング

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation ( http://arxiv.org/abs/2406.06978v2 )

ライセンス: Link先を確認

Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez,

(参考訳) 教師-学生モデルに複数の教師を取り入れた新しいパラダイムであるHydra-MDPを提案する。このアプローチでは、人間とルールベースの教師の両方から知識を蒸留して学生モデルを訓練し、様々な評価指標に合わせて様々な軌道候補を学習するマルチヘッドデコーダを特徴とする。ルールベースの教師の知識により、Hydra-MDPは、非微分不可能なポストプロセッシングに頼るのではなく、エンド・ツー・エンドの方法で環境がプランニングにどのように影響するかを学ぶ。この手法はナブシム問題において1^{st}$の精度を達成し、様々な運転環境や条件における一般化の大幅な改善を示す。コードはhttps://github.com/woxihuanjiangguo/Hydra-MDPで入手できる。

We propose Hydra-MDP, a novel paradigm employing multiple teachers in a teacher-student model. This approach uses knowledge distillation from both human and rule-based teachers to train the student model, which features a multi-head decoder to learn diverse trajectory candidates tailored to various evaluation metrics. With the knowledge of rule-based teachers, Hydra-MDP learns how the environment influences the planning in an end-to-end manner instead of resorting to non-differentiable post-processing. This method achieves $1^{st}$ place in the Navsim challenge, demonstrating significant improvements in generalization across diverse driving environments and conditions. Code will be available at https://github.com/woxihuanjiangguo/Hydra-MDP

翻訳日:2024-06-19 01:50:51 公開日:2024-06-17

# 自由を破る:非協力的仮定なしで効率的な多人数のプライベート・セット・ユニオン

Breaking Free: Efficient Multi-Party Private Set Union Without Non-Collusion Assumptions ( http://arxiv.org/abs/2406.07011v2 )

ライセンス: Link先を確認

Minglang Dong, Yu Chen, Cong Zhang, Yujie Bai,

(参考訳) マルチパーティ・プライベート・セット・ユニオン(MPSU)プロトコルでは、$m$$(m > 2)$パーティがそれぞれセットを持っていて、他のパーティに追加情報を公開することなく、セットのユニオンをまとめて計算することができる。 MPSUプロトコルには2つの主要なカテゴリがある。このカテゴリの既存のすべての作業は、超直線的な公開鍵操作を含み、結果として実用的効率が低下する。 2つ目は、暗黙の転送と対称キー技術に基づくものである。このカテゴリにおける唯一の既存の研究は、Liu and Gao (ASIACRYPT 2023) によって提案されている。残念なことに、これは通常の半正直なセキュリティを達成しない。したがって、標準的な半真性モデルにおいて、暗黙の転送と対称鍵技術に基づく実用的なMPSUプロトコルを構築するという問題は未解決のままである。さらに,線形計算と線形通信の複雑さを両立させるMPSUプロトコルは存在しない。本稿では、これらの2つの未解決問題を解決する。本稿では,標準半高次モデルにおいて,暗黙の転送と対称鍵技術に基づく最初のMPSUプロトコルを提案する。このプロトコルは、LAN設定でLiuやGaoよりも高速な4.9-9.3 \timesである。具体的には、当社のプロトコルはオンラインフェーズでわずか3.6ドル秒で、それぞれ2〜20ドルのアイテムがセットされている。公開鍵演算に基づく線形計算と線形通信の複雑さを両立させる最初のMPSUプロトコルを提案する。このプロトコルは通信コストが低く、Liu や Gao と比較すると、通信コストが3.0-36.5 倍になる。

Multi-party private set union (MPSU) protocol enables $m$ $(m > 2)$ parties, each holding a set, to collectively compute the union of their sets without revealing any additional information to other parties. There are two main categories of MPSU protocols: The first builds on public-key techniques. All existing works in this category involve a super-linear number of public-key operations, resulting in poor practical efficiency. The second builds on oblivious transfer and symmetric-key techniques. The only existing work in this category is proposed by Liu and Gao (ASIACRYPT 2023), which features the best concrete performance among all existing protocols, despite its super-linear computation and communication. Unfortunately, it does not achieve the standard semi-honest security, as it inherently relies on a non-collusion assumption, which is unlikely to hold in practice. Therefore, the problem of constructing a practical MPSU protocol based on oblivious transfer and symmetric-key techniques in standard semi-honest model remains open. Furthermore, there is no MPSU protocol achieving both linear computation and linear communication complexity, which leaves another unresolved problem. In this work, we resolve these two open problems. We propose the first MPSU protocol based on oblivious transfer and symmetric-key techniques in the standard semi-honest model. This protocol is $4.9-9.3 \times$ faster than Liu and Gao in the LAN setting. Concretely, our protocol requires only $3.6$ seconds in online phase for 3 parties with sets of $2^{20}$ items each. We propose the first MPSU protocol achieving both linear computation and linear communication complexity, based on public-key operations. This protocol has the lowest overall communication costs and shows a factor of $3.0-36.5\times$ improvement in terms of overall communication compared to Liu and Gao.

翻訳日:2024-06-19 01:50:51 公開日:2024-06-17

# 音声テキスト検索におけるブリッジング言語ギャップ

Bridging Language Gaps in Audio-Text Retrieval ( http://arxiv.org/abs/2406.07012v2 )

ライセンス: Link先を確認

Zhiyong Yan, Heinrich Dinkel, Yongqing Wang, Jizhong Liu, Junbo Zhang, Yujun Wang, Bin Wang,

(参考訳) 音声テキスト検索は難しい作業であり、データベース内で音声クリップやテキストキャプションを検索する必要がある。英語記述に関する既存の研究の主な焦点は、実世界のデータに非英語コンテンツが豊富に存在することを考えると、そのようなモデルの適用性に制限を課している。これらの言語格差に対処するため,多言語テキストエンコーダ(SONAR)を用いて言語固有の情報でテキストデータを符号化する言語拡張(LE)を提案する。さらに、一貫したアンサンブル蒸留(CED)を適用してオーディオエンコーダを最適化し、可変長音声テキスト検索のサポートを強化する。提案手法は,AudioCaps や Clotho などの一般的なデータセット上でのSOTA (State-of-the-art) の性能を示す,英語の音声テキスト検索に優れている。同時に、この手法は、追加の言語強化トレーニングデータの10%しか持たない、他の7つの言語でのコンテンツ検索の習熟度を示し、有望な結果をもたらす。ソースコードは https://github.com/zyyan4/ml-clap で公開されている。

Audio-text retrieval is a challenging task, requiring the search for an audio clip or a text caption within a database. The predominant focus of existing research on English descriptions poses a limitation on the applicability of such models, given the abundance of non-English content in real-world data. To address these linguistic disparities, we propose a language enhancement (LE), using a multilingual text encoder (SONAR) to encode the text data with language-specific information. Additionally, we optimize the audio encoder through the application of consistent ensemble distillation (CED), enhancing support for variable-length audio-text retrieval. Our methodology excels in English audio-text retrieval, demonstrating state-of-the-art (SOTA) performance on commonly used datasets such as AudioCaps and Clotho. Simultaneously, the approach exhibits proficiency in retrieving content in seven other languages with only 10% of additional language-enhanced training data, yielding promising results. The source code is publicly available at https://github.com/zyyan4/ml-clap.

翻訳日:2024-06-19 01:50:51 公開日:2024-06-17

# Beyond Bare Queries: 3D Scene Graphによるオープン語彙オブジェクト検索

Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph ( http://arxiv.org/abs/2406.07113v2 )

ライセンス: Link先を確認

Sergey Linok, Tatiana Zemskova, Svetlana Ladanova, Roman Titkov, Dmitry Yudin,

(参考訳) 自然言語で言及されたオブジェクトの配置は、自律的なエージェントにとって大きな課題となる。既存のCLIPベースのオープンボキャブラリ手法は,単純なクエリによる3次元オブジェクトの検索に成功しているが,オブジェクト関係の理解を求める曖昧な記述には対応できない。そこで,この問題を解決するためにBBQ (Beyond Bare Queries) と呼ばれるモジュラー手法を提案する。この手法は3次元空間グラフ表現を計量エッジで構築し,提案アルゴリズムを用いて大規模言語モデルを人対エージェントインタフェースとして利用する。 BBQは、3Dオブジェクトを形成するためにDINOを使ったロバストなアソシエーション、それらを2Dに投影する高度なレイキャストアルゴリズム、グラフノードとして記述するビジョン言語モデルを採用している。 Replica と ScanNet のデータセットでは,設計手法が3次元オブジェクト中心の地図を正確に構築できることが示されている。オープンな3次元セマンティックセマンティックセグメンテーションにおいて,他のゼロショット手法に対して,その品質が重要な位置を占めることを実証した。また,同じ意味クラスの複数の実体を含む場面において,空間的関係の活用が特に有効であることを示す。 Sr3D と Nr3D のベンチマークでは、提案手法は、他の最先端手法と比較して、複雑なクエリによるオブジェクトの検索を可能にした。設計ソリューションを考えると、最も近いアナログの約x3倍の処理速度を達成した。この有望なパフォーマンスは、応用インテリジェントロボティクスプロジェクトにおける私たちのアプローチの活用を可能にします。コードをlinukc.github.io/bbq/で公開しています。

Locating objects referred to in natural language poses a significant challenge for autonomous agents. Existing CLIP-based open-vocabulary methods successfully perform 3D object retrieval with simple (bare) queries but cannot cope with ambiguous descriptions that demand an understanding of object relations. To tackle this problem, we propose a modular approach called BBQ (Beyond Bare Queries), which constructs a 3D scene spatial graph representation with metric edges and utilizes a large language model as a human-to-agent interface through our deductive scene reasoning algorithm. BBQ employs robust DINO-powered associations to form 3D objects, an advanced raycasting algorithm to project them to 2D, and a vision-language model to describe them as graph nodes. On Replica and ScanNet datasets, we show that the designed method accurately constructs 3D object-centric maps. We have demonstrated that their quality takes a leading place for open-vocabulary 3D semantic segmentation against other zero-shot methods. Also, we show that leveraging spatial relations is especially effective for scenes containing multiple entities of the same semantic class. On Sr3D and Nr3D benchmarks, our deductive approach demonstrates a significant improvement over other state-of-the-art methods, enabling the retrieval of objects by complex queries. Considering our design solutions, we achieved a processing speed approximately 3 times faster than the closest analog. This promising performance makes our approach suitable for applied intelligent robotics projects. We make the code publicly available at linukc.github.io/bbq/.

翻訳日:2024-06-19 01:50:51 公開日:2024-06-17

# 深層強化学習に基づく車両インターネットにおけるセマンティック・アウェアスペクトル共有

Semantic-Aware Spectrum Sharing in Internet of Vehicles Based on Deep Reinforcement Learning ( http://arxiv.org/abs/2406.07213v3 )

ライセンス: Link先を確認

Zhiyu Shao, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief,

(参考訳) 本研究の目的は、車両間通信(V2V)と車両間通信(V2I)のスペクトル共有に着目し、高速移動体インターネット(IoV)環境における意味コミュニケーションを検討することである。本稿では、スペクトル不足とネットワークトラフィックに対処し、深部強化学習(DRL)に基づく意味認識スペクトル共有アルゴリズム(SSS)を提案する。まず,意味情報の抽出について検討する。第二に、IoV環境でのV2VとV2Iのスペクトル共有における意味情報のメトリクスを再定義し、高速な意味スペクトル効率(HSSE)と意味伝達率(HSR)を導入する。最後に、意味情報に基づくV2VおよびV2Iスペクトル共有における決定最適化にSACアルゴリズムを用いる。この最適化は、V2VとV2Iの共有戦略の最適リンク、V2VのHSSEを最大化し、V2Vの効果的な意味情報伝達(SRS)の成功率を高めることを目的として、セマンティック情報を送信する車両の送信パワーと送信セマンティックシンボルの長さを含む。実験の結果,SSSアルゴリズムは,従来の通信方式のスペクトル共有アルゴリズムや,他の強化学習手法を用いたスペクトル共有アルゴリズムなど,他のベースラインアルゴリズムよりも優れていた。 SSSアルゴリズムは、HSSEの15%増加、SRSの約7%増加を示す。

This work investigates semantic communication in high-speed mobile Internet of Vehicles (IoV) environments, with a focus on spectrum sharing between vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. We specifically address spectrum scarcity and network traffic and propose a semantic-aware spectrum sharing (SSS) algorithm based on the deep reinforcement learning (DRL) soft actor-critic (SAC) approach. First, we examine the extraction of semantic information. Second, we redefine metrics for semantic information in V2V and V2I spectrum sharing in IoV environments, introducing high-speed semantic spectrum efficiency (HSSE) and semantic transmission rate (HSR). Finally, we employ the SAC algorithm for decision optimization in V2V and V2I spectrum sharing based on semantic information. This optimization covers the optimal link of the V2V and V2I sharing strategies, the transmission power of vehicles sending semantic information, and the length of the transmitted semantic symbols, with the aim of maximizing the HSSE of V2I and enhancing the success rate of effective semantic information transmission (SRS) of V2V. Experimental results demonstrate that the SSS algorithm outperforms the baseline algorithms, including spectrum sharing algorithms based on traditional communication and spectrum sharing algorithms that use other reinforcement learning approaches. The SSS algorithm exhibits a 15% increase in HSSE and approximately a 7% increase in SRS.

Translated: 2024-06-19 01:50:51 Published: 2024-06-17

# Quantifying Local Model Validity using Active Learning

Quantifying Local Model Validity using Active Learning ( http://arxiv.org/abs/2406.07474v2 )

License: check the linked page

Sven Lämmle, Can Bogoclu, Robert Voßhall, Anselm Haselhoff, Dirk Roos,

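(Reference translation) Real-world applications of machine learning models are often subject to regulations based on law or policy. Some of these regulations require guaranteeing the validity of the model, that is, that the approximation error is smaller than a threshold. A global metric is generally too insensitive to determine the validity of a specific prediction, while evaluating local validity is costly because additional data must be collected. We propose learning the model error to obtain a local validity estimate while reducing the amount of required data through active learning. Using model validation benchmarks, we show that the proposed method can yield an error model with sufficient discriminative properties from a relatively small amount of data. We also show increased sensitivity to local changes of the validity bounds compared with alternative approaches.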

Real-world applications of machine learning models are often subject to legal or policy-based regulations. Some of these regulations require ensuring the validity of the model, i.e., the approximation error being smaller than a threshold. A global metric is generally too insensitive to determine the validity of a specific prediction, whereas evaluating local validity is costly since it requires gathering additional data. We propose learning the model error to acquire a local validity estimate while reducing the amount of required data through active learning. Using model validation benchmarks, we provide empirical evidence that the proposed method can lead to an error model with sufficient discriminative properties using a relatively small amount of data. Furthermore, an increased sensitivity to local changes of the validity bounds compared to alternative approaches is demonstrated.
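
The idea of learning the model error and querying new data actively can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the reference function `true_fn`, the surrogate `model_fn`, the nearest-neighbour error estimate, and the acquisition rule (query where the estimated error is closest to the threshold). The paper's error model and acquisition function may differ.

```python
import numpy as np

def true_fn(x):
    # Hypothetical reference function (stands in for expensive ground truth).
    return np.sin(3.0 * x)

def model_fn(x):
    # Hypothetical model under validation: cubic Taylor approximation of sin(3x).
    return 3.0 * x - (3.0 * x) ** 3 / 6.0

def abs_error(x):
    # Absolute approximation error of the model at x.
    return np.abs(model_fn(x) - true_fn(x))

def knn_error_estimate(x_query, x_obs, err_obs, k=2):
    # Estimate the error at each query point as the mean observed error
    # of its k nearest evaluated points.
    dists = np.abs(x_query[:, None] - x_obs[None, :])
    nearest = np.argsort(dists, axis=1)[:, :k]
    return err_obs[nearest].mean(axis=1)

def active_validity_search(threshold=0.1, n_init=4, n_rounds=10):
    candidates = np.linspace(-1.5, 1.5, 61)
    init_idx = np.linspace(0, len(candidates) - 1, n_init).astype(int)
    x_obs = candidates[init_idx]
    err_obs = abs_error(x_obs)
    for _ in range(n_rounds):
        est = knn_error_estimate(candidates, x_obs, err_obs)
        # Query the candidate whose estimated error is closest to the
        # validity threshold, skipping points that were already evaluated.
        score = np.abs(est - threshold)
        score[np.isin(candidates, x_obs)] = np.inf
        x_new = candidates[np.argmin(score)]
        x_obs = np.append(x_obs, x_new)
        err_obs = np.append(err_obs, abs_error(x_new))
    # A candidate is labelled locally valid when its estimated error is below the threshold.
    valid = knn_error_estimate(candidates, x_obs, err_obs) < threshold
    return candidates, valid, x_obs
```

Calling `active_validity_search()` returns the candidate grid, a boolean validity label per candidate, and the points that were actually evaluated; new queries tend to cluster where the estimated error crosses the threshold, which is the region that decides the validity boundary.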

Translated: 2024-06-19 01:50:51 Published: 2024-06-17

# VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs ( http://arxiv.org/abs/2406.07476v2 )

License: check the linked page

Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing,

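(Reference translation) This paper presents VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video- and audio-oriented tasks. Building on its predecessor, VideoLLaMA 2 incorporates a tailor-made Spatial-Temporal Convolution (STC) connector that effectively captures the intricate spatial and temporal dynamics of video data. In addition, an Audio Branch is integrated into the model through joint training, which improves the model's multimodal understanding by incorporating audio cues. Comprehensive evaluations on multiple-choice video question answering (MC-VQA), open-ended video question answering (OE-VQA), and video captioning (VC) tasks show that VideoLLaMA 2 consistently achieves competitive results among open-source models and comes close to some proprietary models on several benchmarks. VideoLLaMA 2 also shows reasonable improvements over existing models on audio-only and audio-video question answering (AQA and OE-AVQA) benchmarks. These advances underline VideoLLaMA 2's strong performance in multimodal comprehension and set a new standard for intelligent video analysis systems. All models are public to facilitate further research.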

In this paper, we present VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video- and audio-oriented tasks. Building upon its predecessor, VideoLLaMA 2 incorporates a tailor-made Spatial-Temporal Convolution (STC) connector, which effectively captures the intricate spatial and temporal dynamics of video data. Additionally, we integrate an Audio Branch into the model through joint training, thereby enriching the multimodal understanding capabilities of the model by seamlessly incorporating audio cues. Comprehensive evaluations on multiple-choice video question answering (MC-VQA), open-ended video question answering (OE-VQA), and video captioning (VC) tasks demonstrate that VideoLLaMA 2 consistently achieves competitive results among open-source models and even gets close to some proprietary models on several benchmarks. Furthermore, VideoLLaMA 2 exhibits reasonable improvements in audio-only and audio-video question-answering (AQA & OE-AVQA) benchmarks over existing models. These advancements underline VideoLLaMA 2's superior performance in multimodal comprehension, setting a new standard for intelligent video analysis systems. All models are public to facilitate further research.

Translated: 2024-06-19 01:50:51 Published: 2024-06-17

# Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement

Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement ( http://arxiv.org/abs/2406.08096v2 )

License: check the linked page

Runyi Yu, Tianyu He, Ailing Zhang, Yuchi Wang, Junliang Guo, Xu Tan, Chang Liu, Jie Chen, Jiang Bian,

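(Reference translation) We aim to edit the lip movements in a talking video according to given speech while preserving personal identity and visual details. The task can be decomposed into two sub-problems: (1) speech-driven lip motion generation and (2) visual appearance synthesis. Current solutions handle both sub-problems within a single generative model, which leads to a difficult trade-off between lip-sync quality and the preservation of visual details. Instead, we propose to disentangle motion and appearance and to generate them one after the other with a speech-to-motion diffusion model and a motion-conditioned appearance generation model. Challenges remain at each stage, however, such as motion-aware identity preservation in (1) and visual detail preservation in (2). To preserve personal identity, we therefore use landmarks to represent motion and additionally apply a landmark-based identity loss. To capture motion-agnostic visual details, we use separate encoders for the lip appearance, the non-lip appearance, and the motion, and integrate them with a learned fusion module. We train MyTalk on a large-scale, diverse dataset. Experiments show that our method generalizes well to unseen and even out-of-domain persons in terms of both lip sync and visual detail preservation. We encourage readers to watch the videos on our project page (https://Ingrid789.github.io/MyTalk/).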

We aim to edit the lip movements in a talking video according to the given speech while preserving personal identity and visual details. The task can be decomposed into two sub-problems: (1) speech-driven lip motion generation and (2) visual appearance synthesis. Current solutions handle the two sub-problems within a single generative model, resulting in a challenging trade-off between lip-sync quality and visual detail preservation. Instead, we propose to disentangle motion and appearance, and then generate them one by one with a speech-to-motion diffusion model and a motion-conditioned appearance generation model. However, challenges remain in each stage, such as motion-aware identity preservation in (1) and visual detail preservation in (2). Therefore, to preserve personal identity, we adopt landmarks to represent the motion and further employ a landmark-based identity loss. To capture motion-agnostic visual details, we use separate encoders to encode the lip appearance, the non-lip appearance, and the motion, and then integrate them with a learned fusion module. We train MyTalk on a large-scale and diverse dataset. Experiments show that our method generalizes well to unseen, even out-of-domain, persons in terms of both lip sync and visual detail preservation. We encourage readers to watch the videos on our project page (https://Ingrid789.github.io/MyTalk/).

Translated: 2024-06-19 01:41:06 Published: 2024-06-17

# Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze

Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze ( http://arxiv.org/abs/2406.08379v2 )

License: check the linked page

Michele Mazzamuto, Antonino Furnari, Giovanni Maria Farinella,

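(Reference translation) This paper addresses the challenge of unsupervised mistake detection in egocentric video through the analysis of gaze signals, a key component for advancing user assistance in smart glasses. Traditional supervised methods that rely on manually labeled mistakes suffer from domain dependence and scalability problems. This work introduces an unsupervised method for detecting mistakes in videos of human activities, which overcomes domain-specific requirements and the need for annotated data. By analyzing unusual gaze patterns that signal user disorientation during a task, we propose a gaze completion model that predicts gaze trajectories from incomplete inputs. The difference between the anticipated and the observed gaze paths serves as an indicator for identifying mistakes. The method is validated on the EPIC-Tent dataset, where it shows superiority over current one-class supervised and unsupervised techniques.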

In this paper, we address the challenge of unsupervised mistake detection in egocentric video through the analysis of gaze signals, a critical component for advancing user assistance in smart glasses. Traditional supervised methods, reliant on manually labeled mistakes, suffer from domain-dependence and scalability issues. This research introduces an unsupervised method for detecting mistakes in videos of human activities, overcoming the challenges of domain-specific requirements and the necessity for annotated data. By analyzing unusual gaze patterns that signal user disorientation during tasks, we propose a gaze completion model that forecasts eye gaze trajectories from incomplete inputs. The difference between the anticipated and observed gaze paths acts as an indicator for identifying errors. Our method is validated on the EPIC-Tent dataset, showing its superiority compared to current one-class supervised and unsupervised techniques.
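
The scoring step described above, comparing anticipated and observed gaze paths, can be sketched as follows. This is a minimal illustration under stated assumptions: gaze points are 2D coordinates per frame, the predicted trajectory is given (the gaze completion model itself is not shown), and the Euclidean distance and fixed threshold are hypothetical choices rather than the paper's exact scoring rule.

```python
import numpy as np

def gaze_mistake_scores(predicted, observed):
    # Per-frame Euclidean distance between predicted and observed gaze points.
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return np.linalg.norm(predicted - observed, axis=1)

def flag_mistakes(predicted, observed, threshold):
    # A frame is flagged when the observed gaze deviates from the
    # prediction by more than the threshold.
    return gaze_mistake_scores(predicted, observed) > threshold
```

For example, with predictions `[[0, 0], [1, 1], [2, 2]]`, observations `[[0, 0], [1, 1], [5, 6]]`, and a threshold of 1.0, only the last frame is flagged, because its deviation is 5 while the first two deviations are 0.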

Translated: 2024-06-19 01:41:06 Published: 2024-06-17

# CIMRL: Combining IMitation and Reinforcement Learning for Safe Autonomous Driving

CIMRL: Combining IMitation and Reinforcement Learning for Safe Autonomous Driving ( http://arxiv.org/abs/2406.08878v2 )

License: check the linked page

Jonathan Booher, Khashayar Rohanimanesh, Junhong Xu, Aleksandr Petiushko,

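(Reference translation) Modern approaches to autonomous driving rely heavily on learned components trained on large amounts of human driving data via imitation learning. These methods, however, require large amounts of expensive data collection and still struggle to handle long-tail scenarios safely and to avoid errors that compound over time. At the same time, pure reinforcement learning (RL) methods can fail to learn performant policies in settings such as driving, where rewards are sparse, constrained, and hard to define. Both challenges make it difficult to deploy purely cloned policies in safety-critical applications such as autonomous vehicles. This paper proposes the Combining IMitation and Reinforcement Learning (CIMRL) approach, a framework that enables driving policies to be trained in simulation by leveraging imitative motion priors and safety constraints. CIMRL does not require extensive reward specification and improves the closed-loop behavior of pure cloning methods. By combining RL and imitation, we show that our method achieves state-of-the-art results on closed-loop simulation driving benchmarks.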

Modern approaches to autonomous driving rely heavily on learned components trained with large amounts of human driving data via imitation learning. However, these methods require large amounts of expensive data collection and even then face challenges with safely handling long-tail scenarios and compounding errors over time. At the same time, pure Reinforcement Learning (RL) methods can fail to learn performant policies in sparse, constrained, and challenging-to-define reward settings like driving. Both of these challenges make it difficult to deploy purely cloned policies in safety-critical applications like autonomous vehicles. In this paper, we propose the Combining IMitation and Reinforcement Learning (CIMRL) approach, a framework that enables training driving policies in simulation by leveraging imitative motion priors and safety constraints. CIMRL does not require extensive reward specification and improves on the closed-loop behavior of pure cloning methods. By combining RL and imitation, we demonstrate that our method achieves state-of-the-art results in closed-loop simulation driving benchmarks.

Translated: 2024-06-19 01:41:06 Published: 2024-06-17

# Fine-Grained Domain Generalization with Feature Structuralization

Fine-Grained Domain Generalization with Feature Structuralization ( http://arxiv.org/abs/2406.09166v2 )

License: check the linked page

Wenlong Yu, Dongyue Chen, Qilong Wang, Qinghua Hu,

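(Reference translation) Fine-grained domain generalization (FGDG) is a more challenging task than traditional DG because inter-class variations are small and intra-class disparities are relatively large. When the domain distribution changes, the fragility of subtle features severely degrades model performance. Humans, nevertheless, inherently show the ability to generalize to out-of-distribution data by drawing on structured multi-granularity knowledge that arises from discerning commonality and specificity within categories. In the same spirit, we propose a Feature Structuralized Domain Generalization (FSDG) model, in which features are structuralized into common, specific, and confounding segments aligned with their relevant semantic concepts, in order to improve FGDG performance. Specifically, feature structuralization (FS) is achieved through the joint optimization of five constraints: a decorrelation function applied to the disentangled segments, three constraints that ensure the consistency of common features and the distinctiveness of specific features, and a prediction calibration term. With these constraints, FSDG disentangles and aligns features according to multi-granularity knowledge, which supports robust subtle distinctions among categories. Extensive experiments on three benchmarks consistently confirm the superiority of FSDG over state-of-the-art counterparts, with an average improvement of 6.2% in FGDG performance. In addition, an explainability analysis of the explicit concept matching intensity between the concepts shared among categories and the model channels, together with experiments on various mainstream model architectures, supports the validity of FS.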

Fine-grained domain generalization (FGDG) is a more challenging task than traditional DG tasks due to its small inter-class variations and relatively large intra-class disparities. When domain distribution changes, the vulnerability of subtle features leads to a severe deterioration in model performance. Nevertheless, humans inherently demonstrate the capacity for generalizing to out-of-distribution data, leveraging structured multi-granularity knowledge that emerges from discerning the commonality and specificity within categories. Likewise, we propose a Feature Structuralized Domain Generalization (FSDG) model, wherein features experience structuralization into common, specific, and confounding segments, harmoniously aligned with their relevant semantic concepts, to elevate performance in FGDG. Specifically, feature structuralization (FS) is accomplished through joint optimization of five constraints: a decorrelation function applied to disentangled segments, three constraints ensuring common feature consistency and specific feature distinctiveness, and a prediction calibration term. By imposing these stipulations, FSDG is prompted to disentangle and align features based on multi-granularity knowledge, facilitating robust subtle distinctions among categories. Extensive experimentation on three benchmarks consistently validates the superiority of FSDG over state-of-the-art counterparts, with an average improvement of 6.2% in FGDG performance. Beyond that, the explainability analysis on explicit concept matching intensity between the shared concepts among categories and the model channels, along with experiments on various mainstream model architectures, substantiates the validity of FS.

Translated: 2024-06-19 01:41:06 Published: 2024-06-17

# Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions ( http://arxiv.org/abs/2406.09264v2 )

License: check the linked page

Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens,

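(Reference translation) Recent advances in general-purpose AI have highlighted the importance of guiding AI systems toward the intended goals, ethical principles, and values of individuals and groups, a concept broadly known as alignment. However, the lack of clear definitions and scopes for human-AI alignment is a significant obstacle that hampers collaborative efforts across research domains to achieve it. In particular, ML- and philosophy-oriented alignment research often treats AI alignment as a static, unidirectional process (i.e., ensuring that the objectives of AI systems match those of humans) rather than as an ongoing, mutual alignment problem [429]. This view largely neglects long-term interaction and the dynamic changes of alignment. To understand these gaps, we present a systematic review of more than 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), and Machine Learning (ML). We characterize, define, and scope human-AI alignment. On this basis, we present a conceptual framework of "Bidirectional Human-AI Alignment" that organizes the literature from a human-centered perspective. The framework covers both 1) conventional studies of aligning AI to humans, which ensure that AI produces the intended outcomes determined by humans, and 2) a proposed concept of aligning humans to AI, which aims to help individuals and society adjust to AI advances both cognitively and behaviorally. We further describe the key findings of the literature analysis, including discussions of human values, interaction techniques, and evaluations. To pave the way for future work, we identify three key challenges for future directions and propose examples of potential solutions.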

Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. In particular, ML- and philosophy-oriented alignment research often views AI alignment as a static, unidirectional process (i.e., aiming to ensure that AI systems' objectives match humans) rather than an ongoing, mutual alignment problem [429]. This perspective largely neglects the long-term interaction and dynamic changes of alignment. To understand these gaps, we introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML), and others. We characterize, define and scope human-AI alignment. From this, we present a conceptual framework of "Bidirectional Human-AI Alignment" to organize the literature from a human-centered perspective. This framework encompasses both 1) conventional studies of aligning AI to humans that ensures AI produces the intended outcomes determined by humans, and 2) a proposed concept of aligning humans to AI, which aims to help individuals and society adjust to AI advancements both cognitively and behaviorally. Additionally, we articulate the key findings derived from literature analysis, including discussions about human values, interaction techniques, and evaluations. To pave the way for future studies, we envision three key challenges for future directions and propose examples of potential future solutions.

Translated: 2024-06-19 01:41:06 Published: 2024-06-17

# Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA

Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA ( http://arxiv.org/abs/2406.09396v2 )

License: check the linked page

Jongwoo Park, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryoo, Donghyun Kim, Michael S. Ryoo,

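(Reference translation) Long-form videos that span wide temporal intervals are highly redundant in information and contain multiple distinct events or entities that are often only loosely related. Consequently, in long-form video question answering (LVQA), all the information needed to produce a correct response is often contained in a small subset of frames. Recent literature explores the use of large language models (LLMs) on LVQA benchmarks and achieves exceptional performance, while relying on vision language models (VLMs) to convert all the visual content of a video into natural language. Such VLMs often independently caption a large number of frames sampled uniformly from long videos, which is inefficient and largely redundant. Questioning these choices, we explore optimal strategies for key-frame selection and sequence-aware captioning that can significantly reduce this redundancy. We propose two novel approaches, a Hierarchical Keyframe Selector and a Sequential Visual LLM, each improving one of these aspects. The resulting framework, termed LVNet, achieves state-of-the-art performance across three benchmark LVQA datasets. Our code will be released publicly.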

Long-form videos that span wide temporal intervals are highly information-redundant and contain multiple distinct events or entities that are often loosely related. Therefore, when performing long-form video question answering (LVQA), all information necessary to generate a correct response can often be contained within a small subset of frames. Recent literature explores the use of large language models (LLMs) in LVQA benchmarks, achieving exceptional performance, while relying on vision language models (VLMs) to convert all visual content within videos into natural language. Such VLMs often independently caption a large number of frames uniformly sampled from long videos, which is inefficient and mostly redundant. Questioning these design choices, we explore optimal strategies for key-frame selection and sequence-aware captioning that can significantly reduce these redundancies. We propose two novel approaches, namely the Hierarchical Keyframe Selector and the Sequential Visual LLM, each of which improves one of these aspects. Our resulting framework, termed LVNet, achieves state-of-the-art performance across three benchmark LVQA datasets. Our code will be released publicly.
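
To make the contrast with uniform sampling concrete, here is a generic two-stage key-frame selection sketch. It is not LVNet's Hierarchical Keyframe Selector, whose details the abstract does not give. It assumes that a per-frame relevance score to the question is already available (a hypothetical input), keeps the best frame of each temporal segment, and then keeps the highest-scoring of those.

```python
import numpy as np

def select_keyframes(scores, num_segments, top_k):
    scores = np.asarray(scores, dtype=float)
    # Stage 1: split the frame indices into contiguous temporal segments
    # and keep the best-scoring frame of each non-empty segment.
    segments = np.array_split(np.arange(len(scores)), num_segments)
    candidates = np.array(
        [seg[np.argmax(scores[seg])] for seg in segments if len(seg) > 0]
    )
    # Stage 2: keep the top_k candidates overall, returned in temporal order.
    order = np.argsort(scores[candidates])[::-1][:top_k]
    return np.sort(candidates[order])
```

With scores `[0.1, 0.9, 0.2, 0.3, 0.05, 0.8, 0.4, 0.7]`, four segments, and `top_k=2`, the per-segment winners are frames 1, 3, 5, and 7, and the function returns frames 1 and 5. Only the selected frames would then be passed to a captioning model, instead of every uniformly sampled frame.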

Translated: 2024-06-19 01:31:17 Published: 2024-06-17

# Requirements are All You Need: From Requirements to Code with LLMs

Requirements are All You Need: From Requirements to Code with LLMs ( http://arxiv.org/abs/2406.10101v2 )

License: check the linked page

Bingyang Wei,

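(Reference translation) The widespread use of textual formats in the documentation of software requirements offers a great opportunity to apply large language models (LLMs) to software engineering tasks. High-quality software requirements not only strengthen the manual software development process but also position organizations to take full advantage of emerging LLM technology. This paper introduces a tailored LLM that automates the generation of code snippets from well-structured requirements documents. The LLM is augmented with knowledge, heuristics, and instructions relevant to the software development process, requirements analysis, object-oriented design, and test-driven development, so that it effectively emulates the expertise of an experienced software engineer. We introduce a "Progressive Prompting" method that lets software engineers work with this LLM step by step. With this approach, the LLM tackles software development tasks incrementally: it interprets the provided requirements to extract functional requirements, uses them to create object-oriented models, and then generates unit tests and code based on the object-oriented designs. Through a case study on the development of a web project, we demonstrate the LLM's proficiency in understanding intricate user requirements and producing robust design and code solutions. The study highlights the potential of integrating LLMs into the software development workflow to substantially improve both efficiency and quality. The tailored LLM is available at https://chat.openai.com/g/g-bahoiKzkB-software-engineer-gpt.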

The pervasive use of textual formats in the documentation of software requirements presents a great opportunity for applying large language models (LLMs) to software engineering tasks. High-quality software requirements not only enhance the manual software development process but also position organizations to fully harness the potential of emerging LLM technology. This paper introduces a tailored LLM for automating the generation of code snippets from well-structured requirements documents. This LLM is augmented with knowledge, heuristics, and instructions that are pertinent to the software development process, requirements analysis, object-oriented design, and test-driven development, effectively emulating the expertise of a seasoned software engineer. We introduce a "Progressive Prompting" method that allows software engineers to engage with this LLM in a stepwise manner. Through this approach, the LLM incrementally tackles software development tasks by interpreting the provided requirements to extract functional requirements, using these to create object-oriented models, and subsequently generating unit tests and code based on the object-oriented designs. We demonstrate the LLM's proficiency in comprehending intricate user requirements and producing robust design and code solutions through a case study focused on the development of a web project. This study underscores the potential of integrating LLMs into the software development workflow to significantly enhance both efficiency and quality. The tailored LLM is available at https://chat.openai.com/g/g-bahoiKzkB-software-engineer-gpt.

Translated: 2024-06-19 01:31:17 Published: 2024-06-17

# Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models ( http://arxiv.org/abs/2406.10162v2 )

License: check the linked page

Carson Denison, Monte MacDiarmid, Fazl Barez, David Duvenaud, Shauna Kravec, Samuel Marks, Nicholas Schiefer, Ryan Soklaski, Alex Tamkin, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman, Ethan Perez, Evan Hubinger,

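(Reference translation) In reinforcement learning, specification gaming occurs when AI systems learn undesired behaviors that are highly rewarded because of misspecified training goals. Specification gaming ranges from simple behaviors such as sycophancy to sophisticated and pernicious behaviors such as reward tampering, in which a model directly modifies its own reward mechanism. These more pernicious behaviors, however, may be too complex to be discovered through exploration. This paper studies whether large language model (LLM) assistants that find easily discovered forms of specification gaming generalize to rarer and more blatant forms, up to and including reward tampering. We construct a curriculum of increasingly sophisticated gameable environments and find that training on early-curriculum environments leads to more specification gaming in the remaining environments. Strikingly, in a small but non-negligible fraction of cases, LLM assistants trained on the full curriculum generalize zero-shot to directly rewriting their own reward function. Retraining an LLM not to game early-curriculum environments mitigates, but does not eliminate, reward tampering in later environments. Moreover, adding harmlessness training to the gameable environments does not prevent reward tampering. These results show that LLMs can generalize from common forms of specification gaming to more pernicious reward tampering, and that such behavior may be nontrivial to remove.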

In reinforcement learning, specification gaming occurs when AI systems learn undesired behaviors that are highly rewarded due to misspecified training goals. Specification gaming can range from simple behaviors like sycophancy to sophisticated and pernicious behaviors like reward-tampering, where a model directly modifies its own reward mechanism. However, these more pernicious behaviors may be too complex to be discovered via exploration. In this paper, we study whether Large Language Model (LLM) assistants which find easily discovered forms of specification gaming will generalize to perform rarer and more blatant forms, up to and including reward-tampering. We construct a curriculum of increasingly sophisticated gameable environments and find that training on early-curriculum environments leads to more specification gaming on remaining environments. Strikingly, a small but non-negligible proportion of the time, LLM assistants trained on the full curriculum generalize zero-shot to directly rewriting their own reward function. Retraining an LLM not to game early-curriculum environments mitigates, but does not eliminate, reward-tampering in later environments. Moreover, adding harmlessness training to our gameable environments does not prevent reward-tampering. These results demonstrate that LLMs can generalize from common forms of specification gaming to more pernicious reward tampering and that such behavior may be nontrivial to remove.

Translated: 2024-06-19 01:31:17 Published: 2024-06-17

# From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models

From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models ( http://arxiv.org/abs/2406.11106v1 )

License: check the linked page

Harsh Nishant Lalai, Aashish Anantha Ramakrishnan, Raj Sanjay Shah, Dongwon Lee,

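(Reference translation) With the rapid growth of large language models (LLMs), protecting textual content against unauthorized use is crucial. Text watermarking offers a vital solution that protects both LLM-generated and plain-text sources. This paper gives a unified overview of the different perspectives behind the design of watermarking techniques through a comprehensive survey of the research literature. Our work has two key advantages: (1) we analyze research according to the specific intentions behind different watermarking techniques, the evaluation datasets used, and the watermark addition and removal methods, in order to construct a cohesive taxonomy; (2) we highlight the gaps and open challenges in text watermarking to promote research on protecting text authorship. This broad coverage and detailed analysis offer valuable insights into the evolving landscape of text watermarking for language models.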

With the rapid growth of Large Language Models (LLMs), safeguarding textual content against unauthorized use is crucial. Text watermarking offers a vital solution, protecting both LLM-generated and plain-text sources. This paper presents a unified overview of different perspectives behind designing watermarking techniques, through a comprehensive survey of the research literature. Our work has two key advantages: (1) we analyze research based on the specific intentions behind different watermarking techniques, the evaluation datasets used, and the watermark addition and removal methods to construct a cohesive taxonomy; (2) we highlight the gaps and open challenges in text watermarking to promote research in protecting text authorship. This extensive coverage and detailed analysis set our work apart, offering valuable insights into the evolving landscape of text watermarking in language models.

Translated: 2024-06-18 18:53:41 Published: 2024-06-17

# Exploring Safety-Utility Trade-Offs in Personalized Language Models

Exploring Safety-Utility Trade-Offs in Personalized Language Models ( http://arxiv.org/abs/2406.11107v1 )

License: check the linked page

Anvesh Rao Vijjini, Somnath Basu Roy Chowdhury, Snigdha Chaturvedi,

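(Reference translation) As large language models (LLMs) become increasingly integrated into everyday applications, it is essential to ensure that they operate fairly across diverse user demographics. In this work, we show that LLMs suffer from personalization bias: their performance is affected when they are personalized to a user's identity. We quantify personalization bias by evaluating LLM performance along two axes, safety and utility. We measure safety by examining how benign LLM responses to unsafe prompts are, with and without personalization. We measure utility by evaluating LLM performance on various tasks, including general knowledge, mathematical abilities, programming, and reasoning skills. We find that a range of LLMs, from open-source models such as Llama (Touvron et al., 2023) and Mistral (Jiang et al., 2023) to API-based models such as GPT-3.5 and GPT-4o (Ouyang et al., 2022), show significant variance in performance in terms of safety-utility trade-offs depending on the user's identity. Finally, we discuss several strategies for mitigating personalization bias using preference tuning and prompt-based defenses.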

As large language models (LLMs) become increasingly integrated into daily applications, it is essential to ensure they operate fairly across diverse user demographics. In this work, we show that LLMs suffer from personalization bias, where their performance is impacted when they are personalized to a user's identity. We quantify personalization bias by evaluating the performance of LLMs along two axes - safety and utility. We measure safety by examining how benign LLM responses are to unsafe prompts with and without personalization. We measure utility by evaluating the LLM's performance on various tasks, including general knowledge, mathematical abilities, programming, and reasoning skills. We find that various LLMs, ranging from open-source models like Llama (Touvron et al., 2023) and Mistral (Jiang et al., 2023) to API-based ones like GPT-3.5 and GPT-4o (Ouyang et al., 2022), exhibit significant variance in performance in terms of safety-utility trade-offs depending on the user's identity. Finally, we discuss several strategies to mitigate personalization bias using preference tuning and prompt-based defenses.

Translated: 2024-06-18 18:53:41 Published: 2024-06-17

# Investigating Annotator Bias in Large Language Models for Hate Speech Detection

Investigating Annotator Bias in Large Language Models for Hate Speech Detection ( http://arxiv.org/abs/2406.11109v1 )

License: check the linked page

Amit Das, Zheng Zhang, Fatemeh Jamshidi, Vinija Jain, Aman Chadha, Nilanjana Raychawdhary, Mary Sandage, Lauramarie Pope, Gerry Dozier, Cheryl Seals,

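(Reference translation) Data annotation, the practice of assigning descriptive labels to raw data, is pivotal for optimizing the performance of machine learning models. It is, however, a resource-intensive process that is susceptible to biases introduced by annotators. The emergence of sophisticated large language models (LLMs) such as ChatGPT offers a unique opportunity to modernize and streamline this complex procedure. While existing research extensively evaluates the efficacy of LLMs as annotators, this paper examines the biases present in LLMs, specifically GPT 3.5 and GPT 4o, when annotating hate speech data. Our research contributes to the understanding of biases in four key categories: gender, race, religion, and disability. We analyze annotator biases by specifically targeting highly vulnerable groups within these categories. We also comprehensively examine potential factors contributing to these biases by scrutinizing the annotated data. To conduct this research, we introduce our custom hate speech detection dataset, HateSpeechCorpus. We additionally run the same experiments on the ETHOS (Mollas et al., 2022) dataset for comparative analysis. This paper serves as an important resource that guides researchers and practitioners in harnessing the potential of LLMs for data annotation. The HateSpeechCorpus dataset is available here: https://github.com/AmitDasRup123/HateSpeechCorpus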

Data annotation, the practice of assigning descriptive labels to raw data, is pivotal in optimizing the performance of machine learning models. However, it is a resource-intensive process susceptible to biases introduced by annotators. The emergence of sophisticated Large Language Models (LLMs) like ChatGPT presents a unique opportunity to modernize and streamline this complex procedure. While existing research extensively evaluates the efficacy of LLMs as annotators, this paper delves into the biases present in LLMs, specifically GPT 3.5 and GPT 4o, when annotating hate speech data. Our research contributes to understanding biases in four key categories: gender, race, religion, and disability. Specifically targeting highly vulnerable groups within these categories, we analyze annotator biases. Furthermore, we conduct a comprehensive examination of potential factors contributing to these biases by scrutinizing the annotated data. We introduce our custom hate speech detection dataset, HateSpeechCorpus, to conduct this research. Additionally, we perform the same experiments on the ETHOS (Mollas et al., 2022) dataset for comparative analysis. This paper serves as a crucial resource, guiding researchers and practitioners in harnessing the potential of LLMs for data annotation, thereby fostering advancements in this critical field. The HateSpeechCorpus dataset is available here: https://github.com/AmitDasRup123/HateSpeechCorpus

Translated: 2024-06-18 18:53:41 Published: 2024-06-17

# How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD

How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD ( http://arxiv.org/abs/2406.11110v1 )

License: check the linked page

Pierfrancesco Beneventano, Andrea Pinto, Tomaso Poggio,

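(Reference translation) We investigate the ability of deep neural networks to identify the support of the target function. Our findings show that mini-batch SGD effectively learns the support in the first layer of the network by shrinking to zero the weights associated with irrelevant components of the input. In contrast, vanilla GD also approximates the target function but requires an explicit regularization term to learn the support in the first layer. We prove that this property of mini-batch SGD is due to a second-order implicit regularization effect that is proportional to $\eta / b$ (step size / batch size). Our results are not only further evidence that implicit regularization has a significant impact on training optimization dynamics, but also shed light on the structure of the features learned by the network. They further suggest that smaller batches enhance feature interpretability and reduce dependence on initialization.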

We investigate the ability of deep neural networks to identify the support of the target function. Our findings reveal that mini-batch SGD effectively learns the support in the first layer of the network by shrinking to zero the weights associated with irrelevant components of input. In contrast, we demonstrate that while vanilla GD also approximates the target function, it requires an explicit regularization term to learn the support in the first layer. We prove that this property of mini-batch SGD is due to a second-order implicit regularization effect which is proportional to $\eta / b$ (step size / batch size). Our results are not only another proof that implicit regularization has a significant impact on training optimization dynamics but they also shed light on the structure of the features that are learned by the network. Additionally, they suggest that smaller batches enhance feature interpretability and reduce dependency on initialization.
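
The quantity the abstract refers to, the first-layer weights attached to each input coordinate, can be inspected with a small experiment harness. The sketch below is an illustration only: the two-layer tanh network, the squared loss, and all hyperparameters are assumptions, and the code does not reproduce the paper's theorem or experiments. Setting `batch_size` to the dataset size gives full-batch GD, so the same function covers both regimes and lets one vary $\eta / b$.

```python
import numpy as np

def train_two_layer(X, y, lr, batch_size, steps, hidden=16, seed=0):
    # Train f(x) = tanh(x @ W1) @ w2 with mini-batch SGD on the squared loss.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, hidden))
    w2 = rng.normal(scale=0.5, size=hidden)
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        xb, yb = X[idx], y[idx]
        h = np.tanh(xb @ W1)              # hidden activations, shape (batch, hidden)
        resid = h @ w2 - yb               # prediction error, shape (batch,)
        # Gradients of 0.5 * mean(resid ** 2) with respect to w2 and W1.
        grad_w2 = h.T @ resid / batch_size
        grad_pre = np.outer(resid, w2) * (1.0 - h ** 2)
        grad_W1 = xb.T @ grad_pre / batch_size
        W1 -= lr * grad_W1
        w2 -= lr * grad_w2
    return W1, w2

def first_layer_row_norms(W1):
    # Norm of the first-layer weights attached to each input coordinate.
    return np.linalg.norm(W1, axis=1)
```

A typical use is to build inputs where the target depends on only a few coordinates, for example `y = np.sin(X[:, 0]) + X[:, 1]` with six input dimensions, train once with a small batch and once with the full batch, and compare `first_layer_row_norms` on the four irrelevant coordinates. Whether a gap is visible at this toy scale depends on the step size, training length, and initialization.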

Translated: 2024-06-18 18:53:41 Published: 2024-06-17

# Universal bound on Ergotropy and No-Go Theorem by the Eigenstate Thermalization Hypothesis

Universal bound on Ergotropy and No-Go Theorem by the Eigenstate Thermalization Hypothesis ( http://arxiv.org/abs/2406.11112v1 )

License: check the linked page

Akihiro Hokkyo, Masahito Ueda,

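(Reference translation) We show that the maximum extractable work (ergotropy) from a quantum many-body system is constrained by the local athermality of the initial state and by the local entropy decrease brought about by quantum operations. The resulting universal bound on ergotropy implies that the eigenstate thermalization hypothesis prohibits work extraction from energy eigenstates by means of finite-time unitary operations. This no-go property implies that Planck's principle, a form of the second law of thermodynamics, holds even for pure quantum states. Our result bridges two independently studied concepts of quantum thermodynamics, the second law and thermalization, through intrasystem correlations in many-body systems as a resource for work extraction.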

We show that the maximum extractable work (ergotropy) from a quantum many-body system is constrained by local athermality of an initial state and local entropy decrease brought about by quantum operations. The obtained universal bound on ergotropy implies that the eigenstate thermalization hypothesis prohibits work extraction from energy eigenstates by means of finite-time unitary operations. This no-go property implies that Planck's principle, a form of the second law of thermodynamics, holds even for pure quantum states. Our result bridges two independently studied concepts of quantum thermodynamics, the second law and thermalization, via intrasystem correlations in many-body systems as a resource for work extraction.

翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
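
For readers unfamiliar with ergotropy, the sketch below computes the standard textbook quantity: the initial energy minus the energy of the passive state, in which the largest populations are placed on the lowest energy levels. It assumes a state that is diagonal in the energy eigenbasis and allows arbitrary unitaries, so it illustrates the definition only. It does not implement the paper's bound, which concerns finite-time operations on many-body systems. The function name and the two-level example are assumptions made for illustration.

```python
import math


def ergotropy_diagonal(populations, energies):
    """Ergotropy of a state that is diagonal in the energy eigenbasis.

    populations[i] is the occupation of the level with energy energies[i].
    The passive state pairs the largest populations with the lowest energies.
    """
    if len(populations) != len(energies):
        raise ValueError("populations and energies must have the same length")
    initial_energy = sum(p * e for p, e in zip(populations, energies))
    passive_energy = sum(
        p * e
        for p, e in zip(sorted(populations, reverse=True), sorted(energies))
    )
    return initial_energy - passive_energy


# Hypothetical two-level system with energies 0 and 1.
levels = [0.0, 1.0]

# Fully excited state: all of the energy can be extracted by a level swap.
print(ergotropy_diagonal([0.0, 1.0], levels))

# Gibbs state at inverse temperature beta: already passive, nothing to extract.
beta = 2.0
weights = [math.exp(-beta * e) for e in levels]
gibbs = [w / sum(weights) for w in weights]
print(ergotropy_diagonal(gibbs, levels))
```

With unrestricted unitaries the excited eigenstate yields nonzero work. The abstract's no-go statement is about what remains possible when the operation is restricted to finite time in a system obeying the eigenstate thermalization hypothesis.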

# テキストグラフト:テキスト分類におけるマイノリティクラスのための近分布弱スーパービジョン

Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification ( http://arxiv.org/abs/2406.11115v1 )

ライセンス: Link先を確認

Letian Peng, Yi Gu, Chengyu Dong, Zihan Wang, Jingbo Shang,

(参考訳) 極めて弱い教師付きテキスト分類において、先駆的な研究は、生のコーパスからクラス名に似たテキストをマイニングすることで擬似ラベルを生成するが、マイノリティクラスではサンプルが非常に少ない、あるいは全く得られない場合がある。最近の研究は、クラス名や定義を用いてLLMにプロンプトを与え、関連テキストを生成し始めたが、LLMが分布内(すなわち、テキスト分類器が適用されるコーパスに類似した)データを生成できないリスクが高く、一般化できない分類器につながる。本稿では,これら2つのアプローチの利点を組み合わせ,マイノリティクラスに対してクリーンかつ近分布の弱い教師信号を得ることを目的とした新しいフレームワーク text grafting(テキストグラフト)によってこのギャップを埋めることを提案する。具体的には、まずLLMベースのロジットを用いて、対象のマイノリティクラスへのデータ合成の可能性が高いマスク付きテンプレートを生コーパスからマイニングする。次に、最先端のLLMでテンプレートを埋め、マイノリティクラスに該当する近分布テキストを合成する。テキストグラフトは、マイノリティクラスにおいて直接的なマイニングや合成よりも大幅な改善を示す。また,テキストグラフトの性質を理解するために分析とケーススタディを行った。

For extremely weakly supervised text classification, pioneering research generates pseudo labels by mining texts similar to the class names from the raw corpus, which may end up with very limited or even no samples for the minority classes. Recent works have started to generate the relevant texts by prompting LLMs using the class names or definitions; however, there is a high risk that LLMs cannot generate in-distribution (i.e., similar to the corpus where the text classifier will be applied) data, leading to ungeneralizable classifiers. In this paper, we combine the advantages of these two approaches and propose to bridge the gap via a novel framework, text grafting, which aims to obtain clean and near-distribution weak supervision for minority classes. Specifically, we first use LLM-based logits to mine masked templates from the raw corpus, which have a high potential for data synthesis into the target minority class. Then, the templates are filled by state-of-the-art LLMs to synthesize near-distribution texts falling into minority classes. Text grafting shows significant improvement over direct mining or synthesis on minority classes. We also use analysis and case studies to comprehend the property of text grafting.

翻訳日:2024-06-18 18:53:41 公開日:2024-06-17

# ChatGPTにおける文法性の表現 : 言語学者および一般人との比較

Grammaticality Representation in ChatGPT as Compared to Linguists and Laypeople ( http://arxiv.org/abs/2406.11116v1 )

ライセンス: Link先を確認

Zhuang Qiu, Xufeng Duan, Zhenguang G. Cai,

(参考訳) 大規模言語モデル (LLM) は様々な言語課題において卓越した性能を示してきた。しかし、LLMが人間のようなきめ細かな文法的直観を獲得しているかどうかは不明である。この事前登録された研究 (https://osf.io/t5nes) は、ChatGPTの文法的直観に関する初の大規模調査であり、言語学者が文法的、非文法的、または境界的に文法的と判断した148の言語現象について一般人の文法性判断を収集した先行研究 (Sprouse, Schutze, & Almeida, 2013) に基づいている。我々の主な焦点は、これらの言語構文の判断において、ChatGPTを一般人および言語学者の両方と比較することであった。実験1では、ChatGPTは与えられた参照文に基づいて文に評価値を割り当てた。実験2では7段階尺度で文を評価し,実験3ではChatGPTに対して,文のペアからより文法的な文を選択するよう求めた。全体として,ChatGPTと言語学者の間の収束率は73%から95%の範囲であり,全体の点推定値は89%であった。また,全てのタスクにおいてChatGPTと一般人の間に有意な相関が認められたが,相関の強さはタスクによって異なった。我々はこれらの結果を、判断タスクの心理測定的性質と、人間とLLMの言語処理スタイルの違いに帰する。

Large language models (LLMs) have demonstrated exceptional performance across various linguistic tasks. However, it remains uncertain whether LLMs have developed human-like fine-grained grammatical intuition. This preregistered study (https://osf.io/t5nes) presents the first large-scale investigation of ChatGPT's grammatical intuition, building upon a previous study that collected laypeople's grammatical judgments on 148 linguistic phenomena that linguists judged to be grammatical, ungrammatical, or marginally grammatical (Sprouse, Schutze, & Almeida, 2013). Our primary focus was to compare ChatGPT with both laypeople and linguists in the judgment of these linguistic constructions. In Experiment 1, ChatGPT assigned ratings to sentences based on a given reference sentence. Experiment 2 involved rating sentences on a 7-point scale, and Experiment 3 asked ChatGPT to choose the more grammatical sentence from a pair. Overall, our findings demonstrate convergence rates ranging from 73% to 95% between ChatGPT and linguists, with an overall point-estimate of 89%. Significant correlations were also found between ChatGPT and laypeople across all tasks, though the correlation strength varied by task. We attribute these results to the psychometric nature of the judgment tasks and the differences in language processing styles between humans and LLMs.

翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
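
The abstract reports convergence rates between ChatGPT and linguists without spelling out how they are computed. One plausible reading, sketched below under that assumption, is the fraction of sentence pairs in which the model's ratings point in the same direction as the linguists' judgment. The function name, the tuple layout, the handling of ties, and the example ratings are all hypothetical.

```python
def convergence_rate(items):
    """Fraction of sentence pairs where the model agrees with the experts.

    Each item is (expert_prefers_first, rating_first, rating_second).
    A tie in the model's ratings is counted as a disagreement.
    """
    if not items:
        raise ValueError("need at least one item")
    agreements = sum(
        1
        for expert_prefers_first, rating_first, rating_second in items
        if rating_first != rating_second
        and (rating_first > rating_second) == expert_prefers_first
    )
    return agreements / len(items)


# Hypothetical ratings on a 7-point scale, for illustration only.
example_items = [
    (True, 6.5, 2.0),   # model agrees with the experts
    (True, 5.0, 5.5),   # model disagrees
    (False, 1.5, 6.0),  # model agrees
    (False, 3.0, 4.0),  # model agrees
]

print(convergence_rate(example_items))
```

A direction-based measure like this ignores how large the rating gap is, which is one reason the same data can give different agreement figures across the three experimental tasks.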

# 統計的契約による品質テキスト生成のインセンティブ

Incentivizing Quality Text Generation via Statistical Contracts ( http://arxiv.org/abs/2406.11118v1 )

ライセンス: Link先を確認

Eden Saig, Ohad Einav, Inbal Talgam-Cohen,

(参考訳) 大規模言語モデル(LLM)の成功は機械生成テキストの需要を増加させる一方で、現在のトークン単位の従量課金(pay-per-token)の価格体系は、経済学でモラルハザードとして知られるインセンティブの不整合を生み出している。テキスト生成エージェントは、最先端モデルよりも安価なモデルを選ぶことでコストを削減する強いインセンティブを持ち、エージェントは内部で推論を行うため、これは「舞台裏」で行うことができる。本研究は、品質をインセンティブ化するための成果報酬型の契約ベースのフレームワークを提案することで、経済学的な観点からこの問題にアプローチする。エージェントがコストのかかる推論を用いてテキストを生成するプリンシパル・エージェントゲームについて検討し、契約は自動品質評価に従って、テキストに対するプリンシパルの支払いを決定する。内部の推論コストが不明な場合,標準的な契約理論は適用できないため,コストロバスト契約を導入する。主な理論的貢献として、統計学における最適な複合仮説検定との直接的な対応を通じて最適なコストロバスト契約を特徴づけ、Saig et al. (NeurIPS'23) の結果を一般化する。我々は,様々な目的関数とLLM評価ベンチマークに対して契約を導出することでフレームワークを実証的に評価し,コストロバスト契約は,コストを既知とする契約と比べて目的値のわずかな増加を犠牲にするだけであることを見出した。

While the success of large language models (LLMs) increases demand for machine-generated text, current pay-per-token pricing schemes create a misalignment of incentives known in economics as moral hazard: Text-generating agents have a strong incentive to cut costs by preferring a cheaper model over the cutting-edge one, and this can be done "behind the scenes" since the agent performs inference internally. In this work, we approach this issue from an economic perspective, by proposing a pay-for-performance, contract-based framework for incentivizing quality. We study a principal-agent game where the agent generates text using costly inference, and the contract determines the principal's payment for the text according to an automated quality evaluation. Since standard contract theory is inapplicable when internal inference costs are unknown, we introduce cost-robust contracts. As our main theoretical contribution, we characterize optimal cost-robust contracts through a direct correspondence to optimal composite hypothesis tests from statistics, generalizing a result of Saig et al. (NeurIPS'23). We evaluate our framework empirically by deriving contracts for a range of objectives and LLM evaluation benchmarks, and find that cost-robust contracts sacrifice only a marginal increase in objective value compared to their cost-aware counterparts.

翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
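
The moral-hazard argument in the abstract can be illustrated with a toy principal-agent calculation. The sketch below assumes two models with known costs and a binary automated quality check. All numbers, names, and the binary outcome are hypothetical, and the known-cost assumption is precisely what the paper's cost-robust contracts relax, so this is a simplified illustration and not the paper's method.

```python
def expected_utility(model, contract):
    """Agent's expected payment minus inference cost under a contract."""
    p = model["pass_prob"]
    expected_payment = p * contract["pay_pass"] + (1 - p) * contract["pay_fail"]
    return expected_payment - model["cost"]


def best_response(models, contract):
    """Name of the model a utility-maximizing agent would choose."""
    return max(models, key=lambda name: expected_utility(models[name], contract))


# Hypothetical models: the premium one passes the quality check more often.
models = {
    "cheap": {"cost": 1.0, "pass_prob": 0.3},
    "premium": {"cost": 3.0, "pass_prob": 0.9},
}

# A flat payment ignores quality, much like paying per token.
flat_contract = {"pay_pass": 5.0, "pay_fail": 5.0}

# A pay-for-performance contract pays only when the check passes.
performance_contract = {"pay_pass": 6.0, "pay_fail": 0.0}

print(best_response(models, flat_contract))
print(best_response(models, performance_contract))
```

Under the flat payment the agent keeps more by running the cheap model, whereas tying payment to the quality check makes the premium model the better choice for the agent.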

# 時間制約付き身体制御のためのモデル適応

Model Adaptation for Time Constrained Embodied Control ( http://arxiv.org/abs/2406.11128v1 )

ライセンス: Link先を確認

Jaehyun Song, Minjong Yoo, Honguk Woo,

(参考訳) 身体性エージェントに深層学習モデルを採用するには,特定のタスクや運用条件に対してモデル構造を最適化する必要がある。このような最適化は、モデル圧縮のように静的なものと、適応推論のように動的なものがある。しかし、これらの手法は、それぞれ異なる推論遅延の制限を持つ複数のタスクに対する逐次的な意思決定を必要とする、時間制約下の身体性制御システムについては十分に研究されていない。本稿では,モジュラーモデル適応を用いた時間制約を考慮した身体性制御フレームワークであるMoDeCを提案する。資源および時間の制限に関する様々な運用条件へのモデル適応を、モジュールネットワーク上の動的ルーティングとして定式化し、これらの条件をマルチタスク目的の一部として組み込む。複数の視覚ベースの身体性環境における評価により,MoDeCのロバスト性を示し,ロボット操作と自律運転の応用において,性能と時間制約の遵守の両面で他のモデル適応手法よりも優れていることを示す。

When adopting a deep learning model for embodied agents, it is required that the model structure be optimized for specific tasks and operational conditions. Such optimization can be static, such as model compression, or dynamic, such as adaptive inference. Yet, these techniques have not been fully investigated for embodied control systems subject to time constraints, which necessitate sequential decision-making for multiple tasks, each with distinct inference latency limitations. In this paper, we present MoDeC, a time constraint-aware embodied control framework using modular model adaptation. We formulate model adaptation to varying operational conditions on resource and time restrictions as dynamic routing on a modular network, incorporating these conditions as part of multi-task objectives. Our evaluation across several vision-based embodied environments demonstrates the robustness of MoDeC, showing that it outperforms other model adaptation methods in both performance and adherence to time constraints in robotic manipulation and autonomous driving applications.

翻訳日:2024-06-18 18:53:41 公開日:2024-06-17

# ニューラル系統

Neural Lineage ( http://arxiv.org/abs/2406.11129v1 )

ライセンス: Link先を確認

Runpeng Yu, Xinchao Wang,

(参考訳) 良好に動作するニューラルネットワークが与えられたとき、それがどの親モデルに基づいてチューニングされたのかを特定することは可能だろうか? 本稿では,親モデルと子モデルの間の系統関係の発見を目的とした、ニューラル系統検出という新しいタスクを提案する。具体的には、ニューラル系統検出は、親モデルの集合の中から、子モデルがどの親モデルから微調整されたかを予測する。この課題に対処するための2つのアプローチを提案する。 (1) 実用上の利便性のために, 微調整過程の近似をニューラルネットワーク表現の類似度指標に統合した学習不要なアプローチを導入し, 類似度に基づく系統検出手法を導く。 (2) 精度の追求のために, エンコーダとTransformer検出器からなる学習ベースの系統検出器を導入する。実験を通じて,提案する学習不要および学習ベースの手法が様々な学習設定においてベースラインよりも優れており,様々な視覚モデルに適応可能であることを検証した。さらに、親モデルだけでなく祖先も識別し、世代をまたぐ系統を辿る能力も示している。

Given a well-behaved neural network, is it possible to identify its parent, based on which it was tuned? In this paper, we introduce a novel task known as neural lineage detection, aiming at discovering lineage relationships between parent and child models. Specifically, from a set of parent models, neural lineage detection predicts which parent model a child model has been fine-tuned from. We propose two approaches to address this task. (1) For practical convenience, we introduce a learning-free approach, which integrates an approximation of the finetuning process into the neural network representation similarity metrics, leading to a similarity-based lineage detection scheme. (2) For the pursuit of accuracy, we introduce a learning-based lineage detector comprising encoders and a transformer detector. Through experimentation, we have validated that our proposed learning-free and learning-based methods outperform the baseline in various learning settings and are adaptable to a variety of visual models. Moreover, they also exhibit the ability to trace cross-generational lineage, identifying not only parent models but also their ancestors.

翻訳日:2024-06-18 18:53:41 公開日:2024-06-17
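
The task definition in the abstract, choosing which candidate parent a child model was fine-tuned from, can be made concrete with a deliberately simple baseline. The sketch below assumes each model is summarized by a flat list of weights and picks the parent with the smallest Euclidean distance to the child. This plain weight-distance rule is a stand-in chosen for illustration: the paper's learning-free approach instead builds an approximation of the fine-tuning process into representation similarity metrics. The function names and toy weights are hypothetical.

```python
import math


def l2_distance(weights_a, weights_b):
    """Euclidean distance between two equally sized flat weight vectors."""
    if len(weights_a) != len(weights_b):
        raise ValueError("weight vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(weights_a, weights_b)))


def predict_parent(child_weights, parents):
    """Return the name of the candidate parent closest to the child."""
    return min(parents, key=lambda name: l2_distance(child_weights, parents[name]))


# Hypothetical toy weights: the child is a small perturbation of parent_b.
parents = {
    "parent_a": [0.0, 0.0, 0.0],
    "parent_b": [1.0, 1.0, 1.0],
}
child = [0.9, 1.1, 1.0]

print(predict_parent(child, parents))
```

The rule relies on fine-tuning keeping the child close to its parent in weight space, an assumption that weakens as fine-tuning runs longer.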

# 生成的アスペクトベース感情分析のための動的順序テンプレート予測

Dynamic Order Template Prediction for Generative Aspect-Based Sentiment Analysis ( http://arxiv.org/abs/2406.11130v1 )

ライセンス: Link先を確認

Yonghyun Jun, Hwanhee Lee,

(参考訳) アスペクトベースの感情分析(ABSA)は、テキスト内の特定のアスペクトに対する感情を評価し、詳細な感情タプルを出力する。従来のABSAモデルは、しばしば静的テンプレートを使用してタプル内のすべての要素を予測するが、これらのモデルは要素間の依存関係を正確に捉えられないことが多い。マルチビュープロンプト法は,様々なテンプレートでタプルを予測し,結果をアンサンブルすることでABSAの性能を向上させる。しかし、この方法は非効率性と分布外エラーに悩まされる。本稿では,インスタンスレベルのエントロピーに基づいて,各インスタンスに必要なビューを動的に生成するABSAのための動的順序テンプレート(DOT)手法を提案する。多様で関連性の高いビュー生成を確保することで,提案手法はASQPおよびACOSデータセットのF1スコアを改善するとともに,推論時間を大幅に短縮する。

Aspect-based sentiment analysis (ABSA) assesses sentiments towards specific aspects within texts, resulting in detailed sentiment tuples. Previous ABSA models often use static templates to predict all of the elements in the tuples, and these models often fail to accurately capture dependencies between elements. The multi-view prompting method improves the performance of ABSA by predicting tuples with various templates and then ensembling the results. However, this method suffers from inefficiencies and out-of-distribution errors. In this paper, we propose a Dynamic Order Template (DOT) method for ABSA, which dynamically generates necessary views for each instance based on instance-level entropy. By ensuring diverse and relevant view generation, our proposed method improves F1-scores on ASQP and ACOS datasets while significantly reducing inference time.

翻訳日:2024-06-18 18:53:41 公開日:2024-06-17

# 大規模言語モデルは分類体系(タクソノミー)のよい代替か?

Are Large Language Models a Good Replacement of Taxonomies? ( http://arxiv.org/abs/2406.11131v1 )

ライセンス: Link先を確認

Yushi Sun, Hao Xin, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, Lei Chen,

(参考訳) 大規模言語モデル(LLM)は、知識を内部化し、自然言語の質問に答える優れた能力を示している。先行研究は、LLMが一般的な知識ではうまく機能する一方で、ロングテールの細かな知識では性能が低いことを確認しているが、従来の知識グラフをLLMに置き換えるべきかどうかについて、コミュニティは依然として確信を持てずにいる。本稿では,LLMによって知識グラフのスキーマ(すなわち分類体系)が時代遅れになるかどうかを問う。直感的には、LLMは一般的な分類体系や、人々にとって馴染みのある分類レベルではうまく機能するはずである。しかし、一般的なドメインから専門的なドメインまで、またルートからリーフまでのレベルにわたる幅広い分類体系でLLMを評価する包括的なベンチマークが存在しないため、確信をもって結論を出すことができない。この研究ギャップを狭めるため,分類体系に対するLLMの性能を評価する新しい分類階層構造発見ベンチマーク TaxoGlimpse を構築した。 TaxoGlimpseは、一般的なドメインから専門的なドメインまで10の代表的な分類体系を網羅し、ルートからリーフまでの各レベルのエンティティについて詳細な実験を行う。3つのプロンプト設定の下での最先端LLM 18種に対する包括的な実験から, LLMは依然として専門的な分類体系やリーフレベルのエンティティの知識を十分に捉えられないことが確認された。

Large language models (LLMs) demonstrate an impressive ability to internalize knowledge and answer natural language questions. Although previous studies validate that LLMs perform well on general knowledge while presenting poor performance on long-tail nuanced knowledge, the community is still doubtful about whether the traditional knowledge graphs should be replaced by LLMs. In this paper, we ask if the schema of knowledge graph (i.e., taxonomy) is made obsolete by LLMs. Intuitively, LLMs should perform well on common taxonomies and at taxonomy levels that are common to people. Unfortunately, there is no comprehensive benchmark that evaluates LLMs over a wide range of taxonomies, from common to specialized domains and at levels from root to leaf, so a confident conclusion cannot yet be drawn. To narrow the research gap, we constructed a novel taxonomy hierarchical structure discovery benchmark named TaxoGlimpse to evaluate the performance of LLMs over taxonomies. TaxoGlimpse covers ten representative taxonomies from common to specialized domains with in-depth experiments on entities at different levels of these taxonomies, from root to leaf. Our comprehensive experiments of eighteen state-of-the-art LLMs under three prompting settings validate that LLMs still cannot adequately capture the knowledge of specialized taxonomies and leaf-level entities.

翻訳日:2024-06-18 18:53:41 公開日:2024-06-17

# RePrompt:大規模言語モデルエージェントのための自動プロンプトエンジニアリングによる計画

RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents ( http://arxiv.org/abs/2406.11132v1 )

ライセンス: Link先を確認

Weizhe Chen, Sven Koenig, Bistra Dilkina,

(参考訳) この1年間で、大規模言語モデル(LLM)は従来の自然言語処理以外の領域で顕著な成功を収め、コード生成、旅行計画、ロボット制御といった、より一般的で応用に近い領域でのLLMの利用が探求され始めている。優れた能力を持つLLMを外部ツールと結びつけることで、日常生活のあらゆる作業を支援することを目指す、いわゆるLLMエージェントが構築されつつある。これらすべての領域において、LLMへのプロンプトはLLMの生成内容に大きな違いをもたらし、ひいてはLLMエージェントの性能に影響することが示されている。したがって、自動プロンプトエンジニアリングは多くの研究者やLLMのユーザにとって重要な問題となっている。本稿では, LLMエージェントとの対話から得られるチャット履歴に基づいて, 「勾配降下」を行うことでLLMエージェントのプロンプト内のステップバイステップの指示を最適化する新しい手法 RePrompt を提案する。プロンプトを最適化することで、LLMは特定のドメインでの計画の立て方を学習する。PDDL生成と旅行計画における実験により、更新されたプロンプトを初期プロンプトとして使用すると、提案手法が様々な推論タスクの性能を概ね向上できることを示す。

In the past year, large language models (LLMs) have had remarkable success in domains outside traditional natural language processing, and people are starting to explore the usage of LLMs in more general, application-oriented domains like code generation, travel planning, and robot control. Connecting these LLMs with great capacity and external tools, people are building the so-called LLM agents, which are supposed to help people do all kinds of work in everyday life. In all these domains, the prompt to the LLMs has been shown to make a big difference in what the LLM would generate and thus affect the performance of the LLM agents. Therefore, automatic prompt engineering has become an important question for many researchers and users of LLMs. In this paper, we propose a novel method, RePrompt, which does "gradient descent" to optimize the step-by-step instructions in the prompt of the LLM agents based on the chat history obtained from interactions with LLM agents. By optimizing the prompt, the LLM will learn how to plan in specific domains. We have used experiments in PDDL generation and travel planning to show that our method could generally improve the performance for different reasoning tasks when using the updated prompt as the initial prompt.