Fugu-MT: arXiv Paper Translations

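This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.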

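The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from the use of this site (including all information and translations).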

The papers listed below have a publication date of 2024-06-03.

Title	Authors	Abstract	Publication date / Translation date
# Harmful Suicide Content Detection ( http://arxiv.org/abs/2407.13942v1 ) License: see the linked page	Kyumin Park, Myung Jae Baik, YeongJun Hwang, Yen Shin, HoJae Lee, Ruda Lee, Sang Min Lee, Je Young Hannah Sun, Ah Rah Lee, Si Yeun Yoon, Dong-ho Lee, Jihyung Moon, JinYeong Bak, Kyunghyun Cho, Jong-Woo Paik, Sungjoon Park	Harmful suicide content on the Internet is a significant risk factor inducing suicidal thoughts and behaviors among vulnerable populations. Despite global efforts, existing resources are insufficient, specifically in high-risk regions like the Republic of Korea. Current research mainly focuses on understanding negative effects of such content or suicide risk in individuals, rather than on automatically detecting the harmfulness of content. To fill this gap, we introduce a harmful suicide content detection task for classifying online suicide content into five harmfulness levels. We develop a multi-modal benchmark and a task description document in collaboration with medical professionals, and leverage large language models (LLMs) to explore efficient methods for moderating such content. Our contributions include proposing a novel detection task, a multi-modal Korean benchmark with expert annotations, and suggesting strategies using LLMs to detect illegal and harmful content. Owing to the potential harm involved, we publicize our implementations and benchmark, incorporating an ethical verification process.	Translated: 2024-08-05 01:55:24 Published: 2024-06-03
# Using Artificial Intelligence to Accelerate Collective Intelligence: Policy Synth and Smarter Crowdsourcing ( http://arxiv.org/abs/2407.13960v1 ) License: see the linked page	Róbert Bjarnason, Dane Gambrell, Joshua Lanthier-Welch	In an era characterized by rapid societal changes and complex challenges, institutions' traditional methods of problem-solving in the public sector are increasingly proving inadequate. In this study, we present an innovative and effective model for how institutions can use artificial intelligence to enable groups of people to generate effective solutions to urgent problems more efficiently. We describe a proven collective intelligence method, called Smarter Crowdsourcing, which is designed to channel the collective intelligence of those with expertise about a problem into actionable solutions through crowdsourcing. Then we introduce Policy Synth, an innovative toolkit which leverages AI to make the Smarter Crowdsourcing problem-solving approach more scalable, more effective, and more efficient. Policy Synth is crafted using a human-centric approach, recognizing that AI is a tool to enhance human intelligence and creativity, not replace it. Based on a real-world case study comparing the results of expert crowdsourcing alone with expert sourcing supported by Policy Synth AI agents, we conclude that Smarter Crowdsourcing with Policy Synth presents an effective model for integrating the collective wisdom of human experts and the computational power of AI to enhance and scale up public problem-solving processes. While many existing approaches view AI as a tool to make crowdsourcing and deliberative processes better and more efficient, Policy Synth goes a step further, recognizing that AI can also be used to synthesize the findings from engagements together with research to develop evidence-based solutions and policies. The study offers practical tools and insights for institutions looking to engage communities effectively in addressing urgent societal challenges.	Translated: 2024-08-05 01:55:24 Published: 2024-06-03
# Is computational creativity flourishing on the dead internet? ( http://arxiv.org/abs/2407.17590v1 ) License: see the linked page	Terence Broad	The dead internet theory is a conspiracy theory that states that all interactions and posts on social media are no longer being made by real people, but rather by autonomous bots. While the theory is obviously not true, an increasing number of posts on social media have been made by bots optimised to gain followers and drive engagement on social media platforms. This paper looks at the recent phenomenon of these bots, analysing their behaviour through the lens of computational creativity to investigate the question: is computational creativity flourishing on the dead internet?	Translated: 2024-08-05 01:35:56 Published: 2024-06-03
# Segmentation-Free Guidance for Text-to-Image Diffusion Models ( http://arxiv.org/abs/2407.04800v1 ) License: see the linked page	Kambiz Azarian, Debasmit Das, Qiqi Hou, Fatih Porikli	We introduce segmentation-free guidance, a novel method designed for text-to-image diffusion models like Stable Diffusion. Our method does not require retraining of the diffusion model. At no additional compute cost, it uses the diffusion model itself as an implied segmentation network, hence named segmentation-free guidance, to dynamically adjust the negative prompt for each patch of the generated image, based on the patch's relevance to concepts in the prompt. We evaluate segmentation-free guidance both objectively, using FID, CLIP, IS, and PickScore, and subjectively, through human evaluators. For the subjective evaluation, we also propose a methodology for subsampling the prompts in a dataset like MS COCO-30K to keep the number of human evaluations manageable while ensuring that the selected subset is both representative in terms of content and fair in terms of model performance. The results demonstrate the superiority of our segmentation-free guidance to the widely used classifier-free method. Human evaluators preferred segmentation-free guidance over classifier-free 60% to 19%, with 18% of occasions showing a strong preference. Additionally, PickScore win-rate, a recently proposed metric mimicking human preference, also indicates a preference for our method over classifier-free.	Translated: 2024-07-22 14:29:03 Published: 2024-06-03
# Joint Constellation Shaping Using Gradient Descent Approach for MU-MIMO Broadcast Channel ( http://arxiv.org/abs/2407.07708v1 ) License: see the linked page	Maxime Vaillant, Alix Jeannerot, Jean-Marie Gorce	We introduce a learning-based approach to optimize a joint constellation for a multi-user MIMO broadcast channel ($T$ Tx antennas, $K$ users, each with $R$ Rx antennas), with perfect channel knowledge. The aim of the optimizer (MAX-MIN) is to maximize the minimum mutual information between the transmitter and each receiver, under a sum-power constraint. The proposed optimization method neither requires the transmitter to use superposition coding (SC) or any other linear precoding, nor requires the receiver to use successive interference cancellation (SIC). Instead, the approach designs a joint constellation, optimized such that its projection into the subspace of each receiver $k$ maximizes the minimum mutual information $I(W_k;Y_k)$ between each transmitted binary input $W_k$ and the output signal at the intended receiver $Y_k$. The rates obtained by our method are compared to those achieved with linear precoders.	Translated: 2024-07-22 13:58:01 Published: 2024-06-03
# Toward Efficient Deep Spiking Neuron Networks: A Survey On Compression ( http://arxiv.org/abs/2407.08744v1 ) License: see the linked page	Hui Xie, Ge Yang, Wenjuan Gao	With the rapid development of deep learning, Deep Spiking Neural Networks (DSNNs) have emerged as promising due to their unique spike event processing and asynchronous computation. When deployed on neuromorphic chips, DSNNs offer significant power advantages over Deep Artificial Neural Networks (DANNs) and eliminate time- and energy-consuming multiplications due to the binary nature of spikes (0 or 1). Additionally, DSNNs excel in processing temporal information, making them potentially superior for handling temporal data compared to DANNs. However, their deep network structure and numerous parameters result in high computational costs and energy consumption, limiting real-life deployment. To enhance DSNN efficiency, researchers have adapted methods from DANNs, such as pruning, quantization, and knowledge distillation, and developed specific techniques like reducing spike firing and pruning time steps. While previous surveys have covered DSNN algorithms, hardware deployment, and general overviews, focused research on DSNN compression and efficiency has been lacking. This survey addresses this gap by concentrating on efficient DSNNs and their compression methods. It begins with an exploration of DSNNs' biological background and computational units, highlighting differences from DANNs. It then delves into various compression methods, including pruning, quantization, knowledge distillation, and reducing spike firing, and concludes with suggestions for future research directions.	Translated: 2024-07-22 13:48:17 Published: 2024-06-03
# Evolutionary Computation for the Design and Enrichment of General-Purpose Artificial Intelligence Systems: Survey and Prospects ( http://arxiv.org/abs/2407.08745v1 ) License: see the linked page	Javier Poyatos, Javier Del Ser, Salvador Garcia, Hisao Ishibuchi, Daniel Molina, Isaac Triguero, Bing Xue, Xin Yao, Francisco Herrera	In Artificial Intelligence, there is an increasing demand for adaptive models capable of dealing with a diverse spectrum of learning tasks, surpassing the limitations of systems devised to cope with a single task. The recent emergence of General-Purpose Artificial Intelligence Systems (GPAIS) poses model configuration and adaptability challenges at far greater complexity scales than the optimal design of traditional Machine Learning models. Evolutionary Computation (EC) has been a useful tool for both the design and optimization of Machine Learning models, endowing them with the capability to configure and/or adapt themselves to the task under consideration. Therefore, its application to GPAIS is a natural choice. This paper aims to analyze the role of EC in the field of GPAIS, exploring the use of EC for their design or enrichment. We also match GPAIS properties to Machine Learning areas in which EC has had a notable contribution, highlighting recent milestones of EC for GPAIS. Furthermore, we discuss the challenges of harnessing the benefits of EC for GPAIS, presenting different strategies to both design and improve GPAIS with EC, covering tangential areas, identifying research niches, and outlining potential research directions for EC and GPAIS.	Translated: 2024-07-22 13:48:17 Published: 2024-06-03
# Iteration over event space in time-to-first-spike spiking neural networks for Twitter bot classification ( http://arxiv.org/abs/2407.08746v1 ) License: see the linked page	Mateusz Pabian, Dominik Rzepka, Mirosław Pawlak	This study proposes a framework that extends existing time-coding time-to-first-spike spiking neural network (SNN) models to allow processing information changing over time. We explain spike propagation through a model with multiple input and output spikes at each neuron, as well as design training rules for end-to-end backpropagation. This strategy enables us to process information changing over time. The model is trained and evaluated on a Twitter bot detection task where the time of events (tweets and retweets) is the primary carrier of information. This task was chosen to evaluate how the proposed SNN deals with spike train data composed of hundreds of events occurring at timescales differing by almost five orders of magnitude. The impact of various parameters on model properties, performance and training-time stability is analyzed.	Translated: 2024-07-22 13:48:17 Published: 2024-06-03
# The Life Cycle of Large Language Models: A Review of Biases in Education ( http://arxiv.org/abs/2407.11203v1 ) License: see the linked page	Jinsook Lee, Yann Hicke, Renzhe Yu, Christopher Brooks, René F. Kizilcec	Large Language Models (LLMs) are increasingly adopted in educational contexts to provide personalized support to students and teachers. The unprecedented capacity of LLM-based applications to understand and generate natural language can potentially improve instructional effectiveness and learning outcomes, but the integration of LLMs in education technology has renewed concerns over algorithmic bias which may exacerbate educational inequities. In this review, building on prior work on mapping the traditional machine learning life cycle, we provide a holistic map of the LLM life cycle from the initial development of LLMs to customizing pre-trained models for various applications in educational settings. We explain each step in the LLM life cycle and identify potential sources of bias that may arise in the context of education. We discuss why current measures of bias from traditional machine learning fail to transfer to LLM-generated content in education, such as tutoring conversations, because the text is high-dimensional, there can be multiple correct responses, and tailoring responses may be pedagogically desirable rather than unfair. This review aims to clarify the complex nature of bias in LLM applications and provide practical guidance for their evaluation to promote educational equity.	Translated: 2024-07-22 12:00:08 Published: 2024-06-03
# Generative AI as a Learning Buddy and Teaching Assistant: Pre-service Teachers' Uses and Attitudes ( http://arxiv.org/abs/2407.11983v1 ) License: see the linked page	Matthew Nyaaba, Lehong Shi, Macharious Nabang, Xiaoming Zhai, Patrick Kyeremeh, Samuel Arthur Ayoberd, Bismark Nyaaba Akanzire	To uncover pre-service teachers' (PSTs') user experience and perceptions of generative artificial intelligence (GenAI) applications, we surveyed 167 PSTs in Ghana about their specific uses of GenAI as a learning buddy and teaching assistant, and their attitudes towards these applications. Employing exploratory factor analysis (EFA), we identified three key factors shaping PSTs' attitudes towards GenAI: teaching, learning, and ethical and advocacy factors. The mean scores of these factors revealed a generally positive attitude towards GenAI, indicating high levels of agreement on its potential to enhance PSTs' content knowledge and access to learning and teaching resources, thereby reducing their need for assistance from colleagues. Specifically, PSTs use GenAI as a learning buddy to access reading materials, in-depth content explanations, and practical examples, and as a teaching assistant to enhance teaching resources, develop assessment strategies, and plan lessons. A regression analysis showed that background factors such as age, gender, and year of study do not predict PSTs' attitudes towards GenAI, but age and year of study significantly predict the frequency of their use of GenAI, while gender does not. These findings suggest that older PSTs and those further along in their teacher education programs may use GenAI more frequently, but their perceptions of the application remain unchanged. However, PSTs expressed concerns about the accuracy and trustworthiness of the information provided by GenAI applications. We, therefore, suggest addressing these concerns to ensure PSTs can confidently rely on these applications in their teacher preparation programs. Additionally, we recommend targeted strategies to integrate GenAI more effectively into both learning and teaching processes for PSTs.	Translated: 2024-07-22 11:50:18 Published: 2024-06-03
# Participatory Approaches in AI Development and Governance: A Principled Approach ( http://arxiv.org/abs/2407.13100v1 ) License: see the linked page	Ambreesh Parthasarathy, Aditya Phalnikar, Ameen Jauhar, Dhruv Somayajula, Gokul S Krishnan, Balaraman Ravindran	The widespread adoption of Artificial Intelligence (AI) technologies in the public and private sectors has resulted in them significantly impacting the lives of people in new and unexpected ways. In this context, it becomes important to inquire how their design, development and deployment take place. Upon this inquiry, it is seen that persons who will be impacted by the deployment of these systems have little to no say in how they are developed. Seeing this as a lacuna, this research study advances the premise that a participatory approach is beneficial (both practically and normatively) to building and using more responsible, safe, and human-centric AI systems. Normatively, it enhances the fairness of the process and empowers citizens in voicing concerns about systems that may heavily impact their lives. Practically, it provides developers with new avenues of information which will be beneficial to them in improving the quality of the AI algorithm. The paper advances this argument first, by describing the life cycle of an AI system; second, by identifying criteria which may be used to identify relevant stakeholders for a participatory exercise; and third, by mapping relevant stakeholders to different stages of the AI life cycle. This paper forms the first part of a two-part series on participatory governance in AI. The second paper will expand upon and concretise the principles developed in this paper and apply the same to actual use cases of AI systems.	Translated: 2024-07-22 08:07:30 Published: 2024-06-03
# Participatory Approaches in AI Development and Governance: Case Studies ( http://arxiv.org/abs/2407.13103v1 ) License: see the linked page	Ambreesh Parthasarathy, Aditya Phalnikar, Gokul S Krishnan, Ameen Jauhar, Balaraman Ravindran	This paper forms the second of a two-part series on the value of a participatory approach to AI development and deployment. The first paper crafted a principled, as well as pragmatic, justification for deploying participatory methods in these two exercises (that is, development and deployment of AI). The pragmatic justification is that it improves the quality of the overall algorithm by providing more granular and minute information. The more principled justification is that it offers a voice to those who are going to be affected by the deployment of the algorithm, and through engagement attempts to build trust and buy-in for an AI system. By a participatory approach, we mean including various stakeholders (defined a certain way) in the actual decision making process through the life cycle of an AI system. Despite the justifications offered above, actual implementation depends crucially on how stakeholders in the entire process are identified, what information is elicited from them, and how it is incorporated. This paper will test these preliminary conclusions in two sectors: the use of facial recognition technology in the upkeep of law and order, and the use of large language models in the healthcare sector. These sectors have been chosen for two primary reasons. Since Facial Recognition Technologies are a branch of AI solutions that are well-researched and the impact of which is well documented, they provide an established space to illustrate the various aspects of adapting PAI to an existing domain, especially one that has been quite contentious in the recent past. LLMs in healthcare provide a canvas for a relatively less explored space, and help us illustrate how one could possibly envision enshrining the principles of PAI for a relatively new technology, in a space where innovation must always align with patient welfare.	Translated: 2024-07-22 08:07:30 Published: 2024-06-03
# MOT: A Mixture of Actors Reinforcement Learning Method by Optimal Transport for Algorithmic Trading ( http://arxiv.org/abs/2407.01577v1 ) License: see the linked page	Xi Cheng, Jinghao Zhang, Yunan Zeng, Wenfang Xue	Algorithmic trading refers to executing buy and sell orders for specific assets based on automatically identified trading opportunities. Strategies based on reinforcement learning (RL) have demonstrated remarkable capabilities in addressing algorithmic trading problems. However, the trading patterns differ among market conditions due to distribution shift in the data. Ignoring multiple patterns in the data will undermine the performance of RL. In this paper, we propose MOT, which designs multiple actors with disentangled representation learning to model the different patterns of the market. Furthermore, we incorporate the Optimal Transport (OT) algorithm to allocate samples to the appropriate actor by introducing a regularization loss term. Additionally, we propose a Pretrain Module to facilitate imitation learning by aligning the outputs of actors with expert strategy and to better balance the exploration and exploitation of RL. Experimental results on real futures market data demonstrate that MOT exhibits excellent profit capabilities while balancing risks. Ablation studies validate the effectiveness of the components of MOT.	Translated: 2024-07-07 13:34:23 Published: 2024-06-03
# Optimising Random Forest Machine Learning Algorithms for User VR Experience Prediction Based on Iterative Local Search-Sparrow Search Algorithm ( http://arxiv.org/abs/2406.16905v1 ) License: see the linked page	Xirui Tang, Feiyang Li, Zinan Cao, Qixuan Yu, Yulu Gong	In this paper, an improved method for VR user experience prediction is investigated by introducing a sparrow search algorithm and a random forest algorithm improved by an iterative local search-optimised sparrow search algorithm. The study first conducted a statistical analysis of the data, and then trained and tested using the traditional random forest model, the random forest model improved by the sparrow search algorithm, and the random forest algorithm improved based on the iterative local search-sparrow search algorithm, respectively. The results show that the traditional random forest model has a prediction accuracy of 93% on the training set but only 73.3% on the test set, which is poor in generalisation; whereas the model improved by the sparrow search algorithm has a prediction accuracy of 94% on the test set, which is improved compared with the traditional model. What is more noteworthy is that the improved model based on the iterative local search-sparrow search algorithm achieves 100% accuracy on both the training and test sets, which is significantly better than the other two methods. These research results provide new ideas and methods for VR user experience prediction; in particular, the improved model based on the iterative local search-sparrow search algorithm performs well and is able to more accurately predict and classify the user's VR experience. In the future, the application of this method in other fields can be further explored, and its effectiveness can be verified through real cases to promote the development of AI technology in the field of user experience.	Translated: 2024-07-01 06:41:31 Published: 2024-06-03
# REST: Efficient and Accelerated EEG Seizure Analysis through Residual State Updates ( http://arxiv.org/abs/2406.16906v1 ) License: see the linked page	Arshia Afzal, Grigorios Chrysos, Volkan Cevher, Mahsa Shoaran	EEG-based seizure detection models face challenges in terms of inference speed and memory efficiency, limiting their real-time implementation in clinical devices. This paper introduces a novel graph-based residual state update mechanism (REST) for real-time EEG signal analysis in applications such as epileptic seizure detection. By leveraging a combination of graph neural networks and recurrent structures, REST efficiently captures both non-Euclidean geometry and temporal dependencies within EEG data. Our model demonstrates high accuracy in both seizure detection and classification tasks. Notably, REST achieves a remarkable 9-fold acceleration in inference speed compared to state-of-the-art models, while simultaneously demanding substantially less memory than the smallest model employed for this task. These attributes position REST as a promising candidate for real-time implementation in clinical devices, such as Responsive Neurostimulation or seizure alert systems.	Translated: 2024-07-01 06:41:31 Published: 2024-06-03
# FLOW: Fusing and Shuffling Global and Local Views for Cross-User Human Activity Recognition with IMUs ( http://arxiv.org/abs/2406.18569v1 ) License: see link	Qi Qiu, Tao Zhu, Furong Duan, Kevin I-Kai Wang, Liming Chen, Mingxing Nie, Mingxing Nie	Inertial Measurement Unit (IMU) sensors are widely employed for Human Activity Recognition (HAR) due to their portability, energy efficiency, and growing research interest. However, a significant challenge for IMU-HAR models is achieving robust generalization performance across diverse users. This limitation stems from substantial variations in data distribution among individual users. One primary reason for this distribution disparity lies in the representation of IMU sensor data in the local coordinate system, which is susceptible to subtle user variations during IMU wearing. To address this issue, we propose a novel approach that extracts a global view representation based on the characteristics of IMU data, effectively alleviating the data distribution discrepancies induced by wearing styles. To validate the efficacy of the global view representation, we fed both global and local view data into the model for experiments. The results demonstrate that global view data significantly outperforms local view data in cross-user experiments. Furthermore, we propose a Multi-view Supervised Network (MVFNet) based on Shuffling to effectively fuse local view and global view data. It supervises the feature extraction of each view through view division and view shuffling, so as to avoid the model ignoring important features as much as possible. Extensive experiments conducted on the OPPORTUNITY and PAMAP2 datasets demonstrate that the proposed algorithm outperforms the current state-of-the-art methods in cross-user HAR.	Translated: 2024-07-01 05:50:36 Published: 2024-06-03
# It's a Feature, Not a Bug: Measuring Creative Fluidity in Image Generators ( http://arxiv.org/abs/2406.18570v1 ) License: see link	Aditi Ramaswamy, Melane Navaratnarajah, Hana Chockler	With the rise of freely available image generators, AI-generated art has become the centre of a series of heated debates, one of which concerns the concept of human creativity. Can an image generation AI exhibit "creativity" of the same type that artists do, and if so, how does that manifest? Our paper attempts to define and empirically measure one facet of creative behavior in AI, by conducting an experiment to quantify the "fluidity of prompt interpretation", or just "fluidity", in a series of selected popular image generators. To study fluidity, we (1) introduce a clear definition for it, (2) create chains of auto-generated prompts and images seeded with an initial "ground-truth" image, (3) measure these chains' breakage points using preexisting visual and semantic metrics, and (4) use both statistical tests and visual explanations to study these chains and determine whether the image generators used to produce them exhibit significant fluidity.	Translated: 2024-07-01 05:50:36 Published: 2024-06-03
# UltraCortex: Submillimeter Ultra-High Field 9.4 T1 Brain MR Image Collection and Manual Cortical Segmentations ( http://arxiv.org/abs/2406.18571v1 ) License: see link	Lucas Mahler, Julius Steiglechner, Benjamin Bender, Tobias Lindig, Dana Ramadan, Jonas Bause, Florian Birk, Rahel Heule, Edyta Charyasz, Michael Erb, Vinod Jangir Kumar, Gisela E Hagberg, Pascal Martin, Gabriele Lohmann, Klaus Scheffler	The UltraCortex repository (https://www.ultracortex.org) houses magnetic resonance imaging data of the human brain obtained at an ultra-high field strength of 9.4 T. It contains 86 structural MR images with spatial resolutions ranging from 0.6 to 0.8 mm. Additionally, the repository includes segmentations of 12 brains into gray and white matter compartments. These segmentations have been independently validated by two expert neuroradiologists, thus establishing them as a reliable gold standard. This resource provides researchers with access to high-quality brain imaging data and validated segmentations, facilitating neuroimaging studies and advancing our understanding of brain structure and function. Existing repositories do not accommodate field strengths beyond 7 T, nor do they offer validated segmentations, underscoring the significance of this new resource.	Translated: 2024-07-01 05:50:36 Published: 2024-06-03
# GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model ( http://arxiv.org/abs/2406.18572v1 ) License: see link	Ling Li, Yu Ye, Bingchuan Jiang, Wei Zeng	This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM) augmented with human inference knowledge. A primary challenge here is the scarcity of data for training the LVLM: existing street-view datasets often contain numerous low-quality images lacking visual clues, and lack any reasoning inference. To address the data-quality issue, we devise a CLIP-based network to quantify the degree to which street-view images are locatable, leading to the creation of a new dataset comprising highly locatable street views. To enhance reasoning inference, we integrate external knowledge obtained from real geo-localization games, tapping into valuable human inference capabilities. The data are utilized to train GeoReasoner, which undergoes fine-tuning through dedicated reasoning and location-tuning stages. Qualitative and quantitative evaluations illustrate that GeoReasoner outperforms counterpart LVLMs by more than 25% at country-level and 38% at city-level geo-localization tasks, and surpasses StreetCLIP performance while requiring fewer training resources. The data and code are available at https://github.com/lingli1996/GeoReasoner.	Translated: 2024-07-01 05:50:36 Published: 2024-06-03
# A Space Group Symmetry Informed Network for O(3) Equivariant Crystal Tensor Prediction ( http://arxiv.org/abs/2406.12888v1 ) License: see link	Keqiang Yan, Alexandra Saxton, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji	We consider the prediction of general tensor properties of crystalline materials, including dielectric, piezoelectric, and elastic tensors. A key challenge here is how to make the predictions satisfy the unique tensor equivariance to the O(3) group and invariance to crystal space groups. To this end, we propose a General Materials Tensor Network (GMTNet), which is carefully designed to satisfy the required symmetries. To evaluate our method, we curate a dataset and establish evaluation metrics that are tailored to the intricacies of crystal tensor predictions. Experimental results show that our GMTNet not only achieves promising performance on crystal tensors of various orders but also generates predictions fully consistent with the intrinsic crystal symmetries. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).	Translated: 2024-06-23 13:24:48 Published: 2024-06-03
# Achieving Sparse Activation in Small Language Models ( http://arxiv.org/abs/2406.06562v1 ) License: see link	Jifeng Song, Kai Huang, Xiangyu Yin, Boyuan Yang, Wei Gao	Sparse activation, which selectively activates only an input-dependent set of neurons in inference, is a useful technique to reduce the computing cost of Large Language Models (LLMs) without retraining or adaptation efforts. However, whether it can be applied to the recently emerging Small Language Models (SLMs) remains questionable, because SLMs are generally less over-parameterized than LLMs. In this paper, we aim to achieve sparse activation in SLMs. We first show that the existing sparse activation schemes in LLMs that build on neurons' output magnitudes cannot be applied to SLMs, and activating neurons based on their attribution scores is a better alternative. Further, we demonstrate and quantify the large errors of existing attribution metrics when used for sparse activation, due to the interdependency among attribution scores of neurons across different layers. Based on these observations, we propose a new attribution metric that can provably correct such errors and achieve precise sparse activation. Experiments over multiple popular SLMs and datasets show that our approach can achieve 80% sparsification ratio with <5% model accuracy loss, comparable to the sparse activation achieved in LLMs. The source code is available at: https://github.com/pittisl/Sparse-Activation.	Translated: 2024-06-17 00:11:14 Published: 2024-06-03
# Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ( http://arxiv.org/abs/2406.06563v1 ) License: see link	Tianwen Wei, Bo Zhu, Liang Zhao, Cheng Cheng, Biye Li, Weiwei Lü, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Liang Zeng, Xiaokun Wang, Yutuan Ma, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou	In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initializations. Our findings suggest that the choice between these two approaches should consider both the performance of the existing dense checkpoints and the MoE training budget. We highlight two innovative techniques: gating logit normalization, which improves expert diversification, and adaptive auxiliary loss coefficients, allowing for layer-specific adjustment of auxiliary loss coefficients. Our experimental results validate the effectiveness of these methods. Leveraging these techniques and insights, we trained our upcycled Skywork-MoE on a condensed subset of our SkyPile corpus. The evaluation results demonstrate that our model delivers strong performance across a wide range of benchmarks.	Translated: 2024-06-17 00:11:14 Published: 2024-06-03
# Revolutionizing Large Language Model Training through Dynamic Parameter Adjustment ( http://arxiv.org/abs/2406.06564v1 ) License: see link	Kaiye Zhou, Shucheng Wang	In the era of large language models, the demand for efficient use of computational resources has become critically important. Although parameter-efficient fine-tuning techniques have achieved results comparable to full fine-tuning, their application during the pre-training phase poses significant challenges. Specifically, employing parameter-efficient strategies at the onset of pre-training can severely compromise efficiency, especially in larger models. In this paper, building upon the fine-tuning method LoRA, we introduce a novel parameter-efficient training technique that frequently alters the trainable part of the parameters, facilitating effective pre-training. Our method not only achieves memory reductions and computational overhead comparable to current state-of-the-art parameter-efficient algorithms during the pre-training phase but also maintains accuracy levels comparable to those of full pre-training. We provide both theoretical analyses and empirical evidence to demonstrate the effectiveness of our approach.	Translated: 2024-06-17 00:11:14 Published: 2024-06-03
# MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures ( http://arxiv.org/abs/2406.06565v1 ) License: see link	Jinjie Ni, Fuzhao Xue, Xiang Yue, Yuntian Deng, Mahir Shah, Kabir Jain, Graham Neubig, Yang You	Evaluating large language models (LLMs) is challenging. Traditional ground-truth-based benchmarks fail to capture the comprehensiveness and nuance of real-world queries, while LLM-as-judge benchmarks suffer from grading biases and limited query quantity. Both of them may also become contaminated over time. User-facing evaluation, such as Chatbot Arena, provides reliable signals but is costly and slow. In this work, we propose MixEval, a new paradigm for establishing efficient, gold-standard LLM evaluation by strategically mixing off-the-shelf benchmarks. It bridges (1) comprehensive and well-distributed real-world user queries and (2) efficient and fairly-graded ground-truth-based benchmarks, by matching queries mined from the web with similar queries from existing benchmarks. Based on MixEval, we further build MixEval-Hard, which offers more room for model improvement. Our benchmarks' advantages lie in (1) a 0.96 model ranking correlation with Chatbot Arena arising from the highly impartial query distribution and grading mechanism, (2) fast, cheap, and reproducible execution (6% of the time and cost of MMLU), and (3) dynamic evaluation enabled by the rapid and stable data update pipeline. We provide extensive meta-evaluation and analysis for our and existing LLM benchmarks to deepen the community's understanding of LLM evaluation and guide future research directions.	Translated: 2024-06-17 00:11:14 Published: 2024-06-03
# RAG Enabled Conversations about Household Electricity Monitoring ( http://arxiv.org/abs/2406.06566v1 ) License: see link	Carolina Fortuna, Vid Hanžel, Blaž Bertalanič	In this paper, we investigate the integration of Retrieval Augmented Generation (RAG) with large language models (LLMs) such as ChatGPT, Gemini, and Llama to enhance the accuracy and specificity of responses to complex questions about electricity datasets. Recognizing the limitations of LLMs in generating precise and contextually relevant answers due to their dependency on the patterns in training data rather than factual understanding, we propose a solution that leverages a specialized electricity knowledge graph. This approach facilitates the retrieval of accurate, real-time data which is then synthesized with the generative capabilities of LLMs. Our findings illustrate that the RAG approach not only reduces the incidence of incorrect information typically generated by LLMs but also significantly improves the quality of the output by grounding responses in verifiable data. This paper details our methodology, presents a comparative analysis of responses with and without RAG, and discusses the implications of our findings for future applications of AI in specialized sectors like energy data analysis.	Translated: 2024-06-17 00:11:14 Published: 2024-06-03
# DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion ( http://arxiv.org/abs/2406.06567v1 ) License: see link	Yilong Chen, Linhao Zhang, Junyuan Shang, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun	Large language models (LLMs) with billions of parameters demonstrate impressive performance. However, the widely used Multi-Head Attention (MHA) in LLMs incurs substantial computational and memory costs during inference. While some efforts have optimized attention mechanisms by pruning heads or sharing parameters among heads, these methods often lead to performance degradation or necessitate substantial continued pre-training costs to restore performance. Based on the analysis of attention redundancy, we design a Decoupled-Head Attention (DHA) mechanism. DHA adaptively configures group sharing for key heads and value heads across various layers, achieving a better balance between performance and efficiency. Inspired by the observation of clustering similar heads, we propose to progressively transform the MHA checkpoint into the DHA model through linear fusion of similar head parameters step by step, retaining the parametric knowledge of the MHA checkpoint. We construct DHA models by transforming various scales of MHA checkpoints given target head budgets. Our experiments show that DHA remarkably requires a mere 0.25% of the original model's pre-training budgets to achieve 97.6% of performance while saving 75% of KV cache. Compared to Group-Query Attention (GQA), DHA achieves a 5× training acceleration, a maximum of 13.93% performance improvement under 0.01% pre-training budget, and 4% relative improvement under 0.05% pre-training budget.	Translated: 2024-06-17 00:11:14 Published: 2024-06-03
# Enhancing Clinical Documentation with Synthetic Data: Leveraging Generative Models for Improved Accuracy ( http://arxiv.org/abs/2406.06569v1 ) License: see link	Anjanava Biswas, Wrick Talukdar	Accurate and comprehensive clinical documentation is crucial for delivering high-quality healthcare, facilitating effective communication among providers, and ensuring compliance with regulatory requirements. However, manual transcription and data entry processes can be time-consuming, error-prone, and susceptible to inconsistencies, leading to incomplete or inaccurate medical records. This paper proposes a novel approach to augment clinical documentation by leveraging synthetic data generation techniques to generate realistic and diverse clinical transcripts. We present a methodology that combines state-of-the-art generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), with real-world clinical transcripts and other forms of clinical data to generate synthetic transcripts. These synthetic transcripts can then be used to supplement existing documentation workflows, providing additional training data for natural language processing models and enabling more accurate and efficient transcription processes. Through extensive experiments on a large dataset of anonymized clinical transcripts, we demonstrate the effectiveness of our approach in generating high-quality synthetic transcripts that closely resemble real-world data. Quantitative evaluation metrics, including perplexity scores and BLEU scores, as well as qualitative assessments by domain experts, validate the fidelity and utility of the generated synthetic transcripts. Our findings highlight synthetic data generation's potential to address clinical documentation challenges, improving patient care, reducing administrative burdens, and enhancing healthcare system efficiency.	Translated: 2024-06-17 00:04:06 Published: 2024-06-03
# Review of Computational Epigraphy ( http://arxiv.org/abs/2406.06570v1 ) License: see link	Vishal Kumar	Computational Epigraphy refers to the process of extracting text from stone inscriptions, transliteration, interpretation, and attribution with the aid of computational methods. Traditional epigraphy methods are time-consuming and tend to damage the stone inscriptions while extracting text. Additionally, interpretation and attribution are subjective and can vary between different epigraphers. However, modern computational methods can be used not only to extract text but also to interpret and attribute it in a robust way. We survey and document the existing computational methods that aid in the above-mentioned tasks in epigraphy.	Translated: 2024-06-17 00:04:06 Published: 2024-06-03
# SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM ( http://arxiv.org/abs/2406.06571v1 ) License: see link	Quandong Wang, Yuxuan Yuan, Xiaoyu Yang, Ruike Zhang, Kang Zhao, Wei Liu, Jian Luan, Daniel Povey, Bin Wang	While Large Language Models (LLMs) have achieved remarkable success in various fields, the efficiency of training and inference remains a major challenge. To address this issue, we propose SUBLLM, short for Subsampling-Upsampling-Bypass Large Language Model, an innovative architecture that extends the core decoder-only framework by incorporating subsampling, upsampling, and bypass modules. The subsampling modules are responsible for shortening the sequence, while the upsampling modules restore the sequence length, and the bypass modules enhance convergence. In comparison to LLaMA, the proposed SUBLLM exhibits significant enhancements in both training and inference speeds as well as memory usage, while maintaining competitive few-shot performance. During training, SUBLLM increases speeds by 26% and cuts memory by 10GB per GPU. In inference, it boosts speeds by up to 37% and reduces memory by 1GB per GPU. The training and inference speeds can be enhanced by 34% and 52% respectively when the context window is expanded to 8192. We shall release the source code of the proposed architecture in the published version.	Translated: 2024-06-17 00:04:06 Published: 2024-06-03
# Graph Neural Network Enhanced Retrieval for Question Answering of LLMs ( http://arxiv.org/abs/2406.06572v1 ) License: see link	Zijian Li, Qingyan Guo, Jiawei Shao, Lei Song, Jiang Bian, Jun Zhang, Rui Wang	Retrieval augmented generation has revolutionized large language model (LLM) outputs by providing factual support. Nevertheless, it struggles to capture all the necessary knowledge for complex reasoning questions. Existing retrieval methods typically divide reference documents into passages, treating them in isolation. These passages, however, are often interrelated, such as passages that are contiguous or share the same keywords. Therefore, recognizing the relatedness is crucial for enhancing the retrieval process. In this paper, we propose a novel retrieval method, called GNN-Ret, which leverages graph neural networks (GNNs) to enhance retrieval by considering the relatedness between passages. Specifically, we first construct a graph of passages by connecting passages that are structure-related and keyword-related. A graph neural network (GNN) is then leveraged to exploit the relationships between passages and improve the retrieval of supporting passages. Furthermore, we extend our method to handle multi-hop reasoning questions using a recurrent graph neural network (RGNN), named RGNN-Ret. At each step, RGNN-Ret integrates the graphs of passages from previous steps, thereby enhancing the retrieval of supporting passages. Extensive experiments on benchmark datasets demonstrate that GNN-Ret achieves higher accuracy for question answering with a single query of LLMs than strong baselines that require multiple queries, and RGNN-Ret further improves accuracy and achieves state-of-the-art performance, with up to 10.4% accuracy improvement on the 2WikiMQA dataset.	Translated: 2024-06-17 00:04:06 Published: 2024-06-03
# MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering ( http://arxiv.org/abs/2406.06573v1 ) License: see link	Robert Osazuwa Ness, Katie Matton, Hayden Helm, Sheng Zhang, Junaid Bajwa, Carey E. Priebe, Eric Horvitz	Large language models (LLMs) have achieved impressive performance on medical question-answering benchmarks. However, high benchmark accuracy does not imply that the performance generalizes to real-world clinical settings. Medical question-answering benchmarks rely on assumptions that are consistent with quantifying LLM performance but that may not hold in the open world of the clinic. Yet LLMs learn broad knowledge that can help the LLM generalize to practical conditions regardless of unrealistic assumptions in celebrated benchmarks. We seek to quantify how well LLM medical question-answering benchmark performance generalizes when benchmark assumptions are violated. Specifically, we present an adversarial method that we call MedFuzz (for medical fuzzing). MedFuzz attempts to modify benchmark questions in ways aimed at confounding the LLM. We demonstrate the approach by targeting strong assumptions about patient characteristics presented in the MedQA benchmark. Successful "attacks" modify a benchmark item in ways that would be unlikely to fool a medical expert but nonetheless "trick" the LLM into changing from a correct to an incorrect answer. Further, we present a permutation test technique that can ensure a successful attack is statistically significant. We show how to use performance on a "MedFuzzed" benchmark, as well as individual successful attacks. The methods show promise at providing insights into the ability of an LLM to operate robustly in more realistic settings.	Translated: 2024-06-17 00:04:06 Published: 2024-06-03
# 透明性に向けて:ビジュアルトピックモデリングとセマンティックフレームによるLLMトレーニングデータセットの探索 Towards Transparency: Exploring LLM Trainings Datasets through Visual Topic Modeling and Semantic Frame ( http://arxiv.org/abs/2406.06574v1 ) ライセンス: Link先を確認	Charles de Dampierre, Andrei Mogoutov, Nicolas Baumard,	(参考訳) LLMは現在、人間に代わって多くの意思決定を担っており、質問に答えることから物事の分類に至るまで、日々の生活において重要な役割を担っている。近年、計算とモデルアーキテクチャは急速に拡大しているが、トレーニングデータセットのキュレーションへの取り組みはまだ始まったばかりである。このトレーニングデータセットの過小評価により、LLMはバイアスのある低品質のコンテンツを生成するようになった。この問題を解決するために、AIと認知科学を活用してテキストデータセットの洗練を改善するソフトウェアであるBunkaを紹介する。トピックモデリングと2次元カルトグラフィーを組み合わせることで、データセットの透明性が向上することを示す。次に、同じトピックモデリング手法をPreferencesデータセットに適用して、微調整プロセスを加速し、異なるベンチマーク上でモデルの能力を高める方法を示す。最後に、フレーム分析を用いることで、トレーニングコーパス内の既存のバイアスに対する洞察が得られることを示す。全体として、我々はLLMのトレーニングデータセットの品質と透明性を探求し、向上させるためのより良いツールが必要であると論じる。 LLMs are now responsible for making many decisions on behalf of humans: from answering questions to classifying things, they have become an important part of everyday life. While computation and model architecture have been rapidly expanding in recent years, the efforts towards curating training datasets are still in their beginnings. This underappreciation of training datasets has led LLMs to create biased and low-quality content. In order to solve that issue, we present Bunka, a software that leverages AI and Cognitive Science to improve the refinement of textual datasets. We show how Topic Modeling coupled with 2-dimensional Cartography can increase the transparency of datasets. We then show how the same Topic Modeling techniques can be applied to Preferences datasets to accelerate the fine-tuning process and increase the capacities of the model on different benchmarks. Lastly, we show how using Frame Analysis can give insights into existing biases in the training corpus. Overall, we argue that we need better tools to explore and increase the quality and transparency of LLMs training datasets.	翻訳日:2024-06-17 00:04:06 公開日:2024-06-03
# Ask-EDA: LLM, Hybrid RAG, Abbreviation De-hallucinationを活用したデザインアシスタント Ask-EDA: A Design Assistant Empowered by LLM, Hybrid RAG and Abbreviation De-hallucination ( http://arxiv.org/abs/2406.06575v1 ) ライセンス: Link先を確認	Luyao Shi, Michael Kazda, Bradley Sears, Nick Shropshire, Ruchir Puri,	(参考訳) 電子設計技術者は、設計構築、検証、技術開発における無数のタスクに対して、関連情報を効率的に見つけることが課題である。大規模言語モデル(LLM)は、対象分野の専門家として効果的に機能する会話エージェントとなることで、生産性を向上させる可能性がある。本稿では、設計技術者にガイダンスを提供する24x7のエキスパートとして設計されたチャットエージェントであるAsk-EDAを実演する。 Ask-EDAは、LLM、ハイブリッド検索拡張生成(RAG)、略語脱ハルシネーション(ADH)技術を利用して、より関連性が高く正確な応答を提供する。我々は、q2a-100、cmds-100、abbr-100の3つの評価データセットをキュレートした。各データセットは、一般的な設計質問応答、設計コマンド処理、略語解決といった、異なる側面を評価するように調整されている。RAGを使用しない場合と比較して、ハイブリッドRAGはq2a-100データセットでリコールを40%以上、cmds-100データセットで60%以上改善し、ADHはabbr-100データセットでリコールを70%以上改善することを示した。評価の結果、Ask-EDAは設計関連質問に対して効果的に応答できることがわかった。 Electronic design engineers are challenged to find relevant information efficiently for a myriad of tasks within design construction, verification and technology development. Large language models (LLM) have the potential to help improve productivity by serving as conversational agents that effectively function as subject-matter experts. In this paper we demonstrate Ask-EDA, a chat agent designed to serve as a 24x7 expert available to provide guidance to design engineers. Ask-EDA leverages LLM, hybrid retrieval augmented generation (RAG) and abbreviation de-hallucination (ADH) techniques to deliver more relevant and accurate responses. We curated three evaluation datasets, namely q2a-100, cmds-100 and abbr-100. Each dataset is tailored to assess a distinct aspect: general design question answering, design command handling and abbreviation resolution. We demonstrated that hybrid RAG offers over a 40% improvement in Recall on the q2a-100 dataset and over a 60% improvement on the cmds-100 dataset compared to not using RAG, while ADH yields over a 70% enhancement in Recall on the abbr-100 dataset. 
The evaluation results show that Ask-EDA can effectively respond to design-related inquiries.	翻訳日:2024-06-17 00:04:06 公開日:2024-06-03
# VerilogReader: LLM支援ハードウェアテスト生成 VerilogReader: LLM-Aided Hardware Test Generation ( http://arxiv.org/abs/2406.04373v1 ) ライセンス: Link先を確認	Ruiyang Ma, Yuxin Yang, Ziqian Liu, Jiaxi Zhang, Min Li, Junhua Huang, Guojie Luo,	(参考訳) テスト生成はハードウェア設計の検証において、重要かつ労働集約的なプロセスである。近年、Large Language Model (LLM) の出現とその高度な理解と推論能力は、新しいアプローチを導入している。本研究では、LLMがVerilog Readerとして機能するCoverage Directed Test Generation (CDG)プロセスへのLLMの統合について検討する。LLMはコードロジックを正確に把握し、未探索のコードブランチに到達可能な刺激を生成する。我々は、自ら設計したVerilogベンチマークスイートを使用して、本フレームワークをランダムテストと比較する。実験により、本フレームワークはLLMの理解範囲内の設計においてランダムテストよりも優れていることが示された。また、LLMの理解範囲と精度を高めるために、プロンプトエンジニアリングの最適化を提案する。 Test generation has been a critical and labor-intensive process in hardware design verification. Recently, the emergence of Large Language Model (LLM) with their advanced understanding and inference capabilities, has introduced a novel approach. In this work, we investigate the integration of LLM into the Coverage Directed Test Generation (CDG) process, where the LLM functions as a Verilog Reader. It accurately grasps the code logic, thereby generating stimuli that can reach unexplored code branches. We compare our framework with random testing, using our self-designed Verilog benchmark suite. Experiments demonstrate that our framework outperforms random testing on designs within the LLM's comprehension scope. Our work also proposes prompt engineering optimizations to augment LLM's understanding scope and accuracy.	翻訳日:2024-06-10 18:49:00 公開日:2024-06-03
# $\ell_0$-regularized問題に対する新しい分岐限定法の枝刈りフレームワーク A New Branch-and-Bound Pruning Framework for $\ell_0$-Regularized Problems ( http://arxiv.org/abs/2406.03504v1 ) ライセンス: Link先を確認	Theo Guyard, Cédric Herzet, Clément Elvira, Ayşe-Nur Arslan,	(参考訳) 本稿では、ブランチ・アンド・バウンド(BnB)アルゴリズムによる$\ell_0$-regularizationを含む学習問題の解決について考察する。これらの手法は、問題の実現可能な空間の領域を探索し、「プルーニングテスト」によってそれらの領域が解を含まないかどうかを確認する。標準的な実装では、プルーニングテストの評価には凸最適化問題を解く必要があり、計算ボトルネックが発生する可能性がある。本稿では、$\ell_0$-regularized問題の一般的な族に対して、プルーニングテストを実装するための代替手法を提案する。提案手法により、複数の領域の同時評価が可能となり、無視できる計算オーバーヘッドで標準的なBnB実装に組み込むことができる。我々は、機械学習アプリケーションで発生する典型的な問題に対して、提案する枝刈り戦略がBnBプロシージャの求解時間を数桁改善できることを数値シミュレーションにより示す。 We consider the resolution of learning problems involving $\ell_0$-regularization via Branch-and-Bound (BnB) algorithms. These methods explore regions of the feasible space of the problem and check whether they do not contain solutions through "pruning tests". In standard implementations, evaluating a pruning test requires to solve a convex optimization problem, which may result in computational bottlenecks. In this paper, we present an alternative to implement pruning tests for some generic family of $\ell_0$-regularized problems. Our proposed procedure allows the simultaneous assessment of several regions and can be embedded in standard BnB implementations with a negligible computational overhead. We show through numerical simulations that our pruning strategy can improve the solving time of BnB procedures by several orders of magnitude for typical problems encountered in machine-learning applications.	翻訳日:2024-06-07 19:34:24 公開日:2024-06-03
# 部分ラベル情報を用いた半教師付きコントラスト学習 Semi-supervised Contrastive Learning Using Partial Label Information ( http://arxiv.org/abs/2003.07921v2 ) ライセンス: Link先を確認	Colin B. Hansen, Vishwesh Nath, Diego A. Mesa, Yuankai Huo, Bennett A. Landman, Thomas A. Lasko,	(参考訳) 半教師付き学習では、ラベルなし例からの情報はラベル付き例から学習したモデルを改善するために使用される。いくつかの学習問題では、ラベルの情報をラベルのない例から推測し、モデルをさらに改善するために使用することができる。特に、トレーニングサンプルのサブセットがラベル自体が欠落しているにも関わらず、同じラベルを持つことがわかっているときに、部分的なラベル情報が存在している。対照的な学習目標を通じて、モデルに同じラベルをすべての例に付与するように促すことで、パフォーマンスを向上する可能性がある。この促進をNullspace Tuningと呼ぶのは、同じラベルを持つ任意の一対の例の差分ベクトルが線型モデルのnull空間にあるからである。そこで,本稿では,適切に分類された公開データセットに対する慎重な比較フレームワークを用いて,部分ラベル情報を使用することの利点について検討する。部分ラベルによって提供される付加情報は、良い半教師付き手法よりもテストエラーを2倍から5.5倍に減らすことを示す。また、最新かつ最先端のMixMatchメソッドにNullspace Tuningを追加することで、テストエラーを最大1.8倍に削減することを示す。 In semi-supervised learning, information from unlabeled examples is used to improve the model learned from labeled examples. In some learning problems, partial label information can be inferred from otherwise unlabeled examples and used to further improve the model. In particular, partial label information exists when subsets of training examples are known to have the same label, even though the label itself is missing. By encouraging the model to give the same label to all such examples through contrastive learning objectives, we can potentially improve its performance. We call this encouragement Nullspace Tuning because the difference vector between any pair of examples with the same label should lie in the nullspace of a linear model. In this paper, we investigate the benefit of using partial label information using a careful comparison framework over well-characterized public datasets. We show that the additional information provided by partial labels reduces test error over good semi-supervised methods usually by a factor of 2, up to a factor of 5.5 in the best case. 
We also show that adding Nullspace Tuning to the newer and state-of-the-art MixMatch method decreases its test error by up to a factor of 1.8.	翻訳日:2024-06-07 05:08:03 公開日:2024-06-03
# 高次元偏微分方程式に対する時空間ディープニューラルネットワーク近似 Space-time deep neural network approximations for high-dimensional partial differential equations ( http://arxiv.org/abs/2006.02199v2 ) ライセンス: Link先を確認	Fabian Hornung, Arnulf Jentzen, Diyora Salimova,	(参考訳) 応用数学において、高次元偏微分方程式(PDE)を近似的に解くことが最も難しい問題の一つであり、科学文献におけるPDEの数値近似法は、対応する近似スキームで用いられる計算演算の数が PDE 次元および/または $\varepsilon$ の逆数で指数関数的に増加するという意味で、いわゆる次元の呪いに苦しむ。近年, 深層学習に基づくPDEの近似法が提案されており, 深部ニューラルネットワーク(DNN)近似は, PDE次元の$d\in\mathbb{N}$と所定精度の$\varepsilon>0$の両方において, 近似DNNを記述するために用いられる実パラメータの数が多項式的に増加するという意味で, 次元性の呪いを克服する能力を持つ可能性が示唆されている。現在では、DNNがPDEの近似解における次元性の呪いを克服していることを証明することによって、この予想を裏付ける科学文献に厳密な結果がいくつかある。これらの結果は、DNN が適当な PDE 解を一定時間点 $T>0$ で近似し、コンパクトな立方体 $[a,b]^d$ で空間で近似することで、次元性の呪いを克服することを証明しているが、これらの結果は、次元性の呪いを伴わない DNN で PDE 解全体が $[0,T]\times [a,b]^d$ で近似できるかどうかという疑問に対する答えを与えていない。この問題を克服するのはまさにこの記事の主題である。より具体的には、この研究の主な結果は、任意の$a\in\mathbb{R}$, $ b\in (a,\infty)$に対して、あるコルモゴロフ PDE の解は時空領域 $[0,T]\times [a,b]^d$ の時空領域 $[0,T]\times [a,b]^d$ の DNN によって近似可能であることを証明している。 It is one of the most challenging issues in applied mathematics to approximately solve high-dimensional partial differential equations (PDEs) and most of the numerical approximation methods for PDEs in the scientific literature suffer from the so-called curse of dimensionality in the sense that the number of computational operations employed in the corresponding approximation scheme to obtain an approximation precision $\varepsilon>0$ grows exponentially in the PDE dimension and/or the reciprocal of $\varepsilon$. 
Recently, certain deep learning based approximation methods for PDEs have been proposed and various numerical simulations for such methods suggest that deep neural network (DNN) approximations might have the capacity to indeed overcome the curse of dimensionality in the sense that the number of real parameters used to describe the approximating DNNs grows at most polynomially in both the PDE dimension $d\in\mathbb{N}$ and the reciprocal of the prescribed accuracy $\varepsilon>0$. There are now also a few rigorous results in the scientific literature which substantiate this conjecture by proving that DNNs overcome the curse of dimensionality in approximating solutions of PDEs. Each of these results establishes that DNNs overcome the curse of dimensionality in approximating suitable PDE solutions at a fixed time point $T>0$ and on a compact cube $[a,b]^d$ in space but none of these results provides an answer to the question whether the entire PDE solution on $[0,T]\times [a,b]^d$ can be approximated by DNNs without the curse of dimensionality. It is precisely the subject of this article to overcome this issue. More specifically, the main result of this work in particular proves for every $a\in\mathbb{R}$, $ b\in (a,\infty)$ that solutions of certain Kolmogorov PDEs can be approximated by DNNs on the space-time region $[0,T]\times [a,b]^d$ without the curse of dimensionality.	翻訳日:2024-06-07 05:08:03 公開日:2024-06-03
# MNIST-1Dによるディープラーニングのスケールダウン Scaling Down Deep Learning with MNIST-1D ( http://arxiv.org/abs/2011.14439v5 ) ライセンス: Link先を確認	Sam Greydanus, Dmitry Kobak,	(参考訳) 深層学習モデルは商業的・政治的な重要性を帯びるようになったが、その訓練と運用の重要な側面はいまだに十分理解されていない。これは「ディープラーニングの科学」プロジェクトへの関心を喚起しているが、その多くは大量の時間、お金、電気を必要とする。しかし、この研究のどれ程を本当に大規模に行う必要があるのか? 本稿では、従来のディープラーニングベンチマークに代わる、ミニマルで手続き的に生成された、低メモリ・低計算量の代替であるMNIST-1Dを紹介する。 MNIST-1Dの次元は40に過ぎず、デフォルトのトレーニングセットのサイズも4000に過ぎないが、MNIST-1Dは異なる深層アーキテクチャの帰納バイアスの研究、宝くじ券(lottery ticket)の発見、深層二重降下の観察、活性化関数のメタ学習、および自己教師付き学習におけるギロチン正則化の実証に使用できる。これらの実験はすべてGPU上で、あるいは多くの場合CPU上でも数分以内に実行でき、高速なプロトタイピング、教育ユースケース、低予算での最先端の研究を可能にする。 Although deep learning models have taken on commercial and political relevance, key aspects of their training and operation remain poorly understood. This has sparked interest in science of deep learning projects, many of which require large amounts of time, money, and electricity. But how much of this research really needs to occur at scale? In this paper, we introduce MNIST-1D: a minimalist, procedurally generated, low-memory, and low-compute alternative to classic deep learning benchmarks. Although the dimensionality of MNIST-1D is only 40 and its default training set size only 4000, MNIST-1D can be used to study inductive biases of different deep architectures, find lottery tickets, observe deep double descent, metalearn an activation function, and demonstrate guillotine regularization in self-supervised learning. All these experiments can be conducted on a GPU or often even on a CPU within minutes, allowing for fast prototyping, educational use cases, and cutting-edge research on a low budget.	翻訳日:2024-06-07 05:08:03 公開日:2024-06-03
# ドメイン特化人工知能を用いた発達小児科のデジタル治療の改善 : 機械学習による研究 Improved Digital Therapy for Developmental Pediatrics Using Domain-Specific Artificial Intelligence: Machine Learning Study ( http://arxiv.org/abs/2012.08678v2 ) ライセンス: Link先を確認	Peter Washington, Haik Kalantarian, John Kent, Arman Husic, Aaron Kline, Emilie Leblanc, Cathy Hou, Onur Cezmi Mutlu, Kaitlyn Dunlap, Yordan Penev, Maya Varma, Nate Tyler Stockham, Brianna Chrisman, Kelley Paskov, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis Paul Wall,	(参考訳) 背景: 自動感情分類は、自閉症などの発達的行動状態を持つ子供を含め、感情の認識に苦慮する人々を支援しうる。しかし、ほとんどのコンピュータビジョンの感情認識モデルは大人の感情に基づいて訓練されているため、子供の顔に適用された場合、性能は低下する。目的: 我々は、児童の感情に富んだ画像の収集とラベル付けをゲーミフィケーションし、児童の感情自動認識モデルの性能を、デジタル医療のアプローチに必要なレベルに近づけるための戦略を考案した。方法: 主に発達的・行動的条件を持つ子ども向けに設計された治療用スマートフォンゲームのプロトタイプであるGuessWhatを、ゲームによって促された様々な感情を表現する子どものビデオデータのセキュアな収集に活用した。これとは独立に、我々は人間によるラベル付け作業をゲーミフィケーションするための、HollywoodSquaresと呼ばれるセキュアなWebインターフェースを作成した。我々は2155の動画、39,968の感情フレーム、およびすべての画像に対する106,001のラベルを収集しラベル付けした。この大幅に拡張された小児感情中心データベース(既存の公開小児感情データセットの30倍以上)を用いて、我々は、子供が表出する幸せ、悲しみ、驚き、恐怖、怒り、嫌悪感、中立の表情を分類する畳み込みニューラルネットワーク(CNN)コンピュータビジョン分類器を訓練した。結果: この分類器は、Child Affective Facial Expression (CAFE) 全体でバランス精度66.9%、F1スコア67.4%を達成し、感情ラベルについて人間の一致が60%以上あるサブセットであるCAFEサブセットAではバランス精度79.1%、F1スコア78%を達成した。この性能は、CAFEに対して評価されたこれまでのすべての分類器よりも少なくとも10%高く、そのうち最も優れたものでも、"anger"と"disgust"を1つのクラスに組み合わせた場合でさえ、バランス精度は56%にとどまっていた。 Background: Automated emotion classification could aid those who struggle to recognize emotions, including children with developmental behavioral conditions such as autism. However, most computer vision emotion recognition models are trained on adult emotion and therefore underperform when applied to child faces. Objective: We designed a strategy to gamify the collection and labeling of child emotion-enriched images to boost the performance of automatic child emotion recognition models to a level closer to what will be needed for digital health care approaches. 
Methods: We leveraged our prototype therapeutic smartphone game, GuessWhat, which was designed in large part for children with developmental and behavioral conditions, to gamify the secure collection of video data of children expressing a variety of emotions prompted by the game. Independently, we created a secure web interface to gamify the human labeling effort, called HollywoodSquares, tailored for use by any qualified labeler. We gathered and labeled 2155 videos, 39,968 emotion frames, and 106,001 labels on all images. With this drastically expanded pediatric emotion-centric database (>30 times larger than existing public pediatric emotion data sets), we trained a convolutional neural network (CNN) computer vision classifier of happy, sad, surprised, fearful, angry, disgust, and neutral expressions evoked by children. Results: The classifier achieved a 66.9% balanced accuracy and 67.4% F1-score on the entirety of the Child Affective Facial Expression (CAFE) as well as a 79.1% balanced accuracy and 78% F1-score on CAFE Subset A, a subset containing at least 60% human agreement on emotions labels. This performance is at least 10% higher than all previously developed classifiers evaluated against CAFE, the best of which reached a 56% balanced accuracy even when combining "anger" and "disgust" into a single class.	翻訳日:2024-06-07 05:08:03 公開日:2024-06-03
# エキスパートの一貫性を活用してアルゴリズム決定サポートを改善する Leveraging Expert Consistency to Improve Algorithmic Decision Support ( http://arxiv.org/abs/2101.09648v3 ) ライセンス: Link先を確認	Maria De-Arteaga, Vincent Jeanselme, Artur Dubrawski, Alexandra Chouldechova,	(参考訳) 機械学習(ML)は、高い意思決定をサポートするためにますます使われています。しかし、意思決定タスクに対する関心の構成と、MLモデルをトレーニングするためにラベルとして使われるプロキシで捉えられるものとの間には、しばしば構成上のギャップがある。その結果、MLモデルは決定基準の重要な次元を捉えることができず、意思決定支援の実用性を阻害する可能性がある。したがって、決定支援のためのMLシステムの設計において重要なステップは、利用可能なプロキシの中からターゲットラベルを選択することである。この研究では、構成ギャップを狭めるために観測結果と組み合わせることができる情報の源泉として、歴史的専門家による決定がリッチで不完全なものとして使われることを探る。マネージャとシステムデザイナは、観察結果から学習しながら、相互に一貫性を示すケースで専門家から学ぶことに興味があるかもしれない、と私たちは主張する。我々は,組織情報システムでよく見られる情報を用いて,この目標を達成するための方法論を開発する。これには2つの中核ステップが含まれる。まず、データ内の各ケースが1人の専門家によって評価された場合、専門家の一貫性を間接的に推定する影響関数に基づく方法論を提案する。第2に,MLモデルを専門家の判断から同時に学習し,その結果を観察するラベルアマルガメーション手法を導入する。本研究は, 臨床環境におけるシミュレーションと児童福祉領域の実世界データを用いた実証的評価から, 提案手法が構成ギャップを狭くし, 観察結果や専門家の判断だけでの学習よりも優れた予測性能が得られることを示した。 Machine learning (ML) is increasingly being used to support high-stakes decisions. However, there is frequently a construct gap: a gap between the construct of interest to the decision-making task and what is captured in proxies used as labels to train ML models. As a result, ML models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. Thus, an essential step in the design of ML systems for decision support is selecting a target label among available proxies. In this work, we explore the use of historical expert decisions as a rich -- yet also imperfect -- source of information that can be combined with observed outcomes to narrow the construct gap. We argue that managers and system designers may be interested in learning from experts in instances where they exhibit consistency with each other, while learning from observed outcomes otherwise. We develop a methodology to enable this goal using information that is commonly available in organizational information systems. 
This involves two core steps. First, we propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert. Second, we introduce a label amalgamation approach that allows ML models to simultaneously learn from expert decisions and observed outcomes. Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap, yielding better predictive performance than learning from either observed outcomes or expert decisions alone.	翻訳日:2024-06-07 05:08:03 公開日:2024-06-03
# サンプリングの力:プライベートERMにおける次元自由リスク境界 The Power of Sampling: Dimension-free Risk Bounds in Private ERM ( http://arxiv.org/abs/2105.13637v4 ) ライセンス: Link先を確認	Yin Tat Lee, Daogao Liu, Zhou Lu,	(参考訳) DP-ERM(differially private empirical risk minimization)は、プライベート最適化における基本的な問題である。 DP-ERMの理論はよく研究されているが、大規模モデルが普及するにつれて、従来のDP-ERM法は、(1)環境次元への許容しがたい依存、(2)高度に非滑らかな目的関数、(3)高価な一階勾配オラクルなど、新しい課題に直面している。このような課題は、既存のDP-ERM方法論を再考することを要求する。本研究では、既存のサンプラーと組み合わせた正則化指数メカニズムが、これらの課題をまとめて解決できることを示す: 標準的な制約なし領域と低ランク勾配の仮定の下で、我々のアルゴリズムはゼロ次オラクルのみを用いて、非滑らかな凸目的関数に対するランク依存のリスク境界を達成でき、これは従来の手法では達成されていなかった。これは、差分プライバシーにおけるサンプリングの力を強調するものである。さらに下限を構築し、勾配がフルランクの場合、制約付き設定と制約なし設定の間には分離がないことを示す。我々の下限は、制約なし領域から制約付き領域への一般的なブラックボックス還元と、制約付き設定における改善された下限から導かれ、これらは独立した関心の対象となりうる。 Differentially private empirical risk minimization (DP-ERM) is a fundamental problem in private optimization. While the theory of DP-ERM is well-studied, as large-scale models become prevalent, traditional DP-ERM methods face new challenges, including (1) the prohibitive dependence on the ambient dimension, (2) the highly non-smooth objective functions, (3) costly first-order gradient oracles. Such challenges demand rethinking existing DP-ERM methodologies. In this work, we show that the regularized exponential mechanism combined with existing samplers can address these challenges altogether: under the standard unconstrained domain and low-rank gradients assumptions, our algorithm can achieve rank-dependent risk bounds for non-smooth convex objectives using only zeroth order oracles, which was not accomplished by prior methods. This highlights the power of sampling in differential privacy. We further construct lower bounds, demonstrating that when gradients are full-rank, there is no separation between the constrained and unconstrained settings. Our lower bound is derived from a general black-box reduction from unconstrained to the constrained domain and an improved lower bound in the constrained setting, which might be of independent interest.	翻訳日:2024-06-07 05:08:03 公開日:2024-06-03
# Rydberg原子における3体微細構造変化フェルスター共鳴に基づくトフォリゲート Toffoli gate based on a three-body fine-structure-state-changing Förster resonance in Rydberg atoms ( http://arxiv.org/abs/2112.11058v3 ) ライセンス: Link先を確認	I. N. Ashkarin, I. I. Beterov, E. A. Yakshina, D. B. Tretyakov, V. M. Entin, I. I. Ryabtsev, P. Cheinet, K. -L. Pham, S. Lepoutre, P. Pillet,	(参考訳) 我々は、微細構造状態が変化する3体のシュタルク同調Rydberg相互作用に基づく、3量子ビットトフォリゲートの改良方式を開発した。この方式は我々の以前の提案 [I.I.Beterov et al., Physical Review A 98, 042704 (2018)] を大幅に改良したものである。異なるタイプの3体フェルスター共鳴を用いることにより、レーザー励起と集合3体状態の位相ダイナミクスのスキームを大幅に単純化した。このタイプのフェルスター共鳴は3個以上の原子を持つ系にのみ存在し、対応する2体共鳴は存在しない。我々は、Rydberg原子に基づくトフォリゲートの以前の方式と比較して、外部電場のゆらぎに対するゲート忠実度の感度を低減し、共鳴電場値の微調整に外部磁場を使用する必要をなくした。計算では99%を超えるゲート忠実度が示された。 We have developed an improved scheme of a three-qubit Toffoli gate based on fine structure state changing three-body Stark-tuned Rydberg interaction. This scheme is a substantial improvement of our previous proposal [I.I.Beterov et al., Physical Review A 98, 042704 (2018)]. Due to the use of a different type of three-body Förster resonance we substantially simplified the scheme of laser excitation and phase dynamics of collective three-body states. This type of Förster resonance exists only in systems with more than two atoms, while the two-body resonance is absent. We reduced the sensitivity of the gate fidelity to fluctuations of external electric field and eliminated the necessity to use external magnetic field for fine tuning of the resonant electric field value, compared to the previous scheme of Toffoli gate based on Rydberg atoms. A gate fidelity of >99% was demonstrated in the calculations.	翻訳日:2024-06-07 04:58:43 公開日:2024-06-03
# TATTOOED:拡散スペクトルチャネル符号化に基づくロバストなディープニューラルネットワーク透かし方式 TATTOOED: A Robust Deep Neural Network Watermarking Scheme based on Spread-Spectrum Channel Coding ( http://arxiv.org/abs/2202.06091v3 ) ライセンス: Link先を確認	Giulio Pagnotta, Dorjan Hitaj, Briland Hitaj, Fernando Perez-Cruz, Luigi V. Mancini,	(参考訳) 近年、ディープニューラルネットワーク(DNN)の透かしは、所有者の許可なくこれらのモデルが取得されるシナリオにおいて、DNNの所有権を検証するメカニズムとして多くの(透かし)戦略が提案されている。しかし, 既存の透かし機構は, 微調整, パラメータの刈り取り, シャッフルなど, 除去技術に非常に敏感であることが示された。本稿では,既存の脅威に対して堅牢な新しいDNN透かし技術であるTATTOOEDを提案する。 DNN所有者は, TATTOOEDを透かし機構として使用することにより, 99%のモデルパラメータが変更されている場合においても, 透かしを取得し, モデルのオーナシップを検証できることを示した。さらに、TATTOOEDは、トレーニングパイプラインで簡単に使用でき、モデルパフォーマンスに無視できる影響があることが示される。 Watermarking of deep neural networks (DNNs) has gained significant traction in recent years, with numerous (watermarking) strategies being proposed as mechanisms that can help verify the ownership of a DNN in scenarios where these models are obtained without the permission of the owner. However, a growing body of work has demonstrated that existing watermarking mechanisms are highly susceptible to removal techniques, such as fine-tuning, parameter pruning, or shuffling. In this paper, we build upon extensive prior work on covert (military) communication and propose TATTOOED, a novel DNN watermarking technique that is robust to existing threats. We demonstrate that using TATTOOED as their watermarking mechanisms, the DNN owner can successfully obtain the watermark and verify model ownership even in scenarios where 99% of model parameters are altered. Furthermore, we show that TATTOOED is easy to employ in training pipelines, and has negligible impact on model performance.	翻訳日:2024-06-07 04:58:43 公開日:2024-06-03
# ニューラルネットワークによるアスファルトコンクリートの疲労寿命予測 Predicting the fatigue life of asphalt concrete using neural networks ( http://arxiv.org/abs/2406.01523v1 ) ライセンス: Link先を確認	Jakub Houlík, Jan Valentin, Václav Nežerka,	(参考訳) アスファルトコンクリート(AC)の耐久性と維持管理の必要性は、その疲労寿命に強く影響される。この特性を決定する従来の方法は、多くのリソースと時間を要する。本研究では、人工ニューラルネットワーク(ANN)を用いてACの疲労寿命を予測し、ひずみレベル、バインダー含量、空隙含量の影響に着目した。大規模なデータセットを活用し、一般的に対数スケールで表現される幅広い疲労寿命データを効果的に扱えるようにモデルを調整した。平均2乗対数誤差を損失関数として利用し、疲労寿命のすべてのレベルにわたって予測精度を向上させた。各種ハイパーパラメータの比較分析により、データ内の複雑な関係を捉える機械学習モデルを開発した。以上の結果から、バインダー含量が高いほど疲労寿命が著しく向上する一方、空隙含量の影響はより変動が大きく、バインダー含量に依存することが示された。最も重要なこととして、この研究は、ANNをモデリングに使用する際の複雑さに関する洞察を提供し、より大きなデータセットでの潜在的な有用性を示す。この研究で開発されたコードと使用されたデータはGitHubリポジトリでオープンソースとして提供され、論文には完全なアクセスのためのリンクが含まれている。 Asphalt concrete's (AC) durability and maintenance demands are strongly influenced by its fatigue life. Traditional methods for determining this characteristic are both resource-intensive and time-consuming. This study employs artificial neural networks (ANNs) to predict AC fatigue life, focusing on the impact of strain level, binder content, and air-void content. Leveraging a substantial dataset, we tailored our models to effectively handle the wide range of fatigue life data, typically represented on a logarithmic scale. The mean square logarithmic error was utilized as the loss function to enhance prediction accuracy across all levels of fatigue life. Through comparative analysis of various hyperparameters, we developed a machine-learning model that captures the complex relationships within the data. Our findings demonstrate that higher binder content significantly enhances fatigue life, while the influence of air-void content is more variable, depending on binder levels. Most importantly, this study provides insights into the intricacies of using ANNs for modeling, showcasing their potential utility with larger datasets. 
The codes developed and the data used in this study are provided as open source on a GitHub repository, with a link included in the paper for full access.	翻訳日:2024-06-06 23:49:24 公開日:2024-06-03
# PPINtonus:Deep-Learning Tonal Analysis を用いたパーキンソン病早期発見 PPINtonus: Early Detection of Parkinson's Disease Using Deep-Learning Tonal Analysis ( http://arxiv.org/abs/2406.02608v1 ) ライセンス: Link先を確認	Varun Reddy,	(参考訳) PPINtonusはパーキンソン病(PD)を早期に検出するためのシステムであり、ディープラーニングの音節解析を利用して、従来の神経学的検査に代わる費用対効果とアクセス性を提供する。 Parkinson's Voice Project (PVP)と共同で、PPINtonusは、半教師付き条件生成対向ネットワークを使用して合成データポイントを生成し、多層ディープニューラルネットワークのトレーニングデータセットを強化している。 PRAAT音声ソフトウェアと組み合わせて、典型的な家庭内騒音条件下で標準マイクを用いて実施した120秒音声検査から、生体医学的音声測定値を正確に評価する。モデルの性能は混乱行列を用いて検証され、92.5 \%の精度で偽陰性率を低くした。 PPINtonusは92.7 \%の精度を示し、早期PD検出のための信頼性の高いツールとなった。 PPINtonusの非侵襲的で効率的な方法は、早期診断を可能にし、タイムリーな介入と管理を通じて何百万人ものPD患者の生活の質を向上させることによって、発展途上国に多大な利益をもたらすことができる。 PPINtonus is a system for the early detection of Parkinson's Disease (PD) utilizing deep-learning tonal analysis, providing a cost-effective and accessible alternative to traditional neurological examinations. Partnering with the Parkinson's Voice Project (PVP), PPINtonus employs a semi-supervised conditional generative adversarial network to generate synthetic data points, enhancing the training dataset for a multi-layered deep neural network. Combined with PRAAT phonetics software, this network accurately assesses biomedical voice measurement values from a simple 120-second vocal test performed with a standard microphone in typical household noise conditions. The model's performance was validated using a confusion matrix, achieving an impressive 92.5 \% accuracy with a low false negative rate. PPINtonus demonstrated a precision of 92.7 \%, making it a reliable tool for early PD detection. The non-intrusive and efficient methodology of PPINtonus can significantly benefit developing countries by enabling early diagnosis and improving the quality of life for millions of PD patients through timely intervention and management.	翻訳日:2024-06-06 23:39:37 公開日:2024-06-03
# Less is More: 連続的テスト時間適応のための擬似ラベルフィルタリング Less is More: Pseudo-Label Filtering for Continual Test-Time Adaptation ( http://arxiv.org/abs/2406.02609v1 ) ライセンス: Link先を確認	Jiayao Tan, Fan Lyu, Chenggong Ni, Tingliang Feng, Fuyuan Hu, Zhang Zhang, Shaochuang Zhao, Liang Wang,	(参考訳) 連続的テスト時間適応(CTTA)は、ソースデータにアクセスすることなく、テストフェーズ中に事前訓練されたモデルを一連の対象ドメインに適応させることを目的としている。未知のドメインからのラベルのないデータに適応するために、既存の手法は、すべてのサンプルに対して擬似ラベルを構築し、自己学習を通じてモデルを更新する。しかし、これらの擬似ラベルはしばしばノイズを含み、適応が不十分になる。擬似ラベルの品質を向上させるために、Pseudo Labeling Filter (PLF) と呼ばれるCTTAのための擬似ラベル選択法を提案する。 PLFの鍵となる考え方は、擬似ラベルに対する適切なしきい値を選択し続け、自己学習のための信頼できる擬似ラベルを特定することである。具体的には、初期化、成長、多様性を含む、継続的なドメイン学習の間にしきい値を設定するための3つの原則を提示する。これらの原則に基づいて、擬似ラベルをフィルタするための自己適応型しきい値処理(Self-Adaptive Thresholding)を設計する。さらに、未知のドメインのサンプルに対して多様な予測を行うようモデルに促すために、Class Prior Alignment (CPA) 手法を導入する。広範な実験を通じて、PLFは現在の最先端の手法を上回り、CTTAにおける有効性が示された。 Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model to a sequence of target domains during the test phase without accessing the source data. To adapt to unlabeled data from unknown domains, existing methods rely on constructing pseudo-labels for all samples and updating the model through self-training. However, these pseudo-labels often involve noise, leading to insufficient adaptation. To improve the quality of pseudo-labels, we propose a pseudo-label selection method for CTTA, called Pseudo Labeling Filter (PLF). The key idea of PLF is to keep selecting appropriate thresholds for pseudo-labels and identify reliable ones for self-training. Specifically, we present three principles for setting thresholds during continuous domain learning, including initialization, growth and diversity. Based on these principles, we design Self-Adaptive Thresholding to filter pseudo-labels. Additionally, we introduce a Class Prior Alignment (CPA) method to encourage the model to make diverse predictions for unknown domain samples. Through extensive experiments, PLF outperforms current state-of-the-art methods, proving its effectiveness in CTTA.	翻訳日:2024-06-06 23:39:37 公開日:2024-06-03
# MoFormer:条件付きTransformerとマルチモーダル融合記述子に基づく多目的抗菌ペプチド生成 MoFormer: Multi-objective Antimicrobial Peptide Generation Based on Conditional Transformer Joint Multi-modal Fusion Descriptor ( http://arxiv.org/abs/2406.02610v1 ) ライセンス: Link先を確認	Li Wang, Xiangzheng Fu, Jiahao Yang, Xinyi Zhang, Xiucai Ye, Yiping Liu, Tetsuya Sakurai, Xiangxiang Zeng,	(参考訳) 深層学習は、既存のペプチドをより望ましい性質を持つように最適化する大きな可能性を秘めており、これは新薬発見を加速するための重要なステップである。いくつかの最適化された抗菌ペプチド(AMP)生成法が最近出現したにもかかわらず、多目的最適化は理想と現実のトレードオフのために依然として非常に難しい。そこで我々は、AMPの複数属性を同時に最適化するための多目的AMP合成パイプライン (MoFormer) を構築した。 MoFormer は、条件制約と細粒度のマルチ記述子に導かれて、高度に構造化された潜在空間において AMP 配列の所望の属性を改善する。MoFormer は、抗菌活性の向上と溶血の最小化を目指す生成タスクにおいて既存手法を上回ることを示す。また、パレートに基づく非優越ソートアルゴリズムと、大規模モデルの微調整に基づくプロキシを用いて、候補を階層的にランク付けする。(1)分子シミュレーションとアミノ酸間の相互作用のスコアリングによるAMPの構造と機能の解析、(2)品質と分布特性を検証するための潜在空間の可視化、という2つの観点から、MoFormerによる実質的な特性改善を実証し、設計制約のあるAMPの多目的最適化を促進する有効な手段であることを検証した。 Deep learning holds a big promise for optimizing existing peptides with more desirable properties, a critical step towards accelerating new drug discovery. Despite the recent emergence of several optimized Antimicrobial peptides (AMP) generation methods, multi-objective optimizations remain still quite challenging for the idealism-realism tradeoff. Here, we establish a multi-objective AMP synthesis pipeline (MoFormer) for the simultaneous optimization of multi-attributes of AMPs. MoFormer improves the desired attributes of AMP sequences in a highly structured latent space, guided by conditional constraints and fine-grained multi-descriptor. We show that MoFormer outperforms existing methods in the generation task of enhanced antimicrobial activity and minimal hemolysis. We also utilize a Pareto-based non-dominated sorting algorithm and proxies based on large model fine-tuning to hierarchically rank the candidates. 
We demonstrate substantial property improvement using MoFormer from two perspectives: (1) employing molecular simulations and scoring interactions among amino acids to decipher the structure and functionality of AMPs; (2) visualizing the latent space to examine its quality and distribution features, verifying an effective means of facilitating multi-objective optimization of AMPs under design constraints.	翻訳日:2024-06-06 23:39:37 公開日:2024-06-03
# LOLA:コンテンツ実験のためのLLM支援オンライン学習アルゴリズム LOLA: LLM-Assisted Online Learning Algorithm for Content Experiments ( http://arxiv.org/abs/2406.02611v1 ) ライセンス: Link先を確認	Zikun Ye, Hema Yoganarasimhan, Yufeng Zheng,	(参考訳) 急速に進化するデジタルコンテンツの世界では、メディア企業やニュース出版社は、ユーザーエンゲージメントを高めるための自動化された効率的な方法を必要としている。本稿では,LLM-Assisted Online Learning Algorithm (LOLA)を紹介し,Large Language Models (LLM) と適応実験を統合し,コンテンツ配信を最適化する新しいフレームワークを提案する。記事の内容に関連付けられた様々な見出しのパフォーマンスを評価するための17,681の見出しA/Bテストを含む、Upworthyから大規模データセットを活用することで、まず、プロンプトベースのメソッド、埋め込みベースの分類モデル、微調整されたオープンソースLCMの3つの幅広い純粋なLLMアプローチを調査する。以上の結果から,プロンプトベースアプローチの精度は65%に満たないことが示唆された。対照的に、OpenAI埋め込みベースの分類モデルと微調整のLlama-3-8bモデルは82～84%の精度を実現しているが、十分なトラフィックでの実験性能には達していない。次に,最適純粋LLM手法とアッパー信頼境界アルゴリズムを組み合わせたLOLAを導入し,トラフィックを適応的に割り当て,クリックを最大化する。 Upworthy データの数値実験により,LOLA は標準的な A/B テスト法 (Upworthy の現在の状態 quo ) ,純バンドビットアルゴリズム,純粋LLM アプローチ,特に実験トラフィックの制限や多数のアームのシナリオにおいて,優れた性能を示した。当社のアプローチは,デジタル広告やソーシャルメディアレコメンデーションなどのユーザエンゲージメントを最適化する,さまざまなディジタルセッティングのコンテンツ実験にも適用可能です。 In the rapidly evolving digital content landscape, media firms and news publishers require automated and efficient methods to enhance user engagement. This paper introduces the LLM-Assisted Online Learning Algorithm (LOLA), a novel framework that integrates Large Language Models (LLMs) with adaptive experimentation to optimize content delivery. Leveraging a large-scale dataset from Upworthy, which includes 17,681 headline A/B tests aimed at evaluating the performance of various headlines associated with the same article content, we first investigate three broad pure-LLM approaches: prompt-based methods, embedding-based classification models, and fine-tuned open-source LLMs. Our findings indicate that prompt-based approaches perform poorly, achieving no more than 65% accuracy in identifying the catchier headline among two options. 
In contrast, OpenAI-embedding-based classification models and fine-tuned Llama-3-8b models achieve comparable accuracy, around 82-84%, though still falling short of the performance of experimentation with sufficient traffic. We then introduce LOLA, which combines the best pure-LLM approach with the Upper Confidence Bound algorithm to adaptively allocate traffic and maximize clicks. Our numerical experiments on Upworthy data show that LOLA outperforms the standard A/B testing method (the current status quo at Upworthy), pure bandit algorithms, and pure-LLM approaches, particularly in scenarios with limited experimental traffic or numerous arms. Our approach is both scalable and broadly applicable to content experiments across a variety of digital settings where firms seek to optimize user engagement, including digital advertising and social media recommendations.	翻訳日:2024-06-06 23:39:37 公開日:2024-06-03
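The combination described above, an LLM prediction feeding an Upper Confidence Bound bandit, can be sketched as follows. The abstract does not say how LOLA injects the LLM output into UCB, so treating the predicted click-through rates as pseudo-observations is an assumption made here for illustration; `ucb_with_llm_prior`, `prior_weight`, and the simulated click rates are all hypothetical.

```python
import math
import random


def ucb_with_llm_prior(true_ctrs, predicted_ctrs, prior_weight, horizon, seed=0):
    """Allocate `horizon` impressions across headlines with a UCB rule.

    predicted_ctrs stands in for LLM-based click-rate predictions and is
    counted as `prior_weight` pseudo-impressions per headline (an assumption;
    prior_weight must be positive). true_ctrs is only used to simulate clicks.
    """
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    pulls = [float(prior_weight)] * n_arms
    clicks = [prior_weight * p for p in predicted_ctrs]
    real_pulls = [0] * n_arms
    for _ in range(horizon):
        total = sum(pulls)
        # Empirical mean plus an exploration bonus that shrinks with more data.
        scores = [
            clicks[i] / pulls[i] + math.sqrt(2.0 * math.log(total + 1.0) / pulls[i])
            for i in range(n_arms)
        ]
        arm = max(range(n_arms), key=scores.__getitem__)
        reward = 1.0 if rng.random() < true_ctrs[arm] else 0.0
        pulls[arm] += 1.0
        clicks[arm] += reward
        real_pulls[arm] += 1
    return real_pulls


allocation = ucb_with_llm_prior([0.1, 0.6], [0.2, 0.5], prior_weight=10, horizon=2000, seed=1)
print(allocation)
```

A larger `prior_weight` trusts the predictions longer before real clicks dominate, which is the trade-off that matters most when experimental traffic is limited.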
# データ評価は学習可能か、解釈可能か? Is Data Valuation Learnable and Interpretable? ( http://arxiv.org/abs/2406.02612v1 ) ライセンス: Link先を確認	Ou Wu, Weiyao Zhu, Mengyang Li,	(参考訳) 個々のサンプルの価値を測定することは、深層学習モデルのトレーニングなど、多くのデータ駆動タスクにおいて重要である。近年の文献では、データ評価手法の開発に多大な努力が注がれている。主要なデータ評価手法はゲーム理論のShapley値に基づいており、この経路に沿って様々な手法が提案されている。Shapley値に基づく評価には確かな理論的根拠があるが、完全に実験に基づくアプローチであり、これまでに評価モデルは構築されていない。さらに、データ価格設定などのアプリケーションでは解釈可能なデータ評価手法が非常に有用であるにもかかわらず、現在のデータ評価手法は出力値の解釈可能性を無視している。この研究は、データ評価は学習可能か、解釈可能か、という重要な疑問に答えることを目的としている。学習された評価モデルには、パラメータ数が固定であることや知識の再利用可能性など、いくつかの望ましい利点がある。解釈可能なデータ評価モデルは、なぜサンプルに価値があるのか、あるいは価値がないのかを説明することができる。この目的のために、2つの新しいデータ価値モデリングフレームワークを提案し、モデルトレーニングと解釈可能性のための特定のベースモデルとして、多層パーセプトロン(MLP)と新しい回帰木をそれぞれ利用した。ベンチマークデータセット上で大規模な実験を行った。実験結果は、その問いに対して肯定的な答えを与える。本研究は,データ価値の評価のための新たな技術的経路を開く。大規模なデータ評価モデルは、さまざまなデータ駆動タスクにまたがって構築することができ、データ評価の広範な適用を促進することができる。 Measuring the value of individual samples is critical for many data-driven tasks, e.g., the training of a deep learning model. Recent literature has seen substantial efforts in developing data valuation methods. The primary data valuation methodology is based on the Shapley value from game theory, and various methods have been proposed along this path. Even though Shapley value-based valuation has a solid theoretical basis, it is entirely an experiment-based approach and no valuation model has been constructed so far. In addition, current data valuation methods ignore the interpretability of the output values, even though an interpretable data valuation method would be of great help for applications such as data pricing. This study aims to answer an important question: is data valuation learnable and interpretable? A learned valuation model has several desirable merits, such as a fixed number of parameters and knowledge reusability. An interpretable data valuation model can explain why a sample is or is not valuable. 
To this end, two new data value modeling frameworks are proposed, in which a multi-layer perceptron (MLP) and a new regression tree are utilized as specific base models for model training and interpretability, respectively. Extensive experiments are conducted on benchmark datasets. The experimental results provide a positive answer to the question. Our study opens up a new technical path for assessing data values. Large data valuation models can be built across many different data-driven tasks, which can promote the widespread application of data valuation.	翻訳日:2024-06-06 23:39:37 公開日:2024-06-03
# ACCO: 通信しながら蓄積する、分散LLMトレーニングにおける通信の隠蔽 ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training ( http://arxiv.org/abs/2406.02613v1 ) ライセンス: Link先を確認	Adel Nabli, Louis Fournier, Pierre Erbacher, Louis Serrano, Eugene Belilovsky, Edouard Oyallon,	(参考訳) 大規模言語モデル(LLM)のトレーニングは、複数のGPUを使用してモデルレプリカの確率勾配を並列に計算する分散実装に大きく依存している。しかし、データ並列設定における勾配の同期は、分散ワーカーの数の増加に伴って通信オーバーヘッドを増大させ、並列化の効率向上を妨げる可能性がある。この課題に対処するために、フェデレートラーニングで使用される局所最適化手法など、ワーカー間通信を減らす最適化アルゴリズムが登場した。通信オーバーヘッドの最小化には有効であるが、これらの手法は大きなメモリコストを伴い、スケーラビリティを妨げる。余分なモーメンタム変数に加えて、通信が複数のローカル最適化ステップの合間にしか許されない場合、オプティマイザの状態をワーカー間でシャーディングできないためである。これに対して,LLMの分散トレーニングに適したメモリ効率の高い最適化アルゴリズムである$\textbf{AC}$cumulate while $\textbf{CO}$mmunicate ($\texttt{ACCO}$)を提案する。 $\texttt{ACCO}$は、ワーカー間でオプティマイザの状態をシャーディングし、勾配計算と通信をオーバーラップして通信コストを隠蔽し、異種ハードウェアに対応する。本手法は、勾配計算と通信の並列実行に固有の1ステップ遅延を緩和する新しい技術に基づいており、ウォームアップステップを不要とし、標準的な分散最適化のトレーニングダイナミクスと整合しつつ、ウォールクロック時間でより高速に収束する。我々は、いくつかのLLMトレーニングおよび微調整タスクにおける$\texttt{ACCO}$の有効性を実証する。 Training Large Language Models (LLMs) relies heavily on distributed implementations, employing multiple GPUs to compute stochastic gradients on model replicas in parallel. However, synchronizing gradients in data parallel settings induces a communication overhead increasing with the number of distributed workers, which can impede the efficiency gains of parallelization. To address this challenge, optimization algorithms reducing inter-worker communication have emerged, such as local optimization methods used in Federated Learning. While effective in minimizing communication overhead, these methods incur significant memory costs, hindering scalability: in addition to extra momentum variables, if communications are only allowed between multiple local optimization steps, then the optimizer's states cannot be sharded among workers. In response, we propose $\textbf{AC}$cumulate while $\textbf{CO}$mmunicate ($\texttt{ACCO}$), a memory-efficient optimization algorithm tailored for distributed training of LLMs. 
$\texttt{ACCO}$ allows optimizer states to be sharded across workers, overlaps gradient computations and communications to conceal communication costs, and accommodates heterogeneous hardware. Our method relies on a novel technique to mitigate the one-step delay inherent in parallel execution of gradient computations and communications, eliminating the need for warmup steps and aligning with the training dynamics of standard distributed optimization while converging faster in terms of wall-clock time. We demonstrate the effectiveness of $\texttt{ACCO}$ on several LLM training and fine-tuning tasks.	翻訳日:2024-06-06 23:29:51 公開日:2024-06-03
# 都市間ファウショット交通予報のための周波数強化事前学習 Frequency Enhanced Pre-training for Cross-city Few-shot Traffic Forecasting ( http://arxiv.org/abs/2406.02614v1 ) ライセンス: Link先を確認	Zhanyu Liu, Jianrong Ding, Guanjie Zheng,	(参考訳) インテリジェントトランスポーテーションシステム(ITS)の分野は、様々な下流アプリケーションを実現するために正確なトラフィック予測に依存している。しかし、開発途上国は、限られた資源と時代遅れのインフラのために、十分なトレーニングトラフィックデータを収集する上で、しばしば課題に直面している。この障害を認識して、都市間数発の予測という概念が実現可能なアプローチとして浮上した。従来の都市間数ショット予測手法では、都市間の周波数類似性は無視されていたが、都市間の周波数領域では、交通データがより類似していることが観察された。この事実に基づき、我々は \textbf{F}requency \textbf{E}nhanced \textbf{P}re-training Framework for \textbf{Cross}-city Few-shot Forecasting (\textbf{FEPCross})を提案する。 FEPCrossは事前訓練段階と微調整段階を有する。事前学習段階において,時間・周波数領域の情報を含むクロスドメイン空間・テンポラルエンコーダを提案する。微調整の段階では、トレーニングサンプルを豊かにし、モーメント更新されたグラフ構造を維持するモジュールを設計し、これにより、数ショットのトレーニングデータに過度に適合するリスクを軽減する。実世界の交通データセット上で実施された実証的な評価は、FEPCrossの異常な有効性を検証し、多様なカテゴリの既存アプローチを上回り、都市間数ショット予測の進行を促進する特性を示す。 The field of Intelligent Transportation Systems (ITS) relies on accurate traffic forecasting to enable various downstream applications. However, developing cities often face challenges in collecting sufficient training traffic data due to limited resources and outdated infrastructure. Recognizing this obstacle, the concept of cross-city few-shot forecasting has emerged as a viable approach. While previous cross-city few-shot forecasting methods ignore the frequency similarity between cities, we have made an observation that the traffic data is more similar in the frequency domain between cities. Based on this fact, we propose a \textbf{F}requency \textbf{E}nhanced \textbf{P}re-training Framework for \textbf{Cross}-city Few-shot Forecasting (\textbf{FEPCross}). FEPCross has a pre-training stage and a fine-tuning stage. In the pre-training stage, we propose a novel Cross-Domain Spatial-Temporal Encoder that incorporates the information of the time and frequency domain and trains it with self-supervised tasks encompassing reconstruction and contrastive objectives. 
In the fine-tuning stage, we design modules to enrich training samples and maintain a momentum-updated graph structure, thereby mitigating the risk of overfitting to the few-shot training data. Empirical evaluations performed on real-world traffic datasets validate the exceptional efficacy of FEPCross, outperforming existing approaches of diverse categories and demonstrating characteristics that foster the progress of cross-city few-shot forecasting.	翻訳日:2024-06-06 23:29:51 公開日:2024-06-03
# 非パラメトリックな測地に対する低次モデリングとグラフニューラルネットワークのハイブリッド数値解法結合:構造力学問題への応用 A hybrid numerical methodology coupling Reduced Order Modeling and Graph Neural Networks for non-parametric geometries: applications to structural dynamics problems ( http://arxiv.org/abs/2406.02615v1 ) ライセンス: Link先を確認	Victor Matray, Faisal Amlani, Frédéric Feyel, David Néron,	(参考訳) 本研究は、複雑な物理系を管理する時間領域偏微分方程式(PDE)の数値解析を高速化するための新しいアプローチを導入する。この手法は、古典的な低次モデリング(ROM)フレームワークと最近導入されたグラフニューラルネットワーク(GNN)の組み合わせに基づいている。提案手法は非パラメトリックなジオメトリに特に適しており、最終的には多様なジオメトリやトポロジーを扱えることが示されている。航空機の座席の設計およびそれに対応する衝撃に対する機械的応答に関する応用文脈において,性能研究は計算負荷を低減し,非パラメトリックな測地を伴わない問題に対する迅速な設計イテレーションを可能にすることが主な動機である。提案手法は, 有限要素に基づく数値シミュレーションを多数必要とする他の科学的・工学的な問題にも適用可能である。 This work introduces a new approach for accelerating the numerical analysis of time-domain partial differential equations (PDEs) governing complex physical systems. The methodology is based on a combination of a classical reduced-order modeling (ROM) framework and recently-introduced Graph Neural Networks (GNNs), where the latter is trained on highly heterogeneous databases of varying numerical discretization sizes. The proposed techniques are shown to be particularly suitable for non-parametric geometries, ultimately enabling the treatment of a diverse range of geometries and topologies. Performance studies are presented in an application context related to the design of aircraft seats and their corresponding mechanical responses to shocks, where the main motivation is to reduce the computational burden and enable the rapid design iteration for such problems that entail non-parametric geometries. The methods proposed here are straightforwardly applicable to other scientific or engineering problems requiring a large number of finite element-based numerical simulations, with the potential to significantly enhance efficiency while maintaining reasonable accuracy.	翻訳日:2024-06-06 23:29:51 公開日:2024-06-03
# エッジコンピューティングにおける無線LLM推論のための適応層分割:モデルに基づく強化学習アプローチ Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach ( http://arxiv.org/abs/2406.02616v1 ) ライセンス: Link先を確認	Yuxuan Chen, Rongpeng Li, Xiaoxue Yu, Zhifeng Zhao, Honggang Zhang,	(参考訳) エッジコンピューティング環境における大規模言語モデル(LLM)のデプロイの最適化は、プライバシと計算効率の向上に不可欠である。本研究は,エッジコンピューティングにおける効率的な無線LLM推論に向けて,主要なオープンソースLLMにおける分割点の影響を包括的に分析する。そこで本研究では,モデルベース強化学習(MBRL)からインスピレーションを得て,エッジとユーザ機器(UE)間の最適分割点を決定するフレームワークを提案する。報酬代理モデルを導入することで、頻繁な性能評価の計算コストを大幅に削減できる。大規模シミュレーションにより, この手法は, 異なるネットワーク条件下での推論性能と計算負荷のバランスを効果的に保ち, 分散環境におけるLLM配置の堅牢なソリューションを提供することを示した。 Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for enhancing privacy and computational efficiency. Toward efficient wireless LLM inference in edge computing, this study comprehensively analyzes the impact of different splitting points in mainstream open-source LLMs. On this basis, this study introduces a framework taking inspiration from model-based reinforcement learning (MBRL) to determine the optimal splitting point across the edge and user equipment (UE). By incorporating a reward surrogate model, our approach significantly reduces the computational cost of frequent performance evaluations. Extensive simulations demonstrate that this method effectively balances inference performance and computational load under varying network conditions, providing a robust solution for LLM deployment in decentralized settings.	翻訳日:2024-06-06 23:29:51 公開日:2024-06-03
# Immunocto: 病理組織学のために自動生成された大規模免疫細胞データベース Immunocto: a massive immune cell database auto-generated for histopathology ( http://arxiv.org/abs/2406.02618v1 ) ライセンス: Link先を確認	Mikaël Simard, Zhuoyan Shen, Maria A. Hawkins, Charles-Antoine Collins-Fekete,	(参考訳) 免疫療法などの新しいがん治療オプションの出現に伴い、腫瘍免疫微小環境の研究は、予後に関する情報を得て治療薬に対する反応を理解するために重要である。腫瘍免疫微小環境を特徴付けるための重要なアプローチは、(1)日常的な病理組織検査で得られるヘマトキシリン・エオシン(H&E)染色組織切片のデジタル化された高分解能光学顕微鏡像と、(2)自動免疫細胞検出および分類法を組み合わせることである。しかし、デジタル病理学における現在の個別免疫細胞分類モデルは、比較的性能が低い。これは主に、現在利用可能な個々の免疫細胞のデータセットの規模が限られているためであり、デジタル化されたH&E全スライド画像上で免疫細胞を手動で注釈付けする作業が時間がかかり難しいことの結果である。そこで本研究では,CD4$^+$T細胞リンパ球,CD8$^+$T細胞リンパ球,B細胞リンパ球,マクロファージの4つのサブタイプにまたがる2,282,818個の免疫細胞を含む,6,848,454個のヒト細胞の自動生成データベースであるImmunoctoを紹介する。それぞれの細胞に対して、64$\times$64ピクセルのH&E画像を$\mathbf{40}\times$倍率で提供し、あわせて核のバイナリマスクとラベルを提供する。 Immunoctoを作成するために、オープンソースのモデルとデータを組み合わせて、輪郭とラベルの大部分を自動生成した。細胞は、Orionプラットフォームによる、H&Eと免疫蛍光が対応付けられた大腸癌データセットから取得され、輪郭はSegment Anything Modelを用いて取得される。ImmunoctoのH&E画像で訓練された分類器は、4つの免疫細胞サブタイプとその他の細胞の識別において平均F1スコア0.74を達成する。 Immunocto は https://zenodo.org/uploads/11073373 でダウンロードできる。 With the advent of novel cancer treatment options such as immunotherapy, studying the tumour immune micro-environment is crucial to inform on prognosis and understand response to therapeutic agents. A key approach to characterising the tumour immune micro-environment may be through combining (1) digitised microscopic high-resolution optical images of hematoxylin and eosin (H&E) stained tissue sections obtained in routine histopathology examinations with (2) automated immune cell detection and classification methods. However, current individual immune cell classification models for digital pathology present relatively poor performance. This is mainly due to the limited size of currently available datasets of individual immune cells, a consequence of the time-consuming and difficult problem of manually annotating immune cells on digitised H&E whole slide images. 
In that context, we introduce Immunocto, a massive, automatically generated database of 6,848,454 human cells, including 2,282,818 immune cells distributed across 4 subtypes: CD4$^+$ T cell lymphocytes, CD8$^+$ T cell lymphocytes, B cell lymphocytes, and macrophages. For each cell, we provide a 64$\times$64-pixel H&E image at $\mathbf{40}\times$ magnification, along with a binary mask of the nucleus and a label. To create Immunocto, we combined open-source models and data to automatically generate the majority of contours and labels. The cells are obtained from a matched H&E and immunofluorescence colorectal dataset from the Orion platform, while contours are obtained using the Segment Anything Model. A classifier trained on H&E images from Immunocto achieves an average F1 score of 0.74 in differentiating the 4 immune cell subtypes and other cells. Immunocto can be downloaded at: https://zenodo.org/uploads/11073373.	翻訳日:2024-06-06 23:29:51 公開日:2024-06-03
# 暗号変換器回路を用いた言語モデルにおける未知のバックドア Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits ( http://arxiv.org/abs/2406.02619v1 ) ライセンス: Link先を確認	Andis Draguns, Andrew Gritsevskiy, Sumeet Ramesh Motwani, Charlie Rogers-Smith, Jeffrey Ladish, Christian Schroeder de Witt,	(参考訳) オープンソース言語モデルの急速な普及は、下流のバックドア攻撃のリスクを著しく高める。これらのバックドアは、モデル展開中に危険な振る舞いを導入し、従来のサイバーセキュリティ監視システムによる検出を回避することができる。本稿では,従来の技術とは対照的に,自己回帰型トランスフォーマーモデルにおけるバックドアの新たなクラスについて紹介する。無効性は、ディフェンダーがバックドアをトリガーすることを防ぐため、完全なホワイトボックスアクセスを与えられたり、レッドチームや特定の形式的な検証方法のような自動化技術を使用したりしても、デプロイ前に評価や検出が不可能になる。我々は, 暗号技術を用いることで, 新規な構築が不必要であるだけでなく, 良好な堅牢性を有することを示す。これらの特性を実証的な調査で確認し、我々のバックドアが最先端の緩和戦略に耐えられることを示す。さらに、ホワイトボックス設定で完全に検出できないような普遍的なバックドアは、既存の設計よりも検出が難しいことを示して、これまでの作業を拡張しました。本稿では, トランスモデルへのバックドアのシームレスな統合の実現可能性を示すことによって, プリデプロイ検出戦略の有効性を根本的に疑問視する。これにより、AIの安全性とセキュリティにおける犯罪と防御のバランスに関する新たな洞察が得られる。 The rapid proliferation of open-source language models significantly increases the risks of downstream backdoor attacks. These backdoors can introduce dangerous behaviours during model deployment and can evade detection by conventional cybersecurity monitoring systems. In this paper, we introduce a novel class of backdoors in autoregressive transformer models, that, in contrast to prior art, are unelicitable in nature. Unelicitability prevents the defender from triggering the backdoor, making it impossible to evaluate or detect ahead of deployment even if given full white-box access and using automated techniques, such as red-teaming or certain formal verification methods. We show that our novel construction is not only unelicitable thanks to using cryptographic techniques, but also has favourable robustness properties. We confirm these properties in empirical investigations, and provide evidence that our backdoors can withstand state-of-the-art mitigation strategies. 
Additionally, we expand on previous work by showing that our universal backdoors, while not completely undetectable in white-box settings, can be harder to detect than some existing designs. By demonstrating the feasibility of seamlessly integrating backdoors into transformer models, this paper fundamentally questions the efficacy of pre-deployment detection strategies. This offers new insights into the offence-defence balance in AI safety and security.	翻訳日:2024-06-06 23:29:51 公開日:2024-06-03
# 大規模言語モデルの保護: 調査 Safeguarding Large Language Models: A Survey ( http://arxiv.org/abs/2406.02622v1 ) ライセンス: Link先を確認	Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, Changshun Wu, Gaojie Jin, Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang,	(参考訳) 大規模言語モデル (LLMs) の急成長する分野において、堅牢な安全メカニズムを開発する「安全ガード (safeguards)」あるいは「ガードレール (guardrails)」は、指定された境界内でのLLMの倫理的使用を保証するために必須となっている。本稿は、この重要なメカニズムの現状について、体系的な文献レビューを提供する。その主な課題と、様々な文脈における倫理的問題を扱う包括的なメカニズムにどのように拡張できるかを論じる。まず、主要なLCMサービスプロバイダとオープンソースコミュニティが採用している保護メカニズムの現在の状況を明らかにする。続いて、幻覚、公正性、プライバシーなど、ガードレールが強制したいと思われるいくつかの(望ましくない)プロパティを評価し、分析し、拡張するテクニックが続く。これらに基づいて、これらの制御(すなわち攻撃)を回避し、攻撃を防御し、ガードレールを補強する手法をレビューする。上記の技術は現状や研究動向を反映しているが,本手法では容易に対処できないいくつかの課題についても論じるとともに,多分野的アプローチ,ニューラルシンボリック手法,システム開発ライフサイクルの完全な検討を通じて,包括的ガードレールの実装方法に関するビジョンを提示する。 In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review on the current status of this critical mechanism. It discusses its major challenges and how it can be enhanced into a comprehensive mechanism dealing with ethical issues in various contexts. First, the paper elucidates the current landscape of safeguarding mechanisms that major LLM service providers and the open-source community employ. This is followed by the techniques to evaluate, analyze, and enhance some (un)desirable properties that a guardrail might want to enforce, such as hallucinations, fairness, privacy, and so on. Based on them, we review techniques to circumvent these controls (i.e., attacks), to defend the attacks, and to reinforce the guardrails. 
While the techniques mentioned above represent the current status and the active research trends, we also discuss several challenges that cannot be easily dealt with by these methods, and present our vision of how to implement a comprehensive guardrail through full consideration of a multi-disciplinary approach, neural-symbolic methods, and the systems development lifecycle.	翻訳日:2024-06-06 23:29:51 公開日:2024-06-03
# さらに一歩先へ: Linuxカーネルのエクスプロイトにおけるページスプレーの理解 Take a Step Further: Understanding Page Spray in Linux Kernel Exploitation ( http://arxiv.org/abs/2406.02624v1 ) ライセンス: Link先を確認	Ziyi Guo, Dang K Le, Zhenpeng Lin, Kyle Zeng, Ruoyu Wang, Tiffany Bao, Yan Shoshitaishvili, Adam Doupé, Xinyu Xing,	(参考訳) 近年,カーネル脆弱性に対するページレベルのエクスプロイトに着目したPage Sprayと呼ばれる新しい手法が登場している。エクスプロイト可能性、安定性、互換性の面で利点があるにもかかわらず、Page Sprayに関する包括的な研究は依然として乏しい。その根本原因、エクスプロイトモデル、他のエクスプロイト技術と比較した利点、および可能な緩和戦略に関する疑問は、ほとんど答えられていない。本稿では,Page Sprayを体系的に調査し,このエクスプロイト技術の詳細な理解を提供する。我々は、\sysモデルと呼ばれる包括的なエクスプロイトモデルを導入し、その基本原理を解明する。さらに、Linuxカーネル内でPage Sprayが発生する根本原因を徹底的に分析する。我々は,Page Spray解析モデルに基づく解析器を設計し,Page Sprayの呼び出し箇所を同定する。次に, 綿密に設計した実験により, Page Sprayの安定性, エクスプロイト可能性, 互換性を評価する。最後に,Page Sprayに対処するための緩和原則を提案し,独自の軽量な緩和手法を紹介する。この研究は、セキュリティ研究者や開発者がPage Sprayに関する洞察を得るのを支援することを目的としており、最終的に、この新たなエクスプロイト技術に対する我々の集団的理解を高め、コミュニティに貢献する。 Recently, a novel method known as Page Spray has emerged, focusing on page-level exploitation for kernel vulnerabilities. Despite the advantages it offers in terms of exploitability, stability, and compatibility, comprehensive research on Page Spray remains scarce. Questions regarding its root causes, exploitation model, comparative benefits over other exploitation techniques, and possible mitigation strategies have largely remained unanswered. In this paper, we conduct a systematic investigation into Page Spray, providing an in-depth understanding of this exploitation technique. We introduce a comprehensive exploit model termed the \sys model, elucidating its fundamental principles. Additionally, we conduct a thorough analysis of the root causes underlying Page Spray occurrences within the Linux Kernel. We design an analyzer based on the Page Spray analysis model to identify Page Spray callsites. Subsequently, we evaluate the stability, exploitability, and compatibility of Page Spray through meticulously designed experiments. 
Finally, we propose mitigation principles for addressing Page Spray and introduce our own lightweight mitigation approach. This research aims to assist security researchers and developers in gaining insights into Page Spray, ultimately enhancing our collective understanding of this emerging exploitation technique and benefiting the community.	翻訳日:2024-06-06 23:29:51 公開日:2024-06-03
# プログレッシブ推論:中間予測を用いたデコーダオンリーシーケンス分類モデルの説明 Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions ( http://arxiv.org/abs/2406.02625v1 ) ライセンス: Link先を確認	Sanjay Kariyappa, Freddy Lécué, Saumitra Mishra, Christopher Pond, Daniele Magazzeni, Manuela Veloso,	(参考訳) 本稿では、デコーダのみのシーケンス分類モデルの予測を説明するために、入力属性を計算するためのフレームワークであるプログレッシブ推論を提案する。本研究は、デコーダのみのトランスフォーマーモデルの分類ヘッドを用いて、入力シーケンスの異なる点で評価することで中間予測を行うことができるという知見に基づいている。因果的注意機構のため、これらの中間予測は推論点の前のトークンにのみ依存し、マスク付き入力サブシーケンス上でモデルの予測を得ることができ、計算上のオーバーヘッドは無視できる。この知見を用いてサブシーケンスレベルの属性を提供する2つの方法を開発した。まず,連続する中間予測の差を捉えて属性を計算するシングルパスプログレッシブ推論(Single Pass-Progressive Inference,SP-PI)を提案する。次に、Kernel SHAPとの接続を利用して、MP-PI(Multiple Pass-Progressive Inference)を開発する。 MP-PIは、複数のマスク付きバージョンの入力から中間予測を使用して、より高い品質の属性を計算する。テキスト分類タスクを訓練した多種多様なモデルについて検討したところ,SP-PIとMP-PIは,従来の作業に比べて有意に優れた属性を提供することがわかった。 This paper proposes Progressive Inference - a framework to compute input attributions to explain the predictions of decoder-only sequence classification models. Our work is based on the insight that the classification head of a decoder-only Transformer model can be used to make intermediate predictions by evaluating them at different points in the input sequence. Due to the causal attention mechanism, these intermediate predictions only depend on the tokens seen before the inference point, allowing us to obtain the model's prediction on a masked input sub-sequence, with negligible computational overheads. We develop two methods to provide sub-sequence level attributions using this insight. First, we propose Single Pass-Progressive Inference (SP-PI), which computes attributions by taking the difference between consecutive intermediate predictions. Second, we exploit a connection with Kernel SHAP to develop Multi Pass-Progressive Inference (MP-PI). MP-PI uses intermediate predictions from multiple masked versions of the input to compute higher quality attributions. 
Our studies on a diverse set of models trained on text classification tasks show that SP-PI and MP-PI provide significantly better attributions compared to prior work.	翻訳日:2024-06-06 23:29:51 公開日:2024-06-03
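The SP-PI rule described above (attributions as differences between consecutive intermediate predictions) can be sketched in a few lines. The toy prefix model, the function names, and the choice of 0.5 as the prediction before any token is seen are assumptions for illustration; a real decoder-only classifier would supply the intermediate probabilities from its classification head.

```python
import math


def sp_pi_attributions(intermediate_probs, baseline):
    """Attribute to each position the change it causes in the running prediction."""
    attributions = []
    previous = baseline
    for prob in intermediate_probs:
        attributions.append(prob - previous)
        previous = prob
    return attributions


def toy_prefix_probs(token_scores):
    """Hypothetical stand-in for a causal classifier evaluated at every prefix."""
    probs = []
    running = 0.0
    for score in token_scores:
        running += score
        probs.append(1.0 / (1.0 + math.exp(-running)))
    return probs


scores = [0.5, -1.0, 2.0, 0.1]
probs = toy_prefix_probs(scores)
attributions = sp_pi_attributions(probs, baseline=0.5)
print([round(a, 3) for a in attributions])
```

Because the differences telescope, the attributions sum to the final prediction minus the baseline, which gives a quick sanity check on any implementation.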
# ディープラーニングを用いたMRI再構成のための最適化アルゴリズムの概要 A Brief Overview of Optimization-Based Algorithms for MRI Reconstruction Using Deep Learning ( http://arxiv.org/abs/2406.02626v1 ) ライセンス: Link先を確認	Wanyu Bian,	(参考訳) 磁気共鳴イメージング(MRI)はその例外的な軟組織コントラストと高い空間分解能で知られており、医用画像において重要なツールである。ディープラーニングアルゴリズムの統合は、MRI再構成プロセスを最適化する大きな可能性を秘めている。この領域における研究の活発化にもかかわらず、MRI再構成に適した最適化に基づくディープラーニングモデルに関する総合的な調査はまだ行われていない。本稿では,MRI再構成に特化して設計されたディープラーニングにおいて,最新の最適化アルゴリズムを徹底的に検討することにより,このギャップに対処する。本研究の目的は、MRIコミュニティ内でのさらなるイノベーションと応用を促進するために、これらの進歩を研究者に詳細に理解することである。 Magnetic resonance imaging (MRI) is renowned for its exceptional soft tissue contrast and high spatial resolution, making it a pivotal tool in medical imaging. The integration of deep learning algorithms offers significant potential for optimizing MRI reconstruction processes. Despite the growing body of research in this area, a comprehensive survey of optimization-based deep learning models tailored for MRI reconstruction has yet to be conducted. This review addresses this gap by presenting a thorough examination of the latest optimization-based algorithms in deep learning specifically designed for MRI reconstruction. The goal of this paper is to provide researchers with a detailed understanding of these advancements, facilitating further innovation and application within the MRI community.	翻訳日:2024-06-06 23:29:51 公開日:2024-06-03
# アンサンブル平均を超えて: サブシーズン予測のための気候モデルアンサンブルの活用 Beyond Ensemble Averages: Leveraging Climate Model Ensembles for Subseasonal Forecasting ( http://arxiv.org/abs/2211.15856v4 ) ライセンス: Link先を確認	Elena Orlova, Haokun Liu, Raphael Rossellini, Benjamin A. Cash, Rebecca Willett,	(参考訳) 気温や降水などの重要な気候変数について、サブシーズンの時間スケールで高品質な予測を行うことは、長年にわたり現業予報の空白であった。本研究では,機械学習(ML)モデルをサブシーズン予測のための後処理ツールとして応用することを検討する。米国本土における月平均降水量と2m気温を2週間前に予測するために、ラグ付き数値アンサンブル予測(すなわち、メンバーの初期化日が異なるアンサンブル)と観測データ(相対湿度、海面気圧、ジオポテンシャル高度など)を様々なML手法に組み込む。回帰、分位点回帰、および三分位分類タスクでは、線形モデル、ランダムフォレスト、畳み込みニューラルネットワーク、および積み重ねモデル(個々のMLモデルの予測に基づくマルチモデルアプローチ)の利用を検討する。アンサンブル平均のみを用いることが多い従来のMLアプローチとは異なり、アンサンブル予測に埋め込まれた情報を活用して予測精度を向上させる。さらに,計画や緩和に不可欠な極端事象の予測についても検討する。アンサンブルメンバーを空間予測の集合として考慮し、空間情報を用いた様々なアプローチを探求する。異なるアプローチ間のトレードオフは、モデルの積み重ねによって緩和される可能性がある。提案モデルは,気候値予報やアンサンブル平均などの標準的なベースラインよりも優れている。さらに,特徴量の重要度,全アンサンブルを用いた場合とアンサンブル平均のみを用いた場合のトレードオフ,空間的変動の扱い方の違いについて検討した。 Producing high-quality forecasts of key climate variables, such as temperature and precipitation, on subseasonal time scales has long been a gap in operational forecasting. This study explores an application of machine learning (ML) models as post-processing tools for subseasonal forecasting. Lagged numerical ensemble forecasts (i.e., an ensemble where the members have different initialization dates) and observational data, including relative humidity, pressure at sea level, and geopotential height, are incorporated into various ML methods to predict monthly average precipitation and two-meter temperature two weeks in advance for the continental United States. For regression, quantile regression, and tercile classification tasks, we consider using linear models, random forests, convolutional neural networks, and stacked models (a multi-model approach based on the prediction of the individual ML models). Unlike previous ML approaches that often use ensemble mean alone, we leverage information embedded in the ensemble forecasts to enhance prediction accuracy. 
Additionally, we investigate extreme event predictions that are crucial for planning and mitigation efforts. Considering ensemble members as a collection of spatial forecasts, we explore different approaches to using spatial information. Trade-offs between different approaches may be mitigated with model stacking. Our proposed models outperform standard baselines such as climatological forecasts and ensemble means. In addition, we investigate feature importance, trade-offs between using the full ensemble or only the ensemble mean, and different modes of accounting for spatial variability.	翻訳日:2024-06-06 16:52:40 公開日:2024-06-03
# 情報理論を用いた目的関数の選択法 How to select an objective function using information theory ( http://arxiv.org/abs/2212.06566v4 ) ライセンス: Link先を確認	Timothy O. Hodson, Thomas M. Over, Tyler J. Smith, Lucy M. Marshall,	(参考訳) 機械学習や科学計算では、モデル性能は客観的関数で測定される。しかし、なぜ別の目的を選ぶのか? 情報理論は1つの答えを与える: モデルの情報を最大限にするために、最少ビットにおけるエラーを表す目的関数を選択する。異なる目的を評価するために、これらを可能性関数に変換する。可能性として、それらの相対的な大きさは、ある目的が他の目標よりもどれだけ強く、その関係のログはビット長の違いと不確実性の違いを表す。言い換えれば、どちらの目的も不確実性を最小化する。情報理論のパラダイムの下では、最終的な目的は、特定のユーティリティとは対照的に、情報の最大化(および不確実性の最小化)である。このパラダイムは、気候変動の影響を理解するために使用される大規模な地球システムモデルのように、多くの用途を持ち、明確な実用性を持たないモデルに適している、と我々は主張する。 In machine learning or scientific computing, model performance is measured with an objective function. But why choose one objective over another? Information theory gives one answer: To maximize the information in the model, select the objective function that represents the error in the fewest bits. To evaluate different objectives, transform them into likelihood functions. As likelihoods, their relative magnitude represents how strongly we should prefer one objective versus another, and the log of that relation represents the difference in their bit-length, as well as the difference in their uncertainty. In other words, prefer whichever objective minimizes the uncertainty. Under the information-theoretic paradigm, the ultimate objective is to maximize information (and minimize uncertainty), as opposed to any specific utility. We argue that this paradigm is well-suited to models that have many uses and no definite utility, like the large Earth system models used to understand the effects of climate change.	翻訳日:2024-06-06 16:52:40 公開日:2024-06-03
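The recipe in the entry above (turn each objective into a likelihood, then prefer the one that encodes the errors in the fewest bits) can be sketched for two common objectives. Pairing mean squared error with a zero-mean Gaussian and mean absolute error with a zero-mean Laplace distribution is the standard correspondence; fitting each scale by maximum likelihood, treating residuals as independent, and the function names are assumptions of this sketch. The values are differential code lengths, so they can be negative for small residuals, and only their difference is meaningful.

```python
import math


def gaussian_bits_per_sample(residuals):
    """Mean code length in bits under a zero-mean Gaussian (the MSE likelihood)."""
    variance = sum(r * r for r in residuals) / len(residuals)
    nats = 0.5 * math.log(2.0 * math.pi * math.e * variance)
    return nats / math.log(2.0)


def laplace_bits_per_sample(residuals):
    """Mean code length in bits under a zero-mean Laplace (the MAE likelihood)."""
    scale = sum(abs(r) for r in residuals) / len(residuals)
    nats = 1.0 + math.log(2.0 * scale)
    return nats / math.log(2.0)


def preferred_objective(residuals):
    """Return the objective whose likelihood needs fewer bits, with its bit count.

    Residuals must not all be zero, otherwise the fitted scale is zero and
    the logarithm is undefined.
    """
    gaussian_bits = gaussian_bits_per_sample(residuals)
    laplace_bits = laplace_bits_per_sample(residuals)
    if gaussian_bits <= laplace_bits:
        return "MSE", gaussian_bits
    return "MAE", laplace_bits


heavy_tailed = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, 10.0]
print(preferred_objective(heavy_tailed)[0])
```

With one large outlier among small errors, the Laplace likelihood is the shorter code, which matches the usual advice to prefer absolute error for heavy-tailed residuals.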
# OpenAPI Specification Extended Security Scheme:Broken Object Level Authorizationの頻度を下げる方法 OpenAPI Specification Extended Security Scheme: A method to reduce the prevalence of Broken Object Level Authorization ( http://arxiv.org/abs/2212.06606v3 ) ライセンス: Link先を確認	Rami Haddad, Rim El Malki, Daniel Cozma,	(参考訳) APIは、サービス間通信を達成するための重要な技術になっています。 APIデプロイメントの増加により、セキュリティ標準の欠如に対処する緊急性が高まっている。 API Securityは、OpenAPI標準の標準化された認証がないため、不適切な認証は、既知の脆弱性や未知の脆弱性の可能性を開く。本稿は,API Security: Broken Object Level Authorization (BOLA) における第1の脆弱性について検討し,この脆弱性の頻度を下げるための方法とツールを提案する。 BOLAはさまざまなAPIフレームワークに影響を与えており、私たちのスコープはOpenAPI Specification(OAS)に固定されています。 OASはAPIの記述と実装の標準であり、一般的なOAS実装はFastAPI、Connexion(Flask)などである。これらの実装には、OASsのAPIプロパティに関する知識に関連する長所と短所がある。 Open API Specificationsのセキュリティプロパティは、オブジェクト認証に対処せず、そのようなオブジェクトプロパティを定義するための標準化されたアプローチを提供しない。これにより、オブジェクトレベルのセキュリティは開発者の慈悲に委ねられ、意図しない攻撃ベクタ生成のリスクが増大する。私たちの目標は、この空白に挑戦することです。 1) OAS ESS(OpenAPI Specification Extended Security Scheme)には、OAS(Design-based approach)内のオブジェクトに対する宣言型セキュリティ制御が含まれている。 2) APIサービス(Flask/FastAPI)にインポートして、オブジェクトレベルで認証チェックを実行することができる認証モジュール(開発ベースのアプローチ)。 APIサービスを構築する場合、開発者はAPI設計(仕様)またはそのコードから始めることができる。どちらの場合も、BOLAの頻度を緩和し、削減するために一連のメカニズムが導入される。 APIs have become the prominent technology of choice for achieving inter-service communications. The growth of API deployments has driven the urgency in addressing its lack of security standards. API Security is a topic for concern given the absence of standardized authorization in the OpenAPI standard, improper authorization opens the possibility for known and unknown vulnerabilities, which in the past years have been exploited by malicious actors resulting in data loss. This paper examines the number one vulnerability in API Security: Broken Object Level Authorization(BOLA), and proposes methods and tools to reduce the prevalence of this vulnerability. BOLA affects various API frameworks, our scope is fixated on the OpenAPI Specification(OAS). 
The OAS is a standard for describing and implementing APIs; popular OAS implementations include FastAPI, Connexion (Flask), and many more. These implementations carry the pros and cons associated with the OAS's knowledge of API properties. The OpenAPI Specification's security properties do not address object authorization and provide no standardized approach for defining such object properties. This leaves object-level security at the mercy of developers, which increases the risk of unintentionally creating attack vectors. Our aim is to fill this void by introducing 1) the OAS ESS (OpenAPI Specification Extended Security Scheme), which includes declarative security controls for objects in OAS (design-based approach), and 2) an authorization module that can be imported into API services (Flask/FastAPI) to enforce authorization checks at the object level (development-based approach). When building an API service, a developer can start with the API design (specification) or its code. In both cases, a set of mechanisms is introduced to help developers mitigate and reduce the prevalence of BOLA.	翻訳日:2024-06-06 16:52:40 公開日:2024-06-03
# テキスト・ツー・イメージ・ジェネレータを用いたインターベンショナルデータ拡張に向けて Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators ( http://arxiv.org/abs/2212.11237v4 ) ライセンス: Link先を確認	Jianhao Yuan, Francesco Pinto, Adam Davies, Philip Torr,	(参考訳) ニューラルイメージ分類器は、トレーニングデータと異なる環境条件からサンプリングされた入力に曝されると、深刻な性能劣化が起こることが知られている。近年のテキスト・トゥ・イメージ・ジェネレーション(T2I)の進展を考えると、近年のT2Iジェネレータは、トレーニングデータを強化し、下流分類器の堅牢性を向上させるために、こうした環境要因に対する任意の介入をシミュレートするためにどのように使用できるのかという疑問がある。我々は、単一ドメイン一般化(SDG)におけるベンチマークの多種多様なコレクションを実験し、介入プロンプト戦略、条件付け機構、ポストホックフィルタリングを含む、T2I生成の重要な次元にまたがるスプリアス特徴(RRSF)への依存を減らした。我々の広範な実証実験により、Stable Diffusionのような現代のT2Iジェネレータは、それぞれの寸法がどう構成されているかに関わらず、従来の最先端のデータ拡張技術よりも優れた、強力な介入データ拡張メカニズムとして実際に使用できることが示された。 Neural image classifiers are known to undergo severe performance degradation when exposed to inputs that are sampled from environmental conditions that differ from their training data. Given the recent progress in Text-to-Image (T2I) generation, a natural question is how modern T2I generators can be used to simulate arbitrary interventions over such environmental factors in order to augment training data and improve the robustness of downstream classifiers. We experiment across a diverse collection of benchmarks in single domain generalization (SDG) and reducing reliance on spurious features (RRSF), ablating across key dimensions of T2I generation, including interventional prompting strategies, conditioning mechanisms, and post-hoc filtering. Our extensive empirical findings demonstrate that modern T2I generators like Stable Diffusion can indeed be used as a powerful interventional data augmentation mechanism, outperforming previously state-of-the-art data augmentation techniques regardless of how each dimension is configured.	翻訳日:2024-06-06 14:46:08 公開日:2024-06-03
# 精密健康におけるクラウドソーシングとヒューマン・イン・ザ・ループワークフローの展望 A Perspective on Crowdsourcing and Human-in-the-Loop Workflows in Precision Health ( http://arxiv.org/abs/2303.03578v2 ) ライセンス: Link先を確認	Peter Washington,	(参考訳) 現代の機械学習アプローチは、様々な健康状態に対するパフォーマンス診断モデルにつながっている。決定木やディープニューラルネットワークなど、いくつかの機械学習アプローチは、原則として、任意の関数を近似することができる。しかし、入力データが不均一で高次元であり、出力クラスが非常に非線形である場合に、過度に適合する傾向が拡大されるため、このパワーはギフトと呪いの両方と見なすことができる。この問題は、特に主観的基準で診断される行動や精神状態を予測する診断システムに悩まされる可能性がある。この問題に対する新たな解決策はクラウドソーシング(クラウドソーシング)であり、クラウドワーカーは金銭的補償やゲーミフィケーション体験の見返りに複雑な行動特徴に注釈を付けるために支払われる。これらのラベルは、直接または診断機械学習モデルへの入力としてラベルを使用することによって、診断を導出するために使用することができる。この視点では、この新興分野における既存の研究について述べ、新たな研究分野であるクラウドパワー診断システムにおける現在進行中の課題と機会について論じる。正しい考慮により、複雑でニュアンスのある健康状態の予測のために、人為的な機械学習ワークフローにクラウドソーシングを追加することで、スクリーニング、診断、最終的にケアへのアクセスを加速することができる。 Modern machine learning approaches have led to performant diagnostic models for a variety of health conditions. Several machine learning approaches, such as decision trees and deep neural networks, can, in principle, approximate any function. However, this power can be considered to be both a gift and a curse, as the propensity toward overfitting is magnified when the input data are heterogeneous and high dimensional and the output class is highly nonlinear. This issue can especially plague diagnostic systems that predict behavioral and psychiatric conditions that are diagnosed with subjective criteria. An emerging solution to this issue is crowdsourcing, where crowd workers are paid to annotate complex behavioral features in return for monetary compensation or a gamified experience. These labels can then be used to derive a diagnosis, either directly or by using the labels as inputs to a diagnostic machine learning model. This viewpoint describes existing work in this emerging field and discusses ongoing challenges and opportunities with crowd-powered diagnostic systems, a nascent field of study. 
With the correct considerations, the addition of crowdsourcing to human-in-the-loop machine learning workflows for the prediction of complex and nuanced health conditions can accelerate screening, diagnostics, and ultimately access to care.	翻訳日:2024-06-06 14:46:07 公開日:2024-06-03
# MAWSEO: 不正なオンラインプロモーションのための逆ウィキ検索 MAWSEO: Adversarial Wiki Search Poisoning for Illicit Online Promotion ( http://arxiv.org/abs/2304.11300v3 ) ライセンス: Link先を確認	Zilong Lin, Zhengyi Li, Xiaojing Liao, XiaoFeng Wang, Xiaozhong Liu,	(参考訳) Wiki検索中毒(Wiki search poisoning for illicit promotion)は、ウィキ記事の編集と、関連するクエリのWiki検索結果による不正なビジネスの促進を目的としたサイバー犯罪である。本稿では,Wiki上のステルスブラックハットSEOが自動化可能であることを示す研究を報告する。我々の技術はMAWSEOと呼ばれ、現実のサイバー犯罪の目的を達成するために、ランクアップ、破壊的検出回避、トピック関連性、セマンティック一貫性、プロモーションコンテンツのユーザ認識(警告はしない)など、敵対的な修正を用いています。評価とユーザスタディにより、MAWSEOは、最先端のWiki破壊検知器をバイパスし、アラームを発生させることなく、Wikiユーザーにプロモーションコンテンツを届けることのできる、敵の破壊的編集を効果的かつ効率的に生成できることが示されている。さらに, ウィキエコシステムにおける攻撃に対するコヒーレンスに基づく検出および破壊行為検出の敵意訓練を含む潜在的防御について検討した。 As a prominent instance of vandalism edits, Wiki search poisoning for illicit promotion is a cybercrime in which the adversary aims at editing Wiki articles to promote illicit businesses through Wiki search results of relevant queries. In this paper, we report a study that, for the first time, shows that such stealthy blackhat SEO on Wiki can be automated. Our technique, called MAWSEO, employs adversarial revisions to achieve real-world cybercriminal objectives, including rank boosting, vandalism detection evasion, topic relevancy, semantic consistency, user awareness (but not alarming) of promotional content, etc. Our evaluation and user study demonstrate that MAWSEO is capable of effectively and efficiently generating adversarial vandalism edits, which can bypass state-of-the-art built-in Wiki vandalism detectors, and also get promotional content through to Wiki users without triggering their alarms. In addition, we investigated potential defense, including coherence based detection and adversarial training of vandalism detection, against our attack in the Wiki ecosystem.	翻訳日:2024-06-06 14:36:23 公開日:2024-06-03
# SciMON:新規性に最適化された科学的インスピレーションマシン SciMON: Scientific Inspiration Machines Optimized for Novelty ( http://arxiv.org/abs/2305.14259v7 ) ライセンス: Link先を確認	Qingyun Wang, Doug Downey, Heng Ji, Tom Hope,	(参考訳) 文献に基づく新たな科学的方向を生成するニューラル言語モデルの能力を探索し,強化する。文献に基づく仮説生成の研究は伝統的に、仮説の表現性を大きく制限する二値リンク予測に焦点を当ててきた。この一連の研究は、新規性を最適化することにも焦点をあてていない。我々は、モデルが背景コンテキスト(例えば、問題、実験設定、目標)を入力とし、文献に根ざした自然言語のアイデアを出力する新しい設定を導入し、従来から大きく方向転換する。本稿では,過去の科学論文から「インスピレーション」を検索し,先行論文と反復的に比較し,十分な新規性が達成されるまでアイデア提案を更新することによって,新規性のために明示的に最適化するモデリングフレームワークであるSciMONについて述べる。包括的評価の結果,GPT-4は全体的に技術的な深さと新規性の低いアイデアを生成する傾向にあるが,我々の手法はこの問題を部分的に緩和することがわかった。我々の研究は、科学文献から新しいアイデアを生み出す言語モデルの評価と開発に向けた第一歩である。 We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature. Work on literature-based hypothesis generation has traditionally focused on binary link prediction, severely limiting the expressivity of hypotheses. This line of work also does not focus on optimizing novelty. We take a dramatic departure with a novel setting in which models use as input background contexts (e.g., problems, experimental settings, goals), and output natural language ideas grounded in literature. We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers, and explicitly optimizes for novelty by iteratively comparing to prior papers and updating idea suggestions until sufficient novelty is achieved. Comprehensive evaluations reveal that GPT-4 tends to generate ideas with overall low technical depth and novelty, while our methods partially mitigate this issue. Our work represents a first step toward evaluating and developing language models that generate new ideas derived from the scientific literature.	翻訳日:2024-06-06 14:36:23 公開日:2024-06-03
# ロバストなデータ駆動型規範性最適化 Robust Data-driven Prescriptiveness Optimization ( http://arxiv.org/abs/2306.05937v2 ) ライセンス: Link先を確認	Mehran Poursoltani, Erick Delage, Angelos Georghiou,	(参考訳) データの豊富さは、利用可能なサイド情報を活用してより予測的な決定を下そうとする、さまざまな最適化手法の出現につながっている。応用の幅広い方法や文脈は、規範性の係数として知られる普遍的な単位のないパフォーマンス尺度の設計を動機付けている。この係数は、参照情報と比較して文脈決定の質と、サイド情報の規範的パワーの両方を定量化するように設計された。データ駆動型コンテキストにおいて前者を最大化するポリシーを特定するために,古典的経験的リスク最小化の目的に代えて規範性の係数が代わる分布的ロバストな文脈最適化モデルを提案する。分布のあいまいさ集合が適切なネスト形式と多面体構造を持つ場合、一連の線形プログラムを解くことに依存する、このモデルを解くための分岐アルゴリズムを提案する。文脈的最短経路問題について検討し、アウト・オブ・サンプルデータセットが様々な分布シフトを受ける場合の代替手法に対する結果のロバスト性を評価する。 The abundance of data has led to the emergence of a variety of optimization techniques that attempt to leverage available side information to provide more anticipative decisions. The wide range of methods and contexts of application have motivated the design of a universal unitless measure of performance known as the coefficient of prescriptiveness. This coefficient was designed to quantify both the quality of contextual decisions compared to a reference one and the prescriptive power of side information. To identify policies that maximize the former in a data-driven context, this paper introduces a distributionally robust contextual optimization model where the coefficient of prescriptiveness substitutes for the classical empirical risk minimization objective. We present a bisection algorithm to solve this model, which relies on solving a series of linear programs when the distributional ambiguity set has an appropriate nested form and polyhedral structure. Studying a contextual shortest path problem, we evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.	翻訳日:2024-06-06 14:26:34 公開日:2024-06-03
# CompanyKG: 企業類似性定量化のための大規模不均一グラフ CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification ( http://arxiv.org/abs/2306.10649v3 ) ライセンス: Link先を確認	Lele Cao, Vilhelm von Ehrenheim, Mark Granroth-Wilding, Richard Anselmo Stahl, Andrew McCornack, Armin Catovic, Dhiana Deva Cavacanti Rocha,	(参考訳) 投資業界では、市場マッピング、競合分析、合併・買収など、さまざまな目的のために、きめ細かい会社の類似度定量化を実施することが不可欠であることが多い。我々は,企業の特徴や関係を多様に表現し,学習するための知識グラフである企業KGを提案し,公開する。具体的には、117万の企業が企業記述の埋め込みに富んだノードとして表現され、15の異なる企業間関係によって51.06百万のエッジが生成される。企業類似度定量化のための手法を総合的に評価するために, 類似度予測, 競合検索, 類似度ランキングという, 注釈付きテストセットを用いた3つの評価タスクを考案し, コンパイルした。本稿では,11個の再現可能な予測手法について,ノードのみ,エッジのみ,ノード+エッジの3つのグループに分類したベンチマーク結果を示す。私たちの知る限りでは、企業間類似性を定量化するのに適した、実世界の投資プラットフォームから派生した、最初の大規模な異種グラフデータセットである。 In the investment industry, it is often essential to carry out fine-grained company similarity quantification for a range of purposes, including market mapping, competitor analysis, and mergers and acquisitions. We propose and publish a knowledge graph, named CompanyKG, to represent and learn diverse company features and relations. Specifically, 1.17 million companies are represented as nodes enriched with company description embeddings; and 15 different inter-company relations result in 51.06 million weighted edges. To enable a comprehensive assessment of methods for company similarity quantification, we have devised and compiled three evaluation tasks with annotated test sets: similarity prediction, competitor retrieval and similarity ranking. We present extensive benchmarking results for 11 reproducible predictive methods categorized into three groups: node-only, edge-only, and node+edge. To the best of our knowledge, CompanyKG is the first large-scale heterogeneous graph dataset originating from a real-world investment platform, tailored for quantifying inter-company similarity.	翻訳日:2024-06-06 14:26:34 公開日:2024-06-03
# 分類における部分的バイアスの補正 Correcting Underrepresentation and Intersectional Bias for Classification ( http://arxiv.org/abs/2306.11112v4 ) ライセンス: Link先を確認	Emily Diana, Alexander Williams Tolbert,	(参考訳) 偏見バイアスによって劣化したデータから学習することの問題点を考察し, 正の例を, 一定数のセンシティブなグループに対して異なる未知のレートでフィルタする。交叉群のメンバーシップが各交叉率を計算不能にするような設定であっても,少数の偏りのないデータを用いてグループ単位のドロップアウト率を効率的に推定できることが示される。これらの推定値を用いて、偏りのあるサンプル上で経験的誤差のみを観測しても、真の分布上の仮説の損失を近似できる再重み付け方式を構築する。そこで本研究では,この学習過程と再加重過程を包括するアルゴリズムを提案する。最後に,表現不足と交叉バイアス設定に対するPAC学習可能性の概念を定義し,このアルゴリズムが有限VC次元のモデルクラスに対して効率的な学習を可能にすることを示す。 We consider the problem of learning from data corrupted by underrepresentation bias, where positive examples are filtered from the data at different, unknown rates for a fixed number of sensitive groups. We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates, even in settings where intersectional group membership makes learning each intersectional rate computationally infeasible. Using these estimates, we construct a reweighting scheme that allows us to approximate the loss of any hypothesis on the true distribution, even if we only observe the empirical error on a biased sample. From this, we present an algorithm encapsulating this learning and reweighting process along with a thorough empirical investigation. Finally, we define a bespoke notion of PAC learnability for the underrepresentation and intersectional bias setting and show that our algorithm permits efficient learning for model classes of finite VC dimension.	翻訳日:2024-06-06 14:26:34 公開日:2024-06-03
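The estimate-then-reweight idea in the abstract above can be sketched as follows. This is a simplified illustration, not the paper's algorithm. It assumes disjoint groups (the paper also handles intersectional membership), assumes that only positive examples are dropped while negatives are always kept, and estimates each group's retention rate from the ratio of positive-to-negative odds in the biased and unbiased samples. All function names are hypothetical.

```python
from collections import Counter

def estimate_retention(biased, unbiased):
    """Estimate, per group, the fraction of positive examples that survived
    filtering. Samples are (group, x, label) triples with label in {0, 1}.
    Assumption: negatives are never dropped, so for each group
    odds_biased = retention * odds_true."""
    def odds(samples):
        pos, neg = Counter(), Counter()
        for group, _, label in samples:
            (pos if label == 1 else neg)[group] += 1
        return {g: pos[g] / neg[g] for g in neg}
    odds_biased, odds_unbiased = odds(biased), odds(unbiased)
    return {g: odds_biased[g] / odds_unbiased[g]
            for g in odds_biased
            if odds_unbiased.get(g, 0) > 0}

def reweighted_error(predict, biased, retention):
    """Self-normalized importance-weighted 0/1 error on the biased sample.
    Each surviving positive in group g stands in for 1 / retention[g] positives."""
    weighted_mistakes = total_weight = 0.0
    for group, x, label in biased:
        weight = 1.0 / retention[group] if label == 1 else 1.0
        total_weight += weight
        if predict(x) != label:
            weighted_mistakes += weight
    return weighted_mistakes / total_weight

# Toy data: each group has 10 positives and 10 negatives in the true population.
population = [(g, 0, y) for g in ("a", "b") for y in (0, 1) for _ in range(10)]
# Group "a" keeps 5 of its 10 positives, group "b" keeps 2 of its 10.
biased = ([(g, 0, 0) for g in ("a", "b") for _ in range(10)]
          + [("a", 0, 1)] * 5 + [("b", 0, 1)] * 2)

retention = estimate_retention(biased, population)
always_negative = lambda x: 0
print(retention, reweighted_error(always_negative, biased, retention))
```

In this toy case the unweighted error of the always-negative classifier on the biased sample is 7/27, whereas the reweighted estimate matches its error on the true population.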
# 2層ReLUニューラルネットワークによる証明可能なマルチタスク表現学習 Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks ( http://arxiv.org/abs/2307.06887v4 ) ライセンス: Link先を確認	Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai,	(参考訳) ますます人気のある機械学習パラダイムは、多くのタスクでニューラルネットワーク(NN)をオフラインで事前トレーニングし、多くの場合ネットワークの最後の線形層のみを再学習することでダウンストリームタスクに適応させることである。このアプローチは様々な文脈において強力なダウンストリーム性能をもたらし、マルチタスク事前学習が効果的な特徴学習につながることを示す。最近のいくつかの理論的研究は、浅いNNが (i) 単一のタスクで訓練されるか、(ii) 線形である場合に有意義な特徴を学習することを示しているが、より実践に近い、非線形NNが複数のタスクで訓練される場合についてはほとんど知られていない。本研究では,複数タスクにおける非線形モデルを用いたトレーニング中に特徴学習が発生することを証明する最初の結果を示す。私たちのキーとなる洞察は、マルチタスク事前トレーニングは、通常タスク間で同じラベルを持つポイントを整列する表現を好む擬似コントラスト的損失を誘導するということです。この観察を用いて,タスクが,$d\gg r$次元の入力空間内の$r$次元部分空間へのデータの射影にラベルが依存する二値分類タスクである場合,2層ReLU NN上の単純な勾配ベースのマルチタスク学習アルゴリズムがこの射影を復元し,サンプルおよびニューロンの複雑さが$d$に依存しない形でダウンストリームタスクへの一般化が可能になることを示す。対照的に、単一タスクの抽出に関して高い確率で、その単一タスクでの訓練ではすべての$r$個の真の特徴を学習できる保証がないことを示す。 An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong downstream performance in a variety of contexts, demonstrating that multitask pretraining leads to effective feature learning. Although several recent theoretical studies have shown that shallow NNs learn meaningful features when either (i) they are trained on a single task or (ii) they are linear, very little is known about the closer-to-practice case of nonlinear NNs trained on multiple tasks. In this work, we present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks. Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks. 
Using this observation, we show that when the tasks are binary classification tasks with labels depending on the projection of the data onto an $r$-dimensional subspace within the $d\gg r$-dimensional input space, a simple gradient-based multitask learning algorithm on a two-layer ReLU NN recovers this projection, allowing for generalization to downstream tasks with sample and neuron complexity independent of $d$. In contrast, we show that with high probability over the draw of a single task, training on this single task is not guaranteed to learn all $r$ ground-truth features.	翻訳日:2024-06-06 14:26:34 公開日:2024-06-03
# シフト雑音をもつ分布ロバスト変動量子アルゴリズム Distributionally Robust Variational Quantum Algorithms with Shifted Noise ( http://arxiv.org/abs/2308.14935v2 ) ライセンス: Link先を確認	Zichang He, Bo Peng, Yuri Alexeev, Zheng Zhang,	(参考訳) 短期的な量子優位性を示す可能性を考えると、変分量子アルゴリズム(VQA)は広く研究されている。 VQAパラメータ最適化のための多くの技術が開発されているが、依然として大きな課題である。現実的な問題は、量子ノイズは非常に不安定であり、したがってリアルタイムに変化する可能性が高いことである。これは、最適化されたVQAアンザッツが異なるノイズ環境下では効果的に動作しないため、重要な問題となる。本稿では,VQAパラメータを未知のシフトノイズに対して頑健に最適化する方法を初めて検討する。ノイズレベルを未知の確率密度関数を持つ確率変数(PDF)としてモデル化し、不確実性セット内でPDFがシフトする可能性があると仮定する。この仮定は、シフトノイズの下で有効性を維持するパラメータを見つけることを目的として、分布的に堅牢な最適化問題を定式化することを促す。我々は,分布的に頑健なベイズ最適化問題を定式化するために利用する。このことは、量子近似最適化アルゴリズム(QAOA)とハードウェア効率のアンサッツを持つ変分量子固有解器(VQE)の両方で数値的な証拠を提供し、シフトノイズ下でより堅牢に実行されるパラメータを特定できることを示唆している。本研究は,パラメータ最適化の観点からのシフトノイズの影響を受け,VQAの信頼性向上に向けた第一歩とみなす。 Given their potential to demonstrate near-term quantum advantage, variational quantum algorithms (VQAs) have been extensively studied. Although numerous techniques have been developed for VQA parameter optimization, it remains a significant challenge. A practical issue is that quantum noise is highly unstable and thus it is likely to shift in real time. This presents a critical problem as an optimized VQA ansatz may not perform effectively under a different noise environment. For the first time, we explore how to optimize VQA parameters to be robust against unknown shifted noise. We model the noise level as a random variable with an unknown probability density function (PDF), and we assume that the PDF may shift within an uncertainty set. This assumption guides us to formulate a distributionally robust optimization problem, with the goal of finding parameters that maintain effectiveness under shifted noise. We utilize a distributionally robust Bayesian optimization solver for our proposed formulation. 
This provides numerical evidence in both the Quantum Approximate Optimization Algorithm (QAOA) and the Variational Quantum Eigensolver (VQE) with hardware-efficient ansatz, indicating that we can identify parameters that perform more robustly under shifted noise. We regard this work as the first step towards improving the reliability of VQAs influenced by shifted noise from the parameter optimization perspective.	翻訳日:2024-06-06 14:16:48 公開日:2024-06-03
# 因果的基礎モデルに向けて:因果的推論と注意の二重性について Towards Causal Foundation Model: on Duality between Causal Inference and Attention ( http://arxiv.org/abs/2310.00809v3 ) ライセンス: Link先を確認	Jiaqi Zhang, Joel Jennings, Agrin Hilmkil, Nick Pawlowski, Cheng Zhang, Chao Ma,	(参考訳) ファンデーションモデルは、機械学習の風景に変化をもたらし、多様なタスクにまたがる人間レベルのインテリジェンスの火花を誇示している。しかし、因果推論のような複雑なタスクにおいてギャップは持続し、主に複雑な推論ステップと高い数値的精度の要求に関連する課題が原因である。本研究では,治療効果推定のための因果認識基盤モデルの構築に向けて第一歩を踏み出す。提案手法は,複数のラベルのないデータセットを用いて自己教師付き因果学習を行い,その結果,未知のタスクに対するゼロショット因果推論を新しいデータで実現する,Causal Inference with Attention (CInA) と呼ばれる,理論的に正当化された手法を提案する。これは、最適共変量バランスと自己アテンションの原始的双対関係を実証し、訓練されたトランスフォーマー型アーキテクチャの最終層を通したゼロショット因果推論を容易にする理論結果に基づいている。我々は、CInAが、従来のデータセットごとの手法にマッチしたり、超えたりしながら、分散データセットや様々な実世界のデータセットに効果的に一般化できることを実証的に実証した。これらの結果は,本手法が因果基盤モデルの発展の足掛かりとなる可能性を示唆する証拠となる。 Foundation models have brought changes to the landscape of machine learning, demonstrating sparks of human-level intelligence across a diverse array of tasks. However, a gap persists in complex tasks such as causal inference, primarily due to challenges associated with intricate reasoning steps and high numerical precision requirements. In this work, we take a first step towards building causally-aware foundation models for treatment effect estimations. We propose a novel, theoretically justified method called Causal Inference with Attention (CInA), which utilizes multiple unlabeled datasets to perform self-supervised causal learning, and subsequently enables zero-shot causal inference on unseen tasks with new data. This is based on our theoretical results that demonstrate the primal-dual connection between optimal covariate balancing and self-attention, facilitating zero-shot causal inference through the final layer of a trained transformer-type architecture. We demonstrate empirically that CInA effectively generalizes to out-of-distribution datasets and various real-world datasets, matching or even surpassing traditional per-dataset methodologies. 
These results provide compelling evidence that our method has the potential to serve as a stepping stone for the development of causal foundation models.	翻訳日:2024-06-06 14:16:48 公開日:2024-06-03
# 不確かさを定量的に予測するオンラインアルゴリズム Online Algorithms with Uncertainty-Quantified Predictions ( http://arxiv.org/abs/2310.11558v2 ) ライセンス: Link先を確認	Bo Sun, Jerry Huang, Nicolas Christianson, Mohammad Hajiesmaili, Adam Wierman, Raouf Boutaba,	(参考訳) 予測を伴うアルゴリズムの急成長する分野は、オンラインアルゴリズムのパフォーマンスを改善するために、潜在的に不完全な機械学習予測を使用することの問題を研究する。このフレームワークの既存のアルゴリズムのほとんどすべてが予測品質を前提としていないが、機械学習モデルに不確実な定量化(UQ)を提供する方法が近年開発され、意思決定時の予測品質に関する追加情報を可能にしている。本研究では,オンラインアルゴリズムの設計における不確実性定量化予測を最適に活用する問題について検討する。特に,スキーレンタルとオンライン検索という2つの古典的なオンライン問題について検討し,意思決定者がUQを付加した予測を行い,基底真理が特定の範囲の値に収まる可能性について述べる。我々は、UQ予測を完全に活用するために、アルゴリズム設計への非自明な修正が必要であることを実証する。さらに、より一般的なUQの活用方法を考察し、マルチインスタンス環境での意思決定にUQを活用することを学ぶオンライン学習フレームワークを提案する。 The burgeoning field of algorithms with predictions studies the problem of using possibly imperfect machine learning predictions to improve online algorithm performance. While nearly all existing algorithms in this framework make no assumptions on prediction quality, a number of methods providing uncertainty quantification (UQ) on machine learning models have been developed in recent years, which could enable additional information about prediction quality at decision time. In this work, we investigate the problem of optimally utilizing uncertainty-quantified predictions in the design of online algorithms. In particular, we study two classic online problems, ski rental and online search, where the decision-maker is provided predictions augmented with UQ describing the likelihood of the ground truth falling within a particular range of values. We demonstrate that non-trivial modifications to algorithm design are needed to fully leverage the UQ predictions. Moreover, we consider how to utilize more general forms of UQ, proposing an online learning framework that learns to exploit UQ to make decisions in multi-instance settings.	翻訳日:2024-06-06 14:07:02 公開日:2024-06-03
# ParisLuco3D:LiDAR知覚の領域一般化のための高品質なターゲットデータセット ParisLuco3D: A high-quality target dataset for domain generalization of LiDAR perception ( http://arxiv.org/abs/2310.16542v3 ) ライセンス: Link先を確認	Jules Sanchez, Louis Soum-Fontez, Jean-Emmanuel Deschaud, Francois Goulette,	(参考訳) LiDARは、シーンに関する正確な幾何学的情報を収集することによって、自律運転に不可欠なセンサーである。様々なLiDAR認識タスクの性能が向上するにつれて、これらの最適化されたモデルを実環境下でテストするために、新しい環境やセンサーへの一般化が課題として浮上している。本稿では,様々なソースデータセットを用いた性能評価を容易にするために,クロスドメイン評価専用に設計された新しいデータセットParisLuco3Dを提案する。データセットに加えて、LiDARセマンティックセグメンテーション、LiDARオブジェクト検出、LiDARトラッキングのためのオンラインベンチマークも提供され、メソッド間の公正な比較が保証される。 ParisLuco3Dデータセット、評価スクリプト、ベンチマークへのリンクは以下のウェブサイトで見ることができる。 LiDAR is an essential sensor for autonomous driving because it collects precise geometric information about a scene. As the performance of various LiDAR perception tasks has improved, generalization to new environments and sensors has emerged as a way to test these optimized models in real-world conditions. This paper provides a novel dataset, ParisLuco3D, specifically designed for cross-domain evaluation to make it easier to evaluate the performance utilizing various source datasets. Alongside the dataset, online benchmarks for LiDAR semantic segmentation, LiDAR object detection, and LiDAR tracking are provided to ensure a fair comparison across methods. The ParisLuco3D dataset, evaluation scripts, and links to benchmarks can be found at the following website: https://npm3d.fr/parisluco3d	翻訳日:2024-06-06 14:07:02 公開日:2024-06-03
# 言語モデルからの制御された復号化 Controlled Decoding from Language Models ( http://arxiv.org/abs/2310.17022v3 ) ライセンス: Link先を確認	Sidharth Mudgal, Jong Lee, Harish Ganapathy, YaGuang Li, Tao Wang, Yanping Huang, Zhifeng Chen, Heng-Tze Cheng, Michael Collins, Trevor Strohman, Jilin Chen, Alex Beutel, Ahmad Beirami,	(参考訳) KL正規化強化学習(KL-regularized reinforcement learning、RL)は、言語モデルの応答を高い報酬の結果へと制御するための一般的なアライメントフレームワークである。トークン単位のRL目的を定式化し、これを解くための制御復号(CD)と呼ばれるモジュラーソルバを提案する。 CDは個別のプレフィックススコアリングモジュールを通じて制御を行い、このモジュールは報酬の値関数を学習するように訓練される。プレフィックススコアラは、推論時に凍結ベースモデルからの生成を制御するために使用され、RL目的の解から証明可能な形でサンプリングする。我々は,CDが人気のあるベンチマークの制御機構として有効であることを実証的に示した。また,複数報酬に対するプレフィックススコアラを推論時に組み合わせることで,追加のトレーニングを伴わずに多目的RL問題を効果的に解決できることを示す。さらに,CDを適用する利点は,追加のチューニングなしで未知のベースモデルにも転移することを示す。最後に,CDを推論時にブロックワイズで復号化することで,一般的なbest-of-K戦略と強化学習によるトークンワイズ制御のギャップを埋めることができることを示す。これにより、CDは言語モデルのアライメントに有望なアプローチとなる。 KL-regularized reinforcement learning (RL) is a popular alignment framework to control the language model responses towards high reward outcomes. We pose a tokenwise RL objective and propose a modular solver for it, called controlled decoding (CD). CD exerts control through a separate prefix scorer module, which is trained to learn a value function for the reward. The prefix scorer is used at inference time to control the generation from a frozen base model, provably sampling from a solution to the RL objective. We empirically demonstrate that CD is effective as a control mechanism on popular benchmarks. We also show that prefix scorers for multiple rewards may be combined at inference time, effectively solving a multi-objective RL problem with no additional training. We also show that the benefits of applying CD transfer to an unseen base model with no further tuning. Finally, we show that CD can be applied in a blockwise decoding fashion at inference time, essentially bridging the gap between the popular best-of-K strategy and tokenwise control through reinforcement learning. 
This makes CD a promising approach for alignment of language models.	翻訳日:2024-06-06 14:07:02 公開日:2024-06-03
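The two decoding modes mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' implementation. It assumes the prefix scorer returns one value estimate per candidate next token, that tokenwise control tilts the frozen base distribution by the exponentiated value, and that blockwise control picks the highest-scoring of K sampled blocks. The function names and the `strength` parameter are hypothetical.

```python
import math

def tokenwise_control(base_logprobs, prefix_values, strength=1.0):
    """Tilt the frozen base model's next-token distribution by the prefix
    scorer's value estimates: p'(token) is proportional to
    p(token) * exp(strength * value(token))."""
    scores = {tok: lp + strength * prefix_values[tok]
              for tok, lp in base_logprobs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    normalizer = sum(math.exp(s - top) for s in scores.values())
    return {tok: math.exp(s - top) / normalizer for tok, s in scores.items()}

def blockwise_control(candidate_blocks, score_block):
    """Best-of-K over blocks: keep the candidate continuation that the
    prefix scorer rates highest."""
    return max(candidate_blocks, key=score_block)

# Two equally likely tokens; the scorer prefers "b".
base = {"a": math.log(0.5), "b": math.log(0.5)}
values = {"a": 0.0, "b": math.log(3.0)}
print(tokenwise_control(base, values))
print(blockwise_control(["ok", "better", "best!"], score_block=len))
```

Setting `strength=0.0` recovers the base distribution, which is one way to see the trade-off between reward and staying close to the base model.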
# 無線通信におけるデータ再構成強化のための条件付き拡散確率モデル Conditional Denoising Diffusion Probabilistic Models for Data Reconstruction Enhancement in Wireless Communications ( http://arxiv.org/abs/2310.19460v2 ) ライセンス: Link先を確認	Mehdi Letafati, Samad Ali, Matti Latva-aho,	(参考訳) 本稿では,無線チャネル上でのデータ伝送と再構成を強化するために,条件付き拡散確率モデル(DDPM)を提案する。 DDPMの基盤となるメカニズムは、いわゆる“デノイング”ステップでデータ生成プロセスを分解することだ。これに触発された鍵となる考え方は、情報信号の「ノイズからクリーン」変換を学ぶ際に、拡散モデルの生成的先行を活用して、データ再構成を強化することである。提案手法は,マルチメディア通信において,情報コンテンツに関する事前知識が利用できる通信シナリオに有用である。したがって、情報レートを下げる複雑なチャネル符号を使う代わりに、信頼性の高いデータ再構成、特に信号対雑音比(SNR)の低い信号対雑音比(SNR)やハードウェア障害通信による極端なチャネル条件下で拡散先を利用することができる。提案したDDPM支援受信機は、MNISTデータセットを用いた無線画像伝送のシナリオに合わせて調整される。数値計算の結果は,従来のデジタル通信やディープニューラルネットワーク(DNN)ベースのベンチマークと比較して,提案手法の再構築性能を強調した。また, 誤り訂正のための情報レートを低下させることなく, 低いSNR体制下で10dB以上の改善が達成できることが示唆された。 In this paper, conditional denoising diffusion probabilistic models (DDPMs) are proposed to enhance the data transmission and reconstruction over wireless channels. The underlying mechanism of DDPM is to decompose the data generation process over the so-called "denoising" steps. Inspired by this, the key idea is to leverage the generative prior of diffusion models in learning a "noisy-to-clean" transformation of the information signal to help enhance data reconstruction. The proposed scheme could be beneficial for communication scenarios in which a prior knowledge of the information content is available, e.g., in multimedia transmission. Hence, instead of employing complicated channel codes that reduce the information rate, one can exploit diffusion priors for reliable data reconstruction, especially under extreme channel conditions due to low signal-to-noise ratio (SNR), or hardware-impaired communications. The proposed DDPM-assisted receiver is tailored for the scenario of wireless image transmission using MNIST dataset. 
Our numerical results highlight the reconstruction performance of our scheme compared to conventional digital communication, as well as to the deep neural network (DNN)-based benchmark. It is also shown that an improvement of more than 10 dB in reconstruction can be achieved in low-SNR regimes, without the need to reduce the information rate for error correction.	翻訳日:2024-06-06 14:07:02 公開日:2024-06-03
# VQPy: 現代的なビデオ分析のためのオブジェクト指向アプローチ VQPy: An Object-Oriented Approach to Modern Video Analytics ( http://arxiv.org/abs/2311.01623v4 ) ライセンス: Link先を確認	Shan Yu, Zhenting Zhu, Yu Chen, Hanchen Xu, Pengzhan Zhao, Yang Wang, Arthi Padmanabhan, Hugo Latapie, Harry Xu,	(参考訳) ビデオ分析は現代のシステムやサービスで広く使われている。ビデオ分析の最前線は、ユーザが特定の関心のあるオブジェクトを見つけるために開発するビデオクエリである。ビデオ分析の中心である映像オブジェクト(人間、動物、車など)は、従来のオブジェクト指向言語でモデル化されたオブジェクトと考え方において類似しているという知見に基づいて、ビデオ分析のためのオブジェクト指向アプローチを開発することを提案する。 VQPyという名前のこのアプローチは、フロントエンド(ビデオオブジェクトとそのインタラクションを簡単に表現できるコンストラクトを備えたPythonの変種)と、ビデオオブジェクトに基づいてパイプラインを自動構築および最適化する拡張可能なバックエンドで構成されている。私たちはVQPyを実装してオープンソース化しており、VQPyはDeepVisionフレームワークの一部としてCiscoで製品化されている。 Video analytics is widely used in contemporary systems and services. At the forefront of video analytics are video queries that users develop to find objects of particular interest. Building upon the insight that video objects (e.g., humans, animals, cars), the center of video analytics, are similar in spirit to objects modeled by traditional object-oriented languages, we propose to develop an object-oriented approach to video analytics. This approach, named VQPy, consists of a frontend (a Python variant with constructs that make it easy for users to express video objects and their interactions) as well as an extensible backend that can automatically construct and optimize pipelines based on video objects. We have implemented and open-sourced VQPy, which has been productized in Cisco as part of its DeepVision framework.	翻訳日:2024-06-06 14:07:02 公開日:2024-06-03
# GENEVA:LLMを用いた分岐物語の生成と可視化 GENEVA: GENErating and Visualizing branching narratives using LLMs ( http://arxiv.org/abs/2311.09213v2 ) ライセンス: Link先を確認	Jorge Leandro, Sudha Rao, Michael Xu, Weijia Xu, Nebosja Jojic, Chris Brockett, Bill Dolan,	(参考訳) 対話型ロールプレイングゲーム(RPG)は強力なストーリーテリングを必要とする。これらの物語は執筆に何年もかかることがあり、通常は大規模なクリエイティブチームが関わる。本研究では,このプロセスを支援するため,大規模生成テキストモデルの可能性を示す。プロトタイプツールである GENEVA は、デザイナによって提供される高レベルな物語記述と制約にマッチするストーリーラインの分岐と再収束を伴うリッチな物語グラフを生成する。大規模言語モデル(LLM)であるGPT-4は、分岐した物語を生成し、2段階のプロセスでグラフ形式でレンダリングするために使用される。本稿では,異なる文脈制約下での4つの有名な物語に対する新たな分岐物語の生成におけるGENEVAの利用について述べる。このツールはゲーム開発、シミュレーション、その他のゲームライクな特性を持つアプリケーションを支援する可能性がある。 Dialogue-based Role Playing Games (RPGs) require powerful storytelling. The narratives of these may take years to write and typically involve a large creative team. In this work, we demonstrate the potential of large generative text models to assist this process. GENEVA, a prototype tool, generates a rich narrative graph with branching and reconverging storylines that match a high-level narrative description and constraints provided by the designer. A large language model (LLM), GPT-4, is used to generate the branching narrative and to render it in a graph format in a two-step process. We illustrate the use of GENEVA in generating new branching narratives for four well-known stories under different contextual constraints. This tool has the potential to assist in game development, simulations, and other applications with game-like properties.	翻訳日:2024-06-06 13:57:08 公開日:2024-06-03
# 材料生成のためのスケーラブル拡散 Scalable Diffusion for Materials Generation ( http://arxiv.org/abs/2311.09235v2 ) ライセンス: Link先を確認	Sherry Yang, KwangHwan Cho, Amil Merchant, Pieter Abbeel, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk,	(参考訳) インターネット規模のデータに基づいてトレーニングされた生成モデルは、新規で現実的なテキスト、画像、ビデオを生成することができる。次の自然な疑問は、新しい安定物質を生成するなど、これらのモデルが科学を前進させることができるかどうかである。伝統的に、明示的な構造を持つモデル(例えばグラフ)は、科学データ(例えば結晶中の原子や結合)の構造関係をモデル化するのに使われてきたが、大規模で複雑なシステムにスケールすることは困難である。材料生成におけるもうひとつの課題は、標準生成モデリングメトリクスと下流アプリケーションとのミスマッチである。例えば、復元誤差のような一般的な指標は、安定した材料を発見するという下流の目標とよく相関しない。本研究では,任意の結晶構造を表現可能な統一結晶表現(UniMat)を開発し,これらのUniMat表現上で拡散確率モデルを訓練することによって,拡張性に挑戦する。実験の結果,UniMatは明示的な構造モデリングの欠如にもかかわらず,より大規模で複雑な化学系から高忠実度結晶構造を生成できることが示唆された。新規な安定材料発見などの下流アプリケーションへの材料生成の質向上を図るため,密度関数理論(DFT)の分解エネルギーを通した凸殻に対するコンベックス生成エネルギーと安定性を含む材料生成モデルの評価指標を提案する。最後に、UniMatを用いた条件付き生成は、数百万の結晶構造を持つ既存の結晶データセットにスケール可能であることを示し、新しい安定物質を発見する上で、ランダムな構造探索(構造発見の現在の先導方法)よりも優れていることを示す。 Generative models trained on internet-scale data are capable of generating novel and realistic texts, images, and videos. A natural next question is whether these models can advance science, for example by generating novel stable materials. Traditionally, models with explicit structures (e.g., graphs) have been used in modeling structural relationships in scientific data (e.g., atoms and bonds in crystals), but generating structures can be difficult to scale to large and complex systems. Another challenge in generating materials is the mismatch between standard generative modeling metrics and downstream applications. For instance, common metrics such as the reconstruction error do not correlate well with the downstream goal of discovering stable materials. In this work, we tackle the scalability challenge by developing a unified crystal representation that can represent any crystal structure (UniMat), followed by training a diffusion probabilistic model on these UniMat representations. 
Our empirical results suggest that despite the lack of explicit structure modeling, UniMat can generate high fidelity crystal structures from larger and more complex chemical systems, outperforming previous graph-based approaches under various generative modeling metrics. To better connect the generation quality of materials to downstream applications, such as discovering novel stable materials, we propose additional metrics for evaluating generative models of materials, including per-composition formation energy and stability with respect to convex hulls through decomposition energy from Density Function Theory (DFT). Lastly, we show that conditional generation with UniMat can scale to previously established crystal datasets with up to millions of crystals structures, outperforming random structure search (the current leading method for structure discovery) in discovering new stable materials.	翻訳日:2024-06-06 13:57:08 公開日:2024-06-03
# 量子インセプションスコア Quantum Inception Score ( http://arxiv.org/abs/2311.12163v3 ) ライセンス: Link先を確認	Akira Sone, Akira Tanji, Naoki Yamamoto,	(参考訳) 機械学習における古典的生成モデルの成功に触発されて、量子バージョンの熱心な探索が最近始まった。この旅に出発するためには、量子生成モデルの質を評価するための関連する指標を開発することが重要であり、古典的な場合の一例が(古典的)インセプションスコア(cIS)である。本稿では,cISの自然な拡張として,量子生成器の量子インセプションスコア(qIS)を提案する。重要な点として、qISは、与えられたデータセットを分類する量子チャネルのホレボ情報に品質を関連付ける。この文脈では、qISのいくつかの特性を示す。第一に、qISは、システム出力の射影測定によって定義される対応するcIS以上である。第2に、qISとcISの違いは、非対称性の資源理論によって特徴づけられるように、量子コヒーレンスの存在から生じる。第3に、絡み合った生成器のセットを用意した場合には、qISのさらなる向上につながる分類プロセスが存在する。第4に、量子ゆらぎ定理を利用して、qISの物理的限界を特徴づける。最後に、量子多体物理学における位相分類問題に対して、量子畳み込みニューラルネットワークを量子分類器として、量子生成モデルとしての1次元スピンチェーンモデルの品質を評価するためにqISを適用した。 Motivated by the great success of classical generative models in machine learning, enthusiastic exploration of their quantum version has recently started. To depart on this journey, it is important to develop a relevant metric to evaluate the quality of quantum generative models; in the classical case, one such example is the (classical) inception score (cIS). In this paper, as a natural extension of cIS, we propose the quantum inception score (qIS) for quantum generators. Importantly, qIS relates the quality to the Holevo information of the quantum channel that classifies a given dataset. In this context, we show several properties of qIS. First, qIS is greater than or equal to the corresponding cIS, which is defined through projection measurements on the system output. Second, the difference between qIS and cIS arises from the presence of quantum coherence, as characterized by the resource theory of asymmetry. Third, when a set of entangled generators is prepared, there exists a classifying process leading to the further enhancement of qIS. Fourth, we harness the quantum fluctuation theorem to characterize the physical limitation of qIS. 
Finally, we apply qIS to assess the quality of the one-dimensional spin chain model as a quantum generative model, with the quantum convolutional neural network as a quantum classifier, for the phase classification problem in the quantum many-body physics.	翻訳日:2024-06-06 13:57:08 公開日:2024-06-03
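The classical inception score (cIS) that the qIS entry above extends is commonly written as exp(E_x[KL(p(y|x) || p(y))]). The block below is a minimal Python sketch of that classical quantity only, assuming the classifier outputs are already available as lists of class probabilities. The function name `inception_score` is illustrative and does not come from the paper, and the quantum version based on the Holevo information is not reproduced here.

```python
import math


def inception_score(cond_probs):
    """Classical inception score from per-sample class distributions p(y|x).

    cond_probs: list of probability lists, one per generated sample.
    Returns exp of the mean KL divergence between p(y|x) and the marginal p(y).
    """
    n = len(cond_probs)
    k = len(cond_probs[0])
    # Marginal class distribution p(y), averaged over all samples.
    marginal = [sum(p[j] for p in cond_probs) / n for j in range(k)]
    mean_kl = 0.0
    for p in cond_probs:
        for j in range(k):
            # Terms with p(y|x) = 0 contribute nothing (0 * log 0 := 0).
            if p[j] > 0.0:
                mean_kl += p[j] * math.log(p[j] / marginal[j])
    mean_kl /= n
    return math.exp(mean_kl)
```

Under this definition the score is 1 when every sample yields the same class distribution, and it reaches the number of classes when predictions are fully confident and evenly spread over the classes.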
# OASIS:フェデレートラーニングにおけるアクティブリコンストラクションアタックのオフセット OASIS: Offsetting Active Reconstruction Attacks in Federated Learning ( http://arxiv.org/abs/2311.13739v2 ) ライセンス: Link先を確認	Tre' R. Jeter, Truc Nguyen, Raed Alharbi, My T. Thai,	(参考訳) フェデレートラーニング(FL)は、モデルのトレーニング効率を高めながら、ユーザのプライバシを保護する可能性について、大きな注目を集めている。そのため、FLは医療から工業工学まで、特に機密情報やプライバシー法によってデータが簡単に交換できない分野において、さまざまな領域で利用されてきた。しかし、最近の研究では、不適切なサーバによって実行されるアクティブリコンストラクションアタックによって、FLプロトコルが容易に損なわれることが示されている。これらの攻撃には、グローバルモデルパラメータの悪意ある修正が含まれており、サーバは、勾配更新を反転させることで、ユーザのプライベートデータの冗長コピーを取得することができる。このタイプの攻撃に対処することは、強力な脅威モデルのために重要な課題である。本稿では, モデル性能を維持しつつ, アクティブリコンストラクション攻撃を効果的に防止する, 画像強化に基づく防御機構, OASISを提案する。まず,これらの攻撃を可能にする勾配反転の原理を明らかにし,攻撃戦略によらず防御が堅牢である主条件を理論的に同定する。次に,攻撃原理を損なう可能性があることを示す画像拡張による防御を構築した。総合的な評価は、そのソリューションとしての可能性を強調する防衛機構の有効性を示すものである。 Federated Learning (FL) has garnered significant attention for its potential to protect user privacy while enhancing model training efficiency. For that reason, FL has found its use in various domains, from healthcare to industrial engineering, especially where data cannot be easily exchanged due to sensitive information or privacy laws. However, recent research has demonstrated that FL protocols can be easily compromised by active reconstruction attacks executed by dishonest servers. These attacks involve the malicious modification of global model parameters, allowing the server to obtain a verbatim copy of users' private data by inverting their gradient updates. Tackling this class of attack remains a crucial challenge due to the strong threat model. In this paper, we propose a defense mechanism, namely OASIS, based on image augmentation that effectively counteracts active reconstruction attacks while preserving model performance. We first uncover the core principle of gradient inversion that enables these attacks and theoretically identify the main conditions by which the defense can be robust regardless of the attack strategies. 
We then construct our defense with image augmentation showing that it can undermine the attack principle. Comprehensive evaluations demonstrate the efficacy of the defense mechanism highlighting its feasibility as a solution.	翻訳日:2024-06-06 13:57:08 公開日:2024-06-03
# 量子コンピューティングアプローチによる高スピンモデルの2次元コヒーレントスペクトル Two-dimensional coherent spectrum of high-spin models via a quantum computing approach ( http://arxiv.org/abs/2311.14035v4 ) ライセンス: Link先を確認	Martin Mootz, Peter P. Orth, Chuankun Huang, Liang Luo, Jigang Wang, Yong-Xin Yao,	(参考訳) 本稿では,高スピンモデルの2次元コヒーレントスペクトル(2DCS)を計算するための量子コンピューティング手法を提案する。本手法は,数個の磁場パルスの存在下でのリアルタイムダイナミクスのシミュレーションに基づく。適応型変動量子力学シミュレーション(AVQDS)アルゴリズムを,その小型回路による研究に利用し,周波数空間の必要な分解能を達成するために,十分に長時間のシミュレーションを可能にする。具体的には、Dzyaloshinskii-Moriya相互作用と単一イオン異方性を含む反強磁性量子スピンモデルを考える。得られた2DCSスペクトルは、未摂動ハミルトニアンの異なる固有状態間の遷移から生じるマグノン周波数の倍数の異なるピークを示す。 1次元コヒーレントスペクトルを2DCSと比較することにより、2DCSがエネルギースペクトルの高分解能を提供することを示す。さらに、高スピン演算子の2つの異なるバイナリエンコーディング(標準バイナリエンコーディングとグレイ符号)を用いて、スピンの大きさで量子資源がスケールする方法について検討する。低磁場では、両方の符号化は同等の量子資源を必要とするが、より大きな磁場ではグレイ符号が有利である。サイト数が増加するスピンモデルの数値シミュレーションは、量子資源の多項式系サイズのスケーリングを示している。最後に,2DCSの数値計算結果と希土類オルソフェリット系の実験結果を比較した。量子ハイスピンモデルの2DCSにおける高調波発生信号の観測強度は実験データとよく一致し, 対応する平均場よりも顕著に向上した。 We present and benchmark a quantum computing approach to calculate the two-dimensional coherent spectrum (2DCS) of high-spin models. Our approach is based on simulating their real-time dynamics in the presence of several magnetic field pulses, which are spaced in time. We utilize the adaptive variational quantum dynamics simulation (AVQDS) algorithm for the study due to its compact circuits, which enables simulations over sufficiently long times to achieve the required resolution in frequency space. Specifically, we consider an antiferromagnetic quantum spin model that incorporates Dzyaloshinskii-Moriya interactions and single-ion anisotropy. The obtained 2DCS spectra exhibit distinct peaks at multiples of the magnon frequency, arising from transitions between different eigenstates of the unperturbed Hamiltonian. By comparing the one-dimensional coherent spectrum with 2DCS, we demonstrate that 2DCS provides a higher resolution of the energy spectrum. 
We further investigate how the quantum resources scale with the magnitude of the spin using two different binary encodings of the high-spin operators: the standard binary encoding and the Gray code. At low magnetic fields both encodings require comparable quantum resources, but at larger field strengths the Gray code is advantageous. Numerical simulations for spin models with increasing number of sites indicate a polynomial system-size scaling for quantum resources. Lastly, we compare the numerical 2DCS with experimental results on a rare-earth orthoferrite system. The observed strength of the magnonic high-harmonic generation signals in the 2DCS of the quantum high-spin model aligns well with the experimental data, showing significant improvement over the corresponding mean-field results.	翻訳日:2024-06-06 13:57:08 公開日:2024-06-03
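The two encodings compared in the entry above can be illustrated with a small Python sketch. It assumes, purely for illustration, that the 2S+1 levels of a spin-S site are indexed 0..2S and stored in the smallest qubit register that fits them. The helper names are hypothetical, and the abstract does not show how the paper maps spin operators onto these bit strings. The property that distinguishes the Gray code is that neighbouring level indices differ in exactly one bit.

```python
def num_qubits(two_s):
    """Qubits needed for the 2S+1 levels of a spin-S site (two_s = 2S)."""
    levels = two_s + 1
    return max(1, (levels - 1).bit_length())


def binary_code(level, n_bits):
    """Standard binary encoding of a level index as a bit string."""
    return format(level, f"0{n_bits}b")


def gray_code(level, n_bits):
    """Reflected Gray code of a level index as a bit string."""
    return format(level ^ (level >> 1), f"0{n_bits}b")


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    two_s = 3  # spin-3/2 example: four levels, two qubits
    n = num_qubits(two_s)
    for level in range(two_s + 1):
        print(level, binary_code(level, n), gray_code(level, n))
```

In the standard binary encoding, stepping from level 1 to level 2 flips two bits (01 to 10), while the Gray code flips one (01 to 11). That single-bit-flip property is the usual motivation for trying a Gray code when operators connect adjacent levels.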
# 分類器の混合による精度・ロバスト性トレードオフの軽減 Mixing Classifiers to Alleviate the Accuracy-Robustness Trade-Off ( http://arxiv.org/abs/2311.15165v2 ) ライセンス: Link先を確認	Yatong Bai, Brendon G. Anderson, Somayeh Sojoudi,	(参考訳) 深層ニューラル分類器は、最近、データ駆動制御システムで大きな成功を収めている。しかし、既存のモデルは精度と敵対的ロバスト性の間のトレードオフに悩まされている。この制限は、高い性能と厳格なロバスト性保証の両方を必要とする安全クリティカルなシステムの制御において克服されなければならない。本研究では、ロバストモデルから高いロバスト性を、標準モデルから高い精度を同時に継承する分類器を開発する。具体的には、標準ニューラルネットワークとロバストニューラルネットワークの出力確率を混合する、理論的に動機付けられた定式化を提案する。どちらの基本分類器も事前訓練されているので、我々の方法は追加の訓練を必要としない。数値実験により,混合分類器は精度・ロバスト性トレードオフを顕著に改善することを確認し,ロバスト基底分類器の信頼度特性が,より良好なトレードオフの鍵であることを特定した。我々の理論的結果は、弱い仮定の下で、ロバスト基底モデルのロバスト性が証明可能な場合、入力に対する閉形式の$\ell_p$半径内でのいかなる変更や攻撃も、混合分類器の誤分類をもたらし得ないことを証明している。 Deep neural classifiers have recently found tremendous success in data-driven control systems. However, existing models suffer from a trade-off between accuracy and adversarial robustness. This limitation must be overcome in the control of safety-critical systems that require both high performance and rigorous robustness guarantees. In this work, we develop classifiers that simultaneously inherit high robustness from robust models and high accuracy from standard models. Specifically, we propose a theoretically motivated formulation that mixes the output probabilities of a standard neural network and a robust neural network. Both base classifiers are pre-trained, and thus our method does not require additional training. Our numerical experiments verify that the mixed classifier noticeably improves the accuracy-robustness trade-off and identify the confidence property of the robust base classifier as the key leverage of this more benign trade-off. Our theoretical results prove that under mild assumptions, when the robustness of the robust base model is certifiable, no alteration or attack within a closed-form $\ell_p$ radius on an input can result in the misclassification of the mixed classifier.	翻訳日:2024-06-06 13:57:08 公開日:2024-06-03
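The idea of mixing the output probabilities of two pre-trained classifiers, as described in the entry above, can be sketched in a few lines of Python. This is a hedged illustration that assumes a plain convex combination with a weight `alpha`. The abstract does not give the exact mixing rule, so the function names, the parameter `alpha`, and the linear form are all assumptions rather than the paper's formulation.

```python
def mix_probabilities(p_standard, p_robust, alpha):
    """Convex combination of two class-probability vectors.

    alpha = 0 returns the standard model's output and alpha = 1 the robust
    model's output (an assumed mixing rule, for illustration only).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(p_standard) != len(p_robust):
        raise ValueError("both models must predict over the same classes")
    return [(1.0 - alpha) * ps + alpha * pr
            for ps, pr in zip(p_standard, p_robust)]


def predict(p_standard, p_robust, alpha):
    """Index of the class with the largest mixed probability."""
    mixed = mix_probabilities(p_standard, p_robust, alpha)
    return max(range(len(mixed)), key=mixed.__getitem__)
```

Because both base models are treated as fixed functions, a scheme of this kind needs no further training. Only the mixing weight has to be chosen, for example on held-out data.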
# 電気波の夢:拡散モデルを用いた心臓興奮波の生成モデリング Dreaming of Electrical Waves: Generative Modeling of Cardiac Excitation Waves using Diffusion Models ( http://arxiv.org/abs/2312.14830v2 ) ライセンス: Link先を確認	Tanish Baranwal, Jan Lebert, Jan Christoph,	(参考訳) 心臓の電気波は、心房細動や心室細動などの生命を脅かす不整脈の間、回転する渦巻波またはスクロール波を形成する。波動力学は通常、興奮性媒質中の反応拡散力学を記述する結合偏微分方程式を用いてモデル化される。最近では、物理的および生物学的システムにおいて時空間パターンを生成する代替として、データ駆動生成モデリングが出現している。本稿では,心組織における電気波パターンの生成モデリングのためのデノイジング拡散確率モデルについて検討する。我々は、非条件および条件付き生成タスクにおいて、そのような波動パターンを生成できるように、シミュレートした電気波パターンを用いて拡散モデルを訓練した。例えば、拡散に基づく渦巻き波動の i) パラメータ固有の生成、ii) 時間発展、iii) インペインティング(表面の二次元測定から三次元スクロール波動を再構成することを含む)を検討した。さらに, 任意の形状の両心室ジオメトリを生成し, 同時に拡散を用いてそのジオメトリ内でスクロール波パターンを開始した。拡散生成解の特性を評価し,対応する生体物理モデルで得られた解と比較した結果,拡散モデルはスパイラル波とスクロール波のダイナミクスを非常によく再現することを学習し,心組織における興奮波のデータ駆動モデリングに利用できることがわかった。例えば、拡散生成スパイラル波動のアンサンブルは、生物物理学モデルでシミュレートされた対応するアンサンブルと同様の自己終端統計を示す。しかし, 拡散モデルは, 例えば自己終端時などトレーニングデータが不足している場合にはアーティファクトを生成し, 制約が不十分な場合には波のパターンを「幻覚」することもわかった。 Electrical waves in the heart form rotating spiral or scroll waves during life-threatening arrhythmias such as atrial or ventricular fibrillation. The wave dynamics are typically modeled using coupled partial differential equations, which describe reaction-diffusion dynamics in excitable media. More recently, data-driven generative modeling has emerged as an alternative to generate spatio-temporal patterns in physical and biological systems. Here, we explore denoising diffusion probabilistic models for the generative modeling of electrical wave patterns in cardiac tissue. We trained diffusion models with simulated electrical wave patterns to be able to generate such wave patterns in unconditional and conditional generation tasks. For instance, we explored the diffusion-based i) parameter-specific generation, ii) evolution and iii) inpainting of spiral wave dynamics, including reconstructing three-dimensional scroll wave dynamics from superficial two-dimensional measurements. 
Further, we generated arbitrarily shaped bi-ventricular geometries and simultaneously initiated scroll wave patterns inside these geometries using diffusion. We characterized and compared the diffusion-generated solutions to solutions obtained with corresponding biophysical models and found that diffusion models learn to replicate spiral and scroll waves dynamics so well that they could be used for data-driven modeling of excitation waves in cardiac tissue. For instance, an ensemble of diffusion-generated spiral wave dynamics exhibits similar self-termination statistics as the corresponding ensemble simulated with a biophysical model. However, we also found that diffusion models produce artifacts if training data is lacking, e.g. during self-termination, and `hallucinate' wave patterns when insufficiently constrained.	翻訳日:2024-06-06 13:37:33 公開日:2024-06-03
# 高解像度二分画像セグメンテーションのための両側参照 Bilateral Reference for High-Resolution Dichotomous Image Segmentation ( http://arxiv.org/abs/2401.03407v4 ) ライセンス: Link先を確認	Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe,	(参考訳) 高解像度二分画像セグメンテーション(DIS)のための新しい両側参照フレームワーク(BiRefNet)を導入する。これは,局所化モジュール (LM) と,提案する両側参照 (BiRef) を備えた再構成モジュール (RM) という2つの基本成分からなる。 LMはグローバルな意味情報を用いたオブジェクトのローカライゼーションを支援する。 RM内では、画像の階層的パッチがソース参照を提供し、勾配マップがターゲット参照として機能する、再構成プロセスにBiRefを利用する。これらのコンポーネントは、最終的な予測マップを生成するために協力する。また,より詳細な領域に焦点を絞るために,補助的な勾配監督を導入する。さらに、マップの品質とトレーニングプロセスを改善するために、DISに適した実践的なトレーニング戦略を概説する。提案手法の汎用性を検証するため,4つのタスクについて広範な実験を行い,BiRefNetがすべてのベンチマークにおいてタスク固有の最先端手法を上回る優れた性能を示すことを確認した。私たちのコードはhttps://github.com/ZhengPeng7/BiRefNetで公開されています。 We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef). The LM aids in object localization using global semantic information. Within the RM, we utilize BiRef for the reconstruction process, where hierarchical patches of images provide the source reference and gradient maps serve as the target reference. These components collaborate to generate the final predicted maps. We also introduce auxiliary gradient supervision to enhance focus on regions with finer details. Furthermore, we outline practical training strategies tailored for DIS to improve map quality and training process. To validate the general applicability of our approach, we conduct extensive experiments on four tasks to evince that BiRefNet exhibits remarkable performance, outperforming task-specific cutting-edge methods across all benchmarks. Our codes are available at https://github.com/ZhengPeng7/BiRefNet.	翻訳日:2024-06-06 13:27:48 公開日:2024-06-03
# REBUS: シンボル理解のためのロバストな評価ベンチマーク REBUS: A Robust Evaluation Benchmark of Understanding Symbols ( http://arxiv.org/abs/2401.05604v2 ) ライセンス: Link先を確認	Andrew Gritsevskiy, Arjun Panickssery, Aaron Kirtland, Derik Kauffman, Hans Gundlach, Irina Gritsevskaya, Joe Cavanagh, Jonathan Chiang, Lydia La Roux, Michelle Hung,	(参考訳) 本稿では,リバスパズルを用いたマルチモーダル大言語モデルの性能評価のための新しいベンチマークを提案する。データセットは、画像ベースのワードプレイのオリジナル例333をカバーし、映画、作曲家、主要都市、食品など13のカテゴリを網羅している。手がかりとなる単語やフレーズを識別するこのベンチマークで優れたパフォーマンスを達成するためには、画像認識と文字列操作を仮説テスト、多段階推論、人間の認知の理解と組み合わせる必要があり、複雑なマルチモーダルな能力評価となっている。 GPT-4oは他のすべてのモデルを大幅に上回り、それに続くプロプライエタリモデルも他の評価対象モデルすべてを上回ることがわかった。しかし、最高のモデルでさえ、最終的な精度は42%に過ぎず、ハードパズルでは7%に低下し、推論の大幅な改善の必要性が浮かび上がっている。さらに、モデルはパズルのすべての部分を理解することはまれで、ほとんどの場合、正解を後から説明することができない。したがって、我々のベンチマークは、マルチモーダルな大言語モデルの知識と推論における大きな欠点を特定するのに利用できる。 We propose a new benchmark evaluating the performance of multimodal large language models on rebus puzzles. The dataset covers 333 original examples of image-based wordplay, cluing 13 categories such as movies, composers, major cities, and food. To achieve good performance on the benchmark of identifying the clued word or phrase, models must combine image recognition and string manipulation with hypothesis testing, multi-step reasoning, and an understanding of human cognition, making for a complex, multimodal evaluation of capabilities. We find that GPT-4o significantly outperforms all other models, followed by proprietary models outperforming all other evaluated models. However, even the best model has a final accuracy of only 42%, which goes down to just 7% on hard puzzles, highlighting the need for substantial improvements in reasoning. Further, models rarely understand all parts of a puzzle, and are almost always incapable of retroactively explaining the correct answer. Our benchmark can therefore be used to identify major shortcomings in the knowledge and reasoning of multimodal large language models.	翻訳日:2024-06-06 13:27:48 公開日:2024-06-03
# ニューロ・シンボリック推論と学習のための凸とバイレベル最適化 Convex and Bilevel Optimization for Neuro-Symbolic Inference and Learning ( http://arxiv.org/abs/2401.09651v2 ) ライセンス: Link先を確認	Charles Dickens, Changyu Gao, Connor Pryor, Stephen Wright, Lise Getoor,	(参考訳) 我々は凸と双レベル最適化の手法を活用し、ニューラルシンボリック(NeSy)システムのための一般的な勾配に基づくパラメータ学習フレームワークを開発する。我々は、最先端のNeSyアーキテクチャであるNeuPSLを使って、我々のフレームワークを実演する。そこで本研究では、NeuPSL推論のスムーズな原始的および双対的定式化を提案し、学習勾配が最適双対変数の関数であることを示す。さらに,温暖化開始を自然に活用する新しい定式化のための二重ブロック座標降下アルゴリズムを開発した。これにより、現在の最高のNeuPSL推論メソッドよりも100倍以上の学習ランタイムが改善される。最後に、さまざまなタスクをカバーする8つのデータセットにわたる広範な経験的評価を行い、我々の学習フレームワークが、代替学習手法よりも最大16%のポイント予測性能の向上を達成することを実証する。 We leverage convex and bilevel optimization techniques to develop a general gradient-based parameter learning framework for neural-symbolic (NeSy) systems. We demonstrate our framework with NeuPSL, a state-of-the-art NeSy architecture. To achieve this, we propose a smooth primal and dual formulation of NeuPSL inference and show learning gradients are functions of the optimal dual variables. Additionally, we develop a dual block coordinate descent algorithm for the new formulation that naturally exploits warm-starts. This leads to over 100x learning runtime improvements over the current best NeuPSL inference method. Finally, we provide extensive empirical evaluations across 8 datasets covering a range of tasks and demonstrate our learning framework achieves up to a 16% point prediction performance improvement over alternative learning methods.	翻訳日:2024-06-06 13:27:48 公開日:2024-06-03
# 口の不整合からのリップシンクディープフェイクの検出 Exposing Lip-syncing Deepfakes from Mouth Inconsistencies ( http://arxiv.org/abs/2401.10113v2 ) ライセンス: Link先を確認	Soumyya Kanti Datta, Shan Jia, Siwei Lyu,	(参考訳) リップシンクのディープフェイク(英: Lip-syncing Deepfake)は、人の唇の動きをAIモデルを使って説得力のある方法で生成し、修正された音声や全く新しい音声にマッチさせるデジタル操作されたビデオである。リップシンクのディープフェイクは、アーティファクトが唇領域に限定されており、識別がより困難であるため、危険なタイプのディープフェイクである。本稿では,口領域の時間的不整合を識別することでリップシンクのディープフェイクを検出する新しい手法,LIP-syncing detection based on mouth INConsistency (LIPINC)について述べる。これらの不整合は、隣接するフレーム間にもビデオ全体にも見られる。我々のモデルはこれらの不規則性をうまく捉え、いくつかのベンチマークディープフェイクデータセットで最先端の手法より優れている。コードはhttps://github.com/skrantidatta/LIPINCで公開されている。 A lip-syncing deepfake is a digitally manipulated video in which a person's lip movements are created convincingly using AI models to match altered or entirely new audio. Lip-syncing deepfakes are a dangerous type of deepfakes as the artifacts are limited to the lip region and more difficult to discern. In this paper, we describe a novel approach, LIP-syncing detection based on mouth INConsistency (LIPINC), for lip-syncing deepfake detection by identifying temporal inconsistencies in the mouth region. These inconsistencies are seen in the adjacent frames and throughout the video. Our model can successfully capture these irregularities and outperforms the state-of-the-art methods on several benchmark deepfake datasets. Code is available at https://github.com/skrantidatta/LIPINC	翻訳日:2024-06-06 13:27:48 公開日:2024-06-03
# 任意スケールの病理画像スーパーレゾリューションに向けて: インシシト自己テクスチャ強化による効率的なデュアルブランチフレームワーク Towards Arbitrary-Scale Histopathology Image Super-resolution: An Efficient Dual-branch Framework via Implicit Self-texture Enhancement ( http://arxiv.org/abs/2401.15613v2 ) ライセンス: Link先を確認	Minghong Duan, Linhao Qu, Zhiwei Yang, Manning Wang, Chenxi Zhang, Zhijian Song,	(参考訳) 高品質な全スライディングスキャナーは高価で複雑で時間を要するため、日常臨床における高解像度の病理画像の取得と利用が制限される。低分解能画像から高分解能画像を合成することにより、深層学習に基づく単一画像の超解像技術がこの問題の解決に有効な方法である。しかし、病理画像に適用された既存の超解像モデルは、固定整数倍率でしか機能せず、適用性が著しく低下する。暗黙的な神経表現に基づく手法は、自然画像の任意のスケールの超解像において有望な結果を示しているが、それを病理画像に直接適用することは、自然画像とは異なる独特の微細な画像テクスチャを持つため、不十分である。そこで本研究では,この課題に対処するために,任意の規模の病理像の超解像を実現するためのImplicit Self-Texture Enhancement-based dual-branch framework (ISTE)を提案する。 ISTEには、まずピクセルの特徴とテクスチャの特徴を学習するテクスチャ学習ブランチと、画素学習ブランチが含まれている。そして、2段階のテクスチャ強化戦略を設計し、2段階のテクスチャを融合させて超解像結果を得る。 3つの公開データセットに対する大規模な実験によると、ISTEは既存の固定スケールおよび任意のスケールのアルゴリズムを複数の倍率で上回り、下流タスクのパフォーマンスを向上させる。我々の知る限りでは、病理画像における任意のスケールの超解像を実現するための最初の試みである。コードは利用可能。 High-quality whole-slide scanners are expensive, complex, and time-consuming, thus limiting the acquisition and utilization of high-resolution pathology whole-slide images in daily clinical work. Deep learning-based single-image super-resolution techniques are an effective way to solve this problem by synthesizing high-resolution images from low-resolution ones. However, the existing super-resolution models applied in pathology images can only work in fixed integer magnifications, significantly decreasing their applicability. Though methods based on implicit neural representation have shown promising results in arbitrary-scale super-resolution of natural images, applying them directly to pathology images is inadequate because they have unique fine-grained image textures different from natural images. Thus, we propose an Implicit Self-Texture Enhancement-based dual-branch framework (ISTE) for arbitrary-scale super-resolution of pathology images to address this challenge. 
ISTE contains a pixel learning branch and a texture learning branch, which first learn pixel features and texture features, respectively. Then, we design a two-stage texture enhancement strategy to fuse the features from the two branches to obtain the super-resolution results, where the first stage is feature-based texture enhancement, and the second stage is spatial-domain-based texture enhancement. Extensive experiments on three public datasets show that ISTE outperforms existing fixed-scale and arbitrary-scale algorithms at multiple magnifications and helps to improve downstream task performance. To the best of our knowledge, this is the first work to achieve arbitrary-scale super-resolution in pathology images. Codes will be available.	翻訳日:2024-06-06 13:17:49 公開日:2024-06-03
# 変圧器はコピー時の状態空間モデルより優れている Repeat After Me: Transformers are Better than State Space Models at Copying ( http://arxiv.org/abs/2402.01032v2 ) ライセンス: Link先を確認	Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach,	(参考訳) トランスフォーマーはシーケンスモデリングにおいて支配的なアーキテクチャであるが、我々は「一般化状態空間モデル」(GSSM)と呼ばれるシーケンス長に依存しない固定サイズの潜在状態を使用するモデルへの関心が高まっている。本稿では,GSSMは推論時間効率の面で有望であるが,入力コンテキストからのコピーを必要とするタスクにおいて,トランスフォーマーモデルと比較して限定的であることを示す。まず,2層変換器が指数関数長の文字列をコピーできるのに対して,GSSMは固定サイズ潜在状態によって根本的に制限されていることを証明する。実験により,コンテクストの複製を必要とする合成タスクにおいて,トランスフォーマーがGSSMよりも効率や一般化に優れていることが判明した。最後に、事前学習した大規模言語モデルを評価し、コンテクストからの情報のコピーと検索において、トランスフォーマーモデルが状態空間モデルより劇的に優れていることを見出した。これらの結果は,本研究の課題におけるトランスフォーマーとGSSMの根本的なギャップを示唆するものである。 Transformers are the dominant architecture for sequence modeling, but there is growing interest in models that use a fixed-size latent state that does not depend on the sequence length, which we refer to as "generalized state space models" (GSSMs). In this paper we show that while GSSMs are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context. We start with a theoretical analysis of the simple task of string copying and prove that a two layer transformer can copy strings of exponential length while GSSMs are fundamentally limited by their fixed-size latent state. Empirically, we find that transformers outperform GSSMs in terms of efficiency and generalization on synthetic tasks that require copying the context. Finally, we evaluate pretrained large language models and find that transformer models dramatically outperform state space models at copying and retrieving information from context. Taken together, these results suggest a fundamental gap between transformers and GSSMs on tasks of practical interest.	翻訳日:2024-06-06 13:17:49 公開日:2024-06-03
# 構成生成モデリング:1つのモデルだけでは十分ではない Compositional Generative Modeling: A Single Model is Not All You Need ( http://arxiv.org/abs/2402.01103v3 ) ライセンス: Link先を確認	Yilun Du, Leslie Kaelbling,	(参考訳) 大量のデータに基づいてトレーニングされた巨大なモノリシックな生成モデルは、AI研究においてますます支配的なアプローチになりつつある。本稿では,より小さな生成モデルを構成することによって,より大規模な生成システムを構築するべきであると論じる。このような構成的生成アプローチによって、よりデータ効率の良い方法で分布を学習し、トレーニング時に見つからないデータ分布の一部に一般化できることを示す。さらに、トレーニングで完全に見えないタスクのための新しい生成モデルをプログラムし、構築することを可能にする方法を示す。最後に、多くの場合、データから別々の構成成分を発見できることを示す。 Large monolithic generative models trained on massive amounts of data have become an increasingly dominant approach in AI research. In this paper, we argue that we should instead construct large generative systems by composing smaller generative models together. We show how such a compositional generative approach enables us to learn distributions in a more data-efficient manner, enabling generalization to parts of the data distribution unseen at training time. We further show how this enables us to program and construct new generative models for tasks completely unseen at training. Finally, we show that in many cases, we can discover separate compositional components from data.	翻訳日:2024-06-06 13:17:49 公開日:2024-06-03
# PINNの育成における課題--景観の喪失をめざして Challenges in Training PINNs: A Loss Landscape Perspective ( http://arxiv.org/abs/2402.01868v2 ) ライセンス: Link先を確認	Pratik Rathore, Weimu Lei, Zachary Frangella, Lu Lu, Madeleine Udell,	(参考訳) 本稿では,物理情報ニューラルネットワーク(PINN)の学習における課題について考察し,学習過程における損失景観の役割を強調した。本稿では, PINN損失関数の最小化の難しさについて検討する。我々は、勾配に基づく最適化器AdamとL-BFGSとそれらの組み合わせAdam+L-BFGSを比較し、Adam+L-BFGSの優位性を示し、新しい二階最適化器NysNewton-CG(NNCG)を導入し、PINNの性能を大幅に向上させた。理論的には、不条件微分演算子と不条件演算子のPINN損失の関係を解明し、一階と二階の最適化法を組み合わせる利点を示す。我々の研究は、PINNを訓練するための貴重な洞察とより強力な最適化戦略を示し、難しい偏微分方程式を解くためのPINNの有用性を向上させることができる。 This paper explores challenges in training Physics-Informed Neural Networks (PINNs), emphasizing the role of the loss landscape in the training process. We examine difficulties in minimizing the PINN loss function, particularly due to ill-conditioning caused by differential operators in the residual term. We compare gradient-based optimizers Adam, L-BFGS, and their combination Adam+L-BFGS, showing the superiority of Adam+L-BFGS, and introduce a novel second-order optimizer, NysNewton-CG (NNCG), which significantly improves PINN performance. Theoretically, our work elucidates the connection between ill-conditioned differential operators and ill-conditioning in the PINN loss and shows the benefits of combining first- and second-order optimization methods. Our work presents valuable insights and more powerful optimization strategies for training PINNs, which could improve the utility of PINNs for solving difficult partial differential equations.	翻訳日:2024-06-06 13:17:49 公開日:2024-06-03
# 効率的であることを学ぶ - 大規模言語モデルにおける構造化された疎結合の構築 Learn To be Efficient: Build Structured Sparsity in Large Language Models ( http://arxiv.org/abs/2402.06126v3 ) ライセンス: Link先を確認	Haizhong Zheng, Xiaoyan Bai, Xueshen Liu, Z. Morley Mao, Beidi Chen, Fan Lai, Atul Prakash,	(参考訳) 大きな言語モデル(LLM)は、その10億レベルのパラメータで驚くべき成功を収めていますが、高い推論オーバーヘッドを引き起こします。 LLMにおける活性化空間の出現は、推論のためのパラメータの一部だけを含むことによって、このコストを削減する自然なアプローチを提供する。しかし、既存の手法では、この自然に形成された活性化空間を訓練後の環境で利用することのみに焦点が当てられており、この固有領域をさらに増幅する可能性を見越している。本稿では,より構造化された活性化空間を実現することにより,LCMが効率良く学習できるという仮説を立てる。そこで本研究では,Learning-To-be-Efficient(LTE)という新しいトレーニングアルゴリズムを導入し,LLMを学習してニューロンの活性化を減らし,空間性と性能のトレードオフを改善することを目的とした。さらに、主にReLUベースのモデルに焦点を当てたSOTA MoEfication法とは異なり、LTEは非ReLUアクティベーションを使用してLLaMAのようなLLMにも適用することができる。言語理解、言語生成、命令チューニングタスクに関する広範囲な評価は、LTEがSOTAベースラインを一貫して上回っていることを示している。ハードウェア対応のカスタムカーネル実装に加えて、LTEはLLaMA2-7B推論遅延を50%の間隔で25%削減します。 Large Language Models (LLMs) have achieved remarkable success with their billion-level parameters, yet they incur high inference overheads. The emergence of activation sparsity in LLMs provides a natural approach to reduce this cost by involving only parts of the parameters for inference. However, existing methods only focus on utilizing this naturally formed activation sparsity in a post-training setting, overlooking the potential for further amplifying this inherent sparsity. In this paper, we hypothesize that LLMs can learn to be efficient by achieving more structured activation sparsity. To achieve this, we introduce a novel training algorithm, Learn-To-be-Efficient (LTE), designed to train efficiency-aware LLMs to learn to activate fewer neurons and achieve a better trade-off between sparsity and performance. Furthermore, unlike SOTA MoEfication methods, which mainly focus on ReLU-based models, LTE can also be applied to LLMs like LLaMA using non-ReLU activations. Extensive evaluation on language understanding, language generation, and instruction tuning tasks show that LTE consistently outperforms SOTA baselines. 
Along with our hardware-aware custom kernel implementation, LTE reduces LLaMA2-7B inference latency by 25% at 50% sparsity.	翻訳日:2024-06-06 13:08:02 公開日:2024-06-03
# 効率的な普遍的形態制御のための形態条件付きハイパーネットワークの蒸留 Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control ( http://arxiv.org/abs/2402.06570v2 ) ライセンス: Link先を確認	Zheng Xiong, Risto Vuorio, Jacob Beck, Matthieu Zimmer, Kun Shao, Shimon Whiteson,	(参考訳) 異なるロボット形態にまたがる普遍的なポリシーを学ぶことは、学習効率を著しく向上させ、未知の形態へのゼロショット一般化を可能にする。しかし、高性能なユニバーサルポリシーを学ぶには、より単純な多層パーセプトロン(MLP)よりもメモリと計算コストが大きいトランスフォーマー(TF)のような高度なアーキテクチャを必要とする。 TFのような優れた性能と、推論時のMLPのような高効率を両立するために、(1)ロボットごとのMLPポリシーを生成する形態条件付きハイパーネットワーク(HN)、(2)トレーニングを成功させるために不可欠なポリシー蒸留アプローチからなるHyperDistillを提案する。何百もの多様な形態のベンチマークであるUNIMALにおいて、HyperDistillはトレーニングロボットと未知のテストロボットの両方でユニバーサルTF教師ポリシーと同等の性能を示しつつ、異なる環境でモデルサイズを6～14倍、計算コストを67～160倍削減することを示した。我々の分析は、推論時間におけるHyperDistillの効率性は、知識分離、すなわち、タスク間知識とタスク内知識を分離する能力に起因するとしている。これは他の領域における推論効率の向上にも適用できる一般的な原理である。 Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good performance like TF and high efficiency like MLP at inference time, we propose HyperDistill, which consists of: (1) A morphology-conditioned hypernetwork (HN) that generates robot-wise MLP policies, and (2) A policy distillation approach that is essential for successful training. We show that on UNIMAL, a benchmark with hundreds of diverse morphologies, HyperDistill performs as well as a universal TF teacher policy on both training and unseen test robots, but reduces model size by 6-14 times, and computational cost by 67-160 times in different environments. Our analysis attributes the efficiency advantage of HyperDistill at inference time to knowledge decoupling, i.e., the ability to decouple inter-task and intra-task knowledge, a general principle that could also be applied to improve inference efficiency in other domains.	翻訳日:2024-06-06 13:08:02 公開日:2024-06-03
# ランダム化平滑化を用いたセグメンテーションのための適応的階層的認証 Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing ( http://arxiv.org/abs/2402.08400v2 ) ライセンス: Link先を確認	Alaa Anani, Tobias Lorenz, Bernt Schiele, Mario Fritz,	(参考訳) 機械学習の認証とは、特定の条件下で、ある範囲内にモデルを回避する敵対的サンプルが存在しないことを証明することであり、安全クリティカルな領域では必須である。セグメンテーションの一般的な認証方法は、細粒度クラスのフラットな集合を使うため、多くのクラスにわたるモデルの不確実性のために高い棄権率をもたらす。本稿では,複数レベルの階層内で画素を認証し,従来手法では棄権するような不安定なコンポーネントに対しては認証をより粗いレベルへ適応的に緩和する,より実用的な新しい設定を提案する。これにより,認証された意味のある情報をより多く提供しつつ,棄権率を効果的に下げる。問題設定を数学的に定式化し、適応的階層的認証アルゴリズムを導入し、その保証の正確性を証明する。認証精度は、粗いクラスに伴う情報損失を考慮しないので、クラス粒度レベルに比例する認証情報ゲイン($\mathrm{CIG}$)メトリクスを導入する。 Cityscapes, PASCAL-Context, ACDC, COCO-Stuffのデータセットに関する広範な実験により、我々の適応アルゴリズムは、現在の最先端認証法と比較して、より高い$\mathrm{CIG}$と低い棄権率を達成することを示した。私たちのコードは、https://github.com/AlaaAnani/adaptive-certifyで参照できます。 Certification for machine learning is proving that no adversarial sample can evade a model within a range under certain conditions, a necessity for safety-critical domains. Common certification methods for segmentation use a flat set of fine-grained classes, leading to high abstain rates due to model uncertainty across many classes. We propose a novel, more practical setting, which certifies pixels within a multi-level hierarchy, and adaptively relaxes the certification to a coarser level for unstable components classic methods would abstain from, effectively lowering the abstain rate whilst providing more certified semantically meaningful information. We mathematically formulate the problem setup, introduce an adaptive hierarchical certification algorithm and prove the correctness of its guarantees. Since certified accuracy does not take the loss of information into account for coarser classes, we introduce the Certified Information Gain ($\mathrm{CIG}$) metric, which is proportional to the class granularity level. 
Our extensive experiments on the datasets Cityscapes, PASCAL-Context, ACDC and COCO-Stuff demonstrate that our adaptive algorithm achieves a higher $\mathrm{CIG}$ and lower abstain rate compared to the current state-of-the-art certification method. Our code can be found here: https://github.com/AlaaAnani/adaptive-certify.	翻訳日:2024-06-06 12:58:06 公開日:2024-06-03
# 時間分布シフト下におけるモデル評価と選択 Model Assessment and Selection under Temporal Distribution Shift ( http://arxiv.org/abs/2402.08672v2 ) ライセンス: Link先を確認	Elise Han, Chengpiao Huang, Kaizheng Wang,	(参考訳) 変動環境におけるモデル評価と選択について,現在と歴史的時代の両方からデータセットを合成することによって検討する。未知かつ潜在的に任意の時間分布シフトに対処するため、与えられたモデルの一般化誤差を推定する適応型ローリングウインドウ手法を開発した。この戦略はまた、一般化誤差の差を推定することにより、任意の2つの候補モデルの比較を容易にする。さらに、ペアワイズ比較を単一消去トーナメントに統合し、候補の集合から最適に近いモデル選択を実現する。理論的解析と数値実験により,提案手法の非定常性に対する適応性を示す。 We investigate model assessment and selection in a changing environment, by synthesizing datasets from both the current time period and historical epochs. To tackle unknown and potentially arbitrary temporal distribution shift, we develop an adaptive rolling window approach to estimate the generalization error of a given model. This strategy also facilitates the comparison between any two candidate models by estimating the difference of their generalization errors. We further integrate pairwise comparisons into a single-elimination tournament, achieving near-optimal model selection from a collection of candidates. Theoretical analyses and numerical experiments demonstrate the adaptivity of our proposed methods to the non-stationarity in data.	翻訳日:2024-06-06 12:58:06 公開日:2024-06-03
# トランスダクティブサンプル複合体はコンパクトである Transductive Sample Complexities Are Compact ( http://arxiv.org/abs/2402.10360v2 ) ライセンス: Link先を確認	Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng,	(参考訳) すべての仮説クラス$H$は、すべての有限射影が標本複雑性$m$で学習可能であれば、正確には、半帰納的標本複雑性$m$で学習可能である。この厳密なコンパクト性は、任意の適切な計量損失函数(例えば、$\mathbb{R}^d$のノルム)およびコンパクト空間上の任意の連続損失(例えば、クロスエントロピー、正方形損失)に関して、実現可能かつ非依存的な学習に成り立つことを証明している。不適切な計量損失を伴う実現可能な学習のために、サンプルの複雑さの正確なコンパクト性は失敗しうることを示し、そのようなサンプルの複雑さが相違する程度で2の係数の上と下の境界が一致することを示す。我々は、無知の場合においてより大きなギャップが可能であると推測する。さらに、PACのサンプル複雑度とトランスダクティブモデル(実現可能な場合、低次因子まで)の等価性を呼び出すことで、結果を直接PACモデルに移植することが可能となり、PAC学習において広く保持されるほぼ正確なコンパクト性の形式が明らかになる。 We demonstrate a compactness result holding broadly across supervised learning with a general class of loss functions: Any hypothesis class $H$ is learnable with transductive sample complexity $m$ precisely when all of its finite projections are learnable with sample complexity $m$. We prove that this exact form of compactness holds for realizable and agnostic learning with respect to any proper metric loss function (e.g., any norm on $\mathbb{R}^d$) and any continuous loss on a compact space (e.g., cross-entropy, squared loss). For realizable learning with improper metric losses, we show that exact compactness of sample complexity can fail, and provide matching upper and lower bounds of a factor of 2 on the extent to which such sample complexities can differ. We conjecture that larger gaps are possible for the agnostic case. Furthermore, invoking the equivalence between sample complexities in the PAC and transductive models (up to lower order factors, in the realizable case) permits us to directly port our results to the PAC model, revealing an almost-exact form of compactness holding broadly in PAC learning.	翻訳日:2024-06-06 12:58:06 公開日:2024-06-03
# PAT-Questions: リアルタイム質問応答のための自己更新ベンチマーク PAT-Questions: A Self-Updating Benchmark for Present-Anchored Temporal Question-Answering ( http://arxiv.org/abs/2402.11034v2 ) ライセンス: Link先を確認	Jannat Ara Meem, Muhammad Shihab Rashid, Yue Dong, Vagelis Hristidis,	(参考訳) TQA(Temporal Question Answering)の既存の研究は、主に特定のタイムスタンプやイベント(1970年のアメリカ大統領は誰だったのか? 時間的文脈が現在と相対的な問題(例えば「前大統領は誰だったのか」など)は、ほとんど研究されていない。本報告では,この問題をPATQA(Present-Anchored Temporal QA)と呼ぶ。 PATQAは、(1)大きな言語モデル(LLM)が時代遅れの知識を持つかもしれないし、(2)複雑な時間的関係(例えば 'before' や 'previous' など)は推論が難しいし、(3)マルチホップ推論が必要かもしれないし、(4)ベンチマークの金の回答を継続的に更新する必要がある。これらの課題に対処するために、単座と多座の時間的問題を含むPAT-Questionsベンチマークを導入する。 PAT-Questionsの回答は、もし利用可能であれば、ナレッジグラフ上でSPARQLクエリを再実行することで、自動的に更新できる。我々は、直接的プロンプトと検索強化生成(RAG)を用いて、PAT-Questionsにおける最先端のLLMとSOTA時間的推論モデル(TEMPREASON-T5)を評価した。その結果、PATQAにおける既存のソリューションの限界を強調し、PATQA推論機能を改善するための新しい方法の必要性を動機付けている。 Existing work on Temporal Question Answering (TQA) has predominantly focused on questions anchored to specific timestamps or events (e.g. "Who was the US president in 1970?"). Little work has studied questions whose temporal context is relative to the present time (e.g. "Who was the previous US president?"). We refer to this problem as Present-Anchored Temporal QA (PATQA). PATQA poses unique challenges: (1) large language models (LLMs) may have outdated knowledge, (2) complex temporal relationships (e.g. 'before', 'previous') are hard to reason, (3) multi-hop reasoning may be required, and (4) the gold answers of benchmarks must be continuously updated. To address these challenges, we introduce the PAT-Questions benchmark, which includes single and multi-hop temporal questions. The answers in PAT-Questions can be automatically refreshed by re-running SPARQL queries on a knowledge graph, if available. We evaluate several state-of-the-art LLMs and a SOTA temporal reasoning model (TEMPREASON-T5) on PAT-Questions through direct prompting and retrieval-augmented generation (RAG). 
The results highlight the limitations of existing solutions in PATQA and motivate the need for new methods to improve PATQA reasoning capabilities.	翻訳日:2024-06-06 12:58:06 公開日:2024-06-03
# 4波混合を用いた量子限界に近い雑音性能をもつ4-8 GHzカイネティックインダクタンス進行波パラメトリック増幅器 A 4-8 GHz Kinetic Inductance Travelling-Wave Parametric Amplifier Using Four-Wave Mixing with Near Quantum-Limit Noise Performance ( http://arxiv.org/abs/2402.11751v4 ) ライセンス: Link先を確認	Farzad Faramarzi, Ryan Stephenson, Sasha Sypkens, Byeong H. Eom, Henry LeDuc, Peter Day,	(参考訳) カイネティックインダクタンス進行波パラメトリック増幅器(KI-TWPA)は、量子限界に近い性能と比較的高いダイナミックレンジを持ち、広い瞬時帯域を有する。このため、低温検出器や超伝導量子ビットに適した読み出し装置であり、量子センシングに様々な応用がある。本研究では,NbTiNマイクロストリップ伝送線路における4波混合に基づくKI-TWPAの設計,製造,性能について述べる。このデバイスは、別個のより高い周波数帯域で発生するイメージトーンに汚染されることなく、4〜8 GHzの信号帯域を増幅する。 4〜8 GHz帯は、マイクロ波カイネティックインダクタンス検出器(MKID)やジョセフソン接合ベースの量子ビットなどの低温検出器の読み出しに一般的に用いられている。その帯域において、4波混合により最大利得20 dB以上、利得15 dBでの1 dB利得圧縮点 -58 dBmを測定した。帯域幅とピーク利得は、ポンプトーンの周波数と電力を調整することで調整可能である。また、Y-factor法を用いて、4.5〜8 GHzにおいて $0.5 \leq N_{added} \leq 1.5$ 光子の増幅器付加雑音を測定した。 Kinetic inductance traveling-wave parametric amplifiers (KI-TWPA) have a wide instantaneous bandwidth with near quantum-limited performance and a relatively high dynamic range. Because of this, they are suitable readout devices for cryogenic detectors and superconducting qubits and have a variety of applications in quantum sensing. This work discusses the design, fabrication, and performance of a KI-TWPA based on four-wave mixing in a NbTiN microstrip transmission line. This device amplifies a signal band from 4 to 8 GHz without contamination from image tones, which are produced in a separate higher frequency band. The 4-8 GHz band is commonly used to read out cryogenic detectors, such as microwave kinetic inductance detectors (MKIDs) and Josephson junction-based qubits. We report a measured maximum gain of over 20 dB using four-wave mixing with a 1-dB gain compression point of -58 dBm at 15 dB of gain over that band. The bandwidth and peak gain are tunable by adjusting the pump-tone frequency and power. Using a Y-factor method, we measure an amplifier-added noise of $0.5 \leq N_{added} \leq 1.5$ photons from 4.5 to 8 GHz.	
翻訳日:2024-06-06 12:48:21 公開日:2024-06-03
# 非線形力学系の状態とパラメータ推定のための反復INLA Iterated INLA for State and Parameter Estimation in Nonlinear Dynamical Systems ( http://arxiv.org/abs/2402.17036v2 ) ライセンス: Link先を確認	Rafael Anderka, Marc Peter Deisenroth, So Takao,	(参考訳) データ同化法(DA)法は、微分方程式から生じる先行値を用いてデータを頑健に補間し、外挿する。高次元非線形PDE事前処理を行うアンサンブル法のような一般的な手法は、主に状態推定に重点を置いているが、パラメータを正確に学習することは困難である。一方、機械学習に基づくアプローチは、状態とパラメータを自然に学習することができるが、適用性は制限されるか、解釈が難しい不確実性を生成することができる。空間統計学におけるIntegrated Nested Laplace Approximation (INLA)法に着想を得て,動的モデルの反復線形化に基づくDAへの代替手法を提案する。これにより、各イテレーションでガウスマルコフランダムフィールドを生成し、INLAを使って状態とパラメータを推測することができる。本手法は,解釈可能性を維持しながら任意の非線形システムに利用することができ,さらにDAタスクにおける既存手法よりも優れていることを示す。非線形PDE事前処理に対するよりニュアンスなアプローチを提供することにより、予測精度の向上とロバスト性、特にデータ空間が普及している場所での予測が可能となる。 Data assimilation (DA) methods use priors arising from differential equations to robustly interpolate and extrapolate data. Popular techniques such as ensemble methods that handle high-dimensional, nonlinear PDE priors focus mostly on state estimation, however can have difficulty learning the parameters accurately. On the other hand, machine learning based approaches can naturally learn the state and parameters, but their applicability can be limited, or produce uncertainties that are hard to interpret. Inspired by the Integrated Nested Laplace Approximation (INLA) method in spatial statistics, we propose an alternative approach to DA based on iteratively linearising the dynamical model. This produces a Gaussian Markov random field at each iteration, enabling one to use INLA to infer the state and parameters. Our approach can be used for arbitrary nonlinear systems, while retaining interpretability, and is furthermore demonstrated to outperform existing methods on the DA task. By providing a more nuanced approach to handling nonlinear PDE priors, our methodology offers improved accuracy and robustness in predictions, especially where data sparsity is prevalent.	翻訳日:2024-06-06 12:38:37 公開日:2024-06-03
# プロンプトの脆弱性を超えて--LLMにおける政治的世界観の信頼性と一貫性の評価 Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in LLMs ( http://arxiv.org/abs/2402.17649v2 ) ライセンス: Link先を確認	Tanise Ceron, Neele Falk, Ana Barić, Dmitry Nikolaev, Sebastian Padó,	(参考訳) ユビキタスなシステムで大規模言語モデル(LLM)が広く使われているため、それらが特定の世界観を埋め込んでいるのか、またその見解が何を反映しているのかを理解する必要がある。近年の研究では、政治的アンケートを与えると、LLMは左派リベラル寄りの傾向を示すことが報告されている(Feng et al., 2023; Motoki et al., 2024)。しかし、これらの傾向が信頼できるか(プロンプトの変化に対して頑健か)、また、その傾向が政策分野や政治的立場をまたいで一貫しているかは、まだ明らかでない。本研究では、EU7カ国から収集され政策分野がアノテーションされた投票支援アンケートのデータセットに基づいて、政治的声明に対するLLMの立場の信頼性と一貫性を評価する一連のテストを提案する。本研究では, 7B から 70B パラメータまでの大きさの LLM について検討し, パラメータ数とともに信頼性が向上することを確認した。より大規模なモデルは、全体として左派政党とより強く整合するが、政策分野によって異なる: 環境保護、社会福祉国家、リベラルな社会に対しては(左派的な)肯定的立場を示す一方、(右派的な)法と秩序にも肯定的であり、外交政策と移民については一貫した選好を示さない。 Due to the widespread use of large language models (LLMs) in ubiquitous systems, we need to understand whether they embed a specific worldview and what these views reflect. Recent studies report that, prompted with political questionnaires, LLMs show left-liberal leanings (Feng et al., 2023; Motoki et al., 2024). However, it is as yet unclear whether these leanings are reliable (robust to prompt variations) and whether the leaning is consistent across policies and political leaning. We propose a series of tests which assess the reliability and consistency of LLMs' stances on political statements based on a dataset of voting-advice questionnaires collected from seven EU countries and annotated for policy domains. We study LLMs ranging in size from 7B to 70B parameters and find that their reliability increases with parameter count. Larger models show overall stronger alignment with left-leaning parties but differ among policy programs: They evince a (left-wing) positive stance towards environment protection, social welfare state and liberal society but also (right-wing) law and order, with no consistent preferences in foreign policy and migration.	翻訳日:2024-06-06 12:38:37 公開日:2024-06-03
# ヘラクレス:高分解能画像と時系列解析のためのハイブリッドSSM変換器モデル Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and Time-Series Analysis ( http://arxiv.org/abs/2403.18063v2 ) ライセンス: Link先を確認	Badri N. Patro, Suhas Ranganath, Vinay P. Namboodiri, Vijay S. Agneeswaran,	(参考訳) トランスフォーマーは、DeIT、Swin、SVT、Biformer、STVit、FDVITなどの適応で画像モデリングタスクに革命をもたらした。しかし、これらのモデルはしばしば帰納バイアスと高い二次的複雑性の課題に直面し、高解像度画像では効率が低下する。 Mamba、V-Mamba、ViM、SiMBAのような状態空間モデル(SSM)は、コンピュータビジョンタスクで高解像度の画像を処理する代替手段を提供する。これらのSSMは2つの大きな問題に遭遇する。まず、大規模なネットワークサイズにスケールすると不安定になる。第二に、画像内のグローバルな情報を効率的にキャプチャするが、本質的にはローカル情報を扱うのに苦労する。これらの課題に対処するため,ローカルSSM,グローバルSSM,アテンションベースのトークンインタラクションモジュールを統合した新しいSSMであるHeraclesを紹介する。 Heraclesは、グローバルイメージ情報のためのHartleyカーネルベースの状態空間モデル、ローカル詳細のためのローカライズされた畳み込みネットワーク、トークンインタラクションのためのより深いレイヤにおけるアテンションメカニズムを活用する。大規模な実験により、Heracles-C-smallは84.5%のトップ1精度でImageNetデータセット上で最先端のパフォーマンスを達成することが示された。 Heracles-C-Large と Heracles-C-Huge はさらに精度をそれぞれ 85.9% と 86.4% に改善した。さらに、Heraclesは、CIFAR-10、CIFAR-100、Oxford Flowers、Stanford Carsといったデータセット上の転移学習タスクや、MSCOCOデータセット上のインスタンスセグメンテーションに優れている。 Heraclesはまた、7つの時系列データセットで最先端の結果を達成し、スペクトルデータでドメインをまたいで一般化する能力を示し、ローカル情報とグローバル情報の両方をキャプチャすることで、その汎用性を証明している。プロジェクトのページは https://github.com/badripatro/heracles で公開されている。 Transformers have revolutionized image modeling tasks with adaptations like DeIT, Swin, SVT, Biformer, STVit, and FDVIT. However, these models often face challenges with inductive bias and high quadratic complexity, making them less efficient for high-resolution images. State space models (SSMs) such as Mamba, V-Mamba, ViM, and SiMBA offer an alternative to handle high resolution images in computer vision tasks. These SSMs encounter two major issues. First, they become unstable when scaled to large network sizes. Second, although they efficiently capture global information in images, they inherently struggle with handling local information. 
To address these challenges, we introduce Heracles, a novel SSM that integrates a local SSM, a global SSM, and an attention-based token interaction module. Heracles leverages a Hartley kernel-based state space model for global image information, a localized convolutional network for local details, and attention mechanisms in deeper layers for token interactions. Our extensive experiments demonstrate that Heracles-C-small achieves state-of-the-art performance on the ImageNet dataset with 84.5% top-1 accuracy. Heracles-C-Large and Heracles-C-Huge further improve accuracy to 85.9% and 86.4%, respectively. Additionally, Heracles excels in transfer learning tasks on datasets such as CIFAR-10, CIFAR-100, Oxford Flowers, and Stanford Cars, and in instance segmentation on the MSCOCO dataset. Heracles also proves its versatility by achieving state-of-the-art results on seven time-series datasets, showcasing its ability to generalize across domains with spectral data, capturing both local and global information. The project page is available at https://github.com/badripatro/heracles	翻訳日:2024-06-06 12:19:03 公開日:2024-06-03
# IoTクラウドシステムのストレステストのためのリーンシミュレーションフレームワーク A Lean Simulation Framework for Stress Testing IoT Cloud Systems ( http://arxiv.org/abs/2404.11542v3 ) ライセンス: Link先を確認	Jia Li, Behrad Moeini, Shiva Nejati, Mehrdad Sabetzadeh, Michael McCallen,	(参考訳) モノのインターネット(Internet of Things)は、スマートシティ、自動運転車、健康モニタリングなど、さまざまな分野のスマートデバイスを世界中に接続する。シミュレーションはIoTシステムのテストにおいて重要な役割を果たす。本稿は、IoTのシミュレーションベースのテストにおいて、特に重要なニーズである、クラウドシステムのストレステストに対処する。既存のIoT用のストレステストソリューションは、かなりの計算リソースを必要とするため、不適合でコストがかかる。クラウドと通信する多数のIoTデバイスとエッジデバイスの効率的なシミュレーションを可能にする,IoTクラウドストレステスト用に設計されたリーンシミュレーションフレームワークを提案する。実践者のシミュレーション構築を容易にするため,モデルベース仕様からシミュレータを生成するためのドメイン固有言語であるIoTECSを開発した。我々はIoTECSの構文とセマンティクスを提供し、XtextとXtendを使ってIoTECSを実装します。我々は、クラウドベースのIoT監視システムとIoT接続車両システムという、2つの実世界のシステムのストレステストのためのIoTECS仕様から生成されたシミュレータを評価する。実験結果から,(1)Dockerコンテナ化の設定時に最高のパフォーマンスを得る,(2)ケーススタディシステムのサービス容量を効果的に評価する,(3) 産業用ストレステストベースラインツールであるJMeterとLocustを,同じハードウェアリソースを使用してシミュレート可能なIoTおよびエッジデバイスの数で3.5倍に向上させる,という結果が得られた。 IoTECSの実用性に関する最初の洞察を得るために、私たちは、IoTECSを初めて経験した業界パートナの2人のエンジニアにインタビューした。これらのインタビューからのフィードバックは、IoTECSがIoTクラウドシステムのストレステストに有効であり、かなりの時間と労力を節約できることを示している。 The Internet of Things connects a plethora of smart devices globally across various applications like smart cities, autonomous vehicles and health monitoring. Simulation plays a key role in the testing of IoT systems, noting that field testing of a complete IoT product may be infeasible or prohibitively expensive. This paper addresses a specific yet important need in simulation-based testing for IoT: Stress testing of cloud systems. Existing stress testing solutions for IoT demand significant computational resources, making them ill-suited and costly. We propose a lean simulation framework designed for IoT cloud stress testing which enables efficient simulation of a large array of IoT and edge devices that communicate with the cloud. 
To facilitate simulation construction for practitioners, we develop a domain-specific language (DSL), named IoTECS, for generating simulators from model-based specifications. We provide the syntax and semantics of IoTECS and implement IoTECS using Xtext and Xtend. We assess simulators generated from IoTECS specifications for stress testing two real-world systems: a cloud-based IoT monitoring system and an IoT-connected vehicle system. Our empirical results indicate that simulators created using IoTECS: (1) achieve best performance when configured with Docker containerization; (2) effectively assess the service capacity of our case-study systems; and (3) outperform industrial stress-testing baseline tools, JMeter and Locust, by a factor of 3.5 in terms of the number of IoT and edge devices they can simulate using identical hardware resources. To gain initial insights about the usefulness of IoTECS in practice, we interviewed two engineers from our industry partner who have firsthand experience with IoTECS. Feedback from these interviews suggests that IoTECS is effective in stress testing IoT cloud systems, saving significant time and effort.	翻訳日:2024-06-06 11:37:14 公開日:2024-06-03
# 構造された環境に結合したJaynes-Cummings原子:漏れ除去作用素とペッツ回収写像 Jaynes-Cummings atoms coupled to a structured environment: Leakage elimination operators and the Petz recovery maps ( http://arxiv.org/abs/2404.13762v2 ) ライセンス: Link先を確認	Da-Wei Luo, Ting Yu,	(参考訳) 本稿では,ジャイアンス・カミングス(Jyanes-Cummings,JC)モデルについて考察する。本稿では、JC原子の量子コヒーレンスを保護するために、デコヒーレンス効果の制御と抑制に有効ないくつかの戦略を提案する。漏れ除去演算子を用いたシステムダイナミクスの非摂動制御について検討する。また,ペッツ回収マップを用いて,システムと浴槽とのカップリングを工学的に行うことで,完全な量子状態逆転スキームについても検討する。その結果,ペッツ回収マップでは,マルコフノイズや非マルコフノイズによらず,JC原子のダイナミクスを完全に復元できることがわかった。最後に,我々の量子制御とリカバリ手法は,システムの一貫性の異なる側面を保護するのに有効であることを示す。 We consider the Jaynes-Cummings (JC) model embedded in a structured environment, where the atom inside an optical cavity will be affected by a hierarchical environment consisting of the cavity and its environment. We propose several effective strategies to control and suppress the decoherence effects to protect the quantum coherence of the JC atom. We study the non-perturbative control of the system dynamics by means of the leakage elimination operators. We also investigate a full quantum state reversal scheme by engineering the system and its coupling to the bath via the Petz recovery map. Our findings conclude that, with the Petz recovery map, the dynamics of the JC atom can be fully recovered regardless of Markov or non-Markovian noises. Finally, we show that our quantum control and recovery methods are effective at protecting different aspects of the system coherence.	翻訳日:2024-06-06 11:37:14 公開日:2024-06-03
# LLM型ゲームナラティブにおけるプレイヤー駆動創発 Player-Driven Emergence in LLM-Driven Game Narrative ( http://arxiv.org/abs/2404.17027v3 ) ライセンス: Link先を確認	Xiangyu Peng, Jessica Quaye, Sudha Rao, Weijia Xu, Portia Botchway, Chris Brockett, Nebojsa Jojic, Gabriel DesGarennes, Ken Lobb, Michael Xu, Jorge Leandro, Claire Jin, Bill Dolan,	(参考訳) 我々は,大規模言語モデル (LLM) との相互作用が創発的行動を引き起こし,プレイヤーがゲーム物語の進化に参加する力を与える方法を探る。我々のテストベッドはテキストアドベンチャーゲームであり、プレイヤーは固定された物語の前提でミステリーを解こうとするが、大きな言語モデルであるGPT-4によって生成された非プレイヤーキャラクターと自由に対話できる。ゲームプレイのために28人のゲーマーを募集し、GPT-4を使用してゲームログを自動的にゲームプレイの物語を表すノードグラフに変換する。 LLMの非決定論的行動と相互作用することで、プレイヤーはオリジナルの物語の一部ではなく、楽しみとエンゲージメントの可能性がある興味深い新しい創発的ノードを発見できることがわかった。最も創発的なノードを作ったプレイヤーは、しばしば発見、探索、実験を容易にするゲームを楽しむ傾向にあった。 We explore how interaction with large language models (LLMs) can give rise to emergent behaviors, empowering players to participate in the evolution of game narratives. Our testbed is a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise, but can freely interact with non-player characters generated by GPT-4, a large language model. We recruit 28 gamers to play the game and use GPT-4 to automatically convert the game logs into a node-graph representing the narrative in the player's gameplay. We find that through their interactions with the non-deterministic behavior of the LLM, players are able to discover interesting new emergent nodes that were not a part of the original narrative but have potential for being fun and engaging. Players that created the most emergent nodes tended to be those that often enjoy games that facilitate discovery, exploration and experimentation.	翻訳日:2024-06-06 11:37:14 公開日:2024-06-03
# Calo-VQ:カロリメータシミュレーションにおけるベクトル量子化された2段階生成モデル Calo-VQ: Vector-Quantized Two-Stage Generative Model in Calorimeter Simulation ( http://arxiv.org/abs/2405.06605v2 ) ライセンス: Link先を確認	Qibin Liu, Chase Shimmin, Xiulong Liu, Eli Shlizerman, Shu Li, Shih-Chieh Hsu,	(参考訳) 本稿では,ベクトル量子化変分オートエンコーダ(VQ-VAE)を応用した,カロリメータ検出器応答の高速シミュレーションのための新しい機械学習手法を提案する。本モデルは2段階の生成戦略を採用し,まずジオメトリを考慮したカロリメータデータを離散潜在空間に圧縮し,次に系列モデルを用いて潜在トークンを学習・生成する。 Calo-Challengeデータセットでの広範な実験により,従来手法と比較して生成速度が2000倍と著しく向上することを示した。特筆すべきことに、我々のモデルはミリ秒以内でカロリメータシャワーを生成する。さらに, 様々な指標にわたる総合的な定量的評価を行い, 生成の物理性能を検証した。 We introduce a novel machine learning method developed for the fast simulation of calorimeter detector response, adapting the vector-quantized variational autoencoder (VQ-VAE). Our model adopts a two-stage generation strategy: initially compressing geometry-aware calorimeter data into a discrete latent space, followed by the application of a sequence model to learn and generate the latent tokens. Extensive experimentation on the Calo-challenge dataset underscores the efficiency of our approach, showcasing a remarkable improvement in the generation speed compared with conventional methods by a factor of 2000. Remarkably, our model achieves the generation of calorimeter showers within milliseconds. Furthermore, comprehensive quantitative evaluations across various metrics are performed to validate the physics performance of the generation.	翻訳日:2024-06-06 09:12:28 公開日:2024-06-03
# Swin Transformer UNetによる地上画像のデコンボリューション Ground-based image deconvolution with Swin Transformer UNet ( http://arxiv.org/abs/2405.07842v2 ) ライセンス: Link先を確認	Utsav Akhaury, Pascale Jablonka, Jean-Luc Starck, Frédéric Courbin,	(参考訳) 地上のオールスキー天体調査では今後数年で数百万の画像が収集されるため、これらの画像の空間分解能を効率的に改善できる高速デコンボリューションアルゴリズムを開発する上で重要な要件が生まれる。これらの調査からクリーンで高解像度の画像の回収に成功したことにより、正確な測光によって銀河の形成と進化の理解を深めることが目的である。 Swin Transformerアーキテクチャを用いた2段階のデコンボリューションフレームワークを提案する。我々の研究は、ディープラーニングベースのソリューションが、科学的分析の範囲を制限してバイアスをもたらすことを明らかにした。この制限に対処するため,スパーシティウェーブレットフレームワークの活性係数に依存する新しい第3ステップを提案する。 EDisCSクラスタのサブセットの分析に基づいて,本手法と古典的デコンボリューションアルゴリズムFiredecの性能比較を行った。本手法の利点は, 分解能回復, ノイズ特性の一般化, 計算効率の両立にある。このクラスターサンプルの分析により、我々の手法の効率を評価することができるだけでなく、これらの銀河内のクランプの数を、円盤の色と関連づけて定量化することが可能になった。提案するロバストな手法は、地上画像による遠方の宇宙の構造の同定を約束する。 As ground-based all-sky astronomical surveys will gather millions of images in the coming years, a critical requirement emerges for the development of fast deconvolution algorithms capable of efficiently improving the spatial resolution of these images. By successfully recovering clean and high-resolution images from these surveys, the objective is to deepen the understanding of galaxy formation and evolution through accurate photometric measurements. We introduce a two-step deconvolution framework using a Swin Transformer architecture. Our study reveals that the deep learning-based solution introduces a bias, constraining the scope of scientific analysis. To address this limitation, we propose a novel third step relying on the active coefficients in the sparsity wavelet framework. We conducted a performance comparison between our deep learning-based method and Firedec, a classical deconvolution algorithm, based on an analysis of a subset of the EDisCS cluster samples. We demonstrate the advantage of our method in terms of resolution recovery, generalisation to different noise properties, and computational efficiency. 
The analysis of this cluster sample not only allowed us to assess the efficiency of our method, but it also enabled us to quantify the number of clumps within these galaxies in relation to their disc colour. This robust technique that we propose holds promise for identifying structures in the distant universe through ground-based images.	翻訳日:2024-06-06 09:12:28 公開日:2024-06-03
# オフラインリワード学習のための統一線形プログラミングフレームワーク A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback ( http://arxiv.org/abs/2405.12421v2 ) ライセンス: Link先を確認	Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, Pablo A. Parrilo,	(参考訳) Inverse Reinforcement Learning (IRL) と Reinforcement Learning from Human Feedback (RLHF) は報酬学習において重要な方法論であり、人間の実演とフィードバックに基づいて、連続的な意思決定問題の報酬関数を推論・形成する。報奨学習におけるほとんどの以前の作業は、決定や選好モデルに関する事前の知識や仮定に依存しており、堅牢性の問題につながる可能性がある。そこで本研究では,オフライン報酬学習に適した新しい線形プログラミング(LP)フレームワークを提案する。本フレームワークは,オンライン探索を使わずに事前に収集した軌道を用いて,設計したLPの一次双対最適条件から設定した有望な報酬を推定し,提案可能なサンプル効率の最適性保証を提供する。我々のLPフレームワークはまた、計算的トラクタビリティとサンプル効率を維持しながら、ペアの軌道比較データなど、報酬関数を人間のフィードバックと整合させることができる。解析例と数値実験により,従来の最大推定法(MLE)と比較して,本フレームワークは性能が向上する可能性が示唆された。 Inverse Reinforcement Learning (IRL) and Reinforcement Learning from Human Feedback (RLHF) are pivotal methodologies in reward learning, which involve inferring and shaping the underlying reward function of sequential decision-making problems based on observed human demonstrations and feedback. Most prior work in reward learning has relied on prior knowledge or assumptions about decision or preference models, potentially leading to robustness issues. In response, this paper introduces a novel linear programming (LP) framework tailored for offline reward learning. Utilizing pre-collected trajectories without online exploration, this framework estimates a feasible reward set from the primal-dual optimality conditions of a suitably designed LP, and offers an optimality guarantee with provable sample efficiency. Our LP framework also enables aligning the reward functions with human feedback, such as pairwise trajectory comparison data, while maintaining computational tractability and sample efficiency. We demonstrate that our framework potentially achieves better performance compared to the conventional maximum likelihood estimation (MLE) approach through analytical examples and numerical experiments.	
翻訳日:2024-06-06 09:02:44 公開日:2024-06-03
# 基礎モデルの違いを理解する:注意、状態空間モデル、リカレントニューラルネットワーク Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks ( http://arxiv.org/abs/2405.15731v2 ) ライセンス: Link先を確認	Jerome Sieber, Carmen Amo Alonso, Alexandre Didier, Melanie N. Zeilinger, Antonio Orvieto,	(参考訳) ソフトマックス・アテンション(Softmax attention)は、様々な人工知能アプリケーションの基礎モデルの主要なバックボーンであるが、シーケンス長に対する2次の計算量は、長いコンテキスト設定で推論スループットを制限しうる。この課題に対処するため、線形アテンション、状態空間モデル(SSM)、リカレントニューラルネットワーク(RNN)といった代替アーキテクチャがより効率的な代替案として検討されている。これらのアプローチ間の関係は存在するが、そのようなモデルは一般的に独立して開発されており、これらのアーキテクチャを支える共通原理と、性能やスケーラビリティに大きく影響する微妙な違いについての理論的理解が不足している。本稿では,これらすべてのアーキテクチャを共通の表現で原理的に調べることを可能にする動的システムフレームワーク(DSF)を紹介する。我々のフレームワークは厳密な比較を促進し、各モデルクラスの特色に関する新たな洞察を提供する。例えば、線形アテンションと選択的SSMを比較し、両者の相違点と両者が等価となる条件を詳述する。また、ソフトマックスアテンションと他のモデルクラスとの原理的な比較を行い、ソフトマックスアテンションを近似できる理論条件について議論する。さらに、これらの新たな知見を経験的検証と数学的議論で裏付ける。このことは、DSFが将来のより効率的でスケーラブルな基盤モデルの体系的な開発を導く可能性を示している。 Softmax attention is the principal backbone of foundation models for various artificial intelligence applications, yet its quadratic complexity in sequence length can limit its inference throughput in long-context settings. To address this challenge, alternative architectures such as linear attention, State Space Models (SSMs), and Recurrent Neural Networks (RNNs) have been considered as more efficient alternatives. While connections between these approaches exist, such models are commonly developed in isolation and there is a lack of theoretical understanding of the shared principles underpinning these architectures and their subtle differences, greatly influencing performance and scalability. In this paper, we introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation. Our framework facilitates rigorous comparisons, providing new insights on the distinctive characteristics of each model class. 
For instance, we compare linear attention and selective SSMs, detailing their differences and conditions under which both are equivalent. We also provide principled comparisons between softmax attention and other model classes, discussing the theoretical conditions under which softmax attention can be approximated. Additionally, we substantiate these new insights with empirical validations and mathematical arguments. This shows the DSF's potential to guide the systematic development of future more efficient and scalable foundation models.	翻訳日:2024-06-06 09:02:44 公開日:2024-06-03
# シークエンシャル意思決定におけるユーティリティと時間優先の推論 Inference of Utilities and Time Preference in Sequential Decision-Making ( http://arxiv.org/abs/2405.15975v2 ) ライセンス: Link先を確認	Haoyang Cao, Zhengqi Wu, Renyuan Xu,	(参考訳) 本稿では,過去の業務からクライアントの投資嗜好を正確に推測することで,自動投資管理者やロボアドバイザの能力を高めるための,新しい確率制御フレームワークを提案する。提案手法は,各クライアントのリスク許容度,日々の消費評価,重要な生活目標に合わせた,実用機能と時間変化率の一般的な割引スキームを組み込んだ連続時間モデルを活用する。我々は、状態拡張と動的プログラミング原理の確立と検証定理の確立を通じて、結果の時間的矛盾問題に対処する。また、顧客投資嗜好の特定可能性について十分な条件を提供する。理論的発展を補完するために,エントロピー正則化を付加した離散時間マルコフ決定プロセスフレームワーク内での最大推定に基づく学習アルゴリズムを提案する。ログのような関数が局所的に凹凸であることが証明され,提案アルゴリズムの高速収束が促進される。実効性と効率性は、メルトンの問題と、未解決のリスクを伴う投資問題を含む2つの数値的な例を通して示される。提案する枠組みは、個別の投資アドバイスを改善することで金融技術を発展させるだけでなく、個別の嗜好を理解することが不可欠である医療、経済学、人工知能など他の分野にも広く貢献する。 This paper introduces a novel stochastic control framework to enhance the capabilities of automated investment managers, or robo-advisors, by accurately inferring clients' investment preferences from past activities. Our approach leverages a continuous-time model that incorporates utility functions and a generic discounting scheme of a time-varying rate, tailored to each client's risk tolerance, valuation of daily consumption, and significant life goals. We address the resulting time inconsistency issue through state augmentation and the establishment of the dynamic programming principle and the verification theorem. Additionally, we provide sufficient conditions for the identifiability of client investment preferences. To complement our theoretical developments, we propose a learning algorithm based on maximum likelihood estimation within a discrete-time Markov Decision Process framework, augmented with entropy regularization. We prove that the log-likelihood function is locally concave, facilitating the fast convergence of our proposed algorithm. Practical effectiveness and efficiency are showcased through two numerical examples, including Merton's problem and an investment problem with unhedgeable risks. 
Our proposed framework not only advances financial technology by improving personalized investment advice but also contributes broadly to other fields such as healthcare, economics, and artificial intelligence, where understanding individual preferences is crucial.	翻訳日:2024-06-06 09:02:44 公開日:2024-06-03
# ランダムグラフのプライベートエッジ密度推定:最適,効率,ロバスト Private Edge Density Estimation for Random Graphs: Optimal, Efficient and Robust ( http://arxiv.org/abs/2405.16663v2 ) ライセンス: Link先を確認	Hongjie Chen, Jingqiu Ding, Yiding Hua, David Steurer,	(参考訳) 我々は、Erd\H{o}s-R\'enyiランダムグラフのエッジ密度とそれらの一般化、不均一ランダムグラフを推定するための、最初の多項式時間、微分ノードプライベートおよびロバストアルゴリズムを与える。さらに,アルゴリズムの誤差率を対数的因子まで最適とする情報理論的下界を証明した。以前のアルゴリズムは指数的なランニングタイムまたは準最適エラーレートを発生させる。提案アルゴリズムの主な要素は,(1)頑健なエッジ密度推定のための新しいサム・オブ・スクエアスアルゴリズム,(2)ホプキンス等による2乗指数機構に基づくプライバシーからロバストネスへの削減である。 We give the first polynomial-time, differentially node-private, and robust algorithm for estimating the edge density of Erd\H{o}s-R\'enyi random graphs and their generalization, inhomogeneous random graphs. We further prove information-theoretical lower bounds, showing that the error rate of our algorithm is optimal up to logarithmic factors. Previous algorithms incur either exponential running time or suboptimal error rates. Two key ingredients of our algorithm are (1) a new sum-of-squares algorithm for robust edge density estimation, and (2) the reduction from privacy to robustness based on sum-of-squares exponential mechanisms due to Hopkins et al. (STOC 2023).	翻訳日:2024-06-06 08:53:00 公開日:2024-06-03
# BaboonLand Dataset: 野生の霊長類の追跡と、ドローンビデオからの行動認識の自動化 BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos ( http://arxiv.org/abs/2405.17698v3 ) ライセンス: Link先を確認	Isla Duporge, Maksim Kholiavchenko, Roi Harel, Scott Wolf, Dan Rubenstein, Meg Crofoot, Tanya Berger-Wolf, Stephen Lee, Julie Barreau, Jenna Kline, Michelle Ramirez, Charles Stewart,	(参考訳) ドローンを使って自然環境で複数の個人を同時に追跡することは、グループ霊長類の振る舞いをよりよく理解するための強力なアプローチだ。以前の研究では、ビデオデータから霊長類の行動の分類を自動化できることが示されているが、これらの研究は、捕獲や地上カメラで行われている。集団行動と集団の自己組織化を理解するためには、生態的な決定が下される自然環境に関連して行動が観察できるスケールで部隊全体を見る必要がある。本研究では,バブーン検出,追跡,行動認識のための,ドローンビデオからの新たなデータセットを提案する。 Baboon検出データセットは、ドローンビデオにすべてのbaboonをバウンディングボックスで手動でアノテートすることで作成されている。その後、初期の5.3K解像度画像から様々なスケールの画像のピラミッドを作成するためにタイリング法が適用され、約30Kの画像がバブーン検出に使用された。トラッキングデータセットは、すべてのバウンディングボックスがビデオ全体で同じIDに割り当てられている検出データセットから導出される。このプロセスにより、30時間に及ぶ非常に密集した追跡データが得られた。行動認識データセットは、各動物を中心としたビデオサブリージョンであるミニシーンにトラックを変換することで生成され、各ミニシーンは12種類の異なる行動タイプで手動でアノテートされ、20時間以上のデータが得られる。ベンチマーク結果によると、YOLOv8-X検出モデルの平均平均精度(mAP)は92.62\%、BotSort追跡アルゴリズムでは63.81\%、X3D動作認識モデルでは63.97\%である。深層学習を用いて、ドローン映像から野生生物の行動を分類することで、グループ全体の集団行動に対する非侵襲的な洞察を促進する。 Using drones to track multiple individuals simultaneously in their natural environment is a powerful approach for better understanding group primate behavior. Previous studies have demonstrated that it is possible to automate the classification of primate behavior from video data, but these studies have been carried out in captivity or from ground-based cameras. To understand group behavior and the self-organization of a collective, the whole troop needs to be seen at a scale where behavior can be seen in relation to the natural environment in which ecological decisions are made. This study presents a novel dataset from drone videos for baboon detection, tracking, and behavior recognition. The baboon detection dataset was created by manually annotating all baboons in drone videos with bounding boxes. 
A tiling method was subsequently applied to create a pyramid of images at various scales from the original 5.3K resolution images, resulting in approximately 30K images used for baboon detection. The tracking dataset is derived from the detection dataset, where the bounding boxes of each individual are assigned a consistent ID throughout the video. This process resulted in half an hour of very dense tracking data. The behavior recognition dataset was generated by converting tracks into mini-scenes, a video subregion centered on each animal; each mini-scene was manually annotated with 12 distinct behavior types, resulting in over 20 hours of data. Benchmark results show mean average precision (mAP) of 92.62% for the YOLOv8-X detection model, multiple object tracking accuracy (MOTA) of 63.81% for the BotSort tracking algorithm, and micro top-1 accuracy of 63.97% for the X3D behavior recognition model. Using deep learning to classify wildlife behavior from drone footage facilitates non-invasive insight into the collective behavior of an entire group.	翻訳日:2024-06-06 08:53:00 公開日:2024-06-03
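For readers unfamiliar with the tracking metric reported in this entry, MOTA is conventionally defined as below. This is the standard CLEAR MOT definition rather than something stated in the abstract; the symbols are the usual per-frame counts of false negatives (FN), false positives (FP), identity switches (IDSW) and ground-truth objects (GT), and the paper's exact evaluation protocol may differ.

```latex
\mathrm{MOTA} \;=\; 1 - \frac{\sum_t \left( \mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t \right)}{\sum_t \mathrm{GT}_t}
```

Because all three error types are summed in the numerator, MOTA can be much lower than detection mAP even when the detector itself is strong.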
# Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation ( http://arxiv.org/abs/2405.17784v2 ) ライセンス: Link先を確認	Ignat Georgiev, Krishnan Srinivasan, Jie Xu, Eric Heiden, Animesh Garg,	(参考訳) 政策勾配定理を利用したモデル自由強化学習(MFRL)は連続制御タスクにおいてかなりの成功を収めた。しかし、これらのアプローチは、ゼロ階勾配推定による高勾配のばらつきに悩まされ、その結果、準最適ポリシーがもたらされる。逆に、微分可能シミュレーションを用いた第1次モデルベース強化学習(FO-MBRL)法は、ばらつきを低減した勾配を提供するが、物理的接触などの剛体力学を含むシナリオにおいて、誤差をサンプリングする可能性がある。本稿では,この誤差の原因を調査し,厳密なダイナミクスを避けるためにモデルベース地平線を適用して勾配誤差を低減するFO-MBRLアルゴリズムであるAdaptive Horizon Actor-Critic (AHAC)を導入する。実験結果から,AHACはMFRLベースラインより優れており,ローコモーションタスク全体で40%以上の報酬が得られ,壁面時間効率が向上した高次元制御環境への効率なスケーリングが可能であった。 Model-Free Reinforcement Learning (MFRL), leveraging the policy gradient theorem, has demonstrated considerable success in continuous control tasks. However, these approaches are plagued by high gradient variance due to zeroth-order gradient estimation, resulting in suboptimal policies. Conversely, First-Order Model-Based Reinforcement Learning (FO-MBRL) methods employing differentiable simulation provide gradients with reduced variance but are susceptible to sampling error in scenarios involving stiff dynamics, such as physical contact. This paper investigates the source of this error and introduces Adaptive Horizon Actor-Critic (AHAC), an FO-MBRL algorithm that reduces gradient error by adapting the model-based horizon to avoid stiff dynamics. Empirical findings reveal that AHAC outperforms MFRL baselines, attaining 40% more reward across a set of locomotion tasks and efficiently scaling to high-dimensional control environments with improved wall-clock-time efficiency.	翻訳日:2024-06-06 08:53:00 公開日:2024-06-03
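The contrast between zeroth-order and first-order gradient estimation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a toy objective E[f(theta + sigma * eps)] with f(x) = x**2 and Gaussian eps. The function names and parameters are hypothetical, and this is not the AHAC algorithm itself. Both estimators target the same gradient, 2 * theta, but the zeroth-order (score-function) one typically shows far higher variance.

```python
import random

def gradient_estimates(theta=1.0, sigma=0.5, n_samples=200_000, seed=0):
    """Per-sample estimates of d/dtheta E[f(theta + sigma * eps)] with f(x) = x**2."""
    rng = random.Random(seed)
    zeroth, first = [], []
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        x = theta + sigma * eps
        zeroth.append(x * x * eps / sigma)   # zeroth-order: uses only the value f(x)
        first.append(2.0 * x)                # first-order: uses the derivative f'(x) = 2x
    return zeroth, first

def mean_and_variance(values):
    """Sample mean and (population) variance of a list of floats."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance

zeroth, first = gradient_estimates()
print("true gradient:", 2.0)
print("zeroth-order (mean, variance):", mean_and_variance(zeroth))
print("first-order  (mean, variance):", mean_and_variance(first))
```

In a differentiable simulator the first-order estimator corresponds to backpropagating through the dynamics, which is where the stiff-contact error discussed in the abstract arises; the toy objective here is smooth and does not reproduce that effect.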
# 動的治療レジームにおける強化学習 : 批判的再検討の必要性 Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination ( http://arxiv.org/abs/2405.18556v2 ) ライセンス: Link先を確認	Zhiyao Luo, Yangchen Pan, Peter Watkinson, Tingting Zhu,	(参考訳) 急速に変化する医療分野では、動的治療体制(DTR)におけるオフライン強化学習(RL)の実装は、前例のない機会と課題の混在を示している。本稿では、DTRの文脈におけるオフラインRLの現状を批判的に検証する。本稿では,DTRにRLを適用することの再評価について論じる。不整合性,潜在的に不整合性評価指標,ナイーブおよび教師あり学習ベースラインの欠如,既存研究におけるRL定式化の選択の多様さなどの懸念を引用する。公開されているSepsisデータセットを用いて17,000以上の評価実験を行ったケーススタディにより、RLアルゴリズムの性能は評価指標の変化やマルコフ決定プロセス(MDP)の定式化と大きく異なることを示した。驚いたことに、いくつかのケースでは、RLアルゴリズムはポリシー評価手法や報酬設計に従属するランダムなベースラインによって超えることができる。これにより、将来のDTRにおけるより慎重な政策評価とアルゴリズム開発が求められている。さらに,RLに基づく動的治療体制の信頼性向上に向けた可能性についても検討し,コミュニティ内でさらなる議論を招いた。コードはhttps://github.com/GilesLuo/ReassessDTRで入手できる。 In the rapidly changing healthcare landscape, the implementation of offline reinforcement learning (RL) in dynamic treatment regimes (DTRs) presents a mix of unprecedented opportunities and challenges. This position paper offers a critical examination of the current status of offline RL in the context of DTRs. We argue for a reassessment of applying RL in DTRs, citing concerns such as inconsistent and potentially inconclusive evaluation metrics, the absence of naive and supervised learning baselines, and the diverse choice of RL formulation in existing research. Through a case study with more than 17,000 evaluation experiments using a publicly available Sepsis dataset, we demonstrate that the performance of RL algorithms can significantly vary with changes in evaluation metrics and Markov Decision Process (MDP) formulations. Surprisingly, it is observed that in some instances, RL algorithms can be surpassed by random baselines subjected to policy evaluation methods and reward design. This calls for more careful policy evaluation and algorithm development in future DTR works. 
Additionally, we discuss potential enhancements toward more reliable development of RL-based dynamic treatment regimes and invite further discussion within the community. Code is available at https://github.com/GilesLuo/ReassessDTR.	翻訳日:2024-06-06 08:53:00 公開日:2024-06-03
# BadRAG: 大規模言語モデルの検索拡張生成における脆弱性の特定 BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models ( http://arxiv.org/abs/2406.00083v1 ) ライセンス: Link先を確認	Jiaqi Xue, Mengxin Zheng, Yebowen Hu, Fei Liu, Xun Chen, Qian Lou,	(参考訳) LLM(Large Language Models)は、古い情報や不正なデータを生成する傾向によって制約される。 Retrieval-Augmented Generation (RAG) は、検索手法の強みと生成モデルを組み合わせることで、これらの制限に対処する。このアプローチでは、大規模で最新のデータセットから関連する情報を取得し、生成プロセスを強化するためにそれを使用することで、より正確でコンテキスト的に適切なレスポンスが得られます。特にRAGデータベースは、Webなどの公開データからしばしばソースされるためである。本稿では,検索部(RAGデータベース)に対する脆弱性と攻撃とその生成部(LLM)に対する間接攻撃を特定するために,TrojRAG{}を提案する。具体的には、いくつかのカスタマイズされたコンテンツパスを汚染すると、検索バックドアが得られ、検索はクリーンなクエリではうまく機能するが、常にカスタマイズされた有害な逆行クエリを返す。トリガーと毒入りの通路は、様々な攻撃を実装するために高度にカスタマイズできる。例えば、トリガーは「共和党、ドナルド・トランプなど」のような意味的なグループかもしれない。逆行路は異なる内容に合わせて調整することができ、トリガーとリンクするだけでなく、それを変更することなく間接的にジェネリックLSMを攻撃するためにも用いられる。これらの攻撃には、RAGに対するサービス拒否攻撃や、トリガーによって条件付けられたLLM世代に対するセマンティックステアリング攻撃が含まれる。実験の結果,10個の逆行路を毒殺しただけで98.2 %の成功率を誘導し,逆行路を回収できることがわかった。これにより、RAGベースの GPT-4 の拒絶比を 0.01\% から 74.6\% に引き上げるか、ターゲットクエリに対して 0.22\% から 72\% に増加させることができる。 Large Language Models (LLMs) are constrained by outdated information and a tendency to generate incorrect data, commonly referred to as "hallucinations." Retrieval-Augmented Generation (RAG) addresses these limitations by combining the strengths of retrieval-based methods and generative models. This approach involves retrieving relevant information from a large, up-to-date dataset and using it to enhance the generation process, leading to more accurate and contextually appropriate responses. Despite its benefits, RAG introduces a new attack surface for LLMs, particularly because RAG databases are often sourced from public data, such as the web. In this paper, we propose \TrojRAG{} to identify the vulnerabilities and attacks on retrieval parts (RAG database) and their indirect attacks on generative parts (LLMs). 
Specifically, we identify that poisoning several customized content passages could achieve a retrieval backdoor, where the retrieval works well for clean queries but always returns the customized poisoned adversarial passages for queries containing the trigger. Triggers and poisoned passages can be highly customized to implement various attacks. For example, a trigger could be a semantic group like "The Republican Party, Donald Trump, etc." Adversarial passages can be tailored to different contents, not only linked to the triggers but also used to indirectly attack generative LLMs without modifying them. These attacks can include denial-of-service attacks on RAG and semantic steering attacks on LLM generations conditioned by the triggers. Our experiments demonstrate that poisoning just 10 adversarial passages can induce a 98.2% success rate in retrieving the adversarial passages. These passages can then increase the reject ratio of RAG-based GPT-4 from 0.01% to 74.6% or increase the rate of negative responses from 0.22% to 72% for targeted queries.	翻訳日:2024-06-06 08:43:16 公開日:2024-06-03
# DDA:腹腔鏡下手術におけるコントラスト学習のための次元駆動型拡張探索 DDA: Dimensionality Driven Augmentation Search for Contrastive Learning in Laparoscopic Surgery ( http://arxiv.org/abs/2406.00907v1 ) ライセンス: Link先を確認	Yuning Zhou, Henry Badgery, Matthew Read, James Bailey, Catherine E. Davey,	(参考訳) 自己教師付き学習(SSL)は、医用画像における効果的な表現学習の可能性を秘めているが、データ拡張の選択は重要であり、ドメイン固有である。一般的な拡大政策が外科的応用に当てはまるかどうかは不明である。本研究では,DDA(Diality Driven Augmentation Search)と呼ばれる新しい手法を用いて,適切な拡張ポリシーの探索を自動化する。 DDAは、ディープ表現の局所的な次元性をプロキシターゲットとして利用し、コントラスト学習において適切なデータ拡張ポリシーを微分的に検索する。腹腔鏡下手術におけるDDAの有用性と有効性を示すとともに,適切なデータ拡張ポリシーの確立に成功している。 DDAを3つの腹腔鏡画像分類とセグメンテーションタスクで体系的に評価し,既存のベースラインよりも有意に改善した。さらに、DDAの最適化された拡張セットは、医療アプリケーションに対照的な学習を適用する際に、ドメイン固有の依存関係に関する洞察を提供する。例えば、hueは自然画像に有効な拡張であるが、腹腔鏡画像には有利ではない。 Self-supervised learning (SSL) has potential for effective representation learning in medical imaging, but the choice of data augmentation is critical and domain-specific. It remains uncertain if general augmentation policies suit surgical applications. In this work, we automate the search for suitable augmentation policies through a new method called Dimensionality Driven Augmentation Search (DDA). DDA leverages the local dimensionality of deep representations as a proxy target, and differentiably searches for suitable data augmentation policies in contrastive learning. We demonstrate the effectiveness and efficiency of DDA in navigating a large search space and successfully identifying an appropriate data augmentation policy for laparoscopic surgery. We systematically evaluate DDA across three laparoscopic image classification and segmentation tasks, where it significantly improves over existing baselines. Furthermore, DDA's optimised set of augmentations provides insight into domain-specific dependencies when applying contrastive learning in medical applications. For example, while hue is an effective augmentation for natural images, it is not advantageous for laparoscopic images.	翻訳日:2024-06-06 02:47:03 公開日:2024-06-03
# ZeroSmooth: 高フレームレートビデオ生成のためのトレーニング不要ディフューザ適応 ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation ( http://arxiv.org/abs/2406.00908v1 ) ライセンス: Link先を確認	Shaoshu Yang, Yong Zhang, Xiaodong Cun, Ying Shan, Ran He,	(参考訳) ビデオ生成は、特にビデオ拡散モデルの出現以来、近年顕著な進歩を遂げている。多くのビデオ生成モデルは、可塑性合成ビデオ(例えば、安定ビデオ拡散(SVD))を作成できる。しかし、ほとんどのビデオモデルは、GPUメモリが限られているだけでなく、大規模なフレームセットのモデリングが難しいため、低フレームレートのビデオしか生成できない。トレーニングビデオは常に時間圧縮のために指定された間隔で一様にサンプリングされる。以前の方法は、画素空間におけるビデオ補間モデルを後処理段階として訓練するか、特定のベースビデオモデルに対して潜時空間における補間モデルを訓練することでフレームレートを促進させる。本稿では,プラグイン・アンド・プレイ方式で異なるモデルに一般化可能な生成ビデオ拡散モデルの学習自由なビデオ補間法を提案する。ビデオ拡散モデルの特徴空間における非線形性について検討し、設計した隠れ状態補正モジュールを組み込んだ自己カスケード映像拡散モデルに変換する。鍵フレームと補間フレーム間の時間的一貫性を維持するために,自己カスケードアーキテクチャと修正モジュールを提案する。提案手法の有効性を実証するために,複数の人気ビデオモデル上で大規模な評価を行い,特に,大規模な計算資源と大規模データセットによって支援された訓練型補間モデルに匹敵する訓練自由な手法を提案する。 Video generation has made remarkable progress in recent years, especially since the advent of video diffusion models. Many video generation models can produce plausible synthetic videos, e.g., Stable Video Diffusion (SVD). However, most video models can only generate low frame rate videos due to the limited GPU memory as well as the difficulty of modeling a large set of frames. The training videos are always uniformly sampled at a specified interval for temporal compression. Previous methods increase the frame rate by either training a video interpolation model in pixel space as a postprocessing stage or training an interpolation model in latent space for a specific base video model. In this paper, we propose a training-free video interpolation method for generative video diffusion models, which is generalizable to different models in a plug-and-play manner. We investigate the non-linearity in the feature space of video diffusion models and transform a video model into a self-cascaded video diffusion model by incorporating the designed hidden state correction modules. 
The self-cascaded architecture and the correction module are proposed to retain the temporal consistency between key frames and the interpolated frames. Extensive evaluations are performed on multiple popular video models to demonstrate the effectiveness of the proposed method; in particular, our training-free method is comparable to trained interpolation models supported by huge compute resources and large-scale datasets.	翻訳日:2024-06-06 02:47:03 公開日:2024-06-03
# 分散安定状態のキャラクタリゼーションと温度測定 Characterization and thermometry of dissapatively stabilized steady states ( http://arxiv.org/abs/2406.00911v1 ) ライセンス: Link先を確認	George S. Grattan, Alek M. Liguori-Schremp, David. Rodríguez Pérez, Peter Graf, Wes Jones, Eliot Kapit,	(参考訳) 本研究では,ノイズ量子アルゴリズムにおける基底状態と平衡誤差の発見を目的としたアルゴリズムのファミリーの一つであるRelaxational Quantum Eigensolver (RQE) と呼ばれるアルゴリズムについて検討し,その特性について検討する。 RQEでは、二次量子ビットの2番目のレジスタをトロタライズド進化において一次系に弱結合し、アルゴリズムの実行中に補助量子ビットを周期的にリセットすることで、近似ゼロ温度バスを設計する。ランダムゲート誤差の無限温度浴のバランスをとると、RQEは基底状態の定数分に相当する平均エネルギーで状態を返す。熱的挙動からTと偏差を推定するためのいくつかの手法を用いて, このアルゴリズムの定常状態について検討する。特に, これらの系の定常状態は熱分布によってよく近似されることが確認され, 冷却に使用する同じ資源を熱測定に利用でき, 温度の信頼性の高い測定値が得られることを示す。これらの手法は、短期量子ハードウェアで容易に実装することができ、古典的なコンピュータでは近似熱状態のシミュレーションが困難であるハミルトニアンの安定化と探索が可能である。 In this work we study the properties of dissipatively stabilized steady states of noisy quantum algorithms, exploring the extent to which they can be well approximated as thermal distributions, and proposing methods to extract the effective temperature T. We study an algorithm called the Relaxational Quantum Eigensolver (RQE), which is one of a family of algorithms that attempt to find ground states and balance error in noisy quantum devices. In RQE, we weakly couple a second register of auxiliary "shadow" qubits to the primary system in Trotterized evolution, thus engineering an approximate zero-temperature bath by periodically resetting the auxiliary qubits during the algorithm's runtime. Balancing the infinite temperature bath of random gate error, RQE returns states with an average energy equal to a constant fraction of the ground state. We probe the steady states of this algorithm for a range of base error rates, using several methods for estimating both T and deviations from thermal behavior. 
In particular, we both confirm that the steady states of these systems are often well-approximated by thermal distributions, and show that the same resources used for cooling can be adopted for thermometry, yielding a fairly reliable measure of the temperature. These methods could be readily implemented in near-term quantum hardware, and for stabilizing and probing Hamiltonians where simulating approximate thermal states is hard for classical computers.	翻訳日:2024-06-06 02:47:03 公開日:2024-06-03
# 最適確率測度分解のためのワッサーシュタイン勾配流 Wasserstein gradient flow for optimal probability measure decomposition ( http://arxiv.org/abs/2406.00914v1 ) ライセンス: Link先を確認	Jiangze Han, Christopher Thomas Ryan, Xin T. Tong,	(参考訳) クラスタリングとユーザグループ化の応用に着想を得た特定の損失関数を最小化するために,確率測度をK確率サブ尺度に分解する無限次元最適化問題を検討した。最適サブ尺度の支持構造を解析的に検討し、ワッサーシュタイン勾配流に基づくアルゴリズムを導入し、それらの収束を実証する。数値的な結果は、我々のアルゴリズムの実装可能性を示し、さらなる洞察を提供する。 We examine the infinite-dimensional optimization problem of finding a decomposition of a probability measure into K probability sub-measures to minimize specific loss functions inspired by applications in clustering and user grouping. We analytically explore the structures of the support of optimal sub-measures and introduce algorithms based on Wasserstein gradient flow, demonstrating their convergence. Numerical results illustrate the implementability of our algorithms and provide further insights.	翻訳日:2024-06-06 02:47:03 公開日:2024-06-03
# アライメントフリーなRGBT有向物体検出:セマンティック誘導非対称ネットワークと統一ベンチマーク Alignment-Free RGBT Salient Object Detection: Semantics-guided Asymmetric Correlation Network and A Unified Benchmark ( http://arxiv.org/abs/2406.00917v1 ) ライセンス: Link先を確認	Kunpeng Wang, Danying Lin, Chenglong Li, Zhengzheng Tu, Bin Luo,	(参考訳) RGB and Thermal (RGBT) Salient Object Detection (SOD) は、可視画像対と熱画像対の相補的情報を利用して高品質な塩分濃度予測を実現することを目的としている。しかし、既存の手法は、労働集約的な手動整列画像対に適合し、これらの手法を元の非整列画像対に直接適用することで、その性能を著しく低下させる可能性がある。本稿では,手動のアライメントを伴わないRGBT SODと熱画像のペアに対して,RGBT SODに対処するための最初の試みを行う。具体的には2つの新しい構成要素からなるセマンティックス誘導非対称相関ネットワーク(SACNet)を提案する。 1) セマンティクス誘導による注意力を利用した非対称相関モジュール 2)マルチモーダル機能統合のためのRGB機能に応じて,関連する熱的特徴をサンプリングするための関連する特徴サンプリングモジュール。さらに,アライメントのないRGBT SODの研究を容易にするため,2000 RGBと熱画像のペアをアライメントなしで様々な現実世界のシーンから直接キャプチャするUVT2000という統合ベンチマークデータセットを構築した。整列データセットと非整列データセットの併用実験により,本手法の有効性と性能を実証した。データセットとコードはhttps://github.com/Angknpng/SACNetで公開されている。 RGB and Thermal (RGBT) Salient Object Detection (SOD) aims to achieve high-quality saliency prediction by exploiting the complementary information of visible and thermal image pairs, which are initially captured in an unaligned manner. However, existing methods are tailored for manually aligned image pairs, which are labor-intensive, and directly applying these methods to original unaligned image pairs could significantly degrade their performance. In this paper, we make the first attempt to address RGBT SOD for initially captured RGB and thermal image pairs without manual alignment. Specifically, we propose a Semantics-guided Asymmetric Correlation Network (SACNet) that consists of two novel components: 1) an asymmetric correlation module utilizing semantics-guided attention to model cross-modal correlations specific to unaligned salient regions; 2) an associated feature sampling module to sample relevant thermal features according to the corresponding RGB features for multi-modal feature integration. 
In addition, we construct a unified benchmark dataset called UVT2000, containing 2000 RGB and thermal image pairs directly captured from various real-world scenes without any alignment, to facilitate research on alignment-free RGBT SOD. Extensive experiments on both aligned and unaligned datasets demonstrate the effectiveness and superior performance of our method. The dataset and code are available at https://github.com/Angknpng/SACNet.	翻訳日:2024-06-06 02:47:03 公開日:2024-06-03
# 知覚ハッシュアルゴリズムの敵対的安全性の評価 Assessing the Adversarial Security of Perceptual Hashing Algorithms ( http://arxiv.org/abs/2406.00918v1 ) ライセンス: Link先を確認	Jordan Madden, Moxanki Bhavsar, Lhamo Dorje, Xiaohua Li,	(参考訳) 知覚ハッシュアルゴリズム(PHA)は、違法なオンラインコンテンツを識別するために広く利用されている。センシティブなアプリケーションにおける重要な役割を考えると、セキュリティの強みと弱点を理解することが重要です。本稿では,PhotoDNA,PDQ,NeuralHashの3つの主要なPHAを比較し,通常の画像編集攻撃,悪意のある敵攻撃,ハッシュ反転攻撃の3つの典型的な攻撃に対する堅牢性を評価する。一般的な研究とは対照的に,これらのPHAは乱れやクエリ予算に関する現実的な制約を適用した場合,無作為なハッシュ変動のユニークな性質から,ブラックボックス攻撃に対する弾力性を示すことが明らかとなった。さらに,本論文では,元の画像をハッシュビットから再構成し,重要なプライバシー上の懸念を提起する。セキュリティ上の脆弱性を包括的に公開することにより,PHAのセキュリティを効果的に展開するための継続的な取り組みに寄与する。 Perceptual hashing algorithms (PHAs) are utilized extensively for identifying illegal online content. Given their crucial role in sensitive applications, understanding their security strengths and weaknesses is critical. This paper compares three major PHAs deployed widely in practice: PhotoDNA, PDQ, and NeuralHash, and assesses their robustness against three typical attacks: normal image editing attacks, malicious adversarial attacks, and hash inversion attacks. Contrary to prevailing studies, this paper reveals that these PHAs exhibit resilience to black-box adversarial attacks when realistic constraints regarding the distortion and query budget are applied, attributed to the unique property of random hash variations. Moreover, this paper illustrates that original images can be reconstructed from the hash bits, raising significant privacy concerns. By comprehensively exposing their security vulnerabilities, this paper contributes to the ongoing efforts aimed at enhancing the security of PHAs for effective deployment.	翻訳日:2024-06-06 02:47:03 公開日:2024-06-03
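To make the notion of a perceptual hash concrete, here is a minimal sketch of a toy "average hash" compared by Hamming distance. It is an assumption-laden illustration only: PhotoDNA, PDQ and NeuralHash use different and considerably more elaborate constructions, and the function names, grid size and synthetic images below are hypothetical.

```python
import random

def average_hash(image, hash_size=8):
    """Toy perceptual hash: average the image over a hash_size x hash_size grid,
    then emit one bit per cell (1 if the cell is brighter than the overall mean)."""
    rows, cols = len(image), len(image[0])
    bh, bw = rows // hash_size, cols // hash_size
    cells = []
    for i in range(hash_size):
        for j in range(hash_size):
            block = [image[r][c]
                     for r in range(i * bh, (i + 1) * bh)
                     for c in range(j * bw, (j + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if cell > mean else 0 for cell in cells]

def hamming_distance(hash_a, hash_b):
    """Number of hash bits that differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))

def random_image(rng, size=64):
    """Synthetic grayscale image with pixel values in [0, 1)."""
    return [[rng.random() for _ in range(size)] for _ in range(size)]

rng = random.Random(0)
original = random_image(rng)
brightened = [[pixel + 0.1 for pixel in row] for row in original]   # benign edit
unrelated = random_image(rng)                                        # different image

print("original vs brightened:",
      hamming_distance(average_hash(original), average_hash(brightened)))
print("original vs unrelated: ",
      hamming_distance(average_hash(original), average_hash(unrelated)))
```

A matching system would flag an image when its hash lies within some distance threshold of a known hash, so an adversarial attack has to move the hash past that threshold while keeping the image visually close to the original.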
# セグメンションワイド擬似ラベリングによる弱スーパービジョンオーディオ・ビジュアル・ビデオ・パーシングの高速化 Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling ( http://arxiv.org/abs/2406.00919v1 ) ライセンス: Link先を確認	Jinxing Zhou, Dan Guo, Yiran Zhong, Meng Wang,	(参考訳) オーディオ・ビジュアル・ビデオ・パーシング(Audio-Visual Video Parsing)タスクは、可聴ビデオの音声ストリームと視覚ストリームの両方で発生する事象を特定し、時間的にローカライズすることを目的としている。ビデオ・イベント・ラベルのみが提供され、すなわち、モダリティ、ラベルのタイムスタンプが不明な、弱い教師付きで実行されることが多い。高度に注釈付けされたラベルがないため、最近の研究は偽のラベルを活用して監督を強化しようとしている。一般的に使用される戦略は、既知のビデオイベントラベルをモダリティごとに分類することで擬似ラベルを生成することである。しかし、ラベルは依然としてビデオレベルに限定されており、イベントの時間的境界はラベル付きのままである。本稿では,オープンワールドから学んだ事前知識を活用することで,各ビデオセグメントにラベルを明示的に割り当てることのできる,新しい擬似ラベル生成戦略を提案する。具体的には、CLIPとCLAPという大規模な事前学習モデルを用いて、各ビデオセグメントのイベントを推定し、セグメントレベルの視覚的および音声的擬似ラベルを生成する。そこで我々は,これらの擬似ラベルをカテゴリ豊かさとセグメント豊かさを考慮した新たな損失関数を提案する。また、異常に大きな前方損失が発生した場合にそれを反転させることで、視覚的擬似ラベルをさらに改善するためのラベル装飾戦略も採用する。 LLPデータセットの広範な実験を行い、提案した各設計の有効性を実証し、あらゆる種類のイベント解析、すなわち、オーディオイベント、ビジュアルイベント、オーディオ視覚イベントにおける最先端のビデオ解析性能を達成する。また,本手法の利点と一般化を再度検証し,音声・視覚事象の局所化タスクに関する擬似ラベル生成戦略についても検討した。 The Audio-Visual Video Parsing task aims to identify and temporally localize the events that occur in either or both the audio and visual streams of audible videos. It is often performed in a weakly-supervised manner, where only video event labels are provided, i.e., the modalities and the timestamps of the labels are unknown. Due to the lack of densely annotated labels, recent work attempts to leverage pseudo labels to enrich the supervision. A commonly used strategy is to generate pseudo labels by categorizing the known video event labels for each modality. However, the labels are still confined to the video level, and the temporal boundaries of events remain unlabeled. In this paper, we propose a new pseudo label generation strategy that can explicitly assign labels to each video segment by utilizing prior knowledge learned from the open world. 
Specifically, we exploit the large-scale pretrained models, namely CLIP and CLAP, to estimate the events in each video segment and generate segment-level visual and audio pseudo labels, respectively. We then propose a new loss function to exploit these pseudo labels by taking into account their category-richness and segment-richness. A label denoising strategy is also adopted to further improve the visual pseudo labels by flipping them whenever abnormally large forward losses occur. We perform extensive experiments on the LLP dataset, demonstrate the effectiveness of each proposed design, and achieve state-of-the-art video parsing performance on all types of event parsing, i.e., audio event, visual event, and audio-visual event. We also examine the proposed pseudo label generation strategy on a relevant weakly-supervised audio-visual event localization task, and the experimental results again verify the benefits and generalization of our method.	翻訳日:2024-06-06 02:47:03 公開日:2024-06-03
# 二重確率勾配によるSGDのデマイタイズ Demystifying SGD with Doubly Stochastic Gradients ( http://arxiv.org/abs/2406.00920v1 ) ライセンス: Link先を確認	Kyurae Kim, Joohwan Ko, Yi-An Ma, Jacob R. Gardner,	(参考訳) 難解な期待の和の形の最適化の目的は重要度(拡散モデル、変分オートエンコーダなど)が高くなり、「無限のデータ付き有限和」とも呼ばれる。これらの問題に対して、一般的な戦略は、SGDを2倍確率勾配(二重確率勾配)で採用することであり、期待値は各成分の勾配推定器を用いて推定され、その和はこれらの推定器のサブサンプリングによって推定される。その人気にもかかわらず、有界分散のような強い仮定の下では、二重SGDの収束性についてはほとんど知られていない。本研究では,従属成分勾配推定器を含む独立ミニバッチとランダムリシャッフルによる2つのSGDの収束を確立する。特に、依存推定器の場合、我々の分析は効果相関の微粒化解析を可能にする。その結果,1項目あたりの計算予算は$b \times m$で,$b$はミニバッチサイズであり,$m$はモンテカルロのサンプル数である。さらに、ランダムリシャッフル(RR)がサブサンプリングノイズの複雑性依存性を向上させることを証明する。 Optimization objectives in the form of a sum of intractable expectations are rising in importance (e.g., diffusion models, variational autoencoders, and many more), a setting also known as "finite sum with infinite data." For these problems, a popular strategy is to employ SGD with doubly stochastic gradients (doubly SGD): the expectations are estimated using the gradient estimator of each component, while the sum is estimated by subsampling over these estimators. Despite its popularity, little is known about the convergence properties of doubly SGD, except under strong assumptions such as bounded variance. In this work, we establish the convergence of doubly SGD with independent minibatching and random reshuffling under general conditions, which encompasses dependent component gradient estimators. In particular, for dependent estimators, our analysis allows fine-grained analysis of the effect of correlations. As a result, under a per-iteration computational budget of $b \times m$, where $b$ is the minibatch size and $m$ is the number of Monte Carlo samples, our analysis suggests where one should invest most of the budget in general. Furthermore, we prove that random reshuffling (RR) improves the complexity dependence on the subsampling noise.	翻訳日:2024-06-06 02:47:03 public日:2024-06-03
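The two sources of randomness in doubly SGD can be sketched on a toy problem. The sketch assumes the objective F(x) = (1/n) * sum_i E[0.5 * (x - a_i - eps)**2] with standard Gaussian eps, whose minimizer is the mean of the a_i. The data, function names and 1/t step size are hypothetical choices for illustration, and independent minibatching is used rather than random reshuffling.

```python
import random

def doubly_stochastic_gradient(x, targets, b, m, rng):
    """Gradient estimate using b subsampled components and m Monte Carlo draws each."""
    batch = rng.sample(range(len(targets)), b)      # subsample b components of the sum
    total = 0.0
    for i in batch:
        for _ in range(m):                          # m Monte Carlo samples per component
            eps = rng.gauss(0.0, 1.0)
            total += x - targets[i] - eps           # gradient of 0.5 * (x - targets[i] - eps)**2
    return total / (b * m)

def doubly_sgd(targets, b=4, m=8, steps=2000, seed=0):
    """Run SGD with a 1/t step size; each iteration costs b * m gradient evaluations."""
    rng = random.Random(seed)
    x = 0.0
    for t in range(1, steps + 1):
        x -= (1.0 / t) * doubly_stochastic_gradient(x, targets, b, m, rng)
    return x

targets = [-1.0 + 4.0 * i / 19 for i in range(20)]   # hypothetical toy data, mean 1.0
x_hat = doubly_sgd(targets)
print("estimate:", x_hat, "minimizer:", sum(targets) / len(targets))
```

Under a fixed budget b * m per iteration, raising b reduces subsampling noise while raising m reduces Monte Carlo noise; the abstract's analysis concerns how to split that budget, which this sketch does not attempt to reproduce.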
# コントラクトランタイムビヘイビアグラフを用いたEthereum上のPonziスキームの有効検出に向けて Towards Effective Detection of Ponzi schemes on Ethereum with Contract Runtime Behavior Graph ( http://arxiv.org/abs/2406.00921v1 ) ライセンス: Link先を確認	Ruichao Liang, Jing Chen, Cong Wu, Kun He, Yueming Wu, Weisong Sun, Ruiying Du, Qingchuan Zhao, Yang Liu,	(参考訳) 詐欺の一種であるPonziスキームは、近年Ethereumスマートコントラクトで発見されており、巨額の損失をもたらしている。既存の検出方法は、主に静的情報を特徴として利用するルールベースのアプローチと機械学習技術に焦点を当てている。しかし、これらの手法には大きな制限がある。ルールベースのアプローチは、限られた機能とドメイン知識に依存した事前定義されたルールに依存します。マシンラーニングにオプコードのような静的情報を使用することで、Ponziコントラクトを効果的に特徴付けることができなくなり、信頼性と解釈性が低下する。さらに、機械学習のためのトランザクションのような静的情報に依存するには、検出を実現するために一定の数のトランザクションが必要になるため、検出のスケーラビリティが制限され、0日のPonziスキームの識別が妨げられる。本稿では,契約実行時の動作に基づく効率的なPonziスキーム検出手法であるPonziGuardを提案する。 PonziGuard氏は、契約のランタイム動作が、無実のコントラクトからPonziコントラクトを分離する上でより効果的であるという観察に触発されて、契約ランタイム動作グラフ(CRBG)と呼ばれる包括的なグラフ表現を確立し、Ponziコントラクトの振る舞いを正確に表現する。さらに、CRBG上のグラフ分類タスクとして検出プロセスを定式化し、全体的な効果を高める。実験の結果、PonziGuardは、地上の真実のデータセットにおける現在の最先端のアプローチを超越していることがわかった。我々はPonziGuardをEthereum Mainnetに適用し、実世界のシナリオでその効果を実証した。 PonziGuardを使ってEthereum Mainnet上の805のPonzi契約を特定しました。また、最近デプロイされた1万のスマートコントラクトにおいて、0日間のPonziスキームも見つけました。 Ponzi schemes, a form of scam, have been discovered in Ethereum smart contracts in recent years, causing massive financial losses. Existing detection methods primarily focus on rule-based approaches and machine learning techniques that utilize static information as features. However, these methods have significant limitations. Rule-based approaches rely on pre-defined rules with limited capabilities and domain knowledge dependency. Using static information like opcodes for machine learning fails to effectively characterize Ponzi contracts, resulting in poor reliability and interpretability. Moreover, relying on static information like transactions for machine learning requires a certain number of transactions to achieve detection, which limits the scalability of detection and hinders the identification of 0-day Ponzi schemes. 
In this paper, we propose PonziGuard, an efficient Ponzi scheme detection approach based on contract runtime behavior. Inspired by the observation that a contract's runtime behavior is more effective in distinguishing Ponzi contracts from innocent contracts, PonziGuard establishes a comprehensive graph representation called contract runtime behavior graph (CRBG), to accurately depict the behavior of Ponzi contracts. Furthermore, it formulates the detection process as a graph classification task on CRBG, enhancing its overall effectiveness. Experimental results show that PonziGuard surpasses the current state-of-the-art approaches on the ground-truth dataset. We applied PonziGuard to Ethereum Mainnet and demonstrated its effectiveness in real-world scenarios. Using PonziGuard, we identified 805 Ponzi contracts on Ethereum Mainnet, which have resulted in an estimated economic loss of 281,700 Ether or approximately $500 million USD. We also found 0-day Ponzi schemes in the recently deployed 10,000 smart contracts.	翻訳日:2024-06-06 02:47:03 公開日:2024-06-03
# ランダム化中間点を用いた高速拡散型サンプリング:シークエンシャルと並列 Faster Diffusion-based Sampling with Randomized Midpoints: Sequential and Parallel ( http://arxiv.org/abs/2406.00924v1 ) ライセンス: Link先を確認	Shivam Gupta, Linda Cai, Sitan Chen,	(参考訳) 近年,拡散モデルに対する離散化境界の証明への関心が高まっている。これらの研究は、基本的に任意のデータ分布に対して、異なる雑音レベルにおけるスコア関数の十分な正確な推定値が与えられた多項式時間でおよそサンプリングできることを示している。本研究では,ShenとLeeのランダム化中間点法に着想を得た拡散モデルに対する新しい離散化手法を提案する。このアプローチは、全変動距離 ($\widetilde O(d^{5/12})$) における任意の滑らかな分布からサンプリングする際の最もよく知られた次元依存性を、以前の作業から$\widetilde O(\sqrt{d})$と比較する。また,我々のアルゴリズムは,拡散モデルによる並列サンプリングの証明可能な最初の保証として,$\widetilde O(\log^2 d)$並列ラウンドでのみ並列化可能であることを示す。提案手法の副産物として,全変動距離におけるログコンケーブサンプリングのよく研究された問題に対して,従来の作業から得られる次元依存性を$\widetilde O(d^{5/12})$と$\widetilde O(\sqrt{d})$と比較するアルゴリズムと簡単な解析を行う。 In recent years, there has been a surge of interest in proving discretization bounds for diffusion models. These works show that for essentially any data distribution, one can approximately sample in polynomial time given a sufficiently accurate estimate of its score functions at different noise levels. In this work, we propose a new discretization scheme for diffusion models inspired by Shen and Lee's randomized midpoint method for log-concave sampling. We prove that this approach achieves the best known dimension dependence for sampling from arbitrary smooth distributions in total variation distance ($\widetilde O(d^{5/12})$ compared to $\widetilde O(\sqrt{d})$ from prior work). We also show that our algorithm can be parallelized to run in only $\widetilde O(\log^2 d)$ parallel rounds, constituting the first provable guarantees for parallel sampling with diffusion models. As a byproduct of our methods, for the well-studied problem of log-concave sampling in total variation distance, we give an algorithm and simple analysis achieving dimension dependence $\widetilde O(d^{5/12})$ compared to $\widetilde O(\sqrt{d})$ from prior work.	翻訳日:2024-06-06 02:47:03 公開日:2024-06-03
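The core idea of a randomized midpoint step can be sketched on a plain ODE. This is only an illustration under simplifying assumptions: the abstract applies the idea to diffusion-model sampling, whereas the code below integrates the deterministic toy equation dx/dt = -x, and the function name and parameters are hypothetical. The drift is evaluated at a uniformly random time inside each step, so that step * f(x_mid) approximates the integral of the drift over the step without the systematic bias of a fixed evaluation point.

```python
import math
import random

def randomized_midpoint_solve(f, x0, step, n_steps, seed=0):
    """Integrate dx/dt = f(x) using one randomized-midpoint update per step."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        alpha = rng.random()                 # random fraction of the step, uniform on [0, 1)
        x_mid = x + alpha * step * f(x)      # Euler predictor to the random intermediate time
        x = x + step * f(x_mid)              # full step using the slope at that random point
    return x

# Toy ODE dx/dt = -x with x(0) = 1; the exact solution at t = 1 is exp(-1).
x_final = randomized_midpoint_solve(lambda x: -x, x0=1.0, step=0.01, n_steps=100)
print("randomized midpoint:", x_final, "exact:", math.exp(-1.0))
```

In the sampling setting the drift would involve a learned score function and injected noise, and the dimension-dependence results quoted in the abstract come from analysing that stochastic version rather than this toy ODE.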
# ロバストな単眼視眼振に対する自己監督型幾何誘導初期化法 Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry ( http://arxiv.org/abs/2406.00929v1 ) ライセンス: Link先を確認	Takayuki Kanai, Igor Vasiljevic, Vitor Guizilini, Kazuhiro Shintani,	(参考訳) モノクロ・ビジュアル・オドメトリーは、様々な自律システムにおいて重要な技術である。従来の特徴に基づく手法とは対照的に、照明不足、テクスチャ不足、大きな動きなどによる故障に悩まされているため、近年の学習ベースSLAM法は、そのような障害に対処するために反復的な密集バンドル調整を利用して、ドメイン固有のトレーニングデータに依存することなく、様々な実環境における堅牢な正確なローカライゼーションを実現している。しかし、その可能性にもかかわらず、学習ベースのSLAMは、大きな動きとオブジェクトのダイナミクスを含むシナリオに苦戦している。本稿では、屋外ベンチマークにおける主要な障害事例を分析し、最適化プロセスの様々な欠点を明らかにすることで、一般的な学習ベースSLAMモデル(DROID-SLAM)の重大な弱点を診断する。次に,凍結した大規模単眼深度推定を応用した自己監督型前駆体を用いて,密集束調整過程を初期化し,SLAMバックボーンを微調整することなく頑健な視覚計測を行う。その単純さにもかかわらず,提案手法は, DDADベンチマークと同様に, KITTIオドメトリーの大幅な改善を示す。コードと事前訓練されたモデルは、公開時にリリースされる。 Monocular visual odometry is a key technology in a wide variety of autonomous systems. Relative to traditional feature-based methods, that suffer from failures due to poor lighting, insufficient texture, large motions, etc., recent learning-based SLAM methods exploit iterative dense bundle adjustment to address such failure cases and achieve robust accurate localization in a wide variety of real environments, without depending on domain-specific training data. However, despite its potential, learning-based SLAM still struggles with scenarios involving large motion and object dynamics. In this paper, we diagnose key weaknesses in a popular learning-based SLAM model (DROID-SLAM) by analyzing major failure cases on outdoor benchmarks and exposing various shortcomings of its optimization process. We then propose the use of self-supervised priors leveraging a frozen large-scale pre-trained monocular depth estimation to initialize the dense bundle adjustment process, leading to robust visual odometry without the need to fine-tune the SLAM backbone. Despite its simplicity, our proposed method demonstrates significant improvements on KITTI odometry, as well as the challenging DDAD benchmark. 
Code and pre-trained models will be released upon publication.	翻訳日:2024-06-06 02:47:03 公開日:2024-06-03
# LLM評価における有用性の検討 A Survey of Useful LLM Evaluation ( http://arxiv.org/abs/2406.00936v1 ) ライセンス: Link先を確認	Ji-Lun Peng, Sijia Cheng, Egil Diau, Yung-Yu Shih, Po-Heng Chen, Yen-Ting Lin, Yun-Nung Chen,	(参考訳) LLMは様々な研究領域で注目を集めている。したがって、LLMの能力を評価するための精巧な手法は、彼らが行うべき課題と責任を決定するために必要である。本研究は,LLMを有用なツールとして効果的に評価する方法を主に論じる。そこで我々は,「コア能力」から「エージェント」までの2段階のフレームワークを提案し,それぞれの段階における評価手法とともに,それぞれの能力に基づいてLLMをどのように適用できるかを明確に説明した。コア能力とは、LLMが高品質な自然言語テキストを生成するために必要とする能力を指す。 LLMがコア能力を持つことを確認した後、実世界の複雑なタスクをエージェントとして解決することができる。「コア能力」の段階では, LLMの推論能力, 社会的影響, ドメイン知識について議論した。エージェントアプリケーションの動作,計画,ツール学習の具体化を実証した。最後に,LLMの評価手法に現在直面している課題と今後の開発方向性について検討した。 LLMs have attracted attention across various research domains due to their exceptional performance on a wide range of complex tasks. Therefore, refined methods to evaluate the capabilities of LLMs are needed to determine the tasks and responsibilities they should undertake. Our study mainly discussed how LLMs, as useful tools, should be effectively assessed. We proposed a two-stage framework, from "core ability" to "agent", clearly explaining how LLMs can be applied based on their specific capabilities, along with the evaluation methods in each stage. Core ability refers to the capabilities that LLMs need in order to generate high-quality natural language texts. After confirming that LLMs possess core ability, they can solve real-world and complex tasks as agents. In the "core ability" stage, we discussed the reasoning ability, societal impact, and domain knowledge of LLMs. In the "agent" stage, we demonstrated embodied action, planning, and tool learning in LLM agent applications. Finally, we examined the challenges currently confronting the evaluation methods for LLMs, as well as the directions for future development.	翻訳日:2024-06-06 02:47:03 公開日:2024-06-03
# ニューロシンボリックAIによるネットワーク侵入検出における相乗的アプローチ A Synergistic Approach In Network Intrusion Detection By Neurosymbolic AI ( http://arxiv.org/abs/2406.00938v1 ) ライセンス: Link先を確認	Alice Bizzarri, Chung-En Yu, Brian Jalaian, Fabrizio Riguzzi, Nathaniel D. Bastian,	(参考訳) NIDS(Network Intrusion Detection Systems)の一般的なアプローチは、高いリソース消費、重要な計算要求、弱い解釈可能性といった問題によってしばしば妨げられる。さらに、これらのシステムは一般的に、新しく、急速に変化するサイバー脅威を特定するのに苦労する。本稿では、NSAI(Neurosymbolic Artificial Intelligence, NSAI)をNIDSに組み込む可能性について論じ、深層学習のデータ駆動の強みと、サイバーセキュリティにおける動的な課題に取り組むためのAIの論理的推論を組み合わせる。 NIDSにNSAIを組み込むことは、ニューラルネットワークの堅牢なパターン認識と象徴的推論の解釈能力の恩恵を受け、複雑なネットワーク脅威の検出と解釈の両方において潜在的な進歩を示す。ネットワークトラフィックデータ型と機械学習アーキテクチャを解析することにより、NSAIの特有な能力を説明し、ネットワークの振る舞いに関するより深い洞察を提供することで、検知性能とシステムの適応性の両方を改善する。この技術の融合は、従来のNIDSの機能を強化するだけでなく、高度なサイバー脅威に対してより回復力があり、解釈可能で、ダイナミックな防御メカニズムを構築するための将来の発展のステージも設定している。この領域の継続的な進歩は、NIDSを既知の脅威に応答するシステムに転換し、新たな未知の脅威を予想する。 The prevailing approaches in Network Intrusion Detection Systems (NIDS) are often hampered by issues such as high resource consumption, significant computational demands, and poor interpretability. Furthermore, these systems generally struggle to identify novel, rapidly changing cyber threats. This paper delves into the potential of incorporating Neurosymbolic Artificial Intelligence (NSAI) into NIDS, combining deep learning's data-driven strengths with symbolic AI's logical reasoning to tackle the dynamic challenges in cybersecurity, which also includes detailed NSAI techniques introduction for cyber professionals to explore the potential strengths of NSAI in NIDS. The inclusion of NSAI in NIDS marks potential advancements in both the detection and interpretation of intricate network threats, benefiting from the robust pattern recognition of neural networks and the interpretive prowess of symbolic reasoning. 
By analyzing network traffic data types and machine learning architectures, we illustrate NSAI's distinctive capability to offer more profound insights into network behavior, thereby improving both detection performance and the adaptability of the system. This merging of technologies not only enhances the functionality of traditional NIDS but also sets the stage for future developments in building more resilient, interpretable, and dynamic defense mechanisms against advanced cyber threats. The continued progress in this area is poised to transform NIDS into a system that is both responsive to known threats and anticipatory of emerging, unseen ones.	翻訳日:2024-06-06 02:47:03 公開日:2024-06-03
# 時間グラフ上の状態空間モデル:第一原理的研究 State Space Models on Temporal Graphs: A First-Principles Study ( http://arxiv.org/abs/2406.00943v1 ) ライセンス: Link先を確認	Jintang Li, Ruofan Wu, Xinzhou Jin, Boqun Ma, Liang Chen, Zibin Zheng,	(参考訳) 過去数年間、ディープグラフ学習の研究は静的グラフから時間グラフに移行し、動的な振る舞いを示す実世界の複雑なシステムに応答した。実際には、時間グラフは、離散時間ポイントで観測された静的グラフスナップショットの順序列として形式化される。 RNNやTransformerのようなシーケンスモデルは、このような時間グラフをモデル化するための主要なバックボーンネットワークである。しかし、有望な結果にもかかわらず、RNNは長距離依存に苦しむ一方、トランスフォーマーは二次計算の複雑さに悩まされる。近年, 連続時間線形力学系の離散化表現として表される状態空間モデル (SSM) が注目され, 独立シーケンスモデリングにおいて飛躍的な進歩を遂げている。本研究では,SSM理論を時間グラフに拡張する原理的な調査を行い,ラプラシアン正規化項の採用により,構造化情報をオンライン近似対象に組み込むことにより,時間グラフに拡張する。創発的連続時間システムは、新しいアルゴリズム課題を導入し、時間グラフのダイナミクスをモデル化するためのグラフ状態空間モデルであるGraphSSMの開発を必要とします。各種時間グラフベンチマークにおけるGraphSSMフレームワークの有効性を実験的に検証した。 Over the past few years, research on deep graph learning has shifted from static graphs to temporal graphs in response to real-world complex systems that exhibit dynamic behaviors. In practice, temporal graphs are formalized as an ordered sequence of static graph snapshots observed at discrete time points. Sequence models such as RNNs or Transformers have long been the predominant backbone networks for modeling such temporal graphs. Yet, despite the promising results, RNNs struggle with long-range dependencies, while transformers are burdened by quadratic computational complexity. Recently, state space models (SSMs), which are framed as discretized representations of an underlying continuous-time linear dynamical system, have garnered substantial attention and achieved breakthrough advancements in independent sequence modeling. In this work, we undertake a principled investigation that extends SSM theory to temporal graphs by integrating structural information into the online approximation objective via the adoption of a Laplacian regularization term. 
The emergent continuous-time system introduces novel algorithmic challenges, thereby necessitating our development of GraphSSM, a graph state space model for modeling the dynamics of temporal graphs. Extensive experimental results demonstrate the effectiveness of our GraphSSM framework across various temporal graph benchmarks.	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# 検索機能強化ジェネレーションの二重性を明らかにする:理論的解析と実践的解法 Unveil the Duality of Retrieval-Augmented Generation: Theoretical Analysis and Practical Solution ( http://arxiv.org/abs/2406.00944v1 ) ライセンス: Link先を確認	Shicheng Xu, Liang Pang, Huawei Shen, Xueqi Cheng,	(参考訳) Retrieval-augmented Generation (RAG) は、検索したテキストを利用して大きな言語モデル(LLM)を強化する。しかし、研究によると、RAGは一貫して有効ではなく、ノイズや不正な検索されたテキストのためにLLMを誤解させることもある。これは、RAGが利益とデトリメントの両方を含む双対性を持っていることを示唆している。多くの既存の手法がこの問題に対処しようとするが、RAGにおける双対性の理論的な説明は欠如している。この双対性における利益と損失は、説明可能な方法で定量化または比較できないブラックボックスのままである。本稿では,(1)RAG予測から切り離して形式化する,(2)表現の類似性による値のギャップを近似する,(3)それらの間のトレードオフ機構を確立し,それらを説明し,定量化し,同等にすることによる,RAGの利益と有害性の基本的な説明を与えるための第一歩を踏み出した。検索したテキストとLLMの知識の分布差が両刃剣として機能し,利益と損益の両方をもたらすことを示した。また,RAGの実際の効果がトークンレベルで予測可能であることも証明した。提案手法は, トークンレベルでのLLMとRAGの協調生成を実現し, 利益の確保と損耗の回避を図るための, 実用的新しい手法であるX-RAGを提案する。 OPT, LLaMA-2, Mistral などの LLM に基づく実世界のタスクにおける実験は, 提案手法の有効性を示し, 理論的結果を支援する。 Retrieval-augmented generation (RAG) utilizes retrieved texts to enhance large language models (LLMs). However, studies show that RAG is not consistently effective and can even mislead LLMs due to noisy or incorrect retrieved texts. This suggests that RAG possesses a duality including both benefit and detriment. Although many existing methods attempt to address this issue, they lack a theoretical explanation for the duality in RAG. The benefit and detriment within this duality remain a black box that cannot be quantified or compared in an explainable manner. This paper takes the first step in theoretically giving the essential explanation of benefit and detriment in RAG by: (1) decoupling and formalizing them from RAG prediction, (2) approximating the gap between their values by representation similarity and (3) establishing the trade-off mechanism between them, to make them explainable, quantifiable, and comparable. We demonstrate that the distribution difference between retrieved texts and LLMs' knowledge acts as double-edged sword, bringing both benefit and detriment. 
We also prove that the actual effect of RAG can be predicted at token level. Based on our theory, we propose a practical novel method, X-RAG, which achieves collaborative generation between pure LLM and RAG at token level to preserve benefit and avoid detriment. Experiments in real-world tasks based on LLMs including OPT, LLaMA-2, and Mistral show the effectiveness of our method and support our theoretical results.	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# 擬似3次元変換に基づく医用自己監督表現学習のクロス次元化 Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation ( http://arxiv.org/abs/2406.00947v1 ) ライセンス: Link先を確認	Fei Gao, Siwen Wang, Churan Wang, Fandong Zhang, Hong-Yu Zhou, Yizhou Wang, Gang Yu, Yizhou Yu,	(参考訳) 医用画像解析は、アノテーションの有無にかかわらず、データの不足に悩まされる。これは、3Dの医療画像に関してさらに顕著になる。 SSL(Self-Supervised Learning)は、ラベルのないデータを使用することで、この状況を部分的に緩和することができる。しかし、既存のSSLメソッドのほとんどは、単一の次元(例えば2Dや3D)のデータしか利用できず、異なる次元を持つデータを使ってトレーニングデータセットを拡張できない。本稿では,CDSSL-P3Dをベースとした新しい3次元SSLフレームワークを提案する。具体的には、2D画像を3Dデータに整合したフォーマットに変換するim2colアルゴリズムに基づく画像変換を提案する。この変換は2次元および3次元データのシームレスな統合を可能にし、3次元医用画像解析のための相互教師あり学習を容易にする。我々は,2次元および3次元の分類とセグメンテーションを含む,13の下流タスクについて広範な実験を行った。その結果,CDSSL-P3Dは優れた性能を示し,他の高度なSSL手法よりも優れていた。 Medical image analysis suffers from a shortage of data, whether annotated or not. This becomes even more pronounced when it comes to 3D medical images. Self-Supervised Learning (SSL) can partially ease this situation by using unlabeled data. However, most existing SSL methods can only make use of data in a single dimensionality (e.g. 2D or 3D), and are incapable of enlarging the training dataset by using data with differing dimensionalities jointly. In this paper, we propose a new cross-dimensional SSL framework based on a pseudo-3D transformation (CDSSL-P3D), that can leverage both 2D and 3D data for joint pre-training. Specifically, we introduce an image transformation based on the im2col algorithm, which converts 2D images into a format consistent with 3D data. This transformation enables seamless integration of 2D and 3D data, and facilitates cross-dimensional self-supervised learning for 3D medical image analysis. We run extensive experiments on 13 downstream tasks, including 2D and 3D classification and segmentation. The results indicate that our CDSSL-P3D achieves superior performance, outperforming other advanced SSL methods.	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# 偽ニュースと偽ニュースが公共政策にどのような影響を及ぼすか--国際文献のレビュー How disinformation and fake news impact public policies?: A review of international literature ( http://arxiv.org/abs/2406.00951v1 ) ライセンス: Link先を確認	Ergon Cugler de Moraes Silva, Jose Carlos Vaz,	(参考訳) 本研究では,偽情報が公共政策に与える影響について検討する。 8つのデータベースで28組のキーワードを使用して、Prisma 2020モデル(Page et al , 2021)に従って体系的なレビューを行った。 4,128の論文や資料にフィルター・包含・排他基準を適用した結果,46の出版物が分析され,23の偽情報影響カテゴリーが得られた。これらのカテゴリーは、国家と社会とアクターとダイナミクスの2つの主要な軸に分けられ、国家俳優、社会俳優、国家ダイナミクス、社会ダイナミクスへの影響をカバーした。その結果, 偽情報が公共の意思決定, 政策の遵守, 機関の威信, 現実の認識, 消費, 公衆衛生などの側面に影響を及ぼすことが明らかとなった。さらに, 偽情報を公的な問題として扱い, 公共政策研究課題に組み込むことが, 政府の行動への影響を緩和するための戦略開発に寄与することが示唆された。 This study investigates the impact of disinformation on public policies. Using 28 sets of keywords in eight databases, a systematic review was carried out following the Prisma 2020 model (Page et al., 2021). After applying filters and inclusion and exclusion criteria to 4,128 articles and materials found, 46 publications were analyzed, resulting in 23 disinformation impact categories. These categories were organized into two main axes: State and Society and Actors and Dynamics, covering impacts on State actors, society actors, State dynamics and society dynamics. The results indicate that disinformation affects public decisions, adherence to policies, prestige of institutions, perception of reality, consumption, public health and other aspects. Furthermore, this study suggests that disinformation should be treated as a public problem and incorporated into the public policy research agenda, contributing to the development of strategies to mitigate its effects on government actions.	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# アノテーションガイドラインに基づく知識強化:教育用テキスト分類のための大規模言語モデルの実現を目指して Annotation Guidelines-Based Knowledge Augmentation: Towards Enhancing Large Language Models for Educational Text Classification ( http://arxiv.org/abs/2406.00954v1 ) ライセンス: Link先を確認	Shiqi Liu, Sannyuya Liu, Lele Sha, Zijie Zeng, Dragan Gasevic, Zhi Liu,	(参考訳) 各種機械学習アプローチは、学習エンゲージメントの指標、すなわち学習エンゲージメント分類(LEC)を識別する教育テキストの自動分類において、大きな人気を得ている。 LECは、人間の学習プロセスに関する包括的な洞察を提供し、自然言語処理(NLP)、学習分析、教育データマイニングなど、さまざまな研究コミュニティから大きな関心を集めている。近年,ChatGPT などの大規模言語モデル (LLM) は,様々な NLP タスクにおいて顕著な性能を示した。しかし, LECタスクにおける総合的な評価と改善アプローチについては, 十分には検討されていない。本研究では,アノテーションガイドラインに基づく知識向上手法(AGKA)を提案する。 AGKAはGPT 4.0を使用して、アノテーションガイドラインからラベル定義の知識を取得し、ランダムアンダーサンプラーを適用していくつかの典型的な例を選択する。その後、行動分類(クエストと緊急度)、感情分類(バイナリと認識の感情)、認知分類(オピニオンと認知の存在)の6つのLECデータセットを含むLECの体系的評価ベンチマークを行う。実験の結果、AGKAは非微調整LDM(特にGPT 4.0とLlama 3 70B)を増強できることが示された。 AGKAによるGPT 4.0は、単純なバイナリ分類データセット上でBERTやRoBERTaのようなフルショットの微調整モデルよりも優れている。しかし、GPT 4.0は複雑な意味情報の深い理解を必要とするマルチクラスタスクで遅れている。特に、Llama 370B と AGKA はオープンソース LLM をベースとした有望な組み合わせである。加えて、LLMは、マルチクラスの分類において、類似した名前のラベルを区別するのに苦労している。 Various machine learning approaches have gained significant popularity for the automated classification of educational text to identify indicators of learning engagement -- i.e. learning engagement classification (LEC). LEC can offer comprehensive insights into human learning processes, attracting significant interest from diverse research communities, including Natural Language Processing (NLP), Learning Analytics, and Educational Data Mining. Recently, Large Language Models (LLMs), such as ChatGPT, have demonstrated remarkable performance in various NLP tasks. However, their comprehensive evaluation and improvement approaches in LEC tasks have not been thoroughly investigated. In this study, we propose the Annotation Guidelines-based Knowledge Augmentation (AGKA) approach to improve LLMs. 
AGKA employs GPT 4.0 to retrieve label definition knowledge from annotation guidelines, and then applies the random under-sampler to select a few typical examples. Subsequently, we conduct a systematic evaluation benchmark of LEC, which includes six LEC datasets covering behavior classification (question and urgency level), emotion classification (binary and epistemic emotion), and cognition classification (opinion and cognitive presence). The study results demonstrate that AGKA can enhance non-fine-tuned LLMs, particularly GPT 4.0 and Llama 3 70B. GPT 4.0 with AGKA few-shot outperforms full-shot fine-tuned models such as BERT and RoBERTa on simple binary classification datasets. However, GPT 4.0 lags in multi-class tasks that require a deep understanding of complex semantic information. Notably, Llama 3 70B with AGKA is a promising combination based on an open-source LLM, because its performance is on par with the closed-source GPT 4.0 with AGKA. In addition, LLMs struggle to distinguish between labels with similar names in multi-class classification.	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# ビデオ会議はどのように表現を変えるか How Video Meetings Change Your Expression ( http://arxiv.org/abs/2406.00955v1 ) ライセンス: Link先を確認	Sumit Sarin, Utkarsh Mall, Purva Tendulkar, Carl Vondrick,	(参考訳) ビデオ通話で話すと表情が変わるのか? 人のビデオが2つあるとすると、各セットに特有の時空間パターンを自動的に見つけ出そうとする。既存の方法は差別的アプローチを使用して、ポストホックな説明可能性分析を行う。このような手法は、明らかなデータセットバイアス以上の洞察を与えることができないため不十分であり、その説明は、人間自身がそのタスクに長けている場合に限り有用である。その代わりに、生成ドメイン翻訳のレンズを用いてこの問題に取り組む。本手法は、学習された、入力に依存した時空間的特徴の詳細なレポートと、それらがドメイン間で変化する範囲を出力する。本研究では,F2F(F2F)とVC(Voice-calls)の対話行動の違いを,本手法が検出できることを実証する。また,本手法が大統領通信方式の違いを発見する上での有効性を示す。さらに、教師なしの方法で表現を分離するビデオにおける時間的変化点を予測でき、モデルの解釈可能性や有用性を高めることができる。最後に,F2F設定で記録したようにビデオ通話を変換して表示する手法を提案する。実験と可視化は、我々のアプローチが様々な行動を発見し、人間の行動をより深く理解するための一歩を踏み出したことを示している。 Do our facial expressions change when we speak over video calls? Given two unpaired sets of videos of people, we seek to automatically find spatio-temporal patterns that are distinctive of each set. Existing methods use discriminative approaches and perform post-hoc explainability analysis. Such methods are insufficient as they are unable to provide insights beyond obvious dataset biases, and the explanations are useful only if humans themselves are good at the task. Instead, we tackle the problem through the lens of generative domain translation: our method generates a detailed report of learned, input-dependent spatio-temporal features and the extent to which they vary between the domains. We demonstrate that our method can discover behavioral differences between conversing face-to-face (F2F) and on video-calls (VCs). We also show the applicability of our method on discovering differences in presidential communication styles. Additionally, we are able to predict temporal change-points in videos that decouple expressions in an unsupervised way, and increase the interpretability and usefulness of our model. Finally, our method, being generative, can be used to transform a video call to appear as if it were recorded in a F2F setting. 
Experiments and visualizations show our approach is able to discover a range of behaviors, taking a step towards deeper understanding of human behaviors.	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# 飛行中のセグメンテーションを改善する:医療画像セグメンテーションのための補助的オンライン学習と適応的融合 Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation ( http://arxiv.org/abs/2406.00956v1 ) ライセンス: Link先を確認	Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang,	(参考訳) SAM(Segment Anything Model)の現在の変種は、オリジナルのSAMとメディカルSAMを含むが、医用画像の十分な正確なセグメンテーションを生成できない。医療画像の文脈では、SAMがそのセグメンテーション予測を生成した後、人間の専門家が特定のテストサンプルのセグメンテーションを修正することは珍しくない。これらの修正は通常、最先端のアノテーションツールを使用した手動または半手動の修正を必要とする。このプロセスにより、オンライン機械学習の利点を活用して、テスト期間中にセグメンツ・ア・シング(SA)を強化する新しいアプローチを導入する。医用画像におけるSAのセグメンテーション品質を改善することを目的として,オンライン学習のための修正アノテーションを用いた。 SAMのような大規模ビジョンモデルと統合したオンライン学習の有効性と効率を向上させるため,AuxOL(Auxiliary Online Learning)と呼ばれる新しい手法を提案する。 AuxOLはSAM(ジェネラリスト)と連携して小さな補助モデルを作成し、適用し、適応的なオンラインバッチと適応的なセグメンテーション融合を必要とする。 4つの医用画像モダリティをカバーする8つのデータセットを用いて実験を行い,提案手法の有効性を検証した。本研究は,下流セグメンテーションタスク(例えば,医用画像セグメンテーション)におけるSAを強化するための,新しい,実用的で効果的なアプローチを提案し,検証する。 The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entail manual or semi-manual corrections employing state-of-the-art annotation tools. Motivated by this process, we introduce a novel approach that leverages the advantages of online machine learning to enhance Segment Anything (SA) during test time. We employ rectified annotations to perform online learning, with the aim of improving the segmentation quality of SA on medical images. To improve the effectiveness and efficiency of online learning when integrated with large-scale vision models like SAM, we propose a new method called Auxiliary Online Learning (AuxOL). 
AuxOL creates and applies a small auxiliary model (specialist) in conjunction with SAM (generalist), and entails adaptive online-batch and adaptive segmentation fusion. Experiments conducted on eight datasets covering four medical imaging modalities validate the effectiveness of the proposed method. Our work proposes and validates a new, practical, and effective approach for enhancing SA on downstream segmentation tasks (e.g., medical image segmentation).	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# 矛盾する視点をナビゲートする:学習に対する信頼を損なう Navigating Conflicting Views: Harnessing Trust for Learning ( http://arxiv.org/abs/2406.00958v1 ) ライセンス: Link先を確認	Jueqing Lu, Lan Du, Wray Buntine, Myong Chol Jung, Joanna Dipnall, Belinda Gabbe,	(参考訳) 対立を解決することは、多視点分類の決定をより信頼できるものにするために不可欠である。すべての視点が同一に重要であり、厳密に整合していると仮定して、異なる視点における一貫した情報表現の学習について多くの研究がなされている。しかし、現実のマルチビューデータは必ずしもこれらの仮定に従わないかもしれない。この問題に対処するために,異なる視点の衝突が発生する可能性のあるシナリオにおいて,既存の信頼に値するフレームワークを強化するための,計算信頼に基づく割引手法を開発した。その信念融合プロセスは、個別の視点による予測の信頼性を、確率に敏感な信頼割引機構を通じて考慮する。提案手法は,Top-1精度,AUC-ROC for Uncertainty-Aware Prediction,Fleiss' Kappa,および基底真理ラベルを考慮したMulti-View Agreement with Ground Truthという新たな指標を用いて,実世界の6つのデータセットに対して評価を行った。実験結果から,コンフリクトを効果的に解決し,実世界のアプリケーションにおいてより信頼性の高いマルチビュー分類モデルを実現する方法が示された。 Resolving conflicts is essential to make the decisions of multi-view classification more reliable. Much research has been conducted on learning consistent informative representations among different views, assuming that all views are identically important and strictly aligned. However, real-world multi-view data may not always conform to these assumptions, as some views may express distinct information. To address this issue, we develop a computational trust-based discounting method to enhance the existing trustworthy framework in scenarios where conflicts between different views may arise. Its belief fusion process considers the trustworthiness of predictions made by individual views via an instance-wise probability-sensitive trust discounting mechanism. We evaluate our method on six real-world datasets, using Top-1 Accuracy, AUC-ROC for Uncertainty-Aware Prediction, Fleiss' Kappa, and a new metric called Multi-View Agreement with Ground Truth that takes into consideration the ground truth labels. The experimental results show that computational trust can effectively resolve conflicts, paving the way for more reliable multi-view classification models in real-world applications.	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# 動的ユーザ参加によるフェデレーション・アンラーニングにおけるデータプライバシの保証 Guaranteeing Data Privacy in Federated Unlearning with Dynamic User Participation ( http://arxiv.org/abs/2406.00966v1 ) ライセンス: Link先を確認	Ziyao Liu, Yu Jiang, Weifeng Jiang, Jiale Guo, Jun Zhao, Kwok-Yan Lam,	(参考訳) フェデレート・アンラーニング(FU)は、訓練されたグローバルなFLモデルから、FL(Federated Learning)ユーザのデータの影響を排除するために、その能力で注目を集めている。単純なFUメソッドでは、未学習のユーザを削除し、その後、残りのすべてのユーザとスクラッチから新しいグローバルFLモデルをトレーニングする。非学習効率を高めるため、広く採用されている戦略では、FLユーザをクラスタに分割し、各クラスタが独自のFLモデルを維持している。最終的な推論は、これらのサブモデルの推論から過半数の投票を集約することで決定される。これにより、未学習プロセスを個々のクラスタに閉じ込めてユーザを除去し、未学習の効率を高める。しかし、現在のクラスタリングベースのFUスキームは、学習効率を高めるためにクラスタリングの精細化に重点を置いているが、FLユーザの勾配からの情報漏洩の可能性を見落としている。通常、各クラスタにセキュアアグリゲーション(SecAgg)スキームを統合することで、プライバシ保護FUが容易になる。それでも、SecAggスキームをシームレスに組み込んだクラスタリング方法論の構築は、特に敵ユーザや動的ユーザを含むシナリオでは難しい。本稿では,SecAggプロトコルをクラスタリングをベースとした,最も広く使用されているフェデレーションアンラーニングスキームに統合して,動的ユーザ参加を効果的に管理しながらプライバシの確保を目的とした,プライバシ保護型FUフレームワークの確立を体系的に検討する。総合的な理論的評価と実験結果から,提案手法は,ユーザの参加状況に応じて,プライバシー保護とレジリエンスの向上とともに,同等の非学習効果を達成できることが示された。 Federated Unlearning (FU) is gaining prominence for its capacity to eliminate influences of Federated Learning (FL) users' data from trained global FL models. A straightforward FU method involves removing the unlearned users and subsequently retraining a new global FL model from scratch with all remaining users, a process that leads to considerable overhead. To enhance unlearning efficiency, a widely adopted strategy employs clustering, dividing FL users into clusters, with each cluster maintaining its own FL model. The final inference is then determined by aggregating the majority vote from the inferences of these sub-models. This method confines unlearning processes to individual clusters for removing a user, thereby enhancing unlearning efficiency by eliminating the need for participation from all remaining users. 
However, current clustering-based FU schemes mainly concentrate on refining clustering to boost unlearning efficiency but overlook the potential information leakage from FL users' gradients, a privacy concern that has been extensively studied. Typically, integrating secure aggregation (SecAgg) schemes within each cluster can facilitate a privacy-preserving FU. Nevertheless, crafting a clustering methodology that seamlessly incorporates SecAgg schemes is challenging, particularly in scenarios involving adversarial users and dynamic users. To this end, we systematically explore the integration of SecAgg protocols within the most widely used federated unlearning scheme, which is based on clustering, to establish a privacy-preserving FU framework, aimed at ensuring privacy while effectively managing dynamic user participation. Comprehensive theoretical assessments and experimental results show that our proposed scheme achieves comparable unlearning effectiveness, alongside offering improved privacy protection and resilience in the face of varying user participation.	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# 多様な視点を識別するためにRLを用いると、ソーシャルメディア上のコミュニティを識別するためのLLM能力が向上する Using RL to Identify Divisive Perspectives Improves LLMs Abilities to Identify Communities on Social Media ( http://arxiv.org/abs/2406.00969v1 ) ライセンス: Link先を確認	Nikhil Mehta, Dan Goldwasser,	(参考訳) ソーシャルメディアの大規模利用と、その大きな影響が組み合わさって、ソーシャルメディアを理解することがますます重要になっている。特に、ユーザコミュニティを特定することは、多くのダウンストリームタスクに役立ちます。しかし、特にモデルが過去のデータに基づいてトレーニングされ、将来のテストを行う場合、これは難しい。本稿では,Large Language Models (LLMs) を利用してユーザコミュニティの同定を行う。また,ChatGPT など多くの LLM が固定されており,ブラックボックスとして扱わなければならないという事実から,より小規模な LLM を訓練することで,それらをより促進するためのアプローチを提案する。我々は、この小さなモデルをトレーニングするための戦略を考案し、コミュニティを検出するLLMのより大きな能力をどのように改善するかを示した。実験の結果、RedditとTwitterのデータ、コミュニティ検出、ボット検出、ニュースメディアのプロファイリングのタスクが改善された。 The large-scale usage of social media, combined with its significant impact, has made it increasingly important to understand it. In particular, identifying user communities can be helpful for many downstream tasks. However, this is difficult, particularly when models are trained on past data and tested on future data. In this paper, we hypothesize that Large Language Models (LLMs) can be used to better identify user communities. Because many LLMs, such as ChatGPT, are fixed and must be treated as black boxes, we propose an approach to better prompt them, by training a smaller LLM to do this. We devise strategies to train this smaller model, showing how it can improve the larger LLM's ability to detect communities. Experimental results show improvements on Reddit and Twitter data, on the tasks of community detection, bot detection, and news media profiling.	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# MiniGPT-Reverse-Designing: MiniGPT-4を用いた画像調整予測 MiniGPT-Reverse-Designing: Predicting Image Adjustments Utilizing MiniGPT-4 ( http://arxiv.org/abs/2406.00971v1 ) ライセンス: Link先を確認	Vahid Azizi, Fatemeh Koochaki,	(参考訳) VLM(Vision-Language Models)は近年,LLM(Large Language Models)との統合によって,大幅な進歩を遂げている。画像とテキストのモダリティを同時に処理するVLMは、様々なマルチモーダルタスクにおける画像とテキスト間の相互作用を学習し、理解する能力を示している。複雑な視覚言語タスクとして定義できるリバースデザインは、ソースイメージ、編集バージョン、オプションの高レベルテキスト編集記述を与えられたときに、編集とそのパラメータを予測することを目的としている。このタスクでは、VLMは、ソースイメージ、編集されたバージョン、オプションのテキストコンテキスト間の相互作用を、従来の視覚言語タスクを超えて同時に理解する必要がある。本稿では,逆設計タスクのためにMiniGPT-4を拡張し,微調整する。本実験では, 逆設計などの複雑なタスクに対して, 市販VLM, 特にMiniGPT-4の拡張性を示す。 Code is available at https://github.com/VahidAz/MiniGPT-Reverse-Designing Vision-Language Models (VLMs) have recently seen significant advancements through integrating with Large Language Models (LLMs). The VLMs, which process image and text modalities simultaneously, have demonstrated the ability to learn and understand the interaction between images and texts across various multi-modal tasks. Reverse designing, which could be defined as a complex vision-language task, aims to predict the edits and their parameters, given a source image, an edited version, and an optional high-level textual edit description. This task requires VLMs to comprehend the interplay between the source image, the edited version, and the optional textual context simultaneously, going beyond traditional vision-language tasks. In this paper, we extend and fine-tune MiniGPT-4 for the reverse designing task. Our experiments demonstrate the extensibility of off-the-shelf VLMs, specifically MiniGPT-4, for more complex tasks such as reverse designing. Code is available at https://github.com/VahidAz/MiniGPT-Reverse-Designing	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# パーソナライズされた埋め込み領域の省力化によるコールドスタート勧告 Cold-start Recommendation by Personalized Embedding Region Elicitation ( http://arxiv.org/abs/2406.00973v1 ) ライセンス: Link先を確認	Hieu Trung Nguyen, Duy Nguyen, Khoa Doan, Viet Anh Nguyen,	(参考訳) レーティング・エリケーション(英: Rating elicitation)は、冷間開始時に、利用者の好みを事前に知ることなく、新たに到着したユーザに対して、商品を推薦する必要があるようなレコメンデーションシステムの成功要素である。既存のelicitationメソッドでは,ユーザの好みを学習し,残りの項目に対してユーザの好みを推測するために,固定されたアイテムセットを使用している。固定されたシードセットを使用することで、潜在的に多様な好みを持つすべての新規ユーザにとって、シードセットが最適ではないため、レコメンデーションシステムのパフォーマンスを制限することができる。本稿では、この課題を2段階のパーソナライズド・エイコレーション・スキームを用いて解決する。まず,"burn-in" フェーズにおいて,ユーザに対して,人気項目の小さなセットの評価を依頼する。第2に、ユーザの嗜好や表現を洗練させるために、適応項目の格付けを順次求めている。プロセス全体を通して、システムは、ポイント推定ではなく、リージョン推定によって、ユーザの埋め込み値を表す。ユーザの商品に対するレーティングを問うことで得られる情報の値は、ユーザの真の埋め込み値の信頼性の高い領域中心埋め込み空間からの距離によって定量化される。最後に、ユーザの嗜好領域を考慮したレコメンデーションを順次生成する。提案手法では,各サブプロブレムを効率よく実装可能であることを示す。さらに,提案手法の有効性を実証的に実証した。 Rating elicitation is a key element for recommender systems to perform well at cold-starting, in which the systems need to recommend items to a newly arrived user with no prior knowledge about the user's preference. Existing elicitation methods employ a fixed set of items to learn the user's preference and then infer the user's preferences on the remaining items. Using a fixed seed set can limit the performance of the recommendation system since the seed set is unlikely to be optimal for all new users with potentially diverse preferences. This paper addresses this challenge using a 2-phase, personalized elicitation scheme. First, the elicitation scheme asks users to rate a small set of popular items in a "burn-in" phase. Second, it sequentially asks the user to rate adaptive items to refine the preference and the user's representation. Throughout the process, the system represents the user's embedding value not by a point estimate but by a region estimate. 
The value of information obtained by asking for the user's rating on an item is quantified by the distance from the center of the region in the embedding space that contains, with high confidence, the true embedding value of the user. Finally, the recommendations are successively generated by considering the preference region of the user. We show that each subproblem in the elicitation scheme can be efficiently implemented. Further, we empirically demonstrate the effectiveness of the proposed method against existing rating-elicitation methods on several prominent datasets.	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# Luna: 高精度で低コストな言語モデル幻覚をキャッチするための評価基礎モデル Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost ( http://arxiv.org/abs/2406.00975v1 ) ライセンス: Link先を確認	Masha Belyi, Robert Friel, Shuai Shao, Atindriyo Sanyal,	(参考訳) Retriever Augmented Generation (RAG) システムは,外部知識検索機構を組み込むことで,言語モデルの能力向上に重要な役割を担っている。しかし、これらのシステムを業界アプリケーションに展開する上で重要な課題は幻覚の検出と緩和である。この問題に対処することは、様々な業界環境で大きな言語モデル(LLM)が生み出す応答の信頼性と正確性を保証するために不可欠である。現在の幻覚検出技術は、精度、低レイテンシ、低コストを同時に提供できない。本稿では,RAG設定における幻覚検出のためのLuna: a DeBERTA-large (440M)エンコーダについて紹介する。その結果,Luna は幻覚検出タスクにおいて GPT-3.5 と商用評価フレームワークを上回り,97% と 96% のコスト削減と遅延削減を実現している。 Lunaは軽量で、複数の業界分野とドメイン外データにまたがって一般化されており、業界LLMアプリケーションにとって理想的な候補となっている。 Retriever Augmented Generation (RAG) systems have become pivotal in enhancing the capabilities of language models by incorporating external knowledge retrieval mechanisms. However, a significant challenge in deploying these systems in industry applications is the detection and mitigation of hallucinations: instances where the model generates information that is not grounded in the retrieved context. Addressing this issue is crucial for ensuring the reliability and accuracy of responses generated by large language models (LLMs) in diverse industry settings. Current hallucination detection techniques fail to deliver accuracy, low latency, and low cost simultaneously. We introduce Luna: a DeBERTA-large (440M) encoder, finetuned for hallucination detection in RAG settings. We demonstrate that Luna outperforms GPT-3.5 and commercial evaluation frameworks on the hallucination detection task, with 97% and 96% reduction in cost and latency, respectively. Luna is lightweight and generalizes across multiple industry verticals and out-of-domain data, making it an ideal candidate for industry LLM applications.	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# 効率的な階層変換器を用いた事前学習音声モデル Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer ( http://arxiv.org/abs/2406.00976v1 ) ライセンス: Link先を確認	Yongxin Zhu, Dan Su, Liqiang He, Linli Xu, Dong Yu,	(参考訳) 近年の音声言語モデルは大きな進歩を遂げているが、ニューラルオーディオコーデックの長い音響シーケンスをモデル化する際の大きな課題に直面している。本稿では,効率的な音声言語モデリングのために設計された階層型トランスフォーマである Generative Pre-trained Speech Transformer (GPST)を紹介する。 GPSTは、音声波形を2種類の離散音声表現に量子化し、階層的なトランスフォーマーアーキテクチャに統合することで、統一された1段階生成プロセスを可能にし、Hi-Res音声生成機能を向上させる。エンド・ツー・エンドの教師なしで大規模な音声コーパスを訓練することにより、GPSTは多様な話者の同一性を持つ構文的に一貫した音声を生成することができる。短時間の3秒のプロンプトによって、GPSTは自然で一貫性のあるパーソナライズされた音声を生成し、コンテキスト内学習能力を示す。さらに,多言語意味トークンと普遍的音響トークンを組み込むことで,音声言語間音声生成へのアプローチを容易に拡張することができる。実験結果から,GPSTは単語誤り率,音声品質,話者類似度において,既存の音声言語モデルよりも有意に優れていた。デモサンプルについては https://youngsheen.github.io/GPST/demo を参照してください。 While recent advancements in speech language models have achieved significant progress, they face remarkable challenges in modeling the long acoustic sequences of neural audio codecs. In this paper, we introduce Generative Pre-trained Speech Transformer (GPST), a hierarchical transformer designed for efficient speech language modeling. GPST quantizes audio waveforms into two distinct types of discrete speech representations and integrates them within a hierarchical transformer architecture, allowing for a unified one-stage generation process and enhancing Hi-Res audio generation capabilities. By training on large corpora of speeches in an end-to-end unsupervised manner, GPST can generate syntactically consistent speech with diverse speaker identities. Given a brief 3-second prompt, GPST can produce natural and coherent personalized speech, demonstrating in-context learning abilities. Moreover, our approach can be easily extended to spoken cross-lingual speech generation by incorporating multi-lingual semantic tokens and universal acoustic tokens. 
Experimental results indicate that GPST significantly outperforms the existing speech language models in terms of word error rate, speech quality, and speaker similarity. See https://youngsheen.github.io/GPST/demo for demo samples.	翻訳日:2024-06-06 02:37:18 公開日:2024-06-03
# Dragonfly:マルチリゾリューションズームが大型のビジュアルランゲージモデルをスーパーチャージャー Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model ( http://arxiv.org/abs/2406.00977v1 ) ライセンス: Link先を確認	Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou,	(参考訳) 大規模マルチモーダルモデル(LMM)の最近の進歩は、高解像度画像の解像度が、視覚的コモンセンス推論やバイオメディカル画像解析といったタスクにおいて重要な、画像詳細のきめ細かい理解を促進することを示唆している。しかし、入力解像度の増大は2つの大きな課題をもたらす。 1) 言語モデルに必要なコンテキスト長を拡張し、非効率になり、モデルのコンテキスト限界に達する。 2) 視覚的機能の複雑さを増大させ、より多くのトレーニングデータやより複雑なアーキテクチャを必要とする。我々はDragonflyという新しいLMMアーキテクチャを導入し、これらの課題に対処するための画像領域のきめ細かい視覚的理解と推論を可能にした。 Dragonflyには、マルチ解像度のビジュアルエンコーディングとズームインパッチ選択という、2つの重要な戦略がある。これらの戦略により、適切なコンテキスト長を維持しつつ、高解像度画像を効率的に処理することができる。一般的な8つのベンチマークの実験では、Dragonflyは他のアーキテクチャと比較して、競争力や性能が向上していることが示され、設計の有効性が強調された。さらに,Dragonflyのバイオメディカルインストラクションを微調整し,Path-VQAデータセットでの92.3%の精度(Med-Geminiは83.3%)と,バイオメディカルイメージキャプションにおける報告済みの最高結果を含む,詳細な視覚的理解を必要とする複数のバイオメディカルタスクの最先端の結果を得た。モデルトレーニングを支援するため,一般領域の550万イメージインストラクションサンプルと,バイオメディカル領域の140万サンプルを用いた視覚的インストラクションチューニングデータセットをキュレートした。また、様々な建築設計や画像解像度の影響を特徴づけるアブレーション研究を行い、視覚的指示のアライメントに関する今後の研究への洞察を提供した。コードベースとモデルは https://github.com/togethercomputer/Dragonfly で公開されている。 Recent advances in large multimodal models (LMMs) suggest that higher image resolution enhances the fine-grained understanding of image details, crucial for tasks such as visual commonsense reasoning and analyzing biomedical images. However, increasing input resolution poses two main challenges: 1) It extends the context length required by the language model, leading to inefficiencies and hitting the model's context limit; 2) It increases the complexity of visual features, necessitating more training data or more complex architecture. We introduce Dragonfly, a new LMM architecture that enhances fine-grained visual understanding and reasoning about image regions to address these challenges. Dragonfly employs two key strategies: multi-resolution visual encoding and zoom-in patch selection. 
These strategies allow the model to process high-resolution images efficiently while maintaining reasonable context length. Our experiments on eight popular benchmarks demonstrate that Dragonfly achieves competitive or better performance compared to other architectures, highlighting the effectiveness of our design. Additionally, we finetuned Dragonfly on biomedical instructions, achieving state-of-the-art results on multiple biomedical tasks requiring fine-grained visual understanding, including 92.3% accuracy on the Path-VQA dataset (compared to 83.3% for Med-Gemini) and the highest reported results on biomedical image captioning. To support model training, we curated a visual instruction-tuning dataset with 5.5 million image-instruction samples in the general domain and 1.4 million samples in the biomedical domain. We also conducted ablation studies to characterize the impact of various architectural designs and image resolutions, providing insights for future research on visual instruction alignment. The codebase and model are available at https://github.com/togethercomputer/Dragonfly.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
# 視覚的質問を選択的に答える Selectively Answering Visual Questions ( http://arxiv.org/abs/2406.00980v1 ) ライセンス: Link先を確認	Julian Martin Eisenschlos, Hernán Maina, Guido Ivetta, Luciana Benotti,	(参考訳) 近年,大規模なマルチモーダルモデル (LMM) が出現し,キャプションや視覚質問応答 (VQA) などの視覚タスクを前例のない精度で実行できるようになった。盲人や視覚障害者を助けるようなアプリケーションには、正確な答えが不可欠である。モデルを適切に校正し、不確実性を定量化して、いつ答えるか、いつ断念するか、明確化を求めるのかを選択的に決定することは特に重要である。テキスト内学習LMMを用いたVQAのためのキャリブレーション手法とメトリクスの詳細な分析を行う。 VQAを2つの解答性ベンチマークで検討したところ、サンプリング手法が一般的に優れているが、明確な勝者が存在しないテキストのみのテキスト学習よりも、視覚的に接地されたモデルのスコアが適していることが示された。 Avg BLEU は,サンプリング法と確率法の両方の利点をモダリティで組み合わせたキャリブレーションスコアである。 Recently, large multi-modal models (LMMs) have emerged with the capacity to perform vision tasks such as captioning and visual question answering (VQA) with unprecedented accuracy. Applications such as helping the blind or visually impaired have a critical need for precise answers. It is specially important for models to be well calibrated and be able to quantify their uncertainty in order to selectively decide when to answer and when to abstain or ask for clarifications. We perform the first in-depth analysis of calibration methods and metrics for VQA with in-context learning LMMs. Studying VQA on two answerability benchmarks, we show that the likelihood score of visually grounded models is better calibrated than in their text-only counterparts for in-context learning, where sampling based methods are generally superior, but no clear winner arises. We propose Avg BLEU, a calibration score combining the benefits of both sampling and likelihood methods across modalities.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
# 有害言語検出における非現実的因果効果による嫌悪感 Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect ( http://arxiv.org/abs/2406.00983v1 ) ライセンス: Link先を確認	Junyu Lu, Bo Xu, Xiaokun Zhang, Kaiyuan Liu, Dongyu Zhang, Liang Yang, Hongfei Lin,	(参考訳) 現在の有害言語検出法(TLD)は、通常、決定を行うための特定のトークンに依存しており、それらが語彙バイアスに悩まされ、性能や一般化が低下する。語彙バイアスは「有用」と「誤解」の両方が毒性の理解に影響を及ぼす。残念なことに、これらの影響を区別する代わりに、現在のデバイアス法は一般的にそれらを無差別に排除し、結果としてモデルの検出精度が低下する。そこで本研究では,TLDにおける語彙バイアスを軽減するために,CCDF(Counterfactual Causal Debiasing Framework)を提案する。語彙バイアスの「無駄な影響」を保ち、「誤解を招く影響」を排除している。具体的には、まず、原文と偏見付きトークンの合計効果を因果的視点から判断する。次に、語彙バイアスの直接的な因果効果を全体効果から排除するために、反事実推論を行う。 CCDFを組み込んだデバイアスドTLDモデルは,複数のバニラモデルに適用した競合ベースラインと比較して,精度と公正性の両方で最先端の性能を発揮することを示す実証評価を行った。我々のモデルの一般化能力は、分布外データに対する現在のデバイアスモデルより優れています。 Current methods of toxic language detection (TLD) typically rely on specific tokens to conduct decisions, which makes them suffer from lexical bias, leading to inferior performance and generalization. Lexical bias has both "useful" and "misleading" impacts on understanding toxicity. Unfortunately, instead of distinguishing between these impacts, current debiasing methods typically eliminate them indiscriminately, resulting in a degradation in the detection accuracy of the model. To this end, we propose a Counterfactual Causal Debiasing Framework (CCDF) to mitigate lexical bias in TLD. It preserves the "useful impact" of lexical bias and eliminates the "misleading impact". Specifically, we first represent the total effect of the original sentence and biased tokens on decisions from a causal view. We then conduct counterfactual inference to exclude the direct causal effect of lexical bias from the total effect. Empirical evaluations demonstrate that the debiased TLD model incorporating CCDF achieves state-of-the-art performance in both accuracy and fairness compared to competitive baselines applied on several vanilla models. 
The generalization capability of our model outperforms current debiased models for out-of-distribution data.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
# 単語埋め込みを用いたアナロジー課題による薬物・遺伝子関係の予測 Predicting Drug-Gene Relations via Analogy Tasks with Word Embeddings ( http://arxiv.org/abs/2406.00984v1 ) ライセンス: Link先を確認	Hiroaki Yamagiwa, Ryoma Hashimoto, Kiwamu Arakane, Ken Murakami, Shou Soeda, Momose Oyama, Mariko Okada, Hidetoshi Shimodaira,	(参考訳) 自然言語処理(NLP)は、テキスト中の単語が通常、埋め込みと呼ばれる特徴ベクトルに変換される幅広い分野で利用される。 BioConceptVecは生物学に適した埋め込みの具体例であり、スキップグラムのようなモデルを使用して約3000万のPubMed抽象化に基づいてトレーニングされている。一般に、単語埋め込みは単純な算術演算によって類似タスクを解くことが知られている。例えば、$\mathrm{\textit{king}} - \mathrm{\textit{man}} + \mathrm{\textit{woman}}$ predicts $\mathrm{\textit{queen}}$である。本研究では,BioConceptVec の埋め込みと,PubMed の抽象化で訓練した埋め込みが,薬物遺伝子関係の情報を包含し,アナログ計算により薬剤の標的遺伝子を予測できることを実証した。また, 生物学的経路を用いた薬物や遺伝子を分類することで, 性能が向上することを示した。さらに,過去の既知の関係から派生したベクトルが,データセットの未知の将来の関係を年々予測できることを示す。 Natural language processing (NLP) is utilized in a wide range of fields, where words in text are typically transformed into feature vectors called embeddings. BioConceptVec is a specific example of embeddings tailored for biology, trained on approximately 30 million PubMed abstracts using models such as skip-gram. Generally, word embeddings are known to solve analogy tasks through simple vector arithmetic. For instance, $\mathrm{\textit{king}} - \mathrm{\textit{man}} + \mathrm{\textit{woman}}$ predicts $\mathrm{\textit{queen}}$. In this study, we demonstrate that BioConceptVec embeddings, along with our own embeddings trained on PubMed abstracts, contain information about drug-gene relations and can predict target genes from a given drug through analogy computations. We also show that categorizing drugs and genes using biological pathways improves performance. Furthermore, we illustrate that vectors derived from known relations in the past can predict unknown future relations in datasets divided by year.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
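The analogy arithmetic mentioned in the abstract above (king - man + woman predicts queen) can be illustrated with a minimal sketch. The 2-D vectors below are invented toy values, not BioConceptVec embeddings, and the helper name `analogy` is hypothetical; the paper's actual drug-gene pipeline is not reproduced here.

```python
import math

# Toy 2-D embeddings, invented for illustration only (not real embedding values).
EMB = {
    "king": (0.9, 0.8),
    "queen": (0.9, 0.2),
    "man": (0.1, 0.8),
    "woman": (0.1, 0.2),
    "apple": (-0.7, 0.5),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, embeddings):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    )
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


# "man is to king as woman is to ?"
print(analogy("man", "king", "woman", EMB))
```

Under the same idea, a drug-to-gene relation would be treated as an offset vector added to a drug embedding before the nearest-neighbour search; how that offset is estimated is left to the paper.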
# 複数編集:テキスト・画像拡散モデルによる同時多視点編集 MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models ( http://arxiv.org/abs/2406.00985v1 ) ライセンス: Link先を確認	Mingzhen Huang, Jialing Cai, Shan Jia, Vishnu Suresh Lokhande, Siwei Lyu,	(参考訳) テキスト駆動画像合成は、テキストプロンプトから視覚コンテンツがどのように生成されるかを変える拡散モデルの開発において、大きな進歩を遂げた。これらの進歩にもかかわらず、コンピュータグラフィックスの重要な領域であるテキスト駆動画像編集は、ユニークな課題に直面している。最大の課題は、複数のオブジェクトや属性を同時に編集することだ。マルチアスペクト編集にこれらの手法を順次適用すると、計算要求と効率損失が増大する。本稿では,これらの課題に多大な貢献をしながら対処する。私たちの主な貢献は、複数の属性をまたいだ同時編集をシームレスに管理するメソッドであるMultiEditsの開発です。従来のアプローチとは対照的に、MultiEditsは単一の属性編集の品質を保持するだけでなく、マルチタスク編集のパフォーマンスを大幅に改善する。これは、革新的な注意分布機構と、複数の処理ヘッドをまたいで動作するマルチブランチ設計によって実現される。さらに、元のPIE-Benchデータセットを拡張したPIE-Bench++データセットを導入し、複数のオブジェクトと属性を含む画像編集タスクの評価を同時にサポートする。このデータセットは、多面的シナリオにおけるテキスト駆動画像編集手法を評価するためのベンチマークである。データセットとコードはhttps://mingzhenhuang.com/projects/MultiEdits.htmlで公開されている。 Text-driven image synthesis has made significant advancements with the development of diffusion models, transforming how visual content is generated from text prompts. Despite these advances, text-driven image editing, a key area in computer graphics, faces unique challenges. A major challenge is making simultaneous edits across multiple objects or attributes. Applying these methods sequentially for multi-aspect edits increases computational demands and efficiency losses. In this paper, we address these challenges with significant contributions. Our main contribution is the development of MultiEdits, a method that seamlessly manages simultaneous edits across multiple attributes. In contrast to previous approaches, MultiEdits not only preserves the quality of single attribute edits but also significantly improves the performance of multitasking edits. This is achieved through an innovative attention distribution mechanism and a multi-branch design that operates across several processing heads. 
Additionally, we introduce the PIE-Bench++ dataset, an expansion of the original PIE-Bench dataset, to better support evaluating image-editing tasks involving multiple objects and attributes simultaneously. This dataset is a benchmark for evaluating text-driven image editing methods in multifaceted scenarios. Dataset and code are available at https://mingzhenhuang.com/projects/MultiEdits.html.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
# アンタングル化による教師なしグラフ異常検出における公平性向上 Enhancing Fairness in Unsupervised Graph Anomaly Detection through Disentanglement ( http://arxiv.org/abs/2406.00987v1 ) ライセンス: Link先を確認	Wenjing Chang, Kay Liu, Philip S. Yu, Jianjun Yu,	(参考訳) グラフ異常検出(GAD)は、金融詐欺検出から偽ニュース検出まで、さまざまなアプリケーションにおいてますます重要になっている。しかし、現在のGAD法は主に公平性の問題を見落としており、差別的判断は、センシティブな属性(例えば、性別、宗教、民族など)で定義された特定の人口集団に偏っている可能性がある。これは、社会的および倫理的制約を考慮して、現実世界のシナリオにおけるこれらの手法の適用性を大幅に制限する。この重要なギャップに対処するため、我々はGAD意思決定における実用性と公正性を統合するための最初の試みを行う。具体的には,属性付きグラフ上の新しい DisEntangle ベースの FairnEss 対応 aNomaly Detection フレームワーク DEFEND を考案する。 DEFEND はまず GNN のアンタングル化を導入し、情報的でありながらセンシティブ属性とは無関係なノード表現を捉え、グラフ表現学習に固有の社会的バイアスを効果的に低減する。さらに、異常ノードの評価における識別バイアスを軽減するために、DEFENDは、グラフ構造を組み込まずにノード属性のみに集中する再構成ベースの異常検出を採用する。さらに、入力属性と感度属性の固有の関連性を考えると、DEFENDは再構成エラーと予測された感度属性との相関を制約する。実世界のデータセットに対する実証的な評価から、DEFENDはGADにおいて効果的に機能し、最先端のベースラインと比較して公正性を著しく向上することが明らかとなった。再現性を高めるため、私たちのコードは https://github.com/AhaChang/DEFEND で利用可能です。 Graph anomaly detection (GAD) is increasingly crucial in various applications, ranging from financial fraud detection to fake news detection. However, current GAD methods largely overlook the fairness problem, which might result in discriminatory decisions skewed toward certain demographic groups defined on sensitive attributes (e.g., gender, religion, ethnicity, etc.). This greatly limits the applicability of these methods in real-world scenarios in light of societal and ethical restrictions. To address this critical gap, we make the first attempt to integrate fairness with utility in GAD decision-making. Specifically, we devise a novel DisEntangle-based FairnEss-aware aNomaly Detection framework on the attributed graph, named DEFEND. DEFEND first introduces disentanglement in GNNs to capture informative yet sensitive-irrelevant node representations, effectively reducing societal bias inherent in graph representation learning. 
Besides, to alleviate discriminatory bias in evaluating anomalous nodes, DEFEND adopts a reconstruction-based anomaly detection, which concentrates solely on node attributes without incorporating any graph structure. Additionally, given the inherent association between input and sensitive attributes, DEFEND constrains the correlation between the reconstruction error and the predicted sensitive attributes. Our empirical evaluations on real-world datasets reveal that DEFEND performs effectively in GAD and significantly enhances fairness compared to state-of-the-art baselines. To foster reproducibility, our code is available at https://github.com/AhaChang/DEFEND.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
# 軌道最適化のための制約を考慮した拡散モデル Constraint-Aware Diffusion Models for Trajectory Optimization ( http://arxiv.org/abs/2406.00990v1 ) ライセンス: Link先を確認	Anjian Li, Zihan Ding, Adji Bousso Dieng, Ryne Beeson,	(参考訳) 拡散モデルは、軌道最適化問題に対する高品質で多様な解を生成することに成功している。しかし、ニューラルネットワークを用いた拡散モデルは、必然的に予測エラーを発生させ、非金属目標や衝突のような制約違反を引き起こす。本稿では,軌道最適化のための制約対応拡散モデルを提案する。本稿では,拡散サンプルの制約違反を最小限に抑えつつ,元のデータ分布を復元する学習用ハイブリッド損失関数を提案する。本モデルでは, 局所最適解に近いサンプルを生成するとともに, 制約違反を最小限に抑えつつ, 従来の拡散モデルよりも優れていることを示す。 The diffusion model has shown success in generating high-quality and diverse solutions to trajectory optimization problems. However, diffusion models with neural networks inevitably make prediction errors, which leads to constraint violations such as unmet goals or collisions. This paper presents a novel constraint-aware diffusion model for trajectory optimization. We introduce a novel hybrid loss function for training that minimizes the constraint violation of diffusion samples compared to the groundtruth while recovering the original data distribution. Our model is demonstrated on tabletop manipulation and two-car reach-avoid problems, outperforming traditional diffusion models in minimizing constraint violations while generating samples close to locally optimal solutions.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
# 分散リファインメントネットワーク:ディープラーニングによる分布予測 Distributional Refinement Network: Distributional Forecasting via Deep Learning ( http://arxiv.org/abs/2406.00998v1 ) ライセンス: Link先を確認	Benjamin Avanzi, Eric Dong, Patrick J. Laub, Bernard Wong,	(参考訳) 保険数理モデリングにおける重要なタスクは、損失の分布特性をモデル化することである。 Generalized Linear Models (GLMs; Nelder and Wedderburn, 1972) のような古典的な(分布)回帰アプローチは一般的に用いられるが、次のようなモデルの開発には課題が残っている。 (i) 共変量が条件付き分布の異なる側面に柔軟に影響を及ぼすことを可能にする。 (ii) (i)を考慮しつつ、機械学習とAIの進歩を統合して予測力を最大化する。 (iii) モデルとその出力に対する信頼を高めるために、(i)および(ii)の追求においてしばしば損なわれるモデルの解釈可能性を一定水準に維持する。我々は、本質的に解釈可能なベースラインモデル(GLMなど)と、Deep Distribution Regression(DDR; Li et al., 2019)を改良した柔軟なニューラルネットワークを組み合わせた分散リファインメントネットワーク(DRN)を提案する。 Combined Actuarial Neural Network (CANN; Schelldorfer and Wüthrich, 2019)に触発された我々のアプローチは,ベースライン分布全体を柔軟に洗練する。結果として、DRNは全ての分位点にわたる特徴量の様々な効果を捉え、適切な解釈性を維持しながら予測性能を向上させる。合成データと実世界のデータの両方を用いて、DRNの優れた分布予測能力を示す。 DRNは、保険数理学などにおいて、強力な分布回帰モデルになる可能性を持っている。 A key task in actuarial modelling involves modelling the distributional properties of losses. Classic (distributional) regression approaches like Generalized Linear Models (GLMs; Nelder and Wedderburn, 1972) are commonly used, but challenges remain in developing models that can (i) allow covariates to flexibly impact different aspects of the conditional distribution, (ii) integrate developments in machine learning and AI to maximise the predictive power while considering (i), and, (iii) maintain a level of interpretability in the model to enhance trust in the model and its outputs, which is often compromised in efforts pursuing (i) and (ii). We tackle this problem by proposing a Distributional Refinement Network (DRN), which combines an inherently interpretable baseline model (such as GLMs) with a flexible neural network, a modified Deep Distribution Regression (DDR; Li et al., 2019) method. Inspired by the Combined Actuarial Neural Network (CANN; Schelldorfer and Wüthrich, 2019), our approach flexibly refines the entire baseline distribution. 
As a result, the DRN captures varying effects of features across all quantiles, improving predictive performance while maintaining adequate interpretability. Using both synthetic and real-world data, we demonstrate the DRN's superior distributional forecasting capacity. The DRN has the potential to be a powerful distributional regression model in actuarial science and beyond.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
# 木を通して森を見る:部分変圧器勾配からのデータ漏洩 Seeing the Forest through the Trees: Data Leakage from Partial Transformer Gradients ( http://arxiv.org/abs/2406.00999v1 ) ライセンス: Link先を確認	Weijun Li, Qiongkai Xu, Mark Dras,	(参考訳) 近年の研究では、分散機械学習は勾配反転攻撃に弱いことが示されており、トレーニングで共有されるモデルの勾配を分析することで、プライベートトレーニングデータを再構成することができる。以前の攻撃では、モデル全体の全てのパラメータからの勾配を使って、そのような再構築が可能であった。しかし、関係するモジュールやそのサブモジュールのほとんどが、データ漏洩を訓練するリスクがあることを仮定し、言語モデルの様々な中間層でそのような脆弱性を検証する。広範な実験により、単一トランスフォーマー層、あるいは0.54%のパラメータを持つ単一の線形コンポーネントからの勾配が、データ漏洩のトレーニングに影響されることが判明した。さらに、トレーニング中の勾配に差分プライバシーを適用することは、データ開示の新たな脆弱性に対して限定的な保護を提供することを示す。 Recent studies have shown that distributed machine learning is vulnerable to gradient inversion attacks, where private training data can be reconstructed by analyzing the gradients of the models shared in training. Previous attacks established that such reconstructions are possible using gradients from all parameters in the entire models. However, we hypothesize that most of the involved modules, or even their sub-modules, are at risk of training data leakage, and we validate such vulnerabilities in various intermediate layers of language models. Our extensive experiments reveal that gradients from a single Transformer layer, or even a single linear component with 0.54% parameters, are susceptible to training data leakage. Additionally, we show that applying differential privacy on gradients during training offers limited protection against the novel vulnerability of data disclosure.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
# Uni-ISP: 複数のカメラからISPを学ぶこと Uni-ISP: Unifying the Learning of ISPs from Multiple Cameras ( http://arxiv.org/abs/2406.01003v1 ) ライセンス: Link先を確認	Lingen Li, Mingde Yao, Xingyu Meng, Muquan Yu, Tianfan Xue, Jinwei Gu,	(参考訳) 現代のエンドツーエンドの画像信号プロセッサ(ISP)はRAW/XYZデータからsRGB(あるいは逆)への複雑なマッピングを学習し、画像処理の新たな可能性を開く。しかし、カメラモデルの多様性が拡大し続けているため、個々のISPの開発とメンテナンスは長期的には持続可能ではなく、本質的には汎用性に欠けており、複数のカメラモデルへの適応性を妨げている。本稿では,複数のカメラからISPを学習するための新しいパイプラインUni-ISPを提案する。 Uni-ISPの中核は、逆/フォワードISPとその特別なトレーニングスキームを学習することで、デバイス対応の埋め込みを活用することである。これにより、Uni-ISPは、逆/フォワードISPのパフォーマンスを向上するだけでなく、既存の学習ISPにはアクセスできない様々な新しいアプリケーションをアンロックする。さらに,複数のカメラで同期して撮影するデータセットは存在しないため,実世界の4KデータセットであるFiveCamを構築し,SRGB-RAW画像の2400組以上を5台のスマートフォンで同期的に撮影する。 Inverse/forward ISPsにおけるUni-ISPの精度(+1.5dB/2.4dB PSNRの改善)、新しいアプリケーションの実現における汎用性、新しいカメラモデルへの適応性など、幅広い実験を行った。 Modern end-to-end image signal processors (ISPs) can learn complex mappings from RAW/XYZ data to sRGB (or inverse), opening new possibilities in image processing. However, as the diversity of camera models continues to expand, developing and maintaining individual ISPs is not sustainable in the long term, which inherently lacks versatility, hindering the adaptability to multiple camera models. In this paper, we propose a novel pipeline, Uni-ISP, which unifies the learning of ISPs from multiple cameras, offering an accurate and versatile processor to multiple camera models. The core of Uni-ISP is leveraging device-aware embeddings through learning inverse/forward ISPs and its special training scheme. By doing so, Uni-ISP not only improves the performance of inverse/forward ISPs but also unlocks a variety of new applications inaccessible to existing learned ISPs. Moreover, since there is no dataset synchronously captured by multiple cameras for training, we construct a real-world 4K dataset, FiveCam, comprising more than 2,400 pairs of sRGB-RAW images synchronously captured by five smartphones. 
We conducted extensive experiments demonstrating Uni-ISP's accuracy in inverse/forward ISPs (with improvements of +1.5dB/2.4dB PSNR), its versatility in enabling new applications, and its adaptability to new camera models.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
# SemCoder: 包括的なセマンティクスによるコード言語モデルのトレーニング SemCoder: Training Code Language Models with Comprehensive Semantics ( http://arxiv.org/abs/2406.01006v1 ) ライセンス: Link先を確認	Yangruibo Ding, Jinjun Peng, Marcus J. Min, Gail Kaiser, Junfeng Yang, Baishakhi Ray,	(参考訳) コードLLM(Code Large Language Models)は、コード補完のようなタスクに優れていますが、実行効果や動的状態のようなより深いセマンティクスを見逃すことがよくあります。本稿では,静的テキストデータへのコードLLMの依存と,デバッグやプログラムの修復といった複雑なタスクに対する詳細な意味理解の必要性のギャップを埋めることを目的としている。本稿では,高レベルの機能記述,個々の文の局所的な実行効果,入力/出力動作全般を包含し,静的コードテキストを動的実行状態にリンクする,包括的セマンティクスによるコードLLMのトレーニング手法を提案する。まずは、機能記述と実行トレースを備えた、完全に実行可能なサンプルのクリーンコードコーパスであるPyXの収集から始めます。我々は、自然言語を用いてコードを書き、実行動作を表現し、推論するためのCode LLMのトレーニングを提案し、人間の言葉によるデバッグを模倣する。このアプローチは、コード生成と実行の推論タスクにおいてGPT-3.5-turboと競合する性能を示す6.7Bパラメータしか持たないコードLLMであるSemCoderの開発につながった。 SemCoderはHumanEval(GPT-3.5-turbo:76.8%)で81.1%、CRUXEval-I(GPT-3.5-turbo:50.3%)で54.5%を達成した。また,具体的なスクラッチパッド推論と比較して,SemCoderのモノローグスタイルの実行推論の有効性について検討し,複数の次元のセマンティクスをよりスムーズに統合することを示す。最後に、学習したセマンティクスを適用して、コードLLMのデバッグと自己修正機能を改善する可能性を実証する。 Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap between Code LLMs' reliance on static text data and the need for thorough semantic understanding for complex tasks like debugging and program repair. We introduce a novel strategy to train Code LLMs with comprehensive semantics, encompassing high-level functional descriptions, local execution effects of individual statements, and overall input/output behavior, thereby linking static code text with dynamic execution states. We begin by collecting PyX, a clean code corpus of fully executable samples with functional descriptions and execution tracing. We propose training Code LLMs to write code and represent and reason about execution behaviors using natural language, mimicking human verbal debugging. 
This approach led to the development of SemCoder, a Code LLM with only 6.7B parameters, which shows competitive performance with GPT-3.5-turbo on code generation and execution reasoning tasks. SemCoder achieves 81.1% on HumanEval (GPT-3.5-turbo: 76.8%) and 54.5% on CRUXEval-I (GPT-3.5-turbo: 50.3%). We also study the effectiveness of SemCoder's monologue-style execution reasoning compared to concrete scratchpad reasoning, showing that our approach integrates semantics from multiple dimensions more smoothly. Finally, we demonstrate the potential of applying learned semantics to improve Code LLMs' debugging and self-refining capabilities.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
# イメージングレーダ3次元物体検出に基づく多対象追跡 Multi-Object Tracking based on Imaging Radar 3D Object Detection ( http://arxiv.org/abs/2406.01011v1 ) ライセンス: Link先を確認	Patrick Palmer, Martin Krüger, Richard Altendorfer, Torsten Bertram,	(参考訳) 周辺交通参加者の効果的な追跡により、将来の行動予測やエゴ車両軌道の適切な計画に必要となる正確な状態推定が可能となる。周辺交通の参加者を検知・追跡するためのアプローチは、学習に基づく物体検出と古典的な追跡アルゴリズムの組み合わせである。学習に基づく物体検出器はライダーとカメラのデータに適切に対応し、学習に基づく物体検出器は標準のレーダーデータ入力により劣っていることが示されている。近年,レーダセンサ技術の改良により,レーダの物体検出性能は大幅に向上したが,レーダ点雲の広さによりライダーセンサに制限が加えられている。これは、多目的追跡のタスクに特有の課題である。追跡アルゴリズムは、一貫したトラックを生成しながら、限られた検出品質を克服しなければならない。この目的のために、下流タスクの可能性を調べるために、レーダデータに対する異なるマルチオブジェクト追跡手法の比較が必要である。この研究は、複数のアプローチを比較し、レーダーデータに適用した場合の限界を分析します。さらに, この課題に対して, 確率的アソシエーションアルゴリズムによる提案手法の強化が検討されている。 Effective tracking of surrounding traffic participants allows for an accurate state estimation as a necessary ingredient for prediction of future behavior and therefore adequate planning of the ego vehicle trajectory. One approach for detecting and tracking surrounding traffic participants is the combination of a learning based object detector with a classical tracking algorithm. Learning based object detectors have been shown to work adequately on lidar and camera data, while learning based object detectors using standard radar data input have proven to be inferior. Recently, with the improvements to radar sensor technology in the form of imaging radars, the object detection performance on radar was greatly improved but is still limited compared to lidar sensors due to the sparsity of the radar point cloud. This presents a unique challenge for the task of multi-object tracking. The tracking algorithm must overcome the limited detection quality while generating consistent tracks. To this end, a comparison between different multi-object tracking methods on imaging radar data is required to investigate its potential for downstream tasks. The work at hand compares multiple approaches and analyzes their limitations when applied to imaging radar data. 
Furthermore, enhancements to the presented approaches in the form of probabilistic association algorithms are considered for this task.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
# テンソル積表現の注意に基づく反復分解 Attention-based Iterative Decomposition for Tensor Product Representation ( http://arxiv.org/abs/2406.01012v1 ) ライセンス: Link先を確認	Taewon Park, Inchul Choi, Minho Lee,	(参考訳) 近年の研究では、データの構成構造を学習することにより、ディープニューラルネットワークの体系的一般化タスクにテンソル製品表現(TPR)を適用している。しかし、これらの先行研究は、その構造表現への分解が不完全であるため、目に見えないテストデータからシンボル構造を発見し、表現する上で、限られた性能を示した。本研究では,TPRを用いた逐次入力データから符号化された構造化表現の分解操作を強化するために,Attention-based Iterative Decomposition (AID)モジュールを提案する。我々のAIDは、任意のTPRモデルに容易に適応でき、入力特徴と構造化表現との間の競合的な注意機構を通じて、体系的な分解を提供する。本実験では,一連の系統的一般化タスクにおいて,TPRに基づく先行作業の性能を大幅に向上させることにより,AIDの有効性を示す。さらに、定量的および定性的な評価では、AIDは他の作品よりも構成的および十分有界な構造表現を生成する。 In recent research, Tensor Product Representation (TPR) is applied for the systematic generalization task of deep neural networks by learning the compositional structure of data. However, such prior works show limited performance in discovering and representing the symbolic structure from unseen test data because their decomposition to the structural representations was incomplete. In this work, we propose an Attention-based Iterative Decomposition (AID) module designed to enhance the decomposition operations for the structured representations encoded from the sequential input data with TPR. Our AID can be easily adapted to any TPR-based model and provides enhanced systematic decomposition through a competitive attention mechanism between input features and structured representations. In our experiments, AID shows effectiveness by significantly improving the performance of TPR-based prior works on the series of systematic generalization tasks. Moreover, in the quantitative and qualitative evaluations, AID produces more compositional and well-bound structural representations than other works.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
# Rewardの過度な最適化を緩和するためのスケーラブルなアンサンブル Scalable Ensembling For Mitigating Reward Overoptimisation ( http://arxiv.org/abs/2406.01013v1 ) ライセンス: Link先を確認	Ahmed M. Ahmed, Rafael Rafailov, Stepan Sharkov, Xuechen Li, Sanmi Koyejo,	(参考訳) Reinforcement Learning from Human Feedback (RLHF)は、強力な命令追従モデルのための言語モデリングにおける大幅な進歩を可能にした。しかしながら、これらのモデルのアライメントは依然として差し迫った課題である。ポリシーは、より高性能な「gold」報酬モデルで測定した有用性の変曲点を超えて、学習した「proxy」報酬モデルに過度に適合する傾向があり、これは過剰最適化(over-optimization)として知られる現象である。先行研究は、報酬モデルのアンサンブルに対して悲観的な統計量を計算することでこの問題を緩和してきた。これはオフライン強化学習では一般的だが、メモリ要求の高い言語モデルでは非常にコストがかかり、十分に大きなモデルでは実現できない。この目的のために、共有エンコーダと分離された線形ヘッドを用いることを提案する。これは完全なアンサンブルと同等の性能をもたらしつつ、同規模のモデルのトレーニングに必要なメモリと時間を大幅に節約できる。 Reinforcement Learning from Human Feedback (RLHF) has enabled significant advancements within language modeling for powerful, instruction-following models. However, the alignment of these models remains a pressing challenge as the policy tends to overfit the learned "proxy" reward model past an inflection point of utility as measured by a "gold" reward model that is more performant, a phenomenon known as over-optimization. Prior work has mitigated this issue by computing a pessimistic statistic over an ensemble of reward models, which is common in Offline Reinforcement Learning but incredibly costly for language models with high memory requirements, making such approaches infeasible for sufficiently large models. To this end, we propose using a shared encoder but separate linear heads. We find this leads to similar performance as the full ensemble while allowing tremendous savings in memory and time required for training for models of similar size.	翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
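The idea in the abstract above, a shared encoder feeding several separate linear heads whose outputs are combined pessimistically, can be sketched as follows. Everything here is an assumption made for illustration: the stand-in `shared_encoder`, the head weights in `HEADS`, and the choice of "mean minus k standard deviations" as the pessimistic statistic (the abstract does not say which statistic is used).

```python
import statistics


def shared_encoder(text):
    """Hypothetical stand-in for a shared language-model encoder.

    Returns a tiny hand-made feature vector; a real system would return
    hidden states from a neural network.
    """
    return [len(text) / 10.0, float(text.count("!")), 1.0]


# Three separate linear heads (weights invented for illustration).
HEADS = [
    [0.5, -1.0, 0.1],
    [0.4, -0.8, 0.0],
    [0.6, -1.5, 0.2],
]


def head_rewards(text):
    """Encode once, then score the shared features with every linear head."""
    features = shared_encoder(text)  # computed a single time, reused by all heads
    return [sum(w * f for w, f in zip(head, features)) for head in HEADS]


def pessimistic_reward(text, k=1.0):
    """Assumed pessimistic statistic: ensemble mean minus k standard deviations."""
    rewards = head_rewards(text)
    return statistics.mean(rewards) - k * statistics.pstdev(rewards)


sample = "hello!"
print(head_rewards(sample))
print(pessimistic_reward(sample))
```

The memory saving comes from running the expensive encoder once per input; only the small per-head weight vectors are duplicated across ensemble members.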
# Mobile-Agent-v2:マルチエージェントコラボレーションによる効果的なナビゲーション機能を備えたモバイルデバイス操作アシスタント Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration ( http://arxiv.org/abs/2406.01014v1 ) ライセンス: Link先を確認	Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang,	(参考訳) モバイルデバイス操作タスクは、一般的なマルチモーダルAIアプリケーションシナリオになりつつある。現在のMLLM(Multi-modal Large Language Models)は、訓練データによって制約されているため、操作アシスタントとして効果的に機能する能力を欠いている。代わりに、ツール呼び出しによる機能強化を行うMLLMベースのエージェントが、このシナリオに徐々に適用されている。しかし、モバイル機器操作タスクにおける2つの大きなナビゲーション課題、タスク進捗ナビゲーション、フォーカスコンテンツナビゲーションは、既存の作業の単一エージェントアーキテクチャの下でかなり複雑である。これは、非常に長いトークンシーケンスと、パフォーマンスを制限するインターリーブされたテキストイメージデータフォーマットのためである。これらのナビゲーション課題を効果的に解決するために,モバイルデバイス操作支援のためのマルチエージェントアーキテクチャであるMobile-Agent-v2を提案する。アーキテクチャは、計画エージェント、決定エージェント、リフレクションエージェントの3つのエージェントから構成される。計画エージェントはタスク進捗を生成し、履歴操作のナビゲーションをより効率的にする。フォーカス内容を維持するため、タスクの進捗に応じて更新するメモリユニットを設計する。さらに、誤った操作を正すために、リフレクションエージェントは各操作の結果を観察し、それに応じて誤りを処理する。実験の結果, Mobile-Agent-v2は, Mobile-Agentの単一エージェントアーキテクチャに比べてタスク完了率が30%以上向上していることがわかった。コードは https://github.com/X-PLUG/MobileAgent で公開されている。 Mobile device operation tasks are increasingly becoming a popular multi-modal AI application scenario. Current Multi-modal Large Language Models (MLLMs), constrained by their training data, lack the capability to function effectively as operation assistants. Instead, MLLM-based agents, which enhance capabilities through tool invocation, are gradually being applied to this scenario. However, the two major navigation challenges in mobile device operation tasks, task progress navigation and focus content navigation, are significantly complicated under the single-agent architecture of existing work. This is due to the overly long token sequences and the interleaved text-image data format, which limit performance. To address these navigation challenges effectively, we propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance. 
The architecture comprises three agents: planning agent, decision agent, and reflection agent. The planning agent generates task progress, making the navigation of history operations more efficient. To retain focus content, we design a memory unit that updates with task progress. Additionally, to correct erroneous operations, the reflection agent observes the outcomes of each operation and handles any mistakes accordingly. Experimental results indicate that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture of Mobile-Agent. The code is open-sourced at https://github.com/X-PLUG/MobileAgent.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
# 変分モンテカルロ法におけるニューラル量子状態:簡単な概要 Neural Quantum States in Variational Monte Carlo Method: A Brief Summary ( http://arxiv.org/abs/2406.01017v1 ) ライセンス: Link先を確認	Yuntai Song,	(参考訳) 本稿では,スピン系のニューラル量子状態に基づく変分モンテカルロ法について概説する。ニューラルネットワークを波動関数として使用すると、非常に非局所的な相互作用を含む様々な種類の相互作用をより一般的に表現でき、これはその非線形活性化関数と密接に関連している。さらに、ニューラルネットワークは、高次元システムを扱う場合、比較的小さな計算資源で比較的複雑な波動関数を表現でき、これは間違いなく「フラット化」の利点である。量子状態トモグラフィーにおいて、ニューラル量子状態の表現法はすでに大きな成果を上げており、より大きなシステムを扱う可能性を示している。 In this note, the variational Monte Carlo method based on neural quantum states for spin systems is reviewed. Using a neural network as the wave function allows for a more generalized expression of various types of interactions, including highly non-local interactions, which are closely related to its non-linear activation functions. Additionally, neural networks can represent relatively complex wave functions with relatively small computational resources when dealing with higher-dimensional systems, which is undoubtedly a "flattening" advantage. In quantum-state tomography, the representation method of neural quantum states has already achieved significant results, hinting at its potential in handling larger-sized systems.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
# マルチレベルVAEと敵対的学習を用いたテキスト音声合成のアクセント変換 Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training ( http://arxiv.org/abs/2406.01018v1 ) ライセンス: Link先を確認	Jan Melechovsky, Ambuj Mehrish, Berrak Sisman, Dorien Herremans,	(参考訳) 急速なグローバル化により、包括的で代表性のある音声技術を構築する必要性は、いくら強調してもしすぎることはない。アクセントは、包括的音声合成装置を構築する際に考慮すべき音声の重要な側面である。包括的音声技術は、特定のアクセントを持つ人々のような特定のグループに対する偏見をなくすことを目的としている。現状のTTS(Text-to-Speech)システムは、アクセントに焦点を当てずに高品質な音声を生成するように設計されているため、背景にかかわらずすべての人に適しているとは限らないことに留意する。本稿では,TTSにおけるアクセント付き音声合成と変換に対応するために,敵対的学習を伴うマルチレベル変分オートエンコーダを用いたTTSモデルを提案し、将来のより包括的なシステムを目指す。客観的指標と主観的聴取テストの両方で性能を評価した。その結果,アクセント変換能力はベースラインに比べて向上した。 With rapid globalization, the need to build inclusive and representative speech technology cannot be overstated. Accent is an important aspect of speech that needs to be taken into consideration while building inclusive speech synthesizers. Inclusive speech technology aims to erase any biases towards specific groups, such as people with certain accents. We note that state-of-the-art Text-to-Speech (TTS) systems may currently not be suitable for all people, regardless of their background, as they are designed to generate high-quality voices without focusing on accent. In this paper, we propose a TTS model that utilizes a Multi-Level Variational Autoencoder with adversarial learning to address accented speech synthesis and conversion in TTS, with a vision for more inclusive systems in the future. We evaluate the performance through both objective metrics and subjective listening tests. The results show an improvement in accent conversion ability compared to the baseline.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
# CLIP-Guided Attribute Aware Pretraining for Generalizable Image Quality Assessment CLIP-Guided Attribute Aware Pretraining for Generalizable Image Quality Assessment ( http://arxiv.org/abs/2406.01020v1 ) ライセンス: Link先を確認	Daekyu Kwon, Dongyoung Kim, Sehwan Ki, Younghyun Jo, Hyong-Euk Lee, Seon Joo Kim,	(参考訳) no-reference Image Quality Assessment (NR-IQA)では、限られたデータセットサイズでの課題は、堅牢で一般化可能なモデルの開発を妨げている。従来の方法では、大きなデータセットを使用してIQAのリッチな表現を抽出することでこの問題に対処する。また、視覚言語モデル(VLM)をベースとしたIQAを提案する手法もあるが、汎用VLMとIQAのドメインギャップはスケーラビリティを制約している。本稿では,VLM から品質関連知識を選択的に抽出し,大規模データセットのスケーラビリティを活用することにより,IQA の一般化可能な表現を構築する新しい事前学習フレームワークを提案する。具体的には、5つの代表的な画像品質属性に対して最適なテキストプロンプトを慎重に選択し、VLMを用いて擬似ラベルを生成する。多数の属性を意識した擬似ラベルを大きな画像データセットで生成し,画像品質に関する豊かな表現をIQAモデルで学習する。提案手法は,複数のIQAデータセット上での最先端性能を実現し,優れた一般化能力を示す。これらの長所を生かして、画像生成モデルの評価や画像強調モデルの訓練、実世界の適用可能性の実証など、いくつかの応用を提案する。私たちはそのコードを利用できるようにします。 In no-reference image quality assessment (NR-IQA), the challenge of limited dataset sizes hampers the development of robust and generalizable models. Conventional methods address this issue by utilizing large datasets to extract rich representations for IQA. Also, some approaches propose vision language models (VLM) based IQA, but the domain gap between generic VLM and IQA constrains their scalability. In this work, we propose a novel pretraining framework that constructs a generalizable representation for IQA by selectively extracting quality-related knowledge from VLM and leveraging the scalability of large datasets. Specifically, we carefully select optimal text prompts for five representative image quality attributes and use VLM to generate pseudo-labels. Numerous attribute-aware pseudo-labels can be generated with large image datasets, allowing our IQA model to learn rich representations about image quality. Our approach achieves state-of-the-art performance on multiple IQA datasets and exhibits remarkable generalization capabilities. 
Leveraging these strengths, we propose several applications, such as evaluating image generation models and training image enhancement models, demonstrating our model's real-world applicability. We will make the code available for access.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
# フィンランド小説の文学的分析のための定性的・計算的アプローチの組み合わせ Combining Qualitative and Computational Approaches for Literary Analysis of Finnish Novels ( http://arxiv.org/abs/2406.01021v1 ) ライセンス: Link先を確認	Emily Ohman, Riikka Rossi,	(参考訳) 計算感情分析を用いてフィンランド文学の古典から何が学べるか? 本稿は、文学作品研究における感情分析の計算手法が、文学と情動に対する質的あるいはより「伝統的な」アプローチとどのように併用できるかを検討することで、この問いに答えようとする。本研究では,世紀転換期のフィンランド文学テキストに適応させた、慎重に整備された感情レキシコンと単語埋め込みを組み合わせ、フィンランド文学の代表的作品の意味的感情空間を描き出す、単純かつ堅牢な情動分析の計算手法を提示・開発する。定性的分析は、Juhani Aho、Minna Canth、Maria Jotuni、F. E. Sillanpää の4作品という選択された事例研究に焦点を当てるが、合計975冊のフィンランド小説の感情アークも提供する。テキストの語彙の計算分析は、テキスト内の感情価の大まかな分布を評価するのに有用であると主張し、他の研究者が研究結果を再現するのに役立つガイドラインを提供する。計算手法は, 文学における情動の伝統的研究において, 精読に基づく分析の支援ツールとして位置づけられるだけでなく, ジャンルや各国のカノンなどの大規模比較も可能にすることを示す。 What can we learn from the classics of Finnish literature by using computational emotion analysis? This article tries to answer this question by examining how computational methods of sentiment analysis can be used in the study of literary works in conjunction with a qualitative or more 'traditional' approach to literature and affect. We present and develop a simple but robust computational approach of affect analysis that uses a carefully curated emotion lexicon adapted to Finnish turn-of-the-century literary texts combined with word embeddings to map out the semantic emotional spaces of seminal works of Finnish literature. We focus our qualitative analysis on selected case studies: four works by Juhani Aho, Minna Canth, Maria Jotuni, and F. E. Sillanpää, but provide emotion arcs for a total of 975 Finnish novels. We argue that a computational analysis of a text's lexicon can be valuable in evaluating the large distribution of the emotional valence in a text and provide guidelines to help other researchers replicate our findings.
We show that computational approaches have a place in traditional studies on affect in literature as a support tool for close-reading-based analyses, but also allowing for large-scale comparison between, for example, genres or national canons.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
# レコメンダシステムにおけるポイズニング攻撃と防御:サーベイ Poisoning Attacks and Defenses in Recommender Systems: A Survey ( http://arxiv.org/abs/2406.01022v1 ) ライセンス: Link先を確認	Zongwei Wang, Junliang Yu, Min Gao, Guanhua Ye, Shazia Sadiq, Hongzhi Yin,	(参考訳) 現代のレコメンダシステム(RS)は、デジタルプラットフォーム全体のユーザエクスペリエンスを大きく向上させたが、ポイズニング攻撃による重大な脅威に直面している。これらの攻撃は、非倫理的な利益のために推薦出力を操作することを目的としており、悪意のあるデータの注入やモデル学習への介入を通じてRSの脆弱性を悪用する。このサーベイは、攻撃者の視点からこれらの脅威を調べ、そのメカニズムと影響について新たな洞察を提供することで、独自の視点を示す。具体的には、攻撃目標の設定、攻撃者の能力の評価、被害者のアーキテクチャの分析、ポイズニング戦略の実行という、ポイズニング攻撃の4段階を含む体系的なパイプラインを詳述する。このパイプラインは様々な攻撃戦術と整合するだけでなく、異なるポイズニング攻撃の焦点を特定するための包括的な分類体系としても機能する。これに対応して、防御者の視点から、防御戦略をポイズニングデータのフィルタリングと堅牢な学習の2つの主要カテゴリに分類する。最後に、既存の限界を強調し、この分野におけるさらなる探究のための革新的な方向性を提案する。 Modern recommender systems (RS) have profoundly enhanced user experience across digital platforms, yet they face significant threats from poisoning attacks. These attacks, aimed at manipulating recommendation outputs for unethical gains, exploit vulnerabilities in RS through injecting malicious data or intervening in model training. This survey presents a unique perspective by examining these threats through the lens of an attacker, offering fresh insights into their mechanics and impacts. Concretely, we detail a systematic pipeline that encompasses four stages of a poisoning attack: setting attack goals, assessing attacker capabilities, analyzing victim architecture, and implementing poisoning strategies. The pipeline not only aligns with various attack tactics but also serves as a comprehensive taxonomy to pinpoint focuses of distinct poisoning attacks. Correspondingly, we further classify defensive strategies into two main categories: poisoning data filtering and robust training from the defender's perspective. Finally, we highlight existing limitations and suggest innovative directions for further exploration in this field.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
# Khayyamがペルシアの筆跡データセットをオフラインで公開 Khayyam Offline Persian Handwriting Dataset ( http://arxiv.org/abs/2406.01025v1 ) ライセンス: Link先を確認	Pourya Jafarzadeh, Padideh Choobdar, Vahid Mohammadi Safarzadeh,	(参考訳) 手書き解析は、マシンラーニングにおいて依然として重要な応用である。どんな手書き認識アプリケーションでも基本的な要件は、包括的なデータセットが利用できることだ。標準ラベル付きデータセットは、学習アルゴリズムのトレーニングと評価において重要な役割を果たす。本稿では,ハヤムデータセットをペルシア語の要素(単語,文,文字,数字)の非拘束手書きデータセットとして提示する。現在利用可能なデータセットでは稀なペルシャ語サンプルの収集に集中しています。カヤムのデータセットには44000語、60000文字、6000桁が含まれている。さらに、この形式は400人のペルシア人作家によって埋められた。データセットの適用性を示すために、数字、文字、単語データに基づいて機械学習アルゴリズムを訓練し、結果を報告する。このデータセットは研究や学術的な用途で利用できる。 Handwriting analysis is still an important application in machine learning. A basic requirement for any handwriting recognition application is the availability of comprehensive datasets. Standard labelled datasets play a significant role in training and evaluating learning algorithms. In this paper, we present the Khayyam dataset as another large unconstrained handwriting dataset for elements (words, sentences, letters, digits) of the Persian language. We intentionally concentrated on collecting Persian word samples which are rare in the currently available datasets. Khayyam's dataset contains 44000 words, 60000 letters, and 6000 digits. Moreover, the forms were filled out by 400 native Persian writers. To show the applicability of the dataset, machine learning algorithms are trained on the digits, letters, and word data and results are reported. This dataset is available for research and academic use.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
# 強化されたシンボル結合により大規模言語モデルは信頼できる多肢選択セレクタになる Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors ( http://arxiv.org/abs/2406.01026v1 ) ライセンス: Link先を確認	Mengge Xue, Zhenyu Hu, Meng Zhao, Liqun Liu, Kuo Liao, Shuang Li, Honglin Han, Chengguo Yin,	(参考訳) 大規模言語モデル (LLMs) の研究において, MCQ (Multiple-Choice Questions) は重要な研究領域となっている。これまでの研究は、LLMの性能が回答選択肢の提示方法に影響されうるfew-shotシナリオにおけるMCQの選択バイアス問題を調査してきたが、教師ありファインチューニング(SFT)における選択バイアスは未探索のままである。本稿では,主にLLMのMultiple Choice Symbol Binding (MCSB) 能力が不十分であるために,選択バイアスがSFT段階でも持続することを明らかにする。この制限は、モデルが回答の選択肢と対応する記号(例えば、A/B/C/D)を効果的に関連付けるのに苦労していることを意味する。モデルのMCSB能力を高めるために、まず損失関数にオプション内容を取り込み、次にオプション記号と内容の重みを調整して、現在の記号のオプション内容を理解するようモデルを導く。これに基づき,Point-wise Intelligent Feedback (PIF) と呼ばれるMCQ向けの効率的なSFTアルゴリズムを提案する。 PIFは、不正解のオプション内容とすべての候補記号をランダムに組み合わせて負のインスタンスを構築し、これらの負のサンプルに関するフィードバックをLLMに与えるポイントワイズ損失を提案する。実験の結果, PIF は MCSB 能力を向上させることにより, モデルの選択バイアスを著しく低減することが示された。注目すべきことに、PIFはMCQの精度も大幅に向上させる。 Multiple-Choice Questions (MCQs) constitute a critical area of research in the study of Large Language Models (LLMs). Previous works have investigated the selection bias problem in MCQs within few-shot scenarios, in which the LLM's performance may be influenced by the presentation of answer choices, leaving the selection bias during Supervised Fine-Tuning (SFT) unexplored. In this paper, we reveal that selection bias persists in the SFT phase, primarily due to the LLM's inadequate Multiple Choice Symbol Binding (MCSB) ability. This limitation implies that the model struggles to associate the answer options with their corresponding symbols (e.g., A/B/C/D) effectively. To enhance the model's MCSB capability, we first incorporate option contents into the loss function and subsequently adjust the weights of the option symbols and contents, guiding the model to understand the option content of the current symbol. Based on this, we introduce an efficient SFT algorithm for MCQs, termed Point-wise Intelligent Feedback (PIF).
PIF constructs negative instances by randomly combining the incorrect option contents with all candidate symbols, and proposes a point-wise loss to provide feedback on these negative samples into LLMs. Our experimental results demonstrate that PIF significantly reduces the model's selection bias by improving its MCSB capability. Remarkably, PIF exhibits a substantial enhancement in the accuracy for MCQs.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
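The negative-instance construction described above can be sketched as follows. This is only one plausible reading of "randomly combining the incorrect option contents with all candidate symbols"; the function name, the dictionary input format, and the sampling parameters are hypothetical, and the point-wise loss itself is not shown because the abstract does not define it.

```python
import random


def build_negative_instances(options, correct_symbol, num_samples=None, seed=0):
    """Pair every incorrect option content with every candidate symbol.

    options: dict mapping a symbol such as "A" to its option content.
    correct_symbol: the symbol whose content is the right answer.
    Returns a shuffled list of (symbol, content) pairs, optionally truncated.
    """
    symbols = list(options)
    incorrect_contents = [
        content for symbol, content in options.items() if symbol != correct_symbol
    ]
    pool = [(symbol, content) for content in incorrect_contents for symbol in symbols]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    rng.shuffle(pool)
    return pool if num_samples is None else pool[:num_samples]


options = {"A": "Paris", "B": "Rome", "C": "Oslo", "D": "Bern"}
for symbol, content in build_negative_instances(options, "A", num_samples=3):
    print(f"{symbol}. {content}")
```

Every pair in the pool carries a wrong content, so each one is a negative sample regardless of which symbol it is attached to.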
# PRICE: クロスデータベース・カーディナリティ推定のための事前訓練モデル PRICE: A Pretrained Model for Cross-Database Cardinality Estimation ( http://arxiv.org/abs/2406.01027v1 ) ライセンス: Link先を確認	Tianjing Zeng, Junwei Lan, Jiahong Ma, Wenqing Wei, Rong Zhu, Pengfei Li, Bolin Ding, Defu Lian, Zhewei Wei, Jingren Zhou,	(参考訳) クエリ実行計画の最適化には,カーディナリティ推定(CardEst)が不可欠である。最近のMLベースのCardEst手法は高い精度を達成するが、準備コストの高さとデータベース間の転送可能性の欠如のため、デプロイメント上の課題に直面している。本稿では,これらの制約に対処するPRetrained multI-table CardEstモデルであるPRICEを提案する。 PRICEは、データ分布とクエリ情報に関する低レベルだが転送可能な特徴を入力とし、自己注意モデルを適用してメタ知識を学習することで、任意のデータベースのカーディナリティを計算する。未知の新しいデータベースにも一般に適用でき、高い推定精度が得られる一方で、その準備コストは基本的な1次元ヒストグラムベースのCardEst手法と同程度に小さい。さらに、PRICEを微調整することで、特定のデータベース上での性能をさらに向上できる。 30の多様なデータセットを使用してPRICEを事前学習し、約5時間で処理を完了した。得られたモデルサイズは約40MBにすぎない。評価の結果、PRICEは既存の手法を一貫して上回り、複数の未知のデータベース上で最高の推定精度を達成し、より低いオーバーヘッドでより高速な実行計画を生成することがわかった。少量のデータベース固有のクエリで微調整した後は、最適なものに非常に近いプランを見つけることもできた。また、PRICEはデータ更新、データスケーリング、クエリワークロードの変化など、さまざまな設定に一般に適用できる。すべてのデータとコードを https://github.com/StCarmen/PRICE で公開している。 Cardinality estimation (CardEst) is essential for optimizing query execution plans. Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs and lack of transferability across databases. In this paper, we propose PRICE, a PRetrained multI-table CardEst model, which addresses these limitations. PRICE takes low-level but transferable features w.r.t. data distributions and query information and elegantly applies self-attention models to learn meta-knowledge to compute cardinality in any database. It is generally applicable to any unseen new database to attain high estimation accuracy, while its preparation cost is as little as the basic one-dimensional histogram-based CardEst methods. Moreover, PRICE can be finetuned to further enhance its performance on any specific database.
We pretrained PRICE using 30 diverse datasets, completing the process in about 5 hours with a resulting model size of only about 40MB. Evaluations show that PRICE consistently outperforms existing methods, achieving the highest estimation accuracy on several unseen databases and generating faster execution plans with lower overhead. After finetuning with a small volume of database-specific queries, PRICE could even find plans very close to the optimal ones. Meanwhile, PRICE is generally applicable to different settings such as data updates, data scaling, and query workload shifts. We have made all of our data and code publicly available at https://github.com/StCarmen/PRICE.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
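For context, the "basic one-dimensional histogram-based CardEst methods" that the abstract uses as its cost reference point can be sketched in a few lines. This is the classic baseline, not PRICE itself; the function names, the equi-width bucketing, and the attribute-independence assumption used to combine predicates are illustrative choices.

```python
def build_histogram(values, num_buckets, lo, hi):
    """Equi-width histogram of one column over the range [lo, hi)."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for value in values:
        index = min(int((value - lo) / width), num_buckets - 1)
        counts[index] += 1
    return counts, lo, width


def range_selectivity(histogram, a, b):
    """Estimated fraction of rows with a <= value < b.

    Assumes values are spread uniformly inside each bucket.
    """
    counts, lo, width = histogram
    covered = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        covered += count * overlap / width
    return covered / sum(counts)


def estimate_cardinality(num_rows, selectivities):
    """Combine per-column selectivities, assuming the columns are independent."""
    estimate = float(num_rows)
    for selectivity in selectivities:
        estimate *= selectivity
    return estimate


ages = build_histogram(list(range(100)), num_buckets=10, lo=0, hi=100)
sel_a = range_selectivity(ages, 0, 50)
sel_b = range_selectivity(ages, 25, 35)
print(estimate_cardinality(1000, [sel_a, sel_b]))
```

The independence assumption is exactly where such baselines lose accuracy on correlated columns, which is the gap that learned estimators aim to close.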
# LLEMamba:ディープ・アンフォールディング・ネットワークを用いたライティングガイドマンバによる低照度化 LLEMamba: Low-Light Enhancement via Relighting-Guided Mamba with Deep Unfolding Network ( http://arxiv.org/abs/2406.01028v1 ) ライセンス: Link先を確認	Xuanqi Zhang, Haijin Zeng, Jinwang Pan, Qiangqiang Shen, Yongyong Chen,	(参考訳) トランスフォーマーをベースとした低照度化手法は,グローバルコンテキストにおける長距離依存性を効果的にキャプチャすることで,有望な性能を実現している。しかし、その高い計算需要は、深層展開ネットワークにおける複数イテレーションのスケーラビリティを制限するため、解釈可能性と歪みの柔軟バランスが困難である。この問題に対処するために,Retinex Optimization と Mamba Deep Priors によって理論的解釈性と忠実性が保証される深層展開ネットワーク (LLEMamba) を用いたリライト誘導型マンバによる新しい低照度化手法を提案する。具体的には、LLEMambaは、まず、深く展開するネットワーク内に、乗算器の交互方向法(ADMM)に基づく反復最適化過程を組み込んだ、深い事前のRetinexモデルを構築します。 Transformerとは異なり、複数のイテレーションで深層展開フレームワークを支援するため、LLEMambaは計算複雑性の低い新しいMambaアーキテクチャを導入している。ベンチマーク実験により,LLEMambaは既存の最先端手法と比較して,優れた定量的評価と低歪みの視覚的結果が得られることが示された。 Transformer-based low-light enhancement methods have yielded promising performance by effectively capturing long-range dependencies in a global context. However, their elevated computational demand limits the scalability of multiple iterations in deep unfolding networks, and hence they have difficulty in flexibly balancing interpretability and distortion. To address this issue, we propose a novel Low-Light Enhancement method via relighting-guided Mamba with a deep unfolding network (LLEMamba), whose theoretical interpretability and fidelity are guaranteed by Retinex optimization and Mamba deep priors, respectively. Specifically, our LLEMamba first constructs a Retinex model with deep priors, embedding the iterative optimization process based on the Alternating Direction Method of Multipliers (ADMM) within a deep unfolding network. Unlike Transformer, to assist the deep unfolding framework with multiple iterations, the proposed LLEMamba introduces a novel Mamba architecture with lower computational complexity, which not only achieves light-dependent global visual context for dark images during reflectance relight but also optimizes to obtain more stable closed-form solutions. 
Experiments on the benchmarks show that LLEMamba achieves superior quantitative evaluations and lower distortion visual results compared to existing state-of-the-art methods.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
# CYCLO: サイクリックグラフ変換器による空中映像の多目的関係モデリング CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos ( http://arxiv.org/abs/2406.01029v1 ) ライセンス: Link先を確認	Trong-Thuan Nguyen, Pha Nguyen, Xin Li, Jackson Cothren, Alper Yilmaz, Khoa Luu,	(参考訳) 映像シーングラフ生成(VidSGG)は、オブジェクト間の複雑な関係とビデオシーケンスにおける時間的ダイナミクスをキャプチャし、解釈するための変換的アプローチとして登場した。本稿では,空中ビデオにおける多目的関係モデリングに焦点を当てた新しいAeroEyeデータセットを提案する。私たちのAeroEyeデータセットには、さまざまなドローンシーンが含まれており、オブジェクト間の複雑な関係や空間的配置をキャプチャする、視覚的に包括的で正確な述語集が含まれています。この目的のために,循環グラフ変換器 (CYCLO) の手法を提案する。また、提案手法により、固有巡回パターンでシーケンスを処理し、オブジェクト関係を正しい順序で処理することができる。これにより、情報損失を最小限に抑えつつ、周期的・重複的な関係を効果的に捉えることができる。 AeroEyeデータセットに関する広範な実験は、提案されたCYCLOモデルの有効性を示し、ドローンビデオのシーン理解を行う可能性を示している。最後に、CYCLO法は、PVSGとASPIReの2つのシーングラフ生成ベンチマークに対して、常にステート・オブ・ザ・アート(SOTA)結果を達成する。 Video scene graph generation (VidSGG) has emerged as a transformative approach to capturing and interpreting the intricate relationships among objects and their temporal dynamics in video sequences. In this paper, we introduce the new AeroEye dataset that focuses on multi-object relationship modeling in aerial videos. Our AeroEye dataset features various drone scenes and includes a visually comprehensive and precise collection of predicates that capture the intricate relationships and spatial arrangements among objects. To this end, we propose the novel Cyclic Graph Transformer (CYCLO) approach that allows the model to capture both direct and long-range temporal dependencies by continuously updating the history of interactions in a circular manner. The proposed approach also allows one to handle sequences with inherent cyclical patterns and process object relationships in the correct sequential order. Therefore, it can effectively capture periodic and overlapping relationships while minimizing information loss. 
The extensive experiments on the AeroEye dataset demonstrate the effectiveness of the proposed CYCLO model, demonstrating its potential to perform scene understanding on drone videos. Finally, the CYCLO method consistently achieves State-of-the-Art (SOTA) results on two in-the-wild scene graph generation benchmarks, i.e., PVSG and ASPIRe.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
# LLMとGNNは補完的:マルチモーダルグラフ学習のためのLLMを蒸留する LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning ( http://arxiv.org/abs/2406.01032v1 ) ライセンス: Link先を確認	Junjie Xu, Zongyu Wu, Minhua Lin, Xiang Zhang, Suhang Wang,	(参考訳) グラフニューラルネットワーク(GNN)の最近の進歩は、複雑な分子構造をモデル化して特性を予測する能力を大幅に強化している。それでも、分子データは、GNNがうまく扱えないテキスト情報や視覚情報を含む、単なるグラフ構造以上のものを含んでいる。このギャップを埋めるために,マルチモーダルな分子データを用いてLarge Language Models (LLMs) から洞察を抽出する,革新的なフレームワークを提案する。 GALLON(Graph Learning from Large Language Model Distillation)は,マルチモーダル知識をMLP(Multilayer Perceptron)に統合することにより,LLMとGNNの能力を相乗化するフレームワークである。本手法は、分子のリッチテキストデータと視覚データと、GNNの構造解析能力を統合する。大規模実験により, 蒸留MLPモデルにより, 分子特性予測の精度と効率が著しく向上することが明らかとなった。 Recent progress in Graph Neural Networks (GNNs) has greatly enhanced the ability to model complex molecular structures for predicting properties. Nevertheless, molecular data encompasses more than just graph structures, including textual and visual information that GNNs do not handle well. To bridge this gap, we present an innovative framework that utilizes multimodal molecular data to extract insights from Large Language Models (LLMs). We introduce GALLON (Graph Learning from Large Language Model Distillation), a framework that synergizes the capabilities of LLMs and GNNs by distilling multimodal knowledge into a unified Multilayer Perceptron (MLP). This method integrates the rich textual and visual data of molecules with the structural analysis power of GNNs. Extensive experiments reveal that our distilled MLP model notably improves the accuracy and efficiency of molecular property predictions.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
# 配向誘導重み補正を用いたマルチタスク学習を用いた一般化ジャージ番号認識 Generalized Jersey Number Recognition Using Multi-task Learning With Orientation-guided Weight Refinement ( http://arxiv.org/abs/2406.01033v1 ) ライセンス: Link先を確認	Yung-Hui Lin, Yu-Wen Chang, Huang-Chia Shih, Takahiro Ogawa,	(参考訳) ジャージ番号認識(JNR)は、スポーツ分析において常に重要な課題である。画像にはぼけ、遮蔽、変形、低解像度が生じるため、認識精度の向上は依然として課題である。近年の研究では、番号の位置特定と光学的文字認識を用いてこれらの問題に対処している。一部のアプローチは、人体の回転角がジャージの数字の識別に与える影響を無視して、選手識別手法を画像シーケンスに適用している。マルチタスク方式を用いて個々の数字を認識し、ジャージの桁数を正確に予測することで、より堅牢な結果が得られる。以上を踏まえ、本研究では,人体の向きの角度と数字の手がかりを組み合わせてジャージ番号を認識するマルチタスク学習手法 angle-digit refine scheme (ADRS) を提案する。実験結果から,提案手法は推論に用いる情報を増やし,予測精度を大幅に向上させる。 1種類のスポーツしか扱えない最先端の手法と比較して、提案手法はより多様で実用的なJNRアプリケーションを実現する。サッカー,アメリカンフットボール,バスケットボール,バレーボール,野球などの多様なチームスポーツをデータセットに組み込むことは,スポーツ分析におけるJNRの一般化に大きく貢献する。我々の精度はTop-1で64.07%、Top-2で89.97%、対応するF1スコアはそれぞれ67.46%、90.64%である。 Jersey number recognition (JNR) has always been an important task in sports analytics. Improving recognition accuracy remains an ongoing challenge because images are subject to blurring, occlusion, deformity, and low resolution. Recent research has addressed these problems using number localization and optical character recognition. Some approaches apply player identification schemes to image sequences, ignoring the impact of human body rotation angles on jersey digit identification. Accurately predicting the number of jersey digits by using a multi-task scheme to recognize each individual digit enables more robust results. Based on the above considerations, this paper proposes a multi-task learning method called the angle-digit refine scheme (ADRS), which combines human body orientation angles and digit number clues to recognize athletic jersey numbers. Based on our experimental results, our approach increases inference information, significantly improving prediction accuracy. Compared to state-of-the-art methods, which can only handle a single type of sport, the proposed method produces a more diverse and practical JNR application.
The incorporation of diverse types of team sports such as soccer, football, basketball, volleyball, and baseball into our dataset contributes greatly to generalized JNR in sports analytics. Our accuracy achieves 64.07% on Top-1 and 89.97% on Top-2, with corresponding F1 scores of 67.46% and 90.64%, respectively.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
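The Top-1 and Top-2 figures quoted above are top-k accuracies. A minimal sketch of how such a metric is typically computed is given below; the function name and the toy predictions are made up for illustration and are not taken from the paper's evaluation code.

```python
def top_k_accuracy(ranked_predictions, labels, k):
    """Fraction of samples whose true label is among the k highest-ranked guesses.

    ranked_predictions: one list per sample, ordered from most to least likely.
    labels: the true jersey number for each sample.
    """
    hits = sum(
        1 for ranked, label in zip(ranked_predictions, labels) if label in ranked[:k]
    )
    return hits / len(labels)


# Toy data: three cropped player images with ranked jersey-number guesses.
ranked = [["23", "28", "3"], ["7", "1", "17"], ["10", "16", "18"]]
truth = ["23", "1", "18"]
print(top_k_accuracy(ranked, truth, k=1), top_k_accuracy(ranked, truth, k=2))
```

Top-2 accuracy can only be equal to or higher than Top-1, which matches the gap between the two reported figures.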
# 3次元心筋変形解析のための合成データ生成 Synthetic Data Generation for 3D Myocardium Deformation Analysis ( http://arxiv.org/abs/2406.01040v1 ) ライセンス: Link先を確認	Shahar Zuler, Dan Raviv,	(参考訳) 高分解能CTデータセットとGTアノテーションを用いた3次元心筋変形の正確な解析は、心血管画像研究の進展に不可欠である。しかし、そのようなデータセットの不足は、堅牢な心筋変形解析モデルを開発する上で大きな課題となる。そこで本研究では,心血管画像データセットの充実のための合成データ生成手法を提案する。本稿では,GT 3D光フローアノテーションを付加した合成データ生成手法を提案する。心4次元CTスキャン(4D)からのデータ作成,パラメータの選択,および同一または他の心3次元CTデータからの合成データの作成について概説した。本研究は,高分解能CTデータセットの欠如による限界を克服し,臨床応用と診断のための正確かつ信頼性の高い心筋変形解析アルゴリズムの開発に寄与する。私たちのコードは、http://www.github.com/shaharzuler/cardio_volume_skewerで利用可能です。 Accurate analysis of 3D myocardium deformation using high-resolution computerized tomography (CT) datasets with ground truth (GT) annotations is crucial for advancing cardiovascular imaging research. However, the scarcity of such datasets poses a significant challenge for developing robust myocardium deformation analysis models. To address this, we propose a novel approach to synthetic data generation for enriching cardiovascular imaging datasets. We introduce a synthetic data generation method, enriched with crucial GT 3D optical flow annotations. We outline the data preparation from a cardiac four-dimensional (4D) CT scan, selection of parameters, and the subsequent creation of synthetic data from the same or other sources of 3D cardiac CT data for training. Our work contributes to overcoming the limitations imposed by the scarcity of high-resolution CT datasets with precise annotations, thereby facilitating the development of accurate and reliable myocardium deformation analysis algorithms for clinical applications and diagnostics. Our code is available at: http://www.github.com/shaharzuler/cardio_volume_skewer	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
# ガウススプラッティングを用いた単眼ビデオからの自己校正4次元新しいビュー合成 Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting ( http://arxiv.org/abs/2406.01042v1 ) ライセンス: Link先を確認	Fang Li, Hao Zhang, Narendra Ahuja,	(参考訳) Gaussian Splatting(GS)は、特に動的シーンにおいて、Neural Radiance Fields(NeRF)と比較して、シーン再構成の効率と新規ビュー合成(NVS)の精度を大幅に向上させた。しかし、GS や NeRF をベースとした現在の 4D NVS 手法は、主に COLMAP が提供するカメラパラメータに依存しており、初期化に COLMAP が生成したスパース点群を利用するものさえあるが、これらは精度に欠けるうえ時間もかかる。その結果、特に物体の動きが大きいシーンや、小さな並進と大きな回転が組み合わさるような極端なカメラ条件において、動的シーンの表現が貧弱になることがある。一部の研究は、既製のモデルから得られる深度や光学フローなどの追加情報を教師信号として、カメラパラメータとシーンの推定を同時に最適化している。こうした検証されていない情報を真値として使うと、堅牢性と精度が低下するおそれがあり、これは長い単眼ビデオ(例えば数百フレーム以上)で頻繁に発生する。本稿では,カメラパラメータの自己校正を伴う高忠実度 4D GS シーン表現を学習する新しい手法を提案する。これには、3D構造を頑健に表現する2D点特徴の抽出と、それらを用いたカメラパラメータと3D構造の共同最適化が含まれ、全体として4Dシーンの最適化につながる。提案手法の精度と時間効率を,いくつかの標準ベンチマークにおける広範な定量的・定性的実験結果を通じて実証する。その結果,4D新規ビュー合成の最先端手法に対して顕著な改善が見られた。ソースコードは近々 https://github.com/fangli333/SC-4DGS で公開される予定である。 Gaussian Splatting (GS) has significantly elevated scene reconstruction efficiency and novel view synthesis (NVS) accuracy compared to Neural Radiance Fields (NeRF), particularly for dynamic scenes. However, current 4D NVS methods, whether based on GS or NeRF, primarily rely on camera parameters provided by COLMAP and even utilize sparse point clouds generated by COLMAP for initialization, which lack accuracy and are time-consuming. This sometimes results in poor dynamic scene representation, especially in scenes with large object movements, or extreme camera conditions, e.g., small translations combined with large rotations. Some studies simultaneously optimize the estimation of camera parameters and scenes, supervised by additional information like depth, optical flow, etc. obtained from off-the-shelf models. Using this unverified information as ground truth can reduce robustness and accuracy, which does frequently occur for long monocular videos (with e.g. > hundreds of frames).
We propose a novel approach that learns a high-fidelity 4D GS scene representation with self-calibration of camera parameters. It includes the extraction of 2D point features that robustly represent 3D structure, and their use for subsequent joint optimization of camera parameters and 3D structure towards overall 4D scene optimization. We demonstrate the accuracy and time efficiency of our method through extensive quantitative and qualitative experimental results on several standard benchmarks. The results show significant improvements over state-of-the-art methods for 4D novel view synthesis. The source code will be released soon at https://github.com/fangli333/SC-4DGS.	翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
# 核医学人工知能の実践:Bethesda Report (AI Summit 2024) Nuclear Medicine Artificial Intelligence in Action: The Bethesda Report (AI Summit 2024) ( http://arxiv.org/abs/2406.01044v1 ) ライセンス: Link先を確認	Arman Rahmim, Tyler J. Bradshaw, Guido Davidzon, Joyita Dutta, Georges El Fakhri, Munir Ghesani, Nicolas A. Karakatsanis, Quanzheng Li, Chi Liu, Emilie Roncali, Babak Saboury, Tahir Yusufaly, Abhinav K. Jha,	(参考訳) SNMMI AIタスクフォースが主催する第2回SNMMI人工知能(AI)サミットは、2024年2月29日から3月1日にかけて、メリーランド州ベセスダで開催された。さまざまなコミュニティメンバと利害関係者を集め、2022年に成功を収めた前回のAIサミットに続く今回のサミットのテーマは「AI in Action」であった。主なトピックは次の6つである。 (i)AIタスクフォースによる過去および進行中の取り組みの概要、(ii)計算核腫瘍学における新たなニーズとツール、(iii)大規模言語モデルおよび生成モデルにおける新たなフロンティア、(iv)核医学におけるAI利用の価値提案の定義、(v)データおよびモデルリポジトリの取り組みを含むオープンサイエンス、(vi)診療報酬償還と資金調達の問題。主な取り組み、発見、課題、次のステップを本稿にまとめる。 The 2nd SNMMI Artificial Intelligence (AI) Summit, organized by the SNMMI AI Task Force, took place in Bethesda, MD, on February 29 - March 1, 2024. Bringing together various community members and stakeholders, and following up on a prior successful 2022 AI Summit, the summit theme was: AI in Action. Six key topics included (i) an overview of prior and ongoing efforts by the AI task force, (ii) emerging needs and tools for computational nuclear oncology, (iii) new frontiers in large language and generative models, (iv) defining the value proposition for the use of AI in nuclear medicine, (v) open science including efforts for data and model repositories, and (vi) issues of reimbursement and funding. The primary efforts, findings, challenges, and next steps are summarized in this manuscript.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# LLMを用いたスキーマ対応イベント抽出 Decompose, Enrich, and Extract! Schema-aware Event Extraction using LLMs ( http://arxiv.org/abs/2406.01045v1 ) ライセンス: Link先を確認	Fatemeh Shiri, Van Nguyen, Farhad Moghimifar, John Yoo, Gholamreza Haffari, Yuan-Fang Li,	(参考訳) 大規模言語モデル(LLM)は、自然言語データを処理する上で重要な能力を示し、さまざまなテキストソースから効率的な知識を抽出し、状況認識を高め、意思決定を支援する。しかし、幻覚への感受性が原因で懸念が生じ、文脈的に不正確な内容が生じる。この作業は、イベント抽出の自動化にLLMを活用することに焦点を当て、タスクをイベント検出とイベント引数抽出に分解することで幻覚に対処する新しい方法を導入する。さらに,提案手法では,動的スキーマ対応の拡張検索例を特定の質問に合わせたプロンプトに統合し,検索機能強化生成などの高度なプロンプト技術を拡張し,適応させる。顕著なイベント抽出ベンチマークの評価結果と、合成されたベンチマークの結果は、ベースラインアプローチと比較して、手法の優れた性能を示している。 Large Language Models (LLMs) demonstrate significant capabilities in processing natural language data, promising efficient knowledge extraction from diverse textual sources to enhance situational awareness and support decision-making. However, concerns arise due to their susceptibility to hallucination, resulting in contextually inaccurate content. This work focuses on harnessing LLMs for automated Event Extraction, introducing a new method to address hallucination by decomposing the task into Event Detection and Event Argument Extraction. Moreover, the proposed method integrates dynamic schema-aware augmented retrieval examples into prompts tailored for each specific inquiry, thereby extending and adapting advanced prompting techniques such as Retrieval-Augmented Generation. Evaluation findings on prominent event extraction benchmarks and results from a synthesized benchmark illustrate the method's superior performance compared to baseline approaches.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# クラウドコンピューティングにおける遅延可能ワークロードのオンラインスケジューリングのための高度な強化学習フレームワーク An Advanced Reinforcement Learning Framework for Online Scheduling of Deferrable Workloads in Cloud Computing ( http://arxiv.org/abs/2406.01047v1 ) ライセンス: Link先を確認	Hang Dong, Liwen Zhu, Zhao Shan, Bo Qiao, Fangkai Yang, Si Qin, Chuan Luo, Qingwei Lin, Yuwen Yang, Gurpreet Virdi, Saravan Rajmohan, Dongmei Zhang, Thomas Moscibroda,	(参考訳) 効率的なリソース利用と完全なユーザエクスペリエンスは通常、クラウドコンピューティングプラットフォームで互いに衝突します。リソース利用の増加には多大な努力が注がれているが、クラウドコンピューティングプラットフォームのユーザエクスペリエンスに影響を与えないように努力している。プラットフォーム全体に分散した残りのコンピューティングリソースをより有効活用するために、遅延可能なジョブには、ユーザに割引価格が提供される。この種の遅延可能なジョブに対しては、ユーザーは将来、柔軟な時間帯で特定の中断のない期間に、大きな割引で実行されるジョブを提出することができる。これらの遅延可能なジョブは、オンデマンドジョブをデプロイした後、残りのキャパシティの下でスケジュールされるため、高いリソース利用を達成するとともに、オンラインでの待ち時間を可能な限り短縮することが課題である。本稿では,クラウド上での遅延可能なジョブのオンラインスケジューリング手法であるOnline Scheduling for DEferrable jobs in Cloud (OSDEC)を提案する。統合強化学習フレームワークにより,提案手法は,高資源利用を維持しつつ,デプロイメントスケジュールを適切に計画し,ユーザの待ち時間を短縮することができる。提案手法は公開データセット上で検証され,優れた性能を示す。 Efficient resource utilization and perfect user experience usually conflict with each other in cloud computing platforms. Great efforts have been invested in increasing resource utilization while trying not to affect the user experience of cloud computing platforms. In order to better utilize the remaining pieces of computing resources spread over the whole platform, deferrable jobs are offered to users at a discounted price. For this type of job, users are allowed to submit jobs that will run for a specific uninterrupted duration in a flexible range of time in the future with a great discount. With these deferrable jobs to be scheduled under the remaining capacity after deploying the on-demand jobs, it remains a challenge to achieve high resource utilization while shortening the waiting time for users as much as possible in an online manner. 
In this paper, we propose an online deferrable job scheduling method called Online Scheduling for DEferrable jobs in Cloud (OSDEC), where a deep reinforcement learning model is adopted to learn the scheduling policy, and several auxiliary tasks are utilized to provide better state representations and improve the performance of the model. With the integrated reinforcement learning framework, the proposed method can well plan the deployment schedule and achieve a short waiting time for users while maintaining a high resource utilization for the platform. The proposed method is validated on a public dataset and shows superior performance.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# MACT:談話表現構造解析のためのモデル非依存型言語横断学習 MACT: Model-Agnostic Cross-Lingual Training for Discourse Representation Structure Parsing ( http://arxiv.org/abs/2406.01052v1 ) ライセンス: Link先を確認	Jiangming Liu,	(参考訳) Discourse Representation Structure (DRS) は、言語間の任意の長さのテキストの意味を捉えるために設計された、革新的な意味表現である。意味表現解析は論理形式による自然言語理解の実現に不可欠である。それでも、DRS解析モデルの性能は、モノリンガルデータのみに制限されている。この問題に対処するために、言語横断的なトレーニング戦略を導入する。提案手法はモデルに依存しないが,有効性が高い。言語間のトレーニングデータを活用し、事前訓練された言語モデルにエンコードされた言語間のアライメントを完全に活用する。標準ベンチマークで行った実験は、言語間学習法を用いて訓練されたモデルが、英語、ドイツ語、イタリア語、オランダ語でDRS節とグラフ解析を大幅に改善したことを示している。最終モデルと以前のモデルを比較すると、標準ベンチマークで最先端の結果が得られます。さらに、詳細な分析はパーサの性能について深い洞察を与え、DRS解析における将来の研究にインスピレーションを与える。ベンチマークの新しい結果を付録にアップデートし続けます。 Discourse Representation Structure (DRS) is an innovative semantic representation designed to capture the meaning of texts with arbitrary lengths across languages. The semantic representation parsing is essential for achieving natural language understanding through logical forms. Nevertheless, the performance of DRS parsing models remains constrained when trained exclusively on monolingual data. To tackle this issue, we introduce a cross-lingual training strategy. The proposed method is model-agnostic yet highly effective. It leverages cross-lingual training data and fully exploits the alignments between languages encoded in pre-trained language models. The experiments conducted on the standard benchmarks demonstrate that models trained using the cross-lingual training method exhibit significant improvements in DRS clause and graph parsing in English, German, Italian and Dutch. Comparing our final models to previous works, we achieve state-of-the-art results in the standard benchmarks. Furthermore, the detailed analysis provides deep insights into the performance of the parsers, offering inspiration for future research in DRS parsing. We keep updating new results on benchmarks to the appendix.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# 確率分布を用いた継続的疾患分類における信頼度に基づくタスク予測 Confidence-Based Task Prediction in Continual Disease Classification Using Probability Distribution ( http://arxiv.org/abs/2406.01054v1 ) ライセンス: Link先を確認	Tanvi Verma, Lukas Schwemer, Mingrui Tan, Fei Gao, Yong Liu, Huazhu Fu,	(参考訳) 深層学習モデルは、疾患分類における医療画像の発見に有効であることが広く認識されている。しかし、これらの制限は、様々なソースから新たに注釈付けされた医療データの継続的な流入を特徴とする、ダイナミックで絶え間なく変化する臨床環境において明らかになる。この文脈では、継続的な学習の必要性は、進化する医療シナリオに適応するだけでなく、医療データのプライバシーを確保するためにも特に重要となる。そこで本研究では,新しいタスクが導入されるたびに,新たな専門家分類器が付加される,専門家分類器からなるネットワークの利用を強調した。本稿では,信頼度スコアを利用したタスクID予測器CTPを提案し,分類器の確率分布(ロジット)を利用して,推論時のタスクIDを正確に決定する。ロジットは、分類器が自分自身以外のタスクに関連付けられたデータに対して高いエントロピー分布が得られるように調整される。分布のノイズ領域を定義し信頼度スコアを計算することにより、CTPは他の関連する連続学習手法と比較して優れた性能が得られる。さらに、推論時のデータの連続体を提供することにより、CTPの性能をさらに向上することができる。 Deep learning models are widely recognized for their effectiveness in identifying medical image findings in disease classification. However, their limitations become apparent in the dynamic and ever-changing clinical environment, characterized by the continuous influx of newly annotated medical data from diverse sources. In this context, the need for continual learning becomes paramount, not only to adapt to evolving medical scenarios but also to ensure the privacy of healthcare data. In our research, we emphasize the utilization of a network comprising expert classifiers, where a new expert classifier is added each time a new task is introduced. We present CTP, a task-id predictor that utilizes confidence scores, leveraging the probability distribution (logits) of the classifier to accurately determine the task-id at inference time. Logits are adjusted to ensure that classifiers yield a high-entropy distribution for data associated with tasks other than their own. By defining a noise region in the distribution and computing confidence scores, CTP achieves superior performance when compared to other relevant continual learning methods. 
Additionally, the performance of CTP can be further improved by providing it with a continuum of data at the time of inference.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# 要求品質研究成果物:回収・分析・管理指針 Requirements Quality Research Artifacts: Recovery, Analysis, and Management Guideline ( http://arxiv.org/abs/2406.01055v1 ) ライセンス: Link先を確認	Julian Frattini, Lloyd Montgomery, Davide Fucci, Michael Unterkalmsteiner, Daniel Mendez, Jannik Fischbach,	(参考訳) 要求品質調査は、要求仕様の品質を評価し改善することに特化したもので、データセット(品質欠陥に関する情報を含む)や実装(これらの欠陥を自動的に検出し、除去する)のような研究成果物に依存します。しかし、最近の研究では、これらの研究成果の大部分は入手できないか、公表されていないことが判明し、研究領域の進歩を阻害している。本研究は,要求品質研究における研究成果物の利用性の向上を目的としている。この目的のために,(1)人工物回収イニシアチブを拡張し,(2)ベイジアンデータ分析による人工物利用の理由を実証的に評価し,(3)オープンサイエンスアーティファクト開示のための簡潔なガイドラインをコンパイルする。その結果,回収した10データセットと7実装,時間とともにアーティファクトの可用性が向上する実証的サポート,パブリックホスティングサービスの肯定的な効果,コミュニティコメントのための実用的アーティファクト管理ガイドラインが得られた。本研究により、オープンサイエンスの原則への固執を奨励し、支援し、要求研究品質コミュニティのための研究アーティファクトの可用性を向上させることを期待する。 Requirements quality research, which is dedicated to assessing and improving the quality of requirements specifications, is dependent on research artifacts like data sets (containing information about quality defects) and implementations (automatically detecting and removing these defects). However, recent research exposed that the majority of these research artifacts have become unavailable or have never been disclosed, which inhibits progress in the research domain. In this work, we aim to improve the availability of research artifacts in requirements quality research. To this end, we (1) extend an artifact recovery initiative, (2) empirically evaluate the reasons for artifact unavailability using Bayesian data analysis, and (3) compile a concise guideline for open science artifact disclosure. Our results include 10 recovered data sets and 7 recovered implementations, empirical support for artifact availability improving over time and the positive effect of public hosting services, and a pragmatic artifact management guideline open for community comments. 
With this work, we hope to encourage and support adherence to open science principles and improve the availability of research artifacts for the requirements quality research community.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# 世界航法士としての仮想アバター生成モデル Virtual avatar generation models as world navigators ( http://arxiv.org/abs/2406.01056v1 ) ライセンス: Link先を確認	Sai Mandava,	(参考訳) 本稿では,仮想アバターを用いたロッククライミング環境における人間の動きをシミュレーションする新しいビデオモデルSABR-CLIMBを紹介する。拡散変換器は、各拡散ステップのノイズの代わりにサンプルを予測し、全動画を取り込み、完全なモーションシーケンスを出力する。大規模プロプライエタリなデータセット、NAV-22M、および相当量の計算資源を活用することで、ロボット工学、スポーツ、医療における複雑なタスクのために汎用仮想アバターを訓練するシステムの概念実証を示す。 We introduce SABR-CLIMB, a novel video model simulating human movement in rock climbing environments using a virtual avatar. Our diffusion transformer predicts the sample instead of noise in each diffusion step and ingests entire videos to output complete motion sequences. By leveraging a large proprietary dataset, NAV-22M, and substantial computational resources, we showcase a proof of concept for a system to train general-purpose virtual avatars for complex tasks in robotics, sports, and healthcare.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# VIP:マルチモーダル大規模言語モデルによる多用途な画像アウトペインティング VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model ( http://arxiv.org/abs/2406.01059v1 ) ライセンス: Link先を確認	Jinze Yang, Haoran Wang, Zining Zhu, Chenglong Liu, Meng Wymond Wu, Zeke Xie, Zhong Ji, Jungong Han, Mingming Sun,	(参考訳) 本稿では,画像の中心的内容から周囲の部分を外挿することを目的とした,画像アウトペインティングの問題の解決に焦点をあてる。最近の研究は有望なパフォーマンスを達成したが、汎用性とカスタマイズの欠如は、より広範なシナリオにおける実践的な応用を妨げる。そこで本研究では,ユーザの要求に応じて結果のカスタマイズが可能な,新たな画像アウトペインティングフレームワークを提案する。まず,画像のマスクされた部分とマスクされていない部分のテキスト記述を自動的に抽出し整理するマルチモーダル大規模言語モデル(MLLM)を利用する。そこで、得られたテキストプロンプトを導入して、アウトペインティング結果のカスタマイズを可能にする。さらに、画像の特定の空間領域とテキストプロンプトの対応する部分との相互作用を強化するために、Center-Total-Surrounding (CTS) と呼ばれる特別なCross-Attentionモジュールが精巧に設計されている。既存のほとんどの手法とは異なり、本手法はスクラッチから訓練されるのではなく、既製のStable Diffusion (SD) モデルでわずかに微調整されているため、非常に資源効率が高い。最後に、Scenery、Building、WikiArtの3つの一般的なデータセットの実験結果から、私たちのモデルはSoTAの手法を大幅に上回ることを示した。さらに、そのカスタマイズ能力を示すために、多彩なアウトペインティング結果がリストアップされる。 In this paper, we focus on resolving the problem of image outpainting, which aims to extrapolate the surrounding parts given the center contents of an image. Although recent works have achieved promising performance, the lack of versatility and customization hinders their practical applications in broader scenarios. Therefore, this work presents a novel image outpainting framework that is capable of customizing the results according to the requirement of users. First of all, we take advantage of a Multimodal Large Language Model (MLLM) that automatically extracts and organizes the corresponding textual descriptions of the masked and unmasked parts of a given image. Accordingly, the obtained text prompts are introduced to endow our model with the capacity to customize the outpainting results. In addition, a special Cross-Attention module, namely Center-Total-Surrounding (CTS), is elaborately designed to further enhance the interaction between specific space regions of the image and corresponding parts of the text prompts. 
Note that unlike most existing methods, our approach is very resource-efficient since it is just slightly fine-tuned on the off-the-shelf stable diffusion (SD) model rather than being trained from scratch. Finally, the experimental results on three commonly used datasets, i.e., Scenery, Building, and WikiArt, demonstrate our model significantly surpasses the SoTA methods. Moreover, versatile outpainting results are listed to show its customization ability.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# マグノ・オプトメカニクスにおける高次例外点まわりの力学力学 Mechanical dynamics around higher-order exceptional point in magno-optomechanics ( http://arxiv.org/abs/2406.01060v1 ) ライセンス: Link先を確認	Wen-Di He, Xiao-Hong Fan, Ming-Yue Liu, Guo-Qiang Zhang, Hai-Chao Li, Wei Xiong,	(参考訳) 実験的に実現可能なマグノオプトメカニクスにおける多種多様な例外点 (EP) を, 物理的に直接接触してマグノメカニクスサブシステムに結合したオプトメカニクスサブシステムを用いて理論的に検討した。空洞とキッテルモードの両方を断熱的に除去することにより、散逸時間およびパリティ時間対称な例外点が観察できる。キャビティモードのみを除去すると、非退化(退化)機械モードに対して第2(第3)次擬エルミートEPが出現する。これらのEPを取り巻く2つの力学モードの特異な力学挙動についてさらに研究した。提案手法は多種多様なEPを設計し,非エルミート相転移をマグノ・オプトメカニクスにおける異常な動的挙動で定量化するための有望な方法である。 We theoretically study diverse exceptional points (EPs) in an experimentally feasible magno-optomechanics consisting of an optomechanical subsystem coupled to a magnomechanical subsystem via physically direct contact. By adiabatically eliminating both the cavity and the Kittel mode, dissipative and parity-time symmetric exceptional points can be observed. When only the cavity mode is eliminated, a second (third) -order pseudo-Hermitian EP emerges for nondegenerate (degenerate) mechanical modes. The distinct dynamical behavior of two mechanical modes around these EPs are further studied. Our proposal provides a promising way to engineer diverse EPs and quantify non-Hermitian phase transition with exceptional dynamical behavior in magno-optomechanics.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# SceneTextGen:拡散モデルを用いたレイアウト非依存のシーンテキスト画像合成 SceneTextGen: Layout-Agnostic Scene Text Image Synthesis with Diffusion Models ( http://arxiv.org/abs/2406.01062v1 ) ライセンス: Link先を確認	Qilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, Praveen Krishnan,	(参考訳) 拡散モデルは画像生成の質を大幅に向上させてきたが、これらの画像内のテキストを正確かつコヒーレントにレンダリングする能力は依然として大きな課題である。従来の拡散に基づくシーンテキスト生成法は、中間レイアウト出力に依存して制限されるのが一般的である。この依存はしばしば、レイアウト生成フェーズの決定論的性質から生じる固有の制限である、テキストスタイルとフォントの制限された多様性をもたらす。これらの課題に対処するために,本稿では,事前定義されたレイアウトステージの必要性を回避するために設計された,新しい拡散ベースモデルであるSceneTextGenを紹介する。そうすることで、SceneTextGenはテキストのより自然で多様な表現を促進する。 SceneTextGenの斬新さは、3つの重要なコンポーネントの統合にある: 詳細なタイポグラフィ特性をキャプチャする文字レベルエンコーダと、文字レベルのインスタンスセグメンテーションモデルと、不要なテキスト生成とマイナーな文字不正確な問題に対処するワードレベルスポッティングモデルである。本手法の有効性は,標準拡散法とテキスト固有法を比較検討し,異なる公開視覚テキストデータセット間で生成した画像に対する文字認識率の向上を示すことで検証した。 While diffusion models have significantly advanced the quality of image generation, their capability to accurately and coherently render text within these images remains a substantial challenge. Conventional diffusion-based methods for scene text generation are typically limited by their reliance on an intermediate layout output. This dependency often results in a constrained diversity of text styles and fonts, an inherent limitation stemming from the deterministic nature of the layout generation phase. To address these challenges, this paper introduces SceneTextGen, a novel diffusion-based model specifically designed to circumvent the need for a predefined layout stage. By doing so, SceneTextGen facilitates a more natural and varied representation of text. The novelty of SceneTextGen lies in its integration of three key components: a character-level encoder for capturing detailed typographic properties, coupled with a character-level instance segmentation model and a word-level spotting model to address the issues of unwanted text generation and minor character inaccuracies. 
We validate the performance of our method by demonstrating improved character recognition rates on generated images across different public visual text datasets in comparison to both standard diffusion-based methods and text-specific methods.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# DANCE: データセット凝縮のためのデュアルビュー分散アライメント DANCE: Dual-View Distribution Alignment for Dataset Condensation ( http://arxiv.org/abs/2406.01063v1 ) ライセンス: Link先を確認	Hansong Zhang, Shikun Li, Fanzhao Lin, Weiping Wang, Zhenxing Qian, Shiming Ge,	(参考訳) データセット凝縮は、より大きな実際のトレーニングセットから本質的な知識を保持する小さな合成トレーニングセットを学習することで、データ負担の問題に対処する。これまでのところ、最先端のSOTA(State-of-the-art)の結果は、最適化指向の手法によって得られることが多いが、その非効率性は、現実的なデータセットへの適用を妨げる。一方、分散マッチング(DM)法は、最適化指向法と比較して、優れた効率性を示すが、準最適結果を示す。本稿では,内部クラスとクラス間の視点,すなわち永続的トレーニングと分散シフトから,現行のDMベースの手法の限界を明らかにする。これらの問題に対処するため,Dance(Dual-view Distribution AligNment for dataset CondEnsation)と呼ばれるDMベースの新しい手法を提案する。具体的には、内部クラスの観点からは、複数の「中間エンコーダ」を構築し、擬似的な長期分布アライメントを行い、コンデンサセットをトレーニングプロセス全体において実モデルの優れたプロキシとし、クラス間ビューでは、専門家モデルを用いて分布キャリブレーションを行い、コンデンサ中の合成データが実クラス領域に留まることを保証する。実験により,提案手法は様々なシナリオにおいて,元のDMに匹敵する効率を保ちながら,SOTA性能を実現することを示した。ソースコードはhttps://github.com/Hansong-Zhang/DANCEで入手できる。 Dataset condensation addresses the problem of data burden by learning a small synthetic training set that preserves essential knowledge from the larger real training set. To date, the state-of-the-art (SOTA) results are often yielded by optimization-oriented methods, but their inefficiency hinders their application to realistic datasets. On the other hand, the Distribution-Matching (DM) methods show remarkable efficiency but sub-optimal results compared to optimization-oriented methods. In this paper, we reveal the limitations of current DM-based methods from the inner-class and inter-class views, i.e., Persistent Training and Distribution Shift. To address these problems, we propose a new DM-based method named Dual-view distribution AligNment for dataset CondEnsation (DANCE), which exploits a few pre-trained models to improve DM from both inner-class and inter-class views. 
Specifically, from the inner-class view, we construct multiple "middle encoders" to perform pseudo long-term distribution alignment, making the condensed set a good proxy of the real one during the whole training process; while from the inter-class view, we use the expert models to perform distribution calibration, ensuring the synthetic data remains in the real class region during condensing. Experiments demonstrate the proposed method achieves a SOTA performance while maintaining comparable efficiency with the original DM across various scenarios. Source codes are available at https://github.com/Hansong-Zhang/DANCE.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# モデルに基づくオフライン強化学習の因果的促進 Causal prompting model-based offline reinforcement learning ( http://arxiv.org/abs/2406.01065v1 ) ライセンス: Link先を確認	Xuehui Yu, Yi Guan, Rujia Shen, Xin Li, Chen Tang, Jingchi Jiang,	(参考訳) モデルベースのオフライン強化学習(RL)では、エージェントは追加または非倫理的な探索を必要とせずに、事前にコンパイルされたデータセットを完全に活用することができる。しかし、モデルベースのオフラインRLをオンラインシステムに適用することは、主にオンラインシステムによって生成されるデータセットの高度に最適化された(ノイズに満ちた)多様な性質のため、課題を提起する。これらの課題に対処するために,高度に最適化されたリソース制約のあるオンラインシナリオ用に設計されたCausal Prompting Reinforcement Learning (CPRL)フレームワークを紹介する。 CPRLの最初のフェーズは、環境力学をモデル化するためのHidden-Parameter Block Causal Prompting Dynamic (Hip-BCPD)の導入である。このアプローチは、不変因果的プロンプトを利用し、新しい多様なオンラインユーザを一般化するために隠れパラメータを調整する。その後のフェーズでは、再利用可能なスキルの融合を通じて複数のタスクに対処するための単一のポリシーが訓練され、スクラッチからトレーニングの必要性を回避する。 Dnurse APPのシミュレーションベースおよび実世界のオフラインデータセットを含む、さまざまなレベルのノイズを持つデータセットに対して行われた実験は、提案手法が、分配外およびノイズの多い環境で堅牢な決定を行え、同時代のアルゴリズムより優れていることを示した。さらに,Hip-BCPDの貢献と,パフォーマンスの堅牢性に対するスキル再利用戦略を別途検証する。我々はHip-BCPDの視覚構造とサブスキルの解釈可能性をさらに分析する。私たちはソースコードと、正確な医療意思決定タスクのための世界初の実世界の医療データセットをリリースしました。 Model-based offline Reinforcement Learning (RL) allows agents to fully utilise pre-collected datasets without requiring additional or unethical explorations. However, applying model-based offline RL to online systems presents challenges, primarily due to the highly suboptimal (noise-filled) and diverse nature of datasets generated by online systems. To tackle these issues, we introduce the Causal Prompting Reinforcement Learning (CPRL) framework, designed for highly suboptimal and resource-constrained online scenarios. The initial phase of CPRL involves the introduction of the Hidden-Parameter Block Causal Prompting Dynamic (Hip-BCPD) to model environmental dynamics. This approach utilises invariant causal prompts and aligns hidden parameters to generalise to new and diverse online users. In the subsequent phase, a single policy is trained to address multiple tasks through the amalgamation of reusable skills, circumventing the need for training from scratch. 
Experiments conducted across datasets with varying levels of noise, including simulation-based and real-world offline datasets from the Dnurse APP, demonstrate that our proposed method can make robust decisions in out-of-distribution and noisy environments, outperforming contemporary algorithms. Additionally, we separately verify the contributions of Hip-BCPDs and the skill-reuse strategy to the robustness of performance. We further analyse the visualised structure of Hip-BCPD and the interpretability of sub-skills. We released our source code and the first ever real-world medical dataset for precise medical decision-making tasks.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# グラフ上の分布シフトに対するトポロジ対応動的再重み付け Topology-Aware Dynamic Reweighting for Distribution Shifts on Graph ( http://arxiv.org/abs/2406.01066v1 ) ライセンス: Link先を確認	Weihuang Zheng, Jiashuo Liu, Jiaxing Li, Jiayun Wu, Peng Cui, Youyong Kong,	(参考訳) グラフニューラルネットワーク(GNN)は、ノード分類タスクに広く使用されているが、トレーニングとテストノードが異なるディストリビューションから来ると、その実用性を制限するために一般化に失敗することが多い。これを解決するために、近年のアプローチでは、環境全体にわたって安定した予測方法を確立することを目的とした、アウト・オブ・ディストリビューション(OOD)一般化分野からの不変学習手法を採用している。しかし、これらの不変な仮定がグラフデータに適用可能であることは証明されておらず、そのような手法は理論的な確固たる支持を欠いていることが多い。本研究では,トポロジー・アウェア・ダイナミック・リウェイトリング(TAR)フレームワークを導入し,トレーニング中の幾何学的ワッサーシュタイン空間の勾配流を通して試料重量を動的に調整する。厳密な不変性の仮定に頼る代わりに,本手法が分散ロバスト性を提供できることを証明し,グラフデータに対する分布外一般化性能を向上させる。固有のグラフ構造を利用することで、TARは分散シフトを効果的に処理する。我々のフレームワークの優位性は、4つのグラフOODデータセットと3つのクラス不均衡ノード分類データセットの標準テストによって実証され、既存の手法よりも顕著に改善されている。 Graph Neural Networks (GNNs) are widely used for node classification tasks but often fail to generalize when training and test nodes come from different distributions, limiting their practicality. To overcome this, recent approaches adopt invariant learning techniques from the out-of-distribution (OOD) generalization field, which seek to establish stable prediction methods across environments. However, the applicability of these invariant assumptions to graph data remains unverified, and such methods often lack solid theoretical support. In this work, we introduce the Topology-Aware Dynamic Reweighting (TAR) framework, which dynamically adjusts sample weights through gradient flow in the geometric Wasserstein space during training. Instead of relying on strict invariance assumptions, we prove that our method is able to provide distributional robustness, thereby enhancing the out-of-distribution generalization performance on graph data. By leveraging the inherent graph structure, TAR effectively addresses distribution shifts. 
Our framework's superiority is demonstrated through standard testing on four graph OOD datasets and three class-imbalanced node classification datasets, exhibiting marked improvements over existing methods.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# UniQA: 画像品質と審美評価のための統合ビジョンランゲージ事前トレーニング UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment ( http://arxiv.org/abs/2406.01069v1 ) ライセンス: Link先を確認	Hantao Zhou, Longxiang Tang, Rui Yang, Guanyi Qin, Yan Zhang, Runze Hu, Xiu Li,	(参考訳) 画像品質評価(IQA)と画像審美評価(IAA)は、人間の視覚的品質と美的魅力に対する主観的知覚をシミュレートすることを目的としている。既存の手法は、異なる学習目的のために、これらのタスクを独立して扱うのが一般的である。しかし、両タスクの相互接続性は無視され、人間の主観的知覚に対するタスクに依存しない共有表現の学習を妨げる。この課題に対処するため、我々は2つのタスクの一般的な認識を学習するために、品質と美学の統一視覚言語事前学習(UniQA)を提案する。 IQAデータセットにおけるテキストの欠如とIAAデータセットにおけるテキストノイズの存在に対処するため,(1)マルチモーダル大規模言語モデル(MLLM)を用いて高品質なテキスト記述を生成し,(2)IAA向けに生成したテキストをメタデータとして用いて,ノイズの多いIAAデータを精製する。事前学習したUniQAを下流タスクに効果的に適応させるために,多目的キューを利用して事前学習したモデルの広範な知識をフル活用する軽量アダプタを提案する。本手法はIQAタスクとIAAタスクの両タスクにおいて,新たな最先端性能を実現するとともに,例外的なゼロショットとラベルの少ないイメージアセスメント機能を同時に実現していることを示す。ソースコードはhttps://github.com/zht8506/UniQAで公開される予定である。 Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate human subjective perception of image visual quality and aesthetic appeal. Existing methods typically address these tasks independently due to distinct learning objectives. However, they neglect the underlying interconnectedness of both tasks, which hinders the learning of task-agnostic shared representations for human subjective perception. To confront this challenge, we propose Unified vision-language pre-training of Quality and Aesthetics (UniQA), to learn general perceptions of two tasks, thereby benefiting them simultaneously. Addressing the absence of text in the IQA datasets and the presence of textual noise in the IAA datasets, (1) we utilize multimodal large language models (MLLMs) to generate high-quality text descriptions; (2) the generated text for IAA serves as metadata to purify noisy IAA data. To effectively adapt the pre-trained UniQA to downstream tasks, we further propose a lightweight adapter that utilizes versatile cues to fully exploit the extensive knowledge of the pre-trained model. 
Extensive experiments demonstrate that our approach attains a new state-of-the-art performance on both IQA and IAA tasks, while concurrently showcasing exceptional zero-shot and few-label image assessment capabilities. The source code will be available at https://github.com/zht8506/UniQA.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# ChatGPTによる高次ドメインサマリ生成の誘導 Guiding ChatGPT to Generate Salient Domain Summaries ( http://arxiv.org/abs/2406.01070v1 ) ライセンス: Link先を確認	Jun Gao, Ziqiang Cao, Shaoyao Huang, Luozheng Qin, Chunhui Ai,	(参考訳) ChatGPTは、ヒューマンフィードバックからの強化学習(Reinforcement Learning from Human Feedback, RLHF)を通じて、人間の嗜好に合わせるために、一般的な、人為的なコンテンツを生成するよう指示される。したがって、この場合、ChatGPTはゼロショット設定でドメイン要件を満たすことができず、ROUGEスコアが低い。 In-Context Learning (ICL) と ChatGPT のリテリング能力に触発された本論文では,ドメイン要約において ChatGPT を支援するパイプライン PADS(Pipeline for Assisting ChatGPT in Domain Summarization)を提案する。 PADSは、コーパスから類似した例を検索する検索器と、ChatGPTが生成した複数の候補要約をランク付けするランクモデルで構成される。具体的には、推論文書が与えられたら、最初に検索者を通してコンテキスト内デモを検索する。次に、ChatGPTは、検索したデモのガイダンスに基づいて、推論文書に対して$k$の候補要約を生成する必要がある。最後に、ランクモデルは、その品質に応じて$k$候補サマリーを独立にスコアし、最適なサマリーを選択する。提案手法を広範に検討し、参照のための効果的な実演を選択するとともに、各要約文書の候補要約の質を反映するランクモデルを効果的に訓練する。さらに、PADSにはランクモデルから派生した4億のトレーニング可能なパラメータが含まれており、トレーニングには2.5kのデータのみを収集する。その結果,PADSの各モジュールはChatGPTを効果的に誘導し,異なるドメイン要件に適合した有能な要約を生成することが示唆された。具体的には、一般的な要約データセットであるGigawordでは、PADSはゼロショット設定の単純なChatGPTと比較して、ROUGE-Lで+8以上のゲインを達成する。コードは https://github.com/jungao1106/PADS で入手できる。 ChatGPT is instruction-tuned to generate general and human-expected content to align with human preference through Reinforcement Learning from Human Feedback (RLHF), which in turn makes its generated responses not salient enough. Therefore, in this case, ChatGPT may fail to satisfy domain requirements in zero-shot settings, leading to poor ROUGE scores. Inspired by the In-Context Learning (ICL) and retelling ability of ChatGPT, this paper proposes PADS, a Pipeline for Assisting ChatGPT in Domain Summarization. PADS consists of a retriever to retrieve similar examples from corpora and a rank model to rerank the multiple candidate summaries generated by ChatGPT. Specifically, given an inference document, we first retrieve an in-context demonstration via the retriever. 
Then, we require ChatGPT to generate $k$ candidate summaries for the inference document at a time under the guidance of the retrieved demonstration. Finally, the rank model independently scores the $k$ candidate summaries according to their quality and selects the optimal one. We extensively explore dense and sparse retrieval methods to select effective demonstrations for reference and efficiently train the rank model to reflect the quality of candidate summaries for each given summarized document. Additionally, PADS contains merely 400M trainable parameters originating from the rank model and we merely collect 2.5k data to train it. We evaluate PADS on five datasets from different domains, and the result indicates that each module in PADS is committed to effectively guiding ChatGPT to generate salient summaries fitting different domain requirements. Specifically, in the popular summarization dataset Gigaword, PADS achieves over +8 gain on ROUGE-L, compared with the naive ChatGPT in the zero-shot setting. Our code is available at https://github.com/jungao1106/PADS.	翻訳日:2024-06-06 02:08:05 公開日:2024-06-03
# 合成画像データセット生成パイプラインによるビジュアルカーブランド分類 Visual Car Brand Classification by Implementing a Synthetic Image Dataset Creation Pipeline ( http://arxiv.org/abs/2406.01071v1 ) ライセンス: Link先を確認	Jan Lippemeier, Stefanie Hittmeyer, Oliver Niehörster, Markus Lange-Hegermann,	(参考訳) 近年の機械学習,特にディープラーニングとオブジェクト検出の進歩は,画像分類や合成など,様々なタスクのパフォーマンスを著しく向上させた。しかし、特に特定のユースケースを正確に表現したラベル付きデータを取得する際には、課題は継続する。本研究では,高精細な画像を生成可能な画像合成モデルであるStable Diffusionを用いて,合成画像データセットを生成するための自動パイプラインを提案する。 YOLOv8を用いて自動境界ボックス検出と合成画像の品質評価を行う。コントリビューションには、合成データのみに基づく画像分類器の訓練の実現可能性、画像生成パイプラインの自動化、そして我々のアプローチの計算要件の説明が含まれる。安定拡散の異なるモードのユーザビリティを評価し,75%の分類精度を実現する。 Recent advancements in machine learning, particularly in deep learning and object detection, have significantly improved performance in various tasks, including image classification and synthesis. However, challenges persist, particularly in acquiring labeled data that accurately represents specific use cases. In this work, we propose an automatic pipeline for generating synthetic image datasets using Stable Diffusion, an image synthesis model capable of producing highly realistic images. We leverage YOLOv8 for automatic bounding box detection and quality assessment of synthesized images. Our contributions include demonstrating the feasibility of training image classifiers solely on synthetic data, automating the image generation pipeline, and describing the computational requirements for our approach. We evaluate the usability of different modes of Stable Diffusion and achieve a classification accuracy of 75%.	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# スパイク活動に基づくプルーニングによる効率的なディープスパイクニューラルネットワーク構築に向けて Towards Efficient Deep Spiking Neural Networks Construction with Spiking Activity based Pruning ( http://arxiv.org/abs/2406.01072v1 ) ライセンス: Link先を確認	Yaxin Li, Qi Xu, Jiangrong Shen, Hongming Xu, Long Chen, Gang Pan,	(参考訳) 多様な複雑なデータセットにまたがって高いパフォーマンスを示す深層および大規模スパイクニューラルネットワーク(SNN)の出現は、その低消費電力と生物学的解釈可能性の利点をより効果的に活用することを目的として、かなりの数の冗長構造ユニットが存在するため、ネットワークモデルを圧縮する必要がある。現在、SNNのほとんどのモデル圧縮技術は、特定のハードウェアサポートを必要とする個々の接続の非構造化プルーニングに基づいている。そこで本稿では,Spking Channel Activity-based (SCA) network pruning frameworkという,畳み込みカーネルの動作レベルに基づく構造化プルーニング手法を提案する。本手法は, 学習中の畳み込みカーネルの切断・再生によりネットワーク構造を動的に調整し, 現在の目標タスクへの適応性を高める。モデル性能を維持しながら、このアプローチはネットワークアーキテクチャを洗練し、究極的には計算負荷を減らし、推論プロセスを加速する。このことは、構造化された動的スパース学習手法により、低消費電力・高効率シナリオにおける深部SNNの適用がより容易になることを示している。 The emergence of deep and large-scale spiking neural networks (SNNs) exhibiting high performance across diverse complex datasets has led to a need for compressing network models due to the presence of a significant number of redundant structural units, aiming to more effectively leverage their low-power consumption and biological interpretability advantages. Currently, most model compression techniques for SNNs are based on unstructured pruning of individual connections, which requires specific hardware support. Hence, we propose a structured pruning approach based on the activity levels of convolutional kernels named Spiking Channel Activity-based (SCA) network pruning framework. Inspired by synaptic plasticity mechanisms, our method dynamically adjusts the network's structure by pruning and regenerating convolutional kernels during training, enhancing the model's adaptation to the current target task. While maintaining model performance, this approach refines the network architecture, ultimately reducing computational load and accelerating the inference process. 
This indicates that structured dynamic sparse learning methods can better facilitate the application of deep SNNs in low-power and high-efficiency scenarios.	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# 映像ベースFew-Shot行動認識モデルのクロスドメイン能力の理解 Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models ( http://arxiv.org/abs/2406.01073v1 ) ライセンス: Link先を確認	Georgia Markham, Mehala Balamurali, Andrew J. Hill,	(参考訳) Few-shot Action Recognition (FSAR) は、ビデオ中の新しいアクションをわずかに例を使って識別できるモデルを学ぶことを目的としている。メタトレーニング中に見られるベースデータセットと、評価に使用される新しいデータセットは、異なるドメインから得ることができると仮定すると、クロスドメインの少ショット学習は、より監督的な方法や従来の(単ドメインの)少ショットメソッドで必要とされるデータ収集とアノテーションコストを軽減します。このような学習形態は画像分類のために広く研究されているが、クロスドメインFSAR(CD-FSAR)の研究は、既存のモデルのクロスドメイン能力を最初に理解するのではなく、モデルの提案に限られている。そこで本研究では,既存の単一ドメイン,転送ベース,およびクロスドメインFSARメソッドを,ベースと新規セット間のドメインシフトに基づいて,難易度の高い新しいクロスドメインタスクに対して体系的に評価する。実験的なメタアナリシスにより,領域差と下流数ショットのパフォーマンスの相関が明らかとなり,CD-FSARにどのモデル側面が有効か,さらなる開発が必要なのか,いくつかの重要な知見が得られた。すなわち、ドメイン差が大きくなるにつれて、単純な転送学習アプローチは、他の手法よりも12%以上のパフォーマンスを示し、これらの難易度の高いクロスドメイン設定の下では、特別化されたクロスドメインモデルが最も低い性能を達成する。また,従来の手法とよく似た,あるいは悪い性能を実現するために,時間的アライメントを用いた最先端の単一ドメインFSARモデルも見受けられ,既存の時間的アライメント手法は目に見えない領域を一般化できないことが示唆された。我々の知る限りでは、我々はCD-FSAR問題を詳細に体系的に研究した最初の人物である。私たちの研究で明らかになった洞察と課題は、これらの方向における今後の研究を刺激し、知らせてくれることを願っています。 Few-shot action recognition (FSAR) aims to learn a model capable of identifying novel actions in videos using only a few examples. In assuming the base dataset seen during meta-training and novel dataset used for evaluation can come from different domains, cross-domain few-shot learning alleviates data collection and annotation costs required by methods with greater supervision and conventional (single-domain) few-shot methods. While this form of learning has been extensively studied for image classification, studies in cross-domain FSAR (CD-FSAR) are limited to proposing a model, rather than first understanding the cross-domain capabilities of existing models. 
To this end, we systematically evaluate existing state-of-the-art single-domain, transfer-based, and cross-domain FSAR methods on new cross-domain tasks with increasing difficulty, measured based on the domain shift between the base and novel set. Our empirical meta-analysis reveals a correlation between domain difference and downstream few-shot performance, and uncovers several important insights into which model aspects are effective for CD-FSAR and which need further development. Namely, we find that as the domain difference increases, the simple transfer-learning approach outperforms other methods by over 12 percentage points, and under these more challenging cross-domain settings, the specialised cross-domain model achieves the lowest performance. We also witness state-of-the-art single-domain FSAR models which use temporal alignment achieving similar or worse performance than earlier methods which do not, suggesting existing temporal alignment techniques fail to generalise on unseen domains. To the best of our knowledge, we are the first to systematically study the CD-FSAR problem in depth. We hope the insights and challenges revealed in our study inspire and inform future work in these directions.	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# 温度制御SPDCによるナイルレッドの絡み合った2光子吸収の促進 Enhancing entangled two-photon absorption of Nile Red via temperature-controlled SPDC ( http://arxiv.org/abs/2406.01075v1 ) ライセンス: Link先を確認	Aleksa Krstić, Tobias Bernd Gäbler, Nitish Jain, Patrick Then, Valerio Flavio Gili, Sina Saravi, Frank Setzpfandt, Christian Eggeling, Markus Gräfe,	(参考訳) 絡み合った2光子吸収は、励起パワーに対する蛍光発光の線形スケーリングを可能にする。二次的なスケーリングを示す古典的な2光子吸収と比較して、これにより蛍光イメージングやフォトリソグラフィーを最小限の露光強度かつ高い軸方向分解能で行うことができる。しかし、2光子吸収に関するほとんどの実験的研究は、絡み合った光子対によって引き起こされる蛍光放出の明確な証明を示すことができなかった。一方、既存の理論モデルは、化学的に複雑な染料の絡み合った2光子吸収挙動を正確に予測するのに苦労している。本稿では, 一般的な蛍光染料における絡み合った2光子吸収を, その化学特性を考慮してシミュレートする手法を提案する。この理論モデルにより、実験結果、ひいては絡み合った2光子吸収の発生について、より深い理解が可能となる。特に, 吸収確率が非線形材料の位相整合温度に顕著に依存することがわかった。さらに、理論的アプローチの結果をナイルレッドの実験データと比較した。 Entangled two-photon absorption can enable a linear scaling of fluorescence emission with the excitation power. In comparison to classical two-photon absorption with a quadratic scaling, this can allow fluorescence imaging or photolithography with high axial resolution at minimal exposure intensities. However, most experimental studies on two-photon absorption were not able to show an unambiguous proof of fluorescence emission driven by entangled photon pairs. On the other hand, existing theoretical models struggle to accurately predict the entangled-two-photon-absorption behavior of chemically complex dyes. In this paper, we introduce an approach to simulate entangled two-photon absorption in common fluorescence dyes considering their chemical properties. Our theoretical model allows a deeper understanding of experimental results and thus the occurrence of entangled two-photon absorption. In particular, we found a remarkable dependency of the absorption probability on the phase-matching temperature of the nonlinear material. Further, we compared results of our theoretical approach to experimental data for Nile Red.	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# キャノピー高さをスケールで推定する Estimating Canopy Height at Scale ( http://arxiv.org/abs/2406.01076v1 ) ライセンス: Link先を確認	Jan Pauls, Max Zimmer, Una M. Kelly, Martin Schwartz, Sassan Saatchi, Philippe Ciais, Sebastian Pokutta, Martin Brandt, Fabian Gieseke,	(参考訳) 衛星データに基づく世界規模キャノピー高さ推定のためのフレームワークを提案する。提案手法は,地中高度測定に固有の位置不正確性に対抗するために設計された新しい損失関数を利用して,山間部における誤ったラベルを効果的にフィルタリングし,それらの領域における予測の信頼性を高める。 MAE/RMSEは総計2.43/4.73(メートル)、樹高は4.45/6.72(メートル)である。結果として得られた高さマップと基盤となるフレームワークは、大規模な森林やバイオマスモニタリングを含む、世界規模での生態学的分析を促進・促進する。 We propose a framework for global-scale canopy height estimation based on satellite data. Our model leverages advanced data preprocessing techniques, resorts to a novel loss function designed to counter geolocation inaccuracies inherent in the ground-truth height measurements, and employs data from the Shuttle Radar Topography Mission to effectively filter out erroneous labels in mountainous regions, enhancing the reliability of our predictions in those areas. A comparison between predictions and ground-truth labels yields an MAE / RMSE of 2.43 / 4.73 (meters) overall and 4.45 / 6.72 (meters) for trees taller than five meters, which depicts a substantial improvement compared to existing global-scale maps. The resulting height map as well as the underlying framework will facilitate and enhance ecological analyses at a global scale, including, but not limited to, large-scale forest and biomass monitoring.	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# CUT: コントロール可能な、ユニバーサルで、トレーニング不要なビジュアル異常生成フレームワーク CUT: A Controllable, Universal, and Training-Free Visual Anomaly Generation Framework ( http://arxiv.org/abs/2406.01078v1 ) ライセンス: Link先を確認	Han Sun, Yunkang Cao, Olga Fink,	(参考訳) 視覚異常検出(AD)は、異常データの不足により本質的に重大な課題に直面している。異常サンプルを合成するための多くの研究が提案されているが、生成されたサンプルは信頼性に欠けることが多く、利用可能なトレーニングデータサンプルの分布のみを反映できる。本研究では,画像生成における安定拡散(SD)の能力を生かして,多種多様な現実的な異常を生成する,制御可能・ユニバーサル・トレーニング不要な視覚異常生成フレームワークCUTを提案する。 CUTでは、新たなトレーニングを行なわずに単一のモデルを用いて、目に見えないデータと新しい異常タイプの両方にわたって、制御可能で現実的な異常生成を実現する。提案手法の有効性を示すために,視覚言語に基づく異常検出フレームワーク(VLAD)を提案する。生成した異常サンプルを用いてVLADモデルをトレーニングすることにより、いくつかのベンチマーク異常検出タスクで最先端のパフォーマンスを実現し、合成データによって実現された重要な改善点を浮き彫りにした。 Visual anomaly detection (AD) inherently faces significant challenges due to the scarcity of anomalous data. Although numerous works have been proposed to synthesize anomalous samples, the generated samples often lack authenticity or can only reflect the distribution of the available training data samples. In this work, we propose CUT: a Controllable, Universal and Training-free visual anomaly generation framework, which leverages the capability of Stable Diffusion (SD) in image generation to generate diverse and realistic anomalies. With CUT, we achieve controllable and realistic anomaly generation universally across both unseen data and novel anomaly types, using a single model without acquiring additional training effort. To demonstrate the effectiveness of our approach, we propose a Vision-Language-based Anomaly Detection framework (VLAD). By training the VLAD model with our generated anomalous samples, we achieve state-of-the-art performance on several benchmark anomaly detection tasks, highlighting the significant improvements enabled by our synthetic data.	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# 自己中心型オンライン行動検出を意識した物体認識 Object Aware Egocentric Online Action Detection ( http://arxiv.org/abs/2406.01079v1 ) ライセンス: Link先を確認	Joungbin An, Yunsu Park, Hyolim Kang, Seon Joo Kim,	(参考訳) Ego4D、EPIC-Kitchens、Ego-Exo4Dといったエゴセントリックなビデオデータセットの進歩は、拡張現実や生活支援の応用に欠かせない、一人称人間のインタラクションの研究を豊かにしている。これらの進歩にもかかわらず、ストリーミングビデオ中のアクションを効率的に検出する現在のオンラインアクション検出方法は、主に外向的な視点のために設計されており、したがって、自我中心の動画に固有のユニークな視点を生かしていない。このギャップに対処するため,既存のOADフレームワークにエゴセントリックな事前情報を統合したObject-Aware Moduleを導入し,一対一の映像解釈を強化した。我々のモジュールは、オブジェクト固有の詳細と時間的ダイナミクスを利用して、アクションの検出におけるシーン理解を改善する。 Epic-Kitchens 100データセットで広く検証された私たちの作業は、オーバーヘッドを最小限にして既存のモデルにシームレスに統合することができ、一貫したパフォーマンス向上を実現しています。 Advancements in egocentric video datasets like Ego4D, EPIC-Kitchens, and Ego-Exo4D have enriched the study of first-person human interactions, which is crucial for applications in augmented reality and assisted living. Despite these advancements, current Online Action Detection methods, which efficiently detect actions in streaming videos, are predominantly designed for exocentric views and thus fail to capitalize on the unique perspectives inherent to egocentric videos. To address this gap, we introduce an Object-Aware Module that integrates egocentric-specific priors into existing OAD frameworks, enhancing first-person footage interpretation. Utilizing object-specific details and temporal dynamics, our module improves scene understanding in detecting actions. Validated extensively on the Epic-Kitchens 100 dataset, our work can be seamlessly integrated into existing models with minimal overhead and bring consistent performance enhancements, marking an important step forward in adapting action detection systems to egocentric video analysis.	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# No Vandalism:プライバシ保護とビザンチン・ロバスト・フェデレーション・ラーニング No Vandalism: Privacy-Preserving and Byzantine-Robust Federated Learning ( http://arxiv.org/abs/2406.01080v1 ) ライセンス: Link先を確認	Zhibo Xing, Zijian Zhang, Zi'ang Zhang, Jiamou Liu, Liehuang Zhu, Giovanni Russello,	(参考訳) フェデレートされた学習により、複数のクライアントがプライベートデータを共有せずに1つの機械学習モデルを共同でトレーニングし、プライバシ保護を提供する。しかし、従来の連合学習はポイズニング攻撃に弱く、この攻撃はモデルの性能を低下させるだけでなく、悪意のあるバックドアを埋め込むこともできる。さらに、ローカルモデルパラメータの直接提出は、トレーニングデータセットのプライバシー漏洩につながる可能性がある。本稿では,悪意ある参加者からの攻撃に対して,破壊行為のない(No Vandalism, NoV)環境を提供するために,プライバシ保護とビザンチンロバスト性を備えたフェデレーション・ラーニング・スキームを構築することを目的とする。具体的には, 汚染されたローカルモデルに対するモデルフィルタを構築し, データおよびモデルのポイズニング攻撃からグローバルモデルを保護する。このモデルフィルタはゼロ知識証明を組み合わせて、さらなるプライバシー保護を提供する。そして、シークレット共有を採用して検証可能な安全なアグリゲーションを提供し、アグリゲーションプロセスを妨害する悪意のあるクライアントを排除する。我々の形式的な分析により、NoVがデータのプライバシーを保護し、ビザンチン攻撃者を排除できることが証明される。我々の実験は、NoVがPGDを含むデータおよびモデルのポイズニング攻撃に効果的に対処し、他の関連するスキームよりも優れていることを示した。 Federated learning allows several clients to train one machine learning model jointly without sharing private data, providing privacy protection. However, traditional federated learning is vulnerable to poisoning attacks, which can not only decrease the model performance, but also implant malicious backdoors. In addition, direct submission of local model parameters can also lead to the privacy leakage of the training dataset. In this paper, we aim to build a privacy-preserving and Byzantine-robust federated learning scheme to provide an environment with no vandalism (NoV) against attacks from malicious participants. Specifically, we construct a model filter for poisoned local models, protecting the global model from data and model poisoning attacks. This model filter combines zero-knowledge proofs to provide further privacy protection. Then, we adopt secret sharing to provide verifiable secure aggregation, removing malicious clients that disrupt the aggregation process. Our formal analysis proves that NoV can protect data privacy and weed out Byzantine attackers. 
Our experiments illustrate that NoV can effectively address data and model poisoning attacks, including PGD, and outperforms other related schemes.	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# 雑音チャネルにおけるコヒーレント状態重畳の適応 Adapting coherent-state superpositions in noisy channels ( http://arxiv.org/abs/2406.01081v1 ) ライセンス: Link先を確認	Jan Provazník, Petr Marek, Julien Laurat, Radim Filip,	(参考訳) 量子非ガウス状態は、非線形ボゾン系の基本的な理解と、量子技術における同時に高度な応用に不可欠である。多くのボソニックな実験において、重要な量子非ガウス的特徴は、ボソンによる量子計算の基礎であるウィグナー函数の負性である。残念なことに、複雑な量子状態に存在するネガティビティは、実験的な実装の避けられない部分である環境との結合によって引き起こされるエネルギー損失、ノイズ、嫌悪といったデコヒーレンスの影響に対して極めて脆弱である。その効果を緩和する効果的な方法は、量子状態をよりレジリエントな形式に適応させることである。本研究では,不斉熱損失チャネルの列に対するコヒーレント状態の重ね合わせを適切なスキューズ操作により最適に保護することを提案する。 Quantum non-Gaussian states are crucial for the fundamental understanding of non-linear bosonic systems and simultaneously advanced applications in quantum technologies. In many bosonic experiments the important quantum non-Gaussian feature is the negativity of the Wigner function, a cornerstone for quantum computation with bosons. Unfortunately, the negativities present in complex quantum states are extremely vulnerable to the effects of decoherence, such as energy loss, noise and dephasing, caused by the coupling to the environment, which is an unavoidable part of any experimental implementation. An efficient way to mitigate its effects is by adapting quantum states into more resilient forms. We propose an optimal protection of superpositions of coherent states against a sequence of asymmetric thermal lossy channels by suitable squeezing operations.	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# FedAdOb: 適応的難読化によるプライバシ保護型深層学習 FedAdOb: Privacy-Preserving Federated Deep Learning with Adaptive Obfuscation ( http://arxiv.org/abs/2406.01085v1 ) ライセンス: Link先を確認	Hanlin Gu, Jiahuan Luo, Yan Kang, Yuan Yao, Gongxi Zhu, Bowen Li, Lixin Fan, Qiang Yang,	(参考訳) フェデレーテッド・ラーニング(FL)は、複数のクライアントがプライベートデータを共有せずに、共同で機械学習モデルを学習できるコラボレーティブ・アプローチとして登場した。特定の条件下で実証されたプライバシー漏洩に関する懸念は、強力な攻撃方法の設計とこれらの攻撃方法の阻止を目的とした効果的な防御メカニズムに関する多くの追跡研究を引き起こしている。それでも、これらの防御手法で使用されるプライバシー保護メカニズムは、プライベートデータや勾配に適用される固定された難読化のために、しばしば妥協されたモデルパフォーマンスをもたらす。そこで本稿では,FedAdObと呼ばれる新しい適応難読化機構を提案する。技術的には、FedAdObはパスポートベースの適応難読化を利用して、水平および垂直の両方のフェデレーション学習環境におけるデータのプライバシを確保する。 FedAdObのプライバシー保護機能は、特にプライベート機能とラベルに関して、理論上はTheorems 1と2で証明されている。さらに、様々なデータセットやネットワークアーキテクチャに対して行われた広範な実験的評価により、プライバシ保護とモデル性能のトレードオフが既存の手法よりも優れていることを示すことにより、FedAdObの有効性が示された。 Federated learning (FL) has emerged as a collaborative approach that allows multiple clients to jointly learn a machine learning model without sharing their private data. The concern about privacy leakage, albeit demonstrated under specific conditions, has triggered numerous follow-up research in designing powerful attacking methods and effective defending mechanisms aiming to thwart these attacking methods. Nevertheless, privacy-preserving mechanisms employed in these defending methods invariably lead to compromised model performances due to a fixed obfuscation applied to private data or gradients. In this article, we, therefore, propose a novel adaptive obfuscation mechanism, coined FedAdOb, to protect private data without yielding original model performances. Technically, FedAdOb utilizes passport-based adaptive obfuscation to ensure data privacy in both horizontal and vertical federated learning settings. The privacy-preserving capabilities of FedAdOb, specifically with regard to private features and labels, are theoretically proven through Theorems 1 and 2. 
Furthermore, extensive experimental evaluations conducted on various datasets and network architectures demonstrate the effectiveness of FedAdOb by manifesting its superior trade-off between privacy preservation and model performance, surpassing existing methods.	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# ニューラルネットワークプルーニングのレンズによる効果的なサブセット選択 Effective Subset Selection Through The Lens of Neural Network Pruning ( http://arxiv.org/abs/2406.01086v1 ) ライセンス: Link先を確認	Noga Bar, Raja Giryes,	(参考訳) 大量の注釈付きデータを持つことは、ディープニューラルネットワークの有効性に大きな影響を及ぼす。しかし、医療データなど一部の領域では、アノテーションタスクは非常に高価である可能性がある。したがって、アノテートするデータを賢明に選択することが重要であり、これはサブセット選択問題として知られている。より広範に研究されているサブセット選択とニューラルネットワークプルーニングの関係について検討し,それらの対応性を確立する。ネットワークプルーニングからの洞察を活用し,ニューラルネットワーク特徴のノルム基準を利用してサブセット選択法を改善することを提案する。提案手法を様々なネットワークやデータセット上で実証的に検証し,精度を向上した。これは、サブセットの選択にプルーニングツールを使う可能性を示している。 Having large amounts of annotated data significantly impacts the effectiveness of deep neural networks. However, the annotation task can be very expensive in some domains, such as medical data. Thus, it is important to select the data to be annotated wisely, which is known as the subset selection problem. We investigate the relationship between subset selection and neural network pruning, which is more widely studied, and establish a correspondence between them. Leveraging insights from network pruning, we propose utilizing the norm criterion of neural network features to improve subset selection methods. We empirically validate our proposed strategy on various networks and datasets, demonstrating enhanced accuracy. This shows the potential of employing pruning tools for subset selection.	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# 平滑性制約下における線形力学系の連成学習 Joint Learning of Linear Dynamical Systems under Smoothness Constraints ( http://arxiv.org/abs/2406.01094v1 ) ライセンス: Link先を確認	Hemant Tyagi,	(参考訳) 複数の線形力学系の同時学習の問題を考察する。この問題は最近、モデルパラメータに関する様々なタイプの仮定の下で大きな注目を集めている。我々が考慮する設定は、$m$ 個の線形系の集合からなり、各系は与えられた無向グラフ $G = ([m], \mathcal{E})$ のノード上に存在する。系行列は限界安定であり、グラフ上の信号の二次変動に類似した、$G$ に関する滑らかさ制約を満たすと仮定する。$T$ 時点にわたるノードの状態が与えられたとき、システム行列を同時に推定するための2つの推定器を、平均二乗誤差(MSE)の非漸近誤差境界とともに提案する。特に、$m$ の増加に伴って MSE が 0 に収束する条件を示し、その収束は通常 $m$ に関して多項式的に速い。結果は $T$ に関する緩い仮定(すなわち $T \sim \log m$)の下で、場合によっては $T$ に関する仮定なし(すなわち $T \geq 2$)でも成り立つ。 We consider the problem of joint learning of multiple linear dynamical systems. This has received significant attention recently under different types of assumptions on the model parameters. The setting we consider involves a collection of $m$ linear systems each of which resides on a node of a given undirected graph $G = ([m], \mathcal{E})$. We assume that the system matrices are marginally stable, and satisfy a smoothness constraint w.r.t $G$ -- akin to the quadratic variation of a signal on a graph. Given access to the states of the nodes over $T$ time points, we then propose two estimators for joint estimation of the system matrices, along with non-asymptotic error bounds on the mean-squared error (MSE). In particular, we show conditions under which the MSE converges to zero as $m$ increases, typically polynomially fast w.r.t $m$. The results hold under mild (i.e., $T \sim \log m$), or sometimes, even no assumption on $T$ (i.e. $T \geq 2$).	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# 教師なし学習と教師付き学習の相乗化: 自然言語タスクモデリングの高精度化のためのハイブリッドアプローチ Synergizing Unsupervised and Supervised Learning: A Hybrid Approach for Accurate Natural Language Task Modeling ( http://arxiv.org/abs/2406.01096v1 ) ライセンス: Link先を確認	Wrick Talukdar, Anjanava Biswas,	(参考訳) 教師付き学習モデルは、様々な自然言語処理(NLP)タスクにおいて顕著なパフォーマンスを示しているが、その成功は大規模ラベル付きデータセットの可用性に大きく依存しており、その取得にはコストと時間がかかる。逆に、教師なし学習技術は、豊富なラベルのないテキストデータを利用してリッチな表現を学習するが、特定のNLPタスクに対して直接最適化するわけではない。本稿では,NLPタスクモデリングの精度を向上させるために,教師なし学習と教師付き学習を相乗化する新しいハイブリッド手法を提案する。教師付きモデルは特定のタスクで優れているが、大きなラベル付きデータセットに依存している。教師なしのテクニックは、豊富なラベルのないテキストからリッチな表現を学ぶことができるが、タスクを直接最適化することはできない。提案手法は,ラベルのないコーパスから表現(例えば,言語モデル,単語埋め込み)を学習する教師なしモジュールと,これらの表現を活用してタスク固有のモデルを強化する教師付きモジュールを統合する。我々は、テキスト分類と名前付きエンティティ認識(NER)に対するアプローチを評価し、教師付きベースラインよりも一貫したパフォーマンス向上を示す。テキスト分類では、言語モデルからの文脈単語埋め込みを用いて、リカレントまたはトランスフォーマーベースの分類器を事前訓練する。 NER の場合、単語埋め込みで BiLSTM シーケンスラベラーを初期化する。手法の相乗化により、我々のハイブリッドアプローチはベンチマークデータセット上でSOTAの結果を達成し、よりデータ効率が高くロバストなNLPシステムへの道を開く。 While supervised learning models have shown remarkable performance in various natural language processing (NLP) tasks, their success heavily relies on the availability of large-scale labeled datasets, which can be costly and time-consuming to obtain. Conversely, unsupervised learning techniques can leverage abundant unlabeled text data to learn rich representations, but they do not directly optimize for specific NLP tasks. This paper presents a novel hybrid approach that synergizes unsupervised and supervised learning to improve the accuracy of NLP task modeling. While supervised models excel at specific tasks, they rely on large labeled datasets. Unsupervised techniques can learn rich representations from abundant unlabeled text but don't directly optimize for tasks. Our methodology integrates an unsupervised module that learns representations from unlabeled corpora (e.g., language models, word embeddings) and a supervised module that leverages these representations to enhance task-specific models. 
We evaluate our approach on text classification and named entity recognition (NER), demonstrating consistent performance gains over supervised baselines. For text classification, contextual word embeddings from a language model pretrain a recurrent or transformer-based classifier. For NER, word embeddings initialize a BiLSTM sequence labeler. By synergizing techniques, our hybrid approach achieves SOTA results on benchmark datasets, paving the way for more data-efficient and robust NLP systems.	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# アルゴリズムによる決定木と森林の学習 Learning Decision Trees and Forests with Algorithmic Recourse ( http://arxiv.org/abs/2406.01098v1 ) ライセンス: Link先を確認	Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, Yuichi Ike,	(参考訳) 本稿では,表現行動の存在を保証しつつ,正確な木モデル学習のための新しいアルゴリズムを提案する。 Algorithmic Recourse(AR)は、モデルによって与えられる望ましくない予測結果を変更するためのリコースアクションを提供することを目的としている。典型的なARメソッドは、実行可能なアクション間で必要な労力を最小限に抑える最適化タスクを解くことで、合理的なアクションを提供する。しかし、実際には、予測性能に最適化されたモデルに対して、そのようなアクションが常に存在するとは限らない。この問題を緩和するために、できるだけ多くの事例に対して合理的な行動が存在することを保証する制約の下で、正確な分類木を学習するタスクを定式化する。そこで本稿では,対戦型学習手法を利用した効率的なトップダウングリーディアルゴリズムを提案する。また,本アルゴリズムは,木アンサンブルを学習するための一般的なフレームワークとして知られ,ランダムな森林に適用可能であることを示す。実験結果から,提案手法は精度と計算効率を著しく低下させることなく,ベースラインよりも多くのインスタンスに対して合理的な作用を与えることができた。 This paper proposes a new algorithm for learning accurate tree-based models while ensuring the existence of recourse actions. Algorithmic Recourse (AR) aims to provide a recourse action for altering the undesired prediction result given by a model. Typical AR methods provide a reasonable action by solving an optimization task of minimizing the required effort among executable actions. In practice, however, such actions do not always exist for models optimized only for predictive performance. To alleviate this issue, we formulate the task of learning an accurate classification tree under the constraint of ensuring the existence of reasonable actions for as many instances as possible. Then, we propose an efficient top-down greedy algorithm by leveraging the adversarial training techniques. We also show that our proposed algorithm can be applied to the random forest, which is known as a popular framework for learning tree ensembles. Experimental results demonstrated that our method successfully provided reasonable actions to more instances than the baselines without significantly degrading accuracy and computational efficiency.	翻訳日:2024-06-06 01:58:18 公開日:2024-06-03
# 連続動作を伴う弱結合型MDPの深部強化学習 Deep reinforcement learning for weakly coupled MDP's with continuous actions ( http://arxiv.org/abs/2406.01099v1 ) ライセンス: Link先を確認	Francisco Robledo, Urtzi Ayesta, Konstantin Avrachenkov,	(参考訳) 本稿では,連続行動空間と弱結合なMDP問題を対象とした強化学習アルゴリズムであるLagrange Policy for Continuous Actions (LPCA)を紹介する。 LPCAは、Q値計算のためのニューラルネットワークフレームワークにおいて、弱い結合のMDP問題のラグランジュ緩和を導入することで、連続行動に依存するリソース制約の課題に対処する。このアプローチはMDPを効果的に分離し、資源制約環境における効率的な政策学習を可能にする。グローバル最適化に差分進化を利用するLPCA-DEと,Q値勾配に基づいて行動を漸進的かつ貪欲に選択するLPCA-Greedyの2つのバリエーションを示す。他の最先端技術との様々な設定での比較分析では、報酬を最大化しつつ資源配分を管理するLPCAの堅牢性と効率性が強調されている。 This paper introduces the Lagrange Policy for Continuous Actions (LPCA), a reinforcement learning algorithm specifically designed for weakly coupled MDP problems with continuous action spaces. LPCA addresses the challenge of resource constraints dependent on continuous actions by introducing a Lagrange relaxation of the weakly coupled MDP problem within a neural network framework for Q-value computation. This approach effectively decouples the MDP, enabling efficient policy learning in resource-constrained environments. We present two variations of LPCA: LPCA-DE, which utilizes differential evolution for global optimization, and LPCA-Greedy, a method that incrementally and greedily selects actions based on Q-value gradients. Comparative analysis against other state-of-the-art techniques across various settings highlights LPCA's robustness and efficiency in managing resource allocation while maximizing rewards.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
# 商業格闘技におけるDRLエージェントの強化:トレーニング,統合,エージェント・ヒューマンアライメント Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment ( http://arxiv.org/abs/2406.01103v1 ) ライセンス: Link先を確認	Chen Zhang, Qiang He, Zhou Yuan, Elvis S. Liu, Hong Wang, Jian Zhao, Yang Wang,	(参考訳) Deep Reinforcement Learning (DRL)エージェントは、幅広いゲームジャンルで素晴らしい成功を収めている。しかし、既存の研究は主にDRLの能力の最適化に重点を置いており、長期にわたるプレイヤーインタラクションの課題には対処していない。本稿では,Shūkaiという名の格闘ゲーム向けの実用的なDRLエージェントシステムを提案する。本システムは、1億人以上の登録ユーザを持つ人気の格闘ゲームであるNaruto Mobileに実際に導入されている。 Shūkaiは、一般化性を高めるために状態を量子化し、バランスの取れた能力、一般化可能性、訓練効率を達成するためにヘテロジニアスリーグトレーニング(HELT)を導入する。さらに、Shūkaiは、エージェントの行動を人間の期待と一致させるために、特定の報酬を実装している。Shūkaiの一般化能力は、全キャラクタのうち13%でしか訓練されていないにもかかわらず、全キャラクタに対して一貫した能力を示すことで実証されている。さらに、HELTはサンプル効率を22%改善した。ShūkaiはNaruto Mobileのプレイヤーにとって貴重なトレーニングパートナーであり、プレイヤーが能力とスキルを高めることを可能にする。 Deep Reinforcement Learning (DRL) agents have demonstrated impressive success in a wide range of game genres. However, existing research primarily focuses on optimizing DRL competence rather than addressing the challenge of prolonged player interaction. In this paper, we propose a practical DRL agent system for fighting games named Shūkai, which has been successfully deployed to Naruto Mobile, a popular fighting game with over 100 million registered users. Shūkai quantifies the state to enhance generalizability, introducing Heterogeneous League Training (HELT) to achieve balanced competence, generalizability, and training efficiency. Furthermore, Shūkai implements specific rewards to align the agent's behavior with human expectations. Shūkai's ability to generalize is demonstrated by its consistent competence across all characters, even though it was trained on only 13% of them. Additionally, HELT exhibits a remarkable 22% improvement in sample efficiency. Shūkai serves as a valuable training partner for players in Naruto Mobile, enabling them to enhance their abilities and skills.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
# BACON: データセット蒸留のためのベイズ最適凝縮フレームワーク BACON: Bayesian Optimal Condensation Framework for Dataset Distillation ( http://arxiv.org/abs/2406.01112v1 ) ライセンス: Link先を確認	Zheng Zhou, Hongbo Zhao, Guangliang Cheng, Xiangtai Li, Shuchang Lyu, Wenquan Feng, Qi Zhao,	(参考訳) Dataset Distillation (DD)は、テストセットのパフォーマンスを維持しながら、広範なデータセットからよりコンパクトなデータセットに知識を抽出することを目的としており、ストレージコストとトレーニングコストを削減している。しかし、既存の手法は計算強度に悩まされることが多く、DD問題を解析するための堅牢な理論的枠組みが欠如しているため、特にデータセットサイズが大きな場合、最適以下の性能を示す。これらの課題に対処するために,ベイズ理論フレームワークをDDの文献に導入する最初の試みであるBACON(Bayesian optimal Condensation framework)を提案する。このフレームワークはDDの性能を高めるための理論的サポートを提供する。さらに、BACONは、ベイズフレームワークを用いた結合確率分布における予測リスク関数の最小化としてDD問題を定式化する。さらに、最適凝縮に対する予測リスク関数を解析することにより、特定の仮定に基づいて数値的に実現可能な下界を導出し、BACONの近似解を提供する。 BACONを複数のデータセットで検証し、既存の最先端手法と比較して優れた性能を示す。例えば、IPC-10設定下では、BACONはCIFAR-10データセットでIDM法よりも3.46%、TinyImageNetデータセットで3.10%の精度向上を達成する。広範な実験により、BACONの有効性と既存手法とのシームレスな統合性を確認し、DDタスクにおけるそれらの性能が向上することを示す。コードと蒸留されたデータセットはBACONで入手できる。 Dataset Distillation (DD) aims to distill knowledge from extensive datasets into more compact ones while preserving performance on the test set, thereby reducing storage costs and training expenses. However, existing methods often suffer from computational intensity, particularly exhibiting suboptimal performance with large dataset sizes due to the lack of a robust theoretical framework for analyzing the DD problem. To address these challenges, we propose the BAyesian optimal CONdensation framework (BACON), which is the first work to introduce the Bayesian theoretical framework to the literature of DD. This framework provides theoretical support for enhancing the performance of DD. Furthermore, BACON formulates the DD problem as the minimization of the expected risk function in joint probability distributions using the Bayesian framework. Additionally, by analyzing the expected risk function for optimal condensation, we derive a numerically feasible lower bound based on specific assumptions, providing an approximate solution for BACON. 
We validate BACON across several datasets, demonstrating its superior performance compared to existing state-of-the-art methods. For instance, under the IPC-10 setting, BACON achieves a 3.46% accuracy gain over the IDM method on the CIFAR-10 dataset and a 3.10% gain on the TinyImageNet dataset. Our extensive experiments confirm the effectiveness of BACON and its seamless integration with existing methods, thereby enhancing their performance for the DD task. Code and distilled datasets are available at BACON.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
# 動的命題をもつブール式による大域的解釈可能な分類器 Globally Interpretable Classifiers via Boolean Formulas with Dynamic Propositions ( http://arxiv.org/abs/2406.01114v1 ) ライセンス: Link先を確認	Reijo Jaakkola, Tomi Janhunen, Antti Kuusisto, Masood Feyzbakhsh Rankooh, Miikka Vilander,	(参考訳) 解釈可能性と説明可能性は、現代の人工知能において最も重要な課題の一つであり、様々な法令でも言及されている。本稿では,表形式データから人間が即座に解釈可能な分類器を抽出する手法を開発する。分類器は、カテゴリー属性から直接抽出するか、数値属性から動的に計算できる命題で構築された短いブール式の形で与えられる。提案手法はAnswer Set Programmingを用いて実装する。我々は7つのデータセットを調査し、その結果を表形式データに対する最先端の分類器、すなわちXGBoostとランダムフォレストで得られる結果と比較した。全てのデータセットにおいて,本手法で得られる精度は参照手法と同程度である。すべてのケースにおいて、我々の分類器の利点は、参照手法のブラックボックス性とは対照的に、非常に短く、人間が即座に理解できることである。 Interpretability and explainability are among the most important challenges of modern artificial intelligence, being mentioned even in various legislative sources. In this article, we develop a method for extracting immediately human interpretable classifiers from tabular data. The classifiers are given in the form of short Boolean formulas built with propositions that can either be directly extracted from categorical attributes or dynamically computed from numeric ones. Our method is implemented using Answer Set Programming. We investigate seven datasets and compare our results to ones obtainable by state-of-the-art classifiers for tabular data, namely, XGBoost and random forests. Over all datasets, the accuracies obtainable by our method are similar to the reference methods. The advantage of our classifiers in all cases is that they are very short and immediately human intelligible as opposed to the black-box nature of the reference methods.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
# Cohort Squeeze: クロスデバイスフェデレーション学習におけるコホート毎のコミュニケーションラウンドを超えて Cohort Squeeze: Beyond a Single Communication Round per Cohort in Cross-Device Federated Learning ( http://arxiv.org/abs/2406.01115v1 ) ライセンス: Link先を確認	Kai Yi, Timur Kharisov, Igor Sokolov, Peter Richtárik,	(参考訳) FedAvgを含む、事実上全てのフェデレーションラーニング(FL)メソッドは、以下の方法で動作する。(i) オーケストレーションサーバは、特定の規則により選択されたクライアントのコホートに現在のモデルパラメータを送信する。(ii) これらのクライアントは、それぞれ独自のトレーニングデータを用いて、独立してローカルトレーニング手順(例えば、SGDまたはAdam)を行う。(iii) 結果のモデルが集約のためにサーバに送られる。このプロセスは、適切な品質のモデルが見つかるまで繰り返される。これらの手法の注目すべき特徴は、各コホートがサーバとの単一の通信ラウンドにのみ関与していることである。本研究では、このアルゴリズム設計のプリミティブに挑戦し、単一の通信ラウンドで可能なものよりも、それぞれのコホートから「もっと多くを絞り出す」ことができるかどうかを検討する。驚いたことに、これは実際に可能であり、我々のアプローチはクロスデバイス設定でのFLモデルのトレーニングに必要な総通信コストを最大74%削減する。提案手法は、多数のクライアントサンプリング手順をサポートする確率的近接点法の新たな変種(SPPM-AS)に基づくものであり、その一部は古典的なクライアント選択手法と比較してさらなる改善をもたらす。 Virtually all federated learning (FL) methods, including FedAvg, operate in the following manner: i) an orchestrating server sends the current model parameters to a cohort of clients selected via certain rule, ii) these clients then independently perform a local training procedure (e.g., via SGD or Adam) using their own training data, and iii) the resulting models are shipped to the server for aggregation. This process is repeated until a model of suitable quality is found. A notable feature of these methods is that each cohort is involved in a single communication round with the server only. In this work we challenge this algorithmic design primitive and investigate whether it is possible to "squeeze more juice" out of each cohort than what is possible in a single communication round. Surprisingly, we find that this is indeed the case, and our approach leads to up to 74% reduction in the total communication cost needed to train a FL model in the cross-device setting. 
Our method is based on a novel variant of the stochastic proximal point method (SPPM-AS) which supports a large collection of client sampling procedures some of which lead to further gains when compared to classical client selection approaches.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
# 閉形式分類器を用いた不均一フェデレーション学習の高速化 Accelerating Heterogeneous Federated Learning with Closed-form Classifiers ( http://arxiv.org/abs/2406.01116v1 ) ライセンス: Link先を確認	Eros Fanì, Raffaello Camoriano, Barbara Caputo, Marco Ciccone,	(参考訳) フェデレートラーニング(FL)手法は、統計的に非常に不均一な設定でしばしば苦戦する。実際、非IIDデータ分布はクライアントドリフトと偏ったローカル解を引き起こし、これは特に最終分類層で顕著であり、収束速度と精度に悪影響を及ぼす。この問題に対処するため、Fed3R(Federated Recursive Ridge Regression)を導入する。本手法は、事前学習した特徴を利用して、閉形式で計算されるリッジ回帰分類器を当てはめる。 Fed3Rは統計的不均一性の影響を受けず、クライアントのサンプリング順序に対して不変である。そのため、クロスデバイスシナリオで特に有効である。さらに、通信コストと計算コストの面で高速かつ効率的であり、必要なリソースは競合手法より最大2桁少ない。最後に、Fed3Rパラメータをソフトマックス分類器の初期化として利用し、その後任意のFLアルゴリズムを用いてモデルを微調整すること(Fed3R with Fine-Tuning, Fed3R+FT)を提案する。また、分類器を固定したままにすることが、クロスデバイス設定におけるトレーニングの安定化と、より識別的な特徴の学習に役立つことも示された。公式サイト: https://fed-3r.github.io/ Federated Learning (FL) methods often struggle in highly statistically heterogeneous settings. Indeed, non-IID data distributions cause client drift and biased local solutions, particularly pronounced in the final classification layer, negatively impacting convergence speed and accuracy. To address this issue, we introduce Federated Recursive Ridge Regression (Fed3R). Our method fits a Ridge Regression classifier computed in closed form leveraging pre-trained features. Fed3R is immune to statistical heterogeneity and is invariant to the sampling order of the clients. Therefore, it proves particularly effective in cross-device scenarios. Furthermore, it is fast and efficient in terms of communication and computation costs, requiring up to two orders of magnitude fewer resources than the competitors. Finally, we propose to leverage the Fed3R parameters as an initialization for a softmax classifier and subsequently fine-tune the model using any FL algorithm (Fed3R with Fine-Tuning, Fed3R+FT). Our findings also indicate that maintaining a fixed classifier aids in stabilizing the training and learning more discriminative features in cross-device settings. Official website: https://fed-3r.github.io/.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
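
The closed-form idea in the Fed3R abstract can be illustrated with a minimal sketch. It assumes that each client sends the sufficient statistics FᵀF and FᵀY of its local features, and that the server sums them and solves one regularized linear system. The function names (`client_stats`, `aggregate`, `solve_ridge`, `predict`), the regularizer `lam`, and the toy data are hypothetical and are not taken from the paper. Because aggregation is a plain sum, the solution does not depend on the order in which clients are visited.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def client_stats(feats: Matrix, targets: Matrix) -> Tuple[Matrix, Matrix]:
    """Local statistics A_k = F^T F and B_k = F^T Y for one client."""
    d, c = len(feats[0]), len(targets[0])
    a = [[sum(f[i] * f[j] for f in feats) for j in range(d)] for i in range(d)]
    b = [[sum(f[i] * y[k] for f, y in zip(feats, targets)) for k in range(c)]
         for i in range(d)]
    return a, b


def add(m1: Matrix, m2: Matrix) -> Matrix:
    """Element-wise matrix sum."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def aggregate(stats: List[Tuple[Matrix, Matrix]]) -> Tuple[Matrix, Matrix]:
    """Server side: sum the client statistics in the order received."""
    a, b = stats[0]
    for a_k, b_k in stats[1:]:
        a, b = add(a, a_k), add(b, b_k)
    return a, b


def solve_ridge(a: Matrix, b: Matrix, lam: float) -> Matrix:
    """Solve (A + lam*I) W = B by Gauss-Jordan elimination with pivoting."""
    d = len(a)
    aug = [[a[i][j] + (lam if i == j else 0.0) for j in range(d)] + list(b[i])
           for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def predict(w: Matrix, x: List[float]) -> int:
    """Index of the class with the highest linear score."""
    scores = [sum(x[i] * w[i][k] for i in range(len(x))) for k in range(len(w[0]))]
    return max(range(len(scores)), key=lambda k: scores[k])


# Toy non-IID split: the first two clients each hold a single class.
clients = [
    ([[1.0, 0.0], [0.9, 0.1]], [[1.0, 0.0], [1.0, 0.0]]),
    ([[0.0, 1.0], [0.2, 0.8]], [[0.0, 1.0], [0.0, 1.0]]),
    ([[0.8, 0.3], [0.1, 1.1]], [[1.0, 0.0], [0.0, 1.0]]),
]
stats = [client_stats(f, y) for f, y in clients]
w_forward = solve_ridge(*aggregate(stats), lam=0.1)
w_reverse = solve_ridge(*aggregate(stats[::-1]), lam=0.1)
```

In this sketch `w_forward` and `w_reverse` agree up to floating-point rounding, which is the order-invariance property the abstract refers to. The recursive update and the fine-tuning stage described in the paper are not reproduced here.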
# カールマン・グラッド法による流体の量子シミュレーション Carleman-Grad approach to the quantum simulation of fluids ( http://arxiv.org/abs/2406.01118v1 ) ライセンス: Link先を確認	Claudio Sanavio, Enea Mauri, Sauro Succi,	(参考訳) グラッドの一般化流体力学に基づく古典流体の量子シミュレーションに対するカールマン線形化法について論じ、格子ボルツマンとナヴィエ・ストークスの定式化に基づく以前の研究と比較した。カールマン・グラッド法は両者の中間的性質を示す。すなわち、カールマン反復の数十の時間ステップへの収束と、量子線型代数解法を用いた潜在的に実行可能な量子回路の実装である。しかし、どちらの特徴も流体流のための実行可能な量子アルゴリズムを得るためにかなりの改善が必要である。 We discuss the Carleman linearization approach to the quantum simulation of classical fluids based on Grad's generalized hydrodynamics and compare it to previous investigations based on lattice Boltzmann and Navier-Stokes formulations. We show that the Carleman-Grad procedure exhibits intermediate properties between the two. Namely, convergence of the Carleman iteration over a few tens of timesteps and a potentially viable quantum circuit implementation using quantum linear algebra solvers. However, both features still need substantial improvements to yield a viable quantum algorithm for fluid flows.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
# $Δ$-DiT:拡散変換器のための訓練不要加速法 $Δ$-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers ( http://arxiv.org/abs/2406.01125v1 ) ライセンス: Link先を確認	Pengtao Chen, Mingzhu Shen, Peng Ye, Jianjian Cao, Chongjun Tu, Christos-Savvas Bouganis, Yiren Zhao, Tao Chen,	(参考訳) 拡散モデルは高品質で多様な画像を生成するために広く認識されているが、そのリアルタイム性能の低さは、主にUNetベースの構造に焦点をあてた多くの加速作業につながっている。拡散変圧器(DiT)によりより成功した結果により、DiT構造が生成に与える影響や、DiTアーキテクチャに合わせた加速度フレームワークが存在しないことに関して、まだ探索の余地がない。これらの課題に対処するため、我々は、DiTブロックと画像生成の相関について検討する。以上の結果から,DiTの前面ブロックは生成画像の輪郭に関連し,後方ブロックは細部に関連があることが判明した。そこで本研究では,初期サンプリング段階における後部DiTブロックと後期サンプリング段階における前部DiTブロックを高速化するためのキャッシュ機構を設計した,トレーニングフリー推論アクセラレーションフレームワークである$\Delta$-DiTを提案する。具体的には、前のサンプリング画像の入力を考慮し、推論のバイアスを低減する、$\Delta$-Cacheと呼ばれるDiT固有のキャッシュ機構を提案する。 PIXART-$\alpha$とDiT-XLの大規模な実験は、$\Delta$-DiTが20ステップ世代で1.6\times$のスピードアップを達成でき、ほとんどの場合パフォーマンスも向上することを示した。 4段階の一貫性のあるモデル生成とより困難な1.12\times$Accelerationのシナリオでは,提案手法は既存手法よりも大幅に優れている。私たちのコードは公開されます。 Diffusion models are widely recognized for generating high-quality and diverse images, but their poor real-time performance has led to numerous acceleration works, primarily focusing on UNet-based structures. With the more successful results achieved by diffusion transformers (DiT), there is still a lack of exploration regarding the impact of DiT structure on generation, as well as the absence of an acceleration framework tailored to the DiT architecture. To tackle these challenges, we conduct an investigation into the correlation between DiT blocks and image generation. Our findings reveal that the front blocks of DiT are associated with the outline of the generated images, while the rear blocks are linked to the details. Based on this insight, we propose an overall training-free inference acceleration framework $\Delta$-DiT: using a designed cache mechanism to accelerate the rear DiT blocks in the early sampling stages and the front DiT blocks in the later stages. 
Specifically, a DiT-specific cache mechanism called $\Delta$-Cache is proposed, which considers the inputs of the previous sampling image and reduces the bias in the inference. Extensive experiments on PIXART-$\alpha$ and DiT-XL demonstrate that the $\Delta$-DiT can achieve a $1.6\times$ speedup on the 20-step generation and even improves performance in most cases. In the scenario of 4-step consistent model generation and the more challenging $1.12\times$ acceleration, our method significantly outperforms existing methods. Our code will be publicly available.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
# TCMBench: 伝統中国医学における大規模言語モデル評価のための総合ベンチマーク TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine ( http://arxiv.org/abs/2406.01126v1 ) ライセンス: Link先を確認	Wenjing Yue, Xiaoling Wang, Wei Zhu, Ming Guan, Huanran Zheng, Pengfei Wang, Changzhi Sun, Xin Ma,	(参考訳) 大規模言語モデル(LLM)は、西洋医学領域を含む様々な自然言語処理タスクのベンチマークにおいて、著しく高い性能を示している。しかしながら、深い歴史と大きな影響力を持つ伝統中国医学(TCM)領域では、LLMの専門的な評価ベンチマークはまだ整備されていない。そこで本研究では,TCMにおけるLLM性能を評価するための総合的なベンチマークであるTCM-Benchを紹介する。 TCM-Benchは、TCMLE(TCM Licensing Exam)から得られた5,473問からなるTCM-EDデータセットで構成され、うち1,300問には権威ある解説が付いている。 TCMの基礎と臨床実践を含む、TCMLEのコアコンポーネントをカバーしている。質問応答の精度だけにとどまらずLLMを評価するために,TCM関連の質問に対してLLMが生成する回答の質を評価するための指標であるTCMScoreを提案する。 TCMScoreはTCMのセマンティクスと知識の一貫性を総合的に考慮する。様々な観点から総合的な実験分析を行った結果,以下の知見が得られた。 (1) このベンチマークにおけるLLMの不十分な性能は,TCMにおいて大幅な改善の余地があることを浮き彫りにした。 (2) ドメイン知識の導入により, LLMの性能が向上する。しかし、ZhongJing-TCMのようなドメイン内モデルでは、生成した解説テキストの品質が低下しており、それらの微調整プロセスが基本的なLLM能力に影響を与えていると仮定する。 (3) Rouge や BertScore のようなテキスト生成品質の従来の指標は,テキストの長さや表層的な意味の曖昧さに影響を受けやすいが,TCMScore のようなドメイン固有の指標は,その評価結果をさらに補完し,説明することができる。これらの知見は,TCM における LLM の能力と限界を明らかにし,医療研究により深い支援を提供することを目的としている。 Large language models (LLMs) have performed remarkably well in various natural language processing tasks by benchmarking, including in the Western medical domain. However, the professional evaluation benchmarks for LLMs have yet to be covered in the traditional Chinese medicine (TCM) domain, which has a profound history and vast influence. To address this research gap, we introduce TCM-Bench, a comprehensive benchmark for evaluating LLM performance in TCM. It comprises the TCM-ED dataset, consisting of 5,473 questions sourced from the TCM Licensing Exam (TCMLE), including 1,300 questions with authoritative analysis. It covers the core components of TCMLE, including TCM basis and clinical practice. To evaluate LLMs beyond accuracy of question answering, we propose TCMScore, a metric tailored for evaluating the quality of answers generated by LLMs for TCM related questions. 
It comprehensively considers the consistency of TCM semantics and knowledge. After conducting comprehensive experimental analyses from diverse perspectives, we can obtain the following findings: (1) The unsatisfactory performance of LLMs on this benchmark underscores their significant room for improvement in TCM. (2) Introducing domain knowledge can enhance LLMs' performance. However, for in-domain models like ZhongJing-TCM, the quality of generated analysis text has decreased, and we hypothesize that their fine-tuning process affects the basic LLM capabilities. (3) Traditional metrics for text generation quality like Rouge and BertScore are susceptible to text length and surface semantic ambiguity, while domain-specific metrics such as TCMScore can further supplement and explain their evaluation results. These findings highlight the capabilities and limitations of LLMs in the TCM and aim to provide a more profound assistance to medical research.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
# マルチモーダル顕著物体検出のための適応型融合バンクの学習 Learning Adaptive Fusion Bank for Multi-modal Salient Object Detection ( http://arxiv.org/abs/2406.01127v1 ) ライセンス: Link先を確認	Kunpeng Wang, Zhengzheng Tu, Chenglong Li, Cheng Zhang, Bin Luo,	(参考訳) マルチモーダル顕著物体検出(MSOD)は、可視光画像を深度や熱赤外画像と統合することにより、顕著性検出性能を向上させることを目的としている。既存の方法は通常、特定の問題や課題を扱うために異なる融合スキームを設計する。これらの融合スキームは特定の問題や課題に対処するのに効果的であるが、複数の複雑な課題を同時に扱うのに苦労する可能性がある。そこで本研究では,頑健なMSODのために,基本的な融合スキーム群の相補的な利点をフル活用して様々な課題を同時に扱う、新しい適応型融合バンクを提案する。我々は,MSODにおける5つの大きな課題,すなわち中心バイアス,スケール変動,画像クラッタ,低照度,熱的クロスオーバーあるいは深度あいまいさの対処に重点を置いている。提案した融合バンクは5つの代表的な融合スキームから構成されており、それぞれの課題の特徴に基づいて特別に設計されている。バンクはスケーラブルで、さらなる課題のために、より多くの融合スキームをバンクに組み込むことができる。マルチモーダル入力に対する適切な融合スキームを適応的に選択するために,適応型融合バンクを形成する適応型アンサンブルモジュールを導入する。さらに,高レベルな意味情報と低レベルな空間的詳細をスキップ統合することで,中空の顕著物体を正確に検出するための間接的対話型誘導モジュールを設計する。 3つのRGBTデータセットと7つのRGBDデータセットに対する大規模な実験により、提案手法が最先端の手法と比較して優れた性能を達成することを示した。コードと結果は https://github.com/Angknpng/LAFB で公開されている。 Multi-modal salient object detection (MSOD) aims to boost saliency detection performance by integrating visible sources with depth or thermal infrared ones. Existing methods generally design different fusion schemes to handle certain issues or challenges. Although these fusion schemes are effective at addressing specific issues or challenges, they may struggle to handle multiple complex challenges simultaneously. To solve this problem, we propose a novel adaptive fusion bank that makes full use of the complementary benefits from a set of basic fusion schemes to handle different challenges simultaneously for robust MSOD. We focus on handling five major challenges in MSOD, namely center bias, scale variation, image clutter, low illumination, and thermal crossover or depth ambiguity. The fusion bank proposed consists of five representative fusion schemes, which are specifically designed based on the characteristics of each challenge, respectively. 
The bank is scalable, and more fusion schemes could be incorporated into the bank for more challenges. To adaptively select the appropriate fusion scheme for multi-modal input, we introduce an adaptive ensemble module that forms the adaptive fusion bank, which is embedded into hierarchical layers for sufficient fusion of different source data. Moreover, we design an indirect interactive guidance module to accurately detect salient hollow objects via the skip integration of high-level semantic information and low-level spatial details. Extensive experiments on three RGBT datasets and seven RGBD datasets demonstrate that the proposed method achieves the outstanding performance compared to the state-of-the-art methods. The code and results are available at https://github.com/Angknpng/LAFB.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
# SAVA: スケーラブルな学習非依存データ評価 SAVA: Scalable Learning-Agnostic Data Valuation ( http://arxiv.org/abs/2406.01130v1 ) ライセンス: Link先を確認	Samuel Kessler, Tam Le, Vu Nguyen,	(参考訳) 大規模でWebスクレイピングされた実データセットには、個々のデータポイントの品質と関連性に影響を与えるノイズの多いアーティファクトが含まれているため、機械学習モデルのトレーニングに適したデータを選択することが重要である。これらのアーティファクトは、モデルの性能と汎化に影響を与える。我々は、この問題をデータ評価タスクとして定式化し、クリーンでキュレートされた検証セットとの類似性や非類似性に応じて、トレーニングセット内のデータポイントに値を割り当てる。近年,LAVA (Just et al. 2023) は,大規模でノイズの多いトレーニングデータセットとクリーンな検証セットとの間の最適輸送(OT)を用いることで、モデル性能に依存せずに効率的にトレーニングデータを評価できることを示した。しかし、LAVAアルゴリズムは入力としてデータセット全体を必要とするため、大規模なデータセットへの適用が制限される。データセット全体ではなくデータポイントのバッチ上で計算を行う確率的(勾配)アプローチのスケーラビリティに着想を得て,計算をデータポイントのバッチ上で行うLAVAのスケーラブルな変種であるSAVAを提案する。直感的には、SAVAは、階層的に定義されたOTをデータ評価に利用するLAVAと同じスキームに従う。しかし、LAVAがデータセット全体を処理するのに対し、SAVAはデータセットをデータポイントのバッチに分割し、これらのバッチ上でOT問題の計算を実行する。 SAVAが数百万のデータポイントを持つ大規模なデータセットにスケール可能であり、データ評価の性能を犠牲にしないことを実証するために、広範な実験を行う。 Selecting suitable data for training machine learning models is crucial since large, web-scraped, real datasets contain noisy artifacts that affect the quality and relevance of individual data points. These artifacts will impact the performance and generalization of the model. We formulate this problem as a data valuation task, assigning a value to data points in the training set according to how similar or dissimilar they are to a clean and curated validation set. Recently, LAVA (Just et al. 2023) successfully demonstrated the use of optimal transport (OT) between a large noisy training dataset and a clean validation set, to value training data efficiently, without the dependency on model performance. However, the LAVA algorithm requires the whole dataset as an input, which limits its application to large datasets. Inspired by the scalability of stochastic (gradient) approaches which carry out computations on batches of data points instead of the entire dataset, we analogously propose SAVA, a scalable variant of LAVA with its computation on batches of data points. 
Intuitively, SAVA follows the same scheme as LAVA which leverages the hierarchically defined OT for data valuation. However, while LAVA processes the whole dataset, SAVA divides the dataset into batches of data points, and carries out the OT problem computation on those batches. We perform extensive experiments, to demonstrate that SAVA can scale to large datasets with millions of data points and doesn't trade off data valuation performance.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
# Favi-Score: 生成AI評価のための自動選好評価におけるえこひいきの尺度 Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation ( http://arxiv.org/abs/2406.01131v1 ) ライセンス: Link先を確認	Pius von Däniken, Jan Deriu, Don Tuggener, Mark Cieliebak,	(参考訳) 生成AIシステムは、あらゆる種類のモダリティにおいてユビキタスなものとなり、そのようなモデルの評価の問題はより差し迫ったものになっている。 1つの一般的なアプローチは選好評価であり、異なるシステムが生成した出力を評価者に提示し、評価者が好みのものを選ぶ。近年、この分野は生成された出力を評価する自動(学習済み)メトリクスの開発に移行しており、これらは選好評価を自動的に作成するために利用できる。本研究では,現在は人間の判断との相関の測定や符号精度(sign accuracy)スコアの計算に頼っている、メトリクス自体の評価について検討する。これらの尺度は、メトリクスが人間の評価とどの程度うまく一致しているかを評価するだけである。しかし、我々の研究は、これが全体像を示さないことを示している。ほとんどのメトリクスは人間によるシステム評価との不一致を示し、その不一致はしばしば特定のテキスト生成システムに有利な方向に偏っており、自動メトリクスにある程度のえこひいきがあることを露呈している。本稿では、選好メトリクスにおけるえこひいきの形式的定義を導入し、この現象を測るFavi-Scoreを導出する。特に、えこひいきが最終的なシステムランキングの誤りに強く関係していることを示す。そこで本稿では,符号精度スコアとえこひいきの両面から,選好に基づくメトリクスを評価することを提案する。 Generative AI systems have become ubiquitous for all kinds of modalities, which makes the issue of the evaluation of such models more pressing. One popular approach is preference ratings, where the generated outputs of different systems are shown to evaluators who choose their preferences. In recent years the field shifted towards the development of automated (trained) metrics to assess generated outputs, which can be used to create preference ratings automatically. In this work, we investigate the evaluation of the metrics themselves, which currently rely on measuring the correlation to human judgments or computing sign accuracy scores. These measures only assess how well the metric agrees with the human ratings. However, our research shows that this does not tell the whole story. Most metrics exhibit a disagreement with human system assessments which is often skewed in favor of particular text generation systems, exposing a degree of favoritism in automated metrics. This paper introduces a formal definition of favoritism in preference metrics, and derives the Favi-Score, which measures this phenomenon. In particular we show that favoritism is strongly related to errors in final system rankings. 
Thus, we propose that preference-based metrics ought to be evaluated on both sign accuracy scores and favoritism.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
# デバイス独立量子乱数生成の検討 Investigating a Device Independence Quantum Random Number Generation ( http://arxiv.org/abs/2406.01132v1 ) ライセンス: Link先を確認	Vardaan Mongia, Abhishek Kumar, Shashi Prabhakar, Anindya Banerji, R. P. Singh,	(参考訳) QRNG(Quantum random number generation)は、暗号分野において必要となるリソースである。しかし、その認証は難しかった。本稿では,デバイス独立設定における量子絡み合いの助けを借りてランダム性を証明し,ソース特性化のための2光子干渉を選択する。 CHSH不等式違反と量子状態トモグラフィーは、測定装置の独立チェックとして使用される。これらの測度は、量子乱数生成の予測不可能性を保証する。この処理は、高速なランダム性拡張プロトコルに容易に拡張できる。 Quantum random number generation (QRNG) is a resource that is a necessity in the field of cryptography. However, its certification has been challenging. In this article, we certify randomness with the aid of quantum entanglement in a device independent setting, where we choose two-photon interference for source characterisation. The CHSH inequality violation and quantum state tomography are used as independent checks on the measurement devices. These measures ensure the unpredictability of quantum random number generation. This work can be easily extended to faster randomness expansion protocols.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
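
The CHSH check mentioned in the abstract above can be illustrated numerically. This is a minimal sketch under stated assumptions: an ideal singlet state with correlation E(a, b) = -cos(a - b) and the textbook choice of analyser angles. The function names are hypothetical, and the sketch does not reproduce the paper's photonic setup or its data analysis. Any local hidden-variable model satisfies |S| ≤ 2, while quantum mechanics allows values up to 2√2.

```python
import math


def correlation(a: float, b: float) -> float:
    """Ideal singlet-state correlation for analyser angles a and b (radians)."""
    return -math.cos(a - b)


def chsh(a0: float, a1: float, b0: float, b1: float) -> float:
    """CHSH combination S = E(a0,b0) - E(a0,b1) + E(a1,b0) + E(a1,b1)."""
    return (correlation(a0, b0) - correlation(a0, b1)
            + correlation(a1, b0) + correlation(a1, b1))


# Textbook angle choice that maximizes |S| for the singlet state.
s = chsh(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
violates_classical_bound = abs(s) > 2.0
```

With these angles `abs(s)` equals 2√2, so the classical bound of 2 is violated. In an experiment the correlations would be estimated from coincidence counts rather than from a formula.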
# 内部の危険 - ビジネスプロセスモデルを用いたインサイダー脅威モデリング The Danger Within: Insider Threat Modeling Using Business Process Models ( http://arxiv.org/abs/2406.01135v1 ) ライセンス: Link先を確認	Jan von der Assen, Jasmin Hochuli, Thomas Grübl, Burkhard Stiller,	(参考訳) 脅威モデリングは、情報システム内の技術的脅威のモデル化に成功している。しかし、非技術資産とその表現に焦点を当てた手法の欠如は理論や実践において観察できる。業界実践者の声に続き、ビジネスプロセスモデルに基づいてインサイダー脅威をモデル化する方法を考察した。そこで本研究では、BPMN(Business Process Modeling and Notation)を活用した、新たなインサイダー脅威知識ベースと脅威モデリングアプリケーションを開発した。最後に、理論的な知識とそのプロトタイプがいかに実践されるかを理解するため、本研究では、ITプロバイダのビジネスプロセスと、実際の投票プロセスのための実験的なデプロイの実際のケーススタディを実施した。その結果は、アノテーションなしでもBPMNダイアグラムを利用して組織内の脅威を自動的に識別できることを示している。 Threat modeling has been successfully applied to model technical threats within information systems. However, a lack of methods focusing on non-technical assets and their representation can be observed in theory and practice. Following the voices of industry practitioners, this paper explored how to model insider threats based on business process models. Hence, this study developed a novel insider threat knowledge base and a threat modeling application that leverages Business Process Modeling and Notation (BPMN). Finally, to understand how well the theoretic knowledge and its prototype translate into practice, the study conducted a real-world case study of an IT provider's business process and an experimental deployment for a real voting process. The results indicate that even without annotation, BPMN diagrams can be leveraged to automatically identify insider threats in an organization.	翻訳日:2024-06-06 01:48:31 公開日:2024-06-03
# 深さ制限付き認識論的プランニング Depth-Bounded Epistemic Planning ( http://arxiv.org/abs/2406.01139v1 ) ライセンス: Link先を確認	Thomas Bolander, Alessandro Burigana, Marco Montali,	(参考訳) 本稿では,動的認識論理(DEL)に基づく新しい認識論的プランニングアルゴリズムを提案する。新規性は、プランニングエージェントの推論の深さを上限bに制限する点にあり、これはプランニングエージェントが高階の知識について高々(様相)深さbまでしか推論できないことを意味する。このアルゴリズムは、b-双模倣に関して一意な極小モデルを保証する新しい種類の正準b-双模倣縮約を利用する。この深さ制限付きプランニングアルゴリズムが健全であることを示す。さらに、推論深さの上限b以内に解を持つプランニングタスクに関して完全であることを示す(したがって、上限を反復的に深くする変種は標準的な意味で完全である)。推論深さの上限bに対して、アルゴリズムは(b + 1)-EXPTIME完全であり、さらにエージェント数と原子命題数に関して固定パラメータ容易(fixed-parameter tractable)であることが示される。本稿では,木探索版とグラフ探索版の両方を提示し,木探索版の実装をベースラインの認識論的プランナーと比較してベンチマークする。 In this paper, we propose a novel algorithm for epistemic planning based on dynamic epistemic logic (DEL). The novelty is that we limit the depth of reasoning of the planning agent to an upper bound b, meaning that the planning agent can only reason about higher-order knowledge to at most (modal) depth b. The algorithm makes use of a novel type of canonical b-bisimulation contraction guaranteeing unique minimal models with respect to b-bisimulation. We show our depth-bounded planning algorithm to be sound. Additionally, we show it to be complete with respect to planning tasks having a solution within bound b of reasoning depth (and hence the iterative bound-deepening variant is complete in the standard sense). For bound b of reasoning depth, the algorithm is shown to be (b + 1)-EXPTIME complete, and furthermore fixed-parameter tractable in the number of agents and atoms. We present both a tree search and a graph search variant of the algorithm, and we benchmark an implementation of the tree search version against a baseline epistemic planner.	翻訳日:2024-06-06 01:38:29 公開日:2024-06-03
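
The bound b in the abstract above limits the modal depth of the formulas the planner reasons about. The following minimal sketch computes the modal depth of an epistemic formula. The nested-tuple encoding and the names `modal_depth` and `within_bound` are assumptions made for illustration only; the sketch shows the notion of depth, not the planning algorithm or the b-bisimulation contraction.

```python
def modal_depth(formula) -> int:
    """Modal depth: the maximum nesting of knowledge operators K_a."""
    op = formula[0]
    if op == "atom":                     # ("atom", name)
        return 0
    if op == "not":                      # ("not", subformula)
        return modal_depth(formula[1])
    if op in ("and", "or"):              # ("and", left, right)
        return max(modal_depth(formula[1]), modal_depth(formula[2]))
    if op == "K":                        # ("K", agent, subformula)
        return 1 + modal_depth(formula[2])
    raise ValueError(f"unknown operator: {op}")


def within_bound(formula, b: int) -> bool:
    """True if an agent bounded at depth b can reason about the formula."""
    return modal_depth(formula) <= b


p = ("atom", "p")
# K_a (p and K_b (not K_a p)): agent a knows p and knows that b knows a does not know p.
nested = ("K", "a", ("and", p, ("K", "b", ("not", ("K", "a", p)))))
depth = modal_depth(nested)  # 3
```

Here `nested` has modal depth 3, so it is within reach of a planner with b = 3 but not of one with b = 2.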
# 帰納的知識グラフ補完のための関係ネットワークを用いた論理推論 Logical Reasoning with Relation Network for Inductive Knowledge Graph Completion ( http://arxiv.org/abs/2406.01140v1 ) ライセンス: Link先を確認	Qinggang Zhang, Keyu Duan, Junnan Dong, Pai Zheng, Xiao Huang,	(参考訳) 帰納的知識グラフ補完(KGC)は、トレーニングセットに現れない新しいエンティティセットの欠落を推測することを目的としている。現実世界のKGは絶えず進化し、新しい知識を導入している。近年の研究では,KGCに新たなエンティティを組み込むために,サブグラフ上でのメッセージパッシングを用いた有望な結果が示されている。しかしながら、これらの手法の帰納的能力は通常2つの重要な問題によって制限される。 i) KGCは常にデータ疎結合に悩まされており、新しいエンティティが元のKGとほとんど、あるいは全く関係のないインダクティブKGCでは、状況はさらに悪化している。 (II)コールドスタート問題正確なKG推論では、少数の隣人からローカル情報を収集することで、新しいエンティティの表現を生成するために粗い粒度を超越している。この目的のために、誘導KG完了のための新しいiNfOmax RelAtion Network、すなわちNORANを提案する。帰納的KG完了のための潜在関係パターンの抽出を目的とする。具体的には、関係に集中することにより、NORANはKGモデリングに対するハイパービューを提供し、関係間の相関は帰納的KGCを実行するための実体に依存しない論理的証拠として自然に捉えることができる。 5つのベンチマークの大規模な実験結果から、我々のフレームワークは最先端のKGC手法よりも大幅に優れていることが示された。 Inductive knowledge graph completion (KGC) aims to infer the missing relation for a set of newly-coming entities that never appeared in the training set. Such a setting is more in line with reality, as real-world KGs are constantly evolving and introducing new knowledge. Recent studies have shown promising results using message passing over subgraphs to embed newly-coming entities for inductive KGC. However, the inductive capability of these methods is usually limited by two key issues. (i) KGC always suffers from data sparsity, and the situation is even exacerbated in inductive KGC where new entities often have few or no connections to the original KG. (ii) Cold-start problem. It is over coarse-grained for accurate KG reasoning to generate representations for new entities by gathering the local information from few neighbors. To this end, we propose a novel iNfOmax RelAtion Network, namely NORAN, for inductive KG completion. It aims to mine latent relation patterns for inductive KG completion. 
Specifically, by centering on relations, NORAN provides a hyper view towards KG modeling, where the correlations between relations can be naturally captured as entity-independent logical evidence to conduct inductive KGC. Extensive experiment results on five benchmarks show that our framework substantially outperforms the state-of-the-art KGC methods.	翻訳日:2024-06-06 01:38:29 公開日:2024-06-03
# 探索してから決定する: 知識グラフ上の推論のためのGNN-LLM相乗フレームワーク Explore then Determine: A GNN-LLM Synergy Framework for Reasoning over Knowledge Graph ( http://arxiv.org/abs/2406.01145v1 ) ライセンス: Link先を確認	Guangyi Liu, Yongqi Zhang, Yong Li, Quanming Yao,	(参考訳) 知識グラフ(KG)上の推論タスクは、複雑な構造と大量の無関係な情報のために、大規模言語モデル(LLM)にとって大きな課題となる。既存のLLM推論手法は、正確な知識を供給するためのKG上の構成的学習の重要性を見落としている。加えて、LLMの微調整やLLMとの頻繁な相互作用は、かなりの時間と資源のコストを発生させる。本稿では,知識グラフに対する質問応答(KGQA)タスクに焦点をあて,LLMとグラフニューラルネットワーク(GNN)を相乗的に組み合わせてKG上で推論するExplore-then-Determine(EtD)フレームワークを提案する。探索段階(Explore stage)は、軽量なGNNを用いて有望な候補と質問に関連するきめ細かい知識を探索し、決定段階(Determine stage)は、探索された情報を利用して知識で強化された多肢選択プロンプトを構築し、凍結したLLMを誘導して最終回答を決定する。 3つのベンチマークKGQAデータセットでの大規模な実験は、EtDが最先端の性能を達成し、忠実な推論結果を生成することを示した。 The task of reasoning over Knowledge Graphs (KGs) poses a significant challenge for Large Language Models (LLMs) due to the complex structure and large amounts of irrelevant information. Existing LLM reasoning methods overlook the importance of compositional learning on KG to supply precise knowledge. Besides, the fine-tuning and frequent interaction with LLMs incur substantial time and resource costs. This paper focuses on the Question Answering over Knowledge Graph (KGQA) task and proposes an Explore-then-Determine (EtD) framework that synergizes LLMs with graph neural networks (GNNs) for reasoning over KGs. The Explore stage employs a lightweight GNN to explore promising candidates and relevant fine-grained knowledge to the questions, while the Determine stage utilizes the explored information to construct a knowledge-enhanced multiple-choice prompt, guiding a frozen LLM to determine the final answer. Extensive experiments on three benchmark KGQA datasets demonstrate that EtD achieves state-of-the-art performance and generates faithful reasoning results.	翻訳日:2024-06-06 01:38:29 公開日:2024-06-03
# EMおよびAMアルゴリズムを用いた混合線形回帰の非依存学習 Agnostic Learning of Mixed Linear Regressions with EM and AM Algorithms ( http://arxiv.org/abs/2406.01149v1 ) ライセンス: Link先を確認	Avishek Ghosh, Arya Mazumdar,	(参考訳) 混合線形回帰はパラメトリック統計学と機械学習においてよく研究されている問題である。サンプルの集合、すなわち共変量とラベルの組が与えられたとき、混合線形回帰のタスクは、サンプルに最もよく適合する線形関係の小さなリストを見つけることである。通常、ラベルは、2つ以上の線形関数のうちの1つをランダムに選択し、選択された関数を共変量に適用し、場合によってはその結果にノイズを加えることによって確率的に生成されると仮定される。この状況では、真の線形関数をある程度のパラメータ誤差の範囲で推定することが目的となる。一般的な期待値最大化(EM)アルゴリズムと交互最小化(AM)アルゴリズムは、これまでこの設定で解析されてきた。本稿では,このような生成モデルを仮定せずに、サンプルから混合線形回帰を非依存学習するという、より一般的な問題を考察する。特に, AMとEMのアルゴリズムは, 分離性と良好な初期化という標準的な条件の下で, 適切に定義された損失関数に対する母集団損失の最小化解に収束することにより, 混合線形回帰における非依存学習を実現することを示す。ある意味で、これは実現可能な生成モデルが存在しない場合でも「最適解」に収束するAMアルゴリズムとEMアルゴリズムの強みを示している。 Mixed linear regression is a well-studied problem in parametric statistics and machine learning. Given a set of samples, tuples of covariates and labels, the task of mixed linear regression is to find a small list of linear relationships that best fit the samples. Usually it is assumed that the label is generated stochastically by randomly selecting one of two or more linear functions, applying this chosen function to the covariates, and potentially introducing noise to the result. In that situation, the objective is to estimate the ground-truth linear functions up to some parameter error. The popular expectation maximization (EM) and alternating minimization (AM) algorithms have been previously analyzed for this. In this paper, we consider the more general problem of agnostic learning of mixed linear regression from samples, without such generative models. In particular, we show that the AM and EM algorithms, under standard conditions of separability and good initialization, lead to agnostic learning in mixed linear regression by converging to the population loss minimizers, for suitably defined loss functions. In some sense, this shows the strength of AM and EM algorithms that converge to ``optimal solutions'' even in the absence of realizable generative models.	翻訳日:2024-06-06 01:38:29 公開日:2024-06-03
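
The alternating minimization (AM) scheme discussed above can be sketched on a toy problem. The sketch assumes scalar covariates, lines through the origin, noiseless labels, and a squared-error assignment rule. These simplifications, the name `alternating_minimization`, and the data are illustrative assumptions; the paper's agnostic setting and its loss functions are not reproduced. Each iteration assigns every sample to the currently best-fitting slope and then refits each slope by least squares on its assigned samples.

```python
from typing import List, Sequence


def alternating_minimization(xs: Sequence[float], ys: Sequence[float],
                             slopes: Sequence[float], n_iters: int = 10) -> List[float]:
    """Fit a mixture of lines y = w_j * x by alternating assignment and refitting."""
    slopes = list(slopes)
    for _ in range(n_iters):
        # Assignment step: each sample goes to the slope with the smallest squared residual.
        groups = [[] for _ in slopes]
        for x, y in zip(xs, ys):
            j = min(range(len(slopes)), key=lambda k: (y - slopes[k] * x) ** 2)
            groups[j].append((x, y))
        # Refit step: least squares through the origin on each group.
        for j, pts in enumerate(groups):
            sxx = sum(x * x for x, _ in pts)
            if sxx > 0:  # keep the old slope if a group is empty or degenerate
                slopes[j] = sum(x * y for x, y in pts) / sxx
    return slopes


# Samples alternate between the two hypothetical ground-truth slopes 2 and -1.
xs = [1.0, 2.0, 3.0, -1.0, -2.0, 4.0]
true_slopes = [2.0, -1.0, 2.0, -1.0, 2.0, -1.0]
ys = [w * x for w, x in zip(true_slopes, xs)]

fitted = alternating_minimization(xs, ys, slopes=[1.5, -0.5])
```

With the initialization `[1.5, -0.5]`, which is close enough to the true slopes, the first assignment step already separates the two groups and the refit recovers slopes 2 and -1. A poor initialization can converge elsewhere, which is why the abstract lists good initialization among its conditions.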
# 振り返って:ゴールコンディションGFlowNetの振り返り後方合成 Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets ( http://arxiv.org/abs/2406.01150v1 ) ライセンス: Link先を確認	Haoran He, Can Chang, Huazhe Xu, Ling Pan,	(参考訳) Generative Flow Networks (GFlowNets) は、確率的ポリシーを学習し、報酬に比例した確率で合成対象を逐次生成するためのアモータイズされたサンプリング手法である。 GFlowNetsは、単一の最適解に収束する標準的な戻り値最大化強化学習アプローチとは対照的に、多種多様なハイリワードオブジェクトを生成する優れた能力を示す。近年、目標条件付きGFlowNetを学習し、タスクが指定した目標を達成できる単一のGFlowNetをトレーニングすることを目的として、様々な有用なプロパティを取得するための研究が進められている。しかし、目標条件付きGFlowNetのトレーニングは、大きな状態空間でさらに悪化する極めて少ない報酬のために、重要な課題を生んでいる。本研究では,これらの課題に対処するため,RBS (Retrospective Backward Synthesis) という新しい手法を提案する。具体的には、RBSはGFlowNetsの後方方針に基づいて新しい後方軌道を合成し、品質と多様性を高めたトレーニング軌道を充実させ、スパース報酬問題を効率的に解決する。実験結果から,本手法はサンプル効率を大幅に向上し,各種標準評価ベンチマークにおいて高いベースラインを達成できることが示唆された。 Generative Flow Networks (GFlowNets) are amortized sampling methods for learning a stochastic policy to sequentially generate compositional objects with probabilities proportional to their rewards. GFlowNets exhibit a remarkable ability to generate diverse sets of high-reward objects, in contrast to standard return maximization reinforcement learning approaches, which often converge to a single optimal solution. Recent works have arisen for learning goal-conditioned GFlowNets to acquire various useful properties, aiming to train a single GFlowNet capable of achieving different goals as the task specifies. However, training a goal-conditioned GFlowNet poses critical challenges due to extremely sparse rewards, which is further exacerbated in large state spaces. In this work, we propose a novel method named Retrospective Backward Synthesis (RBS) to address these challenges. Specifically, RBS synthesizes a new backward trajectory based on the backward policy in GFlowNets to enrich training trajectories with enhanced quality and diversity, thereby efficiently solving the sparse reward problem. 
Extensive empirical results show that our method improves sample efficiency by a large margin and outperforms strong baselines on various standard evaluation benchmarks.	翻訳日:2024-06-06 01:38:29 公開日:2024-06-03
# DeepUniUSTransformer: プロンプトによるガイダンスを用いた汎用超音波モデルに向けて DeepUniUSTransformer: Towards A Universal UltraSound Model with Prompted Guidance ( http://arxiv.org/abs/2406.01154v1 ) ライセンス: Link先を確認	Zehui Lin, Zhuoneng Zhang, Xindi Hu, Zhifan Gao, Xin Yang, Yue Sun, Dong Ni, Tao Tan,	(参考訳) 超音波は、低コスト、可搬性、安全性のために臨床実践において広く用いられている画像モダリティである。医療向け汎用AIにおける現在の研究は、大規模言語モデルと汎用セグメンテーションモデルに焦点を当てており、疾患予測と組織セグメンテーションの両方に対処するソリューションには十分な注意が払われていない。本研究では,複数の臨床タスクに対応するプロンプト可能なモデルであるDeepUniUSTransformerという,超音波のための新しいユニバーサルフレームワークを提案する。このモデルの普遍性は、様々な側面にわたる汎用性に由来する。あらゆる超音波の性質、あらゆる解剖学的位置、あらゆる入力タイプを巧みに扱い、セグメンテーションタスクだけでなく、コンピュータ支援診断タスクでも優れている。我々は、この情報をプロンプトとして取り込み、モデルの学習プロセスにシームレスに埋め込む新しいモジュールを導入する。提案したモデルをトレーニングし,検証するために,最大7つの異なる解剖学的位置と9.7K以上のアノテーションを含む包括的な超音波データセットを、公開ソースからキュレートした。実験結果から,本モデルが、単一のデータセットでトレーニングされたモデルと、プロンプトによるガイダンスを除いたアブレーション版ネットワークの両方を上回ることが示された。我々は、継続的にデータセットを拡張し、医療用超音波の普遍性に向けてタスク固有のプロンプト機構を最適化していく。モデルの重み、データセット、コードは、オープンソースとして公開される。 Ultrasound is a widely used imaging modality in clinical practice due to its low cost, portability, and safety. Current research in general AI for healthcare focuses on large language models and general segmentation models, with insufficient attention to solutions addressing both disease prediction and tissue segmentation. In this study, we propose a novel universal framework for ultrasound, namely DeepUniUSTransformer, which is a promptable model accommodating multiple clinical tasks. The universality of this model is derived from its versatility across various aspects. It proficiently manages any ultrasound nature, any anatomical position, and any input type, and excels not only in segmentation tasks but also in computer-aided diagnosis tasks. We introduce a novel module that incorporates this information as a prompt and seamlessly embeds it within the model's learning process. 
To train and validate our proposed model, we curated a comprehensive ultrasound dataset from publicly accessible sources, encompassing up to 7 distinct anatomical positions with over 9.7K annotations. Experimental results demonstrate that our model surpasses both a model trained on a single dataset and an ablated version of the network lacking prompt guidance. We will continuously expand the dataset and optimize the task specific prompting mechanism towards the universality in medical ultrasound. Model weights, datasets, and code will be open source to the public.	翻訳日:2024-06-06 01:38:29 公開日:2024-06-03
# 動作中のスピンホールナノオシレータアレイの光ヘテロダイン顕微鏡 Optical heterodyne microscopy of operating spin Hall nano-oscillator arrays ( http://arxiv.org/abs/2406.01155v1 ) ライセンス: Link先を確認	A. Alemán, A. A. Awad, S. Muralidhar, R. Khymyn, A. Kumar, A. Houshang, D. Hanstorp, J. Åkerman,	(参考訳) 光ヘテロダイン検出は、幅広い物理励起を特徴づける強力な技術である。ここでは,2種類の光ヘテロダイン検出技術(基本波およびパラメトリックポンピング)を用いて,単一および複数のナノ狭窄スピンホールナノオシレータ(SHNO)の高周波自励発振を顕微鏡的に特徴付ける。この技術を検証し、その頑健性を実証するために,NiFe/PtとW/CoFeB/MgOの2つの異なる材料スタックからなるSHNOについて検討し,RF注入電力とレーザパワーの両方が測定に与える影響を調べ,光学的な結果を従来の電気的測定と比較した。 SHNO磁気ダイナミクスの直接的,非侵襲的,サブミクロン,空間分解,位相分解という特徴付けの主要な特長を示すために,イジングマシンで使用される位相二値化された2つのSHNOの自励発振の振幅と位相をマッピングする。この概念実証プラットフォームは、さらなる拡張のための強力な基盤を確立し、スピントロニクスデバイスに基づく新興コンピューティング技術のための重要な特徴付け技術の継続的な開発に貢献する。 Optical heterodyne detection is a powerful technique for characterizing a wide range of physical excitations. Here, we use two types of optical heterodyne detection techniques (fundamental and parametric pumping) to microscopically characterize the high-frequency auto-oscillations of single and multiple nano-constriction spin Hall nano-oscillators (SHNOs). To validate the technique and demonstrate its robustness, we study SHNOs made from two different material stacks, NiFe/Pt and W/CoFeB/MgO, and investigate the influence of both the RF injection power and the laser power on the measurements, comparing the optical results to conventional electrical measurements. To demonstrate the key features of direct, non-invasive, submicron, spatial, and phase-resolved characterization of the SHNO magnetodynamics, we map out the auto-oscillation magnitude and phase of two phase-binarized SHNOs used in Ising Machines. This proof-of-concept platform establishes a strong foundation for further extensions, contributing to the ongoing development of crucial characterization techniques for emerging computing technologies based on spintronics devices.	翻訳日:2024-06-06 01:38:29 公開日:2024-06-03
# 強/弱絡み状態を持つフォトニック回路のための量子一貫したニューラル/テンソルネットワーク Quantum consistent neural/tensor networks for photonic circuits with strongly/weakly entangled states ( http://arxiv.org/abs/2406.01157v1 ) ライセンス: Link先を確認	Nicolas Allegra,	(参考訳) フォトニック量子コンピュータや量子イメージングデバイスのような現代の量子光学系は、絡み合いを現実的に活用し、真の量子優位性に達することを期待して、その設計と実装に高い精度を必要とする。これらのシステムの理論的、実験的な探索と検証は、古典的なシミュレーションの精度に大きく依存している。しかし、ヒルベルト空間が大きくなるにつれて、これらのシステムを設計し最適化するために使われる従来の計算手法は、量子的な次元の呪いのために厳しい制約に直面する。この課題に対処するために、ニューラルネットワークとテンソルネットワークに基づくアプローチを提案し、閉じた絡み合った系の厳密なユニタリ発展を、正確で効率的かつ量子一貫した方法で近似する。比較的少数の量子ダイナミクスの例でネットワークを訓練することにより、より大きなヒルベルト空間における効率的なパラメータ推定を可能にし、多くの量子計測(メトロロジー)の問題に対する興味深い解を提供する。 Modern quantum optical systems such as photonic quantum computers and quantum imaging devices require great precision in their designs and implementations in the hope of realistically exploiting entanglement and reaching a real quantum advantage. The theoretical and experimental explorations and validations of these systems are greatly dependent on the precision of our classical simulations. However, as Hilbert spaces grow, traditional computational methods used to design and optimize these systems encounter hard limitations due to the quantum curse of dimensionality. To address this challenge, we propose an approach based on neural and tensor networks to approximate the exact unitary evolution of closed entangled systems in a precise, efficient and quantum consistent manner. By training the networks with a reasonably small number of examples of quantum dynamics, we enable efficient parameter estimation in larger Hilbert spaces, offering an interesting solution for a great deal of quantum metrology problems.	翻訳日:2024-06-06 01:38:29 公開日:2024-06-03
# プライベートスケッチからのプロファイル再構成 Profile Reconstruction from Private Sketches ( http://arxiv.org/abs/2406.01158v1 ) ライセンス: Link先を確認	Hao Wu, Rasmus Pagh,	(参考訳) $\mathcal{D}$ から取られた $n$ 個の要素からなる多重集合が与えられたとき、「プロファイル再構成」問題とは、$t = 0, 1, \dots, n$ について、$\mathcal{D}$ の要素のうちちょうど $t$ 回現れるものの割合 $\vec{f}[t]$ を推定する問題である。我々は、分散かつ空間制約のある設定における差分プライベートなプロファイル推定を考える。そこでは、$\vec{f} = (\vec{f}[0], \dots, \vec{f}[n])$ の近似を計算できる、更新可能でプライベートな多重集合のスケッチを維持したい。離散ラプラス雑音でプライベート化したヒストグラムを用い、Dwork et al.~(ITCS '10) のアプローチによって雑音を「逆転」させる方法を示す。彼らのLPベースの手法を多項式時間から $O(d + n \log n)$ に高速化する方法を示し(ここで $d = \|\mathcal{D}\|$)、$\ell_1$、$\ell_2$、$\ell_\infty$ ノルムで達成可能な誤差を解析する。いずれの場合も誤差の $d$ への依存性は $O(1 / \sqrt{d})$ であり、この依存性が、高確率の誤差保証を持つプロファイル再構成問題のためのすべてのプライベートで更新可能なスケッチの中で漸近的に最適であることを示す情報理論的下界を与える。 Given a multiset of $n$ items from $\mathcal{D}$, the \emph{profile reconstruction} problem is to estimate, for $t = 0, 1, \dots, n$, the fraction $\vec{f}[t]$ of items in $\mathcal{D}$ that appear exactly $t$ times. We consider differentially private profile estimation in a distributed, space-constrained setting where we wish to maintain an updatable, private sketch of the multiset that allows us to compute an approximation of $\vec{f} = (\vec{f}[0], \dots, \vec{f}[n])$. Using a histogram privatized using discrete Laplace noise, we show how to ``reverse'' the noise, using an approach of Dwork et al.~(ITCS '10). We show how to speed up their LP-based technique from polynomial time to $O(d + n \log n)$, where $d = \|\mathcal{D}\|$, and analyze the achievable error in the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms. In all cases the dependency of the error on $d$ is $O( 1 / \sqrt{d})$ -- we give an information-theoretic lower bound showing that this dependence on $d$ is asymptotically optimal among all private, updatable sketches for the profile reconstruction problem with a high-probability error guarantee.	翻訳日:2024-06-06 01:38:29 公開日:2024-06-03
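As a rough illustration of the definitions in the profile reconstruction abstract above, the following Python sketch computes the profile of a multiset and a histogram with discrete Laplace noise added to each count. The function names, the `epsilon` parameter, and the choice to sample the noise as the difference of two geometric variables are illustrative assumptions. The sketch does not implement the paper's noise-reversal step or its LP-based reconstruction.

```python
import math
import random
from collections import Counter


def profile(items, domain):
    """Return f, where f[t] is the fraction of domain elements appearing exactly t times."""
    n = len(items)
    counts = Counter(items)
    f = [0.0] * (n + 1)
    for element in domain:
        f[counts.get(element, 0)] += 1.0 / len(domain)
    return f


def discrete_laplace(epsilon, rng):
    """Sample integer noise k with probability proportional to exp(-epsilon * |k|)."""

    def geometric():
        # Failures before the first success, with success probability 1 - exp(-epsilon).
        u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
        return int(math.floor(-math.log(u) / epsilon))

    # The difference of two independent geometric variables is discrete Laplace.
    return geometric() - geometric()


def noisy_histogram(items, domain, epsilon, seed=None):
    """Add independent discrete Laplace noise to every count (illustrative only)."""
    rng = random.Random(seed)
    counts = Counter(items)
    return {e: counts.get(e, 0) + discrete_laplace(epsilon, rng) for e in domain}


example_profile = profile(["a", "a", "b"], ["a", "b", "c", "d"])
example_histogram = noisy_histogram(["a", "a", "b"], ["a", "b", "c", "d"], epsilon=1.0, seed=0)
```

In this toy input, two of the four domain elements never appear, one appears once, and one appears twice, so `example_profile` is `[0.5, 0.25, 0.25, 0.0]`. Estimating that vector from `example_histogram` alone is the problem the paper addresses.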
# Dimba: Transformer-Mamba拡散モデル Dimba: Transformer-Mamba Diffusion Models ( http://arxiv.org/abs/2406.01159v1 ) ライセンス: Link先を確認	Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Youqiang Zhang, Junshi Huang,	(参考訳) 本稿では,Transformer と Mamba の要素を組み合わせた独自のハイブリッドアーキテクチャを用いる新しいテキスト・画像拡散モデル Dimba を紹介する。具体的には、Dimbaは順次積み重ねたブロックでTransformer層とMamba層を交互に配置し、クロスアテンション層を通じて条件情報を統合することで、両方のアーキテクチャパラダイムの利点を生かしている。品質チューニングや解像度適応を含むいくつかの最適化戦略を調査し,大規模画像生成に必要な重要な構成を特定する。モデルの柔軟な設計は、特定のリソース制約や目的に応じたシナリオをサポートする。適切にスケールすると、Dimbaは従来の純粋なTransformerベースのベンチマークと比べて、高いスループットと小さいメモリフットプリントを実現する。大規模な実験により、Dimbaは画像の品質、芸術的レンダリング、セマンティックコントロールの点でベンチマークと同等の性能を達成することが示されている。また,評価中に発見されたアーキテクチャの興味深い特性をいくつか報告し,実験で用いたチェックポイントを公開する。本研究は,拡散モデルの基礎段階における大規模ハイブリッドTransformer-Mambaアーキテクチャの有望性を強調し,テキスト・画像生成の明るい未来を示唆するものである。 This paper unveils Dimba, a new text-to-image diffusion model that employs a distinctive hybrid architecture combining Transformer and Mamba elements. Specifically, Dimba's sequentially stacked blocks alternate between Transformer and Mamba layers and integrate conditional information through the cross-attention layer, thus capitalizing on the advantages of both architectural paradigms. We investigate several optimization strategies, including quality tuning and resolution adaption, and identify critical configurations necessary for large-scale image generation. The model's flexible design supports scenarios that cater to specific resource constraints and objectives. When scaled appropriately, Dimba offers substantial throughput and a reduced memory footprint relative to conventional pure Transformers-based benchmarks. Extensive experiments indicate that Dimba achieves performance comparable to the benchmarks in terms of image quality, artistic rendering, and semantic control. We also report several intriguing properties of the architecture discovered during evaluation and release checkpoints in experiments.
Our findings emphasize the promise of large-scale hybrid Transformer-Mamba architectures in the foundational stage of diffusion models, suggesting a bright future for text-to-image generation.	翻訳日:2024-06-06 01:38:29 公開日:2024-06-03
# 動的構造因果モデル Dynamic Structural Causal Models ( http://arxiv.org/abs/2406.01161v1 ) ライセンス: Link先を確認	Philip Boeken, Joris M. Mooij,	(参考訳) 本研究では,動的構造因果モデル (DSCM) と呼ばれる特定のタイプのSCMについて検討する。DSCMでは内因性変数が時間の関数を表し、巡回的であってもよく、潜在的交絡も許容される。動機付けとなるユースケースとして,確率微分方程式(SDE)の特定の系をDSCMで適切に表現できることを示す。この構成の直接的な帰結は、SDEの系に対するグラフィカルなマルコフ性である。時間分割操作を定義し、局所独立性の概念(連続時間におけるグレンジャー(非)因果性の概念)の解析を可能にする。また、離散時間DSCMを返し、サブサンプリングされた時系列の数学的解析に使用できるサブサンプリング操作を定義する。DSCMを時間依存の介入の因果効果の同定にどのように利用できるか、また既存の制約ベースの因果探索アルゴリズムを時系列データにどのように適用できるかを提案する。 We study a specific type of SCM, called a Dynamic Structural Causal Model (DSCM), whose endogenous variables represent functions of time, which is possibly cyclic and allows for latent confounding. As a motivating use-case, we show that certain systems of Stochastic Differential Equations (SDEs) can be appropriately represented with DSCMs. An immediate consequence of this construction is a graphical Markov property for systems of SDEs. We define a time-splitting operation, allowing us to analyse the concept of local independence (a notion of continuous-time Granger (non-)causality). We also define a subsampling operation, which returns a discrete-time DSCM, and which can be used for mathematical analysis of subsampled time-series. We suggest how DSCMs can be used for identification of the causal effect of time-dependent interventions, and how existing constraint-based causal discovery algorithms can be applied to time-series data.	翻訳日:2024-06-06 01:38:29 公開日:2024-06-03
# 制約付き特徴選択のための条件付きGumbel-Softmaxと無線センサネットワークにおけるノード選択への応用 Conditional Gumbel-Softmax for constrained feature selection with application to node selection in wireless sensor networks ( http://arxiv.org/abs/2406.01162v1 ) ライセンス: Link先を確認	Thomas Strypsteen, Alexander Bertrand,	(参考訳) 本稿では,与えられたタスクとディープニューラルネットワーク(DNN)モデルに対する最適な特徴部分集合を,特徴間の一定のペアワイズ制約を守りながらエンドツーエンドで学習する手法として,条件付きGumbel-Softmaxを導入する。これは、サブセット内の各特徴の選択を別の特徴に条件付けることで行う。本稿では,相互に通信を必要とするノード間の距離が大きくなりすぎないことを保証し,この通信に要する電力を制限しつつ,無線センサネットワーク(WSN)を構成するタスク最適なノードを選択するために,このアプローチをどのように利用できるかを実証する。本手法は,運動実行タスクを解くエミュレートされた無線脳波(EEG)センサネットワーク(WESN)上で検証する。本研究では,制約がより厳しくなるにつれてWESNの性能がどう変化するか,また条件付きGumbel-Softmaxがヒューリスティックな貪欲選択法と比較してどの程度の性能を示すかを分析する。本稿の応用上の焦点はウェアラブル脳-コンピュータインタフェースにあるが,提案手法は汎用的であり,無線センサネットワークにおけるノード配置や,他のアプリケーションにおける制約付き特徴選択にも容易に適用できる。 In this paper, we introduce Conditional Gumbel-Softmax as a method to perform end-to-end learning of the optimal feature subset for a given task and deep neural network (DNN) model, while adhering to certain pairwise constraints between the features. We do this by conditioning the selection of each feature in the subset on another feature. We demonstrate how this approach can be used to select the task-optimal nodes composing a wireless sensor network (WSN) while ensuring that none of the nodes that require communication between one another have too large a distance between them, limiting the required power spent on this communication. We validate this approach on an emulated Wireless Electroencephalography (EEG) Sensor Network (WESN) solving a motor execution task. We analyze how the performance of the WESN varies as the constraints are made more stringent and how well the Conditional Gumbel-Softmax performs in comparison with a heuristic, greedy selection method.
While the application focus of this paper is on wearable brain-computer interfaces, the proposed methodology is generic and can readily be applied to node deployment in wireless sensor networks and constrained feature selection in other applications as well.	翻訳日:2024-06-06 01:38:29 公開日:2024-06-03
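For readers unfamiliar with the building block named in the abstract above, here is a minimal sketch of the standard (unconditioned) Gumbel-Softmax relaxation in plain Python. It is not the paper's conditional variant: the pairwise constraints and the conditioning on another feature are omitted, and the function name, the example logits, and the temperature value are assumptions chosen for illustration.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample from unnormalized log-probabilities."""
    noisy = []
    for logit in logits:
        u = rng.random()
        while u == 0.0:  # avoid log(0); rng.random() never returns 1.0
            u = rng.random()
        gumbel = -math.log(-math.log(u))  # standard Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in noisy]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical scores for three candidate features (or sensor nodes).
sample = gumbel_softmax_sample([2.0, 0.5, -1.0], temperature=0.5, rng=random.Random(0))
selected = max(range(len(sample)), key=sample.__getitem__)
```

Lower temperatures push `sample` toward a one-hot vector, which is what lets a feature-selection layer be trained with gradients and then discretized by taking the largest entry, as `selected` does here.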
# AIはどのように倫理的であるべきか? AIアライメントはLLMのリスク選好をどう形作るか How Ethical Should AI Be? How AI Alignment Shapes the Risk Preferences of LLMs ( http://arxiv.org/abs/2406.01168v1 ) ライセンス: Link先を確認	Shumiao Ouyang, Hayong Yun, Xingjian Zheng,	(参考訳) 本研究では,大規模言語モデル(LLM)のリスク選好と,それらを人間の倫理基準に整合させるプロセスが,その経済的な意思決定に与える影響について検討する。 30個のLLMを解析することにより、リスク回避的からリスク愛好的まで、幅広い固有のリスクプロファイルを明らかにした。次に、無害性、有用性、誠実性に焦点を当て、モデルが人間の価値観に従って行動することを保証するプロセスであるAIアライメントのさまざまなタイプが、これらの基本的なリスク選好をどのように変化させるかを検討する。アライメントはLLMをリスク回避へと大きくシフトさせ、3つの倫理的側面をすべて組み込んだモデルが最も保守的な投資行動を示す。企業の決算説明会の書き起こしから企業投資を予測するためにLLMを用いた先行研究を再現し、ある程度のアライメントは投資予測の精度を向上させうるが、過剰なアライメントは過度に慎重な予測をもたらすことを示した。これらの結果から, 過度にアライメントされたLLMを財務意思決定に導入することは, 深刻な過少投資につながる可能性が示唆される。我々は、金融分野でLLMを活用する際に、倫理的アライメントの度合いと経済領域の固有の要件とを慎重にバランスさせる、きめ細かなアプローチの必要性を強調する。 This study explores the risk preferences of Large Language Models (LLMs) and how the process of aligning them with human ethical standards influences their economic decision-making. By analyzing 30 LLMs, we uncover a broad range of inherent risk profiles ranging from risk-averse to risk-seeking. We then explore how different types of AI alignment, a process that ensures models act according to human values and that focuses on harmlessness, helpfulness, and honesty, alter these base risk preferences. Alignment significantly shifts LLMs towards risk aversion, with models that incorporate all three ethical dimensions exhibiting the most conservative investment behavior. Replicating a prior study that used LLMs to predict corporate investments from company earnings call transcripts, we demonstrate that although some alignment can improve the accuracy of investment forecasts, excessive alignment results in overly cautious predictions. These findings suggest that deploying excessively aligned LLMs in financial decision-making could lead to severe underinvestment.
We underline the need for a nuanced approach that carefully balances the degree of ethical alignment with the specific requirements of economic domains when leveraging LLMs within finance.	翻訳日:2024-06-06 01:38:29 公開日:2024-06-03
# 外れ値ラベル露出によるゼロショット分布外検出 Zero-Shot Out-of-Distribution Detection with Outlier Label Exposure ( http://arxiv.org/abs/2406.01170v1 ) ライセンス: Link先を確認	Choubo Ding, Guansong Pang,	(参考訳) CLIPのような視覚言語モデルがゼロショットタスクに広く適用され、分布内(ID)データで顕著な性能を上げるにつれて、ゼロショット設定で分布外(OOD)入力を検出して拒否することは、そのようなモデルをその場で使用する際の安全性を確保するうえで不可欠になっている。既存のゼロショットOOD検出器の多くは、IDクラスラベルに基づくプロンプトに頼って、CLIPにID画像を分類させ、OOD画像を拒否させている。本研究では代わりに、多数の多様な補助的外れ値クラスラベルを擬似的なOODクラスのテキストプロンプトとしてCLIPに与え、ゼロショットOOD検出を強化することを提案する。このアプローチをOutlier Label Exposure(OLE)と呼ぶ。鍵となる直感は、ID画像はOOD画像に比べて、これらの外れ値クラスのプロンプトとの類似性が低いと期待されることである。 1つの問題は、生のクラスラベルがしばしばノイズラベル(例えば、IDラベルの同義語)を含み、生のラベルによるOLEベースの検出が効果的でなくなることである。この問題に対処するため,外れ値ラベルのプロンプト埋め込みを利用して,埋め込み類似度に基づくOODスコアリングのための少数の重要な外れ値プロトタイプを学習する,外れ値プロトタイプ学習モジュールを導入する。さらに、外れ値クラスとそのプロトタイプはIDクラスと緩く結びついていることがあり、両者の間に分離不能な決定領域が生じる。そこで,外れ値プロトタイプとIDクラス埋め込みを合成して中間的な外れ値プロトタイプを生成し、OLEにおける検出をさらに校正する外れ値ラベル生成モジュールも導入した。その単純さにもかかわらず、広範囲な実験により、OLEは検出性能を大幅に改善し、大規模OODおよびハードOOD検出ベンチマークにおいて新たな最先端性能を達成することが示されている。 As vision-language models like CLIP are widely applied to zero-shot tasks and gain remarkable performance on in-distribution (ID) data, detecting and rejecting out-of-distribution (OOD) inputs in the zero-shot setting has become crucial for ensuring the safety of using such models on the fly. Most existing zero-shot OOD detectors rely on ID class label-based prompts to guide CLIP in classifying ID images and rejecting OOD images. In this work we instead propose to leverage a large set of diverse auxiliary outlier class labels as pseudo OOD class text prompts to CLIP for enhancing zero-shot OOD detection, an approach we call Outlier Label Exposure (OLE). The key intuition is that ID images are expected to have lower similarity to these outlier class prompts than OOD images. One issue is that raw class labels often include noise labels, e.g., synonyms of ID labels, rendering raw OLE-based detection ineffective.
To address this issue, we introduce an outlier prototype learning module that utilizes the prompt embeddings of the outlier labels to learn a small set of pivotal outlier prototypes for an embedding similarity-based OOD scoring. Additionally, the outlier classes and their prototypes can be loosely coupled with the ID classes, leading to an inseparable decision region between them. Thus, we also introduce an outlier label generation module that synthesizes our outlier prototypes and ID class embeddings to generate in-between outlier prototypes to further calibrate the detection in OLE. Despite its simplicity, extensive experiments show that OLE substantially improves detection performance and achieves new state-of-the-art performance in large-scale OOD and hard OOD detection benchmarks.	翻訳日:2024-06-06 01:28:45 公開日:2024-06-03
# LLMにおける2つのペルソナ:ロールプレイングとパーソナライズに関する調査 Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization ( http://arxiv.org/abs/2406.01171v1 ) ライセンス: Link先を確認	Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, Yu-Ching Hsu, Jia-Yin Foo, Chao-Wei Huang, Yun-Nung Chen,	(参考訳) 近年,大規模言語モデル(LLM)を特定のシナリオに適応させる方法を研究する手法が大きな注目を集めている。特に、もともと対話研究の文献で採用された「ペルソナ」という概念が、有望な道として再び脚光を浴びている。しかし、ペルソナに関する研究は拡大しているものの比較的未整理で、体系的な概要が欠けている。このギャップを埋めるために、この分野の現状を分類する包括的な調査を提示する。我々は、(1) LLMにペルソナを割り当てるLLMロールプレイングと、(2) LLMがユーザのペルソナに対応するLLMパーソナライゼーションという、2つの研究の流れを特定する。我々の知る限り、本稿は、分類体系、現在の課題、潜在的な方向性を含め、ペルソナという統一的な視点の下でLLMロールプレイングとLLMパーソナライゼーションを扱う最初の調査である。今後の取り組みを促進するため、コミュニティが利用できる論文コレクションを継続的に維持している。 Recently, methods investigating how to adapt large language models (LLMs) for specific scenarios have gained great attention. Particularly, the concept of "persona", originally adopted in dialogue literature, has re-surged as a promising avenue. However, the growing research on persona is relatively disorganized, lacking a systematic overview. To close the gap, we present a comprehensive survey to categorize the current state of the field. We identify two lines of research, namely (1) LLM Role-Playing, where personas are assigned to LLMs, and (2) LLM Personalization, where LLMs take care of user personas. To the best of our knowledge, we present the first survey tailored for LLM role-playing and LLM personalization under the unified view of persona, including taxonomy, current challenges, and potential directions. To foster future endeavors, we actively maintain a paper collection available to the community: https://github.com/MiuLab/PersonaLLM-Survey	翻訳日:2024-06-06 01:28:45 公開日:2024-06-03
# 原子分子BECにおける中性原子量子ビット Neutral-atom qubits in atom-molecular BEC ( http://arxiv.org/abs/2406.01177v1 ) ライセンス: Link先を確認	Leena Barshilia, Rajiuddin Sk, Prasanta K. Panigrahi, Avinash Khare,	(参考訳) 近年、中性原子は量子コンピューティングの有望なプラットフォームとして登場し、スケーラビリティを提供している。本研究では,原子-分子ボース-アインシュタイン凝縮体における、3つの異なるクラスに属する原子量子ビットの実現を示す。第1の場合、凝縮した分子はフラットトップ形状の液滴プラットフォームを形成し、外部環境と隣接する分子の両方からの効果的な隔離を容易にする。第2の原子量子ビットは「パルス」形式の波動関数を持ち、べき乗則の振る舞いを示す。一方、第3の原子量子ビットは、基底状態と励起状態の波動関数がそれぞれ $\sech^2{\beta x}$ と $\sech{\beta x}\tanh{\beta x}$ という複合形式をとる。量子ビットの局在は、光会合によって支配される化学ポテンシャルに依存し、量子ビット操作の効果的な制御を可能にする。エネルギー準位間隔, 回復長, 原子数などの関連するパラメータは, 巨視的量子ビットと分子液滴の挙動を支配する非線形性と光会合の強度に影響されることがわかった。 Recently, neutral atoms have emerged as a promising platform for quantum computing, offering scalability. In this study, we showcase the realization of atomic qubits in atom-molecular Bose-Einstein condensate, belonging to three distinct classes. In the first case, the condensed molecules form a droplet platform with a flat-top configuration, facilitating effective isolation from both external environments and neighbouring molecules. The second atomic qubits have wavefunctions in the ``pulse'' form, exhibiting power law behaviour, whereas the third one has ground and excited state wavefunctions in their respective composite forms, $\sech^2{\beta x}$ and $\sech{\beta x}\tanh{\beta x}$. The localization of the qubits depends on the chemical potential, which is governed by the photo association, providing effective control for qubit manipulation. The relevant parameters, such as energy level separation, healing length, and atom numbers, are found to be influenced by the non-linearity and strength of photo associations governing the behaviour of macroscopic qubits and molecular droplets.	翻訳日:2024-06-06 01:28:45 公開日:2024-06-03
# 潜在空間目的関数に基づく最適制御を用いた深層強化学習の行動モード切り替え Deep Reinforcement Learning Behavioral Mode Switching Using Optimal Control Based on a Latent Space Objective ( http://arxiv.org/abs/2406.01178v1 ) ライセンス: Link先を確認	Sindre Benjamin Remman, Bjørn Andreas Kristiansen, Anastasios M. Lekkas,	(参考訳) 本研究では,方策の潜在空間で直接最適化することで,最適制御を用いて深層強化学習方策の行動を変化させる。我々は,深層強化学習方策の潜在空間の特定の領域内で,行動モードと呼ぶ個別の行動パターンが識別できる,すなわちこれらの領域内では特定の行動や戦略が選好される,という仮説を立てる。これらの行動モードを,PaCMAPによる潜在空間の次元削減を用いて同定する。最適制御手順によって生成された行動を用いて、システムをある行動モードから別の行動モードへ移動させる。その後、これらの行動をニューラルネットワーク方策を解釈するためのフィルタとして利用する。結果は,この手法によって望ましい行動モードを方策に課すことができることを示しており,Lunar Lander強化学習環境を用いて,失敗したエピソードを成功させたり,その逆を行ったりできることで実証される。 In this work, we use optimal control to change the behavior of a deep reinforcement learning policy by optimizing directly in the policy's latent space. We hypothesize that distinct behavioral patterns, termed behavioral modes, can be identified within certain regions of a deep reinforcement learning policy's latent space, meaning that specific actions or strategies are preferred within these regions. We identify these behavioral modes using latent space dimension-reduction with PaCMAP. Using the actions generated by the optimal control procedure, we move the system from one behavioral mode to another. We subsequently utilize these actions as a filter for interpreting the neural network policy. The results show that this approach can impose desired behavioral modes in the policy, demonstrated by showing how a failed episode can be made successful and vice versa using the lunar lander reinforcement learning environment.	翻訳日:2024-06-06 01:28:45 公開日:2024-06-03
# AI生成テキストの検出器は敵対的摂動に対して頑健か? Are AI-Generated Text Detectors Robust to Adversarial Perturbations? ( http://arxiv.org/abs/2406.01179v1 ) ライセンス: Link先を確認	Guanhua Huang, Yuchen Zhang, Zhe Li, Yongjian You, Mingze Wang, Zhouwang Yang,	(参考訳) 大規模言語モデル(LLM)の普及は、AI生成テキストの潜在的な誤用に対する懸念を引き起こしている。これらのモデルは人間が書いたテキストによく似た内容を生成できるためである。 AI生成テキスト(AIGT)の現在の検出器は、敵対的摂動に対する頑健性に欠け、文字や単語のわずかな変更でさえ、人間が作成したテキストとAI生成テキストの判別を逆転させてしまう。本稿では,既存のAIGT検出手法の頑健性を調査し,新しい検出器であるSiamese Calibrated Reconstruction Network(SCRN)を導入する。 SCRNは、再構成ネットワークを用いてテキストにノイズを加えて除去し、局所的な摂動に対して頑健な意味表現を抽出する。また、異なるノイズの下でも同等の確信度で予測するようモデルを訓練するシャム(siamese)校正手法を提案し、敵対的摂動に対するモデルの頑健性を向上させる。 4つの公開データセットでの実験により、SCRNは全てのベースライン手法を上回り、敵対的攻撃下で最良のベースライン手法に対して6.5%-18.25%の絶対精度向上を達成した。さらに、クロスドメイン、クロスジャンル、混合ソースのシナリオにおいて、優れた汎化性を示す。コードは https://github.com/CarlanLark/Robust-AIGC-Detector で公開されている。 The widespread use of large language models (LLMs) has sparked concerns about the potential misuse of AI-generated text, as these models can produce content that closely resembles human-generated text. Current detectors for AI-generated text (AIGT) lack robustness against adversarial perturbations, with even minor changes in characters or words causing a reversal in distinguishing between human-created and AI-generated text. This paper investigates the robustness of existing AIGT detection methods and introduces a novel detector, the Siamese Calibrated Reconstruction Network (SCRN). The SCRN employs a reconstruction network to add and remove noise from text, extracting a semantic representation that is robust to local perturbations. We also propose a siamese calibration technique to train the model to make equally confident predictions under different noise, which improves the model's robustness against adversarial perturbations. Experiments on four publicly available datasets show that the SCRN outperforms all baseline methods, achieving 6.5%-18.25% absolute accuracy improvement over the best baseline method under adversarial attacks. 
Moreover, it exhibits superior generalizability in cross-domain, cross-genre, and mixed-source scenarios. The code is available at https://github.com/CarlanLark/Robust-AIGC-Detector.	翻訳日:2024-06-06 01:28:45 公開日:2024-06-03
# Q-BiC:in vitroおよびin vivoにおけるスピンベース量子センシングのための生体適合型集積チップ Q-BiC: A biocompatible integrated chip for in vitro and in vivo spin-based quantum sensing ( http://arxiv.org/abs/2406.01181v1 ) ライセンス: Link先を確認	Louise Shanahan, Sophia Belser, Jack W. Hart, Qiushi Gu, Julien R. E. Roth, Annika Mechnich, Michael Hoegen, Soham Pal, David Jordan, Eric A. Miska, Mete Atature, Helena S. Knowles,	(参考訳) 光学的にアドレス可能なスピンベースの量子センサーは、系の温度、磁場、pH、その他の物理的特性のナノスケール測定を可能にする。生細胞や多細胞生物での原理実証を超えて、信頼性が高く損傷のない量子センシングへとセンサーを発展させるには、3つの技術的課題がある。第一に、スピンベースの量子センシングには光学的アクセスとマイクロ波の供給が必要である。第二に、あらゆるマイクロエレクトロニクスは生体適合性を持ち、生きた標本の撮像用に設計されなければならない。第三に、望ましくない加熱を減らし、最適な生物学的環境を維持するために、効率的なマイクロ波供給と温度制御が不可欠である。本稿では,マイクロ流体に適合するマイクロ波供給を容易にし、オンチップ温度制御を備えたQuantum Biosensing Chip (Q-BiC)を提示する。本研究では, 窒素空孔中心を含むナノダイヤモンドとQ-BiCを併用し, 生体系で光検出磁気共鳴を行った。温度測定のための光検出磁気共鳴に必要なマイクロ波励起の生体適合性を,in vitroのHeLa細胞とin vivoの線虫Caenorhabditis elegansの両方で定量化し,有害な影響が観測される前に許容されるマイクロ波曝露範囲を決定する。さらに, ナノスケールの量子温度測定を, 固定化されているが麻酔されていない成虫線虫に最小限のストレスで行えることを示した。これらの結果は、研究対象の生体系を損傷することなくスピンベースの量子センサーを使用することを可能にし、細胞内プロセスの局所的な熱力学的および粘弾性的特性の研究を容易にする。 Optically addressable spin-based quantum sensors enable nanoscale measurements of temperature, magnetic field, pH, and other physical properties of a system. Advancing the sensors beyond proof-of-principle demonstrations in living cells and multicellular organisms towards reliable, damage-free quantum sensing poses three distinct technical challenges. First, spin-based quantum sensing requires optical accessibility and microwave delivery. Second, any microelectronics must be biocompatible and designed for imaging living specimens. Third, efficient microwave delivery and temperature control are essential to reduce unwanted heating and to maintain an optimal biological environment. Here, we present the Quantum Biosensing Chip (Q-BiC), which facilitates microfluidic-compatible microwave delivery and includes on-chip temperature control.
We demonstrate the use of Q-BiC in conjunction with nanodiamonds containing nitrogen vacancy centers to perform optically detected magnetic resonance in living systems. We quantify the biocompatibility of microwave excitation required for optically detected magnetic resonance both in vitro in HeLa cells and in vivo in the nematode Caenorhabditis elegans for temperature measurements and determine the microwave-exposure range allowed before detrimental effects are observed. In addition, we show that nanoscale quantum thermometry can be performed in immobilised but non-anaesthetised adult nematodes with minimal stress. These results enable the use of spin-based quantum sensors without damaging the biological system under study, facilitating the investigation of the local thermodynamic and viscoelastic properties of intracellular processes.	翻訳日:2024-06-06 01:28:45 公開日:2024-06-03
# スペクトルニューラルネットワークによる自動入力特徴関連性 Automatic Input Feature Relevance via Spectral Neural Networks ( http://arxiv.org/abs/2406.01183v1 ) ライセンス: Link先を確認	Lorenzo Chicchi, Lorenzo Buffoni, Diego Febbe, Lorenzo Giambagli, Raffaele Marino, Duccio Fanelli,	(参考訳) 機械学習の分野では、高次元データを扱うことが一般的なプラクティスである。したがって、より効率的な数値処理を行うために、よりコンパクトなデータセットを得るために、関連する入力特徴を特定することが重要である。さらに、意思決定の基盤となる重要な要素を分離することで、モデルによる解釈可能性に関する詳細化に寄与することができる。本稿では,ディープニューラルネットワークにおける入力成分の相対的重要性を推定する新しい手法を提案する。これは最適化プロセスのスペクトル再パラメータ化を活用することで達成される。入力ノードに関連する固有値は、実際に供給されたエントリ特徴の関連性を評価するための堅牢なプロキシを提供する。既存の技術とは異なり、スペクトル特徴ランキングはネットワークトレーニングの副産物として自動的に実行される。この手法は、合成データと実データの両方に対してうまく挑戦されている。 Working with high-dimensional data is a common practice, in the field of machine learning. Identifying relevant input features is thus crucial, so as to obtain compact dataset more prone for effective numerical handling. Further, by isolating pivotal elements that form the basis of decision making, one can contribute to elaborate on - ex post - models' interpretability, so far rather elusive. Here, we propose a novel method to estimate the relative importance of the input components for a Deep Neural Network. This is achieved by leveraging on a spectral re-parametrization of the optimization process. Eigenvalues associated to input nodes provide in fact a robust proxy to gauge the relevance of the supplied entry features. Unlike existing techniques, the spectral features ranking is carried out automatically, as a byproduct of the network training. The technique is successfully challenged against both synthetic and real data.	翻訳日:2024-06-06 01:28:45 公開日:2024-06-03
# SNPGuard: オープンソースツールを使用したSEV-SNP VMのリモートアテステーション SNPGuard: Remote Attestation of SEV-SNP VMs Using Open Source Tools ( http://arxiv.org/abs/2406.01186v1 ) ライセンス: Link先を確認	Luca Wilke, Gianluca Scopelliti,	(参考訳) クラウドコンピューティングは、今日の複雑なコンピューティング需要に対処するための、ユビキタスなソリューションである。しかし、クラウドサービスプロバイダがインフラストラクチャ上で実行されているコードとデータに完全なアクセス権を持つため、データのプライバシに関する懸念が伴う。 VMベースのTrusted Execution Environments(TEEs)は、この問題を解決するための有望なソリューションである。クラウドサービスプロバイダを締め出すための強力な分離保証と、エンドユーザがその信頼性を検証できるようにするアテステーション機構を提供する。 VMのブートチェーン全体をアテステーションすることは、いくつかのソフトウェアコンポーネントの変更を必要とする難しいタスクである。個々のコンポーネントにはオープンソースソリューションがあるが、それらを適切に統合するためのツールやドキュメントはいまだに不足している。本稿では、2つの一般的なブートワークフローを詳述し、少ない手作業でそれらを実行できるオープンソースのツールを提供することで、このギャップを埋めようとしている。最初のワークフローでは、VMイメージは整合性のみを必要とし機密性を必要としないと仮定し、中断のないブートプロセスを可能にする。第2のワークフローは、暗号化されたルートファイルシステムでVMをブートするもので、ブート初期に復号鍵をセキュアにプロビジョニングする必要がある。私たちのツールはAMD Secure Encrypted Virtualization (SEV) VMをターゲットにしているが、その概念はIntel Trusted Domain Extensions (TDX)のような他のVMベースのTEEにも当てはまる。 Cloud computing is a ubiquitous solution to handle today's complex computing demands. However, it comes with data privacy concerns, as the cloud service provider has complete access to code and data running on their infrastructure. VM-based Trusted Execution Environments (TEEs) are a promising solution to solve this issue. They provide strong isolation guarantees to lock out the cloud service provider, as well as an attestation mechanism to enable the end user to verify their trustworthiness. Attesting the whole boot chain of a VM is a challenging task that requires modifications to several software components. While there are open source solutions for the individual components, the tooling and documentation for properly integrating them remains scarce. In this paper, we try to fill this gap by elaborating on two common boot workflows and providing open source tooling to perform them with low manual effort.
The first workflow assumes that the VM image does only require integrity but not confidentiality, allowing for an uninterrupted boot process. The second workflow covers booting a VM with an encrypted root filesystem, requiring secure provisioning of the decryption key during early boot. While our tooling targets AMD Secure Encrypted Virtualization (SEV) VMs, the concepts also apply to other VM-based TEEs such as Intel Trusted Domain Extensions (TDX).	翻訳日:2024-06-06 01:28:45 公開日:2024-06-03
# 透過光から蛍光画像への自動変換のためのパッチベースエンコーダデコーダアーキテクチャ:LightMyCellsチャレンジへの貢献 Patch-Based Encoder-Decoder Architecture for Automatic Transmitted Light to Fluorescence Imaging Transition: Contribution to the LightMyCells Challenge ( http://arxiv.org/abs/2406.01187v1 ) ライセンス: Link先を確認	Marek Wodzinski, Henning Müller,	(参考訳) ラベルフリーの透過光入力画像から蛍光標識されたオルガネラを自動予測することは、重要だが難しい課題である。蛍光画像を得る従来の方法は、時間とコストのかかる生化学的標識を伴う。したがって、ラベルフリーの透過光顕微鏡画像に基づいてこのタスクを実行する自動アルゴリズムは、非常に有益でありうる。このタスクの重要性から、France-BioImagingの研究者はLightMyCellsチャレンジを開催した。その目標は、明視野、位相コントラスト、または微分干渉コントラストの顕微鏡画像からなる入力に基づいて、蛍光標識された核、ミトコンドリア、チューブリン、アクチンを自動的に予測するアルゴリズムを提案することである。本稿では,慎重に準備・訓練されたエンコーダ-デコーダ型ディープニューラルネットワークに基づくAGHSSOチームの貢献を紹介する。本手法はチャレンジで良好なスコアを達成し,最も性能の高いチームの一つに位置づけられた。 Automatic prediction of fluorescently labeled organelles from label-free transmitted light input images is an important, yet difficult task. The traditional way to obtain fluorescence images is related to performing biochemical labeling which is time-consuming and costly. Therefore, an automatic algorithm to perform the task based on the label-free transmitted light microscopy could be strongly beneficial. The importance of the task motivated researchers from the France-BioImaging to organize the LightMyCells challenge where the goal is to propose an algorithm that automatically predicts the fluorescently labeled nucleus, mitochondria, tubulin, and actin, based on the input consisting of bright field, phase contrast, or differential interference contrast microscopic images. In this work, we present the contribution of the AGHSSO team based on a carefully prepared and trained encoder-decoder deep neural network that achieves a considerable score in the challenge, being placed among the best-performing teams.	翻訳日:2024-06-06 01:28:45 公開日:2024-06-03
# UniAnimate: 一貫性のある人物画像アニメーションのための統一ビデオ拡散モデルの活用 UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation ( http://arxiv.org/abs/2406.01188v1 ) ライセンス: Link先を確認	Xiang Wang, Shiwei Zhang, Changxin Gao, Jiayu Wang, Xiaoqiang Zhou, Yingya Zhang, Luxin Yan, Nong Sang,	(参考訳) 最近の拡散ベースの人物画像アニメーション技術は、与えられた参照アイデンティティと所望の動作ポーズ系列に忠実に従うビデオの合成において、目覚ましい成功を収めている。それにもかかわらず、まだ2つの制限がある。 i) アイデンティティ画像をメインのビデオブランチと整合させるために追加の参照モデルが必要であり、最適化の負担とモデルパラメータを大幅に増大させる。 ii) 生成されるビデオは通常短く(例えば24フレーム)、実用上の応用を妨げる。これらの欠点に対処するため、我々は、効率的で長時間の人物ビデオ生成を可能にするUniAnimateフレームワークを提案する。まず、最適化の難しさを低減し時間的コヒーレンスを確保するため、統一ビデオ拡散モデルを組み込むことで、参照画像を姿勢ガイダンスおよびノイズビデオとともに共通の特徴空間にマッピングする。第2に、ランダムノイズ入力と第1フレーム条件付き入力の両方をサポートする統一ノイズ入力を提案し、長時間ビデオの生成能力を高める。最後に、長いシーケンスをさらに効率的に処理するために、状態空間モデルに基づく代替の時間的モデリングアーキテクチャを探索し、計算コストの高い元の時間的Transformerを置き換える。広範な実験結果から,UniAnimateは定量的および定性的評価の両方で既存の最先端手法よりも優れた合成結果を達成することが示された。特に、UniAnimateは、第1フレーム条件付け戦略を反復的に用いることで、高度に一貫した1分間のビデオさえ生成できる。コードとモデルは公開予定である。プロジェクトページ: https://unianimate.github.io/ Recent diffusion-based human image animation techniques have demonstrated impressive success in synthesizing videos that faithfully follow a given reference identity and a sequence of desired movement poses. Despite this, there are still two limitations: i) an extra reference model is required to align the identity image with the main video branch, which significantly increases the optimization burden and model parameters; ii) the generated video is usually short in time (e.g., 24 frames), hampering practical applications. To address these shortcomings, we present a UniAnimate framework to enable efficient and long-term human video generation. First, to reduce the optimization difficulty and ensure temporal coherence, we map the reference image along with the posture guidance and noise video into a common feature space by incorporating a unified video diffusion model.
Second, we propose a unified noise input that supports random noised input as well as first frame conditioned input, which enhances the ability to generate long-term video. Finally, to further efficiently handle long sequences, we explore an alternative temporal modeling architecture based on state space model to replace the original computation-consuming temporal Transformer. Extensive experimental results indicate that UniAnimate achieves superior synthesis results over existing state-of-the-art counterparts in both quantitative and qualitative evaluations. Notably, UniAnimate can even generate highly consistent one-minute videos by iteratively employing the first frame conditioning strategy. Code and models will be publicly available. Project page: https://unianimate.github.io/.	翻訳日:2024-06-06 01:28:45 公開日:2024-06-03
# S-CycleGAN: ロボット超音波検査のためのセマンティックセグメンテーション強化CT-超音波画像変換 S-CycleGAN: Semantic Segmentation Enhanced CT-Ultrasound Image-to-Image Translation for Robotic Ultrasonography ( http://arxiv.org/abs/2406.01191v1 ) ライセンス: Link先を確認	Yuhan Song, Nak Young Chong,	(参考訳) 超音波画像は、その非侵襲性と安全性のため、様々な医療診断において重要である。臨床実践においては,超音波画像解析の正確さと精度が極めて重要である。近年の深層学習の進歩は, 医用画像の処理において大きな能力を示している。しかし、深層学習が大量のデータを必要とする性質と、高品質な超音波画像の訓練データの不足により、深層学習に基づく超音波解析法の開発が妨げられている。これらの課題に対処するために,CT(コンピュータ断層撮影)データから高品質な合成超音波画像を生成するS-CycleGANという高度なディープラーニングモデルを導入する。このモデルは、CycleGANフレームワークにセマンティック識別器を組み込み、スタイル変換の過程で重要な解剖学的詳細が保存されることを保証する。生成した合成画像は、セマンティックセグメンテーションモデルおよびロボット支援超音波スキャンシステムの開発に向けた訓練データセットの拡張に用いられ、実際の超音波画像を正確に解析する能力を高める。 Ultrasound imaging is pivotal in various medical diagnoses due to its non-invasive nature and safety. In clinical practice, the accuracy and precision of ultrasound image analysis are critical. Recent advancements in deep learning show great capacity for processing medical images. However, the data-hungry nature of deep learning and the shortage of high-quality ultrasound image training data suppress the development of deep-learning-based ultrasound analysis methods. To address these challenges, we introduce an advanced deep learning model, dubbed S-CycleGAN, which generates high-quality synthetic ultrasound images from computed tomography (CT) data. This model incorporates semantic discriminators within a CycleGAN framework to ensure that critical anatomical details are preserved during the style transfer process. The synthetic images produced are used to augment training datasets for semantic segmentation models and robot-assisted ultrasound scanning system development, enhancing their ability to accurately parse real ultrasound imagery.	翻訳日:2024-06-06 01:28:45 公開日:2024-06-03
# 適応的敵対者を伴うスパース性非依存線形バンディット Sparsity-Agnostic Linear Bandits with Adaptive Adversaries ( http://arxiv.org/abs/2406.01192v1 ) ライセンス: Link先を確認	Tianyuan Jin, Kyoungseok Jang, Nicolò Cesa-Bianchi,	(参考訳) 本研究では,各ラウンドで学習者が一組のアクション(特徴ベクトル)を受け取り,その要素を選択し,確率的報酬を得る確率的線形バンディットについて検討する。期待される報酬は、選択されたアクションの固定だが未知の線形関数である。線形報酬関数の非ゼロ係数数$S$に依存するスパース後悔境界について検討する。以前の作業は、$S$が知られている場合、またはアクションセットが追加の仮定を満たす場合に焦点を当てていた。本研究では、$S$が未知で作用集合が敵対的に生成されたときに保持される最初のスパース後悔境界を得る。我々の手法は、オンラインから信頼セットへの変換と、ネストされた信頼セットの階層上の新しいランダム化モデル選択アプローチを組み合わせる。 $S$が知られているとき、我々の分析は、敵対的な作用集合の最先端境界を回復する。また,Exp3を用いて信頼集合を動的に選択する我々の手法の変種により,確率的線形バンディットの経験的性能を向上しつつ,時間的地平線への最適依存を持つ後悔境界を享受できることを示す。 We study stochastic linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors), from which it chooses an element and obtains a stochastic reward. The expected reward is a fixed but unknown linear function of the chosen action. We study sparse regret bounds that depend on the number $S$ of non-zero coefficients in the linear reward function. Previous works focused on the case where $S$ is known, or the action sets satisfy additional assumptions. In this work, we obtain the first sparse regret bounds that hold when $S$ is unknown and the action sets are adversarially generated. Our techniques combine online to confidence set conversions with a novel randomized model selection approach over a hierarchy of nested confidence sets. When $S$ is known, our analysis recovers state-of-the-art bounds for adversarial action sets. We also show that a variant of our approach, using Exp3 to dynamically select the confidence sets, can be used to improve the empirical performance of stochastic linear bandits while enjoying a regret bound with optimal dependence on the time horizon.	翻訳日:2024-06-06 01:28:45 公開日:2024-06-03
# AFF-ttention! 短期オブジェクトインタラクション予測のためのアフォーダンスと注意モデル AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation ( http://arxiv.org/abs/2406.01194v1 ) ライセンス: Link先を確認	Lorenzo Mur-Labadia, Ruben Martinez-Cantin, Josechu Guerrero, Giovanni Maria Farinella, Antonino Furnari,	(参考訳) 短期的オブジェクトインタラクション予測は、次のアクティブなオブジェクトの位置、対話の名詞と動詞のカテゴリ、および自我中心のビデオの観察から接触する時間を検出することで構成される。この能力は、ユーザの目標を理解するためのウェアラブルアシスタントやヒューマンロボットのインタラクションには基本的だが、正確で信頼性の高い方法でSTAを実行するための改善の余地はまだ残っている。本稿では,2つのコントリビューションによるSTA予測の性能向上について述べる。 1. STAformerは、フレームガイド付き時間プーリング、デュアルイメージビデオアテンション、マルチスケール機能融合を統合し、画像入力ビデオペアからのSTA予測をサポートする新しいアテンションベースアーキテクチャである。 2. アフォーダンスをモデル化することでSTA予測を人間の行動に基づかせる2つの新しいモジュールを導入する。まず,特定の物理的場面で発生しうる相互作用の永続記憶として機能する環境アフォーダンスモデルを統合する。第2に、手と物体の軌跡の観測から相互作用ホットスポットを予測し、ホットスポット周辺に局在したSTA予測に対する信頼性を高める。その結果,Overall Top-5 mAPにおいて,Ego4Dで最大+45%,新たに整備したEPIC-Kitchens STAラベルで最大+42%の有意な相対的改善が得られた。 Ego4D と EPIC-Kitchens のコード、アノテーション、事前抽出したアフォーダンスを公表し、この分野の今後の研究を奨励します。 Short-Term object-interaction Anticipation consists of detecting the location of the next-active objects, the noun and verb categories of the interaction, and the time to contact from the observation of egocentric video. This ability is fundamental for wearable assistants or human robot interaction to understand the user goals, but there is still room for improvement to perform STA in a precise and reliable way. In this work, we improve the performance of STA predictions with two contributions: 1. We propose STAformer, a novel attention-based architecture integrating frame guided temporal pooling, dual image-video attention, and multiscale feature fusion to support STA predictions from an image-input video pair. 2. We introduce two novel modules to ground STA predictions on human behavior by modeling affordances. First, we integrate an environment affordance model which acts as a persistent memory of interactions that can take place in a given physical scene. 
Second, we predict interaction hotspots from the observation of hands and object trajectories, increasing confidence in STA predictions localized around the hotspot. Our results show significant relative Overall Top-5 mAP improvements of up to +45% on Ego4D and +42% on a novel set of curated EPIC-Kitchens STA labels. We will release the code, annotations, and pre-extracted affordances on Ego4D and EPIC-Kitchens to encourage future research in this area.	翻訳日:2024-06-06 01:28:45 公開日:2024-06-03
# セマンティックグラフアテンションネットワークと距離情報に基づく3次元全身姿勢推定 3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information ( http://arxiv.org/abs/2406.01196v1 ) ライセンス: Link先を確認	Sihan Wen, Xiantan Zhu, Zhiming Tan,	(参考訳) 近年,3次元ポーズ推定のための多種多様な手法が提案されている。これらのうち、自己注意機構とグラフ畳み込みはどちらも効果的で実用的な方法であることが証明されている。これら2つの技法の強みを認識し,大域的な文脈を捉える自己注意の能力の恩恵を受けるとともに,骨格の局所的な接続性や構造的制約にグラフ畳み込みを利用するセマンティックグラフ注意ネットワークを開発した。また,身体の特定の部分に関する情報の抽出と精緻化を支援する身体部分デコーダを設計する。さらに,提案手法は距離情報を導入し,空間的関係を理解・正確に予測するモデルの能力を高める。最後に、体の構造的骨格に重要な制約を課し、モデルの予測が人間の姿勢の自然な限界に従うことを保証する幾何学的損失を導入する。実験の結果,提案手法の有効性を検証し,システム内のすべての要素がポーズ推定結果の改善に不可欠であることを実証した。最先端と比較して、提案された作業は適合するだけでなく、既存のベンチマークを超えている。 In recent years, a plethora of diverse methods have been proposed for 3D pose estimation. Among these, self-attention mechanisms and graph convolutions have both been proven to be effective and practical methods. Recognizing the strengths of those two techniques, we have developed a novel Semantic Graph Attention Network which can benefit from the ability of self-attention to capture global context, while also utilizing the graph convolutions to handle the local connectivity and structural constraints of the skeleton. We also design a Body Part Decoder that assists in extracting and refining the information related to specific segments of the body. Furthermore, our approach incorporates Distance Information, enhancing our model's capability to comprehend and accurately predict spatial relationships. Finally, we introduce a Geometry Loss that imposes a critical constraint on the structural skeleton of the body, ensuring that the model's predictions adhere to the natural limits of human posture. The experimental results validate the effectiveness of our approach, demonstrating that every element within the system is essential for improving pose estimation outcomes. Compared with the state of the art, the proposed work not only meets but exceeds the existing benchmarks.	翻訳日:2024-06-06 01:18:57 公開日:2024-06-03
# 微細調整と多重回帰による多次元スコーリングの自動評価 Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression ( http://arxiv.org/abs/2406.01198v1 ) ライセンス: Link先を確認	Kun Sun, Rong Wang,	(参考訳) 自動エッセイスコア(英: Automated essay score、AES)とは、エッセイの筆記品質を反映したスコアの予測である。既存のAESシステムでは、スコアは1点のみである。しかし、ユーザとL2学習者は、現実世界の応用における英語エッセイに対する様々な次元(語彙、文法、コヒーレンスなど)のスコアを期待している。このニーズに対処するため、我々は2つの大きなデータセットに微調整と他の戦略を用いることで、複数の次元にわたる英語エッセイを自動的にスコアする2つのモデルを開発した。その結果, 精度, F1スコア, Quadratic Weighted Kappa の3つの基準を用いて, 評価において優れた性能が得られた。さらに,システム全体のスコアリングにおいて,既存の手法よりも優れています。 Automated essay scoring (AES) involves predicting a score that reflects the writing quality of an essay. Most existing AES systems produce only a single overall score. However, users and L2 learners expect scores across different dimensions (e.g., vocabulary, grammar, coherence) for English essays in real-world applications. To address this need, we have developed two models that automatically score English essays across multiple dimensions by employing fine-tuning and other strategies on two large datasets. The results demonstrate that our systems achieve impressive performance in evaluation using three criteria: precision, F1 score, and Quadratic Weighted Kappa. Furthermore, our system outperforms existing methods in overall scoring.	翻訳日:2024-06-06 01:18:57 公開日:2024-06-03
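The abstract above names Quadratic Weighted Kappa (QWK) as one of its evaluation criteria. As a rough illustration of that metric only, here is a minimal Python sketch of the standard QWK definition for two lists of integer scores. The function name and the rating range arguments are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Cohen's kappa with quadratic weights for two equal-length integer rating lists."""
    n_items = len(rater_a)
    if n_items == 0 or n_items != len(rater_b):
        raise ValueError("rating lists must be non-empty and of equal length")
    n_ratings = max_rating - min_rating + 1
    if n_ratings < 2:
        raise ValueError("at least two distinct rating levels are required")

    observed = Counter(zip(rater_a, rater_b))  # joint counts O[i, j]
    hist_a = Counter(rater_a)                  # marginal counts of rater A
    hist_b = Counter(rater_b)                  # marginal counts of rater B

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(min_rating, max_rating + 1):
        for j in range(min_rating, max_rating + 1):
            weight = (i - j) ** 2 / (n_ratings - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n_items
            numerator += weight * observed[(i, j)]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters gave the same single score to every item.
        return 1.0
    return 1.0 - numerator / denominator
```

A value of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement between the predicted and the reference scores.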
# ImageNet-1Kを超えるディープクラスタリングメソッドのスケールアップ Scaling Up Deep Clustering Methods Beyond ImageNet-1K ( http://arxiv.org/abs/2406.01203v1 ) ライセンス: Link先を確認	Nikolas Adaloglou, Felix Michels, Kaspar Senft, Diana Petrusheva, Markus Kollmann,	(参考訳) ディープイメージクラスタリング手法は通常、小規模のバランスの取れた分類データセットで評価されるが、機能ベースの$k$-meansはプロプライエタリな10億規模のデータセットで適用されている。本研究では、以下のデータ関連要因の影響を切り分けつつ、大規模ベンチマークにおける機能ベースのディープクラスタリング手法の性能について検討する。 i) クラス不均衡、ii) クラスの粒度、iii) 容易に認識できるクラス、及び iv) 複数のクラスをキャプチャする能力。そこで,ImageNet21Kをベースとした複数の新しいベンチマークを開発した。我々の実験分析によると、機能ベースの$k$-meansはバランスの取れたデータセットで不公平に評価されることが多い。しかし、ディープクラスタリング手法は、ほとんどの大規模ベンチマークで$k$-meansを上回っている。興味深いことに、$k$-meansは分類が容易なベンチマークで大幅に性能が劣る。しかし、パフォーマンスのギャップはImageNet21Kのような最も大規模なデータ領域では縮小する。最後に、非プライマリなクラスタ予測が意味のあるクラス(すなわち、より粗いクラス)を捉えることを見出した。 Deep image clustering methods are typically evaluated on small-scale balanced classification datasets while feature-based $k$-means has been applied on proprietary billion-scale datasets. In this work, we explore the performance of feature-based deep clustering approaches on large-scale benchmarks whilst disentangling the impact of the following data-related factors: i) class imbalance, ii) class granularity, iii) easy-to-recognize classes, and iv) the ability to capture multiple classes. Consequently, we develop multiple new benchmarks based on ImageNet21K. Our experimental analysis reveals that feature-based $k$-means is often unfairly evaluated on balanced datasets. However, deep clustering methods outperform $k$-means across most large-scale benchmarks. Interestingly, $k$-means underperforms on easy-to-classify benchmarks by large margins. The performance gap, however, diminishes on the highest data regimes such as ImageNet21K. Finally, we find that non-primary cluster predictions capture meaningful classes (i.e., coarser classes).	翻訳日:2024-06-06 01:18:57 公開日:2024-06-03
# クーロンゲージにおける量子電磁力学の量子シミュレーション Quantum simulations of quantum electrodynamics in Coulomb gauge ( http://arxiv.org/abs/2406.01204v1 ) ライセンス: Link先を確認	Tianyin Li,	(参考訳) 近年では、従来のモンテカルロ格子ゲージ理論(LGT)シミュレーションの符号問題に量子計算法が用いられている。本稿では,LGTの量子シミュレーションにおいてクーロンゲージ(CG)を用いることを提案する。これは、CGにおいて冗長な自由度を排除できるためである。したがって、CG のハミルトニアンはゲージ不変性を必要としないので、ゲージ場をネーティブに微分することができる。離散化されたゲージ場とフェルミオン場はそれぞれ運動量と位置格子に置かれるべきである。このスキームの下では、CG条件とガウスの法則は偏極ベクトルの代数方程式を解くことで便利に保存できる。また、ゲージ場を量子ビットにマッピングする手順についても論じ、量子ビットの多項式スケーリングと時間発展の複雑さを実証する。最後に、U(1)プラケット演算子とWilsonループの真空期待値(VEV)を古典的なデバイス上で計算し、離散化方式の性能をテストする。 In recent years, the quantum computing method has been used to address the sign problem in traditional Monte Carlo lattice gauge theory (LGT) simulations. We propose that the Coulomb gauge (CG) should be used in quantum simulations of LGT. This is because the redundant degrees of freedom can be eliminated in CG. Therefore, the Hamiltonian in CG does not need to be gauge invariance, allowing the gauge field to be discretized naively. We point out that discretized gauge fields and fermion fields should be placed on momentum and position lattices, respectively. Under this scheme, the CG condition and Gauss's law can be conveniently preserved by solving algebraic equations of polarization vectors. We also discuss the procedure for mapping gauge fields to qubits, and then demonstrate the polynomial scaling of qubits and the complexity of time evolution. Finally, we calculate the vacuum expectation value (VEV) of the U(1) plaquette operator and the Wilson loop on a classical device to test the performance of our discretization scheme.	翻訳日:2024-06-06 01:18:57 公開日:2024-06-03
# ControlSpeech: Decoupled Codecによるゼロショット話者クローンとゼロショット言語スタイル制御の同時実現に向けて ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec ( http://arxiv.org/abs/2406.01205v1 ) ライセンス: Link先を確認	Shengpeng Ji, Jialong Zuo, Minghui Fang, Siqi Zheng, Qian Chen, Wen Wang, Ziyue Jiang, Hai Huang, Xize Cheng, Rongjie Huang, Zhou Zhao,	(参考訳) 本稿では,話者の声を完全にクローンし,数秒の音声プロンプトと簡単なテキストによるスタイル記述プロンプトのみに基づいて,話し方の任意の制御と調整を可能にするTTS(Text-to-Speech)システムであるControlSpeechを提案する。従来のゼロショットTTSモデルと制御可能なTTSモデルは、さらなる制御と調整機能なしで話者の声を模倣することしかできないか、話者固有の音声生成とは無関係であった。そのため、ControlSpeechは、音色、コンテンツ、スタイルを同時に制御可能なTTSシステムという、より困難な新しいタスクにフォーカスしている。 ControlSpeechは、音声プロンプト、コンテンツプロンプト、スタイルプロンプトを入力として取り、双方向の注意とマスクベースの並列デコードを使用して、対応するコーデック表現を離散デカップリングコーデック空間でキャプチャする。さらに、多対多のマッピング方式でテキストスタイルの制御性の問題を発見し、この問題を解決するためにスタイル混合意味密度(SMSD)モデルを提案した。ガウス混合密度ネットワークに基づくSMSDモジュールは,スタイル意味情報の詳細な分割とサンプリング機能を強化し,より多様なスタイルで音声を生成するように設計されている。実験では、新しいスタイル制御可能なデータセット、いくつかの再現ベースラインモデルを備えた制御可能なモデルツールキット「ControlToolkit」を利用可能にするとともに、ControlSpeechにおける制御機能と生成オーディオの品質の両方を評価するための新しいメトリクスを提案する。関連するアブレーション研究は、ControlSpeechにおける各成分の必要性を検証している。 ControlSpeechが、制御可能な音声合成の次の基盤パラダイムを確立できることを願っている。関連コードとデモは https://github.com/jishengpeng/ControlSpeech で公開されている。 In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt. Prior zero-shot TTS models and controllable TTS models either could only mimic the speaker's voice without further control and adjustment capabilities or were unrelated to speaker-specific voice generation. Therefore, ControlSpeech focuses on a more challenging new task: a TTS system with controllable timbre, content, and style at the same time. 
ControlSpeech takes speech prompts, content prompts, and style prompts as inputs and utilizes bidirectional attention and mask-based parallel decoding to capture corresponding codec representations in a discrete decoupling codec space. Moreover, we discovered the issue of text style controllability in a many-to-many mapping fashion and proposed the Style Mixture Semantic Density (SMSD) model to resolve this problem. The SMSD module, which is based on Gaussian mixture density networks, is designed to enhance the fine-grained partitioning and sampling capabilities of style semantic information and generate speech with more diverse styles. In terms of experiments, we make available a controllable model toolkit called ControlToolkit with a new style controllable dataset and some replicated baseline models, and we propose new metrics to evaluate both the control capability and the quality of generated audio in ControlSpeech. The relevant ablation studies validate that each component in ControlSpeech is necessary. We hope that ControlSpeech can establish the next foundation paradigm of controllable speech synthesis. The relevant code and demo are available at https://github.com/jishengpeng/ControlSpeech .	翻訳日:2024-06-06 01:18:57 公開日:2024-06-03
# 言語横断的名前付きエンティティ認識のためのグローバルローカルDenoisingフレームワークによる擬似ラベルの改良 Improving Pseudo Labels with Global-Local Denoising Framework for Cross-lingual Named Entity Recognition ( http://arxiv.org/abs/2406.01213v1 ) ライセンス: Link先を確認	Zhuojun Ding, Wei Wei, Xiaoye Qu, Dangyang Chen,	(参考訳) NER (cross-lingual named entity recognition) は、ラベル付きソース言語データとラベルなしターゲット言語データのみを活用するターゲット言語のためのNERモデルをトレーニングすることを目的としている。従来のアプローチでは、翻訳されたソース言語データにラベルプロジェクションを実行するか、あるいはソースモデルを使用して、ターゲット言語データに擬似ラベルを割り当て、これらの擬似ラベルデータにターゲットモデルをトレーニングし、ターゲット言語に一般化する。しかし、これらの自動ラベリング手順は必然的にノイズのあるラベルを導入し、パフォーマンスが低下する。本稿では,言語間NERのためのGlobal-Local Denoising framework(GLoDe)を提案する。特に、GLoDeは、意味空間におけるグローバルな分布情報とローカルな分布情報を活用することによって、誤った擬似ラベルを正すプログレッシブデノケーション戦略を導入している。改良された擬似ラベル付きターゲット言語データにより、モデルの一般化能力が大幅に向上する。さらに,従来の手法では言語に依存しない特徴を用いたモデルの改良しか検討しなかったが,対象言語固有の特徴も重要であり,無視すべきではないと論じている。この目的を達成するために、我々は単純な補助的タスクを用いる。 6つのターゲット言語を持つ2つのベンチマークデータセットの実験結果から,提案したGLoDeは最先端の手法よりも優れていることが示された。 Cross-lingual named entity recognition (NER) aims to train an NER model for the target language leveraging only labeled source language data and unlabeled target language data. Prior approaches either perform label projection on translated source language data or employ a source model to assign pseudo labels for target language data and train a target model on these pseudo-labeled data to generalize to the target language. However, these automatic labeling procedures inevitably introduce noisy labels, thus leading to a performance drop. In this paper, we propose a Global-Local Denoising framework (GLoDe) for cross-lingual NER. Specifically, GLoDe introduces a progressive denoising strategy to rectify incorrect pseudo labels by leveraging both global and local distribution information in the semantic space. The refined pseudo-labeled target language data significantly improves the model's generalization ability. 
Moreover, previous methods only consider improving the model with language-agnostic features, however, we argue that target language-specific features are also important and should never be ignored. To this end, we employ a simple auxiliary task to achieve this goal. Experimental results on two benchmark datasets with six target languages demonstrate that our proposed GLoDe significantly outperforms current state-of-the-art methods.	翻訳日:2024-06-06 01:18:57 公開日:2024-06-03
# ホップのような問題--現実世界の問題の新しい特徴の提示とモデル化 The hop-like problem nature -- unveiling and modelling new features of real-world problems ( http://arxiv.org/abs/2406.01215v1 ) ライセンス: Link先を確認	Michal W. Przewozniczek, Bartosz Frej, Marcin M. Komarnicki,	(参考訳) ベンチマークはオプティマイザの開発に不可欠なツールです。それらを使用すれば、任意のオプティマイザが有効かどうかを確認できる。 Evolutionary Computationフィールドの目的は、ハードで現実世界の問題を解決するツールをサポートするため、これらの特徴に類似したベンチマークは特に価値があるように思われる。そこで本研究では,最適化プロセスのホップ解析を提案する。この分析をNP-hard, large-scale real-world problemに適用する。その結果は、有名なリーディング・ワンズ問題の特徴のいくつかの存在を示唆している。これらの特徴をうまくモデル化するために,リードブロック問題(LBP)を提案する。 LBPは、検討された最先端遺伝アルゴリズム(GA)によってうまく扱えない新しいタイプのハード最適化問題を組み立てることができる。最後に, LBP と実世界の課題を解決しつつ, GAs の有効性を改善するためには, どのようなメカニズムを提案する必要があるかを明らかにする。 Benchmarks are essential tools for the optimizer's development. Using them, we can check for what kind of problems a given optimizer is effective or not. Since the objective of the Evolutionary Computation field is to support the tools to solve hard, real-world problems, the benchmarks that resemble their features seem particularly valuable. Therefore, we propose a hop-based analysis of the optimization process. We apply this analysis to the NP-hard, large-scale real-world problem. Its results indicate the existence of some of the features of the well-known Leading Ones problem. To model these features well, we propose the Leading Blocks Problem (LBP), which is more general than Leading Ones and some of the benchmarks inspired by this problem. LBP allows for the assembly of new types of hard optimization problems that are not handled well by the considered state-of-the-art genetic algorithm (GA). Finally, the experiments reveal what kind of mechanisms must be proposed to improve GAs' effectiveness while solving LBP and the considered real-world problem.	翻訳日:2024-06-06 01:18:57 公開日:2024-06-03
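The abstract above refers to the well-known Leading Ones benchmark. For readers unfamiliar with it, the following Python sketch shows only that classic fitness function; it is not the Leading Blocks Problem proposed in the paper, whose exact definition is not given in the abstract.

```python
def leading_ones(bits):
    """Return the number of consecutive 1s at the start of a bit string."""
    count = 0
    for bit in bits:
        if bit != 1:
            break  # everything after the first non-one bit is ignored
        count += 1
    return count
```

Because bits after the first 0 do not affect the fitness value, an optimizer gets no feedback about them until every earlier bit is set correctly, which is the kind of sequential dependency the abstract alludes to.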
# 制約に基づく逆例合成 Constraint-based Adversarial Example Synthesis ( http://arxiv.org/abs/2406.01219v1 ) ライセンス: Link先を確認	Fang Yu, Ya-Yu Chi, Yu-Fang Chen,	(参考訳) 人工知能(AI)の急速な進歩の時代、ニューラルネットワークモデルは顕著なブレークスルーを達成した。しかし、敵の攻撃に対する脆弱性が懸念されている。この研究は、ニューラルネットワークを実装するPythonプログラムをテストするための特殊なテクニックであるConcolic Testingの強化に焦点を当てている。拡張ツールであるPyCTは、浮動小数点演算やアクティベーション関数計算など、幅広いニューラルネットワーク操作に対応している。予測経路の制約を体系的に生成することにより、潜在的敵対例の同定を容易にする。この研究は、様々なニューラルネットワークアーキテクチャにおける有効性を実証し、敵攻撃に対するPythonベースのニューラルネットワークモデルの脆弱性を強調している。この研究は、潜在的な敵対的脅威を検出し軽減するための堅牢なテスト手法の必要性を強調して、AIによるアプリケーションを保護することに貢献する。 Pythonの信頼性の高いアプリケーションのために、ニューラルネットワークモデルを強化する上で、厳格なテストテクニックの重要性を強調している。 In the era of rapid advancements in artificial intelligence (AI), neural network models have achieved notable breakthroughs. However, concerns arise regarding their vulnerability to adversarial attacks. This study focuses on enhancing Concolic Testing, a specialized technique for testing Python programs implementing neural networks. The extended tool, PyCT, now accommodates a broader range of neural network operations, including floating-point and activation function computations. By systematically generating prediction path constraints, the research facilitates the identification of potential adversarial examples. Demonstrating effectiveness across various neural network architectures, the study highlights the vulnerability of Python-based neural network models to adversarial attacks. This research contributes to securing AI-powered applications by emphasizing the need for robust testing methodologies to detect and mitigate potential adversarial threats. It underscores the importance of rigorous testing techniques in fortifying neural network models for reliable applications in Python.	翻訳日:2024-06-06 01:18:57 公開日:2024-06-03
# ゼロショットインコンテキスト学習のためのデモ強化 Demonstration Augmentation for Zero-shot In-context Learning ( http://arxiv.org/abs/2406.01224v1 ) ライセンス: Link先を確認	Yi Su, Yunpeng Tai, Yixin Ji, Juntao Li, Bowen Yan, Min Zhang,	(参考訳) 大規模言語モデル(LLM)は、ICL(In-context Learning)と呼ばれる印象的な機能を実証した。しかし、多くの研究は、モデルの性能がデモの選択に敏感であることを強調しており、ユーザクエリの事前知識が欠如している実用的なアプリケーションにとって重要な課題であることを示している。そのため、大規模な実証プールを構築し、モデルを支援するために外部データベースを組み込まなければならないため、かなりの時間と費用がかかる。これを踏まえて、最近の研究はゼロショットICLに焦点を移し、モデル固有の生成能力を活用して外部情報への依存を減らすことを目的としている。これらのアプローチの有効性にもかかわらず、モデルによって生成されたコンテンツは信頼できない可能性があり、生成プロセスは時間がかかる。これらの課題に対処するために,本モデルが予測した過去のサンプルをその後のサンプルの実証として用いたDAIL(Demonstration Augmentation for In-context Learning)を提案する。 DAILは追加の推論コストをもたらしず、モデルの生成能力に依存しない。実験の結果,DAILは直接ゼロショット推論よりもモデルの性能を著しく向上させることができ,外部情報なしに数発のICLよりも優れることがわかった。 Large Language Models (LLMs) have demonstrated an impressive capability known as In-context Learning (ICL), which enables them to acquire knowledge from textual demonstrations without the need for parameter updates. However, many studies have highlighted that the model's performance is sensitive to the choice of demonstrations, presenting a significant challenge for practical applications where we lack prior knowledge of user queries. Consequently, we need to construct an extensive demonstration pool and incorporate external databases to assist the model, leading to considerable time and financial costs. In light of this, some recent research has shifted focus towards zero-shot ICL, aiming to reduce the model's reliance on external information by leveraging their inherent generative capabilities. Despite the effectiveness of these approaches, the content generated by the model may be unreliable, and the generation process is time-consuming. To address these issues, we propose Demonstration Augmentation for In-context Learning (DAIL), which employs the model's previously predicted historical samples as demonstrations for subsequent ones. 
DAIL brings no additional inference cost and does not rely on the model's generative capabilities. Our experiments reveal that DAIL can significantly improve the model's performance over direct zero-shot inference and can even outperform few-shot ICL without any external information.	翻訳日:2024-06-06 01:18:57 公開日:2024-06-03
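The idea described above, reusing the model's own earlier predictions as demonstrations for later queries, can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the function name, the `predict` callable, the prompt template, and the choice of simply taking the `k` most recent predictions are all assumptions, since the abstract does not specify how demonstrations are selected.

```python
from typing import Callable, List, Tuple


def run_with_history_demos(
    queries: List[str],
    predict: Callable[[str], str],  # hypothetical wrapper around one LLM call
    k: int = 2,
) -> List[str]:
    """Answer queries in order, prepending up to k earlier (query, prediction) pairs."""
    memory: List[Tuple[str, str]] = []  # past queries with the model's own answers
    outputs: List[str] = []
    for query in queries:
        # Assumption: use the k most recent predictions as demonstrations.
        demos = memory[-k:] if k > 0 else []
        demo_text = "".join(f"Input: {q}\nOutput: {a}\n\n" for q, a in demos)
        prompt = f"{demo_text}Input: {query}\nOutput:"
        answer = predict(prompt)  # still a single model call per query
        memory.append((query, answer))
        outputs.append(answer)
    return outputs
```

The first query is answered zero-shot because the memory is empty. Later queries see earlier predictions as in-context examples without any extra model calls to generate demonstrations.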
# AGALE: グラフ対応連続学習評価フレームワーク AGALE: A Graph-Aware Continual Learning Evaluation Framework ( http://arxiv.org/abs/2406.01229v1 ) ライセンス: Link先を確認	Tianqi Zhao, Alan Hanjalic, Megha Khosla,	(参考訳) 近年、連続学習(CL)技術は、連続的なタスク、特にユークリッドデータの領域における知識を維持しながら、ストリーミングデータからの学習において大きな進歩を遂げている。 CL設定における公平な評価の促進と課題の認識を目的として,ユークリッドデータの単一・複数ラベル分類タスクを中心に,いくつかの評価フレームワークが提案されている。しかし、これらの評価フレームワークは、グラフに固有のトポロジ構造を考慮しないため、入力データがグラフ構造である場合、簡単には適用できない。既存の連続グラフ学習(CGL)評価フレームワークは、ノード分類(NC)タスクにおける単一ラベルシナリオに重点を置いている。この焦点はマルチラベルシナリオの複雑さを見落としており、ノードは複数のラベルとのアフィリエイトを示し、同時に複数のタスクに参加することができる。単一ラベルノードと複数ラベルノードの両方に対応可能なグラフ対応評価フレームワーク(AGALE)を開発し,従来の評価フレームワークの限界に対処する。特に、新たなインクリメンタル設定を定義し、CGLデータセットに適したデータパーティショニングアルゴリズムを考案する。本研究では,連続学習,連続グラフ学習,動的グラフ学習(DGL)の各分野の手法の比較実験を行った。 AGALE を理論的に解析し、比較手法の性能におけるホモフィリーの役割に関する新たな知見を提供する。私たちはフレームワークをhttps://github.com/Tianqi-py/AGALEでリリースします。 In recent years, continual learning (CL) techniques have made significant progress in learning from streaming data while preserving knowledge across sequential tasks, particularly in the realm of Euclidean data. To foster fair evaluation and recognize challenges in CL settings, several evaluation frameworks have been proposed, focusing mainly on the single- and multi-label classification task on Euclidean data. However, these evaluation frameworks are not trivially applicable when the input data is graph-structured, as they do not consider the topological structure inherent in graphs. Existing continual graph learning (CGL) evaluation frameworks have predominantly focussed on single-label scenarios in the node classification (NC) task. This focus has overlooked the complexities of multi-label scenarios, where nodes may exhibit affiliations with multiple labels, simultaneously participating in multiple tasks. We develop a graph-aware evaluation (AGALE) framework that accommodates both single-labeled and multi-labeled nodes, addressing the limitations of previous evaluation frameworks. 
In particular, we define new incremental settings and devise data partitioning algorithms tailored to CGL datasets. We perform extensive experiments comparing methods from the domains of continual learning, continual graph learning, and dynamic graph learning (DGL). We theoretically analyze AGALE and provide new insights about the role of homophily in the performance of compared methods. We release our framework at https://github.com/Tianqi-py/AGALE.	翻訳日:2024-06-06 01:18:57 公開日:2024-06-03
# ${\cal PT}$対称非調和振動子に対する厳密な量子化条件と完全なトランス級数構造 Exact quantization conditions and full transseries structures for ${\cal PT}$ symmetric anharmonic oscillators ( http://arxiv.org/abs/2406.01230v1 ) ライセンス: Link先を確認	Syo Kamata,	(参考訳) ポテンシャル $V_{\cal PT}(x) = \omega^2 x^2 + g x^{2 K} (i x)^{\varepsilon}$ ($\omega \in {\mathbb R}_{\ge 0}$, $g \in {\mathbb R}_{>0}$, $K, \varepsilon \in {\mathbb N}$)で定義される${\cal PT}$対称量子力学(QM)に対して厳密WKB解析(EWKB)を行い、その摂動・非摂動構造を明らかにする。分析では、主に質量のない場合、すなわち$\omega = 0$を検討し、すべての摂動・非摂動補正を含む任意の$(K,\varepsilon)$に対する厳密な量子化条件(QC)を導出する。厳密なQCから、エネルギースペクトルの逆エネルギー準位展開に関する完全なトランス級数構造を明らかにし、その後、グッツウィラートレース公式、スペクトル和形式、ユークリッド経路積分を定式化する。質量のある場合、すなわち$\omega > 0$ に対して、厳密な QC の解の存在を要求することによって、EWKB における解析接続の経路は与えられた$N = 2K + \varepsilon$ に対して一意に決定され、したがって、厳密な QC 、エネルギースペクトル、および3つの公式はすべて摂動的であるという事実を示す。エルミートなQMとの類似性およびリサージェンスについても追加の注意として議論する。 We study exact Wentzel-Kramers-Brillouin analysis (EWKB) for a ${\cal PT}$ symmetric quantum mechanics (QM) defined by the potential that $V_{\cal PT}(x) = \omega^2 x^2 + g x^{2 K} (i x)^{\varepsilon}$ with $\omega \in {\mathbb R}_{\ge 0}$, $g \in {\mathbb R}_{>0}$ and $K, \varepsilon \in {\mathbb N}$ to clarify its perturbative/non-perturbative structure. In our analysis, we mainly consider the massless cases, i.e., $\omega = 0$, and derive the exact quantization conditions (QCs) for arbitrary $(K,\varepsilon)$ including all perturbative/non-perturbative corrections. From the exact QCs, we clarify full transseries structure of the energy spectra with respect to the inverse energy level expansion, and then formulate the Gutzwiller trace formula, the spectral summation form, and the Euclidean path-integral. For the massive cases, i.e., $\omega > 0$, we show the fact that, by requiring existence of solution of the exact QCs, the path of analytic continuation in EWKB is uniquely determined for a given $N = 2K + \varepsilon$, and in consequence the exact QCs, the energy spectra, and the three formulas are all perturbative. Similarities to Hermitian QMs and resurgence are also discussed as additional remarks.	翻訳日:2024-06-06 01:18:57 公開日:2024-06-03
# 語彙的商品検索を改善する多語組込み Multi-word Term Embeddings Improve Lexical Product Retrieval ( http://arxiv.org/abs/2406.01233v1 ) ライセンス: Link先を確認	Viktor Shcherbakov, Fedor Krasnov,	(参考訳) 製品検索は、文書、インターネットリソース、または求人の検索とは異なるため、専門的な検索システムを開発する必要がある。本研究は,eコマースプラットフォームにおける製品記述のオフライン用語インデックス化を目的としたH1埋め込みモデルについて述べる。このモデルは、製品検索のための語彙的手法と意味的埋め込みに基づく手法の利点を取り入れたハイブリッド製品検索システムのフレームワーク内で、他の最先端(SoTA)埋め込みモデルと比較される。検索インデックスのための意味的にリッチな用語語彙を構築するためのアプローチを提案する。他のプロダクションセマンティックモデルと比較すると、提案手法と組み合わせたH1は、複数の単語からなる製品用語を1つのトークンとして処理できる点で際立っている。例えば、検索クエリの"new balance shoes"や"gloria jeans kids wear"では、ブランドエンティティは"new balance"、"gloria jeans"という1つのトークンとして表現される。これにより、リコールに影響を与えることなくシステムの精度が向上する。提案したモデルを用いたハイブリッドサーチシステムは、mAP@12 = 56.1%、R@1k = 86.6%をWANDSの公開データセットでスコアし、他のSoTAアナログを上回ります。 Product search is uniquely different from search for documents, Internet resources or vacancies, therefore it requires the development of specialized search systems. The present work describes the H1 embedding model, designed for an offline term indexing of product descriptions at e-commerce platforms. The model is compared to other state-of-the-art (SoTA) embedding models within a framework of hybrid product search system that incorporates the advantages of lexical methods for product retrieval and semantic embedding-based methods. We propose an approach to building semantically rich term vocabularies for search indexes. Compared to other production semantic models, H1 paired with the proposed approach stands out due to its ability to process multi-word product terms as one token. For example, for the search queries "new balance shoes" and "gloria jeans kids wear", the brand entity is represented as a single token: "new balance" and "gloria jeans", respectively. This results in an increased precision of the system without affecting the recall. The hybrid search system with proposed model scores mAP@12 = 56.1% and R@1k = 86.6% on the WANDS public dataset, beating other SoTA analogues.	翻訳日:2024-06-06 01:18:57 publication:2024-06-03
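To make the "multi-word term as one token" idea above concrete, here is a small Python sketch that greedily merges known multi-word terms in a query into single tokens. It illustrates the tokenization idea only and is not the H1 model or the authors' vocabulary-building method; the function name, the `max_len` parameter, and the example vocabulary are hypothetical.

```python
def merge_multiword_terms(query, vocabulary, max_len=3):
    """Split a query into tokens, keeping known multi-word terms together.

    Uses greedy longest-match from left to right; single words always match.
    """
    words = query.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then fall back to shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += n
                break
    return tokens
```

With a vocabulary containing "new balance" and "gloria jeans", the two example queries from the abstract keep each brand name as a single token while the remaining words stay separate.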
# 平均リワードMDPにおけるトラクタブルミニマックス最適レグレットの実現 Achieving Tractable Minimax Optimal Regret in Average Reward MDPs ( http://arxiv.org/abs/2406.01234v1 ) ライセンス: Link先を確認	Victor Boone, Zihan Zhang,	(参考訳) 近年, 平均報酬マルコフ決定過程(MDP)の学習に注目が集まっている。しかし、既存のアルゴリズムは、最適でない後悔の保証や計算の非効率に悩まされている。本稿では、ミニマックス最適な後悔 $\widetilde{\mathrm{O}}(\sqrt{\mathrm{sp}(h^*) S A T})$ を達成する最初の計算可能なアルゴリズムを提示する。ここで、$\mathrm{sp}(h^*)$は最適バイアス関数$h^*$のスパン、$S \times A$は状態-作用空間のサイズ、$T$は学習ステップの回数である。注目すべきは、我々のアルゴリズムは$\mathrm{sp}(h^*)$に関する事前情報を必要としないことである。我々のアルゴリズムは、バイアス制約された最適ポリシーを効率的に計算するために、新しいサブルーチンであるPMEVI(Projected Mitigated Extended Value Iteration)に依存している。このサブルーチンは、様々な過去のアルゴリズムに適用して、後悔の限界を改善することができる。 In recent years, significant attention has been directed towards learning average-reward Markov Decision Processes (MDPs). However, existing algorithms either suffer from sub-optimal regret guarantees or computational inefficiencies. In this paper, we present the first tractable algorithm with minimax optimal regret of $\widetilde{\mathrm{O}}(\sqrt{\mathrm{sp}(h^*) S A T})$, where $\mathrm{sp}(h^*)$ is the span of the optimal bias function $h^*$, $S \times A$ is the size of the state-action space and $T$ the number of learning steps. Remarkably, our algorithm does not require prior information on $\mathrm{sp}(h^*)$. Our algorithm relies on a novel subroutine, Projected Mitigated Extended Value Iteration (PMEVI), to compute bias-constrained optimal policies efficiently. This subroutine can be applied to various previous algorithms to improve regret bounds.	翻訳日:2024-06-06 01:18:57 公開日:2024-06-03
# EffiQA:知識グラフに基づく戦略的多モデルコラボレーションによる効率的な質問応答 EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs ( http://arxiv.org/abs/2406.01238v1 ) ライセンス: Link先を確認	Zixuan Dong, Baoyun Peng, Yufei Wang, Jia Fu, Xiaodong Wang, Yongxue Shan, Xin Zhou,	(参考訳) 大規模言語モデル(LLM)は自然言語処理において顕著な能力を示してきたが、知識グラフ(KG)を含む複雑な多段階推論タスクに苦慮している。 LLMとKGを統合する既存のアプローチは、LLMの推論能力の不足や、密結合による計算コストの制限に悩まされている。これらの制約に対処するため、反復的パラダイムを通じて性能と効率のバランスをとることができる、EffiQAという新しい協調フレームワークを提案する。 EffiQAは、グローバルプランニング、効率的なKG探査、自己回帰という3つの段階から構成される。特に、EffiQAはLLMのコモンセンス能力を活用し、グローバルプランニングを通じて潜在的推論経路を探索する。そして、効率的なKG探索のために、セマンティックプルーニングを小さなプラグインモデルにオフロードする。最後に, 探査結果を自己回帰のためにLLMに供給し, グローバルプランニングと効率的なKG探査をさらに改善する。複数のKBQAベンチマークに関する実証的な証拠は、EffiQAの有効性を示し、推論精度と計算コストの最適バランスを達成している。我々は、LLMとKGの統合を再定義し、知識に基づく質問応答に関する今後の研究を促進することにより、より効率的で知識集約的なクエリの道を開くことを期待する。 While large language models (LLMs) have shown remarkable capabilities in natural language processing, they struggle with complex, multi-step reasoning tasks involving knowledge graphs (KGs). Existing approaches that integrate LLMs and KGs either underutilize the reasoning abilities of LLMs or suffer from prohibitive computational costs due to tight coupling. To address these limitations, we propose a novel collaborative framework named EffiQA that can strike a balance between performance and efficiency via an iterative paradigm. EffiQA consists of three stages: global planning, efficient KG exploration, and self-reflection. Specifically, EffiQA leverages the commonsense capability of LLMs to explore potential reasoning pathways through global planning. Then, it offloads semantic pruning to a small plug-in model for efficient KG exploration. Finally, the exploration results are fed to LLMs for self-reflection to further improve the global planning and efficient KG exploration. Empirical evidence on multiple KBQA benchmarks shows EffiQA's effectiveness, achieving an optimal balance between reasoning accuracy and computational costs. 
We hope the proposed new framework will pave the way for efficient, knowledge-intensive querying by redefining the integration of LLMs and KGs, fostering future research on knowledge-based question answering.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# 非線形スペクトルフィルタを用いたグラフ上の等変機械学習 Equivariant Machine Learning on Graphs with Nonlinear Spectral Filters ( http://arxiv.org/abs/2406.01249v1 ) ライセンス: Link先を確認	Ya-Wei Eileen Lin, Ronen Talmon, Ron Levie,	(参考訳) 等変機械学習は、モデルの複雑さを減らし、一般化を改善することを目的として、問題の対称性を尊重するディープラーニングモデルを設計するためのアプローチである。本稿では,画像上の畳み込みネットワークの基盤であるシフト均衡の一般グラフへの拡張に焦点をあてる。画像とは異なり、グラフはドメイン翻訳という自然な概念を持っていない。したがって、グラフ汎函数シフトを対称性群、すなわちグラフシフト作用素と可換なユニタリ作用素と考える。特に、このような対称性は信号空間で直接空間でではなく、信号空間で機能する。標準スペクトルグラフニューラルネットワーク(GNN)の各線形フィルタ層はグラフ関数シフトと可換であるが、活性化関数はこの対称性を破る。代わりに、グラフ汎関数シフトに完全同値な非線形スペクトルフィルタ(NLSF)を提案し、それらが普遍近似特性を持つことを示す。提案したNLSFは、グラフ間で転送可能な新しいスペクトル領域に基づいている。ノードおよびグラフ分類ベンチマークにおいて、既存のスペクトルGNNよりもNLSFの方が優れた性能を示す。 Equivariant machine learning is an approach for designing deep learning models that respect the symmetries of the problem, with the aim of reducing model complexity and improving generalization. In this paper, we focus on an extension of shift equivariance, which is the basis of convolution networks on images, to general graphs. Unlike images, graphs do not have a natural notion of domain translation. Therefore, we consider the graph functional shifts as the symmetry group: the unitary operators that commute with the graph shift operator. Notably, such symmetries operate in the signal space rather than directly in the spatial space. We remark that each linear filter layer of a standard spectral graph neural network (GNN) commutes with graph functional shifts, but the activation function breaks this symmetry. Instead, we propose nonlinear spectral filters (NLSFs) that are fully equivariant to graph functional shifts and show that they have universal approximation properties. The proposed NLSFs are based on a new form of spectral domain that is transferable between graphs. We demonstrate the superior performance of NLSFs over existing spectral GNNs in node and graph classification benchmarks.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# DumpKV:LSM木におけるキーバリュー分離のための学習型生涯意識型ガベージコレクション DumpKV: Learning based lifetime aware garbage collection for key value separation in LSM-tree ( http://arxiv.org/abs/2406.01250v1 ) ライセンス: Link先を確認	Zhutao Zhuang, Xinqi Zeng, Zhiguang Chen,	(参考訳) キーバリュー分離は、LSM-treeにおいて大きな値を別々のログファイルに格納して書き込み増幅を減らすために使用されるが、無効になった値を回収するガベージコレクションが必要となる。 LSM-treeの既存のガベージコレクション技術は、通常、静的パラメータに基づいて古い値を回収するが、低い書き込み増幅の達成は難しく、ガベージコレクションを起動する適切なパラメータを見つけることも困難である。 DumpKVは、動的なライフタイム調整を伴う、学習に基づくライフタイムを考慮したガベージコレクションを導入し、効率的なガベージコレクションと低い書き込み増幅を実現する。 DumpKVは、キーの過去の書き込みアクセス情報に基づく、さまざまなアプリケーションに適した特徴量を用いて訓練された軽量モデルによって大きな値を管理し、各キーの寿命を予測して効率的なガベージコレクションを可能にする。書き込みスループットへの干渉を低減するため、DumpKVは、KV分離下ではLSM-treeが小さいという事実を利用して、L0-L1コンパクション時に特徴収集を行う。実験結果から、DumpKVは、キーバリュー分離とガベージコレクションを用いる既存のLSM-treeストアと比較して、小さな特徴量ストレージオーバーヘッドで書き込み増幅を38%-73%低減することがわかった。 Key-value separation is used in LSM-trees to store large values in separate log files and reduce write amplification, but it requires garbage collection to reclaim invalid values. Existing garbage collection techniques for LSM-trees typically rely on static parameters to collect obsolete values; they struggle to achieve low write amplification, and finding a proper parameter for triggering garbage collection is challenging. In this work we introduce DumpKV, which uses learning-based, lifetime-aware garbage collection with dynamic lifetime adjustment to collect garbage efficiently and achieve lower write amplification. DumpKV manages large values using a trained lightweight model whose features, based on the past write access information of keys, suit various applications; the model predicts the lifetime of each individual key to enable efficient garbage collection. To reduce interference with write throughput, DumpKV conducts feature collection during L0-L1 compaction, leveraging the fact that the LSM-tree is small under KV separation. 
Experimental results show that DumpKV reduces write amplification by 38%-73% compared to existing LSM-tree stores that use key-value separation with garbage collection, with small feature storage overhead.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# LLMのスケーラブルな自動アライメントに向けた調査 Towards Scalable Automated Alignment of LLMs: A Survey ( http://arxiv.org/abs/2406.01252v1 ) ライセンス: Link先を確認	Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu,	(参考訳) アライメントは、人間のニーズを満たす大規模言語モデル(LLM)を構築する上で最も重要なステップである。 LLMの急速な開発が徐々に人間の能力を超えていく中、人間のアノテーションに基づく従来のアライメント手法は、スケーラビリティの要求を満たすことができなくなっている。そのため、自動アライメント信号と技術的アプローチの新たな源を探究する必要がある。本稿では,最近の自動化アライメントの手法を体系的に検討し,LLMの能力が人間の能力を超えれば,効果的でスケーラブルで自動化アライメントを実現する方法について検討する。具体的には、既存の自動アライメント手法をアライメント信号の源泉に基づく4つの主要なカテゴリに分類し、各カテゴリの現状と潜在的な発展について論じる。さらに、自動アライメントを可能にするメカニズムについて検討し、アライメントの基本的役割から自動化アライメント技術を実現可能かつ効果的にするための重要な要因について議論する。 Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human-annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approaches. In this paper, we systematically review the recently emerging methods of automated alignment, attempting to explore how to achieve effective, scalable, automated alignment once the capabilities of LLMs exceed those of humans. Specifically, we categorize existing automated alignment methods into 4 major categories based on the sources of alignment signals and discuss the current status and potential development of each category. Additionally, we explore the underlying mechanisms that enable automated alignment and discuss the essential factors that make automated alignment technologies feasible and effective from the fundamental role of alignment.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# animal2vecとMeerKAT: 希少な生オーディオ入力のための自己教師型トランスフォーマーとバイオ音響学のための大規模参照データセット animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics ( http://arxiv.org/abs/2406.01253v1 ) ライセンス: Link先を確認	Julian C. Schäfer-Zimmermann, Vlad Demartsev, Baptiste Averly, Kiran Dhanjal-Adams, Mathieu Duteil, Gabriella Gall, Marius Faiß, Lily Johnson-Ulrich, Dan Stowell, Marta B. Manser, Marie A. Roch, Ariana Strandburg-Peshkin,	(参考訳) 生物音響学的研究は、動物の行動、生態、保存に関する貴重な洞察を提供する。ほとんどのバイオ音響データセットは、声化のような興味のある出来事が極めて稀な長い記録で構成されている。これらのデータセットを分析することは、研究者にとって重要な課題であり、ディープラーニング技術が標準的手法として登場した。彼らの適応は依然として困難であり、コンピュータビジョンのために考案されたモデルに焦点を合わせ、そこではオーディオ波形を訓練と推論のための分光表現にエンジニアリングする。本稿では,生物音響学における深層学習の現状を2つの方法で改善する。まず,スパースおよびアンバランスな生体音響データに適した,完全に解釈可能なトランスフォーマーモデルと自己教師型トレーニングスキームであるAnimal2vecフレームワークを提示する。第二に、MeerKAT: Meerkat Kalahari Audio Transcriptsは、1068h以上の長さのメエルカット上に展開されたバイオログによって収集されたオーディオを含む大規模データセットである。さらに NIPS4Bplus Birdong データセットに対して animal2vec をベンチマークした。両データセットの最新の結果について報告し,ラベル付きトレーニングデータのAnimal2vecの少数ショット機能の評価を行った。最後に,人間の生成音に対するバニラ変圧器ベースラインとアーキテクチャの違いを明らかにするためのアブレーション研究を行った。 animal2vecは大量のバイオ音響データを分類できるさらに、MeerKATデータセットは、プリトレイン/ファイントゥンパラダイムでバイオ音響モデルのベンチマークを行うための最初の大規模ミリ秒分解能コーパスである。これはバイオ音響学の新しい基準点の舞台となると信じている。 Bioacoustic research provides invaluable insights into the behavior, ecology, and conservation of animals. Most bioacoustic datasets consist of long recordings where events of interest, such as vocalizations, are exceedingly rare. Analyzing these datasets poses a monumental challenge to researchers, where deep learning techniques have emerged as a standard method. Their adaptation remains challenging, focusing on models conceived for computer vision, where the audio waveforms are engineered into spectrographic representations for training and inference. 
We improve the current state of deep learning in bioacoustics in two ways: First, we present the animal2vec framework: a fully interpretable transformer model and self-supervised training scheme tailored for sparse and unbalanced bioacoustic data. Second, we openly publish MeerKAT: Meerkat Kalahari Audio Transcripts, a large-scale dataset containing audio collected via biologgers deployed on free-ranging meerkats with a length of over 1068h, of which 184h have twelve time-resolved vocalization-type classes, each with ms-resolution, making it the largest publicly-available labeled dataset on terrestrial mammals. Further, we benchmark animal2vec against the NIPS4Bplus birdsong dataset. We report new state-of-the-art results on both datasets and evaluate the few-shot capabilities of animal2vec of labeled training data. Finally, we perform ablation studies to highlight the differences between our architecture and a vanilla transformer baseline for human-produced sounds. animal2vec allows researchers to classify massive amounts of sparse bioacoustic data even with little ground truth information available. In addition, the MeerKAT dataset is the first large-scale, millisecond-resolution corpus for benchmarking bioacoustic models in the pretrain/finetune paradigm. We believe this sets the stage for a new reference point for bioacoustics.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# 層正規化の非線形性について On the Nonlinearity of Layer Normalization ( http://arxiv.org/abs/2406.01255v1 ) ライセンス: Link先を確認	Yunhao Ni, Yuxin Guo, Junlong Jia, Lei Huang,	(参考訳) 層正規化 (Layer normalization, LN) はディープラーニングにおけるユビキタスな手法であるが, 我々の理論的理解はいまだ解明されていない。本稿では,LNの非線形性と表現能力に関する新たな理論的方向性について検討する。本稿では,LN-Netと呼ばれる線形およびLN変換を階層的に構成したネットワークの表現能力について検討する。理論的には、ラベル割り当てのある$m$サンプルが与えられた場合、各層に3つのニューロンしか持たないLN-Netと$O(m)$LN層がそれらを正しく分類できることが示される。さらに、LN-NetのVC次元の低い境界を示す。 LNの非線形性は群分割によって増幅することができ、これは理論上は軽微な仮定で示され、実験によって実証的に支持される。本研究は,LNの非線形性を利用してニューラルアーキテクチャを設計し,その有効性を実証することを目的としている。 Layer normalization (LN) is a ubiquitous technique in deep learning but our theoretical understanding to it remains elusive. This paper investigates a new theoretical direction for LN, regarding to its nonlinearity and representation capacity. We investigate the representation capacity of a network with layerwise composition of linear and LN transformations, referred to as LN-Net. We theoretically show that, given $m$ samples with any label assignment, an LN-Net with only 3 neurons in each layer and $O(m)$ LN layers can correctly classify them. We further show the lower bound of the VC dimension of an LN-Net. The nonlinearity of LN can be amplified by group partition, which is also theoretically demonstrated with mild assumption and empirically supported by our experiments. Based on our analyses, we consider to design neural architecture by exploiting and amplifying the nonlinearity of LN, and the effectiveness is supported by our experiments.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# リモートオブジェクトグラウンドニングのための強化コモンセンス知識 Augmented Commonsense Knowledge for Remote Object Grounding ( http://arxiv.org/abs/2406.01256v1 ) ライセンス: Link先を確認	Bahram Mohammadi, Yicong Hong, Yuankai Qi, Qi Wu, Shirui Pan, Javen Qinfeng Shi,	(参考訳) ヴィジュアル・アンド・ランゲージ・ナビゲーション(VLN)タスクは、エージェントが周囲を知覚し、自然言語の指示に従い、写真に写らない環境で行動するために必要となる。既存のメソッドのほとんどは、ナビゲート可能な視点を表すために、画像またはオブジェクトのすべての特徴を使用している。しかし、これらの表現は適切な行動予測には不十分であり、特に「主寝室に青いクッションをくれ」といった簡潔な指示を使うREVERIEタスクでは不十分である。エージェントナビゲーションを改善するための時空間知識グラフとして,コモンセンス情報を活用するための拡張コモンセンス知識モデル(ACK)を提案する。具体的には,ConceptNetからコモンセンス情報を検索して知識ベースを構築するとともに,ノイズや無関係な知識を除去するための改良モジュールを構築する。さらに、視覚的表現と視覚的テキストデータアライメントを強化するための知識グラフ対応クロスモーダルおよび概念集約モジュールからなるACKについて、オブジェクトと知識の時間情報を含む可視的オブジェクト、常識的知識、概念史を統合する。さらに,コモンセンスに基づく意思決定プロセスに新たなパイプラインを追加し,より正確な局所行動予測を実現する。実験結果は,提案モデルがベースラインを著しく上回り,REVERIEベンチマークで最先端のデータをアーカイブすることを示す。 The vision-and-language navigation (VLN) task necessitates an agent to perceive the surroundings, follow natural language instructions, and act in photo-realistic unseen environments. Most of the existing methods employ the entire image or object features to represent navigable viewpoints. However, these representations are insufficient for proper action prediction, especially for the REVERIE task, which uses concise high-level instructions, such as ''Bring me the blue cushion in the master bedroom''. To address enhancing representation, we propose an augmented commonsense knowledge model (ACK) to leverage commonsense information as a spatio-temporal knowledge graph for improving agent navigation. Specifically, the proposed approach involves constructing a knowledge base by retrieving commonsense information from ConceptNet, followed by a refinement module to remove noisy and irrelevant knowledge. 
We further present ACK, which consists of knowledge graph-aware cross-modal and concept aggregation modules to enhance visual representation and visual-textual data alignment by integrating visible objects, commonsense knowledge, and concept history, which includes object and knowledge temporal information. Moreover, we add a new pipeline for the commonsense-based decision-making process, which leads to more accurate local action prediction. Experimental results demonstrate that our proposed model noticeably outperforms the baseline and achieves state-of-the-art performance on the REVERIE benchmark.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# アンラーニングの難しさとそれについて何をすべきか What makes unlearning hard and what to do about it ( http://arxiv.org/abs/2406.01257v1 ) ライセンス: Link先を確認	Kairan Zhao, Meghdad Kurmanji, George-Octavian Bărbulescu, Eleni Triantafillou, Peter Triantafillou,	(参考訳) 機械アンラーニング(machine unlearning)は、モデルの有用性を損なうことなく、学習済みモデルから訓練データの一部(「forget set」)の影響を取り除く問題であり、例えば、ユーザのデータ削除要求に応じたり、誤ラベル付きデータ、汚染(poisoned)データ、その他の問題のあるデータを除去したりするために用いられる。アンラーニングの研究はまだ初期段階にあり、多くの基本的な未解決の疑問が存在する。問題の難しさに大きく影響する、forget setの解釈可能な特性は存在するか? これらの特性は、異なる最先端アルゴリズムにどのように影響するか? 本稿では、これらの疑問に答えることを目的とした最初の調査を示す。アンラーニングの難易度とアンラーニングアルゴリズムの性能に影響を及ぼす2つの主要な要因を特定する。これらの要因を分離したforget setでの評価により、ランダムなforget setでは現れない、最先端アルゴリズムのこれまで知られていなかった挙動が明らかになる。この洞察に基づき、Refined-Unlearning Meta-algorithm(RUM)と名付けたフレームワークを開発した。RUMは、(i) 異なる特性に従ってforget setを均質な部分集合に精緻化すること、(ii) 既存のアルゴリズムを用いて各部分集合をアンラーニングし、最終的にforget set全体をアンラーニングしたモデルを得るメタアルゴリズム、からなる。RUMは、最高性能のアンラーニングアルゴリズムを大幅に改善する。全体として、我々の研究は、(i) アンラーニングの科学的理解を深め、(ii) 最先端を改善する新たな道筋を明らかにする上で、重要な一歩であると考える。 Machine unlearning is the problem of removing the effect of a subset of training data (the "forget set") from a trained model without damaging the model's utility, e.g., to comply with users' requests to delete their data, or to remove mislabeled, poisoned or otherwise problematic data. With unlearning research still in its infancy, many fundamental open questions exist: Are there interpretable characteristics of forget sets that substantially affect the difficulty of the problem? How do these characteristics affect different state-of-the-art algorithms? With this paper, we present the first investigation aiming to answer these questions. We identify two key factors affecting unlearning difficulty and the performance of unlearning algorithms. Evaluation on forget sets that isolate these identified factors reveals previously-unknown behaviours of state-of-the-art algorithms that don't materialize on random forget sets. 
Based on our insights, we develop a framework coined Refined-Unlearning Meta-algorithm (RUM) that encompasses: (i) refining the forget set into homogenized subsets, according to different characteristics; and (ii) a meta-algorithm that employs existing algorithms to unlearn each subset and finally delivers a model that has unlearned the overall forget set. We find that RUM substantially improves top-performing unlearning algorithms. Overall, we view our work as an important step in (i) deepening our scientific understanding of unlearning and (ii) revealing new pathways to improving the state-of-the-art.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# SCALLER: 標準セル集合と局所レイアウト効果に基づくリングオシレータ SCALLER: Standard Cell Assembled and Local Layout Effect-based Ring Oscillators ( http://arxiv.org/abs/2406.01258v1 ) ライセンス: Link先を確認	Muayad J. Aljafar, Zain Ul Abideen, Adriaan Peetermans, Benedikt Gierlichs, Samuel Pagliarini,	(参考訳) 本稿では,リングオシレータ(RO)の周波数の非常に細かな調整を可能にする手法を提案する。可変素子の数が異なる複数のROを65 nmのCMOS技術で設計・製造した。可変素子は、異なるローカルレイアウト効果(LLE)の下にある2つのインバータと1つのマルチプレクサから構成される。 LLEはインバータの過渡応答に決定論的に影響を与えるため、大きなプロセス変動があっても微調整可能な機構を確立できる。 RO全体はデジタルであり、レイアウトは標準セル互換である。 80-900 MHzの範囲の発振周波数と90 kHzのチューニングステップについて、ポストシリコン測定により多段ROのチューニング性を示す。 This letter presents a technique that enables very fine tunability of the frequency of Ring Oscillators (ROs). Multiple ROs with different numbers of tunable elements were designed and fabricated in a 65 nm CMOS technology. A tunable element consists of two inverters under different local layout effects (LLEs) and a multiplexer. LLEs impact the transient response of inverters deterministically and allow a finely tunable mechanism to be established even in the presence of large process variation. The entire RO is digital and its layout is standard-cell compatible. We demonstrate the tunability of multi-stage ROs with post-silicon measurements of oscillation frequencies in the range of 80-900 MHz and tuning steps of 90 kHz.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# FreeTumor: 大規模腫瘍合成による腫瘍セグメンテーションの前進 FreeTumor: Advance Tumor Segmentation via Large-Scale Tumor Synthesis ( http://arxiv.org/abs/2406.01264v1 ) ライセンス: Link先を確認	Linshan Wu, Jiaxin Zhuang, Xuefeng Ni, Hao Chen,	(参考訳) AIによる腫瘍分析は、医療分野で注目を集めている。しかし、その進歩は、放射線科医が収集とアノテーションに多くの労力を費やす必要がある注釈付き腫瘍症例の不足によって著しく妨げられている。本稿では、堅牢な腫瘍合成とセグメンテーションのための極めて実用的なソリューションであるFreeTumorを紹介する。この名称は、アノテーション不要の合成腫瘍と、腫瘍に苦しむ患者を解放したいという願いを指している。高度な技術的合成モジュールを追求する代わりに、我々は大規模データのパワーを解き放つために、単純で効果的な腫瘍合成パラダイムを設計することを目指している。具体的には、FreeTumorは既存の手法を主に3つの側面から前進させる。(1) 既存の手法は、小規模なラベル付きデータのみを合成訓練に活用しており、異なるソースからの未知のデータに対する汎化能力が制限されている。そこで本研究では、大規模かつ多様なラベルなしデータを合成訓練に活用する敵対的学習戦略を導入し、腫瘍合成を著しく改善する。(2) 既存の手法は、セグメンテーション訓練における低品質な合成腫瘍の負の影響をほとんど無視していた。そこで我々は、敵対的判別器を用いて低品質な合成腫瘍を自動的に除去し、その悪影響を効果的に軽減する。(3) 既存の手法では、腫瘍セグメンテーションに数百の症例しか使われていなかった。 FreeTumorでは、データセットを1万1千件までスケールアップすることで、腫瘍セグメンテーションにおけるデータスケーリング則を検討する。大規模な実験により、FreeTumorの優位性が示された。例えば、3つの腫瘍セグメンテーションベンチマークにおいて、実際の腫瘍のみを使用するベースラインに対して平均 $+8.9\%$、最先端の腫瘍合成法に対して $+6.6\%$ のDSC向上が得られた。コードは公開予定である。 AI-driven tumor analysis has garnered increasing attention in healthcare. However, its progress is significantly hindered by the lack of annotated tumor cases, which requires radiologists to invest a lot of effort in collection and annotation. In this paper, we introduce a highly practical solution for robust tumor synthesis and segmentation, termed FreeTumor, which refers to annotation-free synthetic tumors and our desire to free patients suffering from tumors. Instead of pursuing sophisticated technical synthesis modules, we aim to design a simple yet effective tumor synthesis paradigm to unleash the power of large-scale data. Specifically, FreeTumor advances existing methods mainly from three aspects: (1) Existing methods only leverage small-scale labeled data for synthesis training, which limits their ability to generalize well on unseen data from different sources. 
To this end, we introduce an adversarial training strategy to leverage large-scale and diversified unlabeled data in synthesis training, significantly improving tumor synthesis. (2) Existing methods largely ignored the negative impact of low-quality synthetic tumors in segmentation training. Thus, we employ an adversarial-based discriminator to automatically filter out the low-quality synthetic tumors, which effectively alleviates their negative impact. (3) Existing methods only used hundreds of cases in tumor segmentation. In FreeTumor, we investigate the data scaling law in tumor segmentation by scaling up the dataset to 11k cases. Extensive experiments demonstrate the superiority of FreeTumor, e.g., on three tumor segmentation benchmarks, an average $+8.9\%$ DSC over the baseline that uses only real tumors and $+6.6\%$ DSC over the state-of-the-art tumor synthesis method. Code will be available.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# グラッドCAM期待:勾配忠実化に向けて Expected Grad-CAM: Towards gradient faithfulness ( http://arxiv.org/abs/2406.01274v1 ) ライセンス: Link先を確認	Vincenzo Buono, Peyman Sheikholharam Mashhadi, Mahmoud Rahat, Prayag Tiwari, Stefan Byttner,	(参考訳) インプット・グラディエント・テクニックは勾配に関する課題を緩和し対処するために進化してきたが、現代の勾配重み付けCAMアプローチは、飽和現象に本質的に影響を受けやすいバニラ勾配に依存している。近年の強化は、緩和策として反ファクト的勾配戦略を取り入れているが、これらの局所的な説明手法は、その基準パラメータに対する感度の欠如をまだ示している。本研究は,勾配計算を再構成することで,飽和度と感度の両問題に対処する勾配重み付きCAM拡張法を提案する。元の定式化を摂動積分勾配の滑らかな期待として再考することにより、不完全性を最小化するより忠実で局所的で堅牢な説明を同時に構築することができる。摂動分布の微調整により、説明の複雑さ特性を制御し、安定な特徴を選択的に識別することができる。近年のGrad-CAMとは違って,本手法は,基礎的なGrad-CAMアルゴリズムの代替として設計された勾配計算を最適化する。本手法の有効性を評価するため, 定量的, 質的な評価を行った。 Although input-gradients techniques have evolved to mitigate and tackle the challenges associated with gradients, modern gradient-weighted CAM approaches still rely on vanilla gradients, which are inherently susceptible to the saturation phenomena. Despite recent enhancements have incorporated counterfactual gradient strategies as a mitigating measure, these local explanation techniques still exhibit a lack of sensitivity to their baseline parameter. Our work proposes a gradient-weighted CAM augmentation that tackles both the saturation and sensitivity problem by reshaping the gradient computation, incorporating two well-established and provably approaches: Expected Gradients and kernel smoothing. By revisiting the original formulation as the smoothed expectation of the perturbed integrated gradients, one can concurrently construct more faithful, localized and robust explanations which minimize infidelity. Through fine modulation of the perturbation distribution it is possible to regulate the complexity characteristic of the explanation, selectively discriminating stable features. Our technique, Expected Grad-CAM, differently from recent works, exclusively optimizes the gradient computation, purposefully designed as an enhanced substitute of the foundational Grad-CAM algorithm and any method built therefrom. 
Quantitative and qualitative evaluations have been conducted to assess the effectiveness of our method.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# 未知の因子を持つリフティング係数グラフ Lifting Factor Graphs with Some Unknown Factors ( http://arxiv.org/abs/2406.01275v1 ) ライセンス: Link先を確認	Malte Luttermann, Ralf Möller, Marcel Gehrke,	(参考訳) リフティングは確率的グラフィカルモデルにおいて、識別不能なオブジェクトの代用体を用いて対称性を利用しており、正確な答えを維持しながらより効率的にクエリ応答を実行することができる。本稿では,ポテンシャルが不明な因子を含む因子グラフに対して,昇降法によって確率的推論を行う方法について検討する。本稿では,未知の因子を含む因子グラフの対称部分グラフを同定するLIFAGU (Lifting Factor Graphs with Some Unknown Factors) アルゴリズムを提案する。 Lifting exploits symmetries in probabilistic graphical models by using a representative for indistinguishable objects, allowing to carry out query answering more efficiently while maintaining exact answers. In this paper, we investigate how lifting enables us to perform probabilistic inference for factor graphs containing factors whose potentials are unknown. We introduce the Lifting Factor Graphs with Some Unknown Factors (LIFAGU) algorithm to identify symmetric subgraphs in a factor graph containing unknown factors, thereby enabling the transfer of known potentials to unknown potentials to ensure a well-defined semantics and allow for (lifted) probabilistic inference.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# fruit-SALAD:画像埋め込みにおける類似性知覚を明らかにするスタイルアラインアートワークデータセット fruit-SALAD: A Style Aligned Artwork Dataset to reveal similarity perception in image embeddings ( http://arxiv.org/abs/2406.01278v1 ) ライセンス: Link先を確認	Tillmann Ohm, Andres Karjus, Mikhail Tamm, Maximilian Schich,	(参考訳) 視覚的類似性の概念は、コンピュータビジョン、および画像のベクトル埋め込みに関する応用と研究に不可欠である。しかしながら、ベンチマークデータセットの不足は、これらのモデルが類似性をどう認識するかを調査する上で、大きなハードルとなっている。ここではSALAD(Style Aligned Artwork Datasets)を紹介する。このセマンティックなカテゴリとスタイルのベンチマークは、10の区別容易なスタイルに対して、10の認識容易なフルーツカテゴリのそれぞれ100のインスタンスで構成されている。生成画像合成の体系的なパイプラインを活用することで、この視覚的に多様だがバランスの取れたベンチマークは、機械学習モデル、特徴抽出アルゴリズム、複雑性測定、参照の概念モデルなど、さまざまな計算モデルにおけるセマンティックなカテゴリとスタイルの類似性重みの顕著な相違を示す。この綿密に設計されたデータセットは、類似性知覚の比較分析のための制御されバランスの取れたプラットフォームを提供する。 SALADフレームワークは、これらのモデルがどのようにセマンティックなカテゴリとスタイル認識タスクを実行するかを比較して、逸話的知識のレベルを超え、堅牢な定量化と質的な解釈を可能にする。 The notion of visual similarity is essential for computer vision, and in applications and studies revolving around vector embeddings of images. However, the scarcity of benchmark datasets poses a significant hurdle in exploring how these models perceive similarity. Here we introduce Style Aligned Artwork Datasets (SALADs), and an example of fruit-SALAD with 10,000 images of fruit depictions. This combined semantic category and style benchmark comprises 100 instances each of 10 easy-to-recognize fruit categories, across 10 easy distinguishable styles. Leveraging a systematic pipeline of generative image synthesis, this visually diverse yet balanced benchmark demonstrates salient differences in semantic category and style similarity weights across various computational models, including machine learning models, feature extraction algorithms, and complexity measures, as well as conceptual models for reference. This meticulously designed dataset offers a controlled and balanced platform for the comparative analysis of similarity perception. 
The SALAD framework allows the comparison of how these models perform semantic category and style recognition task to go beyond the level of anecdotal knowledge, making it robustly quantifiable and qualitatively interpretable.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# 双曲型ニューラルPDEによる連続幾何学的グラフ拡散 Continuous Geometry-Aware Graph Diffusion via Hyperbolic Neural PDE ( http://arxiv.org/abs/2406.01282v1 ) ライセンス: Link先を確認	Jiaxu Liu, Xinping Yi, Sihao Wu, Xiangyu Yin, Tianle Zhang, Xiaowei Huang, Jin Shi,	(参考訳) Hyperbolic Graph Neural Network (HGNN)は最近、階層グラフデータを扱う強力なツールとして登場したが、スケーラビリティと効率性の限界により、より深いモデルへの一般化が妨げられている。本稿では、深さを連続時間の埋め込み発展とみなすことでHGNNを分離(decouple)し、情報伝播を偏微分方程式として再定式化して、ハイパーボリック・ニューラルPDE(HPDE)における拡散率の役割をノード単位のアテンションに担わせる。 HPDE積分のために、非ユークリッド多様体上の場と流れ、勾配、発散、拡散率などの理論的原理を導入し、数値HPDEソルバを定式化するための陰的および陽的な離散化スキームを議論する。さらに、積分することで表現力の高い双曲ノード埋め込みが得られる柔軟なベクトルフロー関数である、ハイパーボリックグラフ拡散方程式(HGDE)を提案する。埋め込みのポテンシャルエネルギー減衰を解析することにより、HGDEは局所-大域的な拡散率関数の利点により、低次および高次の近接性の両方をモデル化できることを示す。ノード分類、リンク予測、および画像テキスト分類タスクの実験により、提案手法の優位性が検証され、様々な競合モデルを一貫して大きく上回ることが示された。 While Hyperbolic Graph Neural Networks (HGNNs) have recently emerged as a powerful tool for dealing with hierarchical graph data, limitations in scalability and efficiency hinder them from generalizing to deep models. In this paper, by envisioning depth as a continuous-time embedding evolution, we decouple the HGNN and reframe the information propagation as a partial differential equation, letting node-wise attention undertake the role of diffusivity within the Hyperbolic Neural PDE (HPDE). By introducing theoretical principles, e.g., field and flow, gradient, divergence, and diffusivity on a non-Euclidean manifold for HPDE integration, we discuss both implicit and explicit discretization schemes to formulate numerical HPDE solvers. Further, we propose the Hyperbolic Graph Diffusion Equation (HGDE), a flexible vector flow function that can be integrated to obtain expressive hyperbolic node embeddings. By analyzing potential energy decay of embeddings, we demonstrate that HGDE is capable of modeling both low- and high-order proximity with the benefit of local-global diffusivity functions. 
Experiments on node classification and link prediction and image-text classification tasks verify the superiority of the proposed method, which consistently outperforms various competitive models by a significant margin.	翻訳日:2024-06-06 01:09:07 公開日:2024-06-03
# コアに焦点をあてる: 文書分類のためのPruned Token Compressionによる効率的な注意力 Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification ( http://arxiv.org/abs/2406.01283v1 ) ライセンス: Link先を確認	Jungmin Yun, Mihyeon Kim, Youngbin Kim,	(参考訳) トランスフォーマーベースのモデルは、多くのNLPタスクにおいて支配的な性能を達成している。その顕著な成功にもかかわらず、BERTのような事前訓練されたトランスフォーマーは、分類性能に好ましくないものを含む全てのトークンと相互作用する、計算コストの高い自己注意機構に悩まされている。これらの課題を克服するために、トークンプルーニングとトークン結合(token combining)という2つの戦略を統合することを提案する。トークンプルーニングは、アテンション機構のキーとバリューにおいて、レイヤを通過する際に重要度の低いトークンを除去する。さらに、不確実性に対処するためにファジィ論理を採用し、各トークンの重要度の不均衡な分布から生じる誤った刈り込み(mispruning)のリスクを軽減する。一方、トークン結合は、入力シーケンスをより小さなサイズに凝縮し、モデルをさらに圧縮する。これら2つのアプローチを統合することで、モデルの性能を向上させるだけでなく、計算要求も減らすことができる。様々なデータセットを用いた実験は、ベースラインモデルよりも優れた性能を示し、特に既存のBERTモデルに対する最大の改善として、精度で+5%p、F1スコアで+5.6%pを達成した。さらに、メモリコストを0.61倍に削減し、1.64倍のスピードアップを実現する。 Transformer-based models have achieved dominant performance in numerous NLP tasks. Despite their remarkable successes, pre-trained transformers such as BERT suffer from a computationally expensive self-attention mechanism that interacts with all tokens, including the ones unfavorable to classification performance. To overcome these challenges, we propose integrating two strategies: token pruning and token combining. Token pruning eliminates less important tokens in the attention mechanism's key and value as they pass through the layers. Additionally, we adopt fuzzy logic to handle uncertainty and alleviate potential mispruning risks arising from an imbalanced distribution of each token's importance. Token combining, on the other hand, condenses input sequences into smaller sizes in order to further compress the model. By integrating these two approaches, we not only improve the model's performance but also reduce its computational demands. Experiments with various datasets demonstrate superior performance compared to baseline models, especially with the best improvement over the existing BERT model, achieving +5%p in accuracy and +5.6%p in F1 score. 
Additionally, memory cost is reduced to 0.61x, and a speedup of 1.64x is achieved.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# Recommender システムとしての大規模言語モデル:大衆バイアスの研究 Large Language Models as Recommender Systems: A Study of Popularity Bias ( http://arxiv.org/abs/2406.01285v1 ) ライセンス: Link先を確認	Jan Malte Lichtenberg, Alexander Buchholz, Pola Schwöbel,	(参考訳) 人気アイテムが不均等に推奨され、人気度が低かったり、関連性の高いアイテムを誇張するという人気バイアスの問題は、レコメンデーターシステムにおいて大きな課題となっている。近年,汎用大規模言語モデル (LLM) のアーキテクチャへの統合が進んでいる。この統合は、LLMのトレーニングデータが人気のあるアイテムに支配されている可能性が高いことを考えると、人気バイアスを悪化させる可能性があるという懸念を提起する。しかし、即時チューニングによってバイアスに対処する新たな機会を同時に提示する。本研究は,LLMがレコメンデーションシステムにおける人気バイアスに寄与するか,緩和するかを,この二分法について検討する。既存のメトリクスについて議論し、一連のデシラタを満たす新しいメトリクスを提案することによって、人気バイアスを測定するための原則的手法を導入する。新しい基準に基づいて,映画推薦作業における従来のレコメンデータシステムと,シンプルなLLMベースのレコメンデータを比較した。 LLMレコメンデータは, 明示的な緩和を伴わずとも, 人気バイアスが低いことが判明した。 The issue of popularity bias -- where popular items are disproportionately recommended, overshadowing less popular but potentially relevant items -- remains a significant challenge in recommender systems. Recent advancements have seen the integration of general-purpose Large Language Models (LLMs) into the architecture of such systems. This integration raises concerns that it might exacerbate popularity bias, given that the LLM's training data is likely dominated by popular items. However, it simultaneously presents a novel opportunity to address the bias via prompt tuning. Our study explores this dichotomy, examining whether LLMs contribute to or can alleviate popularity bias in recommender systems. We introduce a principled way to measure popularity bias by discussing existing metrics and proposing a novel metric that fulfills a series of desiderata. Based on our new metric, we compare a simple LLM-based recommender to traditional recommender systems on a movie recommendation task. We find that the LLM recommender exhibits less popularity bias, even without any explicit mitigation.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# 改良されたFew-Shot Jailbreakは、言語モデルとその防御を回避できる Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses ( http://arxiv.org/abs/2406.01288v1 ) ライセンス: Link先を確認	Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin,	(参考訳) 最近、Anil et al. (2024) は、多数(最大数百)のデモが、その長いコンテキスト能力を利用して最先端のLLMをジェイルブレイクできることを示した。それでは、限られたコンテキストサイズの中で、少数のデモを使ってLLMを効率的にジェイルブレイクすることは可能だろうか? 素朴な少数ショットのジェイルブレイクは非効率かもしれないが、我々は[/INST]のような特別なシステムトークンを注入したり、収集したデモプールからデモレベルのランダム検索を行うなどの改良手法を提案する。これらの単純な技術は、(高度な防御を備えた場合でも)アラインされたLLMに対して驚くほど効果的なジェイルブレイクをもたらす。例えば、本手法は、サフィックスベースのジェイルブレイクでは困難な、パープレキシティ検出やSmoothLLMなどの強力な防御でモデルが強化されている場合でも、複数回のリスタートなしに、Llama-2-7BとLlama-3-8Bで80%超(大部分は95%超)のASRを達成する。さらに、他のアラインされたLLMや高度な防御に対して、包括的かつ綿密な(例えば、正しいシステムプロンプトを確実に使用する)評価を行い、本手法は一貫してほぼ100%のASRを達成する。コードは https://github.com/sail-sg/I-FSJ で利用可能である。 Recently, Anil et al. (2024) showed that many-shot (up to hundreds of) demonstrations can jailbreak state-of-the-art LLMs by exploiting their long-context capability. Nevertheless, is it possible to use few-shot demonstrations to efficiently jailbreak LLMs within limited context sizes? While the vanilla few-shot jailbreaking may be inefficient, we propose improved techniques such as injecting special system tokens like [/INST] and employing demo-level random search from a collected demo pool. These simple techniques result in surprisingly effective jailbreaking against aligned LLMs (even with advanced defenses). For example, our method achieves >80% (mostly >95%) ASRs on Llama-2-7B and Llama-3-8B without multiple restarts, even if the models are enhanced by strong defenses such as perplexity detection and/or SmoothLLM, which is challenging for suffix-based jailbreaking. In addition, we conduct comprehensive and elaborate (e.g., making sure to use correct system prompts) evaluations against other aligned LLMs and advanced defenses, where our method consistently achieves nearly 100% ASRs. Our code is available at https://github.com/sail-sg/I-FSJ.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# 資源制約フェアネス Resource-constrained Fairness ( http://arxiv.org/abs/2406.01290v1 ) ライセンス: Link先を確認	Sofie Goethals, Eoin Delaney, Brent Mittelstadt, Chris Russell,	(参考訳) リソースへのアクセスは、決定を強く制約します。学生全員に奨学金を提供したい、あるいは専門家とのフォローアップミーティングのために患者全員をスケジュールしたいと思うかもしれませんが、リソースは限られているため、これは不可能です。公正な機械学習のための既存のツールは、これらの重要な制約を無視しており、ほとんどのメソッドは、決定が下される有限のリソース制限を無視している。本研究は,「資源制約公正性」の概念を導入し,この枠組みにおける公正性のコストを定量化する。利用可能な資源のレベルがこのコストに大きく影響することを示し、これは過去の評価で見過ごされてきた要素である。 Access to resources strongly constrains the decisions we make. While we might wish to offer every student a scholarship, or schedule every patient for follow-up meetings with a specialist, limited resources mean that this is not possible. Existing tools for fair machine learning ignore these key constraints, with the majority of methods disregarding any finite resource limitations under which decisions are made. Our research introduces the concept of ``resource-constrained fairness" and quantifies the cost of fairness within this framework. We demonstrate that the level of available resources significantly influences this cost, a factor that has been overlooked in previous evaluations.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# 単光子検出による安定したキャリブレーションを有する時間-デジタル変換器 A time-to-digital converter with steady calibration through single-photon detection ( http://arxiv.org/abs/2406.01293v1 ) ライセンス: Link先を確認	Matías Rubén Bolaños Wagner, Daniele Vogrig, Paolo Villoresi, Giuseppe Vallone, Andrea Stanco,	(参考訳) タイム・トゥ・デジタル・コンバータ(TDC)は幅広い分野、特に量子通信において重要なツールであり、タイムタガーの性能がアプリケーション全体の品質に大きく影響しうる。近年,FPGA ベースの TDC は,デバイス固有の性質に起因する非線形挙動を適切に緩和すれば,ASIC ベースのものに代わる有効な選択肢となっている。この非線形性を補償するには、通常は補間法に基づく校正手順が必要である。ここでは、FPGAベースで27psの残留ジッタを示し、マルチチャネル動作に拡張可能なTDCの設計と実証について述べる。量子鍵配送(Quantum Key Distribution, QKD)への応用を、単一光子検出を利用した独自の校正法とともに論じる。この方法はデータ取得の停止も補間法の使用も必要としないため、精度が向上し、データ損失がなくなる。校正は実用に近い環境で試験し, 5°Cから80°Cの間の装置挙動を調べた。さらに,本設計はTDCがオーバーフローすることなく,最大12 Mevents/sを最大約1週間連続的にストリーミングできる。 Time-to-Digital Converters (TDCs) are a crucial tool in a wide array of fields, in particular for quantum communication, where time taggers performance can severely affect the quality of the entire application. Nowadays, FPGA-based TDCs present a viable alternative to ASIC ones, once the nonlinear behaviour due to the intrinsic nature of the device is properly mitigated. To compensate said nonlinearities, a calibration procedure is required, usually based on an interpolation method. Here we present the design and the demonstration of a TDC that is FPGA-based and showing a residual jitter of 27 ps, that is scalable for multichannel operation. The application in Quantum Key Distribution (QKD) is discussed with a unique calibration method based on the exploitation of single-photon detection that does not require to stop the data acquisition or to use any interpolation methods, thus increasing accuracy and removing data loss. The calibration was tested in a relevant environment, investigating the device behaviour between 5°C and 80°C. Moreover, our design is capable of continuously streaming up to 12 Mevents/s for up to ~1 week without the TDC overflowing.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# 水中画像再構成のためのカプセル強化変分オートエンコーダ Capsule Enhanced Variational AutoEncoder for Underwater Image Reconstruction ( http://arxiv.org/abs/2406.01294v1 ) ライセンス: Link先を確認	Rita Pucci, Niki Martinel,	(参考訳) 水中画像解析は海洋モニタリングに不可欠である。しかし、それは2つの大きな課題を提示する。 (i) 波長依存性の光減衰、散乱、水の種類により、画像の視覚的品質が劣化することが多い。 (ii) 高解像度画像の取得と保存はハードウェアによって制限されており、長期の環境分析を妨げている。近年,水中画像強調のためにディープニューラルネットワークが導入されているが,自律型水中画像取得システムの限界によって生じる課題は見過ごされている。本稿では,ベクトル量子化変分オートエンコーダ(VQ-VAE)の離散的特徴量子化アプローチから着想を得て,両問題に共同で取り組む新しいアーキテクチャを提案する。我々のモデルは、入力を潜在表現に圧縮する符号化ネットワークと、潜在表現のみを用いて画像の強調・再構成を行う2つの独立した復号ネットワークを組み合わせる。 1つのデコーダは空間情報に焦点を当て、もう1つのデコーダはカプセルの概念を利用して画像内のエンティティに関する情報を捉える。カプセル層の使用により、VQ-VAEの微分可能性の問題も克服し、特別な最適化トリックを必要とせずにエンドツーエンドで学習可能にする。カプセルは、完全に微分可能な方法で特徴量子化を行う。提案の有効性を評価するため、6つのベンチマークデータセットで徹底的な定量的・定性的評価を行った。その結果、既存手法より優れた性能を示し(例えば、難易度の高いLSUI Test-L400データセットで約$+1.4$ dBの向上)、データ保存に必要な容量も大幅に削減できた(すなわち、$3\times$ 効率的)。 Underwater image analysis is crucial for marine monitoring. However, it presents two major challenges: (i) the visual quality of the images is often degraded due to wavelength-dependent light attenuation, scattering, and water types; (ii) capturing and storing high-resolution images is limited by hardware, which hinders long-term environmental analyses. Recently, deep neural networks have been introduced for underwater enhancement yet neglecting the challenge posed by the limitations of autonomous underwater image acquisition systems. We introduce a novel architecture that jointly tackles both issues by drawing inspiration from the discrete features quantization approach of Vector Quantized Variational Autoencoder (VQ-VAE). Our model combines an encoding network, that compresses the input into a latent representation, with two independent decoding networks, that enhance/reconstruct images using only the latent representation. One decoder focuses on the spatial information while the other captures information about the entities in the image by leveraging the concept of capsules. With the usage of capsule layers, we also overcome the differentiability issues of VQ-VAE making our solution trainable in an end-to-end fashion without the need for particular optimization tricks. Capsules perform feature quantization in a fully differentiable manner. We conducted thorough quantitative and qualitative evaluations on 6 benchmark datasets to assess the effectiveness of our contributions. Results demonstrate that we perform better than existing methods (e.g., about $+1.4$ dB gain on the challenging LSUI Test-L400 dataset), while significantly reducing the amount of space needed for data storage (i.e., $3\times$ more efficient).	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# LLMの誤りはいつ修正できるか? LLMの自己補正の批判的調査 When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs ( http://arxiv.org/abs/2406.01297v1 ) ライセンス: Link先を確認	Ryo Kamoi, Yusen Zhang, Nan Zhang, Jiawei Han, Rui Zhang,	(参考訳) 自己補正(Self-correction)は、推論時にLLMを用いて応答を洗練することで、大規模言語モデル(LLM)の応答を改善するアプローチである。これまでの研究では,自己評価や外部からのフィードバックなど,さまざまなフィードバック源を用いた自己補正フレームワークが提案されてきた。しかし、最近の研究では否定的な結果も報告されており、LLMがいつ自身の誤りを修正できるのかについては、まだ合意が得られていない。本研究では,幅広い論文を批判的に調査し,自己補正を成功させるために必要な条件について議論する。まず、先行研究は、しばしば研究課題を詳細に定義しておらず、非現実的な枠組みや、自己補正を過大評価する不公平な評価を含んでいることがわかった。これらの課題に対処するため、自己補正研究における研究課題を分類し、適切な実験を設計するためのチェックリストを提供する。新たに分類した研究課題に基づく批判的調査は,(1) 一般的なタスクにおいて、プロンプトされたLLMからのフィードバックによる自己補正の成功を示した先行研究は存在しないこと,(2) 信頼性の高い外部フィードバックを利用できるタスクでは自己補正がうまく機能すること,(3) 大規模な微調整が自己補正を可能にすることを示している。 Self-correction is an approach to improving responses from large language models (LLMs) by refining the responses using LLMs during inference. Prior work has proposed various self-correction frameworks using different sources of feedback, including self-evaluation and external feedback. However, there is still no consensus on the question of when LLMs can correct their own mistakes, as recent studies also report negative results. In this work, we critically survey broad papers and discuss the conditions required for successful self-correction. We first find that prior studies often do not define their research questions in detail and involve impractical frameworks or unfair evaluations that over-evaluate self-correction. To tackle these issues, we categorize research questions in self-correction research and provide a checklist for designing appropriate experiments. Our critical survey based on the newly categorized research questions shows that (1) no prior work demonstrates successful self-correction with feedback from prompted LLMs in general tasks, (2) self-correction works well in tasks that can use reliable external feedback, and (3) large-scale fine-tuning enables self-correction.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# 表面ポラリトンとの量子幾何学的な光・物質結合によるほぼ平坦なバンドのフロケエンジニアリング Floquet engineering nearly flat bands through quantum-geometric light-matter coupling with surface polaritons ( http://arxiv.org/abs/2406.01298v1 ) ライセンス: Link先を確認	Mikołaj Walicki, Christian J. Eckhardt, Michael A. Sentef,	(参考訳) 非自明な量子幾何を備えたほぼ平坦なバンドをもつ最小モデルであるのこぎり歯鎖(sawtooth chain)におけるフロケエンジニアリングを、駆動された表面ポラリトンと結合した系で検討する。この典型的なフラットバンドモデルでは、バンド速度とバンド曲率が消失するにもかかわらず、量子幾何によってフラットバンドへの光・物質結合が可能になる。ポラリトン環境における光の偏光と有限の運動量移行は、バンドを平坦化または非平坦化するのに十分な調整性をもたらし、自由空間のレーザーパルスで達成できる範囲を超えて、ときに劇的なバンド構造の変化を引き起こすことを示す。典型的なフラットバンドのモアレ物質やカゴメ物質における光駆動現象への示唆について論じる。 We investigate Floquet engineering in a sawtooth chain -- a minimal model hosting a nearly flat band endowed with nontrivial quantum geometry -- coupled to driven surface polaritons. In this paradigmatic flat band model, light-matter coupling to a flat band is enabled by quantum geometry despite the vanishing band velocity and band curvature. We show that light polarization and finite momentum transfer in polaritonic settings provide sufficient tunability to flatten or unflatten bands, with sometimes drastic band structure modifications beyond what is attainable with laser pulses in free space. Possible implications for light-driven phenomena in prototypical flat-band moiré or kagome materials are discussed.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# 明示的な運動正則化によるニューラルフィールドを用いた動的CT画像再構成の改善 Enhancing Dynamic CT Image Reconstruction with Neural Fields Through Explicit Motion Regularizers ( http://arxiv.org/abs/2406.01299v1 ) ライセンス: Link先を確認	Pablo Arratia, Matthias Ehrhardt, Lisa Kreusser,	(参考訳) 高度にアンダーサンプリングされたデータによる動的逆問題の画像再構成は大きな課題である。プロセスのダイナミクスを考慮しないと、時間的規則性のない非現実的な動きが生じる。時間微分にペナルティを課したり、運動モデル正則化項を導入したりする変分法が、連続するフレームを関連づけ、グリッドベースの離散化を用いて画質を改善するために提案されている。ニューラルフィールドは、求める時空間量を深層ニューラルネットワークでパラメータ化する別の方法であり、軽量かつ連続的で、滑らかさに偏った表現である。この帰納バイアスは動的逆問題に時間規則性を課すために利用されており、その場合ニューラルフィールドはデータ忠実度項のみを最小化することで最適化される。本稿では,2次元+時間のCT(計算トモグラフィー)におけるニューラルフィールドの最適化に,PDEに基づく明示的な運動正則化項,すなわちオプティカルフロー方程式を導入する利点を検討し,示す。また、ニューラルフィールドをグリッドベースの解法と比較し、前者が後者より優れていることを示す。 Image reconstruction for dynamic inverse problems with highly undersampled data poses a major challenge: not accounting for the dynamics of the process leads to a non-realistic motion with no time regularity. Variational approaches that penalize time derivatives or introduce motion model regularizers have been proposed to relate subsequent frames and improve image quality using grid-based discretization. Neural fields offer an alternative parametrization of the desired spatiotemporal quantity with a deep neural network, a lightweight, continuous, and biased towards smoothness representation. The inductive bias has been exploited to enforce time regularity for dynamic inverse problems resulting in neural fields optimized by minimizing a data-fidelity term only. In this paper we investigate and show the benefits of introducing explicit PDE-based motion regularizers, namely, the optical flow equation, in 2D+time computed tomography for the optimization of neural fields. We also compare neural fields against a grid-based solver and show that the former outperforms the latter.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# pOps:フォトインスパイアされた拡散演算子 pOps: Photo-Inspired Diffusion Operators ( http://arxiv.org/abs/2406.01300v1 ) ライセンス: Link先を確認	Elad Richardson, Yuval Alaluf, Ali Mahdavi-Amiri, Daniel Cohen-Or,	(参考訳) テキスト誘導画像生成により、テキスト記述から視覚コンテンツを作成することができる。しかし、特定の視覚概念は言語だけでは効果的に伝達できない。これは、IP-Adapterのようなメソッドを通じて、より視覚的に指向したタスクにCLIPイメージの埋め込みスペースを活用することに、新たな関心を喚起した。興味深いことに、CLIP画像埋め込み空間は意味論的に意味があることが示され、この空間内の線形操作は意味論的に意味のある結果をもたらす。しかし、これらの操作の特定の意味は、異なる画像間で予測不能に変化する可能性がある。この可能性を活用するために、私たちは、CLIPイメージの埋め込みに直接、特定のセマンティック演算子をトレーニングするフレームワークであるpOpsを紹介します。各pOpsオペレータは、事前訓練された拡散事前モデルに基づいて構築される。 Diffusion Priorモデルはもともとテキストの埋め込みと画像の埋め込みをマッピングするために訓練されたものの、新しい入力条件に合わせるように調整できることを実証し、拡散演算子をもたらすことを示した。イメージ埋め込みを直接処理することで、セマンティック操作の学習能力が向上するだけでなく、必要に応じてテキストCLIP損失を追加の監視として直接使用することが可能になります。 pOpsは、異なる意味を持つ様々なフォトインスパイアされた演算子を学習するために使用でき、提案手法のセマンティック多様性とポテンシャルを強調している。 Text-guided image generation enables the creation of visual content from textual descriptions. However, certain visual concepts cannot be effectively conveyed through language alone. This has sparked a renewed interest in utilizing the CLIP image embedding space for more visually-oriented tasks through methods such as IP-Adapter. Interestingly, the CLIP image embedding space has been shown to be semantically meaningful, where linear operations within this space yield semantically meaningful results. Yet, the specific meaning of these operations can vary unpredictably across different images. To harness this potential, we introduce pOps, a framework that trains specific semantic operators directly on CLIP image embeddings. Each pOps operator is built upon a pretrained Diffusion Prior model. While the Diffusion Prior model was originally trained to map between text embeddings and image embeddings, we demonstrate that it can be tuned to accommodate new input conditions, resulting in a diffusion operator. 
Working directly over image embeddings not only improves our ability to learn semantic operations but also allows us to directly use a textual CLIP loss as an additional supervision when needed. We show that pOps can be used to learn a variety of photo-inspired operators with distinct semantic meanings, highlighting the semantic diversity and potential of our proposed approach.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# CT血管造影と臨床データに基づくマルチモーダル学習を用いた肺塞栓症の死亡率予測 Pulmonary Embolism Mortality Prediction Using Multimodal Learning Based on Computed Tomography Angiography and Clinical Data ( http://arxiv.org/abs/2406.01302v1 ) ライセンス: Link先を確認	Zhusi Zhong, Helen Zhang, Fayez H. Fayad, Andrew C. Lancaster, John Sollee, Shreyas Kulkarni, Cheng Ting Lin, Jie Li, Xinbo Gao, Scott Collinsa, Sun H. Ahn, Harrison X. Bai, Zhicheng Jiao, Michael K. Atalay,	(参考訳) 目的: 肺塞栓症(PE)は米国における重大な死因である。本研究の目的は,CT肺血管造影(CTPA),臨床データ,PE重症度指数(PESI)スコアを用いたディープラーニング(DL)モデルを実装し,PE死亡率を予測することである。対象と方法: 3施設にわたる後ろ向きレビューにより,918例(年齢中央値64歳,範囲13-99歳,女性52%)のCTPA 3,978件を同定した。生存を予測するため、AIモデルを用いてCTPAから疾患関連の画像特徴を抽出した。次に,画像特徴と臨床変数の一方または両方をDLモデルに組み込み,生存転帰を予測した。 (1) CTPA画像特徴のみを使用,(2) 臨床変数のみを使用,(3) CTPAと臨床変数を統合したマルチモーダル,(4) 算出したPESIスコアと融合したマルチモーダル,の4つのモデルを開発した。性能と各モダリティの寄与は、それぞれ一致指数 (c-index) と純再分類改善度 (Net Reclassification Improvement) を用いて評価した。性能はウィルコクソンの符号順位検定を用いてPESIによる予測と比較した。カプラン・マイヤー解析を行い,患者を高リスク群と低リスク群に層別化した。右室(RV)機能障害を考慮するため,追加の因子リスク解析を行った。結果: 両データセットにおいて,PESI融合モデルとマルチモーダルモデルは PESI単独よりも高いc-indexを達成した。マルチモーダルモデルおよびPESI融合モデルによって患者を高リスク群と低リスク群に層別化すると,死亡転帰は有意に異なっていた(いずれも p<0.001)。高リスク群への分類とRV機能障害との間には強い相関が認められた。結論: CTPA特徴,臨床データ,PESIを取り入れたマルチオミクスDLモデルは,PE生存予測においてPESI単独よりも高いc-indexを達成した。 Purpose: Pulmonary embolism (PE) is a significant cause of mortality in the United States. The objective of this study is to implement deep learning (DL) models using Computed Tomography Pulmonary Angiography (CTPA), clinical data, and PE Severity Index (PESI) scores to predict PE mortality. Materials and Methods: 918 patients (median age 64 years, range 13-99 years, 52% female) with 3,978 CTPAs were identified via retrospective review across three institutions. To predict survival, an AI model was used to extract disease-related imaging features from CTPAs. Imaging features and/or clinical variables were then incorporated into DL models to predict survival outcomes. Four models were developed as follows: (1) using CTPA imaging features only; (2) using clinical variables only; (3) multimodal, integrating both CTPA and clinical variables; and (4) multimodal fused with calculated PESI score. Performance and contribution from each modality were evaluated using concordance index (c-index) and Net Reclassification Improvement, respectively. Performance was compared to PESI predictions using the Wilcoxon signed-rank test. Kaplan-Meier analysis was performed to stratify patients into high- and low-risk groups. Additional factor-risk analysis was conducted to account for right ventricular (RV) dysfunction. Results: For both data sets, the PESI-fused and multimodal models achieved higher c-indices than PESI alone. Following stratification of patients into high- and low-risk groups by multimodal and PESI-fused models, mortality outcomes differed significantly (both p<0.001). A strong correlation was found between high-risk grouping and RV dysfunction. Conclusions: Multiomic DL models incorporating CTPA features, clinical data, and PESI achieved higher c-indices than PESI alone for PE survival prediction.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# CodeR: マルチエージェントとタスクグラフによる問題解決 CodeR: Issue Resolving with Multi-Agent and Task Graphs ( http://arxiv.org/abs/2406.01304v1 ) ライセンス: Link先を確認	Dong Chen, Shaoxin Lin, Muhan Zeng, Daoguang Zan, Jian-Gang Wang, Anton Cheshkov, Jun Sun, Hao Yu, Guoliang Dong, Artem Aliev, Jie Wang, Xiao Cheng, Guangtai Liang, Yuchi Ma, Pan Bian, Tao Xie, Qianxiang Wang,	(参考訳) GitHubのイシュー解決は最近、アカデミックや業界から大きな注目を集めている。 SWEベンチは問題解決における性能を測定するために提案されている。本稿では,マルチエージェントフレームワークと事前に定義されたタスクグラフを採用して,報告されたバグの修復と解決を行い,コードリポジトリに新機能を追加するCodeRを提案する。 SWE-bench lite では、CodeR は各問題に 1 回だけ提出した場合に 28.00% の問題を解決することができる。我々は,CodeRの各設計の性能への影響について検討し,この研究の方向性を推し進めるための洞察を提供する。 GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28.00% of issues, in the case of submitting only once for each issue. We examine the performance impact of each design of CodeR and offer insights to advance this research direction.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# 大規模言語モデルの蒸留と反実仮想コントラスト復号による教師なしディストラクタ生成 Unsupervised Distractor Generation via Large Language Model Distilling and Counterfactual Contrastive Decoding ( http://arxiv.org/abs/2406.01306v1 ) ライセンス: Link先を確認	Fanyi Qu, Hao Sun, Yunfang Wu,	(参考訳) 読解の文脈において、ディストラクタ生成(DG)タスクは、読者を惑わす複数の誤答選択肢を生成することを目的としている。従来の教師ありDG手法は、人手でアノテーションされた高価なディストラクタラベルに大きく依存している。本稿では,より小さな生徒モデルのDG能力を高めるために,大規模言語モデル(LLM)を費用対効果の高いアノテータとして活用する,教師なしのDGフレームワークを提案する。具体的には,知識蒸留を行うために,LLMからの擬似ディストラクタと元の解答情報を目的ターゲットとして統合する二重タスク学習戦略を,2段階の学習プロセスとともに提案する。さらに,DGモデルの惑わす能力を高めるために,反実仮想コントラスト復号機構を考案した。実験の結果,Bart-baseを用いた我々の教師なし生成法は,モデルパラメータ数が200分の1でありながら GPT-3.5-turbo の性能を大きく上回ることがわかった。提案する教師なしDG手法は,手間のかかるディストラクタのアノテーションや高コストな大規模モデルを必要とせず,実用的な読解アプリケーションのための費用対効果の高い枠組みを提供する。 Within the context of reading comprehension, the task of Distractor Generation (DG) aims to generate several incorrect options to confuse readers. Traditional supervised methods for DG rely heavily on expensive human-annotated distractor labels. In this paper, we propose an unsupervised DG framework, leveraging Large Language Models (LLMs) as cost-effective annotators to enhance the DG capability of smaller student models. Specially, to perform knowledge distilling, we propose a dual task training strategy that integrates pseudo distractors from LLMs and the original answer information as the objective targets with a two-stage training process. Moreover, we devise a counterfactual contrastive decoding mechanism for increasing the distracting capability of the DG model. Experiments show that our unsupervised generation method with Bart-base greatly surpasses GPT-3.5-turbo performance with only 200 times fewer model parameters. Our proposed unsupervised DG method offers a cost-effective framework for practical reading comprehension applications, without the need of laborious distractor annotation and costly large-size models.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# REvolve: 自動運転のための大規模言語モデルによる報酬の進化 REvolve: Reward Evolution with Large Language Models for Autonomous Driving ( http://arxiv.org/abs/2406.01309v1 ) ライセンス: Link先を確認	Rishi Hazra, Alkis Sygkounas, Andreas Persson, Amy Loutfi, Pedro Zuidberg Dos Martires,	(参考訳) 効果的な報酬関数の設計は、強化学習(RL)アルゴリズムの訓練に不可欠である。しかし、明示的な定量化が難しい特定のタスクの主観的な性質のため、この設計はドメインの専門家にとっても容易ではない。近年の研究では,大規模言語モデル (LLM) が,その広範な指示チューニングと人間の行動に関する常識的理解を活かして,自然言語のタスク記述から報酬を生成するために用いられている。本研究では,人間のフィードバックに導かれたLLMを用いて,人間に整合した報酬関数を定式化できるという仮説を立てる。具体的には、「良い」運転の概念が暗黙的で定量化が難しい自動運転(AD)という挑戦的な設定でこれを研究する。この目的のために,AD における報酬設計に LLM を用いる進化的フレームワークである REvolve を紹介する。 REvolveは、人間のフィードバックを利用して進化過程を導くことで報酬関数を作成・改良し、暗黙的な人間の知識を(深層)RLエージェントを訓練するための明示的な報酬関数へと効果的に変換する。REvolveが設計した報酬で訓練されたエージェントは人間の運転基準と密接に一致し、その結果、他の最先端のベースラインを上回ることを示す。 Designing effective reward functions is crucial to training reinforcement learning (RL) algorithms. However, this design is non-trivial, even for domain experts, due to the subjective nature of certain tasks that are hard to quantify explicitly. In recent works, large language models (LLMs) have been used for reward generation from natural language task descriptions, leveraging their extensive instruction tuning and commonsense understanding of human behavior. In this work, we hypothesize that LLMs, guided by human feedback, can be used to formulate human-aligned reward functions. Specifically, we study this in the challenging setting of autonomous driving (AD), wherein notions of "good" driving are tacit and hard to quantify. To this end, we introduce REvolve, an evolutionary framework that uses LLMs for reward design in AD. REvolve creates and refines reward functions by utilizing human feedback to guide the evolution process, effectively translating implicit human knowledge into explicit reward functions for training (deep) RL agents. We demonstrate that agents trained on REvolve-designed rewards align closely with human driving standards, thereby outperforming other state-of-the-art baselines.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# FactGenius:知識グラフによるファクト検証を改善するためにゼロショットプロンプトとファジィリレーションマイニングを組み合わせる FactGenius: Combining Zero-Shot Prompting and Fuzzy Relation Mining to Improve Fact Verification with Knowledge Graphs ( http://arxiv.org/abs/2406.01311v1 ) ライセンス: Link先を確認	Sushant Gautam,	(参考訳) ファクトチェック(Fact-checking)は、信頼できる証拠を考慮し、クレームの真正性を検証する重要な自然言語処理(NLP)タスクである。伝統的な手法は労働集約的なデータキュレーションとルールベースのアプローチによって制限されることが多い。本稿では,大規模言語モデル(LLM)のゼロショットプロンプトと知識グラフ(KG)のファジィテキストマッチングを組み合わせたファクトチェック手法であるFactGeniusを提案する。ウィキペディアから派生した構造化リンクデータデータセットであるDBpediaを利用することで、FactGeniusは、類似度測定を使用してLLM生成された接続を洗練し、正確性を保証する。ファクト検証のベンチマークデータセットであるFactKG上でのFactGeniusの評価は、特に分類器として微調整されたRoBERTaにおいて、既存のベースラインを著しく上回っていることを示している。コネクションのフィルタリングと検証という2段階のアプローチは、さまざまな推論タイプで優れたパフォーマンスを実現し、堅牢なファクトチェックのための有望なツールとしてFactGeniusを確立する上で、極めて重要である。コードと資料はhttps://github.com/SushantGautam/FactGeniusで入手できる。 Fact-checking is a crucial natural language processing (NLP) task that verifies the truthfulness of claims by considering reliable evidence. Traditional methods are often limited by labour-intensive data curation and rule-based approaches. In this paper, we present FactGenius, a novel method that enhances fact-checking by combining zero-shot prompting of large language models (LLMs) with fuzzy text matching on knowledge graphs (KGs). Leveraging DBpedia, a structured linked data dataset derived from Wikipedia, FactGenius refines LLM-generated connections using similarity measures to ensure accuracy. The evaluation of FactGenius on the FactKG, a benchmark dataset for fact verification, demonstrates that it significantly outperforms existing baselines, particularly when fine-tuning RoBERTa as a classifier. The two-stage approach of filtering and validating connections proves crucial, achieving superior performance across various reasoning types and establishing FactGenius as a promising tool for robust fact-checking. The code and materials are available at https://github.com/SushantGautam/FactGenius.	翻訳日:2024-06-05 23:09:15 公開日:2024-06-03
# ソフトマックスフリートランスフォーマーとシーケンス正規化による計算効率の高い医用画像分類 Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization ( http://arxiv.org/abs/2406.01314v1 ) ライセンス: Link先を確認	Firas Khader, Omar S. M. El Nahhas, Tianyu Han, Gustav Müller-Franzes, Sven Nebelung, Jakob Nikolas Kather, Daniel Truhn,	(参考訳) Transformerモデルは、自然言語処理、音声認識、コンピュータビジョンなどの分野の進歩において重要な役割を担ってきた。しかし、このモデルの重大な制限は、シーケンス長に対する計算量とメモリ量が2次で増加することであり、長いシーケンスへの適用が制約される。これは、高解像度画像がギガピクセル規模に達しうる医用画像において特に重要である。この問題に対処する取り組みは、Transformerのアーキテクチャに不可欠なソフトマックス演算を分解するといった複雑な技術に主に焦点を当ててきた。本稿では、Transformerモデルのこの2次計算複雑性に取り組み、注意機構からソフトマックス関数を取り除き、キー、クエリ、バリューの各トークンにシーケンス正規化手法を採用することでこの問題を回避する、非常に単純かつ効果的な方法を提案する。行列積の順序の入れ替えと組み合わせることで、このアプローチはメモリと計算の複雑さを線形スケールに減らす。本手法は, 眼底画像, ダーモスコピー画像, 放射線画像, 組織画像を含む様々な医用画像データセットで評価される。その結果,これらのモデルは従来のTransformerモデルと同等の性能を示しながら,より長いシーケンスを効率的に処理できることが示された。 The Transformer model has been pivotal in advancing fields such as natural language processing, speech recognition, and computer vision. However, a critical limitation of this model is its quadratic computational and memory complexity relative to the sequence length, which constrains its application to longer sequences. This is especially crucial in medical imaging where high-resolution images can reach gigapixel scale. Efforts to address this issue have predominantly focused on complex techniques, such as decomposing the softmax operation integral to the Transformer's architecture. This paper addresses this quadratic computational complexity of Transformer models and introduces a remarkably simple and effective method that circumvents this issue by eliminating the softmax function from the attention mechanism and adopting a sequence normalization technique for the key, query, and value tokens. Coupled with a reordering of matrix multiplications this approach reduces the memory- and compute complexity to a linear scale. We evaluate this approach across various medical imaging datasets comprising fundoscopic, dermascopic, radiologic and histologic imaging data. Our findings highlight that these models exhibit a comparable performance to traditional transformer models, while efficiently handling longer sequences.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# 微分可能なパーシステントホモロジーを用いたスケールフリー画像キーポイント Scale-Free Image Keypoints Using Differentiable Persistent Homology ( http://arxiv.org/abs/2406.01315v1 ) ライセンス: Link先を確認	Giovanni Barbarani, Francesco Vaccarino, Gabriele Trivigno, Marco Guerra, Gabriele Berton, Carlo Masone,	(参考訳) コンピュータビジョンにおいて、キーポイント検出はロボット工学から画像検索まで応用される基本的なタスクであるが、既存の学習ベースの手法はスケール依存性と柔軟性の欠如に悩まされている。本稿では、代数的トポロジーに根ざした強力なツールであるモース理論とパーシステントホモロジーを活用する新しいアプローチを紹介する。パーシステントホモロジーにおいて最近導入された劣勾配の概念に基づく新しい損失関数を提案し、トポロジカルな学習への道を開く。我々の検出器MorseDetは、特徴検出のための最初のトポロジーベースの学習モデルであり、キーポイントの再現性において競争力のある性能を達成し、この問題に対して原理的かつ理論的に堅牢なアプローチを導入する。 In computer vision, keypoint detection is a fundamental task, with applications spanning from robotics to image retrieval; however, existing learning-based methods suffer from scale dependency and lack flexibility. This paper introduces a novel approach that leverages Morse theory and persistent homology, powerful tools rooted in algebraic topology. We propose a novel loss function based on the recent introduction of a notion of subgradient in persistent homology, paving the way toward topological learning. Our detector, MorseDet, is the first topology-based learning model for feature detection, which achieves competitive performance in keypoint repeatability and introduces a principled and theoretically robust approach to the problem.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# 言語・ポーズ・合成IMUの統合表現による慣性センサを用いた手のHARの強化 Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs ( http://arxiv.org/abs/2406.01316v1 ) ライセンス: Link先を確認	Vitor Fortes Rey, Lala Shakti Swarup Ray, Xia Qingxin, Kaishun Wu, Paul Lukowicz,	(参考訳) HARではラベル付きセンサデータが不足しているため、先行研究は豊富な活動アノテーションを持つビデオデータを利用して慣性計測ユニット(IMU)データを合成してきた。しかし、ビデオからのIMUデータ生成は、合成IMUデータの品質が低く、微妙できめ細かな動きに対する有効性が限られるため、実環境のHARにおいて課題がある。本稿では,データ不足の問題に対処する新しいマルチモーダル,マルチタスク,コントラスト学習ベースのフレームワークであるMulti$^3$Netを提案する。事前学習ではオンラインリポジトリのビデオを用い,テキスト,ポーズ,IMUの統合表現を同時に学習することを目指す。ビデオデータとコントラスト学習を用いることで、特に微妙な活動の認識においてウェアラブルHARの性能向上を図る。実験結果により、IMUデータを用いたHARの性能向上における本アプローチの有効性が検証された。本手法によりビデオから生成した合成IMUデータで学習したモデルは,きめ細かな活動の認識において既存手法を上回ることを示す。 Due to the scarcity of labeled sensor data in HAR, prior research has turned to video data to synthesize Inertial Measurement Units (IMU) data, capitalizing on its rich activity annotations. However, generating IMU data from videos presents challenges for HAR in real-world settings, attributed to the poor quality of synthetic IMU data and its limited efficacy in subtle, fine-grained motions. In this paper, we propose Multi$^3$Net, our novel multi-modal, multitask, and contrastive-based framework approach to address the issue of limited data. Our pretraining procedure uses videos from online repositories, aiming to learn joint representations of text, pose, and IMU simultaneously. By employing video data and contrastive learning, our method seeks to enhance wearable HAR performance, especially in recognizing subtle activities. Our experimental findings validate the effectiveness of our approach in improving HAR performance with IMU data. We demonstrate that models trained with synthetic IMU data generated from videos using our method surpass existing approaches in recognizing fine-grained activities.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# 理解可能で効果的なグラフニューラル加法ネットワーク The Intelligible and Effective Graph Neural Additive Networks ( http://arxiv.org/abs/2406.01317v1 ) ライセンス: Link先を確認	Maya Bechler-Speicher, Amir Globerson, Ran Gilad-Bachrach,	(参考訳) グラフニューラルネットワーク(GNN)は,グラフ構造化データを学習するための主要なアプローチとして登場した。しかし、ほとんどのGNNはブラックボックスモデルとして動作し、事後的な説明を必要とするが、透明性が重要となる高リスクの場面ではそれでは不十分な場合がある。本稿では,設計段階から解釈可能なGNNを提案する。我々のモデルであるグラフニューラル加法ネットワーク(GNAN)は、解釈可能なモデルクラスである一般化加法モデル(Generalized Additive Models)の新しい拡張であり、人間が可視化して完全に理解することができる。 GNANは完全に解釈可能なように設計されており、モデルを直接可視化することで、特徴レベルとグラフレベルの両方でグローバルな説明とローカルな説明が可能である。これらの可視化は、モデルが目的変数、特徴、およびグラフの間の関係をどのように利用するかを正確に記述する。さまざまなタスクやデータセットの一連の例で、GNANの理解可能性を示す。さらに、GNANの精度はブラックボックスGNNと同等であり、高い精度に加えて透明性が不可欠な重要アプリケーションに適していることを示す。 Graph Neural Networks (GNNs) have emerged as the predominant approach for learning over graph-structured data. However, most GNNs operate as black-box models and require post-hoc explanations, which may not suffice in high-stakes scenarios where transparency is crucial. In this paper, we present a GNN that is interpretable by design. Our model, Graph Neural Additive Network (GNAN), is a novel extension of the interpretable class of Generalized Additive Models, and can be visualized and fully understood by humans. GNAN is designed to be fully interpretable, allowing both global and local explanations at the feature and graph levels through direct visualization of the model. These visualizations describe the exact way the model uses the relationships between the target variable, the features, and the graph. We demonstrate the intelligibility of GNANs in a series of examples on different tasks and datasets. In addition, we show that the accuracy of GNAN is on par with black-box GNNs, making it suitable for critical applications where transparency is essential, alongside high accuracy.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# デノイジング拡散確率モデルの収束性 Convergence of the denoising diffusion probabilistic models ( http://arxiv.org/abs/2406.01320v1 ) ライセンス: Link先を確認	Yumiharu Nakano,	(参考訳) 我々は,Ho, J., Jain, A., Abbeel, P., Advances in Neural Information Processing Systems, 33 (2020), pp. 6840-6851 で提示されたデノイジング拡散確率モデル(DDPM)の原版を理論的に解析する。我々の主定理は、分散スケジュールのパラメータ、$L^2$ベースのスコア推定誤差、およびノイズ推定関数に関する、時間ステップ数についてのいくつかの漸近条件の下で、元のDDPMサンプリングアルゴリズムによって構成される列が、時間ステップ数が無限大に向かうにつれて、与えられたデータ分布に弱収束することを述べている。定理の証明において、サンプリング列が逆時間確率微分方程式(SDE)の指数積分器型の近似とみなせることを明らかにする。さらに、一般の連続過程に対する後退伊藤積分の適切な定義を与え、関連する前進コルモゴロフ方程式の滑らかさや一意性を用いることなく、後退伊藤積分による与えられたSDEの逆時間表現を厳密に証明する。 We theoretically analyze the original version of the denoising diffusion probabilistic models (DDPMs) presented in Ho, J., Jain, A., and Abbeel, P., Advances in Neural Information Processing Systems, 33 (2020), pp. 6840-6851. Our main theorem states that the sequence constructed by the original DDPM sampling algorithm weakly converges to a given data distribution as the number of time steps goes to infinity, under some asymptotic conditions on the parameters for the variance schedule, the $L^2$-based score estimation error, and the noise estimating function with respect to the number of time steps. In proving the theorem, we reveal that the sampling sequence can be seen as an exponential integrator type approximation of a reverse time stochastic differential equation (SDE). Moreover, we give a proper definition of the backward Itô integral for general continuous processes and prove rigorously the reverse time representation of a given SDE with backward Itô integral, without using the smoothness and uniqueness of the associated forward Kolmogorov equations.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# シーケンス・ツー・シークエンスマルチモーダル音声のインペインティング Sequence-to-Sequence Multi-Modal Speech In-Painting ( http://arxiv.org/abs/2406.01321v1 ) ライセンス: Link先を確認	Mahsa Kadkhodaei Elyaderani, Shahram Shirani,	(参考訳) 音声インペインティングは、信頼性のあるコンテキスト情報を用いて、欠落した音声コンテンツを再生するタスクである。近年,音声のマルチモーダル認識に関する研究が盛んに行われているが,音声における視覚情報や聴覚情報の効果的な注入はいまだに必要である。本稿では,エンコーダ・デコーダアーキテクチャを用いて,音声信号に視覚情報を利用する新しいシーケンス・ツー・シーケンスモデルを提案する。エンコーダは、顔記録のためのリップリーダーの役割を担い、デコーダは、エンコーダ出力と歪んだ音声スペクトログラムの両方を取り込み、元の音声を復元する。提案手法は音声のみの音声インパインティングモデルより優れており,300msから1500msの歪みに対して,近年のマルチモーダル音声インパインターと同等の精度で,マルチモーダル音声インパインティングの有効性を示す。 Speech in-painting is the task of regenerating missing audio contents using reliable context information. Despite various recent studies in multi-modal perception of audio in-painting, there is still a need for an effective infusion of visual and auditory information in speech in-painting. In this paper, we introduce a novel sequence-to-sequence model that leverages the visual information to in-paint audio signals via an encoder-decoder architecture. The encoder plays the role of a lip-reader for facial recordings and the decoder takes both encoder outputs as well as the distorted audio spectrograms to restore the original speech. Our model outperforms an audio-only speech in-painting model and has comparable results with a recent multi-modal speech in-painter in terms of speech quality and intelligibility metrics for distortions of 300 ms to 1500 ms duration, which proves the effectiveness of the introduced multi-modality in speech in-painting.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# 構造介入と不平等のダイナミクス Structural Interventions and the Dynamics of Inequality ( http://arxiv.org/abs/2406.01323v1 ) ライセンス: Link先を確認	Aurora Zhang, Annette Hosoi,	(参考訳) 近年のアルゴリズム公平性の文献における議論は、公平性の標準的な概念に関していくつかの懸念を提起している。第一に、公平性ベンチマークを満たすように予測アルゴリズムを制約すると、不利な立場にあるグループに対して最適でない結果をもたらす可能性がある。第二に、技術的介入は、特に社会的不平等を生み出す構造的過程の理解から切り離された場合、それ自体では効果がないことが多い。これら2つの批判に着想を得て、我々は住宅ローンを実例として、一般的な意思決定モデルを構築した。いくつかの条件下では、パレート最適な政策から逸脱しない限り、どのような決定しきい値を選択しても、金融安定における既存の格差が必然的に持続することを示す。次に、3種類の介入の効果をモデル化する。外部パラメータに対する構造変化の実施の難しさや、公平性と効率性に対する政策立案者の選好によって、推奨される介入がどのように異なるかを示す。直感に反して、公平性よりも効率性を優先する選好が、資源の乏しいグループを対象とした介入の推奨につながる可能性があることを実証する。最後に、HMDAとFannie Maeのローンデータを組み合わせたデータセットに対する介入の効果をシミュレートする。この研究は、一見バイアスのない決定機構によって構造的不平等が永続化されうる様子を浮き彫りにし、多くの状況において、社会的変化をもたらすためには、技術的な解決策を外部の文脈を考慮した介入と組み合わせなければならないことを示す。 Recent conversations in the algorithmic fairness literature have raised several concerns with standard conceptions of fairness. First, constraining predictive algorithms to satisfy fairness benchmarks may lead to non-optimal outcomes for disadvantaged groups. Second, technical interventions are often ineffective by themselves, especially when divorced from an understanding of structural processes that generate social inequality. Inspired by both these critiques, we construct a common decision-making model, using mortgage loans as a running example. We show that under some conditions, any choice of decision threshold will inevitably perpetuate existing disparities in financial stability unless one deviates from the Pareto optimal policy. Then, we model the effects of three different types of interventions. We show how different interventions are recommended depending upon the difficulty of enacting structural change upon external parameters and depending upon the policymaker's preferences for equity or efficiency. Counterintuitively, we demonstrate that preferences for efficiency over equity may lead to recommendations for interventions that target the under-resourced group. Finally, we simulate the effects of interventions on a dataset that combines HMDA and Fannie Mae loan data. This research highlights the ways that structural inequality can be perpetuated by seemingly unbiased decision mechanisms, and it shows that in many situations, technical solutions must be paired with external, context-aware interventions to enact social change.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# TabPedia: 概念シナジーによる総合的なビジュアルテーブル理解を目指して TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy ( http://arxiv.org/abs/2406.01326v1 ) ライセンス: Link先を確認	Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shu Wei, Binghong Wu, Lei Liao, Yongjie Ye, Hao Liu, Houqiang Li, Can Huang,	(参考訳) 表には、機械の理解に挑戦する様々な構造や内容を伴う実データと定量的データが含まれている。従来の手法は一般にタスク固有のアーキテクチャと個々のタスクの目的を設計し、結果としてモーダルな分離と複雑なワークフローをもたらす。本稿では,概念シナジー機構を備えた新しい視覚言語モデルTabPediaを提案する。このメカニズムでは、様々な視覚テーブル理解(VTU)タスクとマルチソース視覚埋め込みを概念として抽象化する。この統合フレームワークは、大規模な言語モデル(LLM)の機能を活用することで、テーブル検出、テーブル構造認識、テーブルクエリ、テーブル質問応答といったVTUタスクをシームレスに統合することを可能にする。さらに、この概念のシナジー機構により、テーブル認識関連および理解関連タスクが調和して機能し、対応するソース認識埋め込みから必要な手がかりを効果的に活用することができる。さらに、実世界のシナリオにおけるVTUタスクをよりよく評価するために、約9000のQAペアを備えた新しい総合的なテーブルVQAベンチマークComTQAを構築した。表認識と理解タスクの両面において,多種多岐にわたる定量的,質的な実験を行い,TabPediaの有効性を検証した。優れた性能は、全ての概念がシナジーで動くとき、視覚テーブルを理解するためにLLMを使うことの可能性をさらに確認する。 ComTQAベンチマークはhttps://huggingface.co/datasets/ByteDance/ComTQAでオープンソース化された。ソースコードとモデルは後日リリースされる予定だ。 Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in modal isolation and intricate workflows. In this paper, we present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism. In this mechanism, all the involved diverse visual table understanding (VTU) tasks and multi-source visual embeddings are abstracted as concepts. This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering, by leveraging the capabilities of large language models (LLMs). 
Moreover, the concept synergy mechanism enables table perception-related and comprehension-related tasks to work in harmony, as they can effectively leverage the needed clues from the corresponding source perception embeddings. Furthermore, to better evaluate the VTU task in real-world scenarios, we establish a new and comprehensive table VQA benchmark, ComTQA, featuring approximately 9,000 QA pairs. Extensive quantitative and qualitative experiments on both table perception and comprehension tasks, conducted across various public benchmarks, validate the effectiveness of our TabPedia. The superior performance further confirms the feasibility of using LLMs for understanding visual tables when all concepts work in synergy. The benchmark ComTQA has been open-sourced at https://huggingface.co/datasets/ByteDance/ComTQA. The source code and model will be released later.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# (X)AIに基づく学習システムによるドメイン知識の伝達 Transferring Domain Knowledge with (X)AI-Based Learning Systems ( http://arxiv.org/abs/2406.01329v1 ) ライセンス: Link先を確認	Philipp Spitzer, Niklas Kühl, Marc Goutier, Manuel Kaschura, Gerhard Satzger,	(参考訳) 多くのハイステークスな領域では、従来の学習システムによる初心者のトレーニングは十分ではない。暗黙知を伝えるためには、専門家による実践的な指導が不可欠である。しかし、専門家による初心者のトレーニングは費用と時間がかかり、代替手段の必要性が高まっている。説明可能な人工知能(XAI)は、従来、ブラックボックスの人工知能システムを解釈可能にするために用いられてきた。本研究では,XAIを代替手段として活用する: (X)AIシステムは,専門家の過去の判断に基づいて訓練され,説明と組み合わせた事例を提供することで初心者の教育に使用される。249名の参加者を対象とした研究で,分類課題に対するこのアプローチの有効性を計測した。我々は,(X)AIに基づく学習システムが初心者の学習を促すことができ,初心者の認知スタイルが学習を調整することを示す。このようにして、XAIが人間の学習に与える影響を明らかにする第一歩を踏み出し、(X)AIベースの学習システムの設計を調整するための将来の選択肢をAI開発者に示す。 In numerous high-stakes domains, training novices via conventional learning systems does not suffice. To impart tacit knowledge, experts' hands-on guidance is imperative. However, training novices by experts is costly and time-consuming, increasing the need for alternatives. Explainable artificial intelligence (XAI) has conventionally been used to make black-box artificial intelligence systems interpretable. In this work, we utilize XAI as an alternative: An (X)AI system is trained on experts' past decisions and is then employed to teach novices by providing examples coupled with explanations. In a study with 249 participants, we measure the effectiveness of such an approach for a classification task. We show that (X)AI-based learning systems are able to induce learning in novices and that their cognitive styles moderate learning. Thus, we take the first steps to reveal the impact of XAI on human learning and point AI developers to future options to tailor the design of (X)AI-based learning systems.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# 事前学習データ検出のための言語モデルのプロービング Probing Language Models for Pre-training Data Detection ( http://arxiv.org/abs/2406.01333v1 ) ライセンス: Link先を確認	Zhenhua Liu, Tong Zhu, Chuanyuan Tan, Haonan Lu, Bing Liu, Wenliang Chen,	(参考訳) 大きな言語モデル(LLM)は、その印象的な機能を示しつつ、プライバシの問題や事前トレーニングフェーズにおけるベンチマークデータセットのリークによるデータ汚染問題への懸念も提起している。したがって、LLMが対象テキスト上で事前訓練されているかどうかを確認することにより、汚染を検出することが不可欠である。近年の研究は、生成されたテキストやパープレキシティの計算に焦点を当てているが、これらは表面的な特徴であり信頼性に欠ける。本研究では,モデルの内部アクティベーションを調べることにより,事前学習データ検出にプロービング手法を用いることを提案する。我々の手法はシンプルで効果的であり、より信頼性の高い事前学習データ検出につながる。さらに,計算機科学と数学のカテゴリのarXivアブストラクトからなる新しい挑戦的ベンチマークArxivMIAを提案する。実験の結果,本手法はすべてのベースラインを上回り,WikiMIAとArxivMIAの双方で最先端の性能を達成し,追加実験でもその有効性を確認した(我々のコードとデータセットはhttps://github.com/zhliu0106/probing-lm-dataで入手できる)。 Large Language Models (LLMs) have shown their impressive capabilities, while also raising concerns about the data contamination problems due to privacy issues and leakage of benchmark datasets in the pre-training phase. Therefore, it is vital to detect the contamination by checking whether an LLM has been pre-trained on the target texts. Recent studies focus on the generated texts and compute perplexities, which are superficial features and not reliable. In this study, we propose to utilize the probing technique for pre-training data detection by examining the model's internal activations. Our method is simple and effective and leads to more trustworthy pre-training data detection. Additionally, we propose ArxivMIA, a new challenging benchmark comprising arxiv abstracts from Computer Science and Mathematics categories. Our experiments demonstrate that our method outperforms all baselines, and achieves state-of-the-art performance on both WikiMIA and ArxivMIA, with additional experiments confirming its efficacy (Our code and dataset are available at https://github.com/zhliu0106/probing-lm-data).	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# HHMR:グラフ拡散モデルのマルチモーダル制御性向上によるホリスティックハンドメッシュ回復 HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models ( http://arxiv.org/abs/2406.01334v1 ) ライセンス: Link先を確認	Mengcheng Li, Hongwen Zhang, Yuxiang Zhang, Ruizhi Shao, Tao Yu, Yebin Liu,	(参考訳) 近年、生成と再構成のパラダイムが深く統合される傾向が見られる。本稿では,制御可能な生成モデルの能力を拡張し,手メッシュの直接生成,インペインティング,再構成,フィッティングを単一のフレームワークで行う,より包括的な手メッシュ回復タスクに取り組み,これをHHMR(Holistic Hand Mesh Recovery)と名付ける。我々のキーとなる観察は、強力なマルチモーダル制御性を持つ単一の生成モデルによって、異なるタイプのハンドメッシュリカバリタスクが達成可能であることであり、そのようなフレームワークでは、異なるタスクを実現するためには、異なるシグナルを条件として与えるだけでよい。この目的を達成するために,グラフ畳み込みとアテンション機構に基づくオールインワン拡散フレームワークを提案する。マルチモーダル制御信号のデカップリングを確保しつつ、強力な制御生成能力を実現するため、異なるモダリティを共有特徴空間にマッピングし、モダリティと特徴レベルの両方でクロススケールなランダムマスキングを適用する。このようにして、手の事前分布の学習において、異なるモダリティ間の相関が十分に活用される。さらに,生成したモデルと制御信号とのアライメントを向上させるための条件整合型勾配ガイダンス(Condition-aligned Gradient Guidance)を提案し,ハンドメッシュの再構成とフィッティングの精度を大幅に向上させる。実験により,我々の新しいフレームワークは,複数のハンドメッシュリカバリタスクを同時に実現し,異なるタスクで既存手法を上回り,ジェスチャ認識やポーズ生成,メッシュ編集など,その後の下流アプリケーションにさらなる可能性をもたらすことが示された。 Recent years have witnessed a trend of the deep integration of the generation and reconstruction paradigms. In this paper, we extend the ability of controllable generative models for a more comprehensive hand mesh recovery task: direct hand mesh generation, inpainting, reconstruction, and fitting in a single framework, which we name as Holistic Hand Mesh Recovery (HHMR). Our key observation is that different kinds of hand mesh recovery tasks can be achieved by a single generative model with strong multimodal controllability, and in such a framework, realizing different tasks only requires giving different signals as conditions. To achieve this goal, we propose an all-in-one diffusion framework based on graph convolution and attention mechanisms for holistic hand mesh recovery. In order to achieve strong control generation capability while ensuring the decoupling of multimodal control signals, we map different modalities to a shared feature space and apply cross-scale random masking in both modality and feature levels. In this way, the correlation between different modalities can be fully exploited during the learning of hand priors. Furthermore, we propose Condition-aligned Gradient Guidance to enhance the alignment of the generated model with the control signals, which significantly improves the accuracy of the hand mesh reconstruction and fitting. Experiments show that our novel framework can realize multiple hand mesh recovery tasks simultaneously and outperform the existing methods in different tasks, which provides more possibilities for subsequent downstream applications including gesture recognition, pose generation, mesh editing, and so on.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# データサイエンスとファイナンスのための最大エントロピー原理による統計インフォームドパラメータ化量子回路 Statistics-Informed Parameterized Quantum Circuit via Maximum Entropy Principle for Data Science and Finance ( http://arxiv.org/abs/2406.01335v1 ) ライセンス: Link先を確認	Xi-Ning Zhuang, Zhao-Yun Chen, Cheng Xue, Xiao-Fan Xu, Chao Wang, Huan-Yu Liu, Tai-Ping Sun, Yun-Jie Wang, Yu-Chun Wu, Guo-Ping Guo,	(参考訳) 量子機械学習は、特にデータサイエンスやファイナンスといった統計に焦点を当てた分野において、実践的な問題を解決する上で大きな可能性を示している。しかし、トレーニング可能性や解釈可能性の問題により、量子プロセッサ上の統計モデルの準備と学習には課題が残っている。本レターでは、最大エントロピー原理を用いて、任意の分布とその重み付き混合を含む量子計算統計モデルを効率的に準備し、訓練する統計インフォームドパラメータ化量子回路(SI-PQC)を設計する。 SI-PQCは、トレーニング可能なパラメータを持つ静的構造を備えており、踏み込んだ回路コンパイルの最適化、リソースと時間消費の指数関数的削減、および量子状態と古典モデルパラメータを同時に学習する際のトレーニング可能性と解釈可能性の向上を可能にする。 SI-PQCは、様々な量子アルゴリズムにおける準備と学習のための効率的なサブルーチンとして、入力ボトルネックに対処し、事前知識の注入を容易にする。 Quantum machine learning has demonstrated significant potential in solving practical problems, particularly in statistics-focused areas such as data science and finance. However, challenges remain in preparing and learning statistical models on a quantum processor due to issues with trainability and interpretability. In this letter, we utilize the maximum entropy principle to design a statistics-informed parameterized quantum circuit (SI-PQC) that efficiently prepares and trains quantum computational statistical models, including arbitrary distributions and their weighted mixtures. The SI-PQC features a static structure with trainable parameters, enabling in-depth optimized circuit compilation, exponential reductions in resource and time consumption, and improved trainability and interpretability for learning quantum states and classical model parameters simultaneously. As an efficient subroutine for preparing and learning in various quantum algorithms, the SI-PQC addresses the input bottleneck and facilitates the injection of prior knowledge.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# ARCH2S: ポイントクラウドから建築外部構造を学ぶためのデータセット、ベンチマーク、課題 ARCH2S: Dataset, Benchmark and Challenges for Learning Exterior Architectural Structures from Point Clouds ( http://arxiv.org/abs/2406.01337v1 ) ライセンス: Link先を確認	Ka Lung Cheung, Chi Chung Lee,	(参考訳) 建築構造物の精密なセグメンテーションは, 各種建築部品の詳細な情報を提供し, 建築環境に対する理解と相互作用を高める。それでも、プライバシの懸念とデータ取得およびアノテーションの高いコストのため、既存の屋外3Dポイントクラウドデータセットでは、建築外観に関する詳細なアノテーションが限られている。この欠点を克服するために,本研究では,セマンティックセグメンテーションのための,意味情報を付与したフォトリアリスティックな3D建築モデルのデータセットとベンチマークを提案する。香港の実世界の建物(4種類の異なる建築用途)と、開放的な建築景観を収録している。各点群は14のセマンティッククラスのうちの1つにアノテーションされている。 Precise segmentation of architectural structures provides detailed information about various building components, enhancing our understanding and interaction with our built environment. Nevertheless, existing outdoor 3D point cloud datasets have limited detailed annotations on architectural exteriors due to privacy concerns and the expensive costs of data acquisition and annotation. To overcome this shortfall, this paper introduces a semantically-enriched, photo-realistic 3D architectural models dataset and benchmark for semantic segmentation. It features 4 different building purposes of real-world buildings as well as an open architectural landscape in Hong Kong. Each point cloud is annotated into one of 14 semantic classes.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# 設計どおりに回復する: ユーザフローの再利用による互換性起因のモバイルアプリクラッシュからのリカバリ Recover as It is Designed to Be: Recovering from Compatibility Mobile App Crashes by Reusing User Flows ( http://arxiv.org/abs/2406.01339v1 ) ライセンス: Link先を確認	Donghwi Kim, Hyungjun Yoon, Chang Min Park, Sujin Han, Youngjin Kwon, Steven Y. Ko, Sung-Ju Lee,	(参考訳) Android OSは、API更新とデバイスベンダのOSカスタマイズによって著しく断片化されており、非常に異なるOSバージョンが共存する市場条件を形成している。これにより、Androidアプリが特定のAndroidバージョンでクラッシュするが、他のバージョンではクラッシュしないという互換性クラッシュの問題が発生する。この問題はよく知られているが、テストが必要なAndroidバージョンが市場にあまりにも多いため、アプリ開発者が克服するのは極めて困難である。我々はRecoFlowを提案する。 RecoFlowは、アプリ開発者が我々のAPIとビジュアルツールでユーザフローをプログラミングすることにより、アプリをクラッシュから自動的に回復できるようにするフレームワークである。 RecoFlowは、ユーザデバイス上でユーザフローによってアプリの機能使用を追跡し、クラッシュによって中断されたアプリ機能のUIアクションを再生することで、アプリをクラッシュから回復する。繰り返し発生する互換性クラッシュを防止するため、RecoFlowは、我々の新しいAndroid OS仮想化技術によって実現された互換性モードで、以前クラッシュしたアプリを実行する。プロのAndroid開発者による評価は、我々のAPIとツールが使いやすく、互換性クラッシュからの回復に有効であることを示している。 Android OS is severely fragmented by API updates and device vendors' OS customization, creating a market condition where vastly different OS versions coexist. This gives rise to compatibility crash problems where Android apps crash on certain Android versions but not on others. Although well-known, this problem is extremely challenging for app developers to overcome due to the sheer number of Android versions in the market that must be tested. We present RecoFlow, a framework for enabling app developers to automatically recover an app from a crash by programming user flows with our API and visual tools. RecoFlow tracks app feature usage with the user flows on user devices and recovers an app from a crash by replaying UI actions of the app feature disrupted by the crash. To prevent recurring compatibility crashes, RecoFlow executes a previously crashed app in compatibility mode that is enabled by our novel Android OS virtualization technique. Our evaluation with professional Android developers shows that our API and tools are easy to use and effective in recovering from compatibility crashes.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# 三角環におけるハイゼンベルク反強磁性モデルを用いた$\mathrm{Cu}_{3}$-like化合物に基づく量子機械 Quantum machines based on $\mathrm{Cu}_{3}$-like compounds using the Heisenberg antiferromagnetic model in a triangular ring ( http://arxiv.org/abs/2406.01340v1 ) ライセンス: Link先を確認	Onofre Rojas, Moises Rojas,	(参考訳) 本研究では, 反強磁性結合スピン系, 特に$\text{Cu}_{3}-\text{X}(\text{X=As, Sb})$に関する理論的研究を行う。この系は、先行文献で確認されているように、わずかに歪んだ正三角形の構成を示す。この系は、三角構造内のハイゼンベルクモデルを用いてモデル化され、交換相互作用、ジャロシンスキー-守谷相互作用、g因子、および外部磁場が組み込まれている。我々は、$\text{Cu}_{3}$-like反強磁性結合スピン系に基づく3つの量子マシンを探索する。磁気熱量効果(MCE)は、約$\sim5$Tの垂直磁場に対して、$T\sim1$K付近の低温で特に顕著であり、これを解析した。カルノーマシンについて検討し、熱機関および冷凍機としての動作に対する外部磁場の影響を観測し, これらの条件下での熱効率について議論する。以上の結果から,MCEの強化により,熱機関としての動作領域が広がることが示唆された。さらに、量子オットーマシンを探索し、熱機関、冷凍機、ヒーター、熱加速器として機能する汎用性を示した。ただし、主に冷凍機と加速器として動作する。対応する熱効率についても検討する。同様に、熱機関、冷凍機、ヒーター、熱加速器として機能しうる量子スターリングマシンも解析したが、これも主に冷凍機と熱加速器として動作する。対応する熱効率についても検討した。熱機関としてのオットーマシンの性能はMCEの影響を顕著に受ける一方、スターリングマシンの動作モードは、MCEがより顕著な領域付近で熱機関と加速器の間で切り替わる。 In this work, we present a theoretical investigation into an antiferromagnetically coupled spin system, specifically $\text{Cu}_{3}-\text{X}(\text{X=As, Sb})$, which exhibits a configuration of a slightly distorted equilateral triangle, as identified in previous literature. This system is modeled using the Heisenberg model within a triangular structure, incorporating exchange interaction, Dzyaloshinskii-Moriya interaction, g-factors, and an external magnetic field. We explore three quantum machines based on the $\text{Cu}_{3}$-like antiferromagnetically coupled spin system. The magnetocaloric effect (MCE), which is notably more significant at low temperatures, around $T\sim1$K, for a perpendicular magnetic field at approximately $\sim5$T, has been analyzed. We examine the Carnot machine, observing the influence of the external magnetic field on its operation as both a heat engine and refrigerator, and discuss the thermal efficiencies under these conditions. Our findings suggest that enhanced MCE allows for broader operation regions as a heat engine. Additionally, we explore the quantum Otto machine, showing its versatility in functioning as a heat engine, refrigerator, heater, and thermal accelerator. However, it mainly operates as a refrigerator and accelerator. We also explore their corresponding thermal efficiencies. Similarly, we have analyzed the quantum Stirling machine, which is capable of functioning as a heat engine, refrigerator, heater, and thermal accelerator, but it mainly operates as a refrigerator and thermal accelerator. We also examined the corresponding thermal efficiencies. It is worth mentioning that the Otto machine performance as a heat engine is notably influenced by the MCE, while the operational mode of the Stirling machine switches between a heat engine and accelerator around the region where the MCE is more prominent.	翻訳日:2024-06-05 22:59:31 公開日:2024-06-03
# BMRS: 構造的刈り込みのためのベイズモデル削減 BMRS: Bayesian Model Reduction for Structured Pruning ( http://arxiv.org/abs/2406.01345v1 ) ライセンス: Link先を確認	Dustin Wright, Christian Igel, Raghavendra Selvan,	(参考訳) 現代のニューラルネットワークはしばしば過度にパラメータ化され、トレーニングと推論の間に高い計算コストをもたらす。優れた性能を維持しながら、ニューラルネットワークの計算効率とエネルギー効率を改善する効果的な方法は、モデル出力に限られた影響しか持たないネットワーク構造全体(例えば、ニューロンや畳み込みフィルタ)を除去する構造化プルーニングである。本研究では,完全にエンドツーエンドのベイズ的な構造化プルーニング手法であるBMRS(Bayesian Model Reduction for Structured Pruning)を提案する。 BMRSは2つの最近の手法に基づいている。すなわち、乗法的ノイズを用いたベイズ構造化プルーニングと、事前分布の変更の下でベイズモデルを効率的に比較できる手法であるベイズモデル削減(BMR)である。我々は、異なる事前分布から導かれ、異なる構造化プルーニング特性をもたらすBMRSの2つの実現法を提案する。 1) BMRS_Nは、切断対数正規事前分布を用い、閾値を調整することなく、信頼性の高い圧縮率と精度を提供する。 2) BMRS_Uは、切断対数一様事前分布を用い、切断の境界に基づいてより積極的な圧縮を実現できる。全体として、BMRSは、高い圧縮率と精度の両方をもたらすニューラルネットワークの構造化プルーニングに対して、理論的根拠のあるアプローチを提供する。複雑度の異なる複数のデータセットとニューラルネットワークの実験により、2つのBMRS法は、他のプルーニング法と比較して、競争力のある性能と効率のトレードオフを提供することが示された。 Modern neural networks are often massively overparameterized leading to high compute costs during training and at inference. One effective method to improve both the compute and energy efficiency of neural networks while maintaining good performance is structured pruning, where full network structures (e.g. neurons or convolutional filters) that have limited impact on the model output are removed. In this work, we propose Bayesian Model Reduction for Structured pruning (BMRS), a fully end-to-end Bayesian method of structured pruning. BMRS is based on two recent methods: Bayesian structured pruning with multiplicative noise, and Bayesian model reduction (BMR), a method which allows efficient comparison of Bayesian models under a change in prior. We present two realizations of BMRS derived from different priors which yield different structured pruning characteristics: 1) BMRS_N with the truncated log-normal prior, which offers reliable compression rates and accuracy without the need for tuning any thresholds and 2) BMRS_U with the truncated log-uniform prior that can achieve more aggressive compression based on the boundaries of truncation. Overall, we find that BMRS offers a theoretically grounded approach to structured pruning of neural networks yielding both high compression rates and accuracy. Experiments on multiple datasets and neural networks of varying complexity showed that the two BMRS methods offer a competitive performance-efficiency trade-off compared to other pruning methods.	翻訳日:2024-06-05 22:49:47 公開日:2024-06-03
# 制御可能な長ビデオ生成によるエンド・ツー・エンド自律運転の一般化の解放 Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation ( http://arxiv.org/abs/2406.01349v1 ) ライセンス: Link先を確認	Enhui Ma, Lijun Zhou, Tao Tang, Zhan Zhang, Dong Han, Junpeng Jiang, Kun Zhan, Peng Jia, Xianpeng Lang, Haiyang Sun, Di Lin, Kaicheng Yu,	(参考訳) 生成モデルを使用して新しいデータを合成することは、データ不足問題に対処する自律運転におけるデファクトスタンダードとなっている。既存の手法は知覚モデルを向上させることができるが、生成されるビデオは通常8フレーム未満であり、空間的および時間的な不整合も無視できないため、エンド・ツー・エンドの自律運転モデルの計画性能を向上できないことを我々は見出した。この目的のために,空間的整合性を高めるために複数視点で共有されるノイズモデリング機構と,正確な制御性と時間的整合性を両立する特徴整合モジュールを備えた,拡散ベースの新しい長ビデオ生成手法であるDelphiを提案する。本手法は,一貫性を損なうことなく最大40フレームの映像を生成することができ,これは最先端の手法に比べて約5倍長い。我々は、新しいデータをランダムに生成する代わりに、サンプル効率を改善するために、失敗ケースに類似した新しいデータをDelphiが生成できるようにサンプリングポリシーを設計する。これは、事前トレーニングされた視覚言語モデルの助けを借りて、失敗ケース駆動フレームワークを構築することで実現される。我々の大規模な実験は、Delphiが従来の最先端の手法を超える、より高品質な長ビデオを生成することを示した。その結果、トレーニングデータセットのサイズの4%に相当するデータを生成するだけで、我々のフレームワークは認識と予測タスクを越えて、我々の知る限りでは初めて、エンドツーエンドの自動運転モデルの計画性能を25%向上させることができる。 Using generative models to synthesize new data has become a de-facto standard in autonomous driving to address the data scarcity issue. Though existing approaches are able to boost perception models, we discover that these approaches fail to improve the performance of planning of end-to-end autonomous driving models as the generated videos are usually less than 8 frames and the spatial and temporal inconsistencies are not negligible. To this end, we propose Delphi, a novel diffusion-based long video generation method with a shared noise modeling mechanism across the multi-views to increase spatial consistency, and a feature-aligned module to achieve both precise controllability and temporal consistency. Our method can generate up to 40 frames of video without loss of consistency, which is about 5 times longer than state-of-the-art methods. Instead of randomly generating new data, we further design a sampling policy to let Delphi generate new data that are similar to those failure cases to improve the sample efficiency. This is achieved by building a failure-case driven framework with the help of pre-trained visual language models. Our extensive experiments demonstrate that Delphi generates higher-quality long videos, surpassing previous state-of-the-art methods. Consequently, with only generating 4% of the training dataset size, our framework is able to go beyond perception and prediction tasks, for the first time to the best of our knowledge, boost the planning performance of the end-to-end autonomous driving model by a margin of 25%.	翻訳日:2024-06-05 22:49:47 公開日:2024-06-03
# 量子回路による置換群の実現 Realization of permutation groups by quantum circuit ( http://arxiv.org/abs/2406.01350v1 ) ライセンス: Link先を確認	Junchi Liu, Yangyang Ren, Yan Cao, Hanyi Sun, Lin Chen,	(参考訳) 本稿では、2つ以上の元によって生成される置換群の実装に、CNOTゲートのみを用いる。 Lemma 1では、2量子ビットのスワップゲート操作を実行するには3つのCNOTゲートが必要十分であることを振り返る。続いてLemma 2では、n量子ビットの置換操作を行うために必要なCNOTゲートの最大数が3(n-1)であることを示す。第3節では、5つ以下のCNOTゲートでは、置換元(123)に対応する3量子ビットスワップゲートの実装に不十分であることを明らかにする。したがって、(123)の実装には6つのCNOTゲートが必要十分である。これは、グラフ理論的アプローチを用いて、高々5つのCNOTゲートに関する結果を厳密に検証することで示される。計算ツールを用いて、ちょうど6つのCNOTゲートを含む有効な回路図を網羅的に探索して(123)のスワップゲートを実行し、Remark 6とTable 2で同値類を説明する。これらをTheorem 7にまとめる。解析を多量子ビットのシナリオに拡張するために、Definition 8で可約および既約な置換元を導入する。多量子ビット空間における行間の同値性を明確にし、Theorem 9では、多量子ビットに対して上記の操作を行うための近似的な上界を与える。本論文の包括的な探索は、特定の2量子ビットゲートを複数回用いることによる量子回路最適化の理解をさらに進める道を開くことを目的としている。 In this paper, we exclusively utilize CNOT gates for implementing permutation groups generated by more than two elements. In Lemma 1, we recall that three CNOT gates are both necessary and sufficient to execute a two-qubit swap gate operation. Subsequently, in Lemma 2, we show that the maximum number of CNOT gates needed to carry out an n-qubit substitution operation is 3(n-1). Moving forward, our analysis in Section 3 reveals that utilizing five or fewer CNOT gates is insufficient for implementing a three-qubit swap gate corresponding to the permutation element (123). Hence six CNOT gates are both necessary and sufficient for implementing (123). This is done by employing a graph-theoretic approach to rigorously validate the results in terms of at most five CNOT gates. Using computational tools, we exhaustively explore all valid circuit diagrams containing exactly six CNOT gates to successfully execute the swap gate for (123), by explaining the equivalence classes in Remark 6 and Table 2. We conclude them in Theorem 7. To extend our analysis to the multiqubit scenario, we present the reducible and irreducible permutation elements in Definition 8. We clarify the equivalence between rows in the multi-qubit space and provide an approximate upper bound for multi-qubits to perform the aforementioned operations in Theorem 9. The comprehensive exploration of this paper aims to pave the way for further advancements in understanding quantum circuit optimization via multiple use of a specific two-qubit gate.	翻訳日:2024-06-05 22:49:47 公開日:2024-06-03
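The sufficiency half of the counts quoted in the abstract above can be checked on computational basis states with a small sketch: three CNOT gates realize a two-qubit swap, and two such swaps (six CNOTs, matching 3(n-1) for n = 3) realize a three-qubit cyclic permutation. The function names and the 0-indexed qubit convention are assumptions for illustration. The sketch does not address the paper's lower-bound argument that five or fewer CNOTs are insufficient.

```python
from itertools import product


def cnot(bits, control, target):
    """Apply a CNOT to a tuple of classical bits: target ^= control."""
    out = list(bits)
    out[target] ^= out[control]
    return tuple(out)


def swap_via_cnots(bits, a, b):
    """Swap qubits a and b with exactly three CNOTs: (a,b), (b,a), (a,b)."""
    for control, target in ((a, b), (b, a), (a, b)):
        bits = cnot(bits, control, target)
    return bits


def three_cycle_via_cnots(bits):
    """Cyclic permutation of three qubits built from two swaps (six CNOTs)."""
    bits = swap_via_cnots(bits, 0, 1)
    bits = swap_via_cnots(bits, 1, 2)
    return bits


# A circuit of CNOTs permutes computational basis states, so checking every
# basis state determines the whole unitary.
for state in product((0, 1), repeat=3):
    print(state, "->", three_cycle_via_cnots(state))
```

Each basis state `(x0, x1, x2)` is mapped to `(x1, x2, x0)`, which is a cyclic shift of the three qubits.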
# ポジションペーパー:認知神経科学の教訓に触発されたAIの内的解釈可能性フレームワーク Position Paper: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience ( http://arxiv.org/abs/2406.01352v1 ) ライセンス: Link先を確認	Martina G. Vilas, Federico Adolfi, David Poeppel, Gemma Roig,	(参考訳) 内的解釈可能性(Inner Interpretability)は、AIシステムの内部メカニズムを明らかにすることを任務とする、有望な新興分野であるが、これらの機械論的理論をどのように発展させるかについては、いまだ多くの議論がある。さらに、最近の批判は、AIのより広い目標を前進させる上でのその有用性に疑問を投げかける問題を提起している。しかし、これらの問題が別の分野、すなわち認知神経科学が取り組んできた問題と類似していることは見過ごされてきた。ここでは、関連するつながりを描き、分野間で生産的に移転できる教訓を強調する。これらに基づいて,AIの内的解釈可能性研究において機械論的説明を構築するための,一般的な概念的枠組みを提案し,具体的な方法論的戦略を示す。この概念的枠組みによって、内的解釈可能性は批判をかわし、AIシステムを説明する生産的な道筋に自らを位置づけることができる。 Inner Interpretability is a promising emerging field tasked with uncovering the inner mechanisms of AI systems, though how to develop these mechanistic theories is still much debated. Moreover, recent critiques raise issues that question its usefulness to advance the broader goals of AI. However, it has been overlooked that these issues resemble those that have been grappled with in another field: Cognitive Neuroscience. Here we draw the relevant connections and highlight lessons that can be transferred productively between fields. Based on these, we propose a general conceptual framework and give concrete methodological strategies for building mechanistic explanations in AI inner interpretability research. With this conceptual framework, Inner Interpretability can fend off critiques and position itself on a productive path to explain AI systems.	翻訳日:2024-06-05 22:49:47 公開日:2024-06-03
# 拡散モデルの差分プライベート微調整 Differentially Private Fine-Tuning of Diffusion Models ( http://arxiv.org/abs/2406.01355v1 ) ライセンス: Link先を確認	Yu-Lin Tsai, Yizhe Li, Zekai Chen, Po-Yu Chen, Chia-Mu Yu, Xuebin Ren, Francois Buet-Golfouse,	(参考訳) 差分プライバシー(DP)と拡散モデル(DM)の統合は、特にかなりのプライバシーリスクをもたらすDMの記憶能力のために、有望だが挑戦的なフロンティアを示す。差分プライバシーは、モデルトレーニング中に個々のデータポイントを保護するための厳格なフレームワークを提供し、差分プライバシー確率的勾配降下法(DP-SGD)がその代表的な実装である。拡散法は画像生成を反復的なステップに分解し、理論的にはDPの段階的なノイズ付加とよく整合する。自然に適合しているにもかかわらず、DMの独特なアーキテクチャは、プライバシーとユーティリティのトレードオフを効果的にバランスさせるために、専用に設計されたアプローチを必要とする。この分野での最近の進歩は、公開データ(ImageNet)で事前学習し、プライベートデータで微調整することで高品質な合成データを生成する可能性を強調しているが、DP設定におけるトレードオフの最適化、特にパラメータ効率とモデルのスケーラビリティに関する研究には、明らかなギャップがある。我々の研究は、プライベート拡散モデルに最適化されたパラメータ効率の良い微調整戦略を提案し、トレーニング可能なパラメータの数を最小化することでプライバシーとユーティリティのトレードオフを改善する。提案手法がDP合成において最先端の性能を達成し,広く研究されているデータセットで従来のベンチマークを大きく上回ることを実証的に示す(例えば,トレーニング可能なパラメータがわずか0.47Mで,CelebA-64データセットにおいて小さなプライバシ予算の下,従来の最先端手法より35%以上の改善を達成する)。匿名化されたコードはhttps://anonymous.4open.science/r/DP-LORA-F02Fで入手できる。 The integration of Differential Privacy (DP) with diffusion models (DMs) presents a promising yet challenging frontier, particularly due to the substantial memorization capabilities of DMs that pose significant privacy risks. Differential privacy offers a rigorous framework for safeguarding individual data points during model training, with Differential Privacy Stochastic Gradient Descent (DP-SGD) being a prominent implementation. Diffusion methods decompose image generation into iterative steps, theoretically aligning well with DP's incremental noise addition. Despite the natural fit, the unique architecture of DMs necessitates tailored approaches to effectively balance the privacy-utility trade-off. Recent developments in this field have highlighted the potential for generating high-quality synthetic data by pre-training on public data (i.e., ImageNet) and fine-tuning on private data; however, there is a pronounced gap in research on optimizing the trade-offs involved in DP settings, particularly concerning parameter efficiency and model scalability. Our work addresses this by proposing a parameter-efficient fine-tuning strategy optimized for private diffusion models, which minimizes the number of trainable parameters to enhance the privacy-utility trade-off. We empirically demonstrate that our method achieves state-of-the-art performance in DP synthesis, significantly surpassing previous benchmarks on widely studied datasets (e.g., with only 0.47M trainable parameters, achieving a more than 35% improvement over the previous state-of-the-art with a small privacy budget on the CelebA-64 dataset). Anonymized code is available at https://anonymous.4open.science/r/DP-LORA-F02F.	翻訳日:2024-06-05 22:49:47 公開日:2024-06-03
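The abstract above names DP-SGD as the standard private training mechanism. As a rough illustration of why fewer trainable parameters can help, here is a minimal sketch of one generic DP-SGD gradient aggregation step: clip each per-example gradient, sum, add Gaussian noise, then average. The function names and the plain-list gradient representation are assumptions. This is the textbook mechanism, not the authors' fine-tuning strategy, and it does no privacy accounting.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_average(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each gradient, sum, add Gaussian noise, and average over the batch.

    Noise with standard deviation noise_multiplier * clip_norm is drawn
    independently for every coordinate, so the total injected noise grows
    with the number of trainable parameters.
    """
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, clip_norm)):
            summed[i] += g
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
print(dp_sgd_average(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Because noise is added per coordinate, restricting updates to a small set of parameters is one plausible reading of how parameter efficiency interacts with the privacy-utility trade-off described in the abstract.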
# MP-PolarMask: 凹形状画像のためのより高速で精緻なインスタンスセグメンテーション MP-PolarMask: A Faster and Finer Instance Segmentation for Concave Images ( http://arxiv.org/abs/2406.01356v1 ) ライセンス: Link先を確認	Ke-Lei Wang, Pin-Hsuan Chou, Young-Ching Chou, Chia-Jen Liu, Cheng-Kuan Lin, Yu-Chee Tseng,	(参考訳) インスタンスセグメンテーションには多くのモデルがあるが、PolarMaskは、極座標系によって物体を表すユニークなモデルとして際立っている。アンカーボックスフリーの設計と、検出とセグメンテーションを一度に行う単一ステージのフレームワークにより、PolarMaskは効率と精度のバランスをとれることが示されている。したがって、下流の他のリアルタイムアプリケーションと容易に接続できる。本研究では,PolarMaskに関連する2つの欠陥を指摘する。 (i)凹物体を表現できないこと、及び (ii)レイ回帰の利用における非効率性である。複数の極座標系を利用するMP-PolarMask(Multi-Point PolarMask)を提案する。主なアイデアは、1つの主極座標系を4つの補助極座標系へと拡張し、凸と凹が混在するより複雑な形状を表現できるようにすることである。我々はMP-PolarMaskをCOCOデータセットの一般オブジェクトと食品オブジェクトの両方で検証し、36本のレイを用いたPolarMaskに対してAP_Lで13.69%、APで7.23%の大幅な改善を示した。 While there are many models for instance segmentation, PolarMask stands out as a unique one that represents an object by a Polar coordinate system. With an anchor-box-free design and a single-stage framework that conducts detection and segmentation at once, PolarMask has been shown to balance efficiency and accuracy. Hence, it can be easily connected with other downstream real-time applications. In this work, we observe that there are two deficiencies associated with PolarMask: (i) inability to represent concave objects and (ii) inefficiency in using ray regression. We propose MP-PolarMask (Multi-Point PolarMask) by taking advantage of multiple Polar systems. The main idea is to extend from one main Polar system to four auxiliary Polar systems, making it capable of representing more complicated convex-and-concave-mixed shapes. We validate MP-PolarMask on both general objects and food objects of the COCO dataset, and the results demonstrate significant improvement of 13.69% in AP_L and 7.23% in AP over PolarMask with 36 rays.	翻訳日:2024-06-05 22:49:47 公開日:2024-06-03
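To make the single-center polar representation concrete, here is a minimal sketch that converts a center point and a list of ray lengths into contour vertices. The function name `polar_to_contour` and the convention of evenly spaced angles starting at 0 radians are assumptions made for illustration, not details taken from the paper. Since such a representation stores exactly one boundary point per angle, it can only describe shapes whose boundary is hit once by each ray from the center, which is why concave objects are problematic for a single polar system.

```python
import math


def polar_to_contour(center, ray_lengths):
    """Convert a center and evenly spaced ray lengths into (x, y) contour points.

    Hypothetical helper: ray k is assumed to point at angle 2*pi*k/n.
    """
    cx, cy = center
    n = len(ray_lengths)
    points = []
    for k, length in enumerate(ray_lengths):
        theta = 2.0 * math.pi * k / n
        points.append((cx + length * math.cos(theta), cy + length * math.sin(theta)))
    return points


# 36 rays, 10 degrees apart, all of length 5: the vertices approximate a circle.
circle = polar_to_contour((100.0, 50.0), [5.0] * 36)
```

A multi-point scheme such as the one the abstract describes would combine several such polar systems with different centers; that combination step is not sketched here because the abstract does not specify it.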
# スピナーボース-アインシュタイン凝縮と電流-密度相互作用 Spinor Bose-Einstein condensates subject to current-density interactions ( http://arxiv.org/abs/2406.01357v1 ) ライセンス: Link先を確認	Maria Arazo, Montserrat Guilleumas, Ricardo Mayol, Vicente Delgado, Antonio Muñoz Mateo,	(参考訳) 最近達成されたキラル凝縮物は、電流-密度相互作用によって誘導されるキラル特性の研究に興味深い道を開いた。これらの特徴をスピノル系に含めようとする試みは、スピン成分間の線形結合による保存量の制約とともに、微分軌道電流から生じる非線形で効果的なスピン軌道結合をもたらす。キラリティは、表面波、暗いソリトンと明るいソリトン、ジョセフソン渦で探索される定常状態のスペクトルとその動的安定性に及んでいる。我々の解析的および数値的な結果は、偏極とジョセフソン電流の不安定化の役割を明らかにし、平面波の線形重ね合わせで構築された安定な非線形状態の存在を支持する。 Recently achieved chiral condensates open intriguing avenues for the study of the chiral properties induced by current-density interactions. An attempt to include these features in a spinor system is presented, which gives rise to a nonlinear, effective spin-orbit coupling that emerges from the differential orbital currents, along with constraints in the conserved quantities due to the linear coupling between spin components. Chirality pervades the resulting spectrum of stationary states and their dynamical stability, which are explored in plane waves, dark and bright solitons, and Josephson vortices. Our analytical and numerical results reveal the destabilizing role of polarization and Josephson currents, and support the existence of stable nonlinear states built of linear superpositions of plane waves.	翻訳日:2024-06-05 22:49:47 公開日:2024-06-03
# トークンの世界でAtariをプレイすることを学ぶ Learning to Play Atari in a World of Tokens ( http://arxiv.org/abs/2406.01361v1 ) ライセンス: Link先を確認	Pranav Agarwal, Sheldon Andrews, Samira Ebrahimi Kahou,	(参考訳) トランスフォーマーを利用するモデルベース強化学習エージェントは、拡張コンテキストをモデル化する能力によりサンプル効率が向上し、より正確な世界モデルが得られることが示されている。しかし、複雑な推論や計画タスクでは、これらの手法は主に連続的な表現に依存している。これは、補間が妥当でない互いに素な物体クラスのような、実世界の離散的性質のモデリングを複雑にする。本研究では,世界のモデル化と行動の学習の両方に離散表現を利用するサンプル効率の高い手法である,トランスフォーマーベース学習のための離散抽象表現(DART)を紹介する。本研究では,自己回帰的世界モデリングのためのトランスフォーマー・デコーダと,世界モデルの離散表現におけるタスク関連の手がかりに注意を向けることで行動を学習するトランスフォーマー・エンコーダを組み込んだ。部分的な可観測性を扱うために、過去の時間ステップの情報をメモリトークンとして集約する。 DARTは、Atari 100kサンプル効率ベンチマークにおいて、ルックアヘッド検索を使用しない従来の最先端手法を上回り、人間正規化スコアの中央値は0.790で、26ゲーム中9ゲームで人間に勝っている。コードをhttps://pranaval.github.io/DART/でリリースします。 Model-based reinforcement learning agents utilizing transformers have shown improved sample efficiency due to their ability to model extended context, resulting in more accurate world models. However, for complex reasoning and planning tasks, these methods primarily rely on continuous representations. This complicates modeling of discrete properties of the real world such as disjoint object classes between which interpolation is not plausible. In this work, we introduce discrete abstract representations for transformer-based learning (DART), a sample-efficient method utilizing discrete representations for modeling both the world and learning behavior. We incorporate a transformer-decoder for auto-regressive world modeling and a transformer-encoder for learning behavior by attending to task-relevant cues in the discrete representation of the world model. For handling partial observability, we aggregate information from past time steps as memory tokens. DART outperforms previous state-of-the-art methods that do not use look-ahead search on the Atari 100k sample efficiency benchmark with a median human-normalized score of 0.790 and beats humans in 9 out of 26 games. We release our code at https://pranaval.github.io/DART/.	翻訳日:2024-06-05 22:49:47 公開日:2024-06-03
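The headline numbers above are reported as a median human-normalized score and a count of games where the agent beats humans. The sketch below shows how these two summaries are commonly computed on Atari benchmarks, assuming the usual definition (agent - random) / (human - random). The per-game scores in the example are made up for illustration and are not results from the paper; the function names are hypothetical.

```python
from statistics import median


def human_normalized_score(agent, random_score, human):
    """Commonly used normalization: 0 matches a random policy, 1 matches the human reference."""
    return (agent - random_score) / (human - random_score)


def summarize(raw_scores):
    """raw_scores maps game name -> (agent, random, human) raw scores."""
    normalized = {game: human_normalized_score(*s) for game, s in raw_scores.items()}
    # A game counts as "superhuman" when the normalized score exceeds 1.
    superhuman = sum(1 for value in normalized.values() if value > 1.0)
    return median(normalized.values()), superhuman


# Hypothetical raw scores for three fictional games.
example = {
    "game_a": (50.0, 0.0, 100.0),    # normalized 0.5
    "game_b": (300.0, 100.0, 200.0), # normalized 2.0
    "game_c": (10.0, 10.0, 110.0),   # normalized 0.0
}
median_hns, games_above_human = summarize(example)
```

The median is used rather than the mean because a few games with very large normalized scores would otherwise dominate the summary.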
# LLMによるリコメンデーションのプライバシ:最近の進歩と今後の方向性 Privacy in LLM-based Recommendation: Recent Advances and Future Directions ( http://arxiv.org/abs/2406.01363v1 ) ライセンス: Link先を確認	Sichun Luo, Wei Shao, Yuxuan Yao, Jian Xu, Mingyang Liu, Qintong Li, Bowei He, Maolin Wang, Guanzhi Deng, Hanxu Hou, Xinyi Zhang, Linqi Song,	(参考訳) 近年,大規模言語モデル (LLM) と従来のレコメンデーションモデルが統合され,レコメンデーション性能が向上している。しかしながら、既存の作業の多くはモデルパフォーマンスの改善に重点を置いているものの、プライバシ問題は比較的少ない関心しか寄せられていない。本稿では,LLMに基づくレコメンデーションにおけるプライバシの最近の進歩を概観し,プライバシ攻撃と保護機構に分類する。さらに、いくつかの課題を強調し、これらの重要な問題に対処するためのコミュニティの今後の方向性を提案する。 Nowadays, large language models (LLMs) have been integrated with conventional recommendation models to improve recommendation performance. However, while most of the existing works have focused on improving the model performance, the privacy issue has only received comparatively less attention. In this paper, we review recent advancements in privacy within LLM-based recommendation, categorizing them into privacy attacks and protection mechanisms. Additionally, we highlight several challenges and propose future directions for the community to address these critical problems.	翻訳日:2024-06-05 22:49:47 公開日:2024-06-03
# BELLS: LLMセーフガードの評価のための将来も通用するベンチマークに向けたフレームワーク BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards ( http://arxiv.org/abs/2406.01364v1 ) ライセンス: Link先を確認	Diego Dorn, Alexandre Variengien, Charbel-Raphaël Segerie, Vincent Corruble,	(参考訳) 入出力セーフガードは、大規模言語モデル(LLM)システムによって生成されたトレースの異常を検出するために使用される。これらの検出器は、リアルタイム監視、トレースのオフライン評価、コンテンツモデレーションなど、多様な安全クリティカルなアプリケーションの中核にある。しかし、それらを評価するための広く認知された方法論は存在しない。このギャップを埋めるために,LLMセーフガード評価のためのベンチマーク(BELLS)を紹介する。BELLSは次の3つのカテゴリに編成された構造化されたテスト群である。(1) 確立された障害テスト: 明確に定義された障害モードに関する既存のベンチマークに基づき,現行の入出力セーフガードの性能を比較することを目的とする。(2) 新興の障害テスト: 未知の障害モードへの一般化を計測し,より汎用的なセーフガードの開発を促す。(3) 次世代アーキテクチャテスト: より複雑なスキャフォールディング(LLMエージェントやマルチエージェントシステムなど)を対象とし,現在セーフガードが存在しない将来のアプリケーションに適応可能なセーフガードの開発を促進することを目的とする。さらに、MACHIAVELLI環境を使用した最初の次世代アーキテクチャテストを実装して共有し、データセットのインタラクティブな可視化も提供する。 Input-output safeguards are used to detect anomalies in the traces produced by Large Language Model (LLM) systems. These detectors are at the core of diverse safety-critical applications such as real-time monitoring, offline evaluation of traces, and content moderation. However, there is no widely recognized methodology to evaluate them. To fill this gap, we introduce the Benchmarks for the Evaluation of LLM Safeguards (BELLS), a structured collection of tests, organized into three categories: (1) established failure tests, based on already-existing benchmarks for well-defined failure modes, aiming to compare the performance of current input-output safeguards; (2) emerging failure tests, to measure generalization to never-seen-before failure modes and encourage the development of more general safeguards; (3) next-gen architecture tests, for more complex scaffolding (such as LLM-agents and multi-agent systems), aiming to foster the development of safeguards that could adapt to future applications for which no safeguard currently exists. 
Furthermore, we implement and share the first next-gen architecture test, using the MACHIAVELLI environment, along with an interactive visualization of the dataset.	翻訳日:2024-06-05 22:49:47 公開日:2024-06-03
# 特徴可視化から視覚回路へ:逆モデル操作の効果 From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation ( http://arxiv.org/abs/2406.01365v1 ) ライセンス: Link先を確認	Geraldin Nanfack, Michael Eickenberg, Eugene Belilovsky,	(参考訳) 大規模ディープニューラルネットワークの内部動作を理解することは、重大な結果を伴ういくつかのアプリケーションにおいて、難しいが重要な課題である。機械論的解釈可能性(Mechanistic Interpretability)は、ディープニューラルネットワーク内の人間が理解可能なサブグラフ(回路と呼ばれる)を特定することなどにより、この課題に取り組む新興分野である。視覚事前学習モデルでは、これらのサブグラフは通常、特徴可視化と呼ばれる一般的な手法でノードの特徴を可視化することで解釈される。近年の研究では, 異なる特徴可視化型の安定性を, 敵対的モデル操作フレームワーク下で解析している。本稿では,2種類の特徴可視化を同時に操作するProxPulseと呼ばれる新たな攻撃を提案することによって,既存の作業の限界に対処することから始める。驚くべきことに、これらの攻撃を視覚回路の観点から分析すると、視覚回路がProxPulseに対してある程度頑健であることが分かる。そこで我々は、ProxPulseに基づく新たな攻撃を導入し、視覚回路が操作可能であることを明らかにし、堅牢性の欠如に光を当てる。これらの攻撃の有効性は、ImageNet上でトレーニング済みのAlexNetとResNet-50モデルを使用して検証される。 Understanding the inner workings of large-scale deep neural networks is challenging yet crucial in several high-stakes applications. Mechanistic interpretability is an emerging field that tackles this challenge, often by identifying human-understandable subgraphs in deep neural networks known as circuits. In vision-pretrained models, these subgraphs are usually interpreted by visualizing their node features through a popular technique called feature visualization. Recent works have analyzed the stability of different feature visualization types under the adversarial model manipulation framework. This paper starts by addressing limitations in existing works by proposing a novel attack called ProxPulse that simultaneously manipulates the two types of feature visualizations. Surprisingly, when analyzing these attacks under the umbrella of visual circuits, we find that visual circuits show some robustness to ProxPulse. We, therefore, introduce a new attack based on ProxPulse that unveils the manipulability of visual circuits, shedding light on their lack of robustness. The effectiveness of these attacks is validated using pre-trained AlexNet and ResNet-50 models on ImageNet.	翻訳日:2024-06-05 22:49:47 公開日:2024-06-03
# D-CPT法:大規模言語モデルのドメイン固有連続事前学習法 D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models ( http://arxiv.org/abs/2406.01375v1 ) ライセンス: Link先を確認	Haoran Que, Jiaheng Liu, Ge Zhang, Chenchen Zhang, Xingwei Qu, Yinghao Ma, Feiyu Duan, Zhiqi Bai, Jiakai Wang, Yuanxing Zhang, Xu Tan, Jie Fu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng,	(参考訳) 大規模言語モデル(LLM)におけるCPT(Continuous Pre-Training)は、特定の下流ドメイン(例えば、数学やコード)に対するモデルの基本的理解を拡大するために広く用いられている。ドメイン固有LLMに関するCPTでは、一般コーパス(例えば、Dolma、Slim-pajama)と下流ドメインコーパスの最適混合比をどのように選択するかが重要な問題である。既存の手法では、GPUトレーニングのコストが高い混合比のセットをグリッドサーチすることで、退屈な人間の努力を採用するのが一般的である。さらに、選択された比率が特定の領域に最適であることを保証できない。性能予測のためのスケーリング法(Scaling Law for Performance Prediction)に触発された既存手法の限界に対処するため,ドメイン固有連続事前学習法(D-CPT Law)のスケーリング法を検討し,異なるサイズのLCMに対して許容するトレーニングコストと最適混合比を決定することを提案する。具体的には、D-CPT法を適用すれば、任意の混合比、モデルサイズ、データセットサイズの一般および下流性能を、限られた実験において小規模のトレーニングコストを用いて容易に予測できる。さらに、クロスドメイン設定に関する標準D-CPT法を拡張し、ターゲットドメインのD-CPT法を予測するクロスドメインD-CPT法を提案し、ターゲットドメインに対して非常に少ないトレーニングコスト(通常のトレーニングコストの約1%)が必要となる。 6つの下流領域における総合的な実験結果から,提案したD-CPT法とクロスドメインD-CPT法の有効性と一般化性を示した。 Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to expand the model's fundamental understanding of specific downstream domains (e.g., math and code). For the CPT on domain-specific LLMs, one important question is how to choose the optimal mixture ratio between the general-corpus (e.g., Dolma, Slim-pajama) and the downstream domain-corpus. Existing methods usually adopt laborious human efforts by grid-searching on a set of mixture ratios, which require high GPU training consumption costs. Besides, we cannot guarantee the selected ratio is optimal for the specific domain. To address the limitations of existing methods, inspired by the Scaling Law for performance prediction, we propose to investigate the Scaling Law of the Domain-specific Continual Pre-Training (D-CPT Law) to decide the optimal mixture ratio with acceptable training costs for LLMs of different sizes. 
Specifically, by fitting the D-CPT Law, we can predict the general and downstream performance for arbitrary mixture ratios, model sizes, and dataset sizes from a limited number of small-scale training runs. Moreover, we extend the standard D-CPT Law to cross-domain settings and propose the Cross-Domain D-CPT Law, which predicts the D-CPT Law of a target domain at very small training cost (about 1% of the normal training cost). Comprehensive experimental results on six downstream domains demonstrate the effectiveness and generalizability of our proposed D-CPT Law and Cross-Domain D-CPT Law.	翻訳日:2024-06-05 22:49:47 公開日:2024-06-03
# 時間的コントラスト学習によるマルチエージェントトランスファー学習 Multi-Agent Transfer Learning via Temporal Contrastive Learning ( http://arxiv.org/abs/2406.01377v1 ) ライセンス: Link先を確認	Weihao Zeng, Joseph Campbell, Simon Stepputtis, Katia Sycara,	(参考訳) 本稿では,深層多エージェント強化学習のための新しい伝達学習フレームワークを提案する。このアプローチは、ゴール条件付きポリシーと時間的コントラスト学習を自動的に組み合わせて、意味のあるサブゴールを発見する。このアプローチでは、目標条件付きエージェントを事前トレーニングし、ターゲットドメイン上でそれを微調整し、対照的な学習を使用して、サブゴールを介してエージェントをガイドする計画グラフを構築する。オーバークッキングタスクによるマルチエージェント協調実験では、サンプル効率の向上、スパース・リワードとロングホライゾンの問題を解決する能力、ベースラインと比較して解釈可能性の向上が示されている。その結果、複雑なマルチエージェント変換学習において、目標条件付きポリシーと教師なし時間的抽象学習を統合することの有効性を強調した。最先端のベースラインと比較して,本手法はトレーニングサンプルの21.7%しか必要とせず,同等あるいはより良い性能を実現している。 This paper introduces a novel transfer learning framework for deep multi-agent reinforcement learning. The approach automatically combines goal-conditioned policies with temporal contrastive learning to discover meaningful sub-goals. The approach involves pre-training a goal-conditioned agent, finetuning it on the target domain, and using contrastive learning to construct a planning graph that guides the agent via sub-goals. Experiments on multi-agent coordination Overcooked tasks demonstrate improved sample efficiency, the ability to solve sparse-reward and long-horizon problems, and enhanced interpretability compared to baselines. The results highlight the effectiveness of integrating goal-conditioned policies with unsupervised temporal abstraction learning for complex multi-agent transfer learning. Compared to state-of-the-art baselines, our method achieves the same or better performances while requiring only 21.7% of the training samples.	翻訳日:2024-06-05 22:49:47 公開日:2024-06-03
# オフライン意思決定における学習可能性の理論 A Theory of Learnability for Offline Decision Making ( http://arxiv.org/abs/2406.01378v1 ) ライセンス: Link先を確認	Chenjie Mao, Qiaosheng Zhang,	(参考訳) 本稿では,学習目標に部分的にしか相関しないデータセットから決定を学習することに焦点を当てたオフライン意思決定の課題について検討する。従来の研究では、オフライン強化学習(RL)やオフポリシー評価(OPE)といった特定のオフライン意思決定問題について広範囲に研究されてきたが、統一された枠組みと理論はいまだに存在しない。このギャップに対処するために、オフラインRL、OPE、オフライン部分観測可能マルコフ決定過程(POMDP)を含む幅広いオフライン意思決定問題を捉える、DMOF(Decision Making with Offline Feedback)と呼ばれる統合フレームワークを導入する。 DMOF フレームワークでは,オフライン推定係数 (OEC) と呼ばれる,オフライン意思決定問題の学習可能性を測定し,導出したミニマックス下界にも反映される難易度尺度を導入する。さらに、EDD(Empirical Decision with Divergence)と呼ばれるアルゴリズムを導入し、インスタンス依存上界とミニマックス上界の両方を確立する。ミニマックス上界は、OECによって決定される下界とほぼ一致する。最後に, 教師付き学習や部分的カバレッジの下でのマルコフ的逐次問題(例えば, MDP)などの特定の設定に対して, EDD が高速な収束率(すなわち,サンプルサイズを$N$として$1/N$でスケールする収束率)を達成することを示す。 We study the problem of offline decision making, which focuses on learning decisions from datasets only partially correlated with the learning objective. While previous research has extensively studied specific offline decision making problems like offline reinforcement learning (RL) and off-policy evaluation (OPE), a unified framework and theory remain absent. To address this gap, we introduce a unified framework termed Decision Making with Offline Feedback (DMOF), which captures a wide range of offline decision making problems including offline RL, OPE, and offline partially observable Markov decision processes (POMDPs). For the DMOF framework, we introduce a hardness measure called the Offline Estimation Coefficient (OEC), which measures the learnability of offline decision making problems and is also reflected in the derived minimax lower bounds. Additionally, we introduce an algorithm called Empirical Decision with Divergence (EDD), for which we establish both an instance-dependent upper bound and a minimax upper bound. The minimax upper bound almost matches the lower bound determined by the OEC. 
Finally, we show that EDD achieves a fast convergence rate (i.e., a rate scaling as $1/N$, where $N$ is the sample size) for specific settings such as supervised learning and Markovian sequential problems (e.g., MDPs) with partial coverage.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
# 外周部を有する多物体追跡のための畳み込みアンセントカルマンフィルタ Convolutional Unscented Kalman Filter for Multi-Object Tracking with Outliers ( http://arxiv.org/abs/2406.01380v1 ) ライセンス: Link先を確認	Shiqi Liu, Wenhan Cao, Chang Liu, Tianyi Zhang, Shengbo Eben Li,	(参考訳) マルチオブジェクトトラッキング(MOT)は、自律運転におけるナビゲーションに不可欠な技術である。トラッキング・バイ・検出システムでは、複雑なトラフィックシナリオのため、バイアス、偽陽性、ミスが避けられない。最近の追跡手法は、これらのアウトリーチを見渡すフィルタリングアルゴリズムに基づいており、トラッキングの精度を低下させ、オブジェクトの軌道の損失も減少させる。この課題に対処するために、実測データの分布とフィルタリングに使用される名目計測モデルとの相違点として、外れ値の生成に関する確率論的視点を採用する。さらに、畳み込み操作を設計することで、この不特定性を緩和できることを実証する。一般に採用されている追跡アルゴリズムにおいて、この操作を広く使われているKalmanフィルタ(UKF)に組み込むと、UKF(Convolutional UKF)と呼ばれる外れ値に頑健なUKFの変種を導出する。本稿では,ConvUKFがガウス共役性を維持し,リアルタイムな追跡を可能にすることを示す。また,ConvUKFが外乱の存在下で有界な追従誤差を持つことも証明した。 KITTIおよびnuScenesデータセットの実験結果は、MOTタスクの代表的なベースラインアルゴリズムと比較して精度が向上した。 Multi-object tracking (MOT) is an essential technique for navigation in autonomous driving. In tracking-by-detection systems, biases, false positives, and misses, which are referred to as outliers, are inevitable due to complex traffic scenarios. Recent tracking methods are based on filtering algorithms that overlook these outliers, leading to reduced tracking accuracy or even loss of the objects trajectory. To handle this challenge, we adopt a probabilistic perspective, regarding the generation of outliers as misspecification between the actual distribution of measurement data and the nominal measurement model used for filtering. We further demonstrate that, by designing a convolutional operation, we can mitigate this misspecification. Incorporating this operation into the widely used unscented Kalman filter (UKF) in commonly adopted tracking algorithms, we derive a variant of the UKF that is robust to outliers, called the convolutional UKF (ConvUKF). We show that ConvUKF maintains the Gaussian conjugate property, thus allowing for real-time tracking. We also prove that ConvUKF has a bounded tracking error in the presence of outliers, which implies robust stability. 
The experimental results on the KITTI and nuScenes datasets show improved accuracy compared to representative baseline algorithms for MOT tasks.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
# 大規模言語モデルは人々の期待通りに機能するか? : 人間の一般化関数の測定 Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function ( http://arxiv.org/abs/2406.01382v1 ) ライセンス: Link先を確認	Keyon Vafa, Ashesh Rambachan, Sendhil Mullainathan,	(参考訳) 大きな言語モデル(LLM)を印象付けるのは、それらを評価するのが難しいことです。これらのモデルを評価するためには、それらの目的を理解する必要がある。我々は、これらのデプロイメント決定が人々によってなされる状況、特にLDMがうまく機能する場所についての人々の信念を考える。我々は、人間の一般化関数の結果としてのそのような信念をモデル化する: LLMが正しいか間違っているかを見て、人々はそれが成功する可能性のある場所を一般化する。 MMLUとBIG-Benchベンチマークから、79のタスクにまたがる一般化の例を19Kのデータセットで収集する。人間の一般化関数は NLP 法を用いて予測可能であることを示す。次に,人間の一般化関数とLCMのアライメントを評価する。我々の結果は、特にミスのコストが高い場合には、より有能なモデル(例えばGPT-4)は、人間の一般化関数に一致しないため、人々が使用するインスタンスに対して、より悪い結果をもたらすことを示しています。 What makes large language models (LLMs) impressive is also what makes them hard to evaluate: their diversity of uses. To evaluate these models, we must understand the purposes they will be used for. We consider a setting where these deployment decisions are made by people, and in particular, people's beliefs about where an LLM will perform well. We model such beliefs as the consequence of a human generalization function: having seen what an LLM gets right or wrong, people generalize to where else it might succeed. We collect a dataset of 19K examples of how humans make generalizations across 79 tasks from the MMLU and BIG-Bench benchmarks. We show that the human generalization function can be predicted using NLP methods: people have consistent structured ways to generalize. We then evaluate LLM alignment with the human generalization function. Our results show that -- especially for cases where the cost of mistakes is high -- more capable models (e.g. GPT-4) can do worse on the instances people choose to use them for, exactly because they are not aligned with the human generalization function.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
# 自律型身体システムにおける構造因果モデルの拡張 Extending Structural Causal Models for Use in Autonomous Embodied Systems ( http://arxiv.org/abs/2406.01384v1 ) ライセンス: Link先を確認	Rhys Howard, Lars Kunze,	(参考訳) 多くのドメインで因果推論技術を開発するために多くの研究がなされてきたが、自律システムにおける因果性の利用はまだ初期段階にある。自律システムは、構造因果モデル(SCM)のような表現を使用することによって因果関係の統合から大きな恩恵を受ける。このシステムには高いレベルの透明性が与えられ、結果のポストホックな説明を可能にし、外因性変数のオンライン推論を支援する。これらの性質は、自律システムに直接的な利益をもたらすか、公的信頼の構築と規制の通知における貴重なステップとなる。そこで本稿では,SCMからなるモジュールベース自律運転システムについて述べる。この課題にアプローチするには、非常に複雑で大きさのシステムを扱う場合、それ自身で長期にわたって運用する必要がある、多くの課題を考慮する必要がある。ここでは、これらの課題と、その解決策について説明する。ひとつはSCMのコンテキストで、残りは3つの新しい変数カテゴリで、そのうち2つは関数型プログラミングモナドに基づいています。最後に,自律運転システムの因果能力の応用例を示す。この例では,仮想道路衝突事故における車両エージェント間の透水性について考察する。 Much work has been done to develop causal reasoning techniques across a number of domains, however the utilisation of causality within autonomous systems is still in its infancy. Autonomous systems would greatly benefit from the integration of causality through the use of representations such as structural causal models (SCMs). The system would be afforded a higher level of transparency, it would enable post-hoc explanations of outcomes, and assist in the online inference of exogenous variables. These qualities are either directly beneficial to the autonomous system or a valuable step in building public trust and informing regulation. To such an end we present a case study in which we describe a module-based autonomous driving system comprised of SCMs. Approaching this task requires considerations of a number of challenges when dealing with a system of great complexity and size, that must operate for extended periods of time by itself. Here we describe these challenges, and present solutions. The first of these is SCM contexts, with the remainder being three new variable categories -- two of which are based upon functional programming monads. Finally, we conclude by presenting an example application of the causal capabilities of the autonomous driving system. 
In this example, we aim to attribute culpability between vehicular agents in a hypothetical road collision incident.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
# 複合型多言語帯域とエピソード強化学習への応用 Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond ( http://arxiv.org/abs/2406.01386v1 ) ライセンス: Link先を確認	Xutong Liu, Siwei Wang, Jinhang Zuo, Han Zhong, Xuchuang Wang, Zhiyong Wang, Shuai Li, Mohammad Hajiesmaili, John C. S. Lui, Wei Chen,	(参考訳) 本稿では,多変量および確率的トリガーアーム(CMAB-MT)を用いたCMAB(combinatorial multi-armed bandits)の新たな枠組みを紹介し,各アームの結果は$d$次元の多変量変数であり,フィードバックは一般的なアームトリガープロセスに従う。 CMAB-MTは既存のCMABと比べ、モデリング能力を高めるだけでなく、多変量確率変数の異なる統計特性を活用することで結果を改善することができる。 CMAB-MTに対して,確率変調スムーズな条件を誘導する一般1ノルム多変量法と,この条件に基づく楽観的なCUCB-MTアルゴリズムを提案する。提案手法は, 商品流通におけるエピソード強化学習 (RL) や確率的最大カバレッジなど, 上記の滑らかさ条件を満たすとともに, 既存の作品と比較して, 一致あるいは改善された後悔境界を達成できるような多くの重要な問題を含むことができる。我々の新しい枠組みにより、この2つの重要な方向の相互作用を促進するために、CMABのレンズを通して、エピソードRLを解くための新しい角度を提供することにより、エピソードRLとCMABの文献の間の最初の接続を構築する。 We introduce a novel framework of combinatorial multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT), where the outcome of each arm is a $d$-dimensional multivariant random variable and the feedback follows a general arm triggering process. Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging distinct statistical properties for multivariant random variables. For CMAB-MT, we propose a general 1-norm multivariant and triggering probability-modulated smoothness condition, and an optimistic CUCB-MT algorithm built upon this condition. Our framework can include many important problems as applications, such as episodic reinforcement learning (RL) and probabilistic maximum coverage for goods distribution, all of which meet the above smoothness condition and achieve matching or improved regret bounds compared to existing works. 
Through our new framework, we build the first connection between the episodic RL and CMAB literatures by offering a new angle for solving episodic RL through the lens of CMAB, which may encourage more interaction between these two important directions.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
# AutoStudio:マルチターンインタラクティブ画像生成における一貫性のある主題の作成 AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation ( http://arxiv.org/abs/2406.01388v1 ) ライセンス: Link先を確認	Junhao Cheng, Xi Lu, Hanhui Li, Khun Loun Zai, Baiqiao Yin, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang,	(参考訳) 最先端のテキスト・ツー・イメージ(T2I)生成モデルは、既に優れた単一画像の生成に優れており、さらに難しい課題であるマルチターン・インタラクティブな画像生成が、関連研究コミュニティの注目を集め始めている。このタスクでは、モデルが複数のターンでユーザーと対話し、一貫性のある画像列を生成する必要がある。しかし、ユーザが頻繁に主題を切り替える可能性があるため、現在の取り組みは多様な画像を生成しながら主題の一貫性を維持するのに苦労している。この問題に対処するために、AutoStudioと呼ばれるトレーニング不要のマルチエージェントフレームワークを導入する。 AutoStudioは、対話を処理するために大規模言語モデル(LLM)に基づく3つのエージェントと、高品質な画像を生成するためのStable Diffusion(SD)ベースのエージェントを使用している。具体的には、AutoStudioは (i)対話を解釈し、各主題の文脈を管理する主題マネージャ、(ii)被写体位置を制御するためのきめ細かいバウンディングボックスを生成するレイアウト生成器、(iii)レイアウト改良の提案をするスーパーバイザ、及び (iv)画像生成を完了させる描画エージェント(drawer)で構成される。さらに,描画エージェント内の従来のUNetを置き換えるParallel-UNetを導入する。これは、主題を考慮した特徴を活用するために2つの並列クロスアテンションモジュールを用いる。また,小さな被写体をより良く保持するための被写体初期化生成手法も導入した。これによりAutoStudioは,複数の主題を含む画像のシーケンスを対話的かつ一貫して生成することができる。パブリックなCMIGBenchベンチマークと人間による評価による大規模な実験では、AutoStudioは複数のターンにまたがって複数主題の一貫性を良好に維持し、最先端の性能を平均Frechet Inception Distanceで13.65%、平均キャラクター間類似度で2.83%向上させている。 As cutting-edge Text-to-Image (T2I) generation models already excel at producing remarkable single images, an even more challenging task, multi-turn interactive image generation, is beginning to attract the attention of related research communities. This task requires models to interact with users over multiple turns to generate a coherent sequence of images. However, since users may switch subjects frequently, current efforts struggle to maintain subject consistency while generating diverse images. To address this issue, we introduce a training-free multi-agent framework called AutoStudio. AutoStudio employs three agents based on large language models (LLMs) to handle interactions, along with a Stable Diffusion (SD)-based agent for generating high-quality images. 
Specifically, AutoStudio consists of (i) a subject manager that interprets interaction dialogues and manages the context of each subject, (ii) a layout generator that generates fine-grained bounding boxes to control subject locations, (iii) a supervisor that provides suggestions for layout refinements, and (iv) a drawer that completes image generation. Furthermore, we introduce a Parallel-UNet to replace the original UNet in the drawer, which employs two parallel cross-attention modules for exploiting subject-aware features. We also introduce a subject-initialized generation method to better preserve small subjects. AutoStudio can thereby generate a sequence of multi-subject images interactively and consistently. Extensive experiments on the public CMIGBench benchmark and human evaluations show that AutoStudio maintains multi-subject consistency well across multiple turns, and that it improves on the state of the art by 13.65% in average Frechet Inception Distance and 2.83% in average character-character similarity.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
# 潜在MDPにおけるRLは扱いやすい:オフポリシー評価によるオンライン保証 RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation ( http://arxiv.org/abs/2406.01389v1 ) ライセンス: Link先を確認	Jeongyeol Kwon, Shie Mannor, Constantine Caramanis, Yonathan Efroni,	(参考訳) 多くの実世界の決定問題では、部分的に観察された、隠された、あるいは潜在的な情報が、相互作用を通して固定されている。このような決定問題は、潜在マルコフ決定過程(LMDP)としてモデル化することができ、潜在変数は相互作用の開始時に選択され、エージェントには開示されない。過去10年間で、異なる構造的仮定の下でのLMDPの解法は著しく進歩した。しかし、一般のLMDPでは、既存の下界 \cite{kwon2021rl} と証明可能な形で一致する学習アルゴリズムは知られていない。我々は、付加的な構造仮定を伴わないLMDPのサンプル効率の高いアルゴリズムを初めて導入する。本研究は、LMDPにおけるオフポリシー評価の保証とカバレッジ係数の役割に関する新たな視点、すなわち部分的に観察された環境における探索の文脈で見落とされてきた視点に基づいている。具体的には,新たなオフポリシー評価の補題を確立し,LMDPに対する新しいカバレッジ係数を導入する。次に,これらを用いて,楽観的な探索アルゴリズムの最適に近い保証を導出する方法を示す。これらの結果は,LMDPを超えた幅広い対話型学習問題,特に部分的に観察された環境において有用であると考えられる。 In many real-world decision problems there is partially observed, hidden or latent information that remains fixed throughout an interaction. Such decision problems can be modeled as Latent Markov Decision Processes (LMDPs), where a latent variable is selected at the beginning of an interaction and is not disclosed to the agent. In the last decade, there has been significant progress in solving LMDPs under different structural assumptions. However, for general LMDPs, there is no known learning algorithm that provably matches the existing lower bound \cite{kwon2021rl}. We introduce the first sample-efficient algorithm for LMDPs without any additional structural assumptions. Our result builds on a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective that has been overlooked in the context of exploration in partially observed environments. Specifically, we establish a novel off-policy evaluation lemma and introduce a new coverage coefficient for LMDPs. Then, we show how these can be used to derive near-optimal guarantees of an optimistic exploration algorithm. 
We believe these results can be valuable for a wide range of interactive learning problems beyond LMDPs, especially in partially observed environments.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
# 大規模言語モデルのためのスパース性加速トレーニング Sparsity-Accelerated Training for Large Language Models ( http://arxiv.org/abs/2406.01392v1 ) ライセンス: Link先を確認	Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu,	(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理(NLP)タスクで高い能力を示すが、継続事前学習や教師付き微調整のような追加の訓練を必要とすることが多い。しかし、これに関連するコストは、主にパラメータ数が大きいため、依然として高いままである。本稿では,事前学習済みLLMにおけるスパース性(sparsity)を利用して,この学習プロセスを高速化することを提案する。順伝播中の活性化ニューロンのスパース性を観察することにより、不活性ニューロンを除外して計算を高速化できる可能性を見出す。我々は、既存のニューロン重要度評価指標を拡張し、ラダー省略率スケジューラを導入することで、関連する課題に対処する。 Llama-2の実験では、Sparsity-Accelerated Training (SAT) は標準トレーニングと同等あるいは優れた性能を示しながら、プロセスの大幅な高速化を実現している。具体的には、SATは継続事前学習で45%のスループット向上を達成し、教師付き微調整では実際に38%のトレーニング時間を節約する。SATは、追加のLLMトレーニングのための、シンプルでハードウェアに依存せず、デプロイが容易なフレームワークを提供する。私たちのコードはhttps://github.com/OpenDFM/SAT で公開されています。 Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs associated with this, primarily due to their large parameter count, remain high. This paper proposes leveraging sparsity in pre-trained LLMs to expedite this training process. By observing sparsity in activated neurons during forward iterations, we identify the potential for computational speed-ups by excluding inactive neurons. We address associated challenges by extending existing neuron importance evaluation metrics and introducing a ladder omission rate scheduler. Our experiments on Llama-2 demonstrate that Sparsity-Accelerated Training (SAT) achieves comparable or superior performance to standard training while significantly accelerating the process. Specifically, SAT achieves a 45% throughput improvement in continual pre-training and saves 38% training time in supervised fine-tuning in practice. It offers a simple, hardware-agnostic, and easily deployable framework for additional LLM training. 
Our code is available at https://github.com/OpenDFM/SAT.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
# プライバシストア:プライバシ削除とリカバリによる大規模言語モデルにおけるプライバシ保護推論 PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration ( http://arxiv.org/abs/2406.01394v1 ) ライセンス: Link先を確認	Ziqian Zeng, Jianwei Wang, Zhengdong Lu, Huiping Zhuang, Cen Chen,	(参考訳) オンラインのLarge Language Models (LLMs) 推論サービスが広く使われていることで、eavesdropperや信頼できないサービスプロバイダへのユーザ入力にプライベート情報が暴露される可能性があるというプライバシー上の懸念が高まっている。 LLMの既存のプライバシー保護方法は、プライバシ保護の不足、性能劣化、厳しい推論時間オーバーヘッドに悩まされている。本稿では,LLM推論におけるユーザ入力のプライバシ保護のためのプライバシストアを提案する。 PrivacyRestoreは、ユーザ入力のプライバシスパンを直接削除し、推論中のアクティベーションステアリングを通じてプライバシ情報を復元する。プライバシスパンは復元ベクトルとしてエンコードされる。本稿では,AWA(Attention-Aware Weighted Aggregation)を提案する。AWAは,入力中のすべてのプライバシの復元ベクトルをメタ復元ベクトルに集約する。 AWAはすべてのプライバシスパンの適切な表現を保証するだけでなく、攻撃者がメタ復元ベクタのみからプライバシスパンを推測することを防ぐ。このメタ復元ベクタは、プライバシが削除されたクエリとともに、サーバに送信される。実験の結果,PrivacyRestoreは,許容レベルのパフォーマンスと推論効率を維持しつつ,個人情報を保護できることがわかった。 The widespread usage of online Large Language Models (LLMs) inference services has raised significant privacy concerns about the potential exposure of private information in user inputs to eavesdroppers or untrustworthy service providers. Existing privacy protection methods for LLMs suffer from insufficient privacy protection, performance degradation, or severe inference time overhead. In this paper, we propose PrivacyRestore to protect the privacy of user inputs during LLM inference. PrivacyRestore directly removes privacy spans in user inputs and restores privacy information via activation steering during inference. The privacy spans are encoded as restoration vectors. We propose Attention-aware Weighted Aggregation (AWA) which aggregates restoration vectors of all privacy spans in the input into a meta restoration vector. AWA not only ensures proper representation of all privacy spans but also prevents attackers from inferring the privacy spans from the meta restoration vector alone. This meta restoration vector, along with the query with privacy spans removed, is then sent to the server. 
The experimental results show that PrivacyRestore can protect private information while maintaining acceptable levels of performance and inference efficiency.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
# TE-NeXt: トラバーサビリティ推定のためのLiDARベースの3次元スパース畳み込みネットワーク TE-NeXt: A LiDAR-Based 3D Sparse Convolutional Network for Traversability Estimation ( http://arxiv.org/abs/2406.01395v1 ) ライセンス: Link先を確認	Antonio Santo, Juan J. Cabrera, David Valiente, Carlos Viegas, Arturo Gil,	(参考訳) 本稿では,残差畳み込みブロックに基づく疎LiDAR点雲からのトラバーサビリティ推定(TE)の新規かつ効率的なアーキテクチャであるTE-NeXtを提案する。 TE-NeXtブロックは、注意機構や3次元スパース畳み込みといった現在のトレンドの概念を融合させる。 TE-NeXtは、SemanticKITTI、Rellis-3D、SemanticUSLといったよく知られた、アクセス可能なデータセットを使用して、さまざまな都市および自然環境における一般化のための高い能力を示すことを目的としている。このように、設計されたアーキテクチャは、セマンティックセグメンテーションの問題における最先端の手法を上回り、非構造化環境におけるより良い結果を示し、都市環境における高い信頼性と堅牢性を維持し、より良い抽象化をもたらす。実装は、結果の再現性を確保することを目的として、科学コミュニティへのオープンリポジトリで利用可能である。 This paper presents TE-NeXt, a novel and efficient architecture for Traversability Estimation (TE) from sparse LiDAR point clouds based on a residual convolution block. The TE-NeXt block fuses notions from current trends such as attention mechanisms and 3D sparse convolutions. TE-NeXt aims to demonstrate high capacity for generalisation in a variety of urban and natural environments, using well-known and accessible datasets such as SemanticKITTI, Rellis-3D and SemanticUSL. Thus, the designed architecture outperforms state-of-the-art methods in the problem of semantic segmentation, demonstrating better results in unstructured environments and maintaining high reliability and robustness in urban environments, which leads to better abstraction. The implementation is available in an open repository to the scientific community with the aim of ensuring the reproducibility of results.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
# ストロンチウムの円リドバーグ状態のコヒーレント重ね合わせのスローダウン Slowing Down a Coherent Superposition of Circular Rydberg States of Strontium ( http://arxiv.org/abs/2406.01396v1 ) ライセンス: Link先を確認	L. Lachaud, B. Muraz, A. Couto, J. -M. Raimond, M. Brune, S. Gleyzes,	(参考訳) ライドバーグアルカリ土類原子は量子シミュレーションと気象学のための有望な道具である。 2つの価電子のうちの1つが長寿命の環状状態に進むと、第2価電子は大きな自己イオン化なしに光学的に操作できる。この特徴を利用して、円形ストロンチウム原子の熱原子ビームのレーザー減速を実証する。主イオンコア422nmの波長共振を駆動することにより、大きな自己イオン化を伴わずに50m/sの速度低下を観測する。また, 冷却過程における円形状態の重ね合わせは, 数千光子の散乱まで非常に弱い脱コヒーレンスを示す。このロバスト性は、その運動状態を同時に冷却しながら、円形の原子を持つ長い時間スケールでの量子シミュレーションの新しい視点を開く。これにより、量子シミュレーション中のスピンモーションカップリングによる避けられない加熱による有害な効果を軽減することができる。 Rydberg alkaline earth atoms are promising tools for quantum simulation and metrology. When one of the two valence electrons is promoted to long-lived circular states, the second valence electron can be optically manipulated without significant autoionization. We harness this feature to demonstrate laser slowing of a thermal atomic beam of circular strontium atoms. By driving the main ion core 422 nm wavelength resonance, we observe a velocity reduction of 50 m/s without significant autoionization. We also show that a superposition of circular states undergoes very weak decoherence during the cooling process, up to the scattering of more than thousand photons. This robustness opens new perspectives for quantum simulations over long timescales with circular atoms, while simultaneously cooling their motional state. It makes it possible to mitigate the harmful effects of unavoidable heating due to spin-motion coupling during a quantum simulation.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
# Null Compliance:NYCローカルロー144とアルゴリズムアカウンタビリティの課題 Null Compliance: NYC Local Law 144 and the Challenges of Algorithm Accountability ( http://arxiv.org/abs/2406.01399v1 ) ライセンス: Link先を確認	Lucas Wright, Roxana Mike Muenster, Briana Vecchione, Tianyao Qu, Pika, Cai, COMM/INFO 2450 Student Investigators, Jacob Metcalf, J. Nathan Matias,	(参考訳) 2023年7月、ニューヨーク市は、商業的なアルゴリズムシステム、特に雇用と昇進に使用される自動雇用決定ツール(AEDT)に対するバイアス監査を義務付ける世界で最初の司法管轄区域となった。地方法144 (LL 144) は、AEDTについて人種と性別の偏見を毎年独立して監査することを義務付けており、監査報告書を公表しなければならない。さらに、雇用主は、求人情報に透明性通知を掲載する義務がある。本研究では,学生調査員155人が391の雇用主についてLL 144へのコンプライアンスと求職希望者のユーザ体験を記録した。これらの雇用主のうち、18社が監査報告、13社が透明性通知を掲載した。これらの値は、LL 144によって制定された説明責任機構の大幅な制限によって説明できる可能性がある。この法律は、雇用主に対して、自分たちのシステムが法律の範囲内であるかどうかについて、かなりの裁量を与えるので、nullの結果が非準拠であるとは言い切れず、我々はこの状態を「nullコンプライアンス」と呼ぶ。雇用主の裁量は、ほぼ全ての監査が0.8以上の影響因子を報告しているということも説明できるかもしれない。また、通常の求職者に対するLL 144の利点は、アクセシビリティとユーザビリティの不足により制限されていることも判明した。本研究は,アルゴリズムシステムを規制する政策立案者にとって重要な教訓であり,特に規制当事者に付与する判断の度合い,透明性とエンドユーザの責任への依存の限界について考察した。 In July 2023, New York City became the first jurisdiction globally to mandate bias audits for commercial algorithmic systems, specifically for automated employment decision tools (AEDTs) used in hiring and promotion. Local Law 144 (LL 144) requires AEDTs to be independently audited annually for race and gender bias, and the audit report must be publicly posted. Additionally, employers are obligated to post a transparency notice with the job listing. In this study, 155 student investigators recorded 391 employers' compliance with LL 144 and the user experience for prospective job applicants. Among these employers, 18 posted audit reports and 13 posted transparency notices. These rates could potentially be explained by a significant limitation in the accountability mechanisms enacted by LL 144. Since the law grants employers substantial discretion over whether their system is in scope of the law, a null result cannot be said to indicate non-compliance, a condition we call "null compliance." 
Employer discretion may also explain our finding that nearly all audits reported an impact factor over 0.8, a rule of thumb often used in employment discrimination cases. We also find that the benefit of LL 144 to ordinary job seekers is limited due to shortcomings in accessibility and usability. Our findings offer important lessons for policy-makers as they consider regulating algorithmic systems, particularly the degree of discretion to grant to regulated parties and the limitations of relying on transparency and end-user accountability.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
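The 0.8 rule of thumb mentioned above is usually applied to a ratio of selection rates. The following is a minimal sketch of how such a ratio is commonly computed, assuming each group's selection rate is divided by the highest group's rate. The function name, group names, and counts are hypothetical and only illustrate the arithmetic; the exact calculation used in the audits is not given in the abstract.

```python
def impact_ratios(selected, applicants):
    """Divide each group's selection rate by the highest group's rate."""
    rates = {group: selected[group] / applicants[group] for group in applicants}
    best = max(rates.values())
    return {group: rate / best for group, rate in rates.items()}


# Hypothetical counts, for illustration only (not data from the study).
applicants = {"group_a": 200, "group_b": 150}
selected = {"group_a": 50, "group_b": 24}

ratios = impact_ratios(selected, applicants)
# Groups whose ratio falls below the 0.8 rule of thumb.
flagged = [group for group, ratio in ratios.items() if ratio < 0.8]
```

Here `group_b` has a selection rate of 0.16 against 0.25 for `group_a`, so its ratio falls below the threshold and it would be flagged for closer review.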
# 空間フォトニックイジングマシンを用いた効率的な計算:低ランク・循環行列制約を用いた計算 Efficient Computation Using Spatial-Photonic Ising Machines: Utilizing Low-Rank and Circulant Matrix Constraints ( http://arxiv.org/abs/2406.01400v1 ) ライセンス: Link先を確認	Richard Zhipeng Wang, James S. Cummins, Marvin Syed, Nikita Stroev, George Pastras, Jason Sakellariou, Symeon Tsintzos, Alexis Askitopoulos, Daniele Veraldi, Marcello Calvanese Strinati, Silvia Gentilini, Davide Pierangeli, Claudio Conti, Natalia G. Berloff,	(参考訳) 我々は空間フォトニックIsing Machine (SPIM) の可能性を探り、低ランクおよび循環結合行列を用いた計算集約Ising問題に対処する。以上の結果から,SPIMの性能は結合行列のランクと精度に大きく影響していることが明らかとなった。高度な分解技術を開発し,評価することにより,従来のマティス型行列の限界を克服し,SPIMが解決できる問題の範囲を広げる。提案手法は,NP完全問題に適用可能な,本質的に低いランクの行列を含む,多種多様な結合行列に適合する。本研究では,SPIMの現実的応用を実証するために,最適化タスク,特に金融最適化における低ランク近似の実用的メリットについて検討する。最後に,SPIMハードウェアのハードウェア精度に課される計算制限を評価し,これらの制約の中でこれらのシステムの性能を最適化するための戦略を提案する。 We explore the potential of spatial-photonic Ising machines (SPIMs) to address computationally intensive Ising problems that employ low-rank and circulant coupling matrices. Our results indicate that the performance of SPIMs is critically affected by the rank and precision of the coupling matrices. By developing and assessing advanced decomposition techniques, we expand the range of problems SPIMs can solve, overcoming the limitations of traditional Mattis-type matrices. Our approach accommodates a diverse array of coupling matrices, including those with inherently low ranks, applicable to complex NP-complete problems. We explore the practical benefits of low-rank approximation in optimization tasks, particularly in financial optimization, to demonstrate the real-world applications of SPIMs. Finally, we evaluate the computational limitations imposed by SPIM hardware precision and suggest strategies to optimize the performance of these systems within these constraints.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
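The role of low-rank coupling matrices can be illustrated with a small sketch. It assumes the common convention $H = -\frac{1}{2}\sum_{ij} J_{ij} s_i s_j$ with diagonal terms included, and a coupling matrix written as a weighted sum of rank-one terms $J = \sum_k w_k \xi_k \xi_k^\top$; a single term corresponds to a Mattis-type matrix. All names and values below are hypothetical, and the decomposition techniques developed in the paper are not reproduced here.

```python
def coupling_from_factors(weights, vectors):
    """Build J = sum_k w_k * v_k v_k^T from rank-one factors."""
    n = len(vectors[0])
    return [[sum(w * v[i] * v[j] for w, v in zip(weights, vectors))
             for j in range(n)] for i in range(n)]


def energy_dense(J, spins):
    """Ising energy from the full coupling matrix, O(n^2) work."""
    n = len(spins)
    return -0.5 * sum(J[i][j] * spins[i] * spins[j]
                      for i in range(n) for j in range(n))


def energy_low_rank(weights, vectors, spins):
    """Same energy from the factors alone, O(rank * n) work."""
    return -0.5 * sum(w * sum(x * s for x, s in zip(v, spins)) ** 2
                      for w, v in zip(weights, vectors))


# Hypothetical rank-2 example with four spins.
weights = [1.0, 0.5]
vectors = [[1, -1, 1, 1], [1, 1, -1, 1]]
spins = [1, -1, -1, 1]

J = coupling_from_factors(weights, vectors)
dense = energy_dense(J, spins)
low_rank = energy_low_rank(weights, vectors, spins)
```

Both functions return the same energy, because $s^\top J s = \sum_k w_k (\xi_k \cdot s)^2$. This is why a low-rank matrix can be handled as a small number of rank-one evaluations.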
# サイズが重要: ローレンツブーストされたカシミール効果 Size Matters: Lorentz Boosted Casimir Effect ( http://arxiv.org/abs/2406.01401v1 ) ライセンス: Link先を確認	Yu-Song Cao, YanXia Liu, Ding-Fang Zeng,	(参考訳) 過去数十年の間に多くの証拠が現れ、カシミールエネルギーの負性がエキゾチックな機械的および重力的効果の原因であることを示した。本研究では、カシミール空洞のローレンツブーストを研究する。その運動量は従来の研究ではほとんど注目されてこなかった。キャビティが持つ真空エネルギーと運動量は,キャビティの空間的広がりにより,点粒子のものとは異なる変換を受けることが判明した。しかし、両者の質量殻条件は、空洞が移動方向のみに沿って有限である限りは同一である。 Much evidence has appeared in the past decades showing that the negativity of Casimir energy is responsible for exotic mechanical and gravitational effects. In this work we study the Lorentz boost of a Casimir cavity, whose momentum has received little attention in earlier works. We find that the vacuum energy and momentum carried by the cavity transform differently from those of point particles because of the cavity's spatial extension. However, the mass-shell conditions of the two are identical as long as the cavity is finite along the direction of motion only.	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
# Rationaleの混合:視覚質問応答のためのマルチモーダル推論混合 Mixture of Rationale: Multi-Modal Reasoning Mixture for Visual Question Answering ( http://arxiv.org/abs/2406.01402v1 ) ライセンス: Link先を確認	Tao Li, Linjun Shou, Xuejun Liu,	(参考訳) ゼロショット視覚質問応答(Zero-shot visual question answering, VQA)は、モダリティ間の推論を必要とする課題である。既存の方法の中には、Chain of Thoughts (CoT)フレームワーク内の1つの根拠(rationale)に依存しているものもあるが、VQA問題の複雑さを捉えるには不足する可能性がある。一方、複数の根拠を用いる他の方法では、低多様性、モダリティアライメントの低さ、非効率な検索と融合に悩まされている。これらの課題に対応するために、VQAの複数の根拠を混合した新しいマルチモーダル推論法である \emph{Mixture of Rationales (MoR)} を提案する。 MoRは、単一の凍結されたビジョン・アンド・ランゲージ事前訓練モデル(VLPM)を使用して、動的にマルチモーダル思考を生成、検索、融合する。我々は、NLVR2とOKVQAの2つの挑戦的VQAデータセットに対して、2つの代表的バックボーンOFAとVL-T5でMoRを評価する。 MoR は NLVR2 の 12.43\% の精度向上、OKVQA-S (OKVQA の科学技術カテゴリ) の 2.45\% の精度向上を実現している。 Zero-shot visual question answering (VQA) is a challenging task that requires reasoning across modalities. While some existing methods rely on a single rationale within the Chain of Thoughts (CoT) framework, they may fall short of capturing the complexity of the VQA problem. On the other hand, some other methods that use multiple rationales may still suffer from low diversity, poor modality alignment, and inefficient retrieval and fusion. In response to these challenges, we propose \emph{Mixture of Rationales (MoR)}, a novel multi-modal reasoning method that mixes multiple rationales for VQA. MoR uses a single frozen Vision-and-Language Pre-trained Model (VLPM) to dynamically generate, retrieve and fuse multi-modal thoughts. We evaluate MoR on two challenging VQA datasets, i.e., NLVR2 and OKVQA, with two representative backbones, OFA and VL-T5. MoR achieves a 12.43\% accuracy improvement on NLVR2, and a 2.45\% accuracy improvement on OKVQA-S (the science and technology category of OKVQA).	翻訳日:2024-06-05 22:39:57 公開日:2024-06-03
# 組織画像のための専門家駆動型データ生成パイプライン An expert-driven data generation pipeline for histological images ( http://arxiv.org/abs/2406.01403v1 ) ライセンス: Link先を確認	Roberto Basla, Loris Giulivi, Luca Magri, Giacomo Boracchi,	(参考訳) 深層学習(DL)モデルは、生体細胞分画や組織像の分類など、多くの応用に成功している。これらのモデルは、アノテーションが不足し高価である医療分野において、必ずしも利用できない大量の注釈付きデータを必要とする。この制限を克服するため,我々はセルセグメンテーションのための合成データセットを生成する新しいパイプラインを提案する。本手法は,少数の注釈付き画像のみを前提として,DLインスタンスセグメンテーションモデルを効果的に訓練できる大規模な画像データセットを生成する。私たちのソリューションは、データセットの生成中に専門家がドメイン知識を組み込むことによって、現実的な形状と配置のセルを生成するように設計されています。 Deep Learning (DL) models have been successfully applied to many applications including biomedical cell segmentation and classification in histological images. These models require large amounts of annotated data which might not always be available, especially in the medical field where annotations are scarce and expensive. To overcome this limitation, we propose a novel pipeline for generating synthetic datasets for cell segmentation. Given only a handful of annotated images, our method generates a large dataset of images which can be used to effectively train DL instance segmentation models. Our solution is designed to generate cells of realistic shapes and placement by allowing experts to incorporate domain knowledge during the generation of the dataset.	翻訳日:2024-06-05 22:30:12 公開日:2024-06-03
# 制約を用いたスパースと代替サブグループ記述の発見 Using Constraints to Discover Sparse and Alternative Subgroup Descriptions ( http://arxiv.org/abs/2406.01411v1 ) ライセンス: Link先を確認	Jakob Bach,	(参考訳) サブグループ発見法により、ユーザはデータセットで興味深い領域の簡単な記述を取得できる。サブグループ発見における制約の使用は、さらに解釈可能性を高めることができる。まず、サブグループ記述で使用される機能の数を制限することで、後者はスパース化します。第二に、与えられたサブグループと類似したデータオブジェクトの集合をカバーするが、異なる特徴を持つ代替サブグループ記述を見つけるための新しい最適化問題を提案する。両制約型をヒューリスティックなサブグループ発見手法に統合する方法を述べる。さらに, ホワイトボックス最適化問題として, サブグループ探索のSMT (Satifiability Modulo Theories) の新たな定式化を提案する。さらに、両制約型がNP-ハード最適化問題につながることを証明した。最後に,27のバイナリ分類データセットを用いて,非制約・制約付きサブグループ探索のヒューリスティック検索とソルバ検索を比較した。ヒューリスティック探索法は,制約のあるシナリオにおいても,短時間で高品質なサブグループを生成することが多い。 Subgroup-discovery methods allow users to obtain simple descriptions of interesting regions in a dataset. Using constraints in subgroup discovery can enhance interpretability even further. In this article, we focus on two types of constraints: First, we limit the number of features used in subgroup descriptions, making the latter sparse. Second, we propose the novel optimization problem of finding alternative subgroup descriptions, which cover a similar set of data objects as a given subgroup but use different features. We describe how to integrate both constraint types into heuristic subgroup-discovery methods. Further, we propose a novel Satisfiability Modulo Theories (SMT) formulation of subgroup discovery as a white-box optimization problem, which allows solver-based search for subgroups and is open to a variety of constraint types. Additionally, we prove that both constraint types lead to an NP-hard optimization problem. Finally, we employ 27 binary-classification datasets to compare heuristic and solver-based search for unconstrained and constrained subgroup discovery. We observe that heuristic search methods often yield high-quality subgroups within a short runtime, also in scenarios with constraints.	翻訳日:2024-06-05 22:30:12 公開日:2024-06-03
# CE-NAS: エンド・ツー・エンドのカーボン効率の良いニューラルネットワーク検索フレームワーク CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework ( http://arxiv.org/abs/2406.01414v1 ) ライセンス: Link先を確認	Yiyang Zhao, Yunzhuo Liu, Bo Jiang, Tian Guo,	(参考訳) 本研究は,モデル設計プロセスにおける炭素効率の向上を目的とした,ニューラルアーキテクチャ探索(NAS)に対する新しいアプローチを提案する。提案したフレームワークCE-NASは、NASアルゴリズムのエネルギーの炭素放出変化とエネルギー差を探索することにより、NASに関連する高炭素コストの鍵となる課題に対処する。高レベルでは、CE-NASは強化学習エージェントを利用して、時系列変換器によって予測される炭素強度に基づいてGPUリソースを動的に調整し、エネルギー効率の高いサンプリングとエネルギー集約評価タスクのバランスをとる。さらに、CE-NASは、最近提案された多目的最適化器を利用して、NAS探索空間を効果的に削減する。我々は,NASデータセットとオープンドメインNASタスクのSOTA結果を達成しつつ,CE-NASの炭素排出量低減効果を実証した。例えば、HW-NasBenchデータセットでは、CE-NASはバニラNASに匹敵する探索効率を維持しながら、二酸化炭素排出量を最大7.22倍削減する。オープンドメインNASタスクでは、CE-NASはCIFAR-10で97.35%の精度でSOTAを達成し、パラメータはわずか1.68M、二酸化炭素は38.53ポンドである。 ImageNetでは、NVIDIA V100上でFP16を使用して0.78msのTensorRTレイテンシで80.6%のトップ1の精度を実現し、909.86 lbのCO2を消費するだけで、他のワンショットベースのNASベースラインに匹敵する。 This work presents a novel approach to neural architecture search (NAS) that aims to increase carbon efficiency for the model design process. The proposed framework CE-NAS addresses the key challenge of high carbon cost associated with NAS by exploring the carbon emission variations of energy and energy differences of different NAS algorithms. At the high level, CE-NAS leverages a reinforcement-learning agent to dynamically adjust GPU resources based on carbon intensity, predicted by a time-series transformer, to balance energy-efficient sampling and energy-intensive evaluation tasks. Furthermore, CE-NAS leverages a recently proposed multi-objective optimizer to effectively reduce the NAS search space. We demonstrate the efficacy of CE-NAS in lowering carbon emissions while achieving SOTA results for both NAS datasets and open-domain NAS tasks. For example, on the HW-NasBench dataset, CE-NAS reduces carbon emissions by up to 7.22X while maintaining a search efficiency comparable to vanilla NAS. 
For open-domain NAS tasks, CE-NAS achieves SOTA results with 97.35% top-1 accuracy on CIFAR-10 with only 1.68M parameters and a carbon consumption of 38.53 lbs of CO2. On ImageNet, our searched model achieves 80.6% top-1 accuracy with a 0.78 ms TensorRT latency using FP16 on NVIDIA V100, consuming only 909.86 lbs of CO2, making it comparable to other one-shot-based NAS baselines.	翻訳日:2024-06-05 22:30:12 公開日:2024-06-03
# ラベルのない配電シフトへの等角予測の適用 Adapting Conformal Prediction to Distribution Shifts Without Labels ( http://arxiv.org/abs/2406.01416v1 ) ライセンス: Link先を確認	Kevin Kasa, Zhiyu Zhang, Heng Yang, Graham W. Taylor,	(参考訳) コンフォーマル予測(CP)により、機械学習モデルは、交換可能なデータを想定した、保証されたカバレッジレートで予測セットを出力できる。残念なことに、交換可能性の仮定は実際には分布のシフトによってしばしば破られ、その課題はテスト時に基礎となる真理ラベルの欠如によって複雑化される。本研究の目的は,テスト領域からのラベルなしデータのみを用いてCP生成予測セットの品質を向上させることである。これは、未ラベルテストデータに対するベースモデルの不確実性に応じてCPのスコア関数を調整する、ECP と EACP と呼ばれる2つの新しい手法によって達成される。大規模データセットとニューラルネットワークアーキテクチャの広範な実験を通じて、我々の手法は既存のベースラインよりも一貫した改善を提供し、教師付きアルゴリズムの性能とほぼ一致していることを示す。 Conformal prediction (CP) enables machine learning models to output prediction sets with guaranteed coverage rate, assuming exchangeable data. Unfortunately, the exchangeability assumption is frequently violated due to distribution shifts in practice, and the challenge is often compounded by the lack of ground truth labels at test time. Focusing on classification in this paper, our goal is to improve the quality of CP-generated prediction sets using only unlabeled data from the test domain. This is achieved by two new methods called ECP and EACP, that adjust the score function in CP according to the base model's uncertainty on the unlabeled test data. Through extensive experiments on a number of large-scale datasets and neural network architectures, we show that our methods provide consistent improvement over existing baselines and nearly match the performance of supervised algorithms.	翻訳日:2024-06-05 22:30:12 公開日:2024-06-03
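For context on the abstract above, the following is a minimal sketch of standard split conformal prediction for classification, the baseline setting that assumes exchangeable, labeled calibration data. It is not the ECP or EACP method, whose score adjustments are not specified in the abstract. The score function (one minus the probability of the true label) is one common choice, and the calibration scores and probabilities are hypothetical.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return float("inf")  # too few calibration points: keep every label
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """All labels whose score 1 - p does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical calibration scores: 1 - (probability given to the true label).
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
qhat = conformal_threshold(cal_scores, alpha=0.25)

# Hypothetical softmax output for one test input with three classes.
labels = prediction_set([0.6, 0.3, 0.1], qhat)
```

Under distribution shift the calibration scores stop being representative of the test scores, so a threshold computed this way can under-cover. That is the gap the abstract targets using only unlabeled test data.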
# 多重補間によるMixup拡張 Mixup Augmentation with Multiple Interpolations ( http://arxiv.org/abs/2406.01417v1 ) ライセンス: Link先を確認	Lifeng Shen, Jincheng Yu, Hansi Yang, James T. Kwok,	(参考訳) Mixupとその変種は、広く使われているデータ拡張手法の一群である。ランダムなサンプルペアを用いて、入力とラベルの線形補間により新しいサンプルを生成する。しかし、1つの補間しか生成できないため、拡張能力は制限される可能性がある。本稿では,サンプルペアから複数の補間を生成するマルチミックスという,シンプルで効果的な拡張を提案する。生成されたサンプルの順序付き系列により、マルチミックスは、標準的なミックスアップよりもトレーニングプロセスのガイドに役立てることができる。さらに理論的には、これは確率勾配の分散を減少させることもできる。多数の合成および大規模データセットに対する広範囲な実験により、マルチミックスは、一般化、堅牢性、キャリブレーションの点で様々なミックスアップ変種および非ミックスアップベースラインより優れていることが示された。 Mixup and its variants form a popular class of data augmentation techniques. Using a random sample pair, it generates a new sample by linear interpolation of the inputs and labels. However, generating only one single interpolation may limit its augmentation ability. In this paper, we propose a simple yet effective extension called multi-mix, which generates multiple interpolations from a sample pair. With an ordered sequence of generated samples, multi-mix can better guide the training process than standard mixup. Moreover, theoretically, this can also reduce the stochastic gradient variance. Extensive experiments on a number of synthetic and large-scale data sets demonstrate that multi-mix outperforms various mixup variants and non-mixup-based baselines in terms of generalization, robustness, and calibration.	翻訳日:2024-06-05 22:30:12 公開日:2024-06-03
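The idea of drawing several interpolations from one sample pair can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the mixing coefficients are assumed to come from a Beta(alpha, alpha) distribution as in standard mixup, they are sorted to obtain an ordered sequence, and the function name `multi_mix` and its parameters are hypothetical.

```python
import random


def multi_mix(x_a, y_a, x_b, y_b, k=3, alpha=1.0, rng=None):
    """Return k interpolations of one sample pair, ordered by mixing weight."""
    rng = rng or random.Random(0)
    # Assumption: coefficients are Beta(alpha, alpha) draws, as in standard mixup.
    lambdas = sorted(rng.betavariate(alpha, alpha) for _ in range(k))
    mixed = []
    for lam in lambdas:
        x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
        y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
        mixed.append((lam, x, y))
    return mixed


# Hypothetical two-feature inputs with one-hot labels.
samples = multi_mix([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], k=4)
```

Standard mixup corresponds to `k=1`. With larger `k`, each pair contributes several points along the same line segment between the two samples.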
# ランドスケープアーキテクチャにおけるAIの理解の課題 Problematizing AI Omnipresence in Landscape Architecture ( http://arxiv.org/abs/2406.01421v1 ) ライセンス: Link先を確認	Phillip Fernberg, Zihao Zhang,	(参考訳) このポジションペーパーは、ランドスケープアーキテクチャの専門職における現在のAIの狂気を調べるための、重要なレンズを論じ、提供します。著者らは、AIを考える際にランドスケープアーキテクトが住むことができる5つのアーキタイプやメンタルモードを提案している。 AIの判断を加速度の1軸に制限するのではなく、これらのアーチタイプと対応する物語は関係スペクトルに沿って存在し、透過可能であり、文脈に応じてLAがそれらを受け取り、切り替えることができる。我々は、これらのアーチタイプとそれらのAI進歩への貢献の間の関係を、因果ループ図(CLD)を用いてモデル化し、それらの相互作用により、AIに近づいたよりニュアンスな方法が、新しいデジタル経済において新しいプラクティスのモードを開くかもしれないと主張している。 This position paper argues for, and offers, a critical lens through which to examine the current AI frenzy in the landscape architecture profession. In it, the authors propose five archetypes or mental modes that landscape architects might inhabit when thinking about AI. Rather than limiting judgments of AI use to a single axis of acceleration, these archetypes and corresponding narratives exist along a relational spectrum and are permeable, allowing LAs to take on and switch between them according to context. We model these relationships between the archetypes and their contributions to AI advancement using a causal loop diagram (CLD), and with those interactions argue that more nuanced ways of approaching AI might also open new modes of practice in the new digital economy.	翻訳日:2024-06-05 22:30:12 公開日:2024-06-03
# ソフトウェアリポジトリ全体を理解するには? How to Understand Whole Software Repository? ( http://arxiv.org/abs/2406.01422v1 ) ライセンス: Link先を確認	Yingwei Ma, Qingping Yang, Rongyu Cao, Binhua Li, Fei Huang, Yongbin Li,	(参考訳) 近年,Large Language Model (LLM) をベースとしたエージェントが,自動ソフトウェア工学 (ASE) の大幅な発展を遂げている。有効性は検証されているが、既存の手法の設計は主にコードのローカル情報、例えば問題、クラス、関数に焦点を合わせており、ソフトウェアシステム内のグローバルコンテキストと相互依存を捉えるのに限界がある。人間のSE開発者の実践的な経験から、リポジトリ全体の優れた理解がASEにとって重要な道であると論じます。しかし、リポジトリ全体を理解することは、非常に長いコード入力、ノイズの多いコード情報、複雑な依存関係関係など、さまざまな課題を引き起こします。この目的のために,リポジトリ全体を包括的に理解するエージェントを誘導することにより,RepoUnderstanderという新しいASE手法を開発した。具体的には、まずリポジトリ全体の重要な情報をトップダウンモードでリポジトリ知識グラフに格納し、リポジトリの複雑さを減らします。その後、モンテカルロ木探索に基づくリポジトリ探索戦略を提案することにより、エージェントにリポジトリ全体を理解する能力を与える。さらに、リポジトリレベルの知識をより活用するために、エージェントをまとめ、分析し、計画するように指導します。そして、ツールを操作して情報を動的に取得し、パッチを生成して実際のGitHubの問題を解決する。大規模な実験は、提案されたRepoUnderstanderの優位性と有効性を示している。 SWE-bench LiteベンチマークではSWE-agentと比較して18.5\%改善した。 Recently, Large Language Model (LLM) based agents have advanced the significant development of Automatic Software Engineering (ASE). Although verified effectiveness, the designs of the existing methods mainly focus on the local information of codes, e.g., issues, classes, and functions, leading to limitations in capturing the global context and interdependencies within the software system. From the practical experiences of the human SE developers, we argue that an excellent understanding of the whole repository will be the critical path to ASE. However, understanding the whole repository raises various challenges, e.g., the extremely long code input, the noisy code information, the complex dependency relationships, etc. To this end, we develop a novel ASE method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories. Specifically, we first condense the critical information of the whole repository into the repository knowledge graph in a top-to-down mode to decrease the complexity of repository. 
Subsequently, we give the agents the ability to understand the whole repository by proposing a Monte Carlo tree search-based repository exploration strategy. In addition, to better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan. Then, they can manipulate the tools to dynamically acquire information and generate patches to solve real-world GitHub issues. Extensive experiments demonstrate the superiority and effectiveness of the proposed RepoUnderstander. It achieved an 18.5\% relative improvement on the SWE-bench Lite benchmark compared to SWE-agent.	翻訳日:2024-06-05 22:30:12 公開日:2024-06-03
# 価値改善アクター・クリティックアルゴリズム Value Improved Actor Critic Algorithms ( http://arxiv.org/abs/2406.01423v1 ) ライセンス: Link先を確認	Yaniv Oren, Moritz A. Zanger, Pascal R. van der Vaart, Matthijs T. J. Spaan, Wendelin Bohmer,	(参考訳) 多くの現代的な強化学習アルゴリズムはアクター・クリティック(AC)フレームワークに基づいて構築されており、ポリシー改善演算子を使用したポリシー(アクター)の反復的改善とポリシーの価値の反復的近似(クリティック)からなる。対照的に、人気のある値ベースのアルゴリズムファミリーは、値更新に改善演算子を使用し、値関数を直接反復的に改善する。本稿では、ポリシーベースのアルゴリズムの精神におけるポリシーに適用されるものと、価値ベースのアルゴリズムの精神における価値に適用されるものとの2つの異なる改善演算子を用いたACフレームワークの一般的な拡張を提案し、これをValue-Improved AC (VI-AC)と呼ぶ。本稿では,人気のあるオンラインのオフポリシーACアルゴリズムTD3とDDPGに基づく2つの実用的なVI-ACアルゴリズムを設計する。我々は,MuJoCoベンチマークでVI-TD3とVI-DDPGを評価し,テスト対象のすべての環境において,それぞれのベースラインの性能を改善したり適合させたりすることを発見した。 Many modern reinforcement learning algorithms build on the actor-critic (AC) framework: iterative improvement of a policy (the actor) using policy improvement operators and iterative approximation of the policy's value (the critic). In contrast, the popular value-based algorithm family employs improvement operators in the value update, to iteratively improve the value function directly. In this work, we propose a general extension to the AC framework that employs two separate improvement operators: one applied to the policy in the spirit of policy-based algorithms and one applied to the value in the spirit of value-based algorithms, which we dub Value-Improved AC (VI-AC). We design two practical VI-AC algorithms based on the popular online off-policy AC algorithms TD3 and DDPG. We evaluate VI-TD3 and VI-DDPG in the MuJoCo benchmark and find that both improve upon or match the performance of their respective baselines in all environments tested.	翻訳日:2024-06-05 22:30:12 公開日:2024-06-03
# フルリカレントモデルによる普遍的インコンテキスト近似 Universal In-Context Approximation By Prompting Fully Recurrent Models ( http://arxiv.org/abs/2406.01424v1 ) ライセンス: Link先を確認	Aleksandar Petrov, Tom A. Lamb, Alasdair Paren, Philip H. S. Torr, Adel Bibi,	(参考訳) ゼロショットおよびインコンテキスト学習は、モデル微調整なしでタスクを解決し、生成モデルソリューションの開発に不可欠である。したがって、事前訓練されたモデルが任意の関数、すなわち、普遍的なインコンテキスト近似器であるかどうかを近似させることができるかどうかを理解することが重要である。近年、トランスモデルにこの特性があることが示されているが、これらの結果は彼らの注意機構に依存している。したがって、これらの発見は、RNN、LSTM、そしてますます人気のあるSSMのような、完全に反復するアーキテクチャには適用されない。我々は、RNN、LSTM、GRU、線形RNN、およびMambaやHawk/Griffinのような線形ゲートアーキテクチャが、普遍的なインコンテキスト近似としても機能できることを実証した。議論を合理化するために、我々はLSRLと呼ばれるプログラミング言語を導入し、これら完全に再帰的なアーキテクチャにコンパイルする。 LSRLは、解釈可能性ベンチマークの構築など、完全再帰モデルのさらなる研究には、独立した関心があるかもしれない。このようなゲーティング(LSTM、GRU、Hawk/Griffin)を組み込んだアーキテクチャは、より安定して特定の操作を実装できるので、より実用的なコンテキスト内普遍近似の候補となる。 Zero-shot and in-context learning enable solving tasks without model fine-tuning, making them essential for developing generative model solutions. Therefore, it is crucial to understand whether a pretrained model can be prompted to approximate any function, i.e., whether it is a universal in-context approximator. While it was recently shown that transformer models do possess this property, these results rely on their attention mechanism. Hence, these findings do not apply to fully recurrent architectures like RNNs, LSTMs, and the increasingly popular SSMs. We demonstrate that RNNs, LSTMs, GRUs, Linear RNNs, and linear gated architectures such as Mamba and Hawk/Griffin can also serve as universal in-context approximators. To streamline our argument, we introduce a programming language called LSRL that compiles to these fully recurrent architectures. LSRL may be of independent interest for further studies of fully recurrent models, such as constructing interpretability benchmarks. 
We also study the role of multiplicative gating and observe that architectures incorporating such gating (e.g., LSTMs, GRUs, Hawk/Griffin) can implement certain operations more stably, making them more viable candidates for practical in-context universal approximation.	翻訳日:2024-06-05 22:30:12 公開日:2024-06-03
# EAGLE: クロスビュー理解における適応幾何学に基づく効率的な学習 EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding ( http://arxiv.org/abs/2406.01429v1 ) ライセンス: Link先を確認	Thanh-Dat Truong, Utsav Prabhu, Dongyi Wang, Bhiksha Raj, Susan Gauch, Jeyamkondan Subbiah, Khoa Luu,	(参考訳) 教師なしドメイン適応は、データ分散間でセマンティックセグメンテーションモデルを転送する効率的なアプローチである。一方、大規模視覚言語モデルに基づく最近のオープン語彙セマンティックシーン理解は、多様な概念やカテゴリを学習できるため、オープンセット設定に有効である。しかし、これらの先行手法は、クロスビュー幾何モデリングが欠如しているため、異なるカメラビューをまたいだ一般化に失敗する。現在、クロスビュー学習の分析は限られている。この問題を解決するために,セマンティックシーン理解におけるビュー間の幾何学的構造変化をモデル化するための,教師なしクロスビュー適応学習手法を提案する。まず,カメラ間における画像やセグメンテーションマスクの構造変化をモデル化するための,非ペアデータに関するクロスビュー幾何学的制約を提案する。第2に、カメラビュー間の幾何学的構造変化を効率的に測定するための、新しい測地流に基づく相関指標を提案する。第3に、クロスビュー適応学習において、オープン語彙セグメンテーションネットワークのビュー情報モデリングを強化するために、新しいビュー条件プロンプト機構を導入する。本研究では,従来の教師なし領域適応やオープンボキャブラリセマンティックセマンティックセグメンテーション手法と比較して,SOTA(State-of-the-Art)の性能を達成できることを実証した。 Unsupervised Domain Adaptation has been an efficient approach to transferring the semantic segmentation model across data distributions. Meanwhile, the recent Open-vocabulary Semantic Scene understanding based on large-scale vision language models is effective in open-set settings because it can learn diverse concepts and categories. However, these prior methods fail to generalize across different camera views due to the lack of cross-view geometric modeling. At present, there are limited studies analyzing cross-view learning. To address this problem, we introduce a novel Unsupervised Cross-view Adaptation Learning approach to modeling the geometric structural change across views in Semantic Scene Understanding. First, we introduce a novel Cross-view Geometric Constraint on Unpaired Data to model structural changes in images and segmentation masks across cameras. Second, we present a new Geodesic Flow-based Correlation Metric to efficiently measure the geometric structural changes across camera views. 
Third, we introduce a novel view-condition prompting mechanism to enhance the view-information modeling of the open-vocabulary segmentation network in cross-view adaptation learning. The experiments on different cross-view adaptation benchmarks have shown the effectiveness of our approach in cross-view modeling, demonstrating that we achieve State-of-the-Art (SOTA) performance compared to prior unsupervised domain adaptation and open-vocabulary semantic segmentation methods.	翻訳日:2024-06-05 22:30:12 公開日:2024-06-03
# ED-SAM:視覚言語基礎モデルにおけるドメイン一般化のための効率的な拡散サンプリング手法 ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models ( http://arxiv.org/abs/2406.01432v1 ) ライセンス: Link先を確認	Thanh-Dat Truong, Xin Li, Bhiksha Raj, Jackson Cothren, Khoa Luu,	(参考訳) Vision-Language Foundation Modelは、近年、様々な認知学習タスクにおいて優れたパフォーマンスを示している。視覚言語モデルの卓越した性能は、主に大規模事前学習データセットと異なるデータ拡張技術に依存している。しかし、ビジョン言語基盤モデルの領域一般化の問題に対処する必要がある。この問題は、視覚言語基礎モデルの未知のデータ分布への一般化性に制限を与えている。本稿では、視覚言語基盤モデルの一般化性を改善するために、ドメイン一般化(ED-SAM)に対する簡易かつ効率的な拡散サンプリング手法を提案する。本研究の理論的解析は,視覚言語基礎モデルにおける拡散モデルと領域一般化の批判的役割と関係を明らかにする。そこで,本研究では,拡散サンプリング法に簡易かつ効果的なトランスポートトランスフォーメーションを導入する。敵のサンプルを効果的に生成し、未知のデータ分布に対する基礎モデルの一般化性を向上させる。 CC3M, CC12M, LAION400Mなど, 視覚言語による事前学習データセットのさまざまなスケールに関する実験結果から, 提案したED-SAMアプローチの最先端性能とスケーラビリティが他の手法と比較して一貫して示されている。 The Vision-Language Foundation Model has recently shown outstanding performance in various perception learning tasks. The outstanding performance of the vision-language model mainly relies on large-scale pre-training datasets and different data augmentation techniques. However, the domain generalization problem of the vision-language foundation model needs to be addressed. This problem has limited the generalizability of the vision-language foundation model to unknown data distributions. In this paper, we introduce a new simple but efficient Diffusion Sampling approach to Domain Generalization (ED-SAM) to improve the generalizability of the vision-language foundation model. Our theoretical analysis in this work reveals the critical role and relation of the diffusion model to domain generalization in the vision-language foundation model. Then, based on the insightful analysis, we introduce a new simple yet effective Transport Transformation to diffusion sampling method. It can effectively generate adversarial samples to improve the generalizability of the foundation model against unknown data distributions. 
The experimental results on different scales of vision-language pre-training datasets, including CC3M, CC12M, and LAION400M, have consistently shown State-of-the-Art performance and scalability of the proposed ED-SAM approach compared to the other recent methods.	翻訳日:2024-06-05 22:30:12 公開日:2024-06-03
# 非対称なカーネル学習を用いたカーネルリッジレス回帰の学習解析 Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning ( http://arxiv.org/abs/2406.01435v1 ) ライセンス: Link先を確認	Fan He, Mingzhen He, Lei Shi, Xiaolin Huang, Johan A. K. Suykens,	(参考訳) リッジレス回帰は研究者の間で注目を集めており、特に'Benign Overfitting' 現象に照らして、ノイズのあるサンプルを補間するモデルが堅牢な一般化を示す。しかしながら、カーネルのリッジレスレグレッションは、柔軟性の欠如のため、必ずしもうまく機能しない。本稿では,局所適応バンド幅(LAB)RBFカーネルを用いたカーネルリッジレス回帰を改良し,実験と理論の両方における性能向上を目的としたカーネル学習手法を取り入れた。初めて、LAB RBFカーネルから学んだ関数は、Reproducing Kernel Hilbert Spaces (RKHSs) の積分空間に属することを示した。提案モデルに明示的な正規化がないにもかかわらず、その最適化は RKHS の積分空間における$\ell_0$-regularized問題の解法と等価であり、一般化能力の起源を解明する。近似解析の観点から,提案モデルに対する学習率を軽度条件下で導出するために,$\ell_q$-norm解析手法($0<q<1$)を導入する。この結果は我々の理論的な理解を深め、我々のアルゴリズムの頑健な近似能力はRKHSの積分空間の容量が大きいことから生じると説明し、その一般化能力はサポートベクトルの数によって制御される疎性によって保証される。合成データと実データの両方の実験結果から, 理論的結論が得られた。 Ridgeless regression has garnered attention among researchers, particularly in light of the "Benign Overfitting" phenomenon, where models interpolating noisy samples demonstrate robust generalization. However, kernel ridgeless regression does not always perform well due to its lack of flexibility. This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels, incorporating kernel learning techniques to improve performance in both experiments and theory. For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducing Kernel Hilbert Spaces (RKHSs). Despite the absence of explicit regularization in the proposed model, its optimization is equivalent to solving an $\ell_0$-regularized problem in the integral space of RKHSs, elucidating the origin of its generalization ability. Taking an approximation analysis viewpoint, we introduce an $\ell_q$-norm analysis technique (with $0<q<1$) to derive the learning rate for the proposed model under mild conditions. 
This result deepens our theoretical understanding, explaining that our algorithm's robust approximation ability arises from the large capacity of the integral space of RKHSs, while its generalization ability is ensured by sparsity, controlled by the number of support vectors. Experimental results on both synthetic and real datasets validate our theoretical conclusions.	翻訳日:2024-06-05 22:30:12 公開日:2024-06-03
# ジャイアンツの心を編集する:大規模言語モデルにおける知識編集の落とし穴の詳細な探索 Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models ( http://arxiv.org/abs/2406.01436v1 ) ライセンス: Link先を確認	Cheng-Hsun Hsueh, Paul Kuo-Ming Huang, Tzu-Han Lin, Che-Wei Liao, Hung-Chieh Fang, Chao-Wei Huang, Yun-Nung Chen,	(参考訳) 知識編集は、パラメータの変更を最小限に抑えて、大規模言語モデル(LLM)における事実知識を効率的に更新する技術である。しかし、近年の研究では、知識の歪みや一般的な能力の劣化など、編集後に現れた副作用が特定されている。本調査は,これらの副作用を包括的に研究し,LLMにおける知識編集に関わる課題を統一的に考察する。関連研究を議論し、これらの限界を克服するための潜在的研究の方向性を要約する。本研究は,従来の知識編集手法の限界を強調し,LLMの内部知識構造をより深く理解する必要性を強調し,知識編集手法の改善を図っている。今後の研究を促進するため、私たちはhttps://github.com/MiuLab/EditLLM-Surveyで論文収集などの補完資料を公開しました。 Knowledge editing is a rising technique for efficiently updating factual knowledge in Large Language Models (LLMs) with minimal alteration of parameters. However, recent studies have identified concerning side effects, such as knowledge distortion and the deterioration of general abilities, that have emerged after editing. This survey presents a comprehensive study of these side effects, providing a unified view of the challenges associated with knowledge editing in LLMs. We discuss related works and summarize potential research directions to overcome these limitations. Our work highlights the limitations of current knowledge editing methods, emphasizing the need for deeper understanding of inner knowledge structures of LLMs and improved knowledge editing methods. To foster future research, we have released the complementary materials such as paper collection publicly at https://github.com/MiuLab/EditLLM-Survey	翻訳日:2024-06-05 22:30:12 公開日:2024-06-03
# Asynchronous Byzantine Federated Learning Asynchronous Byzantine Federated Learning ( http://arxiv.org/abs/2406.01438v1 ) ライセンス: Link先を確認	Bart Cox, Abele Mălan, Jérémie Decouchant, Lydia Y. Chen,	(参考訳) フェデレートラーニング(FL)は、地理的に分散した一連のクライアントが、サーバを通じてモデルを集合的に訓練することを可能にする。古典的には、トレーニングプロセスは同期的であるが、遅いクライアントや異種ネットワークで、その速度を維持するために非同期にすることができる。しかしながら、ビザンティンのフォールトトレラントFLシステムの大部分は同期トレーニングプロセスに依存している。私たちのソリューションは、補助的なサーバデータセットを必要とせず、以前の作業の欠点であるストラグラーによって遅延しない、最初のビザンチン耐性で非同期なFLアルゴリズムの1つである。直感的には、ソリューション内のサーバは最新モデルのクライアントから最小限のアップデートを受信して安全に更新するのを待ちます。我々は、勾配インバージョン、摂動、バックドアアタックによる画像およびテキストデータセットの最先端アルゴリズムと比較した。提案手法は, 従来の同期FLソリューションよりも高速にモデルを訓練し, 従来の非同期FLソリューションよりもビザンチンクライアントの存在下で, 摂動および勾配反転攻撃に対して最大1.54x, 1.75xの精度を維持した。 Federated learning (FL) enables a set of geographically distributed clients to collectively train a model through a server. Classically, the training process is synchronous, but can be made asynchronous to maintain its speed in presence of slow clients and in heterogeneous networks. The vast majority of Byzantine fault-tolerant FL systems however rely on a synchronous training process. Our solution is one of the first Byzantine-resilient and asynchronous FL algorithms that does not require an auxiliary server dataset and is not delayed by stragglers, which are shortcomings of previous works. Intuitively, the server in our solution waits to receive a minimum number of updates from clients on its latest model to safely update it, and is later able to safely leverage the updates that late clients might send. We compare the performance of our solution with state-of-the-art algorithms on both image and text datasets under gradient inversion, perturbation, and backdoor attacks. 
Our results indicate that our solution trains a model faster than previous synchronous FL solutions and, in the presence of Byzantine clients, maintains a higher accuracy than previous asynchronous FL solutions: up to 1.54x under perturbation attacks and up to 1.75x under gradient inversion attacks.	翻訳日:2024-06-05 22:20:28 公開日:2024-06-03
# 地理的分散クライアントのための非同期マルチサーバフェデレーション学習 Asynchronous Multi-Server Federated Learning for Geo-Distributed Clients ( http://arxiv.org/abs/2406.01439v1 ) ライセンス: Link先を確認	Yuncong Zuo, Bart Cox, Jérémie Decouchant, Lydia Y. Chen,	(参考訳) フェデレートラーニング(FL)システムは、複数のクライアントが単一のサーバで中間モデルの重みを同期的に交換することで、機械学習モデルを反復的にトレーニングすることができる。このようなFLシステムのスケーラビリティは、同期通信によるサーバアイドル時間と、ひとつのサーバがボトルネックになるリスクの2つの要因によって制限することができる。本稿では,完全に非同期な最初のマルチサーバFLシステムであるFLアーキテクチャを提案する。私たちのソリューションは、サーバとクライアントの両方を継続的にアクティブにします。従来のマルチサーバメソッドと同様に、クライアントは最も近いサーバとのみ対話し、モデルへの効率的なアップデート統合を保証する。しかし、異なることに、サーバは定期的に互いに非同期に更新し、クライアントとのやりとりを延期しない。我々は、MNISTとCIFAR-10の画像分類データセットとWikiText-2言語モデリングデータセットの3つの代表的なベースラインであるFedAvg、FedAsync、HierFAVGを比較した。我々のソリューションは、以前のベースラインと類似または高い精度レベルに収束し、地理的に分散した設定でそれを行うのに61%の時間を要する。 Federated learning (FL) systems enable multiple clients to train a machine learning model iteratively through synchronously exchanging the intermediate model weights with a single server. The scalability of such FL systems can be limited by two factors: server idle time due to synchronous communication and the risk of a single server becoming the bottleneck. In this paper, we propose a new FL architecture, to our knowledge, the first multi-server FL system that is entirely asynchronous, and therefore addresses these two limitations simultaneously. Our solution keeps both servers and clients continuously active. As in previous multi-server methods, clients interact solely with their nearest server, ensuring efficient update integration into the model. Differently, however, servers also periodically update each other asynchronously, and never postpone interactions with clients. We compare our solution to three representative baselines - FedAvg, FedAsync and HierFAVG - on the MNIST and CIFAR-10 image classification datasets and on the WikiText-2 language modeling dataset. Our solution converges to similar or higher accuracy levels than previous baselines and requires 61% less time to do so in geo-distributed settings.	
翻訳日:2024-06-05 22:20:28 公開日:2024-06-03
# 連続準周期駆動時のPXPモデルにおける予熱 Prethermalization in the PXP Model under Continuous Quasiperiodic Driving ( http://arxiv.org/abs/2406.01440v1 ) ライセンス: Link先を確認	Pinaki Dutta, Sayan Choudhury, Vishwanath Shukla,	(参考訳) 非周期的に駆動される量子多体系の長寿命非平衡状態を実現する最近の実験により、強いリドベルク閉塞状態における準周期的に駆動されるリドベルク原子鎖の力学を考察した。この体制では、システムは運動論的に制約され、 'PXP' モデルはその力学を記述する。運転なしでも、PXPモデルは、Néel-順序付き初期状態から発せられる力学に対して、多体スカーリングおよび結果として生じる持続的な振動を示す。システムに連続駆動を施すと,動的な動作の豊富な配列が出現することを示した。高周波系では、この系は周期駆動と準周期駆動の両方でNéel順序の初期状態の再生と振動を示す。我々は、この非エルゴディディティの起源を、この体制におけるこれらの駆動プロトコルの両方に対して効果的なPXPハミルトニアンに遡る。さらに,高振幅状態下での非単調な非単調性を示す。これは、Néel-次数と完全偏極初期状態の両方に対して、いくつかの再帰的スカーリング遷移をもたらす。この結果から, 連続準周期駆動プロトコルは, 速度論的に制約された系において, 物質の予熱相を実現するための有望な経路を導出できることが示唆された。 Motivated by recent experiments realizing long-lived non-equilibrium states in aperiodically driven quantum many-body systems, we investigate the dynamics of a quasiperiodically driven Rydberg atom chain in the strong Rydberg blockade regime. In this regime, the system is kinetically constrained and the 'PXP' model describes its dynamics. Even without driving, the PXP model exhibits many-body scarring and resultant persistent oscillations for dynamics originating from the Néel-ordered initial state. We demonstrate that a rich array of dynamical behaviors emerges when the system is subjected to a continuous drive. In the high-frequency regime, the system exhibits revivals and oscillations for the Néel-ordered initial state for both periodic and quasi-periodic drives. We trace the origin of this non-ergodicity to an effective PXP Hamiltonian for both of these driving protocols in this regime. Furthermore, we demonstrate that the behavior of the fidelity and the entanglement entropy is non-monotonic at low frequencies in the high-amplitude regime. This leads to several re-entrant scarring transitions for both the Néel-ordered and the fully polarized initial state. 
Our results demonstrate that continuous quasi-periodic drive protocols can provide a promising route to realize prethermal phases of matter in kinetically constrained systems.	翻訳日:2024-06-05 22:20:27 公開日:2024-06-03
# LexMatcher: LLMを用いた機械翻訳のための辞書中心のデータ収集 LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation ( http://arxiv.org/abs/2406.01441v1 ) ライセンス: Link先を確認	Yongjing Yin, Jiali Zeng, Yafu Li, Fandong Meng, Yue Zhang,	(参考訳) 機械翻訳のためのオープンソースの大規模言語モデル(LLM)の微調整が最近注目され、従来のニューラルネットワーク翻訳からデータ中心の研究へとシフトした。しかし、機械翻訳における微調整のためのデータ収集の領域は、いまだに未探索である。本稿では,バイリンガル辞書を利用してデータセットを生成する簡易かつ効果的なデータ収集手法であるLexMatcherについて述べる。データセットは、既存のコーパスから取得したサブセットと、多文語の頻繁な感覚を補うより小さな合成サブセットとを含む。提案手法は,LLaMA2をベースモデルとして,WMT2022テストセットの確立したベースラインよりも優れ,単語感覚の曖昧さや専門用語の翻訳に関わるタスクにおいて,大幅な性能向上を示す。これらの結果は、LxMatcherがLLMベースの機械翻訳の強化に有効であることを示す。 The fine-tuning of open-source large language models (LLMs) for machine translation has recently received considerable attention, marking a shift towards data-centric research from traditional neural machine translation. However, the area of data collection for instruction fine-tuning in machine translation remains relatively underexplored. In this paper, we present LexMatcher, a simple yet effective method for data collection that leverages bilingual dictionaries to generate a dataset, the design of which is driven by the coverage of senses found in these dictionaries. The dataset comprises a subset retrieved from an existing corpus and a smaller synthesized subset which supplements the infrequent senses of polysemous words. Utilizing LLaMA2 as our base model, our approach outperforms the established baselines on the WMT2022 test sets and also exhibits significant performance improvements in tasks related to word sense disambiguation and specialized terminology translation. These results underscore the effectiveness of LexMatcher in enhancing LLM-based machine translation.	翻訳日:2024-06-05 22:20:27 公開日:2024-06-03
# 低リソース言語のためのASRの実装:包括的データセット作成アプローチ Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach ( http://arxiv.org/abs/2406.01446v1 ) ライセンス: Link先を確認	Ara Yeroyan, Nikolay Karpov,	(参考訳) 近年, 音声認識システム(ASR)は, 特に大量の音声データを持つ言語において, 大幅に改善されている。しかし、ASRシステムは少数言語や地域言語のようなリソースが少ない低リソース言語では性能が劣る傾向にある。この研究では、オーディオブックからASRトレーニングデータセットを生成するために設計された、新しいパイプラインを紹介した。これらのオーディオブックの共通構造は、オーディオセグメントの幅が広いため、ユニークな課題となっているが、最適なASRトレーニングには4秒から15秒のセグメントが必要である。そこで本研究では,音声を対応するテキストと効果的に整合させ,それをASR訓練に適した長さに分割する手法を提案する。本稿では,低リソース言語におけるASRシステムのデータ準備を簡略化し,アルメニア語を含むケーススタディを通じてその応用を実証する。提案手法は,データ不足の問題を緩和するだけでなく,表現不足言語に対するASRモデルの性能向上にも寄与する。 In recent years, automatic speech recognition (ASR) systems have significantly improved, especially in languages with a vast amount of transcribed speech data. However, ASR systems tend to perform poorly for low-resource languages with fewer resources, such as minority and regional languages. This study introduces a novel pipeline designed to generate ASR training datasets from audiobooks, which typically feature a single transcript associated with hours-long audios. The common structure of these audiobooks poses a unique challenge due to the extensive length of audio segments, whereas optimal ASR training requires segments ranging from 4 to 15 seconds. To address this, we propose a method for effectively aligning audio with its corresponding text and segmenting it into lengths suitable for ASR training. Our approach simplifies data preparation for ASR systems in low-resource languages and demonstrates its application through a case study involving the Armenian language. Our method, which is "portable" to many low-resource languages, not only mitigates the issue of data scarcity but also enhances the performance of ASR models for underrepresented languages.	翻訳日:2024-06-05 22:20:27 公開日:2024-06-03
# 非退化4レベル原子系による円筒型ベクトルビームの線形及び非線形伝播 Linear and nonlinear propagation of cylindrical vector beam through a non-degenerate four level atomic system ( http://arxiv.org/abs/2406.01447v1 ) ライセンス: Link先を確認	Partha Das, Tarak Nath Dey,	(参考訳) 原子系におけるプローブベクトルビーム(PVB)の両成分の相依存性について検討した。原子は非縮退した4レベル構成で合成される。遷移は、$\pi$の偏光制御場とPVBの直交偏光成分によって結合される。媒体の線形感受性は制御フィールドとPVBの位相シフトに依存し,損失や利得を特徴づけることを示す。さらに、位相シフトは、ベクトルビーム(VB)が伝播するときに偏光回転を引き起こす。さらに, 2 つのRayleigh 長さの媒質を経由した VB 伝播に及ぼす非線形性の影響について検討した。自己焦点および脱離現象は、半径、方位、渦巻VBに対して観察される。特別な鎖状自己焦点と脱離は、適度な利得を持つ連続的な小さなスポットサイズを形成する。したがって、感受性と自己焦点の制御のメカニズムは、吸収体から増幅器への遷移、高分解能顕微鏡、光トラップシステムといった応用の可能性を秘めている。 We investigate the phase-induced susceptibilities for both components of the probe vector beam (PVB) within an atomic system. The atoms are prepared in a non-degenerate four-level configuration. The transitions are coupled by a $\pi$ polarized control field and two orthogonally polarized components of a PVB. We show that the linear susceptibility of the medium depends on the phase shift between the control field and PVB, characterizing loss or gain in the system. Additionally, the phase shift causes polarization rotation in the vector beams (VBs) as they propagate. We further study the effect of nonlinearity on the VB propagation through the medium for a couple of Rayleigh lengths. The self-focusing and defocusing phenomena are observed for radial, azimuthal, and spiral VBs. The special chain-like self-focusing and defocusing leads to the formation of consecutive smaller spot sizes with moderate gain. Therefore, the mechanism of control of susceptibility and self-focusing may hold promise for applications such as transitioning from an absorber to an amplifier, high-resolution microscopy, and optical trap systems.	翻訳日:2024-06-05 22:20:27 公開日:2024-06-03
# 固有状態熱化理論 Theory of Eigenstate Thermalisation ( http://arxiv.org/abs/2406.01448v1 ) ライセンス: Link先を確認	Tobias Helbig, Tobias Hofmann, Ronny Thomale, Martin Greiter,	(参考訳) 孤立して相互作用する量子系を固有状態に準備し、初期時に局所観測可能を摂動すると、その期待値は、系の時間進化が決定論的であるにもかかわらず、熱期待値に向かって緩和される。 Deutsch と Srednicki の固有状態熱化仮説 (ETH) は、全量子系の固有状態がそのサブシステムへの熱浴として機能し、サブシステムの密度行列が熱密度行列に類似していることを示唆している。ここでは、相互作用する量子系の固有値分布は、非常に一般的な状況下ではガウス的であり、ダイソン・ブラウン運動ランダム行列論(英語版)は、ETHを導出し、仮説から理論へ高める。我々の分析は、エルゴード性や典型性の概念やエントロピーの概念を必要としない統計力学の導出を提供する。熱力学平衡は、大系への量子力学の適用性と積分性の欠如にのみ従う。 If we prepare an isolated, interacting quantum system in an eigenstate and perturb a local observable at an initial time, its expectation value will relax towards a thermal expectation value, even though the time evolution of the system is deterministic. The eigenstate thermalization hypothesis (ETH) of Deutsch and Srednicki suggests that this is possible because each eigenstate of the full quantum system acts as a thermal bath to its subsystems, such that the reduced density matrices of the subsystems resemble thermal density matrices. Here, we use the observation that the eigenvalue distribution of interacting quantum systems is a Gaussian under very general circumstances, and Dyson Brownian motion random matrix theory, to derive the ETH and thereby elevate it from hypothesis to theory. Our analysis provides a derivation of statistical mechanics which neither requires the concepts of ergodicity or typicality, nor that of entropy. Thermodynamic equilibrium follows solely from the applicability of quantum mechanics to large systems and the absence of integrability.	翻訳日:2024-06-05 22:20:27 公開日:2024-06-03
# SLANT:Spurious Logo Analysis Toolkit SLANT: Spurious Logo ANalysis Toolkit ( http://arxiv.org/abs/2406.01449v1 ) ライセンス: Link先を確認	Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer,	(参考訳) オンラインコンテンツは、広告やソーシャルメディアの投稿からウェブサイトのブランディングや製品の配置まで、ロゴでいっぱいだ。その結果、これらのロゴは、広範囲なタスク(コンテンツモデレーション、オブジェクト分類)に使用されるビジョン・ランゲージ・モデル(Vision-Language Models)の事前トレーニングに使用される広範囲なWebスクラッドデータセットで広く使われている。これらのモデルは様々なタスクにおいて有害な相関関係を学習することが示されているが、これらの相関関係がロゴを含むかどうかはまだ調査されていない。このことを理解することは、ブランドや政府機関のような公共向け機関でよく使われているロゴのため、特に重要である。そこで我々はSLANT: A Spurious Logo ANalysis Toolkitを開発した。例えば、人の写真にAdidasのロゴを追加すると、モデルがその人物を欲張りと分類する。 SLANTには、このような「すっきりとした」ロゴをマイニングするための半自動メカニズムが含まれている。この仕組みは、総合的なロゴバンクCC12M-LogoBankと、VLMがユーザが提供する下流認識ターゲットと急激な相関関係を持つロゴを銀行に検索するアルゴリズムで構成されている。 VLモデルと相関するさまざまな無害なロゴを発見 1)陰性な人形容詞 2)「無害」の概念により、有害なオンラインコンテンツを無害と誤分類させ、 3) ImageNetゼロショット分類では認識精度が低い。さらに、SLANTのロゴは、基本的なモデルに対する効果的な攻撃と見なすことができ、攻撃者は有害なコンテンツに刺激的なロゴを配置することができ、モデルが無害であると誤分類する原因となった。この脅威は、ロゴアタックの単純さを考慮して警戒されており、VLモデルのアタックサーフェスを増加させている。防御として、基礎モデルのゼロショット推論とシームレスに統合する2つの効果的な緩和戦略をツールキットに含めています。 Online content is filled with logos, from ads and social media posts to website branding and product placements. Consequently, these logos are prevalent in the extensive web-scraped datasets used to pretrain Vision-Language Models, which are used for a wide array of tasks (content moderation, object classification). While these models have been shown to learn harmful correlations in various tasks, whether these correlations include logos remains understudied. Understanding this is especially important due to logos often being used by public-facing entities like brands and government agencies. To that end, we develop SLANT: A Spurious Logo ANalysis Toolkit. Our key finding is that some logos indeed lead to spurious incorrect predictions; for example, adding the Adidas logo to a photo of a person causes a model to classify the person as greedy. SLANT contains a semi-automatic mechanism for mining such "spurious" logos. 
The mechanism consists of a comprehensive logo bank, CC12M-LogoBank, and an algorithm that searches the bank for logos that VLMs spuriously correlate with a user-provided downstream recognition target. We uncover various seemingly harmless logos that VL models correlate 1) with negative human adjectives, 2) with the concept of 'harmlessness', causing models to misclassify harmful online content as harmless, and 3) with user-provided object concepts, causing lower recognition accuracy on ImageNet zero-shot classification. Furthermore, SLANT's logos can be seen as effective attacks against foundation models: an attacker could place a spurious logo on harmful content, causing the model to misclassify it as harmless. This threat is alarming given the simplicity of logo attacks, which increases the attack surface of VL models. As a defense, we include in our toolkit two effective mitigation strategies that seamlessly integrate with zero-shot inference of foundation models.	翻訳日:2024-06-05 22:20:27 公開日:2024-06-03
# SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation ( http://arxiv.org/abs/2406.01451v1 ) ライセンス: Link先を確認	Danni Yang, Jiayi Ji, Yiwei Ma, Tianyu Guo, Haowei Wang, Xiaoshuai Sun, Rongrong Ji,	(参考訳) 本稿では、ラベル付きデータとラベルなしデータの組み合わせを効果的に活用してRESを実行する半教師付きフレームワークであるSemiRESを紹介する。 RESに半教師付き技法を適用する際の重要なハードルは、特に物体の境界において、ノイズの多い擬似ラベルの出現である。 SemiRESはSegment Anything Model (SAM) を組み込み、これらの擬似ラベルの精度を向上させる。 SemiRES内では、IoUベースの最適マッチング(IOM)と複合部品統合(CPI)の2つの代替マッチング戦略を提供する。これらの戦略はSAMの出力から最も正確なマスクを抽出し、より精度の高い学生モデルのトレーニングを導くように設計されている。利用可能な候補と正確なマスクが一致しない場合,Pixel-Wise Adjustment(PWA)戦略を開発し,学生モデルのトレーニングを擬似ラベルで直接指導する。 RefCOCO、RefCOCO+、G-Refの3つのRESベンチマークの大規模な実験は、完全に教師された手法に比べて優れたパフォーマンスを示している。注目すべきは、1%のラベル付きデータしか持たないSemiRESは、RefCOCO valセットにおいて、教師付きベースラインを大きなマージンで上回り、eg + 18.64%のゲインを達成していることだ。プロジェクトのコードは \url{https://github.com/nini0919/SemiRES} で公開されている。 In this paper, we introduce SemiRES, a semi-supervised framework that effectively leverages a combination of labeled and unlabeled data to perform RES. A significant hurdle in applying semi-supervised techniques to RES is the prevalence of noisy pseudo-labels, particularly at the boundaries of objects. SemiRES incorporates the Segment Anything Model (SAM), renowned for its precise boundary demarcation, to improve the accuracy of these pseudo-labels. Within SemiRES, we offer two alternative matching strategies: IoU-based Optimal Matching (IOM) and Composite Parts Integration (CPI). These strategies are designed to extract the most accurate masks from SAM's output, thus guiding the training of the student model with enhanced precision. In instances where a precise mask cannot be matched from the available candidates, we develop the Pixel-Wise Adjustment (PWA) strategy, guiding the student model's training directly by the pseudo-labels. 
Extensive experiments on three RES benchmarks (RefCOCO, RefCOCO+, and G-Ref) reveal its superior performance compared to fully supervised methods. Remarkably, with only 1% labeled data, our SemiRES outperforms the supervised baseline by a large margin, e.g., a gain of +18.64% on the RefCOCO val set. The project code is available at https://github.com/nini0919/SemiRES.	翻訳日:2024-06-05 22:20:27 公開日:2024-06-03
# 植物同定のための自動融合型マルチモーダル深層学習 Automatic Fused Multimodal Deep Learning for Plant Identification ( http://arxiv.org/abs/2406.01455v1 ) ライセンス: Link先を確認	Alfreds Lapkovskis, Natalia Nefedova, Ali Beikmohammadi,	(参考訳) 植物分類は, 生態系の保全と農業の生産性, 植物の成長動態の理解の向上, 種保全支援に不可欠である。ディープラーニング(DL)技術の出現は、自律的な特徴抽出を可能にし、手作業の専門知識への依存を大幅に減らし、この分野に革命をもたらした。しかし、従来のDLモデルは単一のデータソースのみに依存しており、植物種の完全な生物学的多様性を包括的に捉えていないことが多い。最近の研究は、植物の特徴の表現を豊かにする複数のデータ型を統合することで、この制限を克服するマルチモーダル学習に転換している。このシフトは、モダリティ融合の最適点を決定するという課題をもたらす。本稿では,自動モダリティ融合を用いた植物分類における先駆的マルチモーダルDLに基づくアプローチを提案する。マルチモーダル・フュージョン・アーキテクチャー・サーチを用いて,複数の植物器官の花,葉,果実,茎のイメージを凝集モデルに統合する。 PlantCLEF2015データセットの956クラスに対して83.48%の精度を達成し、最先端の手法を超越した。後期融合よりも11.07%優れ、欠落したモダリティに対してより堅牢である。我々は、標準的なパフォーマンス指標とMcNemarのテストを用いて、確立されたベンチマークに対してモデルを検証し、その優位性をさらに強調する。 Plant classification is vital for ecological conservation and agricultural productivity, enhancing our understanding of plant growth dynamics and aiding species preservation. The advent of deep learning (DL) techniques has revolutionized this field by enabling autonomous feature extraction, significantly reducing the dependence on manual expertise. However, conventional DL models often rely solely on single data sources, failing to capture the full biological diversity of plant species comprehensively. Recent research has turned to multimodal learning to overcome this limitation by integrating multiple data types, which enriches the representation of plant characteristics. This shift introduces the challenge of determining the optimal point for modality fusion. In this paper, we introduce a pioneering multimodal DL-based approach for plant classification with automatic modality fusion. Utilizing the multimodal fusion architecture search, our method integrates images from multiple plant organs-flowers, leaves, fruits, and stems-into a cohesive model. Our method achieves 83.48% accuracy on 956 classes of the PlantCLEF2015 dataset, surpassing state-of-the-art methods. 
It outperforms late fusion by 11.07% and is more robust to missing modalities. We validate our model against established benchmarks using standard performance metrics and McNemar's test, further underscoring its superiority.	翻訳日:2024-06-05 22:20:27 公開日:2024-06-03
# 大規模言語モデルを用いた微分プライベートタブラルデータ合成 Differentially Private Tabular Data Synthesis using Large Language Models ( http://arxiv.org/abs/2406.01457v1 ) ライセンス: Link先を確認	Toan V. Tran, Li Xiong,	(参考訳) 差分プライバシを持つ合成表データ生成は、正式なプライバシを持つデータ共有を実現する上で重要な問題である。方法論的な研究と開発の歴史は豊富だが、現実的な合成データセットを提供するための、微分的にプライベートな表型データジェネレータを開発することは、依然として困難である。本稿ではDP-LLMTGenについて述べる。DP-LLMTGenは、事前学習された大規模言語モデル(LLM)を利用する、微分プライベートな表形式データ合成のための新しいフレームワークである。 DP-LLMTGenは、2段階の微調整手順と表データに特化して設計された新しい損失関数を用いて、センシティブなデータセットをモデル化する。その後、微調整LDMをサンプリングして合成データを生成する。我々の経験的評価は、DP-LLMTGenが複数のデータセットとプライバシ設定にまたがる様々な既存のメカニズムより優れていることを示している。さらに、この重要な問題に対処する上で、LLMの理解を深めるために、アブレーション研究といくつかの実験的分析を行う。最後に,DP-LLMTGenの制御可能な生成能力を,公平性に制約された生成設定により強調する。 Synthetic tabular data generation with differential privacy is a crucial problem to enable data sharing with formal privacy. Despite a rich history of methodological research and development, developing differentially private tabular data generators that can provide realistic synthetic datasets remains challenging. This paper introduces DP-LLMTGen -- a novel framework for differentially private tabular data synthesis that leverages pretrained large language models (LLMs). DP-LLMTGen models sensitive datasets using a two-stage fine-tuning procedure with a novel loss function specifically designed for tabular data. Subsequently, it generates synthetic data through sampling the fine-tuned LLMs. Our empirical evaluation demonstrates that DP-LLMTGen outperforms a variety of existing mechanisms across multiple datasets and privacy settings. Additionally, we conduct an ablation study and several experimental analyses to deepen our understanding of LLMs in addressing this important problem. Finally, we highlight the controllable generation ability of DP-LLMTGen through a fairness-constrained generation setting.	翻訳日:2024-06-05 22:20:27 公開日:2024-06-03
# マニフォールド仮説に基づくニューラルネットワーク学習の難しさ Hardness of Learning Neural Networks under the Manifold Hypothesis ( http://arxiv.org/abs/2406.01461v1 ) ライセンス: Link先を確認	Bobak T. Kiani, Jason Wang, Melanie Weber,	(参考訳) 多様体仮説は、高次元データが低次元多様体上または近辺にあると仮定する。幾何学的構造を符号化する実用性は実証的に実証されているが、ニューラルネットワークの学習性に対するその影響の厳密な分析はほとんど欠落している。いくつかの最近の結果は、ガウス的あるいは均一なブールデータ分布の下でフィードフォワードおよび同変ニューラルネットワークを学習するための硬度結果を確立している。本稿では,多様体仮説に基づく学習の難しさについて考察する。多様体の曲率と正則性に関する最小の仮定を問うが、もしある場合、学習問題を効率的に学習できる。我々は、SQにおける硬さの証明とBooleanデータ入力の暗号設定を幾何学的設定に拡張することにより、有界曲率の入力多様体の下で学習が難しいことを証明した。一方、データ多様体の体積に関する仮定は、これらの基本的な制限を緩和し、単純な補間引数を通して学習可能性を保証する。この状態の顕著な例は多様体の学習を通じて確実に再構成できる多様体である。今後は、実世界のデータによく見られる不均一な特徴を持つ多様体の中間規則についてコメントし、実証的に検討する。 The manifold hypothesis presumes that high-dimensional data lies on or near a low-dimensional manifold. While the utility of encoding geometric structure has been demonstrated empirically, rigorous analysis of its impact on the learnability of neural networks is largely missing. Several recent results have established hardness results for learning feedforward and equivariant neural networks under i.i.d. Gaussian or uniform Boolean data distributions. In this paper, we investigate the hardness of learning under the manifold hypothesis. We ask which minimal assumptions on the curvature and regularity of the manifold, if any, render the learning problem efficiently learnable. We prove that learning is hard under input manifolds of bounded curvature by extending proofs of hardness in the SQ and cryptographic settings for Boolean data inputs to the geometric setting. On the other hand, we show that additional assumptions on the volume of the data manifold alleviate these fundamental limitations and guarantee learnability via a simple interpolation argument. Notable instances of this regime are manifolds which can be reliably reconstructed via manifold learning. 
Looking forward, we comment on and empirically explore intermediate regimes of manifolds, which have heterogeneous features commonly found in real-world data.	翻訳日:2024-06-05 22:20:27 公開日:2024-06-03
# 被被覆レンズによる選好微調整の理解 Understanding Preference Fine-Tuning Through the Lens of Coverage ( http://arxiv.org/abs/2406.01462v1 ) ライセンス: Link先を確認	Yuda Song, Gokul Swamy, Aarti Singh, J. Andrew Bagnell, Wen Sun,	(参考訳) 人間の嗜好データからの学習が,大規模言語モデル (LLM) を微調整する主要なパラダイムとして浮上している。 PPO(Proximal Policy Optimization)のようなオンライン強化学習(RL)と、DPO(Direct Preference Optimization)のようなオフラインのコントラスト的手法は、どちらも同一のオフライン優先データセットから開始する必要があるため、以前の作業では同等と位置づけられていた。選好微調整のためのオンラインとオフラインの技法の類似点と相違点に関する理論的理解をさらに深めるため、データセットカバレッジのレンズを通して厳密な分析を行い、トレーニングデータがテスト分布をどのようにカバーしているかを捉え、RLで広く使われている概念である。グローバルなカバレッジ条件は,オフラインのコントラスト手法が最適ポリシーに収束するのに必要かつ十分であることを示すが,オンラインRL手法ではより弱い部分カバレッジ条件で十分である。この分離によって、オンラインRLメソッドがオフラインメソッドよりも優れたパフォーマンスを得られる理由が説明できる。最後に, 従来の理論的観測をベースとして, オフラインデータをコントラッシブな選好最適化に用いるハイブリッド選好最適化(HyPO)アルゴリズムと, KL正則化のためのオンラインデータを導出する。理論的かつ実証的に、HyPOは純粋なオフラインのDPOよりも高性能でありながら、その計算とメモリ効率を保っていることを実証する。 Learning from human preference data has emerged as the dominant paradigm for fine-tuning large language models (LLMs). The two most common families of techniques -- online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline contrastive methods such as Direct Preference Optimization (DPO) -- were positioned as equivalent in prior work due to the fact that both have to start from the same offline preference dataset. To further expand our theoretical understanding of the similarities and differences between online and offline techniques for preference fine-tuning, we conduct a rigorous analysis through the lens of dataset coverage, a concept that captures how the training data covers the test distribution and is widely used in RL. We prove that a global coverage condition is both necessary and sufficient for offline contrastive methods to converge to the optimal policy, but a weaker partial coverage condition suffices for online RL methods. 
This separation provides one explanation of why online RL methods can perform better than offline methods, especially when the offline preference data is not diverse enough. Finally, motivated by our preceding theoretical observations, we derive a hybrid preference optimization (HyPO) algorithm that uses offline data for contrastive-based preference optimization and online data for KL regularization. Theoretically and empirically, we demonstrate that HyPO is more performant than its pure offline counterpart DPO, while still preserving its computation and memory efficiency.	翻訳日:2024-06-05 22:20:27 公開日:2024-06-03
# RaDe-GS: ガウシアン・スティングの深さをラスタライズ RaDe-GS: Rasterizing Depth in Gaussian Splatting ( http://arxiv.org/abs/2406.01467v1 ) ライセンス: Link先を確認	Baowen Zhang, Chuan Fang, Rakesh Shrestha, Yixun Liang, Xiaoxiao Long, Ping Tan,	(参考訳) Gaussian Splatting (GS) は、高品質でリアルタイムなレンダリングを実現するために、新しいビュー合成に非常に効果的であることが証明されている。しかし, 詳細な3次元形状を復元する可能性については, 十分に調査されていない。既存の方法はしばしば、形状抽出を複雑にするガウススプレートの離散的かつ非構造的な性質のために、限られた形状精度に悩まされる。 2D GSのような最近の技術は形状再構成の改善を試みているが、レンダリング品質と計算効率の両方を下げる方法でガウス原始を再構成することが多い。これらの問題に対処するため,本研究では,一般の3次元ガウススプラットの深度マップと表面正規写像をレンダリングするラスタ化手法を提案する。提案手法は形状復元精度を大幅に向上させるだけでなく,ガウススプラッティングに固有の計算効率も維持する。提案手法は,DTUデータセット上でのNeuraLangeloに匹敵するチャムファー距離誤差と,タンク&テンプルデータセット上での従来のガウススプラッティングと同様のトレーニングとレンダリング時間を実現する。本手法はガウススプラッティングにおける重要な進歩であり,既存のガウススプラッティング法に直接組み込むことができる。 Gaussian Splatting (GS) has proven to be highly effective in novel view synthesis, achieving high-quality and real-time rendering. However, its potential for reconstructing detailed 3D shapes has not been fully explored. Existing methods often suffer from limited shape accuracy due to the discrete and unstructured nature of Gaussian splats, which complicates the shape extraction. While recent techniques like 2D GS have attempted to improve shape reconstruction, they often reformulate the Gaussian primitives in ways that reduce both rendering quality and computational efficiency. To address these problems, our work introduces a rasterized approach to render the depth maps and surface normal maps of general 3D Gaussian splats. Our method not only significantly enhances shape reconstruction accuracy but also maintains the computational efficiency intrinsic to Gaussian Splatting. Our approach achieves a Chamfer distance error comparable to NeuraLangelo on the DTU dataset and similar training and rendering time as traditional Gaussian Splatting on the Tanks & Temples dataset. Our method is a significant advancement in Gaussian Splatting and can be directly integrated into existing Gaussian Splatting-based methods.	
翻訳日:2024-06-05 22:20:27 公開日:2024-06-03
# 出力埋め込みにおけるトークン確率エンコーディングの理解 Understanding Token Probability Encoding in Output Embeddings ( http://arxiv.org/abs/2406.01468v1 ) ライセンス: Link先を確認	Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue,	(参考訳) 本稿では,言語モデルの出力埋め込みにおける出力トークン確率情報について検討する。出力埋め込みベクトル内の出力トークン確率の近似共通対数線形符号化を行い、出力空間が大きく、出力ロジットが集中している場合に、それが正確でスパースであることを示す。このような結果に基づいて,出力の埋め込みにおける符号化を編集し,出力確率分布を正確に修正する。さらに、出力確率エンコーディングにおける空間性は、出力埋め込みにおける多数の次元が因果言語モデリングに寄与しないことを示唆している。したがって、出力非関連次元を除去し、出力分布やシーケンス生成のデジェネレーションに大きな動きを伴わずに、30%以上の次元を削除できることを確かめる。さらに、トレーニング力学において、そのようなエンコーディングをプローブとして使用し、明らかな収束が始まる前の初期段階において、出力埋め込みがトークンの周波数情報をキャプチャするのを見つける。 In this paper, we investigate the output token probability information in the output embedding of language models. We provide an approximate common log-linear encoding of output token probabilities within the output embedding vectors and demonstrate that it is accurate and sparse when the output space is large and output logits are concentrated. Based on such findings, we edit the encoding in output embedding to modify the output probability distribution accurately. Moreover, the sparsity we find in output probability encoding suggests that a large number of dimensions in the output embedding do not contribute to causal language modeling. Therefore, we attempt to delete the output-unrelated dimensions and find more than 30% of the dimensions can be deleted without significant movement in output distribution and degeneration on sequence generation. Additionally, in training dynamics, we use such encoding as a probe and find that the output embeddings capture token frequency information in early steps, even before an obvious convergence starts.	翻訳日:2024-06-05 22:10:43 公開日:2024-06-03
# 検索空間の拡大と全変動を考慮した断層画像再構成と正規化 Tomographic Reconstruction and Regularisation with Search Space Expansion and Total Variation ( http://arxiv.org/abs/2406.01469v1 ) ライセンス: Link先を確認	Mohammad Majid al-Rifaie, Tim Blackwell,	(参考訳) 画像再構成におけるレイプロジェクションの使用は、医用画像の一般的な技術である。不完全なデータの処理は、患者が潜在的に放射線を損傷したり、長いスキャン時間に対処できない場合に特に重要である。本稿では,問題を最適化タスクに再構成し,さらに画像空間内を粒子が移動する高度アンサンプデータからSwarmベースの再構成を用いて再構成誤差を最小化する。最近導入された探索空間拡張技術に加えて,よりスムースなプロセスである全変分正規化も適応し,検討した。提案手法は, 標準トモグラフィ再構成ツールボックスアルゴリズムよりも低い再生誤差を生じさせるとともに, 臨床的に重要なShepp-Loganファントムの高次元オプティマイザの1つである。 The use of ray projections to reconstruct images is a common technique in medical imaging. Dealing with incomplete data is particularly important when a patient is vulnerable to potentially damaging radiation or is unable to cope with the long scanning time. This paper reformulates the problem as an optimisation task and then applies a swarm-based reconstruction from highly undersampled data, where particles move in image space in an attempt to minimise the reconstruction error. The process is prone to noise and, in addition to the recently introduced search space expansion technique, a further smoothing process, total variation regularisation, is adapted and investigated. The proposed method is shown to produce lower reproduction errors compared to standard tomographic reconstruction toolbox algorithms as well as one of the leading high-dimensional optimisers on the clinically important Shepp-Logan phantom.	翻訳日:2024-06-05 22:10:43 公開日:2024-06-03
# 雑音測定による絡み合った状態の検証 Verification of entangled states under noisy measurements ( http://arxiv.org/abs/2406.01470v1 ) ライセンス: Link先を確認	Lan Zhang, Yinfei Li, Ye-Chao Liu, Jiangwei Shang,	(参考訳) 絡み合いは、多くの量子情報や量子計算タスクにおいて欠かせない役割を担い、絡み合った状態を効率的に検証する必要性を浮き彫りにする。近年、量子状態検証が注目されているが、このアプローチを実装する際のノイズ効果に対処する上での課題は未解決のままである。本研究では,計測ノイズの存在下での量子状態検証プロトコルの性能を系統的に評価する。この分析に基づいて、ノイズ測定の下でターゲット状態を一意に識別するために必要かつ十分な条件が提供される。さらに,雑音測定を用いた対称仮説試験検証アルゴリズムを提案する。その後、GHZと安定化器状態のノイズ非適応検証戦略を用いて、検証効率に対するノイズ効果を示す。解析的および数値的両面から、ノイズ検証プロトコルは、サンプルの複雑さと不確かさの間に負の二次的関係を示すことを示す。提案手法は実実験環境に容易に適用でき,将来性を示すことができる。 Entanglement plays an indispensable role in numerous quantum information and quantum computation tasks, underscoring the need for efficiently verifying entangled states. In recent years, quantum state verification has received increasing attention, yet the challenge of addressing noise effects in implementing this approach remains unsolved. In this work, we provide a systematic assessment of the performance of quantum state verification protocols in the presence of measurement noise. Based on the analysis, a necessary and sufficient condition is provided to uniquely identify the target state under noisy measurements. Moreover, we propose a symmetric hypothesis testing verification algorithm with noisy measurements. Subsequently, using a noisy nonadaptive verification strategy of GHZ and stabilizer states, the noise effects on the verification efficiency are illustrated. From both analytical and numerical perspectives, we demonstrate that the noisy verification protocol exhibits a negative quadratic relationship between the sample complexity and the infidelity. Our method can be easily applied to real experimental settings, thereby demonstrating its promising prospects.	翻訳日:2024-06-05 22:10:43 公開日:2024-06-03
# 多要素機械学習アンサンブルフレームワークと高スループットフェムト秒レーザー処理によるInconel上のフォトニック面の逆設計 Inverse design of photonic surfaces on Inconel via multi-fidelity machine learning ensemble framework and high throughput femtosecond laser processing ( http://arxiv.org/abs/2406.01471v1 ) ライセンス: Link先を確認	Luka Grbcic, Minok Park, Mahmoud Elzouka, Ravi Prasher, Juliane Müller, Costas P. Grigoropoulos, Sean D. Lubner, Vassilia Zorba, Wibe Albert de Jong,	(参考訳) 我々は、高スループットフェムト秒レーザー処理を用いて作製した11,759個のサンプルのデータセットに基づいて、フォトニック表面の逆設計のためのMF(Multi-fidelity)機械学習アンサンブルフレームワークを実演する。 MFアンサンブルは、設計ソリューションを生成するための初期低忠実度モデルと、これらのソリューションを局所最適化によって洗練する高忠実度モデルを組み合わせる。組み合わせられたMFアンサンブルは、複数の異なるレーザー処理パラメータを生成でき、それぞれが高い精度で同じターゲットの入力スペクトル放射率(ルート平均2乗誤差<2%)を生成できる。 SHapley Additive exPlanations解析は、レーザーパラメータと分光放射率の複雑な関係の透過的なモデル解釈可能性を示している。最後に、MFアンサンブルは、効率エネルギー回収装置の改善のために生成するフォトニック表面の設計を作製し、評価することによって実験的に検証される。本手法は, エネルギー収穫への応用において, フォトニック表面の逆設計を推し進めるための強力なツールを提供する。 We demonstrate a multi-fidelity (MF) machine learning ensemble framework for the inverse design of photonic surfaces, trained on a dataset of 11,759 samples that we fabricate using high throughput femtosecond laser processing. The MF ensemble combines an initial low fidelity model for generating design solutions, with a high fidelity model that refines these solutions through local optimization. The combined MF ensemble can generate multiple disparate sets of laser-processing parameters that can each produce the same target input spectral emissivity with high accuracy (root mean squared errors < 2%). SHapley Additive exPlanations analysis shows transparent model interpretability of the complex relationship between laser parameters and spectral emissivity. Finally, the MF ensemble is experimentally validated by fabricating and evaluating photonic surface designs that it generates for improved efficiency energy harvesting devices. Our approach provides a powerful tool for advancing the inverse design of photonic surfaces in energy harvesting applications.	
翻訳日:2024-06-05 22:10:43 公開日:2024-06-03
# DreamPhysics:ビデオ拡散プリミティブを用いた動的3次元ガウスの物理特性の学習 DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors ( http://arxiv.org/abs/2406.01476v1 ) ライセンス: Link先を確認	Tianyu Huang, Yihan Zeng, Hui Li, Wangmeng Zuo, Rynson W. H. Lau,	(参考訳) ダイナミックな3Dインタラクションは、最近の作品で大きな関心を集めている。 1つの解決策は物理シミュレーションによる3Dシーンのアニメーションであり、もう1つはビデオ生成モデルの蒸留により静的な3Dオブジェクトの変形を学習することである。前者はターゲットオブジェクトに正確な物理的プロパティを割り当てる必要があり、そうでなければシミュレーション結果が不自然なものになる。後者は、変形学習における物理的な制約がないため、動画を小さな動きと不連続なフレームで定式化する傾向がある。映像生成モデルは実世界の撮影データを用いて訓練されており、シミュレーション環境における物理現象を判断できると考えている。そこで本研究では,映像拡散前の3次元ガウス散乱の物理特性を推定するDreamPhysicsを提案する。 DreamPhysicsは画像とテキストによるガイダンスの両方をサポートし、フレーム補間とログ勾配によるスコア蒸留サンプリングによって物理パラメータを最適化する。本手法は,適切な物理パラメータを持つ物質点法シミュレータに基づいて,現実的な動きを持つ4次元コンテンツを生成する。実験結果から,ビデオ拡散モデルの事前知識を蒸留することにより,不正確な物理特性を徐々に洗練し,高品質なシミュレーションを行うことができた。コードはhttps://github.com/tyhuang0428/DreamPhysics.comで公開されている。 Dynamic 3D interaction has witnessed great interest in recent works, while creating such 4D content remains challenging. One solution is to animate 3D scenes with physics-based simulation, and the other is to learn the deformation of static 3D objects with the distillation of video generative models. The former one requires assigning precise physical properties to the target object, otherwise the simulated results would become unnatural. The latter tends to formulate the video with minor motions and discontinuous frames, due to the absence of physical constraints in deformation learning. We think that video generative models are trained with real-world captured data, capable of judging physical phenomenon in simulation environments. To this end, we propose DreamPhysics in this work, which estimates physical properties of 3D Gaussian Splatting with video diffusion priors. DreamPhysics supports both image- and text-conditioned guidance, optimizing physical parameters via score distillation sampling with frame interpolation and log gradient. 
Based on a material point method simulator with proper physical parameters, our method can generate 4D content with realistic motions. Experimental results demonstrate that, by distilling the prior knowledge of video diffusion models, inaccurate physical properties can be gradually refined for high-quality simulation. Codes are released at: https://github.com/tyhuang0428/DreamPhysics.	翻訳日:2024-06-05 22:10:43 公開日:2024-06-03
# 凹凸最大化による最適ロバストデータ混合の探索 Finding Optimally Robust Data Mixtures via Concave Maximization ( http://arxiv.org/abs/2406.01477v1 ) ライセンス: Link先を確認	Anvith Thudi, Chris J. Maddison,	(参考訳) データ分散の混合に関するトレーニングは、現在の多くの機械学習パイプラインで一般的であり、いくつかの下流タスクでうまく機能するのに役立つ。群分布的ロバスト最適化(群DRO)は、特定のモデルクラスを訓練するための混合重み付けを学習する一般的な方法であるが、群DRO法は非凸損失関数とモデルが非パラメトリックであるために非線形モデルに苦しむ。そこで我々は,より一般的なDRO問題の解法を提案し,MixMaxと呼ぶ手法を提案する。 MixMaxは、特定の凹面目標をエントロピーミラーの上昇で最大化することにより混合重量を選択し、重要なことに、この混合分布を有界予測器の集合に最適に適合させることでグループDRO最適モデルを返すことを証明した。実験では、変換器を用いたシーケンスモデリングタスクと、様々な非パラメトリック学習問題でMixMaxを検証した。すべてのケースにおいて、MixMaxは標準のデータミキシングとグループDROベースラインにマッチまたは性能を向上し、特にACSIncomeとCelebAアノテーションデータセットのバリエーションに対して、データバランシングの唯一のベースラインよりもXGBoostのパフォーマンスを改善した。 Training on mixtures of data distributions is now common in many modern machine learning pipelines, useful for performing well on several downstream tasks. Group distributionally robust optimization (group DRO) is one popular way to learn mixture weights for training a specific model class, but group DRO methods suffer for non-linear models due to non-convex loss functions and when the models are non-parametric. We address these challenges by proposing to solve a more general DRO problem, giving a method we call MixMax. MixMax selects mixture weights by maximizing a particular concave objective with entropic mirror ascent, and, crucially, we prove that optimally fitting this mixture distribution over the set of bounded predictors returns a group DRO optimal model. Experimentally, we tested MixMax on a sequence modeling task with transformers and on a variety of non-parametric learning problems. In all instances MixMax matched or outperformed the standard data mixing and group DRO baselines, and in particular, MixMax improved the performance of XGBoost over the only baseline, data balancing, for variations of the ACSIncome and CelebA annotations datasets.	翻訳日:2024-06-05 22:10:43 公開日:2024-06-03
# 確率的ニュートン近位勾配法 Stochastic Newton Proximal Extragradient Method ( http://arxiv.org/abs/2406.01478v1 ) ライセンス: Link先を確認	Ruichen Jiang, Michał Dereziński, Aryan Mokhtari,	(参考訳) 確率的二階法は、雑音の多いヘッセン推定を用いて強凸最適化において高速な局所収束を達成する。しかし、これらの手法は通常、確率的ヘッセン雑音が減少するときにのみ超線形収束し、時間の経過とともに1点当たりのコストが増大する。最近の[arXiv:2204.09266]の研究は、高精細化コストを伴わずに超線型収束を実現するヘッセン平均化スキームでこの問題に対処している。それにもかかわらず、この手法はグローバル収束が遅いため、$\tilde{O}(\kappa^2)$イテレーションを$\tilde{O}((1/t)^{t/2})$に到達させる必要がある。本稿では,これらの境界を改良し,より高速な大域線形速度を実現し,$\tilde{O}(\kappa)$繰り返しで同じ高速な超線形速度に達するような,確率的ニュートン近位勾配法を提案する。我々は,Hybrid Proximal Extragradient (HPE) フレームワークを拡張して,強凸関数に対する高速な大域的および局所的な収束率と,ノイズの多いヘッセンオラクルへのアクセスを実現する。 Stochastic second-order methods achieve fast local convergence in strongly convex optimization by using noisy Hessian estimates to precondition the gradient. However, these methods typically reach superlinear convergence only when the stochastic Hessian noise diminishes, increasing per-iteration costs over time. Recent work in [arXiv:2204.09266] addressed this with a Hessian averaging scheme that achieves superlinear convergence without higher per-iteration costs. Nonetheless, the method has slow global convergence, requiring up to $\tilde{O}(\kappa^2)$ iterations to reach the superlinear rate of $\tilde{O}((1/t)^{t/2})$, where $\kappa$ is the problem's condition number. In this paper, we propose a novel stochastic Newton proximal extragradient method that improves these bounds, achieving a faster global linear rate and reaching the same fast superlinear rate in $\tilde{O}(\kappa)$ iterations. We accomplish this by extending the Hybrid Proximal Extragradient (HPE) framework, achieving fast global and local convergence rates for strongly convex functions with access to a noisy Hessian oracle.	翻訳日:2024-06-05 22:10:43 公開日:2024-06-03
# BIMモデルの再検討の自動化に向けて:構築された環境の3次元セマンティック再構築のための統一フレームワーク Towards Automating the Retrospective Generation of BIM Models: A Unified Framework for 3D Semantic Reconstruction of the Built Environment ( http://arxiv.org/abs/2406.01480v1 ) ライセンス: Link先を確認	Ka Lung Cheung, Chi Chung Lee,	(参考訳) 建築情報モデリング(BIM)の導入は建設プロジェクトにおいて有益である。しかし、3Dモデルの詳細をBIMに変換する統一的でスケーラブルなフレームワークがないため、課題に直面している。本稿では,BIM生成のための統一的セマンティック再構築アーキテクチャであるSRBIMを紹介する。提案手法の有効性は, 定性的, 定量的な評価を通じて実証され, 自動BIMモデリングの新しいパラダイムが確立された。 The adoption of Building Information Modeling (BIM) is beneficial in construction projects. However, it faces challenges due to the lack of a unified and scalable framework for converting 3D model details into BIM. This paper introduces SRBIM, a unified semantic reconstruction architecture for BIM generation. Our approach's effectiveness is demonstrated through extensive qualitative and quantitative evaluations, establishing a new paradigm for automated BIM modeling.	翻訳日:2024-06-05 22:10:43 公開日:2024-06-03
# ユーザが選択したストリーミングデータから学ぶ Learning from Streaming Data when Users Choose ( http://arxiv.org/abs/2406.01481v1 ) ライセンス: Link先を確認	Jinyan Su, Sarah Dean,	(参考訳) 多くの競合するサービスからなるデジタルマーケットでは、ユーザーは好みに応じて複数のサービスプロバイダを選択し、選択したサービスはユーザーデータを使用してモデルを漸進的に改善する。サービス提供者のモデルが次のステップでどのサービスを選択するかに影響し、その代わりにユーザの選択がモデルの更新に影響を与え、フィードバックループにつながる。本稿では、上記のダイナミクスを形式化し、ユーザ全体の損失を局所的に最小化するために、単純で効率的な分散アルゴリズムを開発する。理論的には、我々のアルゴリズムは漸近的に全体の損失の定常点にほぼ確実に収束することを示す。また,実世界のデータを用いたアルゴリズムの有用性を実験的に実証した。 In digital markets comprised of many competing services, each user chooses between multiple service providers according to their preferences, and the chosen service makes use of the user data to incrementally improve its model. The service providers' models influence which service the user will choose at the next time step, and the user's choice, in return, influences the model update, leading to a feedback loop. In this paper, we formalize the above dynamics and develop a simple and efficient decentralized algorithm to locally minimize the overall user loss. Theoretically, we show that our algorithm asymptotically converges to stationary points of the overall loss almost surely. We also experimentally demonstrate the utility of our algorithm with real world data.	翻訳日:2024-06-05 22:10:43 公開日:2024-06-03
# 高忠実性2ビットゲートに対する$^{171}$Yb Rydberg状態の分光とモデリング Spectroscopy and modeling of $^{171}$Yb Rydberg states for high-fidelity two-qubit gates ( http://arxiv.org/abs/2406.01482v1 ) ライセンス: Link先を確認	Michael Peper, Yiyi Li, Daniel Y. Knapp, Mila Bileska, Shuo Ma, Genyue Liu, Pai Peng, Bichen Zhang, Sebastian P. Horvath, Alex P. Burgers, Jeff D. Thompson,	(参考訳) 我々は、高度に励起された$^{174}$Ybおよび$^{171}$Yb Rydberg状態に対して、$L \leq 2$のマルチチャネル量子欠陥(MQDT)モデルを示す。これらのモデルは、既存の文献データと、原子ビーム中の新しい高精度レーザーとマイクロ波分光法を組み合わせて開発され、実験的に測定されたスタークシフトと磁気モーメントとの詳細な比較によって検証される。次に、これらのモデルを用いて、2つのYb原子間の相互作用ポテンシャルを計算し、光学的ツイーザアレイにおける直接測定と良好な一致を見出す。計算された相互作用ポテンシャルから、$F=3/2$ Rydberg状態を用いた$^{171}$Ybの従来のエンタングリングゲートの忠実度を低下させた可能性が高い異常なFörster共鳴を同定する。次に、より適切な$F=1/2$の状態を特定し、その残差を既知の情報源で完全に説明しながら、最先端の制御されたZゲートの忠実度を$\mathcal{F}=0.994(1)$とする。この研究は、Yb中性原子配列による量子コンピューティング、シミュレーション、エンタングルメント強化メトロジーの継続的な発展の基礎を確立する。 We present multichannel quantum defect (MQDT) models for highly excited $^{174}$Yb and $^{171}$Yb Rydberg states with $L \leq 2$. The models are developed using a combination of existing literature data and new, high-precision laser and microwave spectroscopy in an atomic beam, and validated by detailed comparison with experimentally measured Stark shifts and magnetic moments. We then use these models to compute interaction potentials between two Yb atoms, and find excellent agreement with direct measurements in an optical tweezer array. From the computed interaction potential, we identify an anomalous Förster resonance that likely degraded the fidelity of previous entangling gates in $^{171}$Yb using $F=3/2$ Rydberg states. We then identify a more suitable $F=1/2$ state, and achieve a state-of-the-art controlled-Z gate fidelity of $\mathcal{F}=0.994(1)$, with the remaining error fully explained by known sources. This work establishes a solid foundation for the continued development of quantum computing, simulation and entanglement-enhanced metrology with Yb neutral atom arrays.	翻訳日:2024-06-05 22:10:43 公開日:2024-06-03
# オンライン最適化による一階・零階分散非平滑非凸確率最適化 Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization ( http://arxiv.org/abs/2406.01484v1 ) ライセンス: Link先を確認	Emre Sahinoglu, Shahin Shahrampour,	(参考訳) 分散確率最適化における非滑らかな非凸目的に対する($\delta,\epsilon$)-定常点の有限時間解析について検討する。エージェントのセットは、ネットワークを介して対話することで、ローカル情報のみを使用してグローバル関数を最小化することを目的としている。本稿では,マルチエポック分散オンライン学習(ME-DOL, Multi Epoch Decentralized Online Learning)と呼ばれる新しいアルゴリズムを提案する。まず,最近提案したオンライン・非凸手法を用いて,滑らかな非凸対象の最適収束率を復元する手法を提案する。次に、無作為な滑らか化とゴールドスタイン偏微分集合の性質に基づいて、解析を非滑らかな設定に拡張する。我々は、$O(\delta^{-1}\epsilon^{-3})$のサンプル複雑性を確立し、これは我々の知る限り、分散化された非滑らかな非凸確率最適化を(弱凸性を伴わない)1次設定で最初の有限時間保証である。さらに, 分散還元を使わずに, ゼロオーダーのオラクル設定に対して同じ速度を証明した。 We investigate the finite-time analysis of finding ($\delta,\epsilon$)-stationary points for nonsmooth nonconvex objectives in decentralized stochastic optimization. A set of agents aims to minimize a global function using only their local information by interacting over a network. We present a novel algorithm, called Multi Epoch Decentralized Online Learning (ME-DOL), for which we establish the sample complexity in various settings. First, using a recently proposed online-to-nonconvex technique, we show that our algorithm recovers the optimal convergence rate of smooth nonconvex objectives. We then extend our analysis to the nonsmooth setting, building on properties of randomized smoothing and Goldstein-subdifferential sets. We establish the sample complexity of $O(\delta^{-1}\epsilon^{-3})$, which to the best of our knowledge is the first finite-time guarantee for decentralized nonsmooth nonconvex stochastic optimization in the first-order setting (without weak-convexity), matching its optimal centralized counterpart. We further prove the same rate for the zero-order oracle setting without using variance reduction.	翻訳日:2024-06-05 22:10:43 公開日:2024-06-03
# タスクグラフ学習の差別化:エゴセントリックビデオからの手続き的活動表現とオンライン誤検出 Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos ( http://arxiv.org/abs/2406.01486v1 ) ライセンス: Link先を確認	Luigi Seminara, Giovanni Maria Farinella, Antonino Furnari,	(参考訳) 手続き的活動は、特定の目標を達成するための重要なステップのシーケンスである。彼らは、ユーザーを効果的に支援できるインテリジェントなエージェントを構築することが不可欠だ。この文脈では、タスクグラフは手続き的活動の人間の理解可能な表現として現れ、キーステップ上の部分順序を符号化している。従来,ビデオからタスクグラフを抽出するための手作り手法が一般的であったのに対して,本稿では,エッジの重みを直接最適化する手法を提案し,タスクグラフの勾配に基づく学習を可能にし,ニューラルネットワークアーキテクチャに自然にプラグインできる。 CaptainCook4Dデータセットの実験では、アクションシーケンスの観測から正確なタスクグラフを予測できることが示され、以前のアプローチよりも+16.7%向上した。また,提案フレームワークの相違点から,キーステップのテキストやビデオの埋め込みからタスクグラフを予測し,新たな映像理解能力を観察することを目的とした機能ベースのアプローチも導入する。提案手法を用いて学習したタスクグラフは、手続き的エゴセントリックなビデオにおけるオンライン誤検出を著しく向上させ、アセンブリ101およびEPIC-Tentデータセットにおいて、+19.8%および+7.5%の顕著なゲインを達成した。実験を複製するためのコードはhttps://github.com/fpv-iplab/Differentiable-Task-Graph-Learningで公開されている。 Procedural activities are sequences of key-steps aimed at achieving specific goals. They are crucial to build intelligent agents able to assist users effectively. In this context, task graphs have emerged as a human-understandable representation of procedural activities, encoding a partial ordering over the key-steps. While previous works generally relied on hand-crafted procedures to extract task graphs from videos, in this paper, we propose an approach based on direct maximum likelihood optimization of edges' weights, which allows gradient-based learning of task graphs and can be naturally plugged into neural network architectures. Experiments on the CaptainCook4D dataset demonstrate the ability of our approach to predict accurate task graphs from the observation of action sequences, with an improvement of +16.7% over previous approaches. 
Owing to the differentiability of the proposed framework, we also introduce a feature-based approach, aiming to predict task graphs from key-step textual or video embeddings, for which we observe emerging video understanding abilities. Task graphs learned with our approach are also shown to significantly enhance online mistake detection in procedural egocentric videos, achieving notable gains of +19.8% and +7.5% on the Assembly101 and EPIC-Tent datasets. Code for replicating experiments is available at https://github.com/fpv-iplab/Differentiable-Task-Graph-Learning.	翻訳日:2024-06-05 22:10:43 公開日:2024-06-03
# ラベル平滑化とデータモロフィケーションの結合によるロバスト分類 Robust Classification by Coupling Data Mollification with Label Smoothing ( http://arxiv.org/abs/2406.01494v1 ) ライセンス: Link先を確認	Markus Heinonen, Ba-Hien Tran, Michael Kampffmeyer, Maurizio Filippone,	(参考訳) トレーニング時間拡張の導入は、一般化を強化し、テスト時間の破損に対してディープニューラルネットワークを準備するための重要なテクニックである。生成拡散モデルの成功に触発されて,ラベルの滑らか化を図り,ラベルの信頼度を画像劣化と整合させる手法として,画像のノイズ化とぼやけという形で,新たな結合データ拡張を提案する。このメソッドは実装が簡単で、無視可能なオーバーヘッドを導入し、既存の拡張と組み合わせることができる。 CIFARおよびTinyImageNetデータセットの劣化画像ベンチマークにおいて、ロバスト性および不確実性の定量化が向上したことを示す。 Introducing training-time augmentations is a key technique to enhance generalization and prepare deep neural networks against test-time corruptions. Inspired by the success of generative diffusion models, we propose a novel approach coupling data augmentation, in the form of image noising and blurring, with label smoothing to align predicted label confidences with image degradation. The method is simple to implement, introduces negligible overheads, and can be combined with existing augmentations. We demonstrate improved robustness and uncertainty quantification on the corrupted image benchmarks of the CIFAR and TinyImageNet datasets.	翻訳日:2024-06-05 22:10:43 公開日:2024-06-03
# 言語エージェントのための反射強化自己学習 Reflection-Reinforced Self-Training for Language Agents ( http://arxiv.org/abs/2406.01495v1 ) ライセンス: Link先を確認	Zi-Yi Dou, Cheng-Fu Yang, Xueqing Wu, Kai-Wei Chang, Nanyun Peng,	(参考訳) 自己学習は、人間やより強力なモデルによるデモンストレーションに頼ることなく、言語エージェントのパフォーマンスを向上させる可能性がある。一般的なプロセスでは、モデルからサンプルを生成し、品質を評価し、高品質なサンプルをトレーニングすることでモデルを更新する。しかし, 自己学習は, 優れた性能を実現するためには, 高い品質のサンプルを必要とするため, モデルサンプリングのみに頼っているため, 効率が悪くなるため, 限界に直面することがある。さらに、これらの手法は、しばしば低品質のサンプルを無視し、効果的に利用できない。これらの制約に対処するため,リフレクション強化自己訓練(Re-ReST)を提案する。リフレクションモデルは、モデル出力と外部環境(例えば、コード生成における単体テスト結果)からのフィードバックの両方を入力として、改善されたサンプルを出力として生成する。この手法を用いることで、劣悪なサンプルの品質を効果的に向上させ、高品質なサンプルで自己学習データセットを効率的に強化する。我々は,マルチホップ質問応答,シーケンシャルな意思決定,コード生成,視覚的質問応答,テキスト・ツー・イメージ生成など,タスクにまたがるオープンソースの言語エージェントに関する広範な実験を行った。結果は、設定間での自己学習ベースラインの改善を示す。さらに、アブレーション研究は、高品質な自己学習サンプルの生成における反射モデルの効率と、自己整合性復号化との整合性を確認した。 Self-training can potentially improve the performance of language agents without relying on demonstrations from humans or stronger models. The general process involves generating samples from a model, evaluating their quality, and updating the model by training on high-quality samples. However, self-training can face limitations because achieving good performance requires a good amount of high-quality samples, yet relying solely on model sampling for obtaining such samples can be inefficient. In addition, these methods often disregard low-quality samples, failing to leverage them effectively. To address these limitations, we present Reflection-Reinforced Self-Training (Re-ReST), which leverages a reflection model to refine low-quality samples and subsequently uses these improved samples to augment self-training. The reflection model takes both the model output and feedback from an external environment (e.g., unit test results in code generation) as inputs and produces improved samples as outputs. 
By employing this technique, we effectively enhance the quality of inferior samples, and enrich the self-training dataset with higher-quality samples efficiently. We perform extensive experiments on open-source language agents across tasks, including multi-hop question answering, sequential decision-making, code generation, visual question answering, and text-to-image generation. Results demonstrate improvements over self-training baselines across settings. Moreover, ablation studies confirm the reflection model's efficiency in generating quality self-training samples and its compatibility with self-consistency decoding.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# 大規模言語モデルにおける分類的・階層的概念の幾何学 The Geometry of Categorical and Hierarchical Concepts in Large Language Models ( http://arxiv.org/abs/2406.01506v1 ) ライセンス: Link先を確認	Kiho Park, Yo Joong Choe, Yibo Jiang, Victor Veitch,	(参考訳) 大言語モデルの表現空間において意味の意味がどのようにコード化されているかを理解することは、解釈可能性の根本的な問題である。本稿では,本分野における2つの基礎的課題について考察する。まず、 {'mammal'、'bird'、'reptile'、'fish'} のような分類学的概念はどのように表現されるのか? 第二に、概念間の階層的関係はどのように符号化されるのか? 例えば、"dog"が"mammal"エンコードされた一種の"mammal"であるという事実はどうでしょう? これらの疑問に答えるために線形表現仮説を拡張する方法を示す。単純な分類的概念はsimpliceとして表現され、階層的関連概念は直交的であり、(結果として)複素概念はsimpliceの直和から構築されたポリトープとして表現され、階層的構造を反映する。我々は、これらの理論結果をGemmaの大規模言語モデルで検証し、WordNetのデータを用いて、957の階層的な概念の表現を推定する。 Understanding how semantic meaning is encoded in the representation spaces of large language models is a fundamental problem in interpretability. In this paper, we study the two foundational questions in this area. First, how are categorical concepts, such as {'mammal', 'bird', 'reptile', 'fish'}, represented? Second, how are hierarchical relations between concepts encoded? For example, how is the fact that 'dog' is a kind of 'mammal' encoded? We show how to extend the linear representation hypothesis to answer these questions. We find a remarkably simple structure: simple categorical concepts are represented as simplices, hierarchically related concepts are orthogonal in a sense we make precise, and (in consequence) complex concepts are represented as polytopes constructed from direct sums of simplices, reflecting the hierarchical structure. We validate these theoretical results on the Gemma large language model, estimating representations for 957 hierarchically related concepts using data from WordNet.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# 駆動型二段系における外部・内部ダイナミクスの疎結合 Decoupling of External and Internal Dynamics in Driven Two-level Systems ( http://arxiv.org/abs/2406.01511v1 ) ライセンス: Link先を確認	Samuel Böhringer, Alexander Friedrich,	(参考訳) 本研究では、各状態に対する量子化された外部自由度を含むレーザ駆動の2レベル系を、外部自由度のみに作用する振動子方程式の集合に分解し、デチューニングを表す演算子値の減衰を図示する。我々は、時間依存減衰を持つ古典振動子に訴えることにより、この問題の解法を特徴づける方法を提供する。この分類の結果、私たちは (a)外部線形ポテンシャルをもたない自由度を含むラビ振動の解析的・表現的表現自由表現 (b)デチューニング作用素が(解析的あるいは数値的に)対角化されるとき、問題は古典方程式の集合に分解されることを示す。 (c) 振動子方程式を摂動基底として、弱いがそれ以外の任意の外部ポテンシャルにおけるラビ振動を記述することができる。さらに、駆動磁場相のチャープは、駆動位相ノイズの存在がランゲヴィン型の確率進化方程式につながる間、デチューニング作用素の力学のエレンフェスト/平均値部分を補償する手段として自然に現れる。最後に、我々のアプローチは外部自由度に関して自由表現であり、その結果、所望の応用に応じて適切な表現や基底展開を選択することができる。 We show how a laser driven two-level system including quantized external degrees of freedom for each state can be decoupled into a set of oscillator equations acting only on the external degrees of freedom with operator valued damping representing the detuning. We give a way of characterizing the solvability of this family of problems by appealing to a classical oscillator with time-dependent damping. As a consequence of this classification we (a) obtain analytic and representation-free expressions for Rabi oscillations including external degrees of freedom with and without an external linear potential, (b) show that whenever the detuning operator can be diagonalized (analytically or numerically) the problem decomposes into a set of classical equations and (c) we can use the oscillator equations as a perturbative basis to describe Rabi oscillations in weak but otherwise arbitrary external potentials. Moreover, chirping of the driving field's phase emerges naturally as a means of compensating the Ehrenfest/mean-value part of the detuning operator's dynamics while the presence of driving phase noise leads to a stochastic evolution equation of Langevin type. 
Lastly, our approach is representation-free with respect to the external degrees of freedom, and as a consequence a suitable representation or basis expansion can be chosen a posteriori depending on the desired application.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# MAD:マルチアライメントMEG-to-Textデコーディング MAD: Multi-Alignment MEG-to-Text Decoding ( http://arxiv.org/abs/2406.01512v1 ) ライセンス: Link先を確認	Yiqian Yang, Hyejeong Jo, Yiqun Duan, Qiang Zhang, Jinni Zhou, Won Hee Lee, Renjing Xu, Hui Xiong,	(参考訳) 脳活動から言語を解読することは脳-コンピュータインターフェース(BCI)研究において重要な課題である。脳波(EEG)や脳磁図(MEG)などの非侵襲的な脳シグナル伝達技術は、その安全性と実用性から、侵襲的な電極移植を避けることで、ますます人気が高まっている。しかし、現在の研究は以下の3点を定めていない。 1)より優れた信号品質を提供するMEGの探索が限定された脳波に主眼を置いている。 2) 未知のテキストの性能が劣り,多様な言語的文脈によりよい一般化が可能なモデルの必要性が示される。 3)他のモダリティからの情報の不十分な統合は、脳活動の複雑なダイナミクスを包括的に理解するために我々の能力を制限する可能性がある。本研究では,複数のアライメントを持つ音声復号化フレームワークを用いて,MEG信号をテキストに変換する手法を提案する。提案手法は,MEG信号から完全に見えないテキストを生成するための,エンドツーエンドのマルチアライメントフレームワークを初めて導入した手法である。我々は、$\textit{GWilliams}$データセット上でBLEU-1の印象的なスコアを達成し、BLEU-1測定値のベースラインを5.49から10.44に大幅に上回った。この改良は、実世界の応用に向けての我々のモデルの進歩を実証し、BCI研究の進展の可能性を裏付けるものである。コードは $\href{https://github.com/NeuSpeech/MAD-MEG2text}{https://github.com/NeuSpeech/MAD-MEG2text}$で入手できる。 Deciphering language from brain activity is a crucial task in brain-computer interface (BCI) research. Non-invasive cerebral signaling techniques including electroencephalography (EEG) and magnetoencephalography (MEG) are becoming increasingly popular due to their safety and practicality, avoiding invasive electrode implantation. However, current works under-investigated three points: 1) a predominant focus on EEG with limited exploration of MEG, which provides superior signal quality; 2) poor performance on unseen text, indicating the need for models that can better generalize to diverse linguistic contexts; 3) insufficient integration of information from other modalities, which could potentially constrain our capacity to comprehensively understand the intricate dynamics of brain activity. This study presents a novel approach for translating MEG signals into text using a speech-decoding framework with multiple alignments. Our method is the first to introduce an end-to-end multi-alignment framework for totally unseen text generation directly from MEG signals. 
We achieve a BLEU-1 score of 10.44 on the GWilliams dataset, significantly outperforming the baseline score of 5.49. This improvement demonstrates the advancement of our model towards real-world applications and underscores its potential in advancing BCI research. Code is available at https://github.com/NeuSpeech/MAD-MEG2text.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# 量子計測に基づくエンジンのエンタングリングによる効率向上 Enhancing the efficiency of quantum measurement-based engines with entangling measurements ( http://arxiv.org/abs/2406.01513v1 ) ライセンス: Link先を確認	Franco Mayo, Augusto J. Roncaglia,	(参考訳) 量子計測に基づくエンジンの効率に及ぼすエンタングル計測の影響について検討する。まず,多くのサブシステムからなるエンジンにおいて,各サブシステム上の局所的な測定とは対照的に,エンタングル計測を行うことにより効率を向上させることができることを示す。集団測定が個々の局所測定と同じ局所状態を生成する場合、効率の改善は相関の量に比例する。最後に、2準位系において、この種のエンジンは、サブシステム数が大きい極限において、有限の仕事を得ながら完全な効率で動作可能であることを示す。 We study the impact of entangling measurements on the efficiency of quantum measurement-based engines. We first show that, for engines comprising many subsystems, their efficiency can be enhanced by performing entangling measurements, as opposed to local measurements over each subsystem. When the collective measurement produces the same local state for the subsystems as individual local measurements, the improvement in the efficiency is proportional to the amount of correlations. Finally, we show that for two-level systems this type of engine can operate at perfect efficiency while yielding a finite amount of work, in the limit of a large number of subsystems.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# 対称化を超えて:符号付き・符号なし有向グラフの有効隣接行列と繰り込み Beyond symmetrization: effective adjacency matrices and renormalization for (un)signed directed graphs ( http://arxiv.org/abs/2406.01517v1 ) ライセンス: Link先を確認	Bruno Messias Farias de Resende,	(参考訳) 有向グラフや符号付きグラフの特異性に対処するために、新しいラプラシアン作用素が現れた。例えば、方向性の場合、磁気演算子、ディレーション(研究が進んでいない)、ランダムウォークに基づく演算子等に遭遇する。これらの新しい演算子の定義は、新しい研究や概念の必要性をもたらし、その結果、新しい計算ツールの開発へと繋がる。しかし、これは本当に必要か? 本研究では、磁気、ディレーション、信号などの変形ラプラシアン作用素の定義から生じる有効隣接行列の概念を定義する。これらの有効行列は、一般的なグラフを符号なし無向グラフの族にマッピングすることができ、無向グラフに対してよく研究された測度、機械学習手法、繰り込み群のツールキットの適用を可能にする。変形作用素と有効行列の相互作用を探索するために、ホッジ・ヘルムホルツ分解がこの複雑さをナビゲートするのにどのように役立つかを示す。 To address the peculiarities of directed and/or signed graphs, new Laplacian operators have emerged. For instance, in the case of directionality, we encounter the magnetic operator, dilation (which is underexplored), operators based on random walks, and so forth. The definition of these new operators leads to the need for new studies and concepts, and consequently, the development of new computational tools. But is this really necessary? In this work, we define the concept of effective adjacency matrices that arise from the definition of deformed Laplacian operators such as magnetic, dilation, and signal. These effective matrices allow mapping generic graphs to a family of unsigned, undirected graphs, enabling the application of the well-explored toolkit of measures, machine learning methods, and renormalization groups of undirected graphs. To explore the interplay between deformed operators and effective matrices, we show how the Hodge-Helmholtz decomposition can assist us in navigating this complexity.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# BISON:ステートレススコープ特異的誘導体によるブラインド同定 BISON: Blind Identification through Stateless scOpe-specific derivatioN ( http://arxiv.org/abs/2406.01518v1 ) ライセンス: Link先を確認	Jakob Heher, Lena Heimberger, Stefan More,	(参考訳) GoogleやFacebookのような認証プロバイダに認証を委譲することは便利だが、ユーザーのプライバシーを侵害する。グローバルな識別子はインターネット全体の追跡を可能にし、さらに、IDプロバイダはユーザの関連性を記録できる。 Oblivious Pseudorandom関数にインスパイアされたBISONの仮称派生プロトコルを提示することで、どちらも必要悪ではないことを示す。サービスプロバイダのIDをIDプロバイダから隠しますが、信頼され、スコープ化され、不変の偽名を生成します。コローディングサービスプロバイダは、BISONのニックネームをリンクできない。これにより、ユーザの追跡が防止される。 BISONはユーザーデバイスに長期間の状態を必要とせず、認証プロセスにアクターを追加する必要はない。 BISONは軽量暗号を使用している。擬似関数の導出には、楕円曲線スカラー点乗法と4つのハッシュ関数評価の合計4つが必要である。 BISONは既存の認証プロトコルに統合されるように設計されている。我々は、OIDCのPPIDをBISONを用いて引き出すことができるOpenID Connect拡張を提供する。このことは、BISONのプライバシー保証が実際に実現可能であることを示している。これらの理由から、BISONは明日のプライバシーを守るインターネットを実現するための重要な一歩だ。 Delegating authentication to identity providers like Google or Facebook, while convenient, compromises user privacy. Global identifiers enable internet-wide tracking; furthermore, identity providers can also record users' associations. We show that neither is a necessary evil by presenting the BISON pseudonym derivation protocol, inspired by Oblivious Pseudorandom Functions. It hides the service provider's identity from the identity provider, yet produces a trusted, scoped, immutable pseudonym. Colluding service providers cannot link BISON pseudonyms. This prevents user tracking. BISON does not require long-lived state on the user device, and does not add additional actors to the authentication process. BISON uses lightweight cryptography. Pseudonym derivation requires a total of four elliptic curve scalar-point multiplications and four hash function evaluations, totaling to ~3 ms in our proof of concept implementation. BISON is designed to integrate into existing authentication protocols. We provide an OpenID Connect extension that allows OIDC's PPID pseudonyms to be derived using BISON. This demonstrates that BISON's privacy guarantees can be realized in practice. 
For these reasons, BISON is a crucial stepping stone towards realizing the privacy-preserving internet of tomorrow.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# MOSEAC: ストリーム化された可変時間ステップ強化学習 MOSEAC: Streamlined Variable Time Step Reinforcement Learning ( http://arxiv.org/abs/2406.01521v1 ) ライセンス: Link先を確認	Dong Wang, Giovanni Beltrame,	(参考訳) 従来の強化学習(RL)法は、通常、各サイクルがアクションに対応する固定制御ループを用いる。この剛性は、最適制御周波数がタスク依存であるため、実用的な応用において課題を生じさせる。最適以下の選択は、高い計算要求と探索効率の低下につながる可能性がある。可変時間ステップ強化学習(VTS-RL)は、制御ループに適応周波数を用いることでこれらの問題に対処し、必要な時にのみ動作を実行する。このアプローチはリアクティブプログラミングの原則に根ざして、計算負荷を減らし、アクション時間を含めることでアクション空間を拡張する。しかしながら、VTS-RLの実装は、多目的アクションデュレーション空間(すなわち、目標を達成するためにタスク性能と時間ステップのバランスをとる)での探索を司る複数のハイパーパラメータをチューニングする必要があるため、しばしば複雑である。これらの課題を克服するために、我々はMOSEAC法(Multi-Objective Soft Elastic Actor-Critic)を導入する。本手法は、トレーニング中のタスク報酬の観測傾向に基づいて、ハイパーパラメータを調整する適応型報酬方式を特徴とする。このスキームは、ハイパーパラメータチューニングの複雑さを低減し、探索をガイドするために単一のハイパーパラメータを必要とするため、学習プロセスを簡素化し、デプロイメントコストを削減できる。ニュートンのキネマティクス環境でのシミュレーションによりMOSEAC法の有効性を検証し,より少ない時間ステップで高いタスクと訓練性能を示し,最終的にエネルギー消費を低減した。この検証により、MOSEACは単一のパラメータを用いてエージェント制御ループ周波数を自動的に調整することで、RLアルゴリズムの展開を効率化する。その原理は任意のRLアルゴリズムを強化するために適用でき、様々な用途に汎用的な解である。 Traditional reinforcement learning (RL) methods typically employ a fixed control loop, where each cycle corresponds to an action. This rigidity poses challenges in practical applications, as the optimal control frequency is task-dependent. A suboptimal choice can lead to high computational demands and reduced exploration efficiency. Variable Time Step Reinforcement Learning (VTS-RL) addresses these issues by using adaptive frequencies for the control loop, executing actions only when necessary. This approach, rooted in reactive programming principles, reduces computational load and extends the action space by including action durations. However, VTS-RL's implementation is often complicated by the need to tune multiple hyperparameters that govern exploration in the multi-objective action-duration space (i.e., balancing task performance and number of time steps to achieve a goal). To overcome these challenges, we introduce the Multi-Objective Soft Elastic Actor-Critic (MOSEAC) method. 
This method features an adaptive reward scheme that adjusts hyperparameters based on observed trends in task rewards during training. This scheme reduces the complexity of hyperparameter tuning, requiring a single hyperparameter to guide exploration, thereby simplifying the learning process and lowering deployment costs. We validate the MOSEAC method through simulations in a Newtonian kinematics environment, demonstrating high task and training performance with fewer time steps, ultimately lowering energy consumption. This validation shows that MOSEAC streamlines RL algorithm deployment by automatically tuning the agent control loop frequency using a single parameter. Its principles can be applied to enhance any RL algorithm, making it a versatile solution for various applications.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# 物理知識とデータに制限のある動的プロセス操作のための物理インフォームニューラルネットワーク Physics-Informed Neural Networks for Dynamic Process Operations with Limited Physical Knowledge and Data ( http://arxiv.org/abs/2406.01528v1 ) ライセンス: Link先を確認	Mehmet Velioglu, Song Zhai, Sophia Rupprecht, Alexander Mitsos, Andreas Jupke, Manuel Dahmen,	(参考訳) 化学工学において、プロセスデータは取得するのに高価であり、複雑な現象は厳密にモデル化することは困難であり、完全にデータ駆動と純粋に機械的モデリングのアプローチは非現実的である。プロセスデータが不足し,完全な機械的知識が欠如している場合に,微分代数方程式系が支配する動的プロセスをモデル化するために,物理情報ニューラルネットワーク(PINN)を用いて検討する。特に,直接観測データも構成方程式も利用できない状態の推定に着目する。実験目的のために, 連続加熱槽と液液分離器について検討した。 PINNは、測定されていない状態を妥当な精度で推測でき、純粋にデータ駆動モデルよりも低データシナリオでよりよく一般化できる。したがって、PINNは、ハイブリッド力学/データ駆動モデルと同様、比較的少数の実験データと部分的に知られている機械的記述が利用可能である場合に、プロセスのモデリングが可能であることを示し、さらなる調査を保証できる有望な経路を構成すると結論付けた。 In chemical engineering, process data is often expensive to acquire, and complex phenomena are difficult to model rigorously, rendering both entirely data-driven and purely mechanistic modeling approaches impractical. We explore using physics-informed neural networks (PINNs) for modeling dynamic processes governed by differential-algebraic equation systems when process data is scarce and complete mechanistic knowledge is missing. In particular, we focus on estimating states for which neither direct observational data nor constitutive equations are available. For demonstration purposes, we study a continuously stirred tank reactor and a liquid-liquid separator. We find that PINNs can infer unmeasured states with reasonable accuracy, and they generalize better in low-data scenarios than purely data-driven models. We thus show that PINNs, similar to hybrid mechanistic/data-driven models, are capable of modeling processes when relatively few experimental data and only partially known mechanistic descriptions are available, and conclude that they constitute a promising avenue that warrants further investigation.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# 咳の数え方: 自動咳検出アルゴリズムの性能を評価するイベントベースフレームワーク How to Count Coughs: An Event-Based Framework for Evaluating Automatic Cough Detection Algorithm Performance ( http://arxiv.org/abs/2406.01529v1 ) ライセンス: Link先を確認	Lara Orlandic, Jonathan Dan, Jerome Thevenot, Tomas Teijeiro, Alain Sauty, David Atienza,	(参考訳) 慢性咳嗽疾患は広くみられるが、その評価は咳の頻度に関する主観的な患者アンケートに頼っているため困難である。機械学習(ML)アルゴリズムを実行するウェアラブルデバイスは、日々の咳を定量化し、症状の追跡と治療評価のための客観的指標を臨床医に提供できると期待されている。しかし、咳カウントアルゴリズムの最先端の評価指標と、臨床医に関連する情報との間にはミスマッチがある。ほとんどの研究は咳サンプルと非咳サンプルの区別に焦点を当てているが、これは咳イベントの数や時間的パターンといった臨床的に意味のある結果を直接提供しない。さらに、特異度や正解率といった典型的な指標は、クラス不均衡によってバイアスを受けうる。本稿では、重要な咳カウントのエンドポイントに関する臨床ガイドラインと整合した、イベントベースの評価指標の使用を提案する。 ML分類器を用いて、従来のサンプルベースの精度測定の欠点を説明し、データセットのクラス不均衡とサンプルウィンドウ長によるばらつきを明らかにする。また、咳イベントの識別と偽陽性の棄却におけるアルゴリズム性能をテストするための、オープンソースのイベントベース評価フレームワークを提示する。臨床的妥当性のあるアルゴリズム性能評価に必要な第一歩として、イベントベースの咳カウントの事例とベストプラクティスガイドラインを提供する。 Chronic cough disorders are widespread and challenging to assess because their assessment relies on subjective patient questionnaires about cough frequency. Wearable devices running Machine Learning (ML) algorithms are promising for quantifying daily coughs, providing clinicians with objective metrics to track symptoms and evaluate treatments. However, there is a mismatch between state-of-the-art metrics for cough counting algorithms and the information relevant to clinicians. Most works focus on distinguishing cough from non-cough samples, which does not directly provide clinically relevant outcomes such as the number of cough events or their temporal patterns. In addition, typical metrics such as specificity and accuracy can be biased by class imbalance. We propose using event-based evaluation metrics aligned with clinical guidelines on significant cough counting endpoints. We use an ML classifier to illustrate the shortcomings of traditional sample-based accuracy measurements, highlighting their variance due to dataset class imbalance and sample window length. 
We also present an open-source event-based evaluation framework to test algorithm performance in identifying cough events and rejecting false positives. We provide examples and best practice guidelines in event-based cough counting as a necessary first step to assess algorithm performance with clinical relevance.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# 大規模言語モデルと脳内マッピング : 脳スコアの過度信頼に対する一事例 What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores ( http://arxiv.org/abs/2406.01538v1 ) ライセンス: Link先を確認	Ebrahim Feghhi, Nima Hadidi, Bryan Song, Idan A. Blank, Jonathan C. Kao,	(参考訳) 大きな言語モデル(LLM)の顕著な能力を考えると、人間の脳との類似性を評価することへの関心が高まっている。この類似性を定量化するための1つのアプローチは、モデルがいかに神経信号を予測するかを測定することである。 LLMの内部表現は最先端の脳スコアを達成し、人間の言語処理と計算原理を共有するという憶測に繋がる。この推論は、LLMによって予測される神経活動のサブセットが言語処理のコア要素を反映している場合にのみ有効である。本稿では、LLM-to-Brainマッピングの衝撃的な研究で使用される3つのニューラルネットワークを解析することにより、この仮定を疑問視する。最初に、これらのデータセットを用いた以前の研究で示されたように、シャッフルトレインテストのスプリットを使用すると、時間的自己相関がLLMより優れているだけでなく、LLMが説明しているほとんどの神経の分散も説明できる。したがって、私たちは前進する連続的な分割を使用します。第二に、トレーニングされていないLLMの驚くほど高い脳のスコアは、それらが2つの単純な特徴である文の長さと文の位置以外の追加的な神経の分散を考慮しないことを示すことによって説明される。このことは、トランスフォーマーアーキテクチャが計算をもっと脳に似たものに偏っているという証拠を弱めている。第3に、このデータセット上で訓練されたLLMの脳のスコアは、文の長さ、位置、代名詞の推論による静的単語の埋め込みによって説明できる。脳のスコアの過度な信頼は、LLMと脳の類似性を過度に解釈し、LLMが神経信号にマッピングしているものをデコンストラクションすることの重要性を強調した。 Given the remarkable capabilities of large language models (LLMs), there has been a growing interest in evaluating their similarity to the human brain. One approach towards quantifying this similarity is by measuring how well a model predicts neural signals, also called "brain score". Internal representations from LLMs achieve state-of-the-art brain scores, leading to speculation that they share computational principles with human language processing. This inference is only valid if the subset of neural activity predicted by LLMs reflects core elements of language processing. Here, we question this assumption by analyzing three neural datasets used in an impactful study on LLM-to-brain mappings, with a particular focus on an fMRI dataset where participants read short passages. 
We first find that when using shuffled train-test splits, as done in previous studies with these datasets, a trivial feature that encodes temporal autocorrelation not only outperforms LLMs but also accounts for the majority of neural variance that LLMs explain. We therefore use contiguous splits moving forward. Second, we explain the surprisingly high brain scores of untrained LLMs by showing they do not account for additional neural variance beyond two simple features: sentence length and sentence position. This undermines evidence used to claim that the transformer architecture biases computations to be more brain-like. Third, we find that brain scores of trained LLMs on this dataset can largely be explained by sentence length, position, and pronoun-dereferenced static word embeddings; a small, additional amount is explained by sense-specific embeddings and contextual representations of sentence structure. We conclude that over-reliance on brain scores can lead to over-interpretations of similarity between LLMs and brains, and emphasize the importance of deconstructing what LLMs are mapping to in neural signals.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# 高次元拡散反応方程式のための物理インフォームド深層学習と圧縮コロケーション:実用的存在理論と数値計算 Physics-informed deep learning and compressive collocation for high-dimensional diffusion-reaction equations: practical existence theory and numerics ( http://arxiv.org/abs/2406.01539v1 ) ライセンス: Link先を確認	Simone Brugiapaglia, Nick Dexter, Samir Karam, Weiqi Wang,	(参考訳) 科学計算の最前線では、Deep Learning(DL)、すなわちDeep Neural Networks(DNN)による機械学習が、偏微分方程式(PDE)を解く強力な新しいツールとして登場した。 DNNは特に、50年代後半にリチャード・ベルマン(Richard E. Bellman)が提唱した「次元の呪い」の効果を弱めるのに適していることが観察されている。しかし、DNNは90年代以降、PDEの解法として使われてきたが、数値解析(安定性、精度、サンプル複雑性)の観点からそれらの数学的効率を裏付ける文献は、最近現れ始めたばかりである。本稿では,スパース性に基づく手法とランダムサンプリングを用いた関数近似の最近の進歩を活用し,DLに基づく効率的な高次元PDEソルバの開発と解析を行う。理論的にも数値的にも,このソルバが新しい安定かつ高精度な圧縮スペクトルコロケーション法と競合できることを示す。特に,新たな実用的存在定理を示す。この定理は,ネットワークアーキテクチャに適切な境界を持ち,サンプル複雑性に関する十分条件(次元に対して対数的,最悪でも線形にスケールする)を満たす訓練可能なDNNのクラスが存在し,得られるネットワークが拡散反応PDEを高い確率で安定かつ正確に近似することを保証する。 On the forefront of scientific computing, Deep Learning (DL), i.e., machine learning with Deep Neural Networks (DNNs), has emerged as a powerful new tool for solving Partial Differential Equations (PDEs). It has been observed that DNNs are particularly well suited to weakening the effect of the curse of dimensionality, a term coined by Richard E. Bellman in the late '50s to describe challenges such as the exponential dependence of the sample complexity, i.e., the number of samples required to solve an approximation problem, on the dimension of the ambient space. However, although DNNs have been used to solve PDEs since the '90s, the literature underpinning their mathematical efficiency in terms of numerical analysis (i.e., stability, accuracy, and sample complexity) is only recently beginning to emerge. In this paper, we leverage recent advancements in function approximation using sparsity-based techniques and random sampling to develop and analyze an efficient high-dimensional PDE solver based on DL. We show, both theoretically and numerically, that it can compete with a novel stable and accurate compressive spectral collocation method. 
In particular, we demonstrate a new practical existence theorem, which establishes the existence of a class of trainable DNNs with suitable bounds on the network architecture and a sufficient condition on the sample complexity, with logarithmic or, at worst, linear scaling in dimension, such that the resulting networks stably and accurately approximate a diffusion-reaction PDE with high probability.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# 誤りから学ぶ:自動運転車計画における配電シフトの微妙な制御方法 Learning from Mistakes: a Weakly-supervised Method for Mitigating the Distribution Shift in Autonomous Vehicle Planning ( http://arxiv.org/abs/2406.01544v1 ) ライセンス: Link先を確認	Fazel Arasteh, Mohammed Elmahgiubi, Behzad Khamidehi, Hamidreza Mirkhani, Weize Zhang, Kasra Rezaee,	(参考訳) 計画問題は、自律運転フレームワークの基本的な側面を構成する。近年の表現学習の進歩により、車両は周囲の環境を理解することができ、学習に基づく計画戦略の統合が容易になった。これらのアプローチの中で、Imitation Learningは優れたトレーニング効率のために際立っている。しかし、従来の模倣学習手法は、共変量シフト現象に関連する課題に遭遇する。本稿では,この問題に対する対策としてLearning from Mistakes (LfM)を提案する。 LfMの本質は、様々なシナリオで事前訓練されたプランナーをデプロイすることにある。障害から安全な距離を維持したり、交通ルールを守ったりといった、プランナーが直接の目的から逸脱するケースは、間違いとしてフラグ付けされる。これらのミスに対応する環境は、配布外状態に分類され、クローズドループミスデータセットと呼ばれる新しいデータセットにコンパイルされる。特に、クローズドループデータに専門家アノテーションがないことは、標準的な模倣学習アプローチの適用性を妨げている。閉ループ誤りからの学習を容易にするために,現状の環境条件下で有効な軌跡を識別することを目的とした,弱教師付き手法であるValidity Learningを導入する。 InDデータセットとNuplanデータセットで行った実験的評価は、プログレッシブやコリジョンレートなどのクローズドループメトリクスを大幅に向上させ、提案手法の有効性を裏付けるものである。 The planning problem constitutes a fundamental aspect of the autonomous driving framework. Recent strides in representation learning have empowered vehicles to comprehend their surrounding environments, thereby facilitating the integration of learning-based planning strategies. Among these approaches, Imitation Learning stands out due to its notable training efficiency. However, traditional Imitation Learning methodologies encounter challenges associated with the co-variate shift phenomenon. We propose Learn from Mistakes (LfM) as a remedy to address this issue. The essence of LfM lies in deploying a pre-trained planner across diverse scenarios. Instances where the planner deviates from its immediate objectives, such as maintaining a safe distance from obstacles or adhering to traffic rules, are flagged as mistakes. The environments corresponding to these mistakes are categorized as out-of-distribution states and compiled into a new dataset termed closed-loop mistakes dataset. 
Notably, the absence of expert annotations for the closed-loop data precludes the applicability of standard imitation learning approaches. To facilitate learning from the closed-loop mistakes, we introduce Validity Learning, a weakly supervised method, which aims to discern valid trajectories within the current environmental context. Experimental evaluations conducted on the InD and Nuplan datasets reveal substantial enhancements in closed-loop metrics such as Progress and Collision Rate, underscoring the effectiveness of the proposed methodology.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# 量子計算基底状態における格子構造の符号化 Encoding lattice structures in Quantum Computational Basis States ( http://arxiv.org/abs/2406.01547v1 ) ライセンス: Link先を確認	Kalyan Dasgupta,	(参考訳) 格子モデルまたは構造は、物理系を表現するために使用される数学的形式を持つ幾何学的対象である。様々な分野、すなわち凝縮物質物理学において、化学における分子の自由度の研究や、高分子力学やタンパク質構造の研究に広く用いられている。本稿では、量子計算アルゴリズムで用いられる量子ビットの計算基底状態における格子構造の符号化手法について論じる。タンパク質構造予測における格子モデルの具体的な利用例を示す。タンパク質構造予測問題を解くための量子アルゴリズムは提案せず、格子構造の一般的な符号化手法を提案する。 Lattice models or structures are geometrical objects with mathematical forms, that are used to represent physical systems. They have been used widely in diverse fields, namely, in condensed matter physics, to study degrees of freedom of molecules in chemistry and in studying polymer dynamics and protein structures to name a few. In this article we discuss an encoding methodology of lattice structures in computational basis states of qubits (as used in quantum computing algorithms). We demonstrate a specific use case of lattice models in protein structure prediction. We do not propose any quantum algorithm to solve the protein structure prediction problem, instead, we propose a generic encoding methodology of lattice structures.	翻訳日:2024-06-05 22:00:59 公開日:2024-06-03
# 検索拡張生成における効果的なノイズフィルタリングのための情報ボトルネックの視点 An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation ( http://arxiv.org/abs/2406.01549v1 ) ライセンス: Link先を確認	Kun Zhu, Xiaocheng Feng, Xiyuan Du, Yuxuan Gu, Weijiang Yu, Haotian Wang, Qianglong Chen, Zheng Chu, Jingchang Chen, Bing Qin,	(参考訳) 検索拡張生成(Retrieval-augmented Generation)は、大規模コーパスから取得した関連情報と、大規模言語モデルの機能を統合しているが、現実のノイズの多いデータに直面すると、課題に遭遇する。最近の解決策の1つは、関連するコンテンツを見つけるためにフィルタモジュールを訓練することだが、最適とは言えない雑音圧縮しか達成できない。本稿では,情報ボトルネック理論を検索拡張生成に導入することを提案する。提案手法では,圧縮と正解(ground)出力の相互情報量を最大化すると同時に,圧縮と検索されたパッセージの相互情報量を最小化することにより,雑音のフィルタリングを行う。さらに,新たな総合評価,教師付き微調整データの選定,強化学習報酬の構築に活用するための情報ボトルネックの定式を導出する。実験の結果,提案手法は,回答生成の正確性だけでなく,$2.5\%$の圧縮率という簡潔性においても,様々な質問応答データセットに対して顕著な改善が得られた。 Retrieval-augmented generation integrates the capabilities of large language models with relevant information retrieved from an extensive corpus, yet encounters challenges when confronted with real-world noisy data. One recent solution is to train a filter module to find relevant content, but it achieves only suboptimal noise compression. In this paper, we propose to introduce the information bottleneck theory into retrieval-augmented generation. Our approach involves the filtration of noise by simultaneously maximizing the mutual information between compression and ground output, while minimizing the mutual information between compression and retrieved passage. In addition, we derive the formula of information bottleneck to facilitate its application in novel comprehensive evaluations, the selection of supervised fine-tuning data, and the construction of reinforcement learning rewards. Experimental results demonstrate that our approach achieves significant improvements across various question answering datasets, not only in terms of the correctness of answer generation but also in conciseness, with a $2.5\%$ compression rate.	翻訳日:2024-06-05 21:51:15 公開日:2024-06-03
# ELSA:街路における社会活動の地域化の評価 ELSA: Evaluating Localization of Social Activities in Urban Streets ( http://arxiv.org/abs/2406.01551v1 ) ライセンス: Link先を確認	Maryam Hosseini, Marco Cipriano, Sedigheh Eslami, Daniel Hodczak, Liu Liu, Andres Sevtsuk, Gerard de Melo,	(参考訳) なぜ街路は、他の街路よりも多くの社会活動を惹きつけるのか? ストリートデザインのせいなのか、近所の土地利用パターンが、人々が集まるビジネスの機会を生み出しているのか? これらの質問は、都市社会学者、デザイナー、プランナーに何十年も興味を持たせてきた。しかし、この領域のほとんどの研究は、都市環境における社会的相互作用に影響を与える様々な要因に関する包括的視点を欠いているため、規模が限られている。これらの問題を探索するには、都市部における社会的相互作用の頻度と多様性に関する詳細なデータが必要である。コンピュータビジョンの最近の進歩とオープン語彙検出モデルの出現は、従来の観測手法では不可能だったスケールでのこの長年の問題に対処するユニークな機会を提供する。本稿では,都市の街路画像における社会活動の局所化を評価するためのベンチマークデータセットを提案する。 ELSAは都市社会学とデザインの理論的枠組みを踏襲している。アクション認識データセットの大部分は制御された設定で収集されるが、私たちは、ソーシャルグループのサイズとアクティビティの種類が著しく異なる、その中間のストリートレベルの画像を使用する。 ELSAには、個人とグループの活動のための4,300以上のマルチラベル境界ボックスを備えた手動で注釈付けされた937の画像が含まれており、条件、状態、行動の3つの主要なグループに分類される。各カテゴリーは、例えば、単独または条件下のグループ、立位または歩行の様々なサブカテゴリを含み、国家カテゴリーに該当し、アクションカテゴリーに関して話すか、食事をする。 ELSAは研究コミュニティ向けに公開されている。 Why do some streets attract more social activities than others? Is it due to street design, or do land use patterns in neighborhoods create opportunities for businesses where people gather? These questions have intrigued urban sociologists, designers, and planners for decades. Yet, most research in this area has remained limited in scale, lacking a comprehensive perspective on the various factors influencing social interactions in urban settings. Exploring these issues requires fine-level data on the frequency and variety of social interactions on urban street. Recent advances in computer vision and the emergence of the open-vocabulary detection models offer a unique opportunity to address this long-standing issue on a scale that was previously impossible using traditional observational methods. In this paper, we propose a new benchmark dataset for Evaluating Localization of Social Activities (ELSA) in urban street images. ELSA draws on theoretical frameworks in urban sociology and design. 
While the majority of action recognition datasets are collected in controlled settings, we use in-the-wild street-level imagery, where the size of social groups and the types of activities can vary significantly. ELSA includes 937 manually annotated images with more than 4,300 multi-labeled bounding boxes for individual and group activities, categorized into three primary groups: Condition, State, and Action. Each category contains various sub-categories, e.g., alone or group under the Condition category, standing or walking under the State category, and talking or dining under the Action category. ELSA is publicly available to the research community.	翻訳日:2024-06-05 21:51:15 公開日:2024-06-03
# 等変テンソル関数の学習と疎ベクトル回復への応用 Learning equivariant tensor functions with applications to sparse vector recovery ( http://arxiv.org/abs/2406.01552v1 ) ライセンス: Link先を確認	Wilson G. Gregory, Josué Tonelli-Cueto, Nicholas F. Marshall, Andrew S. Lee, Soledad Villar,	(参考訳) この仕事は、テンソル入力のタプルからテンソル出力への等変多項式関数を特徴づける。物理学によって極端に動機づけられた我々は、テンソル上の直交群の対角運動に関して同変函数に焦点をあてる。この特徴付けをローレンツ群やシンプレクティック群を含む他の線型代数群に拡張する方法を示す。これらの特徴付けの背景にある私たちのゴールは、同変機械学習モデルを定義することです。特に,スパースベクトル推定問題に着目する。この問題は理論計算機科学の文献で広く研究されており、二乗和の技法から導かれる明示的なスペクトル法は、特定の仮定の下でスパースベクトルを復元することを示すことができる。これらの結果から,提案した同変機械学習モデルは,理論上最もよく知られたスペクトル法よりも優れたスペクトル法を学習できることが示唆された。実験により,まだ理論的に解析されていない環境では,学習スペクトル法がこの問題を解決できることが示唆された。これは、理論が機械学習モデルや機械学習モデルに情報を伝えることができる有望な方向の例である。 This work characterizes equivariant polynomial functions from tuples of tensor inputs to tensor outputs. Loosely motivated by physics, we focus on equivariant functions with respect to the diagonal action of the orthogonal group on tensors. We show how to extend this characterization to other linear algebraic groups, including the Lorentz and symplectic groups. Our goal behind these characterizations is to define equivariant machine learning models. In particular, we focus on the sparse vector estimation problem. This problem has been broadly studied in the theoretical computer science literature, and explicit spectral methods, derived by techniques from sum-of-squares, can be shown to recover sparse vectors under certain assumptions. Our numerical results show that the proposed equivariant machine learning models can learn spectral methods that outperform the best theoretically known spectral methods in some regimes. The experiments also suggest that learned spectral methods can solve the problem in settings that have not yet been theoretically analyzed. This is an example of a promising direction in which theory can inform machine learning models and machine learning models could inform theory.	翻訳日:2024-06-05 21:51:15 公開日:2024-06-03
# 人間誘導によるフレキシブル・インタラクティブ・リフレクション除去に向けて Towards Flexible Interactive Reflection Removal with Human Guidance ( http://arxiv.org/abs/2406.01555v1 ) ライセンス: Link先を確認	Xiao Chen, Xudong Jiang, Yunkang Tao, Zhen Lei, Qing Li, Chenyang Lei, Zhaoxiang Zhang,	(参考訳) 単一の画像反射除去は本質的に不明瞭であり、分離を必要とする反射成分と透過成分の両方が自然な画像統計に従う可能性がある。既存の手法では、様々な種類の低レベルおよび物理ベースのキューを反射信号の源として利用することでこの問題に対処しようとする。しかし、これらのキューは特定のキャプチャーシナリオでしか観測できないため、普遍的に適用できない。これは、テストイメージがそれらの仮定と一致しない場合、大幅なパフォーマンス低下につながる。本稿では,頑健な反射除去を実現するために,ポイントやバウンディングボックスなどの多様な形式のスパースな人間のガイダンスを補助的な高レベルの事前情報として活用する,フレキシブルなインタラクティブ反射除去手法を探求する。しかし,既存のリフレクション除去ネットワークに生のユーザガイダンスを単純に組み込んでも,性能は向上しない。そこで我々は,インタラクティブセグメンテーション・ファンデーション・モデルを用いて,生ユーザ入力をリフレクションマスクという統一形式に変換する。このような設計は、セグメンテーション基盤モデルとフレキシブルなヒューマンガイダンスの長所を取り込み、反射分離の課題を軽減する。さらに,ユーザガイダンスを完全に活用し,ユーザアノテーションのコストを削減するために,提案する自己適応型プロンプトブロックを含むマスク誘導反射除去ネットワークを設計する。このブロックは、ユーザガイダンスをアンカーとして適応的に組み込んで、クロスアテンション機構を介して透過特徴を洗練する。提案手法は, フレキシブルかつスパースなユーザガイダンスの助けを借りて, 各種データセット上での最先端性能を示す。私たちのコードとデータセットは、https://github.com/ShawnChenn/FlexibleReflectionRemoval で公開されます。 Single image reflection removal is inherently ambiguous, as both the reflection and transmission components requiring separation may follow natural image statistics. Existing methods attempt to address the issue by using various types of low-level and physics-based cues as sources of reflection signals. However, these cues are not universally applicable, since they are only observable in specific capture scenarios. This leads to a significant performance drop when test images do not align with their assumptions. In this paper, we aim to explore a novel flexible interactive reflection removal approach that leverages various forms of sparse human guidance, such as points and bounding boxes, as auxiliary high-level priors to achieve robust reflection removal. However, incorporating the raw user guidance naively into the existing reflection removal network does not result in performance gains. 
To this end, we transform raw user input into a unified form, reflection masks, using an Interactive Segmentation Foundation Model. This design combines the strengths of the segmentation foundation model and flexible human guidance, thereby mitigating the challenges of reflection separation. Furthermore, to fully utilize user guidance and reduce user annotation costs, we design a mask-guided reflection removal network, comprising our proposed self-adaptive prompt block. This block adaptively incorporates user guidance as anchors and refines transmission features via cross-attention mechanisms. Extensive results on real-world images validate that our method demonstrates state-of-the-art performance on various datasets with the help of flexible and sparse user guidance. Our code and dataset will be publicly available at https://github.com/ShawnChenn/FlexibleReflectionRemoval.	翻訳日:2024-06-05 21:51:15 公開日:2024-06-03
# 量子ネットワーク上のコイン付き量子ウォーク Coined Quantum Walk on a Quantum Network ( http://arxiv.org/abs/2406.01558v1 ) ライセンス: Link先を確認	Jigyen Bhavsar, Shashank Shekhar, Siddhartha Santra,	(参考訳) 量子ネットワーク上の離散時間コイン付き量子ウォークを探索する。ここでは、ウォーカーの動きのコヒーレントな重ね合わせは、ウォーカーのコインと量子ネットワーク内の量子ビット自由度とのユニタリ相互作用から生じる。ウォークのダイナミクスにより、一方ではウォーカーとネットワークの間の絡み合いが、他方ではネットワーク量子ビット同士の絡み合いが増大する。ネットワーク量子ビット間の初期絡み合いは、これらの絡み合い測度と量子ウォーク統計の漸近値を決定する上で重要な役割を果たす。具体的には、ウォーカー・ネットワーク状態の絡み合いエントロピーと、量子ネットワークの量子ビット状態のネガティビティは、初期ネットワーク絡み合いとともに増加する値に飽和する。時間平均したウォーカー位置の漸近的な確率分布は、初期ネットワーク絡み合いが大きいほど、ウォーカーの初期位置の周りへの局在が強まることを示す。量子ネットワーク特性のキャラクタリゼーションツールとしてのこれらの結果の潜在的応用が提案されている。 We explore a discrete-time, coined quantum walk on a quantum network where the coherent superposition of walker-moves originates from the unitary interaction of the walker-coin with the qubit degrees of freedom in the quantum network. The walk dynamics leads to a growth of entanglement between the walker and the network on one hand, and on the other, between the network-qubits among themselves. The initial entanglement among the network qubits plays a crucial role in determining the asymptotic values of these entanglement measures and the quantum walk statistics. Specifically, the entanglement entropy of the walker-network state and the negativity of the quantum network-qubit state saturate to values increasing with the initial network-entanglement. The asymptotic time-averaged walker-position probability distribution shows increasing localization around the initial walker-position with higher initial network entanglement. A potential application of these results as a characterisation tool for quantum network properties is suggested.	翻訳日:2024-06-05 21:51:15 公開日:2024-06-03
# 統一運動学習者としての原型変換器 Prototypical Transformer as Unified Motion Learners ( http://arxiv.org/abs/2406.01559v1 ) ライセンス: Link先を確認	Cheng Han, Yawen Lu, Guohao Sun, James C. Liang, Zhiwen Cao, Qifan Wang, Qiang Guan, Sohail A. Dianat, Raghuveer M. Rao, Tong Geng, Zhiqiang Tao, Dongfang Liu,	(参考訳) 本稿では,プロトタイプの観点から様々な動作タスクにアプローチする汎用かつ統一的なフレームワークであるPrototypeal Transformer(ProtoFormer)を紹介する。 ProtoFormerは、モーションダイナミクスを慎重に検討し、2つの革新的なデザインを導入することで、Transformerとプロトタイプ学習をシームレスに統合する。まず、クロスアテンションプロトタイピングは、シグネチャモーションパターンに基づくプロトタイプを発見し、モーションシーンの理解に透明性を提供する。第二に、Latent Synchronizationはプロトタイプによる特徴表現学習をガイドし、運動の不確実性の問題を効果的に緩和する。実験により,光学的流れやシーン深度といった一般的な動作課題に対して,本手法が競合性能を発揮することを示す。さらに、オブジェクト追跡やビデオ安定化など、さまざまな下流タスクにまたがる汎用性を示す。 In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective. ProtoFormer seamlessly integrates prototype learning with Transformer by thoughtfully considering motion dynamics, introducing two innovative designs. First, Cross-Attention Prototyping discovers prototypes based on signature motion patterns, providing transparency in understanding motion scenes. Second, Latent Synchronization guides feature representation learning via prototypes, effectively mitigating the problem of motion uncertainty. Empirical results demonstrate that our approach achieves competitive performance on popular motion tasks such as optical flow and scene depth. Furthermore, it exhibits generality across various downstream tasks, including object tracking and video stabilization.	翻訳日:2024-06-05 21:51:15 公開日:2024-06-03
# ワンステップテキスト・ツー・イメージ生成のためのスコアアイデンティティ蒸留における長短誘導 Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation ( http://arxiv.org/abs/2406.01561v1 ) ライセンス: Link先を確認	Mingyuan Zhou, Zhendong Wang, Huangjie Zheng, Hai Huang,	(参考訳) 広範なテキスト画像ペアで訓練された拡散ベースのテキスト画像生成モデルは、テキスト記述と整合したフォトリアリスティック画像を生成する能力を示している。しかし、これらのモデルの顕著な制限は、その遅いサンプル生成であり、同じネットワークを通して反復的な改善を必要とする。本稿では、実際のトレーニングデータを使わずに事前学習済みのStable Diffusionモデルを効率的に蒸留するために、長短分類器フリーガイダンス(LSG)を開発し、Score identity Distillation (SiD) を強化する。 SiD はモデルに基づく明示的なスコアマッチング損失を最適化することを目的としており、実際の計算のために、提案するLSGと並行してスコア同一性に基づく近似を用いている。ワンステップ生成器で合成した偽画像のみを用いて学習することにより、LSGを備えたSiDは、FIDとCLIPのスコアを急速に改善し、競争力のあるCLIPスコアを維持しながら最先端のFIDのパフォーマンスを達成する。具体的には、Stable Diffusion 1.5のデータフリー蒸留により、COCO-2014検証セットにおいて、LSGスケール1.5でFID 8.15(過去最低)とCLIPスコア0.304、LSGスケール2でFID 9.56とCLIPスコア0.313を達成している。我々はPyTorchの実装と蒸留したStable Diffusionワンステップジェネレータをhttps://github.com/mingyuanzhou/SiD-LSGで公開します。 Diffusion-based text-to-image generation models trained on extensive text-image pairs have shown the capacity to generate photorealistic images consistent with textual descriptions. However, a significant limitation of these models is their slow sample generation, which requires iterative refinement through the same network. In this paper, we enhance Score identity Distillation (SiD) by developing long and short classifier-free guidance (LSG) to efficiently distill pretrained Stable Diffusion models without using real training data. SiD aims to optimize a model-based explicit score matching loss, utilizing a score-identity-based approximation alongside the proposed LSG for practical computation. By training exclusively with fake images synthesized with its one-step generator, SiD equipped with LSG rapidly improves FID and CLIP scores, achieving state-of-the-art FID performance while maintaining a competitive CLIP score. 
Specifically, its data-free distillation of Stable Diffusion 1.5 achieves a record low FID of 8.15 on the COCO-2014 validation set, with a CLIP score of 0.304 at an LSG scale of 1.5, and a FID of 9.56 with a CLIP score of 0.313 at an LSG scale of 2. We will make our PyTorch implementation and distilled Stable Diffusion one-step generators available at https://github.com/mingyuanzhou/SiD-LSG	翻訳日:2024-06-05 21:51:15 公開日:2024-06-03
# オンライン強化学習における計画の新たな視点 A New View on Planning in Online Reinforcement Learning ( http://arxiv.org/abs/2406.01562v1 ) ライセンス: Link先を確認	Kevin Roice, Parham Mohammad Panahi, Scott M. Jordan, Adam White, Martha White,	(参考訳) 本稿では,動的プログラミング更新とDynaアーキテクチャに似たモデルフリー更新を混合(近似)する,背景計画を用いたモデルベース強化学習の新しいアプローチについて検討する。学習したモデルによるバックグラウンドプランニングは、Double DQNのようなモデルフリーの代替よりも悪い場合が多い。根本的な問題は、学習したモデルが不正確であり、特に多くのステップを繰り返すと、しばしば無効な状態を生成することである。本稿では,背景プランニングを一連のサブゴールに制約し,ローカルなサブゴール条件付きモデルのみを学習することで,この制限を回避する。このゴール・スペース・プランニング(GSP)アプローチは計算効率が良く、時間的抽象化を組み込んで長期計画の高速化を実現し、トランジッション・ダイナミクスを完全に学習するのを避ける。 GSPアルゴリズムは抽象空間から様々な基礎学習者が異なる領域でより高速に学習できるような方法で価値を伝播することができることを示す。 This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives, such as Double DQN, even though the former uses significantly more memory and computation. The fundamental problem is that learned models can be inaccurate and often generate invalid states, especially when iterated many steps. In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models. This goal-space planning (GSP) approach is more computationally efficient, naturally incorporates temporal abstraction for faster long-horizon planning and avoids learning the transition dynamics entirely. We show that our GSP algorithm can propagate value from an abstract space in a manner that helps a variety of base learners learn significantly faster in different domains.	翻訳日:2024-06-05 21:51:15 公開日:2024-06-03
# LoFiT: LLM表現の局所的な微調整 LoFiT: Localized Fine-tuning on LLM Representations ( http://arxiv.org/abs/2406.01563v1 ) ライセンス: Link先を確認	Fangcong Yin, Xi Ye, Greg Durrett,	(参考訳) 解釈可能性に関する最近の研究は、大規模言語モデル(LLM)が学習自由な方法で新しいタスクに適応可能であることを示している。例えば、ある注意ヘッドの出力に特定のバイアスベクトルを加えると、モデルの真性を高めることが報告される。本研究では,このような表現介入手法の効果的な代替手段として,局所的な微調整が有効であることを示す。そこで我々はLoFiT(Localized Fine-Tuning on LLM Representations)というフレームワークを導入し,特定のタスクを学習する上で最も重要なアテンションヘッドのサブセットを特定する。 LoFiTはスパースなヘッドセット(3%)にローカライズし、限られたトレーニングデータからオフセットベクトルを学習する。真理性や推論タスクにおいて,LoFiTの介入ベクトルは推論時間干渉などの表現介入手法のベクトルよりもLLM適応に有効であることがわかった。タスク固有のアテンションヘッドを選択することは、異なるタスクに選択されたヘッドに介入するよりも高いパフォーマンスをもたらす可能性がある。最後に、LoFiTは、パラメータを20倍から200倍に減らしたにもかかわらず、LoRAのような他のパラメータ効率のよい微調整手法と同等の性能を達成している。 Recent work in interpretability shows that large language models (LLMs) can be adapted for new tasks in a learning-free way: it is possible to intervene on LLM representations to elicit desired behaviors for alignment. For instance, adding certain bias vectors to the outputs of certain attention heads is reported to boost the truthfulness of models. In this work, we show that localized fine-tuning serves as an effective alternative to such representation intervention methods. We introduce a framework called Localized Fine-Tuning on LLM Representations (LoFiT), which identifies a subset of attention heads that are most important for learning a specific task, then trains offset vectors to add to the model's hidden representations at those selected heads. LoFiT localizes to a sparse set of heads (3%) and learns the offset vectors from limited training data, comparable to the settings used for representation intervention. For truthfulness and reasoning tasks, we find that LoFiT's intervention vectors are more effective for LLM adaptation than vectors from representation intervention methods such as Inference-time Intervention. 
We also find that the localization step is important: selecting a task-specific set of attention heads can lead to higher performance than intervening on heads selected for a different task. Finally, for the tasks we study, LoFiT achieves comparable performance to other parameter-efficient fine-tuning methods such as LoRA, despite modifying 20x-200x fewer parameters than these methods.	翻訳日:2024-06-05 21:51:15 公開日:2024-06-03
# Helix: 異種GPU上のMax-Flowによる大規模言語モデルの分散サービング Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs ( http://arxiv.org/abs/2406.01566v1 ) ライセンス: Link先を確認	Yixuan Mei, Yonghao Zhuang, Xupeng Miao, Juncheng Yang, Zhihao Jia, Rashmi Vinayak,	(参考訳) 本稿では、異種GPUクラスタ上で動作する高スループット低レイテンシ大言語モデル(LLM)のための分散システムHelixを紹介する。 Helixの背景にある重要な考え方は、ノードがGPUインスタンスとエッジを表現している有向重み付きグラフの最大フロー問題として、ヘテロジニアスGPUとネットワーク接続上のLLMの推論計算を定式化することである。その後、Helixは混合整数線形プログラミング(MILP)アルゴリズムを使用して、高度に最適化された戦略を発見し、LLMを提供する。このアプローチにより、Helixはモデル配置と要求スケジューリングを共同で最適化できる。 24から42のGPUノードにわたる異種クラスタ設定の評価では、Helixはスループットを最大2.7$\times$に改善し、レイテンシを最大2.8$\times$と1.3$\times$に短縮した。 This paper introduces Helix, a distributed system for high-throughput, low-latency large language model (LLM) serving on heterogeneous GPU clusters. A key idea behind Helix is to formulate inference computation of LLMs over heterogeneous GPUs and network connections as a max-flow problem for a directed, weighted graph, whose nodes represent GPU instances and edges capture both GPU and network heterogeneity through their capacities. Helix then uses a mixed integer linear programming (MILP) algorithm to discover highly optimized strategies to serve LLMs. This approach allows Helix to jointly optimize model placement and request scheduling, two highly entangled tasks in heterogeneous LLM serving. Our evaluation on several heterogeneous cluster settings ranging from 24 to 42 GPU nodes shows that Helix improves serving throughput by up to 2.7$\times$ and reduces prompting and decoding latency by up to 2.8$\times$ and 1.3$\times$, respectively, compared to best existing approaches.	翻訳日:2024-06-05 21:51:15 公開日:2024-06-03
# 単軌道コンフォーマル予測 Single Trajectory Conformal Prediction ( http://arxiv.org/abs/2406.01570v1 ) ライセンス: Link先を確認	Brian Lee, Nikolai Matni,	(参考訳) 本研究では, リスク制御予測セット(RCPS)の性能について, 確率力学系からの時間的相関データの単一軌跡を用いて, 共形予測を最小化する実験的リスク最小化法について検討した。まず, このブロッキング手法を用いて, RCPS が, 漸近的定常・収縮的ダイナミクスによってデータを生成する場合に, iid 設定で楽しむような性能保証を実現することを示す。次に,データ生成プロセスが定常性や収縮性から逸脱した場合に,RCPSの優雅な劣化を特徴付けるためにデカップリング手法を用いる。我々は、これらのツールがオンラインとオフラインの共形予測アルゴリズムの統一的な分析にどのように使えるのかを議論することで締めくくった。 We study the performance of risk-controlling prediction sets (RCPS), an empirical risk minimization-based formulation of conformal prediction, with a single trajectory of temporally correlated data from an unknown stochastic dynamical system. First, we use the blocking technique to show that RCPS attains performance guarantees similar to those enjoyed in the iid setting whenever data is generated by asymptotically stationary and contractive dynamics. Next, we use the decoupling technique to characterize the graceful degradation in RCPS guarantees when the data generating process deviates from stationarity and contractivity. We conclude by discussing how these tools could be used toward a unified analysis of online and offline conformal prediction algorithms, which are currently treated with very different tools.	翻訳日:2024-06-05 21:51:15 公開日:2024-06-03
# 量子多体スピンラチェット Quantum many-body spin ratchets ( http://arxiv.org/abs/2406.01571v1 ) ライセンス: Link先を確認	Lenart Zadnik, Marko Ljubotina, Žiga Krajnik, Enej Ilievski, Tomaž Prosen,	(参考訳) キラル輸送を発生させるSU(2)不変量子ユニタリ回路のクラスを導入し、スピン輸送特性における空間反射と時間反転対称性の役割について検討する。局所的なユニタリゲートのパラメータを調整すると、ダイナミクスはカオスか積分可能である。後者は時空離散化(英語版)(Trotterized)高スピン量子ハイゼンベルク連鎖の一般化に対応する。空間反射対称性の破れは、動的スピン感受性の漂流をもたらすことを示した。注目すべきことに、単純な公式によって与えられる普遍的なドリフト速度は、平均磁化がゼロであれば、局所スピンに付随するSU(2)カシミール不変量の値にのみ依存する。積分可能な場合、熱力学Betheアンザッツ方程式の正確な解に基づいて、ドリフト速度公式を解析的に確認する。最後に、定常最大エントロピー状態における系の2つのハーフ間の時間積分電流の大きなゆらぎを検査することにより、ギャラヴォッティ-コーエン対称性の破れを証明し、そのような状態が平衡状態とはみなせないことを示唆する。時間積分電流のスケールした累積生成関数は、代わりに一般化された変動関係に従うことを示す。 Introducing a class of SU(2) invariant quantum unitary circuits generating chiral transport, we examine the role of broken space-reflection and time-reversal symmetries on spin transport properties. Upon adjusting parameters of local unitary gates, the dynamics can be either chaotic or integrable. The latter corresponds to a generalization of the space-time discretized (Trotterized) higher-spin quantum Heisenberg chain. We demonstrate that breaking of space-reflection symmetry results in a drift in the dynamical spin susceptibility. Remarkably, we find a universal drift velocity given by a simple formula which, at zero average magnetization, depends only on the values of SU(2) Casimir invariants associated with local spins. In the integrable case, the drift velocity formula is confirmed analytically based on the exact solution of thermodynamic Bethe ansatz equations. Finally, by inspecting the large fluctuations of the time-integrated current between two halves of the system in stationary maximum-entropy states, we demonstrate violation of the Gallavotti-Cohen symmetry, implying that such states cannot be regarded as equilibrium ones. We show that the scaled cumulant generating function of the time-integrated current instead obeys a generalized fluctuation relation.	翻訳日:2024-06-05 21:51:15 公開日:2024-06-03
# 離散状態空間拡散と流れモデルのためのアンロック誘導 Unlocking Guidance for Discrete State-Space Diffusion and Flow Models ( http://arxiv.org/abs/2406.01572v1 ) ライセンス: Link先を確認	Hunter Nisonoff, Junhao Xiong, Stephan Allenspach, Jennifer Listgarten,	(参考訳) 離散状態空間上の生成モデルは、特に自然科学の分野において、幅広い潜在的な応用を持つ。連続状態空間では、拡散と流れモデルに関するガイダンスを用いて、所望の特性を持つ制御可能で柔軟なサンプルの生成を実現している。しかし、これらのガイダンスアプローチは離散状態空間モデルに容易には適用できない。そこで本研究では,そのようなモデルにガイダンスを適用するための汎用的,原則的手法を提案する。提案手法は離散状態空間上での連続時間マルコフ過程の活用に依存し,所望の導出分布から抽出する際の計算的トラクタビリティを解放する。我々は,画像のガイド生成,小分子,DNA配列,タンパク質配列など,様々な応用のアプローチであるディスクリートガイダンスの有用性を実証する。 Generative models on discrete state-spaces have a wide range of potential applications, particularly in the domain of natural sciences. In continuous state-spaces, controllable and flexible generation of samples with desired properties has been realized using guidance on diffusion and flow models. However, these guidance approaches are not readily amenable to discrete state-space models. Consequently, we introduce a general and principled method for applying guidance on such models. Our method depends on leveraging continuous-time Markov processes on discrete state-spaces, which unlocks computational tractability for sampling from a desired guided distribution. We demonstrate the utility of our approach, Discrete Guidance, on a range of applications including guided generation of images, small-molecules, DNA sequences and protein sequences.	翻訳日:2024-06-05 21:51:15 公開日:2024-06-03
# 低レベルマルコフ決定過程を用いた確率的二値最適化 Stochastic Bilevel Optimization with Lower-Level Contextual Markov Decision Processes ( http://arxiv.org/abs/2406.01575v1 ) ライセンス: Link先を確認	Vinzenz Thoma, Barna Pasztor, Andreas Krause, Giorgia Ramponi, Yifan Hu,	(参考訳) 様々な応用において、戦略的意思決定問題における最適政策は、環境構成と外因性事象の両方に依存する。これらの設定に対して、文脈マルコフ決定プロセス(BO-CMDP)を用いた二段階最適化(BO-CMDP)を導入する。 BO-CMDPは、リーダーとリーダーのコントロールを超えたランダムなコンテキストが、(潜在的に複数の)フォロワーが最も反応する(多くの)MDPのセットアップを決定する、スタックルバーグゲームと見なすことができる。このフレームワークは、従来の二段階最適化を超えて、MDPのモデル設計、税制設計、報酬形成、動的メカニズム設計など、さまざまな分野に関連性を見出す。本稿では,BO-CMDPを解くための確率的ハイパーポリシーグラディエントDescent (HPGD)アルゴリズムを提案し,その収束性を実証する。特にHPGDは、フォロワーの軌跡の観察のみを利用する。そのため、フォロワーは任意のトレーニング手順を使用でき、リーダーはさまざまな現実世界のシナリオに合わせて使用する特定のアルゴリズムを知らない。さらに,リーダがフォロワーのトレーニングに影響を及ぼすような設定も検討し,高速化されたアルゴリズムを提案する。アルゴリズムの性能を実証的に示す。 In various applications, the optimal policy in a strategic decision-making problem depends both on the environmental configuration and exogenous events. For these settings, we introduce Bilevel Optimization with Contextual Markov Decision Processes (BO-CMDP), a stochastic bilevel decision-making model, where the lower level consists of solving a contextual Markov Decision Process (CMDP). BO-CMDP can be viewed as a Stackelberg Game where the leader and a random context beyond the leader's control together decide the setup of (many) MDPs that (potentially multiple) followers best respond to. This framework extends beyond traditional bilevel optimization and finds relevance in diverse fields such as model design for MDPs, tax design, reward shaping and dynamic mechanism design. We propose a stochastic Hyper Policy Gradient Descent (HPGD) algorithm to solve BO-CMDP, and demonstrate its convergence. Notably, HPGD only utilizes observations of the followers' trajectories. Therefore, it allows followers to use any training procedure and the leader to be agnostic of the specific algorithm used, which aligns with various real-world scenarios. 
We further consider the setting when the leader can influence the training of followers and propose an accelerated algorithm. We empirically demonstrate the performance of our algorithm.	翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# 静的後悔最小化と動的後悔最小化の等価性 An Equivalence Between Static and Dynamic Regret Minimization ( http://arxiv.org/abs/2406.01577v1 ) ライセンス: Link先を確認	Andrew Jacobsen, Francesco Orabona,	(参考訳) オンライン凸最適化における動的後悔最小化の問題を研究する。その目的は、アルゴリズムの累積損失と任意のコンパレータ列の累積損失との差を最小化することである。このトピックに関する文献は非常に豊富だが、これらのアルゴリズムの分析と設計のための統一されたフレームワークはいまだに欠落している。本稿では、動的後悔最小化は拡張決定空間における静的後悔最小化と同値であることを示す。この簡単な観察から、損失の分散に起因するペナルティとコンパレータ列の変動性に起因するペナルティとをトレードオフする下界のフロンティアが存在することを示し、このフロンティアに沿った任意の保証を達成するための枠組みを提供する。その結果、任意のコンパレータ列の二乗パス長に適応して後悔 $R_{T}(u_{1},\dots,u_{T})\le O(\sqrt{T\sum_{t} \|u_{t}-u_{t+1}\|^{2}})$ を達成することは不可能であることを初めて証明した。しかし、コンパレータ列の局所的に平滑化された二乗パス長に基づく新しい変動性の概念には適応できることを証明し、$R_{T}(u_{1},\dots,u_{T})\le \tilde O(\sqrt{T\sum_{i}\|\bar u_{i}-\bar u_{i+1}\|^{2}})$ という形の動的後悔を保証するアルゴリズムを提供する。多重対数項を除けば、この新しい変動性の概念はパス長に基づく古典的な概念より悪くなることはない。 We study the problem of dynamic regret minimization in online convex optimization, in which the objective is to minimize the difference between the cumulative loss of an algorithm and that of an arbitrary sequence of comparators. While the literature on this topic is very rich, a unifying framework for the analysis and design of these algorithms is still missing. In this paper, we show that dynamic regret minimization is equivalent to static regret minimization in an extended decision space. Using this simple observation, we show that there is a frontier of lower bounds trading off penalties due to the variance of the losses and penalties due to variability of the comparator sequence, and provide a framework for achieving any of the guarantees along this frontier. As a result, we prove for the first time that adapting to the squared path-length of an arbitrary sequence of comparators to achieve regret $R_{T}(u_{1},\dots,u_{T})\le O(\sqrt{T\sum_{t} \|u_{t}-u_{t+1}\|^{2}})$ is impossible. 
However, we prove that it is possible to adapt to a new notion of variability based on the locally-smoothed squared path-length of the comparator sequence, and provide an algorithm guaranteeing dynamic regret of the form $R_{T}(u_{1},\dots,u_{T})\le \tilde O(\sqrt{T\sum_{i}\|\bar u_{i}-\bar u_{i+1}\|^{2}})$. Up to polylogarithmic terms, the new notion of variability is never worse than the classic one involving the path-length.	翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# 技術的負債がリードタイムに与える影響を測る--産業ケーススタディ Towards Measuring the Impact of Technical Debt on Lead Time: An Industrial Case Study ( http://arxiv.org/abs/2406.01578v1 ) ライセンス: Link先を確認	Bhuwan Paudel, Javier Gonzalez-Huerta, Ehsan Zabardast, Eriks Klotins,	(参考訳) 背景: ソフトウェア企業は、技術的負債を導入し、潜在的に開発者の時間を浪費できるトレードオフとして、迅速な価値提供と品質のバランスをとる必要があります。ソフトウェアシステムが進化するにつれて、技術的負債は増加する傾向にある。しかし、リードタイムへの影響を見積もるには、より経験的かつ実験的な証拠が必要である。目的: 技術的負債がJira問題の解決にリードタイムに影響を及ぼすかどうかを実証研究する。さらに、技術的負債によってリードタイムの変動が説明できる範囲を測定することを目的としています。方法: 産業ケーススタディを行い, それぞれが個別に分析した6成分の関係について検討した。技術的負債はSonarQubeを用いて測定され、コンポーネントのサイズで正規化され、Jiraの問題を解決するリードタイムはJiraから直接収集された。結果: さまざまな結果が得られた。技術的負債は2つのコンポーネントのリードタイムに適度な影響を与えましたが、他の2つのコンポーネントには意味のある影響は見られませんでした。残りの2成分に中程度の負の影響が認められた。結論: 技術的負債だけでは、リードタイムにおけるすべてのばらつきを説明できない。ですから,他の変数(例えば,変更のサイズ,複雑性,関与するチームの数,コンポーネントのオーナシップなど)がリードタイムに影響を与えているか,あるいは後になって現れる可能性のある残留的な影響があるかも知れません。これらの相反する変数のさらなる研究が不可欠である。 Background: Software companies must balance fast value delivery with quality, a trade-off that can introduce technical debt and potentially waste developers' time. As software systems evolve, technical debt tends to increase. However, estimating its impact on lead time still requires more empirical and experimental evidence. Objective: We conduct an empirical study investigating whether technical debt impacts lead time in resolving Jira issues. Furthermore, our aim is to measure the extent to which variance in lead time is explainable by the technical debt. Method: We conducted an industrial case study to examine the relationship in six components, each of which was analyzed individually. Technical debt was measured using SonarQube and normalized with the component's size, while lead time to resolve Jira issues was collected directly from Jira. Results: We found a set of mixed results. Technical debt had a moderate positive impact on lead time in two components, while we did not see a meaningful impact on two others. A moderate negative impact was found in the remaining two components. 
Conclusion: The findings show that technical debt alone cannot explain all the variance in lead time, which ranges from 5% up to 41% across components. So, there should be some other variables (e.g., size of the changes made, complexity, number of teams involved, component ownership) impacting lead time, or it might have a residual effect that might manifest later on. Further investigation into those confounding variables is essential.	翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# 3次元生成のためのテトラメドロンめっき Tetrahedron Splatting for 3D Generation ( http://arxiv.org/abs/2406.01579v1 ) ライセンス: Link先を確認	Chun Gu, Zeyu Yang, Zijie Pan, Xiatian Zhu, Li Zhang,	(参考訳) 3次元表現は2次元拡散前の3次元生成の顕著な進歩に不可欠である。フレキシブルな表現として、NeRFは初めて3D表現に採用された。しかし、密度ベースのボリュームレンダリングでは、計算オーバーヘッドと不正確なメッシュ抽出の両方に悩まされる。署名された距離フィールドとマーチング・テトラヘドラを使用することで、DMTetは正確なメッシュ抽出とリアルタイムレンダリングが可能になるが、メッシュにおける大きなトポロジ的変化の処理には制限があり、最適化の課題に繋がる。あるいは、3Dガウススメッティング(3DGS)は、メッシュ抽出に不足しながら、トレーニングとレンダリングの効率の両方に好適である。本研究では,最適化時の収束,高精度メッシュ抽出,リアルタイムレンダリングを同時に実現する新しい3D表現であるTetrahedron Splatting(TeT-Splatting)を提案する。これは、正確なメッシュ抽出の望ましい能力を保ちつつ、構造化テトラヘドラルグリッドに表面ベースの体積レンダリングを統合することで実現される。さらに,符号付き距離場に対する固有および正規整合正則化項を組み込んで生成品質と安定性を向上する。批判的に言えば、私たちの表現はメッシュ抽出なしで訓練できるため、最適化プロセスの収束が容易になります。私たちのTeT-Splattingは、テクスチャ最適化のためのポリゴンメッシュとともに、既存の3D生成パイプラインに簡単に統合できます。大規模な実験により, コンバージェンス速度, レンダリング効率, メッシュ品質のトレードオフは, 異なる3次元生成環境下での代替品に比べて優れていることがわかった。 3D representation is essential to the significant advance of 3D generation with 2D diffusion priors. As a flexible representation, NeRF has been first adopted for 3D representation. With density-based volumetric rendering, it however suffers both intensive computational overhead and inaccurate mesh extraction. Using a signed distance field and Marching Tetrahedra, DMTet allows for precise mesh extraction and real-time rendering but is limited in handling large topological changes in meshes, leading to optimization challenges. Alternatively, 3D Gaussian Splatting (3DGS) is favored in both training and rendering efficiency while falling short in mesh extraction. In this work, we introduce a novel 3D representation, Tetrahedron Splatting (TeT-Splatting), that supports easy convergence during optimization, precise mesh extraction, and real-time rendering simultaneously. This is achieved by integrating surface-based volumetric rendering within a structured tetrahedral grid while preserving the desired ability of precise mesh extraction, and a tile-based differentiable tetrahedron rasterizer. 
Furthermore, we incorporate eikonal and normal consistency regularization terms for the signed distance field to improve generation quality and stability. Critically, our representation can be trained without mesh extraction, making the optimization process easier to converge. Our TeT-Splatting can be readily integrated in existing 3D generation pipelines, along with polygonal mesh for texture optimization. Extensive experiments show that our TeT-Splatting strikes a superior tradeoff among convergence speed, render efficiency, and mesh quality as compared to previous alternatives under varying 3D generation settings.	翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# ニューラルネットワークによる情報理論限界近傍のSGDを用いた低次元多項式の学習 Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit ( http://arxiv.org/abs/2406.01581v1 ) ライセンス: Link先を確認	Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu,	(参考訳) $\mathbb{R}^d$ 上の等方的ガウスデータの下で、単一インデックス対象関数 $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ を勾配降下法で学習する問題を研究する。ここでリンク関数 $\sigma_*:\mathbb{R}\to\mathbb{R}$ は、情報指数 $p$(エルミート展開における最低次数として定義される)を持つ未知の $q$ 次多項式である。先行研究では、ニューラルネットワークの勾配に基づくトレーニングが $n\gtrsim d^{\Theta(p)}$ サンプルでこのターゲットを学習できることが示されており、このような統計的複雑性は相関統計クエリの下界から必要であると予測される。驚くべきことに、SGDベースのアルゴリズムで最適化された2層ニューラルネットワークは、任意の多項式リンク関数の $f_*$ を、サンプルおよび実行時複雑性 $n \asymp T \asymp C(q) \cdot d\mathrm{polylog} d$ で学習することを証明する。ここで定数 $C(q)$ は情報指数に関係なく $\sigma_*$ の次数にのみ依存する。この次元依存性は、多重対数因子を除いて情報理論的限界と一致する。我々の分析の核となるのは、勾配計算におけるミニバッチの再利用であり、これが相関クエリを超える高次情報をもたらす。 We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is an unknown degree $q$ polynomial with information exponent $p$ (defined as the lowest degree in the Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with $n\gtrsim d^{\Theta(p)}$ samples, and such statistical complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary polynomial link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d\mathrm{polylog} d$, where constant $C(q)$ only depends on the degree of $\sigma_*$, regardless of information exponent; this dimension dependence matches the information theoretic limit up to polylogarithmic factors. Core to our analysis is the reuse of minibatch in the gradient computation, which gives rise to higher-order information beyond correlational queries.	
翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# CLIP 以外の ViT におけるテキストによる画像表現の分解と解釈 Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP ( http://arxiv.org/abs/2406.01583v1 ) ライセンス: Link先を確認	Sriram Balasubramanian, Samyadeep Basu, Soheil Feizi,	(参考訳) 最近の研究は、CLIP-ViTモデルの個々のコンポーネントが、CLIPの共有画像テキスト表現空間を活用することで、最終的な表現にどのように貢献するかを探求している。これらのコンポーネント、例えばアテンションヘッドやMLPは、形状、色、テクスチャといった異なる画像の特徴を捉えている。しかし、任意の視覚変換器(ViT)におけるこれらのコンポーネントの役割を理解することは困難である。この目的のために、CLIP以外のViTにおける様々なコンポーネントの役割を識別できる一般的なフレームワークを紹介します。具体的には (a) 異なるモデルコンポーネントからのコントリビューションへの最終表現の分解を自動化し、 b) これらのコントリビューションをCLIP空間に線形にマッピングしてテキストで解釈する。さらに,特定の特徴について重要な要素をランク付けする新しいスコアリング機能を導入する。これらの知見は, テキスト記述や参照画像を用いた画像検索, トークンの重要度熱マップの可視化, スパイラル相関の緩和など, 様々なViT変異体(例: DeiT, DINO, DINOv2, Swin, MaxViT)にフレームワークを適用し, 特定の画像特徴に関する異なるコンポーネントの役割についての洞察を得る。 Recent works have explored how individual components of the CLIP-ViT model contribute to the final representation by leveraging the shared image-text representation space of CLIP. These components, such as attention heads and MLPs, have been shown to capture distinct image features like shape, color or texture. However, understanding the role of these components in arbitrary vision transformers (ViTs) is challenging. To this end, we introduce a general framework which can identify the roles of various components in ViTs beyond CLIP. Specifically, we (a) automate the decomposition of the final representation into contributions from different model components, and (b) linearly map these contributions to CLIP space to interpret them via text. Additionally, we introduce a novel scoring function to rank components by their importance with respect to specific features. Applying our framework to various ViT variants (e.g. 
DeiT, DINO, DINOv2, Swin, MaxViT), we gain insights into the roles of different components concerning particular image features. These insights facilitate applications such as image retrieval using text descriptions or reference images, visualizing token importance heatmaps, and mitigating spurious correlations.	翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# 空間RGPT:視覚言語モデルにおける基底空間推論 SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model ( http://arxiv.org/abs/2406.01584v1 ) ライセンス: Link先を確認	An-Chieh Cheng, Hongxu Yin, Yang Fu, Qiushan Guo, Ruihan Yang, Jan Kautz, Xiaolong Wang, Sifei Liu,	(参考訳) 視覚言語モデル(VLM)は2次元視覚と言語タスクにおいて顕著な性能を示した。しかし、空間配置を推論する能力は依然として限られている。本研究では,VLMの空間知覚と推論能力を高めるために空間領域GPT(SpatialRGPT)を導入する。空間RGPTは,(1)3次元シーングラフからの地域表現の効果的な学習を可能にするデータキュレーションパイプライン,(2)既存のVLMのビジュアルエンコーダに奥行き情報を統合する柔軟なプラグインモジュールである。推測中、ユーザが指定した領域の提案が提供されると、SpatialRGPTは相対的な方向と距離を正確に知覚できる。さらに,室内,屋外,シミュレートされた環境を含む地上3次元アノテーションを用いたベンチマークであるSpatialRGBT-Benchを提案し,VLMにおける3次元空間認識の評価を行った。本研究では,空間的推論タスクにおける局所的プロンプトと非局所的プロンプトの双方において,空間的RGPTにより性能が著しく向上することを示す。このモデルはまた強力な一般化能力を示し、複雑な空間関係を効果的に推論し、ロボットタスクのための地域対応の高密度報酬アノテータとして機能する。コード、データセット、ベンチマークはhttps://www.anjiecheng.me/SpatialRGPTで公開される。 Vision Language Models (VLMs) have demonstrated remarkable performance in 2D vision and language tasks. However, their ability to reason about spatial arrangements remains limited. In this work, we introduce Spatial Region GPT (SpatialRGPT) to enhance VLMs' spatial perception and reasoning capabilities. SpatialRGPT advances VLMs' spatial understanding through two key innovations: (1) a data curation pipeline that enables effective learning of regional representation from 3D scene graphs, and (2) a flexible plugin module for integrating depth information into the visual encoder of existing VLMs. During inference, when provided with user-specified region proposals, SpatialRGPT can accurately perceive their relative directions and distances. Additionally, we propose SpatialRGBT-Bench, a benchmark with ground-truth 3D annotations encompassing indoor, outdoor, and simulated environments, for evaluating 3D spatial cognition in VLMs. Our results demonstrate that SpatialRGPT significantly enhances performance in spatial reasoning tasks, both with and without local region prompts. 
The model also exhibits strong generalization capabilities, effectively reasoning about complex spatial relations and functioning as a region-aware dense reward annotator for robotic tasks. Code, dataset, and benchmark will be released at https://www.anjiecheng.me/SpatialRGPT	翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# ManiCM:ロボットマニピュレーションのための一貫性モデルによるリアルタイム3次元拡散政策 ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation ( http://arxiv.org/abs/2406.01586v1 ) ライセンス: Link先を確認	Guanxing Lu, Zifeng Gao, Tianxing Chen, Wenxun Dai, Ziwei Wang, Yansong Tang,	(参考訳) 拡散モデルは自然画像から運動軌道への複雑な分布を生成するのに有効であることが確認されている。近年の拡散法は3次元ロボット操作作業において顕著な性能を示し,特に高次元観察において,複数のデノナイジングステップによる実行時の非効率に悩まされている。そこで本研究では,拡散過程に一貫性制約を課すリアルタイムロボット操作モデルManiCMを提案する。具体的には、点クラウド入力に条件付されたロボットの動作空間における一貫した拡散過程を定式化し、元の動作はODE軌道上の任意の点から直接微分される必要がある。この過程をモデル化するために、我々は、低次元の作用多様体における高速収束のために、視覚コミュニティ内のノイズを予測せずに、アクションサンプルを直接予測する一貫性蒸留法を設計する。我々は,AdroitとMetaworldの31のロボット操作タスクに対するManiCMの評価を行い,提案手法は競争平均成功率を維持しつつ,平均推論速度を10倍向上させることを示した。 Diffusion models have been verified to be effective in generating complex distributions from natural images to motion trajectories. Recent diffusion-based methods show impressive performance in 3D robotic manipulation tasks, whereas they suffer from severe runtime inefficiency due to multiple denoising steps, especially with high-dimensional observations. To this end, we propose a real-time robotic manipulation model named ManiCM that imposes the consistency constraint to the diffusion process, so that the model can generate robot actions in only one-step inference. Specifically, we formulate a consistent diffusion process in the robot action space conditioned on the point cloud input, where the original action is required to be directly denoised from any point along the ODE trajectory. To model this process, we design a consistency distillation technique to predict the action sample directly instead of predicting the noise within the vision community for fast convergence in the low-dimensional action manifold. We evaluate ManiCM on 31 robotic manipulation tasks from Adroit and Metaworld, and the results demonstrate that our approach accelerates the state-of-the-art method by 10 times in average inference speed while maintaining competitive average success rate.	
翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# nn2poly:ニューラルネットワークを解釈可能な多項式に変換するためのRパッケージ nn2poly: An R Package for Converting Neural Networks into Interpretable Polynomials ( http://arxiv.org/abs/2406.01588v1 ) ライセンス: Link先を確認	Pablo Morala, Jenny Alexandra Cifuentes, Rosa E. Lillo, Iñaki Ucar,	(参考訳) nn2polyパッケージは、元のネットワークと同等に予測する多項式表現を用いてフィードフォワードニューラルネットワークを説明・解釈するNN2Poly法のR実装を提供する。得られた多項式係数により、各変数とその相互作用が出力に与える影響と重要度を表現できる。相互作用を捉えるこの能力は、たいていの説明可能な人工知能(XAI)メソッドに欠けている重要な側面であり、大規模なニューラルネットワークでは増大しうる高コストな計算に依存する手法では特にそうである。このパッケージはRの主要なディープラーニングフレームワークパッケージ(tensorflowとtorch)との統合を提供し、NN2Polyアルゴリズムのユーザフレンドリーな適用を可能にする。さらに、nn2polyは、同じフレームワークでのネットワークトレーニング中に使用する必要のある重み制約の実装を提供する。他のニューラルネットワークパッケージも、その重みをリスト形式で渡すことで使用できる。 nn2polyで得られた多項式は、新しいデータで予測したり、独自のプロット法で視覚化することができる。シミュレーションは、ニューラルネットワークを解釈するためにRで利用可能な他のアプローチとの比較とともに、パッケージの使用を例示する。 The nn2poly package provides the implementation in R of the NN2Poly method to explain and interpret feed-forward neural networks by means of polynomial representations that predict in an equivalent manner as the original network. Through the obtained polynomial coefficients, the effect and importance of each variable and their interactions on the output can be represented. This capability of capturing interactions is a key aspect usually missing from most Explainable Artificial Intelligence (XAI) methods, especially if they rely on expensive computations that can be amplified when used on large neural networks. The package provides integration with the main deep learning framework packages in R (tensorflow and torch), allowing a user-friendly application of the NN2Poly algorithm. Furthermore, nn2poly provides implementation of the required weight constraints to be used during the network training in those same frameworks. Other neural network packages can also be used by including their weights in list format. Polynomials obtained with nn2poly can also be used to predict with new data or be visualized through its own plot method. 
Simulations are provided exemplifying the usage of the package alongside with a comparison with other approaches available in R to interpret neural networks.	翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# ロッテリにおけるオッドの試行--ニューラルネットワークにおける過度パラメータ化とカリキュラムの相互作用 Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks ( http://arxiv.org/abs/2406.01589v1 ) ライセンス: Link先を確認	Stefano Sarao Mannelli, Yaraslau Ivashinka, Andrew Saxe, Luca Saglietti,	(参考訳) 幅広い経験的および理論的研究により、過パラメータ化がニューラルネットワークの性能を増幅できることが示されている。抽選券仮説によれば、過度にパラメータ化されたネットワークは、目の前の課題を解決するために十分に初期化されているサブネットワークを含む可能性が高くなっている。動物学習にインスパイアされたより微妙なアプローチは、例の順序、すなわちカリキュラムを提供することによって学習者をその課題に導くことである。しかし、この学習戦略はディープラーニングアプリケーションにはほとんど役に立たないようだ。本研究では,カリキュラム学習とオーバーパラメトリゼーションを結びつける分析的研究を行う。特に,XOR-like Gaussian Mixture 問題における2層ネットワークのオンライン学習環境における相互作用について検討する。以上の結果から,高次パラメータ化は,問題を単純化しつつもキュリキュラのメリットを制限し,ディープラーニングにおけるキュリキュラの非効率性を理論的に説明できることが示唆された。 A wide range of empirical and theoretical works have shown that overparameterisation can amplify the performance of neural networks. According to the lottery ticket hypothesis, overparameterised networks have an increased chance of containing a sub-network that is well-initialised to solve the task at hand. A more parsimonious approach, inspired by animal learning, consists in guiding the learner towards solving the task by curating the order of the examples, i.e. providing a curriculum. However, this learning strategy seems to be hardly beneficial in deep learning applications. In this work, we undertake an analytical study that connects curriculum learning and overparameterisation. In particular, we investigate their interplay in the online learning setting for a 2-layer network in the XOR-like Gaussian Mixture problem. Our results show that a high degree of overparameterisation -- while simplifying the problem -- can limit the benefit from curricula, providing a theoretical account of the ineffectiveness of curricula in deep learning.	翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# ノイズ量子ゲートにおける一般化位相推定 Generalized phase estimation in noisy quantum gates ( http://arxiv.org/abs/2406.01590v1 ) ライセンス: Link先を確認	Giovanni Ragazzi, Simone Cavazzoni, Paolo Bordone, Matteo G. A. Paris,	(参考訳) 雑音のある量子ゲートの作用により、関心のパラメータが量子状態に符号化されるメロジカルシナリオについて検討し、量子フィッシャー情報(QFI)の挙動を解析して、正確性に縛られた究極の境界について検討する。我々は、キュービットゲートに焦点をあて、ゲートの連続的な応用の可能性を検討する。我々は、単体ゲートの自明な場合を超えて、異なるステップ(ゲート応用)におけるQFIにどのように影響するかを考察し、実行された量子演算にノイズを導入するメトロジー手順の頑健さを特徴づける。我々は、Von Mises-Fisher分布に支配される古典的ゆらぎとして、キュービット回転に影響を与える劣化雑音と傾き雑音をモデル化する。ノイズレスの場合と比較して、QFIはステップ数と2次的に成長し、非単調な振る舞いと、ゲートの動作を正確に特徴づけるために実行すべきステップの理想的な数を定義するQFIにおける最大値の出現を観察する。 We examine metrological scenarios where the parameter of interest is encoded onto a quantum state through the action of a noisy quantum gate and investigate the ultimate bound to precision by analyzing the behaviour of the Quantum Fisher Information (QFI). We focus on qubit gates and consider the possibility of employing successive applications of the gate. We go beyond the trivial case of unitary gates and characterize the robustness of the metrological procedure introducing noise in the performed quantum operation, looking at how this affects the QFI at different steps (gate applications). We model the dephasing and tilting noise affecting qubit rotations as classical fluctuations governed by a Von Mises-Fisher distribution. Compared to the noiseless case, in which the QFI grows quadratically with the number of steps, we observe a non monotonic behavior, and the appearance of a maximum in the QFI, which defines the ideal number of steps that should be performed in order to precisely characterize the action of the gate.	翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
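The noiseless baseline mentioned in the abstract above (QFI growing quadratically with the number of gate applications) can be checked numerically with a minimal sketch. The choices here are illustrative assumptions, not taken from the paper: the gate is a z-rotation exp(-i θ σ_z / 2), the probe is the |+> state, and the derivative is a central finite difference. The paper's noise model (von Mises-Fisher fluctuations) is not implemented.

```python
import cmath
import math


def evolved_state(theta, n_steps):
    """State after applying the z-rotation exp(-i*theta*sigma_z/2) n_steps times to |+>."""
    phase = n_steps * theta / 2.0
    amp = 1.0 / math.sqrt(2.0)
    return [amp * cmath.exp(-1j * phase), amp * cmath.exp(1j * phase)]


def inner(a, b):
    """Inner product <a|b>, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def qfi_pure(theta, n_steps, eps=1e-5):
    """QFI of a pure state: 4 * (<dpsi|dpsi> - |<psi|dpsi>|^2)."""
    psi = evolved_state(theta, n_steps)
    ahead = evolved_state(theta + eps, n_steps)
    behind = evolved_state(theta - eps, n_steps)
    # Central finite difference for the derivative of the state w.r.t. theta.
    dpsi = [(p - m) / (2.0 * eps) for p, m in zip(ahead, behind)]
    return 4.0 * (inner(dpsi, dpsi).real - abs(inner(psi, dpsi)) ** 2)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        # In this noiseless toy setting each value should be close to n**2.
        print(n, round(qfi_pure(0.3, n), 4))
```

In this toy setting the QFI equals n² for any θ, which is the quadratic scaling the abstract contrasts with the non-monotonic behaviour found once noise is introduced.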
# DeNVeR:unsupervised Video Vessel Segmentationのための変形可能なニューラル容器表現 DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation ( http://arxiv.org/abs/2406.01591v1 ) ライセンス: Link先を確認	Chun-Hung Wu, Shih-Hong Chen, Chih-Yao Hu, Hsin-Yu Wu, Kai-Hsin Chen, Yu-You Chen, Chih-Hai Su, Chih-Kuo Lee, Yu-Lun Liu,	(参考訳) 本稿では,X線ビデオにおける非教師なしの血管分割手法であるDeformable Neural Vessel Representation (DeNVeR)を提案する。 DeNVeRは光フローと層分離を使用し、テストタイムトレーニングを通じてセグメンテーション精度と適応性を向上する。我々の研究の重要な要素はXACVデータセットの導入である。これは、高品質で手動でセグメンテーショングラウンド真理をラベル付けした最初のX線冠動脈造影ビデオデータセットである。 DeNVeRは血管セグメンテーションの最先端手法よりも優れていることを示す。本稿では, 医用画像の進歩, 疾患診断・治療計画のための堅牢でデータ効率のよいツールの提供, ビデオ血管セグメンテーションにおける新たな研究基準の策定について述べる。ビデオ結果のプロジェクトページはhttps://kirito878.github.io/DeNVeR/。 This paper presents Deformable Neural Vessel Representations (DeNVeR), an unsupervised approach for vessel segmentation in X-ray videos without annotated ground truth. DeNVeR uses optical flow and layer separation, enhancing segmentation accuracy and adaptability through test-time training. A key component of our research is the introduction of the XACV dataset, the first X-ray angiography coronary video dataset with high-quality, manually labeled segmentation ground truth. Our evaluation demonstrates that DeNVeR outperforms current state-of-the-art methods in vessel segmentation. This paper marks an advance in medical imaging, providing a robust, data-efficient tool for disease diagnosis and treatment planning and setting a new standard for future research in video vessel segmentation. See our project page for video results at https://kirito878.github.io/DeNVeR/.	翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# 対話型3次元モデリングのためのテキスト誘導制御可能なメッシュ微細化 Text-guided Controllable Mesh Refinement for Interactive 3D Modeling ( http://arxiv.org/abs/2406.01592v1 ) ライセンス: Link先を確認	Yun-Chun Chen, Selena Ling, Zhiqin Chen, Vladimir G. Kim, Matheus Gadelha, Alec Jacobson,	(参考訳) テキストプロンプトによって案内される入力粗い3Dメッシュに幾何学的詳細を加える新しい手法を提案する。私たちの方法は3つの段階から成り立っている。まず、入力粗い幾何学と入力テキストプロンプトに基づいて、単一のビューRGB画像を生成する。このシングルビュー画像生成ステップにより、ユーザは結果の事前視覚化が可能になり、その後のマルチビュー生成に対してより強い条件付けを提供する。第2に、新しいマルチビュー正規生成アーキテクチャを用いて、正常画像の6つの異なるビューを共同で生成する。共同ビュー生成は矛盾を低減し、よりシャープな詳細をもたらす。第3に、すべてのビューに対してメッシュを最適化し、出力として微細で詳細な幾何学を生成する。得られた方法は、数秒以内に出力を生成し、粗い構造、ポーズ、および結果の3Dメッシュの所望の詳細を明示的なユーザ制御を提供する。プロジェクトページ: https://text-mesh-refinement.github.io We propose a novel technique for adding geometric details to an input coarse 3D mesh guided by a text prompt. Our method is composed of three stages. First, we generate a single-view RGB image conditioned on the input coarse geometry and the input text prompt. This single-view image generation step allows the user to pre-visualize the result and offers stronger conditioning for subsequent multi-view generation. Second, we use our novel multi-view normal generation architecture to jointly generate six different views of the normal images. The joint view generation reduces inconsistencies and leads to sharper details. Third, we optimize our mesh with respect to all views and generate a fine, detailed geometry as output. The resulting method produces an output within seconds and offers explicit user control over the coarse structure, pose, and desired details of the resulting 3D mesh. Project page: https://text-mesh-refinement.github.io.	翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# メッシュ吸着ガウス平滑化による動的3次元物体の再構成とシミュレーション Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting ( http://arxiv.org/abs/2406.01593v1 ) ライセンス: Link先を確認	Shaojie Ma, Yawei Luo, Yi Yang,	(参考訳) 再現は多様なシーンに適応可能な柔軟な3D表現を要求するのに対し、シミュレーションは動きの原理を効果的にモデル化するために構造化された表現を必要とする。本稿では,このようなジレンマを解決するために,メッシュ吸着型ガウス平滑化法(MaGS)を提案する。 MaGSは3Dガウスのメッシュ表面へのホバリングを制約し、3Dガウスのレンダリング柔軟性とメッシュの空間コヒーレンスを組み合わせた相互吸着メッシュ-ガウスの3D表現を生成する。この表現を活用することで、メッシュと3Dガウス間の相対変位をモデル化する学習可能な相対変形場(RDF)を導入し、ARAPのみに依存する従来のメッシュ駆動変形パラダイムを拡張して、各3Dガウスの運動をより正確に捉える。メッシュ、3Dガウス、RDFを共同最適化することで、MaGSは高いレンダリング精度とリアルな変形を実現する。 D-NeRFデータセットとNeRF-DSデータセットの大規模な実験は、MaGSが再構成とシミュレーションの両方で競合する結果を生成できることを実証している。 3D reconstruction and simulation, while interrelated, have distinct objectives: reconstruction demands a flexible 3D representation adaptable to diverse scenes, whereas simulation requires a structured representation to model motion principles effectively. This paper introduces the Mesh-adsorbed Gaussian Splatting (MaGS) method to resolve such a dilemma. MaGS constrains 3D Gaussians to hover on the mesh surface, creating a mutual-adsorbed mesh-Gaussian 3D representation that combines the rendering flexibility of 3D Gaussians with the spatial coherence of meshes. Leveraging this representation, we introduce a learnable Relative Deformation Field (RDF) to model the relative displacement between the mesh and 3D Gaussians, extending traditional mesh-driven deformation paradigms that only rely on ARAP prior, thus capturing the motion of each 3D Gaussian more precisely. By joint optimizing meshes, 3D Gaussians, and RDF, MaGS achieves both high rendering accuracy and realistic deformation. Extensive experiments on the D-NeRF and NeRF-DS datasets demonstrate that MaGS can generate competitive results in both reconstruction and simulation.	翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# DiffUHaul: 画像にオブジェクトをドラッグする訓練不要の方法 DiffUHaul: A Training-Free Method for Object Dragging in Images ( http://arxiv.org/abs/2406.01594v1 ) ライセンス: Link先を確認	Omri Avrahami, Rinon Gal, Gal Chechik, Ohad Fried, Dani Lischinski, Arash Vahdat, Weili Nie,	(参考訳) テキストから画像への拡散モデルは多くの画像編集タスクを解くのに有効であることが証明されている。しかし、シーン内のオブジェクトをシームレスに移動させるという一見単純な作業は、驚くほど難しいままだ。この問題に対処する既存の手法は、空間的推論が欠如しているために、現実のシナリオで確実に機能するのに苦労することが多い。本研究では,DiffUHaulと呼ばれるオブジェクトドラッグングタスクに対して,局所的なテキスト・画像モデルの空間的理解を活用する学習自由度手法を提案する。局所モデルのレイアウト入力を盲目的に操作すると、モデル内のオブジェクト表現の内在的絡み合いにより、編集性能が低下する傾向にある。この目的のために,まず注目マスキングを各デノナイズステップに適用し,各生成物を異なるオブジェクトに分散させ,高レベルのオブジェクトの外観を維持するために自己認識共有機構を採用する。さらに,新しい拡散アンカリング手法を提案する。初期の段階では,ソース画像とターゲット画像の注意特徴を補間して,元の外観とスムーズに新しいレイアウトを融合させ,後段では,ソース画像から補間された画像に局所的特徴を渡すことで,細かなオブジェクトの詳細を保持する。 DiffUHaul を実画像編集に適用するために,DiffUHaul に DDPM 自己注意バケットを適用する。最後に,本課題に対する自動評価パイプラインを導入し,本手法の有効性を示す。私たちの結果は、ユーザの好み調査によって強化されています。 Text-to-image diffusion models have proven effective for solving many image editing tasks. However, the seemingly straightforward task of seamlessly relocating objects within a scene remains surprisingly challenging. Existing methods addressing this problem often struggle to function reliably in real-world scenarios due to lacking spatial reasoning. In this work, we propose a training-free method, dubbed DiffUHaul, that harnesses the spatial understanding of a localized text-to-image model, for the object dragging task. Blindly manipulating layout inputs of the localized model tends to cause low editing performance due to the intrinsic entanglement of object representation in the model. To this end, we first apply attention masking in each denoising step to make the generation more disentangled across different objects and adopt the self-attention sharing mechanism to preserve the high-level object appearance. 
Furthermore, we propose a new diffusion anchoring technique: in the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance; in the later denoising steps, we pass the localized features from the source images to the interpolated images to retain fine-grained object details. To adapt DiffUHaul to real-image editing, we apply a DDPM self-attention bucketing that can better reconstruct real images with the localized model. Finally, we introduce an automated evaluation pipeline for this task and showcase the efficacy of our method. Our results are reinforced through a user preference study.	翻訳日:2024-06-05 21:41:25 公開日:2024-06-03
# MultiPly:野生のモノクラービデオから複数人の再構築 MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild ( http://arxiv.org/abs/2406.01595v1 ) ライセンス: Link先を確認	Zeren Jiang, Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, Jie Song,	(参考訳) モノクラーインザワイルドビデオから3Dで複数の人物を再構成する新しいフレームワークであるMultiPlyを提案する。モノラルなインザワイルドビデオから自然に動き、相互作用する複数の個人を再構築することは、難しい課題だ。これに対処するには、被写体に関する事前の知識がなくても、個人が正確にピクセルレベルの絡み合う必要がある。さらに、短いビデオシーケンスから複雑な3次元の人間の形状を復元し、難易度を高める必要がある。これらの課題に対処するために、まず、個々の人間と背景モデルによって合成されたシーン全体の階層化されたニューラル表現を定義します。階層化可能なボリュームレンダリングを通じて,ビデオから階層化ニューラル表現を学習する。この学習プロセスは, 自己教師付き3次元セグメント化モジュールとプロンプト可能な2次元セグメント化モジュールを組み合わせたハイブリッドインスタンスセグメント化アプローチによってさらに強化され, 密接な相互作用の下でも信頼性の高いインスタンスセグメント化管理が実現される。人間のポーズと形状/外観を交互に最適化するために、信頼誘導最適化の定式化を導入する。光度情報を用いて人間のポーズを洗練させ、人間のダイナミクスに物理的に妥当な制約を課し、高忠実度で時間的に一貫した3D再構成を実現するための効果的な目的を取り入れた。提案手法の評価は,公開データセットや動画の先行技術よりも優れていることを示す。 We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. Reconstructing multiple individuals moving and interacting naturally from monocular in-the-wild videos poses a challenging task. Addressing it necessitates precise pixel-level disentanglement of individuals without any prior knowledge about the subjects. Moreover, it requires recovering intricate and complete 3D human shapes from short video sequences, intensifying the level of difficulty. To tackle these challenges, we first define a layered neural representation for the entire scene, composited by individual human and background models. We learn the layered neural representation from videos via our layer-wise differentiable volume rendering. This learning process is further enhanced by our hybrid instance segmentation approach which combines the self-supervised 3D segmentation and the promptable 2D segmentation module, yielding reliable instance segmentation supervision even under close human interaction. 
A confidence-guided optimization formulation is introduced to optimize the human poses and shape/appearance alternately. We incorporate effective objectives to refine human poses via photometric information and impose physically plausible constraints on human dynamics, leading to temporally consistent 3D reconstructions with high fidelity. The evaluation of our method shows the superiority over prior art on publicly available datasets and in-the-wild videos.	翻訳日:2024-06-05 21:31:36 公開日:2024-06-03
# TimeCMA: クロスモーダルアライメントによるLCMを利用した時系列予測を目指して TimeCMA: Towards LLM-Empowered Time Series Forecasting via Cross-Modality Alignment ( http://arxiv.org/abs/2406.01638v1 ) ライセンス: Link先を確認	Chenxi Liu, Qianxiong Xu, Hao Miao, Sun Yang, Lingzheng Zhang, Cheng Long, Ziyue Li, Rui Zhao,	(参考訳) スケーラブルなモバイルセンシングの普及は、現実世界のアプリケーションに大量の時系列データをもたらした。多変量時系列予測 (MTSF) は, 過去の観測結果に基づいて, 将来の時系列値を予測することを目的としている。既存のMTSF法は、パラメータ化の制限と小規模な訓練データに悩まされている。近年,予測性能が期待できるが計算コストが重い大規模言語モデル (LLM) が時系列で導入されている。これらの課題を解決するために,LLMを利用した時系列予測フレームワークであるTimeCMAを提案する。 2つの分岐を持つ双対モダリティ符号化モジュールを設計し、逆変換器を用いて時系列の比較的低品質で純粋な埋め込みを抽出する。さらに、LLMを利用したエンコード分岐は、プレトレーニングLDMを介して高品質だが絡み合ったプロンプト埋め込みを得るよう促すのと同じ時系列をラップする。そこで我々は,高速な埋め込みから高品質で純粋な時系列埋め込みを検索するためのモジュールを設計する。さらに,複数の変数間の依存関係を抽出し,複数の変数間の関係を予測し,関係する埋め込みをデコードする時系列予測モジュールを開発した。特に、時間情報を最後のトークンにエンコードするプロンプトを調整し、計算コストを削減するために最後のトークン埋め込みストレージを設計する。実データに関する大規模な実験は、提案したフレームワークの精度と効率に関する洞察を提供する。 The widespread adoption of scalable mobile sensing has led to large amounts of time series data for real-world applications. A fundamental application is multivariate time series forecasting (MTSF), which aims to predict future time series values based on historical observations. Existing MTSF methods suffer from limited parameterization and small-scale training data. Recently, Large language models (LLMs) have been introduced in time series, which achieve promising forecasting performance but incur heavy computational costs. To solve these challenges, we propose TimeCMA, an LLM-empowered framework for time series forecasting with cross-modality alignment. We design a dual-modality encoding module with two branches, where the time series encoding branch extracts relatively low-quality yet pure embeddings of time series through an inverted Transformer. In addition, the LLM-empowered encoding branch wraps the same time series as prompts to obtain high-quality yet entangled prompt embeddings via a Pre-trained LLM. 
Then, we design a cross-modality alignment module to retrieve high-quality and pure time series embeddings from the prompt embeddings. Moreover, we develop a time series forecasting module to decode the aligned embeddings while capturing dependencies among multiple variables for forecasting. Notably, we tailor the prompt to encode sufficient temporal information into a last token and design the last token embedding storage to reduce computational costs. Extensive experiments on real data offer insight into the accuracy and efficiency of the proposed framework.	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
# 自己関心エージェントからの相互報酬効果の協調 Reciprocal Reward Influence Encourages Cooperation From Self-Interested Agents ( http://arxiv.org/abs/2406.01641v1 ) ライセンス: Link先を確認	John L. Zhou, Weizhe Hong, Jonathan C. Kao,	(参考訳) 利己的な個人間の創発的な協力は、自然界で広く見られる現象であるが、人工的に知的なエージェント間の相互作用においては、いまだ実現が難しい。代わりに、ナイーブな強化学習アルゴリズムは一般的に、最も単純な社会的ジレンマにおいてもパレート支配される結果に収束する。opponent-shaping と呼ばれる新たなクラスの手法は、他のエージェントの学習に影響を与えることにより、向社会的な結果に到達する能力を示している。しかし、それらは他のエージェントの予測学習ステップを通じた高階微分やメタゲームダイナミクスの学習に依存しており、それぞれ相手の学習則に関する厳しい仮定や指数的なサンプル複雑性を必要とする。学習則に依存しない、サンプル効率の良い代替手段として、本研究では、相手の行動が自身のリターンに与える影響に報いるよう内発的に動機づけられた強化学習エージェントであるReciprocatorを導入する。このアプローチは、(Reciprocatorにとって)有益な行動の後に他のエージェントのリターンを増やし、有害な行動の後にそれを減らすことで他のエージェントの$Q$値を変化させ、ポリシー更新を直接形作ることなく、相互に有益な行動へと導くことを効果的に目指している。Reciprocatorは同時学習中に、時間的に拡張されたさまざまな社会的ジレンマにおける協力を促進するために使用できることを示す。 Emergent cooperation among self-interested individuals is a widespread phenomenon in the natural world, but remains elusive in interactions between artificially intelligent agents. Instead, naïve reinforcement learning algorithms typically converge to Pareto-dominated outcomes in even the simplest of social dilemmas. An emerging class of opponent-shaping methods has demonstrated the ability to reach prosocial outcomes by influencing the learning of other agents. However, they rely on higher-order derivatives through the predicted learning step of other agents or learning meta-game dynamics, which in turn rely on stringent assumptions over opponent learning rules or exponential sample complexity, respectively. To provide a learning rule-agnostic and sample-efficient alternative, we introduce Reciprocators, reinforcement learning agents which are intrinsically motivated to reciprocate the influence of an opponent's actions on their returns. This approach effectively seeks to modify other agents' $Q$-values by increasing their return following beneficial actions (with respect to the Reciprocator) and decreasing it after detrimental actions, guiding them towards mutually beneficial actions without attempting to directly shape policy updates. We show that Reciprocators can be used to promote cooperation in a variety of temporally extended social dilemmas during simultaneous learning.	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
# FNP : 任意解像度データ同化のためのフーリエニューラルプロセス FNP: Fourier Neural Processes for Arbitrary-Resolution Data Assimilation ( http://arxiv.org/abs/2406.01645v1 ) ライセンス: Link先を確認	Kun Chen, Tao Chen, Peng Ye, Hao Chen, Kang Chen, Tao Han, Wanli Ouyang, Lei Bai,	(参考訳) データ同化は、短期的な予測と観測を組み合わせることで、大気状態の最良の推定を得るために、現代の全球中期気象予報システムにおいて欠かせない要素である。近年、AIベースのデータ同化アプローチは、計算消費の点で従来の技術よりも大きな優位性があることから注目が集まっている。しかし、既存のAIベースのデータ同化法は特定の解像度の観測のみを扱うことができ、他の解像度の観測を同化する互換性と一般化能力に欠ける。本稿では、複雑な実世界の観測がしばしば異なる解像度を持つことを考慮し、任意解像度データ同化(arbitrary-resolution data assimilation)のための Fourier Neural Processes (FNP) を提案する。設計されたモジュールの効率とニューラルプロセスの柔軟な構造を活用し、FNPは様々な解像度の観測を同化することで最先端の結果を達成するとともに、解像度と観測量の増大に伴い、既存手法に対する優位性が増大する。さらに, 固定解像度で訓練したFNPは, 追加のファインチューニングなしに, 分布外解像度の観測の同化と観測情報再構成タスクを直接処理でき, データ解像度間およびタスク間での優れた一般化能力を示す。 Data assimilation is a vital component in modern global medium-range weather forecasting systems to obtain the best estimation of the atmospheric state by combining the short-term forecast and observations. Recently, AI-based data assimilation approaches have attracted increasing attention for their significant advantages over traditional techniques in terms of computational consumption. However, existing AI-based data assimilation methods can only handle observations with a specific resolution, lacking the compatibility and generalization ability to assimilate observations with other resolutions. Considering that complex real-world observations often have different resolutions, we propose the Fourier Neural Processes (FNP) for arbitrary-resolution data assimilation in this paper. Leveraging the efficiency of the designed modules and flexible structure of neural processes, FNP achieves state-of-the-art results in assimilating observations with varying resolutions, and also exhibits increasing advantages over its counterparts as the resolution and the amount of observations increase. Moreover, our FNP trained on a fixed resolution can directly handle the assimilation of observations with out-of-distribution resolutions and the observational information reconstruction task without additional fine-tuning, demonstrating its excellent generalization ability across data resolutions as well as across tasks.	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
# iKAN: KANを用いたグローバルインクリメンタルラーニングによる異種データセット間の人間活動認識 iKAN: Global Incremental Learning with KAN for Human Activity Recognition Across Heterogeneous Datasets ( http://arxiv.org/abs/2406.01646v1 ) ライセンス: Link先を確認	Mengxi Liu, Sizhen Bian, Bo Zhou, Paul Lukowicz,	(参考訳) 本研究では,破滅的忘却と不均一な入力という2つの課題に同時に取り組む,ウェアラブルセンサを用いたヒューマンアクティビティ認識(HAR)のためのインクリメンタルラーニング(IL)フレームワークを提案する。スケーラブルなフレームワークであるiKANは、スプラインの局所的な可塑性とグローバルな安定性を活用する分類器として、多層パーセプトロンの代わりにKAN(Kolmogorov-Arnold Networks)を用いるILを開拓する。 KANをHARに適応させるために、iKANはタスク固有の特徴ブランチと特徴再分配層を使用する。出力次元や分類器ノードの数を調整して新しいタスクに適応させる既存のILメソッドとは異なり、iKANは特徴抽出ブランチを拡張して、一貫性のある次元と分類器出力の数を維持しながら、異なるセンサモダリティからの新しい入力に対応することに重点を置いている。 6つの公開HARデータセットにわたる継続的な学習では、iKANフレームワークは最終性能84.9%(重み付きF1スコア)、平均インクリメンタル性能81.34%を示し、既存の2つのインクリメンタル学習手法であるEWC(51.42%)とエクスペリエンスリプレイ(59.92%)を大きく上回った。 This work proposes an incremental learning (IL) framework for wearable sensor human activity recognition (HAR) that tackles two challenges simultaneously: catastrophic forgetting and non-uniform inputs. The scalable framework, iKAN, pioneers IL with Kolmogorov-Arnold Networks (KAN) to replace multi-layer perceptrons as the classifier that leverages the local plasticity and global stability of splines. To adapt KAN for HAR, iKAN uses task-specific feature branches and a feature redistribution layer. Unlike existing IL methods that primarily adjust the output dimension or the number of classifier nodes to adapt to new tasks, iKAN focuses on expanding the feature extraction branches to accommodate new inputs from different sensor modalities while maintaining consistent dimensions and the number of classifier outputs. Continual learning across six public HAR datasets demonstrated the iKAN framework's incremental learning performance, with a last performance of 84.9% (weighted F1 score) and an average incremental performance of 81.34%, which significantly outperforms two existing incremental learning methods, EWC (51.42%) and experience replay (59.92%).	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
# 出力制約付き学習アルゴリズムの統一定式化による解析 An Analysis under a Unified Fomulation of Learning Algorithms with Output Constraints ( http://arxiv.org/abs/2406.01647v1 ) ライセンス: Link先を確認	Mooho Song, Jay-Yoon Lee,	(参考訳) ニューラルネットワーク(NN)は様々なタスクでよく機能するが、時には人間に非意味な結果をもたらす。ほとんどのNNモデルは(インプット、アウトプット)ペアから学び、時に人間の知識と矛盾する。多くの研究は、トレーニング中に出力制約を減らして人間の知識を注入することは、モデル性能を改善し、制約違反を減らすことを示唆している。同じプログラミングフレームワークの下で、異なる既存のアルゴリズムを比較する試みはいくつかあるが、しかしながら、学習アルゴリズムを統一的な方法で出力制約に分類する以前の研究は行われていない。筆者らの貢献は,(1) 使用する制約損失の種類(確率的ソフトロジック,REINFORCE), 制約違反事例の探索戦略, および主課題と制約からの学習信号の統合メカニズムの3つの軸に基づいて, これまでの研究を分類することである。 2) 連続学習アルゴリズムにインスパイアされた主課題情報と制約注入情報を統合する新しいアルゴリズムを提案する。さらに,本手法と制約違反を同時に考慮するための指標として,$H\beta$-scoreを提案する。自然言語推論(NLI)、合成翻訳例(STE)、意味的役割ラベリング(SRL)という3つのNLPタスクにおける全てのアルゴリズムを網羅的に分析する。我々は、高い$H\beta$-scoresを達成するための様々なアルゴリズムの鍵となる要素を探求し、明らかにする。 Neural networks (NN) perform well in diverse tasks, but sometimes produce nonsensical results to humans. Most NN models "solely" learn from (input, output) pairs, occasionally conflicting with human knowledge. Many studies indicate injecting human knowledge by reducing output constraints during training can improve model performance and reduce constraint violations. While there have been several attempts to compare different existing algorithms under the same programming framework, nonetheless, there has been no previous work that categorizes learning algorithms with output constraints in a unified manner. Our contributions are as follows: (1) We categorize the previous studies based on three axes: type of constraint loss used (e.g. probabilistic soft logic, REINFORCE), exploration strategy of constraint-violating examples, and integration mechanism of learning signals from main task and constraint. (2) We propose new algorithms to integrate the information of main task and constraint injection, inspired by continual-learning algorithms. (3) Furthermore, we propose the $H\beta$-score as a metric for considering the main task metric and constraint violation simultaneously. 
To provide a thorough analysis, we examine all the algorithms on three NLP tasks: natural language inference (NLI), synthetic transduction examples (STE), and semantic role labeling (SRL). We explore and reveal the key factors of various algorithms associated with achieving high $H\beta$-scores.	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
# 定義された意識:生物と人工の汎用知能の要件 Consciousness defined: requirements for biological and artificial general intelligence ( http://arxiv.org/abs/2406.01648v1 ) ライセンス: Link先を確認	Craig I. McKenzie,	(参考訳) 意識は客観的な言葉で定義するのが難しいことで知られている。意識の客観的定義は、生物学的システムや人工システムにおいて、意識と結果の選択行動がどのように発生するかを正確に理解するために、批判的に必要である。多くの理論は、意識がどのように生じるかを説明するために神経生物学と心理学の研究を統合しているが、意識を発生させるのに何が必要かを概説する理論はほとんどない。このような要件を特定するために、意識の現在の理論とそれに対応する科学的研究を調査し、第一原理から意識の定義を新たに生成する。批判的に言えば、意識は決定を行う能力を提供する装置であるが、決定そのものによって定義されていない。したがって、意識の定義には選択行動や時間的意識は必要ない。むしろ、意識の要求には以下のものが含まれる: 少なくともある種の知覚能力、そのような知覚情報の記憶のための記憶は、結果的に、自己の感覚が可能な未来と望まれる未来に基づいて決定を下すことができる想像の枠組みを提供する。思考実験と観察可能な神経学的現象は、これらの成分が基本的に意識に必要であることを示している。これらの要件の特定は、人間以外の動物や人工的な知能システムのような、知覚可能なエージェントの意識を客観的に決定できる新しい意識の定義を提供する。 Consciousness is notoriously hard to define with objective terms. An objective definition of consciousness is critically needed so that we might accurately understand how consciousness and resultant choice behaviour may arise in biological or artificial systems. Many theories have integrated neurobiological and psychological research to explain how consciousness might arise, but few, if any, outline what is fundamentally required to generate consciousness. To identify such requirements, I examine current theories of consciousness and corresponding scientific research to generate a new definition of consciousness from first principles. Critically, consciousness is the apparatus that provides the ability to make decisions, but it is not defined by the decision itself. As such, a definition of consciousness does not require choice behaviour or an explicit awareness of temporality despite both being well-characterised outcomes of conscious thought. Rather, requirements for consciousness include: at least some capability for perception, a memory for the storage of such perceptual information which in turn provides a framework for an imagination with which a sense of self can be capable of making decisions based on possible and desired futures. 
Thought experiments and observable neurological phenomena demonstrate that these components are fundamentally required of consciousness, whereby the loss of any one component removes the capability for conscious thought. Identifying these requirements provides a new definition for consciousness by which we can objectively determine consciousness in any conceivable agent, such as non-human animals and artificially intelligent systems.	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
# CoLa-DCE-概念誘導型遅延拡散対実説明 CoLa-DCE -- Concept-guided Latent Diffusion Counterfactual Explanations ( http://arxiv.org/abs/2406.01649v1 ) ライセンス: Link先を確認	Franz Motzkus, Christian Hellert, Ute Schmid,	(参考訳) 生成AIの最近の進歩は、新しい展望と実践的実装をもたらした。特に拡散モデルは、多様かつ同時に現実的な特徴を生み出す上での強みを示し、コンピュータビジョンモデルに対する反実的説明を生成するのに適している。イメージ分類器をその予測を変えるために何を変える必要があるかという「もし」質問に答えると、反現実的な説明は人間の理解とよく一致し、結果としてモデルの振る舞いをより理解しやすいものにするのに役立つ。現在の手法は真正な偽物を生成するのに成功しているが、機能変更が直接認識できないため透明性が欠如している。この制限に対処するため,概念誘導型遅延拡散対実法(CoLa-DCE)を提案する。 CoLa-DCEは、概念選択と空間条件に関する高度な制御を持つ任意の分類器に対して、概念誘導対物を生成する。カウンターファクトは、最小限の特徴変化によって粒度が増大する。参照機能の可視化によって理解性が向上し、機能ローカライゼーションによって"どこ"が"何"を変えたかの透明性が向上する。我々は、複数の画像分類モデルとデータセットにまたがる最小化と理解性のアプローチの利点を実証し、私たちのCoLa-DCE説明が、誤分類ケースのようなモデルエラーを理解するのにどのように役立つかを洞察する。 Recent advancements in generative AI have introduced novel prospects and practical implementations. Especially diffusion models show their strength in generating diverse and, at the same time, realistic features, positioning them well for generating counterfactual explanations for computer vision models. Answering "what if" questions of what needs to change to make an image classifier change its prediction, counterfactual explanations align well with human understanding and consequently help in making model behavior more comprehensible. Current methods succeed in generating authentic counterfactuals, but lack transparency as feature changes are not directly perceivable. To address this limitation, we introduce Concept-guided Latent Diffusion Counterfactual Explanations (CoLa-DCE). CoLa-DCE generates concept-guided counterfactuals for any classifier with a high degree of control regarding concept selection and spatial conditioning. The counterfactuals comprise an increased granularity through minimal feature changes. The reference feature visualization ensures better comprehensibility, while the feature localization provides increased transparency of "where" changed "what". 
We demonstrate the advantages of our approach in minimality and comprehensibility across multiple image classification models and datasets and provide insights into how our CoLa-DCE explanations help comprehend model errors like misclassification cases.	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
# TAGMol: ターゲット対応のグラディエント誘導分子生成 TAGMol: Target-Aware Gradient-guided Molecule Generation ( http://arxiv.org/abs/2406.01650v1 ) ライセンス: Link先を確認	Vineeth Dorna, D. Subhalingam, Keshav Kolluru, Shreshth Tuli, Mrityunjay Singh, Saurabh Singal, N. M. Anoop Krishnan, Sayan Ranu,	(参考訳) 3次元生成モデルは、構造に基づく薬物設計(SBDD)において、特に特定の標的結合部位に適合したリガンドの発見において大きな可能性を示している。既存のアルゴリズムは、主にリガンド-ターゲット結合に焦点を当て、結合親和性によって特徴づけられる。さらに、標的リガンド分布のみに訓練されたモデルは、薬物設計プロセスの多面的性質を裏付ける、薬物類似性や合成性といった望ましい性質を持つ新規リガンドの開発など、薬物発見の幅広い目的に対処する上で不足する可能性がある。これらの課題を克服するために、我々は問題を分子生成と特性予測に分離する。後者は相乗的に拡散サンプリング過程を導出し、誘導拡散を促進し、所望の性質を持つ有意義な分子を創出する。この誘導分子生成過程をTAGMolと呼ぶ。ベンチマークデータセットの実験を通じて、TAGMolは最先端のベースラインよりも優れたパフォーマンスを示し、平均的なVina Scoreの22%の改善を実現し、必須の補助特性において良好な結果をもたらす。これにより、TAGMolは薬物生成の包括的枠組みとして確立される。 3D generative models have shown significant promise in structure-based drug design (SBDD), particularly in discovering ligands tailored to specific target binding sites. Existing algorithms often focus primarily on ligand-target binding, characterized by binding affinity. Moreover, models trained solely on target-ligand distribution may fall short in addressing the broader objectives of drug discovery, such as the development of novel ligands with desired properties like drug-likeness, and synthesizability, underscoring the multifaceted nature of the drug design process. To overcome these challenges, we decouple the problem into molecular generation and property prediction. The latter synergistically guides the diffusion sampling process, facilitating guided diffusion and resulting in the creation of meaningful molecules with the desired properties. We call this guided molecular generation process as TAGMol. Through experiments on benchmark datasets, TAGMol demonstrates superior performance compared to state-of-the-art baselines, achieving a 22% improvement in average Vina Score and yielding favorable outcomes in essential auxiliary properties. This establishes TAGMol as a comprehensive framework for drug generation.	
翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
# FusionDTI: 薬物-標的相互作用のためのトークンレベルの融合によるきめ細かい結合発見 FusionDTI: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction ( http://arxiv.org/abs/2406.01651v1 ) ライセンス: Link先を確認	Zhaohan Meng, Zaiqiao Meng, Iadh Ounis,	(参考訳) 薬物-標的相互作用(DTI)の予測は、薬物発見プロセスにおいて重要である。近年のDTIモデルにおいて、様々な薬物と標的エンコーダの表現の統合による顕著な進歩にもかかわらず、そのようなモデルはしばしば、薬物とタンパク質のきめ細かい相互作用、すなわち特定の薬物原子(またはサブ構造)とタンパク質のキーアミノ酸の結合を捉えるのに苦労している。本稿では、トークンレベルのFusionモジュールを用いて、ドラッグ・ターゲットインタラクションの詳細な情報を効果的に学習する、FusionDTIと呼ばれる新しいモデルを提案する。特に、FusionDTIモデルは、医薬品のSELFIES表現を用いて、配列の断片化を軽減し、標的タンパク質の構造認識(SA)語彙を組み込んで、構造情報のアミノ酸配列の制限に対処し、またエンコーダとして大規模バイオメディカルデータセットで広く訓練された訓練済み言語モデルを利用して、医薬品や標的の複雑な情報をキャプチャする。 3つのよく知られたベンチマークデータセットの実験により、提案したFusionDTIモデルは、既存の7つの最先端ベースラインと比較して、DTI予測において最高のパフォーマンスを達成することが示された。さらに本症例では,FusionDTIが潜在的な結合部位を強調し,DTI予測の説明可能性を高めることが示唆された。 Predicting drug-target interaction (DTI) is critical in the drug discovery process. Despite remarkable advances in recent DTI models through the integration of representations from diverse drug and target encoders, such models often struggle to capture the fine-grained interactions between drugs and protein, i.e. the binding of specific drug atoms (or substructures) and key amino acids of proteins, which is crucial for understanding the binding mechanisms and optimising drug design. To address this issue, this paper introduces a novel model, called FusionDTI, which uses a token-level Fusion module to effectively learn fine-grained information for Drug-Target Interaction. In particular, our FusionDTI model uses the SELFIES representation of drugs to mitigate sequence fragment invalidation and incorporates the structure-aware (SA) vocabulary of target proteins to address the limitation of amino acid sequences in structural information, additionally leveraging pre-trained language models extensively trained on large-scale biomedical datasets as encoders to capture the complex information of drugs and targets. 
Experiments on three well-known benchmark datasets show that our proposed FusionDTI model achieves the best performance in DTI prediction compared with seven existing state-of-the-art baselines. Furthermore, our case study indicates that FusionDTI could highlight the potential binding sites, enhancing the explainability of the DTI prediction.	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
# 分散バイアスが一対一のクロスバリデーションを妥協 Distributional bias compromises leave-one-out cross-validation ( http://arxiv.org/abs/2406.01652v1 ) ライセンス: Link先を確認	George I. Austin, Itsik Pe'er, Tal Korem,	(参考訳) クロスバリデーションは機械学習モデルの予測性能を推定する一般的な手法である。モデルトレーニングに使用されるインスタンス数を最大化したいというデータスカース方式では、"leave-one-out cross-validation"と呼ばれるアプローチがよく使われる。この設計では、他のすべてのインスタンスでトレーニングした後、各データインスタンスを予測するために、別のモデルを構築します。これにより、トレーニングされたモデル毎に単一のテストデータポイントが利用可能になるため、予測はデータセット全体にわたって集約され、レシーバの操作特性や精度のリコール曲線の下の領域のような一般的なランクベースのパフォーマンスメトリクスが計算される。本研究では,本手法が,各トレーニングフォールドの平均ラベルと対応するテストインスタンスのラベルとの間に負の相関関係を生じさせることを示す。機械学習モデルがトレーニングデータの平均に回帰する傾向にあるため、この分布バイアスは性能評価やハイパーパラメータ最適化に悪影響を及ぼす傾向にある。この効果は, モデルおよび評価アプローチの幅広い範囲にわたって継続し, より強い正則化に対するバイアスをもたらす可能性があることを示す。これを解決するために、分布バイアスを補正する一般化可能な再均衡型クロスバリデーション手法を提案する。提案手法は, 合成シミュレーションにおけるクロスバリデーション性能の評価を改良し, 複数論文の残響解析において改善することを示した。 Cross-validation is a common method for estimating the predictive performance of machine learning models. In a data-scarce regime, where one typically wishes to maximize the number of instances used for training the model, an approach called "leave-one-out cross-validation" is often used. In this design, a separate model is built for predicting each data instance after training on all other instances. Since this results in a single test data point available per model trained, predictions are aggregated across the entire dataset to calculate common rank-based performance metrics such as the area under the receiver operating characteristic or precision-recall curves. In this work, we demonstrate that this approach creates a negative correlation between the average label of each training fold and the label of its corresponding test instance, a phenomenon that we term distributional bias. As machine learning models tend to regress to the mean of their training data, this distributional bias tends to negatively impact performance evaluation and hyperparameter optimization. 
We show that this effect generalizes to leave-P-out cross-validation and persists across a wide range of modeling and evaluation approaches, and that it can lead to a bias against stronger regularization. To address this, we propose a generalizable rebalanced cross-validation approach that corrects for distributional bias. We demonstrate that our approach improves cross-validation performance evaluation in synthetic simulations and in several published leave-one-out analyses.	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
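The negative correlation described in the abstract above can be reproduced in a few lines. The following is a minimal sketch, not the authors' code: the label vector `y` and the helper name `loo_training_means` are illustrative assumptions. It shows that in leave-one-out cross-validation the mean label of each training fold is an exact decreasing function of the held-out label, so a "model" that only predicts its training mean ranks every positive below every negative.

```python
import numpy as np


def loo_training_means(labels):
    """Mean label of each leave-one-out training fold (hypothetical helper)."""
    labels = np.asarray(labels, dtype=float)
    n = labels.size
    # Holding out y_i leaves a training mean of (sum - y_i) / (n - 1).
    return (labels.sum() - labels) / (n - 1)


# Illustrative binary labels (an assumption, not data from the paper).
y = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0], dtype=float)
train_means = loo_training_means(y)

# Pearson correlation between training-fold mean and held-out label.
corr = np.corrcoef(train_means, y)[0, 1]

# A dummy predictor that outputs its training mean as the score.
pos_scores = train_means[y == 1]
neg_scores = train_means[y == 0]
# AUC = fraction of (positive, negative) pairs ranked correctly.
auc = float(np.mean(pos_scores[:, None] > neg_scores[None, :]))

print(f"correlation = {corr:.3f}, AUC of mean predictor = {auc:.1f}")
```

Because the aggregated AUC is computed across folds, any model that regresses towards its training mean inherits part of this effect, which is the bias the abstract describes.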
# パラメタライズドニューラルネットワークを用いたジャンプ拡散過程の再構築のための効率的なワッサースタイン距離法 An efficient Wasserstein-distance approach for reconstructing jump-diffusion processes using parameterized neural networks ( http://arxiv.org/abs/2406.01653v1 ) ライセンス: Link先を確認	Mingtao Xia, Xiangting Li, Qijing Shen, Tom Chou,	(参考訳) 2つの多次元ジャンプ拡散過程に関連する2つの確率分布間のワッサーシュタイン距離(W$-distance)を解析する。具体的には, ドリフト, 拡散, 跳躍振幅関数に付随する上下境界を, 2つの跳躍拡散過程の間に有する時間的に分離した正方形W_2$-distanceを解析する。次に,パラメータ化ニューラルネットワークを用いたデータから未知のジャンプ拡散過程を効率的に再構築する,時間的に分離された2乗法W_2$-distance法を提案する。さらに,ジャンプ拡散過程のドリフト関数に関する事前情報を利用して,その性能を向上できることを示す。提案手法の有効性をいくつかの例と応用例で示す。 We analyze the Wasserstein distance ($W$-distance) between two probability distributions associated with two multidimensional jump-diffusion processes. Specifically, we analyze a temporally decoupled squared $W_2$-distance, which provides both upper and lower bounds associated with the discrepancies in the drift, diffusion, and jump amplitude functions between the two jump-diffusion processes. Then, we propose a temporally decoupled squared $W_2$-distance method for efficiently reconstructing unknown jump-diffusion processes from data using parameterized neural networks. We further show its performance can be enhanced by utilizing prior information on the drift function of the jump-diffusion process. The effectiveness of our proposed reconstruction method is demonstrated across several examples and applications.	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
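As a rough illustration of the idea in the abstract above, the sketch below computes a time-decoupled squared $W_2$ quantity for one-dimensional trajectories. It assumes that "temporally decoupled" means comparing the marginal distributions at each time point separately and summing over time; the abstract does not spell out the exact definition, and the function names are hypothetical. It relies only on the standard fact that, in one dimension, the squared $W_2$ distance between two equal-size empirical samples is the mean squared difference of the sorted samples.

```python
import numpy as np


def w2_squared_1d(x, y):
    """Squared W2 distance between two equal-size 1D empirical samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("samples must have the same size")
    # In 1D the optimal coupling matches sorted samples to each other.
    return float(np.mean((x - y) ** 2))


def decoupled_w2_squared(traj_a, traj_b, dt):
    """Sum over time of per-time-point squared W2, weighted by dt.

    traj_a, traj_b: arrays of shape (n_samples, n_times).
    This is an assumed reading of the loss, limited to 1D states.
    """
    traj_a = np.asarray(traj_a, dtype=float)
    traj_b = np.asarray(traj_b, dtype=float)
    per_time = [
        w2_squared_1d(traj_a[:, k], traj_b[:, k])
        for k in range(traj_a.shape[1])
    ]
    return dt * float(np.sum(per_time))


# Two samples observed at three time points; the second set is shifted by 2.
traj_a = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
traj_b = traj_a + 2.0
loss = decoupled_w2_squared(traj_a, traj_b, dt=0.5)
print(loss)
```

In a reconstruction setting, `traj_b` would come from simulating the neural-network-parameterized process, and a differentiable version of this quantity would serve as the training loss.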
# TinySV: デバイス上での学習によるTinyMLの話者検証 TinySV: Speaker Verification in TinyML with On-device Learning ( http://arxiv.org/abs/2406.01655v1 ) ライセンス: Link先を確認	Massimo Pavan, Gioele Mombelli, Francesco Sinacori, Manuel Roveri,	(参考訳) TinyMLは、小さなデバイス(Internet-of-Thingsや組み込みシステムなど)で機械学習アルゴリズムを実行する能力のおかげで、ここ数年で大きな勢いを増した、機械学習の新たな領域である。興味深いことに、この分野の研究はTinyMLモデルの推論フェーズを小さなデバイスで効率的に実行することに焦点を当ててきた一方、学習アルゴリズムがもたらすオーバーヘッドのため、TinyMLモデルのオンデバイス学習のための解法は文献上ほとんど存在しない。本研究の目的は、オンデバイス学習アルゴリズムで取り組む必要のある、本稿で提示するTiny Speaker Verification (TinySV) のようなタスクで使用できる新しいタイプの適応型TinyMLソリューションを導入することである。この目標を達成するには、(i)TinyML学習アルゴリズムのメモリと計算要求の低減、及び(ii)少数の、場合によってはラベルのない訓練データで動作するTinyML学習アルゴリズムの設計が必要であった。提案したTinySVソリューションは、キーワードスポッティングと適応話者検証モジュールで構成される2層階層のTinyMLソリューションに依存している。このタスクのために収集したデータセットで提案するTinySVソリューションの有効性と効率を評価し、実際のIoTデバイス(Infineon PSoC 62S2 Wi-Fi BT Pioneer Kit)上でテストした。 TinyML is a novel area of machine learning that has gained huge momentum in the last few years thanks to the ability to execute machine learning algorithms on tiny devices (such as Internet-of-Things or embedded systems). Interestingly, research in this area has focused on the efficient execution of the inference phase of TinyML models on tiny devices, while very few solutions for on-device learning of TinyML models are available in the literature due to the relevant overhead introduced by the learning algorithms. The aim of this paper is to introduce a new type of adaptive TinyML solution that can be used in tasks, such as the presented Tiny Speaker Verification (TinySV), that must be tackled with an on-device learning algorithm. Achieving this goal required (i) reducing the memory and computational demand of TinyML learning algorithms, and (ii) designing a TinyML learning algorithm operating with few and possibly unlabelled training data. The proposed TinySV solution relies on a two-layer hierarchical TinyML solution comprising Keyword Spotting and Adaptive Speaker Verification modules. 
We evaluated the effectiveness and efficiency of the proposed TinySV solution on a dataset collected expressly for the task and tested the proposed solution on a real-world IoT device (Infineon PSoC 62S2 Wi-Fi BT Pioneer Kit).	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
# ソースフリードメイン適応のためのプロキシDenoising Proxy Denoising for Source-Free Domain Adaptation ( http://arxiv.org/abs/2406.01658v1 ) ライセンス: Link先を確認	Song Tang, Wenxin Su, Mao Ye, Jianwei Zhang, Xiatian Zhu,	(参考訳) Source-free Domain Adaptation (SFDA)は、トレーニング済みのソースモデルを、ソースデータにアクセスせずにラベルなしのターゲットドメインに適応することを目的としている。他の多くの応用において、事前訓練された大型視覚言語(ViL)モデルの成功に触発されて、最新のFDA法は、それらの予測を疑似監視として活用することで、ViLモデルの利点を検証した。しかし、ViLの予測はノイズが多く、未知の速度で不正確な場合があり、適応中に付加的な負の効果が生じる可能性がある。このような無視された課題に対処するために,本稿ではProxy Denoising(ProDe)アプローチを紹介する。具体的には、ViLモデルをプロキシとして利用し、潜在ドメイン不変空間への適応プロセスを容易にする。重要な点として、ViLの予測を修正するためのプロキシ記述機構を設計する。これは、領域不変空間に対するプロキシの発散によるドメイン適応効果をエレガントにモデル化することで、新しいプロキシ信頼理論に基づいている。補正されたプロキシを大まかに活用するために、我々はまた、正規化を蒸留する相互知識を導出する。我々のProDeは、従来のクローズドセット設定と、より挑戦的なオープンセット、部分セット、一般化されたSFDA設定の両方の下で、最先端の代替品よりも大幅に優れています。コードはまもなくリリースされる。 Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to an unlabeled target domain with no access to the source data. Inspired by the success of pre-trained large vision-language (ViL) models in many other applications, the latest SFDA methods have also validated the benefit of ViL models by leveraging their predictions as pseudo supervision. However, we observe that ViL's predictions could be noisy and inaccurate at an unknown rate, potentially introducing additional negative effects during adaption. To address this thus-far ignored challenge, in this paper, we introduce a novel Proxy Denoising (ProDe) approach. Specifically, we leverage the ViL model as a proxy to facilitate the adaptation process towards the latent domain-invariant space. Critically, we design a proxy denoising mechanism for correcting ViL's predictions. This is grounded on a novel proxy confidence theory by modeling elegantly the domain adaption effect of the proxy's divergence against the domain-invariant space. To capitalize the corrected proxy, we further derive a mutual knowledge distilling regularization. 
Extensive experiments show that our ProDe significantly outperforms the current state-of-the-art alternatives under both the conventional closed-set setting and the more challenging open-set, partial-set and generalized SFDA settings. The code will be released soon.	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
# 自己改善ロバスト選好最適化 Self-Improving Robust Preference Optimization ( http://arxiv.org/abs/2406.01660v1 ) ライセンス: Link先を確認	Eugene Choi, Arash Ahmadian, Matthieu Geist, Olivier Pietquin, Mohammad Gheshlaghi Azar,	(参考訳) PPOやDPOのようなオンラインおよびオフラインのRLHFメソッドは、AIと人間の好みを合わせることに成功している。その成功にもかかわらず、既存の手法は、その最適解がタスク依存性が高いという根本的な問題に悩まされている(すなわち、アウト・オブ・ディストリビューション(OOD)タスクに対して堅牢ではない)。本稿では、タスクの変更に対して完全に堅牢な、実用的で数学的に原則化されたオフラインRLHFフレームワークである、自己改善ロバスト選好最適化(SRPO)を提案することで、この問題に対処する。 SRPOの鍵となる考え方は、人間の嗜好から学ぶことの問題を自己改善のプロセスとして定式化することであり、これは、自己改善方策と生成方策を敵対的に同時最適化することを目的としたmin-max目的関数として数学的に表現できる。この最適化問題の解は、トレーニングタスクとは独立しているため、その変更に対して堅牢である。次に我々は,この目的関数を,報酬モデルやオンライン推論を必要とせずに,標準的な教師あり最適化手法を用いて大規模に最適化できる非敵対的なオフライン損失の形で再表現できることを示す。本稿では,人間(GOLD)による完成文に対するAI Win-Rate (WR) の観点からSRPOの有効性を示す。特に、SRPOをOOD XSUMデータセットで評価すると、5回の自己修正後に著名なDPOを15%の明確な差で上回り、90%のWRを達成する。 Both online and offline RLHF methods such as PPO and DPO have been extremely successful in aligning AI with human preferences. Despite their success, the existing methods suffer from a fundamental problem: their optimal solution is highly task-dependent (i.e., not robust to out-of-distribution (OOD) tasks). Here we address this challenge by proposing Self-Improving Robust Preference Optimization (SRPO), a practical and mathematically principled offline RLHF framework that is completely robust to the changes in the task. The key idea of SRPO is to cast the problem of learning from human preferences as a self-improvement process, which can be mathematically expressed in terms of a min-max objective that aims at joint optimization of the self-improvement policy and the generative policy in an adversarial fashion. The solution for this optimization problem is independent of the training task and thus it is robust to its changes. 
We then show that this objective can be re-expressed in the form of a non-adversarial offline loss which can be optimized using standard supervised optimization techniques at scale without any need for reward model and online inference. We show the effectiveness of SRPO in terms of AI Win-Rate (WR) against human (GOLD) completions. In particular, when SRPO is evaluated on the OOD XSUM dataset, it outperforms the celebrated DPO by a clear margin of 15% after 5 self-revisions, achieving WR of 90%.	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
# 教師なしニューラルネットワーク最適化のための拡散モデルフレームワーク A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization ( http://arxiv.org/abs/2406.01661v1 ) ライセンス: Link先を確認	Sebastian Sanokowski, Sepp Hochreiter, Sebastian Lehner,	(参考訳) 個別のデータセット上の難解な分布から、対応するトレーニングデータに頼ることなくサンプルを学習することは、 Combinatorial Optimizationを含む幅広い分野において中心的な問題である。現在、人気のあるディープラーニングベースのアプローチは、主に正確なサンプル確率を生み出す生成モデルに依存している。この研究は、この制限を解除する手法を導入し、拡散モデルのような高度に表現力のある潜在変数モデルを採用する可能性を開く。提案手法は, 逆カルバック・リーブラー分岐を上界とする損失に基づいて, 正確なサンプル確率の要求を回避している。我々は,データフリーなコンビネーション最適化におけるアプローチを実験的に検証し,幅広いベンチマーク問題に対して新しい最先端の手法を実現することを実証した。 Learning to sample from intractable distributions over discrete sets without relying on corresponding training data is a central problem in a wide range of fields, including Combinatorial Optimization. Currently, popular deep learning-based approaches rely primarily on generative models that yield exact sample likelihoods. This work introduces a method that lifts this restriction and opens the possibility to employ highly expressive latent variable models like diffusion models. Our approach is conceptually based on a loss that upper bounds the reverse Kullback-Leibler divergence and evades the requirement of exact sample likelihoods. We experimentally validate our approach in data-free Combinatorial Optimization and demonstrate that our method achieves a new state-of-the-art on a wide range of benchmark problems.	翻訳日:2024-06-05 21:21:41 公開日:2024-06-03
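The abstract above refers to a loss that upper-bounds the reverse Kullback-Leibler divergence without needing exact sample likelihoods. One standard bound of this kind is sketched below; the notation ($q_\theta$ for the sampler, $p$ for the target, $x_{1:T}$ for the latent diffusion trajectory) is an assumption for illustration, and the abstract does not confirm that this is the exact loss used in the paper. The bound follows from the chain rule for the KL divergence, since the dropped term is a non-negative conditional KL.

```latex
% Intractable marginal of the latent-variable sampler (assumed notation):
%   q_\theta(x_0) = \sum_{x_{1:T}} q_\theta(x_0, x_{1:T})
\begin{align}
D_{\mathrm{KL}}\big(q_\theta(x_{0:T}) \,\|\, p(x_{0:T})\big)
  &= D_{\mathrm{KL}}\big(q_\theta(x_0) \,\|\, p(x_0)\big)
   + \mathbb{E}_{q_\theta(x_0)}\Big[
       D_{\mathrm{KL}}\big(q_\theta(x_{1:T} \mid x_0) \,\|\, p(x_{1:T} \mid x_0)\big)
     \Big] \\
  &\ge D_{\mathrm{KL}}\big(q_\theta(x_0) \,\|\, p(x_0)\big).
\end{align}
```

The left-hand side only involves the joint probabilities of whole trajectories, which a diffusion-style sampler can evaluate, whereas the right-hand side would require the marginal $q_\theta(x_0)$.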
# 対話的な日常生活動作のFew-Shot分類 (InteractADL) Few-Shot Classification of Interactive Activities of Daily Living (InteractADL) ( http://arxiv.org/abs/2406.01662v1 ) ライセンス: Link先を確認	Zane Durante, Robathan Harries, Edward Vendrow, Zelun Luo, Yuta Kyuragi, Kazuki Kozuka, Li Fei-Fei, Ehsan Adeli,	(参考訳) 日常生活のアクティビティ(ADL)を理解することは、補助ロボット、スマートホーム、ヘルスケアなど、さまざまなアプリケーションにとって重要なステップである。しかし、これまでに複雑なADL、特に家庭環境における多人数インタラクションに焦点を絞ったベンチマークや手法はほとんどない。本稿では,人間(と物体)間の相互作用を含む複雑なADLを理解するために,新しいデータセットとベンチマークであるInteractADLを提案する。さらに、家庭環境において発生する複雑なADLは、多人数インタラクションの希少性により困難なロングテール分布を構成し、意味的および視覚的に類似したクラスが存在するため、きめ細かな視覚認識タスクとなる。これらの問題に対処するために、最適なクラス名ベクトルを学習することで、より高い意味的分離性を可能にする、Name Tuningと呼ばれるきめ細かいfew-shotビデオ分類法を提案する。Name Tuningを既存のプロンプトチューニング戦略と組み合わせることで、入力テキスト全体を学習できること(プロンプトやクラス名のみを学習するのではなく)を示し、InteractADLおよび他の4つのきめ細かい視覚的分類ベンチマークにおけるfew-shot分類の性能向上を実証する。透明性と再現性のために、私たちはhttps://github.com/zanedurante/vlm_benchmarkでコードを公開しています。 Understanding Activities of Daily Living (ADLs) is a crucial step for different applications including assistive robots, smart homes, and healthcare. However, to date, few benchmarks and methods have focused on complex ADLs, especially those involving multi-person interactions in home environments. In this paper, we propose a new dataset and benchmark, InteractADL, for understanding complex ADLs that involve interaction between humans (and objects). Furthermore, complex ADLs occurring in home environments comprise a challenging long-tailed distribution due to the rarity of multi-person interactions, and pose fine-grained visual recognition tasks due to the presence of semantically and visually similar classes. To address these issues, we propose a novel method for fine-grained few-shot video classification called Name Tuning that enables greater semantic separability by learning optimal class name vectors. 
We show that Name Tuning can be combined with existing prompt tuning strategies to learn the entire input text (rather than only learning the prompt or class names) and demonstrate improved performance for few-shot classification on InteractADL and 4 other fine-grained visual classification benchmarks. For transparency and reproducibility, we release our code at https://github.com/zanedurante/vlm_benchmark.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
# 枝が結合した木上の隠れマルコフモデルに対する効率的な解法 An efficient solution to Hidden Markov Models on trees with coupled branches ( http://arxiv.org/abs/2406.01663v1 ) ライセンス: Link先を確認	Farzan Vafa, Sahand Hormoz,	(参考訳) 隠れマルコフモデル(HMM)はシーケンシャルデータをモデリングするための強力なツールであり、基礎となる状態は確率的に進化し、間接的にしか観測できない。従来のHMMアプローチは線形列に対して十分に確立されており、木などの他の構造にも拡張されている。本稿では、木上のHMMの枠組みを拡張し、データのツリーのような構造が結合枝を含むシナリオに対処する。本研究では,木系HMMと分岐した分岐木に対する確率,復号化,パラメータ学習問題を効率的に解く動的プログラミングアルゴリズムを開発した。提案手法は状態数やノード数と多項式的にスケールし,幅広いアプリケーションで計算可能であり,下フロー問題に悩まされない。シミュレーションデータに適用してアルゴリズムを実証し,推論に使用するモデルの仮定を検証するための自己整合性チェックを提案する。この研究は、木上のHMMの理論的理解を前進させるだけでなく、枝間の依存関係を無視できない複雑な生物学的データを解析するための実用的なツールも提供する。 Hidden Markov Models (HMMs) are powerful tools for modeling sequential data, where the underlying states evolve in a stochastic manner and are only indirectly observable. Traditional HMM approaches are well-established for linear sequences, and have been extended to other structures such as trees. In this paper, we extend the framework of HMMs on trees to address scenarios where the tree-like structure of the data includes coupled branches -- a common feature in biological systems where entities within the same lineage exhibit dependent characteristics. We develop a dynamic programming algorithm that efficiently solves the likelihood, decoding, and parameter learning problems for tree-based HMMs with coupled branches. Our approach scales polynomially with the number of states and nodes, making it computationally feasible for a wide range of applications and does not suffer from the underflow problem. We demonstrate our algorithm by applying it to simulated data and propose self-consistency checks for validating the assumptions of the model used for inference. This work not only advances the theoretical understanding of HMMs on trees but also provides a practical tool for analyzing complex biological data where dependencies between branches cannot be ignored.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
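The abstract above mentions a dynamic-programming likelihood computation that avoids numerical underflow. The sketch below illustrates that general idea on an ordinary linear-chain HMM using the standard scaled forward algorithm; it is not the paper's tree algorithm with coupled branches, and the function name and the example parameters are illustrative assumptions.

```python
import numpy as np


def hmm_log_likelihood(pi, A, B, obs):
    """Log-likelihood of a linear-chain HMM via the scaled forward pass.

    pi:  initial state distribution, shape (S,)
    A:   transition matrix, A[i, j] = P(next = j | current = i), shape (S, S)
    B:   emission matrix, B[i, k] = P(obs = k | state = i), shape (S, K)
    obs: sequence of integer observation symbols
    """
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step and weight by the emission probability.
            alpha = (alpha @ A) * B[:, obs[t]]
        # Renormalise at every step and accumulate the log of the scale,
        # so alpha never shrinks towards zero on long sequences.
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale
    return float(log_lik)


# Hypothetical two-state model and a long observation sequence.
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.3], [0.1, 0.9]])
obs = [0, 1] * 2500
ll = hmm_log_likelihood(pi, A, B, obs)
print(ll)
```

Without the per-step rescaling, the unnormalised forward variables for a sequence of this length would underflow to zero in double precision, and the likelihood would be reported as zero.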
# 代数的観察宇宙論 Algebraic Observational Cosmology ( http://arxiv.org/abs/2406.01669v1 ) ライセンス: Link先を確認	Jonah Kudler-Flam, Samuel Leutheusser, Gautam Satishchandran,	(参考訳) 宇宙の観測者が測定できるものは何か。この問題に対処するために、FLRW時空において、過去に漸近的にデ・シッター(英語版)の漸近的な観測者に対して、重力的に修飾された可観測物の代数を構築し、インフレのエポックを記述した。本質的な量子化された自由度は、インフラトンのゼロモードであり、インフレーション中に有効宇宙定数の変動を引き起こし、半古典的極限における最大エントロピー状態の存在を防ぐ。宇宙論的な地平線を超えて測定が到達できないため、すべての状態がよく定義されたフォン・ノイマンエントロピーと混合されることが示される。半古典状態の場合、フォン・ノイマンのエントロピーは観測者の因果ダイヤモンドの一般化エントロピー(状態に依存しない定数まで)に対応する。 What can be measured by an observer in our universe? We address this question by constructing an algebra of gravitationally-dressed observables accessible to a comoving observer in FLRW spacetimes that are asymptotically de Sitter in the past, describing an inflationary epoch. An essential quantized degree of freedom is the zero-mode of the inflaton, which leads to fluctuations in the effective cosmological constant during inflation and prevents the existence of a maximum entropy state in the semiclassical limit. Due to the inaccessibility of measurements beyond our cosmological horizon, we demonstrate that all states are mixed with well-defined von Neumann entropy. For semiclassical states, the von Neumann entropy corresponds to the generalized entropy (up to a state-independent constant) of the observer's causal diamond, a fine-grained quantity that is sensitive to the initial conditions of the universe.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
# 孤立量子系における熱力学第二法則の創発 Emergence of a second law of thermodynamics in isolated quantum systems ( http://arxiv.org/abs/2406.01677v1 ) ライセンス: Link先を確認	Florian Meier, Tom Rivlin, Tiago Debarba, Jake Xuereb, Marcus Huber, Maximilian P. E. Lock,	(参考訳) 熱力学の第2法則は、孤立系のエントロピーは時間とともにしか増加しないと述べている。これは、フォン・ノイマンのエントロピーを保存するシュレーディンガー方程式の下での孤立量子系の可逆的進化と矛盾しているように見える。それでも、多くの可観測量に関して、期待値はその平衡値である固定値に近づくことが分かる。では、どのような意味で、孤立量子系のエントロピーは時間とともに増加するのだろうか? 古典系では、物理系の微視的な詳細についての無知の概念とともに、低エントロピー初期状態の仮定を導入し、第二法則の統計的解釈をもたらす。量子系を探索する可観測量を考えることで、可観測量の平均的な平衡化に関する最近の研究に基づき、これらの仮定の両方を組み込むことができる。可観測量の期待値の統計的挙動は十分に確立されているが、エントロピー増加との定量的な関係は今のところ欠落している。可観測量の平衡化に関する新しい境界を導出し、可観測量に対する系のエントロピーを考えることで、与えられた可観測量に対するエントロピーは系のユニタリ進化の過程でその平衡値に向かうという、第二法則の一変種を回復する。これらの結果は、量子系の平衡化に非可積分性が必要かどうかを疑問視する最近の知見も支持している。さらに、スピンの連鎖上の量子イジングモデルという典型的な例から得られる数値結果を用いて、我々の境界を説明する。そこでは、平衡値まで増加するエントロピーと、導出された境界に従って、基礎となる可逆的進化を明らかにする揺らぎを観察する。 The second law of thermodynamics states that the entropy of an isolated system can only increase over time. This appears to conflict with the reversible evolution of isolated quantum systems under the Schrödinger equation, which preserves the von Neumann entropy. Nonetheless, one finds that with respect to many observables, expectation values approach a fixed value -- their equilibrium value. This ultimately raises the question: in what sense does the entropy of an isolated quantum system increase over time? For classical systems, one introduces the assumption of a low entropy initial state along with the concept of ignorance about the microscopic details of the physical system, leading to a statistical interpretation of the second law. By considering the observables through which we examine quantum systems, both these assumptions can be incorporated, building upon recent studies of the equilibration on average of observables. While the statistical behavior of observable expectation values is well-established, a quantitative connection to entropy increase has been lacking so far. 
In deriving novel bounds for the equilibration of observables, and considering the entropy of the system relative to observables, we recover a variant of the second law: the entropy with respect to a given observable tends towards its equilibrium value in the course of the system's unitary evolution. These results also support recent findings which question the necessity of non-integrability for equilibration in quantum systems. We further illustrate our bounds using numerical results from the paradigmatic example of a quantum Ising model on a chain of spins. There, we observe entropy increasing up to equilibrium values, as well as fluctuations which expose the underlying reversible evolution in accordance with the derived bounds.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
# 予熱時結晶コーナモード Prethermal Time-Crystalline Corner Modes ( http://arxiv.org/abs/2406.01686v1 ) ライセンス: Link先を確認	Si Jiang, Dong Yuan, Wenjie Jiang, Dong-Ling Deng, Francisco Machado,	(参考訳) 本研究では, 非調和応答が0次元角モードに完全に局在する予熱離散時間結晶の存在を実証する。指数関数的に長い前熱状態の中で、これらのコーナーモードの堅牢性は、2つの関連するが異なるメカニズム、すなわち、有効ハミルトニアンにおける高次対称性保護位相の存在、あるいはコーナーモードの崩壊を防ぐ動的制約の存在から生じることを示す。第1のメカニズムは、前熱的状態全体のサブハーモニック応答の安定性を保証するが、有効ハミルトニアンの基底状態多様体における初期状態に制限される。対照的に、第2のメカニズムは任意の初期状態に対する前熱前の時間結晶秩序の観測を可能にするが、これは駆動の周波数によって決定されるだけでなく、系のサブラテックス全体の相対エネルギースケールによっても決定される。我々は、周期的に駆動される2次元スピンモデルの力学をシミュレートすることでこれらの2つのメカニズムを特徴づけ、我々のモデルが他のすべての次元に自然に拡張することについて議論する。 We demonstrate the existence of prethermal discrete time crystals whose sub-harmonic response is entirely localized to zero-dimensional corner modes. Within the exponentially long prethermal regime, we show that the robustness of these corner modes arises from two related, yet distinct mechanisms: the presence of a higher-order symmetry-protected topological phase in the effective Hamiltonian, or the emergence of a dynamical constraint that prevents the decay of the corner mode. While the first mechanism ensures the stability of the sub-harmonic response throughout the entirety of the prethermal regime, it is restricted to initial states in the ground state manifold of the effective Hamiltonian. By contrast, the second mechanism enables the observation of the prethermal time-crystalline order for arbitrary initial states, albeit with a time scale that is not only determined by the frequency of the drive, but also the relative energy scale across the system's sublattices. We characterize these two mechanisms by simulating the dynamics of a periodically driven two-dimensional spin model, and discuss natural extensions of our model to all other dimensions.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
# Bit by Bit: 量子情報のレンズを通しての重力 Bit by Bit: Gravity Through the Lens of Quantum Information ( http://arxiv.org/abs/2406.01695v1 ) ライセンス: Link先を確認	William Munizzi,	(参考訳) この論文は、量子情報とホログラフィーの交差における最近のいくつかの進歩をレビューしている。ホログラフィーにおいて、量子系の特性はAdS/CFT対応による重力解釈を許容する。ホログラフィック状態の場合、境界エンタングルメントエントロピーは、龍高柳面として知られるバルク測地圏と双対である。さらに、ホログラフィック双対を全く持たない生存性は、絡み合い構造によって制約される。したがって、絡み合いはヒルベルト空間における状態の粗い分類を可能にする。同様に、作用素群の下での状態変換はヒルベルト空間の分類も提供する。例えば安定化器状態は、大きな演算セットの下で不変であり、したがって古典的なコンピュータ上でシミュレートできる。ケイリーグラフは、頂点が群要素を表し、エッジが生成元を表す作用素群に対して有用な表現を提供する。群の作用状態の軌道は、群ケイリーグラフの商である「到達可能性グラフ」としても表すことができる。到達可能性グラフは絡み合い情報をエンコードするために着ることができ、絡み合いのダイナミクスを研究するのに有用なツールとなる。状態計算可能な、例えば絡み合うエントロピーを固定する群要素による到達可能性グラフの定式化は、「収縮グラフ」を構築する。量子回路における状態パラメータの明示的に束縛されたグラフ。この論文では、クリフォード回路における絡み合いエントロピー進化の上限について述べる。量子系のもう1つの重要な性質は、量子状態のシミュレートの難しさを定量化するマジックである。 AdS/CFTにおける創発現象を記述する際、マジックと絡み合いは相補的な役割を果たす。この研究は絡み合いと魔法の相互作用を記述し、宇宙のブレインバック反応として魔法にホログラフィックな結果を与える。 This dissertation reviews several recent advances at the intersection of quantum information and holography. In holography, properties of quantum systems admit a gravitational interpretation via the AdS/CFT correspondence. For holographic states, boundary entanglement entropy is dual to bulk geodesic areas, known as Ryu-Takayanagi surfaces. Furthermore, the viability to possess a holographic dual at all is constrained by entanglement structure. Accordingly, entanglement enables a coarse classification of states in a Hilbert space. Similarly, state transformation under operator groups also provides a classification on the Hilbert space. Stabilizer states, for example, are invariant under large sets of operations and consequently can be simulated on a classical computer. Cayley graphs offer a useful representation for a group of operators, where vertices represent group elements and edges represent generators. The orbit of a state under action of the group can also be represented as a "reachability graph", a quotient of the group Cayley graph. 
Reachability graphs can be dressed to encode entanglement information, making them a useful tool for studying entanglement dynamics. Quotienting a reachability graph by group elements that fix a state computable, e.g. entanglement entropy, builds a "contracted graph". Contracted graphs explicitly bound state parameter evolution in quantum circuits. In this thesis, an upper bound on entanglement entropy evolution in Clifford circuits is presented. Another important property of quantum systems is magic, which quantifies the difficulty of simulating a quantum state. Magic and entanglement play complementary roles when describing emergent phenomena in AdS/CFT. This work describes the interplay of entanglement and magic, offering holographic consequences for magic as cosmic brane back-reaction.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
# 多様なLLM推論ユースケースにおけるプラットフォーム要件の解明 Demystifying Platform Requirements for Diverse LLM Inference Use Cases ( http://arxiv.org/abs/2406.01698v1 ) ライセンス: Link先を確認	Abhimanyu Bambhaniya, Ritik Raj, Geonhwa Jeong, Souvik Kundu, Sudarshan Srinivasan, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna,	(参考訳) 大きな言語モデル(LLM)は、広範囲のアプリケーションで顕著なパフォーマンスを示しており、しばしば人間の専門家よりも優れています。しかし、様々な推論ユースケースのためにこれらのパラメータ重モデルを効率的にデプロイするには、十分なコンピューティング、メモリ、ネットワークリソースを備えたハードウェアプラットフォームを慎重に設計する必要がある。 LLMのデプロイメントシナリオとモデルが猛烈な速度で進化する中で、SLOを満たすためのハードウェア要件は、依然としてオープンな研究課題である。本研究では,LLM推論性能と様々なプラットフォーム設計パラメータの関係を調べるための解析ツールGenZを提案する。我々の分析は、異なるLLMワークロードとユースケースのためのプラットフォーム構成に関する洞察を提供する。 LLaMA や GPT-4 のような SOTA LLM をサポートするためのプラットフォーム要件を,多様なサービス設定下で定量化する。さらに、数百兆のパラメータを超える可能性のある将来のLLMを実現するために必要なハードウェア能力も予測する。 GenZから得られるトレンドと洞察は、LLMをデプロイするAIエンジニアと、次世代ハードウェアアクセラレータやプラットフォームを設計するコンピュータアーキテクトを導くことができる。結局のところ、この研究は、幅広いアプリケーションにまたがる大きな言語モデルの潜在能力を最大限に活用するためのプラットフォーム設計の考察に光を当てている。ソースコードはhttps://github.com/abhibambhaniya/GenZ-LLM-Analyzer で入手できる。 Large language models (LLMs) have shown remarkable performance across a wide range of applications, often outperforming human experts. However, deploying these parameter-heavy models efficiently for diverse inference use cases requires carefully designed hardware platforms with ample computing, memory, and network resources. With LLM deployment scenarios and models evolving at breakneck speed, the hardware requirements to meet SLOs remain an open research question. In this work, we present an analytical tool, GenZ, to study the relationship between LLM inference performance and various platform design parameters. Our analysis provides insights into configuring platforms for different LLM workloads and use cases. We quantify the platform requirements to support SOTA LLM models like LLaMA and GPT-4 under diverse serving settings. Furthermore, we project the hardware capabilities needed to enable future LLMs potentially exceeding hundreds of trillions of parameters. 
The trends and insights derived from GenZ can guide AI engineers deploying LLMs as well as computer architects designing next-generation hardware accelerators and platforms. Ultimately, this work sheds light on the platform design considerations for unlocking the full potential of large language models across a spectrum of applications. The source code is available at https://github.com/abhibambhaniya/GenZ-LLM-Analyzer .	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
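To make the link between platform parameters and inference performance concrete, here is a back-of-the-envelope sketch. It does not use or reproduce GenZ; the function names and the example numbers are assumptions. It relies on the common rule of thumb that single-stream token generation must read every weight once per token, so memory bandwidth sets a lower bound on per-token latency.

```python
def weight_footprint_bytes(n_params, bytes_per_param):
    """Memory needed just to hold the model weights."""
    return n_params * bytes_per_param


def min_decode_latency_s(n_params, bytes_per_param, mem_bw_bytes_per_s):
    """Bandwidth-bound lower limit on latency per generated token.

    Assumes batch size 1 and that all weights are read once per token;
    KV-cache traffic, compute time, and network costs are ignored.
    """
    return weight_footprint_bytes(n_params, bytes_per_param) / mem_bw_bytes_per_s


# Hypothetical example: a 7e9-parameter model stored in 16-bit precision
# on a platform with 1e12 bytes/s of memory bandwidth.
latency = min_decode_latency_s(7e9, 2, 1e12)
print(f"at least {latency * 1e3:.1f} ms per token")
```

A full analytical tool would add compute, KV-cache, and interconnect terms on top of this bound, and would evaluate them per serving scenario.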
# ペッツ・レニーの相互情報の2倍最小化:直接指数による特性と操作的解釈 Doubly minimized Petz Renyi mutual information: Properties and operational interpretation from direct exponent ( http://arxiv.org/abs/2406.01699v1 ) ライセンス: Link先を確認	Laura Burri,	(参考訳) 2倍に最小化された位数$\alpha$のペッツ・レニーの相互情報は、任意の積状態に対する固定二部量子状態の位数$\alpha$のペッツ発散の最小化として定義される。本研究では、このタイプのRenyi相互情報のいくつかの特性を確立し、$\alpha\in [1/2,2]$に対する加法性を含む。応用として、ある二項量子状態判別問題の直接指数は、位数$\alpha\in (1/2,1)$の2倍に最小化されたペッツ・レニイ相互情報によって決定されることを示す。これはこの種のレニイ相互情報の操作的解釈を提供し、古典的確率分布の以前の結果を量子設定に一般化する。 The doubly minimized Petz Renyi mutual information of order $\alpha$ is defined as the minimization of the Petz divergence of order $\alpha$ of a fixed bipartite quantum state relative to any product state. In this work, we establish several properties of this type of Renyi mutual information, including its additivity for $\alpha\in [1/2,2]$. As an application, we show that the direct exponent of certain binary quantum state discrimination problems is determined by the doubly minimized Petz Renyi mutual information of order $\alpha\in (1/2,1)$. This provides an operational interpretation of this type of Renyi mutual information, and generalizes a previous result for classical probability distributions to the quantum setting.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
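For reference, the verbal definition in the abstract above can be written out as follows. The formula for the Petz Renyi divergence is the standard one; the symbol $I_\alpha^{\downarrow\downarrow}$ and the names $\sigma_A$, $\tau_B$ for the two marginal states are notational assumptions made here for illustration.

```latex
% Petz Renyi divergence of order \alpha \in (0,1) \cup (1,\infty)
\begin{equation}
D_\alpha(\rho \,\|\, \sigma)
  = \frac{1}{\alpha - 1} \log \operatorname{Tr}\!\left[ \rho^{\alpha} \sigma^{1-\alpha} \right]
\end{equation}

% Doubly minimized Petz Renyi mutual information of a bipartite state \rho_{AB}:
% the minimization runs over all product states \sigma_A \otimes \tau_B.
\begin{equation}
I_\alpha^{\downarrow\downarrow}(A\!:\!B)_\rho
  = \inf_{\sigma_A,\, \tau_B}
    D_\alpha\!\left( \rho_{AB} \,\|\, \sigma_A \otimes \tau_B \right)
\end{equation}
```

"Doubly minimized" refers to optimizing over both marginals, in contrast to variants that fix one or both of them to the marginals of $\rho_{AB}$.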
# Snowflake: 分散ストリーミングデコーダ Snowflake: A Distributed Streaming Decoder ( http://arxiv.org/abs/2406.01701v1 ) ライセンス: Link先を確認	Tim Chan,	(参考訳) 我々は、ストリーミング形式で動作し、単純で局所的な実装が可能な量子エラー補正デコーダであるSnowflakeを設計する。その際、既存のウィンドウ方式においてウィンドウの重なりから生じる処理オーバーヘッドを解消する、汎用的なストリーム復号法を新たに提案する。最初の研究として、回路レベルの雑音下の表面符号上でSnowflakeの局所的な実装を検証した。その結果、ウィンドウ方式を適用したUnion-Findデコーダの精度閾値のおよそ2/3を回復し、平均実行時間のスケーリングは符号距離$d$に対して3乗ではなく2乗未満と、より優れている。さらに、Snowflakeを2次元チップ上に実装し、量子メモリだけでなく格子手術(lattice surgery)に基づく計算も復号する方法について論じる。 We design Snowflake, a quantum error correction decoder that runs in a streaming fashion and is capable of a simple, local implementation. In doing so we propose a new method for general stream decoding that eliminates the processing overhead due to window overlap in existing windowing methods. As a first study, we test our local implementation of Snowflake on the surface code under circuit-level noise. It recovers roughly 2/3 of the accuracy threshold of the Union-Find decoder adapted with a windowing method, with a better mean runtime scaling: subquadratic as opposed to cubic in code distance $d$. We discuss how Snowflake may be implemented on a 2D chip and decode not just quantum memory but lattice surgery-based computation.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
# シリコンにおける分散量子コンピューティング Distributed Quantum Computing in Silicon ( http://arxiv.org/abs/2406.01704v1 ) ライセンス: Link先を確認	Francis Afzal, Mohsen Akhlaghi, Stefanie J. Beale, Olinka Bedroya, Kristin Bell, Laurent Bergeron, Kent Bonsma-Fisher, Polina Bychkova, Zachary M. E. Chaisson, Camille Chartrand, Chloe Clear, Adam Darcie, Adam DeAbreu, Colby DeLisle, Lesley A. Duncan, Chad Dundas Smith, John Dunn, Amir Ebrahimi, Nathan Evetts, Daker Fernandes Pinheiro, Patricio Fuentes, Tristen Georgiou, Biswarup Guha, Rafael Haenel, Daniel Higginbottom, Daniel M. Jackson, Navid Jahed, Amin Khorshidahmad, Prasoon K. Shandilya, Alexander T. K. Kurkjian, Nikolai Lauk, Nicholas R. Lee-Hone, Eric Lin, Rostyslav Litynskyy, Duncan Lock, Lisa Ma, Iain MacGilp, Evan R. MacQuarrie, Aaron Mar, Alireza Marefat Khah, Alex Matiash, Evan Meyer-Scott, Cathryn P. Michaels, Juliana Motira, Narwan Kabir Noori, Egor Ospadov, Ekta Patel, Alexander Patscheider, Danny Paulson, Ariel Petruk, Adarsh L. Ravindranath, Bogdan Reznychenko, Myles Ruether, Jeremy Ruscica, Kunal Saxena, Zachary Schaller, Alex Seidlitz, John Senger, Youn Seok Lee, Orbel Sevoyan, Stephanie Simmons, Oney Soykal, Leea Stott, Quyen Tran, Spyros Tserkis, Ata Ulhaq, Wyatt Vine, Russ Weeks, Gary Wolfowicz, Isao Yoneda,	(参考訳) 量子化学やショアのアルゴリズムのような商業的に影響力のある量子アルゴリズムは、既存の量子プロセッサの容量を超える多くの量子ビットとゲートを必要とする。ネットワークモジュールによって水平にスケールする分散アーキテクチャは、商用ユーティリティへのルートを提供し、最終的には単一の量子コンピューティングモジュールの能力を超えます。このようなプロセッサは、モジュール間で分散されたリモートの絡み合いを消費し、分散量子論理を実現する。したがって、ネットワーク化された量子コンピュータはモジュール間の高忠実な絡み合いを迅速に分散する能力を必要とする。ここでは、等方的に濃縮されたシリコン中のシリコンT中心上に、いくつかの重要な分散量子コンピューティングプロトコルの予備的なデモンストレーションを示す。本稿では,モジュール間の絡み合いの分布を実証し,それを伝送ゲートシーケンスに適用し,分散量子コンピューティングおよびネットワークプラットフォームとしてTセンタの概念実証を確立する。 Commercially impactful quantum algorithms such as quantum chemistry and Shor's algorithm require a number of qubits and gates far beyond the capacity of any existing quantum processor. 
Distributed architectures, which scale horizontally by networking modules, provide a route to commercial utility and will eventually surpass the capability of any single quantum computing module. Such processors consume remote entanglement distributed between modules to realize distributed quantum logic. Networked quantum computers will therefore require the capability to rapidly distribute high fidelity entanglement between modules. Here we present preliminary demonstrations of some key distributed quantum computing protocols on silicon T centres in isotopically-enriched silicon. We demonstrate the distribution of entanglement between modules and consume it to apply a teleported gate sequence, establishing a proof-of-concept for T centres as a distributed quantum computing and networking platform.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
# ピーナッツモデル:トレーニングアクセスなしでMLモデルをハイジャックすることは可能か Model for Peanuts: Hijacking ML Models without Training Access is Possible ( http://arxiv.org/abs/2406.01708v1 ) ライセンス: Link先を確認	Mahmoud Ghorbel, Halima Bouzidi, Ioan Marius Bilasco, Ihsen Alouani,	(参考訳) 機械学習(ML)モデルの大規模な展開は、信頼性を脅かし、プライバシーの侵害、差別リスク、説明責任の欠如といった倫理的および社会的懸念を提起する、いくつかの攻撃の出現に伴っている。モデルハイジャックはこれらの攻撃の1つであり、敵は被害者のモデルをハイジャックして元のモデルとは異なるタスクを実行する。モデルハイジャックは、ハイジャックされたモデル所有者が、違法または非倫理的なサービスを提供するモデルを持つことによって、説明責任とセキュリティ上のリスクを引き起こす可能性がある。従来の最先端の作業では、モデルハイジャックはトレーニングタイムアタックであり、敵は攻撃を実行するためにMLモデルのトレーニングにアクセスする必要がある。本稿では、攻撃者が被害者モデルの訓練段階にアクセスできないような、より強力な脅威モデルを考える。私たちの直感では、MLモデルは、通常過パラメータ化され、(意図せずに)トレーニング対象のタスクよりも多くを学ぶことができる。本研究では,SnatchMLと命名された推論時間におけるモデルハイジャックに対する簡単なアプローチを提案し,被害者モデルの潜伏空間における距離測定を用いて未知の入力サンプルを,ハイジャックタスククラスに関連する既知のサンプルに分類する。 SnatchMLは経験的に、良質な事前訓練されたモデルが初期タスクと意味的に関連するタスクを実行できることを示している。驚いたことに、これは元のタスクとは無関係なタスクをハイジャックしても当てはまる。このリスクを軽減するために、さまざまな方法も検討しています。最初にメタ学習と呼ぶ新しいアプローチを提案し、モデルが元のタスクデータセットをトレーニングしながら潜在的に悪意のあるタスクを解放するのに役立つように設計した。また,モデルハイジャックを容易にする1つの要因として,過パラメータ化に関する洞察を提供し,この攻撃に対する圧縮に基づく対策を提案する。 The massive deployment of Machine Learning (ML) models has been accompanied by the emergence of several attacks that threaten their trustworthiness and raise ethical and societal concerns such as invasion of privacy, discrimination risks, and lack of accountability. Model hijacking is one of these attacks, where the adversary aims to hijack a victim model to execute a different task than its original one. Model hijacking can cause accountability and security risks since a hijacked model owner can be framed for having their model offering illegal or unethical services. Prior state-of-the-art works consider model hijacking as a training time attack, whereby an adversary requires access to the ML model training to execute their attack. In this paper, we consider a stronger threat model where the attacker has no access to the training phase of the victim model. 
Our intuition is that ML models, typically over-parameterized, might (unintentionally) learn more than the task they are trained for. We propose a simple approach for model hijacking at inference time, named SnatchML, that classifies unknown input samples using distance measures in the latent space of the victim model to previously known samples associated with the hijacking task classes. SnatchML empirically shows that benign pre-trained models can execute tasks that are semantically related to the initial task. Surprisingly, this can be true even for hijacking tasks unrelated to the original task. We also explore different methods to mitigate this risk. We first propose a novel approach we call meta-unlearning, designed to help the model unlearn a potentially malicious task while training on the original task dataset. We also provide insights on over-parameterization as one possible inherent factor that makes model hijacking easier, and we accordingly propose a compression-based countermeasure against this attack.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
# ハニカムモアレポテンシャルにおける相互作用電子の強磁性半金属および電荷密度波相 Ferromagnetic semimetal and charge-density wave phases of interacting electrons in a honeycomb moiré potential ( http://arxiv.org/abs/2406.01715v1 ) ライセンス: Link先を確認	Yubo Yang, Miguel A. Morales, Shiwei Zhang,	(参考訳) モアレ系における量子相の探索は、実験と理論の両面で精力的な取り組みを集めてきた。ハニカム対称性の実現は近年注目されている。強い相互作用とハニカム対称性の組み合わせは、分数チャーン絶縁体、非従来型超伝導体、量子スピン液体のようなエキゾチックな電子状態をもたらしうる。このような系において、強い長距離クーロン相互作用を確実に扱い、熱力学的相を抽出できる大きなシステムサイズに迫る正確な計算は、ほとんど行われていない。我々は、固定位相拡散モンテカルロ法を用いて、4分の1充填におけるハニカムモアレ格子上の2次元電子ガスを研究する。この重要なモデルの基底状態相を、現在の実験に関連するパラメータ領域で決定する。モアレポテンシャルの増大に伴い、系は常磁性金属から遍歴強磁性半金属、そして電荷密度波絶縁体へと転移する。 The exploration of quantum phases in moiré systems has drawn intense experimental and theoretical efforts. The realization of honeycomb symmetry has been a recent focus. The combination of strong interaction and honeycomb symmetry can lead to exotic electronic states such as fractional Chern insulator, unconventional superconductor, and quantum spin liquid. Accurate computations in such systems, with reliable treatment of strong long-ranged Coulomb interaction and approaching the large system sizes to extract thermodynamic phases, are mostly missing. We study the two-dimensional electron gas on a honeycomb moiré lattice at quarter filling, using fixed-phase diffusion Monte Carlo. The ground state phases of this important model are determined in the parameter regime relevant to current experiments. With increasing moiré potential, the system transitions from a paramagnetic metal to an itinerant ferromagnetic semimetal and then a charge-density-wave insulator.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
# 混合フォック状態の非古典性の定量化 Quantifying nonclassicality of mixed Fock states ( http://arxiv.org/abs/2406.01717v1 ) ライセンス: Link先を確認	Spencer Rogers, Tommy Muth, Wenchao Ge,	(参考訳) ボソニックモードの非古典的状態は、量子強化技術にとって重要な資源である。しかし、これらの状態、特に混合状態の非古典性を定量化することは困難な場合がある。ここでは、非古典性を計測上の優位性に関連付けるオペレーショナルリソース理論(ORT)測度 [W. Ge, K. Jacobs, S. Asiri, M. Foss-Feig, and M. S. Zubairy, Phys. Rev. Res. 2, 023400 (2020)] を用いて、混合フォック状態にあるボソニックモードの非古典性を定量化した結果を示す。一般に、混合状態に対するORT測度の評価は凸屋根（convex roof）を求める必要があるため難しい。しかし、本問題は線形計画問題に帰着できることを示す。数値最適化の結果を解析することにより、隣接する3つまたは4つのフォック状態が非ゼロの占有率を持つ場合について、厳密な解析的結果を得ることができる。興味深いことに、このようなモードは占有率に応じて異なる相をとりうる。最後に、本手法がより高いランクの密度行列にどのように一般化できるかを示す。本研究の結果は、任意の混合ボソニック状態の非古典性を評価するための、また潜在的には他の凸屋根最適化問題を解くための、実行可能な方法を示唆している。 Nonclassical states of bosonic modes are important resources for quantum-enhanced technologies. Yet, quantifying nonclassicality of these states, in particular mixed states, can be a challenge. Here we present results of quantifying the nonclassicality of a bosonic mode in a mixed Fock state via the operational resource theory (ORT) measure [W. Ge, K. Jacobs, S. Asiri, M. Foss-Feig, and M. S. Zubairy, Phys. Rev. Res. 2, 023400 (2020)], which relates nonclassicality to metrological advantage. Generally speaking, evaluating the ORT measure for mixed states is challenging, since it involves finding a convex roof. However, we show that our problem can be reduced to a linear programming problem. By analyzing the results of numerical optimization, we are able to extract exact, analytical results for the case where three or four neighboring Fock states have nonzero population. Interestingly, we find that such a mode can be in distinct phases, depending on the populations. Lastly, we demonstrate how our method is generalizable to density matrices of higher ranks. Our findings suggest a viable method for evaluating nonclassicality of arbitrary mixed bosonic states and potentially for solving other convex roof optimization problems.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
# LLMの高度な外れ値管理と効率的な量子化のための回転と置換 Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs ( http://arxiv.org/abs/2406.01721v1 ) ライセンス: Link先を確認	Haokun Lin, Haobo Xu, Yichen Wu, Jingzhi Cui, Yingtao Zhang, Linzhan Mou, Linqi Song, Zhenan Sun, Ying Wei,	(参考訳) 大規模言語モデル(LLM)の量子化は、主に低ビット表現の効率を損なう外れ値アクティベーションが原因で、大きな課題となっている。従来のアプローチは主に、すべてのトークンにわたって一貫して大きな値を持つアクティベーションである通常の外れ値(Normal Outliers)への対処に重点を置いている。しかし、これらの手法は、値が著しく大きく、低ビット量子化時にしばしば大幅な性能低下を引き起こす巨大な外れ値(Massive Outliers)を扱う際には十分に機能しない。本研究では、両方の種類の外れ値をより効果的に除去するために、回転変換と置換変換を用いる量子化戦略DuQuantを提案する。まず、DuQuantは特定の外れ値次元の情報に基づいて回転行列を構築し、異なる回転ブロック内の隣接チャネルにこれらの外れ値を再分配する。次に、ジグザグ置換を適用してブロック間で外れ値が均等に分布するようにし、ブロック単位の分散を最小化する。追加の回転により、アクティベーションのランドスケープの滑らかさがさらに向上し、モデル性能が向上する。 DuQuantは量子化プロセスを簡素化し、優れた外れ値管理を示し、4ビットの重み・アクティベーション量子化の下でも、様々なLLMアーキテクチャと複数のタスクにおいて最上位の結果を達成する。コードは https://github.com/Hsu1023/DuQuant で公開されている。 Quantizing large language models (LLMs) presents significant challenges, primarily due to outlier activations that compromise the efficiency of low-bit representation. Traditional approaches mainly focus on solving Normal Outliers, that is, activations with consistently high magnitudes across all tokens. However, these techniques falter when dealing with Massive Outliers, which are significantly higher in value and often cause substantial performance losses during low-bit quantization. In this study, we propose DuQuant, an innovative quantization strategy employing rotation and permutation transformations to more effectively eliminate both types of outliers. Initially, DuQuant constructs rotation matrices informed by specific outlier dimensions, redistributing these outliers across adjacent channels within different rotation blocks. Subsequently, a zigzag permutation is applied to ensure a balanced distribution of outliers among blocks, minimizing block-wise variance. An additional rotation further enhances the smoothness of the activation landscape, thereby improving model performance. 
DuQuant streamlines the quantization process and demonstrates superior outlier management, achieving top-tier results in multiple tasks with various LLM architectures even under 4-bit weight-activation quantization. Our code is available at https://github.com/Hsu1023/DuQuant.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
# UTMシステムにおけるUAVの協調型広帯域スペクトルセンシングとスケジューリング Federated Learning-based Collaborative Wideband Spectrum Sensing and Scheduling for UAVs in UTM Systems ( http://arxiv.org/abs/2406.01727v1 ) ライセンス: Link先を確認	Sravan Reddy Chintareddy, Keenan Roach, Kenny Cheung, Morteza Hashemi,	(参考訳) 本稿では,ネットワーク化無人航空機(UAV)の協調広帯域スペクトル検出とスケジューリングのためのデータ駆動型フレームワークを提案する。フレームワーク全体は3つの主要なステージで構成されています。まず、モデルトレーニング段階では、マルチセル環境におけるデータセット生成と、フェデレートラーニング(FL)アーキテクチャを用いた機械学習(ML)モデルのトレーニングを行う。本研究は,無線用FLに関する既存の研究と異なり,無線データセット生成を直接統合した新しいアーキテクチャを提案し,マルチセル環境における大気上信号からのI/QサンプルをFLトレーニングプロセスに統合する。第2に、協調スペクトル推定段階において、無人航空機システム交通管理(UTM)エコシステムと互換性のある協調スペクトル融合戦略を提案する。最後に、スペクトルスケジューリング段階において、検出されたスペクトル孔を二次ユーザに動的に割り当てるために強化学習(RL)ソリューションを利用する。提案手法を評価するため,MATLAB LTEツールボックスを用いたほぼ現実的な合成データセットを生成するための総合シミュレーションフレームワークを構築した。この評価手法は、航空機用ML/AIベースのスペクトル管理ソリューションの開発に使用できる大規模なスペクトルデータセットを生成するフレキシブルなフレームワークを提供する。 In this paper, we propose a data-driven framework for collaborative wideband spectrum sensing and scheduling for networked unmanned aerial vehicles (UAVs), which act as the secondary users (SUs) to opportunistically utilize detected "spectrum holes". Our overall framework consists of three main stages. Firstly, in the model training stage, we explore dataset generation in a multi-cell environment and training a machine learning (ML) model using the federated learning (FL) architecture. Unlike the existing studies on FL for wireless that presume datasets are readily available for training, we propose a novel architecture that directly integrates wireless dataset generation, which involves capturing I/Q samples from over-the-air signals in a multi-cell environment, into the FL training process. Secondly, in the collaborative spectrum inference stage, we propose a collaborative spectrum fusion strategy that is compatible with the unmanned aircraft system traffic management (UTM) ecosystem. 
Finally, in the spectrum scheduling stage, we leverage reinforcement learning (RL) solutions to dynamically allocate the detected spectrum holes to the secondary users. To evaluate the proposed methods, we establish a comprehensive simulation framework that generates a near-realistic synthetic dataset using the MATLAB LTE toolbox by incorporating base-station (BS) locations in a chosen area of interest, performing ray-tracing, and emulating the primary users' channel usage in terms of I/Q samples. This evaluation methodology provides a flexible framework to generate large spectrum datasets that could be used for developing ML/AI-based spectrum management solutions for aerial devices.	翻訳日:2024-06-05 21:11:55 公開日:2024-06-03
# ラーニング・トゥ・キャッシュ:層キャッシングによる拡散変換器の高速化 Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ( http://arxiv.org/abs/2406.01733v1 ) ライセンス: Link先を確認	Xinyin Ma, Gongfan Fang, Michael Bi Mi, Xinchao Wang,	(参考訳) 拡散変換器は近年,様々なタスクに対して前例のない生成能力を実証している。しかしながら、奨励的な結果は、大きなパラメータを持つトランスフォーマーモデルに対する推論を必要とするため、遅延推論のコストが伴う。本研究では,モデルパラメータを更新することなく,キャッシング機構を導入することで,拡散変圧器内の多数の層を計算し,容易に除去することができることを示す。例えば、U-ViT-H/2の場合、最大93.68%のキャッシュステップ(全ステップで46.84%)を削除でき、FIDは0.01未満である。そこで本研究では,拡散変圧器の動的手法でキャッシュを実行することを学習する,L2C(Learning-to-Cache)という新しい手法を提案する。具体的には,変圧器の層構造と拡散の逐次的性質を利用して,各層をキャッシングの基本単位として扱うことで,時間ステップ間の冗長な計算を探索する。層をキャッシュ・削除する層を特定するディープモデルにおける指数探索空間の課題に対処するため,新しい微分可能な最適化手法を提案する。その後、入力不変かつタイムステップ可変なルータが最適化され、最終的に静的な計算グラフが生成される。実験の結果,L2C は DDIM や DPM-Solver など,キャッシュベースの手法とほぼ同等の推論速度で,サンプリング性能を向上していることがわかった。 Diffusion Transformers have recently demonstrated unprecedented generative capabilities for various tasks. The encouraging results, however, come with the cost of slow inference, since each denoising step requires inference on a transformer model with a large scale of parameters. In this study, we make an interesting and somehow surprising observation: the computation of a large proportion of layers in the diffusion transformer, through introducing a caching mechanism, can be readily removed even without updating the model parameters. In the case of U-ViT-H/2, for example, we may remove up to 93.68% of the computation in the cache steps (46.84% for all steps), with less than 0.01 drop in FID. To achieve this, we introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers. Specifically, by leveraging the identical structure of layers in transformers and the sequential nature of diffusion, we explore redundant computations between timesteps by treating each layer as the fundamental unit for caching. 
To address the challenge of the exponential search space in deep models for identifying layers to cache and remove, we propose a novel differentiable optimization objective. An input-invariant yet timestep-variant router is then optimized, which can finally produce a static computation graph. Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-Solver, alongside prior cache-based methods at the same inference speed.	翻訳日:2024-06-05 21:02:09 公開日:2024-06-03
# 良いバイブズ! - 腕時計の振動による端末からユーザへの認証に向けて Good Vibes! Towards Phone-to-User Authentication Through Wristwatch Vibrations ( http://arxiv.org/abs/2406.01738v1 ) ライセンス: Link先を確認	Jakob Dittrich, Rainhard Dieter Findling,	(参考訳) モバイルデバイスは不正アクセスを防ぐためにユーザに認証を求めることが多いが、モバイルデバイスの側がユーザに対して自らを認証することは通常ない。このため、ユーザが気付かないうちに別のモバイルデバイスを操作してしまう余地が残る。本稿では、モバイルデバイスからユーザへの認証の一種であるGoodVibes認証を提案する。ユーザのスマートフォンは、ユーザが事前に選択した認証用振動パターンで腕時計を振動させることにより、ユーザに対して自らを認証する。我々はGoodVibes認証をAndroidのプロトタイプとして実装し、30人の参加者で複数の認証シナリオを評価した。その結果、ユーザは自分の認証用振動パターンを、異なるパターン、無関係な振動、およびパターンが存在しない場合から十分に認識・識別できることがわかった。 While mobile devices frequently require users to authenticate to prevent unauthorized access, mobile devices typically do not authenticate to their users. This leaves room for users to unwittingly interact with different mobile devices. We present GoodVibes authentication, a variant of mobile device-to-user authentication, where the user's phone authenticates to the user through their wristwatch vibrating in their pre-selected authentication vibration pattern. We implement GoodVibes authentication as an Android prototype, evaluate different authentication scenarios with 30 participants, and find that users are able to recognize and distinguish their authentication vibration pattern well from different patterns, from unrelated vibrations, and from the pattern being absent.	翻訳日:2024-06-05 21:02:09 公開日:2024-06-03
# ウィグナー関数を用いた開量子系の量子速度限界 Quantum speed limit of open quantum system models using the Wigner function ( http://arxiv.org/abs/2406.01741v1 ) ライセンス: Link先を確認	Arti Gaharwar, Devvrat Tiwari, Subhashish Banerjee,	(参考訳) 開系モデルの量子速度制限時間は、ワッサーシュタイン-1-距離とウィグナー関数を用いて検討する。使用法は相共変体と、位置依存結合を介して硬化した熱浴と相互作用する2量子モデルからなる。量子ビットの位置へのカップリングの依存は、進化のスピードアップに寄与する集合状態における力学の研究を可能にした。ウィグナー関数の使用は、自然に研究されたシステムの量子性の研究を可能にする。非マルコフ的挙動、量子性、および量子速度制限時間の間の興味深い相互作用が観察される。量子相関の存在は進化を加速させる。 The quantum speed limit time of open system models is explored using the Wasserstein-1-distance and the Wigner function. Use is made of the phase covariant and a two-qubit model interacting with a squeezed thermal bath via position-dependent coupling. The dependence of the coupling on the position of the qubits allowed the study of the dynamics in the collective regime, which is conducive to speeding up the evolution. The use of the Wigner function naturally allows the study of the quantumness of the systems studied. An interesting interplay is observed between non-Markovian behavior, quantumness, and the quantum speed limit time. The presence of quantum correlations is seen to speed up the evolution.	翻訳日:2024-06-05 21:02:09 公開日:2024-06-03
# 127キュービットゲートモデルIBM量子コンピュータを用いた量子最適化は、非自明なバイナリ最適化問題に対して量子アニールより優れている。 Quantum optimization using a 127-qubit gate-model IBM quantum computer can outperform quantum annealers for nontrivial binary optimization problems ( http://arxiv.org/abs/2406.01743v1 ) ライセンス: Link先を確認	Natasha Sachdeva, Gavin S. Harnett, Smarak Maity, Samuel Marsh, Yulun Wang, Adam Winick, Ryan Dougherty, Daniel Canuto, You Quan Chong, Michael Hush, Pranav S. Mundada, Christopher D. B. Bentley, Michael J. Biercuk, Yuval Baum,	(参考訳) ゲートモデル量子コンピュータにおける二項組合せ最適化問題に対する包括的量子解法を導入する。内部ワークフローの概要として、カスタマイズされたアンサッツと変分パラメータ更新戦略の統合、ハードウェア実行におけるエラーの効率的な抑制、ビットフリップエラーの修正のためのオーバーヘッドのない後処理について述べる。我々は、この問題をIBMの量子コンピュータにベンチマークし、古典的な非自明なバイナリ最適化問題をいくつか行ない、古典的なシミュレーションやソリューションの事前知識を使わずに、ハードウェア上で最適化を行う。まず、最大120キュービットの密度を持つランダムな正規グラフに対して、そのグラフトポロジがデバイス接続と一致しないようなランダムな正規グラフに対して、Max-Cutのインスタンスを正しく解く能力を示す。次に, 線形, 二次, 立方体相互作用項を持つ127キュービットスピングラスモデルの高次二乗最適化に適用し, 基底状態エネルギーの探索に成功した。この新しい量子解法は、DWaveアニールラーを用いて公表された結果と比較して最大$\sim1500\times$で最小エネルギーを見つける可能性を高め、アニールラーが故障した場合に正しい解を見つけることができる。さらに、どちらの問題にも、Q-CTRLソルバは、追求された問題の相対的難易度を示すために用いられるヒューリスティック局所解器よりも優れる。全体として、これらの結果はハードウェア上での解決に成功している最大の量子最適化であり、ゲートモデル量子コンピュータが二進最適化のクラスにおいてアニールを初めて上回ったことを実証している。 We introduce a comprehensive quantum solver for binary combinatorial optimization problems on gate-model quantum computers that outperforms any published alternative and consistently delivers correct solutions for problems with up to 127 qubits. We provide an overview of the internal workflow, describing the integration of a customized ansatz and variational parameter update strategy, efficient error suppression in hardware execution, and overhead-free post-processing to correct for bit-flip errors. We benchmark this solver on IBM quantum computers for several classically nontrivial unconstrained binary optimization problems -- the entire optimization is conducted on hardware with no use of classical simulation or prior knowledge of the solution. 
First, we demonstrate the ability to correctly solve Max-Cut instances for random regular graphs with a variety of densities using up to 120 qubits, where the graph topologies are not matched to device connectivity. Next, we apply the solver to higher-order binary optimization and successfully search for the ground state energy of a 127-qubit spin-glass model with linear, quadratic, and cubic interaction terms. Use of this new quantum solver increases the likelihood of finding the minimum energy by up to $\sim1,500\times$ relative to published results using a DWave annealer, and it can find the correct solution when the annealer fails. Furthermore, for both problem types, the Q-CTRL solver outperforms a heuristic local solver used to indicate the relative difficulty of the problems pursued. Overall, these results represent the largest quantum optimizations successfully solved on hardware to date, and demonstrate the first time a gate-model quantum computer has been able to outperform an annealer for a class of binary optimization problems.	翻訳日:2024-06-05 21:02:09 公開日:2024-06-03
# データ漏洩に直面した際の危機コミュニケーション Crisis Communication in the Face of Data Breaches ( http://arxiv.org/abs/2406.01744v1 ) ライセンス: Link先を確認	Jukka Ruohonen, Kalle Hjerppe, Katleena Kortesuo,	(参考訳) データ漏洩とは、データへの不正アクセスを指す。通常は（常にではないが）、データ漏洩はサイバー犯罪に関わる。このような犯罪に直面した組織は、しばしば危機的状況にも置かれる。したがって、組織は危機管理手順においてデータ漏洩にも備えるべきである。これらの手順には危機コミュニケーション計画も含めるべきである。そこで本研究では、データ漏洩における危機コミュニケーション戦略とその実践的な実行について検討する。背景となるのは、活発な危機コミュニケーション研究の領域である。フィンランドにおけるいくつかの質的ケーススタディによれば、従来の通説はよく当てはまる。すなわち、成功事例では、早期にコミュニケーションを行い、責任を引き受け、謝罪し、当局に通知している。失敗事例では、責任転嫁、組織を被害者として位置づけること、公的機関への通知を怠ることなど、程度の差はあれその逆が見られる。これらの質的な知見により、本稿は、既存の危機コミュニケーション研究で見過ごされてきた欧州の規制との関連も含め、データ漏洩危機とその特殊性、およびその管理に特に焦点を当てることで、この研究領域に貢献する。 Data breaches refer to unauthorized accesses to data. Typically but not always, data breaches are about cyber crime. An organization facing such a crime is often also in a crisis situation. Therefore, organizations should prepare also for data breaches in their crisis management procedures. These procedures should include also crisis communication plans. To this end, this paper examines data breach crisis communication strategies and their practical executions. The background comes from the vibrant crisis communication research domain. According to a few qualitative case studies from Finland, the conventional wisdom holds well; the successful cases indicate communicating early, taking responsibility, offering an apology, and notifying public authorities. The unsuccessful cases show varying degrees of the reverse, including shifting of blame, positioning of an organization as a victim, and failing to notify public authorities. With these qualitative insights, the paper contributes to the research domain by focusing specifically on data breach crises, their peculiarities, and their management, including with respect to European regulations that have been neglected in existing crisis communication research.	翻訳日:2024-06-05 21:02:09 公開日:2024-06-03
# 対話的接地理解のための大規模言語モデルの構築に向けて Towards Harnessing Large Language Models for Comprehension of Conversational Grounding ( http://arxiv.org/abs/2406.01749v1 ) ライセンス: Link先を確認	Kristiina Jokinen, Phillip Schneider, Taiga Mori,	(参考訳) 会話基盤とは、対話を行う参加者間の相互知識を確立するための協調的なメカニズムである。本研究では、情報探索会話を分析し、暗黙的または暗黙的な接地と接地的知識要素の予測に関連する対話を分類する際の大規模言語モデルの能力について検討する。実験の結果,2つのタスクにおいて,大規模言語モデルが直面する課題を明らかにし,パイプラインアーキテクチャや知識ベースを通じて,大規模言語モデルに基づく会話基盤の理解を強化するための研究が進行中であることを明らかにした。これらのイニシアチブは、会話における基礎知識の複雑さを扱うために、より効果的な対話システムを開発することを目的としている。 Conversational grounding is a collaborative mechanism for establishing mutual knowledge among participants engaged in a dialogue. This experimental study analyzes information-seeking conversations to investigate the capabilities of large language models in classifying dialogue turns related to explicit or implicit grounding and predicting grounded knowledge elements. Our experimental results reveal challenges encountered by large language models in the two tasks and discuss ongoing research efforts to enhance large language model-based conversational grounding comprehension through pipeline architectures and knowledge bases. These initiatives aim to develop more effective dialogue systems that are better equipped to handle the intricacies of grounded knowledge in conversations.	翻訳日:2024-06-05 21:02:09 公開日:2024-06-03
# 最適重み付き平均の最適化:効率的な分散スパース分類 Optimizing the Optimal Weighted Average: Efficient Distributed Sparse Classification ( http://arxiv.org/abs/2406.01753v1 ) ライセンス: Link先を確認	Fred Lu, Ryan R. Curtin, Edward Raff, Francis Ferraro, James Holt,	(参考訳) 分散トレーニングは、ますます大規模なデータセット上で線形モデルを最適化するソリューションとしてしばしば見なされるが、一般的な分散アプローチのマシン間通信コストは、データ次元が増加するにつれて支配的になりうる。最近の非インタラクティブアルゴリズムの研究は、マシン間の1ラウンドの通信だけで線形モデルの近似解を効率的に得られることを示している。しかし、この近似はしばしばマシン数が増えるにつれて劣化する。本稿では、近年の最適重み付き平均法に基づき、追加の通信を1ラウンド許容することで、わずかな実行時間の増加で近似品質を顕著に改善する新しい手法ACOWAを導入する。実験の結果、スパースな分散ロジスティック回帰において、ACOWAは経験的リスク最小化解により忠実で、他の分散アルゴリズムよりもかなり高い精度の解を得ることが示された。 While distributed training is often viewed as a solution to optimizing linear models on increasingly large datasets, inter-machine communication costs of popular distributed approaches can dominate as data dimensionality increases. Recent work on non-interactive algorithms shows that approximate solutions for linear models can be obtained efficiently with only a single round of communication among machines. However, this approximation often degenerates as the number of machines increases. In this paper, building on the recent optimal weighted average method, we introduce a new technique, ACOWA, that allows an extra round of communication to achieve noticeably better approximation quality with minor runtime increases. Results show that for sparse distributed logistic regression, ACOWA obtains solutions that are more faithful to the empirical risk minimizer and attain substantially higher accuracy than other distributed algorithms.	翻訳日:2024-06-05 21:02:09 公開日:2024-06-03
# スペーサー、より良く、より深く、より強く:厳密な直交初期化によるスパーストレーニングの改善 Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal Initialization ( http://arxiv.org/abs/2406.01755v1 ) ライセンス: Link先を確認	Aleksandra Irena Nowak, Łukasz Gniecki, Filip Szatkowski, Jacek Tabor,	(参考訳) 静的スパーストレーニングは、スパースモデルをスクラッチからトレーニングすることを目的としており、近年顕著な成果を上げている。鍵となる設計選択はスパース初期化によって与えられ、バイナリマスクを介してトレーニング可能なサブネットワークを決定する。既存の方法は、あらかじめ定義された密接な初期化に基づいて、主にそのようなマスクを選択する。このようなアプローチは、最適化に対するマスクの潜在的影響を効果的に活用できないかもしれない。動的等尺性の研究にインスパイアされた別の方向は、勾配信号の安定化に役立つスパースサブネットワークに直交性を導入することである。そこで本研究では,ランダムなアジェンダ回転の合成に基づく,新しいスパースな直交初期化スキームであるExact Orthogonal Initialization (EOI)を提案する。他の既存手法とは対照的に、我々の手法は正確な(近似されていない)直交性を提供し、任意の密度を持つ層の作成を可能にする。実験によりEOIの優れた有効性と効率を実証し、共通のスパース初期化技術より一貫して優れていることを示す。本手法は,スパルスマスク選択に伴う静的スパーストレーニングにおいて,重量初期化の重要な役割を強調し,残差接続や正規化を伴わない1000層MLPおよびCNNネットワークの高度スパース訓練を可能にする。コードはhttps://github.com/woocash2/sparser-better-deeper-strongerで公開されている。 Static sparse training aims to train sparse models from scratch, achieving remarkable results in recent years. A key design choice is given by the sparse initialization, which determines the trainable sub-network through a binary mask. Existing methods mainly select such mask based on a predefined dense initialization. Such an approach may not efficiently leverage the mask's potential impact on the optimization. An alternative direction, inspired by research into dynamical isometry, is to introduce orthogonality in the sparse subnetwork, which helps in stabilizing the gradient signal. In this work, we propose Exact Orthogonal Initialization (EOI), a novel sparse orthogonal initialization scheme based on composing random Givens rotations. Contrary to other existing approaches, our method provides exact (not approximated) orthogonality and enables the creation of layers with arbitrary densities. We demonstrate the superior effectiveness and efficiency of EOI through experiments, consistently outperforming common sparse initialization techniques. 
Our method enables training highly sparse 1000-layer MLP and CNN networks without residual connections or normalization techniques, emphasizing the crucial role of weight initialization in static sparse training alongside sparse mask selection. The code is available at https://github.com/woocash2/sparser-better-deeper-stronger	翻訳日:2024-06-05 21:02:09 公開日:2024-06-03
# 位置:集団化社会に向けてのカスケード格差の法則を破る Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities ( http://arxiv.org/abs/2406.01757v1 ) ライセンス: Link先を確認	Golnoosh Farnadi, Mohammad Havaei, Negar Rostamzadeh,	(参考訳) 基礎モデルの台頭はAIを前進させる大きな可能性を秘めているが、この進歩は既存のリスクと不平等を増幅し、余分なコミュニティを置き去りにする可能性がある。本稿では,疎外化社会への格差 - パフォーマンス, 表現, プライバシー, 堅牢性, 解釈可能性, 安全性 - は, 孤立した関心事ではなく, カスケード的不一致現象の相互接続要素である,と論じる。我々は、基礎モデルと伝統的なモデルとを対比し、限界化コミュニティに対する更なる格差の可能性を強調します。さらに,相互接続の相違が長期的負の結果を招きうる基礎モデルにおいて,カスケードの影響の独特な脅威を強調した。機械学習の文脈において、余分なコミュニティを定義し、格差の多面的な性質を探求する。我々はこれらの格差の源泉を分析し、データ作成、トレーニング、展開手順からそれらを追跡し、複雑な技術的・社会技術的景観を強調します。プレッシャー危機を緩和するため、我々は、その源泉における格差を軽減するための一連の行動を呼び掛けて結論づける。 The rise of foundation models holds immense promise for advancing AI, but this progress may amplify existing risks and inequalities, leaving marginalized communities behind. In this position paper, we discuss that disparities towards marginalized communities - performance, representation, privacy, robustness, interpretability and safety - are not isolated concerns but rather interconnected elements of a cascading disparity phenomenon. We contrast foundation models with traditional models and highlight the potential for exacerbated disparity against marginalized communities. Moreover, we emphasize the unique threat of cascading impacts in foundation models, where interconnected disparities can trigger long-lasting negative consequences, specifically to the people on the margin. We define marginalized communities within the machine learning context and explore the multifaceted nature of disparities. We analyze the sources of these disparities, tracing them from data creation, training and deployment procedures to highlight the complex technical and socio-technical landscape. To mitigate the pressing crisis, we conclude with a set of calls to action to mitigate disparity at its source.	翻訳日:2024-06-05 21:02:09 公開日:2024-06-03
# LatentからLucidへ:知識グラフの埋め込みを解釈可能な構造に変換する From Latent to Lucid: Transforming Knowledge Graph Embeddings into Interpretable Structures ( http://arxiv.org/abs/2406.01759v1 ) ライセンス: Link先を確認	Christoph Wehner, Chrysa Iliopoulou, Tarek R. Besold,	(参考訳) 本稿では,知識グラフ埋め込みモデルに適したポストホックな説明可能なAI手法を提案する。これらのモデルは知識グラフ補完にとって必須であり、不透明でブラックボックスの性質を批判している。高次元の潜在表現を通して知識グラフのセマンティクスを捉えることに大きな成功にもかかわらず、その固有の複雑さは説明可能性に重大な課題をもたらす。既存手法とは異なり,本手法は知識グラフ埋め込みモデルによって符号化された潜在表現を直接デコードし,類似の埋め込みが知識グラフ内の類似した振る舞いを反映する原理を活用する。類似した埋め込みエンティティのサブグラフ近傍の異なる構造を同定することにより、モデルが依存する統計規則を同定し、これらの知見を人間の理解可能な象徴的規則や事実に変換する。これにより、知識グラフ埋め込みモデルの抽象表現と予測出力とのギャップを埋め、明確で解釈可能な洞察を提供する。主要なコントリビューションには、知識グラフ埋め込みモデルのための、新しいポストホックな説明可能なAIメソッドが含まれている。このメソッドの柔軟性は、多様なユーザニーズを満たすルールベース、インスタンスベース、アナロジーベースの説明の生成を可能にする。広範囲な評価は、忠実で局所的な説明を提供することにおける我々のアプローチの有効性を示し、知識グラフ埋め込みモデルの透明性と信頼性を高めている。 This paper introduces a post-hoc explainable AI method tailored for Knowledge Graph Embedding models. These models are essential to Knowledge Graph Completion yet criticized for their opaque, black-box nature. Despite their significant success in capturing the semantics of knowledge graphs through high-dimensional latent representations, their inherent complexity poses substantial challenges to explainability. Unlike existing methods, our approach directly decodes the latent representations encoded by Knowledge Graph Embedding models, leveraging the principle that similar embeddings reflect similar behaviors within the Knowledge Graph. By identifying distinct structures within the subgraph neighborhoods of similarly embedded entities, our method identifies the statistical regularities on which the models rely and translates these insights into human-understandable symbolic rules and facts. This bridges the gap between the abstract representations of Knowledge Graph Embedding models and their predictive outputs, offering clear, interpretable insights. 
Key contributions include a novel post-hoc explainable AI method for Knowledge Graph Embedding models that provides immediate, faithful explanations without retraining, facilitating real-time application even on large-scale knowledge graphs. The method's flexibility enables the generation of rule-based, instance-based, and analogy-based explanations, meeting diverse user needs. Extensive evaluations show our approach's effectiveness in delivering faithful and well-localized explanations, enhancing the transparency and trustworthiness of Knowledge Graph Embedding models.	翻訳日:2024-06-05 21:02:09 公開日:2024-06-03
# タイムビン光子を介する閉じ込められた原子の高忠実リモート絡み合い High-fidelity remote entanglement of trapped atoms mediated by time-bin photons ( http://arxiv.org/abs/2406.01761v1 ) ライセンス: Link先を確認	Sagnik Saha, Mikhail Shalaev, Jameson O'Reilly, Isabella Goetting, George Toh, Ashish Kalakuntla, Yichao Yu, Christopher Monroe,	(参考訳) 量子処理ノード間のフォトニック相互接続は、大規模量子コンピュータやネットワークを実現する唯一の方法である。このようなアーキテクチャのボトルネックは、よく分離された量子メモリとフライング光子のインターフェイスである。遠隔分離された原子量子ビットメモリ間の高忠実な絡み合いを確立し, パルスのタイミングに蓄えられたフォトニック量子ビットを介する。このような時間ビン符号化は偏極誤差に対する感度を除去し、長距離量子通信を可能にし、2つ以上の状態を持つ量子メモリに拡張可能である。測定に基づく誤差検出プロセスを用い,原子再コイルによる基本的な誤差源を抑制することにより,97%のエンタングルメント忠実度を実現し,99.9%を超える忠実度が実現可能であることを示す。 Photonic interconnects between quantum processing nodes are likely the only way to achieve large-scale quantum computers and networks. The bottleneck in such an architecture is the interface between well-isolated quantum memories and flying photons. We establish high-fidelity entanglement between remotely separated trapped atomic qubit memories, mediated by photonic qubits stored in the timing of their pulses. Such time-bin encoding removes sensitivity to polarization errors, enables long-distance quantum communication, and is extensible to quantum memories with more than two states. Using a measurement-based error detection process and suppressing a fundamental source of error due to atomic recoil, we achieve an entanglement fidelity of 97% and show that fidelities beyond 99.9% are feasible.	翻訳日:2024-06-05 21:02:09 公開日:2024-06-03
# Non-Asymptotic Analysis for Single-Loop (Natural) Actor-Critic with Compatible Function Approximation ( http://arxiv.org/abs/2406.01762v1 ) License: see link	Yudan Wang, Yue Wang, Yi Zhou, Shaofeng Zou	Actor-critic (AC) is a powerful method for learning an optimal policy in reinforcement learning, where the critic uses algorithms, e.g., temporal difference (TD) learning with function approximation, to evaluate the current policy and the actor updates the policy along an approximate gradient direction using information from the critic. This paper provides the tightest non-asymptotic convergence bounds for both the AC and natural AC (NAC) algorithms. Specifically, existing studies show that AC converges to an $\epsilon+\varepsilon_{\text{critic}}$ neighborhood of stationary points with the best known sample complexity of $\mathcal{O}(\epsilon^{-2})$ (up to a log factor), and NAC converges to an $\epsilon+\varepsilon_{\text{critic}}+\sqrt{\varepsilon_{\text{actor}}}$ neighborhood of the global optimum with the best known sample complexity of $\mathcal{O}(\epsilon^{-3})$, where $\varepsilon_{\text{critic}}$ is the approximation error of the critic and $\varepsilon_{\text{actor}}$ is the approximation error induced by the insufficient expressive power of the parameterized policy class. This paper analyzes the convergence of both AC and NAC algorithms with compatible function approximation. Our analysis eliminates the term $\varepsilon_{\text{critic}}$ from the error bounds while still achieving the best known sample complexities. Moreover, we focus on the challenging single-loop setting with a single Markovian sample trajectory. Our major technical novelty lies in analyzing the stochastic bias due to policy-dependent and time-varying compatible function approximation in the critic, and handling the non-ergodicity of the MDP due to the single Markovian sample trajectory. Numerical results are also provided in the appendix.	Translated: 2024-06-05 21:02:09 Published: 2024-06-03
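The critic step mentioned in the abstract above (TD learning with function approximation) can be illustrated with a minimal sketch. This is the textbook TD(0) update for a linear value estimate, not the paper's single-loop algorithm or its compatible function approximation; the function name `td0_update` and all parameter names are hypothetical.

```python
def td0_update(w, phi_s, phi_next, reward, gamma, alpha):
    """One TD(0) step for a linear value estimate V(s) = w . phi(s).

    w        : current critic weights (list of floats)
    phi_s    : feature vector of the current state
    phi_next : feature vector of the next state
    reward   : observed reward for the transition
    gamma    : discount factor
    alpha    : critic step size
    (All names are illustrative assumptions, not taken from the paper.)
    """
    v_s = sum(wi * fi for wi, fi in zip(w, phi_s))
    v_next = sum(wi * fi for wi, fi in zip(w, phi_next))
    # Temporal-difference error: how far the one-step target is from V(s).
    delta = reward + gamma * v_next - v_s
    # Semi-gradient step: move the weights along the current features.
    return [wi + alpha * delta * fi for wi, fi in zip(w, phi_s)]
```

In a single-loop scheme, a critic step like this and one actor step would alternate on the same sample trajectory, rather than running the critic to convergence before each actor update.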
# An approximation-based approach versus an AI one for the study of CT images of abdominal aorta aneurysms ( http://arxiv.org/abs/2406.01764v1 ) License: see link	Lucrezia Rinelli, Arianna Travaglini, Nicolò Vescera, Gianluca Vinti	This study evaluates two approaches applied to computed tomography (CT) images of patients with abdominal aortic aneurysm: one deterministic, based on tools of Approximation Theory, and one based on Artificial Intelligence. Both aim to segment the basal CT images to extract the patent area of the aortic vessel, in order to propose an alternative to nephrotoxic contrast agents for diagnosing this pathology. While the deterministic approach employs sampling Kantorovich operators and the theory behind them, leveraging the reconstruction and enhancement capabilities of these operators applied to images, the artificial intelligence-based approach relies on a U-Net neural network. The results obtained from testing the two methods have been compared numerically and visually to assess their performance, demonstrating that both models yield accurate results.	Translated: 2024-06-05 21:02:09 Published: 2024-06-03
# Reproducibility Study on Adversarial Attacks Against Robust Transformer Trackers ( http://arxiv.org/abs/2406.01765v1 ) License: see link	Fatemeh Nourilenjan Nokabadi, Jean-François Lalonde, Christian Gagné	New transformer networks have been integrated into object tracking pipelines and have demonstrated strong performance on the latest benchmarks. This paper focuses on understanding how transformer trackers behave under adversarial attacks and how different attacks perform on tracking datasets as their parameters change. We conducted a series of experiments to evaluate the effectiveness of existing adversarial attacks on object trackers with transformer and non-transformer backbones. We experimented on 7 different trackers, including 3 that are transformer-based and 4 that leverage other architectures. These trackers are tested against 4 recent attack methods to assess their performance and robustness on the VOT2022ST, UAV123 and GOT10k datasets. Our empirical study focuses on evaluating the adversarial robustness of object trackers based on bounding box versus binary mask predictions, and attack methods at different levels of perturbation. Interestingly, our study found that altering the perturbation level may not significantly affect the overall object tracking results after the attack. Similarly, the sparsity and imperceptibility of the attack perturbations may remain stable against perturbation level shifts. By applying a specific attack on all transformer trackers, we show that new transformer trackers with stronger cross-attention modeling achieve greater adversarial robustness on tracking datasets such as VOT2022ST and GOT10k. Our results also indicate the necessity for new attack methods to effectively tackle the latest types of transformer trackers. The code necessary to reproduce this study is available at https://github.com/fatemehN/ReproducibilityStudy.	Translated: 2024-06-05 21:02:09 Published: 2024-06-03
# How Does Gradient Descent Learn Features -- A Local Analysis for Regularized Two-Layer Neural Networks ( http://arxiv.org/abs/2406.01766v1 ) License: see link	Mo Zhou, Rong Ge	The ability to learn useful features is one of the major advantages of neural networks. Although recent works show that neural networks can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works also demonstrate the potential for neural networks to go beyond the NTK regime and perform feature learning. Recently, a line of work highlighted the feature learning capabilities of the early stages of gradient-based training. In this paper we consider another mechanism for feature learning via gradient descent through a local convergence analysis. We show that once the loss is below a certain threshold, gradient descent with a carefully regularized objective will capture ground-truth directions. Our results demonstrate that feature learning not only happens at the initial gradient steps, but can also occur towards the end of training.	Translated: 2024-06-05 21:02:09 Published: 2024-06-03
# Provable Optimality of the Square-Tooth Atomic Frequency Comb Quantum Memory ( http://arxiv.org/abs/2406.01769v1 ) License: see link	Allen Zang, Martin Suchara, Tian Zhong	Atomic frequency comb (AFC) quantum memories are a promising technology for quantum repeater networks because they enable multi-mode, high-fidelity storage of photons with on-demand retrieval. Optimizing the retrieval efficiency of an AFC memory is important because it strongly impacts the entanglement generation rate in quantum networks. Despite initial theoretical analyses and recent experimental demonstrations, a rigorous proof of the universally optimal configuration for the highest AFC retrieval efficiency has not been presented. In this paper we offer a simple analytical proof which shows that the optimized square-tooth AFC provides the highest retrieval efficiency among all possible comb tooth shapes, under the physical constraint of maximal optical depth of an atomic ensemble. The optimality still holds even when the non-zero background absorption and the finite homogeneous broadening of atoms are considered. Our proof provides experimentalists with rigorous arguments for how to create an optimal AFC under realistic experimental conditions. Finally, we also identify other functional optimization problems where our proof technique is applicable, thus proving the optimality of the square function in more general scenarios.	Translated: 2024-06-05 20:52:25 Published: 2024-06-03
# LLMs Beyond English: Scaling the Multilingual Capability of LLMs with Cross-Lingual Feedback ( http://arxiv.org/abs/2406.01771v1 ) License: see link	Wen Lai, Mohsen Mesgar, Alexander Fraser	To democratize large language models (LLMs) to most natural languages, it is imperative to make these models capable of understanding and generating texts in many languages, in particular low-resource ones. While recent multilingual LLMs demonstrate remarkable performance in such capabilities, these LLMs still support a limited number of human languages due to the lack of training data for low-resource languages. Moreover, these LLMs are not yet aligned with human preference for downstream tasks, which is crucial for the success of LLMs in English. In this paper, we introduce xLLaMA-100 and xBLOOM-100 (collectively xLLMs-100), which scale the multilingual capabilities of LLaMA and BLOOM to 100 languages. To do so, we construct two datasets: a multilingual instruction dataset including 100 languages, which represents the largest language coverage to date, and a cross-lingual human feedback dataset encompassing 30 languages. We perform multilingual instruction tuning on the constructed instruction data and further align the LLMs with human feedback using the DPO algorithm on our cross-lingual human feedback dataset. We evaluate the multilingual understanding and generating capabilities of xLLMs-100 on five multilingual benchmarks. Experimental results show that xLLMs-100 consistently outperforms its peers across the benchmarks by considerable margins, defining a new state-of-the-art multilingual LLM that supports 100 languages.	Translated: 2024-06-05 20:52:25 Published: 2024-06-03
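The abstract above aligns the models with the DPO algorithm. As a rough illustration, the standard per-example DPO objective can be sketched as below. The inputs are assumed to be summed token log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the policy being trained and under a frozen reference model. The function name, argument names, and the default `beta` are illustrative assumptions, not values from the paper.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * margin).

    margin compares how much more the policy prefers the chosen response
    over the rejected one, relative to the reference model.
    (Argument names and the default beta are hypothetical.)
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)); branch to avoid overflow.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy equals the reference model the margin is zero and the loss is log 2; the loss falls as the policy shifts probability toward the preferred response.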
# Efficient Data Distribution Estimation for Accelerated Federated Learning ( http://arxiv.org/abs/2406.01774v1 ) License: see link	Yuanli Wang, Lei Huang	Federated Learning (FL) is a privacy-preserving machine learning paradigm where a global model is trained in-situ across a large number of distributed edge devices. These systems often comprise millions of user devices, and only a subset of available devices can be used for training in each epoch. Designing a device selection strategy is challenging, given that devices are highly heterogeneous in both their system resources and training data. This heterogeneity makes device selection crucial for timely model convergence and sufficient model accuracy. To tackle the FL client heterogeneity problem, various client selection algorithms have been developed, showing promising performance improvement in terms of model coverage and accuracy. In this work, we study the overhead of client selection algorithms in a large-scale FL environment. Then we propose an efficient data distribution summary calculation algorithm to reduce the overhead in a real-world large-scale FL environment. The evaluation shows that our proposed solution could achieve up to 30x reduction in data summary time, and up to 360x reduction in clustering time.	Translated: 2024-06-05 20:52:25 Published: 2024-06-03
# OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models ( http://arxiv.org/abs/2406.01775v1 ) License: see link	Kerim Büyükakyüz	The advent of large language models (LLMs) has revolutionized natural language processing, enabling unprecedented capabilities in understanding and generating human-like text. However, the computational cost and convergence times associated with fine-tuning these models remain significant challenges. Low-Rank Adaptation (LoRA) has emerged as a promising method to mitigate these issues by introducing efficient fine-tuning techniques with a reduced number of trainable parameters. In this paper, we present OLoRA, an enhancement to the LoRA method that leverages orthonormal matrix initialization through QR decomposition. OLoRA significantly accelerates the convergence of LLM training while preserving the efficiency benefits of LoRA, such as the number of trainable parameters and GPU memory footprint. Our empirical evaluations demonstrate that OLoRA not only converges faster but also exhibits improved performance compared to standard LoRA across a variety of language modeling tasks. This advancement opens new avenues for more efficient and accessible fine-tuning of LLMs, potentially enabling broader adoption and innovation in natural language applications.	Translated: 2024-06-05 20:52:25 Published: 2024-06-03
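The abstract above says only that OLoRA uses "orthonormal matrix initialization through QR decomposition". One plausible reading, sketched below in plain Python, is to factor a pretrained weight matrix as W = QR, take the first r columns of Q and the first r rows of R as the two low-rank adapter factors, and keep the remainder as the frozen base weight. Which matrix is factorized, how the factors are assigned, and the names `qr_gram_schmidt` and `olora_style_init` are all assumptions made for illustration. The sketch also assumes W has full column rank.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def qr_gram_schmidt(W):
    """Thin QR of an m x n matrix (m >= n, full column rank), modified Gram-Schmidt."""
    m, n = len(W), len(W[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [W[i][j] for i in range(m)]
        for k in range(j):
            # Remove the component of column j along the k-th orthonormal direction.
            R[k][j] = sum(Q[i][k] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * Q[i][k] for i in range(m)]
        R[j][j] = sum(x * x for x in v) ** 0.5
        for i in range(m):
            Q[i][j] = v[i] / R[j][j]
    return Q, R


def olora_style_init(W, r):
    """Split W into a frozen residual plus rank-r factors B (m x r) and A (r x n).

    B has orthonormal columns; W_res + B @ A reproduces W exactly at initialization.
    (Hypothetical helper; a sketch of one reading of the abstract.)
    """
    Q, R = qr_gram_schmidt(W)
    B = [row[:r] for row in Q]
    A = [R[k][:] for k in range(r)]
    BA = matmul(B, A)
    W_res = [[W[i][j] - BA[i][j] for j in range(len(W[0]))] for i in range(len(W))]
    return W_res, B, A
```

Because the residual plus the adapter product equals the original weight, a model initialized this way produces the same outputs as the pretrained one before any training step. A real implementation would use a library QR routine rather than hand-written Gram-Schmidt.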
# DEFT: Efficient Finetuning of Conditional Diffusion Models by Learning the Generalised $h$-transform ( http://arxiv.org/abs/2406.01781v1 ) License: see link	Alexander Denker, Francisco Vargas, Shreyas Padhy, Kieran Didi, Simon Mathis, Vincent Dutordoir, Riccardo Barbano, Emile Mathieu, Urszula Julia Komorowska, Pietro Lio	Generative modelling paradigms based on denoising diffusion processes have emerged as a leading candidate for conditional sampling in inverse problems. In many real-world applications, we often have access to large, expensively trained unconditional diffusion models, which we aim to exploit for improving conditional sampling. Most recent approaches are motivated heuristically and lack a unifying framework, obscuring connections between them. Further, they often suffer from issues such as being very sensitive to hyperparameters, being expensive to train or needing access to weights hidden behind a closed API. In this work, we unify conditional training and sampling using the mathematically well-understood Doob's $h$-transform. This new perspective allows us to unify many existing methods under a common umbrella. Under this framework, we propose DEFT (Doob's h-transform Efficient FineTuning), a new approach for conditional generation that simply fine-tunes a very small network to quickly learn the conditional $h$-transform, while keeping the larger unconditional network unchanged. DEFT is much faster than existing baselines while achieving state-of-the-art performance across a variety of linear and non-linear benchmarks. On image reconstruction tasks, we achieve speedups of up to 1.6$\times$, while having the best perceptual quality on natural images and reconstruction performance on medical images.	Translated: 2024-06-05 20:52:25 Published: 2024-06-03
# Multi-agent assignment via state augmented reinforcement learning ( http://arxiv.org/abs/2406.01782v1 ) License: see link	Leopoldo Agorio, Sean Van Alen, Miguel Calvo-Fullana, Santiago Paternain, Juan Andres Bazerque	We address the conflicting requirements of a multi-agent assignment problem through constrained reinforcement learning, emphasizing the inadequacy of standard regularization techniques for this purpose. Instead, we resort to a state augmentation approach in which the oscillation of dual variables is exploited by agents to alternate between tasks. In addition, we coordinate the actions of the multiple agents acting on their local states through these multipliers, which are gossiped through a communication network, eliminating the need to access other agent states. By these means, we propose a distributed multi-agent assignment protocol with theoretical feasibility guarantees that we corroborate in a monitoring numerical experiment.	Translated: 2024-06-05 20:52:25 Published: 2024-06-03
# Unitary Dynamics for Open Quantum Systems with Density-Matrix Purification ( http://arxiv.org/abs/2406.01783v1 ) License: see link	Luis H. Delgado-Granados, Samuel Warren, David A. Mazziotti	Accurate modeling of quantum systems interacting with environments requires addressing non-unitary dynamics, which significantly complicates computational approaches. In this work, we enhance an open quantum system (OQS) theory using density-matrix purification, enabling a unitary description of dynamics by entangling the system with an environment of equal dimension. We first establish the connection between density-matrix purification and conventional OQS methods. We then demonstrate the standalone applicability of purification theory by deriving system-environment interactions from fundamental design principles. Using model systems, we show that the purification approach extends beyond the complete positivity condition and effectively models both Markovian and non-Markovian dynamics. Finally, we implement density-matrix purification on a quantum simulator, illustrating its capability to map non-unitary OQS dynamics onto a unitary framework suitable for quantum computers.	Translated: 2024-06-05 20:52:25 Published: 2024-06-03
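The purification step described in the abstract above (a mixed system state written as a pure state on the system plus an environment of equal dimension) can be sketched as follows. Any factorization rho = M M† yields amplitudes psi[i][j] = M[i][j] for the joint basis state |i>_S |j>_E, and tracing out the environment recovers rho. The sketch uses a Cholesky factor as one convenient choice, which assumes rho is positive definite (full rank). It is not the authors' construction, and the function names are hypothetical.

```python
def cholesky(rho):
    """Lower-triangular M with rho = M M^dagger for a positive-definite Hermitian matrix."""
    n = len(rho)
    M = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(M[i][k] * M[j][k].conjugate() for k in range(j))
            if i == j:
                # Diagonal entries are real and positive for positive-definite rho.
                M[i][j] = complex((rho[i][i] - s).real ** 0.5)
            else:
                M[i][j] = (rho[i][j] - s) / M[j][j]
    return M


def purify(rho):
    """Amplitudes psi[i][j] of |i>_S |j>_E with Tr_E |psi><psi| = rho (full-rank rho assumed)."""
    return cholesky(rho)


def reduced_state(psi):
    """Partial trace over the environment index of the pure state psi."""
    n = len(psi)
    return [[sum(psi[i][k] * psi[j][k].conjugate() for k in range(n))
             for j in range(n)] for i in range(n)]
```

Once the state is purified, non-unitary evolution of rho can in principle be represented by a unitary acting on the joint system-environment state, which is the form a quantum computer can execute. How that unitary is chosen is the subject of the paper and is not shown here.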
# Recent Advances in Data-Driven Business Process Management ( http://arxiv.org/abs/2406.01786v1 ) License: see link	Lars Ackermann, Martin Käppel, Laura Marcus, Linda Moder, Sebastian Dunzer, Markus Hornsteiner, Annina Liessmann, Yorck Zisgen, Philip Empl, Lukas-Valentin Herm, Nicolas Neis, Julian Neuberger, Leo Poss, Myriam Schaschek, Sven Weinzierl, Niklas Wördehoff, Stefan Jablonski, Agnes Koschmider, Wolfgang Kratsch, Martin Matzner, Stefanie Rinderle-Ma, Maximilian Röglinger, Stefan Schönig, Axel Winkelmann	The rapid development of cutting-edge technologies, the increasing volume of data, and the availability and processability of new types of data sources have led to a paradigm shift in data-based management and decision-making. Since business processes are at the core of organizational work, these developments heavily impact business process management (BPM) as a crucial success factor for organizations. In view of this emerging potential, data-driven business process management has become a relevant and vibrant research area. Given the complexity and interdisciplinarity of the research field, this position paper presents research insights regarding data-driven BPM.	Translated: 2024-06-05 20:52:25 Published: 2024-06-03
# RSMM: A Framework to Assess Maturity of Research Software Project ( http://arxiv.org/abs/2406.01788v1 ) License: see link	Deekshitha, Rena Bakhshi, Jason Maassen, Carlos Martinez Ortiz, Rob van Nieuwpoort, Slinger Jansen	Organizations and researchers producing research software face a common problem: making their software sustainable beyond the funding provided by a single research project. Research software engineers address this by building communities around their software, providing appropriate licensing, creating reliable and reproducible research software, making it sustainable and impactful, promoting it, and ensuring that it is easy to adopt in research workflows. As a result, numerous practices and guidelines exist to enhance research software quality, reusability, and sustainability. However, there is a lack of a unified framework to systematically integrate these practices and help organizations and research software developers refine their development and management processes. Our paper aims to bridge this gap by introducing a novel framework: RSMM. It is designed through a systematic literature review and insights from interviews with research software project experts. In short, RSMM offers a structured pathway for evaluating and refining research software project management by categorizing 79 best practices into 17 capabilities across 4 focus areas. From assessing code quality and security to measuring impact, sustainability, and reproducibility, the model provides a complete evaluation of a research software project's maturity. With RSMM, individuals as well as organizations involved in research software development gain a systematic approach to tackling various research software engineering challenges. By utilizing RSMM as a comprehensive checklist, organizations can systematically evaluate and refine their project management practices and organizational structure.	Translated: 2024-06-05 20:52:25 Published: 2024-06-03
# AI-based Classification of Customer Support Tickets: State of the Art and Implementation with AutoML ( http://arxiv.org/abs/2406.01789v1 ) License: see link	Mario Truss, Stephan Boehm	Automating support ticket classification is crucial for improving customer support performance and shortening the resolution time for customer inquiries. This research aims to test the applicability of automated machine learning (AutoML) as a technology to train a machine learning model (ML model) that can classify support tickets. The model evaluation conducted in this research shows that AutoML can be used to train ML models with good classification performance. Moreover, this paper fills a research gap by providing new insights into developing AI solutions without a dedicated professional by utilizing AutoML, which makes this technology more accessible for companies without specialized AI departments and staff.	Translated: 2024-06-05 20:52:25 Published: 2024-06-03
# Hybrid-Learning Video Moment Retrieval across Multi-Domain Labels ( http://arxiv.org/abs/2406.01791v1 ) License: see link	Weitong Cai, Jiabo Huang, Shaogang Gong	Video moment retrieval (VMR) is the task of searching for a visual temporal moment in an untrimmed raw video given a text query description (sentence). Existing studies either start from collecting exhaustive frame-wise annotations on the temporal boundary of target moments (fully-supervised), or learn with only the video-level video-text pairing labels (weakly-supervised). The former generalises poorly to unknown concepts and/or novel scenes due to restricted dataset scale and diversity under expensive annotation costs; the latter is subject to visual-textual mis-correlations from incomplete labels. In this work, we introduce a new approach called hybrid-learning video moment retrieval to solve the problem by knowledge transfer through adapting the video-text matching relationships learned from a fully-supervised source domain to a weakly-labelled target domain when they do not share a common label space. Our aim is to explore shared universal knowledge between the two domains in order to improve model learning in the weakly-labelled target domain. Specifically, we introduce a multiplE branch Video-text Alignment model (EVA) that performs cross-modal (visual-textual) matching information sharing and multi-modal feature alignment to optimise domain-invariant visual and textual features as well as per-task discriminative joint video-text representations. Experiments show EVA's effectiveness in exploring temporal segment annotations in a source domain to help learn video moment retrieval without temporal labels in a target domain.	Translated: 2024-06-05 20:52:25 Published: 2024-06-03
# Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning ( http://arxiv.org/abs/2406.01793v1 ) License: see link	Andreas Schlaginhaufen, Maryam Kamgarpour	Inverse reinforcement learning (IRL) aims to infer a reward from expert demonstrations, motivated by the idea that the reward, rather than the policy, is the most succinct and transferable description of a task [Ng et al., 2000]. However, the reward corresponding to an optimal policy is not unique, making it unclear if an IRL-learned reward is transferable to new transition laws in the sense that its optimal policy aligns with the optimal policy corresponding to the expert's true reward. Past work has addressed this problem only under the assumption of full access to the expert's policy, guaranteeing transferability when learning from two experts with the same reward but different transition laws that satisfy a specific rank condition [Rolland et al., 2022]. In this work, we show that the conditions developed under full access to the expert's policy cannot guarantee transferability in the more practical scenario where we have access only to demonstrations of the expert. Instead of a binary rank condition, we propose principal angles as a more refined measure of similarity and dissimilarity between transition laws. Based on this, we then establish two key results: 1) a sufficient condition for transferability to any transition laws when learning from at least two experts with sufficiently different transition laws, and 2) a sufficient condition for transferability to local changes in the transition law when learning from a single expert. Furthermore, we also provide a probably approximately correct (PAC) algorithm and an end-to-end analysis for learning transferable rewards from demonstrations of multiple experts.	Translated: 2024-06-05 20:52:25 Published: 2024-06-03
# ブロックチェーン検証器のジレンマに対するPeer-Predictionソリューションは2つ It Takes Two: A Peer-Prediction Solution for Blockchain Verifier's Dilemma ( http://arxiv.org/abs/2406.01794v1 ) ライセンス: Link先を確認	Zishuo Zhao, Xi Chen, Yuan Zhou,	(参考訳) ブロックチェーンシステムのセキュリティは、基本的には、大多数の当事者が誠実に振る舞う分散コンセンサスに基づいており、ブロックチェーンシステムの堅牢性を維持するためには、コンテンツ検証のプロセスが不可欠である。しかし、不正行為者が少ない、あるいは全くないセキュアなブロックチェーンシステムが、検証者が正直に検証を行うのに十分なインセンティブを与えられないという現象は、検証者のジレンマと呼ばれ、ブロックチェーンシステムの基本的なセキュリティを著しく損なう可能性がある。既存の研究は遅延検証の非インセンティブ化のために意図的にエラーを挿入しようと試みているが、分散環境は検証の正しさを判断したり、悪意のある検証を直接検出することは不可能である。本稿では,複数の検証者間での分散検証ゲームのためのベイズ的真理機構の設計に対するピア予測手法を活用する研究を開始し,検証プロセスにおけるノイズ観測の存在下においても,基礎的真理にアクセスせずに誠実な検証を行うよう,検証者全員にインセンティブを与える。理論的に検証ゲームのメカニズムの真実性を保証することで、当社の作業は、ブロックチェーンやその他の分散システムのセキュリティと堅牢性を向上する検証メカニズムのフレームワークを提供します。 The security of blockchain systems is fundamentally based on the decentralized consensus in which the majority of parties behave honestly, and the process of content verification is essential to keep the robustness of blockchain systems. However, the phenomenon that a secure blockchain system with few or no cheaters could not provide sufficient incentive for verifiers to honestly perform the costly verification, referred to as the Verifier's Dilemma, could severely undermine the fundamental security of blockchain systems. While existing works have attempted to insert deliberate errors to disincentivize lazy verification, the decentralized environment makes it impossible to judge the correctness of verification or detect malicious verifiers directly. In this paper, we initiate the research that leverages the peer prediction approach towards the design of Bayesian truthful mechanisms for the decentralized verification game among multiple verifiers, incentivizing all verifiers to perform honest verification without access to the ground truth even in the presence of noisy observations in the verification process. 
With theoretically guaranteed truthfulness of our mechanism for the verification game, our work provides a framework of verification mechanisms that enhances the security and robustness of the blockchain and potentially other decentralized systems.	翻訳日:2024-06-05 20:52:25 公開日:2024-06-03
# 継続的ビジュアルオドメトリにおける忘却と転移の実証的影響 The Empirical Impact of Forgetting and Transfer in Continual Visual Odometry ( http://arxiv.org/abs/2406.01797v1 ) ライセンス: Link先を確認	Paolo Cudrano, Xiaoyu Luo, Matteo Matteucci,	(参考訳) ロボティクスが進歩を続けるにつれ、適応的で継続的に学習するエンボディドエージェントの必要性が、特に支援ロボティクスの領域で高まっている。迅速な適応性と長期的な情報保持は、人間の日常生活に典型的な動的な環境での運用に不可欠である。そのため、生涯学習パラダイムが必要であるが、現在のロボティクス文献ではほとんど扱われていない。本研究は、身体性のある設定で継続的に訓練されたニューラルネットワークにおける破滅的忘却の影響と知識転移の有効性を実験的に検討する。我々は、エンボディドエージェントの自己位置推定を可能にするうえで最も重要なビジュアルオドメトリのタスクに焦点をあてる。実験は、異なるアパートを移動するロボットのように、屋内環境間を離散的に遷移する単純な継続シナリオで行う。この設定では、初期段階では環境間の転移性が高く満足のいく性能が観察されるが、その後、モデルが汎化を犠牲にして現在の環境に固有の知識を優先する特殊化段階が続く。従来の正則化戦略とモデル容量の増加は、この現象の緩和には効果がないことがわかった。リハーサルはわずかに有益だが、かなりのメモリコストを伴う。身体性のある設定で一般的に行われるように行動情報を組み込むと、収束は速くなるが特殊化が悪化し、モデルは動きの予測に過度に依存して視覚的手がかりを正しく解釈しにくくなる。これらの知見は、生涯ロボティクスにおける適応と記憶保持のバランスというオープンな課題を強調し、エンボディドエージェントへの生涯学習パラダイムの適用に関する貴重な洞察を与える。 As robotics continues to advance, the need for adaptive and continuously-learning embodied agents increases, particularly in the realm of assistance robotics. Quick adaptability and long-term information retention are essential to operate in dynamic environments typical of humans' everyday lives. A lifelong learning paradigm is thus required, but it is scarcely addressed by current robotics literature. This study empirically investigates the impact of catastrophic forgetting and the effectiveness of knowledge transfer in neural networks trained continuously in an embodied setting. We focus on the task of visual odometry, which holds primary importance for embodied agents in enabling their self-localization. We experiment on the simple continual scenario of discrete transitions between indoor locations, akin to a robot navigating different apartments. In this regime, we observe initial satisfactory performance with high transferability between environments, followed by a specialization phase where the model prioritizes current environment-specific knowledge at the expense of generalization. 
Conventional regularization strategies and increased model capacity prove ineffective in mitigating this phenomenon. Rehearsal is instead mildly beneficial but with the addition of a substantial memory cost. Incorporating action information, as commonly done in embodied settings, facilitates quicker convergence but exacerbates specialization, making the model overly reliant on its motion expectations and less adept at correctly interpreting visual cues. These findings emphasize the open challenges of balancing adaptation and memory retention in lifelong robotics and contribute valuable insights into the application of a lifelong paradigm on embodied agents.	翻訳日:2024-06-05 20:52:25 公開日:2024-06-03
# 人口動態のオンライン制御 Online Control in Population Dynamics ( http://arxiv.org/abs/2406.01799v1 ) ライセンス: Link先を確認	Noah Golowich, Elad Hazan, Zhou Lu, Dhruv Rohatgi, Y. Jennifer Sun,	(参考訳) 人口動態の研究は初期の社会学的な著作(Malthus, 1872)から始まったが、その後生物学、疫学、進化ゲーム理論、経済学など多くの分野に及んだ。人口動態に関するほとんどの研究は、制御よりも予測の問題に焦点を当てている。既存の人口制御の数学的モデルは、しばしば特定のノイズのない力学に制限されるが、現実の人口変動は複雑で敵対的である。このギャップに対処するために,オンライン制御のパラダイムに基づく新しいフレームワークを提案する。まず、進化する個体群を自然にモデル化できる線形力学系の集合を特徴づける。次に、これらのシステムに対して、線形ポリシーの幅広いクラスに関して、ほぼ最適な後悔境界を持つ効率的な勾配ベースの制御を与える。実験により,SIRやレプリケータダイナミクスなどの非線形モデルにおいても,提案アルゴリズムによる個体群制御の有効性が示された。 The study of population dynamics originated with early sociological works (Malthus, 1872) but has since extended into many fields, including biology, epidemiology, evolutionary game theory, and economics. Most studies on population dynamics focus on the problem of prediction rather than control. Existing mathematical models for population control are often restricted to specific, noise-free dynamics, while real-world population changes can be complex and adversarial. To address this gap, we propose a new framework based on the paradigm of online control. We first characterize a set of linear dynamical systems that can naturally model evolving populations. We then give an efficient gradient-based controller for these systems, with near-optimal regret bounds with respect to a broad class of linear policies. Our empirical evaluations demonstrate the effectiveness of the proposed algorithm for population control even in non-linear models such as SIR and replicator dynamics.	翻訳日:2024-06-05 20:52:25 公開日:2024-06-03
# 期待伝播におけるFearless Stochasticity Fearless Stochasticity in Expectation Propagation ( http://arxiv.org/abs/2406.01801v1 ) ライセンス: Link先を確認	Jonathan So, Richard E. Turner,	(参考訳) 予測伝搬 (EP) は確率論的モデルにおいて近似推論を行うアルゴリズムの一群である。 EPの更新には、モンテカルロ(MC)のサンプルから推定できるモーメント(特定の機能の期待)の評価が含まれる。しかし、更新は直感的に行うとMCノイズに対して堅牢ではなく、様々な先行研究が様々な方法でこの問題に対処しようと試みている。本研究では,EPのモーメントマッチング更新に対する新たな視点,すなわち,変動目的の自然な漸進的最適化を実現することを提案する。我々はこの洞察を用いて2つの新しいEP変異体を動機付け、特にMC推定に適した更新を行い、安定であり、単一のサンプルで見積もると最もサンプル効率が高い。これらの新しいバリエーションは、前者の利点と重要な弱点に対処するものである。特に、チューニングが容易で、スピード精度の向上されたトレードオフを提供し、デバイアス推定器の使用に依存しない。様々な確率的推論タスクにおいて有効性を示す。 Expectation propagation (EP) is a family of algorithms for performing approximate inference in probabilistic models. The updates of EP involve the evaluation of moments -- expectations of certain functions -- which can be estimated from Monte Carlo (MC) samples. However, the updates are not robust to MC noise when performed naively, and various prior works have attempted to address this issue in different ways. In this work, we provide a novel perspective on the moment-matching updates of EP; namely, that they perform natural-gradient-based optimisation of a variational objective. We use this insight to motivate two new EP variants, with updates that are particularly well-suited to MC estimation; they remain stable and are most sample-efficient when estimated with just a single sample. These new variants combine the benefits of their predecessors and address key weaknesses. In particular, they are easier to tune, offer an improved speed-accuracy trade-off, and do not rely on the use of debiasing estimators. We demonstrate their efficacy on a variety of probabilistic inference tasks.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
# TabMDA: In-context Subsetting を用いた変換器を用いた任意の分類器に対するタブラルマニフォールドデータ拡張 TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting ( http://arxiv.org/abs/2406.01805v1 ) ライセンス: Link先を確認	Andrei Margeloiu, Adrián Bazaga, Nikola Simidjievski, Pietro Liò, Mateja Jamnik,	(参考訳) タブラルデータは多くの臨界領域で広く使われているが、大量に取得することはしばしば困難である。この不足は、通常、そのようなデータ上での機械学習モデルの性能の低下をもたらす。データ拡張(Data Augmentation)は、視覚と言語タスクのパフォーマンス向上のための一般的な戦略であり、通常、入力空間に明示的な対称性が欠如しているため、表形式のデータではパフォーマンスが低下する。この課題を克服するために,表データの多様体データ拡張法であるTabMDAを導入する。この方法は、TabPFNのような事前訓練されたインコンテキストモデルを使用して、データを多様体空間にマッピングする。 TabMDAは、さまざまなコンテキストでデータを複数回エンコードすることで、ラベル不変変換を実行する。このプロセスは、基礎となるインコンテキストモデルの多様体を探索し、トレーニングデータセットを拡大する。 TabMDAはトレーニング不要のメソッドであり、任意の分類器に適用できる。我々は,5つの標準分類器上でTabMDAを評価し,様々な表付きデータセット間での大幅な性能向上を観察した。この結果から,TabMDAは,事前学習したテキスト内モデルの情報を有効活用し,下流の分類器の性能を向上させることができることを示した。 Tabular data is prevalent in many critical domains, yet it is often challenging to acquire in large quantities. This scarcity usually results in poor performance of machine learning models on such data. Data augmentation, a common strategy for performance improvement in vision and language tasks, typically underperforms for tabular data due to the lack of explicit symmetries in the input space. To overcome this challenge, we introduce TabMDA, a novel method for manifold data augmentation on tabular data. This method utilises a pre-trained in-context model, such as TabPFN, to map the data into a manifold space. TabMDA performs label-invariant transformations by encoding the data multiple times with varied contexts. This process explores the manifold of the underlying in-context models, thereby enlarging the training dataset. TabMDA is a training-free method, making it applicable to any classifier. We evaluate TabMDA on five standard classifiers and observe significant performance improvements across various tabular datasets. 
Our results demonstrate that TabMDA provides an effective way to leverage information from pre-trained in-context models to enhance the performance of downstream classifiers.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
# 文脈型シーケンスの類似性:自然言語生成における信頼度向上 Contextualized Sequence Likelihood: Enhanced Confidence Scores for Natural Language Generation ( http://arxiv.org/abs/2406.01806v1 ) ライセンス: Link先を確認	Zhen Lin, Shubhendu Trivedi, Jimeng Sun,	(参考訳) 大規模言語モデル(LLM)の出現は、多くの自然言語生成タスクにおいて、最先端の技術を劇的に進歩させてきた。 LLMを確実に適用するには、その信頼性を正確に測定することが不可欠である。現在最も一般的に使われている信頼スコア関数は、生成されたシーケンスの確率であり、セマンティックおよび構文成分を混同している。例えば、質問応答(QA)タスクでは、正しい答えの曖昧な表現は、より低い確率予測をもたらす。さらに、異なるトークンはコンテキストによって異なる重み付けをすべきである。本研究では,LLMから抽出した注目値を用いて,様々なトークンに異なる重みを割り当てることで,予測シーケンスの確率を向上させることを提案する。検証セットを用いることで、関連する注意ヘッドを識別し、バニラシーケンスの確率信頼度測定の信頼性を大幅に向上させることができる。我々は、この新しいスコアをContextualized Sequence Likelihood (CSL)と呼ぶ。 CSLは実装が容易で、高速で計算でき、タスク固有のプロンプトでさらに改善する可能性がある。いくつかのQAデータセットと多種多様なLLMの範囲で、CSLはAUROCやAUARCで測定されたように、生成品質の予測において最先端のベースラインよりもはるかに高い信頼性を示している。 The advent of large language models (LLMs) has dramatically advanced the state-of-the-art in numerous natural language generation tasks. For LLMs to be applied reliably, it is essential to have an accurate measure of their confidence. Currently, the most commonly used confidence score function is the likelihood of the generated sequence, which, however, conflates semantic and syntactic components. For instance, in question-answering (QA) tasks, an awkward phrasing of the correct answer might result in a lower probability prediction. Additionally, different tokens should be weighted differently depending on the context. In this work, we propose enhancing the predicted sequence probability by assigning different weights to various tokens using attention values elicited from the base LLM. By employing a validation set, we can identify the relevant attention heads, thereby significantly improving the reliability of the vanilla sequence probability confidence measure. We refer to this new score as the Contextualized Sequence Likelihood (CSL). CSL is easy to implement, fast to compute, and offers considerable potential for further improvement with task-specific prompts. 
Across several QA datasets and a diverse array of LLMs, CSL has demonstrated significantly higher reliability than state-of-the-art baselines in predicting generation quality, as measured by the AUROC or AUARC.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
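The token re-weighting idea in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the helper name `weighted_sequence_loglik`, the choice to normalise the weights so they sum to 1, and the toy numbers are all assumptions made for illustration. The sketch contrasts the vanilla sequence log-likelihood (a plain sum of token log-probabilities) with a variant in which each token's log-probability is scaled by an attention-derived weight.

```python
import math


def sequence_loglik(token_logprobs):
    """Vanilla confidence score: the sum of the token log-probabilities."""
    return sum(token_logprobs)


def weighted_sequence_loglik(token_logprobs, attention_weights):
    """Attention-weighted variant (hypothetical helper, not the paper's code).

    The weights are normalised to sum to 1, so the result is a weighted
    average of the token log-probabilities.
    """
    if len(token_logprobs) != len(attention_weights):
        raise ValueError("need exactly one weight per token")
    total = sum(attention_weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w / total * lp
               for w, lp in zip(attention_weights, token_logprobs))


# Toy example (made-up numbers): the middle token carries the answer and
# receives most of the weight; the other two tokens are phrasing.
logprobs = [math.log(0.5), math.log(0.9), math.log(0.4)]
weights = [0.1, 0.8, 0.1]

vanilla = sequence_loglik(logprobs)
weighted = weighted_sequence_loglik(logprobs, weights)
```

With uniform weights the weighted score reduces to the mean token log-probability, so any difference from that baseline comes only from how the weights are distributed over tokens. How the weights are actually obtained from attention heads and a validation set is described in the paper and is not reproduced here.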
# 物理特性の文脈内学習:分布外分子グラフへのFew-Shot適応 In-Context Learning of Physical Properties: Few-Shot Adaptation to Out-of-Distribution Molecular Graphs ( http://arxiv.org/abs/2406.01808v1 ) ライセンス: Link先を確認	Grzegorz Kaszuba, Amirhossein D. Naghdi, Dario Massa, Stefanos Papanikolaou, Andrzej Jaszkiewicz, Piotr Sankowski,	(参考訳) 大規模な言語モデルでは、提供されたサンプルのシーケンスへの少数ショット適応の能力を示す。この振る舞いは、インコンテキスト学習(in-context learning)として知られるもので、推論中のみに非自明な機械学習タスクを実行することができる。この研究で、我々は、イン・コンテクスト・ラーニングを利用して、配布外物質特性を予測できるだろうか? しかし、効率的な手法が変圧器モデルに原子レベルの幾何学的特徴を渡すことがなければ、構造特性予測タスクでは不可能である。この問題に対処するために、GPT-2が幾何認識型グラフニューラルネットワークの出力に作用し、コンテキスト内情報に適応する複合モデルを用いる。モデルの能力を実証するために、QM9データセットを共通のサブ構造を共有する分子列に分割し、コンテキスト内学習に使用します。このアプローチは, 一般グラフニューラルネットワークモデルを上回る分布外例において, モデルの性能を著しく向上させる。 Large language models manifest the ability of few-shot adaptation to a sequence of provided examples. This behavior, known as in-context learning, allows for performing nontrivial machine learning tasks during inference only. In this work, we address the question: can we leverage in-context learning to predict out-of-distribution materials properties? However, this would not be possible for structure property prediction tasks unless an effective method is found to pass atomic-level geometric features to the transformer model. To address this problem, we employ a compound model in which GPT-2 acts on the output of geometry-aware graph neural networks to adapt in-context information. To demonstrate our model's capabilities, we partition the QM9 dataset into sequences of molecules that share a common substructure and use them for in-context learning. This approach significantly improves the performance of the model on out-of-distribution examples, surpassing the one of general graph neural network models.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
# ゲノミクス概要統計の共有におけるプライバシとユーティリティのトレードオフに対するゲーム理論的アプローチ A Game-Theoretic Approach to Privacy-Utility Tradeoff in Sharing Genomic Summary Statistics ( http://arxiv.org/abs/2406.01811v1 ) ライセンス: Link先を確認	Tao Zhang, Rajagopal Venkatesaramani, Rajat K. De, Bradley A. Malin, Yevgeniy Vorobeychik,	(参考訳) オンラインゲノミクスデータ共有サービスの出現は、サマリ統計などの遺伝的変異に関するクエリーを許可し、スプリケートなゲノム変異と臨床的意義を区別するケアプロバイダを支援することによって、大きなゲノムデータセットのアクセシビリティを高めることを目指している。しかし、多くの研究は、要約ゲノム情報を共有することでさえ、そのようなデータセットの個々のメンバーを、メンバーシップ推論攻撃による重大なプライバシーリスクに晒すことを実証している。ノイズを追加したり、共有する情報の量を減らしたりすることでプライバシーリスクを減らすいくつかのアプローチが出現しているが、これらは通常、比例テスト(LRT)統計を用いた非適応攻撃を前提としている。本稿では,ゲノムサマリー統計の共有において,最適なプライバシ・ユーティリティ・トレードオフのためのベイズゲーム理論フレームワークを提案する。我々の最初の貢献は、我々のゲーム理論的アプローチを定着させる非常に一般的なベイズ攻撃モデルが従来のLRTベースの脅威モデルよりも強力であることを証明することである。攻撃者が非インフォームティブな主観的前者を用いた場合であっても、これは事実であることを示す。次に,ベイズ攻撃と任意の主観的先行点と,微分プライバシーフレームワークに共通するガウス機構の下でのナイマン・ピアソン最適LRT攻撃との比較を行う。最後に、ディープニューラルネットワーク生成器を用いてプレイヤーのベイズ・ナッシュ均衡を近似し、プレイヤーの混合戦略を暗黙的に表現する手法を提案する。実験により,提案したゲーム理論の枠組みは,最先端技術よりも強力な攻撃と強力な防衛戦略をもたらすことが示された。 The advent of online genomic data-sharing services has sought to enhance the accessibility of large genomic datasets by allowing queries about genetic variants, such as summary statistics, aiding care providers in distinguishing between spurious genomic variations and those with clinical significance. However, numerous studies have demonstrated that even sharing summary genomic information exposes individual members of such datasets to a significant privacy risk due to membership inference attacks. While several approaches have emerged that reduce privacy risks by adding noise or reducing the amount of information shared, these typically assume non-adaptive attacks that use likelihood ratio test (LRT) statistics. We propose a Bayesian game-theoretic framework for optimal privacy-utility tradeoff in the sharing of genomic summary statistics. 
Our first contribution is to prove that a very general Bayesian attacker model that anchors our game-theoretic approach is more powerful than the conventional LRT-based threat models in that it induces worse privacy loss for the defender who is modeled as a von Neumann-Morgenstern (vNM) decision-maker. We show this to be true even when the attacker uses a non-informative subjective prior. Next, we present an analytically tractable approach to compare the Bayesian attacks with arbitrary subjective priors and the Neyman-Pearson optimal LRT attacks under the Gaussian mechanism common in differential privacy frameworks. Finally, we propose an approach for approximating Bayes-Nash equilibria of the game using deep neural network generators to implicitly represent player mixed strategies. Our experiments demonstrate that the proposed game-theoretic framework yields both stronger attacks and stronger defense strategies than the state of the art.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
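The likelihood ratio test (LRT) attack that the abstract above treats as the conventional threat model can be sketched as follows. This is a simplified, textbook-style illustration and not the paper's attacker: it assumes one binary allele indicator per variant, independent variants, and known alternate-allele frequencies for the shared dataset (`pool_freqs`) and for a reference population (`ref_freqs`). All names, the threshold, and the toy frequencies are hypothetical.

```python
import math


def lrt_statistic(genotype, pool_freqs, ref_freqs, eps=1e-9):
    """Log-likelihood ratio of 'member of the pool' versus 'reference'.

    Positive values mean the genotype is better explained by the pool's
    allele frequencies than by the reference frequencies.
    """
    if not (len(genotype) == len(pool_freqs) == len(ref_freqs)):
        raise ValueError("inputs must have the same length")
    score = 0.0
    for x, p, r in zip(genotype, pool_freqs, ref_freqs):
        # Clip frequencies away from 0 and 1 to avoid log(0).
        p = min(max(p, eps), 1 - eps)
        r = min(max(r, eps), 1 - eps)
        if x == 1:
            score += math.log(p / r)
        else:
            score += math.log((1 - p) / (1 - r))
    return score


def infer_membership(genotype, pool_freqs, ref_freqs, threshold=0.0):
    """Claim membership when the LRT statistic exceeds the threshold."""
    return lrt_statistic(genotype, pool_freqs, ref_freqs) > threshold


# Toy frequencies for three variants (made-up numbers).
pool = [0.8, 0.1, 0.6]
ref = [0.5, 0.5, 0.5]
likely_member = [1, 0, 1]
unlikely_member = [0, 1, 0]
```

A defender who adds noise to `pool_freqs` or withholds some variants shrinks the gap between the two hypotheses, which is the privacy-utility tradeoff the paper studies; the Bayesian attacker analysed there is a different and stronger model than this fixed-threshold test.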
# シリコンマイクロリング共振器非線形性に基づく時間遅延貯留層計算のメモリ容量解析 Memory Capacity Analysis of Time-delay Reservoir Computing Based on Silicon Microring Resonator Nonlinearities ( http://arxiv.org/abs/2406.01812v1 ) ライセンス: Link先を確認	Bernard J. Giron Castro, Christophe Peucheret, Francesco Da Ros,	(参考訳) シリコンマイクロリング共振器(MRR)はフォトニック貯水池計算(RC)方式の非線形ノードとして機能する可能性が強い。自由キャリア分散(FCD)や熱光学効果(TO)の影響などシリコンMRR内の非線形性を利用することで、RCの入力データを高次元空間にマッピングすることができる。さらに、MRRのスルーとポートの間に外部導波路を追加することで、メモリを拡張したTDRC(Time-delay RC)を実装することができる。スルーポートからの入力は、外部導波路によって印加された遅延がメモリを効果的に加算することでリングの加算ポートにフィードバックされる。 TDRCでは、ノードは時間内に多重化され、それぞれの時間進化がドロップポートで検出される。 MRRに基づくTDRCの性能は、MRRの非線形性の量に大きく依存する。非線形効果は、その効果の寿命を決定するため、MRRの物理的性質に依存する。もう一つの要因はMRR応答の安定性であり、ドロップポートにおける強い時間領域の不連続性は自己パルス(高い非線形挙動)によってFCD非線形性から生じることが知られている。しかし、最適性能を達成するために、あるタスクにRCが必要とする正しい非線形性の定量化は困難である。したがって、このTDRCセットアップの非線形力学を完全に理解するためには、さらなる解析が必要である。本稿では, 先述したマイクロリング型TDRC方式の非線形・線形メモリ容量を, 発生したキャリアの時間定数とTO効果の熱の関数として定量化する。本稿では,パラメータ空間を生成するTDRC力学の特性を,入力信号パワーと周波数調整範囲の観点から解析する。 Silicon microring resonators (MRRs) have shown strong potential in acting as the nonlinear nodes of photonic reservoir computing (RC) schemes. By using nonlinearities within a silicon MRR, such as the ones caused by free-carrier dispersion (FCD) and thermo-optic (TO) effects, it is possible to map the input data of the RC to a higher dimensional space. Furthermore, by adding an external waveguide between the through and add ports of the MRR, it is possible to implement a time-delay RC (TDRC) with enhanced memory. The input from the through port is fed back into the add port of the ring with the delay applied by the external waveguide effectively adding memory. In a TDRC, the nodes are multiplexed in time, and their respective time evolutions are detected at the drop port. The performance of MRR-based TDRC is highly dependent on the amount of nonlinearity in the MRR. The nonlinear effects, in turn, are dependent on the physical properties of the MRR as they determine the lifetime of the effects. 
Another factor to take into account is the stability of the MRR response, as strong time-domain discontinuities at the drop port are known to emerge from FCD nonlinearities due to self-pulsing (highly nonlinear behaviour). However, quantifying the right amount of nonlinearity that RC needs for a certain task in order to achieve optimum performance is challenging. Therefore, further analysis is required to fully understand the nonlinear dynamics of this TDRC setup. Here, we quantify the nonlinear and linear memory capacity of the previously described microring-based TDRC scheme, as a function of the time constants of the generated carriers and of the thermal (TO) effects. We analyze the properties of the TDRC dynamics that generate the parameter space, in terms of input signal power and frequency detuning range, over which conventional RC tasks can be satisfactorily performed by the TDRC scheme.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
# 拡散ブースティング木 Diffusion Boosted Trees ( http://arxiv.org/abs/2406.01813v1 ) ライセンス: Link先を確認	Xizewen Han, Mingyuan Zhou,	(参考訳) デノイジング拡散確率モデルと勾配ブースティングの両方の利点を組み合わせて、教師付き学習問題に対処するための拡散ブースティングパラダイムを導入する。本研究では、Diffusion Boosted Trees (DBT) を開発する。DBTは、決定木によってパラメータ化された新たなデノイジング拡散生成モデル(拡散時間ステップ毎に1本の木)であると同時に、密度の形式に関する明示的なパラメトリック仮定を置かずに、弱学習器を条件付き分布の強学習器へと組み合わせる新しいブースティングアルゴリズムとみなすことができる。実験により、深層ニューラルネットワークに基づく拡散モデルに対するDBTの利点と、実世界の回帰タスクにおけるDBTの能力を示すとともに、判断を委ねることを学習する(learning to defer)能力を備えた、表データ分類のためのDBTのビジネスアプリケーション(不正検出)を提示する。 Combining the merits of both denoising diffusion probabilistic models and gradient boosting, the diffusion boosting paradigm is introduced for tackling supervised learning problems. We develop Diffusion Boosted Trees (DBT), which can be viewed as both a new denoising diffusion generative model parameterized by decision trees (one single tree for each diffusion timestep), and a new boosting algorithm that combines the weak learners into a strong learner of conditional distributions without making explicit parametric assumptions on their density forms. We demonstrate through experiments the advantages of DBT over deep neural network-based diffusion models as well as the competence of DBT on real-world regression tasks, and present a business application (fraud detection) of DBT for classification on tabular data with the ability of learning to defer.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
# 教師なし細胞セグメンテーションのための深層非対称混合モデル Deep asymmetric mixture model for unsupervised cell segmentation ( http://arxiv.org/abs/2406.01815v1 ) ライセンス: Link先を確認	Yang Nan, Guang Yang,	(参考訳) 手作業による輪郭描出は過度に労力がかかり主観的であるため、疾患の診断や創薬において、細胞セグメンテーションの自動化がますます重要になっている。限られた手動アノテーションでこの問題に対処するために、研究者は半教師ありおよび教師なしのセグメンテーション手法を開発してきた。これらの手法の中で、深層ガウス混合モデルは、複雑なデータ分布をモデル化できることから重要な役割を果たしている。しかし、これらのモデルはデータが対称な正規分布に従うと仮定しており、非対称に分布するデータには適用できない。また、これらのモデルは汎化能力が弱く、外れ値に敏感である。これらの問題に対処するために、教師なし細胞セグメンテーションのための新しい非対称混合モデルを提案する。この非対称混合モデルは、複数の多変量ガウス混合モデルを、対数尤度および自己教師ありに基づく最適化関数とともに集約することによって構築される。提案した非対称混合モデルは、細胞セグメンテーションにおいて、Segment Anythingを含む既存の最先端の教師なしモデルを上回る(Dice係数で約2-30%の向上, p<0.05)。 Automated cell segmentation has become increasingly crucial for disease diagnosis and drug discovery, as manual delineation is excessively laborious and subjective. To address this issue with limited manual annotation, researchers have developed semi-supervised and unsupervised segmentation approaches. Among these approaches, the deep Gaussian mixture model plays a vital role due to its capacity to model complex data distributions. However, these models assume that the data follow symmetric normal distributions, which is inapplicable to data that are asymmetrically distributed. These models also suffer from weak generalization capacity and are sensitive to outliers. To address these issues, this paper presents a novel asymmetric mixture model for unsupervised cell segmentation. This asymmetric mixture model is built by aggregating certain multivariate Gaussian mixture models with log-likelihood and self-supervised-based optimization functions. The proposed asymmetric mixture model outperforms (a gain of roughly 2-30% in Dice coefficient, p<0.05) the existing state-of-the-art unsupervised models on cell segmentation, including Segment Anything.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
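The abstract above reports its gains in terms of the Dice coefficient. As a reminder of what that metric measures, here is a minimal sketch for flat binary masks. The function name and the convention of returning 1.0 when both masks are empty are assumptions for illustration and are not taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks (0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        # Both masks empty: treated as a perfect match (an assumption).
        return 1.0
    return 2.0 * intersection / size_sum


# Toy masks: two of the three predicted foreground pixels are correct.
pred_mask = [1, 1, 0, 0, 1]
true_mask = [1, 0, 0, 1, 1]
score = dice_coefficient(pred_mask, true_mask)
```

A value of 1 means the predicted and reference masks coincide and 0 means they share no foreground pixel, so the reported gain is a change on this 0 to 1 scale.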
# 量子cpoのカテゴリ Categories of quantum cpos ( http://arxiv.org/abs/2406.01816v1 ) ライセンス: Link先を確認	Andre Kornell, Bert Lindenhovius, Michael Mislove,	(参考訳) 本論文は2つの研究線をまとめる。 1つ目は、量子プログラミング言語とその型システムの分類モデルを見つけることである。第2の行は、これらの構造の非可換一般化(量子一般化とも呼ばれる)を見つけることにつながる数学的構造の量子化のプログラムに関するものである。離散量子化と呼ばれる量子化法は、本質的にはフォン・ノイマン代数と量子関係の圏における構造の内部化に相当し、$\omega$-complete partial order (cpos) の非可換な一般化を見出す。 CPOはドメイン理論の中心であり、プログラミング言語の分類モデルを構築するために広く利用されている。量子cposはcposに類似した分類特性を持ち、量子プログラミング言語の分類モデルの構築に適していることが、いくつかの例で示される。このため、量子cposは将来の量子領域理論のバックボーンを形成することができる。 This paper unites two research lines. The first involves finding categorical models of quantum programming languages and their type systems. The second line concerns the program of quantization of mathematical structures, which amounts to finding noncommutative generalizations (also called quantum generalizations) of these structures. Using a quantization method called discrete quantization, which essentially amounts to the internalization of structures in a category of von Neumann algebras and quantum relations, we find a noncommutative generalization of $\omega$-complete partial orders (cpos), called quantum cpos. Cpos are central in domain theory, and are widely used to construct categorical models of programming languages. We show that quantum cpos have similar categorical properties to cpos and are therefore suitable for the construction of categorical models for quantum programming languages, which is illustrated with some examples. For this reason, quantum cpos may form the backbone of a future quantum domain theory.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
# 教師なし学習と教師なし学習を併用した長期フェーン再建 Long-term foehn reconstruction combining unsupervised and supervised learning ( http://arxiv.org/abs/2406.01818v1 ) ライセンス: Link先を確認	Reto Stauffer, Achim Zeileis, Georg J. Mayr,	(参考訳) 急激な気温上昇と風速変化を特徴とするフォーン・ウィンドは、山火事の広がりによって、山腹側(例:山火事)に著しく影響した。気候変動の下でフォアーンがどのように変化するかを理解することが重要である。残念ながら、フォアーンを直接測定することはできないが、適切な分類法を用いて気象観測から推定する必要がある。したがって、このアプローチは通常、必要なデータが利用可能な特定の期間に限られる。本稿では,教師なしおよび教師付き確率論的統計的学習法の組み合わせを用いて,歴史的フォアーン発生を再現する新しい手法を提案する。本研究は,教師なし学習者(有限混合モデル)を訓練するためのin-situ測定(ここ数十年で利用可能)を利用する。これらのラベル付きデータは、教師付き学習者(ラスソまたはブースティング)を使用して、再分析データ(長い期間をカバーする)にリンクされる。これにより、分析データのみに基づいて過去のフォアーン確率を再構築することができる。この手法をスイスとオーストリアの6つの駅のERA5の再解析データに適用すると、1940年に遡る北のフォアーンと南のフォアーンの正確な時間的復元が達成される。このことは、過去83年間に季節的なフォアーンパターンがどのように進化してきたかを調査する方法であり、これらの臨界風事象に対する気候変動の影響についての貴重な洞察を与えてくれる。 Foehn winds, characterized by abrupt temperature increases and wind speed changes, significantly impact regions on the leeward side of mountain ranges, e.g., by spreading wildfires. Understanding how foehn occurrences change under climate change is crucial. Unfortunately, foehn cannot be measured directly but has to be inferred from meteorological measurements employing suitable classification schemes. Hence, this approach is typically limited to specific periods for which the necessary data are available. We present a novel approach for reconstructing historical foehn occurrences using a combination of unsupervised and supervised probabilistic statistical learning methods. We utilize in-situ measurements (available for recent decades) to train an unsupervised learner (finite mixture model) for automatic foehn classification. These labeled data are then linked to reanalysis data (covering longer periods) using a supervised learner (lasso or boosting). This allows to reconstruct past foehn probabilities based solely on reanalysis data. 
Applying this method to ERA5 reanalysis data for six stations across Switzerland and Austria achieves accurate hourly reconstructions of north and south foehn occurrence, respectively, dating back to 1940. This paves the way for investigating how seasonal foehn patterns have evolved over the past 83 years, providing valuable insights into climate change impacts on these critical wind events.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
# データ駆動型スペクトルフォアサイト・プルーニングによる視覚モデルにおけるロッキーティケットの発見 Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning ( http://arxiv.org/abs/2406.01820v1 ) ライセンス: Link先を確認	Leonardo Iurada, Marco Ciccone, Tatiana Tommasi,	(参考訳) ニューラルネットワークプルーニングの最近の進歩は、トレーニング前にディープラーニングモデルの計算コストとメモリ要求を削減できることを示している。我々は,この枠組みに焦点をあて,ニューラルタンジェントカーネル(NTK)理論を利用して,スパースネットワークのトレーニングダイナミクスと高密度ネットワークのトレーニングダイナミクスを整合させる新しい初期化アルゴリズムを提案する。具体的には、ニューラルネットワークを個別の経路に分解して得られたNTKのトレースに解析上界を与えることにより、NTKスペクトルの通常無視されるデータ依存成分を考慮に入れる方法を示す。これはNTKのトレースに大きく影響するパラメータを保存するために設計された、先見的なプルーニング手法であるPath eXclusion(PX)につながります。 PXは、高い空き地でも宝くじ(つまり良い道)を見つけることができ、追加の訓練の必要性を大幅に減らすことができる。事前訓練されたモデルに適用すると、いくつかの下流タスクで直接使用できるサブネットワークを抽出し、結果として高密度のタスクに匹敵するパフォーマンスを得るが、かなりのコストと計算コストを節約できる。 https://github.com/iurada/px-ntk-pruning Recent advances in neural network pruning have shown how it is possible to reduce the computational costs and memory demands of deep learning models before training. We focus on this framework and propose a new pruning at initialization algorithm that leverages the Neural Tangent Kernel (NTK) theory to align the training dynamics of the sparse network with that of the dense one. Specifically, we show how the usually neglected data-dependent component in the NTK's spectrum can be taken into account by providing an analytical upper bound to the NTK's trace obtained by decomposing neural networks into individual paths. This leads to our Path eXclusion (PX), a foresight pruning method designed to preserve the parameters that mostly influence the NTK's trace. PX is able to find lottery tickets (i.e. good paths) even at high sparsity levels and largely reduces the need for additional training. When applied to pre-trained models it extracts subnetworks directly usable for several downstream tasks, resulting in performance comparable to those of the dense counterpart but with substantial cost and computational savings. 
Code available at: https://github.com/iurada/px-ntk-pruning	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
# GPUによるルール評価と進化 GPU-Accelerated Rule Evaluation and Evolution ( http://arxiv.org/abs/2406.01821v1 ) ライセンス: Link先を確認	Hormoz Shahrzad, Risto Miikkulainen,	(参考訳) 本稿では、進化的ルールに基づく機械学習(ERL)の効率性とスケーラビリティを高めるための革新的なアプローチを紹介する。従来のERLシステムは複数のCPUに分散できるが、特に大規模なデータセットでは、候補ルールの適合性評価がボトルネックとなっている。本稿では, AERL (Accelerated ERL) がこの問題を2つの方法で解決する手法を提案する。まず、PyTorchフレームワーク内でのテンソル化表現によるGPU最適化ルールセットの採用により、AERLはボトルネックを緩和し、フィットネス評価を大幅に加速する。第二に、AERLはバックプロパゲーションにより規則係数を微調整することでGPUをさらに活用し、探索空間探索を改善する。実験的な証拠は、AERL検索がより速く、より効果的であることを確認し、説明可能な人工知能に力を与える。 This paper introduces an innovative approach to boost the efficiency and scalability of Evolutionary Rule-based machine Learning (ERL), a key technique in explainable AI. While traditional ERL systems can distribute processes across multiple CPUs, fitness evaluation of candidate rules is a bottleneck, especially with large datasets. The method proposed in this paper, AERL (Accelerated ERL) solves this problem in two ways. First, by adopting GPU-optimized rule sets through a tensorized representation within the PyTorch framework, AERL mitigates the bottleneck and accelerates fitness evaluation significantly. Second, AERL takes further advantage of the GPUs by fine-tuning the rule coefficients via back-propagation, thereby improving search space exploration. Experimental evidence confirms that AERL search is faster and more effective, thus empowering explainable artificial intelligence.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
# より少ない条件付き独立性検定による因果発見 Causal Discovery with Fewer Conditional Independence Tests ( http://arxiv.org/abs/2406.01823v1 ) ライセンス: Link先を確認	Kirankumar Shiragur, Jiaqi Zhang, Caroline Uhler,	(参考訳) 科学における多くの疑問は、因果関係を理解するという根本的な問題に関するものである。しかし、よく知られたPCアルゴリズムを含むほとんどの制約ベースの因果探索アルゴリズムは、しばしば指数関数的な数の条件付き独立性(CI)テストを必要とし、様々な応用において制約となっている。これに対処するため、本研究は、より少ない数のCIテストで、基礎となる因果グラフについて何が学べるかを特徴づけることに焦点を当てる。隠れた因果グラフのより粗い表現を多項式数のテストで学習できることを示す。この粗い表現はCausal Consistent Partition Graph (CCPG) と呼ばれ、頂点の分割と、その成分上で定義された有向グラフからなる。 CCPGは、向きの一貫性と、より細かい分割を優先する追加の制約を満たす。さらに、因果グラフが識別可能であれば、CCPGは基礎となる因果グラフに一致する。その結果、観測データと場合によっては追加の介入によって因果グラフが完全に識別可能な特別な場合において、真の因果グラフを多項式数のテストで復元する最初の効率的なアルゴリズムが得られる。 Many questions in science center around the fundamental problem of understanding causal relationships. However, most constraint-based causal discovery algorithms, including the celebrated PC algorithm, often incur an exponential number of conditional independence (CI) tests, posing limitations in various applications. Addressing this, our work focuses on characterizing what can be learned about the underlying causal graph with a reduced number of CI tests. We show that it is possible to learn a coarser representation of the hidden causal graph with a polynomial number of tests. This coarser representation, named Causal Consistent Partition Graph (CCPG), comprises a partition of the vertices and a directed graph defined over its components. CCPG satisfies consistency of orientations and additional constraints which favor finer partitions. Furthermore, it reduces to the underlying causal graph when the causal graph is identifiable. As a consequence, our results offer the first efficient algorithm for recovering the true causal graph with a polynomial number of tests, in special cases where the causal graph is fully identifiable through observational data and potentially additional interventions.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
# EMOE:ロバストな不確実性に基づく拒絶のための専門家の広範囲なマッチング EMOE: Expansive Matching of Experts for Robust Uncertainty Based Rejection ( http://arxiv.org/abs/2406.01825v1 ) ライセンス: Link先を確認	Yunni Qu, James Wellnitz, Alexander Tropsha, Junier Oliva,	(参考訳) Expansive Matching of Experts (EMOE) は, アウト・オブ・ディストリビューション(OOD)点に基づく予測と不確実性に基づく拒絶を改善するために, サポート拡張, 補間的擬似ラベルを用いた新しい手法である。本稿では,潜在空間におけるOODインスタンスを生成する拡張データ拡張手法と,擬似ラベル処理のための拡張拡張点をフィルタリングするための実証実験に基づくアプローチを提案する。 EMOEは、複数のベースエキスパートの多様なセットを、拡張データ上の擬似ラベルとして使用して、複数のヘッドを持つ共有MLP(専門家1人)を通じて、OODのパフォーマンスを改善する。 EMOEは表データの最先端手法に比べて優れた性能を示すことを示す。 Expansive Matching of Experts (EMOE) is a novel method that utilizes support-expanding, extrapolatory pseudo-labeling to improve prediction and uncertainty based rejection on out-of-distribution (OOD) points. We propose an expansive data augmentation technique that generates OOD instances in a latent space, and an empirical trial based approach to filter out augmented expansive points for pseudo-labeling. EMOE utilizes a diverse set of multiple base experts as pseudo-labelers on the augmented data to improve OOD performance through a shared MLP with multiple heads (one per expert). We demonstrate that EMOE achieves superior performance compared to state-of-the-art methods on tabular data.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
# FacAID: ニューロシンボリックなファサード再構築のためのトランスフォーマーモデル FacAID: A Transformer Model for Neuro-Symbolic Facade Reconstruction ( http://arxiv.org/abs/2406.01829v1 ) ライセンス: Link先を確認	Aleksander Płocharski, Jan Swidzinski, Joanna Porter-Sobieraj, Przemyslaw Musialski,	(参考訳) 本稿では, 平坦でセグメント化されたファサード構造を,カスタム設計のスプリット文法を用いて手続き的定義に変換するニューロシンボリックなトランスフォーマーベースのモデルを提案する。そこで我々はまず,建築ファサードに適した半複雑なスプリット文法を開発し,それに対応する手続き表現とともにファサードからなるデータセットを生成する。このデータセットはトランスフォーマーモデルをトレーニングするために使われ、セグメント化された平坦なファサードを文法の手続き言語に変換する。推論の間、モデルはこの学習された変換を新しいファサードセグメンテーションに適用し、ユーザーが様々なファサードデザインを生成するように調整できる手続き的表現を提供する。この方法は静的ファサード画像の動的で編集可能な手続き的フォーマットへの変換を自動化するだけでなく、設計の柔軟性を高め、建築家やデザイナーによる変更やバリエーションの作成を容易にする。本手法は、手続き生成の精度とニューロシンボリック学習の適応性を組み合わせることで、ファサード設計の新たな標準を定めている。 We introduce a neuro-symbolic transformer-based model that converts flat, segmented facade structures into procedural definitions using a custom-designed split grammar. To facilitate this, we first develop a semi-complex split grammar tailored for architectural facades and then generate a dataset comprising facades alongside their corresponding procedural representations. This dataset is used to train our transformer model to convert segmented, flat facades into the procedural language of our grammar. During inference, the model applies this learned transformation to new facade segmentations, providing a procedural representation that users can adjust to generate varied facade designs. This method not only automates the conversion of static facade images into dynamic, editable procedural formats but also enhances the design flexibility, allowing for easy modifications and variations by architects and designers. Our approach sets a new standard in facade design by combining the precision of procedural generation with the adaptability of neuro-symbolic learning.	翻訳日:2024-06-05 20:42:35 公開日:2024-06-03
# 人間-ロボットインタラクションシナリオにおけるマーカレス多人数追跡のためのロバストフィルタ A Robust Filter for Marker-less Multi-person Tracking in Human-Robot Interaction Scenarios ( http://arxiv.org/abs/2406.01832v1 ) ライセンス: Link先を確認	Enrico Martini, Harshil Parekh, Shaoting Peng, Nicola Bombieri, Nadia Figueroa,	(参考訳) 自然でマーカーのない人間-ロボットのインタラクション(HRI)は、物理的マーカーのないシームレスなコラボレーションのビジョンによって、長年にわたるロボット研究の焦点となっている。マーカレスアプローチは、ユーザエクスペリエンスの向上を約束するが、人間のポーズ推定(HPE)とディープカメラにおける本質的なエラーによって引き起こされる課題に、最先端の技術は苦労する。これらのエラーは、ロボットのジッタリングのような問題を引き起こす可能性がある。本研究では,HPEバックボーンと1台のRGB-Dカメラから不完全な3Dポーズを洗練し,これらの課題に対処するフィルタパイプラインを提案する。実験結果から,提案フィルタを用いることで,より一貫した雑音のない動きの表現が可能となり,予期せぬロボットの動きを低減し,よりスムーズな対話を可能にした。 Pursuing natural and marker-less human-robot interaction (HRI) has been a long-standing robotics research focus, driven by the vision of seamless collaboration without physical markers. Marker-less approaches promise an improved user experience, but state-of-the-art struggles with the challenges posed by intrinsic errors in human pose estimation (HPE) and depth cameras. These errors can lead to issues such as robot jittering, which can significantly impact the trust users have in collaborative systems. We propose a filtering pipeline that refines incomplete 3D human poses from an HPE backbone and a single RGB-D camera to address these challenges, solving for occlusions that can degrade the interaction. Experimental results show that using the proposed filter leads to more consistent and noise-free motion representation, reducing unexpected robot movements and enabling smoother interaction.	翻訳日:2024-06-05 20:32:51 公開日:2024-06-03
# CAFO:時系列分類における特徴中心的説明 CAFO: Feature-Centric Explanation on Time Series Classification ( http://arxiv.org/abs/2406.01833v1 ) ライセンス: Link先を確認	Jaeho Kim, Seok-Ju Hahn, Yoontae Hwang, Junghye Lee, Seulki Lee,	(参考訳) 多変量時系列分類(MTS)では、MTSデータの複雑で高次元の性質、複雑な時間ダイナミクス、ドメイン固有の解釈の必要性から、モデル性能の重要な特徴(例えばセンサ)を見つけることは極めて困難である。 MTSの現在の説明法は、主に時間中心の説明に焦点を当てており、重要な期間を特定できるが、重要な特徴を特定するのにはあまり効果がない。この制限は、時間中心の分析を補完する重要で見過ごされがちな、機能中心のアプローチの必要性を浮き彫りにする。このギャップを埋めるために,本稿ではCAFO(Channel Attention and Feature Orthgonalization)という,MCSのための特徴中心の説明・評価フレームワークを提案する。 CAFOは、チャネルアテンション機構を備えた畳み込みベースのアプローチを採用し、ディープワイドな分離可能なチャネルアテンションモジュール(DepCA)とQR分解に基づくロスを取り入れ、機能ワイドな直交性を促進する。この直交化により、注意分布の分離性が向上し、特徴量のランク付けと安定化が図られる。この機能的ランキングの改善は、MSSの機能的説明可能性の理解を高める。さらに,グローバルな特徴とクラス固有の特徴の重要度を評価する指標を開発する。我々のフレームワークの有効性は、2つの主要な公開ベンチマークと実世界のデータセットに関する広範な実証分析によって検証される。 MTS分類作業における特徴量評価におけるCAFOの頑健さと情報伝達能力を確認した。本研究は,MTSにおける特徴中心的説明の理解を深めるだけでなく,特徴中心的説明の今後の探求の基盤となる。 In multivariate time series (MTS) classification, finding the important features (e.g., sensors) for model performance is crucial yet challenging due to the complex, high-dimensional nature of MTS data, intricate temporal dynamics, and the necessity for domain-specific interpretations. Current explanation methods for MTS mostly focus on time-centric explanations, apt for pinpointing important time periods but less effective in identifying key features. This limitation underscores the pressing need for a feature-centric approach, a vital yet often overlooked perspective that complements time-centric analysis. To bridge this gap, our study introduces a novel feature-centric explanation and evaluation framework for MTS, named CAFO (Channel Attention and Feature Orthgonalization). CAFO employs a convolution-based approach with channel attention mechanisms, incorporating a depth-wise separable channel attention module (DepCA) and a QR decomposition-based loss for promoting feature-wise orthogonality. 
We demonstrate that this orthogonalization enhances the separability of attention distributions, thereby refining and stabilizing the ranking of feature importance. This improvement in feature-wise ranking enhances our understanding of feature explainability in MTS. Furthermore, we develop metrics to evaluate global and class-specific feature importance. Our framework's efficacy is validated through extensive empirical analyses on two major public benchmarks and real-world datasets, both synthetic and self-collected, specifically designed to highlight class-wise discriminative features. The results confirm CAFO's robustness and informative capacity in assessing feature importance in MTS classification tasks. This study not only advances the understanding of feature-centric explanations in MTS but also sets a foundation for future explorations in feature-centric explanations.	翻訳日:2024-06-05 20:32:51 公開日:2024-06-03
# ウィキペディアの可読性を採点するためのオープン多言語システム An Open Multilingual System for Scoring Readability of Wikipedia ( http://arxiv.org/abs/2406.01835v1 ) ライセンス: Link先を確認	Mykola Trokhymovych, Indira Sen, Martin Gerlach,	(参考訳) 6000万以上の記事があり、ウィキペディアはオープンで自由にアクセスできる知識のための最大のプラットフォームになっている。月間訪問数は150億を超えているが、テキストの読みやすさが欠如しているため、その内容は多くの読者にとって理解しにくいと考えられている。しかし、ウィキペディアの可読性に関する以前の調査は英語のみに限定されており、現在ウィキペディアの300以上の言語の自動可読性評価をサポートするシステムは存在しない。このギャップを埋めるため、ウィキペディア記事の可読性を採点する多言語モデルを構築した。このモデルを訓練し、評価するために、ウィキペディアの記事を簡易版ウィキペディアおよびオンラインの子ども向け百科事典の記事と対応付けることで、14言語にまたがる新しい多言語データセットを作成する。ゼロショットのシナリオにおいて,本モデルは14言語で80%以上のランキング精度を達成し,従来のベンチマークを上回ることを示す。これらの結果から, モデル微調整に利用できる正解(ground-truth)データがない言語に対しても, 本モデルを大規模に適用可能であることを示す。さらに,英語以外も含めたウィキペディアの可読性の現状について,初めての概観を示す。 With over 60M articles, Wikipedia has become the largest platform for open and freely accessible knowledge. While it has more than 15B monthly visits, its content is believed to be inaccessible to many readers due to the lack of readability of its text. However, previous investigations of the readability of Wikipedia have been restricted to English only, and there are currently no systems supporting the automatic readability assessment of the 300+ languages in Wikipedia. To bridge this gap, we develop a multilingual model to score the readability of Wikipedia articles. To train and evaluate this model, we create a novel multilingual dataset spanning 14 languages, by matching articles from Wikipedia to simplified Wikipedia and online children's encyclopedias. We show that our model performs well in a zero-shot scenario, yielding a ranking accuracy of more than 80% across 14 languages and improving upon previous benchmarks. These results demonstrate the applicability of the model at scale for languages in which there is no ground-truth data available for model fine-tuning. Furthermore, we provide the first overview of the state of readability in Wikipedia beyond English.	翻訳日:2024-06-05 20:32:51 公開日:2024-06-03
# トランスダクションによる視覚言語モデルの構築 Boosting Vision-Language Models with Transduction ( http://arxiv.org/abs/2406.01837v1 ) ライセンス: Link先を確認	Maxime Zanella, Benoît Gérin, Ismail Ben Ayed,	(参考訳) トランスダクションは、ラベルのないデータの構造を利用して予測精度を高める強力なパラダイムである。本稿では,視覚言語モデル(VLM)のための新しい,計算効率の良いトランスダクティブアプローチであるTransCLIPを提案する。 TransCLIPは、一般的なインダクティブゼロおよび少数ショットモデルの上に、プラグイン・アンド・プレイモジュールとして適用でき、一貫してパフォーマンスを改善している。我々の新たな目的関数は、テキストエンコーダの知識を統合し、トランスダクティブ学習プロセスを導くKL発散ペナルティによって制約された、正規化された最大類似度推定と見なすことができる。さらに,BMM(Block Majorize-Minimize)手順の反復的導出を行い,コンバージェンスとデカップリングされたサンプルアサインメントの更新を保証し,大規模データセットに対する計算効率のよいトランスダクションを実現する。以下に示すような総合的な評価、比較、アブレーション研究について報告する。一トランスダクションは、誘導事前訓練されたゼロ及び少数ショットVLMの一般化能力を大幅に向上させることができる。 (II)TransCLIPは,KL言語制約による視覚的特徴のみに頼って,標準的なトランスダクティブな少数ショット学習手法を著しく上回っている。 Transduction is a powerful paradigm that leverages the structure of unlabeled data to boost predictive accuracy. We present TransCLIP, a novel and computationally efficient transductive approach designed for Vision-Language Models (VLMs). TransCLIP is applicable as a plug-and-play module on top of popular inductive zero- and few-shot models, consistently improving their performances. Our new objective function can be viewed as a regularized maximum-likelihood estimation, constrained by a KL divergence penalty that integrates the text-encoder knowledge and guides the transductive learning process. We further derive an iterative Block Majorize-Minimize (BMM) procedure for optimizing our objective, with guaranteed convergence and decoupled sample-assignment updates, yielding computationally efficient transduction for large-scale datasets. We report comprehensive evaluations, comparisons, and ablation studies that demonstrate: (i) Transduction can greatly enhance the generalization capabilities of inductive pretrained zero- and few-shot VLMs; (ii) TransCLIP substantially outperforms standard transductive few-shot learning methods relying solely on vision features, notably due to the KL-based language constraint.	
翻訳日:2024-06-05 20:32:51 公開日:2024-06-03
# 関数空間におけるターゲットネットワークの学習 Learning the Target Network in Function Space ( http://arxiv.org/abs/2406.01838v1 ) ライセンス: Link先を確認	Kavosh Asadi, Yao Liu, Shoham Sabach, Ming Yin, Rasool Fakoor,	(参考訳) 本稿では,強化学習(RL)における価値関数の学習に焦点をあてる。この課題は、オンラインネットワークとターゲットネットワークのペアを更新し、これらの2つのネットワークのパラメータが等価であることを保証することで解決されることが多い。このパラメータ空間同値性に依存しない新しい値関数近似アルゴリズムであるLookahead-Replicate (LR)を提案する。代わりに、LRアルゴリズムは関数空間における2つのネットワーク間の等価性を維持するように設計されている。この値ベースの等価性は、新しいターゲットネットワーク更新を用いて得られる。 LRは値関数の学習において収束挙動をもたらすことを示す。また、LRベースのターゲットネットワーク更新により、Atariベンチマークの深いRLが大幅に改善されることを示す実験結果を示す。 We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algorithm is designed to maintain an equivalence between the two networks in the function space. This value-based equivalence is obtained by employing a new target-network update. We show that LR leads to a convergent behavior in learning the value function. We also present empirical results demonstrating that LR-based target-network updates significantly improve deep RL on the Atari benchmark.	翻訳日:2024-06-05 20:32:51 公開日:2024-06-03
# GraphWeaver: 数十億ドル規模のサイバーセキュリティインシデント相関 GraphWeaver: Billion-Scale Cybersecurity Incident Correlation ( http://arxiv.org/abs/2406.01842v1 ) ライセンス: Link先を確認	Scott Freitas, Amir Gharib,	(参考訳) 大企業のサイバーセキュリティの動的な状況では、何十億ものセキュリティアラートを包括的インシデントに正確かつ効率的に関連付けることが大きな課題である。伝統的な相関技術は、しばしば保守、スケーリング、新しい脅威やテレメトリの新たな源への適応に苦しむ。 GraphWeaverは、従来のインシデント相関プロセスを、データ最適化されたジオ分散グラフベースのアプローチに移行する、業界規模のフレームワークです。 GraphWeaverは、数十万の企業にまたがる数十億の共有エビデンスアラートに関連する複雑さを扱うために、一連のイノベーションを紹介している。これらのイノベーションの鍵となるのは、大規模データ処理のためのジオ分散データベースとPySpark分析エンジン、相関ストレージを最適化する最小のスパンニングツリーアルゴリズム、セキュリティドメイン知識と脅威インテリジェンスの統合、重要な相関プロセスとパラメータを継続的に洗練するヒューマン・イン・ザ・ループフィードバックシステムである。 GraphWeaverはMicrosoft Defender XDR製品に統合され、世界中のデプロイされ、顧客からのフィードバックとセキュリティ専門家による広範な調査によって確認されたように、何十億もの相関を99%の精度で処理している。この統合は高い相関精度を維持しただけでなく、従来の相関ストレージの要求を7.4倍削減した。 GraphWeaverの重要な設計と運用機能の詳細な概要を提供し、このレベルでこれらの重要な機能をオープンに議論する最初のサイバーセキュリティ企業として、前例を定めています。 In the dynamic landscape of large enterprise cybersecurity, accurately and efficiently correlating billions of security alerts into comprehensive incidents is a substantial challenge. Traditional correlation techniques often struggle with maintenance, scaling, and adapting to emerging threats and novel sources of telemetry. We introduce GraphWeaver, an industry-scale framework that shifts the traditional incident correlation process to a data-optimized, geo-distributed graph based approach. GraphWeaver introduces a suite of innovations tailored to handle the complexities of correlating billions of shared evidence alerts across hundreds of thousands of enterprises. Key among these innovations are a geo-distributed database and PySpark analytics engine for large-scale data processing, a minimum spanning tree algorithm to optimize correlation storage, integration of security domain knowledge and threat intelligence, and a human-in-the-loop feedback system to continuously refine key correlation processes and parameters. 
GraphWeaver is integrated into the Microsoft Defender XDR product and deployed worldwide, handling billions of correlations with a 99% accuracy rate, as confirmed by customer feedback and extensive investigations by security experts. This integration has not only maintained high correlation accuracy but has also reduced traditional correlation storage requirements by 7.4x. We provide an in-depth overview of the key design and operational features of GraphWeaver, setting a precedent as the first cybersecurity company to openly discuss these critical capabilities at this level of depth.	翻訳日:2024-06-05 20:32:51 公開日:2024-06-03
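The GraphWeaver abstract mentions a minimum spanning tree algorithm used to optimize correlation storage, but gives no further detail. The sketch below is only a generic illustration of that idea, not the paper's implementation: it runs Kruskal's algorithm over a toy graph of alerts and keeps just enough links to preserve which alerts belong to the same incident. The function name `minimum_spanning_forest`, the `alert_links` data, and the meaning of the edge weights are all hypothetical.

```python
def minimum_spanning_forest(num_nodes, edges):
    """Kruskal's algorithm over (weight, u, v) edges.

    Returns the kept edges. Connectivity (which alerts end up in the
    same incident) is preserved while redundant links are dropped.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:      # edge joins two separate components
            parent[root_u] = root_v
            kept.append((weight, u, v))
    return kept


# Hypothetical alerts 0..4; each tuple is (cost, alert_a, alert_b).
alert_links = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (1, 3, 4), (4, 2, 0)]
stored = minimum_spanning_forest(5, alert_links)
print(stored)  # 3 of the 5 links are kept; incidents {0,1,2} and {3,4} stay intact
```

In this toy case two of five links are dropped without changing the incident grouping; the 7.4x storage reduction reported in the abstract refers to the production system and is not reproduced by this example.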
# L-MAGIC: 言語モデルに支援された一貫性のある画像生成 L-MAGIC: Language Model Assisted Generation of Images with Coherence ( http://arxiv.org/abs/2406.01843v1 ) ライセンス: Link先を確認	Zhipeng Cai, Matthias Mueller, Reiner Birkl, Diana Wofk, Shao-Yen Tseng, JunDa Cheng, Gabriela Ben-Melech Stan, Vasudev Lal, Michael Paulitsch,	(参考訳) 生成AIのブレークスルーの時代において、単一の入力画像からパノラマシーンを生成することは、依然として重要な課題である。既存のほとんどの手法は拡散に基づく反復的もしくは同時の多視点インペインティングを使用する。しかし、グローバルなシーンレイアウトの事前知識が欠如しているため、重複したオブジェクト(例えば、寝室の複数のベッド)を含む低品質な出力になるか、ビューごとに時間のかかる人手のテキスト入力が必要になる。本稿では,360度パノラマシーンの一貫した複数ビューを拡散生成する際に,大規模言語モデルをガイダンスとして利用する新しい手法L-MAGICを提案する。 L-MAGICは、微調整なしで事前訓練された拡散モデルと言語モデルを利用し、ゼロショット性能を保証する。出力品質は超解像・多視点融合技術によりさらに向上する。大規模な実験により、生成されたパノラマシーンは、関連研究と比較してシーンレイアウトと透視ビューのレンダリング品質が優れており、人間による評価で70%を超える選好を得ることが示された。条件付き拡散モデルと組み合わせることで、L-MAGICはテキスト、深度マップ、スケッチ、カラースクリプトなど、様々な入力モダリティを受け入れることができる。さらに深度推定を適用することで、3Dポイントクラウドの生成と滑らかなカメラモーションによる動的シーン探索が可能になる。コードは https://github.com/IntelLabs/MMPano で入手できる。ビデオプレゼンテーションは https://youtu.be/XDMNEzH4-Ec?list=PLG9Zyvu7iBa0-a7ccNLO8LjcVRAoMn57s で視聴できる。 In the current era of generative AI breakthroughs, generating panoramic scenes from a single input image remains a key challenge. Most existing methods use diffusion-based iterative or simultaneous multi-view inpainting. However, the lack of global scene layout priors leads to subpar outputs with duplicated objects (e.g., multiple beds in a bedroom) or requires time-consuming human text inputs for each view. We propose L-MAGIC, a novel method leveraging large language models for guidance while diffusing multiple coherent views of 360 degree panoramic scenes. L-MAGIC harnesses pre-trained diffusion and language models without fine-tuning, ensuring zero-shot performance. The output quality is further enhanced by super-resolution and multi-view fusion techniques. Extensive experiments demonstrate that the resulting panoramic scenes feature better scene layouts and perspective view rendering quality compared to related works, with >70% preference in human evaluations. 
Combined with conditional diffusion models, L-MAGIC can accept various input modalities, including but not limited to text, depth maps, sketches, and colored scripts. Applying depth estimation further enables 3D point cloud generation and dynamic scene exploration with fluid camera motion. Code is available at https://github.com/IntelLabs/MMPano. The video presentation is available at https://youtu.be/XDMNEzH4-Ec?list=PLG9Zyvu7iBa0-a7ccNLO8LjcVRAoMn57s.	翻訳日:2024-06-05 20:32:51 公開日:2024-06-03
# 不均一性こそすべて: ECHOによる効率的かつタイムリーな暗号化トラフィック分類 Non-uniformity is All You Need: Efficient and Timely Encrypted Traffic Classification With ECHO ( http://arxiv.org/abs/2406.01852v1 ) ライセンス: Link先を確認	Shilo Daum, Tal Shapira, David Hay, Anat Bremler-Barr,	(参考訳) インターネットトラフィックの95%が暗号化されているため、このトラフィックを分類するための効果的なアプローチは、ネットワークのセキュリティと管理にとって不可欠である。本稿では,ML/DLベースの暗号化トラフィック分類のための新しい最適化プロセスであるECHOを紹介する。 ECHOは、分類時間とメモリ利用の両方を対象とし、2つの革新的なテクニックを取り入れている。最初のコンポーネントであるHO(Hyperparameter Optimization of binnings)は、効率的なトラフィック表現を作ることを目的としている。従来の研究では,パケットサイズやパケット到着時刻を固定サイズのビンにマッピングする表現がよく用いられるが,本稿では不均一なビニングの方がはるかに効率的であることを示す。これらの不均一なビニングは、トレーニング段階でハイパーパラメータ最適化アルゴリズムを用いて導出される。 HOは所与の表現サイズのもとで精度を大幅に向上させるか、または同等に、より小さな表現を用いて同等の精度を達成する。次に,EC(Early Classification of traffic)を導入し,異なる終了時間に適応した分類器のカスケードを用いて,信頼度に基づく,より高速な分類を可能にする。 ECは、平均分類遅延を最大90%削減する。注目すべきは、この手法が分類精度を維持するだけでなく、場合によってはその精度を向上させることである。 3つの公開データセットを用いて、組み合わせた手法であるEarly Classification with Hyperparameter Optimization (ECHO)が、分類効率を大幅に向上させることを示した。 With 95% of Internet traffic now encrypted, an effective approach to classifying this traffic is crucial for network security and management. This paper introduces ECHO -- a novel optimization process for ML/DL-based encrypted traffic classification. ECHO targets both classification time and memory utilization and incorporates two innovative techniques. The first component, HO (Hyperparameter Optimization of binnings), aims at creating efficient traffic representations. While previous research often uses representations that map packet sizes and packet arrival times to fixed-sized bins, we show that non-uniform binnings are significantly more efficient. These non-uniform binnings are derived by employing a hyperparameter optimization algorithm in the training stage. HO significantly improves accuracy given a required representation size, or, equivalently, achieves comparable accuracy using smaller representations. 
Then, we introduce EC (Early Classification of traffic), which enables faster classification using a cascade of classifiers adapted for different exit times, where classification is based on the level of confidence. EC reduces the average classification latency by up to 90%. Remarkably, this method not only maintains classification accuracy but also, in certain cases, improves it. Using three publicly available datasets, we demonstrate that the combined method, Early Classification with Hyperparameter Optimization (ECHO), leads to a significant improvement in classification efficiency.	翻訳日:2024-06-05 20:32:51 公開日:2024-06-03
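The ECHO abstract contrasts fixed-size bins with non-uniform binnings for packet sizes. As a rough illustration of why bin placement matters, the sketch below builds a histogram of packet sizes under both schemes. The packet sizes and the non-uniform edges are invented for this example; in the paper the edges come from hyperparameter optimization during training, which is not reproduced here.

```python
from bisect import bisect_right


def uniform_edges(lo, hi, n_bins):
    """Left edges of n_bins equal-width bins covering [lo, hi)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)]


def bin_histogram(values, edges):
    """Count values per bin.

    Bin i covers [edges[i], edges[i + 1]); the last bin is open-ended,
    and values below the first edge are clamped into bin 0.
    """
    counts = [0] * len(edges)
    for v in values:
        idx = max(bisect_right(edges, v) - 1, 0)
        counts[idx] += 1
    return counts


# Hypothetical packet sizes (bytes): many small packets, a few near the MTU.
packet_sizes = [40, 52, 60, 64, 80, 120, 1400, 1460, 1500, 1500]

uniform = uniform_edges(0, 1600, 4)   # edges at 0, 400, 800, 1200
non_uniform = [0, 56, 72, 1000]       # hypothetical hand-picked edges

print(bin_histogram(packet_sizes, uniform))      # [6, 0, 0, 4]
print(bin_histogram(packet_sizes, non_uniform))  # [2, 2, 2, 4]
```

With the same four bins, the uniform layout leaves two bins empty and lumps all small packets together, while the non-uniform layout separates them. This is only meant to convey the intuition behind the representation-size argument in the abstract.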
# 放射線治療におけるリーフシークエンシングとマルチエージェント強化学習 Multi-Agent Reinforcement Learning Meets Leaf Sequencing in Radiotherapy ( http://arxiv.org/abs/2406.01853v1 ) ライセンス: Link先を確認	Riqiang Gao, Florin C. Ghesu, Simon Arberet, Shahab Basiri, Esa Kuusela, Martin Kraus, Dorin Comaniciu, Ali Kamen,	(参考訳) 現代の放射線治療計画(RTP)では、キーモジュールであるリーフシークエンシングは主に最適化に基づくアプローチによって対処される。本稿では,リーフシークエンシングのためのマルチエージェントフレームワークにおいて,Reinforced Leaf Sequencer (RLS)と呼ばれる新しい深層強化学習(DRL)モデルを提案する。 RLSモデルは、大規模なトレーニングを通じて、時間を要する反復最適化ステップを改善し、報酬機構の設計を通じて運動パターンを制御することができる。我々は、4つのメトリクスを持つ4つのデータセットの実験を行い、我々のモデルを主要な最適化シーケンサと比較した。その結果,提案したRLSモデルはフルエンス再構成誤差を低減し,最適化プランナに組み込むとより高速に収束する可能性があることがわかった。さらに、RLSは完全な人工知能RTPパイプラインで有望な結果を示している。我々は、この先駆的なマルチエージェントRLリーフシーケンサーが、RTPのための機械学習の研究を後押しできることを期待している。 In contemporary radiotherapy planning (RTP), leaf sequencing, a key module, is predominantly addressed by optimization-based approaches. In this paper, we propose a novel deep reinforcement learning (DRL) model termed Reinforced Leaf Sequencer (RLS) in a multi-agent framework for leaf sequencing. The RLS model offers improvements to time-consuming iterative optimization steps via large-scale training and can control movement patterns through the design of reward mechanisms. We have conducted experiments on four datasets with four metrics and compared our model with a leading optimization sequencer. Our findings reveal that the proposed RLS model can achieve reduced fluence reconstruction errors, and potentially faster convergence when integrated in an optimization planner. Additionally, RLS has shown promising results in a full artificial intelligence RTP pipeline. We hope this pioneering multi-agent RL leaf sequencer can foster future research on machine learning for RTP.	翻訳日:2024-06-05 20:32:51 公開日:2024-06-03
# プライベートな統計的推論のためのリサンプリング手法 Resampling methods for private statistical inference ( http://arxiv.org/abs/2402.07131v3 ) ライセンス: Link先を確認	Karan Chadha, John Duchi, Rohith Kuditipudi,	(参考訳) 我々は、差分プライバシーのもとで信頼区間を構築するタスクについて検討する。本研究では,データの分割上で実行した複数の「小さな」ブートストラップの結果の中央値をプライベートに計算する,非パラメトリックブートストラップの2つのプライベート変種を提案し,得られる信頼区間のカバレッジ誤差に対する漸近的な上界を与える。固定された差分プライバシーパラメータ$\epsilon$に対して、我々の手法は、サンプルサイズ$n$の対数因子の範囲内で、非プライベートなブートストラップと同じ誤差率を達成する。我々は,実データと合成データの両方を用いて,平均推定,中央値推定,ロジスティック回帰における提案手法の性能を実証的に検証した。提案手法は,既存手法(および非プライベートベースライン)と同程度のカバレッジ精度を達成しつつ,従来手法よりも顕著に短い($\gtrsim 10$倍)信頼区間を提供する。 We consider the task of constructing confidence intervals with differential privacy. We propose two private variants of the non-parametric bootstrap, which privately compute the median of the results of multiple "little" bootstraps run on partitions of the data and give asymptotic bounds on the coverage error of the resulting confidence intervals. For a fixed differential privacy parameter $\epsilon$, our methods enjoy the same error rates as the non-private bootstrap to within logarithmic factors in the sample size $n$. We empirically validate the performance of our methods for mean estimation, median estimation, and logistic regression with both real and synthetic data. Our methods achieve similar coverage accuracy to existing methods (and non-private baselines) while providing notably shorter ($\gtrsim 10$ times) confidence intervals than previous approaches.	翻訳日:2024-06-05 10:48:12 公開日:2024-06-03
# API Pack: APIコール生成のための大規模マルチプログラミング言語データセット API Pack: A Massive Multi-Programming Language Dataset for API Call Generation ( http://arxiv.org/abs/2402.09615v4 ) ライセンス: Link先を確認	Zhen Guo, Adriana Meza Soria, Wei Sun, Yikang Shen, Rameswar Panda,	(参考訳) 我々は,大規模言語モデルのAPIコール生成機能を改善するために,100万以上の命令-APIコールペアを含む大規模マルチプログラミング言語データセットであるAPI Packを紹介する。 API Packから2万のPythonインスタンス上でCodeLlama-13Bを微調整することで、未確認のAPI呼び出しを生成する際に、GPT-3.5とGPT-4を上回ります。 API Packの微調整は、1つの言語で大量のデータと、他の言語からの少量のデータを活用することで、クロスプログラミング言語の一般化を容易にする。トレーニングデータを100万インスタンスにスケールアップすることで、トレーニングに使用されていない新しいAPIにモデルを一般化する能力がさらに向上する。さらなる研究を容易にするため、私たちは、API Packデータセット、トレーニングされたモデル、および関連するソースコードをhttps://github.com/zguo0525/API-Packでオープンソース化しました。 We introduce API Pack, a massive multi-programming language dataset containing more than 1 million instruction-API call pairs to improve the API call generation capabilities of large language models. By fine-tuning CodeLlama-13B on 20,000 Python instances from API Pack, we enable it to outperform GPT-3.5 and GPT-4 in generating unseen API calls. Fine-tuning on API Pack also facilitates cross-programming language generalization by leveraging a large amount of data in one language and small amounts of data from other languages. Scaling the training data to 1 million instances further improves the model's ability to generalize to new APIs not used in training. To facilitate further research, we open-source the API Pack dataset, trained model, and associated source code at https://github.com/zguo0525/API-Pack.	翻訳日:2024-06-05 10:48:12 公開日:2024-06-03
# CheXpert Plus: 放射線レポートのテキスト、患者の人口統計情報、追加の画像フォーマットによる大規模胸部X線データセットの拡張 CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Radiology Reports, Patient Demographics and Additional Image Formats ( http://arxiv.org/abs/2405.19538v2 ) ライセンス: Link先を確認	Pierre Chambon, Jean-Benoit Delbrouck, Thomas Sounack, Shih-Cheng Huang, Zhihong Chen, Maya Varma, Steven QH Truong, Chu The Chuong, Curtis P. Langlotz,	(参考訳) 5年前にCheXpertの最初の論文がリリースされて以来、CheXpertは最も広く使われ、引用された臨床AIデータセットの1つになった。ビジョン言語モデルの出現により、CheXpert画像に紐づくレポートの共有を求める声が高まり、人口統計データの取得に対するAIフェアネス研究者の関心も高まっている。これに対応するため、CheXpert Plusは、放射線学の分野におけるその後のすべての機械学習タスクに対するモデルのスケーリング、パフォーマンス、堅牢性、公平性を高めるために公開された、新しい放射線学データソースのコレクションとして機能する。 CheXpert Plusは、放射線学で公開された最大のテキストデータセットで、合計で3600万のテキストトークンがあり、そのうち1300万がインプレッショントークンである。私たちの知る限りでは、これは放射線学における最大のテキスト匿名化(de-identification)の取り組みであり、ほぼ100万件のPHIスパンが匿名化されている。大規模な英語のペアデータセットが放射線学でリリースされたのはこれが2回目であり、これにより初めて大規模な施設横断のトレーニングが可能になる。全てのレポートは、DICOMフォーマットの高品質な画像と組み合わせられ、様々な臨床および社会経済的グループを含む多数の画像と患者のメタデータ、および多くの病理ラベルとRadGraphアノテーションが提供される。このデータセットが、放射線科医のさらなる支援と医療改善に役立つAIモデルの研究を促進することを願っている。データは以下のURLで利用可能である: https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 モデルは以下のURLで利用可能である: https://github.com/Stanford-AIMI/chexpert-plus Since the release of the original CheXpert paper five years ago, CheXpert has become one of the most widely used and cited clinical AI datasets. The emergence of vision language models has sparked an increase in demands for sharing reports linked to CheXpert images, along with a growing interest among AI fairness researchers in obtaining demographic data. To address this, CheXpert Plus serves as a new collection of radiology data sources, made publicly available to enhance the scaling, performance, robustness, and fairness of models for all subsequent machine learning tasks in the field of radiology. CheXpert Plus is the largest text dataset publicly released in radiology, with a total of 36 million text tokens, including 13 million impression tokens. 
To the best of our knowledge, it represents the largest text de-identification effort in radiology, with almost 1 million PHI spans anonymized. It is only the second time that a large-scale English paired dataset has been released in radiology, thereby enabling, for the first time, cross-institution training at scale. All reports are paired with high-quality images in DICOM format, along with numerous image and patient metadata covering various clinical and socio-economic groups, as well as many pathology labels and RadGraph annotations. We hope this dataset will boost research for AI models that can further assist radiologists and help improve medical care. Data is available at the following URL: https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 Models are available at the following URL: https://github.com/Stanford-AIMI/chexpert-plus	翻訳日:2024-06-05 10:40:04 公開日:2024-06-03
# 単一画像からの物理的に適合する3次元物体モデリング Physically Compatible 3D Object Modeling from a Single Image ( http://arxiv.org/abs/2405.20510v2 ) ライセンス: Link先を確認	Minghao Guo, Bohan Wang, Pingchuan Ma, Tianyuan Zhang, Crystal Elaine Owens, Chuang Gan, Joshua B. Tenenbaum, Kaiming He, Wojciech Matusik,	(参考訳) 単一画像を3次元物理オブジェクトに変換する計算フレームワークを提案する。画像中の物体の視覚的幾何学は、機械的特性、外部力、静止形状の3つの直交特性によって決定される。既存の1次元の3D再構成手法は、剛性や外力の無視を前提として、しばしばこの基礎となる構成を見落としている。その結果、再構成された物体は現実世界の物理的力に耐えられず、不安定または望ましくない変形をもたらす。我々の最適化フレームワークは、物理互換性を再構築プロセスに埋め込むことによって、この問題に対処する。 3つの物理的属性を明示的に分解し、静的平衡によってリンクし、これはハード制約として機能し、最適化された物理的形状が望ましい物理的挙動を示すことを保証する。 Objaverseから収集したデータセットの評価は、我々のフレームワークが既存の手法よりも連続的に3Dモデルの物理的現実性を高めることを示した。我々のフレームワークの実用性は、動的シミュレーションや3Dプリンティングにおける実践的な応用にまで拡張され、物理的互換性への固執が最重要である。 We present a computational framework that transforms single images into 3D physical objects. The visual geometry of a physical object in an image is determined by three orthogonal attributes: mechanical properties, external forces, and rest-shape geometry. Existing single-view 3D reconstruction methods often overlook this underlying composition, presuming rigidity or neglecting external forces. Consequently, the reconstructed objects fail to withstand real-world physical forces, resulting in instability or undesirable deformation -- diverging from their intended designs as depicted in the image. Our optimization framework addresses this by embedding physical compatibility into the reconstruction process. We explicitly decompose the three physical attributes and link them through static equilibrium, which serves as a hard constraint, ensuring that the optimized physical shapes exhibit desired physical behaviors. Evaluations on a dataset collected from Objaverse demonstrate that our framework consistently enhances the physical realism of 3D models over existing methods. The utility of our framework extends to practical applications in dynamic simulations and 3D printing, where adherence to physical compatibility is paramount.	
翻訳日:2024-06-05 10:40:04 公開日:2024-06-03
# 外れ値とキャリブレーションセットが現代のLLMの量子化に及ぼす影響は小さくなっている Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs ( http://arxiv.org/abs/2405.20835v2 ) ライセンス: Link先を確認	Davide Paglieri, Saurabh Dash, Tim Rocktäschel, Jack Parker-Holder,	(参考訳) PTQ(Post-Training Quantization)は、メモリ使用量を減らすことで、より高速な動作と、より入手しやすいハードウェアとの互換性を実現し、わずかな性能低下と引き換えにLarge Language Models(LLMs)の効率を向上させる。 PTQにおけるキャリブレーションセットの役割,特に各種の著名なオープンソースLLMにおける隠れアクティベーションへの影響について検討する。キャリブレーションセットは、アクティベーションの大きさを評価し、量子化範囲を歪めて性能に悪影響を及ぼしうる外れ値を特定するのに不可欠である。我々の分析により、モデル間で量子化の有効性に顕著な違いがあることが明らかになった。量子化文献の大部分が基礎としている古いOPTモデルは、キャリブレーションセットを変えた場合に顕著な性能劣化と外れ値に対する高い感受性を示す。対照的に、Llama-2 7B、Llama-3 8B、Command-R 35B、Mistral 7Bといった新しいモデルは強い堅牢性を示し、特にMistral 7Bは外れ値の影響をほとんど受けず、安定した活性化を示す。これらの結果はPTQ戦略の転換が必要かもしれないことを示唆している。事前学習手法の進歩により外れ値の重要性が低下するにつれ、現在の量子化研究の基礎を再評価する必要性が高まっている。最先端のLLMの進化する特性に合わせるために、主に外れ値の保持に焦点を当てるのではなく、推論速度の最適化に重点を移すべきである。 Post-Training Quantization (PTQ) enhances the efficiency of Large Language Models (LLMs) by enabling faster operation and compatibility with more accessible hardware through reduced memory usage, at the cost of small performance drops. We explore the role of calibration sets in PTQ, specifically their effect on hidden activations in various notable open-source LLMs. Calibration sets are crucial for evaluating activation magnitudes and identifying outliers, which can distort the quantization range and negatively impact performance. Our analysis reveals a marked contrast in quantization effectiveness across models. The older OPT model, upon which much of the quantization literature is based, shows significant performance deterioration and high susceptibility to outliers with varying calibration sets. In contrast, newer models like Llama-2 7B, Llama-3 8B, Command-R 35B, and Mistral 7B demonstrate strong robustness, with Mistral 7B showing near-immunity to outliers and stable activations. These findings suggest a shift in PTQ strategies might be needed. 
As advancements in pre-training methods reduce the relevance of outliers, there is an emerging need to reassess the fundamentals of current quantization literature. The emphasis should pivot towards optimizing inference speed, rather than primarily focusing on outlier preservation, to align with the evolving characteristics of state-of-the-art LLMs.	翻訳日:2024-06-05 10:40:04 公開日:2024-06-03
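The abstract above notes that outliers seen in the calibration set can distort the quantization range. The sketch below illustrates that effect with symmetric absmax int8 quantization, which is an assumption made for this example; the abstract does not say which quantization scheme the authors evaluate. All values and function names are invented for illustration.

```python
def absmax_scale(calibration, n_bits=8):
    """Scale so that the largest calibration magnitude maps to the top integer level."""
    qmax = 2 ** (n_bits - 1) - 1          # 127 for int8
    return max(abs(x) for x in calibration) / qmax


def quantize(x, scale, n_bits=8):
    """Round to the nearest integer level and clip to the symmetric range."""
    qmax = 2 ** (n_bits - 1) - 1
    return max(-qmax, min(qmax, round(x / scale)))


def dequantize(q, scale):
    return q * scale


def mean_abs_error(values, scale):
    """Average round-trip error of quantizing and dequantizing the values."""
    errors = [abs(v - dequantize(quantize(v, scale), scale)) for v in values]
    return sum(errors) / len(errors)


# Hypothetical activations to be quantized at inference time.
activations = [0.10, -0.25, 0.40, -0.55, 0.80]

# Two hypothetical calibration sets: one well-behaved, one with a single outlier.
calib_clean = [0.5, -0.9, 1.0]
calib_outlier = [0.5, -0.9, 60.0]

err_clean = mean_abs_error(activations, absmax_scale(calib_clean))
err_outlier = mean_abs_error(activations, absmax_scale(calib_outlier))
print(err_clean < err_outlier)  # True: the outlier stretches the range and coarsens every step
```

A single large calibration value widens the step size for every activation, which is the failure mode the abstract associates with older models such as OPT. For models whose activations contain few outliers, the two scales would be close and the choice of calibration set would matter much less.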
# ロボットがバーに足を踏み入れる:コメディーの創造性支援ツールとして言語モデルが生き残るか? : コメディアンによるLLMのユーモアアライメントの評価 A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians ( http://arxiv.org/abs/2405.20956v2 ) ライセンス: Link先を確認	Piotr Wojciech Mirowski, Juliette Love, Kory W. Mathewson, Shakir Mohamed,	(参考訳) 我々は2023年8月にエディンバラ・フェスティバル・フランジで行われた「AI x Comedy」のワークショップの一環として,聴衆の前でライブショーを行う20人のプロコメディアンにインタビューを行った。ワークショップは、大規模言語モデル(LLMs)によるコメディ執筆セッション、AIの創造性サポート指標を記述ツールとして評価するための人間とコンピュータのインタラクションのアンケート、AIの使用の動機とプロセスに対するコメディアンの疑問、バイアス、検閲、著作権に関する倫理的懸念などで構成された。参加者は、安全フィルタリングや指導訓練のLLMで使用されている既存のモデレーション戦略は、少数派とその視点を消去することでヘゲモニックな視点を強化し、検閲の一形態としてこれを認定した。同時に、ほとんどの参加者は、LLMが創造性支援ツールとして成功しなかったと感じ、1950年代の「船の喜劇の素材を掘り下げるが、少し人種差別的でない」というような、白地と偏見のある喜劇のトロープを制作した。我々の研究は、一方が有害な言論であり、他方が抵抗、風刺、そして '`punching up'' の実践である '`offensive'' 言語との微妙な相違についての学問を拡張している。我々はまた、そのような言語モデルの背後にあるグローバルな価値アライメントを疑問視し、アーティストのニーズに合うAIツールを構築するために、コミュニティベースの価値アライメントとデータオーナシップの重要性について議論する。 We interviewed twenty professional comedians who perform live shows in front of audiences and who use artificial intelligence in their artistic process as part of 3-hour workshops on ``AI x Comedy'' conducted at the Edinburgh Festival Fringe in August 2023 and online. The workshop consisted of a comedy writing session with large language models (LLMs), a human-computer interaction questionnaire to assess the Creativity Support Index of AI as a writing tool, and a focus group interrogating the comedians' motivations for and processes of using AI, as well as their ethical concerns about bias, censorship and copyright. Participants noted that existing moderation strategies used in safety filtering and instruction-tuned LLMs reinforced hegemonic viewpoints by erasing minority groups and their perspectives, and qualified this as a form of censorship. 
At the same time, most participants felt the LLMs did not succeed as a creativity support tool, producing bland and biased comedy tropes akin to ``cruise ship comedy material from the 1950s, but a bit less racist''. Our work extends scholarship about the subtle difference between, on the one hand, harmful speech, and on the other hand, ``offensive'' language as a practice of resistance, satire and ``punching up''. We also interrogate the global value alignment behind such language models, and discuss the importance of community-based value alignment and data ownership to build AI tools that better suit artists' needs.	翻訳日:2024-06-05 10:40:04 公開日:2024-06-03
# 一般化された「Notの平方根」行列とその隠れた論理作用素の発表および完全行列円ユーラー関数の定義への応用 Generalized "Square roots of Not" matrices, their application to the unveiling of hidden logical operators and to the definition of fully matrix circular Euler functions ( http://arxiv.org/abs/2107.06067v3 ) ライセンス: Link先を確認	Eduardo Mizraji,	(参考訳) ノットの平方根は量子コンピューティング理論において重要な論理演算子であり、それ自身で数学的対象として興味を持つ。物理学では、次元 2 の平方複素行列である。現在の研究において、これは任意の次元の複素正方行列である。線形代数の論理理論への導入は、近年、ニューラルネットワークと量子コンピューティングの分野の研究によって強化されている。ここでは、行列による論理演算の表現を簡潔に記述し、Not演算子の2乗根に対する一般表現がどのように得られるかを示す。次に2つのトピックを探求します。まず、Deutschのアルゴリズムの短い形式の非量子領域の拡張について検討する。そして、Not の根は虚数単位 i の行列拡大であると仮定し、この考えの下で、オイラー拡大と複素指数関数による円函数の表現に対する完全行列バージョンを得る。 The square root of Not is a logical operator of importance in quantum computing theory and of interest as a mathematical object in its own right. In physics, it is a square complex matrix of dimension 2. In the present work it is a complex square matrix of arbitrary dimension. The introduction of linear algebra into logical theory has been enhanced in recent decades by research in the fields of neural networks and quantum computing. Here we briefly describe the representation of logical operations through matrices and show how general expressions for the two square roots of the Not operator are obtained. Then, we explore two topics. First, we study an extension to a non-quantum domain of a short form of Deutsch's algorithm. Then, we assume that a root of Not is a matrix extension of the imaginary unit i, and under this idea we obtain fully matrix versions for the Euler expansions and for the representations of circular functions by complex exponentials.	翻訳日:2024-06-05 07:04:28 公開日:2024-06-03
# 言語理解のためのレイテンシ適応型トランスフォーマーエンコーダ Latency Adjustable Transformer Encoder for Language Understanding ( http://arxiv.org/abs/2201.03327v8 ) ライセンス: Link先を確認	Sajjad Kachuee, Mohammad Sharifkhani,	(参考訳) 自然言語理解モデルのレイテンシ、パワー、精度を調整することは、効率的なアーキテクチャの望ましい目的である。本稿では,提案する推論遅延の高速化により,推論コストを適応的に調整する効率的なトランスフォーマーアーキテクチャを提案する。微調整フェーズにおいて、提案手法は、重要でない隠れシーケンス要素(ワードベクター)を検出し、提案したAttention Context Contribution (ACC) メトリックを用いて、各エンコーダ層でそれらを除去する。ファインチューニングフェーズの後、新しいオフラインチューニング特性により、モデルの推論遅延を、それ以上のトレーニングをすることなく、広範囲の推論スピードアップ選択で調整することができる。提案手法をBERT_base, GPT-2, Flan-T5モデルに適用して評価を行った。大規模な実験では、高いトランスフォーマー層におけるワードベクタの大部分が、その後のレイヤへの寄与が少ないことが示されており、推論遅延を改善するためにそれらを取り除くことができる。 GLUEのような大規模な感情分析、分類、テキスト生成タスク、回帰ベンチマークによる実験の結果、この手法は入力のグローバルな文脈に最小限の影響を与えることなく、様々なデータセットに有効であることが示された。また,本手法を指導指導パラダイムで評価し,異なる種類のプロンプトを用いて評価した。提案手法は,BERT_base と GPT-2 の推論遅延を最大4.8倍,3.72倍に改善し,0.75% の精度低下と平均パープレキシティが可能である。提案するアプローチは、Large Language Models (LLMs) において、トレーニングには完全なネットワークが必要であるが、微調整フェーズで切り離すことができることを示唆している。 Adjusting the latency, power, and accuracy of natural language understanding models is a desirable objective of an efficient architecture. This paper proposes an efficient Transformer architecture that adjusts the inference computational cost adaptively with a desired inference latency speedup. In fine-tuning phase, the proposed method detects less important hidden sequence elements (word-vectors) and eliminates them in each encoder layer using a proposed Attention Context Contribution (ACC) metric. After the fine-tuning phase, with the novel offline-tuning property, the inference latency of the model can be adjusted in a wide range of inference speedup selections without any further training. The proposed method is applied to the BERT_base, GPT-2 and Flan-T5 models for evaluation. Extensive experiments show that most of the word-vectors in higher Transformer layers have less contribution to the subsequent layers; hence, they can be eliminated to improve the inference latency. 
Experimental results on extensive sentiment analysis, classification, text generation tasks and regression benchmarks like GLUE showed that the method is effective in various datasets with minimal impact on the input's global context. The method was also evaluated under the instruction tuning paradigm, and its performance was measured using different types of prompting. The proposed method mathematically and experimentally improves the inference latency of BERT_base and GPT-2 by up to 4.8 and 3.72 times with less than 0.75% accuracy drop and passable perplexity on average. The suggested approach posits that in Large Language Models (LLMs), although the complete network is necessary for training, it can be truncated during the fine-tuning phase.	翻訳日:2024-06-05 07:04:28 公開日:2024-06-03
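
The latency-adjustable encoder entry above describes dropping less important word-vectors in each layer according to an attention-based score. The abstract does not define the Attention Context Contribution (ACC) metric, so the sketch below substitutes a stand-in score (the average attention a token receives) purely as an assumption. The names `token_scores`, `prune_tokens` and `keep_ratio` are hypothetical and are not taken from the paper.

```python
import numpy as np

def token_scores(attn):
    """attn: (heads, seq, seq) attention probabilities, each row summing to 1.
    Stand-in importance score (an assumption, not the paper's ACC metric):
    the average attention token j receives over all heads and queries."""
    return attn.mean(axis=(0, 1))

def prune_tokens(hidden, attn, keep_ratio):
    """Keep the highest-scoring fraction of word-vectors, preserving order."""
    seq_len = hidden.shape[0]
    k = max(1, int(round(keep_ratio * seq_len)))
    scores = token_scores(attn)
    keep = np.sort(np.argsort(scores)[-k:])  # top-k indices, in sequence order
    return hidden[keep], keep

rng = np.random.default_rng(0)
seq_len, dim, heads = 8, 4, 2
hidden = rng.normal(size=(seq_len, dim))
logits = rng.normal(size=(heads, seq_len, seq_len))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # softmax

pruned, kept = prune_tokens(hidden, attn, keep_ratio=0.5)
print(pruned.shape, kept)
```

Changing `keep_ratio` at inference time, without retraining, loosely mirrors the adjustable speedup the abstract mentions; a smaller ratio leaves fewer vectors for later layers to process.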

Title

Authors

Abstract

Publication date / translation date

# 有害自殺検出

Harmful Suicide Content Detection ( http://arxiv.org/abs/2407.13942v1 )

License: see the linked page

Kyumin Park, Myung Jae Baik, YeongJun Hwang, Yen Shin, HoJae Lee, Ruda Lee, Sang Min Lee, Je Young Hannah Sun, Ah Rah Lee, Si Yeun Yoon, Dong-ho Lee, Jihyung Moon, JinYeong Bak, Kyunghyun Cho, Jong-Woo Paik, Sungjoon Park,

(参考訳) インターネット上の有害な自殺コンテンツは、脆弱な人口の自殺的思考や行動を引き起こす重要な危険因子である。世界的努力にもかかわらず、既存の資源、特に大韓民国のような高リスク地域では不足している。現在の研究は、内容の有害性を自動的に検出するのではなく、個人におけるそのような内容や自殺リスクのネガティブな影響を理解することに焦点を当てている。このギャップを埋めるために、オンライン自殺コンテンツを5つの有害レベルに分類する有害自殺コンテンツ検出タスクを導入する。我々は,医療専門家と共同でマルチモーダル・ベンチマークとタスク記述文書を開発し,大規模言語モデル(LLM)を活用して,そのようなコンテンツをモデレートするための効率的な手法を探索する。コントリビューションには,新たな検出タスクの提案,専門家アノテーションを用いたマルチモーダル韓国ベンチマーク,違法かつ有害なコンテンツの検出にLLMを用いた戦略の提案などが含まれている。潜在的な害が伴うため、倫理的検証プロセスを導入し、実装とベンチマークを公表します。

Harmful suicide content on the Internet is a significant risk factor inducing suicidal thoughts and behaviors among vulnerable populations. Despite global efforts, existing resources are insufficient, specifically in high-risk regions like the Republic of Korea. Current research mainly focuses on understanding negative effects of such content or suicide risk in individuals, rather than on automatically detecting the harmfulness of content. To fill this gap, we introduce a harmful suicide content detection task for classifying online suicide content into five harmfulness levels. We develop a multi-modal benchmark and a task description document in collaboration with medical professionals, and leverage large language models (LLMs) to explore efficient methods for moderating such content. Our contributions include proposing a novel detection task, a multi-modal Korean benchmark with expert annotations, and suggesting strategies using LLMs to detect illegal and harmful content. Owing to the potential harm involved, we publicize our implementations and benchmark, incorporating an ethical verification process.

Translated: 2024-08-05 01:55:24 Published: 2024-06-03

# 人工知能を使って集団知能を加速する - ポリシーシンスとよりスマートなクラウドソーシング

Using Artificial Intelligence to Accelerate Collective Intelligence: Policy Synth and Smarter Crowdsourcing ( http://arxiv.org/abs/2407.13960v1 )

License: see the linked page

Róbert Bjarnason, Dane Gambrell, Joshua Lanthier-Welch,

(参考訳) 社会の急激な変化と複雑な課題を特徴とする時代には、公共セクターにおける従来の問題解決方法が不十分になってきている。本研究では, 人工知能を用いて, 緊急時問題に対する効果的な解を, より効率的に生成することのできる, 革新的で効果的なモデルを提案する。クラウドソーシングを通じて、問題に関する専門知識を持つ人々の集合的インテリジェンスを行動可能なソリューションに変換するために設計された、Smarter Crowdsourcingと呼ばれる、実証済みの集合的インテリジェンス手法について説明する。次に、AIを活用する革新的なツールキットであるPolicy Synthを紹介します。 Policy Synthは人間中心のアプローチを使って開発されており、AIは人間の知性と創造性を高めるツールであり、それを置き換えるものではない、と認識している。専門家のクラウドソーシングの結果と、ポリシーシンスAIエージェントが支援する専門家のクラウドソーシング結果を比較した実世界のケーススタディに基づいて、我々は、ポリシーシンスによるスマートクラウドソーシングが、人間の専門家の集合的な知恵とAIの計算力を統合して、公共の問題解決プロセスの強化とスケールアップに有効なモデルを提供すると結論付けた。既存の多くのアプローチでは、AIをクラウドソーシングと熟考プロセスをより効率的にするためのツールとして見ているが、Policy Synthはさらに一歩進んで、AIが研究と共にエンゲージメントからの発見を合成し、エビデンスベースのソリューションとポリシーを開発するために使用できることを認識している。この研究は、緊急の社会的課題に対処するために、コミュニティを効果的に取り組もうとする機関に対して、実践的なツールと洞察を提供する。

In an era characterized by rapid societal changes and complex challenges, institutions' traditional methods of problem-solving in the public sector are increasingly proving inadequate. In this study, we present an innovative and effective model for how institutions can use artificial intelligence to enable groups of people to generate effective solutions to urgent problems more efficiently. We describe a proven collective intelligence method, called Smarter Crowdsourcing, which is designed to channel the collective intelligence of those with expertise about a problem into actionable solutions through crowdsourcing. Then we introduce Policy Synth, an innovative toolkit which leverages AI to make the Smarter Crowdsourcing problem-solving approach both more scalable, more effective and more efficient. Policy Synth is crafted using a human-centric approach, recognizing that AI is a tool to enhance human intelligence and creativity, not replace it. Based on a real-world case study comparing the results of expert crowdsourcing alone with expert sourcing supported by Policy Synth AI agents, we conclude that Smarter Crowdsourcing with Policy Synth presents an effective model for integrating the collective wisdom of human experts and the computational power of AI to enhance and scale up public problem-solving processes. While many existing approaches view AI as a tool to make crowdsourcing and deliberative processes better and more efficient, Policy Synth goes a step further, recognizing that AI can also be used to synthesize the findings from engagements together with research to develop evidence-based solutions and policies. The study offers practical tools and insights for institutions looking to engage communities effectively in addressing urgent societal challenges.

Translated: 2024-08-05 01:55:24 Published: 2024-06-03

# コンピューターの創造性は死んだインターネットで繁栄しているか?

Is computational creativity flourishing on the dead internet? ( http://arxiv.org/abs/2407.17590v1 )

License: see the linked page

Terence Broad,

(参考訳) 死んだインターネット理論は、ソーシャルメディア上のすべてのインタラクションとポストは、もはや現実の人間ではなく、自律的なボットによって作られている、という陰謀論である。この理論は明らかに真実ではないが、ソーシャルメディアへの投稿が増えているのは、フォロワーを獲得してソーシャルメディアプラットフォームへのエンゲージメントを促進するために最適化されたボットによるものだ。本稿では、これらのボットの最近の現象を考察し、それらの振る舞いを計算的創造性のレンズを通して分析し、その疑問を考察する: 計算的創造性は死んだインターネット上で繁栄しているか?

The dead internet theory is a conspiracy theory that states that all interactions and posts on social media are no longer being made by real people, but rather by autonomous bots. While the theory is obviously not true, an increasing number of posts on social media have been made by bots optimised to gain followers and drive engagement on social media platforms. This paper looks at the recent phenomenon of these bots, analysing their behaviour through the lens of computational creativity to investigate the question: is computational creativity flourishing on the dead internet?

Translated: 2024-08-05 01:35:56 Published: 2024-06-03

# テキスト・画像拡散モデルのための分割自由誘導法

Segmentation-Free Guidance for Text-to-Image Diffusion Models ( http://arxiv.org/abs/2407.04800v1 )

License: see the linked page

Kambiz Azarian, Debasmit Das, Qiqi Hou, Fatih Porikli,

(参考訳) 安定拡散のようなテキストと画像の拡散モデルのための新しい手法であるセグメンテーションフリーガイダンスを導入する。拡散モデルの再学習は不要である。追加の計算コストなしでは、拡散モデル自体をインプリッドセグメンテーションネットワークとして使用し、したがってセグメンテーションフリーガイダンスと呼ばれ、プロンプトの概念に対するパッチの関連性に基づいて、生成された画像の各パッチに対する負のプロンプトを動的に調整する。 FID,CLIP,IS,PickScoreを主観的,主観的に評価する。主観評価には,MS COCO-30Kのようなデータセットのプロンプトをサブサンプリングする手法も提案する。その結果,広く使用されている分類器フリー手法に対するセグメント化フリーガイダンスの優位性を示した。人間の評価者は、分類子なしの60%から19%よりもセグメンテーションなしの指導を好んだが、その18%は強い嗜好を示した。さらに、最近提案された人間の嗜好を模倣する指標であるPickScore win-rateも、分類器フリーよりもメソッドの好みを示している。

We introduce segmentation-free guidance, a novel method designed for text-to-image diffusion models like Stable Diffusion. Our method does not require retraining of the diffusion model. At no additional compute cost, it uses the diffusion model itself as an implied segmentation network, hence named segmentation-free guidance, to dynamically adjust the negative prompt for each patch of the generated image, based on the patch's relevance to concepts in the prompt. We evaluate segmentation-free guidance both objectively, using FID, CLIP, IS, and PickScore, and subjectively, through human evaluators. For the subjective evaluation, we also propose a methodology for subsampling the prompts in a dataset like MS COCO-30K to keep the number of human evaluations manageable while ensuring that the selected subset is both representative in terms of content and fair in terms of model performance. The results demonstrate the superiority of our segmentation-free guidance to the widely used classifier-free method. Human evaluators preferred segmentation-free guidance over classifier-free 60% to 19%, with 18% of occasions showing a strong preference. Additionally, PickScore win-rate, a recently proposed metric mimicking human preference, also indicates a preference for our method over classifier-free.

Translated: 2024-07-22 14:29:03 Published: 2024-06-03
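
The segmentation-free guidance entry above builds on classifier-free guidance with a negative prompt that varies per image patch. The abstract does not give the update rule, so the sketch below only illustrates the general idea under stated assumptions. `guided_noise` is the standard classifier-free combination, while `per_patch_negative`, the `choice` map and the array shapes are hypothetical illustrations rather than the authors' algorithm.

```python
import numpy as np

def guided_noise(eps_cond, eps_neg, scale):
    """Classifier-free guidance: start from the negative (or unconditional)
    noise prediction and move toward the conditional one by `scale`."""
    return eps_neg + scale * (eps_cond - eps_neg)

def per_patch_negative(eps_negs, choice):
    """eps_negs: (N, H, W, C) predictions for N candidate negative prompts.
    choice: (H, W) integer map picking one candidate per spatial location
    (hypothetical; the paper derives relevance from the model itself)."""
    idx = choice[None, :, :, None]                        # (1, H, W, 1)
    return np.take_along_axis(eps_negs, idx, axis=0)[0]   # (H, W, C)

rng = np.random.default_rng(0)
H, W, C, N = 4, 4, 3, 2
eps_cond = rng.normal(size=(H, W, C))
eps_negs = rng.normal(size=(N, H, W, C))
choice = rng.integers(0, N, size=(H, W))

eps_neg = per_patch_negative(eps_negs, choice)
eps = guided_noise(eps_cond, eps_neg, scale=7.5)
print(eps.shape)
```

With `scale=1.0` the result reduces to the conditional prediction, which is a quick sanity check on the combination rule.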

# グラディエントDescent法によるMU-MIMO放送チャンネルの連成星形成

Joint Constellation Shaping Using Gradient Descent Approach for MU-MIMO Broadcast Channel ( http://arxiv.org/abs/2407.07708v1 )

License: see the linked page

Maxime Vaillant, Alix Jeannerot, Jean-Marie Gorce,

(参考訳) 我々は,マルチユーザMIMO放送チャンネル(T$Txアンテナ,K$ユーザ,それぞれ$R$Rxアンテナ)のコンステレーションを,完全チャネル知識で最適化するための学習ベースのアプローチを導入する。最適化器(MAX-MIN)の目的は、送信機と受信機間の最小の相互情報を和力制約の下で最大化することである。提案手法は、送信機に重ね合わせ符号(SC)やその他の線形プリコーディングの使用や、受信機での逐次干渉キャンセル(SIC)の使用を強制しない。その代わりに、各受信機$k$のサブスペースへの投影を最適化し、送信された各バイナリ入力$W_k$と意図された受信機$Y_k$の出力信号との間の最小相互情報$I(W_k;Y_k)$を最大化する。本手法により得られたレートは,線形プリコーダで得られたレートと比較される。

We introduce a learning-based approach to optimize a joint constellation for a multi-user MIMO broadcast channel ($T$ Tx antennas, $K$ users, each with $R$ Rx antennas), with perfect channel knowledge. The aim of the optimizer (MAX-MIN) is to maximize the minimum mutual information between the transmitter and each receiver, under a sum-power constraint. The proposed optimization method requires neither superposition coding (SC) or any other linear precoding at the transmitter, nor successive interference cancellation (SIC) at the receiver. Instead, the approach designs a joint constellation, optimized such that its projection into the subspace of each receiver $k$ maximizes the minimum mutual information $I(W_k;Y_k)$ between each transmitted binary input $W_k$ and the output signal at the intended receiver $Y_k$. The rates obtained by our method are compared to those achieved with linear precoders.

Translated: 2024-07-22 13:58:01 Published: 2024-06-03

# ディープスパイクニューロンネットワークの効率化に向けて:圧縮に関する調査研究

Toward Efficient Deep Spiking Neuron Networks:A Survey On Compression ( http://arxiv.org/abs/2407.08744v1 )

License: see the linked page

Hui Xie, Ge Yang, Wenjuan Gao,

(参考訳) ディープラーニングの急速な発展に伴い、Deep Spiking Neural Networks(DSNN)は、独自のスパイクイベント処理と非同期計算のために、有望な存在として現れている。ニューロモルフィックチップにデプロイすると、DSNNはディープ・ニューラル・ニューラルネットワーク(DANN)よりも大きなパワーアドバンテージを提供し、スパイク(0または1)のバイナリの性質による時間とエネルギー消費の乗算をなくす。さらに、DSNNは時間情報の処理に優れており、DANNよりも時間データの処理に優れている可能性がある。しかし、その深いネットワーク構造と多くのパラメータは計算コストとエネルギー消費を増大させ、実際の展開を制限する。 DSNNの効率を高めるために、研究者は、プルーニング、量子化、知識蒸留といったDANNの手法を応用し、スパイクシューティングやプルーニングタイムステップの削減のような特定の技術を開発した。以前の調査では、DSNNのアルゴリズム、ハードウェアデプロイメント、一般的な概要をカバーしていたが、DSNNの圧縮と効率性についての研究は欠如している。本研究では,効率的なDSNNとその圧縮手法に集中することで,このギャップを解消する。 DSNNの生物学的背景と計算単位の探索から始まり、DANNとの違いを強調している。その後、プルーニング、量子化、知識の蒸留、スパイク発火の低減など様々な圧縮手法を練り込み、今後の研究の方向性を示唆した。

With the rapid development of deep learning, Deep Spiking Neural Networks (DSNNs) have emerged as a promising approach due to their unique spike event processing and asynchronous computation. When deployed on neuromorphic chips, DSNNs offer significant power advantages over Deep Artificial Neural Networks (DANNs) and eliminate time- and energy-consuming multiplications due to the binary nature of spikes (0 or 1). Additionally, DSNNs excel in processing temporal information, making them potentially superior for handling temporal data compared to DANNs. However, their deep network structure and numerous parameters result in high computational costs and energy consumption, limiting real-life deployment. To enhance DSNN efficiency, researchers have adapted methods from DANNs, such as pruning, quantization, and knowledge distillation, and developed specific techniques like reducing spike firing and pruning time steps. While previous surveys have covered DSNN algorithms, hardware deployment, and general overviews, focused research on DSNN compression and efficiency has been lacking. This survey addresses this gap by concentrating on efficient DSNNs and their compression methods. It begins with an exploration of DSNNs' biological background and computational units, highlighting differences from DANNs. It then delves into various compression methods, including pruning, quantization, knowledge distillation, and reducing spike firing, and concludes with suggestions for future research directions.

Translated: 2024-07-22 13:48:17 Published: 2024-06-03
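
The DSNN survey entry above notes that binary spikes remove multiplications from the synaptic sum. A minimal sketch of that point, assuming a simple leaky integrate-and-fire layer: the function name `lif_step`, the `leak` and `threshold` values, and the reset-to-zero rule are illustrative assumptions, not details from the survey.

```python
import numpy as np

def lif_step(v, spikes_in, weights, threshold=1.0, leak=0.9):
    """One step of a leaky integrate-and-fire layer.
    spikes_in is binary, so the input current is just the sum of the weight
    rows whose input neuron fired; no input-weight multiplications needed."""
    current = weights[spikes_in.astype(bool)].sum(axis=0)
    v = leak * v + current
    spikes_out = (v >= threshold).astype(np.int8)
    v = np.where(spikes_out == 1, 0.0, v)  # reset neurons that fired
    return v, spikes_out

weights = np.array([[0.6, 0.2],
                    [0.5, 0.1],
                    [0.3, 0.9]])           # (inputs, outputs)
spikes_in = np.array([1, 1, 0])            # inputs 0 and 1 fired
v, spikes_out = lif_step(np.zeros(2), spikes_in, weights)
print(v, spikes_out)
```

The row-selection sum gives the same current as the dense product `spikes_in @ weights`, which is why pruning weights or reducing spike firing both cut work directly.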

# 汎用人工知能システムの設計と強化のための進化計算:調査と展望

Evolutionary Computation for the Design and Enrichment of General-Purpose Artificial Intelligence Systems: Survey and Prospects ( http://arxiv.org/abs/2407.08745v1 )

License: see the linked page

Javier Poyatos, Javier Del Ser, Salvador Garcia, Hisao Ishibuchi, Daniel Molina, Isaac Triguero, Bing Xue, Xin Yao, Francisco Herrera,

(参考訳) 人工知能では、多様な学習タスクを扱うことができる適応モデルへの需要が増加しており、単一のタスクに対処するために考案されたシステムの制限を超越している。最近の汎用人工知能システム(GPAIS)の出現は、従来の機械学習モデルの最適設計よりもはるかに複雑なスケールでモデル構成と適応性の問題を引き起こす。進化計算(Evolutionary Computation:EC)は、機械学習モデルの設計と最適化の両方に有用なツールであり、考慮中のタスクに自分自身を設定および/または適応する能力を提供する。したがって、GPAISへの応用は自然な選択である。本稿では,GPAISの分野におけるECの役割を解析し,その設計や富化におけるECの利用について検討する。私たちはまた、GPAISのプロパティを、ECが目立った貢献をした機械学習領域にマッチさせ、GPAISのECの最近のマイルストーンを強調します。さらに、GPAISにおけるECのメリットを活用し、GPAISをECで設計・改善するための異なる戦略を提示し、接する領域をカバーし、研究ニッチを識別し、ECとGPAISの潜在的研究方向性を概説する課題についても論じる。

In Artificial Intelligence, there is an increasing demand for adaptive models capable of dealing with a diverse spectrum of learning tasks, surpassing the limitations of systems devised to cope with a single task. The recent emergence of General-Purpose Artificial Intelligence Systems (GPAIS) poses model configuration and adaptability challenges at far greater complexity scales than the optimal design of traditional Machine Learning models. Evolutionary Computation (EC) has been a useful tool for both the design and optimization of Machine Learning models, endowing them with the capability to configure and/or adapt themselves to the task under consideration. Therefore, their application to GPAIS is a natural choice. This paper aims to analyze the role of EC in the field of GPAIS, exploring the use of EC for their design or enrichment. We also match GPAIS properties to Machine Learning areas in which EC has had a notable contribution, highlighting recent milestones of EC for GPAIS. Furthermore, we discuss the challenges of harnessing the benefits of EC for GPAIS, presenting different strategies to both design and improve GPAIS with EC, covering tangential areas, identifying research niches, and outlining potential research directions for EC and GPAIS.

Translated: 2024-07-22 13:48:17 Published: 2024-06-03

# Twitterボット分類のための時系列スパイクニューラルネットワークにおけるイベント空間の反復

Iteration over event space in time-to-first-spike spiking neural networks for Twitter bot classification ( http://arxiv.org/abs/2407.08746v1 )

License: see the linked page

Mateusz Pabian, Dominik Rzepka, Mirosław Pawlak,

(参考訳) 本研究では,従来の時分割スパイクスパイクニューラルネットワーク(SNN)モデルを拡張して,時間とともに情報を処理するフレームワークを提案する。本稿では、各ニューロンにおける複数の入力と出力のスパイクを持つモデルによるスパイク伝播と、エンドツーエンドのバックプロパゲーションのためのトレーニングルールの設計について説明する。この戦略により、時間とともに変化する情報を処理できます。モデルは、イベントの時間(ツイートとリツイート)が情報の主要キャリアであるTwitterボット検出タスクでトレーニングされ、評価される。このタスクは、提案されたSNNが、時間スケールで発生した数百のイベントからなるスパイクトレインデータをどのように扱うかを評価するために選択された。各種パラメータがモデル特性,性能,訓練時間安定性に与える影響を解析した。

This study proposes a framework that extends existing time-coding time-to-first-spike spiking neural network (SNN) models to allow processing information changing over time. We explain spike propagation through a model with multiple input and output spikes at each neuron, as well as design training rules for end-to-end backpropagation. This strategy enables us to process information changing over time. The model is trained and evaluated on a Twitter bot detection task where the time of events (tweets and retweets) is the primary carrier of information. This task was chosen to evaluate how the proposed SNN deals with spike train data composed of hundreds of events occurring at timescales differing by almost five orders of magnitude. The impact of various parameters on model properties, performance and training-time stability is analyzed.

Translated: 2024-07-22 13:48:17 Published: 2024-06-03
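
The Twitter bot entry above iterates over spike events rather than fixed time steps, which matters when event times span several orders of magnitude. The sketch below shows only that idea for a single neuron, assuming instantaneous synapses, no leak, and reset to zero after firing. These simplifications and the name `output_spike_times` are assumptions; the paper's neuron model and training rules are not reproduced here.

```python
def output_spike_times(events, threshold=1.0):
    """events: iterable of (time, weight) input spikes for one neuron.
    The membrane potential only changes when an event arrives, so the loop
    visits events in time order instead of stepping through a time grid."""
    v = 0.0
    out = []
    for t, w in sorted(events):
        v += w
        if v >= threshold:
            out.append(t)  # the neuron fires at the time of this event
            v = 0.0        # reset after firing
    return out

# Event times range from milliseconds to hundreds of seconds.
events = [(300.0, 0.9), (0.001, 0.6), (5.0, 0.5), (5.0001, 0.2)]
print(output_spike_times(events))
```

The cost of the loop depends on the number of events, not on the ratio between the longest and shortest gaps between them.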

# 大規模言語モデルのライフサイクル:教育におけるバイアスの概観

The Life Cycle of Large Language Models: A Review of Biases in Education ( http://arxiv.org/abs/2407.11203v1 )

License: see the linked page

Jinsook Lee, Yann Hicke, Renzhe Yu, Christopher Brooks, René F. Kizilcec,

(参考訳) 大規模言語モデル(LLM)は、学生や教師にパーソナライズされたサポートを提供するために、教育の文脈でますます採用されている。自然言語を理解・生成するLLMベースのアプリケーションの前例のない能力は、指導効果と学習結果を改善する可能性があるが、教育技術におけるLLMの統合は、教育的不平等を悪化させる可能性のあるアルゴリズムバイアスに対して、新たな懸念を抱いている。本稿では,従来の機械学習のライフサイクルをマッピングするための先行研究に基づいて,LLMの初期開発から教育環境における各種応用のための事前学習モデルのカスタマイズまで,LCMのライフサイクルの全体地図を提供する。 LLMのライフサイクルにおける各ステップを説明し、教育の文脈で生じる可能性のあるバイアスの原因を特定する。従来の機械学習による偏見は、テキストが高次元であること、複数の正しい応答が存在すること、不公平であることより、教育におけるLLM生成コンテンツへの変換に失敗する可能性があること、などについて論じる。本論は,LLMアプリケーションにおける偏見の複雑な性質を明らかにすることを目的として,その評価のための実践的ガイダンスを提供する。

Large Language Models (LLMs) are increasingly adopted in educational contexts to provide personalized support to students and teachers. The unprecedented capacity of LLM-based applications to understand and generate natural language can potentially improve instructional effectiveness and learning outcomes, but the integration of LLMs in education technology has renewed concerns over algorithmic bias which may exacerbate educational inequities. In this review, building on prior work on mapping the traditional machine learning life cycle, we provide a holistic map of the LLM life cycle from the initial development of LLMs to customizing pre-trained models for various applications in educational settings. We explain each step in the LLM life cycle and identify potential sources of bias that may arise in the context of education. We discuss why current measures of bias from traditional machine learning fail to transfer to LLM-generated content in education, such as tutoring conversations, because the text is high-dimensional, there can be multiple correct responses, and tailoring responses may be pedagogically desirable rather than unfair. This review aims to clarify the complex nature of bias in LLM applications and provide practical guidance for their evaluation to promote educational equity.

Translated: 2024-07-22 12:00:08 Published: 2024-06-03

# 学習バディとしてのジェネレーティブAI : 教員の使い方と態度

Generative AI as a Learning Buddy and Teaching Assistant: Pre-service Teachers' Uses and Attitudes ( http://arxiv.org/abs/2407.11983v1 )

License: see the linked page

Matthew Nyaaba, Lehong Shi, Macharious Nabang, Xiaoming Zhai, Patrick Kyeremeh, Samuel Arthur Ayoberd, Bismark Nyaaba Akanzire,

(参考訳) 先進的な教員(PST)のユーザ体験と生成的人工知能(GenAI)アプリケーションに対する認識を明らかにするために,Ghana PSTsの学習仲間および指導助手としてのGenAIの具体的な使用状況と,それらの応用に対する態度を調査した。探索的因子分析(EFA)を用いて,PSTのGenAIに対する態度を形作る3つの重要な要因を同定した。これらの要因の平均スコアは、GenAIに対する概して肯定的な態度を示し、PSTのコンテンツ知識を高め、学習や教材へのアクセスを可能とすることで、同僚の援助の必要性を減らした。特に、PSTは、GenAIを学習仲間として、読み物、深い内容の説明、実践例へのアクセス、教材の強化、アセスメント戦略の展開、プランニングの指導支援として利用している。回帰分析の結果,年齢,性別,研究年数などの背景因子はPSTsのGenAIに対する態度を予測しないが,年齢と研究年数はGenAIの使用頻度を有意に予測する一方で,性別は予測しないことがわかった。これらの結果から,教員教育プログラムにおける高齢者のPSTとそれに伴うPSTは,より頻繁にGenAIを使用する可能性があるが,その適用に対する認識は変化していないことが示唆された。しかし、PSTはGenAIアプリケーションが提供する情報の正確性と信頼性に関する懸念を表明している。そこで我々は,これらの懸念に対処し,教員準備プログラムにおいてPSTが確実にこれらの応用に頼れるようにすることを提案する。さらに,PSTの学習・教育プロセスにGenAIをより効果的に統合するための戦略を推奨する。

To uncover pre-service teachers' (PSTs') user experience and perceptions of generative artificial intelligence (GenAI) applications, we surveyed 167 PSTs in Ghana about their specific uses of GenAI as a learning buddy and teaching assistant, and their attitudes towards these applications. Employing exploratory factor analysis (EFA), we identified three key factors shaping PSTs' attitudes towards GenAI: teaching, learning, and ethical and advocacy factors. The mean scores of these factors revealed a generally positive attitude towards GenAI, indicating high levels of agreement on its potential to enhance PSTs' content knowledge and access to learning and teaching resources, thereby reducing their need for assistance from colleagues. Specifically, PSTs use GenAI as a learning buddy to access reading materials, in-depth content explanations, and practical examples, and as a teaching assistant to enhance teaching resources, develop assessment strategies, and plan lessons. A regression analysis showed that background factors such as age, gender, and year of study do not predict PSTs' attitudes towards GenAI, but age and year of study significantly predict the frequency of their use of GenAI, while gender does not. These findings suggest that older PSTs and those further along in their teacher education programs may use GenAI more frequently, but their perceptions of the application remain unchanged. However, PSTs expressed concerns about the accuracy and trustworthiness of the information provided by GenAI applications. We, therefore, suggest addressing these concerns to ensure PSTs can confidently rely on these applications in their teacher preparation programs. Additionally, we recommend targeted strategies to integrate GenAI more effectively into both learning and teaching processes for PSTs.

Translated: 2024-07-22 11:50:18 Published: 2024-06-03

# AI開発とガバナンスへの参加的アプローチ:原則的アプローチ

Participatory Approaches in AI Development and Governance: A Principled Approach ( http://arxiv.org/abs/2407.13100v1 )

License: see the linked page

Ambreesh Parthasarathy, Aditya Phalnikar, Ameen Jauhar, Dhruv Somayajula, Gokul S Krishnan, Balaraman Ravindran,

(参考訳) 人工知能(AI)技術が公共部門や民間セクターに広く採用され、新しい、予期せぬ方法で人々の生活に大きな影響を与えている。この文脈では、設計、開発、デプロイメントがどのように行われるかを知ることが重要になります。この調査の結果、これらのシステムの展開によって影響を受けそうな人は、どのように開発されているかはほとんど語られていないことが明らかとなった。この研究は、より責任があり、安全で、人間中心のAIシステムを構築し、使用するのに、参加的アプローチが(実用的にも規範的にも)有益である、という前提を推し進めている。厳密には、これはプロセスの公正性を高め、市民が自分の生活に大きな影響を及ぼす可能性のあるシステムへの関心を喚起する権限を与える。実際には、AIアルゴリズムの品質向上に役立ちそうな、新たな情報手段を開発者に提供します。論文はまず,AIシステムのライフサイクルを説明することによって,この議論を推し進める。第2に,参加型エクササイズにおいて関連する利害関係者を特定するために使用される基準を特定し,第3に,関連する利害関係者をAIライフサイクルの異なる段階にマッピングすることによって,この議論を推し進める。本稿は、AIにおける参加型ガバナンスに関する2部構成のシリーズの第1部を構成する。第2の論文では、本論文で開発された原則を拡張し、拡張し、実際のAIシステムのユースケースに適用する。

The widespread adoption of Artificial Intelligence (AI) technologies in the public and private sectors has resulted in them significantly impacting the lives of people in new and unexpected ways. In this context, it becomes important to inquire how their design, development and deployment take place. Upon this inquiry, it is seen that persons who will be impacted by the deployment of these systems have little to no say in how they are developed. Seeing this as a lacuna, this research study advances the premise that a participatory approach is beneficial (both practically and normatively) to building and using more responsible, safe, and human-centric AI systems. Normatively, it enhances the fairness of the process and empowers citizens in voicing concerns about systems that may heavily impact their lives. Practically, it provides developers with new avenues of information which will be beneficial to them in improving the quality of the AI algorithm. The paper advances this argument first, by describing the life cycle of an AI system; second, by identifying criteria which may be used to identify relevant stakeholders for a participatory exercise; and third, by mapping relevant stakeholders to different stages of the AI life cycle. This paper forms the first part of a two-part series on participatory governance in AI. The second paper will expand upon and concretise the principles developed in this paper and apply the same to actual use cases of AI systems.

Translated: 2024-07-22 08:07:30 Published: 2024-06-03

# AI開発とガバナンスへの参加的アプローチ:ケーススタディ

Participatory Approaches in AI Development and Governance: Case Studies ( http://arxiv.org/abs/2407.13103v1 )

License: see the linked page

Ambreesh Parthasarathy, Aditya Phalnikar, Gokul S Krishnan, Ameen Jauhar, Balaraman Ravindran,

(参考訳) 本稿では、AI開発と展開への参加的アプローチの価値に関する2部シリーズの第2部を構成する。最初の論文は、この2つのエクササイズ(つまり、AIの開発と展開)に参加メソッドをデプロイするための、原則と実践的な正当化を考案した。現実的な正当化は、よりきめ細かい情報を提供することで、全体的なアルゴリズムの品質を向上させることである。より原則化された正当化は、アルゴリズムの展開に影響を受けそうな人たちへの声を提供し、AIシステムの信頼と購入を築こうとするエンゲージメントを通じて実現している。参加型アプローチでは、AIシステムのライフサイクルを通じて、実際の意思決定プロセスにさまざまな利害関係者(特定の方法を定義する)を含めます。上記の正当化にもかかわらず、実際の実装は、プロセス全体の利害関係者の特定方法、どのような情報が提供され、どのように組み込まれているかに大きく依存する。本稿では、これらの予備的な結論を、法と秩序の覚醒における顔認識技術の使用と、医療分野における大規模言語モデルの使用の2つの分野で検証する。これらの部門は2つの主要な理由から選ばれた。 Facial Recognition Technologiesは、よく研究され、その影響が十分に文書化されているAIソリューションの分野であるため、PAIを既存のドメイン、特に最近かなり批判的な領域に適応するさまざまな側面を説明するための確立されたスペースを提供する。医療分野におけるLLMは、比較的研究の少ない分野のキャンバスを提供し、イノベーションが常に患者の福祉と整合しなくてはならない分野において、比較的新しい技術のためにPAIの原則を具現化する方法を、どのように想像できるかを説明するのに役立つ。

This paper forms the second of a two-part series on the value of a participatory approach to AI development and deployment. The first paper crafted a principled, as well as pragmatic, justification for deploying participatory methods in these two exercises (that is, development and deployment of AI). The pragmatic justification is that it improves the quality of the overall algorithm by providing more granular and minute information. The more principled justification is that it offers a voice to those who are going to be affected by the deployment of the algorithm, and through engagement attempts to build trust and buy-in for an AI system. By a participatory approach, we mean including various stakeholders (defined a certain way) in the actual decision making process through the life cycle of an AI system. Despite the justifications offered above, actual implementation depends crucially on how stakeholders in the entire process are identified, what information is elicited from them, and how it is incorporated. This paper will test these preliminary conclusions in two sectors, the use of facial recognition technology in the upkeep of law and order and the use of large language models in the healthcare sector. These sectors have been chosen for two primary reasons. Since Facial Recognition Technologies are a branch of AI solutions that are well-researched and the impact of which is well documented, they provide an established space to illustrate the various aspects of adapting PAI to an existing domain, especially one that has been quite contentious in the recent past. LLMs in healthcare provide a canvas for a relatively less explored space, and help us illustrate how one could possibly envision enshrining the principles of PAI for a relatively new technology, in a space where innovation must always align with patient welfare.

翻訳日:2024-07-22 08:07:30 公開日:2024-06-03

# MOT:アルゴリズム取引のための最適輸送によるアクター強化学習手法の混合

MOT: A Mixture of Actors Reinforcement Learning Method by Optimal Transport for Algorithmic Trading ( http://arxiv.org/abs/2407.01577v1 )

ライセンス: Link先を確認

Xi Cheng, Jinghao Zhang, Yunan Zeng, Wenfang Xue,

(参考訳) アルゴリズム取引は、自動的に特定された取引機会に基づいて、特定の資産の売買注文を実行することを指す。強化学習(RL)に基づく戦略は,アルゴリズム取引問題に対処する際,顕著な能力を示した。しかし、データ分布のシフトにより、取引パターンは市場状況によって異なる。データ内の複数のパターンを無視することは、RLのパフォーマンスを損なう。本稿では,複数のアクターを分離表現(disentangled representation)学習で設計し,市場の異なるパターンをモデル化するMOTを提案する。さらに、正則化損失項を導入することにより、サンプルを適切なアクターに割り当てるために、最適輸送(OT)アルゴリズムを組み込む。さらに,アクターの出力を専門家の戦略と整合させ,RLの探索と活用のバランスを良くすることで,模倣学習を容易にするためのPretrain Moduleを提案する。実際の先物市場データによる実験結果から,MOTはリスクのバランスを保ちながら優れた収益性を示すことが示された。アブレーション研究はMOTの成分の有効性を検証する。

Algorithmic trading refers to executing buy and sell orders for specific assets based on automatically identified trading opportunities. Strategies based on reinforcement learning (RL) have demonstrated remarkable capabilities in addressing algorithmic trading problems. However, trading patterns differ across market conditions because the data distribution shifts. Ignoring multiple patterns in the data will undermine the performance of RL. In this paper, we propose MOT, which designs multiple actors with disentangled representation learning to model the different patterns of the market. Furthermore, we incorporate the Optimal Transport (OT) algorithm to allocate samples to the appropriate actor by introducing a regularization loss term. Additionally, we propose a Pretrain Module to facilitate imitation learning by aligning the outputs of actors with an expert strategy and to better balance the exploration and exploitation of RL. Experimental results on real futures market data demonstrate that MOT exhibits excellent profit capabilities while balancing risks. Ablation studies validate the effectiveness of the components of MOT.
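
As a rough illustration of how optimal transport can allocate samples to actors, the sketch below runs plain Sinkhorn iterations on a toy cost matrix. It is not taken from the MOT paper: the function name `sinkhorn_assign`, the cost values, the uniform marginals, and the choice of entropy-regularised OT are all assumptions made for this example.

```python
import math

def sinkhorn_assign(cost, eps=0.5, iters=200):
    """Entropy-regularised OT between n samples and k actors (uniform marginals).

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    n, k = len(cost), len(cost[0])
    a = [1.0 / n] * n  # every sample carries the same mass
    b = [1.0 / k] * k  # every actor should receive the same total mass
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * k
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]

# Toy cost: how badly each of 4 samples fits each of 2 actors (made-up numbers).
cost = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.9, 0.1],
    [0.7, 0.3],
]
plan = sinkhorn_assign(cost)
# Hard assignment: each sample goes to the actor that receives most of its mass.
assignment = [max(range(len(row)), key=row.__getitem__) for row in plan]
print(assignment)
```

The balanced marginals are what keep any single actor from absorbing every sample; in a training loop the transport plan would more plausibly feed a regularization term than a hard assignment.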

翻訳日:2024-07-07 13:34:23 公開日:2024-06-03

# 反復的局所探索-スパロー探索アルゴリズムに基づくユーザVR体験予測のためのランダムフォレスト機械学習アルゴリズムの最適化

Optimising Random Forest Machine Learning Algorithms for User VR Experience Prediction Based on Iterative Local Search-Sparrow Search Algorithm ( http://arxiv.org/abs/2406.16905v1 )

ライセンス: Link先を確認

Xirui Tang, Feiyang Li, Zinan Cao, Qixuan Yu, Yulu Gong,

(参考訳) 本稿では,空間探索アルゴリズムと局所探索最適化スパロウ探索アルゴリズムにより改良されたランダムフォレストアルゴリズムを導入することにより,VRユーザエクスペリエンス予測の改善手法について検討する。この研究はまずデータを統計的に分析し、続いて従来のランダム森林モデルを用いて訓練および試験を行い、スパロウ探索アルゴリズムによって改良されたランダム森林モデルと、反復的局所探索-スパロー探索アルゴリズムに基づいて改良されたランダム森林アルゴリズムを用いてランダム森林モデルを構築した。その結果、従来のランダム林モデルでは、トレーニングセットで93%の予測精度を持つが、一般化が不十分なテストセットでは73.3%に過ぎず、一方、スパロウ探索アルゴリズムで改良されたモデルは、従来のモデルと比較して94%の予測精度を持つことがわかった。さらに注目すべきは、反復的な局所探索-スパロー探索アルゴリズムに基づく改良されたモデルが、トレーニングとテストセットの両方で100%精度を達成し、他の2つの手法よりもはるかに優れていることである。これらの研究結果は、VRユーザエクスペリエンス予測の新しいアイデアと方法、特に、反復的局所探索-スパロー探索アルゴリズムに基づく改善されたモデルを提供し、ユーザのVRエクスペリエンスをより正確に予測し、分類することができる。将来的には、他の分野への本手法の適用をさらに検討し、実際の事例を通してその有効性を検証し、ユーザエクスペリエンス分野におけるAI技術の開発を促進することができる。

In this paper, an improved method for VR user experience prediction is investigated by introducing a sparrow search algorithm and a random forest algorithm improved by an iterative local search-optimised sparrow search algorithm. The study first conducted a statistical analysis of the data, and then trained and tested the traditional random forest model, the random forest model improved by the sparrow search algorithm, and the random forest model improved by the iterative local search-sparrow search algorithm, respectively. The results show that the traditional random forest model has a prediction accuracy of 93% on the training set but only 73.3% on the test set, indicating poor generalisation, whereas the model improved by the sparrow search algorithm has a prediction accuracy of 94% on the test set, an improvement over the traditional model. More noteworthy, the improved model based on the iterative local search-sparrow search algorithm achieves 100% accuracy on both the training and test sets, which is significantly better than the other two methods. These results provide new ideas and methods for VR user experience prediction; in particular, the improved model based on the iterative local search-sparrow search algorithm performs well and predicts and classifies the user's VR experience more accurately. In the future, the application of this method in other fields can be further explored, and its effectiveness can be verified through real cases to promote the development of AI technology in the field of user experience.

翻訳日:2024-07-01 06:41:31 公開日:2024-06-03

# REST: 残差状態更新による効率的で高速なEEG発作解析

REST: Efficient and Accelerated EEG Seizure Analysis through Residual State Updates ( http://arxiv.org/abs/2406.16906v1 )

ライセンス: Link先を確認

Arshia Afzal, Grigorios Chrysos, Volkan Cevher, Mahsa Shoaran,

(参考訳) EEGベースの発作検出モデルは、推測速度とメモリ効率の点で課題に直面し、臨床機器におけるリアルタイム実装を制限する。本稿では、てんかん発作検出などのアプリケーションにおけるリアルタイム脳波信号解析のための新しいグラフベースの残状態更新機構(REST)を提案する。グラフニューラルネットワークとリカレント構造の組み合わせを活用することで、RESTは、非ユークリッド幾何学とEEGデータ内の時間的依存関係の両方を効率的にキャプチャする。本モデルは,発作検出と分類作業において高い精度を示す。特に、RESTは最先端のモデルと比較して、推論速度の9倍の大幅な加速を実現していますが、同時にこのタスクで使用される最小のモデルよりもメモリをかなり少なく要求しています。これらの属性は、RESTを、レスポンシブ神経刺激や発作警報システムなど、臨床機器におけるリアルタイム実装の候補と位置づけている。

EEG-based seizure detection models face challenges in terms of inference speed and memory efficiency, limiting their real-time implementation in clinical devices. This paper introduces a novel graph-based residual state update mechanism (REST) for real-time EEG signal analysis in applications such as epileptic seizure detection. By leveraging a combination of graph neural networks and recurrent structures, REST efficiently captures both non-Euclidean geometry and temporal dependencies within EEG data. Our model demonstrates high accuracy in both seizure detection and classification tasks. Notably, REST achieves a remarkable 9-fold acceleration in inference speed compared to state-of-the-art models, while simultaneously demanding substantially less memory than the smallest model employed for this task. These attributes position REST as a promising candidate for real-time implementation in clinical devices, such as Responsive Neurostimulation or seizure alert systems.

翻訳日:2024-07-01 06:41:31 公開日:2024-06-03

# FLOW:IMUを用いたユーザ間人間活動認識のためのグローバルおよびローカルビューの融合とシャッフル

FLOW: Fusing and Shuffling Global and Local Views for Cross-User Human Activity Recognition with IMUs ( http://arxiv.org/abs/2406.18569v1 )

ライセンス: Link先を確認

Qi Qiu, Tao Zhu, Furong Duan, Kevin I-Kai Wang, Liming Chen, Mingxing Nie,

(参考訳) 慣性測定ユニット(IMU)センサーは、可搬性、エネルギー効率、研究の関心の高まりにより、HAR(Human Activity Recognition)に広く利用されている。しかし、IMU-HARモデルにとって重要な課題は、多様なユーザー間で堅牢な一般化性能を達成することである。この制限は、個々のユーザ間でのデータ分散のかなりのバリエーションに起因する。この分布の相違の主な理由は、局所座標系におけるIMUセンサデータの表現にある。この問題に対処するために,IMUデータの特徴に基づいてグローバルなビュー表現を抽出し,着用スタイルによるデータ分散の相違を効果的に緩和する手法を提案する。グローバルビュー表現の有効性を検証するため,グローバルビューデータとローカルビューデータの両方を実験モデルに投入した。その結果,グローバルなビューデータは,ユーザ間の実験において,ローカルなビューデータよりも有意に優れていた。さらに,Shufflingに基づくマルチビュー監視ネットワーク(MVFNet)を提案し,ローカルビューとグローバルビューデータを効果的に融合させる。ビュー分割とビューシャッフルを通じて各ビューの特徴抽出を監督し、重要な特徴を無視したモデルを避ける。 OPPORTUNITYとPAMAP2データセットを用いた大規模な実験により、提案アルゴリズムはユーザ間HARにおける現在の最先端手法よりも優れていることを示した。

Inertial Measurement Unit (IMU) sensors are widely employed for Human Activity Recognition (HAR) due to their portability, energy efficiency, and growing research interest. However, a significant challenge for IMU-HAR models is achieving robust generalization performance across diverse users. This limitation stems from substantial variations in data distribution among individual users. One primary reason for this distribution disparity lies in the representation of IMU sensor data in the local coordinate system, which is susceptible to subtle variations in how each user wears the IMU. To address this issue, we propose a novel approach that extracts a global view representation based on the characteristics of IMU data, effectively alleviating the data distribution discrepancies induced by wearing styles. To validate the efficacy of the global view representation, we fed both global and local view data into the model for experiments. The results demonstrate that global view data significantly outperforms local view data in cross-user experiments. Furthermore, we propose a Multi-view Supervised Network (MVFNet) based on Shuffling to effectively fuse local view and global view data. It supervises the feature extraction of each view through view division and view shuffling, so that the model ignores as few important features as possible. Extensive experiments conducted on the OPPORTUNITY and PAMAP2 datasets demonstrate that the proposed algorithm outperforms the current state-of-the-art methods in cross-user HAR.

翻訳日:2024-07-01 05:50:36 公開日:2024-06-03

# 「バグ」ではなく「機能」: 画像生成器の創造的流動性の測定

It's a Feature, Not a Bug: Measuring Creative Fluidity in Image Generators ( http://arxiv.org/abs/2406.18570v1 )

ライセンス: Link先を確認

Aditi Ramaswamy, Melane Navaratnarajah, Hana Chockler,

(参考訳) 無償で利用できる画像生成装置の登場に伴い、AI生成アートは、人間の創造性の概念に関する一連の熱い議論の中心となっている。画像生成AIは、アーティストと同じタイプの「創造性」を示すことができる。本稿では,AIにおける創造的行動の1つの側面を定義し,実験的に測定する試みとして,選択された画像生成装置の「素早い解釈の流動性」や単に「流動性」を定量化する実験を行った。流動性を研究するために,(1) 初期「地中真実」の画像を用いた自動生成プロンプトと画像のチェーンの作成,(3) 既存の視覚的および意味的指標を用いたこれらのチェーンの破壊点の測定,(4) 統計的検査と視覚的説明の両方を用いて,これらのチェーンを解析し,生成に使用する画像生成装置が流動性を示すか否かを判定する。

With the rise of freely available image generators, AI-generated art has become the centre of a series of heated debates, one of which concerns the concept of human creativity. Can an image generation AI exhibit "creativity" of the same type that artists do, and if so, how does that manifest? Our paper attempts to define and empirically measure one facet of creative behavior in AI, by conducting an experiment to quantify the "fluidity of prompt interpretation", or just "fluidity", in a series of selected popular image generators. To study fluidity, we (1) introduce a clear definition for it, (2) create chains of auto-generated prompts and images seeded with an initial "ground-truth" image, (3) measure these chains' breakage points using preexisting visual and semantic metrics, and (4) use both statistical tests and visual explanations to study these chains and determine whether the image generators used to produce them exhibit significant fluidity.

翻訳日:2024-07-01 05:50:36 公開日:2024-06-03

# UltraCortex: サブミリ超高磁場9.4 T1脳MR画像収集と手動皮質切片

UltraCortex: Submillimeter Ultra-High Field 9.4 T1 Brain MR Image Collection and Manual Cortical Segmentations ( http://arxiv.org/abs/2406.18571v1 )

ライセンス: Link先を確認

Lucas Mahler, Julius Steiglechner, Benjamin Bender, Tobias Lindig, Dana Ramadan, Jonas Bause, Florian Birk, Rahel Heule, Edyta Charyasz, Michael Erb, Vinod Jangir Kumar, Gisela E Hagberg, Pascal Martin, Gabriele Lohmann, Klaus Scheffler,

(参考訳) UltraCortexリポジトリ(https://www.ultracortex.org)には、超高磁場強度9.4Tで取得したヒト脳の磁気共鳴画像データが格納されている。さらに、レポジトリは12の脳をグレーとホワイトの物質区画に分割する。これらのセグメンテーションは、2人の専門神経放射線学者によって独立に検証され、信頼できる金の標準として確立されている。このリソースは、高品質な脳画像データと検証されたセグメンテーションへのアクセスを提供し、神経画像の研究を促進し、脳の構造と機能の理解を促進する。既存のリポジトリは7 T以上のフィールド強度を許容せず、検証されたセグメンテーションも提供せず、この新しいリソースの重要性を強調している。

The UltraCortex repository (https://www.ultracortex.org) houses magnetic resonance imaging data of the human brain obtained at an ultra-high field strength of 9.4 T. It contains 86 structural MR images with spatial resolutions ranging from 0.6 to 0.8 mm. Additionally, the repository includes segmentations of 12 brains into gray and white matter compartments. These segmentations have been independently validated by two expert neuroradiologists, thus establishing them as a reliable gold standard. This resource provides researchers with access to high-quality brain imaging data and validated segmentations, facilitating neuroimaging studies and advancing our understanding of brain structure and function. Existing repositories do not accommodate field strengths beyond 7 T, nor do they offer validated segmentations, underscoring the significance of this new resource.

翻訳日:2024-07-01 05:50:36 公開日:2024-06-03

# GeoReasoner:大規模視覚言語モデルを用いたストリートビューにおける推論による地理局在化

GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model ( http://arxiv.org/abs/2406.18572v1 )

ライセンス: Link先を確認

Ling Li, Yu Ye, Bingchuan Jiang, Wei Zeng,

(参考訳) 本研究は,人間の推論知識を付加した大規模視覚言語モデル (LVLM) を用いた新しいパラダイムを用いて,ジオローカライゼーションの課題に取り組む。既存のストリートビューデータセットには、視覚的な手がかりが欠如し、推論が欠如している多くの低品質画像が含まれていることが多い。データ品質の問題に対処するため、我々はCLIPベースのネットワークを考案し、ストリートビュー画像の位置特定可能性(locatability)を定量化し、位置特定しやすいストリートビューからなる新しいデータセットを作成する。推論の精度を高めるために,実際のジオローカライゼーションゲームから得られた外部知識を統合し,価値ある人間の推論能力を活用する。データはGeoReasonerのトレーニングに利用される。質的および定量的評価により、GeoReasonerは、他のLVLMを国レベルで25%以上、都市レベルで38%以上上回り、より少ないトレーニングリソースでStreetCLIPの性能を上回ることが示された。データとコードは https://github.com/lingli1996/GeoReasoner で入手できる。

This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM) augmented with human inference knowledge. A primary challenge here is the scarcity of data for training the LVLM - existing street-view datasets often contain numerous low-quality images lacking visual clues, and lack any reasoning inference. To address the data-quality issue, we devise a CLIP-based network to quantify the degree of street-view images being locatable, leading to the creation of a new dataset comprising highly locatable street views. To enhance reasoning inference, we integrate external knowledge obtained from real geo-localization games, tapping into valuable human inference capabilities. The data are utilized to train GeoReasoner, which undergoes fine-tuning through dedicated reasoning and location-tuning stages. Qualitative and quantitative evaluations illustrate that GeoReasoner outperforms counterpart LVLMs by more than 25% at country-level and 38% at city-level geo-localization tasks, and surpasses StreetCLIP performance while requiring fewer training resources. The data and code are available at https://github.com/lingli1996/GeoReasoner.

翻訳日:2024-07-01 05:50:36 公開日:2024-06-03

# O(3)等変結晶テンソル予測のための空間群対称性インフォームドネットワーク

A Space Group Symmetry Informed Network for O(3) Equivariant Crystal Tensor Prediction ( http://arxiv.org/abs/2406.12888v1 )

ライセンス: Link先を確認

Keqiang Yan, Alexandra Saxton, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji,

(参考訳) 誘電体,圧電体,弾性テンソルを含む結晶材料の一般的な引張特性の予測を考察する。ここでの重要な課題は、予測が O(3) 群に対する一意のテンソル同値と結晶空間群への不変性を満足させる方法である。そこで本研究では,必要な対称性を満たすために,GMTNet(General Materials Tensor Network)を提案する。提案手法を評価するため, 結晶テンソル予測の複雑さに合わせて, データセットをキュレートし, 評価指標を確立する。実験結果から,GMTNetは様々な順序の結晶テンソル上での有望な性能を達成するだけでなく,固有結晶対称性と完全に一致した予測を生成することがわかった。私たちのコードはAIRSライブラリ(https://github.com/divelab/AIRS)の一部として公開されています。

We consider the prediction of general tensor properties of crystalline materials, including dielectric, piezoelectric, and elastic tensors. A key challenge here is how to make the predictions satisfy the unique tensor equivariance to O(3) group and invariance to crystal space groups. To this end, we propose a General Materials Tensor Network (GMTNet), which is carefully designed to satisfy the required symmetries. To evaluate our method, we curate a dataset and establish evaluation metrics that are tailored to the intricacies of crystal tensor predictions. Experimental results show that our GMTNet not only achieves promising performance on crystal tensors of various orders but also generates predictions fully consistent with the intrinsic crystal symmetries. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).
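
To make the two symmetry requirements concrete, the sketch below applies the standard rank-2 transformation law T' = R T Rᵀ and then averages a made-up "raw prediction" over the fourfold rotations about z, so the result is invariant under every operation in that group. This is a post-hoc projection written for illustration; the abstract does not say how GMTNet enforces symmetry internally, and the helper names, the choice of point group C4, and the tensor values are assumptions.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def transform(r, t):
    """Rank-2 tensor transformation law: T' = R T R^T."""
    return matmul(matmul(r, t), transpose(r))

def symmetrize(t, group):
    """Average T over all group operations, giving a group-invariant tensor."""
    acc = [[0.0] * 3 for _ in range(3)]
    for r in group:
        rt = transform(r, t)
        for i in range(3):
            for j in range(3):
                acc[i][j] += rt[i][j] / len(group)
    return acc

# Fourfold rotations about z (point group C4), with exact integer entries.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
c4 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
c2 = matmul(c4, c4)
c4_inv = matmul(c2, c4)
group = [identity, c4, c2, c4_inv]

# Hypothetical unconstrained prediction of a dielectric-like tensor.
raw_prediction = [[2.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 4.0]]
dielectric = symmetrize(raw_prediction, group)
for row in dielectric:
    print(row)
```

After the projection the tensor takes the familiar tetragonal form: equal xx and yy entries, an independent zz entry, and vanishing off-diagonal terms.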

翻訳日:2024-06-23 13:24:48 公開日:2024-06-03

# 小言語モデルにおけるスパースアクティベーションの実現

Achieving Sparse Activation in Small Language Models ( http://arxiv.org/abs/2406.06562v1 )

ライセンス: Link先を確認

Jifeng Song, Kai Huang, Xiangyu Yin, Boyuan Yang, Wei Gao,

(参考訳) 入力依存ニューロンのみを選択的に活性化するスパースアクティベーションは、再訓練や適応をすることなく、LLM(Large Language Models)の計算コストを削減するのに有用である。しかし、最近登場したSLM(Small Language Models)に適用できるかどうかは疑問視されている。本稿では,SLMにおけるスパースアクティベーションの実現を目指す。まず, ニューロンの出力大小をベースとしたLLMのスパース活性化スキームはSLMには適用できないことを示し, その属性スコアに基づいてニューロンを活性化することがよりよい選択肢であることを示した。さらに,異なる層にまたがるニューロンの属性スコア間の相互依存性から,スパースアクティベーション時に既存の属性メトリクスの大規模な誤差を実証し,定量化した。これらの観測に基づいて,これらの誤りを確実に修正し,正確なスパースアクティベーションを実現するための新しい属性指標を提案した。複数のSLMおよびデータセットに対する実験結果から,本手法はモデルの精度損失を5%未満に抑えながら80%のスパース化率を達成できることが示唆された。ソースコードは https://github.com/pittisl/Sparse-Activation で入手できる。

Sparse activation, which selectively activates only an input-dependent set of neurons in inference, is a useful technique to reduce the computing cost of Large Language Models (LLMs) without retraining or adaptation efforts. However, whether it can be applied to the recently emerging Small Language Models (SLMs) remains questionable, because SLMs are generally less over-parameterized than LLMs. In this paper, we aim to achieve sparse activation in SLMs. We first show that the existing sparse activation schemes in LLMs that build on neurons' output magnitudes cannot be applied to SLMs, and activating neurons based on their attribution scores is a better alternative. Further, we demonstrated and quantified the large errors of existing attribution metrics when being used for sparse activation, due to the interdependency among attribution scores of neurons across different layers. Based on these observations, we proposed a new attribution metric that can provably correct such errors and achieve precise sparse activation. Experiments over multiple popular SLMs and datasets show that our approach can achieve 80% sparsification ratio with <5% model accuracy loss, comparable to the sparse activation achieved in LLMs. The source code is available at: https://github.com/pittisl/Sparse-Activation.
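
The contrast between magnitude-based and attribution-based neuron selection can be sketched as follows. The attribution used here, |activation × gradient|, is a common generic choice and stands in for the corrected metric proposed in the paper, which the abstract does not specify; the helper `top_k_mask` and all numbers are hypothetical.

```python
def top_k_mask(scores, keep_ratio):
    """Return a 0/1 mask keeping the highest-scoring fraction of neurons."""
    k = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]

# Made-up per-neuron outputs and gradients of the loss w.r.t. those outputs.
activations = [0.9, -0.05, 0.4, 0.02, -0.7]
gradients = [0.01, 2.0, 0.5, -3.0, 0.02]

# Scheme 1: keep neurons with the largest output magnitude.
magnitude_scores = [abs(a) for a in activations]
# Scheme 2: keep neurons with the largest first-order attribution.
attribution_scores = [abs(a * g) for a, g in zip(activations, gradients)]

mask_by_magnitude = top_k_mask(magnitude_scores, keep_ratio=0.4)
mask_by_attribution = top_k_mask(attribution_scores, keep_ratio=0.4)
print(mask_by_magnitude)    # [1, 0, 0, 0, 1]
print(mask_by_attribution)  # [0, 1, 1, 0, 0]
```

In this toy case the two schemes keep disjoint neuron sets, which is the kind of disagreement that makes the choice of score matter when few neurons are redundant.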

翻訳日:2024-06-17 00:11:14 公開日:2024-06-03

# Skywork-MoE:Mixture-of-Experts言語モデルのトレーニングテクニックを深く掘り下げる

Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ( http://arxiv.org/abs/2406.06563v1 )

ライセンス: Link先を確認

Tianwen Wei, Bo Zhu, Liang Zhao, Cheng Cheng, Biye Li, Weiwei Lü, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Liang Zeng, Xiaokun Wang, Yutuan Ma, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou,

(参考訳) 本稿では,1460億のパラメータと16のエキスパートを持つ高性能なMixture-of-Experts(MoE)大規模言語モデル (LLM) であるSkywork-MoEの開発に用いたトレーニング手法を紹介する。本モデルは既存のSkywork-13Bモデルの密(dense)チェックポイントから初期化されている。我々は,アップサイクリングとスクラッチからのトレーニングという2つの初期化の有効性を比較検討した。以上の結果から,これらの2つのアプローチの選択は,既存の密チェックポイントの性能とMoEトレーニング予算の両方を考慮すべきであることが示唆された。本稿では,エキスパートの多様化を改善するゲーティングロジット正規化と,補助損失係数の層ごとの調整を可能にする適応型補助損失係数という2つの革新的な手法について述べる。これらの手法の有効性を実験的に検証した。これらの技術と洞察を活用して、SkyPileコーパスの凝縮したサブセットで、アップサイクルしたSkywork-MoEをトレーニングした。評価結果は,本モデルが幅広いベンチマークで高い性能を示すことを示す。

In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initializations. Our findings suggest that the choice between these two approaches should consider both the performance of the existing dense checkpoints and the MoE training budget. We highlight two innovative techniques: gating logit normalization, which improves expert diversification, and adaptive auxiliary loss coefficients, allowing for layer-specific adjustment of auxiliary loss coefficients. Our experimental results validate the effectiveness of these methods. Leveraging these techniques and insights, we trained our upcycled Skywork-MoE on a condensed subset of our SkyPile corpus. The evaluation results demonstrate that our model delivers strong performance across a wide range of benchmarks.
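
One plausible reading of "gating logit normalization" is to standardise the router logits and rescale them before the softmax, so that the gate distribution does not collapse toward uniform. The sketch below shows that reading only; the abstract does not give the formula, so the function `normalized_gate`, the `scale` parameter, and the example logits are assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_gate(logits, scale=1.0, eps=1e-6):
    """Standardise logits to zero mean and unit variance, rescale, then softmax."""
    mean = sum(logits) / len(logits)
    var = sum((z - mean) ** 2 for z in logits) / len(logits)
    std = math.sqrt(var) + eps
    return softmax([scale * (z - mean) / std for z in logits])

# Nearly identical router logits for 4 experts (made-up values).
logits = [0.10, 0.12, 0.08, 0.11]
plain = softmax(logits)                      # almost uniform gate
sharp = normalized_gate(logits, scale=2.0)   # clearer preference among experts
print([round(p, 3) for p in plain])
print([round(p, 3) for p in sharp])
```

Under this reading, a larger `scale` yields a sharper gate, which would push tokens toward distinct experts.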

翻訳日:2024-06-17 00:11:14 公開日:2024-06-03

# 動的パラメータ調整による大規模言語モデル学習の革新

Revolutionizing Large Language Model Training through Dynamic Parameter Adjustment ( http://arxiv.org/abs/2406.06564v1 )

ライセンス: Link先を確認

Kaiye Zhou, Shucheng Wang,

(参考訳) 大規模言語モデルの時代になると、計算資源の効率的な利用への需要が重要になってきている。パラメータ効率のよい微調整技術は完全な微調整に匹敵する結果を得たが、事前学習フェーズでの応用は大きな課題を生んでいる。具体的には、特に大規模モデルにおいて、事前学習の開始時にパラメータ効率の戦略を採用することは、効率を著しく損なう可能性がある。本稿では,パラメータのトレーニング可能な部分を頻繁に変更し,効果的な事前学習を容易にする新しいパラメータ効率訓練手法を提案する。提案手法は, 事前学習段階における現在最先端パラメータ効率アルゴリズムに匹敵するメモリ削減と計算オーバーヘッドを達成するだけでなく, 完全事前学習段階に匹敵する精度も維持する。提案手法の有効性を実証するために,理論的解析と実証的証拠の両方を提供する。

In the era of large language models, the demand for efficient use of computational resources has become critically important. Although parameter-efficient fine-tuning techniques have achieved results comparable to full fine-tuning, their application during the pre-training phase poses significant challenges. Specifically, employing parameter-efficient strategies at the onset of pre-training can severely compromise efficiency, especially in larger models. In this paper, building upon the fine-tuning method LoRA, we introduce a novel parameter-efficient training technique that frequently alters trainable part of parameters, facilitating effective pre-training. Our method not only achieves memory reductions and computational overhead comparable to current state-of-the-art parameter-efficient algorithms during the pre-training phase but also maintains accuracy levels comparable to those of full pre-training. We provide both theoretical analyses and empirical evidence to demonstrate the effectiveness of our approach.

翻訳日:2024-06-17 00:11:14 公開日:2024-06-03

# MixEval: LLMベンチマークから群衆の知恵を導き出す

MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures ( http://arxiv.org/abs/2406.06565v1 )

ライセンス: Link先を確認

Jinjie Ni, Fuzhao Xue, Xiang Yue, Yuntian Deng, Mahir Shah, Kabir Jain, Graham Neubig, Yang You,

(参考訳) 大規模言語モデル(LLM)の評価は難しい。 LLM-as-judgeベンチマークは、グレーディングバイアスと限られたクエリ量に悩まされている。両者とも時間とともに汚染されることもある。 Chatbot Arenaのようなユーザによる評価は、信頼できる信号を提供するが、高価で遅い。そこで本研究では,市販のベンチマークを戦略的に混合することにより,効率的な金標準LCM評価を実現するための新しいパラダイムであるMixEvalを提案する。提案手法は,(1)包括的でよく分散された実世界のユーザクエリと(2)Webから抽出したクエリと,既存のベンチマークからの類似したクエリとをマッチングすることによって,効率よく,かつ,かなり改善された基盤トラスベースのベンチマークを橋渡しする。 MixEvalをベースにMixEval-Hardを構築しました。本ベンチマークの利点は,(1) 高速かつ安価かつ再現性の高い実行(MMLUの時間とコストの6%),(3) 高速かつ安定なデータ更新パイプラインで実現可能な動的評価などである。我々は, LLM評価に関するコミュニティの理解を深め, 今後の研究方向性を導くため, 既存の LLM ベンチマークのメタ評価と分析を行う。

Evaluating large language models (LLMs) is challenging. Traditional ground-truth-based benchmarks fail to capture the comprehensiveness and nuance of real-world queries, while LLM-as-judge benchmarks suffer from grading biases and limited query quantity. Both of them may also become contaminated over time. User-facing evaluation, such as Chatbot Arena, provides reliable signals but is costly and slow. In this work, we propose MixEval, a new paradigm for establishing efficient, gold-standard LLM evaluation by strategically mixing off-the-shelf benchmarks. It bridges (1) comprehensive and well-distributed real-world user queries and (2) efficient and fairly-graded ground-truth-based benchmarks, by matching queries mined from the web with similar queries from existing benchmarks. Based on MixEval, we further build MixEval-Hard, which offers more room for model improvement. Our benchmarks' advantages lie in (1) a 0.96 model ranking correlation with Chatbot Arena arising from the highly impartial query distribution and grading mechanism, (2) fast, cheap, and reproducible execution (6% of the time and cost of MMLU), and (3) dynamic evaluation enabled by the rapid and stable data update pipeline. We provide extensive meta-evaluation and analysis for our and existing LLM benchmarks to deepen the community's understanding of LLM evaluation and guide future research directions.
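
The matching step, pairing web-mined queries with similar benchmark queries, can be sketched with a simple token-overlap similarity. Jaccard overlap is used here only as a stand-in that needs no external libraries; the abstract does not say which similarity measure MixEval uses, and the queries and function names are invented for this example.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def match_queries(web_queries, benchmark_queries):
    """Map each web query to its most similar benchmark query."""
    return {q: max(benchmark_queries, key=lambda b: jaccard(q, b)) for q in web_queries}

web_queries = [
    "what is the capital of france",
    "how do i reverse a list in python",
]
benchmark_queries = [
    "What is the capital of France ?",
    "Write a function to reverse a list in Python",
    "Solve for x : 2x + 3 = 7",
]
matches = match_queries(web_queries, benchmark_queries)
for web, bench in matches.items():
    print(web, "->", bench)
```

A production pipeline would more likely compare sentence embeddings, but the control flow (score every candidate, keep the best match) stays the same.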

翻訳日:2024-06-17 00:11:14 公開日:2024-06-03

# RAGによる家庭電力モニタリングに関する対話

RAG Enabled Conversations about Household Electricity Monitoring ( http://arxiv.org/abs/2406.06566v1 )

ライセンス: Link先を確認

Carolina Fortuna, Vid Hanžel, Blaž Bertalanič,

(参考訳) 本稿では,ChatGPT,Gemini,Llamaなどの大規模言語モデル(LLM)とRAG(Retrieval Augmented Generation)を統合することにより,電気データセットに関する複雑な質問に対する応答の精度と特異性を向上する。実感的理解よりもトレーニングデータのパターンに依存しているため,LLMの正確で文脈的に関係のある回答を生成する際の限界を認識し,専門的な電気知識グラフを活用するソリューションを提案する。このアプローチは、LLMの生成能力によって合成される正確なリアルタイムデータの検索を容易にする。以上の結果から,RAG手法はLLMが生成する誤情報の発生を減少させるだけでなく,検証可能なデータに応答することで,出力の質を著しく向上させることがわかった。本稿では、我々の方法論を詳述し、RAGを用いた応答と非応答の比較分析を行い、エネルギーデータ分析のような専門分野におけるAIの今後の応用について考察する。

In this paper, we investigate the integration of Retrieval Augmented Generation (RAG) with large language models (LLMs) such as ChatGPT, Gemini, and Llama to enhance the accuracy and specificity of responses to complex questions about electricity datasets. Recognizing the limitations of LLMs in generating precise and contextually relevant answers due to their dependency on the patterns in training data rather than factual understanding, we propose a solution that leverages a specialized electricity knowledge graph. This approach facilitates the retrieval of accurate, real-time data which is then synthesized with the generative capabilities of LLMs. Our findings illustrate that the RAG approach not only reduces the incidence of incorrect information typically generated by LLMs but also significantly improves the quality of the output by grounding responses in verifiable data. This paper details our methodology, presents a comparative analysis of responses with and without RAG, and discusses the implications of our findings for future applications of AI in specialized sectors like energy data analysis.

翻訳日:2024-06-17 00:11:14 公開日:2024-06-03

# DHA:適応型頭融合による変圧器チェックポイントからの非結合型注意の学習

DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion ( http://arxiv.org/abs/2406.06567v1 )

ライセンス: Link先を確認

Yilong Chen, Linhao Zhang, Junyuan Shang, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun,

(参考訳) 数十億のパラメータを持つ大規模言語モデル(LLM)は、素晴らしいパフォーマンスを示している。しかし、LLMにおけるMHA(Multi-Head Attention)は、推論中にかなりの計算コストとメモリコストを発生させる。ヘッドを切断したり、ヘッド間でパラメータを共有することで注意機構を最適化する試みもあるが、これらの手法は性能低下や性能回復のためにかなりの事前訓練コストを必要とすることが多い。注意力の冗長性の分析に基づいて,DHA(Decoupled-Head Attention)機構を設計する。 DHAは、様々なレイヤにわたるキーヘッドとバリューヘッドのグループ共有を適応的に構成し、パフォーマンスと効率のバランスを改善する。そこで本研究では,MHAチェックポイントのパラメトリック知識を維持しつつ,類似頭部パラメータの線形融合を段階的に行うことで,MHAチェックポイントをDHAモデルに段階的に変換することを提案する。 DHA モデルの構築には,目標とするヘッド予算に応じて様々な規模の MHA チェックポイントを変換する。我々の実験によると、DHAは、75%のKVキャッシュを節約しながら97.6%のパフォーマンスを達成するために、オリジナルのモデルの事前トレーニング予算のわずか0.25%しか必要としていない。グループクエリアテンション(GQA)と比較して、DHAは5倍のトレーニング高速化、0.01%の事前トレーニング予算で最大13.93%の性能向上、0.05%の事前トレーニング予算で4%の相対的な改善を達成している。

Large language models (LLMs) with billions of parameters demonstrate impressive performance. However, the widely used Multi-Head Attention (MHA) in LLMs incurs substantial computational and memory costs during inference. While some efforts have optimized attention mechanisms by pruning heads or sharing parameters among heads, these methods often lead to performance degradation or necessitate substantial continued pre-training costs to restore performance. Based on the analysis of attention redundancy, we design a Decoupled-Head Attention (DHA) mechanism. DHA adaptively configures group sharing for key heads and value heads across various layers, achieving a better balance between performance and efficiency. Inspired by the observation of clustering similar heads, we propose to progressively transform the MHA checkpoint into the DHA model through linear fusion of similar head parameters step by step, retaining the parametric knowledge of the MHA checkpoint. We construct DHA models by transforming various scales of MHA checkpoints given target head budgets. Our experiments show that DHA remarkably requires a mere 0.25% of the original model's pre-training budgets to achieve 97.6% of performance while saving 75% of KV cache. Compared to Group-Query Attention (GQA), DHA achieves a 5× training acceleration, a maximum of 13.93% performance improvement under 0.01% pre-training budget, and 4% relative improvement under 0.05% pre-training budget.
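
The two ideas in this abstract, fusing similar heads linearly and shrinking the KV cache by sharing key/value heads, can be sketched in a few lines. The fusion weights, the 2×2 toy matrices, and the 32-to-8 head configuration below are illustrative assumptions; how DHA actually learns the fusion weights and chooses per-layer groupings is not described in the abstract.

```python
def fuse_heads(heads, weights):
    """Weighted linear fusion of several head parameter matrices into one."""
    total = sum(weights)
    rows, cols = len(heads[0]), len(heads[0][0])
    return [
        [sum(w * h[r][c] for h, w in zip(heads, weights)) / total for c in range(cols)]
        for r in range(rows)
    ]

def kv_cache_saving(num_heads, num_kv_heads):
    """Fraction of KV cache saved when only num_kv_heads key/value heads are stored."""
    return 1.0 - num_kv_heads / num_heads

# Two similar key-projection matrices from the same group (toy 2x2 values).
key_heads = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.8, 0.2], [0.2, 0.8]],
]
fused = fuse_heads(key_heads, weights=[0.5, 0.5])
saving = kv_cache_saving(num_heads=32, num_kv_heads=8)
print(fused)
print(saving)  # 0.75
```

Storing 8 key/value heads instead of 32 gives the 75% KV-cache saving quoted in the abstract, on the assumption that the cache scales linearly with the number of stored heads.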

翻訳日:2024-06-17 00:11:14 公開日:2024-06-03

# 合成データによる臨床ドキュメンテーションの強化:精度向上のための生成モデルを活用する

Enhancing Clinical Documentation with Synthetic Data: Leveraging Generative Models for Improved Accuracy ( http://arxiv.org/abs/2406.06569v1 )

ライセンス: Link先を確認

Anjanava Biswas, Wrick Talukdar,

(参考訳) 正確かつ包括的な臨床文書は、高品質な医療の提供、提供者間の効果的なコミュニケーションの促進、規制要件の遵守の確保に不可欠である。しかし、手動による書き起こしとデータ入力のプロセスは、時間がかかり、エラーが発生し、不整合に陥り、不完全または不正確な医療記録に繋がる。本稿では, 現実的で多様な臨床トランスクリプトを生成するために, 合成データ生成技術を活用することによって, 臨床文書の充実に向けた新たなアプローチを提案する。本稿では,GAN (Generative Adversarial Networks) やVAE (Variational Autoencoders) といった最先端のジェネレーティブ・モデルと,実際の臨床トランスクリプトとその他の臨床データを組み合わせて合成トランスクリプトを生成する手法を提案する。これらの合成トランスクリプトは、既存のドキュメントワークフローを補完し、自然言語処理モデルのための追加のトレーニングデータを提供し、より正確で効率的な書き起こしプロセスを可能にするために使用することができる。匿名化された臨床トランスクリプトの大規模なデータセットに関する広範な実験を通じて、実世界のデータによく似た高品質な合成トランスクリプトを作成する上で、我々のアプローチの有効性を実証した。パープレキシティスコアやBLEUスコアなどの定量的評価指標と、ドメインの専門家による質的評価は、生成された合成トランスクリプトの忠実さと有用性を検証する。本研究は, 患者医療の改善, 管理負担の軽減, 医療システム効率の向上など, 臨床ドキュメントの課題に対処する合成データ生成の可能性を明らかにするものである。

Accurate and comprehensive clinical documentation is crucial for delivering high-quality healthcare, facilitating effective communication among providers, and ensuring compliance with regulatory requirements. However, manual transcription and data entry processes can be time-consuming, error-prone, and susceptible to inconsistencies, leading to incomplete or inaccurate medical records. This paper proposes a novel approach to augment clinical documentation by leveraging synthetic data generation techniques to generate realistic and diverse clinical transcripts. We present a methodology that combines state-of-the-art generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), with real-world clinical transcript and other forms of clinical data to generate synthetic transcripts. These synthetic transcripts can then be used to supplement existing documentation workflows, providing additional training data for natural language processing models and enabling more accurate and efficient transcription processes. Through extensive experiments on a large dataset of anonymized clinical transcripts, we demonstrate the effectiveness of our approach in generating high-quality synthetic transcripts that closely resemble real-world data. Quantitative evaluation metrics, including perplexity scores and BLEU scores, as well as qualitative assessments by domain experts, validate the fidelity and utility of the generated synthetic transcripts. Our findings highlight synthetic data generation's potential to address clinical documentation challenges, improving patient care, reducing administrative burdens, and enhancing healthcare system efficiency.

翻訳日:2024-06-17 00:04:06 公開日:2024-06-03

# コンピュータ・エピグラフィーの概観

Review of Computational Epigraphy ( http://arxiv.org/abs/2406.06570v1 )

ライセンス: Link先を確認

Vishal Kumar,

(参考訳) 計算エピグラフィー(Computational Epigraphy)とは、計算手法の助けを借りて、石碑文からのテキスト抽出、翻字、解釈、帰属を行う過程を指す。伝統的なエピグラフィーの手法は時間がかかり、テキストを抽出する際に碑文を損傷させる傾向がある。さらに、解釈と帰属は主観的であり、碑文学者によって異なる可能性がある。しかし、現代の計算手法は、テキストを抽出するだけでなく、テキストを頑健な方法で解釈し、帰属を決定するためにも利用できる。本稿では、エピグラフィーにおける上記の課題を支援する既存の計算手法を調査・文書化する。

Computational Epigraphy refers to the process of extracting text from stone inscriptions, transliteration, interpretation, and attribution with the aid of computational methods. Traditional epigraphy methods are time-consuming and tend to damage the stone inscriptions while extracting text. Additionally, interpretation and attribution are subjective and can vary between different epigraphers. However, modern computational methods can be used not only to extract text, but also to interpret and attribute it in a robust way. We survey and document the existing computational methods that aid in the above-mentioned tasks in epigraphy.

翻訳日:2024-06-17 00:04:06 公開日:2024-06-03

# SUBLLM: LLMのためのToken Sequence Subsamplingを用いた新しい効率的なアーキテクチャ

SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM ( http://arxiv.org/abs/2406.06571v1 )

ライセンス: Link先を確認

Quandong Wang, Yuxuan Yuan, Xiaoyu Yang, Ruike Zhang, Kang Zhao, Wei Liu, Jian Luan, Daniel Povey, Bin Wang,

(参考訳) 大規模言語モデル(LLM)は様々な分野で大きな成功を収めてきたが、トレーニングと推論の効率性は依然として大きな課題である。本稿では,Subsampling-Upsampling-Bypass Large Language Modelの略で,Subsampling, Upsampling, Bypassモジュールを組み込んでコアデコーダのみのフレームワークを拡張する革新的なアーキテクチャであるSUBLLMを提案する。サブサンプリングモジュールはシーケンスを短縮し、アップサンプリングモジュールはシーケンスの長さを復元し、バイパスモジュールは収束を高める。 LLaMAと比較して、提案されたSUBLLMは、トレーニング速度と推論速度、メモリ使用量の両方で大幅に向上し、競合する数ショットのパフォーマンスを維持している。トレーニング中、SUBLLMはスピードを26%向上し、GPU毎にメモリを10GB削減する。推論では、スピードを最大37%向上し、1GPUあたりのメモリを1GB削減する。トレーニングと推論のスピードは、コンテキストウィンドウが8192に拡張された場合、それぞれ34%と52%向上できる。提案されたアーキテクチャのソースコードを公開バージョンで公開します。

While Large Language Models (LLMs) have achieved remarkable success in various fields, the efficiency of training and inference remains a major challenge. To address this issue, we propose SUBLLM, short for Subsampling-Upsampling-Bypass Large Language Model, an innovative architecture that extends the core decoder-only framework by incorporating subsampling, upsampling, and bypass modules. The subsampling modules are responsible for shortening the sequence, while the upsampling modules restore the sequence length, and the bypass modules enhance convergence. In comparison to LLaMA, the proposed SUBLLM exhibits significant enhancements in both training and inference speeds as well as memory usage, while maintaining competitive few-shot performance. During training, SUBLLM increases speeds by 26% and cuts memory by 10GB per GPU. In inference, it boosts speeds by up to 37% and reduces memory by 1GB per GPU. The training and inference speeds can be enhanced by 34% and 52% respectively when the context window is expanded to 8192. We shall release the source code of the proposed architecture in the published version.
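
The subsample, upsample, bypass flow described above can be illustrated with a toy sketch on plain lists of floats. Everything here is an assumption made for illustration: the score-based selection rule, the zero fill for dropped positions, the fixed mixing weight, and the stand-in `layer` are not taken from the paper, which does not spell out these details in its abstract.

```python
def subsample(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens, preserving their original order."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_idx = sorted(ranked[:keep])
    return [tokens[i] for i in kept_idx], kept_idx


def upsample(short, kept_idx, length, fill=0.0):
    """Scatter the shortened sequence back to its original positions."""
    full = [fill] * length
    for value, i in zip(short, kept_idx):
        full[i] = value
    return full


def bypass(before, after, weight=0.5):
    """Mix the sequence from before subsampling with the upsampled result."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(after, before)]


def block(tokens, scores, keep, layer=lambda seq: [2.0 * t for t in seq]):
    """Run a (hypothetical) layer on the shortened sequence only."""
    short, kept_idx = subsample(tokens, scores, keep)
    processed = layer(short)            # the expensive part sees fewer tokens
    restored = upsample(processed, kept_idx, len(tokens))
    return bypass(tokens, restored)


output = block([1.0, 2.0, 3.0, 4.0], scores=[0.1, 0.9, 0.2, 0.8], keep=2)
```

The point of the sketch is only that the costly layer runs on `keep` tokens rather than the full length, while the output keeps the original sequence length.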

翻訳日:2024-06-17 00:04:06 公開日:2024-06-03

# LLMの質問応答のためのグラフニューラルネットワーク強化検索

Graph Neural Network Enhanced Retrieval for Question Answering of LLMs ( http://arxiv.org/abs/2406.06572v1 )

ライセンス: Link先を確認

Zijian Li, Qingyan Guo, Jiawei Shao, Lei Song, Jiang Bian, Jun Zhang, Rui Wang,

(参考訳) 検索拡張生成は、事実に基づく裏付けを提供することで、大規模言語モデル(LLM)の出力に革命をもたらした。それにもかかわらず、複雑な推論問題に必要な知識をすべて捉えることには苦労している。既存の検索手法は通常、参照文書をパッセージに分割し、それらを個別に扱う。しかし、これらのパッセージは、連続していたり同じキーワードを共有していたりと、しばしば相互に関連している。したがって、検索プロセスの強化には関連性を認識することが不可欠である。本稿では、パッセージ間の関連性を考慮して検索を強化する、グラフニューラルネットワーク(GNN)を利用した新しい検索手法GNN-Retを提案する。具体的には、まず、構造的に関連するパッセージとキーワードで関連するパッセージを接続することで、パッセージのグラフを構築する。次に、GNNを用いてパッセージ間の関係を活用し、根拠となるパッセージの検索を改善する。さらに、リカレントグラフニューラルネットワーク(RGNN)を用いて、マルチホップ推論問題を扱えるように手法を拡張し、これをRGNN-Retと呼ぶ。各ステップにおいて、RGNN-Retは前のステップのパッセージのグラフを統合し、根拠となるパッセージの検索を強化する。ベンチマークデータセットに対する大規模な実験により、GNN-Retは単一のLLMクエリで、複数のクエリを必要とする強力なベースラインよりも高い質問応答精度を達成し、RGNN-Retはさらに精度を改善して最先端の性能を実現し、2WikiMQAデータセットでは最大10.4%の精度向上を達成することが示された。

Retrieval augmented generation has revolutionized large language model (LLM) outputs by providing factual supports. Nevertheless, it struggles to capture all the necessary knowledge for complex reasoning questions. Existing retrieval methods typically divide reference documents into passages, treating them in isolation. These passages, however, are often interrelated, such as passages that are contiguous or share the same keywords. Therefore, recognizing the relatedness is crucial for enhancing the retrieval process. In this paper, we propose a novel retrieval method, called GNN-Ret, which leverages graph neural networks (GNNs) to enhance retrieval by considering the relatedness between passages. Specifically, we first construct a graph of passages by connecting passages that are structure-related and keyword-related. A graph neural network (GNN) is then leveraged to exploit the relationships between passages and improve the retrieval of supporting passages. Furthermore, we extend our method to handle multi-hop reasoning questions using a recurrent graph neural network (RGNN), named RGNN-Ret. At each step, RGNN-Ret integrates the graphs of passages from previous steps, thereby enhancing the retrieval of supporting passages. Extensive experiments on benchmark datasets demonstrate that GNN-Ret achieves higher accuracy for question answering with a single query of LLMs than strong baselines that require multiple queries, and RGNN-Ret further improves accuracy and achieves state-of-the-art performance, with up to 10.4% accuracy improvement on the 2WikiMQA dataset.
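
To make the graph construction concrete, the following sketch connects passages that are adjacent in the same document (structure-related) or share a keyword (keyword-related), then runs one round of neighbour averaging over retrieval scores. The averaging step is only a stand-in for a trained GNN, and the `(doc_id, position)` passage representation, the keyword sets, and the `alpha` blend are assumptions for illustration rather than details from the paper.

```python
def build_passage_graph(passages, keywords):
    """passages: list of (doc_id, position); keywords: list of keyword sets."""
    n = len(passages)
    edges = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            same_doc = passages[i][0] == passages[j][0]
            adjacent = same_doc and abs(passages[i][1] - passages[j][1]) == 1
            shared = bool(keywords[i] & keywords[j])
            if adjacent or shared:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def propagate(scores, edges, alpha=0.5):
    """One message-passing round: blend each score with its neighbours' mean."""
    updated = []
    for i, score in enumerate(scores):
        neighbours = edges[i]
        if neighbours:
            mean = sum(scores[j] for j in neighbours) / len(neighbours)
        else:
            mean = score
        updated.append((1.0 - alpha) * score + alpha * mean)
    return updated


passages = [("a", 0), ("a", 1), ("b", 0)]
keywords = [{"paris"}, {"seine"}, {"paris"}]
graph = build_passage_graph(passages, keywords)
new_scores = propagate([0.8, 0.2, 0.0], graph)
```

In this toy case the third passage starts with a score of zero but gains relevance because it shares a keyword with the top-scoring passage, which is the kind of effect the relatedness graph is meant to capture.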

翻訳日:2024-06-17 00:04:06 公開日:2024-06-03

# MedFuzz: 医療質問応答における大規模言語モデルのロバスト性を探る

MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering ( http://arxiv.org/abs/2406.06573v1 )

ライセンス: Link先を確認

Robert Osazuwa Ness, Katie Matton, Hayden Helm, Sheng Zhang, Junaid Bajwa, Carey E. Priebe, Eric Horvitz,

(参考訳) 大規模言語モデル(LLM)は、医療質問応答ベンチマークにおいて優れた性能を達成している。しかし、高いベンチマーク精度は、その性能が実際の臨床環境に一般化することを意味しない。医療質問応答ベンチマークは、LLMの性能を定量化するうえでは都合のよい仮定に依拠しているが、その仮定は臨床現場というオープンワールドでは成り立たない可能性がある。一方でLLMは幅広い知識を学習しており、著名なベンチマークに非現実的な仮定があっても、実践的な条件への一般化に役立つ可能性がある。我々は、ベンチマークの仮定が破られた場合に、医療質問応答ベンチマークにおけるLLMの性能がどの程度一般化するかを定量化することを目指す。具体的には、MedFuzz(medical fuzzing、医療ファジング)と呼ぶ敵対的手法を提案する。MedFuzzは、LLMを混乱させることを狙った方法でベンチマークの質問を修正しようとする。MedQAベンチマークで提示される患者特性に関する強い仮定を対象として、このアプローチを実証する。成功した「攻撃」は、医療専門家を欺く可能性は低いがLLMを正答から誤答へ変えさせるような形で、ベンチマーク項目を修正する。さらに、成功した攻撃が統計的に有意であることを確認できる並べ替え検定の手法を提案する。「MedFuzzed」ベンチマークでの性能と、個々の成功した攻撃の利用方法を示す。これらの手法は、LLMがより現実的な環境で頑健に動作する能力についての洞察を与えるうえで有望である。

Large language models (LLM) have achieved impressive performance on medical question-answering benchmarks. However, high benchmark accuracy does not imply that the performance generalizes to real-world clinical settings. Medical question-answering benchmarks rely on assumptions consistent with quantifying LLM performance but that may not hold in the open world of the clinic. Yet LLMs learn broad knowledge that can help the LLM generalize to practical conditions regardless of unrealistic assumptions in celebrated benchmarks. We seek to quantify how well LLM medical question-answering benchmark performance generalizes when benchmark assumptions are violated. Specifically, we present an adversarial method that we call MedFuzz (for medical fuzzing). MedFuzz attempts to modify benchmark questions in ways aimed at confounding the LLM. We demonstrate the approach by targeting strong assumptions about patient characteristics presented in the MedQA benchmark. Successful "attacks" modify a benchmark item in ways that would be unlikely to fool a medical expert but nonetheless "trick" the LLM into changing from a correct to an incorrect answer. Further, we present a permutation test technique that can ensure a successful attack is statistically significant. We show how to use performance on a "MedFuzzed" benchmark, as well as individual successful attacks. The methods show promise at providing insights into the ability of an LLM to operate robustly in more realistic settings.
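
The abstract mentions a permutation test but does not say which statistic is permuted. As a generic illustration only, the sketch below runs a one-sided two-sample permutation test on the difference of means. The grouping of scores into `group_a` and `group_b`, the statistic, and the number of permutations are all assumptions, not the authors' procedure.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """One-sided test: is mean(group_a) - mean(group_b) larger than chance?"""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # random relabelling of the pool
        a = pooled[:len(group_a)]
        b = pooled[len(group_a):]
        if mean(a) - mean(b) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


p_same = permutation_p_value([1, 1, 1], [1, 1, 1])
p_apart = permutation_p_value([10] * 8, [0] * 8)
```

When the two groups are identical, every relabelling matches the observed difference and the p-value is 1; when they are well separated, almost no relabelling does, so the p-value is small.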

翻訳日:2024-06-17 00:04:06 公開日:2024-06-03

# 透明性に向けて:ビジュアルトピックモデリングとセマンティックフレームによるLLMトレーニングデータセットの探索

Towards Transparency: Exploring LLM Trainings Datasets through Visual Topic Modeling and Semantic Frame ( http://arxiv.org/abs/2406.06574v1 )

ライセンス: Link先を確認

Charles de Dampierre, Andrei Mogoutov, Nicolas Baumard,

(参考訳) LLMは現在、質問への回答から物事の分類に至るまで、人間に代わって多くの意思決定を担っており、日常生活の重要な一部となっている。近年、計算資源とモデルアーキテクチャは急速に拡大しているが、トレーニングデータセットのキュレーションへの取り組みはまだ始まったばかりである。このようなトレーニングデータセットの軽視により、LLMはバイアスのある低品質のコンテンツを生成するようになった。この問題を解決するために、AIと認知科学を活用してテキストデータセットの洗練を改善するソフトウェアであるBunkaを紹介する。トピックモデリングと2次元カルトグラフィーを組み合わせることで、データセットの透明性が向上することを示す。次に、同じトピックモデリング手法を選好(Preferences)データセットに適用して、微調整プロセスを加速し、さまざまなベンチマークでモデルの能力を高める方法を示す。最後に、フレーム分析を用いることで、トレーニングコーパス内の既存のバイアスに対する洞察が得られることを示す。全体として、LLMのトレーニングデータセットを探索し、その品質と透明性を高めるためのより良いツールが必要であると論じる。

LLMs are now responsible for making many decisions on behalf of humans: from answering questions to classifying things, they have become an important part of everyday life. While computation and model architecture have been rapidly expanding in recent years, the efforts towards curating training datasets are still in their beginnings. This underappreciation of training datasets has led LLMs to create biased and low-quality content. In order to solve that issue, we present Bunka, software that leverages AI and Cognitive Science to improve the refinement of textual datasets. We show how Topic Modeling coupled with 2-dimensional Cartography can increase the transparency of datasets. We then show how the same Topic Modeling techniques can be applied to Preferences datasets to accelerate the fine-tuning process and increase the capacities of the model on different benchmarks. Lastly, we show how using Frame Analysis can give insights into existing biases in the training corpus. Overall, we argue that we need better tools to explore and increase the quality and transparency of LLM training datasets.

翻訳日:2024-06-17 00:04:06 公開日:2024-06-03

# Ask-EDA: LLM, Hybrid RAG, Abbreviation De-hallucinationを活用したデザインアシスタント

Ask-EDA: A Design Assistant Empowered by LLM, Hybrid RAG and Abbreviation De-hallucination ( http://arxiv.org/abs/2406.06575v1 )

ライセンス: Link先を確認

Luyao Shi, Michael Kazda, Bradley Sears, Nick Shropshire, Ruchir Puri,

(参考訳) 電子設計技術者は、設計構築、検証、技術開発における無数のタスクについて、関連情報を効率的に見つけることに苦労している。大規模言語モデル(LLM)は、分野の専門家として効果的に機能する会話エージェントとなることで、生産性を向上させる可能性がある。本稿では、設計技術者にガイダンスを提供する24時間年中無休(24x7)のエキスパートとして設計されたチャットエージェントAsk-EDAを紹介する。Ask-EDAは、LLM、ハイブリッド検索拡張生成(RAG)、略語の脱ハルシネーション(ADH)の各技術を利用して、より関連性が高く正確な応答を提供する。我々は、q2a-100、cmds-100、abbr-100の3つの評価データセットをキュレートした。各データセットは、一般的な設計質問応答、設計コマンドの処理、略語の解決という異なる側面を評価するように調整されている。ハイブリッドRAGは、RAGを使用しない場合と比較して、q2a-100データセットでRecallを40%以上、cmds-100データセットで60%以上改善し、ADHはabbr-100データセットでRecallを70%以上改善することを示した。評価の結果、Ask-EDAは設計関連の質問に効果的に応答できることがわかった。

Electronic design engineers are challenged to find relevant information efficiently for a myriad of tasks within design construction, verification and technology development. Large language models (LLM) have the potential to help improve productivity by serving as conversational agents that effectively function as subject-matter experts. In this paper we demonstrate Ask-EDA, a chat agent designed to serve as a 24x7 expert available to provide guidance to design engineers. Ask-EDA leverages LLM, hybrid retrieval augmented generation (RAG) and abbreviation de-hallucination (ADH) techniques to deliver more relevant and accurate responses. We curated three evaluation datasets, namely q2a-100, cmds-100 and abbr-100. Each dataset is tailored to assess a distinct aspect: general design question answering, design command handling and abbreviation resolution. We demonstrated that hybrid RAG offers over a 40% improvement in Recall on the q2a-100 dataset and over a 60% improvement on the cmds-100 dataset compared to not using RAG, while ADH yields over a 70% enhancement in Recall on the abbr-100 dataset. The evaluation results show that Ask-EDA can effectively respond to design-related inquiries.

翻訳日:2024-06-17 00:04:06 公開日:2024-06-03

# VerilogReader: LLM支援ハードウェアテスト生成

VerilogReader: LLM-Aided Hardware Test Generation ( http://arxiv.org/abs/2406.04373v1 )

ライセンス: Link先を確認

Ruiyang Ma, Yuxin Yang, Ziqian Liu, Jiaxi Zhang, Min Li, Junhua Huang, Guojie Luo,

(参考訳) テスト生成は、ハードウェア設計の検証において、重要かつ労働集約的なプロセスである。近年、高度な理解・推論能力を備えた大規模言語モデル(LLM)の出現により、新しいアプローチがもたらされた。本研究では、LLMをVerilog Readerとして機能させる形で、Coverage Directed Test Generation (CDG)プロセスへのLLMの統合について検討する。LLMはコードロジックを正確に把握し、未探索のコード分岐に到達できる刺激(テスト入力)を生成する。自作のVerilogベンチマークスイートを用いて、本フレームワークをランダムテストと比較する。実験により、LLMの理解範囲内にある設計では、本フレームワークがランダムテストよりも優れていることが示された。また、LLMの理解範囲と精度を高めるためのプロンプトエンジニアリングの最適化も提案する。

Test generation has been a critical and labor-intensive process in hardware design verification. Recently, the emergence of Large Language Models (LLMs), with their advanced understanding and inference capabilities, has introduced a novel approach. In this work, we investigate the integration of LLMs into the Coverage Directed Test Generation (CDG) process, where the LLM functions as a Verilog Reader. It accurately grasps the code logic, thereby generating stimuli that can reach unexplored code branches. We compare our framework with random testing, using our self-designed Verilog benchmark suite. Experiments demonstrate that our framework outperforms random testing on designs within the LLM's comprehension scope. Our work also proposes prompt engineering optimizations to augment the LLM's understanding scope and accuracy.

翻訳日:2024-06-10 18:49:00 公開日:2024-06-03

# $\ell_0$正則化問題に対する新しい分枝限定法プルーニングフレームワーク

A New Branch-and-Bound Pruning Framework for $\ell_0$-Regularized Problems ( http://arxiv.org/abs/2406.03504v1 )

ライセンス: Link先を確認

Theo Guyard, Cédric Herzet, Clément Elvira, Ayşe-Nur Arslan,

(参考訳) 本稿では、分枝限定法(Branch-and-Bound, BnB)アルゴリズムによる、$\ell_0$正則化を含む学習問題の解法について考察する。これらの手法は、問題の実行可能空間の領域を探索し、「プルーニングテスト」によって、その領域が解を含まないかどうかを確認する。標準的な実装では、プルーニングテストの評価に凸最適化問題を解く必要があり、これが計算上のボトルネックになりうる。本稿では、$\ell_0$正則化問題のある一般的な族に対して、プルーニングテストの代替的な実装方法を提案する。提案手法は複数の領域を同時に評価でき、無視できる程度の計算オーバーヘッドで標準的なBnB実装に組み込むことができる。数値シミュレーションにより、機械学習の応用で現れる典型的な問題に対して、提案するプルーニング戦略がBnB手続きの求解時間を数桁改善しうることを示す。

We consider the resolution of learning problems involving $\ell_0$-regularization via Branch-and-Bound (BnB) algorithms. These methods explore regions of the feasible space of the problem and check, through "pruning tests", whether they contain no solutions. In standard implementations, evaluating a pruning test requires solving a convex optimization problem, which may result in computational bottlenecks. In this paper, we present an alternative way to implement pruning tests for some generic family of $\ell_0$-regularized problems. Our proposed procedure allows the simultaneous assessment of several regions and can be embedded in standard BnB implementations with a negligible computational overhead. We show through numerical simulations that our pruning strategy can improve the solving time of BnB procedures by several orders of magnitude for typical problems encountered in machine-learning applications.

翻訳日:2024-06-07 19:34:24 公開日:2024-06-03

# 部分ラベル情報を用いた半教師付きコントラスト学習

Semi-supervised Contrastive Learning Using Partial Label Information ( http://arxiv.org/abs/2003.07921v2 )

ライセンス: Link先を確認

Colin B. Hansen, Vishwesh Nath, Diego A. Mesa, Yuankai Huo, Bennett A. Landman, Thomas A. Lasko,

(参考訳) 半教師付き学習では、ラベルなし例からの情報を、ラベル付き例から学習したモデルの改善に用いる。一部の学習問題では、本来ラベルのない例から部分的なラベル情報を推測し、モデルをさらに改善するために使うことができる。特に、ラベル自体は欠落していても、トレーニング例の部分集合が同じラベルを持つとわかっている場合には、部分ラベル情報が存在する。対照学習の目的関数を通じて、そのような例すべてに同じラベルを与えるようモデルを促すことで、性能を向上できる可能性がある。同じラベルを持つ任意の例の対の差分ベクトルは線形モデルの零空間(nullspace)に含まれるべきであることから、この促進をNullspace Tuningと呼ぶ。本稿では、特性のよく知られた公開データセット上の慎重な比較フレームワークを用いて、部分ラベル情報を利用する利点を検討する。部分ラベルが与える追加情報により、優れた半教師付き手法と比べてテスト誤差が通常2分の1に、最良の場合には5.5分の1に減少することを示す。また、より新しい最先端手法であるMixMatchにNullspace Tuningを追加すると、テスト誤差が最大で1.8分の1に減少することを示す。

In semi-supervised learning, information from unlabeled examples is used to improve the model learned from labeled examples. In some learning problems, partial label information can be inferred from otherwise unlabeled examples and used to further improve the model. In particular, partial label information exists when subsets of training examples are known to have the same label, even though the label itself is missing. By encouraging the model to give the same label to all such examples through contrastive learning objectives, we can potentially improve its performance. We call this encouragement Nullspace Tuning because the difference vector between any pair of examples with the same label should lie in the nullspace of a linear model. In this paper, we investigate the benefit of using partial label information using a careful comparison framework over well-characterized public datasets. We show that the additional information provided by partial labels reduces test error over good semi-supervised methods usually by a factor of 2, up to a factor of 5.5 in the best case. We also show that adding Nullspace Tuning to the newer and state-of-the-art MixMatch method decreases its test error by up to a factor of 1.8.
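
The nullspace idea can be checked numerically: if two examples share a label, a linear model `W` should map their difference to zero. The sketch below measures how far a pair is from that condition with a squared-norm penalty. The specific penalty form and the toy weight matrix are assumptions for illustration; the abstract only states the nullspace condition, not the loss the authors use.

```python
def linear(W, x):
    """Apply a linear model given as a list of weight rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def nullspace_penalty(W, x1, x2):
    """Squared norm of W @ (x1 - x2); zero when the difference is in null(W)."""
    diff = [a - b for a, b in zip(x1, x2)]
    return sum(v * v for v in linear(W, diff))


# Toy model that only looks at the first input coordinate.
W = [[1.0, 0.0],
     [0.0, 0.0]]

same_label_pair = nullspace_penalty(W, [1.0, 2.0], [1.0, 5.0])   # differ only in coordinate 2
other_pair = nullspace_penalty(W, [1.0, 2.0], [3.0, 2.0])        # differ in coordinate 1
```

Adding such a penalty for every pair known to share a label would push the model to ignore the directions along which those pairs differ.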

翻訳日:2024-06-07 05:08:03 公開日:2024-06-03

# 高次元偏微分方程式に対する時空間ディープニューラルネットワーク近似

Space-time deep neural network approximations for high-dimensional partial differential equations ( http://arxiv.org/abs/2006.02199v2 )

ライセンス: Link先を確認

Fabian Hornung, Arnulf Jentzen, Diyora Salimova,

(参考訳) 高次元偏微分方程式(PDE)を近似的に解くことは応用数学において最も難しい問題の一つであり、科学文献におけるPDEの数値近似法のほとんどは、近似精度 $\varepsilon>0$ を得るために対応する近似スキームで用いられる計算演算の数が、PDEの次元および/または $\varepsilon$ の逆数に対して指数関数的に増加するという意味で、いわゆる次元の呪いに苦しむ。近年、深層学習に基づくPDEの近似法が提案されており、そのような手法に対するさまざまな数値シミュレーションは、近似DNNを記述するために用いられる実パラメータの数が、PDEの次元 $d\in\mathbb{N}$ と所定の精度 $\varepsilon>0$ の逆数の両方に対して高々多項式的にしか増加しないという意味で、ディープニューラルネットワーク(DNN)近似が次元の呪いを実際に克服する能力を持つ可能性を示唆している。現在では、DNNがPDEの解の近似において次元の呪いを克服することを証明して、この予想を裏付ける厳密な結果も科学文献にいくつか存在する。これらの結果はいずれも、DNNが固定された時刻 $T>0$ および空間内のコンパクトな立方体 $[a,b]^d$ 上で適切なPDE解を近似する際に次元の呪いを克服することを示しているが、$[0,T]\times [a,b]^d$ 上のPDE解全体を次元の呪いなしにDNNで近似できるかどうかという問いには答えていない。この問題を克服することこそが本稿の主題である。より具体的には、本研究の主結果は特に、任意の $a\in\mathbb{R}$, $ b\in (a,\infty)$ に対して、あるコルモゴロフPDEの解が時空領域 $[0,T]\times [a,b]^d$ 上で次元の呪いなしにDNNによって近似可能であることを証明している。

It is one of the most challenging issues in applied mathematics to approximately solve high-dimensional partial differential equations (PDEs) and most of the numerical approximation methods for PDEs in the scientific literature suffer from the so-called curse of dimensionality in the sense that the number of computational operations employed in the corresponding approximation scheme to obtain an approximation precision $\varepsilon>0$ grows exponentially in the PDE dimension and/or the reciprocal of $\varepsilon$. Recently, certain deep learning based approximation methods for PDEs have been proposed and various numerical simulations for such methods suggest that deep neural network (DNN) approximations might have the capacity to indeed overcome the curse of dimensionality in the sense that the number of real parameters used to describe the approximating DNNs grows at most polynomially in both the PDE dimension $d\in\mathbb{N}$ and the reciprocal of the prescribed accuracy $\varepsilon>0$. There are now also a few rigorous results in the scientific literature which substantiate this conjecture by proving that DNNs overcome the curse of dimensionality in approximating solutions of PDEs. Each of these results establishes that DNNs overcome the curse of dimensionality in approximating suitable PDE solutions at a fixed time point $T>0$ and on a compact cube $[a,b]^d$ in space but none of these results provides an answer to the question whether the entire PDE solution on $[0,T]\times [a,b]^d$ can be approximated by DNNs without the curse of dimensionality. It is precisely the subject of this article to overcome this issue. More specifically, the main result of this work in particular proves for every $a\in\mathbb{R}$, $ b\in (a,\infty)$ that solutions of certain Kolmogorov PDEs can be approximated by DNNs on the space-time region $[0,T]\times [a,b]^d$ without the curse of dimensionality.

翻訳日:2024-06-07 05:08:03 公開日:2024-06-03

# MNIST-1Dによるディープラーニングのスケールダウン

Scaling Down Deep Learning with MNIST-1D ( http://arxiv.org/abs/2011.14439v5 )

ライセンス: Link先を確認

Sam Greydanus, Dmitry Kobak,

(参考訳) ディープラーニングモデルは商業的・政治的な重要性を持つようになったが、その訓練と運用の重要な側面はいまだ十分に理解されていない。これが「ディープラーニングの科学」プロジェクトへの関心を呼び起こしており、その多くは大量の時間、資金、電力を必要とする。しかし、この研究のうちどれだけを本当に大規模に行う必要があるのだろうか。本稿では、古典的なディープラーニングベンチマークに代わる、ミニマルで、手続き的に生成され、低メモリかつ低計算量のMNIST-1Dを紹介する。MNIST-1Dの次元はわずか40、デフォルトのトレーニングセットのサイズはわずか4000であるが、MNIST-1Dは、さまざまな深層アーキテクチャの帰納バイアスの研究、宝くじ券(lottery ticket)の発見、深層二重降下の観察、活性化関数のメタ学習、自己教師付き学習におけるギロチン正則化の実証に使用できる。これらの実験はすべてGPU上で、多くの場合はCPU上でも数分以内に実行でき、高速なプロトタイピング、教育用途、低予算での最先端研究を可能にする。

Although deep learning models have taken on commercial and political relevance, key aspects of their training and operation remain poorly understood. This has sparked interest in science of deep learning projects, many of which require large amounts of time, money, and electricity. But how much of this research really needs to occur at scale? In this paper, we introduce MNIST-1D: a minimalist, procedurally generated, low-memory, and low-compute alternative to classic deep learning benchmarks. Although the dimensionality of MNIST-1D is only 40 and its default training set size only 4000, MNIST-1D can be used to study inductive biases of different deep architectures, find lottery tickets, observe deep double descent, metalearn an activation function, and demonstrate guillotine regularization in self-supervised learning. All these experiments can be conducted on a GPU or often even on a CPU within minutes, allowing for fast prototyping, educational use cases, and cutting-edge research on a low budget.

翻訳日:2024-06-07 05:08:03 公開日:2024-06-03

# ドメイン特化人工知能を用いた発達小児のデジタル治療の改善 : 機械学習による研究

Improved Digital Therapy for Developmental Pediatrics Using Domain-Specific Artificial Intelligence: Machine Learning Study ( http://arxiv.org/abs/2012.08678v2 )

ライセンス: Link先を確認

Peter Washington, Haik Kalantarian, John Kent, Arman Husic, Aaron Kline, Emilie Leblanc, Cathy Hou, Onur Cezmi Mutlu, Kaitlyn Dunlap, Yordan Penev, Maya Varma, Nate Tyler Stockham, Brianna Chrisman, Kelley Paskov, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis Paul Wall,

(参考訳) 背景: 自動感情分類は、自閉症などの発達行動上の状態を持つ子供を含め、感情の認識に苦労する人々を支援しうる。しかし、コンピュータビジョンの感情認識モデルの多くは大人の感情で訓練されているため、子供の顔に適用すると性能が低下する。目的: 子供の感情を豊富に含む画像の収集とラベル付けをゲーミフィケーションする戦略を設計し、子供の感情の自動認識モデルの性能を、デジタルヘルスケアのアプローチに必要となる水準に近づけることを目指した。方法: 主に発達・行動上の状態を持つ子供向けに設計された治療用スマートフォンゲームのプロトタイプGuessWhatを活用し、ゲームに促されてさまざまな感情を表現する子供のビデオデータの安全な収集をゲーミフィケーションした。これとは独立に、資格のあるラベル付け担当者なら誰でも使えるように調整した、人手によるラベル付け作業をゲーミフィケーションするための安全なWebインターフェースHollywoodSquaresを作成した。2155本の動画、39,968枚の感情フレームを収集・ラベル付けし、全画像に対して106,001件のラベルを得た。この大幅に拡張された小児感情中心のデータベース(既存の公開小児感情データセットの30倍超)を用いて、子供が表出する幸福、悲しみ、驚き、恐怖、怒り、嫌悪、中立の表情を分類する畳み込みニューラルネットワーク(CNN)のコンピュータビジョン分類器を訓練した。結果: この分類器は、Child Affective Facial Expression (CAFE)全体で66.9%のバランス精度と67.4%のF1スコアを達成し、感情ラベルについて少なくとも60%の人間の合意があるサブセットであるCAFE Subset Aでは79.1%のバランス精度と78%のF1スコアを達成した。この性能は、CAFEで評価されたこれまでのすべての分類器よりも少なくとも10%高く、そのうち最良のものは「怒り」と「嫌悪」を1つのクラスに統合した場合でも56%のバランス精度にとどまっていた。

Background: Automated emotion classification could aid those who struggle to recognize emotions, including children with developmental behavioral conditions such as autism. However, most computer vision emotion recognition models are trained on adult emotion and therefore underperform when applied to child faces. Objective: We designed a strategy to gamify the collection and labeling of child emotion-enriched images to boost the performance of automatic child emotion recognition models to a level closer to what will be needed for digital health care approaches. Methods: We leveraged our prototype therapeutic smartphone game, GuessWhat, which was designed in large part for children with developmental and behavioral conditions, to gamify the secure collection of video data of children expressing a variety of emotions prompted by the game. Independently, we created a secure web interface to gamify the human labeling effort, called HollywoodSquares, tailored for use by any qualified labeler. We gathered and labeled 2155 videos, 39,968 emotion frames, and 106,001 labels on all images. With this drastically expanded pediatric emotion-centric database (>30 times larger than existing public pediatric emotion data sets), we trained a convolutional neural network (CNN) computer vision classifier of happy, sad, surprised, fearful, angry, disgust, and neutral expressions evoked by children. Results: The classifier achieved a 66.9% balanced accuracy and 67.4% F1-score on the entirety of the Child Affective Facial Expression (CAFE) as well as a 79.1% balanced accuracy and 78% F1-score on CAFE Subset A, a subset containing at least 60% human agreement on emotions labels. This performance is at least 10% higher than all previously developed classifiers evaluated against CAFE, the best of which reached a 56% balanced accuracy even when combining "anger" and "disgust" into a single class.
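
The results above are reported as balanced accuracy rather than plain accuracy. As a reminder of what that metric measures, the sketch below computes it as the mean of per-class recall, which is the standard definition. The labels are made up for illustration and are unrelated to the CAFE data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes weigh as much as common ones."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical labels: eight "happy" frames and two "sad" frames.
y_true = ["happy"] * 8 + ["sad"] * 2
y_pred = ["happy"] * 8 + ["happy", "sad"]

plain = accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
```

Here plain accuracy is 0.9 while balanced accuracy is 0.75, because half of the rare "sad" class is missed. That gap is why balanced accuracy is the more informative number on emotion data sets with uneven class sizes.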

翻訳日:2024-06-07 05:08:03 公開日:2024-06-03

# エキスパートの一貫性を活用してアルゴリズム決定サポートを改善する

Leveraging Expert Consistency to Improve Algorithmic Decision Support ( http://arxiv.org/abs/2101.09648v3 )

ライセンス: Link先を確認

Maria De-Arteaga, Vincent Jeanselme, Artur Dubrawski, Alexandra Chouldechova,

(参考訳) 機械学習(ML)は、重大な結果を伴う(ハイステークスな)意思決定を支援するためにますます使われている。しかし、意思決定タスクにとって関心のある構成概念と、MLモデルを訓練するためのラベルとして使われる代理指標(プロキシ)が捉えるものとの間には、しばしば構成概念ギャップ(construct gap)が存在する。その結果、MLモデルは決定基準の重要な次元を捉えられず、意思決定支援における有用性が損なわれる可能性がある。したがって、意思決定支援のためのMLシステムの設計における重要なステップは、利用可能なプロキシの中からターゲットラベルを選択することである。本研究では、観測された結果と組み合わせて構成概念ギャップを狭めることができる、豊かではあるが不完全でもある情報源として、過去の専門家の決定を利用することを検討する。マネージャやシステム設計者は、専門家が互いに一貫性を示すケースでは専門家から学び、それ以外のケースでは観測された結果から学ぶことに関心を持つだろうと主張する。我々は、組織の情報システムで一般に利用可能な情報を用いて、この目標を実現する方法論を開発する。これには2つの中核ステップが含まれる。第1に、データ内の各ケースが1人の専門家のみによって評価されている場合に、専門家の一貫性を間接的に推定する、影響関数に基づく方法論を提案する。第2に、MLモデルが専門家の決定と観測された結果から同時に学習できるようにするラベル統合(label amalgamation)手法を導入する。臨床環境でのシミュレーションと児童福祉領域の実世界データを用いた実証的評価から、提案手法が構成概念ギャップを狭めることに成功し、観測された結果のみ、または専門家の決定のみから学習する場合よりも優れた予測性能が得られることが示された。

Machine learning (ML) is increasingly being used to support high-stakes decisions. However, there is frequently a construct gap: a gap between the construct of interest to the decision-making task and what is captured in proxies used as labels to train ML models. As a result, ML models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. Thus, an essential step in the design of ML systems for decision support is selecting a target label among available proxies. In this work, we explore the use of historical expert decisions as a rich -- yet also imperfect -- source of information that can be combined with observed outcomes to narrow the construct gap. We argue that managers and system designers may be interested in learning from experts in instances where they exhibit consistency with each other, while learning from observed outcomes otherwise. We develop a methodology to enable this goal using information that is commonly available in organizational information systems. This involves two core steps. First, we propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert. Second, we introduce a label amalgamation approach that allows ML models to simultaneously learn from expert decisions and observed outcomes. Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap, yielding better predictive performance than learning from either observed outcomes or expert decisions alone.

翻訳日:2024-06-07 05:08:03 公開日:2024-06-03

# サンプリングの力:プライベートERMにおける次元非依存のリスク境界

The Power of Sampling: Dimension-free Risk Bounds in Private ERM ( http://arxiv.org/abs/2105.13637v4 )

ライセンス: Link先を確認

Yin Tat Lee, Daogao Liu, Zhou Lu,

(参考訳) 差分プライベートな経験的リスク最小化(DP-ERM)は、プライベート最適化における基本的な問題である。DP-ERMの理論はよく研究されているが、大規模モデルが普及するにつれて、従来のDP-ERM手法は、(1)周囲次元への法外な依存、(2)高度に非滑らかな目的関数、(3)コストの高い一階勾配オラクルといった新たな課題に直面している。このような課題は、既存のDP-ERM方法論の再考を求めている。本研究では、正則化指数メカニズムを既存のサンプラーと組み合わせることで、これらの課題すべてに対処できることを示す。すなわち、標準的な非制約領域と低ランク勾配の仮定の下で、我々のアルゴリズムはゼロ次オラクルのみを用いて非滑らかな凸目的関数に対するランク依存のリスク境界を達成でき、これは従来の手法では達成されていなかった。これは、差分プライバシーにおけるサンプリングの力を際立たせる。さらに下界を構成し、勾配がフルランクの場合には、制約付き設定と非制約設定の間に分離がないことを示す。我々の下界は、非制約領域から制約付き領域への一般的なブラックボックス還元と、制約付き設定における改良された下界から導かれ、これは独立した興味の対象となりうる。

Differentially private empirical risk minimization (DP-ERM) is a fundamental problem in private optimization. While the theory of DP-ERM is well-studied, as large-scale models become prevalent, traditional DP-ERM methods face new challenges, including (1) the prohibitive dependence on the ambient dimension, (2) the highly non-smooth objective functions, (3) costly first-order gradient oracles. Such challenges demand rethinking existing DP-ERM methodologies. In this work, we show that the regularized exponential mechanism combined with existing samplers can address these challenges altogether: under the standard unconstrained domain and low-rank gradients assumptions, our algorithm can achieve rank-dependent risk bounds for non-smooth convex objectives using only zeroth order oracles, which was not accomplished by prior methods. This highlights the power of sampling in differential privacy. We further construct lower bounds, demonstrating that when gradients are full-rank, there is no separation between the constrained and unconstrained settings. Our lower bound is derived from a general black-box reduction from unconstrained to the constrained domain and an improved lower bound in the constrained setting, which might be of independent interest.

翻訳日:2024-06-07 05:08:03 公開日:2024-06-03

# Rydberg原子における3体微細構造変化フェルスター共鳴に基づくトフォリゲート

Toffoli gate based on a three-body fine-structure-state-changing Förster resonance in Rydberg atoms ( http://arxiv.org/abs/2112.11058v3 )

ライセンス: Link先を確認

I. N. Ashkarin, I. I. Beterov, E. A. Yakshina, D. B. Tretyakov, V. M. Entin, I. I. Ryabtsev, P. Cheinet, K. -L. Pham, S. Lepoutre, P. Pillet,

(参考訳) 我々は、微細構造状態の変化を伴う、シュタルク効果で調整された3体リュードベリ相互作用に基づく3量子ビットトフォリゲートの改良方式を開発した。この方式は、我々の以前の提案[I.I.Beterov et al., Physical Review A 98, 042704 (2018)]を大幅に改良したものである。異なる種類の3体フェルスター(Förster)共鳴を用いることで、レーザー励起の方式と集団3体状態の位相ダイナミクスを大幅に単純化した。この種のフェルスター共鳴は2つより多くの原子を含む系にのみ存在し、2体共鳴は存在しない。リュードベリ原子に基づくトフォリゲートの以前の方式と比較して、外部電場のゆらぎに対するゲート忠実度の感度を低減し、共鳴電場の値を微調整するために外部磁場を用いる必要をなくした。計算では99%を超えるゲート忠実度が示された。

We have developed an improved scheme of a three-qubit Toffoli gate based on fine structure state changing three-body Stark-tuned Rydberg interaction. This scheme is a substantial improvement of our previous proposal [I.I.Beterov et al., Physical Review A 98, 042704 (2018)]. Due to the use of a different type of three-body Förster resonance we substantially simplified the scheme of laser excitation and phase dynamics of collective three-body states. This type of Förster resonance exists only in systems with more than two atoms, while the two-body resonance is absent. We reduced the sensitivity of the gate fidelity to fluctuations of external electric field and eliminated the necessity to use external magnetic field for fine tuning of the resonant electric field value, compared to the previous scheme of Toffoli gate based on Rydberg atoms. A gate fidelity of >99% was demonstrated in the calculations.

翻訳日:2024-06-07 04:58:43 公開日:2024-06-03

# TATTOOED:拡散スペクトルチャネル符号化に基づくロバストなディープニューラルネットワーク透かし方式

TATTOOED: A Robust Deep Neural Network Watermarking Scheme based on Spread-Spectrum Channel Coding ( http://arxiv.org/abs/2202.06091v3 )

ライセンス: Link先を確認

Giulio Pagnotta, Dorjan Hitaj, Briland Hitaj, Fernando Perez-Cruz, Luigi V. Mancini,

(参考訳) ディープニューラルネットワーク(DNN)の透かしは近年大きな注目を集めており、所有者の許可なくモデルが取得された状況でDNNの所有権の検証に役立つメカニズムとして、数多くの透かし手法が提案されている。しかし、既存の透かしメカニズムは、微調整、パラメータの枝刈り、シャッフルといった除去技術に対して非常に脆弱であることを示す研究が増えている。本稿では、秘匿(軍事)通信に関する広範な先行研究に基づき、既存の脅威に対して頑健な新しいDNN透かし技術であるTATTOOEDを提案する。TATTOOEDを透かしメカニズムとして用いることで、モデルパラメータの99%が改変された状況でも、DNNの所有者が透かしを取り出してモデルの所有権を検証できることを示す。さらに、TATTOOEDはトレーニングパイプラインに容易に組み込むことができ、モデル性能への影響は無視できる程度であることを示す。

Watermarking of deep neural networks (DNNs) has gained significant traction in recent years, with numerous (watermarking) strategies being proposed as mechanisms that can help verify the ownership of a DNN in scenarios where these models are obtained without the permission of the owner. However, a growing body of work has demonstrated that existing watermarking mechanisms are highly susceptible to removal techniques, such as fine-tuning, parameter pruning, or shuffling. In this paper, we build upon extensive prior work on covert (military) communication and propose TATTOOED, a novel DNN watermarking technique that is robust to existing threats. We demonstrate that using TATTOOED as their watermarking mechanisms, the DNN owner can successfully obtain the watermark and verify model ownership even in scenarios where 99% of model parameters are altered. Furthermore, we show that TATTOOED is easy to employ in training pipelines, and has negligible impact on model performance.
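
Spread-spectrum channel coding, named in the title, can be illustrated with a textbook construction: each payload bit is spread over all parameters with its own secret pseudo-random ±1 code, and is read back from the sign of the correlation with that code. The sketch below is that generic construction, not the authors' implementation; the Gaussian stand-in weights, the `gain`, the payload, and all function names are assumptions.

```python
import random


def make_codes(num_bits, num_weights, secret_seed):
    """Derive one pseudo-random +/-1 spreading code per bit from a secret seed."""
    rng = random.Random(secret_seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(num_weights)]
            for _ in range(num_bits)]


def embed(weights, bits, codes, gain):
    """Add gain * (+/-1) * code to the weights for every payload bit."""
    marked = list(weights)
    for bit, code in zip(bits, codes):
        sign = 1.0 if bit else -1.0
        for i, chip in enumerate(code):
            marked[i] += gain * sign * chip
    return marked


def extract(weights, codes):
    """Recover each bit from the sign of the correlation with its code."""
    return [1 if sum(w * c for w, c in zip(weights, code)) > 0 else 0
            for code in codes]


rng = random.Random(1)
weights = [rng.gauss(0.0, 0.05) for _ in range(4000)]  # stand-in for flattened model parameters
payload = [1, 0, 1, 1, 0, 0, 1, 0]
codes = make_codes(len(payload), len(weights), secret_seed=42)
marked = embed(weights, payload, codes, gain=0.02)
recovered = extract(marked, codes)
```

Because every bit is spread thinly across thousands of parameters, each individual weight changes only slightly, while the correlation sums the small contributions back into a clear signal. Only someone holding the secret seed can regenerate the codes needed to read it.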

翻訳日:2024-06-07 04:58:43 公開日:2024-06-03

# ニューラルネットワークによるアスファルトコンクリートの疲労寿命予測

Predicting the fatigue life of asphalt concrete using neural networks ( http://arxiv.org/abs/2406.01523v1 )

ライセンス: Link先を確認

Jakub Houlík, Jan Valentin, Václav Nežerka,

(参考訳) アスファルトコンクリート(AC)の耐久性と維持管理の需要は、その疲労寿命に強く影響される。この特性を決定する従来の方法は、多くの資源と時間を要する。本研究では、人工ニューラルネットワーク(ANN)を用いてACの疲労寿命を予測し、ひずみレベル、バインダー含有量、空隙率の影響に着目する。大規模なデータセットを活用し、一般に対数スケールで表現される広い範囲の疲労寿命データを効果的に扱えるようにモデルを調整した。平均二乗対数誤差を損失関数として用い、疲労寿命のすべての水準で予測精度を高めた。各種ハイパーパラメータの比較分析を通じて、データ内の複雑な関係を捉える機械学習モデルを開発した。結果は、バインダー含有量が高いほど疲労寿命が大きく向上する一方、空隙率の影響はバインダー量に依存してより変動することを示している。最も重要な点として、本研究はモデリングにANNを用いる際の複雑さについての洞察を提供し、より大きなデータセットでの潜在的な有用性を示している。本研究で開発したコードと使用したデータはGitHubリポジトリでオープンソースとして提供されており、論文にはアクセス用のリンクが記載されている。

Asphalt concrete's (AC) durability and maintenance demands are strongly influenced by its fatigue life. Traditional methods for determining this characteristic are both resource-intensive and time-consuming. This study employs artificial neural networks (ANNs) to predict AC fatigue life, focusing on the impact of strain level, binder content, and air-void content. Leveraging a substantial dataset, we tailored our models to effectively handle the wide range of fatigue life data, typically represented on a logarithmic scale. The mean square logarithmic error was utilized as the loss function to enhance prediction accuracy across all levels of fatigue life. Through comparative analysis of various hyperparameters, we developed a machine-learning model that captures the complex relationships within the data. Our findings demonstrate that higher binder content significantly enhances fatigue life, while the influence of air-void content is more variable, depending on binder levels. Most importantly, this study provides insights into the intricacies of using ANNs for modeling, showcasing their potential utility with larger datasets. The codes developed and the data used in this study are provided as open source on a GitHub repository, with a link included in the paper for full access.
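
The choice of a mean squared logarithmic error can be motivated with a small numeric comparison. The sketch below uses the common `log(1 + y)` form of the loss, and the cycle counts are hypothetical values chosen to span several orders of magnitude. Neither the exact loss variant nor the numbers come from the paper.

```python
import math


def msle(y_true, y_pred):
    """Mean squared logarithmic error using log(1 + y)."""
    errors = [(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


def mse(y_true, y_pred):
    """Plain mean squared error, for comparison."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical cycles-to-failure values spanning several orders of magnitude.
cycles_true = [1e3, 1e6]
cycles_pred = [2e3, 2e6]   # both predictions are off by a factor of two

loss_log = msle(cycles_true, cycles_pred)
loss_plain = mse(cycles_true, cycles_pred)
```

With the plain squared error, the long-life sample dominates the loss almost entirely. With the logarithmic error, both samples contribute about `log(2) ** 2`, so a factor-of-two miss is penalised similarly at every fatigue-life level.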

翻訳日:2024-06-06 23:49:24 公開日:2024-06-03
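
The abstract above names the mean squared logarithmic error (MSLE) as the loss for fatigue-life data that span several orders of magnitude. The following is a minimal sketch of that loss in plain Python, using the common `log(1 + y)` form; whether the authors use exactly this variant is an assumption, and the function names `msle` and `mse` are illustrative only.

```python
import math


def msle(y_true, y_pred):
    """Mean squared logarithmic error: mean of (log(1 + y) - log(1 + y_hat))^2."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def mse(y_true, y_pred):
    """Plain mean squared error, shown only for comparison."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical fatigue lives (cycles to failure); both predictions are off by a factor of 2.
    short_life = msle([1e3], [2e3])
    long_life = msle([1e6], [2e6])
    print(short_life, long_life)            # nearly equal penalties
    print(mse([1e3], [2e3]), mse([1e6], [2e6]))  # MSE is dominated by the long-life sample
```

Because the error is taken on logarithms, a factor-of-two miss costs about the same at a thousand cycles as at a million cycles, which is the property that makes MSLE a natural fit for data reported on a logarithmic scale.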

# PPINtonus:Deep-Learning Tonal Analysis を用いたパーキンソン病早期発見

PPINtonus: Early Detection of Parkinson's Disease Using Deep-Learning Tonal Analysis ( http://arxiv.org/abs/2406.02608v1 )

ライセンス: Link先を確認

Varun Reddy,

(参考訳) PPINtonusはパーキンソン病(PD)を早期に検出するためのシステムであり、ディープラーニングによる音調解析を利用して、従来の神経学的検査に代わる費用対効果とアクセス性を提供する。 Parkinson's Voice Project (PVP)と共同で、PPINtonusは、半教師付き条件付き敵対的生成ネットワークを使用して合成データポイントを生成し、多層ディープニューラルネットワークのトレーニングデータセットを強化している。 PRAAT音声ソフトウェアと組み合わせて、典型的な家庭内騒音条件下で標準マイクを用いて実施した120秒音声検査から、生体医学的音声測定値を正確に評価する。モデルの性能は混同行列を用いて検証され、92.5%の正解率(accuracy)と低い偽陰性率を達成した。 PPINtonusは92.7%の適合率(precision)を示し、早期PD検出のための信頼性の高いツールとなった。 PPINtonusの非侵襲的で効率的な方法は、早期診断を可能にし、タイムリーな介入と管理を通じて何百万人ものPD患者の生活の質を向上させることによって、発展途上国に多大な利益をもたらすことができる。

PPINtonus is a system for the early detection of Parkinson's Disease (PD) utilizing deep-learning tonal analysis, providing a cost-effective and accessible alternative to traditional neurological examinations. Partnering with the Parkinson's Voice Project (PVP), PPINtonus employs a semi-supervised conditional generative adversarial network to generate synthetic data points, enhancing the training dataset for a multi-layered deep neural network. Combined with PRAAT phonetics software, this network accurately assesses biomedical voice measurement values from a simple 120-second vocal test performed with a standard microphone in typical household noise conditions. The model's performance was validated using a confusion matrix, achieving an impressive 92.5% accuracy with a low false negative rate. PPINtonus demonstrated a precision of 92.7%, making it a reliable tool for early PD detection. The non-intrusive and efficient methodology of PPINtonus can significantly benefit developing countries by enabling early diagnosis and improving the quality of life for millions of PD patients through timely intervention and management.

翻訳日:2024-06-06 23:39:37 公開日:2024-06-03

# Pseudo-Label Filtering for Continual Test-Time Adaptation

Less is More: Pseudo-Label Filtering for Continual Test-Time Adaptation ( http://arxiv.org/abs/2406.02609v1 )

ライセンス: Link先を確認

Jiayao Tan, Fan Lyu, Chenggong Ni, Tingliang Feng, Fuyuan Hu, Zhang Zhang, Shaochuang Zhao, Liang Wang,

(参考訳) 連続的テスト時間適応(CTTA)は、ソースデータにアクセスすることなく、テストフェーズ中に対象ドメインのシーケンスに事前訓練されたモデルを適応させることを目的としている。未知のドメインからのラベルのないデータに適応するために、既存のメソッドは、すべてのサンプルに対して擬似ラベルを構築し、自己学習を通じてモデルを更新する。しかし、これらの擬似ラベルは、しばしばノイズを伴い、適応が不十分になる。擬似ラベルの品質を向上させるために、Pseudo Labeling Filter (PLF) と呼ばれるCTTAの擬似ラベル選択法を提案する。 PLFの鍵となる考え方は、擬似ラベルの適切なしきい値を選択し続け、自己学習のための信頼できる擬似ラベルを特定することである。具体的には、初期化、成長、多様性を含む、継続的なドメイン学習の間にしきい値を設定するための3つの原則を提示します。これらの原則に基づいて、擬似ラベルをフィルタするために自己適応型閾値を設計する。さらに、未知のドメインサンプルに対して多様な予測を行うようモデルに促すために、クラス事前分布アライメント(CPA)手法を導入する。広範な実験を通じて、PLFは現在の最先端の手法よりも優れており、CTTAにおいてその効果が証明されている。

Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model to a sequence of target domains during the test phase without accessing the source data. To adapt to unlabeled data from unknown domains, existing methods rely on constructing pseudo-labels for all samples and updating the model through self-training. However, these pseudo-labels often involve noise, leading to insufficient adaptation. To improve the quality of pseudo-labels, we propose a pseudo-label selection method for CTTA, called Pseudo Labeling Filter (PLF). The key idea of PLF is to keep selecting appropriate thresholds for pseudo-labels and identify reliable ones for self-training. Specifically, we present three principles for setting thresholds during continuous domain learning, including initialization, growth and diversity. Based on these principles, we design Self-Adaptive Thresholding to filter pseudo-labels. Additionally, we introduce a Class Prior Alignment (CPA) method to encourage the model to make diverse predictions for unknown domain samples. Through extensive experiments, PLF outperforms current state-of-the-art methods, proving its effectiveness in CTTA.

翻訳日:2024-06-06 23:39:37 公開日:2024-06-03
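
The abstract above describes filtering pseudo-labels with a threshold that keeps being re-selected during adaptation. The sketch below illustrates the general idea only: a confidence filter plus an exponential-moving-average threshold update. The update rule, the `momentum` parameter, and both function names are assumptions made for illustration; they are not the Self-Adaptive Thresholding rule from the paper, whose details the abstract does not give.

```python
def filter_pseudo_labels(probs, threshold):
    """Keep (sample_index, predicted_class) pairs whose top probability reaches the threshold."""
    kept = []
    for i, p in enumerate(probs):
        confidence = max(p)
        if confidence >= threshold:
            kept.append((i, p.index(confidence)))
    return kept


def update_threshold(threshold, probs, momentum=0.9):
    """Move the threshold toward the batch's mean confidence (illustrative EMA rule)."""
    mean_confidence = sum(max(p) for p in probs) / len(probs)
    return momentum * threshold + (1.0 - momentum) * mean_confidence


if __name__ == "__main__":
    # Hypothetical softmax outputs for three unlabeled test samples, two classes.
    batch = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
    threshold = 0.7
    print(filter_pseudo_labels(batch, threshold))  # only confident samples are used for self-training
    threshold = update_threshold(threshold, batch)
    print(threshold)
```

Only the retained pairs would feed the self-training loss; low-confidence samples are skipped rather than trained on with a possibly wrong label.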

# MoFormer:条件付きTransformerとマルチモーダル融合記述子に基づく多目的抗微生物ペプチド生成

MoFormer: Multi-objective Antimicrobial Peptide Generation Based on Conditional Transformer Joint Multi-modal Fusion Descriptor ( http://arxiv.org/abs/2406.02610v1 )

ライセンス: Link先を確認

Li Wang, Xiangzheng Fu, Jiahao Yang, Xinyi Zhang, Xiucai Ye, Yiping Liu, Tetsuya Sakurai, Xiangxiang Zeng,

(参考訳) 深層学習は、より望ましい性質を持つ既存のペプチドを最適化する大きな可能性を秘めている。いくつかの最適化された抗微生物ペプチド(AMP)生成法が最近出現したにもかかわらず、多目的最適化は依然として理想主義と現実主義のトレードオフにおいて非常に難しい。そこで我々は,AMPの多属性同時最適化のための多目的AMP合成パイプライン (MoFormer) を構築した。 MoFormer は高度に構造化された潜伏空間における AMP 配列の所望の属性を改善し, 条件制約と細粒度多記述子により誘導される。また,大規模モデルの微調整に基づくパレートに基づく非支配的ソートアルゴリズムとプロキシを用いて,候補を階層的にランク付けする。 1)分子シミュレーションとアミノ酸間の相互作用のスコアリングによるAMPの構造と機能の解析,(2)品質と分布特性の検証のための潜伏空間の可視化,デザイン制約のある多目的最適化AMPの有効な方法の検証,の2点から,MoFormerを用いた実質的な特性改善を実証した。

Deep learning holds great promise for optimizing existing peptides toward more desirable properties, a critical step towards accelerating new drug discovery. Despite the recent emergence of several optimized antimicrobial peptide (AMP) generation methods, multi-objective optimization remains quite challenging because of the idealism-realism tradeoff. Here, we establish a multi-objective AMP synthesis pipeline (MoFormer) for the simultaneous optimization of multiple attributes of AMPs. MoFormer improves the desired attributes of AMP sequences in a highly structured latent space, guided by conditional constraints and fine-grained multi-descriptors. We show that MoFormer outperforms existing methods in the generation task of enhanced antimicrobial activity and minimal hemolysis. We also utilize a Pareto-based non-dominated sorting algorithm and proxies based on large model fine-tuning to hierarchically rank the candidates. We demonstrate substantial property improvement using MoFormer from two perspectives: (1) employing molecular simulations and scoring interactions among amino acids to decipher the structure and functionality of AMPs; (2) visualizing latent space to examine the qualities and distribution features, verifying an effective means to facilitate multi-objective optimization of AMPs with design constraints.

翻訳日:2024-06-06 23:39:37 公開日:2024-06-03
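
The abstract above mentions ranking candidates hierarchically with a Pareto-based non-dominated sorting algorithm. As a rough illustration of that general technique (not the authors' implementation), the sketch below sorts candidates into successive Pareto fronts, assuming every objective is to be maximized. The two-objective scores in the usage example are invented; an objective to be minimized, such as hemolysis, would need to be negated first.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(points):
    """Return a list of fronts; each front is a list of indices into points."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if no other remaining point dominates it.
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


if __name__ == "__main__":
    # Hypothetical (activity score, negated hemolysis score) pairs for five candidates.
    candidates = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9), (0.1, 0.1)]
    for rank, front in enumerate(non_dominated_sort(candidates)):
        print(rank, front)
```

This quadratic-per-front version favors clarity over speed; for large candidate pools a faster variant (for example the bookkeeping used in NSGA-II) would be preferable.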

# LOLA:コンテンツ実験のためのLLM支援オンライン学習アルゴリズム

LOLA: LLM-Assisted Online Learning Algorithm for Content Experiments ( http://arxiv.org/abs/2406.02611v1 )

ライセンス: Link先を確認

Zikun Ye, Hema Yoganarasimhan, Yufeng Zheng,

(参考訳) 急速に進化するデジタルコンテンツの世界では、メディア企業やニュース出版社は、ユーザーエンゲージメントを高めるための自動化された効率的な方法を必要としている。本稿では,LLM-Assisted Online Learning Algorithm (LOLA)を紹介し,Large Language Models (LLM) と適応実験を統合し,コンテンツ配信を最適化する新しいフレームワークを提案する。記事の内容に関連付けられた様々な見出しのパフォーマンスを評価するための17,681の見出しA/Bテストを含む、Upworthyから大規模データセットを活用することで、まず、プロンプトベースのメソッド、埋め込みベースの分類モデル、微調整されたオープンソースLCMの3つの幅広い純粋なLLMアプローチを調査する。以上の結果から,プロンプトベースアプローチの精度は65%に満たないことが示唆された。対照的に、OpenAI埋め込みベースの分類モデルと微調整のLlama-3-8bモデルは82～84%の精度を実現しているが、十分なトラフィックでの実験性能には達していない。次に,最適純粋LLM手法とアッパー信頼境界アルゴリズムを組み合わせたLOLAを導入し,トラフィックを適応的に割り当て,クリックを最大化する。 Upworthy データの数値実験により,LOLA は標準的な A/B テスト法 (Upworthy の現在の状態 quo ) ,純バンドビットアルゴリズム,純粋LLM アプローチ,特に実験トラフィックの制限や多数のアームのシナリオにおいて,優れた性能を示した。当社のアプローチは,デジタル広告やソーシャルメディアレコメンデーションなどのユーザエンゲージメントを最適化する,さまざまなディジタルセッティングのコンテンツ実験にも適用可能です。

In the rapidly evolving digital content landscape, media firms and news publishers require automated and efficient methods to enhance user engagement. This paper introduces the LLM-Assisted Online Learning Algorithm (LOLA), a novel framework that integrates Large Language Models (LLMs) with adaptive experimentation to optimize content delivery. Leveraging a large-scale dataset from Upworthy, which includes 17,681 headline A/B tests aimed at evaluating the performance of various headlines associated with the same article content, we first investigate three broad pure-LLM approaches: prompt-based methods, embedding-based classification models, and fine-tuned open-source LLMs. Our findings indicate that prompt-based approaches perform poorly, achieving no more than 65% accuracy in identifying the catchier headline among two options. In contrast, OpenAI-embedding-based classification models and fine-tuned Llama-3-8b models achieve comparable accuracy, around 82-84%, though still falling short of the performance of experimentation with sufficient traffic. We then introduce LOLA, which combines the best pure-LLM approach with the Upper Confidence Bound algorithm to adaptively allocate traffic and maximize clicks. Our numerical experiments on Upworthy data show that LOLA outperforms the standard A/B testing method (the current status quo at Upworthy), pure bandit algorithms, and pure-LLM approaches, particularly in scenarios with limited experimental traffic or numerous arms. Our approach is both scalable and broadly applicable to content experiments across a variety of digital settings where firms seek to optimize user engagement, including digital advertising and social media recommendations.

翻訳日:2024-06-06 23:39:37 公開日:2024-06-03
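
The abstract above says LOLA combines an LLM-based predictor with the Upper Confidence Bound algorithm to allocate traffic adaptively. The sketch below shows a standard UCB1 selection step, plus one plausible way to seed it with predicted click-through rates as pseudo-observations. The seeding scheme, the `pseudo_impressions` parameter, and the function names are assumptions for illustration; the abstract does not specify how LOLA merges the two components.

```python
import math


def warm_start(predicted_ctr, pseudo_impressions):
    """Turn predicted click-through rates into pseudo click/impression counts (illustrative)."""
    clicks = [ctr * pseudo_impressions for ctr in predicted_ctr]
    impressions = [pseudo_impressions for _ in predicted_ctr]
    return clicks, impressions


def ucb_select(clicks, impressions, t, c=2.0):
    """Return the index of the arm (headline) with the highest upper confidence bound."""
    for i, n in enumerate(impressions):
        if n == 0:
            return i  # try every arm at least once
    scores = [
        clicks[i] / impressions[i] + math.sqrt(c * math.log(t) / impressions[i])
        for i in range(len(clicks))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Two hypothetical headlines with model-predicted CTRs of 2% and 5%.
    clicks, impressions = warm_start([0.02, 0.05], pseudo_impressions=100)
    arm = ucb_select(clicks, impressions, t=sum(impressions))
    print(arm)
    # After showing the chosen headline, update clicks[arm] and impressions[arm] and repeat.
```

The exploration bonus shrinks as an arm accumulates impressions, so traffic drifts toward the better headline while weak arms still receive occasional checks.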

# データ評価は学習可能か、解釈可能か?

Is Data Valuation Learnable and Interpretable? ( http://arxiv.org/abs/2406.02612v1 )

ライセンス: Link先を確認

Ou Wu, Weiyao Zhu, Mengyang Li,

(参考訳) 個々のサンプルの価値を測定することは、深層学習モデルのトレーニングなど、多くのデータ駆動タスクにおいて重要である。近年の文献では、データ評価手法の開発に多大な努力が注がれている。主要なデータ評価手法はゲーム理論のShapley値に基づいており、この経路に沿って様々な手法が提案されている。Shapley値に基づく評価には確かな理論的根拠があるが、完全に実験に基づくアプローチであり、これまでに評価モデルは構築されていない。さらに、現在のデータ評価手法は、データ価格設定などのアプリケーションにおいて解釈可能なデータ評価手法が非常に有用であるにもかかわらず、出力値の解釈可能性を無視している。この研究は、データ評価は学習可能か、解釈可能か、という重要な疑問に答えることを目的としている。学習された評価モデルには、パラメータ数が固定であることや知識の再利用可能性など、いくつかの望ましい利点がある。解釈可能なデータ評価モデルは、サンプルがなぜ価値があるのか、あるいは価値がないのかを説明することができる。この目的のために、2つの新しいデータ価値モデリングフレームワークを提案し、モデルトレーニングと解釈可能性のための具体的なベースモデルとして、多層パーセプトロン(MLP)と新しい回帰木をそれぞれ利用した。ベンチマークデータセット上で大規模な実験を行った。実験結果は、この疑問に対して肯定的な答えを与える。本研究は,データ価値の評価のための新たな技術的道筋を開く。大規模なデータ評価モデルは、さまざまなデータ駆動タスクにまたがって構築することができ、データ評価の広範な適用を促進することができる。

Measuring the value of individual samples is critical for many data-driven tasks, e.g., the training of a deep learning model. Recent literature has seen substantial efforts in developing data valuation methods. The primary data valuation methodology is based on the Shapley value from game theory, and various methods have been proposed along this path. Even though Shapley value-based valuation has a solid theoretical basis, it is an entirely experiment-based approach, and no valuation model has been constructed so far. In addition, current data valuation methods ignore the interpretability of the output values, even though an interpretable data valuation method would be of great help for applications such as data pricing. This study aims to answer an important question: is data valuation learnable and interpretable? A learned valuation model has several desirable merits, such as a fixed number of parameters and knowledge reusability. An interpretable data valuation model can explain why a sample is valuable or not valuable. To this end, two new data value modeling frameworks are proposed, in which a multi-layer perceptron (MLP) and a new regression tree are utilized as specific base models for model training and interpretability, respectively. Extensive experiments are conducted on benchmark datasets. The experimental results provide a positive answer to the question. Our study opens up a new technical path for the assessment of data values. Large data valuation models can be built across many different data-driven tasks, which can promote the widespread application of data valuation.

翻訳日:2024-06-06 23:39:37 公開日:2024-06-03
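
The abstract above refers to Shapley value-based data valuation as the dominant, experiment-driven baseline. For readers unfamiliar with it, the sketch below computes exact Shapley values for a tiny set of samples by averaging marginal contributions over all orderings. The toy utility function stands in for "model performance when trained on this subset" and is purely hypothetical; real data valuation would retrain or approximate a model for each subset, which is why the exact method does not scale.

```python
from itertools import permutations


def shapley_values(players, utility):
    """Exact Shapley values: average marginal contribution over all orderings of players."""
    values = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for ordering in orderings:
        coalition = frozenset()
        previous = utility(coalition)
        for p in ordering:
            coalition = coalition | {p}
            current = utility(coalition)
            values[p] += current - previous
            previous = current
    return {p: v / len(orderings) for p, v in values.items()}


if __name__ == "__main__":
    # Hypothetical utility: sample "a" alone gives 0.6 accuracy, "b" and "c" add 0.1 each.
    gains = {"a": 0.6, "b": 0.1, "c": 0.1}

    def toy_utility(subset):
        return sum(gains[s] for s in subset)

    print(shapley_values(["a", "b", "c"], toy_utility))
```

The number of orderings grows factorially with the number of samples, which is one motivation for learning a valuation model instead of recomputing values experimentally.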

# ACCO: 分散LLMトレーニングにおけるコミュニケーションを保ちながら蓄積する

ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training ( http://arxiv.org/abs/2406.02613v1 )

ライセンス: Link先を確認

Adel Nabli, Louis Fournier, Pierre Erbacher, Louis Serrano, Eugene Belilovsky, Edouard Oyallon,

(参考訳) 大規模言語モデル(LLM)のトレーニングは、複数のGPUを使用してモデルレプリカの確率勾配を並列に計算する分散実装に大きく依存している。しかし、データ並列設定における勾配の同期は、分散ワーカーの数の増加に伴って通信オーバーヘッドを増大させ、並列化の効率向上を妨げる可能性がある。この課題に対処するために、フェデレートラーニングで使用される局所最適化手法など、ワーカー間通信を減らす最適化アルゴリズムが登場した。通信オーバヘッドの最小化には有効であるが、これらの手法は大きなメモリコストを伴い、スケーラビリティを妨げる。余分な運動量変数に加えて、複数のローカル最適化ステップの合間にしか通信が許されない場合、オプティマイザの状態をワーカ間でシャーディングできないためである。これに対して,LLMの分散トレーニングに適したメモリ効率の高い最適化アルゴリズムであるACcumulate while COmmunicate (ACCO)を提案する。 ACCOは、ワーカー間でオプティマイザステートをシャーディングし、勾配計算と通信をオーバーラップして通信コストを隠蔽し、異種ハードウェアに対応する。本手法は、勾配計算と通信の並列実行に固有の1ステップ遅延を緩和する新しい技術に基づいており、ウォームアップステップを不要とし、標準的な分散最適化のトレーニングダイナミクスと整合しつつ、ウォールクロック時間でより高速に収束する。我々は、いくつかのLLMトレーニングおよび微調整タスクにおけるACCOの有効性を実証する。

Training Large Language Models (LLMs) relies heavily on distributed implementations, employing multiple GPUs to compute stochastic gradients on model replicas in parallel. However, synchronizing gradients in data parallel settings induces a communication overhead that increases with the number of distributed workers, which can impede the efficiency gains of parallelization. To address this challenge, optimization algorithms that reduce inter-worker communication have emerged, such as local optimization methods used in Federated Learning. While effective in minimizing communication overhead, these methods incur significant memory costs, hindering scalability: in addition to extra momentum variables, if communications are only allowed between multiple local optimization steps, then the optimizer's states cannot be sharded among workers. In response, we propose ACcumulate while COmmunicate (ACCO), a memory-efficient optimization algorithm tailored for distributed training of LLMs. ACCO allows sharding optimizer states across workers, overlaps gradient computations and communications to conceal communication costs, and accommodates heterogeneous hardware. Our method relies on a novel technique to mitigate the one-step delay inherent in parallel execution of gradient computations and communications, eliminating the need for warmup steps and aligning with the training dynamics of standard distributed optimization while converging faster in terms of wall-clock time. We demonstrate the effectiveness of ACCO on several LLM training and fine-tuning tasks.

翻訳日:2024-06-06 23:29:51 公開日:2024-06-03

# 都市間Few-shot交通予測のための周波数強化事前学習

Frequency Enhanced Pre-training for Cross-city Few-shot Traffic Forecasting ( http://arxiv.org/abs/2406.02614v1 )

ライセンス: Link先を確認

Zhanyu Liu, Jianrong Ding, Guanjie Zheng,

(参考訳) インテリジェントトランスポーテーションシステム(ITS)の分野は、様々な下流アプリケーションを実現するために正確なトラフィック予測に依存している。しかし、発展途上の都市は、限られた資源と時代遅れのインフラのために、十分なトレーニング用交通データを収集する上で、しばしば課題に直面している。この障害を認識して、都市間数ショット予測という概念が実現可能なアプローチとして浮上した。従来の都市間数ショット予測手法では、都市間の周波数類似性は無視されていたが、都市間では周波数領域において交通データがより類似していることが観察された。この事実に基づき、我々は Frequency Enhanced Pre-training Framework for Cross-city Few-shot Forecasting (FEPCross)を提案する。 FEPCrossは事前学習段階と微調整段階を有する。事前学習段階において,時間・周波数領域の情報を取り込むクロスドメイン空間・時間エンコーダを提案し、再構成と対照学習の目的を含む自己教師ありタスクで訓練する。微調整の段階では、トレーニングサンプルを豊かにし、モーメンタム更新されたグラフ構造を維持するモジュールを設計し、これにより、数ショットのトレーニングデータに過度に適合するリスクを軽減する。実世界の交通データセット上で実施された実証的な評価は、FEPCrossの卓越した有効性を検証し、多様なカテゴリの既存アプローチを上回り、都市間数ショット予測の進展を促進する特性を示す。

The field of Intelligent Transportation Systems (ITS) relies on accurate traffic forecasting to enable various downstream applications. However, developing cities often face challenges in collecting sufficient training traffic data due to limited resources and outdated infrastructure. Recognizing this obstacle, the concept of cross-city few-shot forecasting has emerged as a viable approach. While previous cross-city few-shot forecasting methods ignore the frequency similarity between cities, we observe that traffic data are more similar between cities in the frequency domain. Based on this observation, we propose a Frequency Enhanced Pre-training Framework for Cross-city Few-shot Forecasting (FEPCross). FEPCross has a pre-training stage and a fine-tuning stage. In the pre-training stage, we propose a novel Cross-Domain Spatial-Temporal Encoder that incorporates information from the time and frequency domains, and we train it with self-supervised tasks encompassing reconstruction and contrastive objectives. In the fine-tuning stage, we design modules to enrich training samples and maintain a momentum-updated graph structure, thereby mitigating the risk of overfitting to the few-shot training data. Empirical evaluations performed on real-world traffic datasets validate the exceptional efficacy of FEPCross, outperforming existing approaches of diverse categories and demonstrating characteristics that foster the progress of cross-city few-shot forecasting.

翻訳日:2024-06-06 23:29:51 公開日:2024-06-03

# 非パラメトリックな形状に対する低次元化モデリングとグラフニューラルネットワークを結合したハイブリッド数値解法:構造力学問題への応用

A hybrid numerical methodology coupling Reduced Order Modeling and Graph Neural Networks for non-parametric geometries: applications to structural dynamics problems ( http://arxiv.org/abs/2406.02615v1 )

ライセンス: Link先を確認

Victor Matray, Faisal Amlani, Frédéric Feyel, David Néron,

(参考訳) 本研究は、複雑な物理系を支配する時間領域偏微分方程式(PDE)の数値解析を高速化するための新しいアプローチを導入する。この手法は、古典的な低次元化モデリング(ROM)フレームワークと最近導入されたグラフニューラルネットワーク(GNN)の組み合わせに基づいており、後者は数値離散化サイズの異なる非常に不均一なデータベース上で訓練される。提案手法は非パラメトリックな形状に特に適しており、最終的には多様な形状やトポロジーを扱えることが示されている。性能評価は、航空機の座席の設計およびそれに対応する衝撃に対する機械的応答に関する応用文脈で示され、計算負荷を低減し、非パラメトリックな形状を伴う問題に対する迅速な設計イテレーションを可能にすることが主な動機である。提案手法は, 有限要素に基づく数値シミュレーションを多数必要とする他の科学的・工学的な問題にもそのまま適用可能であり、妥当な精度を維持しつつ効率を大幅に向上させる可能性がある。

This work introduces a new approach for accelerating the numerical analysis of time-domain partial differential equations (PDEs) governing complex physical systems. The methodology is based on a combination of a classical reduced-order modeling (ROM) framework and recently-introduced Graph Neural Networks (GNNs), where the latter is trained on highly heterogeneous databases of varying numerical discretization sizes. The proposed techniques are shown to be particularly suitable for non-parametric geometries, ultimately enabling the treatment of a diverse range of geometries and topologies. Performance studies are presented in an application context related to the design of aircraft seats and their corresponding mechanical responses to shocks, where the main motivation is to reduce the computational burden and enable the rapid design iteration for such problems that entail non-parametric geometries. The methods proposed here are straightforwardly applicable to other scientific or engineering problems requiring a large number of finite element-based numerical simulations, with the potential to significantly enhance efficiency while maintaining reasonable accuracy.

翻訳日:2024-06-06 23:29:51 公開日:2024-06-03

# エッジコンピューティングにおける無線LLM推論のための適応層分割:モデルに基づく強化学習アプローチ

Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach ( http://arxiv.org/abs/2406.02616v1 )

ライセンス: Link先を確認

Yuxuan Chen, Rongpeng Li, Xiaoxue Yu, Zhifeng Zhao, Honggang Zhang,

(参考訳) エッジコンピューティング環境における大規模言語モデル(LLM)のデプロイの最適化は、プライバシと計算効率の向上に不可欠である。本研究は,エッジコンピューティングにおける効率的な無線LLM推論に向けて,主要なオープンソースLLMにおける分割点の影響を包括的に分析する。そこで本研究では,モデルベース強化学習(MBRL)からインスピレーションを得て,エッジとユーザ機器(UE)間の最適分割点を決定するフレームワークを提案する。報酬代理モデルを導入することで、頻繁な性能評価の計算コストを大幅に削減できる。大規模シミュレーションにより, この手法は, 異なるネットワーク条件下での推論性能と計算負荷のバランスを効果的に保ち, 分散環境におけるLLM配置の堅牢なソリューションを提供することを示した。

Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for enhancing privacy and computational efficiency. Toward efficient wireless LLM inference in edge computing, this study comprehensively analyzes the impact of different splitting points in mainstream open-source LLMs. On this basis, this study introduces a framework taking inspiration from model-based reinforcement learning (MBRL) to determine the optimal splitting point across the edge and user equipment (UE). By incorporating a reward surrogate model, our approach significantly reduces the computational cost of frequent performance evaluations. Extensive simulations demonstrate that this method effectively balances inference performance and computational load under varying network conditions, providing a robust solution for LLM deployment in decentralized settings.

翻訳日:2024-06-06 23:29:51 公開日:2024-06-03

# Immunocto:組織病理学のために自動生成された大規模免疫細胞データベース

Immunocto: a massive immune cell database auto-generated for histopathology ( http://arxiv.org/abs/2406.02618v1 )

ライセンス: Link先を確認

Mikaël Simard, Zhuoyan Shen, Maria A. Hawkins, Charles-Antoine Collins-Fekete,

(参考訳) 免疫療法などの新しいがん治療オプションの出現に伴い、腫瘍免疫マイクロ環境の研究は予後を知らせ、治療薬に対する反応を理解するために重要である。腫瘍免疫マイクロ環境を特徴付けるための重要なアプローチは、(1)ヘマトキシリンとエオシン(H&E)染色組織切片をデジタル化した高分解能光学顕微鏡画像と(2)自動免疫細胞検出および分類法を組み合わせることである。しかし、デジタル病理学における現在の個別免疫細胞分類モデルは、比較的性能が劣っている。これは主に、現在利用可能な個々の免疫細胞のデータセットの規模が限られているためであり、これは、デジタル化されたH&E全スライド画像に免疫細胞を手動で注釈付けするという、時間がかかり困難な問題の結果である。そこで本研究では,CD4+ T細胞リンパ球,CD8+ T細胞リンパ球,B細胞リンパ球,マクロファージの4つのサブタイプにまたがる2,282,818個の免疫細胞を含む,6,848,454個のヒト細胞の自動生成データベースであるImmunoctoを紹介する。それぞれの細胞に対して、40×倍率の64×64ピクセルのH&E画像と、核のバイナリマスクおよびラベルを提供する。 Immunoctoを作成するために、オープンソースのモデルとデータを組み合わせて、輪郭とラベルの大部分を自動生成した。これらの細胞は、Orionプラットフォームから得られたH&Eと免疫蛍光が対応づけられた大腸癌データセットから取得され、輪郭はSegment Anything Modelを用いて取得される。 ImmunoctoのH&E画像に基づいて訓練された分類器は、4つの免疫細胞サブタイプとその他の細胞を区別するタスクで平均F1スコア0.74を達成する。 Immunocto は https://zenodo.org/uploads/11073373 でダウンロードできる。

With the advent of novel cancer treatment options such as immunotherapy, studying the tumour immune micro-environment is crucial to inform on prognosis and understand response to therapeutic agents. A key approach to characterising the tumour immune micro-environment may be through combining (1) digitised microscopic high-resolution optical images of hematoxylin and eosin (H&E) stained tissue sections obtained in routine histopathology examinations with (2) automated immune cell detection and classification methods. However, current individual immune cell classification models for digital pathology present relatively poor performance. This is mainly due to the limited size of currently available datasets of individual immune cells, a consequence of the time-consuming and difficult problem of manually annotating immune cells on digitised H&E whole slide images. In that context, we introduce Immunocto, a massive, multi-million automatically generated database of 6,848,454 human cells, including 2,282,818 immune cells distributed across 4 subtypes: CD4+ T cell lymphocytes, CD8+ T cell lymphocytes, B cell lymphocytes, and macrophages. For each cell, we provide a 64×64 pixel H&E image at 40× magnification, along with a binary mask of the nucleus and a label. To create Immunocto, we combined open-source models and data to automatically generate the majority of contours and labels. The cells are obtained from a matched H&E and immunofluorescence colorectal dataset from the Orion platform, while contours are obtained using the Segment Anything Model. A classifier trained on H&E images from Immunocto produces an average F1 score of 0.74 to differentiate the 4 immune cell subtypes and other cells. Immunocto can be downloaded at: https://zenodo.org/uploads/11073373.

翻訳日:2024-06-06 23:29:51 公開日:2024-06-03

# 暗号的Transformer回路による言語モデルにおける誘発不可能なバックドア

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits ( http://arxiv.org/abs/2406.02619v1 )

ライセンス: Link先を確認

Andis Draguns, Andrew Gritsevskiy, Sumeet Ramesh Motwani, Charlie Rogers-Smith, Jeffrey Ladish, Christian Schroeder de Witt,

(参考訳) オープンソース言語モデルの急速な普及は、下流のバックドア攻撃のリスクを著しく高める。これらのバックドアは、モデル展開中に危険な振る舞いを導入し、従来のサイバーセキュリティ監視システムによる検出を回避することができる。本稿では,従来の技術とは対照的に誘発不可能(unelicitable)な、自己回帰型トランスフォーマーモデルにおけるバックドアの新たなクラスを紹介する。誘発不可能性は、ディフェンダーがバックドアをトリガーすることを防ぐため、完全なホワイトボックスアクセスが与えられ、レッドチーミングや特定の形式検証手法のような自動化技術を使用したとしても、デプロイ前の評価や検出が不可能になる。我々は, 提案する構成が暗号技術の利用により誘発不可能であるだけでなく, 良好な堅牢性を有することを示す。これらの特性を実証的な調査で確認し、我々のバックドアが最先端の緩和戦略に耐えられるという証拠を示す。さらに、我々の普遍的なバックドアは、ホワイトボックス設定で完全に検出不可能ではないものの、既存のいくつかの設計よりも検出が難しい場合があることを示して、これまでの研究を拡張する。本稿では, トランスフォーマーモデルへのバックドアのシームレスな統合の実現可能性を示すことによって, デプロイ前検出戦略の有効性に根本的な疑問を投げかける。これにより、AIの安全性とセキュリティにおける攻撃と防御のバランスに関する新たな洞察が得られる。

The rapid proliferation of open-source language models significantly increases the risks of downstream backdoor attacks. These backdoors can introduce dangerous behaviours during model deployment and can evade detection by conventional cybersecurity monitoring systems. In this paper, we introduce a novel class of backdoors in autoregressive transformer models, that, in contrast to prior art, are unelicitable in nature. Unelicitability prevents the defender from triggering the backdoor, making it impossible to evaluate or detect ahead of deployment even if given full white-box access and using automated techniques, such as red-teaming or certain formal verification methods. We show that our novel construction is not only unelicitable thanks to using cryptographic techniques, but also has favourable robustness properties. We confirm these properties in empirical investigations, and provide evidence that our backdoors can withstand state-of-the-art mitigation strategies. Additionally, we expand on previous work by showing that our universal backdoors, while not completely undetectable in white-box settings, can be harder to detect than some existing designs. By demonstrating the feasibility of seamlessly integrating backdoors into transformer models, this paper fundamentally questions the efficacy of pre-deployment detection strategies. This offers new insights into the offence-defence balance in AI safety and security.

翻訳日:2024-06-06 23:29:51 公開日:2024-06-03

# 大規模言語モデルの保護: 調査

Safeguarding Large Language Models: A Survey ( http://arxiv.org/abs/2406.02622v1 )

ライセンス: Link先を確認

Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, Changshun Wu, Gaojie Jin, Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang,

(参考訳) 大規模言語モデル (LLMs) の急成長する分野において、堅牢な安全メカニズムを開発する「安全ガード (safeguards)」あるいは「ガードレール (guardrails)」は、指定された境界内でのLLMの倫理的使用を保証するために必須となっている。本稿は、この重要なメカニズムの現状について、体系的な文献レビューを提供する。その主な課題と、様々な文脈における倫理的問題を扱う包括的なメカニズムにどのように拡張できるかを論じる。まず、主要なLCMサービスプロバイダとオープンソースコミュニティが採用している保護メカニズムの現在の状況を明らかにする。続いて、幻覚、公正性、プライバシーなど、ガードレールが強制したいと思われるいくつかの(望ましくない)プロパティを評価し、分析し、拡張するテクニックが続く。これらに基づいて、これらの制御(すなわち攻撃)を回避し、攻撃を防御し、ガードレールを補強する手法をレビューする。上記の技術は現状や研究動向を反映しているが,本手法では容易に対処できないいくつかの課題についても論じるとともに,多分野的アプローチ,ニューラルシンボリック手法,システム開発ライフサイクルの完全な検討を通じて,包括的ガードレールの実装方法に関するビジョンを提示する。

In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review on the current status of this critical mechanism. It discusses its major challenges and how it can be enhanced into a comprehensive mechanism dealing with ethical issues in various contexts. First, the paper elucidates the current landscape of safeguarding mechanisms that major LLM service providers and the open-source community employ. This is followed by the techniques to evaluate, analyze, and enhance some (un)desirable properties that a guardrail might want to enforce, such as hallucinations, fairness, and privacy. Building on these, we review techniques to circumvent these controls (i.e., attacks), to defend against the attacks, and to reinforce the guardrails. While the techniques mentioned above represent the current status and the active research trends, we also discuss several challenges that cannot be easily dealt with by these methods and present our vision on how to implement a comprehensive guardrail through full consideration of a multi-disciplinary approach, neural-symbolic methods, and the systems development lifecycle.

翻訳日:2024-06-06 23:29:51 公開日:2024-06-03

# さらに一歩先へ:Linuxカーネルのエクスプロイトにおけるページスプレーの理解

Take a Step Further: Understanding Page Spray in Linux Kernel Exploitation ( http://arxiv.org/abs/2406.02624v1 )

ライセンス: Link先を確認

Ziyi Guo, Dang K Le, Zhenpeng Lin, Kyle Zeng, Ruoyu Wang, Tiffany Bao, Yan Shoshitaishvili, Adam Doupé, Xinyu Xing,

(参考訳) 近年,カーネル脆弱性に対するページレベルのエクスプロイトに着目したPage Sprayと呼ばれる新しい手法が登場している。エクスプロイト可能性、安定性、互換性の面では利点があるが、Page Sprayに関する包括的な研究は依然として乏しい。その根本原因、エクスプロイトモデル、他のエクスプロイト技術に対する相対的な利点、および潜在的緩和戦略に関する疑問は、ほとんど答えられていない。本稿では,本手法の詳細な理解を提供するため,Page Sprayの系統的な検討を行う。我々は、\sysモデルと呼ばれる包括的なエクスプロイトモデルを導入し、その基本原理を解明する。さらに、Linuxカーネル内でのPage Spray発生の原因となる根本原因を徹底的に分析する。我々は,Page Spray解析モデルに基づく解析器を設計し,Page Sprayの呼び出し箇所を同定する。次に, 綿密に設計した実験により, Page Sprayの安定性, エクスプロイト可能性, 互換性を評価する。最後に,Page Sprayに対処するための緩和原則を提案し,独自の軽量な緩和アプローチを導入する。この研究は、セキュリティ研究者や開発者がPage Sprayに関する洞察を得るのを支援することを目的としており、最終的に、この新たなエクスプロイト技術に対する我々の集団的理解を高め、コミュニティの改善につなげる。

Recently, a novel method known as Page Spray has emerged, focusing on page-level exploitation for kernel vulnerabilities. Despite the advantages it offers in terms of exploitability, stability, and compatibility, comprehensive research on Page Spray remains scarce. Questions regarding its root causes, exploitation model, comparative benefits over other exploitation techniques, and possible mitigation strategies have largely remained unanswered. In this paper, we conduct a systematic investigation into Page Spray, providing an in-depth understanding of this exploitation technique. We introduce a comprehensive exploit model termed the \sys model, elucidating its fundamental principles. Additionally, we conduct a thorough analysis of the root causes underlying Page Spray occurrences within the Linux kernel. We design an analyzer based on the Page Spray analysis model to identify Page Spray callsites. Subsequently, we evaluate the stability, exploitability, and compatibility of Page Spray through meticulously designed experiments. Finally, we propose mitigation principles for addressing Page Spray and introduce our own lightweight mitigation approach. This research aims to assist security researchers and developers in gaining insights into Page Spray, ultimately enhancing our collective understanding of this emerging exploitation technique and making improvements to the community.

翻訳日:2024-06-06 23:29:51 公開日:2024-06-03

# プログレッシブ推論:中間予測を用いたデコーダオンリーシーケンス分類モデルの説明

Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions ( http://arxiv.org/abs/2406.02625v1 )

ライセンス: Link先を確認

Sanjay Kariyappa, Freddy Lécué, Saumitra Mishra, Christopher Pond, Daniele Magazzeni, Manuela Veloso,

(参考訳) 本稿では、デコーダのみのシーケンス分類モデルの予測を説明するために、入力属性を計算するためのフレームワークであるプログレッシブ推論を提案する。本研究は、デコーダのみのトランスフォーマーモデルの分類ヘッドを用いて、入力シーケンスの異なる点で評価することで中間予測を行うことができるという知見に基づいている。因果的注意機構のため、これらの中間予測は推論点の前のトークンにのみ依存し、マスク付き入力サブシーケンス上でモデルの予測を得ることができ、計算上のオーバーヘッドは無視できる。この知見を用いてサブシーケンスレベルの属性を提供する2つの方法を開発した。まず,連続する中間予測の差を捉えて属性を計算するシングルパスプログレッシブ推論(Single Pass-Progressive Inference,SP-PI)を提案する。次に、Kernel SHAPとの接続を利用して、MP-PI(Multiple Pass-Progressive Inference)を開発する。 MP-PIは、複数のマスク付きバージョンの入力から中間予測を使用して、より高い品質の属性を計算する。テキスト分類タスクを訓練した多種多様なモデルについて検討したところ,SP-PIとMP-PIは,従来の作業に比べて有意に優れた属性を提供することがわかった。

This paper proposes Progressive Inference - a framework to compute input attributions to explain the predictions of decoder-only sequence classification models. Our work is based on the insight that the classification head of a decoder-only Transformer model can be used to make intermediate predictions by evaluating it at different points in the input sequence. Due to the causal attention mechanism, these intermediate predictions only depend on the tokens seen before the inference point, allowing us to obtain the model's prediction on a masked input sub-sequence with negligible computational overhead. We develop two methods to provide sub-sequence level attributions using this insight. First, we propose Single Pass-Progressive Inference (SP-PI), which computes attributions by taking the difference between consecutive intermediate predictions. Second, we exploit a connection with Kernel SHAP to develop Multi Pass-Progressive Inference (MP-PI). MP-PI uses intermediate predictions from multiple masked versions of the input to compute higher quality attributions. Our studies on a diverse set of models trained on text classification tasks show that SP-PI and MP-PI provide significantly better attributions compared to prior work.

翻訳日:2024-06-06 23:29:51 公開日:2024-06-03
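
The abstract above states that SP-PI computes attributions as the difference between consecutive intermediate predictions. The sketch below shows only that differencing step, assuming the intermediate class probabilities have already been read off the classification head at each position. The `baseline` argument (the prediction assumed before any token is seen) and the function name are illustrative assumptions, not details taken from the paper.

```python
def sp_pi_attributions(intermediate_preds, baseline=0.0):
    """Attribute to each position the change in prediction it causes (consecutive differences)."""
    attributions = []
    previous = baseline
    for pred in intermediate_preds:
        attributions.append(pred - previous)
        previous = pred
    return attributions


if __name__ == "__main__":
    # Hypothetical probabilities of the positive class after each of four input segments.
    preds = [0.5, 0.6, 0.9, 0.85]
    attrs = sp_pi_attributions(preds, baseline=0.5)
    for position, a in enumerate(attrs):
        print(position, round(a, 3))
    # The attributions telescope: baseline + sum(attrs) equals the final prediction.
```

Because the differences telescope, the attributions always add up to the gap between the baseline and the final prediction, which gives a simple sanity check on any implementation.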

# ディープラーニングを用いたMRI再構成のための最適化アルゴリズムの概要

A Brief Overview of Optimization-Based Algorithms for MRI Reconstruction Using Deep Learning ( http://arxiv.org/abs/2406.02626v1 )

ライセンス: Link先を確認

Wanyu Bian,

(参考訳) 磁気共鳴イメージング(MRI)はその例外的な軟組織コントラストと高い空間分解能で知られており、医用画像において重要なツールである。ディープラーニングアルゴリズムの統合は、MRI再構成プロセスを最適化する大きな可能性を秘めている。この領域における研究の活発化にもかかわらず、MRI再構成に適した最適化に基づくディープラーニングモデルに関する総合的な調査はまだ行われていない。本稿では,MRI再構成に特化して設計されたディープラーニングにおいて,最新の最適化アルゴリズムを徹底的に検討することにより,このギャップに対処する。本研究の目的は、MRIコミュニティ内でのさらなるイノベーションと応用を促進するために、これらの進歩を研究者に詳細に理解することである。

Magnetic resonance imaging (MRI) is renowned for its exceptional soft tissue contrast and high spatial resolution, making it a pivotal tool in medical imaging. The integration of deep learning algorithms offers significant potential for optimizing MRI reconstruction processes. Despite the growing body of research in this area, a comprehensive survey of optimization-based deep learning models tailored for MRI reconstruction has yet to be conducted. This review addresses this gap by presenting a thorough examination of the latest optimization-based algorithms in deep learning specifically designed for MRI reconstruction. The goal of this paper is to provide researchers with a detailed understanding of these advancements, facilitating further innovation and application within the MRI community.

翻訳日:2024-06-06 23:29:51 公開日:2024-06-03

# 平均アンサンブルを超える - サブシーズン予測のための気候モデルアンサンブルの活用

Beyond Ensemble Averages: Leveraging Climate Model Ensembles for Subseasonal Forecasting ( http://arxiv.org/abs/2211.15856v4 )

ライセンス: Link先を確認

Elena Orlova, Haokun Liu, Raphael Rossellini, Benjamin A. Cash, Rebecca Willett,

(参考訳) 気温や降水量などの重要な気候変数について、サブシーズン(季節内)の時間スケールで高品質な予測を行うことは、現業予報における長年の課題であった。本研究では、機械学習(ML)モデルをサブシーズン予測の後処理ツールとして応用することを検討する。米国本土における月平均降水量と地上2m気温を2週間前に予測するために、ラグ付き数値アンサンブル予測(すなわち、メンバーごとに初期化日が異なるアンサンブル)と観測データ(相対湿度、海面気圧、ジオポテンシャル高度など)を様々なML手法に組み込む。回帰、分位点回帰、三分位(tercile)分類の各タスクについて、線形モデル、ランダムフォレスト、畳み込みニューラルネットワーク、スタッキングモデル(個々のMLモデルの予測に基づくマルチモデルアプローチ)を検討する。アンサンブル平均のみを用いることが多い従来のMLアプローチとは異なり、アンサンブル予測に含まれる情報を活用して予測精度を向上させる。さらに、計画や被害軽減に不可欠な極端現象の予測についても検討する。アンサンブルメンバーを空間予測の集合とみなし、空間情報を利用する様々なアプローチを探求する。異なるアプローチ間のトレードオフは、モデルのスタッキングによって緩和できる可能性がある。提案モデルは、気候値予報やアンサンブル平均などの標準的なベースラインを上回る。加えて、特徴量の重要度、全アンサンブルを用いる場合とアンサンブル平均のみを用いる場合のトレードオフ、および空間的変動を考慮する様々な方法について検討する。

Producing high-quality forecasts of key climate variables, such as temperature and precipitation, on subseasonal time scales has long been a gap in operational forecasting. This study explores an application of machine learning (ML) models as post-processing tools for subseasonal forecasting. Lagged numerical ensemble forecasts (i.e., an ensemble where the members have different initialization dates) and observational data, including relative humidity, pressure at sea level, and geopotential height, are incorporated into various ML methods to predict monthly average precipitation and two-meter temperature two weeks in advance for the continental United States. For regression, quantile regression, and tercile classification tasks, we consider using linear models, random forests, convolutional neural networks, and stacked models (a multi-model approach based on the prediction of the individual ML models). Unlike previous ML approaches that often use ensemble mean alone, we leverage information embedded in the ensemble forecasts to enhance prediction accuracy. Additionally, we investigate extreme event predictions that are crucial for planning and mitigation efforts. Considering ensemble members as a collection of spatial forecasts, we explore different approaches to using spatial information. Trade-offs between different approaches may be mitigated with model stacking. Our proposed models outperform standard baselines such as climatological forecasts and ensemble means. In addition, we investigate feature importance, trade-offs between using the full ensemble or only the ensemble mean, and different modes of accounting for spatial variability.

翻訳日:2024-06-06 16:52:40 公開日:2024-06-03

# 情報理論を用いた目的関数の選択法

How to select an objective function using information theory ( http://arxiv.org/abs/2212.06566v4 )

ライセンス: Link先を確認

Timothy O. Hodson, Thomas M. Over, Tyler J. Smith, Lucy M. Marshall,

(参考訳) 機械学習や科学計算では、モデルの性能は目的関数で測定される。しかし、なぜある目的関数を別のものより選ぶのか? 情報理論は1つの答えを与える: モデルに含まれる情報を最大化するには、誤差を最少のビット数で表現する目的関数を選べばよい。異なる目的関数を評価するには、それらを尤度関数に変換する。尤度として見たとき、それらの相対的な大きさは、ある目的関数を他よりどれだけ強く選好すべきかを表し、その比の対数はビット長の差、ならびに不確実性の差を表す。言い換えれば、不確実性を最小化する目的関数を選ぶべきである。情報理論のパラダイムの下では、最終的な目的は、特定の効用ではなく、情報の最大化(および不確実性の最小化)である。このパラダイムは、気候変動の影響を理解するために使用される大規模な地球システムモデルのように、多くの用途を持ち、明確な効用を持たないモデルに適している、と我々は主張する。

In machine learning or scientific computing, model performance is measured with an objective function. But why choose one objective over another? Information theory gives one answer: To maximize the information in the model, select the objective function that represents the error in the fewest bits. To evaluate different objectives, transform them into likelihood functions. As likelihoods, their relative magnitude represents how strongly we should prefer one objective versus another, and the log of that relation represents the difference in their bit-length, as well as the difference in their uncertainty. In other words, prefer whichever objective minimizes the uncertainty. Under the information-theoretic paradigm, the ultimate objective is to maximize information (and minimize uncertainty), as opposed to any specific utility. We argue that this paradigm is well-suited to models that have many uses and no definite utility, like the large Earth system models used to understand the effects of climate change.
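
As a rough illustration of turning objectives into likelihoods and comparing them in bits, the sketch below pairs mean squared error with a zero-mean Gaussian likelihood and mean absolute error with a zero-mean Laplace likelihood, each evaluated at its maximum-likelihood scale. The pairing, the function names, and the sample errors are assumptions chosen for illustration, not the authors' code. The values are differential code lengths, so only their difference is meaningful.

```python
import math
from typing import Sequence


def gaussian_bits(errors: Sequence[float]) -> float:
    """Negative log-likelihood (bits) under a zero-mean Gaussian.

    The variance is set to its maximum-likelihood value (the mean squared
    error), so this is the likelihood associated with an MSE objective.
    """
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    nats = 0.5 * n * math.log(2.0 * math.pi * math.e * mse)
    return nats / math.log(2.0)


def laplace_bits(errors: Sequence[float]) -> float:
    """Negative log-likelihood (bits) under a zero-mean Laplace distribution.

    The scale is set to its maximum-likelihood value (the mean absolute
    error), so this is the likelihood associated with an MAE objective.
    """
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    nats = n * math.log(2.0 * math.e * mae)
    return nats / math.log(2.0)


def preferred_objective(errors: Sequence[float]) -> str:
    """Pick whichever objective encodes the errors in fewer bits."""
    return "MSE" if gaussian_bits(errors) <= laplace_bits(errors) else "MAE"


# Hypothetical residuals: one set with a large outlier, one without.
heavy_tailed = [0.1, -0.2, 0.05, 0.1, -0.1, 5.0]
balanced = [-1.0, 1.0, -1.0, 1.0]

for name, errs in [("heavy_tailed", heavy_tailed), ("balanced", balanced)]:
    diff = gaussian_bits(errs) - laplace_bits(errs)
    print(f"{name}: prefer {preferred_objective(errs)} (Gaussian - Laplace = {diff:+.2f} bits)")
```

The printed difference is the log of the likelihood ratio expressed in bits, which is the quantity the abstract describes as the difference in bit-length between two objectives.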

翻訳日:2024-06-06 16:52:40 公開日:2024-06-03

# OpenAPI Specification Extended Security Scheme:Broken Object Level Authorizationの頻度を下げる方法

OpenAPI Specification Extended Security Scheme: A method to reduce the prevalence of Broken Object Level Authorization ( http://arxiv.org/abs/2212.06606v3 )

ライセンス: Link先を確認

Rami Haddad, Rim El Malki, Daniel Cozma,

(参考訳) APIは、サービス間通信を実現するための主要な技術となっている。 APIデプロイメントの増加により、セキュリティ標準の欠如に対処する緊急性が高まっている。 OpenAPI標準には標準化された認可の仕組みがないため、APIセキュリティは懸念事項となっている。不適切な認可は既知および未知の脆弱性を生む可能性があり、近年では悪意ある攻撃者に悪用されてデータ損失を招いている。本稿では、APIセキュリティにおける第1位の脆弱性であるBroken Object Level Authorization (BOLA) について検討し、この脆弱性の頻度を下げるための方法とツールを提案する。 BOLAはさまざまなAPIフレームワークに影響するが、本稿の対象はOpenAPI Specification(OAS)に限定する。 OASはAPIの記述と実装のための標準であり、一般的なOAS実装にはFastAPI、Connexion(Flask)などがある。これらの実装は、OASがAPIのプロパティについて持つ知識に伴う長所と短所を受け継いでいる。 OASのセキュリティプロパティはオブジェクトの認可を扱っておらず、そのようなオブジェクトプロパティを定義するための標準化されたアプローチも提供していない。このため、オブジェクトレベルのセキュリティは開発者任せとなり、意図せず攻撃ベクタを作り出してしまうリスクが高まる。我々の目標は、次の2つを導入してこの空白を埋めることである。 1) OAS内のオブジェクトに対する宣言的なセキュリティ制御を含むOAS ESS(OpenAPI Specification Extended Security Scheme)(設計ベースのアプローチ)。 2) APIサービス(Flask/FastAPI)にインポートして、オブジェクトレベルで認可チェックを実施できる認可モジュール(開発ベースのアプローチ)。 APIサービスを構築する場合、開発者はAPI設計(仕様)またはそのコードから始めることができる。どちらの場合も、開発者がBOLAを緩和し、その頻度を下げるのに役立つ一連のメカニズムを導入する。

APIs have become the prominent technology of choice for achieving inter-service communications. The growth of API deployments has driven the urgency in addressing their lack of security standards. API security is a topic of concern given the absence of standardized authorization in the OpenAPI standard: improper authorization opens the possibility for known and unknown vulnerabilities, which in the past years have been exploited by malicious actors, resulting in data loss. This paper examines the number one vulnerability in API security, Broken Object Level Authorization (BOLA), and proposes methods and tools to reduce the prevalence of this vulnerability. BOLA affects various API frameworks; our scope is limited to the OpenAPI Specification (OAS). The OAS is a standard for describing and implementing APIs; popular OAS implementations include FastAPI and Connexion (Flask), among many others. These implementations carry the pros and cons associated with the OAS's knowledge of API properties. The OAS security properties do not address object authorization and provide no standardized approach to defining such object properties. This leaves object-level security at the mercy of developers, which presents an increased risk of unintentionally creating attack vectors. Our aim is to tackle this void by introducing 1) the OAS ESS (OpenAPI Specification Extended Security Scheme), which includes declarative security controls for objects in OAS (design-based approach), and 2) an authorization module that can be imported into API services (Flask/FastAPI) to enforce authorization checks at the object level (development-based approach). When building an API service, a developer can start with the API design (specification) or its code. In both cases, a set of mechanisms is introduced to help developers mitigate and reduce the prevalence of BOLA.
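
To make the notion of an object-level authorization check concrete, here is a minimal framework-free sketch. It does not reproduce the paper's OAS ESS syntax or its authorization module; the `Invoice` type, the in-memory `INVOICES` store, and `get_invoice` are hypothetical names used only to show where such a check sits.

```python
from dataclasses import dataclass
from typing import Dict


class Forbidden(Exception):
    """Raised when the caller is not authorized for the requested object."""


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: str
    amount: float


# Hypothetical in-memory store standing in for a database.
INVOICES: Dict[int, Invoice] = {
    1: Invoice(1, "alice", 120.0),
    2: Invoice(2, "bob", 75.5),
}


def get_invoice(caller_id: str, invoice_id: int) -> Invoice:
    """Return the invoice only if the authenticated caller owns it."""
    invoice = INVOICES[invoice_id]
    # Object-level check: being authenticated is not enough. The caller
    # must also be authorized for this particular object. Omitting this
    # comparison is what makes an endpoint vulnerable to BOLA.
    if invoice.owner_id != caller_id:
        raise Forbidden(f"{caller_id} may not read invoice {invoice_id}")
    return invoice


print(get_invoice("alice", 1).amount)
try:
    get_invoice("alice", 2)  # alice guesses another user's object ID
except Forbidden as exc:
    print("denied:", exc)
```

In this sketch the ownership rule is hard-coded in the handler; the abstract's design-based approach would instead declare such rules in the specification so they do not depend on each developer remembering to write the check.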

翻訳日:2024-06-06 16:52:40 公開日:2024-06-03

# テキスト・ツー・イメージ・ジェネレータを用いたインターベンショナルデータ拡張に向けて

Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators ( http://arxiv.org/abs/2212.11237v4 )

ライセンス: Link先を確認

Jianhao Yuan, Francesco Pinto, Adam Davies, Philip Torr,

(参考訳) ニューラルイメージ分類器は、トレーニングデータと異なる環境条件からサンプリングされた入力に曝されると、深刻な性能劣化が起こることが知られている。近年のテキスト・トゥ・イメージ・ジェネレーション(T2I)の進展を考えると、近年のT2Iジェネレータは、トレーニングデータを強化し、下流分類器の堅牢性を向上させるために、こうした環境要因に対する任意の介入をシミュレートするためにどのように使用できるのかという疑問がある。我々は、単一ドメイン一般化(SDG)におけるベンチマークの多種多様なコレクションを実験し、介入プロンプト戦略、条件付け機構、ポストホックフィルタリングを含む、T2I生成の重要な次元にまたがるスプリアス特徴(RRSF)への依存を減らした。我々の広範な実証実験により、Stable Diffusionのような現代のT2Iジェネレータは、それぞれの寸法がどう構成されているかに関わらず、従来の最先端のデータ拡張技術よりも優れた、強力な介入データ拡張メカニズムとして実際に使用できることが示された。

Neural image classifiers are known to undergo severe performance degradation when exposed to inputs that are sampled from environmental conditions that differ from their training data. Given the recent progress in Text-to-Image (T2I) generation, a natural question is how modern T2I generators can be used to simulate arbitrary interventions over such environmental factors in order to augment training data and improve the robustness of downstream classifiers. We experiment across a diverse collection of benchmarks in single domain generalization (SDG) and reducing reliance on spurious features (RRSF), ablating across key dimensions of T2I generation, including interventional prompting strategies, conditioning mechanisms, and post-hoc filtering. Our extensive empirical findings demonstrate that modern T2I generators like Stable Diffusion can indeed be used as a powerful interventional data augmentation mechanism, outperforming previously state-of-the-art data augmentation techniques regardless of how each dimension is configured.

翻訳日:2024-06-06 14:46:08 公開日:2024-06-03

# 精密健康におけるクラウドソーシングとヒューマン・イン・ザ・ループワークフローの展望

A Perspective on Crowdsourcing and Human-in-the-Loop Workflows in Precision Health ( http://arxiv.org/abs/2303.03578v2 )

ライセンス: Link先を確認

Peter Washington,

(参考訳) 現代の機械学習アプローチは、様々な健康状態に対するパフォーマンス診断モデルにつながっている。決定木やディープニューラルネットワークなど、いくつかの機械学習アプローチは、原則として、任意の関数を近似することができる。しかし、入力データが不均一で高次元であり、出力クラスが非常に非線形である場合に、過度に適合する傾向が拡大されるため、このパワーはギフトと呪いの両方と見なすことができる。この問題は、特に主観的基準で診断される行動や精神状態を予測する診断システムに悩まされる可能性がある。この問題に対する新たな解決策はクラウドソーシング(クラウドソーシング)であり、クラウドワーカーは金銭的補償やゲーミフィケーション体験の見返りに複雑な行動特徴に注釈を付けるために支払われる。これらのラベルは、直接または診断機械学習モデルへの入力としてラベルを使用することによって、診断を導出するために使用することができる。この視点では、この新興分野における既存の研究について述べ、新たな研究分野であるクラウドパワー診断システムにおける現在進行中の課題と機会について論じる。正しい考慮により、複雑でニュアンスのある健康状態の予測のために、人為的な機械学習ワークフローにクラウドソーシングを追加することで、スクリーニング、診断、最終的にケアへのアクセスを加速することができる。

Modern machine learning approaches have led to performant diagnostic models for a variety of health conditions. Several machine learning approaches, such as decision trees and deep neural networks, can, in principle, approximate any function. However, this power can be considered to be both a gift and a curse, as the propensity toward overfitting is magnified when the input data are heterogeneous and high dimensional and the output class is highly nonlinear. This issue can especially plague diagnostic systems that predict behavioral and psychiatric conditions that are diagnosed with subjective criteria. An emerging solution to this issue is crowdsourcing, where crowd workers are paid to annotate complex behavioral features in return for monetary compensation or a gamified experience. These labels can then be used to derive a diagnosis, either directly or by using the labels as inputs to a diagnostic machine learning model. This viewpoint describes existing work in this emerging field and discusses ongoing challenges and opportunities with crowd-powered diagnostic systems, a nascent field of study. With the correct considerations, the addition of crowdsourcing to human-in-the-loop machine learning workflows for the prediction of complex and nuanced health conditions can accelerate screening, diagnostics, and ultimately access to care.

翻訳日:2024-06-06 14:46:07 公開日:2024-06-03

# MAWSEO: 不正なオンラインプロモーションのための敵対的Wiki検索ポイズニング

MAWSEO: Adversarial Wiki Search Poisoning for Illicit Online Promotion ( http://arxiv.org/abs/2304.11300v3 )

ライセンス: Link先を確認

Zilong Lin, Zhengyi Li, Xiaojing Liao, XiaoFeng Wang, Xiaozhong Liu,

(参考訳) 荒らし編集の代表例であるWiki検索ポイズニング(Wiki search poisoning for illicit promotion)は、攻撃者がWiki記事を編集し、関連するクエリのWiki検索結果を通じて違法なビジネスを宣伝することを目的としたサイバー犯罪である。本稿では、このようなWiki上のステルス的なブラックハットSEOが自動化可能であることを初めて示す研究を報告する。我々の技術はMAWSEOと呼ばれ、敵対的な改訂を用いて、順位の引き上げ、荒らし検出の回避、トピックの関連性、意味的一貫性、プロモーションコンテンツに対する(警戒させない形での)ユーザの認知など、現実のサイバー犯罪の目的を達成する。評価とユーザスタディにより、MAWSEOは敵対的な荒らし編集を効果的かつ効率的に生成でき、それらは最先端の組み込みWiki荒らし検出器を回避し、Wikiユーザーに警戒されることなくプロモーションコンテンツを届けられることが示された。さらに、Wikiエコシステムにおける本攻撃に対する潜在的な防御策として、一貫性(コヒーレンス)に基づく検出や荒らし検出の敵対的訓練を検討した。

As a prominent instance of vandalism edits, Wiki search poisoning for illicit promotion is a cybercrime in which the adversary aims at editing Wiki articles to promote illicit businesses through Wiki search results of relevant queries. In this paper, we report a study that, for the first time, shows that such stealthy blackhat SEO on Wiki can be automated. Our technique, called MAWSEO, employs adversarial revisions to achieve real-world cybercriminal objectives, including rank boosting, vandalism detection evasion, topic relevancy, semantic consistency, user awareness (but not alarming) of promotional content, etc. Our evaluation and user study demonstrate that MAWSEO is capable of effectively and efficiently generating adversarial vandalism edits, which can bypass state-of-the-art built-in Wiki vandalism detectors, and also get promotional content through to Wiki users without triggering their alarms. In addition, we investigated potential defense, including coherence based detection and adversarial training of vandalism detection, against our attack in the Wiki ecosystem.

翻訳日:2024-06-06 14:36:23 公開日:2024-06-03

# SciMON:新規性に最適化された科学的インスピレーションマシン

SciMON: Scientific Inspiration Machines Optimized for Novelty ( http://arxiv.org/abs/2305.14259v7 )

ライセンス: Link先を確認

Qingyun Wang, Doug Downey, Heng Ji, Tom Hope,

(参考訳) 文献に根ざした新たな科学的方向性を生成するニューラル言語モデルの能力を探索し、強化する。文献に基づく仮説生成の研究は伝統的に二値のリンク予測に焦点を当てており、仮説の表現力を大きく制限してきた。また、この一連の研究は新規性の最適化にも焦点を当てていない。我々はこれと大きく異なる新しい設定を採用し、モデルは背景コンテキスト(例えば、問題、実験設定、目標)を入力とし、文献に根ざした自然言語のアイデアを出力する。本稿では、過去の科学論文から「インスピレーション」を検索して利用し、先行論文と反復的に比較して十分な新規性が達成されるまでアイデアの提案を更新することによって、新規性を明示的に最適化するモデリングフレームワークSciMONを提案する。包括的な評価の結果、GPT-4は全体的に技術的な深さと新規性の低いアイデアを生成する傾向にある一方、我々の手法はこの問題を部分的に緩和することがわかった。我々の研究は、科学文献から新しいアイデアを生み出す言語モデルの評価と開発に向けた第一歩である。

We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature. Work on literature-based hypothesis generation has traditionally focused on binary link prediction, which severely limits the expressivity of hypotheses. This line of work also does not focus on optimizing novelty. We take a dramatic departure with a novel setting in which models use background contexts (e.g., problems, experimental settings, goals) as input and output natural language ideas grounded in literature. We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers, and explicitly optimizes for novelty by iteratively comparing to prior papers and updating idea suggestions until sufficient novelty is achieved. Comprehensive evaluations reveal that GPT-4 tends to generate ideas with overall low technical depth and novelty, while our methods partially mitigate this issue. Our work represents a first step toward evaluating and developing language models that generate new ideas derived from the scientific literature.

翻訳日:2024-06-06 14:36:23 公開日:2024-06-03

# ロバストなデータ駆動型規範性最適化

Robust Data-driven Prescriptiveness Optimization ( http://arxiv.org/abs/2306.05937v2 )

ライセンス: Link先を確認

Mehran Poursoltani, Erick Delage, Angelos Georghiou,

(参考訳) データの豊富さは、利用可能なサイド情報を活用してより先見的な決定を下そうとする、さまざまな最適化手法の出現につながっている。手法や適用文脈が多岐にわたることから、規範性係数(coefficient of prescriptiveness)として知られる、単位を持たない普遍的な性能指標が考案された。この係数は、基準となる決定と比較した文脈的決定の質と、サイド情報の規範的パワーの両方を定量化するように設計されている。データ駆動の文脈で前者を最大化するポリシーを特定するために、本稿では、古典的な経験的リスク最小化の目的を規範性係数で置き換えた、分布的にロバストな文脈的最適化モデルを導入する。このモデルを解くための二分法アルゴリズムを提案する。このアルゴリズムは、分布の曖昧性集合が適切な入れ子形式と多面体構造を持つ場合、一連の線形計画問題を解くことに基づく。文脈的最短経路問題を対象に、アウト・オブ・サンプルのデータセットがさまざまな程度の分布シフトを受ける場合について、得られたポリシーのロバスト性を代替手法と比較して評価する。

The abundance of data has led to the emergence of a variety of optimization techniques that attempt to leverage available side information to provide more anticipative decisions. The wide range of methods and contexts of application have motivated the design of a universal unitless measure of performance known as the coefficient of prescriptiveness. This coefficient was designed to quantify both the quality of contextual decisions compared to a reference one and the prescriptive power of side information. To identify policies that maximize the former in a data-driven context, this paper introduces a distributionally robust contextual optimization model where the coefficient of prescriptiveness substitutes for the classical empirical risk minimization objective. We present a bisection algorithm to solve this model, which relies on solving a series of linear programs when the distributional ambiguity set has an appropriate nested form and polyhedral structure. Studying a contextual shortest path problem, we evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.

翻訳日:2024-06-06 14:26:34 公開日:2024-06-03

# CompanyKG: 企業類似性定量化のための大規模不均一グラフ

CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification ( http://arxiv.org/abs/2306.10649v3 )

ライセンス: Link先を確認

Lele Cao, Vilhelm von Ehrenheim, Mark Granroth-Wilding, Richard Anselmo Stahl, Andrew McCornack, Armin Catovic, Dhiana Deva Cavacanti Rocha,

(参考訳) 投資業界では、市場マッピング、競合分析、合併・買収など、さまざまな目的のために、きめ細かい企業類似度の定量化を行うことがしばしば不可欠である。我々は、多様な企業の特徴や関係を表現し学習するための知識グラフCompanyKGを提案し、公開する。具体的には、117万の企業が企業記述の埋め込みを付与されたノードとして表現され、15種類の企業間関係によって5106万の重み付きエッジが構成される。企業類似度定量化の手法を包括的に評価できるように、注釈付きテストセットを伴う3つの評価タスク(類似度予測、競合検索、類似度ランキング)を考案し、整備した。再現可能な11個の予測手法について、ノードのみ、エッジのみ、ノード+エッジの3つのグループに分類した広範なベンチマーク結果を示す。私たちの知る限り、CompanyKGは、実世界の投資プラットフォームに由来し、企業間類似性の定量化に特化した、最初の大規模な異種グラフデータセットである。

In the investment industry, it is often essential to carry out fine-grained company similarity quantification for a range of purposes, including market mapping, competitor analysis, and mergers and acquisitions. We propose and publish a knowledge graph, named CompanyKG, to represent and learn diverse company features and relations. Specifically, 1.17 million companies are represented as nodes enriched with company description embeddings; and 15 different inter-company relations result in 51.06 million weighted edges. To enable a comprehensive assessment of methods for company similarity quantification, we have devised and compiled three evaluation tasks with annotated test sets: similarity prediction, competitor retrieval and similarity ranking. We present extensive benchmarking results for 11 reproducible predictive methods categorized into three groups: node-only, edge-only, and node+edge. To the best of our knowledge, CompanyKG is the first large-scale heterogeneous graph dataset originating from a real-world investment platform, tailored for quantifying inter-company similarity.

翻訳日:2024-06-06 14:26:34 公開日:2024-06-03

# 分類における過小表現バイアスと交差バイアスの補正

Correcting Underrepresentation and Intersectional Bias for Classification ( http://arxiv.org/abs/2306.11112v4 )

ライセンス: Link先を確認

Emily Diana, Alexander Williams Tolbert,

(参考訳) 過小表現バイアスによって損なわれたデータから学習する問題を考察する。このバイアスでは、一定数のセンシティブなグループごとに異なる未知の割合で、正例がデータからフィルタされる。少量の偏りのないデータがあれば、交差グループのメンバーシップのために各交差グループの割合を個別に学習することが計算上不可能な設定であっても、グループごとのドロップアウト率を効率的に推定できることを示す。これらの推定値を用いて、偏りのあるサンプル上の経験的誤差しか観測できない場合でも、真の分布上での任意の仮説の損失を近似できる再重み付け方式を構築する。これに基づき、この学習と再重み付けの過程をまとめたアルゴリズムを、徹底した実験的検証とともに提示する。最後に、過小表現および交差バイアスの設定に合わせたPAC学習可能性の概念を定義し、このアルゴリズムが有限VC次元のモデルクラスに対して効率的な学習を可能にすることを示す。

We consider the problem of learning from data corrupted by underrepresentation bias, where positive examples are filtered from the data at different, unknown rates for a fixed number of sensitive groups. We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates, even in settings where intersectional group membership makes learning each intersectional rate computationally infeasible. Using these estimates, we construct a reweighting scheme that allows us to approximate the loss of any hypothesis on the true distribution, even if we only observe the empirical error on a biased sample. From this, we present an algorithm encapsulating this learning and reweighting process along with a thorough empirical investigation. Finally, we define a bespoke notion of PAC learnability for the underrepresentation and intersectional bias setting and show that our algorithm permits efficient learning for model classes of finite VC dimension.
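
The estimate-then-reweight idea can be sketched under a simplifying assumption that is not taken from the paper: within each group, negatives are never dropped and positives are kept independently with an unknown rate. Under that assumption the ratio of positive-to-negative odds in the biased and unbiased samples estimates the keep rate, and weighting surviving positives by its inverse approximates error on the true distribution. All names and counts below are hypothetical, and the plug-in estimator is a stand-in rather than the authors' algorithm.

```python
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, int]  # (group, label) with label in {0, 1}


def positive_odds(data: Sequence[Example], group: str) -> float:
    """Ratio of positives to negatives within one group.

    Assumes the group has at least one negative example.
    """
    pos = sum(1 for g, y in data if g == group and y == 1)
    neg = sum(1 for g, y in data if g == group and y == 0)
    return pos / neg


def estimate_keep_rates(
    biased: Sequence[Example], unbiased: Sequence[Example], groups: Sequence[str]
) -> Dict[str, float]:
    """Plug-in estimate of the fraction of positives that survived filtering."""
    return {g: positive_odds(biased, g) / positive_odds(unbiased, g) for g in groups}


def reweighted_error(
    biased: Sequence[Example], predictions: Sequence[int], keep_rates: Dict[str, float]
) -> float:
    """Weighted 0-1 error that up-weights surviving positives by 1 / keep rate."""
    total = 0.0
    wrong = 0.0
    for (group, label), pred in zip(biased, predictions):
        weight = 1.0 / keep_rates[group] if label == 1 else 1.0
        total += weight
        wrong += weight * (pred != label)
    return wrong / total


# Hypothetical data: a small unbiased sample and a larger biased one.
unbiased: List[Example] = [("a", 1)] * 4 + [("a", 0)] * 4 + [("b", 1)] * 2 + [("b", 0)] * 4
biased: List[Example] = [("a", 1)] * 10 + [("a", 0)] * 40 + [("b", 1)] * 10 + [("b", 0)] * 40

keep_rates = estimate_keep_rates(biased, unbiased, ["a", "b"])
always_negative = [0] * len(biased)
print(keep_rates)
print(reweighted_error(biased, always_negative, keep_rates))
```

With these toy counts, a classifier that always predicts the negative class looks better on the biased sample than it would on the unfiltered population, and the reweighted error corrects for that gap.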

翻訳日:2024-06-06 14:26:34 公開日:2024-06-03

# 2層ReLUニューラルネットワークによる証明可能なマルチタスク表現学習

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks ( http://arxiv.org/abs/2307.06887v4 )

ライセンス: Link先を確認

Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai,

(参考訳) ますます普及している機械学習のパラダイムは、ニューラルネットワーク(NN)を多数のタスクでオフラインに事前学習し、その後、多くの場合ネットワークの最後の線形層のみを再学習することで下流タスクに適応させるというものである。このアプローチは様々な文脈で強力な下流性能をもたらし、マルチタスク事前学習が効果的な特徴学習につながることを示している。最近のいくつかの理論的研究は、浅いNNが (i) 単一のタスクで訓練される場合、または (ii) 線形である場合に有意義な特徴を学習することを示しているが、非線形NNが複数のタスクで訓練されるという、より実践に近い場合についてはほとんど知られていない。本研究では、複数タスクにおける非線形モデルの訓練中に特徴学習が生じることを証明する最初の結果を示す。鍵となる洞察は、マルチタスク事前学習が、タスク間で通常同じラベルを持つ点を整列させる表現を好む擬似コントラスト損失を誘導するということである。この観察を用いて、タスクが二値分類タスクであり、ラベルが$d\gg r$次元の入力空間内の$r$次元部分空間へのデータの射影に依存する場合、2層ReLU NN上の単純な勾配ベースのマルチタスク学習アルゴリズムがこの射影を復元し、$d$に依存しないサンプル複雑度およびニューロン複雑度で下流タスクへ汎化できることを示す。対照的に、単一タスクの抽出に関して高い確率で、その単一タスクでの訓練では$r$個の真の特徴すべてを学習できる保証がないことを示す。

An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong downstream performance in a variety of contexts, demonstrating that multitask pretraining leads to effective feature learning. Although several recent theoretical studies have shown that shallow NNs learn meaningful features when either (i) they are trained on a single task or (ii) they are linear, very little is known about the closer-to-practice case of nonlinear NNs trained on multiple tasks. In this work, we present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks. Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks. Using this observation, we show that when the tasks are binary classification tasks with labels depending on the projection of the data onto an $r$-dimensional subspace within the $d\gg r$-dimensional input space, a simple gradient-based multitask learning algorithm on a two-layer ReLU NN recovers this projection, allowing for generalization to downstream tasks with sample and neuron complexity independent of $d$. In contrast, we show that with high probability over the draw of a single task, training on this single task is not guaranteed to learn all $r$ ground-truth features.

翻訳日:2024-06-06 14:26:34 公開日:2024-06-03

# シフト雑音をもつ分布ロバスト変動量子アルゴリズム

Distributionally Robust Variational Quantum Algorithms with Shifted Noise ( http://arxiv.org/abs/2308.14935v2 )

ライセンス: Link先を確認

Zichang He, Bo Peng, Yuri Alexeev, Zheng Zhang,

(参考訳) 短期的な量子優位性を示す可能性を考えると、変分量子アルゴリズム(VQA)は広く研究されている。 VQAパラメータ最適化のための多くの技術が開発されているが、依然として大きな課題である。現実的な問題は、量子ノイズは非常に不安定であり、したがってリアルタイムに変化する可能性が高いことである。これは、最適化されたVQAアンザッツが異なるノイズ環境下では効果的に動作しないため、重要な問題となる。本稿では,VQAパラメータを未知のシフトノイズに対して頑健に最適化する方法を初めて検討する。ノイズレベルを未知の確率密度関数を持つ確率変数(PDF)としてモデル化し、不確実性セット内でPDFがシフトする可能性があると仮定する。この仮定は、シフトノイズの下で有効性を維持するパラメータを見つけることを目的として、分布的に堅牢な最適化問題を定式化することを促す。我々は,分布的に頑健なベイズ最適化問題を定式化するために利用する。このことは、量子近似最適化アルゴリズム(QAOA)とハードウェア効率のアンサッツを持つ変分量子固有解器(VQE)の両方で数値的な証拠を提供し、シフトノイズ下でより堅牢に実行されるパラメータを特定できることを示唆している。本研究は,パラメータ最適化の観点からのシフトノイズの影響を受け,VQAの信頼性向上に向けた第一歩とみなす。

Given their potential to demonstrate near-term quantum advantage, variational quantum algorithms (VQAs) have been extensively studied. Although numerous techniques have been developed for VQA parameter optimization, it remains a significant challenge. A practical issue is that quantum noise is highly unstable and thus it is likely to shift in real time. This presents a critical problem as an optimized VQA ansatz may not perform effectively under a different noise environment. For the first time, we explore how to optimize VQA parameters to be robust against unknown shifted noise. We model the noise level as a random variable with an unknown probability density function (PDF), and we assume that the PDF may shift within an uncertainty set. This assumption guides us to formulate a distributionally robust optimization problem, with the goal of finding parameters that maintain effectiveness under shifted noise. We utilize a distributionally robust Bayesian optimization solver for our proposed formulation. This provides numerical evidence in both the Quantum Approximate Optimization Algorithm (QAOA) and the Variational Quantum Eigensolver (VQE) with hardware-efficient ansatz, indicating that we can identify parameters that perform more robustly under shifted noise. We regard this work as the first step towards improving the reliability of VQAs influenced by shifted noise from the parameter optimization perspective.

翻訳日:2024-06-06 14:16:48 公開日:2024-06-03

# 因果的基礎モデルに向けて:因果的推論と注意の二重性について

Towards Causal Foundation Model: on Duality between Causal Inference and Attention ( http://arxiv.org/abs/2310.00809v3 )

ライセンス: Link先を確認

Jiaqi Zhang, Joel Jennings, Agrin Hilmkil, Nick Pawlowski, Cheng Zhang, Chao Ma,

(参考訳) ファンデーションモデルは、機械学習の風景に変化をもたらし、多様なタスクにまたがる人間レベルのインテリジェンスの火花を誇示している。しかし、因果推論のような複雑なタスクにおいてギャップは持続し、主に複雑な推論ステップと高い数値的精度の要求に関連する課題が原因である。本研究では,治療効果推定のための因果認識基盤モデルの構築に向けて第一歩を踏み出す。提案手法は,複数のラベルのないデータセットを用いて自己教師付き因果学習を行い,その結果,未知のタスクに対するゼロショット因果推論を新しいデータで実現する,Causal Inference with Attention (CInA) と呼ばれる,理論的に正当化された手法を提案する。これは、最適共変量バランスと自己アテンションの原始的双対関係を実証し、訓練されたトランスフォーマー型アーキテクチャの最終層を通したゼロショット因果推論を容易にする理論結果に基づいている。我々は、CInAが、従来のデータセットごとの手法にマッチしたり、超えたりしながら、分散データセットや様々な実世界のデータセットに効果的に一般化できることを実証的に実証した。これらの結果は,本手法が因果基盤モデルの発展の足掛かりとなる可能性を示唆する証拠となる。

Foundation models have brought changes to the landscape of machine learning, demonstrating sparks of human-level intelligence across a diverse array of tasks. However, a gap persists in complex tasks such as causal inference, primarily due to challenges associated with intricate reasoning steps and high numerical precision requirements. In this work, we take a first step towards building causally-aware foundation models for treatment effect estimations. We propose a novel, theoretically justified method called Causal Inference with Attention (CInA), which utilizes multiple unlabeled datasets to perform self-supervised causal learning, and subsequently enables zero-shot causal inference on unseen tasks with new data. This is based on our theoretical results that demonstrate the primal-dual connection between optimal covariate balancing and self-attention, facilitating zero-shot causal inference through the final layer of a trained transformer-type architecture. We demonstrate empirically that CInA effectively generalizes to out-of-distribution datasets and various real-world datasets, matching or even surpassing traditional per-dataset methodologies. These results provide compelling evidence that our method has the potential to serve as a stepping stone for the development of causal foundation models.

翻訳日:2024-06-06 14:16:48 公開日:2024-06-03

# 不確かさを定量的に予測するオンラインアルゴリズム

Online Algorithms with Uncertainty-Quantified Predictions ( http://arxiv.org/abs/2310.11558v2 )

ライセンス: Link先を確認

Bo Sun, Jerry Huang, Nicolas Christianson, Mohammad Hajiesmaili, Adam Wierman, Raouf Boutaba,

(参考訳) 予測を伴うアルゴリズムの急成長する分野は、オンラインアルゴリズムのパフォーマンスを改善するために、潜在的に不完全な機械学習予測を使用することの問題を研究する。このフレームワークの既存のアルゴリズムのほとんどすべてが予測品質を前提としていないが、機械学習モデルに不確実な定量化(UQ)を提供する方法が近年開発され、意思決定時の予測品質に関する追加情報を可能にしている。本研究では,オンラインアルゴリズムの設計における不確実性定量化予測を最適に活用する問題について検討する。特に,スキーレンタルとオンライン検索という2つの古典的なオンライン問題について検討し,意思決定者がUQを付加した予測を行い,基底真理が特定の範囲の値に収まる可能性について述べる。我々は、UQ予測を完全に活用するために、アルゴリズム設計への非自明な修正が必要であることを実証する。さらに、より一般的なUQの活用方法を考察し、マルチインスタンス環境での意思決定にUQを活用することを学ぶオンライン学習フレームワークを提案する。

The burgeoning field of algorithms with predictions studies the problem of using possibly imperfect machine learning predictions to improve online algorithm performance. While nearly all existing algorithms in this framework make no assumptions on prediction quality, a number of methods providing uncertainty quantification (UQ) on machine learning models have been developed in recent years, which could enable additional information about prediction quality at decision time. In this work, we investigate the problem of optimally utilizing uncertainty-quantified predictions in the design of online algorithms. In particular, we study two classic online problems, ski rental and online search, where the decision-maker is provided predictions augmented with UQ describing the likelihood of the ground truth falling within a particular range of values. We demonstrate that non-trivial modifications to algorithm design are needed to fully leverage the UQ predictions. Moreover, we consider how to utilize more general forms of UQ, proposing an online learning framework that learns to exploit UQ to make decisions in multi-instance settings.

翻訳日:2024-06-06 14:07:02 公開日:2024-06-03

# ParisLuco3D:LiDAR知覚の領域一般化のための高品質なターゲットデータセット

ParisLuco3D: A high-quality target dataset for domain generalization of LiDAR perception ( http://arxiv.org/abs/2310.16542v3 )

ライセンス: Link先を確認

Jules Sanchez, Louis Soum-Fontez, Jean-Emmanuel Deschaud, Francois Goulette,

(参考訳) LiDARは、シーンに関する正確な幾何学的情報を収集することから、自動運転に不可欠なセンサーである。利用可能なデータ量が増えるにつれ、この情報を知覚に活用することへの関心が高まっている。様々なLiDAR知覚タスクの性能が向上するにつれて、最適化されたこれらのモデルを実環境で検証する手段として、新しい環境やセンサーへの汎化が注目されるようになった。本稿では、さまざまなソースデータセットを用いた性能評価を容易にするために、クロスドメイン評価に特化して設計した新しいデータセットParisLuco3Dを提供する。データセットに加えて、手法間の公正な比較を保証するために、LiDARセマンティックセグメンテーション、LiDARオブジェクト検出、LiDARトラッキングのためのオンラインベンチマークも提供する。 ParisLuco3Dデータセット、評価スクリプト、ベンチマークへのリンクは次のウェブサイトで公開している: https://npm3d.fr/parisluco3d

LiDAR is an essential sensor for autonomous driving because it collects precise geometric information about a scene. Exploiting this information for perception is interesting as the amount of available data increases. As the performance of various LiDAR perception tasks has improved, generalization to new environments and sensors has emerged as a way to test these optimized models in real-world conditions. This paper provides a novel dataset, ParisLuco3D, specifically designed for cross-domain evaluation to make it easier to evaluate performance using various source datasets. Alongside the dataset, online benchmarks for LiDAR semantic segmentation, LiDAR object detection, and LiDAR tracking are provided to ensure a fair comparison across methods. The ParisLuco3D dataset, evaluation scripts, and links to benchmarks can be found at the following website: https://npm3d.fr/parisluco3d

翻訳日:2024-06-06 14:07:02 公開日:2024-06-03

# 言語モデルからの制御された復号化

Controlled Decoding from Language Models ( http://arxiv.org/abs/2310.17022v3 )

ライセンス: Link先を確認

Sidharth Mudgal, Jong Lee, Harish Ganapathy, YaGuang Li, Tao Wang, Yanping Huang, Zhifeng Chen, Heng-Tze Cheng, Michael Collins, Trevor Strohman, Jilin Chen, Alex Beutel, Ahmad Beirami,

(参考訳) KL正則化強化学習(RL)は、言語モデルの応答を高報酬の結果へと制御するための一般的なアライメントフレームワークである。我々はトークン単位のRL目的関数を定式化し、制御復号(CD)と呼ばれるモジュラーソルバを提案する。CDは、報酬の価値関数を学習するように訓練された個別のプレフィックススコアリングモジュールを通じて制御を行う。プレフィックススコアラは、推論時に凍結されたベースモデルからの生成を制御するために使用され、RL目的関数の解から証明可能な形でサンプリングする。我々は,CDが一般的なベンチマークにおいて制御機構として有効であることを実証的に示す。また,複数の報酬に対するプレフィックススコアラを推論時に組み合わせることで,追加のトレーニングなしに多目的RL問題を効果的に解決できることを示す。さらに,CDを適用する利点は、追加のチューニングなしで未知のベースモデルにも転移することを示す。最後に,CDを推論時にブロック単位の復号として適用でき、一般的なbest-of-K戦略と強化学習によるトークン単位の制御とのギャップを本質的に埋めることを示す。これにより、CDは言語モデルのアライメントに有望なアプローチとなる。

KL-regularized reinforcement learning (RL) is a popular alignment framework for steering language model responses towards high-reward outcomes. We pose a tokenwise RL objective and propose a modular solver for it, called controlled decoding (CD). CD exerts control through a separate prefix scorer module, which is trained to learn a value function for the reward. The prefix scorer is used at inference time to control the generation from a frozen base model, provably sampling from a solution to the RL objective. We empirically demonstrate that CD is effective as a control mechanism on popular benchmarks. We also show that prefix scorers for multiple rewards may be combined at inference time, effectively solving a multi-objective RL problem with no additional training. We further show that the benefits of applying CD transfer to an unseen base model with no further tuning. Finally, we show that CD can be applied in a blockwise decoding fashion at inference time, essentially bridging the gap between the popular best-of-K strategy and tokenwise control through reinforcement learning. This makes CD a promising approach for alignment of language models.
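To make the idea of steering a frozen base model with a prefix scorer concrete, here is a minimal sketch. It assumes the exponential-tilting form that solves a generic KL-regularized objective (base probability times the exponential of a scaled value); the abstract does not spell out the exact sampling rule, so the function `tilt`, the `strength` parameter, and the toy numbers are illustrative assumptions rather than the authors' implementation.

```python
import math


def tilt(base_probs: dict, values: dict, strength: float) -> dict:
    """Reweight next-token probabilities by exp(strength * value) and renormalize.

    base_probs: next-token distribution of the frozen base model (assumed given).
    values: prefix-scorer estimate of expected reward if each token is appended
            (hypothetical numbers here, no trained scorer is involved).
    strength: how hard to push towards high value; 0 recovers the base model.
    """
    weights = {
        tok: p * math.exp(strength * values[tok]) for tok, p in base_probs.items()
    }
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


base = {"good": 0.5, "bad": 0.5}
scores = {"good": 1.0, "bad": 0.0}
print(tilt(base, scores, strength=0.0))  # unchanged base distribution
print(tilt(base, scores, strength=2.0))  # probability mass shifts to "good"
```

Because the base model is only queried for probabilities and never updated, scorers for several rewards could in principle be combined by adding their values before tilting.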

翻訳日:2024-06-06 14:07:02 公開日:2024-06-03

# 無線通信におけるデータ再構成強化のための条件付き拡散確率モデル

Conditional Denoising Diffusion Probabilistic Models for Data Reconstruction Enhancement in Wireless Communications ( http://arxiv.org/abs/2310.19460v2 )

ライセンス: Link先を確認

Mehdi Letafati, Samad Ali, Matti Latva-aho,

(参考訳) 本稿では,無線チャネル上でのデータ伝送と再構成を強化するために,条件付きデノイジング拡散確率モデル(DDPM)を提案する。DDPMの基盤となるメカニズムは、いわゆる「デノイジング」ステップにわたってデータ生成プロセスを分解することである。これに着想を得た鍵となる考え方は、情報信号の「ノイズからクリーン」への変換を学習する際に拡散モデルの生成的事前分布を活用し、データ再構成を強化することである。提案手法は、マルチメディア伝送など、情報コンテンツに関する事前知識が利用できる通信シナリオに有用である。したがって、情報レートを下げる複雑なチャネル符号を用いる代わりに、拡散事前分布を利用して、特に低い信号対雑音比(SNR)やハードウェア障害のある通信による極端なチャネル条件下で、信頼性の高いデータ再構成を行うことができる。提案するDDPM支援受信機は、MNISTデータセットを用いた無線画像伝送のシナリオに合わせて調整されている。数値結果は、従来のデジタル通信およびディープニューラルネットワーク(DNN)ベースのベンチマークと比較した提案手法の再構成性能を示している。また、誤り訂正のために情報レートを下げることなく、低SNR領域で10dB以上の再構成の改善が達成できることも示された。

In this paper, conditional denoising diffusion probabilistic models (DDPMs) are proposed to enhance data transmission and reconstruction over wireless channels. The underlying mechanism of a DDPM is to decompose the data generation process over the so-called "denoising" steps. Inspired by this, the key idea is to leverage the generative prior of diffusion models in learning a "noisy-to-clean" transformation of the information signal to help enhance data reconstruction. The proposed scheme could be beneficial for communication scenarios in which prior knowledge of the information content is available, e.g., in multimedia transmission. Hence, instead of employing complicated channel codes that reduce the information rate, one can exploit diffusion priors for reliable data reconstruction, especially under extreme channel conditions due to low signal-to-noise ratio (SNR) or hardware-impaired communications. The proposed DDPM-assisted receiver is tailored for the scenario of wireless image transmission using the MNIST dataset. Our numerical results highlight the reconstruction performance of our scheme compared to conventional digital communication, as well as the deep neural network (DNN)-based benchmark. It is also shown that more than 10 dB improvement in the reconstruction could be achieved in low SNR regimes, without the need to reduce the information rate for error correction.
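The "noisy-to-clean" idea rests on the standard DDPM forward process, which gradually corrupts a clean signal with Gaussian noise. The sketch below shows only that forward step with a linear noise schedule; the schedule endpoints and the helper names are assumptions chosen for illustration, and the conditional receiver described in the abstract is not reproduced here.

```python
import math
import random


def linear_betas(n_steps: int, start: float = 1e-4, end: float = 0.02) -> list:
    """Linearly spaced noise variances (a commonly used schedule, assumed here)."""
    return [start + (end - start) * i / (n_steps - 1) for i in range(n_steps)]


def alpha_bar(betas: list, t: int) -> float:
    """Cumulative product of (1 - beta) over the first t steps."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def noisy_sample(x0: list, betas: list, t: int, rng: random.Random) -> list:
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    a = alpha_bar(betas, t)
    return [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0
    ]


betas = linear_betas(1000)
rng = random.Random(0)
clean = [0.5, -0.5, 1.0]
print(noisy_sample(clean, betas, t=10, rng=rng))    # still close to the clean signal
print(noisy_sample(clean, betas, t=1000, rng=rng))  # almost pure noise
```

A denoising network is trained to invert these steps; at the receiver, the channel-corrupted signal plays the role of the noisy input.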

翻訳日:2024-06-06 14:07:02 公開日:2024-06-03

# VQPy: 現代的なビデオ分析のためのオブジェクト指向アプローチ

VQPy: An Object-Oriented Approach to Modern Video Analytics ( http://arxiv.org/abs/2311.01623v4 )

ライセンス: Link先を確認

Shan Yu, Zhenting Zhu, Yu Chen, Hanchen Xu, Pengzhan Zhao, Yang Wang, Arthi Padmanabhan, Hugo Latapie, Harry Xu,

(参考訳) ビデオ分析は現代のシステムやサービスで広く使われている。ビデオ分析の最前線にあるのは、ユーザが特定の関心対象のオブジェクトを見つけるために開発するビデオクエリである。ビデオ分析の中心であるビデオオブジェクト(人間、動物、車など)は、従来のオブジェクト指向言語でモデル化されるオブジェクトと本質的に類似しているという知見に基づき、ビデオ分析のためのオブジェクト指向アプローチを開発することを提案する。VQPyと名付けたこのアプローチは、フロントエンド(ビデオオブジェクトとそのインタラクションを簡単に表現できるコンストラクトを備えたPythonの変種)と、ビデオオブジェクトに基づいてパイプラインを自動的に構築および最適化できる拡張可能なバックエンドで構成されている。我々はVQPyを実装してオープンソース化しており、VQPyはCiscoでDeepVisionフレームワークの一部として製品化されている。

Video analytics is widely used in contemporary systems and services. At the forefront of video analytics are video queries that users develop to find objects of particular interest. Building upon the insight that video objects (e.g., humans, animals, cars), the center of video analytics, are similar in spirit to objects modeled by traditional object-oriented languages, we propose to develop an object-oriented approach to video analytics. This approach, named VQPy, consists of a frontend (a Python variant with constructs that make it easy for users to express video objects and their interactions) as well as an extensible backend that can automatically construct and optimize pipelines based on video objects. We have implemented and open-sourced VQPy, which has been productized in Cisco as part of its DeepVision framework.

翻訳日:2024-06-06 14:07:02 公開日:2024-06-03

# GENEVA: LLMを用いた分岐物語の生成と可視化

GENEVA: GENErating and Visualizing branching narratives using LLMs ( http://arxiv.org/abs/2311.09213v2 )

ライセンス: Link先を確認

Jorge Leandro, Sudha Rao, Michael Xu, Weijia Xu, Nebosja Jojic, Chris Brockett, Bill Dolan,

(参考訳) 対話ベースのロールプレイングゲーム(RPG)は強力なストーリーテリングを必要とする。これらの物語は執筆に何年もかかることがあり、通常は大規模なクリエイティブチームを必要とする。本研究では,このプロセスを支援する大規模生成テキストモデルの可能性を示す。プロトタイプツールであるGENEVAは、デザイナが提供する高レベルな物語記述と制約に合致する、分岐と再収束を伴うストーリーラインを持つリッチな物語グラフを生成する。大規模言語モデル(LLM)であるGPT-4を用いて、分岐した物語を生成し、2段階のプロセスでグラフ形式にレンダリングする。異なる文脈制約の下で、4つの有名な物語に対する新しい分岐物語の生成におけるGENEVAの利用を示す。このツールは、ゲーム開発、シミュレーション、その他のゲームライクな特性を持つアプリケーションを支援する可能性がある。

Dialogue-based Role Playing Games (RPGs) require powerful storytelling. The narratives of these may take years to write and typically involve a large creative team. In this work, we demonstrate the potential of large generative text models to assist this process. GENEVA, a prototype tool, generates a rich narrative graph with branching and reconverging storylines that match a high-level narrative description and constraints provided by the designer. A large language model (LLM), GPT-4, is used to generate the branching narrative and to render it in a graph format in a two-step process. We illustrate the use of GENEVA in generating new branching narratives for four well-known stories under different contextual constraints. This tool has the potential to assist in game development, simulations, and other applications with game-like properties.

翻訳日:2024-06-06 13:57:08 公開日:2024-06-03

# 材料生成のためのスケーラブル拡散

Scalable Diffusion for Materials Generation ( http://arxiv.org/abs/2311.09235v2 )

ライセンス: Link先を確認

Sherry Yang, KwangHwan Cho, Amil Merchant, Pieter Abbeel, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk,

(参考訳) インターネット規模のデータで訓練された生成モデルは、新規で現実的なテキスト、画像、ビデオを生成することができる。次の自然な疑問は、新しい安定材料の生成などによって、これらのモデルが科学を前進させられるかどうかである。伝統的に、明示的な構造を持つモデル(例えばグラフ)が、科学データにおける構造的関係(例えば結晶中の原子や結合)のモデル化に使われてきたが、構造の生成を大規模で複雑な系にスケールさせることは困難である。材料生成におけるもうひとつの課題は、標準的な生成モデリング指標と下流アプリケーションとのミスマッチである。例えば、再構成誤差のような一般的な指標は、安定材料の発見という下流の目標とよく相関しない。本研究では,任意の結晶構造を表現可能な統一結晶表現(UniMat)を開発し,このUniMat表現上で拡散確率モデルを訓練することによって,スケーラビリティの課題に取り組む。実験の結果,明示的な構造モデリングを欠いているにもかかわらず,UniMatはより大規模で複雑な化学系から高忠実度の結晶構造を生成でき,様々な生成モデリング指標の下で従来のグラフベースの手法を上回ることが示唆された。材料の生成品質を新規安定材料の発見などの下流アプリケーションとより良く結びつけるため,組成ごとの生成エネルギーや、密度汎関数理論(DFT)による分解エネルギーを通じた凸包に対する安定性など、材料生成モデルを評価するための追加の指標を提案する。最後に、UniMatを用いた条件付き生成は、最大数百万の結晶構造を持つ既存の結晶データセットにスケール可能であり、新しい安定材料の発見において、ランダム構造探索(構造発見の現在の主要な手法)を上回ることを示す。

Generative models trained on internet-scale data are capable of generating novel and realistic texts, images, and videos. A natural next question is whether these models can advance science, for example by generating novel stable materials. Traditionally, models with explicit structures (e.g., graphs) have been used in modeling structural relationships in scientific data (e.g., atoms and bonds in crystals), but generating structures can be difficult to scale to large and complex systems. Another challenge in generating materials is the mismatch between standard generative modeling metrics and downstream applications. For instance, common metrics such as the reconstruction error do not correlate well with the downstream goal of discovering stable materials. In this work, we tackle the scalability challenge by developing a unified crystal representation that can represent any crystal structure (UniMat), followed by training a diffusion probabilistic model on these UniMat representations. Our empirical results suggest that despite the lack of explicit structure modeling, UniMat can generate high-fidelity crystal structures from larger and more complex chemical systems, outperforming previous graph-based approaches under various generative modeling metrics. To better connect the generation quality of materials to downstream applications, such as discovering novel stable materials, we propose additional metrics for evaluating generative models of materials, including per-composition formation energy and stability with respect to convex hulls through decomposition energy from Density Functional Theory (DFT). Lastly, we show that conditional generation with UniMat can scale to previously established crystal datasets with up to millions of crystal structures, outperforming random structure search (the current leading method for structure discovery) in discovering new stable materials.

翻訳日:2024-06-06 13:57:08 公開日:2024-06-03

# 量子インセプションスコア

Quantum Inception Score ( http://arxiv.org/abs/2311.12163v3 )

ライセンス: Link先を確認

Akira Sone, Akira Tanji, Naoki Yamamoto,

(参考訳) 機械学習における古典的生成モデルの大きな成功に触発されて、その量子版の熱心な探索が最近始まった。この旅に出発するためには、量子生成モデルの品質を評価するための適切な指標を開発することが重要である。古典の場合、そのような指標の一例が(古典的)インセプションスコア(cIS)である。本稿では,cISの自然な拡張として,量子生成器のための量子インセプションスコア(qIS)を提案する。重要な点として、qISは、与えられたデータセットを分類する量子チャネルのホレボ情報に品質を関連付ける。この文脈で、qISのいくつかの性質を示す。第一に、qISは、システム出力への射影測定によって定義される対応するcIS以上である。第二に、qISとcISの差は、非対称性の資源理論によって特徴づけられる量子コヒーレンスの存在から生じる。第三に、エンタングルした生成器の集合が用意された場合、qISをさらに向上させる分類プロセスが存在する。第四に、量子ゆらぎ定理を利用して、qISの物理的限界を特徴づける。最後に、量子多体物理学における相分類問題に対して、量子畳み込みニューラルネットワークを量子分類器として用い、量子生成モデルとしての1次元スピン鎖モデルの品質評価にqISを適用する。

Motivated by the great success of classical generative models in machine learning, enthusiastic exploration of their quantum versions has recently started. To depart on this journey, it is important to develop a relevant metric to evaluate the quality of quantum generative models; in the classical case, one such example is the (classical) inception score (cIS). In this paper, as a natural extension of cIS, we propose the quantum inception score (qIS) for quantum generators. Importantly, qIS relates the quality to the Holevo information of the quantum channel that classifies a given dataset. In this context, we show several properties of qIS. First, qIS is greater than or equal to the corresponding cIS, which is defined through projection measurements on the system output. Second, the difference between qIS and cIS arises from the presence of quantum coherence, as characterized by the resource theory of asymmetry. Third, when a set of entangled generators is prepared, there exists a classifying process leading to the further enhancement of qIS. Fourth, we harness the quantum fluctuation theorem to characterize the physical limitation of qIS. Finally, we apply qIS to assess the quality of the one-dimensional spin chain model as a quantum generative model, with the quantum convolutional neural network as a quantum classifier, for the phase classification problem in quantum many-body physics.
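As a reference point for the quantum extension, the classical inception score can be computed directly from a classifier's output probabilities: it is the exponential of the average KL divergence between each conditional label distribution and the marginal label distribution. The sketch below covers only this classical quantity; the function name and toy inputs are illustrative, and the Holevo-information-based qIS is not implemented here.

```python
import math


def inception_score(cond_probs: list) -> float:
    """exp of the mean KL(p(y|x) || p(y)) over generated samples x.

    cond_probs: one row per generated sample, each row a label distribution
    produced by some classifier (assumed given).
    """
    n_samples = len(cond_probs)
    n_labels = len(cond_probs[0])
    marginal = [
        sum(row[j] for row in cond_probs) / n_samples for j in range(n_labels)
    ]
    mean_kl = 0.0
    for row in cond_probs:
        kl = sum(
            p * math.log(p / marginal[j]) for j, p in enumerate(row) if p > 0.0
        )
        mean_kl += kl / n_samples
    return math.exp(mean_kl)


# Confident and diverse predictions give a score near the number of labels.
print(inception_score([[1.0, 0.0], [0.0, 1.0]]))
# Uninformative predictions give the minimum score of 1.
print(inception_score([[0.5, 0.5], [0.5, 0.5]]))
```

The score is high only when each sample is classified confidently and the samples together cover many labels, which is the notion of quality the quantum version builds on.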

翻訳日:2024-06-06 13:57:08 公開日:2024-06-03

# OASIS:フェデレートラーニングにおけるアクティブリコンストラクションアタックのオフセット

OASIS: Offsetting Active Reconstruction Attacks in Federated Learning ( http://arxiv.org/abs/2311.13739v2 )

ライセンス: Link先を確認

Tre' R. Jeter, Truc Nguyen, Raed Alharbi, My T. Thai,

(参考訳) フェデレートラーニング(FL)は、モデルのトレーニング効率を高めながらユーザのプライバシを保護できる可能性から、大きな注目を集めている。そのため、FLは医療から産業工学まで、特に機密情報やプライバシー法によってデータを容易に交換できない分野を中心に、さまざまな領域で利用されてきた。しかし、最近の研究では、不正なサーバが実行するアクティブ再構成攻撃によって、FLプロトコルが容易に侵害されうることが示されている。これらの攻撃では、グローバルモデルパラメータが悪意をもって改変され、サーバは勾配更新を反転させることで、ユーザのプライベートデータをそのままの形で取得できる。この種の攻撃への対処は、脅威モデルが強力であるため、依然として重要な課題である。本稿では, モデル性能を維持しつつ、アクティブ再構成攻撃を効果的に無力化する、画像拡張に基づく防御機構OASISを提案する。まず,これらの攻撃を可能にする勾配反転の基本原理を明らかにし,攻撃戦略によらず防御が堅牢となる主要な条件を理論的に特定する。次に,画像拡張を用いて防御を構築し、それが攻撃原理を損なうことを示す。包括的な評価により、この防御機構の有効性が示され、解決策としての実現可能性が明らかになった。

Federated Learning (FL) has garnered significant attention for its potential to protect user privacy while enhancing model training efficiency. For that reason, FL has found its use in various domains, from healthcare to industrial engineering, especially where data cannot be easily exchanged due to sensitive information or privacy laws. However, recent research has demonstrated that FL protocols can be easily compromised by active reconstruction attacks executed by dishonest servers. These attacks involve the malicious modification of global model parameters, allowing the server to obtain a verbatim copy of users' private data by inverting their gradient updates. Tackling this class of attack remains a crucial challenge due to the strong threat model. In this paper, we propose a defense mechanism, namely OASIS, based on image augmentation that effectively counteracts active reconstruction attacks while preserving model performance. We first uncover the core principle of gradient inversion that enables these attacks and theoretically identify the main conditions by which the defense can be robust regardless of the attack strategies. We then construct our defense with image augmentation showing that it can undermine the attack principle. Comprehensive evaluations demonstrate the efficacy of the defense mechanism highlighting its feasibility as a solution.

翻訳日:2024-06-06 13:57:08 公開日:2024-06-03

# 量子コンピューティングアプローチによる高スピンモデルの2次元コヒーレントスペクトル

Two-dimensional coherent spectrum of high-spin models via a quantum computing approach ( http://arxiv.org/abs/2311.14035v4 )

ライセンス: Link先を確認

Martin Mootz, Peter P. Orth, Chuankun Huang, Liang Luo, Jigang Wang, Yong-Xin Yao,

(参考訳) 本稿では,高スピンモデルの2次元コヒーレントスペクトル(2DCS)を計算するための量子コンピューティング手法を提案する。本手法は,数個の磁場パルスの存在下でのリアルタイムダイナミクスのシミュレーションに基づく。適応型変動量子力学シミュレーション(AVQDS)アルゴリズムを,その小型回路による研究に利用し,周波数空間の必要な分解能を達成するために,十分に長時間のシミュレーションを可能にする。具体的には、Dzyaloshinskii-Moriya相互作用と単一イオン異方性を含む反強磁性量子スピンモデルを考える。得られた2DCSスペクトルは、未摂動ハミルトニアンの異なる固有状態間の遷移から生じるマグノン周波数の倍数の異なるピークを示す。 1次元コヒーレントスペクトルを2DCSと比較することにより、2DCSがエネルギースペクトルの高分解能を提供することを示す。さらに、高スピン演算子の2つの異なるバイナリエンコーディング(標準バイナリエンコーディングとグレイ符号)を用いて、スピンの大きさで量子資源がスケールする方法について検討する。低磁場では、両方の符号化は同等の量子資源を必要とするが、より大きな磁場ではグレイ符号が有利である。サイト数が増加するスピンモデルの数値シミュレーションは、量子資源の多項式系サイズのスケーリングを示している。最後に,2DCSの数値計算結果と希土類オルソフェリット系の実験結果を比較した。量子ハイスピンモデルの2DCSにおける高調波発生信号の観測強度は実験データとよく一致し, 対応する平均場よりも顕著に向上した。

We present and benchmark a quantum computing approach to calculate the two-dimensional coherent spectrum (2DCS) of high-spin models. Our approach is based on simulating their real-time dynamics in the presence of several magnetic field pulses, which are spaced in time. We utilize the adaptive variational quantum dynamics simulation (AVQDS) algorithm for the study due to its compact circuits, which enables simulations over sufficiently long times to achieve the required resolution in frequency space. Specifically, we consider an antiferromagnetic quantum spin model that incorporates Dzyaloshinskii-Moriya interactions and single-ion anisotropy. The obtained 2DCS spectra exhibit distinct peaks at multiples of the magnon frequency, arising from transitions between different eigenstates of the unperturbed Hamiltonian. By comparing the one-dimensional coherent spectrum with 2DCS, we demonstrate that 2DCS provides a higher resolution of the energy spectrum. We further investigate how the quantum resources scale with the magnitude of the spin using two different binary encodings of the high-spin operators: the standard binary encoding and the Gray code. At low magnetic fields both encodings require comparable quantum resources, but at larger field strengths the Gray code is advantageous. Numerical simulations for spin models with increasing number of sites indicate a polynomial system-size scaling for quantum resources. Lastly, we compare the numerical 2DCS with experimental results on a rare-earth orthoferrite system. The observed strength of the magnonic high-harmonic generation signals in the 2DCS of the quantum high-spin model aligns well with the experimental data, showing significant improvement over the corresponding mean-field results.
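The two encodings compared above differ in how the 2S+1 levels of a spin-S site are mapped to qubit bit strings. The sketch below only illustrates the defining property of the Gray code, namely that neighbouring levels differ in exactly one bit; how this translates into circuit resources is the paper's finding and is not modelled here. The helper names are illustrative.

```python
import math


def n_qubits(n_levels: int) -> int:
    """Qubits needed to label n_levels = 2S + 1 spin levels."""
    return max(1, math.ceil(math.log2(n_levels)))


def binary_code(level: int, n_bits: int) -> str:
    """Standard binary encoding of a level index."""
    return format(level, f"0{n_bits}b")


def gray_code(level: int, n_bits: int) -> str:
    """Reflected Gray code of a level index."""
    return format(level ^ (level >> 1), f"0{n_bits}b")


def hamming(a: str, b: str) -> int:
    """Number of differing bit positions."""
    return sum(x != y for x, y in zip(a, b))


n_levels = 4  # spin S = 3/2
bits = n_qubits(n_levels)
for m in range(n_levels - 1):
    b = hamming(binary_code(m, bits), binary_code(m + 1, bits))
    g = hamming(gray_code(m, bits), gray_code(m + 1, bits))
    print(f"levels {m}->{m + 1}: binary flips {b} bit(s), Gray flips {g} bit(s)")
```

Operators that connect neighbouring levels, such as spin raising and lowering operators, therefore touch a single bit in the Gray encoding, whereas the standard binary encoding can flip several.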

翻訳日:2024-06-06 13:57:08 公開日:2024-06-03

# 分類器の混合による精度とロバスト性のトレードオフの緩和

Mixing Classifiers to Alleviate the Accuracy-Robustness Trade-Off ( http://arxiv.org/abs/2311.15165v2 )

ライセンス: Link先を確認

Yatong Bai, Brendon G. Anderson, Somayeh Sojoudi,

(参考訳) 深層ニューラル分類器は、近年、データ駆動制御システムで大きな成功を収めている。しかし、既存のモデルは精度と敵対的ロバスト性の間のトレードオフに悩まされている。高い性能と厳密なロバスト性保証の両方を必要とする安全クリティカルなシステムの制御では、この制限を克服しなければならない。本研究では、ロバストモデルから高いロバスト性を、標準モデルから高い精度を同時に継承する分類器を開発する。具体的には、標準ニューラルネットワークとロバストニューラルネットワークの出力確率を混合する、理論的に動機づけられた定式化を提案する。どちらのベース分類器も事前訓練済みであるため、本手法は追加の訓練を必要としない。数値実験により,混合分類器が精度とロバスト性のトレードオフを顕著に改善することを確認し,ロバストなベース分類器の信頼度特性が、このより良好なトレードオフの鍵であることを特定した。理論的結果は、緩やかな仮定の下で、ロバストなベースモデルのロバスト性が認証可能である場合、閉形式で与えられる$\ell_p$半径内の入力に対するいかなる改変や攻撃も、混合分類器の誤分類を引き起こさないことを証明している。

Deep neural classifiers have recently found tremendous success in data-driven control systems. However, existing models suffer from a trade-off between accuracy and adversarial robustness. This limitation must be overcome in the control of safety-critical systems that require both high performance and rigorous robustness guarantees. In this work, we develop classifiers that simultaneously inherit high robustness from robust models and high accuracy from standard models. Specifically, we propose a theoretically motivated formulation that mixes the output probabilities of a standard neural network and a robust neural network. Both base classifiers are pre-trained, and thus our method does not require additional training. Our numerical experiments verify that the mixed classifier noticeably improves the accuracy-robustness trade-off and identify the confidence property of the robust base classifier as the key leverage of this more benign trade-off. Our theoretical results prove that under mild assumptions, when the robustness of the robust base model is certifiable, no alteration or attack within a closed-form $\ell_p$ radius on an input can result in the misclassification of the mixed classifier.
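The mixing step can be pictured with a small sketch. The abstract only states that the output probabilities of the two pre-trained classifiers are mixed; the plain convex combination with weight `alpha` used below is an assumption made for illustration, as are the function names and the toy probability vectors.

```python
def mix(standard_probs: list, robust_probs: list, alpha: float) -> list:
    """Convex combination of two class-probability vectors (assumed mixing rule).

    alpha = 0 returns the standard classifier's output,
    alpha = 1 returns the robust classifier's output.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        (1.0 - alpha) * s + alpha * r for s, r in zip(standard_probs, robust_probs)
    ]


def predict(probs: list) -> int:
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


standard = [0.6, 0.4]  # hypothetical output of the accurate classifier
robust = [0.1, 0.9]    # hypothetical, more confident output of the robust classifier
mixed = mix(standard, robust, alpha=0.5)
print(mixed, predict(mixed))
```

In this toy case the confident robust classifier outweighs the hesitant standard one, which mirrors the role the abstract assigns to the confidence property of the robust base classifier. No retraining is involved because both inputs come from frozen models.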

翻訳日:2024-06-06 13:57:08 公開日:2024-06-03

# 電気波の夢:拡散モデルを用いた心臓興奮波の生成モデリング

Dreaming of Electrical Waves: Generative Modeling of Cardiac Excitation Waves using Diffusion Models ( http://arxiv.org/abs/2312.14830v2 )

ライセンス: Link先を確認

Tanish Baranwal, Jan Lebert, Jan Christoph,

(参考訳) 心臓の電気波は、心房細動や心室細動などの生命を脅かす不整脈の間、回転するスパイラル波またはスクロール波を形成する。波動のダイナミクスは通常、興奮性媒質中の反応拡散ダイナミクスを記述する結合偏微分方程式を用いてモデル化される。最近では、物理系および生物系において時空間パターンを生成する代替手段として、データ駆動の生成モデリングが登場している。本稿では,心組織における電気波パターンの生成モデリングのために、デノイジング拡散確率モデルを検討する。我々は、無条件および条件付きの生成タスクでそのような波動パターンを生成できるように、シミュレーションした電気波パターンを用いて拡散モデルを訓練した。例えば、拡散に基づく i) パラメータ固有の生成、ii) 時間発展、iii) スパイラル波ダイナミクスのインペインティング(表面の2次元測定からの3次元スクロール波ダイナミクスの再構成を含む)を検討した。さらに, 任意の形状の両心室ジオメトリを生成し, 同時に拡散を用いてその内部でスクロール波パターンを開始させた。拡散により生成した解の特性を評価し、対応する生物物理モデルで得られた解と比較した結果,拡散モデルはスパイラル波とスクロール波のダイナミクスを十分に再現するように学習し,心組織における興奮波のデータ駆動モデリングに利用できることがわかった。例えば、拡散により生成したスパイラル波ダイナミクスのアンサンブルは、生物物理モデルでシミュレートした対応するアンサンブルと同様の自己終端統計を示す。しかし, 拡散モデルは、例えば自己終端時のようにトレーニングデータが不足している場合にはアーティファクトを生成し、制約が不十分な場合には波のパターンを「幻覚」することもわかった。

Electrical waves in the heart form rotating spiral or scroll waves during life-threatening arrhythmias such as atrial or ventricular fibrillation. The wave dynamics are typically modeled using coupled partial differential equations, which describe reaction-diffusion dynamics in excitable media. More recently, data-driven generative modeling has emerged as an alternative to generate spatio-temporal patterns in physical and biological systems. Here, we explore denoising diffusion probabilistic models for the generative modeling of electrical wave patterns in cardiac tissue. We trained diffusion models with simulated electrical wave patterns to be able to generate such wave patterns in unconditional and conditional generation tasks. For instance, we explored the diffusion-based i) parameter-specific generation, ii) evolution and iii) inpainting of spiral wave dynamics, including reconstructing three-dimensional scroll wave dynamics from superficial two-dimensional measurements. Further, we generated arbitrarily shaped bi-ventricular geometries and simultaneously initiated scroll wave patterns inside these geometries using diffusion. We characterized and compared the diffusion-generated solutions to solutions obtained with corresponding biophysical models and found that diffusion models learn to replicate spiral and scroll wave dynamics so well that they could be used for data-driven modeling of excitation waves in cardiac tissue. For instance, an ensemble of diffusion-generated spiral wave dynamics exhibits similar self-termination statistics as the corresponding ensemble simulated with a biophysical model. However, we also found that diffusion models produce artifacts if training data is lacking, e.g. during self-termination, and "hallucinate" wave patterns when insufficiently constrained.

翻訳日:2024-06-06 13:37:33 公開日:2024-06-03

# 高解像度二分画像セグメンテーション(Dichotomous Image Segmentation)のための両側参照

Bilateral Reference for High-Resolution Dichotomous Image Segmentation ( http://arxiv.org/abs/2401.03407v4 )

ライセンス: Link先を確認

Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe,

(参考訳) 高解像度の二分画像セグメンテーション(DIS)のための新しい両側参照フレームワーク(BiRefNet)を導入する。BiRefNetは、2つの必須コンポーネント、すなわち位置特定モジュール(LM)と、提案する両側参照(BiRef)を備えた再構成モジュール(RM)で構成される。LMは、グローバルな意味情報を用いてオブジェクトの位置特定を支援する。RMでは、再構成プロセスにBiRefを利用し、画像の階層的パッチがソース参照を提供し、勾配マップがターゲット参照として機能する。これらのコンポーネントが連携して、最終的な予測マップを生成する。また,より細かいディテールを持つ領域への注目を高めるために、補助的な勾配監督を導入する。さらに、マップの品質とトレーニングプロセスを改善するために、DISに適した実践的なトレーニング戦略を概説する。提案手法の汎用性を検証するため,4つのタスクについて広範な実験を行い、BiRefNetが全てのベンチマークでタスク固有の最先端手法を上回る優れた性能を示すことを確認した。コードはhttps://github.com/ZhengPeng7/BiRefNetで公開されている。

We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef). The LM aids in object localization using global semantic information. Within the RM, we utilize BiRef for the reconstruction process, where hierarchical patches of images provide the source reference and gradient maps serve as the target reference. These components collaborate to generate the final predicted maps. We also introduce auxiliary gradient supervision to enhance focus on regions with finer details. Furthermore, we outline practical training strategies tailored for DIS to improve map quality and training process. To validate the general applicability of our approach, we conduct extensive experiments on four tasks to evince that BiRefNet exhibits remarkable performance, outperforming task-specific cutting-edge methods across all benchmarks. Our codes are available at https://github.com/ZhengPeng7/BiRefNet.

翻訳日:2024-06-06 13:27:48 公開日:2024-06-03

# REBUS: シンボル理解のためのロバストな評価ベンチマーク

REBUS: A Robust Evaluation Benchmark of Understanding Symbols ( http://arxiv.org/abs/2401.05604v2 )

ライセンス: Link先を確認

Andrew Gritsevskiy, Arjun Panickssery, Aaron Kirtland, Derik Kauffman, Hans Gundlach, Irina Gritsevskaya, Joe Cavanagh, Jonathan Chiang, Lydia La Roux, Michelle Hung,

(参考訳) 本稿では,判じ絵(rebus)パズルにおけるマルチモーダル大規模言語モデルの性能を評価する新しいベンチマークを提案する。データセットは、画像ベースの言葉遊びのオリジナル例333件を含み、映画、作曲家、主要都市、食品など13カテゴリの答えを手がかりで示している。手がかりとなる単語やフレーズを特定するこのベンチマークで優れた性能を達成するには、画像認識と文字列操作を、仮説検定、多段階推論、人間の認知の理解と組み合わせる必要があり、複雑でマルチモーダルな能力評価となっている。GPT-4oは他のすべてのモデルを大幅に上回り、それに続くプロプライエタリモデルも、評価した残りのモデルを上回ることがわかった。しかし、最高のモデルでさえ最終的な精度は42%に過ぎず、難しいパズルでは7%にまで低下しており、推論の大幅な改善の必要性が浮き彫りになっている。さらに、モデルがパズルのすべての部分を理解することはまれで、正解を事後的に説明することはほとんどできない。したがって、このベンチマークは、マルチモーダル大規模言語モデルの知識と推論における主要な欠点を特定するために利用できる。

We propose a new benchmark evaluating the performance of multimodal large language models on rebus puzzles. The dataset covers 333 original examples of image-based wordplay, cluing 13 categories such as movies, composers, major cities, and food. To achieve good performance on the benchmark of identifying the clued word or phrase, models must combine image recognition and string manipulation with hypothesis testing, multi-step reasoning, and an understanding of human cognition, making for a complex, multimodal evaluation of capabilities. We find that GPT-4o significantly outperforms all other models, with the remaining proprietary models in turn outperforming all other evaluated models. However, even the best model has a final accuracy of only 42%, which goes down to just 7% on hard puzzles, highlighting the need for substantial improvements in reasoning. Further, models rarely understand all parts of a puzzle, and are almost always incapable of retroactively explaining the correct answer. Our benchmark can therefore be used to identify major shortcomings in the knowledge and reasoning of multimodal large language models.

翻訳日:2024-06-06 13:27:48 公開日:2024-06-03

# ニューロ・シンボリック推論と学習のための凸とバイレベル最適化

Convex and Bilevel Optimization for Neuro-Symbolic Inference and Learning ( http://arxiv.org/abs/2401.09651v2 )

ライセンス: Link先を確認

Charles Dickens, Changyu Gao, Connor Pryor, Stephen Wright, Lise Getoor,

(参考訳) 我々は凸最適化と二段階(バイレベル)最適化の手法を活用し、ニューロシンボリック(NeSy)システムのための一般的な勾配ベースのパラメータ学習フレームワークを開発する。最先端のNeSyアーキテクチャであるNeuPSLを用いて、このフレームワークを実証する。そのために、NeuPSL推論の滑らかな主問題と双対問題の定式化を提案し、学習勾配が最適な双対変数の関数であることを示す。さらに,ウォームスタートを自然に活用できる、新しい定式化のための双対ブロック座標降下アルゴリズムを開発した。これにより、現在最良のNeuPSL推論手法に比べて、学習の実行時間が100倍以上改善される。最後に、さまざまなタスクをカバーする8つのデータセットで広範な実験的評価を行い、我々の学習フレームワークが、代替の学習手法に比べて予測性能を最大16パーセントポイント改善することを示す。

We leverage convex and bilevel optimization techniques to develop a general gradient-based parameter learning framework for neural-symbolic (NeSy) systems. We demonstrate our framework with NeuPSL, a state-of-the-art NeSy architecture. To achieve this, we propose a smooth primal and dual formulation of NeuPSL inference and show learning gradients are functions of the optimal dual variables. Additionally, we develop a dual block coordinate descent algorithm for the new formulation that naturally exploits warm-starts. This leads to over 100x learning runtime improvements over the current best NeuPSL inference method. Finally, we provide extensive empirical evaluations across 8 datasets covering a range of tasks and demonstrate our learning framework achieves an improvement in prediction performance of up to 16 percentage points over alternative learning methods.

翻訳日:2024-06-06 13:27:48 公開日:2024-06-03

# 口の不整合からリップシンクのディープフェイクを暴く

Exposing Lip-syncing Deepfakes from Mouth Inconsistencies ( http://arxiv.org/abs/2401.10113v2 )

ライセンス: Link先を確認

Soumyya Kanti Datta, Shan Jia, Siwei Lyu,

(参考訳) リップシンクのディープフェイクは、改変された音声やまったく新しい音声に合うように、AIモデルを使って人物の唇の動きを本物らしく生成した、デジタル操作されたビデオである。リップシンクのディープフェイクは、アーティファクトが唇領域に限定されていて識別がより困難であるため、危険なタイプのディープフェイクである。本稿では,口領域の時間的不整合を識別することでリップシンクのディープフェイクを検出する新しい手法、LIPINC(LIP-syncing detection based on mouth INConsistency)について述べる。これらの不整合は、隣接するフレーム間とビデオ全体の両方に見られる。我々のモデルはこれらの不規則性をうまく捉え、複数のベンチマークディープフェイクデータセットで最先端の手法を上回る。コードはhttps://github.com/skrantidatta/LIPINCで公開されている。

A lip-syncing deepfake is a digitally manipulated video in which a person's lip movements are created convincingly using AI models to match altered or entirely new audio. Lip-syncing deepfakes are a dangerous type of deepfakes as the artifacts are limited to the lip region and more difficult to discern. In this paper, we describe a novel approach, LIP-syncing detection based on mouth INConsistency (LIPINC), for lip-syncing deepfake detection by identifying temporal inconsistencies in the mouth region. These inconsistencies are seen in the adjacent frames and throughout the video. Our model can successfully capture these irregularities and outperforms the state-of-the-art methods on several benchmark deepfake datasets. Code is available at https://github.com/skrantidatta/LIPINC

翻訳日:2024-06-06 13:27:48 公開日:2024-06-03

# 任意スケールの病理画像超解像に向けて: 暗黙的自己テクスチャ強化による効率的なデュアルブランチフレームワーク

Towards Arbitrary-Scale Histopathology Image Super-resolution: An Efficient Dual-branch Framework via Implicit Self-texture Enhancement ( http://arxiv.org/abs/2401.15613v2 )

ライセンス: Link先を確認

Minghong Duan, Linhao Qu, Zhiwei Yang, Manning Wang, Chenxi Zhang, Zhijian Song,

(参考訳) 高品質なホールスライドスキャナーは高価で複雑、かつ時間を要するため、日常の臨床業務における高解像度の病理ホールスライド画像の取得と利用が制限されている。低解像度画像から高解像度画像を合成する、深層学習に基づく単一画像超解像技術は、この問題を解決する有効な方法である。しかし、病理画像に適用されている既存の超解像モデルは、固定の整数倍率でしか機能せず、適用性が著しく低下する。暗黙的ニューラル表現に基づく手法は、自然画像の任意スケール超解像で有望な結果を示しているが、病理画像は自然画像とは異なる独特の微細なテクスチャを持つため、それらを病理画像に直接適用するだけでは不十分である。そこで本研究では,この課題に対処するために,病理画像の任意スケール超解像のための、暗黙的自己テクスチャ強化に基づくデュアルブランチフレームワーク(ISTE)を提案する。ISTEはピクセル学習ブランチとテクスチャ学習ブランチを含み、それぞれがまずピクセル特徴とテクスチャ特徴を学習する。次に、2つのブランチの特徴を融合して超解像結果を得るための2段階のテクスチャ強化戦略を設計する。第1段階は特徴ベースのテクスチャ強化、第2段階は空間領域ベースのテクスチャ強化である。3つの公開データセットでの広範な実験により、ISTEは複数の倍率で既存の固定スケールおよび任意スケールのアルゴリズムを上回り、下流タスクの性能向上にも役立つことが示された。我々の知る限り、病理画像における任意スケールの超解像を実現した最初の研究である。コードは公開予定である。

High-quality whole-slide scanners are expensive, complex, and time-consuming, thus limiting the acquisition and utilization of high-resolution pathology whole-slide images in daily clinical work. Deep learning-based single-image super-resolution techniques are an effective way to solve this problem by synthesizing high-resolution images from low-resolution ones. However, the existing super-resolution models applied in pathology images can only work in fixed integer magnifications, significantly decreasing their applicability. Though methods based on implicit neural representation have shown promising results in arbitrary-scale super-resolution of natural images, applying them directly to pathology images is inadequate because they have unique fine-grained image textures different from natural images. Thus, we propose an Implicit Self-Texture Enhancement-based dual-branch framework (ISTE) for arbitrary-scale super-resolution of pathology images to address this challenge. ISTE contains a pixel learning branch and a texture learning branch, which first learn pixel features and texture features, respectively. Then, we design a two-stage texture enhancement strategy to fuse the features from the two branches to obtain the super-resolution results, where the first stage is feature-based texture enhancement, and the second stage is spatial-domain-based texture enhancement. Extensive experiments on three public datasets show that ISTE outperforms existing fixed-scale and arbitrary-scale algorithms at multiple magnifications and helps to improve downstream task performance. To the best of our knowledge, this is the first work to achieve arbitrary-scale super-resolution in pathology images. Codes will be available.

翻訳日:2024-06-06 13:17:49 公開日:2024-06-03

# Repeat After Me: トランスフォーマーはコピーにおいて状態空間モデルより優れている

Repeat After Me: Transformers are Better than State Space Models at Copying ( http://arxiv.org/abs/2402.01032v2 )

ライセンス: Link先を確認

Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach,

(参考訳) トランスフォーマーはシーケンスモデリングにおいて支配的なアーキテクチャであるが、シーケンス長に依存しない固定サイズの潜在状態を用いるモデルへの関心が高まっており、本稿ではこれを「一般化状態空間モデル」(GSSM)と呼ぶ。本稿では,GSSMは推論時の効率の面で有望である一方、入力コンテキストからのコピーを必要とするタスクでは、トランスフォーマーモデルに比べて限界があることを示す。まず、文字列コピーという単純なタスクの理論的解析を行い、2層のトランスフォーマーが指数的な長さの文字列をコピーできるのに対し、GSSMは固定サイズの潜在状態によって根本的に制限されることを証明する。実験的には、コンテキストのコピーを必要とする合成タスクにおいて、効率と汎化の点でトランスフォーマーがGSSMを上回ることがわかった。最後に、事前学習済みの大規模言語モデルを評価し、コンテキストからの情報のコピーと検索において、トランスフォーマーモデルが状態空間モデルを劇的に上回ることを見出した。これらの結果を総合すると、実用上重要なタスクにおいて、トランスフォーマーとGSSMの間に根本的なギャップがあることが示唆される。

Transformers are the dominant architecture for sequence modeling, but there is growing interest in models that use a fixed-size latent state that does not depend on the sequence length, which we refer to as "generalized state space models" (GSSMs). In this paper we show that while GSSMs are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context. We start with a theoretical analysis of the simple task of string copying and prove that a two layer transformer can copy strings of exponential length while GSSMs are fundamentally limited by their fixed-size latent state. Empirically, we find that transformers outperform GSSMs in terms of efficiency and generalization on synthetic tasks that require copying the context. Finally, we evaluate pretrained large language models and find that transformer models dramatically outperform state space models at copying and retrieving information from context. Taken together, these results suggest a fundamental gap between transformers and GSSMs on tasks of practical interest.
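The fixed-size-state limitation mentioned in this abstract can be illustrated with a plain counting (pigeonhole) argument. The sketch below is not the paper's proof; the function names and the idea of measuring the latent state in bits are assumptions made for illustration. It only shows that a state with `state_bits` bits can take at most `2**state_bits` values, so it cannot tell apart more input strings than that.

```python
def min_state_bits(length: int, alphabet_size: int) -> int:
    """Bits needed to give every string of `length` symbols a distinct state.

    There are alphabet_size**length such strings, and a state with b bits
    has only 2**b values, so we need the smallest b with 2**b >= count.
    """
    count = alphabet_size ** length
    return (count - 1).bit_length()


def max_copyable_length(state_bits: int, alphabet_size: int) -> int:
    """Longest string length whose inputs all fit into distinct states."""
    capacity = 2 ** state_bits
    length = 0
    # Grow the length while every string of the next length still fits.
    while alphabet_size ** (length + 1) <= capacity:
        length += 1
    return length


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the scaling.
    print(min_state_bits(length=10, alphabet_size=26))
    print(max_copyable_length(state_bits=1024, alphabet_size=2))
```

Under this counting view, the state size has to grow linearly with the length of the string to be copied, regardless of how the state is updated.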

Translated: 2024-06-06 13:17:49; published: 2024-06-03

# Compositional Generative Modeling: A Single Model is Not All You Need

http://arxiv.org/abs/2402.01103v3

License: see the linked page

Yilun Du, Leslie Kaelbling

Large monolithic generative models trained on massive amounts of data have become an increasingly dominant approach in AI research. In this paper, we argue that we should instead construct large generative systems by composing smaller generative models together. We show how such a compositional generative approach enables us to learn distributions in a more data-efficient manner, enabling generalization to parts of the data distribution unseen at training time. We further show how this enables us to program and construct new generative models for tasks completely unseen at training. Finally, we show that in many cases, we can discover separate compositional components from data.

Translated: 2024-06-06 13:17:49; published: 2024-06-03

# Challenges in Training PINNs: A Loss Landscape Perspective

http://arxiv.org/abs/2402.01868v2

License: see the linked page

Pratik Rathore, Weimu Lei, Zachary Frangella, Lu Lu, Madeleine Udell

This paper explores challenges in training Physics-Informed Neural Networks (PINNs), emphasizing the role of the loss landscape in the training process. We examine difficulties in minimizing the PINN loss function, particularly due to ill-conditioning caused by differential operators in the residual term. We compare the gradient-based optimizers Adam, L-BFGS, and their combination Adam+L-BFGS, showing the superiority of Adam+L-BFGS, and introduce a novel second-order optimizer, NysNewton-CG (NNCG), which significantly improves PINN performance. Theoretically, our work elucidates the connection between ill-conditioned differential operators and ill-conditioning in the PINN loss and shows the benefits of combining first- and second-order optimization methods. Our work presents valuable insights and more powerful optimization strategies for training PINNs, which could improve the utility of PINNs for solving difficult partial differential equations.

Translated: 2024-06-06 13:17:49; published: 2024-06-03

# Learn To be Efficient: Build Structured Sparsity in Large Language Models

http://arxiv.org/abs/2402.06126v3

License: see the linked page

Haizhong Zheng, Xiaoyan Bai, Xueshen Liu, Z. Morley Mao, Beidi Chen, Fan Lai, Atul Prakash

Large Language Models (LLMs) have achieved remarkable success with their billion-level parameters, yet they incur high inference overheads. The emergence of activation sparsity in LLMs provides a natural approach to reduce this cost by involving only parts of the parameters for inference. However, existing methods only focus on utilizing this naturally formed activation sparsity in a post-training setting, overlooking the potential for further amplifying this inherent sparsity. In this paper, we hypothesize that LLMs can learn to be efficient by achieving more structured activation sparsity. To achieve this, we introduce a novel training algorithm, Learn-To-be-Efficient (LTE), designed to train efficiency-aware LLMs to learn to activate fewer neurons and achieve a better trade-off between sparsity and performance. Furthermore, unlike SOTA MoEfication methods, which mainly focus on ReLU-based models, LTE can also be applied to LLMs like LLaMA using non-ReLU activations. Extensive evaluation on language understanding, language generation, and instruction tuning tasks shows that LTE consistently outperforms SOTA baselines. Along with our hardware-aware custom kernel implementation, LTE reduces LLaMA2-7B inference latency by 25% at 50% sparsity.

Translated: 2024-06-06 13:08:02; published: 2024-06-03

# Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

http://arxiv.org/abs/2402.06570v2

License: see the linked page

Zheng Xiong, Risto Vuorio, Jacob Beck, Matthieu Zimmer, Kun Shao, Shimon Whiteson

Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good performance like TF and high efficiency like MLP at inference time, we propose HyperDistill, which consists of: (1) a morphology-conditioned hypernetwork (HN) that generates robot-wise MLP policies, and (2) a policy distillation approach that is essential for successful training. We show that on UNIMAL, a benchmark with hundreds of diverse morphologies, HyperDistill performs as well as a universal TF teacher policy on both training and unseen test robots, but reduces model size by 6-14 times, and computational cost by 67-160 times in different environments. Our analysis attributes the efficiency advantage of HyperDistill at inference time to knowledge decoupling, i.e., the ability to decouple inter-task and intra-task knowledge, a general principle that could also be applied to improve inference efficiency in other domains.

Translated: 2024-06-06 13:08:02; published: 2024-06-03

# Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing

http://arxiv.org/abs/2402.08400v2

License: see the linked page

Alaa Anani, Tobias Lorenz, Bernt Schiele, Mario Fritz

Certification for machine learning means proving that no adversarial sample can evade a model within a range under certain conditions, a necessity for safety-critical domains. Common certification methods for segmentation use a flat set of fine-grained classes, leading to high abstain rates due to model uncertainty across many classes. We propose a novel, more practical setting, which certifies pixels within a multi-level hierarchy, and adaptively relaxes the certification to a coarser level for unstable components that classic methods would abstain from, effectively lowering the abstain rate whilst providing more certified semantically meaningful information. We mathematically formulate the problem setup, introduce an adaptive hierarchical certification algorithm and prove the correctness of its guarantees. Since certified accuracy does not take the loss of information into account for coarser classes, we introduce the Certified Information Gain ($\mathrm{CIG}$) metric, which is proportional to the class granularity level. Our extensive experiments on the datasets Cityscapes, PASCAL-Context, ACDC and COCO-Stuff demonstrate that our adaptive algorithm achieves a higher $\mathrm{CIG}$ and lower abstain rate compared to the current state-of-the-art certification method. Our code can be found here: https://github.com/AlaaAnani/adaptive-certify.

Translated: 2024-06-06 12:58:06; published: 2024-06-03

# Model Assessment and Selection under Temporal Distribution Shift

http://arxiv.org/abs/2402.08672v2

License: see the linked page

Elise Han, Chengpiao Huang, Kaizheng Wang

We investigate model assessment and selection in a changing environment, by synthesizing datasets from both the current time period and historical epochs. To tackle unknown and potentially arbitrary temporal distribution shift, we develop an adaptive rolling window approach to estimate the generalization error of a given model. This strategy also facilitates the comparison between any two candidate models by estimating the difference of their generalization errors. We further integrate pairwise comparisons into a single-elimination tournament, achieving near-optimal model selection from a collection of candidates. Theoretical analyses and numerical experiments demonstrate the adaptivity of our proposed methods to the non-stationarity in data.
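The single-elimination tournament over pairwise comparisons can be sketched as follows. This is a minimal illustration, not the authors' procedure. The abstract describes an adaptive rolling window, whereas the comparator here uses a fixed `window` purely as a simplifying assumption, and `single_elimination`, `make_comparator`, and the loss values are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def single_elimination(candidates: Sequence[str],
                       better: Callable[[str, str], str]) -> str:
    """Run a knockout bracket; `better(a, b)` returns the winner of one match."""
    if not candidates:
        raise ValueError("need at least one candidate")
    current = list(candidates)
    while len(current) > 1:
        next_round: List[str] = []
        # Pair up neighbours; an unpaired last candidate advances with a bye.
        for i in range(0, len(current) - 1, 2):
            next_round.append(better(current[i], current[i + 1]))
        if len(current) % 2 == 1:
            next_round.append(current[-1])
        current = next_round
    return current[0]


def make_comparator(loss_history: Dict[str, List[float]],
                    window: int) -> Callable[[str, str], str]:
    """Compare two models by mean loss over the last `window` periods."""
    def recent_mean(name: str) -> float:
        tail = loss_history[name][-window:]
        return sum(tail) / len(tail)

    def better(a: str, b: str) -> str:
        # A non-positive estimated error gap means `a` is at least as good.
        return a if recent_mean(a) - recent_mean(b) <= 0 else b

    return better


# Made-up per-period losses, oldest first; model_a degrades after a shift.
loss_history = {
    "model_a": [0.2, 0.2, 0.9, 0.9],
    "model_b": [0.8, 0.8, 0.3, 0.3],
    "model_c": [0.5, 0.5, 0.5, 0.5],
}
winner = single_elimination(list(loss_history),
                            make_comparator(loss_history, window=2))
```

With a short window the comparator favours the model that performs best after the shift. Choosing that window from the data is the part the paper's adaptive method addresses and this sketch does not.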

Translated: 2024-06-06 12:58:06; published: 2024-06-03

# Transductive Sample Complexities Are Compact

http://arxiv.org/abs/2402.10360v2

License: see the linked page

Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng

We demonstrate a compactness result holding broadly across supervised learning with a general class of loss functions: Any hypothesis class $H$ is learnable with transductive sample complexity $m$ precisely when all of its finite projections are learnable with sample complexity $m$. We prove that this exact form of compactness holds for realizable and agnostic learning with respect to any proper metric loss function (e.g., any norm on $\mathbb{R}^d$) and any continuous loss on a compact space (e.g., cross-entropy, squared loss). For realizable learning with improper metric losses, we show that exact compactness of sample complexity can fail, and provide matching upper and lower bounds of a factor of 2 on the extent to which such sample complexities can differ. We conjecture that larger gaps are possible for the agnostic case. Furthermore, invoking the equivalence between sample complexities in the PAC and transductive models (up to lower order factors, in the realizable case) permits us to directly port our results to the PAC model, revealing an almost-exact form of compactness holding broadly in PAC learning.

Translated: 2024-06-06 12:58:06; published: 2024-06-03

# PAT-Questions: A Self-Updating Benchmark for Present-Anchored Temporal Question-Answering

http://arxiv.org/abs/2402.11034v2

License: see the linked page

Jannat Ara Meem, Muhammad Shihab Rashid, Yue Dong, Vagelis Hristidis

Existing work on Temporal Question Answering (TQA) has predominantly focused on questions anchored to specific timestamps or events (e.g. "Who was the US president in 1970?"). Little work has studied questions whose temporal context is relative to the present time (e.g. "Who was the previous US president?"). We refer to this problem as Present-Anchored Temporal QA (PATQA). PATQA poses unique challenges: (1) large language models (LLMs) may have outdated knowledge, (2) complex temporal relationships (e.g. 'before', 'previous') are hard to reason about, (3) multi-hop reasoning may be required, and (4) the gold answers of benchmarks must be continuously updated. To address these challenges, we introduce the PAT-Questions benchmark, which includes single and multi-hop temporal questions. The answers in PAT-Questions can be automatically refreshed by re-running SPARQL queries on a knowledge graph, if available. We evaluate several state-of-the-art LLMs and a SOTA temporal reasoning model (TEMPREASON-T5) on PAT-Questions through direct prompting and retrieval-augmented generation (RAG). The results highlight the limitations of existing solutions in PATQA and motivate the need for new methods to improve PATQA reasoning capabilities.

Translated: 2024-06-06 12:58:06; published: 2024-06-03

# A 4-8 GHz Kinetic Inductance Travelling-Wave Parametric Amplifier Using Four-Wave Mixing with Near Quantum-Limit Noise Performance

http://arxiv.org/abs/2402.11751v4

License: see the linked page

Farzad Faramarzi, Ryan Stephenson, Sasha Sypkens, Byeong H. Eom, Henry LeDuc, Peter Day

Kinetic inductance traveling-wave parametric amplifiers (KI-TWPAs) have a wide instantaneous bandwidth with near quantum-limited performance and a relatively high dynamic range. Because of this, they are suitable readout devices for cryogenic detectors and superconducting qubits and have a variety of applications in quantum sensing. This work discusses the design, fabrication, and performance of a KI-TWPA based on four-wave mixing in a NbTiN microstrip transmission line. This device amplifies a signal band from 4 to 8 GHz without contamination from image tones, which are produced in a separate higher frequency band. The 4-8 GHz band is commonly used to read out cryogenic detectors, such as microwave kinetic inductance detectors (MKIDs) and Josephson junction-based qubits. We report a measured maximum gain of over 20 dB using four-wave mixing with a 1-dB gain compression point of -58 dBm at 15 dB of gain over that band. The bandwidth and peak gain are tunable by adjusting the pump-tone frequency and power. Using a Y-factor method, we measure an amplifier-added noise of $0.5 \leq N_{\mathrm{added}} \leq 1.5$ photons from 4.5 to 8 GHz.
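For readers unfamiliar with the Y-factor method named in this abstract, the textbook form of the calculation can be sketched as below. This is an assumption-laden illustration, not the authors' analysis. It uses the classical relation $T_n = (T_{hot} - Y\,T_{cold})/(Y - 1)$ with $Y = P_{hot}/P_{cold}$, and converts temperature to photon number with the Rayleigh-Jeans approximation $N = k_B T / (h f)$, which is only approximate at cryogenic temperatures. All input numbers are made up.

```python
# Physical constants (exact SI values).
PLANCK_H = 6.62607015e-34    # J*s
BOLTZMANN_K = 1.380649e-23   # J/K


def y_factor_noise_temperature(p_hot: float, p_cold: float,
                               t_hot: float, t_cold: float) -> float:
    """Classical Y-factor estimate of added noise temperature in kelvin.

    p_hot and p_cold are output noise powers (any common linear unit)
    measured with the input load at temperatures t_hot and t_cold.
    """
    if p_cold <= 0 or p_hot <= p_cold:
        raise ValueError("expected p_hot > p_cold > 0")
    y = p_hot / p_cold
    return (t_hot - y * t_cold) / (y - 1.0)


def temperature_to_photons(temperature_k: float, frequency_hz: float) -> float:
    """Rayleigh-Jeans conversion of a noise temperature to photon number."""
    return BOLTZMANN_K * temperature_k / (PLANCK_H * frequency_hz)


if __name__ == "__main__":
    # Illustrative loads only; powers are proportional to (T_load + T_noise).
    t_noise = y_factor_noise_temperature(p_hot=350.0, p_cold=127.0,
                                         t_hot=300.0, t_cold=77.0)
    print(t_noise)
    print(temperature_to_photons(0.288, 6e9))
```

At 6 GHz one photon corresponds to roughly 0.29 K in this approximation, which gives a sense of scale for a noise figure quoted in photons.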

Translated: 2024-06-06 12:48:21; published: 2024-06-03

# Iterated INLA for State and Parameter Estimation in Nonlinear Dynamical Systems

http://arxiv.org/abs/2402.17036v2

License: see the linked page

Rafael Anderka, Marc Peter Deisenroth, So Takao

Data assimilation (DA) methods use priors arising from differential equations to robustly interpolate and extrapolate data. Popular techniques such as ensemble methods that handle high-dimensional, nonlinear PDE priors focus mostly on state estimation, but can have difficulty learning the parameters accurately. On the other hand, machine-learning-based approaches can naturally learn the state and parameters, but their applicability can be limited, or they can produce uncertainties that are hard to interpret. Inspired by the Integrated Nested Laplace Approximation (INLA) method in spatial statistics, we propose an alternative approach to DA based on iteratively linearising the dynamical model. This produces a Gaussian Markov random field at each iteration, enabling one to use INLA to infer the state and parameters. Our approach can be used for arbitrary nonlinear systems, while retaining interpretability, and is furthermore demonstrated to outperform existing methods on the DA task. By providing a more nuanced approach to handling nonlinear PDE priors, our methodology offers improved accuracy and robustness in predictions, especially where data sparsity is prevalent.

Translated: 2024-06-06 12:38:37; published: 2024-06-03

# Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in LLMs

http://arxiv.org/abs/2402.17649v2

License: see the linked page

Tanise Ceron, Neele Falk, Ana Barić, Dmitry Nikolaev, Sebastian Padó

Due to the widespread use of large language models (LLMs) in ubiquitous systems, we need to understand whether they embed a specific worldview and what these views reflect. Recent studies report that, prompted with political questionnaires, LLMs show left-liberal leanings (Feng et al., 2023; Motoki et al., 2024). However, it is as yet unclear whether these leanings are reliable (robust to prompt variations) and whether the leaning is consistent across policies and political leaning. We propose a series of tests which assess the reliability and consistency of LLMs' stances on political statements based on a dataset of voting-advice questionnaires collected from seven EU countries and annotated for policy domains. We study LLMs ranging in size from 7B to 70B parameters and find that their reliability increases with parameter count. Larger models show overall stronger alignment with left-leaning parties but differ among policy programs: They evince a (left-wing) positive stance towards environment protection, social welfare state and liberal society but also (right-wing) law and order, with no consistent preferences in foreign policy and migration.

Translated: 2024-06-06 12:38:37; published: 2024-06-03

# Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and Time-Series Analysis

http://arxiv.org/abs/2403.18063v2

License: see the linked page

Badri N. Patro, Suhas Ranganath, Vinay P. Namboodiri, Vijay S. Agneeswaran

Transformers have revolutionized image modeling tasks with adaptations like DeIT, Swin, SVT, Biformer, STVit, and FDVIT. However, these models often face challenges with inductive bias and high quadratic complexity, making them less efficient for high-resolution images. State space models (SSMs) such as Mamba, V-Mamba, ViM, and SiMBA offer an alternative for handling high-resolution images in computer vision tasks. These SSMs encounter two major issues. First, they become unstable when scaled to large network sizes. Second, although they efficiently capture global information in images, they inherently struggle with handling local information. To address these challenges, we introduce Heracles, a novel SSM that integrates a local SSM, a global SSM, and an attention-based token interaction module. Heracles leverages a Hartley kernel-based state space model for global image information, a localized convolutional network for local details, and attention mechanisms in deeper layers for token interactions. Our extensive experiments demonstrate that Heracles-C-small achieves state-of-the-art performance on the ImageNet dataset with 84.5% top-1 accuracy. Heracles-C-Large and Heracles-C-Huge further improve accuracy to 85.9% and 86.4%, respectively. Additionally, Heracles excels in transfer learning tasks on datasets such as CIFAR-10, CIFAR-100, Oxford Flowers, and Stanford Cars, and in instance segmentation on the MSCOCO dataset. Heracles also proves its versatility by achieving state-of-the-art results on seven time-series datasets, showcasing its ability to generalize across domains with spectral data, capturing both local and global information. The project page is available at https://github.com/badripatro/heracles

Translated: 2024-06-06 12:19:03; published: 2024-06-03

# A Lean Simulation Framework for Stress Testing IoT Cloud Systems

http://arxiv.org/abs/2404.11542v3

License: see the linked page

Jia Li, Behrad Moeini, Shiva Nejati, Mehrdad Sabetzadeh, Michael McCallen

The Internet of Things connects a plethora of smart devices globally across various applications like smart cities, autonomous vehicles and health monitoring. Simulation plays a key role in the testing of IoT systems, noting that field testing of a complete IoT product may be infeasible or prohibitively expensive. This paper addresses a specific yet important need in simulation-based testing for IoT: Stress testing of cloud systems. Existing stress testing solutions for IoT demand significant computational resources, making them ill-suited and costly. We propose a lean simulation framework designed for IoT cloud stress testing which enables efficient simulation of a large array of IoT and edge devices that communicate with the cloud. To facilitate simulation construction for practitioners, we develop a domain-specific language (DSL), named IoTECS, for generating simulators from model-based specifications. We provide the syntax and semantics of IoTECS and implement IoTECS using Xtext and Xtend. We assess simulators generated from IoTECS specifications for stress testing two real-world systems: a cloud-based IoT monitoring system and an IoT-connected vehicle system. Our empirical results indicate that simulators created using IoTECS: (1) achieve best performance when configured with Docker containerization; (2) effectively assess the service capacity of our case-study systems; and (3) outperform industrial stress-testing baseline tools, JMeter and Locust, by a factor of 3.5 in terms of the number of IoT and edge devices they can simulate using identical hardware resources. To gain initial insights about the usefulness of IoTECS in practice, we interviewed two engineers from our industry partner who have firsthand experience with IoTECS. Feedback from these interviews suggests that IoTECS is effective in stress testing IoT cloud systems, saving significant time and effort.

Translated: 2024-06-06 11:37:14; published: 2024-06-03

# Jaynes-Cummings atoms coupled to a structured environment: Leakage elimination operators and the Petz recovery maps

http://arxiv.org/abs/2404.13762v2

License: see the linked page

Da-Wei Luo, Ting Yu

We consider the Jaynes-Cummings (JC) model embedded in a structured environment, where the atom inside an optical cavity will be affected by a hierarchical environment consisting of the cavity and its environment. We propose several effective strategies to control and suppress the decoherence effects to protect the quantum coherence of the JC atom. We study the non-perturbative control of the system dynamics by means of the leakage elimination operators. We also investigate a full quantum state reversal scheme by engineering the system and its coupling to the bath via the Petz recovery map. Our findings show that, with the Petz recovery map, the dynamics of the JC atom can be fully recovered regardless of whether the noise is Markovian or non-Markovian. Finally, we show that our quantum control and recovery methods are effective at protecting different aspects of the system coherence.

Translated: 2024-06-06 11:37:14; published: 2024-06-03

# Player-Driven Emergence in LLM-Driven Game Narrative

http://arxiv.org/abs/2404.17027v3

License: see the linked page

Xiangyu Peng, Jessica Quaye, Sudha Rao, Weijia Xu, Portia Botchway, Chris Brockett, Nebojsa Jojic, Gabriel DesGarennes, Ken Lobb, Michael Xu, Jorge Leandro, Claire Jin, Bill Dolan

We explore how interaction with large language models (LLMs) can give rise to emergent behaviors, empowering players to participate in the evolution of game narratives. Our testbed is a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise, but can freely interact with non-player characters generated by GPT-4, a large language model. We recruit 28 gamers to play the game and use GPT-4 to automatically convert the game logs into a node-graph representing the narrative in the player's gameplay. We find that through their interactions with the non-deterministic behavior of the LLM, players are able to discover interesting new emergent nodes that were not a part of the original narrative but have potential for being fun and engaging. Players who created the most emergent nodes tended to be those who often enjoy games that facilitate discovery, exploration and experimentation.

Translated: 2024-06-06 11:37:14; published: 2024-06-03

# Calo-VQ: Vector-Quantized Two-Stage Generative Model in Calorimeter Simulation

http://arxiv.org/abs/2405.06605v2

License: see the linked page

Qibin Liu, Chase Shimmin, Xiulong Liu, Eli Shlizerman, Shu Li, Shih-Chieh Hsu

We introduce a novel machine learning method developed for the fast simulation of calorimeter detector response, adapting the vector-quantized variational autoencoder (VQ-VAE). Our model adopts a two-stage generation strategy: initially compressing geometry-aware calorimeter data into a discrete latent space, followed by the application of a sequence model to learn and generate the latent tokens. Extensive experimentation on the Calo-challenge dataset underscores the efficiency of our approach, showcasing a remarkable improvement in generation speed over conventional methods by a factor of 2000. Remarkably, our model achieves the generation of calorimeter showers within milliseconds. Furthermore, comprehensive quantitative evaluations across various metrics are performed to validate the physics performance of the generation.
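The "discrete latent space" in a VQ-VAE comes from a nearest-codebook lookup, which can be sketched in a few lines. This is only an illustration of that generic quantization step, not the Calo-VQ implementation. The codebook values and function names are hypothetical, and the encoder, decoder, and second-stage sequence model described in the abstract are not shown.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)),
            key=lambda i: squared_distance(latent, codebook[i]))
        for latent in latents
    ]


def dequantize(tokens: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Replace each token by its codebook vector (the decoder's input)."""
    return [codebook[t] for t in tokens]


# Made-up 2-D codebook and encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
latents = [[0.1, -0.2], [3.5, 4.2], [0.9, 1.2]]
tokens = quantize(latents, codebook)
```

A second-stage sequence model would then be trained on token sequences like `tokens`, and sampled tokens would be mapped back through `dequantize` and a decoder to produce a shower.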

翻訳日:2024-06-06 09:12:28 公開日:2024-06-03

# Swin Transformer UNetによる地上画像のデコンボリューション

Ground-based image deconvolution with Swin Transformer UNet ( http://arxiv.org/abs/2405.07842v2 )

ライセンス: Link先を確認

Utsav Akhaury, Pascale Jablonka, Jean-Luc Starck, Frédéric Courbin,

(参考訳) 地上のオールスキー天体調査では今後数年で数百万の画像が収集されるため、これらの画像の空間分解能を効率的に改善できる高速デコンボリューションアルゴリズムを開発する上で重要な要件が生まれる。これらの調査からクリーンで高解像度の画像の回収に成功したことにより、正確な測光によって銀河の形成と進化の理解を深めることが目的である。 Swin Transformerアーキテクチャを用いた2段階のデコンボリューションフレームワークを提案する。我々の研究は、ディープラーニングベースのソリューションが、科学的分析の範囲を制限してバイアスをもたらすことを明らかにした。この制限に対処するため,スパーシティウェーブレットフレームワークの活性係数に依存する新しい第3ステップを提案する。 EDisCSクラスタのサブセットの分析に基づいて,本手法と古典的デコンボリューションアルゴリズムFiredecの性能比較を行った。本手法の利点は, 分解能回復, ノイズ特性の一般化, 計算効率の両立にある。このクラスターサンプルの分析により、我々の手法の効率を評価することができるだけでなく、これらの銀河内のクランプの数を、円盤の色と関連づけて定量化することが可能になった。提案するロバストな手法は、地上画像による遠方の宇宙の構造の同定を約束する。

As ground-based all-sky astronomical surveys will gather millions of images in the coming years, a critical requirement emerges for the development of fast deconvolution algorithms capable of efficiently improving the spatial resolution of these images. By successfully recovering clean and high-resolution images from these surveys, the objective is to deepen the understanding of galaxy formation and evolution through accurate photometric measurements. We introduce a two-step deconvolution framework using a Swin Transformer architecture. Our study reveals that the deep learning-based solution introduces a bias, constraining the scope of scientific analysis. To address this limitation, we propose a novel third step relying on the active coefficients in the sparsity wavelet framework. We conducted a performance comparison between our deep learning-based method and Firedec, a classical deconvolution algorithm, based on an analysis of a subset of the EDisCS cluster samples. We demonstrate the advantage of our method in terms of resolution recovery, generalisation to different noise properties, and computational efficiency. The analysis of this cluster sample not only allowed us to assess the efficiency of our method, but it also enabled us to quantify the number of clumps within these galaxies in relation to their disc colour. This robust technique that we propose holds promise for identifying structures in the distant universe through ground-based images.

翻訳日:2024-06-06 09:12:28 公開日:2024-06-03

# オフラインリワード学習のための統一線形プログラミングフレームワーク

A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback ( http://arxiv.org/abs/2405.12421v2 )

ライセンス: Link先を確認

Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, Pablo A. Parrilo,

(参考訳) Inverse Reinforcement Learning (IRL) と Reinforcement Learning from Human Feedback (RLHF) は報酬学習において重要な方法論であり、人間の実演とフィードバックに基づいて、連続的な意思決定問題の報酬関数を推論・形成する。報奨学習におけるほとんどの以前の作業は、決定や選好モデルに関する事前の知識や仮定に依存しており、堅牢性の問題につながる可能性がある。そこで本研究では,オフライン報酬学習に適した新しい線形プログラミング(LP)フレームワークを提案する。本フレームワークは,オンライン探索を使わずに事前に収集した軌道を用いて,設計したLPの一次双対最適条件から設定した有望な報酬を推定し,提案可能なサンプル効率の最適性保証を提供する。我々のLPフレームワークはまた、計算的トラクタビリティとサンプル効率を維持しながら、ペアの軌道比較データなど、報酬関数を人間のフィードバックと整合させることができる。解析例と数値実験により,従来の最大推定法(MLE)と比較して,本フレームワークは性能が向上する可能性が示唆された。

Inverse Reinforcement Learning (IRL) and Reinforcement Learning from Human Feedback (RLHF) are pivotal methodologies in reward learning, which involve inferring and shaping the underlying reward function of sequential decision-making problems based on observed human demonstrations and feedback. Most prior work in reward learning has relied on prior knowledge or assumptions about decision or preference models, potentially leading to robustness issues. In response, this paper introduces a novel linear programming (LP) framework tailored for offline reward learning. Utilizing pre-collected trajectories without online exploration, this framework estimates a feasible reward set from the primal-dual optimality conditions of a suitably designed LP, and offers an optimality guarantee with provable sample efficiency. Our LP framework also enables aligning the reward functions with human feedback, such as pairwise trajectory comparison data, while maintaining computational tractability and sample efficiency. We demonstrate that our framework potentially achieves better performance compared to the conventional maximum likelihood estimation (MLE) approach through analytical examples and numerical experiments.
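For context on the MLE baseline mentioned above: a common way to fit rewards to pairwise trajectory comparisons is to assume a Bradley-Terry preference model and maximize its likelihood. The abstract does not state which preference model the baseline uses, so the model choice and every name below are assumptions; this sketch illustrates the conventional baseline, not the LP framework itself.

```python
import math
from typing import List, Sequence, Tuple


def trajectory_return(rewards: Sequence[float], discount: float = 1.0) -> float:
    """Discounted sum of per-step rewards along one trajectory."""
    return sum((discount ** t) * r for t, r in enumerate(rewards))


def preference_probability(return_a: float, return_b: float) -> float:
    """Bradley-Terry model: P(a preferred over b) = sigmoid(return_a - return_b)."""
    return 1.0 / (1.0 + math.exp(-(return_a - return_b)))


def negative_log_likelihood(comparisons: List[Tuple[float, float]]) -> float:
    """NLL of observed preferences; each pair is (return of preferred, return of other)."""
    return -sum(math.log(preference_probability(w, l)) for w, l in comparisons)


# Hypothetical returns under a candidate reward function (illustration only).
observed = [(2.0, 0.5), (1.0, 1.5), (3.0, 0.0)]
loss = negative_log_likelihood(observed)
```

An MLE approach would adjust the reward parameters to minimize this loss, which ties the result to the assumed preference model; that dependence is the robustness concern the abstract raises.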

翻訳日:2024-06-06 09:02:44 公開日:2024-06-03

# 基礎モデルの違いを理解する:注意、状態空間モデル、リカレントニューラルネットワーク

Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks ( http://arxiv.org/abs/2405.15731v2 )

ライセンス: Link先を確認

Jerome Sieber, Carmen Amo Alonso, Alexandre Didier, Melanie N. Zeilinger, Antonio Orvieto,

(参考訳) ソフトマックス・アテンション(Softmax attention)は、様々な人工知能アプリケーションの基礎モデルの基本的なバックボーンであるが、シーケンス長の2次複雑さは、長いコンテキスト設定で推論スループットを制限することができる。この課題に対処するため、線形アテンション、ステートスペースモデル(SSM)、リカレントニューラルネットワーク(RNN)といった代替アーキテクチャがより効率的な代替案として検討されている。これらのアプローチ間の関係は存在するが、そのようなモデルは一般的に独立して開発されており、これらのアーキテクチャを支える共通原則とその微妙な違いを理論的に理解していないため、パフォーマンスとスケーラビリティに大きな影響を及ぼす。本稿では,これらすべてのアーキテクチャを共通表現で探索する動的システムフレームワーク(DSF)について紹介する。我々のフレームワークは厳密な比較を促進し、各モデルクラスの特色に関する新たな洞察を提供する。例えば、線形注意と選択的SSMを比較し、両者が等価である相違点と条件を詳述する。また、ソフトマックスアテンションと他のモデルクラスとの原理的な比較を行い、ソフトマックスアテンションを近似できる理論条件について議論する。さらに、これらの新たな知見を経験的検証と数学的議論で裏付ける。このことは、DSFが将来のより効率的でスケーラブルな基盤モデルの体系的な開発を導く可能性を示している。

Softmax attention is the principal backbone of foundation models for various artificial intelligence applications, yet its quadratic complexity in sequence length can limit its inference throughput in long-context settings. To address this challenge, alternative architectures such as linear attention, State Space Models (SSMs), and Recurrent Neural Networks (RNNs) have been considered as more efficient alternatives. While connections between these approaches exist, such models are commonly developed in isolation, and there is a lack of theoretical understanding of the shared principles underpinning these architectures and of their subtle differences, which greatly influence performance and scalability. In this paper, we introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation. Our framework facilitates rigorous comparisons, providing new insights on the distinctive characteristics of each model class. For instance, we compare linear attention and selective SSMs, detailing their differences and the conditions under which both are equivalent. We also provide principled comparisons between softmax attention and other model classes, discussing the theoretical conditions under which softmax attention can be approximated. Additionally, we substantiate these new insights with empirical validations and mathematical arguments. This shows the DSF's potential to guide the systematic development of more efficient and scalable foundation models in the future.
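One reason linear attention can be set side by side with SSMs and RNNs is that, without the softmax, causal attention can be written as a recurrence over a fixed-size state. The sketch below is illustrative and is not the DSF formulation from the paper: it assumes unnormalized linear attention with an identity feature map, and all function names are hypothetical. It computes the same outputs two ways, once with the quadratic-time sum over past positions and once with a running state S_t = S_{t-1} + k_t v_t^T.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_attention_quadratic(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """y_t = sum over s <= t of (q_t . k_s) * v_s, computed directly."""
    outputs = []
    for t in range(len(q)):
        y = [0.0] * len(v[0])
        for s in range(t + 1):
            weight = dot(q[t], k[s])
            y = [yi + weight * vi for yi, vi in zip(y, v[s])]
        outputs.append(y)
    return outputs


def linear_attention_recurrent(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Same outputs via a running d_k x d_v state, updated once per step."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q_t, k_t, v_t in zip(q, k, v):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k_t[i] * v_t[j]  # S_t = S_{t-1} + k_t v_t^T
        outputs.append(
            [sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs


# Hypothetical toy inputs: sequence length 3, d_k = 2, d_v = 1.
queries = [[1.0, 0.0], [0.5, 2.0], [1.0, 1.0]]
keys = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]]
values = [[1.0], [2.0], [3.0]]

direct = linear_attention_quadratic(queries, keys, values)
recurrent = linear_attention_recurrent(queries, keys, values)
```

The recurrent form needs memory independent of sequence length, which is the efficiency argument for these alternatives; softmax attention has no exact finite-state form of this kind.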

翻訳日:2024-06-06 09:02:44 公開日:2024-06-03

# シークエンシャル意思決定におけるユーティリティと時間優先の推論

Inference of Utilities and Time Preference in Sequential Decision-Making ( http://arxiv.org/abs/2405.15975v2 )

ライセンス: Link先を確認

Haoyang Cao, Zhengqi Wu, Renyuan Xu,

(参考訳) 本稿では,過去の業務からクライアントの投資嗜好を正確に推測することで,自動投資管理者やロボアドバイザの能力を高めるための,新しい確率制御フレームワークを提案する。提案手法は,各クライアントのリスク許容度,日々の消費評価,重要な生活目標に合わせた,実用機能と時間変化率の一般的な割引スキームを組み込んだ連続時間モデルを活用する。我々は、状態拡張と動的プログラミング原理の確立と検証定理の確立を通じて、結果の時間的矛盾問題に対処する。また、顧客投資嗜好の特定可能性について十分な条件を提供する。理論的発展を補完するために,エントロピー正則化を付加した離散時間マルコフ決定プロセスフレームワーク内での最大推定に基づく学習アルゴリズムを提案する。ログのような関数が局所的に凹凸であることが証明され,提案アルゴリズムの高速収束が促進される。実効性と効率性は、メルトンの問題と、未解決のリスクを伴う投資問題を含む2つの数値的な例を通して示される。提案する枠組みは、個別の投資アドバイスを改善することで金融技術を発展させるだけでなく、個別の嗜好を理解することが不可欠である医療、経済学、人工知能など他の分野にも広く貢献する。

This paper introduces a novel stochastic control framework to enhance the capabilities of automated investment managers, or robo-advisors, by accurately inferring clients' investment preferences from past activities. Our approach leverages a continuous-time model that incorporates utility functions and a generic discounting scheme of a time-varying rate, tailored to each client's risk tolerance, valuation of daily consumption, and significant life goals. We address the resulting time inconsistency issue through state augmentation and the establishment of the dynamic programming principle and the verification theorem. Additionally, we provide sufficient conditions for the identifiability of client investment preferences. To complement our theoretical developments, we propose a learning algorithm based on maximum likelihood estimation within a discrete-time Markov Decision Process framework, augmented with entropy regularization. We prove that the log-likelihood function is locally concave, facilitating the fast convergence of our proposed algorithm. Practical effectiveness and efficiency are showcased through two numerical examples, including Merton's problem and an investment problem with unhedgeable risks. Our proposed framework not only advances financial technology by improving personalized investment advice but also contributes broadly to other fields such as healthcare, economics, and artificial intelligence, where understanding individual preferences is crucial.
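As a point of reference for the Merton example mentioned above: in the classical Merton problem with constant relative risk aversion and a constant discount rate, the optimal fraction of wealth held in the risky asset is (mu - r) / (gamma * sigma^2). The sketch below evaluates only that textbook formula. It is not the paper's model, which allows more general utilities and time-varying discounting, and the parameter names and values are assumptions.

```python
def merton_fraction(mu: float, r: float, sigma: float, gamma: float) -> float:
    """Classical Merton fraction of wealth in the risky asset (CRRA utility).

    mu:    expected return of the risky asset
    r:     risk-free rate
    sigma: volatility of the risky asset (must be positive)
    gamma: relative risk aversion (must be positive)
    """
    if sigma <= 0.0 or gamma <= 0.0:
        raise ValueError("sigma and gamma must be positive")
    return (mu - r) / (gamma * sigma ** 2)


# Hypothetical client: a more risk-averse investor holds less of the risky asset.
cautious = merton_fraction(mu=0.08, r=0.02, sigma=0.2, gamma=3.0)
bold = merton_fraction(mu=0.08, r=0.02, sigma=0.2, gamma=1.5)
```

Read in reverse, the formula hints at the inference problem: an observed allocation constrains gamma once mu, r, and sigma are known, which is the kind of identifiability question the paper studies in a more general setting.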

翻訳日:2024-06-06 09:02:44 公開日:2024-06-03

# ランダムグラフのプライベートエッジ密度推定:最適,効率,ロバスト

Private Edge Density Estimation for Random Graphs: Optimal, Efficient and Robust ( http://arxiv.org/abs/2405.16663v2 )

ライセンス: Link先を確認

Hongjie Chen, Jingqiu Ding, Yiding Hua, David Steurer,

(参考訳) 我々は、Erdős-Rényiランダムグラフのエッジ密度とそれらの一般化、不均一ランダムグラフを推定するための、最初の多項式時間、微分ノードプライベートおよびロバストアルゴリズムを与える。さらに,アルゴリズムの誤差率を対数的因子まで最適とする情報理論的下界を証明した。以前のアルゴリズムは指数的なランニングタイムまたは準最適エラーレートを発生させる。提案アルゴリズムの主な要素は,(1)頑健なエッジ密度推定のための新しいサム・オブ・スクエアスアルゴリズム,(2)ホプキンス等による2乗指数機構に基づくプライバシーからロバストネスへの削減である。

We give the first polynomial-time, differentially node-private, and robust algorithm for estimating the edge density of Erdős-Rényi random graphs and their generalization, inhomogeneous random graphs. We further prove information-theoretical lower bounds, showing that the error rate of our algorithm is optimal up to logarithmic factors. Previous algorithms incur either exponential running time or suboptimal error rates. Two key ingredients of our algorithm are (1) a new sum-of-squares algorithm for robust edge density estimation, and (2) the reduction from privacy to robustness based on sum-of-squares exponential mechanisms due to Hopkins et al. (STOC 2023).
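To make the node-privacy setting concrete, the sketch below shows a naive baseline rather than the paper's sum-of-squares algorithm: it computes the empirical edge density and adds Laplace noise calibrated to the worst-case effect of rewiring a single node. All names are hypothetical. Since one node touches at most n - 1 edges, the density can change by at most (n - 1) / C(n, 2) = 2 / n. This baseline gives no robustness to corrupted nodes and does not achieve the error rates claimed in the abstract.

```python
import math
import random
from typing import Iterable, Tuple


def edge_density(n: int, edges: Iterable[Tuple[int, int]]) -> float:
    """Fraction of the C(n, 2) possible edges that are present.

    Assumes `edges` lists each undirected edge exactly once.
    """
    return len(list(edges)) / math.comb(n, 2)


def node_sensitivity(n: int) -> float:
    """Largest change in density when one node's edges are rewired: 2 / n."""
    return (n - 1) / math.comb(n, 2)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def naive_private_density(n: int, edges: Iterable[Tuple[int, int]],
                          epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism with node-level global sensitivity (baseline only)."""
    return edge_density(n, edges) + laplace_noise(node_sensitivity(n) / epsilon, rng)


# Hypothetical 4-node path graph.
path_edges = [(0, 1), (1, 2), (2, 3)]
exact = edge_density(4, path_edges)
noisy = naive_private_density(4, path_edges, epsilon=1.0, rng=random.Random(0))
```

On a graph this small the noise scale (0.5 at epsilon = 1) is as large as the density itself, so the example only shows the mechanics; the noise shrinks as 2 / (n * epsilon) for larger graphs.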

翻訳日:2024-06-06 08:53:00 公開日:2024-06-03

# BaboonLand Dataset: 野生の霊長類の追跡と、ドローンビデオからの行動認識の自動化

BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos ( http://arxiv.org/abs/2405.17698v3 )

ライセンス: Link先を確認

Isla Duporge, Maksim Kholiavchenko, Roi Harel, Scott Wolf, Dan Rubenstein, Meg Crofoot, Tanya Berger-Wolf, Stephen Lee, Julie Barreau, Jenna Kline, Michelle Ramirez, Charles Stewart,

(参考訳) ドローンを使って自然環境で複数の個体を同時に追跡することは、グループ霊長類の振る舞いをよりよく理解するための強力なアプローチだ。以前の研究では、ビデオデータから霊長類の行動の分類を自動化できることが示されているが、これらの研究は、飼育下や地上カメラで行われている。集団行動と集団の自己組織化を理解するためには、生態的な決定が下される自然環境に関連して行動が観察できるスケールで部隊全体を見る必要がある。本研究では,バブーン検出,追跡,行動認識のための,ドローンビデオからの新たなデータセットを提案する。 Baboon検出データセットは、ドローンビデオにすべてのbaboonをバウンディングボックスで手動でアノテートすることで作成されている。その後、初期の5.3K解像度画像から様々なスケールの画像のピラミッドを作成するためにタイリング法が適用され、約30Kの画像がバブーン検出に使用された。トラッキングデータセットは、すべてのバウンディングボックスがビデオ全体で同じIDに割り当てられている検出データセットから導出される。このプロセスにより、30分に及ぶ非常に密集した追跡データが得られた。行動認識データセットは、各動物を中心としたビデオサブリージョンであるミニシーンにトラックを変換することで生成され、各ミニシーンは12種類の異なる行動タイプで手動でアノテートされ、20時間以上のデータが得られる。ベンチマーク結果によると、YOLOv8-X検出モデルの平均平均精度(mAP)は92.62%、BotSort追跡アルゴリズムのMOTAは63.81%、X3D行動認識モデルのmicro top-1精度は63.97%である。深層学習を用いて、ドローン映像から野生生物の行動を分類することで、グループ全体の集団行動に対する非侵襲的な洞察を促進する。

Using drones to track multiple individuals simultaneously in their natural environment is a powerful approach for better understanding group primate behavior. Previous studies have demonstrated that it is possible to automate the classification of primate behavior from video data, but these studies have been carried out in captivity or from ground-based cameras. To understand group behavior and the self-organization of a collective, the whole troop needs to be seen at a scale where behavior can be seen in relation to the natural environment in which ecological decisions are made. This study presents a novel dataset from drone videos for baboon detection, tracking, and behavior recognition. The baboon detection dataset was created by manually annotating all baboons in drone videos with bounding boxes. A tiling method was subsequently applied to create a pyramid of images at various scales from the original 5.3K resolution images, resulting in approximately 30K images used for baboon detection. The tracking dataset is derived from the detection dataset, where all bounding boxes are assigned the same ID throughout the video. This process resulted in half an hour of very dense tracking data. The behavior recognition dataset was generated by converting tracks into mini-scenes, a video subregion centered on each animal; each mini-scene was manually annotated with 12 distinct behavior types, resulting in over 20 hours of data. Benchmark results show mean average precision (mAP) of 92.62% for the YOLOv8-X detection model, multiple object tracking accuracy (MOTA) of 63.81% for the BotSort tracking algorithm, and micro top-1 accuracy of 63.97% for the X3D behavior recognition model. Using deep learning to classify wildlife behavior from drone footage facilitates non-invasive insight into the collective behavior of an entire group.
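For readers unfamiliar with the tracking metric, MOTA is conventionally defined in the CLEAR MOT metrics as multiple object tracking accuracy: one minus the ratio of all tracking errors (missed objects, false positives, identity switches) to the number of ground-truth objects, summed over frames. The sketch below evaluates that standard formula on made-up counts. It is not the paper's evaluation code, and the function and argument names are assumptions.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """Multiple Object Tracking Accuracy (CLEAR MOT definition).

    All counts are totals over every frame of the evaluated sequence.
    The value can be negative when errors outnumber ground-truth objects.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Hypothetical counts (illustration only, not taken from the benchmark).
score = mota(false_negatives=20, false_positives=10,
             id_switches=5, num_ground_truth=100)
```

Because identity switches enter the numerator, a detector with high mAP can still score a much lower MOTA when individuals cross paths often, as dense animal groups do.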

翻訳日:2024-06-06 08:53:00 公開日:2024-06-03

# Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation

Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation ( http://arxiv.org/abs/2405.17784v2 )

ライセンス: Link先を確認

Ignat Georgiev, Krishnan Srinivasan, Jie Xu, Eric Heiden, Animesh Garg,

(参考訳) 政策勾配定理を利用したモデル自由強化学習(MFRL)は連続制御タスクにおいてかなりの成功を収めた。しかし、これらのアプローチは、ゼロ階勾配推定による高勾配のばらつきに悩まされ、その結果、準最適ポリシーがもたらされる。逆に、微分可能シミュレーションを用いた第1次モデルベース強化学習(FO-MBRL)法は、ばらつきを低減した勾配を提供するが、物理的接触などの剛体力学を含むシナリオにおいて、誤差をサンプリングする可能性がある。本稿では,この誤差の原因を調査し,厳密なダイナミクスを避けるためにモデルベース地平線を適用して勾配誤差を低減するFO-MBRLアルゴリズムであるAdaptive Horizon Actor-Critic (AHAC)を導入する。実験結果から,AHACはMFRLベースラインより優れており,ローコモーションタスク全体で40%以上の報酬が得られ,壁面時間効率が向上した高次元制御環境への効率なスケーリングが可能であった。

Model-Free Reinforcement Learning (MFRL), leveraging the policy gradient theorem, has demonstrated considerable success in continuous control tasks. However, these approaches are plagued by high gradient variance due to zeroth-order gradient estimation, resulting in suboptimal policies. Conversely, First-Order Model-Based Reinforcement Learning (FO-MBRL) methods employing differentiable simulation provide gradients with reduced variance but are susceptible to sampling error in scenarios involving stiff dynamics, such as physical contact. This paper investigates the source of this error and introduces Adaptive Horizon Actor-Critic (AHAC), an FO-MBRL algorithm that reduces gradient error by adapting the model-based horizon to avoid stiff dynamics. Empirical findings reveal that AHAC outperforms MFRL baselines, attaining 40% more reward across a set of locomotion tasks and efficiently scaling to high-dimensional control environments with improved wall-clock-time efficiency.

翻訳日:2024-06-06 08:53:00 公開日:2024-06-03

# 動的治療レジームにおける強化学習 : 批判的再検討の必要性

Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination ( http://arxiv.org/abs/2405.18556v2 )

ライセンス: Link先を確認

Zhiyao Luo, Yangchen Pan, Peter Watkinson, Tingting Zhu,

(参考訳) 急速に変化する医療分野では、動的治療体制(DTR)におけるオフライン強化学習(RL)の実装は、前例のない機会と課題の混在を示している。本稿では、DTRの文脈におけるオフラインRLの現状を批判的に検証する。本稿では,DTRにRLを適用することの再評価について論じる。不整合性,潜在的に不整合性評価指標,ナイーブおよび教師あり学習ベースラインの欠如,既存研究におけるRL定式化の選択の多様さなどの懸念を引用する。公開されているSepsisデータセットを用いて17,000以上の評価実験を行ったケーススタディにより、RLアルゴリズムの性能は評価指標の変化やマルコフ決定プロセス(MDP)の定式化と大きく異なることを示した。驚いたことに、いくつかのケースでは、RLアルゴリズムはポリシー評価手法や報酬設計に従属するランダムなベースラインによって超えることができる。これにより、将来のDTRにおけるより慎重な政策評価とアルゴリズム開発が求められている。さらに,RLに基づく動的治療体制の信頼性向上に向けた可能性についても検討し,コミュニティ内でさらなる議論を招いた。コードはhttps://github.com/GilesLuo/ReassessDTRで入手できる。

In the rapidly changing healthcare landscape, the implementation of offline reinforcement learning (RL) in dynamic treatment regimes (DTRs) presents a mix of unprecedented opportunities and challenges. This position paper offers a critical examination of the current status of offline RL in the context of DTRs. We argue for a reassessment of applying RL in DTRs, citing concerns such as inconsistent and potentially inconclusive evaluation metrics, the absence of naive and supervised learning baselines, and the diverse choice of RL formulation in existing research. Through a case study with more than 17,000 evaluation experiments using a publicly available Sepsis dataset, we demonstrate that the performance of RL algorithms can vary significantly with changes in evaluation metrics and Markov Decision Process (MDP) formulations. Surprisingly, in some instances RL algorithms can be surpassed by random baselines, depending on the policy evaluation method and reward design. This calls for more careful policy evaluation and algorithm development in future DTR works. Additionally, we discuss potential enhancements toward more reliable development of RL-based dynamic treatment regimes and invite further discussion within the community. Code is available at https://github.com/GilesLuo/ReassessDTR.

翻訳日:2024-06-06 08:53:00 公開日:2024-06-03

# BadRAG: 大規模言語モデルの検索拡張生成における脆弱性の特定

BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models ( http://arxiv.org/abs/2406.00083v1 )

ライセンス: Link先を確認

Jiaqi Xue, Mengxin Zheng, Yebowen Hu, Fei Liu, Xun Chen, Qian Lou,

(参考訳) LLM(Large Language Models)は、古い情報や、一般に「幻覚」と呼ばれる不正確なデータを生成する傾向によって制約される。 Retrieval-Augmented Generation (RAG) は、検索手法の強みと生成モデルを組み合わせることで、これらの制限に対処する。このアプローチでは、大規模で最新のデータセットから関連する情報を取得し、生成プロセスを強化するためにそれを使用することで、より正確でコンテキスト的に適切なレスポンスが得られる。その利点にもかかわらず、RAGはLLMに新たな攻撃対象領域をもたらす。特にRAGデータベースは、Webなどの公開データから取得されることが多いためである。本稿では,検索部(RAGデータベース)に対する脆弱性と攻撃、およびその生成部(LLM)に対する間接攻撃を特定するために,TrojRAGを提案する。具体的には、いくつかのカスタマイズされたコンテンツパッセージを汚染すると、検索バックドアが得られ、検索はクリーンなクエリではうまく機能するが、常にカスタマイズされた汚染済みの敵対的クエリを返す。トリガーと汚染されたパッセージは、様々な攻撃を実装するために高度にカスタマイズできる。例えば、トリガーは「共和党、ドナルド・トランプなど」のような意味的なグループかもしれない。敵対的パッセージは異なる内容に合わせて調整することができ、トリガーとリンクするだけでなく、生成LLMを変更することなく間接的に攻撃するためにも用いられる。これらの攻撃には、RAGに対するサービス拒否攻撃や、トリガーによって条件付けられたLLMの生成に対するセマンティックステアリング攻撃が含まれる。実験の結果,10個の敵対的パッセージを汚染するだけで、敵対的パッセージの検索に98.2%の成功率を誘導できることがわかった。これにより、RAGベースの GPT-4 の拒否率を 0.01% から 74.6% に引き上げるか、ターゲットクエリに対する否定的な応答の割合を 0.22% から 72% に増加させることができる。

Large Language Models (LLMs) are constrained by outdated information and a tendency to generate incorrect data, commonly referred to as "hallucinations." Retrieval-Augmented Generation (RAG) addresses these limitations by combining the strengths of retrieval-based methods and generative models. This approach involves retrieving relevant information from a large, up-to-date dataset and using it to enhance the generation process, leading to more accurate and contextually appropriate responses. Despite its benefits, RAG introduces a new attack surface for LLMs, particularly because RAG databases are often sourced from public data, such as the web. In this paper, we propose TrojRAG to identify the vulnerabilities and attacks on retrieval parts (RAG database) and their indirect attacks on generative parts (LLMs). Specifically, we identify that poisoning several customized content passages could achieve a retrieval backdoor, where the retrieval works well for clean queries but always returns customized poisoned adversarial queries. Triggers and poisoned passages can be highly customized to implement various attacks. For example, a trigger could be a semantic group like "The Republican Party, Donald Trump, etc." Adversarial passages can be tailored to different contents, not only linked to the triggers but also used to indirectly attack generative LLMs without modifying them. These attacks can include denial-of-service attacks on RAG and semantic steering attacks on LLM generations conditioned by the triggers. Our experiments demonstrate that poisoning just 10 adversarial passages can induce a 98.2% success rate in retrieving the adversarial passages. These passages can then increase the reject ratio of RAG-based GPT-4 from 0.01% to 74.6% or increase the rate of negative responses from 0.22% to 72% for targeted queries.

翻訳日:2024-06-06 08:43:16 公開日:2024-06-03

# DDA:腹腔鏡下手術におけるコントラスト学習のための次元駆動型拡張探索

DDA: Dimensionality Driven Augmentation Search for Contrastive Learning in Laparoscopic Surgery ( http://arxiv.org/abs/2406.00907v1 )

ライセンス: Link先を確認

Yuning Zhou, Henry Badgery, Matthew Read, James Bailey, Catherine E. Davey,

(参考訳) 自己教師付き学習(SSL)は、医用画像における効果的な表現学習の可能性を秘めているが、データ拡張の選択は重要であり、ドメイン固有である。一般的な拡大政策が外科的応用に当てはまるかどうかは不明である。本研究では,DDA(Diality Driven Augmentation Search)と呼ばれる新しい手法を用いて,適切な拡張ポリシーの探索を自動化する。 DDAは、ディープ表現の局所的な次元性をプロキシターゲットとして利用し、コントラスト学習において適切なデータ拡張ポリシーを微分的に検索する。腹腔鏡下手術におけるDDAの有用性と有効性を示すとともに,適切なデータ拡張ポリシーの確立に成功している。 DDAを3つの腹腔鏡画像分類とセグメンテーションタスクで体系的に評価し,既存のベースラインよりも有意に改善した。さらに、DDAの最適化された拡張セットは、医療アプリケーションに対照的な学習を適用する際に、ドメイン固有の依存関係に関する洞察を提供する。例えば、hueは自然画像に有効な拡張であるが、腹腔鏡画像には有利ではない。

Self-supervised learning (SSL) has potential for effective representation learning in medical imaging, but the choice of data augmentation is critical and domain-specific. It remains uncertain if general augmentation policies suit surgical applications. In this work, we automate the search for suitable augmentation policies through a new method called Dimensionality Driven Augmentation Search (DDA). DDA leverages the local dimensionality of deep representations as a proxy target, and differentiably searches for suitable data augmentation policies in contrastive learning. We demonstrate the effectiveness and efficiency of DDA in navigating a large search space and successfully identifying an appropriate data augmentation policy for laparoscopic surgery. We systematically evaluate DDA across three laparoscopic image classification and segmentation tasks, where it significantly improves over existing baselines. Furthermore, DDA's optimised set of augmentations provides insight into domain-specific dependencies when applying contrastive learning in medical applications. For example, while hue is an effective augmentation for natural images, it is not advantageous for laparoscopic images.
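The abstract does not say how local dimensionality is measured. One widely used estimator, assumed here purely for illustration, is the maximum-likelihood estimate of local intrinsic dimensionality (LID) computed from the distances between a point and its k nearest neighbors: LID = -1 / mean(ln(r_i / r_k)), where r_k is the largest of those distances. The function name and the synthetic distances below are hypothetical, and this is not DDA's search procedure.

```python
import math
from typing import Sequence


def lid_mle(neighbor_distances: Sequence[float]) -> float:
    """Maximum-likelihood LID estimate from k nearest-neighbor distances."""
    distances = sorted(neighbor_distances)
    if len(distances) < 2 or distances[0] <= 0.0:
        raise ValueError("need at least two strictly positive distances")
    r_max = distances[-1]
    mean_log_ratio = sum(math.log(d / r_max) for d in distances) / len(distances)
    if mean_log_ratio == 0.0:
        raise ValueError("all distances are equal; the estimate is undefined")
    return -1.0 / mean_log_ratio


# Synthetic check: for points spread uniformly in a d-dimensional ball, the i-th
# of k neighbor distances scales like (i / k) ** (1 / d). Here d = 2 and k = 1000.
synthetic = [(i / 1000) ** (1 / 2) for i in range(1, 1001)]
estimate = lid_mle(synthetic)  # expected to land close to 2
```

In a representation-learning setting the distances would be measured between embeddings within a batch, so that the estimate can serve as a signal for comparing augmentation policies.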

翻訳日:2024-06-06 02:47:03 公開日:2024-06-03

# ZeroSmooth: 高フレームレートビデオ生成のためのトレーニング不要ディフューザ適応

ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation ( http://arxiv.org/abs/2406.00908v1 )

ライセンス: Link先を確認

Shaoshu Yang, Yong Zhang, Xiaodong Cun, Ying Shan, Ran He,

(参考訳) ビデオ生成は、特にビデオ拡散モデルの出現以来、近年顕著な進歩を遂げている。多くのビデオ生成モデルは、可塑性合成ビデオ(例えば、安定ビデオ拡散(SVD))を作成できる。しかし、ほとんどのビデオモデルは、GPUメモリが限られているだけでなく、大規模なフレームセットのモデリングが難しいため、低フレームレートのビデオしか生成できない。トレーニングビデオは常に時間圧縮のために指定された間隔で一様にサンプリングされる。以前の方法は、画素空間におけるビデオ補間モデルを後処理段階として訓練するか、特定のベースビデオモデルに対して潜時空間における補間モデルを訓練することでフレームレートを促進させる。本稿では,プラグイン・アンド・プレイ方式で異なるモデルに一般化可能な生成ビデオ拡散モデルの学習自由なビデオ補間法を提案する。ビデオ拡散モデルの特徴空間における非線形性について検討し、設計した隠れ状態補正モジュールを組み込んだ自己カスケード映像拡散モデルに変換する。鍵フレームと補間フレーム間の時間的一貫性を維持するために,自己カスケードアーキテクチャと修正モジュールを提案する。提案手法の有効性を実証するために,複数の人気ビデオモデル上で大規模な評価を行い,特に,大規模な計算資源と大規模データセットによって支援された訓練型補間モデルに匹敵する訓練自由な手法を提案する。

Video generation has made remarkable progress in recent years, especially since the advent of video diffusion models. Many video generation models can produce plausible synthetic videos, e.g., Stable Video Diffusion (SVD). However, most video models can only generate low frame rate videos due to limited GPU memory as well as the difficulty of modeling a large set of frames. The training videos are always uniformly sampled at a specified interval for temporal compression. Previous methods increase the frame rate by either training a video interpolation model in pixel space as a postprocessing stage or training an interpolation model in latent space for a specific base video model. In this paper, we propose a training-free video interpolation method for generative video diffusion models, which is generalizable to different models in a plug-and-play manner. We investigate the non-linearity in the feature space of video diffusion models and transform a video model into a self-cascaded video diffusion model by incorporating the designed hidden state correction modules. The self-cascaded architecture and the correction module are proposed to retain the temporal consistency between key frames and the interpolated frames. Extensive evaluations are performed on multiple popular video models to demonstrate the effectiveness of the proposed method; in particular, our training-free method is even comparable to trained interpolation models supported by huge compute resources and large-scale datasets.

翻訳日:2024-06-06 02:47:03 公開日:2024-06-03

# 分散安定状態のキャラクタリゼーションと温度測定

Characterization and thermometry of dissapatively stabilized steady states ( http://arxiv.org/abs/2406.00911v1 )

ライセンス: Link先を確認

George S. Grattan, Alek M. Liguori-Schremp, David Rodríguez Pérez, Peter Graf, Wes Jones, Eliot Kapit,

(参考訳) 本研究では,ノイズ量子アルゴリズムにおける基底状態と平衡誤差の発見を目的としたアルゴリズムのファミリーの一つであるRelaxational Quantum Eigensolver (RQE) と呼ばれるアルゴリズムについて検討し,その特性について検討する。 RQEでは、二次量子ビットの2番目のレジスタをトロタライズド進化において一次系に弱結合し、アルゴリズムの実行中に補助量子ビットを周期的にリセットすることで、近似ゼロ温度バスを設計する。ランダムゲート誤差の無限温度浴のバランスをとると、RQEは基底状態の定数分に相当する平均エネルギーで状態を返す。熱的挙動からTと偏差を推定するためのいくつかの手法を用いて, このアルゴリズムの定常状態について検討する。特に, これらの系の定常状態は熱分布によってよく近似されることが確認され, 冷却に使用する同じ資源を熱測定に利用でき, 温度の信頼性の高い測定値が得られることを示す。これらの手法は、短期量子ハードウェアで容易に実装することができ、古典的なコンピュータでは近似熱状態のシミュレーションが困難であるハミルトニアンの安定化と探索が可能である。

In this work we study the properties of dissipatively stabilized steady states of noisy quantum algorithms, exploring the extent to which they can be well approximated as thermal distributions, and proposing methods to extract the effective temperature T. We study an algorithm called the Relaxational Quantum Eigensolver (RQE), which is one of a family of algorithms that attempt to find ground states and balance error in noisy quantum devices. In RQE, we weakly couple a second register of auxiliary "shadow" qubits to the primary system in Trotterized evolution, thus engineering an approximate zero-temperature bath by periodically resetting the auxiliary qubits during the algorithm's runtime. Balancing the infinite temperature bath of random gate error, RQE returns states with an average energy equal to a constant fraction of the ground state. We probe the steady states of this algorithm for a range of base error rates, using several methods for estimating both T and deviations from thermal behavior. In particular, we both confirm that the steady states of these systems are often well-approximated by thermal distributions, and show that the same resources used for cooling can be adopted for thermometry, yielding a fairly reliable measure of the temperature. These methods could be readily implemented in near-term quantum hardware, and for stabilizing and probing Hamiltonians where simulating approximate thermal states is hard for classical computers.
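As a simple illustration of what extracting an effective temperature can mean: if the populations of two energy levels follow a Boltzmann distribution, their ratio satisfies p_high / p_low = exp(-(E_high - E_low) / T), so T can be read off from measured populations. The sketch below inverts that relation in units where the Boltzmann constant is 1. It is a textbook two-level estimate, not the thermometry protocol of the paper, and all names are assumptions.

```python
import math


def effective_temperature(energy_low: float, energy_high: float,
                          population_low: float, population_high: float) -> float:
    """Temperature implied by the Boltzmann ratio of two level populations (k_B = 1)."""
    if energy_high <= energy_low:
        raise ValueError("energy_high must exceed energy_low")
    if not population_low > population_high > 0.0:
        raise ValueError("need population_low > population_high > 0 for a positive T")
    return (energy_high - energy_low) / math.log(population_low / population_high)


# Hypothetical check: build thermal populations at T = 0.5 and recover T.
true_temperature = 0.5
weights = [math.exp(-energy / true_temperature) for energy in (0.0, 1.0)]
total = sum(weights)
p_low, p_high = weights[0] / total, weights[1] / total

recovered = effective_temperature(0.0, 1.0, p_low, p_high)
```

Applying the same inversion to several pairs of levels and comparing the resulting temperatures is one way to see deviations from thermal behavior, since a truly thermal state yields the same T for every pair.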

翻訳日:2024-06-06 02:47:03 公開日:2024-06-03

# 最適確率測度分解のためのワッサーシュタイン勾配流

Wasserstein gradient flow for optimal probability measure decomposition ( http://arxiv.org/abs/2406.00914v1 )

ライセンス: Link先を確認

Jiangze Han, Christopher Thomas Ryan, Xin T. Tong,

(参考訳) クラスタリングとユーザグループ化の応用に着想を得た特定の損失関数を最小化するために,確率測度をK確率サブ尺度に分解する無限次元最適化問題を検討した。最適サブ尺度の支持構造を解析的に検討し、ワッサーシュタイン勾配流に基づくアルゴリズムを導入し、それらの収束を実証する。数値的な結果は、我々のアルゴリズムの実装可能性を示し、さらなる洞察を提供する。

We examine the infinite-dimensional optimization problem of finding a decomposition of a probability measure into K probability sub-measures to minimize specific loss functions inspired by applications in clustering and user grouping. We analytically explore the structures of the support of optimal sub-measures and introduce algorithms based on Wasserstein gradient flow, demonstrating their convergence. Numerical results illustrate the implementability of our algorithms and provide further insights.

翻訳日:2024-06-06 02:47:03 公開日:2024-06-03

# アライメントフリーなRGBT有向物体検出:セマンティック誘導非対称ネットワークと統一ベンチマーク

Alignment-Free RGBT Salient Object Detection: Semantics-guided Asymmetric Correlation Network and A Unified Benchmark ( http://arxiv.org/abs/2406.00917v1 )

ライセンス: Link先を確認

Kunpeng Wang, Danying Lin, Chenglong Li, Zhengzheng Tu, Bin Luo,

(参考訳) RGB and Thermal (RGBT) Salient Object Detection (SOD) は、可視画像対と熱画像対の相補的情報を利用して高品質な塩分濃度予測を実現することを目的としている。しかし、既存の手法は、労働集約的な手動整列画像対に適合し、これらの手法を元の非整列画像対に直接適用することで、その性能を著しく低下させる可能性がある。本稿では,手動のアライメントを伴わないRGBT SODと熱画像のペアに対して,RGBT SODに対処するための最初の試みを行う。具体的には2つの新しい構成要素からなるセマンティックス誘導非対称相関ネットワーク(SACNet)を提案する。 1) セマンティクス誘導による注意力を利用した非対称相関モジュール 2)マルチモーダル機能統合のためのRGB機能に応じて,関連する熱的特徴をサンプリングするための関連する特徴サンプリングモジュール。さらに,アライメントのないRGBT SODの研究を容易にするため,2000 RGBと熱画像のペアをアライメントなしで様々な現実世界のシーンから直接キャプチャするUVT2000という統合ベンチマークデータセットを構築した。整列データセットと非整列データセットの併用実験により,本手法の有効性と性能を実証した。データセットとコードはhttps://github.com/Angknpng/SACNetで公開されている。

RGB and Thermal (RGBT) Salient Object Detection (SOD) aims to achieve high-quality saliency prediction by exploiting the complementary information of visible and thermal image pairs, which are initially captured in an unaligned manner. However, existing methods are tailored for manually aligned image pairs, which are labor-intensive, and directly applying these methods to original unaligned image pairs could significantly degrade their performance. In this paper, we make the first attempt to address RGBT SOD for initially captured RGB and thermal image pairs without manual alignment. Specifically, we propose a Semantics-guided Asymmetric Correlation Network (SACNet) that consists of two novel components: 1) an asymmetric correlation module utilizing semantics-guided attention to model cross-modal correlations specific to unaligned salient regions; 2) an associated feature sampling module to sample relevant thermal features according to the corresponding RGB features for multi-modal feature integration. In addition, we construct a unified benchmark dataset called UVT2000, containing 2000 RGB and thermal image pairs directly captured from various real-world scenes without any alignment, to facilitate research on alignment-free RGBT SOD. Extensive experiments on both aligned and unaligned datasets demonstrate the effectiveness and superior performance of our method. The dataset and code are available at https://github.com/Angknpng/SACNet.

翻訳日:2024-06-06 02:47:03 公開日:2024-06-03

# 知覚ハッシュアルゴリズムの敵対的安全性の評価

Assessing the Adversarial Security of Perceptual Hashing Algorithms ( http://arxiv.org/abs/2406.00918v1 )

ライセンス: Link先を確認

Jordan Madden, Moxanki Bhavsar, Lhamo Dorje, Xiaohua Li,

(参考訳) 知覚ハッシュアルゴリズム(PHA)は、違法なオンラインコンテンツを識別するために広く利用されている。センシティブなアプリケーションにおける重要な役割を考えると、セキュリティの強みと弱点を理解することが重要です。本稿では,PhotoDNA,PDQ,NeuralHashの3つの主要なPHAを比較し,通常の画像編集攻撃,悪意のある敵攻撃,ハッシュ反転攻撃の3つの典型的な攻撃に対する堅牢性を評価する。一般的な研究とは対照的に,これらのPHAは乱れやクエリ予算に関する現実的な制約を適用した場合,無作為なハッシュ変動のユニークな性質から,ブラックボックス攻撃に対する弾力性を示すことが明らかとなった。さらに,本論文では,元の画像をハッシュビットから再構成し,重要なプライバシー上の懸念を提起する。セキュリティ上の脆弱性を包括的に公開することにより,PHAのセキュリティを効果的に展開するための継続的な取り組みに寄与する。

Perceptual hashing algorithms (PHAs) are utilized extensively for identifying illegal online content. Given their crucial role in sensitive applications, understanding their security strengths and weaknesses is critical. This paper compares three major PHAs deployed widely in practice: PhotoDNA, PDQ, and NeuralHash, and assesses their robustness against three typical attacks: normal image editing attacks, malicious adversarial attacks, and hash inversion attacks. Contrary to prevailing studies, this paper reveals that these PHAs exhibit resilience to black-box adversarial attacks when realistic constraints regarding the distortion and query budget are applied, attributed to the unique property of random hash variations. Moreover, this paper illustrates that original images can be reconstructed from the hash bits, raising significant privacy concerns. By comprehensively exposing their security vulnerabilities, this paper contributes to the ongoing efforts aimed at enhancing the security of PHAs for effective deployment.
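To illustrate what makes a hash perceptual, the sketch below implements a toy average hash: each pixel contributes one bit depending on whether it is brighter than the image mean, and two images are compared by the Hamming distance between their bit strings. This is a deliberately simple stand-in and is not PhotoDNA, PDQ, or NeuralHash; the names and the 4x4 grayscale grid are assumptions, and the downsampling step that real average-hash variants apply first is omitted.

```python
from typing import List, Sequence


def average_hash(pixels: Sequence[Sequence[float]]) -> List[int]:
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming_distance(hash_a: Sequence[int], hash_b: Sequence[int]) -> int:
    """Number of bit positions where the two hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Hypothetical 4x4 grayscale image with alternating dark and bright columns.
image = [
    [10, 200, 30, 220],
    [15, 210, 25, 230],
    [12, 205, 35, 215],
    [18, 190, 28, 225],
]
brightened = [[p + 10 for p in row] for row in image]  # benign edit
inverted = [[255 - p for p in row] for row in image]   # drastic edit

same_content = hamming_distance(average_hash(image), average_hash(brightened))
different_content = hamming_distance(average_hash(image), average_hash(inverted))
```

A uniform brightness shift leaves every bit unchanged while inversion flips all of them, which is the tolerance-to-editing property that the robustness evaluations in the abstract probe for deployed algorithms.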

翻訳日:2024-06-06 02:47:03 公開日:2024-06-03

# セグメンションワイド擬似ラベリングによる弱スーパービジョンオーディオ・ビジュアル・ビデオ・パーシングの高速化

Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling ( http://arxiv.org/abs/2406.00919v1 )

ライセンス: Link先を確認

Jinxing Zhou, Dan Guo, Yiran Zhong, Meng Wang,

(参考訳) オーディオ・ビジュアル・ビデオ・パーシング(Audio-Visual Video Parsing)タスクは、可聴ビデオの音声ストリームと視覚ストリームのいずれか、または両方で発生する事象を特定し、時間的にローカライズすることを目的としている。このタスクは、ビデオレベルのイベントラベルのみが提供される、すなわちラベルのモダリティとタイムスタンプが不明な、弱教師付きの設定で実行されることが多い。密に注釈付けされたラベルがないため、最近の研究は擬似ラベルを活用して監督を強化しようとしている。一般的に使用される戦略は、既知のビデオイベントラベルをモダリティごとに分類することで擬似ラベルを生成することである。しかし、ラベルは依然としてビデオレベルに限定されており、イベントの時間的境界はラベルなしのままである。本稿では,オープンワールドから学んだ事前知識を活用することで,各ビデオセグメントにラベルを明示的に割り当てることのできる,新しい擬似ラベル生成戦略を提案する。具体的には、CLIPとCLAPという大規模な事前学習モデルを用いて、各ビデオセグメントのイベントを推定し、セグメントレベルの視覚的および音声的擬似ラベルをそれぞれ生成する。次に,カテゴリの豊富さとセグメントの豊富さを考慮して,これらの擬似ラベルを活用する新たな損失関数を提案する。また、異常に大きな順方向損失が発生した場合に擬似ラベルを反転させることで、視覚的擬似ラベルをさらに改善するラベルのノイズ除去戦略も採用する。 LLPデータセットで広範な実験を行い、提案した各設計の有効性を実証するとともに、あらゆる種類のイベント解析、すなわちオーディオイベント、ビジュアルイベント、オーディオ・ビジュアルイベントにおいて最先端のビデオ解析性能を達成する。また,関連する弱教師付きオーディオ・ビジュアル事象ローカライズタスクでも提案した擬似ラベル生成戦略を検証し,実験結果は本手法の利点と汎化性を改めて裏付けた。

The Audio-Visual Video Parsing task aims to identify and temporally localize the events that occur in either or both the audio and visual streams of audible videos. It is often performed in a weakly-supervised manner, where only video event labels are provided, i.e., the modalities and the timestamps of the labels are unknown. Due to the lack of densely annotated labels, recent work attempts to leverage pseudo labels to enrich the supervision. A commonly used strategy is to generate pseudo labels by categorizing the known video event labels for each modality. However, the labels are still confined to the video level, and the temporal boundaries of events remain unlabeled. In this paper, we propose a new pseudo label generation strategy that can explicitly assign labels to each video segment by utilizing prior knowledge learned from the open world. Specifically, we exploit the large-scale pretrained models, namely CLIP and CLAP, to estimate the events in each video segment and generate segment-level visual and audio pseudo labels, respectively. We then propose a new loss function to exploit these pseudo labels by taking into account their category-richness and segment-richness. A label denoising strategy is also adopted to further improve the visual pseudo labels by flipping them whenever abnormally large forward losses occur. We perform extensive experiments on the LLP dataset and demonstrate the effectiveness of each proposed design, and we achieve state-of-the-art video parsing performance on all types of event parsing, i.e., audio event, visual event, and audio-visual event. We also examine the proposed pseudo label generation strategy on a relevant weakly-supervised audio-visual event localization task and the experimental results again verify the benefits and generalization of our method.

翻訳日:2024-06-06 02:47:03 公開日:2024-06-03

# 二重確率勾配を用いたSGDの解明

Demystifying SGD with Doubly Stochastic Gradients ( http://arxiv.org/abs/2406.00920v1 )

ライセンス: Link先を確認

Kyurae Kim, Joohwan Ko, Yi-An Ma, Jacob R. Gardner,

(参考訳) 扱いにくい(intractable)期待値の和の形をとる最適化目的は重要性を増しており(拡散モデル、変分オートエンコーダなど)、この設定は「無限データを伴う有限和」とも呼ばれる。これらの問題に対する一般的な戦略は、二重確率勾配を用いたSGD(二重SGD)を採用することであり、期待値は各成分の勾配推定器を用いて推定され、和はこれらの推定器のサブサンプリングによって推定される。その人気にもかかわらず、有界分散のような強い仮定の下での場合を除き、二重SGDの収束性についてはほとんど知られていない。本研究では,独立ミニバッチとランダムリシャッフルを用いた二重SGDの収束を,従属な成分勾配推定器を含む一般的な条件の下で確立する。特に、従属推定器の場合、我々の分析は相関の影響のきめ細かい解析を可能にする。その結果,1反復あたりの計算予算が$b \times m$($b$はミニバッチサイズ,$m$はモンテカルロサンプル数)である場合に,一般に予算の大部分をどこに投資すべきかを我々の分析は示唆する。さらに、ランダムリシャッフル(RR)がサブサンプリングノイズに対する複雑性の依存性を改善することを証明する。

Optimization objectives in the form of a sum of intractable expectations are rising in importance (e.g., diffusion models, variational autoencoders, and many more), a setting also known as "finite sum with infinite data." For these problems, a popular strategy is to employ SGD with doubly stochastic gradients (doubly SGD): the expectations are estimated using the gradient estimator of each component, while the sum is estimated by subsampling over these estimators. Despite its popularity, little is known about the convergence properties of doubly SGD, except under strong assumptions such as bounded variance. In this work, we establish the convergence of doubly SGD with independent minibatching and random reshuffling under general conditions, which encompasses dependent component gradient estimators. In particular, for dependent estimators, our analysis allows fine-grained analysis of the effect of correlations. As a result, under a per-iteration computational budget of $b \times m$, where $b$ is the minibatch size and $m$ is the number of Monte Carlo samples, our analysis suggests where one should invest most of the budget in general. Furthermore, we prove that random reshuffling (RR) improves the complexity dependence on the subsampling noise.
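
The two sources of randomness described above can be sketched on a toy problem. This is an illustration under stated assumptions, not the authors' algorithm or analysis: the objective, the step-size schedule, and the name `doubly_sgd` are all hypothetical. The toy objective is $F(x) = \frac{1}{n}\sum_i \mathbb{E}_{\epsilon}\left[\tfrac{1}{2}(x - (a_i + \epsilon))^2\right]$ with zero-mean Gaussian $\epsilon$, whose minimizer is the mean of the $a_i$. Each iteration subsamples $b$ components and uses $m$ Monte Carlo samples per component, so the per-iteration budget is $b \times m$.

```python
import random


def doubly_sgd(a, b=4, m=8, steps=2000, lr=0.05, noise=1.0, seed=0):
    """Minimize (1/n) * sum_i E[(x - (a_i + eps))^2 / 2] with doubly stochastic gradients."""
    rng = random.Random(seed)
    n = len(a)
    x = 0.0
    for t in range(steps):
        # First source of noise: subsample b of the n components (independent minibatching).
        batch = rng.sample(range(n), b)
        grad = 0.0
        for i in batch:
            # Second source of noise: Monte Carlo estimate of the component gradient
            # E[x - (a_i + eps)] using m samples of eps.
            mc = sum(x - (a[i] + rng.gauss(0.0, noise)) for _ in range(m)) / m
            grad += mc
        grad /= b
        # Decaying step size (an arbitrary choice for this sketch).
        x -= lr / (1.0 + 0.01 * t) * grad
    return x


targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
estimate = doubly_sgd(targets)
```

With these settings the estimate should land near 4.5, the mean of `targets`. Varying `b` and `m` while holding `b * m` fixed is one informal way to explore the budget trade-off the abstract refers to.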

翻訳日:2024-06-06 02:47:03 公開日:2024-06-03

# コントラクトランタイムビヘイビアグラフを用いたEthereum上のPonziスキームの有効検出に向けて

Towards Effective Detection of Ponzi schemes on Ethereum with Contract Runtime Behavior Graph ( http://arxiv.org/abs/2406.00921v1 )

ライセンス: Link先を確認

Ruichao Liang, Jing Chen, Cong Wu, Kun He, Yueming Wu, Weisong Sun, Ruiying Du, Qingchuan Zhao, Yang Liu,

(参考訳) 詐欺の一種であるPonziスキームは、近年Ethereumスマートコントラクトで発見されており、巨額の金銭的損失をもたらしている。既存の検出方法は、主にルールベースのアプローチと、静的情報を特徴として利用する機械学習技術に焦点を当てている。しかし、これらの手法には大きな制限がある。ルールベースのアプローチは、能力が限られ、ドメイン知識に依存する事前定義されたルールに頼っている。オペコードのような静的情報を機械学習に用いても、Ponziコントラクトを効果的に特徴付けることができず、信頼性と解釈性が低くなる。さらに、トランザクションのような静的情報に依存する機械学習は、検出を実現するために一定数のトランザクションを必要とするため、検出のスケーラビリティが制限され、ゼロデイのPonziスキームの識別が妨げられる。本稿では,コントラクトの実行時の動作に基づく効率的なPonziスキーム検出手法であるPonziGuardを提案する。 PonziGuardは、コントラクトの実行時の動作がPonziコントラクトを無害なコントラクトから区別する上でより効果的であるという観察に着想を得て、コントラクトランタイムビヘイビアグラフ(CRBG)と呼ばれる包括的なグラフ表現を構築し、Ponziコントラクトの振る舞いを正確に表現する。さらに、検出プロセスをCRBG上のグラフ分類タスクとして定式化し、全体的な有効性を高める。実験の結果、PonziGuardは正解(ground-truth)データセットにおいて現在の最先端のアプローチを上回ることがわかった。我々はPonziGuardをEthereum Mainnetに適用し、実世界のシナリオでその有効性を実証した。 PonziGuardを用いてEthereum Mainnet上の805件のPonziコントラクトを特定し、それらによる経済的損失は281,700 Ether、すなわち約5億米ドルと推定された。また、最近デプロイされた1万件のスマートコントラクトの中からゼロデイのPonziスキームも発見した。

Ponzi schemes, a form of scam, have been discovered in Ethereum smart contracts in recent years, causing massive financial losses. Existing detection methods primarily focus on rule-based approaches and machine learning techniques that utilize static information as features. However, these methods have significant limitations. Rule-based approaches rely on pre-defined rules with limited capabilities and domain knowledge dependency. Using static information like opcodes for machine learning fails to effectively characterize Ponzi contracts, resulting in poor reliability and interpretability. Moreover, relying on static information like transactions for machine learning requires a certain number of transactions to achieve detection, which limits the scalability of detection and hinders the identification of 0-day Ponzi schemes. In this paper, we propose PonziGuard, an efficient Ponzi scheme detection approach based on contract runtime behavior. Inspired by the observation that a contract's runtime behavior is more effective in distinguishing Ponzi contracts from innocent contracts, PonziGuard establishes a comprehensive graph representation called contract runtime behavior graph (CRBG), to accurately depict the behavior of Ponzi contracts. Furthermore, it formulates the detection process as a graph classification task on CRBG, enhancing its overall effectiveness. The experiment results show that PonziGuard surpasses the current state-of-the-art approaches in the ground-truth dataset. We applied PonziGuard to Ethereum Mainnet and demonstrated its effectiveness in real-world scenarios. Using PonziGuard, we identified 805 Ponzi contracts on Ethereum Mainnet, which have resulted in an estimated economic loss of 281,700 Ether or approximately $500 million USD. We also found 0-day Ponzi schemes in the recently deployed 10,000 smart contracts.

翻訳日:2024-06-06 02:47:03 公開日:2024-06-03

# ランダム化中間点を用いた高速拡散型サンプリング:シークエンシャルと並列

Faster Diffusion-based Sampling with Randomized Midpoints: Sequential and Parallel ( http://arxiv.org/abs/2406.00924v1 )

ライセンス: Link先を確認

Shivam Gupta, Linda Cai, Sitan Chen,

(参考訳) 近年,拡散モデルに対する離散化境界の証明への関心が高まっている。これらの研究は、本質的に任意のデータ分布に対して、異なるノイズレベルにおけるスコア関数の十分に正確な推定値が与えられれば、多項式時間で近似的にサンプリングできることを示している。本研究では,ShenとLeeによる対数凹サンプリングのためのランダム化中間点法に着想を得た,拡散モデルに対する新しい離散化手法を提案する。このアプローチは、全変動距離における任意の滑らかな分布からのサンプリングについて、最もよく知られた次元依存性(先行研究の$\widetilde O(\sqrt{d})$に対して$\widetilde O(d^{5/12})$)を達成することを証明する。また,我々のアルゴリズムはわずか$\widetilde O(\log^2 d)$回の並列ラウンドで実行できるように並列化可能であることを示し,これは拡散モデルによる並列サンプリングに対する最初の証明可能な保証となる。提案手法の副産物として,全変動距離における対数凹サンプリングというよく研究された問題に対して,先行研究の$\widetilde O(\sqrt{d})$に対して次元依存性$\widetilde O(d^{5/12})$を達成するアルゴリズムと簡潔な解析を与える。

In recent years, there has been a surge of interest in proving discretization bounds for diffusion models. These works show that for essentially any data distribution, one can approximately sample in polynomial time given a sufficiently accurate estimate of its score functions at different noise levels. In this work, we propose a new discretization scheme for diffusion models inspired by Shen and Lee's randomized midpoint method for log-concave sampling (Shen and Lee, 2019). We prove that this approach achieves the best known dimension dependence for sampling from arbitrary smooth distributions in total variation distance ($\widetilde O(d^{5/12})$ compared to $\widetilde O(\sqrt{d})$ from prior work). We also show that our algorithm can be parallelized to run in only $\widetilde O(\log^2 d)$ parallel rounds, constituting the first provable guarantees for parallel sampling with diffusion models. As a byproduct of our methods, for the well-studied problem of log-concave sampling in total variation distance, we give an algorithm and simple analysis achieving dimension dependence $\widetilde O(d^{5/12})$ compared to $\widetilde O(\sqrt{d})$ from prior work.
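
The core idea behind a randomized midpoint discretization can be sketched on a plain ordinary differential equation. This is a simplified illustration under assumptions, not the paper's sampler: it applies the randomized-midpoint idea to the deterministic ODE $\dot{x} = f(x, t)$ rather than to a diffusion model, and the names `randomized_midpoint_step` and `integrate` are hypothetical. The drift is evaluated at a uniformly random point inside each step, which makes the step an unbiased estimate of the integral of the drift along the approximate path.

```python
import math
import random


def randomized_midpoint_step(f, x, t, h, rng):
    """One step of size h for dx/dt = f(x, t), using a random intermediate point."""
    alpha = rng.random()                 # uniform in [0, 1)
    x_mid = x + alpha * h * f(x, t)      # Euler prediction at time t + alpha * h
    return x + h * f(x_mid, t + alpha * h)


def integrate(f, x0, t_end, n_steps, seed=0):
    """Integrate from t = 0 to t_end with n_steps randomized midpoint steps."""
    rng = random.Random(seed)
    h = t_end / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        x = randomized_midpoint_step(f, x, t, h, rng)
        t += h
    return x


# Test problem with a known solution: dx/dt = -x, x(0) = 1, so x(1) = exp(-1).
approx = integrate(lambda x, t: -x, 1.0, 1.0, 100)
exact = math.exp(-1.0)
```

On this test problem the result should be close to `exact`. In expectation each step matches the exact flow to second order in `h`, which is the property that motivates the randomization.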

翻訳日:2024-06-06 02:47:03 公開日:2024-06-03

# ロバストな単眼ビジュアルオドメトリのための自己教師あり幾何誘導初期化

Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry ( http://arxiv.org/abs/2406.00929v1 )

ライセンス: Link先を確認

Takayuki Kanai, Igor Vasiljevic, Vitor Guizilini, Kazuhiro Shintani,

(参考訳) 単眼ビジュアルオドメトリは、様々な自律システムにおいて重要な技術である。照明不足、テクスチャ不足、大きな動きなどによる失敗に悩まされる従来の特徴ベースの手法と比べて、近年の学習ベースSLAM法は、反復的な密なバンドル調整を利用してそのような失敗に対処し、ドメイン固有のトレーニングデータに依存することなく、様々な実環境において頑健で正確なローカライゼーションを実現している。しかし、その可能性にもかかわらず、学習ベースのSLAMは、大きな動きや物体のダイナミクスを含むシナリオに依然として苦戦している。本稿では、屋外ベンチマークにおける主要な失敗事例を分析し、最適化プロセスの様々な欠点を明らかにすることで、一般的な学習ベースSLAMモデル(DROID-SLAM)の主要な弱点を診断する。次に,凍結した大規模事前学習済み単眼深度推定器を活用した自己教師ありの事前知識(prior)を用いて密なバンドル調整過程を初期化することを提案し,SLAMバックボーンを微調整することなく頑健なビジュアルオドメトリを実現する。その単純さにもかかわらず,提案手法はKITTIオドメトリおよび難易度の高いDDADベンチマークにおいて大幅な改善を示す。コードと事前学習済みモデルは、論文公開時にリリースされる予定である。

Monocular visual odometry is a key technology in a wide variety of autonomous systems. Relative to traditional feature-based methods, which suffer from failures due to poor lighting, insufficient texture, large motions, etc., recent learning-based SLAM methods exploit iterative dense bundle adjustment to address such failure cases and achieve robust, accurate localization in a wide variety of real environments, without depending on domain-specific training data. However, despite its potential, learning-based SLAM still struggles with scenarios involving large motion and object dynamics. In this paper, we diagnose key weaknesses in a popular learning-based SLAM model (DROID-SLAM) by analyzing major failure cases on outdoor benchmarks and exposing various shortcomings of its optimization process. We then propose the use of self-supervised priors leveraging a frozen large-scale pre-trained monocular depth estimator to initialize the dense bundle adjustment process, leading to robust visual odometry without the need to fine-tune the SLAM backbone. Despite its simplicity, our proposed method demonstrates significant improvements on KITTI odometry, as well as the challenging DDAD benchmark. Code and pre-trained models will be released upon publication.

翻訳日:2024-06-06 02:47:03 公開日:2024-06-03

# 有用なLLM評価に関するサーベイ

A Survey of Useful LLM Evaluation ( http://arxiv.org/abs/2406.00936v1 )

ライセンス: Link先を確認

Ji-Lun Peng, Sijia Cheng, Egil Diau, Yung-Yu Shih, Po-Heng Chen, Yen-Ting Lin, Yun-Nung Chen,

(参考訳) LLMは、幅広い複雑なタスクにおける卓越した性能により、様々な研究領域で注目を集めている。したがって、LLMが担うべきタスクと責任を決定するためには、その能力を評価する洗練された手法が必要である。本研究は,有用なツールとしてのLLMをいかに効果的に評価すべきかを主に論じる。そこで我々は,「コア能力」から「エージェント」までの2段階のフレームワークを提案し,LLMがその具体的な能力に基づいてどのように適用できるかを,各段階における評価手法とともに明確に説明した。コア能力とは、LLMが高品質な自然言語テキストを生成するために必要とする能力を指す。 LLMがコア能力を持つことを確認した後、LLMはエージェントとして実世界の複雑なタスクを解決することができる。「コア能力」の段階では, LLMの推論能力, 社会的影響, ドメイン知識について議論した。「エージェント」の段階では,LLMエージェントアプリケーションにおける身体化された行動,計画,ツール学習を示した。最後に,LLMの評価手法が現在直面している課題と今後の開発の方向性について検討した。

LLMs have gotten attention across various research domains due to their exceptional performance on a wide range of complex tasks. Therefore, refined methods to evaluate the capabilities of LLMs are needed to determine the tasks and responsibilities they should undertake. Our study mainly discussed how LLMs, as useful tools, should be effectively assessed. We proposed a two-stage framework, from "core ability" to "agent", clearly explaining how LLMs can be applied based on their specific capabilities, along with the evaluation methods in each stage. Core ability refers to the capabilities that LLMs need in order to generate high-quality natural language texts. After confirming that LLMs possess core ability, they can solve real-world and complex tasks as agents. In the "core ability" stage, we discussed the reasoning ability, societal impact, and domain knowledge of LLMs. In the "agent" stage, we demonstrated embodied action, planning, and tool learning of LLM agent applications. Finally, we examined the challenges currently confronting the evaluation methods for LLMs, as well as the directions for future development.

翻訳日:2024-06-06 02:47:03 公開日:2024-06-03

# ニューロシンボリックAIによるネットワーク侵入検出における相乗的アプローチ

A Synergistic Approach In Network Intrusion Detection By Neurosymbolic AI ( http://arxiv.org/abs/2406.00938v1 )

ライセンス: Link先を確認

Alice Bizzarri, Chung-En Yu, Brian Jalaian, Fabrizio Riguzzi, Nathaniel D. Bastian,

(参考訳) NIDS(Network Intrusion Detection Systems)の一般的なアプローチは、高いリソース消費、大きな計算要求、低い解釈可能性といった問題によってしばしば妨げられる。さらに、これらのシステムは一般的に、新しく、急速に変化するサイバー脅威を特定するのに苦労する。本稿では、ニューロシンボリックAI(Neurosymbolic Artificial Intelligence, NSAI)をNIDSに組み込む可能性について論じ、深層学習のデータ駆動の強みと記号的AIの論理的推論を組み合わせて、サイバーセキュリティにおける動的な課題に取り組む。また、サイバーセキュリティの専門家がNIDSにおけるNSAIの潜在的な強みを探求できるように、NSAI技術の詳細な紹介も含める。 NIDSにNSAIを組み込むことは、ニューラルネットワークの堅牢なパターン認識と記号的推論の解釈能力の恩恵を受け、複雑なネットワーク脅威の検出と解釈の両方において潜在的な進歩を示す。ネットワークトラフィックのデータ型と機械学習アーキテクチャを分析することにより、ネットワークの振る舞いに関するより深い洞察を提供し、検知性能とシステムの適応性の両方を改善するというNSAIの特有の能力を説明する。この技術の融合は、従来のNIDSの機能を強化するだけでなく、高度なサイバー脅威に対してより回復力があり、解釈可能で、動的な防御メカニズムを構築するための将来の発展の土台も築く。この領域の継続的な進歩は、NIDSを、既知の脅威に応答するとともに、新たに出現する未知の脅威を予見するシステムへと変革しようとしている。

The prevailing approaches in Network Intrusion Detection Systems (NIDS) are often hampered by issues such as high resource consumption, significant computational demands, and poor interpretability. Furthermore, these systems generally struggle to identify novel, rapidly changing cyber threats. This paper delves into the potential of incorporating Neurosymbolic Artificial Intelligence (NSAI) into NIDS, combining deep learning's data-driven strengths with symbolic AI's logical reasoning to tackle the dynamic challenges in cybersecurity. It also includes a detailed introduction to NSAI techniques so that cyber professionals can explore the potential strengths of NSAI in NIDS. The inclusion of NSAI in NIDS marks potential advancements in both the detection and interpretation of intricate network threats, benefiting from the robust pattern recognition of neural networks and the interpretive prowess of symbolic reasoning. By analyzing network traffic data types and machine learning architectures, we illustrate NSAI's distinctive capability to offer more profound insights into network behavior, thereby improving both detection performance and the adaptability of the system. This merging of technologies not only enhances the functionality of traditional NIDS but also sets the stage for future developments in building more resilient, interpretable, and dynamic defense mechanisms against advanced cyber threats. The continued progress in this area is poised to transform NIDS into a system that is both responsive to known threats and anticipatory of emerging, unseen ones.

翻訳日:2024-06-06 02:47:03 公開日:2024-06-03

# 時間グラフ上の状態空間モデル:第一原理的研究

State Space Models on Temporal Graphs: A First-Principles Study ( http://arxiv.org/abs/2406.00943v1 )

ライセンス: Link先を確認

Jintang Li, Ruofan Wu, Xinzhou Jin, Boqun Ma, Liang Chen, Zibin Zheng,

(参考訳) 過去数年間、ディープグラフ学習の研究は、動的な振る舞いを示す実世界の複雑なシステムに対応して、静的グラフから時間グラフに移行してきた。実際には、時間グラフは、離散時点で観測された静的グラフスナップショットの順序列として形式化される。 RNNやTransformerのようなシーケンスモデルは、このような時間グラフをモデル化するための主要なバックボーンネットワークであり続けてきた。しかし、有望な結果にもかかわらず、RNNは長距離依存性に苦しみ、Transformerは二次の計算複雑性に悩まされる。近年, 基礎となる連続時間線形力学系の離散化表現として定式化される状態空間モデル (SSM) が大きな注目を集め, 独立したシーケンスのモデリングにおいて飛躍的な進歩を遂げている。本研究では,ラプラシアン正則化項の採用によって構造情報をオンライン近似目的に統合することで,SSM理論を時間グラフに拡張する原理的な調査を行う。その結果生じる連続時間システムは新たなアルゴリズム上の課題をもたらすため,時間グラフのダイナミクスをモデル化するためのグラフ状態空間モデルであるGraphSSMを開発する必要があった。広範な実験結果は,各種時間グラフベンチマークにおけるGraphSSMフレームワークの有効性を示している。

Over the past few years, research on deep graph learning has shifted from static graphs to temporal graphs in response to real-world complex systems that exhibit dynamic behaviors. In practice, temporal graphs are formalized as an ordered sequence of static graph snapshots observed at discrete time points. Sequence models such as RNNs or Transformers have long been the predominant backbone networks for modeling such temporal graphs. Yet, despite the promising results, RNNs struggle with long-range dependencies, while transformers are burdened by quadratic computational complexity. Recently, state space models (SSMs), which are framed as discretized representations of an underlying continuous-time linear dynamical system, have garnered substantial attention and achieved breakthrough advancements in independent sequence modeling. In this work, we undertake a principled investigation that extends SSM theory to temporal graphs by integrating structural information into the online approximation objective via the adoption of a Laplacian regularization term. The emergent continuous-time system introduces novel algorithmic challenges, thereby necessitating our development of GraphSSM, a graph state space model for modeling the dynamics of temporal graphs. Extensive experimental results demonstrate the effectiveness of our GraphSSM framework across various temporal graph benchmarks.
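
For readers unfamiliar with Laplacian regularization, the following minimal sketch shows the standard graph Laplacian quadratic form, which penalizes signals that differ across connected nodes. It is only an assumption about the general kind of term the abstract mentions: the actual GraphSSM objective is not given here, and the helper names below are hypothetical. For an unweighted graph with Laplacian $L = D - A$, the identity $x^\top L x = \sum_{(i,j) \in E} (x_i - x_j)^2$ holds.

```python
def laplacian_matrix(n, edges):
    """Build L = D - A for an undirected, unweighted graph with n nodes."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0   # degree contribution for node i
        L[j][j] += 1.0   # degree contribution for node j
        L[i][j] -= 1.0   # adjacency entries
        L[j][i] -= 1.0
    return L


def quadratic_form(L, x):
    """Compute x^T L x with plain loops."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


def edge_smoothness(edges, x):
    """Equivalent edge-wise form: sum over edges of (x_i - x_j)^2."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)


# Path graph 0 - 1 - 2 - 3 (toy data, not from the paper).
edges = [(0, 1), (1, 2), (2, 3)]
L = laplacian_matrix(4, edges)
smooth = [1.0, 1.0, 1.0, 1.0]    # constant signal: no penalty
rough = [1.0, -1.0, 1.0, -1.0]   # alternating signal: large penalty

penalty_smooth = quadratic_form(L, smooth)
penalty_rough = quadratic_form(L, rough)
```

The constant signal incurs zero penalty while the alternating one is penalized on every edge, which is how such a term injects graph structure into an otherwise structure-agnostic objective.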

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# 検索拡張生成の二重性を明らかにする:理論的解析と実践的解法

Unveil the Duality of Retrieval-Augmented Generation: Theoretical Analysis and Practical Solution ( http://arxiv.org/abs/2406.00944v1 )

ライセンス: Link先を確認

Shicheng Xu, Liang Pang, Huawei Shen, Xueqi Cheng,

(参考訳) Retrieval-augmented Generation (RAG) は、検索したテキストを利用して大規模言語モデル(LLM)を強化する。しかし、研究によると、RAGは一貫して有効というわけではなく、ノイズの多い、あるいは誤った検索テキストのためにLLMを誤解させることさえある。これは、RAGが利益と害の両方を含む二重性を持っていることを示唆している。多くの既存の手法がこの問題に対処しようとしているが、RAGにおける二重性の理論的な説明は欠けている。この二重性における利益と害は、説明可能な方法で定量化または比較できないブラックボックスのままである。本稿では,(1)利益と害をRAGの予測から切り離して形式化し,(2)両者の値の差を表現の類似性によって近似し,(3)両者の間のトレードオフ機構を確立することで,利益と害を説明可能,定量化可能,比較可能にし,RAGにおける利益と害の本質的な説明を理論的に与える第一歩を踏み出す。検索したテキストとLLMの知識との分布の差が諸刃の剣として働き,利益と害の両方をもたらすことを示す。また,RAGの実際の効果がトークンレベルで予測可能であることも証明する。この理論に基づき,純粋なLLMとRAGのトークンレベルでの協調生成を実現し,利益を保ちつつ害を回避する実用的な新手法X-RAGを提案する。 OPT, LLaMA-2, Mistral などの LLM に基づく実世界のタスクでの実験は, 提案手法の有効性を示し, 理論的結果を裏付ける。

Retrieval-augmented generation (RAG) utilizes retrieved texts to enhance large language models (LLMs). However, studies show that RAG is not consistently effective and can even mislead LLMs due to noisy or incorrect retrieved texts. This suggests that RAG possesses a duality including both benefit and detriment. Although many existing methods attempt to address this issue, they lack a theoretical explanation for the duality in RAG. The benefit and detriment within this duality remain a black box that cannot be quantified or compared in an explainable manner. This paper takes the first step in theoretically giving the essential explanation of benefit and detriment in RAG by: (1) decoupling and formalizing them from RAG prediction, (2) approximating the gap between their values by representation similarity and (3) establishing the trade-off mechanism between them, to make them explainable, quantifiable, and comparable. We demonstrate that the distribution difference between retrieved texts and LLMs' knowledge acts as a double-edged sword, bringing both benefit and detriment. We also prove that the actual effect of RAG can be predicted at token level. Based on our theory, we propose a practical novel method, X-RAG, which achieves collaborative generation between pure LLM and RAG at token level to preserve benefit and avoid detriment. Experiments in real-world tasks based on LLMs including OPT, LLaMA-2, and Mistral show the effectiveness of our method and support our theoretical results.

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# 擬似3次元変換に基づく次元横断的な医用自己教師あり表現学習

Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation ( http://arxiv.org/abs/2406.00947v1 )

ライセンス: Link先を確認

Fei Gao, Siwen Wang, Churan Wang, Fandong Zhang, Hong-Yu Zhou, Yizhou Wang, Gang Yu, Yizhou Yu,

(参考訳) 医用画像解析は、アノテーションの有無にかかわらず、データの不足に悩まされる。これは、3Dの医用画像に関してさらに顕著になる。 SSL(Self-Supervised Learning)は、ラベルのないデータを使用することで、この状況を部分的に緩和することができる。しかし、既存のSSLメソッドのほとんどは、単一の次元(例えば2Dや3D)のデータしか利用できず、次元の異なるデータを併用してトレーニングデータセットを拡張することができない。本稿では,2Dデータと3Dデータの両方を共同事前学習に活用できる,擬似3次元変換に基づく新しい次元横断SSLフレームワーク(CDSSL-P3D)を提案する。具体的には、2D画像を3Dデータと整合したフォーマットに変換する、im2colアルゴリズムに基づく画像変換を導入する。この変換は2次元および3次元データのシームレスな統合を可能にし、3次元医用画像解析のための次元横断的な自己教師あり学習を容易にする。我々は,2次元および3次元の分類とセグメンテーションを含む,13の下流タスクについて広範な実験を行った。その結果,CDSSL-P3Dは優れた性能を達成し,他の先進的なSSL手法を上回ることが示された。

Medical image analysis suffers from a shortage of data, whether annotated or not. This becomes even more pronounced when it comes to 3D medical images. Self-Supervised Learning (SSL) can partially ease this situation by using unlabeled data. However, most existing SSL methods can only make use of data in a single dimensionality (e.g. 2D or 3D), and are incapable of enlarging the training dataset by using data with differing dimensionalities jointly. In this paper, we propose a new cross-dimensional SSL framework based on a pseudo-3D transformation (CDSSL-P3D), that can leverage both 2D and 3D data for joint pre-training. Specifically, we introduce an image transformation based on the im2col algorithm, which converts 2D images into a format consistent with 3D data. This transformation enables seamless integration of 2D and 3D data, and facilitates cross-dimensional self-supervised learning for 3D medical image analysis. We run extensive experiments on 13 downstream tasks, including 2D and 3D classification and segmentation. The results indicate that our CDSSL-P3D achieves superior performance, outperforming other advanced SSL methods.
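
One way to picture how an im2col-style operation can give a 2D image a 3D-like layout is sketched below. This is an assumption for illustration only: the abstract does not specify the exact CDSSL-P3D transformation, and the function name `im2col_patches` and the non-overlapping stride are hypothetical choices. Classic im2col flattens each sliding patch into a column. Here each patch is kept as a small 2D slice and the slices are stacked along a new depth axis.

```python
def im2col_patches(image, patch, stride):
    """Extract patch x patch windows (row-major order) and stack them as depth slices."""
    height, width = len(image), len(image[0])
    volume = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            # Each window becomes one slice of the pseudo-3D volume.
            window = [row[left:left + patch] for row in image[top:top + patch]]
            volume.append(window)
    return volume


# Toy 4x4 "image" with values 0..15 (hypothetical data).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
volume = im2col_patches(image, patch=2, stride=2)
depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
```

The result has shape `(depth, rows, cols)`, the same axis layout as a small 3D scan, so a 3D network could in principle consume it alongside genuine volumetric data.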

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# 偽情報とフェイクニュースは公共政策にどのような影響を及ぼすか: 国際文献のレビュー

How disinformation and fake news impact public policies?: A review of international literature ( http://arxiv.org/abs/2406.00951v1 )

ライセンス: Link先を確認

Ergon Cugler de Moraes Silva, Jose Carlos Vaz,

(参考訳) 本研究では,偽情報が公共政策に与える影響について検討する。 8つのデータベースで28組のキーワードを使用して、PRISMA 2020モデル(Page et al., 2021)に従って体系的なレビューを行った。発見された4,128件の論文や資料にフィルターおよび包含・除外基準を適用した結果,46の出版物が分析され,23の偽情報影響カテゴリーが得られた。これらのカテゴリーは、「国家と社会」および「アクターとダイナミクス」という2つの主要な軸に整理され、国家アクター、社会アクター、国家ダイナミクス、社会ダイナミクスへの影響をカバーしている。その結果, 偽情報が公共の意思決定, 政策の遵守, 機関の威信, 現実の認識, 消費, 公衆衛生などの側面に影響を及ぼすことが示された。さらに本研究は, 偽情報を公共の問題として扱い, 公共政策研究の課題に組み込むべきであり, それが政府の行動への影響を緩和する戦略の開発に寄与することを示唆している。

This study investigates the impact of disinformation on public policies. Using 28 sets of keywords in eight databases, a systematic review was carried out following the PRISMA 2020 model (Page et al., 2021). After applying filters and inclusion and exclusion criteria to 4,128 articles and materials found, 46 publications were analyzed, resulting in 23 disinformation impact categories. These categories were organized into two main axes: State and Society and Actors and Dynamics, covering impacts on State actors, society actors, State dynamics and society dynamics. The results indicate that disinformation affects public decisions, adherence to policies, prestige of institutions, perception of reality, consumption, public health and other aspects. Furthermore, this study suggests that disinformation should be treated as a public problem and incorporated into the public policy research agenda, contributing to the development of strategies to mitigate its effects on government actions.

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# アノテーションガイドラインに基づく知識拡張:教育用テキスト分類のための大規模言語モデルの強化に向けて

Annotation Guidelines-Based Knowledge Augmentation: Towards Enhancing Large Language Models for Educational Text Classification ( http://arxiv.org/abs/2406.00954v1 )

ライセンス: Link先を確認

Shiqi Liu, Sannyuya Liu, Lele Sha, Zijie Zeng, Dragan Gasevic, Zhi Liu,

(参考訳) 各種機械学習アプローチは、学習エンゲージメントの指標を識別する教育テキストの自動分類、すなわち学習エンゲージメント分類(LEC)において、大きな人気を得ている。 LECは、人間の学習プロセスに関する包括的な洞察を提供し、自然言語処理(NLP)、学習分析、教育データマイニングなど、さまざまな研究コミュニティから大きな関心を集めている。近年,ChatGPT などの大規模言語モデル (LLM) は,様々な NLP タスクにおいて顕著な性能を示している。しかし, LECタスクにおけるLLMの包括的な評価と改善アプローチは, 十分には検討されていない。本研究では,LLMを改善するためのアノテーションガイドラインに基づく知識拡張(AGKA)アプローチを提案する。 AGKAはGPT 4.0を使用して、アノテーションガイドラインからラベル定義の知識を取得し、次にランダムアンダーサンプラーを適用していくつかの典型的な例を選択する。その後、行動分類(質問と緊急度)、感情分類(二値感情と認識的感情)、認知分類(意見と認知的存在感)をカバーする6つのLECデータセットを含む、LECの体系的な評価ベンチマークを実施する。実験の結果、AGKAは微調整していないLLM(特にGPT 4.0とLlama 3 70B)を強化できることが示された。 AGKAを用いたfew-shotのGPT 4.0は、単純な二値分類データセットにおいて、BERTやRoBERTaのようなフルショットの微調整モデルよりも優れている。しかし、GPT 4.0は複雑な意味情報の深い理解を必要とする多クラスタスクでは後れを取っている。特に、Llama 3 70B と AGKA の組み合わせは、その性能がクローズドソースの GPT 4.0 と AGKA の組み合わせに匹敵するため、オープンソース LLM に基づく有望な組み合わせである。加えて、LLMは、多クラス分類において、類似した名前のラベルを区別するのに苦労している。

Various machine learning approaches have gained significant popularity for the automated classification of educational text to identify indicators of learning engagement -- i.e. learning engagement classification (LEC). LEC can offer comprehensive insights into human learning processes, attracting significant interest from diverse research communities, including Natural Language Processing (NLP), Learning Analytics, and Educational Data Mining. Recently, Large Language Models (LLMs), such as ChatGPT, have demonstrated remarkable performance in various NLP tasks. However, their comprehensive evaluation and improvement approaches in LEC tasks have not been thoroughly investigated. In this study, we propose the Annotation Guidelines-based Knowledge Augmentation (AGKA) approach to improve LLMs. AGKA employs GPT 4.0 to retrieve label definition knowledge from annotation guidelines, and then applies the random under-sampler to select a few typical examples. Subsequently, we conduct a systematic evaluation benchmark of LEC, which includes six LEC datasets covering behavior classification (question and urgency level), emotion classification (binary and epistemic emotion), and cognition classification (opinion and cognitive presence). The study results demonstrate that AGKA can enhance non-fine-tuned LLMs, particularly GPT 4.0 and Llama 3 70B. GPT 4.0 with AGKA few-shot outperforms full-shot fine-tuned models such as BERT and RoBERTa on simple binary classification datasets. However, GPT 4.0 lags in multi-class tasks that require a deep understanding of complex semantic information. Notably, Llama 3 70B with AGKA is a promising combination based on open-source LLM, because its performance is on par with closed-source GPT 4.0 with AGKA. In addition, LLMs struggle to distinguish between labels with similar names in multi-class classification.

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# ビデオ会議はあなたの表情をどのように変えるか

How Video Meetings Change Your Expression ( http://arxiv.org/abs/2406.00955v1 )

ライセンス: Link先を確認

Sumit Sarin, Utkarsh Mall, Purva Tendulkar, Carl Vondrick,

(参考訳) ビデオ通話で話すと表情が変わるのか? 対応付けのない2組の人物ビデオが与えられたとき、各組に特有の時空間パターンを自動的に見つけ出すことを目指す。既存の方法は識別的アプローチを使用して、事後的な説明可能性分析を行う。このような手法は、明らかなデータセットバイアス以上の洞察を与えることができず、その説明は人間自身がそのタスクに長けている場合にのみ有用であるため、不十分である。その代わりに、生成的ドメイン変換の観点からこの問題に取り組む。本手法は、学習された入力依存の時空間的特徴と、それらがドメイン間で変化する程度についての詳細なレポートを生成する。本研究では,対面(F2F)での会話とビデオ通話(VC)での会話の行動の違いを,本手法が発見できることを実証する。また,大統領のコミュニケーションスタイルの違いの発見に対する本手法の適用可能性も示す。さらに、教師なしの方法で表情を分離するビデオ内の時間的変化点を予測でき、モデルの解釈可能性と有用性を高めることができる。最後に,本手法は生成的であるため,ビデオ通話をF2F設定で記録されたかのように見えるよう変換するために使用できる。実験と可視化は、我々のアプローチが様々な行動を発見できることを示しており、人間の行動をより深く理解するための一歩となる。

Do our facial expressions change when we speak over video calls? Given two unpaired sets of videos of people, we seek to automatically find spatio-temporal patterns that are distinctive of each set. Existing methods use discriminative approaches and perform post-hoc explainability analysis. Such methods are insufficient as they are unable to provide insights beyond obvious dataset biases, and the explanations are useful only if humans themselves are good at the task. Instead, we tackle the problem through the lens of generative domain translation: our method generates a detailed report of learned, input-dependent spatio-temporal features and the extent to which they vary between the domains. We demonstrate that our method can discover behavioral differences between conversing face-to-face (F2F) and on video-calls (VCs). We also show the applicability of our method on discovering differences in presidential communication styles. Additionally, we are able to predict temporal change-points in videos that decouple expressions in an unsupervised way, and increase the interpretability and usefulness of our model. Finally, our method, being generative, can be used to transform a video call to appear as if it were recorded in a F2F setting. Experiments and visualizations show our approach is able to discover a range of behaviors, taking a step towards deeper understanding of human behaviors.

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# Segment Anythingをオンザフライで改善する:医用画像セグメンテーションのための補助的オンライン学習と適応的融合

Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation ( http://arxiv.org/abs/2406.00956v1 )

ライセンス: Link先を確認

Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang,

(参考訳) オリジナルのSAMやMedical SAMを含む、SAM(Segment Anything Model)の現在の変種は、医用画像に対して十分に正確なセグメンテーションを生成する能力がまだ不足している。医用画像の文脈では、SAMがセグメンテーション予測を生成した後、人間の専門家が特定のテストサンプルのセグメンテーションを修正することは珍しくない。これらの修正は通常、最先端のアノテーションツールを使用した手動または半手動の修正を伴う。このプロセスに着想を得て、オンライン機械学習の利点を活用して、テスト時にSegment Anything(SA)を強化する新しいアプローチを導入する。医用画像におけるSAのセグメンテーション品質を改善することを目的として,修正されたアノテーションを用いてオンライン学習を行う。 SAMのような大規模ビジョンモデルと統合した際のオンライン学習の有効性と効率を向上させるため,AuxOL(Auxiliary Online Learning)と呼ばれる新しい手法を提案する。 AuxOLはSAM(ジェネラリスト)と連携して小さな補助モデル(スペシャリスト)を作成・適用し、適応的なオンラインバッチと適応的なセグメンテーション融合を伴う。 4つの医用画像モダリティをカバーする8つのデータセットを用いた実験により,提案手法の有効性を検証した。本研究は,下流のセグメンテーションタスク(例えば,医用画像セグメンテーション)におけるSAを強化するための,新しく,実用的で効果的なアプローチを提案し,検証する。

The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entail manual or semi-manual corrections employing state-of-the-art annotation tools. Motivated by this process, we introduce a novel approach that leverages the advantages of online machine learning to enhance Segment Anything (SA) during test time. We employ rectified annotations to perform online learning, with the aim of improving the segmentation quality of SA on medical images. To improve the effectiveness and efficiency of online learning when integrated with large-scale vision models like SAM, we propose a new method called Auxiliary Online Learning (AuxOL). AuxOL creates and applies a small auxiliary model (specialist) in conjunction with SAM (generalist), and entails adaptive online-batch and adaptive segmentation fusion. Experiments conducted on eight datasets covering four medical imaging modalities validate the effectiveness of the proposed method. Our work proposes and validates a new, practical, and effective approach for enhancing SA on downstream segmentation tasks (e.g., medical image segmentation).

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# 矛盾する視点をナビゲートする:学習のために信頼を活用する

Navigating Conflicting Views: Harnessing Trust for Learning ( http://arxiv.org/abs/2406.00958v1 )

ライセンス: Link先を確認

Jueqing Lu, Lan Du, Wray Buntine, Myong Chol Jung, Joanna Dipnall, Belinda Gabbe,

(参考訳) 対立を解決することは、多視点分類の決定をより信頼できるものにするために不可欠である。すべての視点が等しく重要であり、厳密に整合していると仮定して、異なる視点間で一貫した有益な表現を学習する研究が数多く行われてきた。しかし、一部の視点は異なる情報を表現する可能性があるため、現実のマルチビューデータは必ずしもこれらの仮定に従わない。この問題に対処するために、異なる視点間の対立が発生しうるシナリオにおいて既存の信頼できるフレームワークを強化する、計算的信頼に基づく割引手法を開発した。その信念融合プロセスは、インスタンス単位で確率に応じた信頼割引機構を通じて、個々の視点による予測の信頼性を考慮する。提案手法は、Top-1精度、AUC-ROC for Uncertainty-Aware Prediction、Fleiss' Kappa、および正解ラベルを考慮したMulti-View Agreement with Ground Truthという新たな指標を用いて、実世界の6つのデータセットで評価した。実験結果から、計算的信頼が対立を効果的に解決でき、実世界のアプリケーションにおいてより信頼性の高いマルチビュー分類モデルへの道を開くことが示された。

Resolving conflicts is essential to make the decisions of multi-view classification more reliable. Much research has been conducted on learning consistent informative representations among different views, assuming that all views are identically important and strictly aligned. However, real-world multi-view data may not always conform to these assumptions, as some views may express distinct information. To address this issue, we develop a computational trust-based discounting method to enhance the existing trustworthy framework in scenarios where conflicts between different views may arise. Its belief fusion process considers the trustworthiness of predictions made by individual views via an instance-wise probability-sensitive trust discounting mechanism. We evaluate our method on six real-world datasets, using Top-1 Accuracy, AUC-ROC for Uncertainty-Aware Prediction, Fleiss' Kappa, and a new metric called Multi-View Agreement with Ground Truth that takes into consideration the ground truth labels. The experimental results show that computational trust can effectively resolve conflicts, paving the way for more reliable multi-view classification models in real-world applications.
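The abstract above lists Fleiss' Kappa among its evaluation metrics. As a rough illustration only, the standard Fleiss' kappa formula can be sketched as below. Treating each view's prediction as one "rater" is an assumption made here for illustration; the abstract does not say how the authors compute the metric, and the function name `fleiss_kappa` is hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Standard Fleiss' kappa.

    ratings: list of items, each a list with one label per rater
             (here, hypothetically, one predicted label per view).
    categories: list of all possible labels.
    Note: undefined (division by zero) when chance agreement equals 1.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][c] = number of raters that assigned category c to item i
    counts = [Counter(item) for item in ratings]
    # Mean observed agreement across items
    p_bar = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items
    # Chance agreement from the marginal category proportions
    p_e = sum(
        (sum(c[cat] for c in counts) / (n_items * n_raters)) ** 2
        for cat in categories
    )
    return (p_bar - p_e) / (1 - p_e)


# Three views that always agree give kappa = 1.0
print(fleiss_kappa([[0, 0, 0], [1, 1, 1]], [0, 1]))
```

Values near 1 indicate that the views agree far more often than chance would predict; values at or below 0 indicate conflict between views.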

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# 動的ユーザ参加によるフェデレーション・アンラーニングにおけるデータプライバシの保証

Guaranteeing Data Privacy in Federated Unlearning with Dynamic User Participation ( http://arxiv.org/abs/2406.00966v1 )

ライセンス: Link先を確認

Ziyao Liu, Yu Jiang, Weifeng Jiang, Jiale Guo, Jun Zhao, Kwok-Yan Lam,

(参考訳) フェデレーテッド・アンラーニング(FU)は、訓練済みのグローバルFLモデルからFL(Federated Learning)ユーザのデータの影響を取り除く能力により注目を集めている。単純なFU手法では、アンラーニング対象のユーザを削除したのち、残りのすべてのユーザで新しいグローバルFLモデルをスクラッチから再訓練するが、このプロセスは大きなオーバーヘッドを伴う。アンラーニング効率を高めるために広く採用されている戦略はクラスタリングであり、FLユーザをクラスタに分割し、各クラスタが独自のFLモデルを維持する。最終的な推論は、これらのサブモデルの推論の多数決を集約することで決定される。この方法では、ユーザを除去するためのアンラーニング処理が個々のクラスタ内に限定され、残りの全ユーザの参加が不要になるため、アンラーニング効率が向上する。しかし、現在のクラスタリングベースのFU方式は、アンラーニング効率を高めるためのクラスタリングの改良に主に注力しており、広く研究されてきたプライバシー上の懸念である、FLユーザの勾配からの情報漏洩の可能性を見落としている。通常、各クラスタにセキュアアグリゲーション(SecAgg)方式を統合することで、プライバシ保護型のFUが可能になる。それでも、SecAgg方式をシームレスに組み込むクラスタリング手法の構築は、特に敵対的ユーザや動的ユーザを含むシナリオでは困難である。本稿では、最も広く使用されているクラスタリングベースのフェデレーテッド・アンラーニング方式にSecAggプロトコルを統合することを体系的に検討し、動的なユーザ参加を効果的に管理しながらプライバシを確保することを目的とした、プライバシ保護型FUフレームワークを確立する。包括的な理論的評価と実験結果から、提案方式は同等のアンラーニング効果を達成するとともに、ユーザ参加の変動に対するプライバシ保護とレジリエンスを向上させることが示された。

Federated Unlearning (FU) is gaining prominence for its capacity to eliminate influences of Federated Learning (FL) users' data from trained global FL models. A straightforward FU method involves removing the unlearned users and subsequently retraining a new global FL model from scratch with all remaining users, a process that leads to considerable overhead. To enhance unlearning efficiency, a widely adopted strategy employs clustering, dividing FL users into clusters, with each cluster maintaining its own FL model. The final inference is then determined by aggregating the majority vote from the inferences of these sub-models. This method confines unlearning processes to individual clusters for removing a user, thereby enhancing unlearning efficiency by eliminating the need for participation from all remaining users. However, current clustering-based FU schemes mainly concentrate on refining clustering to boost unlearning efficiency but overlook the potential information leakage from FL users' gradients, a privacy concern that has been extensively studied. Typically, integrating secure aggregation (SecAgg) schemes within each cluster can facilitate a privacy-preserving FU. Nevertheless, crafting a clustering methodology that seamlessly incorporates SecAgg schemes is challenging, particularly in scenarios involving adversarial users and dynamic users. In this connection, we systematically explore the integration of SecAgg protocols within the most widely used federated unlearning scheme, which is based on clustering, to establish a privacy-preserving FU framework, aimed at ensuring privacy while effectively managing dynamic user participation. Comprehensive theoretical assessments and experimental results show that our proposed scheme achieves comparable unlearning effectiveness, alongside offering improved privacy protection and resilience in the face of varying user participation.
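To make the secure aggregation (SecAgg) idea in the abstract above more concrete, here is a minimal sketch of pairwise additive masking within one cluster. It is a toy under stated assumptions, not the paper's protocol: a seeded `random.Random` stands in for pairwise-agreed secrets, updates are already quantized to integers, and user dropout (which real SecAgg protocols handle) is ignored. The names `mask_updates`, `aggregate`, and `MODULUS` are hypothetical.

```python
import random

MODULUS = 2 ** 32  # toy ring size; an assumption for this sketch


def mask_updates(updates, seed=0):
    """Mask each user's update so that only the sum is recoverable.

    updates: dict mapping user id -> list of ints (quantized gradients).
    """
    rng = random.Random(seed)  # stand-in for pairwise shared secrets
    users = sorted(updates)
    dim = len(updates[users[0]])
    masked = {u: list(updates[u]) for u in users}
    for a in range(len(users)):
        for b in range(a + 1, len(users)):
            pair_mask = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                # The first user of the pair adds the mask, the second subtracts it
                masked[users[a]][k] = (masked[users[a]][k] + pair_mask[k]) % MODULUS
                masked[users[b]][k] = (masked[users[b]][k] - pair_mask[k]) % MODULUS
    return masked


def aggregate(masked):
    """Server-side sum; the pairwise masks cancel modulo MODULUS."""
    dim = len(next(iter(masked.values())))
    return [sum(v[k] for v in masked.values()) % MODULUS for k in range(dim)]


cluster_updates = {"u1": [1, 2], "u2": [3, 4], "u3": [5, 6]}
print(aggregate(mask_updates(cluster_updates)))  # [9, 12]
```

Because every mask is added once and subtracted once, the server learns the cluster sum without seeing any individual update. Handling users who join or leave mid-round is exactly the part this sketch omits.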

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# RLを用いて分断的な視点を特定することで、ソーシャルメディア上のコミュニティを識別するLLMの能力が向上する

Using RL to Identify Divisive Perspectives Improves LLMs Abilities to Identify Communities on Social Media ( http://arxiv.org/abs/2406.00969v1 )

ライセンス: Link先を確認

Nikhil Mehta, Dan Goldwasser,

(参考訳) ソーシャルメディアの大規模な利用とその大きな影響力により、ソーシャルメディアを理解することがますます重要になっている。特に、ユーザコミュニティを特定することは、多くの下流タスクに役立つ。しかし、モデルが過去のデータで訓練され、将来のデータでテストされる場合には特に、これは難しい。本稿では、大規模言語モデル(LLM)を活用することでユーザコミュニティをより適切に特定できるという仮説を立てる。ChatGPTなど多くのLLMは固定されておりブラックボックスとして扱わなければならないため、より小規模なLLMを訓練して、それらに対してより良いプロンプトを与えるアプローチを提案する。この小さなモデルを訓練するための戦略を考案し、それがより大きなLLMのコミュニティ検出能力をどのように改善できるかを示す。実験の結果、RedditとTwitterのデータにおいて、コミュニティ検出、ボット検出、ニュースメディアのプロファイリングのタスクで改善が見られた。

The large-scale usage of social media, combined with its significant impact, has made it increasingly important to understand it. In particular, identifying user communities can be helpful for many downstream tasks. However, doing this is difficult, particularly when models are trained on past data and tested on future data. In this paper, we hypothesize that Large Language Models (LLMs) can be leveraged to better identify user communities. Because many LLMs, such as ChatGPT, are fixed and must be treated as black boxes, we propose an approach to better prompt them by training a smaller LLM to do this. We devise strategies to train this smaller model, showing how it can improve the larger LLM's ability to detect communities. Experimental results show improvements on Reddit and Twitter data, on the tasks of community detection, bot detection, and news media profiling.

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# MiniGPT-Reverse-Designing: MiniGPT-4を用いた画像調整予測

MiniGPT-Reverse-Designing: Predicting Image Adjustments Utilizing MiniGPT-4 ( http://arxiv.org/abs/2406.00971v1 )

ライセンス: Link先を確認

Vahid Azizi, Fatemeh Koochaki,

(参考訳) VLM(Vision-Language Models)は近年、LLM(Large Language Models)との統合によって大幅な進歩を遂げている。画像とテキストのモダリティを同時に処理するVLMは、様々なマルチモーダルタスクにおいて画像とテキストの相互作用を学習し、理解する能力を示している。複雑な視覚言語タスクとして定義できるリバースデザインは、ソース画像、編集後の画像、任意の高レベルなテキスト編集記述が与えられたときに、編集内容とそのパラメータを予測することを目的とする。このタスクでは、VLMは従来の視覚言語タスクを超えて、ソース画像、編集後の画像、任意のテキストコンテキストの間の相互作用を同時に理解する必要がある。本稿では、リバースデザインタスクのためにMiniGPT-4を拡張し、微調整する。実験により、リバースデザインのようなより複雑なタスクに対する既製のVLM、特にMiniGPT-4の拡張性を示す。コードは https://github.com/VahidAz/MiniGPT-Reverse-Designing で公開されている。

Vision-Language Models (VLMs) have recently seen significant advancements through integrating with Large Language Models (LLMs). The VLMs, which process image and text modalities simultaneously, have demonstrated the ability to learn and understand the interaction between images and texts across various multi-modal tasks. Reverse designing, which could be defined as a complex vision-language task, aims to predict the edits and their parameters, given a source image, an edited version, and an optional high-level textual edit description. This task requires VLMs to comprehend the interplay between the source image, the edited version, and the optional textual context simultaneously, going beyond traditional vision-language tasks. In this paper, we extend and fine-tune MiniGPT-4 for the reverse designing task. Our experiments demonstrate the extensibility of off-the-shelf VLMs, specifically MiniGPT-4, for more complex tasks such as reverse designing. Code is available at https://github.com/VahidAz/MiniGPT-Reverse-Designing.

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# パーソナライズされた埋め込み領域の引き出しによるコールドスタート推薦

Cold-start Recommendation by Personalized Embedding Region Elicitation ( http://arxiv.org/abs/2406.00973v1 )

ライセンス: Link先を確認

Hieu Trung Nguyen, Duy Nguyen, Khoa Doan, Viet Anh Nguyen,

(参考訳) レーティング・エリシテーション(rating elicitation、評価の引き出し)は、ユーザの嗜好について事前知識がないまま新規ユーザにアイテムを推薦しなければならないコールドスタート時に、推薦システムがうまく機能するための重要な要素である。既存のエリシテーション手法は、固定されたアイテムセットを用いてユーザの嗜好を学習し、残りのアイテムに対するユーザの嗜好を推定する。固定のシードセットは、潜在的に多様な嗜好を持つすべての新規ユーザにとって最適である可能性が低いため、推薦システムの性能を制限しうる。本稿では、この課題に2段階のパーソナライズされたエリシテーション方式で対処する。まず「burn-in」フェーズにおいて、人気アイテムの小さなセットを評価するようユーザに依頼する。次に、嗜好とユーザの表現を洗練させるために、適応的に選んだアイテムの評価を順次依頼する。プロセス全体を通して、システムはユーザの埋め込み値を点推定ではなく領域推定で表す。あるアイテムに対するユーザの評価を尋ねることで得られる情報の価値は、ユーザの真の埋め込み値を高い信頼度で含む埋め込み空間内の領域の中心からの距離によって定量化される。最後に、ユーザの嗜好領域を考慮して推薦を順次生成する。エリシテーション方式の各部分問題は効率的に実装できることを示す。さらに、いくつかの主要なデータセットにおいて、既存のレーティング・エリシテーション手法に対する提案手法の有効性を実証的に示す。

Rating elicitation is a key element for recommender systems to perform well at cold-starting, in which the systems need to recommend items to a newly arrived user with no prior knowledge about the user's preference. Existing elicitation methods employ a fixed set of items to learn the user's preference and then infer the user's preferences on the remaining items. Using a fixed seed set can limit the performance of the recommendation system since the seed set is unlikely to be optimal for all new users with potentially diverse preferences. This paper addresses this challenge using a 2-phase, personalized elicitation scheme. First, the elicitation scheme asks users to rate a small set of popular items in a "burn-in" phase. Second, it sequentially asks the user to rate adaptive items to refine the preference and the user's representation. Throughout the process, the system represents the user's embedding value not by a point estimate but by a region estimate. The value of information obtained by asking the user's rating on an item is quantified by the distance from the center of the region in the embedding space that contains, with high confidence, the true embedding value of the user. Finally, the recommendations are successively generated by considering the preference region of the user. We show that each subproblem in the elicitation scheme can be efficiently implemented. Further, we empirically demonstrate the effectiveness of the proposed method against existing rating-elicitation methods on several prominent datasets.

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# Luna: 高精度で低コストな言語モデル幻覚をキャッチするための評価基礎モデル

Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost ( http://arxiv.org/abs/2406.00975v1 )

ライセンス: Link先を確認

Masha Belyi, Robert Friel, Shuai Shao, Atindriyo Sanyal,

(参考訳) Retriever Augmented Generation (RAG) システムは、外部知識の検索機構を組み込むことで言語モデルの能力を高める上で重要な役割を担うようになった。しかし、これらのシステムを産業アプリケーションに展開する上での大きな課題は、幻覚、すなわち検索されたコンテキストに根拠のない情報をモデルが生成する事例の検出と緩和である。この問題への対処は、多様な産業環境において大規模言語モデル(LLM)が生成する応答の信頼性と正確性を確保するために不可欠である。現在の幻覚検出技術は、精度、低レイテンシ、低コストを同時に実現できない。本稿では、RAG設定における幻覚検出のために微調整されたDeBERTA-large (440M)エンコーダであるLunaを紹介する。Lunaは幻覚検出タスクにおいてGPT-3.5および商用評価フレームワークを上回り、コストとレイテンシをそれぞれ97%と96%削減することを示す。Lunaは軽量で、複数の産業分野とドメイン外データにわたって汎化するため、産業向けLLMアプリケーションの理想的な候補となる。

Retriever Augmented Generation (RAG) systems have become pivotal in enhancing the capabilities of language models by incorporating external knowledge retrieval mechanisms. However, a significant challenge in deploying these systems in industry applications is the detection and mitigation of hallucinations: instances where the model generates information that is not grounded in the retrieved context. Addressing this issue is crucial for ensuring the reliability and accuracy of responses generated by large language models (LLMs) in diverse industry settings. Current hallucination detection techniques fail to deliver accuracy, low latency, and low cost simultaneously. We introduce Luna: a DeBERTA-large (440M) encoder, finetuned for hallucination detection in RAG settings. We demonstrate that Luna outperforms GPT-3.5 and commercial evaluation frameworks on the hallucination detection task, with 97% and 96% reduction in cost and latency, respectively. Luna is lightweight and generalizes across multiple industry verticals and out-of-domain data, making it an ideal candidate for industry LLM applications.

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# 効率的な階層型Transformerを用いた生成的事前学習音声言語モデル

Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer ( http://arxiv.org/abs/2406.00976v1 )

ライセンス: Link先を確認

Yongxin Zhu, Dan Su, Liqiang He, Linli Xu, Dong Yu,

(参考訳) 近年の音声言語モデルは大きな進歩を遂げているが、ニューラルオーディオコーデックの長い音響系列をモデル化する上で大きな課題に直面している。本稿では、効率的な音声言語モデリングのために設計された階層型Transformerである Generative Pre-trained Speech Transformer (GPST) を紹介する。GPSTは、音声波形を2種類の離散音声表現に量子化し、それらを階層型Transformerアーキテクチャに統合することで、統一された1段階の生成プロセスを可能にし、Hi-Res音声生成能力を高める。大規模な音声コーパスでエンドツーエンドの教師なし学習を行うことにより、GPSTは多様な話者性を持つ、構文的に一貫した音声を生成できる。3秒の短いプロンプトが与えられると、GPSTは自然で一貫性のあるパーソナライズされた音声を生成でき、文脈内学習能力を示す。さらに、多言語の意味トークンと普遍的な音響トークンを組み込むことで、本手法は言語横断の音声生成へ容易に拡張できる。実験結果から、GPSTは単語誤り率、音声品質、話者類似度において既存の音声言語モデルを有意に上回ることが示された。デモサンプルは https://youngsheen.github.io/GPST/demo を参照。

While recent advancements in speech language models have achieved significant progress, they face remarkable challenges in modeling the long acoustic sequences of neural audio codecs. In this paper, we introduce Generative Pre-trained Speech Transformer (GPST), a hierarchical transformer designed for efficient speech language modeling. GPST quantizes audio waveforms into two distinct types of discrete speech representations and integrates them within a hierarchical transformer architecture, allowing for a unified one-stage generation process and enhancing Hi-Res audio generation capabilities. By training on large speech corpora in an end-to-end unsupervised manner, GPST can generate syntactically consistent speech with diverse speaker identities. Given a brief 3-second prompt, GPST can produce natural and coherent personalized speech, demonstrating in-context learning abilities. Moreover, our approach can be easily extended to spoken cross-lingual speech generation by incorporating multi-lingual semantic tokens and universal acoustic tokens. Experimental results indicate that GPST significantly outperforms the existing speech language models in terms of word error rate, speech quality, and speaker similarity. See https://youngsheen.github.io/GPST/demo for demo samples.

翻訳日:2024-06-06 02:37:18 公開日:2024-06-03

# Dragonfly: マルチ解像度ズームが大規模視覚言語モデルを強化する

Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model ( http://arxiv.org/abs/2406.00977v1 )

ライセンス: Link先を確認

Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou,

(参考訳) 大規模マルチモーダルモデル(LMM)の最近の進歩は、画像解像度を高めることで画像の細部に対するきめ細かい理解が向上することを示唆しており、これは視覚的常識推論やバイオメディカル画像解析といったタスクにおいて重要である。しかし、入力解像度の増大は2つの大きな課題をもたらす。 1) 言語モデルに必要なコンテキスト長が伸び、非効率になるとともに、モデルのコンテキスト長の上限に達してしまう。 2) 視覚特徴の複雑さが増し、より多くの訓練データやより複雑なアーキテクチャが必要になる。これらの課題に対処するため、画像領域に対するきめ細かい視覚的理解と推論を強化する新しいLMMアーキテクチャDragonflyを導入する。Dragonflyは、マルチ解像度の視覚エンコーディングとズームインパッチ選択という2つの重要な戦略を採用する。これらの戦略により、モデルは適切なコンテキスト長を維持しつつ、高解像度画像を効率的に処理できる。一般的な8つのベンチマークでの実験では、Dragonflyは他のアーキテクチャと比較して同等以上の性能を達成し、設計の有効性が示された。さらに、Dragonflyをバイオメディカル指示データで微調整し、Path-VQAデータセットでの92.3%の精度(Med-Geminiは83.3%)や、バイオメディカル画像キャプションにおける報告された最高の結果を含め、きめ細かい視覚的理解を必要とする複数のバイオメディカルタスクで最先端の結果を得た。モデルの訓練を支援するため、一般領域の550万件の画像・指示サンプルと、バイオメディカル領域の140万件のサンプルからなる視覚的指示チューニングデータセットを整備した。また、様々なアーキテクチャ設計や画像解像度の影響を明らかにするアブレーション研究を行い、視覚的指示アライメントに関する今後の研究への洞察を提供する。コードベースとモデルは https://github.com/togethercomputer/Dragonfly で公開されている。

Recent advances in large multimodal models (LMMs) suggest that higher image resolution enhances the fine-grained understanding of image details, crucial for tasks such as visual commonsense reasoning and analyzing biomedical images. However, increasing input resolution poses two main challenges: 1) It extends the context length required by the language model, leading to inefficiencies and hitting the model's context limit; 2) It increases the complexity of visual features, necessitating more training data or more complex architecture. We introduce Dragonfly, a new LMM architecture that enhances fine-grained visual understanding and reasoning about image regions to address these challenges. Dragonfly employs two key strategies: multi-resolution visual encoding and zoom-in patch selection. These strategies allow the model to process high-resolution images efficiently while maintaining reasonable context length. Our experiments on eight popular benchmarks demonstrate that Dragonfly achieves competitive or better performance compared to other architectures, highlighting the effectiveness of our design. Additionally, we finetuned Dragonfly on biomedical instructions, achieving state-of-the-art results on multiple biomedical tasks requiring fine-grained visual understanding, including 92.3% accuracy on the Path-VQA dataset (compared to 83.3% for Med-Gemini) and the highest reported results on biomedical image captioning. To support model training, we curated a visual instruction-tuning dataset with 5.5 million image-instruction samples in the general domain and 1.4 million samples in the biomedical domain. We also conducted ablation studies to characterize the impact of various architectural designs and image resolutions, providing insights for future research on visual instruction alignment. The codebase and model are available at https://github.com/togethercomputer/Dragonfly.

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03

# 視覚的質問を選択的に答える

Selectively Answering Visual Questions ( http://arxiv.org/abs/2406.00980v1 )

ライセンス: Link先を確認

Julian Martin Eisenschlos, Hernán Maina, Guido Ivetta, Luciana Benotti,

(参考訳) 近年、大規模マルチモーダルモデル(LMM)が登場し、キャプション生成や視覚質問応答(VQA)などの視覚タスクを前例のない精度で実行できるようになった。視覚障害者を支援するようなアプリケーションでは、正確な答えが不可欠である。モデルが適切に較正され、不確実性を定量化できることは、いつ答え、いつ回答を控えるか、あるいは明確化を求めるかを選択的に決定するために特に重要である。本研究では、文脈内学習を行うLMMによるVQAのための較正手法と指標について、初めての詳細な分析を行う。2つの回答可能性ベンチマークでVQAを調べたところ、文脈内学習において、視覚的に接地されたモデルの尤度スコアはテキストのみのモデルよりもよく較正されていることが示された。一方、サンプリングに基づく手法は一般に優れているものの、明確な勝者は現れなかった。モダリティをまたいでサンプリング法と尤度法の両方の利点を組み合わせた較正スコアであるAvg BLEUを提案する。

Recently, large multi-modal models (LMMs) have emerged with the capacity to perform vision tasks such as captioning and visual question answering (VQA) with unprecedented accuracy. Applications such as helping the blind or visually impaired have a critical need for precise answers. It is especially important for models to be well calibrated and be able to quantify their uncertainty in order to selectively decide when to answer and when to abstain or ask for clarifications. We perform the first in-depth analysis of calibration methods and metrics for VQA with in-context learning LMMs. Studying VQA on two answerability benchmarks, we show that the likelihood score of visually grounded models is better calibrated than in their text-only counterparts for in-context learning, where sampling-based methods are generally superior, but no clear winner arises. We propose Avg BLEU, a calibration score combining the benefits of both sampling and likelihood methods across modalities.
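As a rough illustration of the selective-answering idea in the abstract above, the sketch below uses a simple sampling-based confidence: draw several answers, take the share that agrees with the most common one, and abstain when that share falls below a threshold. This is not the paper's Avg BLEU score, whose definition the abstract does not give; the function name `selective_answer` and the 0.6 threshold are assumptions chosen for the example.

```python
from collections import Counter


def selective_answer(sampled_answers, threshold=0.6):
    """Return (answer, confidence); answer is None when the model should abstain.

    sampled_answers: answers sampled from a model for the same question.
    threshold: hypothetical minimum agreement required to answer.
    """
    normalized = [a.strip().lower() for a in sampled_answers]
    top_answer, votes = Counter(normalized).most_common(1)[0]
    confidence = votes / len(normalized)
    if confidence < threshold:
        return None, confidence  # abstain or ask for clarification
    return top_answer, confidence


print(selective_answer(["Red", "red", "blue", "red"]))  # ('red', 0.75)
print(selective_answer(["a cat", "a dog", "a fox"])[0])  # None
```

A well-calibrated confidence score is one where, among questions answered at confidence 0.75, roughly three quarters are correct; the threshold then trades coverage against risk.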

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03

# 精華を取り、糟粕を捨てよ! 反事実的因果効果による有害言語検出のデバイアス

Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect ( http://arxiv.org/abs/2406.00983v1 )

ライセンス: Link先を確認

Junyu Lu, Bo Xu, Xiaokun Zhang, Kaiyuan Liu, Dongyu Zhang, Liang Yang, Hongfei Lin,

(参考訳) 現在の有害言語検出(TLD)手法は、通常、特定のトークンに依存して判定を行うため、語彙バイアスの影響を受け、性能や汎化能力が低下する。語彙バイアスは、有害性の理解に対して「有用な」影響と「誤解を招く」影響の両方を持つ。残念なことに、現在のデバイアス手法はこれらの影響を区別せず、一般に無差別に除去してしまうため、モデルの検出精度が低下する。そこで本研究では、TLDにおける語彙バイアスを軽減するために、CCDF(Counterfactual Causal Debiasing Framework)を提案する。CCDFは語彙バイアスの「有用な影響」を保持し、「誤解を招く影響」を除去する。具体的には、まず、原文とバイアスのあるトークンが判定に及ぼす総合効果を因果的観点から表現する。次に、反事実推論を行い、語彙バイアスの直接的な因果効果を総合効果から取り除く。実証評価により、CCDFを組み込んだデバイアス済みTLDモデルは、複数のバニラモデルに適用した競合ベースラインと比較して、精度と公平性の両方で最先端の性能を達成することが示された。また、分布外データに対する本モデルの汎化能力は、現在のデバイアス済みモデルを上回る。

Current methods of toxic language detection (TLD) typically rely on specific tokens to make decisions, which makes them suffer from lexical bias, leading to inferior performance and generalization. Lexical bias has both "useful" and "misleading" impacts on understanding toxicity. Unfortunately, instead of distinguishing between these impacts, current debiasing methods typically eliminate them indiscriminately, resulting in a degradation in the detection accuracy of the model. To this end, we propose a Counterfactual Causal Debiasing Framework (CCDF) to mitigate lexical bias in TLD. It preserves the "useful impact" of lexical bias and eliminates the "misleading impact". Specifically, we first represent the total effect of the original sentence and biased tokens on decisions from a causal view. We then conduct counterfactual inference to exclude the direct causal effect of lexical bias from the total effect. Empirical evaluations demonstrate that the debiased TLD model incorporating CCDF achieves state-of-the-art performance in both accuracy and fairness compared to competitive baselines applied to several vanilla models. The generalization capability of our model outperforms current debiased models for out-of-distribution data.
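The step "exclude the direct causal effect of lexical bias from the total effect" can be pictured, very loosely, as subtracting the logits of a bias-only branch from the logits of the full model. The sketch below shows only that arithmetic. It is an assumption about the general shape of counterfactual debiasing, not the CCDF formulation, which the abstract does not spell out; `debias_logits`, `predict`, and the scaling factor `alpha` are hypothetical names.

```python
def debias_logits(total_logits, bias_only_logits, alpha=1.0):
    """Subtract the (scaled) direct effect of biased tokens from the total effect.

    total_logits: class logits from the full sentence (total effect).
    bias_only_logits: class logits from a branch that sees only the biased
                      tokens (a stand-in for the direct effect).
    alpha: hypothetical weight on the subtracted direct effect.
    """
    return [t - alpha * b for t, b in zip(total_logits, bias_only_logits)]


def predict(logits):
    """Index of the highest-scoring class."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Class 0 = non-toxic, class 1 = toxic (labels assumed for this example).
total = [1.0, 2.0]      # full model leans toxic
bias_only = [0.0, 1.5]  # most of that comes from a trigger word alone
print(predict(total))                            # 1
print(predict(debias_logits(total, bias_only)))  # 0
```

In this toy case the prediction flips once the contribution that the trigger word makes on its own is removed, which is the "misleading impact" the abstract refers to.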

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03

# 単語埋め込みを用いたアナロジー課題による薬物・遺伝子関係の予測

Predicting Drug-Gene Relations via Analogy Tasks with Word Embeddings ( http://arxiv.org/abs/2406.00984v1 )

ライセンス: Link先を確認

Hiroaki Yamagiwa, Ryoma Hashimoto, Kiwamu Arakane, Ken Murakami, Shou Soeda, Momose Oyama, Mariko Okada, Hidetoshi Shimodaira,

(参考訳) 自然言語処理(NLP)は幅広い分野で利用されており、テキスト中の単語は通常、埋め込みと呼ばれる特徴ベクトルに変換される。BioConceptVecは生物学向けに調整された埋め込みの具体例であり、skip-gramなどのモデルを用いて約3000万件のPubMedアブストラクトで訓練されている。一般に、単語埋め込みは単純なベクトル演算によってアナロジータスクを解けることが知られている。例えば、$\mathrm{\textit{king}} - \mathrm{\textit{man}} + \mathrm{\textit{woman}}$ は $\mathrm{\textit{queen}}$ を予測する。本研究では、BioConceptVecの埋め込みと、PubMedアブストラクトで独自に訓練した埋め込みが、薬物と遺伝子の関係に関する情報を含んでおり、アナロジー計算によって与えられた薬物から標的遺伝子を予測できることを示す。また、生物学的パスウェイを用いて薬物と遺伝子を分類することで性能が向上することを示す。さらに、年ごとに分割したデータセットにおいて、過去の既知の関係から導出したベクトルが、未知の将来の関係を予測できることを示す。

Natural language processing (NLP) is utilized in a wide range of fields, where words in text are typically transformed into feature vectors called embeddings. BioConceptVec is a specific example of embeddings tailored for biology, trained on approximately 30 million PubMed abstracts using models such as skip-gram. Generally, word embeddings are known to solve analogy tasks through simple vector arithmetic. For instance, $\mathrm{\textit{king}} - \mathrm{\textit{man}} + \mathrm{\textit{woman}}$ predicts $\mathrm{\textit{queen}}$. In this study, we demonstrate that BioConceptVec embeddings, along with our own embeddings trained on PubMed abstracts, contain information about drug-gene relations and can predict target genes from a given drug through analogy computations. We also show that categorizing drugs and genes using biological pathways improves performance. Furthermore, we illustrate that vectors derived from known relations in the past can predict unknown future relations in datasets divided by year.
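The analogy computation described in the abstract above can be sketched with plain vector arithmetic. In this minimal example, the relation vector is taken as the mean offset (gene minus drug) over known pairs, which is an assumption made for illustration; the abstract does not state how the authors build it. The embeddings are made-up 2-D toy vectors, not BioConceptVec, and all names (`predict_target`, `drug_a`, `gene_a`, and so on) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_target(drug, known_pairs, emb, gene_names):
    """Predict a gene for `drug` via drug + mean(gene - drug) over known pairs."""
    dim = len(emb[drug])
    # Mean offset from drug embeddings to their known target genes
    offset = [
        sum(emb[g][k] - emb[d][k] for d, g in known_pairs) / len(known_pairs)
        for k in range(dim)
    ]
    query = [emb[drug][k] + offset[k] for k in range(dim)]
    # Nearest gene to the query vector by cosine similarity
    return max(gene_names, key=lambda g: cosine(query, emb[g]))


# Toy embeddings: every gene sits one unit "above" its drug.
emb = {
    "drug_a": [1.0, 0.0], "gene_a": [1.0, 1.0],
    "drug_b": [2.0, 0.0], "gene_b": [2.0, 1.0],
    "drug_c": [-1.0, 0.0], "gene_c": [-1.0, 1.0],
}
known = [("drug_a", "gene_a"), ("drug_b", "gene_b")]
print(predict_target("drug_c", known, emb, ["gene_a", "gene_b", "gene_c"]))  # gene_c
```

This mirrors the king/man/woman/queen example: a relation learned from known pairs is added to a new query and the nearest candidate is returned.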

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03

# MultiEdits: テキストから画像への拡散モデルによる同時マルチアスペクト編集

MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models ( http://arxiv.org/abs/2406.00985v1 )

ライセンス: Link先を確認

Mingzhen Huang, Jialing Cai, Shan Jia, Vishnu Suresh Lokhande, Siwei Lyu,

(参考訳) テキスト駆動画像合成は、テキストプロンプトから視覚コンテンツがどのように生成されるかを変える拡散モデルの開発において、大きな進歩を遂げた。これらの進歩にもかかわらず、コンピュータグラフィックスの重要な領域であるテキスト駆動画像編集は、ユニークな課題に直面している。最大の課題は、複数のオブジェクトや属性を同時に編集することだ。マルチアスペクト編集にこれらの手法を順次適用すると、計算要求と効率損失が増大する。本稿では,これらの課題に多大な貢献をしながら対処する。私たちの主な貢献は、複数の属性をまたいだ同時編集をシームレスに管理するメソッドであるMultiEditsの開発です。従来のアプローチとは対照的に、MultiEditsは単一の属性編集の品質を保持するだけでなく、マルチタスク編集のパフォーマンスを大幅に改善する。これは、革新的な注意分布機構と、複数の処理ヘッドをまたいで動作するマルチブランチ設計によって実現される。さらに、元のPIE-Benchデータセットを拡張したPIE-Bench++データセットを導入し、複数のオブジェクトと属性を含む画像編集タスクの評価を同時にサポートする。このデータセットは、多面的シナリオにおけるテキスト駆動画像編集手法を評価するためのベンチマークである。データセットとコードはhttps://mingzhenhuang.com/projects/MultiEdits.htmlで公開されている。

Text-driven image synthesis has made significant advancements with the development of diffusion models, transforming how visual content is generated from text prompts. Despite these advances, text-driven image editing, a key area in computer graphics, faces unique challenges. A major challenge is making simultaneous edits across multiple objects or attributes. Applying these methods sequentially for multi-aspect edits increases computational demands and efficiency losses. In this paper, we address these challenges with significant contributions. Our main contribution is the development of MultiEdits, a method that seamlessly manages simultaneous edits across multiple attributes. In contrast to previous approaches, MultiEdits not only preserves the quality of single attribute edits but also significantly improves the performance of multitasking edits. This is achieved through an innovative attention distribution mechanism and a multi-branch design that operates across several processing heads. Additionally, we introduce the PIE-Bench++ dataset, an expansion of the original PIE-Bench dataset, to better support evaluating image-editing tasks involving multiple objects and attributes simultaneously. This dataset is a benchmark for evaluating text-driven image editing methods in multifaceted scenarios. Dataset and code are available at https://mingzhenhuang.com/projects/MultiEdits.html.

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03

# ディスエンタングルメントによる教師なしグラフ異常検出における公平性向上

Enhancing Fairness in Unsupervised Graph Anomaly Detection through Disentanglement ( http://arxiv.org/abs/2406.00987v1 )

ライセンス: Link先を確認

Wenjing Chang, Kay Liu, Philip S. Yu, Jianjun Yu,

(参考訳) グラフ異常検出(GAD)は、金融詐欺検出からフェイクニュース検出まで、さまざまなアプリケーションにおいてますます重要になっている。しかし、現在のGAD手法は公平性の問題をほとんど見落としており、センシティブ属性(例えば、性別、宗教、民族など)で定義される特定の人口集団に偏った差別的な判断をもたらす可能性がある。これは、社会的・倫理的制約を踏まえると、現実世界のシナリオにおけるこれらの手法の適用可能性を大きく制限する。この重要なギャップに対処するため、GADの意思決定において公平性と有用性を統合する初めての試みを行う。具体的には、属性付きグラフ上で、DEFENDと名付けた新しいDisEntangle-based FairnEss-aware aNomaly Detectionフレームワークを考案する。DEFENDはまず、GNNにディスエンタングルメントを導入して、情報量が多く、かつセンシティブ属性とは無関係なノード表現を捉え、グラフ表現学習に内在する社会的バイアスを効果的に低減する。さらに、異常ノードの評価における差別的バイアスを軽減するために、DEFENDは、グラフ構造を一切組み込まずノード属性のみに着目する再構成ベースの異常検出を採用する。加えて、入力属性とセンシティブ属性の間に内在する関連を踏まえ、DEFENDは再構成誤差と予測されたセンシティブ属性との相関を制約する。実世界のデータセットでの実証評価から、DEFENDはGADにおいて効果的に機能し、最先端のベースラインと比較して公平性を大幅に向上させることが明らかになった。再現性を高めるため、コードは https://github.com/AhaChang/DEFEND で公開している。

Graph anomaly detection (GAD) is increasingly crucial in various applications, ranging from financial fraud detection to fake news detection. However, current GAD methods largely overlook the fairness problem, which might result in discriminatory decisions skewed toward certain demographic groups defined on sensitive attributes (e.g., gender, religion, ethnicity, etc.). This greatly limits the applicability of these methods in real-world scenarios in light of societal and ethical restrictions. To address this critical gap, we make the first attempt to integrate fairness with utility in GAD decision-making. Specifically, we devise a novel DisEntangle-based FairnEss-aware aNomaly Detection framework on the attributed graph, named DEFEND. DEFEND first introduces disentanglement in GNNs to capture informative yet sensitive-irrelevant node representations, effectively reducing societal bias inherent in graph representation learning. Besides, to alleviate discriminatory bias in evaluating anomalous nodes, DEFEND adopts a reconstruction-based anomaly detection, which concentrates solely on node attributes without incorporating any graph structure. Additionally, given the inherent association between input and sensitive attributes, DEFEND constrains the correlation between the reconstruction error and the predicted sensitive attributes. Our empirical evaluations on real-world datasets reveal that DEFEND performs effectively in GAD and significantly enhances fairness compared to state-of-the-art baselines. To foster reproducibility, our code is available at https://github.com/AhaChang/DEFEND.

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03

# 軌道最適化のための制約を考慮した拡散モデル

Constraint-Aware Diffusion Models for Trajectory Optimization ( http://arxiv.org/abs/2406.00990v1 )

ライセンス: Link先を確認

Anjian Li, Zihan Ding, Adji Bousso Dieng, Ryne Beeson,

(参考訳) 拡散モデルは、軌道最適化問題に対する高品質で多様な解の生成に成功している。しかし、ニューラルネットワークを用いた拡散モデルは必然的に予測誤差を生じ、目標の未達成や衝突といった制約違反を引き起こす。本稿では、軌道最適化のための新しい制約考慮型拡散モデルを提案する。元のデータ分布を復元しつつ、正解データと比較した拡散サンプルの制約違反を最小化する、訓練用の新しいハイブリッド損失関数を導入する。本モデルを卓上マニピュレーション問題と2台の車のreach-avoid問題で実証し、局所最適解に近いサンプルを生成しながら、制約違反の最小化において従来の拡散モデルを上回ることを示す。

The diffusion model has shown success in generating high-quality and diverse solutions to trajectory optimization problems. However, diffusion models with neural networks inevitably make prediction errors, which leads to constraint violations such as unmet goals or collisions. This paper presents a novel constraint-aware diffusion model for trajectory optimization. We introduce a novel hybrid loss function for training that minimizes the constraint violation of diffusion samples compared to the ground truth while recovering the original data distribution. Our model is demonstrated on tabletop manipulation and two-car reach-avoid problems, outperforming traditional diffusion models in minimizing constraint violations while generating samples close to locally optimal solutions.
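One way to picture a "hybrid loss" of the kind the abstract above describes is a standard denoising term plus a weighted penalty on constraint violation. The sketch below is a toy under stated assumptions, not the paper's loss: the constraint is a single circular obstacle that 2-D waypoints must stay outside, the penalty is mean penetration depth, and the names `obstacle_violation`, `hybrid_loss`, and `weight` are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def obstacle_violation(trajectory, center, radius):
    """Mean penetration depth of 2-D waypoints into a circular obstacle."""
    total = 0.0
    for x, y in trajectory:
        dist = ((x - center[0]) ** 2 + (y - center[1]) ** 2) ** 0.5
        total += max(0.0, radius - dist)  # zero when the waypoint is outside
    return total / len(trajectory)


def hybrid_loss(pred_noise, true_noise, pred_trajectory, center, radius, weight=1.0):
    """Denoising loss plus a weighted constraint-violation penalty."""
    denoise_term = mse(pred_noise, true_noise)
    violation_term = obstacle_violation(pred_trajectory, center, radius)
    return denoise_term + weight * violation_term


# One waypoint at the obstacle center (depth 1.0), one well outside (depth 0.0)
trajectory = [(0.0, 0.0), (2.0, 0.0)]
print(hybrid_loss([0.0], [0.0], trajectory, (0.0, 0.0), 1.0))  # 0.5
```

The weight controls the trade-off the abstract mentions: a larger value pushes samples toward feasibility, while the denoising term keeps them close to the data distribution.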

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03

# Distributional Refinement Network: 深層学習による分布予測

Distributional Refinement Network: Distributional Forecasting via Deep Learning ( http://arxiv.org/abs/2406.00998v1 )

ライセンス: Link先を確認

Benjamin Avanzi, Eric Dong, Patrick J. Laub, Bernard Wong,

(参考訳) 保険数理モデリングにおける重要なタスクは、損失の分布特性をモデル化することである。一般化線形モデル(GLM; Nelder and Wedderburn, 1972)のような古典的な(分布)回帰アプローチが一般的に用いられるが、次の条件を満たすモデルの開発には課題が残っている。 (i) 共変量が条件付き分布のさまざまな側面に柔軟に影響を与えられるようにすること、(ii) (i)を考慮しつつ、機械学習とAIの発展を取り入れて予測力を最大化すること、(iii) モデルとその出力に対する信頼を高めるために一定の解釈可能性を維持すること。この解釈可能性は、(i)と(ii)を追求する過程でしばしば損なわれる。我々は、本質的に解釈可能なベースラインモデル(GLMなど)と、柔軟なニューラルネットワークである改良版Deep Distribution Regression(DDR; Li et al., 2019)手法を組み合わせたDistributional Refinement Network(DRN)を提案することで、この問題に取り組む。Combined Actuarial Neural Network(CANN; Schelldorfer and Wüthrich, 2019)に着想を得た本アプローチは、ベースライン分布全体を柔軟に洗練する。その結果、DRNはすべての分位点にわたる特徴量の多様な効果を捉え、適切な解釈可能性を維持しながら予測性能を向上させる。合成データと実世界のデータの両方を用いて、DRNの優れた分布予測能力を示す。DRNは、保険数理学およびそれ以外の分野において、強力な分布回帰モデルになる可能性を持っている。

A key task in actuarial modelling involves modelling the distributional properties of losses. Classic (distributional) regression approaches like Generalized Linear Models (GLMs; Nelder and Wedderburn, 1972) are commonly used, but challenges remain in developing models that can (i) allow covariates to flexibly impact different aspects of the conditional distribution, (ii) integrate developments in machine learning and AI to maximise the predictive power while considering (i), and (iii) maintain a level of interpretability in the model to enhance trust in the model and its outputs, which is often compromised in efforts pursuing (i) and (ii). We tackle this problem by proposing a Distributional Refinement Network (DRN), which combines an inherently interpretable baseline model (such as GLMs) with a flexible neural network, a modified Deep Distribution Regression (DDR; Li et al., 2019) method. Inspired by the Combined Actuarial Neural Network (CANN; Schelldorfer and Wüthrich, 2019), our approach flexibly refines the entire baseline distribution. As a result, the DRN captures varying effects of features across all quantiles, improving predictive performance while maintaining adequate interpretability. Using both synthetic and real-world data, we demonstrate the DRN's superior distributional forecasting capacity. The DRN has the potential to be a powerful distributional regression model in actuarial science and beyond.

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03

# 木を通して森を見る:部分変圧器勾配からのデータ漏洩

Seeing the Forest through the Trees: Data Leakage from Partial Transformer Gradients ( http://arxiv.org/abs/2406.00999v1 )

ライセンス: Link先を確認

Weijun Li, Qiongkai Xu, Mark Dras,

(参考訳) 近年の研究では、分散機械学習は勾配反転攻撃に弱いことが示されており、トレーニングで共有されるモデルの勾配を分析することで、プライベートトレーニングデータを再構成することができる。以前の攻撃では、モデル全体の全てのパラメータからの勾配を使えば、そのような再構成が可能であることが示された。しかし我々は、関係するモジュールやそのサブモジュールのほとんどにトレーニングデータ漏洩のリスクがあるという仮説を立て、言語モデルの様々な中間層でそのような脆弱性を検証する。広範な実験により、単一のトランスフォーマー層、あるいはパラメータの0.54%しか持たない単一の線形コンポーネントからの勾配であっても、トレーニングデータ漏洩を起こしうることが判明した。さらに、トレーニング中の勾配に差分プライバシーを適用しても、このデータ開示の新たな脆弱性に対しては限定的な保護しか得られないことを示す。

Recent studies have shown that distributed machine learning is vulnerable to gradient inversion attacks, where private training data can be reconstructed by analyzing the gradients of the models shared in training. Previous attacks established that such reconstructions are possible using gradients from all parameters of the entire model. However, we hypothesize that most of the involved modules, or even their sub-modules, are at risk of training data leakage, and we validate such vulnerabilities in various intermediate layers of language models. Our extensive experiments reveal that gradients from a single Transformer layer, or even a single linear component with 0.54% of the parameters, are susceptible to training data leakage. Additionally, we show that applying differential privacy on gradients during training offers limited protection against the novel vulnerability of data disclosure.

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
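Why gradients from a single linear component can leak data is easiest to see in a toy case. The sketch below is not the attack from the paper above; it assumes one dense layer with a bias, a single training example, and a squared-error loss (all illustrative choices). In that setting the weight gradient is the outer product of the output error and the input, and the bias gradient is the output error itself, so dividing one row of the weight gradient by the matching bias gradient returns the private input.

```python
def linear_forward(W, b, x):
    """Compute y = W x + b for a dense layer stored as nested lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def linear_gradients(W, b, x, target):
    """Gradients of 0.5 * ||y - target||^2 with respect to W and b."""
    y = linear_forward(W, b, x)
    delta = [y_i - t_i for y_i, t_i in zip(y, target)]  # dL/dy
    grad_W = [[d * x_j for x_j in x] for d in delta]     # dL/dW = delta x^T
    grad_b = delta                                        # dL/db = delta
    return grad_W, grad_b


def recover_input(grad_W, grad_b, eps=1e-12):
    """Recover x from the shared gradients alone: x = grad_W[i] / grad_b[i]."""
    for row, g_b in zip(grad_W, grad_b):
        if abs(g_b) > eps:  # any row with a non-zero bias gradient works
            return [g / g_b for g in row]
    raise ValueError("all bias gradients are zero; input cannot be recovered")


# Hypothetical layer and private example, chosen only for illustration.
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
b = [0.1, -0.2]
private_x = [3.0, -2.0, 0.5]

grad_W, grad_b = linear_gradients(W, b, private_x, target=[0.0, 1.0])
recovered_x = recover_input(grad_W, grad_b)
```

Real attacks on language models must cope with batches, token embeddings, and missing bias terms, so they rely on optimization rather than this closed form.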

# Uni-ISP: 複数のカメラからISPを学ぶこと

Uni-ISP: Unifying the Learning of ISPs from Multiple Cameras ( http://arxiv.org/abs/2406.01003v1 )

ライセンス: Link先を確認

Lingen Li, Mingde Yao, Xingyu Meng, Muquan Yu, Tianfan Xue, Jinwei Gu,

(参考訳) 現代のエンドツーエンドの画像信号プロセッサ(ISP)はRAW/XYZデータからsRGB(あるいは逆)への複雑なマッピングを学習し、画像処理の新たな可能性を開く。しかし、カメラモデルの多様性が拡大し続けているため、個々のISPの開発とメンテナンスは長期的には持続可能ではなく、本質的には汎用性に欠けており、複数のカメラモデルへの適応性を妨げている。本稿では,複数のカメラからISPを学習するための新しいパイプラインUni-ISPを提案する。 Uni-ISPの中核は、逆/フォワードISPとその特別なトレーニングスキームを学習することで、デバイス対応の埋め込みを活用することである。これにより、Uni-ISPは、逆/フォワードISPのパフォーマンスを向上するだけでなく、既存の学習ISPにはアクセスできない様々な新しいアプリケーションをアンロックする。さらに,複数のカメラで同期して撮影するデータセットは存在しないため,実世界の4KデータセットであるFiveCamを構築し,SRGB-RAW画像の2400組以上を5台のスマートフォンで同期的に撮影する。 Inverse/forward ISPsにおけるUni-ISPの精度(+1.5dB/2.4dB PSNRの改善)、新しいアプリケーションの実現における汎用性、新しいカメラモデルへの適応性など、幅広い実験を行った。

Modern end-to-end image signal processors (ISPs) can learn complex mappings from RAW/XYZ data to sRGB (or inverse), opening new possibilities in image processing. However, as the diversity of camera models continues to expand, developing and maintaining individual ISPs is not sustainable in the long term, which inherently lacks versatility, hindering the adaptability to multiple camera models. In this paper, we propose a novel pipeline, Uni-ISP, which unifies the learning of ISPs from multiple cameras, offering an accurate and versatile processor to multiple camera models. The core of Uni-ISP is leveraging device-aware embeddings through learning inverse/forward ISPs and its special training scheme. By doing so, Uni-ISP not only improves the performance of inverse/forward ISPs but also unlocks a variety of new applications inaccessible to existing learned ISPs. Moreover, since there is no dataset synchronously captured by multiple cameras for training, we construct a real-world 4K dataset, FiveCam, comprising more than 2,400 pairs of sRGB-RAW images synchronously captured by five smartphones. We conducted extensive experiments demonstrating Uni-ISP's accuracy in inverse/forward ISPs (with improvements of +1.5dB/2.4dB PSNR), its versatility in enabling new applications, and its adaptability to new camera models.

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
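The gains quoted in the abstract above are given in PSNR, a logarithmic measure of mean squared error. The following sketch uses the standard definition, PSNR = 10 log10(MAX^2 / MSE), and is not the paper's evaluation code; the function name and the flat-list image representation are assumptions made for brevity.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01, i.e. about 20 dB.
example_db = psnr([0.0, 0.0], [0.1, 0.1])

# Because the scale is logarithmic, a gain of +1.5 dB means the MSE shrank
# by a factor of 10 ** (1.5 / 10), roughly 1.41.
mse_ratio_for_1_5_db = 10 ** (1.5 / 10)
```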

# SemCoder: 包括的なセマンティクスによるコード言語モデルのトレーニング

SemCoder: Training Code Language Models with Comprehensive Semantics ( http://arxiv.org/abs/2406.01006v1 )

ライセンス: Link先を確認

Yangruibo Ding, Jinjun Peng, Marcus J. Min, Gail Kaiser, Junfeng Yang, Baishakhi Ray,

(参考訳) コードLLM(Code Large Language Models)は、コード補完のようなタスクに優れていますが、実行効果や動的状態のようなより深いセマンティクスを見逃すことがよくあります。本稿では,静的テキストデータへのコードLLMの依存と,デバッグやプログラムの修復といった複雑なタスクに対する詳細な意味理解の必要性のギャップを埋めることを目的としている。本稿では,高レベルの機能記述,個々の文の局所的な実行効果,入力/出力動作全般を包含し,静的コードテキストを動的実行状態にリンクする,包括的セマンティクスによるコードLLMのトレーニング手法を提案する。まずは、機能記述と実行トレースを備えた、完全に実行可能なサンプルのクリーンコードコーパスであるPyXの収集から始めます。我々は、自然言語を用いてコードを書き、実行動作を表現し、推論するためのCode LLMのトレーニングを提案し、人間の言葉によるデバッグを模倣する。このアプローチは、コード生成と実行の推論タスクにおいてGPT-3.5-turboと競合する性能を示す6.7Bパラメータしか持たないコードLLMであるSemCoderの開発につながった。 SemCoderはHumanEval(GPT-3.5-turbo:76.8%)で81.1%、CRUXEval-I(GPT-3.5-turbo:50.3%)で54.5%を達成した。また,具体的なスクラッチパッド推論と比較して,SemCoderのモノローグスタイルの実行推論の有効性について検討し,複数の次元のセマンティクスをよりスムーズに統合することを示す。最後に、学習したセマンティクスを適用して、コードLLMのデバッグと自己修正機能を改善する可能性を実証する。

Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap between Code LLMs' reliance on static text data and the need for thorough semantic understanding for complex tasks like debugging and program repair. We introduce a novel strategy to train Code LLMs with comprehensive semantics, encompassing high-level functional descriptions, local execution effects of individual statements, and overall input/output behavior, thereby linking static code text with dynamic execution states. We begin by collecting PyX, a clean code corpus of fully executable samples with functional descriptions and execution tracing. We propose training Code LLMs to write code and represent and reason about execution behaviors using natural language, mimicking human verbal debugging. This approach led to the development of SemCoder, a Code LLM with only 6.7B parameters, which shows competitive performance with GPT-3.5-turbo on code generation and execution reasoning tasks. SemCoder achieves 81.1% on HumanEval (GPT-3.5-turbo: 76.8%) and 54.5% on CRUXEval-I (GPT-3.5-turbo: 50.3%). We also study the effectiveness of SemCoder's monologue-style execution reasoning compared to concrete scratchpad reasoning, showing that our approach integrates semantics from multiple dimensions more smoothly. Finally, we demonstrate the potential of applying learned semantics to improve Code LLMs' debugging and self-refining capabilities.

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03

# イメージングレーダ3次元物体検出に基づく多対象追跡

Multi-Object Tracking based on Imaging Radar 3D Object Detection ( http://arxiv.org/abs/2406.01011v1 )

ライセンス: Link先を確認

Patrick Palmer, Martin Krüger, Richard Altendorfer, Torsten Bertram,

(参考訳) 周辺交通参加者の効果的な追跡により、将来の行動予測やエゴ車両軌道の適切な計画に必要となる正確な状態推定が可能となる。周辺交通の参加者を検知・追跡するためのアプローチは、学習に基づく物体検出と古典的な追跡アルゴリズムの組み合わせである。学習に基づく物体検出器はライダーとカメラのデータに適切に対応し、学習に基づく物体検出器は標準のレーダーデータ入力により劣っていることが示されている。近年,レーダセンサ技術の改良により,レーダの物体検出性能は大幅に向上したが,レーダ点雲の広さによりライダーセンサに制限が加えられている。これは、多目的追跡のタスクに特有の課題である。追跡アルゴリズムは、一貫したトラックを生成しながら、限られた検出品質を克服しなければならない。この目的のために、下流タスクの可能性を調べるために、レーダデータに対する異なるマルチオブジェクト追跡手法の比較が必要である。この研究は、複数のアプローチを比較し、レーダーデータに適用した場合の限界を分析します。さらに, この課題に対して, 確率的アソシエーションアルゴリズムによる提案手法の強化が検討されている。

Effective tracking of surrounding traffic participants allows for an accurate state estimation as a necessary ingredient for prediction of future behavior and therefore adequate planning of the ego vehicle trajectory. One approach for detecting and tracking surrounding traffic participants is the combination of a learning based object detector with a classical tracking algorithm. Learning based object detectors have been shown to work adequately on lidar and camera data, while learning based object detectors using standard radar data input have proven to be inferior. Recently, with the improvements to radar sensor technology in the form of imaging radars, the object detection performance on radar was greatly improved but is still limited compared to lidar sensors due to the sparsity of the radar point cloud. This presents a unique challenge for the task of multi-object tracking. The tracking algorithm must overcome the limited detection quality while generating consistent tracks. To this end, a comparison between different multi-object tracking methods on imaging radar data is required to investigate its potential for downstream tasks. The work at hand compares multiple approaches and analyzes their limitations when applied to imaging radar data. Furthermore, enhancements to the presented approaches in the form of probabilistic association algorithms are considered for this task.

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03

# テンソル積表現の注意に基づく反復分解

Attention-based Iterative Decomposition for Tensor Product Representation ( http://arxiv.org/abs/2406.01012v1 )

ライセンス: Link先を確認

Taewon Park, Inchul Choi, Minho Lee,

(参考訳) 近年の研究では、データの構成構造を学習することにより、ディープニューラルネットワークの体系的一般化タスクにテンソル製品表現(TPR)を適用している。しかし、これらの先行研究は、その構造表現への分解が不完全であるため、目に見えないテストデータからシンボル構造を発見し、表現する上で、限られた性能を示した。本研究では,TPRを用いた逐次入力データから符号化された構造化表現の分解操作を強化するために,Attention-based Iterative Decomposition (AID)モジュールを提案する。我々のAIDは、任意のTPRモデルに容易に適応でき、入力特徴と構造化表現との間の競合的な注意機構を通じて、体系的な分解を提供する。本実験では,一連の系統的一般化タスクにおいて,TPRに基づく先行作業の性能を大幅に向上させることにより,AIDの有効性を示す。さらに、定量的および定性的な評価では、AIDは他の作品よりも構成的および十分有界な構造表現を生成する。

In recent research, Tensor Product Representation (TPR) is applied for the systematic generalization task of deep neural networks by learning the compositional structure of data. However, such prior works show limited performance in discovering and representing the symbolic structure from unseen test data because their decomposition to the structural representations was incomplete. In this work, we propose an Attention-based Iterative Decomposition (AID) module designed to enhance the decomposition operations for the structured representations encoded from the sequential input data with TPR. Our AID can be easily adapted to any TPR-based model and provides enhanced systematic decomposition through a competitive attention mechanism between input features and structured representations. In our experiments, AID shows effectiveness by significantly improving the performance of TPR-based prior works on the series of systematic generalization tasks. Moreover, in the quantitative and qualitative evaluations, AID produces more compositional and well-bound structural representations than other works.

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
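For readers unfamiliar with TPR, the sketch below shows the classical binding and unbinding operations that decomposition modules such as AID feed into. It illustrates the general TPR formulation only, not the AID module itself: each filler vector is bound to a role vector by an outer product, the bindings are summed into one tensor, and a filler is read back by multiplying the tensor with its role. The example roles and fillers are hypothetical, and exact recovery assumes orthonormal roles.

```python
def outer(u, v):
    """Outer product of two vectors as a nested list."""
    return [[u_i * v_j for v_j in v] for u_i in u]


def bind(fillers, roles):
    """Build T = sum_i filler_i (outer) role_i."""
    T = [[0.0] * len(roles[0]) for _ in fillers[0]]
    for filler, role in zip(fillers, roles):
        binding = outer(filler, role)
        T = [[t + s for t, s in zip(t_row, s_row)]
             for t_row, s_row in zip(T, binding)]
    return T


def unbind(T, role):
    """Compute T @ role; with orthonormal roles this returns the bound filler."""
    return [sum(t_ij * r_j for t_ij, r_j in zip(row, role)) for row in T]


# Two orthonormal roles (for example "subject" and "object") and their fillers.
roles = [[1.0, 0.0], [0.0, 1.0]]
fillers = [[2.0, -1.0, 0.5], [0.0, 3.0, 1.0]]

T = bind(fillers, roles)
recovered = unbind(T, roles[1])  # filler bound to the second role
```

When the learned roles are not orthogonal, unbinding mixes fillers together, which is one way an incomplete decomposition can hurt generalization.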

# 報酬の過剰最適化を緩和するためのスケーラブルなアンサンブル

Scalable Ensembling For Mitigating Reward Overoptimisation ( http://arxiv.org/abs/2406.01013v1 )

ライセンス: Link先を確認

Ahmed M. Ahmed, Rafael Rafailov, Stepan Sharkov, Xuechen Li, Sanmi Koyejo,

(参考訳) Reinforcement Learning from Human Feedback (RLHF)は、強力な命令追従モデルのための言語モデリングにおける大幅な進歩を可能にした。しかしながら、これらのモデルのアライメントは依然として差し迫った課題である。より高性能な「gold」報酬モデルで測定した有用性がある変曲点を過ぎると、ポリシーは学習した「proxy」報酬モデルに過度に適合する傾向があり、これは過剰最適化(over-optimization)として知られる現象である。先行研究は、報酬モデルのアンサンブルに対する悲観的な統計量を計算することでこの問題を緩和してきた。この方法はオフライン強化学習では一般的だが、メモリ要求の高い言語モデルでは非常にコストがかかるため、十分に大きなモデルでは実現できない。そこで我々は、共有エンコーダと個別の線形ヘッドを用いることを提案する。これは完全なアンサンブルと同程度のパフォーマンスをもたらしながら、同程度のサイズのモデルのトレーニングに必要なメモリと時間の大幅な節約を可能にする。

Reinforcement Learning from Human Feedback (RLHF) has enabled significant advancements within language modeling for powerful, instruction-following models. However, the alignment of these models remains a pressing challenge as the policy tends to overfit the learned "proxy" reward model past an inflection point of utility as measured by a more performant "gold" reward model, a phenomenon known as over-optimization. Prior work has mitigated this issue by computing a pessimistic statistic over an ensemble of reward models, which is common in Offline Reinforcement Learning but incredibly costly for language models with high memory requirements, making such approaches infeasible for sufficiently large models. To this end, we propose using a shared encoder but separate linear heads. We find this leads to similar performance as the full ensemble while allowing tremendous savings in memory and time required for training for models of similar size.

翻訳日:2024-06-06 02:27:34 公開日:2024-06-03
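The idea of a shared encoder with separate linear heads can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the encoder is a stand-in identity function, the head weights are made up, and the pessimistic statistic (mean minus `beta` times the standard deviation, with the minimum shown as an alternative) is an assumption, since the abstract does not say which statistic is used.

```python
import statistics


def encode(features):
    """Stand-in for the expensive shared encoder (here simply the identity)."""
    return list(features)


def head_rewards(embedding, heads):
    """Apply each cheap linear head (weights, bias) to the same embedding."""
    return [sum(w * h for w, h in zip(weights, embedding)) + bias
            for weights, bias in heads]


def pessimistic_reward(rewards, beta=1.0):
    """Mean reward penalised by the disagreement between heads."""
    return statistics.fmean(rewards) - beta * statistics.pstdev(rewards)


# Three hypothetical linear heads sharing one embedding.
heads = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([0.5, 0.5], 1.0)]

embedding = encode([2.0, 4.0])            # the encoder runs once per sample
rewards = head_rewards(embedding, heads)  # one scalar reward per head
score = pessimistic_reward(rewards, beta=1.0)
worst_case = min(rewards)                 # another common pessimistic choice
```

The memory saving comes from storing one encoder plus a handful of weight vectors instead of several full reward models.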

# Mobile-Agent-v2:マルチエージェントコラボレーションによる効果的なナビゲーション機能を備えたモバイルデバイス操作アシスタント

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration ( http://arxiv.org/abs/2406.01014v1 )

ライセンス: Link先を確認

Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang,

(参考訳) モバイルデバイス操作タスクは、一般的なマルチモーダルAIアプリケーションシナリオになりつつある。現在のMLLM(Multi-modal Large Language Models)は、訓練データによる制約のため、操作アシスタントとして効果的に機能する能力を欠いている。代わりに、ツール呼び出しによる機能強化を行うMLLMベースのエージェントが、このシナリオに徐々に適用されている。しかし、モバイル機器操作タスクにおける2つの大きなナビゲーション課題、すなわちタスク進捗ナビゲーションとフォーカスコンテンツナビゲーションは、既存研究の単一エージェントアーキテクチャの下ではかなり複雑になる。これは、非常に長いトークンシーケンスと、インターリーブされたテキスト・画像のデータフォーマットが性能を制限するためである。これらのナビゲーション課題を効果的に解決するために,モバイルデバイス操作支援のためのマルチエージェントアーキテクチャであるMobile-Agent-v2を提案する。アーキテクチャは、計画エージェント、決定エージェント、反射エージェントの3つのエージェントから構成される。計画エージェントはタスク進捗を生成し、履歴操作のナビゲーションをより効率的にする。フォーカス内容を維持するため、タスクの進捗に応じて更新するメモリユニットを設計する。さらに、誤った操作を正すために、反射エージェントは各操作の結果を観察し、それに応じて誤りを処理する。実験の結果, Mobile-Agent-v2は, Mobile-Agentの単一エージェントアーキテクチャに比べてタスク完了率が30%以上向上していることがわかった。コードは https://github.com/X-PLUG/MobileAgent で公開されている。

Mobile device operation tasks are increasingly becoming a popular multi-modal AI application scenario. Current Multi-modal Large Language Models (MLLMs), constrained by their training data, lack the capability to function effectively as operation assistants. Instead, MLLM-based agents, which enhance capabilities through tool invocation, are gradually being applied to this scenario. However, the two major navigation challenges in mobile device operation tasks, task progress navigation and focus content navigation, are significantly complicated under the single-agent architecture of existing work. This is due to the overly long token sequences and the interleaved text-image data format, which limit performance. To address these navigation challenges effectively, we propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance. The architecture comprises three agents: planning agent, decision agent, and reflection agent. The planning agent generates task progress, making the navigation of history operations more efficient. To retain focus content, we design a memory unit that updates with task progress. Additionally, to correct erroneous operations, the reflection agent observes the outcomes of each operation and handles any mistakes accordingly. Experimental results indicate that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture of Mobile-Agent. The code is open-sourced at https://github.com/X-PLUG/MobileAgent.

翻訳日:2024-06-06 02:17:50 公開日:2024-06-03

# 変分モンテカルロ法におけるニューラル量子状態:簡単な概要

Neural Quantum States in Variational Monte Carlo Method: A Brief Summary ( http://arxiv.org/abs/2406.01017v1 )

ライセンス: Link先を確認

Yuntai Song,

(参考訳) 本稿では,スピン系の量子状態に基づく変分モンテカルロ法について概説する。ニューラルネットワークを波動関数として使用すると、その非線型活性化関数と密接に関連している非常に非局所的な相互作用を含む、様々な種類の相互作用のより一般化された表現が可能になる。さらに、ニューラルネットワークは、高次元システムを扱う場合、比較的小さな計算資源を持つ比較的複雑な波動関数を表現できる。量子状態トモグラフィーにおいて、ニューラル量子状態の表現法はすでに大きな成果を上げており、より大きなシステムを扱う可能性を示している。

In this note, variational Monte Carlo method based on neural quantum states for spin systems is reviewed. Using a neural network as the wave function allows for a more generalized expression of various types of interactions, including highly non-local interactions, which are closely related to its non-linear activation functions. Additionally, neural networks can represent relatively complex wave functions with relatively small computational resources when dealing with higher-dimensional systems, which is undoubtedly a "flattening" advantage. In quantum-state tomography, the representation method of neural quantum states has already achieved significant results, hinting at its potential in handling larger-sized systems.

翻訳日:2024-06-06 02:17:50 公開日:2024-06-03

# マルチレベルVAEと逆学習を用いたテキスト音声のアクセント変換

Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training ( http://arxiv.org/abs/2406.01018v1 )

ライセンス: Link先を確認

Jan Melechovsky, Ambuj Mehrish, Berrak Sisman, Dorien Herremans,

(参考訳) 急速なグローバル化により、包括的で代表的な音声技術を構築する必要性は、いくら強調してもし過ぎることはない。アクセントは、包括的音声合成装置を構築する際に考慮すべき音声の重要な側面である。包括的音声技術は、特定のアクセントを持つ人々のような特定のグループに対する偏見を消すことを目的としている。アクセントに焦点を絞らずに高品質な音声を生成するように設計されているため、現状のTTS(Text-to-Speech)システムは、背景に関係なく、現在すべての人に適していない可能性があることに留意する。本稿では,TTSにおけるアクセント付き音声合成と変換に対応するために,敵対的学習を伴うマルチレベル変分オートエンコーダを用いたTTSモデルを提案し、将来のより包括的なシステムを展望する。客観的指標と主観的聴取テストの両方によって性能を評価した。その結果,アクセント変換能力はベースラインに比べて向上した。

With rapid globalization, the need to build inclusive and representative speech technology cannot be overstated. Accent is an important aspect of speech that needs to be taken into consideration while building inclusive speech synthesizers. Inclusive speech technology aims to erase any biases towards specific groups, such as people with a certain accent. We note that state-of-the-art Text-to-Speech (TTS) systems may currently not be suitable for all people, regardless of their background, as they are designed to generate high-quality voices without focusing on accent. In this paper, we propose a TTS model that utilizes a Multi-Level Variational Autoencoder with adversarial learning to address accented speech synthesis and conversion in TTS, with a vision for more inclusive systems in the future. We evaluate the performance through both objective metrics and subjective listening tests. The results show an improvement in accent conversion ability compared to the baseline.

翻訳日:2024-06-06 02:17:50 公開日:2024-06-03

# CLIP-Guided Attribute Aware Pretraining for Generalizable Image Quality Assessment

CLIP-Guided Attribute Aware Pretraining for Generalizable Image Quality Assessment ( http://arxiv.org/abs/2406.01020v1 )

ライセンス: Link先を確認

Daekyu Kwon, Dongyoung Kim, Sehwan Ki, Younghyun Jo, Hyong-Euk Lee, Seon Joo Kim,

(参考訳) no-reference Image Quality Assessment (NR-IQA)では、限られたデータセットサイズでの課題は、堅牢で一般化可能なモデルの開発を妨げている。従来の方法では、大きなデータセットを使用してIQAのリッチな表現を抽出することでこの問題に対処する。また、視覚言語モデル(VLM)をベースとしたIQAを提案する手法もあるが、汎用VLMとIQAのドメインギャップはスケーラビリティを制約している。本稿では,VLM から品質関連知識を選択的に抽出し,大規模データセットのスケーラビリティを活用することにより,IQA の一般化可能な表現を構築する新しい事前学習フレームワークを提案する。具体的には、5つの代表的な画像品質属性に対して最適なテキストプロンプトを慎重に選択し、VLMを用いて擬似ラベルを生成する。多数の属性を意識した擬似ラベルを大きな画像データセットで生成し,画像品質に関する豊かな表現をIQAモデルで学習する。提案手法は,複数のIQAデータセット上での最先端性能を実現し,優れた一般化能力を示す。これらの長所を生かして、画像生成モデルの評価や画像強調モデルの訓練、実世界の適用可能性の実証など、いくつかの応用を提案する。私たちはそのコードを利用できるようにします。

In no-reference image quality assessment (NR-IQA), the challenge of limited dataset sizes hampers the development of robust and generalizable models. Conventional methods address this issue by utilizing large datasets to extract rich representations for IQA. Also, some approaches propose vision language models (VLM) based IQA, but the domain gap between generic VLM and IQA constrains their scalability. In this work, we propose a novel pretraining framework that constructs a generalizable representation for IQA by selectively extracting quality-related knowledge from VLM and leveraging the scalability of large datasets. Specifically, we carefully select optimal text prompts for five representative image quality attributes and use VLM to generate pseudo-labels. Numerous attribute-aware pseudo-labels can be generated with large image datasets, allowing our IQA model to learn rich representations about image quality. Our approach achieves state-of-the-art performance on multiple IQA datasets and exhibits remarkable generalization capabilities. Leveraging these strengths, we propose several applications, such as evaluating image generation models and training image enhancement models, demonstrating our model's real-world applicability. We will make the code available for access.

翻訳日:2024-06-06 02:17:50 公開日:2024-06-03

# フィンランド小説の文学的分析のための定性的・計算的アプローチの組み合わせ

Combining Qualitative and Computational Approaches for Literary Analysis of Finnish Novels ( http://arxiv.org/abs/2406.01021v1 )

ライセンス: Link先を確認

Emily Ohman, Riikka Rossi,

(参考訳) 計算感情分析を用いてフィンランド文学の古典から何が学べるか? 本稿は、文学作品研究における感情分析の計算手法が、文学と情動に対する質的あるいはより「伝統的な」アプローチとどのように併用できるかを検討することで、この問題に答えようとしている。本研究では,世紀転換期のフィンランド文学テキストに適応させ慎重に整備した感情レキシコンと単語埋め込みを組み合わせ、フィンランド文学の重要作品の意味的感情空間を描き出す,単純かつ堅牢な情動分析の計算手法を提示・開発する。我々は,ユハニ・アホ(Juhani Aho),ミンナ・カント(Minna Canth),マリア・ヨトゥニ(Maria Jotuni),F. E. シランペー(F. E. Sillanpää)の4作品を事例として定性的な分析を行うとともに、合計975編のフィンランド小説について感情アークを提供する。テキストの語彙の計算分析は、テキスト内の感情価の大域的な分布を評価するのに有用であると論じ、他の研究者が研究結果を再現するのに役立つガイドラインを提供する。計算手法は, 文学における情動の伝統的な研究において, 精読に基づく分析の支援ツールとしての役割を担うだけでなく, ジャンルや各国のカノンの間の大規模比較も可能にすることを示す。

What can we learn from the classics of Finnish literature by using computational emotion analysis? This article tries to answer this question by examining how computational methods of sentiment analysis can be used in the study of literary works in conjunction with a qualitative or more 'traditional' approach to literature and affect. We present and develop a simple but robust computational approach of affect analysis that uses a carefully curated emotion lexicon adapted to Finnish turn-of-the-century literary texts combined with word embeddings to map out the semantic emotional spaces of seminal works of Finnish literature. We focus our qualitative analysis on selected case studies: four works by Juhani Aho, Minna Canth, Maria Jotuni, and F. E. Sillanpää, but provide emotion arcs for a total of 975 Finnish novels. We argue that a computational analysis of a text's lexicon can be valuable in evaluating the large distribution of the emotional valence in a text and provide guidelines to help other researchers replicate our findings. We show that computational approaches have a place in traditional studies on affect in literature as a support tool for close-reading-based analyses, but also allowing for large-scale comparison between, for example, genres or national canons.

翻訳日:2024-06-06 02:17:50 公開日:2024-06-03

# レコメンダシステムにおけるポイズニング攻撃と防御:サーベイ

Poisoning Attacks and Defenses in Recommender Systems: A Survey ( http://arxiv.org/abs/2406.01022v1 )

ライセンス: Link先を確認

Zongwei Wang, Junliang Yu, Min Gao, Guanhua Ye, Shazia Sadiq, Hongzhi Yin,

(参考訳) 現代のレコメンデーターシステム(RS)は、デジタルプラットフォーム全体のユーザエクスペリエンスを著しく向上させたが、毒殺攻撃による重大な脅威に直面している。これらの攻撃は、非倫理的な利益のためにレコメンデーションアウトプットを操作することを目的としており、悪意のあるデータを注入したり、モデルのトレーニングを介入することでRSの脆弱性を悪用している。この調査は、攻撃者のレンズを通してこれらの脅威を調べ、そのメカニズムと影響について新たな洞察を提供することによって、ユニークな視点を示す。具体的には、攻撃目標の設定、攻撃能力の評価、被害者のアーキテクチャの分析、毒殺戦略の実行の4段階を含む、系統的なパイプラインを詳述する。パイプラインは様々な攻撃戦術と整合するだけでなく、異なる毒殺攻撃の焦点を特定するための包括的分類としても機能する。これに対応して、我々は防衛戦略を2つの主要なカテゴリに分類する: 有害なデータフィルタリングと、防御者の視点からの堅牢な訓練である。最後に、既存の制限を強調し、この分野におけるさらなる探索のための革新的な方向性を提案する。

Modern recommender systems (RS) have profoundly enhanced user experience across digital platforms, yet they face significant threats from poisoning attacks. These attacks, aimed at manipulating recommendation outputs for unethical gains, exploit vulnerabilities in RS through injecting malicious data or intervening model training. This survey presents a unique perspective by examining these threats through the lens of an attacker, offering fresh insights into their mechanics and impacts. Concretely, we detail a systematic pipeline that encompasses four stages of a poisoning attack: setting attack goals, assessing attacker capabilities, analyzing victim architecture, and implementing poisoning strategies. The pipeline not only aligns with various attack tactics but also serves as a comprehensive taxonomy to pinpoint focuses of distinct poisoning attacks. Correspondingly, we further classify defensive strategies into two main categories: poisoning data filtering and robust training from the defender's perspective. Finally, we highlight existing limitations and suggest innovative directions for further exploration in this field.

翻訳日:2024-06-06 02:17:50 公開日:2024-06-03

# Khayyam: オフラインペルシア語手書きデータセット

Khayyam Offline Persian Handwriting Dataset ( http://arxiv.org/abs/2406.01025v1 )

ライセンス: Link先を確認

Pourya Jafarzadeh, Padideh Choobdar, Vahid Mohammadi Safarzadeh,

(参考訳) 手書き解析は、マシンラーニングにおいて依然として重要な応用である。どんな手書き認識アプリケーションでも基本的な要件は、包括的なデータセットが利用できることだ。標準ラベル付きデータセットは、学習アルゴリズムのトレーニングと評価において重要な役割を果たす。本稿では,Khayyamデータセットを、ペルシア語の要素(単語,文,文字,数字)を対象とする、もう一つの大規模な非拘束手書きデータセットとして提示する。我々は、現在利用可能なデータセットでは稀なペルシア語の単語サンプルの収集に意図的に注力した。Khayyamデータセットには44000語、60000文字、6000個の数字が含まれている。さらに、記入フォームは400人のペルシア語母語話者によって記入された。データセットの適用性を示すために、数字、文字、単語データに基づいて機械学習アルゴリズムを訓練し、結果を報告する。このデータセットは研究や学術的な用途で利用できる。

Handwriting analysis is still an important application in machine learning. A basic requirement for any handwriting recognition application is the availability of comprehensive datasets. Standard labelled datasets play a significant role in training and evaluating learning algorithms. In this paper, we present the Khayyam dataset as another large unconstrained handwriting dataset for elements (words, sentences, letters, digits) of the Persian language. We intentionally concentrated on collecting Persian word samples which are rare in the currently available datasets. Khayyam's dataset contains 44000 words, 60000 letters, and 6000 digits. Moreover, the forms were filled out by 400 native Persian writers. To show the applicability of the dataset, machine learning algorithms are trained on the digits, letters, and word data and results are reported. This dataset is available for research and academic use.

翻訳日:2024-06-06 02:17:50 公開日:2024-06-03

# 言語モデルの信頼性を向上したシンボル結合

Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors ( http://arxiv.org/abs/2406.01026v1 )

ライセンス: Link先を確認

Mengge Xue, Zhenyu Hu, Meng Zhao, Liqun Liu, Kuo Liao, Shuang Li, Honglin Han, Chengguo Yin,

(参考訳) 大規模言語モデル (LLMs) の研究において, MCQ (Multiple-Choice Questions) が重要な研究領域となっている。これまでの研究は、LCMのパフォーマンスが回答選択の提示に影響され、スーパービジョン・ファインチューニング(SFT)における選択バイアスが未探索のままである、というシナリオにおいて、MCQにおける選択バイアス問題を調査してきた。本稿では,LLMのMCSB能力が不十分なため,選択バイアスがSFT相に持続していることを明らかにする。この制限は、モデルが解の選択肢と対応する記号(例えば、A/B/C/D)を効果的に関連付けるのに苦労していることを意味する。モデルのMCSB能力を高めるために、まず損失関数にオプション内容を取り込んで、オプションシンボルとコンテンツの重みを調整し、現在のシンボルのオプション内容を理解するようモデルに指示する。そこで我々は,ポイントワイド・インテリジェント・フィードバック (PIF) と呼ばれるMCQに対する効率的なSFTアルゴリズムを提案する。 PIFは、不正なオプション内容とすべての候補シンボルをランダムに組み合わせて負のインスタンスを構築し、これらの負のサンプルをLLMにフィードバックするポイントワイズ損失を提案する。実験の結果, PIF は MCSB 能力を向上させることにより, モデル選択バイアスを著しく低減することが示された。興味深いことに、PIFはMCQの精度を大幅に向上させる。

Multiple-Choice Questions (MCQs) constitute a critical area of research in the study of Large Language Models (LLMs). Previous works have investigated the selection bias problem in MCQs within few-shot scenarios, in which the LLM's performance may be influenced by the presentation of answer choices, leaving the selection bias during Supervised Fine-Tuning (SFT) unexplored. In this paper, we reveal that selection bias persists in the SFT phase, primarily due to the LLM's inadequate Multiple Choice Symbol Binding (MCSB) ability. This limitation implies that the model struggles to associate the answer options with their corresponding symbols (e.g., A/B/C/D) effectively. To enhance the model's MCSB capability, we first incorporate option contents into the loss function and subsequently adjust the weights of the option symbols and contents, guiding the model to understand the option content of the current symbol. Based on this, we introduce an efficient SFT algorithm for MCQs, termed Point-wise Intelligent Feedback (PIF). PIF constructs negative instances by randomly combining the incorrect option contents with all candidate symbols, and proposes a point-wise loss to provide feedback on these negative samples into LLMs. Our experimental results demonstrate that PIF significantly reduces the model's selection bias by improving its MCSB capability. Remarkably, PIF exhibits a substantial enhancement in the accuracy for MCQs.

翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
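The negative-instance construction described in the abstract above can be illustrated as follows. This is one possible reading, not the authors' code: the abstract does not specify the sampling scheme, so the sketch simply pairs the content of each incorrect option with a symbol drawn at random from all candidate symbols. The function name, the dictionary layout, and the sample question are hypothetical, and the point-wise loss itself is not shown.

```python
import random


def build_negative_instances(options, answer_symbol, rng):
    """Pair each incorrect option content with a randomly drawn candidate symbol.

    Every pair is a negative instance: the content is wrong for the question,
    whichever symbol it is attached to.
    """
    symbols = list(options)  # all candidate symbols, e.g. A/B/C/D
    wrong_contents = [text for sym, text in options.items()
                      if sym != answer_symbol]
    return [(rng.choice(symbols), text) for text in wrong_contents]


# Hypothetical question: "What is the capital of France?"
options = {"A": "Paris", "B": "Rome", "C": "Madrid", "D": "Berlin"}
positive = ("A", options["A"])
negatives = build_negative_instances(options, answer_symbol="A",
                                     rng=random.Random(0))
```

Because a wrong content may land on the correct symbol, the model cannot lower the loss by memorising symbol positions and has to attend to the option content.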

# PRICE: クロスデータベース・カーディナリティ推定のための事前訓練モデル

PRICE: A Pretrained Model for Cross-Database Cardinality Estimation ( http://arxiv.org/abs/2406.01027v1 )

ライセンス: Link先を確認

Tianjing Zeng, Junwei Lan, Jiahong Ma, Wenqing Wei, Rong Zhu, Pengfei Li, Bolin Ding, Defu Lian, Zhewei Wei, Jingren Zhou,

(参考訳) クエリ実行計画の最適化には,カーディナリティ推定(CardEst)が不可欠である。最近のMLベースのCardEst手法は高い精度を達成できるが、高い準備コストとデータベース間の転送可能性の欠如のため、デプロイメント上の課題に直面している。本稿では,これらの制約に対処するPRetrained multI-table CardEstモデルであるPRICEを提案する。 PRICEは、データ分布とクエリ情報に関する低レベルだが転送可能な特徴を入力とし、自己注意(self-attention)モデルをエレガントに適用してメタ知識を学習し、任意のデータベースのカーディナリティを計算する。未知の新しいデータベースにも一般に適用でき、高い推定精度を達成する一方で、その準備コストは基本的な1次元ヒストグラムベースのCardEst法と同程度に小さい。さらに、PRICEを微調整することで、特定のデータベース上での性能をさらに向上することができる。 30の多様なデータセットを使用してPRICEを事前トレーニングし、約5時間で処理を完了し、結果としてモデルサイズは約40MBになった。評価の結果、PRICEは既存の手法を一貫して上回り、いくつかの未知のデータベース上で最高の推定精度を達成し、より低いオーバーヘッドでより高速な実行計画を生成することがわかった。少量のデータベース固有のクエリで微調整した後、PRICEは最適なプランに非常に近いプランを見つけることができた。一方、PRICEは一般的に、データ更新、データスケーリング、クエリのワークロードシフトなど、さまざまな設定に適用できる。私たちはすべてのデータとコードを https://github.com/StCarmen/PRICE で公開した。

Cardinality estimation (CardEst) is essential for optimizing query execution plans. Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs and lack of transferability across databases. In this paper, we propose PRICE, a PRetrained multI-table CardEst model, which addresses these limitations. PRICE takes low-level but transferable features w.r.t. data distributions and query information and elegantly applies self-attention models to learn meta-knowledge to compute cardinality in any database. It is generally applicable to any unseen new database to attain high estimation accuracy, while its preparation cost is as little as the basic one-dimensional histogram-based CardEst methods. Moreover, PRICE can be finetuned to further enhance its performance on any specific database. We pretrained PRICE using 30 diverse datasets, completing the process in about 5 hours with a resulting model size of only about 40MB. Evaluations show that PRICE consistently outperforms existing methods, achieving the highest estimation accuracy on several unseen databases and generating faster execution plans with lower overhead. After finetuning with a small volume of database-specific queries, PRICE could even find plans very close to the optimal ones. Meanwhile, PRICE is generally applicable to different settings such as data updates, data scaling, and query workload shifts. We have made all of our data and codes publicly available at https://github.com/StCarmen/PRICE.

翻訳日:2024-06-06 02:17:50 公開日:2024-06-03
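The abstract above measures PRICE's preparation cost against the basic one-dimensional histogram method. As background, the sketch below shows that baseline in its simplest form, an equi-width histogram over one column used to estimate how many rows satisfy a range predicate. It is not part of PRICE; the function names, the bucket count, and the sample column are illustrative assumptions, and all values are assumed to lie in `[lo, hi)`.

```python
def build_histogram(values, lo, hi, num_buckets):
    """Equi-width histogram over [lo, hi): number of values per bucket."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return counts, width


def estimate_range(counts, lo, width, a, b):
    """Estimate rows with a <= value < b, assuming uniformity inside a bucket."""
    total = 0.0
    for i, count in enumerate(counts):
        left = lo + i * width
        right = left + width
        overlap = max(0.0, min(b, right) - max(a, left))
        total += count * overlap / width
    return total


# Hypothetical "age" column of a ten-row table.
ages = [18, 21, 25, 25, 30, 34, 41, 47, 52, 58]
counts, width = build_histogram(ages, lo=10, hi=60, num_buckets=5)

# Estimated cardinality of: SELECT ... WHERE age >= 20 AND age < 35
estimate = estimate_range(counts, 10, width, a=20, b=35)
true_count = sum(1 for v in ages if 20 <= v < 35)
```

Here the estimate is 4.0 rows against a true count of 5. The gap comes from the uniformity assumption inside the half-covered bucket, and learned estimators aim to shrink such errors, especially across joins.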

# LLEMamba:ディープ・アンフォールディング・ネットワークを用いたライティングガイドマンバによる低照度化

LLEMamba: Low-Light Enhancement via Relighting-Guided Mamba with Deep Unfolding Network ( http://arxiv.org/abs/2406.01028v1 )

ライセンス: Link先を確認

Xuanqi Zhang, Haijin Zeng, Jinwang Pan, Qiangqiang Shen, Yongyong Chen,

(参考訳) トランスフォーマーをベースとした低照度化手法は,グローバルコンテキストにおける長距離依存性を効果的にキャプチャすることで,有望な性能を実現している。しかし、その高い計算需要は、深層展開ネットワークにおける複数イテレーションのスケーラビリティを制限するため、解釈可能性と歪みの柔軟バランスが困難である。この問題に対処するために,Retinex Optimization と Mamba Deep Priors によって理論的解釈性と忠実性が保証される深層展開ネットワーク (LLEMamba) を用いたリライト誘導型マンバによる新しい低照度化手法を提案する。具体的には、LLEMambaは、まず、深く展開するネットワーク内に、乗算器の交互方向法(ADMM)に基づく反復最適化過程を組み込んだ、深い事前のRetinexモデルを構築します。 Transformerとは異なり、複数のイテレーションで深層展開フレームワークを支援するため、LLEMambaは計算複雑性の低い新しいMambaアーキテクチャを導入している。ベンチマーク実験により,LLEMambaは既存の最先端手法と比較して,優れた定量的評価と低歪みの視覚的結果が得られることが示された。

Transformer-based low-light enhancement methods have yielded promising performance by effectively capturing long-range dependencies in a global context. However, their elevated computational demand limits the scalability of multiple iterations in deep unfolding networks, and hence they have difficulty in flexibly balancing interpretability and distortion. To address this issue, we propose a novel Low-Light Enhancement method via relighting-guided Mamba with a deep unfolding network (LLEMamba), whose theoretical interpretability and fidelity are guaranteed by Retinex optimization and Mamba deep priors, respectively. Specifically, our LLEMamba first constructs a Retinex model with deep priors, embedding the iterative optimization process based on the Alternating Direction Method of Multipliers (ADMM) within a deep unfolding network. Unlike Transformer, to assist the deep unfolding framework with multiple iterations, the proposed LLEMamba introduces a novel Mamba architecture with lower computational complexity, which not only achieves light-dependent global visual context for dark images during reflectance relight but also optimizes to obtain more stable closed-form solutions. Experiments on the benchmarks show that LLEMamba achieves superior quantitative evaluations and lower distortion visual results compared to existing state-of-the-art methods.

Translated: 2024-06-06 02:17:50 Published: 2024-06-03
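
As a rough illustration of the Retinex model that the LLEMamba abstract builds on, the sketch below treats a grayscale image as the element-wise product of reflectance and illumination, then brightens the illumination with a gamma curve. It is not the paper's ADMM unfolding or Mamba prior; the local-maximum illumination estimate, the function names, and the `radius`, `gamma`, and `eps` parameters are all assumptions made for this example.

```python
def estimate_illumination(image, radius=1):
    """Crude illumination estimate: the local maximum in a (2r+1)x(2r+1) window."""
    height, width = len(image), len(image[0])
    illum = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            illum[y][x] = max(window)
    return illum


def retinex_relight(image, gamma=0.5, radius=1, eps=1e-6):
    """Split image = reflectance * illumination, then relight with illumination ** gamma.

    Pixel values are assumed to lie in [0, 1]; gamma < 1 brightens dark regions.
    """
    illum = estimate_illumination(image, radius)
    relit = []
    for pixel_row, illum_row in zip(image, illum):
        out_row = []
        for pixel, light in zip(pixel_row, illum_row):
            light = max(light, eps)               # avoid division by zero
            reflectance = min(pixel / light, 1.0)  # Retinex: R = I / L
            out_row.append(reflectance * light ** gamma)
        relit.append(out_row)
    return relit


dark = [[0.04, 0.02], [0.01, 0.04]]
bright = retinex_relight(dark)
```

With `gamma` below 1 and inputs in [0, 1], every output pixel is at least as bright as its input, while the ratio between neighbouring pixels (the reflectance) is preserved wherever the illumination estimate is shared.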

# CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos

CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos ( http://arxiv.org/abs/2406.01029v1 )

License: see the linked page

Trong-Thuan Nguyen, Pha Nguyen, Xin Li, Jackson Cothren, Alper Yilmaz, Khoa Luu,

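(Reference translation) Video scene graph generation (VidSGG) has emerged as a transformative approach to capturing and interpreting the complex relationships among objects and their temporal dynamics in video sequences. In this paper, we introduce the new AeroEye dataset, which focuses on multi-object relationship modeling in aerial videos. AeroEye covers a variety of drone scenes and includes a visually comprehensive and precise collection of predicates that capture the complex relationships and spatial arrangements among objects. To this end, we propose the Cyclic Graph Transformer (CYCLO) approach, which lets the model capture both direct and long-range temporal dependencies by continuously updating the history of interactions in a circular manner. The approach can also handle sequences with inherent cyclical patterns and process object relationships in the correct sequential order, so it can effectively capture periodic and overlapping relationships while minimizing information loss. Extensive experiments on the AeroEye dataset demonstrate the effectiveness of the proposed CYCLO model and its potential for scene understanding in drone videos. Finally, CYCLO consistently achieves state-of-the-art (SOTA) results on two in-the-wild scene graph generation benchmarks, PVSG and ASPIRe.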

Video scene graph generation (VidSGG) has emerged as a transformative approach to capturing and interpreting the intricate relationships among objects and their temporal dynamics in video sequences. In this paper, we introduce the new AeroEye dataset that focuses on multi-object relationship modeling in aerial videos. Our AeroEye dataset features various drone scenes and includes a visually comprehensive and precise collection of predicates that capture the intricate relationships and spatial arrangements among objects. To this end, we propose the novel Cyclic Graph Transformer (CYCLO) approach that allows the model to capture both direct and long-range temporal dependencies by continuously updating the history of interactions in a circular manner. The proposed approach also allows one to handle sequences with inherent cyclical patterns and process object relationships in the correct sequential order. Therefore, it can effectively capture periodic and overlapping relationships while minimizing information loss. The extensive experiments on the AeroEye dataset demonstrate the effectiveness of the proposed CYCLO model, demonstrating its potential to perform scene understanding on drone videos. Finally, the CYCLO method consistently achieves State-of-the-Art (SOTA) results on two in-the-wild scene graph generation benchmarks, i.e., PVSG and ASPIRe.

Translated: 2024-06-06 02:17:50 Published: 2024-06-03
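
The CYCLO abstract speaks of updating the history of interactions "in a circular manner" while keeping relationships in the correct sequential order. One generic way to picture that idea is a fixed-size ring buffer whose write position wraps around. The sketch below shows only that data structure; the class name `CircularHistory` and its methods are hypothetical, and this is not the paper's transformer or attention mechanism.

```python
class CircularHistory:
    """Fixed-capacity history whose write index wraps around (a ring buffer)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.slots = [None] * capacity
        self.count = 0  # total number of items written so far

    def update(self, item):
        """Store a new item, overwriting the oldest one once the buffer is full."""
        self.slots[self.count % self.capacity] = item
        self.count += 1

    def ordered(self):
        """Return the stored items from oldest to newest."""
        kept = min(self.count, self.capacity)
        start = self.count - kept
        return [self.slots[t % self.capacity] for t in range(start, self.count)]


history = CircularHistory(capacity=3)
for frame_index in range(5):
    # In a real system each item would be per-frame relation features.
    history.update(frame_index)
recent = history.ordered()
```

Because reads walk the slots starting from the oldest surviving entry, the temporal order is preserved even after the write index has wrapped.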

# LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning

LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning ( http://arxiv.org/abs/2406.01032v1 )

License: see the linked page

Junjie Xu, Zongyu Wu, Minhua Lin, Xiang Zhang, Suhang Wang,

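(Reference translation) Recent advances in Graph Neural Networks (GNNs) have greatly improved the ability to model complex molecular structures and predict their properties. Nevertheless, molecular data comprises more than graph structure alone: it also includes textual and visual information that GNNs do not handle well. To bridge this gap, we present an innovative framework that uses multimodal molecular data to extract insights from Large Language Models (LLMs). We introduce GALLON (Graph Learning from Large Language Model Distillation), a framework that combines the capabilities of LLMs and GNNs by distilling multimodal knowledge into a unified Multilayer Perceptron (MLP). The method integrates the rich textual and visual data of molecules with the structural analysis power of GNNs. Extensive experiments show that the distilled MLP model notably improves the accuracy and efficiency of molecular property prediction.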

Recent progress in Graph Neural Networks (GNNs) has greatly enhanced the ability to model complex molecular structures for predicting properties. Nevertheless, molecular data encompasses more than just graph structures, including textual and visual information that GNNs do not handle well. To bridge this gap, we present an innovative framework that utilizes multimodal molecular data to extract insights from Large Language Models (LLMs). We introduce GALLON (Graph Learning from Large Language Model Distillation), a framework that synergizes the capabilities of LLMs and GNNs by distilling multimodal knowledge into a unified Multilayer Perceptron (MLP). This method integrates the rich textual and visual data of molecules with the structural analysis power of GNNs. Extensive experiments reveal that our distilled MLP model notably improves the accuracy and efficiency of molecular property predictions.

Translated: 2024-06-06 02:17:50 Published: 2024-06-03
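
The GALLON abstract describes distilling knowledge into an MLP but does not state the training objective. A common choice for distillation in general is the temperature-scaled KL divergence between teacher and student output distributions, sketched below as an assumption rather than as the paper's loss. The function names and the `temperature` default are illustrative.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_sum for z in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    log_teacher = log_softmax(teacher_logits, temperature)
    log_student = log_softmax(student_logits, temperature)
    return sum(
        math.exp(lt) * (lt - ls) for lt, ls in zip(log_teacher, log_student)
    )


matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge; a higher temperature flattens both distributions so that low-probability classes contribute more to the signal.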

# Generalized Jersey Number Recognition Using Multi-task Learning With Orientation-guided Weight Refinement

Generalized Jersey Number Recognition Using Multi-task Learning With Orientation-guided Weight Refinement ( http://arxiv.org/abs/2406.01033v1 )

License: see the linked page

Yung-Hui Lin, Yu-Wen Chang, Huang-Chia Shih, Takahiro Ogawa,

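(Reference translation) Jersey number recognition (JNR) has always been an important task in sports analytics. Improving recognition accuracy remains an ongoing challenge because images are subject to blurring, occlusion, deformation, and low resolution. Recent research has addressed these problems using number localization and optical character recognition. Some approaches apply player identification schemes to image sequences but ignore the effect of human body rotation angles on jersey digit identification. Accurately predicting the number of jersey digits, by using a multi-task scheme to recognize each individual digit, yields more robust results. This paper therefore proposes a multi-task learning method called the angle-digit refine scheme (ADRS), which combines human body orientation angles and digit number clues to recognize athletic jersey numbers. In our experiments, the approach increases inference information and significantly improves prediction accuracy. Compared with state-of-the-art methods, which can handle only a single type of sport, the proposed method supports more diverse and practical JNR applications. Incorporating diverse team sports such as soccer, football, basketball, volleyball, and baseball into our dataset contributes greatly to generalized JNR in sports analytics. Our method achieves 64.07% Top-1 and 89.97% Top-2 accuracy, with corresponding F1 scores of 67.46% and 90.64%.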

Jersey number recognition (JNR) has always been an important task in sports analytics. Improving recognition accuracy remains an ongoing challenge because images are subject to blurring, occlusion, deformity, and low resolution. Recent research has addressed these problems using number localization and optical character recognition. Some approaches apply player identification schemes to image sequences, ignoring the impact of human body rotation angles on jersey digit identification. Accurately predicting the number of jersey digits by using a multi-task scheme to recognize each individual digit enables more robust results. Based on the above considerations, this paper proposes a multi-task learning method called the angle-digit refine scheme (ADRS), which combines human body orientation angles and digit number clues to recognize athletic jersey numbers. Based on our experimental results, our approach increases inference information, significantly improving prediction accuracy. Compared to state-of-the-art methods, which can only handle a single type of sport, the proposed method produces a more diverse and practical JNR application. The incorporation of diverse types of team sports such as soccer, football, basketball, volleyball, and baseball into our dataset contributes greatly to generalized JNR in sports analytics. Our accuracy achieves 64.07% on Top-1 and 89.97% on Top-2, with corresponding F1 scores of 67.46% and 90.64%, respectively.

Translated: 2024-06-06 02:17:50 Published: 2024-06-03
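
The jersey number recognition abstract reports Top-1 and Top-2 accuracy. As a reminder of how such a metric is conventionally computed, the sketch below counts a sample as correct when its true class is among the k highest-scoring classes. The abstract does not describe the evaluation protocol, so the function name, the score layout, and the toy data are assumptions.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: one list of per-class scores per sample.
    labels: one integer class index per sample.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

Top-k accuracy can never decrease as k grows, which is consistent with the reported Top-2 figure being higher than the Top-1 figure.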

# Synthetic Data Generation for 3D Myocardium Deformation Analysis

Synthetic Data Generation for 3D Myocardium Deformation Analysis ( http://arxiv.org/abs/2406.01040v1 )

License: see the linked page

Shahar Zuler, Dan Raviv,

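(Reference translation) Accurate analysis of 3D myocardium deformation using high-resolution computerized tomography (CT) datasets with ground truth (GT) annotations is crucial for advancing cardiovascular imaging research. However, the scarcity of such datasets is a significant obstacle to developing robust myocardium deformation analysis models. To address this, we propose a novel approach to synthetic data generation for enriching cardiovascular imaging datasets. We introduce a synthetic data generation method enriched with GT 3D optical flow annotations. We outline the data preparation from a cardiac four-dimensional (4D) CT scan, the selection of parameters, and the subsequent creation of synthetic training data from the same or other sources of 3D cardiac CT data. Our work helps overcome the limitations imposed by the scarcity of high-resolution CT datasets with precise annotations, and thereby facilitates the development of accurate and reliable myocardium deformation analysis algorithms for clinical applications and diagnostics. Our code is available at: http://www.github.com/shaharzuler/cardio_volume_skewer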

Accurate analysis of 3D myocardium deformation using high-resolution computerized tomography (CT) datasets with ground truth (GT) annotations is crucial for advancing cardiovascular imaging research. However, the scarcity of such datasets poses a significant challenge for developing robust myocardium deformation analysis models. To address this, we propose a novel approach to synthetic data generation for enriching cardiovascular imaging datasets. We introduce a synthetic data generation method, enriched with crucial GT 3D optical flow annotations. We outline the data preparation from a cardiac four-dimensional (4D) CT scan, selection of parameters, and the subsequent creation of synthetic data from the same or other sources of 3D cardiac CT data for training. Our work contributes to overcoming the limitations imposed by the scarcity of high-resolution CT datasets with precise annotations, thereby facilitating the development of accurate and reliable myocardium deformation analysis algorithms for clinical applications and diagnostics. Our code is available at: http://www.github.com/shaharzuler/cardio_volume_skewer

Translated: 2024-06-06 02:17:50 Published: 2024-06-03
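
The key property of synthetic deformation data, as the myocardium abstract suggests, is that the ground-truth flow is known by construction because the second frame is generated from the first. The sketch below shows that principle with a deliberately simple rigid integer shift of a tiny 3D volume; real cardiac deformation is non-rigid, and this is not the method in the linked repository. All function names and the `fill` parameter are hypothetical.

```python
def translate_volume(volume, shift, fill=0.0):
    """Move every voxel by an integer (dz, dy, dx) shift; voxels leaving the grid are dropped."""
    dz, dy, dx = shift
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    moved = [[[fill] * width for _ in range(height)] for _ in range(depth)]
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                tz, ty, tx = z + dz, y + dy, x + dx
                if 0 <= tz < depth and 0 <= ty < height and 0 <= tx < width:
                    moved[tz][ty][tx] = volume[z][y][x]
    return moved


def make_synthetic_pair(volume, shift):
    """Return (source, target, flow), where flow is the known per-voxel displacement."""
    target = translate_volume(volume, shift)
    # The same displacement applies to every source voxel, so the GT flow is constant.
    flow = [[[tuple(shift) for _ in row] for row in plane] for plane in volume]
    return volume, target, flow


# A 2x2x2 volume with a single bright voxel at (z=0, y=0, x=0).
source = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
source, target, flow = make_synthetic_pair(source, shift=(1, 0, 1))
```

A model trained on such pairs can be scored directly against `flow`, which is what makes synthetic generation attractive when annotated scans are scarce.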

# Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting

Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting ( http://arxiv.org/abs/2406.01042v1 )

License: see the linked page

Fang Li, Hao Zhang, Narendra Ahuja,

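(Reference translation) Gaussian Splatting (GS) has significantly improved scene reconstruction efficiency and novel view synthesis (NVS) accuracy compared to Neural Radiance Fields (NeRF), particularly for dynamic scenes. However, current 4D NVS methods, whether based on GS or NeRF, rely primarily on camera parameters provided by COLMAP and even use the sparse point clouds generated by COLMAP for initialization, which lack accuracy and are time-consuming. This sometimes leads to poor dynamic scene representation, especially in scenes with large object movements or under extreme camera conditions such as small translations combined with large rotations. Some studies jointly optimize the estimation of camera parameters and scenes, supervised by additional information such as depth and optical flow obtained from off-the-shelf models. Treating this unverified information as ground truth can reduce robustness and accuracy, which happens frequently with long monocular videos (for example, hundreds of frames). We propose a novel approach that learns a high-fidelity 4D GS scene representation with self-calibration of camera parameters. It extracts 2D point features that robustly represent 3D structure and uses them in the subsequent joint optimization of camera parameters and 3D structure towards overall 4D scene optimization. We demonstrate the accuracy and time efficiency of the method through extensive quantitative and qualitative experiments on several standard benchmarks. The results show significant improvements over state-of-the-art methods for 4D novel view synthesis. The source code will be released soon at https://github.com/fangli333/SC-4DGS.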

Gaussian Splatting (GS) has significantly elevated scene reconstruction efficiency and novel view synthesis (NVS) accuracy compared to Neural Radiance Fields (NeRF), particularly for dynamic scenes. However, current 4D NVS methods, whether based on GS or NeRF, primarily rely on camera parameters provided by COLMAP and even utilize sparse point clouds generated by COLMAP for initialization, which lack accuracy and are time-consuming. This sometimes results in poor dynamic scene representation, especially in scenes with large object movements or extreme camera conditions, e.g., small translations combined with large rotations. Some studies simultaneously optimize the estimation of camera parameters and scenes, supervised by additional information such as depth and optical flow obtained from off-the-shelf models. Using this unverified information as ground truth can reduce robustness and accuracy, which frequently occurs for long monocular videos (e.g., with hundreds of frames). We propose a novel approach that learns a high-fidelity 4D GS scene representation with self-calibration of camera parameters. It includes the extraction of 2D point features that robustly represent 3D structure, and their use for subsequent joint optimization of camera parameters and 3D structure towards overall 4D scene optimization. We demonstrate the accuracy and time efficiency of our method through extensive quantitative and qualitative experimental results on several standard benchmarks. The results show significant improvements over state-of-the-art methods for 4D novel view synthesis. The source code will be released soon at https://github.com/fangli333/SC-4DGS.

Translated: 2024-06-06 02:17:50 Published: 2024-06-03
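
The self-calibration abstract describes jointly optimizing camera parameters and 3D structure from 2D point features, without stating the objective. A standard quantity in camera calibration generally is the reprojection error under a pinhole model, sketched below with a single unknown focal length recovered by a coarse candidate search. This is an assumption for illustration, not the SC-4DGS pipeline; the function names, the candidate list, and the toy points are hypothetical.

```python
import math


def project(point, focal, cx=0.0, cy=0.0):
    """Pinhole projection of a camera-frame 3D point (X, Y, Z) to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal * x / z + cx, focal * y / z + cy)


def mean_reprojection_error(points3d, observations, focal):
    """Average pixel distance between projected points and observed 2D features."""
    total = 0.0
    for point, (u_obs, v_obs) in zip(points3d, observations):
        u, v = project(point, focal)
        total += math.hypot(u - u_obs, v - v_obs)
    return total / len(points3d)


def calibrate_focal(points3d, observations, candidates):
    """Pick the candidate focal length with the lowest mean reprojection error."""
    return min(
        candidates,
        key=lambda f: mean_reprojection_error(points3d, observations, f),
    )


points = [(1.0, 0.0, 2.0), (0.0, 1.0, 4.0), (-1.0, 2.0, 5.0)]
observed = [project(p, focal=500) for p in points]  # simulated 2D features
best_focal = calibrate_focal(points, observed, candidates=[300, 400, 500, 600])
```

In practice the search would be replaced by gradient-based optimization over many parameters at once, including camera pose and the 3D points themselves.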

# Nuclear Medicine Artificial Intelligence in Action: The Bethesda Report (AI Summit 2024)

Nuclear Medicine Artificial Intelligence in Action: The Bethesda Report (AI Summit 2024) ( http://arxiv.org/abs/2406.01044v1 )

License: see the linked page

Arman Rahmim, Tyler J. Bradshaw, Guido Davidzon, Joyita Dutta, Georges El Fakhri, Munir Ghesani, Nicolas A. Karakatsanis, Quanzheng Li, Chi Liu, Emilie Roncali, Babak Saboury, Tahir Yusufaly, Abhinav K. Jha,

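(Reference translation) The 2nd SNMMI Artificial Intelligence (AI) Summit, organized by the SNMMI AI Task Force, took place in Bethesda, MD, on February 29 - March 1, 2024. Bringing together various community members and stakeholders, and following up on a successful 2022 AI Summit, the summit had the theme "AI in Action". The six key topics were (i) an overview of prior and ongoing efforts by the AI Task Force, (ii) emerging needs and tools for computational nuclear oncology, (iii) new frontiers in large language and generative models, (iv) defining the value proposition for the use of AI in nuclear medicine, (v) open science, including efforts for data and model repositories, and (vi) issues of reimbursement and funding. The primary efforts, findings, challenges, and next steps are summarized in this manuscript.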

The 2nd SNMMI Artificial Intelligence (AI) Summit, organized by the SNMMI AI Task Force, took place in Bethesda, MD, on February 29 - March 1, 2024. Bringing together various community members and stakeholders, and following up on a prior successful 2022 AI Summit, the summit theme was: AI in Action. Six key topics included (i) an overview of prior and ongoing efforts by the AI task force, (ii) emerging needs and tools for computational nuclear oncology, (iii) new frontiers in large language and generative models, (iv) defining the value proposition for the use of AI in nuclear medicine, (v) open science including efforts for data and model repositories, and (vi) issues of reimbursement and funding. The primary efforts, findings, challenges, and next steps are summarized in this manuscript.